Over the years I've used a variety of operating systems: the Linux distros Mandrake, Red Hat, Fedora, Gentoo, and Ubuntu, plus FreeBSD. Distrowatch keeps track of everything we need to know about the distros, and recently there has been an enormous push in desktop Linux, thanks to Dell shipping Ubuntu on its desktops and Compiz Fusion bringing snazzy eye candy to even low-end machines. Distrowatch gives some pretty decent stats on the main distros, but for a while I've wanted to know how Google sees their popularity, measured mainly by how many pages mention each distro.
Using some Python, a spreadsheet, and a little scraping, I was able to get my answer. To rank the distros, I used the number of results Google reports when you search for each distro's name. I'm going to write a HOWTO on the technical side of what I did sometime this week, but here are the basic steps:
- In a Google Spreadsheet I made a sheet that held the names of the top distros on Distrowatch.
- Another sheet holds the full list from Distrowatch (366 distros on record at the time of this writing).
- I set up a dapp that takes these names and returns the number of results Google reports when you search for each one.
- A Python script pulls the distros out of the spreadsheet, queries the dapp, and puts the results back into another sheet.
I have two sets of results. The first queries each name exactly as it appears in the spreadsheet. The second appends the word "Linux" to any distro name that doesn't already contain it; I was curious how this would affect the results. Below are the results for the most popular distros on Distrowatch. Look, Ubuntu! The spreadsheet with all of the findings (and all 366 distros) is shared here.
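Until the HOWTO is written, here is a rough sketch of the query logic behind those two result sets. It is not my actual script. `fetch_count` is a hypothetical stand-in for the dapp (any function that turns a search string into a hit count), the "Results 1 - 10 of about N" wording that `parse_result_count` looks for is an assumption about the scraped text, and reading from and writing to the spreadsheet is left out.

```python
import re


def build_queries(name):
    """Return (plain query, query with 'Linux' appended).

    'Linux' is only appended when the name doesn't already contain it,
    so 'Arch Linux' stays 'Arch Linux' but 'Ubuntu' becomes 'Ubuntu Linux'.
    """
    plain = name.strip()
    if "linux" in plain.lower():
        return plain, plain
    return plain, plain + " Linux"


def parse_result_count(text):
    """Extract the total hit count from a scraped results line.

    Assumes wording like 'Results 1 - 10 of about 1,230,000 for ubuntu';
    returns 0 when no count can be found.
    """
    match = re.search(r"of (?:about )?([\d,]+)", text)
    return int(match.group(1).replace(",", "")) if match else 0


def collect(names, fetch_count):
    """Query both variants for every distro and rank by the plain count.

    fetch_count is a hypothetical callable (standing in for the dapp)
    that maps a query string to an integer hit count.
    """
    rows = []
    for name in names:
        plain, with_linux = build_queries(name)
        rows.append((name, fetch_count(plain), fetch_count(with_linux)))
    # Most results first, using the vanilla (unmodified name) query
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Passing the dapp in as a function keeps the ranking logic testable: a plain dictionary's `get` method works as a fake `fetch_count` when you don't want to hit the network.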
Stay tuned for the code behind it! Subscribe to the feed to get more updates.