Googler Daniel Russell knows how to find the answers to questions you can't get to with a simple Google query. In his weekly Search Research column, Russell issues a search challenge, then follows up later in the week with his solution—using whatever search technology and methodology fits the bill. This week's challenge: Are there more languages above or below the equator?
The question was:
As I said, a big part of answering questions like this is making your terms very clear. When we decide to consider only "Official" languages, and not, say, every small language spoken in New Guinea (which has thousands) or every language spoken in central Africa (more thousands), we REALLY simplify the problem.
What this lets us do is search for a list of "Official languages." So just doing a search like [official languages] quickly leads to the Wikipedia page of OfficialLanguages. And that's not a bad place to start.
But by looking around, I found that search ALSO leads to a languages-by-country table on Infoplease with more-or-less the same data in a somewhat more convenient form. (Another search that works would have been [ list of languages by country ].)
So I now have a table from Infoplease that looks like this:
To convert it into a form that we can manipulate, I just copied out the text of the table and saved it into a plain text file. Handily, the copy/paste into a file automatically inserts TABs between the columns. This will make it simple to import into a spreadsheet.
Now I need to figure out which of these countries have their capital in the northern hemisphere. So I THEN did a search for [ country capitals latlong ] and found another handy table of country capitals with their associated lat-longs!
I did the copy/paste to get this data from their web page into my plain text file. I then imported that into another spreadsheet.
Now I've got two spreadsheets, one with a list of languages-by-country, and another with country-capitals and their lat-longs.
From the Lang-by-country spreadsheet I can easily write a spreadsheet function to count the number of languages in each country.
A typical row looks like this:
Afghanistan, Dari(Persian), Pashtu (both official),other Turkic and minor languages
The 199 languages of each country in this table are separated by commas. So, I write a short spreadsheet function to count commas in each row, and-voila!-I have a table of Country / Number-of-languages. (I'm happy to let the phrase "…other Turkic and minor languages" be counted as 1 extra language. You'll see it won't matter in the end.) By this method, I see that Afghanistan has 3 languages. Yes, there are others, but there are only 2 official languages (and many non-official languages).
In my other spreadsheet I have a table of 200 Country / Capital / Lat-longs.
Obviously what I want to do is to use the Lat-longs to determine if the country is ABOVE or BELOW the equator, and then just count the number of languages in each group.
This is exactly what Google Fusion Tables are designed to do.
A Fusion Table can take two different tables that share a common element (in this case, the column labeled "Country") and JOIN them together. That's 200 copy/paste operations I won't have to do by hand.
I created two Fusion Tables, one for each of my two spreadsheets. Using Fusion Tables "Merge" function, I just joined them together on their shared column "Country."
I opened the Country-LatLong table and then FUSED it with the Country-Language-Count table. This basically merges the two tables so that I now have new table with columns for Country / Lat-long / Number of languages!
Once I have this, it's an easy sort of the table to put all of the countries above the equator (that is, their Lat-long contains an "N" in the entry), and then write a simple =SUM(…) function to add them all up.
Here's what my table looks like now. I converted the presence/absence of an "N" in the lat-long into a 1 (for ABOVE) or 0 (for BELOW).
And now we can just read off the answer: There are 546 official languages spoken above the equator, but only 164 spoken below. That's a big difference.
I'll leave it to you to speculate why that is. But at least we've got a little data to go on.
And, since I was in Fusion Tables, it was trivial to visualize the data by mapping the number of languages onto the map-of-the-world and create this map. The darker the green, the more languages spoken there. As you can see, the north has a LOT more than the south.
SEARCH LESSON: Often you'll find that data is out there on the web, but it's in an un-handy form. In this case, the information we wanted wasn't immediately available. But with a little data extraction, spreadsheet jockeying and using Fusion Tables to merge two data sets, we can get to what we want fairly quickly.
Keep your eyes open for these data-merge possibilities. Once you start looking around, you'll see all kinds of merging that leads to insights about the world.
Daniel M. Russell studies the way people search and research—an anthropologist of search, if you will. You can read more from Russell on his SearchReSearch blog, and stay tuned for his weekly challenges (and answers) here on Lifehacker.