How to Analyze Data Using Microsoft Excel

For any profession, the ability to analyze data using Microsoft Excel is essential.

In this post I’ll show you how I was able to look at the Olympic Gold Medal winners (by country), but rank them according to their population or GDP. You’ll be surprised to see the new angle of this.

Now I’ll outline exactly how I did this, because this type of analysis is not beyond the average Excel user.

First I had to accumulate my data sources. Wikipedia is always a good place to start. I did a simple Google search for ‘country medals 2012’, ‘country population’ and ‘country gdp’. Pretty simple, huh? For more difficult searching, check out this article on how to optimize your Google searching.¬†

I found my 3 lists, with the country names (highlighted in red) being the common denominator across each list. Simply copy paste into Excel, with one list next to the other. This will help me consolidate into a single list later on. This is also why I find Wikipedia useful – because most of their lists follow a similar format and thus you don’t have to tweak the data as much.


The next step is to create a single list, although before we get there we need to make sure our common denominator (country) is identical across each list. It’s difficult to tell in this frame, but each country in the Population and GDP list has a space before the country. This simple thing would prevent your lists from consolidating correctly.

To fix it, we’ll use a formula to delete the first character of these cells. Before that, let’s create a bunch of new columns so that we’re all set to build out our analysis.

Notice I expanded the first list to include 2 columns – one for population and one for GDP. Later we’ll use a vlookup formula to pull in the values from the other lists.

To delete the first character in the country cells (for the Population and GDP lists), we’ll use the MID formula, which returns the characters from the middle of a text string, given a starting position and length. We’ll also include the LEN formula, which returns the number of characters in a text string. Enter this in J3:


L3 is in the Population list, and it’s the first country listed (” China”). By declaring L3 and then the number 2, we are telling the formula to start in L3, and within L3 to look at the number 2 character. In this case the number 1 character is the blank space, so it will start at “C”. The third part of the MID formula is for the number of characters you want to return (starting at the 2nd character, how many MORE do you want to return). By using LEN, we are telling the formula to return however many characters are actually in the text string ” China”, starting at the 2nd character (“C”). Try this out for the GDP countries as well, and drag the formulas down.

Now that we have a common denominator of country names across all 3 lists, we’ll use vlookup to pull in values for Population and GDP into the first list. Enter this in G3:

=VLOOKUP(B3, $J3:$P244, 4, 0)

What vlookup does is use a lookup value (in this case, B3, which is “United States”) to scan through other lists (Population or GDP lists) and then return a corresponding value (actual population or actual GDP). So what we are asking Excel to do here is look up B3 (“United States”) in the table array¬†$J3:$P244 (entire list for Population). Note here that the first column of your table array (J) needs to be the column that contains your common denominator (country name “United States”). Also note the dollar signs in the array – this is simply to refer to a static table array, rather than shifting the table downwards as you drag your formula down.

Once it finds it (happens to be in J5), we have told the formula to retrieve the 4th column value (which, starting at column J, is returning column M “Population”). The last part of vlookup is rarely used and so I won’t bother to explain. Just put in the number zero. Now, in G3 you should hopefully return this value: 314,168,000. Drag it down and now try to apply the same formula to column H for GDP.

Here’s a snapshot of where we’re at now:

Hopefully you noticed some errors in columns G and H, in the form of “#N/A”. This is common in data analysis, and requires a little manual intervention. If you look at H4, you’ll see this error for the GDP value for China. This is explained by the fact that China is listed as ‘China, People’s Republic’ in the GDP list. Again, this is normal when doing analysis, and there’s no easy way around it for some situations. So what we need to do is filter columns G and H for the “#N/A” values and identify where the issue comes from. Most should be explanatory (another example is G5 and H5 for “Great Britain”, which is listed as “United Kingdom” the other lists).

To further our analysis, we want to compare our 3 lists to one another by using our extended list. We’ll create 4 additional columns.

  1. Gold Medals per Capita
  2. Total Medals per Capita
  3. Price per Gold Medal
  4. Price per Medal

The formulas for these columns will be simple: either Population divided by Gold/Total Medals, or GDP divided by Gold/Total Medals.

Your values should look like this:

If we highlight our country column, and highlight the Gold Medals per Capita column, we can create a helpful scatter plot to illustrate some data analysis. Once you’ve highlighted, go to Insert > Scatter. At this point you should have a graph like the one below, which shows how some countries (ie: Lithuania), although small, performed extremely well considering their actual population. You can create a similar graph for the other columns we created, but for purposes of this post I only explain this one.

To crunch a few numbers, Lithuania secured a gold medal for about every 2 million people who live there. If the USA had that type of efficiency, it’d take home over 150 golds as opposed to 46. China by comparison would take home 637 golds!

Please contact me for specific questions or comments about this.