How many times have you received a dataset that seems organized until you start looking at the details? Specifically, you need a sum aggregation for certain item. The problem is that the item in question was not consistently entered. Inspecting the description field, you realize that the item in question is sometime mentioned at the beginning, middle or end of the entry; sometime even misspelled. In any case, going through thousands of rows using sorting, filtering and color coding seems like a herculean task. So what do you do?
When working with a spreadsheet, there are several options to get the result you seek. Some will take 10-15 minutes, while others will take several hours. The trick is to be creative in applying spreadsheet techniques relating to tagging. Through the use of a few string functions and/or pivot table, you can find a quick and effective approach.
Let's take the spreadsheet shown below. The goal is to sum the total purchases for video game XYZ. The problem is that video game XYZ is listed in the dataset in different ways. No problem through. In this simple situation you have two options:
1. Use a pivot table to aggregate the data by description entry and quickly add the results manually or using the magic of Ctrl/Command select; not a very fun approach
2. Use a few string functions to stag the data
A |
B |
|
1 |
Description |
Purchases |
2 |
XYZ Video Game |
12 |
3 |
Video Game XYZ |
6 |
4 |
Game XYZ Version 1 |
7 |
5 |
Game ABC |
15 |
6 |
Game DEF |
9 |
7 |
Video Game XYZ |
10 |
8 |
Game DEF Verion 1 |
11 |
9 |
XYZ Video Game |
12 |
Option 1 - Pivot Table (boring)
Selecting the dataset above, you can easily create a pivot table and get the result below. At this point it is a matter of adding the results (47).
A |
SUM of B |
||
Description |
0 |
||
Game ABC |
15 |
||
Game DEF |
9 |
||
Game DEF Verion 1 |
11 |
||
Game XYZ Version 1 |
7 |
||
Video Game XYZ |
16 |
||
XYZ Video Game |
24 |
sum |
47 |
Grand Total |
82 |
Option 2 - String Functions to Create a New Description Column
Below is the result of creating a new description column by using the left, right, and mid string functions. One caveat with this example is that you know about the different iterations of the entry for video game XYZ. However, the more important point of using this approach is getting comfortable with these functions for tagging a dataset, and perhaps applying a scalable solution to a new or growing dataset. This end goal results in a transformed dataset. Additionally, using these string functions provides a bridge to working with databases in the future. Eventually large datasets grow beyond the limits of a spreadsheet.
A |
B |
C |
D |
E |
F |
|
1 |
Description |
Purchases |
D1 |
D2 |
D3 |
New Descr |
2 |
XYZ Video Game |
12 |
XYZ->if(left(B2,3)="XYZ","XYZ","") |
if(right(B2,3)="XYZ","XYZ","") |
if(mid(B2,6,3)="XYZ","XYZ","") |
XYZ->D2&E2&F2 |
3 |
Video Game XYZ |
6 |
XYZ |
XYZ |
||
4 |
Game XYZ Version 1 |
7 |
XYZ |
XYZ |
||
5 |
Game ABC |
15 |
||||
6 |
Game DEF |
9 |
||||
7 |
Video Game XYZ |
10 |
XYZ |
XYZ |
||
8 |
Game DEF Verion 1 |
11 |
||||
9 |
XYZ Video Game |
12 |
XYZ |
XYZ |
When it comes to spreadsheet work, there are various options when it comes to sorting and aggregating information. The fact of the matter is that spreadsheets require a lot of ad hoc work. Scalable solutions for working with data ultimately require a database. However, by learning and applying different techniques for tagging datasets, you will be well on your way to taking your spreadsheet work to the next level.