Some time back, Shane Devenshire and I added our two cents to a question a user asked on LinkedIn here. So when I decided to write a post regarding UnPivot using nested For..Next Loops, it only made sense to ask Shane if he would like to write a guest-post and tutorial on his process and Normalization Utility.
So, without further ado, take it away Shane!
Un-pivoting Excel Spreadsheet data – Creating a Normalized Table
Introduction:
Recently I was working with Tableau and discovered that the company had an Excel add-in for taking data laid out like a pivot table or standard spreadsheet layout:
And converting it to data laid out like a table (normalized):
When I tested the add-in on a large data it took over an hour. I knew I could do this faster, even manually, but I decided to automate the process, which uses a number of Excel’s built-in features. The attached is the initial result of this endeavor.
When one wants to convert spreadsheet data, laid out as in a pivot table, into what is called a normalized database or table one can use a VBA approach which loops through cells rearranging the data and copying repeated data as many times as necessary. You could do this manually but it’s rather labor intensive. However, even the looping approach is not very efficient, which becomes apparent when the data sets are large. An alternate approach using some of Excel’s less familiar features can speed up the process dramatically.
I created the three step wizard shown below to provide some enhancements:
In the first step, see Figure 1, the user indicates the top left corner of the values area, the sample show what we are looking for. This allows the VBA routine to decide where the column titles are and which columns will be retained in their default structure and which column will be “transposed” (placed in a singe “values” column.
Figure 1 – First step of the Wizard:
If the user clicks Finish, the code accepts all the defaults and created the normalized table. If the user clicks Next the second step of the Wizard is displayed, see Figure 2.
Figure 2 – This is the second step of the Wizard:
Here the Wizard displays the first ten fields to the left of the values area allowing the user to choose which ones to include. If there are dates at the top of the values area a default name for the new field is Date which is automatically entered in the Field Name box which the user can change or leave blank. The Delimiter box allows the user to indicate what delimiter to use when concatenating the fields together (discussed later). The delimiter should not be a character found in the chosen fields on the left in the dialog box.
Clicking Next brings you to the third step of the wizard, shown in Figure 3.
Figure 3 – This is the third step of the Wizard:
The first two options on this screen allow the user to retain or remove rows of data in the output table which have either a 0 (zero) or are blank in the values column. This reduces the file size when appropriate. Because the Excel feature used to normalize the data generates an Excel “table”, the last three options allow the user to turn on or off various features of tables or convert the table to a range.
Overview:
You can create a pivot table in Excel with a feature called Multiple Consolidation Ranges. You can then use the Show Details command on the Grand/Grand Total to produce a normalized table. Both of these commands are very fast! One limitation of this command is that is was designed to be used with data which has only one label field to the left of the values area. To bypass this limitation one can combine all the label fields by using a concatenation formula, but to later break it apart a delimiter is need. Then the multiple consolidation command is run against this one field and all of the value columns. The concatenated field of the output is then parsed using Excel’s Text to Columns command with the delimiter specified by the user. These steps are very fast. As noted in the discussion of step 3 of the Wizard, I have also incorporated a number of additional options.
I create two modules and a user form to control this process. All of the code and form can be found in the attached file. I have tried not to obfuscate the code. There are four main components to consider
Option Explicit Dim k As Integer number of fields to be output Sub NormalizeData() Dim sTempSheet As String Name of the sheet where the pivot table is placed Dim j As Integer loop counter Dim i As Integer loop counter Dim sMyFormula As String the concatenated formula Dim sMySheet As String Name of the sheet where initial data is Dim lMyColumn As Long column number of column used for concatenated formula Dim sDataSheet As String Name of the sheet where the final output is placed Dim sSource As String Source range for the pivot table Dim sMyTitles As String The concatenated titles 'Speeds things up With Application .ScreenUpdating = False .Calculation = xlCalculationManual End With sMySheet = ActiveSheet.Name 'uses the ref edit address to place the cursor in the top left corner of the values to be transposed Range(frmNormalize.refFirstData).Activate 'Inserts new blank column for concatenation formula ActiveCell.EntireColumn.Insert lMyColumn = ActiveCell.Column 'initializes loop counters j = 1 k = 0 'loops through each column and concatenates it if that column was choosen 'this loop runs a max of ten times For i = iTotCols To 1 Step -1 If frmNormalize.Controls("Checkbox" & j) = True Then If sMyFormula = "" Then sMyFormula = "=RC[" & -i & "]" Else sMyFormula = sMyFormula & "&" & """" & sDelimiter & """" & "&RC[" & -i & "]" End If k = k + 1 End If j = j + 1 Next i
The above code creates a concatenation formula of the forma =A5&”*”&B5&”*”&C5, which is then converted to values like 2008*Coffee*Seattle. The above code loops once to build the concatenated formula for a single row, and then fills it down. All very fast. The delimiter must be unique because it will be used by the Text to Columns command to parse the data.
'Selects the title row and inserts the formula there. 'Here it will be the concatenated titles ActiveCell.Offset(-1, 0).Select ActiveCell = sMyFormula Range(ActiveCell, Cells(lLastRow, ActiveCell.Column)).FillDown ActiveSheet.Calculate 'converts the range to values Selection = Selection.Value 'the concatenated titles is saved for later use since it would be lost during the next commands sMyTitles = ActiveCell 'Creates "consolidated ranges" pivot table, which will later be removed sSource = Range(Cells(lFirstDataRow - 1, lFirstDataColumn), Cells(lLastRow, lLastColumn)).Address(, , xlR1C1) ActiveWorkbook.PivotCaches.Create(SourceType:=xlConsolidation, _ SourceData:=sSource).CreatePivotTable _ TableDestination:="", _ TableName:="NPT" On Error Resume Next ActiveSheet.PivotTables("NPT").PivotFields("Count of Value").Function = xlSum
The above code is one of the steps that make this process so fast. Excel builds a “consolidated range” pivot table. If some of the data is non-numeric the data field would use “Count” which we want to convert to “Sum”, this is the reason for lines 67 and 68. Manually you can reach this command by pressing Alt+D+P to expose the PivotTable and PivotChart Wizard from Excel 2003, shown in Figure 4, below:
Figure 4 – This is the first step of Microsoft’s Wizard, which the code is emulating:
The command we are using is Multiple consolidation ranges. Excel gives us a pivot table which Is, in and of itself, not very useful. But in the following code, the ShowDetail command used on the Grand/Grand Total creates the desired output. And it’s very fast!
'creates database detail from the above pivot table, which is pretty much what we want sTempSheet = ActiveSheet.Name ActiveCell.SpecialCells(xlLastCell).ShowDetail = True 'Generates normalized database 'you can turn off these alerts earlier in the macro but its often safer to turn them of and on only where needed ‘The pivot table sheet is no longer need and can be deleted With Application .DisplayAlerts = False Sheets(sTempSheet).Delete .DisplayAlerts = True
The following code first inserts some blank columns and then uses Excel’s Text to Columns command to convert the concatenated field back to its original individual fields. Yet another very fast command!
'inserts as many columns as need to display all the concatenated fields Range("B1:" & Cells(1, k).Address).EntireColumn.Insert 'puts the concatenated titles in cell A1 Range("A1") = sMyTitles Range("A1", [A1].End(xlDown)).Select 'the text to columns command overwrites the table's default titles .DisplayAlerts = False 'converts concatenated field to a set of columns Selection.TextToColumns _ Destination:=Range("A1"), _ DataType:=xlDelimited, _ ConsecutiveDelimiter:=False, _ Other:=True, _ OtherChar:=sDelimiter .DisplayAlerts = True End With 'best fit the data Range("A1:" & Cells(1, iTotCols).Address).EntireColumn.AutoFit 'enters the field name for the transposed row Range("A1").Offset(0, k) = frmNormalize.txtNewFieldName 'Options - removes rows With blanks or zeros in the value field If frmNormalize.cbBlanks = True Then RemoveBlanks End If If frmNormalize.cbZeros = True Then RemoveZeros End If ConvertToRange 'Cleanup sDataSheet = ActiveSheet.Name Sheets(sMySheet).Activate ‘removes the concatenated formula column from the original data. Cells(1, lMyColumn).EntireColumn.Delete Sheets(sDataSheet).Activate Application.Calculation = xlCalculationAutomatic End Sub
Deleting rows can be done by looping from the bottom and deleting each row that meets a given condition. This is very slow! If you can select all the rows which meet some condition you can then delete them with one command. This is faster. However, if Excel needs to adjust cell references based on the deleted rows this can also slow things down. One solution is to move all the rows you want to delete to the bottom of the dataset and then delete them with a single command. In the following two modules something like that is done. For removing rows with blanks in the values column you simple need to sort by the values and the use the Go to Special, Blanks command to select all the rows.
RemoveBlanks Subroutine
Removes rows with blanks in value column Private Sub RemoveBlanks() 'puts all blank cells at the bottom of the table making deleting rows faster With ActiveSheet.Sort .SortFields.Clear .SortFields.Add _ Key:=Cells(1, k + 2), _ SortOn:=xlSortOnValues, _ Order:=xlAscending .SetRange ActiveCell.CurrentRegion .Header = xlYes .Orientation = xlTopToBottom .Apply End With 'this command deletes all rows with blanks in the values column Selection.Offset(0, k + 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete End Sub
Removing zeros is a little trickier, because, if the user has chosen not to remove blanks you can’t clear all cells with 0’s and then run the above command, extra row may be deleted. In this case you can replace the 0’s with text, then clear the cells with zeros and then use line 18 above to remove the rows. Finally you can clear the cells containing text. I chose not to do that; instead I entered the formula =1/0 in all the cells with 0 and then sorted putting this group near the bottom. Then I selected an deleted all the cells with error formulas with one step, line 25.
RemoveZeros Subroutine
This code removes all rows with zeros in the values field Private Sub RemoveZeros() 'replaces all cells with 0 with a formula generating an DIV/0! error Range("A1").CurrentRegion.Resize(, 1).Offset(0, k + 1).Select Selection.Replace _ What:="0", _ Replacement:="=1/0", _ LookAt:=xlWhole 'this sorts all cells with errors to the bottom making deleting rows faster With ActiveSheet.Sort .SortFields.Clear .SortFields.Add _ Key:=Cells(1, k + 2), _ SortOn:=xlSortOnValues, _ Order:=xlAscending .SetRange ActiveCell.CurrentRegion .Header = xlYes .Orientation = xlTopToBottom .Apply End With 'this command deletes all rows with errors Selection.SpecialCells(xlCellTypeFormulas, 16).EntireRow.Delete End Sub
ConvertToRange Subroutine
'The user can keep the table or convert it to a regular spreadsheet range 'The user can keep or remove the autofilters in either case 'The user can keep or remove the formatting in either case Private Sub ConvertToRange() Dim rList As Range With ActiveSheet.ListObjects(1) Set rList = .Range If frmNormalize.cbConvertToRange = True Then .Unlist 'converts a table to a range If frmNormalize.cbFilter = False Then Range("A1").CurrentRegion.AutoFilter 'turns on autofilter If frmNormalize.cbFormatting = True Then 'removes table formatting With rList .Interior.ColorIndex = xlColorIndexNone .Font.ColorIndex = xlColorIndexAutomatic .Borders.LineStyle = xlLineStyleNone End With End If Else If frmNormalize.cbFilter = True Then .Range.AutoFilter 'Turns off autofilter If frmNormalize.cbFormatting = True Then 'Turns off table formatting .TableStyle = "" End If End If End With End Sub
Tidy Up
That’s it for today. HUGE thanks to Shane Devenshire for sharing his Data Normalization Utility with us and allowing me to post it on my blog. I hope you enjoy it and find it useful in your daily work.
Downloads
Download the file from OneDrive. The zip archive contains the Data Normalizer in .xlsm format as well as a Microsoft Word version of the tutorial.