In my last post on working with strings, I demonstrated some VBA with a call to the Regular Expression (RegExp) library to split alpha characters from numeric characters where no delimiter was present. Today I received a data set that contained 10’s of thousand of strings that contained some trailing stuff that I wanted to remove. My initial thought was to use the RegExp engine with the correct pattern, but I discovered a better way.
The Requirements
Upon review of the strings, the pattern I discovered:
- The string always begin with an alpha or numeric character with a mix of upper and lower case
- The string I need to preserve end with an alpha or numeric character with a mix of upper and lower case
- Everything from the beginning of the of the first alphanumeric to the last alphanumeric must be preserved as is, spaces, case, special characters, whatever
- Everything trailing to right of the last alphanumeric may safely be removed, special characters, non-printable characters, spaces, whatever
- Strings are of random lengths both input and output
So, I need to :
- Search from the end (right) of the string
- Find the first alphanumeric character irregardless of case
- Return the string beginning from the first character to the character position determined in the previous step
Quick Segue – The Functions
Before I get to the Sub Procedure, I would like to review all of the Functions I am using to make the process fairly dynamic :
GetSelectedSheet
Here I am using an InputBox to allow the user to select a worksheet at run-time. The InputBox Method has 1 required parameter and 7 optional parameters. If the optional parameters are not utilized, then the InputBox returns a text value. However, The optional Type parameter makes the InputBox more powerful. In the function, I am using Type:=8 to return a cell reference as a Range Object. You can read more about the InputBox Method here.
Public Function GetSelectedSheet() As String 'Declare variables Dim ws As Worksheet Dim rng As Range 'Users - select a cell on a worksheet Set rng = InputBox( _ Prompt:="Please select a cell on a worksheet", _ Title:="Select a worksheet", _ Default:=ActiveCell.Address, _ Type:=8) 'Range selection 'Get the parent worksheet of the selected cell Set ws = rng.Parent 'Pass the name of the worksheet to the function GetSelectedSheet = ws.Name 'Tidy up Set rng = Nothing Set ws = Nothing End Function
GetRows
Fairly straight forward, I pass the selected worksheet to the function and it returns the maximum rows of data based on Column 1. I could make this more dynamic by passing a column number to the function as well, but I generally always use Column 1.
Public Function GetRows(ws As Worksheet) As Long Dim r As Long With ws r = .Cells(Rows.Count, 1).End(xlUp).Row End With GetRows = r End Function
GetColumns
Straight forward, I pass the selected worksheet to the function and it returns the maximum columns of data based on Row 1. I could make this more dynamic by passing a row number to the function as well, but I generally always use Row 1.
Public Function GetColumns(ws As Worksheet) As Long 'Declare variables Dim c As Long 'Get column count, store it in a variable With ws c = .Cells(1, Columns.Count).End(xlToLeft).Column End With 'Pass the variable value to the function GetColumns = c End Function
GetUserInput
Again, fairly straight forward. Get a text value from the user to search for in the next function.
I call the function like this :
'User - What is search term? strSearchTerm = GetUserInput(strPrompt:="What is the search term?", _ strTitle:="Find Column Number")
Public Function GetUserInput(strPrompt As String, _ strTitle As String) As String 'Declare variables Dim strUserInput As String 'Call the InputBox Method, pass user input to a variable strUserInput = InputBox(Prompt:=strPrompt, _ Title:=strTitle) 'Pass the variable value to the function GetUserInput = strUserInput End Function
GetColumnNumber
The function has 2 arguments, a worksheet and a string value that I got from the user in the last function. The function will create a Range Object and search that Range for the term supplied by the user. Again, I am using Row 1 here, but I could make it more dynamic by passing a row number to the function as one of its arguments. Below, I am using the Named Argument, LookAt:= and passing the value xlPart instead of xlWhole. You may want to consider this in your VBA Projects as you program defensively around what the user may input. Since I am using this for myself, I am not too concerned for now,
Public Function GetColumnNumber(ws As Worksheet, _ strSearchTerm As String) As Long 'Declare variables Dim rng As Range Dim MaxColumns As Long Dim lngField As Long 'Initialize MaxColumns = GetColumns(ws:=ws) With ws Set rng = .Range(.Cells(1, 1), .Cells(1, MaxColumns)) End With 'Find columns number lngField = rng.Find(What:=strSearchTerm, _ LookIn:=xlValues, _ LookAt:=xlPart, _ MatchCase:=False).Column 'Pass the column number to the function GetColumnNumber = lngField 'Tidy up Set rng = Nothing End Function
GetCleanAlphaNumeric
This is the money! All other functions to this point were setup work to allow this function to do the heavy lifting. The function uses the LIKE operator to compare a character beginning at the right-most position of the string to the pattern, “[0-9A-Za-z]”. As soon as a match is found, the For..Next Loop is exited, thus saving time by not checking characters unnecessarily. I then use the MID() Function to get the string from the 1st character to the last alphanumeric character position determined in the For..Next Loop. More on the LIKE operator here.
Public Function GetCleanAlphaNumeric(strChar As String) As String 'Comments : Remove non-alpha numeric characters from end of string 'Declare variables Dim i As Long Dim lngLengthString As Long Dim blnTest As Boolean Dim posLastAlphaNumeric As Long Dim strClean As String 'Initialize blnTest = False 'Length of string to check lngLengthString = Len(CStr(strChar)) 'Compare each charcter to pattern 'Begin at end of string 'Stop as soon as find alphanumeric For posLastAlphaNumeric = lngLengthString To 1 Step -1 blnTest = Mid(CStr(strChar), posLastAlphaNumeric, 1) Like "[0-9A-Za-z]" If blnTest = True Then Exit For Next posLastAlphaNumeric 'posLastAlphaNumeric is the position of last AlphaNumeric character 'Use the position of the last alphanumeric to get the final length of the string 'Assign the value to the range strClean = CStr(Mid(strChar, 1, posLastAlphaNumeric)) 'Pass the clean string to the function GetCleanAlphaNumeric = strClean End Function
The Main Procedure
Here is the main procedure that calls all of the functions. Note: Screen updating must be on for the user to select a cell on a worksheet. Turn ScreenUpdating off after the user selects a cell on a worksheet.
Option Explicit Sub CleanStrings() 'Author: Winston Snyder 'Date: 3/28/14 'Purpose: Get string excluding non-alphanumeric trailing characters '--------------------------------------------------------------------------------------------------------------------------------------------- 'Declare variables Dim wb As Workbook Dim ws As Worksheet Dim rng As Range Dim C As Range Dim strSearchTerm As String Dim strStringToBeCleaned As String Dim lngColumnNumber As Long Dim MaxRows As Long 'Excel environment - speed things up With Application .DisplayAlerts = False .EnableEvents = False .Calculation = xlCalculationManual End With 'Initialize Set wb = ThisWorkbook Set ws = wb.Worksheets(GetSelectedSheet) Application.ScreenUpdating = False 'Get maximum number of rows on the worksheet MaxRows = GetRows(ws:=ws) 'User - What is search term? strSearchTerm = GetUserInput(strPrompt:="What is the search term?", _ strTitle:="Find Column Number") 'Get the column number based on the search term lngColumnNumber = GetColumnNumber(ws:=ws, _ strSearchTerm:=strSearchTerm) 'Define the range that contains strings to be cleaned With ws Set rng = .Range(.Cells(2, lngColumnNumber), .Cells(MaxRows, lngColumnNumber)) End With 'Clean each string in the range For Each C In rng strStringToBeCleaned = CStr(C.Value) C.Value = GetCleanAlphaNumeric(strChar:=strStringToBeCleaned) Next C 'Tidy up 'Destroy objects Set C = Nothing Set rng = Nothing Set ws = Nothing Set wb = Nothing 'Restore Excel environment With Application .ScreenUpdating = True .DisplayAlerts = True .EnableEvents = True .Calculation = xlCalculationAutomatic End With End Sub
Tidy up
Final Thoughts
That’s it today. I like the LIKE operator. This process is fast, reviewed 25K strings, and updated them when needed in no time. Awesome!