String Calculations in Tableau

The King James Bible goes Tableau – A (simple) Use Case

Before I am coming to the very bone-dry and lengthy explanations of string calculations in Tableau, I felt I should do something motivating to get started: a small Tableau visualization putting at least one or two string formulas to practice.

Here it is.

The Good Book goes Tableau. A Tableau dashboard to explore the King James Bible:

What can you do with this visualization?

You can select a book at the top and read all verses of this book, if you want to. You can also filter this view by searching for a word or phrase. The bar chart below the text table shows statistics of all books (respectively of all books containing the search string): count of the verses, count of words, average words per verse and (if specified) occurrences of the search string. The selected book is highlighted in red.

Play around and explore the King James Bible using Tableau. For instance, search for names like “Moses” or “Job” or other words like “good” and “evil” or whatever you like.

Nothing spectacular, I know. Just a little example to demonstrate how string calculations could be used in Tableau.

Acknowledgement

Believe it or not, I didn’t compile the data for the King James Bible visualization on my own. I simply took it from John Walkenbach’s The Spreadsheet Page. It’s funny, because John also wrote all the Excel Bible books and then he published the real Bible in an Excel spreadsheet. Arizona-type of humor, I guess.

I really doubt John will ever read this, but anyway: many thanks John, for everything you did for the Excel community in the last 20+ years. I learned a lot from you. Much appreciated.

Now on to the academic part for the Tableau developers among you: 44 string calculation examples in Tableau Software.

The Database

For the development and test of this little library of string calculations in Tableau, I created a dummy data set of 200 fake records using the Fake Name Generator. The data is made up and totally pointless, but to understand the field names used in the formulas below, you should have a quick look on what is in the data source:

Faked example data - click to enlarge

If you are interested in downloading the Excel workbook with this data, see the section Download Links at the end of the post.

Concatenation and Insertion

01 – Concatenate strings

Description: Concatenate two or more strings (dimensions and hardcoded)

Example: Concatenate [GivenName], [MiddleInitial] (if applicable) and [Surname] to one string

[GivenName] + ” ” +IfNULL([MiddleInitial] + “. “,””) + [Surname]

How does it work?

The existing dimensions are concatenated using the plus operator (+). The IFNULL function checks, if a middle initial exists. If so, the spaces and the field [MiddleIntital] with a period are inserted between [GivenName] and [Surname]. If not, the space is inserted between [GivenName] and [Surname].

02 – Concatenate strings and date

Description: Concatenate two or more strings and a date or a part of a date

Example: Concatenate [GivenName], [MiddleInitial] (if applicable), [Surname] and the year of the birth to one string

[GivenName] + ” ” +IFNULL([MiddleInitial] + “. “,””) + [Surname] +
” born in ” + STR(YEAR([Birthday]))

How does it work?

For the first part of the formula, see Calculated Field “01–Concatenate strings”.
At the end a hard coded text (” born in “) is added, followed by the year of birth, converted from the dimension [Birthday] using the functions STR(expression) and YEAR(date).

03 – Insert line feed

Description: Concatenate two or more strings and insert a line feed between them

Example: Concatenate [GivenName] and [Surname] to one string and insert a line feed in between

[GivenName] + CHAR(10) + [Surname]

How does it work?

CHAR(10) inserts a line feed. 10 is the ASCII code number of the line feed character.

Instead of using CHAR(10) you can also insert a line feed directly in the Calculated Field Editor like this:

[GivenName] + ”
” + [Surname]

04 – Insert quotation marks inside of strings

Description: Concatenate strings and insert quotation marks inside the string

Example: Concatenate [GivenName] and [Surname] and add the initials in brackets and quotation marks at the end

[GivenName] + ” ” + [Surname] +
” (“”” +
LEFT([GivenName],1) +
LEFT([Surname],1) +
“””)”

How does it work?

Use double quotation marks to create quotation marks inside a hardcoded string.

05 – Fill string to right

Description: Fill a string after the last character to a specified total length with a specified character

Example: Fill the field [Surname] with e.g. hyphens or pipes (the value of [STR Parameter 1]) to a maximum length of e.g. 20 or 30 (value of [INT Parameter 1])

IF LEN([Surname]) [Surname]+
REPLACE(SPACE([INT Parameter 1]-LEN([Surname])),” “,LEFT([STR Parameter 1],1))
ELSE
[Surname]
END

How does it work?

SPACE(number) creates a string with the specified number of blanks. All blanks in this string are then replaced by the specified character and the string is added to the end of [Surname].

LEFT([STR Parameter 1],1) ensures the fill text has only one character. The IF clause makes sure that fill characters are only added if the length of [Surname] is smaller than the specified maximum number ([INT Parameter 1]).

In the view (the table) this only looks good, if you are using a mono spaced font like Courier. This example may not have many practical uses cases, but the technique can be helpful as it mimics Excel’s REPT function which isn’t available in Tableau. You can – for instance – use this to fake a horizontal bar chart within a text, similar to what Andy Cotgreave did here: Charts inside a tooltip.

06 – Fill string from left

Description: Fill a string left to the first character to a specified total length with a specified character

Example: Fill the field [Surname] with e.g. hyphens or pipes (value of [STR Parameter 1]) from left to a maximum length of e.g. 20 or 30 (value of [INT Parameter 1])

IF LEN([Surname])REPLACE(SPACE([INT Parameter 1]-LEN([Surname])),” “,LEFT([STR Parameter 1],1))+
[Surname]
ELSE
[Surname]
END

How does it work?

See Calculated Field [05 – Fill string to right]. The only difference is adding the field [Surname] to the created string instead of adding the created string to [Surname].

As mentioned above, this only works with mono spaced font types. A more practical application for this technique would be to e.g. left pad ZIP codes with zeros to a defined total number of digits, if leading zeros were lost during the ETL process.

07 – Ordinal numbers

Description: Convert a number into a string in English ordinal numbering format

Example: Ordinal numbers of field [Number], i.e. 1 = 1st, 2 = 2nd, 3 = 3rd, etc.

STR([Number]) +
IF RIGHT(STR([Number]),2) = “11” OR
RIGHT(STR([Number]),2) = “12” OR
RIGHT(STR([Number]),2) = “13” THEN
“th”
ELSE
CASE RIGHT(STR([Number]),1)
WHEN “1” THEN “st”
WHEN “2” THEN “nd”
WHEN “3” THEN “rd”
ELSE “th”
END
END

How does it work?

STR(expression) converts a number into a string and RIGHT(string, n) returns the last n characters from a string. Depending on the last 2 respectively the last character of the string, an IF clause and a CASE statement assign the correct extension to the number:

If the last 2 characters are 11, 12 or 13, the correct extension is “th”. In all other cases, it depends on the last character: 1 = “st”, 2 = “nd”, 3 = “rd”,  all others are “th”.

Conversion and Manipulation

08 – One letter upper case

Description: Extract the first character of a string and make it upper case
Example: Convert [Gender] from “male” to “M” and from “female” to “F”

UPPER(LEFT([Gender],1))

How does it work?

LEFT(string, 1) returns the first character, UPPER(string) converts it to upper case.

09 – Number to upper case character

Description: Convert a number to the corresponding upper case character in the alphabet (1=A, 2=B, …, 26=Z)

Example: Convert the field [Number] to the corresponding character in the alphabet

CHAR(65+([Number]-1)%26)

How does it work?

Upper case letters in the ASCII table start at 65, i.e. add the calculated number minus 1 to this starting number.

CHAR(number) returns the character of the ASCII code. The modulo operator (%) transfers any positive integer number into the range from 1 to 26:
1=1,…, 26=26, 27=1, 28=2,…,52=26, 53=1,54=2 etc.

Example calculation: [Number] = 49
65 + (49-1) % 26 =
65 + 48 % 26 =         // modulo operation: 48 divided by 26 leaves a remainder of 22
65 + 22 = 87
CHAR (87) = “W”

10 – Number to lower case character

Description: Convert a number to the corresponding lower case character in the alphabet (1=a, 2=b, …, 26=z)

Example: Convert the field [Number] to the corresponding character in the alphabet

CHAR(97+([Number]-1)%26)

How does it work?

See explanation of Calculated Field [09 – Number to upper case character]. It’s the same formula and logic except for starting at ASCII code 97 instead of 65.

11 – String to Date

Description: Convert a string to a date

Example: Convert field [String Date] in format “YYYYMMDD” (e.g. 20140131) into a date (e.g. 01/31/2014)

DATEPARSE(“yyyyMMdd”,[String Date])

How does it work?

DATEPARSE (format, string) converts a string to a date in the specified format.

12 – Date string

Description: Convert a date into a string in a defined format (other than the date field’s default format)

Example: Convert the date [Birthday] into a string in format “Month Day, Year”, e.g. “February 12, 2014”

DATENAME(‘month’,[Birthday])
+ ” ” +
STR(DAY([Birthday]))+
“, ” +
STR(YEAR([Birthday]))

How does it work?

DATENAME returns the month of [Birthday] as a string. Finally add the day and year converted to strings at the end. STR(expression), converts an expression to a string. DAY (date), MONTH (date) and YEAR(date) return the date parts.

13 – Text bins

Description: Assign strings to bins according to their first character

Example: Assign the field [Surname] according to the first character to bins of “A-K”, “L-Q” and “R-Z”

IF UPPER(LEFT([Surname],1))
“A-K”
ELSEIF UPPER(LEFT([Surname],1))
“L-Q”
ELSE
“R-Z”
END

How does it work?

UPPER(LEFT([Surname],1)) returns the first character of the [Surname] converted to upper case. “K” and “Q” in the IF ELSE clause define the bins A-K, L-Q and R-Z. Adjust the characters and the texts as wanted.

14 – Reverse 2 words

Description: Reverse the order of 2 words in a string

Example: Reverse the first name and surname in the field [Name, Given Name] (separated by comma): “Doe, John” is turned into “John Doe”

MID([Name, Given Name], FIND([Name, Given Name],”,”)+2) + ” ”
+ LEFT([Name, Given Name],FIND([Name, Given Name],”,”)-1)

How does it work?

Get the given name using the function MID. Get the substring from 2 characters after the position of the comma (detected by FIND) and return all remaining characters (optional parameter [length] of MID is omitted). Add the all characters from the beginning of the string (LEFT) to the position of the comma (FIND) minus 1 (i.e. don’t return the comma).

15 – Reverse 3 words

Description: Reverse the order of the words in a string with exactly 3 words

Example: Reverse the first name. middle initial and surname in the field [Full Name] to surname, given name, middle initial: “John F. Doe” is turned into “Doe, John F.”

01: RIGHT(
02: RIGHT([Full Name],LEN([Full Name])-FIND([Full Name],” “)),
03: LEN(RIGHT([Full Name],LEN([Full Name])-FIND([Full Name],” “)))-        
04: FIND(RIGHT([Full Name],LEN([Full Name])-FIND([Full Name],” “)),” “))   
05: +”, “+
06: LEFT([Full Name],FIND([Full Name],” “)-1)+
07: ” “+
08: IF CONTAINS ([Full Name],”.”) THEN
09: MID([Full Name],FIND([Full Name],” “)+1,2)
10: ELSE
11: “”
12: END

Note: the red line numbers in front of each line are only for the explanation and do not belong to the formula of this Calculated Field.

How does it work?

It is hard to explain this formula in plain English, so I will try to dissect the formula line by line:

01: get the last part of the string
02: get the rest of the string after the first blank
03: detect the length of the substring from the first blank to the end
04: minus the position of the second blank
05: add a comma and a blank
06: extract and add the first name
07: add a blank
08: is there a middle name (a period)?
09: if yes, extract and add the middle initial
10: if no
11: add an empty string
12: end if

16 – Imperial to metric

Description: Convert a string with a size displayed in imperial notation into a number in metric system format, e.g. 6’2’’ is converted to 187.96 cm

Example: Convert [FeetInches] (the body height stored as a string in feet and inches) into cm

INT(
LEFT(REPLACE([FeetInches],” “,””),
FIND(REPLACE([FeetInches],” “,””),”‘”)-1))*
30.48+
INT(MID(REPLACE([FeetInches],” “,””),
FIND(REPLACE([FeetInches],” “,””),”‘”)+1,
FIND(REPLACE([FeetInches],” “,””),CHAR(34))-
1-
FIND(REPLACE([FeetInches],” “,””),”‘”)))
*2.54

How does it work?

The LEFT statement extracts the characters left to the first apostrophe, i.e. the foot. The REPLACE statements inside remove all blanks first. INT converts the string to a number and the multiplication by 30.48 converts foot to centimeter.

The next part of the formula uses MID to extract everything right to the first and left to the quotation mark (CHAR(34)), i.e. the inches. Again INT converts the string to a number and multiplying by 2.54 returns the inches in centimeter.

The sum of the results of both formula parts is the size in centimeter.

17 – Proper case first 3

Description: Convert the first 3 words of a string to proper case, i.e. first character is upper case, all other characters are lower case, e.g. “this is text” is converted to “This Is Text”

Example: Convert the first 3 words of the dimension [Text lower case] to proper case

IF FIND(LTRIM([Text lower case]),” “,FIND(LTRIM([Text lower case]),” “)+1)>0 THEN

UPPER(LEFT(LTRIM([Text lower case]),1))+
MID(LOWER(LTRIM([Text lower case])),2,FIND(LTRIM([Text lower case]),” “)-2)+
UPPER(MID(LTRIM([Text lower case]),FIND(LTRIM([Text lower case]),” “),2))+
MID(LOWER(LTRIM([Text lower case])),FIND(LTRIM([Text lower case]),” “)+2,FIND(LTRIM([Text lower case]),” “,FIND(LTRIM([Text lower case]),” “)+1)-FIND(LTRIM([Text lower case]),” “)-2)+
UPPER(MID(LTRIM([Text lower case]),FIND(LTRIM([Text lower case]),” “,FIND(LTRIM([Text lower case]),” “)+1),2))+
MID(LOWER(LTRIM([Text lower case])),FIND(LTRIM([Text lower case]),” “,FIND(LTRIM([Text lower case]),” “)+1)+2)

ELSEIF FIND(LTRIM([Text lower case]),” “)>0 THEN

UPPER(LEFT(LTRIM([Text lower case]),1))+
MID(LOWER(LTRIM([Text lower case])),2,FIND(LTRIM([Text lower case]),” “)-2)+
UPPER(MID(LTRIM([Text lower case]),FIND(LTRIM([Text lower case]),” “),2))+
MID(LOWER(LTRIM([Text lower case])),FIND(LTRIM([Text lower case]),” “)+2,999)

ELSE

UPPER(LEFT(LTRIM([Text lower case]),1))+MID(LOWER(LTRIM([Text lower case])),2)

END

How does it work?

Well, look at this formula. I am sorry, but I don’t think I have a chance to explain this in all details. Thus, only a few remarks to point you into the right direction:

The IF THEN ELSEIF ELSE clause checks how many words the string contains: IF = 3, ELSEIF = 2, ELSE = 1. The formulas inside the branches of the IF statement use UPPER and LOWER to change the case of the characters, LTRIM to delete leading blanks and MID and FIND to return the parts of the first, second and third word / character to be converted to either upper or lower case.

Analysis

18 – Count blanks

Description: Count the number of blanks in a string

Example: Count the number of blanks in the dimension [Text]

LEN([Text])-LEN(REPLACE([Text],” “,””))

How does it work?

REPLACE ([Text],” “,””) creates a temporary string without blanks. The count of blanks in a string is the length of the original string minus the length of this temporary string with no blanks.

19 – Count non blanks

Description: Count the number of characters in a string which are no blanks

Example: Count the number of non blank characters in the dimension [Text]

LEN(REPLACE([Text],” “,””))

How does it work?

REPLACE ([Text],” “,””) creates a temporary string without blanks. The length of this string is the number of characters which are no blanks.

20 – Count words

Description: Count the number of words in a string

Example: Count the number of words in dimension [Text]

Condition: Words are separated by exactly one blank

LEN(TRIM([Text]))-LEN(REPLACE(TRIM([Text]),” “,””))+1

How does it work?

REPLACE ([Text],” “,””) creates a temporary string without blanks. TRIM removes all leading or trailing blanks. On condition that the words are separated by exactly one blank, the number of words is the length of the original string minus the length of temporary string without blanks plus 1 (no blank after last word).

21 – Contains number

Description: Returns TRUE if a string contains a number

Example: Check if the field [Text w/ and w/o Numbers] contains a number

CONTAINS([Text w/ and w/o Numbers],”0″) OR
CONTAINS([Text w/ and w/o Numbers],”1″) OR
CONTAINS([Text w/ and w/o Numbers],”2″) OR
CONTAINS([Text w/ and w/o Numbers],”3″) OR
CONTAINS([Text w/ and w/o Numbers],”4″) OR
CONTAINS([Text w/ and w/o Numbers],”5″) OR
CONTAINS([Text w/ and w/o Numbers],”6″) OR
CONTAINS([Text w/ and w/o Numbers],”7″) OR
CONTAINS([Text w/ and w/o Numbers],”8″) OR
CONTAINS([Text w/ and w/o Numbers],”9″)

How does it work?

CONTAINS (string, substring) is called for every possible digit (“0” to “9”).
IF CONTAINS returns TRUE for at least one digit, the Calculated Field returns TRUE, otherwise FALSE.

22 – First char is alphabetic

Description: Returns TRUE, if the first character of a string is alphabetic (“a” to “z” or “A” to “Z”)

Example: Check if the first character of [Text w/ and w/o Numbers] is alphabetic

(LTRIM([Text w/ and w/o Numbers]) >= “A” AND
LTRIM([Text w/ and w/o Numbers])
OR
(LTRIM([Text w/ and w/o Numbers]) >= “a” AND
LTRIM([Text w/ and w/o Numbers])

How does it work?

LTRIM(string) removes leading blanks and returns the first character in the string.
Boolean expressions (greater than and less than) combined by AND and OR operators check if the first character is inside the character ranges (upper and lower cases) and return TRUE if this is the case and otherwise FALSE.

23 – First char is numeric

Description: Returns TRUE, if the first character of a string is numeric (0 to 9)

Example: Check if the first character of [Text w/ and w/o Numbers] is numeric

ASCII(LTRIM([Text w/ and w/o Numbers])) >= 48 AND
ASCII(LTRIM([Text w/ and w/o Numbers]))

How does it work?

Check if the first character is inside the range of digits “0” to “9” in the ASCI code (48 to 57).

24 – Count substrings

Description: Count occurrences of a specified substring in a string

Example: Count occurrences of parameter [STR Parameter 2] in the dimension [Text]

(LEN([Text])-LEN(REPLACE([Text],[STR Parameter 2],””))) /
LEN([STR Parameter 2])

How does it work?

REPLACE ([Text],[STR Parameter 2],””) creates a temporary copy of the string and deletes all occurrences of [STR Parameter 2] in this copy. The length of the string minus the length of the temporary copy without the specified substring returns the number of characters of all substrings ([STR Parameter 2] in the string. To get to the occurrences of the substring in the string, you have to divide this by the length of the substring.

Example: let’s say our search string is “abc” and it occurs four times in the string. The REPLACE statement deletes “abc” from the string four times, i.e. it deletes 12 characters in total and the numerator of the division (first line) returns 12. If we divide this by the length of the search string (3), we get the correct count of occurrences of “abc” in the string: 4.

25 – Count comma separated items

Description: Count the number of substrings in a string separated by a comma

Example: Count the number of substrings in the dimension [Text] which are separated by a comma

LEN([Text])-LEN(REPLACE([Text],”,”,””))+1

How does it work?

See explanation of Calculated Field [20 – Count words]. It is the same formula, just replacing the commas instead of the blanks.

26 – Find first number

Description: Find the position of the first number in a string. Return 0, if the string does not contain a number

Example: Find the position of the first number in [Text w/ and w/o Numbers]

IF [21-Contains number] THEN
MIN(MIN(MIN(MIN(MIN(MIN(MIN(MIN(MIN(
FIND([Text w/ and w/o Numbers]+”0123456789″,”0″),
FIND([Text w/ and w/o Numbers]+”0123456789″,”1″)),
FIND([Text w/ and w/o Numbers]+”0123456789″,”2″)),
FIND([Text w/ and w/o Numbers]+”0123456789″,”3″)),
FIND([Text w/ and w/o Numbers]+”0123456789″,”4″)),
FIND([Text w/ and w/o Numbers]+”0123456789″,”5″)),
FIND([Text w/ and w/o Numbers]+”0123456789″,”6″)),
FIND([Text w/ and w/o Numbers]+”0123456789″,”7″)),
FIND([Text w/ and w/o Numbers]+”0123456789″,”8″)),
FIND([Text w/ and w/o Numbers]+”0123456789″,”9″))
ELSE
0
END

How does it work?

Check first if the string contains a number in the IF clause, using the Calculated Field [21-Contains Number]. If the string does contain a number, create a temporary string by adding a hardcoded substring with all digits (“0123456789”) at the end of the string. Search for the position of each digit (“0”, “1”, etc.) in this temporary string. The minimum of all positions in the extended string is the position of the first digit in the original string.

Caution: this is a very complex calculation and may seriously hit the performance of your Tableau workbook. Use this with caution, i.e. only if you really need it.

27 – Nth occurrence

Description: Find the position of the nth occurrence of a substring within a string.

E.g. the second occurrence of “one” in “one formula is one formula is one formula” is 16, the third occurrence of “one” is 31.

Example: Find the [INT Parameter 2]th occurrence of substring [STR Parameter 2] in the dimension [Text].

Limitation: The formula works only up to the fifth occurrence.

IF [INT Parameter 2] > 5 THEN
9999
ELSEIF (LEN([Text])-LEN(REPLACE([Text],[STR Parameter 2],””)))/LEN([STR Parameter 2])
0
ELSE
CASE [INT Parameter 2]
WHEN 1 THEN FIND([Text], [STR Parameter 2])
WHEN 2 THEN FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2]) + 1)
WHEN 3 THEN FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2]) +1) +1)
WHEN 4 THEN FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2]) +1) +1) +1)
WHEN 5 THEN FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2], FIND([Text], [STR Parameter 2]) +1) +1) +1) + 1)
END
END

How does it work?

The IF clause first checks, if the maximum of n (=5) isn’t exceeded. If it is, the formula returns 9999. The ELSEIF part of the IF clause checks if the string contains the substring at all. If it doesn’t,the formula returns 0.

If n is less or equal than 5 and the string contains the search string (ELSE part), the position of the nth occurrence is calculated by nested FIND functions in a CASE WHEN statement. If n = 1, FIND looks from the start of the string. If n = 2, FIND looks from the character after the first occurrence of the search string, etc.

A brief aside:

This formula is limited to a maximum of the 5th occurrence and nested FINDs are everything else than an elegant solution. If you ever tried to solve this in Microsoft Excel, you probably know that there is a very elegant option in Excel for finding the nth occurrence by using a combination of FIND and SUBSTITUTE. The equivalent of SUBSTITUTE in Tableau is REPLACE. However, you cannot rebuild the Excel solution in Tableau, because unlike Excel’s SUBSTITUTE, Tableau’s REPLACE does not allow to specify the instance, i.e. which occurrence of the substring shall be replaced. Therefore – unfortunately – I do not have a better way to do this in Tableau. All I am having is this “monster” above.

Extraction

28 – Extract first character

Description: Extract the first character of a string

Example: Extract the initials from [GivenName] and [Surname] and concatenate them to a string

LEFT([GivenName],1) + LEFT([Surname],1)

How does it work?

LEFT(string, 1) returns the first character of a string.

29 – Extract first word

Description: Extract the first word of a string

Example: Extract the street number from [StreetAddress]: all characters left to the first space in the string

LEFT(LTRIM([StreetAddress]), FIND(LTRIM([StreetAddress]+” “),” “) -1)

How does it work?

LTRIM(string) removes leading blanks, FIND(LTRIM([StreetAddress]+” “),” “) returns the position of the first blank and LEFT(string, n) returns the first n characters of the string, i.e. all characters left to the first blank. A blank is added to the first parameter of the FIND statement to ensure the calculation works, if the string contains only one word.

30 – Extract first 2 words

Description: Extract the first two words of a string

Example: Extract the first two words from [Text]: all characters left to the second occurrence of a blank (excluding leading blanks)

LEFT(TRIM([Text]),
FIND(TRIM([Text])+”   “,” “,
FIND(TRIM([Text])+” “, ” “)+1)-1)

How does it work?

TRIM(string) removes leading and trailing blanks. The first FIND statement looks for a blank starting one character right to the position of the first blank (second FIND). Three blanks are added to the trimmed string to ensure a blank will be found even if there are only one or two words.

31 – Extract last part

Description: Extract all text of a string after the first word

Example: Extract street name from [StreetAddress]: all characters right to space after the first word, i.e. after the street number

RIGHT([StreetAddress], LEN([StreetAddress]) – FIND([StreetAddress],” “))

How does it work?

The length of the string minus the first occurrence of a space = remaining length of the string after the first word, i.e. the number of characters to be extracted by using the RIGHT function.

32 – Extract mid part fixed

Description: Extract a substring from somewhere in a string if the position and the length of the substring are fix

Example: Extract the the bank identification code from an IBAN (8 characters from position 5 on) and insert blanks to make it more readable

MID([IBAN],5,3) +” ” +
MID([IBAN],8,3) +” ” +
MID([IBAN],11,2)

How does it work?

The bank identification code in an IBAN starts at position 5 and is 8 characters long. MID(string, start, [length]) extract from string “length” characters starting at character “start”. Blanks are inserted using string concatenation to make the identification code more readable.

33 – Extract mid part variable

Description: Extract a substring from a string (variable position, fixed length)

Example: Extract the middle initial from the full name (2 characters after the first blank in the full name)

Condition: The middle initial has exactly 2 characters, i.e. initial and a period

IF CONTAINS ([Full Name], “.”) THEN
MID([Full Name], FIND([Full Name], ” “) + 1, 2)
ELSE
“”
END

How does it work?

The IF clause and CONTAINS(string, substring) checks if there is a period in the string, i.e. [Full Name] contains a middle initial. MID returns the initial and the period (2 characters) from the character right to the position where the first blank was found (FIND function).

34 – Extract 2 similar delimiters

Description: Extract a substring between 2 given, similar delimiters from a string

Example: Extract the short form of a URL from a full URL, i.e.get “www.clearlyandsimply.com” from “https://www.clearlyandsimply.com/about.html/”.

LEFT(
MID([Website],
FIND([Website],”://”)+3),
FIND(MID([Website],FIND([Website],”://”)+3),”/”)-1)

How does it work?

The first MID statement extracts everything (optional parameter [length] of MID is omitted) right to the third character after the first occurrence of “://”, i.e. everything from “www…” on (first FIND statement). The second FIND looks for the next slash in the remaining string. LEFT starts at the result of the first FIND statement and returns the characters from this character to the character returned by the second statement.

35 – Extract 2 different delimiters

Description: Extract a substring between 2 given different delimiters from a string

Example: Extract the last name from an email address in a format givenname.name@example.com

MID(LEFT([E-Mail Address],FIND([E-Mail Address],”@”)-1),
FIND([E-Mail Address],”.”)+1)

How does it work?

The LEFT statement returns the substring left to the “at” sign (@). MID starts extracting at the position of the period in this substring returned by LEFT and returns everything after the period (optional parameter [length] of MID is omitted).

36 – Extract substring in parentheses

Description: Extract the first substring in parentheses, i.e. “(….)”

Example: Extract the first substring in parentheses found in the field [Text with Parentheses]

IF FIND([Text with Parentheses],”(“) > 0 THEN
MID([Text with Parentheses],
FIND([Text with Parentheses],”(“)+1,
FIND([Text with Parentheses],”)”)-FIND([Text with Parentheses],”(“)-1)
ELSE
“”
END

How does it work?

The IF clause checks, if the string contains a left parenthesis (first FIND). MID extracts the substring starting one position right to the first occurrence of the left parenthesis (second FIND). The number of characters to be extracted is the position of the right parenthesis (third FIND) minus the position of the first left parenthesis (fourth FIND) minus 1.

37 – Extract first word after specified word

Description: Extract the first word from a string after a specified word

Example: Extract the first word in [Text] after the first occurrence specified in parameter [STR Parameter 2]

IF FIND(UPPER(” “+[Text])+” “,UPPER(” “+[STR Parameter 2])+” “) > 0 THEN
MID(” ” +[Text]+” “,
FIND(UPPER(” ” +[Text]+” “),UPPER(” “+[STR Parameter 2]+” “))+LEN([STR Parameter 2])+2,
FIND(
MID(” ” +[Text]+” “,
FIND(UPPER(” ” +[Text]+” “),UPPER(” “+[STR Parameter 2]+” “))+LEN([STR Parameter 2])+2),” “)-1)
ELSE
“”
END

How does it work?

The IF clause checks if the string contains the specified word ([STR Parameter 2], first FIND statement)). MID starts to extract two positions right to the specified word (second FIND): the position of the first character of the specified word plus the length of the specified word plus 2 (blank right to it). The third FIND statement looks in the remaining string (i.e. the text after the specified word and the following blank) for the next blank, i.e. for the end of the next word after the specified word. Subtracting 1 cuts off the blank behind the searched word. This result is the number of characters, the first MID function will return. i.e. the first word after the specified word.

Adding blanks at the beginning and the end of the string and the search string ensures that only entire words are found (i.e. not parts of a word). UPPER functions make the search case insensitive

38 – Extract last word

Description: Extract the last word from a string

Example: Extract the surname (last word) from the dimension [Full Name]

Condition: Maximum 2 blanks in the full name, i.e. the formula does not work for strings with more than 3 words.

RIGHT(
RIGHT([Full Name],LEN([Full Name])-FIND([Full Name],” “)),
LEN(RIGHT([Full Name],LEN([Full Name])-FIND([Full Name],” “)))-
FIND(RIGHT([Full Name],LEN([Full Name])-FIND([Full Name],” “)),” “))

How does it work?

The second RIGHT statement returns the rest of the string after the first blank. This substring is passed to the first RIGHT statement as the string. The number of characters to be returned is the length of this substring minus the position of the first blank in this substring, i.e. the position of the second blank in the original string.

39 – Extract first n words

Description: Extract the first n word from a string

Example: Extract the n (parameter [INT Parameter 2] first words from the dimension [Text]

Condition: n is limited to a maximum of 5

IF [INT Parameter 2] > 5 THEN
“n/a”
ELSEIF (LEN([Text])-LEN(REPLACE([Text],” “,””)))/LEN(” “) “n/a”
ELSE
CASE [INT Parameter 2]
WHEN 0 THEN “n/a”
WHEN 1 THEN LEFT([Text], FIND([Text], ” “)-1)
WHEN 2 THEN LEFT([Text], FIND([Text], ” “, FIND([Text], ” “) + 1)-1)
WHEN 3 THEN
LEFT([Text], FIND([Text], ” “, FIND([Text], ” “, FIND([Text], ” “) +1) +1)-1)
WHEN 4 THEN
LEFT([Text], FIND([Text], ” “, FIND([Text], ” “, FIND([Text], ” “, FIND([Text], ” “) +1) +1) +1)-1)
WHEN 5 THEN
LEFT([Text], FIND([Text], ” ” , FIND([Text], ” “, FIND([Text], ” “, FIND([Text], ” “, FIND([Text], ” “) +1) +1) +1) + 1)-1)
END
END

How does it work?

The IF clause first checks, if the maximum of n (=5) isn’t exceeded. If it is, the formula returns “n/a”. The ELSEIF part of the IF clause checks if the string contains at least n words. If it doesn’t,the formula returns “n/a”.

If n is less or equal than 5 and the string consists of n or more words (ELSE part), the first n words are extracted using LEFT and nested FIND functions. The formula uses the same logic as the Calculated Field [27 – Nth occurrence] (see above).

40 – Extract date from string

Description: Extract the first date in a string and convert it to a date

Example: Extract the first date found in the field [Text with a Date]

Conditions: The date in the string is in format MM/DD/YYYY. The formula does not work if a slash is in the string left to the date and it will always extract the first date in the string

DATE(MID([Text with a Date],FIND([Text with a Date],”/”) -2,10))

How does it work?

The extraction of the date string uses MID starting 2 characters left to the position of the first slash (“MM/”) and returns the next 10 characters (“MM/DD/YYYY”). DATE converts the result substring to a date.

41 – Remove all spaces

Description: Remove all blanks in a text

Example: Remove all blanks from field [Text with lots of spaces]

REPLACE([Text with lots of spaces],” “,””)

How does it work?

REPLACE function replaces all blanks by nothing (“”).

42 – Trim spaces

Description: Clean all unnecessary blanks from a string, i.e. replace 2 or 3 or 4, etc. consecutive blanks by one blank

Example: Clean the field [Text with lots of spaces] from all unnecessary blanks

REPLACE(
REPLACE(
REPLACE(
REPLACE(
TRIM([Text with lots of spaces]),
”     “,” “),
”    “,” “),
”   “,” “),
”  “,” “)

How does it work?

Nested REPLACE statements replace all unnecessary blanks. Replace 5 consecutive blanks first, then 4 consecutive blanks, etc. The TRIM function inside the nested REPLACE statements removes all leading and trailing spaces first.

Disclaimer: The solution eliminates blank substrings up to 38 spaces. Starting the REPLACEs with 6 spaces will work for substrings up to 208 blanks.

Alternative solution
If you know for sure that a certain character (e.g. a pipe “|”) is not part of the string, you can also use this formula:

REPLACE(
REPLACE(
REPLACE(
TRIM([Text with lots of spaces]),” “,” “+”|”),”|”+” “,””),”|”,””)

43 – Remove line feeds

Description: Remove the line feeds from a string

Example: Remove the line feeds from the field [Name with non-printable chars]

REPLACE([Name with non-printable chars],”
“,” “)

How does it work?

REPLACE(string, substring, replacement) replaces the line feed by a blank.

Note: REPLACE([Name with non-printable chars], CHAR(10),” “) does not work in Tableau!You have to insert the line feed in the string in the Calculated Field editor between the quotation marks, i.e. the line feed shown in the formula above is essential.

44 – Extract first number

Description: Extract the first number in a string

Example: Extract the first number found in the dimension [Text w/ and w/o Numbers] and return 0 if the field does not contain a number

Condition: Numbers and words are separated by a blank, i.e. “lorem 123 ipsum” will work, but “lorem123 ipsum” and “lorem 123ipsum” will not work

IF [26-Find first number] > 0 THEN
FLOAT(
MID(
[Text w/ and w/o Numbers],
[26-Find first number],
IF FIND([Text w/ and w/o Numbers],” “, [26-Find first number]) = 0 THEN 
999
ELSE
FIND([Text w/ and w/o Numbers],” “, [26-Find first number])-[26-Find first number]+1
END))
ELSE
0
END

How does it work?

The formula is based on the Calculated Field [26-Find first number].

The outer IF clause checks if there is a number in the string. If not, the formula returns 0 (ELSE). If there is a number, the MID statement extracts a substring starting at the result of “26-Find first number”. The inner IF clause checks if there is a blank following after the starting position of MID. If there isn’t a blank in the rest of the string, all remaining characters to the end are extracted (999). Otherwise all characters from the starting point to one position left to the next blank are returned. FLOAT finally converts the extracted substring to a number.

Caution: this is a very very complex calculation and will definitely hit the performance of your Tableau workbook. Use this only if you really need it.

Download Links

The Tableau packaged workbook for free download on Tableau Public:

String Calculations in Tableau (on Tableau Public)

Finally, if you are interested, here is the Microsoft Excel workbook used as the data source:

Download String Example Data (79.5K)

Stay tuned.

Update on Saturday, 28th of June, 2014

Leonid Koyfman (again!) was kind enough to thoroughly review the entire article and all formulas. He found a couple of bugs in my original Calculated Fields and provided fixes for all of them as well as some great improvements to some other formulas.

I updated the post and the workbook. Since Leonid’s suggestions affected more than a dozen formulas, I refrained from explaining the bugs, fixes and improvements. I simply replaced my formulas and explanations by Leonid’s.

Many thanks again, Leonid!

Add a Comment

Your email address will not be published. Required fields are marked *