Animate cumulative data with Tableau

Use a custom SQL data connection to animate cumulative data on the page shelf in Tableau

Inspired by Nathan’s Walmart growth movie, Daniel Ferry recently had a very interesting post at his outstanding blog Excel Hero. Daniel presented a beautiful Excel implementation of animating the growth of Walmart, plotting dynamic named ranges on an XY scatter chart against a background image map of the US.

There is nothing to add to Daniel’s great post and implementation with regards to the use of Microsoft Excel. But how about Tableau? Can you create animations like this with Tableau Software?

At first sight this should be a piece of cake: If you think of animating data with Tableau, of course the page shelf is the first thing that comes to your mind, isn’t it? Dragging a field (the year of the opening date of the stores in our example) to the page shelf allows you to either manually navigate through all the years or to use the playback controls for a slide show.

However, the page shelf creates a view on the currently selected page. Thus, dragging the opening date on the page shelf would show an animation only displaying the location of the new Walmart stores in the current year. At the end of the animation, for instance, the visualization would include solely all stores opened in 2006 instead of all stores opened since 1962.

Therefore the page shelf and Tableau’s built-in mapping functionality are only half the battle won. We need a little tweak to visualize and animate the cumulative data, i.e. all Walmart stores from the very beginning.

Today’s post presents a way of emulating Daniel’s Excel implementation with Tableau. As always including the Tableau packaged workbook for free download.

The idea

The basic idea is using a custom SQL statement in the data connection to create a second date field called [actualdate] to be used on the page shelf.

The how-to tutorial

Step 1: Clean up the data

Please be advised that this is an optional step and you do not really need to do this. To keep the data source lean, I simply took the data from Daniel’s Excel workbook and deleted everything except for the opening dates and the zip codes. Since Tableau automatically generates the latitudes and longitudes of geographical fields (including US zip codes), we do not have to take care of that.

Step 2: Connect to the data using a custom SQL statement

When connecting to the Excel workbook, activate the radio button “Custom SQL” in step 2 and type in the following SQL statement:

SELECT
[Walmart$].[opendate] AS [opendate],
[Walmart$].[zipcode] AS [zipcode],
[t2].[opendate] AS [actualdate]
FROM
[Walmart$], [Walmart$] AS [t2]
WHERE
[Walmart$].[opendate] <=
[t2].[opendate]

See the next section for further details on what this SQL statement does.

Step 3: Set up the Tableau visualization

The rest is a piece of cake:

  • Drag the dimension [zipcode] to the level of detail shelf. The measures latitude and longitude (generated by Tableau) will automatically go to the column and row shelf.
  • Drag the the dimension [actualdate], generated by our custom SQL statement, to the page shelf and choose YEAR(actualdate)
  • Drag the dimension [opendate] to the color shelf, convert it to continuous, select the blue color palette and activate the tick “Reversed”. This is an optional step. It adds another visual effect to the marks on the map: the older the store, the darker the color of the dot and vice versa.

That’s it.

The background – what does the SQL statement do?

The core piece of this workaround is the custom SQL statement. All other steps are only the usual suspects of creating animated visualizations with Tableau. But how does this custom SQL thing work?

If you are familiar with writing SQL statements, I think you can easily skip this section. If not, I will not discuss the SQL in detail, but rather try to explain the effects of the statement from the result’s point of view:

The custom SQL refers to the underlying Excel workbook containing the original source data and adds another date field called [actualdate]. Basically, the statement “creates” a cross-tab by multiplying the field [opendate] with itself. This virtual cross-tab would look like this:

The cross-tab contains all opening dates in the row headers, all opening dates again in the column header (called [actualdate]) and the according zip code in the table cells, if the opening date is less or equal than the actual date.

Of course the SQL statement does not really produce the cross-tab shown above. It rather produces a database table:

Please be advised that the table contains only the 3 data fields. I added the column with the remarks left to the table manually.

If you create a Pivot Table using Microsoft Excel or Access on this data, you will receive exactly the cross-tab shown above.

In a nutshell: we are using the Custom SQL statement to enhance the data by adding a new date field and duplicating the fields [opendate] and [zipcode] for every entry where [actualdate] is larger or equal than [opendate]. Thus, the number of data rows increases from 3,176 to 5,222,087.

Agreed, it’s brute force to bloat the data to this extent and this leads to a couple of disadvantages (see next section). But it does the job.

A workaround. No more, no less.

The drawbacks

Here are the drawbacks coming with the presented technique:

  • Additional time and effort needed to set up the customized SQL statement
  • Considerably slowing down the workbook’s performance: due to the bloated data source, executing the query takes quite a while (will get even worse when using large data sets): opening the workbook and connecting to the data takes ~70 seconds on my computer. Annoying, no doubt about it, but I think the effect is worth waiting. The performance of the animation itself, however, is not affected.
  • Creating an additional field with the Custom SQL statement inherently bloats the data source. You have to be aware of this fact when conducting other data analysis and visualizations. CNT(opendate), for one, will return a by far larger number than you may expect.

The download link

If you have Tableau Desktop or Tableau Reader installed, here is the packaged workbook for free download:

What’s next?

As I already promised at the end of my last post, the next article will discuss something completely different: a mathematical optimization method implemented in Microsoft Excel. Furthermore, I am planning to write one more post on Tableau’s data animation capabilities and to start a discussion on the power of Tableau’s calculated fields during the upcoming weeks.

Stay tuned.

Update on November 23, 2010

With Tableau 6, you do not need this workaround anymore. You can easily create motion charts of cumulative data using the new functionality “Show History”. For more details please refer to this follow up post: Animate Cumulative Data with Tableau 6.

Add a Comment

Your email address will not be published. Required fields are marked *