Writing and Reading Tableau Views to and from Databases and Text Files – Part 1

Download and Install R

The first step is to download the R platform from:

https://www.r-project.org/about.html

or there are mirror download sites at:

https://cran.r-project.org/mirrors.html

As of this writing, the most current version of R is version 3.2.5. The examples in this article should work with versions 3.0 and higher. The computer this writer uses runs R 3.2.4.

Download the R distribution, which should contain a 64 and 32 bit implementation of R. If you are running 64 bit Windows, you can install the 32 and 64 bit versions of R.

Note: If you are running 64 bit Windows and 32 bit Microsoft Office and want to write data to an MS Access database, make sure to install the 64 and 32 bit R implementations. If you are running 32 bit Windows, your only option is to download the 32 bit R distribution.

In your Windows Program Groups (assuming 64 bit Windows), you’ll see:

Windows Programs Group

which confirms that both 32 and 64 bit R were installed. R comes with a simple editor called Rgui, that you can use to write R scripts. As you can see, this writer is using R version 3.2.4.

RGui

I suggest you head over to www.rstudio.com and download an alternative development environment named R Studio.

RStudio

It’s a free download, easy to install, and easy to use.

Download, Install and Load R Packages Rserve and RODBC

You can extend the functionality of your base R installation by downloading and installing packages. R has a very dedicated and productive developer community, and most packages from this community are well documented. To run the examples in this post, you will need to install two R packages that are not included in your base R installation: Rserve and RODBC.

Start up RStudio and invoke the Install Packages command from the Tools menu.

Install R Packages 1

An Install Packages dialog will appear. Enter Rserve and RODBC (separated by a comma) in the packages field as shown below and click on Install.

Install R Packages 2

R Studio will download and install these two packages within your R installation. You should then see something that looks like:

Install R Packages 3

By the way, both of these packages should be available to the Rgui Editor, should you choose to use it.

The next step is to load the RODB and Rserve packages. Within RStudio, by default on the right hand side of the screen, you will see a list of available packages:

Load R Packages 1

Locate the RODBC and Rserve packages and tick the box to the left of each of the package names. RStudio loads the RODBC and Rserve packages. The R Scripts you will write as you work through this post also load RODBC, but pre-loading RODBC in RStudio ensures this package will be available when referenced from Tableau.

Load R Packages 2

Load R Packages 3

As Rserve is a binary .exe file, we also need to start it from within RStudio so that Tableau can access R packages within Calculated Fields.

Start Rserve

Type Rserve() and press ENTER at the > prompt as shown above.

Configure Tableau to use your R Installation

To use Tableau (through version 9.2) with R, select Help -> Settings and Performance – > Manage R Connection as shown below

R Connection 1

and enter localhost (or 127.0.0.1) in the Server field and 6311 as the Port.

R Connection 2

In Tableau 9.3 and above, Select Help -> Settings and performance -> Manage External Service connection as shown below.

R Connection 3

Enter 127.0.0.1 or localhost in the Server field and 6311 as the Port.

R Connection 4

Then click on the Test Connection Button, and you should then see:

R Connection 5

You are now ready to use R with Tableau.

NOTE: The supplied tutorial Tableau Workbook (a download link is at the end of part 2 of this post) was made in Tableau 9.3, and cannot be opened with earlier versions of Tableau Desktop. You can however use earlier versions of Tableau (starting with version 8.1) to re-create the examples shown in parts 1 and 2 of this post. 

Set Up your Database and an ODBC Data Source

You then need to export the Excel Superstore Orders data that comes with Tableau to a database. The examples in this article use a SQL Server Database, but you could export the data to Microsoft Access, Oracle, or any database server that supports ODBC.

NOTE: Microsoft SQL Server and Microsoft Access versions of the Superstore data set provided in Excel by Tableau that the author used to create the tutorial workbook are provided for download to use when re-creating the examples discussed in both parts of this post.  Download links for the data and other resources are at the end of the second part of this post.

If your version of Access is 32 bit, but you’re running 64 bit Windows, see the appendix at the end of this article, which describes how to configure a 32 bit ODBC connection under 64 bit Windows before proceeding. Make sure that you export the data in the Orders sheet in Excel to your database. This should result in there being a table named Orders in the database.

The next step is to configure an ODBC Data Source (in Windows 7 or 8 via Control Panel – > Administrative Tools -> ODBC (in OS X, use ODBC Manager). To configure an ODBC data source in Windows 10 check out this link which explains how to run the ODBC Connection Manager within Windows 10:

5 Ways to Open ODBC Data Source Administrator in Windows 10

Again, if you are running 32 bit Access under 64 bit Windows, please read the appendix at the end of this post before proceeding.

Below are screens for creating an ODBC Data Source (Windows) for Data in SQL Server for a database named Tableau_Superstore_Orders. This database contains the Superstore Sales Data (in Excel) that ships with Tableau.

Open the ODBC Administrator Tool, select the System DSN Tab and then Add to add a new Data Source as shown below:

ODBC Set Up 1

For SQL Server, select either the 10.0 (SQL Server 2008) or 11.0 (SQL Server 2012) driver as shown below, and then click on Finish.

ODBC Set Up 2

Another dialog will then appear:

ODBC Set Up 3

Give the ODBC Data Source a name and indicate the SQL Server Instance name, and then click on Next.

Another dialog appears:

ODBC Set Up 4

Indicate how the login to the database will be authenticated. By default, Integrated Windows authentication has been selected. If you administrate the machine SQL Server is installed on, this option should work. Otherwise, enter credentials for a SQL Server user who is authorized to log into the database. Then click on Next.

Change the default database to Tableau_Superstore_Orders in the next Dialog that appears as shown below, and then click on Next.

ODBC Set Up 5

Another dialog will open. Accept the default entries and click on Finish.

ODBC Set Up 6

A dialog appears that lets you test the ODBC connection you just configured. If there are no issues, you will see a message like what you see below:

ODBC Set Up 7

Closing the dialog boxes above will bring you back to the main screen of the ODBC Administrator. Click on OK to close the ODBC Administrator.

Create a new Tableau Workbook and Connect to the Database

We can open Tableau. The author used Tableau 9.3 when writing this post. Create a new file and make a connection to the Tableau Superstore data you exported to a database. The author exported the data to SQL Server. The Excel Superstore sales data is in a sheet named Orders, so there should be a table named Orders in the database you exported the Excel data to.  If you download and open the tutorial Tableau workbook mentioned earlier (in Tableau 9.3 format), you could open that workbook as well, and use it for reference as you build your new workbook based on the content of this post.

Create Calculated Fields in Tableau

After connecting to the Superstore Data, define a new Calculated Field named OrdersSQLString.

The calculation is:

‘SELECT * FROM Orders’

Note the single quotation marks surrounding the text of the SELECT statement.

Now define another calculation that connects to the Superstore Data in SQL Server and writes out the results from a SELECT query to a text file. The calculation will be named RunOrdersQuery.

SCRIPT_STR(“

library(RODBC)

crs$odbc

crs$odbc.tables

crs$dataset

write.table(crs$dataset, file = ‘C:\Data\Clearly_And_Simply\RODBC_Article\OrdersQuery.csv’, row.names = FALSE, quote = FALSE, sep = ‘|’)

close(crs$odbc)

‘Query Results from refreshed Data Connection'”

,[OrdersSQLString]

)

NOTE: It has been reported that including the ‘$‘ character within the name of an RODBC object name (for example crs$odbc, crs$odbc.tables, crs$dataset) in a calculation could cause a runtime error on some computer systems and display ‘Object not found’ error.  Leonid Koyfman suggests replacing the ‘$’ character with an underscore ( _ ) in the affected calculation.

Also note that the calculation above contains a reference to a hard coded path to a file on disk.  You can alter this path in the calculation to a location of your choice on your system.  You will need to create the required directories and sub-directories (i.e. folders) on the path to the destination file on your system before running the calculation.  The RunOrdersQuery calculation will not create them.

Calls to R functionality within Calculated Fields have to follow certain conventions. The calculation must start with the word SCRIPT_ and then specify a return value, in this case a String (STR), followed by an open parenthesis and a quotation mark. The string return value of the calculation is the informational message “Query Results from refreshed Data Connection”.

The calculation then proceeds to connect to the Tableau_Superstore_Orders ODBC data source you created and runs a query of the value contained in the incoming parameter .arg1.

The write.table statement within the calculation writes out the results of the SQL command to a text file to file name indicated in the command.  You can change the file path and name as desired. Note that you need two backslashes to indicate drive and directory names when constructing the file path. Note the | (Pipe) delimiter which prevents scrambled data in the output text file if any data returned by the query contains commas in a data field.

The calculation then closes the ODBC connection and displays the informational message, which – as mentioned earlier – is the string return value for the calculation. Immediately following the informational message is a closing quotation mark that offsets the quotation mark that follows the open parenthesis in the first line of the calculation.

The last line of the calculation contains a comma, and then the value of .arg1 – the value of the OrdersSQLString calculation you defined earlier.

The Calculated Field ends with a closed parenthesis.

Disable Auto-Updating

For our Calculated Fields to work as expected, we need to disable Auto-Updating of worksheets.

Why do you have to update the view (and other views with similar queries) manually?

We do the manual update to control when the Table Calculation that executes an SQL command actually runs. This is similar to re-calculating an Excel worksheet manually. If we left auto updates on, every change made to the view (adding or removing fields, formatting, filtering, etc.) would cause any calculation running SQL commands to run, which could cause all sorts of problems in your database. In this example, the text file would simply be re-written to disk. Turning auto update off solves this problem.

From the Worksheet menu, select Auto updates, and untick Auto update worksheet.

Auto Update Disabling

Create a Tableau View

After you have written these two Calculated Fields, set up a Tableau view that looks like:

Tableau View 1

Note that Tableau considers all calculations that use R to be Table Calculations.

Now run a manual update of the worksheet by clicking on the Run update icon or tab the F9 function key.

pic025_Run_UpdateRunning the update executes the RunOrdersQuery Calculation. The view should then look like:

Tableau View 2

We’re seeing the sum of sales from our connection to SQL Server, which is no surprise. We see the values of the Orders SQL String Calculation and the return value of the RunOrdersQuery Calculation above the bar.

Create a New Data Connection

We now want to configure a new data connection to the text file that the RunOrdersQuery calculation should have created.

Create a new sheet, configure a Text File connection, and point to the folder specified in the RunOrdersQuery calculation.

pic027_Tableau_Data_Connection_1After making the connection, and back on the Tableau worksheet, set up a view that looks like:

Tableau View 3

If you see what appears above, the “proof of concept” we have worked through worked expected. You’ve connected to a database using ODBC within a Calculated Field, and re-directed the output from a SQL Query to a table in that database to a text file.

If you navigate to the folder on your system where the text file is, and open the text file in an editor, it will look like:

pic029_Text_FileParameters and Calculated Fields for User defined Conditions

We’ll now define another “proof of concept” calculation that includes a WHERE clause. Start by creating two parameters: Category Condition and Regional Condition.

pic030_Parameter_1

pic031_Parameter_2If you haven’t used parameters in Tableau before, check out:

Tableau Knowledge Base: Create Parameters

Create a new sheet and place the two parameters on the sheet as shown below:

Tableau View 4

Create a Calculated Field called  Category WHERE

Calcualted Field Category WHERE

This calculations returns the value of the currently selected category in the Category Condition Parameter

Create another Calculated Field named  Regional WHERE

Calcualted Field Regional WHERE

This Calculation returns the value of the currently selected region in the Regional Condition Parameter.

Next: a Calculated Field named Full SQL Statement

Calcualted Field Full SQL Statement

We can now write another calculation named RunFilteredOrdersQuery, which will take the calculation Full SQL Statement as an argument.

SCRIPT_STR(“

library(RODBC)

crs$odbc

crs$odbc.tables

crs$dataset

write.table(crs$dataset,file=

‘C:\Data\Clearly_And_Simply\RODBC_Article\FilteredOrdersQuery.csv’, row.names = FALSE, quote = FALSE, sep = ‘|’)

close(crs$odbc)

‘Query Results from refreshed Data Connection'”

,[Full SQL Statement]

)

This calculation works just like the RunOrdersQuery Calculation. It establishes an ODBC connection to the SQL Server Databases, and runs the SQL Query contained in the Full SQL Statement Calculation.  Remember to specify the correct path to the output file in your version of the calculation.

The Full SQL Statement Calculation uses parameters to gather input for a WHERE clause. The parameters are currently set to the Office Supplies Category in the South.

Additional Tableau Worksheets

In a new sheet, with a connection to the Orders table in your back end database, disable automatic updates for this sheet as shown previously, and set up a view that looks like:

Tableau View 5

Now run the RunFilteredOrdersQuery Calculation by manually updating the worksheet.

You’ll then see:

Tableau View 6

Again, the sales numbers are coming from SQL Server.

Create a new sheet and make a Data Connection to the text file that the RunFilteredOrdersQuery calculation should have created.

Tableau Data Connection 2

When we go back to the Tableau worksheet, we see that sales are constrained to the Office Supply Category in the south

Tableau View 7

If you were to go back to sheet 3, change the parameter values, manually update the worksheet (which re-runs the RunFilteredOrdersQuery Calculation), then come back to sheet 4 and refresh the FilteredOrdersQuery Data Source, the Bar Graph in sheet 4 will reflect the new filtering.

Appendix: 64 bit Windows and 32 bit Microsoft Access

If you are running 64 bit Windows and have 32 bit Microsoft Access, this appendix describes the steps to follow to configure your environment to run 32 bit R and create a 32 bit ODBC Data Source.

Go to https://cran.r-project.org/bin/windows/base/.

Download the R platform installer that contains 32 and 64 Bit versions of R. When running the installer you downloaded, 32 and 64 bit R will be installed by default – multiple versions of R can be installed on a single PC. If you have not already done so, download and install RStudio as described earlier in this post. If you are running 32 bit Windows, you are also running 32 bit Tableau, and only the 32 bit version of R will be installed on your system.

Next: create a 32 Bit ODBC Data Source using the 32 bit version of the Windows ODBC Connection Manager.

Using Windows Explorer, navigate to C:WindowsSysWOW64odbcad32.exe (under Windows 7 and Windows 8). If you are using Windows 10, check out

5 Ways to Open ODBC Data Source Administrator in Windows 10

Double click on odbcad32.exe to load it. Select one of the available Microsoft Access database drivers to configure your Data Source. Create an ODBC data source as described earlier in this post (a User or System Datasource). Creating the data source includes a step where you navigate to your MS Access database file with Windows Explorer and select it with your mouse.

Next: Open R Studio – tell RStudio that you want to use the 32 Bit version of R. Select Global Options from the Tools menu as shown below and select the 32 Bit R Installation as your default R implementation to use.

Appendix RStudio Options

Close and restart RStudio to save your changes.

Next: Install the Rserve and RODBC R Packages within your 32 Bit R Installation. From the Tools menu, select Install Packages and type in Rserve, a comma and then RODBC in the Packages field. Tick the “Install Dependencies” option – then click on Install.

Appendix RStudio Install Packages

As described earlier in this post, configure Tableau to use your 32 bit R installation.

Load the RODBC package and then load and start Rserve as described earlier in this post.

Use the 32 bit MS Access ODBC Data Source name you defined in your Wrapper Calculations that run SQL statements.

If you are running 64 bit Windows, and want to run 64 bit R later, follow the instructions you followed to use 32 bit R in reverse. Select Global Options from the Tools menu as shown below and select the 64 Bit R installation as your default R implementation to use. Close and restart RStudio to save the change.

To recap:

  1. Download an R platform installer that contains 32 and 64 bit R versions
  2. Configure your RStudio installation to use 32 Bit R
  3. Within RStudio, install the Rserve and RODBC Packages
  4. From the RStudio Console, load the RODBC package and start Rserve
  5. Create a 32 bit ODBC Data source using C:WindowsSysWOW64odbcad32.exe
  6. Configure your Tableau to R connection as described earlier in this post.
  7. Reference the name of the 32 bit ODBC data source you created in your Calculated Fields that reference the RODBC package and execute SQL statements.

Remember that SQL statements to be executed must be written in the dialect of SQL that your back end database expects, in this case Microsoft Access.

[End of Appendix]

Part 2 – Coming soon

So far, we have only run SELECT queries. The second part of the article will show how to run INSERT, DELETE, UPDATE or INSERT INTO (append data) queries using the same approach.

Stay tuned.

Add a Comment

Your email address will not be published. Required fields are marked *