Thoughts on data visualization and Tableau
Counting from Nothing: A Double Remix (or, Partitioning via Table Calculations v2)
Over on the Tableau forums Alexander Mou answered a thread on generating a count from sparse data, and the solution he came up with is found in his blog post Dynamic Histogram Over Time. In this post I'm diving into some details of what Alexander did, coming up with a couple of alternative remixes of that solution, and describing a couple of different ways to effectively partition a table calculation via another table calculation. Read on for details!
This entry was posted in Tips and Techniques and tagged histogram, nested table calculation, partition, partition by table calculations, partitioning, PREVIOUS_VALUE(), rank, ranking, table calculations
Comparing Each Against Each Other: The No-SQL Cross Product
Here's a problem that has been bouncing around in my brain since I first used Tableau. How do I compare the results of every permutation of one item vs. another? Here's an example using Superstore Sales: I put Region on Rows and Columns, and SUM(Sales) on the Text Shelf, and only see four values:
What if I want to compare Sales in Central to those in East, South, and West, and Sales in East to South and West, and Sales in West to Sales in South simultaneously? We can compare two at a time using parameters or a self-blend, or one vs. the rest in different ways via sets or table calcs or calculated fields, but how about each against each other? What if we want a correlation matrix? Read on to find out how to do this without any SQL, and learn a little bit about domain completion.
This entry was posted in Tips and Techniques and tagged Cartesian join, correlation, correlation matrix, cross join, cross product, custom SQL, densification, domain completion, PREVIOUS_VALUE(), R, table calculations, triangular matrix
See Want to Learn Table Calculations? Here's How for a list of posts that I recommend.
1. The first step in table calcs is knowing what kind of pills you have:
2. What shelf these pill types are on.
Each of these types has a different effect on the evaluation of a table calc, depending on their arrangement.
When you have a dimension on both the Rows and Columns shelves, Tableau will pad for missing dimension combinations, unless you are using a Set based on these dimensions.
Building your view with all dimensions on the Rows shelf is a good place to start. Once the results are as expected, move pills one at a time to the other shelves and see how shelf placement affects evaluation.
If you use the Compute Using selection from the context menu, or the drop-down in the dialog, and set it to a dimension, that dimension will be used for addressing, and all others will be used for partitioning. I call this "Regular Compute Using".
When setting the Compute Using to something like Table/Pane/Cell, I call that "Layout Compute Using", and I try to avoid it because I find it to be a delicate setting (move a dimension pill and your evaluation will likely be affected).
With Regular Compute Using, if all addressing values do not exist in all partitioning values, the INDEX() value for each addressing value will not be the same across partition values. This is apparent when not all combinations of dimension values exist. (In this situation, there are states that have no sales records for some months, so with your Compute Using set to State, and partitioning on Month, each State will not have the same INDEX() value in each month.)
When using a Regular Compute Using and a crosstab of the dimensions in use (one on Rows, and the other on Columns), and no Set based on those dimensions, Tableau will pad the data, making each State have the same INDEX() value for each Month.
So without the crosstab setup, but with the desire for the INDEX() function to evaluate over padded data, you can have Tableau pad the data with an Advanced Compute Using.
When you use multiple dimensions in the Compute Using (right-side list box) for addressing, I think of it as a pseudo-crosstab of all potential combinations of the dimension values (this is what we want in this situation). If you use a Set instead of multiple dimensions, the data will not be padded.
It is my current belief that if you are aware of what pills you have in use, what shelf they are on, how your Compute Using is set up, and how all these potential combinations of setups affect the evaluation of table calcs, then table calcs are straightforward.
Without doubt, I have not included all information needed to understand table calcs here, but only some key concepts that play a factor in the attached (fixed) worksheets.
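The INDEX() behavior described above, with State as the addressing field and Month as the partitioning field, can be sketched in Python. This is only a model of the padding idea, not how Tableau works internally; the function name index_by_state, the pad flag, and the sample rows are all made up for illustration.

```python
from itertools import product


def index_by_state(rows, pad=False):
    """Return {(month, state): index}, addressing on state, partitioning on month.

    rows is a list of (month, state) pairs that have sales records.
    pad=True mimics padding: every state appears in every month.
    """
    months = sorted({month for month, _ in rows})
    states = sorted({state for _, state in rows})
    # Padding fills in every month/state combination, even those with no data.
    cells = set(product(months, states)) if pad else set(rows)
    result = {}
    for month in months:
        in_partition = sorted(state for m, state in cells if m == month)
        for position, state in enumerate(in_partition, start=1):
            result[(month, state)] = position
    return result


# Hypothetical data: NY has no sales records in month 2.
rows = [(1, "CA"), (1, "NY"), (1, "TX"), (2, "CA"), (2, "TX")]
sparse = index_by_state(rows)
padded = index_by_state(rows, pad=True)
print(sparse[(1, "TX")], sparse[(2, "TX")])  # 3 2
print(padded[(1, "TX")], padded[(2, "TX")])  # 3 3
```

Without padding, TX gets a different index in each month because NY is missing from month 2. With padding, the index is stable across months.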
Joe's notes on 6.1 to 7.0 differences:
Have you attempted the formula that you recommended at
Instead of ATTR([Status]) != PREVIOUS_VALUE(ATTR([Status]))
try ATTR([Status]) != LOOKUP(ATTR([Status]),-1)
The PREVIOUS_VALUE() function is self referential, kind of like a recursive function or a loop. The argument you pass the function is just the starting value for the partition, while the LOOKUP() function gets the value based on an offset, and an offset of -1 means previous.
In your statement, the first mark will result in False, effectively [Status] != [Status], and every other mark will result in True, because it is effectively [Status] != False or [Status] != True.
The formula ATTR([Status]) != LOOKUP(ATTR([Status]),-1) is basically saying:
Compare the Status of the current mark and the Status of the previous mark in the partition.
I recommend you try some trial and error with the PREVIOUS_VALUE() function
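To make the difference concrete, here is a rough Python model of the two formulas over a single partition. It is a sketch of the semantics described above, not Tableau code: the function names and the sample statuses are invented, and None stands in for NULL (LOOKUP() has no previous mark to return on the first mark of a partition).

```python
def lookup_differs(statuses):
    """Model of ATTR([Status]) != LOOKUP(ATTR([Status]), -1)."""
    results = []
    for i, status in enumerate(statuses):
        if i == 0:
            # No previous mark in the partition, so the comparison is NULL.
            results.append(None)
        else:
            results.append(status != statuses[i - 1])
    return results


def previous_value_differs(statuses):
    """Model of ATTR([Status]) != PREVIOUS_VALUE(ATTR([Status]))."""
    results = []
    for i, status in enumerate(statuses):
        # First mark: the argument is the starting value.
        # Later marks: PREVIOUS_VALUE is this calculation's own prior result.
        previous = status if i == 0 else results[i - 1]
        results.append(status != previous)
    return results


statuses = ["Open", "Open", "Closed", "Closed"]
print(lookup_differs(statuses))          # [None, False, True, False]
print(previous_value_differs(statuses))  # [False, True, True, True]
```

Only the LOOKUP() version flags the single mark where the Status actually changed.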
multiple reference lines on the data using table calcs and/or duplicating data
using average age for male & female reference lines, interesting table calcs here
older discussion from 2010 showing use of multiple table calcs trying to get a minimum of a set of values for display, great comments from Ross Bunker
The addressing fields define what part of the table you are computing along. The partitioning fields define how to group the calculation. In the example of a running sum of product sales across several years, the addressing field is the Date field while the partitioning field is the Product field. When you define the addressing for a table calculation, all the other fields are used for partitioning.
You can specify the addressing in the Table Calculation dialog box. The addressing can be relative to the table structure or a specific field. Each addressing option is described below.
This option sets the addressing to compute along the entire table moving horizontally through each partition.
• Partitioning: the scope or grouping of the calculation. This can be the full table, a pane, a cell, a dimension or it can be customized even further for more advanced calculations.
• Addressing: the anchor or the source of each partition. It defines the root of the calculation.
Consider, for example, looking at a running sum of sales by product across several years. In our example, the running sum is partitioned by product: every product's sales are summed over time, so the result of the calculation is a running sum of coffee sales, tea sales, etc. The addressing field is the date field. With every new date, the sales for that date are added to the sum. When you define the addressing for a table calculation, all the other fields are used for partitioning.
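The running sum example can be sketched in a few lines of Python. This is an illustrative model only: the function name running_sum_by_product and the sales figures are made up, and Product plays the partitioning role while Year plays the addressing role.

```python
from collections import defaultdict


def running_sum_by_product(rows):
    """rows are (product, year, sales); returns (product, year, running_sum)."""
    totals = defaultdict(int)  # one running total per partition (product)
    results = []
    # Walk each partition in address order (year).
    for product, year, sales in sorted(rows, key=lambda r: (r[0], r[1])):
        totals[product] += sales
        results.append((product, year, totals[product]))
    return results


# Hypothetical sales figures.
rows = [
    ("Coffee", 2010, 100),
    ("Tea", 2010, 40),
    ("Coffee", 2011, 150),
    ("Tea", 2011, 60),
]
for row in running_sum_by_product(rows):
    print(row)
# ('Coffee', 2010, 100)
# ('Coffee', 2011, 250)
# ('Tea', 2010, 40)
# ('Tea', 2011, 100)
```

Each new product starts a fresh total, and each new year adds to the total for its own product.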
Table (Across) is a Calculate Along option. The chart below uses Table (Across) to set the addressing to compute along the entire table, moving horizontally through each partition.
When calculation addressing is set to Table (Across), the fields that span horizontally across the table are the addressing fields (Category and Region).
The fields specified in the Compute Using menu or Edit Table Calculation dialog are termed the addressing fields. All of the dimensions on a sheet that are not addressing fields are partitioning fields. This corresponds to the "computed along" (addressing) and "for each" (partitioning) components of the description in the table calculation dialog. One other nice thing is that in the final v6, the tooltip for a table calculation will actually contain the description also, so you don't have to open the dialog to understand what it is doing.
"Restarting every" moves the given field and those above it in the Advanced dialog to be partitioning. Similarly, "At the level" moves fields after it in the list to be partitioning, though there is a subtle difference in how that partitioning is done (it's not done on value, but rather on position within the partition; I'll give more details on that in the tutorial).
One might ask why you would put something in the ordering list in Advanced only to partition on it by setting "Restarting every". The answer lies in the sorting behavior. For example, if you want to know the top products for each market, you can't simply sort products by sales (which sorts based on sales for all markets). Instead, you sort Market, Product by sales. Then, when you partition on Market, the products within each of those Market partitions are still sorted by the sales within that market. Play around with it to see the difference.
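The sorting difference can be illustrated with a small Python sketch. It is a model of the idea rather than of the Advanced dialog itself; the function name product_order, the per_market flag, and the sales numbers are assumptions made up for this example.

```python
from collections import defaultdict


def product_order(rows, per_market):
    """rows are (market, product, sales); returns {market: [products, best first]}.

    per_market=False sorts products by their sales over all markets.
    per_market=True sorts products by their sales within each market.
    """
    overall = defaultdict(int)
    for _, product, sales in rows:
        overall[product] += sales

    by_market = defaultdict(list)
    for market, product, sales in rows:
        sort_key = sales if per_market else overall[product]
        by_market[market].append((sort_key, product))

    return {
        market: [product for _, product in sorted(items, reverse=True)]
        for market, items in by_market.items()
    }


# Hypothetical data: Tea wins in the East, Coffee wins overall.
rows = [
    ("East", "Coffee", 10),
    ("East", "Tea", 50),
    ("West", "Coffee", 100),
    ("West", "Tea", 20),
]
print(product_order(rows, per_market=False))
# {'East': ['Coffee', 'Tea'], 'West': ['Coffee', 'Tea']}
print(product_order(rows, per_market=True))
# {'East': ['Tea', 'Coffee'], 'West': ['Coffee', 'Tea']}
```

Sorting on overall sales puts Coffee first everywhere, while sorting within each market correctly puts Tea first in the East.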
Addressing is the compute along/over/by.
Partitioning is the grouping or scope of the calculation, to some degree defined by the context of the calculation, since all dimensions not part of addressing are used for partitioning. This also includes dimensions (and discrete pills) on the Level of Detail shelf (and the Color shelf too?).
Restarting every moves the given field and those above it in the Advanced dialog to be partitioning.
Example of running sum of sales by product across several years:
product is partition, each new product creates a new returned value of sum of sales
date is address, sum of sales for each new date is added to the returned value for that partition
Where I've been confused (I think) is that there are three things in table calcs, not two: partitioning, addressing, and what is being calculated.
More notes from convo with Joe on 3/2012
I don't want to misrepresent Joe's awesome knowledge here; if there are any mistakes in these notes from our conversation they are almost certainly mine.
Table (Across), etc. do a visual sort based on the layout of the view. The results from Table (Across), etc. can sometimes be the same as what you can get by using Advanced Compute Using, but they aren't always. For example, you can have a complex sort set up in the view and Table (Across), etc. will work just fine, but you can't duplicate that sort using Advanced Compute Using, because Advanced Compute Using creates a set of the fields in the Compute Using and then sorts that set along whatever is set in the Order Along.
This can be confusing because the English language description Tableau gives when you hover over the pill in the view can be the same for a Table (Across) and an Advanced Compute Using, but the results will be different.
Put Container on Rows, Category on Color, and Sales on Columns. Create an INDEX() function and see what it does, then try to duplicate that using an Advanced Compute Using.
Show Me can be used to freeze a setup, typically by Table (Across) or Table (Down). It can be used to get to a view that you can't get with Advanced Compute Using. Table (Down) is magic dust with Show Me. ~~~Need an example here, probably one with a funky INDEX().
This creates a rank that shows ties, or can be used to ignore something in a table calc.
Rows: Roll Your Own Index (set Compute Using to Container), Container (sorted on Sum(Sales) Asc); Color Shelf: Category
This allows the index to be computed on container but ignore category for sort and partitioning. This would be a nice feature.
Using dimensions on both Rows and Columns shelves causes Tableau to pad, while dimensions only on Rows doesn't cause Tableau to pad.
~~~Put Container on Rows and Category on Cols, and see what INDEX() does. Then put Container on Rows and Category on Rows, and see what INDEX() does. You will see different marks used now. Compare Compute Using set to Across then Down vs. Down.
Next example: put Customer and Order Date on Rows, and INDEX() on Text. Index on Customer ranks them. Index on Order Date causes padding, and Tableau takes a really long time to return data. If you use MONTH(Order Date) it doesn't take as long.
Some details of the display can get lost when using Primary/Secondary sources. Example: duplicate Superstore Sales; when Customer from both is in the view, the Customer from the secondary does not show ATTR(), i.e. doesn't show that it's an aggregate. (The thought here is to outline the pills.)
1. put every dimension on Rows shelf
2. Measure Names as Columns, Measure Values as text
3. Do addressing on the right-most pill
This way, can turn on totals and subtotals to check what they return vis-a-vis the table calcs. Also can see the partitioning and padding. Using a full-blown cross tab makes it more confusing. Also allows filters using table calcs to be tested since all dimensions are available on the Rows shelf.
Table calc filters can only see what's on the Columns and Rows shelves, and/or see their own instance on the Level of Detail, so if fields needed for the calc aren't on Columns or Rows you can put an instance of the table calc on the Level of Detail shelf.
Another advantage of using Joe's method for making table calcs work is that table calc filters can be tested as well.
Aggregated/continuous pills can go on the Filter shelf.
Aggregated/discrete pills can't go on the Filter shelf.
Table calc/continuous or discrete pills can go on the Filter shelf.
Create a SUM(1) calculated field, call it Test Agg. As Continuous, it can go on the Filter shelf. As Discrete, it can't go on the Filter shelf. Create another calculated field LOOKUP([Test Agg],0). As Continuous or Discrete, it can go on the Filter shelf.
Don't try this with TOTAL(), it can mess things up
Probably shouldn't try this with TOTAL() either
TOTAL() hits the DB while WINDOW_SUM() does not, which would probably affect the order of filters? TOTAL() causes any enclosed aggregate functions to be evaluated over each partition, but I'm not really sure why Table Across behaves differently to
Uncheck Ignore in Table Calculations so discrete field can be used in a table calc:
Duplicate a dimension and put the copy on the LoD shelf so calcs can be along that, but summed overall.
Discussion of this and partitioning (Ross Bunker)
Using PREVIOUS_VALUE to combine multiple rows into one
Example here is to make a list of names like Jane, John, Joe, etc. from multiple rows.
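The exact Tableau formula isn't reproduced in these notes, but the general idea of a self-referential running concatenation can be modeled in Python. This is a sketch under that assumption: the function name running_name_list is invented, and each result stands for the value one row of the partition would return, built from the previous row's result.

```python
def running_name_list(names):
    """Build a growing comma-separated list, one result per row."""
    results = []
    for i, name in enumerate(names):
        if i == 0:
            # First row of the partition starts the list.
            results.append(name)
        else:
            # Later rows extend the previous row's result, the way a
            # PREVIOUS_VALUE-style calculation refers back to itself.
            results.append(results[i - 1] + ", " + name)
    return results


rows = running_name_list(["Jane", "John", "Joe"])
print(rows)      # ['Jane', 'Jane, John', 'Jane, John, Joe']
print(rows[-1])  # Jane, John, Joe
```

The last row carries the complete list, so keeping only that row leaves a single combined value.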
TODAY() and NOW() generate Internal Expression Error inside table calc
As of Tableau v7, the following formula generates the error "Internal Expression Error: Function TODAY is not defined in the current context".
WINDOW_SUM(IF ATTR([Order Date]) = TODAY()-29 THEN WINDOW_SUM(SUM([Sales])) END)
The same happens when NOW() is used instead of TODAY(). The solution is to make a calculated field with NOW() or TODAY(), and then use the calculated field in the function.
Improving table calc partitioning & addressing
This entry was tagged address, addressing, aggregation, at the level, densification, index, padding, partition, partitioning, PREVIOUS_VALUE(), rank, ranking, table calculations
Top 10 Table Calculations The Next N, Where N
Last year I did the big workbook on conditional formatting to answer some really common questions on the Tableau Community Forums. One of my projects lately has been to do the same for table calculations, which are incredibly powerful, sometimes incredibly complicated, and I believe underutilized. Tableau put together a set of Top 10 Table Calculations; here's a list I've compiled of the next N most commonly useful table calculations, based on volume of questions on the forums and relative ease of construction (there's no densification, domain padding, domain completion, or any of that stuff in this batch):
Filter Top N Without Affecting Results
Filter 1st Time Period from Difference from Prior
Aggregating at Different Levels
Filtering Out Extra Marks by Using a Duplicate on the Filters Shelf
Nesting Table Calculations to Aggregate in Different Directions
Performance One Computation to Return Same Result to All Rows
Extending an Axis with an Invisible Reference Line
And of course, there's a workbook with instructions! Click to view and download the next N table calculations workbook on Tableau Public or click the image below:
I can't claim to have originated any of these calculations. Thanks to Ross Bunker, James Baker, Joe Mako, Andy Cotgreave, Richard Leeke, and others I'm sure I'm forgetting for their work!
If you have any other really common uses for table calculations, leave a comment!
This entry was posted in Tableau v8 – The Kraken, Tips and Techniques and tagged filtering, INDEX(), LOOKUP(), PREVIOUS_VALUE(), sorting, table calculations
I'm a father of an eight year-old who loves math and gymnastics. I'm also a husband, consultant for DataBlick, Tableau Zen Master and Forum Ambassador, former massage therapist, somatic experiencing practitioner, writer, and meditator. My not-so-visual blog about parenting is at