Below are several data modelling patterns you are likely to encounter:
The Monolithic table design involves joining all source tables together into a single denormalized representation. Tabular is able to group / aggregate and filter rows easily in this model, so while care needs to be taken when writing DAX expressions, the resulting cube will perform well.
- Easy to get started.
- Performs well.
- DAX expressions trickier to write.
- Cube loading times may suffer.
- Only similar-grained data can be accommodated.
When facts are derived from disparate sources, a monolithic design is not practical. In this case, multiple fact tables can be conformed by presenting a façade of filtering tables. Unlike traditional OLAP dimensions, these table do not need to present surrogate keys – only the union of unique columns values that appear in the joined fact tables.
- Easy to implement.
- Fact tables perform relatively well.
- Quickly becomes messy due to all the small filter tables - User experience degraded.
The end user experience can, in some cases, be improved by hiding facts behind a chain of filtering tables. When this chain of tables present behavior that is consistent with an end-user’s understanding of the business, the model becomes easier t consume.
Such chains usually perform moderately when compared to other virtuous patterns, but can provide an ideal end-user experience.
- Balance between performance and end user experience.
- Harder to implement – careful planning needed to figure out how to push each filtering column into just one place upstream of its facts.
- Is only useful when calculations only rely on one fact table at a time – i.e. where the formula engine can convert the entire expression into an “inner joined” SQL statement.
Avoid designs where the same filter(able) columns appear in multiple places. Inevitably, business will ask a question that involve both facts tables. The filter can only be (easily) applied to one table or the other, but not both.
- Add a common filtering table, or
- Combine (union) the facts into a monolithic table.
While many-to-many relationships are possible in analysis services, they rarely perform well since they are often bridges between large tables millions of rows.
Alternatives to consider:
- Combine all of the fact tables.
- Combine two of the fact tables into a child of the remaining table.
Patterns that rely on large (>1m rows) tables to facilitate joins between facts perform poorly.
The reason for this is that the formula engine cannot resolve the joins in a single query to the storage engine, and so instead it:
- Uses the filter to query all keys from the large intermediate table.
- Passes the keys back to the storage engine to get the first set of facts.
- Extremely slow measures (minutes instead of milliseconds)
- Memory exhaustion failures.
- Combine the fact tables.
- Create a third [combined] fact table.
- Join the fact tables directly to the filtering table.
For similar reasons to the “large intermediate table” design, using filtered child tables to filter parent tables can result in lack-lustre performance, stemming from how the formula engine resolves the query.
Alternatives to consider:
- Link the parent table directly to the filtering of the small dimension.
- Combine the child and parent table.