ETL

Dimensional Modelling – Fact and dimension tables

Fact: A fact is a measurement of a particular business process or of the business itself. Fact tables store these measurements to track business process performance.

E.g. if we sell a product at some dollar value, the measurements are the units sold and the dollar amount earned from that product’s sale.

Each row in a fact table is one measurement event. The data is stored at a specific level of detail called the ‘grain’; in the example above, it could be one row per product sold. All rows in a single fact table should have the same grain to avoid data inconsistency.

There are mainly 3 types of facts:

1. Additive: Additive facts can be summed across all dimensions to generate insights from the data. E.g. the total sales of a region for a year, in dollar value, can be calculated by summing the sales amount over the last year's records in the fact table.
2. Semi-additive: Semi-additive facts can be summed across some dimensions but not all. E.g. account balance cannot be summed across the time dimension, i.e. if the account balance is \$100 in the morning and \$50 by evening after a transaction, the balance at the end of the day is \$50; it cannot be calculated by adding \$100 and \$50.
3. Non-additive: These facts can never be summed, e.g. the unit price of a product. Only operations such as average and count are meaningful on this type of measure.
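The three kinds of facts can be illustrated with a minimal Python sketch. The sample rows, column order, and variable names below are made up for illustration and do not come from any real system.

```python
from collections import defaultdict

# Hypothetical fact rows: (date, region, units_sold, sales_amount, unit_price)
sales = [
    ("2023-01-01", "East", 2, 20.0, 10.0),
    ("2023-01-01", "West", 1, 30.0, 30.0),
    ("2023-01-02", "East", 3, 30.0, 10.0),
]

# Hypothetical balance rows: (date, account, balance)
balances = [
    ("2023-01-01", "A", 100.0),
    ("2023-01-01", "B", 40.0),
    ("2023-01-02", "A", 50.0),
    ("2023-01-02", "B", 40.0),
]

# Additive: sales_amount can be summed across every dimension.
total_sales = sum(row[3] for row in sales)

# Semi-additive: balances can be summed across accounts for one date ...
balance_by_date = defaultdict(float)
for date, account, balance in balances:
    balance_by_date[date] += balance

# ... but across time we take the closing (latest) value instead of a sum.
latest_date = max(balance_by_date)
closing_balance = balance_by_date[latest_date]

# Non-additive: summing unit_price is meaningless, so an average price
# is derived from the additive facts instead.
total_units = sum(row[2] for row in sales)
average_price = total_sales / total_units

print(total_sales, closing_balance, round(average_price, 2))
```

Here the closing balance is the total on the latest date, not the sum over both dates, which is the behaviour the semi-additive case describes.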

When storing data in a fact table, only real measurement events should be recorded. E.g. if there is no sales activity for a given product, then no row should be put in the fact table. We should not populate zeroes in the fact table to represent no activity, because even without such rows fact tables typically take up about 90% of the space in dimensional models. We should be very judicious in using fact table space.
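A sparse fact table does not prevent a report from showing zero for inactive products; the zero can be filled in at report time from the dimension. The following is a minimal sketch, assuming a hypothetical product dimension and fact rows invented for this example.

```python
# Hypothetical product dimension: product_key -> product name
products = {1: "Widget", 2: "Gadget", 3: "Gizmo"}

# Sparse fact table: (product_key, units_sold).
# Product 2 had no sales, so it has no row at all.
fact_sales = [
    (1, 5),
    (1, 2),
    (3, 4),
]

# Aggregate the facts that exist.
units_by_product = {}
for product_key, units in fact_sales:
    units_by_product[product_key] = units_by_product.get(product_key, 0) + units

# The zero for "no activity" appears only when the report is built.
report = {name: units_by_product.get(key, 0) for key, name in products.items()}
print(report)
```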

Fact tables can be categorized by grain into 3 types:

1. Transaction: This is the most common type of fact table grain.
2. Periodic Snapshot
3. Accumulating Snapshot
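As a rough illustration of the first two grains: a transaction-grain table holds one row per event, while a periodic snapshot holds one row per period summarizing those events. The sketch below rolls hypothetical transaction rows up into a daily snapshot; the data and names are assumptions, and the accumulating snapshot is not shown.

```python
from collections import defaultdict

# Transaction grain: one row per sale event (date, product, units_sold).
transactions = [
    ("2023-01-01", "Widget", 2),
    ("2023-01-01", "Widget", 1),
    ("2023-01-01", "Gadget", 4),
    ("2023-01-02", "Widget", 5),
]

# Periodic snapshot grain: one row per product per day.
daily_snapshot = defaultdict(int)
for date, product, units in transactions:
    daily_snapshot[(date, product)] += units

snapshot_rows = sorted(
    (date, product, units) for (date, product), units in daily_snapshot.items()
)
for row in snapshot_rows:
    print(row)
```

The two results have different grains, so under the rule that a fact table keeps a single grain they would be stored in separate fact tables.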

Fact tables have 2 or more foreign keys that connect to the primary keys of the dimension tables. E.g. in the above picture, office id is the primary key in the dealers office dimension, whereas it is a foreign key in the fact table. Fact tables generally have their own primary key composed of a subset of the dimension foreign keys, also called a composite key. Fact tables express many-to-many relationships between dimensions.
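The key structure described above can be sketched with Python's built-in sqlite3 module. The table and column names (dim_time, dim_dealers_office, fact_sales, and so on) are assumptions loosely based on the Time and DealersOffice example, not the actual schema from the picture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE dim_time (
    time_id       INTEGER PRIMARY KEY,
    calendar_date TEXT NOT NULL
);
CREATE TABLE dim_dealers_office (
    office_id   INTEGER PRIMARY KEY,
    office_name TEXT NOT NULL
);
-- The fact table's primary key is a composite of its dimension foreign keys.
CREATE TABLE fact_sales (
    time_id      INTEGER NOT NULL REFERENCES dim_time(time_id),
    office_id    INTEGER NOT NULL REFERENCES dim_dealers_office(office_id),
    sales_amount REAL NOT NULL,
    PRIMARY KEY (time_id, office_id)
);
""")

conn.execute("INSERT INTO dim_time VALUES (1, '2023-01-01')")
conn.execute("INSERT INTO dim_dealers_office VALUES (10, 'Downtown')")
conn.execute("INSERT INTO fact_sales VALUES (1, 10, 2500.0)")

# A fact row pointing at a non-existent office violates the foreign key.
try:
    conn.execute("INSERT INTO fact_sales VALUES (1, 99, 100.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan row rejected:", orphan_rejected)
```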

Dimension Tables: A dimension table contains the textual context for the measurements in the fact table. Dimensions describe the ‘who, what, when, where, how and why’ associated with business events. E.g. Time and DealersOffice are dimensions in the above example. Dimension tables usually have more attributes than fact tables but fewer rows. Each dimension table has a single primary key on which it can be joined with the fact table.

Dimension attributes serve as the source of report labels and make the DW/BI system understandable and useful. Attribute names should resemble business terms rather than codes that have to be memorized and decoded while reporting.

Sometimes, looking at a numeric value, it is tough to determine whether it is a fact or a dimension attribute, e.g. the cost of a product can serve as either. Continuously valued numeric observations are almost always considered facts, whereas discrete numeric observations drawn from a small list are almost always considered dimension attributes.

Fact and Dimensional tables in Dimensional modelling

The above diagram is a snowflake schema for some property business. In dimensional modelling, schemas are kept as simple as possible, but not simpler, so that data can be processed with fewer joins. A dimensional model should be well planned so that the schema need not change every time the business wants to analyze the data in a different way. So data is generally kept at the lowest level of granularity. Atomic data, which is not aggregated, is the most expressive data.

While creating a report from a dimensional model, dimension attributes supply the report filters and labels, whereas fact tables supply the numeric values.
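This division of labour can be seen in a small report query. The sketch below uses Python's built-in sqlite3 module with a made-up office dimension and sales fact table; all table names, column names, and values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_dealers_office (
    office_id   INTEGER PRIMARY KEY,
    office_name TEXT,
    city        TEXT
);
CREATE TABLE fact_sales (
    office_id    INTEGER,
    sales_amount REAL
);
INSERT INTO dim_dealers_office VALUES
    (1, 'North Office', 'Leeds'),
    (2, 'South Office', 'Bristol');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 250.0), (2, 75.0);
""")

# The dimension supplies the filter (city) and the label (office_name);
# the fact table supplies the number being aggregated (sales_amount).
report = conn.execute("""
    SELECT d.office_name, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_dealers_office AS d ON d.office_id = f.office_id
    WHERE d.city = 'Leeds'
    GROUP BY d.office_name
    ORDER BY d.office_name
""").fetchall()

print(report)
```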

Kimball’s DW/BI architecture:

There are 4 separate components to consider in the DW/BI architecture:

1. Operational source systems, i.e. OLTP: These systems generally capture business transactions. E.g. point-of-sale transaction systems are OLTP systems.
2. ETL systems: Extract, Transform, Load systems. These sit between the OLTP systems and the DW/BI presentation area. Details on ETL can be checked in the article.
3. Data Presentation Layer