Tuesday, December 9, 2008

Data Modeling Approach for DWH and Data Marts



What is Data Modelng ?

We most of tech people know what is data modeling .but when u have to define it "Data modeling is the process of creating and extending data models which are visual representations of data and its organization " The ERD Diagram (Entity Relationship Diagram) is the most popular type of data model. Data models exist at multiple levels including


1. The Conceptual Data Model (Subject Area Model) describes data from a high level. It defines the problem rather than the solution from the business point of view. It includes entities and their relationships. Typically the conceptual data model is developed first.


2 The Logical Data Model (Business Area Model) describes a logical solution to a data project. It provides more details than the conceptual data model and is nearly ready for the creation of a database. These details include attributes, the individual pieces of information that will be included. Typically the logical data model is developed second.


3. The Physical Data Model describes the implementation of data in a physical database. It is the blueprint for the database. Typically the physical data model is developed third.




Characteristic of DWH Data Model-

1. datawarehouse data model should be Enterprise focus i.e means that the data contained in it does not have a bias toward one part of the enterprise over another.

2.it is assumed that the data within data warehouse does not violate any business rules established by the enterprise.

3. The data warehouse must be loaded with new data as quickly and efficiently as possible. The bulk of the work to get data into a data warehouse must occur in the ETL process, leaving minimal time to load the data.

4. The data warehouse must be set up from the beginning to support multiple BI technologies

5. The data warehouse must gracefully accommodate change in its data and data structures.




Data Model Objects --

Subject : - A subject area is the subset of the enterprise’s data and consists of related entities and relationships. Customers, Sales, and Products are examples of subject areas.

Entity : - An entity is generally defined as a person, place, thing, concept, or event in which the enterprise has both the interest and the capability to capture and store information.

Element or Attribute: - An element or attribute is the lowest level of information relating to any entity.

Relationships: - A relationship documents the business rule associating two entities together


Data Modeling Approach--

There are four approaches with generic data models as shown below in the diagram.






Activities to Make A DWH Physical Data Model


1.Prepare Subject Area Model.

2.Identify the Subject Area involved (Entities).
3.Identify the entities of interest.
4.Determine the relationships between pairs of entities.
5.Add all Important attributes.
6.Confirm the model’s structure.
7.Confirm the model’s content.
8.Prepare Business Data Model.
9.Define List of Reports currently used.
10.Analyze the Reports Input Data.
11.Analyze Users Requirements.
12.Analyze source Data Used.
13.Review and Finalize.
14.Create Gap Analysis Document and its solution.
15.Determine data elements.
16.Add Time components.
17.Add Derived data.
18.Adjust granularity details.
19.Finalized the Summery Data.
20.Improve data delivery.
21.Segregate Balance data.
22.Define missing Data.
23.Baseline Data Model ER Model.



DHW Data Model Guidelines --

1. Normalization
Use 3NF for DWH Data Model to contain all Atomic Data , Use IDEF1X data modeling Standards.
2. Understand the Business Model
Understand Specific requirement for Company.
3. Model the Calendar
Analyze Business Calendars.
Define Date Keys and Date time and further normalize and demoralize the calendar.
4. Model the Hierarchies
Define Hierarchy Depth , Percentage and Texture.
Check for Customer , Product , Balanced and Ragged Hierarchies.
5. Model the Business Transactions
Understand business use of the data warehouse
Check average lines per transactions
Check Business Rule Concerning Changes
Check for Snapshot Interfaces - Complete or Current
Check for Delta Interfaces - Columnar , Row
Check for Loading Process.
6. Follow Naming Standards
7. Stamp all rows with create date , Update Date and Source


Data Mart Data Model Guidelines --



















The Physical data model will be developed as per guidelines of this sample given below.
The Details for data model development guidelines are given bel
1.Use Dimensional Modeling Standards
a)Schema Design with star and snow flaking model
b)Use surrogate primary keys for dimensions
2.Fully Develop a Data Subject Area
a)Use DWH Data for Facts etc
b)Use the Dimension Data for Dimensions, i.e. Product, Model, etc.
c)Don’t keep unused application attributes
3.Follow Naming Standards – Table Names, Attribute Names
4.Stamp all rows with Create Date, Update Date, Source
5.Use Surrogate Keys for all Dimension Tables and Keys for all Fact Tables
6.Source Application Surrogate Keys will be kept in the tables but demoted to non-key status
7.Prefer using Normalized Dimension Tables (where Dimension data changes less frequently)
8.Snow Flaking is the preferred approach for dimension tables. Some Dimension tables can be de normalized if frequency of dimension attribute changes is high.


Data Mart Data Model Activities:-






Step 1: Understand source database
Step 2: Identify tables referenced by data sourcing
Step 3: Remove irrelevant attributes
Step 4: Create First Cut Model
Step 5: Identify Facts and Dimensions
Step 6: Integrate Model with common Dimensions
Step 7: Create Second Cut Model
Step 8: Add derived data
Step 9: Create Final Model