Components of a Data Warehouse (…continued)
January 19th, 2009Data refining
Data refining is process of creating subsets of the enterprise data warehouse, which have either a multidimensional or a relational organization format for optimized performance. This process usually lies within the entire BI architecture. The atomic level of information from the logical data model needs to be aggregated, summarized, and modified for specific requirements.
This data refining process generates aggregates, data marts and OLAP cubes that:
• Create a subset of the data in the model suitable for analysis.
• Create calculated and derived attributes.
• Summarize the information at a granularity as per requirement for analysis
• Aggregate the information.
Physical database model
In BI, talking about the physical data model is talking about relational or Multidimensional data models. Both database architectures can be selected to create departmental data marts, but the way to access the data in the databases is different:
• To access data from a relational database, common access methods like SQL or middleware products like ODBC can be used.
• Multidimensional databases require specialized APIs to access the usually proprietary database architecture.
Physical data model has following key attributes:
o Specification of all tables and columns.
o Foreign keys are used to identify relationships between tables.
o Denormalization may occur based on user requirements.
o Physical considerations may cause the physical data model to be quite different from the logical data model.
o At this level, the data modeler will specify how the logical data model will be realized in the database schema.
The steps for physical data model design are as follows:
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.
Logical data model
A logical data model is a graphical representation of the information requirements of a business area, it is not a database. The word logical is very critical because it modifies the phrase data modeling to a more specific activity. A logical data model is independent of a physical, data storage device.
This is the key concept of the logical data model. The reason that a logical data model must be independent of technology is simply because technology is changing so rapidly.
In addition to the previously mentioned physical database model, a logical data model plays key pivotal role in assuring the real world objects/processes being mapped into BI solutions. There are different approaches for Logical Data Modeling.
• Star-Join Schema
• 3 NF Schema
Start-Join Schema is the most commonly used logical database model. The Star-Join Schema consists of two components:
• Fact tables
• Dimension tables
A very generic definition for those two components of the Star-Join
Schema is:
• Fact Tables — “what are we measuring?”
Contain the basic transaction-level information of the business that is of interest to a particular application. In marketing analysis, for example, this is the basic sales transaction data. Fact tables are large, often holding millions of rows, and mainly numerical.
• Dimension Tables — “by what are we measuring?”
Contain descriptive information and are small in comparison to the fact tables. In a marketing analysis application, for example, typical dimension tables include time period, marketing region, product type etc.
A Logical data model has following key attributes:
o Includes all entities and relationships among them.
o All attributes for each entity are specified.
o The primary key for each entity specified.
o Foreign keys (keys identifying the relationship between different entities) are specified.
o Normalization occurs at this level.
In Logical Data Model, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be physically implemented in the database.
In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step (deliverable).
Steps for designing the logical data model are as follows:
1. Identify all entities.
2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.
Metadata information
Metadata is information that describes another set of data. In a business intelligence (BI) platform, metadata links information in a data store to business entities and rules that define a BI application. Metadata structures the information in the data warehouse in categories, topics, groups, hierarchies and so on. It helps us by providing information about the data within a data warehouse. Few examples are as follows:
• It can be “Subject oriented”, based on abstractions of real-world entities like (’project’, ‘customer’, organization’, ‘MSISDN’, ‘Billing Account’)
• It defines the way in which the transformed data is to be interpreted, for example (’5/9/99′ = 5th September 1999 or 9th May 1999 — British or US, 0 for Female, 1 for Male etc.)
• Gives information about related data in the Data Warehouse.
• Estimates response time by showing the number of records to be processed in a query.
• Holds calculated fields and pre-calculated formulas to avoid misinterpretation, and contains historical changes of a view.
From a data warehouse administrator’s perspective, metadata is a full repository and documentation of all contents and all processes in the data warehouse, whereas, from an end user perspective, metadata is the roadmap through the information in the data warehouse.
Stay tunned…..