Thursday, April 3, 2014

Chapter 8: Accessing Organizational Information - Data Warehouse



History of Data Warehousing
  • Data warehouses extend the transformation of data into information.
  • In the 1990s, executives became less concerned with day-to-day business operations and more concerned with overall business functions.
  • The data warehouse provided the ability to support decision making without disrupting the day-to-day operations.

Data Warehouse Fundamentals
Data warehouse - a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks.
  • The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes.
  • The primary difference between a database and a data warehouse is that a database stores information for a single application, whereas a data warehouse stores information from multiple databases, or multiple applications, plus external information such as industry information. This enables cross-functional analysis, industry analysis, market analysis, etc., all from a single repository.
  • Data warehouses support only analytical processing (OLAP).

Extraction, transformation and loading (ETL) - a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse.

  • The ETL process gathers data from the internal and external databases and passes it to the data warehouse.
  • The ETL process also gathers data from the data warehouse and passes it to the data marts.
Data mart - contains a subset of data warehouse information.


    • The data warehouse modeled in the above figure compiles information from internal databases or transactional/operational databases and external databases through ETL.
    • It then sends subsets of information to the data marts through the ETL process.
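The ETL flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a real ETL tool: the source records, field names (`cust`, `amount_usd`, `customer_name`), and the `REGION_CODES` mapping are all invented for the example.

```python
# Hypothetical source records; every name and value here is illustrative.
sales_db = [  # stands in for an internal operational database
    {"cust": "acme corp", "amount_usd": "120.50", "region": "E"},
    {"cust": "beta supply", "amount_usd": "80.00", "region": "W"},
]
external_feed = [  # stands in for an external database
    {"customer_name": "ACME CORP", "amount": 45.0, "region": "East"},
]

REGION_CODES = {"E": "East", "W": "West"}  # assumed enterprise definition


def extract():
    """Gather raw rows from each source, tagged with where they came from."""
    rows = [("sales_db", r) for r in sales_db]
    rows += [("external_feed", r) for r in external_feed]
    return rows


def transform(source, row):
    """Map a source-specific row onto one common set of field definitions."""
    if source == "sales_db":
        return {
            "customer": row["cust"].title(),
            "amount": float(row["amount_usd"]),
            "region": REGION_CODES[row["region"]],
        }
    return {
        "customer": row["customer_name"].title(),
        "amount": float(row["amount"]),
        "region": row["region"],
    }


def load(target, rows):
    """Append transformed rows to the target store."""
    target.extend(rows)


# Sources -> data warehouse
warehouse = []
load(warehouse, [transform(src, row) for src, row in extract()])

# Data warehouse -> data mart (a subset, here only the East region)
east_mart = []
load(east_mart, [r for r in warehouse if r["region"] == "East"])
```

After the run, `warehouse` holds all three records in one common format, and `east_mart` holds only the two East-region records.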


    Multidimensional Analysis and Data Mining
    • Databases contain information in a series of two-dimensional tables. 
    • In a data warehouse and data mart, information is multidimensional: it contains layers of columns and rows.

    Dimension - a particular attribute of information.

    • Dimensions could include products, promotions, stores, category, region, stock price, date, time and weather.
    • The ability to look at information from different dimensions can add tremendous business insight.
    • By slicing and dicing the information, a business can uncover unexpected insights.

    Cube - common term for the representation of multidimensional information.

    • Users can slice and dice the cube to drill down into the information.
    • Cube A represents store information (the layers), product information (the rows), and promotion information (the columns).
    • Cube B represents a slice of information displaying promotion II for all products at all stores.
    • Cube C represents a slice of information displaying promotion III for product B at store 2.
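As a rough sketch of the three cubes described above, the cube can be modeled as a nested dictionary keyed by store (layer), product (row), and promotion (column). The labels and the unit counts are made up for illustration, and the helper names `slice_by_promotion` and `cell` are hypothetical.

```python
stores = ["Store 1", "Store 2", "Store 3"]
products = ["Product A", "Product B", "Product C"]
promotions = ["Promo I", "Promo II", "Promo III"]

# cube[store][product][promotion] -> units sold (made-up numbers that
# encode the position, e.g. Store 2 / Product B / Promo III -> 223)
cube = {
    s: {
        p: {
            m: (si + 1) * 100 + (pi + 1) * 10 + (mi + 1)
            for mi, m in enumerate(promotions)
        }
        for pi, p in enumerate(products)
    }
    for si, s in enumerate(stores)
}


def slice_by_promotion(cube, promotion):
    """Fix one promotion; keep every store and every product (like Cube B)."""
    return {
        s: {p: by_promo[promotion] for p, by_promo in by_product.items()}
        for s, by_product in cube.items()
    }


def cell(cube, store, product, promotion):
    """Fix all three dimensions to reach a single value (like Cube C)."""
    return cube[store][product][promotion]


cube_b = slice_by_promotion(cube, "Promo II")
cube_c = cell(cube, "Store 2", "Product B", "Promo III")
```

Here `cube_b` is a two-dimensional store-by-product table for one promotion, while `cube_c` is the single value left after fixing all three dimensions.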

    Data mining - the process of analyzing data to extract information not offered by the raw data alone.

    • To perform data mining, users need data-mining tools.

    Data-mining tool - uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making. 
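The notes do not name a specific technique, so the following is only one simple illustration of "finding patterns and relationships": counting which pairs of items appear together in the same purchase. The transactions and the `min_support` threshold are invented, and real data-mining tools use far more sophisticated methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; the items are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]


def pair_counts(transactions):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(transactions, min_support):
    """Return pairs found in at least `min_support` (a fraction) of baskets."""
    n = len(transactions)
    return {
        pair: count / n
        for pair, count in pair_counts(transactions).items()
        if count / n >= min_support
    }


patterns = frequent_pairs(transactions, min_support=0.75)
```

With this data, only the bread-and-milk pair reaches the 75% threshold, a relationship that is not visible from any single transaction alone.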

    Information Cleansing or Scrubbing

    • An organization must maintain high-quality data in the data warehouse.
    Information cleansing or scrubbing - a process that weeds out and fixes or discards inconsistent, incorrect or incomplete information.
    • Figure: Contact information in an operational system.
    • Figure: Standardizing customer name from operational systems.
    • Figure: Information cleansing activities.
    • Information cleansing allows an organization to fix these types of inconsistencies and clean the data in the data warehouse.
    • Figure: Accurate and complete information.
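A minimal sketch of information cleansing, assuming customer records arrive from operational systems with inconsistent name and phone formats. The records, the 10-digit phone rule, and the function names are assumptions made for this example, not part of the chapter.

```python
# Hypothetical records from operational systems; the data is invented.
raw_customers = [
    {"name": "  jane  SMITH ", "phone": "(555) 123-4567"},
    {"name": "Jane Smith", "phone": "555.123.4567"},  # same person, other format
    {"name": "Bob Lee", "phone": ""},                 # incomplete record
]


def standardize_name(name):
    """Collapse extra whitespace and use consistent capitalization."""
    return " ".join(part.capitalize() for part in name.split())


def standardize_phone(phone):
    """Keep digits only; assumes a valid number has exactly 10 digits."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits if len(digits) == 10 else None


def cleanse(records):
    """Fix inconsistent records and discard incomplete or duplicate ones."""
    clean, seen = [], set()
    for rec in records:
        name = standardize_name(rec["name"])
        phone = standardize_phone(rec["phone"])
        if not name or phone is None:
            continue  # discard incomplete or incorrect information
        key = (name, phone)
        if key in seen:
            continue  # discard a duplicate of an already cleansed record
        seen.add(key)
        clean.append({"name": name, "phone": phone})
    return clean


clean_customers = cleanse(raw_customers)
```

The three raw records reduce to a single standardized record for Jane Smith: the second is a duplicate once formats match, and the third is dropped for a missing phone number.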



    Business Intelligence

    Business Intelligence - information that people use to support their decision-making efforts.
    • Principal BI enablers include technology, people and culture.
