History of Data Warehousing
- Data warehouses extend the transformation of data into information.
- In the 1990s, executives became less concerned with day-to-day business operations and more concerned with overall business function.
- The data warehouse provided the ability to support decision making without disrupting the day-to-day operations.
Data warehouse - a logical collection of information – gathered from many different operational databases – that supports business analysis activities and decision-making tasks.
- The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes.
- The primary difference between a database and a data warehouse is that a database stores information for a single application, whereas a data warehouse stores information from multiple databases or applications, plus external information such as industry data. This enables cross-functional analysis, industry analysis, market analysis and more, all from a single repository. Data warehouses support only online analytical processing (OLAP).
Extraction, transformation and loading (ETL) - a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions and loads the information into a data warehouse.
- The ETL process gathers data from the internal and external databases and passes it to the data warehouse.
- The ETL process also gathers data from the data warehouse and passes it to the data marts.
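The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the two source systems, their field names and the "common enterprise definitions" (one region vocabulary, amounts in dollars) are all hypothetical, and the warehouse is just a dictionary of lists.

```python
# Extract: two hypothetical operational sources with inconsistent conventions.
def extract():
    sales_db = [
        {"order_id": 1, "region": "N", "amount_cents": 12500},
        {"order_id": 2, "region": "S", "amount_cents": 8000},
    ]
    web_db = [
        {"id": "W-7", "region_name": "north", "amount": 42.0},
    ]
    return sales_db, web_db

# Hypothetical common enterprise definitions: one region vocabulary, dollars.
REGION_CODES = {"N": "North", "S": "South", "north": "North", "south": "South"}

def transform(sales_db, web_db):
    """Map every source record onto the same schema and units."""
    rows = []
    for r in sales_db:
        rows.append({
            "source": "sales",
            "order_id": str(r["order_id"]),
            "region": REGION_CODES[r["region"]],
            "amount": r["amount_cents"] / 100,  # cents -> dollars
        })
    for r in web_db:
        rows.append({
            "source": "web",
            "order_id": r["id"],
            "region": REGION_CODES[r["region_name"]],
            "amount": r["amount"],
        })
    return rows

def load(rows, warehouse):
    """Append the conformed rows to a warehouse fact table (a plain list here)."""
    warehouse.setdefault("sales_fact", []).extend(rows)
    return warehouse

warehouse = load(transform(*extract()), {})
```

Because both sources now share one vocabulary, a cross-source question such as "total sales in the North region" becomes a single query over `warehouse["sales_fact"]` instead of two queries with different rules.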
Data mart - contains a subset of data warehouse information.
- The data warehouse modeled in the above figure compiles information from internal (transactional/operational) databases and external databases through ETL.
- It then sends subsets of information to the data marts through the ETL process.
- Databases contain information in a series of two-dimensional tables.
- In a data warehouse and data mart, information is multidimensional: it contains layers of columns and rows.
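Populating a data mart from the warehouse is essentially another extract step that keeps only the rows and columns one group of users needs. The sketch below assumes a hypothetical warehouse table and a hypothetical regional mart; real tools do this with SQL views or scheduled ETL jobs.

```python
# Hypothetical warehouse fact rows.
warehouse_sales = [
    {"region": "North", "store": "Store 1", "product": "A", "units": 10},
    {"region": "South", "store": "Store 2", "product": "B", "units": 4},
    {"region": "North", "store": "Store 3", "product": "B", "units": 7},
]

def build_data_mart(rows, region, columns):
    """Return a subset of warehouse rows: one region, selected columns only."""
    return [{c: r[c] for c in columns} for r in rows if r["region"] == region]

# A hypothetical mart for the North region's managers.
north_mart = build_data_mart(warehouse_sales, "North", ["store", "product", "units"])
```

The mart is smaller and simpler than the warehouse, which is the point: its users see only the slice of information relevant to them.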
Dimension - a particular attribute of information.
- Dimensions could include products, promotions, stores, category, region, stock price, date, time and weather.
- The ability to look at information from different dimensions can add tremendous business insight.
- By slicing and dicing the information, a business can uncover unexpected insights.
Cube - common term for the representation of multidimensional information.
- Users can slice and dice the cube to drill down into the information.
- Cube A represents store information (the layers), product information (the rows), and promotion information (the columns).
- Cube B represents a slice of information displaying promotion II for all products at all stores.
- Cube C represents a slice of information displaying promotion III for product B at store 2.
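The cubes described above can be modeled as a mapping from a (store, product, promotion) coordinate to a value. This is a sketch under stated assumptions: the three stores, three products and three promotions, and the unit-sales numbers (simply 1 to 27 in order), are made up for illustration. Fixing one dimension gives a slice like Cube B; fixing all three picks out a single cell like Cube C.

```python
import itertools

STORES = ["Store 1", "Store 2", "Store 3"]  # layers
PRODUCTS = ["A", "B", "C"]                  # rows
PROMOTIONS = ["I", "II", "III"]             # columns

# Hypothetical unit-sales figures: one number (1..27) per cell, in order.
cube = {
    key: units
    for key, units in zip(itertools.product(STORES, PRODUCTS, PROMOTIONS), range(1, 28))
}

def slice_cube(cube, store=None, product=None, promotion=None):
    """Keep only the cells matching every fixed dimension; None means 'all'."""
    return {
        (s, p, m): value
        for (s, p, m), value in cube.items()
        if (store is None or s == store)
        and (product is None or p == product)
        and (promotion is None or m == promotion)
    }

# Cube B: promotion II for all products at all stores (9 cells).
cube_b = slice_cube(cube, promotion="II")
# Cube C: promotion III for product B at store 2 (1 cell).
cube_c = slice_cube(cube, store="Store 2", product="B", promotion="III")
```

Summing the values of a slice (for example `sum(cube_b.values())` for total promotion II sales) is the kind of aggregation an OLAP tool performs when a user drills into the cube.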
Data mining - the process of analyzing data to extract information not offered by the raw data alone.
- To perform data mining, users need data-mining tools.
Data-mining tool - uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making.
- An organization must maintain high-quality data in the data warehouse.
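Association-rule mining is one common technique a data-mining tool might use to find relationships and infer rules. The sketch below is a deliberately tiny version that only considers single-item rules "X -> Y" over hypothetical shopping baskets; support is the fraction of baskets containing both items, and confidence is the fraction of baskets containing X that also contain Y. The thresholds are arbitrary assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "eggs"},
]

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (x, y, support, confidence) for rules x -> y between single items."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

rules = pair_rules(transactions)
```

Here the rule "butter -> bread" has confidence 1.0: every hypothetical basket with butter also contains bread, a pattern that could guide decisions such as product placement. Real tools (for example, Apriori-style algorithms) extend this to larger item sets.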
Information cleansing or scrubbing - a process that weeds out and fixes or discards inconsistent, incorrect or incomplete information.
- Contact information in an operational system.
- Standardizing customer names from operational systems.
- Information cleansing activities.
- Information cleansing allows an organization to fix these types of inconsistencies and clean the data in the data warehouse.
- The result is accurate and complete information.
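The cleansing activities listed above can be illustrated with a small Python sketch that standardizes customer names and phone numbers pulled from several operational systems. The records, system names and rules are hypothetical assumptions: names become "First Last", phone numbers keep digits only, an incomplete phone is flagged as missing (`None`), and records with the same standardized name are merged.

```python
import re

# Hypothetical customer records from three operational systems.
raw_customers = [
    {"system": "billing", "name": "SMITH, JOHN", "phone": "(555) 123-4567"},
    {"system": "crm", "name": "john  smith", "phone": "555.123.4567"},
    {"system": "shipping", "name": "Jane Doe", "phone": ""},
]

def standardize_name(name):
    """Convert 'LAST, FIRST' or 'first last' into 'First Last'."""
    name = " ".join(name.split())  # collapse repeated whitespace
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return name.title()

def standardize_phone(phone):
    """Keep digits only; return None when the number is missing or incomplete."""
    digits = re.sub(r"\D", "", phone)
    return digits if len(digits) == 10 else None

def cleanse(records):
    """Merge records sharing a standardized name; keep the first valid phone."""
    cleaned = {}
    for r in records:
        name = standardize_name(r["name"])
        phone = standardize_phone(r["phone"])
        entry = cleaned.setdefault(name, {"name": name, "phone": None, "sources": []})
        entry["sources"].append(r["system"])
        if entry["phone"] is None:
            entry["phone"] = phone
    return list(cleaned.values())

customers = cleanse(raw_customers)
```

Matching on the standardized name alone is a simplification; real cleansing tools usually combine several fields (address, email, customer ID) before deciding two records describe the same customer.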
Business Intelligence
Business Intelligence - information that people use to support their decision-making efforts.
- Principal BI enablers include technology, people and culture.