Master of Science in Management & Systems
Data Warehousing and Data Mining

MASY1-GC3510

Professor: Sam Sultan [sam.sultan@nyu.edu]
Class website: [oit2.scps.nyu.edu/~sultans/dwdm] (or) [samsultan.com/dwdm]
Office hours: By Appointment
Course Days: Tuesdays & Thursdays
Course Hours: 6:00pm - 9:00pm


COURSE DESCRIPTION:

This course addresses the concepts, skills, methodologies, and models of data warehousing. It covers proper techniques for designing data warehouses for various business domains, as well as concepts for potential uses of the data warehouse and other data repositories in data mining.


COURSE LEARNING GOALS:

1. Course Objectives:

In today's organizations, the data warehouse is the center of the information system's knowledge repository. Data warehousing supports informational processing by providing a solid platform of integrated, historical data from which to perform enterprise-wide data analysis. This helps improve profit and guide strategic decision making.

Data mining is a recent advancement in data analysis. It exploits the knowledge held in enterprise data warehouses and other data stores by examining the data to reveal untapped patterns that suggest ways to improve product quality, customer satisfaction and retention, and profit potential.

This course will cover the concepts and methodologies of both data warehousing and data mining.

       The focus of the course will be on the following topics:

2. Student Learning Outcomes:


COURSE REQUIREMENTS AND POLICIES:

See [Requirements and Policies]


BOOKS:

Required Reading & Materials -

Recommended Reading & Materials -

GRADE ASSIGNMENT AND EVALUATION:

Contributing factors for determining your course grade include:

See [Details of Assignment and Evaluation] and [NYU SPS Grading Scale]

Grades are FINAL.

Please do not negotiate for a better grade. If you expect to receive a grade of "A" at the end of the semester, then I expect you to attend all sessions (unless I am notified ahead of time), to participate in these sessions, to keep up with the class reading material, and to complete your homework assignments. This will ensure that you stay current with the class content, and will help you earn a good grade on your test(s) and project, and therefore a good final grade.


COURSE OUTLINE:

DATE SESSION TOPIC[s] COVERED
 
[Week 1] 1
  • Introduction to Data Warehousing
  • Relationship of Data Mining and Data Warehousing
  • What is a Data Warehouse?
  • Data Warehousing ROI
  • DSS - Decision Support Systems
  • Operational vs. Analytical Systems
  • Evolution of DSS and Data Warehousing
  • OLTP - Online Transaction Processing
  • Characteristics of a Data Warehouse
  • What is a Data Mart? Creating a Data Mart
  • Data Comparison Chart
  • OLAP - Online Analytical Processing
  • Reading: Chapter 1 (both The Data Warehouse Toolkit and Building the Data Warehouse);
    skim through the Glossary (The Data Warehouse Lifecycle Toolkit)
     
[Week 1] 2
  • Planning & Building the Data Warehouse
  • Sponsorship and Cost Justification
  • Project Prerequisites
  • Barriers, Challenges and Risks
  • Preparing for Implementation
  • Developing the Data Warehouse
  • SDLC Methodologies - Waterfall vs. RUP Approach
  • Planning & Project Management
  • Analysis
  • Logical & Physical Design
  • Implementation and Deployment
  • Operations
  • Reading: Chapters 1, 2 (The Data Warehouse Lifecycle Toolkit)
     
[Week 2] 3
  • Data Warehouse Design
  • Drivers for Multi-Dimensional Analysis
  • Limitations of Relational Models
  • The Data Cube
  • What is dimensional modeling?
  • Advantages of Dimensional Models
  • Logical and Physical Design
  • Data Normalization
  • Benefits and Drawbacks of Data Normalization
  • De-Normalizing of Data
  • Characteristics of a Data Warehouse
  • Subject Oriented, Integrated, Time Variant, Non-Volatile
  • The Star Schema
  • Reading: Chapter 6 (The Data Warehouse Lifecycle Toolkit)
     
[Week 2] 4
  • Data Warehouse Schemas
  • Dimensions and Dimension Tables
  • Facts and Fact Tables
  • The Star Schema
  • The Snowflake Schema
  • Degenerate and Junk Dimensions
  • The Data Warehouse Bus Architecture
  • Conformed Dimensions and Standard Facts
  • Data Granularity
  • Changing Dimensions
  • Reading: Chapter 6 (The Data Warehouse Lifecycle Toolkit)
     
[Week 3] 5
  • Components of a Data Warehouse
  • Source Systems, Staging Area, Presentation, Access Tools
  • Building the Data Matrix
  • The Four-Step Process
  • Multiple Fact Tables in a single Data Mart
  • Chain, Heterogeneous, Transaction/Snapshot & Aggregate Facts
  • Fact and Dimension Table Detail
  • Identifying Source for each Fact & Dimension
  • Mapping from Source to Target
  • Reading: Chapters 7, 4 (The Data Warehouse Lifecycle Toolkit)
     
[Week 3] 6
  • The ETL Process
  • Extracting the Data into the Staging Area
  • The Challenge of Extracting from Disparate Platforms
  • Full vs. Incremental Extracts
  • Detecting Changes to Data
  • Transforming the Data
  • Complexity of Data Integration
  • Dealing with Missing & Dirty Data
  • Data Transformation Tasks
  • Loading the Data
  • Timing and Job Control of Data Loads
  • Reading: Chapter 9 (The Data Warehouse Lifecycle Toolkit)
     
[Week 4] 7
  • Midterm Exam

  • Aggregating Data
  • Goals and Risks of Data Aggregation
  • Deciding What to Aggregate
  • Data Sparsity
  • Design Requirements for Aggregates
  • The Problem with Aggregates
  • Aggregate Navigators
  • Reading: Chapter 8, pp. 353-357 (The Data Warehouse Lifecycle Toolkit)
     
[Week 4] 8a
  • Selecting the Business Subject
  • Declaring the Grain
  • Choosing the Dimensions
  • Identifying the Facts
  • Avoiding Null Keys
  • Retail Market Basket Analysis
  • Additive and Semi-Additive Facts
  • The Value Chain Integrated Inventory Model
  • Order Management Data Marts
  • Date and Other Dimension Role Playing
  • Allocation to Lower Level Facts
  • Profit and Loss Data Marts
  • Reading: Chapters 2, 3, 5 (The Data Warehouse Toolkit)
     
[Week 4] 8b
  • CRM Overview
  • Customer Dimension
  • Demographic Dimension Outriggers
  • Date Dimension Outriggers
  • Large Changing Customer Dimension
  • Mini-Dimensions
  • Commercial Customer Hierarchies
  • Fixed vs. Variable Level Hierarchies
  • General Ledger Accounting
  • OLAP's Role in G/L and Chart of Accounts
  • Time Stamped Employee Dimensions
  • Reading: Chapters 6, 7, 8 (The Data Warehouse Toolkit)
     
[Week 5] 9
  • Clickstream and Web Based Data Warehouses
  • Overview of Web Based Interaction
  • Challenges of Tracking Data
  • Creating Persistent State on the Web
  • Techniques for Tracking States
  • Working with Cookies
  • User Registration
  • Web Server Log Files
  • Online Advertising
  • Online Page Tracking and Analytics
  • User Dimension and Page Hits Facts
  • Reading: Chapter 15 (The Data Warehouse Toolkit)
     
[Week 5] 10
  • Data Mining
  • What is Data Mining Good For?
  • Statistics, Artificial Intelligence & Machine Learning
  • Data Mining Examples and Tools
  • Connection between Data Mining and Data Warehousing
  • Retrospective Reporting vs. Predictive
  • Data Mining Applications
  • Data Mining vs. Statistics vs. OLAP
  • Data Mining Statistical Techniques (Sampling, Regression & Decision Trees)
  • Clustering, Segmentation and Nearest Neighbor Techniques
  • Keys to Commercial Success of Data Mining
  • Reading: Online
     
[Week 6] 11
  • Data Mining Techniques
  • Hands-on Presentation and Lab
  • Classification, Regression, Similarity Matching, Co-occurrence Grouping
  • Predictive Modeling
  • Clustering/Segmentation
  • Data Mining and Statistics Terminologies
  • Supervised vs. Unsupervised
  • Tree Induction
  • Entropy and Information Gain
  • Reading: Online
     
[Week 6] 12
  • Final Exam
  • Final Project Due


All contents © Sam Sultan.
NYU SPS Master's Degree Program web site
For more information, send e-mail to: sam.sultan@nyu.edu