Asset v1 HKUSTx+MSBD5002x+1T2022+Type@Asset+Block@01 Overview

The document outlines the MSBD5002 course on Data Mining and Knowledge Discovery, detailing major topics such as Association, Clustering, Classification, Data Warehousing, and Web Databases. It includes references to key textbooks and provides examples of concepts like frequent patterns in association rules and decision trees in classification. The course aims to equip students with essential data mining techniques and their applications.


MSBD5002 (Micromaster)

Data Mining and Knowledge Discovery
Overview

Prepared by Raymond Wong


Presented by Raymond Wong
raywong@cse

MSBD5002 1
Course Details
• Reference books/materials:
  • Papers

MSBD5002 2
Course Details
• Data Mining: Concepts and Techniques. Jiawei Han and Micheline Kamber. Morgan Kaufmann Publishers (3rd edition)
• Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Boston: Pearson Addison Wesley (2006)

MSBD5002 3
Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Web Databases

MSBD5002 4
1. Association
Customer   Apple    Orange   Milk
Raymond    Apple    Orange
Ada                 Orange   Milk
Grace      Apple    Orange
…          …        …        …

We are interested in the items/itemsets with frequency >= 2.

Items/Itemsets     Frequency
Apple              2           Frequent Pattern (or Frequent Item)
Orange             3           Frequent Pattern (or Frequent Item)
Milk               1
{Apple, Orange}    2           Frequent Pattern (or Frequent Itemset)
{Orange, Milk}     1

MSBD5002 5
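The frequency table on this slide can be reproduced with a few lines of Python. This is a minimal sketch that simply enumerates every item and pair in each basket; it is not an efficient mining algorithm, and the names `transactions`, `MIN_FREQUENCY` and `count_itemsets` are illustrative assumptions, not part of the course material. Only the three customers shown on the slide are included.

```python
from collections import Counter
from itertools import combinations

# The three customers listed on the slide (the "…" rows are not included).
transactions = {
    "Raymond": {"Apple", "Orange"},
    "Ada": {"Orange", "Milk"},
    "Grace": {"Apple", "Orange"},
}
MIN_FREQUENCY = 2  # threshold from the slide: frequency >= 2


def count_itemsets(baskets, max_size=2):
    """Count how many baskets contain each itemset of size 1..max_size."""
    counts = Counter()
    for basket in baskets:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(basket), size):
                counts[frozenset(itemset)] += 1
    return counts


counts = count_itemsets(transactions.values())
# Keep only the itemsets that reach the frequency threshold.
frequent = {s: c for s, c in counts.items() if c >= MIN_FREQUENCY}

for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With the three baskets above, the items Apple and Orange and the itemset {Apple, Orange} pass the threshold, while Milk and {Orange, Milk} do not.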
1. Association
Customer   Apple    Orange   Milk
Raymond    Apple    Orange
Ada                 Orange   Milk
Grace      Apple    Orange
…          …        …        …

We are interested in the items/itemsets with frequency >= 2.

Items/Itemsets     Frequency
Apple              2
Orange             3
Milk               1
{Apple, Orange}    2
{Orange, Milk}     1

Association Rule:
1. Apple → Orange (100%: customers who buy apple will probably buy orange.)
2. Orange → Apple (2/3 = 67%: customers who buy orange will probably buy apple.)

Problem: to find all frequent patterns and association rules

MSBD5002 6
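The percentages attached to the two rules can be checked by dividing the frequency of the combined itemset by the frequency of the rule's left-hand side (a ratio usually called the confidence of the rule). The sketch below assumes only the three customers shown on the slide; `frequency` and `rule_percentage` are hypothetical helper names.

```python
# Baskets of the three customers shown on the slide.
transactions = [
    {"Apple", "Orange"},   # Raymond
    {"Orange", "Milk"},    # Ada
    {"Apple", "Orange"},   # Grace
]


def frequency(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def rule_percentage(lhs, rhs, baskets):
    """Share of the baskets containing lhs that also contain rhs."""
    return frequency(lhs | rhs, baskets) / frequency(lhs, baskets)


apple_to_orange = rule_percentage({"Apple"}, {"Orange"}, transactions)
orange_to_apple = rule_percentage({"Orange"}, {"Apple"}, transactions)

print(f"Apple -> Orange: {apple_to_orange:.0%}")
print(f"Orange -> Apple: {orange_to_apple:.0%}")
```

Apple appears in 2 baskets and both contain Orange (2/2), whereas Orange appears in 3 baskets and only 2 contain Apple (2/3).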
2. Clustering
           Computer   History
Raymond    100        40
Louis      90         45
Wyman      20         95
…          …          …

[Plot of History against Computer showing two clusters]
Cluster 1 (e.g. High Score in Computer and Low Score in History)
Cluster 2 (e.g. High Score in History and Low Score in Computer)

Problem: to find all clusters

MSBD5002 8
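The slide does not name a clustering algorithm. As one possible illustration, the sketch below runs a small k-means loop on the three students from the table. The choice of k-means, the two starting centers in `initial`, and the helper names are all assumptions made for this example.

```python
import math

# (Computer, History) scores from the slide's table.
scores = {
    "Raymond": (100, 40),
    "Louis": (90, 45),
    "Wyman": (20, 95),
}


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points and recomputing the centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for point in points:
            groups[nearest(point, centers)].append(point)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else old
            for group, old in zip(groups, centers)
        ]
    return centers


# Assumed starting centers: one near "high Computer", one near "high History".
initial = [(100, 0), (0, 100)]
centers = kmeans(list(scores.values()), initial)
labels = {name: nearest(point, centers) + 1 for name, point in scores.items()}
print(labels)
```

Under these assumptions Raymond and Louis fall into cluster 1 (high Computer, low History) and Wyman into cluster 2 (high History, low Computer), matching the grouping described on the slide.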
3. Classification
Suppose there is a person.
Race     Income   Child   Insurance
white    high     no      ?

Decision tree
root
  child=yes
    100% Yes
    0% No
  child=no
    Income=high
      100% Yes
      0% No
    Income=low
      0% Yes
      100% No

MSBD5002 10
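To see how such a tree is used, the sketch below encodes the decision tree as nested dictionaries and walks it for the person on the slide. The placement of the leaves (a leaf under child=yes, the Income split under child=no) is read from the slide layout and is an assumption; `tree` and `predict` are illustrative names.

```python
# Decision tree from the slide. Leaf placement is an assumption recovered
# from the slide layout: child=yes is a leaf, child=no splits on income.
tree = {
    "attribute": "child",
    "branches": {
        "yes": {"Yes": 1.0, "No": 0.0},
        "no": {
            "attribute": "income",
            "branches": {
                "high": {"Yes": 1.0, "No": 0.0},
                "low": {"Yes": 0.0, "No": 1.0},
            },
        },
    },
}


def predict(node, record):
    """Follow the branches matching the record until a leaf is reached."""
    while "attribute" in node:
        node = node["branches"][record[node["attribute"]]]
    # A leaf maps each class label to its share; return the majority label.
    return max(node, key=node.get)


person = {"race": "white", "income": "high", "child": "no"}
print(predict(tree, person))
```

Note that the race attribute is never tested by this tree, so it has no effect on the prediction.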
4. Data Warehouse
Without a data warehouse:
Users -- Query --> Databases
Need to wait for a long time (e.g., 1 day to 1 week)

With a data warehouse:
Databases --> Data Warehouse --> Users
Pre-computed results

MSBD5002 12
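The idea of pre-computed results can be sketched in a few lines: instead of scanning every record for each query, totals are aggregated once and later queries become lookups. The `sales` records, the per-region total, and the function names below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical operational records: (region, product, amount).
sales = [
    ("north", "apple", 10),
    ("north", "orange", 5),
    ("south", "apple", 7),
    ("south", "milk", 3),
]


def total_by_scan(records, region):
    """Answer a query by scanning every record (the slow path)."""
    return sum(amount for r, _, amount in records if r == region)


def build_summary(records):
    """Pre-compute the total per region once, ahead of any query."""
    summary = defaultdict(int)
    for region, _, amount in records:
        summary[region] += amount
    return dict(summary)


warehouse = build_summary(sales)

# A query against the summary is a lookup and agrees with the full scan.
assert warehouse["north"] == total_by_scan(sales, "north")
print(warehouse["north"])
```

The trade-off in this sketch is that the summary must be rebuilt or updated when the underlying records change.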
5. Web Databases

Raymond Wong

MSBD5002 14
How to rank the webpages?

MSBD5002 15
