ASAP COURSES
Data Mining
Course Contents
1.1 Understand the importance of data mining and the knowledge discovery that can be made from information repositories in a business. (1 hr)
1.2 Understand techniques of preparing real-world data for performing data mining.
Data pre-processing · ETL, ELT · Data janitor · Data mart · Data warehousing (2 hrs)
1.3.1 Introduction to Supervised and Unsupervised Learning (1 hr)
1.3.1.1 Bias–Variance trade-off (1 hr)
1.3.1.2 How to handle unstructured data (2 hrs)
1.3.1.3 How to handle unbalanced data (2 hrs)
1.3.1.4 Model validation (2 hrs)
Practicals: 9 hrs
Practicals: 6 hrs
1.3.4 Decision Trees (2 hrs)
1.3.5 Random Forest (2 hrs)
1.3.6 Neural Networks, ANN (1 hr)
1.3.6.1 Basics of Deep Learning Models (2 hrs)
1.3.6.2 CNN, RNN, LSTM (1 hr)
1.3.7 Support Vector Machines (1 hr)
1.3.8 Boosting, Bagging (1 hr)
Practicals: 14 hrs
1.4.1 Simple Linear Regression (1 hr)
1.4.2 Logistic Regression (1 hr)
1.4.3 Linear Discriminant Analysis (1 hr)
1.4.4 Text Analytics (1 hr)
1.4.5 Web Scraping (1 hr)
1.4.6 Image Analysis (1 hr)
Practicals: 16 hrs
• Market Basket Analysis (MBA)
• Apriori Algorithm
• Mining Association Rules
Practicals: 2 hrs
• A/B Testing
• Discriminative Testing
• Analytical and Affective Testing
• Perception Testing
Practicals: 1 hr
Data Mining: Knowledge Discovery from Business Repositories
Understand the importance of data mining and the knowledge discovery process in business decision making.
Objective: Introduce data mining, knowledge discovery in business, and the interpretation of analytical results in real-world settings.
Today’s Topics
- Introduction & Business Use Cases
- Data Quality & Preprocessing
- Classification & Clustering
- Association Rule Mining
- Evaluation & Deployment
1. Why Data Mining?
We live in the data age. Every second, massive volumes of raw data are generated from businesses, sensors, devices, online services, science, medicine, and social platforms. This overwhelming data growth demands powerful tools that can automatically extract patterns, insights, and knowledge.
Why Mine Data?
- Data volumes are too large for manual analysis.
- Businesses need fast and accurate decision-making.
- Hidden patterns must be discovered automatically.
- Data mining supports prediction, classification, and trend discovery.
2. Moving Toward the Information Age
People say we live in the “information age,” but in truth we live in the data age. Scientific tools, sensors, web systems, businesses, and telecommunications produce enormous data streams daily. Automated, intelligent tools are needed to convert this raw data into meaningful knowledge.
Examples of Huge Data Sources
- Retail transactions (e.g., hundreds of millions of transactions per week at a large retailer)
- Scientific experiments and sensor networks
- Engineering and industrial process data
- Medical records, imaging, and patient monitoring
- Telecommunication networks moving petabytes daily
- Web logs, online platforms, social media activity
3. Data Mining as the Evolution of Information Technology
Data mining did not appear suddenly; it is the natural outcome of decades of progress in information technology. As data volumes grew, IT shifted from simple file processing to database management and then to advanced data analysis.
Major Evolutionary Stages
- Data Collection & Database Creation – early file systems, basic storage.
- Data Management – indexing, retrieval, OLTP, relational systems.
- Advanced Data Analysis – data warehousing, OLAP, data mining.
4. Early Growth: Data Collection → Data Management
From the 1960s to the 1980s, the focus shifted from basic data collection to structured data management. This phase established the foundation required for modern analytical technologies.
- Hierarchical and network database systems laid the early groundwork.
- Relational databases emerged in the 1970s and became dominant in the 1980s.
- Efficient indexing, storage, and query processing evolved rapidly.
- OLTP technologies enabled fast, reliable transactional systems.
- User-friendly query languages (SQL) transformed data access.
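To make the OLTP and SQL points concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and the `record_sale` helper are hypothetical names chosen for illustration, and an in-memory SQLite database only stands in for a production transactional system.

```python
import sqlite3

# Hypothetical in-memory schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
# An index lets the engine locate a customer's rows without scanning the whole table.
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer)")


def record_sale(customer, amount):
    """OLTP-style write: one short transaction that commits or rolls back as a unit."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO sales (customer, amount) VALUES (?, ?)", (customer, amount)
        )


for customer, amount in [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)]:
    record_sale(customer, amount)

# Declarative query: SQL states what is wanted; the engine decides how to fetch it.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Each call to `record_sale` is a small, independent transaction, which is the typical OLTP workload, while the final `GROUP BY` query shows how a few lines of SQL replace hand-written file-processing code.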
5. Advanced Database Systems & Data Warehousing
From the mid-1980s onward, research expanded database systems beyond traditional relational structures to support complex data, large-scale integration, and multidimensional analysis.
Key Advances
- Object-oriented & object-relational databases for complex data.
- Spatial, temporal, multimedia, sensor, and scientific databases emerged.
- Data warehouses unified multiple heterogeneous sources.
- ETL, data cleaning, and integration became essential processes.
- OLAP technologies enabled multidimensional analysis.
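The warehouse-related advances above (integration of heterogeneous sources, cleaning, and multidimensional analysis) can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the two sources, their field layouts, the sample figures, and the helper names `clean`, `extract_transform_load`, and `roll_up`. A real warehouse would use dedicated ETL and OLAP tooling rather than in-memory lists.

```python
from collections import defaultdict

# Two hypothetical sources with different layouts and inconsistent formatting.
store_rows = [
    {"region": "North", "product": "tea", "quarter": "Q1", "amount": "120.0"},
    {"region": "north ", "product": "Tea", "quarter": "Q2", "amount": "80.0"},
    {"region": "South", "product": "coffee", "quarter": "Q1", "amount": ""},  # missing
]
web_rows = [
    ("SOUTH", "coffee", "Q1", 200.0),
    ("North", "coffee", "Q2", 50.0),
]

DIMENSIONS = ("region", "product", "quarter")


def clean(region, product, quarter, amount):
    """Normalize text fields and parse the amount; return None for unusable rows."""
    try:
        value = float(amount)
    except (TypeError, ValueError):
        return None
    return (region.strip().lower(), product.strip().lower(), quarter.strip().upper(), value)


def extract_transform_load():
    """Integrate both sources into one list of uniform fact rows."""
    candidates = [
        clean(r["region"], r["product"], r["quarter"], r["amount"]) for r in store_rows
    ]
    candidates += [clean(*row) for row in web_rows]
    return [row for row in candidates if row is not None]


def roll_up(facts, dims):
    """Sum the amount over the chosen dimensions (an OLAP-style roll-up)."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


facts = extract_transform_load()
by_region = roll_up(facts, ["region"])
by_region_quarter = roll_up(facts, ["region", "quarter"])
print(by_region)
print(by_region_quarter)
```

Dropping the row with the missing amount is only one possible cleaning policy; imputing a value or flagging the row for review are equally valid choices depending on the business context. Calling `roll_up` with different dimension lists mimics viewing the same facts at coarser or finer levels of detail.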
