CPSC 340 - Machine Learning and Data Mining (Spring 2024)

Lecture Sections (beginning January 8):

Instructors: Jeff Clune and Jiarui Ding

Instructor office hours: Whichever professor teaches on a given Wednesday will hold office hours that day from 2:50 to 3:50, starting right after the second class ends. The professor will walk back from class with anyone who wants to chat on the way, and office hours will then continue in that professor's office. Students can also go straight to the office and wait for the professor to return. Professor Clune's office is X863. Professor Ding's is ICCS X541.

Tutorials (beginning January 15):

Teaching assistants: TBA
TA office hours (all in Demco Learning Centre): See Piazza

Frequently Answered Questions

Midterm information
TBD

Synopsis: We introduce basic principles and techniques in the fields of data mining and machine learning. These techniques are now running behind the scenes to discover patterns and make predictions in various applications in our daily lives. We will focus on many of the core data mining and machine learning technologies, with motivating applications from a variety of disciplines.

Registration: Undergraduate and graduate students from any department are welcome to take the course. Undergraduate students should enroll in CPSC 340 while graduate students should enroll in CPSC 540 (when it is offered; CPSC 540 also has an extra small project component). Below are more details on registration for each course:

Starting in the second week of classes, we will have weekly tutorials run by the TAs. These will do things like go through provided assignment code, review background material, review big concepts, and/or do exercises. You can register for particular tutorial sections if you want to save a seat at a particular time, but note that you do not need to register in a tutorial section.

Prerequisites:

Students who do not meet these requirements should consider taking CPSC 330 ("Applied Machine Learning").

Textbook: There is no required textbook for the class. An introductory book that covers many (but not all) of the topics we will discuss is the Artificial Intelligence book of Russell and Norvig (AI:AMA) or the Artificial Intelligence book of Poole and Mackworth (you may need these for other classes). More advanced books include The Elements of Statistical Learning (ESL) by Hastie et al., Murphy's Machine Learning: A Probabilistic Perspective (ML:APP), which can be accessed through the library here, and Bishop's Pattern Recognition and Machine Learning (PRML). For books with a bigger focus on data mining, see Introduction to Data Mining (IDM) and Mining of Massive Datasets.

Related Courses: The most closely related course is CPSC 330: Applied Machine Learning. It has fewer prerequisites and covers some of the same material, but focuses more on applications than on understanding ML ideas in depth. A discussion of the differences between CPSC 340 and similar courses in statistics, written by a former student (Geoff Roeder), is available here (it was written in 2016, so it may be out of date).

Grading (tentative):

Assignments: There are a total of 6 written assignments for this course. Please follow the instructions linked here to submit your assignments.

List of topics

We will roughly cover the following topics:

Lectures, Assignments, Related Readings, and Links

Date Slides Related Readings and Links Homework and Notes
Mon Jan 8 Motivation and Syllabus What is Machine Learning? Machine Learning
Rise of the Machines Talking Machine Episode 1
Mathematics for Machine Learning
Assignment 1 (pdf)
Assignment 1 (tex/code/data)
Wed Jan 10 Exploratory Data Analysis Gotta Catch'em all Why Not to Trust Statistics
Visualization Types Google Chart Gallery Other Tools
Fri Jan 12 Decision Trees A Visual Introduction to Machine Learning, Decision Trees Entropy What is Big O Notation?
AI:AMA 19.2-3, ESL: 9.2, ML:APP 16.2
Big-O Notes
Mon Jan 15 Fundamentals of Learning 7 Steps of Machine Learning IID Cross-validation Bias-variance No Free Lunch
AI:AMA 19.4-5, ESL 7.1-7.4, 7.10, ML:APP 1.4, 6.5
Course Notation Guide
Wed Jan 17 Probabilistic Classifiers Conditional probability (demo) Naive Bayes Probabilities and Battleship
AI:AMA 12.6, ESL 4.3, ML:APP 2.2, 3.5, 4.1-4.2

Probability Notes Probability Slides
Fri Jan 19 Non-Parametric Models K-nearest neighbours Decision Theory for Darts Norms
AI:AMA 19.7, ESL 13.3, ML:APP 1.4
Assignment 1 due
Assignment 2 (pdf)
Assignment 2 (tex/code/data)
Mon Jan 22 Ensemble Methods Ensemble Methods Random Forests Empirical Study Kinect
AI:AMA 19.8, ESL: 7.11, 8.2, 15, 16.3, ML:APP 6.2.1, 16.2.5, 16.6
Withdrawal deadline
Wed Jan 24 Clustering Clustering K-means clustering (demo) K-Means++ (demo)
IDM 8.1-8.2, ESL: 14.3
Fri Jan 26 More Clustering DBSCAN (video, demo) Hierarchical Clustering Phylogenetic Trees
IDM 8.4
Mon Jan 29 Outlier Detection Empirical Study
IDM 8.3, ESL 14.3.12, ML:APP 25.5
Wed Jan 31 Least Squares Linear Regression (demo, 2D data, 2D video) Least Squares Essence of Calculus Partial Derivative Gradient
ESL 3.1-2, ML:APP 7.1-3, AI:AMA 19.6
Fri Feb 2 Nonlinear Regression Why should one learn machine learning from scratch? Essence of Linear Algebra Matrix Differentiation Fluid Simulation (video)
ESL 5.1, 6.3
Linear Algebra Notes
Linear/Quadratic Gradients

Assignment 2 due
Assignment 3 (pdf)
Assignment 3 (tex/code/data)
Mon Feb 5 Gradient Descent Gradient Descent Convex Functions
Wed Feb 7 Robust Regression ML:APP 7.4
Fri Feb 9 Feature Selection Genome-Wide Association Studies AIC, BIC
ESL 3.3, 7.5-7
Mon Feb 12 Regularization ESL 3.4, ML:APP 7.5, AI:AMA 19.4
Wed Feb 14 More Regularization RBF video RBF and Regularization video
ESL 6.7, ML:APP 13.3-4
Fri Feb 16 Linear Classifiers Perceptron
ESL 4.5, ML:APP 8.5
Assignment 3 due
Mon Feb 19 Midterm Break
Wed Feb 21 Midterm Break
Fri Feb 23 Midterm Break
Mon Feb 26 More Linear Classifiers Support Vector Machines
ESL 4.4, 12.1-2, ML:APP 8.1-3, 9.5, 14.5, AI:AMA 19.6
Assignment 4 (pdf)
Assignment 4 (tex/code/data)
Wed Feb 28 Feature Engineering Gmail Priority Inbox
Fri Mar 1 Kernel Trick ESL 12.3, ML:APP 14.1-4
Mon Mar 4 Guest Lecture ESL 12.3, ML:APP 14.1-4 MIDTERM
Wed Mar 6 Stochastic Gradient Stochastic Gradient Descent, Theory and Practice
ML:APP 8.5
Fri Mar 8 Boosting, Start of MLE AdaBoost (video) XGBoost (video)
ML:APP 16.4
Mon Mar 11 MLE and MAP Maximum Likelihood Estimation
ML:APP 9.3-4
Wed Mar 13 PCA Principal Component Analysis
ESL 14.5, IDM B.1, ML:APP 12.2
Fri Mar 15 More PCA Making Sense of PCA SVD Eigenfaces Assignment 4 due
Max and Argmax Notes
Mon Mar 18 Sparse Matrix Factorization Non-Negative Matrix Factorization (original - access from UBC)
ESL 14.6, ML:APP 13.8
Assignment 5 (pdf)
Assignment 5 (tex/code/data)
Wed Mar 20 Recommender Systems & MDS Recommender Systems Netflix Prize
Fri Mar 22 Neural Networks Google Video What is a Neural Network? Interactive Guide
ML:APP 16.5, ESL 11.1-4, AI:AMA 21.1
Mon Mar 25 Deep Learning Fortune Article Deep Learning References Alchemy
ML:APP 28.3, ESL 11.5, AI:AMA 21.2 and 21.4-5
Wed Mar 27 Deep Learning Fortune Article Deep Learning References Alchemy
ML:APP 28.3, ESL 11.5, AI:AMA 21.2 and 21.4-5
Fri Mar 29 UBC Holiday Assignment 5 due
Mon Apr 1 UBC Holiday Assignment 6 (pdf)
Assignment 6 (tex/code/data)
Wed Apr 3 Deep Learning But what is a convolution?
Fri Apr 5 Deep Learning, Begin CNNs if time permits Convolutional Neural Networks
ML:APP 28.4, ESL 11.7, AI:AMA 21.3
Mon Apr 8 CNNs
Wed Apr 10 More CNNs
Fri Apr 12 Conclusion Assignment 6 due

Mike's Demos

In semesters where Mike Gelbart taught the course, he included Jupyter notebooks associated with most lectures. These notebooks are available here (note that the lecture numbers may not exactly match the current semester's course).

Related courses that have online notes


