Data mining is used to discover patterns and relationships in data. Emphasis is placed on large, complex data sets such as those found in very large databases or obtained through web mining. In this course, we will study the most common methods and techniques used in analyzing and modeling real-world data. Course topics include linear models, classification, regularization, decision trees, association rules, clustering, and case-based methods.
Summer Quarter : Jun. - Aug. 2024
Lecture : Monday, Wednesday 4:30 PM - 5:50 PM
Review : Friday 4:30 PM - 5:50 PM
Location : Packard 101
You can find an up-to-date list of times here. We will be hosting office hours both in person and over Zoom (using QueueStatus). Please check specific office hour sessions for details.
We use Ed for course communication. Any questions regarding course content and course organization should be posted on Ed. You are strongly encouraged to answer other students' questions when you know the answer.
All lectures this quarter will be presented in person. Recordings will subsequently be uploaded to Canvas.
What are the pre-requisites? Introductory statistics / probability (preferably at a graduate level, e.g., STATS 116). Linear algebra & multivariable calculus (e.g., Math 51). Computer programming (e.g., CS 105). You may choose between R and Python for the course.
We rely heavily on An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (Springer, 2nd ed., 2021) for this course. The book is also available at the Stanford Bookstore and free online through the Stanford Libraries.
We also occasionally rely on material and readings from The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman (Springer, 2nd ed., 2009).
Can I audit or sit in? In general, we are very open to guests sitting in if you are a member of the Stanford community (registered student, staff, and/or faculty). Out of courtesy, we would appreciate it if you first emailed us or talked to the instructor after the first class you attend. If the class is too full and we're running out of space, we ask that you please allow registered students to attend.
What if I cannot make the exam dates? We must receive prior notification and justification of your impending absence in order to authorize a make-up exam. Messages must be sent by email at least a week prior to the start of the exam. An exam must be made up within one week of the original exam date. There will be no exceptions.
What if I'm taking the exam remotely through SCPD? Remote SCPD students must designate an "exam monitor" to proctor their exams (local students have the option of taking the exam at Stanford at the standard in-class time in the standard classroom). You will find general information on SCPD exam monitor protocol here.
Please call or email SCPD directly for more information on choosing an exam monitor, where to send exam solutions, etc. You will have a window of 24 hours after the exam time at Stanford to complete and return the exam. Exam-specific instructions (e.g., resources allowed and time limit) will be provided within each exam and also in advance through the website and/or mailing list.
Acknowledgments. HTML taken from various CS courses given at Stanford: cs229, cs231a, cs231n, and cs236.