CSI 5v93: Machine Learning, Spring 2005
Announcements
- Thursday, 3/24/2005
- The description of the paper presentation is posted. Please read it and get started on the paper presentation.
- Tuesday, 3/1/2005
- The fourth assignment is posted. It is due on March 22 and involves a considerable amount of work; please start on it immediately.
- Tuesday, 2/15/2005
- Here are my accuracy results for the k-nearest neighbor classifier on
the ZIP data, training on the training data. You can compare these with
your own results. Note the fact that the test accuracy goes up slightly
with k = 5 over k = 1.
accuracy | k = 1 | k = 5 | k = 25 | k = 125 |
---|---|---|---|---|
training set | 1.000 | 0.979 | 0.947 | 0.890 |
test set | 0.943 | 0.944 | 0.918 | 0.855 |
- Friday, 2/11/2005
- The third assignment is posted.
- Friday, 1/28/2005
- Please see the second assignment for an update (see the announcement at the top of the page).
- Thursday, 1/27/2005
- The second assignment is posted, and the due date has been changed to February 8th.
- Tuesday, 1/11/2005
- Welcome to the course! The first assignment is posted.
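For reference when comparing against the k-nearest neighbor accuracies in the 2/15 announcement, here is a minimal sketch of the classifier in Python. It is not the code used to produce those numbers: the function names, the Euclidean distance, the tie-breaking rule, and the toy data are all assumptions made for illustration, and the ZIP data is not loaded here.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    # Order the training indices by Euclidean distance to the query
    # (the distance measure is an assumption of this sketch).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tied vote, most_common keeps first-encountered order,
    # so the label of the nearer neighbor wins.
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return correct / len(test_x)


# Hypothetical toy data: two well-separated clusters in the plane.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

# Evaluating on the training set itself with k = 1.
train_acc_k1 = accuracy(train_x, train_y, train_x, train_y, k=1)
print(train_acc_k1)
print(knn_predict(train_x, train_y, (4, 4), k=3))
```

With k = 1, every training point is its own nearest neighbor (distance zero), so training accuracy is perfect whenever no two identical points carry different labels. Test accuracy has no such guarantee, which is why it is the more informative number to compare.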
Objectives
This is a course in machine learning, a subfield of artificial intelligence. Machine learning is a very big, interesting, and fast-growing field. The central problem we address in this class is how to use the computer to make models which can learn, or make inferences, from data. Further, we would like to use the learned models to make predictions about unknowns.
This course covers:
- introduction to machine learning
- supervised and unsupervised learning
- models for regression and classification, including linear methods, naive Bayes, kernels, and support vector machines
- statistical and mathematical methods for learning
- model assessment
- recent research in the field
- experiments with the concepts learned
This list of topics is optimistic. Be prepared to invest the time necessary to understand the concepts, and to do the programming projects. My best advice is to attend the lectures, read the book, ask questions, and start projects early.
Practical information
Lectures are from 8:00 AM to 9:00 AM in Rogers 312 on Tuesdays and Thursdays.
My office is in the Rogers Engineering and Computer Science building. My office hours are T-F, 10-11 AM, and by appointment. I am often in my office and am glad to talk to students.
Schedule
Here is an aggressive schedule of the material we will cover:
Week | Dates | New topics | Chapters | Tuesday | Thursday |
---|---|---|---|---|---|
1 | Jan 11, 13 | Introduction | 1, 2 | Notes, Homework 1 assigned | Notes |
2 | Jan 18, 20 | | | Notes | Notes, Homework 1 due |
3 | Jan 25, 27 | Linear regression methods | 3 | Notes, Homework 2 assigned | Notes |
4 | Feb 1, 3 | | | Notes | Notes |
5 | Feb 8, 10 | Linear classification methods | 4 | Notes, Homework 2 due | Notes, Homework 3 assigned |
6 | Feb 15, 17 | | | Notes | Case study |
7 | Feb 22, 24 | Bayesian learning | Mitchell 6 | Case study, Homework 3 due | Notes |
8 | Mar 1, 3 | | | Notes, Homework 4 assigned | Notes |
9 | Mar 8, 10 | | | Notes | Notes |
10 | Mar 15, 17 | | | spring break | spring break |
11 | Mar 22, 24 | SVMs | 12 | Notes | Notes, Homework 4 due, Paper presentations assigned |
12 | Mar 29, 31 | | | Notes | Notes |
13 | Apr 5, 7 | Paper presentations | | Notes | Paper presentations |
14 | Apr 12, 14 | | | Paper presentations, Homework 5 assigned | diadeloso |
15 | Apr 19, 21 | Unsupervised learning | 14 | Notes | Notes |
16 | Apr 26, 28 | | | Notes | Notes, Homework 5 due |
The final exam is on Saturday, May 7th, from 2 to 4 PM.
Textbooks & resources
Required text: we will be using The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. You can purchase this book from the Baylor bookstore or Amazon, among other places.
Optional text: Machine Learning by Tom Mitchell.
Further online resources:
- We will use Blackboard as a class discussion board. Feel free to post questions and responses there which do not violate the non-collaboration agreement.
- KDD dataset repository
- Matlab tutorial
- LaTeX introduction, another LaTeX introduction, LaTeX reference
Grading
Grades will be assigned based on this breakdown:
- paper presentation: 25%
- projects: 45%
- final exam: 30%
Here is a tentative grading scale:
A: 90-100, B+: 88-89, B: 80-87, C+: 78-79, C: 70-77, D: 60-69, F: 0-59
Some projects may be worth more than others. Exams are closed-book. The final will be comprehensive.
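As a worked example of the breakdown and tentative scale above, the following Python sketch computes a weighted course score and looks up the letter grade. It assumes each component is already a percentage from 0 to 100 and that scores are compared to the cutoffs without rounding; neither detail is specified in this syllabus, and the function names are hypothetical.

```python
# Weights from the grading breakdown.
WEIGHTS = {"paper presentation": 0.25, "projects": 0.45, "final exam": 0.30}

# Lower cutoffs from the tentative grading scale, highest first.
SCALE = [(90, "A"), (88, "B+"), (80, "B"), (78, "C+"), (70, "C"), (60, "D"), (0, "F")]


def course_score(scores):
    """Weighted average of the component percentages (each assumed 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """First letter whose lower cutoff the score meets (no rounding: an assumption)."""
    for cutoff, letter in SCALE:
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical student: 0.25*92 + 0.45*85 + 0.30*80, which is about 85.25.
example = {"paper presentation": 92, "projects": 85, "final exam": 80}
total = course_score(example)
print(round(total, 2), letter_grade(total))
```

Because projects carry 45% of the grade, a 10-point change in the project average moves the course score by 4.5 points, more than the same change in either other component.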
Policies
- Check this website every day for updates and announcements. We only meet twice a week, but I may post updates at any time. It is your responsibility to follow these updates by reading this website.
- All work in this course is strictly individual, unless the instructor explicitly states otherwise. While discussion of course material is encouraged, collaboration on any work for the course is not allowed. Collaboration includes (but is not limited to) discussing with anyone other than the professor any material that is specific to completing an assignment. You are encouraged to discuss the course material with the professor, preferably in office hours, and also by email.
- Baylor policy requires 75% class attendance from each student. Even "excused" absences are included in the overall absent count. If a student attends less than 75% of the classes, he or she will automatically fail the course.
- In order to facilitate keeping attendance, on the second class meeting I will ask you to choose a seat for the rest of the course. Please sit in your chosen seat for the remainder of the course.
- Projects which are late are not accepted. Exams are the only things which may be made up, either with prior arrangement (made at least one class before the exam) or due to illness, with a note from a health care professional.
- Bring any grading correction requests to my attention within 2 weeks of receiving the grade or before the end of the semester, whichever comes first. After that, I will not adjust your grade. If you find any mistake in grading, please let me know.
Academic honesty
I take academic honesty very seriously.
Many studies, including one by Sheilah Maramark and Mindi Barth Maline, have suggested that "some students cheat because of ignorance, uncertainty, or confusion regarding what behaviors constitute dishonesty" (Maramark and Maline, Issues in Education: Academic Dishonesty Among College Students, U.S. Department of Education, Office of Research, August 1993, page 5). In an effort to reduce misunderstandings in this course, a minimal list of activities that will be considered cheating is given below.
- Copying another student's work. Simply looking over someone else's source code is copying.
- Providing your work for another student to copy.
- Collaboration on any assignment, unless the work is explicitly given as collaborative work.
- Using notes or books during any exam.
- Giving another student answers during an exam.
- Reviewing a stolen copy of an exam.
- Plagiarism.
- Studying tests or using assignments from previous semesters.
- Providing someone with tests or assignments from previous semesters.
- Taking an exam for someone else.
- Turning in someone else's work as your own work.
- Studying a copy of an exam prior to taking a make-up exam.
- Providing a copy of an exam to someone who is going to take a make-up exam.
- Giving test questions to students in another class.
- Reviewing previous copies of the instructor's tests without permission from the instructor.
Copyright © 2004 Greg Hamerly, with some content
taken from a syllabus by Jeff Donahoo.
Computer Science Department
Baylor University