CSI 5325 paper presentations
Here we will organize the papers that the students in the class will present. Typically, we will have a presentation on one paper relevant to the topic of discussion shortly after the professor has finished his lectures on the topic.
I will put up a list of potential papers for each topic; it's up to the presenter to choose the paper at least a week prior to the presentation. Let me know which paper you choose. Note that some papers are in PostScript format -- you may need to get a PostScript reader for these.
Before the presentation, everyone in the class must read the paper so we may have a fruitful discussion. Thus the target audience of the presentation is people who have already learned something about the topic.
Guidelines and evaluation
The presentation should use slides and be about 20 minutes long, and allow for an additional 10 minutes of discussion (either at the end, or during the talk). As a rule of thumb, 1 minute of presentation means about 1 slide. The presentation should address and lead the class in a discussion of the main points of the article. In particular, talk about:
- primary algorithms
- how the ideas are evaluated (e.g. experiment setup)
- the results (including advantages/drawbacks)
Consider the advice in Charles Elkan's notes on giving a research talk.
Your grade will be based on the clarity and quality of your presentation, how well you lead the discussion, and how well you answer any questions that come up.
Topic: Decision trees (February 4; Brandy)
Choose one of the following 5 papers. I recommend the Oliver or Blockeel papers.
- Efficient algorithms for decision tree cross-validation (Blockeel, Struyf)
- Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria (Drummond, Holte)
- Option Decision Trees with Majority Votes (Kohavi, Kunz)
- The Effects of Training Set Size on Decision Tree Complexity (Oates, Jensen)
- On pruning and averaging decision trees (Oliver, Hand)
Update (Tue Jan 29 12:17:36 CST 2008) -- we will be reading the paper by Kohavi and Kunz.
Topic: Neural networks (February 14; Peter)
Choose one of the following 7 papers. I recommend the Caruana or Dietterich papers.
- A Neural Probabilistic Language Model (Bengio, Ducharme, Vincent)
- Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping (Caruana, Lawrence, Giles)
- A comparison of ID3 and backpropagation for English text-to-speech mapping (Dietterich, Hild, Bakiri)
- Fast Neural Network Emulation of Dynamical Systems for Computer Animation (Grzeszczuk, Terzopoulos, Hinton)
- Neural Networks for Density Estimation (Magdon-Ismail, Atiya)
- Intrusion Detection with Neural Networks (Ryan, Lin, Miikkulainen)
- Effective Training of a Neural Network Character Classifier for Word Recognition (Yaeger, Lyon, Webb)
Update (Thu Feb 7 13:10:38 CST 2008) -- we will be reading the paper by Caruana, Lawrence, and Giles.
Topic: Evaluating hypotheses (February 21; no one)
We'll skip the paper presentation for this topic.
Topic: Learning theory (March 25; Ben)
Choose one of the following 5 papers.
- Teaching randomized learners (Balbach and Zeugmann)
- A PAC-style model for learning from labeled and unlabeled data (Balcan and Blum)
- Learnability and the Vapnik-Chervonenkis dimension (Blumer, Ehrenfeucht, Haussler, and Warmuth)
- Quantifying inductive bias: AI learning algorithms and Valiant's learning framework (Haussler)
- A theory of the learnable (Valiant)
Update (Tue Mar 18 12:31:57 CDT 2008) -- we will be reading the Valiant paper.
Topic: Instance-based learning (April 1; Nate)
Choose one of the following 7 papers.
- An optimal algorithm for approximate nearest neighbor searching (Arya, Mount, Netanyahu, Silverman, and Wu)
- A learning framework for nearest neighbor search (Cayton and Dasgupta)
- Multiresolution instance-based learning (Deng and Moore)
- Approximate nearest neighbors: towards removing the curse of dimensionality (Indyk and Motwani)
- Efficient exact k-nn and nonparametric classification in high dimensions (Liu, Moore, and Gray)
- Efficient algorithms for minimizing cross validation error (Moore and Lee)
- Combining instance-based and model-based learning (Quinlan)
Update (Wed Mar 26 20:47:52 CDT 2008) -- we will be reading the Arya paper.
Topic: Support vector machines (April 15; Aaron)
Choose one of the following 6 papers.
- Support Vector Machines for Histogram-Based Image Classification (Chapelle, Haffner, and Vapnik)
- Support Vector Machines for Spam Categorization (Drucker, Wu, and Vapnik)
- A Comparison of Methods for Multiclass Support Vector Machines (Hsu and Lin)
- Text Categorization with Support Vector Machines: Learning with Many Relevant Features (Joachims)
- Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods (Platt)
- A tutorial on support vector regression (Smola and Schölkopf)
Update (Mon Apr 7 23:00:44 CDT 2008) -- we will be reading the Drucker paper.
Topic: Unsupervised learning (April 29; Alex)
Choose one of the following 5 papers.
- Soft vector quantization and the EM algorithm (Alpaydin)
- A Density-Based algorithm for discovering clusters in large spatial databases with noise (Ester, Kriegel, Sander, Xu)
- Large scale hierarchical clustering of protein sequences (Krause, Stoye, and Vingron)
- On spectral clustering: Analysis and an algorithm (Ng, Jordan, and Weiss)
- Combining multiple weak clusterings (Topchy, Jain, and Punch)
Update (Sun Apr 20 06:19:46 CDT 2008) -- we will be reading the Ng paper.
Topic: Boosting (May 1; Chris)
Choose one of the following 4 papers.
- The Application of AdaBoost for Distributed, Scalable and On-line Learning (Fan, Stolfo, and Zhang)
- Boosted Wrapper Induction (Freitag and Kushmerick)
- Generalized Multiclass AdaBoost and Its Applications to Multimedia Classification (Hao and Luo)
- Face Recognition Using Boosted Local Features (Jones and Viola)
Update (Sun Apr 27 21:02:20 CDT 2008) -- we will be reading the Hao paper.