# CSI 5325 paper presentations

This page organizes the papers that students in the class will present. Typically, one paper relevant to each topic will be presented shortly after the professor has finished his lectures on that topic.

I will put up a list of potential papers for each topic; the presenter must choose a paper at least a week before the presentation and let me know which one. Note that some papers are in PostScript format, so you may need a PostScript reader for these.

Before the presentation, everyone in the class must read the paper so that we can have a fruitful discussion. Thus the target audience of the presentation is people who have already learned something about the topic.

## Guidelines and evaluation

The presentation should use slides, last about 20 minutes, and allow an additional 10 minutes of discussion (either at the end or during the talk). As a rule of thumb, plan on about one slide per minute of presentation. The presentation should address the main points of the article and lead the class in a discussion of them. In particular, talk about:

- the point of the paper
- primary algorithms
- how the ideas are evaluated (e.g. experimental setup)
- the results (including advantages/drawbacks)

Consider the advice in Charles Elkan's notes on giving a research talk.

Your grade will be based on the clarity and quality of your presentation, how well you lead the discussion, and how well you answer any questions that come up.

## Topic: Decision trees (February 1; Chiam)

We will read the paper "Decision Trees for Hierarchical Multi-label Classification: A Case Study in Functional Genomics" by Blockeel, Schietgat, Struyf, Dzeroski, and Clare.

- Efficient algorithms for decision tree cross-validation (Blockeel, Struyf)
- Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria (Drummond, Holte)
- A Comparison of Decision Tree Ensemble Creation Techniques (Banfield, Hall, Bowyer, Kegelmeyer)
- Decision Trees for Hierarchical Multi-label Classification: A Case Study in Functional Genomics (Blockeel, Schietgat, Struyf, Dzeroski, Clare)
- Decision Tree and Instance-Based Learning for Label Ranking (Cheng, Hühn, Hüllermeier)
- The random subspace method for constructing decision forests (Ho)
- An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization (Dietterich)
- BOAT - optimistic decision tree construction (Gehrke, Ganti, Ramakrishnan, Loh)
- An Empirical Comparison of Selection Measures for Decision-Tree Induction (Mingers)
- Decision Tree Induction Based on Efficient Tree Restructuring (Utgoff, Berkman, Clouse)
- Tree Induction for Probability-Based Ranking (Provost, Domingos)

## Topic: Neural networks (February 15; Yilan)

We will read the paper "Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping" by Caruana, Lawrence, and Giles.

- A Neural Probabilistic Language Model (Bengio, Ducharme, Vincent)
- Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping (Caruana, Lawrence, Giles)
- A comparison of ID3 and backpropagation for English text-to-speech mapping (Dietterich, Hild, Bakiri)
- Fast Neural Network Emulation of Dynamical Systems for Computer Animation (Grzeszczuk, Terzopoulos, Hinton)
- Neural Networks for Density Estimation (Magdon-Ismail, Atiya)
- Intrusion Detection with Neural Networks (Ryan, Lin, Miikkulainen)
- Effective Training of a Neural Network Character Classifier for Word Recognition (Yaeger, Lyon, Webb)

## Topic: Learning theory (March 31; Radu)

We will read the paper "A PAC-style model for learning from labeled and unlabeled data" by Balcan and Blum.

- Teaching randomized learners (Balbach and Zeugmann)
- A PAC-style model for learning from labeled and unlabeled data (Balcan and Blum)
- Learnability and the Vapnik-Chervonenkis dimension (Blumer, Ehrenfeucht, Haussler, and Warmuth)
- Quantifying inductive bias: AI learning algorithms and Valiant's learning framework (Haussler)
- A theory of the learnable (Valiant)

## Topic: Bayesian learning (March 14; Yanxin)

We will read the paper "Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier" by Domingos and Pazzani.

- Naive Bayes Models for Probability Estimation (Lowd, Domingos)
- Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier (Domingos, Pazzani)
- NewsWeeder: Learning to Filter Netnews (Lang)
- Convergence rates of the Voting Gibbs classifier with application to Bayesian feature selection (Ng, Jordan)
- Augmenting Naive Bayes for Ranking (Zhang, Jiang, Su)

## Topic: Instance-based learning (April 21; Hao)

We will read "A learning framework for nearest neighbor search" by Cayton and Dasgupta.

- An optimal algorithm for approximate nearest neighbor searching (Arya, Mount, Netanyahu, Silverman, and Wu)
- A learning framework for nearest neighbor search (Cayton and Dasgupta)
- Multiresolution instance-based learning (Deng and Moore)
- Approximate nearest neighbors: towards removing the curse of dimensionality (Indyk and Motwani)
- Efficient exact k-nn and nonparametric classification in high dimensions (Liu, Moore, and Gray)
- Efficient algorithms for minimizing cross validation error (Moore and Lee)
- Combining instance-based and model-based learning (Quinlan)

## Topic: Support vector machines

- Support Vector Machines for Histogram-Based Image Classification (Chapelle, Haffner, and Vapnik)
- Support Vector Machines for Spam Categorization (Drucker, Wu, and Vapnik)
- A Comparison of Methods for Multiclass Support Vector Machines (Hsu and Lin)
- Text Categorization with Support Vector Machines: Learning with Many Relevant Features (Joachims)
- Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods (Platt)
- A tutorial on support vector regression (Smola and Schölkopf)

## Topic: Unsupervised learning

- Soft vector quantization and the EM algorithm (Alpaydin)
- A Density-Based algorithm for discovering clusters in large spatial databases with noise (Ester, Kriegel, Sander, Xu)
- Large scale hierarchical clustering of protein sequences (Krause, Stoye, and Vingron)
- On spectral clustering: Analysis and an algorithm (Ng, Jordan, and Weiss)
- Combining multiple weak clusterings (Topchy, Jain, and Punch)

## Topic: Boosting

- The Application of AdaBoost for Distributed, Scalable and On-line Learning (Fan, Stolfo, and Zhang)
- Boosted Wrapper Induction (Freitag and Kushmerick)
- Generalized Multiclass AdaBoost and Its Applications to Multimedia Classification (Hao and Luo)
- Robust Real-Time Face Detection (Viola and Jones)

Copyright © 2011 Greg Hamerly.

Computer Science Department

Baylor University