(Listed in the order recommended or mentioned in our discussions; this list will be organized "eventually".)
Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by James C. Spall, Wiley, 2003
Optimization by Simulated Annealing by Kirkpatrick, Gelatt and Vecchi, 1983
Equation of State Calculations by Fast Computing Machines by Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller, 1953
Outline for a Logical Theory of Adaptive Systems by John Holland, 1962
Genetic Algorithms and the Optimal Allocation of Trials, John Holland, 1973
Genetic Programming: A Paradigm for Genetically Breeding Populations of Computer Programs to Solve Problems, J. R. Koza, 1990, Stanford University Computer Science Department, technical report STAN-CS-90-1314
Flocks, Herds, and Schools: A Distributed Behavioral Model, Craig Reynolds (Computer Graphics, 21(4):25-34. SIGGRAPH '87 Conference Proceedings)
Particle Swarm Optimization, Kennedy and Eberhart, 1995 (from Proceedings IEEE Int'l. Conf. on Neural Networks)
Autonomous Automata, Larry Fogel, 1962, Industrial Research, vol. 4, pp. 14 - 19 (For now, this is available from David, so please send e-mail.)
Ant Colony System website, Marco Dorigo et al.
On Bagging and Nonlinear Estimation by Friedman and Hall. (Jerome H. Friedman's site)
Also see the original bagging paper:
Bagging Predictors by Breiman.
Boosted discriminant projections for nearest neighbor classification by Masip and Vitria.
Density Modeling and Clustering Using Dirichlet Diffusion Trees by Neal.
Also see Defining Priors for Distributions Using Dirichlet Diffusion Trees.
Discriminant Adaptive Nearest Neighbor Classification, Hastie and Tibshirani, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 6, pp. 607-616, June 1996
Discriminative Clustering (preprint) by Kaski, Sinkkonen, and Klami.
Kernel Methods and the Exponential Family by Canu and Smola.
A Framework for Multiple-Instance Learning by Maron and Lozano-Perez.
The Relevance Vector Machine by Tipping.
Semiparametric Latent Factor Models by Teh, Seeger, and Jordan.
Algebraic Geometry and Stochastic Complexity of Hidden Markov Models, Yamazaki and Watanabe, Neurocomputing, Vol. 69, pp. 62-84, 2005
All Learning is Local: Multi-agent Learning in Global Reward Games by Chang, Ho, and Kaelbling.
Extending Q-Learning to General Adaptive Multi-agent Systems by Tesauro.
On the Dynamics of Boosting by Rudin, Daubechies, and Schapire.
Breaking SVM Complexity with Cross-Training, Bakir, Bottou, and Weston, Advances in Neural Information Processing Systems 17, Saul, L.K. (ed.), MIT Press, Cambridge, pp. 81-88, 2005
Brain Inspired Reinforcement Learning by Rivest and Bengio.
TD(0) Leads to Better Policies than Approximate Value Iteration by Van Roy.
Bayesian Surprise Attracts Human Attention by Itti and Baldi.
Robust Fisher Discriminant Analysis by Kim, Magnani, and Boyd.
Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions by Mahadevan and Maggioni.
Non-Local Manifold Parzen Windows by Bengio, Larochelle, and Vincent.
From Lasso Regression to Feature Vector Machine by Li, Yang, and Xing.
A Class of Instantaneously Trained Neural Networks, Subhash Kak, Information Sciences, Volume 148, Issue 1-4, December 2002, pp. 97-102
Better Web Searches and Prediction with Instantaneously Trained Neural Networks, Subhash Kak, IEEE Intelligent Systems, Volume 14, Issue 6, November 1999