Stephen Bay

 
Institute for the Study of Learning and Expertise
2164 Staunton Court
Palo Alto, CA 94306
USA
Tel: +1(650) 723-1684
Email: sbay@apres.stanford.edu

I am a research scientist at the Institute for the Study of Learning and Expertise and a member of the Computational Learning Laboratory in the Center for the Study of Language and Information at Stanford University.




Research

I am interested in Machine Learning, Data Mining, and Knowledge Discovery. I have worked on projects in the following areas:

Knowledge Discovery in Databases Archive

I used to be the librarian and maintainer of the new UCI KDD Archive, an online repository of large databases designed to encompass a wide variety of data types and analysis tasks. It expands on the current UCI Machine Learning Archive by storing databases that are much larger and that involve tasks other than classification.

Publications

Comments, suggestions, and feedback are welcome.

Chrisman, L., Langley, P., Bay, S. D., and Pohorille, A. (2003). Incorporating biological knowledge into evaluation of causal regulatory hypotheses. Proceedings of the Pacific Symposium on Biocomputing. Postscript.

Saito, K., Bay, S. D., and Langley, P. (2002). Revising Qualitative Models of Gene Regulation. Proceedings of the Fifth International Conference on Discovery Science. Lübeck, Germany. Postscript. PDF.

Bay, S. D., Shapiro, D. G., and Langley, P. (2002). Revising engineering models: Combining computational discovery with knowledge. Proceedings of the Thirteenth European Conference on Machine Learning. Helsinki, Finland. Postscript. PDF.

Bay, S. D. (2001). Clustering and Merging Geographic Regions Based on Sample Information. PAKDD workshop on Spatial and Temporal Data Mining. Hong Kong.

Bay, S. D. and Pazzani, M. J. (2001). Detecting Group Differences: Mining Contrast Sets. Data Mining and Knowledge Discovery. Abstract. Postscript. PDF.

Bay, S. D. (2001). Multivariate Discretization for Set Mining. Knowledge and Information Systems. Abstract. Postscript. PDF.

Bay, S. D., Kibler, D., Pazzani, M. J., and Smyth, P. (2001). The UCI KDD Archive of Large Data Sets for Data Mining Research and Experimentation. Information Processing Society of Japan Magazine. Volume 42, Number 5, pages 462-466. English language version reprinted in SIGKDD Explorations. Volume 2, Issue 2, pages 81-85, 2000. Abstract. Postscript. PDF.

Bay, S. D. (2000). Multivariate Discretization of Continuous Variables for Set Mining. Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Abstract. Postscript. PDF.

Bay, S. D. and Pazzani, M. J. (2000). Discovering and Describing Category Differences: What makes a discovered difference insightful? Proceedings of the Twenty-Second Annual Meeting of the Cognitive Science Society. Abstract. Postscript. PDF.

Bay, S. D. and Pazzani, M. J. (2000). Characterizing Model Errors and Differences. Proceedings of the Seventeenth International Conference on Machine Learning. Abstract. Postscript. PDF. Slides.

Bay, S. D. and Pazzani, M. J. (2000). Characterizing Model Performance in the Feature Space. ICML 2000 Workshop on What Works Well Where? Postscript. PDF. Handout instructions. Handout 1. Handout 2. Handout 3.

Bay, S. D., Kibler, D., Pazzani, M. J., and Smyth, P. (2000). The UC Irvine Knowledge Discovery in Databases Archive. The 32nd Symposium on the Interface: Computing Science and Statistics. Invited poster.

Bay, S. D. and Pazzani, M. J. (1999). Detecting Change in Categorical Data: Mining Contrast Sets. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Abstract. Postscript. PDF.

Pazzani, M. J. and Bay, S. D. (1999). The Independent Sign Bias: Gaining Insight from Multiple Linear Regression. Proceedings of the Twenty-First Annual Meeting of the Cognitive Science Society. Abstract. PDF. Postscript. Slides.

Bay, S. D. (1999). Nearest Neighbor Classification from Multiple Feature Subsets. Intelligent Data Analysis. 3(3):191-209. Abstract. Postscript. PDF. (preprint).

Bay, S. D. (1998). Combining Nearest Neighbor Classifiers Through Multiple Feature Subsets. Proceedings of the Fifteenth International Conference on Machine Learning. Madison, WI. Abstract. Postscript. PDF.

Bay, S. D. (1997). Nearest Neighbour Classification from Multiple Data Representations. Master's thesis, University of Waterloo, Department of Systems Design Engineering.

My BibTeX file: myrefs.bib.

Software

MFS. Nearest Neighbor Ensembles based on Multiple Feature Subsets.

Teaching

I taught ICS 171: Introduction to Artificial Intelligence at UCI in Summer 2000. The course webpage has detailed problem sets with solutions. If you are an instructor, I can make the LaTeX source available.

Fun Stuff

Long Beach, California: jellyfish
Balboa Park, San Diego: roots
Venice Beach, California: two curious dogs, Kara
San Francisco, California: Exploratorium, Golden Gate Bridge
Ensenada, Mexico: a little island, La Bufadora (calm), La Bufadora (active), KC
Air and Space Museum, Washington D.C.: Wright Flyer
Juggling: 3 Clubs, 5 Balls
Banff, Alberta: 10000+ft, at the top, a wild eep (Does anybody know what animal this really is?)
Arches National Park, Utah: a neat looking rock, underneath it
Yellowstone, Wyoming: Old Faithful, Emerald Pool, Smoke Mountain, somewhere in the park, Grand Canyon of Yellowstone
Grand Canyon, Arizona: North Side
Moab, Utah: slickrock, more slickrock
Algonquin Park, Ontario: resting after a 2km portage, a moose
Salt Lake City, Utah: biggest pipe organ I have ever seen
Don't try this at home: fire
My Favorite Stories
Funny Stuff
Friends (with websites): ArtGangLa, Eamonn, Engine Turning


Last modified 2001-10-8.