USM LINKS

Computer Science

Mathematics

Physics

Mathematical Sciences




WEB PAGES

CMInfo

USM Pages

Campus Info




RESOURCES

Math Resources

Search Engines

Pax Index


SYMPOSIUM MINISYMPOSIA

What's New

FIRST SOUTHERN SYMPOSIUM ON COMPUTING

December 4-5, 1998
University of Southern Mississippi
Hattiesburg, Mississippi


ABSTRACT

Weighted Projection in Nearest-Neighbor Classifiers

Mirek Kubat and Wei Kian Chen

Automated classification is a well-established domain in Artificial Intelligence, Pattern Recognition, and Machine Learning. We are interested in the setting where the computer identifies objects described by attribute vectors. One of the most widely studied approaches, the nearest-neighbor classifier, uses the following scenario: (1) store a set of pre-classified examples; (2) whenever asked to classify a new attribute vector, x, find the most similar example and denote this example's label by c; (3) assign to x the label c. Similarity is determined by a distance measure defined on the instance space (the space of all possible vectors x): the smaller the distance between two vectors, the higher their similarity. Successful applications of the nearest-neighbor principle have been reported in many real-world domains.

In this paper we study a technique that addresses three well-known weaknesses of this paradigm: (1) sensitivity to irrelevant attributes; (2) sensitivity to different attribute scales; and (3) the high computational costs associated with the need to compare x with many stored examples. The essence of our approach builds on the recent idea of Akkus and Güvenir (1996). Instead of measuring the distances between points in the instance space, they use simple transformation: each example is projected on each of the n attribute axes. The original training set is thus replaced with n new training sets, each describing the examples by a different (single) attribute. When classifying x, the agent investigates the new training sets one at a time, and in each of them determines the label of the nearest neighbor. The final decision is then reached by plain voting: x is assigned the most frequent label.

We extend this scheme by a simple technique that associates each of the n training sets with a weight determined by a technique developed by Littlestone and Warmuth (1994). The plain voting used by our predecessors is then replaced with a weighted majority voting over the suggestions delivered by the individual training sets. This simple mechanism reduces the impact of irrelevant attributes. Since the distances along a single attribute are not affected by the distances along other attributes, the metric is insensitive to different attribute scales. A simple conjecture also enables us to discard most of the examples in each traininig set, which significantly reduces the computation costs.

We test the behavior of the mechanism using synthetic data with controlable parameters as well as on publicly available benchmark domains.


Getting More Information

To obtain more information about the meeting send e-mail to: fscc98@pax.st.usm.edu.


Return to PAX
Return to USM

USM
inquire@delphi.st.usm.edu
Updated
http://pax.st.usm.edu