Chapter 13. Text classification and Naive Bayes
1. To achieve good recall, standing queries have to be refined over time and can gradually become quite complex. In classification, given a set of classes, we seek to determine which class(es) a given object belongs to.
2. The classification task is called text classification, text categorisation, topic classification or topic spotting.
3. Naive Bayes is a probabilistic learning method. It is robust, meaning that it can be applied to many different learning problems and is unlikely to produce classifiers that fail catastrophically. To eliminate zero probabilities, we use add-one or Laplace smoothing, which simply adds one to each term count.
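As a sketch of how this fits together, here is a minimal multinomial Naive Bayes trainer with add-one smoothing. The document format (label plus token list) and all names here are illustrative assumptions, not from the text:

```python
from collections import Counter
import math

def train_nb(docs):
    """Train a multinomial Naive Bayes model with add-one smoothing.

    docs: list of (class_label, token_list) pairs -- a hypothetical format.
    Returns log priors, conditional log probabilities, and the vocabulary.
    """
    vocab = {t for _, tokens in docs for t in tokens}
    classes = {c for c, _ in docs}
    prior, condprob = {}, {}
    for c in classes:
        class_docs = [tokens for label, tokens in docs if label == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for tokens in class_docs for t in tokens)
        total = sum(counts.values())
        # Add-one (Laplace) smoothing: every term's count is incremented
        # by one, so no conditional probability is ever zero.
        condprob[c] = {t: math.log((counts[t] + 1) / (total + len(vocab)))
                       for t in vocab}
    return prior, condprob, vocab

def classify(prior, condprob, vocab, tokens):
    """Assign the class with the highest posterior log probability."""
    scores = {c: prior[c] + sum(condprob[c][t] for t in tokens if t in vocab)
              for c in prior}
    return max(scores, key=scores.get)
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied together.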
4. Positional independence: the conditional probabilities for a term are the same independent of position in the document.
5. Feature selection for multiple classifiers: it is desirable to select a single set of features instead of a different one for each classifier. Feature selection statistics are first computed separately for each class on the two-class classification task c versus its complement, and then combined.
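One hedged sketch of the combining step: given per-class utility scores (e.g. mutual information computed on each c-versus-complement task), take the maximum score over classes so that a term useful for any one classifier survives into the shared feature set. The function name, input format, and max-based combination are illustrative choices; averaging the scores is another common option:

```python
def combined_feature_scores(per_class_scores):
    """Combine per-class feature-utility scores into one ranking.

    per_class_scores: dict mapping class -> dict mapping term -> score
    (a hypothetical format). Combines by taking the maximum score over
    classes, then returns terms ranked from most to least useful.
    """
    combined = {}
    for scores in per_class_scores.values():
        for term, score in scores.items():
            # Keep the best score a term achieves for any class.
            combined[term] = max(combined.get(term, 0.0), score)
    return sorted(combined, key=combined.get, reverse=True)
```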
Chapter 14. Vector space classification
Contiguity hypothesis: documents in the same class form a contiguous region, and regions of different classes do not overlap. kNN or k nearest neighbour classification assigns the majority class of the k nearest neighbours to a test document. kNN requires no explicit training and can use the unprocessed training set directly in classification.
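The kNN decision rule above can be sketched as follows, using cosine similarity over sparse term-weight vectors. The vector representation (term-to-weight dicts) and the toy labels are assumptions for illustration:

```python
from collections import Counter
import math

def cosine(u, v):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(train, doc, k=3):
    """Assign the majority class among the k nearest training documents.

    train: list of (label, vector) pairs; vectors are term->weight dicts.
    There is no training step: the raw training set is used directly,
    and all work happens at classification time.
    """
    nearest = sorted(train, key=lambda lv: cosine(lv[1], doc),
                     reverse=True)[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]
```

Note the trade-off this illustrates: "no explicit training" means every classification scans the whole training set, so test-time cost grows with the number of training documents.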