Information retrieval on the Semantic Web
They focus on three scenarios: information retrieval, question answering, and complex question answering. They use a set of ontologies and a hybrid information retrieval mechanism to resolve the conflict between the person's and the machine's views of the query, and they use precision and recall to evaluate their system's performance. Their future work includes a more sophisticated search engine and the incorporation of user feedback to improve accuracy.
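The precision and recall measures they use can be computed like this (a minimal sketch; the function name and document IDs are illustrative, not from the paper):

```python
def precision_recall(retrieved, relevant):
    """Precision = fraction of retrieved documents that are relevant;
    recall = fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"],
                        relevant=["d1", "d3", "d5"])
# p = 2/4 = 0.5, r = 2/3
```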
Thursday, April 17, 2014
Week14——Muddiest points(unit 13)
Unit 13 Text classification and clustering
About classification: before classifying text, do we need to do semantic and grammatical analysis? Can we use the same smoothing methods as in language models?
How do we compare the performance of each method in different cases?
Friday, April 11, 2014
Week13——Reading notes(unit 13)
Chapter 13. Text classification and Naive Bayes
1. To achieve good recall, standing queries thus have to be refined over time and can gradually become quite complex. Given a set of classes, we seek to determine which class(es) a given object belongs to.
2. The classification task is called text classification, text categorisation, topic classification or topic spotting.
3. Naive Bayes is robust, meaning that it can be applied to many different learning problems and is unlikely to produce classifiers that fail catastrophically. It is a probabilistic learning method. To eliminate zeros, we use add-one or Laplace smoothing, which simply adds one to each count.
4. Positional independence: the conditional probabilities for a term are the same independent of position in the document.
5. Feature selection for multiple classifiers: it is desirable to select a single set of features instead of a different one for each classifier. Feature selection statistics are first computed separately for each class on the two-class classification task of c versus its complement, and then combined.
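The add-one smoothing in point 3 can be sketched as a multinomial Naive Bayes trainer and classifier (a minimal sketch; function names and the toy "china" corpus are illustrative, not from the chapter's code):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (class_label, token_list) pairs. Returns class priors
    and per-class conditional term probabilities with add-one smoothing."""
    class_counts = Counter(c for c, _ in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for c, tokens in docs:
        term_counts[c].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    condprob = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        # add-one (Laplace) smoothing: every term count gets +1,
        # so no conditional probability is ever zero
        condprob[c] = {t: (term_counts[c][t] + 1) / (total + len(vocab))
                       for t in vocab}
    return priors, condprob, vocab

def classify_nb(tokens, priors, condprob, vocab):
    """Pick the class with the highest log posterior; unseen terms are skipped."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for t in tokens:
            if t in vocab:
                score += math.log(condprob[c][t])
        scores[c] = score
    return max(scores, key=scores.get)

docs = [("china", ["chinese", "beijing", "chinese"]),
        ("china", ["chinese", "chinese", "shanghai"]),
        ("china", ["chinese", "macao"]),
        ("other", ["tokyo", "japan", "chinese"])]
priors, condprob, vocab = train_nb(docs)
label = classify_nb(["chinese", "chinese", "chinese", "tokyo", "japan"],
                    priors, condprob, vocab)  # → "china"
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied.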
Chapter 14. Vector space classification
Contiguity hypothesis: documents in the same class form a contiguous region, and regions of different classes do not overlap. kNN, or k nearest neighbour classification, assigns to a test document the majority class of its k nearest neighbours. kNN requires no explicit training and can use the unprocessed training set directly in classification.
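The kNN rule above can be sketched in a few lines (a minimal sketch using cosine similarity over dense vectors; the function names and toy vectors are illustrative):

```python
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(test_vec, training, k=3):
    """training: list of (label, vector) pairs. Take the k most similar
    training documents and return their majority class."""
    neighbours = sorted(training, key=lambda lv: cosine(test_vec, lv[1]),
                        reverse=True)[:k]
    return Counter(label for label, _ in neighbours).most_common(1)[0][0]

training = [("a", [1.0, 0.0]), ("a", [0.9, 0.1]),
            ("b", [0.0, 1.0]), ("b", [0.1, 0.9])]
label = knn_classify([1.0, 0.1], training, k=3)  # → "a"
```

Note that all the work happens at classification time, which matches the point that kNN needs no explicit training phase.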
Week13——Muddiest points(unit 12)
Unit 12 Intelligent information retrieval
No questions for this unit.
Friday, April 4, 2014
Week12——Reading notes(unit 11)
User Profiles for Personalized Information Access
1. User profiling refers to the techniques used for collecting information about users and for representing and building user profiles.
2. Collecting information about users: the information collected may be explicitly input by the user or implicitly gathered by a software agent, collected on the user's client machine or gathered by the application server itself.
3. User profile representations: keyword profiles and semantic network profiles.
4. User profile construction: building keyword profiles and semantic network profiles, then building concept profiles.
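The keyword profile in points 3 and 4 can be sketched as a weighted term vector aggregated from the documents a user has viewed (a minimal sketch; a real system would use tf-idf weights and a proper stopword list, and all names here are illustrative):

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and"}

def build_keyword_profile(viewed_docs):
    """Aggregate term frequencies across documents the user has viewed,
    dropping stopwords; the result is a simple keyword profile."""
    profile = Counter()
    for doc in viewed_docs:
        for term in doc.lower().split():
            if term not in STOPWORDS:
                profile[term] += 1
    return profile

profile = build_keyword_profile(["The Semantic Web",
                                 "semantic search engines"])
# profile["semantic"] == 2
```

This corresponds to implicit collection from point 2: the profile is built from observed behaviour rather than explicit user input.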
Content-Based Recommendation Systems
1. Systems recommend an item to a user based upon a description of the item and a profile of the user's interests.
2. Items should be stored in a database (item representation), along with information about user profiles. We then build a user model, capturing the user's preferences, from the user's history. Classification methods such as decision trees, which recursively partition the training data, can be applied, and relevance feedback is needed to evaluate accuracy.
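The matching step can be sketched as a profile-to-item similarity ranking (a minimal sketch using cosine similarity over sparse term-weight dicts; the decision-tree approach mentioned above would instead learn rules from labelled history, and all names and data here are illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user_profile, items, top_n=2):
    """Rank items by similarity between the user's profile and each
    item's description vector; return the top_n item names."""
    ranked = sorted(items, key=lambda name: cosine(user_profile, items[name]),
                    reverse=True)
    return ranked[:top_n]

user_profile = {"jazz": 3, "piano": 2}
items = {"album1": {"jazz": 1, "piano": 1},
         "album2": {"rock": 2},
         "album3": {"piano": 1}}
# recommend(user_profile, items) → ["album1", "album3"]
```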
Week12——Muddiest points
Unit 12: MLIA and Parallel IR
1. About bilingual dictionaries:
I wonder whether the MRD exists ready-made for different language pairs or is generated dynamically depending on the query, and where and how such dictionaries are stored. Another problem is how the system judges which meaning of a word applies in the context.
2. About bilingual corpora:
What are the differences between parallel corpora and comparable corpora?
3. About translation disambiguation:
How do we identify phrases, or segment words into phrases?