My Research Diary

Monday, December 13, 2004

Evolution of Information

A couple of points came up as feedback during my dissertation proposal presentation that I would like to discuss. The term "Information Evolution" is new and was introduced by Dr. Bayrak and me. This means the term has no prior significance, and we need to define it. Evolution is about never-ending change. We consider information to be evolutionary in nature because it changes its form and medium.
How exactly can such a phenomenon be justified? Well, it cannot be justified directly, because only the side effects of information evolution can be observed, not the actual changes. Changes in the structure of information are noticeable, but how does one measure the internal or intrinsic changes in the nature of information? We refer to the evolution of information from a semantic, or meaning, point of view. The meaning, or rather the implied meaning, of a particular piece of information changes with context. That is, a word might have one meaning in one context but a totally different meaning or sense in another.
The evolutionary behavior of words has been studied as the "Word Sense Disambiguation" problem in Natural Language Processing (NLP).
I suggest we use a multiple answer set paradigm to represent the multiple senses of a word; depending on the context, an attempt can then be made to disambiguate the actual sense in "that" context.
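To make the idea of picking a sense from context concrete, here is a minimal sketch in Python. It is not the answer set approach suggested above; it swaps in a much simpler, Lesk-style word-overlap heuristic purely for illustration. The function name `disambiguate`, the `BANK_SENSES` dictionary, and the glosses in it are all hypothetical.

```python
def disambiguate(word, context, sense_glosses):
    """Pick the sense whose gloss shares the most words with the context.

    Simplified Lesk-style overlap heuristic (an assumption for illustration,
    not the answer set method discussed in this post).
    """
    context_words = set(context.lower().split()) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical, hand-written glosses for two senses of "bank".
BANK_SENSES = {
    "financial": "institution that accepts deposits and lends money loan account",
    "river": "sloping land beside a body of water river stream shore",
}

money_sense = disambiguate(
    "bank", "she opened an account at the bank to get a loan", BANK_SENSES
)
water_sense = disambiguate(
    "bank", "we walked along the bank of the river", BANK_SENSES
)
print(money_sense, water_sense)
```

The same word gets a different sense label in each sentence, which is the behavior that any representation of multiple senses would have to support.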
The other issue raised concerns my statement in the proposal that "Information Evolution is a superset of Information Retrieval." In my view, IE is a superset of IR because IE can not only search for information effectively to perform IR-related tasks but can also reason about it. The main objection is that the IE approach does not retrieve documents more accurately than existing IR approaches. In my view, IE is supposed to produce better results than IR. The onus in IE is not just on the accuracy of the results but also on two other objectives:
1. Better reasoning about acquired knowledge/information
2. Better understanding (semantic) of the information acquired.
Now, in order to achieve both of these, we cannot store the information only in the form that is most efficient for retrieval; we also need to store the context, which means storing information that is unnecessary or unwanted from an IR point of view. How, then, can we produce better results than IR when the objectives here are multi-fold?
Dr. Xu asked me during my proposal presentation, "What is the rationale for using a sentence-based LSI approach instead of the existing word-based approach?" This is my response:
Dr. Bayrak and I are both of the opinion that the sentence preserves the true context of the information. Context is important because meaning changes from one context to another, depending on how a word is used and in what context or reference it is used. So the existing bag-of-words approach may yield better retrieval results, but at the expense of losing context-related information about the acquired knowledge. The only way to preserve it is to read the way the human mind does, in terms of sentences. So we preserve sentences in order to preserve context. If a word-by-documents matrix is formed as in existing approaches, word-based queries can be answered by using the cosine similarity of each document vector to the query vector. Instead, we form sentences and store the correlation delta value of each sentence in the matrix. This means we keep all the words but still form a sentence-by-documents matrix. This approach has two inherent advantages:
1. The sentence x docs matrix has considerably fewer rows than the word x docs matrix.
2. Unlike the word x docs matrix, the sentence x docs matrix still preserves the context of the information as it was presented to us.
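For reference, the existing word-based baseline mentioned above (a word-by-documents matrix queried by cosine similarity) can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: it uses raw term counts, skips the SVD step that LSI adds, and does not attempt the sentence-by-documents matrix or the correlation delta. The helper names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems would also strip punctuation.
    return text.lower().split()


def build_word_by_docs(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts word i in doc j."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted({w for c in counts for w in c})
    matrix = [[c[w] for c in counts] for w in vocab]
    return vocab, matrix


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query, docs):
    """Rank documents by cosine similarity of each document column to the query."""
    vocab, matrix = build_word_by_docs(docs)
    query_counts = Counter(tokenize(query))
    query_vec = [query_counts[w] for w in vocab]
    scores = []
    for j in range(len(docs)):
        doc_vec = [row[j] for row in matrix]  # column j of the matrix
        scores.append((cosine(query_vec, doc_vec), j))
    return sorted(scores, reverse=True)


docs = [
    "the bank approved the loan",
    "the river bank was flooded",
    "interest rates on the loan rose",
]
ranking = rank_documents("loan bank", docs)
for score, j in ranking:
    print(round(score, 3), docs[j])
```

Note that the second document scores above zero only because it contains the word "bank", even though it is about a river. This is the kind of lost context that the sentence-based approach is meant to address.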
I know some researchers will not agree with this philosophy, but I strongly believe it is right. Sentence-based LSA is the right approach. The only question that remains to be answered is whether the correlation delta is a true representation of the characteristic behavior of the implied text semantics. We will find out!!

# posted by Hemant Joshi : 12/13/2004 01:10:00 AM
