Thursday, August 24, 2006
What can you do with AOL data?
The AOL dataset holds very interesting trends and many possibilities, if it is ever permitted for research use. It contains timestamped user queries and, where present, the click-through results.
Here are a few ideas I can chew on:
- Study contextual session searches. Assume that within a single session the user is looking for related information (the time threshold that defines a session is still to be determined). We can then form concept sets that help narrow down contextual references. Think of it as an N-gram study spread over a number of days. By studying a specific user's searches over, say, a 10-day period rather than a single session, we can learn a lot about that user's short-term and somewhat longer-term search objectives, and so form a concept.
- Trends in user searching are also interesting to study from an information-seeking point of view. If we can understand the trend of a user's searches within a single session, or even over a longer period, we can learn a lot about that user's general searching behavior. Studying the search behavior of many users should reveal commonalities tied to each user's level of search expertise. We could then classify users into categories such as novice, expert, knows-a-bit, related-expert, etc. With such a classification in place, we could predict a new user's category from their real-time searches and assist them according to that class. For example, if the current user is determined to be a novice, the suggested queries should differ from those offered to an expert.
- I do not believe that click-through results fully indicate that the user accepted a search result. If the user accepted the result he/she clicked on, I would expect no further searches shortly after the click. Repeat searches mean one of two things. Either a) the clicked URL was already familiar to the user, so there was nothing new to find there, or b) the user found the clicked URL irrelevant to what he/she was looking for. The reasons why a user did not stay on the clicked URL may be many and unknown to us. What is clear is that the user is looking for something that has not yet been presented in the results. These patterns would be very interesting to study. The time between two click-throughs is a good indicator of the user's information-seeking behavior.
- Event mining is another possibility, and it is what Google Trends is trying to capture. Event mining is a general term and may refer to many temporal aspects of mining. One such idea is to look at time-based changes in a user's queries over a period of time. I agree that such results would be user-specific. So what can be achieved through such an analysis? I believe the time-based trends of a particular user also suggest which products that user is likely to buy. You may call this a half-baked idea, but it is worth noting nevertheless.
- If you classify user queries into labeled classes such as finance, sports, adult, education, etc., you can learn about user behavior from the combinations of classes a user searches in. Users who look for education, sports and music can be assumed to be young. Users searching in categories such as finance, real estate and stocks/shares are likely investors (big or small).
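The session idea in the first bullet depends on a time threshold that the post leaves open. Below is a minimal Python sketch under two assumptions. First, each user's log is a list of (timestamp, query) pairs. Second, a new session starts after a hypothetical 30-minute pause. The sketch splits the queries into sessions and collects the terms of each session as a rough concept set. The names `split_sessions`, `session_terms` and `SESSION_GAP` are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical inactivity threshold: the post leaves the session cut-off
# open, so 30 minutes is only an assumption.
SESSION_GAP = timedelta(minutes=30)

def split_sessions(queries, gap=SESSION_GAP):
    """Split one user's (timestamp, query) pairs into sessions.

    A new session starts whenever the pause since the previous query
    is longer than `gap`.
    """
    sessions = []
    last_time = None
    for ts, text in sorted(queries):
        if last_time is None or ts - last_time > gap:
            sessions.append([])
        sessions[-1].append(text)
        last_time = ts
    return sessions

def session_terms(session):
    """Collect the distinct query terms of one session as a rough concept set."""
    return {term for query in session for term in query.lower().split()}

log = [
    (datetime(2006, 3, 1, 10, 0), "jaguar speed"),
    (datetime(2006, 3, 1, 10, 5), "jaguar habitat"),
    (datetime(2006, 3, 1, 14, 0), "cheap flights"),
]
sessions = split_sessions(log)
print(len(sessions))                       # 2
print(sorted(session_terms(sessions[0])))  # ['habitat', 'jaguar', 'speed']
```

For the 10-day study, the same function could run on a log filtered to that window. Concept sets from successive sessions could then be compared.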
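The expertise bullet talks about predicting a new user's class from their searches. One simple technique, swapped in here for illustration and not taken from the post, is a nearest-centroid classifier. Average the features of users whose class is already known, then assign a new user to the closest average. The two features (mean words per query, share of queries using quotes or `site:`) and the example labels are hypothetical guesses. The post does not say which signals separate novices from experts.

```python
import math

def user_features(queries):
    """Return (mean words per query, share of queries using operators).

    Hypothetical features chosen only to illustrate the idea.
    """
    if not queries:
        return (0.0, 0.0)
    words = sum(len(q.split()) for q in queries) / len(queries)
    ops = sum(1 for q in queries if '"' in q or "site:" in q) / len(queries)
    return (words, ops)

def train_centroids(labeled_users):
    """Average the feature vectors of users whose class is already known.

    `labeled_users` maps a class name to a non-empty list of query lists.
    """
    centroids = {}
    for label, users in labeled_users.items():
        feats = [user_features(qs) for qs in users]
        centroids[label] = tuple(
            sum(f[i] for f in feats) / len(feats) for i in range(2)
        )
    return centroids

def predict(centroids, queries):
    """Assign a new user to the class whose centroid is nearest (Euclidean)."""
    x = user_features(queries)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

labeled = {
    "novice": [["how do i find cheap tickets"], ["what is the weather today"]],
    "expert": [['"poincare conjecture" site:arxiv.org'], ['perelman "ricci flow"']],
}
centroids = train_centroids(labeled)
print(predict(centroids, ['"search engine" site:edu']))  # expert
```

The features are not scaled, so query length dominates the distance. A real study would normalize the features and would need many more labeled users.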
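The click-through bullet suggests looking at what a user does right after clicking. Here is a rough sketch under three assumptions. First, each record is (user_id, timestamp, query, clicked_url), with `None` when nothing was clicked. Second, the timestamp on a click row approximates the click time. Third, a new query within a hypothetical 5-minute window signals that the clicked page did not satisfy the user. All names are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical window: a new query this soon after a click is read as
# "the clicked result did not satisfy the user".
REQUERY_WINDOW = timedelta(minutes=5)

def click_followups(records, window=REQUERY_WINDOW):
    """For every click, report the gap to the same user's next query.

    Returns (user_id, clicked_url, gap_or_None, quick_requery) tuples.
    """
    by_user = defaultdict(list)
    for user, ts, query, url in records:
        by_user[user].append((ts, query, url))

    results = []
    for user, rows in by_user.items():
        rows.sort(key=lambda r: r[0])  # order each user's rows by time
        for i, (ts, _query, url) in enumerate(rows):
            if url is None:
                continue  # no click on this row
            # The first row with a strictly later timestamp is the next search.
            later = [r[0] for r in rows[i + 1:] if r[0] > ts]
            gap = later[0] - ts if later else None
            results.append((user, url, gap, gap is not None and gap <= window))
    return results

t0 = datetime(2006, 3, 1, 9, 0)
records = [
    ("u1", t0, "laptop review", "http://example.com/a"),
    ("u1", t0 + timedelta(minutes=2), "laptop review battery", None),
    ("u2", t0, "weather boston", "http://example.com/b"),
]
followups = click_followups(records)
for row in followups:
    print(row)
```

The gaps this returns are also the raw material for the "time between two click-throughs" measure the bullet mentions.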
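For the event-mining bullet, a first step could be counting how often a term appears in queries on each day and flagging days with unusually high counts. This is a minimal sketch, assuming (timestamp, query) pairs and a naive whole-word match. The doubling `factor` is a hypothetical choice, and the average is taken only over days on which the term appears at all.

```python
from collections import Counter
from datetime import datetime

def daily_term_counts(queries, term):
    """Count, per calendar day, how many queries contain `term` as a word."""
    counts = Counter()
    for ts, text in queries:
        if term.lower() in text.lower().split():
            counts[ts.date()] += 1
    return dict(sorted(counts.items()))

def rising_days(counts, factor=2.0):
    """Flag days whose count is at least `factor` times the average day.

    The average covers only days that appear in `counts`.
    """
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return [day for day, n in counts.items() if n >= factor * mean]

log = [
    (datetime(2006, 3, 1, 9), "digital camera"),
    (datetime(2006, 3, 2, 9), "camera lens"),
    (datetime(2006, 3, 3, 9), "camera prices"),
    (datetime(2006, 3, 3, 10), "camera deals"),
    (datetime(2006, 3, 3, 11), "buy camera"),
    (datetime(2006, 3, 3, 12), "camera store"),
]
counts = daily_term_counts(log, "camera")
print(rising_days(counts))
```

Run on a single user's log, a burst like the one on March 3 is the kind of signal the post links to a likely purchase. That link is the post's conjecture, not something this code establishes.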
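The last bullet's idea of class combinations can be sketched with a toy keyword lexicon. Each query is mapped to categories, and a user is labeled when their searches cover every category in a rule. The lexicon below is entirely hypothetical, and a real study would need a proper query classifier. The two rules mirror the combinations named in the post.

```python
# Hypothetical keyword lexicon, only for illustration.
CATEGORY_KEYWORDS = {
    "finance": {"bank", "loan", "mortgage", "finance"},
    "sports": {"nba", "football", "soccer", "scores"},
    "education": {"college", "homework", "university", "exam"},
    "music": {"lyrics", "mp3", "band", "album"},
    "real estate": {"realtor", "homes", "condo", "rent"},
    "stocks": {"stock", "stocks", "shares", "nasdaq"},
}

# Category combinations from the post, mapped to a tentative user profile.
PROFILE_RULES = [
    ({"education", "sports", "music"}, "likely young"),
    ({"finance", "real estate", "stocks"}, "likely investor"),
]

def categorize(query):
    """Return every category whose keywords appear in the query."""
    words = set(query.lower().split())
    return {cat for cat, kws in CATEGORY_KEYWORDS.items() if words & kws}

def profile_user(queries):
    """Label a user when their queries cover all categories of a rule."""
    seen = set()
    for q in queries:
        seen |= categorize(q)
    return [label for required, label in PROFILE_RULES if required <= seen]

print(profile_user(["exam prep college", "nba scores", "coldplay lyrics"]))
# ['likely young']
```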
You and Your Research
I am currently reading this article on Dr. Finin's web page. It offers good insight into how to go about research. Some day I would like to work with Dr. Finin. :) Someday.
Read more at www.cs.umbc.edu/~finin/...
Wednesday, August 23, 2006
Poincaré conjecture and understanding the problem itself
I have been reading recently about Dr. Grigory Perelman and the Poincaré conjecture, but I am still not able to understand the problem itself. The problem is stated for 3-dimensional manifolds. The generalized conjecture for dimensions 5 and higher was already proved by Dr. Stephen Smale around 1960, and the 4-dimensional case by Michael Freedman in 1982.
This is what I have understood so far. A manifold, in the mathematical sense used here, is a space that looks like ordinary Euclidean space when you zoom in close to any point, even though its overall shape can be complicated. So a line or a circle is one-dimensional, and a plane or the surface of a ball is two-dimensional. I would presume that as the dimension increases, the space becomes more complicated. If that is the case, what makes 3-dimensional manifolds special? The Poincaré conjecture states that any closed, simply connected 3-dimensional manifold is homeomorphic to the standard 3-dimensional sphere. This is the 3-sphere, the three-dimensional analogue of the surface of an ordinary ball, which sits in 4-dimensional space. Homeomorphic means that one object can be turned into the other by continuous stretching and bending; a square and a circle are homeomorphic. Simply connected means that every loop drawn in the space can be shrunk down to a point. As I understand it, the problem is to prove that any such simply connected 3-dimensional object can be continuously stretched into the 3-sphere without ripping it. By comparison, you cannot transform the surface of a doughnut into a sphere without ripping it, because a loop around the doughnut's hole can never be shrunk to a point. Dr. Perelman has published 3 papers outlining how to prove this. They are available here, here and here. The Clay Mathematics Institute has laid out the Millennium Prize Problems, one of which is the Poincaré conjecture. Read more here.
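It may help to write the statement compactly in standard notation; the symbols here are the usual textbook ones, not the post's own. S^2 is the surface of an ordinary ball and S^3 is the 3-sphere. "Simply connected" is written as a trivial fundamental group, π₁(M) = 1, and ≅ means "is homeomorphic to".

```latex
S^2 = \{\, x \in \mathbb{R}^3 : \lVert x \rVert = 1 \,\}   % surface of an ordinary ball
S^3 = \{\, x \in \mathbb{R}^4 : \lVert x \rVert = 1 \,\}   % the 3-sphere
M \text{ a closed 3-manifold},\quad \pi_1(M) = 1 \;\Longrightarrow\; M \cong S^3
```

The jump from R^3 to R^4 in the definitions is why the "3-dimensional sphere" in the conjecture is hard to picture. It is not the ball we are used to seeing.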
Friday, August 11, 2006
bumps but no road blocks
Hello once again. Since the rejection at SIGIR, I have learnt a lot about research. In fact, every day I learn something new. There are two important pieces of news to write about. If you keep working hard, things eventually start falling into place. The first piece of news is that I recently published 2 papers at 2 different IEEE conferences. Both papers deal with using logic programs and non-monotonic reasoning to model a children's story, "who stole the bat", and to reason about actions.
The other good development is that I have submitted the paper rejected at SIGIR to another good conference, ICDE. Of course, I have tried my best to incorporate the feedback I received from SIGIR. The results now include the TREC dataset, with over 54,000 documents from 1994 Federal Register articles. Some comments regarding the clarity of the paper were also addressed in this version. I am hoping to get this work published at ICDE 2007.
I have something else to write about, but that will be a separate post.