Thursday, August 24, 2006
What can you do with AOL data?
The AOL dataset shows some very interesting trends and opens up many possibilities, provided its use for research purposes is permitted. The data contains timestamped user queries and click-through results (where present).
Here are a few ideas I can chew on:
- Study contextual session searches. If we assume that within a single session (time threshold to be determined) the user is looking for related information, we can form concept sets that should help narrow down the contextual references. Think of this as an N-gram study over a number of days. If we study a specific user's searches over, say, a 10-day period rather than in a single session, we can learn a lot about that user's short-term and somewhat longer-term search objectives, and thus form a concept.
- The trend in user searching is also interesting to study from an information-seeking point of view. If we can understand the trend of a user's searches within a single session, or even over a period of time, we can learn a lot about that user's general searching behavior. Studying the search behavior of multiple users will reveal commonalities that are specific to the users' level of search expertise. We would be able to classify users into categories such as novice, expert, knows-a-bit, related-expert, etc. Having done this, we can predict a new user's category from real-time searches and offer help suited to that class. If the current user is determined to be a novice, the suggested queries should differ from those offered to an expert.
- I do not believe that a click-through by itself indicates that the user accepted the search result. If the user accepted the result he/she clicked on, then I assume there should be no searches shortly after the click-through. If there are repeat searches, that means either a) the clicked URL is already familiar to the user and hence offers nothing new, or b) the user finds the clicked URL irrelevant to what he/she is looking for. The reasons why the user didn't stick around on the clicked URL may be many and unknown to us, but what is true is that the user is looking for something that has not yet been presented as a result. These patterns would be very interesting. The time between two click-throughs is a good indicator of the user's information-seeking behavior.
- Event mining is another possibility, one that Google Trends is trying to capitalize on. Event mining is a general term and may refer to many temporal aspects of mining. One such idea is to look at time-based changes in user queries over a period of time. I agree that such results would be user specific. So what can be achieved through such an analysis? I believe the time-based trends of particular users also suggest the products they are likely to buy. You can call this a half-baked idea, but nevertheless.
- If you classify user queries into labeled classes such as finance, sports, adult, education, etc., you can learn about user behavior from the combinations of classes a user searches in. For example, users who search for education, sports, and music might be assumed to be young, while users searching in categories such as finance, real estate, and stocks are likely investors (big or small).
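To make the session and click-through ideas a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the record layout (user id, query, timestamp, clicked URL or None), the 30-minute session gap, the 2-minute re-query window, and the function names are all hypothetical, not something defined by the AOL data or tuned against it.

```python
from datetime import datetime, timedelta

# Assumed thresholds, picked for illustration only.
SESSION_GAP = timedelta(minutes=30)    # idle time that ends a session
REQUERY_WINDOW = timedelta(minutes=2)  # "shortly after" a click-through


def split_sessions(records, gap=SESSION_GAP):
    """Group one user's records into sessions separated by idle gaps.

    Each record is a hypothetical tuple:
    (user_id, query, timestamp, clicked_url_or_None).
    """
    sessions = []
    for rec in sorted(records, key=lambda r: r[2]):
        # Extend the current session if this record is close enough in
        # time to the previous one; otherwise start a new session.
        if sessions and rec[2] - sessions[-1][-1][2] <= gap:
            sessions[-1].append(rec)
        else:
            sessions.append([rec])
    return sessions


def unsatisfied_clicks(session, window=REQUERY_WINDOW):
    """Return clicked records that were followed by another search
    within `window`, a rough sign that the click did not satisfy."""
    flagged = []
    for current, following in zip(session, session[1:]):
        clicked = current[3] is not None
        if clicked and following[2] - current[2] <= window:
            flagged.append(current)
    return flagged


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp string."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Made-up example log for a single user.
log = [
    ("u1", "cheap flights", ts("2006-03-01 10:00:00"), None),
    ("u1", "cheap flights to boston", ts("2006-03-01 10:01:10"),
     "http://example.com/a"),
    ("u1", "boston airfare deals", ts("2006-03-01 10:02:00"),
     "http://example.com/b"),
    ("u1", "red sox schedule", ts("2006-03-01 15:30:00"),
     "http://example.com/c"),
]

sessions = split_sessions(log)
flagged = unsatisfied_clicks(sessions[0])
```

With this made-up log, the first three queries fall into one session and the afternoon query starts a second one. The click on the second query gets flagged because another search follows it 50 seconds later. The queries grouped into each session are the raw material for the concept sets mentioned in the first idea, and both thresholds would need to be chosen by looking at the real data.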