
Monday, July 30, 2007

PhD is not enough 

Dr. Seker gave me this book as my graduation gift. "A PhD Is Not Enough" - the book is just fantastic. I have read almost three-quarters of it now and I keep wondering why I did not read it before. When I started reading, I initially thought it was too late for me since I had already graduated, but it is useful throughout and after the PhD. Is this survival guide for real? Only time will tell, but lots of things mentioned in this book seem to have happened to me. It is a common man's account of how things can go wrong and how they can be improved. It also gives good guidelines on how to make better decisions.

Thank you Dr. Seker for such a wonderful graduation gift.

That is all for today.

Friday, June 29, 2007

Microsoft RFP awards 

Last year, Microsoft announced access to their query logs and a possible grant of up to $50,000 for up to 10 groups that proposed innovative research ideas in the fields of query log analysis, machine learning, information retrieval, knowledge management, etc. I applied for this and did not get it. I was especially eager because of the sheer size of the query log data that was going to be made available for research. Well, these folks got it. I am well aware of Bruce Croft, ChengXiang Zhai, and their research work in this field, but I see a lot of unfamiliar names. I have got to keep myself updated on their research work.

Graduation 

I graduated on May 19, 2007, and the next afternoon my wife organized a grand party to celebrate. My parents came all the way from India, and so did my in-laws. I was happy to share this precious and proud moment with all my family members. My elder brother couldn't make it, and I missed him.

Now I look forward to my job at Acxiom Corporation's research labs. Acxiom was recently bought by two private equity firms, ValueAct Capital and Silver Lake, which are primarily investment companies that will probably take the company public again after making it even more profitable. I don't understand this very well, so I would rather not comment on it.

Tuesday, February 20, 2007

UCAIR (User-Centered Adaptive Information Retrieval) Project 

UIUC has a wonderful project supported by an NSF grant. This Firefox or IE toolbar re-ranks Google results based on what you click, which is personalization of web search results to some extent. The project is work done by students at UIUC under Dr. Zhai.

Read more at http://sifaka.cs.uiuc.edu/ir/ucair/
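The click-based re-ranking idea can be sketched in a few lines. This is only a toy illustration of the general idea, not the actual UCAIR algorithm (the function name and the "boost previously clicked domains" heuristic are my own assumptions):

```python
# Toy sketch of click-based personalization: results from domains the
# user has clicked before are moved ahead of the rest, preserving the
# original order within each group. Not the real UCAIR algorithm.
from urllib.parse import urlparse

def rerank(results, clicked_urls):
    """Re-rank a list of result URLs using previously clicked URLs."""
    clicked_domains = {urlparse(u).netloc for u in clicked_urls}
    boosted = [r for r in results if urlparse(r).netloc in clicked_domains]
    rest = [r for r in results if urlparse(r).netloc not in clicked_domains]
    return boosted + rest

results = ["http://a.com/1", "http://b.com/2", "http://a.com/3"]
# A prior click on b.com promotes the b.com result to the top.
print(rerank(results, ["http://b.com/x"]))
```

A real system would of course use richer signals than the domain alone, but even this crude heuristic shows how click history can reorder results per user.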


Thursday, August 24, 2006

What can you do with AOL data? 

There are very interesting trends and infinite possibilities in the AOL dataset, if it is permitted for research use. The AOL data contains timestamped user queries and click-through results (where present).

Here are a few ideas I can chew on:
  1. Study contextual session searches. If we assume that in a single session (with a time-threshold criterion to be determined) the user is looking for related information, we will be able to form concept sets that should help narrow down the contextual references. Consider this like an n-gram study over a number of days. If, say, we study a specific user's searches over a 10-day period rather than in a single session, we can learn a lot about that user's short-term and somewhat longer-term search objectives, thus forming a concept.
  2. The trend in user searching is also interesting to study from an information-seeking point of view. If we can understand the trend of a user's searches in a single session, or even over a period of time, we can learn a lot about that user's general searching behavior. Studying multiple users' search behavior will yield commonalities among users that are specific to each user's search expertise. We would be able to classify users into several categories like novice, expert, knows-a-bit, related-expert, etc. Having done this classification, we can predict a new user's category from their real-time searches and help them according to their class. If the current user is determined to be a novice, the suggested queries should be different from the ones for an expert user.
  3. I do not believe that click-through results completely indicate that the user accepted the search result. If the user accepted the result he/she clicked on, then I assume there should be no searches shortly after the click-through. If there are repeat searches, that means either a) the clicked URL is already familiar to the user and hence he/she does not have anything new to look for, or b) the user finds the clicked URL irrelevant to what he/she is looking for. The reasons why the user didn't stick around on the clicked URL may be many and unknown to us, but what is true is that the user is looking for something that has not yet been presented as a result. These patterns would be very interesting. The time between two click-throughs is a good indicator of the user's information-seeking behavior.
  4. Event mining is another possibility, one which Google Trends is trying to capitalize on. Event mining is a general term and may refer to many temporal aspects of mining. One such idea is to look at time-based changes in user queries over a period of time. I agree that such results would be user specific. So what can be achieved through such an analysis? I believe the time-based trends of a particular user also suggest the products they are likely to buy. You can call this a half-baked idea, but nevertheless.
  5. If you classify user queries into labeled classes such as finance, sports, adult, education, etc., you can learn user behavior from combinations of classes. Users who look for education, sports, and music are presumably young. Users searching in categories such as finance, real estate, and stock shares are likely investors (big or small).
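The session-splitting step behind idea 1 can be sketched quite simply. This is a minimal sketch under my own assumptions: the log rows are simplified to (user, query, timestamp) tuples, and the 30-minute gap threshold is just a placeholder for the "time threshold criteria to be determined" above:

```python
# Sketch of sessionization for a query log: split each user's
# time-ordered queries into sessions whenever the gap between two
# consecutive queries exceeds a threshold. The 30-minute threshold
# and the simplified (user, query, timestamp) row format are assumptions.
from datetime import datetime, timedelta
from itertools import groupby

SESSION_GAP = timedelta(minutes=30)

def sessionize(rows):
    """Return a list of sessions, each a list of query strings."""
    sessions = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # by user, then time
    for _, user_rows in groupby(ordered, key=lambda r: r[0]):
        current, prev_time = [], None
        for _, query, ts in user_rows:
            if prev_time is not None and ts - prev_time > SESSION_GAP:
                sessions.append(current)  # gap too large: close the session
                current = []
            current.append(query)
            prev_time = ts
        if current:
            sessions.append(current)
    return sessions

rows = [
    (1, "flowers", datetime(2006, 3, 1, 9, 0)),
    (1, "flower delivery", datetime(2006, 3, 1, 9, 5)),
    (1, "baseball scores", datetime(2006, 3, 1, 14, 0)),
]
print(sessionize(rows))  # two sessions: the morning pair, then the afternoon query
```

Once the log is broken into sessions like this, each session's queries can be pooled into the concept sets described in idea 1, or fed into the behavior classification of idea 2.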

You and Your Research 

I am currently reading this article on Dr. Finin's web page. It looks like a good insight into how to go about research. Some day I would like to work with Dr. Finin. :) Someday.

Read more at www.cs.umbc.edu/~finin/...


Wednesday, August 23, 2006

Poincaré conjecture and understanding the problem itself 

I have recently been reading about Dr. Grigory Perelman and about the Poincaré conjecture, but I am not able to understand the problem itself. The problem statement is the Poincaré conjecture for 3-dimensional manifolds. The analogous conjecture in dimensions five and higher was already solved by Dr. Stephen Smale in 1961, and the 4-dimensional case by Dr. Michael Freedman in 1982. This is what I have understood so far: a manifold in the mathematical sense here refers to a mathematical space that locally resembles Euclidean space, possibly with complex geometry. I would presume that as the dimensions increase, the space becomes more complicated. So a line would be one-dimensional and a plane would be two-dimensional. If that is the case, what makes 3-dimensional manifolds special?

The Poincaré conjecture states that any closed, simply connected 3-dimensional manifold is homeomorphic (related by continuous stretching and bending of the object into a new shape; thus, a square and a circle are homeomorphic) to the standard 3-dimensional sphere. I think the problem is to prove that any closed, simply connected 3-dimensional manifold can be continuously stretched to form the 3-dimensional sphere without ripping it. So you cannot transform a doughnut into a sphere without ripping it, but other such 3-dimensional shapes can be transformed into the 3-sphere. Dr. Perelman has published 3 papers outlining how to prove this. They are available here , here and here . The Clay Mathematics Institute has laid out the Millennium Prize problems, one of which is the Poincaré conjecture. Read more here .
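In symbols (my own shorthand, not notation taken from the papers), the conjecture can be stated compactly like this:

```latex
% Poincaré conjecture: every closed, simply connected 3-manifold
% is homeomorphic to the 3-sphere. Here \pi_1(M) = 0 encodes
% "simply connected" and \cong denotes homeomorphism.
\[
  M^3 \text{ closed},\ \pi_1(M^3) = 0
  \;\Longrightarrow\;
  M^3 \cong S^3
\]
```

The fundamental group $\pi_1$ being trivial is exactly the "every loop can be shrunk to a point" condition that the doughnut fails: a loop around the doughnut's hole cannot be contracted.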

Friday, August 11, 2006

Bumps but no roadblocks 

Hello once again. After the rejection at SIGIR, I have learned a lot about research. In fact, every day I learn something new. There are two important pieces of news to write about. If you keep working hard, things eventually start falling into place. The first piece of news is that I recently published two papers at two different IEEE conferences. Both papers deal with using logic programs and non-monotonic reasoning to model a children's story about "who stole the bat" and to reason about actions.

Another good development is that I have submitted the paper rejected at SIGIR to another good conference, ICDE. Of course, I have tried my best to incorporate the feedback I received from SIGIR. The results now include a TREC dataset with over 54,000 documents from 1994 Federal Register articles. Some comments regarding the clarity of the paper were also addressed in this version. I am hoping to get this work published at ICDE 2007.

I have to write about something else but that will be a separate post.

Monday, April 17, 2006

SIGIR paper rejected 

Hello there,
There are thousands of reasons why your paper may be rejected. There are very few why it should be accepted. For 2006, SIGIR accepted 74 of the 399 papers submitted this year, and my paper belongs to the 325 papers that were just not good enough for the standard SIGIR sets. I am happy about one thing, though: I got reviews from some of the best-known names in the information retrieval field.

My paper was rejected for basically three reasons:
1. The presentation of the paper was poor - the paper I have written is marred by poor language and a lack of good technical writing skills. It is not appealing for reviewers to read such work if they have to try hard to understand something. They read hundreds of papers all the time. The quality of a good paper lies not just in good work but also in a well-written presentation of that work.

2. The dataset I have been using so far is the Classic3 dataset from Cornell. It is a really tiny dataset and highly inappropriate, especially for matrix dimensionality reduction problems. A dataset used to demonstrate dimensionality reduction has to be considerable in size, like the TREC datasets.

3. This last point was made by Dr. Xu, a co-author of this paper, when he analyzed why it was rejected. According to him, in a conference of SIGIR's standard, if a paper merely has the problem defined properly and the solution algorithm/approach presented clearly along with experimental results, it is still a bad paper. Surprised? I was too. But then he explained that a good paper is one that not only explains the problem, the solution approach/algorithm, and the experiments, but also conducts a comparative analysis of the algorithm against existing approaches. It will also focus on the error bounds of the algorithm and conduct extensive experiments with many different datasets of different sizes and from different fields. That's a good paper.

Considering the three points above, I must admit I have a long way to go. I am disappointed but have not given up. Failure like this is part of the learning process. Being at a university like UALR, you learn through on-the-job training rather than the established, proven methodical approaches taught at big universities.

I have chosen this path, and I have to make it work with perseverance and hard work.
Bye for now.

Wednesday, March 29, 2006

Google Pages for basic homepage hosting... 

I have read a lot about Google Pages, where Google offers 100 MB of permanent space to host your homepage. I am not sure this would be a big help for commercial websites and webpages, but for someone small, it is probably a much easier way to host a personal homepage at http://pages.google.com You can check out mine at http://hemantmjoshi.googlepages.com
