My Research Diary

Monday, September 06, 2004

Take a break

I am enjoying the long Labor Day weekend right now. Well, I have a huge load of research work lined up. I think I need a break from the work, and I use long weekends (there aren't many) towards this.
Essentially, even research is modular work, just like professional projects, especially the prototyping part. I think of research work in 4 phases, and you can apply these 4 phases to each problem, sub-problem and sub sub-problem in your research. They are as follows:

Initial conceptualization/theory of what you want to do.
Some basic analysis/outline, followed by prototype development.
Working with the results of the experiments: mark all the different cases and leave no test case untested.
Writing a report on those experiences, which may or may not become a research paper.

My agenda for this long weekend is to work on 2 sub sub-problems:

I have to work out the conceptualization of the clustering problem in more detail. I am looking at clustering by random sampling or by the min-max algorithm. I will write more about this in the paragraphs below.
I have to work on the formula to calculate delta, i.e. the one I formulated after discussing it with Dr. Cox of the Mathematics department.

Various friends of mine told me that people in the Maths department would not be interested in helping me formulate the delta calculations, as they see no advantage (in terms of grants, knowledge, credit, etc.) in doing so. I, however, have personally received big help from Dr. Cox. She is highly approachable, and it does matter to an international student when a person smiles at you before you start talking to him or her. She has contributed to my research work so substantially that I feel I should put her name on my research report, with Dr. Bayrak's permission (of course). Some issues have been raised about my ability to write "proper English" in research work. Having studied in a vernacular-medium school, with English as my third language during my school days, I find it difficult to phrase sentences the way they are expected. I hope this will not be an issue if I keep writing and keep improving.

OK, coming to the min-max algorithm: I discussed this problem at length with Nitin (my school friend). I believe/propose that every document is meant to be in an individual cluster. Remember that clusters are supposed to group similar items together. But with multiple clusters come the concepts of inter-cluster distance and intra-cluster distance. If we have only 1 cluster for all documents, notice that the inter-cluster distance would be zero but the intra-cluster distance would be very high, which indicates that even dissimilar documents may have been grouped together by mistake. On the other hand, if we have 1 cluster per item to be classified, the intra-cluster distance is zero, but now the sum of distances from every cluster centre (referred to as a centroid) to the other cluster centres would be at its maximum. So we need a balance of both. The idea is to have no (or few) parameters driving the clustering and to determine the ideal clustering automatically by looking at both properties. But it is easier said than done. As I mentioned above, this is still conceptualization, and it needs finer details before I can build a prototype and run experiments. Let us see...!!
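To make the two extremes concrete, here is a minimal sketch that scores a given clustering on both properties. It assumes documents are already represented as numeric vectors (the 2-D points below are hypothetical stand-ins for real document vectors), uses Euclidean distance, and takes intra-cluster distance to mean the sum of each member's distance to its own centroid. Those choices are my assumptions for illustration, not the min-max algorithm itself; inter-cluster distance follows the description above, the sum of distances between every pair of centroids.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean vector of a non-empty list of equal-length tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dim))


def intra_cluster_distance(clusters):
    """Assumed definition: sum over all clusters of each member's
    Euclidean distance to its own cluster centroid."""
    total = 0.0
    for members in clusters:
        c = centroid(members)
        total += sum(math.dist(p, c) for p in members)
    return total


def inter_cluster_distance(clusters):
    """Sum of Euclidean distances between every pair of cluster centroids."""
    cents = [centroid(members) for members in clusters]
    return sum(math.dist(a, b) for a, b in combinations(cents, 2))


# Hypothetical "documents": two tight groups far apart from each other.
docs = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

partitions = {
    "one cluster for all": [docs],
    "one cluster per document": [[d] for d in docs],
    "two natural groups": [docs[:2], docs[2:]],
}

for name, clusters in partitions.items():
    print(name,
          round(intra_cluster_distance(clusters), 2),
          round(inter_cluster_distance(clusters), 2))
```

On this toy data, the single cluster has zero inter-cluster distance and a large intra-cluster distance, the one-per-document partition has zero intra-cluster distance and the largest inter-cluster total, and the two natural groups sit in between. Any rule for picking the "balanced" clustering automatically would still have to be worked out.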

# posted by Hemant Joshi : 9/06/2004 07:01:00 PM
