Monday, September 06, 2004
Take a break
I am enjoying the long weekend for Labor Day right now. Well, I have a huge load of research work lined up. I think I need a break from the work. I use long weekends (there aren't many) for this.
Essentially, even research is modular work, just like professional projects, especially the prototyping part. I think of research work as having 4 phases. You can have these 4 phases for each problem, sub-problem and sub-sub-problem in your research work. They are as follows:
- Initial conceptualization/theory of what you want to do.
- Some basic analysis/outline, followed by prototype development.
- Work with the results of the experiments. Mark all the different cases and leave no test case untested.
- Write a report on those experiences, which may or may not become a research paper.
As for what I have lined up right now:
- I have to work on the conceptualization in more detail for the clustering problem. I am looking at clustering by random sampling or the min-max algorithm. I will write about this in the coming paragraphs.
- I have to work on the formula to calculate delta, i.e. the one I formulated after a discussion with Dr. Cox of the Mathematics department.
Ok, coming to the Min-Max algorithm: I talked to Nitin (my school friend) and discussed this problem at length. I believe/propose that all documents are meant to be in individual clusters. Remember that clusters are supposed to have similar items clubbed together. But with different clusters come the concepts of inter-cluster distance and intra-cluster distance. If we have only 1 cluster for all documents, the inter-cluster distance would be zero but the intra-cluster distance would be really high, which is an indicator that even dissimilar documents may have been clubbed together by mistake. On the other hand, if we have 1 cluster per item to be classified, then we are looking at zero intra-cluster distance, but now the sum of distances from every cluster centre (referred to as a centroid) to the other cluster centres would be at its maximum. So we need a balance of both. The idea is to have no or few parameters driving the clustering and to automatically determine the ideal clustering by looking at both properties. But it is easier said than done. As I mentioned above, this is conceptualization, and it now needs finer details before I can build a prototype and run experiments. Let us see...!!
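To make the two extremes concrete, here is a rough sketch of the two measures. It is not the Min-Max algorithm itself, only the distances it would have to balance. Everything in it is an assumption for illustration: documents are stood in for by made-up 2-D points, distance is plain Euclidean, intra-cluster distance is taken as the sum of each item's distance to its own centroid, and the function names are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def intra_cluster_distance(clusters):
    """Sum of distances from every item to the centroid of its own cluster."""
    total = 0.0
    for members in clusters:
        c = centroid(members)
        total += sum(math.dist(p, c) for p in members)
    return total


def inter_cluster_distance(clusters):
    """Sum of distances between every pair of cluster centroids."""
    centroids = [centroid(members) for members in clusters]
    return sum(math.dist(a, b) for a, b in combinations(centroids, 2))


# Toy "documents" (assumed data): two tight pairs that sit far apart.
docs = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]

one_big = [docs]                      # a single cluster for everything
singletons = [[d] for d in docs]      # one cluster per item
two_groups = [docs[:2], docs[2:]]     # a middle ground

for name, clustering in [("one cluster", one_big),
                         ("one per item", singletons),
                         ("two groups", two_groups)]:
    intra = intra_cluster_distance(clustering)
    inter = inter_cluster_distance(clustering)
    print(f"{name}: intra={intra:.3f} inter={inter:.3f}")
```

On this toy data the single big cluster has zero inter-cluster distance and the largest intra-cluster distance, while one cluster per item has zero intra-cluster distance and the largest sum of centroid-to-centroid distances. The open question from above remains how to pick the clustering in between without hand-tuned parameters.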