S. Dasgupta and V. Ng (2010) Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

S. Dasgupta and V. Ng (2010) "Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback", Journal of Artificial Intelligence Research, Volume 39, pages 581-632

doi:10.1613/jair.3003

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user's intention or having a human construct a feature space in an interactive manner during the clustering process. With the goal of reducing reliance on human knowledge for fine-tuning the similarity function or selecting the relevant features required by these approaches, we propose a novel active clustering algorithm, which allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly used sentiment datasets.
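The idea of choosing a clustering dimension by inspecting a few words can be illustrated with a small sketch. This is not the algorithm proposed in the paper, which the abstract does not specify; it is a hypothetical stand-in that uses a plain SVD of a centered document-term matrix to produce candidate two-way partitions and lists the highest-loading words on each side of every partition. The function name `candidate_dimensions`, its parameters, and the toy corpus are all assumptions made for illustration.

```python
import numpy as np


def candidate_dimensions(X, vocab, n_dims=2, n_words=2):
    """Propose candidate 2-way partitions of the documents.

    Illustrative assumption: each top singular direction of the centered
    document-term matrix is treated as one candidate "dimension". This is
    a stand-in technique, not the method from the paper.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center every word column
    # Rows of Vt are directions in word space, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)

    candidates = []
    for k in range(min(n_dims, Vt.shape[0])):
        scores = Xc @ Vt[k]        # projection of each document
        order = np.argsort(Vt[k])  # word indices, lowest loading first
        candidates.append({
            "labels": scores > 0,  # which side each document falls on
            "negative_words": [vocab[i] for i in order[:n_words]],
            "positive_words": [vocab[i] for i in order[::-1][:n_words]],
        })
    return candidates


# Toy corpus (hypothetical): two topics crossed with two sentiments.
vocab = ["plot", "actor", "battery", "screen", "great", "awful"]
X = [
    [1, 1, 0, 0, 1, 0],  # movie review, positive
    [1, 1, 0, 0, 0, 1],  # movie review, negative
    [0, 0, 1, 1, 1, 0],  # phone review, positive
    [0, 0, 1, 1, 0, 1],  # phone review, negative
]

candidates = candidate_dimensions(X, vocab)
for k, c in enumerate(candidates):
    # A user would read these word lists and pick the dimension she wants.
    print(k, c["negative_words"], "vs", c["positive_words"])
```

On this toy corpus the first candidate separates movie reviews from phone reviews, and the second separates positive from negative reviews, so a user interested in sentiment would pick the second after reading only a handful of words. The sign of each direction is arbitrary, so which word list is labeled "positive" carries no meaning.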
