Biomedical Document Clustering Using Ontology based Concept Weight
Abstract—Conventional document clustering techniques rely mainly on the presence of keywords and their number of occurrences. Most term-frequency-based clustering techniques treat documents as bags of words and ignore the important relationships between the words in a document. Phrase-based clustering techniques capture only the order in which words occur in a sentence, not the semantics behind them. Hence, a concept-based clustering technique is proposed in this paper. It uses the Medical Subject Headings (MeSH) ontology for concept extraction and calculates concept weights based on identity and synonymy relationships. The K-means algorithm is then used to cluster the documents based on semantic similarity, and the results are analyzed.
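
To make the pipeline described in the abstract concrete, the following is a minimal sketch rather than the paper's actual implementation. It assumes a tiny hand-written dictionary (TOY_ONTOLOGY) as a stand-in for MeSH, a concept weight equal to the count of the preferred term (identity) plus a weighted count of its synonyms, and K-means over unit-normalized concept vectors with a deterministic farthest-first initialization. The names TOY_ONTOLOGY, concept_vector, synonym_weight, and kmeans are hypothetical and chosen only for illustration; the paper's exact weighting formula is not given in the abstract.

```python
import math
import re

# Hypothetical toy stand-in for MeSH: concept -> (preferred term, synonyms).
# Real MeSH descriptors and entry terms would be loaded from the ontology.
TOY_ONTOLOGY = {
    "Neoplasms": ("neoplasms", ["cancer", "tumor"]),
    "Myocardial Infarction": ("myocardial infarction", ["heart attack"]),
    "Hypertension": ("hypertension", ["high blood pressure"]),
}


def count_term(term, text):
    """Count whole-word (or whole-phrase) occurrences of term in text."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text))


def concept_vector(text, ontology=TOY_ONTOLOGY, synonym_weight=1.0):
    """Map a document to a unit-length vector of concept weights.

    Assumed weighting: identity matches count 1, synonym matches count
    synonym_weight. This is an illustrative choice, not the paper's formula.
    """
    text = text.lower()
    vec = []
    for preferred, synonyms in ontology.values():
        weight = count_term(preferred, text)
        weight += synonym_weight * sum(count_term(s, text) for s in synonyms)
        vec.append(float(weight))
    norm = math.sqrt(sum(w * w for w in vec))
    return [w / norm for w in vec] if norm else vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means with deterministic farthest-first initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "The tumor was malignant; cancer cells spread.",
    "Neoplasms of the lung: a cancer study.",
    "Heart attack risk rises with hypertension.",
    "Myocardial infarction and high blood pressure.",
]
vectors = [concept_vector(d) for d in docs]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy example the third and fourth documents share no keywords, yet they map to the same concepts through synonymy and therefore fall into the same cluster, which is the behavior a purely term-frequency-based representation would miss.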