Wednesday, March 09, 2022

Topic Modeling vs. Topic Classification

Topic Modeling is an unsupervised method of infering "topics" or classification tags within a cluster of documents. Whereas topic classification is a supervised ML approach wherein we define a list of topics and label a few documents with these topics for training. 

A wonderful article explaining the differences between both the approaches is here -

Some snippets from the above article:

"Topic modeling involves counting words and grouping similar word patterns to infer topics within unstructured data. By detecting patterns such as word frequency and distance between words, a topic model clusters feedback that is similar, and words and expressions that appear most often. With this information, you can quickly deduce what each set of texts are talking about.

 If you don’t have a lot of time to analyze texts, or you’re not looking for a fine-grained analysis and just want to figure out what topics a bunch of texts are talking about, you’ll probably be happy with a topic modeling algorithm.

However, if you have a list of predefined topics for a set of texts and want to label them automatically without having to read each one, as well as gain accurate insights, you’re better off using a topic classification algorithm."

No comments:

Post a Comment