T. Mahara Jothi, K. Thirumoorthy
This paper studies the web forum crawling problem, an important task in web applications such as web mining and search engines. Because millions of internet users contribute information to them every day, web forum sites have become valuable repositories of information on the web. As a result, mining knowledge from forum sites has become increasingly important. However, forum sites come in many different layouts and styles and are powered by different software packages, which makes forum crawling a tedious task. In addition, the large number of duplicate and uninformative pages on forum sites makes forum crawling inefficient. In this paper, various forum crawling techniques are discussed and compared.