Project

Topic Clustered RSS Reader


RSS Feeds

We have been experimenting with various clustering problems (small and large) for some time now. An interesting aspect of the clustering problem is identifying the correct number of clusters for a given dataset. This is particularly difficult when the dataset is dynamic.

To address this problem we use a variation of single-link clustering that we developed. We have found it to be reasonably accurate on small-scale clustering problems, and this project is an attempt to demonstrate and test the method. We chose RSS news as a test case because it lets us present the results concisely and effectively, and because the small size of the problem makes testing easier.
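For readers unfamiliar with single-link clustering, the following is a minimal sketch of the textbook technique, not the variation described above. It links any two texts whose similarity reaches a threshold and treats each connected group as a cluster, so the number of clusters falls out of the data rather than being fixed in advance. The Jaccard word-overlap measure, the threshold value of 0.3, and the function names are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def single_link_clusters(texts, threshold=0.3):
    """Group texts that are connected by a chain of links with
    similarity >= threshold (illustrative threshold, not from the source)."""
    tokens = [set(t.lower().split()) for t in texts]
    parent = list(range(len(texts)))  # union-find forest, one node per text

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single-link rule: one sufficiently similar pair is enough to merge
    # the two clusters that contain the pair.
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling single_link_clusters on a list of headline strings returns lists of indices into that list, one list per cluster. Because a single link is enough to merge, two stories can end up together through an intermediate story even if they share no words directly.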

The RSS News Feed Clustering demo is implemented as a Yahoo! Widgets desktop widget, so you will need to download the widget engine from widgets.yahoo.com if you have not already done so.

This widget presents a collection of RSS news feeds in clustered form. We selected a few news sections (Top Stories, World, Sports) and hand-picked a number of RSS feed sources for each. At regular intervals we fetch the feeds for each section and parse them to extract stories. We then cluster those stories using the single-link variation mentioned above and present the resulting "clusters" of stories to the user. Note that a story consists only of the title and description provided in the feed, not the complete story from the web.
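As an illustration of the parsing step, the sketch below builds one story string per feed item from its title and description only. It is not the widget's code: the sample feed is made up, the function name is hypothetical, and it assumes RSS 2.0 items without XML namespaces.

```python
import xml.etree.ElementTree as ET

# Made-up feed used only for illustration; a real feed would be fetched
# from one of the hand-picked sources.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First headline</title>
      <description>Short summary one.</description>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def stories_from_feed(xml_text):
    """Return one text string per <item>, using title and description only."""
    root = ET.fromstring(xml_text)
    stories = []
    for item in root.iter("item"):
        # findtext returns None when the element is missing, so fall back to "".
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        stories.append(f"{title} {description}".strip())
    return stories
```

The resulting strings are what a clustering step would compare. An item with no description still yields a story made of its title alone.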

Project Link: http://research.yahoo.com/rnc/YahooRSSNewsClusters.widget