
slashdot topic feeds

Matt was looking over my shoulder while I was reading feeds at the airport yesterday, and he noticed that I have a feed for Google-related posts at Slashdot. I told him I was scraping it together because Slashdot doesn't offer topic feeds (and I don't want to see everything at Slashdot), and Matt thought I should share the RSS-generating love with the world. I agreed, and here we are.

Here's the script I'm using to scrape Slashdot. It's in Perl, and you'll need a couple of modules: LWP::Simple and XML::RSS::SimpleGen. Once they're installed, grab the code: slashfeed.pl.

You'll also need the numeric topic ID for any Slashdot topic you want to track. They're easy to find. Those big icons in any Slashdot post link to a topic page. Click on one of those, and look for a number in the URL. For example, the Slashdot Google Topic Page is here:

http://slashdot.org/search.pl?tid=217

Note the tid=217 in the URL. That's your Slashdot topic ID for posts about Google. You can browse the directory of all available Slashdot topics at the top of the Slashdot Search page.
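If you'd rather pull the ID out of a pasted URL than eyeball it, here's a rough sketch of the idea. It's written in Python, not Perl, and it isn't part of slashfeed.pl; the function name topic_id is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def topic_id(url):
    """Return the numeric tid query parameter from a topic URL, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get("tid", [])
    # Only accept a plain run of digits, since topic IDs are numeric.
    if values and values[0].isdigit():
        return int(values[0])
    return None


print(topic_id("http://slashdot.org/search.pl?tid=217"))  # prints 217
```

A URL with no tid in it gives back None, so it's easy to spot a link that isn't a topic page.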

To generate an RSS feed full of Slashdot Google goodness, run the script from a command prompt, passing in a topic ID like this:

% perl slashfeed.pl 217

The script will spit out a file called slashdot_217.xml that contains the latest Google-related posts, RSS style. Just make sure the script saves this file to a publicly addressable web folder (you might need to tweak the output file path on line 55). The final URL should look something like:

http://example.com/feeds/slashdot_217.xml
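To make the "write an RSS file into a web folder" step concrete, here's a minimal sketch in Python. It is not the code in slashfeed.pl (which uses XML::RSS::SimpleGen); the function names build_feed and save_feed, the sample item, and the channel wording are all assumptions for illustration, and a temporary folder stands in for your public web folder.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def build_feed(topic, items):
    """Build a minimal RSS 2.0 document from (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Slashdot topic %d" % topic
    ET.SubElement(channel, "link").text = "http://slashdot.org/search.pl?tid=%d" % topic
    ET.SubElement(channel, "description").text = "Scraped posts for topic %d" % topic
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


def save_feed(folder, topic, xml_text):
    """Write the feed into folder, named after the topic ID, and return the path."""
    path = os.path.join(folder, "slashdot_%d.xml" % topic)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('<?xml version="1.0" encoding="UTF-8"?>\n' + xml_text)
    return path


# Hypothetical item; a real run would get these from the scraped topic page.
items = [("Example story", "http://example.com/story/1")]
# A temporary folder stands in for a publicly addressable web folder.
folder = tempfile.mkdtemp()
print(save_feed(folder, 217, build_feed(217, items)))
```

Swap the temporary folder for wherever your web server serves files from, and the file name maps straight onto the feed URL.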

Throw your new URL in your feed reader, and run the script on a regular basis with cron or Windows Task Scheduler. That's all there is to building a topic-specific Slashdot feed.

Scraping is notoriously brittle, so if Slashdot changes its HTML this script will break. If that happens, view source on the Slashdot topic page and rewrite the regular expressions on line 39 or so of the script. That's the only labor-intensive bit in this script.
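Here's a small sketch of why regex scraping breaks so easily, again in Python rather than the Perl in slashfeed.pl. The markup below is invented for the example and is not Slashdot's real HTML; the pattern and the name extract_stories are assumptions too.

```python
import re

# Hypothetical markup: this is NOT Slashdot's real HTML, just a stand-in.
SAMPLE_HTML = """
<div class="story"><a href="http://example.com/story/1">First story</a></div>
<div class="story"><a href="http://example.com/story/2">Second story</a></div>
"""

# The pattern is the brittle part: it is tied to the exact tags above.
STORY_PATTERN = re.compile(
    r'<div class="story"><a href="([^"]+)">([^<]+)</a></div>'
)


def extract_stories(html):
    """Return a list of (title, link) pairs found in the page source."""
    return [(title.strip(), link) for link, title in STORY_PATTERN.findall(html)]


stories = extract_stories(SAMPLE_HTML)
if not stories:
    # An empty result is the usual sign that the page markup has changed.
    print("No stories matched; the pattern probably needs updating.")
for title, link in stories:
    print(title, link)
```

If the site renamed that class or wrapped the link in another tag, the pattern would match nothing, which is exactly the moment to view source and rewrite the expression.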