slashdot topic feeds
Matt was looking over my shoulder while I was reading feeds at the airport yesterday, and he noticed that I have a feed for Google-related posts at Slashdot. I told him I was scraping it together because Slashdot doesn't offer topic feeds (and I don't want to see everything at Slashdot), and Matt thought I should share the rss-generating love with the world. I agreed, and here we are.
Here's the script I'm using to scrape Slashdot. It's in Perl, and you'll need a couple of modules: LWP::Simple and XML::RSS::SimpleGen. Once they're installed, grab the code: slashfeed.pl.
You'll also need the numeric topic ID for any Slashdot topic you want to track. They're easy to find. Those big icons in any Slashdot post link to a topic page. Click on one of those, and look for a number in the URL. For example, the Slashdot Google Topic Page is here:
http://slashdot.org/search.pl?tid=217
Note the tid=217 in the URL. That's your Slashdot topic ID for posts about Google. You can browse the directory of all available Slashdot topics at the top of the Slashdot Search page.
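If you'd rather not eyeball the URL, the topic ID can be pulled out with a one-line regular expression. This is only a sketch: topic_id_from_url is a made-up helper name, not something from slashfeed.pl, and it assumes the ID always shows up as a tid= query parameter like it does in the example URL above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of slashfeed.pl): return the numeric
# topic ID from a Slashdot topic URL, or undef if there is no tid parameter.
sub topic_id_from_url {
    my ($url) = @_;
    # Assumes the ID appears as a "tid=<digits>" query parameter.
    return $url =~ /[?&;]tid=(\d+)/ ? $1 : undef;
}

print topic_id_from_url('http://slashdot.org/search.pl?tid=217'), "\n";
```

The number it prints is what you pass to the script on the command line.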
To generate an RSS feed full of Slashdot Google goodness, run the script from a command prompt, passing in a topic ID like this:
% perl slashfeed.pl 217
The script will spit out a file called slashdot_217.xml that contains the latest Google-related posts, RSS style. Just make sure the script saves this file to a publicly addressable web folder (you might need to tweak the output file path on line 55). The final URL should look something like:
http://example.com/feeds/slashfeed_217.xml
Throw your new URL in your feed reader, and run the script on a regular basis with cron or Windows Task Scheduler. That's all there is to building a topic-specific Slashdot feed.
Scraping is notoriously brittle, so if Slashdot changes their HTML, this script will break. If that happens, view source on the Slashdot topic page and rewrite the regular expressions on line 39 or so of the script. That's the only labor-intensive bit in this script.
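To give a feel for what that rewrite involves, here's a rough sketch of the scraping step on its own. The markup is invented for illustration (story links inside a div with class "title"), and extract_stories is a hypothetical name. Neither is taken from slashfeed.pl or from Slashdot's actual HTML, so expect to change the pattern to match whatever view source shows you.

```perl
use strict;
use warnings;

# Hypothetical sketch: pull (link, title) pairs out of a page.
# The pattern assumes each story is <div class="title"><a href="...">Title</a>,
# which is made-up markup, not Slashdot's real HTML.
sub extract_stories {
    my ($html) = @_;
    my @stories;
    # /s lets .*? span newlines; /g walks through every match in the page.
    while ($html =~ m{<div class="title">\s*<a href="([^"]+)"[^>]*>(.*?)</a>}sg) {
        push @stories, { link => $1, title => $2 };
    }
    return @stories;
}

# Placeholder sample page for trying the pattern offline.
my $sample = <<'HTML';
<div class="title"><a href="http://example.com/story/1">First story</a></div>
<div class="title">
  <a href="http://example.com/story/2">Second story</a>
</div>
HTML

for my $story (extract_stories($sample)) {
    print "$story->{title} => $story->{link}\n";
}
```

Testing the pattern against a saved copy of the page like this is quicker than re-running the whole script every time you tweak it. Each pair it finds would then become one item in the feed.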