Aggregators, auto-discovery algorithms, and so on

I've been using SharpReader for a while now, and I've found it to be great. I decided to check out the [latest beta](javascript:showDisclaimer( 'download.aspx?FileGuid=a55cd046-b07b-4396-943e-1634ffc11a22' );) of RSS Bandit, which was posted over the weekend, and I have to say I'm extremely impressed with the progress they've made.

As Luke points out, message threading has been incorporated, and there are plenty of other goodies in there. Most notable is probably the multi-tabbed browsing, which is a great fit for the way blogs are typically read anyway.

I'm going to make a more concerted effort to keep on top of the various readers, because they're all doing a hell of a job.

There's also talk of introducing a formal plug-in architecture for RSS Bandit, for things like IBlogThis. Ideally, I could write a plug-in to abstract out the back end stores for my feed, while still using the slick front-end of these tools. My dream aggregator might come true after all :)

That being said, I'd also been curious as to how the auto-discovery algorithms worked.

The simplest and least flexible way, of course, is to try a whole bunch of "standard" locations based on the original URL: baseUrl + rss.aspx, baseUrl + rss.xml, and so on.
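
To make the "standard locations" idea concrete, here's a minimal Python sketch. This is not how SharpReader or RSS Bandit actually do it; the list of file names beyond rss.aspx and rss.xml is my own guess, and `candidate_feed_urls`, `first_feed`, and the `is_feed` callback are hypothetical names made up for illustration.

```python
from urllib.parse import urljoin

# "Standard" feed file names to try. The first two come from the text above;
# the others are assumptions added for illustration.
CANDIDATE_NAMES = ["rss.aspx", "rss.xml", "index.rss", "index.xml"]


def candidate_feed_urls(base_url, names=CANDIDATE_NAMES):
    """Build the list of URLs to probe, treating base_url as a directory."""
    if not base_url.endswith("/"):
        base_url += "/"
    return [urljoin(base_url, name) for name in names]


def first_feed(base_url, is_feed):
    """Return the first candidate URL that is_feed() accepts, or None.

    is_feed is a caller-supplied check (hypothetical), for example a function
    that fetches the URL and looks for an <rss> root element.
    """
    for url in candidate_feed_urls(base_url):
        if is_feed(url):
            return url
    return None
```

Calling `candidate_feed_urls("http://www.sellsbrothers.com/spout")` just builds the list of guesses; whether any of them actually exists is up to whatever check you pass in as `is_feed`. The inflexibility is obvious: if the feed lives anywhere not on the list, it is never found.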

Another approach might be to spider the HTML returned and look for any links to valid RSS feeds. If I put a link to someone else's RSS feed on my page, which is quite common, this makes things a little complicated. How does the application determine which is "the" feed and which is merely a linked feed? Does it guess? Does it run some comparison algorithm on the URL?
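
Here's a rough sketch of what a "comparison algorithm on the URL" could look like: collect every link on the page that looks like a feed, then prefer one on the same host as the page itself. This is purely my own guess at a heuristic, not anything I've found in an actual aggregator. The suffix list and the names `LinkCollector` and `guess_own_feed` are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Path endings that "look like" a feed. This list is an assumption.
FEED_SUFFIXES = (".rss", ".xml", "rss.aspx")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> and <link> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def guess_own_feed(page_url, html):
    """Guess which feed-looking link belongs to the page itself.

    Prefers the first feed link on the same host as page_url; falls back to
    the first feed link found anywhere, or None if there are none.
    """
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc.lower()
    # Resolve relative links against the page URL before comparing hosts.
    absolute = [urljoin(page_url, href) for href in collector.hrefs]
    feeds = [u for u in absolute
             if urlparse(u).path.lower().endswith(FEED_SUFFIXES)]
    same_host = [u for u in feeds if urlparse(u).netloc.lower() == page_host]
    if same_host:
        return same_host[0]
    return feeds[0] if feeds else None
```

Note the fallback: if the page links only to someone else's feed, this heuristic happily returns that one, which is exactly the kind of surprise you'd want to watch out for.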

Then, when I put http://www.sellsbrothers.com/spout into SharpReader, it "discovered" http://www.growinglifestyle.com/h92/index.rss. How the hell did we end up there?

I opened SharpReader up in Anakrino to see what it was trying to do. I never quite found the auto-discovery logic, so I'll take a closer look tonight, but I did notice a SubscriptionServer object that listens on TCP port 5335. I'm obviously not too familiar with the SharpReader architecture, but this makes me wonder if Luke is planning to support a remote subscription server sometime in the future. It seems like we could easily plug in a remote location for the SubscriptionServer instead of using the loopback IP.

Hmm....