Fetchez le Python

August 20, 2008

Atomisator, a framework to build custom RSS feeds

Filed under: plone, python, zope — Tarek Ziadé @ 4:56 pm

We are all overwhelmed by the amount of data in our feed readers. While this is unavoidable if you keep adding new feeds, those feeds could be automatically filtered and categorized to reduce the flow of data.

For a long time I have wanted to try out custom filters on my feeds, for example to find related entries by trying to understand the meaning of the posts with tools like NLTK.

So I needed a playground where I could experiment with feeds.

I think the closest existing tool is Yahoo Pipes, but as far as I know, the only way to create custom filters there is to run a web service and call it from Yahoo Pipes.

Anyway, I started coding a framework (at first it was an example for my latest book) whose principles are a lot like those of Yahoo Pipes. Of course there is no user interface yet, just a simple plugin-based tool that lets me combine my code snippets with feeds.

It is called Atomisator (see http://atomisator.ziade.org).

The big picture

The process is quite simple:

  1. Readers are plugins that know how to read a source and provide entries out of it.
  2. Filters are plugins that know how to remove unwanted entries, or enhance them (change their title, summary, etc.). They can be combined.
  3. The entries are then stored in a database. This avoids duplicates and keeps track of past entries.
  4. To create the feed, the entries are read back from the database.
  5. Enhancers are plugins that add extra information to entries, typically information that can't be stored, like Digg comments if the entry is detected on Digg, Google related searches, and so on.
  6. The feed is then generated.
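To make the six steps concrete, here is a minimal sketch of such a pipeline in plain Python. It is not Atomisator's code: the `Entry` class, the `run_pipeline` function and the idea of modelling every plugin as a simple callable are assumptions made for illustration, and an in-memory dict keyed by link stands in for the database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Entry:
    link: str
    title: str
    summary: str = ""
    extra: dict = field(default_factory=dict)  # filled in by enhancers


# Each plugin kind is just a callable here (hypothetical, not Atomisator's API).
Reader = Callable[[], Iterable[Entry]]
Filter = Callable[[Entry], Optional[Entry]]  # return None to drop the entry
Enhancer = Callable[[Entry], Entry]


def run_pipeline(readers: List[Reader], filters: List[Filter],
                 enhancers: List[Enhancer], store: Dict[str, Entry]) -> List[Entry]:
    # Steps 1-2: read every source and pass each entry through the filter chain.
    for read in readers:
        for entry in read():
            current: Optional[Entry] = entry
            for apply_filter in filters:
                current = apply_filter(current)
                if current is None:  # rejected by this filter
                    break
            # Step 3: store it; keying on the link keeps only the first copy.
            if current is not None:
                store.setdefault(current.link, current)
    # Step 4: read the entries back from the store.
    entries = list(store.values())
    # Step 5: enhancers add information that is not stored.
    for enhance in enhancers:
        entries = [enhance(e) for e in entries]
    # Step 6 would serialize `entries` into the output feed.
    return entries


# Hypothetical plugins used for the example.
def sample_reader() -> List[Entry]:
    return [Entry("http://a", "Python news"), Entry("http://b", "Cooking"),
            Entry("http://a", "Python news (again)")]


def python_only(entry: Entry) -> Optional[Entry]:
    return entry if "python" in entry.title.lower() else None


def tag_source(entry: Entry) -> Entry:
    # Return a copy so the stored entry stays untouched.
    return Entry(entry.link, entry.title, entry.summary,
                 {**entry.extra, "source": "sample"})
```

Calling `run_pipeline([sample_reader], [python_only], [tag_source], {})` keeps a single entry: the cooking post is filtered out and the second `http://a` entry is dropped as a duplicate. Enhancers return copies here because, as in step 5, what they add is not meant to be stored.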

Right now I am focusing on making it fast, which is not simple because the plugins can work on all the entries in the database.
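Steps 3 and 4 rely on a database, and the post only says that SQLite is required. Here is a minimal sketch of that storage with Python's built-in `sqlite3` module, assuming a single hypothetical `entries` table keyed on the entry link; Atomisator's real schema is not described in this post.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # The UNIQUE constraint on `link` is what rejects duplicate entries.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               id INTEGER PRIMARY KEY,
               link TEXT NOT NULL UNIQUE,
               title TEXT NOT NULL,
               summary TEXT NOT NULL DEFAULT ''
           )"""
    )
    return conn


def push_entries(conn: sqlite3.Connection, entries) -> int:
    """Insert (link, title, summary) tuples, silently skipping known links.

    Returns the number of rows actually inserted.
    """
    before = conn.total_changes
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT OR IGNORE INTO entries (link, title, summary) VALUES (?, ?, ?)",
            entries,
        )
    return conn.total_changes - before


def latest_entries(conn: sqlite3.Connection, limit: int = 20):
    """Read back the most recently inserted entries, newest first."""
    return conn.execute(
        "SELECT link, title, summary FROM entries ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

SQLite backs a UNIQUE column with an index, so the duplicate check does not need to scan the whole table as it grows. Filters that compare each entry with every stored entry remain the expensive part.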

It is at an early stage and undertested, but it kinda works. I pushed it to PyPI to see if it meets any interest. If it does, I will document how to write plugins.

Make sure you have SQLite installed, and give it a try:

$ easy_install atomisator.main
$ atomisator -c atomisator.cfg
$ atomisator

This creates an atomisator.xml feed. You can also add other feeds to atomisator.cfg and try them.
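For the last step, here is a minimal sketch of writing an RSS 2.0 document with the standard library's `xml.etree.ElementTree`. The `build_rss` name and the (link, title, summary) tuple layout are assumptions, and the actual structure of atomisator.xml may differ.

```python
import xml.etree.ElementTree as ET


def build_rss(title: str, link: str, description: str, entries) -> bytes:
    """Serialize (link, title, summary) tuples as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry_link, entry_title, summary in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry_title
        ET.SubElement(item, "link").text = entry_link
        ET.SubElement(item, "description").text = summary
    # xml_declaration= requires Python 3.8+; special characters are escaped for us.
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)
```

Writing the returned bytes to a file opened in binary mode (`open("atomisator.xml", "wb")`) produces a feed that any reader can subscribe to.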

With this environment in place, I can start trying out custom algorithms on my feeds.
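As an example of such a custom algorithm, here is a sketch of a filter that finds related entries. It replaces the NLTK-based approach mentioned earlier with a much cruder technique: Jaccard similarity between the keyword sets of two entries. The function names, the tiny stop-word list and the 0.3 threshold are all hypothetical choices.

```python
import re
from itertools import combinations

# A tiny stop-word list; a real filter might use NLTK's corpora instead.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "with"})


def keywords(text: str) -> frozenset:
    """Lower-cased words of `text`, without stop words."""
    return frozenset(w for w in re.findall(r"[a-z0-9]+", text.lower())
                     if w not in STOP_WORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pairs(entries: dict, threshold: float = 0.3) -> list:
    """Return (link1, link2, score) tuples for entries that overlap enough.

    `entries` maps an entry link to its text (title plus summary).
    """
    words = {link: keywords(text) for link, text in entries.items()}
    pairs = []
    for (l1, w1), (l2, w2) in combinations(words.items(), 2):
        score = jaccard(w1, w2)
        if score >= threshold:
            pairs.append((l1, l2, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

For instance, "Python packaging with setuptools" and "Packaging Python code with setuptools and eggs" share 3 of 5 keywords, a score of 0.6. Comparing every pair of entries is quadratic, which is exactly why plugins that see the whole database are hard to make fast.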

I’ve been told the name doesn’t sound right in English, but it does in French, so I’m keeping it ;)

7 Comments

  1. Hi Tarek.

    You probably don’t do Perl, but there is a huge framework in Perl for doing exactly what you’re aiming for. It’s called Plagger (http://plagger.org/trac) and it’s unbelievably powerful.

    Good luck with your own framework too!

    Comment by Ruben Fonseca — August 20, 2008 @ 6:12 pm

  2. This sounds a lot like Sam Ruby’s Venus project, a branch of the Planet Atom aggregator. Have you had a look at the Venus code to see if building on it would be easier than starting from scratch?

    http://intertwingly.net/code/venus/docs/index.html

    And you don’t even have to do Perl as Venus is written in Python. ;-)

    Comment by Dethe Elza — August 20, 2008 @ 9:27 pm

    Cool stuff! I’m working on some feed combining/filtering stuff in Python as well: http://www.openplans.org/projects/melkjug (demo at http://melkjug.org). We’re mostly focused on trying to bubble up articles that are good according to different measures. This framework looks like it could do some interesting stuff, and I’ll definitely be checking out what you’re up to.

    Comment by Luke Tucker — August 20, 2008 @ 9:41 pm

  4. @Ruben Fonseca, @Dethe Elza: Thanks a lot guys, I’ll check out those projects!

    @Luke: Wow! This project is really close to mine! There are some filters in yours I could even use in mine. You have a really advanced Web UI. That’s the next step I wanted to take in the framework, but I will first check out your project and see how you do things.

    Comment by Tarek Ziadé — August 21, 2008 @ 8:26 am

  5. [...] in atomisator, python tagged screenscraping at 7:50 pm by Tarek Ziadé I am writing a plugin for Atomisator that detects when a post is a Reddit or a Delicious entry, and add a sample from the page it links [...]

    Pingback by Atomisator, visiting links « Carpet Python — August 27, 2008 @ 7:50 pm

  6. Don’t underestimate the value of UI! The melkjug site is sweet in that regard (a bit slow on performance, didn’t get far into features, but super-easy to use…I also appreciate as a non-developer being able to test it without having to deal with eggs etc!).

    The Atomisator site seems to be down currently, fyi.

    Comment by cjj — December 15, 2008 @ 4:22 am

  7. @cjj: Right, the UI is very, very important. (The site is up again.)

    Comment by Tarek Ziadé — December 15, 2008 @ 4:46 am

