01.30.08

Buildout: plone.org and pypi.python.org are acting as SPOFs

Posted in plone, python, windows, zope at 9:20 am by Tarek Ziadé

Yesterday, plone.org was moved to another server. It was a horrible day for those of our people who didn’t have a local cache of eggs to build their instances: plone.org was acting as a Single Point Of Failure (SPOF) for some packages.

A few developers working under Windows were even getting “permission denied” errors on their buildouts, because when a package download fails, the partial file is not correctly overwritten before a new attempt (I guess I need to add a ticket about this in the setuptools tracker).

Anyway, we decided to create a mirror here to avoid such problems (I am building it at http://release.ingeniweb.com/ this morning, hopefully).

This makes me think that zc.buildout should introduce a high-level mirror mechanism in the find-links variable, letting you explicitly provide a list of mirrors. It could look like this:

find-links =
  http://pypi.python.org/simple |  http://release.ingeniweb.com/pypi-mirror
  http://dist.plone.org |  http://release.ingeniweb.com/plone-dist-mirror

Buildout could then switch the find-links values it sends to setuptools whenever a primary URL is down, detecting it by attempting a simple call with a timeout.
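Here is a minimal Python 3 sketch of that idea, assuming the `primary | mirror` syntax proposed above (zc.buildout does not support it today). `parse_find_links`, `is_up` and `resolve_find_links` are hypothetical helper names; the availability check is a plain HTTP request with a timeout, and the `check` argument lets you plug in something else (or a fake, for testing).

```python
from http.client import HTTPException
from urllib.request import urlopen


def parse_find_links(value):
    """Turn 'primary | mirror | ...' lines into lists of candidate URLs."""
    groups = []
    for line in value.splitlines():
        urls = [url.strip() for url in line.split('|') if url.strip()]
        if urls:
            groups.append(urls)
    return groups


def is_up(url, timeout=5.0):
    """Return True if a simple GET on `url` answers within `timeout` seconds."""
    try:
        urlopen(url, timeout=timeout).close()
        return True
    except (OSError, ValueError, HTTPException):  # URLError and timeouts are OSErrors
        return False


def resolve_find_links(value, check=is_up):
    """Pick, for each line, the first URL that answers; keep the primary
    URL if none does, so setuptools reports its usual error."""
    resolved = []
    for candidates in parse_find_links(value):
        for url in candidates:
            if check(url):
                resolved.append(url)
                break
        else:
            resolved.append(candidates[0])
    return resolved
```

With the find-links value above, `resolve_find_links(value)` would return the PyPI and dist.plone.org URLs when they answer, and the ingeniweb mirrors otherwise.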

01.27.08

Snow sprint report #4 : a new command in zc.buildout + a context-free grammar text generator

Posted in plone, python, zope at 11:06 am by Tarek Ziadé

So this was the last sprint day here at the snowsprint, and a lot of work was done to wrap up some of the tasks. On my side, I worked on two topics:

  • adding a new describe command to zc.buildout
  • coding a random-text generator library

Adding a “describe” command to zc.buildout

Godefroid came up with a nice idea about buildouts: when you are working with a recipe, it’s sometimes hard to figure out which options it takes, which ones are optional, what the defaults are, etc.

This means you have to dig into the code or go to the PyPI page, which will hopefully give you the information if the long_description variable was fed with some reStructuredText (see iw.recipe.pound for example).

That means a lot of context switching for the developer, so the idea of the new describe command is to be able to query the help of a given recipe. This help is displayed directly, as long as the recipe author fills in the Recipe class docstring. We checked with Jim that this would be a good idea, since he wants (rightly) to keep recipes as simple as possible (basically, any class with install and update methods). He liked the idea, so we started coding it.
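For reference, here is roughly what such a recipe looks like, sketched for a hypothetical `my.recipes` package: zc.buildout instantiates the class with `(buildout, name, options)` and calls `install()` or `update()`, and the class docstring is exactly what describe would print. The `greeting` option is made up to show where a documented default could live.

```python
import inspect


class Recipe(object):
    """The coolest recipe on Earth.
    Ever.
    """

    def __init__(self, buildout, name, options):
        # buildout: the whole configuration; name: this part's name;
        # options: this part's section, as a mapping
        self.buildout = buildout
        self.name = name
        self.options = options
        # hypothetical option with a default, worth documenting above
        self.options.setdefault('greeting', 'hello')

    def install(self):
        # A real recipe creates files here and returns their paths,
        # so buildout can remove them when the part is uninstalled.
        return ()

    def update(self):
        # Called when the part is already installed and its options
        # did not change.
        return ()


# What describe would show for this recipe:
print(inspect.getdoc(Recipe))
```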

So basically, the command is called like this:

    $ bin/buildout describe my.recipes
    my.recipes
        The coolest recipe on Earth.
        Ever.

It deals with recipe versions and takes care of multiple entry points:

    $ bin/buildout describe my.recipes:default my.recipes:second
    my.recipes:default
        The coolest recipe on Earth.
        Ever.
    my.recipes:second
        No description available

This feature looks quite simple, but was a bit tricky to implement, since we had to parse the working set of the current buildout to extract the information. The version section is also taken into account.
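The lookup and formatting logic can be sketched as follows. This is not the code from our branch: `describe()` and its `load(dist, entry)` callback are hypothetical, standing in for the part that walks the buildout working set and resolves entry points (in a real buildout, recipes are registered under the `zc.buildout` entry-point group, with `default` as the default name).

```python
import inspect
import textwrap


def split_spec(spec):
    """'my.recipes:second' -> ('my.recipes', 'second'); no entry means 'default'."""
    dist, _, entry = spec.partition(':')
    return dist, entry or 'default'


def describe(specs, load):
    """Build describe-style output for recipe specs.

    `load(dist, entry)` is a hypothetical callback returning the recipe class.
    """
    lines = []
    for spec in specs:
        recipe = load(*split_spec(spec))
        doc = recipe.__doc__
        text = inspect.cleandoc(doc) if doc else 'No description available'
        lines.append(spec)
        lines.append(textwrap.indent(text, '    '))
    return '\n'.join(lines)
```

With a registry mapping `('my.recipes', 'default')` and `('my.recipes', 'second')` to two classes, only the first of which has a docstring, this reproduces the output shown above.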

All that work we did together with Godefroid and Dokai is in a branch, waiting for Jim’s feedback.

Coding a random-text generator

When we worked on benchmarking Solr versus the plain catalog on the indexing task, we created a small script to generate random text, based on a Chomsky-style algorithm. We were really excited about going deeper into this topic, and both Dokai and I worked on some generators. On my side, I wrote a Python port of nonsense, and the results were pretty interesting.

Anyway, for the last day we started a fun task with Dokai and Ethan: writing a random-text generator library and a Grok-based web app on top of it. I worked on the core part, and we came up with a cool command-line script that generates random text, given a file that provides the structure of the sentences and, for each part of a sentence, a list of choices.

The command builds the sentences by picking the choices randomly. For example, this file adapted from nonsense (extract):

[gibberish]

default =
    ${course}

name = college

announcement =
    The ${university} class “${course}” has been cancelled due to lack of interest.
    Starting next year, incoming freshman at ${university} will be required to take “${course}.”
    “${course}” will no longer be offered at ${university} due to lack of interest.
    Due to overwhelming popularity, an additional section of “${course}” will be offered at ${university} next semester.
    Not one single student signed up for ${university}’s “${course}” last semester.

course =
    ${adjective} ${noun} ${suffix}
    ${adjective} ${noun}: ${ending}
    ${adjective} ${noun} And ${adjective} ${noun} ${suffix}
    ${noun} & ${noun} ${suffix}
    ${group1} ${group2} ${life} ${suffix}
    ${group2} ${noun} ${life} ${suffix}
    ${group1} ${group2} ${life} Since {#1800-1970}
    ${group2} ${life}: ${ending}

event =
    The African Diaspora
    The Harlem Renaissance
    The Civil Rights Movement
    The Italian Renaissance
    Westward Expansion
    Manifest Destiny
    Women’s Suffrage
    World War I
    World War II
    The War Of 1812
    The American Revolution
    The French Revolution
    The Russian Revolution
    The American Civil War
    The Spanish-American War
    The Franco-Prussian War
    The JFK Assassination

action =
    Basketweaving
    Aquatic Ballet
    Synchronized Swimming
    Professional Sports
    The ${adjective} Pottery Experience
    Home Economics
    Cardplaying
    Birdwatching

noun =
    Diversity
    Globalism
 …

will generate random, domain-specific text. A Grok application has been built on top of this, allowing dynamic creation of such files and online text generation. Check out Dokai’s blog during the week, as he will present the Grok part. The code is in a Git repo here: http://repo.or.cz/w/gibberis.ch.git
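To give an idea of how the expansion works, here is a small, self-contained Python 3 sketch (not the actual gibberis.ch code): it reads the file with `configparser`, treats every non-empty line of a key as one choice, and recursively replaces each `${name}` with a random choice for that key. Reading `{#1800-1970}` as “a random number in that range” is an assumption based on the example.

```python
import configparser
import random
import re

REFERENCE = re.compile(r'\$\{(\w+)\}')
# Assumption: {#1800-1970} means "a random integer between 1800 and 1970".
NUMBER_RANGE = re.compile(r'\{#(\d+)-(\d+)\}')


def load_grammar(text, section='gibberish'):
    """Map each key of `section` to its list of choices (one per line)."""
    parser = configparser.RawConfigParser()  # no %-interpolation
    parser.read_string(text)
    return {key: [line.strip() for line in value.splitlines() if line.strip()]
            for key, value in parser.items(section)}


def expand(grammar, symbol='default', rng=random, depth=0):
    """Pick a random choice for `symbol` and expand its ${...} references."""
    if depth > 50:
        raise RecursionError('grammar recursion too deep at %r' % symbol)
    template = rng.choice(grammar[symbol])
    text = REFERENCE.sub(
        lambda match: expand(grammar, match.group(1), rng, depth + 1), template)
    return NUMBER_RANGE.sub(
        lambda match: str(rng.randint(int(match.group(1)), int(match.group(2)))),
        text)
```

Every key referenced with `${...}` must be defined in the file (the extract above leaves several out); then `expand(load_grammar(text))` returns one random sentence built from `default`.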

Thank you Lovely Systems

Thanks to the Lovely team for this sprint!

01.22.08

Snow sprint report #3 : ZopeSkel refactoring

Posted in plone, python, zope at 9:27 pm by Tarek Ziadé

Today, while the Zope instances we prepared for the benchmarks were struggling with being fed 40,000 documents, I worked on a small task I had wanted to do for some time: refactoring ZopeSkel a bit and adding some tests to it.

I split the template objects into separate modules, and added a doctest for each one that runs the template. This prevents templates from breaking silently, because when you work on one, breakage is not obvious. As a matter of fact, while doing this I found one broken template, so I think this is going to be useful to prevent regressions.

If you work on this package and change or add a template, please, pretty please, run the following to make sure nothing is broken:

$ cd ZopeSkel
$ python bootstrap.py
$ bin/buildout
$ bin/test

Also, the important thing from now on is to keep accurate doctests in the docs/ folder. These files are simple to write, since I have added primitives to simplify the work (well, I reused what Gael did in our skels at Ingeniweb).

For instance, let’s have a look at the recipe doctest:

http://dev.plone.org/collective/browser/ZopeSkel/trunk/zopeskel/docs/recipe.txt

It actually runs paster on the template, then runs the freshly created recipe’s own tests.
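If you want to wire such files into a test runner yourself, a minimal sketch looks like this. It assumes a plain `docs/` directory of `.txt` doctests and uses the standard library’s `doctest.DocFileSuite`; `docs_suite` is a hypothetical helper name, and the primitives we reused from Gael’s skels are not shown. With a zope.testing-style runner such as `bin/test`, you would typically return the suite from a module-level `test_suite()` function.

```python
import doctest
import glob
import os
import unittest


def docs_suite(docs_dir, setUp=None, tearDown=None):
    """Collect every docs_dir/*.txt file as one doctest case."""
    flags = doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE
    paths = sorted(glob.glob(os.path.join(docs_dir, '*.txt')))
    return unittest.TestSuite(
        doctest.DocFileSuite(path, module_relative=False,
                             setUp=setUp, tearDown=tearDown,
                             optionflags=flags)
        for path in paths)


def test_suite():
    # zope.testing-style hook, assuming docs/ sits next to this module
    here = os.path.dirname(os.path.abspath(__file__))
    return docs_suite(os.path.join(here, 'docs'))
```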

The next moves on ZopeSkel I can think of would be to:

  • write more detailed doctests. Each one of them could become a recipe on how to use the given template;
  • gather all the doctests in a top-level document, to provide detailed documentation in ZopeSkel’s main README.txt file, which is parsed and displayed at PyPI.
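For the second point, one common way to do it is to build `long_description` in `setup.py` by concatenating README.txt with the doctests. The sketch below assumes a layout of `README.txt` plus `docs/*.txt`; `build_long_description` is a hypothetical helper, and the generated section titles are just one possible choice.

```python
import glob
import os


def build_long_description(root):
    """Concatenate README.txt and every docs/*.txt doctest under `root`
    into one reStructuredText document."""
    with open(os.path.join(root, 'README.txt')) as f:
        parts = [f.read()]
    for path in sorted(glob.glob(os.path.join(root, 'docs', '*.txt'))):
        name = os.path.splitext(os.path.basename(path))[0]
        title = 'Template: %s' % name
        with open(path) as f:
            parts.append('%s\n%s\n\n%s' % (title, '=' * len(title), f.read()))
    return '\n\n'.join(parts)


# In setup.py (hypothetical usage):
# here = os.path.dirname(os.path.abspath(__file__))
# setup(..., long_description=build_long_description(here))
```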

ZopeSkel is a very important product in my opinion, because it instills a standard way of writing Plone code in the community. On that note, the recipe template I added a while ago was improved here at the sprint; check Dokai’s blog about this. He’s writing a wrap-up about it right now ;)

01.21.08

Snow sprint report #2 : benchmarking

Posted in plone, python, zope at 2:22 pm by Tarek Ziadé

EDIT: The Chomsky generator was somewhat limited and created very similar documents. Dokai worked on another text generator that produces more varied documents. It is fed with source text files and combines random text that reads quite nicely; check it out! (Same place, but the function is called random_text(); I have updated the code extract as well.)

Dokai and Tom are working hard on the best way to hook the regular catalog to the Solr utility. I was a bit on the sidelines of this task, so I haven’t caught up with it yet.

Anyway, I have prepared the ground to compare a pure Plone 3 instance with a Solr-enabled one. I wanted to generate a Plone instance with many documents whose content would look realistic.

I found on ASPN a great recipe for a Chomsky-based random text generator: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440546

So I have bundled it in a script that can be used to generate Plone folders with documents in them. When Dokai and Tom’s work is ready, we will use this script to load several thousand documents into the catalogs and start a few benchmarks.

Here’s the script (used in an Extension, but straightforward to bundle in a class); you can also download it from here.

""" Generates documents with realistic content,
    with a Chomsky random generator
    taken here : http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440546
"""
from Products.CMFCore.utils import getToolByName
import logging
import transaction

leadins = """bunch of lines"""
subjects = """bunch of lines"""
verbs = """bunch of lines""" objects = """bunch of lines"""
import textwrap, random
from itertools import chain, islice, izip

def chomsky(times=1):
    """Chomsky method of generating random text."""
    return ' '.join(chain(random.choice(part).strip()
                          for part
                          in (leadins, subjects, verbs, objects)
                          for i in xrange(times)))
def gen_documents(context, folder, numdocs=10, root_name='doc_'):

    wftool = getToolByName(context, 'portal_workflow')
    for i in range(numdocs):
        root = i
        id_ = root_name + str(root)
        while id_ in folder.objectIds():
            root += 1
            id_ = root_name + str(root)
        desc = chomsky(5)
        title = chomsky(2)
        sub = chomsky(1)
        context.invokeFactory('Document', id_, description=desc, title=title,
                              subject=sub)
        obj = context[id_]
        wftool.doActionFor(obj, ‘publish’)
        logging.info(’created document #%d’ % i)
        if i % 100 == 0:
            transaction.savepoint()

def gen_folders(context, numfolders=10, numdocs=1000, root_folder_name=’folder_’,
                root_name=’doc_’):
    wftool = getToolByName(context, ‘portal_workflow’)
    for i in range(numfolders):
        root = i
        id_ = root_folder_name + str(root)
        while id_ in context.objectIds():
            root += 1
            id_ = root_folder_name + str(root)
        context.invokeFactory(’Folder’, id_)
        obj = context[id_]
        wftool.doActionFor(obj, ‘publish’)
        logging.info(’created folder #%d’ % i)
        gen_documents(obj, obj, numdocs, root_name)
        transaction.savepoint()

def gen_sample(portal):
    gen_folders(portal)      def random_text(data, num_words=100):
    “”"Source: http://www.physics.cornell.edu/sethna/StatMech/ComputerExercises/RandText”"”
    # Read in the file and create a prefix mapping
    words = data.split()
    prefix = {}
    for i in xrange(len(words)-2):
        prefix.setdefault((words[i], words[i+1]), []).append(words[i+2])

    current_pair = random.choice(prefix.keys())
    random_text = current_pair[0] + ‘ ‘ + current_pair[1]
    for i in xrange(num_words-2):
        # last two words in document may not have a suffix
        if current_pair not in prefix:
            break
        next = random.choice(prefix[current_pair])
        random_text = random_text + ‘ ‘ + next
        current_pair = (current_pair[1], next)

    return random_text

01.20.08

Snow sprint report #1 : indexing

Posted in plone, python, sprint, zope at 11:25 pm by Tarek Ziadé

So we are here in Austria, sprinting on Zope and Plone (thanks to Lovely Systems). I proposed a task on building an alternative indexing system for Plone, and we worked on it with Dokai and Tom. Those guys rock, really!

Our goal was to create a Plone 3 buildout that provides an out-of-the-box solution.

Background

Let me give you some background about indexing in Zope before presenting our work. The default indexing system is quite effective, as long as your instance does not get too big. Some years ago at Nuxeo, we had to create an alternative indexer for CPS that would externalize the catalog, because we found that:

  • 50% of the size of the ZODB was the catalog (I am talking about gigabytes here)
  • 50% of the time spent creating an object went into indexing, and it got slower and slower as the instance grew.

Those values are approximate, but quite close to the reality back then (I know some people have worked on improving indexing in Zope lately).

Julien then wrote an XML-RPC server that took care of the indexing tasks and replied to queries. The software behind it was Lucene, together with PyLucene. The overall solution was quite good, apart from the pain of installing it on some specific Linux distributions back then.
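To make the architecture concrete, here is a toy Python 3 sketch of such an external indexer built on the standard `xmlrpc.server` module. It is not Julien’s server: Lucene is swapped for a naive in-memory inverted index, and the `index`/`unindex`/`search` method names and port 8765 are assumptions. The point is only that Zope talks to a separate process over XML-RPC instead of storing the catalog in the ZODB.

```python
from xmlrpc.server import SimpleXMLRPCServer


class IndexService(object):
    """Toy external indexer: an in-memory inverted index (Lucene stand-in)."""

    def __init__(self):
        self.postings = {}  # word -> set of document uids

    def index(self, uid, text):
        self.unindex(uid)  # reindexing replaces the old entry
        for word in set(text.lower().split()):
            self.postings.setdefault(word, set()).add(uid)
        return True

    def unindex(self, uid):
        for uids in self.postings.values():
            uids.discard(uid)
        return True

    def search(self, query):
        """Return the uids containing every word of `query`, sorted."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)  # XML-RPC cannot marshal sets


def serve(port=8765):
    server = SimpleXMLRPCServer(('localhost', port), logRequests=False)
    server.register_instance(IndexService())
    server.serve_forever()
```

A client would then call something like `xmlrpc.client.ServerProxy('http://localhost:8765').search('plone')`.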

Anyway, what Julien did some years ago now exists and is called Solr. A while ago I also had some experience with Xapian (as did Sidnei), which is quite efficient too, and easier to use from Python (see here).

Solr, Xapian

So the first task was to decide what to use. I called Alan from Enfold Systems because the folks over there have been working on the topic for years.

As a matter of fact, they have created a Python package that binds to a Solr server, and a Plone integration that provides a utility to index content in Solr. Since they are releasing all of this as open source very soon, we decided to go with this solution for the sprint.

This does not lock us into a technology (Lucene), because Alan and some folks from Lemur are actually considering a Xapian-based drop-in replacement for Solr.

In other words, the work done will be compatible with both Lucene and Xapian technologies. Xapian is pretty interesting since it avoids deploying Java ;)
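Since the integration talks to Solr over plain HTTP, any replacement that speaks the same protocol can be dropped in. The Python 3 sketch below shows the two message shapes involved, assuming Solr’s default example URL (`http://localhost:8983/solr`): an XML `<add>` document for indexing and a `select` query URL. It is not the API of Enfold’s package; actually indexing means POSTing the XML to `/update` with a `text/xml` content type, followed by a `<commit/>` message.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

SOLR_URL = 'http://localhost:8983/solr'  # assumed: Solr's default example URL


def add_xml(docs):
    """Build a Solr <add> update message from a list of {field: value} dicts."""
    add = ET.Element('add')
    for fields in docs:
        doc = ET.SubElement(add, 'doc')
        for name, value in sorted(fields.items()):
            ET.SubElement(doc, 'field', name=name).text = str(value)
    return ET.tostring(add, encoding='unicode')


def select_url(query, rows=10, base=SOLR_URL):
    """Build a query URL for Solr's standard select handler."""
    return '%s/select?%s' % (base, urlencode({'q': query, 'rows': rows}))
```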

The sprint task

The task was quite “simple” since the Enfold guys did all the hard work :)
So we worked on:

  1. a buildout that builds a Solr server and launches it
  2. a Plone integration to use Solr seamlessly

The buildout

The buildout is done and usable (we tried it under Windows, Mac OS X and Debian).
It uses two new recipes we wrote:

- collective.recipe.ant : builds Java software using Ant
- collective.recipe.solrinstance : builds a Solr instance and provides a script to launch it

If you want to try it, here’s (roughly) how (leave a comment on this entry if you run into a problem):

$ svn co https://svn.enfoldsystems.com/public/enfold.solr/branches/snowsprint08-buildout buildout
$ cd buildout/plone-3.0.5/
$ python2.4 bootstrap.py
$ bin/buildout -v
$ bin/solr-instance &     <-- launches Solr (python bin\solr-instance under Windows)
$ bin/instance fg         <-- launches Zope

Then, in Zope, install SolrIntegration with the quick_installer. The next document you publish will be indexed on the Solr side and searchable through the search box.

The portal_catalog is still there, though, so content is indexed twice ;) You can empty it to check that Solr is behaving correctly.

Plone integration

The last part we need to work on is making the SearchableText index 100% Solr-based. Whit advised us to create a storage for TextIndexNG, so that’s where we are heading (it should be done tomorrow, hopefully).

We would also like to run some benchmarks comparing speed and ZODB size. We will probably use JMeter for this.

I would like to thank Alan, Leonardo and Sidnei for their work in this area, and for releasing it as open source: I really believe it will become a great indexing solution for Plone in the coming months. I had really been waiting for this kind of momentum around indexing in the Plone community.

An installer for a buildout-ready Windows

Posted in plone, python, windows, zope at 8:05 pm by Tarek Ziadé

When you need to run a buildout under Windows, you have to set up quite a few things, like installing MinGW and linking it to Python, and installing an svn command-line client, which most buildouts need.
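For the MinGW part, one common approach is a `distutils.cfg` file that makes `mingw32` the default compiler. The Python 3 sketch below writes it under a given Python installation; `link_mingw` is a hypothetical helper, this is not necessarily what our batch script does, and MinGW’s `bin` directory still has to be on the PATH.

```python
import configparser
import os


def link_mingw(python_prefix):
    """Make mingw32 the default distutils compiler for the Python installed
    in `python_prefix` (e.g. C:\\Python24) by editing its distutils.cfg."""
    cfg_path = os.path.join(python_prefix, 'Lib', 'distutils', 'distutils.cfg')
    parser = configparser.RawConfigParser()
    parser.read(cfg_path)  # keeps existing settings; a missing file is skipped
    if not parser.has_section('build'):
        parser.add_section('build')
    parser.set('build', 'compiler', 'mingw32')
    os.makedirs(os.path.dirname(cfg_path), exist_ok=True)
    with open(cfg_path, 'w') as f:
        parser.write(f)
    return cfg_path
```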

We created a simple package that contains everything needed to make your Windows machine buildout-friendly. It is a simple zip file containing a batch script and third-party installers. When the batch is run, the environment variables are set as well, and win32-compatible buildouts should run without problems from there.

You can get it here:

http://release.ingeniweb.com/third-party-dist/python2.4.4-win32.zip

and another version that adds development tools like vim and tail:

http://release.ingeniweb.com/third-party-dist/python2.4.4-win32-dev.zip

The zip file itself is built with a buildout that we might publish soon, so you can make your own custom zip file.

01.17.08

distutils: multiple servers in .pypirc

Posted in plone, python, zope at 11:06 am by Tarek Ziadé

Since I am working on making PloneSoftwareCenter PyPI-compatible, I have also worked on the distutils side to make the register and upload commands friendlier when an egg has to be registered on several servers.

The problem with the current .pypirc file is that it won’t let you define several username/password pairs for several servers: it is dedicated to a single server. Meanwhile, you can specify which server you want to deal with in a command-line option:

$ python setup.py register -r http://my.server/pypi

But this will still use the username/password from .pypirc, so if your username differs from one server to another, it won’t work.

I have worked on an enhanced version for this, described here: http://wiki.python.org/moin/EnhancedPyPI

The patch is ready and comes with new unit tests for the register and upload commands, which didn’t have any yet. The new .pypirc format was shaped with the help and feedback of the catalog-sig people; thanks to Martin v. Loewis, Fred Drake and others. I am going to submit it for inclusion today. If it is accepted and integrated, we will be able to handle our eggs like this:

$ python setup.py register sdist upload    # goes to PyPI
$ python setup.py register sdist upload -r plone.org   # goes to plone.org ;) 
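For illustration, here is what a .pypirc in the new multi-server format could look like, with made-up usernames and passwords and a made-up plone.org repository URL, together with a small Python 3 sketch of how a tool might pick the credentials for a `-r` value, given either a section name or a repository URL. `find_credentials` is hypothetical; this is not the code from the patch.

```python
import configparser

DEFAULT_REPOSITORY = 'http://pypi.python.org/pypi'

# Example ~/.pypirc with made-up credentials and repository URL
PYPIRC = """\
[distutils]
index-servers =
    pypi
    plone.org

[pypi]
username: my-pypi-login
password: secret-1

[plone.org]
repository: http://plone.org/products
username: my-plone-login
password: secret-2
"""


def find_credentials(text, repository='pypi'):
    """Return (url, username, password) for a section name or repository URL."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    for server in parser.get('distutils', 'index-servers').split():
        if parser.has_option(server, 'repository'):
            url = parser.get(server, 'repository')
        else:
            url = DEFAULT_REPOSITORY
        if repository in (server, url):
            return url, parser.get(server, 'username'), parser.get(server, 'password')
    raise LookupError('no .pypirc entry for %r' % repository)
```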

The next step is to provide a patch making PyPI permissive about Trove classifiers. Then all PyPI-like servers will be able to provide the same service to egg developers, no matter how they handle classifiers.

01.08.08

sys.setdefaultencoding is evil

Posted in plone, python, quality, zope at 10:45 am by Tarek Ziadé

I recently found some UnicodeEncodeError bugs in some products that some people couldn’t reproduce. The bug was due to a call to a CMF API that was calling str() on the object right before using it.

This is perfectly fine in that case, because the object is supposed to be a ZODB id, so it must be pure ASCII.

So the bug looks like this :

>>> id = u'éou'
>>> str(id)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

The people who couldn’t reproduce it were using that ugly hack, which consists of setting Python’s default encoding to utf8:

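>>> import sys
>>> reload(sys)  # site.py deleted setdefaultencoding; reloading brings it back
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf8')
>>> id = u'éou'
>>> str(id)
'\xc3\xa9ou'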

This setting applies to the whole process, which is why the site module deletes the function from sys once startup is done: regular code is not supposed to call it. From the official doc:

setdefaultencoding(name)
Set the current default string encoding used by the Unicode implementation.
If name does not match any available encoding, LookupError is raised.
This function is only intended to be used by the site module implementation and,
where needed, by sitecustomize. Once used by the site module, it is removed from
the sys module's namespace. New in version 2.0.

I can’t find the link again, but I once read that this function was meant to be removed because it should not be used outside site.py.

The problem is that people tend to add a sitecustomize.py to their environment, then mix str() and unicode() calls and forget about doing it right. The result is a major misuse of strings and unicode objects, and the resulting code is buggy on other computers.

So never, ever use this in your code. If you get a UnicodeEncodeError, it probably means the function expects a (byte) string and got unicode; if you get a UnicodeDecodeError, it expects unicode and got a byte string. In the same way, do not guess encodings in your code: work with one type (str or unicode) and know exactly what its encoding is.
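Here is a small sketch of that discipline, written for Python 3 (where `str` plays the role of `unicode` and `bytes` the role of the old `str`): decode bytes at the boundary with a known encoding, and when an API needs an ASCII id, convert explicitly rather than relying on an implicit `str()`. Both helper names are hypothetical, and dropping accents with NFKD is only one possible policy for ids.

```python
import unicodedata


def read_text(raw, encoding='utf-8'):
    """Decode bytes at the boundary, with an encoding you actually know."""
    return raw.decode(encoding)


def to_ascii_id(title):
    """Build an ASCII-only id explicitly instead of relying on str():
    accents are split off (NFKD) and non-ASCII leftovers dropped."""
    decomposed = unicodedata.normalize('NFKD', title)
    ascii_only = decomposed.encode('ascii', 'ignore').decode('ascii')
    return ascii_only.lower().replace(' ', '-')
```

With this, `to_ascii_id('éou')` gives `'eou'` instead of an error on one machine and `'\xc3\xa9ou'` on another.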

I think this misuse is partly due to the lack of a warning on http://www.diveintopython.org/xml_processing/unicode.html, because that’s one of the first pages a developer finds when trying to understand such bugs.

See a similar entry on the topic from two years ago here: http://faassen.n--tree.net/blog/view/weblog/2005/08/02/0

Planet and WordPress buggy title

Posted in plone, python, zope at 10:13 am by Tarek Ziadé

I have found out why my entry titles are missing on all the Planets. WordPress recently added a new tag to each item of its feeds:

<media:title type="html">tarek</media:title>

That’s the one feedparser picks up instead of the item’s real title. This bug has already been noticed and added to the bug tracker: http://code.google.com/p/feedparser/issues/detail?id=83&q=wordpress.
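The failure mode is easy to reproduce with a toy parser. This is not feedparser’s code: `naive_title` mimics a parser that strips namespaces, so the Media RSS `<media:title>` (namespace `http://search.yahoo.com/mrss/`) overwrites the item’s own `<title>`, while `real_title` only looks at the un-namespaced child. The feed snippet is a made-up example.

```python
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel><item>
<title>sys.setdefaultencoding is evil</title>
<media:content url="http://example.com/avatar.jpg" medium="image">
<media:title type="html">tarek</media:title>
</media:content>
</item></channel></rss>"""


def naive_title(item):
    """Buggy: ignores namespaces, so the last *title element wins."""
    title = None
    for element in item.iter():
        if element.tag.rsplit('}', 1)[-1] == 'title':
            title = element.text
    return title


def real_title(item):
    """Only the item's own, un-namespaced <title> child."""
    return item.findtext('title')
```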

So I guess it’s just a matter of time before Lennart, I and some other people show up correctly on all the Planets.

01.07.08

PloneSoftwareCenter: news from the PyPI front

Posted in plone, python, zope at 2:44 pm by Tarek Ziadé

I have almost finished the work in the pypi branch that adds PyPI support to PloneSoftwareCenter. There are a few things left to polish, but it works (90% of the work had already been done by Sidnei, so I mostly did minor refactorings). I guess I’ll keep polishing it until Alex Clark merges it into the trunk.

If you want to give it a try, you can use the buildout I have added to the collective here:

http://svn.plone.org/svn/collective/PloneSoftwareCenter/buildout/branches/pypi/

I also have a running prototype here:

http://products.ingeniweb.com/catalog

where we are trying it out. This will be our public PSC instance and will soon contain all our packages, plus some public packages mirrored or repackaged as eggs.

Current features

The features are:

  • support for the distutils and setuptools register and upload commands
  • automatic creation of projects and releases
  • support for PyPI’s simple page, so it can be used by easy_install and zc.buildout
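For those unfamiliar with it, the simple page is just an HTML list of links that easy_install and zc.buildout scrape for distribution files (on PyPI it lives at `/simple/<project>/`). The Python 3 sketch below renders and reads back such a page; `render_simple_page` and `LinkCollector` are hypothetical names, and PSC’s actual markup may differ.

```python
from html import escape
from html.parser import HTMLParser


def render_simple_page(project, files):
    """Render a PyPI-style simple page; `files` is a list of (filename, url)."""
    links = '\n'.join('<a href="%s">%s</a><br/>' % (escape(url), escape(name))
                      for name, url in files)
    return ('<html><head><title>Links for %s</title></head><body>\n'
            '<h1>Links for %s</h1>\n%s\n</body></html>'
            % (escape(project), escape(project), links))


class LinkCollector(HTMLParser):
    """Collect every href, roughly what an installer does with the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)
```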

If you want to try it out, ask me for a user account.

Trove classification

The default categories are the ones PyPI uses to classify packages, and everything is hooked to the register command. So when you upload a package, it will appear in the proper categories in the software center.

That said, you can change the categories to manage your own: when a package is uploaded, unknown categories are simply ignored. I am working on the PyPI side so that the Cheeseshop itself works the same way (hopefully it will be accepted, since the folks from the catalog team are helping me polish my proposal).
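The permissive behavior amounts to something like the following Python 3 sketch: read the `Classifier:` fields from the package metadata (PKG-INFO uses RFC 822-style headers) and keep only those the server knows. `accepted_classifiers` is a hypothetical name, and any private classifier you test it with is made up.

```python
from email.parser import Parser


def accepted_classifiers(pkg_info, known):
    """Split the Classifier fields of a PKG-INFO text into (accepted, ignored)
    according to the set of classifiers this server knows."""
    metadata = Parser().parsestr(pkg_info)
    submitted = metadata.get_all('Classifier') or []
    accepted = [c for c in submitted if c in known]
    ignored = [c for c in submitted if c not in known]
    return accepted, ignored
```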

See my document at http://wiki.python.org/moin/EnhancedPyPI (section “Making PyPI permissive for Trove classification”).

Dealing with several PyPI-like servers

Last but not least, for a Plone developer the final goal is to be able to register and upload packages to both PSC and PyPI. This is a bit tricky with the current distutils implementation, so I am working on making it deal with several servers.

The end result should let you do:

$ python setup.py register sdist upload -r http://example.com/repository       # registering and uploading at example.com
$ python setup.py register sdist upload        # registering and uploading at PyPI

There are some default policies in PSC, though, to keep people from uploading projects and files too easily: if the user is a plain member, a register command call will create a project and submit it, and the upload command won’t work until the project has been accepted and published.
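The policy can be modeled as a tiny state machine. This is a Python 3 sketch under assumptions: the `pending`/`published` state names and the helper functions are hypothetical, and the real PSC relies on its workflow rather than on code like this.

```python
class Project(object):
    def __init__(self, name):
        self.name = name
        self.state = 'pending'  # submitted, waiting for a reviewer
        self.files = []


def register(projects, name):
    """A plain member's register call creates the project and submits it."""
    if name not in projects:
        projects[name] = Project(name)
    return projects[name]


def publish(projects, name):
    """Done by a reviewer through the workflow."""
    projects[name].state = 'published'


def upload(projects, name, filename):
    """Refuse uploads until the project has been accepted and published."""
    project = projects.get(name)
    if project is None or project.state != 'published':
        raise PermissionError('%s is not published yet' % name)
    project.files.append(filename)
```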

Next steps

I am really excited about having the same standard everywhere, and about being able to deploy our packages in the community through a simple command line:

  • at the cheeseshop
  • in our private software center
  • in our public software center
  • hopefully, on plone.org once it moves to Plone 3.x

The next steps will be:

  • to get some feedback from the Plone community and build a TODO list from it (I have to collect Wichert’s remarks from the mailing list to start building it)
  • to polish the code
  • to add XML-RPC APIs, like PyPI has
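As a starting point, PyPI’s XML-RPC interface exposes calls such as `list_packages`, `package_releases` and `release_urls`. A minimal Python 3 sketch of a compatible server could look like this; the in-memory `releases` mapping stands in for the software center catalog, port 8081 is arbitrary, and the returned dictionaries only carry a subset of the fields PyPI returns.

```python
from xmlrpc.server import SimpleXMLRPCServer


class PackageIndexAPI(object):
    """PyPI-like XML-RPC calls backed by an in-memory mapping:
    {project name: {version: [file URLs]}}."""

    def __init__(self, releases):
        self.releases = releases

    def list_packages(self):
        return sorted(self.releases)

    def package_releases(self, name):
        # insertion order; real code would sort versions properly
        return list(self.releases.get(name, {}))

    def release_urls(self, name, version):
        urls = self.releases.get(name, {}).get(version, [])
        return [{'url': url, 'filename': url.rsplit('/', 1)[-1]} for url in urls]


def serve(api, port=8081):
    server = SimpleXMLRPCServer(('localhost', port), logRequests=False)
    server.register_instance(api)
    server.serve_forever()
```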