09.30.07

Extending setuptools: adding a new command

Posted in python, quality at 1:51 am by Tarek Ziadé

Before deploying a package with python setup.py install, it’s a good idea to launch the tests with python setup.py test.

This command is also handy for quickly launching the tests of a package under development. Since setuptools can be extended, you can add other commands and launch them from the setup script while you work on your package.

For example, I have created a qa command that runs pyflakes over the package code, to make sure I don’t leave unused imports behind. I could have called pyflakes directly, but my QA tests are going to grow, so keeping the details of the QA script behind a python setup.py qa call is good practice. It will also make buildbot integration easier, as I can check package QA through a unified series of calls that all go through the package’s setup.py script.

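Commands are simple classes that derive from setuptools.Command and define a minimum set of elements:

  • description: a short description of the command
  • user_options: a list of (long name, short name, help text) tuples describing the options
  • initialize_options(): called first, to set a default value for each option
  • finalize_options(): called once the options have been parsed, to settle their final values
  • run(): called to run the command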

The setuptools documentation still says nothing about subclassing Command, but a minimal class looks like this:

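 from setuptools import Command

 class MyCommand(Command):
     """setuptools Command"""
     description = "run my command"
     # distutils expects a list of (long name, short name, help text) tuples
     user_options = []

     def initialize_options(self):
         """Set a default value for each option (none here)."""
         pass

     def finalize_options(self):
         """Check or post-process the option values (none here)."""
         pass

     def run(self):
         """Do the actual job."""
         # XXX do the job here
         pass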

The class can then be hooked in as a command, using an entry point in the setup.py file of the package that provides it:

 setup(
     # ...
     entry_points={
         "distutils.commands": [
             "my_command = mypackage.some_module:MyCommand",
         ],
     },
 )

This will add an entry point when the package is installed, so you can run your new command this way:

 python setup.py my_command

You can try my qa example by installing my eggchecker package:

 easy_install http://programmation-python.org/pycommunity/eggchecker-0.1.tgz

It’s merely a draft, but it will show you how pyflakes is launched from within the setup.py script.
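For readers who only want the general shape of such a command, here is a minimal sketch. It is not the eggchecker code: the QACommand name, the target option and the build_argv() helper are made up for the illustration, and it assumes pyflakes is installed and can be run as python -m pyflakes.

```python
import subprocess
import sys

from setuptools import Command


class QACommand(Command):
    """Run pyflakes over a package (illustrative sketch, not the eggchecker code)."""

    description = "run pyflakes over the package code"
    # (long name, short name, help text); the trailing "=" means the option takes a value
    user_options = [
        ("target=", "t", "file or directory to check (default: current directory)"),
    ]

    def initialize_options(self):
        # every option needs a default before the command line is parsed
        self.target = None

    def finalize_options(self):
        # settle the final value once the options have been parsed
        if self.target is None:
            self.target = "."

    def build_argv(self):
        # assumption: pyflakes is installed and runnable as "python -m pyflakes"
        return [sys.executable, "-m", "pyflakes", self.target]

    def run(self):
        # a non-zero exit status means pyflakes reported something
        status = subprocess.call(self.build_argv())
        if status != 0:
            raise SystemExit(status)
```

Once hooked in through a distutils.commands entry point, it would be called with something like python setup.py qa --target=mypackage. Keeping the command line in a separate method makes it easy to add more QA tools later without touching run().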

09.28.07

A co-server for Zope

Posted in plone, python, zope at 8:13 am by Tarek Ziadé

 

Zasync

A few years ago, when we hit scalability problems with CPS on the intranets of some big customers, we started to use ZAsync to perform some tasks in the background. That greatly improved the overall performance of the application. What ZAsync does is record the tasks to perform (let’s say Python scripts, to simplify) in BTrees within the ZODB. Then a Twisted client that runs independently opens the ZODB, reads the BTree and finds the tasks to perform. It acts like another Zope thread in some ways. But there’s something I never understood:

Why is the job queue stored in the ZODB database?

When we talk about scalability, most of the time the infrastructure is more complex than a simple ZEO setup. It has Apache servers, SMTP servers and load balancers all over the place. It has cron tasks that perform a variety of things, like sending mails, creating images, or anything else that can be done in the background. Most of the time it also has other pieces of software that perform other things. Having a co-server that gives Zope code the ability to perform background tasks is good.

Having a co-server that gives any software the ability to schedule a task is better.

Many applications, many different Zope instances, can benefit from a centralized task manager.

 

Quartz

In the Java world, I have used a server called Quartz. It is an independent task manager, where you can register tasks and run jobs on a given schedule. It’s like a smart cron. Using the beans technology, it can run code independently, or run it within the context of a Java server application.

Why don’t we have such software in Python?

Maybe we do, but I have never found it, so I ported part of the idea to Python in a tool called TaskManager, which I use for example on fr.luvdit.com, a Django application. It sends mails, calculates neighbourhoods, etc. Maybe I should release it, but that’s packaging work I haven’t found the time to do. Any piece of Python software can register itself as a task, in order to provide a service. The jobs are stored in a SQL database, which is accessed through an API by all the clients that want a task performed, and by the co-server that reads the queue and actually performs the tasks. It has three queues in fact, for different priorities. The client-side APIs are really simple and are nothing but SQL queries.
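To give an idea of how thin such a client API can be, here is a minimal sketch of a SQL-backed job queue. It is not the TaskManager code: the jobs table, its columns and the function names are assumptions made up for the illustration, and sqlite3 stands in for whatever SQL database would really be used. The three priorities are modelled as a single priority column.

```python
import sqlite3

# Hypothetical schema: priority 0 is the most urgent of the three queues.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    task     TEXT NOT NULL,
    payload  TEXT NOT NULL DEFAULT '',
    priority INTEGER NOT NULL DEFAULT 1 CHECK (priority IN (0, 1, 2)),
    state    TEXT NOT NULL DEFAULT 'pending'
)
"""


def open_queue(path=":memory:"):
    """Open the database and create the jobs table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, task, payload="", priority=1):
    """Client side: registering a job is a single INSERT."""
    with conn:
        cur = conn.execute(
            "INSERT INTO jobs (task, payload, priority) VALUES (?, ?, ?)",
            (task, payload, priority),
        )
    return cur.lastrowid


def job_state(conn, job_id):
    """Client side: checking on a job is a single SELECT."""
    row = conn.execute("SELECT state FROM jobs WHERE id = ?", (job_id,)).fetchone()
    return row[0] if row else None


def claim_next(conn):
    """Co-server side: take the most urgent, then oldest, pending job."""
    while True:
        row = conn.execute(
            "SELECT id, task, payload FROM jobs WHERE state = 'pending' "
            "ORDER BY priority, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            cur = conn.execute(
                "UPDATE jobs SET state = 'running' WHERE id = ? AND state = 'pending'",
                (row[0],),
            )
        if cur.rowcount == 1:  # nobody else claimed it in the meantime
            return row
```

A client would call enqueue(conn, "send_mail", "bob@example.com", priority=2) and later poll job_state(). The guarded UPDATE in claim_next() is what lets several runners share the same table without picking the same job twice.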

 

lovely.remotetask

Back to Zope. Lovely Systems is working on a Zope 3 tool, which seems to work a bit like ZAsync: it still stores the tasks in the ZODB, but dedicates a Zope application to work as a web service provider, if I understood well. It’s the way to go in terms of infrastructure, but I think it’s overkill to use a Zope instance for that.

Why do we need to deploy a whole Zope stack to have a co-server?

A dedicated, pure Python application using a SQL database fits better, because several task runners can work on the same queue, creating a real producer-consumer queue. When tasks need to be performed on various platforms, having a centralized job queue and several executors is more scalable, because the producer doesn’t have to deal with several co-servers.

Furthermore, the XML-RPC layer is not a necessity, and is not as robust as SQL: if the co-server is down, the Zope server cannot send jobs anymore, or check job states and fetch results. Working with a SQL table prevents this. You might argue that this is the worst-case scenario, but in my experience, the more application servers an infrastructure has, the more potential points of failure you get. You might argue that the SQL server might go down as well, but it’s not a code stack, and it just holds data to be processed: all the functionality, and thus the weaknesses, are on the co-server side. You might also argue that it makes the solution Python-dependent, but it would be dead simple to provide a client for another language.

Anyway, using the ZODB to store such things and a Zope to play with them is a small mistake in my humble opinion, even if it’s based on PersistentQueue, which looks pretty robust. Let’s keep this kind of database for what it was meant for: storing persistent objects that are publishable.

 

What I would love to have

The perfect co-server I can think of would be an independent piece of Python software, like TaskManager, that would look like this:

          -------        -------  <-> co-server instance 1 / win32
         | zope  | <->  | sqldb | <-> co-server instance 2 / linux
          -------        -------  <-> co-server instance 3 / linux
                           ^
 ----------------          |
| another server |<--------
 ----------------
  • sqldb is a database that stores jobs;
  • each arrow is provided by a Python API that knows how to interact with the database;
  • a co-server is an independent, pure Python runner that picks up some work from the DB;
  • each co-server instance is able to perform tasks that are provided through a plugin system;
  • for Zope-dependent tasks, a generic task provides an entry point to execute code through XML-RPC calls, or through a direct ZODB opening to avoid eating a thread (e.g. à la ZAsync).
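The co-server side of this picture can be sketched in a few lines. This is only an illustration under assumptions: the PLUGINS registry, the task decorator, the jobs table layout and the task names are all hypothetical, and sqlite3 stands in for the shared database. A real plugin system could just as well discover its tasks through setuptools entry points.

```python
import sqlite3

# Hypothetical plugin registry: task name -> callable taking the job payload.
PLUGINS = {}


def task(name):
    """Decorator that registers a function as a task plugin."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


@task("shout")
def shout(payload):
    return payload.upper()


def run_pending(conn):
    """One pass of a co-server: run every pending job it has a plugin for."""
    done = 0
    rows = conn.execute(
        "SELECT id, task, payload FROM jobs WHERE state = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, name, payload in rows:
        plugin = PLUGINS.get(name)
        if plugin is None:
            continue  # leave the job for an instance that provides this task
        try:
            result, state = plugin(payload), "done"
        except Exception as exc:
            result, state = repr(exc), "failed"
        with conn:
            conn.execute(
                "UPDATE jobs SET state = ?, result = ? WHERE id = ?",
                (state, str(result), job_id),
            )
        done += 1
    return done


# Demo database with one job this instance can run and one it cannot.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, task TEXT, payload TEXT, "
    "state TEXT DEFAULT 'pending', result TEXT)"
)
conn.execute("INSERT INTO jobs (task, payload) VALUES ('shout', 'hello')")
conn.execute("INSERT INTO jobs (task, payload) VALUES ('win32_only', 'x')")
conn.commit()
```

Calling run_pending(conn) in a loop, with a sleep between passes, is all a runner needs. Jobs whose task is unknown to this instance stay pending, so a win32 instance and a linux instance can share the same table and each take only what they can handle.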

OK, this is exactly Quartz :)

In the last five years, most of the scalability problems I bumped into were resolved by one good practice: let’s be less Zope-centric when we talk about infrastructure.

I would be pleased to have a few comments from the Lovely guys on this topic, and I thank them for their latest post, which helps the community a lot to think about scalable solutions for Zope.

Using JMeter as a functional test tool

Posted in plone, python, zope at 8:05 am by Tarek Ziadé

Today I audited a customer intranet, and I used JMeter to perform stress tests. This tool is awesome, as you get a whole lot of live statistics, and can create a powerful, distributed stress campaign.

There are some features to check that the output of HTTP calls is right, with simple but sufficient patterns (the output contains, the output doesn’t contain, the header has, etc.), and regular expressions can be used.
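To make the kind of checks concrete, here is what they amount to, written as a small Python function. This is not JMeter’s API (JMeter assertions are configured in its test plan, not coded this way); the check_response name and its parameters are made up for the illustration.

```python
import re


def check_response(body, headers, contains=(), excludes=(), header_has=None, pattern=None):
    """Return a list of failure messages; an empty list means the response passed."""
    failures = []
    for text in contains:
        if text not in body:
            failures.append("missing text: %r" % text)
    for text in excludes:
        if text in body:
            failures.append("unexpected text: %r" % text)
    for name, expected in (header_has or {}).items():
        if expected not in headers.get(name, ""):
            failures.append("header %s lacks %r" % (name, expected))
    if pattern is not None and re.search(pattern, body) is None:
        failures.append("pattern not found: %r" % pattern)
    return failures
```

For example, check_response(page, headers, contains=["Welcome"], excludes=["Traceback"], header_has={"Content-Type": "text/html"}) returns an empty list when the page looks right, and one message per failed check otherwise.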

A high-level functional test is really not much more than that: it performs a user story and checks the result. OK, maybe tools like Selenium have more features, but they are not essential ones, and JMeter brings some better things.

What JMeter brings besides functional testing:

  • a powerful reporting tool
  • the ability to stress-load your application with the user stories

A comparable tool, less powerful though, is Ben’s FunkLoad.

-> Writing your app’s functional tests with JMeter gives you the opportunity to tune its performance without extra work.

It won’t make me drop zope.testbrowser tests, because those are merged within my code and explain how it works in doctests, but it will surely make my customers feel better with JMeter’s reporting capabilities, and make me calmer with its performance analysis:

“Hey, look at the screen, that’s your functionality #123 running right now, and you can see its performance through this live performance graph”

graph_results.png

09.24.07

Eight tips to start with Python

Posted in python at 7:56 am by Tarek Ziadé

A friend of mine is starting Python. I tried to sum up some tips for him that may be useful to others. Don’t hesitate to comment if you think something important is missing.

  1. Get the best online documentation.

    There are a few online documents you must read:

    • the official tutorial, which gives you a fairly complete overview of Python;
    • the standard library module index. You can download it to simplify searching with grep. This is the documentation you get through the help command at the prompt;
    • ActiveState’s Python Cookbook. There are thousands of code snippets, created, ranked, categorized and commented by developers;
    • the Dive Into Python online book, which makes you discover Python features through well-thought-out examples.
  2. Read the PyCon, EuroPython and PyCon UK wrap-ups and slides.

    They are the three main Python events, and a lot of things happen there. You’ll learn a lot by reading the talk slides. If you can go there, it’s even better: sprints, birds-of-a-feather sessions and lightning talks are organized. To convince your boss to send you there, you could make a talk proposal: “My first steps in Python” ;)

  3. Subscribe to the right feeds.

    • The mainstream is Planet Python. It gathers most of the blogs out there, so it is the best place to start.
    • Pythonware’s Daily Python URL. Human-filtered feed. It used to provide several dozen links per week, but it seems to have slowed down, and provides a few links a week now. I think it’s better this way.
  4. Learn and use the rising standards.

    There are a few libraries that have a deep impact on the way people write and distribute their work:

    • setuptools: helpers to build and distribute your code as eggs. A public repository à la Perl’s CPAN, called the Cheeseshop, is wired to this library so people can distribute their code there. It’s one of the major innovations of the last few years in the Python world, in my opinion;
    • SQLAlchemy: the ORM that is now used by the majority of Python frameworks. Its flexibility is impressive. I think there is no equivalent tool in any other language (please let me know if there is);
    • Python Paste’s paster. This tool allows you to create templates that can be used to generate skeletons for your code. It is used by many web frameworks to give people a simple way to generate a standardized boilerplate code canvas when they start something. This has been done in Java for quite a long time (you cannot do without it in Java, otherwise it would take you years to write any program ;)), and tools like PyDev and Eclipse can provide the canvas to do similar things. But paster is independent of any IDE;
    • reStructuredText: learn how to use it. It’s our LaTeX. Your code documentation should use it.
    I am sure there are other major tools outside my domain of expertise. I am thinking of libraries in the scientific world, for example.

    Can anyone help me complete this tip?

  5. Ask for help. The three places where you can get some help are:
    1. the mailing list
    2. the IRC channel #python on freenode.
    3. the tutor mailing list. Mihai Campean says: “This is a list specifically for those new to Python and those interested in helping people learn the language, and the atmosphere is very friendly. It’s probably a better place to start than python-list, in my opinion”

    There are some talented guys who dedicate their free time to helping newcomers.

  6. Try to adapt your successful code patterns.

    When I started Python, I tried to adapt what I used to do with the tool I mastered then (Delphi). Since “There should be one-- and preferably only one --obvious way to do it” (try import this in a prompt), that helped me a lot to learn and understand all the subtleties of Python on use cases I had mastered.
    The most pleasant thing about it is that you quickly drop all Python books and guides to work with the language, unlike Java for example, where you need to keep many reference books on your desk.

  7. Share your experience and participate!

    A newcomer’s experience (yeah! fresh blood!) is highly valuable material for language advocacy: the discovering state of mind sometimes reveals weaknesses or absurdities that experienced users don’t see anymore. Furthermore, fresh new ideas are often brought by people who come from other communities. If you feel that something is absurd, unclear or wrong, you should start a thread on the language mailing list. If you have an idea for any kind of enhancement, maybe it is worth a Python Enhancement Proposal.

  8. Watch what is being done in Python 3, PyPy and web frameworks.

    Python 3 is the next version of Python; PyPy is Python written in Python. Web frameworks like Django or Zope are large Python codebases. These three sub-communities have something in common: they form the R&D of the language.
    Zope, for example, has enhanced setuptools and doctest a lot through massive feedback. Keeping an eye on them, even if you don’t use them, will make you live and understand what is rising in the language.
    PyPy is an amazing project. Even if you don’t understand everything (Python in Python? what the… ;)), seeing one of Armin Rigo’s talks will give you an instructive high-level view of Python. Now for Python 3, even if you cannot read and understand all the threads on the dedicated mailing list, keeping an eye on Guido’s wrap-ups and thread subjects will help you make the jump to P3k, and probably make your Python 2 code look nicer.

09.21.07

How to generate proper reStructuredText titles with Python Paste

Posted in plone, python, zope at 9:54 am by Tarek Ziadé

Python Paste and ZopeSkel are just great; if you don’t know them and you are working on Zope packages, you should really take the time to look at them.

  • Python Paste provides, among other features, an engine to generate any kind of package structure based on templates. The templates can be written in Cheetah syntax;
  • ZopeSkel is a series of templates that help a Zope and/or Plone developer start a Python, Zope or Plone package, taking care of all the boilerplate code generation, to make sure the packages are built in an egg-compatible, standard way.

Basing my work on ZopeSkel, I have started to create a series of custom templates to speed up and simplify the bootstrapping of package code. The documentation in these packages is in reStructuredText, and I bumped into a small problem: when you create, for example, a “README.txt” in your template with a title that uses the project name, like this:

======================
$project documentation
======================

blablabla

the result may vary depending on the variable value:

======================
kool documentation
======================

blablabla

or even:

======================
i_like_long_names_for_packages documentation
======================

blablabla

It’s ugly, and when the lines are shorter than the title, as in the second case, the reST file won’t compile if you try to generate HTML or PDF with docutils.

Since Paste uses Cheetah, we can fix this by calculating the length of the title lines:

#repeat $len($project) + 14
=#slurp
#end repeat

$project documentation
#repeat $len($project) + 14
=#slurp
#end repeat

blablabla

This will create lines of the proper length: the project name plus the 14 characters of “ documentation”, leading space included. You might argue it’s a detail, but that saves me almost ten seconds for each new package now ;)
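The same computation is easy to check in plain Python. The helper below is only an illustration (rst_title is a made-up name, not part of Paste or Cheetah): it derives the line length from the full title instead of a hard-coded offset, which is the property the template has to reproduce.

```python
def rst_title(title, char="="):
    """Return a reStructuredText title with over- and underline of matching length."""
    line = char * len(title)
    return "\n".join([line, title, line])


print(rst_title("kool documentation"))
print(rst_title("i_like_long_names_for_packages documentation"))
```

Both titles come out with lines exactly as long as the title text, whatever the project name is.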

09.14.07

To blob or not to blob?

Posted in plone, zope at 4:08 pm by Tarek Ziadé

Back to the storage discussion: at this time we have a fairly complete tool to handle big files in Plone: File System Storage (FSS).

 

FSS Current features

The idea of this product is to provide a storage with a backup mechanism, so that restores are possible. It can handle big files and store them on the server side with several strategies:

  • Flat storage: All field values are stored in a flat structure. This strategy is the default one.
  • Directory storage strategy: All field values are stored in a directory structure.
  • Site storage strategy 1: All field values are stored in a directory structure mirroring structure of PloneSite. Backup files are stored in a flat structure.
  • Site storage strategy 2: All field values are stored in a directory structure mirroring structure of PloneSite. Backup files are stored in a flat structure.

More information here: FSS Strategies

 

A Blob Strategy?

ZODB blobs, which are being integrated into Zope, provide similar features: the file is stored on the filesystem and accessed through the ZODB. To configure such a storage, the zodb_db section of zope.conf would look like this:

<zodb_db main>
    # Main FileStorage database
    <blobstorage>

      blob-dir $INSTANCE/var/blobs
      <filestorage base="1">
        path $INSTANCE/var/Data.fs
      </filestorage>
    </blobstorage>
    mount-point /
</zodb_db>

This tells Zope at startup to create a blob-compatible storage and to store blobs in the $INSTANCE/var/blobs folder. The filestorage subsection then provides the regular Data.fs.

Using this kind of configuration lets us use the ZODB.blob APIs in the code. To give blobs a shot, I have created a new strategy in FSS that uses them. A base class handles reading and writing blobs:

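# imports inferred from the names used below
from Acquisition import Implicit
from ZODB import blob

class ImplicitBlob(Implicit, blob.Blob):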

    def __init__(self, title='', path='',
                 name='', mimetype='text/plain'):
        """stores blob metadata"""
        blob.Blob.__init__(self)
        self.title = title
        self.path = path
        self.mimetype = mimetype
        self.name = name

    def get_size(self):
        """Return the size of the blob."""

        file = self.open('r')
        try:
            file.seek(0, 2)
            result = file.tell()
        finally:
            file.close()
        return result

    def updateMetadata(self, **kwargs):
        """fills the metadata"""
        for key, value in kwargs.items():
            if key in self.__dict__:
                setattr(self, key, value)

    def writeValue(self, value):
        """fills the blob"""
        file = self.open('w')
        try:
            file.write(value)
        finally:
            file.close()

    def getValue(self):
        """returns the blob whole content"""
        file = self.open('r')
        try:
            return file.read()
        finally:
            file.close()

The real version can be found in a branch of the collective repository. This Zope 2-style code is not what I would have written for a new product, but my goal was to quickly add the feature to FSS to give it a try.

It took me quite some time, though, to set up the test fixture so that the tests run over a BlobStorage instead of a DemoStorage. You must create a custom_zodb.py file in your tests folder and make sure it’s loaded before the TestCase imports. Here’s my file:

from tempfile import mkdtemp

from ZODB.MappingStorage import MappingStorage
from ZODB.blob import BlobStorage

# in-memory storage for regular objects, temporary directory for the blob files
base_storage = MappingStorage("test")
blob_dir = mkdtemp()
# the storage is exposed under the module-level name "Storage"
Storage = BlobStorage(blob_dir, base_storage)

(Thanks ZODB guys for using doctests in your code, that helped me a lot)

Once ImplicitBlob was available, I just added a new strategy in FSS that uses it this way:

  • the tool becomes a BTreeFolder2 object and holds ImplicitBlob instances;
  • each blob id is the file UID;
  • the strategy knows how to return a File object.

Now I can add FSSItem instances that are stored into blobs. And guess what, it seems to work ;)

 

What’s next?

Using blobs seems to be the future of FSS, because they have all the features needed to store files. Furthermore, blobs are transactional, and this is quite a difference from FSS’s regular file storages: with blobs, it’s no longer necessary to set up NFS to be ZEO-compatible. But all FSS strategies have use cases that we need to keep. For example, the Site Storage Strategy is really sweet: users can find their files on the filesystem under the same names, etc. Even though we know this is not really important technically speaking, it can be reassuring for the customer.

For large sites though, I wouldn’t use blobs and would set up a specialized co-server. Tramline fits the bill, and we are thinking about providing a tramline-friendly strategy in FSS, in order to provide direct access to the files.

But all of this is a work in progress, and the blob strategy still needs a lot of testing.

09.10.07

How to handle large files in Plone?

Posted in plone, python, zope at 8:42 am by Tarek Ziadé

The first time I had to help out a customer (a fifty-year-old lady who had no interest whatsoever in computers) on a Plone-based intranet, I had to explain to her why she couldn’t upload really big files.

That was really tough, as she couldn’t understand why the system had such a limitation. I think she was right. If we drop the technical point of view, a sophisticated system such as Plone should provide a transparent way to upload big files and handle them smoothly on the server side, so the user doesn’t feel any difference. Think about it: what’s the functional difference between a big file and a small file? Practically none, except that it takes longer to put or to get.

 

Uploading

The first problem when dealing with big files is the upload time. Browsers don’t provide any feedback, unless you install some Firefox extension or use a dedicated protocol, such as FTP. In a web interface, there’s nothing but an animated logo moving around just to say: “Hey, I am not dead!”. The Ruby on Rails guys came up with a great feature on this topic: they added to their publisher a few APIs that let client-side JavaScript:

  • display a progress bar;
  • cancel the upload at any time.

I am not a technical guru on the HTTP protocol, but I really wonder why this is not already available in Gmail… I tried to implement it myself a while ago on the Zope 3 publisher, and came up with something that worked, but never had the time to polish it.

The idea is to maintain on the server side, for each ongoing upload, some information that can be read by the client, plus a cancel API. The only important point is to provide a secure mechanism so that no client can read the information of another client. The client-side JS code can then call the server asynchronously to display feedback, and provide a cancel button.
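Here is a minimal sketch of that server-side bookkeeping. It is not the code I wrote for the Zope 3 publisher: the UploadTracker class and its method names are hypothetical, and the “secure mechanism” is assumed to be an unguessable token handed to the uploading client only.

```python
import secrets
import threading


class UploadTracker:
    """Server-side registry of ongoing uploads (hypothetical sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._uploads = {}

    def start(self, total_bytes):
        """Register an upload and return the secret token handed to its client."""
        token = secrets.token_urlsafe(32)
        with self._lock:
            self._uploads[token] = {
                "total": total_bytes, "received": 0, "cancelled": False,
            }
        return token

    def feed(self, token, nbytes):
        """Called by the publisher for each chunk; returns False once cancelled."""
        with self._lock:
            info = self._uploads[token]
            if info["cancelled"]:
                return False
            info["received"] += nbytes
            return True

    def progress(self, token):
        """Polled by the client: percentage received, or None for an unknown token."""
        with self._lock:
            info = self._uploads.get(token)
            if info is None:
                return None
            return 100 * info["received"] // max(info["total"], 1)

    def cancel(self, token):
        """Cancel API: flag the upload so the publisher stops reading the body."""
        with self._lock:
            info = self._uploads.get(token)
            if info is None:
                return False
            info["cancelled"] = True
            return True
```

The publisher would call feed() while reading the request body and abort as soon as it returns False; the JavaScript side only ever sees its own token, so it can poll progress() and call cancel() without being able to peek at anyone else’s upload.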

At this point, we may consider that it is better to use another piece of software, like Tramline, or some dedicated Apache plugin. But from a marketing point of view, I think it’s really important to be able to provide this feature with an out-of-the-box Plone.

 

Server-side handling

OK, so our file is uploaded. My lady at the office now complains that the system is getting very slow. No wonder, her ZODB now weighs several gigabytes… Again, functionally speaking, that’s nonsense. Plone should be able to work with these big files without making the ZODB so heavy. The ZODB is a great thing for light objects, but was not meant to hold big chunks of data. Something specific has to be done. In the Java world, they have an advanced data storage API for content management systems, the Java Content Repository (JCR), specified by JSR-170. In this system, a tree of folders is maintained and big files are stored in blobs (Binary Large OBjectS), keeping the whole thing scalable, no matter what the users store in it.

In Zope, we don’t have a lot of solutions: there’s the File System Storage (FSS) product (and a few similar products) in the Plone Collective, and that’s about it. FSS provides a smart proxy over the file system that lets the ZODB breathe and the user handle his big files. But this is an extra product, created to provide a core feature that is really missing.

The 3.8 version of ZODB, though, now provides this feature through blobs. On the Plone side, it seems that the latest Plone4Artists sprint in Boston has boosted the work on the topic, and blobs will be available in Plone through ATBlobField.

This is great news, and I hope we will soon be able to make all files use this feature transparently (big file, small file, what’s the difference? they should all be stored out of the ZODB).

 

Downloading

One of my colleagues had a tough issue last week with big files. The File System Storage was not acting right when the user was downloading PDF files from the Adobe browser plugin: the file iterator internally provided by the Zope publisher doesn’t support the range feature. This HTTP feature, used by the plugin, makes it possible to get pieces of a file in non-sequential order. With an iterator, though, it’s not possible to provide pieces of a file without having to rewind. So iterating is of no use there. The solution would be to provide a smart system that knows who asked for the file and how, to smartly instantiate an iterator or a regular accessor, maybe with a cache system for requested ranges. I need to dig into ZODB blobs, maybe they already provide such a feature. If you read this and know the answer, please let me know…
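To show why a seekable accessor is needed, here is a minimal sketch of range handling on top of a regular file object. It is not the FSS or Zope publisher code: parse_byte_range and read_range are made-up helpers, only single ranges are handled, and anything unusual simply falls back to “serve the whole file”.

```python
import io


def parse_byte_range(header, size):
    """Parse a single-range header such as "bytes=0-99".

    Returns (start, end), both inclusive, or None when the header is absent,
    malformed, multi-range or cannot be satisfied.
    """
    if not header or not header.startswith("bytes=") or "," in header:
        return None
    first, sep, last = header[len("bytes="):].partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # suffix form "bytes=-N": the last N bytes of the file
            length = int(last)
            if length <= 0:
                return None
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1
    except ValueError:
        return None
    end = min(end, size - 1)
    if start < 0 or start > end:
        return None
    return start, end


def read_range(fileobj, start, end):
    """Seek-based access: any range can be served, in any order."""
    fileobj.seek(start)
    return fileobj.read(end - start + 1)


data = io.BytesIO(b"0123456789")
tail = read_range(data, *parse_byte_range("bytes=-3", 10))
head = read_range(data, *parse_byte_range("bytes=0-1", 10))
```

The tail of the file is read before its head here, which is exactly what a forward-only iterator cannot do. A complete implementation would also answer with a 206 status and a Content-Range header, and with 416 for ranges that cannot be satisfied.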

 

Conclusion

There’s still a lot of work to do on the topic, and I think the Plone4Artists project is doing a great job of enhancing things. The three topics to work on are:

  • cancelable upload with live feedback;
  • transparent use of blobs to store files;
  • smart downloading capabilities.

This could make a great sprint I guess, to create some kind of Plone-FSS-NG, so my Lady can upload her movies in her Plone.

09.08.07

Plone 3 UI: omg!

Posted in plone, python, zope at 8:10 am by Tarek Ziadé

The last time I *really used* Plone was back in the days I used to have a website called “zopeur.org”, a few years ago. I used to work then with Plone 1.something, and was quite impressed by its features, but disliked a bit its interface.

I then worked at Nuxeo for a few years with CPS, trying to enhance its interface through JS and other tools, until they quit Zope. Now, a year later, I’m back to Plone for my new job, and I must say that I am really impressed by the new Plone user interface.

The guys did great work enhancing the user experience, through clever layouts and a smooth integration of KSS, a JS framework. I worked a little bit on it with Godefroid when it started, but that was nothing compared to what it has become: simple, yet powerful.

I haven’t written much on this blog lately, because it’s hard to write for two blogs (one in French, one in English), but I’ll try to do it regularly from now on, to talk about my Plone experience, since it’s now becoming my everyday work. I guess my next posts will focus on the tools I am discovering, and all those things that have evolved a lot in the past 4 years, focusing on the UI of course, but also on Zope 3 integration, since I took a small part in its development.

The manual that was published yesterday is a good starting point for me: http://plone.org/documentation/manual/plone-3-user-manual