Fetchez le Python

Technical blog on the Python programming language, in a pure Frenglish style

Starting The Hitchhiker’s Guide to Packaging

DON’T PANIC !

Following up on the various discussions we had about the need for a guide for people who want to understand our rather complex packaging ecosystem and how they can use it, I’ve decided to start The Hitchhiker’s Guide to Packaging. The goals are:

  • consolidate, in a single Sphinx-powered website, tutorials and documentation on how to use the various packaging tools and how to package, release and distribute your Python application.
  • provide guidelines to the community that are up-to-date with the latest work done in packaging
  • see if it can be added to the Python documentation itself at some point when it’s mature, as Georg Brandl suggested.

If you want to join this documentation effort, by adding your existing documentation to this guide, by writing new material, or by helping with copy editing or reviewing, let me know (#distutils on Freenode, or by mail).

The project repository is hosted on Bitbucket: http://bitbucket.org/tarek/hitchhiker-guide-packaging/

I’ve also created a Google group: http://groups.google.com/group/packaging-guide

Filed under: python

Distutils and Distribute status (part #1)

Someone told me on IRC that it’s currently hard to follow what’s going on on the packaging front. The truth is that it’s almost impossible if you don’t read all the mails posted on Distutils-SIG.

So here’s a quick wrap-up that can save you some time if you are not reading Distutils-SIG.

PEP 345 – Metadata 1.2

We are almost done with the update of PEP 345. This PEP describes the metadata fields for a distribution, which get added to a file named PKG-INFO. This file is included in all your distributions and is also published at PyPI. It’s the ID card of your project.

We are adding these fields in the metadata:

  • Maintainer : the maintainer’s name
  • Maintainer-email : the maintainer’s email
  • Requires-Python : What Python versions are compatible with this distribution
  • Requires-External : A list of external dependencies, like “libpng”, “libxslt”
  • Requires-Dist : A list of Python dependencies, using the names found at PyPI, like “zope.interface”
  • Provides-Dist : A list of additional distribution names this distribution provides (as a complement to the one provided in “Name”)
  • Obsoletes-Dist : A list of distributions that the current distribution renders obsolete, and that should not be installed alongside it
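Since PKG-INFO is a plain list of "Field: value" headers, reading it does not require running any project code. Here is a minimal sketch using the standard library's email parser; the sample content, the project and maintainer names, and the parse_pkg_info helper are all made up for illustration, and this is not how Distutils itself reads the file.

```python
from email import message_from_string

# Hypothetical PKG-INFO content using some of the new fields.
PKG_INFO = """\
Metadata-Version: 1.2
Name: MyDistribution
Version: 0.1
Maintainer: Jane Doe
Maintainer-email: jane@example.com
Requires-Python: >=2.5
Requires-Dist: zope.interface
Requires-Dist: simplejson
Requires-External: libxslt
"""

# Fields that may appear several times in the same file.
MULTIPLE_USE = ("Requires-Dist", "Provides-Dist",
                "Obsoletes-Dist", "Requires-External")


def parse_pkg_info(text):
    """Return a dict of fields; multiple-use fields become lists."""
    message = message_from_string(text)
    metadata = {}
    for field in set(message.keys()):
        values = message.get_all(field)
        metadata[field] = values if field in MULTIPLE_USE else values[0]
    return metadata


metadata = parse_pkg_info(PKG_INFO)
print(metadata["Name"], metadata["Requires-Dist"])
```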

Another important change is environment markers. An environment marker is a marker that can be added at the end of a field after a semi-colon (‘;’), to add a condition about the execution environment.

Examples:

Requires-Dist: pywin32, bar > 1.0; sys.platform == 'win32'
Obsoletes-Dist: pywin31; sys.platform == 'win32'
Requires-Dist: foo; os.machine == 'i386'
Requires-Dist: bar; python_version == '2.4' or python_version == '2.5'
Requires-External: libxslt; 'linux' in sys.platform

This will allow developers to define different conditions depending on the target platform. Moreover, this will allow tools like Pip to get a list of all dependencies for a given project and a given platform just by querying PyPI, and with no downloads or build required !
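To make the layout of such a line concrete, here is a small sketch that splits a metadata line into its field name, its value and its optional environment marker. The split_marker name is hypothetical, and the sketch only separates the pieces; it does not evaluate the marker.

```python
def split_marker(line):
    """Split 'Field: value; marker' into (field, value, marker).

    The marker is None when the line carries no condition.
    """
    field, _, rest = line.partition(":")
    value, _, marker = rest.partition(";")
    return field.strip(), value.strip(), marker.strip() or None


print(split_marker("Requires-Dist: foo; os.machine == 'i386'"))
print(split_marker("Requires-Dist: zope.interface"))
```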

Last, for all the fields that manipulate versions, PEP 345 will use the version scheme described in PEP 386.

PEP 386 – Version scheme

We’ve designed in PEP 386 a version scheme that works with most Python software we know about. This version scheme comes with a new version comparison algorithm that will be provided by Distutils.

The scheme, as a pseudo-regexp (read the PEP for more details), is:

N.N[.N]+[abc]N[.N]+[.postN+][.devN+]

Don’t be afraid ! It looks complex but it’s not. The apparent complexity is due to the fact that we need to be able to work with development versions and post-release versions.
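As a rough illustration, the pseudo-regexp can be turned into a real regular expression. The pattern below is only one reading of the scheme (two or more numeric segments, an optional a/b/c pre-release, an optional post-release, an optional development marker); it is an assumption, not the code from the PEP or from the prototype, and it does not implement the comparison algorithm.

```python
import re

# One possible reading of the scheme; VERSION_RE and is_valid_version
# are hypothetical names, not part of Distutils.
VERSION_RE = re.compile(r"""
    ^
    (?P<release>\d+\.\d+(?:\.\d+)*)                  # N.N plus any number of .N
    (?:(?P<pre>[abc])(?P<pre_num>\d+(?:\.\d+)*))?    # optional a, b or c pre-release
    (?:\.post(?P<post>\d+))?                         # optional post-release
    (?:\.dev(?P<dev>\d+))?                           # optional development marker
    $
""", re.VERBOSE)


def is_valid_version(version):
    """Return True if the string fits the scheme as read above."""
    return VERSION_RE.match(version) is not None


for candidate in ("1.0", "0.6c9", "1.0.4a3.dev12", "1.0.post456.dev34", "1", "1.0-beta"):
    print(candidate, is_valid_version(candidate))
```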

There’s a good chance that your project already works with this version scheme. If you want to give it a shot, there’s a prototype you can play with in an external repo here: http://bitbucket.org/tarek/distutilsversion/

PEP 376 – Installation standard

PEP 376 is nearly complete now. We have our “standard” for site-packages, we know how to query installed projects, and how to remove them.

The discussions are now focusing on the “data” problem: how to describe in Distutils, in a more elegant way, the data files you are using, such as images, man pages, etc.

This is required to give developers more control over how their data files are installed on the target system, and to give packagers more tools to re-package a Python distribution.

Wolodja Wentland has been doing a lot of work in this area and leads this “data” effort. You can follow the discussion on this work in the Python wiki, starting at: http://wiki.python.org/moin/Distutils/DiscussionOverview.

PEP 382 – Namespace packages

Distribute comes with a namespace package system that allows you to have packages under the same namespace spread across several distributions.

That’s what Plone and Zope use to be able to release all those plone.* and zope.* distributions.

Martin von Loewis proposed to implement it in Python, and this is described in PEP 382.

We are now waiting for Martin to implement it, and we are ready to drop the namespace feature in Distribute 0.7.x in favor of the PEP 382 one.

Distutils redesign discussions

One thing that makes Distutils a bit hard to work with is how commands are designed. David Cournapeau (from the NumPy project) gave us an example of a use case where this hurts: he basically needs to run the “build” command knowing the finalized options from the “install” command.

In other words, when you call something like :

$ python setup.py install --prefix=/some/place

The install command will use the prefix option to compute some other options. The build command, which needs all those options, then has to look into the install command to get the values.

This is not optimal because it means that the build command depends on the install command to run. It also makes options redundant from one command to the other.

The solution we are going to try is to create a new command, called configure, that will be in charge of building a file with all the options that are required by the build command and the install command.

This is not new. It was implemented years ago in 4suite, and it’s the philosophy behind tools like scons, etc.: a configure/make/make install principle applied to a Python project.
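The configure idea can be sketched in a few lines: one step computes every option once and writes them to a file, and the build and install steps only read that file, so neither depends on the other. Everything here is an assumption made for illustration (the JSON format, the option names, the function names); the actual Distribute 0.7 design is not defined yet.

```python
import json
import os
import tempfile


def configure(config_path, prefix):
    """Compute all options once and store them in a file."""
    options = {
        "prefix": prefix,
        # Derived value: the kind of option "install" used to cook by itself.
        "install_lib": os.path.join(prefix, "lib", "python"),
        "build_base": "build",
    }
    with open(config_path, "w") as config_file:
        json.dump(options, config_file)
    return options


def load_options(config_path):
    """Read back the options written by configure()."""
    with open(config_path) as config_file:
        return json.load(config_file)


def build(config_path):
    # build no longer has to look into the install command
    options = load_options(config_path)
    return "build in %(build_base)s, targeting %(install_lib)s" % options


def install(config_path):
    options = load_options(config_path)
    return "install into %(install_lib)s" % options


config_path = os.path.join(tempfile.mkdtemp(), "configured.json")
configure(config_path, "/some/place")
print(build(config_path))
print(install(config_path))
```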

This redesign is going to occur in Distribute 0.7. Once it’s ready, if the community has tried it and given us positive feedback, I’ll push it into Distutils.

It might happen before Python 2.7 is out, it might not.

Other topics

There are many other topics, like PyPI mirroring (PEP 381) etc. I’ll write a blog entry later for these.

Filed under: python

virtualenv and zc.buildout now with Distribute included

We are still actively working on fixing all the remaining bugs in Distribute (our Setuptools fork).

But we have reached an important milestone this week: both virtualenv and zc.buildout now come with an option to switch to Distribute.

In virtualenv:

$ virtualenv --distribute ENV

In zc.buildout, using its bootstrap.py file:

$ python bootstrap.py --distribute

Enjoy !

For those who may wonder why they should switch to Distribute over Setuptools, it’s quite simple:

  • Distribute 0.6.x is a drop-in replacement for Setuptools
  • Distribute is actively maintained, and has over 10 committers
  • Distribute 0.6.x offers Python 3 support !

And if you still struggle with packaging issues, the place to hang around to get some help is the #distutils IRC channel on Freenode.

Filed under: distribute, distutils, python, zc.buildout

First Distribute mini-sprint (online)

While we are working on fixing bugs in 0.6.x, we are organizing a first online coding sprint for the 0.7.x series, and for various community tasks.

What are we going to do ?

  • We need to finalize the 0.7.x renaming and splitting work
  • We also need to define a more detailed roadmap for each split package (new features, deprecations, etc.)
  • We need to build a better test environment, and see if we can set up a buildbot for our work
  • We need to write a tutorial explaining how Distribute can be used in a project. (Python 3 support, Moving from Setuptools to Distribute, etc)

Who can join ?

No particular level is required to participate, but we won’t be able to teach participants how to work with Mercurial, Python, etc. during this session. If you want to help with documentation or testing, though, that is as important and useful as the code, and you are welcome.

Last, if you are just interested in evaluating Distribute for your project, you are welcome to join and we will help you.

Where ?

Online, on IRC – Freenode – #distutils channel

When, how long and how ?

Possible times (all CEST, UTC+0200) :

  • Sunday October 18th – 6 pm
  • Tuesday October 20th – 7 pm
  • Wednesday October 21st – 7 pm
  • Friday October 23rd – 7 pm

The sprint will be held for a minimum of 3 hours, and people will be free to leave when they want.

The most important thing is to be present on time when the sprint starts, so we can make groups and kick it off. If you want to join, add your name here: http://doodle.com/bfgv3yi3pi48buuv

Filed under: distribute, python

top-posting, mobile devices, mail threads and semantics

There’s an interesting discussion on python-dev about how hard it is to follow a thread when people start to top-post, meaning they answer by quoting the whole text and putting their answer at the top. I even got bitten by someone once because I was top-posting (don’t get me wrong, he was right about it, I was just not fully aware of the problem).

First of all, if you use a mobile device, there’s a good chance that the mail application you are using doesn’t give you the choice : it will quote the text for you and will let you answer at the top. That’s how it works on my Android (HTC) phone and I couldn’t find a way to change it. I am expecting mail apps on mobile devices to improve on this.

But this problem reflects how hard it is to follow a thread with 100+ answers. Worse, depending on the way people are quoting to provide an answer for a specific part of a mail, some people will just stop reading it.

Gmail is doing a pretty good job of reducing this problem, because it will automagically hide old content and you will only see new content on every new mail in the thread.

But some people have good reasons not to use Gmail.

The other problem with gigantic threads at python-dev is that they often end up in a tree of several sub-threads, making it very hard to follow what’s going on if you don’t sort mails by threads in your client. And again, this is not possible in some mail clients.

I was very frustrated about this problem on gigantic threads about Distutils because I was seeing people “lost” in a branch of the thread, asking questions that were answered at the other end of the tree.

So how could we improve on this ?

Imho, mail threads are not suited for design discussions. I think a way to improve the situation could be to link mails by keywords.

Every time you answer a mail, instead of quoting some of its text, you provide some keywords related to the topic you want to discuss, and you just type a plain answer. Your answer and the original mail will then be linked through a semantic relation, like an RDF triple. These keywords could be new headers in the mail that is sent.

Let’s imagine all mail clients are able to sort and browse the threads by keywords, and to list the used keywords on the side of the thread. Every time you send a mail, you could then pick one of the keywords already in use, which limits the number of keywords.
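Here is a small sketch of what this could look like with the standard library's email package: each message carries its keywords in a header, and a client builds an index from keyword to messages. Using the "Keywords" header name for this purpose is an assumption, and the addresses, subjects and helper names are invented for the example.

```python
from collections import defaultdict
from email.message import EmailMessage


def make_mail(sender, subject, body, keywords):
    """Build a mail that carries its topic keywords in a header."""
    mail = EmailMessage()
    mail["From"] = sender
    mail["Subject"] = subject
    mail["Keywords"] = ", ".join(keywords)  # assumed header for this sketch
    mail.set_content(body)
    return mail


def index_by_keyword(mails):
    """Map each keyword to the subjects of the mails that use it."""
    index = defaultdict(list)
    for mail in mails:
        for keyword in str(mail.get("Keywords", "")).split(","):
            keyword = keyword.strip().lower()
            if keyword:
                index[keyword].append(str(mail["Subject"]))
    return dict(index)


mails = [
    make_mail("alice@example.com", "Static metadata", "What about setup.cfg?",
              ["metadata", "setup.cfg"]),
    make_mail("bob@example.com", "Markers", "Conditions could be restricted.",
              ["metadata"]),
]
print(index_by_keyword(mails))
```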

Does that make any sense ?

Filed under: python

Distribute 0.6.4 released – zc.buildout support

We’ve just released Distribute 0.6.4.

This version is now fully compatible with zc.buildout, meaning that you can use Distribute in your buildout transparently as long as you use our special zc.buildout bootstrap file located here : http://nightly.ziade.org/bootstrap.py.

We’ve also added Jannis’ upload_docs command, which allows a project to upload its Sphinx-based documentation to PyPI.

As a matter of fact, Distribute is using it now and you can reach its documentation at : http://packages.python.org/distribute.

If you have any feedback, or want to help out, drop us a mail at distutils-sig or come to our IRC channel, #distutils (Freenode).

Filed under: distribute, python, zc.buildout

Python 2.6.3 and Distribute.

Python 2.6.3 is out, with a lot of bugs fixed. I had my share with Distutils and fixed quite a few, and 2.6.3 is looking very good so far !

Just a quick note for Setuptools users: you might bump into a problem if you provide a C extension. The setuptools code makes some assumptions about how and in which order the Distutils build_ext APIs are called. It also overrides some of these APIs to do some internal extra work. In other words, the way Setuptools patches Distutils makes it very sensitive to any internal Distutils code changes. In this particular case you might hit this bug:

File "...setuptools/command/build_ext.py", line 85, in get_ext_filename
KeyError: 'xxx'

The fix is quite simple. It can be done by the end-user or in your package (which is better of course).

  • In your package : use the “Distribute >= 0.6.3” distribution instead of the usual “Setuptools == 0.6c9” distribution in your dependencies list. The code remains unchanged and you can still “import setuptools” and have it work fine.
  • As an end-user: just install Distribute and you’re fine: “(sudo) easy_install Distribute”

Hang out in #distutils on Freenode, or drop a mail to Distutils-SIG in case you have a problem.

Just to make things clear: The Distribute 0.6.x series is a mirror of Setuptools 0.6c9 code, with bug fixes.

Filed under: distribute, distutils, python

Got Python 3 support ?

One thing that slows down the adoption of Python 3 is the low number of available third party projects. If your project depends on some other projects, you are pretty lucky if they are all available under Python 3.

I don’t think that the problem comes from the Python 3 syntax adoption, because Python provides a pretty powerful tool to convert your Python 2 code into Python 3 code, called 2to3. (Note that the reverse process is also available since this summer : 3to2.)

The biggest issue in my opinion is the lack of packaging support. Distutils itself works fine on Python 3, but I am talking about Setuptools, which is widely used in the community and doesn’t work under Python 3. So if your project, or one of its dependencies, uses Setuptools, you can’t switch to Python 3.

Well, I am glad to say that this is not true anymore, thanks to Martin von Löwis, Lennart Regebro and Alex Grönholm, who have been working on Distribute’s Python 3 support lately.

The Distribute project is a fork of the Setuptools project and is now fully compatible with Python 3.

If you are using Setuptools, install Distribute 0.6.3, read its docs/python3.txt file and add Python 3 support to your project.

And if you need help in porting your distribution to Python 3, drop by Distutils-SIG and we will help you.

Filed under: distribute, distutils, python

static metadata for distutils

In Distutils, every package has some metadata fields, defined in PEP 314.

The setup.py script is the place where you provide them, by calling the setup function, located in distutils.core. Each argument passed to the function can be one of these metadata fields.

So basically, you can describe your distribution in the setup.py file like this:

from distutils.core import setup, Extension

setup(name='MyDistribution', version='0.1', description='cool',
      packages=['my_package'],
      ext_modules=[Extension('foo', ['foo.c'])])  # sources must be a list

Notice that packages and ext_modules in this example are not part of the Metadata fields. They are extra fields used by some commands.

From there, various distutils commands can be called using this script. They will get these options and act upon them.

For instance, the sdist command will build a source distribution and create a static PKG-INFO file that contains the metadata fields. It will extract them from the arguments you’ve passed to setup.py.

The install command will install this PKG-INFO file in your Python installation alongside your packages and modules (since Python 2.5), and some tools like Distribute or Setuptools will let you read this information once the distribution is installed.

You can even get the metadata fields values by asking for them directly through setup.py:

$ python setup.py --name
MyDistribution

Another example : the register command can send the metadata of your distribution to PyPI. They will be made available on the PyPI website and also through its XML-RPC interface:

>>> import xmlrpclib
>>> server = xmlrpclib.Server('http://pypi.python.org/pypi')
>>> server.release_data('distribute', '0.6')['author']
'The fellowship of the packaging'
>>> server.release_data('distribute', '0.6')['keywords']
'CPAN PyPI distutils eggs package management'

Limitations of metadata

Metadata are pretty handy, but there are some obstructing limitations we bumped into when we started to work on packaging matters during the last PyCon.

Platform-dependent metadata

We wanted to extend the Metadata fields list in order to add a “requires” field that can be used to list the requirements (in terms of other Python packages or modules).

For instance, if you want to define that your project depends on simplejson, you could write:

from distutils.core import setup, Extension

setup(name='MyDistribution', version='0.1', description='cool',
      packages=['my_package'], ext_modules=[Extension('foo', ['foo.c'])],
      requires=['simplejson'])

This is not a new proposal. It was proposed in PEP 345, but never really used.

Since then, Setuptools has provided a similar field, called “install_requires”, together with the easy_install script, which acts a bit like a package manager. easy_install reads the requirements and installs them when you install a distribution.

But the limitation of those requirement fields is that they might be platform-dependent. For example, you don’t need to install simplejson anymore under 2.6, since a json library was included in the standard library. In other cases you might have different dependencies depending on whether you run under Windows or Linux, and so on.

So to be able to get the metadata right, you have to work a little bit in your setup.py file:

from distutils.core import setup, Extension
import sys

# simplejson is only needed before Python 2.6, which ships a json module
if sys.version_info[0] == 2 and sys.version_info[1] < 6:
    requires = ['simplejson']
else:
    requires = []

setup(name='MyDistribution', version='0.1', description='cool',
      packages=['my_package'], ext_modules=[Extension('foo', ['foo.c'])],
      requires=requires)

But the metadata will only be available at install time, when the install command executes the code of setup.py on the target system.

Code-dependent metadata

In other words, once a field like requires is added in the Metadata, you will not know for sure if it’s reliable when you look at the project page at PyPI. That’s because the metadata you see there is whatever was created by the person who called the register command and sent the result. This result is tied to their environment, not yours.

To be able to get the metadata for your environment you will need to run that code again, by downloading the package, then running a setup.py command.

Let’s try to do it with the lxml source distribution. Let’s try to get the name field :

$ python setup.py --name
Building lxml version 2.2.2.
NOTE: Trying to build without Cython, pre-generated 'src/lxml/lxml.etree.c' needs to be available.
Using build configuration of libxslt 1.1.12
Building against libxml2/libxslt in the following directory: /usr/lib
lxml

What happened here ? Frankly I am not sure. But asking for the name (that appears on the last line) called a bunch of code located in the distribution.

I could probably ask the lxml team to fix this output, and make sure setup.py can still be used to work with the metadata, but this was just to demonstrate a flaw in the way Distutils works :  you need to run third party code just to get the metadata of a distribution you’re not even sure you are going to install on your system.

The setup.cfg file

Part of the problem can be resolved by putting the metadata in a static file alongside setup.py. As a matter of fact, the setup.cfg file is already used by distutils to store some options. There’s even a global section that can be used to set the metadata into the Distribution object Distutils creates when you run setup(). Using the global section that way is not documented and probably not intended. What’s intended is to be able to set some global options like “verbose” or such things.

See http://docs.python.org/install/index.html#syntax-of-config-files

But the code is a generic setter that allows you to pass any field (including the metadata). Call it a bug if you want, but I was pretty excited to see that I could pass my metadata to Distutils through it. Unfortunately these values are not passed to the DistributionMetadata subobject in Distutils, so it doesn’t work exactly like the arguments passed to setup(). Too bad ;-)

I could change this right away in the code, but we have better plans I think.

A new setup section

Instead of working in the global section, which should stay specific to running options, let’s create a new section and put the metadata in it.

[setup]
name: MyDistribution
version: 0.1
description: cool

The setup.py script stays, but it no longer contains any metadata field and only contains what I would call “working arguments”, i.e. arguments used by commands that are not part of the Metadata:

from distutils.core import setup, Extension
setup(packages=['my_package'], ext_modules=[Extension('foo', ['foo.c'])])

What about platform-dependent fields ?

In order not to require any third party code to read the metadata, we need a way to express platform-dependent fields in the setup.cfg file.

The proposed way is to have platform-dependent sections :

[setup]
name: MyDistribution
version: 0.1
description: cool
conditional-sections: py25
[py25]
condition: python_version == '2.5'
requires: simplejson

The py25 section is read only if the expression is true.

Another example:

[setup]
name: MyDistribution
version: 0.1
description: cool
conditional-sections: py25, py26
[py25]
condition: python_version == '2.5' or python_version == '2.4'
requires: simplejson
[py26]
condition: python_version == '2.6' and sys_platform == 'win32'
requires: bar

Here, “bar” will be installed under Python 2.6 on Windows, and “simplejson” under Python 2.5 or 2.4 on any platform.

Distutils will provide a new function that is able to interpret the expressions provided in the condition, and calculate the metadata depending on the platform.

That’s still some code we are running here, but:

  • We are restricting the execution context to the bare minimum: python_version, sys_platform, os_name, and all values returned by os.uname()
  • The function will be vanilla Python: you will be able to extract the metadata without running any third party code, knowing that the execution is restricted to a few string comparisons.
  • The code can be executed at PyPI without any potential security issue, meaning that the XML-RPC functions will be able to send you back the metadata of a package depending on your environment. In other words, a package manager would be able to list all the dependencies of a distribution for the target platform without downloading any of these distributions.
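To give an idea of how small such a function can be, here is a sketch that reads a setup.cfg like the ones above and applies the conditional sections for a given environment. It is only an illustration under stated assumptions: the function names are invented, the condition language is limited to ==, !=, in, not in, and, or over names found in the environment and string literals, and the real Distutils function may end up looking quite different.

```python
import ast
import configparser


def evaluate_condition(expression, env):
    """Evaluate a restricted condition such as "python_version == '2.5'"."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.BoolOp):
            values = [visit(value) for value in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            left, right = visit(node.left), visit(node.comparators[0])
            operator = node.ops[0]
            if isinstance(operator, ast.Eq):
                return left == right
            if isinstance(operator, ast.NotEq):
                return left != right
            if isinstance(operator, ast.In):
                return left in right
            if isinstance(operator, ast.NotIn):
                return left not in right
        if isinstance(node, ast.Name) and node.id in env:
            return env[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node.value
        # Anything else (calls, attributes, numbers...) is rejected.
        raise ValueError("unsupported expression: %r" % expression)
    return visit(ast.parse(expression, mode="eval"))


def read_metadata(cfg_text, env):
    """Return the [setup] fields, updated by the sections whose condition holds."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    metadata = dict(parser["setup"])
    names = metadata.pop("conditional-sections", "")
    for name in (item.strip() for item in names.split(",")):
        if not name:
            continue
        section = dict(parser[name])
        if evaluate_condition(section.pop("condition"), env):
            metadata.update(section)
    return metadata


CFG = """\
[setup]
name: MyDistribution
version: 0.1
description: cool
conditional-sections: py25, py26
[py25]
condition: python_version == '2.5' or python_version == '2.4'
requires: simplejson
[py26]
condition: python_version == '2.6' and sys_platform == 'win32'
requires: bar
"""

print(read_metadata(CFG, {"python_version": "2.5", "sys_platform": "linux2"}))
```

Walking the parsed expression by hand, instead of calling eval(), is what keeps the execution context restricted: anything that is not a known name, a string or one of the listed operators raises an error.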

There will always be edge cases

For the 1% of distributions that need more work to calculate the metadata, setup.py will still be present and any option passed as an argument will override a value provided by setup.cfg. They’ll just have to add a flag in the setup.cfg file, indicating that it does not provide all the metadata and that running setup.py is required:

[setup]
name: MyDistribution
version: 0.1
description: cool
static-metadata: false

If this flag is present, people will know that running setup.py is mandatory to get the full set of metadata.

For example, the web service provided at PyPI to get the metadata will be able to return a platform-specific set if we provide the target environment. Let’s say we add in distutils an ‘execution_environment’ that returns the environment used to interpret the setup.cfg file:

>>> import xmlrpclib
>>> server = xmlrpclib.Server('http://pypi.python.org/pypi')
>>> from distutils.util import execution_environment
>>> execution_environment
{'os_version': 'Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386',
 'os_name': 'posix',
 'python_version': '2.6',
 'os_release': '9.8.0',
 'os_sysname': 'Darwin',
 'os_nodename': 'MacZiade',
 'os_machine': 'i386',
 'sys_platform': 'darwin'}

>>> server.release_data('MyDistribution', '0.6', execution_environment)['requires']
['foo', 'bar']
>>> server.release_data('MyDistribution', '0.6', execution_environment)['static-metadata']
True

PyPI will be able to generate the metadata by interpreting the setup.cfg file with the execution_environment info.

What happens now ?

I won’t write a PEP for this. I don’t think it’s necessary because this feature is backward compatible, and if people don’t use it in Python 2.7 and 3.2, it will just fade away, like other things in Distutils.

But we need to reach a consensus at Distutils-SIG, then announce it on Python-dev. I just hope we will reach this consensus real quickly, unlike most topics we have been working on for a year ;)

Or maybe I should be a bit of a dictator for this feature and just go ahead and add it ? Because as Brett told me several times, it’s impossible to make everyone happy about everything. And I’d like to see Distutils move on. There’s so much left to do…

What do you think ? How do you like that feature ?

Filed under: distutils, python

GSoC is over, we have a Python Keyring lib

The Google Summer of Code is over and the first version of the keyring library was released last week by Kang at PyPI.

How Keyring works, the big picture

This library implements a simple plugin system. Each plugin has to implement a set of methods described in an abstract class and can wrap any underlying Keyring system. We called those plugins “backends”. The nice thing about it is that you can implement your own custom backend and make it available through the Keyring configuration file.

Kang has coded various Keyring backends as C and C++ extensions, for KWallet, Keychain, and Gnome. We have also added a Keyring implementation that uses the Win32Crypto API so Windows users can use the lib.

When the Keyring lib is used, all declared plugins, whether they are provided by the lib itself or by a third party package, will be loaded. Then they will be asked a simple question:

“Can you run in this environment ?”

The backend can answer one of these:

  • “Yes, I could work in this environment”
  • “No, I can’t”
  • “Yes and you should use me  !”

The library filters out the backends that can’t work on the target, sorts the remaining ones, and picks one of the best backends. Of course, this doesn’t happen if you explicitly define which backend you want to use, which is possible.
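The selection logic described above can be sketched as follows. This is an illustration only: the class names, the method names and the -1/0/1 return values are assumptions made for the example, not necessarily the actual API of the keyring library.

```python
from abc import ABC, abstractmethod


class KeyringBackend(ABC):
    """Abstract class every backend has to implement (hypothetical API)."""

    @abstractmethod
    def supported(self):
        """Return -1 ("No, I can't"), 0 ("Yes, I could work in this
        environment") or 1 ("Yes and you should use me")."""

    @abstractmethod
    def get_password(self, service, username):
        """Return the stored password, or None."""

    @abstractmethod
    def set_password(self, service, username, password):
        """Store the password."""


class InMemoryKeyring(KeyringBackend):
    """Toy backend that works everywhere but is never the best choice."""

    def __init__(self):
        self._passwords = {}

    def supported(self):
        return 0

    def get_password(self, service, username):
        return self._passwords.get((service, username))

    def set_password(self, service, username, password):
        self._passwords[(service, username)] = password


class UnavailableKeyring(InMemoryKeyring):
    def supported(self):
        return -1


class RecommendedKeyring(InMemoryKeyring):
    def supported(self):
        return 1


def choose_backend(backends):
    """Filter out unusable backends and return one of the best remaining."""
    usable = [backend for backend in backends if backend.supported() >= 0]
    if not usable:
        raise RuntimeError("no keyring backend can run in this environment")
    return max(usable, key=lambda backend: backend.supported())


backend = choose_backend(
    [UnavailableKeyring(), InMemoryKeyring(), RecommendedKeyring()])
backend.set_password("pypi", "tarek", "secret")
print(type(backend).__name__, backend.get_password("pypi", "tarek"))
```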

What’s next

Keyring 0.1 is out and there will probably be 1 or 2 releases to stabilize the code.

The next steps will be :

  • to use it in Distutils, with a soft dependency : Distutils will let you use it through configuration if it detects Keyring is installed.
  • to promote its usage and in particular see if projects like Mercurial could use it
  • to work on a PEP for its integration in Python stdlib, in the getpass module

Filed under: gsoc, python