The fate of Distutils – Pycon Summit + Packaging Sprint detailed report
by Tarek Ziadé
The summit
I quickly posted an entry right after the Language Summit we held before Pycon in Atlanta. Basically, all the work I have been doing in Distutils and the PEPs we’ve prepared for the “big refactoring” will not be done in the standard library. Distutils in the stdlib trunk will be reverted to its current 2.6.x state.
I was quite despondent right after the summit. All the work we did during the past year would not land in the standard library for 2.7, and all the pre-refactoring work I did, like bringing the test coverage to a decent level, was going to be useless for the stdlib. Having that work included in 2.7 was one of my goals, and I worked hard to make sure most of the important PEPs would be accepted before the 2.7 feature freeze (the first beta, which freezes new features, is in 4 weeks).
I was even more depressed because I had started to pull the “sysconfig” module out of Distutils and had simplified the code in Distutils, while making sure that backward compatibility was kept.
I had a twenty-minute meeting with Guido after the Summit to clarify the situation. He helped me understand why this was the right path, and worked with me on what to do next, both on the stdlib front and outside the stdlib.
Basically, a package that goes into the standard library has one foot in the grave (I am paraphrasing Guido here). Its APIs are frozen, and people don’t really expect anything from it except small new features and bug fixes. Refactorings are dangerous, if not impossible.
I have hit that problem in the past, in one of the 2.6 bug fix releases, where I broke Setuptools compatibility because of an internal change I made in a private method. The breakage happened partly because Setuptools overrides a private method and partly because a public method that was not clearly documented was affected.
A few weeks ago the problem happened again: someone complained on python-dev because a declaration (an exception class) was missing from Distutils. The exception class was imported from the errors module into another Distutils module, but was no longer used there, and the module it was imported into didn’t have an __all__ attribute. A third-party tool was importing the exception from that module instead of from errors, so when I cleaned up the unused import, the third-party tool broke.
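To make this failure mode concrete, here is a minimal sketch that uses two hypothetical in-memory modules, demo_errors and demo_util, as stand-ins for the Distutils modules involved; the module names and the exception name are illustrative only. A third-party import works only because of a leftover import, and breaks as soon as that import is cleaned up.

```python
import sys
import types


def make_module(name, source):
    """Build an in-memory module from source code and register it in sys.modules."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    exec(source, module.__dict__)
    return module


# Hypothetical stand-in for the module that *defines* the exception.
make_module("demo_errors", "class DistutilsFileError(Exception):\n    pass\n")

# Hypothetical stand-in for a module that imported the exception once,
# no longer uses it, and declares no __all__.
make_module("demo_util", "from demo_errors import DistutilsFileError  # leftover\n")


def third_party_import():
    """What the third-party tool did: import the exception from the wrong module."""
    from demo_util import DistutilsFileError
    return DistutilsFileError


before_cleanup = third_party_import()  # works, but only by accident

# The "harmless" cleanup: the unused import is removed from demo_util.
make_module("demo_util", "# unused import removed\n")

try:
    third_party_import()
    broke_after_cleanup = False
except ImportError:
    broke_after_cleanup = True

# The robust spelling: import the name from the module that defines it.
from demo_errors import DistutilsFileError
```

Declaring __all__ in demo_util would document that the exception was never part of its public API, but on its own it would not stop an explicit from-import like the one above; __all__ only controls what `from demo_util import *` exports.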
So basically, any change I make in Distutils, even a simple cleanup and, worse, even a change to a private method, can break third-party tools.
You could argue that they should be careful in how they use Distutils, never patch it or change its internals, etc., and that edge cases like missing imports should just be fixed.
But hey, Python 2.7 will be out the door in five weeks, and from the user’s point of view, Python will have broken third-party libraries.
And the worst part of it: some of these libraries, like Setuptools, are not really maintained anymore and expect Distutils not to evolve. But Setuptools is used nevertheless, since it solves some problems Distutils doesn’t. So the end user is the one who will suffer from those regressions.
In other words, projects like Setuptools slow down the work we want to do in packaging, because the current ecosystem depends on a big, monolithic, messy pile of code spread across different projects with different maintainers.
At this point, I understood that the easiest way for Distutils to evolve was to get away from this pile and grow in another namespace called distutils2.
Welcome Distutils2
If you have followed what has been going on with packaging since last year, you might think: “distutils, setuptools, distribute and now distutils2?! Oh no!!!”
But this is going to benefit everyone. See the roadmap in the image below.
So basically, I have forked Distutils and renamed its package to Distutils2. The project is located at http://hg.python.org/distutils2, and the goal is to put it back into the standard library as soon as it reaches a state where it starts to be used by the community. Distutils will just die slowly, probably pulling Setuptools and Distribute down with it.
The Distribute project is still important, because it lets us release bug fixes and Python 3 support today.
Distutils2 will be compatible with Python 2.4 through 3.2; it will take back the good bits from Distribute and implement the recently accepted PEPs, PEP 345 and PEP 386.
And I am happily removing old code we don’t want or need anymore, without worrying about backward compatibility. Yeah!
The packaging sprint
After the conference, we started a packaging sprint, and I was surprised by how many people showed up and worked on the topic.
We created a few teams to work on PEP 376, mkpkg, the Hitchhiker’s Guide to Packaging (HHGP), and Distribute. I won’t name everyone, as I am too scared of forgetting someone :D.
PEP 376
Like last year, people from various distributions (Fedora, Ubuntu, Debian) and I worked on packaging issues. Last year they mainly worked on PEP 386 (the versioning scheme); this year they focused on PEP 376. This PEP is about setting up a standard for installed packages, with an installation index that lets you query which packages are installed and get their metadata. By extension, it provides an uninstall feature. The goal, of course, is to have a standard for all package managers.
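The uninstall feature relies on each installed project recording the files it put on disk. The sketch below assumes the layout PEP 376 eventually settled on: a `<name>-<version>.dist-info` directory holding a CSV `RECORD` file whose rows start with a path relative to the site directory. The draft discussed at the sprint may differ, and the function names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def installed_distributions(site_dir):
    """Yield (name, version, dist_info_dir) for every *.dist-info directory."""
    for dist_info in sorted(Path(site_dir).glob("*.dist-info")):
        stem = dist_info.name[: -len(".dist-info")]
        name, _, version = stem.rpartition("-")
        yield name, version, dist_info


def recorded_files(dist_info):
    """Return the installed paths listed in RECORD (first CSV column)."""
    with open(dist_info / "RECORD", newline="") as fh:
        return [row[0] for row in csv.reader(fh) if row]


def uninstall(site_dir, project):
    """Delete every file RECORD lists for `project`, then prune empty directories."""
    site = Path(site_dir)
    for name, _version, dist_info in installed_distributions(site):
        if name != project:
            continue
        removed = []
        for rel_path in recorded_files(dist_info):
            path = site / rel_path
            if path.is_file():
                path.unlink()
                removed.append(rel_path)
        # Collect every directory between a removed file and the site directory.
        parents = set()
        for rel_path in removed:
            parent = (site / rel_path).parent
            while site in parent.parents:
                parents.add(parent)
                parent = parent.parent
        # Remove the ones left empty, deepest first.
        for directory in sorted(parents, key=lambda p: len(p.parts), reverse=True):
            if directory.exists() and not any(directory.iterdir()):
                directory.rmdir()
        return removed
    raise LookupError(f"{project!r} is not installed in {site_dir}")


# Usage: fake an installed project in a throwaway site directory, then remove it.
with tempfile.TemporaryDirectory() as site:
    files = ["demo/__init__.py", "demo-1.0.dist-info/METADATA", "demo-1.0.dist-info/RECORD"]
    for rel in files:
        (Path(site) / rel).parent.mkdir(parents=True, exist_ok=True)
        (Path(site) / rel).write_text("")
    with open(Path(site) / "demo-1.0.dist-info" / "RECORD", "w", newline="") as fh:
        csv.writer(fh).writerows([rel, "", ""] for rel in files)
    found = [(name, version) for name, version, _ in installed_distributions(site)]
    removed = uninstall(site, "demo")
    leftover = sorted(p.name for p in Path(site).iterdir())
```

Because the query side (installed_distributions) and the uninstall side both read the same on-disk records, any package manager that writes them could interoperate; that shared format is what the PEP standardizes.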
One part of the PEP is about describing the data files that are installed with the project (like configuration files or documentation), so they can be removed and possibly relocated. The group focused on describing the files a project contains statically (in setup.cfg), with variables that are expanded at installation time (their values are provided by Python, but can be configured globally by OS packagers).
We did quite a bit of work and brainstorming on this, and even focused on removing setup.py! A fully static description of a project (metadata + file list) is the key to a better packaging tool!
Expect a proposal for PEP 376 on distutils-SIG soon. If you want to have a look, the draft proposal is here: draft.
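As an illustration of the idea, and not of the syntax the group actually proposed, the sketch below reads a hypothetical [files] section of setup.cfg in which each line maps a glob in the source tree to a target containing {variables}. The default values stand in for what Python would provide, and the overrides argument plays the role of an OS packager’s global configuration; all section, key, and variable names are assumptions.

```python
import configparser
from fnmatch import fnmatchcase

# Hypothetical setup.cfg content: "<glob in the source tree> = <install target>".
SETUP_CFG = """
[files]
resources =
    conf/*.cfg = {confdir}/myproject
    docs/*.txt = {doc}/myproject
"""

# Hypothetical defaults "provided by Python"; OS packagers could override them.
DEFAULT_LOCATIONS = {"confdir": "/etc", "doc": "/usr/share/doc"}


def parse_resources(cfg_text):
    """Return the (glob, target template) pairs declared in [files] resources."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    rules = []
    for line in parser.get("files", "resources").splitlines():
        if line.strip():
            pattern, _, target = line.partition("=")
            rules.append((pattern.strip(), target.strip()))
    return rules


def plan_install(source_files, rules, overrides=None):
    """Map each matching source file to its install path, expanding variables."""
    locations = {**DEFAULT_LOCATIONS, **(overrides or {})}
    plan = {}
    for path in source_files:
        for pattern, target in rules:
            if fnmatchcase(path, pattern):
                filename = path.rsplit("/", 1)[-1]
                plan[path] = target.format_map(locations) + "/" + filename
                break  # the first matching rule wins
    return plan


rules = parse_resources(SETUP_CFG)
default_plan = plan_install(["conf/app.cfg", "docs/README.txt", "src/app.py"], rules)
packager_plan = plan_install(["conf/app.cfg"], rules, overrides={"confdir": "/usr/local/etc"})
```

Since nothing here runs project code, a tool can compute (and later undo) the installation plan from the static description alone, which is the property that makes dropping setup.py attractive.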
mkpkg and Distribute
We had two one-member teams at some point, so I can name them without being scared of forgetting someone 😉
Sean worked on a nice add-on for Distutils2: a script that builds a setup.py file after asking you a few questions. He blogged about it, so I don’t need to go into further details 🙂
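To give an idea of how such a question-driven generator can work, here is a minimal sketch; it is not Sean’s script, and the questions, defaults, and function names are hypothetical. Injecting the input function keeps it testable without a terminal.

```python
QUESTIONS = [
    # (setup() keyword, prompt, default answer)
    ("name", "Project name", ""),
    ("version", "Version", "0.1"),
    ("author", "Author", ""),
    ("description", "One-line description", ""),
]


def ask(questions, input_func=input):
    """Ask each question and fall back to the default on an empty reply."""
    answers = {}
    for key, prompt, default in questions:
        reply = input_func(f"{prompt} [{default}]: ").strip()
        answers[key] = reply or default
    return answers


def render_setup_py(answers):
    """Produce the text of a setup.py that passes the answers to setup()."""
    lines = ["from distutils.core import setup", "", "setup("]
    for key, value in answers.items():
        lines.append(f"    {key}={value!r},")  # repr() yields a valid string literal
    lines.append(")")
    return "\n".join(lines) + "\n"


# Usage with canned replies instead of a real terminal session.
replies = iter(["demo", "", "Jane Doe", "A demo project"])
answers = ask(QUESTIONS, input_func=lambda prompt: next(replies))
setup_py = render_setup_py(answers)
```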
Noufal worked on fixing some bugs in Distribute. We should do a release at some point.
The Hitchhiker’s Guide to Packaging
Another group worked on the guide. The goal is to provide some help for people who want to package things today and are in despair over the sparse documentation they can find. Which tool to use? How? When?
The work done was quite amazing; have a look: http://guide.python-distribute.org
I have spoken with Georg Brandl to see how we could move it to docs.python.org and make it grow there.
Distutils2 coding
Besides PEP 345, which is now almost fully implemented in Distutils2, I worked on making Distutils2 work on 2.4, 2.5 and 2.6, and that part is done.
There’s now a metadata module with a dict-like DistributionMetadata class that knows how to read and write PKG-INFO files. It also knows how to interpret the micro-language we’ve defined: the environment markers.
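A dict-like metadata class of this kind can be sketched with the standard library alone, since PKG-INFO uses RFC 822-style headers. The following is a hypothetical illustration, not the distutils2 DistributionMetadata API: the class name, the set of multi-valued fields, and the marker evaluator (which supports only and, or, ==, !=, in and not in over a few PEP 345 variables) are simplifying assumptions.

```python
import ast
import os
import platform
import sys
from email.parser import Parser

# Fields that may appear several times in PKG-INFO (a subset, for illustration).
MULTI_VALUED = {"Classifier", "Platform", "Requires-Dist",
                "Provides-Dist", "Obsoletes-Dist", "Project-URL"}

# Values a marker may refer to (a subset of the PEP 345 variables).
MARKER_VARIABLES = {
    "sys.platform": sys.platform,
    "os.name": os.name,
    "python_version": "%d.%d" % sys.version_info[:2],
    "platform.machine": platform.machine(),
}

# The only comparisons the marker micro-language needs.
_OPERATORS = {
    ast.Eq: lambda a, b: a == b,
    ast.NotEq: lambda a, b: a != b,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def _dotted_name(node):
    """Turn a Name/Attribute node such as `sys.platform` into a string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return _dotted_name(node.value) + "." + node.attr
    raise ValueError("unsupported marker syntax")


def _evaluate(node, env):
    """Walk a parsed marker; nothing is passed to eval()."""
    if isinstance(node, ast.BoolOp):
        results = [_evaluate(value, env) for value in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        compare = _OPERATORS.get(type(node.ops[0]))
        if compare is None:
            raise ValueError("unsupported marker operator")
        return compare(_evaluate(node.left, env), _evaluate(node.comparators[0], env))
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return env[_dotted_name(node)]  # unknown variables raise KeyError


def interpret_marker(marker, overrides=None):
    """Evaluate an environment marker, optionally overriding variable values."""
    env = {**MARKER_VARIABLES, **(overrides or {})}
    return bool(_evaluate(ast.parse(marker.strip(), mode="eval").body, env))


class Metadata(dict):
    """Hypothetical dict-like view of a PKG-INFO file (not the distutils2 API)."""

    @classmethod
    def from_pkg_info(cls, text):
        headers = Parser().parsestr(text, headersonly=True)
        meta = cls()
        for key in headers.keys():
            if key not in meta:
                values = headers.get_all(key)
                meta[key] = values if key in MULTI_VALUED else values[0]
        return meta

    def to_pkg_info(self):
        lines = []
        for key, value in self.items():
            for item in value if isinstance(value, list) else [value]:
                lines.append(f"{key}: {item}")
        return "\n".join(lines) + "\n"

    def requirements(self, overrides=None):
        """Requires-Dist entries whose marker (after ';') holds in this environment."""
        wanted = []
        for spec in self.get("Requires-Dist", []):
            requirement, _, marker = spec.partition(";")
            if not marker.strip() or interpret_marker(marker, overrides):
                wanted.append(requirement.strip())
        return wanted


PKG_INFO = """\
Metadata-Version: 1.2
Name: demo
Version: 1.0
Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'
Requires-Dist: simplejson
"""
meta = Metadata.from_pkg_info(PKG_INFO)
```

Parsing markers into a syntax tree and walking only the node types the micro-language allows keeps arbitrary code out, which matters because PKG-INFO files come from third parties.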
Last, I’ve added the PEP 386 version module, version.py, which the metadata class now uses to check versions.
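The ordering PEP 386 defines can be illustrated with a sort key. This is a simplified sketch, not the distutils2 version module: it handles a reduced form of the PEP 386 syntax (a single pre-release number, for instance), treats rc like c, and, as an assumption, ignores trailing zeros so that 1.0 equals 1.0.0.

```python
import re

_VERSION_RE = re.compile(
    r"(?P<release>\d+(?:\.\d+)*)"            # N.N[.N]...
    r"(?:(?P<pre>a|b|c|rc)(?P<pre_n>\d+))?"  # optional a1 / b2 / c3 / rc3
    r"(?:\.post(?P<post>\d+))?"              # optional .postN
    r"(?:\.dev(?P<dev>\d+))?"                # optional .devN
)
_PRE_RANK = {"a": 0, "b": 1, "c": 2, "rc": 2}  # assumption: rc sorts like c


def version_key(text):
    """Return a tuple that sorts in (simplified) PEP 386 order."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a PEP 386-style version: {text!r}")
    release = [int(part) for part in match["release"].split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # assumption: 1.0 == 1.0.0
    if match["pre"]:
        pre = (0, _PRE_RANK[match["pre"]], int(match["pre_n"]))
    elif match["dev"] is not None and match["post"] is None:
        pre = (-1,)  # 1.0.dev456 sorts before every 1.0 pre-release
    else:
        pre = (1,)  # a final release, or a post-release of one
    post = int(match["post"]) if match["post"] is not None else -1
    dev = int(match["dev"]) if match["dev"] is not None else float("inf")
    return (tuple(release), pre, post, dev)


releases = ["1.0.post456", "1.0", "1.0c1", "1.0b2.post345", "1.0a2", "1.0.dev456",
            "1.0b1", "1.0a1", "1.0.post456.dev34", "1.0a2.dev456", "1.0b2"]
ordered = sorted(releases, key=version_key)
```

Encoding each segment as a tuple position means Python’s built-in tuple comparison does all the work: missing post-releases become -1 and missing dev markers become infinity, so the special cases sort into place without custom comparison code.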
More to come !
Next sprint at Confoo.ca
The next packaging sprint will happen in Montreal, where I am going as a speaker next week. We will continue the work we started, so stay tuned.
Tarek, thank you for this detailed write-up! My initial reaction to the idea of a “distutils2” was also dismay, but you have done a good job of explaining the situation. I suppose this inability to refactor internals is a cost of the fact that Python does not support interfaces or allow us to hide variables: people can rely on pretty much ANYTHING about how our implementation works, and then complain later when we want to improve it.
We need a nice linter that tells people to import exceptions from where they’re defined rather than from places that import them incidentally. The Zope people might already have one, come to think of it.
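A rough version of such a check fits in a few lines of ast and importlib. This is a hypothetical heuristic, not an existing Zope or Python tool: it imports each module named in a `from X import Y` statement and flags Y when the object reports a different __module__ and X does not list it in __all__. It only sees objects that carry __module__ (mostly classes and functions), and it has to import the modules it checks.

```python
import ast
import importlib
import sys
import types


def incidental_imports(source):
    """Return (module, name, defining_module) for names imported from the 'wrong' place."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom) or node.level or not node.module:
            continue  # only absolute "from X import Y" statements
        try:
            module = importlib.import_module(node.module)
        except ImportError:
            continue
        public = set(getattr(module, "__all__", ()))
        for alias in node.names:
            if alias.name == "*" or alias.name in public:
                continue  # explicitly exported names are fine
            home = getattr(getattr(module, alias.name, None), "__module__", None)
            if home is not None and home != node.module:
                findings.append((node.module, alias.name, home))
    return findings


# Hypothetical modules reproducing the situation from the post.
errors_mod = types.ModuleType("lint_demo_errors")
sys.modules["lint_demo_errors"] = errors_mod
exec("class DistutilsFileError(Exception):\n    pass\n", errors_mod.__dict__)

util_mod = types.ModuleType("lint_demo_util")
sys.modules["lint_demo_util"] = util_mod
exec("from lint_demo_errors import DistutilsFileError\n", util_mod.__dict__)

report = incidental_imports(
    "from lint_demo_util import DistutilsFileError\n"
    "from lint_demo_errors import DistutilsFileError\n"
)
```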
@Brandon: yes that can be quite disturbing for observers 🙂
About the imports: I think using __all__ is a good way to publish what you really want to be part of the public API (even if it’s a name just imported from another module)
Nothing about __all__ is even a speed bump for wrong imports. It would be okay if Tarek could just get the report and quickly make a new release to fix little issues like that. But if there’s a Python release with a problem like that in the standard library, it is much more disruptive and harder to correct with rapid releases. It’s just not a good place to do this development. Which is quite annoying, as it’s not exactly awesome to do this development outside the standard library either 😦 It’s a tricky problem.
@Ian: well, at least __all__ marks your public API in some way, but… Anyway, the fact is that the stdlib *does* need to have distutils2 at some point, so maybe that will be the end of its evolution.
Unless we start to make it possible to release stdlib members with shorter cycles, maybe..
Hey Tarek,
I am disappointed too. If there is going to be a distutils2, could it be merged with distribute? It would make things simpler.
Can something like distutils, or distutils2, really be “finished”? That certainly wasn’t the case for distutils, causing a lot of pain for everyone. I don’t see a point when distutils2 can really be finished either. We need a new idea for managing libraries like these.
@Gael: well, distutils2 will be distributed on its own, compatible with 2.4 -> 3.2, and will include what we’ve added to distribute so far, so what do you have in mind to make things simpler, given the constraints?
@Ian: Agreed. Maybe Python could be distributed with a first level of packages that can live with an 18-month release cycle without suffering from it, and a second level for packages like distutils, with the ability to get updates on a shorter release cycle. I am also wondering whether having a standalone release beside the frozen one in the stdlib would solve the problem (ctypes has this strategy IIRC)
There should be a term or phrase for describing packages (or APIs) that can’t change due to community inertia, and the effort required to replace those packages (APIs) with *new* and *improved* packages (or APIs).
I saw a similar problem in my previous (and current) job, where we couldn’t change an API, so we had to just create a new one. Once all the old packages were converted, the new one became the old one. But I think that wouldn’t work for us with distutils2 eventually becoming distutils (though it should).
As Ian recently pointed out, there’s a lot of ambiguity as to what constitutes the public API of a Python package. Is it anything not beginning with _? What about modules whose names don’t start with _? Must they stick around forever? In the extreme, if we import a public module into another public module but don’t intend others to reimport it from there, must we import it “as _somepublicmodule” (ick)?
Perhaps the stdlib is the place to start establishing some solid conventions for this. We could lay down a “Here’s how the stdlib expresses what is public” document in the Library Reference someplace, perhaps developing it as a PEP first. With time, luck, and careful choosing, these practices might spread to the bulk of third-party packages, and we’ll have a more dependable set of expectations for everybody. Crazy?
@Erik: I think __all__ is what we want – http://pyref.infogami.com/__all__
@Allan: That’s the deprecation process, I guess, which makes it possible to move away from an API. In the distutils case, though, setuptools (and some other projects) used private APIs, spread through the community, then stopped being maintained. That blocked the usual deprecation process. I am not sure how this can be handled properly. Hard problem…
Yes, I like __all__.
The options for scaring folks away from private APIs are both pretty bad: (1) spewing ugly, developer-visible warnings when they’re called (which is what Zope does at its default log level) or (2) actively preventing outsiders from calling them, via ugly magic.
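Option (1) can be sketched with the warnings module. The decorator, warning class, and helper below are hypothetical, not Zope’s mechanism: every call through the public name emits a warning attributed to the caller (stacklevel=2), while the package’s own code can call the undecorated function through __wrapped__, which functools.wraps sets.

```python
import functools
import warnings


class PrivateAPIWarning(UserWarning):
    """Hypothetical warning category for calls into private helpers."""


def private_api(func):
    """Wrap `func` so that each call warns that it is not a public API."""
    @functools.wraps(func)  # also stores the original function as wrapper.__wrapped__
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__qualname__} is private and may change without notice",
            PrivateAPIWarning,
            stacklevel=2,  # point the warning at the caller, not at this wrapper
        )
        return func(*args, **kwargs)
    return wrapper


@private_api
def _merge_options(defaults, overrides):
    """Hypothetical internal helper that third-party code should not call."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Usage: an outside caller triggers the warning; internal code can bypass it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    outside_result = _merge_options({"a": 1}, {"b": 2})
    inside_result = _merge_options.__wrapped__({"a": 1}, {"a": 3})
```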
I think the ideal course is to keep the underlying lib agile so that, if a future author of something like setuptools needs access to something provided by a private API, we can come up with a public alternative quickly. Of course, being shipped only with new Python revisions makes that tricky….
Thanks for the write-up! Good code without good communication is dead code. You seem to manage to do both.
“Unless we start to make it possible to release stdlib members with shorter cycles, maybe..”
Splitting the stdlib or having parallel standalone distributions for modules/packages seems like a way forward. It would certainly work in the free world, but I don’t know whether users of OSes lacking a package manager will update their stdlib.
Cheers
Thanks for the detailed explanation of the distutils2 project. Even though I am a beginner in terms of packaging, it explains a lot of things, and it makes sense to move to a new version rather than refactor the code.