As you might know, we are working hard on the Distutils side for the upcoming Python 2.7 and 3.2 releases. The biggest piece of work is PEP 376, which will introduce, among other things, an uninstaller function and functions to query installed distributions.
The other “big” piece of work is PEP 345. We want to introduce a new metadata field called “install_requires” to be able to express requirements. The field comes from the setuptools project and is widely used by the community. Note that there were several attempts to define requirements in Distutils in the past, but none of them really made it through.
For instance, if you want to declare docutils as a dependency of your distribution, but only in versions less than or equal to 0.4, you can say:
docutils <= 0.4
But as soon as you work with such dependencies and provide operators to express them, you have to be able to compare versions: for instance, to check whether an installed docutils distribution is compatible with 0.4.
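To illustrate why the comparison needs care, here is a minimal sketch. The helper `numeric_key` is a name made up for this example, and it assumes versions made only of dot-separated integers, which many real version numbers are not.

```python
def numeric_key(version):
    """Turn '0.10' into (0, 10).

    Assumption: the version contains only dot-separated integers.
    Anything else (letters, '++', etc.) raises ValueError.
    """
    return tuple(int(part) for part in version.split("."))


# Plain strings compare character by character, so "0.10" sorts before "0.4".
print("0.10" <= "0.4")                            # True (misleading)

# Comparing tuples of integers gives the ordering a human expects.
print(numeric_key("0.10") <= numeric_key("0.4"))  # False
```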
That’s another big topic we have been working on for the last few months, with people from various communities (Fedora, Ubuntu, etc). And I have started to write down PEP 386.
But comparing versions appears to be a topic that cannot be made generic. It seems that Distutils, and therefore Python, shouldn’t enforce any rule on this.
Furthermore, since we have said that Distutils should be a lighter package, it will not implement a complete package management system like setuptools or zc.buildout do.
So I’ve decided to propose dropping PEP 386 and sticking to a very simple rule in PEP 345, saying that requirements can be defined with:
distribution_name OPERATOR version
where OPERATOR is in >, <, ==, !=, >= or <=.
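As a rough sketch of what a tool could do with this rule (not code from Distutils or from the library mentioned in this post), the requirement string can be split with a regular expression and the comparison delegated to a key function supplied by the caller. The names `parse_requirement`, `satisfies` and `dotted_ints`, and the set of characters allowed in a distribution name, are assumptions made for the example.

```python
import operator
import re

# The six operators allowed by the rule above.
OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}

# Two-character operators are listed first so "<=" is not read as "<".
REQUIREMENT = re.compile(
    r"^\s*([A-Za-z0-9_.\-]+)\s*(<=|>=|==|!=|<|>)\s*(\S+)\s*$"
)


def parse_requirement(text):
    """Split 'name OPERATOR version' into a (name, operator, version) tuple."""
    match = REQUIREMENT.match(text)
    if match is None:
        raise ValueError("not a valid requirement: %r" % text)
    return match.groups()


def satisfies(installed_version, requirement, key):
    """Check an installed version against a requirement.

    `key` converts a version string into something comparable; the
    comparison rule is left to the caller on purpose.
    """
    _name, op, wanted = parse_requirement(requirement)
    return OPERATORS[op](key(installed_version), key(wanted))


def dotted_ints(version):
    """One possible key: dot-separated integers only (an assumption)."""
    return tuple(int(part) for part in version.split("."))


print(parse_requirement("docutils <= 0.4"))                    # ('docutils', '<=', '0.4')
print(satisfies("0.3.9", "docutils <= 0.4", key=dotted_ints))  # True
```

Passing the key function in keeps the parsing generic while each package manager stays free to choose its own comparison rule.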
Lastly, so that the work done at PyCon and in Distutils-SIG is not lost, I will publish the library we wrote. This could be a very good basis for the package management systems out there.
The wider problem is that PyPI is kind of an anarchy, with version numbers like:
1.5.1
1.5.2b2
161
3.10a
8.02
3.4j
1996.07.12
3.2.pl0
3.1.1.6
2g6
11g
0.960923
2.2beta29
1.13++
5.5.kw
2.0b1pl0
PyPI should enforce strict rules for metadata including version numbers.
There is no need to support every weird version number scheme. There should be only one, mandatory for each package, and it should be enforced as part of the upload policy. Period.
Giving package authors too much freedom when it comes to clean metadata is the wrong way. Clear rules, clear policies and enforced policies are the way to go.
@Andreas: AFAIK there will never be any enforcement at PyPI besides mandatory metadata like the email or the package name. That’s against Python philosophy in some way.
"""
That’s against Python philosophy in some way.
"""
The philosophy is wrong here.
@Andreas: Suppose that you are a scientist. You have your own custom way to handle version numbers. It has worked for years. It’s based on dates (20090101, etc.). Then you want to publish your work for other scientists at PyPI.
Should PyPI reject your package because your version numbers are based on dates? I don’t think it should.
Now, from the point of view of a package manager that compares versions (zc.buildout, for example), it makes a lot of sense to have a rational version comparison system like the one described in PEP 386, and maybe to reject those packages because they can’t be compared properly.
It’s not only about versions, it’s about metadata in general. Both CPAN and the Ruby package repository have stronger rules that ensure a certain minimum package quality. On PyPI you are allowed to upload every kind of personal trash without control. That’s *bad*.
I think there are two levels of metadata :
1- the ones provided by PEP 345, which are semi-enforced (e.g. you have to provide metadata with specific names like ‘name’, ‘url’, etc.),
2- the ones created for PyPI, the classifiers (http://www.python.org/dev/peps/pep-0301/#distutils-trove-classification). Those are strict: you can’t add a custom classifier. It has to be added on the PyPI side.
Starting from these, what would you change to raise the QA at PyPI? (That’s a genuine question, I have very little knowledge of the Perl and Ruby ones.)
I think PyPI could award points to packages depending on how good the metadata is: if the Trove classifiers agree with the version numbering, if the versions follow a reasonable standard like 1.2.3a4, if there is a description longer than one line, etc.
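A scoring scheme along those lines could look roughly like the sketch below. Nothing here exists on PyPI: the function `score_metadata`, the one-point-per-check weighting and the pattern used for a “reasonable” version are all assumptions made for illustration. The only piece taken from PyPI is the “Development Status :: 5 - Production/Stable” Trove classifier.

```python
import re

# Hypothetical definition of a "reasonable" version such as 1.2.3a4:
# dot-separated integers, optionally followed by a, b or c and a number.
REASONABLE_VERSION = re.compile(r"^\d+(?:\.\d+)*(?P<pre>[abc]\d+)?$")

STABLE = "Development Status :: 5 - Production/Stable"


def score_metadata(metadata):
    """Award one point per check passed (the weights are arbitrary)."""
    points = 0

    # Check 1: the version follows the pattern above.
    version = REASONABLE_VERSION.match(metadata.get("version", ""))
    if version:
        points += 1

    # Check 2: the description is longer than one line.
    if len(metadata.get("description", "").strip().splitlines()) > 1:
        points += 1

    # Check 3: classifiers are present and do not claim "stable"
    # for a version that carries an alpha/beta/candidate marker.
    classifiers = metadata.get("classifiers", [])
    prerelease = bool(version and version.group("pre"))
    if classifiers and not (prerelease and STABLE in classifiers):
        points += 1

    return points


example = {
    "version": "1.2.3a4",
    "description": "A short summary.\nAnd a longer explanation.",
    "classifiers": ["Development Status :: 3 - Alpha"],
}
print(score_metadata(example))  # 3
```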
I know there’s a Django website that does it, using a Cheesecake-like system (http://pycheesecake.org/), but I can’t remember the name.
You’re thinking of PyPants.org?
No, I didn’t know about PyPants (looks nice!), but it was a similar project. I can’t recall the name.
Mandatory fields:
Maintainer name
Maintainer email
Short description
Long description
If any of these fields is missing -> a release should be rejected, hidden on PyPI, or at least not shown by default.
Another pain point is externally hosted packages where the home URL or download URL is broken.
In addition: one versioning scheme should be enforced… we don’t need ten or more different versioning schemes. If some package maintainers (I could mention at least one from the Plone world) can’t or won’t fix their broken package, or if they invent new versioning schemes -> reject their releases.
My strong opinion is that any package registered with PyPI should be downloadable from PyPI directly. Packages hosted on SourceForge in particular often cause problems, either because the configured download URL does not work (it refers to some SF mirror, and those are often unreachable) or because the SF download page changes pretty often and makes setuptools fail miserably (we had a very strange case with Cheetah 2.2.1: version 2.0.X was easy_installable, 2.2.1 wasn’t).
Freedom and anarchy on PyPI might be good for the developers, but they are a pain for the average end user just trying to install some package. As an integrator I have certain quality expectations of a repository. I have basically never had any issues installing third-party Perl modules from CPAN over the last years, nor with some Ruby stuff… PyPI is a pain in the a** compared to the Perl and Ruby world.
For the metadata check: that’s the philosophy of the new “check” command. It provides a way to check the metadata on the client side. There’s a strict mode that raises an error if a rule is not met. Take a look.
For the URL hosting, I am not sure what would be the best way to handle this problem. The issue is rather the way easy_install works, by browsing the project home page, which I think is wrong.
If you query PyPI using XML-RPC, you get clean download URLs, on PyPI or elsewhere, without having to browse HTML pages.
I agree that more fields can be mandatory. I also agree that packages that aren’t registered and packages that have external download URLs should not be listed by default. I don’t think version numbering should be enforced, though. For example, pytz uses version numbers that follow the Olson database versions, which are year + letter, like 2008a, 2009k, etc. I think forcing pytz to use another versioning scheme would cause more confusion than it solves.
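A minimal sketch of such a query, assuming PyPI’s XML-RPC interface is reachable at https://pypi.org/pypi and still offers the `release_urls` and `release_data` methods. The function names `make_client` and `download_urls` are made up for this example, and the actual network call is shown only as a comment.

```python
import xmlrpc.client


def make_client(index_url="https://pypi.org/pypi"):
    """Build an XML-RPC proxy; no request is sent until a method is called."""
    return xmlrpc.client.ServerProxy(index_url)


def download_urls(client, name, version):
    """Collect the download URLs the index knows for one release.

    `client` is any object offering release_urls() and release_data(),
    so a stub can stand in for the real server.
    """
    # Files uploaded to the index itself.
    urls = [entry["url"] for entry in client.release_urls(name, version)]

    # The download_url metadata field may point elsewhere.
    external = client.release_data(name, version).get("download_url")
    if external and external != "UNKNOWN":
        urls.append(external)
    return urls


# Usage (needs network access):
#     client = make_client()
#     for url in download_urls(client, "docutils", "0.4"):
#         print(url)
```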
@Lennart
> packages who aren’t registered and packages that have external download URL’s
> should not be listed by default
Why? It’s good to have these registered, with a link to a download URL too. I think it’s more up to the client script (e.g. easy_install) to decide not to follow this link, for instance.
But download_url is part of the metadata and it’s good to be able to get it. PyPI is an archive repository *and* an index.
Because each package you use that isn’t uploaded to PyPI creates a new single point of failure. We really need to get rid of that, IMO. There is no reason not to upload them.
I think that at least one counter to this argument is that PyPI imposes a limit on the size of what you can upload (10MB). In ETS, we have a couple of packages where the source dist includes some very thorough tutorials and docs, and thus a lot of images, which means the sdists are more than 10MB. The only way to list these on PyPI is by providing a link to an alternative download location.
I’d be very upset if something prevented users from installing our projects simply because we wanted to provide good docs.
I’d hate to have to download 10 MB just because you include hefty docs in the source file. I have to admit that it’s a bit strange to me. Even more, I’d hate to have my buildout downloading 10 MB frequently.
But I guess you have a point even so. Even if the package were split into a docs package and a source package, you should be able to list your 10MB+ docs package, I assume.
Sorry to be so late to the discussion, but here’s one point about uploading packages that hasn’t been considered here:
What about binary packages for C extensions? A package maintainer can’t possibly be expected to provide these for all possible combinations of OS, Python version and architecture. And do we really want all of these to be hosted on PyPI? The only sensible way to always provide up-to-date binary packages would be some kind of (external) build server (or cluster), but these are very project specific. External download URLs are definitely a must.