PyPI: CDN vs Mirrors
by Tarek Ziadé
We had a few discussions over the last few days about what would be a good way to make PyPI more reliable.
I think there was a bit of confusion about the mirroring protocol (PEP 381) and its goals, as opposed to the reliability of the current PyPI main server.
Some people were basically saying (I am paraphrasing):
Just move PyPI to a CDN and be done with it, this mirroring thing is too complicated.
Well, OK. We could set up a CDN for our PyPI files and host all our archives at Amazon or elsewhere.
But since the mirroring protocol is implemented and available on the server side (we have 5 active mirrors), and since Pip already supports switching to a mirror, the shortest path to a better PyPI is simply to:
create a new mirror in a <put the name of the best provider> CDN and register it as a mirror at PyPI
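Registering a mirror gives it a hostname in a fixed sequence. PEP 381 names mirrors X.pypi.python.org, where X runs a, b, c, …, aa, ab, …; a is the master server, the mirrors start at b, and a CNAME record last.pypi.python.org points at the last one. Here is a minimal sketch of how a client could enumerate the mirrors under that scheme. It assumes the caller has already resolved last.pypi.python.org (the DNS lookup is left out), the helper names mirror_label and mirror_hosts are made up for illustration, and these hostnames are historical, so they may no longer resolve.

```python
from __future__ import annotations

import string

DOMAIN = "pypi.python.org"


def mirror_label(index: int) -> str:
    """Map a 0-based index to a PEP 381-style label: 0 -> 'a', 25 -> 'z', 26 -> 'aa'.

    This is bijective base-26, the same scheme spreadsheets use for column names.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = []
    n = index + 1  # bijective numeration is 1-based
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(string.ascii_lowercase[rem])
    return "".join(reversed(letters))


def mirror_hosts(last_host: str, domain: str = DOMAIN) -> list[str]:
    """List mirror hostnames from b.<domain> up to and including last_host.

    last_host is what the 'last.<domain>' CNAME resolves to; the DNS lookup
    itself is deliberately left out of this sketch.
    """
    suffix = "." + domain
    if not last_host.endswith(suffix):
        raise ValueError(f"{last_host!r} is not under {domain!r}")
    last_label = last_host[: -len(suffix)]
    hosts = []
    index = 1  # index 0 ('a') is the master server; mirrors start at 'b'
    while True:
        label = mirror_label(index)
        if len(label) > len(last_label):
            # We walked past every label of that length without a match.
            raise ValueError(f"{last_label!r} is not a valid mirror label")
        hosts.append(label + suffix)
        if label == last_label:
            return hosts
        index += 1
```

For example, if last.pypi.python.org pointed at d.pypi.python.org, then mirror_hosts("d.pypi.python.org") returns the b, c and d hosts, in that order.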
And if it's really that much better, more reliable and faster, maybe we'll move it to the top of the mirrors list, so all clients pick it by default.
And the day <put the name of the best provider> is down (yeah, it happens, remember EC2 a few months ago), you will all be thankful that we have other mirrors and a protocol to switch between them!
So, if you think a CDN is the magic solution, go ahead. Grab pep381client, set up your monster infrastructure, and let me know so I can add it to the list of mirrors. And maybe we will never ever call another server again. Or maybe not.
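On the client side, switching between mirrors boils down to trying hosts in order and moving on to the next one when a host fails. Below is a hedged sketch of that idea, not pip's actual implementation: fetch_with_failover and AllMirrorsFailed are hypothetical names, and the fetch callable is injected so the logic can be exercised without a network. Putting a host first in the list is what "moving it to the top of the mirrors list" amounts to here.

```python
from __future__ import annotations

from typing import Callable, Iterable


class AllMirrorsFailed(Exception):
    """Raised when none of the hosts could serve the request."""


def fetch_with_failover(
    hosts: Iterable[str],
    path: str,
    fetch: Callable[[str], bytes],
) -> tuple[str, bytes]:
    """Try each host in order and return (host, body) from the first that answers.

    The order of `hosts` is the preference order, so whichever mirror is
    listed first is the one clients hit by default.
    """
    errors = []
    for host in hosts:
        url = f"http://{host}{path}"
        try:
            return host, fetch(url)
        except OSError as exc:
            # urllib's URLError and socket timeouts are OSError subclasses,
            # so a down or unreachable host makes us move on to the next one.
            errors.append(f"{host}: {exc}")
    raise AllMirrorsFailed("; ".join(errors) or "no hosts given")
```

In real use, fetch could be something like `lambda url: urllib.request.urlopen(url, timeout=10).read()`; keeping it as a parameter makes the switching logic easy to test on its own.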
For further info, here is a detailed summary of the PyPI status that we've put together with Noah: http://wiki.python.org/moin/BetterPyPI
Related: I am going to submit a tutorial on how to work with PyPI, mirrors, private packages, etc., to show how I do it at Mozilla.
Sounds good to me!
One small nit: CND == Content Delivery Network, right? Us anglos usually abbreviate it CDN. CND made me do a double take 😉
-w
Yeah that was a typo, I’ve fixed it. thx !
maybe I'm dyslexic this morning… ignore me.
It was misspelled and Carl told me on Twitter. I fixed it while you were commenting 😉
For what it's worth (and validating your point in a way), putting everything in a single-location cloud instance isn't a CDN. Amazon's outage affected EBS and therefore a lot of EC2 instances in US East 1; a proper CDN has geographic redundancy (aka mirrors) built in.
I think people were thinking about using CloudFront
Added a mirror @ http://pypi.digitalxero.net
Ah, thanks! Please send a mail to catalog-SIG so we can add it. And if it's a CDN, let us know.
If we’re going to rely on mirrors, will there be some form of digital signature system to prevent people from running rogue mirrors?
For example, distro mirrors manage to be secure (in terms of integrity, not availability) even against rogue mirrors and HTTP MITM attacks.
We do have some of these concerns addressed in PEP 381: see http://www.python.org/dev/peps/pep-0381/#mirror-authenticity
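On the integrity side, the mirror-authenticity section of PEP 381 has the central index sign each package's index page, so a client can check that a page served by a mirror was not tampered with, and then check downloaded files against the MD5 hashes that index links carry as `#md5=<hexdigest>` fragments. The sketch below covers only that last step and assumes the link came from an index page whose signature has already been verified; the signature check itself is left out, and expected_md5 and verify_download are hypothetical names. MD5 is what those links used; it is not a strong hash by today's standards.

```python
import hashlib
from urllib.parse import urlsplit


def expected_md5(link: str) -> str:
    """Pull the hex digest out of an index link ending in '#md5=<hexdigest>'."""
    fragment = urlsplit(link).fragment
    name, sep, value = fragment.partition("=")
    if name != "md5" or not sep or not value:
        raise ValueError(f"no md5 fragment in {link!r}")
    return value.lower()


def verify_download(link: str, data: bytes) -> bool:
    """Return True if bytes fetched from a mirror match the hash in the link.

    Only meaningful if the index page containing `link` has already passed
    the PEP 381 signature check, which this sketch does not implement.
    """
    return hashlib.md5(data).hexdigest() == expected_md5(link)
```

The point of this split is that a rogue mirror can serve whatever bytes it likes, but it cannot forge the signed index page, so a mismatch between the file and the hash in that page exposes the tampering.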