Fetchez le Python

Technical blog on the Python programming language, in a pure Frenglish style. Main topics are Python and Mozilla – opinions here are my own, not my employer's.

More privacy please

It became extremely annoying for me to get targeted ads, since I do most of my shopping online, even groceries. I am a great target for advertising operators 🙂

Google seems to be fairly transparent about letting you opt out of third-party cookies as a logged-in user — there's a page for this here. You can also make sure your web history is disabled on this page.


But I don't want to chase options on every online service I use just to opt out and protect my privacy. All these options should be off by default in my opinion, or at least you should be warned, no? I don't know if it's still accurate, but there's a law in Europe for this.

Anyways, this is not a new problem. But I realized that while I was shocked, more than 10 years ago, by how cookies could breach your privacy, I was not taking any action to protect myself. Let's fix this.

I am trying to fine-tune my browser and change my search habits to make sure I leak as little data as possible.

Here's a list of the actions I have taken — if you have better ideas please comment, and I'll try to keep the list up to date.

I have also decided to give duckduckgo.com a shot instead of google.com for search. They don't track at all. You can install the SSL search bar add-on and also tweak your Awesome Bar so it uses it too (I set it to https://duckduckgo.com/?q= and it seems to work).

It’s been a week of Duck Duck Going, and I like it so far. It’s refreshing to use a new search engine. I have not suffered from bad results so far — nothing compared to what made me move from Altavista to Google a very long time ago 😀

And oh… nooo.. what now.. Browser Fingerprint ??!

circus – a process controller

Benoit — of Gunicorn fame — and I have started to work on Circus, a process controller.

I really like this small project for several reasons:

  • it's the first time we're building a Mozilla Services library from the ground up with a contributor who comes not from the Mozilla community but rather from the Python community
  • this library is the last bit we're missing in our Python server stack to have full control over it

Why a new lib?

A process controller is mostly a script that manages processes: it sends signals to them and re-spawns them whenever they die. There are numerous libraries out there that already do this, so creating a new one can sound like NIH.

I looked at several projects before we started this one, like Supervisord, BluePill, and so on. But none of them met exactly all of my requirements – here's my audacious list:

  • can be used as a program and as a library — so in Python
  • a way to query the system and perform live operations via a (secured) remote console, such as
    • add or remove workers
    • restart the workers, broadcast signals
    • get some stats on what's going on
  • advanced process management features, like
    • controlled flapping – try n times to restart a worker that dies on startup, then give up
    • auto-grow – add workers if all workers have been 100% busy for n seconds (for CPU-bound workers)
    • auto-shrink – remove workers if they are bored
    • run in containment tools like cgroups
  • a meta-controller that can drive several controllers remotely, to manage a cluster (later)

BluePill looked very promising because it has most of the features we wanted, but it's written in Ruby, and since we'd like to use the tool as a Python library, that's a blocker. Also, while its DSL is quite sexy, it's not something we'd want to use as-is because we're in ini-file land, where every tool is configured via a new [section] in a config file.

Supervisord is excellent — and widely used.

I tried to extend it and I must admit I had a hard time wrapping my head around it. This is purely technical, but some choices made in Supervisord make it hard for me to extend it the way we want — like the fact that the main class is driven by a configuration object, where I wanted to completely separate these two concepts. I want to be able to create a class and tell it to run n workers without having to create a configuration object in the middle. There's also now the subprocess module in Python, and while Supervisord is probably compatible with older versions of Python, we want 2.6+ so we can make the code much less verbose.

Here's an example of how to run 3 workers on a given command with Circus – KISS:

from circus import get_trainer

cmd = "my_program --some-arg"   # the command each worker will run

# spawn and manage 3 workers running that command
trainer = get_trainer(cmd, 3)
try:
    trainer.start()
finally:
    trainer.stop()

Anyways — we'll still suffer from our choice for a bit: we'll hit issues that other projects have encountered before. But I think that's for the best, and Benoit has a lot of experience in this area with Gunicorn – I expect both projects to exchange a lot of pieces.

Current status

We’re busy polishing the tool, but it’s already in a usable state. For Mozilla, the main use case is to run Crypto Workers for Powerhose (read about Powerhose here) and we can already do this.

Turns out all Powerhose does is wrap the get_trainer() call in a class called Workers (see https://github.com/mozilla-services/powerhose/blob/master/powerhose/client/workers.py). When the web application is launched, it runs a Powerhose master and some Powerhose workers that way — delegating all the process management tasks to Circus.
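
Roughly, the idea is something like this: a simplified sketch of such a wrapper, not the actual class (only get_trainer() comes from Circus, the rest is illustrative):

from circus import get_trainer


class Workers(object):
    # delegates all the process management to Circus

    def __init__(self, cmd, num_workers=5):
        self.trainer = get_trainer(cmd, num_workers)

    def run(self):
        # same pattern as the Circus example above
        try:
            self.trainer.start()
        finally:
            self.trainer.stop()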

For our Ops, Circus provides a small console that will let them watch the workers, add some, restart them, etc.

We’ve reached a point where we have almost all the features we wanted for our needs, but I suspect the project will gain many more features with the contributions of Benoit and maybe other folks in the future.

Nothing's released yet — I'll wait for it to pass our benches and QA tests before I cut a release. But the code is growing here if you're curious: https://github.com/mozilla-services/circus.

Yeah, it's under-tested because I have not come up with a nice testing environment yet – it's hard to do this properly when you deal with processes and signals, and mocking all of that is a bit of nonsense. I suspect the best way will be to run functional tests with workers that produce some content the test can check.
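
Something along these lines, a rough sketch of the idea with plain subprocess for now (the same pattern would apply once Circus drives the workers):

import glob
import os
import subprocess
import tempfile
import unittest


class TestWorkersLeaveATrace(unittest.TestCase):

    def test_three_workers(self):
        # each "worker" drops a file named after its pid in a temp dir
        workdir = tempfile.mkdtemp()
        script = ('import os; '
                  'open(os.path.join(%r, str(os.getpid())), "w").write("ok")'
                  % workdir)

        workers = [subprocess.Popen(['python', '-c', script])
                   for _ in range(3)]
        for worker in workers:
            worker.wait()

        # the functional check: three workers, three traces
        self.assertEqual(len(glob.glob(os.path.join(workdir, '*'))), 3)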

Defining a wsgi app deployment standard

Next month at Pycon, we'll have a web summit and I'm invited there to talk about how I deploy web applications. This is not a new topic, as it was already discussed a bit last year — see Ian Bicking's thoughts on the topic.

My presentation at the summit will be in two parts. I want to 1/ explain how I organized our Python deployments at Mozilla (using RPMs) and 2/ make an initial proposal for a deployment standard that would work for the community at large – I intend to work on this during Pycon and later in the dedicated SIG.

Here’s an overview of the deployment standard idea…

How we deploy usually

If I want to roughly summarize how people deploy their web applications these days, from what I have seen there are two main categories.

  1. Deployments that need to be done in the context of an existing packaging system — like RPM or DPKG
  2. Deployments that are done in no particular context, where we want things to just work — like a directory containing a virtualenv and all the needed dependencies

In both cases, preparing a deployment usually consists of fetching Python packages from PyPI and maybe compiling some of them. These steps are usually done using tools like zc.buildout or virtualenv + pip, and in the case of Mozilla Services, a custom tool that transforms all dependencies into RPMs.

In one case we end up with a directory filled with everything needed to run the application, except the system dependencies, and in the other case with a collection of RPMs that can be deployed on the target system.

But in both cases, we end up using the same thing: a complete list of Python dependencies.

The trick with tools like zc.buildout or pip is that from an initial list of dependencies, you end up pulling indirect dependencies. For instance, the Pyramid package will pull the Mako package, and so on. A good practice is to have them all listed in a single place and to pin each package to a specific version before releasing the app. Both pip and zc.buildout have tools to do this.
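
With pip for instance, pip freeze dumps everything that is installed, direct and indirect dependencies alike, in that pinned form. The result looks like this (the package versions here are just an illustration):

pyramid==1.3
mako==0.6.2
simplejson==2.3.2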

Deployment practices I have seen so far:

  • a collection of RPM/Debian/etc. packages is built, using tools like bdist_rpm and the like
  • a virtualenv-based directory is created in place in production, or as a pre-built binary release that's archived and copied into production
  • a zc.buildout-based directory is created in place in production, or as a pre-built binary release that's archived and copied into production

The part that's still fuzzy for everyone who is not using RPM or Debian packages is how to list system-level dependencies. We introduced in PEP 345 the notion of a hint, where you can declare system-level dependencies whose names may not be the actual names on the target system. So if you say you need libxml-dev, which is valid under Debian, people who deploy your system will know they'll need libxml-devel under Fedora. Yeah, no magic here, it's a tough issue — see Requires-External.
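
For the record, in the metadata that is one field per external dependency, something like this (quoting PEP 345 from memory, so double-check the exact syntax there):

Requires-External: libxml2
Requires-External: libpng (>=1.5)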

The Standard

EDIT: Ian has a much richer standard proposal here (see the comments).

The standard I have in mind is very lightweight and could be useful in all our deployment practices – it's a thin layer on top of the WSGI standard.

A wsgi application is a directory containing:

  • a text file located in the directory at dependencies.txt, listing all dependencies – possibly reusing pip's requirements format
  • a text file located in the directory at external-dependencies.txt, listing all system dependencies – possibly reusing the PEP 345 format
  • a Python script located in the directory at bin/wsgiapp with an "application" variable. The shebang line of the Python script might also point to a local Python interpreter (a virtualenv version)
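
The bin/wsgiapp script can be as dumb as the following sketch, where the application is hard-coded just to show the contract (a real one would import it from the project):

#!/path/to/the/app/bin/python
# bin/wsgiapp -- the only contract is an "application" variable
# holding a WSGI callable; the shebang may point to a virtualenv Python


def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Hello from the wsgi app\n']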

From there, we have all kinds of possible scenarios where the application can be built and/or run with the usual set of tools.

Here's one example of a deployment from scratch:

  • The repository of the project is cloned
  • A virtualenv is created in the repository clone
  • pip, which gets installed with virtualenv, is used to install all the dependencies described in dependencies.txt
  • gunicorn is used to run the app locally using "cd bin; gunicorn wsgiapp:application"
  • the directory is zipped and sent to production
  • the directory is unzipped
  • virtualenv is run again in the directory
  • the app is hooked to Apache+mod_wsgi

Another scenario I’d use in our RPM environment:

  • The repository of the project is cloned
  • an RPM is built for each package in dependencies.txt
  • if possible, external-dependencies.txt is used to feed a spec file.
  • the app is deployed using the RPM collection

That's the idea, roughly — a light standard that points to a wsgi app and its list of dependencies.

Scaling Crypto work in Python

We're building a new service at Mozilla Services called the Token Server – the idea is simple: give us a Browser ID assertion and a service name, and the Token Server will send you back a token that's good for 30 minutes of use with that specific service.

That indirection makes our lives easier when managing user authentication and resource allocation for our services. A few examples:

  • when a new user wants to use Firefox Sync, we can check which server has the smallest number of allocated users, and tell the user to go there
  • we can manage a user from a central place
  • we can manage a user we've never heard about before without asking her to register specifically with each service — that's the whole point of Browser ID

I won’t get into more details because that’s not the intent of this blog post. But if you are curious the full draft spec is here – https://wiki.mozilla.org/Services/Sagrada/TokenServer

What this post is really about is how to build this token server.

The server is a single web service that gets a Browser ID assertion and does the following:

  1. verify the assertion
  2. create a token, which is a simple JSON mapping
  3. encrypt and sign the token
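
To make the shape of it concrete, here is a hedged sketch of those three steps. The verifier and the crypto are placeholders, not the real implementation (more on the real crypto below):

import hashlib
import hmac
import json
import time

SECRET = 'a server-side secret'        # placeholder


def make_token(assertion, service):
    # 1. verify the assertion -- hypothetical helper; the real check
    #    talks to a Browser ID verifier and raises if the assertion is bad
    user_id = verify_assertion(assertion)
    # 2. create a token, a simple JSON mapping
    token = json.dumps({'uid': user_id, 'service': service,
                        'expires': time.time() + 30 * 60})
    # 3. sign it -- the real thing also encrypts; a plain HMAC here is
    #    just for illustration
    signature = hmac.new(SECRET, token, hashlib.sha256).hexdigest()
    return token, signature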

The GIL, Gevent, greenlets and the like

Implementing this using Cornice and a crypto lib is quite simple, but there's one major issue: the crypto work is CPU-intensive, and even if the libraries we can use have C code under the hood, it seems that the GIL is not released enough to let your threads really use several cores. For example, we benched M2Crypto and it was obvious that a multi-threaded app was locked by the GIL.

But we don't use threads in our Python servers — we use Gevent workers, which are based on greenlets. And while greenlets help on I/O-bound calls, they won't help on CPU-bound work: you're tied to a single thread in that case, and each greenlet that does some CPU work blocks the other ones.

It's easy to demonstrate — see http://tarek.pastebin.mozilla.org/1476644. If I run it on my MacBook Air, the pure Python synchronous version is always faster (huh, the gevent version is *much* slower, not sure why..)
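
For the record, that kind of micro-benchmark boils down to this (a rough equivalent I am reconstructing here, not the exact pastebin): the same CPU-bound function timed synchronously and through greenlets.

import time

import gevent


def crunch():
    # plain CPU-bound work, nothing to yield on
    return sum(i * i for i in xrange(10 ** 6))

start = time.time()
for _ in range(10):
    crunch()
print 'sync  :', time.time() - start

start = time.time()
# spawning greenlets adds scheduling overhead but no parallelism,
# since everything still runs in a single thread
gevent.joinall([gevent.spawn(crunch) for _ in range(10)])
print 'gevent:', time.time() - start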

So the sanest option is to use separate processes and set up a messaging queue between the web service that needs some crypto work to be done and specialized crypto workers.

In that case we're back to our beloved 100% I/O-bound model, which we know how to scale using Nginx + Gunicorn + Gevent.

For the crypto workers, we want them to be as fast as possible, so we started to look at Crypto++, which seems promising because it uses CPU-specific ASM calls. There's the pycryptopp binding available to work with Crypto++, but we happen to need some primitives that are not available in that lib yet — like HKDF.

Yeah, at that point it became obvious we’d use pure C++ for that part, and drive it from Python.

Message passing

Back to our Token Server — we need to send crypto work to our workers and get back the result. The first option that comes to mind is to use multiprocessing to spawn our C++ workers and feed them with work.

The model is quite simple, but now that one piece is in C++, it gets harder to use the built-in tools in multiprocessing to communicate with our workers — we need to go lower level and start working with signals or sockets. And well, I am not sure what would be left of multiprocessing then.

This is doable, but a bit of a pain to do correctly (and in a portable way). Moreover, if we want a robust system, we need things like a heartbeat, which requires more inter-process message passing. And now I need to code it in both Python and C++.

Hold on — Let me summarize my requirements:

  • inter-process communication
  • something less painful than signals or sockets
  • very very very fast

I was tempted by memory-mapped files, but the drawbacks I've read about here and there scared me.

ZeroMQ

It turns out zeromq is perfect for this job – there are clients in Python and C++, and defining a protocol to exchange data between the Python web server and the crypto workers is quite simple.
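
To give an idea of how little code such an exchange takes, here is a bare REQ/REP round trip with pyzmq, the simplest possible form and not the actual Powerhose protocol:

import zmq

context = zmq.Context()

# the worker side binds a socket and replies to requests
worker = context.socket(zmq.REP)
worker.bind('ipc:///tmp/crypto-worker')

# the web server side connects and sends the job
client = context.socket(zmq.REQ)
client.connect('ipc:///tmp/crypto-worker')

client.send('some crypto work')
job = worker.recv()           # the worker receives the job...
worker.send('the result for ' + job)
print client.recv()           # ...and the client gets the result back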

In fact, this can be done as a reusable library that takes care of passing messages to workers and getting back the results. It has been done hundreds of times and there are many examples on the zmq website, but I have failed to find any packaged Python library that would let me push some work to workers transparently, via a simple execute() call — if you know one, tell me!

So I am building one, since it's quite short and simple — the project is called Powerhose and is located here: https://github.com/mozilla-services/powerhose.

Here is a description of its design and its limitations:

  • Powerhose is based on a single-master, multiple-workers protocol
  • The master opens a socket and waits for workers to register themselves with it
  • A worker registers itself with the master, provides the path to its own socket, and waits for some work on it
  • Workers perform the work synchronously and send back the result immediately
  • The master load-balances across available workers, and if all of them are busy, waits a bit before it times out
  • A worker pings the master on a regular basis and exits if it's unable to reach it. It attempts to reconnect several times, to give the master a chance to come back
  • Workers are language agnostic, and a master could run heterogeneous workers (one in C, one in Python, etc.)
  • Powerhose does not serialize/deserialize the data – it sends plain strings. That is the responsibility of the program that uses it
  • Powerhose is not responsible for respawning a master or a worker that dies. I plan to use daemontools for this, and maybe provide a script that runs all workers at once
  • Powerhose does not queue jobs and just relies on zeromq sockets

The library implements this protocol and gives two tools to use it:

  • A JobRunner class in Python that you can use to send some work to be done
  • A Worker class in Python and C++ that you can use as a base class to implement workers

Here’s an example of using Powerhose:
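
Something along these lines. JobRunner and Worker are the real class names, but the import paths and the method to override below are approximations, since the API is still moving:

# worker side -- subclass Worker and do the actual job
from powerhose import Worker


class EchoWorker(Worker):

    def handle(self, job):
        # Powerhose hands us a plain string and expects one back
        return job.upper()


# web application side -- push some work through the runner
from powerhose import JobRunner

runner = JobRunner()
print runner.execute('hello')    # -> 'HELLO'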

For the Token server, we’ll have:

  • A JobRunner in our Cornice application
  • A C++ worker that uses Crypto++

The first benches look fantastic — probably faster than anything I'd have implemented myself using plain sockets 🙂

I'll try to package Powerhose so other projects at Mozilla can use it. I am wondering if this could be useful to more people, since I failed to find that kind of tool. How do you scale your CPU-bound web apps?