A simple self-upgrade build pattern
by Tarek Ziadé
Bootstrapping is useful.
For example, we use it to build in-place our applications at Mozilla Services, to develop them. When the developer runs “make build”, a Python script is invoked, and makes sure all the Python packages we are developing are checked out locally, and tied to the execution environment.
In the Python community you have similar tools like zc.buildout combined with mrdeveloper, or Pip’s requirements. Although, our script does Mozilla-specific things, like allowing to update the dependencies at specific tags via command-line options, or at the “latest stable tag”.
Anyways, back to bootstrapping.
Having such scripts is great, but maintaining them across all projects is painful. Since the script is the first thing the build runs, it cannot have an external dependency so it has to be completely independent. So you have to copy it around, in all your projects, and that leads to maintenance hell.
One solution is to add a self-upgrade feature to the script, so it tries to get the latest version online before it is run. zc.buildout has such a self-update feature, that will make sure it’s at the latest version. Although, the implementation of this feature is a bit odd. 1/ you cannot run the bootstrap script if you’re offline. 2/ on upgrades, zc.buildout restarts itself. I don’t think 2/ a robust approach and I recall that it had some bugs under Windows in the past (updating files that your own program is using might fail).
So, since I was working in the tools-for-Mozilla-Services area, I added a self-upgrade feature and tried to isolate a simple mechanism for this. A pattern that would let us publish fresh versions of some scripts in our server and make it easy for our tools to use them.
An efficient bootstrap script should:
- be able to run offline, so the app can be built even if it’s unable to check for the latest online version
- be able to upgrade itself with a newer version
- upgrade itself without having to restart itself
- be completely independent — besides a vanilla Python
The pattern I have used is as follows:
- The bootstrap script is an empty shell that is just responsible to update a second script that contains the real bootstrapping code, then dynamically import it via an __import__, and run it.
- The server that publishes the real bootstrap script returns an ETag header with it, and supports the If-None-Match header.
- The bootstrap script is able to ask the server if the online version is newer, by using the ETag value it has from the previous update in the If-None-Match header. If the online version is unchanged, the client receives a 412 code (Precondition failed).
The code has no dependencies besides Python.
Extract from the client-side code:
import os
import sys
import urllib2
def main():
# getting the file age
if os.path.exists('._build.etag'):
with open('._build.etag') as f:
current_etag = f.read().strip()
headers = {'If-None-Match': current_etag}
else:
headers = {}
current_etag = None
request = urllib2.Request('http://moz.ziade.org/_build.py',
headers=headers)
# checking the last version on our server
try:
url = urllib2.urlopen(request, timeout=5)
etag = url.headers.get('ETag')
except urllib2.HTTPError, e:
if e.getcode() != 412:
raise
# we're up-to-date (precondition failed)
etag = current_etag
except urllib2.URLError:
# timeout error
etag = None
if etag is not None and current_etag != etag:
# we need to update our version
_rename('_build.py')
content = url.read()
with open('_build.py', 'w') as f:
f.write(content)
with open('._build.etag', 'w') as f:
f.write(etag)
# we're good, let's import the file and run it
mod = __import__('_build')
project_name = sys.argv[1]
deps = [dep.strip() for dep in sys.argv[2].split(',')]
mod.main(project_name, deps)
if __name__ == '__main__':
main()
Extract from the server-side code:
import os
import mimetypes
from hashlib import md5
from wsgiref.simple_server import make_server
# where the served files are located ?
_DIR = '.'
def application(environ, start_response):
if_none_match = environ.get('HTTP_IF_NONE_MATCH', '')
path = environ['PATH_INFO']
path = path.lstrip('/')
if path.startswith('.') or '/' in path:
# looks like a bad path -- or a potential security thread
status = '400 Bad Request'
headers = [('Content-type', 'text/plain')]
start_response(status, headers)
return "Invalid Request"
path = os.path.join(_DIR, path)
if not os.path.exists(path):
# unknown file
status = '404 Not Found'
headers = [('Content-type', 'text/plain')]
start_response(status, headers)
return "File not found"
# we're good, let's look at the file age
etag = '"%s"' % md5(str(os.stat(path).st_mtime)).hexdigest()
if etag == if_none_match:
# same file
status = '412 Precondition failed'
headers = [('Content-type', 'text/plain')]
start_response(status, headers)
return ''
content_type = mimetypes.guess_type(path)
status = '200 OK'
headers = [('Content-type', content_type[0]),
('ETag', etag)]
start_response(status, headers)
with open(path) as f:
content = f.read()
return content
if __name__ == '__main__':
httpd = make_server('', 5000, application)
print "Listening on port 5000...."
httpd.serve_forever()
The server is a plain Python wsgi script that is easy to hook in Apache or Nginx, to serve a directory of files. It should be safe enough as long as the served directory is isolated from the rest of the system. I did not try that hard to secure it, but it makes sure you can’t leave the directory.
EDIT: as noted in the comments, Nginx and Apache provides such ETag features, so you can just configure it to serve static files this way. I did a wsgi script as an initial prototype and to demonstrate what the server does, and I kept it because it’s easy to run or hook into any web server. And I will probably add more features next there.
Now the bootstrap script we use can live in its own repository, and get bug fixes and features, that will be automatically propagated to all our applications.
Do you solve this issue differently ? I am curious to get some feedback on this.
[...] This post was mentioned on Twitter by Planet Mozilla, Planet Plone. Planet Plone said: A simple self-upgrade build pattern: http://bit.ly/g3ayME [...]
For the task of publishing static files, I would use a lightweight pyramid (or bfg) app, as it can just serve out a static tree with a single library call or zcml directive, and it takes care of the changes (although not sure if it satisfies along the same requirements that you implement on the server side of your app). – but I don’t need that for my workflow as you see next.
For generally updating my server software, as I use bazaar for version management, I can simply cd to the required versionmy project tree on my workstation, and simply say ‘bzr push-and-update bzr+ssh://…..’ from my workstation, or wherever my repository is, and my server files are updated through ssh.
For running the bootstrapping (which is usually just bootstrap and buildout in my case), I log in to the server with ssh, and execute manually there. If I needed to automate that I would probably use a remote ssh command.
I probably use a different workflow than you, but there is really nothing that I would feel missing in the process I follow.
That said, there are many flavors and many solutions to fry the same fish.
Interesting. I guess the self-upgrade feature of zc.buildout is similar to what I am achieving here.
Isn’t downloading (updated) code and then executing it a huge security issue? I would rather this only downloads from https:// sites and actually check the certificate before doing anything. Otherwise, a simple man-in-the-middle attack will allow remote code execution…
True, there’s this security concern. You can solve it by using https, or better: by signing the scripts you publish with pgp.
The client-side can then verify it was signed by a trusted party.
If you’re interested by this simple tool and want to provide this feature, I am highly interested !
The code is at http://bitbucket.org/tarek/mopytools
TLDR
But have you seen http://pypi.python.org/pypi/buildout.bootstrap/1.1 ?
A hehe nice
but that updates the script itself, and IIRC, zc.buildout wil still try to upgrade and restart itself.
Does it support being used behing a proxy ?
Yes, it does already support it in fact, through environment variables.
See http://docs.python.org/library/urllib2.html#urllib2.ProxyHandler
You server is sending 317 Precondition Failed, but the client is checking for 412:
if e.getcode() != 412:
Also, I’m curious: if you have Apache or Ngnix, why do you need a Python WSGI app to serve a single static file? Don’t Apache/Ngnix support etags already?
317: sorry, typo. it’s 412.
About the WSGI app: I wanted to quickly prototype this feature and run it in any environment, without having to activate new mods etc.
Now since it’s a basic wsgi app, I can just hook it in Apache or nginx. But yeah, the server-side can be done in any decent server that supports ETags.
[...] A simple self-upgrade build pattern [...]