Monday, 17 February 2014

Packaging Python With My Application

To keep a long story short, I needed to package up the Python run time environment so that I could ship it along with a Python based application I had written. In this case the target platform was Windows, though the solution will also work for Linux or any other platform (most Linux distributions already have Python installed). I needed to ship Python itself with my application to guarantee that the application had everything it needed to run, and to avoid the complications of requiring the customer to download and install Python themselves, such as version compatibility issues.

After reading a number of blog posts by other people about different "packaging" techniques (see References later), I came up with the following solution that works. This is not the only method of packaging a Python application, and indeed it is quite surprising how many different techniques there are. But this worked for me, and did what I wanted, i.e. it includes Python itself with my Python based application. One of the neat benefits for me is that the whole Python run time I need is only 7.5 MB in size, and the main ZIP file of the run time environment is only 2.5 MB, which shows how compressible it all is.

Packaging Python with my application

First I create a directory (folder for Windows people) for my application, and put all my application's Python files in there.

Then I create a sub-directory in this to put Python itself into e.g. Python33_Win.

Into this I put the following files:
  • _socket.pyd
  • cx_Oracle.pyd
  • msvcr100.dll
  • pyexpat.pyd
  • python.exe
  • python33.dll
  • python33.zip
  • LICENSE.txt

Note that "cx_Oracle.pyd" is needed only because my application connects to an Oracle database to do its work. "msvcr100.dll" is the Microsoft Visual C++ run time library, which is needed by programs built with Microsoft's C compiler, as the Python interpreter is. Microsoft allows this DLL to be copied for the purpose of running such C based programs.
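
Before shipping, it can be worth checking that the Python sub-directory really contains every file in the list above. The following is only a sketch: the function name "missing_runtime_files" is made up for illustration, and the file list is simply the one given above, so it would need adjusting for any other application.

```python
from pathlib import Path

# File names copied from the list above. cx_Oracle.pyd is only needed
# by applications that connect to Oracle.
REQUIRED_FILES = [
    "_socket.pyd", "cx_Oracle.pyd", "msvcr100.dll", "pyexpat.pyd",
    "python.exe", "python33.dll", "python33.zip", "LICENSE.txt",
]


def missing_runtime_files(runtime_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in runtime_dir."""
    runtime_dir = Path(runtime_dir)
    return [name for name in required if not (runtime_dir / name).is_file()]
```

Calling missing_runtime_files("MyAppName/Python33_Win") returns an empty list when nothing is missing.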

The "python33.zip" is something I created, and into which I put the rest of the necessary Python run time files. There are quite a lot of these, all taken from the directory where you installed Python on your own system:
  • All the ".py" files in the top level Python folder
  • The following folders including any sub-folders:-
    • collections
    • concurrent
    • ctypes
    • curses
    • dbm
    • distutils
    • email
    • encodings
    • html
    • http
    • importlib
    • logging
    • pydoc_data
    • site-packages
    • unittest
    • urllib
    • venv
    • wsgiref
    • xml
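
Building the ZIP file by hand is tedious, so the steps above can be scripted. The following is a minimal sketch, not the method actually used for this post: the function name "build_runtime_zip" is hypothetical, and it assumes the ".py" files and the folders listed above all sit under one library folder of the Python installation, which you pass in as "lib_dir".

```python
import zipfile
from pathlib import Path

# Folder names copied from the list above; trim or extend for your application.
PACKAGES = [
    "collections", "concurrent", "ctypes", "curses", "dbm", "distutils",
    "email", "encodings", "html", "http", "importlib", "logging",
    "pydoc_data", "site-packages", "unittest", "urllib", "venv",
    "wsgiref", "xml",
]


def build_runtime_zip(lib_dir, zip_path, packages=PACKAGES):
    """Pack the top-level .py files and the chosen folders into one ZIP."""
    lib_dir = Path(lib_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Top-level modules go at the root of the archive.
        for py_file in sorted(lib_dir.glob("*.py")):
            zf.write(py_file, py_file.name)
        # Folders keep their relative paths inside the archive.
        for name in packages:
            folder = lib_dir / name
            if not folder.is_dir():
                continue  # this installation does not have the folder
            for path in sorted(folder.rglob("*")):
                # Cached bytecode folders are left out to keep the ZIP small.
                if path.is_file() and "__pycache__" not in path.parts:
                    zf.write(path, path.relative_to(lib_dir).as_posix())
    return zip_path
```

The result should still be tested by running the application against it, since the set of folders needed differs from one application to the next.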
Then I wrote a wrapper script to run my application via the included Python run time. In this case it is a Windows batch script, and it lives in the folder above my application source code. My application takes two command line arguments, e.g. a user name and a password, which the script passes through as %1 and %2.
MyAppName\Python33_Win\python MyAppName\myappname.py %1 %2
That's it, and it works.

How does it work?

Built into the Python interpreter, i.e. into "python.exe", is the functionality to dynamically load the libraries it needs at run time. Examples of these are the "*.pyd" files explicitly included in the Python directory. It can also open a ZIP file and look inside that for the modules it needs. Thus we can take most of the Python run time environment files and put them into the "python33.zip" file, and Python will look in there to find the files it needs. This makes packaging up Python pretty simple.

The exceptions to this are "python33.dll", the Microsoft C DLL, and a few PYD files (which are Python dynamic libraries, a bit like Windows DLLs). These will not be found in the ZIP file: the DLLs seem to be needed before Python gets around to opening such a ZIP file, and Python's ZIP import does not load compiled extension modules such as PYD files.
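
The ZIP lookup described above can be tried out on any Python 3 installation. The sketch below is only an illustration: the names "import_from_zip", "bundle.zip" and "zip_demo_module" are made up. It writes a one-line module into a fresh ZIP file, puts that ZIP file on the module search path, and imports the module from it.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def import_from_zip(module_name, source):
    """Write source into a new ZIP, add the ZIP to sys.path and import it."""
    zip_path = Path(tempfile.mkdtemp()) / "bundle.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(module_name + ".py", source)
    # A ZIP file on sys.path is searched much like a directory.
    sys.path.insert(0, str(zip_path))
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


demo = import_from_zip("zip_demo_module", "GREETING = 'hello from inside a ZIP'\n")
print(demo.GREETING)
print(demo.__file__)  # the path shown points inside bundle.zip
```

This works for pure Python modules only, which is why the PYD files have to stay outside the ZIP file.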

Further Notes

  • These files and directories are what I needed to make my application work. The particular set needed can be different for each Python based application, depending on what modules you import into your program at run time.
  • I tried using the "modulefinder" module to report which modules my application actually imports. This helped reduce the total number of files I needed to include in the ZIP file.
  • The ZIP file is named "python33.zip" because I am using Python version 3.3, whose DLL is named "python33.dll", i.e. Python looks for a ZIP file whose name matches its version.
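
As a rough illustration of the "modulefinder" note above, the standard library's ModuleFinder class can list the modules a script imports. This is a sketch only: the function name "find_imported_modules" is hypothetical. ModuleFinder analyses the script without running it, so imports made dynamically (for example by building a module name in a string) will not show up and must be added by hand.

```python
import os
import sys
from modulefinder import ModuleFinder


def find_imported_modules(script_path):
    """Map each module the script imports to the file it was found in."""
    script_path = os.path.abspath(script_path)
    # Search the script's own folder first, as Python does when running it.
    search_path = [os.path.dirname(script_path)] + sys.path
    finder = ModuleFinder(path=search_path)
    finder.run_script(script_path)
    # Built-in modules have no file, so their value is None.
    return {name: module.__file__ for name, module in finder.modules.items()}
```

Printing the sorted keys of find_imported_modules("myappname.py") gives a starting list of what has to go into the ZIP file.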

References

The main articles I found that helped me and pointed out this way of packaging up Python were:
  • Distributing a Python Embedding Program which states that everything can be put into a ZIP file, except for some special files, and gives an example where those files are listed.
  • How to Distribute Commercial Python Applications which describes the different packaging options available for Python based applications, and concludes that "For complex scenarios, my recommendation is to go with bundling CPython yourself and steer clear of the freezers".
    • It does not however tell you how to "bundle CPython yourself" - it just states that it is possible to do it.