Tuesday, 17 July 2012

Python for the DBA (1)

I really like the Python programming language for general purpose programming and quickly putting together little utilities. There are similarities to Perl in terms of being an interpreted language with powerful data processing capabilities, but there the direct comparisons end. Python is a more modern language than Perl, and has a relatively clean design (nothing is ever perfect though, and Python itself has evolved over the years). The main reasons I prefer Python over Perl are:
  • Very clean syntax - no need for obscure characters in front of variable names and so on
  • Very readable code - it is generally clear what a piece of code is doing
  • Minimal syntax dressing overhead - Python avoids the need for begin / end block statement markers
  • True support for functions
    • Perl just pushes all the arguments into a single list that you must disassemble
  • Strongly typed - Python supports multiple data types and checks at run time that operations are valid
  • Rich set of types - number, string, list, dictionaries, plus others e.g. set, tuple
  • Supports classic procedural ("function" based) programming - just like C does
    • Very simple to write code to directly do what you want, and modularise common code into functions
  • Also supports full "object oriented programming" - full support for classes as first-class objects
    • Perl does not truly treat classes as first-class objects
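A few of the points above can be shown in a short sketch. The function name describe and all of the sample values are made up for illustration; only built-in types are used.

```python
# Illustrative only: names and values are invented for this sketch.

def describe(name, value, units="rows"):
    """Named parameters with a default; no manual unpacking of an argument list."""
    return "%s = %s %s" % (name, value, units)

# Arguments can be passed by position or by keyword.
summary = describe("emp", 14)
summary_kw = describe(value=4, name="dept", units="records")

# Strong typing: mixing a string and a number is rejected at run time.
try:
    result = "10" + 5
except TypeError:
    result = int("10") + 5   # an explicit conversion is required

# Some of the built-in types.
a_list = [1, 2, 3]                     # ordered, changeable
a_tuple = ("scott", "tiger")           # ordered, fixed
a_dict = {"name": "emp", "rows": 14}   # key -> value lookup
a_set = set([1, 2, 2, 3])              # duplicates removed

print(summary)
print(summary_kw)
print(result)
print(len(a_set))
```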
I'm not trying to say that Python is better than Perl, or any other programming language, in all cases. I'm just saying that Python is a very good and usable programming language, and that I now prefer it over Perl. With Perl I find that code gets very confusing once it reaches any level of complexity, because the syntax is not obvious or consistent, whereas Python avoids this by being such a cleanly designed language. With Python I can easily build a small application by just putting together the necessary source code, using functions for modularity (what I call "just build and run") and avoiding the overhead of defining classes for full object oriented programming. However, if I have a more complex set of requirements then I can build a full blown object based solution using classes within Python (too complex to just build, so some initial design is needed, probably with test cases too).
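As a rough sketch of the two styles side by side, here is a plain function next to a small class that reuses it. The names sum_sizes, Tablespace and build are hypothetical and exist only for this example.

```python
# Illustrative only: the function and class names are invented for this sketch.

# "Just build and run": a plain function is enough for a small utility.
def sum_sizes(sizes_mb):
    return sum(sizes_mb)

# A class for when the requirements grow and state must be kept together.
class Tablespace(object):
    def __init__(self, name):
        self.name = name
        self.sizes_mb = []

    def add_datafile(self, size_mb):
        self.sizes_mb.append(size_mb)

    def total_size(self):
        return sum_sizes(self.sizes_mb)   # reuse the plain function

# Classes are ordinary objects: they can be passed to functions like any value.
def build(cls, name):
    return cls(name)

users = build(Tablespace, "USERS")
users.add_datafile(100)
users.add_datafile(250)
print(users.name, users.total_size())
```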

Python is very useful for a variety of different scenarios:
  • building a small application or utility by just writing the code - no "compile and link" steps as it is interpreted
  • building something iteratively (top down design & development) as your code is always runnable (being interpreted)
  • prototyping object based code quicker and easier than compiled languages such as Java
  • ability to extend Python with your own libraries - the standard interpreter is written in C and can call your own compiled libraries
  • quickly prototyping real Java code that uses standard Java Classes in the JVM via Jython 
    • Jython is a Python interpreter written in Java
    • this also uses far fewer lines of source code than would be needed in Java, because of Python's fundamentally different design
Again, some of these capabilities are not unique to Python, but the combination of them all makes it a winner for me.

Python Overview

Python should be straightforward enough to understand on reading it. The main difference from most other languages is that Python does not have begin / end statement block markers and instead relies on statement indentation to determine which statements are part of the same block. Statements that contain an embedded block have a colon (':') at the end of their first line, and the lines of the block that follow are indented. It takes some getting used to initially, but it results in less typing on your part and a consistent code layout.
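A small sketch of how the colon and the indentation mark out blocks. The function name classify and the sample data are made up for illustration.

```python
# Illustrative only: the function name and data are invented for this sketch.

def classify(sizes_mb):
    large = []
    small = []
    for size in sizes_mb:          # colon opens the loop body
        if size >= 100:            # colon opens the "if" block
            large.append(size)     # indented further: belongs to the "if"
        else:
            small.append(size)
    return large, small            # back at the function's indent: the loop has ended

large, small = classify([50, 200, 75, 1000])
print(large)
print(small)
```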

Although Python supports what I call direct coding (write some statements and then just run them), it is really fully object oriented behind the scenes. Thus you see many standard functions actually returning objects, against which you invoke one of their methods (see Database Example below). The syntax of "object.method (arguments)" occurs frequently, rather than the non-object way of "function (object, other-arguments)".
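The two call styles can be seen with an ordinary string. This is only a sketch using built-in string methods; the sample text is arbitrary.

```python
text = "select * from dual"

# A built-in function takes the object as an argument...
length = len(text)

# ...while most operations are methods invoked on the object itself.
upper = text.upper()
words = text.split()

# Methods return objects, so calls can be chained.
first_word = text.strip().split()[0].upper()

print(length, upper, words, first_word)
```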

Python Development

You can edit and run Python programs from the command line ("python filename.py" - the ".py" extension is just a common convention), or you can use an Integrated Development Environment, such as Eclipse with the PyDev plugin. The nice thing about an IDE is that it checks syntax as you type, so you can easily spot various errors before you run your code. You can also run your Python application from within the development environment, so you don't have to leave the editor each time you want to test something.

Python Database Access

Like Perl, Python defines a standard API for database access. The intention is that developers can write code based on this standard API, and a variety of drivers can be provided for access to different databases. By changing the driver used you can connect to different databases, but the bulk of your code remains the same.
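To make the idea concrete, here is a minimal sketch that uses the sqlite3 driver from the standard library instead of an Oracle driver, purely so that it runs without a database server. The helper fetch_rows and the table contents are invented for this example. In practice SQL dialects and bind parameter styles can still differ between drivers, so not every statement carries over unchanged.

```python
import sqlite3   # standard-library DB-API driver, used so the sketch runs anywhere

def fetch_rows(conn, sql):
    """Run a query on any DB-API connection and return all rows."""
    curs = conn.cursor()
    try:
        curs.execute(sql)
        return curs.fetchall()
    finally:
        curs.close()

# Only this part is driver specific. With another driver the connect call
# would differ, but fetch_rows() would stay as it is.
conn = sqlite3.connect(":memory:")
# conn.execute() is a sqlite3 shortcut, not part of the standard API.
conn.execute("CREATE TABLE atable (name TEXT, value INTEGER)")
conn.execute("INSERT INTO atable VALUES ('alpha', 1)")
conn.execute("INSERT INTO atable VALUES ('beta', 2)")

rows = fetch_rows(conn, "SELECT name, value FROM atable ORDER BY name")
print(rows)
conn.close()
```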

There are a bunch of drivers out there for Oracle, but the most common one seems to be cx_Oracle.

Python Database Examples

Using the database API from Python is relatively straightforward. I won't describe it in detail, as it is obvious when you use it. In summary, you call a "connect" library function with connection details and get back a connection handle (really an object). Using this you can then create a cursor (a statement handle), execute a SQL statement through it, and fetch back any data or other results. One neat thing is that the cursor has a "fetchall" method that fetches back all data rows for a SELECT into a single list (actually a list of rows, each row being a tuple of column values). This then lets you traverse the data using Python logic, with no more fetch calls. I assume that "fetchall" is more efficient in how the data is transferred back over the network, but I have not tested this. It will of course use more memory within your Python program to hold all the results fetched.
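The memory trade-off can be sketched by comparing "fetchall" with "fetchmany", which the standard API also defines for fetching rows in batches. This sketch again assumes the sqlite3 driver and an invented five-row table so that it runs anywhere. It says nothing about network efficiency with Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE atable (name TEXT, value INTEGER)")
curs.executemany("INSERT INTO atable VALUES (?, ?)",
                 [("row%d" % n, n) for n in range(5)])

# fetchall: every row at once, as a list of tuples.
curs.execute("SELECT name, value FROM atable ORDER BY value")
all_rows = curs.fetchall()

# fetchmany: the same rows in batches, so fewer are held in memory at a time.
curs.execute("SELECT name, value FROM atable ORDER BY value")
batch_sizes = []
while True:
    batch = curs.fetchmany(2)
    if not batch:              # an empty list means no rows are left
        break
    batch_sizes.append(len(batch))

print(all_rows)
print(batch_sizes)
conn.close()
```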

Example

Connect to an Oracle database and execute a SELECT from a table, printing out the data fetched.
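from cx_Oracle import connect

# Connect with user name, password and database name
conn = connect("scott", "tiger", "orcl")
# Or: conn = connect('scott/tiger@orcl')
curs = conn.cursor()
curs.execute("SELECT name, value FROM atable")
# Fetch every row of the result into a list
rows = curs.fetchall()
for i in range(len(rows)):
    print("Row", i, "name", rows[i][0], "value", rows[i][1])
# Release the cursor and the connection when finished
curs.close()
conn.close()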
Notes:
  • len is a built-in function that returns the number of entries in a list
  • range is a built-in function that produces a sequence of numbers from 0 to one less than the supplied value
  • List members can be accessed using array-like notation, so rows[i][0] is the first column of row i
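As an alternative to indexing, the same loop can be written with the built-in enumerate function and by unpacking each row into named variables. This is only a sketch, and the rows list is sample data standing in for the result of fetchall.

```python
# Sample data standing in for the result of curs.fetchall().
rows = [("alpha", 1), ("beta", 2)]

lines = []
# enumerate supplies the row number; (name, value) unpacks each row.
for i, (name, value) in enumerate(rows):
    lines.append("Row %d name %s value %s" % (i, name, value))

for line in lines:
    print(line)
```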
