Python for Dervish and Astrotools Users (and other astronomers)

Table of Contents

1 Introduction

This document is not a tutorial; it is a quick tour through interesting features that are likely to be of interest, but will probably be missed by more general tutorials.

Using Python for Interactive Data Analysis by Perry Greenfield is a good general tutorial to using python for looking at astronomical data, including worknig with FITS files and displaying images in ds9, for example.

2 General commentary on Python

A general rule I have notices when working on python: if something seems unnecessarily difficult, there is probably an easier way to do it. See The Zen of Python, particularly line 13.

This quote, from this post on Ryan Tomayko's blog, rings true:

In Java, the tool (IDE) is the brush and the code is the canvas. In Python, the language is the brush and the machine is the canvas – the language and the tool are the same thing. In Java you enhance the IDE, in Python you enhance the language.

This is the primary reason why there is little serious demand for Python IDEs. Many people coming to Python can’t believe no one uses IDEs. The automatic assumption is that Python is for old grey beards who are comfortable with vi and Emacs and refuse to accept breakthroughs in programming productivity like IDEs. Then they write a little Python code and realize that an IDE would just get in their way.

More good advice on writing idiomatic python can be found on Dave Gooder's page on idiomatic python.

3 Using C and Fortran code

If you have a shared library compiled from C or Fortran (or anything else that can compile into a standard Linux or Windows shared library), then you can call it from Python. The only complication is that you need to tell python about the mapping between python types and the C/Fortarn types.

Now, I can use the slalib library from python:

>>> from ctypes import *
>>> from math import pi
>>> slalib = CDLL('/home/neilsen/pythonForDervishUsers/libsla.so')
>>>
>>> airmass = slalib.slaAirmas
>>> airmass.restype = c_double
>>> airmass.argtypes = [c_double]
>>> 
>>> airmass(0.0)
1.0
>>> airmass(30*(3.14/180)) # slalib does everything in radians
1.1541712912318414

Handling pointers, structures, etc. is a little more complicated, but not much. This is what you do with pointers:

>>> e2h = slalib.slaDe2h
>>> e2h.restype = c_void_p
>>> e2h.argtypes = [c_double, c_double, c_double, POINTER(c_double), POINTER(c_double)]
>>>
>>> ha = 0.8
>>> dec = -0.1
>>> lat = -0.6
>>> az = c_double()
>>> el = c_double()
>>> d=e2h(ha, dec, lat, az, el)
>>> print az, el
c_double(5.1209753260967794) c_double(0.67964167315433521)

You can also use structures, unions, and other C data types. For example, if you have a C program that looks like this:

typedef struct {
   double    x;
   double    y;
} point;

double foo(point p) {r
  return(p.x/p.y);
}

which you compile like this:

$ gcc -shared -fPIC -o ver.so ver.c

you can use it like this:

>>> foolib = CDLL('/home/neilsen/pythonForDervishUsers/foo.so')
>>> foo = foolib.foo
>>> foo.restype = c_double
>>> class Point(Structure):
...     _fields_=[('x',c_double),('y',c_double)]
...
>>> foo.argtypes = [Point]
>>> p = Point(10,20)
>>> print foo(p)
0.5
>>> p.x = 50
>>> print foo(p)
2.5

There are a couple of tools, ctypeslib and ctypesgen, each of which is supposed to automatically generate all the structure definitions and call signatures by reading the C header files, but I haven't tried either, neither are part of the official python language.

See this blog post for more info on working with Fortran libraries. In particular, the arguments are pointers rather than values, and the case of characters in function names can get changed in the library.

See this page for pointers on getting libraries to work well with numpy. It is particularly helpful for getting going between numpy arrays and ctype arrays.

4 numpy

The numpy package adds a new class for handling arrays which support operations similar to those found in IDL.

>>> from numpy import zeros
>>>
>>> x = zeros(5)
>>> print x
[ 0.  0.  0.  0.  0.]
>>> x[1:3] = 5, 4
>>> print x
[ 0.  5.  4.  0.  0.]
>>>
>>> y = zeros(5)
>>> print y
[ 0.  0.  0.  0.  0.]
>>> y[0:3] = 2, 3, 5
>>> print y
[ 2.  3.  5.  0.  0.]
>>> print x + y
[ 2.  8.  9.  0.  0.]
>>> print x - y
[-2.  2. -1.  0.  0.]

numpy also supplies a record type, suitable for storing a rows of a FITS binary table.

>>> from numpy import dtype, array
>>> pointing = dtype({'names': ['objname','ra','dec'], 'formats': ['S40', 'd', 'd']})
>>> stars = array([('Arcturus', 213.9, 19.2),('Sirius', 101.3, -16.7)],dtype=pointing)
>>> print stars[0]
('Arcturus', 213.90000000000001, 19.199999999999999)
>>> print stars['ra']
[ 213.9  101.3]
>>>
>>> from pyfits import *
>>> fhdu = new_table(stars)
>>> fhdu.header.update('CREATOR','Neilsen')
>>> fhdu.writeto('foo.fits')

5 Using SDSS data

There is a neat python package called sql2cl, available from the SDSS CAS website.

from sqlcl import query as sdss_query
from numpy import recfromcsv, array

data_handle = sdss_query("SELECT g-r AS color, r FROM Star S, dbo.fGetNearbyObjEq(205.5422,28.3816,10) N WHERE S.objID = N.objID AND err_g<0.02 AND err_r < 0.02")
d = recfromcsv(data_handle, names=True)

6 Plotting with pygplot

There are a great many plotting packages with python wrappers. Perry Greenfield's tutorial (mentioned the 1) uses matplotlib, which was designed to be familiar to users coming from matlab.

Users coming from Dervish and astrotools may prefer pygplot, a wrapper around pgplot, the same package wrapped for Dervish.

Here is how you would plot the color-magnitude diagram from 5:

from pygplot import Plot
p = Plot(title="Color-Magnitude Diagram", xlabel="g-r", ylabel="r magnitude")
p.point(d['color'],d['r'])
p.plot()

The various operations related to plots mostly map directly from Dervish to pygplot, because they both wrap the same underlying library.

7 doctests

The usual mechanism for incorporating documentation into python code looks like this:

#!/usr/bin/env python
"""Here is a short description of the module

A longer description goes here.

Now, here is an example
>>> a = [1, 2, 3]
>>> print a + [5, 10]
[1, 2, 3, 5, 10]

Here is another example
>>> from math import pi
>>> print pi
3.14159265359
>>> print pi/2
1.57079632679
"""

def add_ten(x):
    """Here is a short descripion of add_ten

    Here is a longer one.

    For example:
    >>> foo = 103
    >>> print add_ten(foo)
    113
    """
    return x+11

Now, you can test it like this:

$ python -m doctest example.py
**********************************************************************
File "examples.py", line 26, in examples.add_ten
Failed example:
    print add_ten(foo)
Expected:
    113
Got:
    114
**********************************************************************
1 items had failures:
   1 of   2 in examples.add_ten
***Test Failed*** 1 failures.
$

After correction, the output looks like this:

$ python -m doctest examples.py
$

In other words, there is no output.

One useful trick for debugging a doctest (or any other python code) is to add this line to your code where you want to start a debugger:

import pdb; pdb.set_trace()

This will start a debugger at that point in the code. See the online python documentation.

8 list comprehension

Python has list comprehensions, which make a large class of loops unnecessary. This:

>>> xlist = [1, 8, 4, 10]
>>> ylist = []
>>> for x in xlist:
...     if x < 5:
...         ylist.append(x^2)
... 
>>> print ylist
[3, 6]
>>>

can be replaced by this:

>>> xlist = [1, 8, 4, 10]
>>> zlist = [x^2 for x in xlist if x < 5]
>>> print zlist
[3, 6]

9 formatting strings

Python formats strings using string templates modeled after those used in C printf statements, but uses the % operator instead:

>>> print 'fpC-%06d-%c%d-%04d.fit' % (4646, 'r', 3, 81)
fpC-004646-r3-0081.fit

Python has a useful extension of this that lets it fill in values from a data structure that can be indexed by a string, for example a dictionary:

>>> frame_data = {'run': 4646, 'camcol': 3, 'rerun': 40, 'frame': 81, 'filter': 'g'}
>>> print 'fpC-%(run)06d-%(filter)c%(camcol)d-%(frame)04d.fit' % frame_data
fpC-004646-g3-0081.fit

or a numpy record:

>>> from sqlcl import query as sdss_query
>>> from numpy import recfromcsv, array
>>> 
>>> data_handle = sdss_query("SELECT run, rerun, camcol, field FROM field AS f JOIN dbo.fGetNearbyFrameEq(205.5422,28.3816,10,10) AS match ON f.fieldID=match.fieldID")
>>> d = recfromcsv(data_handle, names=True)
>>> 
>>> print 'tsField-%(run)06d-%(camcol)d-%(rerun)d-%(field)04d.fit' % d[0]
tsField-004646-3-40-0081.fit

10 functools

The functools packages includes tools for operations on functions and other callable objects. For example, the strftime function from the time module formats the time according to a string template. If you always want to use the same template, creating a new function that does just that is trivial:

>>> from ctypes import *
>>> slalib = CDLL('/home/neilsen/pythonForDervishUsers/libsla.so')
>>> rdplan = slalib.slaRdplan
>>> rdplan.restype = c_void_p
>>> rdplan.argtypes = [c_double, c_int, c_double, c_double, POINTER(c_double), POINTER(c_double), POINTER(c_double)]
>>> 
>>> mjd, planet_id, latitude, longitude = 53436, 3, -0.52648, -1.2359
>>> 
>>> ra1, decl1, diam1 = c_double(), c_double(), c_double()
>>> rdplan(mjd, planet_id, latitude, longitude, ra1, decl1, diam1)
>>> print ra1, decl1, diam1
c_double(5.2980920325026899) c_double(-0.42149579343231441) c_double(0.0095628006940188508)
>>> 
>>> from functools import partial
>>> rdmars = partial(rdplan, mjd, planet_id, latitude, longitude)
>>> ra2, decl2, diam2 = c_double(), c_double(), c_double()
>>> rdmars(ra2, decl2, diam2)
>>> print ra2, decl2, diam2
c_double(5.2980920325026899) c_double(-0.42149579343231441) c_double(0.0095628006940188508)
>>>

11 Using R from python: simple linear regression example

The rpy2 python package lets you use R functions on numpy arrays. For example, to use Rs linear least squares fit routine:

from numpy import array, rec
from numpy.random import normal as nprandom
from rpy2.robjects import numpy2ri, r

foo = array(range(10))
bar = foo + nprandom(0,1,10)

d = rec.fromarrays([foo, bar], names=('foo','bar'))
fit = r.lm('bar ~ foo', data=d)
print fit.rx2('coefficients')

When using R and numpy, be sure to include the line:

from rpy2.robjects import numpy2ri, r

Importing numpy2ri triggers automatic converson of numpy arrays to corresponding R structures when you use them as arguments to R functions. You want this.

12 Plotting with R

My favorite way of generating plots in python is to the wrapper around the R plotting tools. There are actually three different plotting engines of interest in R: the base plotting package, the lattice plotting package, and ggplot2.

Here is how you would plot the color-magnitude diagram from 5 using R's "base" plotting package:

from rpy2.robjects import r, numpy2ri
p = r.plot(d['color'],d['r'],ylab='r magnitude',xlab='g-r')

The ggplot2 package from R provides very powerful plotting tools, and generates attractive output, but is a bit cumbersome for simple plots and conceptually weird:

from rpy2.robjects import r, numpy2ri
import rpy2.robjects.lib.ggplot2 as ggplot2

g = ggplot2
p = g.ggplot(d) \
    + g.aes_string(x='color',y='r') \
    + g.geom_point() \
    + g.xlim(0,2) \
    + g.ylim(22,15)
p.plot()

See rpy2 page on using ggplot2 from Python for many examples.

The R lattice plotting package may also be of interest: it produces nicer plots than the base backage without being quite as weird as ggplot2. See the rpy2 page on using lattice from Python for examples.

13 Getters and setters

Because you can trigger the execution of object methods by accessing or setting "members", the standard java practice of accessing all members through getters and setters useless in python. In fact, there are times when it is better to replace methods with (apparent) members.

Consider this example:

"""Silly example of property

>>> a = Angle(45)
>>> print a.radians/pi
0.25
>>> a.radians = pi/2.0
>>> print a.degrees
90.0
>>> a.degrees = 36.0
>>> print a.radians/pi
0.2

"""
from math import pi

class Angle(object):
    def __init__(self, a):
        self.degrees = a

    def set_radians(self, radians):
        self.degrees = radians * 180.0/pi

    def get_radians(self):
        return self.degrees * pi/180.0

    radians = property(get_radians, set_radians)

See Ryan Tomayko's blog for a longer discussion. For other more discussion of mistakes Java developers make when writing python, see PJ Eby's Python is not Java.

14 Converting non-shared libraries to shared libraries for CDDL

Unfortunately, most of the libraries on the SDSS cluster are static rather than shared, so I had to convert them, for example:

bash$ setup slalib
bash$ mkdir /scratch/neilsen/slalib
bash$ cd /scratch/neilsen/slalib
bash$ ar -x $SLALIB_DIR/lib/libsla.a
bash$ gcc -shared -fPIC -o libsla.so *.o
bash$ ls *.so
libsla.so
bash$ ls /scratch/neilsen/slalib/libsla.so
/scratch/neilsen/slalib/libsla.so

15 Reloading python modules

15.1 The problem

One source of confusion and irritation for Dervish users moving to python is that the python import statement does not act quite like the tcl source command, in that repeating the import command does not update the interpreter with changes in the module.

For example, if I have a python module in example.py:

def testprint():
    print "This is test 1."

I can load it and run it thus:

>>> import example
>>> example.testprint()
This is test 1.

If I now change example.py to

def testprint():
    print "This is test 2."

and repeat the above in the same interpreter, it still runs the old code:

>>> import example
>>> example.testprint()
This is test 1.

15.2 Basic use of reload

This isn't very nice, but it is easy to overcome: the reload command updates an existing module:

>>> reload(example)
<module 'example' from 'example.py'>
>>> example.testprint()
This is test 2.

15.3 Reloading when the original import was of the form "from x import y"

The relead statement does not take a string, but a module object. If you originally imported a function using the from x import y form:

>>> from example import testprint
>>> testprint()
This is test 1.

then the module "example" isn't directly available:

>>> relead(example)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'relead' is not defined

and you have to jump through some hoops to get it:

>>> import sys
>>> reload(sys.modules['example'])
<module 'example' from 'example.py'>
>>> testprint()
This is test 2.

15.4 Releads and objects in memory

When you reload a module, it changes python's definitions of those modules, but it does not change python objects already in memory. For example, if I have a module that looks like this:

class Example(object):
    def testprint(self):
        print "This is sample class version 1"

and I import it and create an instance of the class:

>>> import Example
>>> first = Example.Example()
>>> first.testprint()
This is an object of class version 1

and then change and reload the Example module, then any new object I create will use the new code, but the existing object does not change:

>>> reload(Example)
<module 'Example' from 'Example.py'>
>>> first.testprint()
This is an object of class version 1
>>> second = Example.Example()
>>> second.testprint()
This is an object of class version 2

Note that there are other ways in which references to old version of the code can stick around when you might not expect it. For example, if I have a sample module sample.py:

def samp_ver():
    return 1

load it, and record a reference to it:

>>> import sample
>>> first = sample.samp_ver
>>> first()
1

and then change an reload it, the reference still points to the original version:

>>> reload(sample)
<module 'sample' from 'sample.py'>
>>> sample.samp_ver()
2
>>> first()
1

If I go about the problem slightly differently, I can get the result I probably want:

>>> import sample
>>> def first():
...     return sample.samp_ver()
... 
>>> first()
1
>>> # Change code here
... 
>>> reload(sample)
<module 'sample' from 'sample.py'>
>>> first()
2

This is because, in the first example, first references the function that was defined at the time, while in the second example, first refereces a bit of code. The sample.samp_ver() in that bit of code isn't dereferenced until it is actually run, and if it has been redefined at the time it is run, it will see the redefined version.

When approached the second way, updated behavior also propogates to objects in memory:

>>> import sample
>>> class Foo(object):
...     def print_version(self):
...         print sample.samp_ver()
... 
>>> first = Foo()
>>> first.print_version()
1
>>> 
>>> reload(sample)
<module 'sample' from 'sample.py'>
>>> first.print_version()
2

15.5 Reloading CDLL shared libraries

C shared libraries accessed through ctypes are harder to deal with. For example, if I start with:

double add_version(double x) {
  return(x + 1);
}

compliled thus:

$ gcc -shared -fPIC -o ver.so ver.c

Start by loading it and testing it:

>>> ver1 = CDLL('/Users/neilsen/Documents/pythonForDervishUsers/ver.so')
>>> first = ver1.add_version
>>> first.restype = c_double
>>> first.argtypes = [c_double]
>>> first(0.0)
1.0

Now, change the library to:

double add_version(double x) {
  return(x + 2);
}

and try to reload it as something else:

>>> ver2 = CDLL('/Users/neilsen/Documents/pythonForDervishUsers/ver.so')
>>> second = ver2.add_version
>>> second.restype = c_double
>>> second.argtypes = [c_double]
>>> second(0.0)
1.0

I still get the old version.

Deleting or updating shared libraries is actually really messy. See this stackoverflow question.

Author: Eric H. Neilsen, Jr.

Created: 2014-06-16 Mon 15:31

Emacs 24.3.1 (Org mode 8.2.5c)

Validate