Django ORM group by

Filtering in Django over foreign keys or many to many relations's values has, as a buy product, the possibility of creating duplicates of the object you are actually filtering from. A simple way to avoid this is to group by on the primary key of the model. Django, however, does not have group by in the ORM. There is however a way to make Django add it. By slightly abusing annotation it is possible to add just such a claus.

from django.db.models import Count obj1.objects.filter(obj2__value=1).annotate(Count('pk'))

This will add the group by on the obj1 primary key and avoid duplication on the result set.

Threading local and Django

It is rare for me to find something in Python that does not work as I expect it to. Generally speaking, the way I think seems to match the way Python does things. Thats a great advantage to have when working with the language. However when Python does not act as expected it is hard for to find out what is actually going on, because I have to try and think in ways alien to me. Today I met just such a issue within threading.

I am currently working on a project which involves a lot of aggregation of data. Recently the decision was make the application international. Part of the process was creating international databases and split the aggregated data across these databases. Most examples of using multiple databases use either the app, or the specific model to determine which database to use. My use case was more difficult as it involved selecting the database based on the actual data. If the data source was, for instance, from Spain, the spanish database should be used. The data was streaming in all mangled up so it was not possible to use the source as a way to distinguish the data. Furthermore, the data was not coming through a request (where middleware can be used) but synced in from a remote database via a cron job that the proceeds to aggregate the data. To set up the database routing I created a router class. The problem is, there is no way to actually pass data to the routing class. At the moment the hints parameter contains only the instance, if it exists. So for a newly created model object it's empty. My idea was to try and use the threading.local() to communicate between the aggregation function and the db routing class.

Why I'm not using using

First however let me explain why I have not chose to use using. The problem is that when aggregating, the application creates a few different objects, depending on the data. These model objects are created through proxy analyser classes. Using the manual method would not only involve a lot of code, it will also make debugging difficult.

What didn't work

My first instinct was to thing that in each file I needed To access the data I just needed to add the following lines:

import threading
local_storage = threading.local()

And that local storage would be consistent across the whole thread. Unfortunately, it either wasn't, or the code was running on two separate threads, which I don't think it did. This might be a good time for a disclaimer. I am by no means a threading expert. What I say may be really off base. All I know is what I observed and what did, and did not work for me. I added this code to both the db routing class and the aggregator. The aggregator would add an attribute to local_storage and the db routing would check for it to determine the routing. This attempt failed. After some debugging I found that the object created in the aggregator and the one created at the db router were, in fact, not the same. I figured if I define it one of the two, and import it for the other then surely this would be the same object. It was not. I think it probably has something to do with how import actually work, but I'm not sure. I was starting to get frustrated. Googling around didn't really turn up anything significant. I was all but ready to give up on threading.local when I found the django-tools Threadlocal middleware on github. It was using the mechanism I had in mind to make the request available everywhere. I was quite sure that this code worked (because everything you read on the internet is true, right?). So what was I doing wrong?

What did work

The difference seemed to be in what was actually imported. The middleware was defining the local storage and to access the local storage the middleware module was imported and the local storage is then accessed via functions in the module. I did not really see the difference but figured it was worth a shot. I added a new module that defined the local storage and added getters and setters. To my surprise it actually worked. I have no idea why this method worked while the other failed. I am guessing, as I said, that it has something to do with how threading and import work, and what is passed by reference and what is passed by value. One day I will have to dig deeper into this but for now this will do

Conclusion

The threading.local() object offers a thread safe manner to pass data between different parts of the django application when normal parameter passing is not possible. For it to work properly you need to create a proxy module with a getter and setter (and a deleter) and then import that module to each module that needs access.

Including your django site in a script

Out of the box django comes with a command line tool that is pretty useful. Just reading the getting started will introduce you to it. One of the options it has is top open an interactive python shell that will allow you to interact with your django app1. If a certain task is done regularly, there is also the possibility of writing your own management commands. Its actually quite easy to do and can be incredibly useful, as it can also be combined with Fabric to automate a lot of work.

That said there are those rare occasions when the need arises to run an independent python script that uses some part of the django code. since django is just python code in a few simple steps you can be hacking away at your custom script. What is needed is to import the settings and the right directories to the python path. Assuming your app is called my_app and that all the django apps are in a folder called apps the following code should do the trick:

import os
import sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), 'apps')))
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'

And done. Its now possible to just import apps and other django packages as in a normal django app. What did this do? The first two lines are cleared. The third line adds the parent folder to the python path. this will allow the use of my_app as a package. The fourth line adds the apps folder to the python path. This will allow inclusion with just app names. finally, in line 5 the django settings is set. Of course if your settings file is else where, this needs to change to where it is.

Debug http request in Python

Creating an http request in python is quite ismple. If anything, there are to many options to get do it. My prefered way is to use the wonderful requests library. However, this library can be a bit slow for some used cases. I was facing just such a used case today.

Going with urllib2 I didn't expect any trouble. Unofrtunatley, taht was not the case. I was getting 401 errors and the response object was not set. It seemed like I was stuck with my debugging. I almost gave in and instaled requests when I came across this stackoverflow question. Seems that the error is what I needed to inspect (actually read). Calling e.read() was all that was needed to show me what was actually going wrong with the requests.

python MySQLdb over ssh

Note: These instructions have been tested on OS X 10.8. If your using a different OS, this might not work for you.

MySQLdb is good way to make a connection to a MySQL database from your python script. However what to do when you need to first ssh to some server first? The solution seems to be to open an SSH Tunnel.

SSH Tunnel

Setting up a tunnel is fairly easy:

ssh user@host -L localport:mysql-server:mysql-server-port

the -L declares local machine. What this does is forward anything on the local machine sent to localport to mysql-server on mysql-server-port via host. If the remote server and the database server are on the same machine you can just use localhost for mysql-server If our host is called remote.foo,the database server is called dbs.bar and uses the default port, and we want to use port 9870 to tunnel our command will look like this:

ssh user@remote.foo -L 9870:dbs.bar:3306

Running this command in the terminal will setup the tunnel and open an ssh shell on remote.foo as a side effect. Keep that terminal tab open. If you close it or exit the remote shell, the tunnelling will be terminated. Open a new tab for your python script.

Python connection

Assuming the same settings are used as used above, the python connect call will look like this:

MySQLdb.connect(host='127.0.0.1', port=9870, user='user', passwd='password')

The connection is opening to your local machine op port 9870. The SSH Tunnel forwards all communication on that port, to dbs.bar:3306 via remote.foo.

Bunnies, now in text

One of the small irritations faced by developers is source code commit messages. So much so, that there is even a website dedicated to weird, stupid and/or pointless commit messages. Inspired by one of the messages there. One drawing an ASCII bunny, I decided to make a site dedicated to commit bunnies. And so bunnyapprovesthesechanges.com was born. It was a simple php script that created a random bunny with a message. I was planning on recoding it in python and making it more usable, but I never got around to it. I used it here and there but I figures I was pretty much the only one.

Two days ago, I got an email from a person working for an american company. He thanked me for making the site, said that they use it frequently in his company and was wondering if I was planning to add the ability to get plain text messages instead of html. I was pleasantly surprised. Someone out there was actually using this. It felt awesome. And so last night I sat down and rewrote everything in python.

Since this is a very simple app even flask felt like overkill, so I went with werkzeug. And as mentioned I got around to add plain text form. Since I use python a lot I also added a multiline python string option. The site now works as follows:

  • Just visitng the site will give html
  • adding a /txt will return a plain text version
  • adding a /py will return a multiline python string

Happy coding

Python MySQL and Unicode

One of the biggest changes between python 2 and python 3 is unicode support. While python 2 default string is ascii, python 3 is all unicode[citation needed]. For different reasons I have not yet made the move to python 3 which means that every now and again I get into some unicode issues.

Recently I needed to create some special exports from a utf-8 database1. At first I completely ignored the fact it was unicode and expected everything to go just fine. Which of course it didn't.

My second attempt was to try and convert the strings I was getting from the database to utf-8 encoded unicode. This also did not go well. At first I thought I was doing it wrong, or incomplete. Turn out I wasn't. The problem was I was looking at the wrong place.

When using MySQLdb to connect to the MySQL server, you use the connect function. This function will connect not using the default encoding of the database neither in the default encoding of the script. It will connect using Latin-1. That is, unless you give it the key argument use_unicode=True So for instance to connect to a local mysql server as root the function call would look something like:

con = MySQLdb.connect('127.0.0.1', 'root', use_unicode=True)

And that pretty much solved all my unicode problems. All the data form the database was now coming as utf-8 encoded unicode.


  1. unicode and utf-8 are not one and the same. Never forget that. 

Going Static

All the cool ids are doing it! For a good reason. I decided a while a go to turn off comments in my blog(A whole other discussion). Once comments are out there is no real reason to use a database for the blog since the posts pages are pretty static. Also, I am trying to minimize the amount of php I use. Moving away from wordpress was high on that list. So I looked around for a static generator.

The options

The first name I came across is Jekyll. Developed by github, it is a very popular engine. I, however decided to skip it as it seems rather strict in how it does things, and what it can do, and because it was in ruby. I have nothing against ruby, but I prefer not to learn a new programming language at the moment. The next candidate was hyde which is a static website generator written in python, originally developed as a port of Jekyll, it is quite different. Hyde lookd more flexible and it was written in a language I know and love. So I picked it. For a really good, though a bit old, comparison of the two check out Philip Mat's blog and Distractable.

Going Hyde

So Hyde it was, on with the show. Documentation was somewhat missing, but at this point it was not really a problem. This being python, getting started is a breeze.

Setup

Installing hyde is as easy as

pip install hyde

I highly recommend you use a virtualenv.

Adding preprocessing

Hyde uses preprocesses to work with your content. This is a very powerful features and one of the reason I wanted to work with hyde. Since you can write your own you can pretty much do anything with your content.

Extensions

Hyde is by default in your (virtualenv) site-packages. To add an extension, add the extension file to the lib/python2.7/site-packages/hyde/ext/plugins/ folder.

Markdown

Hyde by supports markdown out of the box. However it does not support changing the extension. that is, if your file's extension is .md, Hyde will deploy it as a .hd file, even though the content has been alterd to html. The suffix extension converts file suffixes. It is configurable through the site.yml

Content

Hyde takes your content and generates static html from it. In this section I will try, to the best of my knowledge to explain how the generation of content works.

Location

Content goes into the content folder. Generally speaking hyde will, unless otherwise instructed, render each file in the content folder as its own html page. Hyde emulates the folder construction in the content folder for the actual website. So, for instance, if your content folder looks like this:

-content
--about.md
--blog
---posts.md
---2012
----myfirstpost.md

Then the generated website will look like this:

-www.example.com
--about.html (www.example.com/about.html)
--blog
---posts.html
---2012
----myfirstpost.html (www.example.com/blog/2012/myfirstpost.html)

This makes it quite easy to determine the paths inside your website

Anatomy of a content file

A content file should start with a YAML Front Matter2. This is used to add some metadata to the post. An example is given here:

---
title: Test post html
description: A test post for hyde
created: !!timestamp '2012-01-01 10:00:00'
tags:
    - ideas
    - thoughts
---

As you can see the YAML Front Matter starts and ends with ---. In between the title, description, created date and tags are given. I am not aware of any other metadata available. This part is not required. I need to test what the defaults are.

After the YAML Front Matter heading, the actual post can be written. Hyde supports markdown3, which is quite easy to use.

Templating

Hyde uses the jinja2 templating engine. It is quite similar to Django's templating engine1.

Choosing a template

As far as I can tell hyde chooses which template file to use following these rules:

  1. If the content file has an extends attribute in the yaml head, use that template
  2. Otherwise, if there is a meta.yaml in the folder with an extends attribute, use that
  3. Move folders up until you find a meta.yaml and if it has an extends use that.
  4. If no template file is found, the file is not rendered in a template and is simply copied over to the deploy folder.

Mark

Hyde uses the mark tag to mark certain pieces that will be used elsewhere. The official list should eventually be available in the markrefer page of github. However at the moment this page contains no data. Here is a, probably incomplete list, of marks

  • image - Used for images
  • excerpt - Used as teasers in blog

Marks are used like jinja macros with the - sign for start and end, i.e.

{% mark image -%}
![Airport]([[!!images/airport.png]])
{%- endmark %}

For an image mark

Creating a list

There are several lists which any blog needs to generate. Some of these lists will be used to create pages just for them. However since the documentation was so lacking I was unable to find a way to do this the way I wanted

Giving up

At was at this point after investing hours, that I gave up on hyde. It looks great, and I am sure that it can do what I want. However lack of documentation makes it really hard to work with. I decided this already took to much of my time and so went to look elsewhere.

Blacksmith

It was time to try something completely different. Blacksmith is a node.js based static engine create by nodejitsu. It all looked very promising, until I saw that each post needs to have its own directory and that it has 1 file for the content and a json file for the header. For me that was just to much.

Pelican

I was close to give up and stick to wordpress when I can across Pelican. Written in python, and with great documentation, it was just what I needed. The docs are, in fact, so good that I really don't need to write any guide or about how I finally made the change. I will tell you this tough, it took me all of 3 hours, including importing the content from wordpress. Pelican is elegant, easy to use, well documented and it renders the markdown files quite fast. All I was looking for.


  1. Jinja2 can in fact be used in Django as well. 

  2. Jekyll terminology. I do not know if hyde has its own term for this. The idea is the same, though the supported syntax is, as of writing, not. 

  3. Markdown was introduced by John Gruber for the exact purpose of simplifying writing for the web. Learn more at daring fireball.net 

Installing PostgreSQL and Sql-Ledger for Django apps on Snow Leopard

At DOP we started working on a django front end for Sql-Ledger with the idea of combining it later on with our ticketing system. Getting it going on my MacBook proved to be more challenging then expected. Most of the problems were easily solvable, but finding the oplossing proved to be tricky. Therefor I have decided to create this simple guide, showing the steps I have taken to get everything going. Step 1: PostgreSQL You can download a binary installer from the PostgreSQL site. It will install Postgre to your /Library folder. This should take care of the Postgre part. Step 2: Sql-Ledger Ledger is written in perl, and requires perl to run. Luckily Snow Leopard comes standard with perl. What it does not come standard with is the perl binding for PostgreSQL. The following 3 commands will take care of that: sudo perl -MCPAN -e "install +YAML" YAML is not necessary but it can be helpful, and lacking YAML might give you problems with DBI. sudo perl -MCPAN -e "install DBI" sudo perl -MCPAN -e "install DBD::Pg" This will install the perl binding for PostreSQL. Download Sql-Ledger from here. As of this writing, the latest version is 2.8.31 Unzip and untar the archive and follow the instructions in the readme file. I did not manage to build using the automated script and had to do everything by hand. There is also this page on the ledger page with a somewhat oldish instructions for OS X. Once Ledger is tested and it is working, its time to get the python binding Step 3: Python binding The pyhton binding is for PostgreSQL. You can get the binding library, called Psycopg2 (don't ask me why) from here. Download and upack it. Installing is done by running: python setup.py build sudo python setup.py install If you are not able to build it might be a problem with your path. Python is trying to link to the PostgreSQL bin files. Try to add it to your path and then to build again. And your done. Enjoy (or not) working with Django-Sql-Ledger