It is rare for me to find something in Python that does not work as I expect it to. Generally speaking, the way I think seems to match the way Python does things. Thats a great advantage to have when working with the language. However when Python does not act as expected it is hard for to find out what is actually going on, because I have to try and think in ways alien to me. Today I met just such a issue within threading.
I am currently working on a project which involves a lot of aggregation of data. Recently the decision was make the application international. Part of the process was creating international databases and split the aggregated data across these databases. Most examples of using multiple databases use either the app, or the specific model to determine which database to use. My use case was more difficult as it involved selecting the database based on the actual data. If the data source was, for instance, from Spain, the spanish database should be used. The data was streaming in all mangled up so it was not possible to use the source as a way to distinguish the data. Furthermore, the data was not coming through a request (where middleware can be used) but synced in from a remote database via a cron job that the proceeds to aggregate the data. To set up the database routing I created a router class. The problem is, there is no way to actually pass data to the routing class. At the moment the hints parameter contains only the instance, if it exists. So for a newly created model object it’s empty. My idea was to try and use the
threading.local() to communicate between the aggregation function and the db routing class.
Why I’m not using
First however let me explain why I have not chose to use
using. The problem is that when aggregating, the application creates a few different objects, depending on the data. These model objects are created through proxy analyser classes. Using the manual method would not only involve a lot of code, it will also make debugging difficult.
What didn’t work
My first instinct was to thing that in each file I needed To access the data I just needed to add the following lines:
import threading local_storage = threading.local()
And that local storage would be consistent across the whole thread. Unfortunately, it either wasn’t, or the code was running on two separate threads, which I don’t think it did. This might be a good time for a disclaimer. I am by no means a threading expert. What I say may be really off base. All I know is what I observed and what did, and did not work for me. I added this code to both the db routing class and the aggregator. The aggregator would add an attribute to
local_storage and the db routing would check for it to determine the routing. This attempt failed. After some debugging I found that the object created in the aggregator and the one created at the db router were, in fact, not the same. I figured if I define it one of the two, and import it for the other then surely this would be the same object. It was not. I think it probably has something to do with how
import actually work, but I’m not sure. I was starting to get frustrated. Googling around didn’t really turn up anything significant. I was all but ready to give up on
threading.local when I found the django-tools Threadlocal middleware on github. It was using the mechanism I had in mind to make the request available everywhere. I was quite sure that this code worked (because everything you read on the internet is true, right?). So what was I doing wrong?
What did work
The difference seemed to be in what was actually imported. The middleware was defining the local storage and to access the local storage the middleware module was imported and the local storage is then accessed via functions in the module. I did not really see the difference but figured it was worth a shot. I added a new module that defined the local storage and added getters and setters. To my surprise it actually worked. I have no idea why this method worked while the other failed. I am guessing, as I said, that it has something to do with how threading and import work, and what is passed by reference and what is passed by value. One day I will have to dig deeper into this but for now this will do
threading.local() object offers a thread safe manner to pass data between different parts of the django application when normal parameter passing is not possible. For it to work properly you need to create a proxy module with a getter and setter (and a deleter) and then import that module to each module that needs access.