2010-12-26

Easy, Automated Reloading in Python

I love Python, but I have to admit it has a few weak points. One of the biggest for me has been the (in)ability to reload Python code at runtime. While there is a function (aptly named 'reload') which does just that, it is notoriously difficult to use correctly. The essential difficulty is that when you reload a module, any references you held to objects in the old version still point to the old objects. This means that in addition to reloading the module, you generally have to re-import the module's symbols, re-initialize old class instances, etc. Compounding this, modules have to be reloaded in the correct order so that symbols are re-imported and propagated correctly. Several people have addressed this issue and explained it in greater detail.

Solutions to this problem usually involve hijacking Python's import mechanism to record module dependencies, and then making sure that modules are reloaded in the correct order. Others have suggested classes that track their instances and automatically update the __class__ attribute of each instance whenever the module is reloaded.
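
The stale-reference problem is easy to reproduce. Here is a minimal sketch, not taken from any existing solution: it writes a throwaway module to a temporary directory, imports it, edits it on disk, and reloads it. The module name demo_mod and the Greeter class are made up for illustration. It uses importlib.reload, which is where Python 3 keeps the reload function.

```python
import importlib
import os
import sys
import tempfile

# Skip .pyc caching so a quick edit cannot be masked by stale bytecode.
sys.dont_write_bytecode = True

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo_mod.py")  # hypothetical module

with open(path, "w") as f:
    f.write("class Greeter:\n    def hello(self):\n        return 'old'\n")

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
import demo_mod

g = demo_mod.Greeter()  # an instance created before the edit

# Simulate editing the source file while the program is running.
with open(path, "w") as f:
    f.write("class Greeter:\n    def hello(self):\n        return 'new version'\n")

importlib.reload(demo_mod)

print(demo_mod.Greeter().hello())     # new version
print(g.hello())                      # old
print(type(g) is demo_mod.Greeter)    # False
```

The module now holds a brand-new Greeter class, but the instance created earlier is still tied to the old one, so it keeps running the old method.
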

I haven't been very pleased with any of these options because they all require significant changes to my existing code to implement properly. In my search for the ultimate reload function, I eventually came up with a system that works very well for my application:

  1. Look through the entire set of imported modules and reload anything that has a new .py file available
  2. For every function and class method in the module, update the __code__ to point to the new function's __code__
  3. For every class, find all instances of the class and set __class__ to the new version. (It does not require extra work to track instances; just use gc.get_referrers to find them.)

And that's it. In the end I have a single function called reloadAll which performs exactly the way you'd expect: all of your existing class instances start using the updated methods immediately, all modules point to the new objects, and all functions update as well. Notably, modules do not need to be reloaded in any particular order, and nothing needs to be re-imported (with one exception, see below).
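
The three steps can be sketched roughly as follows. This is an illustration of the idea, not the code in reload.py: the helper names are my own, and the test for a "new .py file" (file modified after a timestamp the caller recorded) is an assumption. Two hand-written classes stand in for the versions of a class before and after a reload.

```python
import gc
import os
import sys
import types


def modules_with_newer_source(since):
    """Step 1: imported modules whose .py file changed after `since`.

    Assumption: "new" means the file's mtime is later than a timestamp
    the caller recorded (for example, when the program started).
    """
    stale = []
    for mod in list(sys.modules.values()):
        path = getattr(mod, "__file__", None)
        if path and path.endswith(".py") and os.path.isfile(path):
            if os.path.getmtime(path) > since:
                stale.append(mod)
    return stale


def update_function(old, new):
    """Step 2: make the existing function object run the new code."""
    # Raises ValueError if the two functions close over a different
    # number of variables; a fuller version would need to handle that.
    old.__code__ = new.__code__
    old.__defaults__ = new.__defaults__


def update_class(old, new):
    """Steps 2 and 3 for a single class."""
    # Step 3: instances hold a reference to their class, so the garbage
    # collector can list them without any bookkeeping on our side.
    for ref in gc.get_referrers(old):
        if type(ref) is old:
            ref.__class__ = new
    # Step 2: anything still holding the old class (or its methods)
    # should run the new method bodies too.
    for name, attr in vars(old).items():
        replacement = vars(new).get(name)
        if isinstance(attr, types.FunctionType) and isinstance(
            replacement, types.FunctionType
        ):
            update_function(attr, replacement)


# Stand-ins for "the class before reload" and "the class after reload".
class OldPoint:
    def describe(self):
        return "old"


class NewPoint:
    def describe(self):
        return "new"


p = OldPoint()
update_class(OldPoint, NewPoint)
print(type(p).__name__)       # NewPoint
print(p.describe())           # new
print(OldPoint().describe())  # new
```

Because the existing function objects are patched in place, even code that still holds the old class or one of its methods ends up running the new method bodies.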

This works well with my existing applications, but there are a few situations where it won't work. If I make changes to the __init__ method of any class, or otherwise expect the internal state of an instance to change, I will have to re-create the instances manually. (This is generally true of all languages I can think of; there is no general way to automatically re-initialize the state of the program.) Additionally, any explicit references to objects in the module other than classes and functions (references to lists, for example) will still point to the old objects. I can't think of a clever way around this, but it can be avoided easily: avoid using "from ... import ..." at all costs. If you are always forced to refer to the module when accessing its objects, then you automatically get the new versions when the module is reloaded.
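
A small sketch of that last point, using a throwaway module (the name settings_demo and its COLORS list are hypothetical) and a plain importlib.reload. A name bound with "from ... import ..." keeps the old list, while a lookup through the module sees the new one.

```python
import importlib
import os
import sys
import tempfile

# Skip .pyc caching so a quick edit cannot be masked by stale bytecode.
sys.dont_write_bytecode = True

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "settings_demo.py")  # hypothetical module

with open(path, "w") as f:
    f.write("COLORS = ['red']\n")

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
import settings_demo
from settings_demo import COLORS  # binds the list object that exists now

# Simulate editing the source file, then reload the module.
with open(path, "w") as f:
    f.write("COLORS = ['red', 'green', 'blue']\n")

importlib.reload(settings_demo)

print(COLORS)                # ['red']
print(settings_demo.COLORS)  # ['red', 'green', 'blue']
```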

Without further delay, the magic reloadAll function: http://luke.campagnola.me/code/downloads/reload.py

Leave a comment if you find a bug or have suggestions.
