Anyone who's programmed long-running processes, like servers or daemons, knows the importance of logging. Assert statements may be passé, but log statements proliferate in these code bases.
Log calls are so prevalent that they get a pass from one of the most important conventions in OO programming: static variables should be avoided. Most logging frameworks allow users to create static log variables, one per process or one per code file, to ease usage. While this might make logging simple to begin with, I'm starting to believe it's a bad idea that should be abandoned in favor of more laborious log creation.
Here's a fake example of how logging is typically done:
from superframework.logging import create_log

LOG = create_log(__name__)

class ThingThatDoesNotWork:
    def do_something(self):
        LOG.write_line("I am totally doing something right now!")
To log, we first create a single static reference that is used everywhere else in the code file. Information about how this object is created, and what formatting and customization applies to the logs, lives in some central location such as a properties file. Typical log formats show things like the time and place the log statement was made next to the message. It looks a bit like this:
2012-09-21 12:30:00 [things.py, 7] I am totally doing something right now!
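For comparison, the made-up superframework maps directly onto real frameworks. Here's a minimal sketch of the same pattern using Python's standard logging module, with a format string that approximates the output above:

import logging

# Central, global configuration -- the "properties file" equivalent.
logging.basicConfig(
    format="%(asctime)s [%(filename)s, %(lineno)d] %(message)s",
    level=logging.INFO,
)

# The static reference, one per code file.
LOG = logging.getLogger(__name__)

class ThingThatDoesNotWork:
    def do_something(self):
        LOG.info("I am totally doing something right now!")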
Other than these properties, which are usually global, the only way to affect the information sent to the logs is the argument to write_line. But what if you wanted some piece of information to appear often enough that putting it into every write_line argument was a hassle, yet you didn't want it to show up for every call to write_line in the entire application?
For example, what if 10 or more “ThingThatDoesNotWork” objects are created? It would be nice to give them an ID and show this next to each log message to keep things clear.
from superframework.logging import create_log
from myframework.util import generate_id

LOG = create_log(__name__)

class ThingThatDoesNotWork:
    def __init__(self):
        self.id = generate_id()

    def do_something(self):
        LOG.write_line("%s I am totally doing something right now!" % self.id)
The only problem is that this is now a convention, which means programmers are free to ignore it. If you create a log method on ThingThatDoesNotWork that handles all of its logging, you can have it always spit out the ID, which is a bit better:
...

class ThingThatDoesNotWork:
    ...

    def do_something(self):
        self.log("I am totally doing something right now!")
        if not complex_function():
            LOG.write_line("Complex function failure!")

    def log(self, message):
        LOG.write_line("%s %s" % (self.id, message))
The problem with this, aside from having to write a new logging method on each suitable class, is that the log lacks context from outside the ThingThatDoesNotWork class. For example, what created the ThingThatDoesNotWork? You could log when a new instance is created, and from where, by logging the “id” field of each new ThingThatDoesNotWork instance. But then you'd have to find the ID in a line logged by ThingThatDoesNotWork and reverse-search the logs to find where that same ID was generated. On top of taking a while, all of this would still be by convention, meaning someday someone would forget to log the ID of a new instance when they created it, hiding the source of trouble.
2012-09-21 12:00:00 [server.py, 10] Starting server...
... stuff ...
2012-09-21 12:28:00 [dispatch.py, 15] Received a new request: POST something
2012-09-21 12:28:57 [dispatcher.py, 75] Created a ThingThatDoesNotWork, id=1
2012-09-21 12:28:59 [file_you_do_not_care_about.py, 32] TODO: Is this code even executing?
2012-09-21 12:29:09 [things.py, 7] 1 I am totally doing something right now!
2012-09-21 12:29:10 [file_you_do_not_care_about.py, 53] Blah blah blah blah
...
2012-09-21 12:29:43 [file_you_do_not_care_about.py, 83] Value of a local variable in function that works perfectly="abcdef"
2012-09-21 12:29:45 [dispatcher.py, 75] Created a ThingThatDoesNotWork, id=57
2012-09-21 12:29:59 [file_you_do_not_care_about.py, 33] Returning my VALUE of <Person, age=4, name=Pedro>
...
2012-09-21 12:29:59 [file_you_do_not_care_about.py, 65] Swallowing exception because I'm pretty sure this function always works. Message: Memory buffer explosion, please terminate app.
2012-09-21 12:30:00 [things.py, 7] 57 I am totally doing something right now!
2012-09-21 12:30:01 [things.py, 7] Complex function failure!
As seen here, figuring out where instance 57 came from demands you scroll up in the log past mountains of garbage. Additionally, when “complex function” fails, it's accidentally logged with the static LOG variable, so you miss the chance to know the ID anyway.
Most “best practice” logging frameworks provide quick and dirty techniques to get up and running, which use singleton patterns of some kind or another. This approach works and can be jammed into any program quickly, but despite the fact that everyone does it, I don't believe it's the best possible fit for every program. Like all aspects of your code, it's worthwhile to consider creating an interface that makes the most sense for the given problem domain.
Once I wrote an app that had to log long-running jobs. A job might call a method 99 times in a loop without incident before something went hideously wrong on the 100th call. I didn't care about the other 99 calls, but when that 100th call really screwed things up, I wanted a way to quickly see any related information while shutting out the unrelated noise. This meant I wanted to log absolutely everything but somehow hide the irrelevant stuff after the fact.
My idea was that each log object needed to store the relationship to its parent. This way, I could have a nested log, one I could use to create a tree view and drill down into errors. I made an implementation that produced a flat log file as well, but I really preferred that tree view whenever I could have it and came up with an interface like this:
class Log:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def write_line(self, message):
        """Write a message to the log for this task."""
        ...

    @property
    def factory(self):
        def create(name):
            return Log(name, self)
        return create
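To see why the parent reference matters, here's a hypothetical helper (not part of the interface above) that walks the chain to recover the full context of any log; a tree view is essentially this, rendered:

def full_context(log):
    """Walk the parent chain to build a path describing where a log came from."""
    parts = []
    while log is not None:
        parts.append(log.name)
        log = log.parent
    return " > ".join(reversed(parts))

# e.g. full_context(thing_log) might yield "server > request-42 > thing-57"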
The way it works is that at the start of a job, a new Log instance is created. As the job calls methods, it either passes its Log instance in or makes a new child via the factory.
from myframework.util import generate_id

class ThingThatDoesNotWork:
    def __init__(self, log_creator):
        self.id = generate_id()
        self.log = log_creator(self.id)

    def do_something(self):
        self.log.write_line("I am totally doing something right now!")
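Wiring it together might look like this (the job name here is invented for illustration):

# A hypothetical job entry point using the interface above.
job_log = Log("job-42")
job_log.write_line("Starting the job...")

# The object receives a factory, not a global; its own log records its
# id and remembers it was created under job-42's log.
thing = ThingThatDoesNotWork(job_log.factory)
thing.do_something()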
In a server application, this might mean each request gets a new log, with an ID all its own, and as it calls long-running methods, they too have identification information they set in their log objects.
The important thing is that the chain of who created what, and where, is preserved by the parent reference in each Log instance. Actually, you don't have to store the parent reference at all; what's important is that the creation function returned by the factory property has some knowledge of where the log came from embedded in it.
In fact, using this interface I once created an implementation that didn't store the parent at all. Instead it just kept a number, which started at zero for the first Log instance and was incremented by one for each child instance. This was used to indent the logs to make the stack-like nature clear.
In this example, let's say we store such a count, as well as a string used to identify the log. The result looks like this:
...

class Log:
    def __init__(self, name, indent=0):
        self.name = name
        self.indent = indent

    def write_line(self, message):
        """Write a message to the log for this task."""
        # A real implementation would add timestamps and write to a
        # file; print stands in for the underlying writer here.
        print("%s [%s] %s"
              % ("|---" * self.indent, self.name, message))

    @property
    def factory(self):
        def create(name):
            return Log(name, self.indent + 1)
        return create

    def close(self):
        """Close this log somehow..."""
        ...
LOG = Log("server")

def server_loop():
    LOG.write_line("Starting server...")
    while True:
        request = next_request()
        method_dispatcher(request, LOG.factory)
        ...

def method_dispatcher(request, create_log):
    log = create_log("Request %s" % request.id)
    log.write_line("Received a new request: %s %s" % (request.method, request.path))
    if request.method == "POST":
        some_other_method(log, request.path)
        ...
...
def some_other_method(log, path):
    if path == "something":
        thing = ThingThatDoesNotWork(log.factory)
        thing.do_something()
The log file of this trite example would look like this:
2012-09-21 12:00:00 Starting server... [server.py, 10]
2012-09-21 12:28:00 |--- [Request 01b7d394-eecc-4306-a628-0401676d85c1] Received a new request: POST something [dispatch.py, 15]
2012-09-21 12:30:00 |---|--- [Request 01b7d394-eecc-4306-a628-0401676d85c1][Thing] I am totally doing something right now! [things.py, 7]
Forcing code to set up a log instance at important points means it naturally registers important events in a hierarchical fashion, which is something you don't get by allowing code to get by with a lazy call to a global object or function. This opens the door to amazing possibilities later.
For example, what if you decide to log each request in its own file (what decadence! What luxury!)? With this interface, writing such an implementation is child's play.
The Log implementation used by server_loop would itself log to a normal file, but the creation function returned by its factory property would start a new log file, so every request would get its own file. You could then drop the request-identifying prefixes from each line, since the file name itself identifies the request, making it easier to read.
Let's say the name of the file was “Request_01b7d394-eecc-4306-a628-0401676d85c1”:
2012-09-21 12:28:00 Received a new request: POST something [dispatch.py, 15]
2012-09-21 12:29:50 Thing:57:
2012-09-21 12:30:00 |--- I am totally doing something right now! [things.py, 7]
Notice how the implementation was changed: when ThingThatDoesNotWork calls create_log, the argument given to create_log is itself logged (the “Thing:57:” line above), and the new log object writes to the same file while indenting all log statements from that point on. Unlike the earlier example, where the ID was on every applicable line, here it appears just once, before the indented log statements. This makes sense because, again, the log file is all from a single thread of execution.
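Here's a minimal sketch of one way that could work; RequestFileLog and its file-naming scheme are invented for illustration, not the original implementation:

class RequestFileLog:
    def __init__(self, name, out=None, indent=0):
        self.name = name
        self.indent = indent
        # The root log for a request opens its own file, named after
        # the request; child logs share the same file handle.
        self.out = out if out is not None else open("%s.log" % name, "a")

    def write_line(self, message):
        self.out.write("%s%s\n" % ("|---" * self.indent, message))

    @property
    def factory(self):
        def create(name):
            # Record the argument given to create (the "Thing:57:" line),
            # then indent everything the child writes from that point on.
            self.write_line("%s:" % name)
            return RequestFileLog(name, out=self.out, indent=self.indent + 1)
        return create

The server's root log would keep writing to the main file as before; only its factory would return a RequestFileLog, so each request starts a fresh file.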
I don't think this is the perfect logging interface. In many cases, creating a new log object for everything puts too much of a burden on code clarity or execution time. However, there are definite benefits, and the trade-offs are worth considering.
However you accomplish logging, claim ownership of the interfaces and patterns so that you can tailor them to your app.