BORDER TOWN

The Tenets of a Code Nomad

Sunday September 30, 2012 16:31:41

Programming in Windows is culturally different than programming in Linux. In Linux most acolytes get their start using the terminal and a simple text editor such as Vim, and are able to pull down almost every tool they need for free with a simple command like “apt-get install.” Windows programmers on the other hand are given tools (usually by Microsoft, and often at a job) which are GUI-centric and hide the command line or totally eschew it. Generally speaking, Windows developers grow attached to their GUI power-apps, such as Visual Studio, and have no idea why they'd ever give them up. Linux developers are instantly infuriated with Windows as most literature tells them their former workflows are incompatible.

I worked in the .NET, Windows world for five years before switching over to Linux. When I made the change, it hit me how dependent on Visual Studio I had become. If a programming language wasn't supported by Visual Studio, I gave it only an academic interest. If a project couldn't be built in Visual Studio, there was a high probability I wouldn't investigate it.

The switch to Linux forced me to see the error of my ways. By getting so comfy in Windows and .NET, I'd actually not just hurt my ability to work in foreign programming environments, but to understand them.

These days I see many people who have the same issue. Maybe they're Java power users who can't, for the life of them, leave Eclipse, or Linux programmers who won't touch Windows with a ten-foot pole.

I however have become a bit of a nomad. Switching from Windows to Linux slowed me down for awhile, but ultimately taught me a methodologies that are universal among operating systems and software projects.

It also inspired these beliefs:

The Windows console is god awful compared to most Linux terminals or the one in OSX. Copy and paste is particularly atrocious.
The Windows console can do everything you need a terminal to do and should be used regardless.
There are numerous replacement consoles for Windows. In my experience these are buggy and seem to execute programs slower.
IDEs, especially Visual Studio, which is platform dependent, should never be relied on to do something critical like build an app unless you can also perform that action at the command line.
That said, IDEs are still very useful and you should try to support them for larger projects.
If you're writing your own tools, GUIs can sometimes be more useful than command line apps. However the payoff almost never justifies the time it takes.
You don't need an IDE to debug code. However most non-IDE based debuggers are a pain (which is why tests are so great).
JetBrains makes some truly nice cross-platform IDEs.
IDEs slow down experimentation and discovery by forcing you to configure them for every project you work with. That's why simpler text editors, which can be made to view files in a directory in a single step, are so useful. If you want to be able to pull down random projects from GitHub, possibly written in programming languages you've never used, it makes sense to get acquainted with a simple text editor.
Sublime Text is the greatest cross-platform text editor in existence. Other tools to consider are Vim and Emacs, though you will have to spend time learning them. College professors always say that becoming proficient at those two editors will ultimately make you faster than using other tools but I believe this is a myth. I've never witnessed a Vim or Emac power user navigate a code base faster than a Sublime Text user.
The greatest attribute of a text editor is the ability to quickly locate files. In Sublime Text, you can do this by pressing Ctrl + P and entering the name of the file. In JetBrains IDEs hit Ctrl + Shift + N.

These days, I feel comfortable on any operating system or code base because I have a set of practices that apply to all of them. It's not the best set for any given project but it gets me most of the way there and practicing it has made me a better programmer.

Using Log Instances

Saturday September 22, 2012 16:19:46

Anyone who's programmed long running processes, like servers or daemons, knows the importance of logging. Assert statements may be passe but log statements proliferate in these code bases.

Log calls are so prevalent that they get a pass from one of the most important conventions in OO-programming: static variables should be avoided. Most logging frameworks allow users to create static variables, one per process or one per code file, to ease usage. While this might make logging simple to begin with, I'm starting to believe it's a bad idea that should be abandoned in favor of more laborious log creation.

Here's a fake example of how logging is typically done:

from superframework.logging import create_log

LOG = LOGGING.make_a_log

class ThingThatDoesNotWork:
    def do_something(self):
        LOG.write_line("I am totally doing something right now!")

To log we first create a single static reference which is used everywhere else in this code file. Information on how this object is created and what formatting and customization applies to the logs lives in some central location such as a properties file. Typical log formats show things like the time and place the log statement was made next to the message. It looks a bit like this:

2012-09-21 12:30:00 [things.py, 7] I am totalling doing something right now!

Other than these properties, which are usually global, the only way to affect the information sent to the logs is the argument to write_line. What if you wanted some information to appear often enough that putting it into the write_line argument was a hassle, but you didn't want it to show up for every call to write_line for the application?

For example, what if 10 or more “ThingThatDoesNotWork” objects are created? It would be nice to give them an ID and show this next to each log message to keep things clear.

from superframework.logging import create_log
from myframework.util import generate_id

LOG = LOGGING.make_a_log

class ThingThatDoesNotWork:
    def __init__(self):
        self.id = generate_id()

    def do_something(self):
        LOG.write_line("%s I am totally doing something right now!" % self.id)

The only problem is this is now a convention, which means programmers are free to ignore it. If you create a log method in ThingThatDoesNotWork that logs everything, you can have it always spit out the ID, which is a bit better:

...
class ThingThatDoesNotWork:
    ....
    def do_something(self):
        self.log("I am totally doing something right now!")
        if not complex_function():
            LOG.write_line("Complex function failure!")

    def log(self, message):
        LOG.write_line("%s %s" % (self.id, message))

The problem with this, aside from having to write a series of new logging methods on each suitable class, is the log lacks context from outside the ThingThatDoesNotWork class. For example, what created the ThingThatDoesNotWork? You could log when a new instance is created, and also log from where, by logging the “id” field of each new ThingThatDoesNotWork instance. You'd then have to find the ID from the log generated by ThingThatDoesNotWork, and then reverse search the logs to find where that same ID is generated. On top of taking awhile, all of this would still be by convention as well, meaning maybe sometime someone would not log the ID of a new instance when they created it, hiding the source of troubles.

2012-09-21 12:00:00 [server.py, 10] Starting server... 
... stuff ...
2012-09-21 12:28:00 [dispatch.py, 15] Received a new request: POST something 
2012-09-21 12:28:57 [dispatcher.py 75] Created a ThingThatDoesNotWork, id=1
2012-09-21 12:28:59 [file_you_do_not_care_about.py, 32] TODO: Is this code even executing?
2012-09-21 12:29:09 [things.py, 7] 1 I am totalling doing something right now!
2012-09-21 12:28:59 [file_you_do_not_care_about.py, 53] Blah blah blah blah
...
2012-09-21 12:29:43 [file_you_do_not_care_about.py, 83] Value of a local variable in function that works perfectly="abcdef"
2012-09-21 12:29:45 [dispatcher.py 75] Created a ThingThatDoesNotWork, id=57
2012-09-21 12:29:59 [file_you_do_not_care_about.py, 33] Returning my VALUE of <Person, age=4, name=Pedro>
...
2012-09-21 12:29:59 [file_you_do_not_care_about.py, 65] Swallowing exception because I'm pretty sure this function always works. Message: Memory buffer explosion, please terminate app.
2012-09-21 12:30:00 [things.py, 7] 57 I am totalling doing something right now!
2012-09-21 12:30:01 [things.py, 7] Complex function failure!

As seen here, figuring out where instance 57 demands you scroll up in the log past mountains of garbage. Additionally, when “complex function” fails its accidentally logged with the static LOG var so you miss the chance to know the ID anyway.

Most “best practice” logging frameworks provide quick and dirty techniques to get up and running which use singleton patterns of some kind or another. This approach works and can be jammed into any program quickly but, despite the fact everyone does it, I don't believe it's the best possible fit for any program. Like all aspects of your code its worthwhile to consider creating an interface that makes the most sense for the given problem domain.

Once I wrote an app that had to log long running jobs. A job might call a method 99 times in a loop without incident before something goes hideously wrong on the 100th call. I didn't care about the other 99 calls, but when that 100th call really screwed things up I wanted a way to quickly see any related information while shutting out the unrelated noise. This meant I wanted to log absolutely everything but somehow not see stuff after the fact.

My idea was that each log object needed to store the relationship to its parent. This way, I could have a nested log, one I could use to create a tree view and drill down into errors. I made an implementation that produced a flat log file as well, but I really preferred that tree view whenever I could have it and came up with an interface like this:

class Log:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def write_line(self, message):
        """Write a message to the log for this task."""
        ...

    @property
    def factory(self):
        def create(name):
            return Log(name, self)
        return create

The way it works is at the start of a job, a new Log instance is created. As it calls methods, it either passes its Log instance in or makes a new one.

from myframework.logging import create_log
from myframework.util import generate_id

class ThingThatDoesNotWork:
    def __init__(self, log_creator):
        self.id = generate_id()
        self.log = log_create(self.id)
    def do_something(self):
        self.log.write("I am totally doing something right now!")

In a server application, this might means each request gets a new log, with an ID all its own, and as it calls long running methods, they too have identification information they set in their log objects.

The important thing is the chain of who created what where is preserved by the parent reference in each Log instance. You don't have to store the parent reference actually, whats important is that the child_factory method has some knowledge of where the log came from embedded in it.

In fact, using this interface I once created an implementation which didn't store the parent at all. Instead it just kept a number, which started at zero with the first Log instance and was incremented by one for each child instance. This was used to indent the logs to make the stack like nature clear.

In this example lets say we store such a count, as well as some string used to identify the log. The result looks like this:

...
class Log:
    def __init__(self, name, indent=0):
        self.name = name
        self.indent = indent

    def write_line(self, message):
        """Write a message to the log for this task."""
        LOG.write_line("%s [%s] %s"  
                               % ("|---" * self.indent, self.name, message)

    @property
    def factory(self):
        def create(name):
            return Log(name, self, self.indent + 1)
        return create

    def close(self):
        """Close this log somehow..."""

...

LOG = Log("server", parent=None)

def server_loop():
    LOG.write_line("Starting server...")
    while True:
        request = next_request()        
        method_dispatcher(request, LOG.factory)

...

def method_dispatcher(create_log):
    log = create_log(request.id)
    LOG.write_line("Received a new request: %s, %s " % (request.method, request.path))
    if request.method == "POST":
        some_other_method(log, request.path)
    ...
...

def some_other_method(log, path):
    if path="something":
        thing = ThingThatDoesNotWork(log.factory)
        do_something()

The log file of this trite example would look like this:

2012-09-21 12:00:00 Starting server... [server.py, 10]
2012-09-21 12:28:00 |--- [Request 01b7d394-eecc-4306-a628-0401676d85c1] Received a new request: POST something [dispatch.py, 15]
2012-09-21 12:30:00 |---|--- [Request 01b7d394-eecc-4306-a628-0401676d85c1][Thing] I am totalling doing something right now! [things.py, 7]

Forcing an instance to set up a log instance at important points means the code naturally registers important events in a hierarchical fashion, which is something you don't get by allowing code to get by with a lazy call to a global object or function. This opens the door to amazing possibilities later.

For example, what if you decide to log each request in its own file (what decadence! What luxury!). With this interface, writing such an implementation is child's play.

The Log implementation used by server_loop would itself log to a normal file, but the Log instance creation function returned by the factory property would create a new log file. This means every request would get its own file. You could then remove all the prefixed information relevant to the request, making it easier to read:

Let's say the name of the file was “Request_01b7d394-eecc-4306-a628-0401676d85c1”:

2012-09-21 12:28:00 Received a new request: POST something [dispatch.py, 15]
2012-09-21 12:29:50 Thing:57:
2012-09-21 12:30:00 |--- I am totalling doing something right now! [things.py, 7]

Notice how the implementation was changed, so that when ThingThatDoesNotWork calls create_log, it logs the argument given to create log, and creates a new log object which logs to the same file while indenting all log statements from that point out. Unlike the earlier example, where was on every applicable line, here it just appears before all the log statements. This makes sense because again the log file will all be from a single thread of execution.

I don't think this is the perfect logging interface. In many cases, creating a new log object for everything presents too much of a burden to the code clarity or execution time. However, there are definite benefits, and the trade-offs are worth considering.

However you accomplish logging, claim ownership of the interfaces and patterns so that you can tailor them to your app.

Doing the Same Thing Twice

Sunday September 16, 2012 18:52:01

Yesterday a friend of mine was mine was telling a story, to which I replied “well the definition of genius is doing the same thing twice and expecting a different result!”

Everyone had a good laugh. We discussed the meaning of the joke for half a minute, before I concluded by adding in the exact same inflection “like I always say, the definition of genius is doing the same thing twice and expecting a different result!”

At this point someone else in the room, who'd been within ear shot the entire time but hadn't been listening, suddenly piped in with “what? You know that's the definition of insanity, right?”

I had tried to do the exact same thing twice, implicitly proving the real saying as a joke. Ironically however the result was different the second time.

Scientifically, doing the same thing twice should yield the same result, so assuming something different will happen is crazy. In the real world, though, almost nothing is isolated, so nothing is idempotent, and something different usually happens. Getting the same thing to happen twice is actually really difficult, and in fact the only way in which experiments become respected in the scientific community is if they are repeated, preferably by different people.

However, too often people take it for granted that the same thing will happen twice because they mistakenly think they completely understand the causes of what happened the first time. If you acknowledge the fact that most of us never really know what we're doing anyway and those that do rarely have 100% control of the outcome of our actions, there are many things people could attempt repeatedly while getting widely different results. This phenomenon forms the basis of the game “Horse.”

If being insane means expecting a different outcome from the same actions, yet most people are incapable of performing the intended action once, let alone twice, then maybe the real measure of sanity (and competence in general) is the ability to identify why an outcome is different from a goal and adjust accordingly.

How to install Java on Windows

Tuesday September 11, 2012 00:28:41

I decided to learn Clojure recently, which unfortunately means installing Java on Windows.

In case you don't know, when people blame Microsoft for creating insecure operating systems, they're half-right. Really, Windows and its ilk are incredibly secure. The issue is all the common junk people download for it; things like Winamp, various chat clients, and Java. I have personally lost a Windows Vista install which was otherwise up to date due to an exploit in Java which infected my computer via an ad pop-up in YTMND. Here's an article where security experts berate Java for being increasingly unsafe.

So, what's my secret now? First off, I do all my web browsing inside a Virtual Box VM running Ubuntu. LOL! Imagine what a funny joke that would be if it weren't true!

Second off, when I install the JDK I make sure to disable installing Java applets. It seems like Oracle is increasingly aggressive about installing the ability to run Java applets on a machine whenever you download the JDK. Except for online poker sites and virus writers, no one has seriously used a Java applet since the late nineties so make sure you treat it like cancer and don't let it into your system!

These days the installer installs the applet viewer with the “public JRE.” When installing the JDK, make sure you deny this option:

Don't worry, there's a private JRE too that gets installed too which is all you need as a developer.

Java's crappy track record with security, along with Oracle's ever-increasing dickishness, has dampened my desire to do anything with Java over the past year even though its such a nice framework. One of the reasons I'm learning Clojure is because there is a seemingly finished port to .NET and Javascript compiler, meaning I could eventually use it on a more stable, less-litigated platform, although for now I plan on going with the most used platform for Clojure since that's what any book will teach.

<---2012-11-23 14:40:17

history

2012-09-01 14:28:57--->

Main

Projects