BORDER TOWN

Introducing Macaroni for C++

Monday October 1, 2012 00:28:20

It's with great pride and trepidation that I announce a project I've been working on for the past four years.

It's finally reached the point where I think it might be usable to people other than myself, so without further ado I present “Macaroni for C++,” a sort of compiler for C++ that attempts to make working with C++ easier and more enjoyable.

Find out more here.

Intro to the Beginning of Getting Started on Windows

Sunday September 30, 2012 17:53:21

This blog post is intended to guide you through setting up a limited dev environment in Windows. The goal is to get around Windows annoyances and create a decent command line environment.

I'll be using Windows XP, since I have a spare license and can put a pristine install on a VM, so I can be sure that what I'm writing here is legit.

Change folder view options.
Right click on the Start bar, and select “Explore.” When the Explorer window pops up, click “Tools” then “Folder Options.” Select the “View” tab.
Check “Display the contents of system folders” and “Show hidden files and folders.” Uncheck “Hide extensions for known file types” and “Hide protected operating system files (Recommended)” (click “Yes” to the popup). You may also want to uncheck “Remember each folder's view settings.” Finally, click Apply, then OK.

Install Launchy.
Now, open up an internet browser (hopefully not IE), then download and install Launchy. Launchy lets you leave the Start Bar behind and open programs and folders by hitting ALT+Space then searching for them by name (it's why I'm incredulous when so-called power users write articles complaining about Windows 8 abandoning the Start Bar). Be sure to donate to Launchy if it helps you out!

Install Sublime Text 2.
Sublime Text is an awesome text editor that works on Windows, OSX, and Linux. Download it from this link and install it. Note that Sublime Text costs money, but one license is only $60 and applies to any computer you use which is awesome. You can try it out for free as long as you like, but if you use it forever without paying that'd make you a bad person.

Create a tools directory.
Pulling down a Windows Installer and letting it go nuts on your system is OK for some things but in general is a practice that 1. won't work for all tools and 2. isn't a good way to understand how the programs you use work. So my practice is to create a “Tools” directory which has all the stuff I use. An advantage to doing this is I can move the Tools directory to another machine if I need.

Create a new directory named “Tools” wherever. I'm going to assume it lives at “C:\Tools”.

Install Notepad2.
As nice as Sublime Text is, I also like to have a different, single-file editor at times to jot stuff down or open single files from Windows Explorer, and Windows Notepad doesn't cut it. Thankfully its freeware sequel is much nicer.

Download Notepad2 here. You can use the installer or not, it doesn't matter. However, the installer replaces the standard Notepad, so let's try pulling down the non-installer version, which is a zip file. Open the zip file, then open a new Explorer window at the C:\Tools directory, create a folder named “Notepad2”, and dump the contents of the zip file into it.

Now we're going to add Notepad2 to the “Send To” menu that appears when you right-click a file in Explorer. Open a new Explorer window, and enter the SendTo folder as explained here (for me it's located at C:\Documents and Settings\Tim\SendTo). Now, in the Explorer window with Notepad2 right click Notepad2.exe and select “Create Shortcut.” Move this shortcut into the SendTo folder and rename it to “Notepad2”.

At this point, Notepad2 will appear in the SendTo menu when you right click a file. This allows you to quickly investigate the contents of a text file.

Create a batch file.
If you've ever had to change an environment variable in Windows, you'll probably agree that it's a hellish experience and uses one of Microsoft's patented “impossible to resize even though all the critical contents can't be seen at once” windows.

That's OK though, because despite what most PHP installation tutorials tell you, changing environment variables (especially PATH) globally is really gross and can destabilize your system.

Unix systems normally have a file called “.bashrc” or something similar which is executed every time a terminal is started. This is a great place to add environment variables and do other setup things. We're going to emulate that functionality in Windows by creating a batch file that'll execute each time we start a command prompt.

Enter C:\Tools\ and create a directory called “CmdTools”. Open it, then right click and select “New… Text Document.” Now right click the new file, select “Rename” and change the name to “DevStartup.bat”. Right click on it again, then select “Send To … Notepad2.”

Enter the following:

@ECHO OFF
REM SETLOCAL ENABLEDELAYEDEXPANSION
SET TOOLS_ROOT=C:\Tools
SET PATH=%PATH%;%TOOLS_ROOT%\CmdTools
ECHO Dev Environment

REM  Hard to see but there is indeed a lot of space on this next line
SET A_LOT_OF_SPACE=

REM Add some stuff to the path.
CALL :AddPath NOTEPAD2       "%TOOLS_ROOT%\Notepad2"

GOTO THE_END

REM Put subroutines and other stuff here.
:AddPath
SET TOOL_DIRECTORY_NAME=%1
SET TOOL_DIRECTORY=%2
REM - Strip quotes
SET TOOL_DIRECTORY=%TOOL_DIRECTORY:"=%
SET         FOUND=""
SET %TOOL_DIRECTORY_NAME%=%TOOL_DIRECTORY%

IF NOT EXIST "%TOOL_DIRECTORY%" GOTO NotFound
PATH=%PATH%;%TOOL_DIRECTORY%
SET FOUND=
GOTO PrintInfo
:NotFound
SET FOUND=MISSING
:PrintInfo
REM @ECHO %ELEMENT_NAME% == %ELEMENT_VALUE%

SET COL1=%A_LOT_OF_SPACE%%FOUND%
SET COL2=%A_LOT_OF_SPACE%%TOOL_DIRECTORY_NAME%
ECHO %COL1:~-7% %COL2:~-15% : %TOOL_DIRECTORY%
GOTO:EOF

:THE_END
REM ENDLOCAL
@ECHO ON

One important note: the line “SET A_LOT_OF_SPACE=” needs to have a bunch of spaces after it (like more than 80). As the comment implies there’s no good way (I could find in the two days I was interested in writing my batch file originally) to create a string with some set number of characters in it other than to just type a bunch of spaces after the equals.
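If the batch syntax is hard to follow, here is a rough Python sketch of what the :AddPath subroutine does. It is only an illustration: the function name add_path and the optional environ argument are assumptions made for this example, and the batch file above is still what the shortcut runs. The sketch also shows why the padding trick works: keeping the last N characters of a space-padded string is just right-justification.

```python
import os

# In Python the padding string is trivial to build; the batch file has to
# spell the spaces out by hand.
A_LOT_OF_SPACE = " " * 80


def add_path(name, directory, environ=None):
    """Mimic :AddPath and return the status line it would ECHO."""
    environ = os.environ if environ is None else environ
    # Like: SET %TOOL_DIRECTORY_NAME%=%TOOL_DIRECTORY%
    environ[name] = directory
    if os.path.exists(directory):
        # Like: PATH=%PATH%;%TOOL_DIRECTORY%
        environ["PATH"] = environ.get("PATH", "") + os.pathsep + directory
        found = ""
    else:
        found = "MISSING"
    # Like %COL1:~-7% and %COL2:~-15%: keep only the last 7 and 15 characters.
    col1 = (A_LOT_OF_SPACE + found)[-7:]
    col2 = (A_LOT_OF_SPACE + name)[-15:]
    return "%s %s : %s" % (col1, col2, directory)
```

Calling add_path("NOTEPAD2", r"C:\Tools\Notepad2") returns one aligned status line, with “MISSING” in the first column when the directory does not exist.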

Create a command line shortcut.
Open up C:\Tools in Windows Explorer, then right click somewhere and select “New Shortcut.” This will bring up a Wizard. For location, use:

C:\Windows\System32\Cmd.exe

Name can be anything I guess. I used “Dev Command Prompt”. Click Finish.

Right click the new shortcut and select properties.

For target, set this:

%comspec% /k ""C:\Tools\CmdTools\DevStartup.bat"" x86

For start in, put the name of whatever directory you most prominently work in. I personally have all of my work in a single root directory which is what I put here.

Now click “Layout”. In “Screen Buffer size”, the default number of lines is 300 which is absurd. You'll want something much higher. 2500 is probably good enough.

Width starts off as “80.” I personally try to make all of my work 80 columns, but the output of most tools is not so conservative so I go with 120 here. Enter that value for “Width” in both spots.

Click Apply to save the changes. Double click on the shortcut and you should see the following:

Dev Environment
                 NOTEPAD2 : C:\Tools\Notepad2
C:\Work>

Type “Notepad2” and you'll see Notepad2 pop up, even though we didn't use the installer.

Make Dev Command Prompt accessible from Launchy.

Right click the shortcut and select copy.

Now click the Start Bar, and select All Programs, then right click Accessories and select Open. Paste the shortcut into that folder.

Hit Alt + Space to bring up Launchy. Right click on the little space ship icon to the right of the text box, and select “Rebuild Catalog.”

Now when you type “Dev Command Prompt” into Launchy it should open the new terminal.

Adding other programs.

The batch file puts the directory “C:\Tools\Notepad2” on the path, allowing you to type Notepad2 to launch it.

To add other tools to the path, add lines similar to the following:

CALL :AddPath NOTEPAD2       "%TOOLS_ROOT%\Notepad2"

For example, if you want to put MinGW on your path, you'd probably write this:

CALL :AddPath MINGW           "C:\MinGW\bin"

The nice thing is the batch file will show you just how populated your PATH environment variable is.

This batch file is also a good place to stick other vars. For example, let's say you're using the Clojure programming tool Leiningen and need to set the var “LEIN_JAR.” Just put the following line next to all the calls to “AddPath”:

SET LEIN_JAR=C:\Users\Tim\Tools\Local\Leiningen\leiningen-2.0.0-preview10-standalone.jar

At this point you should be ready to go. If you want more information on how the batch file works you can read this post from my old blog where I went into greater detail on it.

The Tenets of a Code Nomad

Sunday September 30, 2012 16:31:41

Programming in Windows is culturally different than programming in Linux. In Linux most acolytes get their start using the terminal and a simple text editor such as Vim, and are able to pull down almost every tool they need for free with a simple command like “apt-get install.” Windows programmers on the other hand are given tools (usually by Microsoft, and often at a job) which are GUI-centric and hide the command line or totally eschew it. Generally speaking, Windows developers grow attached to their GUI power-apps, such as Visual Studio, and have no idea why they'd ever give them up. Linux developers are instantly infuriated with Windows as most literature tells them their former workflows are incompatible.

I worked in the .NET, Windows world for five years before switching over to Linux. When I made the change, it hit me how dependent on Visual Studio I had become. If a programming language wasn't supported by Visual Studio, I gave it only an academic interest. If a project couldn't be built in Visual Studio, there was a high probability I wouldn't investigate it.

The switch to Linux forced me to see the error of my ways. By getting so comfy in Windows and .NET, I'd actually not just hurt my ability to work in foreign programming environments, but to understand them.

These days I see many people who have the same issue. Maybe they're Java power users who can't, for the life of them, leave Eclipse, or Linux programmers who won't touch Windows with a ten-foot pole.

I, however, have become a bit of a nomad. Switching from Windows to Linux slowed me down for a while, but ultimately taught me methodologies that are universal across operating systems and software projects.

It also inspired these beliefs:

The Windows console is god awful compared to most Linux terminals or the one in OSX. Copy and paste is particularly atrocious.
The Windows console can do everything you need a terminal to do and should be used regardless.
There are numerous replacement consoles for Windows. In my experience these are buggy and seem to execute programs slower.
IDEs, especially Visual Studio, which is platform dependent, should never be relied on to do something critical like build an app unless you can also perform that action at the command line.
That said, IDEs are still very useful and you should try to support them for larger projects.
If you're writing your own tools, GUIs can sometimes be more useful than command line apps. However the payoff almost never justifies the time it takes.
You don't need an IDE to debug code. However most non-IDE based debuggers are a pain (which is why tests are so great).
JetBrains makes some truly nice cross-platform IDEs.
IDEs slow down experimentation and discovery by forcing you to configure them for every project you work with. That's why simpler text editors, which can be made to view files in a directory in a single step, are so useful. If you want to be able to pull down random projects from GitHub, possibly written in programming languages you've never used, it makes sense to get acquainted with a simple text editor.
Sublime Text is the greatest cross-platform text editor in existence. Other tools to consider are Vim and Emacs, though you will have to spend time learning them. College professors always say that becoming proficient at those two editors will ultimately make you faster than using other tools but I believe this is a myth. I've never witnessed a Vim or Emacs power user navigate a code base faster than a Sublime Text user.
The greatest attribute of a text editor is the ability to quickly locate files. In Sublime Text, you can do this by pressing Ctrl + P and entering the name of the file. In JetBrains IDEs hit Ctrl + Shift + N.

These days, I feel comfortable on any operating system or code base because I have a set of practices that apply to all of them. It's not the best set for any given project but it gets me most of the way there and practicing it has made me a better programmer.

Using Log Instances

Saturday September 22, 2012 16:19:46

Anyone who's programmed long-running processes, like servers or daemons, knows the importance of logging. Assert statements may be passé but log statements proliferate in these code bases.

Log calls are so prevalent that they get a pass from one of the most important conventions in OO-programming: static variables should be avoided. Most logging frameworks allow users to create static variables, one per process or one per code file, to ease usage. While this might make logging simple to begin with, I'm starting to believe it's a bad idea that should be abandoned in favor of more laborious log creation.

Here's a fake example of how logging is typically done:

from superframework.logging import create_log

LOG = create_log()

class ThingThatDoesNotWork:
    def do_something(self):
        LOG.write_line("I am totally doing something right now!")

To log we first create a single static reference which is used everywhere else in this code file. Information on how this object is created and what formatting and customization applies to the logs lives in some central location such as a properties file. Typical log formats show things like the time and place the log statement was made next to the message. It looks a bit like this:

2012-09-21 12:30:00 [things.py, 7] I am totally doing something right now!

Other than these properties, which are usually global, the only way to affect the information sent to the logs is the argument to write_line. What if you wanted some information to appear often enough that putting it into the write_line argument was a hassle, but you didn't want it to show up for every call to write_line for the application?

For example, what if 10 or more “ThingThatDoesNotWork” objects are created? It would be nice to give them an ID and show this next to each log message to keep things clear.

from superframework.logging import create_log
from myframework.util import generate_id

LOG = create_log()

class ThingThatDoesNotWork:
    def __init__(self):
        self.id = generate_id()

    def do_something(self):
        LOG.write_line("%s I am totally doing something right now!" % self.id)

The only problem is this is now a convention, which means programmers are free to ignore it. If you create a log method in ThingThatDoesNotWork that logs everything, you can have it always spit out the ID, which is a bit better:

...
class ThingThatDoesNotWork:
    ....
    def do_something(self):
        self.log("I am totally doing something right now!")
        if not complex_function():
            LOG.write_line("Complex function failure!")

    def log(self, message):
        LOG.write_line("%s %s" % (self.id, message))

The problem with this, aside from having to write a series of new logging methods on each suitable class, is the log lacks context from outside the ThingThatDoesNotWork class. For example, what created the ThingThatDoesNotWork? You could log when a new instance is created, and also log from where, by logging the “id” field of each new ThingThatDoesNotWork instance. You'd then have to find the ID from the log generated by ThingThatDoesNotWork, and then reverse search the logs to find where that same ID is generated. On top of taking a while, all of this would still be by convention as well, meaning that sooner or later someone would forget to log the ID of a new instance when they created it, hiding the source of troubles.

2012-09-21 12:00:00 [server.py, 10] Starting server... 
... stuff ...
2012-09-21 12:28:00 [dispatch.py, 15] Received a new request: POST something 
2012-09-21 12:28:57 [dispatcher.py, 75] Created a ThingThatDoesNotWork, id=1
2012-09-21 12:28:59 [file_you_do_not_care_about.py, 32] TODO: Is this code even executing?
2012-09-21 12:29:09 [things.py, 7] 1 I am totally doing something right now!
2012-09-21 12:28:59 [file_you_do_not_care_about.py, 53] Blah blah blah blah
...
2012-09-21 12:29:43 [file_you_do_not_care_about.py, 83] Value of a local variable in function that works perfectly="abcdef"
2012-09-21 12:29:45 [dispatcher.py, 75] Created a ThingThatDoesNotWork, id=57
2012-09-21 12:29:59 [file_you_do_not_care_about.py, 33] Returning my VALUE of <Person, age=4, name=Pedro>
...
2012-09-21 12:29:59 [file_you_do_not_care_about.py, 65] Swallowing exception because I'm pretty sure this function always works. Message: Memory buffer explosion, please terminate app.
2012-09-21 12:30:00 [things.py, 7] 57 I am totally doing something right now!
2012-09-21 12:30:01 [things.py, 7] Complex function failure!

As seen here, figuring out where instance 57 was created demands you scroll up in the log past mountains of garbage. Additionally, when “complex function” fails it's accidentally logged with the static LOG var, so you miss the chance to know the ID anyway.

Most “best practice” logging frameworks provide quick and dirty techniques to get up and running which use singleton patterns of some kind or another. This approach works and can be jammed into any program quickly but, despite the fact everyone does it, I don't believe it's the best possible fit for any program. Like all aspects of your code, it's worthwhile to consider creating an interface that makes the most sense for the given problem domain.

Once I wrote an app that had to log long running jobs. A job might call a method 99 times in a loop without incident before something goes hideously wrong on the 100th call. I didn't care about the other 99 calls, but when that 100th call really screwed things up I wanted a way to quickly see any related information while shutting out the unrelated noise. This meant I wanted to log absolutely everything but somehow not see stuff after the fact.

My idea was that each log object needed to store the relationship to its parent. This way, I could have a nested log, one I could use to create a tree view and drill down into errors. I made an implementation that produced a flat log file as well, but I really preferred that tree view whenever I could have it and came up with an interface like this:

class Log:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def write_line(self, message):
        """Write a message to the log for this task."""
        ...

    @property
    def factory(self):
        def create(name):
            return Log(name, self)
        return create

The way it works is at the start of a job, a new Log instance is created. As it calls methods, it either passes its Log instance in or makes a new one.

from myframework.util import generate_id

class ThingThatDoesNotWork:
    def __init__(self, create_log):
        # create_log is the factory of whichever log created this object.
        self.id = generate_id()
        self.log = create_log(self.id)

    def do_something(self):
        self.log.write_line("I am totally doing something right now!")

In a server application, this might mean each request gets a new log, with an ID all its own, and as it calls long-running methods, they too have identification information they set in their log objects.

The important thing is that the chain of who created what, and where, is preserved by the parent reference in each Log instance. You don't actually have to store the parent reference; what's important is that the function returned by the factory property has some knowledge of where the log came from embedded in it.
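To make the parent chain concrete, here is a minimal sketch under a few assumptions of mine: the lineage method, the list used as a stand-in for a real log file, and the example names are all made up for illustration and are not part of the interface above. Each log walks up its parent references to build a prefix showing who created it.

```python
class Log:
    """A log that remembers which log created it (illustrative sketch)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Children share the root's sink; a plain list stands in for a file.
        self.sink = parent.sink if parent is not None else []

    def lineage(self):
        """Return the names from the root log down to this one."""
        names = []
        log = self
        while log is not None:
            names.append(str(log.name))
            log = log.parent
        return list(reversed(names))

    def write_line(self, message):
        prefix = "".join("[%s]" % name for name in self.lineage())
        self.sink.append("%s %s" % (prefix, message))

    @property
    def factory(self):
        def create(name):
            return Log(name, parent=self)
        return create


server_log = Log("server")
request_log = server_log.factory("request-1")
thing_log = request_log.factory("thing-57")
thing_log.write_line("I am totally doing something right now!")
```

Because every line carries its full lineage, a tree view or a simple text search for “[request-1]” can pull out everything related to one request.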

In fact, using this interface I once created an implementation which didn't store the parent at all. Instead it just kept a number, which started at zero with the first Log instance and was incremented by one for each child instance. This was used to indent the logs to make the stack-like nature clear.

In this example let's say we store such a count, as well as some string used to identify the log. The result looks like this:

...
class Log:
    def __init__(self, name, indent=0):
        self.name = name
        self.indent = indent

    def write_line(self, message):
        """Write a message to the log for this task."""
        # LOG here is the framework's static log from the earlier examples.
        LOG.write_line("%s [%s] %s"
                       % ("|---" * self.indent, self.name, message))

    @property
    def factory(self):
        def create(name):
            return Log(name, self.indent + 1)
        return create

    def close(self):
        """Close this log somehow..."""

...

# Named SERVER_LOG so it doesn't shadow the static LOG used by Log.write_line.
SERVER_LOG = Log("server")

def server_loop():
    SERVER_LOG.write_line("Starting server...")
    while True:
        request = next_request()
        method_dispatcher(request, SERVER_LOG.factory)

...

def method_dispatcher(request, create_log):
    log = create_log(request.id)
    log.write_line("Received a new request: %s %s" % (request.method, request.path))
    if request.method == "POST":
        some_other_method(log, request.path)
    ...
...

def some_other_method(log, path):
    if path == "something":
        thing = ThingThatDoesNotWork(log.factory)
        thing.do_something()

The log file of this trite example would look like this:

2012-09-21 12:00:00 Starting server... [server.py, 10]
2012-09-21 12:28:00 |--- [Request 01b7d394-eecc-4306-a628-0401676d85c1] Received a new request: POST something [dispatch.py, 15]
2012-09-21 12:30:00 |---|--- [Request 01b7d394-eecc-4306-a628-0401676d85c1][Thing] I am totally doing something right now! [things.py, 7]

Forcing an instance to set up a log instance at important points means the code naturally registers important events in a hierarchical fashion, which is something you don't get by allowing code to get by with a lazy call to a global object or function. This opens the door to amazing possibilities later.

For example, what if you decide to log each request in its own file? (What decadence! What luxury!) With this interface, writing such an implementation is child's play.

The Log implementation used by server_loop would itself log to a normal file, but the Log instance creation function returned by the factory property would create a new log file. This means every request would get its own file. You could then remove all the prefixed information relevant to the request, making it easier to read:

Let's say the name of the file was “Request_01b7d394-eecc-4306-a628-0401676d85c1”:

2012-09-21 12:28:00 Received a new request: POST something [dispatch.py, 15]
2012-09-21 12:29:50 Thing:57:
2012-09-21 12:30:00 |--- I am totally doing something right now! [things.py, 7]

Notice how the implementation was changed so that when ThingThatDoesNotWork calls create_log, it logs the argument given to create_log and creates a new log object which logs to the same file while indenting all log statements from that point on. Unlike the earlier example, where the log's name was on every applicable line, here it just appears once, before all of that log's statements. This makes sense because, again, the log file will all be from a single thread of execution.
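One possible shape for such an implementation is sketched below. The class names FileLog and ServerLog, the “server.log” file name, and the temporary directory are assumptions chosen for the example, and timestamps and source locations are left out to keep it short. The root log's factory opens a new file per request, while logs created further down write to that same file with one more level of indentation.

```python
import os
import tempfile


class FileLog:
    """Appends indented lines to one file (hypothetical helper)."""

    def __init__(self, path, indent=0):
        self.path = path
        self.indent = indent

    def write_line(self, message):
        with open(self.path, "a") as f:
            f.write("%s%s\n" % ("|--- " * self.indent, message))

    @property
    def factory(self):
        def create(name):
            # Announce the child once, then indent everything it writes.
            self.write_line("%s:" % name)
            return FileLog(self.path, self.indent + 1)
        return create


class ServerLog(FileLog):
    """Root log: its factory starts a new file for every request."""

    def __init__(self, directory):
        FileLog.__init__(self, os.path.join(directory, "server.log"))
        self.directory = directory

    @property
    def factory(self):
        def create(request_id):
            path = os.path.join(self.directory, "Request_%s" % request_id)
            return FileLog(path)
        return create


log_dir = tempfile.mkdtemp()
server_log = ServerLog(log_dir)
server_log.write_line("Starting server...")
request_log = server_log.factory("abc123")
request_log.write_line("Received a new request: POST something")
thing_log = request_log.factory("Thing:57")
thing_log.write_line("I am totally doing something right now!")
```

The calling code never changes: it still asks whatever log it was handed for a factory, and only the root decides that “a child” means “a new file.”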

I don't think this is the perfect logging interface. In many cases, creating a new log object for everything presents too much of a burden to the code clarity or execution time. However, there are definite benefits, and the trade-offs are worth considering.

However you accomplish logging, claim ownership of the interfaces and patterns so that you can tailor them to your app.
