I've always appreciated quick programs, especially ones that perform well even on ancient or cheap hardware. I feel that most software running today is far slower than it needs to be, which has led to a global waste of hardware resources as everyone compensates by upgrading their gear. So of course I'd like to aspire to write quick code myself.
Most advice I've read on writing efficient code states that measurements are key and should be taken before taking any action at all. Unfortunately, everything I've read fails to mention what tools should be used for measuring code performance or cite products costing thousands of dollars. They often include a snarky comment about how if you're a serious developers you'll recoup your investment anyway. While I've slowly accrued enough hours of experience writing C++ over the years that I wouldn't laugh at myself too hard if I said I was a serious C++ developer, because the majority of work I do is at home in my spare time I just can't justify dropping several grand on code analysis tools.
So I was very happy the other day to discover Very Sleepy, a totally free code profiler that ended up helping me trim nearly 30% off the running time of Macaroni for C++.
Very Sleepy is noninvasive, meaning you don't have to recompile an application to use it (though the app must be compiled with debug symbols). How it works is by suspending the app once each millisecond and recording the stack trace. It then determines how much time is being spent in each function. Here's how I used it to get major performance gains from Macaroni in less than one hour.
Over the past year or so, I've moved from working on Macaroni for C++ in my spare time to actually using it for a game engine. The experience has made me aware of different, more practical issues in Macaroni itself. One thing I've noticed is while run times with Macaroni aren't that bad, they aren't where I'd like them to be either.
A few months back I added a feature to Macaroni which shows the time it's spent running so far next to every line output to the console. That helped me to get some solid numbers on how long projects were taking to parse and build.
Today, I was running a project which Macaroni spent 5.732 seconds simply analyzing and generating code for before it even invoked bjam. This isn't necessarily trivial work, but since Macaroni is not itself a real C++ compiler it's the kind of task I'd ultimately like to see happen in one second or less.
So, I invoked Macaroni as usual but added the “startPrompt” argument, which makes it sit around and wait for a user to press enter. This allows the program to get attached to a debugger. I then opened up Very Sleepy, and found Macaroni on it's list of running processes. I double clicked it, and then raced back to enter something into Macaroni's dummy prompt, so the time spent waiting at the prompt wouldn't skew the results (note that it's also possible to launch a process from Very Sleepy, but I was being lazy - though hopefully this serves as a proof of sorts that even casual usage can yield tangible benefits).
Since Very Sleepy was busy profiling it, Macaroni took longer to run- about 40 seconds or so- but when it stopped I got the following nice window showing all of the function calls in Macaroni and which ones had taken the most time.

One thing that's interesting is that the “% Inclusive” column shows the time spent not just running code in a function, but running all the code that function itself called. Because of this, the highest “% Inclusive” value is for the main method followed by other methods, such as “Macaroni::Environment::ProjectEnvironment::RunCommand,” which are the jumping off points for almost everything that happens in a typical invocation of Macaroni. However, it's still easy to spot functions which you know aren't called that early in the call graph.
The winning stinker of the bunch was “boost::filesystem::canonical” which is used by Macaroni's own Path class (which has different semantics in order to represent C++ source files easily, and which I wrote before I even knew of boost::filesystem… this project has taken so many years) to return an absolute path. As time has gone by, I've peppered the more recently created project / build system in Macaroni with things that need the absolute path, and it turns out the boost::filesystem function underpinning that is relatively expensive.
The fix was to simply cache the absolute path the first time it was computed, which took no time at all. This simple change cut two seconds off the run time, bringing the time until bjam was invoked from 5.732 to 3.874.
In short, Very Sleepy is an incredibly useful tool which makes spotting performance issues extremely simple. I'm grateful to it's authors for writing something so useful and giving it away for free.