alex's blog

Separate databases for each git branch in a Rails project

Git makes it easy to branch your code. I love this. But when a branch needs migrations, switching back to master can be difficult. Just checking out the master codebase isn't enough - you need to also spin up a database with the schema expected by master.

If you have no data worth keeping, simply running `rake db:schema:load` after checking out a different branch is fine. If that doesn't work for you, read on.

Here's the solution I've been trying out, which seems to be working quite well.

# config/database.yml
development:
  adapter: mysql2
  encoding: utf8
  reconnect: false
  database: project_development_<%= Git.open('.').current_branch %>
  pool: 5
  username: username
  password: userpassword
  host: 127.0.0.1

database.yml is automatically parsed for ERB snippets, so this just works without any extra setup. I'm using the git gem to figure out what branch I'm currently in, and select the right database accordingly.
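
To make the ERB step concrete, here's a rough sketch of it using only Ruby's standard library. It is an illustration, not what Rails runs internally: `render_database_config` and the `branch` variable are names I made up, and the branch is passed in by hand instead of being read through the git gem.

```ruby
require 'erb'
require 'yaml'

# Trimmed-down copy of the config above. The branch comes from a
# hypothetical `branch` variable instead of Git.open('.').current_branch.
TEMPLATE = <<~YAML
  development:
    adapter: mysql2
    database: project_development_<%= branch %>
    pool: 5
YAML

# Evaluate the ERB tags first, then parse the result as YAML.
def render_database_config(template, branch)
  rendered = ERB.new(template).result_with_hash(branch: branch)
  YAML.safe_load(rendered)
end

config = render_database_config(TEMPLATE, 'experiment')
puts config['development']['database']
```

Run on the `experiment` branch, this sketch prints `project_development_experiment`, which matches the naming convention in the database list below.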

You can see the databases I've set up to follow this convention in mysql.

mysql> show databases;
+-----------------------------------+
| Database                          |
+-----------------------------------+
| information_schema                |
| project_development_master        |
| project_development_experiment    |
| project_development_bugfix        |
+-----------------------------------+
4 rows in set (0.00 sec)

I think I want to add a feature/convention to allow me to keep using the master db on certain branches. (This will save setup time for branches which I know aren't going to have any migrations.) Otherwise, that's about it.
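
Here's one way that convention might look. This is only a sketch: `database_for`, the `SHARED_WITH_MASTER` list and the branch names in it are all hypothetical. I've also assumed that branch names like `feature/login` need their punctuation swapped for underscores to make a tidy database name.

```ruby
# Branches listed here reuse the master database (hypothetical convention).
SHARED_WITH_MASTER = %w[docs-cleanup typo-fixes].freeze

# Map a branch name to a database name. Anything other than letters,
# digits and underscores is replaced with an underscore.
def database_for(branch, shared: SHARED_WITH_MASTER, prefix: 'project_development')
  branch = 'master' if shared.include?(branch)
  suffix = branch.gsub(/[^A-Za-z0-9_]/, '_')
  "#{prefix}_#{suffix}"
end

puts database_for('experiment')     # project_development_experiment
puts database_for('typo-fixes')     # project_development_master
puts database_for('feature/login')  # project_development_feature_login
```

The ERB tag in database.yml could then call a helper like this instead of interpolating the branch name directly, as long as the helper is loaded before Rails reads the file.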

I'm a published author.

Cover of O'Reilly's 'Monitoring With Ganglia' book

I contributed to a few chapters in O'Reilly's recently-published book "Monitoring With Ganglia". This is a big first for me, and I'm pretty proud of it. Anyone who's interested in Ganglia (and really, who isn't?) should pick up a copy.

All proceeds are being donated to Scholarship America, which is a charity selected by the lead authors.

Learning a little R

What's a pirate's favorite programming language? R! (Groan all you want. It only encourages me.)

Over the past few years, I've made a few half-hearted attempts to learn the R programming language. I made a little progress, but never really felt like I understood what I was doing. The language felt very alien compared to other languages I was used to, and its problem domain (statistics) was one I didn't have much familiarity with.

I still don't really know any statistics, but thanks to a few years with Ruby I understand functional programming a bit better than I used to. This definitely has made a big difference with R.

I decided to try a Coursera class on R, both to see how Coursera works and (hopefully) to get over the hump and really learn enough R that I can use it daily. Since I work with lots of data, I expect I'll find plenty of places where it's helpful.

Coursera seems well suited for this kind of learning. I have a clear goal in mind, and I'm motivated enough to do the work without anyone hounding me. Please note that neither of these things was true for 18-year-old Alex, so I don't see Coursera as a replacement for a normal college in any way, shape or form. But the range of things you can learn on Coursera is pretty impressive, as are the instructors.

So far the R class is living up to expectations. One recent programming assignment:

Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0.

Whew, that's a mouthful. I had to read that over several times before it started to make any sense, but it did eventually. The most impressive thing is that the solution is something like 4 lines of code. And, after a few hours of hacking, I think I actually understand the code I wrote. :)

Animating geo data using ruby-processing

A few weeks ago I wrote about experimenting with Processing for the first time. I'm pretty happy about how that project has progressed since then.

I synthesized some TED talk views data and an IP address geolocation database to produce this:

It was fun to do, and I'm pretty happy with the end result. Switching from processing to ruby-processing was a huge win - I'm far more comfortable with Ruby and got a lot more done with less hassle. I kept dreaming up features & adding them without too much trouble. (The sparkline in the corner took less than an hour to add.)

This thing got featured on the TED blog, which is very nice. The code is on GitHub if you want to see how it works under the hood.

What is TED?

We remote TED employees were asked to make a short video answering the question "What is TED?" for the yearly company meeting.

I had a little fun with my Makey Makey. Here's what I came up with.

I built a clock!

I'm mainly a software guy, but I've always wanted to learn more about building real things. I've got a workbench just for that purpose, but it doesn't get used as much as I'd like.

But now, I actually built something. Sara got me this very snazzy clock kit from AdaFruit. I stayed up way too late Monday night soldering it together, and now I've got the coolest looking alarm clock ever.

What's better than that? It's built to be hacked! Lots of people have written custom software for this clock to do things like automatically dim the display at night, or to set the time via GPS - since GPS is fundamentally a system of hyper-accurate clocks. So cool what people come up with!

Sometime soon I'm going to try adding the auto-dimming feature. I also plan to change the way the alarm goes off. The frequency is a little too urgent for me at 6:30am, and I'm just a few lines of code away from changing that for the better. :)

I still have to learn how to actually load modified firmware into the clock, and that'll probably be my next project. Until then, bask in the glory of watching time blink by on one of my favorite birthday presents ever:

fun with processing

I've been thinking on & off about how to do animations of spatial data. I tend to think of web-based stuff like GeoServer + OpenLayers, but I've also been meaning to try Processing to see what opportunities are there for building animated maps.

Last night I played around for a few hours and got some very nice little droplets appearing on a screen. No GIS just yet, they're all randomly placed, but so far I like what I've found. Processing is very easy to get started with, and you have access to the full world of Java libraries - meaning an application can get as complex as you need and you won't outgrow the toolset.

I'll put this code on GitHub at some point, but for now here's a sample of what I did. Nothing too amazing, but given that I started from nothing (not even having Processing installed) to this in under 2 hours, I'm pretty happy.

logstash patches

We've been experimenting with logstash at work, and so far it's looking like a very capable system - especially when paired with the beautiful Kibana web interface for log search & visualization.

Some bugs surfaced in the redis output module for logstash, and I'm happy that our patches were just merged into master. Hooray for open source. :)

Bug report : https://logstash.jira.com/browse/LOGSTASH-573
Pull request : https://github.com/logstash/logstash/pull/195

Testing How A Method Is Called, Without Testing Equality

There are times I want to assert that a method is called with a particular kind of argument, but I don't really care what the exact values of that argument are. DateTime is a great example. I want a method to be called with a DateTime. I don't care what time it represents, so an equality test doesn't fit. I just want to know that the calling method is invoking the called method in the right way.

class Foo {
  public function duff() {
    return $this->fud(new DateTime());
  }
  public function fud(DateTime $date) {
    return $date;
  }
}

class TestCase extends PHPUnit_Framework_TestCase {
  public function test_duff_calls_fud_with_a_DateTime() {
    $mock = $this->getMock('Foo', array('fud'));
    $mock
      ->expects($this->once())
      ->method('fud')
      ->with(new PHPUnit_Framework_Constraint_IsInstanceOf('DateTime'));

    $mock->duff();
  }
}

PHPUnit_Framework_Constraint_IsInstanceOf lets me assert that duff is going to call fud with an instance of DateTime, but I don't need a reference to that DateTime. This is really useful, and not complex once you see it in action. But it took some digging in the PHPUnit source code to find it.
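
For comparison, here's the same idea sketched in plain Ruby with a hand-rolled spy instead of a mocking library. `FooSpy` and `fud_calls` are names I invented for this sketch. The point is the last two checks, which test the kind of argument and not its value.

```ruby
require 'date'

class Foo
  def duff
    fud(DateTime.now)
  end

  def fud(date)
    date
  end
end

# Hand-rolled spy: records the argument of each call to fud
# instead of relying on a mocking library.
class FooSpy < Foo
  attr_reader :fud_calls

  def initialize
    super
    @fud_calls = []
  end

  def fud(date)
    @fud_calls << date
    date
  end
end

spy = FooSpy.new
spy.duff

# Assert on the kind of argument, not on its value.
raise 'expected exactly one call to fud' unless spy.fud_calls.size == 1
raise 'expected a DateTime argument' unless spy.fud_calls.first.is_a?(DateTime)
puts 'duff called fud once with a DateTime'
```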

Hope that's useful.

Mocking Static Method Calls With PHPUnit

Update Nov 1, 2011

I updated the code samples in 2 places. User::sendRegistration needed to return a value, and MockTurtle was renamed to MockProxy since it just didn't seem funny or clever anymore. Late night naming gone awry.

Overview

PHPUnit 3.5 comes with some ability to mock static method calls. You create a new test class which can expect a given static call, and then use a staticExpects() call to set up your expectations just like with the normal instance-based expects().

http://sebastian-bergmann.de/archives/883-Stubbing-and-Mocking-Static-Me...

This is all fine if you can call the static method directly, or if all the static method calls are in the same class. But say you have an instance method in one class which calls a static in another class, and you want to test that the static is called correctly? You're sunk. Can't be done.

This was driving me crazy tonight, so I decided to try to hack a way around the problem. I want to share what I came up with, get feedback, and see if there are better ways to do it.

Setup

Forgive this painfully contrived example. We have a User, and the User calls a RegistrationService.

//User.class.php
class User {
  public function __construct($id) {
    $this->id = $id;
  }
  public function sendRegistration() {
    return RegistrationService::processRegistration( $this->id );
  }
}

//RegistrationService.class.php
class RegistrationService {
  public static function processRegistration($user_id) {
    // call some external service
    return '{'.$user_id.':"success"}';
  }
  public static function getServiceName() {
    return 'registration';
  }
}

In my test, I don't want to call the real RegistrationService::processRegistration, but I do want to verify that $user->sendRegistration() is making that call correctly.

The core of my approach is a mock RegistrationService that looks like this:

//RegistrationService.mock.php

class MockProxy {
// This thing could use a better name.

  private static $mock;

  public static function setStaticExpectations($mock) {
    self::$mock = $mock;
  }
  // Any static calls we get are passed along to self::$mock.
  public static function __callStatic($name, $args) {
    return call_user_func_array(
      array(self::$mock,$name),
      $args
    );
  }
}

class RegistrationService extends MockProxy {}

And after all that setup, this is what the test looks like:

require_once 'User.class.php';

class TestCase extends PHPUnit_Framework_TestCase {

  /**
  * @runInSeparateProcess
  * @preserveGlobalState disabled
  */

  public function test_sendRegistration_calls_RegistrationService() {
    require_once 'RegistrationService.mock.php';

    $mock = $this->getMock( 'RegistrationService', array('processRegistration') );
    $mock->expects( $this->once() )
      ->method( 'processRegistration' )
      ->with( 25 )
      ->will( $this->returnValue('{mock:true}') );

    RegistrationService::setStaticExpectations($mock);

    $subject = new User( 25 );
    $this->assertEquals('{mock:true}', $subject->sendRegistration());
  }

  public function test_RegistrationService_reports_its_service_name() {
    require_once 'RegistrationService.class.php';
    $this->assertEquals('registration', RegistrationService::getServiceName());
  }
}

In the first test, any static method calls made to the mock RegistrationService get passed along to the mock we supplied with setStaticExpectations. Note that these are normal instance-based expectations, not static ones. That's a little counter-intuitive, but if you follow the code it makes sense.
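
Since my testing habits come from Ruby, here's a rough Ruby sketch of the same forwarding trick, with class-level `method_missing` playing the role of `__callStatic`. It is an analogy and not a port of the PHP: `static_expectations`, `FakeRegistration` and `process_registration` are all names made up for the illustration.

```ruby
# Stand-in for the real service class. Any class-level call it does not
# define itself is forwarded to the object handed to
# `static_expectations=` (hypothetical name, mirroring setStaticExpectations).
class MockProxy
  class << self
    attr_accessor :static_expectations

    def method_missing(name, *args, &block)
      target = static_expectations
      return super unless target && target.respond_to?(name)

      target.public_send(name, *args, &block)
    end

    def respond_to_missing?(name, include_private = false)
      target = static_expectations
      (target && target.respond_to?(name)) || super
    end
  end
end

class RegistrationService < MockProxy; end

# A plain object playing the role of the PHPUnit mock.
class FakeRegistration
  attr_reader :received

  def process_registration(user_id)
    @received = user_id
    '{mock:true}'
  end
end

fake = FakeRegistration.new
RegistrationService.static_expectations = fake

puts RegistrationService.process_registration(25)
puts fake.received
```

Just like in the PHP version, the expectations live on an ordinary instance, and the class-level call is only a pass-through.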

The second test has nothing to do with User, and really doesn't belong in this suite at all. I include it to show that you can use a mock in one test, and invoke the real un-mocked class in another test. You can find an explanation of @runInSeparateProcess and @preserveGlobalState in http://matthewturland.com/2010/08/19/process-isolation-in-phpunit/. As the first one implies, it means that this particular test will be run in its own process. This is necessary since we're dealing with 2 different classes both named RegistrationService.

OUTPUT

alex@turnip:~/Code$ phpunit UserTest.php
PHPUnit 3.5.14 by Sebastian Bergmann.

..

Time: 1 second, Memory: 5.50Mb

OK (2 tests, 2 assertions)

And if I break the first test on purpose, the error message is clear and easy to follow. That's one metric I was worried this approach wasn't going to do well on, but it seems just fine in this case at least.

alex@turnip:~/Code$ phpunit UserTest.php
PHPUnit 3.5.14 by Sebastian Bergmann.

F.

Time: 0 seconds, Memory: 5.75Mb

There was 1 failure:

1) TestCase::test_sendRegistration_calls_RegistrationService
Failed asserting that <integer:2525> matches expected <integer:25>.

/Users/alex/Code/RegistrationService.mock.php:15
/Users/alex/Code/User.class.php:9
/Users/alex/Code/User.class.php:9
/Users/alex/Code/UserTest.php:24

FAILURES!
Tests: 2, Assertions: 1, Failures: 1.

THOUGHTS?

So... feedback? Are there better ways to do this, and my Google-fu was just too weak tonight? How could I improve this approach?

Let me say right off the bat that I'm really really really not interested in "statics are bad, you should refactor your code to not use them" kinds of responses.

I actually tend to agree, and if I'm creating a project from scratch I tend to follow that advice. But how many projects do you work on which are entirely your own design and your own code? In my world, the count is 0. I expect to use 3rd-party libraries. I can't control their APIs, but I don't feel like that fact should prevent me from doing comprehensive testing of how I use those APIs.

Actually, I think we'd have the same problem even if User were using an instance of RegistrationService instead of calling a static, and it could be solved in essentially the same way.

But again that could just be me looking at the problem wrong. It seems like so much of testing is about habits. I really learned my habits doing Rails, and the kinds of stuff you can do with Mocha just aren't possible in PHP. That doesn't mean good testing isn't possible - it just means that the habits & intuition I've built up over the years aren't serving me well when I try testing PHP. I'm keen to learn, so fill me in!

FINALLY, SOME LOVE

Thanks to Sebastian Bergmann and everyone else who has contributed to PHPUnit. You make the best testing tools for PHP, and if I grumble about limitations here & there it should not be interpreted as "PHPUnit sucks, look what it can't do". I hate that kind of attitude, and I appreciate all the work you've done for all of us!

Have a good night! I think I did. :)

Ganglia References

This is a collection of ganglia-related links I put together for an 'Introduction to Ganglia' presentation I'm doing for the Ruby Users of Minnesota.

General Info

Live Demo

RRD

Custom Metrics

Metrics via log parsing

Custom Graphs

Ganglia/Nagios Integration

PHP Comparison Surprises

UPDATE

@tcollen reminded me that any non-empty string evaluates to true. Ok, my mistake there. I guess my PHP is rustier than I thought...


I'd expect that if $a == $b, and $b == $c, then $a == $c. Alas, I just tripped over a case where this isn't true.

$ php -r "echo 1 == true ? 'true' : 'false';"
true

$ php -r "echo true == 'enabled' ? 'true' : 'false';"
true

$ php -r "echo 1 == 'enabled' ? 'true' : 'false';"
false

First: This is just nuts. true == 'enabled'?!?!

Second: Where did 'enabled' get its special status? I wasn't expecting it, and I can't find it documented anywhere. http://php.net/manual/en/types.comparisons.php

Third: ARGH!!!
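
To convince myself of what the update says, here's a simplified model of those three comparisons written in Ruby. It is a sketch and nothing like a full emulation of PHP's `==`. It covers only booleans, integers and strings, and it follows the PHP 5/7 behaviour where a non-numeric string turns into 0 when compared with a number. `php_truthy?` and `php_loose_eq` are names I made up.

```ruby
# PHP's idea of truthiness for the handful of types modelled here.
def php_truthy?(value)
  ![false, nil, 0, '', '0'].include?(value)
end

def php_loose_eq(a, b)
  # Rule 1: if either side is a boolean, compare both sides as booleans.
  if [a, b].any? { |v| v == true || v == false }
    return php_truthy?(a) == php_truthy?(b)
  end

  # Rule 2: integer vs string compares numerically. String#to_i gives 0
  # for text with no leading digits, much like the PHP 5/7 cast.
  return a == b.to_i if a.is_a?(Integer) && b.is_a?(String)
  return a.to_i == b if a.is_a?(String) && b.is_a?(Integer)

  a == b
end

puts php_loose_eq(1, true)          # true
puts php_loose_eq(true, 'enabled')  # true
puts php_loose_eq(1, 'enabled')     # false
```

So 'enabled' has no special status. The first two comparisons go through the boolean rule, where any non-empty string counts as true, and the third goes through the numeric rule, where 'enabled' becomes 0.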

Creating HTML elements with CSS classes in jQuery : IE surprise.

Intro

Let's say you've got an HTML snippet like this, which you want to add a new element to.

  <div id="test"></div>

You can easily add new elements.

jQuery( '<span>' ).appendTo( '#test' );

Danger

You might also want to add a CSS class.

jQuery( '<span class="foo">' ).appendTo( '#test' );

But be warned! This fails silently in IE7 and IE8! There's no error raised, but no element is created or added to the test div.

A Workable Alternative

jQuery( '<span>' ).addClass( 'foo' ).appendTo( '#test' );

seems to work just fine, though.

Some Hacker at UCAR Likes Poe

Just now I was building NetCDF, and noticed this output:

got NC_CHAR val = A (0x41)
got NC_CHAR val = B (0x42)
got NC_CHAR val = "The red death had long devastated the country."
got val = A (0x41)
got val = B (0x42)
got val = "The red death had long devastated the country."
got vals = 0.000000 ... 447.000000
re nc_close ret = 0
PASS: t_nc

Reminded me of http://www.sysop.ca/?p=90, which I tweeted about a few weeks ago. Wonder how many other little nuggets like this I'll start noticing now. Though most of the time I think (hope?) I've got better things to do than watch ./configure; make; make install output scroll by.

Technical ToDo List

I feel like I have a million little projects floating around in my head. In this post, I'm going to mention some of the ones I think of most often. Should be a small way to hold myself more accountable for actually getting something done. Ask me how I'm doing on these in 6 months! :)

  • Ganglia contributions: I've been helping out with a refresh of the ganglia PHP app, and I need to get more done on that. https://github.com/alexdean/ganglia-misc
  • Regions Project v2: I created www.regionsproject.org long before I knew anything about real GIS. Now that GeoServer and OpenLayers exist, I would like to do a new version of that project using open-source GIS.
  • Invent a better CAPTCHA: I get all kinds of junk user registrations on this site. Somebody somewhere is solving reCaptchas for spammers. I keep having this sense that adding some kind of interaction would make this harder to pull off, but I haven't thought of what it should be.
  • Marc Finder: Blacklight (http://projectblacklight.org/) is a great system for cataloging books. I want a local Blacklight installation for my books, but I need to get ahold of MARC records for all my books first. I have a rough Rails app which can query Z3950 servers based on ISBN numbers, and I'd like to vamp this up into a system people could use. Picture a site where you could submit a batch of ISBN numbers, and get back a tarball full of MARC records to dump into Blacklight. Couple a cheap barcode scanner with MarcFinder and Blacklight, and you've got a nice system.
  • RGB: Really Good Backups. This is just a collection of Python and bash scripts I use for managing my rdiff-backup repositories. I have 1 repo for each of our workstations at home, as well as other repos for database data backups and website content. Again, I think this is probably releasable. It could use some polish and some unit tests.
  • Shot Tracker: Web application for target shooters, mainly for tracking your accuracy with various ammunition. I think it'd be really cool to have a mobile app which would let you photograph a target, find the holes, do some image recognition, and measure your group size for you. Upload that to the site, and build up a set of accuracy data. Inspired by http://bulletin.accurateshooter.com/2010/04/22-lr-ammunition-accuracy-55...

All these projects have been in various states of half-finished-ness (or exist as pure vapor-ware) for quite a while. Let's see how I can do in pushing them forward in the next six months!

Somehow I suspect I'll spend more time doing dishes and playing hide-and-seek with the kids. As a father of young kids, I think that's as it should be, but it does grate on me how little time I seem able to make for other projects. Ah well, every day is a day for change, right?!

Delegation in Rails

I work for IPS Meteostar. We make software for meteorologists to use in the production of forecasts. In the world of aviation meteorology, metwatching is the practice of comparing a current weather observation to a forecast, and making adjustments to the forecast when they differ too much.

The definition of what 'too much' is varies from parameter to parameter (visibility, wind speed, temperature, etc). Our software allows users to define various rules like "if visibility varies by more than 0.5 miles, color-code the airport red on my map". Each interval has a lower & upper boundary. A metwatching interval ties these lower & upper bounds to a color which should be displayed when the current value falls between those lower & upper boundaries.
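
As a rough picture of what such a rule looks like in code, here's a small sketch. It is not our production code. `ColorInterval`, `color_for` and the thresholds are made up for illustration, and I've assumed half-open intervals (lower bound included, upper bound excluded) to keep the rules from overlapping.

```ruby
# Hypothetical shape for a metwatching rule: a lower bound, an upper
# bound, and the color to show when a value falls between them.
ColorInterval = Struct.new(:lower, :upper, :color) do
  def cover?(value)
    value >= lower && value < upper
  end
end

# Made-up thresholds for the difference between observed and forecast
# visibility, in miles.
VISIBILITY_RULES = [
  ColorInterval.new(0.0, 0.25, 'green'),
  ColorInterval.new(0.25, 0.5, 'yellow'),
  ColorInterval.new(0.5, Float::INFINITY, 'red')
].freeze

# Return the color of the first rule covering the value, or nil.
def color_for(difference, rules)
  match = rules.find { |rule| rule.cover?(difference) }
  match && match.color
end

puts color_for(0.1, VISIBILITY_RULES)  # green
puts color_for(0.7, VISIBILITY_RULES)  # red
```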

Up to now, we've stored those rules in a configuration file. Now we need to move them to the database, and that has presented a challenge.

My code used to look like this:

class Interval
  attr_reader :lower, :upper
end

class MetwatchInterval < Interval
end

Note: None of my example code is complete. I'm only including enough to make my point about how the classes relate to each other. So, if it looks like there's a lot missing... there is!

Now, MetwatchInterval needs to become a subclass of ActiveRecord::Base. So how to access all those Interval methods? Without multiple inheritance, something has to give.

This was my first approach:

class MetwatchInterval < ActiveRecord::Base

  def initialize( *args )
    super  # let ActiveRecord finish its own setup first
    @interval = Interval.new
  end

  def method_missing( meth, *args, &block )
    if @interval.respond_to?( meth )
      @interval.send( meth, *args, &block )
    else
      super  # fall back to ActiveRecord's own method_missing
    end
  end

end

This works well enough, but it just feels unclear. method_missing can be a life-saver, but I feel like it's worth avoiding when there are simpler solutions available. There's that extra method call to process, and the stack traces are frequently less clear when method_missing is used.

Enter Rails' delegate.

class MetwatchInterval < ActiveRecord::Base

  delegate :upper, :lower, :to=>:@interval

  def initialize( *args )
    super  # let ActiveRecord finish its own setup first
    @interval = Interval.new
  end

end

The external API exposed by MetwatchInterval is the same as the method_missing version, plus I think it's easier to read & understand. Under the hood, delegate just defines a few new methods on MetwatchInterval for me, so (internally), it looks the same as if I'd written

class MetwatchInterval < ActiveRecord::Base
 
  def initialize( *args )
    super  # let ActiveRecord finish its own setup first
    @interval = Interval.new
  end
 
  def upper( *args, &block )
    @interval.send( :upper, *args, &block )
  end

  def lower( *args, &block )
    @interval.send( :lower, *args, &block )
  end

end

Nice and simple. Not too much magic. There are plenty of times I'll be glad to have discovered this. It makes favoring composition over inheritance easy, and that generally feels like good design practice to me.
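
Outside of Rails, Ruby's standard library has a similar tool in `Forwardable`. Here's a minimal sketch with plain Ruby classes, no ActiveRecord involved. The constructor arguments and the `color` attribute are my own additions to make the example runnable.

```ruby
require 'forwardable'

class Interval
  attr_reader :lower, :upper

  def initialize(lower, upper)
    @lower = lower
    @upper = upper
  end
end

# Plain Ruby class (no ActiveRecord) that wraps an Interval.
class MetwatchInterval
  extend Forwardable

  # Defines #lower and #upper, each forwarding to @interval.
  def_delegators :@interval, :lower, :upper

  attr_reader :color

  def initialize(lower, upper, color)
    @interval = Interval.new(lower, upper)
    @color = color
  end
end

rule = MetwatchInterval.new(0.5, 2.0, 'red')
puts rule.lower  # 0.5
puts rule.upper  # 2.0
```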

The Disemvoweler : Shorten those tweets!

And I give you... The Disemvoweler! Shorten your text and easily tweet the results.

I think it's fairly common knowledge that you can remove all the vowels from most sentences, and still be left with an intelligible statement. This is often called disemvoweling (http://en.wikipedia.org/wiki/Disemvoweling), which I think is just awesome. After a http://ruby.mn meeting a few months ago, this topic came up over beers and I finally decided to do something about implementing it as a simple tool.

I'll probably continue goofing around with the algorithm here & there. For example, I think I probably shouldn't mangle words fewer than 3 characters long, but I haven't done that yet.

Here's what it does currently.

  • Remove all vowels, except those that begin words. These seem to be more important when it comes to legibility.
  • Don't disemvowel any words which have an interior '.'. This prevents me from mangling URLs.
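
Those two rules can be sketched in a few lines of Ruby. This is a reconstruction for illustration and not the code running on the site: `disemvowel` and `disemvowel_word` are names I picked, and I've treated "begins a word" as "is not preceded by a letter, digit or underscore".

```ruby
# Disemvowel one whitespace-delimited token.
def disemvowel_word(word)
  # Rule 2: leave anything with an interior '.' alone (URLs, mostly).
  return word if word =~ /\S\.\S/

  # Rule 1: drop vowels, except those not preceded by a word character.
  word.gsub(/(?<=\w)[aeiou]/i, '')
end

# Split on whitespace, keeping the whitespace so spacing is preserved.
def disemvowel(text)
  text.split(/(\s+)/).map { |token| disemvowel_word(token) }.join
end

puts disemvowel('Shorten your text and easily tweet the results')
# Shrtn yr txt and esly twt th rslts
puts disemvowel('Meet at http://ruby.mn soon')
# Mt at http://ruby.mn sn
```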

If you have any other suggestions, please leave a comment and let me know.

Motivation, and predicting the future.

A few weeks ago I wrote about getting stuck, where motivation disappears and making progress seems impossible. I've continued to ponder what kinds of things contribute to this problem, since it's unpleasant and it seems so strange to me that I can continue to fall into a trap I know I don't like. So... here goes.

The issue

Satisfaction comes from solving some new problem. Frustration comes from investing lots of energy into something and ending up with nothing to show for the effort, and frequent frustration kills motivation.

It's the big problems which offer both the biggest payoffs (the satisfaction of solving something tricky) and the biggest dangers. The more often a given project throws up roadblocks and unexpected surprises, the harder it is to continue to motivate yourself to try again, to continue to work at it. And... you're stuck!

The point I want to add in this post is that, when you're starting out on a genuinely new project, it's usually impossible to tell if you're going to make steady progress or if you're going to hit unexpected roadblocks. You can't. I think this explains some of my own reluctance to dive back into my big projects which have gotten bogged down. It gets easier and easier to see these big problems as unsolvable, and that's how I get stuck.

The solution?

I come to the same conclusion as I did a few weeks ago: it's best to approach a big task as a series of smaller tasks. It's absolutely essential to be able to see what progress is being made. I think this is already well-known in terms of keeping product owners happy, but I also see how important it is for me as a developer. If I feel like I'm getting nowhere, it gets incrementally harder to summon the motivation to continue. So, set things up in ways that you can see progress, and prove to yourself that you're not just going round and round the same carousel.

There's a constant tension between enough planning and too much planning. You need some high-level picture to know you're making progress, and to know you're actually solving the right problems. But you very quickly reach a point where the value of additional planning drops sharply. More investigation beyond this point is paralysis by analysis. At some point you just have to get going and see how things work out. But I do believe there's definitely such a thing as not enough planning, and that can lead to the kinds of pitfalls I'm writing about.

As a final note... I think it can be a welcome relief sometimes to attack a problem which is entirely within your comfort zone. I think of my brain like a muscle. Muscles don't grow if they aren't stretched, even to the point of pain, but constant pain is counter-productive. It's not a sign of weakness to knock off a few easy tasks in the midst of the big stuff. There's no constant relationship between the difficulty of a project and the importance of a project. Some easy stuff is really really important. Fixing little bugs can make a big difference. And the little stuff can be a great way to recharge between (or in the midst of) the bigger stuff.

PHP array creation on first assignment

$test['key'] = 'value';
echo $test['key'];

I thought I had an error here, since the code is making an assignment to an array which hasn't yet been initialized. There was nothing in my error log, so I assumed that I hadn't set my error_reporting value high enough. But even after using E_ALL | E_STRICT, the highest level possible, there's still not so much as a peep.

As I've discovered, this is a perfectly legitimate way to initialize an array.

An existing array can be modified by explicitly setting values in it. This is done by assigning values to the array, specifying the key in brackets. The key can also be omitted, resulting in an empty pair of brackets ([]). If $arr doesn't exist yet, it will be created, so this is also an alternative way to create an array.

http://us2.php.net/manual/en/language.types.array.php

Well, heck, that's convenient.

Define Ganglia Custom Graphs Using JSON

Vladimir Vuksan's blog has a nice writeup on how to define new custom Ganglia graphs using JSON. I helped implement this, so it's nice to see how the project has continued to improve.

http://vuksan.com/blog/2011/02/20/json-representation-for-graphs-in-gang...
