Learning a little R

October 10, 2012 by alex

What’s a pirate’s favorite programming language? R! (Groan all you want. It only encourages me.)

Over the past few years, I’ve made a few half-hearted attempts to learn the R programming language. I made a little progress, but never really felt like I understood what I was doing. The language felt very alien compared to other languages I was used to, and its problem domain (statistics) was one I didn’t have much familiarity with.

I still don’t really know any statistics, but thanks to a few years with Ruby I understand functional programming a bit better than I used to. This definitely has made a big difference with R.

I decided to try a Coursera class on R, both to see how Coursera works and (hopefully) to get over the hump and really learn enough R that I can use it daily. Since I work with lots of data, I expect I’ll find plenty of places where it’s helpful.

Coursera seems well suited for this kind of learning. I have a clear goal in mind, and I’m motivated enough to do the work without anyone hounding me. Please note that neither of these things was true for 18-year-old Alex, so I don’t see Coursera as a replacement for a normal college in any way, shape or form. But the range of things you can learn on Coursera is pretty impressive, as are the instructors.

So far the R class is living up to expectations. One recent programming assignment:

Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0.

Whew, that’s a mouthful. I had to read that over several times before it started to make any sense, but it did eventually. The most impressive thing is that the solution is something like 4 lines of code. And, after a few hours of hacking, I think I actually understand the code I wrote. :)
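
For a sense of scale, here is a rough sketch of what a function like that might look like. It is not necessarily the solution written for the class. It assumes each monitor is a separate CSV file in the directory, with numeric columns named sulfate and nitrate. The function name corr and its arguments are made up for illustration.

```r
# Sketch only. Assumptions (not from the assignment text): one CSV file per
# monitor in `directory`, each with numeric columns "sulfate" and "nitrate".
corr <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  result <- numeric(0)  # returned as-is if no monitor meets the threshold
  for (f in files) {
    # na.omit() drops every row that has a missing value in any column,
    # leaving only the completely observed cases.
    complete <- na.omit(read.csv(f))
    if (nrow(complete) > threshold) {
      result <- c(result, cor(complete$sulfate, complete$nitrate))
    }
  }
  result
}
```

Called as corr("some_directory", 150), this would give one correlation per monitor with more than 150 complete rows, and an empty numeric vector if there are none. Most of the work is done by na.omit() and cor(), which is why the whole thing stays so short.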

☙ ☙ ☙