In my day job, parting the mists of time to predict the future, it is often useful to know what is happening now (or has been happening recently). Much data about the present and recent past does exist on-line, but it tends (either by accident or as a matter of policy) to be rather painful to extract by hand – and so a little bit of “code” to automate the process comes in very handy. Given that “the man” has de-staffed those with useful skills rather significantly, on the whole if you want a bit of “code” you had better start writing it yourself.
I used to code, back in the day (mostly my teenage days), so have had a punt at this using both Visual Basic and at the command-line in MS-DOS. I don’t actually know any VB, but I used Basic in the early 1980s on a TRS-80 and a BBC Micro and armed with that fading knowledge, some general views about how programming languages should work, a little macro capture, some guesswork and a bit of focused internet search I found I could write some code to scrape a website and compile the extracted data into a usable form. This code was very far from elegant, but it was a lot quicker than extracting the data by hand. As part of this process, I also discovered that, based on file naming conventions, I could tell when the person publishing data for a major European system operator took their holidays over the last two years – it can be surprising how much data we inadvertently give away on-line!
However, this is all rather painful and I decided it would be handy if I learned a proper scripting language so that I could write more effective scrapes (among other things). For the avoidance of doubt, I decided to do this for “fun” in my own time – well, who among us hasn’t been tempted? My original plan was to have a look at Ruby or Python, but Southampton library seemed rather weak in these languages. However, it did have a book on Perl which some part of my subconscious seemed to believe might be a scripting language. So, Fate having poked me with her stick, I came home with Learning Perl – aka the Llama Book – to have a look and see if it might serve my needs.
The good news was that my Mac has a version of Perl built-in, though it took a little fiddling to get it to actually run programmes. In my teenage days, even the most complex text editor did just that (if you were lucky), but now even a very basic text editor (like TextEdit) insists on adding all manner of bells and whistles to your putative programmes – and Perl really doesn’t like SmartQuotes or their ilk. Still, I am now happily using TextWrangler which outputs what you input and even colour codes my code, which does help avoid at least some common syntax errors.
I have rather fallen in love with Perl – mostly because it is just so naughty. It is the most transgressive programming language I have ever played with, breaking all the “rules” that more square languages insist upon – though it will play slightly less fast-and-loose if you switch on the strict pragma with “use strict” (but that would just spoil the fun). It also makes huge use of punctuation marks and other obscure characters which are dotted around your keyboard – so I am now becoming quite skilled at finding these. As a result, it can produce very compact and powerful – if hard to understand – code and is oft used by System Administrators (and now me). Learning it is proving great fun, partly aided by the enjoyable style of Learning Perl and its rather well chosen exercises. I have also found myself learning about Regular Expressions which have a substantial life well beyond Perl and which may improve my future internet searching. I think I have just about reached the stage where I can start tearing apart data files and reassembling them in a more useful form, so next week I shall be putting my learning to the test!
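For the curious, here is a minimal sketch of the sort of tearing apart I have in mind, with the strict pragma switched on (so rather less naughty than Perl can be). The line of data, its layout and the field names are all invented for illustration; a real scrape would need a pattern fitted to whatever the publisher actually produces.

```perl
use strict;     # variables must now be declared with "my"
use warnings;

# Hypothetical line of scraped data: date, series name, value.
my $line = "2013-02-14,Load,41250";

my %record;

# Each pair of parentheses captures one field into $1, $2, and so on.
if ($line =~ /^(\d{4})-(\d{2})-(\d{2}),(\w+),(\d+)$/) {
    %record = (
        year   => $1,
        month  => $2,
        day    => $3,
        series => $4,
        value  => $5,
    );

    # Reassemble the pieces in a more useful (day-first) form.
    print "$record{day}/$record{month}/$record{year} $record{series} = $record{value}\n";
}
```

The pattern is anchored at both ends with ^ and $, so a line that does not fit the assumed layout is simply skipped rather than half-matched.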
So enamoured am I with Perl that I have bought my own copy of the Llama Book (and am considering the Camel Book), but last night matters rose to a whole new level. Yes, last night (so far as I know) I had my first Perl dream – I cannot remember all the details, though I do recall a rather egregious error I was making and only noticed on waking: printing the number of elements in an array rather than its contents (fool!). So, I think it may be premature to offer my sleep-coding services to the world – but my more conscious coding should be rather better…
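I cannot reconstruct the dream code, but the mistake is easy enough to reproduce while awake. What follows is a small sketch with made-up numbers, not the actual offending programme: in Perl an array used where a single value is expected (scalar context) hands over its element count, while the same array interpolated into a string gives its contents.

```perl
use strict;
use warnings;

# Hypothetical data, purely for illustration.
my @readings = (12, 15, 9);

# Scalar context: the array yields the number of elements.
my $count = @readings;

# The dream mistake: string concatenation also forces scalar context,
# so this builds "Total: 3" rather than listing the readings.
my $oops = "Total: " . @readings;

# Interpolating the array in double quotes gives its contents,
# joined by spaces (the default list separator).
my $contents = "@readings";

print "count:    $count\n";
print "oops:     $oops\n";
print "contents: $contents\n";
```

Perl regards both forms as perfectly legal, which is presumably why it took waking up to spot the problem.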