Find files in a git project redux

In my previous post I described an update to an Emacs Anything source to “Find files in a git project”. This works great if you are inside a git-managed project, but fails horribly if you are not.

Here is a version that fixes that:

(defvar anything-c-source-git-project-files-cache nil "(path signature cached-buffer)")
(defvar anything-c-source-git-project-files
  '((name . "Files from Current GIT Project")
    (init . (lambda ()
              (let* ((git-top-dir (magit-get-top-dir (if (buffer-file-name)
                                                         (file-name-directory (buffer-file-name))
                                                       default-directory)))
                     (top-dir (if git-top-dir
                                  (file-truename git-top-dir)
                                default-directory))
                     (default-directory top-dir)
                     (signature (magit-rev-parse "HEAD")))

                (unless (and anything-c-source-git-project-files-cache
                             (third anything-c-source-git-project-files-cache)
                             (equal (first anything-c-source-git-project-files-cache) top-dir)
                             (equal (second anything-c-source-git-project-files-cache) signature))
                  (if (third anything-c-source-git-project-files-cache)
                      (kill-buffer (third anything-c-source-git-project-files-cache)))
                  (setq anything-c-source-git-project-files-cache
                        (list top-dir
                              signature
                              (anything-candidate-buffer 'global)))
                  (with-current-buffer (third anything-c-source-git-project-files-cache)
                    (dolist (filename (mapcar (lambda (file) (concat default-directory file))
                                              (magit-git-lines "ls-files")))
                      (insert filename)
                      (newline))))
                (anything-candidate-buffer (third anything-c-source-git-project-files-cache)))))

    (type . file)
    (candidates-in-buffer)))

As a diff from the previous version:

@@ -2,9 +2,12 @@
 (defvar anything-c-source-git-project-files
   '((name . "Files from Current GIT Project")
     (init . (lambda ()
-              (let* ((top-dir (file-truename (magit-get-top-dir (if (buffer-file-name)
-                                                                    (file-name-directory (buffer-file-name))
-                                                                  default-directory))))
+              (let* ((git-top-dir (magit-get-top-dir (if (buffer-file-name)
+                                                         (file-name-directory (buffer-file-name))
+                                                       default-directory)))
+                     (top-dir (if git-top-dir
+                                  (file-truename git-top-dir)
+                                default-directory))
                      (default-directory top-dir)
                      (signature (magit-rev-parse "HEAD")))
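Stripped of the Anything plumbing, the source does two things. It caches the file list, keyed on the project root and the HEAD commit. With this fix, it also falls back to the current directory when there is no git root.

Here is a rough Python analogue of that logic, for illustration only. `ProjectFiles` and `git_lines` are hypothetical names. Returning an empty list outside a repository is an assumption of this sketch, not necessarily what Magit does.

```python
import os
import subprocess


def git_lines(args, cwd):
    """Run git in cwd and return its output lines, or None if git fails."""
    try:
        result = subprocess.run(["git", *args], cwd=cwd, capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None  # e.g. git not installed, or cwd is not inside a repository
    return result.stdout.splitlines()


class ProjectFiles:
    """File list for the project containing a directory, cached on (root, HEAD)."""

    def __init__(self, run=git_lines):
        self.run = run      # injectable, so the caching logic can be tried without git
        self.key = None
        self.files = []

    def get(self, start_dir):
        top = self.run(["rev-parse", "--show-toplevel"], start_dir)
        # The redux fix: with no git top-level directory, fall back to start_dir
        root = os.path.realpath(top[0]) if top else start_dir
        head = self.run(["rev-parse", "HEAD"], root)
        key = (root, head[0] if head else None)
        if key != self.key:  # rebuild only when the root or the HEAD commit changed
            listed = self.run(["ls-files"], root) or []
            self.files = [os.path.join(root, name) for name in listed]
            self.key = key
        return self.files
```

Injecting the git runner keeps the cache logic testable without a real repository. As in the Emacs version, the list is only rebuilt when the root or HEAD changes. Files added to the index after the list was built won't show up until HEAD moves.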

Emacs Anything - Find files in a git project

If you use Emacs, you really should take a look at Anything. When you do, you’ll probably want to use it to replicate TextMate’s fabled “Go to file…”. Ken Wu wrote a nice little Anything source that uses git to derive a file list for a project, but he was obviously using an older version of Magit. Here’s a tweaked version of his code that works with Magit v1.1.1:

(defvar anything-c-source-git-project-files-cache nil "(path signature cached-buffer)")
(defvar anything-c-source-git-project-files
  '((name . "Files from Current GIT Project")
    (init . (lambda ()
              (let* ((top-dir (file-truename (magit-get-top-dir (if (buffer-file-name)
                                                                    (file-name-directory (buffer-file-name))
                                                                  default-directory))))
                     (default-directory top-dir)
                     (signature (magit-rev-parse "HEAD")))

                (unless (and anything-c-source-git-project-files-cache
                             (third anything-c-source-git-project-files-cache)
                             (equal (first anything-c-source-git-project-files-cache) top-dir)
                             (equal (second anything-c-source-git-project-files-cache) signature))
                  (if (third anything-c-source-git-project-files-cache)
                      (kill-buffer (third anything-c-source-git-project-files-cache)))
                  (setq anything-c-source-git-project-files-cache
                        (list top-dir
                              signature
                              (anything-candidate-buffer 'global)))
                  (with-current-buffer (third anything-c-source-git-project-files-cache)
                    (dolist (filename (mapcar (lambda (file) (concat default-directory file))
                                              (magit-git-lines "ls-files")))
                      (insert filename)
                      (newline))))
                (anything-candidate-buffer (third anything-c-source-git-project-files-cache)))))

    (type . file)
    (candidates-in-buffer)))

I tried to update the Emacs Wiki page to include this fix, but couldn’t; I’m not sure what I was doing wrong. Here are the changes I made to Ken’s code:

@@ -6,7 +6,7 @@
                                                                     (file-name-directory (buffer-file-name))
                                                                   default-directory))))
                      (default-directory top-dir)
-                     (signature (magit-shell (magit-format-git-command "rev-parse --verify HEAD" nil))))
+                     (signature (magit-rev-parse "HEAD")))
 
                 (unless (and anything-c-source-git-project-files-cache
                              (third anything-c-source-git-project-files-cache)
@@ -20,10 +20,14 @@
                               (anything-candidate-buffer 'global)))
                   (with-current-buffer (third anything-c-source-git-project-files-cache)
                     (dolist (filename (mapcar (lambda (file) (concat default-directory file))
-                                              (magit-shell-lines (magit-format-git-command "ls-files" nil))))
+                                              (magit-git-lines "ls-files")))
                       (insert filename)
                       (newline))))
                 (anything-candidate-buffer (third anything-c-source-git-project-files-cache)))))

Chef doesn't lock node data when updating

We came across an exciting Chef bug today.

Chef tracks metadata about nodes in its database. This includes operational facts about the node (uptime, memory, etc.) and Chef-related details such as when the node last checked in. It also includes intentional data, such as which run list should be applied to the node.

Periodically, a node polls its server for updates. What happens is:

  • node checks in with server

  • node gets current metadata from server, including its run list of recipes and roles

  • node performs actions as per the run list

  • node saves its metadata back to the server, including the run list it just applied

All well and good, except that the third step (performing the actions) can take a long time. That leaves plenty of time for an administrator to change the node’s desired run list (or other intentional metadata) using the knife tool or the web interface. But when the node’s run completes, it saves its old state back to the server, overwriting whatever changes the administrator made while it was running. And you won’t know unless you look.

This is unfortunate.
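To make the race concrete, here is a toy Python model of it. `NodeStore`, `save_unconditional` and `save_if_unchanged` are hypothetical names, not Chef’s API.

The first save method behaves like the last-writer-wins update described above. The second sketches optimistic locking (a version check on write), which is one common way to detect this kind of lost update. It is not necessarily how Chef will fix it.

```python
class NodeStore:
    """Hypothetical stand-in for the server's copy of one node's metadata."""

    def __init__(self):
        self.data = {"run_list": ["recipe[base]"], "uptime": 0}
        self.version = 0

    def get(self):
        # Hand out a copy plus the version it was read at.
        return dict(self.data), self.version

    def save_unconditional(self, data):
        # Last writer wins: anything saved in between is silently overwritten.
        self.data = dict(data)
        self.version += 1

    def save_if_unchanged(self, data, read_version):
        # Optimistic locking: refuse the write if someone saved since we read.
        if read_version != self.version:
            return False
        self.data = dict(data)
        self.version += 1
        return True


def lost_update():
    store = NodeStore()
    node_copy, _ = store.get()                # node checks in and fetches metadata
    admin_copy, _ = store.get()               # meanwhile, an admin edits the run list
    admin_copy["run_list"] = ["recipe[base]", "role[web]"]
    store.save_unconditional(admin_copy)
    node_copy["uptime"] = 3600                # the long run finishes
    store.save_unconditional(node_copy)       # node saves its stale copy
    return store.data["run_list"]


def with_version_check():
    store = NodeStore()
    node_copy, node_version = store.get()
    admin_copy, admin_version = store.get()
    admin_copy["run_list"] = ["recipe[base]", "role[web]"]
    store.save_if_unchanged(admin_copy, admin_version)
    node_copy["uptime"] = 3600
    saved = store.save_if_unchanged(node_copy, node_version)  # stale version: rejected
    return saved, store.data["run_list"]


print(lost_update())         # ['recipe[base]']  (the admin's role[web] is gone)
print(with_version_check())  # (False, ['recipe[base]', 'role[web]'])
```

With the version check, the node’s stale save is rejected instead of silently winning. The node would then have to re-fetch and merge or retry, which this sketch leaves out.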

There’s a bug in the project’s tracker that more or less describes this. It was raised quite recently, so hopefully someone from the Chef team will take a look at it soon. There’s also a thread on the Chef mailing list.

Jekyll archives grouped by date

One thing Jekyll doesn’t provide out of the box (as far as I can tell) is any sort of archive functionality. (Aside: I really like what Tumblr does for archives.)

I would have liked something a bit more flexible, but for now this site’s archive displays a list of all entries grouped by year. Here’s the template code I’m using:

<h2>Archives</h2>
<ul>
  {% for post in site.posts %}

    {% unless post.next %}
      <h3>{{ post.date | date: '%Y' }}</h3>
    {% else %}
      {% capture year %}{{ post.date | date: '%Y' }}{% endcapture %}
      {% capture nyear %}{{ post.next.date | date: '%Y' }}{% endcapture %}
      {% if year != nyear %}
        <h3>{{ post.date | date: '%Y' }}</h3>
      {% endif %}
    {% endunless %}

    <li>{{ post.date | date:"%b" }} <a href="{{ post.url }}">{{ post.title }}</a></li>
  {% endfor %}
</ul>

which was shamelessly ripped off from http://blog.tracefunc.com/2009/12/04/jekyll-custom-liquid-tags/
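The template’s trick is to compare each post’s year with that of its neighbour in the loop. Since `site.posts` is ordered newest first, `post.next` is the next newer post, and a heading is printed whenever the year changes.

Here is the same logic as a small Python sketch, just to make the control flow explicit. `archive_lines` is a hypothetical helper, and the sample posts are borrowed from the `_posts` listing further down.

```python
from datetime import date


def archive_lines(posts):
    """Group (date, title) pairs under year headings, like the Liquid template.

    posts must already be sorted newest first, the order of Jekyll's site.posts.
    A heading is emitted for the first post and whenever a post's year differs
    from the year of the newer post just before it.
    """
    lines = []
    for i, (post_date, title) in enumerate(posts):
        newer = posts[i - 1][0] if i > 0 else None  # plays the role of post.next
        if newer is None or newer.year != post_date.year:
            lines.append(str(post_date.year))
        lines.append(f"  {post_date:%b} {title}")
    return lines


posts = [
    (date(2010, 8, 4), "migrating-blog"),
    (date(2010, 1, 2), "headache"),
    (date(2010, 1, 1), "happy-new-year"),
    (date(2009, 4, 19), "something-else"),
    (date(2009, 4, 1), "an-interesting-story"),
]
print("\n".join(archive_lines(posts)))
```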

Importing a Blosxom blog into Jekyll

As mentioned, I recently decided to move my blog from a self-hosted, Blosxom-driven, mostly manual setup to github pages.

This involved these main steps:

  • Set up a github repo to hold the templates and source text
  • Migrate templates from Blosxom’s templating language to Jekyll/Liquid
  • Import the content

I won’t cover the first two in detail here. Setting up a repository for pages is well documented by github, and migrating the templates was relatively straightforward: I used the code behind Simon Harris’s blog as a starting point. (Getting the archive page working was slightly more interesting. I’ll write more on this later.)

There were two parts to importing the content. First, the directory layout expected by Jekyll is slightly different from the one I was using in Blosxom.

Here is what I had:

.
|-- 2009
|   `-- 04
|       |-- an-interesting-story.txt
|       `-- something-else.txt
|-- 2010
|   |-- 01
|   |   |-- happy-new-year.txt
|   |   `-- headache.txt
|   `-- 08
|       `-- migrating-blog.txt

Jekyll wants a much flatter directory layout, with all the files in a single directory and the date as part of the file name:

.
`-- _posts
    |-- 2009-04-01-an-interesting-story.md
    |-- 2009-04-19-something-else.md
    |-- 2010-01-01-happy-new-year.md
    |-- 2010-01-02-headache.md
    `-- 2010-08-04-migrating-blog.md

The catch was that Jekyll wants a full date, including the day, but my Blosxom file structure only encoded the year and month. Luckily, I was using the Blosxom entries_index plugin, which stores a Unix-style timestamp for every entry it publishes. So I wrote a little Clojure program to read the entries_index cache and derive a Jekyll-style file name for every entry:

(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.duck-streams)

(import 'java.util.Date 'java.text.SimpleDateFormat)

(def entry-index
  (read-lines (first *command-line-args*)))

(defn parse-line [line]
  (let [[_ filename timestamp] (re-matches #".*'(.+)'.*\s+(\d+).*" line)]
    {:filepath filename :timestamp timestamp}))

(defn date [timestamp] (Date. (* 1000 (Long/valueOf timestamp))))
(defn date-str [date] (. (SimpleDateFormat. "yyyy-MM-dd") format date))
(defn filename [path] (last (re-split #"/" path)))
(defn md-ext [s] (re-sub #"\.txt$" ".md" s)) ;; escaped dot: only a literal ".txt" suffix
(defn valid? [entry] (not (nil? (:timestamp entry))))

(defn target-file-name [entry]
  (str (date-str (date (entry :timestamp))) "-" (md-ext (filename (entry :filepath)))))

(def entries (filter valid? (map parse-line entry-index)))

(defn copy-command [entry]
  (str "cp " (entry :filepath) " " (target-file-name entry)))

(println (str-join "\n" (map copy-command entries)))

Note that this program doesn’t actually move anything; it just prints a series of “cp” commands that you can feed into a shell.
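For readers who don’t speak Clojure, here is a rough Python equivalent of the same mapping. It assumes, as the Clojure regex does, that each useful entries_index line contains a single-quoted path followed by whitespace and an unquoted Unix timestamp. The exact cache format is an assumption here.

Unlike the original, it quotes both paths with `shlex.quote`, so the generated `cp` commands survive spaces in file names.

```python
import re
import shlex
import sys
from datetime import datetime, timezone

# Assumed shape of a useful entries_index line, e.g.
#   '/blog/2009/04/something-else.txt' => 1240142400,
# i.e. a single-quoted path, then whitespace and an unquoted Unix timestamp
# (the same thing the Clojure regex looks for).
LINE_RE = re.compile(r".*'(.+)'.*\s+(\d+).*")


def parse_line(line):
    match = LINE_RE.fullmatch(line)
    if match is None:
        return None  # not an entry line, e.g. the surrounding braces
    return match.group(1), int(match.group(2))


def target_file_name(path, timestamp, tz=None):
    # tz=None means local time, as with SimpleDateFormat in the Clojure version;
    # pass tz=timezone.utc for UTC dates.
    day = datetime.fromtimestamp(timestamp, tz=tz).strftime("%Y-%m-%d")
    base = re.sub(r"\.txt$", ".md", path.rsplit("/", 1)[-1])
    return f"{day}-{base}"


def copy_commands(lines, tz=None):
    commands = []
    for line in lines:
        parsed = parse_line(line.rstrip("\n"))
        if parsed is not None:
            path, timestamp = parsed
            target = target_file_name(path, timestamp, tz)
            # Quote both arguments so paths with spaces survive the shell
            commands.append(f"cp {shlex.quote(path)} {shlex.quote(target)}")
    return commands


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as index:
        print("\n".join(copy_commands(index)))
```

As with the Clojure program, you would redirect the output to a file, look it over, and only then run it with `sh`.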

The second part was to add a block of YAML “front matter” to each file, which Jekyll reads to decide how to process the file and generate the appropriate output. This front matter is of the form:

---
layout: post
title: Blog migration
---

This tells Jekyll which template to use, and what to use for a title. The Blosxom source files don’t contain any such front matter, but they do have the post’s title as their first line. A simple bit of sed turned that first line into the front matter:

1,1 s/\([^-].*\)/---\
layout: post\
title: \1\
---/g

I invoked it like this:

# -i "" is BSD (OS X) sed syntax: edit in place without keeping a backup
for f in _posts/*; do
    sed -f ~/Projects/migrate-blosxom-to-jekyll/insert_front_matter.sed -i "" "$f"
done
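The `-i ""` form is specific to BSD sed as shipped with OS X; GNU sed would treat the empty string as an input file name. If you’d rather not depend on either, here is a minimal Python sketch of the same transformation.

`add_front_matter` is a hypothetical helper. Like the sed script, it leaves alone any file whose first line starts with `-`, so running it twice is harmless.

```python
import sys
from pathlib import Path


def add_front_matter(text, layout="post"):
    """Turn a Blosxom post's first line (its title) into Jekyll front matter.

    Mirrors the sed script: only line 1 is rewritten, and it is left alone if it
    is empty or starts with '-' (for example, a file already beginning with '---').
    """
    first, sep, rest = text.partition("\n")
    if not first or first.startswith("-"):
        return text
    return f"---\nlayout: {layout}\ntitle: {first}\n---{sep}{rest}"


def main(paths):
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        path.write_text(add_front_matter(text), encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])  # e.g. python3 front_matter.py _posts/*  (hypothetical name)
```

Neither this sketch nor the sed script quotes the title. A title containing “: ” may therefore need quoting by hand to be valid YAML.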

And that was more or less that! The above code is available on github at http://github.com/mrowe/migrate-blosxom-to-jekyll, and of course the entire content of my blog is at http://github.com/mrowe/mrowe.github.com.

Blog migration

And in the latest installment in an ongoing tradition… I’ve moved my blog! This time, to github pages. Now they can worry about keeping servers running, generating HTML from my text when I commit, and all of those little details.

The migration was relatively painless; more details on the mechanics to follow. But if you can see this, it worked!

(Aside: all the templates and content that run this blog are available on github.)

First adventures in Clojure

I’ve been banging on to anyone who’d listen for ages now about how Clojure is going to be the Next Big Thing. I read a fair way into Stuart Halloway’s Programming Clojure, and I played in the REPL a bit here and there, but I never got around to doing anything serious with it.

Today I finally found an excuse to use Clojure at work for a real-world problem. I needed to write a small program to read a product feed in CSV format, and cross-check that all the products in the feed actually exist in the live product catalogue database.

Here is my somewhat naïve attempt at implementing a solution:

;;
;; Read a CSV file and look up the product ids it contains in a
;; database. Report all the products in the CSV that do not exist in
;; the database.
;;
;; Usage: $0 <path-to-csv-file>
;;

(import 'java.io.FileReader 'au.com.bytecode.opencsv.CSVReader)

(use 'clojure.contrib.str-utils)
(use 'clojure.contrib.sql)

;; OpenCSV gives us a List of String[]s... ugh.
(defn read-csv [file-name]
  (with-open [reader (CSVReader. (FileReader. file-name))]
     (rest ;; skip the header row
      (map seq (seq (. reader readAll))))))

;; extract interesting fields from a CSV row
(defn product-from [row]
  {:product-id (nth row  0 "")
   :title      (nth row  1 "")})

;; set up the db connection
(def db {:classname   "org.h2.Driver"
         :subprotocol "h2"
         :subname (str "file:///Users/mrowe/.h2data/mydata")
         :user     "sa"
         :password ""})

(defn sql-query [q]
  (with-query-results res q (doall res)))

(defn count-products [product-id]
  (:count
   (first
    (sql-query ["select count(1) as count from product where id = ?" product-id]))))

(defn exists? [product-id]
   (>= (count-products product-id) 1))

(defn product-missing? [csv-row]
  (let [product (product-from csv-row)]
    (not (exists? (product :product-id)))))

;;;;;;;;;;

(def filename (first *command-line-args*))
(def feed (read-csv filename))

(defn report-product-id [row]
  (let [product (product-from row)]
    (format "Not in product catalog: %s - %s" (product :product-id) (product :title))))

(with-connection db 
  (println (str-join "\n" (map report-product-id (filter product-missing? feed)))))

This was purely an exercise in thinking functionally, and in figuring out the basics of driving Clojure and getting it to interact with the world around it. I’ve made no attempt to use one of Clojure’s headline features, concurrency. (For what it’s worth, it happily processes an input of 2500 rows in a few seconds, most of which is spent in the database; I doubt there’s much to be gained from parallelising it.) But I think it reads pretty well, and is at least as concise and expressive as the equivalent Ruby would have been, once you learn to see through all the parentheses. ;-)
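Since most of the run time goes to the database, one alternative worth sketching is to fetch all product ids in a single query and compare in memory. This avoids issuing one `count` query per row.

This Python sketch uses the standard `csv` and `sqlite3` modules in place of OpenCSV and H2. It assumes a `product` table whose `id` column holds the same string values as the feed’s first column. It illustrates the batching idea and is not a drop-in replacement.

```python
import csv
import sqlite3
import sys


def missing_products(csv_lines, conn):
    """Return (product_id, title) for feed rows whose id isn't in the product table."""
    reader = csv.reader(csv_lines)
    next(reader, None)  # skip the header row
    # One query for all ids instead of one COUNT query per feed row.
    # Assumes product.id holds the same strings as the feed's first column.
    known = {row[0] for row in conn.execute("SELECT id FROM product")}
    missing = []
    for row in reader:
        product_id = row[0] if len(row) > 0 else ""
        title = row[1] if len(row) > 1 else ""
        if product_id not in known:
            missing.append((product_id, title))
    return missing


def main(csv_path, db_path):
    conn = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="") as feed:
            for product_id, title in missing_products(feed, conn):
                print(f"Not in product catalog: {product_id} - {title}")
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

This trades one round trip per row for holding every id in memory, which is fine for a catalogue of modest size. For a very large table, a join against a temporary table of feed ids would be the next thing to try.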

Let me know what you think!

Update: I’ve put the above code on github: http://gist.github.com/505633

Using VMware Fusion shared folders with a Linux guest

VMware Fusion has a “shared folders” feature which allows you to seamlessly share folders on the host Mac system with the virtualised guest OS. With a Linux guest, vmware-tools will install the “Host-Guest File System” (hgfs) driver and add an entry to /etc/fstab to automagically mount all shared folders under /mnt/hgfs.

This is great, but unless your user id in the Linux guest happens to match your user id on OS X, you will not be able to access the mounted directories as a regular user. Luckily, you can get the hgfs driver to mount the shared folders as your user. Edit /etc/fstab as root:

$ sudo vi /etc/fstab

and look for a section like:

# Beginning of the block added by the VMware software
.host:/ /mnt/hgfs vmhgfs defaults,ttl=5 0 0
# End of the block added by the VMware software

Add options for uid and gid:

# Beginning of the block added by the VMware software
.host:/ /mnt/hgfs vmhgfs defaults,ttl=5,uid=1000,gid=1000 0 0
# End of the block added by the VMware software

The values I’ve used, 1000 for uid and gid, are the defaults for the first user created on an Ubuntu desktop install. To find the correct values for your user, run the id command in the guest OS:

$ id
uid=1000(mrowe) gid=1000(mrowe) groups=...
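If you need to make this change on several guests, it’s easy enough to script. The Python sketch below uses a hypothetical `add_owner_options` helper.

The helper appends `uid=`/`gid=` to the options field of any `vmhgfs` line and leaves every other line untouched. The ids default to those of the user running it. It only transforms text, so you would still write the result back to /etc/fstab as root.

```python
import os


def add_owner_options(line, uid=None, gid=None):
    """Add uid=/gid= options to a vmhgfs line from /etc/fstab.

    Comments, blank lines and entries for other file systems are returned as-is.
    uid/gid default to the ids of the user running the script.
    """
    fields = line.split()
    # fstab fields: device, mount point, type, options, dump, pass
    if line.lstrip().startswith("#") or len(fields) < 4 or fields[2] != "vmhgfs":
        return line
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    # Drop any existing uid=/gid= so running this twice doesn't duplicate them
    options = [o for o in fields[3].split(",") if not o.startswith(("uid=", "gid="))]
    options += [f"uid={uid}", f"gid={gid}"]
    fields[3] = ",".join(options)
    return " ".join(fields)


print(add_owner_options(".host:/ /mnt/hgfs vmhgfs defaults,ttl=5 0 0", 1000, 1000))
# .host:/ /mnt/hgfs vmhgfs defaults,ttl=5,uid=1000,gid=1000 0 0
```

Under sudo, `os.getuid()` returns 0, so pass the ids reported by `id` explicitly if you run it as root.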

Filtering lists

Recently, my friend Gav wrote about using the STL to filter a vector of values in C++, explaining a surprising gotcha along the way. I’m sure he knows what he’s talking about, but it struck me how ugly this (presumably idiomatic) code was. So I figured I’d see what it would look like in a few more “modern” languages:

Ruby

>> numbers = 1..9
=> 1..9
>> numbers.reject { |n| n.even? }
=> [1, 3, 5, 7, 9]

Or, if you skip the separate assignment of the input data:

>> (1..9).reject { |n| n.even? }
=> [1, 3, 5, 7, 9]

Python

>>> numbers = range(1,10)
>>> [n for n in numbers if n % 2]
[1, 3, 5, 7, 9]

or

>>> [n for n in range(1, 10) if n % 2]
[1, 3, 5, 7, 9]
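Two more Python variants follow. The first is the built-in `filter`, which reads much like the Clojure version below. The second filters a list in place with slice assignment, the closest Python analogue to shrinking an existing C++ vector. The variable names are just for illustration.

```python
numbers = list(range(1, 10))

# filter() with a predicate, much like Clojure's (filter odd? ...).
# In Python 3 it returns a lazy iterator, hence the list() call.
odds = list(filter(lambda n: n % 2, numbers))

# Filtering in place: slice assignment replaces the contents of the
# existing list object rather than binding a new list to the name.
alias = numbers
numbers[:] = [n for n in numbers if n % 2]

print(odds)              # [1, 3, 5, 7, 9]
print(alias is numbers)  # True: everything holding a reference sees the change
print(alias)             # [1, 3, 5, 7, 9]
```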

Clojure

user=> (def numbers (range 1 10))
#'user/numbers
user=> (filter odd? numbers)
(1 3 5 7 9)

or

user=> (filter odd? (range 1 10))
(1 3 5 7 9)

Yeah, I get that this wasn’t the point of the original post; sometimes you’re just stuck with C++. But if you do have the choice, other languages can be far more expressive for this common kind of list processing.

If you have examples in other languages (or improvements to my efforts) send them in and I’ll post them here.

Update: From Julian Doherty:

Erlang

1> Numbers = lists:seq(1,9).
[1,2,3,4,5,6,7,8,9]
2> [X || X <- Numbers, X rem 2 =/= 0].
[1,3,5,7,9]

Update: From Ben MacLeod:

C#

using System;
using System.Linq;

// ...

    var numbers = Enumerable.Range(1, 9).Where(n => n % 2 != 0);  // Range(start, count) yields 1..9
    // or, equivalently:
    //var numbers = (from n in Enumerable.Range(1, 9) where n % 2 != 0 select n);
    foreach(var number in numbers) {
        Console.WriteLine(number);
    }

// ...

Update: From John Carney:

PHP

PHP 5.2

function not_even($x) {
    return $x & 1 ;
}

$numbers = array(1, 2, 3, 4, 5, 6, 7, 8, 9) ;
$numbers = array_filter($numbers, "not_even") ;

PHP 5.3 (which added anonymous functions)

$numbers = array(1, 2, 3, 4, 5, 6, 7, 8, 9) ;
$numbers = array_filter($numbers, function($x) { return $x & 1 ; }) ;

Enabling git bash completion on OS X

Bash completion, the magic that lets you start typing the name of a file, directory, etc. in bash and then press TAB to complete it, can be taught new tricks, including knowing about your git repository. But if you’re on a Mac, the magic is not installed by default.

If you are running git from MacPorts, you probably don’t have the bash_completion variant installed. You can install it with:

sudo port install git-core +bash_completion

If you do already have git installed without this variant, you’ll probably need to deactivate it first:

sudo port deactivate git-core

Then reinstall with the variants you need:

sudo port install git-core +bash_completion +gitweb +svn +doc

You can then activate completion by adding the following to your ~/.bash_profile:

if [ -f /opt/local/etc/bash_completion ]; then
    . /opt/local/etc/bash_completion
fi

Thanks to Denis Barushev for this tip.