Ironclad hashes in SBCL

Getting a SHA1 digest from Ironclad turned out to be a quite bumpy ride; its “convenience functions” are not convenient enough to take a string, they only operate on octets.

There’s a helper function that converts an ASCII string to octets, but I needed to be able to supply Unicode strings as well. I ended up finding STRING-TO-OCTETS:

(defun sha1 (str)
  (ironclad:byte-array-to-hex-string
    (ironclad::digest-sequence :sha1 (SB-EXT:STRING-TO-OCTETS str))))

If someone knows an easier or more portable way, I’m all for it. I’m also interested in other free implementations’ functions to convert their strings to octets.

Function encapsulation in SBCL

The advice functionality allows the programmer to replace or encapsulate an existing function binding (akin to :AROUND methods in the CLOS). See Gary King’s “What is advice” for some useful links and explanations on this.

CLISP and SBCL, my favourite Common Lisp implementations, do not seem to supply this functionality. I haven’t checked whether CLISP has advice under its hood, but SBCL does, and it’s termed “Function encapsulation” there. Once you know how to do it, it’s surprisingly simple to use.

This is the definition of SB-INT:ENCAPSULATE in SBCL’s src/code/fdefinition.lisp:

;;; Replace the definition of NAME with a function that binds NAME's
;;; arguments to a variable named ARG-LIST, binds name's definition
;;; to a variable named BASIC-DEFINITION, and evaluates BODY in that
;;; context. TYPE is whatever you would like to associate with this
;;; encapsulation for identification in case you need multiple
;;; encapsulations of the same name.
(defun encapsulate (name type body)
  (let ((fdefn (fdefinition-object name nil)))
    (unless (and fdefn (fdefn-fun fdefn))
      (error 'undefined-function :name name))
    ;; We must bind and close over INFO. Consider the case where we
    ;; encapsulate (the second) an encapsulated (the first)
    ;; definition, and later someone unencapsulates the encapsulated
    ;; (first) definition. We don't want our encapsulation (second) to
    ;; bind basic-definition to the encapsulated (first) definition
    ;; when it no longer exists. When unencapsulating, we make sure to
    ;; clobber the appropriate INFO structure to allow
    ;; basic-definition to be bound to the next definition instead of
    ;; an encapsulation that no longer exists.
    (let ((info (make-encapsulation-info type (fdefn-fun fdefn))))
      (setf (fdefn-fun fdefn)
            (named-lambda encapsulation (&rest arg-list)
              (declare (special arg-list))
              (let ((basic-definition (encapsulation-info-definition info)))
                (declare (special basic-definition))
                (eval body)))))))

From this we can figure out that basic advice can be gotten from SBCL like this:

CL-USER(19): (defun gorm (x) (1+ x))
GORM
 
CL-USER(20): (SB-INT:ENCAPSULATE 'gorm 'identity '(apply sb-int:basic-definition sb-int:arg-list))
 
#<closure (SB-C::&OPTIONAL-DISPATCH SB-IMPL::ENCAPSULATION) {ACE428D}>
CL-USER(21): (gorm 10)
11
 
CL-USER(22): (SB-INT:ENCAPSULATE 'gorm 'add-five '(+ 5 (apply sb-int:basic-definition sb-int:arg-list)))
#</closure><closure (SB-C::&OPTIONAL-DISPATCH SB-IMPL::ENCAPSULATION) {ACEE31D}>
 
CL-USER(23): (gorm 10)
16
 
CL-USER(24): (SB-INT:ENCAPSULATE 'gorm 'add-seven '(+ 7 (apply sb-int:basic-definition sb-int:arg-list)))
 
#</closure><closure (SB-C::&OPTIONAL-DISPATCH SB-IMPL::ENCAPSULATION) {ACF84ED}>
CL-USER(25): (gorm)
23
</closure>

We have built a pretty onion, each layer of which calls the next inner layer via BASIC-DEFINITION (the analog to CALL-NEXT-METHOD). Let’s peel specific layers from it:

CL-USER(26): (SB-INT:UNENCAPSULATE 'gorm 'add-five)
T
 
CL-USER(27): (gorm 10)
18
 
CL-USER(28): (SB-INT:UNENCAPSULATE 'gorm 'add-seven)
T
 
CL-USER(29): (gorm 10)
11

A kind of magic

One often has to checking file uploads for correctness, for example with respect to size, file type or file name.

Here’s a sketch for checking the type of image files:

(defun matches-magic (file magic &optional (offset 0))
    (with-open-file (s file :element-type '(unsigned-byte 8))
      (file-position s offset)
      (loop for c across magic
            unless (eql c (code-char (read-byte s)))
            do (return-from matches-magic))
      t))
 
(defun jpeg-p (file) ; won't catch Exif files with JPEG inside
  (matches-magic file "JFIF" 6))
 
(defun png-p (file)
  (matches-magic file "PNG" 1))
 
(defun gif-p (file)
  (matches-magic file "GIF89a"))
 
(defun canonical-image-extension (file)
  (cond
    ((png-p file) "png")
    ((gif-p file) "gif")
    ((jpeg-p file) "jpeg")))
 
(defmacro any-predicate (preds &rest args)
    `(or ,@(loop for p in preds
                 collect `(,p ,@args))))
 
(defun valid-image-p (file)
  (any-predicate (jpeg-p png-p gif-p) file))

ANY-PREDICATE could also be written as a function (with a slightly different form of arguments), here’s another quick draft:

(defun any-predicate (preds &rest args) ; largely untested
  (some #'identity (mapcar (lambda (x) (apply x args))  preds)))
 
(defun valid-image-p (file)
  (any-predicate (list #'jpeg-p #'png-p #'gif-p #'exif-p) file))

Of course, you could also chain SYMBOL-FUNCTION to get rid of the sharp-signs in the call. Whatever suits you.
I like the macro better, though, since it’s clearer and probably more efficient. Update: see below for a comment by Zach Beane on this.

Homework would be writing a simple DSL to jot down file type data:

(define-file-type "jpeg" JFIF 6)

Alternatives from the outer world would be calling file(1) or parsing magic(4).

Escaping from higher-order functions

Among the different ways to tackle iterative processes I consider to be the higher-order function route the one I use most often. Especially in conjunction with function composition and list predicates it makes great filtering-style code.

A problem that came up several times is interrupting the list processing at an arbitrary point. For example, how do I stop MAPCAR at the third item if I see it fit? Something like that:

(let ((i 0))
  (mapcar (lambda (x)
            (when (eql (incf i) 3)
              (STOP-HERE)))
    '(a b c d)))

Sometimes, one can take alternative routes using FIND or other functions. And there’s always LOOP, DOLIST, recursion and other solutions; after all we’re working in a language that really implements the “there’s more than one way to do it” paradigm.

Using TAGBODY and GO:

(tagbody (mapcar (lambda (x) (go X)) '(a b c)) X)

Using BLOCK/RETURN-FROM:

(block X (mapcar (lambda (x) (return-from X))(a b c)))

I originally posted another version using CATCH/THROW, but as pointed out in the comments this wasn’t all well for several reasons.

Delimited continuations with CL-CONT unfortunately won’t work.

A good alternative solution would be writing your own version of MAPCAR; this has the advantage of being able to return the partially processed list. But there’s the disadvantage of having to rewrite the whole family of mapping functions, though, so for quick prototyping or occasional usage I prefer the above solution.

If you have other neat solutions or see problems with this approach, please leave a comment.

Analyzing return values with a recursive macro

Printing the argument and return values of a set of nested or sequential expressions is a common debugging tactic.

In Common Lisp we can make use of a recursive macro instead of manually inserting printing statements. Specifically, we want our macro (let’s call it WALK for want of a better name since it’s a simple code walker) to print all the return values it encounters along its way:

(walk (list (+ 5 (* 3 3))
            "Welcome to earth, third rock from the sun!"))
 
(* 3 3) => 9
(+ 5 (* 3 3)) => 14
(LIST (+ 5 (* 3 3)) "Welcome to earth, third rock from the sun!")
  => (14 "Welcome to earth, third rock from the sun!")

The following macro does that job.

(defmacro walk (form)
    (etypecase form
       (atom ; terminating base case
         form)
       (cons
         `(let ((result (,(first form) ,@(mapcar (lambda (arg) `(walk ,arg))
                                               (rest form)))))
            (format t "~S => ~S~%" ',form result)
            result))))

Modifying the output so it prints

(* 3 3) => 9
(+ 5 9) => 14
(LIST 14 "Welcome to earth, third rock from the sun!")
  => (14 "Welcome to earth, third rock from the sun!")

instead is left as an exercise to the reader (bad puns come easily in English…), as is the addition of an optional DEPTH argument.

It would also be nice to make use of the pretty printer for appropriate indentation, but I have severe problems grokking it, so maybe someone familiar with this facility can help.

In other programming languages the same problem requires a bit more effort.

Sequence shuffling wrap-up

This post has two parts, a social and a technical one. Scroll down to see the technical one if you don’t care about the former.

For me, Common Lisp offers two fundamental modes of programming, often mixing together.

The first mode is serious software development (by which I’m attempting to pay my rent right now). Everything needs to work here. Thread safety, correctness, safety over speed, enough speed nonetheless, pragmatism, … well, I guess you know the score.

The second mode is exploration and experiments. My last two posts on the randomization of sequences were just that, and collaboratively that. After all, my blog is neither a newspaper nor a research journal. Unfortunately some readers seem to have misunderstood this. I suppose this is partly owing to my usual portion of kidding hyperbole, like my stating that I were “allergic to consing” and had found an exceptionally “good solution”.

So, to make it clear: both of these posts were not about presenting a well-tested algorithm ready for public copy-and-paste, but something to think about — for you and for me.
Tagging the list’s elements with a random id beforehand, as in the solutions Paul and Phil came up with, is a very nice, solid and working solution. Unfortunately this one is also so straight-forward that it hasn’t much potential for intellectual stimulation (at least not for me), and that’s why I decided to pursue the idea with hash tables and memoization further.

Now for the technical part of this wrap-up: SEQRND works for all objects where EQ reliably points out differences. This definitely excludes literals (the will be constant-folded more often than not) but includes CLOS objects. SEQRND also was three times faster than Paul’s RANDOM-SHUFFLE on SBCL.

My personal bottom line is: use one of the solid algorithms all the time and rely on Fisher-Yates if you need the speed.

Sequence shuffling revisited

In my post “Trivial Sequence Shuffling” from yesterday I showed a simple hack to shuffle a sequence.

For your convenience:

(defun seqrnd (seq)
  "Randomize the elements of a sequence. Destructive on SEQ."
  (sort seq #'> :key (lambda (x) (random 1.0))))

The only downside to this algorithm, I claimed, is its complexity, which is worse than that of a specialized sorting algorithm.

However that is not the only problem with it. The ensuing discussion (thanks for your comments, guys!) showed that I didn’t notice an allowance made by the standard: the :KEY function might be called more than once, and implementations seem to make use of this. And while it is allowed to return different keys for the same element (i.e. let the :KEY function be not a proper function, but a plain relation), the result will not be a discrete uniform random distribution, but a biased distribution depending on the intricacies of the sorting procedure.

In this post I’d like to show the alternatives proposed and my own amendment, which extends my original flawed solution.

Peter de Wachter proposed:

(defun better-shuffle (seq)
  (let ((tagged (mapcar (lambda (x) (cons (random 1.0) x)) seq)))
    (mapcar 'cdr (sort tagged #'> :key 'car))))

It’s not that I don’t like it at all (after all it’s a nice case study in functional programming!), but I’m allergic to the consing it produces. I also feel that it overcomplicates things.

Paul Khuong wrote something similar which looks a bit more elegant to me:

(defun random-shuffle (sequence)
  (map-into sequence #'car
            (sort (map 'vector (lambda (x)
                                 (cons x (random 1d0)))
                       sequence)
                  #'< :key #'cdr)))

The other approach (in Scheme) came from Phil Bewig (hope I got the indent right):

(define (shuffle x)
  (do ((v (list->vector x)) (n (length x) (- n 1)))
    ((zero? n) (vector->list v))
    (let* ((r (random n)) (t (vector-ref v r)))
      (vector-set! v r (vector-ref v (- n 1)))
      (vector-set! v (- n 1) t))))

Ah, sorry Phil, but that's not my kind of style :) Too much imperative stuff here, a data conversion and a DO loop. Somehow the name of Rube Goldberg comes to my mind...

The <Good Solution> (alluding to Mark Tarver's Qi blurb) involves a thing as natural as anything to a Lisp (and I include Scheme here) programmer: memoization.

While memoization, i.e. caching function results, is most often used in a performance context, we might also apply it here to make our function always return the same result for an object, thus imposing a total ordering on the list:

(defvar *random-id-ht* nil)
 
(defun initialize-memo ()
  (setf *random-id-ht* (make-hash-table :test #'eq)))
 
(defun consistent-random-id (obj)
  (multiple-value-bind (val found-p) (gethash obj *random-id-ht*)
    (if found-p val
      (setf (gethash obj *random-id-ht*)
            (random 1.0)))))
 
(defun seqrnd (seq)
  "Randomize the elements of a sequence. Destructive on SEQ."
  (initialize-memo) ; need to clear between runs
  (sort seq #'> :key (lambda (x) (consistent-random-id x))))

This is a scaled down version of a memoizer; you probably don't want to do this at home. A generalized memoizer takes about two screenfuls (talking 24 lines here). You can find generalized memoizers everywhere on the net, for example in the Cells utility library for Common Lisp.

One might argue that this solution is too complicated. I hold against that that memoization should be present in every serious functional programmer's toolbox, so it boils down to the function itself -- which is only marginally longer than the first attempt.

Trivial sequence shuffling

IMPORTANT NOTE: the algorithm described in this post doesn’t produce a uniform distribution of shuffled sequences (i.e. it is not fair). Be sure to read the comments and the follow-up posts.

Here’s a quickie for the newbies among you:

(defun seqrnd (seq)
  "Randomize the elements of a sequence. Destructive on SEQ."
  (sort seq #'> :key (lambda (x) (random 1.0))))
 
(seqrnd (copy-seq "abcd")) ; need to copy the literal for the destructive operation
-> "dacb"

This also shifts the efficiency of the shuffle to the efficiency of your implementation’s sorting algorithm. Look for the Fisher-Yates shuffling algorithm if you’re really, really tight on resources; it does something akin to Bubble Sort to solve the problem.

A minimalistic web site compiler

The setup of small non-dynamic web sites is an often recurring task. Unfortunately, there’s a lot of repeated content, most of it in the header of the sites that doesn’t change except for the title.

SSI doesn’t really help because its syntax is clumsy and the files are compiled again on each request. It also needs to be enabled on the target host.

A solution that compiles standard HTML from templates is best for performance and portability. It’s possible to use on of the many HTML template toolkits out there, but why bother? All I want is a standardized header and footer and a page title varying with each HTML file.

So let’s do just that:

#!/bin/bash
set -e
 
OUTDIR=./build
 
mkdir -p "$OUTDIR"
 
# wrap header and footer, set <title> from first line of body
for f in {page1,page2,page3}.html; do
  TITLE=$(echo $(head -n1 $f) | perl -p -e 's/^TITLE: *(.*?)$/$1/')
 
  (sed "s|TITLE|$TITLE|" header.html ;
   sed '1d' $f;
   cat footer.html) > "$OUTDIR/$f"
done
 
# auxiliary files
for f in logo.png default.css; do
  cp $f "$OUTDIR/$f"
done
</title>

The files that get wrapped into header and footer need to set their title on the first line:

TITLE: my page title
<p>So this is it...</p>

If you don’t like Perl, you can also whip up a sed or awk script in its place.

STRING+ for the rest of us

About two weeks ago Franz, Inc. announced a new string utility function named STRING+ for their Common Lisp implementation.

It embodies the string concatenation paradigm of CONCATENATE, which is orthogonal to that of FORMAT. Some people don’t like FORMAT at all, but I think both have their uses.
I absolutely hate using the concatenation way for doing things like (here in JavaScript/Java/C++):

"You ate " + A + " apples, " + B + " bananas and " + C + " " + OTHER_THING + "."

The way the punctuation marks are placed here drives me totally nuts, and I’d rather prefer a lovely

"You ate ~A apples, ~A bananas and ~A ~A." A B C OTHER_THING

But the former method is useful for building file system paths and URLs, for example (MERGE-PATHNAMES aside).

So basically STRING+ is a shorthand for CONCATENATE ‘STRING. Franz claims speed improvements for small values, but I don’t need that. Short and sweet, here’s the functional equivalent:

(defun string+ (&rest parts)
  (with-output-to-string (out)
    (dolist (part parts)
      (write part :stream out :escape nil))))

Works as advertised, except for the example with the keyword symbol, where case may vary depending on your implementation’s default case conventions (CLISP prints it uppercase).

« Previous PageNext Page »