Analyzing return values with a recursive macro

Printing the argument and return values of a set of nested or sequential expressions is a common debugging tactic.

In Common Lisp we can make use of a recursive macro instead of manually inserting printing statements. Specifically, we want our macro (let’s call it WALK for want of a better name since it’s a simple code walker) to print all the return values it encounters along its way:

(walk (list (+ 5 (* 3 3))
            "Welcome to earth, third rock from the sun!"))
 
(* 3 3) => 9
(+ 5 (* 3 3)) => 14
(LIST (+ 5 (* 3 3)) "Welcome to earth, third rock from the sun!")
  => (14 "Welcome to earth, third rock from the sun!")

The following macro does that job.

(defmacro walk (form)
    (etypecase form
       (atom ; terminating base case
         form)
       (cons
         `(let ((result (,(first form) ,@(mapcar (lambda (arg) `(walk ,arg))
                                               (rest form)))))
            (format t "~S => ~S~%" ',form result)
            result))))

Modifying the output so it prints

(* 3 3) => 9
(+ 5 9) => 14
(LIST 14 "Welcome to earth, third rock from the sun!")
  => (14 "Welcome to earth, third rock from the sun!")

instead is left as an exercise to the reader (bad puns come easily in English…), as is the addition of an optional DEPTH argument.

It would also be nice to make use of the pretty printer for appropriate indentation, but I have severe problems grokking it, so maybe someone familiar with this facility can help.

In other programming languages the same problem requires a bit more effort.

Sequence shuffling wrap-up

This post has two parts, a social and a technical one. Scroll down to see the technical one if you don’t care about the former.

For me, Common Lisp offers two fundamental modes of programming, often mixing together.

The first mode is serious software development (by which I’m attempting to pay my rent right now). Everything needs to work here. Thread safety, correctness, safety over speed, enough speed nonetheless, pragmatism, … well, I guess you know the score.

The second mode is exploration and experiments. My last two posts on the randomization of sequences were just that, and collaboratively that. After all, my blog is neither a newspaper nor a research journal. Unfortunately some readers seem to have misunderstood this. I suppose this is partly owing to my usual portion of kidding hyperbole, like my stating that I were “allergic to consing” and had found an exceptionally “good solution”.

So, to make it clear: both of these posts were not about presenting a well-tested algorithm ready for public copy-and-paste, but something to think about — for you and for me.
Tagging the list’s elements with a random id beforehand, as in the solutions Paul and Phil came up with, is a very nice, solid and working solution. Unfortunately this one is also so straight-forward that it hasn’t much potential for intellectual stimulation (at least not for me), and that’s why I decided to pursue the idea with hash tables and memoization further.

Now for the technical part of this wrap-up: SEQRND works for all objects where EQ reliably points out differences. This definitely excludes literals (the will be constant-folded more often than not) but includes CLOS objects. SEQRND also was three times faster than Paul’s RANDOM-SHUFFLE on SBCL.

My personal bottom line is: use one of the solid algorithms all the time and rely on Fisher-Yates if you need the speed.

Sequence shuffling revisited

In my post “Trivial Sequence Shuffling” from yesterday I showed a simple hack to shuffle a sequence.

For your convenience:

(defun seqrnd (seq)
  "Randomize the elements of a sequence. Destructive on SEQ."
  (sort seq #'> :key (lambda (x) (random 1.0))))

The only downside to this algorithm, I claimed, is its complexity, which is worse than that of a specialized sorting algorithm.

However that is not the only problem with it. The ensuing discussion (thanks for your comments, guys!) showed that I didn’t notice an allowance made by the standard: the :KEY function might be called more than once, and implementations seem to make use of this. And while it is allowed to return different keys for the same element (i.e. let the :KEY function be not a proper function, but a plain relation), the result will not be a discrete uniform random distribution, but a biased distribution depending on the intricacies of the sorting procedure.

In this post I’d like to show the alternatives proposed and my own amendment, which extends my original flawed solution.

Peter de Wachter proposed:

(defun better-shuffle (seq)
  (let ((tagged (mapcar (lambda (x) (cons (random 1.0) x)) seq)))
    (mapcar 'cdr (sort tagged #'> :key 'car))))

It’s not that I don’t like it at all (after all it’s a nice case study in functional programming!), but I’m allergic to the consing it produces. I also feel that it overcomplicates things.

Paul Khuong wrote something similar which looks a bit more elegant to me:

(defun random-shuffle (sequence)
  (map-into sequence #'car
            (sort (map 'vector (lambda (x)
                                 (cons x (random 1d0)))
                       sequence)
                  #'< :key #'cdr)))

The other approach (in Scheme) came from Phil Bewig (hope I got the indent right):

(define (shuffle x)
  (do ((v (list->vector x)) (n (length x) (- n 1)))
    ((zero? n) (vector->list v))
    (let* ((r (random n)) (t (vector-ref v r)))
      (vector-set! v r (vector-ref v (- n 1)))
      (vector-set! v (- n 1) t))))

Ah, sorry Phil, but that's not my kind of style :) Too much imperative stuff here, a data conversion and a DO loop. Somehow the name of Rube Goldberg comes to my mind...

The <Good Solution> (alluding to Mark Tarver's Qi blurb) involves a thing as natural as anything to a Lisp (and I include Scheme here) programmer: memoization.

While memoization, i.e. caching function results, is most often used in a performance context, we might also apply it here to make our function always return the same result for an object, thus imposing a total ordering on the list:

(defvar *random-id-ht* nil)
 
(defun initialize-memo ()
  (setf *random-id-ht* (make-hash-table :test #'eq)))
 
(defun consistent-random-id (obj)
  (multiple-value-bind (val found-p) (gethash obj *random-id-ht*)
    (if found-p val
      (setf (gethash obj *random-id-ht*)
            (random 1.0)))))
 
(defun seqrnd (seq)
  "Randomize the elements of a sequence. Destructive on SEQ."
  (initialize-memo) ; need to clear between runs
  (sort seq #'> :key (lambda (x) (consistent-random-id x))))

This is a scaled down version of a memoizer; you probably don't want to do this at home. A generalized memoizer takes about two screenfuls (talking 24 lines here). You can find generalized memoizers everywhere on the net, for example in the Cells utility library for Common Lisp.

One might argue that this solution is too complicated. I hold against that that memoization should be present in every serious functional programmer's toolbox, so it boils down to the function itself -- which is only marginally longer than the first attempt.

Trivial sequence shuffling

IMPORTANT NOTE: the algorithm described in this post doesn’t produce a uniform distribution of shuffled sequences (i.e. it is not fair). Be sure to read the comments and the follow-up posts.

Here’s a quickie for the newbies among you:

(defun seqrnd (seq)
  "Randomize the elements of a sequence. Destructive on SEQ."
  (sort seq #'> :key (lambda (x) (random 1.0))))
 
(seqrnd (copy-seq "abcd")) ; need to copy the literal for the destructive operation
-> "dacb"

This also shifts the efficiency of the shuffle to the efficiency of your implementation’s sorting algorithm. Look for the Fisher-Yates shuffling algorithm if you’re really, really tight on resources; it does something akin to Bubble Sort to solve the problem.

STRING+ for the rest of us

About two weeks ago Franz, Inc. announced a new string utility function named STRING+ for their Common Lisp implementation.

It embodies the string concatenation paradigm of CONCATENATE, which is orthogonal to that of FORMAT. Some people don’t like FORMAT at all, but I think both have their uses.
I absolutely hate using the concatenation way for doing things like (here in JavaScript/Java/C++):

"You ate " + A + " apples, " + B + " bananas and " + C + " " + OTHER_THING + "."

The way the punctuation marks are placed here drives me totally nuts, and I’d rather prefer a lovely

"You ate ~A apples, ~A bananas and ~A ~A." A B C OTHER_THING

But the former method is useful for building file system paths and URLs, for example (MERGE-PATHNAMES aside).

So basically STRING+ is a shorthand for CONCATENATE ‘STRING. Franz claims speed improvements for small values, but I don’t need that. Short and sweet, here’s the functional equivalent:

(defun string+ (&rest parts)
  (with-output-to-string (out)
    (dolist (part parts)
      (write part :stream out :escape nil))))

Works as advertised, except for the example with the keyword symbol, where case may vary depending on your implementation’s default case conventions (CLISP prints it uppercase).

Why choose Elephant?

Over the past few years quite a few solutions for object persistency in CL have emerged.

For my current project, I have chosen Elephant, after having looked at other popular alternatives. In this post, I’d like to talk about my reasons for choosing Elephant, taking a comparative approach. I didn’t take a look at the LW and ACL solutions since they are either not free or not portable.

Flexibility

In Elephant, I have flexibility, and in more than one way:

First, you choose your backend, and you don’t do it once and for virtual eternity. Elephant works best with Berkeley DB, but also has a performant (so I’m told) Postmodern backend, usable CL-SQL and SQLite3 backends and an experimental SEXP backend. Switching among those both in the development stage and in the production stage is (at least theoretically) easy. CL-PEREC and Submarine only work with PostgreSQL.

Second, you choose the storage model. I’m going to talk about this in a coming post, for now just accept the idea that you are in command and are able to choose the model which works best for you. Hey, doesn’t that resemble Common Lisp philosophy? :)

Third, Elephant is not an object-relational mapper. While a lot of people might that see as a disadvantage (it only stores key-value pairs at the database level). But this leads to a very flexible backend model and enables easier schema evolution.

Maturity

Elephant is used in production environments (you can read about that in the manual). The only other library being used in this way is, to my knowledge, CL-PEREC.
And in some areas of Submarine still has its still be dragons.

Liveliness

An active development and user community is vital to help you with using and hacking the library. Rucksack and CL-Prevalence don’t have that. CL-PEREC does.

Ease of use

Wow, CL-PEREC is the best negative example here. To try it, install about a dozen (no, that’s not hyperbole) packages from Darcs repositories. Then figure out from some test case output how it works. Yuk. Elephant doesn’t work out of the box either, but you only have to adapt a simple configuration file to your environment and comes with a good manual.

But ease of use is actually one of the big reasons for choosing object persistency instead of the all-popular SQL. I can just put away my objects instead of defining views, classes, relations in some DSL I don’t really wish to know details about.

Conclusion

Not mentioned here is PLOB!, which seems to me to be an unmaintained project using concepts that Elephant embodies in a clean way and expands upon.

The combination of the above factors was what drove me to use Elephant. Rucksack also seems to be sensible stuff, but it’s not mature yet (in terms of community, documentation, stability and features).

Naturally, there are some deficiencies in Elephant, but they don’t hit where it hurts. The code is a bit crufty in some spots, for example, it uses feature macros to work with the MOP instead of using a library like Closer to MOP (though that’s work in progress). It also doesn’t support advanced (semantical) schema evolution (but no other does), though it copes with slot addition and removal. When you change your storage semantices, you can convert the data manually by mapping over it.
The manual and Trac page of Elephant also state that there’s no query language, but in my experience Elephant already offers enough querying features for more purposes.

Problems being solved by none of the libraries, apart from semantical schema evolution, are function, closure and continuation serialization. But since this is Lisp you can for simple purposes just store forms and COMPILE them as needed, and there’s more than one silver lining on the horizon, see Paul’s Common Cold and David’s SB-HEAPDUMP (both SBCL-specific right now).

What are your experiences with object persistency and data storage in Lisp? I’m curious.

Multiple stores in Elephant

For simplicity, it’s good to have only one store in Elephant.
Often however this doesn’t scale well with reality :)

For example, in the browser game I am developing, we have separate “worlds” with separate sets of players.
The items are to be administrated centrally, though, so the natural solution is having one shared store for them.

The easiest way to do this is having two store controllers. An alternative would be rolling your own solution using separate indexed B trees (I went down that road first), but handling things at the store controller level lets us use all the existing indexed class infrastructure.

Now there’s good and bad news: the good news is that it’s possible to do this. The bad news is that the interface is a bit clumsy. Most functions do not accept a store controller argument but refer directly to the global *store-controller*. Yuk.

So what we need to do is find the set of functions that need to refer to the alternative store controller and let them have a nice interface. For the item example (consider all Elephant symbols to be imported, I just added the package prefix to the functions for clarity):

(defun get-items-by-type (type)
  (let ((*store-controller* *item-sc*))
    (elephant:get-instances-by-class type)))

Try exceptionally hard to draw clear lines between the different store controllers, or you’ll be in imperative hell before you can say “Practical Extraction and Report Language”. If you’re nesting those functions, use dynamic variables (also known as “globals done right”):

(defun first-thing ()
  (declare (special *store-controller*)) ; refer to the dynamic binding
  (get-instances-by-value 'item 'price 5))
 
(defun do-complex-things ()
  (let ((*store-controller* *item-sc*))
    (declare (special *store-controller*)) ; provide a dynamic binding
    (first-thing)))

EDIT: As instructive as this is, Mikael Jansson has pointed out that we don’t need the whole DECLARE hanky-panky since DEFVAR/DEFPARAMETER automatically establish variables as dynamic. Seems to be another case of the “do what I mean” support in Common Lisp.

Another thing to take care of is opening and closing alternative stores.

Self-explanatory, but not that easy to find out if you’re just starting out with Elephant:

;;; init
(defvar *item-sc* nil) ; newbies note that DEFVAR only assigns to non-existing variables
(when (null *item-sc*)
  (setf *item-sc* (elephant:open-store *item-store*)) ; *item-store* defined elsewhere
 
;;; shutdown
(elephant:close-store *item-sc*)

If you’re interested in more Elephant posts, drop me a short comment.

Automatic table layout

GTK+ has a table layout container that lets you render widgets in a grid. The widgets get assigned to the table cells automatically. I haven’t looked at their rendering algorithm, but this does something similar in a Weblocks widget:

(in-package :weblocks)
 
(export '(table-composite render-widget-body))
 
(defwidget table-composite (composite)
  ((cols :type integer :accessor cols :initarg :cols :initform 0))
  (:documentation "Renders a set of widgets in table layout."))
 
(defmethod render-widget-body ((widget table-composite) &rest args)
  (let* ((widgets (composite-widgets widget))
         (num-widgets (list-length widgets))
         (cols (cols widget))
         (rows (ceiling (/ num-widgets cols))))
    (with-html
      (:table :rows rows :cols cols
        (loop for r from 0 below rows
              do (htm (:tr
              (loop for c from 0 below cols
                    do (let ((child (nth (+ c (* r cols)) widgets)))
                         (htm (:td (when child (render-widget child)))))))))))))
 
;;; EXAMPLE CODE:
(make-instance 'table-composite :cols 3
  :widgets (loop for i from 1 to 10
                 collect (write-to-string i)))

Yeah, I know a DOLIST would do the same as the LOOP here, but I really like LOOP.

This should be easily adaptable for generic HTML rendering.

Laying logging foundations

Gary W. King’s logging library log5 is a lispy way to do logging.

It’s simple to set up and should probably be the first dependency you add to any serious project.

Unfortunately, the User Guide‘s examples are a bit bland.

So here’s some salt (for simplicity let us also ignore log5′s default categories and outputs):

Preliminaries

For comfort.

(defpackage :our-package
  (:use :common-lisp :log5))
 
(in-package :our-package)

Categories

Define new categories every time you add a semantic part to your project:

(defcategory borg-attack)
(defcategory federation-cabal)
(defcategory ion-flux)
(defcategory all-categories (or borg-attack federation-cabal ion-flux))

…or want to differentiate on the seriousness level of logging messages:

(defcategory debug)
(defcategory error)
(defcategory info)
(defcategory all-levels (or debug info error))

Outputs

Outputs get added to every logging message automatically.
They are evaluated at the time the log messages gets sent.

Let’s add one for the time and one for a line break:

(defoutput newline (format nil "~%"))
 
(defoutput time-hms
  (multiple-value-bind (second minute hour day month year)
    (decode-universal-time (get-universal-time))
    (format nil "~D:~2,'0D:~2,'0D" hour minute second)))

And one for the load averages, so we can see whether an error might have occurred due to heavy load (maybe a race condition):

(defoutput load (format nil "[~A]" (load-averages)))

If your project is web-based, you might also want to add the username associated with the current session, like this:

(defoutput username (format nil "[~A]"
  (when (boundp hunchentoot:*session*)
    (session-value 'username))))

Finally, if you’re running SBCL and don’t mind a bit of a hack, you can also identify the function context:

(defmacro current-function-name-log5 ()
  `(caaddr (sb-debug::backtrace-as-list)))

(defoutput function-name (format nil “[~A]” (current-function-name-log5)))

Senders

Senders decide where logging messages from certain categories go, and what they look like.
Here’s one that will log messages from all of the above categories to the standard output, utilizing all of the outputs we defined:

(start-sender 'debug
  (stream-sender :location *standard-output*)
  :category-spec '(all-levels all-categories)
  :output-spec '(time-hms load username function-name message newline))

Usage

You would log a message like this:

(log-for (borg-attack ion-flux)
  "The ion flux of vessel ~A broke down due to a borg attack"
  (get-current-vessel))

That’s already incredibly useful!

Getting started with CFFI

I really like the Lisp approach of accessing foreign functions; it puts the programmer in charge (as usual) instead of making him wait for some bindings to appear or get updated.

Here’s a little recipe that shows how to get the load averages (the thing uptime shows) in Lisp, which will be useful later when we build a solid logging foundation with Gary’s log5 package.
In case you don’t know, the load average shows an approximation of the number of processes in the system’s task queue, thus serving as indication for machine load.

First, let’s do the initialization work for CFFI, as pointed out in its user guide:

(asdf:oos 'asdf:load-op 'cffi)
 
(defpackage :cffi-user
  (:use :common-lisp :cffi))
 
(in-package :cffi-user)
 
(define-foreign-library libc
  (:unix (:or "libc.so.6" "libc.so.5" "libc.so"))
  (t (:default "libc.so")))
 
(use-foreign-library libc)

Now we actually need to start thinking. How do we get at the numbers?
Let’s find the C function:

% apropos load | egrep -i "average|avg"
getloadavg           (3)  - get system load averages
[...]
% man 3 getloadavg

The man page gives us this prototype:

int getloadavg(double loadavg[], int nelem);

It also tells us that the first parameter will be filled with nelem samples and notes that (at least on my Linux system) the maximum number of samples is three, denoting the load averages of the last 1, 5 and 15 minutes. Let’s say that we want all three.

Now unfortunately the CFFI manual doesn’t say anything about arrays. But we can rewrite the prototype as

int getloadavg(double* loadavg, int nelem);

leading to the following CFFI function spec:

(defcfun "getloadavg" :int (loadavg :pointer) (nelem :int))

Now we are able to use foreign-alloc to allocate a pointer of the correct size (i.e. 3*sizeof(double)), and mem-aref to access the resulting array.

Combined with matching LOOP and FORMAT programs and the manual garbage collection we get:

(defun load-averages ()
  (let ((loadavg (foreign-alloc :double :count 3)))
    (getloadavg loadavg 3) ; note the imperative style we are forced to use
    (prog1 ; we need to clean up after producing the return value
      (format nil "~{~,2F~^ ~}" (loop for i from 0 to 2
                                    collect (mem-aref loadavg :double i)))
      (foreign-free loadavg))))

If you have questions regarding any part of that last snippet, feel free to ask.

Note that we don’t do any error checking here; getloadavg will return -1 on failure, although I can’t imagine why it would do so.

You can access the full code at http://paste.lisp.org/display/54746.

I hope this post wasn’t overly verbose (read: boring) to you.
It was my intention to make this understandable for beginners.

« Previous PageNext Page »