In my post “Trivial Sequence Shuffling” from yesterday I showed a simple hack to shuffle a sequence.
For your convenience:
(defun seqrnd (seq)
"Randomize the elements of a sequence. Destructive on SEQ."
(sort seq #'> :key (lambda (x) (random 1.0))))
The only downside to this algorithm, I claimed, is its complexity, which is worse than that of a specialized sorting algorithm.
However that is not the only problem with it. The ensuing discussion (thanks for your comments, guys!) showed that I didn’t notice an allowance made by the standard: the :KEY function might be called more than once, and implementations seem to make use of this. And while it is allowed to return different keys for the same element (i.e. let the :KEY function be not a proper function, but a plain relation), the result will not be a discrete uniform random distribution, but a biased distribution depending on the intricacies of the sorting procedure.
In this post I’d like to show the alternatives proposed and my own amendment, which extends my original flawed solution.
Peter de Wachter proposed:
(defun better-shuffle (seq)
(let ((tagged (mapcar (lambda (x) (cons (random 1.0) x)) seq)))
(mapcar 'cdr (sort tagged #'> :key 'car))))
It’s not that I don’t like it at all (after all it’s a nice case study in functional programming!), but I’m allergic to the consing it produces. I also feel that it overcomplicates things.
Paul Khuong wrote something similar which looks a bit more elegant to me:
(defun random-shuffle (sequence)
(map-into sequence #'car
(sort (map 'vector (lambda (x)
(cons x (random 1d0)))
sequence)
#'< :key #'cdr)))
The other approach (in Scheme) came from Phil Bewig (hope I got the indent right):
(define (shuffle x)
(do ((v (list->vector x)) (n (length x) (- n 1)))
((zero? n) (vector->list v))
(let* ((r (random n)) (t (vector-ref v r)))
(vector-set! v r (vector-ref v (- n 1)))
(vector-set! v (- n 1) t))))
Ah, sorry Phil, but that's not my kind of style :) Too much imperative stuff here, a data conversion and a DO loop. Somehow the name of Rube Goldberg comes to my mind...
The <Good Solution> (alluding to Mark Tarver's Qi blurb) involves a thing as natural as anything to a Lisp (and I include Scheme here) programmer: memoization.
While memoization, i.e. caching function results, is most often used in a performance context, we might also apply it here to make our function always return the same result for an object, thus imposing a total ordering on the list:
(defvar *random-id-ht* nil)
(defun initialize-memo ()
(setf *random-id-ht* (make-hash-table :test #'eq)))
(defun consistent-random-id (obj)
(multiple-value-bind (val found-p) (gethash obj *random-id-ht*)
(if found-p val
(setf (gethash obj *random-id-ht*)
(random 1.0)))))
(defun seqrnd (seq)
"Randomize the elements of a sequence. Destructive on SEQ."
(initialize-memo) ; need to clear between runs
(sort seq #'> :key (lambda (x) (consistent-random-id x))))
This is a scaled down version of a memoizer; you probably don't want to do this at home. A generalized memoizer takes about two screenfuls (talking 24 lines here). You can find generalized memoizers everywhere on the net, for example in the Cells utility library for Common Lisp.
One might argue that this solution is too complicated. I hold against that that memoization should be present in every serious functional programmer's toolbox, so it boils down to the function itself -- which is only marginally longer than the first attempt.