April 6, 2008
Sequence shuffling revisited
In my post “Trivial Sequence Shuffling” from yesterday I showed a simple hack to shuffle a sequence.
For your convenience:
(defun seqrnd (seq) "Randomize the elements of a sequence. Destructive on SEQ." (sort seq #'> :key (lambda (x) (random 1.0))))
The only downside to this algorithm, I claimed, is its complexity, which is worse than that of a specialized sorting algorithm.
However that is not the only problem with it. The ensuing discussion (thanks for your comments, guys!) showed that I didn’t notice an allowance made by the standard: the :KEY function might be called more than once, and implementations seem to make use of this. And while it is allowed to return different keys for the same element (i.e. let the :KEY function be not a proper function, but a plain relation), the result will not be a discrete uniform random distribution, but a biased distribution depending on the intricacies of the sorting procedure.
In this post I’d like to show the alternatives proposed and my own amendment, which extends my original flawed solution.
Peter de Wachter proposed:
(defun better-shuffle (seq) (let ((tagged (mapcar (lambda (x) (cons (random 1.0) x)) seq))) (mapcar 'cdr (sort tagged #'> :key 'car))))
It’s not that I don’t like it at all (after all it’s a nice case study in functional programming!), but I’m allergic to the consing it produces. I also feel that it overcomplicates things.
Paul Khuong wrote something similar which looks a bit more elegant to me:
(defun random-shuffle (sequence) (map-into sequence #'car (sort (map 'vector (lambda (x) (cons x (random 1d0))) sequence) #'< :key #'cdr)))
The other approach (in Scheme) came from Phil Bewig (hope I got the indent right):
(define (shuffle x) (do ((v (list->vector x)) (n (length x) (- n 1))) ((zero? n) (vector->list v)) (let* ((r (random n)) (t (vector-ref v r))) (vector-set! v r (vector-ref v (- n 1))) (vector-set! v (- n 1) t))))
Ah, sorry Phil, but that's not my kind of style :) Too much imperative stuff here, a data conversion and a DO loop. Somehow the name of Rube Goldberg comes to my mind...
The <Good Solution> (alluding to Mark Tarver's Qi blurb) involves a thing as natural as anything to a Lisp (and I include Scheme here) programmer: memoization.
While memoization, i.e. caching function results, is most often used in a performance context, we might also apply it here to make our function always return the same result for an object, thus imposing a total ordering on the list:
(defvar *random-id-ht* nil) (defun initialize-memo () (setf *random-id-ht* (make-hash-table :test #'eq))) (defun consistent-random-id (obj) (multiple-value-bind (val found-p) (gethash obj *random-id-ht*) (if found-p val (setf (gethash obj *random-id-ht*) (random 1.0))))) (defun seqrnd (seq) "Randomize the elements of a sequence. Destructive on SEQ." (initialize-memo) ; need to clear between runs (sort seq #'> :key (lambda (x) (consistent-random-id x))))
This is a scaled down version of a memoizer; you probably don't want to do this at home. A generalized memoizer takes about two screenfuls (talking 24 lines here). You can find generalized memoizers everywhere on the net, for example in the Cells utility library for Common Lisp.
One might argue that this solution is too complicated. I hold against that that memoization should be present in every serious functional programmer's toolbox, so it boils down to the function itself -- which is only marginally longer than the first attempt.
Comments(11)
Your new seqrnd function will fail to randomize properly lists with duplicate elements in them.
(Also, saying that you’re allergic to consing and then using a hash table for each run is pretty inconsistent. Have you measured the consing for your solution?)
Christophe: indeed, it should be an EQ hash table. Fixed.
I didn’t benchmark the consing (yet), but I’m positive the hash table will be faster; I’ll check this later.
No, an EQ hash table does not fix your problem.
Test cases:
(seqrnd (list 1 1 1 2 3))
(let ((x “foo”))
(seqrnd (list x x x 3 x)))
Here are the results, comparing Paul Khuong’s RANDOM-SHUFFLE with SEQRND.
I raised the question of how to shuffle a list in comp.lang.scheme in May 2002. The imperative solution that I posted in response to your previous blog entry follows Knuth’s method for shuffling a vector, and is O(n). Any method involving sorting will be O(n log n), but for practical purposes the constant factors will probably defeat the asymptotic bounds. The most functional-looking shuffle came from Joe Marshall; it partitions the list into two pieces deterministically, shuffles them recursively, then merges them back together randomly:
A faster functional shuffle comes from Al Petrofsky; it first partitions the list randomly, then merges the two pieces back together randomly:
If you speak Haskell, Oleg Kiselyov presented a perfect shuffle of complexity O(n log n) in comp.lang.functional in September 2001. I won’t quote it here, because the implementation only makes sense if you read the accompanying derivation.
What’s the trick to make the code indent properly?
Use <pre lang=”lisp”>CODE</pre>
Christophe: I can’t think of a way to fix the problem with literals. Do you?
It would be really nice to do this with memoizing.
But having it working except for literals is already a nice thing.
Don’t set the global variable. Use LET to bind a value to it.
foo: what are you referring to?
It’s not a problem with literals; it’s a problem with duplicate entries.
What I imagine “foo” is referring to is that your function isn’t thread-safe, because the binding for seqrnd is shared amongst all threads. (seqrnd is also not re-entrant: if you end up in the debugger, for example, and call seqrnd from there, you will destroy the current execution’s state.)
To fix these problems, use one of the correct solutions that you are “allergic” to.