April 5, 2008
Trivial sequence shuffling
IMPORTANT NOTE: the algorithm described in this post doesn’t produce a uniform distribution of shuffled sequences (i.e. it is not fair). Be sure to read the comments and the follow-up posts.
Here’s a quickie for the newbies among you:
(defun seqrnd (seq) "Randomize the elements of a sequence. Destructive on SEQ." (sort seq #'> :key (lambda (x) (random 1.0)))) (seqrnd (copy-seq "abcd")) ; need to copy the literal for the destructive operation -> "dacb"
This also shifts the efficiency of the shuffle to the efficiency of your implementation’s sorting algorithm. Look for the Fisher-Yates shuffling algorithm if you’re really, really tight on resources; it does something akin to Bubble Sort to solve the problem.
Comments(17)
That’s perverse, but cute. I had to implement a shuffle recently, and the best I could come up with (copying some Emacs code that worked on vectors) was:
(defun shuffle (list)
“Shuffle a list randomly.”
(let ((len (length list)))
(loop for i from 0 to (1- len)
do (rotatef (elt list i) (elt list (random len)))))
list)
But then, I definitely qualify as a newbie to CL.
Actually, now I think of it, would it work to do something like (sort seq #’(lambda (x) (< (random 1.0) 0.5))) instead?
Ah, you want to flip a coin.
Should work well, too. :)
Why would you classify the solution in my post as perverse?
Personally I consider it to be quite elegant; it’s also in line with Donald Knuth’s basic shuffling algorithm.
> Why would you classify the solution in my post as perverse?
I don’t know about “perverse”, but I don’t like it because you can’t say anything about the quality of the shuffling.
Randomizing a sequence in Factor…
In Trivial Sequence Shuffling, Leslie P. Polzer posted an easy and short version of a sequence shuffler in Common Lisp. The sort function has to do all the work. Not a recommended way to shuffle your sequences, but a short……
w00f: what kind of statements would you like to make?
Using random gives you an even distribution.
That’s a very elegant bit of code you have there!
[...] Polzer has a nice codelet for shuffling a sequence. I had always relied on don’t lots of rotatefs and had never been [...]
This shuffle algorithm is flawed because there is no guarantee RANDOM as a :KEY function will be called once and only once per element; if RANDOM is called multiple times for an element, then it will likely give inconsistent results, possibly poisoning the sort.
Joseph: the CLHS says:
If the key and predicate always return, then the sorting operation will always terminate, producing a sequence containing the same elements as sequence (that is, the result is a permutation of sequence). This is guaranteed even if the predicate does not really consistently represent a total order (in which case the elements will be scrambled in some unpredictable way, but no element will be lost). If the key consistently returns meaningful keys, and the predicate does reflect some total ordering criterion on those keys, then the elements of the sorted-sequence will be properly sorted according to that ordering.
Your wording is rather hazy, so I don’t know what exactly you’re concerned with, but for me it suffices that the sort will terminate after having shuffled the elements.
The thing to be concerned with is that it will not produce a uniform distribution. If you’re just interested in the function terminating, there’s a much simpler solution:
(defun seqrnd (seq)
seq)
I made a stupid simple shuffle tester, that tries to shuffle a three-element list.
http://paste.lisp.org/display/58653
With SBCL and your function, I get these results:
;; (3 1 2): 25
;; (3 2 1): 25
;; (1 2 3): 17
;; (2 1 3): 17
;; (1 3 2): 8
;; (2 3 1): 8
A nice and simple perfect shuffle is:
(defun better-shuffle (seq)
(let ((tagged (mapcar (lambda (x) (cons (random 1.0) x)) seq)))
(mapcar ‘cdr (sort tagged #’> :key ‘car))))
Peter proposes perl’s schwartzian transform!
I agree with Joseph that the algorithm is flawed. The problem is that the comparison procedure isn’t transitive: it is possible that a<b and b<c does not imply avector x)) (n (length x) (- n 1)))
((zero? n) (vector->list v))
(let* ((r (random n)) (t (vector-ref v r)))
(vector-set! v r (vector-ref v (- n 1)))
(vector-set! v (- n 1) t))))
That got garbled.
I agree with Joseph that the algorithm is flawed. The problem is that the comparison procedure isn’t transitive: it is possible that a<b and b<c does not imply a<c. Depending on the underlying sort algorithm, bad things may happen, and it is possible that the sort will never terminate. Worse, it is unlikely given this method that all permutations of the input list are not equally likely in the shuffled output, which could be a problem depending on how the shuffle is used (you wouldn’t want your poker buddies to accuse you of stacking the deck — it could be hazardous to your health).
Here is my preferred method for shuffling a list in Scheme.
(define (shuffle x)
(do ((v (list->vector x)) (n (length x) (- n 1)))
((zero? n) (vector->list v))
(let* ((r (random n)) (t (vector-ref v r)))
(vector-set! v r (vector-ref v (- n 1)))
(vector-set! v (- n 1) t))))
[...] In my post “Trivial Sequence Shuffling” from yesterday I showed a simple hack to shuffle a sequence. [...]
Sorry, I could have been clearer in my description.
The problem is not necessarily that the comparison is not transitive: that is the difference between a partial order and a complete order (and SORT isn’t required to maintain a partial order). The predicate itself could have that problem.
The problem I was referring to is that two elements, let’s call them A and B, should be in the same order every time the sort algorithm checks them. If the key values of A and B jump around at random, what is the sort supposed to assume? The comparison function is not well-defined at all, not just with respect to transitivity.
Now, it may be that an efficient sort algorithm does not often compare the same elements pairwise; however, the problem is not restricted to pairs of elements. Consider just the element A. The key function ranges from 0 to 1.0 (exclusive); 0 is interpreted to mean “I should be at the very beginning”, 0.999… interpreted to mean “I should be at the very end.”
Any individual element is going to be changing its stated destination every time it is checked. A particular random order, however, is just *one* order. The random seed should pick one order out of the n! possible orders, with equal likelihood, then *stick to* that order. Instead this algorithm jumps around between orders, giving mixed messages to the sort algorithm. It will be hard to predict the outcome of the sort, but also unlikely to guarantee the uniform distribution among the n! outcomes that a random shuffle desires.