Technical - How and why does Soft Shuffle work?
Algorithm TLDR
This page is a deep dive into how and why Soft Shuffle works - meaning it's fairly long, with a good amount of detail. It may be easier to start with the video, especially if you're unfamiliar with some of the concepts. However, it is structured in such a way as to hopefully lead you through the detail gently. It does stray into software engineering of course, and I've done my best to make it understandable to non software engineers when it does.
That said, I'm sure a lot of people reading this will want to jump straight to a technical summary and then decide whether to delve deeper, so that's what this first section gives - the pseudocode for the algorithm and a brief explanation of the salient points. The rest of the page essentially builds and explains back up to this, so if you want the full explanation from scratch, skip to the next section.
The pseudocode of the fully general algorithm is:
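What follows is a sketch in JavaScript rather than strict pseudocode (fisherYatesShuffle is the unbiased shuffle covered in Randomisation - Fisher-Yates below; a real implementation would read out each matPosition as a dealing instruction, which this sketch collects and returns):

// A JavaScript sketch of the fully general algorithm.
function softShuffle(numCardsToShuffle, numPilesOnMat) {
  // deck: index = current position (0 = base), element = target position
  let deck = Array.from({ length: numCardsToShuffle }, (_, i) => i);
  fisherYatesShuffle(deck); // unbiased randomisation - see below

  // Smallest number of passes with numPilesOnMat ^ numPasses >= numCardsToShuffle
  let numPasses = 1;
  while (numPilesOnMat ** numPasses < numCardsToShuffle) numPasses++;

  const instructions = [];
  for (let passNumber = 0; passNumber < numPasses; passNumber++) {
    const matPiles = Array.from({ length: numPilesOnMat }, () => []);
    const passInstructions = [];
    for (let i = deck.length - 1; i >= 0; i--) { // deal from the top of the deck
      const matPosition = Math.floor(deck[i] / numPilesOnMat ** passNumber) % numPilesOnMat;
      matPiles[matPosition].push(deck[i]);       // matPiles[x][0] is the base of pile x
      passInstructions.push(matPosition);
    }
    instructions.push(passInstructions);

    // Gather, alternating direction each pass, always ending 'backwards'.
    // 'Forwards' (pile 0 onto pile 1, onto pile 2...) leaves pile 0 on top,
    // which in array terms means concatenating the piles in descending order.
    const forwards = (numPasses - 1 - passNumber) % 2 === 1;
    deck = [];
    if (forwards) for (let p = numPilesOnMat - 1; p >= 0; p--) deck.push(...matPiles[p]);
    else for (let p = 0; p < numPilesOnMat; p++) deck.push(...matPiles[p]);
  }
  return instructions; // deck now has element === index throughout
}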

Key Points
The key points to understand:
- The algorithm is fundamentally about manipulating an array that represents a deck of cards (deck[numCardsToShuffle]) and an array of arrays that represents piles of cards on a mat (matPiles[numPilesOnMat][] - a 2D array), over a number of passes (deals and gathers) based on the number of cards and the number of piles on the mat. The deck array indices represent the position of a card in the deck, and the array element at each index represents the index that card should end up at in the final randomised deck. The algorithm generates the index of the mat pile that each card is to be dealt to in each pass, and the direction in which to gather the piles of cards back up into a deck.
- The number of passes is the smallest whole number satisfying (numPilesOnMat ^ numPasses) >= numCardsToShuffle. 2 passes is usually the best trade-off, but 1 and 3 are often encountered. 4 or more is possible and handled, but to be avoided in real life.
- The deck array is randomised with Fisher-Yates - a well-known algorithm (see Randomisation - Fisher-Yates below).
- The direction the piles of cards are gathered from the mat spaces back into a deck alternates each pass, always ending with 'backwards'. Forwards is putting matPiles[0] onto matPiles[1], that combined pile onto matPiles[2], and so on - backwards is the reverse.
- The key line (see the worked example just after this list) is:
matPosition := floor((deck[i] / (numPilesOnMat ^ passNumber)) % numPilesOnMat)
  - matPosition is the value to use as the index on the mat (so matPiles[matPosition]).
  - floor() (or a truncate operation) isn't strictly needed - the calculation should be integer arithmetic. However, the languages this is expected to be coded in often aren't strongly typed and/or are permissive with the modulo operation returning real as well as whole numbers.
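As a quick worked example of the key line (numbers purely illustrative):

// With numPilesOnMat = 10 and a card whose target index deck[i] = 57:
Math.floor(57 / 10 ** 0) % 10; // pass 0: 57 % 10 -> pile 7 (the 'ones' digit)
Math.floor(57 / 10 ** 1) % 10; // pass 1: floor(57 / 10) % 10 -> pile 5 (the 'tens' digit)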
Randomisation Starting Point
I previewed what I mean by good enough and fully random for shuffling / randomising a deck in the Overview and Usage pages. On this page we can reiterate and discuss randomness in more detail. As a starting point, we can look at what levels of 'randomness' we can achieve, ordered by difficulty to achieve:
- Predictable / not good enough - it's possible to predict some or all of the positions of the cards (actual locations of the cards or regions of the deck) based on the input permutation; not all output permutations may be possible; some output permutations are significantly more likely than others based on the input permutation.
- Unpredictable but good enough randomisation - we can't predict the positions of cards reliably based on the input permutation; not all output permutations may be possible; some permutations are significantly more likely than others based on the input permutation.
- Unpredictable and better randomisation - we can't predict the positions of cards; all permutations are possible; the chances of each permutation still vary, but the variation decreases with more iterations.
- Fully Random - we can't predict the positions of the cards; all output permutations are possible and equally likely irrespective of the input permutation.
The levels can be thought of as a sliding scale from Predictable through to Fully Random. The level of 'randomness' increases, but what that actually means is a trend towards the probability of each output permutation being equal, and towards the probability of each output permutation being less determined by the input permutation (the two being interlinked).
With physical shuffling, there has been a lot of research about when we can declare the amount of shuffling done is good enough - and in the Usage page I discuss when good enough is good enough for me personally. However, fundamentally, physical shuffling will always be a series of re-orderings that follow a probability distribution (affected by how 'well' you can do each shuffling pass). To get to fully random (specifically, that the output permutation is independent of the input permutation) we'd need an infinite number of physical shuffle rounds.
The starting point for Soft Shuffle was - given what I know about randomisation in software, how do I bypass physical shuffling altogether and jump straight to Fully Random in linear / deterministic time (ie with n cards it takes x * n amount of time)?
Don't worry if some of the terminology is unfamiliar - in this page I'm going to try and explain how Soft Shuffle works from scratch (and let me know if something is unclear and I'll try to add in more explanation). This page is an explanation of both what I mean by, and how we build up to, a situation where we have a deck randomisation algorithm that gives us:
Fully Random - All output permutations are possible and equally likely irrespective of the input permutation.

Permutations & Mappings
So to explain what is meant by the statement:
Fully Random - All output permutations are possible and equally likely irrespective of the input permutation.
First - what do we mean by input and output permutations?
When I talk about a permutation (Wikipedia Link) I am referring to one of the possible orderings of the cards in a deck. All the cards in a deck are assumed to be unique, so there are n! (n factorial) possible permutations (orderings) of those cards. If we have a deck of 3 cards, we can label these cards 0, 1, 2. We could also label them A, B, C, and for the following diagrams I've included both versions to make the explanations easier to follow.
With 3 cards we have 3! = 6 permutations. These are all the possible orderings three unique cards can appear in. I've coloured each one to make it easier to distinguish them in the diagram below. When we talk about shuffling / randomising a deck, it must both start in an original ordering and finish in a new ordering - or an input permutation and an output permutation. The diagram below shows all the possible original orders (input permutations) as a column on the left, and all the possible new orders (output permutations) as a row on the right.

To meet our definition of fully random, each and every input permutation must potentially result in each and every output permutation, with equal likelihood in every case. If we used labelled cards as in the diagram above, whatever the original ordering (input permutation), as long as we can generate / pick a new permutation in an unbiased way (ie each permutation is equally likely), we could theoretically look at the labels and sort the cards into that order.
Permutations-as-Mappings
Of course we want to work with unlabelled cards, and still be sure it works. That is actually not a problem; the tricky bit is dealing to the new order we want - which is discussed later. For now we want to be sure that we can start with an input permutation, and given an unbiased new-order-generator (Fisher-Yates), generate each and every output permutation with equal likelihood.
This works if we consider the new order as a mapping (Wikipedia Link) - use it as a permutation-as-mapping. What I mean by that is instead of using the new ordering as an order which we deal labelled cards to match, we use the original and new ordering as a set of positions to move unlabelled cards from / to.
In the diagram below, the first row (light grey) shows an original ordering (0, 1, 2), and then every different new ordering (every permutation - in red). But we are now not considering these as labelled cards, but as positions of cards in the deck - position 0, position 1 and position 2. So the original (black) ordering is now the original positions of the cards, and the red shows all the permutations of positions those cards could be moved to. E.g. the first new ordering (2, 1, 0) represents the card that was at position 0 now being at position 2, the card at position 1 still being at position 1, and the card at position 2 being at position 0.
The row below shows what happens if we start with the input permutation of cards as 0, 1, 2 (or A, B, C), and apply each of the permutations-as-mappings in the first row.
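In code terms, applying a permutation-as-mapping looks something like this (a minimal JavaScript sketch of the idea, not code from the shuffler itself):

// Apply a permutation-as-mapping: mapping[from] is the position that the
// card currently at position 'from' should move to.
function applyMapping(input, mapping) {
  const output = new Array(input.length);
  mapping.forEach((to, from) => { output[to] = input[from]; });
  return output;
}

applyMapping(['A', 'B', 'C'], [2, 1, 0]); // -> ['C', 'B', 'A']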

What do all the permutations look like?
Starting with the ordered permutation 0, 1, 2 is useful for demonstrating how we can use permutations as mappings, but really we want to see what happens if we apply this to all the possible input permutations.
In the diagram below, the column underneath the original order numbering shows every input permutation. The column under each new permutation-as-mapping (in red) shows the output permutation that results from applying that mapping to each input permutation. And we can see that in each column, every output permutation exists - because using the new order as positions in the deck is a 1:1 mapping. I.e. each input permutation leads to exactly one output permutation given a specific mapping. If we look at each row, we see that each permutation is also present in the output permutations - each input permutation can generate every output permutation, given each of the mappings.

What is this showing us?
So what does that full table show us?
It shows us that: Given the full set of permutations used as input permutations, and using the full set of permutations applied as permutations-as-mappings, each of those input permutations gives the full set of permutations as output permutations.
To unpack that a bit:
- We can use each of all of the possible permutations as input permutations.
- For each input permutation, we can apply all of the possible permutations as permutations-as-mappings.
- As those permutations-as-mappings are 1:1 and non-colliding, each input permutation can generate each of the possible permutations as output permutations.
- If we pick one of the permutations-as-mappings at random, and apply it to an input permutation, all of the output permutations are possible.
- Therefore, assuming we pick which permutation-as-mapping to use in an unbiased way (ie each is equally likely to be picked), each input permutation will give every possible output permutation with equal likelihood.
Or in other words - each and every possible output permutation is equally likely irrespective of the input permutation assuming we use an unbiased random order generator to pick a permutation-as-mapping to apply to the input permutation.
Which matches our fully random definition:
Fully Random - All output permutations are possible and equally likely irrespective of the input permutation.
To highlight that, see the diagram below - we've highlighted that we have all of the input permutations on the left (the first column). We've highlighted the output permutations for one of the input permutations (the row), and you can see how each permutation is generated from each of the permutations-as-mappings (the red). We can see that all of the permutations are there as output permutations (which can also be seen for each of the input permutations).

Virtual representation - Arrays
Given what we've discussed about permutations and mappings, we care about two things - generating an unbiased new ordering (to use as a mapping), and physically dealing the cards according to that new ordering. In both cases we need a virtual representation of a deck (and of piles of cards on a mat). I used 0-based numbering while discussing positions, and discussed orderings-as-mappings, because our virtual representation is based around arrays.
Arrays (Wikipedia Link) are a pretty fundamental object in CompSci. I won't go into much detail about them as there are many many resources on the internet explaining them. But if you are unfamiliar with them, we can consider an array as a set of boxes of data labelled with a number. This number is the position of the box within the array, and is known as the index. The data contained in the box is known as the element and for this algorithm is simply a number.
The basis of our virtual representation is that the array Index represents a card located at that position in the deck, and the array Element represents a piece of data about that card.
The diagram below shows this better and shows a convention used in the algorithm, and throughout the code, that index 0 is the base of the deck / base of the piles of cards. So index n (99 in a theoretical 100 card deck, as we're 0-indexing) is the top of the deck / top of the piles of cards. If you are familiar with software, this is because we need to pick a convention -> treating the deck/piles as a LIFO stack most closely models dealing in real life -> using index 0 as the base simplifies modelling dealing to piles, as pile[0] will be the first index we push() elements into when we deal to that pile.

Array Element data
So what do we use the array Elements (the number stored at each index) for?
Well we use these as positions too - but instead of being the actual, current, physical position of the card in the deck (which is what the index is), the element represents a position associated with that card, that we want to track as we move that card around.
The nomenclature used with arrays is that the index is written in square brackets ([0], [1], [2]), so if we had an array representing a deck we might refer to the whole deck as deck[] and to access the element at a specific index we'd use deck[0].
So in the diagram below we can see again (on the left) that array index [0] represents the base of the deck. But on the right, we see a starting point representation for our un-randomised, un-dealt deck. Each card in the diagram has both its index (its position in the physical stack) and its element (a position associated with that card, which is currently its starting position).
Using the virtual representation we can follow the elements vs the indexes to calculate the dealing instructions we need to randomise our deck of cards. We can think of the positions stored in the elements as going through 3 phases:
* Pre-randomisation-pre-deals: the element is the original position of the card in the deck (so matches the index).
* Post-randomisation-pre-deals: the element is a new position that we want the card that's originally at that index to end up in.
* Post-randomisation-post-deals: the element again matches the index - we've calculated the deal that needs to happen to make our random order mapping / new permutation / randomised deck a reality.

Randomisation representation
So what does randomisation look like?
* Pre-randomisation: the array element is the original position of the card in the deck (ie the element matches the index).
* Post-randomisation: the array element is a new target position we want the card that is originally at that index to end up in.
We perform the randomisation using Fisher-Yates - there are details of the implementation later. In our virtual representation of a 100 card deck, as shown in the diagram below, we now have array indices [0] to [99] and elements 0 to 99, except the elements are in a random order. The elements represent the index we want each card to end up at - we've performed the randomisation, we just need to calculate the deal to make that a reality.
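As a small concrete illustration (using the fisherYatesShuffle sketched in the Fisher-Yates section below; the shuffled values shown are just one possible example):

// A 10 card deck's virtual representation, before and after randomisation
let deck = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]; // pre-randomisation: element matches index
fisherYatesShuffle(deck);                   // e.g. deck is now [3, 7, 0, 9, 4, 1, 8, 2, 6, 5]
// post-randomisation: deck[2] === 0 means the card currently at index 2
// should end up at index 0 (the base) of the randomised deck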

Dealing representation
Second, we work out how to deal our virtual representation such that the indexes and elements match again:
- Pre-deals: the array element is a new position we want the card that is originally at that index to end up in.
- Post-deals: the array element again matches the index - we've calculated the deal that needs to happen to make our random order mapping / new permutation / randomised deck a reality.
The actual details of the algorithm to work out the deal(s) needed to achieve this are given below, but for now we note that we care about the number of cards to be randomised and the number of piles on the mat we want to use. There will be n deals, where n is the smallest whole number such that the number of piles is >= the nth root of the number of cards (equivalently, piles ^ n >= cards).
As we model moving the cards around via dealing to piles and gathering piles back into decks, this is represented by moving elements to new indices - both in new arrays representing piles on the mat, and back into an array representing the full deck.
In this way, we are taking the new ordering randomly generated in the randomisation step, and using it as a mapping to turn the input permutation of the cards into the output permutation of the cards. And as we noted in the Permutation & Mapping section, that means by picking a random mapping and applying it, we are picking a random permutation from all the possible permutations.
And thus, the deck is now fully randomised.

Randomisation - Fisher-Yates
To generate the new order we want to deal the cards into, we use a well-known algorithm - Fisher-Yates (Wikipedia Link). It allows in-place randomisation in linear time, by iterating through the indices and swapping the current index's element with the element at a randomly chosen index from the not-yet-finalised portion of the array (an element that may itself have already been swapped). I won't go into much detail about it (there is plenty on the Wikipedia page and the internet generally), beyond noting the important point:
It provides an unbiased permutation provided it is implemented correctly and an unbiased random number generator (RNG) is used.
The first condition (that it is implemented correctly, largely meaning avoiding classic out-by-one style errors) I verified in the code I wrote, and the core code is open-sourced on github so you can verify it for yourself if interested (see Source Code and Logging).
Given that, the main focus is on the random number generator (RNG) - what I've used and why. For the RNG I used the cryptographic one available in all modern web browsers, and I used it along with rejection sampling to generate my 0-N values. Why I've used those I'll explain below, but for context I encourage you to read the Potential Sources of Bias section of the Wikipedia article.
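As a minimal sketch of the loop itself (assuming an unbiased randomInt(n) helper returning an integer in 0 to n inclusive - one possible implementation is sketched in the Rejection Sampling section below):

// Fisher-Yates: walk down from the top index, swapping each element with
// the element at a random index in the not-yet-finalised part of the array.
function fisherYatesShuffle(deck) {
  for (let i = deck.length - 1; i > 0; i--) {
    const j = randomInt(i); // unbiased integer in 0..i inclusive
    [deck[i], deck[j]] = [deck[j], deck[i]];
  }
  return deck;
}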
Cryptographic Random Number Generators
One of the reasons I wrote Soft Shuffle in Javascript - other than portability - is that web browsers give easy access to cryptographic random number generators (CRNG - Wikipedia Link). In a modern browser running on modern hardware, this is a Pseudo Random Number Generator (PRNG) combined with source(s) of entropy from the operating system (such as a Hardware Random Number Generator (HRNG - Wikipedia Link) if available, and general hardware based entropy if not).
The mixing in of entropy in a CRNG overcomes the issues a PRNG has for a shuffling algorithm. A good CRNG both uses entropy to generate a random seed to start the algorithm, and mixes a source of entropy into the working state of the PRNG, to prevent the seed-derived limit on reachable permutations that a plain PRNG has.
How close CRNGs / HRNGs are to being 'perfectly random' is an active area of research. I discuss this near the bottom of the page, as it gets into the interesting philosophical question of 'What is randomness?'. For our purposes, they are random and unbiased - a product of the fact that they are one of the cornerstones of all encryption.

Implementation - Rejection Sampling
One of the other potential sources of bias in Fisher-Yates is modulo bias. This happens if we use a random number generated in a range defined by a data type (ie 0-65,535 for a uint16 - a two byte unsigned integer), and scale it to a much smaller range (0-99 for our 100 card deck) - the remainders are spread unevenly across the numbers, leading to bias.
Luckily, it's quite easy to avoid with careful implementation: we just need to use the random numbers generated, do some bit twiddling, and rejection sample the range we are interested in.
The randomness in Fisher-Yates comes from repeatedly generating a number in the range 0-N, where N decreases as we run through the elements. Rejection sampling is a case of (for each 0-N generation):
- Calculating the minimum number of bits (mBits) that can contain the value N.
- Generating a random uint that has at least that many bits (in practice a uint16 will always suffice).
- Bit-masking off any bits above mBits (in other words, setting them to 0) from our random number.
- Considering the masked random number as an integer again.
- If it's above N, rejecting it and trying again.
By masking off the un-needed bits from our random uint, we ensure that the chance of rejection is always less than 50%. Whilst theoretically we could continually reject indefinitely, the expected number of retries is less than one, so in reality it is not an issue - the number of random numbers needed is small in computing terms, and the code runs on the user's device.
Similarly, technically we are 'wasting' the higher order random bits when we mask, but it makes the code simpler to read and maintain than creating a full bit stream. Plus we can afford to be a little inefficient with our code because, as said, the number of cards in a deck is relatively small in computing terms, and the code runs on the user's device.
The diagram below shows an example of generating a number in the range 0-71. We start with a randomly generated uint16 (value 39721, binary 1001101100101001). We need a 7-bit number (6 bits can only encode 0-63, but 7 bits covers 0-127), so we mask to keep bits 1-7, and mask off bits 8-16. After applying the mask, the remaining value is 41, which is in our range 0-71, so we keep it. If it had been in the range 72-127 we would have discarded it, generated a new uint16, and repeated.
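A sketch of one way to implement that randomInt(n) helper in the browser (illustrative, not the exact open-sourced code):

// Unbiased integer in 0..n inclusive, via the browser CRNG plus rejection sampling.
function randomInt(n) {
  const mBits = 32 - Math.clz32(n);       // minimum number of bits that can contain n
  const mask = (1 << mBits) - 1;          // keep the low mBits, zero the rest
  const buf = new Uint16Array(1);
  while (true) {
    crypto.getRandomValues(buf);          // cryptographically random uint16
    const candidate = buf[0] & mask;      // mask off the unneeded high bits
    if (candidate <= n) return candidate; // reject anything above n and retry
  }
}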

Dealing to an Order
Now we are happy we can generate a random ordering with a virtual representation, how do we actually make that a reality? Let's start with the trivial case, when we have a number of cards that is less than or equal to the number of piles - n-cards-n-piles.
To give a visual explanation of the algorithm I'll be using a mat with 10 piles, numbered 0-9. So if we have 10 cards, we simply generate a new random ordering for the elements 0-9. The dealing instructions are then simply: working from index [9] down to index [0], deal each card to the pile given by the element at that index (as index [9] is the top of the deck, index [0] is the base, and we deal from the top of a deck).
(Note that the card numbers in the picture below are black - to denote these are the original positions of the cards, now randomised).

Start at the middle - the second deal
Ok, but we will almost always have more cards than piles - we want the algorithm to scale to 100+ cards. Let's say we are happy to do two deals - what can we do with that?
The key to all this is looking at the second deal, and what happens if we have n cards and at least sqrt(n) piles, so:
number of piles * number of piles >= number of cards
To make following this simpler, we'll consider the case of 10 piles for 100 cards - and we will use our mat labelled 0-9 for the 10 piles. Something interesting happens if we deal the top ten cards so that one goes to each pile, then do the same with the next ten, and the next ten, until we've dealt the whole deck out; and then gather those 10 piles back into a deck by stacking the piles one on top of another, in order.
Those first ten cards we dealt from the top of the deck are now stratified every 10 cards through the deck, starting from the bottom of the deck. And the next 10 cards are stratified but one higher in the deck, and so on and so on.
In fact, as long as we take note of the order we stack the piles back into the deck, the new ordering is entirely deterministic.
(as an aside, this is why pile shuffling is in most cases a bad way of randomising a deck).
Ok, but what practical use is it being entirely deterministic? Well, let's take our 100 card deck (with each card labelled from 0-99). Let's say we set that deck such that:
The top ten cards of the deck are 00, 10, 20, 30, 40, 50, 60, 70, 80, 90.
Let's then deal them out based on the first digit (or tens value if you prefer to think of it that way) - so card 00 is dealt to pile 0, card 10 is dealt to pile 1, and so on. We could also have set the next 10 cards in our deck to be 01, 11, 21, 31, 41, 51, 61, 71, 81, 91.
So we can again deal these out based on the first digit - 01 to pile 0, 11 to pile 1, and so on. The last pile is now 90 on the bottom with 91 on top of it, the one to the left is 80 on the bottom with 81 on top. And you can probably see where this is going now! We keep dealing out the entire deck (which we've set into groups of ten sharing the same second digit), and as long as we place pile 9 onto pile 8, and that new pile onto pile 7 (and so on), we've now sorted the deck into 0-99 (0 on the bottom, 99 on the top).
(Note that the card numbers in the diagram below are red - denoting that these are the new randomised positions which we want to deal the deck into.)

Which, of course, matches our virtual representation of a deck, and looks suspiciously like our picture of the dealing representation from the randomisation section - with the caveat of us having to set the deck beforehand to have 00, 10, 20, 30, 40, 50, 60, 70, 80, 90, as the top ten cards of the deck.
If we look closer at dealing out those top ten cards, we were putting one onto each pile before moving onto the next 10 cards. What that means is the order 00, 10, 20, 30, 40, 50, 60, 70, 80, 90 are in is irrelevant - whatever order they are in, we can still deal one to each pile before we move onto the next 10 cards in the deck. And the fact the ordering is irrelevant is what allows us to set the deck correctly in the first deal.
Back to the beginning - the first deal
So we need to set the deck such that a permutation of the cards 00, 10, 20, 30, 40, 50, 60, 70, 80, 90 forms the top ten cards of the deck, but we don't care which permutation. The obvious thing about this set of cards is they all have the same second digit, so for the first deal we deal to piles based on the second digit - so for example 50 (one of our set) goes to pile 0, 51 goes to pile 1, 52 goes to pile 2, and 59 ends up in pile 9.
If we deal the entire 0-99 deck in this way, pile 0 will contain a permutation of 00, 10, 20, 30, 40, 50, 60, 70, 80, 90, pile 1 will contain a permutation of 01, 11, 21, 31, 41, 51, 61, 71, 81, 91, and so on, with pile 9 containing a permutation of 09, 19, 29, 39, 49, 59, 69, 79, 89, 99. Placing pile 0 on top of pile 1, that on top of pile 2, and so on, will give us a deck that satisfies our starting requirements for the second deal.
(Note again the red card numbers in the diagram below - denoting that these are the new randomised positions which we want to deal the deck into.)

As explained, for the second deal, which permutation of 00, 10, 20, 30, 40, 50, 60, 70, 80, 90 we have is irrelevant. Which means the order the deck starts in for the first deal is irrelevant - all the second-digit-is-0s will end up in pile 0 whatever order the cards are in before dealing. A random starting order for deal one simply means that the number of cards in each pile isn't consistent throughout the deal, only by the end - so early in the deal pile 0 could have 4 cards while pile 1 has 0 cards, pile 2 has 1 card, etc.
So taking the first and second deal together, we have a physical dealing algorithm that in two deals takes a random ordering of 100 card numbers and orders them fully. Which is exactly our definition in our randomisation representation. And taken along with what we showed in the Randomisation section and the Permutations & Mappings section means we now have a method of Fully Randomising a deck of 100 cards using 10 piles.
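To make the two deals concrete, here's a tiny trace with 4 cards and 2 piles (target numbers shown; the starting permutation is just an example):

// Deck (base -> top): [2, 0, 3, 1]
// Deal one (by 'second digit', card % 2), dealing from the top:
//   1 -> pile 1, 3 -> pile 1, 0 -> pile 0, 2 -> pile 0
//   pile 0 (base -> top): [0, 2]   pile 1 (base -> top): [1, 3]
// Gather forwards (pile 0 onto pile 1): deck is now [1, 3, 0, 2]
// Deal two (by 'first digit', floor(card / 2)), dealing from the top:
//   2 -> pile 1, 0 -> pile 0, 3 -> pile 1, 1 -> pile 0
//   pile 0: [0, 1]   pile 1: [2, 3]
// Gather backwards (pile 1 onto pile 0): deck is now [0, 1, 2, 3] - sorted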
Things to note
Interesting things to note:
- Both deals end up with 10 cards in each of the piles, which are then gathered back into a deck, but they arrive in the piles differently.
  - In deal one the cards are dealt into the piles based on the second digit.
    - As the cards start off fully random, you don't get one card in each pile then repeat.
    - They accrue unevenly, but always end up with 10 in each pile.
    - The piles are gathered 'forwards', in that pile 0 is placed on pile 1, then that pile on pile 2, and so on.
  - In deal two the cards are dealt into the piles based on the first digit.
    - As the cards start off partially ordered, you get exactly one card in each pile then repeat.
    - Thus they accrue evenly, but still always end up with 10 in each pile.
    - The piles are gathered 'backwards', in that pile 9 is placed on pile 8, then that pile on pile 7, and so on.
- Deal one gathering forwards and deal two gathering backwards is a product of our index[0]-is-the-bottom-of-the-deck convention. If you reversed that, the cards would come out in the reverse order, with card 0 on top.
- This is in actuality a sorting algorithm, but one optimised for physical sorting rather than virtual.
  - Moving a card to a specific point in a deck is an expensive operation for a human to perform, so a quicksort-style algorithm would take much longer for a human to perform.
So to recap, after the first deal but before gathering forwards:

And after the second deal but before gathering backwards (and remember we want 00 at the base of the deck, 99 on the top):

Extending to the 2-pass general case
Obviously, we don't want to limit ourselves to 100 cards and 10 piles, so we need to look at what's actually going on here when we sort by first digit and second digit. By doing so we can extend to handle any number of cards and piles (and at times, trade off the two).
The nomenclature for a division operation, specifically integer division (a.k.a. Euclidean division - Wikipedia Link) is:
Dividend / Divisor = Quotient + Remainder/Divisor
because we are satisfying:
Dividend = Quotient x Divisor + Remainder
or to put it another way:
- Dividend: The number being divided.
- Divisor: The number by which the dividend is divided.
- Quotient: The result of the division.
- Remainder: The number left after the division.
In computing we refer to the modulus returned by the modulo operation (% - Wikipedia Link) instead of Remainder. They are identical if using positive numbers, which we will be.
If we re-inspect our algorithm of sorting by the first digit and second digit we see that we appear to be sorting by the Quotient and the Remainder (respectively) of dividing the card number by the number of piles. And by deliberately picking the number of piles and number of cards as 10 and 10*10, the Quotient and Remainder nicely line up with our base 10 decimal counting system (Wikipedia Link). In fact the 2-pass general case is a specific case of the n-pass general case, but for now thinking in terms of the Quotient and the Remainder helps with our explanation.
2-pass general algorithm
So now we're looking at Quotients and Remainders, we can start to express the two pass / two deal algorithm with chunks of pseudocode (Wikipedia Link), using the modulo operation (%) for our remainder operation. With some setup:
- deck[] is an array representing the deck, already randomised using Fisher-Yates (so the element at each index represents the new index the card currently at that index should end up at - think of it as the labels on our cards).
- matPiles[][] is a 2D array (Wikipedia Link) representing the piles on our mat - the first index is the number of the pile (think of our 0-9 labelled mat), the second index representing position in that pile (remember 0 is the base of the pile).
- Gathering 'forwards' means gathering the piles in ascending order (pile 0 on pile 1, that on pile 2, and so on); gathering 'backwards' is in descending order (pile n on pile n-1, that on pile n-2, and so on).
we can express the two pass algorithm as:
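A sketch in JavaScript rather than strict pseudocode, using the setup above (illustrative, not the exact open-sourced code):

// Two-pass deal and gather; deck[] must have length <= numPilesOnMat squared.
function twoPassSort(deck, numPilesOnMat) {
  // Pass one: deal by remainder (the 'second digit'), then gather forwards.
  let matPiles = Array.from({ length: numPilesOnMat }, () => []);
  for (let i = deck.length - 1; i >= 0; i--) { // deal from the top of the deck
    matPiles[Math.floor(deck[i] % numPilesOnMat)].push(deck[i]);
  }
  deck = [];
  // Gather forwards: pile 0 onto pile 1, onto pile 2... so pile 0 ends up on top,
  // which in array terms means concatenating the piles in descending order
  for (let p = numPilesOnMat - 1; p >= 0; p--) deck.push(...matPiles[p]);

  // Pass two: deal by quotient (the 'first digit'), then gather backwards.
  matPiles = Array.from({ length: numPilesOnMat }, () => []);
  for (let i = deck.length - 1; i >= 0; i--) {
    matPiles[Math.floor(deck[i] / numPilesOnMat)].push(deck[i]);
  }
  deck = [];
  // Gather backwards: pile n onto pile n-1... so pile 0 ends up at the base
  for (let p = 0; p < numPilesOnMat; p++) deck.push(...matPiles[p]);
  return deck; // now fully sorted: deck[i] === i
}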

Note: Surely we don't need floor() at all?
Indeed - if we are using a strongly typed language and/or purely doing the calculations as integer calculations (which ideally we should be), then neither the division nor the modulo would output anything but integers. Thus using them immediately as an array index would be safe. Practically though, a lot of modern languages, especially ones I might expect a user to try this in (javascript, python, etc), aren't strongly typed. And more than that, they are permissive about the modulo operation taking and outputting non-integer values. Thus floor / truncate / casting to an integer is worth noting in the pseudocode.
Why does the 2-pass general case work?
To understand why, let's look at a number grid for when we have 9 piles, and 9 * 9 = 81 cards. With 81 cards we have card numbers 0-80, and our piles are 0-8 (we're using 0-based numbering). We'll look at a very specific permutation of the card numbers after the first pass (deal), and how we always end up with an ordered deck after the second pass (deal). We'll highlight where the card numbers go in the piles and why, this time with non-decimal-aligned numbers of cards and piles. It shows visually why two partial sorts combined with two gathers give us a full sort when we have a number of cards that equals the square of the number of piles.
The key thing to keep in mind is - we're always sorting horizontally (into columns representing piles). By doing multiple passes (deals) and gathers this sorts the cards into different regions of the deck each pass until they are fully sorted. We are also always using integer maths.
After the first pass (Deal One), we have sorted by modulo 9 of the card number (e.g. 39 mod 9 = 3, because 9 * 4 = 36, so a modulus of 3, hence card number 39 ends up in column/pile 3.). We highlight every card number that has a modulus of 0 (blue) and 8 (yellow) to track them across the 2 passes/deals.
If we look at the light grey labels above & left of the card number grid we can see that each column/pile has every multiple of nine plus a fixed modulus (so 9 * X + Y, where Y is the column/pile the card number has been sorted into, and X is the row). So we know that whatever order these happen to be dealt in, every card number in the blue column/pile will always be in that column/pile after pass/deal one. And the same for every other column/pile of numbers, including the yellow.
The light grey labels to the right of the card number grid show the array indices - using the [0] is the base of the pile convention. The light grey labels below the card number grid show the gather order after pass/deal one - pile 0 onto pile 1, that onto pile 2, and so on. So the card numbers that were in blue column are now at the top of our new deck after the first deal.
Given that the blue card numbers are on the top of the deck, they'll be dealt out first. In the second pass (Deal Two), we are sorting by the integer division quotient (so 17 / 9 = 1.888 = 1 in integer division). Once again we're sorting horizontally (into columns representing piles), but we're sorting the blue cards from deal one first. As there was one card number for each X in 9 * X + Y, and we're sorting by the quotient after dividing by 9, we fill the 0-index of each pile. The ordering of those blue cards is irrelevant, since exactly one goes to each pile.
So the ordering of the card numbers in the deck we start with is irrelevant - 0, 9, 18, ..., 72 will always end up in the blue column/pile after deal one. And the order they are in within that pile is irrelevant, because the blue card numbers will always end up fully sorted when they are dealt to piles in deal two. The same goes for each other column of numbers (including the yellow) in the first pass/deal. Therefore, we always end up with a fully sorted deck, whatever the card-numbering permutation in our initial deck.

With the X-pass fully general algorithm, we'll see that the first pass card-number-modulo and second pass card-number-division-quotient are in fact just simplifications of a general formula - the modulo of the quotient of a pass-number-based division.
Looking at the diagram above you can also see how having more cards than the square of the number of piles overwhelms our two partial sorts and gathers. If we add just one more card - card number 81 (9 * 9, remembering this is 0-based, so 81 is the 82nd card) - it would have a modulus of 0 in the first pass/deal and end up in column/pile 0.
So we'd end up with 10 cards in column/pile 0 after the first deal. When we sort by the quotient, it would either end up in column/pile 9 (which doesn't exist in our example), or column/pile 0 if we used the generalised algorithm in the next section. Either way, it won't end up in the correct position, and will throw out the whole ordering after the second pass/deal.
What if I don't have exactly √(n) piles?
We've just discussed what happens if we have more cards than the number of piles squared - but what if we have fewer? Will that break everything?
Thankfully no - let's take our previous nine column/pile example, but in this case use 50 cards (so card numbers 0-49). As you can see, we still have card numbers in each pile after pass/deal one - we still have the full range of modulo values, just fewer of each. The card numbers we're using that were in the blue column in the last example are in the blue column again this time - that doesn't change, it's just a subset. Again, they could be in any order in the starting deck and they'll still be in the blue column, just in a different order.
And again, the blue card numbers end up on the top of the deck and thus are the first to be dealt again. Except there's fewer of them, so what happens?
Well, what does change is that fewer piles are used in the second pass/deal. The explanation being that we're now using quotients, but don't have enough values to reach those higher numbered piles - the quotients never reach high enough. If we have six blue card numbers in the column in the first pass/deal, we'll have six piles in the second pass/deal. We may not have the same number of cards in every pile, but the piles will be tall enough that each card is in its correct place in the correct pile.
And again, the ordering of the cards in the blue column/pile after pass/deal one is irrelevant when they are dealt out in pass/deal two. So we can still fully order the cards with two deals and two gathers.

2-Pass Worked Examples
In the previous section, I deliberately ordered the card numbers to highlight the patterns the algorithm is based on, but let's imagine the card numbers were one of all the other ~5.8 * 10 ^ 120 permutations.
As discussed, the same card numbers will still end up in each column/pile after pass/deal one, just in a different order. The order they are dealt will be random, so we'll see each pile fill up with the same number of card numbers, but we will likely see multiple cards in some columns/piles before we see one card in other piles. And then after pass/deal two, they'll be fully ordered.
I'll show worked examples of both 81 cards (9 * 9 cards, 9 piles) and 58 cards (still using 9 piles), with the card-numbers randomised before pass/deal one, as they would be in a real example. I'll also highlight subsets of the cards in two different ways. I'll use the term backwards to denote highlighting a set of cards that ends up in a specific position at the end of the deal. I'll use forwards to denote highlighting cards that start in a specific set of positions. This lets us see how they flow through the piles through the course of the deals.
We don't have any pile numbers associated with the new, first, random number grid, as it just represents a deck. But I have used a deck ordering convention that matches the pile notation, for consistency. Labels below/right of the number grid show this, with top right in the number grid being the top of the deck, and bottom left being the base.
So shown below is a full 81 cards, using backwards highlighting of the cards that end up at the top and bottom of each pile after pass/deal two. This matches the highlighting in the previous section.

Forwards highlighted - 81 cards, 9 piles
Shown below is the same 81 card permutation, but this time I've highlighted the top 18 cards in the starting deck, in two sets. This demonstrates how the cards at the top of the deck are spread out unevenly into different columns/piles after deal one, and then spread out randomly after the second deal.

Backwards highlighted - 58 cards, 9 piles
Shown below is 58 cards, 9 piles. Once again using backwards highlighting of the cards that end up at the top and bottom of each pile after pass/deal two.

Forwards highlighted - 58 cards, 9 piles
And finally, 58 cards, 9 piles, but this time I've again highlighted the top 18 cards in the starting deck, in two sets.

Extending to the X-Pass General Case
As noted - if we use more cards than the number of piles squared, the 2-pass algorithm breaks down. But we can counteract that with more passes. In fact, both the 1-pass and 2-pass algorithms are just simplified versions of the fully general, X-Pass algorithm, where X is the smallest whole number where number-of-piles ^ X >= number-of-cards. I'll discuss this more later, but that also implies we can trade off the number of passes against the number of piles we use on the mat.
The two important parts of the X-Pass algorithm are:
1. The pile matPosition a card cardNumber is dealt to in pass passNumber is given by:
matPosition = (cardNumber / (numberOfPiles ^ passNumber)) % numberOfPiles
where:
- matPosition, cardNumber and passNumber are 0-based, but numberOfPiles is not.
- All calculations are integer (Euclidean) arithmetic.
- When we are not using a strongly typed language to ensure integer calculations, a surrounding floor(), truncate() or cast operation is often needed for matPosition to be safe to use as an array index.
2. The gather direction - forwards (ascending) or backwards (descending) - alternates each pass, with the final pass always being backwards.
Thus the full pseudocode is:
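As a JavaScript sketch, this is the same core as the sketch in the Algorithm TLDR section (assuming deck[] has been randomised and numPasses computed as described), with the two parts above labelled:

for (let passNumber = 0; passNumber < numPasses; passNumber++) {
  const matPiles = Array.from({ length: numberOfPiles }, () => []);
  for (let i = deck.length - 1; i >= 0; i--) { // deal from the top of the deck
    // Part 1: the quotientFunction giving the pile for each card
    const matPosition = Math.floor(deck[i] / numberOfPiles ** passNumber) % numberOfPiles;
    matPiles[matPosition].push(deck[i]);
  }
  // Part 2: gather direction alternates, final pass always backwards
  const forwards = (numPasses - 1 - passNumber) % 2 === 1;
  deck = [];
  if (forwards) for (let p = numberOfPiles - 1; p >= 0; p--) deck.push(...matPiles[p]);
  else for (let p = 0; p < numberOfPiles; p++) deck.push(...matPiles[p]);
}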

Note: there is an old joke about the 2 hard problems in computing - naming things, cache invalidation, and out-by-one errors. Out-by-one errors are definitely something to watch out for when implementing this, especially given the prevalence of 0-based numbers. Though given those largely align with array indexing norms it's not too bad. But in terms of naming things, it was tricky deciding what to call the function that gives the position on the mat for each card. moduloFunction - that just sounds like %. modOfQuotientFunction - that is only half right, and you'll end up calling it modOfQuotientOfPowerOfPass or similar. So I simply called it the quotientFunction for brevity.
1-Pass and 2-Pass from X-Pass
I mentioned earlier that the 1-Pass and 2-Pass versions of the algorithm are simplifications of the X-Pass, and we can see that by inspection of the function:
matPosition = (cardNumber / (numberOfPiles ^ passNumber)) % numberOfPiles
For a 1-Pass algorithm, we can see that this simplifies to:
matPosition = (cardNumber / 1) % numberOfPiles
matPosition = cardNumber % numberOfPiles
as passNumber will only ever be 0, and any number to the power of 0 is 1.
For a 2-Pass algorithm, for the first pass passNumber will be 0 again (as it's 0-based), and the second pass, passNumber will be 1. So we can see that simplifies the second pass to:
matPosition = (cardNumber / numberOfPiles) % numberOfPiles
as any number to the power of 1 is the number.
But wait! That's not the quotient of cardNumber / numberOfPiles as you said in the 2-Pass description - it's the modulo of the quotient? And that's true, but as long as we've got numberOfPiles as both the divisor and the mod value, and our cardNumber is < numberOfPiles ^ 2, the mod of the quotient and the quotient evaluate to the same value (assuming integer arithmetic) - e.g. with 100 cards and 10 piles, floor(99 / 10) = 9 and 9 % 10 = 9. And using modulus and quotient maps to ones and tens for 100 cards / 10 piles and decimal numbering - which makes the usual case (2-Pass) easier to explain and grasp for the first time.
Why does the X-Pass Work?
As with the 2-Pass explanation, let's look at a number grid, this time with 4 piles and 4 * 4 * 4 = 64 cards. With 64 cards we have card numbers 0-63, piles 0-3, and this time passes numbered 0-2. We again look at a specific permutation to show what's going on, along with highlighting a specific subset of the card numbers. As we have 3 passes, the gather operations will be backwards, forwards, backwards.
Again, always keep in mind we're doing a series of partial sorts horizontally (into columns representing piles). By doing multiple passes (deals) and gathers, this sorts the cards into different regions of the deck each pass until they are fully sorted. If we compare with the 2-Pass: again we sort into a subset in each column/pile in the first pass/deal, and that becomes a stratified subset in the final pass. This time there aren't enough piles for it to happen directly in the second pass though; instead the second pass is an intermediate sort - each column/pile from the first pass becomes 1 / number-of-piles of the card numbers in each column/pile in the second pass (a block stratification), which in turn becomes 1 / number-of-piles stratifications in the third pass/deal.
We can see that in the first pass/deal, the same subset of card numbers will always end up in each column/pile, but in a random order depending on the initial deck ordering. In the second pass all of those cards will end up in a specific region of the deck, partially sorted into 1 / number-of-piles subsets of the subsets. But that places them in the correct positions of the deck for those subset-subsets to be fully ordered in the final pass/deal.
Think of it this way - if we have 4 piles, we can:
- Sort 1 set of 4 unsorted cards directly in a single pass.
  - So (1 * 4) -> (4 * 1).
- Sort 16 cards into 4 sorted-sets of 4-card-unsorted-subsets in one pass.
  - Sort those 4 subsets of 4 directly in a second pass, 4 times.
  - So (1 * 16) -> (4 * 4) -> (16 * 1).
- Sort 64 cards into 4 sorted-sets of 16-card-unsorted-subsets in one pass.
  - Sort those 4 subsets-of-16 into 4-card-unsorted-subsets, 4 times, in a second pass.
  - Sort those 16 subsets-of-4 directly in a third pass, 4 times.
  - So (1 * 64) -> (4 * 16) -> (16 * 4) -> (64 * 1).
  - Or (1 * 64) -> (4 * 16) -> ((4 * 4) * 4) -> (64 * 1) to be precise.
- And so on, as a recursive sort with X passes.
- Using a notation of (sorted-sets * unsorted-subsets-of-cards).

What if I don't have exactly Xth-root(n) piles?
As with the 2-Pass, this still works, just with fewer cards in each pile and/or fewer piles, depending on the pass.

X-Pass Worked Examples
As with the 2-Pass Worked Examples, these are worked examples with the card-numbers randomised before pass/deal one. I'll show worked examples of both 64 cards (4 * 4 * 4 cards, 4 piles) and 42 cards (still using 4 piles). I'll again highlight subsets of the cards using the term backwards to denote highlighting a set of cards that ends up in a specific position at the end of the deal, and forwards to denote highlighting cards that start in a specific set of positions.
There aren't any pile numbers associated with the first, random number grid, as it just represents a deck. But I have used a deck ordering convention that matches the pile notation, for consistency. Labels below/right of the number grid show this, with top right in the number grid being the top of the deck, and bottom left being the base.
So shown below is a full 64 cards, using backwards highlighting of the cards that end up stratified at regular intervals after the third pass/deal. This matches the highlighting in the previous section. It shows how cards are sorted into large subsets, then smaller subsets which then finally get fully sorted.

Forwards highlighted - 64 cards, 4 piles
Shown below is the same 64 card permutation, but this time I've highlighted the top 18 cards in the starting deck, in two sets. This demonstrates how the cards at the top of the deck are spread out unevenly into different columns/piles after the first deal, and then spread out randomly via the second and third deals.

Backwards highlighted - 42 cards, 4 piles
Shown below is 42 cards, 4 piles. Once again using backwards highlighting of the cards that end up stratified at regular intervals after the third pass/deal.

Forwards highlighted - 42 cards, 4 piles
And finally, 42 cards, 4 piles, but this time I've again highlighted the top 16 cards in the starting deck, in four sets.

Mat Label Mapping
So far we've purely used an array representation of the mat - in other words, each pile is just represented with an index of 0-n. But the physical mats we're using have a letter+number label for each space.
I suspected, and initial testing confirmed, that just using numbered spaces doesn't work very intuitively - there's a noticeable lag as you work out where each number is before placing a card on the pile. Using a letter and a number is more intuitive - letters for rows, numbers for columns. It also makes it easier to use a subset of a mat (discussed below).
In terms of implementation, it is simply a lookup table of labels that match the spaces (or subset of spaces) on the mat we're using. The labels are created when we initialise the virtual representation of the mat. When we read out the instructions (which pile to deal each card to, in order), we just read the label from the same index in the lookup table instead.
Using a lookup table both makes the current code cleaner, and makes some of the future alternatives (discussed below) simpler to implement at a later date.
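A sketch of what such a lookup table might look like (illustrative only - the real table is built from the selected mat or subset of spaces, and buildMatLabels is a hypothetical helper):

// Letters for rows, numbers for columns: 'A1', 'A2', ... 'B1', 'B2', ...
function buildMatLabels(numColumns, numRows) {
  const labels = [];
  for (let r = 0; r < numRows; r++)
    for (let c = 0; c < numColumns; c++)
      labels.push(String.fromCharCode(65 + r) + (c + 1)); // 65 is 'A'
  return labels;
}

const labels = buildMatLabels(5, 2); // -> ['A1'..'A5', 'B1'..'B5']
labels[matPosition];                 // read out the label instead of the raw index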

Odd numbers are your friend
Also mentioned in part in the Usage page Concepts section, the choice of 5 * 2 (5 columns, 2 rows) and 7 * 3 for the first two mats is very deliberate, based on the initial testing and a bit of maths (and my knowledge of HCI, board games, CCGs, etc).
Firstly - odd numbers make it easier to intuit where the numbered piles are - and it's always my recommendation when selecting a number of piles yourself. We're ok with two (Left vs Right, Up vs Down, etc). Three is the most natural though (3 is the magic number, after all), as we naturally think of:
Left, Centre, Right.
and the numbers associated with them - we are used to a central focus, and then paying attention to each side. That easily extends to five as we still have a concept of centre and sides, but now there's an in-between for each pair:
Left, Left-of-Centre, Centre, Right-of-Centre, Right
again I believe that maps to what we're used to with our vision system - a central focus, peripheral left/right, and left and right of central focus.
My experience is 5 is the sweet spot (though it will vary person to person), with a flow state easily achievable dealing to 5 columns. Extending to 7 works well though, conceptually along the lines of:
Left, Right-of-Left, Left-of-Centre, Centre, Right-of-Centre, Left-of-Right, Right
9 seems to be where it breaks down, as it ends up being Left-of-Centre-of-Right or similar. Possibly it's too many to hold naturally, but it can be trained with enough effort.
Either way, this is part of why I have 5 and 7 columns on each mat respectively. The other part is the numbers achievable with a few common (sub)sets of piles on the mats:
- 5 * 1 - gives 25 in 2 passes - for small decks such as action decks, 2 passes works.
  - At 3 passes it covers 125 cards - and for ~26-35 cards it can feel simpler to use 3 passes of 5 * 1, as it's so quick.
- 7 * 1 - gives 49 in 2 passes - for most ancillary decks in games, 2 passes works.
  - Covers 40-card CCG decks.
- 5 * 2 - gives 100 in 2 passes - covers a lot of main decks, 60 and 100 card CCG decks, and a normal deck of playing cards.
  - At 3 passes it will cover 1000 cards - which is more than you'd ever do, but nice to have the possibility.
  - Covers enough at 2 passes to work as the smaller of the two full mat sizes.
- 7 * 2 - 196 in 2 passes - covers most main decks.
- 5 * 3 - 225 in 2 passes - again covers most main decks; it is more a matter of preference between this and 7 * 2.
- 7 * 3 - 441 in 2 passes - covers the vast majority of even large-deck deck-driven games with multiple expansions.
  - Covers enough at 2 passes to work as the larger of the two mat sizes, and allows a subset of piles to be used.
And as you can see, all of the edge cases are covered by 3 passes - even the smaller mat covers 1000 cards at 3 passes, which is more than any game I know of, let alone one that regularly gets played.

Checking settings and piles vs passes
With all the possible combinations of numbers of columns and rows discussed above, it can be a little confusing for a user to pick which to use.
Realistically though, the number of cards being randomised is fixed, so there is only one trade-off - total number of piles used versus the number of passes (deals). So once the user has input the number of cards, we can give the number of passes their current mat selection implies, and information about other options.
For the initial version, this is achieved using the 'Check Settings' button and dialogue. This gives:
- A basic judgement of how good the current settings are (GOOD if 1 or 2 passes, OK if 3, POOR if 4 or more) - a sketch of this calculation follows below.
- Info on how many mat spaces the currently selected pile config gives.
- Info on how many piles would be needed for either 2 or 3 passes - so what the user might be aiming for.
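The pass calculation behind that judgement is simple - a sketch (illustrative, using the thresholds above; not the exact code from the page):

// Smallest number of passes with numPiles ^ passes >= numCards, plus a judgement
function checkSettings(numCards, numPiles) {
  let passes = 1;
  while (numPiles ** passes < numCards) passes++;
  const judgement = passes <= 2 ? 'GOOD' : passes === 3 ? 'OK' : 'POOR';
  return { passes, judgement };
}

checkSettings(100, 10); // -> { passes: 2, judgement: 'GOOD' }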
Currently, if the initial selection doesn't give what the user wants, they have to look at the number of piles needed and do some quick mental maths (or just try a few of the presets). This is another area for development, depending on feedback from wider usage - given the number of cards, the page could suggest a config, for example.
There are also options to manually set the numbers, to try specific combinations (within some restrictions).

Alternatives
For the first version of Soft Shuffle I've gone with letters and numbers and individual pages of instructions. There are some next steps I'd like to take that have occurred to me while I've been developing Soft Shuffle. How that might work via crowdfunding I've discussed in The Plan subsection of the Overview Page.
Features I'm considering include:
- Images instead of / in addition to the letter/number labels - a small thumbnail per card showing / highlighting the pile / place on the mat that card is to be dealt to. Colours and patterns may aid in speeding up recognition even further.
- Audio instructions - the instructions could be read out so the user isn't focused on a screen, just dealing to a mat.
- Animation and moving away from a static page of instructions - that might be as simple as a timer that flips to the next page after a set number of seconds, or as complex as a rhythm-video-game style with the instructions delivered as a stream.
Ultimately, none of these alter the underlying algorithm - they are just ways of displaying the same information differently, in ways that might make it easier / quicker for a user to process it. Because the instructions are generated purely in terms of the array indices, bolting these on isn't particularly tricky. I am also very much interested in suggestions from the community as people get time using Soft Shuffle.
Source Code and Logging
As well as the explanation on this page, it's important to me to have open sourced the code behind the shuffler for a few reasons:
- Trust - it's important that the code can be inspected to show it does what I say.
- Offline availability - I wanted a self contained single-page version that could be used offline.
- Understanding - for experienced software engineers, it's often easier to just inspect the code to understand what it's doing.
- Contributions - as I continue developing it, I'll be happy to take community contributions.
The core code that the Shuffler page runs on is open sourced on github at https://github.com/SoftShuffle/soft-shuffler. There are cosmetic differences on the Shuffler page to make it fit in with the aesthetics of the rest of the site, but these are minor. A version that fully matches the open sourced version is on the Classic page. As new features are added to the page on this site they'll be pushed to github to match.

Logging
As well as commenting the code, there is quite a lot of logging turned on by default. You don't have to download the source from github to make use of the logging, though the log output is slightly cleaner there, as it has nothing from the main website.
To view the logging, I would normally recommend Chrome/Chromium based desktop browsers - although you can get the logs from most desktop browsers, Chrome displays arrays the best in my experience. I've not come across a mobile browser that lets you view logs, though if you find one that works well, let me know and I'll add it here.
Currently in Chrome you do:
3-vertical-dots menu -> More tools -> Developer Tools -> click the Console tab
or
View -> Developer -> JavaScript Console.
The logging is an active area of development, but it is already possible to fully follow through what the code is doing, from the initial un-randomised array -> randomised -> all passes -> sorted array -> dealing instruction generation. When reading the logging, remember deck[0] is the base of the deck!

Fully Random vs Perfectly Random
To end the technical discussion, something that verges on the philosophical - what does random actually mean?
The reason I use Fully Random rather than Perfectly Random is partly practical, partly pedantic and partly philosophical, and related to that question.
From a practical, pedantic standpoint - I've been an engineer and an academic for years, and calling something 'Perfectly XXX' makes me instantly suspicious. It's never that simple. In the case of Soft Shuffle, I'm confident you could call it Perfectly Random, if you were confident you had a perfect random number generator.
So again, from a practical perspective - I don't know what hardware you're running Soft Shuffle on. And even if I did, to assume that any algorithm for generating random numbers is perfect and can never be bettered is naive. To be clear - we're discussing how you approach perfect randomness with physical means. On the order of 'can I have a perfect die?' - 'can I have it machined to the perfect number of atoms?', 'how might friction and gravity affect it?'. That level of esoteric discussion. That said, I am confident in the quality of the random number generators currently available, in part because they're what underpins a lot of modern security - so if you're using a modern browser on modern hardware, it's fine, it'll be Fully Random.
But going beyond that into the realms of the philosophical - what does the term 'A Perfect Random Number Generator' even mean? The statement 'each permutation of its possible output being equally likely' is an easy statement to make, but it's difficult to form a practical conception of how it might work. Let's take two thought-experiment examples to highlight the problem.
First - is atomic decay random? This is the most well known of a collection of physical-phenomena based sources of randomness, generally considered the gold standard. From everything we've measured, and our current understanding of quantum mechanics, it appears to be random. Hook Soft Shuffle up to that and you can call it done, right? But what if that understanding evolves, and we reach a point where we can predict when which atoms will decay? Something that appears random but isn't seems unlikely - but how can we rule it out, given open questions about (for example) Pi?
So what about Pi? It turns out Pi is very interesting for discussions of determinism (and randomness). We know pi is irrational (it can't be expressed as a fraction of two integers - the digits go on infinitely without repeating). We also know how to calculate any given digit of pi - I can pick a very large number, and calculate that digit of pi. Thus, it is entirely deterministic. However, looking at a digit of pi (and as many previous digits as you want, barring hitting the actual beginning of pi), you cannot predict the next digit of pi. This looks 'random' - it passes our current measures for determining if something is random. So is it random or deterministic, or is it a case of perspective? Could an RNG based on pi ever be 'perfectly random'? Is it a circular argument of 'if I could pick a random place in pi it would be random, but then I need a separate perfect RNG to pick that place'?
I find 'it appears to be random but is deterministic' an interesting concept to ponder, and to me at least it seems impossible to ever know whether an RNG is truly random, or just operating in a deterministic way we haven't yet uncovered.
It seems to be a subset of the whole question of determinism. Perhaps one day the question of determinism will be proved one way or the other; until then, I question whether you can use the term perfectly random in a meaningful way.
Ultimately for me these are interesting questions to ponder and discuss, but get away from the original more practical question of 'Can I fully randomise this deck of cards for the game I want to play' that we originally started with. And that question I am satisfied I have answered: yes I can, with Soft Shuffle.
