Some Quicksort Variants

First, let’s re­view quick­sort a bit. Below is a (simple) im­ple­men­ta­tion of this fa­mous al­go­rithm:

// quicksort implement
template<typename T>
int partition(vector<T>& a, int lo, int hi) {
    int i = lo, j = hi;
    T v = a[lo];
    while (true) {
        while (a[++i]<=v) if (i == hi) break;
        while (v<=a[--j]) if (j == lo) break;
        if (i >= j)  break;
        swap(a[i], a[j]);
    }
    swap(a[lo], a[j]);
    return j;
}

template<typename T>
void quicksort(vector<T>& a, int lo, int hi) {
    if (lo >= hi-1) return;
    int j = partition(a, lo, hi);

    quicksort(a, lo, j);
    quicksort(a, j+1, hi);
}

Analysis

The com­plex­ity is eval­u­ated through the func­tion:

C(N)=n1+1ni=0n1(C(i)+C(ni1))

In the partition part, n1 com­par­isons are per­formed, then quick­sort is called twice with sizes i and ni1, the prob­a­bil­ity for each el­e­ment in the ar­ray is uni­formly dis­trib­uted as 1n, so we get the above ex­pres­sion.

The so­lu­tion method is de­tailed in books like Introduction to Algorithms, Concrete Mathematics, Introduction to Analysis of Algorithms. Briefly:

  1. Multiply both sides by n.
  2. Write the ex­pres­sion for n1.
  3. Calculate C(n)C(n1) to get a lin­ear re­cur­rence re­la­tion, which can be solved.

Limiting the Worst-case

One fa­mous method to avoid quick­sort’s worst-case is to shuf­fle the in­put ar­ray be­fore us­ing quick­sort. Us­ing Knuth shuf­fle with com­plex­ity O(N) en­sures the al­go­rith­m’s ef­fi­ciency. More­over, shuf­fling the in­put makes this im­prove­ment, of­ten called ran­dom­ized quick­sort, a clas­sic ex­am­ple in an­a­lyz­ing ran­dom­ized al­go­rithms. I have a post dis­cussing the analy­sis of ran­dom­ized quick­sort.

Caution

Link above is in Vietnamese.

Besides shuf­fling, there are other ap­proaches:

  • Instead of choos­ing the pivot at the be­gin­ning or end, choose the me­dian value.
  • Another worst-case is when all val­ues are the same; in this case, use 3way-quicksort.

Removing Recursion for Small Arrays

This is a trick us­able for any di­vide-and-con­quer al­go­rithm: when the re­cur­sion gets small enough that the in­put ar­ray is be­low a cer­tain thresh­old (10 – 15 el­e­ments), us­ing insertion sort will sig­nif­i­cantly im­prove per­for­mance. Just add this to the quick­sort func­tion:

///...
if (hi-lo < CUTOFF) {
 insertion_sort(a, lo, hi);
 return;
}
///...

Eliminating Boundary Value Comparisons

With the im­ple­men­ta­tion above, you can re­move the if state­ment in the while as fol­lows:

  • You can re­move the if when check­ing j be­cause a[lo] < a[lo] never hap­pens.
  • To re­move the if in i, be­fore call­ing quick­sort, set a flag value like MAX_INT. This al­ways en­sures there is a value greater than a[lo].

Median-of-three and Median-of-five

A clas­sic trick to avoid worst-case: sam­ple 3 or 5 el­e­ments, take the me­dian value, and use that as the pivot.

Tukey ninther quick­sort

Use 9 el­e­ments in­stead of 3/5.

3way Quicksort

This is the fa­mous al­go­rithm pro­posed by Dijkstra along with the Dutch flag prob­lem. Used to avoid cases with too many du­pli­cate val­ues in the ar­ray. Interestingly, this quick­sort ver­sion is op­ti­mal in terms of en­tropy. A brief proof is men­tioned in Algorithms, R. Sedgewick.

If any­one won­ders why Sedgewick is men­tioned so much in this post, it’s sim­ply be­cause his PhD the­sis is ti­tled: Quicksort.

A rather mag­i­cal ver­sion by Jon Bentley (author of Programming Pearls, col­lab­o­rated with Sedgewick to prove the en­tropy-op­ti­mal quick­sort):

void quicksort(int l, int u){
    int i, m;
    if(l >= u) return;
    m = l;
    for(i = l+1; i<= u; i++)
        if(x[i] < x[l])
            swap(++m, i); //this is where I'm lost. Why the heck does it preincrement?

    swap(l, m);

    quicksort(l, m-1);
    quicksort(m+1, u);
}

Non-recursive Version

Also an im­prove­ment pro­posed by Sedgewick, us­ing a stack in­stead of re­cur­sion. A very de­tailed C im­ple­men­ta­tion is de­scribed here. This ver­sion in­cludes the im­prove­ments men­tioned above:

  • CUTOFF
  • Median-of-three
  • Non-recursive and op­ti­mized stack size.

This is also the ver­sion of qsort in GNU C.

Sedgewick’s Java ver­sion is also worth ex­plor­ing: source code QuickX.

Samplesort

This is a mod­i­fied ver­sion of quick­sort to run in par­al­lel al­go­rithms. The idea is to use quick­sort to find piv­ots for each process, then run quick­sort on each dis­trib­uted pivot. This ar­ti­cle de­scribes and an­a­lyzes the al­go­rith­m’s com­plex­ity.

References

  1. Cormen, Thomas H. Introduction to al­go­rithms. MIT press, 2009.
  2. Graham, Ronald L., et al. Concrete math­e­mat­ics: a foun­da­tion for com­puter sci­ence.” Computers in Physics 3.5 (1989): 106 – 107.
  3. Sedgewick, Robert, and Philippe Flajolet. An in­tro­duc­tion to the analy­sis of al­go­rithms. Pearson Education India, 1996.
  4. Sedgewick, Robert, and Kevin Wayne. Algorithms. Addison-Wesley Professional, 2011.
  5. Bentley, Jon. Programming pearls. ACM, 1986.