Semantically, cmppt()'s inner loop is comparing rows in a truth table where the values 0 or 1 represent boolean false or true for that particular table entry, and 2 represents ``don't care''. The ``don't care'' values are being ``flattened'' on the fly to the value 0 for comparison ordering purposes. Such flattening is being done repeatedly on each row: merge sort's 35,944 calls to cmppt() (for the reference input file) means that an average of 17.5 flattenings are performed on each of the 4106 truth table entries; quicksort's 2,841,621 calls to cmppt() means an average of 1384 flattenings are performed on each entry. It may or may not prove faster (depending on whether O() behavior was common) to make a copy of the entire truth table, mutate all the entries in it once to flatten them, and then re-write cmppt() to perform faster comparisons by using entries from the copy of the table, and write a customized sort routine that perform element swaps on entries in both the copy and the original table. This adds both a per-program-execution cost of O(N) time and O(N) space (probably with fairly large constants, however) to copy and flatten entries in the copied table, and additional overhead (O( number of cmppt() calls)) in the new sorting routine to swap four elements every time the unmodified sort routine would have swapped two elements. The choice to quasistatically choose between this transformation and the original might look something like this:
Once we consider compressing the ``don't care'' value down to 0, another transformation suggests itself. If each truth table entry is considered a vector of bits, then even cmppt_using_copy() is extremely inefficient, because it is performing the comparison between two bit vectors one bit at a time, when the hardware of the processor is capable of doing exactly the kind of comparison it is doing many bits at a time. To modify initialize_truth_table_copy() and cmppt_using_copy() to collapse this inefficient representation of bit vectors by using shifts and OR's to coalesce every K truth table entries in each row into a single unsigned integer in the copy of the truth table would greatly increase the constant in the O(N) run time for initialize_truth_table_copy(), but it should greatly decrease the run time of cmppt_using_copy(). Furthermore, ``natural'' values for K that come to mind include 8, 16, and 32, but really any value from 2--32 should be considered. Using quasistatic constructs to code all this would relieve the programmer of the task of trying to determine whether this parameterized transformation is on average advantageous or not.