Semantically, cmppt()'s inner loop is comparing rows in a truth
table where the values 0 or 1 represent boolean false or true for that
particular table entry, and 2 represents ``don't care''. The ``don't
care'' values are being ``flattened'' on the fly to the value 0 for
comparison ordering purposes. Such flattening is being done
repeatedly on each row: merge sort's 35,944 calls to cmppt()
(for the reference input file) means that an average of 17.5
flattenings are performed on each of the 4106 truth table entries;
quicksort's 2,841,621 calls to cmppt() means an average of 1384
flattenings are performed on each entry. It may or may not prove
faster (depending on whether O() behavior was common) to make a
copy of the entire truth table, mutate all the entries in it once to
flatten them, and then re-write cmppt() to perform faster
comparisons by using entries from the copy of the table, and write a
customized sort routine that perform element swaps on entries in both
the copy and the original table.
This adds both
a per-program-execution cost of O(N) time and O(N) space (probably
with fairly large constants, however) to copy and flatten entries in
the copied table, and additional overhead (O( number of
cmppt() calls)) in the new sorting routine to swap four elements
every time the unmodified sort routine would have swapped two
elements. The choice to quasistatically choose between this
transformation and the original might look something like this:
Once we consider compressing the ``don't care'' value down to 0, another transformation suggests itself. If each truth table entry is considered a vector of bits, then even cmppt_using_copy() is extremely inefficient, because it is performing the comparison between two bit vectors one bit at a time, when the hardware of the processor is capable of doing exactly the kind of comparison it is doing many bits at a time. To modify initialize_truth_table_copy() and cmppt_using_copy() to collapse this inefficient representation of bit vectors by using shifts and OR's to coalesce every K truth table entries in each row into a single unsigned integer in the copy of the truth table would greatly increase the constant in the O(N) run time for initialize_truth_table_copy(), but it should greatly decrease the run time of cmppt_using_copy(). Furthermore, ``natural'' values for K that come to mind include 8, 16, and 32, but really any value from 2--32 should be considered. Using quasistatic constructs to code all this would relieve the programmer of the task of trying to determine whether this parameterized transformation is on average advantageous or not.