The select_alternatives pass performs several tasks: it checks for the existence of a raw profiling data file and, if one is found, processes it; it decides what experiment to conduct next, or whether it is ready to generate a non-experimental version; finally, it creates the actual executable corresponding to its decision.
There are many sources of noise in the profiling data, including short-term variations in usage patterns and temporary variations in machine load (which create noise for the real-time, but not the CPU-time, profiling mechanism). Summing data over repeated program runs can help filter out much of this noise; hence, the current implementation of the clever compiler generates executables which re-invoke the compiler after four runs that successfully save profiling data. Four is merely an arbitrarily chosen default, and can be overridden by the RUNS environment variable.
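The mechanism is not spelled out here, but a minimal sketch of how a generated executable could keep a persistent run count and re-invoke the compiler might look like the following; the counter file name and the recompilation command are hypothetical, not part of the actual implementation.
\begin{verbatim}
/* Sketch only: count successful profiled runs across executions
 * and re-invoke the compiler after the limit is reached. */
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_RUNS 4

void maybe_recompile(void)
{
    int limit = DEFAULT_RUNS, count = 0;
    const char *env = getenv("RUNS");    /* user override of the default */
    if (env != NULL && atoi(env) > 0)
        limit = atoi(env);

    FILE *f = fopen("prof.runcount", "r+");  /* hypothetical counter file */
    if (f == NULL)
        f = fopen("prof.runcount", "w+");
    if (f == NULL)
        return;
    fscanf(f, "%d", &count);
    count++;                             /* this run saved profiling data */

    if (count >= limit) {
        system("clever-cc --reselect");  /* hypothetical recompile command */
        count = 0;                       /* start a fresh accumulation */
    }
    rewind(f);
    fprintf(f, "%d\n", count);
    fclose(f);
}
\end{verbatim}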
It would be preferable for the default number of runs before recompilation to be adaptive. For example, if select_alternatives notices that the profiled parts of the program collectively took less than a second of real time to execute in the last four runs, it could adjust the number of runs upward to reduce the frequency of recompilation: there is not much benefit to optimizing such a program, and recompiling less often helps amortize the cost of recompilation. Another factor in how many runs should occur before recompilation is whether the clever compiler is still in an experimental phase, when most of the program version space is still unexplored, or has reached an equilibrium phase, when much of the space has been adequately explored and the clever compiler therefore has a good idea which version is optimal for current usage. Even in the equilibrium phase, the clever compiler should still monitor program performance, but recompilations can occur at lengthier intervals.
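Such a policy might look like the following sketch, where the one-second threshold comes from the example above but the scaling factors are purely illustrative assumptions.
\begin{verbatim}
/* Sketch of an adaptive run-count policy; thresholds and factors
 * are illustrative, not part of the implementation. */
int adjust_run_limit(int current_limit, double profiled_seconds,
                     int in_equilibrium)
{
    if (profiled_seconds < 1.0)   /* profiled code barely ran: little to gain */
        return current_limit * 2; /* so recompile less often */
    if (in_equilibrium)           /* space well explored: just monitor */
        return current_limit * 4;
    return current_limit;         /* experimental phase: keep probing */
}
\end{verbatim}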
Given the formulation presented earlier of the problem faced by the clever compiler as searching a space of program versions, any of a large number of generic techniques for space-searching and function minimization can be applied. Because of time considerations, I chose to implement the most naive possible procedure, a combination of the generate-and-test principle and the British Museum procedure: namely, select_alternatives generates all possible program versions and tries them all in the generated order. Even this procedure is currently implemented incompletely: a complete implementation would automatically choose the program version which produced the shortest run time, whereas the current one merely emits a warning to the user when it runs out of program versions to try.
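To make the procedure concrete, the following sketch performs a generate-and-test/British Museum search over a three-dimensional version space like the one used as an example below. Here timed_run is a hypothetical stand-in for generating one version, running it, and reading back its profiled run time; the real select_alternatives tries one version per recompilation cycle rather than looping in a single process.
\begin{verbatim}
/* Sketch of exhaustive generate-and-test over a version space. */
#include <stdio.h>

extern double timed_run(int a, int b, int c);  /* hypothetical: build + run + profile */

void british_museum_search(void)
{
    double best = 1e30;
    int best_a = 0, best_b = 0, best_c = 0;

    for (int a = 0; a < 10; a++)            /* first dimension  */
        for (int b = 0; b < 4; b++)         /* second dimension */
            for (int c = 0; c < 20; c++) {  /* third dimension  */
                double t = timed_run(a, b, c);
                if (t < best) {
                    best = t; best_a = a; best_b = b; best_c = c;
                }
            }
    printf("best version: a=%d b=%d c=%d (%g s)\n",
           best_a, best_b, best_c, best);
}
\end{verbatim}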
Note that some function minimization techniques, such as hill-climbing (alternately, gradient descent), explicitly require functions of nearly continuously variable parameters which have calculable derivative or gradient functions. However, other techniques, such as the downhill simplex method and Powell's method (direction-set methods) [], do not require the derivative or gradient. Still other techniques, like simulated annealing, are designed to solve combinatorial problems, where the dimensions of the space are discrete rather than continuous, and it is explicitly recognized that in such a configuration space the notion of ``continuing downhill in a favorable direction'' may be meaningless. The latter set of techniques therefore seems to offer the best fit for the clever compiler's discrete version space.
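For illustration, here is a minimal simulated-annealing sketch over the same kind of discrete version space; the cost function, the single-dimension move rule, and the geometric cooling schedule are illustrative assumptions, not part of the clever compiler.
\begin{verbatim}
/* Sketch of simulated annealing over a discrete configuration space. */
#include <stdlib.h>
#include <math.h>

extern double cost(const int v[3]);      /* hypothetical: measured run time */

void anneal(int v[3], const int dims[3]) /* dims = size of each dimension */
{
    double T = 1.0, cur = cost(v);
    while (T > 0.001) {
        int n[3] = { v[0], v[1], v[2] };
        int d = rand() % 3;              /* perturb one discrete dimension */
        n[d] = rand() % dims[d];
        double c = cost(n);
        /* accept improvements always; uphill moves with prob. e^(-dc/T) */
        if (c < cur || exp((cur - c) / T) > (double)rand() / RAND_MAX) {
            v[0] = n[0]; v[1] = n[1]; v[2] = n[2];
            cur = c;
        }
        T *= 0.95;                       /* cool the temperature */
    }
}
\end{verbatim}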
Given that the clever compiler is searching in a very specific kind of domain, additional heuristics may be applied to improve the efficiency and the effectiveness of a combinatorial space search. For example, non-interacting sets of dimensions can be factored apart. Suppose $a$ represents the first dimension (a set of 10 quasistatic variables which are always used together in the same qif statements), $b$ represents the second dimension (a quasistatic parameter which can take on any of 4 different values), and $c$ represents the third dimension (a quasistatic parameter which can take on any of 20 different values); then we are seeking to minimize a run-time function $t(a,b,c)$ in a space of 800 possible program versions. However, if $a$'s effect on the performance of the program and $b$'s appear to be dependent on each other because there is a qinfluence statement for $b$ which encloses one of the $a$ qif statements, but $c$'s appears to be ``completely independent'', then we can rewrite what we are trying to minimize as $t(a,b,c) = t_1(a,b) + t_2(c)$, in which case $t_1(a,b)$ and $t_2(c)$ can be independently minimized, effectively collapsing the search space.
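Concretely, the decomposition pays off because the two subproblems can be exhausted separately:
\[
\underbrace{10 \times 4 \times 20 = 800}_{\mbox{versions for } t(a,b,c)}
\quad\longrightarrow\quad
\underbrace{10 \times 4}_{\mbox{for } t_1(a,b)} + \underbrace{20}_{\mbox{for } t_2(c)} = 60 .
\]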
Also, as a first approximation, the clever compiler's search engine might assume that $t(a,b,c_2)$ lies between $t(a,b,c_1)$ and $t(a,b,c_3)$ if $c_1$ and $c_3$ are relatively close to each other in value and it is known that $c_1 < c_2 < c_3$; this is a weak belief that $t$ is close to continuous when moving only along the dimension corresponding to a quasistatic parameter. Finally, if profiling indicates that the parts of the program whose performance $a$ and $b$ influence take up a negligible percentage of the total run time, then the search engine can, to a first approximation, ignore the $a$ and $b$ dimensions altogether.
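One way the search engine might act on the weak-continuity belief is to skip measuring an intermediate parameter value when the measurements at the two surrounding values already agree. In this sketch, timed_run is the same hypothetical stand-in as before, and the 5% agreement threshold is an arbitrary illustration.
\begin{verbatim}
/* Sketch: interpolate t(a,b,c2) instead of measuring it when the
 * neighboring measurements at c1 < c2 < c3 are close. */
#include <math.h>

extern double timed_run(int a, int b, int c);   /* hypothetical, as before */

double estimate(int a, int b, int c1, int c2, int c3)
{
    double t1 = timed_run(a, b, c1);
    double t3 = timed_run(a, b, c3);
    /* endpoints agree to within 5%: assume continuity and interpolate */
    if (fabs(t1 - t3) < 0.05 * t1)
        return (t1 + t3) / 2.0;
    return timed_run(a, b, c2);                 /* otherwise, measure */
}
\end{verbatim}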
A decision determines exactly one version of the program; it consists of the set of quasistatic variables which have been selected for, together with a selected value for each quasistatic parameter. Implementing a decision merely requires creating definitions in the SUIF file with the appropriate initializing value for each quasistatic variable (the integer value 1 if selected for, 0 if not) and for each quasistatic parameter. Then the create_executable script is invoked; it calls the SUIF s2c pass and then gcc to compile and link the program against the profiling library.
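For instance, implementing one decision might amount to emitting definitions equivalent to the following C into the SUIF file; the identifier names and values here are hypothetical illustrations.
\begin{verbatim}
/* Hypothetical definitions emitted to implement one decision. */
int qs_use_unrolled_loop = 1;   /* quasistatic variable: selected for     */
int qs_use_inline_table  = 0;   /* quasistatic variable: not selected for */
int qs_block_size        = 64;  /* quasistatic parameter: chosen value    */
\end{verbatim}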