A traditional optimizing compiler relies entirely on static analysis of the program's source code. The kinds and scope of static analyses which can be practically undertaken have increased dramatically in the last decade. This more sophisticated understanding has made possible more extensive program transformations, which have proven their worth through increased program performance, e.g., when benchmarks and real-world programs run noticeably faster on the same computing platform solely due to an improved release of a vendor's compiler. However, static analysis by definition can only guess at the performance characteristics of a program. Ascertaining the true run-time characteristics of a program would effectively require the compiler to emulate the target architecture and hardware/software run-time environment in order to simulate running the program --- and despite vast improvements in emulation technology, emulation remains relatively impractical. Furthermore, there would still be the problem of not having the program's typical usage input data sets available.
Does it matter that static analyses give the compiler only an incomplete understanding of the program? It can; certain kinds of speculative transformations are not performed, or even considered, either because they do not always improve performance --- and may sometimes worsen it --- or because they may cost more in space overhead than can generally be justified. Examples include inlining and loop unrolling, techniques which currently are used only by aggressive optimizing compilers, and even then only in a very restricted fashion. However, with run-time profiling feedback, a smart compiler [] can try out these more speculative transformations and ascertain which appear to be beneficial in a particular instance. In short, a smart compiler can run experiments.
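To make the space/time trade-off concrete, the following minimal C sketch contrasts a straightforward summation loop with a four-way unrolled variant; the function names are invented for illustration, and whether the unrolled form actually wins is exactly the kind of question only a run-time experiment can settle.

```c
#include <stddef.h>

/* Straightforward summation loop. */
long sum_plain(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four-way unrolled variant: fewer loop-test branches per element,
 * but roughly four times the loop body in code size.  A static
 * analysis can only guess whether this pays off on a given machine;
 * profiling feedback can measure it. */
long sum_unrolled(const long *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* leftover elements */
        s += a[i];
    return s;
}
```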
Ideally, a smart compiler would take normal source code, written by a programmer who gave no thought whatsoever to quasistatic computing issues, use profiling feedback to determine what parts of the program are ``hot'' and are therefore interesting loci, and automatically analyze those parts of the program for potential radical transformations. However, we need stepping stones on the way to that eventual goal: one such stepping stone which has been proposed is a clever compiler. A clever compiler relies on programmer annotations in the source code to help it decide how to organize and analyze profiling feedback gathered at program run-time; it is also up to the programmer to indicate potential transformations by encoding alternative code sequences with the annotations. Thus, a clever compiler can still quasistatically choose between the programmer-indicated alternative code implementations, but falls short of a smart compiler in that it cannot create alternative implementations through its own semantic analysis of source code.
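What such programmer-supplied alternatives might look like is sketched below in C. The `#pragma alternative` annotation syntax and the selection mechanism are pure inventions for illustration (the source does not specify an annotation language); a standard C compiler ignores unknown pragmas, so the two implementations simply coexist until a clever compiler quasistatically picks one.

```c
#include <string.h>

/* Hypothetical clever-compiler annotations: the programmer supplies
 * two functionally equivalent lookup routines and the compiler
 * chooses between them based on profiling feedback. */

/* #pragma alternative lookup linear */
const char *find_linear(const char *const *table, int n, const char *key) {
    for (int i = 0; i < n; i++)
        if (strcmp(table[i], key) == 0)
            return table[i];
    return 0;
}

/* #pragma alternative lookup binary  (assumes table is sorted) */
const char *find_binary(const char *const *table, int n, const char *key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int c = strcmp(table[mid], key);
        if (c == 0) return table[mid];
        if (c < 0)  lo = mid + 1;
        else        hi = mid - 1;
    }
    return 0;
}
```

Linear search wins for tiny or unsorted tables, binary search for large sorted ones; which case dominates depends on actual usage patterns, which is precisely the information profiling feedback supplies.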
Figure: Automating the profiling feedback loop.
Although taking advantage of a clever compiler requires some
additional programmer effort, it should still be much easier than the
traditional manual performance optimization process, and retains
the quasistatic computing advantage that the program will dynamically
evolve in response to changing usage patterns and changing computing
platforms without continuing programmer effort.
Figure
illustrates the difference in the
feedback loop between a programmer using a traditional compiler and a
programmer using a smart compiler. Note that the programmer using the
smart compiler is not involved in the iterative loop. The same holds
when a clever compiler is substituted for the smart compiler, although
there might be an extra initial step in which the programmer annotates
the source code for the clever compiler to work with.
Figure also shows that it is important that the
profiling mechanisms used by quasistatic computing be very
light-weight in nature, since the end-user is always using
an instrumented executable. A number of different mechanisms to
gather profiling data exist (e.g., prof, gprof,
pixie), with overheads ranging from 3--5% to 40--200%. While a
programmer who is explicitly running test cases is usually willing to
tolerate that kind of slow-down, an end-user would be unhappy.
Admittedly, after a period of sustained experimentation, the
smart compiler can conclude it has likely converged on a near-optimal
program, and reduce the frequency and level of instrumentation and
experimentation, thus reducing overhead to near zero. However, it is
still necessary to keep overhead low even during that experimental
phase. Regardless of the particular mechanism used, a smart compiler
should be able to substantially reduce the profiling overhead, since
it is only interested in gathering performance statistics
relevant to the particular experiments it is conducting at any given
time, and not detailed statistics about all parts of the program.
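The idea of gathering only experiment-relevant statistics can be sketched as selective instrumentation: rather than counting every basic block the way pixie-style tools do, the compiler inserts a single cheap counter at the one locus under experiment. The counter and function names below are invented for illustration.

```c
/* Sketch of selective instrumentation: one counter, one increment,
 * placed only at the site relevant to the current experiment.
 * The rest of the program carries no instrumentation at all. */

static unsigned long hot_loop_count;    /* invented counter name */

long dot(const long *a, const long *b, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        hot_loop_count++;               /* the only instrumentation */
        s += a[i] * b[i];
    }
    return s;
}

unsigned long report_hot_loop_count(void) {
    return hot_loop_count;
}
```

A single increment of a static counter costs far less than the whole-program instrumentation of prof- or pixie-style tools, which is what keeps the overhead tolerable for an end-user running the instrumented executable.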