- ...group
- Which evolved from the
Transit Project: for more information, see the World Wide Web URL
http://www.ai.mit.edu/projects/transit/rc_home_page.html (UPDATED).
- ...compilation
- The term "compilation" is used
loosely here; even if the software package is written in an
interpreted language, there is still a point in time when conceptually
the program is no longer viewed as source code but as an application
to be used.
- ...process,
- In which
the programmer examines reams of profiler output to evaluate the
resulting program's performance, formulates alternative code for "hot
spots", and iterates, and iterates, and iterates....
- ...effort.
- Another
benefit of annotations that let the programmer explicitly encode
different alternatives in the source code is that they
document, in a structured fashion, which alternatives and
trade-offs the programmer has already considered.
- ...2
- Or the less general but more common #ifdef
directive.
- ...compiler,
- Although the programmer
certainly should choose a name which he or she finds semantically
meaningful, for code readability.
- ...tag
- For convenience, it
may be desirable to introduce additional syntax, e.g.
unique:LARGE_SORT, to allow the programmer to specify that different
call sites should be analyzed separately, but still use the same
identifier that has semantic meaning to a programmer reading the
source code.
- ...example.
- See [], which presents Typesetter,
a system specifically to select between different implementations of
abstract data structures based on profiling feedback. Also, the
documentation for libg++, a freely distributable C++ library
provided by the Free Software Foundation for use with the GNU C++
compiler, includes some pragmatic discussion and statistics about the
expected efficiencies of the various operations on different
implementations of the abstract data types; also discussed is the
improvement in performance when the programmer knows and indicates in
the source code exactly which representation is being used for a
particular variable of an abstract type at a given point in the code,
so that the compiler can skip generating run-time checks to determine
the actual representation and instead generate code to
immediately dispatch to the appropriate implementation.
- ...characteristics
- Static on any given
machine, sans field upgrades, but widely varying on different
implementations of a binary-compatible instruction set architecture.
- ...package.
- Derived from the MIT LCS c-parser,
available as ftp://theory.lcs.mit.edu/pub/c2c/c2c-0-7-3.tar.Z.
- ...annotated
- SUIF supports
attaching annotations containing arbitrary data to the structures
which represent various parts of the parsed C program; annotations
can be attached not just to nodes in the abstract syntax trees
(representing the statements and expressions), but also to
declarations and definitions in symbol tables.
- ...platforms
- Profilers for MS-DOS and Windows platforms are
likely to be using similar techniques.
- ...expensive,
- For example, actual timings of
1,000,000 iterations (normalized for the loop iteration overhead)
indicate 282 elapsed CPU clock cycles per call to time(NULL);
876 cycles per call to gettimeofday(&tv,NULL); and 766 cycles
per call to getrusage(RUSAGE_SELF, &ru). Much of this is
probably due to context switches; by contrast, it costs 9 clock cycles
per call to a dummy() function that simply returns its input
parameter, and viewing the assembly code output verifies that
gcc has not merely silently inlined the function call.
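A measurement of this kind can be sketched roughly as follows. This is a minimal illustration and not the harness used for the figures above: the names probe_fn, cost_per_call and normalized_cost are hypothetical, and the sketch reports seconds of CPU time from clock() rather than clock cycles, so results would have to be scaled by the clock rate to be comparable.

```c
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Written to by every probe so the calls cannot be optimized away. */
volatile unsigned long sink;

typedef void (*probe_fn)(void);   /* hypothetical name, for illustration */

/* Baseline function that simply returns its input parameter. */
int dummy(int x) { return x; }

void probe_empty(void) { }
void probe_dummy(void) { sink = (unsigned long)dummy((int)sink); }
void probe_time(void)  { sink = (unsigned long)time(NULL); }

void probe_gettimeofday(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    sink = (unsigned long)tv.tv_usec;
}

void probe_getrusage(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    sink = (unsigned long)ru.ru_utime.tv_usec;
}

/* Average CPU seconds per call of fn over the given number of iterations. */
double cost_per_call(probe_fn fn, long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / (double)iterations;
}

/* Subtract the cost of an empty probe to normalize for loop overhead.
   On a noisy machine the difference can come out slightly negative. */
double normalized_cost(probe_fn fn, long iterations)
{
    return cost_per_call(fn, iterations) - cost_per_call(probe_empty, iterations);
}
```

Calling normalized_cost(probe_gettimeofday, 1000000) and normalized_cost(probe_dummy, 1000000) gives a comparison in the same spirit as the one above. An aggressive optimizer may still inline dummy(), which is why checking the assembly output remains worthwhile.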
- ...instruction
- Actual timings (normalized for
the loop iteration overhead) of 1,000,000 iterations of the
RDTSC instruction indicate that it executes in 11 CPU clock cycles on the
Pentium implementation.
- ...paper
-
http://www.sun.com/sparc/Performance.html (UPDATED).
- ...painless.
- However, see the table and related discussion in the text for
further details on the performance of this implementation.
- ...saved.
- Although this is not implemented, it has come to my attention
that a number of C compilers provide a library routine on_exit()
which allows the registration of routines to be run when exit()
is called. That would have solved this particular problem, but the
more general problem still remains.
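As a rough sketch of the registration idea, the following uses atexit(), the standard C counterpart of the non-standard on_exit(). The counter, the function names and the output file name profile.out are all assumptions made for illustration and are not taken from the implementation discussed here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical profiling counter kept by the instrumented program. */
static unsigned long call_count;

/* Stand-in for an instrumented routine: it only bumps the counter. */
void instrumented_work(void)
{
    call_count++;
}

/* Write the counter out; the file name is an assumption. */
void save_profile(void)
{
    FILE *f = fopen("profile.out", "w");
    if (f == NULL)
        return;                 /* nothing sensible to do this late */
    fprintf(f, "calls %lu\n", call_count);
    fclose(f);
}

/* Register save_profile() to run at normal program termination.
   Returns 0 on success, as atexit() does. */
int install_profile_saver(void)
{
    return atexit(save_profile);
}
```

The handler runs when exit() is called or main() returns, but not when the program ends through abort(), _exit() or a fatal signal, which is one way in which the more general problem remains.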
- ...incomplete:
- I decided that having some examples written up
would be more interesting than having a more sophisticated search
algorithm implemented.
- ...system
- More specifically, by system I mean the qsort()
implementation located in the default libc.a library used by the
default linker when invoked by the C compiler driver.
- ...incorrect.
- This is worth noting, because it is important that
the programmer supplying alternative code ensure that the alternatives
are semantically equivalent. However, sometimes programmers and users
are prone to rely on non-guaranteed semantics of a particular
implementation, such as the sort stability of qsort().
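One way to avoid depending on an implementation's stability is to make the tie-breaking explicit, as in the sketch below. The record layout and the function names are hypothetical; the only library call assumed is the standard qsort().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: seq remembers the original position. */
struct record {
    int    key;
    char   tag;
    size_t seq;
};

/* Order by key, and break ties by original position, so the result
   does not depend on whether the underlying qsort() is stable. */
int compare_stable(const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;

    if (ra->key != rb->key)
        return (ra->key > rb->key) - (ra->key < rb->key);
    return (ra->seq > rb->seq) - (ra->seq < rb->seq);
}

/* Stamp each record with its position, then sort. */
void stable_sort_records(struct record *recs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        recs[i].seq = i;
    qsort(recs, n, sizeof recs[0], compare_stable);
}
```

With this comparison function, any correct sorting routine substituted for qsort() produces the same output, which makes alternative implementations easier to treat as semantically equivalent.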
- ...seconds.
- Correctness was partially verified by checking the
output of eqntott using the Linux system qsort() against
the output of eqntott using an insertion sort; the outputs were
identical.
- ...usage.
- By complication, I mean that the library
writer could implement two functions qsort1() and
qsort2(), and document that the latter is preferred over the former
for expensive comparison functions for faster performance, but
otherwise the two behave similarly. This complicates the semantics
since the application programmer now has to think about which function
he or she wants to use from each different call site in the program.
- ...it.
- It would be even nicer if the compiler could
be instructed to generate and try all the different permutations for
the programmer, but this would require additional special quasistatic
syntax. Note that GNU gcc already supports an extension to C
which allows the programmer to designate any programmer-defined
function as being const, i.e., side-effect free and dependent on
the input arguments only. This extension currently allows gcc
to perform common subexpression elimination and loop hoisting on such
functions; however, this can also be used to get us part of the way to
a compiler which understood when it could rearrange boolean
expressions to arrive at a semantically equivalent formulation with
faster average performance.
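The gcc extension mentioned above is spelled __attribute__((const)). The sketch below uses hypothetical functions to show the two points made here: a call that a compiler may hoist out of a loop, and two orderings of a boolean expression that are equivalent because both operands are side-effect free.

```c
/* __attribute__((const)) is a GCC extension (Clang accepts it too).
   It promises that the result depends only on the arguments and that
   the function has no side effects. */
int weight(int x) __attribute__((const));
int is_even(int x) __attribute__((const));
int is_multiple_of_three(int x) __attribute__((const));

int weight(int x)               { return x * x + 1; }
int is_even(int x)              { return x % 2 == 0; }
int is_multiple_of_three(int x) { return x % 3 == 0; }

/* weight(x) is loop-invariant, so the compiler may compute it once. */
int sum_weights(int x, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += weight(x);
    return total;
}

/* Because both predicates are const, these two orderings always give
   the same answer; only their average cost can differ. */
int order_a(int x) { return is_even(x) && is_multiple_of_three(x); }
int order_b(int x) { return is_multiple_of_three(x) && is_even(x); }
```

Here the programmer still writes both orderings by hand. The attribute only supplies the guarantee that a smarter compiler would need before reordering them itself.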
- ...accuracy
- Mostly, detecting when round-off errors in
performing computations are likely to make the numeric results
meaningless. Numerical accuracy issues will not be considered for the
remainder of this section.
- ...figure (label matrixtest).
- Observation: once
again the qif statement syntax is not powerful enough to permit
an elegant expression. What we'd like to be able to do is to tell the
compiler that there are N predicates and N actions to take should
the corresponding predicate be true, plus a default should none of the
other predicates be true. Then the compiler could construct all
possible orders in which to evaluate all possible subsets of the
predicates. This is similar to the boolean clause ordering problem
with eqntott.
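In plain C, without any quasistatic syntax, the structure being asked for might look like the sketch below. The struct rule type, dispatch() and the example predicates are hypothetical. The order array stands in for the choice a compiler would make: it lists which predicates to test and in what sequence.

```c
#include <stddef.h>

typedef int (*predicate_fn)(int);
typedef int (*action_fn)(int);

/* One predicate paired with the action to take when it holds. */
struct rule {
    predicate_fn applies;
    action_fn    act;
};

/* Test the rules named in order[], in that sequence, and run the first
   action whose predicate holds; otherwise run the default action.
   order[] may name any subset of the rules, in any order. */
int dispatch(const struct rule *rules, const size_t *order, size_t n_order,
             action_fn fallback, int x)
{
    for (size_t i = 0; i < n_order; i++) {
        const struct rule *r = &rules[order[i]];
        if (r->applies(x))
            return r->act(x);
    }
    return fallback(x);
}

/* Example: squaring, with two special cases that skip the multiply. */
int is_zero(int x)        { return x == 0; }
int is_unit(int x)        { return x == 1 || x == -1; }
int square_zero(int x)    { (void)x; return 0; }
int square_unit(int x)    { (void)x; return 1; }
int square_general(int x) { return x * x; }
```

Every choice of order[] returns the same value as long as each special-case action agrees with the default wherever its predicate holds. The orderings then differ only in average cost, which is what profiling feedback would be used to decide.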
- ...stages
- Also, a
number of operating systems' linkers will rearrange functions and
static data structures to reduce the number of cache collisions based
on profiling feedback; some third-party products can perform this same
function on already-linked executables.
- ...MacOS).
- Out of these example systems,
some versions of DOS do at least provide a BASIC interpreter, and OS/2
provides a REXX interpreter.
- ...portable.
- There are minor problems with transporting
intermediate format files from one platform to another because the
intermediate format files contain declarations from system header
files, the exact types of which may vary slightly from platform to
platform.
- ...function
- Possibly under some
implementations the additional prologue code and the storage slot
created even for prof-style profiling are a no-op and go unused,
respectively.
- ...architecture.
- For example, [] spends several chapters
discussing the dramatically different performance characteristics of
instruction sequences on an Intel i386 vs. i486 vs. Pentium. An
i486 is RISC-like in that many more instructions will execute in just
one cycle than on an i386, and the Pentium is dual-issue superscalar;
any pixie-like tool would have to take this into account in
order to calculate meaningful ideal times lest a sequence of
instructions carefully tuned for the Pentium's two pipelines be
unfairly assigned far more time than it actually takes to execute
because pixie was using cycle counts for instructions executing
on an i386. Along similar lines, [,] look at
instruction set utilization and the difference in SPEC92 benchmark
results for MIPS- and SPARC-based workstations when compilers were
given command-line options to permit them to assume later
implementations of those chip architectures.
- ...table.
- qsort() is not
sufficiently parameterized; if qsort()'s interface had included
passing in a function to perform element swapping, then it would be
possible to use qsort() itself for this purpose.
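As an illustration of the kind of interface meant here, the sketch below shows a sort that takes both a comparison function and a swap function, each working on element indices. All names are hypothetical and this is not the interface of any real library routine. A simple insertion sort stands in for the sorting algorithm.

```c
#include <stddef.h>

/* Callbacks work on indices, so the caller decides what an element is. */
typedef int  (*index_compare_fn)(size_t i, size_t j, void *ctx);
typedef void (*index_swap_fn)(size_t i, size_t j, void *ctx);

/* Insertion sort that moves elements only through the supplied swap. */
void sort_with_swap(size_t n, index_compare_fn compare,
                    index_swap_fn swap, void *ctx)
{
    for (size_t i = 1; i < n; i++)
        for (size_t j = i; j > 0 && compare(j - 1, j, ctx) > 0; j--)
            swap(j - 1, j, ctx);
}

/* Example context: two parallel arrays that must stay in step. */
struct table {
    int  *keys;
    char *labels;
};

int compare_keys(size_t i, size_t j, void *ctx)
{
    const struct table *t = ctx;
    return (t->keys[i] > t->keys[j]) - (t->keys[i] < t->keys[j]);
}

void swap_rows(size_t i, size_t j, void *ctx)
{
    struct table *t = ctx;
    int  k = t->keys[i];
    char l = t->labels[i];
    t->keys[i]   = t->keys[j];
    t->labels[i] = t->labels[j];
    t->keys[j]   = k;
    t->labels[j] = l;
}
```

Because the routine never copies elements itself, the caller's swap function can keep several arrays consistent, which the standard qsort() interface cannot express.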
Reinventing Computing, MIT AI Lab. Author: pshuang@ai.mit.edu (Ping Huang)