Next: Future Work Up: Conclusions Previous: Related Work

Practicality Issues

Program distribution.

The clever/smart compiler model described throughout this thesis presumes that source code is available to the end-user for recompilation. Although a fair amount of software has freely distributable source code, much software is owned by companies which refuse to make source code available except under tightly controlled conditions, in order to protect their trade secrets and proprietary rights in their capital investment. A possible solution to this particular problem might be the wide adoption of some intermediate format which could be distributed in place of the human-readable source code. To satisfy the needs of a clever/smart compiler, such an intermediate format would need to preserve a reasonable amount of high-level programming language structure while still inhibiting reverse-engineering. (SUIF, for example, would be unsuitable for distribution purposes: a ``reverse-engineering'' pass is not just possible given the information stored in the intermediate format, but is part of the standard distribution.)

The Architecture-Neutral Distribution Format (ANDF) technology being worked on by the Open Software Foundation [] offers a possible solution to this dilemma. ANDF is a language-independent and machine-independent intermediate language that came into existence largely to address the problem of commercial software distribution for multiple instruction set architectures; hence, some of its design goals address this source code access problem. Namely, ANDF attempts to preserve as much semantic information as possible, so that the translator from ANDF to the native binary format of the platform can perform as much optimization as possible, while still inhibiting reverse engineering (for example, identifiers are replaced with unique tags). ANDF is complex because it attempts to solve many other problems as well, and its success in the marketplace seems questionable. However, the corpus of ANDF-related work should be considered in any further evaluation of the practicality of distributing ``source code''.
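As a rough illustration of the identifier-tagging idea mentioned above, the following sketch replaces each distinct identifier in a C-like fragment with an opaque tag while leaving keywords and structure intact. It is a toy under stated assumptions and does not reflect ANDF's actual encoding: the keyword set, the `_tN` tag scheme, and the function name `obfuscate_identifiers` are all hypothetical, and the regular expression assumes the input contains no string literals, comments, or names that already look like tags.

```python
import re

# Hypothetical keyword set for a C-like toy language (an assumption,
# not ANDF's real vocabulary).
KEYWORDS = {"int", "return", "if", "else", "while", "for"}

def obfuscate_identifiers(source):
    """Replace each distinct identifier with an opaque tag.

    Returns the rewritten text and the name-to-tag mapping, which a
    vendor would keep private.
    """
    tags = {}

    def rename(match):
        name = match.group(0)
        if name in KEYWORDS:
            return name  # keywords carry structure, so they are kept
        if name not in tags:
            tags[name] = f"_t{len(tags)}"  # hypothetical tag scheme
        return tags[name]

    rewritten = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", rename, source)
    return rewritten, tags

code = "int total_salary(int base, int bonus) { return base + bonus; }"
obfuscated, mapping = obfuscate_identifiers(code)
print(obfuscated)
```

The control and data flow remain fully visible to an optimizer, but names such as `total_salary`, which carry much of the meaning a human reader relies on, are gone.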

Compiler availability.

The clever/smart compiler model also presumes that the end-user has access to the compiler. This assumption is becoming less and less safe. Although users of UNIX workstations frequently have access to the system cc compiler, the vast majority of computer users use systems which do not include a compiler in the base system software (e.g., IBM compatibles running DOS, Windows, OS/2, or NT; or Macintoshes running the MacOS). Even in the UNIX world, two trends work against the assumption. First, it is increasingly common for users to be given accounts which cannot run compilers, on the grounds that they have no need for one. Second, system vendors are unbundling their optimizing compilers from the base system software in order to generate additional revenue by selling separate licenses to developers, and bundling only an older, less capable compiler. If these trends do not reverse themselves, the viability of the clever/smart compiler model will be seriously impaired.

Debugging and testing.

A correct smart compiler, meaning one which makes only semantics-preserving program transformations no matter what the input program, would be tricky to write. However, it would not be any more difficult for a programmer to use than a normal compiler. A correct clever compiler is much easier to write, but the obligation placed on the programmer using the clever compiler is substantial. While writing the program, the programmer will probably have to give more emphasis to data invariants, and to ensuring that all legal combinations of alternatives result in a program which preserves those invariants. Debugging will be more difficult, since the programmer now needs to keep track of the settings of the quasistatic variables and parameters, both to reproduce a bug and to know which version of the program to examine. Comprehensive testing will become that much more intractable, because the number of alternative versions of the program grows explosively with the number of quasistatic choices. However, not all is bleak: formalizing ways of expressing alternative code, and thus replacing current ad hoc methods, may result in some improvement in source code comprehensibility.
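To make the testing and debugging burden concrete, here is a minimal sketch that enumerates every program version implied by a set of quasistatic choices and produces a label that could be logged with a bug report. The variable names, their alternatives, and the helper functions are hypothetical illustrations, not constructs from this thesis. The only point being shown is that the number of versions is the product of the number of alternatives per choice.

```python
from itertools import product
from math import prod

# Hypothetical quasistatic variables and their alternatives.
QUASISTATIC_CHOICES = {
    "sort_algorithm": ["quicksort", "mergesort", "heapsort"],
    "buffer_size": [512, 4096, 65536],
    "cache_lookup": ["hash", "tree"],
}

def count_versions(choices):
    """Number of distinct program versions: product of alternatives."""
    return prod(len(alternatives) for alternatives in choices.values())

def all_versions(choices):
    """Yield one dict of settings per possible program version."""
    names = list(choices)
    for combo in product(*(choices[name] for name in names)):
        yield dict(zip(names, combo))

def describe(version):
    """Stable label for a version, suitable for attaching to a bug report."""
    return ",".join(f"{key}={value}" for key, value in sorted(version.items()))

versions = list(all_versions(QUASISTATIC_CHOICES))
print(count_versions(QUASISTATIC_CHOICES))  # 3 * 3 * 2 = 18
print(describe(versions[0]))
```

Three small choices already yield 18 versions to test. Adding a fourth choice with three alternatives would triple that, which is why recording the exact settings alongside each failure matters.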


Reinventing Computing, MIT AI Lab. Author: pshuang@ai.mit.edu (Ping Huang)