.\" To runoff, use: tbl file | troff -ms | print .nr PS 12 .nr VS 14 .TL Parallel and Distributed Compilations in Loosely-Coupled Systems: A Case Study .AU Erik H. Baalbergen .AI Dept. of Mathematics and Computer Science Vrije Universiteit Amsterdam The Netherlands .AB One application of large-grain parallelism is the use of parallel and distributed compilations by .I make , running under .UX . The original version of .I make executes its compilation commands successively. 'Making' a large system could therefore take a large amount of time. An increase in efficiency may be achieved by a parallel version of .I make , which tries to execute the compilations simultaneously. A parallel, non-distributed, version of .I make turns out to be inefficient. The compilations, which are mainly cpu-bound, slow each other down due to degradation of the processor's performance. A solution may be found in the idea of boarding out (part of the) compilations to other processors. This resulted in a study of how to do compilations in a distributed manner. .PP The aspect of having a system of loosely-coupled processors is an important issue in the field of distributed compilations. The relatively high cost of doing transactions (compared to local actions) in a loosely-coupled system makes the use of low-level inter-processor communication (e.g., the execution of system calls on another processor) inefficient. A .UX network system like the .I Amoeba Connection .R turns out to be unsuitable for doing distributed compilations. It is shown that much overhead results from the communication between the system that contains the source code to be compiled and the system that does the compilation. Another possibility is to copy the source code to the other processor's data space, execute a local compilation on that processor and send the results back; this greatly reduces the communication overhead. The time needed to send the source to and receive the code from the remote processor is negligible compared with the overhead mentioned earlier. .PP In order to create a parallel and distributed .I make , I adapted the original 'make' program by adding a module for finding out which compilations can be executed in parallel, depending on the actions to be taken, the actions already finished, and the present files. Furthermore, I created various versions of the .UX C compiler .I cc in order to perform some measurements. .AE .NH INTRODUCTION .PP The availability of networks of personal workstations has increased interest in parallel and distributed compilations. A decrease of the response time is the main motive for executing compilations in parallel on several processors. This paper is a description of an experiment set up to examine the feasibility of using more than one processor for doing compilations. The experiment took place in a .UX environment and consisted of two parts: creating two different distributed versions of the C compiler .I cc and constructing a version of the .UX tool .I make [Feldman78] that could run its compilations in parallel. The aspect of being tightly-coupled or loosely-coupled turns out to be an important issue in determining whether a specific network is suitable for doing distributed compilations. .PP The test configuration consisted of four VAX 11/750s, each of them running .UX version 4.1 BSD. The machines are connected by a 10 Mbps token ring (Pronet). Pronet was made available to user programs by incorporating the .B Amoeba 3.0 [Tanenbaum81] communication primitives into .UX [Renesse84]. .PP A distributed compilation can be done in various ways. One possibility is to create a version of the compiler in which the system calls may be carried out remotely. This can be achieved by using a .UX network system like the .I Amoeba Connection, .R similar to the .I Newcastle Connection .R [Brownbridge82]. The system combines the file systems of each of the connected machines by allowing access to files and execution of programs on other systems. The compiler runs remotely (i.e., on another processor) but each system call concerning the source code should be executed on the processor that runs the file system of the source code. Another possibility is to isolate components from the compiler and execute some of them remotely. One problem with remote compilations is that the compiling program should produce code for the local machine. Each of the connected machines should therefore have a compiler for each of the other machines. This is no problem if the connected machines are similar, as is the case in the test environment. Another problem arises in the second kind of compilation: libraries and included source code should be derived from the source machine. This, too, caused no problem in our test environment, as will be shown. .PP It must be said that many of the results depend strongly on our configuration, especially with regard to the communication overhead. Results from a common network operation, performed in the same configuration, are included to give an indication of the overhead. .NH AMOEBA AND THE AMOEBA CONNECTION .PP .B Amoeba is a distributed operating system developed at the Vrije Universiteit [Tanenbaum81]. The .B Amoeba communication primitives are described in [Tanenbaum84]. .B Amoeba uses a "request-reply" or "transaction" style of communication, in which the basic primitive is the client sending a request to a server and the server sending a reply back to the client. Such a pair of request and reply messages is henceforth called a .I transaction . The implementation of the primitives in .UX created the possibility for two processes running on different systems to communicate by means of transactions. This has been exploited in various user programs such as copying files between various systems, remote execution of commands, sharing of resources and remote logging in [Renesse84]. An application of the transaction mechanism is a .UX system-call server. A process on machine .I A can ask a system-call server on machine .I B to execute a system call, such as .B open , .B read , or .B write . The strategy used to implement the remote system calls is to build an extra layer on the kernel. A program does not directly invoke the kernel but calls a stub routine which checks whether the command must be done locally or remotely. Local commands are passed directly to the kernel. Remote commands are passed to a system call server on the proper machine by doing a transaction with the system-call server. A great advantage of the use of this extra layer is that existing programs need not be rewritten or even recompiled. They only have to be relinked with a library of stub routines. The naming scheme for remote files (i.e., files on other .UX systems), the system-call server and the stub-routine library together form the .I Amoeba Connection. .R The connection was found useful in our experiment, although the overhead was large. .PP The following table gives an indication of the speed of the connection in terms of response time, measured in seconds. Three versions of the .UX file-copy command .I cp are compared: plain .I cp , able to copy files on the local machine only; .I rcp , the inter-machine file-copy program as described in [Renesse84]; and .I fcp , which is plain .I cp linked with the .I Amoeba Connection .R library. 'Local' is a file copy from one disk to another on the same system. 'Remote' is a file copy from the local system to another system. All measurements took place on lightly loaded machines. .TS center,box; c||c|c s|c s. number of bytes cp rcp fcp .T& c||c|c|c|c|c r||r|r|r|r|r. \^ _ _ _ _ _ \^ local local remote local remote = 1 0.28 1.08 3.55 0.25 0.40 _ 1024 0.28 1.20 3.60 0.25 0.43 _ 10240 0.40 1.93 3.95 0.38 0.83 .TE .ce Table 1. .NH THE EXPERIMENT AND ITS RESULTS .NH 2 A distributed compiler .PP The first phase of the experiment was to construct two distributed versions of .I cc which is the .UX C compiler. .I Cc is a program that causes C source code to be passed through several compilation programs. The first step is done by the C preprocessor .I cpp which performs macro substitution, file inclusion and elimination of source code, depending on several user-specified, preprocess-time conditions. Next follows the compiler proper, .I ccom , which is a two-pass portable C compiler [Johnson79]. The assembly code generated by the two-pass compiler is translated to object code by the .UX assembler .I as . The program .I ld , which is the .UX link editor, finally combines the object programs, together with some libraries, into one program which may be executed. The first distributed version of the C compiler is based on the .I Amoeba Connection. .R The four compilation programs are relinked to allow files on other systems to be compiled. The resulting compiler, together with its driver .I cc , is installed on each of the connected machines. (Having multiple copies of the compiler is in fact an optimization with regards to our implementation of the .I Amoeba .I Connection ; remote execution is allowed only if the program is situated in the file system belonging to the remote processor.) The remote compilation of a source file on .B A is now done by invoking .I cc on some other machine .B B. The overhead caused by the implementation of the .I Amoeba Connection .R is large. A transaction takes one system call on the local machine and two system calls on the remote machine. Furthermore, a remote system call takes one system call on the remote machine. It turns out that if the compiler executes one remote system call, four system calls are needed to get the result. Communication overhead, too, plays an important role. Some results, compared to local execution, are listed in table 2. .PP The second distributed version of the compiler was created using the local versions of each of the four components. .I Cpp and .I ld run locally, because both programs are strongly file-system dependent. .I Cpp may include some standard files and .I ld may use some libraries from the local file system. .I Ccom and .I as are combined into a server which takes preprocessed C-source code as input and produces object code as output. The input and output (and compiler diagnostics) may be sent across the network using transactions. The compiler server can therefore run remotely, getting its input from the locally-executed preprocessor and sending its output to the local file system. The advantage of this approach is that the various components need not be recompiled and that the only network communication is limited to invoking the server and passing the input and output. Some results for response times (compared to plain .I cc ) are listed below. .TS center,box; c||c|c s|c s. T{ number of C source lines T} T{ local using cc T} Amoeba Connection Compiler server .T& c||c|c|c|c|c. \^ \^ _ _ _ _ \^ \^ local remote local remote = .T& r||r|r|r|r|r. 20 3.27 4.05 21.45 3.07 5.58 _ 200 9.17 9.82 30.57 9.07 11.33 _ 2000 68.75 68.68 105.28 68.98 69.63 .TE .ce Table 2. Table 2 shows that remote compilation using the compiler server gives a better response time than using the compiler, based on the Amoeba Connection. .NH 2 Distributed and parallel \fImake\fR .PP The second part of the experiment was to apply the result of the first part to the .UX program .I make . Many programs developed under .UX consist of a set of C source files which have to be compiled. .I Make performs the compilations sequentially. Doing the compilations in parallel on the same processor is not a solution, as table 3 shows. A solution is found in parallel and distributed execution of the compilations. Table 3 shows results of the three ways of doing independent compilations (i.e., no compilation uses results of the other compilations.) The file to be compiled was about 1800 lines. The maximum number of available processors was 4. The compiler server is used for doing the distributed compilations. .TS center,box; c||c|c|c c||c|c|c r||r|r|r. number of locally locally distributed compilations sequentially in parallel in parallel = 1 68.67 - - _ 2 137.34 127.32 69.58 _ 3 206.00 183.45 73.22 _ 4 274.66 242.50 78.35 .TE .ce Table 3. .PP Adapting .I make in order to execute compilations in parallel on several processors was done easily. .I Make maintains a list of programs that can also run on other machines. A command is executed in parallel to the other commands under the following conditions: the program appears on the distributed-program list; the necessary files are present; the compilations on which the command depends are already done; and there is still enough remote processing power. The only program currently in the distributed-program list is .I cc , but other compilers and translating programs may be added to this list. The processing power of a machine is computed by keeping an account of the number of compilations started by .I make on that machine. .NH PLANS .PP From the experiment we learned that splitting up compilations and running the environment and machine independent phases on other processors results in a remarkable increase in response time, especially if several sources need to be compiled. Splitting up a single compilation is also a result of the philosophy behind the .B Amsterdam Compiler Kit (ACK), .R a project at the Vrije Universiteit in the area of compiler construction [Tanenbaum83]. A compilation in .B ACK causes a program to pass through several components: a front end, which translates the program into machine-independent intermediate code; a peephole and global optimizer; a back end, which translates from intermediate code into assembly code; an assembler; and a linker. The idea is to have a pool of processors, each of them running a dedicated server performing one of the components. .B ACK tries to allocate a server for each of the phases and passes the program through the pipeline of servers. .NH References .IP [Brownbridge82] .br D.R. Brownbridge, L.F. Marshall and B. Randell, "The Newcastle Connection or UNIXes of the World Unite!," .I Software-Practice and Experience, .R vol. 12, pp. 1147-1162 (1982). .IP [Devarakonda85] .br M. Devarakonda, R. McGrath, R. Campbell and W. Kubitz, "Networking a Large Number of Workstations Using .UX United," .I Proc. 1st IEEE Int. Conf. on Computer Workstations, .R pp. 231-239 (1985). .IP [Feldman78] .br S.I. Feldman, "Make - A Program for Maintaining Computer Programs," appeared in .I UNIX Programmers Manual, .R vol. 2A, Bell Laboratories, Murray Hill NJ, 1979. .IP [Johnson79] .br S.C. Johnson, "A Portable Compiler: Theory and Practice," .I Proc. 5th ACM Sym. on Principles of Programming Languages, .R pp. 97-104 (January 1978). .IP [Miller82] .br J.A. Miller and R.J. LeBlanc, "Distributed Compilation: A Case Study," .I Proc. 3th IEEE Int. Conf. on Distributed Computing Systems, .R pp. 548-553 (October 1982). .IP [Renesse84] .br R. van Renesse, A.S. Tanenbaum and S.J. Mullender, "Connecting .UX systems Using a Token Ring," Rapport IR-91, Vrije Universiteit, Amsterdam, The Netherlands (October 1984). .IP [Tanenbaum81] .br A.S. Tanenbaum and S.J. Mullender, "An Overview of the Amoeba Distributed Operating system," .I Operating Systems Review, .R vol. 15, no. 3, pp. 51-64 (July 1981). .IP [Tanenbaum83] .br A.S. Tanenbaum, J.M. van Staveren, E.G. Keizer and J.W. Stevenson, "A Practical Toolkit for Making Portable Compilers," .I CACM vol. 26, no. 9, pp. 654-660 (September 1983). .IP [Tanenbaum84] .br A.S. Tanenbaum and S.J. Mullender, "The Design of a Capability-Based Distributed Operating System," Rapport nr. IR-88, Vrije Universiteit, Amsterdam, The Netherlands (November 1984).