To: psg@neutron.lcs.mit.edu Subject: Another approach to object references, forwarding, etc. Date: Wed, 17 Oct 90 22:25:17 -0400 From: Bob Gruber Sanjay and I thought up the following scheme for traversing object references in the parallel language's runtime system. (1) All object references are local addresses, e.g., 32 bits on a 32 bit word architecture. If an object is local, then references to it point directly to its rep. The rep contains the object's OID and also a pointer to the object's method table (shared by all local objects of a given type) and . Method invocation on local objects is simply an indirection through the method table of the object, where the method name maps to a compile-time offset into the table. (2) Some references will be to non-local objects. All such references will actually point to local forwarder objects. A forwarder object's rep includes the OID of the remote object, possibly other info (such as a hint as to its current location), and, like other objects, a pointer to a method table. Method invocation on non-local objects proceeds in exactly the same way as above, i.e., an indirection through the forwarder's method table. The method table of a forwarder contains methods that translate the invocations into message sends to the remote object (described next). (3) Local references only make sense in the local processor's address space. Thus, when a forwarder forwards a message to a remote object, the receiver and the arguments, which will be local references, must be converted to OID's. Since both local and forwarder objects contain OID's, it is trivial to translate local references to OID's. N.B. Small immutable objects should be sent as values. (4) When the runtime system receives a message from another processor, it must translate the OID's of the receiver and the args to local references. Each processor maintains a hash table that maps OID's to local addresses. If an OID does not map to a local address, this means that this is the first occurrence of this OID at this processor, in which case the OID must be for a remote object... a forwarder object is created containing the OID, and its local address is added to the hash table. Note: the hash table contains entries for all local objects as well. When an object is created, its OID is placed in its rep, and the OID to local address mapping is added to the hash table. (5) Method caching: like the global method table approach described in Carl's meeting summary, you can do per-method code caching as follows. Any slot in a type's method table can point to a faulting routine that loads in the correct method and updates the method table. (6) Object migration: to move an object you simply replace its rep with a forwarder rep. If you want to move a process associated with an object, then, as with message sending, you need to convert local references (on the process stack) to OID's, and, at the other end, back to local references. (Some method must be used to deal with return addresses as well.) (7) Garbage collection and the hash table: When the garbage collector removes an object, its entry in the hash table should also be removed. This is easy for a collector that explicitly frees up objects. For a copying collector, we would implement this by scanning the hash table after each gc, removing all entries with local pointers to from-space. Because object can be referenced remotely, there will need to be some form of global garbage collection scheme, for example, along with local roots, you can have a table of non-local reference counts. Forwarder objects are special in that they cannot be referenced remotely. Thus, a local garbage collector can collect them without examining the non-local root table. Note: the garbage collector does not traverse / count references in the hash table. (6) Cost of method invocation: With a global handler table, a jump to a method looks like JMP *(HTBASE + HANDLER_OFFSET) where HTBASE is the base of the global table. With the schame described above, a jump to a method looks like JMP *(*(OBJREF + METHOD_TABLE_OFFSET) + HANDLER_OFFSET) where OBJREF is a pointer to the object and METHOD_TABLE_OFFSET is the location of the pointer to the method table. If this offset is zero, then the additional cost is one extra indirection: JMP *(*OBJREF + HANDLER_OFFSET) Note that once you introduce subtyping, per-class method tables are necessary; you can't have a global table! (7) Advantages of this approach: (a) You never need to check for object presence. (b) You can specialize object behavior by replacing method table pointers; this might be useful for more than forwarding, e.g., distributed objects. Also, it would be easy to experiment with different forwarding strategies, since the forwarding code is simply code in a method table. (c) Object references are small (e.g., 32 bits); OID size is totally independent of architecture. (d) Object migration is easy, without the need for fancy gc support. (e) Code produced by the compiler manipulates local addresses only. Contrast this with an approach that uses OID's for object references; the code will manipulate both OID's and local addresses (as OID's are translated). This probably means larger code size. In addition, OID's may not fit in registers, while local addresses will! (f) The scheme seems conceptually very clean. (8) Potential disadvantages of this approach: There is a small cost to translating local addresses to OID's on remote message send, and a larger cost (a hash table lookup) to translating OID's to local addresses on message receipt. Some of these translations on message receipt may not be necessary in certain cases. For example, we may just want to pass an OID on to another processor. If OID's are used as object references, the larger cost (a hash table lookup) must be paid for each object reference. Thus, taking into account the translation cost and the cost of object references, it is not clear which approach will be the most efficient, since it will depend on how many times on average an object is referenced. Forwarder objects take up some space. However, compare this to using OID's for references. Each reference as an OID is probably larger than a local address, thus each reference pays the extra space cost; in comparison, there is only one forwarder per object, and the extra space is only a pointer to a method table. Note that it may be that our scheme actually wins on both of these points. As noted above, there is an extra indirection (over using a global method table), however we will have to pay this cost anyway if we have subtypes. ############################################################################# To: psg@neutron.lcs.mit.edu Subject: migration Date: Mon, 29 Oct 90 12:09:37 -0500 From: brewer@muon.LCS.MIT.EDU (I apologize in advance if this note is confusing... I wrote it in one sitting.) There are several serious problems in implementing migration based on the stack template approach used in Emerald. The fundamental problem is the migration of threads, particularly stacks and registers. We must be able to identify pointers (of various sorts) in order to correctly translate them for the destination processor. One solution involves the use of tags, either in hardware or software. We can't assume hardware tags and software tags add significant overhead to integer operations. The Emerald approach uses a template per frame to mark words on the stack as either pointers or data. Unlike (per word) software tags, there is no overhead for integer operations. Emerald divides registers into two classes: pointers only and data only. There are several major problems with the Emerald approach for use in our language: 1) register usage The division of registers into specific categories must be handled at the very end of compilation. For example, gcc does not assign hardware registers until after optimization. This was not a problem in Emerald because they wrote their own compiler. The problem is the conflict between portability and the need for control of the back end of the compiler. If we write our own back end, we have control of register usage but lose portability. If we use an intermediate language, we gain portability but lose control of register allocation. Thus, if we use an intermediate language we may not be able to determine the contents of a register, which *severly* limits our ability to migrate threads. I will talk about possible solutions below. 2) register spilling Like number 1, this is a problem only if use an intermediate language (which I believe is mandatory). The back end will sometimes save (spill) registers onto the stack before a call that might use those registers. After the call returns, the registers are restored. A given routine may spill in more than one place. The compiler allocates space for the registers in the stack frame. The space allocated is the maximum needed by any spill. This introduces two problems. First, we must know the contents of the registers at spill time in order to generate the template. Second, because multiple spills may use the same part of the frame, there may not be a correct template value. For example, the first word of the spill area may be used to hold an integer for the first spill and a pointer for the second. What should its (constant) tag be? If we wrote our own back end, we would guarantee that every word has a consistent tag and we would could generate the correct template. We lose both abilities with an intermediate language. 3) polymorphism Consider a polymorphic procedure that builds a list of type T. The code generated will move around T's as needed. The problem comes in trying to migrate a thread running the procedure. Are references to T translated or not? It depends... If T is an integer the T's should be untouched. If T's are pointers they must be translated. This leads to two versions of every polymorphic procedure (or object), one for translated types and one for non-translated types. A similar problem arises with the use of "any" unless it is always a pointer to something. 4) subtyping If we allow subtyping, we must impose the following restriction: Subtypes may only redefine fields with fields of the same tag. A pointer field may only be redefined to be a different pointer field, never an integer or other non-translated type. Likewise, an integer field can never be redefined to be a pointer field. Consider field p, a pointer, of type T. Code that handles type T will translate p during migration. Thus the p field of any subtype will also be translated, which implies the given restriction. I believe these are very serious problems. I have few ideas and fewer solutions. Two solutions seem (somewhat) reasonable. Idea Number 1: build our own intermediate language based on Gnu's RTL (Register Transfer Language). The key idea is to inherit the machine description stuff and most of the optimizations. We would use the same machine description files, which would allow us to move to any platform already running gnu C (will there be any?), and will reduce the problem of porting to simply generating the machine description (which is non-trivial but better than writing a new back end). Our new RTL would track tags for every register, which handles problems 1 and 2. This is a great deal of work. Idea Number 2: Place severe restrictions on the migration of threads. This boils downs to specific migration points in top-level procedures. We would save the state of the thread in a controlled manner and start a new thread at the destination processor. There would be no translation of stacks or registers. I believe this would not be too restrictive. In object-oriented languages, almost all threads are associated with a particular method and are thus short lived. These threads wouldn't migrate at all. Long-lived threads, such as background tasks in Argus, generally have natural "resting points" -- places where they are waiting for something to do. The migration points (inserted by the user) would be in these places. The ability to migrate would be designed into the task. The definition of a migration point is essentially "a point where you can encapsulate the state, translate it, and start a new thread that uses it." In this scenario, a migration would work something like this: migration is delayed until all long-lived threads reach migration points (where they stop), short-lived threads run to completion, the state is moved, new threads run at the destination (after the state moves), long-lived threads are restarted at the destination. The biggest problem with this approach is that methods that rpc other objects may be relatively long-lived, delaying migration and possibly preventing the object from doing any work. The object can do useful work if we allow the the state to be moved while threads are active. (This implies that threads must perform residence checks.) This is really a separate issue, but it is more important (in theory) here than in Idea 1 because the latter allows the migration of all threads. Eric ############################################################################# To: psg@neutron.lcs.mit.edu Subject: more on migration Date: Wed, 31 Oct 90 12:07:43 -0500 From: brewer@muon.LCS.MIT.EDU I should note that my last note on migration owes much to Sanjay. The problem discussed here was found by Sanjay. 5) Migration and Code Caching The compiler may save computed code addresses in memory or in registers for later use. If the thread migrates, these addresses will be incorrect if we use code caching (which seems mandatory). Without code caching, code is loaded at the same place on every processor and the stored addresses are thus correct on every processor. With code caching these addresses must be found and translated. The only solution I've found is to have a fixed table on all processors that is used as an indirection for ALL entry points. Instead of an entry for each procedure/method, there would be an entry for every entry point. Entry points include 1) starting points, 2) return points, and 3) any addresses used for a computed jump. (Computed jumps within a method, which is almost all of them, can be done relative to the PC and need not be in the entry point table.) The table serves two purposes. First, since all computed jumps go thru the table, their is no translation of code addresses. Second, if the target code isn't resident, the thread would jump to a routine that loads the code and then jumps to the correct location. This cleanly handles the problem of a return to code that has been unloaded since the call was made. The cost of this approach should be obvious -- the table could be large. But at 4 bytes per entry point, 10,000 entry points would only be 40k. Note that there is no "resident" bit; code is resident iff the address in the table is not the start of the loading routine. There is a hidden problem of how the loading routine figures out which code to load. This can be solved by using an indexed jump from the base of the table. The loading routine could look at the index register (which is fixed by convention) to figure out the intended routine. Correct use of this table depends on control of the back end. The normal return convention must be replaced by the indirect jump. Eric ############################################################################# To: brewer@muon.LCS.MIT.EDU Cc: weihl@photon.lcs.mit.edu Subject: migration Message-Id: Some remarks on Eric's messages on migration: 1) Polymorphism --- If we go with a very simple parameterization mechanism to start with (e.g., CLU-style parameters without "where" clauses), it's easy to implement by translating to an intermediate language and having all the instantiations share code. However, the problem Eric identified does seem hard to solve. If we're willing to change the linker to do things similar to what the CLU linker does, then the instantiations could still share code but the templates to be for objects of the parameter types would come from the caller. I.e., the linker would build a parameter block at each instantiation site that could be used to tell whether an object of each parameter type is a reference or an immediate; this parameter block would be passed as an extra parameter to each routine in the module. 2) Subtyping --- I think there are two problems here, one with subtyping and one with inheritance. By subtyping, I mean that a variable declared to be of type T can actually denote an object of type S, where S is some subtype of T. So a procedure that expects an arg of type T could be called with an actual arg of type S. If user-defined types are always represented by pointers and never by immediate values, then there's no problem; the template for the object's real type tells how to move it. In CLU, an object is just represented by its rep object, while in many other object-oriented languages, an instance of a class is always a structure with instance variables, even if there is only one instance variable. I'm not sure that CLU's approach would work with subtyping anyway, and if we use the other approach, then there isn't a problem. Inheritance may cause problems, however. If we want a subclass to share code with its superclass, then the subclass probably shouldn't change the tag of any field defined by the superclass. I wonder, though -- could we share the code, but use different templates for the two classes? 3) Code caching --- if processors have virtual memory support, we could build a single code image for the entire program, and then do code caching at the page level. It's just like having the entire program on every processor as far as addressing is concerned -- i.e., code addresses never need to be translated -- but it doesn't use all the storage. Also, do we need a table with an entry for every entry point to every routine? If we're caching at the routine level, why not have an entry in the table for each routine, and have a jump involve either (1) a residence check for the routine followed by a jump to the appropriate offset in the routine once the routine is resident, or (2) a jump to the start of the routine, passing it the offset of the desired entry point; the routine stored in the table is either the desired code or a faulting routine; if the former, it jumps to the specified entry point, and if the latter, it loads the routine and then jumps to the start. Regardless, it seems like implementing code caching requires some control of the back end to replace ordinary calls and returns with appropriate table lookups and indirections. Bill ############################################################################# Minutes for PSG Meeting on 26-Sep-90 ==================================== Asbestos (preemptive flame retarder): If you have any additions, corrections, changes, or amplifications, please let me know. Also, if it makes you happier, read "class" as "type". I tried hard to be unbiased in reporting PRO/CON/CONSENSUS information. [1] Handler Tables We discussed a number of options regarding the storage of code and pointers for message handlers (aka methods). (a) Handler Table Per Instance Under this scheme, we store pointers to all handlers defined for a class with every instance of that class. PRO: Fast handler dispatch. CON: Prohibitively large space requirements. Consider a "Point" class with instance variables x and y, for which a dozen handlers are defined. With a per-instance handler table, the space requirements for each point increase by nearly an order of magnitude. CONSENSUS: Not a viable option. (b) Complete Handler Table Per Node Under this scheme, each node stores one large handler table that contains code pointers for all of the handlers for every class defined in a program. Each handler in the system is assigned a unique HANDLER_OFFSET into this one table. Since all of the code may not be loaded on every node, some pointers may be NULL, requiring a residency check before dispatch. Eric proposed an optimization: The residency check can be avoided by setting the handler table pointers for non-resident handlers to the address of a runtime system "fault" routine that loads the required handler (and patches up the call stack, if necessary). Bill pointed out that faulting is slower than pre-fetching; perhaps a handler fault could cause the pre-fetching of related handlers. PRO: Fast handler dispatch. The caller simply jumps to the address defined in the handler table -- JMP *(HTBASE + HANDLER_OFFSET). Additional state (e.g. one word per entry or a few bits in an associated bitvector) could be used to make handler caching/paging decisions (e.g. Corby's clock algorithm). CON: May require a very large amount of memory for some large programs (e.g. consider the number of handlers defined for a Smalltalk-80 system). CONSENSUS: A classic time-space tradeoff. Unless memory is very tight (or programs very large) this is probably a good solution. Most people (myself excluded) tended to dismiss the idea of very large programs as unreasonable anyway. (c) Complete Class Table Per Node Under this scheme, each node stores a class table that contains pointers for all of the classes defined in a program. This pointer contains the address of relevant class information, including the address of the local handler table for that class (or the address of a runtime system "fault" routine as described above). PRO: Table size is considerably reduced in comparison to a complete handler table per node. CON: Handler dispatch is more expensive, due to the extra indirection. CONSENSUS: The flip side of the time-space tradeoff discussed above in (b). Unless memory is very tight, it was generally agreed that the complete handler table is better. [2] Migration: Object IDs (OIDs) vs. Local Addresses (a) The Problem Ideally, we'd like to generate efficient code and also allow for cheap thread/object/partial-closure migration. Unfortunately, this may be quite difficult: For efficient code, we would like to use local addresses for objects whenever possible to minimize the number of OID -> local address translations that need to be performed. On the other hand, the use of local addresses wreaks havoc with migration -- all local addresses need to be found and modified, since local addresses are not portable across nodes. A tagged architecture would help identify local pointers. Alternatively, Bill proposed adding tag bits to stack frames to indicate the location of local references. Sanjay pointed out that this type of method was used successfully in a standard ML implementation. If we ALWAYS reference objects via their OIDs, then this problem evaporates, but the corresponding loss of efficiency in the normal case may outweigh cheap migration. (b) Relative Addressing Wilson made a similar proposal in an earlier meeting. The basic idea is to use relative addressing whenever possible to allow for fairly efficient normal-case and migration operations. All entry points in handler code are treated as offsets from the start of the code for that handler. Return addresses are relative to the start address of the code for the calling method. When used in conjunction with the monolithic handler approach in (1b), the dispatch sequence looks something like: MOV Rn, HANDLER_ENTRY_OFFSET JMP *(HTBASE + HANDLER_START_OFFSET) If the handler isn't present, it is faulted in. When the handler is present, it does a relative jump to the appropriate entry point by using the value stored in Rn. PRO: Migration costs are sharply decreased, since non-portable absolute addresses aren't used. Also, handlers can still be swapped in and out if all jumps indirect through the handler table entry. CON: More overhead on most transfers of control (e.g. double-jumps). CONSENSUS: Seems like a net win. Some people (e.g. Eric) don't believe in migration, however. (c) Help from the Garbage Collector Bill pointed out that an incremental garbage collection technique from a paper by Ellis, Lee, Appel (sp?) may be helpful in reducing the number of presence checks when local references are used. The scheme is a copying collector, and exploits memory protection hardware that is available on some machines (it was pointed out that many parallel machines don't have VM and may lack this sort of support). Basically, local references to objects which have been swapped out end up as pointers into FROM-space, and cause a fault when they are accessed. PRO: Saves on presence checks. CON: Restricts migrations to occur in concert with the garbage collector. May not be appropriate for some parallel architectures. CONSENSUS: Should be investigated in more detail. (d) Pre-Computed Hash Values for OIDs (Sorry, no credit attribution in my notes.) Each node will most likely maintain a local cache (translation table, whatever) of OID -> (local-address or location-hint) mappings. Lookups into this table will require some sort of computed hash value on the OID. The idea here is to avoid recomputing the hash value based on the OID every time a lookup is performed. Instead, this hash value is computed once, at object creation time, and stored with the OID. PRO: Lookups are much faster, since they don't involve hash value computations. CON: OIDs become larger. This causes memory space problems, and may complicate or slow down some runtime system operations. Object creation time is increased due to the need for the initial hash value computation. CONSENSUS: None (as far as I could tell). (e) Optimizating Away Translations (Sorry, no credit attribution in my notes.) Conditional code (similar to "code-splitting" for type-dispatch in systems with subtyping) can be used to eliminate redundant translations once an OID is known to have been resolved to a local address. In addition, redundant conditionals can be further optimized. PRO: Better, faster, able to leap tall buildings in a single bound. CON: None. CONSENSUS: It's a win. [3] Responsible Nodes Since object creation rates are unlikely to be uniform across all nodes, the use of a "birth node" to keep track of an object's location (i.e. "home-based" addressing) may lead to considerable imbalance in storage requirements and location requests. I proposed that a "responsible node" be designated for each newly created object, which is not necessarily the same as the object's birth node. This way memory use and communication traffic can be better balanced if information about imbalances is maintained by the runtime system. PRO: Better balance of memory usage and communication traffic. CON: Requires runtime to keep track of imbalances (although this may be done infrequently). CONSENSUS: Most people seemed to agree that some mechanism is needed to avoid severe imbalances. In any case, the use of the term "responsible node" has survived to refer to a generalization of the birth node for maintaining OID -> location translations. - Carl Authors: Sanjay Ghemawat Carl Waldspurger Bill Weihl ############################################################################# Modules ------- Modules will be used to impose a two level naming hierarchy, and will also be used as the only form of access boundary. I.e., if two classes are defined in the same module, then they can see and manipulate each other's reps. This is the position taken by a number of modern languages that provide support for modules (Modula 2, Ada, Standard ML) The tentative decision is to allow more than one class definition in a module. (Carl: I object to this scheme. I'll flame at the meeting.) The argument against this decision is that now these classes can look at each other's reps. (Note: if we allow exactly one class definition per module, then the module starts looking like class objects in Smalltalk). We decided that we will have separate interfaces and implementations for modules. Module interfaces can contain zero or more of the following: * Type abbreviations * Single assignment variables (assignment performed by the module implementation). These will be called "module objects". * Class interface (name + list of methods). * Procedures: these include creators for any classes in the module. Module implementations can contain: * Type abbreviations * Declaration and initialization of module objects. * Class rep and method implementations. * Procedure implementations. Module components that appear in the implementation, but not in the interface can be considered "hidden" or "private" components. The implications of using this mechanism are: - Class variables (cluster own variables in CLU) are just the single assignment module variables. These variables can be omitted from the interface if desired. - Object creation is handled by the module procedures. - Visibility is controlled by what is listed in the interface. Here is an example (skip it if you like): interface Complex Complex = class % List of methods on complex objects go here ... end % Some simple complex constants (module objects) zero, one, i: Complex % Creators cartesian = proc (r, i: real) returns (Complex) polar = proc (m, a: real) returns (Complex) % Other procedures ... end Using modules ------------- The complex module given above can be used in the following manner: c: Complex::Complex := Complex::cartesian(10.0, 7.0) c := Complex::polar(c.get_magnitude(), c.get_angle() * 2.0) However, a less painful way would be to "import" the module: import Complex c: Complex := cartesian(10.0, 7.0) c := polar(c.get_magnitude(), c.get_angle() * 2.0) Suppose module M contains components c1..cN. Then the effect of "import M" is identical to the following sequence of equates: c1 = M::c1 c2 = M::c2 ... cN = M::cN Therefore, if two modules contain components with the same name, importing the modules creates a name conflict. Here is how we deal with these conflicts. Suppose a name X appears in the source code: * If X is defined locally, the local definition is used. * Else if X is defined in exactly one of the imported modules M, M::X is used. * Else if X is defined in more than one module, an error message is given saying that X is ambigous. * Else an error message is given saying that X is not defined. Object Representation --------------------- - The implementation of a class will list the slots for each instance of the class. In a method, the slots of the "self" object can br referenced by just giving the slot name. The following example demonstrates this scheme, and also shows the mechanism for creating objects: implementation Complex Complex = class slots [r: real, i: real] = ... get_real= method () returns (real) return (r) end ... end ... polar = proc (m, a: real) returns (Complex) return instance[Complex]{ r: m * cosine(a), i: m * sine(a) } end ... end Of course, this mechanism is very similar to rep types, it is just more implicit in two ways: - There is an implicit record around the rep - Ups/downs are done implicitly - Accessing slots of non-self objects: (Useful in binary ops, and for accessing other classes in the module) The compiler automatically creates a type rep[T] for each class T in the module implementation. Each class T gets an extra method: down = method () returns (rep[T]) rep[T] provides get_x and set_x operations for each slot x in class T. Example: implementation Stack Stack = class slots [ list: array[int] ] % Only one slot ... append = method (s: Stack) r: rep[Stack] := s.down() for i: int in r.get_list().elements() do push(i) end end ... end end * It is illegal for rep[T] to appear in the interface (this implies that the down operation cannot be exported). * rep[T] is legal if and only if T is a class defined by the current module. Dispatch mechanisms ------------------- There are several choices here: 1 Explicit message queue. This is general enough, but we thought that it would be very cumbersome to use. 2 New thread per invocation (like Argus). Locks are not propagated across invocation boundaries unless we provide more mechanism (such as action identifiers). Of course, the compiler can optimize local calls so that they run in the caller's thread. 3 Called method runs in same logical thread as the caller for synchronous calls. Therefore, the caller's locks are automatically propagated to the callee. This mechanism only works for a chain of synchronous calls. What if we want to propagate a lock to an asynchronous child thread. 4 Eric and Carl's two solutions to the recursion deadlock problem in Actor systems. The ports in their model translate to the locks in our model. Option 2 is the tentative decision because it has very low overhead. Asynchronous ordered messages ----------------------------- We need to run at least some code sequentially to make effective use of network message ordering. The different options here just specify how much code should be run sequentially for messages that are ordered with respect to each other (i.e., they arrived on the same stream) 1 Preamble: a block of code 2 Entire method body: we can get unserialized executions when desired by forking off a thread. 3 The code says "ready" when it is prepared to compete with other messages from the same stream. This is very similar to the preamble mechanism, but is more flexible. Ready statements are analogous to returns from procedure bodies. In Pascal, which has a single return point per procedure, it sometimes gets very painful to force all returns through a single point. 4 One (or possibly a number of) lock is specified in the method header, and the system acquires this lock on behalf of the thread before allowing another message on the same stream to be handled. Decision: Entire method bodies are executed sequentially (option 2). When we get the time, it should be easy to make the switch to using "ready" statements. Procedures ---------- - No own variables. Module objects can be used instead. - We can create bind closures. - If "m" is a method on class "X", and "x" is an instance of "X", then "x.m" is a procedure. This is very similar to bind closures. Method invocation ----------------- The proposal for the simple language is to use the syntax x.foo(a1, a2, ... an) to invoke the method foo on object x with arguments a1..aN. The drawback of this syntax is that the record sugaring mechanisms from CLU (get_x and set_x) stop working. For example, if I say x.foo, am I calling x.get_foo(), or do I want the procedure obtained by binding x to method foo. We will disallow record sugaring until this problem is solved (possibly by changing the method invocation syntax). One implication of this decision is that we can't use the sugar for even the built-in record types. Parameter passing ----------------- - Call by sharing (object reference) - Small immutable object can be passed as values (just an optimization) Variables --------- We need to define scoping/shadowing rules for variables. We should probably steal CLU's mechanism. Parameterization ---------------- We will allow modules to be parametrized. interface Stack [T: type] Stack = class Push = method (v: T) Pop = method () returns (T) signals (empty) Top = method () returns (T) signals (empty) end New = proc () returns (Stack) end The only problem comes with import statements: Suppose I need both Stack[int] and Stack[real]. import Stack[int] import Stack[real] Now, the type name Stack is ambiguous, and therefore has to be fully qualified. I have to say: s1: Stack[int]::Stack s2: Stack[real]::Stack One solution is to use equates: Stack_int = Stack[int]::Stack Stack_real = Stack[real]::Stack I think a better solution is to introduce the idea of parameterized equates. I should be able to say: Stack[T] = Stack[T]::Stack (This example depends on module names being in a separate name space from local declarations) Given that we have parametrized types, we can say that if module M is parametrized by k parameters: P1..Pk, and that the module has components C1..Cn, then "import M" is equivalent to: C1[P1,P2,..Pk] = M[P1,P2,..Pk]::C1 C2[P1,P2,..Pk] = M[P1,P2,..Pk]::C2 ... Cn[P1,P2,..Pk] = M[P1,P2,..Pk]::Cn Therefore, if I say "import Stack", I can then say: s1: Stack[int] s2: Stack[real] Initialization of module objects -------------------------------- We want to guarantee that a module object is initialized before it is accessed (barring cycles in the initialization graph). This will be very easy to do if we restrict the code that can be run for initialization of module objects. However, we would like to provide more generality. Here is the implementation strategy for the initial version of the language: * All module objects that are initialized by built-in code will be initialized either at load time, or just before the main program starts running. Example: integer objects will be easy to initialize at load time. * All other module objects are allowed to be in one of three states: unintialized, initializing, and initialized. Each access to one of these objects will check the state: if uninitialized: Change state to initializing and run initialization code. if initializing: Block until state changes to initialized. if initialiazed: Just use the relevant object. Note: this scheme will deadlock if there is a cycle in the initialization graph: i.e, object A's initialization code uses object B, and object B's initialization code uses A. Multiple access boundaries -------------------------- It may be convenient to expose different portions of the object's interface to different entities in the system. For example, a bank account object will have different sets of operations for bank employees and customers. This sounds like capabilities. We will punt on this issue for now. Associated body/background process per object --------------------------------------------- This seems unnecessary because the creator can just fork a process that does the necessary work. Languages like POOL have a body per object so that they can explicitly pick up messages from the message queue. We will not need this capability in our model. ############################################################################# To: psg@neutron.lcs.mit.edu Subject: Notes on type system/base types Date: Thu, 08 Nov 90 19:16:00 -0500 From: Wilson Hsieh Authors: Adrian Colbrook Sanjay Ghemawat Wilson Hsieh Semaphore --------- - We can probably get away with providing just counting semaphores because they are more general than binary semaphores. Should a counting semaphore should probably be parameterized by the count so that we can implement binary semaphores very efficiently (counting semaphores will cost a few more instructions per access as compared to binary semaphores). - We can combine both blocking and spinning models into the same semaphore type by providing two operations: acquire: Block until the thread acquires the semaphore test_and_acquire: Acquire only if it is possible to do so without waiting. Return false otherwise. This arrangement abstracts away from low-level synchronization primitives like test-and-set. - What fairness guarantees should we give for threads blocked on a semaphore? FIFO? Creators: - create Operations: - acquire - release - test_and_acquire Mutex ----- - Parameterized by type of contained element. Example: mutex[array[int]]. Creators: - create Create a new mutex Operations: - seize/release Lexical structure like Argus? - get_value/set_value Access/change the contained object - pause Not needed if we have condition variables Condition Variables ------------------- Since condition variables are usually used in association with a mutex, it may be convenient to package up both the mutex, and the condition variable's state (set of threads) together. Therefore, condition variables should provide at least the following operations: Operations: - seize/release - wait - signal Any fairness guarantees? - broadcast The rationale for making condition variables a part of the language is that they are very useful, and it seems hard to come up with a language level implementation for condition variables that will both be efficient and portable between shared memory and message passing machines. Promises -------- Promise[T] is a future for a value of type T. A promise[T] is created by making an asynchronous call on a procedure that returns T. Operations: - get_value Wait until ready and then return value - ready Return true iff value is ready Stream wrappers (for ordering asynchronous calls) --------------- A stream[T] provides a mechanism for delivering messages in order to the contained object of type T. A stream[T] should provide all operations that T provides. However, the return types are all promises: If T provides foo: T1 -> T2, stream[T] provides foo: T1 -> promise[T2] Creators: - create Do we need more operations? The following may be useful: - sync Read-Write locks ---------------- We can easily build read-write locks out of mutexes. The questions are: - Can we provide a better interface for read/write locks by explicit support in the language? - Can read-write locks be implemented more efficiently than by using mutexes? For example, the kernel may implement some fancy locking protocol that uses special hardware capabilities. If we decide not to make read-write locks a part of the language, they should be one of the first types added to the language library because they will probably be very useful. Atomicity of normal types (arrays, etc.) ---------------------------------------- - If normal base types are atomic, then every use of these types has to pay the penalty of acquiring the appropriate locks (global compiler analysis may alleviate this problem slightly). - If the normal base types are not atomic, then the programmer will have to build atomic versions by combining these types with synchronization types (mutexes for example). Our parameterization ability will be a big help here. Parameterized types ------------------- - What about where clauses? They may turn out to be very useful. In any case, they are probably not too hard to implement, and even if they are not part of the initial language, they should be one of the first few things added to it. Base types (CLU/Argus types) ---------- int real bool char null any (?) array string record variant node? We might want some notion of node ids, rather than just passing integers. Do we want immutable versions (sequence, struct, oneof)? We need some type that captures handler instances (x.foo, or bind foo(x,*, *, ...), depending on our syntax) as well as free-standing procedures, such as proctype. We will also need creatortype. Do we have itertype? Input/Output ------------ We probably don't need istreams for now. However, we will probably want to read/write text files and perform terminal I/O. streams What is an object? ------------------ Are messages, types, or threads objects? If they are, they must be base types. We should probably say no on all three for now. ############################################################################# To: psg@neutron.lcs.mit.edu Cc: wang@simpleton.lcs.mit.edu Subject: Invocation Mechanisms Date: Thu, 08 Nov 90 20:05:51 -0500 From: carl@meson.lcs.mit.edu Authors: Paul Wang Carl Waldspurger We haven't developed anything that's very new or interesting, primarily because we are not dealing with location-related invocation options (e.g. Emerald call-by-move) in Feeble. INVOCATION MECHANISMS: (1) Blocking calls and sends: % y has type t y: t % BLOCKING CALL % Method foo (which returns a t) is invoked on receiver object x. % The caller waits for the reply, which is assigned to y. y := x.foo(arg1, arg2, ..., argN) % BLOCKING SEND % Method foo (which returns a t) is invoked on receiver object x. % The caller waits for the reply, which is ignored. % (The implementation may choose to simply return an ACK.) x.foo(arg1, arg2, ..., argN) (2) Non-blocking calls and sends: % y has type promise[t]. A promise is a typed "future" % (including exceptions) that must be explicitly claimed. % At the claim point (e.g. y.claim()), the promise will either % return an object of type t or raise an exception defined in % the promise type. y: promise[t] % NON-BLOCKING CALL % Method foo (which returns a t) is invoked on receiver object x. % The callee immediately receives a promise[t], and does not % wait for the method to be invoked or completed. Conceptually % a new thread of control is "forked" which asynchronously % performs the call. y := fork x.foo(arg1, arg2, ..., argN) % NON-BLOCKING SEND % Method foo (which returns a t) is invoked on receiver object x. % The caller does NOT wait for the reply, which is ignored. % (The implementation may choose to simply avoid any reply.) % Conceptually a new thread of control is "forked" which % asynchronously performs the send. fork x.foo(arg1, arg2, ..., argN) (3) Nested calls and sends: % We may want to guarantee that "call-thrus" avoid unnecessary % messages (e.g. like Scheme guarantees tail recursion avoids % unnecessary stack frames). This behavior is often explicitly % coded in actor languages using the actor notions of % "customer" and "reply-keyword". y: promise[t] % The reply value generated by invoking foo on receiver object x % is sent directly as an argument to the invocation of method bar % on receiver z. The intermediate result need not be sent back % to the callee. This may simply be an implementation optimization % or we may want to guarantee it in the language semantics (?). y := fork z.bar(x.foo(arg1, arg2, ..., argN)) (4) Pipe calls: % Pipes are abstractions which perform ordering. They are % identical to the "stream" objects we talked about in % meetings (and are elaborated in PSG design note #4 (draft) % by Wilson and Bill). (The name was changed to protect the % innocent, and to avoid overloading the already-overloaded % term "stream"). % Create a pipe for ordering messages to x. The signature % of p is the same as the signature for x, except that all % the return types become the corresponding promise types. p: pipe[t] := pipe(x) % Method foo (which returns a t) is synchronously invoked on % receiver object x through pipe p. The callee immediately % receives a promise[t], and does not wait for the method % to be invoked or completed. The pipe abstraction guarantees % that subsequent calls placed on pipe x will appear to execute % AFTER prior calls have completed. y: promise[t] % Special syntax is used ("pipe") for two reasons: % (1) There may be special operations (e.g. p.sync()) that % we want to execute directly on the pipe object itself. % Simple using "y := p.foo(...)" could present a name clash. % (2) It is plainly clear that a "pipe" call is being made. % Bill objected to the implicitness of simply using % "y := p.foo(...)" although we don't mind. y := pipe p.foo(arg1, arg2, ..., argN) (5) Order of evaluation: In the method invocation: x.foo(arg1, arg2, ..., argN), the receiver object is first evaluated ("x"), then the arguments are SEQUENTIALLY evaluated in an arbitrary (i.e. up to the compiler, as in Scheme) order, then finally the message invoking the method ("foo") is sent. We decided not to evaulate the arguments in parallel for Feeble, since this can be done manually in a straightforward way, such as: val1 := fork arg1 val2 := fork arg2 ... valN := fork argN x.foo(val1.claim(), val2.claim(), ..., valN.claim()) In the future, we may want to provide some suitably convenient syntactic sugar if this is common. ############################################################################# To: psg@neutron.lcs.mit.edu Subject: Notes on concurrency structures Date: Fri, 09 Nov 90 13:57:37 -0500 From: dellaro@muon.LCS.MIT.EDU Authors: Chris Dellarocas Qin Huang Concurrency Structures ====================== Concurrency in our language will be achieved by means of the following : 1. Synchronous and asynchronous method invocations 2. Parallel block statements (cobegin/coend) 3. Parallel iterators Placement of threads is an important issue in getting concurrency, i.e., we would like to distribute threads among different processors. 1. Presumably, method invocations are executed on the processor where the invoked object resides. We may have call-by-move, etc. later on. 2. It is harder to decide where to put threads in the cases of cobegin/coend and parallel iterators. We argue that there must be some distributed data structures underline the cobegin/coend and parallel iterators to make the full use of them. Method invocations ================== We will have both synchronous and asynchronous method invocations. A synchronous call has the syntax: y := x.foo(a1,...,an); or x.foo(a1,...,an); depending on whether we want a result from method foo. An asynchronous method invocation is qualified by a keyword such as fork, async, future. p := async x.foo(a1,...,an); If x.foo returns an object of type T, then async x.foo returns an object of type promise[T]. In our view, promises are a nice way to integrate the notion of threads in our object-oriented language. Promise is an abstract data type which has a number of defined methods like: - join essential to be able to get back the result of a method if p::promise[T] then p.join::T - kill to be able to kill unnecessary threads - useful for speculative concurrency - suspend ??? resume Comments on the stream data type (Adrian, Sanjay & Wilson) ========================================================= We have some objections to the stream data type, as proposed by AS&W. Suppose we have an object x and we want to make an asynchronous call to one of its defined methods, foo. Then, using the stream data type, we can (1) create a stream of the same type as x. From what we understood, streams are "tied" to specific objects of that type, which are passed as arguments to the stream creator. Suppose that the resulting stream is called s. (2) invoke method foo on stream s p := s.foo(a1,..., an); This is equivalent to the async call presented above, but rather confusing since we now have two different names to (ultimately) refer to the same object. We definitely prefer explicit asynchronous calls. Parallel Block Constructs ========================= By default statements in a method will be executed sequentially. We propose to provide two parallel block constructs. One for statements that can be executed in parallel and one for statements that MUST be executed in parallel (for correctness). Possible syntax could be cobegin % can be parallel block of statements coend parbegin % must be parallel block of statements parend A question that arises is, when do these constructs exit ? Do we wait for completion of all statements in the block (implicit barrier) or not ? Do we need separate variants to cover all cases ? Parallel Iterators ================== Parallel iterators will be analogous to parallel block constructs. cofor (sequential iterator syntax) statement end Same questions about termination (barrier or not) remain. Placement is an issue in this simple language version. Since methods are executed where their objects reside, a parallel iterator only makes sense if each iteration invokes a method on a different object. We think that we must include some kind of distributed objects in our language very soon ! Group Promises ============== While discussing possible parallel iteration constructs, Chris came up with the following idea. On many occasions, programmers may want to asynchronously start a number of methods and then treat the set of promises thus created as a single entity. For example, they may want to join with any of the methods (whichever finishes first) and then kill all other methods, or join with some methods and then wait for the rest to finish. We propose a new abstract data type which captures the idea of a group of promises ( a group of threads ) and a statement that asynchronously starts a number of methods and returns a single identifier for the group. The abstract type is promise-group[T]. This type is an abstraction for a number of concurrently executing methods that return type T. Each member of the group can be referenced by an integer index i running from 0 to the size of the group. A number of methods can be invoked on an object of type promise-group[T] : - join-any Wait until any member of the group has completed. Return the result of the first member that does. - join Selectively join with a speicified member of the group. - kill-all Kill all members of the group that have not finished. - kill Kill a specified member of the group - wait-all Wait until all members of the group have completed Return their total number - groupsize Return total number of members in the group - finished Return number of members that have finished - working Return number of members that are still executing The statement is a variant of a simple for loop gp :: promise-group[T] gp := group from i := 0 to do statement endgroup Group starts for all indices from 0 to and immediately returns a group of promises in gp. Eventually we should come up with more general versions of the group statement, providing more general ways to index the members of a group. ############################################################################# To: psg@neutron.lcs.mit.edu Cc: weihl@photon.lcs.mit.edu Subject: issues raised at today's meeting Message-Id: Here's the list of issues from today's meeting. Please think about them for next week, and also go through the draft notes on the language design to see if there are any other issues that should be raised. 1. Syntax for pipe calls. Bob raised the issue of needing to distinguish between "pipetypes" and other types in where clauses if pipe calls involve special syntax. If we eliminate the special syntax, we need some mechanism for dealing with possible name clashes between operations on a pipe itself and operations on the target object of a pipe. There are (at least) 5 possible options for dealing with the name clashes: 1. Don't have any operations on the pipe itself. (Disadvantage: "synch" is inconvenient.) 2. Have special syntax for calling target operations through a pipe. This suffers from the problems raised by Bob. 3. Have special syntax for calling operations on the pipe itself. (E.g., pipe s.synch().) Disadvantage: this also suffers from the problems raised by Bob, unless we simply decide that this syntax can't be used for calling pipe operations. 4. Have no special syntax -- but allow the instantiation pipe[T] only if T has no operations with the same name as one of the pipe operations. Disadvantage: if there are very many pipe operations, this introduces a lot of operation names that have to be avoided. This also makes it more painful for us to add more pipe operations later. 5. Have no special syntax -- and have a single pipe operation that extracts the underlying pipe object, which has some other type (e.g., real_pipe) that provides the desired operations such as synch. This is similar to #4 above, except that there is only a single operation name that can cause a conflict -- and we can easily add new pipe operations to the real_pipe type. 2. If we eliminate modules from the initial language, what replaces them? Do we allow separate compilation? If so, how does it work? and what are the rules for type-checking across compilation units? 3. Is there any special syntax for "lock statements" for mutexes? 4. Is there any exception handling mechanism? (I think we decided to eliminate this initially.) 5. To generalize #3, what about a "try ... finally ..." statement? 6. How about goto? (I vote NO.) 7. What are the details of the cobegin/parbegin/cofor mechanism? 8. Can arms of a cobegin exit other than by falling off the end? If so, what is the semantics? 9. Can threads be killed by other threads? (e.g., as in "race" for speculative concurrency.) 10. If we don't have modules (with static variables in modules), is there any mechanism for static variables (either globals, or local (own) statics)? 11. We need to go over the details of object creation. 12. What about grouping calls on a pipe into "atomic batches"? (I think we decided to defer this and similar elaborations of the basic mechanism for now.) Bill ############################################################################# Date: Wed, 5 Dec 1990 17:56:01 EST From: William E. Weihl Reply-To: weihl@photon.lcs.mit.edu To: psg@neutron.lcs.mit.edu Cc: weihl@photon.lcs.mit.edu Subject: meeting Message-Id: Here are my initial thoughts on the issues raised last week; we can discuss them at tomorrow's meeting. Bill 1. For pipes, I think the best option may be the fifth one listed in my message from last week -- i.e., have no special syntax for pipe calls, and have a single operation on pipes that extracts the underlying pipe object; operations such as synch are handled by the underlying pipe. I.e., the type pipe[T] has all the operations of objects of type T (with return types changed to promises), plus an operation "get_pipe" that returns an object of type pipe. The type pipe has an operation "synch", plus possibly some others. The instantiation pipe[T] is legal only if T has no operation named "get_pipe". 2. To simplify our initial implementation effort, I propose that we not worry about separate compilation initially. Designing and building mechanisms to support separate compilation are activities that are unrelated to the research we're doing; if we can manage without it for awhile we'll have more useful results sooner. 3. I think we should not bother with special syntax for mutexes in the initial language. 4. To keep things simple, I suggest we ignore exception handling mechanisms for now. 5. I also think we shouldn't include a "try...finally..." statement. 6. No goto. 7. Here's a suggested design for cobegin/etc. We provide a compound statement "cobegin" that takes a number of arms. Each arm can be either static or dynamic. A static arm is a single statement (or body). A dynamic arm is a statement, together with a specification (as in a for loop) of the number of instances to create. The semantics is that one thread is created for each static arm, and one for each instance of a dynamic arm. We will provide annotations to allow the programmer to indicate whether or not the system must be prepared to run all the threads concurrently. (Similar annotations are required for "fork".) 8. An arm of a cobegin is not allowed to transfer control outside the cobegin. Control leaves the cobegin after all arms have terminated. 9. I think we should defer providing mechanisms for threads to terminate each other until the next version of the language. 10. We should provide both global and local static variables. Local static variables can be declared in the outermost scope of a type implementation or routine (similarly to CLU). 11. We should talk about creation at the meeting. 12. We should defer enhancements to the pipe ordering mechanisms until later. Flames welcome. ############################################################################# Date: Thu, 6 Dec 1990 11:04:43 EST From: William E. Weihl Reply-To: weihl@photon.lcs.mit.edu To: psg@neutron.lcs.mit.edu Cc: weihl@photon.lcs.mit.edu Subject: annotations Message-Id: Below is an outline of some issues related to annotations for controlling placement, migration, and granularity. One clear omission is that I haven't thought about ``distributed objects'' in writing this. Comments and discussion are welcome. If we have time, we can discuss this at today's meeting, so please read it (and try to find time to think about it if you can) before then. Bill ************************************************** We need annotations for controlling: object placement and migration thread placement and migration thread granularity and scheduling 1. Object placement and migration A. initial placement could be described i. on declaration of variable - but variables are assigned both newly created objects and existing objects -- so this might not be useful. ii. on creation of object - this seems like the right time to do it. The problem with this is that it could be very difficult to know the right place to create an object. E.g., suppose I have a routine that happens to create some objects; it could be that the appropriate places to create them depend on how the routine is called -- so the placement decisions need to depend on the routine's arguments. In general, an abstraction that creates objects might need information from the point of call about where the objects should be created. Can we develop simple annotations that don't require passing node ids as explicit arguments of creation operations? What if creation of an object requires creating subobjects? Where are they created? And how does the creator of the outer object indicate this? In general, when creating an object, one would like to be able to specify how its pieces should be spread around the machine; this requires some sort of ``language'' for describing the pieces and where to put them. It's easy to design a specialized language for a few types like arrays; what about more general types? B. migration i. explicit ``move'' instructions. ii. as part of a call (because it may be cheaper to move an object in the same message as a call than to use a separate message). a. call-by-move for arguments. b. data-shipping (i.e., bringing target of call to the node where the calling thread resides -- or in general to the node where the call will run). iii. by attachment to other objects (including threads) a. indicated by declaration (e.g., an instance variable has the attribute ``attached'', indicating that the subobject should be colocated with the containing object; or a local variable in a thread has the attribute ``attached'', indicating that the object denoted by that variable should be colocated with the thread) b. indicated by explicit attach and unattach operations. iv. other operations? e.g., ``fix'' as in Emerald? 2. Thread migration A. by explicit move instructions (move self to node X, or move some other thread). B. as part of a (sequential) call: i. where to run the call a. at thread's current node. b. at node where called object resides (for calls to methods on objects) c. at a third node (other than the calling thread's and the called object's) ii. how to run the call a. RPC b. continuation-passing (i.e., move part of thread's stack to desired node, so when the call returns the thread continues running there rather than returning immediately to the original calling node) -- need ways of describing how much of thread is moved (e.g., top frame, top n frames, partial frames) 3. Thread placement and granularity A. granularity i. information about may/must fork/cobegin ii. information to assist in aggregation of subtasks (for ``may'' fork) -- i.e., which forked calls should be run inline, which arms of a cobegin should be aggregated together. B. placement -- where to run a new thread. i. at calling thread's node. (usually a bad idea?) ii. at called object's node (for calls to methods). iii. at some third node. iv. at a lightly loaded node chosen by the run-time system. 4. Scheduling It's very unclear to me right now what kinds of control are needed or possible over scheduling. We need to discuss this more. A general note: annotations about actions to be taken on a call could be given either in the called routine's interface or as part of the call. ############################################################################# To: psg@neutron.lcs.mit.edu Subject: invocation Date: Mon, 25 Feb 91 14:44:16 -0500 From: brewer@muon.LCS.MIT.EDU Notes on invocation: 1) Converting OIDs to local addresses We will use a hash table that converts OIDs into local addresses. An OID will have an entry exactly when there is local storage for it -- either the object itself or a surrogate. Entries are deleted when the corresponding storage is deallocated. Objects created locally are not assigned OIDs until a reference to them is exported. The id 0 denotes objects without OIDs. After an OID is assigned the corresponding entry is added to the hash table. There are two uses for the hash table. The primary use is to convert OIDs to local addresses when unmarshalling a message. The second use is to help locate an object during a location request to the birthnode. The birthnode first looks in the hash table. If there is an entry (the normal case), then the location information is at the local address (at either the object or the surrogate). If there is not an entry, a second data structure is referenced to find the location. In this case, the hash table serves only to reduce the data required in the second data structure. The second structure has an entry for each non-resident object without a local surrogate that was created on this processor. The hash table also serves as a list of all the objects (with OIDs) known to the processor. This should be useful during gc, and perhaps gc info should be stored in the hash table. (Note that everything in the hash table has been exported.) 2) Invoking a incoming message (remote -> local) 1) Find the object using hash table If not here at all, goto 7 If here only as surrogate, goto 5 2) (object is here) Unmarshall/translate message 3) Invoke local procedure (the method) 4) return reply if expected (DONE) --- 5) (object not here, but have surrogate) Get location from surrogate 6) Send REJECT message to OS at sender including rejected message and location hint --- 7) (object not here and no surrogate) set location hint to NULL goto 6 3) Local -> local 1) Find method. The method address is a function of three things: ma = f(method_name, class, instance) The instance reveals whether to use the normal method or the surrogate method. Alternatives: a) Use link-time assigned numbers for methods, with the normal version and the surrogate version numbered consecutively. To do invocation: 1) Using object address, look up offset located at the object. The offset is 0 if local, 1 if surrogate. (1 cycle, 1 mem access) 2) base pointer = method_table + offset (1 cycle) 3) jump indirect through METHOD(base pointer) (1 cycle, 1 mem access) Total: 3 cycles, 2 mem accesses A variation of this keeps (method_table + offset) at the instance. This requires 32 bits of storage instead of 1 bit (although the "bit" may use a byte for speed of access), but eliminates step 2. If were going to use 32-bits per instance, alternative b makes more sense. b) Eliminate the table. Each class has a method table that contains all of the normal methods followed by all of the surrogate methods. The offset field (which is 32 bits) holds the address of either the start of the normal methods or the start of the surrogate methods. Methods are named by the (compile-time constant) offset into these tables. To do invocation: 1) Get base pointer (1 cycle, 1 mem access) 2) jump indirect through METHOD(base pointer) (1 cycle, 1 mem access) Total: 2 cycles, 2 mem accesses This scheme enlarges object space requirements, but eliminates a global table, allowing the OS to allocate table space only for the classes that are resident. (Both allow code caching, although we must discuss the procedure return mechanism and its effects on a code cache.) In both versions, the method tables must be resident at all times. If we go with version b, we should be able to eliminate the class pointer for instances, since we can determine the class based on method table pointer if we're careful. Elimination of the class pointer brings the per-instance space overhead (due to invocation) of version b to zero; we have replaced the class pointer with the method table pointer, and everything else should be the same. 2) local procedure call 4) Local -> Remote 1) Find the method as above; it will be the surrogate method. 2) call surrogate procedure a) allocate message space b) set up header c) marshall arguments d) send message Obviously, I expect there will be some discussion of the above. Eric, Adrian, and Carl ############################################################################# To: psg@neutron.lcs.mit.edu Subject: The "Dot" Mess & Modula3 Date: Tue, 14 May 91 17:17:47 -0400 From: carl@meson.lcs.mit.edu Modula3 avoids the mess we've encountered by specifying that the target object in a method call is the first argument to that method. Thus, to call a method "method" on an object "obj" of type "Foo", one would write: Foo.method(obj, arg1, arg2, ...) Note the similarity to CLU syntax: Foo$method(obj, arg1, arg2, ...). (Thankfully) There seems to be a consensus among us that this CLU-style listing of the type for every method invocation is too verbose. Please send constructive ideas to me, and flames to /dev/null. - Carl ############################################################################# To: psg@neutron.lcs.mit.edu Subject: Dots... Date: Thu, 23 May 91 14:27:39 -0400 From: carl@meson.lcs.mit.edu BOTTOM LINE: A simple change to our "agreement" that should be acceptable to everyone. Flame directly to me, cc: psg@neutron. ISSUE: Wilson has made a very convincing case that we should not have two distinct basic entities in our language -- namely, objects and records. I agree, and it is my understanding that everyone Wilson has talked to also agrees. THE FIX: (1) We still punt on providing record get/set sugar for user-defined objects. (2) We still provide record get/set syntax for record types. Records are still special in that they are the only objects for which this sugar applies. (3) The only real change is that we think of records as objects that have get/set methods that can be either directly accessed -- like ordinary methods "foo.get_x()" -- or accessed via the sugar -- "foo.x". Aside from a mental viewpoint shift, the only change is that we now support direct access of get/set methods for records. Thus, we could also pass foo.get_x as a proctype, just as we could pass a method for any other object. Provided that I don't hear (m)any objections, and that Bill doesn't disagree (and also put on his dictator hat), then the above "fix" is what I will add to the Prelude language description. - Carl