To: psg@neutron.lcs.mit.edu
Subject: Another approach to object references, forwarding, etc.
Date: Wed, 17 Oct 90 22:25:17 -0400
From: Bob Gruber <gruber@electron.lcs.mit.edu>

Sanjay and I thought up the following scheme for traversing
object references in the parallel language's runtime system.

(1) All object references are local addresses, e.g., 32 bits on a 32 bit
	word architecture.  If an object is local, then references to it
	point directly to its rep.  The rep contains the object's OID and
	also a pointer to the object's method table (shared by all local
	objects of a given type) and .  Method invocation on local
	objects is simply an indirection through the method table of the
	object, where the method name maps to a compile-time offset into
	the table.

(2) Some references will be to non-local objects.  All such references
	will actually point to local forwarder objects.  A forwarder
	object's rep includes the OID of the remote object, possibly
	other info (such as a hint as to its current location), and, like
	other objects, a pointer to a method table.  Method invocation on
	non-local objects proceeds in exactly the same way as above,
	i.e., an indirection through the forwarder's method table.  The
	method table of a forwarder contains methods that translate the
	invocations into message sends to the remote object (described
	next).

(3) Local references only make sense in the local processor's address
	space.  Thus, when a forwarder forwards a message to a remote
	object, the receiver and the arguments, which will be local
	references, must be converted to OID's.  Since both local and
	forwarder objects contain OID's, it is trivial to translate local
	references to OID's.

	N.B. Small immutable objects should be sent as values.

(4) When the runtime system receives a message from another processor, it
	must translate the OID's of the receiver and the args to local
	references.  Each processor maintains a hash table that maps
	OID's to local addresses.  If an OID does not map to a local
	address, this means that this is the first occurrence of this OID
	at this processor, in which case the OID must be for a remote
	object... a forwarder object is created containing the OID, and
	its local address is added to the hash table.

	Note: the hash table contains entries for all local objects as
	well.  When an object is created, its OID is placed in its rep,
	and the OID to local address mapping is added to the hash table.

(5) Method caching: like the global method table approach described in
	Carl's meeting summary, you can do per-method code caching as
	follows.  Any slot in a type's method table can point to a
	faulting routine that loads in the correct method and updates
	the method table.

(6) Object migration: to move an object you simply replace its rep with a
	forwarder rep.  If you want to move a process associated with an
	object, then, as with message sending, you need to convert local
	references (on the process stack) to OID's, and, at the other
	end, back to local references.  (Some method must be used to
	deal with return addresses as well.)

(7) Garbage collection and the hash table:

	When the garbage collector removes an object, its entry in the
	hash table should also be removed.  This is easy for a collector
	that explicitly frees up objects.  For a copying collector, we
	would implement this by scanning the hash table after each gc,
	removing all entries with local pointers to from-space.

	Because object can be referenced remotely, there will need to be
	some form of global garbage collection scheme, for example, along
	with local roots, you can have a table of non-local reference
	counts.  Forwarder objects are special in that they cannot be
	referenced remotely.  Thus, a local garbage collector can collect
	them without examining the non-local root table.

	Note: the garbage collector does not traverse / count references
	in the hash table.


(6) Cost of method invocation:

	With a global handler table, a jump to a method looks like

		JMP *(HTBASE + HANDLER_OFFSET)

	where HTBASE is the base of the global table.  With the schame
	described above, a jump to a method looks like

		JMP *(*(OBJREF + METHOD_TABLE_OFFSET) + HANDLER_OFFSET)

	where OBJREF is a pointer to the object and METHOD_TABLE_OFFSET
	is the location of the pointer to the method table.  If this
	offset is zero, then the additional cost is one extra indirection:

		JMP *(*OBJREF + HANDLER_OFFSET)

	Note that once you introduce subtyping, per-class method tables
	are necessary; you can't have a global table!

(7) Advantages of this approach:

	(a) You never need to check for object presence.
	(b) You can specialize object behavior by replacing method
		table pointers; this might be useful for more than
		forwarding, e.g., distributed objects.  Also, it would
		be easy to experiment with different forwarding strategies,
		since the forwarding code is simply code in a method table.
	(c) Object references are small (e.g., 32 bits); OID size is
		totally independent of architecture.
	(d) Object migration is easy, without the need for fancy gc support.
	(e) Code produced by the compiler manipulates local addresses
		only.  Contrast this with an approach that uses OID's for
		object references; the code will manipulate both OID's
		and local addresses (as OID's are translated).  This
		probably means larger code size.  In addition, OID's may
		not fit in registers, while local addresses will!
	(f) The scheme seems conceptually very clean.

(8) Potential disadvantages of this approach:

	There is a small cost to translating local addresses to OID's on
	remote message send, and a larger cost (a hash table lookup) to
	translating OID's to local addresses on message receipt.  Some of
	these translations on message receipt may not be necessary in
	certain cases.  For example, we may just want to pass an OID on
	to another processor.

	If OID's are used as object references, the larger cost (a hash
	table lookup) must be paid for each object reference.  Thus,
	taking into account the translation cost and the cost of object
	references, it is not clear which approach will be the most
	efficient, since it will depend on how many times on average an
	object is referenced.

	Forwarder objects take up some space.  However, compare this to
	using OID's for references.  Each reference as an OID is probably
	larger than a local address, thus each reference pays the extra
	space cost; in comparison, there is only one forwarder per
	object, and the extra space is only a pointer to a method table.

	Note that it may be that our scheme actually wins on both of
	these points.

	As noted above, there is an extra indirection (over using
	a global method table), however we will have to pay this cost
	anyway if we have subtypes.

#############################################################################

To: psg@neutron.lcs.mit.edu
Subject: migration
Date: Mon, 29 Oct 90 12:09:37 -0500
From: brewer@muon.LCS.MIT.EDU


(I apologize in advance if this note is confusing... I wrote it in one
sitting.)

There are several serious problems in implementing migration based on the
stack template approach used in Emerald.

The fundamental problem is the migration of threads, particularly
stacks and registers.  We must be able to identify pointers (of
various sorts) in order to correctly translate them for the
destination processor.  One solution involves the use of tags, either
in hardware or software.  We can't assume hardware tags and software
tags add significant overhead to integer operations.

The Emerald approach uses a template per frame to mark words on the
stack as either pointers or data.  Unlike (per word) software tags,
there is no overhead for integer operations.  Emerald divides
registers into two classes: pointers only and data only.

There are several major problems with the Emerald approach for use in our
language:

1) register usage
The division of registers into specific categories must be handled at the
very end of compilation.  For example, gcc does not assign hardware registers
until after optimization.  This was not a problem in Emerald because they wrote
their own compiler.  The problem is the conflict between portability and the
need for control of the back end of the compiler.  If we write our own
back end, we have control of register usage but lose portability.  If we use
an intermediate language, we gain portability but lose control of register
allocation.  Thus, if we use an intermediate language we may not be able to
determine the contents of a register, which *severly* limits our ability to
migrate threads.  I will talk about possible solutions below.


2) register spilling
Like number 1, this is a problem only if use an intermediate language (which
I believe is mandatory).  The back end will sometimes save (spill) registers
onto the stack before a call that might use those registers. After the call
returns, the registers are restored.  A given routine may spill in more than
one place.  The compiler allocates space for the registers in the stack frame.
The space allocated is the maximum needed by any spill.  This introduces two
problems.  First, we must know the contents of the registers at spill time
in order to generate the template.  Second, because multiple spills may use
the same part of the frame, there may not be a correct template value.  For
example, the first word of the spill area may be used to hold an integer
for the first spill and a pointer for the second.  What should its (constant)
tag be?  If we wrote our own back end, we would guarantee that every word
has a consistent tag and we would could generate the correct template.  We
lose both abilities with an intermediate language.


3) polymorphism
Consider a polymorphic procedure that builds a list of type T.  The code
generated will move around T's as needed.  The problem comes in trying to
migrate a thread running the procedure.  Are references to T translated or
not?  It depends...  If T is an integer the T's should be untouched.  If
T's are pointers they must be translated.  This leads to two versions of 
every polymorphic procedure (or object), one for translated types and one
for non-translated types.  A similar problem arises with the use of "any"
unless it is always a pointer to something.


4) subtyping
If we allow subtyping, we must impose the following restriction:  Subtypes
may only redefine fields with fields of the same tag.  A pointer
field may only be redefined to be a different pointer field, never an
integer or other non-translated type.  Likewise, an integer field can
never be redefined to be a pointer field.  Consider field p, a pointer,
of type T.  Code that handles type T will translate p during migration.
Thus the p field of any subtype will also be translated, which implies the
given restriction.


I believe these are very serious problems.  I have few ideas and fewer
solutions.  Two solutions seem (somewhat) reasonable.

Idea Number 1: build our own intermediate language based on Gnu's RTL
(Register Transfer Language).  The key idea is to inherit the machine
description stuff and most of the optimizations.  We would use the
same machine description files, which would allow us to move to any
platform already running gnu C (will there be any?), and will reduce
the problem of porting to simply generating the machine description
(which is non-trivial but better than writing a new back end).
Our new RTL would track tags for every register, which handles
problems 1 and 2.  This is a great deal of work.

Idea Number 2: Place severe restrictions on the migration of threads.
This boils downs to specific migration points in top-level procedures.
We would save the state of the thread in a controlled manner and start
a new thread at the destination processor.  There would be no
translation of stacks or registers.  I believe this would not be too
restrictive.  In object-oriented languages, almost all threads are
associated with a particular method and are thus short lived.  These
threads wouldn't migrate at all.  Long-lived threads, such as
background tasks in Argus, generally have natural "resting points" --
places where they are waiting for something to do.  The migration
points (inserted by the user) would be in these places.  The ability
to migrate would be designed into the task.  The definition of a
migration point is essentially "a point where you can encapsulate the
state, translate it, and start a new thread that uses it."  In this
scenario, a migration would work something like this: migration is
delayed until all long-lived threads reach migration points (where
they stop), short-lived threads run to completion, the state is moved,
new threads run at the destination (after the state moves), long-lived
threads are restarted at the destination.

The biggest problem with this approach is that methods that rpc other
objects may be relatively long-lived, delaying migration and possibly
preventing the object from doing any work.  The object can do useful
work if we allow the the state to be moved while threads are active.
(This implies that threads must perform residence checks.)  This is
really a separate issue, but it is more important (in theory) here
than in Idea 1 because the latter allows the migration of all threads.


Eric


#############################################################################

To: psg@neutron.lcs.mit.edu
Subject: more on migration
Date: Wed, 31 Oct 90 12:07:43 -0500
From: brewer@muon.LCS.MIT.EDU


I should note that my last note on migration owes much to Sanjay.  The
problem discussed here was found by Sanjay.

5)  Migration and Code Caching

The compiler may save computed code addresses in memory or in registers for
later use.  If the thread migrates, these addresses will be incorrect if we
use code caching (which seems mandatory).  Without code caching, code is
loaded at the same place on every processor and the stored addresses
are thus correct on every processor.  With code caching these addresses must
be found and translated.

   The only solution I've found is to have a fixed table on all
processors that is used as an indirection for ALL entry points.
Instead of an entry for each procedure/method, there would be an entry
for every entry point.  Entry points include 1) starting points, 2)
return points, and 3) any addresses used for a computed jump.
(Computed jumps within a method, which is almost all of them, can be
done relative to the PC and need not be in the entry point table.)
The table serves two purposes.  First, since all computed jumps go
thru the table, their is no translation of code addresses.  Second, if
the target code isn't resident, the thread would jump to a routine
that loads the code and then jumps to the correct location.  This
cleanly handles the problem of a return to code that has been unloaded
since the call was made.  The cost of this approach should be obvious
-- the table could be large.  But at 4 bytes per entry point, 10,000
entry points would only be 40k.  Note that there is no "resident" bit;
code is resident iff the address in the table is not the start of the
loading routine.
   There is a hidden problem of how the loading routine figures out
which code to load.  This can be solved by using an indexed jump from
the base of the table.  The loading routine could look at the index
register (which is fixed by convention) to figure out the intended
routine.
   Correct use of this table depends on control of the back end.  The
normal return convention must be replaced by the indirect jump.

Eric


#############################################################################

To: brewer@muon.LCS.MIT.EDU
Cc: weihl@photon.lcs.mit.edu
Subject: migration 
Message-Id: <CMM.0.88.657407083.weihl@photon.LCS.MIT.EDU>

Some remarks on Eric's messages on migration:

1) Polymorphism --- If we go with a very simple parameterization
mechanism to start with (e.g., CLU-style parameters without
"where" clauses), it's easy to implement by translating to an
intermediate language and having all the instantiations share
code.  However, the problem Eric identified does seem hard to
solve.  If we're willing to change the linker to do things
similar to what the CLU linker does, then the instantiations
could still share code but the templates to be for objects of
the parameter types would come from the caller.  I.e., the
linker would build a parameter block at each instantiation
site that could be used to tell whether an object of each
parameter type is a reference or an immediate; this parameter
block would be passed as an extra parameter to each routine
in the module.

2) Subtyping --- I think there are two problems here, one
with subtyping and one with inheritance.  By subtyping, I mean
that a variable declared to be of type T can actually denote
an object of type S, where S is some subtype of T.  So a procedure
that expects an arg of type T could be called with an actual arg
of type S.  If user-defined types are always represented by pointers
and never by immediate values, then there's no problem; the template
for the object's real type tells how to move it.  In CLU, an object
is just represented by its rep object, while in many other object-oriented
languages, an instance of a class is always a structure with instance
variables, even if there is only one instance variable.  I'm not sure
that CLU's approach would work with subtyping anyway, and if we use
the other approach, then there isn't a problem.  Inheritance may
cause problems, however.  If we want a subclass to share code with
its superclass, then the subclass probably shouldn't change the
tag of any field defined by the superclass.  I wonder, though -- could
we share the code, but use different templates for the two classes?

3) Code caching --- if processors have virtual memory support,
we could build a single code image for the entire program, and
then do code caching at the page level.  It's just like having
the entire program on every processor as far as addressing is
concerned -- i.e., code addresses never need to be translated --
but it doesn't use all the storage.

Also, do we need a table with an entry for every entry point
to every routine?  If we're caching at the routine level, why
not have an entry in the table for each routine, and have a
jump involve either (1) a residence check for the routine
followed by a jump to the appropriate offset in the routine
once the routine is resident, or (2) a jump to the start of
the routine, passing it the offset of the desired entry point;
the routine stored in the table is either the desired code or
a faulting routine; if the former, it jumps to the specified
entry point, and if the latter, it loads the routine and then
jumps to the start.

Regardless, it seems like implementing code caching requires
some control of the back end to replace ordinary calls and
returns with appropriate table lookups and indirections.

Bill


#############################################################################

Minutes for PSG Meeting on 26-Sep-90 
====================================

Asbestos (preemptive flame retarder): If you have any additions,
corrections, changes, or amplifications, please let me know.  Also, if
it makes you happier, read "class" as "type".  I tried hard to be
unbiased in reporting PRO/CON/CONSENSUS information.

[1] Handler Tables

We discussed a number of options regarding the storage of code and
pointers for message handlers (aka methods).  

(a) Handler Table Per Instance

Under this scheme, we store pointers to all handlers defined for a class
with every instance of that class.  

PRO: Fast handler dispatch.

CON: Prohibitively large space requirements. Consider a "Point" class
with instance variables x and y, for which a dozen handlers are
defined.  With a per-instance handler table, the space requirements
for each point increase by nearly an order of magnitude.

CONSENSUS: Not a viable option.

(b) Complete Handler Table Per Node

Under this scheme, each node stores one large handler table that
contains code pointers for all of the handlers for every class defined
in a program.  Each handler in the system is assigned a unique
HANDLER_OFFSET into this one table. Since all of the code may not be
loaded on every node, some pointers may be NULL, requiring a residency
check before dispatch.

Eric proposed an optimization: The residency check can be avoided by
setting the handler table pointers for non-resident handlers to the
address of a runtime system "fault" routine that loads the required
handler (and patches up the call stack, if necessary).  Bill pointed
out that faulting is slower than pre-fetching; perhaps a handler fault
could cause the pre-fetching of related handlers.

PRO: Fast handler dispatch.  The caller simply jumps to the address
defined in the handler table --  JMP *(HTBASE + HANDLER_OFFSET).
Additional state (e.g. one word per entry or a few bits in an
associated bitvector) could be used to make handler caching/paging
decisions (e.g. Corby's clock algorithm).

CON: May require a very large amount of memory for some large programs
(e.g. consider the number of handlers defined for a Smalltalk-80 system).

CONSENSUS: A classic time-space tradeoff.  Unless memory is very
tight (or programs very large) this is probably a good solution.
Most people (myself excluded) tended to dismiss the idea of very large
programs as unreasonable anyway.

(c) Complete Class Table Per Node

Under this scheme, each node stores a class table that contains 
pointers for all of the classes defined in a program.  This pointer
contains the address of relevant class information, including the
address of the local handler table for that class (or the address of a
runtime system "fault" routine as described above).

PRO: Table size is considerably reduced in comparison to a complete
handler table per node.

CON: Handler dispatch is more expensive, due to the extra indirection.

CONSENSUS: The flip side of the time-space tradeoff discussed above in
(b).  Unless memory is very tight, it was generally agreed that the
complete handler table is better.

[2] Migration: Object IDs (OIDs) vs. Local Addresses

(a) The Problem

Ideally, we'd like to generate efficient code and also allow for cheap
thread/object/partial-closure migration.  Unfortunately, this may be
quite difficult:

For efficient code, we would like to use local addresses for objects
whenever possible to minimize the number of  OID -> local address
translations that need to be performed.

On the other hand, the use of local addresses wreaks havoc with
migration -- all local addresses need to be found and modified, since
local addresses are not portable across nodes.  A tagged architecture
would help identify local pointers.  Alternatively, Bill proposed
adding tag bits to stack frames to indicate the location of local
references.  Sanjay pointed out that this type of method was used
successfully in a standard ML implementation.

If we ALWAYS reference objects via their OIDs, then this problem
evaporates, but the corresponding loss of efficiency in the normal
case may outweigh cheap migration.

(b) Relative Addressing

Wilson made a similar proposal in an earlier meeting.  The basic idea
is to use relative addressing whenever possible to allow for fairly
efficient normal-case and migration operations.

All entry points in handler code are treated as offsets from the start
of the code for that handler.  Return addresses are relative to the
start address of the code for the calling method.

When used in conjunction with the monolithic handler approach in (1b),
the dispatch sequence looks something like:

	MOV 	Rn, HANDLER_ENTRY_OFFSET
	JMP 	*(HTBASE + HANDLER_START_OFFSET)
	
If the handler isn't present, it is faulted in.  When the handler is
present, it does a relative jump to the appropriate entry point by
using the value stored in Rn.

PRO: Migration costs are sharply decreased, since non-portable
absolute addresses aren't used.  Also, handlers can still be swapped
in and out if all jumps indirect through the handler table entry.

CON: More overhead on most transfers of control (e.g. double-jumps).

CONSENSUS: Seems like a net win.  Some people (e.g. Eric) don't
believe in migration, however.

(c) Help from the Garbage Collector 

Bill pointed out that an incremental garbage collection technique from
a paper by Ellis, Lee, Appel (sp?) may be helpful in reducing the
number of presence checks when local references are used.

The scheme is a copying collector, and exploits memory protection
hardware that is available on some machines (it was pointed out that
many parallel machines don't have VM and may lack this sort of
support).  Basically, local references to objects which have been
swapped out end up as pointers into FROM-space, and cause a fault when
they are accessed.

PRO: Saves on presence checks.

CON: Restricts migrations to occur in concert with the garbage
collector.  May not be appropriate for some parallel architectures.

CONSENSUS: Should be investigated in more detail.

(d) Pre-Computed Hash Values for OIDs

(Sorry, no credit attribution in my notes.)

Each node will most likely maintain a local cache (translation table,
whatever) of OID -> (local-address or location-hint) mappings.
Lookups into this table will require some sort of computed hash value
on the OID.

The idea here is to avoid recomputing the hash value based on the OID
every time a lookup is performed.  Instead, this hash value is
computed once, at object creation time, and stored with the OID.

PRO: Lookups are much faster, since they don't involve hash value
computations.

CON: OIDs become larger.  This causes memory space problems, and may
complicate or slow down some runtime system operations.  Object
creation time is increased due to the need for the initial hash value
computation.  

CONSENSUS: None (as far as I could tell). 

(e) Optimizating Away Translations 

(Sorry, no credit attribution in my notes.)

Conditional code (similar to "code-splitting" for type-dispatch in
systems with subtyping) can be used to eliminate redundant
translations once an OID is known to have been resolved to a local
address.  In addition, redundant conditionals can be further optimized.

PRO: Better, faster, able to leap tall buildings in a single bound.

CON: None.

CONSENSUS: It's a win.


[3]  Responsible Nodes

Since object creation rates are unlikely to be uniform across all
nodes, the use of a "birth node" to keep track of an object's location
(i.e. "home-based" addressing) may lead to considerable imbalance in
storage requirements and location requests.  

I proposed that a "responsible node" be designated for each newly
created object, which is not necessarily the same as the object's
birth node.  This way memory use and communication traffic can be
better balanced if information about imbalances is maintained by the
runtime system.

PRO: Better balance of memory usage and communication traffic.

CON: Requires runtime to keep track of imbalances (although this may
be done infrequently).

CONSENSUS: Most people seemed to agree that some mechanism is needed
to avoid severe imbalances.  In any case, the use of the term
"responsible node" has survived to refer to a generalization of the
birth node for maintaining OID -> location translations.


	- Carl

Authors:	Sanjay Ghemawat
		Carl Waldspurger
		Bill Weihl


#############################################################################

Modules
-------
Modules will be used to impose a two level naming hierarchy, and
will also be used as the only form of access boundary.  I.e., if two
classes are defined in the same module, then they can see and
manipulate each other's reps.  This is the position taken by a number
of modern languages that provide support for modules (Modula 2, Ada,
Standard ML)

The tentative decision is to allow more than one class definition in a
module.  (Carl: I object to this scheme.  I'll flame at the meeting.)
The argument against this decision is that now these classes can look
at each other's reps.  (Note: if we allow exactly one class definition
per module, then the module starts looking like class objects in
Smalltalk).

We decided that we will have separate interfaces and implementations
for modules.  Module interfaces can contain zero or more of the
following:
  * Type abbreviations

  * Single assignment variables (assignment performed by the module
    implementation).  These will be called "module objects".

  * Class interface (name + list of methods).

  * Procedures: these include creators for any classes in the module.

Module implementations can contain:
  * Type abbreviations

  * Declaration and initialization of module objects.

  * Class rep and method implementations.

  * Procedure implementations.

Module components that appear in the implementation,  but not in the
interface can be considered "hidden" or "private" components.

The implications of using this mechanism are:
- Class variables (cluster own variables in CLU) are just the
  single assignment module variables.  These variables can be
  omitted from the interface if desired.
- Object creation is handled by the module procedures.
- Visibility is controlled by what is listed in the interface.

Here is an example (skip it if you like):

    interface Complex
	Complex = class
	    % List of methods on complex objects go here
	    ...
	    end

	% Some simple complex constants (module objects)
	zero, one, i: Complex

	% Creators
	cartesian = proc (r, i: real) returns (Complex)
        polar = proc (m, a: real) returns (Complex)

	% Other procedures
	...

        end

Using modules
-------------
The complex module given above can be used in the following manner:

	c: Complex::Complex := Complex::cartesian(10.0, 7.0)
	c := Complex::polar(c.get_magnitude(), c.get_angle() * 2.0)

However,  a less painful way would be to "import" the module:

	import Complex

	c: Complex := cartesian(10.0, 7.0)
	c := polar(c.get_magnitude(), c.get_angle() * 2.0)

Suppose module M contains components c1..cN.  Then the effect of "import M"
is identical to the following sequence of equates:

	c1 = M::c1
	c2 = M::c2
	...
	cN = M::cN

Therefore, if two modules contain components with the same name,
importing the modules creates a name conflict.  Here is how we deal
with these conflicts.  Suppose a name X appears in the source code:

  * If X is defined locally,  the local definition is used.
  * Else if X is defined in exactly one of the imported
    modules M, M::X is used.
  * Else if X is defined in more than one module, an error
    message is given saying that X is ambigous.
  * Else an error message is given saying that X is not defined.


Object Representation
---------------------
- The implementation of a class will list the slots for each
  instance of the class.  In a method,  the slots of the "self"
  object can br referenced by just giving the slot name.  The
  following example demonstrates this scheme,  and also shows
  the mechanism for creating objects:

    implementation Complex
	Complex = class slots [r: real, i: real] =
	    ...
	    get_real= method () returns (real)
		return (r)
		end
	    ...
	    end
	...
	polar = proc (m, a: real) returns (Complex)
	    return instance[Complex]{
			r: m * cosine(a),
			i: m * sine(a)
			}
	    end
	...
	end

  Of course, this mechanism is very similar to rep types, it is just
  more implicit in two ways:
	- There is an implicit record around the rep
	- Ups/downs are done implicitly

- Accessing slots of non-self objects:
  (Useful in binary ops,  and for accessing other classes in the module)

  The compiler automatically creates a type rep[T] for each class T
  in the module implementation.  Each class T gets an extra method:

	down = method () returns (rep[T])

  rep[T] provides get_x and set_x operations for each slot x in class T.

  Example:
	implementation Stack
	    Stack = class
		slots [ list: array[int] ]	% Only one slot
		...
		append = method (s: Stack)
		    r: rep[Stack] := s.down()
		    for i: int in r.get_list().elements() do
			push(i)
			end
		    end
		...
		end
	    end

  * It is illegal for rep[T] to appear in the interface  (this implies
    that the down operation cannot be exported).
  * rep[T] is legal if and only if T is a class defined by the current
    module.

Dispatch mechanisms
-------------------
There are several choices here:
  1 Explicit message queue.  This is general enough, but we thought
    that it would be very cumbersome to use.

  2 New thread per invocation (like Argus).  Locks are not propagated
    across invocation boundaries unless we provide more mechanism
    (such as action identifiers).  Of course, the compiler can
    optimize local calls so that they run in the caller's thread.

  3 Called method runs in same logical thread as the caller for
    synchronous calls.  Therefore, the caller's locks are
    automatically propagated to the callee.  This mechanism only works
    for a chain of synchronous calls.  What if we want to propagate a
    lock to an asynchronous child thread.

  4 Eric and Carl's two solutions to the recursion deadlock problem
    in Actor systems.  The ports in their model translate to the
    locks in our model.

  Option 2 is the tentative decision because it has very low overhead.

Asynchronous ordered messages
-----------------------------
We need to run at least some code sequentially to make effective
use of network message ordering.  The different options here
just specify how much code should be run sequentially for messages
that are ordered with respect to each other (i.e., they arrived on
the same stream)

  1 Preamble: a block of code
  2 Entire method body:  we can get unserialized executions when
    desired by forking off a thread.
  3 The code says "ready" when it is prepared to compete with other
    messages from the same stream.  This is very similar to the
    preamble mechanism, but is more flexible.  Ready statements are
    analogous to returns from procedure bodies.  In Pascal, which has a
    single return point per procedure, it sometimes gets very painful to
    force all returns through a single point.
  4 One (or possibly a number of) lock is specified in the method
    header,  and the system acquires this lock on behalf of the
    thread before allowing another message on the same stream to be
    handled.

Decision:  Entire method bodies are executed sequentially (option 2).
When we get the time,  it should be easy to make the switch to using
"ready" statements.

Procedures
----------
- No own variables.  Module objects can be used instead.

- We can create bind closures.

- If "m" is a method on class "X", and "x" is an instance of "X",
  then "x.m" is a procedure.  This is very similar to bind closures.

Method invocation
-----------------
The proposal for the simple language is to use the syntax
    x.foo(a1, a2, ... an)
to invoke the method foo on object x with arguments a1..aN.
The drawback of this syntax is that the record sugaring mechanisms
from CLU (get_x and set_x) stop working.  For example,  if I
say x.foo,  am I calling x.get_foo(),  or do I want the procedure
obtained by binding x to method foo.

We will disallow record sugaring until this problem is solved
(possibly by changing the method invocation syntax).  One implication
of this decision is that we can't use the sugar for even the built-in
record types.

Parameter passing
-----------------
- Call by sharing (object reference)
- Small immutable object can be passed as values (just an optimization)

Variables
---------
We need to define scoping/shadowing rules for variables.  We should
probably steal CLU's mechanism.

Parameterization
----------------
We will allow modules to be parametrized.

    interface Stack [T: type]
	Stack = class
	    Push = method (v: T)
	    Pop  = method () returns (T) signals (empty)
	    Top  = method () returns (T) signals (empty)
	    end
	New = proc () returns (Stack)
	end

The only problem comes with import statements:  Suppose I need both
Stack[int] and Stack[real].

	import Stack[int]
	import Stack[real]

Now,  the type name Stack is ambiguous, and therefore has to be
fully qualified.  I have to say:

	s1: Stack[int]::Stack
	s2: Stack[real]::Stack

One solution is to use equates:
	Stack_int  = Stack[int]::Stack
	Stack_real = Stack[real]::Stack

I think a better solution is to introduce the idea of parameterized
equates.  I should be able to say:

	Stack[T] = Stack[T]::Stack

(This example depends on module names being in a separate name
space from local declarations)

Given that we have parametrized types,  we can say that if module
M is parametrized by k parameters: P1..Pk,  and that the module
has components C1..Cn,  then "import M" is equivalent to:

	C1[P1,P2,..Pk] = M[P1,P2,..Pk]::C1
	C2[P1,P2,..Pk] = M[P1,P2,..Pk]::C2
	...
	Cn[P1,P2,..Pk] = M[P1,P2,..Pk]::Cn

Therefore,  if I say "import Stack",  I can then say:

	s1: Stack[int]
	s2: Stack[real]

Initialization of module objects
--------------------------------
We want to guarantee that a module object is initialized before it
is accessed (barring cycles in the initialization graph).  This
will be very easy to do if we restrict the code that can be run
for initialization of module objects.  However,  we would like
to provide more generality.

Here is the implementation strategy for the initial version of the
language:

  * All module objects that are initialized by built-in code will
    be initialized either at load time,  or just before the main
    program starts running.  Example:  integer objects will be
    easy to initialize at load time.

  * All other module objects are allowed to be in one of three
    states: unintialized, initializing, and initialized.  Each
    access to one of these objects will check the state:
	if uninitialized:
	   Change state to initializing and run initialization code.
	if initializing:
	   Block until state changes to initialized.
	if initialiazed:
	   Just use the relevant object.

    Note:  this scheme will deadlock if there is a cycle in the
    initialization graph: i.e, object A's initialization code uses
    object B,  and object B's initialization code uses A.

Multiple access boundaries
--------------------------
It may be convenient to expose different portions of the object's
interface to different entities in the system.  For example, a bank
account object will have different sets of operations for bank
employees and customers.  This sounds like capabilities.  We will punt
on this issue for now.

Associated body/background process per object
---------------------------------------------
This seems unnecessary because the creator can just fork a process
that does the necessary work.  Languages like POOL have a body per
object so that they can explicitly pick up messages from the message
queue.  We will not need this capability in our model.


#############################################################################

To: psg@neutron.lcs.mit.edu
Subject: Notes on type system/base types
Date: Thu, 08 Nov 90 19:16:00 -0500
From: Wilson Hsieh <wchsieh@graviton.lcs.mit.edu>

Authors: Adrian Colbrook
	 Sanjay Ghemawat
	 Wilson Hsieh

Semaphore
---------
- We can probably get away with providing just counting semaphores
  because they are more general than binary semaphores.  
  Should a counting semaphore should probably be parameterized
  by the count so that we can implement binary semaphores very
  efficiently (counting semaphores will cost a few more instructions
  per access as compared to binary semaphores).

- We can combine both blocking and spinning models into the same
  semaphore type by providing two operations:
	acquire:		Block until the thread acquires the semaphore
	test_and_acquire:	Acquire only if it is possible to do so
				without waiting.  Return false otherwise.
  This arrangement abstracts away from low-level synchronization primitives
  like test-and-set.

- What fairness guarantees should we give for threads blocked on
  a semaphore?  FIFO?


Creators:
- create

Operations:
- acquire
- release
- test_and_acquire

Mutex
-----
- Parameterized by type of contained element.
  Example: mutex[array[int]].

Creators:
- create		Create a new mutex

Operations:
- seize/release		Lexical structure like Argus?
- get_value/set_value	Access/change the contained object
- pause			Not needed if we have condition variables

Condition Variables
-------------------
Since condition variables are usually used in association with a
mutex, it may be convenient to package up both the mutex, and the
condition variable's state (set of threads) together.  Therefore,
condition variables should provide at least the following operations:

Operations:
- seize/release
- wait
- signal		Any fairness guarantees?
- broadcast

The rationale for making condition variables a part of the language
is that they are very useful,  and it seems hard to come up with a
language level implementation for condition variables that will both be
efficient and portable between shared memory and message passing machines.

Promises
--------
Promise[T] is a future for a value of type T.  A promise[T] is created
by making an asynchronous call on a procedure that returns T.

Operations:
- get_value		Wait until ready and then return value
- ready			Return true iff value is ready

Stream wrappers (for ordering asynchronous calls)
---------------
A stream[T] provides a mechanism for delivering messages in order
to the contained object of type T.  A stream[T] should provide all
operations that T provides.  However, the return types are all
promises:
  If T provides foo: T1 -> T2, stream[T] provides foo: T1 -> promise[T2]

Creators:
- create

Do we need more operations?  The following may be useful:
- sync

Read-Write locks
----------------
We can easily build read-write locks out of mutexes.
The questions are:
- Can we provide a better interface for read/write locks
  by explicit support in the language?

- Can read-write locks be implemented more efficiently than by
  using mutexes?  For example, the kernel may implement some fancy
  locking protocol that uses special hardware capabilities.

If we decide not to make read-write locks a part of the language,  they
should be one of the first types added to the language library because
they will probably be very useful.

Atomicity of normal types (arrays, etc.)
----------------------------------------
- If normal base types are atomic, then every use of these types has
  to pay the penalty of acquiring the appropriate locks (global
  compiler analysis may alleviate this problem slightly).

- If the normal base types are not atomic, then the programmer will
  have to build atomic versions by combining these types with
  synchronization types (mutexes for example).  Our parameterization
  ability will be a big help here.

Parameterized types
-------------------
- What about where clauses?  They may turn out to be very useful.  In
  any case, they are probably not too hard to implement, and even if
  they are not part of the initial language, they should be one of the
  first few things added to it.

Base types  (CLU/Argus types)
----------
int
real
bool
char
null
any  (?)
array
string
record
variant

node?  We might want some notion of node ids, rather than
just passing integers.

Do we want immutable versions (sequence, struct, oneof)?

We need some type that captures handler instances (x.foo, or
bind foo(x,*, *, ...), depending on our syntax) as well as
free-standing procedures, such as proctype.
We will also need creatortype.

Do we have itertype?

Input/Output
------------
We probably don't need istreams for now.  However, we will
probably want to read/write text files and perform terminal
I/O.

streams

What is an object?
------------------
Are messages, types, or threads objects?  If they are, they must
be base types.  We should probably say no on all three for now.


#############################################################################

To: psg@neutron.lcs.mit.edu
Cc: wang@simpleton.lcs.mit.edu
Subject: Invocation Mechanisms
Date: Thu, 08 Nov 90 20:05:51 -0500
From: carl@meson.lcs.mit.edu


Authors:	Paul Wang
		Carl Waldspurger

We haven't developed anything that's very new or interesting,
primarily because we are not dealing with location-related invocation
options (e.g. Emerald call-by-move) in Feeble.

INVOCATION MECHANISMS:

(1) Blocking calls and sends:

	% y has type t
	y: t	

	% BLOCKING CALL
	% Method foo (which returns a t) is invoked on receiver object x.
        % The caller waits for the reply, which is assigned to y.
	y := x.foo(arg1, arg2, ..., argN) 
        
	% BLOCKING SEND
	% Method foo (which returns a t) is invoked on receiver object x.
	% The caller waits for the reply, which is ignored.
        % (The implementation may choose to simply return an ACK.)
	x.foo(arg1, arg2, ..., argN)	


(2) Non-blocking calls and sends:

	% y has type promise[t].  A promise is a typed "future"
        % (including exceptions) that must be explicitly claimed.
	% At the claim point (e.g. y.claim()), the promise will either
        % return an object of type t or raise an exception defined in
	% the promise type.
	y: promise[t]

	% NON-BLOCKING CALL
	% Method foo (which returns a t) is invoked on receiver object x.
	% The callee immediately receives a promise[t], and does not
	% wait for the method to be invoked or completed.  Conceptually
	% a new thread of control is "forked" which asynchronously 
        % performs the call.
	y := fork x.foo(arg1, arg2, ..., argN)

	% NON-BLOCKING SEND
	% Method foo (which returns a t) is invoked on receiver object x.
	% The caller does NOT wait for the reply, which is ignored.
        % (The implementation may choose to simply avoid any reply.)
	% Conceptually a new thread of control is "forked" which
        % asynchronously performs the send.
	fork x.foo(arg1, arg2, ..., argN)

(3) Nested calls and sends:

	% We may want to guarantee that "call-thrus" avoid unnecessary
	% messages (e.g. like Scheme guarantees tail recursion avoids
	% unnecessary stack frames).  This behavior is often explicitly
	% coded in actor languages using the actor notions of
	% "customer" and "reply-keyword".

	y: promise[t]
	
	% The reply value generated by invoking foo on receiver object x
	% is sent directly as an argument to the invocation of method bar
	% on receiver z.  The intermediate result need not be sent back
	% to the callee.  This may simply be an implementation optimization
	% or we may want to guarantee it in the language semantics (?).

	y := fork z.bar(x.foo(arg1, arg2, ..., argN))

(4) Pipe calls:

	% Pipes are abstractions which perform ordering.  They are 
	% identical to the "stream" objects we talked about in
        % meetings (and are elaborated in PSG design note #4 (draft)
	% by Wilson and Bill).  (The name was changed to protect the
	% innocent, and to avoid overloading the already-overloaded
	% term "stream").

	% Create a pipe for ordering messages to x.  The signature
	% of p is the same as the signature for x, except that all
	% the return types become the corresponding promise types.
	p: pipe[t] := pipe(x)
	
	% Method foo (which returns a t) is synchronously invoked on
        % receiver object x through pipe p. The callee immediately
        % receives a promise[t], and does not wait for the method
        % to be invoked or completed.  The pipe abstraction guarantees
	% that subsequent calls placed on pipe x will appear to execute
	% AFTER prior calls have completed.

	y: promise[t]

	% Special syntax is used ("pipe") for two reasons: 
	% (1) There may be special operations (e.g. p.sync()) that
	%     we want to execute directly on the pipe object itself.
	%     Simple using "y := p.foo(...)" could present a name clash.
	% (2) It is plainly clear that a "pipe" call is being made.
	%     Bill objected to the implicitness of simply using
	%     "y := p.foo(...)" although we don't mind.

	y := pipe p.foo(arg1, arg2, ..., argN)	

	
(5) Order of evaluation:

	In the method invocation: x.foo(arg1, arg2, ..., argN),
	the receiver object is first evaluated ("x"), then
	the arguments are SEQUENTIALLY evaluated in an arbitrary (i.e.
        up to the compiler, as in Scheme) order, then finally the
	message invoking the method ("foo") is sent.

	We decided not to evaulate the arguments in parallel for 
	Feeble, since this can be done manually in a straightforward
	way, such as:

		val1 := fork arg1
		val2 := fork arg2
		...
		valN := fork argN

		x.foo(val1.claim(), val2.claim(), ..., valN.claim())

	In the future, we may want to provide some suitably convenient
	syntactic sugar if this is common.


#############################################################################

To: psg@neutron.lcs.mit.edu
Subject: Notes on concurrency structures
Date: Fri, 09 Nov 90 13:57:37 -0500
From: dellaro@muon.LCS.MIT.EDU

Authors: Chris Dellarocas
	 Qin Huang

Concurrency Structures
======================

Concurrency in our language will be achieved by means of the following :

1. Synchronous and asynchronous method invocations
2. Parallel block statements (cobegin/coend)
3. Parallel iterators

Placement of threads is an important issue in getting concurrency, i.e., 
we would like to distribute threads among different processors. 

1. Presumably, method invocations are executed on the processor where 
   the invoked object resides. We may have call-by-move, etc. later on. 
2. It is harder to decide where to put threads in the cases of cobegin/coend
   and parallel iterators.  We argue that there must be some distributed
   data structures underline the cobegin/coend and parallel iterators 
   to make the full use of them.


Method invocations
==================

We will have both synchronous and asynchronous method invocations.
A synchronous call has the syntax:

	y := x.foo(a1,...,an);

or
	x.foo(a1,...,an);

depending on whether we want a result from method foo. An asynchronous 
method invocation is qualified by a keyword such as fork, async, future.

	p := async x.foo(a1,...,an);

If x.foo returns an object of type T, then async x.foo returns an object
of type promise[T]. In our view, promises are a nice way to integrate the
notion of threads in our object-oriented language. Promise is an abstract
data type which has a number of defined methods like:

- join 		essential to be able to get back the result of a method
		if p::promise[T]  then p.join::T

- kill		to be able to kill unnecessary threads - useful for
		speculative concurrency

- suspend	???
  resume

Comments on the stream data type (Adrian, Sanjay & Wilson)
=========================================================

We have some objections to the stream data type, as proposed by AS&W.
Suppose we have an object x and we want
to make an asynchronous call to one of its defined methods, foo.
Then, using the stream data type, we can 

(1) create a stream of the same type as x. From what we understood, streams
are "tied" to specific objects of that type, which are passed as arguments
to the stream creator. Suppose that the resulting stream is called s.

(2) invoke method foo on stream s

	p := s.foo(a1,..., an);

This is equivalent to the async call presented above, but rather confusing
since we now have two different names to (ultimately) refer to the same 
object. We definitely prefer explicit asynchronous calls.


Parallel Block Constructs
=========================

By default statements in a method will be executed sequentially.
We propose to provide two parallel block constructs. One for
statements that can be executed in parallel and one for statements
that MUST be executed in parallel (for correctness). Possible 
syntax could be

cobegin				% can be parallel	
  block of statements
coend


parbegin			% must be parallel
  block of statements
parend

A question that arises is, when do these constructs exit ? Do we wait for
completion of all statements in the block (implicit barrier) or not ?
Do we need separate variants to cover all cases ?

Parallel Iterators
==================


Parallel iterators will be analogous to parallel block constructs.

cofor (sequential iterator syntax)
  statement
end

Same questions about termination (barrier or not) remain. Placement is an
issue in this simple language version. Since methods are executed where
their objects reside, a parallel iterator only makes sense if each iteration
invokes a method on a different object. We think that we must include
some kind of distributed objects in our language very soon !

Group Promises
==============

While discussing possible parallel iteration constructs, Chris came up with 
the following idea. On many occasions,
programmers may want to asynchronously start a number of methods and then
treat the set of promises thus created as a single entity. 
For example, they may want to join with any of the methods (whichever
finishes first) and then kill all other methods, or join with some
methods and then wait for the rest to finish. 
We propose a new abstract data type which captures the idea of a group of
promises ( a group of threads ) and a statement that asynchronously 
starts a number of methods and returns a single identifier for the group. 

The abstract type is promise-group[T]. This type is an abstraction for
a number of concurrently executing methods that return type T. Each 
member of the group can be referenced by an integer index i running from
0 to the size of the group. A number of methods can be invoked on an 
object of type promise-group[T] :

- join-any         Wait until any member of the group has completed.
                   Return the result of the first member that does.

- join             Selectively join with a speicified member of the group.

- kill-all         Kill all members of the group that have not finished.

- kill             Kill a specified member of the group

- wait-all         Wait until all members of the group have completed
                   Return their total number

- groupsize        Return total number of members in the group

- finished         Return number of members that have finished

- working          Return number of members that are still executing


The statement is a variant of a simple for loop

 gp :: promise-group[T]

 gp := group from i := 0 to <groupsize-1> do
         statement
       endgroup

Group starts <statement> for all indices from 0 to <groupsize-1> and
immediately returns a group of promises in gp.

Eventually we should come up with more general versions of the group 
statement, providing more general ways to index the members of a group.


#############################################################################

To: psg@neutron.lcs.mit.edu
Cc: weihl@photon.lcs.mit.edu
Subject: issues raised at today's meeting 
Message-Id: <CMM.0.88.659921374.weihl@photon.LCS.MIT.EDU>

Here's the list of issues from today's meeting.  Please think
about them for next week, and also go through the draft notes
on the language design to see if there are any other issues
that should be raised.

1.  Syntax for pipe calls.

	Bob raised the issue of needing to distinguish between
	"pipetypes" and other types in where clauses if pipe calls
	involve special syntax.  If we eliminate the special syntax,
	we need some mechanism for dealing with possible name clashes
	between operations on a pipe itself and operations on the
	target object of a pipe.  There are (at least) 5 possible
	options for dealing with the name clashes:

		1.  Don't have any operations on the pipe itself.
		(Disadvantage:  "synch" is inconvenient.)

		2.  Have special syntax for calling target operations
		through a pipe.  This suffers from the problems raised
		by Bob.

		3.  Have special syntax for calling operations on
		the pipe itself.  (E.g., pipe s.synch().)  Disadvantage:
		this also suffers from the problems raised by Bob, unless
		we simply decide that this syntax can't be used for
		calling pipe operations.

		4.  Have no special syntax -- but allow the instantiation
		pipe[T] only if T has no operations with the same name as
		one of the pipe operations.  Disadvantage:  if there are
		very many pipe operations, this introduces a lot of operation
		names that have to be avoided.  This also makes it more
		painful for us to add more pipe operations later.

		5.  Have no special syntax -- and have a single pipe
		operation that extracts the underlying pipe object, which
		has some other type (e.g., real_pipe) that provides the
		desired operations such as synch.  This is similar to #4
		above, except that there is only a single operation name
		that can cause a conflict -- and we can easily add new
		pipe operations to the real_pipe type.

2.  If we eliminate modules from the initial language, what replaces
them?  Do we allow separate compilation?  If so, how does it work?
and what are the rules for type-checking across compilation units?

3.  Is there any special syntax for "lock statements" for mutexes?

4.  Is there any exception handling mechanism?  (I think we decided
to eliminate this initially.)

5.  To generalize #3, what about a "try ... finally ..." statement?

6.  How about goto?  (I vote NO.)

7.  What are the details of the cobegin/parbegin/cofor mechanism?

8.  Can arms of a cobegin exit other than by falling off the end?
If so, what is the semantics?

9.  Can threads be killed by other threads?  (e.g., as in "race"
for speculative concurrency.)

10.  If we don't have modules (with static variables in modules),
is there any mechanism for static variables (either globals, or
local (own) statics)?

11.  We need to go over the details of object creation.

12.  What about grouping calls on a pipe into "atomic batches"?
(I think we decided to defer this and similar elaborations of the
basic mechanism for now.)

Bill


#############################################################################

Date: Wed, 5 Dec 1990 17:56:01 EST
From: William E. Weihl <weihl@photon.lcs.mit.edu>
Reply-To: weihl@photon.lcs.mit.edu
To: psg@neutron.lcs.mit.edu
Cc: weihl@photon.lcs.mit.edu
Subject: meeting
Message-Id: <CMM.0.88.660437761.weihl@photon.LCS.MIT.EDU>

Here are my initial thoughts on the issues
raised last week; we can discuss them at
tomorrow's meeting.

Bill

1.  For pipes, I think the best option may be the fifth one
listed in my message from last week -- i.e., have no special
syntax for pipe calls, and have a single operation on pipes
that extracts the underlying pipe object; operations such
as synch are handled by the underlying pipe.  I.e., the
type pipe[T] has all the operations of objects of type
T (with return types changed to promises), plus an operation
"get_pipe" that returns an object of type pipe.  The type
pipe has an operation "synch", plus possibly some others.
The instantiation pipe[T] is legal only if T has no operation
named "get_pipe".

2.  To simplify our initial implementation effort, I propose
that we not worry about separate compilation initially.  Designing
and building mechanisms to support separate compilation are
activities that are unrelated to the research we're doing; if
we can manage without it for awhile we'll have more useful
results sooner.

3.  I think we should not bother with special syntax for mutexes
in the initial language.

4.  To keep things simple, I suggest we ignore exception handling
mechanisms for now.

5.  I also think we shouldn't include a "try...finally..." statement.

6.  No goto.

7.  Here's a suggested design for cobegin/etc.  We provide a
compound statement "cobegin" that takes a number of arms.  Each
arm can be either static or dynamic.  A static arm is a single
statement (or body).  A dynamic arm is a statement, together with
a specification (as in a for loop) of the number of instances to
create.  The semantics is that one thread is created for each static
arm, and one for each instance of a dynamic arm.  We will provide
annotations to allow the programmer to indicate whether or not
the system must be prepared to run all the threads concurrently.
(Similar annotations are required for "fork".)

8.  An arm of a cobegin is not allowed to transfer control outside
the cobegin.  Control leaves the cobegin after all arms have terminated.

9.  I think we should defer providing mechanisms for threads
to terminate each other until the next version of the language.

10.  We should provide both global and local static variables.
Local static variables can be declared in the outermost scope
of a type implementation or routine (similarly to CLU).

11.  We should talk about creation at the meeting.

12.  We should defer enhancements to the pipe ordering
mechanisms until later.

Flames welcome.


#############################################################################

Date: Thu, 6 Dec 1990 11:04:43 EST
From: William E. Weihl <weihl@photon.lcs.mit.edu>
Reply-To: weihl@photon.lcs.mit.edu
To: psg@neutron.lcs.mit.edu
Cc: weihl@photon.lcs.mit.edu
Subject: annotations
Message-Id: <CMM.0.88.660499483.weihl@photon.LCS.MIT.EDU>

Below is an outline of some issues related to annotations
for controlling placement, migration, and granularity.  One
clear omission is that I haven't thought about ``distributed
objects'' in writing this.  Comments and discussion are
welcome.  If we have time, we can discuss this at today's
meeting, so please read it (and try to find time to think
about it if you can) before then.

Bill
**************************************************

We need annotations for controlling:

	object placement and migration

	thread placement and migration

	thread granularity and scheduling


1.  Object placement and migration

  A. initial placement could be described

	i. on declaration of variable
	   - but variables are assigned both newly created objects
	     and existing objects -- so this might not be useful.

	ii. on creation of object
	   - this seems like the right time to do it.  The problem
	     with this is that it could be very difficult to know
	     the right place to create an object.  E.g., suppose
	     I have a routine that happens to create some objects;
	     it could be that the appropriate places to create them
	     depend on how the routine is called -- so the placement
	     decisions need to depend on the routine's arguments.
	     In general, an abstraction that creates objects might
	     need information from the point of call about where
	     the objects should be created.

    Can we develop simple annotations that don't require passing
    node ids as explicit arguments of creation operations?  What
    if creation of an object requires creating subobjects?  Where
    are they created?  And how does the creator of the outer object
    indicate this?   In general, when creating an object, one would
    like to be able to specify how its pieces should be spread
    around the machine; this requires some sort of ``language''
    for describing the pieces and where to put them.  It's easy
    to design a specialized language for a few types like arrays;
    what about more general types?

  B. migration

	i. explicit ``move'' instructions.

	ii. as part of a call (because it may be cheaper to move
	    an object in the same message as a call than to use
	    a separate message).

		a. call-by-move for arguments.

		b. data-shipping (i.e., bringing target of call to
		   the node where the calling thread resides -- or
		   in general to the node where the call will run).

	iii. by attachment to other objects (including threads)

		a. indicated by declaration (e.g., an instance
		   variable has the attribute ``attached'', indicating
		   that the subobject should be colocated with the
		   containing object; or a local variable in a thread
		   has the attribute ``attached'', indicating that
		   the object denoted by that variable should be
		   colocated with the thread)

		b. indicated by explicit attach and unattach operations.

	iv. other operations?  e.g., ``fix'' as in Emerald?

2. Thread migration

  A. by explicit move instructions (move self to node X, or move
     some other thread).

  B. as part of a (sequential) call:

	i. where to run the call

		a. at thread's current node.

		b. at node where called object resides (for calls
		   to methods on objects)

		c. at a third node (other than the calling thread's
		   and the called object's)

	ii. how to run the call

		a. RPC

		b. continuation-passing (i.e., move part of thread's
		   stack to desired node, so when the call returns the
		   thread continues running there rather than returning
		   immediately to the original calling node) -- need
		   ways of describing how much of thread is moved (e.g.,
		   top frame, top n frames, partial frames)

3. Thread placement and granularity

  A. granularity

	i. information about may/must fork/cobegin
	ii. information to assist in aggregation of subtasks (for ``may''
	    fork) -- i.e., which forked calls should be run inline,
	    which arms of a cobegin should be aggregated together.

  B. placement -- where to run a new thread.

	i. at calling thread's node. (usually a bad idea?)

	ii. at called object's node (for calls to methods).

	iii. at some third node.

	iv. at a lightly loaded node chosen by the run-time system.

4. Scheduling

  It's very unclear to me right now what kinds of control are
  needed or possible over scheduling.  We need to discuss this
  more.

A general note: annotations about actions to be taken on a
call could be given either in the called routine's interface
or as part of the call.


#############################################################################

To: psg@neutron.lcs.mit.edu
Subject: invocation
Date: Mon, 25 Feb 91 14:44:16 -0500
From: brewer@muon.LCS.MIT.EDU

Notes on invocation:

1) Converting OIDs to local addresses

We will use a hash table that converts OIDs into local addresses.  An
OID will have an entry exactly when there is local storage for it --
either the object itself or a surrogate.  Entries are deleted when the
corresponding storage is deallocated.

Objects created locally are not assigned OIDs until a reference to
them is exported.  The id 0 denotes objects without OIDs.  After an
OID is assigned the corresponding entry is added to the hash table.

There are two uses for the hash table.  The primary use is to convert
OIDs to local addresses when unmarshalling a message.  The second use
is to help locate an object during a location request to the
birthnode.  The birthnode first looks in the hash table.  If there is
an entry (the normal case), then the location information is at the
local address (at either the object or the surrogate).  If there is
not an entry, a second data structure is referenced to find the
location.  In this case, the hash table serves only to reduce the data
required in the second data structure.  The second structure has an
entry for each non-resident object without a local surrogate that was
created on this processor.

The hash table also serves as a list of all the objects (with OIDs)
known to the processor.  This should be useful during gc, and perhaps
gc info should be stored in the hash table. (Note that everything in
the hash table has been exported.)


2) Invoking a incoming message (remote -> local)

	1) Find the object using hash table
	   If not here at all, goto 7
           If here only as surrogate, goto 5

	2) (object is here)
	   Unmarshall/translate message

	3) Invoke local procedure (the method)

	4) return reply if expected (DONE)
       ---
	5) (object not here, but have surrogate)
	   Get location from surrogate

	6) Send REJECT message to OS at sender including rejected message
	   and location hint
       ---
	7) (object not here and no surrogate)
	   set location hint to NULL
	   goto 6

3) Local -> local

	1) Find method.  The method address is a function of three things:
		ma = f(method_name, class, instance)
	   The instance reveals whether to use the normal method or the
	   surrogate method.  Alternatives:

	 	a) Use link-time assigned numbers for methods, with the normal
	  	   version and the surrogate version numbered consecutively.
		   To do invocation:
			1) Using object address, look up offset located at
			   the object.  The offset is 0 if local, 1 if
			   surrogate. (1 cycle, 1 mem access)
		        2) base pointer = method_table + offset (1 cycle)
			3) jump indirect through METHOD(base pointer)
			   (1 cycle, 1 mem access)
		   Total: 3 cycles, 2 mem accesses

		   A variation of this keeps (method_table + offset) at the
		   instance.  This requires 32 bits of storage instead of 1
		   bit (although the "bit" may use a byte for speed of access),
		   but eliminates step 2. If were going to use 32-bits per
		   instance, alternative b makes more sense.

		b) Eliminate the table.  Each class has a method table that
	           contains all of the normal methods followed by all of the
		   surrogate methods.  The offset field (which is 32 bits)
		   holds the address of either the start of the normal methods
	   	   or the start of the surrogate methods.  Methods are
		   named by the (compile-time constant) offset into these
		   tables.  To do invocation:
			1) Get base pointer (1 cycle, 1 mem access)
			2) jump indirect through METHOD(base pointer)
			   (1 cycle, 1 mem access)
		   Total: 2 cycles, 2 mem accesses

		   This scheme enlarges object space requirements, but
		   eliminates a global table, allowing the OS to allocate
		   table space only for the classes that are resident.
		   (Both allow code caching, although we must discuss the
		   procedure return mechanism and its effects on a code cache.)

	   In both versions, the method tables must be resident at all times.

	   If we go with version b, we should be able to eliminate the class
	   pointer for instances, since we can determine the class based on
	   method table pointer if we're careful.  Elimination of the class
	   pointer brings the per-instance space overhead (due to invocation)
	   of version b to zero; we have replaced the class pointer with the
	   method table pointer, and everything else should be the same.

	2) local procedure call


4) Local -> Remote

	1) Find the method as above; it will be the surrogate method.

	2) call surrogate procedure
		a) allocate message space
		b) set up header
		c) marshall arguments
		d) send message


Obviously, I expect there will be some discussion of the above.

Eric, Adrian, and Carl


#############################################################################

To: psg@neutron.lcs.mit.edu
Subject: The "Dot" Mess & Modula3
Date: Tue, 14 May 91 17:17:47 -0400
From: carl@meson.lcs.mit.edu


Modula3 avoids the mess we've encountered by specifying that the
target object in a method call is the first argument to that method.
Thus, to call a method "method" on an object "obj" of type "Foo",
one would write:

	Foo.method(obj, arg1, arg2, ...)

Note the similarity to CLU syntax: 

	Foo$method(obj, arg1, arg2, ...).


(Thankfully) There seems to be a consensus among us that this
CLU-style listing of the type for every method invocation is
too verbose.

Please send constructive ideas to me, and flames to /dev/null.

		- Carl


#############################################################################

To: psg@neutron.lcs.mit.edu
Subject: Dots...
Date: Thu, 23 May 91 14:27:39 -0400
From: carl@meson.lcs.mit.edu


BOTTOM LINE:

	A simple change to our "agreement" that should be acceptable
	to everyone.  Flame directly to me, cc: psg@neutron.

ISSUE:

	Wilson has made a very convincing case that we should not
	have two distinct basic entities in our language -- namely,
	objects and records.  I agree, and it is my understanding 
	that everyone Wilson has talked to also agrees.

THE FIX:

	(1) We still punt on providing record get/set sugar for
	    user-defined objects.

	(2) We still provide record get/set syntax for record types.
	    Records are still special in that they are the only objects
	    for which this sugar applies.

	(3) The only real change is that we think of records as
	    objects that have get/set methods that can be either 
	    directly accessed -- like ordinary methods "foo.get_x()" --
	    or accessed via the sugar -- "foo.x".

	    Aside from a mental viewpoint shift, the only change is
	    that we now support direct access of get/set methods for
	    records.  Thus, we could also pass foo.get_x as a
	    proctype, just as we could pass a method for any other
            object.

Provided that I don't hear (m)any objections, and that Bill doesn't
disagree (and also put on his dictator hat), then the above "fix" is
what I will add to the Prelude language description.

		- Carl