Note: the following is a description of the low-level "assemble-cell"
mechanism, which has been superseded by the higher-level
"assemble-fields" mechanism.  This new mechanism is described in
"basic.list" and in the source file "step-assm.fth".  For examples,
see "/cam8/demos/newest/diffuse.exp" and "/cam8/demos/smooth/5avg.exp".


ASSEMBLE-CELL: CREATING MODELS WITH LARGE CELLS

On CAM-8, table lookup updates 16 bits at a time.  If you want to
construct a model with more than 16 bits at each site, you must
synthesize the full update out of a sequence of 16-bit updates.

If, for example, we want a 512x512 space with 32 bits at each site, we
still declare the space to be 512x512 as usual, but use two COPIES of
the space.  Every updating scan of the space covers 512x512 16-bit
values, but each of the 16 bits can be taken either from one INSTANCE
of the space or from the other.  The choice of which 16 bits to use
can be made differently for each scan of the space, as can the choice
of the lookup table to apply at each scan.

If, in our example, we wanted to make the second instance of the space
contain a duplicate of the data contained in the first instance, we
could do it as follows:

   1) Download a lookup table that copies bits 0--7 into bits 8--15.
      Select bits 0--7 from the first instance, bits 8--15 from the second.
      Start an updating scan of the space.

   2) Download a lookup table that copies bits 8--15 into bits 0--7.
      Select bits 8--15 from the first instance, bits 0--7 from the second.
      Start an updating scan of the space.

   3) Download a lookup table that swaps bits 0--7 with bits 8--15.
      Select bits 0--15 from the second instance of the space.
      Start an updating scan of the space.

Note that the lookup table for one scan can be downloaded while the
previous scan runs; in the algorithm above, we take advantage of
this by downloading the table for each step immediately after
starting the previous step's updating scan.  Note also that if we
didn't care about the bit-order of the duplicate data, then we
wouldn't need step (3) above.
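
The three steps above can be checked with a small host-side model.
The following Python sketch is not CAM-8 code: the names "scan",
"duplicate", LO and HI are illustrative assumptions, each instance of
the space is modelled as a plain list of 16-bit values, and a lookup
table is modelled as a function on one 16-bit value.

```python
LO = 0x00FF   # bits 0--7
HI = 0xFF00   # bits 8--15

def copy_up(v):     # table for step 1: copy bits 0--7 into bits 8--15
    return (v & LO) | ((v & LO) << 8)

def copy_down(v):   # table for step 2: copy bits 8--15 into bits 0--7
    return (v & HI) | ((v & HI) >> 8)

def swap_bytes(v):  # table for step 3: exchange the two bytes
    return ((v & LO) << 8) | ((v & HI) >> 8)

def scan(inst, table, mask1):
    """One updating scan over all sites (hypothetical model).

    mask1 marks the bits taken from instance 1; the remaining bits come
    from instance 0.  Each updated bit is written back to the instance
    it was taken from.
    """
    mask0 = 0xFFFF & ~mask1
    for i in range(len(inst[0])):
        v = table((inst[0][i] & mask0) | (inst[1][i] & mask1))
        inst[0][i] = (inst[0][i] & mask1) | (v & mask0)
        inst[1][i] = (inst[1][i] & mask0) | (v & mask1)

def duplicate(inst):
    scan(inst, copy_up, HI)           # step 1: lo from 0, hi from 1
    scan(inst, copy_down, LO)         # step 2: hi from 0, lo from 1
    scan(inst, swap_bytes, LO | HI)   # step 3: all bits from 1

inst = [[0x1234, 0xBEEF, 0x00FF], [0, 0, 0]]
duplicate(inst)
```

In this model the first instance is left unchanged and the second
ends up holding the same values; dropping step 3 leaves the second
instance with its two bytes exchanged.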


More generally, if we have N instances of our space, the maximum size
of each instance is proportionately reduced.  Each full-cell of our
space consists of N 16-bit elements, one from each instance, that we
call SUBCELLs.  Each of the 16 bits involved in one scan of the space
can be freely chosen from any subcell.  Once chosen, the 16 active
bit-fields may be shifted as desired using the normal kicking
mechanism, and may be updated by table lookup.
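
As a rough illustration of the general case, the following Python
sketch gathers 16 active bits from N subcells at one site and writes
an updated value back.  The functions "assemble" and "scatter" and the
per-bit "source" list are hypothetical names used only for this
model; kicking is not modelled here.

```python
def assemble(subcells, source):
    """Gather one 16-bit value; source[b] names the subcell supplying bit b."""
    value = 0
    for b in range(16):
        value |= ((subcells[source[b]] >> b) & 1) << b
    return value

def scatter(subcells, source, value):
    """Write each bit of value back to the subcell it was taken from."""
    out = list(subcells)
    for b in range(16):
        bit = 1 << b
        out[source[b]] = (out[source[b]] & ~bit) | (value & bit)
    return out

# A full cell made of N = 3 subcells (48 bits per site).
cell = [0x000F, 0x00F0, 0x0F00]
# Bits 0--3 from subcell 0, bits 4--7 from subcell 1, the rest from subcell 2.
source = [0] * 4 + [1] * 4 + [2] * 8
active = assemble(cell, source)
# Stand-in for a table lookup: complement all 16 active bits.
updated = scatter(cell, source, active ^ 0xFFFF)
```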


ASSEMBLE-CELL: A CODE EXAMPLE

Here is a complete code example that implements the algorithm
described above.

       new-experiment  512 by 512 space
       
       0  7 == cell-lo
       8 15 == cell-hi
       
       0 subcell:
       
       0  7 == cell0-lo
       8 15 == cell0-hi
       
       1 subcell:
       
       0  7 == cell1-lo
       8 15 == cell1-hi
       
       : lo-to-hi    cell-lo -> cell-hi ;
       : hi-to-lo    cell-hi -> cell-lo ;
       : xchng-lo+hi lo-to-hi  hi-to-lo ;
       
       create-lut copy-up     rule>table lo-to-hi    copy-up
       create-lut copy-down   rule>table hi-to-lo    copy-down
       create-lut swap-bytes  rule>table xchng-lo+hi swap-bytes
       
       define-step  copy-step
       
       	 lut-data	copy-up
       	 assemble-cell	cell0-hi field remove
       			cell1-hi field add
       	 kick
       	 run		new-table
       
       	 lut-data	copy-down
       	 assemble-cell	cell1-hi field remove
       			cell0-lo field remove
       			cell0-hi field add
       			cell1-lo field add
       	 run		new-table
       
       	 lut-data	swap-bytes
       	 assemble-cell	cell0-hi field remove
       			cell1-hi field add
       	 run		new-table
       
       	 assemble-cell	cell1-lo field remove
       			cell1-hi field remove
       			cell0-lo field add
       			cell0-hi field add
       end-step

We begin as usual with "new-experiment", to clear the machine and the
software to a standard state.  We then specify the size of the space.
Next we define names for fields in various subcells that we will be
using.  Once we specify the size of the space, all remaining memory is
available for use as extra instances of the space; we don't have to
explicitly declare the number of subcells that constitute a full cell.

After making the three lookup tables that we'll need, we're ready to
define our copy step.  In this step, we load a table, assemble the
bits we need, and run a scan, repeating this three times to complete
our algorithm.  The syntax and operation of "assemble-cell" will be
described in detail in the next sections; here simply note that we are
selecting for each bit-field which subcell the bit-field should come
from.  We initially assume that all bits come from subcell 0, and each
assemble-cell command removes the bits that we don't want, and
substitutes bits that we do want.  At the end of the step, we make
sure that we're looking only at bits from subcell 0 again, so that our
assumption at the beginning of this step (or any other step that
follows this convention) will be valid.

Note also that before the first "run" we perform a kick of magnitude
zero.  Leaving out the kick before the run would not have the same
effect as a kick of magnitude zero: it's important that the kick
register actually contain zero during a scan if you want a kick of
zero.  (Notice that we don't need to specify a kick of zero for the
other two runs, since the kick register is already zero.)


ASSEMBLE-CELL: THEORY OF OPERATION

The "assemble-cell" pseudo-instruction compiles into one or more
instructions that affect the offset register.  Everything it does is
implemented by manipulating offsets.  If you use assemble-cell as your
interface with the offset register, you shouldn't directly manipulate
offsets.

Part of the action of kicks is to increment the offset register --
there is a separate offset for each of the 16 bit-fields in the
hardware.  This change of offset results in a data shift: the offset
is added to a logical site-address in computing the physical memory
location where the site-data is actually stored.
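
A one-dimensional host-side sketch of how a change of offset appears
as a data shift.  The names "read_field" and SIZE, and the wraparound
modulo the size of the space, are assumptions made for illustration;
only a single bit-field is modelled.

```python
SIZE = 8                      # sites in this toy one-dimensional space
memory = list(range(SIZE))    # physical storage of one bit-field

def read_field(memory, offset):
    """Return the field as seen at each logical site address.

    The offset is added to the logical address to find the physical
    location; the stored data itself never moves.
    """
    return [memory[(site + offset) % SIZE] for site in range(SIZE)]

unshifted = read_field(memory, 0)
shifted = read_field(memory, 3)   # after kicks have added 3 to the offset
```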

If we want to have more than 16 independently shiftable bit-fields, we
need more than 16 offset registers.  All of the offsets for the extra
bits are kept in the memory of the host workstation.  As we remove
bit-fields and substitute others for them, we remove the offsets
associated with the original bit-fields, swapping in the offsets
associated with the new ones.  This is the first action of
"assemble-cell".

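The offset swap can be pictured with the following Python sketch.
The data layout is an assumption made for illustration: the 16
hardware offsets are a list, and the offsets of inactive bit-fields
are kept in a dictionary keyed by field name.  The function name
"swap_field" is hypothetical.

```python
def swap_field(registers, saved, out_field, in_field, bits):
    """Park out_field's offsets in host memory and load in_field's."""
    saved[out_field] = [registers[b] for b in bits]
    for b, offset in zip(bits, saved.pop(in_field)):
        registers[b] = offset

registers = [0] * 16                  # the 16 hardware offsets
saved = {"cell1-hi": [5] * 8}         # offsets held in host memory
hi_bits = range(8, 16)                # bit positions 8--15
swap_field(registers, saved, "cell0-hi", "cell1-hi", hi_bits)
```

After the call, bit positions 8--15 carry the offsets that were saved
for "cell1-hi", and the offsets they held before are stored under
"cell0-hi".
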
The second action of assemble-cell is to choose which instance of the
space a given bit-field comes from.  This is done by a kind of
"bank-switching".  If the portion of the space contained in one module
(called a SECTOR) is of size 2^n, then the low-order n bits of the
site-address and of the offset have to do with where you are within
the space.  The remaining high-order bits have to do with which
instance of the space you are dealing with.  All offsets that refer to
bit-fields within, say, instance number 3 of your space would have
their high-order bits set to 3.  If such an offset is added to the
site-address, it will result in a pointer to a location within the
memory associated with instance number 3.  Thus as we swap in a new
offset for a given bit-field, we also swap in a pointer to the
instance of the space that this offset corresponds to.
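
The address arithmetic can be sketched as follows.  This is a toy
model with n = 4; the names "make_offset" and "physical_address" are
hypothetical, and we assume here that the low-order n bits wrap around
within the space without carrying into the instance bits.

```python
N_BITS = 4                    # toy sector of 2^4 = 16 sites
SECTOR = 1 << N_BITS

def make_offset(instance, shift):
    """Pack an instance number above an n-bit shift within the space."""
    return (instance << N_BITS) | (shift % SECTOR)

def physical_address(site, offset):
    """Add the offset to the site address.

    Assumption: the low n bits are added modulo 2^n, and the
    high-order bits of the offset select the instance unchanged.
    """
    low = (site + offset) % SECTOR
    high = (offset >> N_BITS) << N_BITS
    return high | low

# Last site of the space, shifted by 2, in instance number 3.
addr = physical_address(SECTOR - 1, make_offset(3, 2))
```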

Note that, in hardware, each module has its own set of 16 offset
registers.  We assume that all modules have been kicked identically,
so that this set of registers will be identical in all of them: in
software we only keep track of one set of offset registers for the
whole machine.


ASSEMBLE-CELL: SYNTAX AND USAGE

As with other CAM instructions, executing "assemble-cell" puts us in
an instruction-context in which an extra vocabulary is activated.
This context is ended when the next instruction is executed, which
triggers the compilation of an "assemble-cell" step-list entry based
on the parameter settings present at the end of the context.

An example of the usage of assemble-cell:

       	 assemble-cell	cell1-hi field remove
       			cell0-lo field remove
       			cell0-hi field add
       			cell1-lo field add

The assemble-cell-specific words "remove" and "add" create lists of
bit-fields to swap out or swap in when assembling the 16 bits to which
the next scan will apply.  When the next instruction is encountered,
an efficient program of register reads and writes will be constructed
to achieve the net effect indicated by the list of "add"s and
"remove"s.  If you add a bit-field without removing what's already
there, or if you remove a bit-field without adding in something to
replace it, then an error message will be generated.
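
One way to picture this consistency check is the Python sketch below.
It is only a guess at the rule being enforced, namely that the bit
positions removed must be exactly the bit positions refilled; the
function "check_assembly" and the (first_bit, last_bit) description
of a field are assumptions, not the actual implementation.

```python
def check_assembly(removed, added):
    """removed/added: lists of (first_bit, last_bit) field extents.

    Raise ValueError unless exactly the bit positions removed are the
    ones refilled by the added fields.
    """
    def positions(fields):
        bits = set()
        for first, last in fields:
            bits.update(range(first, last + 1))
        return bits

    out, into = positions(removed), positions(added)
    if into - out:
        raise ValueError("added without removing bits %s" % sorted(into - out))
    if out - into:
        raise ValueError("removed without replacing bits %s" % sorted(out - into))

LO, HI = (0, 7), (8, 15)
check_assembly(removed=[HI, LO], added=[HI, LO])   # balanced: no error
```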

Note that at this level, there is no checking of the assumptions you
make as to what's in the cell when you start the assembly.  If you
remove the "cell1-hi" field and it wasn't actually present, then you
have swapped out pointers to some other fields, and stored them as
pointers to the "cell1-hi" field.

The normal convention to use in defining steps is that all bit-fields
belong to subcell 0 at the start of the step, and have been restored
to subcell 0 at the end of the step.  Following this convention allows
you to execute your defined steps in any order.


ASSEMBLE-CELL: RELATED WORDS

If you need to define operations that work when the "start with
subcell 0" convention is broken, you can use the words
"force-zero-subcell" and "restore-active-subcell".  These words
involve the SPARCstation CPU in looking at offsets, and so cannot be
compiled as part of a single step-list -- which must be executable
without any intervention of the CPU.

"force-zero-subcell" interrogates CAM to find out which subcells are
currently swapped in, and saves this information.  It swaps out all
the bit-fields that don't belong to subcell 0, and replaces them with
ones that do.  "restore-active-subcell" uses the information saved by
"force-zero-subcell" to restore the subcell state of all bit-fields to
what it was before being forced to zero.

The "check-subcells" command can be used to determine if subcell
offset-pointers have been corrupted by misuse of "assemble-cell".
