\.
This experiment demonstrates some image processing operations. We load
7-bit video-data (a film-strip) into CAM memory, and then process it using
a 9-point approximation of the diffusion equation. The parameters can be
varied to see the effects of changing the scaling value for the center,
nearest neighbors, next-orthogonal neighbors, or overall normalization.
The first three subcells are used for our transformation; we keep our
filmstrip in consecutive subcells following this. We begin our processing
by copying the next image to process into the "original-image" field of
subcell 2. This is done so that we can separate the copying step (which
is different for each image) from the processing step (which is the same
for all images).
In our processing steps, we keep the lo part of our accumulated result in
"image-lo"; the high part will be accumulated in "accumulator-for-hi".
Each phase of processing begins by bringing in a new copy of the image
into "image-hi" and shifting it to get the desired neighbor. A scaled
addition is then performed: ScaleFactor * new-neighbor is added to the
low-part of the accumulated result ("image-lo", treated as an unsigned
integer). The high part of the resulting value is added as a signed 7-bit
quantity to the signed 7-bit "accumulator-for-hi". This process is
repeated for each neighbor. For the first neighbor processed, the lo and
hi parts of the accumulated result are treated as zero.
The algorithm runs at 15 updates/second of a 512x512 image (including
display after each update). The speed would be the same with direct CCD
camera input. With a little work, the algorithm should run at close to 20
frames per second. Keys are defined to change the scaling weights used by
the algorithm. The "O" key sets the weights so that the original pixel
has weight 1, and all others weight zero (lets you see the unprocessed
filmstrip).
.\
new-experiment 512 by 512 space 64 subcells load system.3.96.fth
"" sweet-hot.pat 0 64 file>subcells
\ ********************** subcell definitions **********************
0 subcell:
0 6 == image-lo
7 13 == image-hi
14 15 == function
0 13 == result
1 subcell:
0 6 == accumulator-for-hi
2 subcell:
0 6 == original-image
3 constant first-data-subcell
61 constant #data-subcells
\ ********************* rules that we'll need ********************
\* We use 2 bits to specify a function, so that each lookup table can be
used for 4 different rules. This allows us to have up to 8 rules with 2
lookup tables, without having to download rules as part of our updating
inner-loop (which would slow things down for the small spaces we use for
image processing). In fact, in this example we only needed 6 of our 8
choices. If we had needed more than 8 rules, we would have used short
tables (13 inputs), and we could then download tables as often as we like
without slowing things down for images of the size we're working with.
Note that we define addition with a choice of one of three scale factors
(each of which can be positive, negative, or zero). The "add-signed" rule
is used when adding the high part of the result of the first addition to
the high part of the accumulated value from previous additions (a signed
number). Since the high parts are treated as signed, we should make sure
that our accumulated totals cannot exceed 8191 in absolute value at any
point.
The "normalize" function will be applied to our final 14-bit result, to
rescale it using a final, overall scale factor (which could as easily have
been a fraction, but here is just treated as division by an integer). The
final result is then coerced into being a 7-bit value. *\
1 constant scale1
1 constant scale2
1 constant scale3
1 constant scale4
: add-signed image-lo 7 wide image-hi 7 wide + -> result ;
: add-scale1 image-lo image-hi scale1 * + -> result ;
: add-scale2 image-lo image-hi scale2 * + -> result ;
: add-scale3 image-lo image-hi scale3 * + -> result ;
: add-rule function {{ add-signed add-scale1 add-scale2 add-scale3 }}
;
create-lut table0
: copy-lo2hi image-lo -> image-hi ;
: normalize result 14 wide scale4 / 0 max 127 min -> result ;
: copy-norm function 2 mod {{ copy-lo2hi normalize }}
;
create-lut table1
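\* For example (illustrative values, not taken from the filmstrip), with
scale4 = 12 the "normalize" rule maps accumulated results as follows:
   result =  600  ->   600 / 12 =  50  ->   50
   result = 1800  ->  1800 / 12 = 150  ->  127  (clipped by "127 min")
   result = -240  ->  -240 / 12 = -20  ->    0  (clipped by "0 max") *\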
\* Since we're only using two tables, they can be downloaded whenever we
start to run, and don't need to be sent as part of the steps. *\
define-step send-tables
lut-data table0 switch-luts
lut-data table1
end-step
this is when-starting
\ ************ words for defining our transformation **************
\* These constants, together with the words "fn" and "0fn", will run a
selected rule by switching luts as appropriate, and setting the "function"
bits as appropriate. Note that, by dividing each table into 4
sub-functions (selected by the two "function" bits), we can have up to 8
different 14-bit operations available without any lut-downloading overhead
(which would slow us down for full tables on such a small space). In
fact, only 6 of these 8 possible operations are defined here: we can add
more functions later if we need to.
"fn" runs the function as defined, "0fn" forces the low part of the input
to the lut to be zero during execution of the function: this is useful
during the first phase of the accumulation algorithm. *\
0 0 2constant signed+
1 0 2constant scale1+
2 0 2constant scale2+
3 0 2constant scale3+
0 1 2constant lo2hi
1 1 2constant norm
: fn (s fn# tab# -- )
2dup
if switch-luts then
site-src lut function field 0 fix
lut-src site function field fix
run
if switch-luts then
drop
;
\ same as fn, but with image-lo input to lut held at 0
: 0fn (s fn# tab# -- )
2dup
if switch-luts then
site-src lut function field 0 fix
lut-src site image-lo field 0 fix function field fix
run
if switch-luts then
drop
;
\* These words bring the desired data together before a function is
executed. "image" brings together the low and high parts of subcell zero,
where the final image value will be constructed. "accum-hi" brings
together the high part of the result of an addition, together with a
field where we are accumulating the high part of our overall result.
"x,y" makes a copy of the original image in "image-hi", and then shifts it
by x and y amounts indicated on the stack. The low part of the accumulated
overall result is in "image-lo". *\
: image
0 activate-subcell kick
;
: accum-hi
{ accumulator-for-hi image-hi } assemble-fields kick
;
: x,y (s x y -- )
{ original-image image-hi } assemble-fields kick lo2hi fn
0 activate-subcell kick image-hi field y x
;
\* "save-original" moves the copy of the original value from "image-hi"
into "original-image", where "x,y" will look for it. We could have
defined a copy rule for doing this, but we used one of our addition rules
instead (with "image-lo" fixed at 0) to avoid defining unnecessary rules. *\
: save-original
{ original-image image-hi } assemble-fields kick signed+ 0fn
;
\ ***************** performing the transformation *****************
\* This is a 9 point transformation. We start by copying the original
image from "image-hi" into "original-image" for later use by "x,y": each
application of "x,y" gives us a shifted copy back in "image-hi".
The overall operation that we want to achieve is to accumulate scaled
copies of neighbors, and then normalize the final result. If, in the
description below, we call the low and high halves of our 14-bit
accumulator "accum_lo" and "accum_hi", then each step of our accumulation
algorithm looks like this:
accum_lo + accum_hi * 2^7 is added to
neighbor * ScaleFactor to give a new 14 bit
accum_lo + accum_hi * 2^7
This is actually done in two steps. First
accum_lo is added to
neighbor * ScaleFactor to give a 14 bit
accum_lo + intermediate_hi * 2^7
Since our accumulated result is a two's complement signed value, we can
add the low part, accum_lo, as if it were a positive number. The neighbor
is known to be a positive 7-bit number, but after scaling it may be
negative. A signed addition of
accum_hi * 2^7 and
accum_lo + intermediate_hi * 2^7 would now give the desired updated
accum_lo + accum_hi * 2^7
But this addition only involves the high bits of all of these quantities,
and so can be done using only those bits. When the final result is
accumulated, it is scaled and coerced into the right range, and we're
done. Note that if our accumulator ever exceeds the limits of a 14-bit
signed number, then our arithmetic will be incorrect. *\
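\* Worked example of one accumulation step (illustrative numbers, not
taken from the filmstrip data). Suppose the accumulator holds 300, so
that accum_hi = 2 and accum_lo = 44 (44 + 2 * 2^7 = 300), and the next
neighbor is 100 with ScaleFactor = -1.
   First step:   accum_lo + neighbor * ScaleFactor = 44 - 100 = -56
   As a 14-bit two's complement value, -56 splits into
                 accum_lo = 72 and intermediate_hi = -1
                 (check: 72 + (-1) * 2^7 = -56)
   Second step:  accum_hi + intermediate_hi = 2 + (-1) = 1
   New accumulator: 72 + 1 * 2^7 = 200 = 300 - 100, as expected. *\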
define-step 9point-transform save-original
0 0 x,y scale1+ 0fn accum-hi signed+ 0fn
1 0 x,y scale2+ fn accum-hi signed+ fn
0 1 x,y scale2+ fn accum-hi signed+ fn
-1 0 x,y scale2+ fn accum-hi signed+ fn
0 -1 x,y scale2+ fn accum-hi signed+ fn
2 0 x,y scale3+ fn accum-hi signed+ fn
0 2 x,y scale3+ fn accum-hi signed+ fn
-2 0 x,y scale3+ fn accum-hi signed+ fn
0 -2 x,y scale3+ fn accum-hi signed+ fn
lo2hi fn image norm fn
end-step
\* Define a few different scaled-addition rules, to modify the action of
this 9 point transformation: *\
: rescale (s scale1 scale2 scale3 scale4 -- )
is scale4 is scale3 is scale2 is scale1
['] add-rule ['] table0 table!
['] copy-norm ['] table1 table!
;
\* To get a 9 point diffusion with the following approximation of the
Laplacian,
D^2 u_0,0 = [16(u_1,0 + u_0,1 + u_-1,0 + u_0,-1) - 60 u_0,0
- (u_2,0 + u_0,2 + u_-2,0 + u_0,-2)]/(12 h^2) + O(h^4)
we use the scaling factors taken from this equation, plus u_0,0 (this
assumes a diffusion coefficient of 1, and that h=1). *\
: best-9point -48 16 -1 12 rescale
;
press N "Use best 9-point transform parameters."
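\* Where the "best-9point" numbers come from (a sketch of the arithmetic,
assuming one unit time step per update): with h=1, adding u_0,0 to the
Laplacian above and putting everything over the common denominator 12
gives
   [-48 u_0,0 + 16(nearest neighbors) - (next neighbors)] / 12
hence the weights -48, 16, -1 and the overall divisor 12. The weights
sum to -48 + 4*16 - 4*1 = 12, so a uniform image is left unchanged.
With 7-bit pixels (0 to 127) accumulated in the order used by
"9point-transform", the running total stays between
   -48*127 - 4*127 = -6604   and   4*16*127 = 8128
which fits in a 14-bit signed number (-8192 to 8191). *\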
\* For a 5 point diffusion, we just set the longer-range scaling
coefficient to 0, and derive our scales from the following:
D^2 u_0,0 = (u_1,0 + u_0,1 + u_-1,0 + u_0,-1 - 4 u_0,0)/h^2 + O(h^2)
again assuming a diffusion coefficient of 1, and h=1. *\
: best-5point -3 1 0 1 rescale
;
press F "Use 5-point transform parameters."
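\* The same arithmetic for "best-5point" (again a sketch, assuming one
unit time step per update): with h=1,
   u_0,0 + (nearest neighbors) - 4 u_0,0 = -3 u_0,0 + (nearest neighbors)
hence the weights -3, 1, 0 and the overall divisor 1. The weights sum to
-3 + 4*1 = 1, so a uniform image is again left unchanged. *\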
\* We can use the same algorithm to display the original data by setting
all of our scaling factors to either 0 or 1. *\
: show-original 1 0 0 1 rescale
;
press O "Show original (use identity transform parameters)."
\ *********************** defining the steps **********************
\* The only difficulty left is bringing in the filmstrip images one at a
time, and then processing them. We make our "update-step" ("trans-step")
depend on the "step-count", so that it can cycle through the filmstrip,
processing a different image each time it's executed.
To copy the n-th image, we must make the right data visible, and then
execute our "lo2hi" function. "image-n" assembles the bits we need, then
we define compiled steps for each copy that we need (we do this with a
loop inside of "make-copy-steps"). Then we define "copy-n" to be a case
statement that contains all of these named steps. Finally, "trans-step"
copies the next image (the image number is "step-count" mod the number of
images available), and then performs the 9 point transform on the image. *\
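\* For example, with first-data-subcell = 3 and #data-subcells = 61 as
defined above, "trans-step" selects images as follows:
   step-count  0  ->  image  0  (subcell  3)
   step-count 60  ->  image 60  (subcell 63)
   step-count 61  ->  image  0  (subcell  3 again)
so the filmstrip repeats every 61 steps. *\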
: image-n (s image# -- )
first-data-subcell + ( subcell# )
image-lo field layer-mask @ swap
image-hi field layer-mask @ 0 2 activate-bit-fields
;
: make-copy-steps
#data-subcells 0
?do
i "" copy- name-n "define-step
i image-n kick lo2hi fn
0 activate-subcell end-step
loop
;
make-copy-steps
: compile-copy-cfas
#data-subcells 0
?do i "" copy- name-n find drop (compile) loop
;
: copy-n {{ [ compile-copy-cfas ] }} ;
: trans-step step-count @ #data-subcells mod copy-n 9point-transform
;
this is update-step
\ ****************** colormap and initialization ******************
: gmap image-lo 1 << >gray ;
colormap gmap
best-9point