9500 crashes (still going): what I've done

To: weltyc@cs.vassar.edu
Subject: 9500 crashes (still going): what I've done
From: hall@research.att.com (Bob Hall)
Date: Wed, 6 Mar 96 13:56:30 EST
Cc: bug-mcl@digitool.com, www-cl@ai.mit.edu
In-Reply-To: Chris Welty's message of Wed, 6 Mar 96 11:30:09 EST <9603061630.AA06188@cs.vassar.edu>
Sender: owner-www-cl@ai.mit.edu
   Date: Wed, 6 Mar 96 11:30:09 EST
   From: weltyc@cs.vassar.edu (Chris Welty)

   I'm running cl-http on a 9500 with all the latest patches and fixes,
   etc.  I haven't tried the experimental mactcp interface, other than
   that I believe I've tried everything.
   [...]
   Is there any data I can collect that would be more meaningful?  Any
   suggestions on how to deal with this?  This is a cry for help.

I have recently been playing with reimplementing parts of the MacTCP
interface more or less from scratch by actually reading the MacTCP
documentation that is provided with MacTCP 2.0.6.  Based on comments on
the www-cl and info-mcl lists and my own reading of the code, I've
formulated a few theories about these mysterious crashes.  Empirically
speaking, my hacking seems to have fixed the crashes (I have not
experienced any since my new code was installed), but I have not yet
firmly convinced myself that I have fixed the problem, and I have also
*not* been running CL-HTTP, but my own web-server-like application, so I
don't even know if CL-HTTP would be fixed by hacking like mine.  But I'll
float the theories here anyway in case anybody wants to comment or try
them out.

  [The first is actually just an instance of the remark made by someone
   in the www-cl group about the shared parameter blocks.]

- When one does a passive open, the open may return from mactcp.lisp while
  the MacTCP driver call is still in progress because it is done 
  asynchronously and yet the code doesn't wait (because it wouldn't make
  sense to wait).  Thus, the parameter block used to initiate the
  passive open call (which is the shared one allocated to the tcp-stream)
  should (at least theoretically) not be used again until the driver
  call completes, which doesn't happen until someone connects to the
  port or until the timeout occurs.

  HOWEVER, most mactcp.lisp operations share the same parameter
  block so, in particular, %tcp-control(...,$TCPStatus) calls (which one
  must do to tell whether the passive-open has been completed yet) could 
  overwrite the contents of the parameter block before the pending
  passive-open call completes.  Other %tcp-control calls also share the
  parameter block, so they could overwrite it too if they were issued
  before the passive open completes.

  In my implementation, I've kept the shared param block but only use it
  for creates and opens (active and passive).  In all other calls I use
  (rlet ((pb :tcpiopb))...) and reinitialize the necessary fields.
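
  As a rough illustration of the hazard described in this point, here is
  a portable Common Lisp model.  It makes no MacTCP calls and is not the
  mactcp.lisp code: the PB and CONN structures and all function names are
  hypothetical stand-ins for the :tcpiopb record and the driver calls.

  ```lisp
  ;; Toy model only: no driver traps are called here.
  (defstruct pb              ; hypothetical stand-in for a :tcpiopb record
    (cs-code nil)            ; which driver command this block describes
    (result :in-progress))   ; completion slot the driver would fill in

  (defstruct conn
    (shared-pb (make-pb)))   ; the one block allocated with the stream

  (defun start-passive-open (conn)
    "Begin an async passive open; the driver keeps using the shared block."
    (let ((pb (conn-shared-pb conn)))
      (setf (pb-cs-code pb) :passive-open
            (pb-result pb) :in-progress)
      pb))

  (defun status-shared (conn)
    "Unsafe: reuses the shared block while the open is still pending."
    (let ((pb (conn-shared-pb conn)))
      (setf (pb-cs-code pb) :status
            (pb-result pb) :no-error)
      pb))

  (defun status-fresh (conn)
    "Safer: a private block per call, analogous to (rlet ((pb :tcpiopb)) ...)."
    (declare (ignore conn))
    (let ((pb (make-pb)))
      (setf (pb-cs-code pb) :status
            (pb-result pb) :no-error)
      pb))
  ```

  In this model, calling status-fresh leaves the pending block's cs-code
  as :passive-open, while status-shared changes it to :status underneath
  the still-pending open.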

- Another (theoretical) problem with the %tcp-control(...,$TCPStatus)
  calls is that they are done with ignore-errors t and yet a value
  is always read back out of the param block at the end.  Now, it seems
  to me that if the status driver-call ended in an error, nothing would
  guarantee that the param block's status slot holds valid data.  Thus,
  one might be tricked into thinking the connection was established when in
  fact it wasn't there at all, or was still listening (e.g., if a MacOS
  error caused premature status termination).  This could also cause one's
  code to do further driver calls, increasing the potential for munging the
  shared param block.
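
  One way to guard against this, sketched in portable Common Lisp: only
  trust the status slot when the driver result code is zero (noErr).
  The function names and the :established and :unknown keywords are
  hypothetical; a real version would read the result code and the state
  field out of the parameter block.

  ```lisp
  (defun checked-state (err raw-state)
    "ERR is the driver result code (0 means noErr); RAW-STATE is whatever
  the status slot holds.  Return RAW-STATE only when ERR is zero."
    (if (eql err 0)
        raw-state
        :unknown))

  (defun established-p (err raw-state)
    "True only if the status call succeeded AND reported :established."
    (eq (checked-state err raw-state) :established))
  ```

  With this check, a failed status call can never be mistaken for an
  established connection, whatever the slot happens to contain.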

- I have also redone memory allocation so that my TCP-connection structure
  (which includes the read and write buffers, rds's, etc., like the conn and
  pb structures do in the current mactcp.lisp)
  is allocated once and, when released, is kept on my own free list.
  This allows me to control how often and when #_Newptr is called,
  thus presumably reducing the likelihood of the Mac OS memory allocator
  causing a crash for some unknown/unknowable reason.  E.g., I can 
  preallocate a pool of $n$ tcp connections and then never again
  bother the Mac OS about it.  (If the pool maxes out, then subsequent
  calls just process-wait until one becomes free again.)
  This seems to me a more precise approach than just preallocating and
  disposing of a gazillion bytes of Mac Heap space as was suggested before.
  Note that this pool approach requires that all tcp-connections have the
  same buffer sizes, but that's how I use them anyway.
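
  The pool idea can be sketched in portable Common Lisp as follows.  This
  is a simplified model, not the code described above: plain byte vectors
  stand in for the #_Newptr-allocated connection structures, all names
  are hypothetical, and acquire returns NIL on an empty pool where a real
  version would process-wait.

  ```lisp
  (defstruct pool
    (free '()))                  ; free list of preallocated connections

  (defun make-connection-pool (n buffer-size)
    "Preallocate N equal-sized buffers so no allocation happens later."
    (let ((pool (make-pool)))
      (loop repeat n
            do (push (make-array buffer-size
                                 :element-type '(unsigned-byte 8)
                                 :initial-element 0)
                     (pool-free pool)))
      pool))

  (defun acquire (pool)
    "Pop a preallocated connection, or return NIL if none is free."
    (pop (pool-free pool)))

  (defun release (pool conn)
    "Return CONN to the free list instead of disposing of it."
    (push conn (pool-free pool))
    conn)
  ```

  Because every buffer has the same size, any released connection can
  satisfy any later acquire, which is why the pool needs uniform buffer
  sizes.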

I hope these ideas are useful to somebody.  My code is not a full
reimplementation of mactcp functionality (I just reimplemented the parts
I use), so it couldn't simply be plugged into CL-HTTP and tried out.  But
perhaps it is a worthwhile starting point for someone fixing mactcp.lisp.

-- Bob