
First stab at alternate parser for SEARCH URL's/GET forms



Greetings (to olivier.clarisse@lucent.com and all those cc:ed),

[olivier.clarisse@lucent.com writes:]
>
>If we use a parser with (& + =) precedence we have GET working like
>POST for search keys and break existing user code. If we don't, we
>need a second pass to disambiguate the current result. We can preserve
>the existing parser for GET. Then, if the user code chooses
>to it can call a disambiguator function on the URL:SEARCH-KEYS
>as a second step.

I had the same motivation -- trying to use the same parsing
for GET and POST, but I think there may be technical difficulties
(see below).  I also agree with not wanting to break existing code,
and an alternate parser invokable at the user's option would seem
the right way to go.

Here's my initial proposal of an alternate parser.  It presents
an optional extension of the existing mechanism (so shouldn't
break any existing code):

(I'm sure this can be improved upon! so have at it ...)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Parsing arg-string of search-url (HTTP-SEARCH object) into a query-alist
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;

(in-package :http-user)

;;; This parser method can be called in the query-response-function
;;; to approximate POST-style query-alist parsing of the GET args.
;;; It takes the url (as passed to the response function) as input
;;; and returns a query-alist.

(defmethod parse-get-args-as-query-alist ((url url:http-search))
  (with-slots (name-string) url
    ;; The arg-string is everything after the last "?" in the name-string.
    (let* ((last-?-position (position #\? name-string :from-end t))
           (arg-string (http::string-unescape-special-chars
                        (subseq name-string
                                (if last-?-position
                                  (1+ last-?-position)
                                  (length name-string)))))) ; no delimiter: empty arg-string
      ;; Debugging aid: show the decoded arg-string.
      (print arg-string)
      ;; Reuse the POST form parser on a stream over the decoded string.
      (with-input-from-string (arg-stream arg-string)
        (http::parse-form-raw-values arg-stream (length arg-string))))))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

This hack seems to work for vanilla uses of GET FORMS. Basically,
it takes the name-string of the url, and finds the arg-string by taking
everything after the last "?".  Then it decodes escape characters, and
calls http::parse-form-raw-values on a stream version of the decoded-argstring.
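
To make the "everything after the last ?" step concrete, here is a small
stand-alone sketch in portable Common Lisp.  It does not use CL-HTTP at all:
the function name SKETCH-ARG-STRING and the sample URL strings are made up
for illustration, and it only mirrors the SUBSEQ/POSITION logic of the method
above.

```lisp
;; Hypothetical helper, not part of CL-HTTP: isolates the arg-string
;; the same way the method above does, on a plain string.
(defun sketch-arg-string (name-string)
  "Return the text after the last question mark in NAME-STRING,
or an empty string when there is no question mark."
  (let ((last-?-position (position #\? name-string :from-end t)))
    (if last-?-position
        (subseq name-string (1+ last-?-position))
        "")))

;; Made-up name-strings:
(sketch-arg-string "http://host/find?name=Glenn")   ; => "name=Glenn"
(sketch-arg-string "http://host/find")              ; => ""
(sketch-arg-string "http://host/find?a=1?b=2")      ; => "b=2"
```

The last call shows a consequence of searching from the end: a literal "?"
inside the args would cut off everything before it.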

There seems to be one very annoying TECHNICAL DIFFICULTY involving special
characters.  POST seems to properly handle special characters in submitted
text (e.g. "&", "%", "+", "=").  I don't know exactly how POST does this,
but it works!  Unfortunately, GET doesn't seem to do the right thing,
at least judging by the namestring that appears in the server
HTTP-SEARCH url object.

Here is an illustration of the problem:

First, here is what the arg-list part of the url looks like
in my Netscape Location Box:

"fullname=special+char+test+%3D+also+%26%2C+%25%2C+and+%3F+too&hiddenarg=hidden+text&submitname=Submit+Query"

Compare that with the arg-string (before decoding special chars)
as stripped off of the HTTP-SEARCH object NAME-STRING:

"fullname%3Dspecial+char+test+%3D+also+%26,+%25,+and+%3F+too%26hiddenarg%3Dhidden+text%26submitname%3DSubmit+Query"

Notice that the first string has both "=" and encoded "="'s (%3D).
The query delimiters "=" and "&" are NOT encoded so they can
be uniquely identified.  Unfortunately, in the namestring version of
the args, the query delimiters have been encoded, BUT the "%" special
characters have NOT been re-encoded (which IMHO would be the right
thing to do).  I don't know if this is the "fault" of the way Netscape
encodes its request, or how CL-HTTP generates the name-string from
the client request.  In any case, the name-string version is
IN PRINCIPLE ambiguous, because there is no way to recover the
original delimiters (sigh!).  Does anybody know the spec for
URL encoding?? Doesn't it require "%" escape chars to get re-encoded??
If this were done, then all should work quite nicely!
Incidentally, both Netscape and Internet Explorer seem to be doing
the same thing in this regard (though I can't rule out that something
is amiss on the CL-HTTP server side).  Is there any easy way for me
to see the full text of the request being sent from the client??
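
The ambiguity can be reproduced without CL-HTTP.  The sketch below is plain
Common Lisp with made-up names (DEMO-DECODE, DEMO-SPLIT, DEMO-PAIRS,
SPLIT-THEN-DECODE, DECODE-THEN-SPLIT) and made-up sample queries; it is not
how CL-HTTP or any browser is implemented.  It only shows that splitting on
the raw "&" and "=" before decoding keeps a one-field submission distinct
from a two-field one, while decoding first makes them collide.

```lisp
(defun demo-decode (string)
  "Replace each well-formed %XX escape with its character and each + with a space."
  (with-output-to-string (out)
    (loop with n = (length string)
          with i = 0
          while (< i n)
          do (let ((ch (char string i)))
               (cond ((and (char= ch #\%)
                           (<= (+ i 3) n)
                           (digit-char-p (char string (+ i 1)) 16)
                           (digit-char-p (char string (+ i 2)) 16))
                      (write-char (code-char (parse-integer string
                                                            :start (+ i 1)
                                                            :end (+ i 3)
                                                            :radix 16))
                                  out)
                      (incf i 3))
                     ((char= ch #\+)
                      (write-char #\Space out)
                      (incf i))
                     (t
                      (write-char ch out)
                      (incf i)))))))

(defun demo-split (char string)
  "Split STRING at every occurrence of CHAR."
  (loop with start = 0
        for pos = (position char string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun demo-pairs (string decode-p)
  "Split STRING on & and =, decoding each name and value when DECODE-P is true."
  (flet ((fix (s) (if decode-p (demo-decode s) s)))
    (loop for field in (demo-split #\& string)
          for eq-pos = (position #\= field)
          collect (cons (fix (subseq field 0 eq-pos))
                        (fix (if eq-pos (subseq field (1+ eq-pos)) ""))))))

(defun split-then-decode (query) (demo-pairs query t))
(defun decode-then-split (query) (demo-pairs (demo-decode query) nil))

;; One field whose value contains "&" and "=", as a browser would send it:
(split-then-decode "a=x%26b%3Dy")   ; => (("a" . "x&b=y"))
;; Two separate fields:
(split-then-decode "a=x&b=y")       ; => (("a" . "x") ("b" . "y"))
;; Decoding first cannot tell the two submissions apart:
(decode-then-split "a=x%26b%3Dy")   ; => (("a" . "x") ("b" . "y"))
```

Once the delimiters themselves have been encoded, as in the name-string above,
only the decode-first order is available, which is why the original fields
cannot be recovered in general.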

The good news is that this hack should work for most vanilla submissions
from GET FORMS that do not involve special chars. Here are some special chars
that DO cause problems:  "+" "&" "=" "%".

Another caveat is that the above parse method does not trap errors,
so you should probably do that yourself in your response function.
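
One way to do the trapping is a small wrapper around HANDLER-CASE.  This is
only a sketch: CALL-TRAPPING-PARSE-ERRORS is a made-up name, and
PARSE-INTEGER stands in for the parser so the example runs in any Common
Lisp.  In a response function you would presumably pass
#'parse-get-args-as-query-alist and the url instead.

```lisp
;; Hypothetical wrapper, not part of CL-HTTP.
(defun call-trapping-parse-errors (parser input &optional default)
  "Call PARSER on INPUT.  Return the result and NIL on success,
or DEFAULT and the condition object if an error is signalled."
  (handler-case (values (funcall parser input) nil)
    (error (condition)
      (values default condition))))

;; Stand-in parser: PARSE-INTEGER signals an error on junk input.
(call-trapping-parse-errors #'parse-integer "42")        ; => 42, NIL
(call-trapping-parse-errors #'parse-integer "4x2" :bad)  ; => :BAD, <condition>
```

Returning the condition as a second value lets the response function decide
whether to log it, show an error page, or fall back to the raw search keys.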

For myself, I'm now convinced to use POST instead of GET, but
unfortunately my Netscape hangs on my POSTs: Netscape 3.0b4
running on PowerPC (my 68k version, also 3.0b4, at home works
OK on the same POSTs).  This finally convinced me to start looking at
Internet Explorer, which handles the POSTs OK as well, and when IE
finally has Java I'll be all set [maybe :) ].

Enjoy,
Glenn








-------------------------------------------------------------
Glenn A. Iba
GTE Laboratories
40 Sylvan Rd., MS-44
Waltham, MA 02254

e-mail: giba@gte.com
phone: 617-466-4233