Re: Kanji problems



Hi Kazushi,

Your plan sounds very interesting.

I don't have any direct experience beyond using the standard ISO Latin
character set for the Web, but here is my understanding of the issue.

1. You will need a browser that supports the correct character set for
displaying Kanji.

2. You will need to emit the appropriate MIME headers from CL-HTTP that
tell the browser which character set to use.

3. You will need to write the appropriate bytes to the HTTP stream so that
the browser can display them.
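
To make step 2 concrete, here is a minimal sketch in portable Common Lisp
of the header line the browser needs to see. The function name
content-type-header-line is hypothetical and is not part of CL-HTTP; it
only illustrates how a content-type spec such as
(:text :html :charset :iso-2022-jp) maps onto the MIME header.

```lisp
;; Hypothetical helper, NOT part of CL-HTTP: render a content-type spec
;; of the form (major minor &rest parameter-plist) as a header line.
(defun content-type-header-line (spec)
  (destructuring-bind (major minor &rest parameters) spec
    (with-output-to-string (out)
      ;; ~( ... ~) downcases the printed keyword names.
      (format out "Content-Type: ~(~A/~A~)" major minor)
      ;; Each parameter is a keyword/value pair, e.g. :charset :iso-2022-jp.
      (loop for (name value) on parameters by #'cddr
            do (format out "; ~(~A=~A~)" name value)))))

(content-type-header-line '(:text :html :charset :iso-2022-jp))
;; => "Content-Type: text/html; charset=iso-2022-jp"
```

CL-HTTP has its own header-writing machinery, so you would not call
anything like this yourself; the point is only what has to arrive at the
browser.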

Observations:

a. You can define the export-type and kanji data-type keywords as follows:

(define-url-export-types
  (:html-kanji-file :html-kanji (:text :html :charset :iso-2022-jp)
   :copy-mode :text)
  (:text-kanji-file :text-kanji (:text :plain :charset :iso-2022-jp)
   :copy-mode :text))

After evaluating the above form, you will have export types for
:html-kanji-file and :text-kanji-file that allow you to export static
files.

You will also have content type keywords :html-kanji and :text-kanji that
you can use with the macro with-successful-response and friends.

I would make sure that you like these keyword names, because they will
propagate throughout everyone's code. Also, the names should reflect the
appropriate level of generality for the particular character set; i.e., if
there are other charsets that might be considered kanji, these names should
probably use the ISO name.

It might be better to simply provide a charset keyword to the standard HTML
and text export or content type keywords to achieve the desired effect, but
that would take a small amount of reengineering of the code. We may do
that, but we will need a working example so that we are sure to get it
right.

b. Make sure that :iso-2022-jp is the character set that you want. Others
are noted in http:servers;headers.lisp, where the following form defines
the MIME content type.

(define-mime-content-type
  :text
  :minor-types (:plain :html)
  :parameters ((:charset :us-ascii :iso-8859-1 :iso-8859-2 :iso-8859-3
                         :iso-8859-4 :iso-8859-5 :iso-8859-6 :iso-8859-7
                         :iso-8859-8 :iso-8859-9
                         :iso-2022-jp :iso-2022-jp :iso-2022-kr
                         :unicode-1-1 :unicode-2-2-utf-7 :unicode-2-2-utf-7)))

c. It seems to me that you will not want ASCII translation turned on (i.e.,
adding a CR-LF for each CR). This would mean you would say
:copy-mode :binary in a.

d. I don't know what write-char does with a kanji character. Does it write
two bytes? If not, one will need to arrange for this. I suppose one option
is to build a kanji-translating HTTP stream and to provide a way to enable
the mode. A better solution would be a facility that works for all
multi-byte character sets. Again, this involves a small amount of
reengineering of the HTTP stream, whereby it supports more classes of
character translation than merely ASCII. Not a big deal.

e. Being able to work with kanji characters in your code would be much
better than requiring user code to perform the byte translation itself.
This argues in favor of the translating-stream approach.

f. BUT, if you know the number of bytes in a kanji file, then you can
easily arrange for persistent connections to work for your export type.
Otherwise, one gets into caching issues.
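
To illustrate the byte-level question raised above about write-char, and
the byte count needed for persistent connections, here is a rough sketch
in portable Common Lisp. It is not CL-HTTP code: the name
encode-iso-2022-jp and the input representation (ASCII characters mixed
with 16-bit JIS X 0208 codes given as integers) are assumptions made only
for the example. It also assumes that char-code returns ASCII codes for
ASCII characters, which holds in current implementations but is not
guaranteed by the language.

```lisp
;; Sketch only: hypothetical names, not CL-HTTP API.
;; ITEMS is a list of ASCII characters and/or 16-bit JIS X 0208 codes
;; given as integers (e.g. #x3441).
(defun encode-iso-2022-jp (items)
  "Return a list of bytes encoding ITEMS in ISO-2022-JP."
  (let ((bytes '())
        (mode :ascii))                  ; the encoding starts in ASCII mode
    (flet ((emit (&rest new-bytes)
             (dolist (b new-bytes) (push b bytes))))
      (dolist (item items)
        (etypecase item
          (character
           (unless (eq mode :ascii)
             (emit #x1B #x28 #x42)      ; ESC ( B selects ASCII
             (setf mode :ascii))
           (emit (char-code item)))     ; assumes an ASCII character
          (integer
           (unless (eq mode :jis)
             (emit #x1B #x24 #x42)      ; ESC $ B selects JIS X 0208
             (setf mode :jis))
           ;; Each JIS X 0208 character is sent as two bytes, high then low.
           (emit (ldb (byte 8 8) item) (ldb (byte 8 0) item)))))
      ;; The encoded text must end back in ASCII mode.
      (unless (eq mode :ascii)
        (emit #x1B #x28 #x42)))
    (nreverse bytes)))

;; #x3441 and #x3B7A are the JIS codes for the two characters of the
;; word "kanji".
(encode-iso-2022-jp (list #\A #x3441 #x3B7A #\B))
;; => (65 27 36 66 52 65 59 122 27 40 66 66)
```

Taking the length of the resulting list (12 here) gives the number of
bytes actually sent, which is the figure a Content-Length header would
need to carry. A translating stream would do the same mode tracking
internally, one character at a time.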

I hope this helps get you started. Let us know how your application
progresses, because we would like to provide default support for character
sets beyond the usual ISO Latin. If you send us the code needed to get your
application running, we can review it and see how a general-purpose version
can be incorporated into CL-HTTP.

Regards,

John C. Mallery
Artificial Intelligence Laboratory
Massachusetts Institute of Technology
545 Technology Square, NE43-797
Cambridge, MA 02139 USA