[nycphp-talk] enforcing Latin-1 input
Allen Shaw
ashaw at polymerdb.org
Wed Nov 23 14:40:53 EST 2005
Mikko Rantalainen wrote:
> Allen Shaw wrote:
>> [snip] what you think of this half-baked idea: [snip...]
>
> You cannot trust that behavior. Specification only says (IIRC) that
> the user agent MUST not send characters outside iso-8859-1 on such a
> form.
Okay, I'm in way over my head here. I'd like to get my hands on that
spec -- would you have a link or some reasonably unique keywords to
google for (w3c, character encoding, specification, etc. don't seem to
be cutting it...)? I should just dig in there and understand what I'm
doing before trying to implement anything, I think.
> I guess that what I'm trying to tell you is that to *force*
> iso-8859-1 input only, you're going to have to use UTF-8 for the
> form and you're going to have to use UTF-8 internally. That's the
> only way you can really get in iso-8859-1 encoding the same data the
> user really tried to input.
What I'm really trying to do is not encode their input into Latin-1, but
figure out if they _entered_ Latin-1 characters in the form and if so
accept it, or if not, reject it and tell them why. If we just encode
their Chinese characters into Latin-1 neither I nor anybody around me
will be able to read it, not in any encoding or character set, because
of human language limitations; so I want to require the user to enter
either common western characters only or nothing at all. Anyway, maybe
it's a fool's errand...
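For illustration only, here is a minimal sketch of that accept-or-reject check. It is written in Python rather than anything from this thread, and it assumes the form is served and submitted as UTF-8, as Mikko suggests. The function name check_latin1 is hypothetical. It decodes the submitted bytes as UTF-8, then reports any characters that fall outside the ISO-8859-1 range (code points above U+00FF) so the user can be told why the input was rejected.

```python
def check_latin1(raw_bytes):
    """Check a form value that was submitted as UTF-8 bytes.

    Returns (ok, offending): ok is True only when the bytes are valid
    UTF-8 and every character fits in ISO-8859-1; offending lists the
    distinct characters that do not fit, sorted by code point.
    Hypothetical helper for illustration; assumes a UTF-8 form.
    """
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 at all, so the input cannot be trusted.
        return False, []
    # ISO-8859-1 maps exactly onto code points U+0000 through U+00FF.
    offending = sorted({ch for ch in text if ord(ch) > 0xFF})
    return (not offending), offending


if __name__ == "__main__":
    for value in ("caf\u00e9", "\u4e2d\u6587"):
        ok, bad = check_latin1(value.encode("utf-8"))
        print(repr(value), "accepted" if ok else "rejected", bad)
```

One consequence of checking against strict ISO-8859-1: the euro sign (U+20AC) is rejected, since it exists in Windows-1252 but not in ISO-8859-1. Whether that is acceptable depends on what "common western characters" needs to cover.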
Thanks for bouncing this around with me. If I do go with any particular
approach I'll let you know as an update.
- Allen
--
Allen Shaw
Polymer (http://polymerdb.org)