[nycphp-talk] Charsets are still driving me nuts

csnyder chsnyder at gmail.com
Thu Mar 6 17:15:40 EST 2008


I just found another potential gotcha when using Unicode throughout
your application: byte order marks in uploaded text files.
http://en.wikipedia.org/wiki/Byte_Order_Mark

Turns out Word puts a byte order mark (BOM) at the beginning of all
Unicode files. Unicode-friendly tools ignore it. PHP's fgets()
doesn't.

Detecting and stripping the BOM is an interesting exercise, because
strlen('ï»¿') == 6, but the BOM is really only 3 bytes long (EF BB BF
in UTF-8)... not sure if this is a bug or what, but it's certainly an
annoyance.
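
For what it's worth, here is a minimal sketch of one way to detect and
strip the mark, assuming the uploaded file is UTF-8. The constant
UTF8_BOM and the helper strip_utf8_bom() are names made up for this
example, not anything built into PHP. Writing the BOM as escaped bytes
rather than pasting the characters into the source should sidestep the
length confusion, since the literal no longer depends on how the editor
saves the script.

```php
<?php
// UTF-8 byte order mark, spelled out as raw bytes so the encoding of
// this source file cannot change its length.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: return $line without a leading UTF-8 BOM.
// Lines that do not start with the BOM are returned unchanged.
function strip_utf8_bom($line)
{
    if (strncmp($line, UTF8_BOM, 3) === 0) {
        // The cast covers older PHP versions, where substr() returns
        // false when the string is exactly 3 bytes long.
        return (string) substr($line, 3);
    }
    return $line;
}

var_dump(strlen(UTF8_BOM)); // int(3)
```

Only the first line read from a file needs the check, so something like
strip_utf8_bom(fgets($handle)) on the first read (with $handle being
whatever fopen() returned for the upload) would be enough. This does
nothing for UTF-16 files, which use a different, 2-byte mark.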


-- 
Chris Snyder
http://chxo.com/

