[nycphp-talk] Character set issues revisited

Michael B Allen ioplex at gmail.com
Fri Oct 19 14:29:55 EDT 2007


On 10/19/07, Cliff Hirsch <cliff at pinestream.com> wrote:
>
>  There was recently a thread about some character set problem. I just found
> a similar issue. I just transferred a site from a Windows XP dev. platform
> to rhel. Everything looks fine except for a few special characters.
>
>  Windows   -> rhel
>  it's           -> it?s
>  —            -> ? (should be the long dash, an em I think)
>  'blahblah' -> ?blahblah?
>  "                  -> ?

Hey Cliff,

That's actually not a character encoding issue. A '?' or an empty
box is commonly displayed whenever the glyph for a character value is
not available, meaning the client doesn't have the necessary font. It
also means that whatever editor was used to input those single quotes
didn't input the more common ASCII single quote character value of
0x27. If you hexdump that content you'll see it's something else (it
will probably be a multibyte UTF-8 sequence which, when decoded, will
give you a Unicode value that you can look up in Adobe's glyph
tables).
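
If you'd rather not read raw hexdump output by eye, something like the
following rough Python sketch can list each non-ASCII character with its
UTF-8 bytes and Unicode value. The function name and the sample string
are made up for illustration, and it assumes the content really is
UTF-8 (the decode will raise an error if it isn't).

```python
import unicodedata

def describe_non_ascii(raw, encoding="utf-8"):
    """List (char, UTF-8 bytes as hex, code point, Unicode name)
    for every non-ASCII character found in the raw bytes."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError if the guess is wrong
    report = []
    for ch in text:
        if ord(ch) > 0x7F:
            hex_bytes = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
            name = unicodedata.name(ch, "UNKNOWN")
            report.append((ch, hex_bytes, f"U+{ord(ch):04X}", name))
    return report

# Hypothetical sample: "it's" typed with a word-processor style apostrophe.
sample = "it\u2019s".encode("utf-8")
for row in describe_non_ascii(sample):
    print(row)
```

Anything this reports is a character that is not plain ASCII, so it is a
candidate for the '?' you are seeing.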

This is the sort of thing that happens when you create some content
with a word processor and then copy and paste it into the web page.

The way to fix this problem is to just seek and destroy all of those
characters and replace them with their more common equivalent values
(e.g. the single quote 0x27 ASCII value).
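
As a minimal sketch of that seek-and-destroy pass, assuming the text has
already been decoded to a Unicode string: the mapping below is only my
guess at the usual word-processor characters (curly quotes, dashes,
ellipsis), and the function name is hypothetical. Extend the table with
whatever your hexdump actually turns up.

```python
# Typographic characters mapped to plain ASCII stand-ins.
# This list is an assumption, not an exhaustive inventory.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}
TABLE = str.maketrans(REPLACEMENTS)

def to_plain_ascii_punctuation(text):
    """Replace the typographic characters above with ASCII equivalents."""
    return text.translate(TABLE)

print(to_plain_ascii_punctuation("it\u2019s \u2018blahblah\u2019"))
```

Characters that are not in the table pass through untouched, so legitimate
non-ASCII text (accented names and so on) is left alone.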

Or you could install whatever wacked out font has that character on
every client that will ever visit the page, but that's probably not
the most desirable solution.

>  In phpMyAdmin I see: can't
>  In my app, I see: can?t
>  So phpMyAdmin is displaying things correctly on either platform.

That's odd. Maybe phpMyAdmin is doing some transliteration.

>  Where should I start looking? What is the best charset to use anyway?
> Iso-8859-1 or utf-8?

Look at the page with hexdump to verify what the encoding is and
what the Unicode value of one of the errant characters really is. Then
you can start to figure out where things went wrong.
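
For the "what encoding is it" half of that check, here is a rough
heuristic in Python. It only distinguishes pure ASCII, valid UTF-8, and
everything else; it cannot tell ISO-8859-1 from other single-byte
encodings. The function name, the labels it returns, and the sample
bytes are all made up for illustration.

```python
def guess_encoding(raw):
    """Crude classification of raw bytes: 'ascii', 'utf-8' or 'not-utf-8'."""
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")  # strict decode: fails on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        # Probably some single-byte encoding; which one is not knowable here.
        return "not-utf-8"

# Hypothetical samples: a lone 0x92 byte is not a valid UTF-8 sequence,
# while e2 80 99 is the UTF-8 form of U+2019.
print(guess_encoding(b"can\x92t"))
print(guess_encoding(b"can\xe2\x80\x99t"))
print(guess_encoding(b"can't"))
```

Feed it the bytes exactly as the server sends them (or as they sit in the
database) rather than text that has already been through an editor.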

Mike

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/


