[nycphp-talk] UTF-8, databases and best practices

Hans Zaunere bulk at zaunere.com
Wed May 23 13:25:28 EDT 2012


Hi Eugenio,

> I need to distribute an application that potentially can be used with
> many different DBMSs (such as MySQL, PostgreSQL, SQLite, Microsoft SQL
> Server). The charset used in the databases can be ANY.
> 
> I would like to always output UTF-8 text when possible and my
> questions are about the current best practices to handle this kind of
> application with PHP.
> 
> 1) As far as I know, PHP still doesn't natively support UTF-8, so to
> avoid problems with string functions, I still have to use mbstring
> functions, am I right? What does PHP 5.4 change about that?

AFAIK, correct, and there haven't been many significant changes in this
area recently.
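
As a rough illustration of why the mbstring functions still matter, here is
a minimal sketch. It assumes the mbstring extension is loaded and that the
string is UTF-8; the variable names are only for illustration. The plain
string functions count and cut bytes, while the mb_* functions work on
characters.

```php
<?php
// "café" encoded as UTF-8: the "é" takes two bytes (0xC3 0xA9).
$s = "caf\xC3\xA9";

// strlen() counts bytes; mb_strlen() counts characters.
$bytes = strlen($s);              // 5
$chars = mb_strlen($s, 'UTF-8');  // 4

// substr() can cut a multibyte character in half and leave invalid UTF-8.
$cutBytes = substr($s, 0, 4);             // "caf" plus a dangling 0xC3 byte
$cutChars = mb_substr($s, 0, 4, 'UTF-8'); // the full four characters

// Case conversion of non-ASCII characters needs the mb_* variant.
$upper = mb_strtoupper($s, 'UTF-8');      // "CAFÉ"
```

Passing the encoding explicitly on each call avoids depending on whatever
the internal encoding happens to be set to on a given server.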

> 2) How to handle the fact that the data I receive from the database
> can be stored using any possible charset? Do I need iconv functions
> and convert everything in utf-8? And then convert it back in the
> original charset when I have to write to the DB?

I'd be interested to hear others' thoughts, but the general consensus these
days is "convert everything to UTF-8".  Is there an application requirement
that would force you to convert data to a different charset at different times?

In general:

1. Raw data (any charset/encoding)
2. Detect and convert to UTF-8, clean-up, etc.
3. Store in database/etc
4. Read/display in UTF-8
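
Step 2 above might be sketched roughly as follows. This is only one way to
do it, not a definitive implementation: to_utf8() is a hypothetical helper
name, the mbstring extension is assumed, and the ISO-8859-1 fallback is an
assumption you would adjust for your own data. If the source charset is
known (for example, from the database configuration), it is safer to pass
it in than to guess.

```php
<?php
/**
 * Hypothetical helper: normalize a raw byte string to UTF-8.
 *
 * $declared is the source charset when it is known.
 * $fallback is an assumed charset, used only when the input is not
 * already valid UTF-8 and no charset was declared.
 */
function to_utf8($raw, $declared = null, $fallback = 'ISO-8859-1')
{
    // Known source charset: convert directly, no guessing involved.
    if ($declared !== null) {
        return mb_convert_encoding($raw, 'UTF-8', $declared);
    }

    // Already valid UTF-8: pass it through untouched.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    // Otherwise treat the bytes as the fallback charset. This is a guess.
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

// ISO-8859-1 "café" (0xE9) becomes UTF-8 "café" (0xC3 0xA9).
$fromLatin1 = to_utf8("caf\xE9");

// Input that is already valid UTF-8 is returned unchanged.
$alreadyUtf8 = to_utf8("caf\xC3\xA9");
```

Checking for valid UTF-8 first keeps already-converted data from being
encoded twice, which is a common source of garbled output.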

This should support the vast majority of written human languages, though I
believe there are some exceptions.

H




