[nycphp-talk] PHP + UTF-8 + mb_string issue.
Mark Armendariz
lists at enobrev.com
Wed Mar 21 06:35:37 EDT 2007
> -----Original Message-----
> [mailto:talk-bounces at lists.nyphp.org] On Behalf Of Anirudh Zala
>
> Question is why PHP is not able to count the length of a given
> string in a practical way. I am aware that current PHP versions
> are not aware of multibyte characters; instead they just deal with bytes.
> In that case the output is correct, but this is not a practical
> solution, as the length of the word in Gujarati is only "2"
> (in Indic languages, we have primary characters like "?" and
> secondary characters like "?", but secondary characters have
> no value without primary
> characters) and not "4", even if it requires 4 bytes to store the data.
It's my understanding that the mbstring extension doesn't replace PHP's
built-in string functions by default. If you're using the extension, you'll
have to call the mbstring functions explicitly (mb_strlen in this case).
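A minimal sketch of the difference, assuming the source file is saved as UTF-8 and using "કા" (Gujarati KA, U+0A95, followed by the dependent vowel sign AA, U+0ABE) as a sample two-character word; it is an assumed example, not the word from the question. Each of these code points takes 3 bytes in UTF-8, so strlen() sees 6 bytes while mb_strlen() sees 2 characters. The encoding is passed explicitly because mb_strlen() otherwise falls back to mb_internal_encoding(), which, depending on PHP version and configuration, may not be UTF-8.

```php
<?php
// Sample word (assumption): KA (U+0A95) + vowel sign AA (U+0ABE).
// This file is assumed to be saved as UTF-8, so each code point is 3 bytes.
$word = "કા";

// strlen() counts bytes: 2 code points x 3 bytes = 6.
$bytes = strlen($word);

// mb_strlen() counts characters (code points) in the given encoding.
// Passing 'UTF-8' explicitly avoids relying on mb_internal_encoding().
$chars = mb_strlen($word, 'UTF-8');

echo "strlen: $bytes, mb_strlen: $chars\n"; // strlen: 6, mb_strlen: 2
```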
On another note, here is something to use if you don't or can't use the extension:
http://dev.splitbrain.org/view/darcs/dokuwiki/inc/utf8.php
I grabbed this while doing research for a project I haven't started yet, so
I haven't had a chance to try it out, but it comes well recommended.
Specific to your case (from the link):
/**
* Unicode aware replacement for strlen()
*
* utf8_decode() converts characters that are not in ISO-8859-1
* to '?', which, for the purpose of counting, is alright - It's
* even faster than mb_strlen.
*
* @author <chernyshevsky at hotmail dot com>
* @see strlen()
* @see utf8_decode()
*/
function utf8_strlen($string){
    return strlen(utf8_decode($string));
}
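If the same code has to run with and without mbstring, one option is a small wrapper that prefers mb_strlen() and falls back to the utf8_decode() trick above. This is only a sketch: the name utf8_length() is hypothetical and is not part of PHP or of the DokuWiki library. Note that utf8_decode() was deprecated in PHP 8.2, so on current versions the mbstring branch is the one to rely on.

```php
<?php
/**
 * Hypothetical wrapper: count characters (code points) in a UTF-8 string.
 * Uses mb_strlen() when mbstring is loaded, otherwise the utf8_decode() trick.
 */
function utf8_length($string)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($string, 'UTF-8');
    }
    // utf8_decode() turns each code point into a single byte ('?' for
    // anything outside ISO-8859-1), so strlen() of the result is the count.
    return strlen(utf8_decode($string));
}

echo utf8_length("કા"), "\n";    // 2
echo utf8_length("naïve"), "\n"; // 5
```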
I hope that works for you.
Mark Armendariz