[nycphp-talk] Regular Expressions & Foreign Characters
David Sklar
sklar at sklar.com
Wed Sep 17 11:20:38 EDT 2003
On Wednesday, September 17, 2003 10:59 AM, wrote:
> If I understand correctly, a regular expression like this:
> /^[a-z0-9\',. -]{1,35}$/i will not allow foreign characters, e.g., Ë, because it is
> not part of the regular ASCII set of characters but part of the
> extended set. So...what's a kid to do?
Use a POSIX named character class. These respect locale settings:
preg_match('/[[:alnum:]]/','Ë');
This returns 1 (a match) under a locale like 'en_US' or 'de_DE'.
Read all about POSIX named character classes in the egrep(1) manpage.
You should probably call setlocale() in your PHP script before
preg_match()ing against special characters: the default locale (often "C")
may not include these characters in the "alnum" or "alpha" classes. E.g.:
setlocale(LC_CTYPE,'en_US');
or
setlocale(LC_ALL,'en_US');
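Putting the two together, here is a rough sketch of the original pattern with a-z0-9 swapped for [:alnum:]. It assumes the input is ISO-8859-1 encoded (one byte per character) and that one of the listed locale names exists on your system. Locale names vary between platforms, so treat them as placeholders. The /i flag is dropped because [:alnum:] already covers both cases.

```php
<?php
// Try a few candidate names for an English Latin-1 locale (assumed names;
// check what your system provides). setlocale() returns the name it
// actually set, or false if none of them is available.
$locale = setlocale(LC_CTYPE, 'en_US.ISO-8859-1', 'en_US.ISO8859-1', 'en_US');

// Original character class with a-z0-9 replaced by the POSIX [:alnum:]
// class, so letters the locale treats as alphanumeric are accepted too.
$pattern = "/^[[:alnum:]',. -]{1,35}$/";

// "\xCB" is the single ISO-8859-1 byte for Ë.
$name = "Ren\xCB";

// Prints int(1) if the locale classifies Ë as alphanumeric, int(0) otherwise.
var_dump(preg_match($pattern, $name));
```

If setlocale() returns false you are still in the "C" locale, and only ASCII letters and digits will count as [:alnum:].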
David