[nycphp-talk] Squashing accented characters

Brent Baisley brenttech at gmail.com
Mon Nov 1 17:01:15 EDT 2010


If you are using MySQL on the backend, you can make your table UTF8; your indexes will then use the utf8_general_ci collation by default. That collation effectively ignores accent marks (and letter case) when it compares and indexes the data. So if you search for Dusseldorf or Düsseldorf, both will come up with the same set of records. Then you don't have to do anything on the PHP side.

Regards,
Brent

On Oct 22, 2010, at 2:50 PM, Paul A Houle wrote:

> For my site at
> 
> http://ookaboo.com/
> 
> I'm running into the problem that people are searching for "Dusseldorf" but the name of the place is "Düsseldorf", so they don't find it.
> 
> It seems to me a good answer to this is to have some function that squashes accented characters down to unaccented forms. I'd index the unaccented forms and also squash down queries so they'd always match up. I definitely need to handle both ISO Latin-1 and Latin Extended-A, because fate has given me a lot of place names that have the Polish dark L in them (ł). It also seems like there are a lot of characters in Latin Extended-B that would map plausibly to unaccented characters.
> 
> I can see how to write something like this: I'd need to parse out the Unicode code points from UTF-8 and run them through a lookup table. But it's a lot of details, and I wonder if anybody has written a PHP function to do this already.
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
> 
> http://www.nyphp.org/Show-Participation
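
For cases where the matching has to happen in PHP rather than in the database, the lookup-table idea from the original question could be sketched roughly as below. This is only a minimal illustration, not code from the thread: the function name squash_accents() is hypothetical, the table covers a small subset of Latin-1 and Latin Extended-A, and it assumes both the source file and the input strings are UTF-8.

```php
<?php
// Hypothetical helper (not from the thread): squash accented Latin letters
// to plain ASCII with a lookup table. The table is a small illustrative
// subset of Latin-1 and Latin Extended-A, not a complete mapping.
function squash_accents($text)
{
    static $map = array(
        // Latin-1 Supplement (partial)
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A',
        'Ç' => 'C', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ñ' => 'N',
        'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O',
        'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ý' => 'Y',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a',
        'ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ñ' => 'n',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'ÿ' => 'y',
        'ß' => 'ss',
        // Latin Extended-A (partial), including the Polish letters
        'Ą' => 'A', 'ą' => 'a', 'Ć' => 'C', 'ć' => 'c', 'Ę' => 'E', 'ę' => 'e',
        'Ł' => 'L', 'ł' => 'l', 'Ń' => 'N', 'ń' => 'n', 'Ś' => 'S', 'ś' => 's',
        'Ź' => 'Z', 'ź' => 'z', 'Ż' => 'Z', 'ż' => 'z',
    );

    // strtr() with an array replaces whole byte sequences. A UTF-8 character
    // can only match at a character boundary, so no manual code point
    // parsing is needed. Characters missing from the table pass through.
    return strtr($text, $map);
}

echo squash_accents('Düsseldorf'), "\n"; // Dusseldorf
echo squash_accents('Łódź'), "\n";       // Lodz
```

The same function would be applied both to the names when they are indexed and to each incoming query, so that the two always meet in the unaccented form.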
