[nycphp-talk] Squashing accented characters

Paul A Houle paul at devonianfarm.com
Fri Oct 22 14:50:59 EDT 2010


For my site at

http://ookaboo.com/

I'm running into the problem that people search for "Dusseldorf", but
the place is named "Düsseldorf", so they don't find it.

It seems to me that a good answer is a function that squashes accented
characters down to their unaccented forms. I'd index the unaccented
forms and squash queries the same way, so the two always match up. I
definitely need to cover both ISO-Latin-1 and Latin Extended-A, because
fate has given me a lot of place names that contain the Polish dark L
(ł, <http://fileformat.info/info/unicode/char/0142/>). There also seem
to be a lot of characters in Latin Extended-B that would map plausibly
to unaccented characters.

I can see how to write something like this: I'd parse the Unicode code
points out of the UTF-8 and run each one through a lookup table. But it
involves a lot of details, and I wonder whether anybody has already
written a PHP function to do this.
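
To make the idea concrete, here is a minimal sketch of the approach
described above. The function name squash_accents() is hypothetical, and
the lookup table is only a tiny illustrative subset; a real table would
have to cover all of ISO-Latin-1, Latin Extended-A and probably parts of
Latin Extended-B. It assumes the source file and the input are UTF-8.

```php
<?php
// Hypothetical helper: squash accented characters to unaccented forms.
// Assumes this file and $text are both encoded as UTF-8.
function squash_accents($text)
{
    // Illustrative subset only, NOT a complete mapping.
    static $map = array(
        'ü' => 'u', 'Ü' => 'U',
        'ö' => 'o', 'Ö' => 'O',
        'é' => 'e', 'É' => 'E',
        'ł' => 'l', 'Ł' => 'L',
        'ń' => 'n', 'ó' => 'o', 'ź' => 'z',
    );

    // With the /u modifier PCRE treats the subject as UTF-8, so an
    // empty pattern splits the string into one piece per code point.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $text; // invalid UTF-8: return the input untouched
    }

    $out = '';
    foreach ($chars as $ch) {
        // Replace characters found in the table, pass the rest through.
        $out .= isset($map[$ch]) ? $map[$ch] : $ch;
    }
    return $out;
}
```

Both the indexed place names and the incoming queries would go through
the same function, so squash_accents('Düsseldorf') and a query for
'Dusseldorf' end up as the same string.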