[nycphp-talk] (ir) regular expressions (stupid me)
Jayesh Sheth
jayeshsh at ceruleansky.com
Sat Apr 24 09:50:48 EDT 2004
Hello all,
I am sorry if you get this twice. Like an idiot, I sent this message
from my other email to the NYPHP list first. The thing is that I am
subscribed from this email address. That message will probably be
rejected, so I am sending this again. If it went to Hans for approval, I
apologize.
Here is the message I sent:
I have recently had the need to do some specific data validation of
information entered through a form. I have two questions.
A)
In the first case, I have to make sure that the information entered in a
text input box looks as follows:
123 Elm St., Brooklyn, NY
I will also accept it as follows:
123 Elm St,Brooklyn,NY
or like this:
123 Elm St , Brooklyn, NY
So what I am saying is that I need to check for two commas, an
alphanumeric string before the first comma, a capitalized city name after the
first comma, and two capital letters after the second comma (for the
state). I will ignore whitespace before or after the commas. (That
whitespace can be trimmed programmatically).
I find POSIX style expressions using the ereg() function to be a bit
easier to learn than their Perl equivalents. Here is what I came up with
using an ereg() expression:
^([[:alnum:]]+[\.]{0,}[[:space:]]{0,}){1,},([[:space:]]{0,}[[:upper:]][[:alpha:]]+),([[:space:]]{0,}[[:upper:]]{2})$
I am not sure if it will work in all the situations I described above.
The syntax might also be a bit weird (character classes and things such
as {0,} ).
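One way to find out is to run the pattern against the three sample addresses. The sketch below is a Python translation, not ereg() itself: the POSIX classes are replaced with ASCII ranges (an assumption, since [[:alnum:]] and friends are locale-dependent), {0,} becomes * and {1,} becomes +. The names ADDRESS_RE and parse_address are made up for illustration. Because a repeated group in Python only keeps its last iteration, the street is recovered by splitting on the commas rather than by reading the first group.

```python
import re

# ASCII approximation of the POSIX pattern (assumption: ASCII-only input):
#   [[:alnum:]] -> [A-Za-z0-9]   [[:space:]] -> \s
#   [[:upper:]] -> [A-Z]         [[:alpha:]] -> [A-Za-z]
#   {0,} -> *                    {1,} -> +
ADDRESS_RE = re.compile(
    r"^([A-Za-z0-9]+\.*\s*)+,(\s*[A-Z][A-Za-z]+),(\s*[A-Z]{2})$"
)


def parse_address(text):
    """Return (street, city, state) with whitespace trimmed, or None."""
    if ADDRESS_RE.match(text) is None:
        return None
    # The pattern guarantees exactly two commas, so the split yields three parts.
    street, city, state = (part.strip() for part in text.split(","))
    return street, city, state


for sample in (
    "123 Elm St., Brooklyn, NY",
    "123 Elm St,Brooklyn,NY",
    "123 Elm St , Brooklyn, NY",
):
    print(repr(sample), "->", parse_address(sample))
```

Under these assumptions all three sample forms are accepted. One limitation of the pattern as written: the city part allows only a single word, so an input such as "123 Elm St., New York, NY" is rejected.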
B) In the second case, what I want to check for seems to be much
simpler, but I am having no luck.
I want to check the part returned by the first bracket in the previous
expression (i.e. "123 Elm St.") for the existence of the string "St" but
not for "St." . In other words, I am looking for an abbreviation of
Street that does not use a period. I know that you can use a caret
inside square brackets ('character classes') as a negation. But this
does not seem to work with string literals.
For example:
St[^\.]
or
St[^St\.]
produce unreliable results.
Can anyone help me in asking the ereg() function politely that I do not
want a match if the string contains "St.", but I do want a match if it
contains "St" ?
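A likely reason St[^\.] is unreliable: a negated character class still has to consume one character, so it cannot match when "St" is the very last thing in the string, and it happily matches inside words such as "Stone". The sketch below shows two possible fixes in Python (again a translation, not ereg() itself). The first uses a negative lookahead, which POSIX expressions do not offer. The second uses only alternation and negated classes, so it should carry over to ereg() with [[:alnum:]] in place of the ASCII ranges, though that port is untested here. All names are made up for illustration.

```python
import re

# Lookahead form: "St" as a whole word, not followed by a period.
ST_LOOKAHEAD = re.compile(r"\bSt\b(?!\.)")

# Alternation form, no lookahead: "St" must be preceded by the start of the
# string or a non-alphanumeric character, and followed by the end of the
# string or a character that is neither a period nor alphanumeric.
ST_ALTERNATION = re.compile(r"(^|[^A-Za-z0-9])St($|[^.A-Za-z0-9])")


def has_bare_st(street):
    """True if the street contains "St" without a trailing period."""
    return ST_LOOKAHEAD.search(street) is not None


def has_bare_st_alt(street):
    """Same check using the lookahead-free pattern."""
    return ST_ALTERNATION.search(street) is not None


for sample in ("123 Elm St", "123 Elm St.", "123 Elm St ", "123 Stone Rd"):
    print(repr(sample), has_bare_st(sample), has_bare_st_alt(sample))
```

Both functions only look for at least one bare "St"; a street containing both "St." and "St" would still count as a match, so whether that suits the form depends on how strict the check needs to be.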
Thanks a lot in advance.
Best Regards,
- Jay