NYPHP.org

[nycphp-talk] (ir) regular expressions (stupid me)

Jayesh Sheth jayeshsh at ceruleansky.com
Sat Apr 24 09:50:48 EDT 2004


Hello all,

I am sorry if you get this twice. Like an idiot, I sent this message 
from my other email to the NYPHP list first. The thing is that I am 
subscribed from this email address. That message will probably be 
rejected, so I am sending this again. If it went to Hans for approval, I 
apologize.

Here is the message I sent:

I have recently had the need to do some specific data validation of 
information entered through a form. I have two questions.

A)
In the first case, I have to make sure that the information entered in a 
text input box looks as follows:
123 Elm St., Brooklyn, NY

I will also accept it as follows:
123 Elm St,Brooklyn,NY

or like this:

123 Elm St , Brooklyn, NY

So what I am saying is that I need to check for two commas, an 
alphanumeric string before the first comma, a capitalized city name after the 
first comma, and two capital letters after the second comma (for the 
state). I will ignore whitespace before or after the commas. (That 
whitespace can be trimmed programmatically).

I find POSIX style expressions using the ereg() function to be a bit 
easier to learn than their Perl equivalents. Here is what I came up with 
using an ereg() expression:

^([[:alnum:]]+[\.]{0,}[[:space:]]{0,}){1,},([[:space:]]{0,}[[:upper:]][[:alpha:]]+),([[:space:]]{0,}[[:upper:]]{2})$

I am not sure if it will work in all the situations I described above. 
The syntax might also be a bit weird (character classes and things such 
as {0,} ).
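
A minimal sketch for trying the expression above against the three accepted 
forms. It assumes preg_match() instead of ereg(), since ereg() was removed 
in PHP 7.0; the POSIX character classes are also understood by PCRE, so the 
pattern is only wrapped in delimiters. The function name looksLikeAddress() 
and the two negative samples are hypothetical, added for illustration.

```php
<?php
// Hypothetical helper: the name and the use of preg_match() are assumptions,
// not part of the original message. The pattern is the one shown above,
// split across three string pieces only for readability.
function looksLikeAddress(string $input): bool
{
    $pattern = '/^([[:alnum:]]+[\.]{0,}[[:space:]]{0,}){1,},'
             . '([[:space:]]{0,}[[:upper:]][[:alpha:]]+),'
             . '([[:space:]]{0,}[[:upper:]]{2})$/';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $input) === 1;
}

$samples = [
    '123 Elm St., Brooklyn, NY',
    '123 Elm St,Brooklyn,NY',
    '123 Elm St , Brooklyn, NY',
    '123 Elm St. Brooklyn NY',   // no commas at all
    '123 Elm St., brooklyn, NY', // city name not capitalized
];

foreach ($samples as $sample) {
    echo $sample, ' => ', var_export(looksLikeAddress($sample), true), "\n";
}
```

Under these assumptions the first three samples are accepted and the last 
two are rejected. Note that the city part only allows a single word, so a 
name such as "New York" would not pass as written.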

B) In the second case, what I want to check for seems to be much 
simpler, but I am having no luck.

I want to check the part returned by the first bracket in the previous 
expression (i.e. "123 Elm St.") for the existence of the string "St" but 
not for "St." . In other words, I am looking for an abbreviation of 
Street that does not use a period. I know that you can use a caret 
inside square brackets ('character classes') as a negation. But this 
does not seem to work with string literals.

For example:
St[^\.]

or

St[^St\.]

produce unreliable results.

Can anyone help me in asking the ereg() function politely that I do not 
want a match if the string contains "St.", but I do want a match if it 
contains "St" ?
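
One possible sketch of the check described above, offered as an 
illustration rather than a definitive answer. A negated class such as 
[^\.] still has to consume one character, so St[^\.] cannot match when 
"St" is the last thing in the string, and it does match the "St" in a word 
like "Stone". The sketch assumes preg_match() (PCRE) rather than ereg(), 
because it relies on a negative lookahead, which POSIX expressions do not 
offer. The function names are hypothetical.

```php
<?php
// Hypothetical helper: the name, the word boundaries and the lookahead are
// assumptions for illustration, not something the original message used.
function hasStreetAbbrevWithoutPeriod(string $street): bool
{
    // \bSt\b : "St" as a whole word, so "Stone" does not count
    // (?!\.) : negative lookahead, the next character must not be a period
    return preg_match('/\bSt\b(?!\.)/', $street) === 1;
}

// The negated-class attempt from the message, kept for comparison.
function naiveMatch(string $street): bool
{
    return preg_match('/St[^\.]/', $street) === 1;
}

foreach (['123 Elm St', '123 Elm St.', '123 Stone St.'] as $street) {
    echo $street,
        ' | lookahead: ', var_export(hasStreetAbbrevWithoutPeriod($street), true),
        ' | naive: ', var_export(naiveMatch($street), true), "\n";
}
```

With these inputs the lookahead version accepts only "123 Elm St", while 
the negated-class version misses it and accepts "123 Stone St." instead.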

Thanks a lot in advance.

Best Regards,

- Jay





