[nycphp-talk] Re: regexp for URLs (is this correct?)
Jayesh Sheth
jayeshsh at ceruleansky.com
Mon May 3 22:08:27 EDT 2004
Hello all,
thanks for the excellent pointers regarding URL validation.
I think that since I am only validating http:// and https:// URLs for
now, the really long (30-50 line) one would be too much to incorporate
for this job ...
But I just realized some things that were wrong with my previous regular
expressions (those matching 'http://www.google.com' and 'www.google.com'
respectively):
a) I could check for an optional slash at the end by using something like:
/?
b) In both cases, input such as the following would fail:
http://www.google.com/something/something.html
OR
www.google.com/something/something.html
With a bit of experimenting (I really need to upload my interactive
Perl-regex tester script to my public scripts area), I came up with the
following:
/*
Should match:
http://www.google.com/something/something.html
http://www.google.com/something/something
http://www.google.com/something/something/
http://www.google.com/
http://www.google.com
*/
#^(([a-z]{3,5})://)(([0-9a-z-]+\.)+[0-9a-z]{2,6})((/[0-9a-z-]*)+?/?([0-9a-z-.]*)+?)?$#i
and
/*
Should match:
www.google.com/something/something.html
www.google.com/something/something
www.google.com/something/something/
www.google.com/
www.google.com
*/
#^(([0-9a-z-]+\.)+[0-9a-z]{2,6})((/[0-9a-z-]*)+?/?([0-9a-z-.]*)+?)?$#i
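The two "Should match" lists can be checked mechanically. The following is a minimal sketch in Python rather than PHP, chosen only for illustration. The pattern text is carried over without the #...#i delimiters, and re.IGNORECASE stands in for the i flag. It assumes the path group is wrapped in (...)? so that a bare host with no trailing slash is accepted. The names HOST, PATH, SAMPLE_PATHS and check are made up for this example.

```python
import re

# Host: one or more dot-terminated labels followed by a 2-6 character TLD.
HOST = r"(([0-9a-z-]+\.)+[0-9a-z]{2,6})"
# Path: the path group from the patterns, wrapped in (...)? so it may be absent.
PATH = r"((/[0-9a-z-]*)+?/?([0-9a-z-.]*)+?)?"

WITH_SCHEME = re.compile(r"^(([a-z]{3,5})://)" + HOST + PATH + r"$", re.IGNORECASE)
WITHOUT_SCHEME = re.compile(r"^" + HOST + PATH + r"$", re.IGNORECASE)

# The five path variants from the "Should match" lists ("" is the bare host).
SAMPLE_PATHS = [
    "/something/something.html",
    "/something/something",
    "/something/something/",
    "/",
    "",
]

def check(pattern, prefix):
    """Return {url: matched?} for prefix combined with each sample path."""
    return {prefix + p: pattern.match(prefix + p) is not None for p in SAMPLE_PATHS}

if __name__ == "__main__":
    for pattern, prefix in ((WITH_SCHEME, "http://www.google.com"),
                            (WITHOUT_SCHEME, "www.google.com")):
        for url, matched in check(pattern, prefix).items():
            print("ok  " if matched else "FAIL", url)
```

Nested quantifiers such as ([0-9a-z-.]*)+? can backtrack heavily on long path segments that do not match on the first attempt, so it is probably worth timing the patterns on longer inputs as well.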
It is getting close to my bed time now, so I am not sure how correct
these are. I will do some more testing tomorrow. If, however, they do
work, they might be of use to others.
If anyone finds anything wrong with them, please let me know.
Best Regards,
- Jay
PS: I changed my regexps to allow 6-letter top-level domains (.museum)
after reading some responses today. The [0-9a-z]{2,6} part, that is.
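To see what the {2,6} bound does in isolation, here is a small sketch in Python (used only for illustration) that tests the host part on its own. The pattern name HOST_ONLY, the helper tld_length_ok and the sample host names are all invented for this example.

```python
import re

# Host part only: dot-terminated labels, then a TLD of 2 to 6 characters.
HOST_ONLY = re.compile(r"^([0-9a-z-]+\.)+[0-9a-z]{2,6}$", re.IGNORECASE)

def tld_length_ok(host):
    """True if host fits the host sub-pattern, including the 2-6 character TLD."""
    return HOST_ONLY.match(host) is not None

if __name__ == "__main__":
    # "example.c" and "example.abcdefg" are made-up hosts that probe the bounds.
    for host in ("example.com", "example.museum", "example.c", "example.abcdefg"):
        print(host, tld_length_ok(host))
```

A one-character ending falls below the lower bound and a seven-character ending exceeds the upper one, so both are rejected, while .museum sits exactly at the limit.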
PPS: I will mostly be using these expressions to evaluate newly
submitted URLs via a text input box. The other regexps that I posted
recently were for batch validation (and transformation / import) of lots
of invalid MySQL data. Thanks for the fopen() tip, David.
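For the text input box case, one possible way to combine the two patterns is sketched below in Python (for illustration only). Several things here are assumptions, not something stated in this post: trimming surrounding whitespace, rejecting schemes other than http and https after the [a-z]{3,5} match, prepending "http://" to scheme-less input, and making the path group optional. The function name normalize_submitted_url is hypothetical.

```python
import re

HOST = r"(([0-9a-z-]+\.)+[0-9a-z]{2,6})"
PATH = r"((/[0-9a-z-]*)+?/?([0-9a-z-.]*)+?)?"  # assumption: path group optional

WITH_SCHEME = re.compile(r"^(([a-z]{3,5})://)" + HOST + PATH + r"$", re.IGNORECASE)
WITHOUT_SCHEME = re.compile(r"^" + HOST + PATH + r"$", re.IGNORECASE)

# Assumption: only these two schemes are wanted for now.
ALLOWED_SCHEMES = ("http", "https")

def normalize_submitted_url(raw):
    """Return a cleaned-up URL string, or None if the input is rejected."""
    candidate = raw.strip()  # drop whitespace pasted around the URL
    m = WITH_SCHEME.match(candidate)
    if m:
        # Group 2 is the scheme name captured by ([a-z]{3,5}).
        return candidate if m.group(2).lower() in ALLOWED_SCHEMES else None
    if WITHOUT_SCHEME.match(candidate):
        # Assumption: scheme-less input is treated as plain http.
        return "http://" + candidate
    return None

if __name__ == "__main__":
    for raw in (" www.google.com/something/ ", "https://www.google.com",
                "ftp://www.google.com/", "not a url"):
        print(repr(raw), "->", normalize_submitted_url(raw))
```

Checking the captured scheme separately keeps the original pattern unchanged; replacing [a-z]{3,5} with https? would be another way to get the same restriction.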