[nycphp-talk] A good PCRE expression for matching URLs
Michael B Allen
ioplex at gmail.com
Thu Jul 24 19:50:13 EDT 2008
On Thu, Jul 24, 2008 at 5:34 PM, John Campbell <jcampbell1 at gmail.com> wrote:
> On Thu, Jul 24, 2008 at 4:32 PM, Michael B Allen <ioplex at gmail.com> wrote:
>> On Thu, Jul 24, 2008 at 2:37 PM, John Campbell <jcampbell1 at gmail.com> wrote:
>>> On Thu, Jul 24, 2008 at 2:19 PM, Michael B Allen <ioplex at gmail.com> wrote:
>>> What is the context for the matching?
>>
>> This will be used to pick out URLs in Creole Wiki markup. Which
>> incidentally is not supposed to match characters that can occur
>> naturally at the end of a sentence (,.?!:;"') so I guess I need to
>> leave out '.' and ';' for my particular application.
>>
>
> Many urls contain a question mark. Why not just accept anything
> except a period or a question mark at the end?
> (http://|ftp://|mailto:).*?[\.\?]?\s
Even though things should be escaped on output anyway, I think
matching is a good opportunity to effectively validate the URL too.
But it would be nice to exclude that end-of-sentence punctuation from
the capture output. I tried the following minimal expression just to
get the trailing condition right, but I'm not able to distinguish
between a dot that is part of the URL and a period at the end.
$expr = '(http://[a-z./]+)\\. ';
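One possible way to tell a dot inside the URL from a sentence-ending
period is to let the URL body match greedily and then require the last
character to be something other than the punctuation listed above. This
is only a minimal sketch: the function name extract_urls, the sample
text, the ~ delimiters and the character classes are all assumptions
for illustration, not a full URL grammar.

```php
<?php
// Hypothetical helper: find http:// URLs but leave trailing sentence
// punctuation (,.?!:;"') out of the match.
function extract_urls(string $text): array
{
    // [^\s]* greedily consumes the URL body. The final character class
    // requires the last character to be neither whitespace nor
    // end-of-sentence punctuation, so the engine backtracks off a
    // trailing "." or "," while dots inside the URL are kept.
    $expr = '~http://[^\s]*[^\s,.?!:;"\']~';
    preg_match_all($expr, $text, $matches);
    return $matches[0];
}

$text = 'See http://www.example.com/a.html. Also http://example.org/?q=1, ok?';
print_r(extract_urls($text));
```

With this approach the expression no longer needs a literal period
after the URL, so it should also match URLs in the middle of a
sentence.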
Your expression doesn't seem to work for me either. It seems that '.*'
is just matching everything.
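For comparison, here is a sketch of how the quoted expression behaves
under preg_match when the lazy '.*?' is moved inside the capturing
group. The ~ delimiters, the regrouping and the sample string are
assumptions added for illustration; how the expression was actually
tested in this thread is not shown.

```php
<?php
// Variant of the quoted expression with the lazy part inside the group,
// so that group 1 holds the whole URL. The ~ delimiters are an assumption.
$expr = '~((?:http://|ftp://|mailto:).*?)[.?]?\s~';

$text = 'Docs are at http://example.com/a?b=1. More text follows.';

if (preg_match($expr, $text, $m)) {
    // $m[0] is the full match, including the period and the space;
    // $m[1] is the captured URL without them.
    var_dump($m[0], $m[1]);
}
```

Note that '\s' requires whitespace after the URL, so a URL at the very
end of the input would not match; a lookahead such as '(?=\s|$)' is one
alternative.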
Mike
--
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/