[nycphp-talk] REGEXP Solution Needed
Andrew Muraco
amuraco at gmail.com
Thu Sep 16 01:05:46 EDT 2010
Cool tool for regex testing, all in the browser:
http://gskinner.com/RegExr/
What about
[?&]size=0*10([&#].*)?$
to match when size=10 or size=0000010 (logically the same thing)?
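A minimal sketch for trying the pattern above in PHP, in case it helps. The helper names sizeIsTen and sizeIsNotTen are made up for illustration, the ~ delimiter is just a choice that avoids clashing with the # inside the patterns, and the size=10 and size=0000010 URLs are variations on the example.com URLs quoted below. The second helper wraps the lookahead pattern from the quoted reply so the two can be compared on leading zeros:

```php
<?php
// Hypothetical helper: true when the URL carries size=10, leading zeros allowed.
function sizeIsTen(string $url): bool
{
    return preg_match('~[?&]size=0*10([&#].*)?$~', $url) === 1;
}

// Hypothetical helper wrapping the lookahead pattern from the quoted reply:
// true when the size value is anything other than the literal digits "10".
function sizeIsNotTen(string $url): bool
{
    return preg_match('~[?&]size=((?!10)|\d|1[1-9]|[02-9]\d|\d{3,})(&|#|$)~', $url) === 1;
}

// Sample URLs modelled on the ones in the thread (the values are assumptions).
$base = 'http://www.example.com/events/events?node=&start=0&size=';
foreach (['10', '0000010', '0', '1', '11', '100'] as $size) {
    $url = $base . $size . '&sort=event';
    printf(
        "size=%-8s sizeIsTen=%-5s sizeIsNotTen=%s\n",
        $size,
        var_export(sizeIsTen($url), true),
        var_export(sizeIsNotTen($url), true)
    );
}
```

With these inputs, size=0000010 passes sizeIsTen but is also flagged by sizeIsNotTen, because the lookahead pattern compares the digits literally. That is the case the 0* is meant to cover.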
- Andrew Muraco
On Wed, Sep 8, 2010 at 1:05 PM, justin <justin at justinhileman.info> wrote:
> that doesn't do what you think it does. it will fail on
>
> http://www.example.com/events/events?node=&start=0&size=0&sort=event
> http://www.example.com/events/events?node=&start=0&size=1&sort=event
> http://www.example.com/events/events?node=&start=0&size=11&sort=event
> http://www.example.com/events/events?node=&start=0&size=100&sort=event
>
> and any "size" value starting with either 0 or 1.
>
> use this instead:
>
> [?&]size=((?!10)|\d|1[1-9]|[02-9]\d|\d{3,})(&|#|$)
>
>
> -- justin
>
>
> On Wed, Sep 8, 2010 at 11:45 AM, <ps at blu-studio.com> wrote:
> >
> > As usual, lots of great input, but here is what seems to work for me when
> > testing it against some actual URLs:
> > ^http://www\\.example\\.com/events/events?.*size=[^10].*
> >
> > Just using size does not equal 10.
> >
> > -------- Original Message --------
> > Subject: Re: [nycphp-talk] REGEXP Solution Needed
> > From: John Campbell <jcampbell1 at gmail.com>
> > Date: Wed, September 08, 2010 7:47 am
> > To: NYPHP Talk <talk at lists.nyphp.org>
> >
> > On Wed, Sep 8, 2010 at 10:27 PM, <ps at blu-studio.com> wrote:
> >> I believe this is what I am looking for:
> >> ^http://www\\.example\\.com/events/events?.*size=[\d|\d\d^10].*
> >
> > Test that, but I am quite sure it doesn't do what you want. I think
> > you need negative lookahead, which typically has syntax like
> >
> > size=(?!10)
> >
> > but that isn't quite right, because it will also reject size=100.
> >
> > so I think you need:
> >
> > (size=(?!10))|(size=\d{3,})
> >
> > Regards,
> > John Campbell
> >
> >> If anyone can polish this more or if I am wrong, pls give a note. Thanks.
> >>
> >> -------- Original Message --------
> >> Subject: Re: [nycphp-talk] REGEXP Solution Needed
> >> From: <ps at blu-studio.com>
> >> Date: Wed, September 08, 2010 6:52 am
> >> To: "NYPHP Talk" <talk at lists.nyphp.org>
> >>
> >> This is a great technique, thanks, Scott.
> >> But I'm putting this into the Do Not Crawl front end of a Google Search
> >> Appliance, and it has to be done with GNU regexp. So I've been working on
> >> it and I got something like this for starters:
> >> ^http://www\\.example\\.com/events/events\\?\.size=[\d|\d\d^10]\.
> >>
> >> With the above I intend to match my domain, then the directory path
> >> events/events followed by a question mark, then any characters leading
> >> up to size= followed by any one or two digits but not 10, followed by
> >> any characters. That is where I need to be going.
> >> Peter
> >>
> >> -------- Original Message --------
> >> Subject: Re: [nycphp-talk] REGEXP Solution Needed
> >> From: Scott Mattocks <scott at crisscott.com>
> >> Date: Wed, September 08, 2010 6:19 am
> >> To: NYPHP Talk <talk at lists.nyphp.org>
> >>
> >> On 09/08/2010 08:30 AM, ps at blu-studio.com wrote:
> >>> Using GNU Regular Expressions I need to examine a URL like those below,
> >>> checking the size key and value. I need to capture and block all URLs
> >>> where 'size does not equal 10'. In other words, "size=12" is not
> >>> acceptable.
> >>
> >> Regular expressions are expensive and should only be used when
> >> absolutely necessary. If you are checking for a specific string, just
> >> check for it with str* functions. Here's how I would check for it:
> >>
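> >> $key = 'size';
> >> $val = 10;
> >> $url = 'http://....';
> >>
> >> // Position of the last "size=" in the URL, or false if it is absent.
> >> $last = strrpos($url, $key . '=');
> >> // "Good" only when that last occurrence starts with "size=10".
> >> // Note: this is a prefix test, so values such as size=100 also pass.
> >> if ($last !== false && $last === strrpos($url, $key . '=' . $val))
> >> {
> >>     echo 'Good';
> >> }
> >> else
> >> {
> >>     echo 'Bad';
> >> }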
> >>
> >> That block of code makes sure that 'size=' shows up in your URL and that
> >> the last occurrence of 'size=' is actually 'size=10'. The last
> >> occurrence is the value that will be passed to the server so that's
> >> probably the only one you care about. If you want to verify that there
> >> is only one occurrence use strpos(...) == strrpos(...) in addition to
> >> the checks above.
> >>
> >> --
> >> Scott Mattocks
> >> _______________________________________________
> >> New York PHP Users Group Community Talk Mailing List
> >> http://lists.nyphp.org/mailman/listinfo/talk
> >>
> >> http://www.nyphp.org/Show-Participation
>
> --
> justin
> http://justinhileman.com