[nycphp-talk] sanitizing user-submitted html

Chris Snyder chris at
Sat May 31 00:38:26 EDT 2003

James Wetterau wrote:

>This submission breaks it:
>Strips some attibutes:<br>
><img src=''
>alert("I can do anything in here"); 
Not anymore. Two things happened there: I needed to create separate 
patterns for attributes delimited with " and with ', and I didn't 
realize that the dot doesn't match newline chars by default. Fixed both 
of those, and thanks for the shakedown!!
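
For readers following along, here is a minimal sketch of those two fixes. 
It is written in Python rather than PHP, since the actual routine is not 
shown in this post, and the pattern and function names are hypothetical. 
It only targets on* event-handler attributes, which is an assumption 
about what gets stripped. One pattern handles values delimited with ", 
another handles values delimited with ', and the DOTALL flag lets the 
dot match newlines.

```python
import re

# Hypothetical patterns, one per quote style, so that a value delimited
# with ' may contain " characters and vice versa.
# re.DOTALL makes "." match newline characters too.
FLAGS = re.IGNORECASE | re.DOTALL
DOUBLE_QUOTED = re.compile(r'\s+on\w+\s*=\s*".*?"', FLAGS)
SINGLE_QUOTED = re.compile(r"\s+on\w+\s*=\s*'.*?'", FLAGS)


def strip_event_attributes(html):
    """Remove quoted on* attributes, including values that span lines."""
    html = DOUBLE_QUOTED.sub("", html)
    html = SINGLE_QUOTED.sub("", html)
    return html


# Without DOTALL, a quoted value containing a newline is not matched:
assert re.search(r'".*?"', '"a\nb"') is None
assert re.search(r'".*?"', '"a\nb"', re.DOTALL) is not None
```

This sketch does not check that a match sits inside a tag, and it 
ignores unquoted attribute values, so it should not be read as a 
complete filter.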

It also closes open tags now, but with no pretense of producing 
well-formed HTML: it just tacks the appropriate number of closing tags 
on at the end. My goal is brute-force protection against people who 
might want to break the page visually, not to correct a poster's 
formatting.
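
The tag-closing step can be sketched roughly as follows, again in Python 
and with hypothetical names (close_open_tags, VOID_TAGS). It assumes a 
short list of tags that never take a closing tag, counts openers against 
closers, and appends whatever is still open at the end of the post. It 
makes no attempt to fix nesting.

```python
import re

# Assumed list of tags that never need a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def close_open_tags(html):
    """Append closing tags for every element still open at the end."""
    open_tags = []
    for match in TAG.finditer(html):
        closing, name = match.group(1), match.group(2).lower()
        if name in VOID_TAGS:
            continue
        if closing:
            # Drop the most recent matching opener, if there is one.
            if name in open_tags:
                idx = len(open_tags) - 1 - open_tags[::-1].index(name)
                del open_tags[idx]
        else:
            open_tags.append(name)
    # Close the remaining tags, innermost first.
    return html + "".join(f"</{name}>" for name in reversed(open_tags))
```

For example, close_open_tags("<b><i>hi") returns "<b><i>hi</i></b>", so 
an unclosed bold or italic tag cannot leak into the rest of the page.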

>Your program needs to verify that after it strips the HTML it hasn't
>generated unsafe HTML, and it needs a way to avoid getting caught in a
>loop doing that.  This is the sort of programming challenge that I
>like to model with a state machine.
I took a crash course in state machines this evening via Google, and I 
must admit that I have no idea what this problem would look like if modeled 
as one. It's true that I would be happier with mathematical proof that 
the routine was unexploitable, but anecdotal proof will be enough for me 
to allow HTML posts in non-critical applications. Thanks again for 
testing it!
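
Short of a full state machine, the re-check that James describes can be 
approximated by running the stripping pass repeatedly until the output 
stops changing, with a cap on the number of passes so it cannot loop 
forever. The sketch below is Python with hypothetical names; strip_once 
is a stand-in that only removes script tags, and falling back to 
escaping everything when the cap is hit is an assumption, not something 
the routine discussed here is known to do.

```python
import html
import re

SCRIPT_TAG = re.compile(r"</?script[^>]*>", re.IGNORECASE)


def strip_once(text):
    """Stand-in for one stripping pass (removes script tags only)."""
    return SCRIPT_TAG.sub("", text)


def sanitize(text, max_passes=5):
    """Repeat the stripping pass until the output is stable.

    Stripping can itself produce unsafe markup, e.g. removing the inner
    tag from "<scr<script>ipt>" leaves "<script>", so one pass is not
    enough. If the text is still changing after max_passes, give up and
    escape it rather than risk emitting unsafe HTML.
    """
    for _ in range(max_passes):
        cleaned = strip_once(text)
        if cleaned == text:
            return cleaned
        text = cleaned
    return html.escape(text)
```

This gives no proof that the result is unexploitable. It only 
guarantees that the output is a fixed point of the stripping pass or has 
been fully escaped.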

