[nycphp-talk] Parsing Fun

inforequest sm11szw02 at sneakemail.com
Mon Aug 23 01:11:53 EDT 2004


Christopher Greeley tgrza-at-grza.com |nyphp 04/2004| wrote:

> I have been experimenting with parsing, as I am, and have always been 
> (regardless of the programming language) in the dark on exactly how I 
> should be going about parsing a text file. I have always kept it 
> simple with easy explodes and the like, but it is getting to the point 
> where I want to have a smarter script that doesn’t need a finite list 
> of things that must come in a certain order, etc. So, to that end, I 
> have been experimenting with parsing some RSS streams (I am using the 
> Reuters Sports Stream at 
> http://www.microsite.reuters.com/rss/sportsNews as a guinea pig). I 
> thought that for this end, sscanf would be really easy – I basically 
> got the position of two tags I wanted to read in between with strpos, 
> used substr to truncate the string, and then attempted to use sscanf 
> to parse it into neat little variables. The problem I ran into is that 
> sscanf doesn’t really like white spaces, and it stops reading at that 
> point. So, I dug around a little and found that someone had used 
> %[^[]] to match everything – but at this point, sscanf stopped 
> following my handy little outline.
>
> So, this is more of a request for some general direction in gaining 
> some parsing skills – I am sure there are some out there with some 
> weaker skills who could use the brush up as well.
>
> Thanks,
>
> Chris
>
>------------------------------------------------------------------------
>

I always enjoy parsing (really). IMHO string manipulation is what made 
Professional Basic the success it was (PBDS, a loong time ago) and what 
sold me on PHP as a general-purpose scripting language and not just a 
way to access HTTP headers.

Tokenizing is always fun (strtok). file_get_contents is very handy for 
use with PHP string functions, especially tokenizing. I never cared much 
for "exploding" or "splitting". Nothing like deep, nested loops of 
str_stuff and preg_replace to get the brain cells working in the morning 
(or in other words, to burn up your morning hours!)
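
For anyone who hasn't tried tokenizing, here is a minimal sketch of the strtok loop. The sample string is made up for illustration; in practice it might come from file_get_contents.

```php
<?php
// Hypothetical sample text standing in for a file's contents.
$text = "alpha beta\tgamma\ndelta";

// The first call passes the string; later calls pass only the delimiters.
// Any of space, tab, or newline ends a token.
$tokens = [];
$token = strtok($text, " \t\n");
while ($token !== false) {
    $tokens[] = $token;
    $token = strtok(" \t\n");
}

print_r($tokens);
```

Unlike explode, strtok skips runs of consecutive delimiters, so no empty tokens come back.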

Seriously though, with PHP I have found it is wise to try and use the 
built in parsers when possible, such as parse_url, but *never 
underestimate the amazing power of the preg_replace_callback*.
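
A rough sketch of both ideas, assuming a made-up snippet of markup (the $xml string and the uppercase transformation are only for illustration, not something from your feed):

```php
<?php
// Built-in parser: split a URL into its components.
$parts = parse_url('http://www.microsite.reuters.com/rss/sportsNews');
// $parts now has 'scheme', 'host', and 'path' keys.

// preg_replace_callback: run arbitrary PHP on every match.
// Hypothetical input; the callback uppercases the text inside each <title>.
$xml = '<title>first story</title><title>second story</title>';
$result = preg_replace_callback(
    '~<title>(.*?)</title>~s',
    function (array $m) {
        // $m[0] is the whole match, $m[1] is the captured tag content.
        return '<title>' . strtoupper($m[1]) . '</title>';
    },
    $xml
);

echo $parts['host'], "\n", $result, "\n";
```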

In your example, wouldn't preg_match (or preg_match_all) be a better 
choice than sscanf, so you can explicitly handle tabs and whitespace? 
That'd give you an array of all tagged content appearing in your file (in 
case there was more than the one you expected ;-), which you can further 
parse-n-store using str_"stuff" as you like.
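
Something along these lines, as a sketch only. The $rss string below is invented sample data, not the real Reuters feed, and a regex like this assumes plain <title> tags with no attributes or CDATA:

```php
<?php
// Hypothetical RSS fragment with tabs and newlines inside the tags.
$rss = "<item>\n  <title>Team A wins\tin overtime</title>\n</item>\n"
     . "<item>\n  <title>\n    Team B signs new coach\n  </title>\n</item>";

// The s modifier lets . match newlines; \s* trims whitespace around the content.
// $matches[1] holds the captured content of every <title> found.
preg_match_all('~<title>\s*(.*?)\s*</title>~s', $rss, $matches);

// Further cleanup: collapse any run of whitespace (tabs included) to one space.
$titles = array_map(
    function ($t) {
        return preg_replace('~\s+~', ' ', $t);
    },
    $matches[1]
);

print_r($titles);
```

The whitespace that trips up sscanf is handled explicitly by \s here, and you get every match back, not just the first.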

-=john andrews














