[nycphp-talk] ORM vs SQL: the ultimate showdown
Ben Sgro (ProjectSkyLine)
ben at projectskyline.com
Sat Sep 15 12:31:40 EDT 2007
Hello,
INSERT INTO users
(first_name,last_name,organization,address1, ... 20 more fields
..., birthday, favorite_color)
VALUES
(?,?,?,?,... 20 more question marks ...,?,?)
",array($first_name,$last_name,$organization,$address1,... 20 more
fields...,$favorite_color,$birthday);
Why not just do:
SET ...
first_name = $firstname,
last_name = $lastname, ...
Which is *basically* the same as what your saying ORM is useful for in this
case...
- Ben
----- Original Message -----
From: "Paul Houle" <paul at devonianfarm.com>
To: "NYPHP Talk" <talk at lists.nyphp.org>
Sent: Saturday, September 15, 2007 12:50 PM
Subject: [nycphp-talk] ORM vs SQL: the ultimate showdown
> I think of ORM as a partial solution to the problems of building
> database-backed web apps. People call database applications "CRUD" apps,
> short for
>
> Create
> Recall
> Update
> Delete
>
> ORM is good for Create, and for many cases of Update and Delete. It
> can be used to do Recall, but can be disastrously slow for generating
> many kinds of reports.
>
> Just the other day I was working with a Java Servlet-based webapp. I
> entered a text string with a single quote, which caused an error. The
> cause was code that would have looked like this in PHP:
>
> $conn->Execute("INSERT INTO users (first_name,last_name,organization)
> VALUES ('$first_name','$last_name','$organization'");
>
> This would have been OK if they'd written something like
>
> $first_name=addslashes($_POST["first_name"]);
> $last_name=addslashes($_POST["last_name"]);
> $organization=addslashes($_POST["organization"]);
>
> But they didn't. Neither do most developers. Now, there are tools
> that help with this: for instance, some database API's will
> automatically quote things if you use placeholders. This style of doing
> things has it's own problems: let's say the user table has a lot of
> fields...
>
> $conn->Execute("
> INSERT INTO users
> (first_name,last_name,organization,address1, ... 20 more fields ...,
> birthday, favorite_color)
> VALUES
> (?,?,?,?,... 20 more question marks ...,?,?)
> ",array($first_name,$last_name,$organization,$address1,... 20 more
> fields...,$favorite_color,$birthday);
>
> Notice the error? I didn't get the order of the fields quite right ,
> so the 'birthday' and 'favorite color' fields will get scrambled. Imagine
> how hard it would have been to have noticed the error in the middle of the
> field list! I recently caught this one in a Cold Fusion application --
> this kind of code gets hard to maintain.
>
> Enter ORM. Syntax varies, but this kind of insert is simple and
> maintainable with an ORM system:
>
> $t=$db->users;
> $r=$t->new_record();
> $r->first_name=$_POST["first_name"];
> $r->last_name=$_POST["last_name"];
> $r->organization=$_POST["organization"];
> ...
> $r->insert();
> $user_id=$r->user_id;
>
> The ORM system takes care of quotes and bad data for you. You're not
> likely to introduce bugs when you add and remove fields, and they'll be
> transparent if you do. If you want to make life really easy, you might
> write a subroutine that uses the ORM's introspection capability, so you
> can just write
>
> populate_from_post($r,$_POST);
>
> that scans through the database fields, and automatically assigns post
> variables down to fields. Talk about zero maintainance!
>
> -------
>
> ORM and other SQL-phobic approaches can also get you into big trouble.
> I once inherited a PHP app, using MS SQL server as the back end that had
> an administrative interface that listed all of the users in the system.
> It worked just fine in testing, but got REALLY slow when a few hundred
> users joined the system... It could take more than 100 seconds to produce
> the screen!
>
> The problem was that the system did between 10 and 20 queries to
> produce each output row. It first did a query that retrieved a few
> hundred rows, and then it did more queries to get all the details. Doing
> 5000 queries took a really long time.
>
> We changed the application to use prepared statements and found that
> the query took 10 seconds. This was acceptable, but still pretty bad.
>
> Although this system didn't use an ORM, the general approach was that
> encouraged by most ORM systems -- I don't think any ORM system would have
> been smart enough to consolidate the multiple queries into a smaller
> number, because the individual-row queries were implementing quite
> complicated business rules that involved counting rows with certain
> attributes, preparing thresholds, etc.
>
> An ORM system might or might not be smart enough to generate prepared
> statements. The difference between a 'smart' and 'dumb' ORM would be a
> factor of 10 in performance in this case.
>
> Later on I wrote a 'pure SQL' query that calculated everything in one
> complex query. It used a number of subselects... I thought it was pretty
> straightforward, but many people aren't comfortable with queries that are
> this complex. It did the job in 0.1 seconds!! That's a factor of 100x
> better than I got from running separate statements.
>
> -----------------------
>
> Similar issues turn up with UPDATEs and DELETEs. ORM is quite
> efficient if you only want to update one or two records at a time, but
> imagine you want to update 100,000 rows. (Turning off service for people
> who didn't pay their bills, reset scores in an online game to zero, ...)
> It can be thousands of times faster to do something like:
>
> UPDATE user SET score=0;
>
> than to write some loop that runs hundreds of thousands of queries. Once
> you add in the issues of concurrency and transactions, the 'pure SQL'
> solution looks a lot better -- it's automatically protected by the
> transactional integrity of the database. Probably 80% of people working
> with an ORM system will forget to put the whole thing in a transaction:
> so it won't be reliable. The 20% of the people who do put it in a
> transaction will hold locks on the database for hundreds or thousands of
> times longer than they need to... Which slows down the site for everybody
> else.
>
> --------------------------
>
> The gold standard is use an ORM system that's tightly integrated with
> your framework when it's appropriate: when you're manipulating a few rows
> at a time. You'll get big gains in maintainability. When you're updating
> 1000+ rows or generating complex reports, you need different tools. I've
> built a few systems that integrate data grid display on the client with a
> prebuilt set of data fields that can be incorporated into the query,
> using subselects to 'join' data from other tables -- this gets the
> silver... The gold medal would go to a system that uses the database
> metadata from an ORM system to help you build queries. Hook this up to an
> AJAX data grid, say the one from ext, and you've got a rails killer...
> _______________________________________________
> New York PHP Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
>
> NYPHPCon 2006 Presentations Online
> http://www.nyphpcon.com
>
> Show Your Participation in New York PHP
> http://www.nyphp.org/show_participation.php
More information about the talk
mailing list