[nycphp-talk] session variables: seven deadly sins
Paul Houle
paul at devonianfarm.com
Tue Dec 26 22:11:46 EST 2006
Allen Shaw wrote:
> Really? That's a surprising assertion, though I'll agree my surprise
> probably comes more from my own lack of insight than a flaw in your
> argument. Of course a quick google shows a few people hold that
> session vars are "evil," but I can't find much to back up the idea.
>
> Can you elaborate, or give us a few links on the topic?
I'll try to reply to this, and to some of the other people who
replied to my previous message.
I'll start with my background. I've often been the person with whom
the buck stops: somebody else develops an application that almost
works (perhaps even puts it in production), and then I have to clean up
the mess. The app might be written in PHP, Java, Cold Fusion, Perl,
you name it. I've learned to see session variables as a "bad smell".
When I develop my own applications, I use cookies for
personalization and caching. I use the authentication system described in
http://cookies.lcs.mit.edu/pubs/webauth:sec10-slides.ps.gz
This mechanism can carry a "session id", which in turn can be used as
a key against application state stored in a relational database. I
think through the boundary cases, and find that my greenfield apps
behave predictably. My only woe is that browsers have a lot of
undocumented behavior connected with cookies, form handling, and
caching. These are all problems you still need to fight with if you
use sessions; see the comments for
http://www.php.net/manual/en/function.session-cache-limiter.php
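For readers who haven't seen the signed-cookie approach, here is a minimal PHP sketch in its spirit: the cookie carries an expiration time, some data (such as a user or session id), and a MAC over both, so the server can detect tampering without storing anything. The layout is loosely modeled on the "expiration, data, MAC" idea from the slides above, not copied from them; the secret, the function names, and the choice of HMAC-SHA256 are assumptions for illustration.

```php
<?php
// Hypothetical server-side secret; a real deployment would load a long
// random value from configuration rather than hard-coding it.
const AUTH_SECRET = 'change-me-to-a-long-random-string';

// Build a cookie value of the form "expires&data&mac", where mac is an
// HMAC-SHA256 over "expires&data". The data is URL-encoded so that an '&'
// inside it cannot be confused with the field separator.
function make_auth_cookie(string $data, int $expires): string
{
    $payload = $expires . '&' . rawurlencode($data);
    return $payload . '&' . hash_hmac('sha256', $payload, AUTH_SECRET);
}

// Return the original data if the MAC verifies and the cookie has not
// expired; return null for anything forged, tampered with, or stale.
function check_auth_cookie(string $cookie, ?int $now = null): ?string
{
    $parts = explode('&', $cookie);
    if (count($parts) !== 3) {
        return null;
    }
    [$expires, $encoded, $mac] = $parts;
    $expected = hash_hmac('sha256', $expires . '&' . $encoded, AUTH_SECRET);
    if (!hash_equals($expected, $mac)) {   // constant-time comparison
        return null;
    }
    if ((int) $expires < ($now ?? time())) {
        return null;
    }
    return rawurldecode($encoded);
}
```

A page would send the value with setcookie() after login and call check_auth_cookie($_COOKIE['auth'] ?? '') on each request. Because the server only checks a MAC, there is no server-side session that can expire or be swapped out from under the user.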
----
The context of this is that the average web application is poor in
the areas of usability and security: recent studies show that 80% of
web applications have serious security problems
http://www.whitehatsec.com/home/resources/presentations/files/wh_security_stats_webinar.pdf
Jakob Nielsen's website has been chronicling the sorry state of web
application usability:
http://www.useit.com/
Perhaps the top 20% of programmers can write applications with
$_SESSION that don't have serious security and usability problems, but
what about the other 80%?
----
(1) Session variables are treacherous. Odd things can happen in
boundary cases, such as when sessions expire, or when you are targeted
by session fixation attacks.
http://shiflett.org/articles/security-corner-feb2004
I've looked at many apps that use sessions that seem to be
working... Until you walk away for two hours, come back, and discover
that you're logged in as somebody else. I suppose I could have spent
hours or days tracking down an intermittent problem, which involved
some confluence of browser oddness (IE was fine, Firefox was screwy),
the behavior of the session system, and crooked logic in the
application. Or I could use cryptographically signed cookies to
implement an authentication system which won't give me surprises in the
future.
Anybody can write applications that work 95% of the time with
$_SESSION. Getting the other 5% right requires a deep understanding of
state and statelessness on the web... Which is what (many) people are
trying to avoid when they use $_SESSION variables.
There are more than twenty configuration variables that affect the
way sessions work under PHP. Incorrect configuration of any of these
can cause applications to fail, often in intermittent ways. The use of
a custom session handler can have unpredictable effects on security,
reliability and performance.
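One low-tech way to see what you're up against on a given server is to dump those settings and compare them between machines. This sketch uses PHP's built-in ini_get_all(); the helper name session_settings() is made up, and the exact list (and count) of session.* settings varies between PHP versions.

```php
<?php
// Return every session.* setting in effect for this PHP build, sorted by
// name, so that two servers' configurations can be compared side by side.
// Returns an empty array if the session extension is not loaded.
function session_settings(): array
{
    if (!extension_loaded('session')) {
        return [];
    }
    $settings = ini_get_all('session', false);   // name => current value
    ksort($settings);
    return $settings;
}

foreach (session_settings() as $name => $value) {
    echo $name, ' = ', var_export($value, true), PHP_EOL;
}
```

Diffing this output between a development box and production is a cheap first step when a session bug only shows up in one place.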
Other languages are a lot worse than PHP -- the use of the "scope"
concept in languages such as Cold Fusion and Tango makes it easy to use
a session variable without realizing it... Resulting in an application
that "works" sometimes, but fails in mysterious ways.
(2) Session variables are bound to a particular language. In the real
world, I work with legacy systems that might be written in other
languages. I might have some old pages in Cold Fusion that work just
fine, and I won't rework them in PHP until I've got a good reason. If
users can set a customization parameter, such as the background of a
page, it's easy to write a cookie that all languages can read.
Applications stuck in the session variable roach motel aren't as
maintainable and portable.
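As a sketch of the cookie approach for a preference like that: the value is an ordinary cookie with path '/', so any page on the host can read it regardless of language. The cookie name, the list of colors, and the helper names are all hypothetical.

```php
<?php
// Hypothetical list of backgrounds the site supports; the first is the default.
const BACKGROUNDS = ['white', 'cream', 'gray'];

// Store the preference in a plain cookie with path '/', so pages written in
// any language on the same host receive it with every request.
function save_background(string $color): bool
{
    if (!in_array($color, BACKGROUNDS, true)) {
        return false;
    }
    return setcookie('background', $color, time() + 365 * 24 * 3600, '/');
}

// Cookies come from the browser, so treat the value as untrusted input and
// fall back to the default for anything unexpected.
function read_background(array $cookies): string
{
    $color = $cookies['background'] ?? '';
    return in_array($color, BACKGROUNDS, true) ? $color : BACKGROUNDS[0];
}
```

A PHP page would call read_background($_COOKIE); a page in another language reads the same "background" cookie through its own cookie API and applies the same whitelist.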
(3) PHPSESSID. Do I need to say more? Consider the client who wants
user tracking but can't accept cookies, so all the pages on their
site look like
http://www.example.com/about_us.php?PHPSESSID=**pseudo-random blob**
Three months later they come back and wonder why their site isn't
being indexed in Google. Yes, there's a saner way to use this
feature, but this "cure" for privacy violation is worse than the cookie
"disease", since session ids will leak out through referrers,
bookmarks, and links that people cut and paste...
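If you do use PHP sessions, keeping the id out of URLs comes down to a few standard session directives. This is a minimal sketch, assuming only the stock session.* ini settings; the helper name is made up. Recent PHP versions already default to cookie-only behavior, so this mainly guards against a php.ini that has been changed.

```php
<?php
// Keep session ids out of URLs: accept the id only from a cookie and never
// rewrite links to carry PHPSESSID. These must be set before session_start();
// the same three settings can go in php.ini instead.
function configure_cookie_only_sessions(): void
{
    ini_set('session.use_cookies', '1');       // send the id in a cookie
    ini_set('session.use_only_cookies', '1');  // ignore ids passed in URLs
    ini_set('session.use_trans_sid', '0');     // don't append ids to links
}

configure_cookie_only_sessions();
// session_start() would follow here in a real page.
```

The tradeoff is exactly the one described above: visitors who refuse cookies simply don't get a session, instead of getting one that leaks through every link they share.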
(4) The back button. When somebody asks a question about sessions on a
forum, they'll usually ask another question a few days or weeks later:
"How do I disable the back button?"
The underlying problem is a deep aspect of the structure of the
web. There is certain state information that's particular to a request
(GET and POST variables) and certain state information that has a more
persistent scope (cookies, session information, a relational
database.) The back button makes it possible for these two things to
get out of sync.
Ultimately, we need a systematic strategy to deal with this. One
pattern is to put the complete state of the application in form
variables. Applications that use this pattern always work perfectly
with the back button. This pattern doesn't always work (hitting the
back button shouldn't cancel your order on an e-commerce site), but it
works often. For instance, you can use hidden variables to hold onto
form variables for complicated forms that spread over several pages.
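A small helper for that multi-page case might look like the following. The function name is hypothetical, and the only PHP calls it relies on are htmlspecialchars() and sprintf(). Nested arrays (e.g. checkbox groups named colors[]) are flattened into names like colors[0], which PHP parses back into arrays on the next request.

```php
<?php
// Render every value in $state as an escaped hidden input, so answers from
// earlier pages of a form ride along with the next submission.
function hidden_fields(array $state, string $prefix = ''): string
{
    $html = '';
    foreach ($state as $key => $value) {
        // Top-level keys are used as-is; nested keys become prefix[key].
        $name = $prefix === '' ? (string) $key : $prefix . '[' . $key . ']';
        if (is_array($value)) {
            $html .= hidden_fields($value, $name);
        } else {
            $html .= sprintf(
                "<input type=\"hidden\" name=\"%s\" value=\"%s\">\n",
                htmlspecialchars($name, ENT_QUOTES),
                htmlspecialchars((string) $value, ENT_QUOTES)
            );
        }
    }
    return $html;
}
```

On page two, echoing hidden_fields($_POST) inside the form carries page one's answers forward. Because the state lives in the page itself, the back button and multiple windows behave naturally. Every value makes a round trip through the browser, though, so the final handler still has to validate all of it.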
(5) Multiple windows. I think it's a human right to be able to have
more than one window open on a web site. If I'm shopping, for
instance, I'd like to be able to look at two products simultaneously.
An application that keeps state in form variables doesn't care how many
you have open. If you're looking for jobs at an organization that uses
taleo.net's software, you'll find that it uses trickery to prevent you
from having more than one window open... So you can't look at two jobs
at once, or look at the job description while you're filling out the
application. I suspect that they did this because they don't want to
spend forever debugging "race conditions" that could be caused by a user
acting in two windows simultaneously.
Session variables introduce problems of locking. PHP gets an
exclusive lock on the session for each page displayed. This hurts the
performance of pages that use dynamically generated images and
Javascript, and can mysteriously deadlock AJAX applications.
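A common mitigation, when a request only needs to read the session, is to copy out what you need and close the session right away. PHP's default file handler holds an exclusive lock from session_start() until the session is closed, and session_write_close() releases it. The function below and its 'user' key are hypothetical; since PHP 7.0, session_start(['read_and_close' => true]) does the read-then-close in one call.

```php
<?php
// Sketch of a request (say, a generated image or an AJAX call) that only
// reads the session. Closing the session early releases the lock, so other
// requests from the same browser don't queue up behind the slow part.
function render_chart_for_current_user(): string
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();
    }
    $user = $_SESSION['user'] ?? 'anonymous';  // copy out what we need
    session_write_close();                     // lock released here

    // Hypothetical slow work; it no longer blocks parallel requests.
    return "chart for {$user}";
}
```

This doesn't remove the underlying problem (two windows writing the same session still race), but it keeps read-only requests from serializing behind each other.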
(6) Scalability, Reliability, and all that. This is a tricky one,
because it depends on particulars. Sessions can be lightning-fast in
systems that keep them in RAM, such as Java and Cold Fusion. The
default session handler in PHP uses files, and is probably faster than
a relational database in a direct comparison; however, the session
handler will load all of the data into RAM, whereas a relational
implementation may only need to load information when it's needed.
Keeping information in POST variables or cookies also involves a
tradeoff: this is as scalable as it gets as far as server resources go,
but requires that the state be passed back and forth between the browser
and server. This is no big deal if the state is 500 bytes. It's
unacceptable if the state is 500 megabytes. In most cases, it starts
looking expensive when we're passing an extra 10k-100k around.
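To put those sizes in perspective, here is some back-of-the-envelope arithmetic. The 56 kbit/s modem is an assumption (chosen because dialup users come up later in this message), and latency and compression are ignored.

```php
<?php
// Rough time to carry $bytes of state to the browser and back again over a
// link of $bitsPerSecond, ignoring latency, headers, and compression.
function round_trip_seconds(int $bytes, int $bitsPerSecond): float
{
    return 2 * $bytes * 8 / $bitsPerSecond;
}

// Assumed link: a 56 kbit/s modem (real-world throughput is usually lower).
foreach ([500, 10000, 100000] as $bytes) {
    printf("%6d bytes: %.2f s\n", $bytes, round_trip_seconds($bytes, 56000));
}
```

At that speed, 500 bytes costs about 0.14 s per round trip, 10k about 2.86 s, and 100k about 28.57 s, which matches the intuition that things start looking expensive somewhere in the 10k-100k range.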
I've recently been working on a legacy app that contains query (select
a subset of items) and reporting (display user-selected fields of those
items) functions. The interface between those modules is simple: the
query system passes a comma-separated list of item identifiers to the
reporting system. I like this, because it means that one system can
be changed without affecting the other. I had to update the app so it
would work with a changed database schema, so both sides needed some work.
I discovered that the app was passing the item list as a session
variable. This worked, unless I was using the application in two
windows at a time. In that case, a query in one window would change
the report delivered in another window. I thought about it, and
realized that in this case, result sets would always be under about
10k, and usually be around 1k. Therefore, it made sense to pass this
as a hidden variable in the form and ditch the session variable.
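One thing that changes when the item list moves into a hidden field is that it now comes back from the browser, so it has to be validated before it goes anywhere near a query. A minimal sketch, assuming the ids are positive integers; the function name and the field name "items" are hypothetical.

```php
<?php
// Turn the comma-separated item list posted back from a hidden form field
// into an array of positive integer ids. Anything that is not a plain
// positive integer makes the whole list invalid (null), so tampered input
// never reaches an SQL query.
function parse_item_list(string $raw): ?array
{
    if (trim($raw) === '') {
        return [];
    }
    $ids = [];
    foreach (explode(',', $raw) as $part) {
        $part = trim($part);
        // At most 18 digits, so the value fits in a 64-bit integer.
        if (!preg_match('/^[1-9][0-9]{0,17}$/', $part)) {
            return null;
        }
        $ids[] = (int) $part;
    }
    return $ids;
}
```

The query page writes implode(',', $ids) into the hidden field (escaped with htmlspecialchars()), and the report page calls parse_item_list() on the posted value. Each window then carries its own list, so a query in one window can't change the report in another.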
This shows the kind of problems that regularly turn up in the
applications that developers "throw over the wall" to testers and
clients. Choose a session variable, and your application behaves
mysteriously for a user who didn't respect the "one window at a time"
assumption you made. Passing hidden variables in forms, on the other
hand, might work OK when you're testing with a small data set over a
LAN, but could rapidly become a performance nightmare for dialup users
using a production database.
Performance can be improved in a number of ways: for instance, by
delta-sigma compressing the item list, or creating a "form scope"
variable that's keyed against a unique identifier in the form. Either
way, quality web applications take quality thought.
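The message doesn't spell out the compression scheme. One simple reading, sketched here as an assumption rather than as the author's method, is delta encoding: sort the ids and send the gaps between them, which are short numbers when the ids cluster together. The function names are hypothetical.

```php
<?php
// Delta-encode a list of item ids: sort them, then keep the first id and the
// difference between each id and the one before it.
function delta_encode(array $ids): string
{
    sort($ids, SORT_NUMERIC);
    $out = [];
    $prev = 0;
    foreach ($ids as $id) {
        $out[] = $id - $prev;
        $prev = $id;
    }
    return implode(',', $out);
}

// Reverse the encoding by taking a running sum of the gaps.
function delta_decode(string $encoded): array
{
    if ($encoded === '') {
        return [];
    }
    $ids = [];
    $sum = 0;
    foreach (explode(',', $encoded) as $gap) {
        $sum += (int) $gap;
        $ids[] = $sum;
    }
    return $ids;
}
```

For example, the ids 100515, 100512, 100530, 100513 encode to "100512,1,2,15". Sorting discards the original order, which is fine when the report applies its own sort. The string could be squeezed further with gzdeflate() and base64_encode() if it is still too large.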
(7) Lack of engineered application state: Engineered Application State
is the gem of database-backed web applications.
If you keep the state of your application in a relational database, you
need to ~design~ the state of your application. You need to ~think~
every time you add or change a table in your relational database. With
$_SESSION, by contrast, you can add a new variable to your application
as easily as typing '$'.
Desktop apps keep the application state in a tangle of pointers. C and
C++ applications tend to contain 5 or more defects per thousand lines of
code. Errors show up in data structures over time, just as mutations
occur in your cells. Memory leaks, application hangs, and crashes are
cancers caused by these mutations.
PHP apps die at the end of each request, and are reborn for the next
request. They don't accumulate errors over time. Web application
environments such as Java and Cold Fusion that involve a long-running
process regularly hang or crash and require restarts. When was the last
time you had to restart PHP?
A database protects you from errors in multiple ways. Transactions,
for instance, protect against data corruption caused by crashing
scripts. It's easy to write
$_SESSION["logged_in"]=true;
in one place and
$_SESSION["logged-in"]=false;
in another, introducing unpredictable behavior and security holes. A
relational database will give you an error if you try something like that.
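To make the contrast concrete, here is a sketch using PDO with an in-memory SQLite database. The pdo_sqlite driver, the table, and its columns are assumptions for illustration. A misspelled column name is rejected with an error, whereas the same typo in $_SESSION quietly creates a new key.

```php
<?php
// Assumes the pdo_sqlite driver is available; other PDO databases behave
// similarly. Table and column names are hypothetical.
function open_state_db(): PDO
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE login_state (
                   session_id TEXT PRIMARY KEY,
                   logged_in  INTEGER NOT NULL CHECK (logged_in IN (0, 1))
               )');
    return $db;
}

$db = open_state_db();
$db->prepare('INSERT INTO login_state (session_id, logged_in) VALUES (?, ?)')
   ->execute(['abc123', 1]);

try {
    // The misspelled column "loggedin" is rejected by the database instead
    // of silently becoming a second, unrelated piece of state.
    $db->prepare('UPDATE login_state SET loggedin = 0 WHERE session_id = ?')
       ->execute(['abc123']);
    $typoCaught = false;
} catch (PDOException $e) {
    $typoCaught = true;
}
```

The CHECK clause plays a similar role for values: storing 2 in logged_in fails instead of leaving the application in a state nobody designed. PDO has thrown exceptions by default since PHP 8.0; the explicit setAttribute() call keeps the sketch correct on older versions too.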
-------------
Can users of $_SESSION avoid the seven deadly sins?
Yes.
In practice they don't.