[nycphp-talk] Java provides???
Paul A Houle
paul at devonianfarm.com
Wed Aug 12 10:35:29 EDT 2009
Matt Williams wrote:
> On Aug 12, 2009, at 5:57, Leam Hall <leam at reuel.net> wrote:
>
>> What does Java provide that PHP can't do faster and with lighter
>> resource usage?
>
> Concurrency and threading to name a couple...
I've got a system that's gotten complicated enough that it's
"outgrowing" PHP. One big advantage in PHP is that you can get more
productivity out of rookie programmers. It takes a good programmer
about a year and a half to be able to produce usable Java, and some
programmers never
get good at it. The ideas in this system are complicated enough that I
think I'd have a hard time hiring another programmer who could handle
it, so the simplicity advantage of PHP is gone. I'm starting to want
static types so that the compiler is watching my back and so that my IDE
can do automated refactoring.
I'm thinking of gradually moving to the JVM but using Scala instead
of Java. After 2 years of working in C#, Java really seems like C#--.
I mean, even PHP has closures today. Type inference, generics and
other features in C# make Java seem like it's going backwards. On the
other hand, if I'm doing my own sysadmin or paying somebody to sysadmin
my systems, I don't want to be stuck in Windows. I know a lot of
people think the type system of Scala is over-complicated, but after 2
years of lover's quarrels with the C# type system, Scala provided the
general theory that informs my practice in C#.
I'm interested in logic programming and other inference systems, as
well as specialized databases: there's a lot of that written in Java.
Java's never quite going to have the efficiency of C, but it's better
for systems work than PHP. If I feel the need for scripting there's
always Groovy, Jython, etc.
My big beef with the JVM (and the CLR) is the UTF-16 scandal;
perhaps I'm a cultural imperialist, but I process lots of text
(billions and billions of characters) that is mainly:
(i) us-ascii,
(ii) iso-latin-1, and
(iii) Unicode that is mainly us-ascii with an occasional smattering of
iso-latin-1 and other Unicode characters.
For me, UTF-8 encodes text at about (1+epsilon) bytes per character;
the JVM and CLR encode text at (2+epsilon) bytes per character. A few
years ago, when I was stuck on 32-bit machines, that was often the
difference between a program that could run in RAM and a program that
couldn't. Since text processing is limited by memory bandwidth, the
doubled footprint often means large text-processing programs run about
half as fast on the JVM as they do in UTF-8 based environments.
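
To put rough numbers on the (1+epsilon) versus (2+epsilon) comparison,
here is a minimal Java sketch. The class name, helper names and sample
sentence are made up for illustration, and it only measures encoded
size, not how any particular JVM lays out strings internally:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingSize {
    // Bytes needed to encode the text as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to encode the text as UTF-16 (little-endian, no BOM).
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        // Mostly us-ascii with two iso-latin-1 characters mixed in.
        String sample = "The caf\u00e9 serves na\u00efve customers.";
        int chars = sample.codePointCount(0, sample.length());
        System.out.printf("characters: %d%n", chars);
        System.out.printf("UTF-8:  %d bytes (%.2f per character)%n",
                utf8Bytes(sample), (double) utf8Bytes(sample) / chars);
        System.out.printf("UTF-16: %d bytes (%.2f per character)%n",
                utf16Bytes(sample), (double) utf16Bytes(sample) / chars);
    }
}
```

For text like mine the UTF-8 figure stays just above one byte per
character, while the UTF-16 figure is two or slightly more.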
What makes it a scandal is that UTF-16 pretends to be a fixed-width
encoding when it really isn't: characters outside the Basic Multilingual
Plane take two 16-bit code units (a surrogate pair). Code that works
correctly with, say, English or Japanese will break when you're
processing rarer Chinese characters or mathematical symbols. Code
written with the fast random access that Java provides doesn't
generalize to all languages, so you need to fall back to the same
sequential access methods that you use handling UTF-8 in PHP.
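
A small Java sketch of the failure mode, assuming a string that
contains one mathematical character (U+1D465, MATHEMATICAL ITALIC
SMALL X), which UTF-16 stores as two 16-bit code units. The class and
method names are made up for illustration:

```java
// Hypothetical demo class, for illustration only.
class SurrogateDemo {
    // Naive reversal using random access by char index.
    // This splits any two-unit character into mis-ordered halves.
    static String reverseByChar(String s) {
        char[] out = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = s.charAt(s.length() - 1 - i);
        }
        return new String(out);
    }

    // Sequential reversal that steps one whole code point at a time.
    static String reverseByCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            out.appendCodePoint(cp);
            i -= Character.charCount(cp); // 1 or 2 code units
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1D465)) + "b";
        System.out.println("length():         " + s.length());
        System.out.println("codePointCount(): " + s.codePointCount(0, s.length()));
        String expected = "b" + new String(Character.toChars(0x1D465)) + "a";
        System.out.println("naive reversal correct?      "
                + reverseByChar(s).equals(expected));
        System.out.println("sequential reversal correct? "
                + reverseByCodePoint(s).equals(expected));
    }
}
```

The same naive code passes every test you throw at it with English or
everyday Japanese text, which is exactly why the bug goes unnoticed.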
A big advantage of PHP for Unicode handling is that it "does no harm":
its strings are plain byte sequences, so whatever comes in goes back
out unchanged. I've often seen Java and CLR systems fail seriously
because of limitations in how they handle Unicode characters,
particularly when dealing with junky input data.
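
One way junky input gets damaged on the JVM (my guess at a typical
case, not the only one) is decoding bytes that aren't valid in the
encoding you assumed. In this sketch the class name and sample bytes
are made up; the input is iso-latin-1 data mislabeled as UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class RoundTrip {
    // "café au" encoded as iso-latin-1; the lone 0xE9 is not valid UTF-8.
    static final byte[] JUNK =
            { 0x63, 0x61, 0x66, (byte) 0xE9, 0x20, 0x61, 0x75 };

    // Decode bytes into a String, then encode back with the same charset.
    static byte[] roundTrip(byte[] input, Charset cs) {
        return new String(input, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // Decoding as UTF-8 silently substitutes U+FFFD for the bad byte,
        // so the original byte cannot be recovered afterwards.
        byte[] viaUtf8 = roundTrip(JUNK, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip preserved bytes?   "
                + Arrays.equals(viaUtf8, JUNK));

        // iso-latin-1 maps every byte to one char and back, so it is lossless.
        byte[] viaLatin1 = roundTrip(JUNK, StandardCharsets.ISO_8859_1);
        System.out.println("Latin-1 round trip preserved bytes? "
                + Arrays.equals(viaLatin1, JUNK));
    }
}
```

A PHP script that just reads and writes the same bytes never makes
that substitution, because it never decodes them in the first place.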