[nycphp-talk] Trapping Errors with simplexml for Not Well-Formed XML
Emmanuel M. Décarie
emm at scriptdigital.com
Wed Feb 7 10:21:15 EST 2007
Hello there,
I posted the following on my blog and wanted to check with the crowd
whether I missed anything obvious.
<http://lettre13.com/2007/02/07/trapping-errors-with-simplexml-for-not-well-formed-xml/>
Cheers
-Emmanuel
I discovered the hard way that PHP5 offers no obvious way to detect
whether a piece of XML is well-formed, especially if you want to deploy
on both Unix and Windows and don’t want to call the shell directly.
Adding to this problem, I also discovered that the DOM and simplexml
extensions don’t report XML that is not well-formed through PHP5
exceptions, so a try/catch block can’t trap those errors. When simplexml
or DOM parses XML that is not well-formed, the errors are raised as PHP
warnings, which are not trapped and are displayed immediately (when
display_errors is on).
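A minimal sketch of the behaviour described above, assuming PHP 5.1 or later (the variable names are illustrative): simplexml_load_string() signals a parse failure with an E_WARNING and a false return value rather than an exception, so the catch block is never reached. The @ operator is used only to keep the warning off the screen for this demo.

```php
<?php
// A string that is not well-formed: the "<" of the opening tag is missing.
$bad = "tag>hello world</tag>";

$caught = false;
try {
    // On failure simplexml_load_string() emits E_WARNING and returns false;
    // it does not throw. The @ only hides the warning for this demo.
    $xml = @simplexml_load_string($bad);
} catch (Exception $e) {
    $caught = true; // never reached for this input
}

var_dump($caught);        // bool(false)
var_dump($xml === false); // bool(true)
```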
It’s possible to load XML that is not well-formed with the DOM or Tidy
extensions and repair it on the fly. But what if you need to detect XML
that is not well-formed and report what the error is?
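For the repair route, DOM exposes a recover property that asks libxml to build whatever tree it can from XML that isn’t well-formed. The following is a sketch assuming PHP 5.1+ with the DOM extension; the input string is invented for illustration, and how much libxml salvages depends on the kind of damage, so the recovered tree should be checked rather than trusted.

```php
<?php
// Input that is not well-formed: the closing </tag> is missing.
$broken = "<tag>hello world";

// Collect libxml diagnostics instead of printing them as warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->recover = true;            // libxml recovery mode: salvage what it can
$loaded = $doc->loadXML($broken);

// The parse errors are still recorded even though a tree was produced.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $doc->saveXML();
echo count($errors), " error(s) reported\n";
```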
Fortunately, after some research, I found that you can use the libxml
functions (PHP 5.1 and later) to test whether XML is well-formed and to
trap XML errors. So I whipped up a little function called
get_xml_object (see (1) for the inspiration) that lets me trap errors
when simplexml is used to parse XML. The function is quite simple: by
default, you pass it the path to an XML file. If you want to parse a
string instead, pass a second argument (any value other than “file”
works; I chose “string” for clarity’s sake). You can also replace the
simplexml extension with the DOM extension if you prefer DOM for
parsing XML.
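Here is one way the DOM swap might look: a hypothetical get_dom_object() with the same calling convention, returning a DOMDocument instead of a SimpleXMLElement. It is a sketch assuming PHP 5.1+ with the DOM extension, not a tested drop-in replacement.

```php
<?php
// Hypothetical DOM counterpart of get_xml_object(): same arguments, but
// $result["xml"] holds a DOMDocument instead of a SimpleXMLElement.
function get_dom_object($xml, $xmlFormat = "file") {
    $result = array("errors" => null, "xml" => null);

    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();               // start from an empty error buffer

    $doc = new DOMDocument();
    $ok = ($xmlFormat == "file") ? $doc->load($xml) : $doc->loadXML($xml);

    if ($ok) {
        $result["xml"] = $doc;
    } else {
        $msg = "";
        foreach (libxml_get_errors() as $error) {
            $msg .= "Error: line: " . $error->line . ": column: "
                  . $error->column . ": " . trim($error->message) . "\n";
        }
        $result["errors"] = $msg;
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);   // restore the caller's setting
    return $result;
}

$result = get_dom_object("tag>hello world</tag>", "string");
echo $result["errors"] !== null ? $result["errors"] : $result["xml"]->saveXML();
```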
The function get_xml_object returns an array with two keys, errors and
xml. For example, after $result = get_xml_object($s, "string"),
$result is such an array. If parsing succeeds, $result['errors'] is
null and $result['xml'] contains a SimpleXMLElement object that you can
then manipulate with the simplexml extension. If parsing fails,
$result['xml'] stays null and $result['errors'] holds a string
describing the libxml error(s), with line and column numbers.
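If a caller needs the individual errors rather than one formatted string, the LibXMLError objects can be kept as structured data. A minimal sketch, assuming PHP 5.1+; get_xml_errors() and its array keys are hypothetical names chosen for this example.

```php
<?php
// Hypothetical helper: parse a string with simplexml and return each libxml
// error as its own array entry instead of one concatenated message.
function get_xml_errors($string) {
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $errors = array();
    if (simplexml_load_string($string) === false) {
        foreach (libxml_get_errors() as $error) {
            $errors[] = array(
                "line"    => $error->line,
                "column"  => $error->column,
                "fatal"   => $error->level === LIBXML_ERR_FATAL,
                "message" => trim($error->message),
            );
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $errors;   // an empty array means the string parsed
}

foreach (get_xml_errors("tag>hello world</tag>") as $e) {
    printf("line %d, column %d: %s\n", $e["line"], $e["column"], $e["message"]);
}
```

Because an empty array means the input parsed, the same helper doubles as a simple well-formedness test.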
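<?php
// Deliberately not well-formed: the "<" of the opening tag is missing.
$s = "tag>hello world</tag>";
// $s = "<tag>hello world</tag>";

function get_xml_object ($xml, $xmlFormat = "file") {
    $result = array ("errors" => null, "xml" => null);

    // Collect libxml errors internally instead of emitting PHP warnings,
    // remembering the previous setting so it can be restored below.
    $previous = libxml_use_internal_errors (true);
    libxml_clear_errors ();

    $xml_object = ($xmlFormat == "file")
        ? simplexml_load_file ($xml)
        : simplexml_load_string ($xml);

    // simplexml_load_* return false on failure. Compare strictly: per the
    // PHP manual, a SimpleXML object for an empty element can cast to false.
    if ($xml_object === false) {
        $error_msg = "";
        foreach (libxml_get_errors () as $error) {
            // Append (not overwrite) so every error is reported.
            $error_msg .= "Error: line: " . $error->line
                . ": column: " . $error->column . ": "
                . trim ($error->message) . "\n";
        }
        $result["errors"] = $error_msg;
    } else {
        $result["xml"] = $xml_object;
    }

    libxml_clear_errors ();
    libxml_use_internal_errors ($previous);
    return $result;
}

$result = get_xml_object ($s, "string");
if ($result['errors']) {
    var_dump ($result['errors']);
} else {
    var_dump ($result['xml']);
}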
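The call to libxml_clear_errors() matters more than it looks. A minimal sketch, assuming PHP 5.1+ (the variable names are illustrative), showing that with internal errors enabled the libxml error buffer keeps growing across parses until it is explicitly cleared, so a later call could otherwise report errors from an earlier document.

```php
<?php
// With internal errors enabled, libxml appends to one shared error buffer.
libxml_use_internal_errors(true);

simplexml_load_string("tag>one</tag>");
$afterFirst = count(libxml_get_errors());

simplexml_load_string("tag>two</tag>");
$afterSecond = count(libxml_get_errors());  // still includes the first parse's errors

libxml_clear_errors();
$afterClear = count(libxml_get_errors());   // 0

printf("%d %d %d\n", $afterFirst, $afterSecond, $afterClear);
```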
(1) <http://ca3.php.net/manual/en/function.libxml-use-internal-errors.php>