Friday, November 27, 2009

SimpleXML does not parse text inside CDATA tags in an XML.


Consider the XML below:


// "root" is a placeholder root element name
$str  = '<root>';
$str .= '<childNode>some text goes here</childNode>';
$str .= '</root>';

To parse it, we call simplexml_load_string():

$xml = simplexml_load_string($str);

Printing the result with print_r($xml) outputs:


SimpleXMLElement Object
(
    [childNode] => some text goes here
)
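print_r() is only one way to look at the result. A minimal, self-contained sketch of this first example reads the child's text directly by casting the element to a string. It assumes a root element named root (a placeholder name); only childNode and its text come from the example above.

```php
<?php
// "root" is a placeholder root element name; childNode and its text come from the example.
$str  = '<root>';
$str .= '<childNode>some text goes here</childNode>';
$str .= '</root>';

$xml = simplexml_load_string($str);

// Casting a SimpleXMLElement to string returns its text content.
$text = (string) $xml->childNode;
echo $text, "\n"; // some text goes here
```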

That's fine. Now take the same XML, but this time with the text enclosed in a CDATA section.


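// "root" is a placeholder root element name
$str  = '<root>';
$str .= '<childNode><![CDATA[some text goes here]]></childNode>';
$str .= '</root>';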

Printing this gives the following output:


SimpleXMLElement Object
(
    [childNode] => SimpleXMLElement Object
        (
        )

)

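Yes, it looks empty. SimpleXML keeps the CDATA section as a separate CDATA node rather than an ordinary text node, and print_r() (like var_dump()) does not display CDATA content, so childNode appears to have no text at all. The text has not actually been thrown away: casting the element to a string, as in (string) $xml->childNode, still returns it. But if you want CDATA content to behave like normal text everywhere, including in print_r() output, there is a parser option for that.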
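A quick way to check what SimpleXML actually kept is to print the parsed document and also cast the element to a string. This minimal sketch assumes a root element named root (a placeholder name); only childNode and the text come from the example above.

```php
<?php
// "root" is a placeholder root element name.
$str = '<root><childNode><![CDATA[some text goes here]]></childNode></root>';

// No options: the CDATA section stays a CDATA node.
$xml = simplexml_load_string($str);

// print_r() shows childNode as an empty SimpleXMLElement object...
print_r($xml);

// ...but casting the element to string still returns the CDATA text.
$text = (string) $xml->childNode;
echo $text, "\n";
```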
Solution: pass LIBXML_NOCDATA as the third parameter when parsing.

simplexml_load_string() (and likewise simplexml_load_file(), which takes a filename instead of a string) accepts more than one parameter. The first three are:

* The string to parse.
* Optional: the name of the class whose object should be returned. The class must extend SimpleXMLElement; by default a SimpleXMLElement object is returned.
* Also optional: libxml options, given as a bitmask of LIBXML_* constants. This parameter provides the solution to our CDATA problem.
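The second and third parameters can be sketched together. MyElement and its trimmedText() method are hypothetical, made up only to show that the returned object (and its child elements) use the class named in the second parameter. Because the options argument is a bitmask, several LIBXML_* constants can be combined with |; LIBXML_NOBLANKS is included here purely to illustrate that.

```php
<?php
// MyElement is a hypothetical subclass, used only to illustrate the second parameter.
class MyElement extends SimpleXMLElement
{
    // Returns the element's text with surrounding whitespace trimmed.
    public function trimmedText()
    {
        return trim((string) $this);
    }
}

// "root" is a placeholder root element name.
$str = '<root><childNode><![CDATA[  some text goes here  ]]></childNode></root>';

// Options are a bitmask, so LIBXML_* constants can be OR-ed together.
$xml = simplexml_load_string($str, 'MyElement', LIBXML_NOCDATA | LIBXML_NOBLANKS);

echo get_class($xml), "\n";                // MyElement
echo $xml->childNode->trimmedText(), "\n"; // some text goes here
```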

Pass LIBXML_NOCDATA as the third parameter and libxml merges CDATA sections into ordinary text nodes, so SimpleXML treats their content like any other text.


$xml = simplexml_load_string($str,'SimpleXMLElement', LIBXML_NOCDATA);

Printing the result now gives the desired output:


SimpleXMLElement Object
(
    [childNode] => some text goes here
)
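Putting it together, here is a minimal runnable comparison that parses the CDATA example with and without the flag. As before, it assumes a placeholder root element named root.

```php
<?php
// "root" is a placeholder root element name.
$str  = '<root>';
$str .= '<childNode><![CDATA[some text goes here]]></childNode>';
$str .= '</root>';

// Without options the CDATA section is kept as a separate CDATA node.
$plain = simplexml_load_string($str);

// With LIBXML_NOCDATA libxml merges the CDATA section into a text node.
$merged = simplexml_load_string($str, 'SimpleXMLElement', LIBXML_NOCDATA);

print_r($plain);  // childNode shows up as an empty object
print_r($merged); // childNode shows up with its text
```

Either way, (string) $plain->childNode and (string) $merged->childNode both return the text; the flag changes how the content is represented, which is what print_r() and similar dumps reflect.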

Note that the third parameter requires PHP 5.1 or later, compiled with libxml.
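As mentioned above, simplexml_load_file() accepts the same class name and options parameters, so the flag works there too. This is a minimal sketch that first writes the sample document to a temporary file; the file and the root element name root are only for illustration. simplexml_load_file() returns false when the file cannot be loaded or parsed, hence the check.

```php
<?php
// Write the sample document to a temporary file (illustration only).
// "root" is a placeholder root element name.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, '<root><childNode><![CDATA[some text goes here]]></childNode></root>');

// Same second and third parameters as simplexml_load_string().
$xml = simplexml_load_file($path, 'SimpleXMLElement', LIBXML_NOCDATA);
unlink($path);

if ($xml === false) {
    exit("Could not parse $path\n");
}
print_r($xml);
```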
