Parsing XML in PHP. What are XML parsers used for and how can they be useful?

Publication of this article is permitted only with a link to the author's website.

In this article I will show an example of how to parse a large XML file. If your server (hosting) does not prohibit increasing the script's execution time, you can parse XML files of even several gigabytes; personally, I have only parsed files from Ozon of up to 450 megabytes.

When parsing large XML files, two problems arise:
1. Not enough memory.
2. Not enough execution time allotted to the script.

The time problem can be solved if the server does not prohibit it.
The memory problem is much harder: even on your own server, juggling 500-megabyte files is not easy, and on shared hosting or a VDS you simply cannot increase the memory.

PHP has several built-in options for processing XML: SimpleXML, DOM and SAX.
All of them are described in detail, with examples, in many articles, but all those examples work with the whole XML document at once.

Here is one example: getting an object from an XML file.
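A typical whole-document approach uses SimpleXML's simplexml_load_file(). The sketch below uses simplexml_load_string() on a short inline recipe instead (the inline sample is an assumption, so that it runs without 1.xml); either way the entire document is parsed into one object in memory.

```php
<?php
// A shortened recipe document held in a string (assumed sample data).
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<recipe>
  <title>Simple bread</title>
  <ingredient amount="3" unit="glass">Flour</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
</recipe>';

// simplexml_load_string() parses the WHOLE document into memory at once;
// simplexml_load_file("1.xml") would do the same for a file.
$recipe = simplexml_load_string($xml);
if ($recipe === false) {
    die("Invalid XML");
}

echo $recipe->title, PHP_EOL; // Simple bread
foreach ($recipe->ingredient as $ingredient) {
    // Attributes are read with array syntax, the text by string conversion.
    echo $ingredient["amount"], " ", $ingredient["unit"], " ", $ingredient, PHP_EOL;
}
```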

Now you can process this object, BUT...
As you can see, the entire XML file is read into memory and then parsed into an object.
That is, all the data ends up in memory, and if there is not enough memory allocated, the script stops.

This option is not suitable for processing large files; instead, you need to read the file piece by piece and process the data as it arrives.
In this case, the validity check also happens while the data is being processed, so you need to be able to roll back, for example, delete all data already written to the database if the XML file turns out to be invalid, or make two passes through the file: first read it to check validity, then read it again to process the data.
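A minimal sketch of the first of those two passes, assuming the file only needs to be well-formed (PHP's XML Parser does not validate against a DTD or schema). The file is fed to a parser with no handlers, so nothing is stored, and the first error is reported. The name check_xml_file() and the 1 MB block size are illustrative choices, not part of the original script.

```php
<?php
/**
 * Hypothetical helper for the first pass: returns null if the file is
 * well-formed, otherwise a text description of the first error.
 */
function check_xml_file(string $path, int $blockSize = 1048576): ?string
{
    $fp = fopen($path, "r");
    if ($fp === false) {
        return "could not open $path";
    }
    $parser = xml_parser_create(); // no handlers: we only want errors
    $error = null;
    while (!feof($fp)) {
        $chunk = fread($fp, $blockSize);
        if ($chunk === false) {
            $error = "read error";
            break;
        }
        if (!xml_parse($parser, $chunk, false)) {
            $error = xml_error_string(xml_get_error_code($parser))
                . " at line " . xml_get_current_line_number($parser);
            break;
        }
    }
    // Mark the end of input so that unclosed tags are reported as well.
    if ($error === null && !xml_parse($parser, "", true)) {
        $error = xml_error_string(xml_get_error_code($parser))
            . " at line " . xml_get_current_line_number($parser);
    }
    fclose($fp);
    return $error;
}
```

Only when the check returns null would the second pass, the one that writes to the database, be started.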

Here is a theoretical example of parsing a large XML file.
This script reads the file one character at a time, collects the characters into blocks and sends them to the XML parser.
This approach completely solves the memory problem and puts no load on the server, but it makes the execution-time problem worse. How to try to solve the time problem is described below.

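<?php
function webi_xml($file)
{
    ########################################
    ### data function: receives the text between tags
    function data($parser, $data)
    {
        print $data;
    }
    ########################################
    ### opening tag function
    function startElement($parser, $name, $attrs)
    {
        print $name;
        print_r($attrs);
    }
    ########################################
    ## closing tag function
    function endElement($parser, $name)
    {
        print $name;
    }
    ########################################

    $xml_parser = xml_parser_create();
    // convert all tag names to upper case
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
    // functions that catch opening and closing tags
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    // function that receives the data
    xml_set_character_data_handler($xml_parser, "data");

    // open the file
    $fp = fopen($file, "r");
    if (!$fp) {
        die("could not open XML input");
    }

    $perviy_vxod = 1; // flag: first read from the file
    $data = "";       // the block collected for the parser
    $ok = true;

    while (!feof($fp)) {
        // read one character and add it to the block
        $simvol = fgetc($fp);
        $data .= $simvol;

        // keep collecting until the end of a tag (">") is reached
        if ($simvol != ">") { continue; }

        // on the first read, drop anything before "<?xml"
        if ($perviy_vxod) {
            $start = strstr($data, "<?xml");
            if ($start !== false) { $data = $start; }
            $perviy_vxod = 0;
        }

        // send the block to the parser
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            echo "<br/>XML Error: " . xml_error_string(xml_get_error_code($xml_parser));
            echo " at line " . xml_get_current_line_number($xml_parser);
            $ok = false;
            break;
        }

        $data = "";
    }
    // tell the parser the document is complete (sends whatever is left in $data)
    if ($ok) {
        xml_parse($xml_parser, $data, true);
    }
    fclose($fp);
    xml_parser_free($xml_parser);
}

webi_xml("1.xml");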

In this example, I put everything into one function, webi_xml(), and its call is at the very bottom.
The script itself consists of three main functions:
1. startElement(), which catches an opening tag;
2. endElement(), which catches a closing tag;
3. data(), which receives the data between tags.

Let's assume that file 1.xml contains a recipe:

<?xml version="1.0"?>
<recipe>
  <title>Simple bread</title>
  <ingredient amount="3" unit="glass">Flour</ingredient>
  <ingredient amount="0.25" unit="gram">Yeast</ingredient>
  <ingredient amount="1.5" unit="glass">Warm water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients and knead thoroughly.</step>
    <step>Cover with a cloth and leave for one hour in a warm room.</step>
    <step>Knead again, place on a baking sheet and put in the oven.</step>
    <step>Visit the site.</step>
  </instructions>
</recipe>

Everything starts with a call to the main function, webi_xml("1.xml");
Inside this function the parser is created and told to convert all tag names to upper case, so that all tags have the same case.

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);

Now we specify which functions will handle an opening tag, a closing tag and the data between tags:

xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "data");

Next, the specified file is opened and read one character at a time; each character is appended to a string variable until the > character is found.
If this is the very first read from the file, everything unnecessary at the beginning of the file is removed along the way: everything that comes before <?xml, the tag an XML document should begin with.
The first time, the string variable contains the XML declaration (<?xml ... ?>),

and it is sent to the parser:
xml_parse($xml_parser, $data, feof($fp));
After the data is processed, the string variable is reset, collection starts again, and the string is formed a second time; now it contains the opening tag of the root element.

The third time it contains <title>,
and the fourth time Simple bread</title>.

Note that the string variable always ends with a complete tag (the > character), and there is no need to send the parser an opening tag, its data and its closing tag together, for example
<title>Simple bread</title>
This script makes sure a tag is never cut in half: the parser may receive an opening tag in one step and the closing tag in the next, or 1000 lines of the file at once, as long as the tag itself is not broken, as it is here:
<tit
le>Simple bread</title>
The script never sends data split like this. (Strictly speaking, xml_parse() keeps its state between calls, so it would cope with such a split too; the PHP manual's own examples feed it blocks read with fread() regardless of where the tags end.)
You can come up with your own way of sending data to the parser, for example, collecting 1 megabyte of data at a time to increase speed; keeping tags whole is the simplest rule to follow, while the text inside a tag may be split, for example
<title>Simple
bread</title>

In this way, you can send a large file to the parser in whatever parts you like.
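Here is a sketch of the "collect about 1 megabyte and send it" idea. It reads the file with fread() instead of one fgetc() call per character and sends the parser everything up to the last > in the buffer, keeping the remainder for the next block, so chunks end on a > just as in the character-by-character script. The helper name parse_in_blocks() and the 1 MB default are assumptions for illustration.

```php
<?php
/**
 * Hypothetical helper: read $path in blocks of $blockSize bytes and pass the
 * parser everything up to the last ">" in the buffer; the tail of the buffer
 * is kept and sent together with the next block.
 */
function parse_in_blocks(string $path, $parser, int $blockSize = 1048576): bool
{
    $fp = fopen($path, "r");
    if ($fp === false) {
        return false;
    }
    $buffer = "";
    $ok = true;
    while (!feof($fp)) {
        $block = fread($fp, $blockSize);
        if ($block === false) {
            $ok = false;
            break;
        }
        $buffer .= $block;
        $end = strrpos($buffer, ">");
        if ($end === false) {
            continue; // no ">" yet: keep reading
        }
        // Send everything up to and including the last ">".
        if (!xml_parse($parser, substr($buffer, 0, $end + 1), false)) {
            $ok = false;
            break;
        }
        $buffer = (string) substr($buffer, $end + 1);
    }
    // Whatever is left (usually a line break) goes in with the final flag.
    if ($ok && !xml_parse($parser, $buffer, true)) {
        $ok = false;
    }
    fclose($fp);
    return $ok;
}

// Usage: collect element names; a tiny block size forces many small reads.
$names = [];
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$names) { $names[] = $name; },
    function ($p, $name) {}
);
$path = tempnam(sys_get_temp_dir(), "xml");
file_put_contents($path, '<?xml version="1.0"?><recipe><title>Simple bread</title>'
    . '<ingredient amount="3" unit="glass">Flour</ingredient></recipe>');
$ok = parse_in_blocks($path, $parser, 10);
unlink($path);
print_r($names);
```

A literal > inside text or an attribute value also counts as a cut point here, which the incremental parser tolerates.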

Now let's look at how this data is processed and how to get at it.

Let's start with the opening-tag function startElement($parser, $name, $attrs).
Suppose processing has reached the line
<ingredient amount="3" unit="glass">Flour</ingredient>
Then inside the function, $name holds the name of the opened tag (the closing tag has not been reached yet): INGREDIENT, in upper case, because case folding was enabled above.
The function also receives the tag's attribute array $attrs, which in this case contains AMOUNT => "3" and UNIT => "glass" (attribute names are folded to upper case as well).

Next, the tag's contents are passed to the function data($parser, $data).
The $data variable contains the text between the opening and closing tags, in our case Flour.

Processing of this line ends with endElement($parser, $name),
which receives the name of the closed tag; in our case $name is again INGREDIENT.

After that, the same cycle repeats for the next tag.
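To see this cycle in isolation, the sketch below registers closures instead of the named functions (a substitution made for brevity) and records every call while parsing the single ingredient line. It also shows the effect of XML_OPTION_CASE_FOLDING on element and attribute names.

```php
<?php
// Record every callback so the order and arguments can be inspected.
$events = [];
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = ["start", $name, $attrs];
    },
    function ($parser, $name) use (&$events) {
        $events[] = ["end", $name];
    }
);
xml_set_character_data_handler($parser, function ($parser, $data) use (&$events) {
    $events[] = ["data", $data];
});

xml_parse($parser, '<ingredient amount="3" unit="glass">Flour</ingredient>', true);
print_r($events); // start INGREDIENT (AMOUNT, UNIT), data "Flour", end INGREDIENT
```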

The example above only demonstrates the principle of XML processing; for real use it needs to be extended.
Usually you parse large XML files to load data into a database, and to process the data properly you need to know which open tag the data belongs to, the nesting level of that tag, and which tags are open above it in the hierarchy. With this information, you can process the file correctly without any problems.
To do this, you need to introduce several global variables that collect information about open tags, nesting and data.
Here's an example you can use:

<?php
function webi_xml($file)
{
    global $webi_depth;     // counter to track nesting depth
    $webi_depth = 0;
    global $webi_tag_open;  // will contain an array of currently open tags
    $webi_tag_open = array();
    global $webi_data_temp; // this array will contain the data of one tag
    $webi_data_temp = array();

    ####################################################
    ### data function
    function data($parser, $data)
    {
        global $webi_depth;
        global $webi_tag_open;
        global $webi_data_temp;
        $tag = $webi_tag_open[$webi_depth];
        // add data to the array, indicating nesting and the currently open tag
        if (!isset($webi_data_temp[$webi_depth][$tag]["data"])) {
            $webi_data_temp[$webi_depth][$tag]["data"] = "";
        }
        $webi_data_temp[$webi_depth][$tag]["data"] .= $data;
    }
    ####################################################
    ### opening tag function
    function startElement($parser, $name, $attrs)
    {
        global $webi_depth;
        global $webi_tag_open;
        global $webi_data_temp;
        // if the nesting level is no longer zero, one tag is already open
        // and its data is already in the array, so it can be processed
        if ($webi_depth) {
            $tag = $webi_tag_open[$webi_depth];
            print "data " . $tag . "--" . ($webi_data_temp[$webi_depth][$tag]["data"] ?? "") . "<br/>\n";
            print_r($webi_data_temp[$webi_depth][$tag]["attrs"] ?? array());
            print "<br/>\n";
            print_r($webi_tag_open); // array of open tags
            print "<br/>\n";
            // after processing the data, delete it to free up memory
            unset($webi_data_temp[$webi_depth]);
        }
        // now the next tag is opened and further processing will occur in the next step
        $webi_depth++; // increase nesting
        $webi_tag_open[$webi_depth] = $name; // add the open tag to the information array
        $webi_data_temp[$webi_depth][$name]["attrs"] = $attrs; // now add the tag attributes
    }
    ####################################################
    ## closing tag function
    function endElement($parser, $name)
    {
        global $webi_depth;
        global $webi_tag_open;
        global $webi_data_temp;
        // data processing begins here, for example adding to a database, saving to a file, etc.
        // $webi_tag_open contains the chain of open tags by nesting level,
        //   e.g. $webi_tag_open[$webi_depth] is the tag whose information is being processed now
        // $webi_depth is the tag nesting level
        // $webi_data_temp[$webi_depth][$webi_tag_open[$webi_depth]]["attrs"] is the array of tag attributes
        // $webi_data_temp[$webi_depth][$webi_tag_open[$webi_depth]]["data"] is the tag data
        $tag = $webi_tag_open[$webi_depth];
        print "data " . $tag . "--" . ($webi_data_temp[$webi_depth][$tag]["data"] ?? "") . "<br/>\n";
        print_r($webi_data_temp[$webi_depth][$tag]["attrs"] ?? array());
        print "<br/>\n";
        print_r($webi_tag_open);
        print "<br/>\n";
        // after processing, delete the whole data array, since the tag was closed
        $webi_data_temp = array();
        // delete information about this open tag, since it is closed
        unset($webi_tag_open[$webi_depth]);
        $webi_depth--; // reduce nesting
    }
    ####################################################
    $xml_parser = xml_parser_create();
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
    // indicate which functions handle opening and closing tags
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    // specify a function for working with data
    xml_set_character_data_handler($xml_parser, "data");
    // open the file
    $fp = fopen($file, "r");
    if (!$fp) {
        die("could not open XML input");
    }
    $perviy_vxod = 1; // flag to check the first read from the file
    $data = "";       // here we collect data from the file in parts and send it to the XML parser
    $ok = true;
    // loop until the end of the file is reached
    while (!feof($fp)) {
        $simvol = fgetc($fp); // read one character from the file
        $data .= $simvol;     // add this character to the data to be sent
        // if the character is not the end of a tag, go back to the beginning of the loop
        // and add another character, and so on until the end of a tag is found
        if ($simvol != ">") { continue; }
        // the end of a tag was found, so send the collected data for processing
        // on the first read, delete everything before "<?xml", since there may be garbage
        // before the start of the XML (clumsy editors, or the file was received by a script from another server)
        if ($perviy_vxod) {
            $start = strstr($data, "<?xml");
            if ($start !== false) { $data = $start; }
            $perviy_vxod = 0;
        }
        // now pass the data to the XML parser
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            // here you can handle validity errors...
            // as soon as an error is encountered, parsing stops
            echo "<br/>XML Error: " . xml_error_string(xml_get_error_code($xml_parser));
            echo " at line " . xml_get_current_line_number($xml_parser);
            $ok = false;
            break;
        }
        // after parsing, discard the collected data for the next iteration
        $data = "";
    }
    // tell the parser the document is complete
    if ($ok) {
        xml_parse($xml_parser, $data, true);
    }
    fclose($fp);
    xml_parser_free($xml_parser);
    // remove the global variables
    unset($GLOBALS["webi_depth"]);
    unset($GLOBALS["webi_tag_open"]);
    unset($GLOBALS["webi_data_temp"]);
}

webi_xml("1.xml");
?>

The whole example is commented; now test it and experiment.
Note that the data function does not simply assign data to the array but appends it with ".=", because the text of one tag may arrive in several pieces; with a plain assignment you would sometimes keep only the last piece of the data.
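A small sketch of why ".=" matters, assuming a text node split across two xml_parse() calls and containing an entity reference; the handler may be called more than once for the same text, and appending keeps every piece. The variable names are illustrative.

```php
<?php
// Collect the character data of one element, appending each piece with ".=".
$text = "";
$calls = 0;
$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $data) use (&$text, &$calls) {
    $calls++;
    $text .= $data; // with plain "=" any earlier piece would be overwritten
});

// The text node is deliberately split across two xml_parse() calls.
xml_parse($parser, "<title>Simple ", false);
xml_parse($parser, "bread &amp; butter</title>", true);

echo $calls, " call(s): ", $text, PHP_EOL;
```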
That's all: now there is enough memory to process a file of any size. The script's running time can be increased in several ways.
Add this at the beginning of the script:
set_time_limit(6000);
or
ini_set ("max_execution_time" , "6000" );

Or add this line to the .htaccess file (this works only when PHP runs as an Apache module):
php_value max_execution_time 6000

These examples increase the script's maximum running time to 6000 seconds.
You can increase the time this way only when safe mode is turned off (safe mode was removed entirely in PHP 5.4).

If you can edit php.ini, you can increase the time with
max_execution_time = 6000

For example, on Masterhost hosting, at the time of writing, increasing the script time was prohibited even though safe mode was off; if you are a pro, you can build your own PHP on Masterhost, but that is beyond the scope of this article.
I've seen plenty of XML parsers, but I had never touched them in web programming. Now I want to figure out, together with you, how to write a simple XML parser in PHP.

What for? Because we need to!

No, seriously: XML files are a very useful thing, and any professional should... no, not should, but must know how to work with them. We want to become professionals, right? If you are reading my blog, then you have that desire.

We assume that we know what XML is and will not describe it here. If we don't, we can easily find out here: http://ru.wikipedia.org/wiki/XML

While searching for ways to parse XML in PHP, I discovered a simple set of PHP functions for working with XML files called "XML Parser Functions". Parsing begins with initializing the parser by calling xml_parser_create():

$xml_parser = xml_parser_create();

Then we need to tell the parser which functions will process the XML tags and text it encounters during parsing, i.e. we need to install some handlers:

xml_set_element_handler($xml_parser, "startElement", "endElement");

This function sets the start-of-element and end-of-element handlers. For each element in the XML file, the startElement function is called when the parser finds the element's opening tag, and the endElement function is called when it finds the closing tag.

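According to the PHP documentation, the startElement and endElement functions themselves take the following parameters: startElement($parser, $name, $attrs) receives the parser, the element name and an array of the element's attributes; endElement($parser, $name) receives the parser and the element name.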

But how do we read data from a file? We have not seen a single parameter for this in any of the functions! More on this below: reading the file is the programmer's job, i.e. we must use the standard functions for working with files:

if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

We've opened the file. Now we need to read it line by line and feed the lines to the xml_parse function:

while (($data = fgets($fp)) !== false) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        echo "<br>XML Error: " . xml_error_string(xml_get_error_code($xml_parser));
        echo " at line " . xml_get_current_line_number($xml_parser);
        break;
    }
}

Two very important things to note here. First, the third parameter of xml_parse must be the flag that marks the last piece of input (true if the line is the last one, false if not). Second, as in any business, we must watch for errors. The functions xml_get_error_code and xml_error_string are responsible for this: the first returns the error code, and the second returns a text description of the error for that code. What happens when an error occurs is discussed below. Another useful function, xml_get_current_line_number, tells us the number of the line currently being processed in the file.

And as always, we must free the resources we used. For XML parsing, this is the xml_parser_free function:

xml_parser_free($xml_parser);

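Now we have looked at the main functions. It's time to see them in action. For this I came up with an XML file with a very simple structure:

<?xml version="1.0"?>
<root>
<info who="mine">
<address ulica="my street!!" kvartira="12" dom="15">123</address>
<phone>71234567890</phone>
</info>
</root>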

Let's call this file data.xml and try to parse it using the following code:

<?php
function startElement($parser, $name, $attrs)
{
    global $depth;
    echo str_repeat("&nbsp;", $depth * 3); // indents
    echo "Element: $name<br>";             // element name
    $depth++; // increase the depth so that the browser shows indents
    foreach ($attrs as $attr => $value) {
        echo str_repeat("&nbsp;", $depth * 3); // indents
        // display the attribute name and its value
        echo "Attribute: " . $attr . " = " . $value . "<br>";
    }
}

function endElement($parser, $name)
{
    global $depth;
    $depth--; // reduce the depth
}

$depth = 0;
$file = "data.xml";
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
while (($data = fgets($fp)) !== false) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        echo "<br>XML Error: ";
        echo xml_error_string(xml_get_error_code($xml_parser));
        echo " at line " . xml_get_current_line_number($xml_parser);
        break;
    }
}
xml_parser_free($xml_parser);
?>

As a result of this simple script, the browser displayed the following:

Element: ROOT
Element: INFO
Attribute: WHO = mine
Element: ADDRESS
Attribute: ULICA = my street!!
Attribute: KVARTIRA = 12
Attribute: DOM = 15
Element: PHONE

Let's try to corrupt the XML file by replacing the opening tag <phone> with <telephone> and leaving the closing tag unchanged:

Element: ROOT
Element: INFO
Attribute: WHO = mine
Element: ADDRESS
Attribute: ULICA = my street!!
Attribute: KVARTIRA = 12
Attribute: DOM = 15
Element: TELEPHONE
XML Error: Mismatched tag at line 5

Wow! Error messages work! And they are quite informative.

Oops, I forgot one more thing... We didn't display the text contained inside the address and phone tags. Let's fix this shortcoming by adding a text handler with the xml_set_character_data_handler function:

xml_set_character_data_handler($xml_parser, 'stringElement');

And we add the handler function itself to the code.
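A minimal sketch of such a handler. The name stringElement comes from the call above; the body (skipping whitespace-only text and reusing the $depth indent) is an assumption. The short parse at the end runs it on a single phone element so it can be tried on its own.

```php
<?php
$depth = 0; // the same indentation counter as in the script above (assumed)

// Hypothetical text handler: prints non-empty text with the current indent.
function stringElement($parser, $data)
{
    global $depth;
    $text = trim($data); // skip the line breaks and spaces between tags
    if ($text !== "") {
        echo str_repeat("&nbsp;", $depth * 3), "Text: ", htmlspecialchars($text), "<br>\n";
    }
}

$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "stringElement");
ob_start();
xml_parse($xml_parser, "<phone>71234567890</phone>", true);
$output = ob_get_clean();
echo $output;
```

Because the handler can be called several times for one text node, trimming each piece separately is a simplification; collecting the pieces first, as in the large-file example, is more robust.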

What are XML parsers needed for and how can they be useful
If you are involved in website creation, you have probably heard of XML, even if you have not yet used it in your work. If so, it's time to get acquainted: after a real boom over the past ten years, this format has grown from an innovative project into a genuine industry standard, and reports of its successful use appear almost every day.
One of the most important components of XML technology is a special class of programs responsible for analyzing documents and extracting the necessary information: parsers. They are the subject of this article. Let's figure out what parsers are needed for, what kinds there are and where you can get them.
In general, an XML document is a simple text file in which the necessary data structure is stored using special syntactic structures (called “tags”). This allows you to store information not as a continuous array, but in the form of hierarchically related fragments. Because text files are so easy to create and transmit over a network, they are an extremely convenient way to store information and are widely used in creating complex distributed applications.
But the universality of the XML text format has an obvious downside: before extracting data from a document, you have to do the hard work of parsing the text and determining its structure. Implementing all the necessary procedures by hand is a non-trivial task that requires considerable effort. One of the standard mechanisms that make developers' lives easier is the parser.
What is it? An XML parser is a program designed to analyze the contents of a text document that conforms to the XML specification. It takes on all the "dirty" work: obtaining general information about the document, analyzing the text, searching it for markup constructs (elements, attributes, entities, etc.), checking compliance with the syntax rules, and providing an interface for accessing the document. As a result, the carefully extracted data is passed to the user application, which may know nothing at all about XML.
The parser can be implemented as a separate software module or an ActiveX component and can be connected to an application through special class libraries at compile time or at run time. Parsers are divided into validating and non-validating. The former can check the document structure against a DTD or a schema, while the latter do not care about this and are therefore, as a rule, smaller. Many modern parsers are "loaded" with numerous additional features (extended error handling, adding and editing data), which makes them more convenient to use, although it increases their size. Almost all common parsers also support a number of important XML-related standards (XSLT, schemas, namespaces, XPath, etc.) or are supplied together with processors for these XML-derived languages.
If you see how useful an XML parser can be, it's time to start practical experiments. Where can you get one? Finding suitable software should not be a problem: the Internet is full of freely distributed parsers written in all kinds of programming languages, running on all platforms and having a wide variety of features and purposes.
The most common and well-known is the Expat parser, written by James Clark, one of the creators of the XML specification. It is implemented in the C programming language and is distributed with its source code. Incidentally, XML support in such well-known environments as Perl and PHP was built precisely on it. Another common parser is Xerces, available from the Apache XML Project (implemented in Java and C++). Many parsers are also available for C++, Perl and Python, but most are written in Java and run on any platform that supports Java. The market leaders (Microsoft, Oracle, Sun), always notable for their scale, did not stand aside either: they have released more "heavyweight" and feature-rich packages that contain, in addition to the parsers themselves, many utilities that make developers' lives easier.
Of course, it is impossible to tell everything about parsers in one article. But I would like to hope that you understand that working with XML is not as difficult as it might seem. All the complexities of this format are hidden from us inside the parsers, and there is no reason to be afraid to introduce a new format into existing projects.

XML parsing essentially means walking through an XML document and returning the corresponding data. Although a growing number of web services return data in JSON format, most still use XML, so it's important to master XML parsing if you want to use the full range of available APIs.

The SimpleXML extension, added back in PHP 5.0, makes working with XML very easy. In this article I will show you how to use it.

Basic usage

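Let's start with the following example, languages.xml:

<?xml version="1.0" encoding="utf-8"?>
<languages>
 <lang name="C">
  <appeared>1972</appeared>
  <creator>Dennis Ritchie</creator>
 </lang>
 <lang name="PHP">
  <appeared>1995</appeared>
  <creator>Rasmus Lerdorf</creator>
 </lang>
 <lang name="Java">
  <appeared>1995</appeared>
  <creator>James Gosling</creator>
 </lang>
</languages>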

This XML document contains a list of programming languages with some information about each language: the year it was introduced and the name of its creator.

The first step is to load the XML using either simplexml_load_file() or simplexml_load_string(). As the names suggest, the first loads XML from a file and the second loads XML from a string.

$languages = simplexml_load_file("languages.xml");

Both functions read the entire DOM tree into memory and return a SimpleXMLElement object. In the example above, the object is stored in the $languages variable. You can use var_dump() or print_r() to see the details of the returned object if you want:

SimpleXMLElement Object
(
    [lang] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => C
                        )

                    [appeared] => 1972
                    [creator] => Dennis Ritchie
                )

            [1] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => PHP
                        )

                    [appeared] => 1995
                    [creator] => Rasmus Lerdorf
                )

            [2] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => Java
                        )

                    [appeared] => 1995
                    [creator] => James Gosling
                )

        )

)

This XML contains a root element, languages, with three lang elements inside it. Each array element corresponds to a lang element in the XML document.

You can access the properties of an object with the -> operator. For example, $languages->lang[0] returns a SimpleXMLElement object for the first lang element. This object contains two properties: appeared and creator.

$languages->lang[0]->appeared;
$languages->lang[0]->creator;

Displaying a list of languages with their properties is easy with a standard loop such as foreach:

foreach ($languages->lang as $lang) {
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $lang->creator
    );
}

Notice how I accessed the lang element's name attribute to get the language name. This way you can access any attribute of an element represented as a SimpleXMLElement object.

Working with Namespaces

While working with XML of various web services, you will come across element namespaces more than once. Let's change our languages.xml to show an example of using a namespace:

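<?xml version="1.0" encoding="utf-8"?>
<languages
 xmlns:dc="http://purl.org/dc/elements/1.1/">
 <lang name="C">
  <appeared>1972</appeared>
  <dc:creator>Dennis Ritchie</dc:creator>
 </lang>
 <lang name="PHP">
  <appeared>1995</appeared>
  <dc:creator>Rasmus Lerdorf</dc:creator>
 </lang>
 <lang name="Java">
  <appeared>1995</appeared>
  <dc:creator>James Gosling</dc:creator>
 </lang>
</languages>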

Now the creator element is in the dc namespace, which points to http://purl.org/dc/elements/1.1/. If you try to print the language creators with our previous code, it will not work. To read namespaced elements, you need to use one of the following approaches.

The first approach is to use the namespace URI directly in the code when accessing the element. The following example shows how this is done:

$dc = $languages->lang[1]->children("http://purl.org/dc/elements/1.1/");
echo $dc->creator;

The children() method takes a namespace and returns the child elements in that namespace. It takes two arguments: the first is the XML namespace, and the second is optional and defaults to false. If the second argument is true, the first one is treated as a namespace prefix; if it is false, the first one is treated as a namespace URI.
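A short sketch comparing the two ways of calling children() on a one-language copy of the namespaced file (the inline string is an assumption, so the example runs without languages.xml). It also shows that plain property access does not see the dc:creator element.

```php
<?php
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<languages xmlns:dc="http://purl.org/dc/elements/1.1/">
 <lang name="PHP">
  <appeared>1995</appeared>
  <dc:creator>Rasmus Lerdorf</dc:creator>
 </lang>
</languages>
XML;
$languages = simplexml_load_string($xml);
$lang = $languages->lang[0];

// Second argument false (the default): the first argument is a namespace URI.
$byUri = $lang->children("http://purl.org/dc/elements/1.1/");
// Second argument true: the first argument is the prefix used in the document.
$byPrefix = $lang->children("dc", true);

echo $byUri->creator, PHP_EOL;    // Rasmus Lerdorf
echo $byPrefix->creator, PHP_EOL; // Rasmus Lerdorf
echo "[", $lang->creator, "]", PHP_EOL; // [] : no namespace given, no match
```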

The second approach is to read the namespace URIs from the document and use them when accessing the elements. This is actually the better way, because you don't have to hardcode the URI.

$namespaces = $languages->getNamespaces(true);
$dc = $languages->lang[1]->children($namespaces["dc"]);
echo $dc->creator;

The getNamespaces() method returns an array of namespace prefixes and their associated URIs. It accepts an optional parameter that defaults to false. If you set it to true, the method returns the namespaces used in the parent and child nodes; otherwise, it returns only the namespaces used in the parent node itself.
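A quick sketch of the difference, on a minimal inline document (an assumption): the root element itself uses no namespace, so only the recursive call finds the dc prefix used by its descendants.

```php
<?php
$xml = '<languages xmlns:dc="http://purl.org/dc/elements/1.1/">'
     . '<lang name="C"><dc:creator>Dennis Ritchie</dc:creator></lang>'
     . '</languages>';
$languages = simplexml_load_string($xml);

// Only namespaces used by the root element itself (none here).
print_r($languages->getNamespaces());
// Namespaces used anywhere in the document: dc => http://purl.org/dc/elements/1.1/
print_r($languages->getNamespaces(true));

$ns = $languages->getNamespaces(true);
echo $languages->lang[0]->children($ns["dc"])->creator, PHP_EOL; // Dennis Ritchie
```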

Now you can iterate through the list of languages like this:

$languages = simplexml_load_file("languages.xml");
$ns = $languages->getNamespaces(true);
foreach ($languages->lang as $lang) {
    $dc = $lang->children($ns["dc"]);
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $dc->creator
    );
}

Practical example: parsing a YouTube channel feed

Let's look at an example that fetches the RSS feed of a YouTube channel and displays links to all of its videos. To do this, we request the following URL:

http://gdata.youtube.com/feeds/api/users/xxx/uploads

This URL belongs to the old GData API, which YouTube has since shut down, so treat it as an illustration of the technique.

This URL returns a list of the latest videos from the given channel in XML format. We will parse the XML and get the following information for each video:

1. Link to the video
2. Thumbnail
3. Title

We'll start by building the URL and loading the XML:

$channel = "Channel_name";
$url = "http://gdata.youtube.com/feeds/api/users/" . $channel . "/uploads";
$xml = file_get_contents($url);
$feed = simplexml_load_string($xml);
$ns = $feed->getNamespaces(true);

If you look at the XML feed, you can see that it contains several entry elements, each of which stores detailed information about one video from the channel. But we only need the thumbnail, the video URL and the title. These three elements are children of the group element, which in turn is a child of entry:

<entry>
  …
  <media:group>
    …
    <media:player url="…"/>
    <media:thumbnail url="…"/>
    <media:title>Title…</media:title>
    …
  </media:group>
  …
</entry>
We'll simply go through all the entry elements and extract the necessary information from each. Note that player, thumbnail and title are in the media namespace, so we proceed as in the previous example: we get the namespaces from the document and use the namespace when accessing the elements.

foreach ($feed->entry as $entry) {
    $group = $entry->children($ns["media"]);
    $group = $group->group;
    $thumbnail_attrs = $group->thumbnail[1]->attributes();
    $image = $thumbnail_attrs["url"];
    $player = $group->player->attributes();
    $link = $player["url"];
    $title = $group->title;
    printf(
        '<p><a href="%s"><img src="%s" alt="%s"></a></p>',
        $link, $image, $title
    );
}
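Since the feed URL may no longer respond, here is the same extraction run against a small inline feed. The sample XML, its example.com URLs and the use of the Atom and Media RSS namespace URIs are assumptions meant to mirror the structure shown above, not a copy of a real YouTube response.

```php
<?php
// A made-up feed with the same shape as the one described above.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
 <entry>
  <media:group>
   <media:player url="https://example.com/watch?v=abc"/>
   <media:thumbnail url="https://example.com/abc/0.jpg"/>
   <media:thumbnail url="https://example.com/abc/1.jpg"/>
   <media:title>First video</media:title>
  </media:group>
 </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);
$ns = $feed->getNamespaces(true); // includes "media" because descendants use it
$videos = [];
foreach ($feed->entry as $entry) {
    // Switch to the media namespace, then walk down to <media:group>.
    $group = $entry->children($ns["media"])->group;
    $videos[] = [
        "link"  => (string) $group->player->attributes()["url"],
        "image" => (string) $group->thumbnail[1]->attributes()["url"],
        "title" => (string) $group->title,
    ];
}
print_r($videos);
```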

Conclusion

Now that you know how to use SimpleXML to parse XML data, you can improve your skills by parsing different XML feeds from different APIs. But keep in mind that SimpleXML reads the entire DOM into memory, so if you're parsing a large data set, you may run out of memory. To learn more about SimpleXML, read the documentation.


Many examples in this reference require an XML string. Instead of repeating this string in every example, we put it into a file which we include in each example. This included file is shown in the following example section. Alternatively, you could create an XML document and read it with simplexml_load_file().

Example #1 Include file example.php with XML string

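<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
 <movie>
  <title>PHP: Behind the Parser</title>
  <characters>
   <character>
    <name>Ms. Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Actor</actor>
   </character>
  </characters>
  <plot>
   So, this language. It's like, a programming language. Or is it a
   scripting language? All is revealed in this thrilling horror spoof
   of a documentary.
  </plot>
  <great-lines>
   <line>PHP solves all my web problems</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
</movies>
XML;
?>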

The simplicity of SimpleXML appears most clearly when one extracts a string or number from a basic XML document.
Example #2 Getting <plot>

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

echo $movies->movie[0]->plot;
?>

So, this language. It's like, a programming language. Or is it a scripting language? All is revealed in this thrilling horror spoof of a documentary.

Accessing elements within an XML document that contain characters not permitted under PHP's naming convention (e.g. the hyphen) can be accomplished by encapsulating the element name within braces and apostrophes.
Example #3 Getting <line>

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

echo $movies->movie->{'great-lines'}->line;
?>

The above example will output:

PHP solves all my web problems

Example #4 Accessing non-unique elements in SimpleXML

When multiple instances of an element exist as children of a single parent element, normal iteration techniques apply.

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

/* For each <character> node, we echo a separate <name>. */
foreach ($movies->movie->characters->character as $character) {
    echo $character->name, " played by ", $character->actor, PHP_EOL;
}
?>

The above example will output:

Ms. Coder played by Onlivia Actora
Mr. Coder played by El Actor

Properties ( $movies->movie in previous example) are not arrays. They are iterable and accessible objects.
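A small sketch of what "iterable and accessible objects" means in practice, using a trimmed copy of the movies document inline (an assumption, so it runs without example.php): the list of character elements is not a PHP array, yet it can be counted, indexed and looped over.

```php
<?php
$movies = new SimpleXMLElement(
    "<movies><movie><characters>"
    . "<character><name>Ms. Coder</name></character>"
    . "<character><name>Mr. Coder</name></character>"
    . "</characters></movie></movies>"
);
$characters = $movies->movie->characters->character;

var_dump(is_array($characters));                   // bool(false)
var_dump($characters instanceof SimpleXMLElement); // bool(true)
echo count($characters), PHP_EOL;                  // 2
echo $characters[1]->name, PHP_EOL;                // Mr. Coder
```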

Example #5 Using attributes

So far, we have only covered reading element names and their values. SimpleXML can also access element attributes. Access attributes of an element just as you would elements of an array.

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

/* Access the <rating> nodes of the first movie.
 * Output the rating scale, too. */
foreach ($movies->movie[0]->rating as $rating) {
    switch ((string) $rating["type"]) { // Get attributes as element indices
    case "thumbs":
        echo $rating, " thumbs up";
        break;
    case "stars":
        echo $rating, " stars";
        break;
    }
}
?>

The above example will output:

7 thumbs up5 stars
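When the attribute names are not known in advance, the attributes() method lists them all. Here is a minimal sketch using a hypothetical <rating> element with an extra max attribute:

```php
<?php
// Hypothetical element with two attributes.
$rating = new SimpleXMLElement('<rating type="stars" max="5">4</rating>');

// Array-style access, as in the example above.
echo $rating['type'], PHP_EOL; // stars

// attributes() iterates over every attribute; the key is the attribute name.
$attrs = [];
foreach ($rating->attributes() as $name => $value) {
    $attrs[$name] = (string) $value; // cast, otherwise $value stays an object
}
echo implode(', ', array_keys($attrs)), PHP_EOL; // type, max
```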

Example #6 Comparing Elements and Attributes with Text

To compare an element or attribute with a string or pass it into a function that requires a string, you must cast it to a string using (string). Otherwise, PHP treats the element as an object.

<?php
include "example.php";
$movies = new SimpleXMLElement($xmlstr);
if ((string) $movies->movie->title == "PHP: Behind the Parser") {
    print "My favorite movie.";
}
echo htmlentities((string) $movies->movie->title);
?>

The above example will output:

My favorite movie.PHP: Behind the Parser

Example #7 Comparing Two Elements

Since PHP 5.2.0, two SimpleXMLElement objects are considered different even if they point to the same element.

<?php
include "example.php";
$movies1 = new SimpleXMLElement($xmlstr);
$movies2 = new SimpleXMLElement($xmlstr);
var_dump($movies1 == $movies2); // false since PHP 5.2.0
?>

The above example will output:

bool(false)

Example #8 Using XPath

SimpleXML includes built-in XPath support. To find all <character> elements:

<?php
include "example.php";
$movies = new SimpleXMLElement($xmlstr);
foreach ($movies->xpath("//character") as $character) {
    echo $character->name, " played by ", $character->actor, PHP_EOL;
}
?>

"//" serves as a wildcard that matches at any depth. To specify an absolute path, use a single leading slash.

The above example will output:

Ms. Coder played by Onlivia Actora
Mr. Coder played by El Actor
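The difference between "//" and an absolute path can be sketched as follows. The document is again a hypothetical, shortened version of the movies example:

```php
<?php
// Hypothetical, shortened movies document.
$xmlstr = <<<XML
<movies>
 <movie>
  <title>PHP: Behind the Parser</title>
  <characters>
   <character><name>Ms. Coder</name></character>
  </characters>
 </movie>
</movies>
XML;

$movies = new SimpleXMLElement($xmlstr);

// "//" searches at any depth.
$anywhere = $movies->xpath('//name');

// A single leading "/" anchors the path at the document root,
// so every step down to <name> has to be spelled out.
$absolute = $movies->xpath('/movies/movie/characters/character/name');

// <name> is not the root element, so this absolute path matches nothing.
$missing = $movies->xpath('/name');

echo count($anywhere), ' ', count($absolute), ' ', count($missing), PHP_EOL; // 1 1 0
```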

Example #9 Setting values

Data in SimpleXML doesn't have to be constant. The object allows for manipulation of all of its elements.

<?php
include "example.php";
$movies = new SimpleXMLElement($xmlstr);
$movies->movie[0]->characters->character[0]->name = "Miss Coder";
echo $movies->asXML();
?>

The above example will output:

PHP: Behind the Parser Miss Coder Onlivia Actora Mr. Coder El Actor PHP solves all my web problems 7 5

Example #10 Adding elements and attributes

Since PHP 5.1.3, SimpleXML has had the ability to easily add children and attributes.

<?php
include "example.php";
$movies = new SimpleXMLElement($xmlstr);
$character = $movies->movie[0]->characters->addChild("character");
$character->addChild("name", "Mr. Parser");
$character->addChild("actor", "John Doe");
$rating = $movies->movie[0]->addChild("rating", "PG");
$rating->addAttribute("type", "mpaa");
echo $movies->asXML();
?>

The above example will output:

PHP: Behind the Parser Ms. Coder Onlivia Actora Mr. Coder El Actor Mr. ParserJohn Doe So, this language. It's like, a programming language. Or is it a scripting language? All is revealed in this thrilling horror spoof of a documentary. PHP solves all my web problems 7 5 PG

Example #11 DOM Interoperability

PHP has a mechanism to convert XML nodes between SimpleXML and DOM formats. This example shows how one might change a DOM element to SimpleXML.

<?php
$dom = new DOMDocument;
if (!$dom->loadXML("<books><book><title>blah</title></book></books>")) {
    echo "Error while parsing the document";
    exit;
}
$books = simplexml_import_dom($dom);
echo $books->book[0]->title;
?>

The above example will output:

blah
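The conversion also works in the other direction: dom_import_simplexml() returns a DOMElement for a SimpleXML node. Here is a minimal sketch, assuming the same hypothetical books document as above. Both objects wrap the same underlying node, so a change made through one is visible through the other:

```php
<?php
// Hypothetical document, same shape as the DOM example above.
$books = simplexml_load_string('<books><book><title>blah</title></book></books>');

// dom_import_simplexml() gives a DOM view of the same <book> node.
$node = dom_import_simplexml($books->book[0]);
echo $node->nodeName, PHP_EOL;         // book

// An attribute added through DOM shows up through SimpleXML.
$node->setAttribute('id', '42');
echo $books->book[0]['id'], PHP_EOL;   // 42
```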


There is a common "trick" often proposed to convert a SimpleXML object to an array, by running it through json_encode() and then json_decode(). I'd like to explain why this is a bad idea.
Most simply, because the whole point of SimpleXML is to be easier to use and more powerful than a plain array. For instance, you can write $xml->bar->baz["bing"] and it means the same thing as $xml->bar[0]->baz[0]["bing"], regardless of how many bar or baz elements there are in the XML; and if you convert $xml->bar[0]->baz[0] to a string, you get all the string content of that node (including CDATA sections), regardless of whether it also has child elements or attributes. You also have access to namespace information, the ability to make simple edits to the XML, and even the ability to "import" into a DOM object, for much more powerful manipulation. All of this is lost by turning the object into an array rather than reading and understanding the examples on this page.
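The point about indices can be checked with a small sketch. The document with two <bar> elements is hypothetical, and $xml is only the variable name used in the note:

```php
<?php
// Hypothetical document with two <bar> elements.
$xml = simplexml_load_string(
    '<root><bar><baz bing="first">text</baz></bar><bar><baz bing="second"/></bar></root>'
);

// Without an index, SimpleXML uses the first matching element...
echo $xml->bar->baz['bing'], PHP_EOL;         // first
// ...which is exactly what the explicit [0] indices select.
echo $xml->bar[0]->baz[0]['bing'], PHP_EOL;   // first
// Later elements stay reachable by index.
echo $xml->bar[1]->baz['bing'], PHP_EOL;      // second
```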
Additionally, because it is not designed for this purpose, the conversion to JSON and back will actually lose information in some situations. For instance, any elements or attributes in a namespace will simply be discarded, and any text content will be discarded if an element also has children or attributes. Sometimes, this won't matter, but if you get in the habit of converting everything to arrays, it's going to sting you eventually.
Of course, you could write a smarter conversion, which didn't have these limitations, but at that point, you are getting no value out of SimpleXML at all, and should just use the lower level XML Parser functions, or the XMLReader class, to create your structure. You still won't have the extra convenience functionality of SimpleXML, but that's your loss.
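For large files, the XMLReader route can still hand each record to SimpleXML, so only one record is held in memory at a time. Here is a minimal sketch, assuming a hypothetical document made of repeated <movie> elements. For a real file you would call $reader->open() with its path instead of XML():

```php
<?php
// Hypothetical input; a real large file would be opened with $reader->open().
$source = '<movies><movie><title>A</title></movie><movie><title>B</title></movie></movies>';

$reader = new XMLReader();
$reader->XML($source);

$titles = [];
// Advance the cursor to the first <movie> element.
while ($reader->read() && $reader->name !== 'movie') {
    // keep reading
}
// Turn each <movie> into a small SimpleXMLElement, then jump to the next sibling.
while ($reader->name === 'movie') {
    $movie = simplexml_load_string($reader->readOuterXml());
    $titles[] = (string) $movie->title;
    $reader->next('movie'); // skips the current subtree
}
$reader->close();

echo implode(', ', $titles), PHP_EOL; // A, B
```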


If you need to output valid XML in your response, don't forget to set the Content-Type header to XML in addition to echoing out the result of asXML():
<?php
$xml = simplexml_load_file("...");
// ...
// ... xml stuff
// ...
// output xml in your response:
header("Content-Type: text/xml");
echo $xml->asXML();
?>


If your XML string contains booleans encoded as "0" and "1", you will run into problems when you cast the element directly to bool:
<?php
// The root element name <values> is an assumption; any name works here.
$xmlstr = <<<XML
<values>
    <truevalue>1</truevalue>
    <falsevalue>0</falsevalue>
</values>
XML;
$values = new SimpleXMLElement($xmlstr);
$truevalue = (bool)$values->truevalue; // true
$falsevalue = (bool)$values->falsevalue; // also true!!!
?>
Instead, you need to cast to string or int first:
<?php
$truevalue = (bool)(int)$values->truevalue; // true
$falsevalue = (bool)(int)$values->falsevalue; // false
?>
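Another option is filter_var() with FILTER_VALIDATE_BOOLEAN. This technique is swapped in here rather than taken from the note, and it also accepts values such as "true" and "false". Here is a sketch using a hypothetical document of the same shape:

```php
<?php
// Hypothetical document in the same shape as the note above.
$values = simplexml_load_string(
    '<values><truevalue>1</truevalue><falsevalue>0</falsevalue></values>'
);

// FILTER_VALIDATE_BOOLEAN maps "1", "true", "on", "yes" to true;
// anything else (including "0" and "false") gives false.
$truevalue  = filter_var((string) $values->truevalue, FILTER_VALIDATE_BOOLEAN);
$falsevalue = filter_var((string) $values->falsevalue, FILTER_VALIDATE_BOOLEAN);

var_dump($truevalue, $falsevalue); // bool(true) bool(false)
```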


From the README file:
SimpleXML is meant to be an easy way to access XML data.
SimpleXML objects follow four basic rules:
1) properties denote element iterators
2) numeric indices denote elements
3) non numeric indices denote attributes
4) string conversion allows to access TEXT data
When iterating properties then the extension always iterates over
all nodes with that element name. Thus method children() must be
called to iterate over subnodes. But also doing the following:
foreach ($obj->node_name as $elem) {
    // do something with $elem
}
always results in iteration of "node_name" elements. So no further
check is needed to distinguish the number of nodes of that type.
When an elements TEXT data is being accessed through a property
then the result does not include the TEXT data of subelements.
Known issues
============
Due to engine problems it is currently not possible to access
a subelement by index 0: $object->property.
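The four rules can be tried out with a small, hypothetical document. On current PHP versions index 0 works, as the examples on this page already show:

```php
<?php
// Hypothetical document: two <node_name> elements, the first with a child.
$obj = simplexml_load_string(
    '<root><node_name id="a">one<child>inner</child></node_name><node_name id="b">two</node_name></root>'
);

// 1) A property is an iterator over all <node_name> elements.
foreach ($obj->node_name as $elem) {
    echo $elem['id'], ' ';                 // prints: a b
}
echo PHP_EOL;

// 2) A numeric index selects one element.
echo $obj->node_name[1], PHP_EOL;          // two

// 3) A non-numeric index selects an attribute.
echo $obj->node_name[0]['id'], PHP_EOL;    // a

// 4) String conversion returns the element's own text only,
//    without the text of subelements ("inner" is not included).
echo (string) $obj->node_name[0], PHP_EOL; // one
```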


A quick tip on XPath queries and default namespaces. The XML system behind SimpleXML seems to work the way I believe the .NET XML system does: when you need to address something in the default namespace, you have to register that namespace with registerXPathNamespace() and then use its prefix to address elements that would otherwise live in the default namespace.
<?php
// <document> and <title> follow the XPath query below;
// the other element names are assumed for this sketch.
$string = <<<XML
<?xml version="1.0"?>
<document xmlns="http://www.w3.org/2005/Atom">
 <title>Forty What?</title>
 <from>Joe</from>
 <to>Jane</to>
 <body>
  I know that's the answer -- but what's the question?
 </body>
</document>
XML;
$xml = simplexml_load_string($string);
$xml->registerXPathNamespace("def", "http://www.w3.org/2005/Atom");
$nodes = $xml->xpath("//def:document/def:title");
?>
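The effect of the default namespace shows up clearly when the same query is run with and without the prefix. Here is a sketch with a shortened, hypothetical version of the document:

```php
<?php
// Hypothetical document whose elements live in a default namespace.
$xml = simplexml_load_string(
    '<document xmlns="http://www.w3.org/2005/Atom"><title>Forty What?</title></document>'
);

// In XPath 1.0, unprefixed names only match elements in no namespace.
$plain = $xml->xpath('//document/title');

// Registering a prefix for the default namespace makes the query match.
$xml->registerXPathNamespace('def', 'http://www.w3.org/2005/Atom');
$prefixed = $xml->xpath('//def:document/def:title');

echo count($plain), ' ', count($prefixed), PHP_EOL; // 0 1
echo $prefixed[0], PHP_EOL;                         // Forty What?
```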


Checking is_object($xml->module->admin) to find out whether a node called "admin" actually exists doesn't work as expected, because SimpleXML always returns an object (an empty one in that case), even if the node does not exist.
For me, the good old empty() function seems to work just fine in such cases.
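Here is a short sketch of the difference between these checks. It uses a hypothetical configuration document in which <module> exists but <admin> does not, and isset() is shown alongside empty() for comparison:

```php
<?php
// Hypothetical document: <module> exists, <admin> does not.
$xml = simplexml_load_string('<config><module><user/></module></config>');

$admin = $xml->module->admin;
var_dump(is_object($admin));           // bool(true): an object even though <admin> is missing
var_dump(count($admin));               // int(0): it matches no element
var_dump(isset($xml->module->admin));  // bool(false)
var_dump(empty($xml->module->admin));  // bool(true)
var_dump(isset($xml->module->user));   // bool(true): <user/> exists
```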


While SimpleXMLElement claims to be iterable, it does not seem to implement the standard Iterator interface methods such as ::next and ::reset properly. Therefore, while foreach() works, functions like next(), current(), or each() don't seem to work as you would expect: the pointer never seems to move or keeps getting reset.


If the XML document's encoding is not UTF-8, the encoding declaration must appear immediately after version="..." and before standalone="...". This is a requirement of the XML standard.

Ok

Russian language. Russian language
Fatal error: Uncaught exception "Exception" with message "String could not be parsed as XML" in...
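The declaration order rule can be checked directly. Here is a minimal sketch. ISO-8859-1 is used instead of a Cyrillic encoding only because libxml2 supports it without extra conversion libraries:

```php
<?php
// Same content, two declaration orders (chosen for this sketch).
$good = '<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?><root>Ok</root>';
$bad  = '<?xml version="1.0" standalone="yes" encoding="ISO-8859-1"?><root>Ok</root>';

// Collect libxml errors instead of emitting warnings.
libxml_use_internal_errors(true);

$okDoc  = simplexml_load_string($good);
$badDoc = simplexml_load_string($bad); // encoding after standalone: not well-formed

echo $okDoc, PHP_EOL;  // Ok
var_dump($badDoc);     // bool(false)

libxml_clear_errors();
```

With new SimpleXMLElement() instead of simplexml_load_string(), the bad declaration raises the "String could not be parsed as XML" exception quoted above.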
