Python Language Tutorial => Opening and reading large XML files...

Example

Sometimes we don't want to load an entire XML file into memory just to get the information we need. In these instances, it is useful to load the relevant sections incrementally and delete them once we are finished with them. The iterparse function builds the element tree as it parses and hands us each element along the way, so we can modify (and prune) the stored tree while parsing is still in progress.

Import the ElementTree module:

import xml.etree.ElementTree as ET

Open the .xml file and iterate over all the elements:

for event, elem in ET.iterparse("yourXMLfile.xml"):
    pass  # ... do something with elem ...
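As a small illustration of the loop above, the following sketch counts how often each tag occurs. The sample document, the in-memory io.BytesIO source and the helper name count_tags are assumptions made for this example; a real file name can be passed to iterparse instead.

```python
import io
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical sample document, kept in memory so the example is self-contained.
SAMPLE = b"<library><book/><book/><magazine/></library>"

def count_tags(source):
    """Count how many times each tag is closed in the XML source."""
    counts = Counter()
    for event, elem in ET.iterparse(source):
        # No events argument was given, so every event here is "end".
        counts[elem.tag] += 1
    return counts

print(count_tags(io.BytesIO(SAMPLE)))
```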

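Alternatively, we can look only for specific events, such as start/end tags or namespace declarations. If the events option is omitted (as above), only "end" events are returned. Note that for "start-ns" events the second item of each pair is a (prefix, uri) tuple rather than an element, and for "end-ns" events it is None: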

events = ("start", "end", "start-ns", "end-ns")
for event, elem in ET.iterparse("yourXMLfile.xml", events=events):
    pass  # ... do something with event and elem ...
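To see what each event type delivers, here is a minimal sketch that records every event for a small document. The sample XML, the example.com namespace URI and the helper name collect_events are all made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document with one namespace declaration.
SAMPLE = b"""<root xmlns:x="http://example.com/x">
  <x:item>one</x:item>
  <item>two</item>
</root>"""

def collect_events(xml_bytes):
    """Return a list of (event, info) pairs produced by iterparse."""
    seen = []
    events = ("start", "end", "start-ns", "end-ns")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            # payload is a (prefix, uri) tuple, not an element
            seen.append((event, payload))
        elif event == "end-ns":
            # payload is None
            seen.append((event, None))
        else:
            # "start" and "end" carry an Element; record its tag
            seen.append((event, payload.tag))
    return seen

for event, info in collect_events(SAMPLE):
    print(event, info)
```

Namespaced tags are reported in the {uri}localname form, so the first item above shows up as {http://example.com/x}item.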

Here is the complete example showing how to clear elements from the in-memory tree when we are finished with them:

for event, elem in ET.iterparse("yourXMLfile.xml", events=("start", "end")):
    if event == "end" and elem.tag == "record_tag":
        print(elem.text)
        elem.clear()  # drop this element's children, text and attributes
    # ... do something else ...
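Note that elem.clear() empties the element but leaves the (now empty) element attached to its parent. A common variation, sketched below, is to keep a reference to the root element and clear the root after each record, so that finished records are detached from the tree altogether. This is a different idiom from the example above rather than a replacement for it; the sample data and the helper name iter_record_texts are assumptions for this sketch, and root.clear() also resets the root's own attributes and text.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; a real file path can be passed to iterparse instead.
SAMPLE = b"""<records>
  <record_tag>first</record_tag>
  <record_tag>second</record_tag>
  <record_tag>third</record_tag>
</records>"""

def iter_record_texts(source, tag="record_tag"):
    """Yield the text of each <tag> element, pruning the tree as we go."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text
            # Detach everything parsed so far from the root so the
            # in-memory tree does not grow with the size of the file.
            root.clear()

texts = list(iter_record_texts(io.BytesIO(SAMPLE)))
print(texts)
```

Because the function is a generator, the caller can process one record at a time without holding the whole list in memory.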
