python - Parsing XML containing a default namespace to get an element value using lxml

- April 15, 2013

I have an XML string like this:

str1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
    <loc>
        http://www.example.org/sitemap_1.xml.gz
    </loc>
    <lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>
"""

I want to extract the URLs inside the <loc> nodes, i.e. http://www.example.org/sitemap_1.xml.gz

I tried this code, but it didn't work:

from lxml import etree
root = etree.fromstring(str1)
urls = root.xpath("//loc/text()")
print(urls)  # prints []

I also checked whether the root node is formed correctly, using the same string str1:

etree.tostring(root)
'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n<sitemap>\n<loc>http://www.example.org/sitemap_1.xml.gz</loc>\n<lastmod>2015-07-01</lastmod>\n</sitemap>\n</sitemapindex>'

This is a common error when dealing with XML that has a default namespace. Your XML has a default namespace, that is, a namespace declared without a prefix, here:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Note that it is not only the element where the default namespace is declared that is in that namespace: all descendant elements inherit the ancestor's default namespace implicitly, unless otherwise specified (using an explicit namespace prefix, or a local default namespace that points to a different namespace URI). That means, in this case, all elements including loc are in the default namespace.
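To see the inheritance concretely, here is a minimal sketch that reuses the question's str1 and prints the tag lxml assigns to each element. The variable names tags and unqualified are only illustrative.

```python
from lxml import etree

str1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
    <loc>
        http://www.example.org/sitemap_1.xml.gz
    </loc>
    <lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>"""

root = etree.fromstring(str1)

# lxml reports each tag in Clark notation: {namespace-uri}local-name.
# Every element carries the sitemap namespace, not just the root.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)

# An unprefixed name in XPath matches only elements in no namespace,
# so this returns an empty list even though <loc> is in the document.
unqualified = root.xpath("//loc")
print(unqualified)  # prints []
```

Each printed tag starts with {http://www.sitemaps.org/schemas/sitemap/0.9}, which is why the plain //loc query comes back empty.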

To select an element in a namespace, you'll need to define a prefix-to-namespace mapping and use that prefix in the XPath:

from lxml import etree

str1 = '''<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
    <loc>
        http://www.example.org/sitemap_1.xml.gz
    </loc>
    <lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>'''
root = etree.fromstring(str1)

# Map an arbitrary prefix ("d") to the document's default namespace URI.
ns = {"d": "http://www.sitemaps.org/schemas/sitemap/0.9"}
url = root.xpath("//d:loc", namespaces=ns)[0]
# encoding="unicode" makes tostring() return text rather than bytes.
print(etree.tostring(url, encoding="unicode"))

Output:

<loc xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">         http://www.example.org/sitemap_1.xml.gz     </loc>
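The code above prints the whole <loc> element, while the question asks for the URL itself. A minimal sketch of two ways to get just the text, assuming the same str1 and prefix mapping; the names urls and urls_any_ns are only illustrative:

```python
from lxml import etree

str1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
    <loc>
        http://www.example.org/sitemap_1.xml.gz
    </loc>
    <lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>"""

root = etree.fromstring(str1)
ns = {"d": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# text() returns the raw text nodes, including the surrounding
# whitespace and newlines, so strip each one.
urls = [text.strip() for text in root.xpath("//d:loc/text()", namespaces=ns)]
print(urls)  # prints ['http://www.example.org/sitemap_1.xml.gz']

# Alternative: match on the local name only, ignoring the namespace.
# This needs no prefix mapping but also matches <loc> in any namespace.
urls_any_ns = [el.text.strip() for el in root.xpath("//*[local-name()='loc']")]
print(urls_any_ns)
```

The prefixed form is usually the safer choice, since it only matches loc elements from the sitemap namespace. The local-name() form is handy for quick scripts where the namespace URI may vary.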
