Python 2.7 - findAll() in BeautifulSoup missing nodes
The findAll() method in BeautifulSoup does not return all of the elements in my XML. If you run the code below and open the URL, you can see there are 10 PubmedArticle nodes in the XML, but findAll() finds only 6 of them: the output has 6 asterisks instead of 10. What am I doing wrong?
```python
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&rettype=abstract&id=23858559,23858558,23858557,23858521,23858508,23858506,23858494,23858473,23858461,23858404'
data = urllib2.urlopen(url).read()
soup = BeautifulSoup(data)

# One '*' per PubmedArticle node found
for x in soup.findAll('pubmedarticle'):
    print '*'
```
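To check how many PubmedArticle elements the response really contains, independently of BeautifulSoup, one option is to parse it with the standard library's xml.etree.ElementTree, which is a strict XML parser and keeps tag names case-sensitive. This is a minimal sketch, not the asker's or answerer's code: the `count_articles` helper and the two-article `SAMPLE` document are hypothetical stand-ins for the real efetch response (you would pass the fetched `data` instead). It runs on Python 2.7 and 3.

```python
import xml.etree.ElementTree as ET

def count_articles(xml_bytes):
    """Count every <PubmedArticle> element anywhere in the document.

    Hypothetical helper: ElementTree is a strict XML parser, so tag names
    keep their original case ('PubmedArticle', not 'pubmedarticle').
    """
    root = ET.fromstring(xml_bytes)
    # iter() walks the root element and all of its descendants.
    return sum(1 for _ in root.iter('PubmedArticle'))

# Hypothetical sample shaped like an efetch response, trimmed to two articles.
SAMPLE = b"""<?xml version="1.0"?>
<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>2</PMID></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

print(count_articles(SAMPLE))  # prints 2
```

Passing the real `data` from the question should report 10 if all ten requested IDs came back, which gives a baseline to compare against the BeautifulSoup result.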
Edit: I've discovered that findAll() searches relative to the current node. Can I set the root node of the soup?
the entities in provided xml named "pubmedarticle", try following:
```python
# Start the search from the PubmedArticleSet root element
for x in soup.pubmedarticleset.findAll('pubmedarticle'):
    print '*'
```
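The idea that a search is scoped to whichever node you call it on has a close analogue in the standard library's ElementTree, shown here as a swapped-in illustration rather than BeautifulSoup itself. In ElementTree, `findall('PubmedArticle')` matches only direct children of the element it is called on, while the path form `findall('.//PubmedArticle')` matches every descendant. The nested `DOC` sample, including its `Wrapper` element, is hypothetical and exists only to make the difference visible; the sketch runs on Python 2.7 and 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one article is a direct child of the root,
# another is nested one level deeper inside a made-up <Wrapper> element.
DOC = b"""<PubmedArticleSet>
  <PubmedArticle><PMID>1</PMID></PubmedArticle>
  <Wrapper>
    <PubmedArticle><PMID>2</PMID></PubmedArticle>
  </Wrapper>
</PubmedArticleSet>"""

root = ET.fromstring(DOC)
wrapper = root.find('Wrapper')

# A bare tag name matches direct children of the node only.
direct = root.findall('PubmedArticle')
# './/' makes the path search every descendant of the node.
anywhere = root.findall('.//PubmedArticle')
# The same search called on a sub-node is limited to that subtree.
inside_wrapper = wrapper.findall('.//PubmedArticle')

print('%d %d %d' % (len(direct), len(anywhere), len(inside_wrapper)))  # prints 1 2 1
```

BeautifulSoup's own findAll() is recursive by default, so the closer equivalent there is the './/' form; passing recursive=False restricts it to direct children of the node it is called on.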