TypeError: decoding Unicode is not supported (Python)
I am using lxml.html to parse an HTML file and get the text of the page. But I have a string that has the character ' in it,
for example florian's,
due to which I get a traceback while printing the output:
parent_link_id_text = parent_link_id.xpath('./td[@width="400"]/text()')
print (sgs_mid[0]+";"+"external"+";"+str(link_id_num[0])+";"+parent_link_id_text[0]+";"+parent_link_link[0], file = log_file_1)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 56-58: ordinal not in range(128)
Then I tried:
print (sgs_mid[0]+";"+"publicfreeurl"+";"+str(link_id_num[0])+";"+unicode(parent_link_id_text[0],"utf-8")+";"+parent_link_link[0], file = log_file_1)
and the traceback was:
TypeError: decoding Unicode is not supported
How can I print a string that contains a Unicode character?
I am not sure if this is a solution to your problem, but perhaps it will guide you in the right direction.
Without seeing the code you have or the data, I'm going to speculate and make a programmatic guess at how to solve the issue.
Please see the following code:
import lxml.html as lh
import urllib2

url = 'http://loremipsum.net/about.html'
doc = lh.parse(urllib2.urlopen(url))
# Take the first text node matched by the XPath expression
value = doc.xpath('//p/strong/text()')[0]
print value
Printed result:
what 'lorem ipsum'?
By reading the page on the Lorem Ipsum site, you can see that the text returned does indeed have ' in it.
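As for the two tracebacks in the question, my guess is that xpath() had already returned decoded text. In Python 2, calling unicode(value, "utf-8") on a value that is already unicode raises exactly "TypeError: decoding Unicode is not supported", and writing such text to a file opened without an encoding falls back to the 'ascii' codec. Below is a minimal sketch of one way around both, not a definitive fix: decode only values that are still bytes, and let the file object do the encoding. The helpers to_text and write_row, the file name links.log, and the sample row (including the assumption that the apostrophe is U+2019) are hypothetical stand-ins for the data in the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io
import os
import tempfile


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is still a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text: decoding it again is what raises the TypeError


def write_row(log_file, fields):
    """Write the fields as one semicolon-separated line of text."""
    log_file.write(";".join(to_text(field) for field in fields) + "\n")


# Hypothetical stand-ins for the values extracted with xpath() in the question.
# The apostrophe is assumed to be U+2019 (right single quotation mark).
row = ["mid-1", "external", str(42), "florian\u2019s", "http://example.com/"]

log_path = os.path.join(tempfile.mkdtemp(), "links.log")

# io.open with an explicit encoding accepts text and encodes it on write,
# so no 'ascii' codec is involved.
with io.open(log_path, "w", encoding="utf-8") as log_file:
    write_row(log_file, row)

# Read the line back to confirm the character survived the round trip.
with io.open(log_path, "r", encoding="utf-8") as log_file:
    line = log_file.read()
```

The same idea should carry over to the original print(..., file=log_file_1) call, provided log_file_1 is opened with io.open and an explicit encoding instead of the plain built-in open.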
I hope this helps point you in the right direction.