PowerShell - Encoding of the response of Invoke-WebRequest
When using the Invoke-WebRequest cmdlet against a web page with non-English characters, I see no way of defining the encoding of the response / page content.
When I use a simple request on http://colours.cz/ucinkujici/, the names of the artists are corrupted. You can try it with this simple line:
Invoke-WebRequest http://colours.cz/ucinkujici
Is this caused by the design of the cmdlet? Can I specify the encoding somewhere, somehow? Is there a workaround to get a correctly parsed response?
It seems to me that you are correct :/
Here is one way to get the content right, without dealing with the HtmlWebResponseObject:
Invoke-WebRequest http://colours.cz/ucinkujici -OutFile .\colours.cz.txt
$content = gc .\colours.cz.txt -Encoding utf8 -Raw
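The file-based workaround relies on Get-Content decoding the saved bytes with the encoding you name. A minimal offline sketch of that step follows; the temp file name and the sample word are hypothetical stand-ins for the downloaded page, and the non-ASCII character is built from its code point so that the script file's own encoding does not matter.

```powershell
# Hypothetical sample text standing in for the page content ("c with caron" + "esky").
$expected = "$([char]0x010D)esky"

# Hypothetical temp file standing in for .\colours.cz.txt.
$path = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.txt'

# Write raw UTF-8 bytes, as -OutFile would for a UTF-8 page.
[IO.File]::WriteAllBytes($path, [Text.Encoding]::UTF8.GetBytes($expected))

# Read the file back, telling Get-Content explicitly how to decode it.
$content = Get-Content $path -Encoding UTF8 -Raw

Remove-Item $path
```

If the page were served in a different encoding, the value passed to -Encoding would have to change accordingly.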
This gets you equally far:
[Net.HttpWebRequest]$httpWebRequest = [Net.WebRequest]::Create('http://colours.cz/ucinkujici/')
[Net.HttpWebResponse]$httpWebResponse = $httpWebRequest.GetResponse()
# StreamReader decodes the stream as UTF-8 by default
$reader = New-Object IO.StreamReader($httpWebResponse.GetResponseStream())
$content = $reader.ReadToEnd()
$reader.Close()
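The StreamReader approach works because the reader decodes UTF-8 unless told otherwise. As a rough sketch, the encoding can also be passed explicitly, which is the knob to turn for pages in another encoding. The MemoryStream below is an assumption standing in for the response stream so the example runs without network access.

```powershell
$utf8 = [Text.Encoding]::UTF8

# Hypothetical sample bytes standing in for the HTTP response body.
$bytes  = $utf8.GetBytes("$([char]0x010D)esky")
$stream = New-Object IO.MemoryStream -ArgumentList (,$bytes)

# Second constructor argument selects the encoding used for decoding.
$reader = New-Object IO.StreamReader($stream, $utf8)
try {
    $content = $reader.ReadToEnd()
}
finally {
    # Disposing the reader also closes the underlying stream.
    $reader.Dispose()
}
```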
Should you want such an HtmlWebResponseObject, here is a way to fix up e.g. values from ParsedHtml in a more or less "readable" way after Invoke-WebRequest ($bad vs. $better):
Invoke-WebRequest http://colours.cz/ucinkujici -OutVariable htmlWebResponse
$bad = $htmlWebResponse.ParsedHtml.title
$better = [Text.Encoding]::UTF8.GetString([Text.Encoding]::Default.GetBytes($bad))
$bad = $htmlWebResponse.Links[7].outerHTML
$better = [Text.Encoding]::UTF8.GetString([Text.Encoding]::Default.GetBytes($bad))
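To see why the $bad to $better conversion works, it may help to reproduce the corruption offline. The sketch below is an illustration under assumptions, not the cmdlet's actual internals: it uses ISO-8859-1 as the wrong single-byte encoding (the code above uses [Text.Encoding]::Default, which is the ANSI code page in Windows PowerShell but UTF-8 in PowerShell 7, where the trick would change nothing), and the sample word is hypothetical.

```powershell
$utf8   = [Text.Encoding]::UTF8
$latin1 = [Text.Encoding]::GetEncoding('iso-8859-1')

# Hypothetical sample text, built from a code point to avoid script-encoding issues.
$original = "$([char]0x010D)esky"

# Simulate the bug: UTF-8 bytes decoded with a single-byte encoding.
$bad = $latin1.GetString($utf8.GetBytes($original))

# Repair: turn the garbled string back into the original bytes,
# then decode those bytes as UTF-8.
$better = $utf8.GetString($latin1.GetBytes($bad))
```

The repair only succeeds when the wrong decoding lost no bytes, so it should be treated as a workaround rather than a general fix.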
Update: here is a new take on this, knowing that you want to work with ParsedHtml.
Once you have the content, you can do this:
$parsedHtml = New-Object -Com "HTMLFile"
$parsedHtml.IHTMLDocument2_write($content)
$parsedHtml.Close()
Et voilà :] E.g. $parsedHtml.title shows up correctly, so I am guessing the rest will be OK as well…