asp.net - How to write rich text to a Word document generated from an HTM file in C#
I am trying to generate a Word document from a saved HTML file using the Open XML library. If the HTML file does not contain an image, I can use the code below to write the text content to the Word document.
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(filename); // filename is the path to the .htm file
string detail = string.Empty;
string webdata = string.Empty;
// Select the <body> element and keep only its text content
HtmlNode hcollection = doc.DocumentNode.SelectSingleNode("//body");
detail = hcollection.InnerText;
But if the HTML file contains an embedded image, I am struggling to include the image in the Word document.
Using hcollection.InnerText
writes only the text part and excludes the image.
When I use
HtmlNode hcollection = doc.DocumentNode.SelectSingleNode("//body"); detail = hcollection.InnerHtml;
all of the HTML tags are written to the Word document, along with the path of the image in the <img> tag:
<table border='0' width='100%' cellpadding='0' cellspacing='0' align='center'> <tr><td valign='top' align="left"> <div style='width:100%'><div id="div_img"> <div> <img src="http://www.myweb.com/web/img/2013/07/18/img_1.jpg"> <span>sample text</span></div></div><br><br>sample text content here<br><br> </div></td></tr></table>
How can I remove the HTML tags and, instead of showing the path in
<img src="http://www.myweb.com/web/img/2013/07/18/img_1.jpg">
have the corresponding picture loaded into the document?
Please help.
You'll need to look at the HTML and translate it to OpenXML somehow.
I've used the HtmlToOpenXml open-source library (license), and it works well enough. It should handle images (inline, local or remote) and correctly insert them into the OpenXML document. I submitted a patch that was accepted, so the project is still active.
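As a rough illustration of the approach above, the following sketch feeds the raw HTML to HtmlToOpenXml and appends the result to a new document. It assumes the HtmlToOpenXml and DocumentFormat.OpenXml NuGet packages, and an API where HtmlConverter takes a MainDocumentPart and exposes Parse(string); older builds use the NotesFor.HtmlToOpenXml namespace, and newer releases may differ. The class and method names (HtmlToDocx, Convert) are hypothetical.

```csharp
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml; // assumption: older builds use NotesFor.HtmlToOpenXml

public static class HtmlToDocx
{
    // Hypothetical helper: converts the HTML file at htmlPath into a .docx at docxPath.
    public static void Convert(string htmlPath, string docxPath)
    {
        // Pass the full markup (not InnerText) so <img> tags reach the converter.
        string html = File.ReadAllText(htmlPath);

        using (var stream = new MemoryStream())
        {
            using (WordprocessingDocument package = WordprocessingDocument.Create(
                       stream, WordprocessingDocumentType.Document))
            {
                // Create an empty document body to append to.
                MainDocumentPart mainPart = package.AddMainDocumentPart();
                mainPart.Document = new Document(new Body());

                // The converter needs the main part so it can attach image parts to it.
                var converter = new HtmlConverter(mainPart);

                // Parse returns the generated OpenXML block elements.
                foreach (OpenXmlCompositeElement element in converter.Parse(html))
                {
                    mainPart.Document.Body.Append(element);
                }

                mainPart.Document.Save();
            }

            // The package must be closed before the stream content is complete.
            File.WriteAllBytes(docxPath, stream.ToArray());
        }
    }
}
```

Calling HtmlToDocx.Convert(filename, "output.docx") would then replace the InnerText/InnerHtml step from the question; remote images are only embedded if the machine running the code can reach their URLs.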
There are some limitations to the library, though:
JavaScript (<script>), CSS (<style>), <meta> and other unsupported tags do not generate an error; they are simply ignored.
It does handle inline style information, but it entirely ignores other CSS, which I needed. I ended up integrating simple parsing of a single <style> element from another open-source project (JsonFx, using the MIT license).
Note: handling multiple <style> elements, downloading CSS files, and sorting out which style rules take precedence are problems I did not address.
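To make the single <style> workaround more concrete, here is a minimal sketch of one way to fold simple rules from one <style> element into inline style attributes before conversion. It is not the JsonFx-based parser mentioned above. It assumes HtmlAgilityPack, handles only bare tag selectors such as p or td, and ignores CSS comments, grouped selectors and precedence. The names StyleInliner and InlineSingleStyleElement are hypothetical.

```csharp
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class StyleInliner
{
    // Matches "selector { declarations }" for rules without nested braces.
    private static readonly Regex Rule = new Regex(@"([^{}]+)\{([^{}]*)\}");

    // Matches a bare tag selector such as "p" or "h1".
    private static readonly Regex TagSelector = new Regex(@"^[A-Za-z][A-Za-z0-9]*$");

    // Hypothetical helper: copies tag-level rules from the first <style>
    // element into each matching element's style attribute.
    public static string InlineSingleStyleElement(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode style = doc.DocumentNode.SelectSingleNode("//style");
        if (style == null)
        {
            return html; // nothing to inline
        }

        foreach (Match m in Rule.Matches(style.InnerText))
        {
            string selector = m.Groups[1].Value.Trim();
            string declarations = m.Groups[2].Value.Trim().TrimEnd(';');

            // Assumption: skip anything that is not a bare tag selector.
            if (!TagSelector.IsMatch(selector))
            {
                continue;
            }

            HtmlNodeCollection nodes =
                doc.DocumentNode.SelectNodes("//" + selector.ToLowerInvariant());
            if (nodes == null)
            {
                continue; // SelectNodes returns null when nothing matches
            }

            foreach (HtmlNode node in nodes)
            {
                // Sheet rule first, so the element's own inline style comes later.
                string existing = node.GetAttributeValue("style", "");
                node.SetAttributeValue("style", declarations + ";" + existing);
            }
        }

        style.Remove(); // the converter ignores <style> anyway
        return doc.DocumentNode.OuterHtml;
    }
}
```

The returned markup can then be handed to the HTML-to-OpenXML converter, which reads the inline style attributes.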