C# - How can I get this with XPath?
I'm writing a crawler for one of the sites and came across a problem.
From this HTML:

```html
<div class="price">
    <span style="font-size: 14px; text-decoration: line-through; color: #444;">195.90 usd</span>
    <br />
    131.90 usd
</div>
```

I need to get 131.90 usd using XPath.
I tried `//div[@class='price']`, but it returns a different result (the whole `<div>`, not just the price). How can I achieve this?
Edit:

I'm using C# code (simplified demonstration):
```csharp
protected override DealDictionary GrabData(HtmlAgilityPack.HtmlDocument html)
{
    var price = Helper.GetInnerHtml(html.DocumentNode, "//div[@class='price']/text()");
    // ... rest of the method (building and returning the DealDictionary) omitted
}
```
Helper class:
```csharp
using System.Linq;
using HtmlAgilityPack;

public static class Helper
{
    // TrimHtml() is the asker's own string extension method (not shown).
    public static string GetInnerText(HtmlDocument doc, string xpath)
    {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0)
        {
            var node = nodes[0];
            return node.InnerText.TrimHtml();
        }
        return string.Empty;
    }

    public static string GetInnerText(HtmlNode inputNode, string xpath)
    {
        var nodes = inputNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0)
        {
            var node = nodes[0];
            // Remove HTML comments from the node before reading its text.
            var comments = node.ChildNodes.OfType<HtmlCommentNode>().ToList();
            foreach (var comment in comments)
                comment.ParentNode.RemoveChild(comment);
            return node.InnerText.TrimHtml();
        }
        return string.Empty;
    }

    public static string GetInnerHtml(HtmlDocument doc, string xpath)
    {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0)
        {
            var node = nodes[0];
            return node.InnerHtml.TrimHtml();
        }
        return string.Empty;
    }

    public static string GetInnerHtml(HtmlNode inputNode, string xpath)
    {
        var nodes = inputNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0)
        {
            var node = nodes[0];
            return node.InnerHtml.TrimHtml();
        }
        return string.Empty;
    }
}
```
Let's start with the XPath you tried:
`//div[@class='price']` selects `<div>` elements in the document, restricting the selection to those `<div>` elements whose `class` attribute has the value `price`.
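To illustrate that `[@class='price']` compares the whole attribute value, here is a minimal sketch. It uses .NET's built-in `System.Xml` (`XmlDocument.SelectNodes`) instead of HtmlAgilityPack so it runs without a NuGet package; HtmlAgilityPack's `SelectNodes` also takes XPath 1.0 expressions. The sample markup and the `ClassPredicateDemo` names are hypothetical.

```csharp
using System.Xml;

// Sketch only: made-up markup showing exact vs. token matching on @class.
public static class ClassPredicateDemo
{
    private const string Sample =
        "<root>" +
        "<div class=\"price\">a</div>" +
        "<div class=\"price old\">b</div>" +
        "<div class=\"prices\">c</div>" +
        "</root>";

    // Exact match: only elements whose class attribute is exactly "price".
    public const string ExactClass = "//div[@class='price']";

    // Token match: any div whose space-separated class list contains "price".
    public const string TokenClass =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' price ')]";

    // Loads the sample markup and counts the nodes matched by xpath.
    public static int CountMatches(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes(xpath)?.Count ?? 0;
    }
}
```

On this sample, `ExactClass` matches one `<div>` and `TokenClass` matches two (`price` and `price old`, but not `prices`). The exact form is fine as long as the site only ever puts a single class on the price `<div>`.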
So far, so good, but this selects the `<div>` element itself, including all of its contents.
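A quick sketch of what selecting the `<div>` itself gives you, again with `System.Xml` standing in for HtmlAgilityPack (this works only because the question's fragment, written here on one line, is well-formed XML; `WholeDivDemo` is a hypothetical name). The element's `InnerText` concatenates all descendant text, so the struck-through price comes back together with the current one.

```csharp
using System.Xml;

// Sketch only: what "//div[@class='price']" alone returns for the fragment.
public static class WholeDivDemo
{
    // The fragment from the question, on one line.
    public const string Fragment =
        "<div class=\"price\"> <span style=\"font-size: 14px; text-decoration: line-through; color: #444;\">195.90 usd</span> <br /> 131.90 usd </div>";

    // Returns the InnerText of the matched <div>: every descendant text node
    // concatenated, including the text inside the <span>.
    public static string DivText()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Fragment);
        var div = doc.SelectSingleNode("//div[@class='price']");
        return div == null ? string.Empty : div.InnerText;
    }
}
```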
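In the fragment shown above, you have the following hierarchical structure:

```
<div> element
    <span> element
    text node
    <br> element
    text node
```

So you are interested in the latter text node. You can use `text()` in XPath to select text nodes: `//div[@class='price']/text()` selects every text node that is an immediate child of the `<div>` element found. The whitespace between tags can also produce whitespace-only text nodes (for example, the one before `<span>`), so taking the first match does not necessarily give you the price. Since the price is the last of these text nodes, the XPath should be:

```
//div[@class='price']/text()[last()]
```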
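A minimal sketch that checks the expression against the question's fragment (written on one line). It assumes the fragment is well-formed XML so that `System.Xml.XmlDocument` can stand in for HtmlAgilityPack; `PriceXPathDemo` and `FirstText` are hypothetical names. Besides `text()[last()]`, it shows two alternatives that give the same result for this markup: skipping whitespace-only text nodes with `normalize-space()`, and taking the first text node after the `<br>`.

```csharp
using System.Xml;

// Sketch only: System.Xml stands in for HtmlAgilityPack because this
// particular fragment is well-formed XML. Names are hypothetical.
public static class PriceXPathDemo
{
    // The fragment from the question, on one line.
    public const string Fragment =
        "<div class=\"price\"> <span style=\"font-size: 14px; text-decoration: line-through; color: #444;\">195.90 usd</span> <br /> 131.90 usd </div>";

    // Last text node that is a direct child of the div.
    public const string LastTextChild = "//div[@class='price']/text()[last()]";

    // Direct text children that are not whitespace-only.
    public const string NonBlankTextChild = "//div[@class='price']/text()[normalize-space()]";

    // First text node following the <br> at the same level.
    public const string TextAfterBr = "//div[@class='price']/br/following-sibling::text()[1]";

    // Loads the fragment, evaluates xpath, and returns the trimmed text of the
    // first matching node, or an empty string when nothing matches.
    public static string FirstText(string xpath)
    {
        // PreserveWhitespace keeps whitespace-only text nodes in the tree,
        // closer to what an HTML parser builds.
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml(Fragment);
        var node = doc.SelectSingleNode(xpath);
        return node == null ? string.Empty : node.InnerText.Trim();
    }
}
```

All three expressions yield `131.90 usd` here. With HtmlAgilityPack, the same expression can be passed unchanged to `SelectSingleNode` or to the helper from the question; the matched text node still carries the surrounding spaces, which the helper's trimming step removes.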