C# - How can I get this with XPath?
I'm writing a crawler for one of the sites, and came across a problem.
From this HTML...
<div class="price"> <span style="font-size: 14px; text-decoration: line-through; color: #444;">195.90 usd</span> <br /> 131.90 usd </div>
I need to get 131.90 usd using XPath.
I tried this...
"//div[@class='price']"
but it returns a different result.
How can I achieve this?
Edit
I'm using this C# code (simplified for demonstration):
protected override DealDictionary GrabData(HtmlAgilityPack.HtmlDocument html)
{
    var price = Helper.GetInnerHtml(html.DocumentNode, "//div[@class='price']/text()");
}
Helper class:
public static class Helper
{
    // Inner text of the first node matching xpath, or an empty string.
    public static string GetInnerText(HtmlDocument doc, string xpath)
    {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0)
        {
            var node = nodes[0];
            return node.InnerText.TrimHtml();
        }
        return string.Empty;
    }

    // Same as above, relative to a node; comment nodes are removed first.
    public static string GetInnerText(HtmlNode inputNode, string xpath)
    {
        var nodes = inputNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0)
        {
            var node = nodes[0];
            var comments = node.ChildNodes.OfType<HtmlCommentNode>().ToList();
            foreach (var comment in comments)
                comment.ParentNode.RemoveChild(comment);
            return node.InnerText.TrimHtml();
        }
        return string.Empty;
    }

    // Inner HTML of the first node matching xpath, or an empty string.
    public static string GetInnerHtml(HtmlDocument doc, string xpath)
    {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0)
        {
            var node = nodes[0];
            return node.InnerHtml.TrimHtml();
        }
        return string.Empty;
    }

    public static string GetInnerHtml(HtmlNode inputNode, string xpath)
    {
        var nodes = inputNode.SelectNodes(xpath);
        if (nodes != null && nodes.Count > 0)
        {
            var node = nodes[0];
            return node.InnerHtml.TrimHtml();
        }
        return string.Empty;
    }
}
The XPath you tried is a good start:
//div[@class='price']
This selects any <div> element in the document, restricted to those <div> elements that have a class attribute whose value is price.
So far, so good. But when you select a <div> element, what you get back is the whole <div> element, including all of its contents.
In the XML fragment you show above, you have the following hierarchical structure:
<div> element
    <span> element
        text node
    <br> element
    text node
So, you are interested in the latter text node. You can use text() in XPath to select text nodes. Since in this case you are interested in the first text node that is an immediate child of the <div> element you found, your XPath should look like this:
//div[@class='price']/text()
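One caveat, sketched below under assumptions: in the HTML from the question there is whitespace between the tags, and Html Agility Pack appears to keep whitespace-only text nodes, so taking nodes[0] as the Helper class does may return a blank node rather than the price. The following minimal example (PriceDemo and GetCurrentPrice are hypothetical names, not part of the original code) skips blank text nodes and returns the first one with actual content. It assumes the HtmlAgilityPack NuGet package is installed.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class PriceDemo
{
    // Hypothetical helper: returns the first non-blank text node that is a
    // direct child of <div class="price">, or an empty string if none is found.
    public static string GetCurrentPrice(HtmlDocument doc)
    {
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='price']/text()");
        if (nodes == null)
            return string.Empty; // SelectNodes returns null when nothing matches

        // Decode entities, trim whitespace, and skip whitespace-only nodes.
        var text = nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .FirstOrDefault(t => t.Length > 0);

        return text ?? string.Empty;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class=\"price\"> " +
            "<span style=\"text-decoration: line-through;\">195.90 usd</span> " +
            "<br /> 131.90 usd </div>");

        Console.WriteLine(GetCurrentPrice(doc));
    }
}
```

An alternative that should do the same filtering inside the XPath itself is //div[@class='price']/text()[normalize-space()], which keeps only text nodes that contain something other than whitespace. The struck-through price is not affected either way, because its text node is a child of the <span>, not of the <div>.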