C# - How can I get this with XPath?


I'm writing a crawler for one of the sites and came across a problem.

From this HTML...

    <div class="price">
        <span style="font-size: 14px; text-decoration: line-through; color: #444;">195.90 usd</span>
        <br />
        131.90 usd
    </div>

I need to get 131.90 usd using XPath.

I tried this...

"//div[@class='price']" 

but it returns a different result.

How can I achieve this?

Edit

I'm using this C# code (a simplified demonstration):

    protected override DealDictionary GrabData(HtmlAgilityPack.HtmlDocument html)
    {
        var price = Helper.GetInnerHtml(html.DocumentNode, "//div[@class='price']/text()");
        // ... rest of the method omitted in this simplified demonstration
    }

Helper class:

    // Note: TrimHtml() is the asker's own extension method; its definition is not shown.
    public static class Helper
    {
        // Inner text of the first node matching the XPath, searched from the document root.
        public static string GetInnerText(HtmlDocument doc, string xpath)
        {
            var nodes = doc.DocumentNode.SelectNodes(xpath);
            if (nodes != null && nodes.Count > 0)
            {
                var node = nodes[0];
                return node.InnerText.TrimHtml();
            }
            return string.Empty;
        }

        // Same as above, but searched from a given node; comment nodes are removed first.
        public static string GetInnerText(HtmlNode inputNode, string xpath)
        {
            var nodes = inputNode.SelectNodes(xpath);
            if (nodes != null && nodes.Count > 0)
            {
                var node = nodes[0];
                var comments = node.ChildNodes.OfType<HtmlCommentNode>().ToList();
                foreach (var comment in comments)
                    comment.ParentNode.RemoveChild(comment);

                return node.InnerText.TrimHtml();
            }
            return string.Empty;
        }

        // Inner HTML of the first node matching the XPath, searched from the document root.
        public static string GetInnerHtml(HtmlDocument doc, string xpath)
        {
            var nodes = doc.DocumentNode.SelectNodes(xpath);
            if (nodes != null && nodes.Count > 0)
            {
                var node = nodes[0];
                return node.InnerHtml.TrimHtml();
            }
            return string.Empty;
        }

        // Inner HTML of the first node matching the XPath, searched from a given node.
        public static string GetInnerHtml(HtmlNode inputNode, string xpath)
        {
            var nodes = inputNode.SelectNodes(xpath);
            if (nodes != null && nodes.Count > 0)
            {
                var node = nodes[0];
                return node.InnerHtml.TrimHtml();
            }
            return string.Empty;
        }
    }

The XPath you tried to start with:

//div[@class='price'] 

This selects any <div> element in the XML document, and restricts the selection to <div> elements that have a class attribute with the value price.

So far, so good. But this selects the <div> element itself, so what you get is the <div> element including all of its contents.

In the XML fragment you show above, you have the following hierarchical structure:

    <div> element
        <span> element
            text node
        <br> element
        text node
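The hierarchy above can be checked by listing the direct children of the <div>. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name ChildNodeDump and the method DescribeChildren are made up for illustration. Whitespace between tags is usually parsed as text nodes of its own, so the sketch skips whitespace-only text nodes to match the outline.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original question.
public static class ChildNodeDump
{
    // Returns one "name: text" entry per direct child of <div class="price">,
    // skipping text nodes that contain only whitespace.
    public static List<string> DescribeChildren(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        var div = doc.DocumentNode.SelectSingleNode("//div[@class='price']");
        if (div == null)
            return result;

        foreach (var child in div.ChildNodes)
        {
            // Indentation and line breaks between tags show up as text nodes too.
            if (child.NodeType == HtmlNodeType.Text &&
                string.IsNullOrWhiteSpace(child.InnerText))
                continue;

            result.Add($"{child.Name}: {child.InnerText.Trim()}");
        }
        return result;
    }

    public static void Main()
    {
        const string html =
            "<div class=\"price\">\n" +
            "    <span style=\"text-decoration: line-through;\">195.90 usd</span>\n" +
            "    <br />\n" +
            "    131.90 usd\n" +
            "</div>";

        foreach (var line in DescribeChildren(html))
            Console.WriteLine(line);
    }
}
```

Text nodes are reported under the name #text, which makes it easy to see that the price you want sits directly under the <div>, not inside the <span>.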

So, you are interested in the latter text node. You can use text() in XPath to select text nodes. In this case you want the text node that is an immediate child of the <div> element you found, not the one inside the <span>, so your XPath should look like this:

//div[@class='price']/text() 
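As a rough illustration of using this expression from C#, here is a sketch that assumes the HtmlAgilityPack NuGet package is referenced; PriceExtractor and ExtractPrice are hypothetical names, not part of the question's code. One caveat: text() matches every text node directly under the <div>, and an HTML parser usually keeps the whitespace between tags as text nodes. Taking nodes[0] may therefore give you blank text. The sketch adds the standard XPath 1.0 predicate [normalize-space()] to skip whitespace-only nodes.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, shown only to illustrate the XPath above.
public static class PriceExtractor
{
    // Returns the first non-blank text node that is a direct child of
    // <div class="price">, trimmed, or an empty string if nothing matches.
    public static string ExtractPrice(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // text() only looks at direct children, so the <span> content is not matched.
        // [normalize-space()] filters out text nodes that are whitespace only.
        var node = doc.DocumentNode.SelectSingleNode(
            "//div[@class='price']/text()[normalize-space()]");

        return node == null ? string.Empty : node.InnerText.Trim();
    }

    public static void Main()
    {
        const string html =
            "<div class=\"price\">\n" +
            "    <span style=\"text-decoration: line-through;\">195.90 usd</span>\n" +
            "    <br />\n" +
            "    131.90 usd\n" +
            "</div>";

        Console.WriteLine(ExtractPrice(html));
    }
}
```

The same expression can be passed to the existing Helper.GetInnerText overloads in place of the plain text() version, since they already take the first matching node.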
