ruby - Need help to locate the text of an element with a class?
I have a file that I got using the command page.css("table.vc_result span a"), but I am not able to get the second and third span elements of the file:
file
<table border="0" bgcolor="#ffffff" onmouseout="resdef(this)" onmouseover="resemp(this)" class="vc_result"> <tbody> <tr> <td width="260" valign="top"> <table> <tbody> <tr> <td width="40%" valign="top"><span><a class="caddname" href="/usa/illinois/chicago/yellow+page+advertising+and+telephone+directory+publica/gateway-megatech_13478733"> gateway megatech</a></span><br> <span class="caddtext">p.o. box 99682, chicago il 60696</span></td> </tr> <tr> <td><span class="caddtext">cook county illinois</span></td> </tr> <tr> <td><span class="caddcategory">yellow page advertising , telephone directory publica chicago</span></td> </tr> </tbody> </table> </td> <td width="260"> <table align="center"> <tbody> <tr> <td> <table> <tbody> <tr> <td> <div style= "background: url('images/listings.png');background-position: -0px -0px; width: 16px; height: 16px"> </div> </td> <td><font style="font-weight:bold">847-506-7800</font></td> </tr> </tbody> </table> </td> </tr> <tr> <td> <table> <tbody> <tr> <td> <div style= "background: url('images/listings.png');background-position: -0px -78px; width: 16px; height: 16px"> </div> </td> <td><a href= "/usa/illinois/chicago/yellow+page+advertising+and+telephone+directory+publica/gateway-megatech_13478733" class="caddnearby">businesses near 60696</a></td> </tr> </tbody> </table> </td> </tr> <tr> <td> <table> <tbody> <tr> <td></td> </tr> </tbody> </table> </td> </tr> </tbody> </table> </td> </tr> </tbody> </table>
...this is not the complete file; there are plenty more span entries in the file.
The code I am using is able to locate the exact text, but I am not able to associate it with the text of the elements next to the nested span a.
    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'

    name  = "yellow"
    city  = "chicago"
    state = "il"
    burl  = "http://www.sitename.com/"
    url   = "#{burl}business_listings.php?name=#{name}&city=#{city}&state=#{state}&current=1&submit=search"

    page = Nokogiri::HTML(URI.open(url))
    rows = page.css("table.vc_result span a")
    rows.each do |arow|
      if arow.text == "gateway megatech"
        puts(arow.next_element.text)
        puts("capturing next span text")
        found = "got it"
        break
      else
        puts("found nothing")
        found = "none"
      end
    end
Assuming each business is a new <tr> inside the top table you have supplied, the following code gives you an array of hashes with the values:
    require 'nokogiri'

    # html holds the markup shown in the question
    doc = Nokogiri.HTML(html)
    business_rows = doc.css('table.vc_result > tbody > tr')
    details = business_rows.map do |tr|
      # Inside the first <td> of the row, find a <td> with a.caddname in it.
      # Note the leading dot in .//a: it keeps the search inside that <td>.
      business = tr.at_xpath('td[1]//td[.//a[@class="caddname"]]')
      name     = business.at_css('a.caddname').text.strip
      address  = business.at_css('.caddtext').text.strip

      # Inside the second <td> of the row, find the first <font> tag
      phone = tr.at_xpath('td[2]//font').text.strip

      # Return a hash of values for this row
      { name: name, address: address, phone: phone }
    end

    p details
    #=> [
    #=>   {
    #=>     :name=>"gateway megatech",
    #=>     :address=>"p.o. box 99682, chicago il 60696",
    #=>     :phone=>"847-506-7800"
    #=>   }
    #=> ]
This is pretty fragile, but it works for what you've given, and there do not seem to be many semantic items to hang onto in this insane, horrific abuse of HTML.