Nokogiri is a new Ruby gem: an HTML, XML, SAX and Reader parser.
It parses and searches XML/HTML faster than Hpricot (the current de facto Ruby HTML parser). It also offers XPath support, CSS3 selector support (a big deal, because CSS3 selectors are mega powerful), and it can be used as a "drop-in" replacement for Hpricot.
In an Hpricot vs Nokogiri benchmark, Nokogiri was 7 times faster at initially loading an XML document, 5 times faster at searching for content with an XPath expression, and 1.62 times faster at searching for content with a CSS selector.
Here is an example:
require 'nokogiri'
require 'open-uri'
# Get a Nokogiri::HTML::Document for the page we're interested in...
# (URI.open comes from open-uri; plain Kernel#open no longer fetches URLs as of Ruby 3.0.)
doc = Nokogiri::HTML(URI.open('http://www.google.com/search?q=tenderlove'))
# Do funky things with it using Nokogiri::XML::Node methods...
# Search for nodes by CSS
doc.css('h3.r a.l').each do |link|
  puts link.content
end
####
# Search for nodes by XPath
doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end
####
# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end
Source: http://nokogiri.org/