So: XPath or CSS selector as the scrape map.

Extraction from the DOM
clipped from

Firequark: quick HTML screen scraping

Firequark is an extension to Firebug that aids HTML screen scraping. Using Firebug (a web development plugin for Firefox), it automatically extracts the CSS selector for one or more HTML nodes on a web page. The generated CSS selector can be given as input to HTML screen scrapers such as Scrapi to extract information. Firequark is built to unleash the power of CSS selectors for HTML screen scraping.

HTML screen scraping is a common technique for extracting information about specific, useful elements from a web page. Regardless of programming language, to extract an element from a web page one needs to know its exact location or a key that uniquely identifies it. There are two approaches to uniquely identifying an element: XPath and CSS selectors.
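To make the two approaches concrete, here is a minimal sketch in Python using only the standard library. It is not how Firequark or Scrapi work internally. The page fragment, the `matches` and `select` helpers, and the selector strings are all made up for illustration. The CSS matcher is a toy that understands only `tag#id.class` parts joined by spaces, and ElementTree supports only a subset of XPath. A real scraper would more likely use a dedicated library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page fragment, kept well-formed so the stdlib XML parser accepts it.
PAGE = """
<html><body>
  <div id="product">
    <h1 class="title">Widget</h1>
    <span class="price">9.99</span>
  </div>
  <div id="footer"><span class="price">ignore me</span></div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Approach 1: locate the element with an XPath expression
# (ElementTree implements a limited XPath subset).
xpath_hit = root.find(".//div[@id='product']/span[@class='price']")


def matches(elem, compound):
    """Check one compound selector such as 'div#product' or 'span.price'."""
    m = re.fullmatch(r"(\w+)?(?:#([\w-]+))?((?:\.[\w-]+)*)", compound)
    if m is None:
        raise ValueError(f"unsupported selector part: {compound}")
    tag, elem_id, classes = m.groups()
    if tag and elem.tag != tag:
        return False
    if elem_id and elem.get("id") != elem_id:
        return False
    wanted = [c for c in classes.split(".") if c]
    have = elem.get("class", "").split()
    return all(c in have for c in wanted)


def select(node, selector):
    """Toy CSS engine: space-separated parts mean 'descendant of'."""
    parts = selector.split()
    found = []

    def walk(current, idx):
        for child in current:
            if matches(child, parts[idx]):
                if idx == len(parts) - 1:
                    # Skip elements already collected via another path.
                    if not any(child is e for e in found):
                        found.append(child)
                else:
                    walk(child, idx + 1)
            # Keep looking deeper for the same selector part.
            walk(child, idx)

    walk(node, 0)
    return found


# Approach 2: locate the same element with a CSS selector.
css_hits = select(root, "div#product span.price")

print(xpath_hit.text)
print([e.text for e in css_hits])
```

Both expressions single out the same `span` inside the product block and skip the similarly classed `span` in the footer, which is the "key to uniquely identify the element" the paragraph above describes.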

XPath for HTML screen scraping.
