curl-library

Re: Processing HTML

From: Lars Nilsson <chamaeleon_at_gmail.com>
Date: Mon, 18 Dec 2006 08:51:27 -0500

On 12/18/06, John Vorwald <john_vorwald_at_msn.com> wrote:
> I've used curl to read the URL's page into a std::string, and need to process
> data in tables on the page. Can anyone recommend a C++ HTML parser? I've
> tried a few, but have been unable to parse a sample page with three tables
> that displays in Firefox/IE. I'm using Dev-C++.

I've used libxml[1] in the past for this. SAX-style parsing worked
well enough for what I needed. It should also be able to build a
document tree, although depending on the HTML code the tree might not
turn out the way you want, I suppose.

Lars Nilsson

[1] http://xmlsoft.org/html/libxml-HTMLparser.html
Received on 2006-12-18