Re: what is the best way to extract urls from web page with?

From: Yaroslav Samchuk <>
Date: Tue, 6 Nov 2007 13:03:51 +0200

On 06.11.2007, at 12:27, <> wrote:

> Hello everybody,
> Please, I want to know how to extract URLs from a web page with C++.
> I first thought of using a regex, but that way I can maybe only
> extract the first URL? Then I thought of handling my web page as an
> HTML tree, like TreeBuilder in Perl. A friend says that doing it this
> way is slow. Otherwise I don't know what kind of tools can extract
> URLs from my page as a tree.
> What is the best way? If it is to handle the page as a tree, what
> kind of simple library could I use, please?

I think any regular expression engine provides searching
functionality. You can use boost::regex_search, for example. Here is
an example,
or check this one (it contains a reference to the examples page).
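
To illustrate how a search call can return every URL rather than only
the first, here is a minimal sketch of my own, separate from the linked
examples. It assumes a C++11 compiler and uses std::regex_search, which
is used the same way as boost::regex_search in this case. The function
name extract_urls and the href pattern are illustrative assumptions; the
pattern only handles quoted href attributes and is not a full HTML
parser.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every quoted href value in an HTML string by calling
// std::regex_search repeatedly, resuming after the previous match.
std::vector<std::string> extract_urls(const std::string& html) {
    // Simplified pattern (an assumption): href="..." or href='...',
    // matched case-insensitively. Unquoted attributes are not handled.
    static const std::regex href_re(R"(href\s*=\s*["']([^"']+)["'])",
                                    std::regex::icase);
    std::vector<std::string> urls;
    std::smatch m;
    auto begin = html.cbegin();
    const auto end = html.cend();
    while (std::regex_search(begin, end, m, href_re)) {
        urls.push_back(m[1].str());  // capture group 1 holds the URL text
        begin = m[0].second;         // continue just past this match
    }
    return urls;
}
```

Calling extract_urls on the page body that libcurl delivered (for
example, a std::string filled by your write callback) returns the links
in document order. The loop is what answers the "only the first url"
worry: each search starts where the previous match ended.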

I think there's no need to build a tree unless you really need it.

Oh! If you're using C, then try PCRE and look at the
pcre_get_substring_list function.

Yaroslav Samchuk
Received on 2007-11-06