curl-library

RE: How to achieve high bandwidth utilization when downloading web-pages with libcurl?

From: Gary Maxwell <gmaxwell_at_broadsoft.com>
Date: Sat, 17 Dec 2011 12:51:26 -0800

> From: Alex
> Sent: Saturday, December 17, 2011 11:57
>
> I'm new to libcurl and network programming

Welcome! For curl and libcurl, I would start by reading the FAQ at http://curl.haxx.se/docs/faq.html
It answers many common questions and offers guidance on dealing with problems.

> So, I have a program written in C++ with libcurl, that basically crawls
> the web, downloading the pages. I also have a pretty good internet
> connection, which easily gives me 4-5 MB/s download speed. However, I
> can't squeeze any decent network utilization out of my program.
> Downloading a single page at a time, I rarely see speeds in excess of
> 300-500 KB/s.

First, check whether your version of libcurl is outdated. If it is, download
the latest release for your platform from the downloads page.

Then, see if you can reproduce the issue using the curl command line tool.
If the performance is still bad, the problem is probably neither in your
application nor in libcurl.

If the curl command gives the performance you expect, then the problem is in
your application. If rebuilding the application against the latest libcurl
does not help, then enable libcurl debugging and perform a packet capture
to diagnose the issue.

> The only idea I have so far is to download multiple pages
> from different servers simultaneously from multiple threads.

I would first solve the problem in the single-request case, to keep things
simple.

> I've implemented that, but network utilization graph still doesn't
> look good: it has rare short spikes up to 1.5-2 MB/s with 0 in
> between the spikes. I've tried different numbers of threads, from 5
> to 15, but didn't see any visible changes whatsoever.

Once you solve the problem, take a look at libcurl's multi interface, which
supports parallelism without requiring additional threads.

Cheers,
Gary @ Broadsoft

Received on 2011-12-17