curl-users

Re: Limits to curl command or how to download a long list of URL.

From: Dan Fandrich <dan_at_coneharvesters.com>
Date: Sun, 4 Jan 2015 22:32:44 +0100

On Sun, Jan 04, 2015 at 02:22:07PM -0200, Rodrigo Zanatta Silva wrote:
> I have a (really) big list of URLs to download. Every file is a small HTML
> page, about 1 KB in size. One easy approach was to download them one by one
> with a big bash script.
>
> But I discovered it can be faster if I send several URLs at once to curl.
> Maybe because it needs to start a connection, download and close the
> connection for each one; when I send several at the same time, curl does
> these things as fast as possible.

Exactly. This is due to persistent connections and fewer fork/exec calls &
initialization time.
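As a small illustration, curl accepts many URLs in a single invocation and writes each body to stdout (or to files with -O); connections to the same host are reused where possible. The file:// URLs below are made up so the sketch runs without a network:

```shell
# Create two tiny local files standing in for remote pages.
printf 'one\n' > /tmp/a.txt
printf 'two\n' > /tmp/b.txt

# One curl process, several URLs: a single fork/exec, and for real
# http:// URLs on the same host, a reused persistent connection.
curl -s file:///tmp/a.txt file:///tmp/b.txt
```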

> So one strategy is to use braces. My URLs are not numeric and don't follow
> an easy pattern, so I will create a big command file with
>
> curl http://site.{one,two,three}.com
>
> How long can my command be?

This is up to the shell and OS kernel. It could be anything from 1 KB to 1 MB (or
more). If you use the xargs program to supply the arguments, it has that limit
built in and won't pass more arguments than the system allows.
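A quick way to see the kernel limit and how xargs batches arguments (the counts below are just for illustration):

```shell
# Ask the system for the combined size limit on argv + environment.
getconf ARG_MAX

# xargs packs as many arguments per invocation as fit (or as many as
# -n allows), splitting the input across several runs. Here, 5000
# inputs with at most 1000 per run yield 5 invocations of echo,
# hence 5 output lines:
seq 5000 | xargs -n 1000 echo | wc -l
```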

> OSX Yosemite 10.10.1 
> Bash: version 3.2.53(1)-release
> curl: curl 7.37.1 (x86_64-apple-darwin14.0) libcurl/7.37.1 SecureTransport zlib
> /1.2.5
>
> (Hmm... maybe I need to update them; does the limit change with newer bash
> and curl?)
>
> Is there another strategy? Maybe a file of URLs. How can I configure curl to
> download all the URLs in a txt file, one URL per line (in an efficient way,
> not by transforming the file so that curl downloads them one by one)?

Providing the URLs in a file is essentially equivalent to providing them on the
command line, but without the potential size limits. Just put all the options
into a file and pass it in with the --config option.
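A sketch of that approach; the file names urls.txt and curl.cfg are made up for this example:

```shell
# A hypothetical input file, one URL per line.
printf '%s\n' http://site.one.com http://site.two.com > urls.txt

# Wrap each line as a 'url = "..."' entry in curl's config syntax.
sed 's|^|url = "|; s|$|"|' urls.txt > curl.cfg
cat curl.cfg

# One curl process then fetches every entry, saving each under its
# remote file name, with connections reused across URLs:
#   curl --remote-name-all --config curl.cfg
```

The actual fetch is left commented out here since it needs network access, but a single process handling the whole list is exactly what avoids the per-URL startup cost.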

> And I will use several threads, so I can start several commands at the same
> time, and every command will use the strategy I am asking about now.

That's another good approach. The parallel command or GNU xargs -P option can
help with this, too.
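For instance, with GNU or BSD xargs the real invocation might look like `xargs -P 4 -n 100 curl -O < urls.txt` (urls.txt is a made-up name): up to 4 curl processes at once, each handed 100 URLs. The batching itself can be demonstrated without a network:

```shell
# 200 inputs, at most 25 per invocation, up to 4 workers in
# parallel: xargs runs the command 200/25 = 8 times.
seq 200 | xargs -n 25 -P 4 sh -c 'echo "$# urls in this batch"' sh
```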

>>> Dan
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-users
FAQ: http://curl.haxx.se/docs/faq.html
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2015-01-04