
curl-library

Re: libcurl question, Range support for http

From: Goswin von Brederlow <brederlo_at_informatik.uni-tuebingen.de>
Date: Fri, 26 Nov 2004 14:28:27 +0100

"Guillaume Arluison" <ga_at_lacde.net> writes:

>> > for url in urls
>> > do
>> > curlObj->set(url)
>> > curlObj->perform()
>> > do whatever you want with the data retrieved
>> > done
>>
>> Which sends one header, waits for the round robin, reads all the
>> data, sends a second header, waits for the round robin, reads, sends,
>> waits, reads, sends, waits, reads.
>>
>> Notice all that waiting? If I have to do 10000+ requests, all that
>> waiting accumulates into a severe performance penalty.
> No, actually, I don't notice it, sorry, because it doesn't happen.
> It depends on what you call 'round robin', but nevertheless, with any
> method of load balancing you use (DNS/cookie/whatever): once the TCP
> connection is made to the chosen server, it stays open for the next
> transactions (what you call send header/read data).
> That is the purpose of the keep-alive thing, and the difference between
> curl and a lot of other similar get programs is that curl doesn't cut
> the connection between performs!
>
> So unless you have a weird and inefficient load balancer, the pseudocode
> above will do:
> make a TCP connection / wait for your round robin
> send header

300ms for the header to travel through the net
a few ms to parse and send the data
300ms for the data to travel back

> read data
> send header

another 600+ms wait

> read data
> ...
>
> until either your server closes the connection (because of internal
> configuration / waiting too long / client disconnect / network problems,
> which may happen if you have 10000+ requests), but curl will still be
> able to recreate the connection when needed, or r-e-u-s-e the previous
> one if it is available.
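To be clear about what we are comparing: I agree the connection gets
reused. A minimal easy-interface version of that loop (URLs invented,
error checking dropped) would look like this:

#include <curl/curl.h>

int main(void)
{
    const char *urls[] = { "http://example.com/a",
                           "http://example.com/b",
                           "http://example.com/c" };
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;
    for (int i = 0; i < 3; i++) {
        curl_easy_setopt(curl, CURLOPT_URL, urls[i]);
        /* The handle keeps the connection alive between transfers,
           so only the first perform pays the connect cost; but each
           perform still blocks for one full request/response round
           trip before the loop can issue the next request. */
        curl_easy_perform(curl);
    }
    curl_easy_cleanup(curl);
    return 0;
}

The connection setup is paid once, yes. My problem is the full round
trip that every single perform() spends waiting before the next request
can go out.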

Let's do a little (exaggerated) example. Requests are for blocks, which
might be as small as 2K, and Range only allows a limited number of
ranges in one request; say the server only accepts 10. So I can request
20K of discontiguous blocks per header.
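For reference, libcurl can already ask for several intervals in one HTTP
request: CURLOPT_RANGE accepts a comma-separated list of ranges. A
sketch (URL and offsets invented):

#include <curl/curl.h>

int main(void)
{
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/image.iso");
    /* Five discontiguous 2K blocks in one request; byte ranges are
       inclusive. Whether the server honors all of them is up to the
       server. */
    curl_easy_setopt(curl, CURLOPT_RANGE,
                     "0-2047,4096-6143,8192-10239,12288-14335,16384-18431");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return 0;
}

A server that honors this replies 206 with a multipart/byteranges body,
so the client has to parse the multipart framing to get the blocks back.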

Now say I have the worst case of a 5GB DVD ISO image and I need every
other block. That is 5*1024*1024 KB / 20 KB per request / 2 = 131072
headers I need to send to the server.

With a 0.6s round-trip delay for each header that is nearly 22 hours
just waiting.
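Spelled out as a quick sanity check:

#include <stdio.h>

int main(void)
{
    long long image    = 5LL * 1024 * 1024 * 1024; /* 5GB ISO           */
    long long per_req  = 10 * 2 * 1024;            /* 10 ranges of 2K   */
    long long requests = image / per_req / 2;      /* every other block */
    printf("%lld requests, %.1f hours of waiting\n",
           requests, requests * 0.6 / 3600.0);     /* 0.6s RTT each */
    return 0;
}

That prints 131072 requests and 21.8 hours.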

I know a 2K block size for DVD ISOs is small, but I hope this shows my
point.

>> Which sends one header, waits for the round robin, reads all the
>> data, sends a second header, waits for the round robin, reads, sends, waits,
> Actually, with curl or any other program that internally does what you
> say (only one perform with 10000+ files), please show us how HTTP will
> handle this in only ONE header/read data, data, data, data.

It doesn't.

You just send the next header before the first one is finished.

send header 1
send header 2
send header 3
read reply 1
send header 4
read reply 2
...

As stated in RFC 2068: http://www.freesoft.org/CIE/RFC/2068/52.htm

| 8.1.1 Purpose [of persistent connections]
|
| HTTP requests and responses can be pipelined on a connection.
| Pipelining allows a client to make multiple requests without
| waiting for each response, allowing a single TCP connection to be
| used much more efficiently, with much lower elapsed time.
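As a rough illustration of what pipelining looks like on the wire (host,
path, and ranges are invented; no error handling, and a real client must
parse each response properly):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("example.com", "80", &hints, &res) != 0)
        return 1;
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0)
        return 1;
    freeaddrinfo(res);

    /* Queue three Range requests back-to-back without waiting for
       any reply; "Connection: close" only on the last one so the
       server ends the connection after the final response. */
    const char *ranges[] = { "0-2047", "4096-6143", "8192-10239" };
    for (int i = 0; i < 3; i++) {
        char req[256];
        snprintf(req, sizeof req,
                 "GET /image.iso HTTP/1.1\r\n"
                 "Host: example.com\r\n"
                 "Range: bytes=%s\r\n"
                 "%s\r\n",
                 ranges[i], i == 2 ? "Connection: close\r\n" : "");
        write(fd, req, strlen(req));
    }

    /* The replies arrive in request order on the same connection. */
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, stdout);
    close(fd);
    return 0;
}

All three requests go out immediately and the three responses stream
back in order on the one connection, so the batch pays roughly one round
trip instead of three.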

> Dont mix HTTP layer and TCP layer.

MfG
        Goswin
Received on 2004-11-26