
Re: debugging a crash in Curl_pgrsTime/checkPendPipeline?

From: <johansen_at_sun.com>
Date: Fri, 7 Aug 2009 14:14:44 -0700

On Fri, Aug 07, 2009 at 09:11:59PM +0200, Daniel Stenberg wrote:
> On Fri, 7 Aug 2009, johansen_at_sun.com wrote:
>> I wondered the same thing myself, but I don't have enough experience
>> to make a broad generalization. I don't see any code or documentation
>> that forbids users from passing to CURLOPT_WRITEDATA a FILE * that
>> has been fseek'd to a different offset in the file. If we write a
>> method that seeks back to 0 when Curl_do() loses the connection, then
>> we'll break the ability for users to do that. On the other hand, it
>> seems like a bit of a contrived case. However, that case led me to
>> conclude that simply returning an error would be the better approach.
>> (It's hard to anticipate what the library's callers might be trying
>> to accomplish.)
>
> Indeed. If a write seek is deemed necessary, we'd have to do it like
> the SEEKFUNCTION: libcurl would ask the app to seek with a callback.
> But the app might not be able to comply, for example if it simply
> passes the data along, as with a pipe or similar.

Sorry, I should have been clearer in my previous comment. My concern
was that we would probably also want to save the value of the offset at
the time the client sets WRITEDATA. Otherwise, we wouldn't know how far
to seek back.
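
To sketch what I mean, here's a hypothetical application-side
workaround (not libcurl internals; the function name is made up, and it
assumes the default write callback and a seekable FILE *):

/* Hypothetical sketch: record the stream's offset at the time
 * WRITEDATA is set, so a failed transfer can be retried from that
 * position instead of from offset 0. */
#include <stdio.h>
#include <curl/curl.h>

static CURLcode fetch_with_rewind(CURL *curl, const char *url, FILE *fp)
{
    long start = ftell(fp);   /* offset when the stream is handed over */
    CURLcode rc;

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);

    rc = curl_easy_perform(curl);
    if (rc != CURLE_OK && start != -1L) {
        /* discard the partial write and retry once from where we began */
        if (fseek(fp, start, SEEK_SET) != 0)
            return rc;        /* not seekable: a pipe, for instance */
        rc = curl_easy_perform(curl);
    }
    return rc;
}

Doing the equivalent inside the library would require remembering
"start" at the time WRITEDATA is set, which is the value I was worried
about losing.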

>> OpenSolaris is making use of this code pretty heavily now. We picked
>> libcurl for the packaging client because it allowed us to perform
>> pipelined downloads.
>
> Out of curiosity, did you measure how much impact pipelining has on your
> typical use cases in terms of speed?

In our standard use case, we were actually able to achieve parity with
our previous transfer model. The old mechanism sent a POST request with
a list of files to retrieve, and then the server responded with a binary
blob.

When we tested against a server that had no other clients, performance
depended on the size of the files. If we downloaded a package that had
lots of small files, performance was up to 20% worse. If we downloaded
a package that had lots of large files, there was no difference. When
we compared against a server that was handling other traffic, the
difference was much harder to observe: perhaps a 5% penalty on a
package with lots of small files.

Where we ran into the most trouble was with misconfigured load
balancers. If the backend had connection reuse disabled, or terminated
a connection after a small number of requests, the client's performance
suffered badly, dropping by about 2x in some cases.

One of the great things about switching from POST to pipelined GET, at
least for our application, was that the GETs are cacheable. If the
client goes through a web cache, only the first download pays the
performance penalty. Subsequent downloads saw a 20-40% speed
improvement, sometimes more, depending on their cache hit rates.
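
For reference, the pipelined-GET arrangement I keep referring to looks
roughly like the sketch below. This is not our actual packaging client,
just a minimal illustration; the URLs are placeholders and error
checking is trimmed.

/* Minimal sketch: pipelined GETs driven through the multi interface. */
#include <sys/select.h>
#include <curl/curl.h>

int main(void)
{
    const char *urls[] = { "http://pkg.example.org/file1",
                           "http://pkg.example.org/file2" };
    CURLM *multi = curl_multi_init();
    CURL *easy[2];
    int i, running = 0;

    /* ask libcurl to pipeline requests over a shared connection */
    curl_multi_setopt(multi, CURLMOPT_PIPELINING, 1L);

    for (i = 0; i < 2; i++) {
        easy[i] = curl_easy_init();
        curl_easy_setopt(easy[i], CURLOPT_URL, urls[i]);
        curl_multi_add_handle(multi, easy[i]);
    }

    do {
        fd_set rd, wr, ex;
        int maxfd = -1;
        struct timeval tv = { 1, 0 };

        curl_multi_perform(multi, &running);

        FD_ZERO(&rd); FD_ZERO(&wr); FD_ZERO(&ex);
        curl_multi_fdset(multi, &rd, &wr, &ex, &maxfd);
        /* if maxfd is -1 we should sleep briefly; omitted in this sketch */
        if (maxfd >= 0)
            select(maxfd + 1, &rd, &wr, &ex, &tv);
    } while (running);

    for (i = 0; i < 2; i++) {
        curl_multi_remove_handle(multi, easy[i]);
        curl_easy_cleanup(easy[i]);
    }
    curl_multi_cleanup(multi);
    return 0;
}

With the default write callback the response bodies all land on stdout
here; our real client hands each easy handle its own WRITEDATA, which
is how we ended up in the seek discussion above.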

-j