
curl-library

RE: "pull" aspect of multi interface not quite working properly

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Fri, 22 Jun 2007 16:42:32 +0200 (CEST)

On Thu, 21 Jun 2007, Allen Pulsifer wrote:

> In digging into this further, the most important thing IMO is to prevent
> more than one read on the socket per call to curl_multi_perform. This will
> limit the amount of data that will be sent to the application.

I can agree to making such a change, or at least offering it as an option to
applications that want strict control over how much data they (can) get. It
won't be easy to make though, as you will see below. Also, a recv() call does
not necessarily mean that data is passed to the application, nor that the
amount read is the same amount that gets passed on.

> The amount of data read from the socket is limited to data->set.buffer_size,
> which is directly set by CURLOPT_BUFFERSIZE.

Right, just note that this setting is clamped both upwards and downwards:
libcurl only accepts sizes within a certain range.

> If curl_multi_perform is limited to one read, and if there is no expansion
> of the data then the maximum amount of data sent to the application per call
> to curl_multi_perform will be CURLOPT_BUFFERSIZE.

Yes, if you set that to be within the acceptable boundaries.
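
For illustration, a minimal sketch of setting the receive buffer size on an
easy handle before adding it to the multi handle (the exact lower and upper
bounds libcurl clamps the value to depend on the version; 'multi' is assumed
to be an already-created CURLM * handle):

  CURL *easy = curl_easy_init();
  curl_easy_setopt(easy, CURLOPT_URL, "http://example.com/");
  /* request a 16 KB receive buffer; libcurl silently clamps the value to
     the range it accepts, so treat it as a hint rather than a guarantee */
  curl_easy_setopt(easy, CURLOPT_BUFFERSIZE, 16384L);
  curl_multi_add_handle(multi, easy);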

> In certain circumstances however, there can be data expansion, such as when
> libcurl does on-the-fly content decoding. In that case, the application may
> get more than CURLOPT_BUFFERSIZE bytes in one call to curl_multi_perform and
> must be prepared to handle this.

Exactly.
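
In other words, a write callback has to be prepared to accept whatever amount
it gets handed. A rough sketch (the app_buffer type and append function are
made up for illustration):

  /* libcurl hands this callback size*nmemb bytes; with on-the-fly content
     decoding that can exceed CURLOPT_BUFFERSIZE, so no fixed upper limit
     should be assumed here */
  static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userp)
  {
    struct app_buffer *buf = userp;  /* hypothetical application buffer */
    size_t total = size * nmemb;
    if(app_buffer_append(buf, ptr, total) != 0)
      return 0;  /* returning less than 'total' makes libcurl abort the transfer */
    return total;
  }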

> Only two sets of changes to the code are needed to ensure read is only
> called once per call to curl_multi_perform.

Now we're reaching the trickier parts. Converting ideas into code... Both your
suggested changes will break things:

> CHANGE:
>
> while (easy->easy_handle->change.url_changed);
>
> TO:
>
> while (0);
>
> This is a minor change. change.url_changed is only true when the
> application sets CURLOPT_URL while the connection is in progress, but in
> that one case, this change will ensure read is not called until the next
> time the application calls curl_multi_perform.

This was added to make libcurl properly deal with the (very rare) situation
when you get the new URL during the SSL handshake and the remote cert would
indicate the page to get. I don't currently recall why it was important to
allow libcurl to do the magic itself instead of leaving it to the app, but I
can dig up the reasoning if need be.

So there was no data at all delivered to the application before or during
this, but there were potentially multiple recv() calls...

> CHANGE:
>
> ((select_res & CSELECT_IN) || conn->bits.stream_was_rewound)) {
>
> TO:
>
> ((select_res & CSELECT_IN) || conn->bits.stream_was_rewound) ||
> data_pending(conn)) {

[...]

> This moves the test for data_pending(conn) out of the loop. As a result,
> curl_multi_perform will not loop back to repeat the read when
> data_pending(conn) is true, but instead it will return to the application
> and then call read the next time it is called.

Eh, no. Taking out that function will effectively ruin how libcurl works with
both OpenSSL and libssh2, as both those libs read data into their own buffers
by themselves, so we cannot simply wait for the socket to become readable
again to know whether there is more data to read from them.
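
To illustrate the point with OpenSSL directly: decrypted bytes can sit in
OpenSSL's own buffer where select() or poll() on the socket will never see
them, which is roughly what data_pending() accounts for. A minimal sketch:

  #include <openssl/ssl.h>

  /* returns non-zero if OpenSSL already holds decrypted bytes in its own
     buffer; waiting for socket readability would never flag these */
  static int ssl_has_buffered_data(SSL *ssl)
  {
    return SSL_pending(ssl) > 0;
  }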

I also think both these changes are slightly too naive and simple-minded in
the way they just return instead of keeping some kind of state to get back to
where they were on the next invocation. Perhaps most importantly, both cases
should (must?) make curl_multi_perform() return CURLM_CALL_MULTI_PERFORM.
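
For context, that is the contract applications already rely on: as long as
curl_multi_perform() returns CURLM_CALL_MULTI_PERFORM, call it again before
going back to waiting on the sockets. Roughly, assuming 'multi' is the
application's multi handle:

  int running = 0;
  CURLMcode mrc;
  do {
    /* keep calling while libcurl says it has more work to do right away */
    mrc = curl_multi_perform(multi, &running);
  } while(mrc == CURLM_CALL_MULTI_PERFORM);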

Look, I'm not saying that what you want isn't doable, and I'm all for working
towards making the pull behaviour more predictable for applications. But
making this a reality requires careful poking at the right places to prevent
other things from breaking at the same time.

> There is more that could be done to make libcurl even more "pull"; for
> example, content decompression could be limited to CURLOPT_BUFFERSIZE before
> being resumed on the next call to curl_multi_perform, but this looks like it
> would require more significant changes to the code, and could alternatively
> be accomplished by disabling decompression in libcurl and performing it in
> the application instead.

Indeed, if you truly want to set a fixed limit, you shouldn't do automatic
decompression.
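
One way to do that with a reasonably recent libcurl (CURLOPT_HTTP_CONTENT_DECODING
was added in 7.16.2) is roughly this, on the transfer's easy handle; the
application then inflates the gzip stream itself, for example with zlib, at
whatever pace it prefers:

  /* ask for gzip but tell libcurl not to inflate it, so the write callback
     only ever sees the raw (at most CURLOPT_BUFFERSIZE) network bytes */
  curl_easy_setopt(easy, CURLOPT_ENCODING, "gzip");
  curl_easy_setopt(easy, CURLOPT_HTTP_CONTENT_DECODING, 0L);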

-- 
  Commercial curl and libcurl Technical Support: http://haxx.se/curl.html
Received on 2007-06-22