
Re: Should CURLOPT_READFUNCTION need EOF twice?

From: Jiří Hruška via curl-library <curl-library_at_lists.haxx.se>
Date: Thu, 16 Nov 2023 04:46:13 +0100

> I don't think this behavior was done on purpose and I think it would benefit
> users if we made it not do this. I think it should be sufficient for the
> callback to signal the end of the read *once*.

Thanks for the clarification. Since sending that message I have actually found
further hints that this was the intent, for example:

    tests/libtest/lib670.c:
        fprintf(stderr, "Read callback called after EOF\n");


> Fixing the code probably requires a better separation between end-of-input
> and end-of-upload than what that function does right now.

Yeah. And it should probably happen already in the inner Curl_fillreadbuffer(),
so that a strong guarantee can be given to users of CURLOPT_READFUNCTION in
general.

What do you think about a new BIT(fread_func_eof) in RequestState and
something like this?

    if(!data->req.fread_func_eof) {
      Curl_set_in_callback(data, true);
      nread = data->state.fread_func(data->req.upload_fromhere, 1,
                                     buffersize, data->state.in);
      Curl_set_in_callback(data, false);

      /* make sure the callback is not called again after EOF */
      data->req.fread_func_eof = (nread == 0);
    }
    else
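      /* EOF was already signalled once, do not call the callback again */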
      nread = 0;


> Writing up a test case to reproduce this scenario might take some creativity.
> Do you have any special setup that makes this happen more likely than others?

I ran the whole test suite with the code above and the `else` branch was never
hit (I added some extra code that silently creates a marker file, so I could
tell afterwards).
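
(For the record, the extra code was nothing fancier than temporarily turning
the `else` branch above into something along these lines; the marker path is
an arbitrary choice of mine:)

    else {
      /* instrumentation only: silently leave a marker file behind so that a
         later look at the filesystem shows whether this branch was taken */
      FILE *marker = fopen("/tmp/readfunction-eof-twice", "a");
      if(marker)
        fclose(marker);
      nread = 0;
    }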


Then I found that it is this part of readwrite_upload() which limits the
exposure of this bug:

    827:   if(0 != k->upload_present &&
    828:      k->upload_present < curl_upload_refill_watermark(data) &&
    829:      !k->upload_chunky && /* (variable sized chunked header;
                                       append not safe) */
    830:      !k->upload_done &&   /* !(k->upload_done once k->upload_present sent) */
    831:      !(k->writebytecount + k->upload_present - k->pendingheader ==
    832:        data->state.infilesize)) {
    833:     offset = k->upload_present;
    834:   }

So it will happen only if:
#1: uploading something with an unknown size (lines 831-832)
#2: using HTTP/2 (because with HTTP/1.1 an unknown size implies
    Transfer-Encoding: chunked and k->upload_chunky, line 829)
#3: the upload buffer holds between 1 and (upload_buffer_size / 32) bytes at
    the end of the upload, i.e. a 3.125% chance for an arbitrary length
    (line 828)

With this knowledge, I can reproduce it at will:

$ cat tests/http/gen/apache/docs/data-10k | \
    CURL_SMALLSENDS=2000 ./src/curl --upload-file - --http2 \
    --insecure --url https://127.0.0.1:45779/curltest/put
...
readwrite_upload(): k->upload_present=0, offset=0
CURLOPT_READFUNCTION called, fillcount=10240
readwrite_upload(): k->upload_present=8240, offset=0
readwrite_upload(): k->upload_present=6240, offset=0
readwrite_upload(): k->upload_present=4240, offset=0
readwrite_upload(): k->upload_present=2240, offset=0
readwrite_upload(): k->upload_present=240, offset=240
CURLOPT_READFUNCTION called, fillcount=0
readwrite_upload(): k->upload_present=0, offset=0
CURLOPT_READFUNCTION called, fillcount=0
...

It uses a piped `cat` with `--upload-file -` to satisfy #1, `--http2` to
satisfy #2, and the library must be built with CURLDEBUG so that
CURL_SMALLSENDS=2000 can be used to satisfy #3. (UPLOADBUFFER_DEFAULT is
65536; divided by 32 that gives 2048, so capping sends at 2000 bytes makes
sure the watermark condition always applies.)
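
For completeness, the same three conditions can also be set up directly
through the libcurl API. This is only a rough sketch (the URL comes from the
reproduction above, the callback and its EOF counter are mine, and hitting #3
still needs a CURLDEBUG build with CURL_SMALLSENDS in the environment), but it
shows where a plain CURLOPT_READFUNCTION user would notice the extra call
after EOF:

    #include <stdio.h>
    #include <string.h>
    #include <curl/curl.h>

    struct upload_ctx {
      const char *data;  /* payload to upload */
      size_t len;        /* total payload length */
      size_t pos;        /* bytes handed out so far */
      int eof_count;     /* how many times 0 has been returned */
    };

    static size_t read_cb(char *buffer, size_t size, size_t nitems, void *userp)
    {
      struct upload_ctx *ctx = userp;
      size_t room = size * nitems;
      size_t left = ctx->len - ctx->pos;
      size_t n = (left < room) ? left : room;

      if(!n) {
        /* end of input; with the current code this can be reached twice */
        if(++ctx->eof_count > 1)
          fprintf(stderr, "read callback called again after EOF\n");
        return 0;
      }
      memcpy(buffer, ctx->data + ctx->pos, n);
      ctx->pos += n;
      return n;
    }

    int main(void)
    {
      static const char payload[] = "dummy upload body";
      struct upload_ctx ctx = { payload, sizeof(payload) - 1, 0, 0 };
      CURL *curl = curl_easy_init();

      if(curl) {
        /* #1: no CURLOPT_INFILESIZE_LARGE, so the upload size is unknown */
        curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
        /* #2: force HTTP/2 so the unknown size does not imply chunked TE */
        curl_easy_setopt(curl, CURLOPT_HTTP_VERSION,
                         (long)CURL_HTTP_VERSION_2_0);
        /* URL and --insecure equivalent taken from the CLI reproduction */
        curl_easy_setopt(curl, CURLOPT_URL,
                         "https://127.0.0.1:45779/curltest/put");
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
        curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_cb);
        curl_easy_setopt(curl, CURLOPT_READDATA, &ctx);
        /* #3 still depends on how much data is left in the upload buffer at
           EOF, hence the CURLDEBUG build and CURL_SMALLSENDS */
        curl_easy_perform(curl);
        curl_easy_cleanup(curl);
      }
      return 0;
    }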


Doing the same in tests is tricky, though. I got no hits for the `else` branch
in the whole test suite with CURL_SMALLSENDS=2000 either.

What tests/http/test_07_upload.py does comes closest, but those tests upload
larger data from files, so they do not satisfy the unknown-upload-size
condition #1.

When I wrote a new test that used `-T -` and a pipe instead, source data sizes
like 100k or 10M still could not hit #3 without CURL_SMALLSENDS. That seems to
be the hardest condition to satisfy in general, because it really depends on
how the I/O behaves. If the network can take up to 64k bytes at a time,
getting just the right amount of data to remain in the buffer at EOF is very
tricky.

Do you have any more creative ideas to try?
Or would a manual test with CURL_SMALLSENDS be good enough?