
Re: Data retrieval is fragmented using Multi CURL Lib

From: Ray Satiro via curl-library <curl-library_at_cool.haxx.se>
Date: Sat, 7 Nov 2015 14:43:36 -0500

On 11/7/2015 7:26 AM, doa379 wrote:
>>> I am using multi libcurl to download data from various sources. It's
>>> all working except for the issue that the data retrieved from the
>>> various sources is fragmented and jumbled. For example if you have
>>> this JSON data in any order:
>>>
>>> {{ JSON1 }, { JSON2 }, { JSON3 }}
>>> {{ JSON1 }, { JSON3 }, { JSON2 }}
>>> {{ JSON2 }, { JSON1 }, { JSON3 }}
>>> .
>>> .
>>> .
>>>
>>>
>>> it may appear in the resulting stream like this:
>>>
>>> {{ JSON1, { JSON3 }}, { JSON2 }}
>>>
>>> All the data is successfully downloaded but just in the wrong order.
>>> As it makes parsing the data difficult I would like to resolve this
>>> issue.
>>
>> Please review the bug reporting requirements [1]. Also ... if you can
>> give a self contained compilable example that can be used to reproduce
>> your problem that would be great.
>>
>> If you are receiving JSON from different handles and it is jumbled I
>> wonder if maybe your write function is using the same location for more
>> than one handle? It sounds like an issue with your write function.
>>
>>
>
> This is not a CURL bug. Yes the write function is writing to the same
> buffer using multiple handles.
>
> Would the suggestion be to use multiple buffers corresponding to each
> handle?
>
> I would prefer to keep it simple and use a single buffer, but rather
> have the writes ordered in some way.

You'll need a separate write location for each handle if you are using
the multi interface because it's doing the transfers simultaneously.
Partial data from handle A may be received, followed by partial data
from handle B, followed by partial data from handle A, followed by
partial data from handle C, etc. It works well as long as you're writing
each handle's received content to a separate location.

As to your query about receiving JSON objects one at a time, I'm
assuming you mean you want to know when a whole valid JSON object has
been received from one of several concurrent streams. Could you not wait
until a transfer is complete, or are you receiving multiple objects per
transfer? The latter may be difficult to deal with. If server A sends all of
{"foo":"bar"}
{"baz":"qux"}
...
And you want to know immediately when {"foo":"bar"} is received, without
waiting for {"baz":"qux"}, then you'd have to find some way of parsing
it yourself. For example, the write function gets called by libcurl with
partials like this:
write("{\"fo")
write("o\":\"bar\"}\n{\"b")
write("az\":\"qux\"}\n")
This is pseudocode, of course, just to make things easier to understand.
So by the second write you've received the first object from stream n.
In your write function you would have to parse the data to know when a
whole object has been received; libcurl will not do that for you.
Parsing in real time is only easy if you know for certain it's a
line-delimited type of JSON (?x-json-stream or something; check the
transfer's Content-Type). In that case you can simply detect each
complete object on receipt of a newline.

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2015-11-07