curl-library

Hashing while downloading

From: Leon Winter <winter-curl_at_bfw-online.de>
Date: Mon, 19 Jan 2015 07:05:20 +0100

Hi,

a friend of mine is working on "modernizing" Debian's Apt. Apparently it
has been running some obscure hand-written HTTP code, resulting in very
low performance, not to mention potential security issues and missing
features. He is therefore rewriting Apt to use curl. Apt uses a hash
function (MD5) to verify its downloads.
The existing curl codebase already contains the Metalink
implementation, which also does hash verification _after_ downloading a
file. However, hash functions could be fed while downloading. Especially
when operating on big files, this increases performance (in terms of
"not waiting for md5sum after the download") dramatically.
On the command line this is trivial:

 curl "$URL" | tee download | md5sum

With the curl library this is also trivial: just use the
WRITEFUNCTION/WRITEDATA callback to feed the hash function, as in the
getinmemory example shipped with curl. However, in order to do this one
needs to link against a library providing a hash function. Curl already
has such dependencies and even has a small abstraction layer for MD5,
but it is not exported and is only used internally. Projects like Apt
would depend on curl, which in turn would depend on a TLS library (in
Debian's case GnuTLS). To implement the MD5 hashing, one would need to
use the hash function from this crypto library, possibly copying
curl's abstraction layer over the many TLS/crypto libraries yet again.
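
For illustration, a minimal sketch of such a write callback could look
like the following. It only assumes the callback signature libcurl
documents for CURLOPT_WRITEFUNCTION. The names write_and_hash, struct
download and struct hash_ctx are made up for this example, and the hash
is a non-cryptographic FNV-1a stand-in so the sketch needs no external
library; a real client would call the MD5 (or SHA) update function of
its crypto library at that point instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a real digest context (MD5, SHA-256, ...). FNV-1a is NOT
 * cryptographic; it only keeps this sketch free of external libraries. */
struct hash_ctx {
  uint64_t state;
};

static void hash_init(struct hash_ctx *h)
{
  h->state = 0xcbf29ce484222325ULL;     /* FNV-1a 64-bit offset basis */
}

static void hash_update(struct hash_ctx *h, const void *data, size_t len)
{
  const unsigned char *p = data;
  size_t i;
  for(i = 0; i < len; i++) {
    h->state ^= p[i];
    h->state *= 0x100000001b3ULL;       /* FNV-1a 64-bit prime */
  }
}

/* Everything the callback needs: the output file and the running hash.
 * (Hypothetical struct, passed to the callback as user data.) */
struct download {
  FILE *out;                            /* may be NULL to only hash */
  struct hash_ctx hash;
};

/* Same signature libcurl expects for a CURLOPT_WRITEFUNCTION callback. */
static size_t write_and_hash(char *ptr, size_t size, size_t nmemb,
                             void *userdata)
{
  struct download *dl = userdata;
  size_t len = size * nmemb;

  assert(dl != NULL);
  if(dl->out && fwrite(ptr, 1, len, dl->out) != len)
    return 0;   /* a return value other than len aborts the transfer */

  /* Feed the chunk to the hash as it arrives, no second pass needed. */
  hash_update(&dl->hash, ptr, len);
  return len;
}
```

With libcurl one would call hash_init() on the struct, register the
callback with curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION,
write_and_hash) and pass the struct with curl_easy_setopt(curl,
CURLOPT_WRITEDATA, &dl). Once curl_easy_perform() returns, the digest
is ready without reading the file again.
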
It is noteworthy that this copy'n'pasting has already happened inside
curl to some extent:

 lib/md5.c
 src/tool_metalink.c (albeit abstracting over more hash functions)

While looking into this I also noticed that the metalink code does the
verification _after_ the download, which Daniel also mentions [0]. In
the mentioned RFCs about the headers and XML format I found no mention
of the time of the hash processing.
Why not do it while downloading?

Should we export the awesome abstractions curl offers for hashes, and
possibly also for TLS (the VTLS layer), to the outside?
Should we add HASHFUNCTION to CURLoption, so curl would automatically
compute the hash for a download while downloading? (This would be
fairly easy, I figure.)
Shouldn't the metalink implementation make use of the MD5 abstraction
already in place?
One way or the other, to make Debian's Apt less horrible, one would like
to have hashing while downloading.
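
To make the "exported abstraction" idea a bit more concrete, here is a
rough sketch of what a backend-independent hash interface could look
like: a table of init/update/final function pointers looked up by name.
All names here (struct hash_backend, find_backend, ...) are
hypothetical and are not curl's actual internal or public API. The two
backends are toy checksums (Adler-32 and 32-bit FNV-1a) standing in for
the MD5/SHA implementations that GnuTLS, OpenSSL and friends would
provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Enough state for both toy backends; a real layer would size the
 * context per backend. */
struct hash_ctx {
  uint32_t a, b;
};

/* Hypothetical backend description, one entry per available hash. */
struct hash_backend {
  const char *name;
  void (*init)(struct hash_ctx *c);
  void (*update)(struct hash_ctx *c, const unsigned char *p, size_t n);
  uint32_t (*final)(const struct hash_ctx *c);
};

static void adler_init(struct hash_ctx *c) { c->a = 1; c->b = 0; }

static void adler_update(struct hash_ctx *c, const unsigned char *p,
                         size_t n)
{
  size_t i;
  for(i = 0; i < n; i++) {
    c->a = (c->a + p[i]) % 65521;
    c->b = (c->b + c->a) % 65521;
  }
}

static uint32_t adler_final(const struct hash_ctx *c)
{
  return (c->b << 16) | c->a;
}

static void fnv_init(struct hash_ctx *c) { c->a = 0x811c9dc5u; c->b = 0; }

static void fnv_update(struct hash_ctx *c, const unsigned char *p,
                       size_t n)
{
  size_t i;
  for(i = 0; i < n; i++) {
    c->a ^= p[i];
    c->a *= 0x01000193u;
  }
}

static uint32_t fnv_final(const struct hash_ctx *c) { return c->a; }

static const struct hash_backend backends[] = {
  { "adler32", adler_init, adler_update, adler_final },
  { "fnv1a32", fnv_init, fnv_update, fnv_final }
};

/* Look a backend up by name; returns NULL if it is not available. */
static const struct hash_backend *find_backend(const char *name)
{
  size_t i;
  assert(name != NULL);
  for(i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
    if(strcmp(backends[i].name, name) == 0)
      return &backends[i];
  return NULL;
}
```

A caller picks a backend once, calls init, then calls update for every
chunk it receives and final at the end. The same calling code then
works no matter which crypto library sits underneath.
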

Regards,
Leon

[0] http://daniel.haxx.se/blog/2012/06/03/curling-the-metalink/
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2015-01-19