curl-library

Re: Performance analysis & issues found with SFTP

From: Jonas Schnelli <jonas.schnelli_at_include7.ch>
Date: Fri, 18 Nov 2011 12:35:19 +0100

> Hi!
>
> As usual when I stumble upon these issues, there is no clear-cut line between what belongs in libcURL and what belongs in libssh2. Originally I started writing this mail to suggest that the main issues seemed to stem from cURL, but as I started verifying, it looked more and more like the problem is in libssh2. So this mail instead ends up as a heads-up, so that cURL/libcURL users are aware of this pitfall, and I’ll be sending a separate, more detailed mail to the libssh2 mailing list.
>
> In short, the reason for my investigation: a customer had performance issues with SFTP upload against a specific very-low-latency, high-bandwidth server (nothing really special, OpenSSH 4.3 in a Linux environment). They got roughly 450kB/s with libcURL where FileZilla did ~20MB/s. That specific case was solved by increasing CURL_MAX_WRITE_SIZE from the standard 16kB to 16MB (where performance seemed to peak, bringing libcURL speeds to ~25MB/s), as discussions both here and on the libssh2 mailing list had already suggested as a way to improve performance.
>
> This was using libcURL 7.22.0, libssh2 1.3.0 and OpenSSL 1.0.0e on a Win32 platform (the customer was, as far as I could tell, running Windows Server 2003, but the rest of the testing was done on Windows 7). In our own test environment the target machines were a Win7 machine running the CoreSFTP server and a Xubuntu VirtualBox VM running an OpenSSH sshd (reported as OpenSSH_5.8p1 via the command line).
>
> So, our current test suite consists of four sample sets of files with sizes 1 x 2GB, 10 x 12MB, 100 x 1MB and 1000 x 20kB. They are first uploaded to the target server, and then downloaded again. The tests described were performed with various buffer sizes, due to the behavior noticed when testing the 16MB buffer version.
>
> Now to what was found during testing: upload only improves with a larger buffer size. The larger the buffer we tried at the customer site, the better. On our local test systems it wasn’t that bad to start off with, but we could also see improvements.
>
> Download, however, was an entirely different beast. The set with a single large file had roughly the same transfer time, but the speed of the transfer, going by the numbers provided to the progress callback, was very uneven. It could stand still for half a minute before increasing by a few hundred megabytes in a single jump.
>
> The set of small files was also unaffected, if run by themselves.
>
> The two “middle” cases, however, suffered severe performance penalties from using a larger buffer. The actual transfer itself does not seem to be that affected, as the numbers given to the progress callback indicate, but huge delays are added around the actual transfer. This also led to curl returning CURLE_SSH when run against the CoreSFTP server, causing all the remaining files to fail until the curl handle was torn down/reset. This behavior only got worse the larger the internal buffer size was set. As for the CURLE_SSH error, it was quite random, sometimes failing on the first or second file of the test set, sometimes 15-20 files down (when running the sets in isolation from each other).
>
> The worst case, performance-wise, was against the OpenSSH server, where the test of 100x1MB files went from 18 seconds to 45 minutes (18 seconds with a 16k buffer, 45 minutes with a 16M buffer).
> Performance seems to worsen with each and every call, and seeing that the 100x1MB test is towards the end of the test suite, this is probably the reason for the incredibly huge difference in time.
>
> The performance decrease, however, is not a straight correlation to the number of files already transferred, as the 1MB files get bad performance from file 1, whereas the smaller 20kB files are almost unaffected if run by themselves. When they are run as the final part of the test suite, however, their performance also drops quite a lot.
>
> A separate issue I also found while debugging is that sometimes, when downloading the single 1.8GB file, the progress callback stopped firing during the download. The WRITEFUNCTION callback was called regularly, whereas the progress callback wasn’t called until the end of the file.
>
> So, to sum up my experiences: if you use libcURL for SFTP in a Win32 environment, increasing the internal buffer size is currently only a good move if you use it exclusively for uploading data. If you download data, increasing CURL_MAX_WRITE_SIZE will cost you quite a performance penalty.
>

Thanks Patrick.
Your work/support is really appreciated.

I'm also involved in a task where I need to improve the SFTP performance of libcurl / curl with libssh2.
My "study" is not complete yet, but what I can say so far:
- the performance of libcurl/libssh2 on SFTP file transfers is not as good as with other file transfer applications.

I also focus on the CPU usage when using SFTP (not only the network performance).

This is not about blaming anyone or either of the projects (curl/libssh2): it's all about improving the performance and welding the two parts together better.
The performance issue is not solved by increasing the buffer size. I think that's also what your test results showed us.

As soon as the issue is at the top of my backlog, I would like to do the following:
-> write a test script / benchmark script (in perl?!), upload/download scenario could be like you described (1 GB file, 100x1MB files, etc.)
--> one version should use libcurl / curl
--> one version should use sftp command line utility
--> one version should use libssh2 directly (to compare libcurl against direct usage of libssh2)
-> benchmark other sftp transfer tools (Transmit on Mac, FileZilla, etc.)
-> run the test script / benchmark script in various environments (100Mbit, DSL, 3G, EDGE)
-> see where performance is bad (Valgrind, Instruments/Xcode)
-> try to make improvements and re-run the tests
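
A rough sketch of what such a benchmark harness could look like. It is written in Python rather than perl purely for illustration, and every name in it (make_sample_set, benchmark, local_copy_transfer) is hypothetical. The transfer backend is only a local-copy placeholder, not a real SFTP transfer; the sample set sizes are the ones from the quoted test suite.

```python
import os
import shutil
import time

# Sample sets in the style of the quoted test suite: (file count, bytes per file).
SAMPLE_SETS = [
    (1, 2 * 1024**3),    # 1 x 2GB
    (10, 12 * 1024**2),  # 10 x 12MB
    (100, 1024**2),      # 100 x 1MB
    (1000, 20 * 1024),   # 1000 x 20kB
]

CHUNK = 1024 * 1024  # write data in 1 MiB pieces to keep memory use low


def make_sample_set(directory, count, size, scale=1.0):
    """Create `count` files of int(size * scale) bytes (at least 1); return their paths."""
    os.makedirs(directory, exist_ok=True)
    nbytes = max(1, int(size * scale))
    paths = []
    for i in range(count):
        path = os.path.join(directory, "sample_%04d.bin" % i)
        remaining = nbytes
        with open(path, "wb") as fh:
            while remaining > 0:
                piece = min(CHUNK, remaining)
                # Random data, so that any SSH-level compression cannot skew results.
                fh.write(os.urandom(piece))
                remaining -= piece
        paths.append(path)
    return paths


def benchmark(transfer, paths):
    """Call transfer(path) for each path; return (elapsed_s, total_bytes, per_file_s)."""
    per_file = []
    total_bytes = 0
    start = time.perf_counter()
    for path in paths:
        t0 = time.perf_counter()
        transfer(path)
        per_file.append(time.perf_counter() - t0)
        total_bytes += os.path.getsize(path)
    return time.perf_counter() - start, total_bytes, per_file


def local_copy_transfer(dest_dir):
    """Placeholder backend (assumption): copies files locally instead of using SFTP."""
    os.makedirs(dest_dir, exist_ok=True)
    return lambda path: shutil.copy(path, dest_dir)
```

The per-file timings are kept separately because the quoted report says performance degrades from call to call, which a single total would hide. To compare the real candidates, the placeholder would be swapped for one backend per tool (curl, the sftp command line utility, libssh2 directly), each exposing the same transfer(path) call.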

Who else is working on the performance of SFTP with libcurl/libssh2?
Any hints about how to make libcurl/libssh2 faster on SFTP?
We should join forces and build up a proper benchmark/test environment (at least Windows/Linux/Mac).

Received on 2011-11-18