
curl-library

Performance analysis & issues found with SFTP

From: Patrik Thunström <patrik.thunstrom_at_bassetglobal.com>
Date: Fri, 18 Nov 2011 10:03:50 +0100

Hi!

As usual when I stumble upon these issues, it is not clear-cut whether the
problem belongs in libcURL or in libssh2. Originally I started writing this
mail to suggest that the main issues seemed to stem from cURL, but as I
started verifying, it looked more and more like libssh2. So this mail
instead ends up as a heads-up, so that cURL/libcURL users are aware of this
pitfall, and I'll be sending a separate, more detailed mail to the libssh2
mailing list.

To sum up the reason for my investigation: a customer had performance
issues with SFTP uploads to a specific very low latency, high bandwidth
server (nothing really special, OpenSSH 4.3 in a Linux environment).
Roughly 450kB/s with libcURL, where FileZilla did ~20MB/s. The specific
case was solved by increasing CURL_MAX_WRITE_SIZE from the standard 16kB to
16MB (where performance seemed to peak, bringing libcURL speeds to
~25MB/s), as discussions both here and on the libssh2 mailing list had
already suggested as a way to improve performance.

This was with libcURL 7.22.0, libssh2 1.3.0 and OpenSSL 1.0.0e, on a Win32
platform (the customer was, as far as I could tell, running Windows Server
2003, but the rest of the testing was done on Windows 7). In our own test
environment the target machines were running Win7 with a CoreSFTP server,
and a Xubuntu VirtualBox VM running an OpenSSH sshd (reported as
OpenSSH_5.8p1 via the command line).

So, our current test suite consists of four sample sets of files with sizes
1 x 2GB, 10 x 12MB, 100 x 1MB and 1000 x 20kB. They are first uploaded to
the target server, and then downloaded again. The tests described below
were performed with various buffer sizes, due to the behavior noticed when
testing the 16MB buffer version.

Then to what was found during testing: upload is only improved by a larger
buffer size. The larger the buffer we tried at the customer site, the
better. On our local test systems performance wasn't that bad to start
with, but we could still see improvements.

Download, however, was an entirely different beast. The set with a single
large file had roughly the same transfer time, but the speed of the
transfer, going by the numbers provided to the progress callback, was very
uneven. It could stand still for half a minute before increasing by a few
hundred megabytes in a single jump.

The set of small files was also unaffected, if run by itself.

The two “middle” cases, however, suffered severe performance penalties from
using a larger buffer. The actual transfer itself does not seem to be that
affected, as the numbers passed to the progress callback indicate, but huge
delays are added around the actual transfer. This also led to curl
returning CURLE_SSH when run against the CoreSFTP server, causing all the
remaining files to fail until the curl handle was torn down/reset. This
behavior only got worse the larger the internal buffer size was set. As for
the CURLE_SSH error, it was quite random, sometimes failing on the first or
second file of the test set, sometimes 15-20 files in (when running the
sets isolated from each other).

The worst case performance-wise was against the OpenSSH server, where the
test of 100 x 1MB files went from 18 seconds to 45 minutes (18 seconds with
the 16kB buffer, 45 minutes with the 16MB buffer).

Performance seems to worsen with each and every call, and seeing that the
100 x 1MB test is towards the end of the test suite, this is probably the
reason for the incredibly large difference in time.

The performance decrease, however, does not correlate directly with the
number of files already transferred, as the 1MB files get bad performance
from file 1, whereas the smaller 20kB files are almost unaffected if run by
themselves. Running them as the final part of the test suite, however,
performance also dropped quite a lot.

A separate issue I also found while debugging is that sometimes, when
downloading the single 1.8GB file, the progress callback stopped firing
during the download. The WRITEFUNCTION callback was called regularly,
whereas the progress callback wasn't called until the end of the file.

So, to sum up my experiences: if using libcURL for SFTP in a Win32
environment, increasing the internal buffer size is currently only a good
move if you use it exclusively for uploading data. If you download data,
increasing CURL_MAX_WRITE_SIZE will cost you quite a performance penalty.

As I said earlier, I will also throw this into the libssh2 mailing list for
discussion with more details.

Best regards

Patrik Thunström / patrik.thunstrom_at_bassetglobal.com

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2011-11-18