PyCURL interface - Uploading large binary files
Date: Wed, 04 Feb 2004 17:50:06 -0500
The problem: I am writing a file-upload utility in Python that uses
the walk() function to traverse a directory, find every file under
it, and upload each one to a remote server using the pyCURL curl
interface. The files are invariably binary, and the upload
method is an HTTP PUT to the system.
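A minimal sketch of the directory traversal described above, assuming Python 3 and os.walk() (the generator form of the walk() function). The helper name iter_files is hypothetical and not part of the original tool.

```python
import os


def iter_files(root):
    """Yield the full path of every file found under root (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # Sort the names so the upload order is predictable between runs.
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Each yielded path could then be handed to the upload function one at a time, so that metrics are still gathered per file.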
I also need to perform the reverse - I need to GET those files and
write them to disk.
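For the GET direction, one possible approach is to let libcurl write each received chunk straight to an open file instead of collecting the body in a StringIO. This is only a sketch, assuming Python 3; download_to_file is a hypothetical name, and c is assumed to be a pycurl.Curl() handle created by the caller.

```python
def download_to_file(c, url, dest_path):
    """Stream the response body for url into dest_path (hypothetical helper).

    c is assumed to be a pycurl.Curl() handle, or any object with the same
    setopt/perform interface. The handle is left open so the caller can
    still read transfer metrics from it before calling c.close().
    """
    with open(dest_path, "wb") as out:
        c.setopt(c.URL, url)
        # libcurl calls out.write once per received chunk, so the body is
        # written to disk piece by piece rather than buffered as a whole.
        c.setopt(c.WRITEFUNCTION, out.write)
        c.perform()
```

Usage would look like download_to_file(pycurl.Curl(), url, path), followed by c.close() once any metrics have been read.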
The problems I am seeing are memory usage and timeouts. Currently, I call
os.path.walk(dir), and then I call the upload function. The upload
function basically goes (the formatting got nuked when I pasted it):
f = open(filepath, "rb")
fs = os.path.getsize(filepath)
c = pycurl.Curl()
c.setopt(c.HTTPHEADER, ["User-Agent: Load Tool (PyCURL Load Tool)"])
if verbose == 'true':
    c.body = StringIO()
This opens the file via open(), which, as far as I can tell, reads the
file into memory. This, of course, causes problems when the client
machine has only 512 MB of RAM and we're uploading a 2-3 GB file
(setting aside the argument against doing this via HTTP PUT).
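One way to keep memory flat might be to let libcurl pull the file in chunks through a read callback. As far as I know, open() on its own only returns a file handle, so whether the contents end up in memory depends on how the file is handed to libcurl. The sketch below assumes Python 3; put_file is a hypothetical name, and c is assumed to be a pycurl.Curl() handle created by the caller.

```python
import os


def put_file(c, url, filepath):
    """Upload filepath to url with HTTP PUT, streaming it from disk.

    Hypothetical helper: c is assumed to be a pycurl.Curl() handle, or any
    object with the same setopt/perform interface. The handle is left open
    so that metrics can be read from it afterwards.
    """
    size = os.path.getsize(filepath)
    with open(filepath, "rb") as f:
        c.setopt(c.URL, url)
        c.setopt(c.UPLOAD, 1)             # ask libcurl for an upload (PUT over HTTP)
        c.setopt(c.READFUNCTION, f.read)  # libcurl requests one chunk at a time
        c.setopt(c.INFILESIZE, size)      # declare the body size up front
        c.perform()
```

Because the callback is simply f.read, each call returns at most the number of bytes libcurl asked for, and the file is never read in one piece by this code.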
I am also running into a problem where, once I hit a ~260 MB file,
I start getting intermittent (to constant, depending on file size) errors:
* Empty reply from server
Originally I assumed this was because I was contacting the server with a form
POST and libcurl was taking too long to encode the file, which caused
a timeout. This is not the case - I can recreate it with a PUT method as shown above.
The verbose output from the session is:
* About to connect() to target:8080
* Connected to target.foobar.com (10.1.1.6) port 8080
> POST /put HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
User-Agent: Zoidberg (PyCURL Load Tool)
Content-Type: multipart/form-data; boundary=----------------------------46b3250c0a30
* Empty reply from server
* Connection #0 left intact
If I run curl -vT file URL on the command line, it PUTs properly, so
I have to assume it's something with the way I, or pycurl, am invoking the libcurl
Does anyone know a more efficient way to do this? Please also
note that I am measuring the metrics for each transaction sent, so I
don't want to chunk and then upload, as I only get metrics for the
The metrics are collected before the call to c.close():
speed_up = c.getinfo(c.SPEED_UPLOAD)
size_up = c.getinfo(c.SIZE_UPLOAD)
ttime = c.getinfo(c.TOTAL_TIME)
ctime = c.getinfo(c.CONNECT_TIME)
sttime = c.getinfo(c.STARTTRANSFER_TIME)
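The same five values could be gathered into one dictionary per transfer, which may make per-file logging easier. This is a sketch, assuming Python 3; collect_metrics is a hypothetical name, and c is assumed to be a pycurl.Curl() handle that has finished perform() but has not yet been closed.

```python
def collect_metrics(c):
    """Return the transfer metrics for a finished handle as a dict.

    Hypothetical helper: must be called before c.close(), because the
    values are read from the handle itself.
    """
    return {
        "speed_up": c.getinfo(c.SPEED_UPLOAD),       # average upload speed
        "size_up": c.getinfo(c.SIZE_UPLOAD),         # amount uploaded
        "ttime": c.getinfo(c.TOTAL_TIME),            # total transfer time
        "ctime": c.getinfo(c.CONNECT_TIME),          # time until connected
        "sttime": c.getinfo(c.STARTTRANSFER_TIME),   # time until first byte
    }
```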
Does anyone have any thoughts?
Received on 2004-02-04