
curl-users

PyCURL interface - Uploading large binary files

From: Jesse Noller <jnoller_at_archivas.com>
Date: Wed, 04 Feb 2004 17:50:06 -0500

The problem: I am writing a file-uploading utility in Python that uses
the walk() function to traverse a directory, find every file under it,
and upload each one to a remote server through the PycURL interface to
libcurl. The files are always binary, and they are uploaded with an
HTTP PUT.
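A minimal sketch of the directory-walking part, assuming Python 3, where os.walk replaces the Python 2-only os.path.walk the post uses. iter_files is a hypothetical helper name; it yields each regular file together with its size, which is what the PUT needs for its Content-Length.

```python
import os


def iter_files(top):
    """Yield (path, size) for every regular file below top (hypothetical helper)."""
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()                 # sorting in place makes os.walk's order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):    # skips broken symlinks, sockets, FIFOs
                yield path, os.path.getsize(path)
```

Each yielded pair can then be handed to the upload function, for example `for path, size in iter_files(top): upload(path, ...)`.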

I also need to do the reverse: GET those files and write them back to
disk.
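For the reverse direction, here is a minimal sketch of streaming a GET straight to disk. It assumes a reasonably recent pycurl in which WRITEDATA accepts a Python file object. download is a hypothetical helper name. libcurl hands each received chunk to f.write, so the file is never held in memory as a whole.

```python
import pycurl


def download(url, dest_path, verbose=False):
    """Stream the body at url into dest_path (hypothetical helper).

    Returns the number of bytes libcurl reports as downloaded.
    """
    with open(dest_path, "wb") as f:
        c = pycurl.Curl()
        try:
            c.setopt(c.URL, url)
            c.setopt(c.WRITEDATA, f)      # each received chunk goes to f.write
            c.setopt(c.NOSIGNAL, 1)
            c.setopt(c.FAILONERROR, 1)    # HTTP status >= 400 raises pycurl.error
            if verbose:
                c.setopt(c.VERBOSE, 1)
            c.perform()
            return int(c.getinfo(c.SIZE_DOWNLOAD))
        finally:
            c.close()
```

With this design, libcurl drives the writes, which avoids buffering the whole body in a StringIO the way the upload code does with the server's response.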

The problems I am seeing are memory usage and timeouts. Currently, I call
os.path.walk(dir) and then call the upload function for each file. The
upload function looks roughly like this:

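import os
import sys
import traceback
from io import BytesIO

import pycurl


def upload(filepath, target_url, verbose='false'):
    f = open(filepath, "rb")
    fs = os.path.getsize(filepath)

    c = pycurl.Curl()
    c.setopt(c.URL, target_url)
    c.setopt(c.HTTPHEADER, ["User-Agent: Load Tool (PyCURL Load Tool)"])
    c.setopt(c.UPLOAD, 1)             # over HTTP this sends a PUT (CURLOPT_PUT is deprecated)
    c.setopt(c.READDATA, f)           # libcurl reads from f as it sends
    c.setopt(c.INFILESIZE_LARGE, fs)  # _LARGE variant copes with files over 2 GB
    c.setopt(c.NOSIGNAL, 1)
    if verbose == 'true':
        c.setopt(c.VERBOSE, 1)
    body = BytesIO()                  # collects the server's response body
    c.setopt(c.WRITEFUNCTION, body.write)
    try:
        c.perform()
    except pycurl.error:
        traceback.print_exc(file=sys.stderr)
        sys.stderr.flush()
    f.close()
    c.close()
    sys.stdout.write(".")
    sys.stdout.flush()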

This opens the file via open(), which I believe reads the file into
memory. That causes problems when the client machine has only 512 MB
of RAM and we're uploading a 2-3 GB file (setting aside the argument
against doing this via HTTP PUT).
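One way to check whether the whole file really ends up in memory is to give libcurl a read callback that records how much it requests at a time. pycurl calls a READFUNCTION with the maximum number of bytes it wants and expects a bytes object back, where an empty result means end of file. CountingReader is a hypothetical helper written for this sketch.

```python
class CountingReader:
    """Read callback that records how libcurl consumes a file (hypothetical helper)."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.total = 0      # bytes handed to libcurl so far
        self.largest = 0    # biggest single request libcurl made

    def read(self, size):
        self.largest = max(self.largest, size)
        chunk = self.fileobj.read(size)   # reads at most `size` bytes from disk
        self.total += len(chunk)
        return chunk                      # b"" signals end of file
```

You would use it with `reader = CountingReader(f)` and `c.setopt(c.READFUNCTION, reader.read)` in place of READDATA. After perform(), `reader.largest` shows the biggest buffer libcurl asked for, and `reader.total` should equal the file size.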

I am also running into a problem with large files: once a file reaches
about 260 MB, I start getting errors that are intermittent at first and
become constant as the file size grows:

* Empty reply from server

Originally I assumed this happened because I was contacting the server with
a multipart form POST and libcurl was taking so long to encode the file that
the connection timed out. That is not the case: I can reproduce it with the
PUT code shown above.

The verbose output from one of the earlier form POST sessions is:

* About to connect() to target:8080
* Connected to target.foobar.com (10.1.1.6) port 8080
> POST /put HTTP/1.1
Host: target-036:8080
Pragma: no-cache
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
User-Agent: Zoidberg (PyCURL Load Tool)
Content-Length: 314573158
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------46b3250c0a30
 
* Empty reply from server
* Connection #0 left intact
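The request in that log carries Expect: 100-continue, a header that libcurl adds by itself to uploads like this one. If the server does not answer that header properly, one experiment is to suppress it. This is only a sketch under that assumption, not a confirmed diagnosis. In libcurl's HTTPHEADER option, a header name with nothing after the colon disables the internally generated header. put_headers is a hypothetical helper.

```python
def put_headers(user_agent, suppress_expect=True):
    """Build the HTTPHEADER list for a large upload (hypothetical helper)."""
    headers = ["User-Agent: " + user_agent]
    if suppress_expect:
        # An empty "Expect:" entry tells libcurl not to send its own
        # "Expect: 100-continue" header.
        headers.append("Expect:")
    return headers
```

You would pass the result with `c.setopt(c.HTTPHEADER, put_headers("Load Tool (PyCURL Load Tool)"))`.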

If I run curl -vT file URL on the command line, the file is PUT correctly,
so I have to assume the problem lies in the way my code, or pycurl, invokes
libcurl.

Does anyone know a more efficient way to do this? Please note that I am
also measuring metrics for each transaction, so I don't want to split the
file into chunks and upload those, since I would then get metrics only for
the individual chunks.
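If intermediate measurements are wanted without chunking, a progress callback can sample a single un-chunked PUT while it runs. This sketch assumes pycurl's XFERINFOFUNCTION (PROGRESSFUNCTION on older versions), which libcurl calls with (download_total, downloaded, upload_total, uploaded) once NOPROGRESS is set to 0. ProgressRecorder and its min_interval and clock parameters are hypothetical.

```python
import time


class ProgressRecorder:
    """Progress callback that keeps timestamped upload samples (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between recorded samples
        self.clock = clock
        self.samples = []                 # (seconds since first call, bytes uploaded)
        self._start = None
        self._last = None

    def __call__(self, download_total, downloaded, upload_total, uploaded):
        now = self.clock()
        if self._start is None:
            self._start = now
        # Record at most one sample per min_interval seconds.
        if self._last is None or now - self._last >= self.min_interval:
            self.samples.append((now - self._start, uploaded))
            self._last = now
        return 0                          # a non-zero return would abort the transfer
```

You would register it with `c.setopt(c.NOPROGRESS, 0)` and `c.setopt(c.XFERINFOFUNCTION, rec)`. The final getinfo() totals still describe the whole transaction.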

The metrics are collected just before the c.close() call:

speed_up = c.getinfo(c.SPEED_UPLOAD)
size_up = c.getinfo(c.SIZE_UPLOAD)
ttime = c.getinfo(c.TOTAL_TIME)
ctime = c.getinfo(c.CONNECT_TIME)
sttime = c.getinfo(c.STARTTRANSFER_TIME)
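To keep one record per transaction, the values above could be appended to a CSV log with the standard csv module. This is a minimal sketch. write_metrics_row and its column order are hypothetical choices. SIZE_UPLOAD is in bytes, SPEED_UPLOAD in bytes per second, and the *_TIME values in seconds.

```python
import csv


def write_metrics_row(out, path, size_up, speed_up, ttime, ctime, sttime):
    """Append one CSV row of per-transaction metrics (hypothetical helper)."""
    csv.writer(out).writerow([
        path,
        int(size_up),        # bytes uploaded
        round(speed_up, 1),  # average upload speed, bytes/s
        round(ttime, 3),     # total time, s
        round(ctime, 3),     # connect time, s
        round(sttime, 3),    # start-transfer time, s
    ])
```

Calling it once per file, just before c.close(), yields a log of whole-file transactions.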

Does anyone have any thoughts?

Thank you

Received on 2004-02-04