curl-and-python

Session switched from POST to GET fails with HTTP 400 "request verb is invalid"

From: Binney, Peter <Peter.Binney_at_commerzbank.com>
Date: Wed, 12 Nov 2014 11:27:20 +0100

I have a script that downloads files from a report page on an HTTPS ASP.NET website.

Because of the site's authentication mode, all fetches must be done from one Curl object (to maintain cookies). So the script starts with, inter alia:
        c = pycurl.Curl()
        c.setopt(c.PROXY, 'damprox.intranet.commerzbank.com:8080')
        c.setopt(c.COOKIEFILE, '') # Enables inbuilt cookie jar (send+receive)

It then does a GET followed by two POSTs (preserving IIS __VIEWSTATE, inter alia) to get through the site's login page. There is then a third POST to get the report page (a list of URLs to files).
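
For readers unfamiliar with the __VIEWSTATE round trip, here is a minimal Python 3 sketch of how the hidden form fields of one page might be carried into the next POST. The names HiddenFieldParser and build_post_fields are hypothetical and not from the script; the real login form may need other fields as well.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_post_fields(page_html, extra):
    """Return a URL-encoded string of the page's hidden fields plus extra.

    Hypothetical helper: the result is the kind of string that would be
    passed to c.setopt(c.POSTFIELDS, ...).
    """
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields.update(extra)  # e.g. user name and password inputs
    return urlencode(fields)
```

The returned string plays the role of postFieldsString in the POSTFIELDS call further down.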

It then parses the report page and fetches each file in turn, writing it to disk using, basically:
        fPointer = open(savedFile, "wb")
        c.setopt(c.WRITEFUNCTION, fPointer.write)
        c.perform()
        fPointer.close()

The Curl object automatically switches from GET (used on first fetch) to POST mode because of the following on the second fetch:
        c.setopt(c.POSTFIELDS, postFieldsString)

There being no other mechanism I could find, I wanted to revert to GET mode for the file fetches, using:
        c.setopt(c.CUSTOMREQUEST, "GET") ## Can cause PycURL request corruption
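
For comparison, libcurl documents CUSTOMREQUEST as changing only the method string, not the way the request is otherwise built, and it provides a separate HTTPGET option for returning a handle to GET after a POST. The following is a minimal sketch under that assumption, not a confirmed fix for the failure described here. fetch_to_file is a hypothetical helper name, and c is the existing Curl object holding the session cookies.

```python
def fetch_to_file(c, url, path):
    """Fetch url with GET on an existing Curl handle, saving the body to path.

    Hypothetical helper: assumes CUSTOMREQUEST has not been set on c.
    """
    # HTTPGET=1 asks libcurl to go back to GET after earlier POSTFIELDS use.
    c.setopt(c.HTTPGET, 1)
    c.setopt(c.URL, url)
    with open(path, "wb") as f:
        c.setopt(c.WRITEFUNCTION, f.write)
        c.perform()
```

Because WRITEFUNCTION is set again on every call, the handle never writes to a file that has already been closed.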

When the files are being saved to a local filesystem, this works OK.
But if they are saved to a network share, the script fails after a few file GETs (usually one or two), with the remote server reporting:
        HTTP Error 400. The request verb is invalid.

I can find nothing on the Curl object to show the GET/POST state being used. But, when I observe it by setting VERBOSE mode, it does say it is doing a GET for each file.
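
One way to see what actually arrives at a server, independent of the VERBOSE output, is to point the same Curl object at a throwaway local server that records each request. This is a sketch using only the Python 3 standard library. LoggingHandler, start_server and seen are hypothetical names, and it checks only the verb and body length, not the proxy or HTTPS parts of the real setup.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # (verb, path, body length) for every request received


class LoggingHandler(BaseHTTPRequestHandler):
    def _reply(self):
        # Read any request body so its size can be recorded.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else b""
        seen.append((self.command, self.path, len(body)))
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    do_GET = do_POST = _reply

    def log_message(self, *args):
        pass  # keep the console quiet


def start_server():
    """Start the logging server on a free local port in a daemon thread."""
    server = HTTPServer(("127.0.0.1", 0), LoggingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After start_server(), the port is available as server.server_address[1]. A GET that still carries a request body would show up in seen with a non-zero length.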

The script runs on Windows XP using PycURL 7.19.5, and I have tried both Python 2.7.6 and 3.4.0.

If I remove the c.setopt(c.CUSTOMREQUEST, "GET"), so that it POSTs the file fetches, it works OK.
So I have a workaround, but I assume there's some bug inside PycURL (or Python) that is corrupting the HTTP request packet.

Received on 2014-11-12