
curl-library

Downloading does not work via SOCKS proxy: what could it be?

From: Felix E. Klee <felix.klee_at_inka.de>
Date: Fri, 8 Nov 2013 12:29:10 +0100

From our server example.com, we scrape content from certain web pages.
The owners of those web pages have whitelisted the IP address of
example.com.

Development, however, happens on another machine, dev.example.com. Since
this machine has a different IP address, its connections go through a
SOCKS proxy that is established with OpenSSH:

    user@dev.example.com$ ssh -C -f -q -N -D9999 example.com
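
With `-D9999`, OpenSSH listens on port 9999 of dev.example.com and acts as a SOCKS proxy (the ssh manual lists SOCKS4 and SOCKS5 as supported), opening each requested connection from the example.com side. As a rough illustration of what a client such as libcurl sends first when it uses this proxy over SOCKS5, the sketch below builds the RFC 1928 method-selection greeting that offers only "no authentication" and checks the proxy's two-byte reply. The function names are hypothetical; this is not libcurl's code.

```c
#include <assert.h>
#include <stddef.h>

/* SOCKS protocol version and the "no authentication required" method
   (RFC 1928, section 3). */
#define SOCKS5_VERSION     0x05
#define SOCKS5_METHOD_NONE 0x00

/* Hypothetical helper: build the client greeting offering one method,
   "no authentication". Returns bytes written, or 0 if buf is too small. */
size_t socks5_build_greeting(unsigned char *buf, size_t cap)
{
    assert(buf != NULL);
    if (cap < 3)
        return 0;
    buf[0] = SOCKS5_VERSION;     /* VER */
    buf[1] = 1;                  /* NMETHODS: one method offered */
    buf[2] = SOCKS5_METHOD_NONE; /* METHODS[0] */
    return 3;
}

/* Hypothetical helper: check the proxy's reply (VER, METHOD).
   Returns 1 if "no authentication" was selected, 0 otherwise.
   A METHOD of 0xFF means none of the offered methods was acceptable. */
int socks5_greeting_accepted(const unsigned char *reply, size_t len)
{
    if (reply == NULL || len < 2)
        return 0;
    return reply[0] == SOCKS5_VERSION && reply[1] == SOCKS5_METHOD_NONE;
}
```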

For *some* web pages, however, downloading does not work on
dev.example.com. After a request is sent from dev.example.com, there is
simply no response (or, I think, sometimes the transfer just does not
finish).

I logged the downloads on both machines:

  * example.com:

        Nov 08 10:57:56 Daemon started
        Nov 08 10:58:13 Data type: 0
        Nov 08 10:58:13 Data (57): About to connect() to site.com port 80 (#0)

        Nov 08 10:58:13 Data type: 0
        Nov 08 10:58:13 Data (25): Trying 123.456.789.123....

        Nov 08 10:58:13 Data type: 0
        Nov 08 10:58:13 Data (63): Connected to site.com (123.456.789.123) port 80 (#0)

        Nov 08 10:58:13 Data type: 2
        Nov 08 10:58:13 Data (255): POST /CSearch.aspx HTTP/1.1^M
        User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1^M
        Host: site.com^M
        Accept: */*^M
        Accept-Encoding: deflate,gzip^M
        Content-Type: application/x-www-form-urlencoded^M
        Content-Length: 984^M
        ^M
        [...]

  * dev.example.com (download fails):

        Nov 08 10:46:00 Daemon started
        Nov 08 10:46:21 Data type: 0
        Nov 08 10:46:21 Data (53): About to connect() to proxy localhost port 9999 (#0)

        Nov 08 10:46:21 Data type: 0
        Nov 08 10:46:21 Data (22): Trying 127.0.0.1...

        Nov 08 10:46:21 Data type: 0
        Nov 08 10:46:21 Data (4): 208

        Nov 08 10:46:21 Data type: 0
        Nov 08 10:46:21 Data (3): 72

        Nov 08 10:46:21 Data type: 0
        Nov 08 10:46:21 Data (2): 8

        Nov 08 10:46:21 Data type: 0
        Nov 08 10:46:21 Data (4): 246

        Nov 08 10:46:21 Data type: 0
        Nov 08 10:46:21 Data (50): Connected to localhost (127.0.0.1) port 9999 (#0)

        Nov 08 10:46:21 Data type: 2
        Nov 08 10:46:21 Data (255): POST /CSearch.aspx HTTP/1.1^M
        User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1^M
        Host: site.com^M
        Accept: */*^M
        Accept-Encoding: deflate,gzip^M
        Content-Type: application/x-www-form-urlencoded^M
        Content-Length: 984^M
        ^M
        [...]
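
The post does not say how these logs were produced. Assuming they come from a `CURLOPT_DEBUGFUNCTION` callback that prints its `curl_infotype` argument as "Data type", the numbers map to libcurl's `curl_infotype` values: 0 is `CURLINFO_TEXT` (libcurl's own informational messages) and 2 is `CURLINFO_HEADER_OUT` (request headers sent). Response headers, once they arrive, would be logged as type 1 (`CURLINFO_HEADER_IN`). The small helper below labels these numbers when reading such logs; its name is hypothetical, and the enum values are copied by hand (an assumption that they match the `curl.h` in use) so the sketch compiles without libcurl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Labels indexed by the numeric curl_infotype value (copied by hand). */
static const char *const infotype_names[] = {
    "CURLINFO_TEXT",         /* 0: informational text from libcurl */
    "CURLINFO_HEADER_IN",    /* 1: header data received */
    "CURLINFO_HEADER_OUT",   /* 2: header data sent */
    "CURLINFO_DATA_IN",      /* 3: body data received */
    "CURLINFO_DATA_OUT",     /* 4: body data sent */
    "CURLINFO_SSL_DATA_IN",  /* 5: raw SSL/TLS data received */
    "CURLINFO_SSL_DATA_OUT", /* 6: raw SSL/TLS data sent */
};

/* Hypothetical helper: map a logged "Data type" number to its label. */
const char *infotype_name(int type)
{
    size_t count = sizeof infotype_names / sizeof infotype_names[0];
    if (type < 0 || (size_t)type >= count)
        return "UNKNOWN";
    return infotype_names[type];
}
```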

Notes:

  * Scraping is initiated by sending a `POST` request.

  * The example.com log says `Trying 123.456.789.123....` (site.com).
    The dev.example.com log has no such line; it only says
    `Trying 127.0.0.1...` (the proxy).

  * The problem only affects certain web pages, perhaps those where
    scraping starts with a `POST` request.
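
On the second note: when libcurl uses a SOCKS proxy, the TCP connection it opens (and reports as `Trying`) is to the proxy itself, and the target travels inside the SOCKS CONNECT request, so seeing only `Trying 127.0.0.1...` is expected. The four short lines in the dev.example.com log (`208`, `72`, `8`, `246`) look like the bytes of an IPv4 address. If that reading is right, site.com was resolved on dev.example.com and handed to the proxy as a numeric address rather than as a hostname; this is an inference from the log, not something the post confirms. libcurl can instead pass the hostname for the proxy to resolve via `CURLPROXY_SOCKS5_HOSTNAME` (or a `socks5h://` proxy URL). The sketch below, with hypothetical function names and not libcurl's code, builds both forms of the SOCKS5 CONNECT request as laid out in RFC 1928: address type 0x01 (IPv4) and 0x03 (domain name).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SOCKS5_VERSION     0x05
#define SOCKS5_CMD_CONNECT 0x01
#define SOCKS5_ATYP_IPV4   0x01 /* DST.ADDR is 4 address bytes */
#define SOCKS5_ATYP_NAME   0x03 /* DST.ADDR is a length-prefixed hostname */

/* Write the destination port in network byte order. */
static size_t put_port(unsigned char *p, unsigned short port)
{
    p[0] = (unsigned char)(port >> 8);
    p[1] = (unsigned char)(port & 0xFF);
    return 2;
}

/* CONNECT to an address the client already resolved.
   Returns the request length, or 0 if buf is too small. */
size_t socks5_connect_ipv4(unsigned char *buf, size_t cap,
                           const unsigned char addr[4], unsigned short port)
{
    size_t n = 0;
    if (cap < 10)
        return 0;
    buf[n++] = SOCKS5_VERSION;
    buf[n++] = SOCKS5_CMD_CONNECT;
    buf[n++] = 0x00;               /* RSV */
    buf[n++] = SOCKS5_ATYP_IPV4;
    memcpy(buf + n, addr, 4);      /* the four address bytes */
    n += 4;
    n += put_port(buf + n, port);
    return n;
}

/* CONNECT to a hostname that the proxy resolves on its side.
   Returns the request length, or 0 if the name or buffer size is invalid. */
size_t socks5_connect_name(unsigned char *buf, size_t cap,
                           const char *host, unsigned short port)
{
    size_t len, n = 0;
    assert(host != NULL);
    len = strlen(host);
    if (len == 0 || len > 255 || cap < 7 + len)
        return 0;
    buf[n++] = SOCKS5_VERSION;
    buf[n++] = SOCKS5_CMD_CONNECT;
    buf[n++] = 0x00;               /* RSV */
    buf[n++] = SOCKS5_ATYP_NAME;
    buf[n++] = (unsigned char)len; /* name length, no terminating NUL */
    memcpy(buf + n, host, len);
    n += len;
    n += put_port(buf + n, port);
    return n;
}
```

With the first form, the proxy is told which IP to connect to; with the second, the proxy host (here example.com, the far end of the SSH tunnel) does the name lookup itself.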

Anything suspicious? What could be a starting point for debugging?
Received on 2013-11-08