curl-and-php

Re: Curl instance against web browser

From: jom dalina <user101110_at_yahoo.com>
Date: Mon, 2 Mar 2009 03:28:33 -0800 (PST)

That makes sense!

Thanks for your reply, sir.

However, I am scraping a site that requires login. In a web browser, the site does not tolerate a page reload, direct access to a page by URL, the browser's "BACK" button, or opening a page in a new tab; each of these gives me a session error.

Here is the scenario: I can log in using curl and access the page that follows a successful login, but I can't navigate to any subsequent page. I get an error that points either to my session (on the website) or to my having accessed the page directly.

Since the site only works in the browser window where you opened it, and a new browser window or tab results in an error, I thought that perhaps every curl request is an independent instance, separate from the others.

Or I might be missing some required curl option in PHP's curl_setopt. But I have already tried the same code on another site that uses a session cookie, and it works fine.
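
One thing that may be worth checking, sketched here in PHP: a site that rejects reloads and direct URL access might be validating more than the session cookie, for example the Referer header. That is only an assumption about this site; nothing in the thread confirms it. The sketch keeps one handle, enables libcurl's in-memory cookie engine, and sends the page the login landed on as the referer. The helper names `request_options` and `login_then_navigate`, the URLs, and the form fields are all hypothetical.

```php
<?php
// Build the options for one step of a browser-like navigation sequence.
// $referer is the page we are "coming from"; $post holds form fields, if any.
function request_options(string $url, ?string $referer = null, ?array $post = null): array
{
    $opts = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects, as a browser would
    ];
    if ($referer !== null) {
        $opts[CURLOPT_REFERER] = $referer;
    }
    if ($post !== null) {
        $opts[CURLOPT_POST] = true;
        $opts[CURLOPT_POSTFIELDS] = http_build_query($post);
    } else {
        $opts[CURLOPT_HTTPGET] = true;   // switch the handle back to GET
    }
    return $opts;
}

// Log in, then request the next page on the SAME handle.
function login_then_navigate(string $loginUrl, array $fields, string $nextUrl)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEFILE, ''); // empty name: keep cookies in memory

    curl_setopt_array($ch, request_options($loginUrl, null, $fields));
    curl_exec($ch);
    // Where the login left us after any redirects; a browser would send this as Referer.
    $landing = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

    curl_setopt_array($ch, request_options($nextUrl, $landing));
    return curl_exec($ch);
}
```

A call might look like `login_then_navigate('http://example.com/login', ['user' => 'name', 'pass' => 'secret'], 'http://example.com/page2')`, with placeholder values throughout. If the site also embeds a hidden token in each page, that token would have to be parsed out of the previous response and sent along, which this sketch does not attempt.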
 

--- On Mon, 3/2/09, Daniel Stenberg <daniel_at_haxx.se> wrote:
From: Daniel Stenberg <daniel_at_haxx.se>
Subject: Re: Curl instance against web browser
To: "curl with PHP" <curl-and-php_at_cool.haxx.se>
Date: Monday, March 2, 2009, 2:24 AM

On Sun, 1 Mar 2009, jom dalina wrote:

> Is opening a curl session equivalent to opening a browser instance (e.g.
> opening a Firefox browser)?
>
> If so, after scraping a web page, if I scrape another web page without
> closing the curl session, is it considered the same browser instance, or is
> it considered another browser instance (e.g. like closing the previously
> opened browser and opening a new one)?

Yes and no.

HTTP is stateless so there's no difference to a web site if you do one
request, then close your browser, start it again and then do the next request.

But a TCP connection to a server can be re-used for multiple requests, which
makes the subsequent ones faster, and CURL/PHP can do that as well as your
browser.

Then there's also stuff like cookies, but that has nothing to do with the
TCP connection, and CURL/PHP can handle them fine regardless of whether you
re-use a handle or not.
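
A minimal PHP sketch of the cookie point, assuming a writable temporary file for the cookie jar. The helper names `cookie_options` and `fetch_with_new_handle` and the URLs are hypothetical. Every request uses a fresh handle, and the session survives because the cookies are saved to and loaded from the same file, not because a handle or connection is re-used.

```php
<?php
// Options that make a handle load cookies from, and save cookies to, $jar.
function cookie_options(string $url, string $jar): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_COOKIEFILE     => $jar,  // read cookies saved by earlier handles
        CURLOPT_COOKIEJAR      => $jar,  // write cookies when the handle is cleaned up
    ];
}

// Perform one request on a brand-new handle that shares the cookie file.
function fetch_with_new_handle(string $url, string $jar)
{
    $ch = curl_init();
    curl_setopt_array($ch, cookie_options($url, $jar));
    $body = curl_exec($ch);
    unset($ch); // release the handle so libcurl writes the cookies to $jar
    return $body;
}

// Hypothetical usage: two independent handles, one shared session.
// $jar = tempnam(sys_get_temp_dir(), 'cookies');
// fetch_with_new_handle('http://example.com/login', $jar);
// fetch_with_new_handle('http://example.com/account', $jar);
```

Re-using a single handle instead would additionally let libcurl re-use the TCP connection, which affects speed but not how the server identifies the session.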

-- 
 / daniel.haxx.se
_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-php

Received on 2009-03-02