Re: PHPCrawler or CURL library can't get content

From: bruce <badouglas_at_gmail.com>
Date: Wed, 7 Jan 2015 15:48:04 -0500

Hey.

If the page you're crawling gets you to a page that's dynamically generated
by a script, you're going to have to "run" the script to determine the
ultimate URL that the page/app redirects you to.

In some cases you can read the underlying JavaScript to see what it does; in
others you can run a traffic-sniffing app to examine the actual requests.

The idea is to be able to duplicate the process to get the target URL in
your code.
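
As a rough illustration of duplicating that process, here is a minimal Python
sketch that looks for a client-side redirect in the fetched HTML and resolves
it against the page URL. The function name find_client_redirect and the
example.com URLs are made up for illustration, and the patterns only cover
simple cases where the target is a literal string (a location assignment,
location.replace/assign, or a meta refresh). If the script builds the URL
dynamically, this will not find it.

```python
import re
from urllib.parse import urljoin

# Simple patterns for redirects written as literal strings in the page.
# Assumption: the target URL appears verbatim in the HTML or inline script.
_JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)""",
    re.IGNORECASE,
)
_META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv\s*=\s*["']?refresh["']?[^>]*"""
    r"""content\s*=\s*["'][^"']*?url\s*=\s*([^"'>\s]+)""",
    re.IGNORECASE,
)


def find_client_redirect(html, base_url):
    """Return the absolute redirect target found in html, or None."""
    for pattern in (_JS_REDIRECT, _META_REFRESH):
        match = pattern.search(html)
        if match:
            # Exactly one capture group is set, depending on which
            # alternative of the pattern matched.
            target = next(group for group in match.groups() if group)
            return urljoin(base_url, target)
    return None


if __name__ == "__main__":
    page = '<script>window.location.href = "/results.aspx?page=1";</script>'
    print(find_client_redirect(page, "http://example.com/search.aspx"))
```

You would fetch the page, call this on the body, and request the returned URL
in turn, stopping when it returns None or after a fixed number of hops so a
page that keeps refreshing cannot loop forever.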

In the worst cases, you're going to need to get away from cURL/PHP and move to
building/running a headless browsing app. In these cases, check out
CasperJS, PhantomJS, etc.

On Wed, Jan 7, 2015 at 6:53 AM, xNokia <xnokia_at_nokiagate.com> wrote:

> Hi There,
>
> I'm using the PHPCrawler class to get product titles from different stores
> such as eBay. The library works well with all the stores I support in my
> application except the Blink store website (
> http://blink.com.kw/search-result.aspx?text=mobile&searchfor=all). That
> website's search page is not loaded the way other store websites load theirs.
> When I followed the website's requests in the Chrome debugger, I found
> that the page is loaded by a script, though the request URL is identical to
> the URL I enter in Chrome's address bar and to the URL I set in the
> class to crawl.
>
> So is there any way for the crawler class to fetch the page that I'm
> redirected to? I've used the setFollowRedirects method, but with no luck,
> because the redirect is done on the client side through JavaScript, not in
> the headers. I've also found an extra POST request made after the normal GET
> request. I've tried adding the POST data too, but I get the same result: an
> empty result set. When I output the fetched page, it comes back without the
> products listed.
>
> Side note: the Blink store website is an ASP.NET site. Could that be the
> reason I can't crawl its pages?
>
> UPDATE
>
> I've tried fetching the page using the standard PHP cURL functions and
> echoing the response. The echoed page is incomplete and keeps refreshing.
>
>
> -------------------------------------------------------------------
> List admin: http://cool.haxx.se/list/listinfo/curl-users
> FAQ: http://curl.haxx.se/docs/faq.html
> Etiquette: http://curl.haxx.se/mail/etiquette.html
>
>

Received on 2015-01-07