curl-and-php

Re: curl not scraping Microsoft.com (something wrong?)

From: <spamiam_at_aroint.org>
Date: Fri, 15 Apr 2005 14:07:31 -0400 (EDT)

Problem solved, thanks to Kirk!!

Here is Kirk's code that seems to work for most websites. (Perhaps this
should be added to the examples page?)

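<?php
      // Browser-like request headers; some sites appear to expect them.
      $header = array();
      $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
      $header[] = "Accept-Language: en-us,en;q=0.5";
      $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
      $header[] = "Keep-Alive: 300";
      $header[] = "Pragma:";  // empty value: ask libcurl not to send its own Pragma header

      $ch = curl_init();
      curl_setopt($ch, CURLOPT_URL,
        "http://yourdomainhere.com/Search.aspx?action=search");
      curl_setopt($ch, CURLOPT_USERAGENT,
         "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)");
      curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");   // save received cookies here
      curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");  // send previously saved cookies
      curl_setopt($ch, CURLOPT_HEADER, 1);          // include response headers in the output
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the response instead of printing it
      curl_setopt($ch, CURLOPT_TIMEOUT, 300);       // give up after 300 seconds
      curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
      curl_setopt($ch, CURLOPT_ENCODING, "");       // accept any encoding this curl build supports
      curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
      $string = curl_exec($ch);
      if ($string === false) {
          // Report the failure instead of silently printing nothing.
          echo "cURL error: " . curl_error($ch);
      }
      curl_close($ch);
      echo $string;
?>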
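
Because CURLOPT_HEADER is switched on, the string returned by curl_exec() starts with the response headers, followed by the page itself. If only the page is wanted, one possible approach is to cut the string at the size cURL reports for the headers. This is only a sketch: split_response() is a made-up helper name, not part of Kirk's code or of the cURL extension, and it assumes the size is read with curl_getinfo() before the handle is closed.

```php
<?php
// Hypothetical helper (not part of the cURL extension): split a raw response
// that was fetched with CURLOPT_HEADER enabled into headers and body.
// $headerSize is assumed to come from curl_getinfo($ch, CURLINFO_HEADER_SIZE),
// which counts every header block received, including those of redirects.
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Intended use, before curl_close($ch):
//   $string = curl_exec($ch);
//   $parts  = split_response($string, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
//   echo $parts['body'];
```

Alternatively, leaving CURLOPT_HEADER off returns the body alone, at the cost of not seeing the headers while debugging.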

On Thu, 14 Apr 2005, Kirk Hedden wrote:

>
> >
> >I'm sure you have had many successes and can do this in a blink of an eye.
> >But as a novice, I've been able to retrieve 1 out of 3 sites.
> >
> >Perhaps I missed where on your website it explains precisely *what* needs
> >to be copied from the headers and *how* to do it in curl, in a
> >step-by-step fashion. I'm not expecting a cookie cutter solution, but it
> >shouldn't be some mystical process, either.
> >
> >Any help will be much appreciated!
>
> I don't know why I did this, but I ran the url through my code and it
> worked, so I looked to see what I was doing that you weren't.
>
> You need to set the CURLOPT_COOKIEJAR option.
>
> cURL is not an HTTP primer. It's an HTTP tool. If you don't know HTTP,
> you'll have a hard time using it. The docs aren't the best, but the info is
> there.
>
> Best,
> Kirk
>
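
For anyone who finds this thread later, Kirk's cookie advice can be reduced to a small sketch. It assumes, without having been confirmed in this thread, that the site sets a cookie on one response and expects it back on the next request. cookie_options() is a hypothetical helper name, and the cookie file path is whatever writable location suits your script.

```php
<?php
// Hypothetical helper: build the two cookie-related options for one session.
// Pointing both at the same file lets cookies persist between requests.
function cookie_options(string $jarPath): array
{
    return [
        CURLOPT_COOKIEFILE => $jarPath, // read cookies saved by earlier requests
        CURLOPT_COOKIEJAR  => $jarPath, // write cookies out when the handle is released
    ];
}

// Intended use (assumed, not run here):
//   $ch = curl_init("http://yourdomainhere.com/Search.aspx?action=search");
//   curl_setopt_array($ch, cookie_options("cookies.txt"));
```

The file has to be writable by the user the script runs as, otherwise the cookies will probably not be saved.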
Received on 2005-04-15