curl-and-php

Re: curl not scraping Microsoft.com (something wrong?)

From: <spamiam_at_aroint.org>
Date: Thu, 14 Apr 2005 10:03:30 -0400 (EDT)

On Thu, 14 Apr 2005, Daniel Stenberg wrote:

> On Wed, 13 Apr 2005 spamiam_at_aroint.org wrote:
>
> > Tried that too... none of the documentation or FAQs on your website appeared
> > to address this type of site. I would be happy to help write appropriate
> > LiveHTTPHeaders documentation for your website if I could first figure out
> > how to get it working!
>
> I beg to differ. The info on the site explains how to deal with *any* type of
> HTTP server. microsoft.com is not a different type, it just uses a particular
> set of checks and requirements. Many sites are unique in that aspect.
>
> If you repeat the sequence *exactly* as LiveHTTPHeaders reported that your
> browser did, it will work.
>
> I've done things like this with curl hundreds of times.
>

I'm sure you have had many successes and can do this in the blink of an eye.
But as a novice, I've been able to retrieve only 1 out of 3 sites.

Perhaps I missed where on your website it explains precisely *what* needs
to be copied from the headers and *how* to do it in curl, in a
step-by-step fashion. I'm not expecting a cookie-cutter solution, but it
shouldn't be some mystical process, either.

Any help will be much appreciated!
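
To make the question concrete, here is a minimal sketch of the kind of
step-by-step translation being asked about: take one request as captured by
LiveHTTPHeaders and turn each header into a curl `-H` option. It is an
illustration under assumptions, not a recipe known to work for microsoft.com.
The capture text, the host www.example.com, the cookie value, and the helper
name `curl_args_from_capture` are all hypothetical; only the curl options
`-H` and `-X` are real.

```python
import shlex

# Hypothetical capture, in the style LiveHTTPHeaders shows a request.
# Host, path, and header values are made up for illustration.
CAPTURED = """\
GET /downloads/ HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20050225 Firefox/1.0.1
Accept: text/html,application/xhtml+xml
Accept-Language: en-us,en;q=0.5
Referer: http://www.example.com/
Cookie: session=abc123
"""


def curl_args_from_capture(capture, scheme="http"):
    """Build a curl argument list that repeats one captured request."""
    lines = [ln.strip() for ln in capture.splitlines() if ln.strip()]

    # First line is the request line: METHOD PATH VERSION.
    method, path, _version = lines[0].split(" ", 2)

    host = None
    headers = []
    for ln in lines[1:]:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = ln.partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() == "host":
            host = value  # curl derives Host from the URL itself
        else:
            headers.append((name, value))

    if host is None:
        raise ValueError("capture has no Host header")

    args = ["curl"]
    if method != "GET":
        # A request body (e.g. for POST) would also need --data; not handled.
        args += ["-X", method]
    for name, value in headers:
        args += ["-H", f"{name}: {value}"]
    args.append(f"{scheme}://{host}{path}")
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(curl_args_from_capture(CAPTURED)))
```

Each request in the captured sequence would be converted this way and sent in
the same order the browser sent them. From PHP, the same "Name: value" strings
could presumably be passed as an array via CURLOPT_HTTPHEADER instead of on the
command line.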
Received on 2005-04-14