curl-and-php mailing list Archives

Re: curl not scraping Microsoft.com (something wrong?)

From: <spamiam_at_aroint.org>
Date: Thu, 14 Apr 2005 10:03:30 -0400 (EDT)

On Thu, 14 Apr 2005, Daniel Stenberg wrote:

> On Wed, 13 Apr 2005 spamiam_at_aroint.org wrote:
>
> > Tried that too... none of the documentation or FAQs on your website appeared
> > to address this type of site. I would be happy to help write appropriate
> > LiveHTTPHeaders documentation for your website if I could first figure out
> > how to get it working!
>
> I beg to differ. The info on the site explains how to deal with *any* type of
> HTTP server. microsoft.com is not a different type, it just uses a particular
> set of checks and requirements. Many sites are unique in that aspect.
>
> If you repeat the sequence *exactly* as LiveHTTPHeaders reported that your
> browser did, it will work.
>
> I've done things like this with curl hundreds of times.
>

I'm sure you have had many successes and can do this in the blink of an eye.
But as a novice, I've been able to retrieve only 1 out of 3 sites.

Perhaps I missed where on your website it explains precisely *what* needs
to be copied from the headers and *how* to do it in curl, in a
step-by-step fashion. I'm not expecting a cookie cutter solution, but it
shouldn't be some mystical process, either.

Any help will be much appreciated!
Received on 2005-04-14