curl-library

Can't download URL via libcurl but can using curl

From: Lyndon Hill <emptystate_at_yahoo.co.uk>
Date: Mon, 29 Mar 2010 21:11:34 +0000 (GMT)

Hi,

I'm trying to use libcurl to download the RSS feed from Google News. The
default feed URL given to me is:

http://news.google.com/news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss

When I try to fetch it using libcurl, the server returns an HTML web page
(with a 200 response) instead of the feed. With CURLOPT_VERBOSE enabled I get:

* About to connect() to news.google.com port 80 (#0)
* Trying 74.125.79.99... * Connected to news.google.com (74.125.79.99) port 80 (#0)
> GET /news?pz=1&amp;cf=all&amp;ned=uk&amp;hl=en&amp;topic=h&amp;num=3&amp;output=rss HTTP/1.1
User-Agent: myapplication/1.0
Host: news.google.com
Accept: */*

< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8

I can fetch the RSS feed with the curl command-line tool without any problems:

curl -i -A "myapplication/1.0" "http://news.google.com/news?pz=1&cf=all&ned=uk&hl=en&topic=h&num=3&output=rss"

This gives XML (also a 200 response).

Google doesn't like being scraped, which is why I set a user agent string.
One possibility is that they detect my application as a scraper and serve
it the HTML page instead. Another is that the URL is malformed: as the
verbose output shows, my application is passing &amp; to libcurl instead of &.

The curl tool can get the XML using this URL and the same user agent
string as my application, so I don't see why my application can't.

I tried comparing my code with the output of curl's --libcurl option, but
can't see any reason why my application behaves differently.

Here is the code I am using:

  handle = curl_easy_init();

  // Set up options
  curl_easy_setopt(handle, CURLOPT_URL, url.ascii());
#if DEBUG
  // Numeric options must be passed as long, hence 1L rather than 1
  curl_easy_setopt(handle, CURLOPT_VERBOSE, 1L);
#endif
  curl_easy_setopt(handle, CURLOPT_USERAGENT, useragent.ascii());
  // CURLOPT_TIMEOUT also expects a long (seconds)
  curl_easy_setopt(handle, CURLOPT_TIMEOUT, (long)timeout);
  if(!proxy.isEmpty())
    curl_easy_setopt(handle, CURLOPT_PROXY, proxy.ascii());
  curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1L);

It's probably a case of not seeing the wood for the trees.
What am I doing wrong?

      

-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2010-03-29