curl-library

Possible harm caused by using cURL on other people's websites.

From: Gloves 12 <gloves12_at_gmail.com>
Date: Mon, 23 Aug 2010 15:16:32 -0500

Hello!

    I have a pretty general question about what other people think
about cURL. I use cURL a lot in my job; I work to build software that
downloads information from other people's websites (spidering / mining
/ screen scraping / ... ). I obey robots.txt and I wait 5 to 30 seconds
between requests, so server load is not the issue. But I am concerned
that, if I accidentally send bad cURL requests, I could cause problems
on their servers.
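
A minimal sketch of the kind of randomized delay described above, in plain C. The helper name polite_delay_seconds is made up for illustration and is not part of libcurl; the 5 to 30 second range is the one mentioned in this post.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a libcurl function): return a whole number
 * of seconds in the inclusive range [min_s, max_s]. */
int polite_delay_seconds(int min_s, int max_s)
{
    assert(min_s >= 0 && min_s <= max_s);

    int span = max_s - min_s + 1;

    /* rand() % span has a slight modulo bias, which does not matter
     * when the goal is only to space requests out. */
    return min_s + rand() % span;
}
```

On a POSIX system the caller would probably seed rand() once with srand() and then call something like sleep(polite_delay_seconds(5, 30)) between transfers; that usage is an assumption, not something the post spells out.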
    I try very hard to make my cURL requests emulate exactly the
requests made by a normal user using a browser. But I inevitably
make small mistakes, like forgetting to save a cookie, or forgetting
to request some page that is loaded by JavaScript, or mis-sending POST
data, or visiting a page or form that has changed or no longer
exists.
    Whenever I debug my code and get error messages in response to my
cURL calls, error messages that I never see when I am in a browser, I
always cringe. What if their website is not designed to handle
strange values that I accidentally pass to it, and what if my cURL
requests really cause a problem in their database? For example, what
if I pass a variable to a PHP file on their site, but their site never
expected such a value, and so bad data is introduced?
    Or, what if I cause error messages on their site, and those error
messages are saved in a log? What if the web developers on that site
think they have an error in their program? After all, they will not
necessarily suspect that people are tampering with their POST, GET, and
cookie data - they might assume that it is an internal problem, when
really it was my mistake in sending a bad cURL call.

    Maybe I'm being overly cautious. But I know that, even though I
work hard at making them good, some of my cURL calls are not exactly
the same as their browser counterparts. I'm wondering if anyone else
has ever considered this, or knows anything more about it (like "yes,
we all worry about this a ton, you need to be testing your cURL
statements a lot better" or "no, these never matter - it's really rare
that even a terribly sloppy cURL call would ever harm a website").

Thanks for any thoughts or advice!
   - Ryan S
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2010-08-23