curl-library

Re: Fwd: Downloaded File (.pdf) Corrupt

From: Dan Fandrich <dan_at_coneharvesters.com>
Date: Mon, 9 Jan 2012 08:45:46 -0800

On Mon, Jan 09, 2012 at 10:58:19AM -0500, Eric Belec wrote:
> Thank you for the reply. I've determined the scenario where
> Chrome/Firefox work and libcurl does not. If you look at the link
> -> "https://someserver/myfile123/12045%20%20GIECKE%20&amp;%20DE%20CANADA%20Jan%2064%2072.pdf"
> you will notice an added "amp;" which Chrome/Firefox appear to ignore
> and unfortunately libcurl does not, resulting in a corrupt download.
> Can someone explain why this might be happening and also the best
> short term fix? I assume I could just parse my link and remove the
> added 'amp;' but there has to be a better solution.

Are you getting this link from within an HTML document? If so, it's
not that the browser is ignoring the 'amp;' but rather that it is converting
the character entity reference "&amp;" into an ampersand character "&". This
is perfectly valid HTML, and the same conversion is needed by any code that
parses links out of HTML.

There are library functions available in many languages that will do this
substitution for you, but if &amp; is the only one you ever see in whatever
links you're scraping, you could just special-case it and replace just that
one with &.
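
As a rough illustration of both options, here is a minimal sketch in Python
(chosen only for brevity; it is not libcurl code). The helper names
decode_href and decode_amp_only are made up for this example; html.unescape
is the standard-library function that decodes character references. The
decoded string is what you would then hand to libcurl as the URL.

```python
import html

# Link exactly as it appears inside the HTML source (from the message above).
scraped = ("https://someserver/myfile123/"
           "12045%20%20GIECKE%20&amp;%20DE%20CANADA%20Jan%2064%2072.pdf")


def decode_href(raw_href: str) -> str:
    """General case: decode every HTML character reference in the link.

    Hypothetical helper name; html.unescape does the actual work.
    """
    return html.unescape(raw_href)


def decode_amp_only(raw_href: str) -> str:
    """Special case: assumes "&amp;" is the only reference that ever occurs."""
    return raw_href.replace("&amp;", "&")


if __name__ == "__main__":
    # Both print the URL with a bare "&" in place of "&amp;".
    print(decode_href(scraped))
    print(decode_amp_only(scraped))
```

The percent-escapes (%20) are left untouched by both helpers; only the HTML
layer of encoding is removed. The special-case version will miss other
references such as "&#38;" or "&quot;" if they ever show up in your links.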

>>> Dan
Received on 2012-01-09