RE: curl-and-php Digest, Vol 29, Issue 3

From: Richard Lynch <ceo_at_l-i-e.com>
Date: Thu, 17 Jan 2008 14:48:11 -0600 (CST)

You MIGHT be able to avoid the disk write by dinking around with
opening up named pipes in PHP, and then the program will THINK it's
reading from the hard drive, when it's really reading from RAM...

You may also want to try writing to /tmp which is sometimes a (much)
faster drive.

And, finally, if some kind of ram-disk is available, you could perhaps
use that.
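
One way the /tmp and ram-disk suggestions above might look in PHP is
sketched below. It is only a sketch under assumptions: the helper names
pick_scratch_dir() and download_to_file() are made up for illustration,
/dev/shm is assumed to be a RAM-backed mount (common on Linux, but not
guaranteed), and pdftotext is assumed to be on the PATH. CURLOPT_FILE
makes curl write the body straight to the file handle, so the PDF does
not have to sit in a PHP string first.

```php
<?php
// Sketch only: pick a fast scratch directory and download a PDF into it.
// Function names and candidate paths are illustrative assumptions.

// Return the first candidate that exists and is writable,
// otherwise fall back to PHP's default temp directory.
function pick_scratch_dir(array $candidates): string
{
    foreach ($candidates as $dir) {
        if (is_dir($dir) && is_writable($dir)) {
            return $dir;
        }
    }
    return sys_get_temp_dir();
}

// Stream the response body directly into $path instead of a string.
function download_to_file(string $url, string $path): bool
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // write body to $fp
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // HTTP >= 400 counts as failure
    $ok = curl_exec($ch);
    fclose($fp);
    return $ok !== false;
}

// Usage (left commented out so nothing touches the network by accident):
// $dir = pick_scratch_dir(['/dev/shm', '/tmp']);
// $pdf = tempnam($dir, 'pdf_');
// if (download_to_file('http://www.ire.org/training/nettour/pdf/PDFTOTEXT.pdf', $pdf)) {
//     exec('pdftotext ' . escapeshellarg($pdf) . ' -', $lines); // "-" = text to stdout
// }
// unlink($pdf);
```

If none of the candidate directories is usable, the sketch quietly falls
back to sys_get_temp_dir(), so it should still run on a box without a
ram-disk, just without the speed-up.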

On Fri, January 4, 2008 9:53 am, Ralph Seward wrote:
> Thank you Richard and Colleen for your replies.
> First I should say that I went back to my code and simplified it to
> the barest minimum, and ran it against an online pdf and the code
> actually worked perfectly. I present this code below. The problem I
> was having was that I was attempting to run the pdf through a complex
> series of routines to parse out html code and this code was running
> into problems with the pdf format.
> In response to Richard, I actually have a set of code that converts
> the pdf format into a text format utilizing a shell program called
> pdftotext, available from http://www.bluem.net/downloads/pdftotext_en/ .
> A requirement for this program is that the pdf must first be written
> to disk.
>
> Below is my code to capture pdf code and write it to disk. It is
> actually a pretty basic curl download followed by a disk write.
>
> Ralph
>
> # open curl session
> $s = curl_init();
> # configure curl command
> curl_setopt($s, CURLOPT_URL,
>     "http://www.ire.org/training/nettour/pdf/PDFTOTEXT.pdf"); // target pdf
> curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE); // return in string
> # execute curl command & send contents of target pdf to string
> $downloaded_page = curl_exec($s);
> # curl_exec() returns FALSE on failure, so check before writing
> if ($downloaded_page === FALSE) {
>     die("Error downloading file: " . curl_error($s) . "\n");
> }
> # close php/curl session
> curl_close($s);
> # write the pdf to disk in binary mode
> $filename = "downloaded.pdf";
> $outfile = fopen($filename, "wb") or die("Error opening file\n");
> if (fwrite($outfile, $downloaded_page) === FALSE) {
>     die("Error writing to file.\n");
> }
> fclose($outfile);
>
>
>
>
>> From: curl-and-php-request_at_cool.haxx.se
>> Subject: curl-and-php Digest, Vol 29, Issue 3
>> To: curl-and-php_at_cool.haxx.se
>> Date: Fri, 4 Jan 2008 12:00:02 +0100
>>
>> Today's Topics:
>>
>> 1. Re: Problem with redirection (Douglas Fonseca)
>> 2. PDF links (Ralph Seward)
>> 3. Re: PDF links (Richard Lynch)
>> 4. Re: PDF links (Colleen R. Dick)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Thu, 3 Jan 2008 15:01:31 -0300
>> From: "Douglas Fonseca" <dglsbr_at_gmail.com>
>> Subject: Re: Problem with redirection
>> To: curl-and-php_at_cool.haxx.se
>> Message-ID:
>> <abbefc8b0801031001j9df51c0u1791993f99d0f3a6_at_mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi,
>> I have the same problem.
>> I'm trying to log in to orkut with cURL. The login works, but it doesn't
>> redirect the page to Orkut Home; it just shows a header.
>> I'm already using COOKIEJAR.
>> I hope someone can help us.
>> Thank you,
>> Douglas Fonseca
>>
>>
>> 2008/1/1, Richard Lynch <ceo_at_l-i-e.com>:
>> >
>> > On Mon, December 17, 2007 9:34 am, Werner Hofer wrote:
>> > > I would like to use the curl library. I wish to get the content from a
>> > > page (page x: http://www.travelan.net/module/20316/more-gesamt/).
>> > > For that I call a page (page y:
>> > > http://www.getyourstock.com/alfa13.php) and this page calls page x
>> > > (see the example alfa13.php).
>> > > Now the problem is the following: there is a redirection within page x
>> > > and I do not know how to get the content of the redirected page x.
>> > > The content of the redirected page x should finally be displayed in
>> > > the browser.
>> > > Result: I only get the header, but not the content of the page
>> > > itself.
>> > >
>> > > The header I get is the following:
>> > >
>> > > HTTP/1.1 302 Found
>> > > Date: Mon, 17 Dec 2007 15:20:17 GMT
>> > > Server: Apache/2.0.54 (Debian GNU/Linux) PHP/5.2.3 with Suhosin-Patch
>> > >   DAV/2 mod_ssl/2.0.54 OpenSSL/0.9.7e
>> > > X-Powered-By: PHP/5.2.3
>> > > Set-Cookie: PHPSESSID=1vmqk1rfjgau42pvki54kl8je5; path=/
>> > > Expires: Thu, 19 Nov 1981 08:52:00 GMT
>> > > Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
>> > > Pragma: no-cache
>> > > location: http://www.holidayandmore.de/index.asp?Agentur=50376&AgentID=20316
>> > > Transfer-Encoding: chunked
>> > > Content-Type: text/html
>> >
>> > Since you are getting cookies in the headers, perhaps you need to
>> > provide a COOKIEJAR and COOKIEFILE for the redirects to work.
>> >
>> > --
>> > Some people have a "gift" link here.
>> > Know what I want?
>> > I want you to buy a CD from some indie artist.
>> > http://cdbaby.com/from/lynch
>> > Yeah, I get a buck. So?
>> >
>> > _______________________________________________
>> > http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-php
>> >
>> ------------------------------
>>
>> Message: 2
>> Date: Thu, 3 Jan 2008 19:24:19 +0000
>> From: Ralph Seward <rj_seward_at_hotmail.com>
>> Subject: PDF links
>> To: <curl-and-php_at_cool.haxx.se>
>> Message-ID: <BAY123-W349DEDE5DA833B25B236839E530_at_phx.gbl>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Dear Folks,
>>
>> I am currently developing a web bot using php/curl and I have a
>> question to throw out. Many times I will come across a link to a pdf
>> file that appears just like a link to a web page. For example,
>> http://www.somesite/healthcenter/ImmunizationForm.pdf. Click on this
>> link, and in Firefox a popup-like window will appear asking "What
>> should Firefox do with this file?" with the options of Open or Save
>> to Disk.
>> Now, is it possible to follow such a link through curl and have the
>> pdf file saved to disk? Has anyone ever succeeded in doing anything
>> with a pdf through curl?
>> Thanks in advance.
>> Ralph J Seward
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Thu, 3 Jan 2008 16:03:53 -0600 (CST)
>> From: "Richard Lynch" <ceo_at_l-i-e.com>
>> Subject: Re: PDF links
>> To: "curl with PHP" <curl-and-php_at_cool.haxx.se>
>> Message-ID: <37113.98.193.37.55.1199397833.squirrel_at_www.l-i-e.com>
>> Content-Type: text/plain;charset=iso-8859-1
>>
>> On Thu, January 3, 2008 1:24 pm, Ralph Seward wrote:
>> > I am currently developing a web bot using php/curl and I have a
>> > question to throw out. Many times I will come across a link to a pdf
>> > file that appears just like a link to a web page. For example,
>> > http://www.somesite/healthcenter/ImmunizationForm.pdf. Click on this
>> > link, and in Firefox a popup-like window will appear asking "What
>> > should Firefox do with this file?" with the options of Open or Save to
>> > Disk.
>> > Now, is it possible to follow such a link through curl and have the
>> > pdf file saved to disk? Has anyone ever succeeded in doing anything
>> > with a pdf through curl?
>>
>> You can get it just as you would with an HTML document.
>>
>> There's nothing particularly fancy involved.
>>
>> If you want to actually analyze what's IN the PDF, then things get a
>> bit more complicated, as the PDF format itself has a bewildering array
>> of ways in which it can obfuscate content...
>>
>> But there are projects/products "out there" for tearing apart a PDF
>> into its parts and analyzing them to varying degrees.
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Thu, 03 Jan 2008 15:09:11 -0800
>> From: "Colleen R. Dick" <platypus_at_proaxis.com>
>> Subject: Re: PDF links
>> To: curl with PHP <curl-and-php_at_cool.haxx.se>
>> Message-ID: <477D6B17.7070908_at_proaxis.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> I didn't see him indicate that he cares about analyzing what's in the
>> PDF; he just wants the bot to be able to download it to the host that
>> it's running on.
>>
>> I could say the same about csv files. Of course I'm having trouble
>> getting the app to spit them out in the first place from curl. It calls
>> a javascript and I have to simulate everything it does.
>>
>>
>>
>> Ralph, have you tried? What code have you tried and what is the result?
>>
>> Ralph Seward wrote:
>> > Dear Folks,
>> >
>> > I am currently developing a web bot using php/curl and I have a
>> > question to throw out. Many times I will come across a link to a pdf
>> > file that appears just like a link to a web page. For example,
>> > http://www.somesite/healthcenter/ImmunizationForm.pdf. Click on this
>> > link, and in Firefox a popup-like window will appear asking "What
>> > should Firefox do with this file?" with the options of Open or Save to
>> > Disk.
>> > Now, is it possible to follow such a link through curl and have the
>> > pdf file saved to disk? Has anyone ever succeeded in doing anything
>> > with a pdf through curl?
>> > Thanks in advance.
>> > Ralph J Seward
>> >
>> ------------------------------
>>
>> End of curl-and-php Digest, Vol 29, Issue 3
>> *******************************************
>

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/from/lynch
Yeah, I get a buck. So?
_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-php
Received on 2008-01-17