curl-users

Re: How do I get Curl to execute a URL that is in the Web page source code ?

From: Dan Fandrich <dan_at_coneharvesters.com>
Date: Thu, 21 Feb 2019 09:08:16 +0100

On Thu, Feb 21, 2019 at 06:01:27PM +1100, Mike Lambert wrote:
> I have downloaded a web page source code …. Now I want to execute a URL within
> the source code. This URL will produce a PDF file

Ok, do that then! But seriously, once curl downloads a page, it's up to your
app to do with it what it will. If you need to extract a URL, fine, just use an
HTML (or whatever) parser and do that. Since you're asking on this list I
assume you're trying to do this from the command-line and not a programming
language. In that case, using a tool like "tidy" to convert HTML to XML then
"xmlstarlet" to extract an element containing the URL out of the page might
suffice. If the source is JSON, there are similar tools to manipulate that. If
the source is JavaScript, it can be more difficult, but you can use a headless
browser to execute the page and then extract data from the DOM. See also these
FAQs:

https://curl.haxx.se/docs/faq.html#Does_curl_support_ASP_XML_XHTM
https://curl.haxx.se/docs/faq.html#Does_curl_support_Javascript_or
https://curl.haxx.se/docs/faq.html#Redirects_work_in_browser_but_no
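
If a scripting language turns out to be an option after all, the "parse the
page and pull out the URL" step might look roughly like the sketch below. It
uses only Python's standard library. The sample page, the page URL, and the
names LinkCollector and find_pdf_links are made up for illustration, and it
assumes the PDF link is a plain <a href="..."> whose path ends in ".pdf", which
may not match the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical stand-ins for the page that curl already downloaded.
SAMPLE_PAGE = """<html><body>
<a href="/about.html">About</a>
<a href="reports/summary.pdf?id=7">Download</a>
</body></html>"""
PAGE_URL = "https://example.com/docs/index.html"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links are resolved against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def find_pdf_links(html_text, base_url):
    """Return absolute URLs of links whose path ends in .pdf."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return [
        url for url in parser.links
        if urlsplit(url).path.lower().endswith(".pdf")
    ]


if __name__ == "__main__":
    for url in find_pdf_links(SAMPLE_PAGE, PAGE_URL):
        print(url)
```

Each URL it prints can then be handed back to curl to fetch the PDF, for
example with the -o option to choose the output file name.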

>>> Dan
Received on 2019-02-21