curl-users

Re: Scrape text from the screen

From: Georg Horn <horn_at_koblenz-net.de>
Date: Tue, 19 Jun 2018 12:51:19 +0200

Hello,

> Date: Mon, 18 Jun 2018 15:24:51 +0200
> From: Daniel Lublin <daniel_at_lublin.se>
> To: the curl tool <curl-users_at_cool.haxx.se>
> Subject: Re: Scrape text from the screen
>
> > I have a web page where the text is displayed from a server directly onto
> > the screen, so the data is not found in the web page source code.
> >
> > How can I use curl to scrape the text from the screen buffer? The
> > displayed data can span multiple screens.
>
> It sounds like you want to extract words, letters and digits from an image.
> This is not something that curl does. Simply put, curl downloads
> documents, like texts or images, from a location (URL).

I'd rather believe that the web page uses JavaScript/Ajax to load its
content, as many modern web pages do. You can use browser add-ons like
Live HTTP Headers or the built-in developer tools to record all the
HTTP requests that the browser makes while loading the page, and then
try to mimic that behaviour with curl. Unfortunately, these requests
often contain dynamically generated parts, so your test script has to
emulate the behaviour of the JavaScript that runs in the browser...
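
To illustrate the "dynamically generated parts" problem, here is a rough
Python sketch of the two-step pattern: load the page, pull a token out of
the HTML, then replay the Ajax request with that token. Everything
site-specific here is an assumption for illustration only: the endpoint
URL, the csrf_token field name and the X-CSRF-Token header are made up,
and the real values have to be read from the recorded requests.

```python
import re
import urllib.request

# Hypothetical endpoint and token pattern; take the real ones from the
# requests recorded in the browser's developer tools.
DATA_URL = "https://example.com/api/data"
TOKEN_RE = re.compile(r'name="csrf_token"\s+value="([^"]+)"')


def extract_token(html):
    """Return the dynamic token embedded in the page, or None if absent."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


def build_ajax_request(url, token, referer):
    """Build a request with headers a browser Ajax call commonly sends."""
    headers = {
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json",
        "Referer": referer,
    }
    if token:
        # Assumed header name; sites differ in how they expect the token.
        headers["X-CSRF-Token"] = token
    return urllib.request.Request(url, headers=headers)


def fetch_dynamic_content(page_url, data_url=DATA_URL):
    """Load the page, then replay the Ajax request with the same cookies."""
    # One opener with a cookie jar, so the session cookie set by the
    # first response is sent along with the second request.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
    with opener.open(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    request = build_ajax_request(data_url, extract_token(html), page_url)
    with opener.open(request) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With curl itself the same two steps would be two invocations that share a
cookie jar (-c to write it, -b to send it), with the extra headers passed
via -H and the token extracted by whatever text tool is at hand.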

I have worked on website testing and monitoring for many years now, and
curl was and is a very handy tool for the job, but these days I use the
Selenium framework too. With Selenium you can remote-control a real
browser, so the page is loaded just as if a real user had opened it.
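
A minimal sketch of the Selenium approach, using its Python bindings. It
assumes the selenium package plus Firefox and its driver are installed;
the function name, the CSS selector and the timeout are only
placeholders, not taken from any real site.

```python
def fetch_rendered_text(url, css_selector="body", timeout=10):
    """Open url in a real browser and return the text of one element
    after the page's JavaScript has filled it in."""
    # Imported here so the module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run without a visible window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Wait until the element exists in the DOM; content loaded via
        # Ajax may appear some time after the initial page load.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        driver.quit()  # always close the browser, even on errors
```

Because the browser runs the page's own scripts, there is no need to
reverse-engineer the dynamically generated request parts; the price is
that it is much slower and heavier than a plain curl call.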

Regards,
Georg
Received on 2018-06-19