curl-library

Re: Save as text, lynx -dump

From: Dan Fandrich <dan_at_coneharvesters.com>
Date: Wed, 7 May 2003 21:20:59 -0700

On Thu, May 08, 2003 at 12:39:00PM +1000, James Wettenhall wrote:
> Is there a way to imitate lynx -dump using libcurl?
> i.e. I want to save a webpage as text using an API,
> rather than using a system call to "lynx".

Downloading the HTML (what libcurl does) is only a small part of converting
a web page to text. The hard part is parsing the HTML and rendering a page
that looks half decent. If you just want the raw text and don't care how it
looks (for indexing or something), then it's pretty easy to write a parser
that just throws out everything between < and >. Otherwise, you'll end up
rewriting most of lynx. I can't think of many situations where that would
be a win.
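
A minimal sketch of the "throw out everything between < and >" approach, in C. The function name strip_tags and its signature are hypothetical and not part of libcurl; the input is assumed to be a NUL-terminated buffer of HTML that has already been downloaded (for example, one collected by a libcurl write callback). It does not decode entities such as &amp;, and it keeps the contents of <script> and <style> elements, so treat it as a starting point for indexing-style use rather than a rendering of the page.

```c
#include <stddef.h>

/* Copy html into out, dropping every character from a '<' up to and
 * including the next '>'. Hypothetical helper, not a libcurl function.
 * Writes at most outsz bytes including the terminating NUL and returns
 * the number of text bytes written (excluding the NUL). */
size_t strip_tags(const char *html, char *out, size_t outsz)
{
    size_t n = 0;
    int in_tag = 0;

    if (outsz == 0)
        return 0;

    for (; *html != '\0' && n + 1 < outsz; html++) {
        if (*html == '<')
            in_tag = 1;            /* start skipping markup */
        else if (*html == '>' && in_tag)
            in_tag = 0;            /* end of tag, resume copying */
        else if (!in_tag)
            out[n++] = *html;      /* plain text, keep it */
    }
    out[n] = '\0';
    return n;
}
```

Calling strip_tags on the downloaded buffer with a destination at least as large as the input is enough to hold the result, since the output is never longer than the input.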

>>> Dan

-- 
http://www.MoveAnnouncer.com              The web change of address service
          Let webmasters know that your web site has moved
Received on 2003-05-08