curl-library

Re: Finding library regex

From: Soren Spies <sspies_at_apple.com>
Date: Fri, 18 Oct 2002 17:59:05 -0700

On Friday, Oct 18, 2002, at 17:48 US/Pacific, the original poster wrote:

> Since curl does not extract links from an HTML file, I want to use a
> regex to parse it. Does anyone know where I can find one, or a better
> way to extract links? Thanks in advance.

Here is a shell function I used to extract URLs from the default Apache
file listing (no index.html or equivalent). You can probably adapt it
a bit to extract any link.

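# Print the URLs found in a default Apache file listing.
# Needs bash or ksh because of [[ ]].
listurls() {
     baseurl="$1"
     # make sure the base URL ends with a slash
     [[ "$baseurl" != */ ]] && baseurl="${baseurl}/"
     #echo baseurl: $baseurl >&2
     # rule 1: lines marked [ ], prepend the base URL to the HREF
     # rule 2: [DIR] lines with an absolute HREF, print only the base URL
     # rule 3: [DIR] lines with a relative HREF, prepend the base URL
     $WGET $WGETCATOPTS "$baseurl"|sed -n '
             /\[ \]/s!.*HREF="\([^"]*\).*$!'"$baseurl"'\1!p
             /\[DIR\]/s!.*HREF="/\([^"]*\).*$!'"$baseurl"'!p
             /\[DIR\]/s!.*HREF="\([^"]*\).*$!'"$baseurl"'\1!p'
}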
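
To see what the three sed rules do without fetching anything, here is a
rough sketch that runs them on a few made-up listing lines. The names
listurls_filter and sample_listing, the example.com URL, and the sample
lines themselves are assumptions for illustration only; a real Apache
listing may use different ALT text or lowercase attributes, in which
case the patterns need adjusting.

```shell
# Hypothetical helper: the same sed rules as listurls, but reading the
# HTML from stdin instead of fetching it.
listurls_filter() {
     baseurl="$1"
     sed -n '
             /\[ \]/s!.*HREF="\([^"]*\).*$!'"$baseurl"'\1!p
             /\[DIR\]/s!.*HREF="/\([^"]*\).*$!'"$baseurl"'!p
             /\[DIR\]/s!.*HREF="\([^"]*\).*$!'"$baseurl"'\1!p'
}

# Made-up listing lines (assumed format, uppercase attributes).
sample_listing() {
     cat <<'EOF'
<IMG SRC="/icons/back.gif" ALT="[DIR]"> <A HREF="/pub/">Parent Directory</A>
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="src/">src/</A>
<IMG SRC="/icons/unknown.gif" ALT="[ ]"> <A HREF="README">README</A>
EOF
}

sample_listing | listurls_filter "http://example.com/pub/sub/"
```

With this input the sketch prints the base URL for the parent directory
line (second rule), then http://example.com/pub/sub/src/ (third rule),
then http://example.com/pub/sub/README (first rule).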

$WGET is usually curl, but on older OS X (where the function was first
written) it was GNU wget. Set
     WGETCATOPTS="-L"
for curl, or
     WGETCATOPTS="-nv -O -"
for wget.
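
For links in an arbitrary page rather than an Apache listing, one
possible adaptation is sketched below. The function name extract_links
and the sample HTML are assumptions, not part of the original function.
It only handles double-quoted href attributes on anchor tags that sit on
one line, so treat it as a starting point rather than a real HTML parser.

```shell
# Hypothetical helper: print the href value of each anchor tag on stdin.
extract_links() {
     # Put every tag on its own line, then keep lines that start with
     # "a" plus whitespace and carry a double-quoted href (any case).
     tr '<' '\n' |
     sed -n 's/^[Aa][[:space:]][^>]*[Hh][Rr][Ee][Ff]="\([^"]*\)".*/\1/p'
}

# Try it on a made-up fragment; no network access needed.
printf '%s\n' \
     '<p>See <a href="a.html">A</a> and <A HREF="http://example.com/b">B</A>.</p>' \
     '<img src="x.png">' | extract_links

# With a real page it could be fed from curl, for example:
#    curl -sL "http://example.com/" | extract_links
```

The printf example prints a.html and http://example.com/b, one per line.
Relative links come out as written, so prepend the base URL yourself if
you need absolute ones, as listurls does.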

--
Soren Spies
Apple Computer, Inc.
-------------------------------------------------------
Received on 2002-10-19