Re: Finding library regex

From: Soren Spies
Date: Fri, 18 Oct 2002 17:59:05 -0700

On Friday, Oct 18, 2002, at 17:48 US/Pacific, $B2&(B $BgK(B wrote:

> Since curl does not extract links from an HTML file, I want to use a
> regex to parse it. Does anyone know where I can find one, or a better
> way to extract links? Thanks in advance.

Here is a shell function I used to extract URLs from the default Apache
file listing (the one served when there is no index.html or equivalent).
You can probably adapt it a bit to extract any link.

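listurls() {
     # make sure the base URL ends in a slash so HREFs can be appended to it
     [[ "$baseurl" != */ ]] && baseurl="${baseurl}/"
     #echo baseurl: $baseurl >&2
     # fetch the listing; for lines containing "[ ]", print baseurl + HREF value
     $WGET $WGETCATOPTS "$baseurl"|sed -n '
             /\[ \]/s!.*HREF="\([^"]*\).*$!'"$baseurl"'\1!p
     '
}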

$WGET is usually curl, but on older OS X (where the function was first
written) it was GNU wget.
for curl
     WGETCATOPTS="-nv -O -"
for wget.
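
As a rough illustration of adapting the idea to extract any link, here is
a minimal sketch. It is not from the original post: the name listlinks,
its argument, and the way relative links are resolved are all
assumptions. It reads HTML on stdin, puts each tag on its own line, and
prints every double-quoted href value, prefixing relative ones with the
base URL. Single-quoted or unquoted attributes, ../ segments, and <base>
tags are not handled.

```shell
# Hypothetical helper (not part of the original post).
# usage: some-fetch-command URL | listlinks URL
listlinks() {
    base="$1"
    # make sure the base URL ends in a slash
    case "$base" in */) ;; *) base="$base/" ;; esac
    # scheme://host part of the base, used for links that start with "/"
    origin=$(printf '%s\n' "$base" |
        sed 's!^\([A-Za-z][A-Za-z0-9+.-]*://[^/]*\).*!\1!')
    # join all lines, then split after every ">" so each tag sits on one line
    tr '\n' ' ' | tr '>' '\n' |
    sed -n 's!.*[Hh][Rr][Ee][Ff]="\([^"]*\)".*!\1!p' |
    while IFS= read -r link || [ -n "$link" ]; do
        case "$link" in
            *://*|mailto:*|'#'*) printf '%s\n' "$link" ;;  # absolute, or not a path
            /*) printf '%s%s\n' "$origin" "$link" ;;       # relative to the host
            *)  printf '%s%s\n' "$base" "$link" ;;         # relative to the base
        esac
    done
}
```

It could then be fed by either tool, for example:

     curl -s "$baseurl" | listlinks "$baseurl"
     wget -nv -O - "$baseurl" | listlinks "$baseurl"

curl writes the page to stdout by default, so -s (silent) is one
plausible value of WGETCATOPTS for it; that value is an assumption here,
not something taken from the post.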

Soren Spies
Apple Computer, Inc.
