curl-library

Re: [Semi-OT] Any perl tools to parse the client side HTML?

From: Ben Greear <greearb_at_candelatech.com>
Date: Mon, 26 Nov 2001 15:14:31 -0700

Nick Chirca wrote:

>>I'm writing some perl scripts to do some automated web server
>>testing. Basically, I need to read a URL, parse out the HTML
>>returned to me (specifically, a form with input fields), modify
>>some input fields according to my test script's values, and then
>>POST the resulting request back to the server. In other words,
>>I want my script to appear as a real user/browser...
>>
>
> You'll have to modify the HTTP headers, set the user agent name, and
> manage cookies, redirects, and so on. I tried to achieve that with an
> older version of libwww, also known as LWP. You can find out more about
> it on the CPAN site. I gave up researching LWP after I discovered and
> installed curl/libcurl. With Daniel's help I was able to do what I was
> looking for (log in to a website, manage cookies and redirects, fill in
> forms, and so on).
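
For reference, the browser-like behaviour described above (custom user agent string, cookies, redirects, extra headers) can also be configured on an LWP::UserAgent object. This is only a minimal sketch: the agent string and the Accept-Language value are made-up examples, not values from this thread.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a user agent that behaves a little more like a browser.
my $ua = LWP::UserAgent->new(
    agent        => 'Mozilla/5.0 (compatible; FormTester/0.1)',  # assumed example string
    cookie_jar   => HTTP::Cookies->new,   # in-memory cookie storage
    max_redirect => 5,                    # follow at most five redirects
);

# By default only GET and HEAD are redirected; allow POST as well.
push @{ $ua->requests_redirectable }, 'POST';

# Header sent with every request made through this agent.
$ua->default_header('Accept-Language' => 'en');

print "Agent: ", $ua->agent, "\n";
```

Every request sent through `$ua` then carries the agent string and default header, and cookies set by the server are returned on later requests.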

Hrm, I'm testing a very controlled system, so I may not need
to get this elaborate...but we'll see! I haven't tried
posting quite yet, as I have to parse first...
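
One possible way to do the parse-then-post step is the HTML::Form module from CPAN, which turns a form into an object whose fields can be changed and then converted into a request. A rough sketch, assuming HTML::Form is installed; the sample form, the field names, and the test.example host are invented for illustration.

```perl
use strict;
use warnings;
use HTML::Form;

# Stand-in for the HTML that a real server would return.
my $html = <<'HTML';
<form action="/login" method="post">
  <input type="text" name="user" value="">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" name="go" value="Log in">
</form>
HTML

# The second argument is the base URI used to resolve the form action.
my ($form) = HTML::Form->parse($html, 'http://test.example/');

# List each input field with its type, name and current value.
for my $input ($form->inputs) {
    printf "%s %s=%s\n", $input->type, $input->name // '', $input->value // '';
}

# Change a field, then build the HTTP::Request a browser would submit.
$form->value(user => 'testuser');
my $req = $form->click;
print $req->method, ' ', $req->uri, "\n";
print $req->content, "\n";
```

The resulting `$req` is an ordinary HTTP::Request, so it could be handed to `$ua->request($req)` to perform the actual POST.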

>
>
>>I currently have the perl HTTP package working enough to download
>>the URL, and it can post as well,
>>
>
> Can you show or send me some code or script examples? I am still a
> beginner in this Internet agent/crawler stuff and could use any example
> or help.
>
>

This is using the HTTP::* stuff I found on CPAN.

use LWP::UserAgent;
use HTTP::Request;

# $my_url is assumed to be set earlier in the script.
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => $my_url);
my $response = $ua->request($req);
if ($response->is_success) {
  print "RESPONSE -:" . $response->content . ":-\n";

  # This parse stuff is experimental, and I think I'll just write
  # it myself...
  my @inputs = parseHtmlForm($response->content);
  print join("\n", @inputs);
} else {
  print "RESPONSE (error) -:" . $response->error_as_HTML . ":-\n";
}
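
The parseHtmlForm() routine is not shown above. A hand-written version along these lines might look like the following sketch. It is a hypothetical stand-in, not the actual routine: it uses plain regular expressions, returns one "name=value" string per named input, and is only suitable for simple, well-formed HTML from a controlled server.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parseHtmlForm(): returns one "name=value"
# string for every <input> tag that has a name attribute.
sub parseHtmlForm {
    my ($html) = @_;
    my @inputs;

    # Visit every <input ...> tag, case-insensitively.
    while ($html =~ /<input\b([^>]*)>/gi) {
        my $attrs = $1;
        my %attr;

        # Collect attr="value", attr='value' and bare attr=value pairs.
        while ($attrs =~ /([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g) {
            my $value = defined $2 ? $2 : defined $3 ? $3 : $4;
            $attr{lc $1} = $value;
        }

        # Unnamed inputs (e.g. a plain submit button) are skipped.
        next unless defined $attr{name};
        my $value = defined $attr{value} ? $attr{value} : '';
        push @inputs, "$attr{name}=$value";
    }
    return @inputs;
}

my $sample = '<form action="/login" method="post">'
           . '<input type="text" name="user" value="ben">'
           . '<input type="hidden" name="token" value="abc123">'
           . '<input type="submit" value="Go"></form>';
print join("\n", parseHtmlForm($sample)), "\n";
```

For the sample form this prints `user=ben` and `token=abc123`, one per line. Select boxes, textareas, and malformed markup are not handled; a real HTML parser is the safer choice for anything less controlled.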

-- 
Ben Greear <greearb_at_candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear
Received on 2001-11-26