curl-and-python

Re: Noob Question.

From: Sandip Shah <sandipshah_at_vthrive.com>
Date: Wed, 30 Jan 2013 09:25:59 -0800

Hi,

Thanks for the response - but I do not need to set headers to send, I want
to receive the headers only.

This is what I ended up using (clue came from --libcurl option mentioned in
one of the previous threads by Daniel):

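import pycurl

c = pycurl.Curl()
c.setopt(c.URL, 'http://www.yahoo.com')
c.setopt(c.HEADER, True)          # include response headers in the output
c.setopt(c.NOBODY, True)          # HEAD-style request, no body
c.setopt(c.FOLLOWLOCATION, True)  # follow redirects
c.perform()
c.close()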

and it gives me just the headers that I want.
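
With FOLLOWLOCATION set, the output can contain one header block per redirect hop, so picking out the final response usually takes a little parsing. The following is a minimal sketch, not part of the original post. It assumes the raw output was collected as bytes into a buffer (for example through a WRITEFUNCTION callback), and the helper name parse_final_headers is hypothetical.

```python
def parse_final_headers(raw):
    """Return (status_line, headers) for the last response found in `raw`.

    `raw` is assumed to be the bytes produced by a transfer with HEADER,
    NOBODY and FOLLOWLOCATION set, so it may hold several header blocks,
    each terminated by a blank line.
    """
    text = raw.decode("iso-8859-1")
    # Normalise line endings, then split into one block per response.
    blocks = [b for b in text.replace("\r\n", "\n").split("\n\n") if b.strip()]
    if not blocks:
        return None, {}
    lines = blocks[-1].strip().split("\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive; a repeated name keeps
            # only its last value in this simple sketch.
            headers[name.strip().lower()] = value.strip()
    return status_line, headers
```

Called on the collected bytes, it returns the status line of the last hop and a dict of that hop's headers with lower-cased names.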

For efficiency, I am looking into curl_multi option. I initialize the
curl_multi, add handles, read the returned results, delete the curl_multi
object and restart with the next batch ... but after a few batches the
process hangs.

I am looking into how to read the status from info_read() method of the
curl_multi object to find out what is going wrong.

info_read() gives me data in separate sets - success objects and failed
objects. But the pycurl object's getinfo method gives me only
EFFECTIVE_URL, which can differ from the original URL after a redirect.
How do I tie the results back to the original URL? One can check
REDIRECT_COUNT, but that still does not give me the original URL to tie
the response back to.

What am I missing?
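
One common way to keep the link, sketched here under assumptions, is to record the original URL on each handle when it is created (for example c.original_url = url right after pycurl.Curl()) and read it back when info_read() reports the handle. The attribute name original_url and the helper drain_finished are hypothetical, and this assumes the handle accepts extra attributes; a dict keyed by handle would serve the same purpose. The function only relies on info_read(), getinfo() and remove_handle(), so it takes the multi object as a parameter.

```python
def drain_finished(multi, results):
    """Move finished transfers out of `multi` into `results`, keyed by original URL.

    Assumes every handle had `handle.original_url = url` set when it was
    created; that attribute is our own bookkeeping, not a pycurl option.
    Returns the list of handles that finished.
    """
    finished = []
    while True:
        # info_read() returns (messages still queued, ok handles,
        # [(handle, error number, error message), ...]).
        queued, ok_handles, failed = multi.info_read()
        for handle in ok_handles:
            results[handle.original_url] = {
                "ok": True,
                "effective_url": handle.getinfo(handle.EFFECTIVE_URL),
                "status": handle.getinfo(handle.RESPONSE_CODE),
            }
            finished.append(handle)
        for handle, errno, errmsg in failed:
            results[handle.original_url] = {
                "ok": False,
                "errno": errno,
                "error": errmsg,
            }
            finished.append(handle)
        if queued == 0:
            break
    # Detach only after the messages have been read.
    for handle in finished:
        multi.remove_handle(handle)
    return finished
```

After a batch, results maps each original URL to either the effective URL and status code or the error details, regardless of how many redirects were followed.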

Also, is it better to del the curl_multi and re-initialize it, or should
one remove the handles and add new(er) handles to the same object in the
next batch?
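
For what it is worth, the second option (one multi object, handles removed and added per batch) can be sketched as below. This is an illustration under assumptions, not a diagnosis of the hang: run_batches is a hypothetical helper, `multi` is expected to behave like pycurl.CurlMulti, and each batch is a list of already configured handles. Reusing the multi object is generally considered preferable because libcurl keeps its connection cache there.

```python
def run_batches(multi, batches, timeout=1.0):
    """Run each batch of handles on the same multi object; return failures.

    The multi object is left empty after every batch, so it can be
    reused instead of being deleted and re-created.
    """
    failures = []
    for handles in batches:
        for handle in handles:
            multi.add_handle(handle)
        active = len(handles)
        while active:
            # perform() returns (status, number of still-active handles).
            _, active = multi.perform()
            if active:
                # Wait for socket activity instead of spinning.
                multi.select(timeout)
        # Read completion messages before detaching the handles.
        while True:
            queued, _ok, failed = multi.info_read()
            failures.extend(failed)
            if queued == 0:
                break
        for handle in handles:
            multi.remove_handle(handle)
    return failures
```

A loop like this waits until every transfer in the batch reports completion, so a single transfer that never finishes would stall it. Setting TIMEOUT and CONNECTTIMEOUT on each handle is one possible safeguard.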

SS

On Tue, Jan 29, 2013 at 10:46 AM, Sandip Shah <sandipshah_at_vthrive.com> wrote:

> Hi,
>
> I need to get only the headers from a URL (I am doing this for a lot of
> URLs), and it seems like PycURL is the fastest way to do it in Python.
>
> However, I do not see a "setopt_HEAD" (curl -I option) in PyCURL.
>
> What am I missing, and how can I get it?
>
> Thanks,
>
> SS
>
>

_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
Received on 2013-01-30