curl-library

Re: Project hiper - High Performance libcurl

From: Cory Nelson <phrosty_at_gmail.com>
Date: Wed, 9 Nov 2005 18:47:37 -0800

On 11/9/05, Jamie Lokier <jamie_at_shareable.org> wrote:
> Cory Nelson wrote:
> > > I think such fancy event/thread scheduler should not be part of Curl;
> > > it should be a project in its own right. (One I've heard of for unix
> > > is called libasync-smp. I'm slowly working on one myself, too).
> >
> > It is easy to do (on Windows, at least), and would help users a lot.
> > If it's not in Curl, we lose the ability of easily writing
> > cross-platform applications that work on all the major OSes.
>
> A cross-platform library to implement good event/thread scheduling is
> not a trivial undertaking. I know, because I am writing one and it's
> taken a year so far to do it right. While acknowledging that I might
> be a slow programmer, still if such a thing is written, it would be
> useful to more applications than just the ones using Curl.

Didn't mean that. I meant making the hiper interface use threading.
Put the generic functions in one file, then have a separate file for
each OS (hiper_iocp.c, hiper_kqueue.c, and so on). Maintaining separate
files with the same basic hiper API is much simpler than writing a
threading/event wrapper.
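
A minimal sketch of that file layout, assuming a hypothetical table of
function pointers that each per-OS file would fill in. None of these
names (hiper_backend, hiper_select_backend, the dummy_* functions) exist
in libcurl or hiper; the dummy backend only stands in for what a real
hiper_iocp.c or hiper_kqueue.c would provide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface shared by every per-OS file. */
struct hiper_backend {
    const char *name;
    int  (*init)(void **state);               /* set up the OS mechanism  */
    int  (*add_socket)(void *state, int fd);  /* start watching a socket  */
    int  (*wait)(void *state, int timeout_ms);/* returns sockets watched  */
    void (*cleanup)(void *state);
};

/* Stand-in backend so the sketch compiles anywhere; it only counts
   sockets and makes no OS calls. */
static int dummy_count;

static int dummy_init(void **state)
{
    dummy_count = 0;
    *state = &dummy_count;
    return 0;
}

static int dummy_add(void *state, int fd)
{
    (void)fd;
    ++*(int *)state;
    return 0;
}

static int dummy_wait(void *state, int timeout_ms)
{
    (void)timeout_ms;
    return *(int *)state;
}

static void dummy_cleanup(void *state)
{
    *(int *)state = 0;
}

static const struct hiper_backend hiper_dummy = {
    "dummy", dummy_init, dummy_add, dummy_wait, dummy_cleanup
};

/* The generic file picks one backend; a real build would choose by
   platform #ifdef instead of always returning the dummy. */
static const struct hiper_backend *hiper_select_backend(void)
{
    return &hiper_dummy;
}
```

The generic code only ever talks to the table, so each OS file is free
to use its native model internally.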

> But far more importantly, if Curl were to _require_ that the blocking
> happens inside Curl's code, then it would be difficult to use Curl in
> programs which already have their own event handling for other things.

Could you elaborate why? Is this a problem specific to *nix? I don't
see any problems for Windows.

> > Not having it in Curl would also mean we lose efficiency. By not
> > being able to delegate which type of thread (I/O or non-I/O) to begin
> > a request in, Curl would be forcing the user to never destroy any of
> > the threads that Curl has touched, for fear of canceling I/O. We
> > would no longer be able to intelligently pool threads by how heavy the
> > workload is.
>
> I don't see why you say the user couldn't cancel threads touched by
> Curl. Surely, if the scheduler is outside Curl, the application would
> know _exactly_ which threads Curl is using at all times?

IOCP operations are cancelled if the thread that initiated them is
destroyed. Sure, the user could have a set of threads and never
destroy them until Curl is done, but it is more efficient to
create/destroy threads based on how many work items are in the pool's
queue and how many threads are blocking, which is what the Win2k
thread pool does behind the scenes.
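
To make the sizing idea concrete, here is a rough sketch of a heuristic
driven by the same two inputs (queued work items and blocked threads).
The formula, the function name pool_target_threads and its parameters
are assumptions for illustration only; this is not the actual Win2k
thread pool algorithm.

```c
#include <assert.h>

/* Hypothetical heuristic: want one runnable thread per queued item, up
   to the CPU count, plus one replacement for each blocked thread. */
static int pool_target_threads(int queued_items, int blocked_threads,
                               int cpus, int max_threads)
{
    int runnable_wanted;
    int target;

    assert(cpus > 0 && max_threads > 0);

    runnable_wanted = queued_items < cpus ? queued_items : cpus;
    target = runnable_wanted + blocked_threads;

    if (target < 1)
        target = 1;            /* always keep one thread alive */
    if (target > max_threads)
        target = max_threads;  /* hard cap on pool size */
    return target;
}
```

A pool would compare this target against its current thread count and
create or retire threads accordingly.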

> > CPUs aren't getting any faster, but cores are being added on. The
> > concept of multi-threaded coding needs to get a lot more popular, and
> > this is a prime example of something that would really benefit from
> > it.
>
> Well, using multiple threads may be a prime example. But I thought
> you could already use multiple threads with Curl?

Indeed, but not efficiently.

> > As I said above, it would be trivial to add a single lock to each easy
> > handle. No fancy scheduling needed.
>
> So you're suggesting to add the right locks so that the application's
> scheduler is allowed to call Curl's event handlers from any of its
> threads?

Yes, so that any function that manipulates an easy handle can do so
safely, whichever thread it is called from.
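
A minimal sketch of a per-handle lock, assuming POSIX threads and a
hypothetical wrapper struct. libcurl's easy handle has no such lock
today; locked_handle and its functions are invented for illustration,
and bytes_done stands in for whatever per-transfer state needs
protecting.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical wrapper: one mutex per handle, no global lock. */
struct locked_handle {
    pthread_mutex_t lock;
    long bytes_done;   /* stand-in for per-transfer state */
};

static int locked_handle_init(struct locked_handle *h)
{
    h->bytes_done = 0;
    return pthread_mutex_init(&h->lock, NULL);
}

/* Any thread may call this; the handle's own lock serializes access,
   so threads working on different handles never contend. */
static long locked_handle_add(struct locked_handle *h, long n)
{
    long total;

    pthread_mutex_lock(&h->lock);
    h->bytes_done += n;
    total = h->bytes_done;
    pthread_mutex_unlock(&h->lock);
    return total;
}

static void locked_handle_destroy(struct locked_handle *h)
{
    pthread_mutex_destroy(&h->lock);
}
```

On Windows the same shape would use a CRITICAL_SECTION in place of the
pthread mutex.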

> That would certainly be an easy API to understand, and I'd have no
> complaint with it.
>
> But it wouldn't be as efficient as you think. As soon as some Curl
> code blocked on any lock, you'd need to create another thread somehow
> to ensure the CPU is still used (otherwise there is no point in this
> discussion). The result would be extra threads that aren't really
> needed.

Well, high-performance apps shouldn't ever block. And these are
per-easy-handle locks, so the likelihood of a thread needing to block
is slim.

I do see your point, though. I'm sure this will happen. The Win2k
thread pool does that management for you, but I suppose some pool code
will have to be written for other operating systems.

> > This is simple to do for Windows. There really is no reason not to.
> > I'm not familiar with kqueue etc but I doubt it would be very complex
> > for them either.
>
> The complex part isn't the part which calls kqueue (although that's
> quite a lot of work to do, and more to do efficiently, for all the
> different variations on the kqueue idea: epoll, RT signals, kqueue,
> /dev/poll, port_create, and IOCP; which is why it should be a library
> of its own if Curl were to depend on it).

It could certainly be its own library, but I think it would be easier
and faster (performance-wise) to just separate the code into
hiper_iocp.c, hiper_kqueue.c, etc., so that no OS has to cut corners
to conform to some async wrapper.

> The complex part is deciding which event handlers to run on which
> threads - and also deciding when to create more threads or destroy
> them. Most OSes don't have a "create another thread if one blocks" or
> a "create another thread if a CPU becomes available".
>
> That's interesting work, but there is no obviously perfect way to do
> it (it's still a very active research field - lots of testing, lots of
> heuristics). Which is why Curl should be able to work with different
> people's implementations of such methods.

I'm trying to come up with a public hiper API which would allow Curl
to take full advantage of all operating systems in the easiest way
possible. The API I gave would allow this and be simple to learn.

Regardless, the API outlined at the hiper website is far too
*nix-specific and does not fit IOCP at all. We all love *nix, but
ignoring the majority OS isn't an option for such a widely used
library.

> -- Jamie
>

--
Cory Nelson
http://www.int64.org
Received on 2005-11-10