Re: [SECURITY NOTICE] libidn with bad UTF8 input

From: dev <dev_at_cor0.com>
Date: Thu, 2 Jul 2015 10:15:12 -0400 (EDT)

> On June 29, 2015 at 5:09 PM Daniel Stenberg <daniel_at_haxx.se> wrote:
>
>
> Hi all libcurl users.
>
> Here's a little problem many of us need to be aware of!
<snip>
> RECOMMENDATION
>
> Rebuild libcurl with libidn support disabled.
<snip>

A better way to proceed would be to call a routine that checks, cleans
up, repairs, or traps bad UTF-8 data before it reaches libidn.

> REFERENCES
>
> [1] =
> https://blog.thijsalkema.de/me/blog//blog/2015/04/17/validate-the-encoding-before-passing-strings-to-libcurl-or-glibc/

This was discussed on the libidn list with Simon Josefsson :
    see https://tools.ietf.org/html/rfc3629#section-10

That was back in November last year and the general feeling within the
libidn world is that it is the responsibility of the application to
detect and prep utf-8 and not the responsibility of libidn.

As seen at
https://blog.thijsalkema.de/me/blog//blog/2015/04/17/validate-the-encoding-before-passing-strings-to-libcurl-or-glibc/
:

    So who should check it?

    The libidn developers show little motivation to fix this, pointing
    the blame to applications instead:

      Applications should not pass unvalidated strings to stringprep(),
      it must be checked to be valid UTF-8 first. If stringprep()
      receives non-UTF8 inputs, I believe there are other similar
      serious things that can happen.
      Simon Josefsson

    http://lists.gnu.org/archive/html/help-libidn/2014-11/msg00002.html

Serious things? Yes, you can bet on it. My response to this was that
we should be checking for bad utf-8, and possibly even doing repair to
avoid security leakage :

    http://lists.gnu.org/archive/html/help-libidn/2014-11/msg00003.html

Wherein I said :

    I wrote a UTF-8 check routine on a recent project and it did require
    a fair amount of thought and was not a perfect check algorithm by
    any means. Things such as bytes 0xC080h ( as mentioned in section 10
    of RFC 3629 ) would be reasonable to check but a complete and
    stringent check for compliance could be a fair chunk of work.
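
To illustrate the overlong case from section 10 of RFC 3629, here is a
small sketch in C. Both function names are hypothetical and are not taken
from libidn, libcurl, or the code discussed in this thread. A decoder that
only strips the marker bits turns the byte pair C0 80 into U+0000, which
lets a NUL slip past any earlier check that looked for a literal zero
byte. A stricter decoder rejects the lead bytes C0 and C1 outright, since
they can only begin overlong forms.

```c
/* Hypothetical naive two-byte decoder: strips the marker bits and
 * performs no range check at all. */
unsigned long naive_decode2(unsigned char b0, unsigned char b1)
{
    return ((unsigned long)(b0 & 0x1F) << 6) | (unsigned long)(b1 & 0x3F);
}

/* Hypothetical stricter two-byte decoder.
 * Returns the code point, or -1 if the pair is not well-formed. */
long strict_decode2(unsigned char b0, unsigned char b1)
{
    /* Valid two-byte lead bytes are 0xC2..0xDF; 0xC0 and 0xC1 could
     * only encode values below 0x80, i.e. overlong forms. */
    if (b0 < 0xC2 || b0 > 0xDF)
        return -1;
    /* The second byte must be a continuation byte, 10xxxxxx. */
    if ((b1 & 0xC0) != 0x80)
        return -1;
    return ((long)(b0 & 0x1F) << 6) | (long)(b1 & 0x3F);
}
```

With these definitions, naive_decode2(0xC0, 0x80) yields 0, while
strict_decode2(0xC0, 0x80) yields -1.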

Oddly enough I have done some more work and am still not at a "stringent
check" and perhaps I should get back onto that.

Now then, the key thought on my mind at the moment is "finger pointing"
where we are all pointing elsewhere and saying "it's your job not mine"
along with "why didn't you do this?" as opposed to just sitting down and
reading all of RFC 3629 and then with coffee in hand code out a check
routine. Yep, that sounds like lots of fun but it is getting to be a
valid "necessity" as opposed to a fun "want". Also, it just feels so
wrong to strip away functions from libcurl to protect ourselves from a
problem that can be solved.
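
As a rough idea of what such a check routine could look like, here is a
minimal sketch in C based on the syntax in RFC 3629. It is not the code
mentioned in this thread and not part of libcurl or libidn; the function
name utf8_is_valid is an assumption made for illustration. It only
reports whether a buffer is well-formed and makes no attempt at repair.

```c
#include <stddef.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8 per RFC 3629, else 0.
 * Hypothetical helper, shown only as a sketch. */
int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */
        size_t k;

        if (b < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte, or 0xF8..0xFF */
        }

        /* Truncated sequence: never read past the end of the buffer. */
        if (len - i - 1 < need)
            return 0;

        for (k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(c & 0x3F);
        }

        if (cp < min)
            return 0;                   /* overlong form, e.g. C0 80 */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;                   /* beyond the Unicode range */

        i += need + 1;
    }
    return 1;
}
```

An application would call something like this on a host name before
handing it to libcurl or stringprep(), and refuse the input when the
result is 0.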

I have code now, working nicely for at least a year, which catches just
about all four byte utf-8 bit sequence issues and neatly repairs the
damage. I feel that I should get this code bit out in the open and let
good people such as you hack at it and maybe we can provide a final and
reasonable solution to the dreaded nasty UTF-8 bad bits issue. Mostly I
don't like finger pointing and would rather just put fingers to keys and
write a solution that works. Mine does, though not perfectly and not in a
really strict fashion, but it is better than being nowhere on this. Also
I took the approach of "repair" as opposed to just signaling an error. I
was wrong to do that. Bad utf-8 is bad. However it works for my code
world and traps and fixes bad data being fired into a database backend.

You can see some of what I mean by looking at :

    http://lists.gnu.org/archive/html/help-libidn/2014-11/msg00003.html

Let me know your thoughts.

Dennis Clarke
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2015-07-02