curl-library

Re: Status of IDN support?

From: Tim Ruehsen <tim.ruehsen_at_gmx.de>
Date: Wed, 11 Jan 2017 12:49:53 +0100

On Wednesday, January 11, 2017 12:49:00 PM CET Tim Ruehsen wrote:
> On Wednesday, January 11, 2017 12:23:34 PM CET Tim Ruehsen wrote:
> > On Tuesday, January 10, 2017 11:40:49 PM CET Daniel Stenberg wrote:
> > > On Tue, 10 Jan 2017, Alessandro Ghedini wrote:
> > > >> TESTFAIL: These test cases failed: 165 1034 1035 2046 2047
> > > >
> > > > Note that this is with curl 7.52.1 and libidn2 0.14 from Debian
> > > > unstable.
> > >
> > > I suspect this has something to do with libidn2's limitations, but we
> > > haven't changed any IDN code in curl since 7.52.1 that I can recall and
> > > I use 0.14 too.
> >
> > Sorry for dropping in late... I made the recent changes to libidn2, which
> > are basically TR46 support.
> >
> > Now the bad news: I introduced a bug (regression) regarding NFC conversion
> > in libidn2 0.14. A fix is already in the upstream repo but not released
> > yet. This might cause the test failures you are seeing... on some systems
> > UTF-8/Unicode input might be decomposed and on others composed. Using
> > decomposed codepoints with IDN2_NFC_INPUT fails with libidn2 0.14.
> >
> > But if you enable the new TR46 feature, the input is NFCed (and
> > lowercased) automatically:
> >
> > Example:
> > $ printf "\x62\x6c\x61\xcc\x8a\x62\xc3\xa6\x72\x67\x72\xc3\xb8\x64\x2e\x6e\x6f"|idn2
> > idn2: lookup: string is not in Unicode NFC format
> >
> > $ printf "\x62\x6c\x61\xcc\x8a\x62\xc3\xa6\x72\x67\x72\xc3\xb8\x64\x2e\x6e\x6f"|idn2 -T
> >
> > You can check for TR46 availability:
> >
> > #if IDN2_VERSION_NUMBER >= 0x00140000
> >   if ((rc = idn2_lookup_u8((uint8_t *)utf8, (uint8_t **)ascii,
> >                            IDN2_TRANSITIONAL)) == IDN2_OK)
> >     ...
> > #else
> >   ...
> > #endif
> >
> > IDN2_TRANSITIONAL enables TR46 'transitional' conversion (tries to be
> > compatible with both IDNA2008 and IDNA2003 as much as possible).
> > IDN2_NONTRANSITIONAL enables TR46 'non-transitional' conversion (IDNA2008,
> > the way every app should go... though it may cause some incompatibilities
> > with IDNA2003, which is still in heavy use).
>
> I just looked at lib/url.c... how do you NFC the input to
> idn2_lookup_ul()? I couldn't find any conversion with a quick grep.
>
> My suggestion would be:
>
> diff --git a/lib/url.c b/lib/url.c
> index 7944d7b0c..81cd490e0 100644
> --- a/lib/url.c
> +++ b/lib/url.c
> @@ -4010,7 +4010,12 @@ static void fix_hostname(struct connectdata *conn, struct hostname *host)
> #ifdef USE_LIBIDN2
> if(idn2_check_version(IDN2_VERSION)) {
> char *ace_hostname = NULL;
> - int rc = idn2_lookup_ul((const char *)host->name, &ace_hostname, 0);
> +#ifdef IDN2_TRANSITIONAL
> + int flags = IDN2_TRANSITIONAL;
> +#else
> + int flags = IDN2_INPUT_NFC;
> +#endif
> + int rc = idn2_lookup_ul((const char *)host->name, &ace_hostname, flags);
> if(rc == IDN2_OK) {
> host->encalloc = (char *)ace_hostname;
> /* change the name pointer to point to the encoded hostname */
>
> That way the tests have a better chance of behaving consistently across
> different systems.

Sorry, in the diff above I meant IDN2_NFC_INPUT, not IDN2_INPUT_NFC.
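
To illustrate the composed vs. decomposed issue from the quoted mail at the
byte level, here is a rough sketch in C. It is not taken from curl or
libidn2: the helper name has_combining_mark() and the two array names are
made up for illustration, and the check only recognises the Combining
Diacritical Marks block (U+0300..U+036F). It is a crude hint that input
may be decomposed, not a real NFC check. The decomposed bytes are the ones
from the printf example; the composed variant assumes U+00E5 (C3 A5) in
place of 'a' followed by U+030A.

```c
#include <stddef.h>

/* Hostname from the printf example: 'a' followed by U+030A (CC 8A). */
static const unsigned char decomposed_name[] =
  "\x62\x6c\x61\xcc\x8a\x62\xc3\xa6\x72\x67\x72\xc3\xb8\x64\x2e\x6e\x6f";

/* Same hostname with the precomposed U+00E5 (C3 A5) instead. */
static const unsigned char composed_name[] =
  "\x62\x6c\xc3\xa5\x62\xc3\xa6\x72\x67\x72\xc3\xb8\x64\x2e\x6e\x6f";

/*
 * Hypothetical helper: returns 1 if the NUL-terminated UTF-8 string
 * contains a code point in U+0300..U+036F, else 0.
 * Assumes the input is valid UTF-8.
 */
static int has_combining_mark(const unsigned char *s)
{
  size_t i;
  for(i = 0; s[i]; i++) {
    /* U+0300..U+033F are encoded as CC 80 .. CC BF */
    if(s[i] == 0xCC && s[i + 1] >= 0x80 && s[i + 1] <= 0xBF)
      return 1;
    /* U+0340..U+036F are encoded as CD 80 .. CD AF */
    if(s[i] == 0xCD && s[i + 1] >= 0x80 && s[i + 1] <= 0xAF)
      return 1;
  }
  return 0;
}
```

Calling has_combining_mark(decomposed_name) reports a combining mark, while
has_combining_mark(composed_name) does not, which matches the case where
the non-TR46 lookup complains about the first form.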

Tim


Received on 2017-01-11