Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore ctype.h character classification for non-ASCII platforms #2494

Closed
wants to merge 4 commits into from

Conversation

smuehlst
Copy link
Contributor

With commit 4272a0b curl-specific
character classification macros and functions were introduced
in curl_ctype.[ch] to avoid dependencies on the locale. This broke
curl on non-ASCII, e.g. EBCDIC platforms. This change restores the
previous set of character classification macros when
CURL_DOES_CONVERSIONS is defined.

This was discovered on a z/OS machine with EBCDIC encoding, where Content-type headers could no longer be matched because leading blanks were no longer stripped off by curl.

Stephan Mühlstrasser added 2 commits April 13, 2018 14:48
With commit 4272a0b curl-speficic
character classification macros and functions were introduced
in curl_ctype.[ch] to avoid dependencies on the locale. This broke
curl on non-ASCII, e.g. EBCDIC platforms. This change restores the
previous set of character classification macros when
CURL_DOES_CONVERSIONS is defined.
strcpy_url() had a hard-coded dependency on ASCII to work because of
an explicit character comparison for being greater or equal than 0x80.
@smuehlst
Copy link
Contributor Author

I added a second fix for strcpy_url to the same pull request because the strcpy_url fix depends on the character classification fix.

@bagder
Copy link
Member

bagder commented Apr 16, 2018

The red travis is a checkrsrc complaint:

./http_chunks.c:83:45: warning: space before last semicolon (SPACESEMICOLON)
         || (digit >= 0x61 && digit <= 0x66) /* a-f */;
                                             ^

While the warning looks a bit wrong there, I think you could improve the readability of that code by not having the semicolon "hidden" like that after a comment.

@smuehlst
Copy link
Contributor Author

Sorry, I had thought I had run "make checksrc", but apparently I did run it for the wrong sources. Fixed.

@smuehlst
Copy link
Contributor Author

Test case 255 is failing in in one of the travis builds, while it is succeeding in all other builds:

https://travis-ci.org/curl/curl/jobs/367525539

What does that tell me? I doubt that the failure could be related to my change, given the fact that for ASCII platforms it has no effect.

@bagder
Copy link
Member

bagder commented Apr 19, 2018

It seems to be a test that started to become flaky (for other PRs as well) and shouldn't be related. I've triggered a rerun of it now.

@bagder
Copy link
Member

bagder commented Apr 23, 2018

We avoid the use of ISXDIGIT to accommodate non-ASCII hosts.

So this is because you're checking ascii input so you don't want Curl_isxdigit() because that assumes the non-ascii input in your case?

In all ascii-based systems however, Curl_isxdigit_ascii() adds nothing but is simply a tad bit slower... I propose that we then introduce a #define Curl_isxdigit_ascii(x) Curl_isxdigit(x) for ascii hosts. What do you say?

@smuehlst
Copy link
Contributor Author

We avoid the use of ISXDIGIT to accommodate non-ASCII hosts.

So this is because you're checking ascii input so you don't want Curl_isxdigit() because that assumes the non-ascii input in your case?

Correct. The above comment comes from the old code that was present in http_chunks.c before commit 4272a0b:

We avoid the use of isxdigit to accommodate non-ASCII hosts. */

I restored this code and renamed the old Curl_isxdigit() function to Curl_isxdigit_ascii() with the hope to avoid confusion.

In all ascii-based systems however, Curl_isxdigit_ascii() adds nothing but is simply a tad bit slower... I propose that we then introduce a #define Curl_isxdigit_ascii(x) Curl_isxdigit(x) for ascii hosts. What do you say?

Sounds reasonable to me. Do I understand it correctly that the current Curl_isxdigit_ascii() function shall be kept for the non-ASCII case and the define will be used for the ASCII case, like this:

#ifdef CURL_DOES_CONVERSIONS
/* Check for an ASCII hex digit.
   We avoid the use of ISXDIGIT to accommodate non-ASCII hosts. */
static bool Curl_isxdigit_ascii(char digit)
{
  return (digit >= 0x30 && digit <= 0x39) /* 0-9 */
        || (digit >= 0x41 && digit <= 0x46) /* A-F */
        || (digit >= 0x61 && digit <= 0x66); /* a-f */
}
#else
#define Curl_isxdigit_ascii(x) Curl_isxdigit(x) 
#endif

@bagder
Copy link
Member

bagder commented Apr 23, 2018

Right, that's exactly what I was thinking!

While for non-ASCII hosts a special function is needed to check for
a hex digit in ASCII encoding, on ASCII hosts the normal Curl_isxdigit
function can be used.
@smuehlst
Copy link
Contributor Author

Pull request updated accordingly.

@bagder
Copy link
Member

bagder commented Apr 24, 2018

Thanks!

@bagder bagder closed this in dd7521b Apr 24, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Jul 23, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants