curl-library

Re: wildcard matching

From: Patrick Monnerat via curl-library <curl-library_at_cool.haxx.se>
Date: Sun, 28 Jan 2018 01:51:31 +0100

On 01/27/2018 10:09 PM, Daniel Stenberg wrote:
> On Sat, 27 Jan 2018, Patrick Monnerat via curl-library wrote:
>
>> From the curl_fnmatch.c code and unit1307.c tests, I can see that
>> currently, a negated character set pattern can match the end of string.
>>
>> Example: "a" is matched by pattern "a[^b]".
>>
>> It is not what shell globbing does: the end of data can only be
>> matched by the end of the pattern.
>>
>> Is it intentional or a bug?
>
> I'd say it is a bug.

Thanks for your reply. This confirms what I thought: IMO, the end of
string can only be matched by the end of the pattern or by '*'.
I'll fix it.

> I fixed a bug in there recently and I couldn't find any proper docs
> describing how it is supposed to work and I took an easy route and
> made a decision. I suppose it should work like the fnmatch() function
> - although I couldn't really find any good docs for that pattern either.

Why not start with:
https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html
https://en.wikipedia.org/wiki/Glob_(programming)
http://tldp.org/LDP/abs/html/globbingref.html

Although incomplete and somewhat divergent, these docs shed some light
on the subject.
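
As a cross-check of the end-of-string rule, here is a minimal sketch that
asks the system's POSIX fnmatch() what it thinks. It does not call the
code in curl_fnmatch.c; glob_matches() is a hypothetical helper name used
only for illustration. The patterns use '!' for negation because POSIX
leaves a leading '^' in a bracket expression unspecified.

```c
#include <fnmatch.h>

/* Hypothetical helper (not part of curl): returns 1 when `string`
   matches `pattern` according to the system fnmatch(), 0 otherwise. */
int glob_matches(const char *pattern, const char *string)
{
  return fnmatch(pattern, string, 0) == 0;
}

/* Behavior I would expect from a POSIX fnmatch():
 *
 *   glob_matches("a[!b]", "a")   0: the set must consume one character
 *   glob_matches("a[!b]", "ac")  1: 'c' is not 'b'
 *   glob_matches("[!b]",  "")    0: nothing left for the set to consume
 *   glob_matches("a*",    "a")   1: '*' may match the empty remainder
 *   glob_matches("",      "")    1: end of pattern matches end of string
 */
```

In other words, only '*' or the end of the pattern is allowed to match
the end of the string, which is the behavior I intend to implement.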

> This is a function that seems to be virtually unused...

Well... if we need/have it, it should work :-/

I noticed this problem because some Apple Travis jobs fail on it and I
tried to figure out why (I have no OS X/Darwin system). The failing
pattern in unit1307 is:

{ "[!ÿ]",                     "",                       MATCH },

First, it should be NOMATCH according to what has been said above.
Second, it uses a non-ASCII character in a set, and the current
implementation does not support multi-byte characters in sets. Since the
source encoding is UTF-8, the set is built with two pseudo-characters in
it: \xC3 and \xBF. If we want the single-byte ISO-8859-1 character
instead, we had better code it as \xFF.
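
To make the byte counts concrete, here is a small sketch that counts how
many bytes sit inside a bracket set. set_member_bytes() is a hypothetical
helper written for this mail, not the parser in curl_fnmatch.c, and it
deliberately ignores special cases such as a literal ']' as first member.
The UTF-8 bytes are spelled out with escapes so the result does not
depend on the encoding of the source file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not curl code): returns the number of bytes
   between '[' and the first ']' of `pattern`, skipping a leading
   negation mark ('!' or '^'). Returns 0 if `pattern` does not start
   with '[' or has no closing ']'. */
size_t set_member_bytes(const char *pattern)
{
  const char *p = pattern;
  const char *end;

  if(*p != '[')
    return 0;
  p++;
  if(*p == '!' || *p == '^')
    p++;                      /* skip the negation mark */
  end = strchr(p, ']');
  if(!end)
    return 0;                 /* unterminated set */
  return (size_t)(end - p);
}

/* "[!\xC3\xBF]" is what a UTF-8 source file yields for the test pattern:
 *   set_member_bytes("[!\xC3\xBF]") gives 2 (two pseudo-characters)
 * "[!\xFF]" is the single-byte ISO-8859-1 spelling:
 *   set_member_bytes("[!\xFF]") gives 1
 */
```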

The Apple Travis jobs fail with a pattern parsing error, not a "no match",
and the failure looks random but very frequent (some Apple jobs
succeed). This makes me suspect a signed character problem that indexes
outside the charset buffer. That said, I cannot reproduce the problem
on Fedora (as expected: an unsigned char used as an array index is not
sign extended), so it really seems to be an Apple-only problem. I'm
giving up on this.
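
For the record, this is the kind of signedness problem I have in mind,
as a minimal sketch. It is only an illustration of the suspicion, not a
confirmed diagnosis and not code taken from curl: the function names and
the 256-entry table are assumptions made up for the example.

```c
#include <limits.h>

/* Index computed from a plain char. Where char is signed, a byte such
   as 0xFF is sign extended and the index becomes negative. */
int index_from_plain_char(char c)
{
  return (int)c;
}

/* Index computed through unsigned char: always within 0..UCHAR_MAX. */
int index_from_unsigned_char(char c)
{
  return (int)(unsigned char)c;
}

/* Safe lookup in a hypothetical 256-entry membership table. */
int in_charset(const unsigned char table[256], char c)
{
  return table[(unsigned char)c] != 0;
}
```

With a signed char, index_from_plain_char((char)0xFF) is negative on the
usual two's complement platforms, so using it directly as an array index
would read before the start of the buffer. Going through unsigned char,
as in in_charset(), avoids that whatever the platform's char signedness.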

Patrick
Received on 2018-01-28