cURL / Mailing Lists / curl-library / Single Mail

curl-library

Re: [PATCH] a new option WILDCARDMATCH (FTP only for now)

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Wed, 31 Mar 2010 16:40:55 +0200 (CEST)

On Tue, 30 Mar 2010, Pavel Raiskup wrote:

> I'd like to offer you first patch to testing and considering wildcard
> enhancement of FTP implementation.

Thanks. Here's my initial, but quite substantial, set of feedback. They're big
and small and in a random order. (I basically read the patch from top to
bottom a few times and added stuff as I found or thought about them.)

1 - the CURLOPT_* options that set callbacks should preferably end in FUNCTION
as all existing callback options do that. I also don't like the word "UNIT" in
the two callbacks sent before and after each directory entry. What unit?

2 - each callback function option should have a corresponding *DATA option to
set the custom pointer passed on to the callback. All existing callbacks have
this design.

3 - I miss the man page updates that explain how to use all this new
functionality

4 - I miss a set of new test cases for the curl test suite to test all this
new functionality. I'm also curious about what set of directory listings
you've tested this against and it would be nice to have at least some of the
most common ones in the test cases. From my reading of the code, it seems to
assume a rather fixed format which you will see reality doesn't quite match.

5 - I think Curl_fileinfo_copy() would better be called *_clone() or *_dup()
as it actually makes a duplicate of the struct plus name.

6 - I don't like like typedefed structs. 'struct Curl_fileinfo' is better
called exactly that in the code to make it obvious it is a struct instead of
typedefing it to get used just as Curl_fileinfo.

7 - typo: symlink is spelled symlink ;-)

8 - CURLFILETYPE_UNKNOWNTYPE really? What's an app or anyone supposed to do
with a file that isn't a file, a dir, a symlink or a device? Isn't that rather
an error case that should be skipped?

9 - the code base is full of memory leaks, especially in error cases and once
you've added a few test cases I can only recommend to try the torture tests as
done with runtests.pl -t as they are great at revealing all those mistakes you
thought were hidden where the sun doesn't shine!

10 - return codes from callbacks. We generally allow the callback to return at
least errors in case it wants to bail out from everything. If nothing else,
you are better off to document them to return 0 for ok and reserve everything
else for other cases for the future.

11 - don't use isspace(), use ISSPACE() which is our custom version which is
less error-prone (and yes you did exactly the error in the code that this
macro helps us avoid).

12 - the code is sprinkled with lines that do several statements on the same
physical line, like:

   case 'd': parser->file_data.filetype = CURLFILETYPE_DIRECTORY; break

please write them in curl style like:

   case 'd':
     parser->file_data.filetype = CURLFILETYPE_DIRECTORY;
     break;

I believe I saw several if() lines written in single lines as well.

13 - please write if() exactly like that, with no space after the f

14 - when a directory linsting line is parsed, how come it doesn't store
timestamp, permission, user + group etc and provides that in the callback as
well? I'm sure applications who want info about files would like that info
too.

15 - functions in wildcard.c use a curl_ prefix when they should use Curl_ as
they are private within the library.

16 - error handling from the callback. Even when you detect out of memory
within the callback, the callback still doesn't return any error (it returns
the bufflen no matter what) and I don't think that's a good idea. If the error
is serious enough you better return error properly. Only "simple" problems
with parsing the data should be considered soft enough to allow the data to
keep getting downloaded.

17 - I would much more prefer a style of code that instead of:

+ parser->error = Curl_llist_insert_next(parser->filelist,
+ parser->filelist->tail, finfo);
+ if(parser->error)
+ parser->error = CURLE_OK;
+ else
+ parser->error = CURLE_OUT_OF_MEMORY;

is written like:

    int rc = Curl_llist_insert_next(parser->filelist,
                                    parser->filelist->tail, finfo);
    parser->error = rc ? CURLE_OK : CURLE_OUT_OF_MEMORY;

18 - the boundary checks for the buffer allocated as parser->buffer seems a
bit spread out and not consistent. First, a lot of code checks:

   if(parser->bufferin > FTP_BUFFER_ALLOCSIZE - 1)

while a the filename part checks for it:

   if(parser->bufferin > parser->bufferlength - 1)

... and afaiu, the latter could be used all over to become more generic. The
current boundary check is not covering a few places, like when you parse in
the PL_SIZE_SIZE or PL_TIME_PREYEAR_OR_TIME states, leading to possible buffer
overflows.

19 - fnmatch() is not portable enough for unconditional use. Lots of systems
that build and run libcurl have no native fnmatch(). You either need to
provide an alternative "native" version for those systems or we need to have
the entire matching code conditional on the existance of this function.

20 - function pointers cannot be assigned NULL properly, that will cause
compiler warnings. Use ZERO_NULL for that for now, or typecast to the function
pointer type.

21 - one or more tests for this should be done using the multi interface, and
you might find that to be a rather big hurdle, as I don't see any
considerations for that in the current patch. We do make an effort to allow
the app select what interface to use so all functionality should be possible
to use with both.

22 - I would prefer to see as much as possible of the code handling wildcards
in a separate file. You added two new files, wildcard.c and fileinfo.c but yet
the largest portion of new code was added to lib/ftp.c and while that works I
find it unfortunate since ftp.c already is one of the larger source files in
our tree.

23 - The wildcard.fnmatch function pointer should be within the struct
UserDefined instead of the wildcard struct to make the Curl_dupset() function
work properly (internally used to make curl_easy_duphandle work). You could
perhaps write a test case for wildcard matching that uses
curl_easy_duphandle() to verify things.

-- 
  / daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
Received on 2010-03-31