curl-users

Re: [PATCH] --data-urlencode

From: Alessandro Vesely <vesely_at_tana.it>
Date: Wed, 21 Nov 2007 16:33:58 +0100

Daniel Stenberg wrote:
> On Wed, 21 Nov 2007, Alessandro Vesely wrote:
>
>>> HTML pages with dynamic form input names are very rare so the need
>>> for dynamic encoding of names should be much less frequent.
>>
>> Even if they are static, they may be complicated, e.g. the
>> Java-generated stuff that Bart exemplified a couple of days ago.
>
> So are you then advocating that we should encode the name part as well?

No, I think that would be counter-intuitive for scripting.

>>> I don't see why post data is any more special here than all the other
>>> kinds of operations you do with curl?
>>
>> So that one can learn how to encode the static name in the script, even
>> if it's not curl's job.
>
> You can still do that as you said with -vG or with --trace/--trace-ascii
> so I don't think we need to add terrible kludges for this.

Yeah, it's not really a need...
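
For anyone who does want to pre-encode a static name in a script, here is
a minimal sketch of percent-encoding in C. The function name percent_encode
is hypothetical, and the set of characters left untouched (letters, digits
and "-._~") is an assumption on my part; it is not necessarily what
curl_easy_escape() does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of curl: percent-encode every byte except
   letters, digits and "-._~" (an assumed "safe" set).
   Returns a malloc()ed string the caller must free(), or NULL if out of
   memory. */
static char *percent_encode(const char *in)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t len;
  char *out;
  char *o;

  assert(in != NULL);

  len = strlen(in);
  out = malloc(len * 3 + 1); /* worst case: every byte becomes %XX */
  if(!out)
    return NULL;

  o = out;
  for(; *in; in++) {
    unsigned char c = (unsigned char)*in;
    if(isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~')
      *o++ = (char)c;
    else {
      *o++ = '%';
      *o++ = hex[c >> 4];
      *o++ = hex[c & 15];
    }
  }
  *o = '\0';
  return out;
}
```

With this sketch, a name such as "form:field[0]" comes out as
"form%3Afield%5B0%5D".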

>> BTW, the syntax is slightly over-restrictive: --data-urlencode "stuff"
>> gives "option --data-urlencode: is badly used here", which looks
>> rather unnecessary.
>
> That's why I've tried to discuss the syntax of the option... :-)

Yup, another point of discussion is about newlines and conversions...

>> CGI mandates a name=value syntax, but HTTP accepts any input.
>
> Yes, but HTTP doesn't require URL encoding of the data either...

I agree.

>> I would patch it as attached, to urlencode any argument.
>
> I think that syntax has a downside we should rather work around. Here's
> my thinking:
>
> Let's say you write a shell script that wants to pass just about
> anything to a web site, URL encoded without doing it "name=contents"
> style. The script gets the data from a user, or some other external
> source and it can be just about anything... Then the data can very well
> contain a '@' or '=' letters, and --data-urlencode "$data" will then act
> very "funny".

Yes, that's true. I've attached a new patch.

> I would rather then make it accept "=" or "@" as the first letter in the
> argument and then it would not use a name or '=' in the output buffer
> but simply encode the whole data chunk or the whole file name.

(I assume you meant "file content".) Easily fixed.
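
To make the intended rule concrete, here is a rough sketch of the argument
splitting, written as a stand-alone function rather than as the inline code
in main.c. The struct and function names are hypothetical, and it uses 0
instead of -1 to flag "no separator", so take it as an illustration of the
idea and not as the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type, not part of curl: the pieces of a --data-urlencode
   argument after splitting. */
struct urlenc_arg {
  int name_len;        /* length of the name part, -1 if no separator */
  char separator;      /* '=' or '@', or 0 if no separator was found */
  const char *content; /* literal content, or a file name after '@' */
};

/* Split at the first '=', or else at the first '@'. A leading separator
   yields an empty name, so only the encoded data would be sent. */
static struct urlenc_arg split_urlencode_arg(const char *arg)
{
  struct urlenc_arg out;
  const char *p;

  assert(arg != NULL);

  p = strchr(arg, '=');
  if(!p)
    p = strchr(arg, '@');

  if(p) {
    out.name_len = (int)(p - arg);
    out.separator = *p;
    out.content = p + 1; /* pass the separator */
  }
  else {
    out.name_len = -1;
    out.separator = 0;
    out.content = arg;
  }
  return out;
}
```

So a script that passes arbitrary user data can write
--data-urlencode "=$data": the leading '=' is found first, the name is
empty, and any later '@' or '=' in the data is just content.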

> Or what do you say? Would you mind adjusting the patch to something like
> this? Oh, and please poke on the man page too at the same time!

I've made an attempt. I found no other way than indenting to convey the
idea that mixing --data-* options joins them with '&'. Thus I messed up
the whole --data section. (I didn't run roffit on it, so please check
links, if you like the changes.)
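
The joining behaviour that the man page text tries to describe can be
sketched like this. The function name append_post_piece is hypothetical;
curl does the equivalent inline in main.c, so this is only meant to show
the rule, not the actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of curl: append one --data-* piece to the
   post body built so far, separated by '&'.
   body may be NULL or empty for the first piece.
   Returns a malloc()ed string the caller must free(), or NULL if out of
   memory. */
static char *append_post_piece(const char *body, const char *piece)
{
  size_t blen;
  size_t plen;
  char *n;

  assert(piece != NULL);

  blen = body ? strlen(body) : 0;
  plen = strlen(piece);

  n = malloc(blen + plen + 2); /* room for '&' and the trailing zero */
  if(!n)
    return NULL;

  if(blen) {
    memcpy(n, body, blen);
    n[blen] = '&';
    memcpy(n + blen + 1, piece, plen + 1);
  }
  else
    memcpy(n, piece, plen + 1);

  return n;
}
```

Appending "name=daniel" and then "skill=lousy" this way gives
"name=daniel&skill=lousy", whichever of the --data-* options each piece
came from.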

>> This is a possibly related, yet different problem. Usually, one does
>> not mind bombarding a server with sending attempts (which may be good
>> testing anyway.) However, if one needs to be careful and wants to test
>> a command before issuing it, and does not have a test server at hand,
>> there are not many ways to proceed. The latter problem seems even more
>> marginal than the former. Perhaps using "file://-" for output could
>> solve both of them?
>
> I guess we'd need some kind of --dry-run option that would output as
> much as possible about the request-to-be-done but without actually doing
> it...

Yes, that may solve both problems.

> I think it is a good idea, but I think we'll need to bang on the library
> a bit too then since a lot of what is sent in a typical request is not
> quite known to curl but is simply left for libcurl to setup and do.

Some parts of it. Of course, that will have to be done while thinking about
how a generic client program may benefit from it. Sounds good for testing.

I'll have to check the "file://" behavior. It is barely (un)documented
for the CURLOPT_URL API parameter. Apparently, curl can read from a file
url, but doesn't write to it. And "file://-" doesn't read stdin.

Index: src/main.c
===================================================================
RCS file: /cvsroot/curl/curl/src/main.c,v
retrieving revision 1.431
diff -u -r1.431 main.c
--- src/main.c 20 Nov 2007 10:08:43 -0000 1.431
+++ src/main.c 21 Nov 2007 14:29:17 -0000
@@ -2059,19 +2059,21 @@
            */
           char *p = strchr(nextarg, '=');
           long size = 0;
-          size_t nlen;
+          int nlen;
+          char is_file;
           if(!p)
             p = strchr(nextarg, '@');
-          if(!p) {
-            warnf(config, "bad use of --data-urlencode\n");
-            return PARAM_BAD_USE;
+          if (p) {
+            nlen = p - nextarg; /* length of the name part */
+            is_file = *p++; /* pass the separator */
           }
-          nlen = p - nextarg; /* length of the name part */
-          if('@' == *p) {
+          else {
+            nlen = is_file = -1;
+            p = nextarg;
+          }
+          if('@' == is_file) {
             /* a '@' letter, it means that a file name or - (stdin) follows */
 
-            p++; /* pass the separator */
-
             if(curlx_strequal("-", p)) {
               file = stdin;
               SET_BINMODE(stdin);
@@ -2090,7 +2092,7 @@
               fclose(file);
           }
           else {
-            GetStr(&postdata, ++p);
+            GetStr(&postdata, p);
             size = strlen(postdata);
           }
 
@@ -2108,8 +2110,10 @@
               char *n = malloc(outlen);
               if(!n)
                 return PARAM_NO_MEM;
-
-              snprintf(n, outlen, "%.*s=%s", nlen, nextarg, enc);
+              if (nlen > 0) /* only append '=' if we have a name */
+                snprintf(n, outlen, "%.*s=%s", nlen, nextarg, enc);
+              else
+                strcpy(n, enc);
               curl_free(enc);
               free(postdata);
               if(n) {
Index: docs/curl.1
===================================================================
RCS file: /cvsroot/curl/curl/docs/curl.1,v
retrieving revision 1.230
diff -u -r1.230 curl.1
--- docs/curl.1 20 Nov 2007 10:08:43 -0000 1.230
+++ docs/curl.1 21 Nov 2007 14:30:18 -0000
@@ -224,56 +224,58 @@
 If this option is used several times, the following occurrences make no
 difference.
 .IP "-d/--data <data>"
-(HTTP) Sends the specified data in a POST request to the HTTP server, in a way
-that can emulate as if a user has filled in a HTML form and pressed the submit
-button. Note that the data is sent exactly as specified with no extra
-processing (with all newlines cut off). The data is expected to be
-\&"url-encoded". This will cause curl to pass the data to the server using the
-content-type application/x-www-form-urlencoded. Compare to \fI-F/--form\fP. If
-this option is used more than once on the same command line, the data pieces
+(HTTP) Sends the specified data in a POST request to the HTTP server, in the
+same way that a browser does when a user has filled in an HTML form and
+presses the submit button. This will cause curl to pass the data to the server
+using the content-type application/x-www-form-urlencoded.
+Compare to \fI-F/--form\fP.
+
+\fI-d/--data\fP is the same as \fI--data-ascii\fP. To post data purely binary,
+you should instead use the \fI--data-binary\fP option. To URL encode the value
+of a form field you may use \fI--data-urlencode\fP.
+
+If any of
+these options is used more than once on the same command line, the data pieces
 specified will be merged together with a separating &-letter. Thus, using '-d
 name=daniel -d skill=lousy' would generate a post chunk that looks like
 \&'name=daniel&skill=lousy'.
+.RS
+.IP "--data-ascii <data>"
+(HTTP) This is the default \fI-d/--data\fP option.
+All newlines are cut off. If curl was compiled with \fIiconv\fP (check that
+with \fI-V/--version\fP) the data is converted to ascii. To be CGI compliant,
+the data should be URL encoded and in valid name=value format.
 
 If you start the data with the letter @, the rest should be a file name to
 read the data from, or - if you want curl to read the data from stdin. The
 contents of the file must already be url-encoded. Multiple files can also be
 specified. Posting data from a file named 'foobar' would thus be done with
-\fI--data\fP @foobar".
-
-To post data purely binary, you should instead use the \fI--data-binary\fP
-option.
-
-\fI-d/--data\fP is the same as \fI--data-ascii\fP.
-
-If this option is used several times, the ones following the first will
-append data.
-.IP "--data-ascii <data>"
-(HTTP) This is an alias for the \fI-d/--data\fP option.
-
-If this option is used several times, the ones following the first will
-append data.
+\fI--data @foobar\fP.
 .IP "--data-binary <data>"
-(HTTP) This posts data in a similar manner as \fI--data-ascii\fP does,
-although when using this option the entire context of the posted data is kept
-as-is. If you want to post a binary file without the strip-newlines feature of
-the \fI--data-ascii\fP option, this is for you.
+(HTTP) This posts data exactly as specified with no extra processing
+whatsoever.
 
-If this option is used several times, the ones following the first will
-append data.
+If you start the data with the letter @, the rest should be a filename and
+data is posted in a similar manner as \fI--data-ascii\fP does, except that
+newlines are preserved and conversions are never done.
 .IP "--data-urlencode <data>"
 (HTTP) This posts data, similar to the other --data options with the exception
 that this will do partial URL encoding. (Added in 7.17.2)
 
-The <data> part should be using one of the two following syntaxes:
+To be CGI compliant, the <data> part should begin with a \fIname\fP followed
+by a separator and a content specification, according to one of the two
+following syntaxes:
 .RS
 .IP "name=content"
-This will make curl URL encode the content part and pass that on. Note that
-the name part is not encoded.
+This will make curl URL encode the content part and pass that on.
 .IP "name@filename"
-This will make curl load data from the given file, URL encode that data and
-pass it on in the POST like \fIname=urlencoded-data\fP. Note that the name
-part is not encoded.
+This will make curl load data from the given file (including any newlines),
+URL encode that data and pass it on in the POST. If the name part is given,
+curl appends an equal sign, resulting in \fIname=urlencoded-file-content\fP.
+.RE
+.IP
+In both cases, the name part is expected to be URL encoded already. If no
+name is given or no separator is found, only the URL encoded data is sent.
 .RE
 .IP "--digest"
 (HTTP) Enables HTTP Digest authentication. This is a authentication that
Received on 2007-11-21