
curl-users

Re: curl issue...

From: bruce <badouglas_at_gmail.com>
Date: Mon, 4 Jul 2016 12:55:36 -0400

Hey Ray...

Thanks for the reply.. I vaguely recall what you're describing, but can't
seem to find any reference in my back emails/project notes..

However, your comments were enough to cause me to back up..

I tested further and, as you suggested, it appears the backend server/arch
is indeed doing "something", and the consistent results I was getting did
indeed come down to "luck"!!

I.e., something may have cached the cookies, and I was lucky enough to use
a user-agent for which the associated cookies were returned without the
backend having to generate them...

Testing curl/wget against 'http://www.bkstr.com' with a different/obscure
(but valid) user-agent hung..

Now, this pointed me to a simple test with casperjs, using the same
obscure user-agent.. and as expected, the page/content was returned.. So,
the next step was to test whether the cookies returned by casperjs could
then be fed to the usual curl/wget...

Stripping out the "garbage", focusing only on the bkstr.com domain
cookies, and doing a bit of reformatting.. and yeah.. it appears that this
will work..

So, the basic steps, for anyone who might run into a similar situation
with a target where curl/wget doesn't work:

I) generate a casperjs/headless node solution to get the initial set of
cookies
II) parse the returned cookies to get the cookies in a format for curl/wget
III) craft the curl to use the cookie-jar/file
IV) run as needed..
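
The steps above can be sketched roughly as follows. Note this is a sketch, not a tested recipe: `cookies.js` is a hypothetical casperjs script (not shown) that dumps its cookies, and the JSON-to-Netscape conversion is reduced here to a helper that emits one cookie line in the tab-separated Netscape format that curl/wget read (domain, include-subdomains flag, path, secure flag, expiry, name, value):

```shell
#!/bin/sh
# Step I: fetch the page headlessly and capture cookies (cookies.js is a
# hypothetical casperjs script that writes its cookie data to stdout).
# casperjs cookies.js 'http://www.bkstr.com' > raw_cookies.txt

# Step II: reformat each cookie into one Netscape-format line:
# domain <TAB> include-subdomains <TAB> path <TAB> secure <TAB> expiry <TAB> name <TAB> value
netscape_line() {
    # $1=domain  $2=cookie name  $3=cookie value
    printf '%s\tFALSE\t/\tFALSE\t0\t%s\t%s\n' "$1" "$2" "$3"
}

# Example: write a single session cookie into the jar
netscape_line 'www.bkstr.com' 'JSESSIONID' 'abc123' > cookies.txt

# Steps III/IV: point curl at the jar (-b reads it, -c keeps it updated)
# and run as needed:
# curl -s -b cookies.txt -c cookies.txt -L 'http://www.bkstr.com'
```

The curl/casperjs invocations are commented out since they depend on the target being up; the conversion helper is the portable part.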

Now, this only works where the pages/urls after the initial one
essentially use the same base cookies. If there's dynamic cookie creation
going on, on a per-page basis.. the only solution for that (that I know
of) is the full casperjs/headless solution.

Hope this helps someone in the future!

Thanks/Peace!

On Mon, Jul 4, 2016 at 1:37 AM, Ray Satiro via curl-users <
curl-users_at_cool.haxx.se> wrote:

> On 7/3/2016 9:12 AM, bruce wrote:
>
>> weirdness abounds..
>>
>> wget -- this works.. consistently..
>>
>> echo '' > a.lwp
>> wget -vvv --user-agent "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT
>> 6.1; WOW64; Trident/5.0; chromeframe/12.0.742.112)" --cookies=on
>> --load-cookies=a.lwp --keep-session-cookies --save-cookies=a.lwp -O
>> - "http://www.bkstr.com"
>>
>> wget -vvv --user-agent "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT
>> 6.1; WOW64; Trident/5.0; chromeframe/12.0.742.112)" --cookies=on
>> --load-cookies=a.lwp --keep-session-cookies --save-cookies=a.lwp -O
>> - "
>> http://www.bkstr.com/webapp/wcs/stores/servlet/LocateCourseMaterialsServlet?requestType=INITIAL&storeId=432905&demoKey=d
>> "
>>
>>
>> curl -- this hangs; it appears to be a cookie thing with the 1st/2nd curls
>> echo '' > a.lwp
>> curl -vvv -A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
>> Firefox/38.0" --cookie-jar 'a.lwp' -L "http://www.bkstr.com"
>>
>> curl -vvv -A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
>> Firefox/38.0" --cookie a.lwp --cookie-jar a.lwp -L "
>> https://www.bkstr.com/webapp/wcs/stores/servlet/LocateCourseMaterialsServlet?requestType=INITIAL&storeId=432905&demoKey=d"
>>
>>
>> I've tested using single/double quotes around the curl file. I've tested
>> using --cookie as well as --cookie-jar in both the 1st and 2nd curls. It
>> appears to be a consistent issue with the server/curl generating or
>> returning the cookies.
>>
>> When I just test the 1st, I still don't get consistent cookies back.
>>
>> Again, running the wget seems to consistently work with regards to the
>> cookie issue.
>>
>> thoughts/comments..
>>
>
>
> Please don't top post [1]; it makes the conversation harder to follow.
>
> If you are using Windows command prompt single quotes are part of the
> argument, so when you write 'a.lwp' Windows actually writes to the filename
> where ' is the first and last character. Next you attempt to read in a.lwp
> without quotes and curl can't find that since you actually have 'a.lwp'.
>
> You are dealing with caching servers, as I mentioned to you last month
> regarding this issue [2][3].
>
> Vary: Accept-Encoding,User-Agent
>
> That is basically the server telling you different content MAY be served
> depending on what is sent for accept-encoding and user-agent. It is helpful
> to caches which can use that information to determine how to cache the
> page. In your case the website is heavy on javascript and IIRC it was using
> at least F5 in some setup where if there's a no-hit in the cache it will
> send javascript to be executed immediately, which then makes follow up
> requests causing the actual page to be returned, with cookies. That page
> should then be cached with that agent/encoding combination and future
> requests will return that page as long as it hasn't expired.
>
> wget appears to work here because you are using a different user-agent
> for which the caching server already has a cached version of bkstr, so
> the extra javascript isn't sent; instead you get the cached version and
> the right cookies. The user agent may continue to work until it doesn't. You're
> really relying on someone in their browser using that user agent on that
> website on that page often enough that it stays in the cache.
>
> As I mentioned last month, at the very least you need a JSESSIONID cookie to
> avoid the hang, and I think the fastest way is to just supply a blank one.
> The server will either ignore it or return a valid one.
>
> Initially create one like this:
> printf "www.bkstr.com\tFALSE\t/\tFALSE\t0\tJSESSIONID\t" > a.lwp
>
> Then on future requests you should be able to just do this:
> curl -v -b a.lwp -c a.lwp -A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0)
> Gecko/20100101 Firefox/38.0" -L "
> https://www.bkstr.com/webapp/wcs/stores/servlet/LocateCourseMaterialsServlet?requestType=INITIAL&storeId=432905&demoKey=d
> "
>
> If it's more complicated than that then try something like casperjs, which
> you may have done already [4].
>
>
> [1]: https://curl.haxx.se/mail/etiquette.html#Do_Not_Top_Post
> [2]: https://curl.haxx.se/mail/archive-2016-05/0011.html
> [3]: https://curl.haxx.se/mail/archive-2016-05/0027.html
> [4]: https://curl.haxx.se/mail/archive-2016-05/0026.html
>
>
>

-------------------------------------------------------------------
List admin: https://cool.haxx.se/list/listinfo/curl-users
FAQ: https://curl.haxx.se/docs/faq.html
Etiquette: https://curl.haxx.se/mail/etiquette.html
Received on 2016-07-04