New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hsts: add support for Strict-Transport-Security #5896
Conversation
ecd7b61
to
b4fe60e
Compare
I just found the email talking about this feature on the mailing list and also had a quick look into the wiki. About this:
Would be incredible if supported from the get-go, we probably could have function just like |
ok, let me write up a proposal for how it could work and run it by you before I start any code on that... |
@JCMais something like this: https://github.com/curl/curl/wiki/HSTS#possible-future ? |
This is great @bagder, thanks for taking the time to look into it. What would be a scenario that |
Should the write callback get signaled somehow when the last entry is passed to it? |
I see, makes sense.
Yes, I think this would be great. Example: to close the stream that is being used to write the data to. If possible, the callback could receive the index of the item currently being processed, and the total amount of items, or any other data structure that would convey the same information. |
I added an index struct to the write callback, and I noticed I had forgot to include the expiry date in the hstsentry struct. |
A first shot on read/write callbacks is now provided in this PR as well. The wiki description of the funtions is updated to match the code (and there are initial manpages in the branch). This still lacks test cases for the callbacks. I'll appreciate all and any feedback I can get on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some minor comments
83f0bed
to
af653fc
Compare
DeepCode's analysis on #1c65de found:
Top issues
👉 View analysis in DeepCode’s Dashboard | Configure the bot |
According to the draft implementation in curl, an external library will not be needed. Ref: curl/curl#5896
I took a quick look at this proposal. It appears that it is based on the idea that the HSTS cache is loaded and saved sequentially from a flat file at library initiation and teardown. I'm not sure this scales - especially if HSTS finally gains widespread traction. Questions:
I'm not opposed to a simple flatfile default, but you should consider the alternatives. (A database doesn't have to be massive - SQLite, BerkeleyDB, or even a purpose-built backend might be used.) In any case, I suggest thinking about scalablitly and the (implict) use models. I think that at a minimum, the API should support a database back end, and that the cache load should happen when a cache miss happens, not in library init. This also includes providing a hostname in an eviction callback, not just "write whatever is left in the cache" at library teardown. Sorry if I'm missing something - I saw this pop up, but am too time-limited to dive deeply into the code. Nonetheless, I hope these comments are useful. [1] Yes, that's a very large number to make the point. Consider a library user that's a search engine. Or even 10M or 1M. |
Thanks for good feedback and excellent questions. I think this also shows that it is a good idea to ship this feature marked as experimental to start with, to let users try things out as they are now and we can polish and improve things according to what users want and experience.
If you use that feature, yes. A library user can deal with it however it likes. I'm positive most applications using libcurl will not populate the cache with millions of entries even if/when HSTS is used very widely, since very few applications work with that amount of random sites in a browser-like fashion. If we want to build the system to actually have a huge hsts database, then yes we'd need something different. The current design is made with a small dataset in mind.
Nothing.
Perhaps. But I rather see an actual use for this option before we go ahead and add more bells and whistles.
There's nothing in neither the file format nor in the API for the callbacks that says anything about how the search is to be done. If someone experiences slow HSTS handling I don't think we will object to performance enhancements. I don't think we need to optimize for such a future hypothetical case now but take it as it comes.
Then you're probably asking for an API that lets the application maintain the database and does the entire lookup (because I can't see a future where libcurl bundles some sort of database). That's not provided here but I wouldn't be against it if someone actually provides a viable use case where this is necessary/beneficial. We could of course perhaps consider such an "asking API" instead of the current load/save API to better allow applications to scale to large amounts of hsts entries.
What is "the entire cache" for you? I would imagine a typical command line use case have a very small cache of just a bunch of hosts so reading "the entire cache" is very fast and small operation. Separate caches for separate invokes. Having two different curl command line tools using the same file on disk will just make the second overwrite the first with the existing functionality.
Standard coding practice: first we deliver something basic that works. If there's a need, we improve it over time. If not, basic is fine.
That's a fair idea. I don't think the current API dictates when the cache gets read so we can work on tweaking that. |
081b5ae
to
63c0061
Compare
I'll merge this shortly, Buckle up! |
- enable in the build (configure) - header parsing - host name lookup - unit tests for the above - CI build - CURL_VERSION_HSTS bit - curl_version_info support - curl -V output - curl-config --features - CURLOPT_HSTS_CTRL - man page for CURLOPT_HSTS_CTRL - curl --hsts (sets CURLOPT_HSTS_CTRL and works with --libcurl) - man page for --hsts - save cache to disk - load cache from disk - CURLOPT_HSTS - man page for CURLOPT_HSTS - added docs/HSTS.md - fixed --version docs - adjusted curl_easy_duphandle
- read/write callback options - man pages for the 4 new setopts - test 1915 verifies the callbacks
- read/write callback options - man pages for the 4 new setopts - test 1915 verifies the callbacks Closes #5896
CURLOPT_HSTS_CTRL
)--hsts
--hsts
docsCURLOPT_HSTS_CTRL
docsCURLOPT_HSTS
)CURLOPT_HSTS
docsdocs/HSTS.md
--version
docs updatedcurl_easy_duphandle
accordinglyInitial thoughts on API and usage: https://github.com/curl/curl/wiki/HSTS
Targeted to get merged for 7.74.0