Table of Contents
=================

- [Intro](#intro)
- [git](#git)
- [Portability](#Portability)
- [Windows vs Unix](#winvsunix)
- [Library](#Library)
  - [`Curl_connect`](#Curl_connect)
  - [`Curl_do`](#Curl_do)
  - [`Curl_readwrite`](#Curl_readwrite)
  - [`Curl_done`](#Curl_done)
  - [`Curl_disconnect`](#Curl_disconnect)
- [HTTP(S)](#http)
- [FTP](#ftp)
  - [Kerberos](#kerberos)
- [TELNET](#telnet)
- [FILE](#file)
- [SMB](#smb)
- [LDAP](#ldap)
- [E-mail](#email)
- [General](#general)
- [Persistent Connections](#persistent)
- [multi interface/non-blocking](#multi)
- [SSL libraries](#ssl)
- [Library Symbols](#symbols)
- [Return Codes and Informationals](#returncodes)
- [API/ABI](#abi)
- [Client](#client)
- [Memory Debugging](#memorydebug)
- [Test Suite](#test)
- [Asynchronous name resolves](#asyncdns)
- [c-ares](#cares)
- [`curl_off_t`](#curl_off_t)
- [curlx](#curlx)
- [Content Encoding](#contentencoding)
- [hostip.c explained](#hostip)
- [Track Down Memory Leaks](#memoryleak)
- [`multi_socket`](#multi_socket)
- [Structs in libcurl](#structs)

<a name="intro"></a>
curl internals
==============

This project is split in two: the library and the client. The client part
uses the library, but the library is designed to allow other applications to
use it.

The largest amount of code and complexity is in the library part.


<a name="git"></a>
git
===

All changes to the sources are committed to the git repository as soon as
they're somewhat verified to work. Changes shall be committed as independently
as possible so that individual changes can be more easily spotted and tracked
afterwards.

Tagging shall be used extensively, and by the time we release new archives we
should tag the sources with a name similar to the released version number.

<a name="Portability"></a>
Portability
===========

We write curl and libcurl to compile with C89 compilers, on machines that are
32-bit or larger. Most of libcurl assumes more or less POSIX compliance, but
that's not a requirement.

We write libcurl to build and work with lots of third party tools, and we
want it to remain functional and buildable with these and later versions
(older versions may still work, but that is not something we work hard to
maintain):

Dependencies
------------

- OpenSSL 0.9.7
- GnuTLS 1.2
- zlib 1.1.4
- libssh2 0.16
- c-ares 1.6.0
- libidn 0.4.1
- cyassl 2.0.0
- openldap 2.0
- MIT Kerberos 1.2.4
- GSKit V5R3M0
- NSS 3.14.x
- axTLS 1.2.7
- PolarSSL 1.3.0
- Heimdal ?
- nghttp2 1.0.0

Operating Systems
-----------------

On systems where configure runs, we aim at working on them all, provided they
have a suitable C compiler. On systems that don't run configure, we strive to
keep curl running fine on:

- Windows 98
- AS/400 V5R3M0
- Symbian 9.1
- Windows CE ?
- TPF ?

Build tools
-----------

When writing code (mostly for generating stuff included in release tarballs)
we use a few "build tools" and we make sure that we remain functional with
these versions:

- GNU Libtool 1.4.2
- GNU Autoconf 2.57
- GNU Automake 1.7
- GNU M4 1.4
- perl 5.004
- roffit 0.5
- groff ? (any version that supports "groff -Tps -man [in] [out]")
- ps2pdf (gs) ?

<a name="winvsunix"></a>
Windows vs Unix
===============

There are a few differences in how to program curl the Unix way compared to
the Windows way. Perhaps the four most notable details are:

1. Different function names for socket operations.

   In curl, this is solved with defines and macros, so that the source looks
   the same in all places except for the header file that defines them. The
   macros in use are sclose(), sread() and swrite().

2. Windows requires a couple of init calls for the socket stuff.

   That's taken care of by the `curl_global_init()` call, but if other
   libraries also do it, there might be reasons for applications to alter
   that behaviour.

3. The file descriptors for network communication and file operations are
   not as easily interchangeable as in Unix.

   We avoid this by not trying any funny tricks on file descriptors.

4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
   destroying binary data, although you do want that conversion if it is
   text coming through... (sigh)

   We set stdout to binary under Windows.

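As a rough illustration of point 1, a header could map the three macro names
onto each platform's native calls along these lines. This is a simplified
sketch and not the actual definitions in the curl sources; the
`demo_socket_t` typedef and the `send_all()` helper are made up for the
example.

```c
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET demo_socket_t;                 /* hypothetical typedef */
#define sclose(s)       closesocket(s)
#define sread(s, b, n)  recv((s), (char *)(b), (int)(n), 0)
#define swrite(s, b, n) send((s), (const char *)(b), (int)(n), 0)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int demo_socket_t;                    /* hypothetical typedef */
#define sclose(s)       close(s)
#define sread(s, b, n)  recv((s), (b), (n), 0)
#define swrite(s, b, n) send((s), (b), (n), 0)
#endif

/* Code written against the macros looks the same on both platforms.
   Returns 0 once all len bytes are sent, or -1 on a send error. */
static int send_all(demo_socket_t sock, const char *data, size_t len)
{
  while(len > 0) {
    long sent = (long)swrite(sock, data, len);
    if(sent <= 0)
      return -1;
    data += sent;
    len -= (size_t)sent;
  }
  return 0;
}
```

Only the block of defines differs per platform; `send_all()` never needs an
`#ifdef` of its own.
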
Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
conditionals that deal with features *should* instead be in the format
`#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
we maintain a `curl_config-win32.h` file in the lib directory that is supposed
to look exactly like a `curl_config.h` file would have looked on a Windows
machine!

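A feature-based conditional might look like the sketch below. It assumes the
configure step (or a hand-maintained config header) defines a macro named
`HAVE_GETTIMEOFDAY` when the function exists; that macro name, the
`demo_timeval` struct and `demo_now()` are illustrative assumptions, not
copied from the curl sources.

```c
#include <time.h>
#ifdef HAVE_GETTIMEOFDAY      /* assumed to come from the config header */
#include <sys/time.h>
#endif

struct demo_timeval {         /* hypothetical stand-in for struct timeval */
  long tv_sec;
  long tv_usec;
};

/* Returns the current time, using the best function this build offers. */
static struct demo_timeval demo_now(void)
{
  struct demo_timeval now;
#ifdef HAVE_GETTIMEOFDAY
  struct timeval tv;
  gettimeofday(&tv, NULL);
  now.tv_sec = (long)tv.tv_sec;
  now.tv_usec = (long)tv.tv_usec;
#else
  /* Fallback with one-second resolution for systems without the call. */
  now.tv_sec = (long)time(NULL);
  now.tv_usec = 0;
#endif
  return now;
}
```

The code asks "does this function exist?" rather than "which OS is this?",
so a new platform only needs a correct config header.
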
Generally speaking: always remember that this will be compiled on dozens of
operating systems. Don't walk on the edge.

<a name="Library"></a>
Library
=======

(See `LIBCURL-STRUCTS` for a separate document describing all major internal
structs and their purposes.)

There are plenty of entry points to the library, namely each publicly defined
function that libcurl offers to applications. All of those functions are
rather small and easy to follow. All the ones prefixed with `curl_easy` are
put in the lib/easy.c file.

`curl_global_init()` and `curl_global_cleanup()` should be called by the
application to initialize and clean up global stuff in the library. As of
today, they can handle the global SSL initialization if SSL is enabled and
they can initialize the socket layer on Windows machines. libcurl itself has
no "global" scope.

All printf()-style functions use the supplied clones in lib/mprintf.c. This
makes sure we stay absolutely platform independent.

[`curl_easy_init()`][2] allocates an internal struct and makes some
initializations. The returned handle does not reveal internals. This is the
'SessionHandle' struct which works as an "anchor" struct for all `curl_easy`
functions. All connections performed will get connect-specific data allocated
that should be used for things related to particular connections/requests.

[`curl_easy_setopt()`][1] takes three arguments: the handle, followed by the
option passed as a pair of parameter-ID and parameter-value. The list of
options is documented in the man page. This function mainly sets things in
the 'SessionHandle' struct.

`curl_easy_perform()` is just a wrapper function that makes use of the multi
API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
`curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
and then returns.

Some of the most important key functions in url.c are called from multi.c
when certain key steps are to be made in the transfer operation.

<a name="Curl_connect"></a>
Curl_connect()
--------------

Analyzes the URL, separates the different components and connects to the
remote host. This may involve using a proxy and/or using SSL. The
`Curl_resolv()` function in lib/hostip.c is used for looking up host names
(it then uses the proper underlying method, which may vary between
platforms and builds).

When `Curl_connect` is done, we are connected to the remote site. Then it
is time to tell the server to get a document/file. `Curl_do()` arranges
this.

`Curl_connect()` also makes sure there's an allocated and initiated
'connectdata' struct that is used for this particular connection only
(although there may be several requests performed on the same connection). A
bunch of things are initialized/inherited from the SessionHandle struct.

<a name="Curl_do"></a>
Curl_do()
---------

`Curl_do()` makes sure the proper protocol-specific function is called. The
functions are named after the protocols they handle.

The protocol-specific functions deal with protocol-specific negotiations and
setup. They have access to the `Curl_sendf()` function (from lib/sendf.c) to
send printf-style formatted data to the remote host, and when they're ready
to make the actual file transfer they call the `Curl_Transfer()` function (in
lib/transfer.c) to set up the transfer and then return.

If this DO function fails and the connection is being re-used, libcurl will
then close this connection, set up a new connection and re-issue the DO
request on that. This is because there is no way to be perfectly sure that
we have discovered a dead connection before the DO function and thus we
might wrongly be re-using a connection that was closed by the remote peer.

Some time during the DO function, the `Curl_setup_transfer()` function must
be called with some basic info about the upcoming transfer: what socket(s)
to read/write and the expected file transfer sizes (if known).

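The retry rule for re-used connections can be modelled in a few lines. All
names here (`demo_conn`, `demo_do()`, `demo_reconnect()`,
`demo_do_with_retry()`) are hypothetical and the "network" is reduced to a
flag; this sketches the control flow only, not libcurl's implementation.

```c
struct demo_conn {
  int reused;  /* non-zero if the connection was picked from the cache */
  int alive;   /* non-zero if the peer still has the connection open */
};

/* Stand-in for the DO step: fails if the peer already closed the link. */
static int demo_do(struct demo_conn *conn)
{
  return conn->alive ? 0 : -1;
}

/* Stand-in for closing the dead connection and opening a fresh one. */
static void demo_reconnect(struct demo_conn *conn)
{
  conn->alive = 1;
  conn->reused = 0;
}

/* A failed DO is retried once, but only on a re-used connection, because
   only then might the failure be caused by a stale connection. */
static int demo_do_with_retry(struct demo_conn *conn, int *attempts)
{
  int rc = demo_do(conn);
  *attempts = 1;
  if(rc != 0 && conn->reused) {
    demo_reconnect(conn);
    rc = demo_do(conn);
    *attempts = 2;
  }
  return rc;
}
```
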
<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------

Called during the transfer of the actual protocol payload.

During transfer, the progress functions in lib/progress.c are called at a
frequent interval (or at the user's choice, a specified callback might get
called). The speedcheck functions in lib/speedcheck.c are also used to
verify that the transfer is as fast as required.

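A speed check of this kind can be reduced to "abort if the speed stays below
a limit for too long". The sketch below is a simplified model of that idea
and not the code in lib/speedcheck.c; the struct, its field names and
`demo_speedcheck()` are assumptions made for illustration.

```c
struct demo_speedcheck {
  long low_speed_limit;  /* bytes/second below which we count as slow */
  long low_speed_time;   /* seconds the transfer may stay slow */
  long slow_since;       /* time the slow period began, or -1 if not slow */
};

/* Called regularly during a transfer with the current time in seconds and
   the current speed. Returns 0 to continue, or -1 to abort the transfer. */
static int demo_speedcheck(struct demo_speedcheck *sc, long now,
                           long current_speed)
{
  if(sc->low_speed_limit <= 0 || sc->low_speed_time <= 0)
    return 0;                       /* check disabled */
  if(current_speed >= sc->low_speed_limit) {
    sc->slow_since = -1;            /* fast enough again, reset */
    return 0;
  }
  if(sc->slow_since < 0) {
    sc->slow_since = now;           /* slow period starts now */
    return 0;
  }
  return (now - sc->slow_since >= sc->low_speed_time) ? -1 : 0;
}
```
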
<a name="Curl_done"></a>
Curl_done()
-----------

Called after a transfer is done. This function takes care of everything
that has to be done after a transfer. This function attempts to leave
matters in a state so that `Curl_do()` can be called again on the same
connection (in a persistent connection case). The connection might also soon
be closed with `Curl_disconnect()`.

<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------

When doing normal connections and transfers, no one ever tries to close any
connections so this is not normally called when `curl_easy_perform()` is
used. This function is only used when we are certain that no more transfers
are going to be made on the connection. The connection can also be closed by
force, or the function can be called to make sure that libcurl doesn't keep
too many connections alive at the same time.

This function cleans up all resources that are associated with a single
connection.

<a name="http"></a>
HTTP(S)
=======

HTTP offers a lot and is the protocol in curl that uses the most lines of
code. There is a special file (lib/formdata.c) that offers all the multipart
post functions.

The base64 functions for user+password stuff (and more) are in lib/base64.c
and all functions for parsing and sending cookies are found in lib/cookie.c.

HTTPS uses the same procedure as HTTP in almost every respect, with only two
exceptions: the connect procedure is different and the function used to read
or write from the socket is different, although the latter fact is hidden in
the source by the use of `Curl_read()` for reading and `Curl_write()` for
writing data to the remote server.

`http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
encoding.

An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
series of functions we use. They append data to one single buffer, and when
the building is done the entire request is sent off in one single write. This
is done to overcome problems with flawed firewalls and lame servers.

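The "build everything, then write once" pattern can be sketched with a small
growable buffer. This is not the `Curl_add_buffer()` implementation; the
`demo_buffer` type, the write callback and the counting writer are all
hypothetical names introduced for the example.

```c
#include <stdlib.h>
#include <string.h>

struct demo_buffer {
  char *data;    /* NUL-terminated request text built so far */
  size_t used;   /* bytes in use, excluding the terminating NUL */
  size_t size;   /* bytes allocated */
};

/* Appends text to the buffer, growing it as needed. Returns 0 or -1. */
static int demo_buffer_add(struct demo_buffer *b, const char *text)
{
  size_t len = strlen(text);
  if(b->used + len + 1 > b->size) {
    size_t newsize = (b->used + len + 1) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return -1;
    b->data = p;
    b->size = newsize;
  }
  memcpy(b->data + b->used, text, len + 1);
  b->used += len;
  return 0;
}

typedef long (*demo_write_cb)(const char *data, size_t len, void *userp);

/* Hands the whole request to the writer in one call, then frees it. */
static long demo_buffer_send(struct demo_buffer *b, demo_write_cb cb,
                             void *userp)
{
  long rc = cb(b->data, b->used, userp);
  free(b->data);
  b->data = NULL;
  b->used = b->size = 0;
  return rc;
}

/* Example writer that only records how it was called. */
struct demo_stats { int calls; size_t bytes; };

static long demo_count_write(const char *data, size_t len, void *userp)
{
  struct demo_stats *st = userp;
  (void)data;
  st->calls++;
  st->bytes += len;
  return (long)len;
}
```
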
<a name="ftp"></a>
FTP
===

The `Curl_if2ip()` function can be used for getting the IP number of a
specified network interface, and it resides in lib/if2ip.c.

`Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
was made a separate function to prevent us programmers from forgetting that
they must be CRLF terminated. They must also be sent in one single write() to
make firewalls and similar happy.

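The CRLF rule can be enforced by letting a single helper format every
command. The helper below is a hypothetical sketch, not `Curl_ftpsendf()`
itself, and for brevity it uses the standard C99 `vsnprintf()` rather than
the clones in lib/mprintf.c.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Formats an FTP command into buf and always terminates it with CRLF.
   Returns the total length, or -1 if the command does not fit. */
static int demo_ftp_format(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int len;

  if(size < 3)
    return -1;
  va_start(ap, fmt);
  len = vsnprintf(buf, size - 2, fmt, ap);  /* leave room for CRLF */
  va_end(ap);
  if(len < 0 || (size_t)len >= size - 2)
    return -1;                              /* error or truncated */
  buf[len++] = '\r';
  buf[len++] = '\n';
  buf[len] = '\0';
  return len;
}
```

The caller would then pass the returned buffer and length to one write
call, so the command and its line ending always travel together.
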
<a name="kerberos"></a>
Kerberos
--------

Kerberos support is mainly in lib/krb5.c and lib/security.c but also
`curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
`socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.

<a name="telnet"></a>
TELNET
======

Telnet is implemented in lib/telnet.c.

<a name="file"></a>
FILE
====

The file:// protocol is dealt with in lib/file.c.

<a name="smb"></a>
SMB
===

The smb:// protocol is dealt with in lib/smb.c.

<a name="ldap"></a>
LDAP
====

Everything LDAP is in lib/ldap.c and lib/openldap.c.

<a name="email"></a>
E-mail
======

The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.

<a name="general"></a>
General
=======

URL encoding and decoding, called escaping and unescaping in the source code,
is found in lib/escape.c.

While transferring data in Transfer() a few functions might get used.
`curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).

lib/getenv.c offers `curl_getenv()` which is for reading environment
variables in a neat platform independent way. That's used in the client, but
also in lib/url.c when checking the proxy environment variables. Note that
contrary to the normal Unix getenv(), this returns an allocated buffer that
must be free()ed after use.

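The ownership difference can be shown with a small stand-in. The function
below is a hypothetical sketch of "getenv that returns an allocated copy";
it is not the code in lib/getenv.c and ignores platform-specific handling.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the variable's value, or NULL if the
   variable is unset or memory runs out. The caller must free() it. */
static char *demo_getenv_dup(const char *name)
{
  const char *value = getenv(name);
  char *copy;
  size_t len;

  if(!value)
    return NULL;
  len = strlen(value) + 1;
  copy = malloc(len);
  if(copy)
    memcpy(copy, value, len);
  return copy;
}
```

Because the caller owns the returned buffer, it stays valid even if the
environment is modified later, at the cost of an explicit free().
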
lib/netrc.c holds the .netrc parser.

lib/timeval.c features replacement functions for systems that don't have
gettimeofday() and a few support functions for timeval conversions.

A function named `curl_version()` that returns the full curl version string
is found in lib/version.c.

<a name="persistent"></a>
Persistent Connections
======================

The persistent connection support in libcurl requires some considerations on
how to do things inside of the library.

- The 'SessionHandle' struct returned in the [`curl_easy_init()`][2] call
  must never hold connection-oriented data. It is meant to hold the root data
  as well as all the options etc that the library-user may choose.

- The 'SessionHandle' struct holds the "connection cache" (an array of
  pointers to 'connectdata' structs).

- This enables the 'curl handle' to be reused on subsequent transfers.

- When libcurl is told to perform a transfer, it first checks for an already
  existing connection in the cache that we can use. Otherwise it creates a
  new one and adds that to the cache. If the cache is already full when a new
  connection is added, it will first close the oldest unused one.

- When the transfer operation is complete, the connection is left
  open. Particular options may tell libcurl not to, and protocols may signal
  closure on connections and then they won't be kept open of course.

- When `curl_easy_cleanup()` is called, we close all still opened connections,
  unless of course the multi interface "owns" the connections.

The curl handle must be re-used in order for the persistent connections to
work.

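The lookup and eviction rules in the list above can be modelled with a tiny
fixed-size cache. Everything in this sketch is hypothetical (the types, the
two-slot size, matching on host name only); libcurl's real connection cache
compares far more than the host name.

```c
#include <string.h>

#define DEMO_CACHE_SIZE 2

struct demo_cached_conn {
  char host[64];
  int open;        /* slot holds an open connection */
  int in_use;      /* a transfer is currently using it */
  long last_used;  /* logical timestamp of the latest use */
};

struct demo_cache {
  struct demo_cached_conn slots[DEMO_CACHE_SIZE];
  long clock;
};

/* Returns a connection for host: an idle cached one if possible, else a
   new one in a free slot, else one replacing the oldest unused entry.
   Returns NULL if every cached connection is busy. */
static struct demo_cached_conn *demo_cache_get(struct demo_cache *cache,
                                               const char *host, int *reused)
{
  struct demo_cached_conn *victim = NULL;
  int i;

  cache->clock++;
  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    struct demo_cached_conn *c = &cache->slots[i];
    if(c->open && !c->in_use && strcmp(c->host, host) == 0) {
      c->in_use = 1;
      c->last_used = cache->clock;
      *reused = 1;
      return c;
    }
  }
  for(i = 0; i < DEMO_CACHE_SIZE; i++) {
    struct demo_cached_conn *c = &cache->slots[i];
    if(!c->open) {
      victim = c;                   /* a free slot always wins */
      break;
    }
    if(!c->in_use && (!victim || c->last_used < victim->last_used))
      victim = c;                   /* oldest unused so far */
  }
  if(!victim)
    return NULL;
  /* "Close" whatever was in the slot and "open" the new connection. */
  strncpy(victim->host, host, sizeof(victim->host) - 1);
  victim->host[sizeof(victim->host) - 1] = '\0';
  victim->open = 1;
  victim->in_use = 1;
  victim->last_used = cache->clock;
  *reused = 0;
  return victim;
}

/* Called when a transfer completes: the connection stays open but idle. */
static void demo_cache_release(struct demo_cached_conn *c)
{
  c->in_use = 0;
}
```
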
<a name="multi"></a>
multi interface/non-blocking
============================

The multi interface is a non-blocking interface to the library. To make that
interface work as well as possible, no low-level functions within libcurl
may be written to work in a blocking manner. (There are still a few spots
violating this rule.)

One of the primary reasons we introduced c-ares support was to allow the name
resolve phase to be perfectly non-blocking as well.

The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
the code to allow non-blocking operations even on multi-stage
command-response protocols. They are built around state machines that return
when they would otherwise block waiting for data. The DICT, LDAP and TELNET
protocols are crappy examples and they are subject to a rewrite in the future
to better fit the libcurl protocol family.

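A state machine that "returns when it would otherwise block" can be sketched
as follows. The states, the `demo_login` struct and the `response_ready`
flag are invented for the example; this is a toy model of a two-command
login, not the FTP state machine in libcurl.

```c
enum demo_state {
  DEMO_SEND_USER,
  DEMO_WAIT_USER,
  DEMO_SEND_PASS,
  DEMO_WAIT_PASS,
  DEMO_DONE
};

struct demo_login {
  enum demo_state state;
  int commands_sent;
};

/* Advances the login as far as possible without blocking.
   *response_ready says whether a server response can be read right now;
   it is cleared when the response is consumed.
   Returns 1 when the login is complete, or 0 when the caller should wait
   for socket activity and call again. */
static int demo_login_step(struct demo_login *l, int *response_ready)
{
  for(;;) {
    switch(l->state) {
    case DEMO_SEND_USER:
      l->commands_sent++;
      l->state = DEMO_WAIT_USER;
      break;
    case DEMO_WAIT_USER:
      if(!*response_ready)
        return 0;               /* would block: hand control back */
      *response_ready = 0;
      l->state = DEMO_SEND_PASS;
      break;
    case DEMO_SEND_PASS:
      l->commands_sent++;
      l->state = DEMO_WAIT_PASS;
      break;
    case DEMO_WAIT_PASS:
      if(!*response_ready)
        return 0;               /* would block: hand control back */
      *response_ready = 0;
      l->state = DEMO_DONE;
      break;
    case DEMO_DONE:
      return 1;
    }
  }
}
```

Because the current state lives in the struct, the function can be left and
re-entered any number of times, which is what a non-blocking caller needs.
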
<a name="ssl"></a>
SSL libraries
=============

Originally libcurl supported SSLeay for SSL/TLS transports. That support was
then extended to its successor OpenSSL, and has since also been extended to
several other SSL/TLS libraries. We expect and hope to further extend the
support in future libcurl versions.

To deal with this internally in the best way possible, we have a generic SSL
function API as provided by the vtls/vtls.[ch] system, and its functions are
the only SSL functions we may use from within libcurl. vtls is then crafted
to use the appropriate lower-level function calls for whatever SSL library is
in use. For example, vtls/openssl.[ch] is used for the OpenSSL library.

<a name="symbols"></a>
Library Symbols
===============

All symbols used internally in libcurl must use a `Curl_` prefix if they're
used in more than a single file. Single-file symbols must be made static.
Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
but they are to be changed to follow this pattern in future versions.) Public
API functions are marked with `CURL_EXTERN` in the public header files so
that all others can be hidden on platforms where this is possible.

<a name="returncodes"></a>
Return Codes and Informationals
===============================

I've made things simple. Almost every function in libcurl returns a CURLcode,
which must be `CURLE_OK` if everything is OK or otherwise a suitable error
code as defined in the curl/curl.h include file. The very spot that detects
an error must use the `Curl_failf()` function to set the human-readable error
description.

To help the user understand what's happening and to debug curl usage, we
must supply a fair amount of informational messages by using the
`Curl_infof()` function. Those messages are only displayed when the user
explicitly asks for them. They are best used when revealing information that
isn't otherwise obvious.

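The convention of "return a code, and describe the failure where it is
detected" can be sketched like this. The enum, the `demo_handle` struct and
`demo_failf()` are stand-ins invented for the example; they mimic the role
of CURLcode and `Curl_failf()` without reproducing their definitions.

```c
#include <stdarg.h>
#include <stdio.h>

enum demo_code { DEMO_OK = 0, DEMO_BAD_PORT };

struct demo_handle {
  char errbuf[128];   /* human-readable description of the last error */
};

/* Stores a printf-style formatted error message in the handle. */
static void demo_failf(struct demo_handle *h, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(h->errbuf, sizeof(h->errbuf), fmt, ap);
  va_end(ap);
}

/* The spot that detects the problem both sets the message and returns
   the error code; callers only need to pass the code upwards. */
static enum demo_code demo_set_port(struct demo_handle *h, long port,
                                    long *out)
{
  if(port < 1 || port > 65535) {
    demo_failf(h, "Port number %ld is out of range", port);
    return DEMO_BAD_PORT;
  }
  *out = port;
  return DEMO_OK;
}
```
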
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 473 | <a name="abi"></a> |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 474 | API/ABI |
| 475 | ======= |
| 476 | |
| 477 | We make an effort to not export or show internals or how internals work, as |
| 478 | that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI |
| 479 | for our promise to users. |
| 480 | |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 481 | <a name="client"></a> |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 482 | Client |
| 483 | ====== |
| 484 | |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 485 | main() resides in `src/tool_main.c`. |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 486 | |
`src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
to display the complete "manual". The `src/tool_urlglob.c` file holds the
functions used for the URL "globbing" support, that is, the {} and []
expansion.
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 491 | |
The client mostly messes around to set up its 'config' struct properly, then
it calls the `curl_easy_*()` functions of the library. When it gets back
control after `curl_easy_perform()`, it cleans up the library, checks status
and exits.
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 496 | |
When the operation is done, the ourWriteOut() function in
src/tool_writeout.c may be called to report about the operation. That
function uses `curl_easy_getinfo()` to extract useful information from the
curl session.
| 501 | |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 502 | It may loop and do all this several times if many URLs were specified on the |
| 503 | command line or config file. |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 504 | |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 505 | <a name="memorydebug"></a> |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 506 | Memory Debugging |
| 507 | ================ |
| 508 | |
The file lib/memdebug.c contains debug versions of a few functions: those
such as malloc, free, fopen and fclose that deal with resources that might
give us problems if we "leak" them. The functions in the memdebug system do
nothing fancy; they do their normal function and then log information about
what they just did. The logged data can then be analyzed after a complete
session.
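
A minimal sketch of that idea is shown below. It is not the code in
lib/memdebug.c: the `sk_` names and the log line layout are assumptions made
for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical log destination; nothing is logged while it is NULL. */
static FILE *sk_logfile;

void sk_dbg_start(FILE *log)
{
  sk_logfile = log;
}

/* Do the normal allocation, then log what happened and where. */
void *sk_dbg_malloc(size_t size, int line, const char *source)
{
  void *mem = malloc(size);
  if(sk_logfile)
    fprintf(sk_logfile, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, mem);
  return mem;
}

/* Log first here, since the pointer must not be used after free(). */
void sk_dbg_free(void *ptr, int line, const char *source)
{
  if(sk_logfile)
    fprintf(sk_logfile, "MEM %s:%d free(%p)\n", source, line, ptr);
  free(ptr);
}

/* Call sites use macros so that file and line get recorded automatically. */
#define SK_MALLOC(size) sk_dbg_malloc((size), __LINE__, __FILE__)
#define SK_FREE(ptr)    sk_dbg_free((ptr), __LINE__, __FILE__)
```

A script can then pair up the malloc and free lines in the log and report
every allocation that never got a matching free.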
| 515 | |
| 516 | memanalyze.pl is the perl script present in tests/ that analyzes a log file |
| 517 | generated by the memory tracking system. It detects if resources are |
| 518 | allocated but never freed and other kinds of errors related to resource |
| 519 | management. |
| 520 | |
Internally, the preprocessor symbol DEBUGBUILD guards code which is only
compiled for debug enabled builds, and the symbol CURLDEBUG guards code which
is _only_ used for memory tracking/debugging.

Use -DCURLDEBUG when compiling to enable memory debugging. This is also
switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
when compiling to enable a debug build, or run configure with --enable-debug.
| 528 | |
| 529 | curl --version will list 'Debug' feature for debug enabled builds, and |
| 530 | will list 'TrackMemory' feature for curl debug memory tracking capable |
| 531 | builds. These features are independent and can be controlled when running |
| 532 | the configure script. When --enable-debug is given both features will be |
| 533 | enabled, unless some restriction prevents memory tracking from being used. |
| 534 | |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 535 | <a name="test"></a> |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 536 | Test Suite |
| 537 | ========== |
| 538 | |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 539 | The test suite is placed in its own subdirectory directly off the root in the |
| 540 | curl archive tree, and it contains a bunch of scripts and a lot of test case |
| 541 | data. |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 542 | |
The main test script is runtests.pl, which invokes test servers like
httpserver.pl and ftpserver.pl before all the test cases are performed. The
test suite currently only runs on unix-like platforms.

You'll find a description of the test suite in the tests/README file, and a
description of the test case data file format in the tests/FILEFORMAT file.
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 549 | |
| 550 | The test suite automatically detects if curl was built with the memory |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 551 | debugging enabled, and if it was it will detect memory leaks, too. |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 552 | |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 553 | <a name="asyncdns"></a> |
| 554 | Asynchronous name resolves |
| 555 | ========================== |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 556 | |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 557 | libcurl can be built to do name resolves asynchronously, using either the |
| 558 | normal resolver in a threaded manner or by using c-ares. |
Lucas Eckels | 9bd90e6 | 2012-08-06 15:07:02 -0700 | [diff] [blame] | 559 | |
Bertrand SIMONNET | e6cd738 | 2015-07-01 15:39:44 -0700 | [diff] [blame] | 560 | <a name="cares"></a> |
| 561 | [c-ares][3] |
| 562 | ------ |
| 563 | |
### Build libcurl to use c-ares
| 565 | |
| 566 | 1. ./configure --enable-ares=/path/to/ares/install |
| 567 | 2. make |
| 568 | |
| 569 | ### c-ares on win32 |
| 570 | |
First I compiled c-ares. I changed the default C runtime library to be the
single-threaded rather than the multi-threaded (this seems to be required to
prevent linking errors later on). Then I simply built the areslib project
(the other projects adig/ahost seem to fail under MSVC).
| 575 | |
| 576 | Next was libcurl. I opened lib/config-win32.h and I added a: |
| 577 | `#define USE_ARES 1` |
| 578 | |
| 579 | Next thing I did was I added the path for the ares includes to the include |
| 580 | path, and the libares.lib to the libraries. |
| 581 | |
| 582 | Lastly, I also changed libcurl to be single-threaded rather than |
| 583 | multi-threaded, again this was to prevent some duplicate symbol errors. I'm |
| 584 | not sure why I needed to change everything to single-threaded, but when I |
| 585 | didn't I got redefinition errors for several CRT functions (malloc, stricmp, |
| 586 | etc.) |
| 587 | |
| 588 | <a name="curl_off_t"></a> |
| 589 | `curl_off_t` |
| 590 | ========== |
| 591 | |
`curl_off_t` is a data type provided by the external libcurl include
headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
options that end with LARGE. The type is 64 bits wide on most modern
platforms.
| 596 | |
<a name="curlx"></a>
curlx
| 598 | ===== |
| 599 | |
| 600 | The libcurl source code offers a few functions by source only. They are not |
| 601 | part of the official libcurl API, but the source files might be useful for |
| 602 | others so apps can optionally compile/build with these sources to gain |
| 603 | additional functions. |
| 604 | |
| 605 | We provide them through a single header file for easy access for apps: |
| 606 | "curlx.h" |
| 607 | |
| 608 | `curlx_strtoofft()` |
| 609 | ------------------- |
A macro that converts a string containing a number to a curl_off_t number.
This might use the curlx_strtoll() function which is provided as source
code in strtoofft.c. Note that the function is only provided if no
strtoll() (or equivalent) function exists on your platform. If curl_off_t
is only a 32 bit number on your platform, this macro uses strtol().
| 615 | |
`curlx_tvnow()`
---------------
Returns a struct timeval for the current time.

`curlx_tvdiff()`
----------------
Returns the difference between two timeval structs, in number of
milliseconds.

`curlx_tvdiff_secs()`
---------------------
Returns the same as curlx_tvdiff but with full usec resolution (as a
double).
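
What these three helpers compute can be sketched like this. The `sk_`
functions are hypothetical re-implementations written for this example,
assuming a POSIX system with `gettimeofday()`; they are not the curlx
sources.

```c
#include <stddef.h>
#include <sys/time.h>

/* Current time as a struct timeval. */
struct timeval sk_tvnow(void)
{
  struct timeval now;
  gettimeofday(&now, NULL);
  return now;
}

/* Difference newer - older in whole milliseconds. */
long sk_tvdiff(struct timeval newer, struct timeval older)
{
  return (long)(newer.tv_sec - older.tv_sec) * 1000 +
         (long)(newer.tv_usec - older.tv_usec) / 1000;
}

/* Same difference in seconds, keeping the microsecond resolution. */
double sk_tvdiff_secs(struct timeval newer, struct timeval older)
{
  return (double)(newer.tv_sec - older.tv_sec) +
         (double)(newer.tv_usec - older.tv_usec) / 1000000.0;
}
```

Note that the microsecond part of the difference may be negative when the
newer time has the smaller `tv_usec`; adding it to the seconds part still
gives the right total.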
| 629 | |
| 630 | Future |
| 631 | ------ |
| 632 | |
| 633 | Several functions will be removed from the public curl_ name space in a |
| 634 | future libcurl release. They will then only become available as curlx_ |
| 635 | functions instead. To make the transition easier, we already today provide |
| 636 | these functions with the curlx_ prefix to allow sources to get built properly |
| 637 | with the new function names. The functions this concerns are: |
| 638 | |
| 639 | - `curlx_getenv` |
| 640 | - `curlx_strequal` |
| 641 | - `curlx_strnequal` |
| 642 | - `curlx_mvsnprintf` |
| 643 | - `curlx_msnprintf` |
| 644 | - `curlx_maprintf` |
| 645 | - `curlx_mvaprintf` |
| 646 | - `curlx_msprintf` |
| 647 | - `curlx_mprintf` |
| 648 | - `curlx_mfprintf` |
| 649 | - `curlx_mvsprintf` |
| 650 | - `curlx_mvprintf` |
| 651 | - `curlx_mvfprintf` |
| 652 | |
| 653 | <a name="contentencoding"></a> |
| 654 | Content Encoding |
| 655 | ================ |
| 656 | |
| 657 | ## About content encodings |
| 658 | |
[HTTP/1.1][4] specifies that a client may request that a server encode its
response. This is usually used to compress a response using one of a set of
commonly available compression techniques. These schemes are 'deflate' (the
zlib algorithm), 'gzip' and 'compress'. A client requests that the server
perform an encoding by including an Accept-Encoding header in the request
document. The value of the header should be one of the recognized tokens
'deflate', ... (there's a way to register new schemes/tokens, see sec 3.5 of
the spec). A server MAY honor the client's encoding request. When a response
is encoded, the server includes a Content-Encoding header in the
response. The value of the Content-Encoding header indicates which scheme was
used to encode the data.
| 670 | |
| 671 | A client may tell a server that it can understand several different encoding |
| 672 | schemes. In this case the server may choose any one of those and use it to |
| 673 | encode the response (indicating which one using the Content-Encoding header). |
| 674 | It's also possible for a client to attach priorities to different schemes so |
| 675 | that the server knows which it prefers. See sec 14.3 of RFC 2616 for more |
| 676 | information on the Accept-Encoding header. |
| 677 | |
| 678 | ## Supported content encodings |
| 679 | |
The 'deflate' and 'gzip' content encodings are supported by libcurl. Both
regular and chunked transfers work fine. The zlib library is required for
this feature.
| 683 | |
| 684 | ## The libcurl interface |
| 685 | |
| 686 | To cause libcurl to request a content encoding use: |
| 687 | |
| 688 | [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string) |
| 689 | |
| 690 | where string is the intended value of the Accept-Encoding header. |
| 691 | |
Currently, libcurl only understands how to process responses that use the
"deflate" or "gzip" Content-Encoding, so the only values for
[`CURLOPT_ACCEPT_ENCODING`][5] that will work (besides "identity", which does
nothing) are "deflate" and "gzip". If a response is encoded using the
"compress" method, libcurl will return an error indicating that the response
could not be decoded. If `string` is NULL, no Accept-Encoding header is
generated. If `string` is a zero-length string, then an Accept-Encoding
header containing all supported encodings will be generated.
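
The rules in the paragraph above can be modelled with two small functions.
This is only a sketch of the described behavior, not libcurl's
implementation: the `sk_` names are hypothetical, the list of supported
encodings is hard-coded, and the response header value is assumed to be
lowercase already.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
  SK_ENC_IDENTITY,     /* pass the data through untouched */
  SK_ENC_DEFLATE,
  SK_ENC_GZIP,
  SK_ENC_UNSUPPORTED   /* for example "compress": report an error */
} sk_encoding;

/* Build the request header from the option value. Returns 0 when no
   header is to be sent at all. */
int sk_accept_encoding_header(const char *option, char *buf, size_t len)
{
  if(!option)
    return 0;                   /* NULL: no Accept-Encoding header */
  if(!*option)
    option = "deflate, gzip";   /* "": list everything supported */
  snprintf(buf, len, "Accept-Encoding: %s", option);
  return 1;
}

/* Pick a decoder from the Content-Encoding value of the response. */
sk_encoding sk_pick_decoder(const char *content_encoding)
{
  if(!content_encoding || !strcmp(content_encoding, "identity"))
    return SK_ENC_IDENTITY;
  if(!strcmp(content_encoding, "deflate"))
    return SK_ENC_DEFLATE;
  if(!strcmp(content_encoding, "gzip"))
    return SK_ENC_GZIP;
  return SK_ENC_UNSUPPORTED;
}
```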
| 700 | |
| 701 | The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for |
| 702 | content to be automatically decoded. If it is not set and the server still |
| 703 | sends encoded content (despite not having been asked), the data is returned |
| 704 | in its raw form and the Content-Encoding type is not checked. |
| 705 | |
| 706 | ## The curl interface |
| 707 | |
| 708 | Use the [--compressed][6] option with curl to cause it to ask servers to |
| 709 | compress responses using any format supported by curl. |
| 710 | |
| 711 | <a name="hostip"></a> |
| 712 | hostip.c explained |
| 713 | ================== |
| 714 | |
The main compile-time defines to keep in mind when reading the host*.c source
files are these:
| 717 | |
| 718 | ## `CURLRES_IPV6` |
| 719 | |
This host has getaddrinfo() and family, and thus we use that. The host may
not be able to resolve IPv6, but we don't really have to take that into
account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.
| 723 | |
| 724 | ## `CURLRES_ARES` |
| 725 | |
| 726 | is defined if libcurl is built to use c-ares for asynchronous name |
| 727 | resolves. This can be Windows or *nix. |
| 728 | |
| 729 | ## `CURLRES_THREADED` |
| 730 | |
| 731 | is defined if libcurl is built to use threading for asynchronous name |
| 732 | resolves. The name resolve will be done in a new thread, and the supported |
| 733 | asynch API will be the same as for ares-builds. This is the default under |
| 734 | (native) Windows. |
| 735 | |
If either of the two previous is defined, `CURLRES_ASYNCH` is defined too.
If libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
defined.
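
How such a set of defines can be derived is sketched below. The `SK_` macro
names, including the three commented-out input switches, are made up for
this example; the real inputs and logic live in hostip.h and may differ.

```c
/* Hypothetical build switches. Uncomment to try the other variants. */
/* #define SK_HAVE_GETADDRINFO 1 */
/* #define SK_USE_ARES 1 */
/* #define SK_USE_THREADS 1 */

/* Address family support: one of the two is always defined. */
#ifdef SK_HAVE_GETADDRINFO
#define SK_RES_IPV6
#else
#define SK_RES_IPV4
#endif

/* Asynchronous back-end, if any. */
#if defined(SK_USE_ARES)
#define SK_RES_ARES
#elif defined(SK_USE_THREADS)
#define SK_RES_THREADED
#endif

/* Either back-end implies ASYNCH, otherwise the build is SYNCH. */
#if defined(SK_RES_ARES) || defined(SK_RES_THREADED)
#define SK_RES_ASYNCH
#else
#define SK_RES_SYNCH
#endif

/* Report which resolver variant this translation unit was built with. */
const char *sk_resolver_mode(void)
{
#if defined(SK_RES_ARES)
  return "asynch (c-ares)";
#elif defined(SK_RES_THREADED)
  return "asynch (threaded)";
#else
  return "synch";
#endif
}
```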
| 739 | |
| 740 | ## host*.c sources |
| 741 | |
The host*.c source files are split up like this:
| 743 | |
| 744 | - hostip.c - method-independent resolver functions and utility functions |
| 745 | - hostasyn.c - functions for asynchronous name resolves |
| 746 | - hostsyn.c - functions for synchronous name resolves |
| 747 | - asyn-ares.c - functions for asynchronous name resolves using c-ares |
| 748 | - asyn-thread.c - functions for asynchronous name resolves using threads |
| 749 | - hostip4.c - IPv4 specific functions |
| 750 | - hostip6.c - IPv6 specific functions |
| 751 | |
| 752 | The hostip.h is the single united header file for all this. It defines the |
| 753 | `CURLRES_*` defines based on the config*.h and curl_setup.h defines. |
| 754 | |
| 755 | <a name="memoryleak"></a> |
| 756 | Track Down Memory Leaks |
| 757 | ======================= |
| 758 | |
| 759 | ## Single-threaded |
| 760 | |
Please note that this memory leak system is not adjusted to work in more
than one thread. If you want/need to use it in a multi-threaded app, please
adjust accordingly.
| 764 | |
| 765 | |
| 766 | ## Build |
| 767 | |
| 768 | Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with |
| 769 | --enable-debug fixes this). 'make clean' first, then 'make' so that all |
| 770 | files actually are rebuilt properly. It will also make sense to build |
| 771 | libcurl with the debug option (usually -g to the compiler) so that debugging |
| 772 | it will be easier if you actually do find a leak in the library. |
| 773 | |
| 774 | This will create a library that has memory debugging enabled. |
| 775 | |
| 776 | ## Modify Your Application |
| 777 | |
| 778 | Add a line in your application code: |
| 779 | |
| 780 | `curl_memdebug("dump");` |
| 781 | |
| 782 | This will make the malloc debug system output a full trace of all resource |
| 783 | using functions to the given file name. Make sure you rebuild your program |
| 784 | and that you link with the same libcurl you built for this purpose as |
| 785 | described above. |
| 786 | |
| 787 | ## Run Your Application |
| 788 | |
| 789 | Run your program as usual. Watch the specified memory trace file grow. |
| 790 | |
Make your program exit and use the proper libcurl cleanup functions etc., so
that all non-leaks are returned/freed properly.
| 793 | |
| 794 | ## Analyze the Flow |
| 795 | |
| 796 | Use the tests/memanalyze.pl perl script to analyze the dump file: |
| 797 | |
| 798 | tests/memanalyze.pl dump |
| 799 | |
This now outputs a report on which resources were allocated but never freed
etc. This report is very fine for posting to the list!

If this doesn't produce any output, no leak was detected in libcurl. Then
the leak is most likely to be in your code.
| 805 | |
| 806 | <a name="multi_socket"></a> |
| 807 | `multi_socket` |
| 808 | ============== |
| 809 | |
| 810 | Implementation of the `curl_multi_socket` API |
| 811 | |
| 812 | The main ideas of this API are simply: |
| 813 | |
| 814 | 1 - The application can use whatever event system it likes as it gets info |
| 815 | from libcurl about what file descriptors libcurl waits for what action |
| 816 | on. (The previous API returns `fd_sets` which is very select()-centric). |
| 817 | |
| 818 | 2 - When the application discovers action on a single socket, it calls |
| 819 | libcurl and informs that there was action on this particular socket and |
| 820 | libcurl can then act on that socket/transfer only and not care about |
| 821 | any other transfers. (The previous API always had to scan through all |
| 822 | the existing transfers.) |
| 823 | |
| 824 | The idea is that [`curl_multi_socket_action()`][7] calls a given callback |
| 825 | with information about what socket to wait for what action on, and the |
| 826 | callback only gets called if the status of that socket has changed. |
| 827 | |
We also added a timer callback that makes libcurl call the application when
the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
internally there's an added struct in each easy handle in which we store an
"expire time" (if any). The structs are then "splay sorted" so that we can
add and remove times from the tree and yet somewhat swiftly figure out both
how much time there is until the next nearest timer expires and which timer
(handle) we should take care of now. Of course, the upside of all this is
that we get a [`curl_multi_timeout()`][8] that should also work with
old-style applications that use [`curl_multi_perform()`][11].
| 838 | |
We created an internal "socket to easy handles" hash table that, given a
socket (file descriptor), returns the easy handle that waits for action on
that socket. This hash is made using the already existing hash code
(previously only used for the DNS cache).
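
The lookup this table provides can be illustrated with a small fixed-size
table. This sketch does not use libcurl's hash code: the `sk_` types are
hypothetical, the table uses simple linear probing, socket descriptors are
assumed to be non-negative, and removal is left out to keep it short.

```c
#include <stddef.h>

#define SK_SLOTS 64

/* Hypothetical stand-in for an easy handle. */
struct sk_easy {
  int id;
};

struct sk_entry {
  int fd;                 /* socket descriptor, -1 while the slot is free */
  struct sk_easy *easy;   /* the handle waiting for action on that socket */
};

struct sk_sockhash {
  struct sk_entry slot[SK_SLOTS];
};

void sk_sockhash_init(struct sk_sockhash *h)
{
  size_t i;
  for(i = 0; i < SK_SLOTS; i++) {
    h->slot[i].fd = -1;
    h->slot[i].easy = NULL;
  }
}

/* Store or update the handle for fd. Returns 0 if the table is full. */
int sk_sockhash_add(struct sk_sockhash *h, int fd, struct sk_easy *easy)
{
  size_t i;
  for(i = 0; i < SK_SLOTS; i++) {
    struct sk_entry *e = &h->slot[((size_t)fd + i) % SK_SLOTS];
    if(e->fd == -1 || e->fd == fd) {
      e->fd = fd;
      e->easy = easy;
      return 1;
    }
  }
  return 0;
}

/* Return the handle that waits on fd, or NULL if there is none. */
struct sk_easy *sk_sockhash_find(const struct sk_sockhash *h, int fd)
{
  size_t i;
  for(i = 0; i < SK_SLOTS; i++) {
    const struct sk_entry *e = &h->slot[((size_t)fd + i) % SK_SLOTS];
    if(e->fd == fd)
      return e->easy;
    if(e->fd == -1)
      return NULL;   /* hit a free slot: fd was never added */
  }
  return NULL;
}
```

With such a table, activity reported on one socket leads straight to the
single handle that cares about it, without scanning all transfers.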
| 843 | |
| 844 | To make libcurl able to report plain sockets in the socket callback, we had |
| 845 | to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that |
| 846 | the conversion from sockets to `fd_sets` for that function is only done in |
| 847 | the last step before the data is returned. I also had to extend c-ares to |
| 848 | get a function that can return plain sockets, as that library too returned |
| 849 | only `fd_sets` and that is no longer good enough. The changes done to c-ares |
| 850 | are available in c-ares 1.3.1 and later. |
| 851 | |
| 852 | <a name="structs"></a> |
| 853 | Structs in libcurl |
| 854 | ================== |
| 855 | |
| 856 | This section should cover 7.32.0 pretty accurately, but will make sense even |
| 857 | for older and later versions as things don't change drastically that often. |
| 858 | |
| 859 | ## SessionHandle |
| 860 | |
| 861 | The SessionHandle handle struct is the one returned to the outside in the |
| 862 | external API as a "CURL *". This is usually known as an easy handle in API |
| 863 | documentations and examples. |
| 864 | |
| 865 | Information and state that is related to the actual connection is in the |
| 866 | 'connectdata' struct. When a transfer is about to be made, libcurl will |
| 867 | either create a new connection or re-use an existing one. The particular |
| 868 | connectdata that is used by this handle is pointed out by |
| 869 | SessionHandle->easy_conn. |
| 870 | |
Data and information that regard this particular single transfer are put in
the SingleRequest sub-struct.
| 873 | |
| 874 | When the SessionHandle struct is added to a multi handle, as it must be in |
| 875 | order to do any transfer, the ->multi member will point to the `Curl_multi` |
| 876 | struct it belongs to. The ->prev and ->next members will then be used by the |
| 877 | multi code to keep a linked list of SessionHandle structs that are added to |
| 878 | that same multi handle. libcurl always uses multi so ->multi *will* point to |
| 879 | a `Curl_multi` when a transfer is in progress. |
| 880 | |
| 881 | ->mstate is the multi state of this particular SessionHandle. When |
| 882 | `multi_runsingle()` is called, it will act on this handle according to which |
| 883 | state it is in. The mstate is also what tells which sockets to return for a |
| 884 | specific SessionHandle when [`curl_multi_fdset()`][12] is called etc. |
| 885 | |
The libcurl source code generally uses the name 'data' for the variable that
points to the SessionHandle.
| 888 | |
| 889 | When doing multiplexed HTTP/2 transfers, each SessionHandle is associated |
| 890 | with an individual stream, sharing the same connectdata struct. Multiplexing |
| 891 | makes it even more important to keep things associated with the right thing! |
| 892 | |
| 893 | ## connectdata |
| 894 | |
A general idea in libcurl is to keep connections around in a connection
"cache" after they have been used in case they will be used again, and then
re-use an existing one instead of creating a new one, as this gives a
significant performance boost.
| 899 | |
| 900 | Each 'connectdata' identifies a single physical connection to a server. If |
| 901 | the connection can't be kept alive, the connection will be closed after use |
| 902 | and then this struct can be removed from the cache and freed. |
| 903 | |
| 904 | Thus, the same SessionHandle can be used multiple times and each time select |
| 905 | another connectdata struct to use for the connection. Keep this in mind, as |
| 906 | it is then important to consider if options or choices are based on the |
| 907 | connection or the SessionHandle. |
| 908 | |
| 909 | Functions in libcurl will assume that connectdata->data points to the |
| 910 | SessionHandle that uses this connection (for the moment). |
| 911 | |
| 912 | As a special complexity, some protocols supported by libcurl require a |
| 913 | special disconnect procedure that is more than just shutting down the |
| 914 | socket. It can involve sending one or more commands to the server before |
| 915 | doing so. Since connections are kept in the connection cache after use, the |
| 916 | original SessionHandle may no longer be around when the time comes to shut |
| 917 | down a particular connection. For this purpose, libcurl holds a special |
| 918 | dummy `closure_handle` SessionHandle in the `Curl_multi` struct to use when |
| 919 | needed. |
| 920 | |
| 921 | FTP uses two TCP connections for a typical transfer but it keeps both in |
| 922 | this single struct and thus can be considered a single connection for most |
| 923 | internal concerns. |
| 924 | |
The libcurl source code generally uses the name 'conn' for the variable that
points to the connectdata.
| 927 | |
| 928 | ## Curl_multi |
| 929 | |
| 930 | Internally, the easy interface is implemented as a wrapper around multi |
| 931 | interface functions. This makes everything multi interface. |
| 932 | |
| 933 | `Curl_multi` is the multi handle struct exposed as "CURLM *" in external APIs. |
| 934 | |
| 935 | This struct holds a list of SessionHandle structs that have been added to |
| 936 | this handle with [`curl_multi_add_handle()`][13]. The start of the list is |
| 937 | ->easyp and ->num_easy is a counter of added SessionHandles. |
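
The relationship between the two structs can be sketched as below. The
`sk_easy` and `sk_multi` structs are heavily reduced, hypothetical stand-ins
that keep only the members named in this section, and this sketch adds new
handles at the head of the list purely for brevity.

```c
#include <stddef.h>

struct sk_multi;

/* Reduced stand-in for SessionHandle. */
struct sk_easy {
  struct sk_easy *prev;
  struct sk_easy *next;
  struct sk_multi *multi;   /* NULL until added to a multi handle */
};

/* Reduced stand-in for Curl_multi. */
struct sk_multi {
  struct sk_easy *easyp;    /* start of the list of added handles */
  int num_easy;             /* number of handles in the list */
};

/* Link the handle into the list. Returns 0 if it is already in use. */
int sk_multi_add(struct sk_multi *m, struct sk_easy *e)
{
  if(e->multi)
    return 0;
  e->prev = NULL;
  e->next = m->easyp;
  if(m->easyp)
    m->easyp->prev = e;
  m->easyp = e;
  e->multi = m;
  m->num_easy++;
  return 1;
}

/* Unlink the handle again; a handle not owned by m is left alone. */
void sk_multi_remove(struct sk_multi *m, struct sk_easy *e)
{
  if(e->multi != m)
    return;
  if(e->prev)
    e->prev->next = e->next;
  else
    m->easyp = e->next;
  if(e->next)
    e->next->prev = e->prev;
  e->prev = NULL;
  e->next = NULL;
  e->multi = NULL;
  m->num_easy--;
}
```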
| 938 | |
| 939 | ->msglist is a linked list of messages to send back when |
| 940 | [`curl_multi_info_read()`][14] is called. Basically a node is added to that |
| 941 | list when an individual SessionHandle's transfer has completed. |
| 942 | |
| 943 | ->hostcache points to the name cache. It is a hash table for looking up name |
| 944 | to IP. The nodes have a limited life time in there and this cache is meant |
| 945 | to reduce the time for when the same name is wanted within a short period of |
| 946 | time. |
| 947 | |
| 948 | ->timetree points to a tree of SessionHandles, sorted by the remaining time |
| 949 | until it should be checked - normally some sort of timeout. Each |
| 950 | SessionHandle has one node in the tree. |
| 951 | |
| 952 | ->sockhash is a hash table to allow fast lookups of socket descriptor to |
| 953 | which SessionHandle that uses that descriptor. This is necessary for the |
| 954 | `multi_socket` API. |
| 955 | |
| 956 | ->conn_cache points to the connection cache. It keeps track of all |
| 957 | connections that are kept after use. The cache has a maximum size. |
| 958 | |
| 959 | ->closure_handle is described in the 'connectdata' section. |
| 960 | |
The libcurl source code generally uses the name 'multi' for the variable that
points to the Curl_multi struct.
| 963 | |
| 964 | ## Curl_handler |
| 965 | |
| 966 | Each unique protocol that is supported by libcurl needs to provide at least |
| 967 | one `Curl_handler` struct. It defines what the protocol is called and what |
| 968 | functions the main code should call to deal with protocol specific issues. |
| 969 | In general, there's a source file named [protocol].c in which there's a |
| 970 | "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's |
| 971 | then the main array with all individual `Curl_handler` structs pointed to |
| 972 | from a single array which is scanned through when a URL is given to libcurl |
| 973 | to work with. |
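
A reduced sketch of that arrangement follows. The `sk_handler` struct only
borrows three member names (scheme, do_it, defport) and gives `do_it` a
simplified signature; the two handlers, the array and the lookup function
are hypothetical and much simpler than what url.c does.

```c
#include <ctype.h>
#include <stddef.h>

struct sk_handler {
  const char *scheme;    /* URL scheme name, in uppercase */
  int (*do_it)(void);    /* issues the transfer request */
  long defport;          /* default port number */
};

/* Dummy protocol functions that only return an identifying number. */
int sk_http_do(void) { return 1; }
int sk_ftp_do(void) { return 2; }

/* One handler struct per protocol, normally one per source file. */
const struct sk_handler sk_handler_http = { "HTTP", sk_http_do, 80 };
const struct sk_handler sk_handler_ftp = { "FTP", sk_ftp_do, 21 };

/* The single NULL-terminated array that points to all handlers. */
const struct sk_handler * const sk_protocols[] = {
  &sk_handler_http,
  &sk_handler_ftp,
  NULL
};

/* Case-insensitive string comparison; returns 1 when equal. */
int sk_equal(const char *a, const char *b)
{
  while(*a && *b) {
    if(tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;
}

/* Scan the array for the handler that matches the URL scheme. */
const struct sk_handler *sk_find_handler(const char *scheme)
{
  const struct sk_handler * const *pp;
  for(pp = sk_protocols; *pp; pp++) {
    if(sk_equal((*pp)->scheme, scheme))
      return *pp;
  }
  return NULL;
}
```

Once the handler is found, generic code only ever calls through the function
pointers, so it needs no knowledge of the individual protocols.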
| 974 | |
->scheme is the URL scheme name, usually spelled out in uppercase. That's
"HTTP" or "FTP" etc. SSL versions of a protocol need their own `Curl_handler`
setup, so HTTPS is separate from HTTP.
| 978 | |
->setup_connection is called to allow the protocol code to allocate protocol
specific data that then gets associated with that SessionHandle for the rest
of this transfer. It gets freed again at the end of the transfer. It will be
called before the 'connectdata' for the transfer has been selected/created.
Most protocols will allocate their private 'struct [PROTOCOL]' here and
assign SessionHandle->req.protop to point to it.
| 985 | |
| 986 | ->connect_it allows a protocol to do some specific actions after the TCP |
| 987 | connect is done, that can still be considered part of the connection phase. |
| 988 | |
| 989 | Some protocols will alter the connectdata->recv[] and connectdata->send[] |
| 990 | function pointers in this function. |
| 991 | |
| 992 | ->connecting is similarly a function that keeps getting called as long as the |
| 993 | protocol considers itself still in the connecting phase. |
| 994 | |
->do_it is the function called to issue the transfer request, what we call
the DO action internally. If the DO is not enough and things need to be kept
getting done for the entire DO sequence to complete, ->doing is then usually
also provided. Each protocol that needs to do multiple commands or similar
for do/doing needs to implement its own state machine (see SCP, SFTP, FTP).
Some protocols (only FTP and only due to historical reasons) have a separate
piece of the DO state called `DO_MORE`.
| 1002 | |
->doing keeps getting called while issuing the transfer request command(s).
| 1004 | |
| 1005 | ->done gets called when the transfer is complete and DONE. That's after the |
| 1006 | main data has been transferred. |
| 1007 | |
| 1008 | ->do_more gets called during the `DO_MORE` state. The FTP protocol uses this |
| 1009 | state when setting up the second connection. |
| 1010 | |
The ->`proto_getsock`, ->`doing_getsock`, ->`domore_getsock` and
->`perform_getsock` functions return socket information: which socket(s) to
wait for which action(s) during the particular multi state.

->disconnect is called immediately before the TCP connection is shut down.

->readwrite gets called during transfer to allow the protocol to do extra
reads/writes.

->defport is the default TCP or UDP port this protocol uses.
| 1024 | |
| 1025 | ->protocol is one or more bits in the `CURLPROTO_*` set. The SSL versions |
| 1026 | have their "base" protocol set and then the SSL variation. Like |
| 1027 | "HTTP|HTTPS". |
| 1028 | |
| 1029 | ->flags is a bitmask with additional information about the protocol that will |
| 1030 | make it get treated differently by the generic engine: |
| 1031 | |
| 1032 | - `PROTOPT_SSL` - will make it connect and negotiate SSL |
| 1033 | |
| 1034 | - `PROTOPT_DUAL` - this protocol uses two connections |
| 1035 | |
- `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
  connection. This flag is no longer used by code, yet still set for a bunch
  of protocol handlers.
| 1039 | |
| 1040 | - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to |
| 1041 | limit which "direction" of socket actions that the main engine will |
| 1042 | concern itself about. |
| 1043 | |
| 1044 | - `PROTOPT_NONETWORK` - a protocol that doesn't use network (read file:) |
| 1045 | |
| 1046 | - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default |
| 1047 | one unless one is provided |
| 1048 | |
| 1049 | - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL |
| 1050 | (?foo=bar) |
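
As a sketch of how generic code can test such a bitmask, consider the
following. The `SK_PROTOPT_*` names mirror the flags above, but the bit
values and the two helper functions are assumptions made for this example.

```c
/* Illustrative bit values; each flag occupies its own bit. */
#define SK_PROTOPT_SSL         (1u << 0)
#define SK_PROTOPT_DUAL        (1u << 1)
#define SK_PROTOPT_CLOSEACTION (1u << 2)
#define SK_PROTOPT_DIRLOCK     (1u << 3)
#define SK_PROTOPT_NONETWORK   (1u << 4)
#define SK_PROTOPT_NEEDSPWD    (1u << 5)
#define SK_PROTOPT_NOURLQUERY  (1u << 6)

/* Nonzero if the generic engine should negotiate SSL after connecting. */
int sk_needs_ssl(unsigned int flags)
{
  return (flags & SK_PROTOPT_SSL) != 0;
}

/* Nonzero unless the protocol is marked as not using the network. */
int sk_uses_network(unsigned int flags)
{
  return (flags & SK_PROTOPT_NONETWORK) == 0;
}
```

A handler combines the flags it needs with bitwise OR, for example
`SK_PROTOPT_SSL | SK_PROTOPT_DUAL` for a hypothetical protocol that uses SSL
over two connections.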
| 1051 | |
| 1052 | ## conncache |
| 1053 | |
This is a hash table with connections for later re-use. Each SessionHandle
has a pointer to its connection cache. Each multi handle sets up a connection
cache that all added SessionHandles share by default.
| 1057 | |
| 1058 | ## Curl_share |
| 1059 | |
| 1060 | The libcurl share API allocates a `Curl_share` struct, exposed to the |
| 1061 | external API as "CURLSH *". |
| 1062 | |
| 1063 | The idea is that the struct can have a set of own versions of caches and |
| 1064 | pools and then by providing this struct in the `CURLOPT_SHARE` option, those |
| 1065 | specific SessionHandles will use the caches/pools that this share handle |
| 1066 | holds. |
| 1067 | |
| 1068 | Then individual SessionHandle structs can be made to share specific things |
| 1069 | that they otherwise wouldn't, such as cookies. |
| 1070 | |
| 1071 | The `Curl_share` struct can currently hold cookies, DNS cache and the SSL |
| 1072 | session cache. |
| 1073 | |
| 1074 | ## CookieInfo |
| 1075 | |
| 1076 | This is the main cookie struct. It holds all known cookies and related |
| 1077 | information. Each SessionHandle has its own private CookieInfo even when |
| 1078 | they are added to a multi handle. They can be made to share cookies by using |
| 1079 | the share API. |
| 1080 | |
| 1081 | |
| 1082 | [1]: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html |
| 1083 | [2]: http://curl.haxx.se/libcurl/c/curl_easy_init.html |
| 1084 | [3]: http://c-ares.haxx.se/ |
| 1085 | [4]: https://tools.ietf.org/html/rfc7230 "RFC 7230" |
| 1086 | [5]: http://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html |
| 1087 | [6]: http://curl.haxx.se/docs/manpage.html#--compressed |
| 1088 | [7]: http://curl.haxx.se/libcurl/c/curl_multi_socket_action.html |
| 1089 | [8]: http://curl.haxx.se/libcurl/c/curl_multi_timeout.html |
| 1090 | [9]: http://curl.haxx.se/libcurl/c/curl_multi_setopt.html |
| 1091 | [10]: http://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html |
| 1092 | [11]: http://curl.haxx.se/libcurl/c/curl_multi_perform.html |
| 1093 | [12]: http://curl.haxx.se/libcurl/c/curl_multi_fdset.html |
| 1094 | [13]: http://curl.haxx.se/libcurl/c/curl_multi_add_handle.html |
| 1095 | [14]: http://curl.haxx.se/libcurl/c/curl_multi_info_read.html |