CXX — safe FFI between Rust and C++
=========================================

[Build status](https://travis-ci.com/dtolnay/cxx)
[crates.io](https://crates.io/crates/cxx)
[docs.rs](https://docs.rs/cxx)

This library provides a **safe** mechanism for calling C++ code from Rust and
Rust code from C++, not subject to the many ways that things can go wrong when
using bindgen or cbindgen to generate unsafe C-style bindings.

```toml
[dependencies]
cxx = "0.1"
```

*Compiler support: requires rustc 1.42+ (beta on January 30, stable on March
12)*

<br>

## Overview

The idea is that we define the signatures of both sides of our FFI boundary
embedded together in one Rust module (the next section shows an example). From
this, CXX receives a complete picture of the boundary to perform static analyses
against the types and function signatures to uphold both Rust's and C++'s
invariants and requirements.

If everything checks out statically, then CXX uses a pair of code generators to
emit the relevant `extern "C"` signatures on both sides together with any
necessary static assertions for later in the build process to verify
correctness. On the Rust side this code generator is simply an attribute
procedural macro. On the C++ side it can be a small Cargo build script if your
build is managed by Cargo, or for other build systems like Bazel or Buck we
provide a command line tool which generates the header and source file and
should be easy to integrate.
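As a rough illustration of what a static assertion of this kind can look like, here is a minimal sketch in plain Rust. It is not the code CXX generates; the `SharedPoint` struct and its expected size and alignment are assumptions chosen for the example.

```rust
use std::mem::{align_of, size_of};

// Hypothetical shared struct, laid out the way a C++ compiler would lay out
// `struct SharedPoint { int32_t x; int32_t y; };`.
#[repr(C)]
pub struct SharedPoint {
    pub x: i32,
    pub y: i32,
}

// Compile-time checks: if either condition is false the build fails, so a
// layout mismatch can never reach a running program.
const _: () = assert!(size_of::<SharedPoint>() == 8);
const _: () = assert!(align_of::<SharedPoint>() == 4);

fn main() {
    println!(
        "SharedPoint: {} bytes, alignment {}",
        size_of::<SharedPoint>(),
        align_of::<SharedPoint>()
    );
}
```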

The resulting FFI bridge operates at zero or negligible overhead, i.e. no
copying, no serialization, no memory allocation, no runtime checks needed.

The FFI signatures are able to use native types from whichever side they please,
such as Rust's `String` or C++'s `std::string`, Rust's `Box` or C++'s
`std::unique_ptr`, Rust's `Vec` or C++'s `std::vector`, etc. in any combination.
CXX guarantees an ABI-compatible signature that both sides understand, based on
builtin bindings for key standard library types to expose an idiomatic API on
those types to the other language. For example when manipulating a C++ string
from Rust, its `len()` method becomes a call of the `size()` member function
defined by C++; when manipulating a Rust string from C++, its `size()` member
function calls Rust's `len()`.

<br>

## Example

A runnable version of this example is provided under the *demo-rs* directory of
this repo (with the C++ side of the implementation in the *demo-cxx* directory).
To try it out, jump into demo-rs and run `cargo run`.

```rust
#[cxx::bridge]
mod ffi {
    // Any shared structs, whose fields will be visible to both languages.
    struct SharedThing {
        z: i32,
        y: Box<ThingR>,
        x: UniquePtr<ThingC>,
    }

    extern "C" {
        // One or more headers with the matching C++ declarations. Our code
        // generators don't read it but it gets #include'd and used in static
        // assertions to ensure our picture of the FFI boundary is accurate.
        include!("demo-cxx/demo.h");

        // Zero or more opaque types which both languages can pass around but
        // only C++ can see the fields.
        type ThingC;

        // Functions implemented in C++.
        fn make_demo(appname: &str) -> UniquePtr<ThingC>;
        fn get_name(thing: &ThingC) -> &CxxString;
        fn do_thing(state: SharedThing);
    }

    extern "Rust" {
        // Zero or more opaque types which both languages can pass around but
        // only Rust can see the fields.
        type ThingR;

        // Functions implemented in Rust.
        fn print_r(r: &ThingR);
    }
}
```

Now we simply provide C++ definitions of all the things in the `extern "C"`
block and Rust definitions of all the things in the `extern "Rust"` block, and
get to call back and forth safely.

Here are links to the complete set of source files involved in the demo:

- [demo-rs/src/main.rs](demo-rs/src/main.rs)
- [demo-rs/build.rs](demo-rs/build.rs)
- [demo-cxx/demo.h](demo-cxx/demo.h)
- [demo-cxx/demo.cc](demo-cxx/demo.cc)

To look at the code generated in both languages for the example by the CXX code
generators:

```console
   # run Rust code generator and print to stdout
   # (requires https://github.com/dtolnay/cargo-expand)
$ cargo expand --manifest-path demo-rs/Cargo.toml

   # run C++ code generator and print to stdout
$ cargo run --manifest-path cmd/Cargo.toml -- demo-rs/src/main.rs
```

<br>

## Details

As seen in the example, the language of the FFI boundary involves 3 kinds of
items:

- **Shared structs** — their fields are made visible to both languages.
  The definition written within cxx::bridge is the single source of truth.

- **Opaque types** — their fields are secret from the other language.
  These cannot be passed across the FFI by value but only behind an indirection,
  such as a reference `&`, a Rust `Box`, or a `UniquePtr`. An opaque type can be
  a type alias for an arbitrarily complicated generic language-specific type
  depending on your use case.

- **Functions** — implemented in either language, callable from the other
  language.

Within the `extern "C"` part of the CXX bridge we list the types and functions
for which C++ is the source of truth, as well as the header(s) that declare
those APIs. In the future it's possible that this section could be generated
bindgen-style from the headers but for now we need the signatures written out;
static assertions will verify that they are accurate.

Within the `extern "Rust"` part, we list types and functions for which Rust is
the source of truth. These all implicitly refer to the `super` module, the
parent module of the CXX bridge. You can think of the two items listed in the
example above as being like `use super::ThingR` and `use super::print_r` except
re-exported to C++. The parent module will either contain the definitions
directly for simple things, or contain the relevant `use` statements to bring
them into scope from elsewhere.

Your function implementations themselves, whether in C++ or Rust, *do not* need
to be defined as `extern "C"` ABI or no\_mangle. CXX will put in the right shims
where necessary to make it all work.
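To make the last two paragraphs concrete, here is a hand-written sketch of the shape such a shim could take. It is plain Rust with no dependency on CXX, and the names `describe_r`, `ffi_sketch` and `shim_describe_r` are made up for illustration; the code CXX really generates differs.

```rust
// Parent module: ordinary Rust definitions, with no `extern "C"` and no
// `no_mangle` anywhere.
pub struct ThingR(pub usize);

pub fn describe_r(r: &ThingR) -> usize {
    r.0 * 2
}

// Hypothetical stand-in for the code a bridge macro could emit.
mod ffi_sketch {
    // Items named in an `extern "Rust"` block resolve against the parent
    // module, much like these imports do.
    use super::{describe_r, ThingR};

    // Shim with a C ABI that forwards to the ordinary function. Only a
    // pointer to the opaque ThingR crosses the boundary, hence the `allow`.
    // A real shim would also be exported under a fixed symbol name; that
    // part is left out of this sketch.
    #[allow(improper_ctypes_definitions)]
    pub extern "C" fn shim_describe_r(r: &ThingR) -> usize {
        describe_r(r)
    }
}

fn main() {
    let thing = ThingR(21);
    // Call through the shim, as foreign code would.
    let doubled = ffi_sketch::shim_describe_r(&thing);
    println!("{}", doubled); // prints 42
}
```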

<br>

## Comparison vs bindgen and cbindgen

Notice that with CXX there is repetition of all the function signatures: they
are typed out once where the implementation is defined (in C++ or Rust) and
again inside the cxx::bridge module, though compile-time assertions guarantee
these are kept in sync. This is different from [bindgen] and [cbindgen] where
function signatures are typed by a human once and the tool consumes them in one
language and emits them in the other language.

[bindgen]: https://github.com/rust-lang/rust-bindgen
[cbindgen]: https://github.com/eqrion/cbindgen/

This is because CXX fills a somewhat different role. It is a lower level tool
than bindgen or cbindgen in a sense; you can think of it as being a replacement
for the concept of `extern "C"` signatures as we know them, rather than a
replacement for a bindgen. It would be reasonable to build a higher level
bindgen-like tool on top of CXX which consumes a C++ header and/or Rust module
(and/or IDL like Thrift) as source of truth and generates the cxx::bridge,
eliminating the repetition while leveraging the static analysis safety
guarantees of CXX.

But note that in other ways CXX is higher level than the bindgens, with rich
support for common standard library types. Frequently with bindgen when we are
dealing with an idiomatic C++ API we would end up manually wrapping that API in
C-style raw pointer functions, applying bindgen to get unsafe raw pointer Rust
functions, and replicating the API again to expose those idiomatically in Rust.
That's a much worse form of repetition because it is unsafe all the way through.
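For comparison, the manual layering described above tends to look something like the following sketch. The C-style layer is simulated in Rust so that the example runs on its own, and every name in it (`raw::thing_new`, `Thing`, and so on) is hypothetical.

```rust
// Layers 1 and 2: a C-style raw pointer API and its unsafe Rust bindings.
// In a real project the first would be hand-written C++ wrappers and the
// second would come out of bindgen as `extern "C"` declarations; here both
// are collapsed into plain Rust functions so the sketch is self-contained.
mod raw {
    pub struct RawThing {
        name: String,
    }

    pub fn thing_new(id: u32) -> *mut RawThing {
        Box::into_raw(Box::new(RawThing { name: format!("thing-{}", id) }))
    }

    pub unsafe fn thing_name_len(thing: *const RawThing) -> usize {
        unsafe { (*thing).name.len() }
    }

    pub unsafe fn thing_free(thing: *mut RawThing) {
        drop(unsafe { Box::from_raw(thing) });
    }
}

// Layer 3: the same API written out yet again as an idiomatic Rust type.
// Every method body is an `unsafe` block that nothing checks against the
// other side.
pub struct Thing {
    raw: *mut raw::RawThing,
}

impl Thing {
    pub fn new(id: u32) -> Self {
        Thing { raw: raw::thing_new(id) }
    }

    pub fn name_len(&self) -> usize {
        unsafe { raw::thing_name_len(self.raw) }
    }
}

impl Drop for Thing {
    fn drop(&mut self) {
        unsafe { raw::thing_free(self.raw) }
    }
}

fn main() {
    let thing = Thing::new(7);
    println!("{}", thing.name_len()); // "thing-7" is 7 bytes long
}
```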

By using a CXX bridge as the shared understanding between the languages, rather
than `extern "C"` C-style signatures as the shared understanding, common FFI use
cases become expressible using 100% safe code.

It would also be reasonable to mix and match, using CXX bridge for the 95% of
your FFI that is straightforward and doing the remaining few oddball signatures
the old fashioned way with bindgen and cbindgen, if for some reason CXX's static
restrictions get in the way. Please file an issue if you end up taking this
approach so that we know what ways it would be worthwhile to make the tool more
expressive.

<br>

## Cargo-based setup

For builds that are orchestrated by Cargo, you will use a build script that runs
CXX's C++ code generator and compiles the resulting C++ code along with any
other C++ code for your crate.

The canonical build script is as follows. The indicated line returns a
[`cc::Build`] instance (from the usual widely used `cc` crate) on which you can
set up any additional source files and compiler flags as normal.

[`cc::Build`]: https://docs.rs/cc/1.0/cc/struct.Build.html

```rust
// build.rs

fn main() {
    cxx::Build::new()
        .bridge("src/main.rs")  // returns a cc::Build
        .file("../demo-cxx/demo.cc")
        .flag("-std=c++11")
        .compile("cxxbridge-demo");

    println!("cargo:rerun-if-changed=src/main.rs");
    println!("cargo:rerun-if-changed=../demo-cxx/demo.h");
    println!("cargo:rerun-if-changed=../demo-cxx/demo.cc");
}
```

<br>

## Non-Cargo setup

For use in non-Cargo builds like Bazel or Buck, CXX provides an alternate way of
invoking the C++ code generator as a standalone command line tool. The tool is
packaged as the `cxxbridge-cmd` crate on crates.io or can be built from the
*cmd* directory of this repo.

```bash
$ cargo install cxxbridge-cmd

$ cxxbridge src/main.rs --header > path/to/mybridge.h
$ cxxbridge src/main.rs > path/to/mybridge.cc
```

<br>

## Safety

Be aware that the design of this library is intentionally restrictive and
opinionated! It isn't a goal to be powerful enough to handle arbitrary
signatures in either language. Instead this project is about carving out a
reasonably expressive set of functionality about which we can make useful safety
guarantees today and maybe extend over time. You may find that it takes some
practice to use CXX bridge effectively as it won't work in all the ways that you
are used to.

Some of the considerations that go into ensuring safety are:

- By design, our paired code generators work together to control both sides of
  the FFI boundary. Ordinarily in Rust writing your own `extern "C"` blocks is
  unsafe because the Rust compiler has no way to know whether the signatures
  you've written actually match the signatures implemented in the other
  language. With CXX we achieve that visibility and know what's on the other
  side.

- Our static analysis detects and prevents passing types by value that shouldn't
  be passed by value from C++ to Rust, for example because they may contain
  internal pointers that would be screwed up by Rust's move behavior.

- To many people's surprise, it is possible to have a struct in Rust and a
  struct in C++ with exactly the same layout / fields / alignment / everything,
  and still not the same ABI when passed by value. This is a longstanding
  bindgen bug that leads to segfaults in absolutely correct-looking code
  ([rust-lang/rust-bindgen#778]). CXX knows about this and can insert the
  necessary zero-cost workaround transparently where needed, so go ahead and
  pass your structs by value without worries. This is made possible by owning
  both sides of the boundary rather than just one.

- Template instantiations: for example in order to expose a UniquePtr\<T\> type
  in Rust backed by a real C++ unique\_ptr, we have a way of using a Rust trait
  to connect the behavior back to the template instantiations performed by the
  other language.

[rust-lang/rust-bindgen#778]: https://github.com/rust-lang/rust-bindgen/issues/778
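The last point, about template instantiations, can be sketched in plain Rust as follows. This is only an illustration of the idea, not CXX's implementation: `PtrTarget`, `OwnedPtr`, `make_thing` and `released_count` are hypothetical names, and the C++ deleter is simulated with a Rust `Box`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical trait with one impl per "template instantiation". Each impl
// says how to destroy a T that is owned through a raw pointer.
pub unsafe trait PtrTarget {
    unsafe fn release(ptr: *mut Self);
}

// Generic owning pointer whose behavior is supplied by the trait impl.
pub struct OwnedPtr<T: PtrTarget> {
    ptr: *mut T,
}

impl<T: PtrTarget> OwnedPtr<T> {
    pub unsafe fn from_raw(ptr: *mut T) -> Self {
        OwnedPtr { ptr }
    }

    pub fn get(&self) -> Option<&T> {
        // Safety: a non-null `ptr` is owned by us and still alive.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T: PtrTarget> Drop for OwnedPtr<T> {
    fn drop(&mut self) {
        if !self.ptr.is_null() {
            unsafe { T::release(self.ptr) };
        }
    }
}

pub struct ThingC {
    pub name: String,
}

// Counts releases so the effect of dropping an OwnedPtr is observable.
static RELEASED: AtomicUsize = AtomicUsize::new(0);

unsafe impl PtrTarget for ThingC {
    unsafe fn release(ptr: *mut Self) {
        // A real bridge would call into C++ here to run the deleter of the
        // matching unique_ptr instantiation; a Box stands in for it.
        drop(unsafe { Box::from_raw(ptr) });
        RELEASED.fetch_add(1, Ordering::SeqCst);
    }
}

pub fn make_thing(name: &str) -> OwnedPtr<ThingC> {
    let raw = Box::into_raw(Box::new(ThingC { name: name.to_owned() }));
    unsafe { OwnedPtr::from_raw(raw) }
}

pub fn released_count() -> usize {
    RELEASED.load(Ordering::SeqCst)
}

fn main() {
    let thing = make_thing("demo");
    println!("name length: {:?}", thing.get().map(|t| t.name.len()));
    drop(thing);
    println!("released: {}", released_count());
}
```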

<br>

## Builtin types

In addition to all the primitive types (i32 ⟷ int32_t), the following common
types may be used in the fields of shared structs and the arguments and returns
of functions.

<table>
<tr><th>name in Rust</th><th>name in C++</th><th>restrictions</th></tr>
<tr><td>String</td><td>cxxbridge::RustString</td><td></td></tr>
<tr><td>&amp;str</td><td>cxxbridge::RustStr</td><td></td></tr>
<tr><td><a href="https://docs.rs/cxx/0.1/cxx/struct.CxxString.html">CxxString</a></td><td>std::string</td><td><sup><i>cannot be passed by value</i></sup></td></tr>
<tr><td>Box&lt;T&gt;</td><td>cxxbridge::RustBox&lt;T&gt;</td><td><sup><i>cannot hold opaque C++ type</i></sup></td></tr>
<tr><td><a href="https://docs.rs/cxx/0.1/cxx/struct.UniquePtr.html">UniquePtr&lt;T&gt;</a></td><td>std::unique_ptr&lt;T&gt;</td><td><sup><i>cannot hold opaque Rust type</i></sup></td></tr>
<tr><td></td><td></td><td></td></tr>
</table>

The C++ API of the `cxxbridge` namespace is defined by the *include/cxxbridge.h*
file in this repo. You will need to include this header in your C++ code when
working with those types.
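As an illustration of how a borrowed string can cross the boundary without copying, here is a minimal Rust sketch of a pointer-plus-length view. It is an assumption made for the sake of example and not necessarily the layout used by `cxxbridge::RustStr`; the `StrView` type is hypothetical, and *include/cxxbridge.h* remains the authoritative definition.

```rust
use std::slice;
use std::str;

// Hypothetical FFI-friendly view of a `&str`: a pointer and a byte length,
// with a fixed field order thanks to `#[repr(C)]`. It carries no lifetime,
// so keeping the underlying string alive is up to the caller, just as it
// would be on the C++ side.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrView {
    ptr: *const u8,
    len: usize,
}

impl StrView {
    pub fn new(s: &str) -> Self {
        StrView { ptr: s.as_ptr(), len: s.len() }
    }

    // Named after the C++ convention; reports the length in bytes.
    pub fn size(self) -> usize {
        self.len
    }

    // Rebuilds the `&str` without copying.
    // Safety: the viewed string must still be alive and unmodified.
    pub unsafe fn as_str<'a>(self) -> &'a str {
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.ptr, self.len)) }
    }
}

fn main() {
    let owned = String::from("héllo");
    let view = StrView::new(&owned);
    let back = unsafe { view.as_str() };
    println!("{} ({} bytes)", back, view.size());
}
```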

The following types are intended to be supported "soon" but are just not
implemented yet. I don't expect any of these to be hard to make work but it's a
matter of designing a nice API for each in its non-native language.

<table>
<tr><th>name in Rust</th><th>name in C++</th></tr>
<tr><td>&amp;[T]</td><td></td></tr>
<tr><td>Vec&lt;T&gt;</td><td></td></tr>
<tr><td>BTreeMap&lt;K, V&gt;</td><td></td></tr>
<tr><td>HashMap&lt;K, V&gt;</td><td></td></tr>
<tr><td></td><td>std::vector&lt;T&gt;</td></tr>
<tr><td></td><td>std::map&lt;K, V&gt;</td></tr>
<tr><td></td><td>std::unordered_map&lt;K, V&gt;</td></tr>
</table>

<br>

## Remaining work

This is still early days for CXX; I am releasing it as a minimum viable product
to collect feedback on the direction and invite collaborators. Here are some of
the facets that I still intend for this project to tackle:

- [ ] Support associated methods: `extern "Rust" { fn f(self: &Struct); }`
- [ ] Support C++ member functions
- [ ] Support passing function pointers across the FFI
- [ ] Support translating between Result ⟷ exceptions
- [ ] Support structs with type parameters
- [ ] Support async functions

On the build side, I don't have much experience with the `cc` crate so I expect
there may be someone who can suggest ways to make that aspect of this crate
friendlier or more robust. Please report issues if you run into trouble building
or linking any of this stuff.

Finally, I know more about Rust library design than C++ library design so I
would appreciate help making the C++ APIs in this project more idiomatic where
anyone has suggestions.

<br>

#### License

<sup>
Licensed under either of <a href="LICENSE-APACHE">Apache License, Version
2.0</a> or <a href="LICENSE-MIT">MIT license</a> at your option.
</sup>

<br>

<sub>
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this project by you, as defined in the Apache-2.0 license,
shall be dual licensed as above, without any additional terms or conditions.
</sub>