blob: 7a12641babf89eaa159800c505022e325a2f5fda [file] [log] [blame]
David Majnemera6539272016-07-23 04:05:08 +00001=====================================
2Coroutines in LLVM
3=====================================
4
5.. contents::
6 :local:
7 :depth: 3
8
9.. warning::
10 This is a work in progress. Compatibility across LLVM releases is not
11 guaranteed.
12
13Introduction
14============
15
16.. _coroutine handle:
17
18LLVM coroutines are functions that have one or more `suspend points`_.
19When a suspend point is reached, the execution of a coroutine is suspended and
20control is returned back to its caller. A suspended coroutine can be resumed
21to continue execution from the last suspend point or it can be destroyed.
22
23In the following example, we call function `f` (which may or may not be a
24coroutine itself) that returns a handle to a suspended coroutine
25(**coroutine handle**) that is used by `main` to resume the coroutine twice and
26then destroy it:
27
28.. code-block:: llvm
29
30 define i32 @main() {
31 entry:
32 %hdl = call i8* @f(i32 4)
33 call void @llvm.coro.resume(i8* %hdl)
34 call void @llvm.coro.resume(i8* %hdl)
35 call void @llvm.coro.destroy(i8* %hdl)
36 ret i32 0
37 }
38
39.. _coroutine frame:
40
41In addition to the function stack frame which exists when a coroutine is
42executing, there is an additional region of storage that contains objects that
43keep the coroutine state when a coroutine is suspended. This region of storage
44is called **coroutine frame**. It is created when a coroutine is called and
45destroyed when a coroutine runs to completion or destroyed by a call to
46the `coro.destroy`_ intrinsic.
47
48An LLVM coroutine is represented as an LLVM function that has calls to
49`coroutine intrinsics`_ defining the structure of the coroutine.
50After lowering, a coroutine is split into several
51functions that represent three different ways of how control can enter the
52coroutine:
53
541. a ramp function, which represents an initial invocation of the coroutine that
55 creates the coroutine frame and executes the coroutine code until it
56 encounters a suspend point or reaches the end of the function;
57
582. a coroutine resume function that is invoked when the coroutine is resumed;
59
603. a coroutine destroy function that is invoked when the coroutine is destroyed.
61
62.. note:: Splitting out resume and destroy functions are just one of the
63 possible ways of lowering the coroutine. We chose it for initial
64 implementation as it matches closely the mental model and results in
65 reasonably nice code.
66
67Coroutines by Example
68=====================
69
70Coroutine Representation
71------------------------
72
73Let's look at an example of an LLVM coroutine with the behavior sketched
74by the following pseudo-code.
75
Sanjoy Das77a9c792016-07-26 21:03:41 +000076.. code-block:: c++
David Majnemera6539272016-07-23 04:05:08 +000077
78 void *f(int n) {
79 for(;;) {
80 print(n++);
81 <suspend> // returns a coroutine handle on first suspend
82 }
83 }
84
85This coroutine calls some function `print` with value `n` as an argument and
86suspends execution. Every time this coroutine resumes, it calls `print` again with an argument one bigger than the last time. This coroutine never completes by itself and must be destroyed explicitly. If we use this coroutine with
87a `main` shown in the previous section. It will call `print` with values 4, 5
88and 6 after which the coroutine will be destroyed.
89
90The LLVM IR for this coroutine looks like this:
91
Aaron Ballman378ac7e2016-07-23 18:53:35 +000092.. code-block:: none
David Majnemera6539272016-07-23 04:05:08 +000093
94 define i8* @f(i32 %n) {
95 entry:
96 %size = call i32 @llvm.coro.size.i32()
97 %alloc = call i8* @malloc(i32 %size)
Gor Nishanovb2a9c022016-08-10 16:40:39 +000098 %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* null, i8* null)
99 %hdl = call noalias i8* @llvm.coro.frame(token %beg)
David Majnemera6539272016-07-23 04:05:08 +0000100 br label %loop
101 loop:
102 %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
103 %inc = add nsw i32 %n.val, 1
104 call void @print(i32 %n.val)
105 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
106 switch i8 %0, label %suspend [i8 0, label %loop
107 i8 1, label %cleanup]
108 cleanup:
109 %mem = call i8* @llvm.coro.free(i8* %hdl)
110 call void @free(i8* %mem)
111 br label %suspend
112 suspend:
113 call void @llvm.coro.end(i8* %hdl, i1 false)
114 ret i8* %hdl
115 }
116
117The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
118lowered to a constant representing the size required for the coroutine frame.
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000119The `coro.begin`_ intrinsic initializes the coroutine frame and returns the a
120token that is used to obtain the coroutine handle via `coro.frame` intrinsic.
121The first parameter of `coro.begin` is given a block of memory to be used if the
122coroutine frame needs to be allocated dynamically.
David Majnemera6539272016-07-23 04:05:08 +0000123
124The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
125given the coroutine handle, returns a pointer of the memory block to be freed or
126`null` if the coroutine frame was not allocated dynamically. The `cleanup`
127block is entered when coroutine runs to completion by itself or destroyed via
128call to the `coro.destroy`_ intrinsic.
129
130The `suspend` block contains code to be executed when coroutine runs to
131completion or suspended. The `coro.end`_ intrinsic marks the point where
132a coroutine needs to return control back to the caller if it is not an initial
133invocation of the coroutine.
134
135The `loop` blocks represents the body of the coroutine. The `coro.suspend`_
136intrinsic in combination with the following switch indicates what happens to
137control flow when a coroutine is suspended (default case), resumed (case 0) or
138destroyed (case 1).
139
140Coroutine Transformation
141------------------------
142
143One of the steps of coroutine lowering is building the coroutine frame. The
144def-use chains are analyzed to determine which objects need be kept alive across
145suspend points. In the coroutine shown in the previous section, use of virtual register
146`%n.val` is separated from the definition by a suspend point, therefore, it
147cannot reside on the stack frame since the latter goes away once the coroutine
148is suspended and control is returned back to the caller. An i32 slot is
149allocated in the coroutine frame and `%n.val` is spilled and reloaded from that
150slot as needed.
151
152We also store addresses of the resume and destroy functions so that the
153`coro.resume` and `coro.destroy` intrinsics can resume and destroy the coroutine
154when its identity cannot be determined statically at compile time. For our
155example, the coroutine frame will be:
156
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000157.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000158
159 %f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }
160
161After resume and destroy parts are outlined, function `f` will contain only the
162code responsible for creation and initialization of the coroutine frame and
163execution of the coroutine until a suspend point is reached:
164
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000165.. code-block:: none
David Majnemera6539272016-07-23 04:05:08 +0000166
167 define i8* @f(i32 %n) {
168 entry:
169 %alloc = call noalias i8* @malloc(i32 24)
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000170 %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* null, i8* null)
171 %0 = call i8* @llvm.coro.frame(token %beg)
Mehdi Aminibe1cb222016-07-27 06:03:47 +0000172 %frame = bitcast i8* %0 to %f.frame*
David Majnemera6539272016-07-23 04:05:08 +0000173 %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
174 store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
175 %2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1
176 store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2
177
178 %inc = add nsw i32 %n, 1
179 %inc.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
180 store i32 %inc, i32* %inc.spill.addr
181 call void @print(i32 %n)
182
183 ret i8* %frame
184 }
185
186Outlined resume part of the coroutine will reside in function `f.resume`:
187
188.. code-block:: llvm
189
190 define internal fastcc void @f.resume(%f.frame* %frame.ptr.resume) {
191 entry:
192 %inc.spill.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2
193 %inc.spill = load i32, i32* %inc.spill.addr, align 4
194 %inc = add i32 %n.val, 1
195 store i32 %inc, i32* %inc.spill.addr, align 4
196 tail call void @print(i32 %inc)
197 ret void
198 }
199
200Whereas function `f.destroy` will contain the cleanup code for the coroutine:
201
202.. code-block:: llvm
203
204 define internal fastcc void @f.destroy(%f.frame* %frame.ptr.destroy) {
205 entry:
206 %0 = bitcast %f.frame* %frame.ptr.destroy to i8*
207 tail call void @free(i8* %0)
208 ret void
209 }
210
211Avoiding Heap Allocations
212-------------------------
213
214A particular coroutine usage pattern, which is illustrated by the `main`
215function in the overview section, where a coroutine is created, manipulated and
216destroyed by the same calling function, is common for coroutines implementing
217RAII idiom and is suitable for allocation elision optimization which avoid
218dynamic allocation by storing the coroutine frame as a static `alloca` in its
219caller.
220
David Majnemera6539272016-07-23 04:05:08 +0000221In the entry block, we will call `coro.alloc`_ intrinsic that will return `null`
David Majnemer78557192016-07-27 05:12:35 +0000222when dynamic allocation is required, and an address of an alloca on the caller's
223frame where coroutine frame can be stored if dynamic allocation is elided.
David Majnemera6539272016-07-23 04:05:08 +0000224
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000225.. code-block:: none
David Majnemera6539272016-07-23 04:05:08 +0000226
227 entry:
228 %elide = call i8* @llvm.coro.alloc()
229 %need.dyn.alloc = icmp ne i8* %elide, null
230 br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
231 dyn.alloc:
232 %size = call i32 @llvm.coro.size.i32()
233 %alloc = call i8* @CustomAlloc(i32 %size)
234 br label %coro.begin
235 coro.begin:
236 %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000237 %beg = call token @llvm.coro.begin(i8* %phi, i8* null, i32 0, i8* null, i8* null)
David Majnemera6539272016-07-23 04:05:08 +0000238
239In the cleanup block, we will make freeing the coroutine frame conditional on
240`coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null`
241thus skipping the deallocation code:
242
243.. code-block:: llvm
244
245 cleanup:
246 %mem = call i8* @llvm.coro.free(i8* %hdl)
247 %need.dyn.free = icmp ne i8* %mem, null
248 br i1 %need.dyn.free, label %dyn.free, label %if.end
249 dyn.free:
250 call void @CustomFree(i8* %mem)
251 br label %if.end
252 if.end:
253 ...
254
255With allocations and deallocations represented as described as above, after
David Majnemer78557192016-07-27 05:12:35 +0000256coroutine heap allocation elision optimization, the resulting main will be:
David Majnemera6539272016-07-23 04:05:08 +0000257
258.. code-block:: llvm
259
260 define i32 @main() {
261 entry:
262 call void @print(i32 4)
263 call void @print(i32 5)
264 call void @print(i32 6)
265 ret i32 0
266 }
267
268Multiple Suspend Points
269-----------------------
270
271Let's consider the coroutine that has more than one suspend point:
272
Sanjoy Das77a9c792016-07-26 21:03:41 +0000273.. code-block:: c++
David Majnemera6539272016-07-23 04:05:08 +0000274
275 void *f(int n) {
276 for(;;) {
277 print(n++);
278 <suspend>
279 print(-n);
280 <suspend>
281 }
282 }
283
284Matching LLVM code would look like (with the rest of the code remaining the same
285as the code in the previous section):
286
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000287.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000288
289 loop:
290 %n.addr = phi i32 [ %n, %entry ], [ %inc, %loop.resume ]
291 call void @print(i32 %n.addr) #4
292 %2 = call i8 @llvm.coro.suspend(token none, i1 false)
293 switch i8 %2, label %suspend [i8 0, label %loop.resume
294 i8 1, label %cleanup]
295 loop.resume:
296 %inc = add nsw i32 %n.addr, 1
297 %sub = xor i32 %n.addr, -1
298 call void @print(i32 %sub)
299 %3 = call i8 @llvm.coro.suspend(token none, i1 false)
300 switch i8 %3, label %suspend [i8 0, label %loop
301 i8 1, label %cleanup]
302
303In this case, the coroutine frame would include a suspend index that will
304indicate at which suspend point the coroutine needs to resume. The resume
305function will use an index to jump to an appropriate basic block and will look
306as follows:
307
308.. code-block:: llvm
309
310 define internal fastcc void @f.Resume(%f.Frame* %FramePtr) {
311 entry.Resume:
312 %index.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 2
313 %index = load i8, i8* %index.addr, align 1
314 %switch = icmp eq i8 %index, 0
315 %n.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 3
316 %n = load i32, i32* %n.addr, align 4
317 br i1 %switch, label %loop.resume, label %loop
318
319 loop.resume:
320 %sub = xor i32 %n, -1
321 call void @print(i32 %sub)
322 br label %suspend
323 loop:
324 %inc = add nsw i32 %n, 1
325 store i32 %inc, i32* %n.addr, align 4
326 tail call void @print(i32 %inc)
327 br label %suspend
328
329 suspend:
330 %storemerge = phi i8 [ 0, %loop ], [ 1, %loop.resume ]
331 store i8 %storemerge, i8* %index.addr, align 1
332 ret void
333 }
334
335If different cleanup code needs to get executed for different suspend points,
336a similar switch will be in the `f.destroy` function.
337
338.. note ::
339
340 Using suspend index in a coroutine state and having a switch in `f.resume` and
341 `f.destroy` is one of the possible implementation strategies. We explored
342 another option where a distinct `f.resume1`, `f.resume2`, etc. are created for
343 every suspend point, and instead of storing an index, the resume and destroy
344 function pointers are updated at every suspend. Early testing showed that the
345 current approach is easier on the optimizer than the latter so it is a
346 lowering strategy implemented at the moment.
347
348Distinct Save and Suspend
349-------------------------
350
351In the previous example, setting a resume index (or some other state change that
352needs to happen to prepare a coroutine for resumption) happens at the same time as
353a suspension of a coroutine. However, in certain cases, it is necessary to control
354when coroutine is prepared for resumption and when it is suspended.
355
356In the following example, a coroutine represents some activity that is driven
357by completions of asynchronous operations `async_op1` and `async_op2` which get
358a coroutine handle as a parameter and resume the coroutine once async
359operation is finished.
360
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000361.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000362
363 void g() {
364 for (;;)
365 if (cond()) {
366 async_op1(<coroutine-handle>); // will resume once async_op1 completes
367 <suspend>
368 do_one();
369 }
370 else {
371 async_op2(<coroutine-handle>); // will resume once async_op2 completes
372 <suspend>
373 do_two();
374 }
375 }
376 }
377
378In this case, coroutine should be ready for resumption prior to a call to
379`async_op1` and `async_op2`. The `coro.save`_ intrinsic is used to indicate a
380point when coroutine should be ready for resumption (namely, when a resume index
381should be stored in the coroutine frame, so that it can be resumed at the
382correct resume point):
383
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000384.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000385
386 if.true:
387 %save1 = call token @llvm.coro.save(i8* %hdl)
388 call void async_op1(i8* %hdl)
389 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
390 switch i8 %suspend1, label %suspend [i8 0, label %resume1
391 i8 1, label %cleanup]
392 if.false:
393 %save2 = call token @llvm.coro.save(i8* %hdl)
394 call void async_op2(i8* %hdl)
395 %suspend2 = call i1 @llvm.coro.suspend(token %save2, i1 false)
396 switch i8 %suspend1, label %suspend [i8 0, label %resume2
397 i8 1, label %cleanup]
398
399.. _coroutine promise:
400
401Coroutine Promise
402-----------------
403
404A coroutine author or a frontend may designate a distinguished `alloca` that can
405be used to communicate with the coroutine. This distinguished alloca is called
406**coroutine promise** and is provided as a third parameter to the `coro.begin`_
407intrinsic.
408
409The following coroutine designates a 32 bit integer `promise` and uses it to
410store the current value produced by a coroutine.
411
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000412.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000413
414 define i8* @f(i32 %n) {
415 entry:
416 %promise = alloca i32
417 %pv = bitcast i32* %promise to i8*
David Majnemer78557192016-07-27 05:12:35 +0000418 %elide = call i8* @llvm.coro.alloc()
419 %need.dyn.alloc = icmp ne i8* %elide, null
420 br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
421 dyn.alloc:
David Majnemera6539272016-07-23 04:05:08 +0000422 %size = call i32 @llvm.coro.size.i32()
423 %alloc = call i8* @malloc(i32 %size)
David Majnemer78557192016-07-27 05:12:35 +0000424 br label %coro.begin
425 coro.begin:
426 %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000427 %beg = call token @llvm.coro.begin(i8* %phi, i8* %elide, i32 0, i8* %pv, i8* null)
428 %hdl = call i8* @llvm.coro.frame(token %beg)
David Majnemera6539272016-07-23 04:05:08 +0000429 br label %loop
430 loop:
David Majnemer78557192016-07-27 05:12:35 +0000431 %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
David Majnemera6539272016-07-23 04:05:08 +0000432 %inc = add nsw i32 %n.val, 1
433 store i32 %n.val, i32* %promise
434 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
435 switch i8 %0, label %suspend [i8 0, label %loop
436 i8 1, label %cleanup]
437 cleanup:
438 %mem = call i8* @llvm.coro.free(i8* %hdl)
439 call void @free(i8* %mem)
440 br label %suspend
441 suspend:
442 call void @llvm.coro.end(i8* %hdl, i1 false)
443 ret i8* %hdl
444 }
445
446A coroutine consumer can rely on the `coro.promise`_ intrinsic to access the
447coroutine promise.
448
449.. code-block:: llvm
450
451 define i32 @main() {
452 entry:
453 %hdl = call i8* @f(i32 4)
454 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
455 %promise.addr = bitcast i8* %promise.addr.raw to i32*
456 %val0 = load i32, i32* %promise.addr
457 call void @print(i32 %val0)
458 call void @llvm.coro.resume(i8* %hdl)
459 %val1 = load i32, i32* %promise.addr
460 call void @print(i32 %val1)
461 call void @llvm.coro.resume(i8* %hdl)
462 %val2 = load i32, i32* %promise.addr
463 call void @print(i32 %val2)
464 call void @llvm.coro.destroy(i8* %hdl)
465 ret i32 0
466 }
467
David Majnemer78557192016-07-27 05:12:35 +0000468After example in this section is compiled, result of the compilation will be:
David Majnemera6539272016-07-23 04:05:08 +0000469
470.. code-block:: llvm
471
472 define i32 @main() {
473 entry:
474 tail call void @print(i32 4)
475 tail call void @print(i32 5)
476 tail call void @print(i32 6)
477 ret i32 0
478 }
479
480.. _final:
481.. _final suspend:
482
483Final Suspend
484-------------
485
486A coroutine author or a frontend may designate a particular suspend to be final,
487by setting the second argument of the `coro.suspend`_ intrinsic to `true`.
488Such a suspend point has two properties:
489
490* it is possible to check whether a suspended coroutine is at the final suspend
491 point via `coro.done`_ intrinsic;
492
493* a resumption of a coroutine stopped at the final suspend point leads to
494 undefined behavior. The only possible action for a coroutine at a final
495 suspend point is destroying it via `coro.destroy`_ intrinsic.
496
497From the user perspective, the final suspend point represents an idea of a
498coroutine reaching the end. From the compiler perspective, it is an optimization
499opportunity for reducing number of resume points (and therefore switch cases) in
500the resume function.
501
502The following is an example of a function that keeps resuming the coroutine
503until the final suspend point is reached after which point the coroutine is
504destroyed:
505
506.. code-block:: llvm
507
508 define i32 @main() {
509 entry:
510 %hdl = call i8* @f(i32 4)
511 br label %while
512 while:
513 call void @llvm.coro.resume(i8* %hdl)
514 %done = call i1 @llvm.coro.done(i8* %hdl)
515 br i1 %done, label %end, label %while
516 end:
517 call void @llvm.coro.destroy(i8* %hdl)
518 ret i32 0
519 }
520
521Usually, final suspend point is a frontend injected suspend point that does not
522correspond to any explicitly authored suspend point of the high level language.
523For example, for a Python generator that has only one suspend point:
524
525.. code-block:: python
526
527 def coroutine(n):
528 for i in range(n):
529 yield i
530
531Python frontend would inject two more suspend points, so that the actual code
532looks like this:
533
Sanjoy Das77a9c792016-07-26 21:03:41 +0000534.. code-block:: c
David Majnemera6539272016-07-23 04:05:08 +0000535
536 void* coroutine(int n) {
537 int current_value;
538 <designate current_value to be coroutine promise>
539 <SUSPEND> // injected suspend point, so that the coroutine starts suspended
540 for (int i = 0; i < n; ++i) {
541 current_value = i; <SUSPEND>; // corresponds to "yield i"
542 }
543 <SUSPEND final=true> // injected final suspend point
544 }
545
546and python iterator `__next__` would look like:
547
Sanjoy Das77a9c792016-07-26 21:03:41 +0000548.. code-block:: c++
David Majnemera6539272016-07-23 04:05:08 +0000549
550 int __next__(void* hdl) {
551 coro.resume(hdl);
552 if (coro.done(hdl)) throw StopIteration();
553 return *(int*)coro.promise(hdl, 4, false);
554 }
555
556Intrinsics
557==========
558
559Coroutine Manipulation Intrinsics
560---------------------------------
561
562Intrinsics described in this section are used to manipulate an existing
563coroutine. They can be used in any function which happen to have a pointer
564to a `coroutine frame`_ or a pointer to a `coroutine promise`_.
565
566.. _coro.destroy:
567
568'llvm.coro.destroy' Intrinsic
569^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
570
571Syntax:
572"""""""
573
574::
575
576 declare void @llvm.coro.destroy(i8* <handle>)
577
578Overview:
579"""""""""
580
581The '``llvm.coro.destroy``' intrinsic destroys a suspended
582coroutine.
583
584Arguments:
585""""""""""
586
587The argument is a coroutine handle to a suspended coroutine.
588
589Semantics:
590""""""""""
591
592When possible, the `coro.destroy` intrinsic is replaced with a direct call to
593the coroutine destroy function. Otherwise it is replaced with an indirect call
594based on the function pointer for the destroy function stored in the coroutine
595frame. Destroying a coroutine that is not suspended leads to undefined behavior.
596
597.. _coro.resume:
598
599'llvm.coro.resume' Intrinsic
600^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
601
602::
603
604 declare void @llvm.coro.resume(i8* <handle>)
605
606Overview:
607"""""""""
608
609The '``llvm.coro.resume``' intrinsic resumes a suspended coroutine.
610
611Arguments:
612""""""""""
613
614The argument is a handle to a suspended coroutine.
615
616Semantics:
617""""""""""
618
619When possible, the `coro.resume` intrinsic is replaced with a direct call to the
620coroutine resume function. Otherwise it is replaced with an indirect call based
621on the function pointer for the resume function stored in the coroutine frame.
622Resuming a coroutine that is not suspended leads to undefined behavior.
623
624.. _coro.done:
625
626'llvm.coro.done' Intrinsic
627^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
628
629::
630
631 declare i1 @llvm.coro.done(i8* <handle>)
632
633Overview:
634"""""""""
635
636The '``llvm.coro.done``' intrinsic checks whether a suspended coroutine is at
637the final suspend point or not.
638
639Arguments:
640""""""""""
641
642The argument is a handle to a suspended coroutine.
643
644Semantics:
645""""""""""
646
647Using this intrinsic on a coroutine that does not have a `final suspend`_ point
648or on a coroutine that is not suspended leads to undefined behavior.
649
650.. _coro.promise:
651
652'llvm.coro.promise' Intrinsic
653^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
654
655::
656
657 declare i8* @llvm.coro.promise(i8* <ptr>, i32 <alignment>, i1 <from>)
658
659Overview:
660"""""""""
661
662The '``llvm.coro.promise``' intrinsic obtains a pointer to a
663`coroutine promise`_ given a coroutine handle and vice versa.
664
665Arguments:
666""""""""""
667
668The first argument is a handle to a coroutine if `from` is false. Otherwise,
669it is a pointer to a coroutine promise.
670
671The second argument is an alignment requirements of the promise.
672If a frontend designated `%promise = alloca i32` as a promise, the alignment
673argument to `coro.promise` should be the alignment of `i32` on the target
674platform. If a frontend designated `%promise = alloca i32, align 16` as a
675promise, the alignment argument should be 16.
676This argument only accepts constants.
677
678The third argument is a boolean indicating a direction of the transformation.
679If `from` is true, the intrinsic returns a coroutine handle given a pointer
680to a promise. If `from` is false, the intrinsics return a pointer to a promise
681from a coroutine handle. This argument only accepts constants.
682
683Semantics:
684""""""""""
685
686Using this intrinsic on a coroutine that does not have a coroutine promise
687leads to undefined behavior. It is possible to read and modify coroutine
688promise of the coroutine which is currently executing. The coroutine author and
689a coroutine user are responsible to makes sure there is no data races.
690
691Example:
692""""""""
693
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000694.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000695
696 define i8* @f(i32 %n) {
697 entry:
698 %promise = alloca i32
699 %pv = bitcast i32* %promise to i8*
700 ...
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000701 ; the fourth argument to coro.begin points to the coroutine promise.
702 %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* %pv, i8* null)
703 %hdl = call noalias i8* @llvm.coro.frame(token %beg)
David Majnemera6539272016-07-23 04:05:08 +0000704 ...
705 store i32 42, i32* %promise ; store something into the promise
706 ...
707 ret i8* %hdl
708 }
709
710 define i32 @main() {
711 entry:
712 %hdl = call i8* @f(i32 4) ; starts the coroutine and returns its handle
713 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
714 %promise.addr = bitcast i8* %promise.addr.raw to i32*
715 %val = load i32, i32* %promise.addr ; load a value from the promise
716 call void @print(i32 %val)
717 call void @llvm.coro.destroy(i8* %hdl)
718 ret i32 0
719 }
720
721.. _coroutine intrinsics:
722
723Coroutine Structure Intrinsics
724------------------------------
725Intrinsics described in this section are used within a coroutine to describe
726the coroutine structure. They should not be used outside of a coroutine.
727
728.. _coro.size:
729
730'llvm.coro.size' Intrinsic
731^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
732::
733
734 declare i32 @llvm.coro.size.i32()
735 declare i64 @llvm.coro.size.i64()
736
737Overview:
738"""""""""
739
740The '``llvm.coro.size``' intrinsic returns the number of bytes
741required to store a `coroutine frame`_.
742
743Arguments:
744""""""""""
745
746None
747
748Semantics:
749""""""""""
750
751The `coro.size` intrinsic is lowered to a constant representing the size of
752the coroutine frame.
753
754.. _coro.begin:
755
756'llvm.coro.begin' Intrinsic
757^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
758::
759
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000760 declare i8* @llvm.coro.begin(i8* <mem>, i8* <elide>, i32 <align>, i8* <promise>, i8* <fnaddr>)
David Majnemera6539272016-07-23 04:05:08 +0000761
762Overview:
763"""""""""
764
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000765The '``llvm.coro.begin``' intrinsic captures coroutine initialization
766information and returns a token that can be used by `coro.frame` intrinsic to
767return an address of the coroutine frame.
David Majnemera6539272016-07-23 04:05:08 +0000768
769Arguments:
770""""""""""
771
David Majnemer78557192016-07-27 05:12:35 +0000772The first argument is a pointer to a block of memory where coroutine frame
773will be stored.
David Majnemera6539272016-07-23 04:05:08 +0000774
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000775The second argument is either null or an SSA value of `coro.alloc` intrinsic.
776
777The third argument provides information on the alignment of the memory returned
David Majnemera6539272016-07-23 04:05:08 +0000778by the allocation function and given to `coro.begin` by the first argument. If
779this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
780This argument only accepts constants.
781
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000782The fourth argument, if not `null`, designates a particular alloca instruction to
David Majnemera6539272016-07-23 04:05:08 +0000783be a `coroutine promise`_.
784
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000785The fifth argument is `null` before coroutine is split, and later is replaced
David Majnemera6539272016-07-23 04:05:08 +0000786to point to a private global constant array containing function pointers to
787outlined resume and destroy parts of the coroutine.
788
789Semantics:
790""""""""""
791
792Depending on the alignment requirements of the objects in the coroutine frame
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000793and/or on the codegen compactness reasons the pointer returned from `coro.frame`
794associated with a particular `coro.begin` may be at offset to the `%mem`
795argument. (This could be beneficial if instructions that express relative access
796to data can be more compactly encoded with small positive and negative offsets).
David Majnemera6539272016-07-23 04:05:08 +0000797
David Majnemer78557192016-07-27 05:12:35 +0000798A frontend should emit exactly one `coro.begin` intrinsic per coroutine.
David Majnemera6539272016-07-23 04:05:08 +0000799
800.. _coro.free:
801
802'llvm.coro.free' Intrinsic
803^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
804::
805
806 declare i8* @llvm.coro.free(i8* <frame>)
807
808Overview:
809"""""""""
810
811The '``llvm.coro.free``' intrinsic returns a pointer to a block of memory where
812coroutine frame is stored or `null` if this instance of a coroutine did not use
813dynamically allocated memory for its coroutine frame.
814
815Arguments:
816""""""""""
817
818A pointer to the coroutine frame. This should be the same pointer that was
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000819returned by prior `coro.frame` call.
David Majnemera6539272016-07-23 04:05:08 +0000820
821Example (custom deallocation function):
822"""""""""""""""""""""""""""""""""""""""
823
824.. code-block:: llvm
825
826 cleanup:
827 %mem = call i8* @llvm.coro.free(i8* %frame)
828 %mem_not_null = icmp ne i8* %mem, null
829 br i1 %mem_not_null, label %if.then, label %if.end
830 if.then:
831 call void @CustomFree(i8* %mem)
832 br label %if.end
833 if.end:
834 ret void
835
836Example (standard deallocation functions):
837""""""""""""""""""""""""""""""""""""""""""
838
839.. code-block:: llvm
840
841 cleanup:
842 %mem = call i8* @llvm.coro.free(i8* %frame)
843 call void @free(i8* %mem)
844 ret void
845
846.. _coro.alloc:
847
848'llvm.coro.alloc' Intrinsic
849^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
850::
851
852 declare i8* @llvm.coro.alloc()
853
854Overview:
855"""""""""
856
857The '``llvm.coro.alloc``' intrinsic returns an address of the memory on the
858callers frame where coroutine frame of this coroutine can be placed or `null`
859otherwise.
860
861Arguments:
862""""""""""
863
864None
865
866Semantics:
867""""""""""
868
869If the coroutine is eligible for heap elision, this intrinsic is lowered to an
870alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`.
David Majnemer78557192016-07-27 05:12:35 +0000871
872A frontend should emit at most one `coro.alloc` intrinsic per coroutine.
David Majnemera6539272016-07-23 04:05:08 +0000873
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000874If `coro.alloc` is present, the second parameter to `coro.begin` should refer
875to it.
876
David Majnemera6539272016-07-23 04:05:08 +0000877Example:
878""""""""
879
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000880.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000881
882 entry:
883 %elide = call i8* @llvm.coro.alloc()
884 %0 = icmp ne i8* %elide, null
885 br i1 %0, label %coro.begin, label %coro.alloc
886
887 coro.alloc:
888 %frame.size = call i32 @llvm.coro.size()
889 %alloc = call i8* @MyAlloc(i32 %frame.size)
890 br label %coro.begin
891
892 coro.begin:
893 %phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000894 %beg = call token @llvm.coro.begin(i8* %phi, i8* %elide, i32 0, i8* null, i8* null)
895 %frame = call i8* @llvm.coro.frame(token %beg)
David Majnemera6539272016-07-23 04:05:08 +0000896
897.. _coro.frame:
898
899'llvm.coro.frame' Intrinsic
900^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
901::
902
903 declare i8* @llvm.coro.frame()
904
905Overview:
906"""""""""
907
908The '``llvm.coro.frame``' intrinsic returns an address of the coroutine frame of
909the enclosing coroutine.
910
911Arguments:
912""""""""""
913
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000914A token that refers to `coro.begin` instruction.
David Majnemera6539272016-07-23 04:05:08 +0000915
916Semantics:
917""""""""""
918
Gor Nishanovb2a9c022016-08-10 16:40:39 +0000919This intrinsic is lowered to refer to address of the coroutine frame.
David Majnemera6539272016-07-23 04:05:08 +0000920
921.. _coro.end:
922
923'llvm.coro.end' Intrinsic
924^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
925::
926
927 declare void @llvm.coro.end(i8* <handle>, i1 <unwind>)
928
929Overview:
930"""""""""
931
932The '``llvm.coro.end``' marks the point where execution of the resume part of
933the coroutine should end and control returns back to the caller.
934
935
936Arguments:
937""""""""""
938
939The first argument should refer to the coroutine handle of the enclosing coroutine.
940
941The second argument should be `true` if this coro.end is in the block that is
942part of the unwind sequence leaving the coroutine body due to exception prior to
943the first reaching any suspend points, and `false` otherwise.
944
945Semantics:
946""""""""""
947The `coro.end`_ intrinsic is a no-op during an initial invocation of the
948coroutine. When the coroutine resumes, the intrinsic marks the point when
949coroutine need to return control back to the caller.
950
951This intrinsic is removed by the CoroSplit pass when a coroutine is split into
952the start, resume and destroy parts. In start part, the intrinsic is removed,
953in resume and destroy parts, it is replaced with `ret void` instructions and
954the rest of the block containing `coro.end` instruction is discarded.
955
956In landing pads it is replaced with an appropriate instruction to unwind to
957caller.
958
959A frontend is allowed to supply null as the first parameter, in this case
960`coro-early` pass will replace the null with an appropriate coroutine handle
961value.
962
963.. _coro.suspend:
964.. _suspend points:
965
966'llvm.coro.suspend' Intrinsic
967^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
968::
969
970 declare i8 @llvm.coro.suspend(token <save>, i1 <final>)
971
972Overview:
973"""""""""
974
975The '``llvm.coro.suspend``' marks the point where execution of the coroutine
976need to get suspended and control returned back to the caller.
977Conditional branches consuming the result of this intrinsic lead to basic blocks
978where coroutine should proceed when suspended (-1), resumed (0) or destroyed
979(1).
980
981Arguments:
982""""""""""
983
984The first argument refers to a token of `coro.save` intrinsic that marks the
985point when coroutine state is prepared for suspension. If `none` token is passed,
986the intrinsic behaves as if there were a `coro.save` immediately preceding
987the `coro.suspend` intrinsic.
988
989The second argument indicates whether this suspension point is `final`_.
990The second argument only accepts constants. If more than one suspend point is
991designated as final, the resume and destroy branches should lead to the same
992basic blocks.
993
994Example (normal suspend point):
995"""""""""""""""""""""""""""""""
996
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000997.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000998
999 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
1000 switch i8 %0, label %suspend [i8 0, label %resume
1001 i8 1, label %cleanup]
1002
1003Example (final suspend point):
1004""""""""""""""""""""""""""""""
1005
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +00001006.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +00001007
1008 while.end:
1009 %s.final = call i8 @llvm.coro.suspend(token none, i1 true)
1010 switch i8 %s.final, label %suspend [i8 0, label %trap
1011 i8 1, label %cleanup]
1012 trap:
1013 call void @llvm.trap()
1014 unreachable
1015
1016Semantics:
1017""""""""""
1018
1019If a coroutine that was suspended at the suspend point marked by this intrinsic
1020is resumed via `coro.resume`_ the control will transfer to the basic block
1021of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the
1022basic block indicated by the 1-case. To suspend, coroutine proceed to the
1023default label.
1024
1025If suspend intrinsic is marked as final, it can consider the `true` branch
1026unreachable and can perform optimizations that can take advantage of that fact.
1027
1028.. _coro.save:
1029
1030'llvm.coro.save' Intrinsic
1031^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1032::
1033
1034 declare token @llvm.coro.save(i8* <handle>)
1035
1036Overview:
1037"""""""""
1038
1039The '``llvm.coro.save``' marks the point where a coroutine need to update its
1040state to prepare for resumption to be considered suspended (and thus eligible
1041for resumption).
1042
1043Arguments:
1044""""""""""
1045
1046The first argument points to a coroutine handle of the enclosing coroutine.
1047
1048Semantics:
1049""""""""""
1050
1051Whatever coroutine state changes are required to enable resumption of
1052the coroutine from the corresponding suspend point should be done at the point
1053of `coro.save` intrinsic.
1054
1055Example:
1056""""""""
1057
1058Separate save and suspend points are necessary when a coroutine is used to
1059represent an asynchronous control flow driven by callbacks representing
1060completions of asynchronous operations.
1061
1062In such a case, a coroutine should be ready for resumption prior to a call to
1063`async_op` function that may trigger resumption of a coroutine from the same or
1064a different thread possibly prior to `async_op` call returning control back
1065to the coroutine:
1066
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +00001067.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +00001068
1069 %save1 = call token @llvm.coro.save(i8* %hdl)
1070 call void async_op1(i8* %hdl)
1071 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
1072 switch i8 %suspend1, label %suspend [i8 0, label %resume1
1073 i8 1, label %cleanup]
1074
1075.. _coro.param:
1076
1077'llvm.coro.param' Intrinsic
1078^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1079::
1080
1081 declare i1 @llvm.coro.param(i8* <original>, i8* <copy>)
1082
1083Overview:
1084"""""""""
1085
David Majnemer78557192016-07-27 05:12:35 +00001086The '``llvm.coro.param``' is used by a frontend to mark up the code used to
David Majnemera6539272016-07-23 04:05:08 +00001087construct and destruct copies of the parameters. If the optimizer discovers that
1088a particular parameter copy is not used after any suspends, it can remove the
1089construction and destruction of the copy by replacing corresponding coro.param
1090with `i1 false` and replacing any use of the `copy` with the `original`.
1091
1092Arguments:
1093""""""""""
1094
1095The first argument points to an `alloca` storing the value of a parameter to a
1096coroutine.
1097
1098The second argument points to an `alloca` storing the value of the copy of that
1099parameter.
1100
1101Semantics:
1102""""""""""
1103
1104The optimizer is free to always replace this intrinsic with `i1 true`.
1105
1106The optimizer is also allowed to replace it with `i1 false` provided that the
1107parameter copy is only used prior to control flow reaching any of the suspend
1108points. The code that would be DCE'd if the `coro.param` is replaced with
1109`i1 false` is not considered to be a use of the parameter copy.
1110
1111The frontend can emit this intrinsic if its language rules allow for this
1112optimization.
1113
1114Example:
1115""""""""
1116Consider the following example. A coroutine takes two parameters `a` and `b`
1117that has a destructor and a move constructor.
1118
Sanjoy Das77a9c792016-07-26 21:03:41 +00001119.. code-block:: c++
David Majnemera6539272016-07-23 04:05:08 +00001120
1121 struct A { ~A(); A(A&&); bool foo(); void bar(); };
1122
1123 task<int> f(A a, A b) {
1124 if (a.foo())
1125 return 42;
1126
1127 a.bar();
1128 co_await read_async(); // introduces suspend point
1129 b.bar();
1130 }
1131
1132Note that, uses of `b` is used after a suspend point and thus must be copied
1133into a coroutine frame, whereas `a` does not have to, since it never used
1134after suspend.
1135
1136A frontend can create parameter copies for `a` and `b` as follows:
1137
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +00001138.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +00001139
1140 task<int> f(A a', A b') {
1141 a = alloca A;
1142 b = alloca A;
1143 // move parameters to its copies
1144 if (coro.param(a', a)) A::A(a, A&& a');
1145 if (coro.param(b', b)) A::A(b, A&& b');
1146 ...
1147 // destroy parameters copies
1148 if (coro.param(a', a)) A::~A(a);
1149 if (coro.param(b', b)) A::~A(b);
1150 }
1151
1152The optimizer can replace coro.param(a',a) with `i1 false` and replace all uses
1153of `a` with `a'`, since it is not used after suspend.
1154
1155The optimizer must replace coro.param(b', b) with `i1 true`, since `b` is used
1156after suspend and therefore, it has to reside in the coroutine frame.
1157
1158Coroutine Transformation Passes
1159===============================
1160CoroEarly
1161---------
1162The pass CoroEarly lowers coroutine intrinsics that hide the details of the
1163structure of the coroutine frame, but, otherwise not needed to be preserved to
1164help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_,
1165and `coro.promise`_ intrinsics.
1166
1167.. _CoroSplit:
1168
1169CoroSplit
1170---------
1171The pass CoroSplit buides coroutine frame and outlines resume and destroy parts
1172into separate functions.
1173
1174CoroElide
1175---------
1176The pass CoroElide examines if the inlined coroutine is eligible for heap
1177allocation elision optimization. If so, it replaces `coro.alloc` and
Gor Nishanovb2a9c022016-08-10 16:40:39 +00001178`coro.frame` intrinsic with an address of a coroutine frame placed on its caller
David Majnemera6539272016-07-23 04:05:08 +00001179and replaces `coro.free` intrinsics with `null` to remove the deallocation code.
1180This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
1181calls to resume and destroy functions for a particular coroutine where possible.
1182
1183CoroCleanup
1184-----------
1185This pass runs late to lower all coroutine related intrinsics not replaced by
1186earlier passes.
1187
1188Upstreaming sequence (rough plan)
1189=================================
David Majnemer78557192016-07-27 05:12:35 +00001190#. Add documentation.
David Majnemer3d32b7e2016-07-28 21:04:31 +00001191#. Add coroutine intrinsics.
Gor Nishanovb2a9c022016-08-10 16:40:39 +00001192#. Add empty coroutine passes.
David Majnemera6539272016-07-23 04:05:08 +00001193#. Add coroutine devirtualization + tests.
1194#. Add CGSCC restart trigger + tests.
1195#. Add coroutine heap elision + tests.
Gor Nishanovb2a9c022016-08-10 16:40:39 +00001196#. Add custom allocation heap elision + tests. <== we are here
David Majnemera6539272016-07-23 04:05:08 +00001197#. Add coroutine splitting logic + tests.
1198#. Add simple coroutine frame builder + tests.
1199#. Add the rest of the logic + tests. (Maybe split further as needed).
1200
1201Areas Requiring Attention
1202=========================
1203#. A coroutine frame is bigger than it could be. Adding stack packing and stack
1204 coloring like optimization on the coroutine frame will result in tighter
1205 coroutine frames.
1206
1207#. Take advantage of the lifetime intrinsics for the data that goes into the
1208 coroutine frame. Leave lifetime intrinsics as is for the data that stays in
1209 allocas.
1210
1211#. The CoroElide optimization pass relies on coroutine ramp function to be
1212 inlined. It would be beneficial to split the ramp function further to
1213 increase the chance that it will get inlined into its caller.
1214
1215#. Design a convention that would make it possible to apply coroutine heap
1216 elision optimization across ABI boundaries.
1217
1218#. Cannot handle coroutines with `inalloca` parameters (used in x86 on Windows).
1219
1220#. Alignment is ignored by coro.begin and coro.free intrinsics.
1221
1222#. Make required changes to make sure that coroutine optimizations work with
1223 LTO.
1224
1225#. More tests, more tests, more tests