blob: a54c18a35379a18c458e3a6085693c5118b897fb [file] [log] [blame]
David Majnemera6539272016-07-23 04:05:08 +00001=====================================
2Coroutines in LLVM
3=====================================
4
5.. contents::
6 :local:
7 :depth: 3
8
9.. warning::
10 This is a work in progress. Compatibility across LLVM releases is not
11 guaranteed.
12
13Introduction
14============
15
16.. _coroutine handle:
17
18LLVM coroutines are functions that have one or more `suspend points`_.
19When a suspend point is reached, the execution of a coroutine is suspended and
20control is returned back to its caller. A suspended coroutine can be resumed
21to continue execution from the last suspend point or it can be destroyed.
22
23In the following example, we call function `f` (which may or may not be a
24coroutine itself) that returns a handle to a suspended coroutine
25(**coroutine handle**) that is used by `main` to resume the coroutine twice and
26then destroy it:
27
28.. code-block:: llvm
29
30 define i32 @main() {
31 entry:
32 %hdl = call i8* @f(i32 4)
33 call void @llvm.coro.resume(i8* %hdl)
34 call void @llvm.coro.resume(i8* %hdl)
35 call void @llvm.coro.destroy(i8* %hdl)
36 ret i32 0
37 }
38
39.. _coroutine frame:
40
41In addition to the function stack frame which exists when a coroutine is
42executing, there is an additional region of storage that contains objects that
43keep the coroutine state when a coroutine is suspended. This region of storage
44is called **coroutine frame**. It is created when a coroutine is called and
45destroyed when a coroutine runs to completion or destroyed by a call to
46the `coro.destroy`_ intrinsic.
47
48An LLVM coroutine is represented as an LLVM function that has calls to
49`coroutine intrinsics`_ defining the structure of the coroutine.
50After lowering, a coroutine is split into several
51functions that represent three different ways of how control can enter the
52coroutine:
53
541. a ramp function, which represents an initial invocation of the coroutine that
55 creates the coroutine frame and executes the coroutine code until it
56 encounters a suspend point or reaches the end of the function;
57
582. a coroutine resume function that is invoked when the coroutine is resumed;
59
603. a coroutine destroy function that is invoked when the coroutine is destroyed.
61
62.. note:: Splitting out resume and destroy functions are just one of the
63 possible ways of lowering the coroutine. We chose it for initial
64 implementation as it matches closely the mental model and results in
65 reasonably nice code.
66
67Coroutines by Example
68=====================
69
70Coroutine Representation
71------------------------
72
73Let's look at an example of an LLVM coroutine with the behavior sketched
74by the following pseudo-code.
75
Sanjoy Das77a9c792016-07-26 21:03:41 +000076.. code-block:: c++
David Majnemera6539272016-07-23 04:05:08 +000077
78 void *f(int n) {
79 for(;;) {
80 print(n++);
81 <suspend> // returns a coroutine handle on first suspend
82 }
83 }
84
85This coroutine calls some function `print` with value `n` as an argument and
86suspends execution. Every time this coroutine resumes, it calls `print` again with an argument one bigger than the last time. This coroutine never completes by itself and must be destroyed explicitly. If we use this coroutine with
87a `main` shown in the previous section. It will call `print` with values 4, 5
88and 6 after which the coroutine will be destroyed.
89
90The LLVM IR for this coroutine looks like this:
91
Aaron Ballman378ac7e2016-07-23 18:53:35 +000092.. code-block:: none
David Majnemera6539272016-07-23 04:05:08 +000093
94 define i8* @f(i32 %n) {
95 entry:
96 %size = call i32 @llvm.coro.size.i32()
97 %alloc = call i8* @malloc(i32 %size)
98 %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
99 br label %loop
100 loop:
101 %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
102 %inc = add nsw i32 %n.val, 1
103 call void @print(i32 %n.val)
104 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
105 switch i8 %0, label %suspend [i8 0, label %loop
106 i8 1, label %cleanup]
107 cleanup:
108 %mem = call i8* @llvm.coro.free(i8* %hdl)
109 call void @free(i8* %mem)
110 br label %suspend
111 suspend:
112 call void @llvm.coro.end(i8* %hdl, i1 false)
113 ret i8* %hdl
114 }
115
116The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
117lowered to a constant representing the size required for the coroutine frame.
118The `coro.begin`_ intrinsic initializes the coroutine frame and returns the
119coroutine handle. The first parameter of `coro.begin` is given a block of memory
120to be used if the coroutine frame needs to be allocated dynamically.
121
122The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
123given the coroutine handle, returns a pointer of the memory block to be freed or
124`null` if the coroutine frame was not allocated dynamically. The `cleanup`
125block is entered when coroutine runs to completion by itself or destroyed via
126call to the `coro.destroy`_ intrinsic.
127
128The `suspend` block contains code to be executed when coroutine runs to
129completion or suspended. The `coro.end`_ intrinsic marks the point where
130a coroutine needs to return control back to the caller if it is not an initial
131invocation of the coroutine.
132
133The `loop` blocks represents the body of the coroutine. The `coro.suspend`_
134intrinsic in combination with the following switch indicates what happens to
135control flow when a coroutine is suspended (default case), resumed (case 0) or
136destroyed (case 1).
137
138Coroutine Transformation
139------------------------
140
141One of the steps of coroutine lowering is building the coroutine frame. The
142def-use chains are analyzed to determine which objects need be kept alive across
143suspend points. In the coroutine shown in the previous section, use of virtual register
144`%n.val` is separated from the definition by a suspend point, therefore, it
145cannot reside on the stack frame since the latter goes away once the coroutine
146is suspended and control is returned back to the caller. An i32 slot is
147allocated in the coroutine frame and `%n.val` is spilled and reloaded from that
148slot as needed.
149
150We also store addresses of the resume and destroy functions so that the
151`coro.resume` and `coro.destroy` intrinsics can resume and destroy the coroutine
152when its identity cannot be determined statically at compile time. For our
153example, the coroutine frame will be:
154
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000155.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000156
157 %f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }
158
159After resume and destroy parts are outlined, function `f` will contain only the
160code responsible for creation and initialization of the coroutine frame and
161execution of the coroutine until a suspend point is reached:
162
163.. code-block:: llvm
164
165 define i8* @f(i32 %n) {
166 entry:
167 %alloc = call noalias i8* @malloc(i32 24)
168 %0 = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
Mehdi Aminibe1cb222016-07-27 06:03:47 +0000169 %frame = bitcast i8* %0 to %f.frame*
David Majnemera6539272016-07-23 04:05:08 +0000170 %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
171 store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
172 %2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1
173 store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2
174
175 %inc = add nsw i32 %n, 1
176 %inc.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
177 store i32 %inc, i32* %inc.spill.addr
178 call void @print(i32 %n)
179
180 ret i8* %frame
181 }
182
183Outlined resume part of the coroutine will reside in function `f.resume`:
184
185.. code-block:: llvm
186
187 define internal fastcc void @f.resume(%f.frame* %frame.ptr.resume) {
188 entry:
189 %inc.spill.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2
190 %inc.spill = load i32, i32* %inc.spill.addr, align 4
191 %inc = add i32 %n.val, 1
192 store i32 %inc, i32* %inc.spill.addr, align 4
193 tail call void @print(i32 %inc)
194 ret void
195 }
196
197Whereas function `f.destroy` will contain the cleanup code for the coroutine:
198
199.. code-block:: llvm
200
201 define internal fastcc void @f.destroy(%f.frame* %frame.ptr.destroy) {
202 entry:
203 %0 = bitcast %f.frame* %frame.ptr.destroy to i8*
204 tail call void @free(i8* %0)
205 ret void
206 }
207
208Avoiding Heap Allocations
209-------------------------
210
211A particular coroutine usage pattern, which is illustrated by the `main`
212function in the overview section, where a coroutine is created, manipulated and
213destroyed by the same calling function, is common for coroutines implementing
214RAII idiom and is suitable for allocation elision optimization which avoid
215dynamic allocation by storing the coroutine frame as a static `alloca` in its
216caller.
217
David Majnemera6539272016-07-23 04:05:08 +0000218In the entry block, we will call `coro.alloc`_ intrinsic that will return `null`
David Majnemer78557192016-07-27 05:12:35 +0000219when dynamic allocation is required, and an address of an alloca on the caller's
220frame where coroutine frame can be stored if dynamic allocation is elided.
David Majnemera6539272016-07-23 04:05:08 +0000221
222.. code-block:: llvm
223
224 entry:
225 %elide = call i8* @llvm.coro.alloc()
226 %need.dyn.alloc = icmp ne i8* %elide, null
227 br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
228 dyn.alloc:
229 %size = call i32 @llvm.coro.size.i32()
230 %alloc = call i8* @CustomAlloc(i32 %size)
231 br label %coro.begin
232 coro.begin:
233 %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
234 %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)
235
236In the cleanup block, we will make freeing the coroutine frame conditional on
237`coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null`
238thus skipping the deallocation code:
239
240.. code-block:: llvm
241
242 cleanup:
243 %mem = call i8* @llvm.coro.free(i8* %hdl)
244 %need.dyn.free = icmp ne i8* %mem, null
245 br i1 %need.dyn.free, label %dyn.free, label %if.end
246 dyn.free:
247 call void @CustomFree(i8* %mem)
248 br label %if.end
249 if.end:
250 ...
251
252With allocations and deallocations represented as described as above, after
David Majnemer78557192016-07-27 05:12:35 +0000253coroutine heap allocation elision optimization, the resulting main will be:
David Majnemera6539272016-07-23 04:05:08 +0000254
255.. code-block:: llvm
256
257 define i32 @main() {
258 entry:
259 call void @print(i32 4)
260 call void @print(i32 5)
261 call void @print(i32 6)
262 ret i32 0
263 }
264
265Multiple Suspend Points
266-----------------------
267
268Let's consider the coroutine that has more than one suspend point:
269
Sanjoy Das77a9c792016-07-26 21:03:41 +0000270.. code-block:: c++
David Majnemera6539272016-07-23 04:05:08 +0000271
272 void *f(int n) {
273 for(;;) {
274 print(n++);
275 <suspend>
276 print(-n);
277 <suspend>
278 }
279 }
280
281Matching LLVM code would look like (with the rest of the code remaining the same
282as the code in the previous section):
283
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000284.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000285
286 loop:
287 %n.addr = phi i32 [ %n, %entry ], [ %inc, %loop.resume ]
288 call void @print(i32 %n.addr) #4
289 %2 = call i8 @llvm.coro.suspend(token none, i1 false)
290 switch i8 %2, label %suspend [i8 0, label %loop.resume
291 i8 1, label %cleanup]
292 loop.resume:
293 %inc = add nsw i32 %n.addr, 1
294 %sub = xor i32 %n.addr, -1
295 call void @print(i32 %sub)
296 %3 = call i8 @llvm.coro.suspend(token none, i1 false)
297 switch i8 %3, label %suspend [i8 0, label %loop
298 i8 1, label %cleanup]
299
300In this case, the coroutine frame would include a suspend index that will
301indicate at which suspend point the coroutine needs to resume. The resume
302function will use an index to jump to an appropriate basic block and will look
303as follows:
304
305.. code-block:: llvm
306
307 define internal fastcc void @f.Resume(%f.Frame* %FramePtr) {
308 entry.Resume:
309 %index.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 2
310 %index = load i8, i8* %index.addr, align 1
311 %switch = icmp eq i8 %index, 0
312 %n.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 3
313 %n = load i32, i32* %n.addr, align 4
314 br i1 %switch, label %loop.resume, label %loop
315
316 loop.resume:
317 %sub = xor i32 %n, -1
318 call void @print(i32 %sub)
319 br label %suspend
320 loop:
321 %inc = add nsw i32 %n, 1
322 store i32 %inc, i32* %n.addr, align 4
323 tail call void @print(i32 %inc)
324 br label %suspend
325
326 suspend:
327 %storemerge = phi i8 [ 0, %loop ], [ 1, %loop.resume ]
328 store i8 %storemerge, i8* %index.addr, align 1
329 ret void
330 }
331
332If different cleanup code needs to get executed for different suspend points,
333a similar switch will be in the `f.destroy` function.
334
335.. note ::
336
337 Using suspend index in a coroutine state and having a switch in `f.resume` and
338 `f.destroy` is one of the possible implementation strategies. We explored
339 another option where a distinct `f.resume1`, `f.resume2`, etc. are created for
340 every suspend point, and instead of storing an index, the resume and destroy
341 function pointers are updated at every suspend. Early testing showed that the
342 current approach is easier on the optimizer than the latter so it is a
343 lowering strategy implemented at the moment.
344
345Distinct Save and Suspend
346-------------------------
347
348In the previous example, setting a resume index (or some other state change that
349needs to happen to prepare a coroutine for resumption) happens at the same time as
350a suspension of a coroutine. However, in certain cases, it is necessary to control
351when coroutine is prepared for resumption and when it is suspended.
352
353In the following example, a coroutine represents some activity that is driven
354by completions of asynchronous operations `async_op1` and `async_op2` which get
355a coroutine handle as a parameter and resume the coroutine once async
356operation is finished.
357
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000358.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000359
360 void g() {
361 for (;;)
362 if (cond()) {
363 async_op1(<coroutine-handle>); // will resume once async_op1 completes
364 <suspend>
365 do_one();
366 }
367 else {
368 async_op2(<coroutine-handle>); // will resume once async_op2 completes
369 <suspend>
370 do_two();
371 }
372 }
373 }
374
375In this case, coroutine should be ready for resumption prior to a call to
376`async_op1` and `async_op2`. The `coro.save`_ intrinsic is used to indicate a
377point when coroutine should be ready for resumption (namely, when a resume index
378should be stored in the coroutine frame, so that it can be resumed at the
379correct resume point):
380
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000381.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000382
383 if.true:
384 %save1 = call token @llvm.coro.save(i8* %hdl)
385 call void async_op1(i8* %hdl)
386 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
387 switch i8 %suspend1, label %suspend [i8 0, label %resume1
388 i8 1, label %cleanup]
389 if.false:
390 %save2 = call token @llvm.coro.save(i8* %hdl)
391 call void async_op2(i8* %hdl)
392 %suspend2 = call i1 @llvm.coro.suspend(token %save2, i1 false)
393 switch i8 %suspend1, label %suspend [i8 0, label %resume2
394 i8 1, label %cleanup]
395
396.. _coroutine promise:
397
398Coroutine Promise
399-----------------
400
401A coroutine author or a frontend may designate a distinguished `alloca` that can
402be used to communicate with the coroutine. This distinguished alloca is called
403**coroutine promise** and is provided as a third parameter to the `coro.begin`_
404intrinsic.
405
406The following coroutine designates a 32 bit integer `promise` and uses it to
407store the current value produced by a coroutine.
408
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000409.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000410
411 define i8* @f(i32 %n) {
412 entry:
413 %promise = alloca i32
414 %pv = bitcast i32* %promise to i8*
David Majnemer78557192016-07-27 05:12:35 +0000415 %elide = call i8* @llvm.coro.alloc()
416 %need.dyn.alloc = icmp ne i8* %elide, null
417 br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
418 dyn.alloc:
David Majnemera6539272016-07-23 04:05:08 +0000419 %size = call i32 @llvm.coro.size.i32()
420 %alloc = call i8* @malloc(i32 %size)
David Majnemer78557192016-07-27 05:12:35 +0000421 br label %coro.begin
422 coro.begin:
423 %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
424 %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* %pv, i8* null)
David Majnemera6539272016-07-23 04:05:08 +0000425 br label %loop
426 loop:
David Majnemer78557192016-07-27 05:12:35 +0000427 %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
David Majnemera6539272016-07-23 04:05:08 +0000428 %inc = add nsw i32 %n.val, 1
429 store i32 %n.val, i32* %promise
430 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
431 switch i8 %0, label %suspend [i8 0, label %loop
432 i8 1, label %cleanup]
433 cleanup:
434 %mem = call i8* @llvm.coro.free(i8* %hdl)
435 call void @free(i8* %mem)
436 br label %suspend
437 suspend:
438 call void @llvm.coro.end(i8* %hdl, i1 false)
439 ret i8* %hdl
440 }
441
442A coroutine consumer can rely on the `coro.promise`_ intrinsic to access the
443coroutine promise.
444
445.. code-block:: llvm
446
447 define i32 @main() {
448 entry:
449 %hdl = call i8* @f(i32 4)
450 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
451 %promise.addr = bitcast i8* %promise.addr.raw to i32*
452 %val0 = load i32, i32* %promise.addr
453 call void @print(i32 %val0)
454 call void @llvm.coro.resume(i8* %hdl)
455 %val1 = load i32, i32* %promise.addr
456 call void @print(i32 %val1)
457 call void @llvm.coro.resume(i8* %hdl)
458 %val2 = load i32, i32* %promise.addr
459 call void @print(i32 %val2)
460 call void @llvm.coro.destroy(i8* %hdl)
461 ret i32 0
462 }
463
David Majnemer78557192016-07-27 05:12:35 +0000464After example in this section is compiled, result of the compilation will be:
David Majnemera6539272016-07-23 04:05:08 +0000465
466.. code-block:: llvm
467
468 define i32 @main() {
469 entry:
470 tail call void @print(i32 4)
471 tail call void @print(i32 5)
472 tail call void @print(i32 6)
473 ret i32 0
474 }
475
476.. _final:
477.. _final suspend:
478
479Final Suspend
480-------------
481
482A coroutine author or a frontend may designate a particular suspend to be final,
483by setting the second argument of the `coro.suspend`_ intrinsic to `true`.
484Such a suspend point has two properties:
485
486* it is possible to check whether a suspended coroutine is at the final suspend
487 point via `coro.done`_ intrinsic;
488
489* a resumption of a coroutine stopped at the final suspend point leads to
490 undefined behavior. The only possible action for a coroutine at a final
491 suspend point is destroying it via `coro.destroy`_ intrinsic.
492
493From the user perspective, the final suspend point represents an idea of a
494coroutine reaching the end. From the compiler perspective, it is an optimization
495opportunity for reducing number of resume points (and therefore switch cases) in
496the resume function.
497
498The following is an example of a function that keeps resuming the coroutine
499until the final suspend point is reached after which point the coroutine is
500destroyed:
501
502.. code-block:: llvm
503
504 define i32 @main() {
505 entry:
506 %hdl = call i8* @f(i32 4)
507 br label %while
508 while:
509 call void @llvm.coro.resume(i8* %hdl)
510 %done = call i1 @llvm.coro.done(i8* %hdl)
511 br i1 %done, label %end, label %while
512 end:
513 call void @llvm.coro.destroy(i8* %hdl)
514 ret i32 0
515 }
516
517Usually, final suspend point is a frontend injected suspend point that does not
518correspond to any explicitly authored suspend point of the high level language.
519For example, for a Python generator that has only one suspend point:
520
521.. code-block:: python
522
523 def coroutine(n):
524 for i in range(n):
525 yield i
526
527Python frontend would inject two more suspend points, so that the actual code
528looks like this:
529
Sanjoy Das77a9c792016-07-26 21:03:41 +0000530.. code-block:: c
David Majnemera6539272016-07-23 04:05:08 +0000531
532 void* coroutine(int n) {
533 int current_value;
534 <designate current_value to be coroutine promise>
535 <SUSPEND> // injected suspend point, so that the coroutine starts suspended
536 for (int i = 0; i < n; ++i) {
537 current_value = i; <SUSPEND>; // corresponds to "yield i"
538 }
539 <SUSPEND final=true> // injected final suspend point
540 }
541
542and python iterator `__next__` would look like:
543
Sanjoy Das77a9c792016-07-26 21:03:41 +0000544.. code-block:: c++
David Majnemera6539272016-07-23 04:05:08 +0000545
546 int __next__(void* hdl) {
547 coro.resume(hdl);
548 if (coro.done(hdl)) throw StopIteration();
549 return *(int*)coro.promise(hdl, 4, false);
550 }
551
552Intrinsics
553==========
554
555Coroutine Manipulation Intrinsics
556---------------------------------
557
558Intrinsics described in this section are used to manipulate an existing
559coroutine. They can be used in any function which happen to have a pointer
560to a `coroutine frame`_ or a pointer to a `coroutine promise`_.
561
562.. _coro.destroy:
563
564'llvm.coro.destroy' Intrinsic
565^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
566
567Syntax:
568"""""""
569
570::
571
572 declare void @llvm.coro.destroy(i8* <handle>)
573
574Overview:
575"""""""""
576
577The '``llvm.coro.destroy``' intrinsic destroys a suspended
578coroutine.
579
580Arguments:
581""""""""""
582
583The argument is a coroutine handle to a suspended coroutine.
584
585Semantics:
586""""""""""
587
588When possible, the `coro.destroy` intrinsic is replaced with a direct call to
589the coroutine destroy function. Otherwise it is replaced with an indirect call
590based on the function pointer for the destroy function stored in the coroutine
591frame. Destroying a coroutine that is not suspended leads to undefined behavior.
592
593.. _coro.resume:
594
595'llvm.coro.resume' Intrinsic
596^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
597
598::
599
600 declare void @llvm.coro.resume(i8* <handle>)
601
602Overview:
603"""""""""
604
605The '``llvm.coro.resume``' intrinsic resumes a suspended coroutine.
606
607Arguments:
608""""""""""
609
610The argument is a handle to a suspended coroutine.
611
612Semantics:
613""""""""""
614
615When possible, the `coro.resume` intrinsic is replaced with a direct call to the
616coroutine resume function. Otherwise it is replaced with an indirect call based
617on the function pointer for the resume function stored in the coroutine frame.
618Resuming a coroutine that is not suspended leads to undefined behavior.
619
620.. _coro.done:
621
622'llvm.coro.done' Intrinsic
623^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
624
625::
626
627 declare i1 @llvm.coro.done(i8* <handle>)
628
629Overview:
630"""""""""
631
632The '``llvm.coro.done``' intrinsic checks whether a suspended coroutine is at
633the final suspend point or not.
634
635Arguments:
636""""""""""
637
638The argument is a handle to a suspended coroutine.
639
640Semantics:
641""""""""""
642
643Using this intrinsic on a coroutine that does not have a `final suspend`_ point
644or on a coroutine that is not suspended leads to undefined behavior.
645
646.. _coro.promise:
647
648'llvm.coro.promise' Intrinsic
649^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
650
651::
652
653 declare i8* @llvm.coro.promise(i8* <ptr>, i32 <alignment>, i1 <from>)
654
655Overview:
656"""""""""
657
658The '``llvm.coro.promise``' intrinsic obtains a pointer to a
659`coroutine promise`_ given a coroutine handle and vice versa.
660
661Arguments:
662""""""""""
663
664The first argument is a handle to a coroutine if `from` is false. Otherwise,
665it is a pointer to a coroutine promise.
666
667The second argument is an alignment requirements of the promise.
668If a frontend designated `%promise = alloca i32` as a promise, the alignment
669argument to `coro.promise` should be the alignment of `i32` on the target
670platform. If a frontend designated `%promise = alloca i32, align 16` as a
671promise, the alignment argument should be 16.
672This argument only accepts constants.
673
674The third argument is a boolean indicating a direction of the transformation.
675If `from` is true, the intrinsic returns a coroutine handle given a pointer
676to a promise. If `from` is false, the intrinsics return a pointer to a promise
677from a coroutine handle. This argument only accepts constants.
678
679Semantics:
680""""""""""
681
682Using this intrinsic on a coroutine that does not have a coroutine promise
683leads to undefined behavior. It is possible to read and modify coroutine
684promise of the coroutine which is currently executing. The coroutine author and
685a coroutine user are responsible to makes sure there is no data races.
686
687Example:
688""""""""
689
690.. code-block:: llvm
691
692 define i8* @f(i32 %n) {
693 entry:
694 %promise = alloca i32
695 %pv = bitcast i32* %promise to i8*
696 ...
697 ; the third argument to coro.begin points to the coroutine promise.
698 %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null)
699 ...
700 store i32 42, i32* %promise ; store something into the promise
701 ...
702 ret i8* %hdl
703 }
704
705 define i32 @main() {
706 entry:
707 %hdl = call i8* @f(i32 4) ; starts the coroutine and returns its handle
708 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
709 %promise.addr = bitcast i8* %promise.addr.raw to i32*
710 %val = load i32, i32* %promise.addr ; load a value from the promise
711 call void @print(i32 %val)
712 call void @llvm.coro.destroy(i8* %hdl)
713 ret i32 0
714 }
715
716.. _coroutine intrinsics:
717
718Coroutine Structure Intrinsics
719------------------------------
720Intrinsics described in this section are used within a coroutine to describe
721the coroutine structure. They should not be used outside of a coroutine.
722
723.. _coro.size:
724
725'llvm.coro.size' Intrinsic
726^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
727::
728
729 declare i32 @llvm.coro.size.i32()
730 declare i64 @llvm.coro.size.i64()
731
732Overview:
733"""""""""
734
735The '``llvm.coro.size``' intrinsic returns the number of bytes
736required to store a `coroutine frame`_.
737
738Arguments:
739""""""""""
740
741None
742
743Semantics:
744""""""""""
745
746The `coro.size` intrinsic is lowered to a constant representing the size of
747the coroutine frame.
748
749.. _coro.begin:
750
751'llvm.coro.begin' Intrinsic
752^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
753::
754
755 declare i8* @llvm.coro.begin(i8* <mem>, i32 <align>, i8* <promise>, i8* <fnaddr>)
756
757Overview:
758"""""""""
759
David Majnemer78557192016-07-27 05:12:35 +0000760The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.
David Majnemera6539272016-07-23 04:05:08 +0000761
762Arguments:
763""""""""""
764
David Majnemer78557192016-07-27 05:12:35 +0000765The first argument is a pointer to a block of memory where coroutine frame
766will be stored.
David Majnemera6539272016-07-23 04:05:08 +0000767
768The second argument provides information on the alignment of the memory returned
769by the allocation function and given to `coro.begin` by the first argument. If
770this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
771This argument only accepts constants.
772
773The third argument, if not `null`, designates a particular alloca instruction to
774be a `coroutine promise`_.
775
776The fourth argument is `null` before coroutine is split, and later is replaced
777to point to a private global constant array containing function pointers to
778outlined resume and destroy parts of the coroutine.
779
780Semantics:
781""""""""""
782
783Depending on the alignment requirements of the objects in the coroutine frame
784and/or on the codegen compactness reasons the pointer returned from `coro.begin`
785may be at offset to the `%mem` argument. (This could be beneficial if
786instructions that express relative access to data can be more compactly encoded
787with small positive and negative offsets).
788
David Majnemer78557192016-07-27 05:12:35 +0000789A frontend should emit exactly one `coro.begin` intrinsic per coroutine.
David Majnemera6539272016-07-23 04:05:08 +0000790
791.. _coro.free:
792
793'llvm.coro.free' Intrinsic
794^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
795::
796
797 declare i8* @llvm.coro.free(i8* <frame>)
798
799Overview:
800"""""""""
801
802The '``llvm.coro.free``' intrinsic returns a pointer to a block of memory where
803coroutine frame is stored or `null` if this instance of a coroutine did not use
804dynamically allocated memory for its coroutine frame.
805
806Arguments:
807""""""""""
808
809A pointer to the coroutine frame. This should be the same pointer that was
810returned by prior `coro.begin` call.
811
812Example (custom deallocation function):
813"""""""""""""""""""""""""""""""""""""""
814
815.. code-block:: llvm
816
817 cleanup:
818 %mem = call i8* @llvm.coro.free(i8* %frame)
819 %mem_not_null = icmp ne i8* %mem, null
820 br i1 %mem_not_null, label %if.then, label %if.end
821 if.then:
822 call void @CustomFree(i8* %mem)
823 br label %if.end
824 if.end:
825 ret void
826
827Example (standard deallocation functions):
828""""""""""""""""""""""""""""""""""""""""""
829
830.. code-block:: llvm
831
832 cleanup:
833 %mem = call i8* @llvm.coro.free(i8* %frame)
834 call void @free(i8* %mem)
835 ret void
836
837.. _coro.alloc:
838
839'llvm.coro.alloc' Intrinsic
840^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
841::
842
843 declare i8* @llvm.coro.alloc()
844
845Overview:
846"""""""""
847
848The '``llvm.coro.alloc``' intrinsic returns an address of the memory on the
849callers frame where coroutine frame of this coroutine can be placed or `null`
850otherwise.
851
852Arguments:
853""""""""""
854
855None
856
857Semantics:
858""""""""""
859
860If the coroutine is eligible for heap elision, this intrinsic is lowered to an
861alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`.
David Majnemer78557192016-07-27 05:12:35 +0000862
863A frontend should emit at most one `coro.alloc` intrinsic per coroutine.
David Majnemera6539272016-07-23 04:05:08 +0000864
865Example:
866""""""""
867
868.. code-block:: llvm
869
870 entry:
871 %elide = call i8* @llvm.coro.alloc()
872 %0 = icmp ne i8* %elide, null
873 br i1 %0, label %coro.begin, label %coro.alloc
874
875 coro.alloc:
876 %frame.size = call i32 @llvm.coro.size()
877 %alloc = call i8* @MyAlloc(i32 %frame.size)
878 br label %coro.begin
879
880 coro.begin:
881 %phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
882 %frame = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)
883
884.. _coro.frame:
885
886'llvm.coro.frame' Intrinsic
887^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
888::
889
890 declare i8* @llvm.coro.frame()
891
892Overview:
893"""""""""
894
895The '``llvm.coro.frame``' intrinsic returns an address of the coroutine frame of
896the enclosing coroutine.
897
898Arguments:
899""""""""""
900
901None
902
903Semantics:
904""""""""""
905
906This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is
907a frontend convenience intrinsic that makes it easier to refer to the
908coroutine frame.
909
910.. _coro.end:
911
912'llvm.coro.end' Intrinsic
913^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
914::
915
916 declare void @llvm.coro.end(i8* <handle>, i1 <unwind>)
917
918Overview:
919"""""""""
920
921The '``llvm.coro.end``' marks the point where execution of the resume part of
922the coroutine should end and control returns back to the caller.
923
924
925Arguments:
926""""""""""
927
928The first argument should refer to the coroutine handle of the enclosing coroutine.
929
930The second argument should be `true` if this coro.end is in the block that is
931part of the unwind sequence leaving the coroutine body due to exception prior to
932the first reaching any suspend points, and `false` otherwise.
933
934Semantics:
935""""""""""
936The `coro.end`_ intrinsic is a no-op during an initial invocation of the
937coroutine. When the coroutine resumes, the intrinsic marks the point when
938coroutine need to return control back to the caller.
939
940This intrinsic is removed by the CoroSplit pass when a coroutine is split into
941the start, resume and destroy parts. In start part, the intrinsic is removed,
942in resume and destroy parts, it is replaced with `ret void` instructions and
943the rest of the block containing `coro.end` instruction is discarded.
944
945In landing pads it is replaced with an appropriate instruction to unwind to
946caller.
947
948A frontend is allowed to supply null as the first parameter, in this case
949`coro-early` pass will replace the null with an appropriate coroutine handle
950value.
951
952.. _coro.suspend:
953.. _suspend points:
954
955'llvm.coro.suspend' Intrinsic
956^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
957::
958
959 declare i8 @llvm.coro.suspend(token <save>, i1 <final>)
960
961Overview:
962"""""""""
963
964The '``llvm.coro.suspend``' marks the point where execution of the coroutine
965need to get suspended and control returned back to the caller.
966Conditional branches consuming the result of this intrinsic lead to basic blocks
967where coroutine should proceed when suspended (-1), resumed (0) or destroyed
968(1).
969
970Arguments:
971""""""""""
972
973The first argument refers to a token of `coro.save` intrinsic that marks the
974point when coroutine state is prepared for suspension. If `none` token is passed,
975the intrinsic behaves as if there were a `coro.save` immediately preceding
976the `coro.suspend` intrinsic.
977
978The second argument indicates whether this suspension point is `final`_.
979The second argument only accepts constants. If more than one suspend point is
980designated as final, the resume and destroy branches should lead to the same
981basic blocks.
982
983Example (normal suspend point):
984"""""""""""""""""""""""""""""""
985
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000986.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000987
988 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
989 switch i8 %0, label %suspend [i8 0, label %resume
990 i8 1, label %cleanup]
991
992Example (final suspend point):
993""""""""""""""""""""""""""""""
994
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +0000995.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +0000996
997 while.end:
998 %s.final = call i8 @llvm.coro.suspend(token none, i1 true)
999 switch i8 %s.final, label %suspend [i8 0, label %trap
1000 i8 1, label %cleanup]
1001 trap:
1002 call void @llvm.trap()
1003 unreachable
1004
1005Semantics:
1006""""""""""
1007
1008If a coroutine that was suspended at the suspend point marked by this intrinsic
1009is resumed via `coro.resume`_ the control will transfer to the basic block
1010of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the
1011basic block indicated by the 1-case. To suspend, coroutine proceed to the
1012default label.
1013
1014If suspend intrinsic is marked as final, it can consider the `true` branch
1015unreachable and can perform optimizations that can take advantage of that fact.
1016
1017.. _coro.save:
1018
1019'llvm.coro.save' Intrinsic
1020^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1021::
1022
1023 declare token @llvm.coro.save(i8* <handle>)
1024
1025Overview:
1026"""""""""
1027
1028The '``llvm.coro.save``' marks the point where a coroutine need to update its
1029state to prepare for resumption to be considered suspended (and thus eligible
1030for resumption).
1031
1032Arguments:
1033""""""""""
1034
1035The first argument points to a coroutine handle of the enclosing coroutine.
1036
1037Semantics:
1038""""""""""
1039
1040Whatever coroutine state changes are required to enable resumption of
1041the coroutine from the corresponding suspend point should be done at the point
1042of `coro.save` intrinsic.
1043
1044Example:
1045""""""""
1046
1047Separate save and suspend points are necessary when a coroutine is used to
1048represent an asynchronous control flow driven by callbacks representing
1049completions of asynchronous operations.
1050
1051In such a case, a coroutine should be ready for resumption prior to a call to
1052`async_op` function that may trigger resumption of a coroutine from the same or
1053a different thread possibly prior to `async_op` call returning control back
1054to the coroutine:
1055
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +00001056.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +00001057
1058 %save1 = call token @llvm.coro.save(i8* %hdl)
1059 call void async_op1(i8* %hdl)
1060 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
1061 switch i8 %suspend1, label %suspend [i8 0, label %resume1
1062 i8 1, label %cleanup]
1063
1064.. _coro.param:
1065
1066'llvm.coro.param' Intrinsic
1067^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1068::
1069
1070 declare i1 @llvm.coro.param(i8* <original>, i8* <copy>)
1071
1072Overview:
1073"""""""""
1074
David Majnemer78557192016-07-27 05:12:35 +00001075The '``llvm.coro.param``' is used by a frontend to mark up the code used to
David Majnemera6539272016-07-23 04:05:08 +00001076construct and destruct copies of the parameters. If the optimizer discovers that
1077a particular parameter copy is not used after any suspends, it can remove the
1078construction and destruction of the copy by replacing corresponding coro.param
1079with `i1 false` and replacing any use of the `copy` with the `original`.
1080
1081Arguments:
1082""""""""""
1083
1084The first argument points to an `alloca` storing the value of a parameter to a
1085coroutine.
1086
1087The second argument points to an `alloca` storing the value of the copy of that
1088parameter.
1089
1090Semantics:
1091""""""""""
1092
1093The optimizer is free to always replace this intrinsic with `i1 true`.
1094
1095The optimizer is also allowed to replace it with `i1 false` provided that the
1096parameter copy is only used prior to control flow reaching any of the suspend
1097points. The code that would be DCE'd if the `coro.param` is replaced with
1098`i1 false` is not considered to be a use of the parameter copy.
1099
1100The frontend can emit this intrinsic if its language rules allow for this
1101optimization.
1102
1103Example:
1104""""""""
1105Consider the following example. A coroutine takes two parameters `a` and `b`
1106that has a destructor and a move constructor.
1107
Sanjoy Das77a9c792016-07-26 21:03:41 +00001108.. code-block:: c++
David Majnemera6539272016-07-23 04:05:08 +00001109
1110 struct A { ~A(); A(A&&); bool foo(); void bar(); };
1111
1112 task<int> f(A a, A b) {
1113 if (a.foo())
1114 return 42;
1115
1116 a.bar();
1117 co_await read_async(); // introduces suspend point
1118 b.bar();
1119 }
1120
1121Note that, uses of `b` is used after a suspend point and thus must be copied
1122into a coroutine frame, whereas `a` does not have to, since it never used
1123after suspend.
1124
1125A frontend can create parameter copies for `a` and `b` as follows:
1126
Aaron Ballmanbc7c2d02016-07-23 20:11:21 +00001127.. code-block:: text
David Majnemera6539272016-07-23 04:05:08 +00001128
1129 task<int> f(A a', A b') {
1130 a = alloca A;
1131 b = alloca A;
1132 // move parameters to its copies
1133 if (coro.param(a', a)) A::A(a, A&& a');
1134 if (coro.param(b', b)) A::A(b, A&& b');
1135 ...
1136 // destroy parameters copies
1137 if (coro.param(a', a)) A::~A(a);
1138 if (coro.param(b', b)) A::~A(b);
1139 }
1140
1141The optimizer can replace coro.param(a',a) with `i1 false` and replace all uses
1142of `a` with `a'`, since it is not used after suspend.
1143
1144The optimizer must replace coro.param(b', b) with `i1 true`, since `b` is used
1145after suspend and therefore, it has to reside in the coroutine frame.
1146
1147Coroutine Transformation Passes
1148===============================
1149CoroEarly
1150---------
1151The pass CoroEarly lowers coroutine intrinsics that hide the details of the
1152structure of the coroutine frame, but, otherwise not needed to be preserved to
1153help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_,
1154and `coro.promise`_ intrinsics.
1155
1156.. _CoroSplit:
1157
1158CoroSplit
1159---------
1160The pass CoroSplit buides coroutine frame and outlines resume and destroy parts
1161into separate functions.
1162
1163CoroElide
1164---------
1165The pass CoroElide examines if the inlined coroutine is eligible for heap
1166allocation elision optimization. If so, it replaces `coro.alloc` and
1167`coro.begin` intrinsic with an address of a coroutine frame placed on its caller
1168and replaces `coro.free` intrinsics with `null` to remove the deallocation code.
1169This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
1170calls to resume and destroy functions for a particular coroutine where possible.
1171
1172CoroCleanup
1173-----------
1174This pass runs late to lower all coroutine related intrinsics not replaced by
1175earlier passes.
1176
1177Upstreaming sequence (rough plan)
1178=================================
David Majnemer78557192016-07-27 05:12:35 +00001179#. Add documentation.
David Majnemer3d32b7e2016-07-28 21:04:31 +00001180#. Add coroutine intrinsics.
Gor Nishanov341ecba2016-08-05 13:17:06 +00001181#. Add empty coroutine passes. <== we are here
David Majnemera6539272016-07-23 04:05:08 +00001182#. Add coroutine devirtualization + tests.
1183#. Add CGSCC restart trigger + tests.
1184#. Add coroutine heap elision + tests.
1185#. Add custom allocation heap elision + tests.
1186#. Add coroutine splitting logic + tests.
1187#. Add simple coroutine frame builder + tests.
1188#. Add the rest of the logic + tests. (Maybe split further as needed).
1189
1190Areas Requiring Attention
1191=========================
1192#. A coroutine frame is bigger than it could be. Adding stack packing and stack
1193 coloring like optimization on the coroutine frame will result in tighter
1194 coroutine frames.
1195
1196#. Take advantage of the lifetime intrinsics for the data that goes into the
1197 coroutine frame. Leave lifetime intrinsics as is for the data that stays in
1198 allocas.
1199
1200#. The CoroElide optimization pass relies on coroutine ramp function to be
1201 inlined. It would be beneficial to split the ramp function further to
1202 increase the chance that it will get inlined into its caller.
1203
1204#. Design a convention that would make it possible to apply coroutine heap
1205 elision optimization across ABI boundaries.
1206
1207#. Cannot handle coroutines with `inalloca` parameters (used in x86 on Windows).
1208
1209#. Alignment is ignored by coro.begin and coro.free intrinsics.
1210
1211#. Make required changes to make sure that coroutine optimizations work with
1212 LTO.
1213
1214#. More tests, more tests, more tests