.\" Copyright (C) 2020 Shuveb Hussain <shuveb@gmail.com>
.\" SPDX-License-Identifier: LGPL-2.0-or-later
.\"

.TH IO_URING 7 2020-07-26 "Linux" "Linux Programmer's Manual"
.SH NAME
io_uring \- Asynchronous I/O facility
.SH SYNOPSIS
.nf
.B "#include <linux/io_uring.h>"
.fi
.PP
.SH DESCRIPTION
.PP
.B io_uring
is a Linux-specific API for asynchronous I/O.
It allows the user to submit one or more I/O requests,
which are processed asynchronously without blocking the calling process.
.B io_uring
gets its name from the ring buffers which are shared between user space and
kernel space.
This arrangement allows for efficient I/O,
while avoiding the overhead of copying buffers between them,
where possible.
This interface makes
.B io_uring
different from other UNIX I/O APIs:
rather than communicating between kernel and user space only through
system calls,
ring buffers are used as the main mode of communication.
This arrangement has various performance benefits which are discussed in a
separate section below.
This man page uses the terms shared buffers, shared ring buffers and
queues interchangeably.
.PP
The general programming model you need to follow for
.B io_uring
is outlined below.
.IP \(bu
Set up shared buffers with
.BR io_uring_setup (2)
and
.BR mmap (2),
mapping into user space shared buffers for the submission queue (SQ) and the
completion queue (CQ).
You place I/O requests you want to make on the SQ,
while the kernel places the results of those operations on the CQ.
.IP \(bu
For every I/O request you need to make (such as reading a file,
writing a file,
or accepting a socket connection),
you create a submission queue entry,
or SQE,
describe the I/O operation you need to get done and add it to the tail of
the submission queue (SQ).
Each I/O operation is,
in essence,
the equivalent of a system call you would have made otherwise,
if you were not using
.BR io_uring .
You can add more than one SQE to the queue depending on the number of
operations you want to request.
.IP \(bu
After you add one or more SQEs,
you need to call
.BR io_uring_enter (2)
to tell the kernel to dequeue your I/O requests off the SQ and begin
processing them.
.IP \(bu
For each SQE you submit,
once it is done processing the request,
the kernel places a completion queue event or CQE at the tail of the
completion queue or CQ.
The kernel places exactly one matching CQE in the CQ for every SQE you
submit on the SQ.
After you retrieve a CQE,
minimally,
you might be interested in checking the
.I res
field of the CQE structure,
which corresponds to the return value of the equivalent system call,
had you used it directly without
.BR io_uring .
For instance,
a read operation under
.BR io_uring ,
started with the
.BR IORING_OP_READ
operation,
issues the equivalent of the
.BR read (2)
system call and returns as part of
.I res
what
.BR read (2)
would have returned if called directly,
without using
.BR io_uring .
.IP \(bu
Optionally,
.BR io_uring_enter (2)
can also wait for a specified number of requests to be processed by the kernel
before it returns.
If you specified a certain number of completions to wait for,
the kernel would have placed at least that many CQEs on the CQ,
which you can then readily read,
right after the return from
.BR io_uring_enter (2).
.IP \(bu
It is important to remember that I/O requests submitted to the kernel can
complete in any order.
It is not necessary for the kernel to process one request after another,
in the order you placed them.
Given that the interface is a ring,
the requests are attempted in order;
however, that does not imply any sort of ordering on their completion.
When more than one request is in flight,
it is not possible to determine which one will complete first.
When you dequeue CQEs off the CQ,
you should always check which submitted request it corresponds to.
The most common method for doing so is utilizing the
.I user_data
field in the request, which is passed back on the completion side.
.PP
Adding to and reading from the queues:
.IP \(bu
You add SQEs to the tail of the SQ.
The kernel reads SQEs off the head of the queue.
.IP \(bu
The kernel adds CQEs to the tail of the CQ.
You read CQEs off the head of the queue.
.SS Submission queue polling
One of the goals of
.B io_uring
is to provide a means for efficient I/O.
To this end,
.B io_uring
supports a polling mode that lets you avoid the call to
.BR io_uring_enter (2),
which you use to inform the kernel that you have queued SQEs onto the SQ.
With SQ Polling,
.B io_uring
starts a kernel thread that polls the submission queue for any I/O
requests you submit by adding SQEs.
With SQ Polling enabled,
there is no need for you to call
.BR io_uring_enter (2),
letting you avoid the overhead of system calls.
A designated kernel thread dequeues SQEs off the SQ as you add them and
dispatches them for asynchronous processing.
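.PP
The following is a minimal sketch of how SQ Polling might be requested at
ring setup time.
It reuses the
.BR io_uring_setup ()
wrapper and the QUEUE_DEPTH constant from the example program at the end of
this page; the idle timeout chosen here is an illustrative assumption,
not a requirement.
.PP
.in +4n
.EX
struct io_uring_params p;

memset(&p, 0, sizeof(p));
/* Ask the kernel to start an SQ polling thread */
p.flags = IORING_SETUP_SQPOLL;
/* Illustrative value: let the thread idle for 2000 ms before sleeping */
p.sq_thread_idle = 2000;

int ring_fd = io_uring_setup(QUEUE_DEPTH, &p);
if (ring_fd < 0)
    /* older kernels may require privileges for SQPOLL */
    perror("io_uring_setup");
.EE
.in
.PP
Note that on older kernels SQ Polling may place additional requirements on
how requests are submitted (for example, the use of registered files);
consult
.BR io_uring_setup (2)
for the details that apply to your kernel.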
.SS Setting up io_uring
.PP
The main steps in setting up
.B io_uring
consist of mapping in the shared buffers with
.BR mmap (2)
calls.
In the example program included in this man page,
the function
.BR app_setup_uring ()
sets up
.B io_uring
with a QUEUE_DEPTH deep submission queue.
Pay attention to the two
.BR mmap (2)
calls that set up the shared submission and completion queues.
If your kernel is older than version 5.4,
three
.BR mmap (2)
calls are required.
.PP
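A compressed sketch of those calls,
adapted from
.BR app_setup_uring ()
in the example program below
(error handling omitted,
and assuming the kernel advertises IORING_FEAT_SINGLE_MMAP),
looks like this:
.PP
.in +4n
.EX
/* ring_sz: illustrative name for the larger of the SQ and CQ ring sizes */

/* one mapping covers both ring headers on kernels >= 5.4 */
sq_ptr = mmap(0, ring_sz, PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_POPULATE, ring_fd, IORING_OFF_SQ_RING);
cq_ptr = sq_ptr;

/* separate mapping for the array of submission queue entries */
sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
            PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
            ring_fd, IORING_OFF_SQES);
.EE
.in
.PP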
.SS Submitting I/O requests
The process of submitting a request consists of describing the I/O
operation you need to get done using an
.B io_uring_sqe
structure instance.
These details describe the equivalent system call and its parameters.
Because the range of I/O operations Linux supports is very varied and the
.B io_uring_sqe
structure needs to be able to describe them,
it has several fields,
some packed into unions for space efficiency.
Here is a simplified version of struct
.B io_uring_sqe
with some of the most often used fields:
.PP
.in +4n
.EX
struct io_uring_sqe {
    __u8    opcode;         /* type of operation for this sqe */
    __s32   fd;             /* file descriptor to do IO on */
    __u64   off;            /* offset into file */
    __u64   addr;           /* pointer to buffer or iovecs */
    __u32   len;            /* buffer size or number of iovecs */
    __u64   user_data;      /* data to be passed back at completion time */
    __u8    flags;          /* IOSQE_ flags */
    ...
};
.EE
.in

Here is struct
.B io_uring_sqe
in full:

.in +4n
.EX
struct io_uring_sqe {
    __u8    opcode;         /* type of operation for this sqe */
    __u8    flags;          /* IOSQE_ flags */
    __u16   ioprio;         /* ioprio for the request */
    __s32   fd;             /* file descriptor to do IO on */
    union {
        __u64   off;        /* offset into file */
        __u64   addr2;
    };
    union {
        __u64   addr;       /* pointer to buffer or iovecs */
        __u64   splice_off_in;
    };
    __u32   len;            /* buffer size or number of iovecs */
    union {
        __kernel_rwf_t  rw_flags;
        __u32           fsync_flags;
        __u16           poll_events;    /* compatibility */
        __u32           poll32_events;  /* word-reversed for BE */
        __u32           sync_range_flags;
        __u32           msg_flags;
        __u32           timeout_flags;
        __u32           accept_flags;
        __u32           cancel_flags;
        __u32           open_flags;
        __u32           statx_flags;
        __u32           fadvise_advice;
        __u32           splice_flags;
    };
    __u64   user_data;      /* data to be passed back at completion time */
    union {
        struct {
            /* pack this to avoid bogus arm OABI complaints */
            union {
                /* index into fixed buffers, if used */
                __u16   buf_index;
                /* for grouped buffer selection */
                __u16   buf_group;
            } __attribute__((packed));
            /* personality to use, if used */
            __u16   personality;
            __s32   splice_fd_in;
        };
        __u64   __pad2[3];
    };
};
.EE
.in
.PP
To submit an I/O request to
.BR io_uring ,
you need to acquire a submission queue entry (SQE) from the submission
queue (SQ),
fill it up with details of the operation you want to submit and call
.BR io_uring_enter (2).
If you want to avoid calling
.BR io_uring_enter (2),
you have the option of setting up Submission Queue Polling.
.PP
SQEs are added to the tail of the submission queue.
The kernel picks up SQEs off the head of the SQ.
The general algorithm to get the next available SQE and update the tail is
as follows.
.PP
.in +4n
.EX
struct io_uring_sqe *sqe;
unsigned tail, index;
tail = *sqring->tail;
index = tail & (*sqring->ring_mask);
sqe = &sqring->sqes[index];
/* fill up details about this I/O request */
describe_io(sqe);
/* fill the sqe index into the SQ ring array */
sqring->array[index] = index;
tail++;
atomic_store_release(sqring->tail, tail);
.EE
.in
.PP
To get the index of an entry,
the application must mask the current tail index with the size mask of the
ring.
This holds true for both SQs and CQs.
Once the SQE is acquired,
the necessary fields are filled in,
describing the request.
While the CQ ring directly indexes the shared array of CQEs,
the submission side has an indirection array between them.
The submission side ring buffer is an index into this array,
which in turn contains the index into the SQEs.
.PP
The following code snippet demonstrates how a read operation,
an equivalent of a
.BR preadv2 (2)
system call,
is described by filling up an SQE with the necessary parameters.
.PP
.in +4n
.EX
struct iovec iovecs[16];
\&...
sqe->opcode = IORING_OP_READV;
sqe->fd = fd;
sqe->addr = (unsigned long) iovecs;
sqe->len = 16;
sqe->off = offset;
sqe->flags = 0;
.EE
.in
.TP
.B Memory ordering
To optimize performance,
modern compilers and CPUs freely reorder reads and writes when doing so
does not affect the program's outcome.
Some aspects of this need to be kept in mind on SMP systems since
.B io_uring
involves buffers shared between kernel and user space.
These buffers are both visible and modifiable from kernel and user space.
As heads and tails belonging to these shared buffers are updated by kernel
and user space,
changes need to be coherently visible on either side,
irrespective of whether a CPU switch took place after the kernel-user mode
switch happened.
We use memory barriers to enforce this coherency.
Memory barriers are a significant subject in their own right,
and a detailed treatment is beyond the scope of this man page.
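.PP
In C,
the acquire and release semantics needed on the shared head and tail
indices can be expressed with C11 atomics.
The following macros,
also used by the example program at the end of this page,
are one way of doing so:
.PP
.in +4n
.EX
#include <stdatomic.h>

/* publish a new tail (or head) so the other side sees all prior writes */
#define io_uring_smp_store_release(p, v)                    \\
    atomic_store_explicit((_Atomic typeof(*(p)) *)(p), (v), \\
                          memory_order_release)

/* read the other side's index and all writes published before it */
#define io_uring_smp_load_acquire(p)                        \\
    atomic_load_explicit((_Atomic typeof(*(p)) *)(p),       \\
                         memory_order_acquire)
.EE
.in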
.TP
.B Letting the kernel know about I/O submissions
Once you place one or more SQEs onto the SQ,
you need to let the kernel know that you've done so.
You can do this by calling the
.BR io_uring_enter (2)
system call.
This system call is also capable of waiting for a specified count of
events to complete.
This way,
you can be sure to find completion events in the completion queue without
having to poll it for events later.
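.PP
As a sketch,
using the
.BR io_uring_enter ()
wrapper from the example program at the end of this page
(where
.I to_submit
stands for the number of SQEs you have just queued),
submitting those entries and waiting for at least one of them to complete
might look like this:
.PP
.in +4n
.EX
/* submit 'to_submit' SQEs, wait until at least 1 CQE is available */
int ret = io_uring_enter(ring_fd, to_submit, 1,
                         IORING_ENTER_GETEVENTS);
if (ret < 0)
    perror("io_uring_enter");
.EE
.in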
.SS Reading completion events
Similar to the submission queue (SQ),
the completion queue (CQ) is a shared buffer between the kernel and user
space.
Whereas you placed submission queue entries on the tail of the SQ and the
kernel read off the head,
when it comes to the CQ,
the kernel places completion queue events or CQEs on the tail of the CQ and
you read off its head.
.PP
Submission is flexible (and thus a bit more complicated) since it needs to
be able to encode different types of system calls that take various
parameters.
Completion,
on the other hand,
is simpler since we're looking only for a return value
back from the kernel.
This is easily understood by looking at the completion queue event
structure,
struct
.BR io_uring_cqe :
.PP
.in +4n
.EX
struct io_uring_cqe {
    __u64   user_data;  /* sqe->user_data value, passed back */
    __s32   res;        /* result code for this event */
    __u32   flags;
};
.EE
.in
.PP
Here,
.I user_data
is custom data that is passed unchanged from submission to completion.
That is,
from SQEs to CQEs.
This field can be used to set context,
uniquely identifying submissions that got completed.
Given that I/O requests can complete in any order,
this field can be used to correlate a submission with a completion.
.I res
is the result from the system call that was performed as part of the
submission;
that is, its return value.
The
.I flags
field could carry request-specific metadata in the future,
but is currently unused.
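.PP
For instance,
one minimal way of correlating completions with submissions is to store a
pointer to your own per-request state in
.I user_data
and cast it back when the CQE arrives.
The
.I struct my_request
type in this sketch is purely illustrative:
.PP
.in +4n
.EX
/* on the submission side */
struct my_request *req = malloc(sizeof(*req));
/* ... fill in req ... */
sqe->user_data = (__u64) (uintptr_t) req;

/* on the completion side */
struct my_request *done =
        (struct my_request *) (uintptr_t) cqe->user_data;
.EE
.in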
.PP
The general sequence to read completion events off the completion queue is
as follows:
.PP
.in +4n
.EX
unsigned head;
head = *cqring->head;
if (head != atomic_load_acquire(cqring->tail)) {
    struct io_uring_cqe *cqe;
    unsigned index;
    index = head & (cqring->mask);
    cqe = &cqring->cqes[index];
    /* process completed CQE */
    process_cqe(cqe);
    /* CQE consumption complete */
    head++;
}
atomic_store_release(cqring->head, head);
.EE
.in
.PP
It helps to remember that the kernel adds CQEs to the tail of the CQ,
while you dequeue them off the head.
To get the index of an entry at the head,
the application must mask the current head index with the size mask of the
ring.
Once the CQE has been consumed or processed,
the head needs to be updated to reflect the consumption of the CQE.
Pay attention to the read and write barriers to ensure
a successful read and update of the head.
.SS io_uring performance
Because of the shared ring buffers between kernel and user space,
.B io_uring
can be a zero-copy system.
Copying buffers to and fro becomes necessary when system calls that
transfer data between kernel and user space are involved.
But since the bulk of the communication in
.B io_uring
is via buffers shared between the kernel and user space,
this huge performance overhead is completely avoided.
.PP
While system calls may not seem like a significant overhead,
in high performance applications,
making a lot of them will begin to matter.
While workarounds the operating system has in place to deal with Spectre
and Meltdown are ideally best done away with,
unfortunately,
some of these workarounds involve the system call interface,
making system calls not as cheap as before on affected hardware.
While newer hardware should not need these workarounds,
hardware with these vulnerabilities can be expected to be in the wild for a
long time.
With synchronous programming interfaces,
or even with other asynchronous programming interfaces under Linux,
there is at least one system call involved in the submission of each
request.
In
.BR io_uring ,
on the other hand,
you can batch several requests in one go,
simply by queueing up multiple SQEs,
each describing an I/O operation you want,
and making a single call to
.BR io_uring_enter (2).
This is possible due to
.BR io_uring 's
shared buffers based design.
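.PP
A sketch of this batching pattern,
reusing the tail-update steps shown earlier
(the
.BR describe_io ()
helper and the
.I nr_requests
variable are illustrative),
might look like this:
.PP
.in +4n
.EX
unsigned tail = *sqring->tail;

for (int i = 0; i < nr_requests; i++) {
    unsigned index = tail & (*sqring->ring_mask);
    struct io_uring_sqe *sqe = &sqring->sqes[index];

    describe_io(sqe);               /* fill in one I/O request */
    sqring->array[index] = index;
    tail++;
}
/* publish all new SQEs at once */
atomic_store_release(sqring->tail, tail);

/* one system call submits the whole batch */
io_uring_enter(ring_fd, nr_requests, 0, 0);
.EE
.in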
.PP
While this batching in itself can avoid the overhead associated with
potentially multiple and frequent system calls,
you can reduce even this overhead further with Submission Queue Polling,
by having the kernel poll and pick up your SQEs for processing as you add
them to the submission queue.
This avoids the
.BR io_uring_enter (2)
call you need to make to tell the kernel to pick SQEs up.
For high-performance applications,
this means even lower system call overhead.
.SH CONFORMING TO
.B io_uring
is Linux-specific.
.SH EXAMPLES
The following example uses
.B io_uring
to copy stdin to stdout.
Using shell redirection,
you should be able to copy files with this example.
Because it uses a queue depth of only one,
this example processes I/O requests one after the other.
It is purposefully kept this way to aid understanding.
In real-world scenarios, however,
you'll want to have a larger queue depth to parallelize I/O request
processing so as to gain the kind of performance benefits
.B io_uring
provides with its asynchronous processing of requests.
.PP
.EX
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <linux/fs.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdatomic.h>

#include <linux/io_uring.h>

#define QUEUE_DEPTH 1
#define BLOCK_SZ    1024

/* Macros for barriers needed by io_uring */
#define io_uring_smp_store_release(p, v)                    \\
    atomic_store_explicit((_Atomic typeof(*(p)) *)(p), (v), \\
                          memory_order_release)
#define io_uring_smp_load_acquire(p)                        \\
    atomic_load_explicit((_Atomic typeof(*(p)) *)(p),       \\
                         memory_order_acquire)

int ring_fd;
unsigned *sring_tail, *sring_mask, *sring_array,
         *cring_head, *cring_tail, *cring_mask;
struct io_uring_sqe *sqes;
struct io_uring_cqe *cqes;
char buff[BLOCK_SZ];
off_t offset;

/*
 * System call wrappers provided since glibc does not yet
 * provide wrappers for io_uring system calls.
 */

int io_uring_setup(unsigned entries, struct io_uring_params *p)
{
    return (int) syscall(__NR_io_uring_setup, entries, p);
}

int io_uring_enter(int ring_fd, unsigned int to_submit,
                   unsigned int min_complete, unsigned int flags)
{
    return (int) syscall(__NR_io_uring_enter, ring_fd, to_submit,
                         min_complete, flags, NULL, 0);
}

int app_setup_uring(void) {
    struct io_uring_params p;
    void *sq_ptr, *cq_ptr;

    /* See io_uring_setup(2) for io_uring_params.flags you can set */
    memset(&p, 0, sizeof(p));
    ring_fd = io_uring_setup(QUEUE_DEPTH, &p);
    if (ring_fd < 0) {
        perror("io_uring_setup");
        return 1;
    }

    /*
     * io_uring communication happens via 2 shared kernel-user space ring
     * buffers, which can be jointly mapped with a single mmap() call in
     * kernels >= 5.4.
     */

    int sring_sz = p.sq_off.array + p.sq_entries * sizeof(unsigned);
    int cring_sz = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);

    /* Rather than check for kernel version, the recommended way is to
     * check the features field of the io_uring_params structure, which is a
     * bitmask. If IORING_FEAT_SINGLE_MMAP is set, we can do away with the
     * second mmap() call to map in the completion ring separately.
     */
    if (p.features & IORING_FEAT_SINGLE_MMAP) {
        if (cring_sz > sring_sz)
            sring_sz = cring_sz;
        cring_sz = sring_sz;
    }

    /* Map in the submission and completion queue ring buffers.
     * Kernels < 5.4 only map in the submission queue, though.
     */
    sq_ptr = mmap(0, sring_sz, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_POPULATE,
                  ring_fd, IORING_OFF_SQ_RING);
    if (sq_ptr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    if (p.features & IORING_FEAT_SINGLE_MMAP) {
        cq_ptr = sq_ptr;
    } else {
        /* Map in the completion queue ring buffer in older kernels separately */
        cq_ptr = mmap(0, cring_sz, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_POPULATE,
                      ring_fd, IORING_OFF_CQ_RING);
        if (cq_ptr == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
    }
    /* Save useful fields for later easy reference */
    sring_tail = sq_ptr + p.sq_off.tail;
    sring_mask = sq_ptr + p.sq_off.ring_mask;
    sring_array = sq_ptr + p.sq_off.array;

    /* Map in the submission queue entries array */
    sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
                PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
                ring_fd, IORING_OFF_SQES);
    if (sqes == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Save useful fields for later easy reference */
    cring_head = cq_ptr + p.cq_off.head;
    cring_tail = cq_ptr + p.cq_off.tail;
    cring_mask = cq_ptr + p.cq_off.ring_mask;
    cqes = cq_ptr + p.cq_off.cqes;

    return 0;
}

/*
 * Read from completion queue.
 * In this function, we read completion events from the completion queue.
 * We dequeue the CQE, update the head and return the result of the operation.
 */

int read_from_cq(void) {
    struct io_uring_cqe *cqe;
    unsigned head;

    /* Read barrier */
    head = io_uring_smp_load_acquire(cring_head);
    /*
     * Remember, this is a ring buffer. If head == tail, it means that the
     * buffer is empty.
     */
    if (head == *cring_tail)
        return -1;

    /* Get the entry */
    cqe = &cqes[head & (*cring_mask)];
    if (cqe->res < 0)
        fprintf(stderr, "Error: %s\\n", strerror(abs(cqe->res)));

    head++;

    /* Write barrier so that updates to the head are made visible */
    io_uring_smp_store_release(cring_head, head);

    return cqe->res;
}

/*
 * Submit a read or a write request to the submission queue.
 */

int submit_to_sq(int fd, int op) {
    unsigned index, tail;

    /* Add our submission queue entry to the tail of the SQE ring buffer */
    tail = *sring_tail;
    index = tail & *sring_mask;
    struct io_uring_sqe *sqe = &sqes[index];
    /* Fill in the parameters required for the read or write operation */
    sqe->opcode = op;
    sqe->fd = fd;
    sqe->addr = (unsigned long) buff;
    if (op == IORING_OP_READ) {
        memset(buff, 0, sizeof(buff));
        sqe->len = BLOCK_SZ;
    } else {
        sqe->len = strlen(buff);
    }
    sqe->off = offset;

    sring_array[index] = index;
    tail++;

    /* Update the tail */
    io_uring_smp_store_release(sring_tail, tail);

    /*
     * Tell the kernel we have submitted events with the io_uring_enter()
     * system call. We also pass in the IORING_ENTER_GETEVENTS flag which
     * causes the io_uring_enter() call to wait until min_complete
     * (the 3rd param) events complete.
     */
    int ret = io_uring_enter(ring_fd, 1, 1,
                             IORING_ENTER_GETEVENTS);
    if (ret < 0) {
        perror("io_uring_enter");
        return -1;
    }

    return ret;
}

int main(int argc, char *argv[]) {
    int res;

    /* Set up io_uring for use */
    if (app_setup_uring()) {
        fprintf(stderr, "Unable to setup uring!\\n");
        return 1;
    }

    /*
     * A while loop that reads from stdin and writes to stdout.
     * Breaks on EOF.
     */
    while (1) {
        /* Initiate read from stdin and wait for it to complete */
        submit_to_sq(STDIN_FILENO, IORING_OP_READ);
        /* Read completion queue entry */
        res = read_from_cq();
        if (res > 0) {
            /* Read successful. Write to stdout. */
            submit_to_sq(STDOUT_FILENO, IORING_OP_WRITE);
            read_from_cq();
        } else if (res == 0) {
            /* reached EOF */
            break;
        } else if (res < 0) {
            /* Error reading file */
            fprintf(stderr, "Error: %s\\n", strerror(abs(res)));
            break;
        }
        offset += res;
    }

    return 0;
}
.EE
.SH SEE ALSO
.BR io_uring_enter (2),
.BR io_uring_register (2),
.BR io_uring_setup (2)