| .\" Copyright (C) 2019 Jens Axboe <axboe@kernel.dk> |
| .\" Copyright (C) 2019 Red Hat, Inc. |
| .\" |
| .\" SPDX-License-Identifier: LGPL-2.0-or-later |
| .\" |
| .TH IO_URING_REGISTER 2 2019-01-17 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| io_uring_register \- register files or user buffers for asynchronous I/O |
| .SH SYNOPSIS |
| .nf |
| .BR "#include <linux/io_uring.h>" |
| .PP |
| .BI "int io_uring_register(unsigned int " fd ", unsigned int " opcode , |
| .BI " void *" arg ", unsigned int " nr_args ); |
| .fi |
| .PP |
| .SH DESCRIPTION |
| .PP |
| |
| The |
| .BR io_uring_register () |
| system call registers resources (e.g. user buffers, files, eventfd, |
| personality, restrictions) for use in an |
| .BR io_uring (7) |
| instance referenced by |
| .IR fd . |
| Registering files or user buffers allows the kernel to take long term |
| references to internal data structures or create long term mappings of |
| application memory, greatly reducing per-I/O overhead. |
| |
| .I fd |
| is the file descriptor returned by a call to |
| .BR io_uring_setup (2). |
| .I opcode |
| can be one of: |
| |
| .TP |
| .B IORING_REGISTER_BUFFERS |
| .I arg |
| points to a |
| .I struct iovec |
| array of |
| .I nr_args |
| entries. The buffers associated with the iovecs will be locked in |
| memory and charged against the user's |
| .B RLIMIT_MEMLOCK |
| resource limit. See |
| .BR getrlimit (2) |
| for more information. Additionally, there is a size limit of 1GiB per |
| buffer. Currently, the buffers must be anonymous, non-file-backed |
| memory, such as that returned by |
| .BR malloc (3) |
| or |
| .BR mmap (2) |
| with the |
| .B MAP_ANONYMOUS |
| flag set. It is expected that this limitation will be lifted in the |
| future. Huge pages are supported as well. Note that the entire huge |
| page will be pinned in the kernel, even if only a portion of it is |
| used. |
| |
| After a successful call, the supplied buffers are mapped into the |
| kernel and eligible for I/O. To make use of them, the application |
| must specify the |
| .B IORING_OP_READ_FIXED |
| or |
| .B IORING_OP_WRITE_FIXED |
| opcodes in the submission queue entry (see the |
| .I struct io_uring_sqe |
| definition in |
| .BR io_uring_enter (2)), |
| and set the |
| .I buf_index |
| field to the desired buffer index. The memory range described by the |
| submission queue entry's |
| .I addr |
| and |
| .I len |
| fields must fall within the indexed buffer. |
| |
| It is perfectly valid to setup a large buffer and then only use part |
| of it for an I/O, as long as the range is within the originally mapped |
| region. |
| |
| An application can increase or decrease the size or number of |
| registered buffers by first unregistering the existing buffers, and |
| then issuing a new call to |
| .BR io_uring_register () |
| with the new buffers. |
| |
| Note that registering buffers will wait for the ring to idle. If the application |
| currently has requests in-flight, the registration will wait for those to |
| finish before proceeding. |
| |
| An application need not unregister buffers explicitly before shutting |
| down the io_uring instance. Available since 5.1. |
| |
| .TP |
| .B IORING_UNREGISTER_BUFFERS |
| This operation takes no argument, and |
| .I arg |
| must be passed as NULL. All previously registered buffers associated |
| with the io_uring instance will be released. Available since 5.1. |
| |
| .TP |
| .B IORING_REGISTER_FILES |
| Register files for I/O. |
| .I arg |
| contains a pointer to an array of |
| .I nr_args |
| file descriptors (signed 32 bit integers). |
| |
| To make use of the registered files, the |
| .B IOSQE_FIXED_FILE |
| flag must be set in the |
| .I flags |
| member of the |
| .IR "struct io_uring_sqe" , |
| and the |
| .I fd |
| member is set to the index of the file in the file descriptor array. |
| |
| The file set may be sparse, meaning that the |
| .B fd |
| field in the array may be set to |
| .B -1. |
| See |
| .B IORING_REGISTER_FILES_UPDATE |
| for how to update files in place. |
| |
| Note that registering files will wait for the ring to idle. If the application |
| currently has requests in-flight, the registration will wait for those to |
| finish before proceeding. See |
| .B IORING_REGISTER_FILES_UPDATE |
| for how to update an existing set without that limitation. |
| |
| Files are automatically unregistered when the io_uring instance is |
| torn down. An application need only unregister if it wishes to |
| register a new set of fds. Available since 5.1. |
| |
| .TP |
| .B IORING_REGISTER_FILES_UPDATE |
| This operation replaces existing files in the registered file set with new |
| ones, either turning a sparse entry (one where fd is equal to -1) into a |
| real one, removing an existing entry (new one is set to -1), or replacing |
| an existing entry with a new existing entry. |
| .I arg |
| must contain a pointer to a struct io_uring_files_update, which contains |
| an offset on which to start the update, and an array of file descriptors to |
| use for the update. |
| .I nr_args |
| must contain the number of descriptors in the passed in array. Available |
| since 5.5. |
| |
| .TP |
| .B IORING_UNREGISTER_FILES |
| This operation requires no argument, and |
| .I arg |
| must be passed as NULL. All previously registered files associated |
| with the io_uring instance will be unregistered. Available since 5.1. |
| |
| .TP |
| .B IORING_REGISTER_EVENTFD |
| It's possible to use eventfd(2) to get notified of completion events on an |
| io_uring instance. If this is desired, an eventfd file descriptor can be |
| registered through this operation. |
| .I arg |
| must contain a pointer to the eventfd file descriptor, and |
| .I nr_args |
| must be 1. Available since 5.2. |
| |
| An application can temporarily disable notifications, coming through the |
| registered eventfd, by setting the |
| .B IORING_CQ_EVENTFD_DISABLED |
| bit in the |
| .I flags |
| field of the CQ ring. |
| Available since 5.8. |
| |
| .TP |
| .B IORING_REGISTER_EVENTFD_ASYNC |
| This works just like |
| .B IORING_REGISTER_EVENTFD |
| , except notifications are only posted for events that complete in an async |
| manner. This means that events that complete inline while being submitted |
| do not trigger a notification event. The arguments supplied are the same as |
| for |
| .B IORING_REGISTER_EVENTFD. |
| Available since 5.6. |
| |
| .TP |
| .B IORING_UNREGISTER_EVENTFD |
| Unregister an eventfd file descriptor to stop notifications. Since only one |
| eventfd descriptor is currently supported, this operation takes no argument, |
| and |
| .I arg |
| must be passed as NULL and |
| .I nr_args |
| must be zero. Available since 5.2. |
| |
| .TP |
| .B IORING_REGISTER_PROBE |
| This operation returns a structure, io_uring_probe, which contains information |
| about the opcodes supported by io_uring on the running kernel. |
| .I arg |
| must contain a pointer to a struct io_uring_probe, and |
| .I nr_args |
| must contain the size of the ops array in that probe struct. The ops array |
| is of the type io_uring_probe_op, which holds the value of the opcode and |
| a flags field. If the flags field has |
| .B IO_URING_OP_SUPPORTED |
| set, then this opcode is supported on the running kernel. Available since 5.6. |
| |
| .TP |
| .B IORING_REGISTER_PERSONALITY |
| This operation registers credentials of the running application with io_uring, |
| and returns an id associated with these credentials. Applications wishing to |
| share a ring between separate users/processes can pass in this credential id |
| in the sqe |
| .B personality |
| field. If set, that particular sqe will be issued with these credentials. Must |
| be invoked with |
| .I arg |
| set to NULL and |
| .I nr_args |
| set to zero. Available since 5.6. |
| |
| .TP |
| .B IORING_UNREGISTER_PERSONALITY |
| This operation unregisters a previously registered personality with io_uring. |
| .I nr_args |
| must be set to the id in question, and |
| .I arg |
| must be set to NULL. Available since 5.6. |
| |
| .TP |
| .B IORING_REGISTER_ENABLE_RINGS |
| This operation enables an io_uring ring started in a disabled state |
| .RB (IORING_SETUP_R_DISABLED |
| was specified in the call to |
| .BR io_uring_setup (2)). |
| While the io_uring ring is disabled, submissions are not allowed and |
| registrations are not restricted. |
| |
| After the execution of this operation, the io_uring ring is enabled: |
| submissions and registration are allowed, but they will |
| be validated following the registered restrictions (if any). |
| This operation takes no argument, must be invoked with |
| .I arg |
| set to NULL and |
| .I nr_args |
| set to zero. Available since 5.10. |
| |
| .TP |
| .B IORING_REGISTER_RESTRICTIONS |
| .I arg |
| points to a |
| .I struct io_uring_restriction |
| array of |
| .I nr_args |
| entries. |
| |
| With an entry it is possible to allow an |
| .BR io_uring_register () |
| .I opcode, |
| or specify which |
| .I opcode |
| and |
| .I flags |
| of the submission queue entry are allowed, |
| or require certain |
| .I flags |
| to be specified (these flags must be set on each submission queue entry). |
| |
| All the restrictions must be submitted with a single |
| .BR io_uring_register () |
| call and they are handled as an allowlist (opcodes and flags not registered, |
| are not allowed). |
| |
| Restrictions can be registered only if the io_uring ring started in a disabled |
| state |
| .RB (IORING_SETUP_R_DISABLED |
| must be specified in the call to |
| .BR io_uring_setup (2)). |
| |
| Available since 5.10. |
| |
| .SH RETURN VALUE |
| |
| On success, |
| .BR io_uring_register () |
| returns 0. On error, -1 is returned, and |
| .I errno |
| is set accordingly. |
| |
| .SH ERRORS |
| .TP |
| .B EACCES |
| The |
| .I opcode |
| field is not allowed due to registered restrictions. |
| .TP |
| .B EBADF |
| One or more fds in the |
| .I fd |
| array are invalid. |
| .TP |
| .B EBADFD |
| .B IORING_REGISTER_ENABLE_RINGS |
| or |
| .B IORING_REGISTER_RESTRICTIONS |
| was specified, but the io_uring ring is not disabled. |
| .TP |
| .B EBUSY |
| .B IORING_REGISTER_BUFFERS |
| or |
| .B IORING_REGISTER_FILES |
| or |
| .B IORING_REGISTER_RESTRICTIONS |
| was specified, but there were already buffers, files, or restrictions |
| registered. |
| .TP |
| .B EFAULT |
| buffer is outside of the process' accessible address space, or |
| .I iov_len |
| is greater than 1GiB. |
| .TP |
| .B EINVAL |
| .B IORING_REGISTER_BUFFERS |
| or |
| .B IORING_REGISTER_FILES |
| was specified, but |
| .I nr_args |
| is 0. |
| .TP |
| .B EINVAL |
| .B IORING_REGISTER_BUFFERS |
| was specified, but |
| .I nr_args |
| exceeds |
| .B UIO_MAXIOV |
| .TP |
| .B EINVAL |
| .B IORING_UNREGISTER_BUFFERS |
| or |
| .B IORING_UNREGISTER_FILES |
| was specified, and |
| .I nr_args |
| is non-zero or |
| .I arg |
| is non-NULL. |
| .TP |
| .B EINVAL |
| .B IORING_REGISTER_RESTRICTIONS |
| was specified, but |
| .I nr_args |
| exceeds the maximum allowed number of restrictions or restriction |
| .I opcode |
| is invalid. |
| .TP |
| .B EMFILE |
| .B IORING_REGISTER_FILES |
| was specified and |
| .I nr_args |
| exceeds the maximum allowed number of files in a fixed file set. |
| .TP |
| .B EMFILE |
| .B IORING_REGISTER_FILES |
| was specified and adding |
| .I nr_args |
| file references would exceed the maximum allowed number of files the user |
| is allowed to have according to the |
| .B |
| RLIMIT_NOFILE |
| resource limit and the caller does not have |
| .B CAP_SYS_RESOURCE |
| capability. Note that this is a per user limit, not per process. |
| .TP |
| .B ENOMEM |
| Insufficient kernel resources are available, or the caller had a |
| non-zero |
| .B RLIMIT_MEMLOCK |
| soft resource limit, but tried to lock more memory than the limit |
| permitted. This limit is not enforced if the process is privileged |
| .RB ( CAP_IPC_LOCK ). |
| .TP |
| .B ENXIO |
| .B IORING_UNREGISTER_BUFFERS |
| or |
| .B IORING_UNREGISTER_FILES |
| was specified, but there were no buffers or files registered. |
| .TP |
| .B ENXIO |
| Attempt to register files or buffers on an io_uring instance that is already |
| undergoing file or buffer registration, or is being torn down. |
| .TP |
| .B EOPNOTSUPP |
| User buffers point to file-backed memory. |