When I started writing code for a new USB filesystem, I was told that I should
implement asynchronous I/O by providing aio_read and aio_write file
operations. I had assumed that these file operations corresponded to the
POSIX aio_read(3) and aio_write(3) library calls.
I started to write the code for my kernel driver, and ran across a couple of
infrastructure questions. I wanted to know what the system call code was doing
before my driver functions were called. My first reaction was to grep for
sys_aio_read to find the kernel side of the system call.

No such function exists. My assumption that aio_read(3) and aio_write(3)
were system calls was wrong. This led to several questions:
- What do the userspace aio_read(3) and aio_write(3) functions actually do?
- How are the kernel driver aio_read and aio_write file operations called?
It turns out that GNU libc implements POSIX asynchronous I/O entirely in
userspace: the aio_read(3) and aio_write(3)
functions are actually a userland implementation of asynchronous I/O. When
aio_read(3) or aio_write(3) is first called on an open file descriptor, libc
creates a new thread and adds the read or write request to that thread's
request queue. Subsequent requests also go into the request queue. Higher
priority requests are processed first, in the order they are submitted.
The newly created threads simply call the blocking read() or write() system
calls, which
means that kernel-side asynchronous I/O is bypassed. If a character device
driver provides aio_read and aio_write file operations, those file operations are
never called through the GNU libc interface.
To access kernel-side asynchronous I/O, userspace programs need to link against libaio. This library provides wrappers for the Linux-specific asynchronous I/O system calls:

- io_setup(2)
- io_submit(2)
- io_getevents(2)
- io_cancel(2)
- io_destroy(2)
These system calls, implemented in
fs/aio.c, provide the kernel-side asynchronous I/O
implementation. They ultimately call the driver's aio_read and aio_write file
operations. The system calls needed unique names so they wouldn't conflict
with the GNU libc aio_read(3) and
aio_write(3) function calls.
The kernel-side aio implementation is meant to be truly asynchronous. The
driver's aio_read() and aio_write() file operations are expected to return immediately after they
queue the request. When an interrupt in the driver signals that the transfer is
complete, the driver calls
kick_iocb(). This guarantees that the driver's
read_retry() function is called within the context of the program that
made the initial system call. The driver then copies data into userspace and
signals to the aio subsystem that the transaction is complete by calling
aio_complete().
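Putting the pieces together, a driver following this model looks roughly like the sketch below. This is an abridged, hypothetical example against the 2.6.20-era interfaces — all mydev_* names are invented, and error handling and locking are omitted; gadgetfs's inode.c is the real-world reference. One subtlety: the retry method's return value is handed back to the aio core, which completes the iocb with it, while a driver that needs no copy-back step can call aio_complete() directly from its completion handler.

```c
/* Abridged, hypothetical character driver (2.6.20-era API).  All mydev_*
 * names are invented; error handling and locking are omitted. */

static ssize_t mydev_read_retry(struct kiocb *iocb);

/* aio_read file operation, reached from sys_io_submit() via fs/aio.c.
 * Start the hardware transfer and return without blocking. */
static ssize_t mydev_aio_read(struct kiocb *iocb, const struct iovec *iov,
                              unsigned long nr_segs, loff_t pos)
{
    struct mydev *dev = iocb->ki_filp->private_data;

    iocb->ki_retry = mydev_read_retry;   /* runs later in the caller's context */
    dev->pending_iocb = iocb;
    mydev_start_transfer(dev, iov, nr_segs);
    return -EIOCBQUEUED;                 /* tell the aio core it's queued */
}

/* Interrupt handler: the transfer finished, but we cannot touch the
 * userspace buffer from interrupt context, so kick the iocb instead. */
static irqreturn_t mydev_irq(int irq, void *data)
{
    struct mydev *dev = data;

    kick_iocb(dev->pending_iocb);        /* schedules the retry method */
    return IRQ_HANDLED;
}

/* Retry method, run in the submitting process's context: now it is safe
 * to copy data out to userspace.  Returning the byte count lets the aio
 * core call aio_complete() on our behalf. */
static ssize_t mydev_read_retry(struct kiocb *iocb)
{
    struct mydev *dev = iocb->ki_filp->private_data;

    return mydev_copy_to_user(dev);      /* uses copy_to_user() internally */
}
```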
Truly asynchronous behavior
One would hope that doing asynchronous I/O in the kernel would be more
efficient. There has been some discussion about the subject; however, the
discussion is irrelevant unless kernel drivers structure their aio_read and
aio_write file operations properly.
When I was searching for an asynchronous driver to use as an example, I found
several aio_read and aio_write file operations that weren't asynchronous.
They would simply initiate their transaction and then wait for it to finish,
like a simple blocking
read file operation.
A truly asynchronous implementation of the aio_read and aio_write file
operations would call kick_iocb() and
aio_complete() somewhere in the driver. By
running grep -r 'kick_iocb\|aio_complete' in the 2.6.20-rc4 kernel source tree,
I came up with the following files:
- drivers/usb/gadgetfs/inode.c - used on USB devices that run Linux.
- fs/aio.c - the file that handles the aio syscalls and calls down into the driver's file operations table.
- fs/block_dev.c - the block device driver.
- fs/nfs/direct.c - the NFS direct I/O implementation.
In the case of NFS and block devices, you only get asynchronous behavior if the file descriptor has the O_DIRECT flag set. This asks the kernel to attempt to bypass the cache and write into or read from the userspace buffer directly. As the man page for open says, "In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching." Bottom line: the only useful asynchronous I/O implementation in the kernel is in gadgetfs.
Another way of dealing with asynchronous I/O in the kernel is being discussed on the LKML mailing list. The idea is to create a thread with a tiny stack whenever an I/O operation blocks. This light-weight thread is called a fibril. When a fibril blocks, the scheduler will be called, and there is a chance that the userland program that made the syscall will be scheduled. It can continue with other operations while it waits for the I/O operation to complete.
In some ways, this sounds suspiciously similar to GNU libc's userland aio implementation. There may be some performance gain to creating threads in the kernel rather than in userspace, but the kernel developers are still deciding on the details of fibrils.
The benefit of fibrils is that device driver writers can write simple blocking code and the kernel will turn it into asynchronous I/O. If fibrils really catch on, I think that libaio and the kernel aio system calls will fall out of use.
Few people truly understand asynchronous I/O, and documentation on the subject
is sparse. The beginning userspace application writer will probably use the
aio_read(3) and aio_write(3) functions without understanding that they
are not using the kernel-space asynchronous I/O implementation.
The few kernel drivers that implement their aio_read and aio_write file
operations in a truly asynchronous manner are rarely used and may not be well
tested. Until a true performance gain is shown when kernel aio is used instead of
GNU libc's implementation, I would suggest that kernel driver writers write blocking code. The
implementation is simpler, and those userspace applications that want
asynchronous calls can use GNU libc.