2003-07-01 01:06:12

by Daniel McNeil

Subject: [PATCH 2.5.73-mm2] aio O_DIRECT no readahead

More testing on AIO with O_DIRECT and /dev/raw/. AIO reads were
using a lot more CPU time than dd on the raw partition, even with
the io_queue_wait patch. It turns out that aio does readahead even
for O_DIRECT. Here's a patch that fixes it. Here is some output
from time for reading a 90MB partition on a raw device with 32k I/O:

CMD RUN        Kernel version                               I/O size  real    user    sys
=============  ===========================================  ========  ======  ======  ======
dd             2.5.73-mm2                                   32k       2.514s  0.000s  0.200s
aio (1 iocb)   2.5.73-mm2                                   32k       3.408s  1.120s  2.250s
aio (32 iocb)  2.5.73-mm2                                   32k       2.994s  0.830s  2.160s
aio (1 iocb)   2.5.73-mm2+patch.aiodio                      32k       2.509s  0.850s  1.660s
aio (1 iocb)   2.5.73-mm2+patch.aiodio+patch.io_queue_wait  32k       2.566s  0.010s  0.240s
aio (32 iocb)  2.5.73-mm2+patch.aiodio+patch.io_queue_wait  32k       2.465s  0.010s  0.080s

With this patch and my previous io_queue_wait patch, the CPU time
drops to very little when doing AIO with O_DIRECT.
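
The test program itself is not included in this thread; for reference, a
minimal sketch of the kind of O_DIRECT libaio read loop being timed above
might look like the following. The device path, 4k buffer alignment, block
size, and queue depth are illustrative, and error handling is pared down.

/* aioread.c -- submit QUEUE_DEPTH O_DIRECT reads at a time and wait for
 * each batch to complete; build with: gcc aioread.c -laio */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE	(32 * 1024)	/* 32k I/O, as in the table above */
#define QUEUE_DEPTH	32		/* 1 or 32 iocbs in the runs above */

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/raw/raw1";
	struct iocb iocbs[QUEUE_DEPTH], *iocbp[QUEUE_DEPTH];
	struct io_event events[QUEUE_DEPTH];
	void *buf[QUEUE_DEPTH];
	io_context_t ctx = 0;
	off_t off = 0;
	int i, fd;

	fd = open(dev, O_RDONLY | O_DIRECT);
	if (fd < 0 || io_setup(QUEUE_DEPTH, &ctx) < 0) {
		fprintf(stderr, "setup failed\n");
		return 1;
	}
	for (i = 0; i < QUEUE_DEPTH; i++) {
		/* O_DIRECT requires aligned buffers */
		if (posix_memalign(&buf[i], 4096, BLOCK_SIZE))
			return 1;
		iocbp[i] = &iocbs[i];
	}

	for (;;) {
		int n, done = 0;

		/* queue a batch of reads, then wait for all of them */
		for (i = 0; i < QUEUE_DEPTH; i++, off += BLOCK_SIZE)
			io_prep_pread(&iocbs[i], fd, buf[i], BLOCK_SIZE, off);
		if (io_submit(ctx, QUEUE_DEPTH, iocbp) != QUEUE_DEPTH)
			break;
		n = io_getevents(ctx, QUEUE_DEPTH, QUEUE_DEPTH, events, NULL);
		for (i = 0; i < n; i++)
			if (events[i].res != BLOCK_SIZE)
				done = 1;	/* short read or error: end of device */
		if (n < QUEUE_DEPTH || done)
			break;
	}

	io_destroy(ctx);
	close(fd);
	return 0;
}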

Daniel McNeil <[email protected]>


Attachments:
patch-2.5.73-mm2.aiodio (1.05 kB)

2003-07-01 02:59:16

by Suparna Bhattacharya

Subject: Re: [PATCH 2.5.73-mm2] aio O_DIRECT no readahead

On Mon, Jun 30, 2003 at 06:19:51PM -0700, Daniel McNeil wrote:
> More testing on AIO with O_DIRECT and /dev/raw/. AIO reads were
> using a lot more CPU time than dd on the raw partition, even with
> the io_queue_wait patch. It turns out that aio does readahead even
> for O_DIRECT. Here's a patch that fixes it. Here is some output

That was a good catch. Thanks!

> from time for reading a 90MB partition on a raw device with 32k I/O:
>
> CMD RUN        Kernel version                               I/O size  real    user    sys
> =============  ===========================================  ========  ======  ======  ======
> dd             2.5.73-mm2                                   32k       2.514s  0.000s  0.200s
> aio (1 iocb)   2.5.73-mm2                                   32k       3.408s  1.120s  2.250s
> aio (32 iocb)  2.5.73-mm2                                   32k       2.994s  0.830s  2.160s
> aio (1 iocb)   2.5.73-mm2+patch.aiodio                      32k       2.509s  0.850s  1.660s
> aio (1 iocb)   2.5.73-mm2+patch.aiodio+patch.io_queue_wait  32k       2.566s  0.010s  0.240s
> aio (32 iocb)  2.5.73-mm2+patch.aiodio+patch.io_queue_wait  32k       2.465s  0.010s  0.080s
>
> With this patch and my previous io_queue_wait patch, the CPU time
> drops to very little when doing AIO with O_DIRECT.
>
> Daniel McNeil <[email protected]>

> --- linux-2.5.73-mm2/fs/aio.c 2003-06-30 16:39:15.874216228 -0700
> +++ linux-2.5.73-mm2.aiodio/fs/aio.c 2003-06-30 15:38:46.000000000 -0700
> @@ -1380,15 +1380,20 @@ ssize_t aio_setup_iocb(struct kiocb *kio
> break;
> ret = -EINVAL;
> if (file->f_op->aio_read) {
> - struct address_space *mapping =
> - file->f_dentry->d_inode->i_mapping;
> - unsigned long index = kiocb->ki_pos >> PAGE_CACHE_SHIFT;
> - unsigned long end = (kiocb->ki_pos + kiocb->ki_left)
> - >> PAGE_CACHE_SHIFT;
> -
> - for (; index < end; index++) {
> - page_cache_readahead(mapping, &file->f_ra,
> - file, index);
> + /*
> + * Do not do readahead for DIRECT i/o
> + */
> + if (!(file->f_flags & O_DIRECT)) {
> + struct address_space *mapping =
> + file->f_dentry->d_inode->i_mapping;
> + unsigned long index = kiocb->ki_pos >> PAGE_CACHE_SHIFT;
> + unsigned long end = (kiocb->ki_pos + kiocb->ki_left)
> + >> PAGE_CACHE_SHIFT;
> +
> + for (; index < end; index++) {
> + page_cache_readahead(mapping, &file->f_ra,
> + file, index);
> + }
> }
> kiocb->ki_retry = aio_pread;
> }


--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Labs, India

2003-07-03 14:11:28

by Suparna Bhattacharya

Subject: Re: [PATCH 2.5.73-mm2] aio O_DIRECT no readahead

On Mon, Jun 30, 2003 at 06:19:51PM -0700, Daniel McNeil wrote:
> More testing on AIO with O_DIRECT and /dev/raw/. AIO reads were
> using a lot more CPU time than dd on the raw partition, even with
> the io_queue_wait patch. It turns out that aio does readahead even
> for O_DIRECT. Here's a patch that fixes it. Here is some output
> from time for reading a 90MB partition on a raw device with 32k I/O:

I'd actually have liked to move the readahead out of aio.c (avoiding
the duplication and bugs like this one), and instead modify the
common read code to issue readahead for all the pages of a given
request upfront, rather than in the loop where it waits for pages
to become up-to-date.

Does anyone foresee any issues with doing it this way?

Something like the appended patch.
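
In essence, the change works out the last page the request will touch,
issues page_cache_readahead() for that whole range once before the
per-page loop, and skips that step on AIO retries, since the first pass
already issued it. A condensed sketch of the resulting
do_generic_mapping_read() flow (paraphrasing the appended patch, not a
drop-in replacement):

	index = *ppos >> PAGE_CACHE_SHIFT;
	last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
	end_index = i_size_read(inode) >> PAGE_CACHE_SHIFT;
	if (last > end_index)
		last = end_index;

	/* On an AIO retry this code runs again for the same request,
	 * so don't repeat the readahead the first pass already did. */
	if (!(in_aio() && is_retried_kiocb(io_wait_to_kiocb(current->io_wait))))
		for (; index < last; index++)
			page_cache_readahead(mapping, ra, filp, index);

	index = *ppos >> PAGE_CACHE_SHIFT;
	/*
	 * ... then the usual per-page copy loop, minus the per-iteration
	 * page_cache_readahead() call ...
	 */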

Regards
Suparna

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Labs, India


--- linux-2.5.73-mm1/fs/aio.c Thu Jul 3 21:19:27 2003
+++ linux-2.5.73-mm1-aio/fs/aio.c Thu Jul 3 21:44:07 2003
@@ -1379,19 +1379,8 @@ ssize_t aio_setup_iocb(struct kiocb *kio
kiocb->ki_left)))
break;
ret = -EINVAL;
- if (file->f_op->aio_read) {
- struct address_space *mapping =
- file->f_dentry->d_inode->i_mapping;
- unsigned long index = kiocb->ki_pos >> PAGE_CACHE_SHIFT;
- unsigned long end = (kiocb->ki_pos + kiocb->ki_left)
- >> PAGE_CACHE_SHIFT;
-
- for (; index < end; index++) {
- page_cache_readahead(mapping, &file->f_ra,
- file, index);
- }
+ if (file->f_op->aio_read)
kiocb->ki_retry = aio_pread;
- }
break;
case IOCB_CMD_PWRITE:
ret = -EBADF;
diff -pur -X dontdiff linux-2.5.73-mm1/include/linux/aio.h linux-2.5.73-mm1-aio/include/linux/aio.h
--- linux-2.5.73-mm1/include/linux/aio.h Thu Jul 3 21:19:33 2003
+++ linux-2.5.73-mm1-aio/include/linux/aio.h Thu Jul 3 19:38:43 2003
@@ -179,6 +180,9 @@ int FASTCALL(io_submit_one(struct kioctx
dump_stack(); \
}

+#define io_wait_to_kiocb(wait) container_of(wait, struct kiocb, ki_wait)
+#define is_retried_kiocb(iocb) ((iocb)->ki_retried > 1)
+
#include <linux/aio_abi.h>

static inline struct kiocb *list_kiocb(struct list_head *h)
diff -pur -X dontdiff linux-2.5.73-mm1/mm/filemap.c linux-2.5.73-mm1-aio/mm/filemap.c
--- linux-2.5.73-mm1/mm/filemap.c Thu Jul 3 21:19:34 2003
+++ linux-2.5.73-mm1-aio/mm/filemap.c Thu Jul 3 21:21:41 2003
@@ -601,21 +601,39 @@ void do_generic_mapping_read(struct addr
read_actor_t actor)
{
struct inode *inode = mapping->host;
- unsigned long index, offset;
+ unsigned long index, offset, last, end_index;
struct page *cached_page;
+ loff_t isize = i_size_read(inode);
int error;

cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;

+ last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
+ end_index = isize >> PAGE_CACHE_SHIFT;
+ if (last > end_index)
+ last = end_index;
+
+ /* Don't repeat the readahead if we are executing aio retries */
+ if (in_aio()) {
+ if (is_retried_kiocb(io_wait_to_kiocb(current->io_wait)))
+ goto done_readahead;
+ }
+
+ /*
+ * Let the readahead logic know upfront about all
+ * the pages we'll need to satisfy this request
+ */
+ for (; index < last; index++)
+ page_cache_readahead(mapping, ra, filp, index);
+ index = *ppos >> PAGE_CACHE_SHIFT;
+
+done_readahead:
for (;;) {
struct page *page;
- unsigned long end_index, nr, ret;
- loff_t isize = i_size_read(inode);
+ unsigned long nr, ret;

- end_index = isize >> PAGE_CACHE_SHIFT;
-
if (index > end_index)
break;
nr = PAGE_CACHE_SIZE;
@@ -626,8 +644,6 @@ void do_generic_mapping_read(struct addr
}

cond_resched();
- if (is_sync_wait(current->io_wait))
- page_cache_readahead(mapping, ra, filp, index);

nr = nr - offset;
find_page: