2006-11-03 08:23:40

by Brent Baccala

Subject: async I/O seems to be blocking on 2.6.15

Hello -

I'm running 2.6.15 (Debian) on a Pentium M laptop, PCI attached ext3
filesystem.

I'm writing my first asynchronous I/O program, and for a while I
thought I was really doing something wrong, but more and more I'm
starting to conclude that the problem might be in the kernel.

Basically, I've narrowed things down to a test program which opens a
large (700 MB) file in O_DIRECT mode and fires off 100 one MB async
reads for the first 100 MB of data. The enqueues take about 5 seconds
to complete, which is also about the amount of time this disk needs to
read 100 MB, so I suspect that it's blocking.

I've gotten the POSIX AIO interface at least tolerably running using
the GLIBC thread-based implementation, but I really want the native
interface working.

I whittled the test program down to use system calls instead of the
POSIX AIO library, and I'm attaching a copy. You put a big file at
'testfile' (it just reads it) and run the program:


baccala@debian ~/src/endgame$ time ./testaio
Enqueues starting
Enqueues complete

real 0m5.327s
user 0m0.004s
sys 0m0.740s
baccala@debian ~/src/endgame$


Of that five seconds, it's almost all spent between the two "enqueues"
messages.

If anybody can shed any light on this, I'd appreciate your feedback
direct to [email protected] (I don't read the list).

Thank you.



-bwb

Brent Baccala
[email protected]


Attachments:
testaio.c (1.50 kB)

2006-11-03 12:19:05

by Jens Axboe

Subject: Re: async I/O seems to be blocking on 2.6.15

On Fri, Nov 03 2006, Brent Baccala wrote:
> Hello -
>
> I'm running 2.6.15 (Debian) on a Pentium M laptop, PCI attached ext3
> filesystem.
>
> I'm writing my first asynchronous I/O program, and for a while I
> thought I was really doing something wrong, but more and more I'm
> starting to conclude that the problem might be in the kernel.
>
> Basically, I've narrowed things down to a test program which opens a
> large (700 MB) file in O_DIRECT mode and fires off 100 one MB async
> reads for the first 100 MB of data. The enqueues take about 5 seconds
> to complete, which is also about the amount of time this disk needs to
> read 100 MB, so I suspect that it's blocking.
>
> I've gotten the POSIX AIO interface at least tolerably running using
> the GLIBC thread-based implementation, but I really want the native
> interface working.
>
> I whittled the test program down to use system calls instead of the
> POSIX AIO library, and I'm attaching a copy. You put a big file at
> 'testfile' (it just reads it) and run the program:
>
>
> baccala@debian ~/src/endgame$ time ./testaio
> Enqueues starting
> Enqueues complete
>
> real 0m5.327s
> user 0m0.004s
> sys 0m0.740s
> baccala@debian ~/src/endgame$
>
>
> Of that five seconds, it's almost all spent between the two "enqueues"
> messages.

You don't mention what hardware you are running this on (the disk sub
system). io_submit() will block, if you run out of block layer requests.
We have 128 of those by default, but if your io ends up getting chopped
into somewhat smaller bits than 1MiB each, then you end up having to
block on allocation of those. So let's say your /src is mounted on
/dev/sdaX, try:

# echo 512 > /sys/block/sda/queue/nr_requests

(substitute sda for whatever device your /src is on)

and re-test. The time between starting and complete should be a lot
smaller, now that you are not blocking on blkdev request allocation. You
may also want to look at the max_sectors_kb in the queue/ directory,
that'll tell you how large a single io will be at most once it reaches
the driver.

--
Jens Axboe

2006-11-03 15:58:17

by Brent Baccala

Subject: Re: async I/O seems to be blocking on 2.6.15

On Fri, 3 Nov 2006, Jens Axboe wrote:

> On Fri, Nov 03 2006, Brent Baccala wrote:
>> Hello -
>>
>> I'm running 2.6.15 (Debian) on a Pentium M laptop, PCI attached ext3
>> filesystem.
>>
>> I'm writing my first asynchronous I/O program, and for a while I
>> thought I was really doing something wrong, but more and more I'm
>> starting to conclude that the problem might be in the kernel.
>>
>> Basically, I've narrowed things down to a test program which opens a
>> large (700 MB) file in O_DIRECT mode and fires off 100 one MB async
>> reads for the first 100 MB of data. The enqueues take about 5 seconds
>> to complete, which is also about the amount of time this disk needs to
>> read 100 MB, so I suspect that it's blocking.
>>
>> I've gotten the POSIX AIO interface at least tolerably running using
>> the GLIBC thread-based implementation, but I really want the native
>> interface working.
>>
>> I whittled the test program down to use system calls instead of the
>> POSIX AIO library, and I'm attaching a copy. You put a big file at
>> 'testfile' (it just reads it) and run the program:
>>
>>
>> baccala@debian ~/src/endgame$ time ./testaio
>> Enqueues starting
>> Enqueues complete
>>
>> real 0m5.327s
>> user 0m0.004s
>> sys 0m0.740s
>> baccala@debian ~/src/endgame$
>>
>>
>> Of that five seconds, it's almost all spent between the two "enqueues"
>> messages.
>
> You don't mention what hardware you are running this on (the disk sub
> system). io_submit() will block, if you run out of block layer requests.
> We have 128 of those by default, but if your io ends up getting chopped
> into somewhat smaller bits than 1MiB each, then you end up having to
> block on allocation of those. So let's say your /src is mounted on
> /dev/sdaX, try:
>
> # echo 512 > /sys/block/sda/queue/nr_requests
>
> (substitute sda for whatever device your /src is on)
>
> and re-test. The time between starting and complete should be a lot
> smaller, now that you are not blocking on blkdev request allocation. You
> may also want to look at the max_sectors_kb in the queue/ directory,
> that'll tell you how large a single io will be at most once it reaches
> the driver.
>
> --
> Jens Axboe
>

OK, good question. Here's what the kernel reports about the controller:

ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ALI15X3: IDE controller at PCI slot 0000:00:04.0
ACPI: PCI Interrupt 0000:00:04.0[A]: no GSI
ALI15X3: chipset revision 195
ALI15X3: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xeff0-0xeff7, BIOS settings: hda:DMA, hdb:pio
ALI15X3: simplex device: DMA forced
ide1: BM-DMA at 0xeff8-0xefff, BIOS settings: hdc:DMA, hdd:DMA

Your suggestion definitely helped! /sys/block/hda/queue/max_sectors_kb
reported 128, so I tried:

# echo 1024 > /sys/block/hda/queue/nr_requests

And I am now definitely seeing async behavior!

The enqueues still take a noticeable amount of time, though, just a lot
less than before. They average 1 second total. That's 100 one-MB
reads, broken down into 128 KB blocks, I suppose, for a total of 800
low-level reads. Setting nr_requests higher (2048) doesn't seem to do
any more good.

I can see that you've put me on the right track, but I am still
puzzling... any idea what the remaining second is being used for?



-bwb

Brent Baccala
[email protected]

2006-11-03 16:00:20

by Jens Axboe

Subject: Re: async I/O seems to be blocking on 2.6.15

On Fri, Nov 03 2006, Brent Baccala wrote:
> On Fri, 3 Nov 2006, Jens Axboe wrote:
>
> >On Fri, Nov 03 2006, Brent Baccala wrote:
> >>Hello -
> >>
> >>I'm running 2.6.15 (Debian) on a Pentium M laptop, PCI attached ext3
> >>filesystem.
> >>
> >>I'm writing my first asynchronous I/O program, and for a while I
> >>thought I was really doing something wrong, but more and more I'm
> >>starting to conclude that the problem might be in the kernel.
> >>
> >>Basically, I've narrowed things down to a test program which opens a
> >>large (700 MB) file in O_DIRECT mode and fires off 100 one MB async
> >>reads for the first 100 MB of data. The enqueues take about 5 seconds
> >>to complete, which is also about the amount of time this disk needs to
> >>read 100 MB, so I suspect that it's blocking.
> >>
> >>I've gotten the POSIX AIO interface at least tolerably running using
> >>the GLIBC thread-based implementation, but I really want the native
> >>interface working.
> >>
> >>I whittled the test program down to use system calls instead of the
> >>POSIX AIO library, and I'm attaching a copy. You put a big file at
> >>'testfile' (it just reads it) and run the program:
> >>
> >>
> >>baccala@debian ~/src/endgame$ time ./testaio
> >>Enqueues starting
> >>Enqueues complete
> >>
> >>real 0m5.327s
> >>user 0m0.004s
> >>sys 0m0.740s
> >>baccala@debian ~/src/endgame$
> >>
> >>
> >>Of that five seconds, it's almost all spent between the two "enqueues"
> >>messages.
> >
> >You don't mention what hardware you are running this on (the disk sub
> >system). io_submit() will block, if you run out of block layer requests.
> >We have 128 of those by default, but if your io ends up getting chopped
> >into somewhat smaller bits than 1MiB each, then you end up having to
> >block on allocation of those. So let's say your /src is mounted on
> >/dev/sdaX, try:
> >
> ># echo 512 > /sys/block/sda/queue/nr_requests
> >
> >(substitute sda for whatever device your /src is on)
> >
> >and re-test. The time between starting and complete should be a lot
> >smaller, now that you are not blocking on blkdev request allocation. You
> >may also want to look at the max_sectors_kb in the queue/ directory,
> >that'll tell you how large a single io will be at most once it reaches
> >the driver.
> >
> >--
> >Jens Axboe
> >
>
> OK, good question. Here's what the kernel reports about the controller:
>
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> ALI15X3: IDE controller at PCI slot 0000:00:04.0
> ACPI: PCI Interrupt 0000:00:04.0[A]: no GSI
> ALI15X3: chipset revision 195
> ALI15X3: not 100% native mode: will probe irqs later
> ide0: BM-DMA at 0xeff0-0xeff7, BIOS settings: hda:DMA, hdb:pio
> ALI15X3: simplex device: DMA forced
> ide1: BM-DMA at 0xeff8-0xefff, BIOS settings: hdc:DMA, hdd:DMA
>
> Your suggestion definitely helped! /sys/block/hda/queue/max_sectors_kb
> reported 128, so I tried:
>
> # echo 1024 > /sys/block/hda/queue/nr_requests
>
> And I am now definitely seeing async behavior!
>
> The enqueues still take a noticeable amount of time, though, just a lot
> less than before. They average 1 second total. That's 100 one-MB
> reads, broken down into 128 KB blocks, I suppose, for a total of 800
> low-level reads. Setting nr_requests higher (2048) doesn't seem to do
> any more good.
>
> I can see that you've put me on the right track, but I am still
> puzzling... any idea what the remaining second is being used for?

Try to time it (visual output of the app is not very telling, and it's
buffered) and then apply some profiling.

--
Jens Axboe

2006-11-03 17:09:58

by Brent Baccala

Subject: Re: async I/O seems to be blocking on 2.6.15

On Fri, 3 Nov 2006, Jens Axboe wrote:

> On Fri, Nov 03 2006, Brent Baccala wrote:
>>
>> ...
>>
>> The enqueues still take a noticeable amount of time, though, just a lot
>> less than before. They average 1 second total. That's 100 one-MB
>> reads, broken down into 128 KB blocks, I suppose, for a total of 800
>> low-level reads. Setting nr_requests higher (2048) doesn't seem to do
>> any more good.
>>
>> I can see that you've put me on the right track, but I am still
>> puzzling... any idea what the remaining second is being used for?
>
> Try to time it (visual output of the app is not very telling, and it's
> buffered) and then apply some profiling.
>

Yeah, I did time it. I bracketed the enqueues with calls to
gettimeofday(). It takes them about a second to run, and I made sure
that the fprintfs were outside the timing checks. Here's the actual code:


fprintf(stderr, "Enqueues starting\n");
gettimeofday(&tv1, NULL);

for (i = 0; i < NUMAIOS; i++) {

    memset(&iocb[i], 0, sizeof(struct iocb));

    iocb[i].aio_lio_opcode = IOCB_CMD_PREAD;
    iocb[i].aio_fildes = fd;
    iocb[i].aio_buf = (unsigned long) buffer[i];
    iocb[i].aio_nbytes = BUFFER_BYTES;
    iocb[i].aio_offset = BUFFER_BYTES * i;
    /* aiocb[i].aio_offset = 0; */

    iocbp[0] = &iocb[i];
    if (io_submit(aio_default_context, 1, iocbp) != 1) {
        perror("");
        fprintf(stderr, "Can't enqueue aio_read %d\n", i);
    }
}

gettimeofday(&tv2, NULL);
subtract_timeval(&tv2, &tv1);
sprint_timeval(strbuf, &tv2);
fprintf(stderr, "Enqueues complete in %s\n", strbuf);


And the output:


baccala@debian ~/src/endgame$ time ./testaio
Enqueues starting
Enqueues complete in 1.187s

real 0m5.335s
user 0m0.001s
sys 0m0.760s
baccala@debian ~/src/endgame$


What kind of profiling did you have in mind? Kernel profiling? As
you can see, its user time is basically nil.



-bwb

Brent Baccala
[email protected]

2006-11-03 17:30:10

by Brent Baccala

Subject: Re: async I/O seems to be blocking on 2.6.15

On Fri, 3 Nov 2006, Jens Axboe wrote:

> Try to time it (visual output of the app is not very telling, and it's
> buffered) and then apply some profiling.

OK, a little more info. I added gettimeofday() calls after each call
to io_submit(), put the timevals in an array, and after everything was
done computed the difference between each timeval and the program start
time, as well as the deltas. I got this:

0: 0.080s
1: 0.086s 0.006s
2: 0.102s 0.016s
3: 0.111s 0.008s
4: 0.118s 0.007s
5: 0.134s 0.015s
6: 0.141s 0.006s
7: 0.148s 0.006s
8: 0.158s 0.009s
9: 0.164s 0.006s
...
96: 1.036s 0.007s
97: 1.044s 0.007s
98: 1.147s 0.102s
99: 1.155s 0.008s

98 appears to be an aberration. Perhaps three of the times on an
average run are around a tenth of a second; all of the others are
pretty steady at 7 or 8 microseconds. So, it's basically linear in
its time consumption.

Does 7 microseconds seem a bit excessive for an io_submit (and a
gettimeofday)?



-bwb

Brent Baccala
[email protected]

2006-11-05 12:13:21

by Jens Axboe

Subject: Re: async I/O seems to be blocking on 2.6.15

On Fri, Nov 03 2006, Brent Baccala wrote:
> On Fri, 3 Nov 2006, Jens Axboe wrote:
>
> >Try to time it (visual output of the app is not very telling, and it's
> >buffered) and then apply some profiling.
>
> OK, a little more info. I added gettimeofday() calls after each call
> to io_submit(), put the timevals in an array, and after everything was
> done computed the difference between each timeval and the program start
> time, as well as the deltas. I got this:
>
> 0: 0.080s
> 1: 0.086s 0.006s
> 2: 0.102s 0.016s
> 3: 0.111s 0.008s
> 4: 0.118s 0.007s
> 5: 0.134s 0.015s
> 6: 0.141s 0.006s
> 7: 0.148s 0.006s
> 8: 0.158s 0.009s
> 9: 0.164s 0.006s
> ...
> 96: 1.036s 0.007s
> 97: 1.044s 0.007s
> 98: 1.147s 0.102s
> 99: 1.155s 0.008s
>
> 98 appears to be an aberration. Perhaps three of the times on an
> average run are around a tenth of a second; all of the others are
> pretty steady at 7 or 8 microseconds. So, it's basically linear in
> its time consumption.
>
> Does 7 microseconds seem a bit excessive for an io_submit (and a
> gettimeofday)?

I guess you mean milliseconds, not microseconds. 7 milliseconds seems way
too long. I repeated your test here, and the 100 submits take 97000
microseconds here - or 97 milliseconds. So that's a little less than 1
msec per io_submit. Still pretty big. You can experiment with oprofile
to profile where the kernel spends its time in that period.

--
Jens Axboe

2006-11-06 06:42:41

by Brent Baccala

Subject: Re: async I/O seems to be blocking on 2.6.15

On Sun, 5 Nov 2006, Jens Axboe wrote:

> On Fri, Nov 03 2006, Brent Baccala wrote:
>>
>> Does 7 microseconds seem a bit excessive for an io_submit (and a
>> gettimeofday)?
>
> I guess you mean milliseconds, not microseconds. 7 milliseconds seems way
> too long. I repeated your test here, and the 100 submits take 97000
> microseconds here - or 97 milliseconds. So that's a little less than 1
> msec per io_submit. Still pretty big. You can experiment with oprofile
> to profile where the kernel spends its time in that period.
>
> --
> Jens Axboe
>

Yes, of course, milliseconds. I have enough other problems with this
program (measured in minutes, and no mistake that) that I doubt I'll
be profiling the kernel any time soon, but thank you for your help.

More than anything else, you've made me understand that I can't just
fire off a bunch of async requests like I'm tossing peanuts across the
table. I've really got to pay attention to what's in that kernel
queue and how it gets managed.



-bwb

Brent Baccala
[email protected]

2006-11-06 10:41:45

by Jens Axboe

Subject: Re: async I/O seems to be blocking on 2.6.15

On Mon, Nov 06 2006, Brent Baccala wrote:
> On Sun, 5 Nov 2006, Jens Axboe wrote:
>
> >On Fri, Nov 03 2006, Brent Baccala wrote:
> >>
> >>Does 7 microseconds seem a bit excessive for an io_submit (and a
> >>gettimeofday)?
> >
> >I guess you mean milliseconds, not microseconds. 7 milliseconds seems way
> >too long. I repeated your test here, and the 100 submits take 97000
> >microseconds here - or 97 milliseconds. So that's a little less than 1
> >msec per io_submit. Still pretty big. You can experiment with oprofile
> >to profile where the kernel spends its time in that period.
> >
> >--
> >Jens Axboe
> >
>
> Yes, of course, milliseconds. I have enough other problems with this
> program (measured in minutes, and no mistake that) that I doubt I'll
> be profiling the kernel any time soon, but thank you for your help.
>
> More than anything else, you've made me understand that I can't just
> fire off a bunch of async requests like I'm tossing peanuts across the
> table. I've really got to pay attention to what's in that kernel
> queue and how it gets managed.

Yeah, I'm afraid so. We really should be returning EAGAIN or something
like that for the congested condition, though.

--
Jens Axboe

2006-11-06 15:52:48

by Phillip Susi

Subject: Re: async I/O seems to be blocking on 2.6.15

Jens Axboe wrote:
> Yeah, I'm afraid so. We really should be returning EAGAIN or something
> like that for the congested condition, though.

How would user mode know when it was safe to retry the request?


2006-11-06 16:00:45

by Jens Axboe

Subject: Re: async I/O seems to be blocking on 2.6.15

On Mon, Nov 06 2006, Phillip Susi wrote:
> Jens Axboe wrote:
> >Yeah, I'm afraid so. We really should be returning EAGAIN or something
> >like that for the congested condition, though.
>
> How would user mode know when it was safe to retry the request?

You could optimistically retry when you had reaped some completed
events, or use some controlled way of blocking for free request
notification. There are many ways, most of them share the fact that the
time between notification and new io_submit() may change the picture, in
which case you'd get EAGAIN once more.

The important bit is imho to make the blocking at least deterministic.
At some point you _have_ to block for resources, but it's not very
polite to be blocking nondeterministically for a considerable time.

--
Jens Axboe

2006-11-06 17:04:27

by Phillip Susi

Subject: Re: async I/O seems to be blocking on 2.6.15

Jens Axboe wrote:
> You could optimistically retry when you had reaped some completed
> events, or use some controlled way of blocking for free request
> notification. There are many ways, most of them share the fact that the
> time between notification and new io_submit() may change the picture, in
> which case you'd get EAGAIN once more.
>
> The important bit is imho to make the blocking at least deterministic.
> At some point you _have_ to block for resources, but it's not very
> polite to be blocking for a considerable time indeterministically.
>

Right, but there currently exists no mechanism for waiting until there
is room in the queue, is there? My point was that this would be required
in order to return EAGAIN.

2006-11-06 17:07:59

by Jens Axboe

Subject: Re: async I/O seems to be blocking on 2.6.15

On Mon, Nov 06 2006, Phillip Susi wrote:
> Jens Axboe wrote:
> >You could optimistically retry when you had reaped some completed
> >events, or use some controlled way of blocking for free request
> >notification. There are many ways, most of them share the fact that the
> >time between notification and new io_submit() may change the picture, in
> >which case you'd get EAGAIN once more.
> >
> >The important bit is imho to make the blocking at least deterministic.
> >At some point you _have_ to block for resources, but it's not very
> >polite to be blocking nondeterministically for a considerable time.
> >
>
> Right, but there currently exists no mechanism for waiting until there
> is room in the queue, is there? My point was that this would be required
> in order to return EAGAIN.

No mechanism exists, since we don't return EAGAIN for this condition
right now. I wouldn't say it's required, just retrying when you have
reaped one (or more) completed events should be good enough. You can't
do much better than that anyway, with others beating on the disk as
well.

--
Jens Axboe

2006-11-06 21:22:17

by Chen, Kenneth W

Subject: RE: async I/O seems to be blocking on 2.6.15

Jens Axboe wrote on Sunday, November 05, 2006 4:15 AM
> On Fri, Nov 03 2006, Brent Baccala wrote:
> > On Fri, 3 Nov 2006, Jens Axboe wrote:
> >
> > >Try to time it (visual output of the app is not very telling, and it's
> > >buffered) and then apply some profiling.
> >
> > OK, a little more info. I added gettimeofday() calls after each call
> > to io_submit(), put the timevals in an array, and after everything was
> > done computed the difference between each timeval and the program start
> > time, as well as the deltas. I got this:
> >
> > 0: 0.080s
> > 1: 0.086s 0.006s
> > 2: 0.102s 0.016s
> > 3: 0.111s 0.008s
> > 4: 0.118s 0.007s
> > 5: 0.134s 0.015s
> > 6: 0.141s 0.006s
> > 7: 0.148s 0.006s
> > 8: 0.158s 0.009s
> > 9: 0.164s 0.006s
> > ...
> > 96: 1.036s 0.007s
> > 97: 1.044s 0.007s
> > 98: 1.147s 0.102s
> > 99: 1.155s 0.008s
> >
> > 98 appears to be an aberration. Perhaps three of the times on an
> > average run are around a tenth of a second; all of the others are
> > pretty steady at 7 or 8 microseconds. So, it's basically linear in
> > its time consumption.
> >
> > Does 7 microseconds seem a bit excessive for an io_submit (and a
> > gettimeofday)?
>
> I guess you mean milliseconds, not microseconds. 7 milliseconds seems way
> too long. I repeated your test here, and the 100 submits take 97000
> microseconds here - or 97 milliseconds. So that's a little less than 1
> msec per io_submit. Still pretty big. You can experiment with oprofile
> to profile where the kernel spends its time in that period.


I've tried that myself too and saw a similar result. One thing to note is
that the I/Os being submitted are pretty big at 1MB, so the vector list inside
the bio is going to be pretty long and it will take a while to construct.
Dropping the size of each I/O to something like 4KB will significantly reduce
the time. I haven't measured whether the time to submit I/O
grows linearly with respect to I/O size. Most likely it will. If it does
not, then we might have a scaling problem (though I don't believe we have
this problem).

- Ken


2006-11-07 00:04:05

by Brent Baccala

Subject: RE: async I/O seems to be blocking on 2.6.15

On Mon, 6 Nov 2006, Chen, Kenneth W wrote:

> I've tried that myself too and saw a similar result. One thing to note is
> that the I/Os being submitted are pretty big at 1MB, so the vector list inside
> the bio is going to be pretty long and it will take a while to construct.
> Dropping the size of each I/O to something like 4KB will significantly reduce
> the time. I haven't measured whether the time to submit I/O
> grows linearly with respect to I/O size. Most likely it will. If it does
> not, then we might have a scaling problem (though I don't believe we have
> this problem).
>
> - Ken
>
>

I'm basically an end user here (as far as the kernel is concerned), so
let me ask the basic "dumb user" question here:

How should I do my async I/O if I just want to read or write
sequentially through a file, using O_DIRECT, and letting the CPU get
some work done in the meantime? What about more random access?

I've already concluded that I should try to keep my read and write
files on separate disks and hopefully on separate controllers, but I
still seem to be fighting this thing to keep it from blocking.


-bwb

Brent Baccala
[email protected]

2006-11-07 00:24:19

by Chen, Kenneth W

Subject: RE: async I/O seems to be blocking on 2.6.15

Brent Baccala wrote on Monday, November 06, 2006 4:04 PM
> On Mon, 6 Nov 2006, Chen, Kenneth W wrote:
> > I've tried that myself too and saw a similar result. One thing to note is
> > that the I/Os being submitted are pretty big at 1MB, so the vector list inside
> > the bio is going to be pretty long and it will take a while to construct.
> > Dropping the size of each I/O to something like 4KB will significantly reduce
> > the time. I haven't measured whether the time to submit I/O
> > grows linearly with respect to I/O size. Most likely it will. If it does
> > not, then we might have a scaling problem (though I don't believe we have
> > this problem).
>
> I'm basically an end user here (as far as the kernel is concerned), so
> let me ask the basic "dumb user" question here:
>
> How should I do my async I/O if I just want to read or write
> sequentially through a file, using O_DIRECT, and letting the CPU get
> some work done in the meantime? What about more random access?
>
> I've already concluded that I should try to keep my read and write
> files on separate disks and hopefully on separate controllers, but I
> still seem to be fighting this thing to keep it from blocking.


You rightly complain that the kernel blocks in the async path when the
block queue is full. It would be nice if -EAGAIN were returned so the app
could do other stuff.

My earlier post was commenting on the measurement of I/O submit when
nothing blocks (which happens to be extremely important to me). Your
measurement of 7ms seems excessive, while mine and Jens's came in at around
90 microseconds, which isn't all that unreasonable given the size of the
I/O request.

- Ken

2006-11-07 07:23:58

by Jens Axboe

Subject: Re: async I/O seems to be blocking on 2.6.15

On Mon, Nov 06 2006, Chen, Kenneth W wrote:
> Jens Axboe wrote on Sunday, November 05, 2006 4:15 AM
> > On Fri, Nov 03 2006, Brent Baccala wrote:
> > > On Fri, 3 Nov 2006, Jens Axboe wrote:
> > >
> > > >Try to time it (visual output of the app is not very telling, and it's
> > > >buffered) and then apply some profiling.
> > >
> > > OK, a little more info. I added gettimeofday() calls after each call
> > > to io_submit(), put the timevals in an array, and after everything was
> > > done computed the difference between each timeval and the program start
> > > time, as well as the deltas. I got this:
> > >
> > > 0: 0.080s
> > > 1: 0.086s 0.006s
> > > 2: 0.102s 0.016s
> > > 3: 0.111s 0.008s
> > > 4: 0.118s 0.007s
> > > 5: 0.134s 0.015s
> > > 6: 0.141s 0.006s
> > > 7: 0.148s 0.006s
> > > 8: 0.158s 0.009s
> > > 9: 0.164s 0.006s
> > > ...
> > > 96: 1.036s 0.007s
> > > 97: 1.044s 0.007s
> > > 98: 1.147s 0.102s
> > > 99: 1.155s 0.008s
> > >
> > > 98 appears to be an aberration. Perhaps three of the times on an
> > > average run are around a tenth of a second; all of the others are
> > > pretty steady at 7 or 8 microseconds. So, it's basically linear in
> > > its time consumption.
> > >
> > > Does 7 microseconds seem a bit excessive for an io_submit (and a
> > > gettimeofday)?
> >
> > I guess you mean milliseconds, not microseconds. 7 milliseconds seems way
> > too long. I repeated your test here, and the 100 submits take 97000
> > microseconds here - or 97 milliseconds. So that's a little less than 1
> > msec per io_submit. Still pretty big. You can experiment with oprofile
> > to profile where the kernel spends its time in that period.
>
>
> I've tried that myself too and saw a similar result. One thing to note is
> that the I/Os being submitted are pretty big at 1MB, so the vector list inside
> the bio is going to be pretty long and it will take a while to construct.
> Dropping the size of each I/O to something like 4KB will significantly reduce
> the time. I haven't measured whether the time to submit I/O
> grows linearly with respect to I/O size. Most likely it will. If it does
> not, then we might have a scaling problem (though I don't believe we have
> this problem).

True, it might not be all that unreasonable, just seemed a bit excessive
to me. If you submit smaller ios, you move the cost from bio_add_page()
to the merge logic in the driver. You'd have more allocations as well,
with bio's strung together instead of a bigger vector map.

--
Jens Axboe

2006-11-07 07:27:00

by Jens Axboe

Subject: Re: async I/O seems to be blocking on 2.6.15

On Mon, Nov 06 2006, Brent Baccala wrote:
> On Mon, 6 Nov 2006, Chen, Kenneth W wrote:
>
> >I've tried that myself too and saw a similar result. One thing to note is
> >that the I/Os being submitted are pretty big at 1MB, so the vector list inside
> >the bio is going to be pretty long and it will take a while to construct.
> >Dropping the size of each I/O to something like 4KB will significantly reduce
> >the time. I haven't measured whether the time to submit I/O
> >grows linearly with respect to I/O size. Most likely it will. If it does
> >not, then we might have a scaling problem (though I don't believe we have
> >this problem).
> >
> >- Ken
> >
> >
>
> I'm basically an end user here (as far as the kernel is concerned), so
> let me ask the basic "dumb user" question here:
>
> How should I do my async I/O if I just want to read or write
> sequentially through a file, using O_DIRECT, and letting the CPU get
> some work done in the meantime? What about more random access?

For sequential io, you'll pretty quickly hit the transfer size sweet
spot where growing the io larger won't yield any benefits. In this case
you don't want a huge queue depth, 100 ios is an insane amount for
sequential io. If you have a properly sized io unit and a depth of 4 or
so, I doubt you'll see any improvement beyond that on most hardware.

For random io, you want a bigger queue depth. If the hardware can do
queueing, you want to make sure that you can fill that queue. 100 ios is
still a lot in this case, though.

> I've already concluded that I should try to keep my read and write
> files on separate disks and hopefully on separate controllers, but I
> still seem to be fighting this thing to keep it from blocking.

Shrink your queue size and/or io size :-)

--
Jens Axboe

2006-11-07 21:02:08

by Bill Davidsen

Subject: Re: async I/O seems to be blocking on 2.6.15

Chen, Kenneth W wrote:
> Jens Axboe wrote on Sunday, November 05, 2006 4:15 AM
>> On Fri, Nov 03 2006, Brent Baccala wrote:
>>> On Fri, 3 Nov 2006, Jens Axboe wrote:
>>>
>>>> Try to time it (visual output of the app is not very telling, and it's
>>>> buffered) and then apply some profiling.
>>> OK, a little more info. I added gettimeofday() calls after each call
>>> to io_submit(), put the timevals in an array, and after everything was
>>> done computed the difference between each timeval and the program start
>>> time, as well as the deltas. I got this:
>>>
>>> 0: 0.080s
>>> 1: 0.086s 0.006s
>>> 2: 0.102s 0.016s
>>> 3: 0.111s 0.008s
>>> 4: 0.118s 0.007s
>>> 5: 0.134s 0.015s
>>> 6: 0.141s 0.006s
>>> 7: 0.148s 0.006s
>>> 8: 0.158s 0.009s
>>> 9: 0.164s 0.006s
>>> ...
>>> 96: 1.036s 0.007s
>>> 97: 1.044s 0.007s
>>> 98: 1.147s 0.102s
>>> 99: 1.155s 0.008s
>>>
>>> 98 appears to be an aberration. Perhaps three of the times on an
>>> average run are around a tenth of a second; all of the others are
>>> pretty steady at 7 or 8 microseconds. So, it's basically linear in
>>> its time consumption.
>>>
>>> Does 7 microseconds seem a bit excessive for an io_submit (and a
>>> gettimeofday)?
>> I guess you mean milliseconds, not microseconds. 7 milliseconds seems way
>> too long. I repeated your test here, and the 100 submits take 97000
>> microseconds here - or 97 milliseconds. So that's a little less than 1
>> msec per io_submit. Still pretty big. You can experiment with oprofile
>> to profile where the kernel spends its time in that period.
>
>
> I've tried that myself too and saw a similar result. One thing to note is
> that the I/Os being submitted are pretty big at 1MB, so the vector list inside
> the bio is going to be pretty long and it will take a while to construct.
> Dropping the size of each I/O to something like 4KB will significantly reduce
> the time. I haven't measured whether the time to submit I/O
> grows linearly with respect to I/O size. Most likely it will. If it does
> not, then we might have a scaling problem (though I don't believe we have
> this problem).

I'm probably missing something, but submitting smaller i/o would
mean more system calls, and presumably more total overhead. I assume
they will be faster system calls, but if the kernel code is sorting and
merging requests, even that might not be true. Having user space break it
into pieces and kernel space put them back together again isn't an
obvious win in overhead, or a solution to blocking.

That said, I admit that I rarely use AIO, since the problems I have
where it would be useful are threaded already, and I can painlessly do
the i/o in a thread and let it block.

Perhaps the use of -EAGAIN could solve this? As mentioned by Ken, I think.

--
Bill Davidsen <[email protected]>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.

2006-11-10 09:22:36

by Jens Axboe

Subject: Re: async I/O seems to be blocking on 2.6.15

On Mon, Nov 06 2006, Chen, Kenneth W wrote:
> I haven't measured whether the time to submit I/O grows
> linearly with respect to I/O size. Most likely it will. If it does
> not, then we might have a scaling problem (though I don't believe we
> have this problem).

I would not expect it to, but ran a simple test to check submission
times from 32KiB to 2MiB:

Size      Time (usecs)
----------------------------
2MiB      223
1MiB      112
512KiB     56
256KiB     29
128KiB     14
32KiB       4

(note that I skipped 64KiB, things got boring). That clearly shows a
linear increase in time, so no scaling problem there.

--
Jens Axboe