2002-09-19 21:06:47

by Andrew Morton

Subject: Re: [RFC] [PATCH] 2.5.35 patch for making DIO async--performance numbers

Mingming Cao wrote:
>
> Hi Ben & Andrew,
>
> I ran sync raw I/O tests and Narasimha's async I/O tests on an 8-way PIII to
> measure the I/O performance before/after the dio async patch. All the
> tests (sync and async) did 4000 * 256K I/O on 40 disks.
>
> Basically, sync RAW read/write performance is not affected by the dio async
> patch. Async I/O seems to be slower than the sync I/O. Async RAW I/O got
> better performance when the queue length for io_submit() is set to 4.
>
> I measured the time per test. vmstat info is also listed below.
>

Thanks. Note that the old code (which seems to be a tiny bit faster,
and used less CPU as well) has a significantly higher context switch
rate. At a guess I'd say that it is more efficient at getting userspace
up and running in response to IO completion.

I'd say it's only likely to affect these huge linear IOs. Once you get
into real workloads which are seeking and merging then a bit of latency
here or there would just be soaked up by other system activity.

Ah. The current direct-io.c uses wake_up_process(), not waitqueues.
So the aio version has to wear the waitqueue cost. If you're using the
-mm patch I'd suggest that you convert aio.c to prepare_to_wait/finish_wait.
The waitqueue/wakeup costs on your 8-ways seem to be very high.
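
To make the waitqueue cost concrete, here is a rough userspace toy model, not kernel code and not taken from any patch in this thread. It only counts how often the waitqueue lock is taken in one sleep/wake cycle. It assumes the behaviour I understand prepare_to_wait/finish_wait to have: the waker unlinks the waiter, so the woken task can usually skip the lock on its way out. All names (toy_waitqueue, toy_cycle_cost, and so on) are hypothetical, and wake_up_process() would correspond to taking this lock zero times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy waiter entry; "autoremove" models an entry the waker unlinks itself. */
struct toy_waiter {
    struct toy_waiter *next;
    bool queued;
    bool autoremove;
};

/* Toy wait queue; lock_taken counts acquisitions of the (imaginary) lock. */
struct toy_waitqueue {
    struct toy_waiter *head;
    unsigned lock_taken;
};

static void toy_lock(struct toy_waitqueue *q) { q->lock_taken++; }

/* Unlink w from q if present. Caller is assumed to hold the lock. */
static void toy_unlink(struct toy_waitqueue *q, struct toy_waiter *w)
{
    struct toy_waiter **pp = &q->head;
    while (*pp && *pp != w)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = w->next;
        w->next = NULL;
        w->queued = false;
    }
}

/* Sleeper side: queue ourselves before checking the condition (one lock). */
static void toy_enqueue(struct toy_waitqueue *q, struct toy_waiter *w,
                        bool autoremove)
{
    toy_lock(q);
    if (!w->queued) {
        w->autoremove = autoremove;
        w->next = q->head;
        q->head = w;
        w->queued = true;
    }
}

/* Waker side: one lock; autoremove entries are unlinked here. */
static void toy_wake_up(struct toy_waitqueue *q)
{
    toy_lock(q);
    struct toy_waiter *w = q->head;
    while (w) {
        struct toy_waiter *next = w->next;
        if (w->autoremove)
            toy_unlink(q, w);
        w = next;
    }
}

/* Old style: always take the lock to remove ourselves. */
static void toy_remove(struct toy_waitqueue *q, struct toy_waiter *w)
{
    toy_lock(q);
    toy_unlink(q, w);
}

/* finish_wait style: skip the lock if the waker already unlinked us. */
static void toy_finish(struct toy_waitqueue *q, struct toy_waiter *w)
{
    if (!w->queued)
        return;
    toy_lock(q);
    toy_unlink(q, w);
}

/* Lock acquisitions for one enqueue / wake / dequeue cycle. */
unsigned toy_cycle_cost(bool use_prepare_to_wait)
{
    struct toy_waitqueue q = { NULL, 0 };
    struct toy_waiter w = { NULL, false, false };

    toy_enqueue(&q, &w, use_prepare_to_wait);
    toy_wake_up(&q);
    if (use_prepare_to_wait)
        toy_finish(&q, &w);
    else
        toy_remove(&q, &w);
    assert(!w.queued);
    return q.lock_taken;
}
```

In this model toy_cycle_cost(false) takes the lock three times per cycle and toy_cycle_cost(true) takes it twice. On a real 8-way the saving is about cacheline bouncing on that lock, which the toy does not attempt to measure.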


2002-09-19 21:15:21

by Badari Pulavarty

Subject: Re: [RFC] [PATCH] 2.5.35 patch for making DIO async--performance numbers

Andrew,

>
> Thanks. Note that the old code (which seems to be a tiny bit faster,
> and used less CPU as well) has a significantly higher context switch
> rate. At a guess I'd say that it is more efficient at getting userspace
> up and running in response to IO completion.
>

In my patch, I removed bio_list. So I do all the processing of each "bio"
in the end_io() function, instead of postponing it to the waiter. Do you
think this matters?


> I'd say it's only likely to affect these huge linear IOs. Once you get
> into real workloads which are seeking and merging then a bit of latency
> here or there would just be soaked up by other system activity.
>
> Ah. The current direct-io.c uses wake_up_process(), not waitqueues.
> So the aio version has to wear the waitqueue cost. If you're using the
> -mm patch I'd suggest that you convert aio.c to prepare_to_wait/finish_wait.
> The waitqueue/wakeup costs on your 8-ways seem to be very high.

OK! I still use wake_up_process() for the sync case.
I will try using waitqueues and see.

Thanks,
Badari