2008-12-12 11:22:56

by Thanos McAtos

[permalink] [raw]
Subject: Inexplicable I/O latency using worker threads

Hello all.

I am facing a weird problem with a virtual block driver I made concerning excessive I/O latency.

My block driver intercepts requests and redirects them to a real block device,
but not just be setting the bio->bi_bdev field, I create new bios.

Anyway, my problem is that for load balancing reasons I need per-CPU worker threads
where I enqueue requests and let them do all the work. If I use 2 threads in a round
robin manner (request 1 served by CPU 0, 2 by CPU1, 3 by CPU0 and so on), performance
is inexplicably low.

If I choose only one CPU to act as a worker the problem is gone. The difference of measured
I/O latency is more than 30 times.

What could be happening?

I'm using a vanilla 2.6.18.8.

Thanx in advance.


2008-12-12 17:48:22

by Chris Snook

[permalink] [raw]
Subject: Re: Inexplicable I/O latency using worker threads

Thanos Makatos wrote:
> Hello all.
>
> I am facing a weird problem with a virtual block driver I made concerning excessive I/O latency.
>
> My block driver intercepts requests and redirects them to a real block device,
> but not just be setting the bio->bi_bdev field, I create new bios.
>
> Anyway, my problem is that for load balancing reasons I need per-CPU worker threads
> where I enqueue requests and let them do all the work. If I use 2 threads in a round
> robin manner (request 1 served by CPU 0, 2 by CPU1, 3 by CPU0 and so on), performance
> is inexplicably low.
>
> If I choose only one CPU to act as a worker the problem is gone. The difference of measured
> I/O latency is more than 30 times.
>
> What could be happening?
>
> I'm using a vanilla 2.6.18.8.
>
> Thanx in advance.

a) I/O scheduling
b) lock contention

Do you really need to load balance I/O to a single bdev across multiple CPUs?
Disk I/O generally isn't very CPU-intensive.

-- Chris

2008-12-13 13:42:26

by Thanos McAtos

[permalink] [raw]
Subject: Re: Inexplicable I/O latency using worker threads

> Thanos Makatos wrote:
>> Hello all.
>>
>> I am facing a weird problem with a virtual block driver I made
>> concerning excessive I/O latency.
>>
>> My block driver intercepts requests and redirects them to a real block
>> device,
>> but not just be setting the bio->bi_bdev field, I create new bios.
>>
>> Anyway, my problem is that for load balancing reasons I need per-CPU
>> worker threads
>> where I enqueue requests and let them do all the work. If I use 2
>> threads in a round
>> robin manner (request 1 served by CPU 0, 2 by CPU1, 3 by CPU0 and so
>> on), performance
>> is inexplicably low.
>>
>> If I choose only one CPU to act as a worker the problem is gone. The
>> difference of measured
>> I/O latency is more than 30 times.
>>
>> What could be happening?
>>
>> I'm using a vanilla 2.6.18.8.
>>
>> Thanx in advance.
>
> a) I/O scheduling
What does I/O scheduling has to do with from where the I/O request
originates?
> b) lock contention
I doubt there is lock contention, my simple test uses 1 outstanding I/O.
>
> Do you really need to load balance I/O to a single bdev across multiple
> CPUs?
Yes, because I have to do very CPU-intensive operations (compression etc.).
> Disk I/O generally isn't very CPU-intensive.
>
> -- Chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>