2008-07-06 20:07:28

by Timothy Normand Miller

Subject: HELP: Getting unexpected fakeraid behavior. Fix?

I apologize if this is a question that has come up a lot. I've done a
fair amount of googling on it, and while I've learned a lot, I have
not been able to find an answer to my specific question.

I'm running a beta of Gentoo 2008.0, ~amd64, kernel version
2.6.25-gentoo-r5 (Gentoo's 'genkernel'). The drive controller I'm
using is built into the Intel ICH9R southbridge. I used the Intel
Matrix BIOS tool to configure two SATA-II drives in a RAID1
configuration.

My understanding of fakeraid/dmraid is as follows:
- There is no real RAID controller, just a regular drive controller chip.
- The only difference is that there's some BIOS code that allows the
system BIOS to use the array as a boot device.
- Once the kernel comes up, it's completely in control of the
individual drives, I/O scheduling, etc. (They look like regular SATA
devices.)
- So since fakeraid, as far as Linux is concerned, is just software
RAID, behavior and performance of fakeraid (dm) and software RAID (md)
should be identical (via the same drive controller).

Since RAID1 is a mirrored configuration, it's possible to distribute
reads across the drives, improving throughput and latency over a
single drive on random reads. I have also come to understand that
RAID1 systems in general and Linux specifically do in fact take
advantage of this mirroring to improve read performance.

I have written a program that, on start up, reads through thousands of
small files, and as a result does a great deal of random reads for
several minutes. While that was going on, I ran "iostat -d 2". My
observation was that any writes that occurred were correctly sent to
both disks, but all reads were being requested ONLY from the first
drive.

This suggests to me that I have some configuration mistake. Can
anyone give me any hints on this?

Thanks!

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project


2008-07-06 20:19:43

by Arjan van de Ven

Subject: Re: HELP: Getting unexpected fakeraid behavior. Fix?

On Sun, 6 Jul 2008 16:07:14 -0400
"Timothy Normand Miller" <[email protected]> wrote:

Hi,

> - So since fakeraid, as far as Linux is concerned, is just software
> RAID, behavior and performance of fakeraid (dm) and software RAID (md)
> should be identical (via the same drive controller).

there are a few minor nits concerning disk layout for non-RAID1 levels;
for RAID1 they should be equivalent.

>
> Since RAID1 is a mirrored configuration, it's possible to distribute
> reads across the drives, improving throughput and latency over a
> single drive on random reads.

This is... borderline true. Let me explain the caveats:
for a SINGLE THREADED workload, there is actually no difference.
Balancing long sequential reads over the 2 disks isn't such a good idea
for that case, since it just introduces seeks rather than keeping up with
the streaming speed.
Balancing seeks: there might be some theoretical advantage because you
could, again in theory, do shorter seeks if you keep one disk's head near
the inside and the other disk's head near the outside. In practice... a lot
of the cost of a seek is rotational latency, so it's not as big a deal as it
may sound; the moment you seek you pay a ton.


>
> I have written a program that, on start up, reads through thousands of
> small files, and as a result does a great deal of random reads for
> several minutes. While that was going on, I ran "iostat -d 2". My
> observation was that any writes that occurred were correctly sent to
> both disks, but all reads were being requested ONLY from the first
> drive.

if your application is single threaded this isn't totally unexpected as
a result...



--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-07-06 20:44:26

by Timothy Normand Miller

Subject: Re: HELP: Getting unexpected fakeraid behavior. Fix?

On Sun, Jul 6, 2008 at 4:19 PM, Arjan van de Ven <[email protected]> wrote:

>> Since RAID1 is a mirrored configuration, it's possible to distribute
>> reads across the drives, improving throughput and latency over a
>> single drive on random reads.
>
> This is... borderline true. Let me explain the caveats:
> for a SINGLE THREADED workload, there is actually no difference.
> Balancing long sequential reads over the 2 disks isn't such a good idea
> for that case, since it just introduces seeks rather than keeping up with
> the streaming speed.
> Balancing seeks: there might be some theoretical advantage because you
> could, again in theory, do shorter seeks if you keep one disk's head near
> the inside and the other disk's head near the outside. In practice... a lot
> of the cost of a seek is rotational latency, so it's not as big a deal as it
> may sound; the moment you seek you pay a ton.

Ok, I see. This makes sense. It is a single thread. I did kind of
expect that, since the spatial locality of the metadata and of the files
themselves (thousands of 2 KB files being read in a different order
from the one in which they were created) should not be very good, the kernel
would round-robin (or probably do something smarter with) the requests
(single blocks and sequential groups of blocks) between the drives.

So, if I were to divide these reads across multiple threads, I would
see requests to both drives? I'm wondering now if that would really
help, although as far as I can tell, what I'm doing is dominated by
seek time.

And the kernel pays attention to which thread caused a given request?
What if you have two threads alternately reading one file? (Which
would obviously have to be serialized by locks.) Just curious. :)

> if your application is single threaded this isn't totally unexpected as
> a result...


Oh, and one other question. I know the drives are AHCI-capable, and
when I read the drive flags, they report that NCQ is enabled and that
the queue depth is 31. For the controller, the BIOS has three
settings: IDE, AHCI, and RAID. In IDE mode, NCQ and other AHCI
features are disabled. Unfortunately, the BIOS and the motherboard
manual do not say whether AHCI features are enabled in RAID mode.
Is there a way I can query the kernel to see if it's actually USING
NCQ?

The reason I'm on about this is that the workloads I'm throwing at
this computer are (sometimes) badly bottlenecked by the disks, so if I
can get better throughput, that would help a lot.

Thank you for your help.


--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

2008-07-06 21:25:07

by Timothy Normand Miller

Subject: Re: HELP: Getting unexpected fakeraid behavior. Fix?

On Sun, Jul 6, 2008 at 4:19 PM, Arjan van de Ven <[email protected]> wrote:

> if your application is single threaded this isn't totally unexpected as
> a result...

Ok, I've rewritten the program to use multiple threads. There is a
bzcat process that's sequentially reading a large file, and then
there's another process with four threads that is reading lots of
small files with a lot of random access. All threads are constrained
by the disk (CPU load is low).

I'm still seeing absolutely NO reads from the second disk.

Any ideas?

Thanks.


--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

2008-07-07 01:22:19

by Timothy Normand Miller

Subject: Re: HELP: Getting unexpected fakeraid behavior. Fix?

I have absolutely no idea what I'm doing, but the kernel source is
well-formatted. Here's what I found. I'm looking in
"drivers/md/dm-raid1.c".

There's this function that seems to be involved in initiating reads:

static void do_reads(struct mirror_set *ms, struct bio_list *reads)
{
	region_t region;
	struct bio *bio;
	struct mirror *m;

	while ((bio = bio_list_pop(reads))) {
		region = bio_to_region(&ms->rh, bio);
		m = get_default_mirror(ms);

		/*
		 * We can only read balance if the region is in sync.
		 */
		if (likely(rh_in_sync(&ms->rh, region, 1)))
			m = choose_mirror(ms, bio->bi_sector);
		else if (m && atomic_read(&m->error_count))
			m = NULL;

		if (likely(m))
			read_async_bio(m, bio);
		else
			bio_endio(bio, -EIO);
	}
}

It calls this function:

static struct mirror *choose_mirror(struct mirror_set *ms, sector_t sector)
{
	struct mirror *m = get_default_mirror(ms);

	do {
		if (likely(!atomic_read(&m->error_count)))
			return m;

		if (m-- == ms->mirror)
			m += ms->nr_mirrors;
	} while (m != get_default_mirror(ms));

	return NULL;
}

which seems to always choose the default mirror unless there's an
error on the default mirror. I have absolutely no idea where to put
it, but it seems to me that a solution would be to keep track of the
last mirror and select the next one each time this function is called.

That would be a simple and blind round-robin, which isn't necessarily
the smartest thing to do, so it would require testing. It also occurs
to me, given how long this code has been around (since 2003), that the
idea of distributing the read load may have been considered and
rejected. Was it? If not, this could be a valuable boost to Linux
RAID performance. (I chose RAID1 both for the redundancy AND the read
performance boost, and so I was surprised to find that there was no
read performance boost under Linux.)

What is dm-round-robin.c for? Is there a place where the kernel
source is heavily annotated so that a newbie can navigate it and
figure out what does what?

Cheers.

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

2008-07-07 01:26:44

by Timothy Normand Miller

Subject: Re: HELP: Getting unexpected fakeraid behavior. Fix?

I was poking around some more, and I found drivers/md/raid1.c. This
has sophisticated load balancing for reads. Why isn't this code being
used for dm_raid1? I'm a bit confused as to why this redundant code
should be there if dmraid and mdraid are functionally the same.


--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

2008-07-07 09:58:19

by Joseph Fannin

Subject: Re: HELP: Getting unexpected fakeraid behavior. Fix?

On Sun, Jul 06, 2008 at 09:26:35PM -0400, Timothy Normand Miller wrote:
> I was poking around some more, and I found drivers/md/raid1.c. This
> has sophisticated load balancing for reads. Why isn't this code being
> used for dm_raid1? I'm a bit confused as to why this redundant code
> should be there if dmraid and mdraid are functionally the same.

I don't think dmraid and mdraid *are* functionally the same. dmraid
was written as more or less a compatibility hack, so people using
fakeRAID with its Windows-only software drivers could still
dual-boot. That could have changed somewhat while I wasn't looking,
but it doesn't sound like it has changed much.

mdraid is still the only "real" Linux softRAID
implementation... though LVM also does some very RAID-like things.

It would be good if dmraid was better optimized, or more complete --
possibly by reusing some of md -- but no one has done the work.

Anyway, the consensus for a long time seems to be that, unless you
need to dual-boot with a Windows system using fakeRAID drivers, you
should just put the drive controller in AHCI mode and use mdraid.

--
Joseph Fannin
[email protected]

2008-07-07 11:31:18

by Timothy Normand Miller

Subject: Re: HELP: Getting unexpected fakeraid behavior. Fix?

On Mon, Jul 7, 2008 at 5:57 AM, Joseph Fannin <[email protected]> wrote:

> I don't think dmraid and mdraid *are* functionally the same. dmraid
> was written as more or less a compatibility hack, so people using
> fakeRAID with its Windows-only software drivers could still
> dual-boot. That could have changed somewhat while I wasn't looking,
> but it doesn't sound like it has changed much.

It looks like I've been bitten by misinformation on the web.

> mdraid is still the only "real" Linux softRAID
> implementation... though LVM also does some very RAID-like things.

I'm not sure how much you would save, but it might be more compact to
combine them. Complexity/size tradeoff, I guess.

> Anyway, the consensus for a long time seems to be that, unless you
> need to dual-boot with a Windows system using fakeRAID drivers, you
> should just put the drive controller in AHCI mode and use mdraid.

That may be so on LKML, but others don't seem to be so well-informed.


--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project