2005-12-29 16:29:38

by Kenny Simpson

Subject: RAID controller safety

Hello,
I am trying to determine which drivers and what hardware can be used to give reliable fsync
behavior. I see many drivers with FIXME or TODO comments which make me nervous.
I see the sd driver supports the direct SYNCHRONIZE_CACHE, but I'm having a hard time tracing
through how this translates to RAID controllers and their drivers.
Specifically, I am looking at the Adaptec RAID controllers and their i2o drivers. I am told the
kernel's i2o driver lacks a strong guarantee on fsync, and so far I am unable to determine whether
the dpt_i2o driver also falls short in this regard.

I don't mean to start a 'drives lie - disable all caching, buy FC drives' flame war. I know
drives can lie, but I'd like to know if the drivers are lying too.

thanks,
-Kenny





__________________________________
Yahoo! for Good - Make a difference this year.
http://brand.yahoo.com/cybergivingweek2005/


2005-12-30 15:16:04

by Alan

Subject: Re: RAID controller safety

On Thu, 2005-12-29 at 08:29 -0800, Kenny Simpson wrote:
> Specifically, I am looking at the Adaptec RAID controllers and their i2o drivers. I am told the
> kernel's i2o driver lacks a strong guarantee on fsync, and so far am unable to determine if the
> dpt_i2o driver also falls short in this regard.

Only dpt can tell you what their firmware actually does.

The i2o core drivers use the following rules:

i2o_scsi issues SCSI commands and assumes they are pass through and that
the firmware does not fake completions early (or if it does that it
battery backs them). For the known hardware the i2o SCSI class interface
is a pass through interface with the card CPU just doing protocol gunk
and supervision.

i2o_block by default assumes the card is caching. It adopts write
through mode if the controller has no battery, write back if it shows
battery. This can be configured differently via ioctls including the
ability to tune write through of large I/Os (to avoid cache thrashing),
and to do write back with no battery backup for performance in cases
where losing the data on a crash doesn't matter (e.g. swap).
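
The policy described above can be sketched in a few lines of C. This is only an illustration of the stated rules, not the i2o_block code: the type, field, and function names are hypothetical, and the ordering of the checks (large-I/O threshold first, then battery or override) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the i2o_block driver's symbols. */
enum cache_mode { CACHE_WRITE_THROUGH, CACHE_WRITE_BACK };

struct cache_policy {
    bool   has_battery;       /* controller reports a battery-backed cache  */
    bool   force_write_back;  /* admin override, e.g. for a swap device     */
    size_t wthrough_min_len;  /* 0 = off; I/Os at least this long bypass    */
                              /* the cache to avoid thrashing it            */
};

/* Choose the cache mode for a single write of `len` bytes. */
enum cache_mode pick_cache_mode(const struct cache_policy *p, size_t len)
{
    assert(p != NULL);

    /* Assumed ordering: the large-I/O rule wins over everything else. */
    if (p->wthrough_min_len != 0 && len >= p->wthrough_min_len)
        return CACHE_WRITE_THROUGH;

    /* Write back only when a crash cannot lose data, or nobody cares. */
    if (p->has_battery || p->force_write_back)
        return CACHE_WRITE_BACK;

    /* Default for a controller without a battery. */
    return CACHE_WRITE_THROUGH;
}
```

With a policy of `{ .has_battery = false }` every write is write-through; setting `force_write_back` models the "swap" case mentioned above.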

Alan

2005-12-30 16:18:09

by Kenny Simpson

Subject: Re: RAID controller safety

> > Specifically, I am looking at the Adaptec RAID controllers and their i2o drivers. I am told
> > the kernel's i2o driver lacks a strong guarantee on fsync, and so far am unable to determine
> > if the dpt_i2o driver also falls short in this regard.
>
> Only dpt can tell you what their firmware actually does.

Yeah, I wasn't so much interested in the firmware just yet, just interested in whether the device
driver (dpt_i2o) gave it a fighting chance of doing the right thing.

> The i2o core drivers use the following rules

> i2o_block by default assumes the card is caching. It adopts write
> through mode if the controller has no battery, write back if it shows
> battery. This can be configured differently via ioctls including the
> ability to tune write through of large I/O's (to avoid cache thrashing),
> and to do write back with no battery backup for performance in cases
> where losing the data on a crash doesn't matter (eg swap)

That's what I read in the comments too, but looking at the code I only ever see it set to
write-back. I verified this with blktool - our controllers have no battery, and blktool showed
the i2o-wcache state as write-back.

However, I was also told that the i2o_block driver lacks barrier support, so even in the
write-back case, the controller won't be told to flush/sync.

I was sent a patch against 2.6.10 that implements barrier support in i2o_block, but the code base
has shifted too much for me to make it apply.

-Kenny

2005-12-30 18:18:42

by Alan

Subject: Re: RAID controller safety

On Fri, 2005-12-30 at 08:18 -0800, Kenny Simpson wrote:
> That's what I read in the comments too, but looking at the code I only ever see it set to
> write-back. I verified this with blktool - our controllers have no battery, and blktool showed
> the i2o-wcache state as write-back.

blktool doesn't support i2o control as far as I am aware. The blk level
generic ioctls are just too crude to control it properly.

> However, I was also told that the i2o_block driver lacks barrier support, so even in the
> write-back case, the controller won't be told to flush/sync.

Correct, but it should only ever enable this in the battery backed case.
Otherwise it uses the per command control bits to decide what mode it
wishes to use for each I/O

2005-12-30 18:58:42

by Kenny Simpson

Subject: Re: RAID controller safety

--- Alan Cox wrote:
> On Fri, 2005-12-30 at 08:18 -0800, Kenny Simpson wrote:
> > That's what I read in the comments too, but looking at the code I only ever see it set to
> > write-back. I verified this with blktool - our controllers have no battery, and blktool
> > showed the i2o-wcache state as write-back.
>
> blktool doesn't support i2o control as far as I am aware. The blk level
> generic ioctls are just too crude to control it properly.

From man blktool dated August 2004:
i2o-wcache
Query or set an I2O block device's write cache.

>
> > However, I was also told that the i2o_block driver lacks barrier support, so even in the
> > write-back case, the controller won't be told to flush/sync.
>
> Correct, but it should only ever enable this in the battery backed case.
> Otherwise it uses the per command control bits to decide what mode it
> wishes to use for each I/O

So all writes would be treated as synchronous in the write-through case (no battery), making fsync
a no-op?

-Kenny

2005-12-30 19:31:20

by Bernd Eckenfels

Subject: Re: RAID controller safety

In an earlier article you wrote:
> So all writes would be treated as synchronous in the write-through case (no battery), making fsync
> a no-op?

The device cache is IMHO not related to the higher-level buffer cache. So
fsync flushes the OS buffers, and write-through mode ensures the controller
does not hold the data back in controller RAM.

Regards
Bernd

2005-12-31 00:47:48

by Alan

Subject: Re: RAID controller safety

On Fri, 2005-12-30 at 10:58 -0800, Kenny Simpson wrote:
> So all writes would be treated as synchronous in the write-through case (no battery), making fsync
> a no-op?

fsync is never a no-op. fsync ensures that material the OS is caching hits
the disk drivers/disks. Barriers or write through on the disk driver ensure
that it hits the media.

The two are independent.
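
From the application side, the first half of that split looks roughly like the sketch below: write the data, then call fsync() and check its result. The helper name write_and_fsync is hypothetical. Whether the data then reaches the media still depends on the driver, controller, and drive, as discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes of `buf` to `path`, then fsync.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 * fsync() only pushes OS-cached data down to the driver; it does not by
 * itself prove the data is on the platter.
 */
int write_and_fsync(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the write */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                     /* handle short writes */
        len -= (size_t)n;
    }

    if (fsync(fd) < 0) {            /* flush the OS cache for this file */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

A caller that needs durability should treat a non-zero return as "data may not be stored" rather than retrying blindly.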

2005-12-31 03:25:09

by Kenny Simpson

Subject: Re: RAID controller safety

--- Alan Cox wrote:

> On Fri, 2005-12-30 at 10:58 -0800, Kenny Simpson wrote:
> > So all writes would be treated as synchronous in the write-through case (no battery),
> > making fsync a no-op?
>
> fsync is never a no-op. fsync ensures material the OS is caching hits
> disk drivers/disks. Barriers or write through on the disk driver ensure
> that it hits the media.
>
> The two are independent
>

Ok, the light is slowly coming on for me...

Let's see if I get it:
fsync, according to POSIX, will flush all pending writes
http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html
Linux, according to the man page, takes this a little further and says that data is on stable
storage.

Stable storage for a battery-backed RAID controller means its battery-backed cache. Stable
storage for a RAID controller w/o battery means that the data is on disk. I2O controllers are
told the caching requirement for each write in the command.

To tell a disk to force data to the platter, it needs to be sent a specific command (which some
drives ignore), or the write cache must be disabled (write-through mode). The specific command
varies depending on SCSI vs. SATA vs. TCQ, etc.

Ignoring O_DIRECT, Linux writes out data from the page cache. Data gets written out when the OS
decides (high memory pressure, timers expire, etc.), or when a program requests it (fsync).

For Linux to make fsync durable, it must not only send the data
to the controller, but must also tell the controller to force the data to stable storage, and then
wait for the controller to report the writes as completed.

A battery-backed I2O controller only needs to be sent the writes as write-back. A
non-battery-backed I2O controller needs to be sent these writes as write-through.
In both cases, the controller should set the drives themselves to be write-through.

To match the POSIX behavior, the onus on the OS is just to push out the data to the driver. To
get the further reliability, the driver, controller (and firmware), and drives (and firmware) must
all function as advertised. If any one of these fails, the reliability is lost. Linux can at most
hope to control the driver.

Ok, with all that out of the way...

Are there any known drivers that do not correctly pass on barriers/flush/sync to their
controllers?

In my observations, and from what others have told me in private emails, the I2O driver is such a
driver, at least for my non-battery-backed controller (Adaptec 2015S). I read in the comments of
the I2O driver that it should set the write-cache flag for writes to write-through for
non-battery-backed controllers, but I don't observe that setting via blktool, and basic
write/fsync benchmarks run too fast for the drives I have (4x 10kRPM in RAID-10).
Also, from reading the source, I only see the write-cache flag being set to write-back. I see no
test for controller properties, or anything else that would modify this setting (except for the
ioctl).
Of course, I could be misreading the I2O spec, and it may be up to the controller to know whether
it has a battery, so the controller is responsible for doing the right thing, and the flag in the
I2O driver is irrelevant for this.
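
The "too fast for the drives" reasoning can be made concrete with a rough sketch. A 10,000 RPM disk makes 10000 / 60 ≈ 167 revolutions per second, so a loop that rewrites the same block and waits for it to reach the platter each time cannot go much beyond that rate per spindle; a much higher figure suggests a cache is absorbing the writes. The code below is an assumed minimal benchmark, not the one used in this thread, and both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Revolutions per second: a rough ceiling on rewrite-and-flush cycles
 * of one block when every write really has to reach the platter. */
double rotations_per_sec(double rpm)
{
    assert(rpm > 0.0);
    return rpm / 60.0;
}

/* Rewrite the same 512 bytes `iterations` times, calling fsync after each
 * write. Returns fsyncs per second, or -1.0 on any error. */
double measure_fsync_rate(const char *path, int iterations)
{
    char block[512] = {0};
    struct timespec t0, t1;
    int fd = open(path, O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) {
        close(fd);
        return -1.0;
    }
    for (int i = 0; i < iterations; i++) {
        if (pwrite(fd, block, sizeof block, 0) != (ssize_t)sizeof block ||
            fsync(fd) != 0) {
            close(fd);
            return -1.0;
        }
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0) {
        close(fd);
        return -1.0;
    }
    close(fd);

    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? iterations / secs : -1.0;
}
```

Comparing measure_fsync_rate() on a file on the array against rotations_per_sec(10000.0) gives a quick sanity check; striping and mirroring change the details, so treat the bound as an order-of-magnitude guide only.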

Thanks for your patience,
-Kenny




__________________________________________
Yahoo! DSL ? Something to write home about.
Just $16.99/mo. or less.
dsl.yahoo.com

2005-12-31 03:29:18

by Kenny Simpson

Subject: Re: RAID controller safety

Just for the record, I am looking at/using 2.6.15-rc7.

-Kenny

2005-12-31 06:55:05

by Kenny Simpson

Subject: Re: RAID controller safety

Ok, I finally tracked through the i2o code, and found that i2o_block_device_flush is ultimately
called for fsync. Sorry for being so dense.

However, it does look like barriers are not directly supported. So, are they safe to use in ext3,
or is ext3 all fine without them? Would barriers benefit i2o devices, or is there some reason to
not have them?

As for the controller defaulting to write-back, I still cannot find anything that would set the
cache mode to write-through in the non-battery-backed case.

-Kenny

2005-12-31 07:57:49

by Kenny Simpson

Subject: Re: RAID controller safety

To follow up on my own message again, I see that patches to further implement barriers have been
working their way through the gauntlet, with the most recent attempt being on Nov 24. I guess this
will be 2.6.15+ stuff. I do not see an i2o update in that set of patches.

-Kenny

2006-01-06 14:33:15

by Mark Salyzyn

Subject: RE: RAID controller safety

Alan Cox sez:
> 2005-12-29 at 08:29 -0800, Kenny Simpson wrote:
> > Specifically, I am looking at the Adaptec RAID controllers
> > and their i2o drivers. I am told the
> > kernel's i2o driver lacks a strong guarantee on fsync, and
> > so far am unable to determine if the
> > dpt_i2o driver also falls short in this regard.
> Only dpt can tell you what their firmware actually does.

The dpt_i2o driver (which is a scsi driver) accepts the
SYNCHRONIZE_CACHE scsi command and passes it off to the firmware. The
firmware respects this and flushes all the outstanding (cached)
commands. This is true in all (kernel.org or Adaptec latest) versions.
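
For readers unfamiliar with the command, a SCSI SYNCHRONIZE CACHE (10) request is a 10-byte command descriptor block with opcode 0x35. The sketch below only builds the CDB bytes; the function name is hypothetical, and how dpt_i2o wraps the command for the firmware is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SYNCHRONIZE_CACHE_10 0x35   /* SCSI opcode, 10-byte CDB */

/*
 * Hypothetical helper: fill in a SYNCHRONIZE CACHE (10) CDB.
 * lba and nblocks are big-endian on the wire; lba = 0 with nblocks = 0
 * asks the device to flush every cached block to the medium.
 */
void build_sync_cache_cdb(uint8_t cdb[10], uint32_t lba, uint16_t nblocks)
{
    assert(cdb != NULL);
    memset(cdb, 0, 10);

    cdb[0] = SYNCHRONIZE_CACHE_10;
    /* byte 1: flags (IMMED etc.) left at 0, so the command completes
     * only after the flush has finished */
    cdb[2] = (uint8_t)(lba >> 24);
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    /* byte 6: group number, left at 0 */
    cdb[7] = (uint8_t)(nblocks >> 8);
    cdb[8] = (uint8_t)nblocks;
    /* byte 9: control, left at 0 */
}
```

Calling build_sync_cache_cdb(cdb, 0, 0) yields the "flush everything" form: the opcode followed by nine zero bytes.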

The only environment, to my memory, in which this has been tested is the
ASR driver in FreeBSD, where this behavior is necessary in support of
cluster checkpointing.

-- Mark Salyzyn

2006-01-06 14:44:20

by Alan

Subject: RE: RAID controller safety

On Fri, 2006-01-06 at 09:33 -0500, Salyzyn, Mark wrote:
> The dpt_i2o driver (which is a scsi driver) accepts the
> SYNCHRONIZE_CACHE scsi command and passes it off to the firmware. The
> firmware respects this and flushes all the outstanding (cached)
> commands. This is true in all (kernel.org or Adaptec latest) versions.

In which case it should be fine and correct with the generic i2o_scsi as
well, since that will pass through SCSI command requests directly. i2o_block
doesn't know about converting any incoming cache flush to an i2o command
block, so it might not be.

Alan

2006-01-06 15:19:00

by Kenny Simpson

Subject: RE: RAID controller safety

--- Alan Cox wrote:
> On Fri, 2006-01-06 at 09:33 -0500, Salyzyn, Mark wrote:
> > The dpt_i2o driver (which is a scsi driver) accepts the
> > SYNCHRONIZE_CACHE scsi command and passes it off to the firmware. The
> > firmware respects this and flushes all the outstanding (cached)
> > commands. This is true in all (kernel.org or Adaptec latest) versions.
>
> In which case it should be fine and correct with the generic i2o_scsi as
> well as that will pass through SCSI command requests directly. i2o_block
> doesn't know about converting any incoming cache flush to an i2o command
> block so might not.
>
> Alan
>

Won't the i2o_block driver use i2o_block_device_flush to flush the devices' cache (by issuing an
I2O_CMD_BLOCK_CFLUSH), or is this function used in some very different context?

Oddly enough, I see I2O_CMD_BLOCK_CFLUSH #define'd to 0x37 in both the i2o driver
(include/linux/i2o.h), AND in the dpt driver (drivers/scsi/dpt/dpti_i2o.h). However, I do not see
the dpt driver using this value anywhere.

-Kenny

2006-01-06 16:00:19

by Alan

Subject: RE: RAID controller safety

On Fri, 2006-01-06 at 07:18 -0800, Kenny Simpson wrote:
> Won't the i2o_block driver use i2o_block_device_flush to flush the devices' cache (by issuing an
> I2O_CMD_BLOCK_CFLUSH), or is this function used in some very different context?


I'm out of date. It was originally used on the last close of removable
media and to work around some Promise bugs. Markus Lidel has indeed
added the relevant functions and hooks to let the block layer use it for
barriers.

Alan

2006-01-06 17:06:18

by Mark Salyzyn

Subject: RE: RAID controller safety

Kenny Simpson sez:
> Won't the i2o_block driver use i2o_block_device_flush to
> flush the devices' cache (by issuing an
> I2O_CMD_BLOCK_CFLUSH), or is this function used in some
> very different context?

We support I2O_BSA_CACHE_FLUSH, which is the i2o spec definition of this
identifier. It is merely internally re-issued as a SCSI
SYNCHRONIZE_CACHE command issued to the block device TID.

> Oddly enough, I see I2O_CMD_BLOCK_CFLUSH #define'd to 0x37 in
> both the i2o driver
> (include/linux/i2o.h), AND in the dpt driver
> (drivers/scsi/dpt/dpti_i2o.h). However, I do not see
> the dpt driver using this value anywhere.

The dpt_i2o driver is a *SCSI* driver, and the card accepts SCSI
commands to all the devices (including block). The dpt_i2o driver uses
the SCSI synchronize command as the path for this action, which is why you
see no use of I2O_BSA_CACHE_FLUSH.

A DPT private message is used to issue these SCSI commands to the
controller.

Sincerely -- Mark Salyzyn