Hello to everybody and a Happy New Year.
Thanks a lot for all the great job you are all doing.
Having EXT2 file system corruption after suspend on a notebook I investigate
kernel code and it seems to me that IDE Disk cache handling is wrong in
kernel 2.4.XX.
Sources extracts are from kernel 2.4.20.
If you look at struct hd_driveid in hdreg.h you will find :
unsigned short command_set_1; /* (word 82) supported
* 6: look-ahead
* 5: write cache ==== this means the Disk has a disk cache =====
unsigned short command_set_2; /* (word 83)
* 13: FLUSH CACHE EXT ==== Those fields were RESERVED in ATA/ATAPI 5
* 12: FLUSH CACHE ==== and are used ONLY in ATA/ATAPI 6 ====
* 10: 48-bit Address Feature Set
unsigned short cfs_enable_1; /* (word 85)
* 6: look-ahead
* 5: write cache ==== this means the Disk cache is enabled or disabled
*/
unsigned short cfs_enable_2; /* (word 86)
* 13: FLUSH CACHE EXT ==== Those fields were RESERVED in ATA/ATAPI 5
* 12: FLUSH CACHE ==== and are used ONLY in ATA/ATAPI 6 ====
* 10: 48-bit Address Feature Set
In ide-disk.c you have
static int write_cache (ide_drive_t *drive, int arg)
{
struct hd_drive_task_hdr taskfile;
struct hd_drive_hob_hdr hobfile;
memset(&taskfile, 0, sizeof(struct hd_drive_task_hdr));
memset(&hobfile, 0, sizeof(struct hd_drive_hob_hdr));
taskfile.feature = (arg) ? SETFEATURES_EN_WCACHE : SETFEATURES_DIS_WCACHE;
taskfile.command = WIN_SETFEATURES;
if (!(drive->id->cfs_enable_2 & 0x3000)) <==== WRONG ! ONLY FOR ATA/ATAPI 6
return 1;
(void) ide_wait_taskfile(drive, &taskfile, &hobfile, NULL);
drive->wcache = arg;
return 0;
}
In fact for ATA/ATAPI 5 cfs_enable_2 has no meaning. The fields to test are
write cache bits in command_set_1 and cfs_enable_1.
And in both cases the FLUSH CACHE command ALWAYS EXISTS !
The test of cfs_enable_2 must only be used for ATA/ATAPI 6 drives to use
FLUSH CACHE or FLUSH CACHE EXT in case of 48 bit addressing mode.
And it seems to me that when an IDE drive has a cache enabled wcache must be
initialized to say so ? Or you have to do a STANDY or SLEEP before APM
suspend or power off to be sure that the cache has been written to the disk.
I had a look at patch 2.4.21pre3 and the code looks the same.
And by the way how are powered off the IDE drives ?
Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the
drive with cache enabled or you will enjoy lost data.
I am not on the list so thank you to CC me.
Best regards to all.
Francis Verscheure
> And by the way how are powered off the IDE drives ?
> Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before
> powering off the drive with cache enabled or you will enjoy lost
> data.
This was discussed on the list a few months ago:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103188486216124&w=2
I'm not sure it really got fully resolved, I had disks that would
spin down and then spin up again, because of the order that the
standby and flush cache commands were sent.
John
On Fri, 2003-01-10 at 09:54, Francis Verscheure wrote:
> In fact for ATA/ATAPI 5 cfs_enable_2 has no meaning. The fields to test are
> write cache bits in command_set_1 and cfs_enable_1.
> And in both cases the FLUSH CACHE command ALWAYS EXISTS !
> The test of cfs_enable_2 must only be used for ATA/ATAPI 6 drives to use
> FLUSH CACHE or FLUSH CACHE EXT in case of 48 bit addressing mode.
Thanks for the report. I need to go reread the spec before I can
definitively comemnt on this.
> And it seems to me that when an IDE drive has a cache enabled wcache must be
> initialized to say so ? Or you have to do a STANDY or SLEEP before APM
> suspend or power off to be sure that the cache has been written to the disk.
Technically - no. In the real world its a very very good idea
> I had a look at patch 2.4.21pre3 and the code looks the same.
>
> And by the way how are powered off the IDE drives ?
> Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the
> drive with cache enabled or you will enjoy lost data
IDE disagrees with itself over this but when we get a controlled power
off we do this. The same ATA5/ATA6 problem may well be present there
too. I will review both
Any specific opinion Andre ?
On Fri, 2003-01-10 at 12:37, Alan Cox wrote:
> > And by the way how are powered off the IDE drives ?
> > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the
> > drive with cache enabled or you will enjoy lost data
>
> IDE disagrees with itself over this but when we get a controlled power
> off we do this. The same ATA5/ATA6 problem may well be present there
> too. I will review both
I did fix a data loss problem with some PPC laptops that way too, that
is just before shutdown and just before machine sleep, sending a STANDBYNOW
command. In the case of machine sleep, I make sure not to accept any more
request (mark the hwif busy) after that and until machine wake up (at which
point I do a full hard reset or poweron reset of the drive).
Ben.
On 10 Jan 2003, Alan Cox wrote:
> On Fri, 2003-01-10 at 09:54, Francis Verscheure wrote:
> > In fact for ATA/ATAPI 5 cfs_enable_2 has no meaning. The fields to test are
> > write cache bits in command_set_1 and cfs_enable_1.
> > And in both cases the FLUSH CACHE command ALWAYS EXISTS !
> > The test of cfs_enable_2 must only be used for ATA/ATAPI 6 drives to use
> > FLUSH CACHE or FLUSH CACHE EXT in case of 48 bit addressing mode.
>
> Thanks for the report. I need to go reread the spec before I can
> definitively comemnt on this.
I had started to work out a ruleset from ATA-3 forward based on the
capablities supported/enabled/supported_enabled ...
But it it is not clear how to even hand 2 or 3 bits in word 93, I have
little hope with out huge amounts of help to classify a rule set for at
least 48-bits of supported against another 48-bits of enabled, traversing
ATA-3,4,5,6,7 major releases, and about 15 minor revisions.
> > And it seems to me that when an IDE drive has a cache enabled wcache must be
> > initialized to say so ? Or you have to do a STANDY or SLEEP before APM
> > suspend or power off to be sure that the cache has been written to the disk.
>
> Technically - no. In the real world its a very very good idea
They default wcache enabled.
bdflush,sync,spin,flushcache,check_error,(OMG's),STANDY/SLEEP
OMG:
The drive does random and automatic flush caches, if an error happens it
does not report. *sigh* When APM hits it with a flush and pray the error
is from this flush, but it does not matter ... the kernel does not have
the paths to deal this issue ... so bye bye data! Now it if the current
flush is not the owner of the error OMFG is suggested.
OMFG:
Since there is not a mechanism to assist the drive, and the drive can not
help it self ... you can expect a device lockup and fatal operations.
Stuff people really do not want to know.
> > I had a look at patch 2.4.21pre3 and the code looks the same.
> >
> > And by the way how are powered off the IDE drives ?
> > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the
> > drive with cache enabled or you will enjoy lost data
>
> IDE disagrees with itself over this but when we get a controlled power
> off we do this. The same ATA5/ATA6 problem may well be present there
> too. I will review both
Not true, the firmware knows to commit the data to platter.
If it was true you would be screaming long ago.
> Any specific opinion Andre ?
A dirty trick used to date is to pop the STANDY or SLEEP, and depend on
the drive to deal with the double dirty flush error. If the FLUSH CACHE
was not valid, the drive would spin back up from STANDY, but not from
SLEEP, this could be a problem. However SLEEP issued by the driver only
happens at shutdown unless it has been changed. In the shutdown process,
each partition unmount was flushed and also once extra when the usage
count was set to zero. Worst case was 2 flush min.
So it is a pig in a poke ...
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Fri, 2003-01-10 at 11:14, Andre Hedrick wrote:
> The drive does random and automatic flush caches, if an error happens it
> does not report. *sigh* When APM hits it with a flush and pray the error
> is from this flush, but it does not matter ... the kernel does not have
> the paths to deal this issue ... so bye bye data! Now it if the current
> flush is not the owner of the error OMFG is suggested.
For that matter the BIOS tends to issue the flush, in fact APM is
supposed to be transparent so the BIOS is required to handle it and
since a critical shutdown from the APM PM might not even hit the OS
it has to. Of course pigs also fly 8)
> > > I had a look at patch 2.4.21pre3 and the code looks the same.
> > >
> > > And by the way how are powered off the IDE drives ?
> > > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the
> > > drive with cache enabled or you will enjoy lost data
> >
> > IDE disagrees with itself over this but when we get a controlled power
> > off we do this. The same ATA5/ATA6 problem may well be present there
> > too. I will review both
>
> Not true, the firmware knows to commit the data to platter.
> If it was true you would be screaming long ago.
IDE disagrees with itself because it is meant to work compatibly but if
you run it compatibly you lose data on poweroff.
>
> > Any specific opinion Andre ?
>
> A dirty trick used to date is to pop the STANDY or SLEEP, and depend on
> the drive to deal with the double dirty flush error. If the FLUSH CACHE
> was not valid, the drive would spin back up from STANDY, but not from
> SLEEP, this could be a problem. However SLEEP issued by the driver only
> happens at shutdown unless it has been changed. In the shutdown process,
> each partition unmount was flushed and also once extra when the usage
> count was set to zero. Worst case was 2 flush min.
>
The original question however is whether we are skipping issuing the flush
and sleep on ATA3-5 devices when we should not, because the test is over
strong.
It seems weakening the test is the best option, it fixes ATA-5 and any device
told to sleep, standby or flush that doesn't know the command is just going
to go "Huh ?" and we'll get a nice easy to handle error.
Alan
Oh, just let the darn thing barf a 0x51/0x04 is fine with me!
Just an abort/unsupported command.
Cheers,
On 10 Jan 2003, Alan Cox wrote:
> On Fri, 2003-01-10 at 11:14, Andre Hedrick wrote:
> > The drive does random and automatic flush caches, if an error happens it
> > does not report. *sigh* When APM hits it with a flush and pray the error
> > is from this flush, but it does not matter ... the kernel does not have
> > the paths to deal this issue ... so bye bye data! Now it if the current
> > flush is not the owner of the error OMFG is suggested.
>
> For that matter the BIOS tends to issue the flush, in fact APM is
> supposed to be transparent so the BIOS is required to handle it and
> since a critical shutdown from the APM PM might not even hit the OS
> it has to. Of course pigs also fly 8)
>
> > > > I had a look at patch 2.4.21pre3 and the code looks the same.
> > > >
> > > > And by the way how are powered off the IDE drives ?
> > > > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the
> > > > drive with cache enabled or you will enjoy lost data
> > >
> > > IDE disagrees with itself over this but when we get a controlled power
> > > off we do this. The same ATA5/ATA6 problem may well be present there
> > > too. I will review both
> >
> > Not true, the firmware knows to commit the data to platter.
> > If it was true you would be screaming long ago.
>
> IDE disagrees with itself because it is meant to work compatibly but if
> you run it compatibly you lose data on poweroff.
>
> >
> > > Any specific opinion Andre ?
> >
> > A dirty trick used to date is to pop the STANDY or SLEEP, and depend on
> > the drive to deal with the double dirty flush error. If the FLUSH CACHE
> > was not valid, the drive would spin back up from STANDY, but not from
> > SLEEP, this could be a problem. However SLEEP issued by the driver only
> > happens at shutdown unless it has been changed. In the shutdown process,
> > each partition unmount was flushed and also once extra when the usage
> > count was set to zero. Worst case was 2 flush min.
> >
>
> The original question however is whether we are skipping issuing the flush
> and sleep on ATA3-5 devices when we should not, because the test is over
> strong.
>
> It seems weakening the test is the best option, it fixes ATA-5 and any device
> told to sleep, standby or flush that doesn't know the command is just going
> to go "Huh ?" and we'll get a nice easy to handle error.
>
> Alan
>
>
Andre Hedrick
LAD Storage Consulting Group
On Fri, 2003-01-10 at 13:03, Andre Hedrick wrote:
> Oh, just let the darn thing barf a 0x51/0x04 is fine with me!
> Just an abort/unsupported command.
Sounds ok to me. We do need a barfsupressor option so we can issue
commands that may fail without confusing the user (eg multiwrite setup)
ie
ide_hwif_barfsupress(hwif);
ide_command....
That's very much true irrespective of the flush thing
On Fri, Jan 10 2003, Alan Cox wrote:
> On Fri, 2003-01-10 at 13:03, Andre Hedrick wrote:
> > Oh, just let the darn thing barf a 0x51/0x04 is fine with me!
> > Just an abort/unsupported command.
>
> Sounds ok to me. We do need a barfsupressor option so we can issue
> commands that may fail without confusing the user (eg multiwrite setup)
>
> ie
> ide_hwif_barfsupress(hwif);
> ide_command....
>
> That's very much true irrespective of the flush thing
In the barrier patches, I just used drive->quiet to supress ide_error()
complaining too much (on cache flushes, too). Whether that's per-drive
of per-hwif entity, dunno...
--
Jens Axboe
--- ../linux.21pre3/drivers/ide/ide-disk.c 2003-01-07 14:03:08.000000000 +0000
+++ drivers/ide/ide-disk.c 2003-01-10 17:15:02.000000000 +0000
@@ -38,9 +38,11 @@
* Version 1.15 convert all calls to ide_raw_taskfile
* since args will return register content.
* Version 1.16 added suspend-resume-checkpower
+ * Version 1.17 do flush on standy, do flush on ATA < ATA6
+ * fix wcache setup.
*/
-#define IDEDISK_VERSION "1.16"
+#define IDEDISK_VERSION "1.17"
#undef REALLY_SLOW_IO /* most systems can safely undef this */
@@ -67,7 +69,7 @@
#include <asm/uaccess.h>
#include <asm/io.h>
-/* FIXME: soem day we shouldnt need to look in here! */
+/* FIXME: some day we shouldnt need to look in here! */
#include "legacy/pdc4030.h"
@@ -716,6 +718,7 @@
MOD_INC_USE_COUNT;
if (drive->removable && drive->usage == 1) {
ide_task_t args;
+ int cf;
memset(&args, 0, sizeof(ide_task_t));
args.tfRegister[IDE_COMMAND_OFFSET] = WIN_DOORLOCK;
args.command_type = ide_cmd_type_parser(&args);
@@ -727,12 +730,40 @@
*/
if (drive->doorlocking && ide_raw_taskfile(drive, &args, NULL))
drive->doorlocking = 0;
+ drive->wcache = 0;
+ /* Cache enabled ? */
+ if (drive->id->csfo & 1)
+ drive->wcache = 1;
+ /* Cache command set available ? */
+ if (drive->id->cfs_enable_1 & (1<<5))
+ drive->wcache = 1;
+ /* ATA6 cache extended commands */
+ cf = drive->id->command_set_2 >> 24;
+ if((cf & 0xC0) == 0x40 && (cf & 0x30) != 0)
+ drive->wcache = 1;
}
return 0;
}
static int do_idedisk_flushcache(ide_drive_t *drive);
+static int ide_cacheflush_p(ide_drive_t *drive)
+{
+ if(drive->wcache)
+ {
+ printk(KERN_INFO "ide: performing cache flush.\n");
+ if (do_idedisk_flushcache(drive))
+ {
+ printk (KERN_INFO "%s: Write Cache FAILED Flushing!\n",
+ drive->name);
+ return -EIO;
+ }
+ return 1;
+ }
+ printk(KERN_INFO "ide: no cache flush required.\n");
+ return 0;
+}
+
static void idedisk_release (struct inode *inode, struct file *filp, ide_drive_t *drive)
{
if (drive->removable && !drive->usage) {
@@ -744,10 +775,7 @@
if (drive->doorlocking && ide_raw_taskfile(drive, &args, NULL))
drive->doorlocking = 0;
}
- if ((drive->id->cfs_enable_2 & 0x3000) && drive->wcache)
- if (do_idedisk_flushcache(drive))
- printk (KERN_INFO "%s: Write Cache FAILED Flushing!\n",
- drive->name);
+ ide_cacheflush_p(drive);
MOD_DEC_USE_COUNT;
}
@@ -1435,7 +1463,7 @@
{
if (drive->suspend_reset)
return 1;
-
+ ide_cacheflush_p(drive);
return call_idedisk_suspend(drive, 0);
}
@@ -1679,10 +1707,7 @@
static int idedisk_cleanup(ide_drive_t *drive)
{
- if ((drive->id->cfs_enable_2 & 0x3000) && drive->wcache)
- if (do_idedisk_flushcache(drive))
- printk (KERN_INFO "%s: Write Cache FAILED Flushing!\n",
- drive->name);
+ ide_cacheflush_p(drive);
return ide_unregister_subdriver(drive);
}
> > And by the way how are powered off the IDE drives ?
> > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before
> > powering off the drive with cache enabled or you will enjoy lost
> > data.
>
> We always issue standby or sleep commands to a drive before powering
> off which means the cache flush thing should never have been an
> issue.
I experienced drives spinning back up after they had been flushed on
powerdown, which is not necessarily wrong, (I.E. I never noticed any
data loss), but it's not ideal. Can't we do:
* Standby
* Flush
* Standby
or is there a reason not to? I know there were discussions about the
right order to do the standyby and flush, and as far as I remember, we
never reached a conclusion :-).
John.
On Fri, 2003-01-10 at 16:48, Jens Axboe wrote:
> In the barrier patches, I just used drive->quiet to supress ide_error()
> complaining too much (on cache flushes, too). Whether that's per-drive
> of per-hwif entity, dunno...
Commands are queued per hwif so it doesn't actually matter I suspect.
BTW do you plan to fix up the oopses in the tcq code or should I just mark
it disabled for anyone who has the time to finish the job ? There are a
whole pile of drivers that fail with tcq - mostly because they have custom
dma end functions
On Fri, Jan 10 2003, Alan Cox wrote:
> On Fri, 2003-01-10 at 16:48, Jens Axboe wrote:
> > In the barrier patches, I just used drive->quiet to supress ide_error()
> > complaining too much (on cache flushes, too). Whether that's per-drive
> > of per-hwif entity, dunno...
>
> Commands are queued per hwif so it doesn't actually matter I suspect.
True
> BTW do you plan to fix up the oopses in the tcq code or should I just mark
> it disabled for anyone who has the time to finish the job ? There are a
> whole pile of drivers that fail with tcq - mostly because they have custom
> dma end functions
Yes I will get around to it, probably next week.
--
Jens Axboe