2002-11-27 10:16:31

by Neil Conway

[permalink] [raw]
Subject: FS-corrupting IDE bug still in 2.4.20-rc3

--- ide-features.c.orig Wed Nov 27 09:34:18 2002
+++ ide-features.c Wed Nov 27 10:06:49 2002
@@ -272,12 +272,39 @@
*/
int ide_config_drive_speed (ide_drive_t *drive, byte speed)
{
+ ide_hwgroup_t *hwgroup = HWGROUP(drive);
ide_hwif_t *hwif = HWIF(drive);
int i, error = 1;
- byte stat;
+ byte stat,unit;
+ unsigned long flags;
+
+ spin_lock_irqsave(&io_request_lock, flags);
+ /*
+ * XXXXX FIXME:
+ * The next test is a band-aid. This is because this routine can be
+ * called while the hwgroup is busy - e.g., after a DMA or PIO
+ * transfer has been initiated. Known culprits: so far, the only
+ * known way to trigger the bug is to load an IDE CD module (both
+ * ide-scsi and ide-cd count) - on most chipsets, this ultimately
+ * causes a call to this routine with no regard for the busy-ness of
+ * the hwgroup. If a transfer is in progress, then as soon as we issue
+ * the SELECT_DRIVE() command below, we trash it. This has caused
+ * fs corruption (it probably shouldn't!).
+ *
+ * The RIGHT way to deal with this is probably either to queue the
+ * call for execution when the hwgroup isn't busy, OR (dodgy?) to sleep
+ * right here in this routine until it isn't busy. We also now have
+ * to use the io_request_lock spinlock to keep SMP systems honest.
+ * This lot is temporary, pending a real fix. NJC 9/5/02, 26/11/02
+ */
+ if (hwgroup) if (hwgroup->busy) {
+ spin_unlock_irqrestore(&io_request_lock, flags);
+ printk("Argh: hwgroup is busy in ide_config_drive_speed\n");
+ return error;
+ }

#if defined(CONFIG_BLK_DEV_IDEDMA) && !defined(CONFIG_DMA_NONPCI)
- byte unit = (drive->select.b.unit & 0x01);
+ unit = (drive->select.b.unit & 0x01);
outb(inb(hwif->dma_base+2) & ~(1<<(5+unit)), hwif->dma_base+2);
#endif /* (CONFIG_BLK_DEV_IDEDMA) && !(CONFIG_DMA_NONPCI) */

@@ -338,6 +365,7 @@
enable_irq(hwif->irq);

if (error) {
+ spin_unlock_irqrestore(&io_request_lock, flags);
(void) ide_dump_status(drive, "set_drive_speed_status", stat);
return error;
}
@@ -371,6 +399,7 @@
case XFER_SW_DMA_0: drive->id->dma_1word |= 0x0101; break;
default: break;
}
+ spin_unlock_irqrestore(&io_request_lock, flags);
return error;
}


Attachments:
(No filename) (1.18 kB)
ide_patch_2.4.20-rc3_261102.txt (2.15 kB)
ide_patch_2.4.20-rc3_261102.txt
Download all attachments

2002-11-27 10:52:16

by Marc-Christian Petersen

[permalink] [raw]
Subject: Re: FS-corrupting IDE bug still in 2.4.20-rc3

Hi Neil,

> I've been off-list and not paying much attention since Andre
> acknowledged it was a bug (and didn't like my patch).
I can imagine why ...

> Let me be very clear: this bug has corrupted filesystems on three
> machines of mine. All of these had PIIX chipsets. I have also
> reproduced it on a VIA chipset, but since that machine was production I
> didn't try very hard to corrupt the fs.
You may try that patch with a VIA boxen and I am quite sure you may experience
a bug that none of your harddisks may be recognized and result in a
panic();

> The patch is not a real fix, merely a workaround. But since 6 months
> have already elapsed, can I request that the patch be applied now, and
> when Andre creates a proper fix we can use that.
I had the same Fix in WOLK some time ago and many users with VIA chipset
complained that with the fix their mashine does not recognize any harddisks
and after trying to recognize they had a panic();

Maybe it's working for you with some VIA chipsets but I removed that fix and
after removal all users with VIA were happy. I've never heard of a FS
corruption of them.

ciao, Marc

2002-11-27 12:29:17

by Neil Conway

[permalink] [raw]
Subject: Re: FS-corrupting IDE bug still in 2.4.20-rc3

Hi Marc...

--- Marc-Christian Petersen <[email protected]> wrote:
> You may try that patch with a VIA boxen and I am quite sure you may
> experience
> a bug that none of your harddisks may be recognized and result in a
> panic();

Aha... Actually, if you read the patch, you'll see why that's no
longer the case. I had in fact forgotten that I'd had to patch the
patch to make my VIA box boot (it was 6 months ago now!). I now do "if
(hwgroup) ..." in the test-for-busy section. So, the new patch does
NOT now cause panics on VIA.

> I had the same Fix in WOLK some time ago and many users with VIA
> chipset
> complained that with the fix their mashine does not recognize any
> harddisks
> and after trying to recognize they had a panic();

Yes, only some chipsets end up in ide_config_drive_speed() at bootup;
notably the PIIX doesn't and the VIA does. Mea culpa! I should have
posted the fixed patch perhaps, but then it was deprecated and I
thought Andre had a better fix in hand. (BTW, when you say you had
"the same fix", you mean you fixed it independently or you used my
patch from May '02?)

> Maybe it's working for you with some VIA chipsets but I removed that
> fix and
> after removal all users with VIA were happy. I've never heard of a FS
> corruption of them.

Well, do they have the trigger ingredients? You MUST have an IDE CDROM
sharing a bus with a HDD. Also, the perfect recipe for disk corruption
is to reboot, and then log in to your chosen desktop, and while it's
hammering the disk starting everything up, it should fire off
"magicdev", which in turn loads ide-cd: BOOM. RedHat 7.x with GNOME
does things in this order. YMMV.

Neil

__________________________________________________
Do You Yahoo!?
Everything you'll ever need on one web page
from News and Sport to Email and Music Charts
http://uk.my.yahoo.com

2002-11-28 23:54:57

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: FS-corrupting IDE bug still in 2.4.20-rc3



On Wed, 27 Nov 2002, Neil Conway wrote:

> Guys - you may remember this one from May this year.
>
> I've been off-list and not paying much attention since Andre
> acknowledged it was a bug (and didn't like my patch).

Neil,

Sorry for taking so long to answer, but does not seem to be a kernel
problem.

Andre, could you comment on his issue?