2002-04-18 00:11:22

by Adam Kropelin

Subject: 2.5.8-dj1 with IDE TCQ doesn't survive boot

Jens,

Tried 2.5.8-dj1 here with IDE TCQ and it doesn't make it through
bootup. The lockup (no oops) happens at various places, usually during
a disk-intensive operation like starting PostgreSQL. Disk & chipset
detection always goes ok; the lockup is much later in the boot cycle.
Nothing shows up in the logs.

A kernel without TCQ boots reliably.

Boot drive is hda (IBM-DTTA-351680). System is SMP ppro, no preempt.

--Adam

Chipset/disk detection looks like this:

Uniform Multi-Platform E-IDE driver ver.:7.0.0
ide: system bus speed 33MHz
Intel Corp. 82371SB PIIX3 IDE [Natoma/Triton II]: IDE controller on PCI slot 001
Intel Corp. 82371SB PIIX3 IDE [Natoma/Triton II]: chipset revision 0
Intel Corp. 82371SB PIIX3 IDE [Natoma/Triton II]: not 100% native mode: will prr
PIIX: Intel Corp. 82371SB PIIX3 IDE [Natoma/Triton II] MWDMA16 controller on pc1
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:DMA
hda: IBM-DTTA-351680, ATA DISK drive
hdc: WDC AC21200H, ATA DISK drive
hdd: CD-ROM CDU311, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
ide: unexpected interrupt
hda: tagged command queueing enabled, command queue depth 8
hda: 33022080 sectors (16907 MB) w/462KiB Cache, CHS=34944/15/63, (U)DMA
ide: unexpected interrupt
hdc: 2503872 sectors (1282 MB) w/128KiB Cache, CHS=2484/16/63, DMA
hdd: ATAPI 8X CD-ROM drive, 256kB Cache, DMA
Uniform CD-ROM driver Revision: 3.12
hda: [PTBL] [2055/255/63] hda1 hda2 hda3
hdc: [PTBL] [621/64/63] hdc1


Relevant .config entries:

CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_IDE_TCQ=y
CONFIG_BLK_DEV_IDE_TCQ_FULL=y
CONFIG_BLK_DEV_IDE_TCQ_DEFAULT=y
CONFIG_BLK_DEV_IDE_TCQ_DEPTH=8
CONFIG_BLK_DEV_PIIX=y
CONFIG_IDEDMA_AUTO=y



2002-04-18 06:14:17

by Jens Axboe

Subject: Re: 2.5.8-dj1 with IDE TCQ doesn't survive boot

On Wed, Apr 17 2002, Adam Kropelin wrote:
> Jens,
>
> Tried 2.5.8-dj1 here with IDE TCQ and it doesn't make it through
> bootup. The lockup (no oops) happens at various places, usually during
> a disk-intensive operation like starting PostgreSQL. Disk & chipset
> detection always goes ok; the lockup is much later in the boot cycle.
> Nothing shows up in the logs.

There are two unknowns for me here -- the IBM model you use has not been
tested, and the adapter is the old PIIX3 core which is untested as well.
There are a few things I would like you to try, if you don't mind.

- First try a later kernel, there have been lots of changes since 2.5.8
wrt TCQ. I've attached a patch against 2.5.8-clean (not -dj1), could
you see if that works for you?

- If that doesn't change anything, please also try and disable
CONFIG_BLK_DEV_IDE_TCQ_FULL.

If none of this makes it work, I'm hoping you can set up a serial console
and do some debug logging for me? If you can, I'll let you know how and
what to capture.

Thanks for your report!

--
Jens Axboe


Attachments:
(No filename) (1.03 kB)
ata-tcq-258-all-1.bz2 (35.22 kB)

2002-04-19 00:45:29

by Adam Kropelin

Subject: Re: 2.5.8-dj1 with IDE TCQ doesn't survive boot

Jens Axboe wrote:
> On Wed, Apr 17 2002, Adam Kropelin wrote:
>> Jens,
>>
>> Tried 2.5.8-dj1 here with IDE TCQ and it doesn't make it through
>> bootup. The lockup (no oops) happens at various places, usually during
>
> - First try a later kernel, there have been lots of changes since 2.5.8
> wrt TCQ.

<snip>

> - If that doesn't change anything, please also try and disable
> CONFIG_BLK_DEV_IDE_TCQ_FULL.

The problem persists in both cases, but there are subtle differences...

With CONFIG_BLK_DEV_IDE_TCQ_FULL it will lock regularly.
The auto fsck at boot is good at killing it; it locked up at 51% last time.
Occasionally it will make it as far as PostgreSQL and nfslock startup,
but no further.

With !CONFIG_BLK_DEV_IDE_TCQ_FULL it survives fsck
regularly. However, it still locks up around nfslock and PostgreSQL
startup. Interestingly, if I disable those two items, it boots every
time and (even more interestingly) I can start both by hand after
boot and it "seems" stable.

> If none of this makes it work, I'm hoping you can set up a serial console
> and do some debug logging for me? If you can, I'll let you know how and
> what to capture.

Your wish is my command...

--Adam

P.S. The following messages in reference to the CDROM are now emitted
during device detection. I assume it's because I have no media in the drive,
but since this is new behavior with the patch you sent I figured I'd note it.

hdd: request sense failure: status=0x51 { DriveReady SeekComplete Error }
hdd: request sense failure: error=0x24
hdd: request sense failure: status=0x51 { DriveReady SeekComplete Error }
hdd: request sense failure: error=0x24
hdd: request sense failure: status=0x51 { DriveReady SeekComplete Error }
hdd: request sense failure: error=0x24
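
For reference, the two bytes in those messages can be decoded by hand. What
follows is only a minimal sketch, assuming the standard ATA status register
bit layout and the usual ATAPI error register layout (sense key in the upper
four bits, abort flag in bit 2). The function names are hypothetical and are
not taken from the IDE driver.

```c
#include <stdio.h>
#include <string.h>

/*
 * Decode an ATA status byte into space-separated flag names.
 * The bit values follow the standard ATA status register layout; the
 * names mirror the ones seen in the log above. Hypothetical helper,
 * not driver code.
 */
static void decode_status(unsigned char status, char *buf, size_t len)
{
    static const struct {
        unsigned char bit;
        const char *name;
    } bits[] = {
        { 0x40, "DriveReady" },
        { 0x20, "DeviceFault" },
        { 0x10, "SeekComplete" },
        { 0x08, "DataRequest" },
        { 0x04, "CorrectedError" },
        { 0x02, "Index" },
        { 0x01, "Error" },
    };
    size_t i;

    if (len == 0)
        return;
    buf[0] = '\0';

    /* While the drive is busy, the other status bits are not meaningful. */
    if (status & 0x80) {
        snprintf(buf, len, "Busy");
        return;
    }

    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (status & bits[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, " ", len - strlen(buf) - 1);
            strncat(buf, bits[i].name, len - strlen(buf) - 1);
        }
    }
}

/* Assumed ATAPI error register layout: sense key in bits 7..4. */
static unsigned int atapi_sense_key(unsigned char error)
{
    return error >> 4;
}

/* Assumed ATAPI error register layout: bit 2 is the abort flag. */
static int atapi_aborted(unsigned char error)
{
    return (error & 0x04) != 0;
}
```

Under those assumptions, decode_status(0x51, ...) yields the same three flags
the driver printed, and error=0x24 splits into sense key 2 (NOT READY) with
the abort bit set, which would be consistent with an empty drive.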


2002-04-19 11:07:33

by Martin Dalecki

Subject: Re: 2.5.8-dj1 with IDE TCQ doesn't survive boot

Adam Kropelin wrote:
> Jens Axboe wrote:
>
>>On Wed, Apr 17 2002, Adam Kropelin wrote:
>>
>>>Jens,
>>>
>>>Tried 2.5.8-dj1 here with IDE TCQ and it doesn't make it through
>>>bootup. The lockup (no oops) happens at various places, usually during
>>
>>- First try a later kernel, there have been lots of changes since 2.5.8
>> wrt TCQ.
>
>
> <snip>
>
>>- If that doesn't change anything, please also try and disable
>> CONFIG_BLK_DEV_IDE_TCQ_FULL.
>
>
> The problem persists in both cases, but there are subtle differences...
>
> With CONFIG_BLK_DEV_IDE_TCQ_FULL it will lock regularly.
> The auto fsck at boot is good at killing it; it locked up at 51% last time.
> Occasionally it will make it as far as PostgreSQL and nfslock startup,
> but no further.
>
> With !CONFIG_BLK_DEV_IDE_TCQ_FULL it survives fsck
> regularly. However, it still locks up around nfslock and PostgreSQL
> startup. Interestingly, if I disable those two items, it boots every
> time and (even more interestingly) I can start both by hand after
> boot and it "seems" stable.
>
>
>>If none of this makes it work, I'm hoping you can set up a serial console
>>and do some debug logging for me? If you can, I'll let you know how and
>>what to capture.
>
>
> Your wish is my command...
>
> --Adam
>
> P.S. The following messages in reference to the CDROM are now emitted
> during device detection. I assume it's because I have no media in the drive,
> but since this is new behavior with the patch you sent I figured I'd note it.
>
> hdd: request sense failure: status=0x51 { DriveReady SeekComplete Error }
> hdd: request sense failure: error=0x24
> hdd: request sense failure: status=0x51 { DriveReady SeekComplete Error }
> hdd: request sense failure: error=0x24
> hdd: request sense failure: status=0x51 { DriveReady SeekComplete Error }
> hdd: request sense failure: error=0x24

The problem right now is that the TCQ changes unfortunately make it necessary
to adjust the transport layer of the ide-cd driver as well. This is a known
issue and I'm working on it.