2015-08-07 10:29:37

by Christian Kujau

[permalink] [raw]
Subject: 4.1-rc6: ATA link is slow to respond, please be patient

Hi,

this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade
to latest mainline. However, during bootup the following happens:

===============================
[ 2.237102] ata1: PATA max UDMA/100 irq 39
[ 2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
[ 2.401764] ata1.00: 117231408 sectors, multi 16: LBA48
[ 2.417633] ata1.00: configured for UDMA/100
[ 44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 44.920452] ata1.00: failed command: READ DMA
[ 44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in
[ 44.927257] ata1.00: status: { DRDY }
[ 49.971784] ata1.00: qc timeout (cmd 0xec)
[ 49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 49.978908] ata1.00: revalidation failed (errno=-5)
[ 55.019662] ata1: link is slow to respond, please be patient (ready=0)
[ 60.007677] ata1: device not ready (errno=-16), forcing hardreset
[ 60.012670] ata1: soft resetting link
[ 60.193638] ata1.00: configured for UDMA/100
[ 60.196158] ata1.00: device reported invalid CHS sector 0
[ 60.198610] ata1: EH complete
===============================

This happens only once, but systemd thinks there's a hard problem and will
drop to a recovery shell. I can start sshd and login remotely and then the
system appears to be running just fine.

This happened in 4.2.0-rc5 so I went back a few versions and found that
4.1-rc5 was OK (the error does not show up and the system boots just fine)
and 4.1-rc6 is not.

Unfortunately a git-bisect between these two versions went completly off
the charts, I don't know what happened here:

==================================
first bad commit:

0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
Author: Takashi Iwai <[email protected]>
Date: Wed May 27 16:17:19 2015 +0200

ALSA: hda - Fix noise on AMD radeon 290x controller
==================================

I don't have this driver (or ALSA) even selected. I can reproduce this
error pretty reliably and I'd like to attempt another git-bisect
run when I'm more awake. But maybe somebody recognizes this error and
has a hint where this could come from?

dmesg & .config: http://nerdbynature.de/bits/v4.1-rc6/

Thanks,
Christian.
--
BOFH excuse #225:

It's those computer people in X {city of world}. They keep stuffing things up.


2015-08-07 12:09:10

by Denis Kirjanov

[permalink] [raw]
Subject: Re: 4.1-rc6: ATA link is slow to respond, please be patient

On 8/7/15, Christian Kujau <[email protected]> wrote:
> Hi,
>
> this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade
> to latest mainline. However, during bootup the following happens:
>
> ===============================
> [ 2.237102] ata1: PATA max UDMA/100 irq 39
> [ 2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
> [ 2.401764] ata1.00: 117231408 sectors, multi 16: LBA48
> [ 2.417633] ata1.00: configured for UDMA/100
> [ 44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [ 44.920452] ata1.00: failed command: READ DMA
> [ 44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma
> 69632 in
> [ 44.927257] ata1.00: status: { DRDY }
> [ 49.971784] ata1.00: qc timeout (cmd 0xec)
> [ 49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 49.978908] ata1.00: revalidation failed (errno=-5)
> [ 55.019662] ata1: link is slow to respond, please be patient (ready=0)
> [ 60.007677] ata1: device not ready (errno=-16), forcing hardreset
> [ 60.012670] ata1: soft resetting link
> [ 60.193638] ata1.00: configured for UDMA/100
> [ 60.196158] ata1.00: device reported invalid CHS sector 0
> [ 60.198610] ata1: EH complete
> ===============================

Interesting, I'll try to reproduce it on my G4.

>
> This happens only once, but systemd thinks there's a hard problem and will
> drop to a recovery shell. I can start sshd and login remotely and then the
> system appears to be running just fine.
>
> This happened in 4.2.0-rc5 so I went back a few versions and found that
> 4.1-rc5 was OK (the error does not show up and the system boots just fine)
> and 4.1-rc6 is not.
>
> Unfortunately a git-bisect between these two versions went completly off
> the charts, I don't know what happened here:
>
> ==================================
> first bad commit:
>
> 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
> commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
> Author: Takashi Iwai <[email protected]>
> Date: Wed May 27 16:17:19 2015 +0200
>
> ALSA: hda - Fix noise on AMD radeon 290x controller
> ==================================
>
> I don't have this driver (or ALSA) even selected. I can reproduce this
> error pretty reliably and I'd like to attempt another git-bisect
> run when I'm more awake. But maybe somebody recognizes this error and
> has a hint where this could come from?
>
> dmesg & .config: http://nerdbynature.de/bits/v4.1-rc6/
>
> Thanks,
> Christian.
> --
> BOFH excuse #225:
>
> It's those computer people in X {city of world}. They keep stuffing things
> up.
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev

2015-08-08 08:57:12

by Denis Kirjanov

[permalink] [raw]
Subject: Re: 4.1-rc6: ATA link is slow to respond, please be patient

On 8/7/15, Christian Kujau <[email protected]> wrote:
> Hi,
>
> this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade
> to latest mainline. However, during bootup the following happens:
>
> ===============================
> [ 2.237102] ata1: PATA max UDMA/100 irq 39
> [ 2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
> [ 2.401764] ata1.00: 117231408 sectors, multi 16: LBA48
> [ 2.417633] ata1.00: configured for UDMA/100
> [ 44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [ 44.920452] ata1.00: failed command: READ DMA
> [ 44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma
> 69632 in
> [ 44.927257] ata1.00: status: { DRDY }
> [ 49.971784] ata1.00: qc timeout (cmd 0xec)
> [ 49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 49.978908] ata1.00: revalidation failed (errno=-5)
> [ 55.019662] ata1: link is slow to respond, please be patient (ready=0)
> [ 60.007677] ata1: device not ready (errno=-16), forcing hardreset
> [ 60.012670] ata1: soft resetting link
> [ 60.193638] ata1.00: configured for UDMA/100
> [ 60.196158] ata1.00: device reported invalid CHS sector 0
> [ 60.198610] ata1: EH complete
> ===============================

Just tried 4.2.0-rc5+ and haven't hit the issue.

[ 17.180034] pata-pci-macio 0002:20:0d.0: enabling device (0000 -> 0002)
[ 17.185862] adb: starting probe task...
[ 17.196011] pata-pci-macio 0002:20:0d.0: Activating pata-macio
chipset UniNorth ATA-6, Apple bus ID 3
[ 17.202312] scsi host0: pata_macio
[ 17.203698] ata1: PATA max UDMA/100 irq 39
[ 17.219397] adb devices: [2]: 2 c4 [7]: 7 1f
[ 17.225400] ADB keyboard at 2, handler 1
[ 17.225560] Detected ADB keyboard, type ISO, swapping keys.
[ 17.226642] input: ADB keyboard as /devices/virtual/input/input0
[ 17.227590] input: ADB Powerbook buttons as /devices/virtual/input/input1
[ 17.227795] adb: finished probe task...
[ 17.368537] ata1.00: ATA-6: TOSHIBA MK8026GAX, PA005B, max UDMA/100
[ 17.368717] ata1.00: 156301488 sectors, multi 16: LBA48
[ 17.376346] ata1.00: configured for UDMA/100
[ 17.377544] scsi 0:0:0:0: Direct-Access ATA TOSHIBA
MK8026GA 5B PQ: 0 ANSI: 5
[ 17.386989] sd 0:0:0:0: [sda] 156301488 512-byte logical blocks:
(80.0 GB/74.5 GiB)
[ 17.393144] sd 0:0:0:0: [sda] Write Protect is off
[ 17.397579] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 17.398215] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 17.404124] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 17.661225] sda: [mac] sda1 sda2 sda3 sda4
[ 17.672937] sd 0:0:0:0: [sda] Attached SCSI disk
[ 18.223985] pata-macio 0.00020000:ata-3: Activating pata-macio
chipset KeyLargo ATA-3, Apple bus ID 0
[ 18.233397] scsi host1: pata_macio
[ 18.239172] ata2: PATA max MWDMA2 irq 24


>
> This happens only once, but systemd thinks there's a hard problem and will
> drop to a recovery shell. I can start sshd and login remotely and then the
> system appears to be running just fine.
>
> This happened in 4.2.0-rc5 so I went back a few versions and found that
> 4.1-rc5 was OK (the error does not show up and the system boots just fine)
> and 4.1-rc6 is not.
>
> Unfortunately a git-bisect between these two versions went completly off
> the charts, I don't know what happened here:
>
> ==================================
> first bad commit:
>
> 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
> commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
> Author: Takashi Iwai <[email protected]>
> Date: Wed May 27 16:17:19 2015 +0200
>
> ALSA: hda - Fix noise on AMD radeon 290x controller
> ==================================
>
> I don't have this driver (or ALSA) even selected. I can reproduce this
> error pretty reliably and I'd like to attempt another git-bisect
> run when I'm more awake. But maybe somebody recognizes this error and
> has a hint where this could come from?
>
> dmesg & .config: http://nerdbynature.de/bits/v4.1-rc6/
>
> Thanks,
> Christian.
> --
> BOFH excuse #225:
>
> It's those computer people in X {city of world}. They keep stuffing things
> up.
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev

2015-08-08 21:34:48

by Christian Kujau

[permalink] [raw]
Subject: Re: 4.1-rc6: ATA link is slow to respond, please be patient

On August 8, 2015 1:57:05 AM PDT, Denis Kirjanov <[email protected]> wrote:
>On 8/7/15, Christian Kujau <[email protected]> wrote:
>> Hi,
>>
>> this PowerBook G4 was running 3.16 for a while but now I wanted to
>upgrade
>> to latest mainline. However, during bootup the following happens:
>>
>> ===============================
>> [ 2.237102] ata1: PATA max UDMA/100 irq 39
>> [ 2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max
>UDMA/100
>> [ 2.401764] ata1.00: 117231408 sectors, multi 16: LBA48
>> [ 2.417633] ata1.00: configured for UDMA/100
>> [ 44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
>0x0
>> [ 44.920452] ata1.00: failed command: READ DMA
>> [ 44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0
>dma
>> 69632 in
>> [ 44.927257] ata1.00: status: { DRDY }
>> [ 49.971784] ata1.00: qc timeout (cmd 0xec)
>> [ 49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [ 49.978908] ata1.00: revalidation failed (errno=-5)
>> [ 55.019662] ata1: link is slow to respond, please be patient
>(ready=0)
>> [ 60.007677] ata1: device not ready (errno=-16), forcing hardreset
>> [ 60.012670] ata1: soft resetting link
>> [ 60.193638] ata1.00: configured for UDMA/100
>> [ 60.196158] ata1.00: device reported invalid CHS sector 0
>> [ 60.198610] ata1: EH complete
>> ===============================
>
>Just tried 4.2.0-rc5+ and haven't hit the issue.
>
>[ 17.180034] pata-pci-macio 0002:20:0d.0: enabling device (0000 ->
>0002)
>[ 17.185862] adb: starting probe task...
>[ 17.196011] pata-pci-macio 0002:20:0d.0: Activating pata-macio
>chipset UniNorth ATA-6, Apple bus ID 3
>[ 17.202312] scsi host0: pata_macio
>[ 17.203698] ata1: PATA max UDMA/100 irq 39
>[ 17.219397] adb devices: [2]: 2 c4 [7]: 7 1f
>[ 17.225400] ADB keyboard at 2, handler 1
>[ 17.225560] Detected ADB keyboard, type ISO, swapping keys.
>[ 17.226642] input: ADB keyboard as /devices/virtual/input/input0
>[ 17.227590] input: ADB Powerbook buttons as
>/devices/virtual/input/input1
>[ 17.227795] adb: finished probe task...
>[ 17.368537] ata1.00: ATA-6: TOSHIBA MK8026GAX, PA005B, max UDMA/100
>[ 17.368717] ata1.00: 156301488 sectors, multi 16: LBA48
>[ 17.376346] ata1.00: configured for UDMA/100
>[ 17.377544] scsi 0:0:0:0: Direct-Access ATA TOSHIBA
>MK8026GA 5B PQ: 0 ANSI: 5
>[ 17.386989] sd 0:0:0:0: [sda] 156301488 512-byte logical blocks:
>(80.0 GB/74.5 GiB)
>[ 17.393144] sd 0:0:0:0: [sda] Write Protect is off
>[ 17.397579] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
>[ 17.398215] sd 0:0:0:0: Attached scsi generic sg0 type 0
>[ 17.404124] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
>enabled, doesn't support DPO or FUA
>[ 17.661225] sda: [mac] sda1 sda2 sda3 sda4
>[ 17.672937] sd 0:0:0:0: [sda] Attached SCSI disk
>[ 18.223985] pata-macio 0.00020000:ata-3: Activating pata-macio
>chipset KeyLargo ATA-3, Apple bus ID 0
>[ 18.233397] scsi host1: pata_macio
>[ 18.239172] ata2: PATA max MWDMA2 irq 24
>
>
>>
>> This happens only once, but systemd thinks there's a hard problem and
>will
>> drop to a recovery shell. I can start sshd and login remotely and
>then the
>> system appears to be running just fine.
>>
>> This happened in 4.2.0-rc5 so I went back a few versions and found
>that
>> 4.1-rc5 was OK (the error does not show up and the system boots just
>fine)
>> and 4.1-rc6 is not.
>>
>> Unfortunately a git-bisect between these two versions went completly
>off
>> the charts, I don't know what happened here:
>>
>> ==================================
>> first bad commit:
>>
>> 0fa372b6c95013af1334b3d5c9b5f03a70ecedab is the first bad commit
>> commit 0fa372b6c95013af1334b3d5c9b5f03a70ecedab
>> Author: Takashi Iwai <[email protected]>
>> Date: Wed May 27 16:17:19 2015 +0200
>>
>> ALSA: hda - Fix noise on AMD radeon 290x controller
>> ==================================
>>
>> I don't have this driver (or ALSA) even selected. I can reproduce
>this
>> error pretty reliably and I'd like to attempt another git-bisect
>> run when I'm more awake. But maybe somebody recognizes this error and
>> has a hint where this could come from?
>>
>> dmesg & .config: http://nerdbynature.de/bits/v4.1-rc6/
>>
>> Thanks,
>> Christian.
>> --
>> BOFH excuse #225:
>>
>> It's those computer people in X {city of world}. They keep stuffing
>things
>> up.
>> _______________________________________________
>> Linuxppc-dev mailing list
>> [email protected]
>> https://lists.ozlabs.org/listinfo/linuxppc-dev

Can you send me your .config or did you use my .config, verbatim?

I'll try another git-bisect later today.

Thanks,
Christian.
--
make bzImage, not war

2015-08-10 01:37:13

by Michael Ellerman

[permalink] [raw]
Subject: Re: 4.1-rc6: ATA link is slow to respond, please be patient

On Sat, 2015-08-08 at 21:17 -0700, Christian Kujau wrote:
> [Adding [email protected]]
>
> On Fri, 7 Aug 2015, Christian Kujau wrote:
> > this PowerBook G4 was running 3.16 for a while but now I wanted to upgrade
> > to latest mainline. However, during bootup the following happens:
> >
> > ===============================
> > [ 2.237102] ata1: PATA max UDMA/100 irq 39
> > [ 2.401708] ata1.00: ATA-8: SAMSUNG HM061GC, LR100-10, max UDMA/100
> > [ 2.401764] ata1.00: 117231408 sectors, multi 16: LBA48
> > [ 2.417633] ata1.00: configured for UDMA/100
> > [ 44.918102] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [ 44.920452] ata1.00: failed command: READ DMA
> > [ 44.922725] ata1.00: cmd c8/00:88:64:c2:12/00:00:00:00:00/e0 tag 0 dma 69632 in
> > [ 44.927257] ata1.00: status: { DRDY }
> > [ 49.971784] ata1.00: qc timeout (cmd 0xec)
> > [ 49.976529] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > [ 49.978908] ata1.00: revalidation failed (errno=-5)
> > [ 55.019662] ata1: link is slow to respond, please be patient (ready=0)
> > [ 60.007677] ata1: device not ready (errno=-16), forcing hardreset
> > [ 60.012670] ata1: soft resetting link
> > [ 60.193638] ata1.00: configured for UDMA/100
> > [ 60.196158] ata1.00: device reported invalid CHS sector 0
> > [ 60.198610] ata1: EH complete
> > ===============================
> >
> > This happens only once, but systemd thinks there's a hard problem and will
> > drop to a recovery shell. I can start sshd and login remotely and then the
> > system appears to be running just fine.
> >
> > This happened in 4.2.0-rc5 so I went back a few versions and found that
> > 4.1-rc5 was OK (the error does not show up and the system boots just fine)
> > and 4.1-rc6 is not.
> >
>
> After more digging around I noticed that the same error (with
> changed wording) happens with a Debian 3.16.0-4-powerpc kernel - so it
> doesn't appear to be a recent regression as I suspected at first:
>
> ==================================
> [ 46.907147] ata1: drained 572 bytes to clear DRQ
> [ 46.907166] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [ 46.908419] ata1.00: failed command: READ DMA
> [ 46.909058] ata1.00: cmd c8/00:80:9c:f9:60/00:00:00:00:00/e0 tag 0 dma 65536 in
> res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x20 (host bus error)
> [ 46.910303] ata1.00: status: { DRDY }
> [ 46.970579] ata1.00: configured for UDMA/100
> [ 46.971853] ata1.00: device reported invalid CHS sector 0
> [ 46.972524] ata1: EH complete
> ==================================
>
> Also, the error cannot repduced as reliably as I thought: sometimes, the
> machine just boots w/o a hitch - and that might be the reasons why my
> bisect attempts failed and incorrectly blamed totally unrelated commits:
> after each "git bisect {good,bad}" (+compiling) I rebooted but there was a
> chance that the system came up just fine / showed the same ATA error and
> thus falsified the git-bisect results.

Yes that would explain why the bisect went wrong. If you have an intermittent
bug like that you have to be very careful about which commits you mark good or
bad.

I don't really know anything about disk drivers, so hopefully someone who does
can chime in.

cheers