2015-07-23 17:48:26

by Alex Deucher

Subject: ATA failure regression in kernel 4.2

Something new in kernel 4.2 seems to have broken one of my hard drives
(an SSD). 4.1 and older kernels work fine. Here are the
relevant logs.

Kernel 4.1:
[ 1.435809] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.435837] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.444814] input: Logitech Inc. iFeel Mouse as
/devices/pci0000:00/0000:00:10.0/usb1/1-1/1-1:1.0/0003:046D:C030.0001/input/input3
[ 1.445156] hid-generic 0003:046D:C030.0001: input,hidraw0: USB HID
v1.00 Mouse [Logitech Inc. iFeel Mouse ] on
usb-0000:00:10.0-1/input0
[ 1.447770] ata2.00: ATA-8: KINGSTON SV300S37A120G, 506ABBF0, max UDMA/133
[ 1.447773] ata2.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 1.457610] ata2.00: configured for UDMA/133
[ 1.472799] ata1.00: HPA detected: current 156299375, native 156301488
[ 1.472805] ata1.00: ATA-7: ST380815AS, 4.AAB, max UDMA/133
[ 1.472807] ata1.00: 156299375 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 1.531057] ata1.00: configured for UDMA/133
[ 1.531223] scsi 0:0:0:0: Direct-Access ATA ST380815AS
B PQ: 0 ANSI: 5
[ 1.531438] sd 0:0:0:0: [sda] 156299375 512-byte logical blocks:
(80.0 GB/74.5 GiB)
[ 1.531476] sd 0:0:0:0: [sda] Write Protect is off
[ 1.531478] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.531484] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 1.531491] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 1.531686] scsi 1:0:0:0: Direct-Access ATA KINGSTON
SV300S3 BBF0 PQ: 0 ANSI: 5
[ 1.531847] sd 1:0:0:0: [sdb] 234441648 512-byte logical blocks:
(120 GB/111 GiB)
[ 1.531879] sd 1:0:0:0: [sdb] Write Protect is off
[ 1.531881] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 1.531891] sd 1:0:0:0: [sdb] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 1.531904] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 1.534210] sdb: sdb1
[ 1.534466] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 1.579337] sda: sda1
[ 1.579628] sd 0:0:0:0: [sda] Attached SCSI disk

Kernel 4.2:
[ 1.552246] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.552308] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.568731] input: Logitech Inc. iFeel Mouse as
/devices/pci0000:00/0000:00:10.0/usb1/1-1/1-1:1.0/0003:046D:C030.0001/input/input3
[ 1.569086] hid-generic 0003:046D:C030.0001: input,hidraw0: USB HID
v1.00 Mouse [Logitech Inc. iFeel Mouse ] on
usb-0000:00:10.0-1/input0
[ 1.597811] ata1.00: HPA detected: current 156299375, native 156301488
[ 1.597900] ata1.00: ATA-7: ST380815AS, 4.AAB, max UDMA/133
[ 1.597983] ata1.00: 156299375 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 1.656075] ata1.00: configured for UDMA/133
[ 1.656417] scsi 0:0:0:0: Direct-Access ATA ST380815AS
B PQ: 0 ANSI: 5
[ 1.656744] sd 0:0:0:0: [sda] 156299375 512-byte logical blocks:
(80.0 GB/74.5 GiB)
[ 1.656801] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 1.657071] sd 0:0:0:0: [sda] Write Protect is off
[ 1.657166] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.657201] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 1.704347] sda: sda1
[ 1.704779] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2.059840] tsc: Refined TSC clocksource calibration: 3693.283 MHz
[ 2.059936] clocksource: tsc: mask: 0xffffffffffffffff max_cycles:
0x6a7918f751d, max_idle_ns: 881590915016 ns
[ 3.059155] clocksource: Switched to clocksource tsc
[ 6.547628] ata2.00: qc timeout (cmd 0xec)
[ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 16.997819] ata2.00: qc timeout (cmd 0xec)
[ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
[ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 47.429257] ata2.00: qc timeout (cmd 0xec)
[ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)

Any ideas?

Thanks,

Alex
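
The SStatus and SControl values in the logs above are hexadecimal register dumps, and decoding them shows what the retries did to the link. The following is a minimal sketch, assuming the standard SATA field layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11); the function names are illustrative and not taken from libata.

```python
# Field layout per the SATA specification: DET in bits 0-3, SPD in bits 4-7,
# IPM in bits 8-11. libata prints both registers in hexadecimal.
SPEEDS = {
    0: "no speed negotiated / no limit",
    1: "1.5 Gbps",
    2: "3.0 Gbps",
    3: "6.0 Gbps",
}


def decode_sata_reg(hex_text):
    """Split an SStatus or SControl value (hex string) into its three fields."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,
        "spd": (value >> 4) & 0xF,
        "ipm": (value >> 8) & 0xF,
    }


def link_speed(hex_text):
    """Return the speed named by the SPD field of the register value."""
    return SPEEDS.get(decode_sata_reg(hex_text)["spd"], "unknown")


for reg in ("133", "123"):
    print("SStatus", reg, "->", link_speed(reg))
print("SControl 320 speed limit ->", link_speed("320"))
```

Read this way, the change from SControl 300 to SControl 320 in the 4.2 log is the speed cap that libata applied after the second IDENTIFY timeout; the link itself came up each time.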


2015-07-23 18:35:53

by Tejun Heo

Subject: Re: ATA failure regression in kernel 4.2

Hello,

On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
> Something new in kernel 4.2 seems to have broken one of my hard drives
> (ssd) in kernel 4.2. 4.1 and older kernels work fine. Here are the
> relevant logs.
>
...
> [ 6.547628] ata2.00: qc timeout (cmd 0xec)
> [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 16.997819] ata2.00: qc timeout (cmd 0xec)
> [ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
> [ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [ 47.429257] ata2.00: qc timeout (cmd 0xec)
> [ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)

Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
related. Which controller is it (lspci -nn)? Also, can you try to
bisect the issue?

Thanks.

--
tejun

2015-07-23 18:44:09

by Alex Deucher

Subject: Re: ATA failure regression in kernel 4.2

On Thu, Jul 23, 2015 at 2:35 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
>> Something new in kernel 4.2 seems to have broken one of my hard drives
>> (ssd) in kernel 4.2. 4.1 and older kernels work fine. Here are the
>> relevant logs.
>>
> ...
>> [ 6.547628] ata2.00: qc timeout (cmd 0xec)
>> [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>> [ 16.997819] ata2.00: qc timeout (cmd 0xec)
>> [ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
>> [ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>> [ 47.429257] ata2.00: qc timeout (cmd 0xec)
>> [ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>
> Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
> related. Which controller is it (lspci -nn)? Also, can you try to
> bisect the issue?

00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
SATA Controller [AHCI mode] [1022:7801] (rev 40)
00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
IDE Controller [1022:780c]

I can take a look at bisecting later this week.

Alex

>
> Thanks.
>
> --
> tejun

2015-07-24 17:38:43

by Alex Deucher

Subject: Re: ATA failure regression in kernel 4.2

On Thu, Jul 23, 2015 at 2:44 PM, Alex Deucher <[email protected]> wrote:
> On Thu, Jul 23, 2015 at 2:35 PM, Tejun Heo <[email protected]> wrote:
>> Hello,
>>
>> On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
>>> Something new in kernel 4.2 seems to have broken one of my hard drives
>>> (ssd) in kernel 4.2. 4.1 and older kernels work fine. Here are the
>>> relevant logs.
>>>
>> ...
>>> [ 6.547628] ata2.00: qc timeout (cmd 0xec)
>>> [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>> [ 16.997819] ata2.00: qc timeout (cmd 0xec)
>>> [ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
>>> [ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>> [ 47.429257] ata2.00: qc timeout (cmd 0xec)
>>> [ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>
>> Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
>> related. Which controller is it (lspci -nn)? Also, can you try to
>> bisect the issue?
>
> 00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
> SATA Controller [AHCI mode] [1022:7801] (rev 40)
> 00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
> IDE Controller [1022:780c]
>
> I can take a look at bisecting later this week.

You were right about the interrupts. This is an AMD Kaveri APU system.

52f518a3a7c2f80551a38d38be28bc9f335e713c is the first bad commit
commit 52f518a3a7c2f80551a38d38be28bc9f335e713c
Author: Jiang Liu <[email protected]>
Date: Mon Apr 13 14:11:35 2015 +0800

x86/MSI: Use hierarchical irqdomains to manage MSI interrupts

Enhance MSI code to support hierarchical irqdomains, it helps to make
the architecture more clear.

Signed-off-by: Jiang Liu <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: David Cohen <[email protected]>
Cc: Sander Eikelenboom <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: Joerg Roedel <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dimitri Sivanich <[email protected]>
Cc: Joerg Roedel <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

:040000 040000 786bcad9a3fad413e0b744e2cfa20da7ff402db6
22618cac66dee85a7752bb3af81169fff3a242d8 M arch
:040000 040000 acee54015803d4cd52d582a9e5e93aa56ad08482
40d2c7a02c0f8677e596c98c936404b2211336a3 M drivers

git bisect start
# bad: [3aa20508a6fe386c2a893027ef4c4ef78ee4eac2] Merge branch
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect bad 3aa20508a6fe386c2a893027ef4c4ef78ee4eac2
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect good b953c0d234bc72e8489d3bf51a276c5c4ec85345
# bad: [0db9723cacf4d62bc3685fb15179b39ee4e17679] Merge branch 'next'
of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
git bisect bad 0db9723cacf4d62bc3685fb15179b39ee4e17679
# bad: [e3d8238d7f5c3f539a29f5ac596cd342d847e099] Merge tag
'arm64-upstream' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
git bisect bad e3d8238d7f5c3f539a29f5ac596cd342d847e099
# bad: [c70c5fb2b96dae0996fb0877d996458d3ca57eda] Merge tag
'pwm/for-4.2-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
git bisect bad c70c5fb2b96dae0996fb0877d996458d3ca57eda
# good: [e75c73ad64478c12b3a44b86a3e7f62a4f65b93e] Merge branch
'x86-fpu-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good e75c73ad64478c12b3a44b86a3e7f62a4f65b93e
# bad: [43224b96af3154cedd7220f7b90094905f07ac78] Merge branch
'timers-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 43224b96af3154cedd7220f7b90094905f07ac78
# bad: [9dda1658a9bd450d65da5153a2427955785d17c2] Merge branch
'x86/asm' into x86/core, to prepare for new patch
git bisect bad 9dda1658a9bd450d65da5153a2427955785d17c2
# bad: [191a66353b22fad8ac89404ab4c929cbe7b0afb2] Merge branch
'x86/asm' into x86/apic, to resolve a conflict
git bisect bad 191a66353b22fad8ac89404ab4c929cbe7b0afb2
# bad: [099c5c03487f6bca30c628e14e666788dd61fb33] irq_remapping/vt-d:
Move struct irq_2_iommu into intel_irq_remapping.c
git bisect bad 099c5c03487f6bca30c628e14e666788dd61fb33
# bad: [90d84fe95dd6b418383aa0e0e5cace8f1b1e7e30] x86/MSI: Replace
msi_update_msg() with irq_chip_compose_msi_msg()
git bisect bad 90d84fe95dd6b418383aa0e0e5cace8f1b1e7e30
# good: [3cb96f0c97330834929abe9bd2ca3c252a83def0] x86/hpet: Enhance
HPET IRQ to support hierarchical irqdomains
git bisect good 3cb96f0c97330834929abe9bd2ca3c252a83def0
# bad: [b1855c752e67d1125d41fadb499014b49a245db8] x86/MSI: Clean up
unused MSI related code and interfaces
git bisect bad b1855c752e67d1125d41fadb499014b49a245db8
# bad: [2b43817e481da9f5118adb56aef46b3f0298c685] irq_remapping/vt-d:
Clean up unused MSI related code
git bisect bad 2b43817e481da9f5118adb56aef46b3f0298c685
# bad: [80aa283364a17998dceb577bd185e3380b927544] x86/irq: Directly
call native_compose_msi_msg() for DMAR IRQ
git bisect bad 80aa283364a17998dceb577bd185e3380b927544
# bad: [52f518a3a7c2f80551a38d38be28bc9f335e713c] x86/MSI: Use
hierarchical irqdomains to manage MSI interrupts
git bisect bad 52f518a3a7c2f80551a38d38be28bc9f335e713c
# first bad commit: [52f518a3a7c2f80551a38d38be28bc9f335e713c]
x86/MSI: Use hierarchical irqdomains to manage MSI interrupts

lspci:
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:1422]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device [1022:1423]
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
[AMD/ATI] Kaveri [Radeon R7 200 Series] [1002:130f]
00:01.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
Device [1002:1308]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:1424]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:1424]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:1424]
00:10.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH
USB XHCI Controller [1022:7814] (rev 09)
00:10.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH
USB XHCI Controller [1022:7814] (rev 09)
00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
SATA Controller [AHCI mode] [1022:7801] (rev 40)
00:12.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH
USB OHCI Controller [1022:7807] (rev 11)
00:12.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH
USB EHCI Controller [1022:7808] (rev 11)
00:13.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH
USB OHCI Controller [1022:7807] (rev 11)
00:13.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH
USB EHCI Controller [1022:7808] (rev 11)
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus
Controller [1022:780b] (rev 16)
00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
IDE Controller [1022:780c]
00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] FCH
Azalia Controller [1022:780d] (rev 01)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC
Bridge [1022:780e] (rev 11)
00:14.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] FCH PCI
Bridge [1022:780f] (rev 40)
00:14.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH
USB OHCI Controller [1022:7809] (rev 11)
00:15.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Hudson
PCI to PCI bridge (PCIE port 0) [1022:43a0]
00:15.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Hudson
PCI to PCI bridge (PCIE port 2) [1022:43a2]
00:15.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Hudson
PCI to PCI bridge (PCIE port 3) [1022:43a3]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:141a]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:141b]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:141c]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:141d]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:141e]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
[1022:141f]
03:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1042
SuperSpeed USB Host Controller [1b21:1042]
04:00.0 Ethernet controller [0200]: Qualcomm Atheros QCA8171 Gigabit
Ethernet [1969:10a1] (rev 10)


>
> Alex
>
>>
>> Thanks.
>>
>> --
>> tejun
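
A bisect log like the one in this message can be summarized mechanically, which helps when comparing logs from several testers. This is a minimal sketch: the helper name is hypothetical, and the sample is an abridged copy of the log above with the wrapped subject lines shortened.

```python
import re


def summarize_bisect_log(text):
    """Count good/bad verdicts and return the first bad commit, if recorded."""
    verdicts = re.findall(
        r"^git bisect (good|bad) ([0-9a-f]{40})$", text, re.MULTILINE
    )
    first_bad = re.search(
        r"^# first bad commit: \[([0-9a-f]{40})\]", text, re.MULTILINE
    )
    return {
        "good": sum(1 for verdict, _ in verdicts if verdict == "good"),
        "bad": sum(1 for verdict, _ in verdicts if verdict == "bad"),
        "first_bad": first_bad.group(1) if first_bad else None,
    }


# Abridged from the bisect log in this message.
SAMPLE = """git bisect start
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect good b953c0d234bc72e8489d3bf51a276c5c4ec85345
# good: [3cb96f0c97330834929abe9bd2ca3c252a83def0] x86/hpet: Enhance
git bisect good 3cb96f0c97330834929abe9bd2ca3c252a83def0
# bad: [52f518a3a7c2f80551a38d38be28bc9f335e713c] x86/MSI: Use
git bisect bad 52f518a3a7c2f80551a38d38be28bc9f335e713c
# first bad commit: [52f518a3a7c2f80551a38d38be28bc9f335e713c]
"""

print(summarize_bisect_log(SAMPLE))
```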

2015-07-27 03:01:59

by Jiang Liu

Subject: Re: ATA failure regression in kernel 4.2

On 2015/7/25 1:38, Alex Deucher wrote:
> On Thu, Jul 23, 2015 at 2:44 PM, Alex Deucher <[email protected]> wrote:
>> On Thu, Jul 23, 2015 at 2:35 PM, Tejun Heo <[email protected]> wrote:
>>> Hello,
>>>
>>> On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
>>>> Something new in kernel 4.2 seems to have broken one of my hard drives
>>>> (ssd) in kernel 4.2. 4.1 and older kernels work fine. Here are the
>>>> relevant logs.
>>>>
>>> ...
>>>> [ 6.547628] ata2.00: qc timeout (cmd 0xec)
>>>> [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>> [ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>>> [ 16.997819] ata2.00: qc timeout (cmd 0xec)
>>>> [ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>> [ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
>>>> [ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>> [ 47.429257] ata2.00: qc timeout (cmd 0xec)
>>>> [ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>> [ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>
>>> Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
>>> related. Which controller is it (lspci -nn)? Also, can you try to
>>> bisect the issue?
>>
>> 00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
>> SATA Controller [AHCI mode] [1022:7801] (rev 40)
>> 00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
>> IDE Controller [1022:780c]
>>
>> I can take a look at bisecting later this week.
>
> You were right about the interrupts. This is an AMD Kaveri APU system.
Hi Alex,
Could you please help to provide more information about the
system so we could identify the issue? Dmesg and /proc/interrupts
from good and bad kernels are welcomed.
Thanks!
Gerry

2015-07-27 15:21:16

by Alex Deucher

Subject: Re: ATA failure regression in kernel 4.2

On Sun, Jul 26, 2015 at 11:01 PM, Jiang Liu <[email protected]> wrote:
> On 2015/7/25 1:38, Alex Deucher wrote:
>> On Thu, Jul 23, 2015 at 2:44 PM, Alex Deucher <[email protected]> wrote:
>>> On Thu, Jul 23, 2015 at 2:35 PM, Tejun Heo <[email protected]> wrote:
>>>> Hello,
>>>>
>>>> On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
>>>>> Something new in kernel 4.2 seems to have broken one of my hard drives
>>>>> (ssd) in kernel 4.2. 4.1 and older kernels work fine. Here are the
>>>>> relevant logs.
>>>>>
>>>> ...
>>>>> [ 6.547628] ata2.00: qc timeout (cmd 0xec)
>>>>> [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>> [ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>>>> [ 16.997819] ata2.00: qc timeout (cmd 0xec)
>>>>> [ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>> [ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
>>>>> [ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>> [ 47.429257] ata2.00: qc timeout (cmd 0xec)
>>>>> [ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>> [ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>
>>>> Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
>>>> related. Which controller is it (lspci -nn)? Also, can you try to
>>>> bisect the issue?
>>>
>>> 00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
>>> SATA Controller [AHCI mode] [1022:7801] (rev 40)
>>> 00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
>>> IDE Controller [1022:780c]
>>>
>>> I can take a look at bisecting later this week.
>>
>> You were right about the interrupts. This is an AMD Kaveri APU system.
> Hi Alex,
> Could you please help to provide more information about the
> system so we could identify the issue? Dmesg and /proc/interrupts
> from good and bad kernels are welcomed.
> Thanks!
> Gerry

See attached. Thanks!

Alex


Attachments:
interrupts.working (3.86 kB)
interrupts.broken (4.14 kB)
dmesg.working (62.42 kB)
dmesg.broken (57.01 kB)

2015-07-27 16:30:59

by Jiang Liu

Subject: Re: ATA failure regression in kernel 4.2

On 2015/7/27 23:21, Alex Deucher wrote:
> On Sun, Jul 26, 2015 at 11:01 PM, Jiang Liu <[email protected]> wrote:
>> On 2015/7/25 1:38, Alex Deucher wrote:
>>> On Thu, Jul 23, 2015 at 2:44 PM, Alex Deucher <[email protected]> wrote:
>>>> On Thu, Jul 23, 2015 at 2:35 PM, Tejun Heo <[email protected]> wrote:
>>>>> Hello,
>>>>>
>>>>> On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
>>>>>> Something new in kernel 4.2 seems to have broken one of my hard drives
>>>>>> (ssd) in kernel 4.2. 4.1 and older kernels work fine. Here are the
>>>>>> relevant logs.
>>>>>>
>>>>> ...
>>>>>> [ 6.547628] ata2.00: qc timeout (cmd 0xec)
>>>>>> [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>> [ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>>>>> [ 16.997819] ata2.00: qc timeout (cmd 0xec)
>>>>>> [ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>> [ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
>>>>>> [ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>>> [ 47.429257] ata2.00: qc timeout (cmd 0xec)
>>>>>> [ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>> [ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>>
>>>>> Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
>>>>> related. Which controller is it (lspci -nn)? Also, can you try to
>>>>> bisect the issue?
>>>>
>>>> 00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
>>>> SATA Controller [AHCI mode] [1022:7801] (rev 40)
>>>> 00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
>>>> IDE Controller [1022:780c]
>>>>
>>>> I can take a look at bisecting later this week.
>>>
>>> You were right about the interrupts. This is an AMD Kaveri APU system.
>> Hi Alex,
>> Could you please help to provide more information about the
>> system so we could identify the issue? Dmesg and /proc/interrupts
>> from good and bad kernels are welcomed.
>> Thanks!
>> Gerry
>
> See attached. Thanks!
Hi Alex,
Thanks for the info. Seems something is wrong with multiple-MSI
support. To narrow down the scope, could you please help to:
1) apply the small patch and retest
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 7e62751abfac..35f524cc23b7 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1345,6 +1345,7 @@ static int ahci_init_msi(struct pci_dev *pdev,
unsigned int n_ports,
if (nvec < 0)
return nvec;

+ nvec = 1;
/*
* If number of MSIs is less than number of ports then Sharing Last
* Message mode could be enforced. In this case assume that
advantage
2) Disable interrupt remapping by kernel parameter "nointremap"
and retest.
Thanks!
Gerry
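
For background on why forcing nvec to 1 is a useful test: with multiple MSIs enabled, a PCI device distinguishes its vectors by changing the low-order bits of the message data, so the vectors it is given must form a contiguous, aligned, power-of-two block. The sketch below only illustrates that rule; the function name and the 0x40 base value are hypothetical, and this is not the kernel's allocator or a diagnosis of the regression.

```python
def msi_vectors(base_data, nvec):
    """Return the message data values a device may send with nvec MSIs enabled."""
    if nvec not in (1, 2, 4, 8, 16, 32):
        raise ValueError("MSI allows only power-of-two vector counts up to 32")
    mask = nvec - 1
    if base_data & mask:
        # The device overwrites the low bits, so they must start out as zero.
        raise ValueError("base data must be aligned to the vector count")
    return [base_data | index for index in range(nvec)]


# Four vectors: the device fills in the low two bits itself.
print([hex(v) for v in msi_vectors(0x40, 4)])
# A single vector has no alignment constraint, which is the nvec = 1 case.
print([hex(v) for v in msi_vectors(0x41, 1)])
```

With a single vector, every port shares one interrupt and the contiguous-block requirement disappears, so a pass with nvec = 1 points at the multi-vector path rather than at MSI as a whole.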

2015-07-27 17:36:30

by Alex Deucher

Subject: Re: ATA failure regression in kernel 4.2

On Mon, Jul 27, 2015 at 12:30 PM, Jiang Liu <[email protected]> wrote:
> On 2015/7/27 23:21, Alex Deucher wrote:
>> On Sun, Jul 26, 2015 at 11:01 PM, Jiang Liu <[email protected]> wrote:
>>> On 2015/7/25 1:38, Alex Deucher wrote:
>>>> On Thu, Jul 23, 2015 at 2:44 PM, Alex Deucher <[email protected]> wrote:
>>>>> On Thu, Jul 23, 2015 at 2:35 PM, Tejun Heo <[email protected]> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
>>>>>>> Something new in kernel 4.2 seems to have broken one of my hard drives
>>>>>>> (ssd) in kernel 4.2. 4.1 and older kernels work fine. Here are the
>>>>>>> relevant logs.
>>>>>>>
>>>>>> ...
>>>>>>> [ 6.547628] ata2.00: qc timeout (cmd 0xec)
>>>>>>> [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>>> [ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>>>>>> [ 16.997819] ata2.00: qc timeout (cmd 0xec)
>>>>>>> [ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>>> [ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
>>>>>>> [ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>>>> [ 47.429257] ata2.00: qc timeout (cmd 0xec)
>>>>>>> [ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>>> [ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>>>
>>>>>> Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
>>>>>> related. Which controller is it (lspci -nn)? Also, can you try to
>>>>>> bisect the issue?
>>>>>
>>>>> 00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
>>>>> SATA Controller [AHCI mode] [1022:7801] (rev 40)
>>>>> 00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
>>>>> IDE Controller [1022:780c]
>>>>>
>>>>> I can take a look at bisecting later this week.
>>>>
>>>> You were right about the interrupts. This is an AMD Kaveri APU system.
>>> Hi Alex,
>>> Could you please help to provide more information about the
>>> system so we could identify the issue? Dmesg and /proc/interrupts
>>> from good and bad kernels are welcomed.
>>> Thanks!
>>> Gerry
>>
>> See attached. Thanks!
> Hi Alex,
> Thanks for the info. Seems something is wrong with multiple-MSI
> support. To narrow down the scope, could you please help to:
> 1) apply the small patch and retest
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 7e62751abfac..35f524cc23b7 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1345,6 +1345,7 @@ static int ahci_init_msi(struct pci_dev *pdev,
> unsigned int n_ports,
> if (nvec < 0)
> return nvec;
>
> + nvec = 1;
> /*
> * If number of MSIs is less than number of ports then Sharing Last
> * Message mode could be enforced. In this case assume that
> advantage

This worked.


> 2) Disable interrupt remapping by kernel parameter "nointremap"
> and retest.

nointremap worked too.

dmesg and /proc/interrupts output for both tests attached.

Alex


Attachments:
interrupts.nointremap (4.09 kB)
interrupts.nvec1 (4.15 kB)
dmesg.nvec1 (58.40 kB)
dmesg.nointremap (58.44 kB)

2015-07-28 18:19:18

by Alex Deucher

Subject: Re: ATA failure regression in kernel 4.2

On Mon, Jul 27, 2015 at 12:30 PM, Jiang Liu <[email protected]> wrote:
> On 2015/7/27 23:21, Alex Deucher wrote:
>> On Sun, Jul 26, 2015 at 11:01 PM, Jiang Liu <[email protected]> wrote:
>>> On 2015/7/25 1:38, Alex Deucher wrote:
>>>> On Thu, Jul 23, 2015 at 2:44 PM, Alex Deucher <[email protected]> wrote:
>>>>> On Thu, Jul 23, 2015 at 2:35 PM, Tejun Heo <[email protected]> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
>>>>>>> Something new in kernel 4.2 seems to have broken one of my hard drives
>>>>>>> (ssd) in kernel 4.2. 4.1 and older kernels work fine. Here are the
>>>>>>> relevant logs.
>>>>>>>
>>>>>> ...
>>>>>>> [ 6.547628] ata2.00: qc timeout (cmd 0xec)
>>>>>>> [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>>> [ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>>>>>> [ 16.997819] ata2.00: qc timeout (cmd 0xec)
>>>>>>> [ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>>> [ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
>>>>>>> [ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>>>> [ 47.429257] ata2.00: qc timeout (cmd 0xec)
>>>>>>> [ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>>> [ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>>>
>>>>>> Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
>>>>>> related. Which controller is it (lspci -nn)? Also, can you try to
>>>>>> bisect the issue?
>>>>>
>>>>> 00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
>>>>> SATA Controller [AHCI mode] [1022:7801] (rev 40)
>>>>> 00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
>>>>> IDE Controller [1022:780c]
>>>>>
>>>>> I can take a look at bisecting later this week.
>>>>
>>>> You were right about the interrupts. This is an AMD Kaveri APU system.
>>> Hi Alex,
>>> Could you please help to provide more information about the
>>> system so we could identify the issue? Dmesg and /proc/interrupts
>>> from good and bad kernels are welcomed.
>>> Thanks!
>>> Gerry
>>
>> See attached. Thanks!
> Hi Alex,
> Thanks for the info. Seems something is wrong with multiple-MSI
> support. To narrow down the scope, could you please help to:

I'm also not getting interrupts in my GPU driver. I haven't bisected
this specifically, but I suspect it is related since it used to
work in 4.1. Whether I enable MSIs or not in my driver, I get a huge
number of interrupts on all CPUs as soon as the driver is loaded, but
the driver ISR never gets called. E.g.,
49: 117757835 117763227 117787837 117868913 PCI-MSI 524288-edge amdgpu
nointremap doesn't seem to help.

Alex

> 1) apply the small patch and retest
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 7e62751abfac..35f524cc23b7 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1345,6 +1345,7 @@ static int ahci_init_msi(struct pci_dev *pdev,
> unsigned int n_ports,
> if (nvec < 0)
> return nvec;
>
> + nvec = 1;
> /*
> * If number of MSIs is less than number of ports then Sharing Last
> * Message mode could be enforced. In this case assume that
> advantage
> 2) Disable interrupt remapping by kernel parameter "nointremap"
> and retest.
> Thanks!
> Gerry
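
When comparing /proc/interrupts from good and bad kernels, it helps to split each line into per-CPU counts and a description. This is a minimal sketch using the amdgpu line from this message; the helper name is hypothetical, and the CPU count of four is an assumption read off the four count columns in that line.

```python
def parse_interrupt_line(line, n_cpus):
    """Split one /proc/interrupts line into IRQ label, per-CPU counts, and text."""
    irq, rest = line.split(":", 1)
    fields = rest.split()
    counts = [int(count) for count in fields[:n_cpus]]
    return {
        "irq": irq.strip(),
        "counts": counts,
        "total": sum(counts),
        "description": " ".join(fields[n_cpus:]),
    }


LINE = "49: 117757835 117763227 117787837 117868913 PCI-MSI 524288-edge amdgpu"
entry = parse_interrupt_line(LINE, n_cpus=4)
print(entry["irq"], entry["total"], entry["description"])
```

A total in the hundreds of millions shortly after the driver loads, combined with a handler that never runs, is the pattern described in this message.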

2015-07-28 18:37:41

by Alexander Holler

Subject: Re: ATA failure regression in kernel 4.2

Am 28.07.2015 um 20:19 schrieb Alex Deucher:
> On Mon, Jul 27, 2015 at 12:30 PM, Jiang Liu <[email protected]> wrote:
>> On 2015/7/27 23:21, Alex Deucher wrote:
>>> On Sun, Jul 26, 2015 at 11:01 PM, Jiang Liu <[email protected]> wrote:
>>>> On 2015/7/25 1:38, Alex Deucher wrote:
>>>>> On Thu, Jul 23, 2015 at 2:44 PM, Alex Deucher <[email protected]> wrote:
>>>>>> On Thu, Jul 23, 2015 at 2:35 PM, Tejun Heo <[email protected]> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
>>>>>>>> Something new in kernel 4.2 seems to have broken one of my hard drives
>>>>>>>> (ssd) in kernel 4.2. 4.1 and older kernels work fine. Here are the
>>>>>>>> relevant logs.
>>>>>>>>
>>>>>>> ...
>>>>>>>> [ 6.547628] ata2.00: qc timeout (cmd 0xec)
>>>>>>>> [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>>>> [ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>>>>>>> [ 16.997819] ata2.00: qc timeout (cmd 0xec)
>>>>>>>> [ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>>>> [ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
>>>>>>>> [ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>>>>> [ 47.429257] ata2.00: qc timeout (cmd 0xec)
>>>>>>>> [ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>>>> [ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>>>>>>>
>>>>>>> Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
>>>>>>> related. Which controller is it (lspci -nn)? Also, can you try to
>>>>>>> bisect the issue?
>>>>>>
>>>>>> 00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
>>>>>> SATA Controller [AHCI mode] [1022:7801] (rev 40)
>>>>>> 00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
>>>>>> IDE Controller [1022:780c]
>>>>>>
>>>>>> I can take a look at bisecting later this week.
>>>>>
>>>>> You were right about the interrupts. This is an AMD Kaveri APU system.
>>>> Hi Alex,
>>>> Could you please help to provide more information about the
>>>> system so we could identify the issue? Dmesg and /proc/interrupts
>>>> from good and bad kernels are welcomed.
>>>> Thanks!
>>>> Gerry
>>>
>>> See attached. Thanks!
>> Hi Alex,
>> Thanks for the info. Seems something is wrong with multiple-MSI
>> support. To narrow down the scope, could you please help to:
>
> I'm also not getting interrupts in my GPU driver. I haven't bisected
> this specifically, but I suspect it is related since it used to
> work in 4.1. Whether I enable MSIs or not in my driver, I get a huge
> number of interrupts on all CPUs as soon as the driver is loaded, but
> the driver ISR never gets called. E.g.,
> 49: 117757835 117763227 117787837 117868913 PCI-MSI 524288-edge amdgpu
> nointremap doesn't seem to help.

Same problem here with the AMD SATA controller (1022:7801). It failed to
identify the second disk when using 4.2-rc4.

nointremap helped, nothing else tested

Regards,

Alexander Holler

2015-07-30 17:00:09

by Jiang Liu

Subject: [Debug Patch] Collect more information about the regression

Hi Alexander, Mark, Alex,
Could you please apply the debug patch and send me back
the dmesg? Please also boot with the kernel parameter "apic=debug".

Hi Mark,
It seems that this regression is caused by the multiple-MSI support,
but I have no PCI card supporting multiple-MSI at hand. So may I remotely
access your system from the Intel internal network? That would definitely
speed up the fix.
Thanks!
Gerry

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/amd_iommu.c | 15 +++++++++++++--
drivers/iommu/intel_irq_remapping.c | 4 ++++
2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a57e9b749895..c039ed9333a4 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3916,8 +3916,8 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
union irte *irte = &data->irte_entry;
struct IO_APIC_route_entry *entry;

- data->irq_2_irte.devid = devid;
- data->irq_2_irte.index = index + sub_handle;
+ irte_info->devid = devid;
+ irte_info->index = index + sub_handle;

/* Setup IRTE for IOMMU */
irte->val = 0;
@@ -3926,6 +3926,10 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
irte->fields.destination = irq_cfg->dest_apicid;
irte->fields.dm = apic->irq_dest_mode;
irte->fields.valid = 1;
+ if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+ pr_warn("irqdomain: IRTE%d vector %d APICID%d data%p cfg%p\n",
+ irte_info->index, irte->fields.vector,
+ irte->fields.destination, data, irq_cfg);

switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
@@ -3972,6 +3976,9 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
info->type != X86_IRQ_ALLOC_TYPE_MSIX)
return -EINVAL;

+ if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+ pr_warn("irqdomain: allocate %d MSI IRQ, VIRQ%d\n", nr_irqs, virq);
+
/*
* With IRQ remapping enabled, don't need contiguous CPU vectors
* to support multiple MSI interrupts.
@@ -3986,6 +3993,8 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
if (ret < 0)
return ret;
+ if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+ pr_warn("irqdomain: allocate parent returns %d\n", ret);

ret = -ENOMEM;
data = kzalloc(sizeof(*data), GFP_KERNEL);
@@ -4005,6 +4014,8 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
kfree(data);
goto out_free_parent;
}
+ if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+ pr_warn("irqdomain: allocate IRTE index %d\n", index);

for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(domain, virq + i);
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index f15692a410c7..bb093bc6b334 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1234,6 +1234,8 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
MSI_ADDR_IR_SHV |
MSI_ADDR_IR_INDEX1(index) |
MSI_ADDR_IR_INDEX2(index);
+ if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+ pr_warn("irqdomain: allocate index%d, subhandle%d, irte%llx,%llx\n", index, sub_handle, irte->high, irte->low);
break;

default:
@@ -1305,6 +1307,8 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
kfree(data);
goto out_free_parent;
}
+ if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+ pr_warn("irqdomain: allocate VIRQ%d, count%d, index%d\n", virq, nr_irqs, index);

for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(domain, virq + i);
--
1.7.10.4

2015-07-31 10:38:16

by Alexander Holler

Subject: Re: ATA failure regression in kernel 4.2

Am 28.07.2015 um 20:37 schrieb Alexander Holler:

> Same problem here with the AMD SATA controller (1022:7801). It failed to
> identify the second disk when using 4.2-rc4.
>
> nointremap helped, nothing else tested

A quick look at the change history and two tests suggest that the
right person is already in CC.

commit

f7fa7ae x86/irq: Avoid memory allocation in __assign_irq_vector()

fails while

af87bae x86/htirq: Use new irqdomain interfaces to allocate/free IRQ

works. In between, almost all changes are from Jiang Liu.

I haven't bisected it yet but might do so later.

> Regards,
>
> Alexander Holler

2015-07-31 10:42:40

by Jiang Liu

Subject: Re: ATA failure regression in kernel 4.2



On 2015/7/31 18:38, Alexander Holler wrote:
> Am 28.07.2015 um 20:37 schrieb Alexander Holler:
>
>> Same problem here with the AMD SATA controller (1022:7801). It failed to
>> identify the second disk when using 4.2-rc4.
>>
>> nointremap helped, nothing else tested
>
> A quick look at the change history and two tests suggest that the
> right person is already in CC.
>
> commit
>
> f7fa7ae x86/irq: Avoid memory allocation in __assign_irq_vector()
>
> fails while
>
> af87bae x86/htirq: Use new irqdomain interfaces to allocate/free IRQ
>
> works. In between, almost all changes are from Jiang Liu.
>
> I haven't bisected it yet but might do so later.
Hi Alexander,
Thank you very much for your help. I have debugged this regression
for a whole day without any progress :( Everything seems correct to me,
and I have no hardware to reproduce and debug it. I still need more time :(
Thanks!
Gerry

2015-07-31 18:59:36

by Alexander Holler

Subject: Re: ATA failure regression in kernel 4.2

Am 31.07.2015 um 12:42 schrieb Jiang Liu:
>
>
> On 2015/7/31 18:38, Alexander Holler wrote:
>> Am 28.07.2015 um 20:37 schrieb Alexander Holler:
>>
>>> Same problem here with the AMD SATA controller (1022:7801). It failed to
>>> identify the second disk when using 4.2-rc4.
>>>
>>> nointremap helped, nothing else tested
>>
>> A quick look at the change history and two tests suggest that the
>> right person is already in CC.
>>
>> commit
>>
>> f7fa7ae x86/irq: Avoid memory allocation in __assign_irq_vector()
>>
>> fails while
>>
>> af87bae x86/htirq: Use new irqdomain interfaces to allocate/free IRQ
>>
>> works. In between, almost all changes are from Jiang Liu.
>>
>> I haven't bisected it yet but might do so later.
> Hi Alexander,
> Thank you very much for your help. I have debugged this regression
> for a whole day without any progress :( Everything seems correct to me,
> and I have no hardware to reproduce and debug it. I still need more time :(
> Thanks!
> Gerry
>

There's nothing to apologize for. It wasn't a flame, just an observation.
I'm sorry if it sounded like one. Only people who do nothing make no
mistakes, and doing nothing is often a mistake in itself. But I digress.

I've now bisected between the two commits I had already identified by
other means, down to commit 52f518a3a7c2f80551a38d38be28bc9f335e713c
"x86/MSI: Use hierarchical irqdomains to manage MSI interrupts".

That is the first commit at which 4.2-rc* fails here.

Regards,

Alexander Holler