2023-09-18 09:05:39

by Bagas Sanjaya

Subject: Fwd: Marvell RAID Controller issues since 6.5.x

Hi,

I notice a regression report on Bugzilla [1]. Quoting from it:

> Hardware is a HPE ProLiant Microserver Gen10 X3216 with
>
> # lspci | grep SATA
> 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 49)
> 01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 11)
>
> # dmesg | grep ATA
> [ 0.015106] NODE_DATA(0) allocated [mem 0x1feffc000-0x1feffffff]
> [ 0.569868] ahci 0000:00:11.0: AHCI 0001.0300 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
> [ 0.570560] ata1: SATA max UDMA/133 abar m1024@0xfeb69000 port 0xfeb69100 irq 19
> [ 0.581964] ahci 0000:01:00.0: AHCI 0001.0200 32 slots 8 ports 6 Gbps 0xff impl SATA mode
> [ 0.586488] ata2: SATA max UDMA/133 abar m2048@0xfea40000 port 0xfea40100 irq 28
> [ 0.586554] ata3: SATA max UDMA/133 abar m2048@0xfea40000 port 0xfea40180 irq 28
> [ 0.586617] ata4: SATA max UDMA/133 abar m2048@0xfea40000 port 0xfea40200 irq 28
> [ 0.586681] ata5: SATA max UDMA/133 abar m2048@0xfea40000 port 0xfea40280 irq 28
> [ 0.586742] ata6: SATA max UDMA/133 abar m2048@0xfea40000 port 0xfea40300 irq 28
> [ 0.586804] ata7: SATA max UDMA/133 abar m2048@0xfea40000 port 0xfea40380 irq 28
> [ 0.586866] ata8: SATA max UDMA/133 abar m2048@0xfea40000 port 0xfea40400 irq 28
> [ 0.586927] ata9: SATA max UDMA/133 abar m2048@0xfea40000 port 0xfea40480 irq 28
> [ 0.882680] ata1: SATA link down (SStatus 0 SControl 300)
> [ 0.896665] ata8: SATA link down (SStatus 0 SControl 310)
> [ 0.896979] ata7: SATA link down (SStatus 0 SControl 310)
> [ 0.897660] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 0.897986] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 0.899615] ata6: SATA link down (SStatus 0 SControl 310)
> [ 1.052964] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 1.312890] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 1.477997] ata9.00: ATAPI: MARVELL VIRTUAL, 1.09, max UDMA/66
> [ 1.478613] ata3.00: ATA-10: WDC WD40EFZX-68AWUN0, 81.00B81, max UDMA/133
> [ 1.478720] ata4.00: ATA-10: WDC WD40EFZX-68AWUN0, 81.00A81, max UDMA/133
> [ 1.478912] ata2.00: ATA-9: Samsung SSD 840 EVO 120GB, EXT0DB6Q, max UDMA/133
> [ 1.482260] scsi 1:0:0:0: Direct-Access ATA Samsung SSD 840 DB6Q PQ: 0 ANSI: 5
> [ 1.483793] scsi 2:0:0:0: Direct-Access ATA WDC WD40EFZX-68A 0B81 PQ: 0 ANSI: 5
> [ 1.485746] scsi 3:0:0:0: Direct-Access ATA WDC WD40EFZX-68A 0A81 PQ: 0 ANSI: 5
> [ 1.520882] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 1.521779] ata5.00: ATA-9: WDC WD30EFRX-68EUZN0, 82.00A82, max UDMA/133
> [ 1.523463] scsi 4:0:0:0: Direct-Access ATA WDC WD30EFRX-68E 0A82 PQ: 0 ANSI:
>
> I don't use the RAID features but make use of software RAID instead. On the first port I have an SSD with the operating system, and the three others have HDDs plugged in.
>
> Recently I noticed heavy load, and when looking at dmesg I could see the following lines being repeated constantly.
>
> [396495.764520] ata9.00: configured for UDMA/66
> [396496.092239] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [396496.092584] ata9.00: configured for UDMA/66
> [396496.420123] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [396496.420464] ata9.00: configured for UDMA/66
> [396496.748016] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [396496.748320] ata9.00: configured for UDMA/66
> [396497.076285] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [396497.076609] ata9.00: configured for UDMA/66
>
> First I thought it was a disk issue, as I had already had some of them die and be replaced. However, after leaving only the SSD connected I still received the same dmesg spam immediately during boot. So my guess was that the SSD was faulty, and I replaced my long-running
>
> [ 1.036030] ata2.00: ATA-9: SanDisk SDSSDP064G, 2.0.0, max UDMA/133
>
> with an older spare one I had lying around (using Clonezilla to clone the drive)
>
> [ 1.478912] ata2.00: ATA-9: Samsung SSD 840 EVO 120GB, EXT0DB6Q, max UDMA/133
>
> and still hit the same problem with that one. After thinking about what I had changed lately besides distribution package updates, it came to my mind that I had recently upgraded from kernel 6.4.x to 6.5.x (kernels and their upgrades are manual on my distribution, so no package was used). I booted my system from an Arch Linux ISO, which also used a previous kernel, and it worked fine, so I compiled a 6.4.x kernel again on the system, specifically the latest one, 6.4.16. I rebooted and everything is up and running fine again, so after half a day I'm pretty sure none of my hardware is faulty and it's indeed a kernel issue/regression.
>
> I hope I chose the correct component as I wasn't sure if it should be either SCSI or IO/Storage instead. Please let me know if you need further details. I can't guarantee to be able to do any actual testing like bisecting as I use the system in production.

See Bugzilla for the full thread.
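
For anyone who wants to check their own logs for the same pattern, the repeated lines quoted above can be counted with a small script. This is only a rough sketch: the function names and the threshold of 3 are assumptions made for illustration, not something from the report, and the regular expression only covers the "configured for UDMA" message format quoted above.

```python
import re
from collections import Counter

# Matches lines such as "[396495.764520] ata9.00: configured for UDMA/66".
# Only this one message format (as quoted in the report) is handled.
CONFIGURED_RE = re.compile(r"\]\s*(ata\d+\.\d+): configured for UDMA/\d+")


def count_reconfigurations(lines):
    """Count 'configured for UDMA' messages per ATA device."""
    counts = Counter()
    for line in lines:
        match = CONFIGURED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def noisy_devices(lines, threshold=3):
    """Return devices reconfigured at least `threshold` times.

    The default threshold is an arbitrary value chosen for this sketch.
    """
    counts = count_reconfigurations(lines)
    return sorted(dev for dev, n in counts.items() if n >= threshold)


# Sample taken from the dmesg excerpt quoted in the report.
sample = [
    "[396495.764520] ata9.00: configured for UDMA/66",
    "[396496.092239] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)",
    "[396496.092584] ata9.00: configured for UDMA/66",
    "[396496.420123] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)",
    "[396496.420464] ata9.00: configured for UDMA/66",
]

if __name__ == "__main__":
    for device in noisy_devices(sample):
        print(device, count_reconfigurations(sample)[device])
```

To run it against a live system, the output of `dmesg` could be piped in and passed to `noisy_devices()` line by line instead of the `sample` list.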

Anyway, I'm adding this regression to be tracked by regzbot:

#regzbot introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217920
#regzbot title: UDMA configured spam on Marvell RAID controller

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217920

--
An old man doll... just what I always wanted! - Clara


2023-09-18 13:40:52

by Bagas Sanjaya

Subject: Re: Fwd: Marvell RAID Controller issues since 6.5.x

On Mon, Sep 18, 2023 at 07:34:50AM +0000, Niklas Cassel wrote:
> On Mon, Sep 18, 2023 at 07:18:28AM +0700, Bagas Sanjaya wrote:
> > Hi,
> >
> > I notice a regression report on Bugzilla [1]. Quoting from it:
> >
> > Anyway, I'm adding this regression to be tracked by regzbot:
> >
> > #regzbot introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217920
> > #regzbot title: UDMA configured spam on Marvell RAID controller
> >
> > Thanks.
> >
> > [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217920
>
> Hello Bagas,
>
> This is a duplicate of:
> https://bugzilla.kernel.org/show_bug.cgi?id=217902
>
> Problem is solved by:
> https://lore.kernel.org/linux-scsi/[email protected]/
>
>

I have asked the reporter on Bugzilla to check the fix above. When he
reports back successfully, I'll mark this report as fixed.

Thanks.

--
An old man doll... just what I always wanted! - Clara



2023-09-19 11:42:25

by Thorsten Leemhuis

Subject: Re: Fwd: Marvell RAID Controller issues since 6.5.x

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 19.09.23 08:53, Bagas Sanjaya wrote:
> On Mon, Sep 18, 2023 at 02:56:16PM +0700, Bagas Sanjaya wrote:
>> On Mon, Sep 18, 2023 at 07:34:50AM +0000, Niklas Cassel wrote:
>>> On Mon, Sep 18, 2023 at 07:18:28AM +0700, Bagas Sanjaya wrote:
>>>
>>> This is a duplicate of:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217902
>>>
>>> Problem is solved by:
>>> https://lore.kernel.org/linux-scsi/[email protected]/
>>
>> I have asked the reporter on Bugzilla to check the fix above. When he
>> reports back successfully, I'll mark this report as fixed.
>>
>
> Another user has confirmed the fix (see Bugzilla), so:
>
> #regzbot fix: https://lore.kernel.org/linux-scsi/[email protected]/

Bagas, FWIW, using "#regzbot fix" is not supported (maybe it should, but
I have other priorities currently), hence let me fix this up:

#regzbot fix: scsi: Do no try to probe for CDL on old drives
#regzbot monitor:
https://lore.kernel.org/linux-scsi/[email protected]/
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-09-19 13:29:33

by Bagas Sanjaya

Subject: Re: Fwd: Marvell RAID Controller issues since 6.5.x

On 19/09/2023 18:32, Linux regression tracking #update (Thorsten Leemhuis) wrote:
> [TLDR: This mail is primarily relevant for Linux kernel regression
> tracking. See link in footer if these mails annoy you.]
>
> On 19.09.23 08:53, Bagas Sanjaya wrote:
>> Another user has confirmed the fix (see Bugzilla), so:
>>
>> #regzbot fix: https://lore.kernel.org/linux-scsi/[email protected]/
>
> Bagas, FWIW, using "#regzbot fix" is not supported (maybe it should, but
> I have other priorities currently), hence let me fix this up:
>
> #regzbot fix: scsi: Do no try to probe for CDL on old drives
> #regzbot monitor:
> https://lore.kernel.org/linux-scsi/[email protected]/
> #regzbot ignore-activity
>

Duh! I should have read the regzbot docs first. Thanks anyway.

--
An old man doll... just what I always wanted! - Clara

2023-09-19 18:13:36

by Bagas Sanjaya

Subject: Re: Fwd: Marvell RAID Controller issues since 6.5.x

On Mon, Sep 18, 2023 at 02:56:16PM +0700, Bagas Sanjaya wrote:
> On Mon, Sep 18, 2023 at 07:34:50AM +0000, Niklas Cassel wrote:
> > On Mon, Sep 18, 2023 at 07:18:28AM +0700, Bagas Sanjaya wrote:
> >
> > Hello Bagas,
> >
> > This is a duplicate of:
> > https://bugzilla.kernel.org/show_bug.cgi?id=217902
> >
> > Problem is solved by:
> > https://lore.kernel.org/linux-scsi/[email protected]/
> >
> >
>
> I have asked the reporter on Bugzilla to check the fix above. When he
> reports back successfully, I'll mark this report as fixed.
>

Another user has confirmed the fix (see Bugzilla), so:

#regzbot fix: https://lore.kernel.org/linux-scsi/[email protected]/

Thanks.

--
An old man doll... just what I always wanted! - Clara

