Hello!
I upgraded machine with NVIDIA SATA controller (nforce4 chipse) from
Debian Stretch to Debian Buster and SATA disks started to have problems.
I booted back to Debian Stretch kernel version (having userspace
untouched in Buster) and everything was like before, so problem is 100%
kernel related. Problematic is APM support (it does not work at all),
HPA support (kernel show warnings at boot time) and whole booting is
delayed by 10 seconds. Also broken is disk speed test.
SATA controller is using sata_nv.ko kernel driver and in lspci is
identified as:
00:07.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0054] (rev f3)
00:08.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0055] (rev f3)
Debian Stretch has kernel version (which is working fine):
4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64
Debian Buster has kernel version (which is problematic):
4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64
So kernel regression happened somewhere between 4.9 and 4.19 versions.
APM on Stretch:
$ sudo hdparm -B /dev/sda
/dev/sda:
APM_level = not supported
$ sudo hdparm -B /dev/sdb
/dev/sdb:
APM_level = off
APM on Buster:
$ sudo hdparm -B /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
APM_level = not supported
$ sudo hdparm -B /dev/sdb
/dev/sdb:
SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
APM_level = not supported
/dev/sda does not support APM, but /dev/sdb supports. I do not
understand what above SG_IO error means, but because it works fine on
older kernel version, it is not hardware problem.
Disk speed test on Stretch:
$ sudo hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 118 MB in 2.00 seconds = 58.91 MB/sec
Timing buffered disk reads: 116 MB in 3.09 seconds = 37.54 MB/sec
$ sudo hdparm -Tt /dev/sdb
/dev/sdb:
Timing cached reads: 1242 MB in 2.00 seconds = 620.93 MB/sec
Timing buffered disk reads: 388 MB in 3.00 seconds = 129.31 MB/sec
Disk speed test on Buster:
$ sudo hdparm -Tt /dev/sda
/dev/sda:
read() hit EOF - device too small
SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Timing buffered disk reads: read() hit EOF - device too small
$ sudo hdparm -Tt /dev/sdb
/dev/sdb:
read() hit EOF - device too small
SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Timing buffered disk reads: read() hit EOF - device too small
As can be seen disk speed test is completely broken on new kernel
version and hdparm returns same error as for APM.
dmesg output on Stretch:
[ 1.716970] sata_nv 0000:00:07.0: version 3.5
[ 1.717309] sata_nv 0000:00:07.0: Using ADMA mode
[ 1.717358] sata_nv 0000:00:07.0: Using MSI
[ 1.717810] scsi host0: sata_nv
[ 1.717954] scsi host1: sata_nv
[ 1.718016] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
[ 1.718024] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
[ 1.718308] sata_nv 0000:00:08.0: Using ADMA mode
[ 1.718345] sata_nv 0000:00:08.0: Using MSI
[ 1.718757] scsi host2: sata_nv
[ 1.718886] scsi host3: sata_nv
[ 2.192111] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.194691] ata1.00: HPA detected: current 976771055, native 976773168
[ 2.194701] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
[ 2.194709] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 2.199241] ata1.00: configured for UDMA/133
[ 2.199501] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
[ 2.708030] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.710442] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
[ 2.710455] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 2.715066] ata2.00: configured for UDMA/133
[ 2.715333] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
dmesg output on Buster:
[ 2.079293] sata_nv 0000:00:07.0: version 3.5
[ 2.133503] sata_nv 0000:00:07.0: Using ADMA mode
[ 2.137138] sata_nv 0000:00:07.0: Using MSI
[ 2.142043] scsi host0: sata_nv
[ 2.174745] scsi host2: sata_nv
[ 2.178329] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
[ 2.181675] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
[ 2.188680] sata_nv 0000:00:08.0: Using ADMA mode
[ 2.215676] sata_nv 0000:00:08.0: Using MSI
[ 2.219649] scsi host4: sata_nv
[ 2.226626] scsi host5: sata_nv
[ 2.657732] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 7.773692] ata1.00: qc timeout (cmd 0x27)
[ 7.773738] ata1.00: failed to read native max address (err_mask=0x4)
[ 7.773785] ata1.00: HPA support seems broken, skipping HPA handling
[ 8.245678] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 8.248009] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
[ 8.248065] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 32)
[ 8.252593] ata1.00: configured for UDMA/133
[ 8.252964] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
[ 8.725693] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 13.917688] ata2.00: qc timeout (cmd 0x27)
[ 13.920096] ata2.00: failed to read native max address (err_mask=0x4)
[ 13.922491] ata2.00: HPA support seems broken, skipping HPA handling
[ 14.393683] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 14.398360] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
[ 14.400813] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
[ 14.407722] ata2.00: configured for UDMA/133
[ 14.412939] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
As can be seen new kernel has problems with handling of both SATA
controllers and disks HPA area. Plus before kernel prints
"qc timeout (cmd 0x27)" there is nothing on output, seems that kernel
waits until 5s timeout occur and it slow down booting by 10s.
Do you have any idea what is happening there? What those SG_IO errors
or dmesg errors means?
I'm CCing all people who touched sata_nv.c file between 4.9 and 4.19
versions, so maybe somebody would know anything about this problem.
If you need more information or other outputs, please let me know and I
can provide it.
--
Pali Rohár
[email protected]
On Tuesday 24 December 2019 18:35:29 Pali Rohár wrote:
> Hello!
>
> I upgraded machine with NVIDIA SATA controller (nforce4 chipse) from
> Debian Stretch to Debian Buster and SATA disks started to have problems.
> I booted back to Debian Stretch kernel version (having userspace
> untouched in Buster) and everything was like before, so problem is 100%
> kernel related. Problematic is APM support (it does not work at all),
> HPA support (kernel show warnings at boot time) and whole booting is
> delayed by 10 seconds. Also broken is disk speed test.
>
> SATA controller is using sata_nv.ko kernel driver and in lspci is
> identified as:
>
> 00:07.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0054] (rev f3)
> 00:08.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0055] (rev f3)
>
> Debian Stretch has kernel version (which is working fine):
>
> 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64
>
> Debian Buster has kernel version (which is problematic):
>
> 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64
>
> So kernel regression happened somewhere between 4.9 and 4.19 versions.
>
> APM on Stretch:
>
> $ sudo hdparm -B /dev/sda
>
> /dev/sda:
> APM_level = not supported
>
> $ sudo hdparm -B /dev/sdb
>
> /dev/sdb:
> APM_level = off
>
> APM on Buster:
>
> $ sudo hdparm -B /dev/sda
>
> /dev/sda:
> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> APM_level = not supported
>
>
> $ sudo hdparm -B /dev/sdb
>
> /dev/sdb:
> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> APM_level = not supported
>
> /dev/sda does not support APM, but /dev/sdb supports. I do not
> understand what above SG_IO error means, but because it works fine on
> older kernel version, it is not hardware problem.
>
> Disk speed test on Stretch:
>
> $ sudo hdparm -Tt /dev/sda
>
> /dev/sda:
> Timing cached reads: 118 MB in 2.00 seconds = 58.91 MB/sec
> Timing buffered disk reads: 116 MB in 3.09 seconds = 37.54 MB/sec
>
> $ sudo hdparm -Tt /dev/sdb
>
> /dev/sdb:
> Timing cached reads: 1242 MB in 2.00 seconds = 620.93 MB/sec
> Timing buffered disk reads: 388 MB in 3.00 seconds = 129.31 MB/sec
>
> Disk speed test on Buster:
>
> $ sudo hdparm -Tt /dev/sda
>
> /dev/sda:
> read() hit EOF - device too small
> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Timing buffered disk reads: read() hit EOF - device too small
>
> $ sudo hdparm -Tt /dev/sdb
>
> /dev/sdb:
> read() hit EOF - device too small
> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Timing buffered disk reads: read() hit EOF - device too small
>
> As can be seen disk speed test is completely broken on new kernel
> version and hdparm returns same error as for APM.
>
> dmesg output on Stretch:
>
> [ 1.716970] sata_nv 0000:00:07.0: version 3.5
> [ 1.717309] sata_nv 0000:00:07.0: Using ADMA mode
> [ 1.717358] sata_nv 0000:00:07.0: Using MSI
> [ 1.717810] scsi host0: sata_nv
> [ 1.717954] scsi host1: sata_nv
> [ 1.718016] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
> [ 1.718024] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
> [ 1.718308] sata_nv 0000:00:08.0: Using ADMA mode
> [ 1.718345] sata_nv 0000:00:08.0: Using MSI
> [ 1.718757] scsi host2: sata_nv
> [ 1.718886] scsi host3: sata_nv
> [ 2.192111] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 2.194691] ata1.00: HPA detected: current 976771055, native 976773168
> [ 2.194701] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
> [ 2.194709] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 31/32)
> [ 2.199241] ata1.00: configured for UDMA/133
> [ 2.199501] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> [ 2.708030] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 2.710442] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> [ 2.710455] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
> [ 2.715066] ata2.00: configured for UDMA/133
> [ 2.715333] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>
> dmesg output on Buster:
>
> [ 2.079293] sata_nv 0000:00:07.0: version 3.5
> [ 2.133503] sata_nv 0000:00:07.0: Using ADMA mode
> [ 2.137138] sata_nv 0000:00:07.0: Using MSI
> [ 2.142043] scsi host0: sata_nv
> [ 2.174745] scsi host2: sata_nv
> [ 2.178329] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
> [ 2.181675] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
> [ 2.188680] sata_nv 0000:00:08.0: Using ADMA mode
> [ 2.215676] sata_nv 0000:00:08.0: Using MSI
> [ 2.219649] scsi host4: sata_nv
> [ 2.226626] scsi host5: sata_nv
> [ 2.657732] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 7.773692] ata1.00: qc timeout (cmd 0x27)
> [ 7.773738] ata1.00: failed to read native max address (err_mask=0x4)
> [ 7.773785] ata1.00: HPA support seems broken, skipping HPA handling
> [ 8.245678] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 8.248009] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
> [ 8.248065] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 32)
> [ 8.252593] ata1.00: configured for UDMA/133
> [ 8.252964] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> [ 8.725693] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 13.917688] ata2.00: qc timeout (cmd 0x27)
> [ 13.920096] ata2.00: failed to read native max address (err_mask=0x4)
> [ 13.922491] ata2.00: HPA support seems broken, skipping HPA handling
> [ 14.393683] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 14.398360] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> [ 14.400813] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
> [ 14.407722] ata2.00: configured for UDMA/133
> [ 14.412939] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>
> As can be seen new kernel has problems with handling of both SATA
> controllers and disks HPA area. Plus before kernel prints
> "qc timeout (cmd 0x27)" there is nothing on output, seems that kernel
> waits until 5s timeout occur and it slow down booting by 10s.
>
> Do you have any idea what is happening there? What those SG_IO errors
> or dmesg errors means?
>
> I'm CCing all people who touched sata_nv.c file between 4.9 and 4.19
> versions, so maybe somebody would know anything about this problem.
>
> If you need more information or other outputs, please let me know and I
> can provide it.
Now I tested also versions 4.11, 4.12, 4.13, 4.14, 4.15, 4.16, 4.17 and
4.18. And problem appeared only in 4.18 (all previous versions work
fine) In 4.18 dmesg is:
[ 8.596039] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 8.598489] ata2: illegal qc_active transition (100000000->100000001)
[ 13.792086] ata2.00: qc timeout (cmd 0x27)
[ 13.792122] ata2.00: failed to read native max address (err_mask=0x4)
[ 13.792167] ata2.00: HPA support seems broken, skipping HPA handling
[ 14.264041] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 14.268756] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
[ 14.271230] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
[ 14.278161] ata2.00: configured for UDMA/133
[ 14.283427] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
(There is another line "illegal qc_active transition" which is not
present in 4.19)
So this problem must have been introduced during 4.18 release cycle as
4.17 version is working fine.
--
Pali Rohár
[email protected]
Hi Jens, can you please try to look at this problem?
On Wednesday 25 December 2019 12:05:31 Pali Rohár wrote:
> On Tuesday 24 December 2019 18:35:29 Pali Rohár wrote:
> > Hello!
> >
> > I upgraded machine with NVIDIA SATA controller (nforce4 chipse) from
> > Debian Stretch to Debian Buster and SATA disks started to have problems.
> > I booted back to Debian Stretch kernel version (having userspace
> > untouched in Buster) and everything was like before, so problem is 100%
> > kernel related. Problematic is APM support (it does not work at all),
> > HPA support (kernel show warnings at boot time) and whole booting is
> > delayed by 10 seconds. Also broken is disk speed test.
> >
> > SATA controller is using sata_nv.ko kernel driver and in lspci is
> > identified as:
> >
> > 00:07.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0054] (rev f3)
> > 00:08.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0055] (rev f3)
> >
> > Debian Stretch has kernel version (which is working fine):
> >
> > 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64
> >
> > Debian Buster has kernel version (which is problematic):
> >
> > 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64
> >
> > So kernel regression happened somewhere between 4.9 and 4.19 versions.
> >
> > APM on Stretch:
> >
> > $ sudo hdparm -B /dev/sda
> >
> > /dev/sda:
> > APM_level = not supported
> >
> > $ sudo hdparm -B /dev/sdb
> >
> > /dev/sdb:
> > APM_level = off
> >
> > APM on Buster:
> >
> > $ sudo hdparm -B /dev/sda
> >
> > /dev/sda:
> > SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > APM_level = not supported
> >
> >
> > $ sudo hdparm -B /dev/sdb
> >
> > /dev/sdb:
> > SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > APM_level = not supported
> >
> > /dev/sda does not support APM, but /dev/sdb supports. I do not
> > understand what above SG_IO error means, but because it works fine on
> > older kernel version, it is not hardware problem.
> >
> > Disk speed test on Stretch:
> >
> > $ sudo hdparm -Tt /dev/sda
> >
> > /dev/sda:
> > Timing cached reads: 118 MB in 2.00 seconds = 58.91 MB/sec
> > Timing buffered disk reads: 116 MB in 3.09 seconds = 37.54 MB/sec
> >
> > $ sudo hdparm -Tt /dev/sdb
> >
> > /dev/sdb:
> > Timing cached reads: 1242 MB in 2.00 seconds = 620.93 MB/sec
> > Timing buffered disk reads: 388 MB in 3.00 seconds = 129.31 MB/sec
> >
> > Disk speed test on Buster:
> >
> > $ sudo hdparm -Tt /dev/sda
> >
> > /dev/sda:
> > read() hit EOF - device too small
> > SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > Timing buffered disk reads: read() hit EOF - device too small
> >
> > $ sudo hdparm -Tt /dev/sdb
> >
> > /dev/sdb:
> > read() hit EOF - device too small
> > SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > Timing buffered disk reads: read() hit EOF - device too small
> >
> > As can be seen disk speed test is completely broken on new kernel
> > version and hdparm returns same error as for APM.
> >
> > dmesg output on Stretch:
> >
> > [ 1.716970] sata_nv 0000:00:07.0: version 3.5
> > [ 1.717309] sata_nv 0000:00:07.0: Using ADMA mode
> > [ 1.717358] sata_nv 0000:00:07.0: Using MSI
> > [ 1.717810] scsi host0: sata_nv
> > [ 1.717954] scsi host1: sata_nv
> > [ 1.718016] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
> > [ 1.718024] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
> > [ 1.718308] sata_nv 0000:00:08.0: Using ADMA mode
> > [ 1.718345] sata_nv 0000:00:08.0: Using MSI
> > [ 1.718757] scsi host2: sata_nv
> > [ 1.718886] scsi host3: sata_nv
> > [ 2.192111] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [ 2.194691] ata1.00: HPA detected: current 976771055, native 976773168
> > [ 2.194701] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
> > [ 2.194709] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 31/32)
> > [ 2.199241] ata1.00: configured for UDMA/133
> > [ 2.199501] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> > [ 2.708030] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [ 2.710442] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> > [ 2.710455] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
> > [ 2.715066] ata2.00: configured for UDMA/133
> > [ 2.715333] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >
> > dmesg output on Buster:
> >
> > [ 2.079293] sata_nv 0000:00:07.0: version 3.5
> > [ 2.133503] sata_nv 0000:00:07.0: Using ADMA mode
> > [ 2.137138] sata_nv 0000:00:07.0: Using MSI
> > [ 2.142043] scsi host0: sata_nv
> > [ 2.174745] scsi host2: sata_nv
> > [ 2.178329] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
> > [ 2.181675] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
> > [ 2.188680] sata_nv 0000:00:08.0: Using ADMA mode
> > [ 2.215676] sata_nv 0000:00:08.0: Using MSI
> > [ 2.219649] scsi host4: sata_nv
> > [ 2.226626] scsi host5: sata_nv
> > [ 2.657732] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [ 7.773692] ata1.00: qc timeout (cmd 0x27)
> > [ 7.773738] ata1.00: failed to read native max address (err_mask=0x4)
> > [ 7.773785] ata1.00: HPA support seems broken, skipping HPA handling
> > [ 8.245678] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [ 8.248009] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
> > [ 8.248065] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 32)
> > [ 8.252593] ata1.00: configured for UDMA/133
> > [ 8.252964] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> > [ 8.725693] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [ 13.917688] ata2.00: qc timeout (cmd 0x27)
> > [ 13.920096] ata2.00: failed to read native max address (err_mask=0x4)
> > [ 13.922491] ata2.00: HPA support seems broken, skipping HPA handling
> > [ 14.393683] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [ 14.398360] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> > [ 14.400813] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
> > [ 14.407722] ata2.00: configured for UDMA/133
> > [ 14.412939] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >
> > As can be seen new kernel has problems with handling of both SATA
> > controllers and disks HPA area. Plus before kernel prints
> > "qc timeout (cmd 0x27)" there is nothing on output, seems that kernel
> > waits until 5s timeout occur and it slow down booting by 10s.
> >
> > Do you have any idea what is happening there? What those SG_IO errors
> > or dmesg errors means?
> >
> > I'm CCing all people who touched sata_nv.c file between 4.9 and 4.19
> > versions, so maybe somebody would know anything about this problem.
> >
> > If you need more information or other outputs, please let me know and I
> > can provide it.
>
> Now I tested also versions 4.11, 4.12, 4.13, 4.14, 4.15, 4.16, 4.17 and
> 4.18. And problem appeared only in 4.18 (all previous versions work
> fine) In 4.18 dmesg is:
>
> [ 8.596039] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 8.598489] ata2: illegal qc_active transition (100000000->100000001)
> [ 13.792086] ata2.00: qc timeout (cmd 0x27)
> [ 13.792122] ata2.00: failed to read native max address (err_mask=0x4)
> [ 13.792167] ata2.00: HPA support seems broken, skipping HPA handling
> [ 14.264041] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 14.268756] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> [ 14.271230] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
> [ 14.278161] ata2.00: configured for UDMA/133
> [ 14.283427] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>
> (There is another line "illegal qc_active transition" which is not
> present in 4.19)
>
> So this problem must have been introduced during 4.18 release cycle as
> 4.17 version is working fine.
I tried to git bisect this problem between 4.17 and 4.18. I used
following command to filter relevant libata.ko and sata_nv.ko modules:
$ git bisect start v4.18 v4.17 -- drivers/ata/libata* drivers/ata/sata_nv.c
And here are results:
sata_nv.ko and libata.ko compiled from commit 804689 (libata: Fix
command retry decision) for 4.17 kernel are working fine.
sata_nv.ko and libata.ko compiled from commit e3ed89 (libata: bump
->qc_active to a 64-bit type) for 4.18 kernel are broken.
So problem seems to be somehow related with introduction of hardware
tags done by Jens Axboe.
--
Pali Rohár
[email protected]
On 12/25/19 7:01 AM, Pali Rohár wrote:
> Hi Jens, can you please try to look at this problem?
>
> On Wednesday 25 December 2019 12:05:31 Pali Rohár wrote:
>> On Tuesday 24 December 2019 18:35:29 Pali Rohár wrote:
>>> Hello!
>>>
>>> I upgraded machine with NVIDIA SATA controller (nforce4 chipse) from
>>> Debian Stretch to Debian Buster and SATA disks started to have problems.
>>> I booted back to Debian Stretch kernel version (having userspace
>>> untouched in Buster) and everything was like before, so problem is 100%
>>> kernel related. Problematic is APM support (it does not work at all),
>>> HPA support (kernel show warnings at boot time) and whole booting is
>>> delayed by 10 seconds. Also broken is disk speed test.
>>>
>>> SATA controller is using sata_nv.ko kernel driver and in lspci is
>>> identified as:
>>>
>>> 00:07.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0054] (rev f3)
>>> 00:08.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0055] (rev f3)
>>>
>>> Debian Stretch has kernel version (which is working fine):
>>>
>>> 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64
>>>
>>> Debian Buster has kernel version (which is problematic):
>>>
>>> 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64
>>>
>>> So kernel regression happened somewhere between 4.9 and 4.19 versions.
>>>
>>> APM on Stretch:
>>>
>>> $ sudo hdparm -B /dev/sda
>>>
>>> /dev/sda:
>>> APM_level = not supported
>>>
>>> $ sudo hdparm -B /dev/sdb
>>>
>>> /dev/sdb:
>>> APM_level = off
>>>
>>> APM on Buster:
>>>
>>> $ sudo hdparm -B /dev/sda
>>>
>>> /dev/sda:
>>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> APM_level = not supported
>>>
>>>
>>> $ sudo hdparm -B /dev/sdb
>>>
>>> /dev/sdb:
>>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> APM_level = not supported
>>>
>>> /dev/sda does not support APM, but /dev/sdb supports. I do not
>>> understand what above SG_IO error means, but because it works fine on
>>> older kernel version, it is not hardware problem.
>>>
>>> Disk speed test on Stretch:
>>>
>>> $ sudo hdparm -Tt /dev/sda
>>>
>>> /dev/sda:
>>> Timing cached reads: 118 MB in 2.00 seconds = 58.91 MB/sec
>>> Timing buffered disk reads: 116 MB in 3.09 seconds = 37.54 MB/sec
>>>
>>> $ sudo hdparm -Tt /dev/sdb
>>>
>>> /dev/sdb:
>>> Timing cached reads: 1242 MB in 2.00 seconds = 620.93 MB/sec
>>> Timing buffered disk reads: 388 MB in 3.00 seconds = 129.31 MB/sec
>>>
>>> Disk speed test on Buster:
>>>
>>> $ sudo hdparm -Tt /dev/sda
>>>
>>> /dev/sda:
>>> read() hit EOF - device too small
>>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> Timing buffered disk reads: read() hit EOF - device too small
>>>
>>> $ sudo hdparm -Tt /dev/sdb
>>>
>>> /dev/sdb:
>>> read() hit EOF - device too small
>>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> Timing buffered disk reads: read() hit EOF - device too small
>>>
>>> As can be seen disk speed test is completely broken on new kernel
>>> version and hdparm returns same error as for APM.
>>>
>>> dmesg output on Stretch:
>>>
>>> [ 1.716970] sata_nv 0000:00:07.0: version 3.5
>>> [ 1.717309] sata_nv 0000:00:07.0: Using ADMA mode
>>> [ 1.717358] sata_nv 0000:00:07.0: Using MSI
>>> [ 1.717810] scsi host0: sata_nv
>>> [ 1.717954] scsi host1: sata_nv
>>> [ 1.718016] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
>>> [ 1.718024] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
>>> [ 1.718308] sata_nv 0000:00:08.0: Using ADMA mode
>>> [ 1.718345] sata_nv 0000:00:08.0: Using MSI
>>> [ 1.718757] scsi host2: sata_nv
>>> [ 1.718886] scsi host3: sata_nv
>>> [ 2.192111] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [ 2.194691] ata1.00: HPA detected: current 976771055, native 976773168
>>> [ 2.194701] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
>>> [ 2.194709] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 31/32)
>>> [ 2.199241] ata1.00: configured for UDMA/133
>>> [ 2.199501] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>>> [ 2.708030] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [ 2.710442] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
>>> [ 2.710455] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
>>> [ 2.715066] ata2.00: configured for UDMA/133
>>> [ 2.715333] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>>>
>>> dmesg output on Buster:
>>>
>>> [ 2.079293] sata_nv 0000:00:07.0: version 3.5
>>> [ 2.133503] sata_nv 0000:00:07.0: Using ADMA mode
>>> [ 2.137138] sata_nv 0000:00:07.0: Using MSI
>>> [ 2.142043] scsi host0: sata_nv
>>> [ 2.174745] scsi host2: sata_nv
>>> [ 2.178329] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
>>> [ 2.181675] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
>>> [ 2.188680] sata_nv 0000:00:08.0: Using ADMA mode
>>> [ 2.215676] sata_nv 0000:00:08.0: Using MSI
>>> [ 2.219649] scsi host4: sata_nv
>>> [ 2.226626] scsi host5: sata_nv
>>> [ 2.657732] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [ 7.773692] ata1.00: qc timeout (cmd 0x27)
>>> [ 7.773738] ata1.00: failed to read native max address (err_mask=0x4)
>>> [ 7.773785] ata1.00: HPA support seems broken, skipping HPA handling
>>> [ 8.245678] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [ 8.248009] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
>>> [ 8.248065] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 32)
>>> [ 8.252593] ata1.00: configured for UDMA/133
>>> [ 8.252964] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>>> [ 8.725693] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [ 13.917688] ata2.00: qc timeout (cmd 0x27)
>>> [ 13.920096] ata2.00: failed to read native max address (err_mask=0x4)
>>> [ 13.922491] ata2.00: HPA support seems broken, skipping HPA handling
>>> [ 14.393683] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [ 14.398360] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
>>> [ 14.400813] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
>>> [ 14.407722] ata2.00: configured for UDMA/133
>>> [ 14.412939] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>>>
>>> As can be seen new kernel has problems with handling of both SATA
>>> controllers and disks HPA area. Plus before kernel prints
>>> "qc timeout (cmd 0x27)" there is nothing on output, seems that kernel
>>> waits until 5s timeout occur and it slow down booting by 10s.
>>>
>>> Do you have any idea what is happening there? What those SG_IO errors
>>> or dmesg errors means?
>>>
>>> I'm CCing all people who touched sata_nv.c file between 4.9 and 4.19
>>> versions, so maybe somebody would know anything about this problem.
>>>
>>> If you need more information or other outputs, please let me know and I
>>> can provide it.
>>
>> Now I tested also versions 4.11, 4.12, 4.13, 4.14, 4.15, 4.16, 4.17 and
>> 4.18. And problem appeared only in 4.18 (all previous versions work
>> fine) In 4.18 dmesg is:
>>
>> [ 8.596039] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> [ 8.598489] ata2: illegal qc_active transition (100000000->100000001)
>> [ 13.792086] ata2.00: qc timeout (cmd 0x27)
>> [ 13.792122] ata2.00: failed to read native max address (err_mask=0x4)
>> [ 13.792167] ata2.00: HPA support seems broken, skipping HPA handling
>> [ 14.264041] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> [ 14.268756] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
>> [ 14.271230] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
>> [ 14.278161] ata2.00: configured for UDMA/133
>> [ 14.283427] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
>>
>> (There is another line "illegal qc_active transition" which is not
>> present in 4.19)
>>
>> So this problem must have been introduced during 4.18 release cycle as
>> 4.17 version is working fine.
>
> I tried to git bisect this problem between 4.17 and 4.18. I used
> following command to filter relevant libata.ko and sata_nv.ko modules:
>
> $ git bisect start v4.18 v4.17 -- drivers/ata/libata* drivers/ata/sata_nv.c
>
> And here are results:
>
> sata_nv.ko and libata.ko compiled from commit 804689 (libata: Fix
> command retry decision) for 4.17 kernel are working fine.
>
> sata_nv.ko and libata.ko compiled from commit e3ed89 (libata: bump
> ->qc_active to a 64-bit type) for 4.18 kernel are broken.
>
> So problem seems to be somehow related with introduction of hardware
> tags done by Jens Axboe.
Can you try with this patch:
https://lore.kernel.org/linux-ide/[email protected]/T/#u
--
Jens Axboe
On Wednesday 25 December 2019 08:00:05 Jens Axboe wrote:
> On 12/25/19 7:01 AM, Pali Rohár wrote:
> > Hi Jens, can you please try to look at this problem?
> >
> > On Wednesday 25 December 2019 12:05:31 Pali Rohár wrote:
> >> On Tuesday 24 December 2019 18:35:29 Pali Rohár wrote:
> >>> Hello!
> >>>
> >>> I upgraded machine with NVIDIA SATA controller (nforce4 chipse) from
> >>> Debian Stretch to Debian Buster and SATA disks started to have problems.
> >>> I booted back to Debian Stretch kernel version (having userspace
> >>> untouched in Buster) and everything was like before, so problem is 100%
> >>> kernel related. Problematic is APM support (it does not work at all),
> >>> HPA support (kernel show warnings at boot time) and whole booting is
> >>> delayed by 10 seconds. Also broken is disk speed test.
> >>>
> >>> SATA controller is using sata_nv.ko kernel driver and in lspci is
> >>> identified as:
> >>>
> >>> 00:07.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0054] (rev f3)
> >>> 00:08.0 IDE interface [0101]: NVIDIA Corporation CK804 Serial ATA Controller [10de:0055] (rev f3)
> >>>
> >>> Debian Stretch has kernel version (which is working fine):
> >>>
> >>> 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64
> >>>
> >>> Debian Buster has kernel version (which is problematic):
> >>>
> >>> 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64
> >>>
> >>> So kernel regression happened somewhere between 4.9 and 4.19 versions.
> >>>
> >>> APM on Stretch:
> >>>
> >>> $ sudo hdparm -B /dev/sda
> >>>
> >>> /dev/sda:
> >>> APM_level = not supported
> >>>
> >>> $ sudo hdparm -B /dev/sdb
> >>>
> >>> /dev/sdb:
> >>> APM_level = off
> >>>
> >>> APM on Buster:
> >>>
> >>> $ sudo hdparm -B /dev/sda
> >>>
> >>> /dev/sda:
> >>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> APM_level = not supported
> >>>
> >>>
> >>> $ sudo hdparm -B /dev/sdb
> >>>
> >>> /dev/sdb:
> >>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> APM_level = not supported
> >>>
> >>> /dev/sda does not support APM, but /dev/sdb supports. I do not
> >>> understand what above SG_IO error means, but because it works fine on
> >>> older kernel version, it is not hardware problem.
> >>>
> >>> Disk speed test on Stretch:
> >>>
> >>> $ sudo hdparm -Tt /dev/sda
> >>>
> >>> /dev/sda:
> >>> Timing cached reads: 118 MB in 2.00 seconds = 58.91 MB/sec
> >>> Timing buffered disk reads: 116 MB in 3.09 seconds = 37.54 MB/sec
> >>>
> >>> $ sudo hdparm -Tt /dev/sdb
> >>>
> >>> /dev/sdb:
> >>> Timing cached reads: 1242 MB in 2.00 seconds = 620.93 MB/sec
> >>> Timing buffered disk reads: 388 MB in 3.00 seconds = 129.31 MB/sec
> >>>
> >>> Disk speed test on Buster:
> >>>
> >>> $ sudo hdparm -Tt /dev/sda
> >>>
> >>> /dev/sda:
> >>> read() hit EOF - device too small
> >>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> Timing buffered disk reads: read() hit EOF - device too small
> >>>
> >>> $ sudo hdparm -Tt /dev/sdb
> >>>
> >>> /dev/sdb:
> >>> read() hit EOF - device too small
> >>> SG_IO: bad/missing sense data, sb[]: f0 00 05 00 00 00 00 0a 00 aa 55 40 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> Timing buffered disk reads: read() hit EOF - device too small
> >>>
> >>> As can be seen disk speed test is completely broken on new kernel
> >>> version and hdparm returns same error as for APM.
> >>>
> >>> dmesg output on Stretch:
> >>>
> >>> [ 1.716970] sata_nv 0000:00:07.0: version 3.5
> >>> [ 1.717309] sata_nv 0000:00:07.0: Using ADMA mode
> >>> [ 1.717358] sata_nv 0000:00:07.0: Using MSI
> >>> [ 1.717810] scsi host0: sata_nv
> >>> [ 1.717954] scsi host1: sata_nv
> >>> [ 1.718016] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
> >>> [ 1.718024] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
> >>> [ 1.718308] sata_nv 0000:00:08.0: Using ADMA mode
> >>> [ 1.718345] sata_nv 0000:00:08.0: Using MSI
> >>> [ 1.718757] scsi host2: sata_nv
> >>> [ 1.718886] scsi host3: sata_nv
> >>> [ 2.192111] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 2.194691] ata1.00: HPA detected: current 976771055, native 976773168
> >>> [ 2.194701] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
> >>> [ 2.194709] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 31/32)
> >>> [ 2.199241] ata1.00: configured for UDMA/133
> >>> [ 2.199501] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>> [ 2.708030] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 2.710442] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> >>> [ 2.710455] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
> >>> [ 2.715066] ata2.00: configured for UDMA/133
> >>> [ 2.715333] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>>
> >>> dmesg output on Buster:
> >>>
> >>> [ 2.079293] sata_nv 0000:00:07.0: version 3.5
> >>> [ 2.133503] sata_nv 0000:00:07.0: Using ADMA mode
> >>> [ 2.137138] sata_nv 0000:00:07.0: Using MSI
> >>> [ 2.142043] scsi host0: sata_nv
> >>> [ 2.174745] scsi host2: sata_nv
> >>> [ 2.178329] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xd000 irq 20
> >>> [ 2.181675] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xd008 irq 20
> >>> [ 2.188680] sata_nv 0000:00:08.0: Using ADMA mode
> >>> [ 2.215676] sata_nv 0000:00:08.0: Using MSI
> >>> [ 2.219649] scsi host4: sata_nv
> >>> [ 2.226626] scsi host5: sata_nv
> >>> [ 2.657732] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 7.773692] ata1.00: qc timeout (cmd 0x27)
> >>> [ 7.773738] ata1.00: failed to read native max address (err_mask=0x4)
> >>> [ 7.773785] ata1.00: HPA support seems broken, skipping HPA handling
> >>> [ 8.245678] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 8.248009] ata1.00: ATA-8: WDC WD5000AADS-00S9B0, 01.00A01, max UDMA/133
> >>> [ 8.248065] ata1.00: 976771055 sectors, multi 16: LBA48 NCQ (depth 32)
> >>> [ 8.252593] ata1.00: configured for UDMA/133
> >>> [ 8.252964] ata1: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>> [ 8.725693] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 13.917688] ata2.00: qc timeout (cmd 0x27)
> >>> [ 13.920096] ata2.00: failed to read native max address (err_mask=0x4)
> >>> [ 13.922491] ata2.00: HPA support seems broken, skipping HPA handling
> >>> [ 14.393683] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >>> [ 14.398360] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> >>> [ 14.400813] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
> >>> [ 14.407722] ata2.00: configured for UDMA/133
> >>> [ 14.412939] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>>
> >>> As can be seen new kernel has problems with handling of both SATA
> >>> controllers and disks HPA area. Plus before kernel prints
> >>> "qc timeout (cmd 0x27)" there is nothing on output, seems that kernel
> >>> waits until 5s timeout occur and it slow down booting by 10s.
> >>>
> >>> Do you have any idea what is happening there? What those SG_IO errors
> >>> or dmesg errors means?
> >>>
> >>> I'm CCing all people who touched sata_nv.c file between 4.9 and 4.19
> >>> versions, so maybe somebody would know anything about this problem.
> >>>
> >>> If you need more information or other outputs, please let me know and I
> >>> can provide it.
> >>
> >> Now I tested also versions 4.11, 4.12, 4.13, 4.14, 4.15, 4.16, 4.17 and
> >> 4.18. And problem appeared only in 4.18 (all previous versions work
> >> fine) In 4.18 dmesg is:
> >>
> >> [ 8.596039] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >> [ 8.598489] ata2: illegal qc_active transition (100000000->100000001)
> >> [ 13.792086] ata2.00: qc timeout (cmd 0x27)
> >> [ 13.792122] ata2.00: failed to read native max address (err_mask=0x4)
> >> [ 13.792167] ata2.00: HPA support seems broken, skipping HPA handling
> >> [ 14.264041] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >> [ 14.268756] ata2.00: ATA-8: TOSHIBA HDWD110, MS2OA8J0, max UDMA/133
> >> [ 14.271230] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32)
> >> [ 14.278161] ata2.00: configured for UDMA/133
> >> [ 14.283427] ata2: DMA mask 0xFFFFFFFFFFFFFFFF, segment boundary 0xFFFFFFFF, hw segs 61
> >>
> >> (There is another line "illegal qc_active transition" which is not
> >> present in 4.19)
> >>
> >> So this problem must have been introduced during 4.18 release cycle as
> >> 4.17 version is working fine.
> >
> > I tried to git bisect this problem between 4.17 and 4.18. I used
> > following command to filter relevant libata.ko and sata_nv.ko modules:
> >
> > $ git bisect start v4.18 v4.17 -- drivers/ata/libata* drivers/ata/sata_nv.c
> >
> > And here are results:
> >
> > sata_nv.ko and libata.ko compiled from commit 804689 (libata: Fix
> > command retry decision) for 4.17 kernel are working fine.
> >
> > sata_nv.ko and libata.ko compiled from commit e3ed89 (libata: bump
> > ->qc_active to a 64-bit type) for 4.18 kernel are broken.
> >
> > So problem seems to be somehow related with introduction of hardware
> > tags done by Jens Axboe.
>
> Can you try with this patch:
>
> https://lore.kernel.org/linux-ide/[email protected]/T/#u
>
Great! This patch fixes above problem with sata_nv.ko. Thank you for
quick response.
--
Pali Rohár
[email protected]