2023-01-30 04:05:36

by Christian Kujau

Subject: External USB disks not recognized with v6.1.8 when using Xen

[CC stable as I only tested the stable tree for now]

I'm running a current Alpine Linux with linux-edge-6.1.8-r0 installed on a
Lenovo Thinkpad L540 where an external disk enclosure with two disks is
attached via USB. The Alpine Linux kernel appears to track Linux stable
and is more or less vanilla. Also, the machine boots into Xen 4.17.0 and
then starts a few headless VMs, nothing too exotic here.

But when updating from Linux 6.1.1 to 6.1.8, the disks from the external
enclosure did not show up. Unplug, replug, no dice, and this is 100%
reproducible. dmesg now has these new lines:

+ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
+ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
+xhci_hcd 0000:00:14.0: init 0000:00:14.0 fail, -14
+ioremap error for 0xfed1f000-0xfed20000, requested 0x2, got 0x0
+iTCO_wdt iTCO_wdt.1.auto: ioremap failed for resource [mem 0xfed1f410-0xfed1f414]

I'm not sure if the ioremap error is related here (booted with
early_ioremap_debug but then dmesg was filled with WARNINGS for both
versions, so I disabled it again), but that xhci_hcd error looks
suspicious.

Curiously 6.1.8 works just fine when NOT booted via Xen. I booted into
Xen + vanilla 6.1.8 now and was able to reproduce this issue. Xen +
vanilla 6.1.1 works fine.

From v6.1.1 to v6.1.8 there's only one commit in drivers/xen, but 54
commits in drivers/usb. Compiling takes time because the distribution
kernel has almost everything enabled and I still need to cut down enabled
options to be able to attempt a git bisect in a reasonable time, but I
still wanted to report this, maybe someone has an idea about this.
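
For reference, per-directory commit counts like these can be obtained with git rev-list. This is a minimal sketch, assuming a linux-stable checkout that contains both tags; the helper name count_commits and the GIT override are illustrative only and not part of the original report:

```sh
#!/bin/sh
# count_commits GOOD BAD PATH
# Prints how many commits in the range GOOD..BAD touch PATH.
# GIT may be overridden for a dry run; it defaults to the real git.
count_commits() {
    "${GIT:-git}" rev-list --count "$1..$2" -- "$3"
}

# Example, run inside the kernel tree:
#   count_commits v6.1.1 v6.1.8 drivers/xen
#   count_commits v6.1.1 v6.1.8 drivers/usb
```

Keeping the path as a separate argument makes it easy to check other candidate directories (for example arch/x86) with the same range.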

Full dmesg and lshw outputs: https://nerdbynature.de/bits/usb_v6.1.8/

Thanks,
Christian.

PS: I found this workaround on the interwebs[0] to force the USB ports
of that machine to USB 2.0 and then the missing disks magically appear:

$ lspci -nn | grep -i usb
00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086:8c31] (rev 05) <=== !!!
00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086:8c2d] (rev 05)
00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086:8c26] (rev 05)

$ setpci -H1 -d 8086:8c31 d8.l=0
$ setpci -H1 -d 8086:8c31 d0.l=0

$ dmesg
usb 1-1.3: new full-speed USB device number 3 using ehci-pci
usb 2-1.3: new high-speed USB device number 3 using ehci-pci
usb 1-1.3: New USB device found, idVendor=138a, idProduct=0011, bcdDevice=0.78
usb 1-1.3: New USB device strings: Mfr=0, Product=0, SerialNumber=1
usb 1-1.3: SerialNumber: aa32bf84ed47
usb 1-1.5: new full-speed USB device number 4 using ehci-pci
usb 2-1.3: New USB device found, idVendor=1e91, idProduct=a3a8, bcdDevice=2.07
usb 2-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
usb 2-1.3: Product: Elite Pro Dual
usb 2-1.3: Manufacturer: OWC
usb 2-1.3: SerialNumber: RANDOM__1E359879645F
usb 2-1.3: UAS is ignored for this device, using usb-storage instead
usb-storage 2-1.3:1.0: USB Mass Storage device detected
usb-storage 2-1.3:1.0: Quirks match for vid 1e91 pid a3a8: 800000
scsi host5: usb-storage 2-1.3:1.0
usb 1-1.5: New USB device found, idVendor=8087, idProduct=07dc, bcdDevice=0.01
usb 1-1.5: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Bluetooth: hci0: Legacy ROM 2.5 revision 8.0 build 1 week 45 2013
Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.80.1.2d.d.bseq
usb 1-1.6: new high-speed USB device number 5 using ehci-pci
usb 1-1.6: New USB device found, idVendor=04f2, idProduct=b398, bcdDevice=39.98
usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-1.6: Product: Integrated Camera
usb 1-1.6: Manufacturer: Vimicro corp.
Bluetooth: hci0: Intel BT fw patch 0x2a completed & activated
scsi 5:0:0:0: Direct-Access ElitePro Dual U3FW-1 0207 PQ: 0 ANSI: 6
scsi 5:0:0:1: Direct-Access ElitePro Dual U3FW-2 0207 PQ: 0 ANSI: 6
sd 5:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
sd 5:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
sd 5:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).
sd 5:0:0:1: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
sd 5:0:0:1: [sdd] Write Protect is off
sd 5:0:0:1: [sdd] Mode Sense: 47 00 10 08
sd 5:0:0:0: [sdc] Write Protect is off
sd 5:0:0:0: [sdc] Mode Sense: 47 00 10 08
sd 5:0:0:0: [sdc] No Caching mode page found
sd 5:0:0:0: [sdc] Assuming drive cache: write through
sd 5:0:0:1: [sdd] No Caching mode page found
sd 5:0:0:1: [sdd] Assuming drive cache: write through
sd 5:0:0:0: [sdc] Attached SCSI disk
sd 5:0:0:1: [sdd] Attached SCSI disk

$ lsblk /dev/sd[cd]
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sdc 8:32 0 3.6T 0 disk
sdd 8:48 0 3.6T 0 disk
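
The two setpci writes can be wrapped so that the previous register values are printed first and can be written back later. This is only a sketch: the register names in the comments are an assumption based on the offsets defined in the kernel's drivers/usb/host/pci-quirks.c (0xd0 XUSB2PR, 0xd8 USB3_PSSEN), and force_usb2 and the SETPCI override are made-up helper names. It needs root, like the original commands.

```sh
#!/bin/sh
# force_usb2 VENDOR:DEVICE
# SETPCI may be overridden for a dry run; it defaults to the real setpci.
force_usb2() {
    dev=$1    # e.g. 8086:8c31, the xHCI controller from the lspci output

    # Print the current values so they can be restored later.
    "${SETPCI:-setpci}" -H1 -d "$dev" d8.l d0.l &&
    # 0xd8 (assumed to be USB3_PSSEN): disable SuperSpeed on all ports.
    "${SETPCI:-setpci}" -H1 -d "$dev" d8.l=0 &&
    # 0xd0 (assumed to be XUSB2PR): hand the USB 2.0 ports over to EHCI.
    "${SETPCI:-setpci}" -H1 -d "$dev" d0.l=0
}

# Example:
#   force_usb2 8086:8c31
```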


[0] https://superuser.com/a/875863/218574
--
BOFH excuse #135:

You put the disk in upside down.


2023-01-30 05:17:36

by Greg Kroah-Hartman

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Mon, Jan 30, 2023 at 04:46:02AM +0100, Christian Kujau wrote:
> [CC stable as I only tested the stable tree for now]

The stable list is only for adding new things to stable releases, it's
not going to reach the developers involved in the changes you are having
issues with at all.

Try cc:ing the xen and usb mailing lists instead. And using 'git
bisect', as you said, will help a lot here in tracking down the
issue.

thanks,

greg k-h

2023-01-30 12:02:05

by Thorsten Leemhuis

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 30.01.23 04:46, Christian Kujau wrote:
> [CC stable as I only tested the stable tree for now]
>
> I'm running a current Alpine Linux with linux-edge-6.1.8-r0 installed on a
> Lenovo Thinkpad L540 where an external disk enclosure with two disks is
> attached via USB. The Alpine Linux kernel appears to track Linux stable
> and is more or less vanilla. Also, the machine boots into Xen 4.17.0 and
> then starts a few headless VMs, nothing too exotic here.
>
> But when updating from Linux 6.1.1 to 6.1.8, the disks from the external
> enclosure did not show up. Unplug, replug, no dice, and this is 100%
> reproducible. dmesg now has these new lines:
>
> +ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
> +ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
> +xhci_hcd 0000:00:14.0: init 0000:00:14.0 fail, -14
> +ioremap error for 0xfed1f000-0xfed20000, requested 0x2, got 0x0
> +iTCO_wdt iTCO_wdt.1.auto: ioremap failed for resource [mem 0xfed1f410-0xfed1f414]
>
> I'm not sure if the ioremap error is related here (booted with
> early_ioremap_debug but then dmesg was filled with WARNINGS for both
> versions, so I disabled it again), but that xhci_hcd error looks
> suspicious.
>
> Curiously 6.1.8 works just fine when NOT booted via Xen. I booted into
> Xen + vanilla 6.1.8 now and was able to reproduce this issue. Xen +
> vanilla 6.1.1 works fine.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced v6.1.1..v6.1.8
#regzbot title xen/usb(?): External USB disks not recognized anymore under Xen
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

> From v6.1.1 to v6.1.8 there's only one commit in drivers/xen, but 54
> commits in drivers/usb. Compiling takes time because the distribution
> kernel has almost everything enabled and I still need to cut down enabled
> options to be able to attempt a git bisect in a reasonable time,

FWIW, I'm working on a text for the kernel docs that will use
"localmodconfig" to trim down the configs automatically. Maybe it's
helpful for you, here is a draft:

https://www.leemhuis.info/files/misc/How%20to%20quickly%20build%20a%20Linux%20kernel%20%E2%80%94%20The%20Linux%20Kernel%20documentation.html

> but I
> still wanted to report this, maybe someone has an idea about this.
>
> Full dmesg and lshw outputs: https://nerdbynature.de/bits/usb_v6.1.8/
>
> Thanks,
> Christian.
>

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-01-31 22:50:54

by Christian Kujau

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

[Leaving the full quote below for reference and adding more appropriate people.]

After a far too long round of git-bisect I narrowed it down to:

c1c59538337ab6d45700cb4a1c9725e67f59bc6e is the first bad commit

x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
commit 90b926e68f500844dff16b5bcea178dc55cf580a upstream.

And indeed, reverting this single commit from v6.1.8 (stable) makes the
disks appear again.
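
A bisect run like this one, and the follow-up revert, can be sketched roughly as follows. The tags and the commit ID are the ones from this thread; the helper names bisect_begin and revert_suspect and the GIT override are illustrative assumptions, not the exact commands that were used:

```sh
#!/bin/sh
# GIT may be overridden for a dry run; it defaults to the real git.

# bisect_begin GOOD BAD: start bisecting between a good and a bad tag.
bisect_begin() {
    good=$1
    bad=$2
    "${GIT:-git}" bisect start &&
    "${GIT:-git}" bisect good "$good" &&
    "${GIT:-git}" bisect bad "$bad"
}

# After each build, boot the kernel under Xen and report the result:
#   git bisect good    # external disks show up
#   git bisect bad     # xhci_hcd init fails, disks are missing

# revert_suspect COMMIT: revert the commit that bisect pointed at.
revert_suspect() {
    "${GIT:-git}" revert --no-edit "$1"
}

# Example:
#   bisect_begin v6.1.1 v6.1.8
#   git checkout v6.1.8
#   revert_suspect c1c59538337ab6d45700cb4a1c9725e67f59bc6e
```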

TL;DR: with v6.1.8 in Xen Dom0 mode (i.e. the Xen host itself) the
external disk enclosure attached via USB is not being recognized. When
booted *without* Xen, the disks show up just fine.

Details with dmesg and lsusb outputs:
https://nerdbynature.de/bits/usb_v6.1.8/

Thanks Thorsten for the localmodconfig hint, I've tried that before, but
the thing just did not want to boot, so I manually cut down on options,
but it's still ~12 minutes per compile, ccache helped a bit in the end.

Thanks for reading,
Christian.

On Mon, 30 Jan 2023, Linux kernel regression tracking (#adding) wrote:

> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few template
> paragraphs you might have encountered already in similar form.
> See link in footer if these mails annoy you.]
>
> On 30.01.23 04:46, Christian Kujau wrote:
> > [CC stable as I only tested the stable tree for now]
> >
> > I'm running a current Alpine Linux with linux-edge-6.1.8-r0 installed on a
> > Lenovo Thinkpad L540 where an external disk enclosure with two disks is
> > attached via USB. The Alpine Linux kernel appears to track Linux stable
> > and is more or less vanilla. Also, the machine boots into Xen 4.17.0 and
> > then starts a few headless VMs, nothing too exotic here.
> >
> > But when updating from Linux 6.1.1 to 6.1.8, the disks from the external
> > enclosure did not show up. Unplug, replug, no dice, and this is 100%
> > reproducible. dmesg now has these new lines:
> >
> > +ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
> > +ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
> > +xhci_hcd 0000:00:14.0: init 0000:00:14.0 fail, -14
> > +ioremap error for 0xfed1f000-0xfed20000, requested 0x2, got 0x0
> > +iTCO_wdt iTCO_wdt.1.auto: ioremap failed for resource [mem 0xfed1f410-0xfed1f414]
> >
> > I'm not sure if the ioremap error is related here (booted with
> > early_ioremap_debug but then dmesg was filled with WARNINGS for both
> > versions, so I disabled it again), but that xhci_hcd error looks
> > suspicious.
> >
> > Curiously 6.1.8 works just fine when NOT booted via Xen. I booted into
> > Xen + vanilla 6.1.8 now and was able to reproduce this issue. Xen +
> > vanilla 6.1.1 works fine.
>
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
>
> #regzbot ^introduced v6.1.1..v6.1.8
> #regzbot title xen/usb(?): External USB disks not recognized anymore
> under Xen
> #regzbot ignore-activity
>
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.
>
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (the parent of this mail). See page linked in footer for
> details.
>
> > From v6.1.1 to v6.1.8 there's only one commit in drivers/xen, but 54
> > commits in drivers/usb. Compiling takes time because the distribution
> > kernel has almost everything enabled and I still need to cut down enabled
> > options to be able to attempt a git bisect in a reasonable time,
>
> FWIW, I'm working on a text for the kernel docs that will use
> "localmodconfig" to trim down the configs automatically. Maybe it's
> helpful for you, here is a draft:
>
> https://www.leemhuis.info/files/misc/How%20to%20quickly%20build%20a%20Linux%20kernel%20%E2%80%94%20The%20Linux%20Kernel%20documentation.html
>
> > but I
> > still wanted to report this, maybe someone has an idea about this.
> >
> > Full dmesg and lshw outputs: https://nerdbynature.de/bits/usb_v6.1.8/
> >
> > Thanks,
> > Christian.
> >
> > PS: I found this workaround on the interwebs[0] to force the USB ports
> > of that machine to USB 2.0 and then the missing disks magically appear:
> >
> > $ lspci -nn | grep -i usb
> > 00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086:8c31] (rev 05) <=== !!!
> > 00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086:8c2d] (rev 05)
> > 00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086:8c26] (rev 05)
> >
> > $ setpci -H1 -d 8086:8c31 d8.l=0
> > $ setpci -H1 -d 8086:8c31 d0.l=0
> >
> > $ dmesg
> > usb 1-1.3: new full-speed USB device number 3 using ehci-pci
> > usb 2-1.3: new high-speed USB device number 3 using ehci-pci
> > usb 1-1.3: New USB device found, idVendor=138a, idProduct=0011, bcdDevice=0.78
> > usb 1-1.3: New USB device strings: Mfr=0, Product=0, SerialNumber=1
> > usb 1-1.3: SerialNumber: aa32bf84ed47
> > usb 1-1.5: new full-speed USB device number 4 using ehci-pci
> > usb 2-1.3: New USB device found, idVendor=1e91, idProduct=a3a8, bcdDevice=2.07
> > usb 2-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
> > usb 2-1.3: Product: Elite Pro Dual
> > usb 2-1.3: Manufacturer: OWC
> > usb 2-1.3: SerialNumber: RANDOM__1E359879645F
> > usb 2-1.3: UAS is ignored for this device, using usb-storage instead
> > usb-storage 2-1.3:1.0: USB Mass Storage device detected
> > usb-storage 2-1.3:1.0: Quirks match for vid 1e91 pid a3a8: 800000
> > scsi host5: usb-storage 2-1.3:1.0
> > usb 1-1.5: New USB device found, idVendor=8087, idProduct=07dc, bcdDevice=0.01
> > usb 1-1.5: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> > Bluetooth: hci0: Legacy ROM 2.5 revision 8.0 build 1 week 45 2013
> > Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.80.1.2d.d.bseq
> > usb 1-1.6: new high-speed USB device number 5 using ehci-pci
> > usb 1-1.6: New USB device found, idVendor=04f2, idProduct=b398, bcdDevice=39.98
> > usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > usb 1-1.6: Product: Integrated Camera
> > usb 1-1.6: Manufacturer: Vimicro corp.
> > Bluetooth: hci0: Intel BT fw patch 0x2a completed & activated
> > scsi 5:0:0:0: Direct-Access ElitePro Dual U3FW-1 0207 PQ: 0 ANSI: 6
> > scsi 5:0:0:1: Direct-Access ElitePro Dual U3FW-2 0207 PQ: 0 ANSI: 6
> > sd 5:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
> > sd 5:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
> > sd 5:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).
> > sd 5:0:0:1: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
> > sd 5:0:0:1: [sdd] Write Protect is off
> > sd 5:0:0:1: [sdd] Mode Sense: 47 00 10 08
> > sd 5:0:0:0: [sdc] Write Protect is off
> > sd 5:0:0:0: [sdc] Mode Sense: 47 00 10 08
> > sd 5:0:0:0: [sdc] No Caching mode page found
> > sd 5:0:0:0: [sdc] Assuming drive cache: write through
> > sd 5:0:0:1: [sdd] No Caching mode page found
> > sd 5:0:0:1: [sdd] Assuming drive cache: write through
> > sd 5:0:0:0: [sdc] Attached SCSI disk
> > sd 5:0:0:1: [sdd] Attached SCSI disk
> >
> > $ lsblk /dev/sd[cd]
> > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
> > sdc 8:32 0 3.6T 0 disk
> > sdd 8:48 0 3.6T 0 disk
> >
> >
> > [0] https://superuser.com/a/875863/218574
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
>

--
BOFH excuse #188:

..disk or the processor is on fire.

2023-02-01 08:15:03

by Jürgen Groß

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On 31.01.23 23:50, Christian Kujau wrote:
> [Leaving the full quote below for reference and adding more appropriate people.]
>
> After a far too long round of git-bisect I narrowed it down to:
>
> c1c59538337ab6d45700cb4a1c9725e67f59bc6e is the first bad commit
>
> x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
> commit 90b926e68f500844dff16b5bcea178dc55cf580a upstream.
>
> And indeed, reverting this single commit from v6.1.8 (stable) makes the
> disks appear again.

I have problems understanding the behavior.

Assuming the cited error messages

ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0

are related to the issue, this would mean that the ioremap() caller
requested _PAGE_CACHE_MODE_UC_MINUS (type 0x2) and got _PAGE_CACHE_MODE_WB
(type 0x0).

The patch you have reverted is modifying behavior only if the _input_ type
is _PAGE_CACHE_MODE_WB.
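
To make the numbers in that message easier to read, the requested/got values can be decoded into cache-mode names. This is a small sketch, assuming the values of enum page_cache_mode in arch/x86/include/asm/pgtable_types.h (WB=0, WC=1, UC_MINUS=2, UC=3, WT=4, WP=5); the function names are made up for illustration:

```sh
#!/bin/sh
# Map a numeric cache mode (as printed by the kernel) to its enum name.
cache_mode_name() {
    case "$1" in
        0x0) echo _PAGE_CACHE_MODE_WB ;;
        0x1) echo _PAGE_CACHE_MODE_WC ;;
        0x2) echo _PAGE_CACHE_MODE_UC_MINUS ;;
        0x3) echo _PAGE_CACHE_MODE_UC ;;
        0x4) echo _PAGE_CACHE_MODE_WT ;;
        0x5) echo _PAGE_CACHE_MODE_WP ;;
        *)   echo "unknown($1)" ;;
    esac
}

# Read dmesg text on stdin and print one decoded line per ioremap error.
decode_ioremap_errors() {
    sed -n 's/.*ioremap error for \(0x[0-9a-f]*-0x[0-9a-f]*\), requested \(0x[0-9a-f]*\), got \(0x[0-9a-f]*\).*/\1 \2 \3/p' |
    while read -r range req got; do
        printf '%s: requested %s, got %s\n' \
            "$range" "$(cache_mode_name "$req")" "$(cache_mode_name "$got")"
    done
}

# Example:
#   dmesg | decode_ioremap_errors
```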

Anyone having an idea what could be wrong here?


Juergen


2023-02-01 09:37:47

by Christian Kujau

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Wed, 1 Feb 2023, Juergen Gross wrote:
> I have problems understanding the behavior.
>
> Assuming the cited error messages
>
> ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0

Yes, that and the XHCI_HCD error are only present in dmesg when it's
running as the Xen Dom0:

+ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
+xhci_hcd 0000:00:14.0: init 0000:00:14.0 fail, -14
+xhci_hcd: probe of 0000:00:14.0 failed with error -14
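
A quick way to check a given boot for these lines is to filter dmesg for them. This is a minimal sketch; the function name usb_regression_lines is hypothetical and the pattern only covers the messages quoted in this thread:

```sh
#!/bin/sh
# Print only the ioremap and xhci_hcd failure lines.
# Reads the files given as arguments, or stdin if there are none.
usb_regression_lines() {
    grep -E 'ioremap error for|xhci_hcd.*fail' "$@"
}

# Example:
#   dmesg | usb_regression_lines
# No output (and a non-zero exit status) means none of the lines were found.
```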

I don't know if it's related, but the issue is really, really gone when
that commit is reverted. And "no external disks when running in Dom0 mode"
(i.e. started by the Xen HV) is kind of a big issue, I'm afraid.

But I'm at a loss on the technicalities of that commit, sorry.

Christian.

> are related to the issue, this would mean that the ioremap() caller
> requested _PAGE_CACHE_MODE_UC_MINUS (type 0x2) and got _PAGE_CACHE_MODE_WB
> (type 0x0).
>
> The patch you have reverted is modifying behavior only if the _input_ type
> is _PAGE_CACHE_MODE_WB.
>
> Anyone having an idea what could be wrong here?
--
BOFH excuse #339:

manager in the cable duct

2023-02-02 11:38:45

by Christian Kujau

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Wed, 1 Feb 2023, Juergen Gross wrote:
> On 31.01.23 23:50, Christian Kujau wrote:
> > [Leaving the full quote below for reference and adding more appropriate
> > people.]
> >
> > After a far too long round of git-bisect I narrowed it down to:
> >
> > c1c59538337ab6d45700cb4a1c9725e67f59bc6e is the first bad commit
> >
> > x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
> > commit 90b926e68f500844dff16b5bcea178dc55cf580a upstream.
> >
> > And indeed, reverting this single commit from v6.1.8 (stable) makes the
> > disks appear again.
>
> I have problems understanding the behavior.
>
> Assuming the cited error messages
>
> ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
>
> are related to the issue, this would mean that the ioremap() caller
> requested _PAGE_CACHE_MODE_UC_MINUS (type 0x2) and got _PAGE_CACHE_MODE_WB
> (type 0x0).
>
> The patch you have reverted is modifying behavior only if the _input_ type
> is _PAGE_CACHE_MODE_WB.

This also happens on mainline, not only in stable. Reverting this patch
from 6.2-rc6 makes the disks appear again.

Christian.

> Anyone having an idea what could be wrong here?
>
>
> Juergen
>
> >
> > TL;DR: with v6.1.8 in Xen Dom0 mode (i.e. the Xen host itself) the
> > external disk enclosure attached via USB is not being recognized. When
> > booted *without* Xen, the disks show up just fine.
> >
> > Details with dmesg and lsusb outputs:
> > https://nerdbynature.de/bits/usb_v6.1.8/
> >
> > Thanks Thorsten for the localmodconfig hint, I've tried that before, but
> > the thing just did not want to boot, so I manually cut down on options,
> > but it's still ~12 minutes per compile, ccache helped a bit in the end.
> >
> > Thanks for reading,
> > Christian.
> >
> > On Mon, 30 Jan 2023, Linux kernel regression tracking (#adding) wrote:
> >
> > > [TLDR: I'm adding this report to the list of tracked Linux kernel
> > > regressions; the text you find below is based on a few templates
> > > paragraphs you might have encountered already in similar form.
> > > See link in footer if these mails annoy you.]
> > >
> > > On 30.01.23 04:46, Christian Kujau wrote:
> > > > [CC stable as I only tested the stable tree for now]
> > > >
> > > > I'm running a current Alpine Linux with linux-edge-6.1.8-r0 installed on
> > > > a
> > > > Lenovo Thinkpad L540 where an external disk enclosure with two disks is
> > > > attached via USB. The Alpine Linux kernel appears to track Linux stable
> > > > and is more or less vanilla. Also, the machine boots into Xen 4.17.0 and
> > > > then starts a few headless VMs, nothing too exotic here.
> > > >
> > > > But when updating from Linux 6.1.1 to 6.1.8, the disks from the external
> > > > enclosure did not show up. Unplug, replug, no dice, and this is 100%
> > > > reproducable. dmesg has new these lines now:
> > > >
> > > > +ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
> > > > +ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
> > > > +xhci_hcd 0000:00:14.0: init 0000:00:14.0 fail, -14
> > > > +ioremap error for 0xfed1f000-0xfed20000, requested 0x2, got 0x0
> > > > +iTCO_wdt iTCO_wdt.1.auto: ioremap failed for resource [mem
> > > > 0xfed1f410-0xfed1f414]
> > > >
> > > > I'm not sure if the ioremap error is related here (booted with
> > > > early_ioremap_debug but then dmesg was filled with WARNINGS for both
> > > > versions, so I disabled it again), but that xhci_hcd error looks
> > > > suspicious.
> > > >
> > > > Curiously 6.1.8 works just fine when NOT booted via Xen. I booted into
> > > > Xen + vanilla 6.1.8 now and was able to reproduce this issue. Xen +
> > > > vanilla 6.1.1 works fine.
> > >
> > > Thanks for the report. To be sure the issue doesn't fall through the
> > > cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> > > tracking bot:
> > >
> > > #regzbot ^introduced v6.1.1..v6.1.8
> > > #regzbot title xen/usb(?): External USB disks not recognized anymore
> > > under Xen
> > > #regzbot ignore-activity
> > >
> > > This isn't a regression? This issue or a fix for it are already
> > > discussed somewhere else? It was fixed already? You want to clarify when
> > > the regression started to happen? Or point out I got the title or
> > > something else totally wrong? Then just reply and tell me -- ideally
> > > while also telling regzbot about it, as explained by the page listed in
> > > the footer of this mail.
> > >
> > > Developers: When fixing the issue, remember to add 'Link:' tags pointing
> > > to the report (the parent of this mail). See page linked in footer for
> > > details.
> > >
> > > > From v6.1.1 to v6.1.8 there's only one commit in drivers/xen, but 54
> > > > commits in drivers/usb. Compiling takes time because the distribution
> > > > kernel has almost everything enabled and I still need to cut down
> > > > enabled
> > > > options to be able to attempt a git biset in a reasonable time,
> > >
> > > FWIW, I'm working on a text for the kernel docs that will use
> > > "localmodconfig" to trim down the configs automatically. Maybe it's
> > > helpful for you, here is a draft:
> > >
> > > https://www.leemhuis.info/files/misc/How%20to%20quickly%20build%20a%20Linux%20kernel%20%E2%80%94%20The%20Linux%20Kernel%20documentation.html
> > >
> > > > but I
> > > > still wanted to report this, maybe someone has an idea about this.
> > > >
> > > > Full dmesg and lshw outputs: https://nerdbynature.de/bits/usb_v6.1.8/
> > > >
> > > > Thanks,
> > > > Christian.
> > > >
> > > > PS: I found this workaround on the interwebs[0] to force the USB ports
> > > > of that machine to USB 2.0 and then the missing disks magically appear:
> > > >
> > > > $ lspci -nn | grep -i usb
> > > > 00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series
> > > > Chipset Family USB xHCI [8086:8c31] (rev 05) <=== !!!
> > > > 00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series
> > > > Chipset Family USB EHCI #2 [8086:8c2d] (rev 05)
> > > > 00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series
> > > > Chipset Family USB EHCI #1 [8086:8c26] (rev 05)
> > > >
> > > > $ setpci -H1 -d 8086:8c31 d8.l=0
> > > > $ setpci -H1 -d 8086:8c31 d0.l=0
> > > >
> > > > $ dmesg
> > > > usb 1-1.3: new full-speed USB device number 3 using ehci-pci
> > > > usb 2-1.3: new high-speed USB device number 3 using ehci-pci
> > > > usb 1-1.3: New USB device found, idVendor=138a, idProduct=0011,
> > > > bcdDevice=0.78
> > > > usb 1-1.3: New USB device strings: Mfr=0, Product=0, SerialNumber=1
> > > > usb 1-1.3: SerialNumber: aa32bf84ed47
> > > > usb 1-1.5: new full-speed USB device number 4 using ehci-pci
> > > > usb 2-1.3: New USB device found, idVendor=1e91, idProduct=a3a8,
> > > > bcdDevice=2.07
> > > > usb 2-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
> > > > usb 2-1.3: Product: Elite Pro Dual
> > > > usb 2-1.3: Manufacturer: OWC
> > > > usb 2-1.3: SerialNumber: RANDOM__1E359879645F
> > > > usb 2-1.3: UAS is ignored for this device, using usb-storage instead
> > > > usb-storage 2-1.3:1.0: USB Mass Storage device detected
> > > > usb-storage 2-1.3:1.0: Quirks match for vid 1e91 pid a3a8: 800000
> > > > scsi host5: usb-storage 2-1.3:1.0
> > > > usb 1-1.5: New USB device found, idVendor=8087, idProduct=07dc,
> > > > bcdDevice=0.01
> > > > usb 1-1.5: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> > > > Bluetooth: hci0: Legacy ROM 2.5 revision 8.0 build 1 week 45 2013
> > > > Bluetooth: hci0: Intel Bluetooth firmware file:
> > > > intel/ibt-hw-37.7.10-fw-1.80.1.2d.d.bseq
> > > > usb 1-1.6: new high-speed USB device number 5 using ehci-pci
> > > > usb 1-1.6: New USB device found, idVendor=04f2, idProduct=b398,
> > > > bcdDevice=39.98
> > > > usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > > > usb 1-1.6: Product: Integrated Camera
> > > > usb 1-1.6: Manufacturer: Vimicro corp.
> > > > Bluetooth: hci0: Intel BT fw patch 0x2a completed & activated
> > > > scsi 5:0:0:0: Direct-Access ElitePro Dual U3FW-1 0207 PQ: 0
> > > > ANSI: 6
> > > > scsi 5:0:0:1: Direct-Access ElitePro Dual U3FW-2 0207 PQ: 0
> > > > ANSI: 6
> > > > sd 5:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
> > > > sd 5:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
> > > > sd 5:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).
> > > > sd 5:0:0:1: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
> > > > sd 5:0:0:1: [sdd] Write Protect is off
> > > > sd 5:0:0:1: [sdd] Mode Sense: 47 00 10 08
> > > > sd 5:0:0:0: [sdc] Write Protect is off
> > > > sd 5:0:0:0: [sdc] Mode Sense: 47 00 10 08
> > > > sd 5:0:0:0: [sdc] No Caching mode page found
> > > > sd 5:0:0:0: [sdc] Assuming drive cache: write through
> > > > sd 5:0:0:1: [sdd] No Caching mode page found
> > > > sd 5:0:0:1: [sdd] Assuming drive cache: write through
> > > > sd 5:0:0:0: [sdc] Attached SCSI disk
> > > > sd 5:0:0:1: [sdd] Attached SCSI disk
> > > >
> > > > $ lsblk /dev/sd[cd]
> > > > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
> > > > sdc 8:32 0 3.6T 0 disk
> > > > sdd 8:48 0 3.6T 0 disk
> > > >
> > > >
> > > > [0] https://superuser.com/a/875863/218574
> > >
> > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > > --
> > > Everything you wanna know about Linux kernel regression tracking:
> > > https://linux-regtracking.leemhuis.info/about/#tldr
> > > That page also explains what to do if mails like this annoy you.
> > >
> >
>
>

--
BOFH excuse #286:

Telecommunications is downgrading.

2023-02-02 20:25:03

by Linus Torvalds

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Thu, Feb 2, 2023 at 3:38 AM Christian Kujau <[email protected]> wrote:
>
> On Wed, 1 Feb 2023, Juergen Gross wrote:
> > On 31.01.23 23:50, Christian Kujau wrote:
> > > [Leaving the full quote below for reference and adding more appropriate
> > > people.]
> > >
> > > After a far too long round of git-bisect I narrowed it down to:
> > >
> > > c1c59538337ab6d45700cb4a1c9725e67f59bc6e is the first bad commit
> > >
> > > x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
> > > commit 90b926e68f500844dff16b5bcea178dc55cf580a upstream.
> > >
> > > And indeed, reverting this single commit from v6.1.8 (stable) makes the
> > > disks appear again.
>
> This also happens on mainline, not only in stable. Reverting this patch
> from 6.2-rc6 makes the disks appear again.

I think the patch is simply wrong and should be reverted.

The way hardware works, MTRR_TYPE_INVALID implies UC-, not WB.

So that commit 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR
disabled case") is simply wrong to say "disabled means the same as
WB".

That said, I think mtrr_type_lookup() is wrong too. It has a few bugs:

(a) it basically returns the wrong mtrr type for the "not enabled" conditions

(b) it doesn't set the "uniform" bit for said conditions, which then
causes problems for callers - the hugepage case in particular only
checks for that MTRR_TYPE_INVALID case because of this.

(c) it sets is_uniform wrongly for the fixed mtrr case, but I guess
the only thing that cares is largepage, so it works

and I think the !CONFIG_MTRR case has the same issue.

I'm not convinced it *ever* makes sense for mtrr_type_lookup() to
return MTRR_TYPE_INVALID (it makes sense for the helper functions to
do so to let the code know to look at other mtrrs, but not the final
lookup).

And at a minimum, the !MTRR_STATE_MTRR_ENABLED case seems very wrong -
if mtrr is disabled, it should return 'def_type', no?

So I think that commit should be reverted as broken, and then people
should *maybe* look at something like this (intentionally whitespace
damaged, and people should *really* think about what the
MTRR_TYPE_INVALID case should be - returning UC- is probably what is
closest to "this is what the hardware does", but maybe doesn't make
sense for the largepage case, which might as well just always use
largepages in that case?)

--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -53,7 +53,8 @@ static inline u8 mtrr_type_lookup(..
/*
* Return no-MTRRs:
*/
- return MTRR_TYPE_INVALID;
+ *uniform = 1;
+ return MTRR_TYPE_INVALID; /* ??? */
}
#define mtrr_save_fixed_ranges(arg) do {} while (0)
#define mtrr_save_state() do {} while (0)
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -261,11 +261,13 @@ u8 mtrr_type_lookup(..
/* Make end inclusive instead of exclusive */
end--;

+ type = MTRR_TYPE_INVALID; /* ??? */
if (!mtrr_state_set)
- return MTRR_TYPE_INVALID;
+ goto out;

+ type = mtrr_state.def_type;
if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
- return MTRR_TYPE_INVALID;
+ goto out;

/*
* Look up the fixed ranges first, which take priority over

But note how I think the !MTRR_STATE_MTRR_ENABLED case really should
return the default mtrr type, and that looks fairly unambiguous to me.

Hmm?

Linus
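
Editorial sketch of the two early exits discussed above, as a userspace model rather than kernel code. It assumes the out: label in mtrr_type_lookup() stores is_uniform (initialised to 1) into *uniform; the struct, the function names and the NO_EARLY_EXIT sentinel are made up for illustration.

```c
#include <stdint.h>

#define MTRR_TYPE_INVALID        0xff
#define MTRR_STATE_MTRR_ENABLED  0x02
#define NO_EARLY_EXIT            (-1)	/* model-only: lookup would walk the ranges */

/* Hypothetical stand-in for the bits of mtrr_state the early exits look at. */
struct mtrr_model {
	int     state_set;	/* models mtrr_state_set */
	uint8_t enabled;	/* models mtrr_state.enabled */
	uint8_t def_type;	/* models mtrr_state.def_type */
};

/* Early exits before the proposal: both report "invalid", *uniform untouched. */
static int early_exit_old(const struct mtrr_model *m, uint8_t *uniform)
{
	(void)uniform;
	if (!m->state_set)
		return MTRR_TYPE_INVALID;
	if (!(m->enabled & MTRR_STATE_MTRR_ENABLED))
		return MTRR_TYPE_INVALID;
	return NO_EARLY_EXIT;
}

/* Early exits as in the whitespace-damaged sketch: both paths go through out:. */
static int early_exit_sketch(const struct mtrr_model *m, uint8_t *uniform)
{
	if (!m->state_set) {
		*uniform = 1;
		return MTRR_TYPE_INVALID;	/* the "???" case in the sketch */
	}
	if (!(m->enabled & MTRR_STATE_MTRR_ENABLED)) {
		*uniform = 1;
		return m->def_type;		/* MTRRs disabled: default type */
	}
	return NO_EARLY_EXIT;
}
```

With state_set = 1, enabled = 0 and def_type = 6 (write-back), the old path reports 0xff and leaves *uniform alone, while the sketched path reports 6 and sets *uniform to 1.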

2023-02-03 16:50:51

by Christian Kujau

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Thu, 2 Feb 2023, Linus Torvalds wrote:
> So I think that commit should be reverted as broken, and then people
> should *maybe* look at something like this (intentionally whitespace
> damaged, and people should *really* think about what the
> MTRR_TYPE_INVALID case should be - returning UC- is probably what is

Not fully understanding what your proposal really does, I got curious and
applied it to v6.2-rc6 with 90b926e68f50 (upstream) reverted. And it
boots, and the disks are there, and the "ioremap error" is gone, but now
I've got strange memory allocation errors, for like really small
operations (I wanted to capture dmesg):

$ dmesg -t | xz -9ec | base64
xz: (stdin): Out of memory

With v6.2-rc6 (vanilla, and also booted under Xen) there's no problem
allocating much more memory than that, so something is still not right.
Patch applied for reference, but as you said: nobody should apply this :-)

More details:

- https://nerdbynature.de/bits/usb_v6.1.8/dmesg.6.1.8.xen
- https://nerdbynature.de/bits/usb_v6.1.8/dmesg_xen-6.1.8_MTRR_TYPE_INVALID.txt
- https://nerdbynature.de/bits/usb_v6.1.8/meminfo_xen-6.1.8.txt
- https://nerdbynature.de/bits/usb_v6.1.8/meminfo_xen-6.1.8_MTRR_TYPE_INVALID.txt

Thanks,
Christian.
--
BOFH excuse #448:

vi needs to be upgraded to vii


Attachments:
MTRR_TYPE_INVALID.patch.txt (1.29 kB)

2023-02-03 17:30:06

by Christian Kujau

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Fri, 3 Feb 2023, Christian Kujau wrote:
> Not fully understanding what your proposal really does, I got curious and
> applied it to v6.2-rc6 with 90b926e68f50 (upstream) reverted. And it
> boots, and the disks are there, and the "ioremap error" is gone, but now
> I've got strange memory allocation errors, for like really small
> operations (I wanted to capture dmesg):
>
> $ dmesg -t | xz -9ec | base64
> xz: (stdin): Out of memory

OK, whatever that is, it's unrelated to Linus's "patch" here, this happens
with v6.2-rc6 (under Xen) and the revert of 90b926e68f50 (upstream) too.
Dmesg has this too:

__vm_enough_memory: pid: 3450, comm: xz, no enough memory for the allocation

Never seen this before, and Xen DomU (pvh) domains can be started just
fine. Not sure what this message is all about, the system appears to run
just fine.

Christian.

>
> With v6.2-rc6 (vanilla, and also booted under Xen) there's no problem
> allocating much more memory than that, so something is still not right.
> Patch applied for reference, but as you said: nobody should apply this :-)
>
> More details:
>
> - https://nerdbynature.de/bits/usb_v6.1.8/dmesg.6.1.8.xen
> - https://nerdbynature.de/bits/usb_v6.1.8/dmesg_xen-6.1.8_MTRR_TYPE_INVALID.txt
> - https://nerdbynature.de/bits/usb_v6.1.8/meminfo_xen-6.1.8.txt
> - https://nerdbynature.de/bits/usb_v6.1.8/meminfo_xen-6.1.8_MTRR_TYPE_INVALID.txt
>
> Thanks,
> Christian.
> --
> BOFH excuse #448:
>
> vi needs to be upgraded to vii

--
BOFH excuse #83:

Support staff hung over, send aspirin and come back LATER.

2023-02-05 10:40:09

by Thorsten Leemhuis

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 31.01.23 23:50, Christian Kujau wrote:
> [Leaving the full quote below for reference and adding more appropriate people.]
>
> After a far too long round of git-bisect I narrowed it down to:
>
> c1c59538337ab6d45700cb4a1c9725e67f59bc6e is the first bad commit
>
> x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case
> commit 90b926e68f500844dff16b5bcea178dc55cf580a upstream.
>
> And indeed, reverting this single commit from v6.1.8 (stable) makes the
> disks appear again.
> [...]

In that case let me update the tracking state:

#regzbot introduced: 90b926e68f500844dff16b5bcea178dc55cf5
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-02-05 13:20:26

by Borislav Petkov

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Thu, Feb 02, 2023 at 12:24:30PM -0800, Linus Torvalds wrote:
> So I think that commit should be reverted as broken, and then people
> should *maybe* look at something like this (intentionally whitespace
> damaged, and people should *really* think about what the
> MTRR_TYPE_INVALID case should be - returning UC- is probably what is
> closest to "this is what the hardware does",

Yes, it is actually even documented that by default, all memory is UC-
if MTRRs are disabled.

> but maybe doesn't make sense for the largepage case, which might as
> well just always use largepages in that case?)

See below. I think it should be this way but I might be missing an
angle...

---
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f0eeaf6e5f5f..4061f1e8d34c 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -53,7 +53,8 @@ static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
/*
* Return no-MTRRs:
*/
- return MTRR_TYPE_INVALID;
+ *uniform = 1;
+ return MTRR_TYPE_UNCACHABLE;
}
#define mtrr_save_fixed_ranges(arg) do {} while (0)
#define mtrr_save_state() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index ee09d359e08f..2a1ed63d2b24 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -255,17 +255,25 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
{
u8 type, prev_type, is_uniform = 1, dummy;
- int repeat;
u64 partial_end;
+ int repeat;

/* Make end inclusive instead of exclusive */
end--;

+ /*
+ * UC- by default because " [i]f the MTRRs are disabled in implementations
+ * that support the MTRR mechanism, the default memory type is set to
+ * uncacheable (UC)".
+ */
+ type = MTRR_TYPE_UNCACHABLE;
+
if (!mtrr_state_set)
- return MTRR_TYPE_INVALID;
+ goto out;

+ type = mtrr_state.def_type;
if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
- return MTRR_TYPE_INVALID;
+ goto out;

/*
* Look up the fixed ranges first, which take priority over
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e4f499eb0f29..ed914bc95345 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -721,8 +721,9 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
u8 mtrr, uniform;

mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
- if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
- (mtrr != MTRR_TYPE_WRBACK))
+ if (mtrr != MTRR_TYPE_UNCACHABLE &&
+ mtrr != MTRR_TYPE_WRBACK &&
+ !uniform)
return 0;

/* Bail out if we are we on a populated non-leaf entry: */
@@ -748,8 +749,9 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
u8 mtrr, uniform;

mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
- if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
- (mtrr != MTRR_TYPE_WRBACK)) {
+ if (mtrr != MTRR_TYPE_UNCACHABLE &&
+ mtrr != MTRR_TYPE_WRBACK &&
+ !uniform) {
pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
__func__, addr, addr + PMD_SIZE);
return 0;

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-05 17:17:15

by Christian Kujau

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Sun, 5 Feb 2023, Borislav Petkov wrote:
> See below. I think it should be this way but I might be missing an
> angle...

Thanks for taking a stab at this, Borislav. With this applied to v6.2-rc6
(and 90b926e68f50 still reverted), the machine boots just fine, no new
errors and the USB disks show up just fine and appear to be usable. If
this ends up the final fix for this, feel free to add my Tested-by to the
same.

Thanks again,
Christian.
--
BOFH excuse #176:

vapors from evaporating sticky-note adhesives

2023-02-05 20:22:08

by Linus Torvalds

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Sun, Feb 5, 2023 at 5:20 AM Borislav Petkov <[email protected]> wrote:
>
> @@ -53,7 +53,8 @@ static inline u8 mtrr_type_lookup(u64 addr,
> /*
> * Return no-MTRRs:
> */
> - return MTRR_TYPE_INVALID;
> + *uniform = 1;
> + return MTRR_TYPE_UNCACHABLE;

So this is the one I'd almost leave alone.

Because this is not a "there are no MTRR's" situation, this is a "I
haven't enabled CONFIG_MTRR, so I don't _know_ if there are any MTRR's
or not" situation.

And returning MTRR_TYPE_UNCACHABLE will then disable things like
largepages etc, so this change would effectively mean that if
CONFIG_MTRR is off, it would turn off hugepage support too.

But maybe that was the only thing that cared, and we have:

> @@ -721,8 +721,9 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
> u8 mtrr, uniform;
>
> mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> - if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> - (mtrr != MTRR_TYPE_WRBACK))
> + if (mtrr != MTRR_TYPE_UNCACHABLE &&
> + mtrr != MTRR_TYPE_WRBACK &&
> + !uniform)
> return 0;

Here you make up for it, but I don't actually understand why these
checks exist at all.

I *think* that what the check should do is just check for uniformity.

Why would the largepage code otherwise care?

Other MTRR types are explicitly fine, and I think things like the X
server might even want to do write-combining with large pages etc.

So I think the hugepage code should only do

if (!uniform)
return 0;

or there should be some explanation for why those types are special?
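
Editorial sketch comparing the two predicates, as a userspace model rather than kernel code. The first function mirrors the condition on the removed "-" lines of the quoted pud_set_huge() hunk, the second is the uniformity-only check suggested here. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_TYPE_WRCOMB   1
#define MTRR_TYPE_WRBACK   6
#define MTRR_TYPE_INVALID  0xff

/* Condition from the quoted hunk: true means "refuse the huge mapping". */
static bool huge_refused_old(uint8_t mtrr, uint8_t uniform)
{
	return mtrr != MTRR_TYPE_INVALID && !uniform &&
	       mtrr != MTRR_TYPE_WRBACK;
}

/* Suggested replacement: only a non-uniform range is refused. */
static bool huge_refused_uniform_only(uint8_t mtrr, uint8_t uniform)
{
	(void)mtrr;	/* the type itself no longer matters */
	return !uniform;
}
```

In this model the two predicates agree whenever the range is uniform. They differ only for a non-uniform range whose reported type is write-back or invalid, which the old condition lets through and the uniformity-only check refuses.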

>> @@ -748,8 +749,9 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
> u8 mtrr, uniform;
>
> mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
> - if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> - (mtrr != MTRR_TYPE_WRBACK)) {
> + if (mtrr != MTRR_TYPE_UNCACHABLE &&
> + mtrr != MTRR_TYPE_WRBACK &&
> + !uniform) {

Same here.

Again, I *think* that the reason it used to do that "check two types"
thing is simply because "uniform" wasn't set correctly.

But I don't know.

Linus

2023-02-06 06:34:29

by Jürgen Groß

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On 05.02.23 21:21, Linus Torvalds wrote:
> On Sun, Feb 5, 2023 at 5:20 AM Borislav Petkov <[email protected]> wrote:
>>
>> @@ -53,7 +53,8 @@ static inline u8 mtrr_type_lookup(u64 addr,
>> /*
>> * Return no-MTRRs:
>> */
>> - return MTRR_TYPE_INVALID;
>> + *uniform = 1;
>> + return MTRR_TYPE_UNCACHABLE;
>
> So this is the one I'd almost leave alone.
>
> Because this is not a "there are no MTRR's" situation, this is a "I
> haven't enabled CONFIG_MTRR, so I don't _know_ if there are any MTRR's
> or not.

Yes.

> And returning MTRR_TYPE_UNCACHABLE will then disable things like
> largepages etc, so this change would effectively mean that if
> CONFIG_MTRR is off, it would turn off hugepage support too.

Correct.

>
> But maybe that was the only thing that cared, and we have:
>
>> @@ -721,8 +721,9 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>> u8 mtrr, uniform;
>>
>> mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
>> - if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> - (mtrr != MTRR_TYPE_WRBACK))
>> + if (mtrr != MTRR_TYPE_UNCACHABLE &&
>> + mtrr != MTRR_TYPE_WRBACK &&
>> + !uniform)
>> return 0;
>
> Here you make up for it, but I don't actually understand why these
> checks exist at all.
>
> I *think* that what the check should do is just check for uniformity.
>
> Why would the largepage code otherwise care?

I agree. The reasoning in the comment above pud_set_huge() is nonsense, as
it is not specific to huge pages:

* - MTRRs are enabled and the corresponding MTRR memory type is WB, which
* has no effect on the requested PAT memory type.

Any other MTRR memory type would interfere with the requested PAT memory
type in undesired ways, but this is still true when using small pages
only.

> Other MTRR types are explicitly fine, and I think things like the X
> server might even want to do write-combining with large pages etc.
>
> So I think the hugepage code should only do
>
> if (!uniform)
> return 0;
>
> or there should be some explanation for why those types are special?

As written above: there is an explanation, but it doesn't make much sense
IMHO.

>
>>> @@ -748,8 +749,9 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>> u8 mtrr, uniform;
>>
>> mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
>> - if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> - (mtrr != MTRR_TYPE_WRBACK)) {
>> + if (mtrr != MTRR_TYPE_UNCACHABLE &&
>> + mtrr != MTRR_TYPE_WRBACK &&
>> + !uniform) {
>
> Same here.
>
> Again, I *think* that the reason it used to do that "check two types"
> thing is simply because "uniform" wasn't set correctly.

This might very well be the reason, yes.

I still don't see how Christian's original report makes sense:

According to the error message, the _requested_ memory type was UC-, but
the reverted patch only affects cases where the requested type is WB. So
why does reverting 90b926e68f50 help to make this message go away?
The message was:

ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0

Meanwhile I've found a system which is issuing such a message under Xen.
I'll investigate further _why_ a request of UC- ends up getting WB.
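
Editorial sketch for decoding the "requested 0x2, got 0x0" pair, as a userspace helper rather than kernel code. The numeric values follow the kernel's enum page_cache_mode in arch/x86/include/asm/pgtable_types.h; the CM_ names and both helper functions are made up for illustration.

```c
#include <stdio.h>

/* Userspace copies of the enum page_cache_mode values; the CM_ prefix is ours. */
enum cache_mode {
	CM_WB       = 0,
	CM_WC       = 1,
	CM_UC_MINUS = 2,
	CM_UC       = 3,
	CM_WT       = 4,
	CM_WP       = 5
};

/* Hypothetical helper: map a numeric mode from dmesg to a short name. */
static const char *cache_mode_name(unsigned int mode)
{
	switch (mode) {
	case CM_WB:       return "WB";
	case CM_WC:       return "WC";
	case CM_UC_MINUS: return "UC-";
	case CM_UC:       return "UC";
	case CM_WT:       return "WT";
	case CM_WP:       return "WP";
	default:          return "unknown";
	}
}

/* Hypothetical helper: render one "requested X, got Y" pair into buf. */
static int describe_ioremap_error(char *buf, size_t len,
				  unsigned int req, unsigned int got)
{
	return snprintf(buf, len, "requested %s, got %s",
			cache_mode_name(req), cache_mode_name(got));
}
```

Feeding it 0x2 and 0x0 yields the UC- versus WB reading used in this thread.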


Juergen



2023-02-06 09:43:19

by Borislav Petkov

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On Sun, Feb 05, 2023 at 12:21:42PM -0800, Linus Torvalds wrote:
> So this is the one I'd almost leave alone.
>
> Because this is not a "there are no MTRR's" situation, this is a "I
> haven't enabled CONFIG_MTRR, so I don't _know_ if there are any MTRR's
> or not.
>
> And returning MTRR_TYPE_UNCACHABLE will then disable things like
> largepages etc, so this change would effectively mean that if
> CONFIG_MTRR is off, it would turn off hugepage support too.

Right, if we wanted to be precise here, we would check whether the
underlying hw supports MTRRs - i.e., check CPUID bit - and if our
support for it is disabled, then we'd return UC because this is how the
MTRR-supporting hw behaves:

"If the MTRRs are disabled in implementations that support the MTRR
mechanism, the default memory type is set to uncacheable (UC)."

That's the AMD APM.

The Intel SDM has a similar wording:

"Following a hardware reset, the P6 and more recent processor families
disable all the fixed and variable MTRRs, which in effect makes all of
physical memory uncacheable."

So something like

if (cpu_feature_enabled(X86_FEATURE_MTRR))
return MTRR_TYPE_UNCACHABLE;
else
return MTRR_TYPE_INVALID;


> > @@ -721,8 +721,9 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
> > u8 mtrr, uniform;
> >
> > mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> > - if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> > - (mtrr != MTRR_TYPE_WRBACK))
> > + if (mtrr != MTRR_TYPE_UNCACHABLE &&
> > + mtrr != MTRR_TYPE_WRBACK &&
> > + !uniform)
> > return 0;
>
> Here you make up for it, but I don't actually understand why these
> checks exist at all.
>
> I *think* that what the check should do is just check for uniformity.

Looka here:

6b6378355b92 ("x86, mm: support huge KVA mappings on x86")

Ack on the uniformity aspect. The WB is fine too because "has no affect on
the PAT memory types."

And then when MTRRs are disabled, then I guess it doesn't matter for the
large page mappings anyway. I would have said that we don't really care
about MTRRs being disabled but all those new confidential computing
things do disable MTRRs. Xen too.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-06 09:55:22

by Jürgen Groß

Subject: Re: External USB disks not recognized with v6.1.8 when using Xen

On 06.02.23 07:33, Juergen Gross wrote:
> I still don't see why the original report of Christian is making sense:
>
> According to the error message, the _requested_ memory type was UC-, but
> the reverted patch only affects cases where the requested type is WB. So
> why does a revert of 90b926e68f50 is helping to make this message go away?
> The message was:
>
>   ioremap error for 0xf2520000-0xf2530000, requested 0x2, got 0x0
>
> Meanwhile I've found a system which is issuing such a message under Xen.
> I'll investigate further _why_ a request of UC- ends up to get WB.

Okay, here is the result of my investigation:

During ACPI initialization, the ACPI code seems to try to map, with type
WB, a memory area that is marked as "reserved" in the memory map (this
happens in acpi_os_map_iomem()).

With commit 90b926e68f50 this is now accepted, resulting in this memory
area being registered with the WB type.

Much later the driver for the device owning this reserved memory area
tries to map the area as UC-, but it gets WB due to the much earlier
mapping via acpi_os_map_iomem().
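
Editorial sketch of the ordering problem described above, as a toy userspace model and not the kernel's memtype code. It assumes only what the paragraphs state: the first caller's type is recorded for the range, and a later overlapping request gets that recorded type back. The table, its size and model_reserve() are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum { CM_WB = 0, CM_UC_MINUS = 2 };	/* values as printed in the dmesg line */

struct memtype_entry {
	uint64_t start, end;	/* [start, end) */
	int      type;
};

#define MAX_ENTRIES 8
static struct memtype_entry table[MAX_ENTRIES];
static size_t nr_entries;

/*
 * Record [start, end) with the requested type unless an overlapping range
 * is already tracked, in which case the earlier type wins. Returns the
 * type in effect, or -1 if the toy table is full.
 */
static int model_reserve(uint64_t start, uint64_t end, int req)
{
	for (size_t i = 0; i < nr_entries; i++)
		if (start < table[i].end && table[i].start < end)
			return table[i].type;

	if (nr_entries == MAX_ENTRIES)
		return -1;

	table[nr_entries++] = (struct memtype_entry){ start, end, req };
	return req;
}
```

If the early mapping reserves 0xf2520000-0xf2530000 as WB and the driver later asks for UC- on the same range, the model hands back WB, which corresponds to "requested 0x2, got 0x0". The address range is taken from the log purely as an illustration.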

Before commit 72cbc8f04fe2 (which 90b926e68f50 tried to fix) this whole
mess worked, because memtype_reserve() took the early exit due to
pat_enabled() returning false.

Just reverting 90b926e68f50 will reintroduce the TDX guest issue Michael
reported (massive slowdown due to getting memory areas mapped as UC-).

I believe the most promising way out of this mess would be to let
interested parties (Xen guests, Hyper-V TDX guests) set the MTRR memory
type they want to get back from mtrr_type_lookup() for the cases it
returns MTRR_TYPE_INVALID today.

I guess Xen Dom0 would specify MTRR_TYPE_UNCACHABLE, while Hyper-V TDX
guests could set it to MTRR_TYPE_WRBACK.
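
Editorial sketch of what such a knob could look like, as a userspace model under loud assumptions. No interface with these names exists in the thread; mtrr_fallback_type, set_mtrr_fallback_type() and lookup_when_mtrr_unusable() are hypothetical, and setting *uniform to 1 on this path is our assumption as well.

```c
#include <stdint.h>

#define MTRR_TYPE_UNCACHABLE  0
#define MTRR_TYPE_WRBACK      6
#define MTRR_TYPE_INVALID     0xff

/* Hypothetical knob: type reported on the paths that return INVALID today. */
static uint8_t mtrr_fallback_type = MTRR_TYPE_INVALID;

/* Hypothetical setter a platform would call once, early during boot. */
static void set_mtrr_fallback_type(uint8_t type)
{
	mtrr_fallback_type = type;
}

/* Model of the lookup; only the "MTRR state not usable" path is covered. */
static uint8_t lookup_when_mtrr_unusable(uint8_t *uniform)
{
	*uniform = 1;	/* assumption: one type applies to the whole range */
	return mtrr_fallback_type;
}
```

In this model a Xen Dom0 would call set_mtrr_fallback_type(MTRR_TYPE_UNCACHABLE) and a Hyper-V TDX guest would pass MTRR_TYPE_WRBACK, while everyone else keeps the current MTRR_TYPE_INVALID result.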

Any thoughts?


Juergen


Subject: [tip: x86/urgent] x86/mtrr: Revert 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: f9f57da2c2d119dbf109e3f6e1ceab7659294046
Gitweb: https://git.kernel.org/tip/f9f57da2c2d119dbf109e3f6e1ceab7659294046
Author: Juergen Gross <[email protected]>
AuthorDate: Thu, 09 Feb 2023 08:22:17 +01:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Tue, 14 Feb 2023 10:16:34 +01:00

x86/mtrr: Revert 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")

Commit

90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")

broke the use case of running Xen dom0 kernels on machines with an
external disk enclosure attached via USB, see Link tag.

What this commit was originally fixing - SEV-SNP guests on Hyper-V - is
a more specialized situation which has other issues at the moment anyway
so reverting this now and addressing the issue properly later is the
prudent thing to do.

So revert it in time for the 6.2 proper release.

[ bp: Rewrite commit message. ]

Reported-by: Christian Kujau <[email protected]>
Tested-by: Christian Kujau <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/mm/pat/memtype.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index fb4b1b5..46de9cf 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -387,8 +387,7 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
 		u8 mtrr_type, uniform;

 		mtrr_type = mtrr_type_lookup(start, end, &uniform);
-		if (mtrr_type != MTRR_TYPE_WRBACK &&
-		    mtrr_type != MTRR_TYPE_INVALID)
+		if (mtrr_type != MTRR_TYPE_WRBACK)
 			return _PAGE_CACHE_MODE_UC_MINUS;

 		return _PAGE_CACHE_MODE_WB;
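
For readers who want to see what the one-line change does, here is a minimal userspace sketch of just the branch touched by the hunk above. It is not the kernel source: the function names, the `cache_mode` enum and the `MTRR_*` macros are stand-ins invented for illustration, and their numeric values are assumed to mirror the kernel's definitions. The sketch also assumes that the MTRR lookup reports the "invalid" type when MTRRs are disabled, which is the case named in the title of 90b926e68f50.

```c
/* Simplified model of the branch changed by the revert; not kernel code. */
#include <stdint.h>

/* Stand-in cache modes (values assumed to match the kernel's enum). */
enum cache_mode { MODE_WB = 0, MODE_UC_MINUS = 2 };

/* Stand-in MTRR types (values assumed to match the kernel's macros). */
#define MTRR_UNCACHABLE 0x00
#define MTRR_WRBACK     0x06
#define MTRR_INVALID    0xff	/* assumed result when MTRRs are disabled */

/* Check as it was with 90b926e68f50 applied (the two '-' lines). */
static enum cache_mode wb_request_with_90b926e(uint8_t mtrr_type)
{
	if (mtrr_type != MTRR_WRBACK && mtrr_type != MTRR_INVALID)
		return MODE_UC_MINUS;

	return MODE_WB;
}

/* Check as it is after the revert (the '+' line). */
static enum cache_mode wb_request_after_revert(uint8_t mtrr_type)
{
	if (mtrr_type != MTRR_WRBACK)
		return MODE_UC_MINUS;

	return MODE_WB;
}
```

In this model the two variants differ only for `MTRR_INVALID`: with 90b926e68f50 the request stays write-back, while after the revert it is downgraded to UC-. For every other MTRR type both variants agree.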

2023-02-18 09:47:57

by Christian Kujau

Subject: Re: [tip: x86/urgent] x86/mtrr: Revert 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")

On Tue, 14 Feb 2023, tip-bot2 for Juergen Gross wrote:
> The following commit has been merged into the x86/urgent branch of tip:

Sorry for being dense but I couldn't figure this out from the tip tree
handbook[0]: will this be included in 6.2 or has this ship sailed? If so,
I'll start bugging the Alpine folks to maybe carry this around until the
next release.

Thanks,
Christian.

[0] https://www.kernel.org/doc/html/latest/process/maintainer-tip.html

>
> Commit-ID: f9f57da2c2d119dbf109e3f6e1ceab7659294046
> Gitweb: https://git.kernel.org/tip/f9f57da2c2d119dbf109e3f6e1ceab7659294046
> Author: Juergen Gross <[email protected]>
> AuthorDate: Thu, 09 Feb 2023 08:22:17 +01:00
> Committer: Borislav Petkov (AMD) <[email protected]>
> CommitterDate: Tue, 14 Feb 2023 10:16:34 +01:00
>
> x86/mtrr: Revert 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
>
> Commit
>
> 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
>
> broke the use case of running Xen dom0 kernels on machines with an
> external disk enclosure attached via USB, see Link tag.
>
> What this commit was originally fixing - SEV-SNP guests on Hyper-V - is
> a more specialized situation which has other issues at the moment anyway
> so reverting this now and addressing the issue properly later is the
> prudent thing to do.
>
> So revert it in time for the 6.2 proper release.
>
> [ bp: Rewrite commit message. ]
>
> Reported-by: Christian Kujau <[email protected]>
> Tested-by: Christian Kujau <[email protected]>
> Signed-off-by: Juergen Gross <[email protected]>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> arch/x86/mm/pat/memtype.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index fb4b1b5..46de9cf 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -387,8 +387,7 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
>  		u8 mtrr_type, uniform;
>
>  		mtrr_type = mtrr_type_lookup(start, end, &uniform);
> -		if (mtrr_type != MTRR_TYPE_WRBACK &&
> -		    mtrr_type != MTRR_TYPE_INVALID)
> +		if (mtrr_type != MTRR_TYPE_WRBACK)
>  			return _PAGE_CACHE_MODE_UC_MINUS;
>
>  		return _PAGE_CACHE_MODE_WB;
>

--
BOFH excuse #155:

Dumb terminal

2023-02-18 10:01:49

by Borislav Petkov

Subject: Re: [tip: x86/urgent] x86/mtrr: Revert 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")

On Sat, Feb 18, 2023 at 10:47:46AM +0100, Christian Kujau wrote:
> Sorry for being dense but I couldn't figure this out from the tip tree
> handbook[0]: will this be included in 6.2 or has this ship sailed?

Yes, it will. Urgent branches usually go to Linus during the current
stabilization phase. If you wanna do a patch for the handbook to fix
that shortcoming, I'll take it.

:-)

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [tip: x86/cleanups] Documentation/process: Explain when tip branches get merged into mainline

The following commit has been merged into the x86/cleanups branch of tip:

Commit-ID: cb5d28c01b2d56700e1656cd6c3742b40e840bf9
Gitweb: https://git.kernel.org/tip/cb5d28c01b2d56700e1656cd6c3742b40e840bf9
Author: Christian Kujau <[email protected]>
AuthorDate: Sat, 18 Feb 2023 22:29:44 +01:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 08 May 2023 15:35:00 +02:00

Documentation/process: Explain when tip branches get merged into mainline

Explain when tip branches get merged into mainline.

Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Christian Kujau <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
Documentation/process/maintainer-tip.rst | 3 +++
1 file changed, 3 insertions(+)

diff --git a/Documentation/process/maintainer-tip.rst b/Documentation/process/maintainer-tip.rst
index 178c95f..93d8a79 100644
--- a/Documentation/process/maintainer-tip.rst
+++ b/Documentation/process/maintainer-tip.rst
@@ -421,6 +421,9 @@ allowing themselves a breath. Please respect that.
The release candidate -rc1 is the starting point for new patches to be
applied which are targeted for the next merge window.

+So called _urgent_ branches will be merged into mainline during the
+stabilization phase of each release.
+

Git
^^^

Subject: [tip: x86/cleanups] Documentation/process: Explain when tip branches get merged into mainline

The following commit has been merged into the x86/cleanups branch of tip:

Commit-ID: 4f1192559707eaa7adef307f5b9ad3a444b248f8
Gitweb: https://git.kernel.org/tip/4f1192559707eaa7adef307f5b9ad3a444b248f8
Author: Christian Kujau <[email protected]>
AuthorDate: Sat, 18 Feb 2023 22:29:44 +01:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 15 May 2023 17:11:28 +02:00

Documentation/process: Explain when tip branches get merged into mainline

Explain when tip branches get merged into mainline.

Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Christian Kujau <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
Documentation/process/maintainer-tip.rst | 3 +++
1 file changed, 3 insertions(+)

diff --git a/Documentation/process/maintainer-tip.rst b/Documentation/process/maintainer-tip.rst
index 178c95f..93d8a79 100644
--- a/Documentation/process/maintainer-tip.rst
+++ b/Documentation/process/maintainer-tip.rst
@@ -421,6 +421,9 @@ allowing themselves a breath. Please respect that.
The release candidate -rc1 is the starting point for new patches to be
applied which are targeted for the next merge window.

+So called _urgent_ branches will be merged into mainline during the
+stabilization phase of each release.
+

Git
^^^