2021-07-11 22:51:10

by Linus Torvalds

Subject: Linux 5.14-rc1

You all know the drill by now. It's been the usual two weeks of merge
window, and now it's closed, and 5.14-rc1 is out there.

As usual, it's much too big to post the shortlog, with about 13k
commits (and another ~800 merge commits) by about 1650 developers, and
a diffstat summary of

11859 files changed, 817707 insertions(+), 285485 deletions(-)

Appended is my mergelog which gives you an overview of what I've
pulled during the merge window, and who I pulled from. And as usual, I
want to stress how this is obviously just a very high-level summary,
and a tiny part of the actual developer community - if you want the full
details of all those changes, you'll have to go to the -git tree.

On the whole, I don't think there are any huge surprises in here, and
size-wise this seems to be a pretty regular release too. Let's hope
that that translates to a nice and calm release cycle, but you never
know. Last release was big, but it was all fairly calm despite that,
so size isn't always the determining factor here..

If somebody wants to look at the actual diff for the release, I'd
encourage you to ignore - once again - another set of big AMD GPU
hardware description header files. We seem to have those fairly
regularly, and they are always these huge generated headers that end
up dwarfing everything else. Almost exactly half of the whole 5.14-rc1
patch is comprised of those GPU headers, and it skews the statistics a
lot.

Now, even if you ignore that AMD header drop, drivers account for over
two thirds of the changes when you look at the diff, and that's
perfectly normal. What's slightly less usual is how there's a lot of
line _removals_ in there, with the old IDE layer finally having met
its long-overdue demise, and all our IDE support is now based on
libata.

Of course, the fact that we removed all that legacy IDE code doesn't
mean that we had a reduction in lines over-all: a few tens of
thousands of lines of legacy code is nowhere near enough to balance
out the usual kernel growth. But it's still a nice thing to see the
cleanup.
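
For a rough sense of scale, the net growth can be read straight off the diffstat summary line quoted earlier in this mail. The following is only a small Python sketch: the helper name `parse_diffstat` is made up for illustration, and the regular expression assumes that both the insertions and the deletions part are present (git omits a part when its count is zero).

```python
import re

# Summary line copied from the 5.14-rc1 announcement.
DIFFSTAT = "11859 files changed, 817707 insertions(+), 285485 deletions(-)"

def parse_diffstat(line):
    """Return (files, insertions, deletions) from a diffstat summary line."""
    m = re.search(
        r"(\d+) files? changed, (\d+) insertions?\(\+\), (\d+) deletions?\(-\)",
        line,
    )
    if m is None:
        raise ValueError(f"not a diffstat summary: {line!r}")
    return tuple(int(g) for g in m.groups())

files, added, removed = parse_diffstat(DIFFSTAT)
net = added - removed  # lines gained once removals are subtracted
print(f"{files} files, net {net:+d} lines")
```

With the numbers above the net change comes out at 532222 added lines, so the IDE removal is indeed far from balancing out the growth.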

So drivers dominate: even when ignoring the AMD header addition
there's a fair amount of gpu updates, but there's networking drivers,
rdma, sound, scsi, staging, media...

Outside of drivers, there's all the usual suspects: architecture
updates (arm, arm64, x86, powerpc, s390, with a smattering of other
architecture updates too) and various core kernel updates: networking,
filesystems, VM, scheduling etc. And the usual documentation and
tooling (perf and self-tests) updates.

Please do test, and we can get the whole calming-down period rolling
and hopefully get a timely final 5.14 release.

Linus

---

Al Viro (3):
vfs d_path() updates
iov_iter updates
vfs name lookup updates

Alex Williamson (1):
VFIO updates

Alexandre Belloni (2):
i3c updates
RTC updates

Andreas Gruenbacher (1):
gfs2 updates

Andrew Morton (3):
misc updates
more updates
yet more updates

Arnaldo Carvalho de Melo (2):
perf tool updates
more perf tool updates

Arnd Bergmann (1):
asm/unaligned.h unification

Bartosz Golaszewski (1):
gpio updates

Bjorn Andersson (2):
remoteproc updates
hwspinlock updates

Bjorn Helgaas (2):
pci updates
pci fix

Borislav Petkov (3):
x86 RAS updates
x86 cpu updates
x86 SEV updates

Bruce Fields (1):
nfsd updates

Casey Schaufler (1):
smack updates

Christian Brauner (2):
mount_setattr updates
openat2 fixes

Christoph Hellwig (2):
dma-mapping updates
configfs updates

Corey Minyard (1):
IPMI driver updates

Dan Williams (1):
CXL (Compute Express Link) updates

Daniel Lezcano (1):
thermal updates

Daniel Thompson (1):
kgdb updates

Darrick Wong (1):
xfs updates

Dave Airlie (2):
drm updates
drm fixes

David Kleikamp (1):
jfs updates

David Sterba (1):
btrfs updates

David Teigland (1):
dlm updates

Dennis Zhou (2):
percpu updates
percpu fix

Dmitry Torokhov (1):
input updates

Eric Biederman (1):
user namespace rlimit handling update

Eric Biggers (1):
fscrypt updates

Gao Xiang (1):
erofs updates

Geert Uytterhoeven (1):
m68k updates

Greg KH (5):
char / misc driver updates
driver core changes
staging / IIO driver updates
tty / serial updates
USB / Thunderbolt updates

Greg Ungerer (1):
m68knommu update

Guenter Roeck (1):
hwmon updates

Guo Ren (1):
arch/csky updates

Gustavo Silva (3):
fallthrough fixes
array-bounds fixes
more fallthrough fixes

Hans de Goede (1):
x86 platform driver updates

Herbert Xu (2):
crypto updates
crypto fixes

Ilya Dryomov (1):
ceph updates

Ingo Molnar (19):
EFI updates
objtool fix and updates
locking updates
perf events updates
scheduler updates
timers/nohz updates
x86 exception handling updates
x86 asm updates
x86 boot update
x86 resource control documentation fixes
x86 cleanups
x86 uapi fixlet
x86 mm update
x86 splitlock updates
scheduler fixes
locking fixes
perf fixes
scheduler fixes
irq fixes

Jaegeuk Kim (1):
f2fs updates

Jakub Kicinski (1):
networking updates

James Bottomley (2):
SCSI updates
more SCSI updates

Jan Kara (1):
misc fs updates

Jarkko Sakkinen (1):
tpm driver updates

Jason Gunthorpe (1):
rdma updates

Jassi Brar (1):
mailbox updates

Jens Axboe (6):
libata updates
core block updates
block driver updates
io_uring updates
more block updates
io_uring fixes

Jessica Yu (1):
module updates

Jiri Kosina (1):
HID updates

Joerg Roedel (1):
iommu updates

Jonathan Corbet (1):
documentation updates

Juergen Gross (1):
xen updates

Julia Lawall (1):
coccinelle updates

Kees Cook (3):
seccomp updates
pstore updates
clang feature updates

Lee Jones (2):
mfd updates
backlight updates

Linus Walleij (1):
pin control updates

Mark Brown (3):
regmap updates
regulator updates
spi updates

Masahiro Yamada (1):
Kbuild updates

Mauro Carvalho Chehab (1):
media updates

Micah Morton (1):
SafeSetID update

Michael Ellerman (2):
powerpc updates
powerpc fixes

Michael Tsirkin (1):
virtio,vhost,vdpa updates

Michal Simek (1):
microblaze updates

Mike Marshall (1):
orangefs updates

Mike Rapoport (2):
memblock updates
memblock fix

Mike Snitzer (1):
device mapper updates

Miklos Szeredi (1):
fuse updates

Mimi Zohar (1):
integrity subsystem updates

Namjae Jeon (1):
exfat updates

Olof Johansson (3):
ARM SoC updates
ARM devicetree updates
ARM driver updates

Palmer Dabbelt (1):
RISC-V updates

Paolo Bonzini (1):
kvm updates

Paul E McKenney (1):
lkmm fixlet

Paul McKenney (2):
KCSAN updates
RCU updates

Paul Moore (2):
SELinux updates
audit updates

Pavel Machek (1):
LED updates

Petr Mladek (1):
printk updates

Rafael Wysocki (6):
power management updates
ACPI updates
PNP updates
device properties framework updates
more power management updates
more ACPI updates

Richard Weinberger (3):
MTD updates
UBIFS updates
UML updates

Rob Herring (1):
devicetree updates

Russell King (1):
ARM development updates

Sebastian Reichel (1):
power supply and reset updates

Shuah Khan (2):
KUnit update
Kselftest update

Stafford Horne (1):
OpenRISC updates

Stephen Boyd (2):
clk updates
more clk updates

Steve French (2):
cifs updates
cifs fixes

Steven Rostedt (2):
tracing updates
tracing fix and cleanup

Takashi Iwai (2):
sound updates
sound fixes

Ted Ts'o (2):
ext4 updates
ext4 updates

Tejun Heo (1):
cgroup updates

Tetsuo Handa (1):
tomoyo fix

Thierry Reding (1):
pwm updates

Thomas Bogendoerfer (2):
MIPS updates
MIPS fixes

Thomas Gleixner (7):
CPU hotplug cleanup
CPU hotplug fix
irq updates
timer updates
x86 interrupt related updates
x86 entry code related updates
x86 fpu updates

Tony Luck (1):
EDAC updates

Trond Myklebust (1):
NFS client updates

Ulf Hansson (2):
MMC and MEMSTICK updates
MMC fixes

Vasily Gorbik (2):
s390 updates
more s390 updates

Vinod Koul (1):
dmaengine updates

Wei Liu (1):
hyperv updates

Will Deacon (1):
arm64 updates

Wim Van Sebroeck (1):
watchdog updates

Wolfram Sang (1):
i2c updates


2021-07-12 02:02:40

by Guenter Roeck

Subject: Re: Linux 5.14-rc1

On Sun, Jul 11, 2021 at 03:49:31PM -0700, Linus Torvalds wrote:
> You all know the drill by now. It's been the usual two weeks of merge
> window, and now it's closed, and 5.14-rc1 is out there.
>
[ ... ]
> Please do test, and we can get the whole calming-down period rolling
> and hopefully get a timely final 5.14 release.
>

Build results:
total: 154 pass: 152 fail: 2
Failed builds:
arcv2:allnoconfig
riscv:allmodconfig
Qemu test results:
total: 462 pass: 443 fail: 19
Failed tests:
arm:z2:pxa_defconfig:nodebug:nocd:nofs:nonvme:noscsi:notests:novirt:nofdt:flash8,384k,2:rootfs
<all riscv32>

z2:pxa_defconfig fails to boot due to commit 4b361cfa8624 ("mtd: core:
add OTP nvmem provider support"). A patch to fix the problem has been
posted at
https://patchwork.ozlabs.org/project/linux-mtd/patch/[email protected]/

The riscv:allmodconfig build failure is not new. It is seen if both
STACKPROTECTOR_PER_TASK and GCC_PLUGIN_RANDSTRUCT are enabled.
See
https://patchwork.kernel.org/project/linux-riscv/patch/[email protected]/
for details and a proposed fix.

riscv32 images fail to boot due to commit ca6eaaa210de ("riscv:
__asm_copy_to-from_user: Optimize unaligned memory access and pipeline
stall"). I reported this a couple of days ago, but have not seen a reply.

In addition to that, there are some new warning tracebacks.

WARNING: CPU: 0 PID: 55 at crypto/testmgr.c:5652 alg_test.part.0+0x148/0x460
self-tests for drbg_nopr_hmac_sha512 (stdrng) failed (rc=-22)

This is due to commits

9b7b94683a9b crypto: DRBG - switch to HMAC SHA512 DRBG as default DRBG
8833272d876e crypto: drbg - self test for HMAC(SHA-512)

which set the default crypto algorithm to SHA-512 without actually
mandating CONFIG_CRYPTO_SHA512. A patch to fix this has been posted at
https://patchwork.kernel.org/project/linux-crypto/patch/[email protected]/

WARNING: CPU: 0 PID: 24 at block/genhd.c:484 __device_add_disk+0x248/0x286

This is seen with riscv64 images when booting from usb or scsi drives.
I don't recall seeing this warning before, but I may have missed it
in the flurry of other warnings. It may have been introduced with commit
7c3f828b522b0 ("block: refactor device number setup in __device_add_disk")
but I did not try to bisect it yet.

Guenter

2021-07-12 04:18:50

by Guenter Roeck

Subject: Re: Linux 5.14-rc1

On Sun, Jul 11, 2021 at 06:56:21PM -0700, Guenter Roeck wrote:
> On Sun, Jul 11, 2021 at 03:49:31PM -0700, Linus Torvalds wrote:
> > You all know the drill by now. It's been the usual two weeks of merge
> > window, and now it's closed, and 5.14-rc1 is out there.
> >
> [ ... ]
> > Please do test, and we can get the whole calming-down period rolling
> > and hopefully get a timely final 5.14 release.
> >
>
[ ... ]
>
> WARNING: CPU: 0 PID: 24 at block/genhd.c:484 __device_add_disk+0x248/0x286
>
> This is seen with riscv64 images when booting from usb or scsi drives.
> I don't recall seeing this warning before, but I may have missed it
> in the flurry of other warnings. It may have been introduced with commit
> 7c3f828b522b0 ("block: refactor device number setup in __device_add_disk")
> but I did not try to bisect it yet.
>
My guess was correct. Bisect points to the above commit. Bisect log as well
as complete backtrace and example qemu command attached.

Copying Christoph and Jens.

Guenter

---
# bad: [3dbdb38e286903ec220aaf1fb29a8d94297da246] Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
# good: [007b350a58754a93ca9fe50c498cc27780171153] Merge tag 'dlm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
git bisect start '3dbdb38e2869' '007b350a5875'
# good: [b6df00789e2831fff7a2c65aa7164b2a4dcbe599] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good b6df00789e2831fff7a2c65aa7164b2a4dcbe599
# good: [990ec3014deedfed49e610cdc31dc6930ca63d8d] drm/amdgpu: add psp runtime db structures
git bisect good 990ec3014deedfed49e610cdc31dc6930ca63d8d
# bad: [c288d9cd710433e5991d58a0764c4d08a933b871] Merge tag 'for-5.14/io_uring-2021-06-30' of git://git.kernel.dk/linux-block
git bisect bad c288d9cd710433e5991d58a0764c4d08a933b871
# bad: [df668a5fe461bb9d7e899c538acc7197746038f4] Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block
git bisect bad df668a5fe461bb9d7e899c538acc7197746038f4
# good: [4b5e35ce075817bc36d7c581b22853be984e5b41] Merge tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
git bisect good 4b5e35ce075817bc36d7c581b22853be984e5b41
# bad: [e42cfb1da0bf33c313318da201730324c423351d] block: Remove unnecessary elevator operation checks
git bisect bad e42cfb1da0bf33c313318da201730324c423351d
# bad: [c97d93c31e5734a16bfe663085ec91b8c9fb20f9] block: factor out a part_devt helper
git bisect bad c97d93c31e5734a16bfe663085ec91b8c9fb20f9
# bad: [7681750bd35fe92dd915f4df177d45265e78a933] zram: convert to blk_alloc_disk/blk_cleanup_disk
git bisect bad 7681750bd35fe92dd915f4df177d45265e78a933
# good: [56b68085e536eff2676108f2f8356889a7dbbf55] blk-mq: Some tag allocation code refactoring
git bisect good 56b68085e536eff2676108f2f8356889a7dbbf55
# bad: [958229a7c55f219b1cff99f939dabbc1b6ba7161] block: add a flag to make put_disk on partially initalized disks safer
git bisect bad 958229a7c55f219b1cff99f939dabbc1b6ba7161
# bad: [7c3f828b522b07adb341b08fde1660685c5ba3eb] block: refactor device number setup in __device_add_disk
git bisect bad 7c3f828b522b07adb341b08fde1660685c5ba3eb
# good: [d97e594c51660bea510a387731637b894651e4b5] blk-mq: Use request queue-wide tags for tagset-wide sbitmap
git bisect good d97e594c51660bea510a387731637b894651e4b5
# first bad commit: [7c3f828b522b07adb341b08fde1660685c5ba3eb] block: refactor device number setup in __device_add_disk
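
A log like the one above is easy to post-process. As a hedged illustration (not part of the original report), the Python sketch below pulls the verdicts and the first bad commit out of the comment lines; `parse_bisect_log` and `ENTRY` are made-up names, and only three entries of the log are repeated to keep the sample short.

```python
import re

# Three entries copied from the bisect log above; a real run would read the
# whole output of `git bisect log` instead.
BISECT_LOG = """\
# good: [56b68085e536eff2676108f2f8356889a7dbbf55] blk-mq: Some tag allocation code refactoring
git bisect good 56b68085e536eff2676108f2f8356889a7dbbf55
# bad: [7c3f828b522b07adb341b08fde1660685c5ba3eb] block: refactor device number setup in __device_add_disk
git bisect bad 7c3f828b522b07adb341b08fde1660685c5ba3eb
# good: [d97e594c51660bea510a387731637b894651e4b5] blk-mq: Use request queue-wide tags for tagset-wide sbitmap
git bisect good d97e594c51660bea510a387731637b894651e4b5
# first bad commit: [7c3f828b522b07adb341b08fde1660685c5ba3eb] block: refactor device number setup in __device_add_disk
"""

ENTRY = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")

def parse_bisect_log(text):
    """Return (steps, first_bad); steps is a list of (verdict, sha, subject)."""
    steps, first_bad = [], None
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # skip the "git bisect ..." command lines
        verdict, sha, subject = m.groups()
        if verdict == "first bad commit":
            first_bad = (sha, subject)
        else:
            steps.append((verdict, sha, subject))
    return steps, first_bad

steps, first_bad = parse_bisect_log(BISECT_LOG)
print(f"{len(steps)} tested commits, first bad: {first_bad[0][:12]} ({first_bad[1]})")
```

The unmodified log can also be fed back to `git bisect replay <logfile>` to reproduce the session in a local tree.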

---
[ 11.940230] Waiting for root device /dev/sda...
[ 12.066026] usb 1-1: new full-speed USB device number 2 using ohci-pci
[ 12.306673] usb-storage 1-1:1.0: USB Mass Storage device detected
[ 12.310957] scsi host0: usb-storage 1-1:1.0
[ 13.354722] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 13.370433] sd 0:0:0:0: Power-on or device reset occurred
[ 13.390621] sd 0:0:0:0: [sda] 32768 512-byte logical blocks: (16.8 MB/16.0 MiB)
[ 13.396348] sd 0:0:0:0: [sda] Write Protect is off
[ 13.402622] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 13.403994] ------------[ cut here ]------------
[ 13.404165] WARNING: CPU: 0 PID: 7 at block/genhd.c:484 __device_add_disk+0x248/0x286
[ 13.404393] Modules linked in:
[ 13.404601] CPU: 0 PID: 7 Comm: kworker/u2:0 Not tainted 5.14.0-rc1 #1
[ 13.404830] Hardware name: riscv-virtio,qemu (DT)
[ 13.405081] Workqueue: events_unbound async_run_entry_fn
[ 13.405309] epc : __device_add_disk+0x248/0x286
[ 13.405496] ra : __device_add_disk+0x1b2/0x286
[ 13.405657] epc : ffffffff8042a4cc ra : ffffffff8042a436 sp : ffffffd00024bb80
[ 13.405863] gp : ffffffff819d15a8 tp : ffffffe0027a8040 t0 : ffffffe01f6f48f8
[ 13.406087] t1 : 000000006faf79ac t2 : 00000000000001a5 s0 : ffffffd00024bbc0
[ 13.406293] s1 : ffffffe004450e00 a0 : 0000000000006000 a1 : ffffffe0027a88b0
[ 13.406499] a2 : ffffffff819e2890 a3 : 0000000000000000 a4 : 0000000000000008
[ 13.406703] a5 : 0000000000000000 a6 : 0000000000001fff a7 : 0000000000000000
[ 13.406908] s2 : ffffffe004450e00 s3 : 0000000000000001 s4 : 0000000000000000
[ 13.407135] s5 : ffffffe00438c268 s6 : 0000000000000000 s7 : 0000000000000000
[ 13.407344] s8 : ffffffff819d41b8 s9 : ffffffff819d4298 s10: ffffffe00261a858
[ 13.407550] s11: ffffffe00261a8d0 t3 : 0000000045db8cae t4 : 000000000000000c
[ 13.407752] t5 : fffffffff04a2835 t6 : 0000000000001fff
[ 13.407912] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003
[ 13.408179] [<ffffffff8042a4cc>] __device_add_disk+0x248/0x286
[ 13.408394] [<ffffffff8042a518>] device_add_disk+0xe/0x16
[ 13.408555] [<ffffffff806e3886>] sd_probe+0x2b8/0x366
[ 13.408711] [<ffffffff8067bce4>] really_probe.part.0+0x188/0x222
[ 13.408886] [<ffffffff8067be16>] __driver_probe_device+0x98/0xbe
[ 13.409079] [<ffffffff8067be68>] driver_probe_device+0x2c/0xb0
[ 13.409247] [<ffffffff8067c330>] __device_attach_driver+0x62/0x9a
[ 13.409419] [<ffffffff80679c7e>] bus_for_each_drv+0x5c/0xa2
[ 13.409580] [<ffffffff8067b458>] __device_attach_async_helper+0x88/0x92
[ 13.409766] [<ffffffff80032e12>] async_run_entry_fn+0x22/0xc4
[ 13.409930] [<ffffffff80027e28>] process_one_work+0x1f4/0x53a
[ 13.410114] [<ffffffff800281ec>] worker_thread+0x7e/0x324
[ 13.410272] [<ffffffff8002fa1e>] kthread+0x100/0x116
[ 13.410419] [<ffffffff80003648>] ret_from_exception+0x0/0x10
[ 13.410614] irq event stamp: 59724
[ 13.410733] hardirqs last enabled at (59723): [<ffffffff80a1471c>] _raw_spin_unlock_irqrestore+0x54/0x62
[ 13.411019] hardirqs last disabled at (59724): [<ffffffff80003592>] _save_context+0x7c/0xe0
[ 13.411249] softirqs last enabled at (34082): [<ffffffff80a1510a>] __do_softirq+0x39a/0x520
[ 13.411496] softirqs last disabled at (34073): [<ffffffff80014354>] irq_exit+0xd2/0xde
[ 13.411733] ---[ end trace 644c7abe39308f0f ]---
[ 13.480431] sd 0:0:0:0: [sda] Attached SCSI disk
[ 13.511335] EXT4-fs (sda): mounting ext2 file system using the ext4 subsystem
[ 13.536810] EXT4-fs (sda): mounted filesystem without journal. Opts: (null). Quota mode: disabled.
[ 13.537632] VFS: Mounted root (ext2 filesystem) readonly on device 8:0.

---

Sample qemu command:

qemu-system-riscv64 -M virt -m 512M \
-no-reboot -bios default -kernel arch/riscv/boot/Image \
-snapshot -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
-usb -device pci-ohci,id=ohci -device usb-storage,bus=ohci.0,drive=d0 \
-drive file=/var/cache/buildbot/riscv64/rootfs.ext2,if=none,id=d0,format=raw \
-append "root=/dev/sda rootwait console=ttyS0,115200 earlycon=uart8250,mmio,0x10000000,115200" \
-nographic -monitor none

The problem is seen with various USB boot variants (ohci, ehci, xhci, uas-ehci,
uas-xhci) and all SCSI controllers supported by qemu.

2021-07-12 05:22:47

by Christoph Hellwig

Subject: Re: Linux 5.14-rc1

On Sun, Jul 11, 2021 at 09:14:23PM -0700, Guenter Roeck wrote:
> My guess was correct. Bisect points to the above commit. Bisect log as well
> as complete backtrace and example qemu command attached.
>
> Copying Christoph and Jens.

This should fix it:

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 6d2d63629a90..b8d55af763f9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -98,11 +98,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
-#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS 16
-#else
-#define SD_MINORS 0
-#endif
 
 static void sd_config_discard(struct scsi_disk *, unsigned int);
 static void sd_config_write_same(struct scsi_disk *);
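
Why a zero SD_MINORS would trip the warning is not spelled out in this thread. One plausible reading, and it is purely an assumption here, is that after the bisected commit `__device_add_disk` warns when a disk with a fixed major number is registered with zero minors. The Python sketch below models only that assumed check: `add_disk_warns` and `sd_minors` are made-up names, and the major number 8 is taken from the "device 8:0" line in the boot log.

```python
SCSI_DISK_MAJOR = 8  # major number seen as "device 8:0" in the boot log

def sd_minors(debug_block_ext_devt):
    """SD_MINORS as selected by the #if block that the patch removes."""
    return 0 if debug_block_ext_devt else 16

def add_disk_warns(major, minors):
    """ASSUMED check: a fixed major together with zero minors triggers the WARNING."""
    return major != 0 and minors == 0

# Before the patch: CONFIG_DEBUG_BLOCK_EXT_DEVT selects the 0-minor case.
print(add_disk_warns(SCSI_DISK_MAJOR, sd_minors(True)))
print(add_disk_warns(SCSI_DISK_MAJOR, sd_minors(False)))
# After the patch SD_MINORS is always 16, so the assumed check stays quiet.
print(add_disk_warns(SCSI_DISK_MAJOR, 16))
```

Under that assumption only the first call reports a warning, which would match a test configuration that has CONFIG_DEBUG_BLOCK_EXT_DEVT enabled.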

2021-07-12 10:10:17

by Jon Masters

Subject: Re: Linux 5.14-rc1

On 7/11/21 6:49 PM, Linus Torvalds wrote:
> You all know the drill by now. It's been the usual two weeks of merge
> window, and now it's closed, and 5.14-rc1 is out there.

I happened to be installing a Fedora 34 (x86) VM for something and did a
test kernel compile that hung on boot. Setting up a serial console I get
the below backtrace from ttm but I have not had a chance to look at it.

Fedora 34 (Server Edition)
Kernel 5.14.0-rc1 on an x86_64 (ttyS0)

Web console: https://fedora:9090/ or https://192.168.1.91:9090/

fedora login: [ 11.263539] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 11.266355] #PF: supervisor read access in kernel mode
[ 11.268409] #PF: error_code(0x0000) - not-present page
[ 11.270456] PGD 0 P4D 0
[ 11.271506] Oops: 0000 [#1] SMP PTI
[ 11.272903] CPU: 1 PID: 41 Comm: kworker/1:1 Not tainted 5.14.0-rc1 #1
[ 11.275488] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 11.278274] Workqueue: events ttm_device_delayed_workqueue [ttm]
[ 11.279865] RIP: 0010:qxl_bo_delete_mem_notify+0x19/0x40 [qxl]
[ 11.281404] Code: 89 e7 45 31 e4 e8 67 bf f6 dc eb ea 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 fd e8 a2 02 00 00 84 c0 74 0d 48 8b 85 68 01 00 00 <83> 78 10 03 74 02 5d c3 8b 85 64 02 00 00 85 c0 74 f4 48 8b 7d 08
[ 11.286271] RSP: 0018:ffffb7a24017fdd0 EFLAGS: 00010202
[ 11.287616] RAX: 0000000000000000 RBX: ffff9da7c08e8670 RCX: ffff9da7c0b30000
[ 11.288978] RDX: ffff9da7c27f7990 RSI: ffff9da7c27f7990 RDI: ffff9da7c27f7800
[ 11.290332] RBP: ffff9da7c27f7800 R08: ffff9da7c27f7990 R09: 0000000000000000
[ 11.291690] R10: ffff9da7c991ec00 R11: 0000000000000000 R12: ffff9da7c27f7990
[ 11.293021] R13: ffff9da7c27f7800 R14: ffff9da7c27f7960 R15: ffff9da7c27f7990
[ 11.294349] FS: 0000000000000000(0000) GS:ffff9da937c80000(0000) knlGS:0000000000000000
[ 11.295853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.296935] CR2: 0000000000000010 CR3: 000000010c178004 CR4: 0000000000370ee0
[ 11.298111] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 11.299120] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 11.300130] Call Trace:
[ 11.300489] ttm_bo_cleanup_memtype_use+0x22/0x60 [ttm]
[ 11.301256] ttm_bo_release+0x1a1/0x300 [ttm]
[ 11.301879] ttm_bo_delayed_delete+0x1be/0x220 [ttm]
[ 11.302587] ttm_device_delayed_workqueue+0x18/0x40 [ttm]
[ 11.303358] process_one_work+0x1ec/0x390
[ 11.303941] worker_thread+0x53/0x3e0
[ 11.304464] ? process_one_work+0x390/0x390
[ 11.305066] kthread+0x127/0x150
[ 11.305535] ? set_kthread_struct+0x40/0x40
[ 11.306188] ret_from_fork+0x22/0x30
[ 11.306749] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables rfkill nfnetlink ip6table_filter ip6_tables iptable_filter sunrpc vfat fat snd_hda_codec_generic intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg snd_hda_codec isst_if_common snd_hwdep snd_hda_core iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm_intel snd_seq snd_seq_device snd_pcm kvm joydev irqbypass i2c_i801 rapl i2c_smbus snd_timer snd virtio_balloon lpc_ich soundcore fuse zram ip_tables xfs qxl drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel cec drm ghash_clmulni_intel serio_raw virtio_blk qemu_fw_cfg virtio_net virtio_console net_failover failover pkcs8_key_parser
[ 11.318215] CR2: 0000000000000010
[ 11.318670] ---[ end trace 20fb2a3e9bc19a76 ]---
[ 11.319300] RIP: 0010:qxl_bo_delete_mem_notify+0x19/0x40 [qxl]
[ 11.320090] Code: 89 e7 45 31 e4 e8 67 bf f6 dc eb ea 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 fd e8 a2 02 00 00 84 c0 74 0d 48 8b 85 68 01 00 00 <83> 78 10 03 74 02 5d c3 8b 85 64 02 00 00 85 c0 74 f4 48 8b 7d 08
[ 11.322574] RSP: 0018:ffffb7a24017fdd0 EFLAGS: 00010202
[ 11.323271] RAX: 0000000000000000 RBX: ffff9da7c08e8670 RCX: ffff9da7c0b30000
[ 11.324226] RDX: ffff9da7c27f7990 RSI: ffff9da7c27f7990 RDI: ffff9da7c27f7800
[ 11.325186] RBP: ffff9da7c27f7800 R08: ffff9da7c27f7990 R09: 0000000000000000
[ 11.326145] R10: ffff9da7c991ec00 R11: 0000000000000000 R12: ffff9da7c27f7990
[ 11.327092] R13: ffff9da7c27f7800 R14: ffff9da7c27f7960 R15: ffff9da7c27f7990
[ 11.328032] FS: 0000000000000000(0000) GS:ffff9da937c80000(0000) knlGS:0000000000000000
[ 11.329086] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.329848] CR2: 0000000000000010 CR3: 000000010c178004 CR4: 0000000000370ee0
[ 11.330810] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 11.331746] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
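
The "Code:" line in the oops above already identifies the faulting instruction: the byte in angle brackets is where it starts. As a hedged illustration, the Python sketch below locates that byte and splits the four bytes that follow it into opcode, ModRM, displacement and immediate. The helper name `split_code` is made up, and treating the instruction as exactly four bytes long is an assumption that happens to hold for this encoding.

```python
# "Code:" line copied from the oops above; <..> marks the first byte of the
# instruction that trapped.
CODE = (
    "89 e7 45 31 e4 e8 67 bf f6 dc eb ea 0f 1f 44 00 00 "
    "0f 1f 44 00 00 55 48 89 fd e8 a2 02 00 00 84 c0 74 0d 48 8b 85 68 01 00 "
    "00 <83> 78 10 03 74 02 5d c3 8b 85 64 02 00 00 85 c0 74 f4 48 8b 7d 08"
)

def split_code(code):
    """Return (all_bytes, index_of_trapping_byte)."""
    tokens = code.split()
    trap = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens), trap

data, trap = split_code(CODE)
opcode, modrm, disp8, imm8 = data[trap:trap + 4]

# ModRM splits into mod (2 bits), reg (3 bits) and rm (3 bits).
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

print(f"trap at byte {trap:#x} of the dump")
print(f"opcode={opcode:#x} mod={mod} reg={reg} rm={rm} disp8={disp8:#x} imm8={imm8}")
```

Opcode 0x83 with a reg field of 7 is a compare against an 8-bit immediate, and mod=1 with rm=0 addresses memory at an 8-bit displacement from %rax, so the bytes read as `cmpl $0x3,0x10(%rax)`.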


--
Computer Architect

2021-07-12 13:55:10

by Guenter Roeck

Subject: Re: Linux 5.14-rc1

On 7/11/21 10:20 PM, Christoph Hellwig wrote:
> On Sun, Jul 11, 2021 at 09:14:23PM -0700, Guenter Roeck wrote:
>> My guess was correct. Bisect points to the above commit. Bisect log as well
>> as complete backtrace and example qemu command attached.
>>
>> Copying Christoph and Jens.
>
> This should fix it:
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 6d2d63629a90..b8d55af763f9 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -98,11 +98,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
> MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
> MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
>
> -#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
> #define SD_MINORS 16
> -#else
> -#define SD_MINORS 0
> -#endif
>
> static void sd_config_discard(struct scsi_disk *, unsigned int);
> static void sd_config_write_same(struct scsi_disk *);
>

Yes, that fixes the problem for me.

Tested-by: Guenter Roeck <[email protected]>

Thanks,
Guenter

2021-07-12 19:06:33

by Linus Torvalds

Subject: Re: Linux 5.14-rc1

On Mon, Jul 12, 2021 at 6:53 AM Guenter Roeck <[email protected]> wrote:
>
> On 7/11/21 10:20 PM, Christoph Hellwig wrote:
> >
> > This should fix it:
> >
> > -#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
> > #define SD_MINORS 16
> > -#else
> > -#define SD_MINORS 0
> > -#endif
> >
> > static void sd_config_discard(struct scsi_disk *, unsigned int);
> > static void sd_config_write_same(struct scsi_disk *);
> >
>
> Yes, that fixes the problem for me.
>
> Tested-by: Guenter Roeck <[email protected]>

Thanks for reporting and testing.

Christoph, can I get that as a proper patch with a commit message?

Linus

2021-07-12 19:18:03

by Linus Torvalds

Subject: Re: Linux 5.14-rc1

On Mon, Jul 12, 2021 at 12:08 AM Jon Masters <[email protected]> wrote:
>
> I happened to be installing a Fedora 34 (x86) VM for something and did a
> test kernel compile that hung on boot. Setting up a serial console I get
> the below backtrace from ttm but I have not had a chance to look at it.

It's a NULL pointer in qxl_bo_delete_mem_notify(), with the code
disassembling to

16: 55 push %rbp
17: 48 89 fd mov %rdi,%rbp
1a: e8 a2 02 00 00 callq 0x2c1
1f: 84 c0 test %al,%al
21: 74 0d je 0x30
23: 48 8b 85 68 01 00 00 mov 0x168(%rbp),%rax
2a:* 83 78 10 03 cmpl $0x3,0x10(%rax) <-- trapping instruction
2e: 74 02 je 0x32
30: 5d pop %rbp
31: c3 retq

and that "cmpl $3" looks exactly like that

if (bo->resource->mem_type == TTM_PL_PRIV

and the bug is almost certainly from commit d3116756a710 ("drm/ttm:
rename bo->mem and make it a pointer"), which did

- if (bo->mem.mem_type == TTM_PL_PRIV ...
+ if (bo->resource->mem_type == TTM_PL_PRIV ...

and claimed "No functional change".

But clearly the "bo->resource" pointer is NULL.
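
The register dump supports that reading: the faulting address in CR2 is exactly the displacement from the trapping instruction added to a zero RAX. A small Python sketch of the arithmetic follows, using `ctypes` to mirror a structure layout. The layout is hypothetical and was chosen only so that `mem_type` lands at offset 0x10; the field names `start` and `num_pages` are assumptions, not taken from this thread.

```python
import ctypes

class TtmResource(ctypes.Structure):
    # HYPOTHETICAL layout, chosen only so that mem_type sits at offset 0x10,
    # matching the "0x10(%rax)" operand in the disassembly.
    _fields_ = [
        ("start", ctypes.c_uint64),
        ("num_pages", ctypes.c_uint64),
        ("mem_type", ctypes.c_uint32),
    ]

RAX = 0x0   # from the oops: RAX: 0000000000000000
CR2 = 0x10  # from the oops: CR2: 0000000000000010

offset = TtmResource.mem_type.offset
fault_address = RAX + offset  # address the CPU tried to read
print(hex(offset), hex(fault_address), fault_address == CR2)
```

A NULL base pointer plus a small field offset is the usual signature of dereferencing a NULL structure pointer, which is why the fault lands at address 0x10 rather than at 0.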

Added guilty parties and dri-devel mailing list.

Christian? Full report at

https://lore.kernel.org/lkml/[email protected]/

but there's not a whole lot else there that is interesting except for
the call trace:

ttm_bo_cleanup_memtype_use+0x22/0x60 [ttm]
ttm_bo_release+0x1a1/0x300 [ttm]
ttm_bo_delayed_delete+0x1be/0x220 [ttm]
ttm_device_delayed_workqueue+0x18/0x40 [ttm]
process_one_work+0x1ec/0x390
worker_thread+0x53/0x3e0

so it's presumably the cleanup phase and perhaps "bo->resource" has
been deallocated and cleared?

Linus

2021-07-12 19:26:21

by Christian König

Subject: Re: Linux 5.14-rc1

Hi guys,

Am 12.07.21 um 21:14 schrieb Linus Torvalds:
> On Mon, Jul 12, 2021 at 12:08 AM Jon Masters <[email protected]> wrote:
>> I happened to be installing a Fedora 34 (x86) VM for something and did a
>> test kernel compile that hung on boot. Setting up a serial console I get
>> the below backtrace from ttm but I have not had a chance to look at it.
> It's a NULL pointer in qxl_bo_delete_mem_notify(), with the code
> disassembling to
>
> 16: 55 push %rbp
> 17: 48 89 fd mov %rdi,%rbp
> 1a: e8 a2 02 00 00 callq 0x2c1
> 1f: 84 c0 test %al,%al
> 21: 74 0d je 0x30
> 23: 48 8b 85 68 01 00 00 mov 0x168(%rbp),%rax
> 2a:* 83 78 10 03 cmpl $0x3,0x10(%rax) <-- trapping instruction
> 2e: 74 02 je 0x32
> 30: 5d pop %rbp
> 31: c3 retq
>
> and that "cmpl $3" looks exactly like that
>
> if (bo->resource->mem_type == TTM_PL_PRIV
>
> and the bug is almost certainly from commit d3116756a710 ("drm/ttm:
> rename bo->mem and make it a pointer"), which did
>
> - if (bo->mem.mem_type == TTM_PL_PRIV ...
> + if (bo->resource->mem_type == TTM_PL_PRIV ...
>
> and claimed "No functional change".
>
> But clearly the "bo->resource" pointer is NULL.
>
> Added guilty parties and dri-devel mailing list.
>
> Christian? Full report at
>
> https://lore.kernel.org/lkml/a9473821-1d53-0037-7590-aeaf8e85e72a@jonmasters.org/
>
> but there's not a whole lot else there that is interesting except for
> the call trace:
>
> ttm_bo_cleanup_memtype_use+0x22/0x60 [ttm]
> ttm_bo_release+0x1a1/0x300 [ttm]
> ttm_bo_delayed_delete+0x1be/0x220 [ttm]
> ttm_device_delayed_workqueue+0x18/0x40 [ttm]
> process_one_work+0x1ec/0x390
> worker_thread+0x53/0x3e0
>
> so it's presumably the cleanup phase and perhaps "bo->resource" has
> been deallocated and cleared?

That's a known issue. Fixed by:

commit 3efe180d5105d367ae1dfadb97892ab93a89a783
Author: Christian König <[email protected]>
Date:   Tue Jul 6 08:51:25 2021 +0200

    drm/qxl: add NULL check for bo->resource

    When allocations fails that can be NULL now.

Previously the structure was embedded into the buffer object and when
allocation failed (or never happened in a temporary buffer) the
structure was just zeroed.
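
The difference between the embedded structure and the pointer can be modelled outside the kernel. The following is a minimal Python sketch using `ctypes`, not the actual driver code: all class and function names are made up for illustration, and the value 3 for `TTM_PL_PRIV` is inferred from the "cmpl $0x3" in the disassembly quoted above.

```python
import ctypes

TTM_PL_PRIV = 3  # inferred from the "cmpl $0x3" in the disassembly

class Resource(ctypes.Structure):
    _fields_ = [("mem_type", ctypes.c_uint32)]

class OldBo(ctypes.Structure):
    # Before the change: the resource is embedded, so it always exists.
    _fields_ = [("mem", Resource)]

class NewBo(ctypes.Structure):
    # After the change: only a pointer, which may be NULL.
    _fields_ = [("resource", ctypes.POINTER(Resource))]

def old_is_priv(bo):
    return bo.mem.mem_type == TTM_PL_PRIV

def new_is_priv_unchecked(bo):
    # Mirrors the crashing pattern; ctypes raises ValueError on a NULL pointer.
    return bo.resource.contents.mem_type == TTM_PL_PRIV

def new_is_priv_checked(bo):
    # Mirrors the idea of the fix: test the pointer before dereferencing.
    return bool(bo.resource) and bo.resource.contents.mem_type == TTM_PL_PRIV

print(old_is_priv(OldBo()))          # zeroed embedded struct compares as "not PRIV"
print(new_is_priv_checked(NewBo()))  # NULL pointer is handled
try:
    new_is_priv_unchecked(NewBo())
except ValueError as exc:
    print("unchecked access failed:", exc)
```

With the embedded structure a failed allocation simply left `mem_type` at zero, so the comparison was harmless. With a pointer the same comparison needs an explicit NULL test first.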

Going to double check tomorrow why that hasn't shown up in your tree yet.

Christian.


>
> Linus

2021-07-12 19:27:27

by Christoph Hellwig

Subject: Re: Linux 5.14-rc1

On Mon, Jul 12, 2021 at 12:03:36PM -0700, Linus Torvalds wrote:
> Christoph, can I get that as a proper patch with a commit message?

https://lore.kernel.org/linux-scsi/[email protected]/T/#u

2021-07-12 19:30:25

by Guenter Roeck

Subject: Re: Linux 5.14-rc1

On 7/12/21 12:03 PM, Linus Torvalds wrote:
> On Mon, Jul 12, 2021 at 6:53 AM Guenter Roeck <[email protected]> wrote:
>>
>> On 7/11/21 10:20 PM, Christoph Hellwig wrote:
>>>
>>> This should fix it:
>>>
>>> -#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
>>> #define SD_MINORS 16
>>> -#else
>>> -#define SD_MINORS 0
>>> -#endif
>>>
>>> static void sd_config_discard(struct scsi_disk *, unsigned int);
>>> static void sd_config_write_same(struct scsi_disk *);
>>>
>>
>> Yes, that fixes the problem for me.
>>
>> Tested-by: Guenter Roeck <[email protected]>
>
> Thanks for reporting and testing.
>
> Christoph, can I get that as a proper patch with a commit message?
>

Christoph already sent it:

https://patchwork.kernel.org/project/linux-block/patch/[email protected]/

Guenter

2021-07-12 19:30:48

by Linus Torvalds

Subject: Re: Linux 5.14-rc1

On Mon, Jul 12, 2021 at 12:24 PM Christoph Hellwig <[email protected]> wrote:
>
> On Mon, Jul 12, 2021 at 12:03:36PM -0700, Linus Torvalds wrote:
> > Christoph, can I get that as a proper patch with a commit message?
>
> https://lore.kernel.org/linux-scsi/[email protected]/T/#u

Thanks, applied and pushed out (along with two VM issues that also got
reported since rc1..)

Linus