You all know the drill by now. It's been the usual two weeks of merge
window, and now it's closed, and 5.14-rc1 is out there.
As usual, it's much too big to post the shortlog, with about 13k
commits (and another ~800 merge commits) by about 1650 developers, and
a diffstat summary of
11859 files changed, 817707 insertions(+), 285485 deletions(-)
Appended is my mergelog which gives you an overview of what I've
pulled during the merge window, and who I pulled from. And as usual, I
want to stress how this is obviously just a very high-level summary,
and a tiny part of the actual developer community - if you want the full
details of all those changes, you'll have to go to the -git tree.
On the whole, I don't think there are any huge surprises in here, and
size-wise this seems to be a pretty regular release too. Let's hope
that that translates to a nice and calm release cycle, but you never
know. Last release was big, but it was all fairly calm despite that,
so size isn't always the determining factor here..
If somebody wants to look at the actual diff for the release, I'd
encourage you to ignore - once again - another set of big AMD GPU
hardware description header files. We seem to have those fairly
regularly, and they are always these huge generated headers that end
up dwarfing everything else. Almost exactly half of the whole 5.14-rc1
patch is comprised of those GPU headers, and it skews the statistics a
lot.
Now, even if you ignore that AMD header drop, drivers account for over
two thirds of the changes when you look at the diff, and that's
perfectly normal. What's slightly less usual is how there's a lot of
line _removals_ in there, with the old IDE layer finally having met
its long-overdue demise, and all our IDE support is now based on
libata.
Of course, the fact that we removed all that legacy IDE code doesn't
mean that we had a reduction in lines over-all: a few tens of
thousands of lines of legacy code is nowhere near enough to balance
out the usual kernel growth. But it's still a nice thing to see the
cleanup.
So drivers dominate: even when ignoring the AMD header addition
there's a fair amount of gpu updates, but there's networking drivers,
rdma, sound, scsi, staging, media...
Outside of drivers, there's all the usual suspects: architecture
updates (arm, arm64, x86, powerpc, s390, with a smattering of other
architecture updates too) and various core kernel updates: networking,
filesystems, VM, scheduling etc. And the usual documentation and
tooling (perf and self-tests) updates.
Please do test, and we can get the whole calming-down period rolling
and hopefully get a timely final 5.14 release.
Linus
---
Al Viro (3):
vfs d_path() updates
iov_iter updates
vfs name lookup updates
Alex Williamson (1):
VFIO updates
Alexandre Belloni (2):
i3c updates
RTC updates
Andreas Gruenbacher (1):
gfs2 updates
Andrew Morton (3):
misc updates
more updates
yet more updates
Arnaldo Carvalho de Melo (2):
perf tool updates
more perf tool updates
Arnd Bergmann (1):
asm/unaligned.h unification
Bartosz Golaszewski (1):
gpio updates
Bjorn Andersson (2):
remoteproc updates
hwspinlock updates
Bjorn Helgaas (2):
pci updates
pci fix
Borislav Petkov (3):
x86 RAS updates
x86 cpu updates
x86 SEV updates
Bruce Fields (1):
nfsd updates
Casey Schaufler (1):
smack updates
Christian Brauner (2):
mount_setattr updates
openat2 fixes
Christoph Hellwig (2):
dma-mapping updates
configfs updates
Corey Minyard (1):
IPMI driver updates
Dan Williams (1):
CXL (Compute Express Link) updates
Daniel Lezcano (1):
thermal updates
Daniel Thompson (1):
kgdb updates
Darrick Wong (1):
xfs updates
Dave Airlie (2):
drm updates
drm fixes
David Kleikamp (1):
jfs updates
David Sterba (1):
btrfs updates
David Teigland (1):
dlm updates
Dennis Zhou (2):
percpu updates
percpu fix
Dmitry Torokhov (1):
input updates
Eric Biederman (1):
user namespace rlimit handling update
Eric Biggers (1):
fscrypt updates
Gao Xiang (1):
erofs updates
Geert Uytterhoeven (1):
m68k updates
Greg KH (5):
char / misc driver updates
driver core changes
staging / IIO driver updates
tty / serial updates
USB / Thunderbolt updates
Greg Ungerer (1):
m68knommu update
Guenter Roeck (1):
hwmon updates
Guo Ren (1):
arch/csky updates
Gustavo Silva (3):
fallthrough fixes
array-bounds fixes
more fallthrough fixes
Hans de Goede (1):
x86 platform driver updates
Herbert Xu (2):
crypto updates
crypto fixes
Ilya Dryomov (1):
ceph updates
Ingo Molnar (19):
EFI updates
objtool fix and updates
locking updates
perf events updates
scheduler updates
timers/nohz updates
x86 exception handling updates
x86 asm updates
x86 boot update
x86 resource control documentation fixes
x86 cleanups
x86 uapi fixlet
x86 mm update
x86 splitlock updates
scheduler fixes
locking fixes
perf fixes
scheduler fixes
irq fixes
Jaegeuk Kim (1):
f2fs updates
Jakub Kicinski (1):
networking updates
James Bottomley (2):
SCSI updates
more SCSI updates
Jan Kara (1):
misc fs updates
Jarkko Sakkinen (1):
tpm driver updates
Jason Gunthorpe (1):
rdma updates
Jassi Brar (1):
mailbox updates
Jens Axboe (6):
libata updates
core block updates
block driver updates
io_uring updates
more block updates
io_uring fixes
Jessica Yu (1):
module updates
Jiri Kosina (1):
HID updates
Joerg Roedel (1):
iommu updates
Jonathan Corbet (1):
documentation updates
Juergen Gross (1):
xen updates
Julia Lawall (1):
coccinelle updates
Kees Cook (3):
seccomp updates
pstore updates
clang feature updates
Lee Jones (2):
mfd updates
backlight updates
Linus Walleij (1):
pin control updates
Mark Brown (3):
regmap updates
regulator updates
spi updates
Masahiro Yamada (1):
Kbuild updates
Mauro Carvalho Chehab (1):
media updates
Micah Morton (1):
SafeSetID update
Michael Ellerman (2):
powerpc updates
powerpc fixes
Michael Tsirkin (1):
virtio,vhost,vdpa updates
Michal Simek (1):
microblaze updates
Mike Marshall (1):
orangefs updates
Mike Rapoport (2):
memblock updates
memblock fix
Mike Snitzer (1):
device mapper updates
Miklos Szeredi (1):
fuse updates
Mimi Zohar (1):
integrity subsystem updates
Namjae Jeon (1):
exfat updates
Olof Johansson (3):
ARM SoC updates
ARM devicetree updates
ARM driver updates
Palmer Dabbelt (1):
RISC-V updates
Paolo Bonzini (1):
kvm updates
Paul E McKenney (1):
lkmm fixlet
Paul McKenney (2):
KCSAN updates
RCU updates
Paul Moore (2):
SELinux updates
audit updates
Pavel Machek (1):
LED updates
Petr Mladek (1):
printk updates
Rafael Wysocki (6):
power management updates
ACPI updates
PNP updates
device properties framework updates
more power management updates
more ACPI updates
Richard Weinberger (3):
MTD updates
UBIFS updates
UML updates
Rob Herring (1):
devicetree updates
Russell King (1):
ARM development updates
Sebastian Reichel (1):
power supply and reset updates
Shuah Khan (2):
KUnit update
Kselftest update
Stafford Horne (1):
OpenRISC updates
Stephen Boyd (2):
clk updates
more clk updates
Steve French (2):
cifs updates
cifs fixes
Steven Rostedt (2):
tracing updates
tracing fix and cleanup
Takashi Iwai (2):
sound updates
sound fixes
Ted Ts'o (2):
ext4 updates
ext4 updates
Tejun Heo (1):
cgroup updates
Tetsuo Handa (1):
tomoyo fix
Thierry Reding (1):
pwm updates
Thomas Bogendoerfer (2):
MIPS updates
MIPS fixes
Thomas Gleixner (7):
CPU hotplug cleanup
CPU hotplug fix
irq updates
timer updates
x86 interrupt related updates
x86 entry code related updates
x86 fpu updates
Tony Luck (1):
EDAC updates
Trond Myklebust (1):
NFS client updates
Ulf Hansson (2):
MMC and MEMSTICK updates
MMC fixes
Vasily Gorbik (2):
s390 updates
more s390 updates
Vinod Koul (1):
dmaengine updates
Wei Liu (1):
hyperv updates
Will Deacon (1):
arm64 updates
Wim Van Sebroeck (1):
watchdog updates
Wolfram Sang (1):
i2c updates
On Sun, Jul 11, 2021 at 03:49:31PM -0700, Linus Torvalds wrote:
> You all know the drill by now. It's been the usual two weeks of merge
> window, and now it's closed, and 5.14-rc1 is out there.
>
[ ... ]
> Please do test, and we can get the whole calming-down period rolling
> and hopefully get a timely final 5.14 release.
>
Build results:
total: 154 pass: 152 fail: 2
Failed builds:
arcv2:allnoconfig
riscv:allmodconfig
Qemu test results:
total: 462 pass: 443 fail: 19
Failed tests:
arm:z2:pxa_defconfig:nodebug:nocd:nofs:nonvme:noscsi:notests:novirt:nofdt:flash8,384k,2:rootfs
<all riscv32>
z2:pxa_defconfig fails to boot due to commit 4b361cfa8624 ("mtd: core:
add OTP nvmem provider support"). A patch to fix the problem has been
posted at
https://patchwork.ozlabs.org/project/linux-mtd/patch/[email protected]/
The riscv:allmodconfig build failure is not new. It is seen if both
STACKPROTECTOR_PER_TASK and GCC_PLUGIN_RANDSTRUCT are enabled.
See
https://patchwork.kernel.org/project/linux-riscv/patch/[email protected]/
for details and a proposed fix.
riscv32 images fail to boot due to commit ca6eaaa210de ("riscv:
__asm_copy_to-from_user: Optimize unaligned memory access and pipeline
stall"). I reported this a couple of days ago, but have not seen a reply.
In addition to that, there are some new warning tracebacks.
WARNING: CPU: 0 PID: 55 at crypto/testmgr.c:5652 alg_test.part.0+0x148/0x460
self-tests for drbg_nopr_hmac_sha512 (stdrng) failed (rc=-22)
This is due to commits
9b7b94683a9b crypto: DRBG - switch to HMAC SHA512 DRBG as default DRBG
8833272d876e crypto: drbg - self test for HMAC(SHA-512)
which set the default crypto algorithm to SHA-512 without actually
mandating CONFIG_CRYPTO_SHA512. A patch to fix this has been posted at
https://patchwork.kernel.org/project/linux-crypto/patch/[email protected]/
WARNING: CPU: 0 PID: 24 at block/genhd.c:484 __device_add_disk+0x248/0x286
This is seen with riscv64 images when booting from usb or scsi drives.
I don't recall seeing this warning before, but I may have missed it
in the flurry of other warnings. It may have been introduced with commit
7c3f828b522b0 ("block: refactor device number setup in __device_add_disk")
but I did not try to bisect it yet.
Guenter
On Sun, Jul 11, 2021 at 06:56:21PM -0700, Guenter Roeck wrote:
> On Sun, Jul 11, 2021 at 03:49:31PM -0700, Linus Torvalds wrote:
> > You all know the drill by now. It's been the usual two weeks of merge
> > window, and now it's closed, and 5.14-rc1 is out there.
> >
> [ ... ]
> > Please do test, and we can get the whole calming-down period rolling
> > and hopefully get a timely final 5.14 release.
> >
>
[ ... ]
>
> WARNING: CPU: 0 PID: 24 at block/genhd.c:484 __device_add_disk+0x248/0x286
>
> This is seen with riscv64 images when booting from usb or scsi drives.
> I don't recall seeing this warning before, but I may have missed it
> in the flurry of other warnings. It may have been introduced with commit
> 7c3f828b522b0 ("block: refactor device number setup in __device_add_disk")
> but I did not try to bisect it yet.
>
My guess was correct. Bisect points to the above commit. Bisect log as well
as complete backtrace and example qemu command attached.
Copying Christoph and Jens.
Guenter
---
# bad: [3dbdb38e286903ec220aaf1fb29a8d94297da246] Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
# good: [007b350a58754a93ca9fe50c498cc27780171153] Merge tag 'dlm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
git bisect start '3dbdb38e2869' '007b350a5875'
# good: [b6df00789e2831fff7a2c65aa7164b2a4dcbe599] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good b6df00789e2831fff7a2c65aa7164b2a4dcbe599
# good: [990ec3014deedfed49e610cdc31dc6930ca63d8d] drm/amdgpu: add psp runtime db structures
git bisect good 990ec3014deedfed49e610cdc31dc6930ca63d8d
# bad: [c288d9cd710433e5991d58a0764c4d08a933b871] Merge tag 'for-5.14/io_uring-2021-06-30' of git://git.kernel.dk/linux-block
git bisect bad c288d9cd710433e5991d58a0764c4d08a933b871
# bad: [df668a5fe461bb9d7e899c538acc7197746038f4] Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block
git bisect bad df668a5fe461bb9d7e899c538acc7197746038f4
# good: [4b5e35ce075817bc36d7c581b22853be984e5b41] Merge tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
git bisect good 4b5e35ce075817bc36d7c581b22853be984e5b41
# bad: [e42cfb1da0bf33c313318da201730324c423351d] block: Remove unnecessary elevator operation checks
git bisect bad e42cfb1da0bf33c313318da201730324c423351d
# bad: [c97d93c31e5734a16bfe663085ec91b8c9fb20f9] block: factor out a part_devt helper
git bisect bad c97d93c31e5734a16bfe663085ec91b8c9fb20f9
# bad: [7681750bd35fe92dd915f4df177d45265e78a933] zram: convert to blk_alloc_disk/blk_cleanup_disk
git bisect bad 7681750bd35fe92dd915f4df177d45265e78a933
# good: [56b68085e536eff2676108f2f8356889a7dbbf55] blk-mq: Some tag allocation code refactoring
git bisect good 56b68085e536eff2676108f2f8356889a7dbbf55
# bad: [958229a7c55f219b1cff99f939dabbc1b6ba7161] block: add a flag to make put_disk on partially initalized disks safer
git bisect bad 958229a7c55f219b1cff99f939dabbc1b6ba7161
# bad: [7c3f828b522b07adb341b08fde1660685c5ba3eb] block: refactor device number setup in __device_add_disk
git bisect bad 7c3f828b522b07adb341b08fde1660685c5ba3eb
# good: [d97e594c51660bea510a387731637b894651e4b5] blk-mq: Use request queue-wide tags for tagset-wide sbitmap
git bisect good d97e594c51660bea510a387731637b894651e4b5
# first bad commit: [7c3f828b522b07adb341b08fde1660685c5ba3eb] block: refactor device number setup in __device_add_disk
---
[ 11.940230] Waiting for root device /dev/sda...
[ 12.066026] usb 1-1: new full-speed USB device number 2 using ohci-pci
[ 12.306673] usb-storage 1-1:1.0: USB Mass Storage device detected
[ 12.310957] scsi host0: usb-storage 1-1:1.0
[ 13.354722] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 13.370433] sd 0:0:0:0: Power-on or device reset occurred
[ 13.390621] sd 0:0:0:0: [sda] 32768 512-byte logical blocks: (16.8 MB/16.0 MiB)
[ 13.396348] sd 0:0:0:0: [sda] Write Protect is off
[ 13.402622] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 13.403994] ------------[ cut here ]------------
[ 13.404165] WARNING: CPU: 0 PID: 7 at block/genhd.c:484 __device_add_disk+0x248/0x286
[ 13.404393] Modules linked in:
[ 13.404601] CPU: 0 PID: 7 Comm: kworker/u2:0 Not tainted 5.14.0-rc1 #1
[ 13.404830] Hardware name: riscv-virtio,qemu (DT)
[ 13.405081] Workqueue: events_unbound async_run_entry_fn
[ 13.405309] epc : __device_add_disk+0x248/0x286
[ 13.405496] ra : __device_add_disk+0x1b2/0x286
[ 13.405657] epc : ffffffff8042a4cc ra : ffffffff8042a436 sp : ffffffd00024bb80
[ 13.405863] gp : ffffffff819d15a8 tp : ffffffe0027a8040 t0 : ffffffe01f6f48f8
[ 13.406087] t1 : 000000006faf79ac t2 : 00000000000001a5 s0 : ffffffd00024bbc0
[ 13.406293] s1 : ffffffe004450e00 a0 : 0000000000006000 a1 : ffffffe0027a88b0
[ 13.406499] a2 : ffffffff819e2890 a3 : 0000000000000000 a4 : 0000000000000008
[ 13.406703] a5 : 0000000000000000 a6 : 0000000000001fff a7 : 0000000000000000
[ 13.406908] s2 : ffffffe004450e00 s3 : 0000000000000001 s4 : 0000000000000000
[ 13.407135] s5 : ffffffe00438c268 s6 : 0000000000000000 s7 : 0000000000000000
[ 13.407344] s8 : ffffffff819d41b8 s9 : ffffffff819d4298 s10: ffffffe00261a858
[ 13.407550] s11: ffffffe00261a8d0 t3 : 0000000045db8cae t4 : 000000000000000c
[ 13.407752] t5 : fffffffff04a2835 t6 : 0000000000001fff
[ 13.407912] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003
[ 13.408179] [<ffffffff8042a4cc>] __device_add_disk+0x248/0x286
[ 13.408394] [<ffffffff8042a518>] device_add_disk+0xe/0x16
[ 13.408555] [<ffffffff806e3886>] sd_probe+0x2b8/0x366
[ 13.408711] [<ffffffff8067bce4>] really_probe.part.0+0x188/0x222
[ 13.408886] [<ffffffff8067be16>] __driver_probe_device+0x98/0xbe
[ 13.409079] [<ffffffff8067be68>] driver_probe_device+0x2c/0xb0
[ 13.409247] [<ffffffff8067c330>] __device_attach_driver+0x62/0x9a
[ 13.409419] [<ffffffff80679c7e>] bus_for_each_drv+0x5c/0xa2
[ 13.409580] [<ffffffff8067b458>] __device_attach_async_helper+0x88/0x92
[ 13.409766] [<ffffffff80032e12>] async_run_entry_fn+0x22/0xc4
[ 13.409930] [<ffffffff80027e28>] process_one_work+0x1f4/0x53a
[ 13.410114] [<ffffffff800281ec>] worker_thread+0x7e/0x324
[ 13.410272] [<ffffffff8002fa1e>] kthread+0x100/0x116
[ 13.410419] [<ffffffff80003648>] ret_from_exception+0x0/0x10
[ 13.410614] irq event stamp: 59724
[ 13.410733] hardirqs last enabled at (59723): [<ffffffff80a1471c>] _raw_spin_unlock_irqrestore+0x54/0x62
[ 13.411019] hardirqs last disabled at (59724): [<ffffffff80003592>] _save_context+0x7c/0xe0
[ 13.411249] softirqs last enabled at (34082): [<ffffffff80a1510a>] __do_softirq+0x39a/0x520
[ 13.411496] softirqs last disabled at (34073): [<ffffffff80014354>] irq_exit+0xd2/0xde
[ 13.411733] ---[ end trace 644c7abe39308f0f ]---
[ 13.480431] sd 0:0:0:0: [sda] Attached SCSI disk
[ 13.511335] EXT4-fs (sda): mounting ext2 file system using the ext4 subsystem
[ 13.536810] EXT4-fs (sda): mounted filesystem without journal. Opts: (null). Quota mode: disabled.
[ 13.537632] VFS: Mounted root (ext2 filesystem) readonly on device 8:0.
---
Sample qemu command:
qemu-system-riscv64 -M virt -m 512M \
-no-reboot -bios default -kernel arch/riscv/boot/Image \
-snapshot -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
-usb -device pci-ohci,id=ohci -device usb-storage,bus=ohci.0,drive=d0 \
-drive file=/var/cache/buildbot/riscv64/rootfs.ext2,if=none,id=d0,format=raw \
-append "root=/dev/sda rootwait console=ttyS0,115200 earlycon=uart8250,mmio,0x10000000,115200" \
-nographic -monitor none
The problem is seen with various USB boot variants (ohci, ehci, xhci, uas-ehci,
uas-xhci) and all SCSI controllers supported by qemu.
On Sun, Jul 11, 2021 at 09:14:23PM -0700, Guenter Roeck wrote:
> My guess was correct. Bisect points to the above commit. Bisect log as well
> as complete backtrace and example qemu command attached.
>
> Copying Christoph and Jens.
This should fix it:
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 6d2d63629a90..b8d55af763f9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -98,11 +98,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
-#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
#define SD_MINORS 16
-#else
-#define SD_MINORS 0
-#endif
static void sd_config_discard(struct scsi_disk *, unsigned int);
static void sd_config_write_same(struct scsi_disk *);
On 7/11/21 6:49 PM, Linus Torvalds wrote:
> You all know the drill by now. It's been the usual two weeks of merge
> window, and now it's closed, and 5.14-rc1 is out there.
I happened to be installing a Fedora 34 (x86) VM for something and did a
test kernel compile that hung on boot. Setting up a serial console I get
the below backtrace from ttm but I have not had a chance to look at it.
Fedora 34 (Server Edition)
Kernel 5.14.0-rc1 on an x86_64 (ttyS0)
Web console: https://fedora:9090/ or https://192.168.1.91:9090/
fedora login: [ 11.263539] BUG: kernel NULL pointer dereference,
address: 0000000000000010
[ 11.266355] #PF: supervisor read access in kernel mode
[ 11.268409] #PF: error_code(0x0000) - not-present page
[ 11.270456] PGD 0 P4D 0
[ 11.271506] Oops: 0000 [#1] SMP PTI
[ 11.272903] CPU: 1 PID: 41 Comm: kworker/1:1 Not tainted 5.14.0-rc1 #1
[ 11.275488] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
0.0.0 02/06/2015
[ 11.278274] Workqueue: events ttm_device_delayed_workqueue [ttm]
[ 11.279865] RIP: 0010:qxl_bo_delete_mem_notify+0x19/0x40 [qxl]
[ 11.281404] Code: 89 e7 45 31 e4 e8 67 bf f6 dc eb ea 0f 1f 44 00 00
0f 1f 44 00 00 55 48 89 fd e8 a2 02 00 00 84 c0 74 0d 48 8b 85 68 01 00
00 <83> 78 10 03 74 02 5d c3 8b 85 64 02 00 00 85 c0 74 f4 48 8b 7d 08
[ 11.286271] RSP: 0018:ffffb7a24017fdd0 EFLAGS: 00010202
[ 11.287616] RAX: 0000000000000000 RBX: ffff9da7c08e8670 RCX:
ffff9da7c0b30000
[ 11.288978] RDX: ffff9da7c27f7990 RSI: ffff9da7c27f7990 RDI:
ffff9da7c27f7800
[ 11.290332] RBP: ffff9da7c27f7800 R08: ffff9da7c27f7990 R09:
0000000000000000
[ 11.291690] R10: ffff9da7c991ec00 R11: 0000000000000000 R12:
ffff9da7c27f7990
[ 11.293021] R13: ffff9da7c27f7800 R14: ffff9da7c27f7960 R15:
ffff9da7c27f7990
[ 11.294349] FS: 0000000000000000(0000) GS:ffff9da937c80000(0000)
knlGS:0000000000000000
[ 11.295853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.296935] CR2: 0000000000000010 CR3: 000000010c178004 CR4:
0000000000370ee0
[ 11.298111] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 11.299120] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 11.300130] Call Trace:
[ 11.300489] ttm_bo_cleanup_memtype_use+0x22/0x60 [ttm]
[ 11.301256] ttm_bo_release+0x1a1/0x300 [ttm]
[ 11.301879] ttm_bo_delayed_delete+0x1be/0x220 [ttm]
[ 11.302587] ttm_device_delayed_workqueue+0x18/0x40 [ttm]
[ 11.303358] process_one_work+0x1ec/0x390
[ 11.303941] worker_thread+0x53/0x3e0
[ 11.304464] ? process_one_work+0x390/0x390
[ 11.305066] kthread+0x127/0x150
[ 11.305535] ? set_kthread_struct+0x40/0x40
[ 11.306188] ret_from_fork+0x22/0x30
[ 11.306749] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set
nf_tables rfkill nfnetlink ip6table_filter ip6_tables iptable_filter
sunrpc vfat fat snd_hda_codec_generic intel_rapl_msr snd_hda_intel
intel_rapl_common snd_intel_dspcfg snd_hda_codec isst_if_common
snd_hwdep snd_hda_core iTCO_wdt intel_pmc_bxt iTCO_vendor_support
kvm_intel snd_seq snd_seq_device snd_pcm kvm joydev irqbypass i2c_i801
rapl i2c_smbus snd_timer snd virtio_balloon lpc_ich soundcore fuse zram
ip_tables xfs qxl drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul
crc32_pclmul crc32c_intel cec drm ghash_clmulni_intel serio_raw
virtio_blk qemu_fw_cfg virtio_net virtio_console net_failover failover
pkcs8_key_parser
[ 11.318215] CR2: 0000000000000010
[ 11.318670] ---[ end trace 20fb2a3e9bc19a76 ]---
[ 11.319300] RIP: 0010:qxl_bo_delete_mem_notify+0x19/0x40 [qxl]
[ 11.320090] Code: 89 e7 45 31 e4 e8 67 bf f6 dc eb ea 0f 1f 44 00 00
0f 1f 44 00 00 55 48 89 fd e8 a2 02 00 00 84 c0 74 0d 48 8b 85 68 01 00
00 <83> 78 10 03 74 02 5d c3 8b 85 64 02 00 00 85 c0 74 f4 48 8b 7d 08
[ 11.322574] RSP: 0018:ffffb7a24017fdd0 EFLAGS: 00010202
[ 11.323271] RAX: 0000000000000000 RBX: ffff9da7c08e8670 RCX:
ffff9da7c0b30000
[ 11.324226] RDX: ffff9da7c27f7990 RSI: ffff9da7c27f7990 RDI:
ffff9da7c27f7800
[ 11.325186] RBP: ffff9da7c27f7800 R08: ffff9da7c27f7990 R09:
0000000000000000
[ 11.326145] R10: ffff9da7c991ec00 R11: 0000000000000000 R12:
ffff9da7c27f7990
[ 11.327092] R13: ffff9da7c27f7800 R14: ffff9da7c27f7960 R15:
ffff9da7c27f7990
[ 11.328032] FS: 0000000000000000(0000) GS:ffff9da937c80000(0000)
knlGS:0000000000000000
[ 11.329086] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.329848] CR2: 0000000000000010 CR3: 000000010c178004 CR4:
0000000000370ee0
[ 11.330810] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 11.331746] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
--
Computer Architect
On 7/11/21 10:20 PM, Christoph Hellwig wrote:
> On Sun, Jul 11, 2021 at 09:14:23PM -0700, Guenter Roeck wrote:
>> My guess was correct. Bisect points to the above commit. Bisect log as well
>> as complete backtrace and example qemu command attached.
>>
>> Copying Christoph and Jens.
>
> This should fix it:
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 6d2d63629a90..b8d55af763f9 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -98,11 +98,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
> MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
> MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
>
> -#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
> #define SD_MINORS 16
> -#else
> -#define SD_MINORS 0
> -#endif
>
> static void sd_config_discard(struct scsi_disk *, unsigned int);
> static void sd_config_write_same(struct scsi_disk *);
>
Yes, that fixes the problem for me.
Tested-by: Guenter Roeck <[email protected]>
Thanks,
Guenter
On Mon, Jul 12, 2021 at 6:53 AM Guenter Roeck <[email protected]> wrote:
>
> On 7/11/21 10:20 PM, Christoph Hellwig wrote:
> >
> > This should fix it:
> >
> > -#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
> > #define SD_MINORS 16
> > -#else
> > -#define SD_MINORS 0
> > -#endif
> >
> > static void sd_config_discard(struct scsi_disk *, unsigned int);
> > static void sd_config_write_same(struct scsi_disk *);
> >
>
> Yes, that fixes the problem for me.
>
> Tested-by: Guenter Roeck <[email protected]>
Thanks for reporting and testing.
Christoph, can I get that as a proper patch with a commit message?
Linus
On Mon, Jul 12, 2021 at 12:08 AM Jon Masters <[email protected]> wrote:
>
> I happened to be installing a Fedora 34 (x86) VM for something and did a
> test kernel compile that hung on boot. Setting up a serial console I get
> the below backtrace from ttm but I have not had a chance to look at it.
It's a NULL pointer in qxl_bo_delete_mem_notify(), with the code
disassembling to
16: 55 push %rbp
17: 48 89 fd mov %rdi,%rbp
1a: e8 a2 02 00 00 callq 0x2c1
1f: 84 c0 test %al,%al
21: 74 0d je 0x30
23: 48 8b 85 68 01 00 00 mov 0x168(%rbp),%rax
2a:* 83 78 10 03 cmpl $0x3,0x10(%rax) <-- trapping instruction
2e: 74 02 je 0x32
30: 5d pop %rbp
31: c3 retq
and that "cmpl $3" looks exactly like that
if (bo->resource->mem_type == TTM_PL_PRIV
and the bug is almost certainly from commit d3116756a710 ("drm/ttm:
rename bo->mem and make it a pointer"), which did
- if (bo->mem.mem_type == TTM_PL_PRIV ...
+ if (bo->resource->mem_type == TTM_PL_PRIV ...
and claimed "No functional change".
But clearly the "bo->resource" pointer is NULL.
Added guilty parties and dri-devel mailing list.
Christian? Full report at
https://lore.kernel.org/lkml/a9473821-1d53-0037-7590-aeaf8e85e72a@jonmasters.org/
but there's not a whole lot else there that is interesting except for
the call trace:
ttm_bo_cleanup_memtype_use+0x22/0x60 [ttm]
ttm_bo_release+0x1a1/0x300 [ttm]
ttm_bo_delayed_delete+0x1be/0x220 [ttm]
ttm_device_delayed_workqueue+0x18/0x40 [ttm]
process_one_work+0x1ec/0x390
worker_thread+0x53/0x3e0
so it's presumably the cleanup phase and perhaps "bo->resource" has
been deallocated and cleared?
Linus
Hi guys,
On 12.07.21 at 21:14, Linus Torvalds wrote:
> On Mon, Jul 12, 2021 at 12:08 AM Jon Masters <[email protected]> wrote:
>> I happened to be installing a Fedora 34 (x86) VM for something and did a
>> test kernel compile that hung on boot. Setting up a serial console I get
>> the below backtrace from ttm but I have not had a chance to look at it.
> It's a NULL pointer in qxl_bo_delete_mem_notify(), with the code
> disassembling to
>
> 16: 55 push %rbp
> 17: 48 89 fd mov %rdi,%rbp
> 1a: e8 a2 02 00 00 callq 0x2c1
> 1f: 84 c0 test %al,%al
> 21: 74 0d je 0x30
> 23: 48 8b 85 68 01 00 00 mov 0x168(%rbp),%rax
> 2a:* 83 78 10 03 cmpl $0x3,0x10(%rax) <-- trapping instruction
> 2e: 74 02 je 0x32
> 30: 5d pop %rbp
> 31: c3 retq
>
> and that "cmpl $3" looks exactly like that
>
> if (bo->resource->mem_type == TTM_PL_PRIV
>
> and the bug is almost certainly from commit d3116756a710 ("drm/ttm:
> rename bo->mem and make it a pointer"), which did
>
> - if (bo->mem.mem_type == TTM_PL_PRIV ...
> + if (bo->resource->mem_type == TTM_PL_PRIV ...
>
> and claimed "No functional change".
>
> But clearly the "bo->resource" pointer is NULL.
>
> Added guilty parties and dri-devel mailing list.
>
> Christian? Full report at
>
> https://lore.kernel.org/lkml/a9473821-1d53-0037-7590-aeaf8e85e72a@jonmasters.org/
>
> but there's not a whole lot else there that is interesting except for
> the call trace:
>
> ttm_bo_cleanup_memtype_use+0x22/0x60 [ttm]
> ttm_bo_release+0x1a1/0x300 [ttm]
> ttm_bo_delayed_delete+0x1be/0x220 [ttm]
> ttm_device_delayed_workqueue+0x18/0x40 [ttm]
> process_one_work+0x1ec/0x390
> worker_thread+0x53/0x3e0
>
> so it's presumably the cleanup phase and perhaps "bo->resource" has
> been deallocated and cleared?
That's a known issue. Fixed by:
commit 3efe180d5105d367ae1dfadb97892ab93a89a783
Author: Christian König <[email protected]>
Date: Tue Jul 6 08:51:25 2021 +0200
drm/qxl: add NULL check for bo->resource
When allocations fails that can be NULL now.
Previously the structure was embedded into the buffer object and when
allocation failed (or never happened in a temporary buffer) the
structure was just zeroed.
Going to double check tomorrow why that hasn't shown up in your tree yet.
Christian.
>
> Linus
On Mon, Jul 12, 2021 at 12:03:36PM -0700, Linus Torvalds wrote:
> Christoph, can I get that as a proper patch with a commit message?
https://lore.kernel.org/linux-scsi/[email protected]/T/#u
On 7/12/21 12:03 PM, Linus Torvalds wrote:
> On Mon, Jul 12, 2021 at 6:53 AM Guenter Roeck <[email protected]> wrote:
>>
>> On 7/11/21 10:20 PM, Christoph Hellwig wrote:
>>>
>>> This should fix it:
>>>
>>> -#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
>>> #define SD_MINORS 16
>>> -#else
>>> -#define SD_MINORS 0
>>> -#endif
>>>
>>> static void sd_config_discard(struct scsi_disk *, unsigned int);
>>> static void sd_config_write_same(struct scsi_disk *);
>>>
>>
>> Yes, that fixes the problem for me.
>>
>> Tested-by: Guenter Roeck <[email protected]>
>
> Thanks for reporting and testing.
>
> Christoph, can I get that as a proper patch with a commit message?
>
Christoph already sent it:
https://patchwork.kernel.org/project/linux-block/patch/[email protected]/
Guenter
On Mon, Jul 12, 2021 at 12:24 PM Christoph Hellwig <[email protected]> wrote:
>
> On Mon, Jul 12, 2021 at 12:03:36PM -0700, Linus Torvalds wrote:
> > Christoph, can I get that as a proper patch with a commit message?
>
> https://lore.kernel.org/linux-scsi/[email protected]/T/#u
Thanks, applied and pushed out (along with two VM issues that also got
reported since rc1..)
Linus