2024-03-07 21:23:33

by Mikhail Gavrilov

Subject: regression/bisected/6.8 commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b prevents the system going into suspend mode

Hi,
on one of my systems, commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b
prevents the system going into suspend mode.

Every time I tried to switch to suspend mode, I saw these messages in the log:
[ 117.596548] xhci_hcd 0000:12:00.3: PM: pci_pm_suspend(): hcd_pci_suspend+0x0/0x20 returns -16
[ 117.596569] xhci_hcd 0000:12:00.3: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x4e0 returns -16
[ 117.596583] xhci_hcd 0000:12:00.3: PM: failed to suspend async: error -16
[ 118.295894] PM: Some devices failed to suspend, or early wake event detected
[ 118.301032] xhci_hcd 0000:10:00.0: xHC error in resume, USBSTS 0x401, Reinit
[ 118.301129] usb usb1: root hub lost power or was reset
[ 118.301132] usb usb2: root hub lost power or was reset
[ 118.301868] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[ 118.302115] [drm] PSP is resuming...
[ 118.336045] [drm] reserve 0x1300000 from 0x85fc000000 for PSP TMR
[ 118.374741] xone-dongle 3-1.1:1.0: xone_mt76_resume_radio: resumed
[ 118.377527] nvme nvme0: 31/0/0 default/read/poll queues
[ 118.379470] nvme nvme1: 32/0/0 default/read/poll queues
[ 118.493231] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 118.493237] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 118.493241] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[ 118.493245] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x0000003f, smu fw program = 0, smu fw version = 0x004e7900 (78.121.0)
[ 118.493248] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[ 118.609941] ata3: SATA link down (SStatus 0 SControl 300)
[ 118.610052] ata4: SATA link down (SStatus 0 SControl 300)
[ 118.610154] ata2: SATA link down (SStatus 0 SControl 300)
[ 118.610174] ata1: SATA link down (SStatus 0 SControl 300)
[ 118.690018] usb 1-12: reset high-speed USB device number 4 using xhci_hcd
[ 119.067818] usb 1-10: reset high-speed USB device number 3 using xhci_hcd
[ 119.442726] usb 1-6: reset full-speed USB device number 2 using xhci_hcd
[ 122.034768] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
[ 122.034779] amdgpu 0000:03:00.0: amdgpu: Failed to enable requested dpm features!
[ 122.034780] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[ 122.034782] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[ 122.034975] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
[ 122.034984] amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0x200 returns -62
[ 122.034990] amdgpu 0000:03:00.0: PM: failed to resume async: error -62
[ 122.042111] OOM killer enabled.
[ 122.042115] Restarting tasks ... done.
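
For reference, the negative numbers in the log are kernel errno values. Below is a minimal sketch for decoding the two that appear here, assuming the asm-generic errno numbering used on x86. The errno_name helper is hypothetical and knows only these two codes.

```shell
#!/bin/sh
# Hypothetical helper: map the errno values seen in the log to their names.
# Numbers follow include/uapi/asm-generic/errno-base.h (EBUSY) and
# include/uapi/asm-generic/errno.h (ETIME).
errno_name() {
    case "${1#-}" in            # strip a leading minus sign, if any
        16) echo "EBUSY (Device or resource busy)" ;;
        62) echo "ETIME (Timer expired)" ;;
        *)  echo "unknown" ;;
    esac
}

errno_name -16   # the xhci_hcd suspend failure
errno_name -62   # the amdgpu resume failure
```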

So I tried to find which commit borked it.
And I successfully found it:

5d390df3bdd13d178eb2e02e60e9a480f7103f7b is the first bad commit
commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b
Author: Alexey Dobriyan <[email protected]>
Date: Tue Jan 23 13:40:00 2024 +0300

smb: client: delete "true", "false" defines

Kernel has its own official true/false definitions.

The defines aren't even used in this file.

Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Steve French <[email protected]>

fs/smb/client/smbencrypt.c | 7 -------
1 file changed, 7 deletions(-)

I am convinced that suspend started working again after reverting commit
5d390df3bdd13d178eb2e02e60e9a480f7103f7b on top of 6.8-rc7.

I have attached the bisect log and the kernel logs from each step,
as well as the build config.

Alexey, can you look into it?

--
Best Regards,
Mike Gavrilov.


Attachments:
kernel-logs.zip (777.08 kB)
bisect-log-system-wont-going-into-suspend-mode.zip (1.23 kB)
.config.zip (64.06 kB)

2024-03-08 06:17:43

by Alexey Dobriyan

Subject: Re: regression/bisected/6.8 commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b prevents the system going into suspend mode

On Fri, Mar 08, 2024 at 02:22:03AM +0500, Mikhail Gavrilov wrote:
> on one of my systems, commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b
> prevents the system going into suspend mode.

> Every time when I tried switch to suspend mode I saw this messages in the log:
> [ 117.596548] xhci_hcd 0000:12:00.3: PM: pci_pm_suspend():
> hcd_pci_suspend+0x0/0x20 returns -16
> [ 117.596569] xhci_hcd 0000:12:00.3: PM: dpm_run_callback():
> pci_pm_suspend+0x0/0x4e0 returns -16
> [ 117.596583] xhci_hcd 0000:12:00.3: PM: failed to suspend async: error -16
> [ 118.295894] PM: Some devices failed to suspend, or early wake event detected
> [ 118.301032] xhci_hcd 0000:10:00.0: xHC error in resume, USBSTS 0x401, Reinit
> [ 118.301129] usb usb1: root hub lost power or was reset
> [ 118.301132] usb usb2: root hub lost power or was reset
> [ 118.301868] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
> [ 118.302115] [drm] PSP is resuming...
> [ 118.336045] [drm] reserve 0x1300000 from 0x85fc000000 for PSP TMR
> [ 118.374741] xone-dongle 3-1.1:1.0: xone_mt76_resume_radio: resumed
> [ 118.377527] nvme nvme0: 31/0/0 default/read/poll queues
> [ 118.379470] nvme nvme1: 32/0/0 default/read/poll queues
> [ 118.493231] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode
> is not available
> [ 118.493237] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY:
> securedisplay ta ucode is not available
> [ 118.493241] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
> [ 118.493245] amdgpu 0000:03:00.0: amdgpu: smu driver if version =
> 0x0000003d, smu fw if version = 0x0000003f, smu fw program = 0, smu fw
> version = 0x004e7900 (78.121.0)
> [ 118.493248] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
> [ 118.609941] ata3: SATA link down (SStatus 0 SControl 300)
> [ 118.610052] ata4: SATA link down (SStatus 0 SControl 300)
> [ 118.610154] ata2: SATA link down (SStatus 0 SControl 300)
> [ 118.610174] ata1: SATA link down (SStatus 0 SControl 300)
> [ 118.690018] usb 1-12: reset high-speed USB device number 4 using xhci_hcd
> [ 119.067818] usb 1-10: reset high-speed USB device number 3 using xhci_hcd
> [ 119.442726] usb 1-6: reset full-speed USB device number 2 using xhci_hcd
> [ 122.034768] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with
> your previous command: SMN_C2PMSG_66:0x00000006
> SMN_C2PMSG_82:0x00000000
> [ 122.034779] amdgpu 0000:03:00.0: amdgpu: Failed to enable requested
> dpm features!
> [ 122.034780] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
> [ 122.034782] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
> resume of IP block <smu> failed -62
> [ 122.034975] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume
> failed (-62).
> [ 122.034984] amdgpu 0000:03:00.0: PM: dpm_run_callback():
> pci_pm_resume+0x0/0x200 returns -62
> [ 122.034990] amdgpu 0000:03:00.0: PM: failed to resume async: error -62
> [ 122.042111] OOM killer enabled.
> [ 122.042115] Restarting tasks ... done.
>
> So I tried to find which commit borked it.
> And I successfully found it:
>
> 5d390df3bdd13d178eb2e02e60e9a480f7103f7b is the first bad commit
> commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b
> Author: Alexey Dobriyan <[email protected]>
> Date: Tue Jan 23 13:40:00 2024 +0300
>
> smb: client: delete "true", "false" defines
>
> Kernel has its own official true/false definitions.
>
> The defines aren't even used in this file.
>
> Signed-off-by: Alexey Dobriyan <[email protected]>
> Signed-off-by: Steve French <[email protected]>
>
> fs/smb/client/smbencrypt.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> I am convinced that suspend mode started work after reverting commit
> 5d390df3bdd13d178eb2e02e60e9a480f7103f7b on top of 6.8-rc7.
>
> Bisect log and all kernel logs from each step I attached here.
> Also attached build config.
>
> Alexey, can you look into it?

What? Deleting unused defines breaks suspend?

Build fs/smb/client/smbencrypt.o with and without the patch and
check that the two are identical.

The enum in stddef.h is

enum {
	false = 0,
	true = 1,
};

so if the defines were used somehow, they would expand to the same values
of the same type.

Something else is going on.

2024-03-08 12:48:28

by Mikhail Gavrilov

Subject: Re: regression/bisected/6.8 commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b prevents the system going into suspend mode

On Fri, Mar 8, 2024 at 11:15 AM Alexey Dobriyan <[email protected]> wrote:
>
> What? Deleting unused defines breaks suspend?
>
> Collect fs/smb/client/smbencrypt.o with and without patch and
> see them being identical.
>
> Enum in stddef.h are
>
> enum {
> false = 0,
> true = 1,
> };
>
> so if defines were used somehow they would expand to same values of
> same type.
>
> Something else is going on.

I understand your confusion.
But I didn't make this up. Moreover, I saw what the revert does:

diff --git a/fs/smb/client/smbencrypt.c b/fs/smb/client/smbencrypt.c
index 1d1ee9f18f37..f0ce26414f17 100644
--- a/fs/smb/client/smbencrypt.c
+++ b/fs/smb/client/smbencrypt.c
@@ -26,6 +26,13 @@
#include "cifsproto.h"
#include "../common/md4.h"

+#ifndef false
+#define false 0
+#endif
+#ifndef true
+#define true 1
+#endif
+
/* following came from the other byteorder.h to avoid include conflicts */
#define CVAL(buf,pos) (((unsigned char *)(buf))[pos])
#define SSVALX(buf,pos,val) (CVAL(buf,pos)=(val)&0xFF,CVAL(buf,pos+1)=(val)>>8)

Why this actually helped is a question I would like to find an answer to.

The most interesting thing is that I have two identical systems.
Both have the same:
- M/B - MSI MPG B650I EDGE WIFI
- CPU - AMD Ryzen 7950x
- GPU - AMD Radeon 7900XTX
- SSD1 for system - Intel Optane 905P SSDPE21D480GAM3
- SSD2 for data - Intel D5 P5316 Series SSDPF2NV307TZN1
- PSU - Asus ROG LOKI SFX-L 1000W Platinum
- Mouse - Logitech MX Master 3s
- Keyboard - MX Keys Mini
- Linux distro (identical version of all software) - Fedora Rawhide
On one system this bug is present, on the other it is not.

Affected system: https://linux-hardware.org/?probe=9a5a8c0338
Not affected system: https://linux-hardware.org/?probe=37c62300bb

--
Best Regards,
Mike Gavrilov.

2024-03-08 17:04:38

by Alexey Dobriyan

Subject: Re: regression/bisected/6.8 commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b prevents the system going into suspend mode

On Fri, Mar 08, 2024 at 05:48:04PM +0500, Mikhail Gavrilov wrote:
> On Fri, Mar 8, 2024 at 11:15 AM Alexey Dobriyan <[email protected]> wrote:
> >
> > What? Deleting unused defines breaks suspend?
> >
> > Collect fs/smb/client/smbencrypt.o with and without patch and
> > see them being identical.
> >
> > Enum in stddef.h are
> >
> > enum {
> > false = 0,
> > true = 1,
> > };
> >
> > so if defines were used somehow they would expand to same values of
> > same type.
> >
> > Something else is going on.
>
> I understand your confusion.
> But I didn't come up with it. And moreover, I saw what the revert does.

> Why did this really help is a question to which I would like to find an answer.

OK, let's exclude newbie mistakes.

Exclude CIFS:

* start with clean compile into out-of-tree directory

mkdir ../obj-001
cp .config ../obj-001/.config
make -k -j$(nproc) O=../obj-001 # buggy kernel
sudo rm -rf /lib/modules/$(uname -r) # no mixed module copies
sudo make O=../obj-001 modules_install
sudo make O=../obj-001 install

[patch]

mkdir ../obj-002
...

This is what I use in Production(tm):

#!/bin/sh -x
sudo rm -rf /lib/modules/$(uname -r) &&\
sudo make modules_install &&\
sudo make install &&\
sudo emerge @module-rebuild &&\
sudo grub-mkconfig -o /boot/grub/grub.cfg &&\
sync &&\
sudo nvme flush /dev/nvme*n1

* After rebooting, double-check that the build number in /proc/version
matches .version in the ../obj directory:

$ cat /proc/version
Linux version 6.7.4-100.fc38.x86_64 (mockbuild@68dbdffd8a2b4619991006cfcbec2871) (gcc (GCC) 13.2.1 20231011 (Red Hat 13.2.1-4), GNU ld version 2.39-16.fc38) [[[[[ ===> #1 <=== ]]]]] SMP PREEMPT_DYNAMIC Mon Feb 5 22:19:06 UTC 2024

$ cat ../obj/.version
1

This verifies that you've rebooted into the correct kernel.
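
The check above can be scripted. This is only a sketch: build_number and check_booted_build are hypothetical helper names, and it assumes the "#N" field of /proc/version is what should match the objdir's .version.

```shell
#!/bin/sh
# Hypothetical helper: extract N from the "#N" field of a
# /proc/version-style string passed as the first argument.
build_number() {
    printf '%s\n' "$1" | sed -n 's/.* #\([0-9][0-9]*\) .*/\1/p'
}

# Hypothetical helper: succeed only if the running kernel's build number
# equals the .version file in the given build directory.
check_booted_build() {
    objdir=$1
    running=$(build_number "$(cat /proc/version)")
    expected=$(cat "$objdir/.version")
    [ -n "$running" ] && [ "$running" = "$expected" ]
}

# Example (not run here):
#   check_booted_build ../obj-001 && echo "booted the kernel from obj-001"
```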

* keep both full kernel trees in two separate directories

if both vmlinux are identical, you may try to find which modules
are different

* disassemble fs/smb/client/smbencrypt.o (or cifs.ko) for both kernels

objdump -M intel -dr $(find ../obj-001 -type f -name cifs.ko) >000.s
objdump -M intel -dr $(find ../obj-002 -type f -name cifs.ko) >001.s
diff -u0 000.s 001.s

For your experiment, the build number should be 1 (first clean recompile
from scratch) and then 2 (after applying the one patch).

If the bug is not 100% reproducible, then bisecting gets more
entertaining because you can't be really sure each step is in the right
direction.
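
The "find which modules are different" step can be sketched as follows. changed_modules is a hypothetical helper, and the two directories are assumed to be the out-of-tree build directories from above. A byte-for-byte comparison can flag modules whose code is the same but whose embedded metadata differs, so any hit still needs the objdump diff to confirm.

```shell
#!/bin/sh
# Hypothetical helper: print the relative path of every .ko that differs
# between two build directories, or exists only in the first one.
changed_modules() {
    dir_a=$1
    dir_b=$2
    # List modules relative to dir_a; the cd stays inside the subshell.
    (cd "$dir_a" && find . -type f -name '*.ko' | sort) |
    while IFS= read -r rel; do
        # cmp -s is silent and returns non-zero on a difference or a missing file.
        if ! cmp -s "$dir_a/$rel" "$dir_b/$rel"; then
            echo "${rel#./}"
        fi
    done
}

# Example (not run here):
#   changed_modules ../obj-001 ../obj-002
```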

> The most interesting thing is that I have two identical systems:
> Identical:
> - M/B - MSI MPG B650I EDGE WIFI
> - CPU - AMD Ryzen 7950x
> - GPU - AMD Radeon 7900XTX
> - SSD1 for system - Intel Optane 905P SSDPE21D480GAM3
> - SSD2 for data - Intel D5 P5316 Series SSDPF2NV307TZN1
> - PSU - Asus ROG LOKI SFX-L 1000W Platinum
> - Mouse - Logitech MX Master 3s
> - Keyboard - MX Keys Mini
> - Linux distro (identical version of all software) - Fedora Rawhide
> On one system this bug is present, on the other it is not.
>
> Affected system: https://linux-hardware.org/?probe=9a5a8c0338
> Not affected system: https://linux-hardware.org/?probe=37c62300bb

2024-03-11 02:29:13

by Mikhail Gavrilov

Subject: Re: regression/bisected/6.8 commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b prevents the system going into suspend mode

On Fri, Mar 8, 2024 at 10:03 PM Alexey Dobriyan <[email protected]> wrote:
>
> OK, lets exclude newbie mistakes.
>
> Exclude CIFS:
>
> * start with clean compile into out-of-tree directory
>
> mkdir ../obj-001
> cp .config ../obj-001/.config
> make -k -j$(nproc) O=../obj-001 # buggy kernel
> sudo rm -rf /lib/modules/$(uname -r) # no mixed module copies
> sudo make O=../obj-001 modules_install
> sudo make O=../obj-001 install
>
> [patch]
>
> mkdir ../obj-002
> ...
>
> This is what I use in Production(tm):
>
> #!/bin/sh -x
> sudo rm -rf /lib/modules/$(uname -r) &&\
> sudo make modules_install &&\
> sudo make install &&\
> sudo emerge @module-rebuild &&\
> sudo grub-mkconfig -o /boot/grub/grub.cfg &&\
> sync &&\
> sudo nvme flush /dev/nvme*n1
>
> * After rebooting double check that build number in /proc/version
> matches .version in the ../obj directory:
>
> $ cat /proc/version
> Linux version 6.7.4-100.fc38.x86_64 (mockbuild@68dbdffd8a2b4619991006cfcbec2871) (gcc (GCC) 13.2.1 20231011 (Red Hat 13.2.1-4), GNU ld version 2.39-16.fc38) [[[[[ ===> #1 <=== ]]]]] SMP PREEMPT_DYNAMIC Mon Feb 5 22:19:06 UTC 2024
>
> $ cat ../obj/.version
> 1
>
> This verifies that you've rebooted into correct kernel.
>
> * keep both full kernel trees in two separate directories
>
> if both vmlinux are identical, you may try to find which modules
> are different
>
> * disassemble fs/smb/client/smbencrypt.o or (cifs.ko) for both kernels
>
> objdump -M intel -dr $(find ../obj-001 -type f -name cifs.ko) >000.s
> objdump -M intel -dr $(find ../obj-002 -type f -name cifs.ko) >001.s
> diff -u0 000.s 001.s
>
> For your experiment, number should be 1 (first clean recompile from
> scratch) and then 2 (after applying 1 patch).
>
> If the bug is not 100% reproducible, then bisecting gets more
> entertaining because you can't be really sure each step is in the right
> direction.
>

Apologies for the misleading report. Over the weekend I investigated the
problem more deeply, and now I can say which device breaks suspend mode.
It is a DJI Osmo Pocket 3 when it is switched into file transfer mode [1].
The issue can easily be reproduced even with the 5.17 kernel, although with
a different error message:
[ 102.441187] Freezing of tasks failed after 20.000 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 102.441220] task:(udev-worker) state:D stack:24720 pid: 1085 ppid: 997 flags:0x00004006
[ 102.441232] Call Trace:
[ 102.441235] <TASK>
[ 102.441242] __schedule+0xe21/0x49c0
[ 102.441253] ? rcu_read_lock_sched_held+0x10/0x70
[ 102.441259] ? __update_load_avg_cfs_rq+0x667/0xce0
[ 102.441267] ? cpufreq_this_cpu_can_update+0x46/0x150
[ 102.441275] ? io_schedule_timeout+0x190/0x190
[ 102.441280] ? sugov_update_single_freq+0x750/0x750
[ 102.441286] ? update_load_avg+0x1389/0x1a50
[ 102.441294] schedule+0xe0/0x280
[ 102.441300] schedule_timeout+0x1ad/0x260
[ 102.441305] ? usleep_range_state+0x170/0x170
[ 102.441313] ? set_rq_online.part.0+0x160/0x160
[ 102.441319] ? _raw_spin_unlock_irq+0x24/0x50
[ 102.441325] __wait_for_common+0x2ba/0x370
[ 102.441331] ? usleep_range_state+0x170/0x170
[ 102.441337] ? bit_wait_timeout+0x160/0x160
[ 102.441344] ? __kasan_record_aux_stack+0xe/0xa0
[ 102.441350] ? _raw_spin_unlock_irq+0x24/0x50
[ 102.441357] __flush_work+0x487/0x9b0
[ 102.441365] ? queue_delayed_work_on+0xb0/0xb0
[ 102.441371] ? flush_workqueue_prep_pwqs+0x3f0/0x3f0
[ 102.441382] ? do_one_initcall+0xd1/0x430
[ 102.441387] ? do_init_module+0x190/0x6e0
[ 102.441392] ? load_module+0x77a6/0xaca0
[ 102.441397] ? __do_sys_finit_module+0x111/0x1b0
[ 102.441401] ? do_syscall_64+0x5c/0x80
[ 102.441407] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 102.441412] ? trace_hardirqs_off+0xe/0x110
[ 102.441418] ? try_to_grab_pending+0x8a/0x630
[ 102.441425] __cancel_work_timer+0x313/0x470
[ 102.441431] ? mod_delayed_work_on+0x120/0x120
[ 102.441440] ? set_rq_online.part.0+0x160/0x160
[ 102.441448] mt76u_stop_tx+0x10b/0x360 [mt76_usb]
[ 102.441459] ? slab_free_freelist_hook+0xe7/0x1d0
[ 102.441465] ? mt76u_resume_rx+0x260/0x260 [mt76_usb]
[ 102.441474] ? set_rq_online.part.0+0x160/0x160
[ 102.441481] ? usb_poison_urb+0x1f/0x30
[ 102.441487] ? mt76u_stop_rx+0xff/0x190 [mt76_usb]
[ 102.441496] mt76u_queues_deinit+0x2b/0x900 [mt76_usb]
[ 102.441505] ? mt76x2u_register_device+0x44d/0x5b0 [mt76x2u]
[ 102.441517] mt76x2u_probe+0xc4/0x290 [mt76x2u]
[ 102.441528] usb_probe_interface+0x278/0x6f0
[ 102.441536] really_probe+0x510/0xba0
[ 102.441543] __driver_probe_device+0x29e/0x450
[ 102.441550] driver_probe_device+0x4a/0x120
[ 102.441555] __driver_attach+0x1c3/0x420
[ 102.441561] ? __device_attach_driver+0x240/0x240
[ 102.441566] bus_for_each_dev+0x130/0x1c0
[ 102.441573] ? subsys_dev_iter_exit+0x10/0x10
[ 102.441582] bus_add_driver+0x39c/0x570
[ 102.441589] driver_register+0x20d/0x380
[ 102.441594] ? __raw_spin_lock_init+0x3b/0x110
[ 102.441600] usb_register_driver+0x237/0x400
[ 102.441607] ? 0xffffffffc0634000
[ 102.441612] do_one_initcall+0xd1/0x430
[ 102.441618] ? trace_event_raw_event_initcall_level+0x1a0/0x1a0
[ 102.441627] ? kasan_unpoison+0x40/0x60
[ 102.441634] do_init_module+0x190/0x6e0
[ 102.441642] load_module+0x77a6/0xaca0
[ 102.441659] ? module_frob_arch_sections+0x20/0x20
[ 102.441665] ? ima_read_file+0x160/0x160
[ 102.441672] ? bpf_lsm_kernel_read_file+0x10/0x10
[ 102.441680] ? kernel_read_file+0x247/0x850
[ 102.441692] ? __do_sys_finit_module+0x111/0x1b0
[ 102.441697] __do_sys_finit_module+0x111/0x1b0
[ 102.441703] ? __ia32_sys_init_module+0xa0/0xa0
[ 102.441707] ? __lock_acquire+0x53d0/0x53d0
[ 102.441713] ? reacquire_held_locks+0x4e0/0x4e0
[ 102.441723] ? seqcount_lockdep_reader_access.constprop.0+0xa6/0xb0
[ 102.441729] ? ktime_get_coarse_real_ts64+0x3d/0xc0
[ 102.441738] do_syscall_64+0x5c/0x80
[ 102.441743] ? btrfs_drop_pages+0x2e0/0x2e0
[ 102.441749] ? trace_hardirqs_on+0x1c/0x130
[ 102.441754] ? __fget_light+0x51/0x230
[ 102.441761] ? fpregs_assert_state_consistent+0x4b/0xb0
[ 102.441766] ? rcu_read_lock_sched_held+0x10/0x70
[ 102.441771] ? trace_hardirqs_on_prepare+0x72/0x160
[ 102.441776] ? do_syscall_64+0x68/0x80
[ 102.441781] ? do_syscall_64+0x68/0x80
[ 102.441786] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 102.441792] RIP: 0033:0x7efdc5a2511d
[ 102.441796] RSP: 002b:00007fff8ccfeb08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 102.441803] RAX: ffffffffffffffda RBX: 000055860e73d960 RCX: 00007efdc5a2511d
[ 102.441807] RDX: 0000000000000000 RSI: 00007efdc5b4107d RDI: 000000000000005e
[ 102.441811] RBP: 00007fff8ccfebc0 R08: 0000000000000001 R09: 00007fff8ccfeb50
[ 102.441815] R10: 0000000000000050 R11: 0000000000000246 R12: 00007efdc5b4107d
[ 102.441818] R13: 0000000000020000 R14: 000055860e7574e0 R15: 000055860eaffeb0
[ 102.441828] </TASK>
[ 102.441892] OOM killer enabled.

I don't know whether it makes sense to try building even older kernels;
with my toolchain, older kernels fail to build.
I also noticed that after switching the DJI Osmo Pocket 3 into file transfer
mode, the following modules are loaded:
rndis_host cdc_ether usbnet mii uas usb_storage exfat
But manually removing them
# rmmod rndis_host cdc_ether usbnet mii uas usb_storage exfat
does not make the problem go away.
It looks like something bad is happening deep inside the USB stack.
Let's invite the folks from the USB mailing list here.
Maybe they can tell us where to go next.
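
To double-check that the modules are really gone after rmmod, one could look at /proc/modules. A minimal sketch follows; still_loaded is a hypothetical helper, and the file to read is passed as an argument so the function can also be tried against a saved copy.

```shell
#!/bin/sh
# Hypothetical helper: print each named module that is still listed in the
# given /proc/modules-style file (one module per line, name in column one).
still_loaded() {
    modules_file=$1
    shift
    for m in "$@"; do
        if grep -q "^$m " "$modules_file"; then
            echo "$m"
        fi
    done
}

# Example (not run here):
#   still_loaded /proc/modules rndis_host cdc_ether usbnet mii uas usb_storage exfat
```

No output means none of the listed modules is still loaded.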

[1] https://postimg.cc/bsg46pNM

--
Best Regards,
Mike Gavrilov.