2024-04-03 13:46:03

by kernel test robot

[permalink] [raw]
Subject: [linus:master] [serial] 43066e3222: BUG:kernel_NULL_pointer_dereference,address



Hello,

kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:

commit: 43066e32227ecde674e8ae1fcdd4a1ede67680c2 ("serial: port: Don't suspend if the port is still busy")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 026e680b0a08a62b1d948e5a8ca78700bfac0e6e]
[test failed on linux-next/master c0b832517f627ead3388c6f0c74e8ac10ad5774b]

in testcase: boot

compiler: clang-17
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


in our tests, the issue does not always happen, 15 times out of 200 runs as
below.

e5d6bd25f93d6ae1 43066e32227ecde674e8ae1fcdd
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:200 8% 15:200 dmesg.BUG:kernel_NULL_pointer_dereference,address
:200 8% 15:200 dmesg.EIP:serial8250_tx_chars
:200 8% 15:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
:200 8% 15:200 dmesg.Oops:#[##]



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 66.918163][ T61] BUG: kernel NULL pointer dereference, address: 00000cf5
[ 66.919684][ T61] #PF: supervisor read access in kernel mode
[ 66.920890][ T61] #PF: error_code(0x0000) - not-present page
[ 66.922103][ T61] *pdpt = 000000002c4f1001 *pde = 0000000000000000
[ 66.923402][ T61] Oops: 0000 [#1] PREEMPT SMP PTI
[ 66.924417][ T61] CPU: 0 PID: 61 Comm: kworker/0:2 Tainted: G W TN 6.8.0-rc6-00003-g43066e32227e #1
[ 66.926478][ T61] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 66.928396][ T61] Workqueue: pm pm_runtime_work
[ 66.929411][ T61] EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
[ 66.930475][ T61] Code: 10 02 00 00 75 09 89 f1 e8 b7 01 00 00 eb db 8b 9e d0 01 00 00 89 7d f0 90 90 90 90 90 90 90 90 90 90 90 90 8b 8f 0c 02 00 00 <0f> b6 0c 01 8b 7e 30 89 f0 31 d2 e8 8a 49 51 00 f6 86 cc 01 00 00
All code
========
0: 10 02 adc %al,(%rdx)
2: 00 00 add %al,(%rax)
4: 75 09 jne 0xf
6: 89 f1 mov %esi,%ecx
8: e8 b7 01 00 00 call 0x1c4
d: eb db jmp 0xffffffffffffffea
f: 8b 9e d0 01 00 00 mov 0x1d0(%rsi),%ebx
15: 89 7d f0 mov %edi,-0x10(%rbp)
18: 90 nop
19: 90 nop
1a: 90 nop
1b: 90 nop
1c: 90 nop
1d: 90 nop
1e: 90 nop
1f: 90 nop
20: 90 nop
21: 90 nop
22: 90 nop
23: 90 nop
24: 8b 8f 0c 02 00 00 mov 0x20c(%rdi),%ecx
2a:* 0f b6 0c 01 movzbl (%rcx,%rax,1),%ecx <-- trapping instruction
2e: 8b 7e 30 mov 0x30(%rsi),%edi
31: 89 f0 mov %esi,%eax
33: 31 d2 xor %edx,%edx
35: e8 8a 49 51 00 call 0x5149c4
3a: f6 .byte 0xf6
3b: 86 cc xchg %cl,%ah
3d: 01 00 add %eax,(%rax)
...

Code starting with the faulting instruction
===========================================
0: 0f b6 0c 01 movzbl (%rcx,%rax,1),%ecx
4: 8b 7e 30 mov 0x30(%rsi),%edi
7: 89 f0 mov %esi,%eax
9: 31 d2 xor %edx,%edx
b: e8 8a 49 51 00 call 0x51499a
10: f6 .byte 0xf6
11: 86 cc xchg %cl,%ah
13: 01 00 add %eax,(%rax)
...
[ 66.934169][ T61] EAX: 00000cf5 EBX: 00000010 ECX: 00000000 EDX: 00000000
[ 66.935518][ T61] ESI: c33128e0 EDI: c35b824c EBP: c37e7e50 ESP: c37e7e40
[ 66.936922][ T61] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010087
[ 66.938425][ T61] CR0: 80050033 CR2: 00000cf5 CR3: 28938000 CR4: 000406b0
[ 66.939773][ T61] Call Trace:
[ 66.940519][ T61] ? __die_body (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420)
[ 66.941445][ T61] ? __die (arch/x86/kernel/dumpstack.c:434)
[ 66.942304][ T61] ? page_fault_oops (arch/x86/mm/fault.c:703)
[ 66.943233][ T61] ? kernelmode_fixup_or_oops (arch/x86/mm/fault.c:761)
[ 66.944260][ T61] ? __bad_area_nosemaphore (arch/x86/mm/fault.c:808)
[ 66.945368][ T61] ? bad_area_nosemaphore (arch/x86/mm/fault.c:857)
[ 66.946377][ T61] ? do_user_addr_fault (arch/x86/mm/fault.c:?)
[ 66.947406][ T61] ? exc_page_fault (arch/x86/include/asm/irqflags.h:19 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 arch/x86/mm/fault.c:1506 arch/x86/mm/fault.c:1554)
[ 66.948424][ T61] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1511)
[ 66.949709][ T61] ? handle_exception (arch/x86/entry/entry_32.S:1058)
[ 66.950749][ T61] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1511)
[ 66.951963][ T61] ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
[ 66.953001][ T61] ? pvclock_clocksource_read_nowd (arch/x86/mm/fault.c:1511)
[ 66.954188][ T61] ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
[ 66.955245][ T61] __start_tx (drivers/tty/serial/8250/8250_port.c:1551)
[ 66.956129][ T61] serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654)
[ 66.957146][ T61] serial_port_runtime_suspend (include/linux/spinlock.h:406 include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63)
[ 66.958275][ T61] __rpm_callback (drivers/base/power/runtime.c:393)
[ 66.959234][ T61] ? serial_port_remove (drivers/tty/serial/serial_port.c:50)
[ 66.960282][ T61] ? serial_port_remove (drivers/tty/serial/serial_port.c:50)
[ 66.961297][ T61] rpm_suspend (drivers/base/power/runtime.c:447)
[ 66.962165][ T61] ? process_one_work (kernel/workqueue.c:?)
[ 66.963182][ T61] pm_runtime_work (include/linux/spinlock.h:401 drivers/base/power/runtime.c:983)
[ 66.964177][ T61] process_one_work (kernel/workqueue.c:2638)
[ 66.965223][ T61] worker_thread (kernel/workqueue.c:2700 kernel/workqueue.c:2787)
[ 66.966246][ T61] kthread (kernel/kthread.c:390)
[ 66.967118][ T61] ? pr_cont_work (kernel/workqueue.c:2733)
[ 66.968018][ T61] ? kthread_unuse_mm (kernel/kthread.c:341)
[ 66.968932][ T61] ? kthread_unuse_mm (kernel/kthread.c:341)
[ 66.969891][ T61] ret_from_fork (arch/x86/kernel/process.c:153)
[ 66.970813][ T61] ret_from_fork_asm (arch/x86/entry/entry_32.S:741)
[ 66.971755][ T61] entry_INT80_32 (arch/x86/entry/entry_32.S:948)
[ 66.972650][ T61] Modules linked in: input_leds aesni_intel crypto_simd evdev button drm fuse drm_panel_orientation_quirks
[ 66.974751][ T61] CR2: 0000000000000cf5
[ 66.975631][ T61] ---[ end trace 0000000000000000 ]---
[ 66.976733][ T61] EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809)
[ 66.977784][ T61] Code: 10 02 00 00 75 09 89 f1 e8 b7 01 00 00 eb db 8b 9e d0 01 00 00 89 7d f0 90 90 90 90 90 90 90 90 90 90 90 90 8b 8f 0c 02 00 00 <0f> b6 0c 01 8b 7e 30 89 f0 31 d2 e8 8a 49 51 00 f6 86 cc 01 00 00
All code
========
0: 10 02 adc %al,(%rdx)
2: 00 00 add %al,(%rax)
4: 75 09 jne 0xf
6: 89 f1 mov %esi,%ecx
8: e8 b7 01 00 00 call 0x1c4
d: eb db jmp 0xffffffffffffffea
f: 8b 9e d0 01 00 00 mov 0x1d0(%rsi),%ebx
15: 89 7d f0 mov %edi,-0x10(%rbp)
18: 90 nop
19: 90 nop
1a: 90 nop
1b: 90 nop
1c: 90 nop
1d: 90 nop
1e: 90 nop
1f: 90 nop
20: 90 nop
21: 90 nop
22: 90 nop
23: 90 nop
24: 8b 8f 0c 02 00 00 mov 0x20c(%rdi),%ecx
2a:* 0f b6 0c 01 movzbl (%rcx,%rax,1),%ecx <-- trapping instruction
2e: 8b 7e 30 mov 0x30(%rsi),%edi
31: 89 f0 mov %esi,%eax
33: 31 d2 xor %edx,%edx
35: e8 8a 49 51 00 call 0x5149c4
3a: f6 .byte 0xf6
3b: 86 cc xchg %cl,%ah
3d: 01 00 add %eax,(%rax)
...

Code starting with the faulting instruction
===========================================
0: 0f b6 0c 01 movzbl (%rcx,%rax,1),%ecx
4: 8b 7e 30 mov 0x30(%rsi),%edi
7: 89 f0 mov %esi,%eax
9: 31 d2 xor %edx,%edx
b: e8 8a 49 51 00 call 0x51499a
10: f6 .byte 0xf6
11: 86 cc xchg %cl,%ah
13: 01 00 add %eax,(%rax)


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240403/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



2024-04-03 14:01:03

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [linus:master] [serial] 43066e3222: BUG:kernel_NULL_pointer_dereference,address

On Wed, Apr 03, 2024 at 09:43:28PM +0800, kernel test robot wrote:

> kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:
>
> commit: 43066e32227ecde674e8ae1fcdd4a1ede67680c2 ("serial: port: Don't suspend if the port is still busy")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Ja-ja, I am investigating this issue for a while (not that I spend all my time
on it, though) on max3100.

The problem seems that we enable PM runtime on all serial ports
(even if they have no RPM support), but at the same time we enforce
the common serial_port PM callbacks and that change misses a detail
on how to check the port status as the real callback may happen
quite after the port being closed (/dev/ttySx).

Probably we need to check if the port is still open...

P.S. AFAIU the state / xmit is invalid pointer or so.

Culprit line: serial_out(up, UART_TX, xmit->buf[xmit->tail]);

--
With Best Regards,
Andy Shevchenko



2024-04-04 08:01:35

by Tony Lindgren

[permalink] [raw]
Subject: Re: [linus:master] [serial] 43066e3222: BUG:kernel_NULL_pointer_dereference,address

* Andy Shevchenko <[email protected]> [240403 14:00]:
> Probably we need to check if the port is still open...
>
> P.S. AFAIU the state / xmit is invalid pointer or so.
>
> Culprit line: serial_out(up, UART_TX, xmit->buf[xmit->tail]);

Maybe we can set UPF_DEAD a bit earlier as below?

Regards,

Tony

8< ---------------------------
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -3442,8 +3442,6 @@ void serial_core_unregister_port(struct uart_driver *drv, struct uart_port *port

mutex_lock(&port_mutex);

- port->flags |= UPF_DEAD;
-
serial_core_remove_one_port(drv, port);

/* Note that struct uart_port *port is no longer valid at this point */
diff --git a/drivers/tty/serial/serial_port.c b/drivers/tty/serial/serial_port.c
index 22b9eeb23e68a..1571ef721ef10 100644
--- a/drivers/tty/serial/serial_port.c
+++ b/drivers/tty/serial/serial_port.c
@@ -104,6 +104,7 @@ EXPORT_SYMBOL(uart_add_one_port);

void uart_remove_one_port(struct uart_driver *drv, struct uart_port *port)
{
+ port->flags |= UPF_DEAD;
serial_ctrl_unregister_port(drv, port);
}
EXPORT_SYMBOL(uart_remove_one_port);

2024-04-04 14:38:03

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [linus:master] [serial] 43066e3222: BUG:kernel_NULL_pointer_dereference,address

On Thu, Apr 04, 2024 at 09:54:15AM +0300, Tony Lindgren wrote:
> * Andy Shevchenko <[email protected]> [240403 14:00]:
> > Probably we need to check if the port is still open...
> >
> > P.S. AFAIU the state / xmit is invalid pointer or so.
> >
> > Culprit line: serial_out(up, UART_TX, xmit->buf[xmit->tail]);
>
> Maybe we can set UPF_DEAD a bit earlier as below?

We should do this under port_mutex. Otherwise we replace one race by another.

--
With Best Regards,
Andy Shevchenko



2024-04-04 14:43:20

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [linus:master] [serial] 43066e3222: BUG:kernel_NULL_pointer_dereference,address

On Thu, Apr 04, 2024 at 02:59:19PM +0300, Andy Shevchenko wrote:
> On Thu, Apr 04, 2024 at 09:54:15AM +0300, Tony Lindgren wrote:
> > * Andy Shevchenko <[email protected]> [240403 14:00]:
> > > Probably we need to check if the port is still open...
> > >
> > > P.S. AFAIU the state / xmit is invalid pointer or so.
> > >
> > > Culprit line: serial_out(up, UART_TX, xmit->buf[xmit->tail]);
>
> xmit.buf seems to be NULL, that's why this fails.
>
> > Maybe we can set UPF_DEAD a bit earlier as below?
>
> Lemme perform some tests later today.

Okay, so far it doesn't fix the issue AFAICT. The only reliable one right now
to me is clearing circ buffer before NULLifying it.

I'll send a formal patch.

--
With Best Regards,
Andy Shevchenko



2024-04-04 16:19:07

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [linus:master] [serial] 43066e3222: BUG:kernel_NULL_pointer_dereference,address

On Thu, Apr 04, 2024 at 09:54:15AM +0300, Tony Lindgren wrote:
> * Andy Shevchenko <[email protected]> [240403 14:00]:
> > Probably we need to check if the port is still open...
> >
> > P.S. AFAIU the state / xmit is invalid pointer or so.
> >
> > Culprit line: serial_out(up, UART_TX, xmit->buf[xmit->tail]);

xmit.buf seems to be NULL, that's why this fails.

> Maybe we can set UPF_DEAD a bit earlier as below?

Lemme perform some tests later today.

--
With Best Regards,
Andy Shevchenko



2024-04-05 05:46:46

by Tony Lindgren

[permalink] [raw]
Subject: Re: [linus:master] [serial] 43066e3222: BUG:kernel_NULL_pointer_dereference,address

* Andy Shevchenko <[email protected]> [240404 14:34]:
> On Thu, Apr 04, 2024 at 09:54:15AM +0300, Tony Lindgren wrote:
> > * Andy Shevchenko <[email protected]> [240403 14:00]:
> > > Probably we need to check if the port is still open...
> > >
> > > P.S. AFAIU the state / xmit is invalid pointer or so.
> > >
> > > Culprit line: serial_out(up, UART_TX, xmit->buf[xmit->tail]);
> >
> > Maybe we can set UPF_DEAD a bit earlier as below?
>
> We should do this under port_mutex. Otherwise we replace one race by another.

Oh right.. And adding some better check in addition to UPF_DEAD
would be better.

Regards,

Tony