2022-10-27 22:35:35

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC][PATCH v2 19/31] timers: net: Use del_timer_shutdown() before freeing timer

On Thu, 27 Oct 2022 17:07:20 -0400
Steven Rostedt <[email protected]> wrote:

> Well, I think this current use case will break if we prevent the timer from
> being rearmed or run again if it's not found. But as you said, the
> networking folks need to confirm or deny it.
>
> The fact that it does the sock_put() when it removes the timer makes me
> think that it can be called again, and we shouldn't prevent that from
> happening.
>
> The debug code will let us know too, as it only "frees" it for freeing if
> it deactivated the timer and shut it down.

I think we have our answer from Guenter's report:


Linux version 6.1.0-rc2-00138-gced58c742836 (groeck@jupiter) (aarch64-linux-gcc (GCC) 11.3.0, GNU ld (GNU Binutils) 2.39) #1 SMP PREEMPT Thu Oct 27 14:53:17 PDT 2022
[ 17.258727] ------------[ cut here ]------------
[ 17.259079] ODEBUG: free active (active state 0) object type: timer_list hint: tcp_write_timer+0x0/0x190
[ 17.259723] WARNING: CPU: 0 PID: 309 at lib/debugobjects.c:502 debug_print_object+0xb8/0x100
[ 17.259951] Modules linked in:
[ 17.260249] CPU: 0 PID: 309 Comm: telnet Tainted: G N 6.1.0-rc2-00138-gced58c742836 #1
[ 17.260518] Hardware name: linux,dummy-virt (DT)
[ 17.260779] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 17.260967] pc : debug_print_object+0xb8/0x100
[ 17.261096] lr : debug_print_object+0xb8/0x100
[ 17.261223] sp : ffff8000086539e0
[ 17.261324] x29: ffff8000086539e0 x28: 0000000000000004 x27: ffff0d2ac2168000
[ 17.261574] x26: 0000000000000000 x25: ffffa241e2b9de18 x24: ffffa241e4f8fcd8
[ 17.261772] x23: ffffa241e336b370 x22: ffffa241e2b9de18 x21: ffff0d2ac20c5710
[ 17.261967] x20: ffffa241e4ea2568 x19: ffffa241e3ea3c00 x18: 00000000ffffffff
[ 17.262161] x17: 6c6973742068696e x16: 3a2074696d65725f x15: 6563742074797065
[ 17.262375] x14: 65203029206f626a x13: ffffa241e3ec7640 x12: 0000000000000d50
[ 17.262591] x11: 0000000000000470 x10: ffffa241e3f1f640 x9 : ffffa241e3ec7640
[ 17.262821] x8 : 00000000ffffefff x7 : ffffa241e3f1f640 x6 : 0000000000000000
[ 17.263028] x5 : ffff0d2adfebba68 x4 : 0000000000000000 x3 : 0000000000000027
[ 17.263235] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0d2ac658b340
[ 17.263528] Call trace:
[ 17.263646] debug_print_object+0xb8/0x100
[ 17.263795] __debug_check_no_obj_freed+0x1d0/0x25c
[ 17.263927] debug_check_no_obj_freed+0x20/0x90
[ 17.264051] slab_free_freelist_hook.constprop.0+0xac/0x1b0
[ 17.264197] kmem_cache_free+0x1ac/0x500
[ 17.264311] __sk_destruct+0x140/0x2a0
[ 17.264425] sk_destruct+0x54/0x64
[ 17.264531] __sk_free+0x74/0x120
[ 17.264636] sk_free+0x64/0x8c
[ 17.264736] tcp_close+0x94/0xc0
[ 17.264840] inet_release+0x50/0xb0
[ 17.264949] __sock_release+0x44/0xbc
[ 17.265061] sock_close+0x18/0x30
[ 17.265166] __fput+0x84/0x270
[ 17.265266] ____fput+0x10/0x20
[ 17.265366] task_work_run+0x88/0xf0
[ 17.265491] do_exit+0x334/0xafc
[ 17.265596] do_group_exit+0x34/0x90
[ 17.265705] __arm64_sys_exit_group+0x18/0x20
[ 17.265826] invoke_syscall+0x48/0x114
[ 17.265941] el0_svc_common.constprop.0+0x60/0x11c
[ 17.266070] do_el0_svc+0x30/0xd0
[ 17.266175] el0_svc+0x48/0xc0
[ 17.266276] el0t_64_sync_handler+0xbc/0x13c
[ 17.266396] el0t_64_sync+0x18c/0x190
[ 17.266565] irq event stamp: 5192
[ 17.266676] hardirqs last enabled at (5191): [<ffffa241e1926a18>] __up_console_sem+0x78/0x84
[ 17.266903] hardirqs last disabled at (5192): [<ffffa241e2b4d504>] el1_dbg+0x24/0x90
[ 17.267093] softirqs last enabled at (5170): [<ffffa241e181050c>] __do_softirq+0x46c/0x5d8
[ 17.267305] softirqs last disabled at (5163): [<ffffa241e1816750>] ____do_softirq+0x10/0x20
[ 17.267506] ---[ end trace 0000000000000000 ]---
[ 17.275715] ------------[ cut here ]------------

I'll go modify that code to make it shutdown even if it returns zero.

I thinks this means we'll need to change the name to:

del_timer_shutdown()
del_timer_shutdown_sync()

But I want to confirm this first.

-- Steve




2022-10-28 22:43:41

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC][PATCH v2 19/31] timers: net: Use del_timer_shutdown() before freeing timer


Could someone from networking confirm (or deny) that the timer being
removed in sk_stop_timer() will no longer be used even if del_timer()
returns false?

net/core/sock.c:

void sk_stop_timer(struct sock *sk, struct timer_list* timer)
{
if (del_timer(timer))
__sock_put(sk);
}

If this is the case, then I'll add the following interface:

del_timer_sync_shutdown() // the common case which syncs

del_timer_shutdown() // the uncommon case, that returns immediately
// used for those cases that add extra code to
// handle it, like sk_stop_timer()


Which has the same semantics as del_timer_sync() and del_timer()
respectively, but will prevent the timer from being rearmed again.

This way we can convert the sk_stop_timer() to:

void sk_stop_timer(struct sock *sk, struct timer_list* timer)
{
if (del_timer_shutdown(timer))
__sock_put(sk);
}


We can also add the del_timer_shutdown() to other locations that need to
put a timer into a shutdown state before freeing, and where it's in a
context that can not call del_timer_sync_shutdown().

-- Steve

2022-10-28 22:47:47

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [RFC][PATCH v2 19/31] timers: net: Use del_timer_shutdown() before freeing timer

On Fri, 28 Oct 2022 18:31:49 -0400 Steven Rostedt wrote:
> Could someone from networking confirm (or deny) that the timer being
> removed in sk_stop_timer() will no longer be used even if del_timer()
> returns false?
>
> net/core/sock.c:
>
> void sk_stop_timer(struct sock *sk, struct timer_list* timer)
> {
> if (del_timer(timer))
> __sock_put(sk);
> }
>
> If this is the case, then I'll add the following interface:
>
> del_timer_sync_shutdown() // the common case which syncs
>
> del_timer_shutdown() // the uncommon case, that returns immediately
> // used for those cases that add extra code to
> // handle it, like sk_stop_timer()

Sorry too many bugs at once :)

FWIW Paolo was saying privately earlier today that he spotted some cases
of reuse, he gave an example of ccid2_hc_tx_packet_recv()

So we can't convert all cases of sk_stop_timer() in one fell swoop :(

> Which has the same semantics as del_timer_sync() and del_timer()
> respectively, but will prevent the timer from being rearmed again.
>
> This way we can convert the sk_stop_timer() to:
>
> void sk_stop_timer(struct sock *sk, struct timer_list* timer)
> {
> if (del_timer_shutdown(timer))
> __sock_put(sk);
> }
>
>
> We can also add the del_timer_shutdown() to other locations that need to
> put a timer into a shutdown state before freeing, and where it's in a
> context that can not call del_timer_sync_shutdown().

2022-10-30 17:34:53

by Paolo Abeni

[permalink] [raw]
Subject: Re: [RFC][PATCH v2 19/31] timers: net: Use del_timer_shutdown() before freeing timer

On Fri, 2022-10-28 at 15:46 -0700, Jakub Kicinski wrote:
> On Fri, 28 Oct 2022 18:31:49 -0400 Steven Rostedt wrote:
> > Could someone from networking confirm (or deny) that the timer being
> > removed in sk_stop_timer() will no longer be used even if del_timer()
> > returns false?
> >
> > net/core/sock.c:
> >
> > void sk_stop_timer(struct sock *sk, struct timer_list* timer)
> > {
> > if (del_timer(timer))
> > __sock_put(sk);
> > }
> >
> > If this is the case, then I'll add the following interface:
> >
> > del_timer_sync_shutdown() // the common case which syncs
> >
> > del_timer_shutdown() // the uncommon case, that returns immediately
> > // used for those cases that add extra code to
> > // handle it, like sk_stop_timer()
>
> Sorry too many bugs at once :)
>
> FWIW Paolo was saying privately earlier today that he spotted some cases
> of reuse, he gave an example of ccid2_hc_tx_packet_recv()

For the records, there are other cases, e.g. after sk_stop_timer() in
clear_3rdack_retransmission() (mptcp code) the timer can be-rearmed
without re-initializing. I *think* there are more of such use in the
in ax25/rose code.

> So we can't convert all cases of sk_stop_timer() in one fell swoop :(

On the positive side, I think converting the sk_stop_timer in
inet_csk_clear_xmit_timers() should be safe and should cover the issue
reported by Guenter

Cheers,

Paolo