LinuxLists.cc - INFO: rcu detected stall in tc_modify

2020-07-29 05:54:11

Subject: INFO: rcu detected stall in tc_modify_qdisc

Hello,

syzbot found the following issue on:

HEAD commit: 181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=12925e38900000
kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16587f8c900000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15b2d790900000

The issue was bisected to:

commit 5a781ccbd19e4664babcbe4b4ead7aa2b9283d22
Author: Vinicius Costa Gomes <[email protected]>
Date: Sat Sep 29 00:59:43 2018 +0000

tc: Add support for configuring the taprio scheduler

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=160e1bac900000
console output: https://syzkaller.appspot.com/x/log.txt?x=110e1bac900000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-...!: (1 GPs behind) idle=6f6/1/0x4000000000000000 softirq=10195/10196 fqs=1
(t=27930 jiffies g=9233 q=413)
rcu: rcu_preempt kthread starved for 27901 jiffies! g9233 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
rcu_preempt R running task 29112 10 2 0x00004000
Call Trace:
context_switch kernel/sched/core.c:3458 [inline]
__schedule+0x8ea/0x2210 kernel/sched/core.c:4219
schedule+0xd0/0x2a0 kernel/sched/core.c:4294
schedule_timeout+0x148/0x250 kernel/time/timer.c:1908
rcu_gp_fqs_loop kernel/rcu/tree.c:1874 [inline]
rcu_gp_kthread+0xae5/0x1b50 kernel/rcu/tree.c:2044
kthread+0x3b5/0x4a0 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
NMI backtrace for cpu 1
CPU: 1 PID: 6799 Comm: syz-executor494 Not tainted 5.8.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x1b3/0x223 lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x194/0x1cf kernel/rcu/tree_stall.h:320
print_cpu_stall kernel/rcu/tree_stall.h:553 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
rcu_pending kernel/rcu/tree.c:3489 [inline]
rcu_sched_clock_irq.cold+0x5b3/0xccc kernel/rcu/tree.c:2504
update_process_times+0x25/0x60 kernel/time/timer.c:1737
tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176
tick_sched_timer+0x108/0x290 kernel/time/tick-sched.c:1320
__run_hrtimer kernel/time/hrtimer.c:1520 [inline]
__hrtimer_run_queues+0x1d5/0xfc0 kernel/time/hrtimer.c:1584
hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline]
__sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097
asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191
Code: 48 c7 c0 88 e0 b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 37 48 83 3d e3 52 cc 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 00 00 e8 35 e5 66 f9 65 8b 05 fe 70 19 78
RSP: 0018:ffffc900016672c0 EFLAGS: 00000282
RAX: 1ffffffff1369c11 RBX: 0000000000000282 RCX: 0000000000000002
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000282
RBP: ffff888093a052e8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000282
R13: 00000078100c35c3 R14: ffff888093a05000 R15: 0000000000000000
spin_unlock_irqrestore include/linux/spinlock.h:408 [inline]
taprio_change+0x1fdc/0x2960 net/sched/sch_taprio.c:1557
taprio_init+0x52e/0x670 net/sched/sch_taprio.c:1670
qdisc_create+0x4b6/0x12e0 net/sched/sch_api.c:1246
tc_modify_qdisc+0x4c8/0x1990 net/sched/sch_api.c:1662
rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5461
netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6e8/0x810 net/socket.c:2352
___sys_sendmsg+0xf3/0x170 net/socket.c:2406
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x443819
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 0f fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff687c83d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000443819
RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000004
RBP: 00007fff687c83e0 R08: 0000000001bbbbbb R09: 0000000001bbbbbb
R10: 0000000001bbbbbb R11: 0000000000000246 R12: 00007fff687c83f0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

2020-07-29 07:30:44

by Zhang, Qiang

[permalink] [raw]

Subject: 回复: INFO: rcu detected stall in tc_modify_q disc

________________________________________
??????: [email protected] <[email protected]> ???? syzbot <[email protected]>
????ʱ??: 2020??7??29?? 13:53
?ռ???: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
????: INFO: rcu detected stall in tc_modify_qdisc

Hello,

syzbot found the following issue on:

HEAD commit: 181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=12925e38900000
kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16587f8c900000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15b2d790900000

The issue was bisected to:

commit 5a781ccbd19e4664babcbe4b4ead7aa2b9283d22
Author: Vinicius Costa Gomes <[email protected]>
Date: Sat Sep 29 00:59:43 2018 +0000

tc: Add support for configuring the taprio scheduler

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=160e1bac900000
console output: https://syzkaller.appspot.com/x/log.txt?x=110e1bac900000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-...!: (1 GPs behind) idle=6f6/1/0x4000000000000000 softirq=10195/10196 fqs=1
(t=27930 jiffies g=9233 q=413)
rcu: rcu_preempt kthread starved for 27901 jiffies! g9233 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
rcu_preempt R running task 29112 10 2 0x00004000
Call Trace:
context_switch kernel/sched/core.c:3458 [inline]
__schedule+0x8ea/0x2210 kernel/sched/core.c:4219
schedule+0xd0/0x2a0 kernel/sched/core.c:4294
schedule_timeout+0x148/0x250 kernel/time/timer.c:1908
rcu_gp_fqs_loop kernel/rcu/tree.c:1874 [inline]
rcu_gp_kthread+0xae5/0x1b50 kernel/rcu/tree.c:2044
kthread+0x3b5/0x4a0 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
NMI backtrace for cpu 1
CPU: 1 PID: 6799 Comm: syz-executor494 Not tainted 5.8.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x1b3/0x223 lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x194/0x1cf kernel/rcu/tree_stall.h:320
print_cpu_stall kernel/rcu/tree_stall.h:553 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
rcu_pending kernel/rcu/tree.c:3489 [inline]
rcu_sched_clock_irq.cold+0x5b3/0xccc kernel/rcu/tree.c:2504
update_process_times+0x25/0x60 kernel/time/timer.c:1737
tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176
tick_sched_timer+0x108/0x290 kernel/time/tick-sched.c:1320
__run_hrtimer kernel/time/hrtimer.c:1520 [inline]
__hrtimer_run_queues+0x1d5/0xfc0 kernel/time/hrtimer.c:1584
hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline]
__sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097
asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191
Code: 48 c7 c0 88 e0 b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 37 48 83 3d e3 52 cc 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 00 00 e8 35 e5 66 f9 65 8b 05 fe 70 19 78
RSP: 0018:ffffc900016672c0 EFLAGS: 00000282
RAX: 1ffffffff1369c11 RBX: 0000000000000282 RCX: 0000000000000002
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000282
RBP: ffff888093a052e8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000282
R13: 00000078100c35c3 R14: ffff888093a05000 R15: 0000000000000000
spin_unlock_irqrestore include/linux/spinlock.h:408 [inline]
taprio_change+0x1fdc/0x2960 net/sched/sch_taprio.c:1557

It looks like that some loops in "taprio_init" func occupy CPU 1 for a long time, resulting in the fact that the RCU has not detected CPU1 quiescent state.

taprio_init+0x52e/0x670 net/sched/sch_taprio.c:1670
qdisc_create+0x4b6/0x12e0 net/sched/sch_api.c:1246
tc_modify_qdisc+0x4c8/0x1990 net/sched/sch_api.c:1662
rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5461
netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6e8/0x810 net/socket.c:2352
___sys_sendmsg+0xf3/0x170 net/socket.c:2406
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x443819
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 0f fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff687c83d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000443819
RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000004
RBP: 00007fff687c83e0 R08: 0000000001bbbbbb R09: 0000000001bbbbbb
R10: 0000000001bbbbbb R11: 0000000000000246 R12: 00007fff687c83f0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

2020-07-29 19:14:02

by Vinicius Costa Gomes

[permalink] [raw]

Subject: Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Hi,

"Zhang, Qiang" <[email protected]> writes:

> ________________________________________
> 发件人: [email protected] <[email protected]> 代表 syzbot <[email protected]>
> 发送时间: 2020年7月29日 13:53
> 收件人: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> 主题: INFO: rcu detected stall in tc_modify_qdisc
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
> git tree: net
> console output: https://syzkaller.appspot.com/x/log.txt?x=12925e38900000
> kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
> dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
> compiler: gcc (GCC) 10.1.0-syz 20200507
> syz repro:
> https://syzkaller.appspot.com/x/repro.syz?x=16587f8c900000

It seems that syzkaller is generating an schedule with too small
intervals (3ns in this case) which causes a hrtimer busy-loop which
starves other kernel threads.

We could put some limits on the interval when running in software mode,
but I don't like this too much, because we are talking about users with
CAP_NET_ADMIN and they have easier ways to do bad things to the system.

Cheers,
--
Vinicius

2020-07-30 05:59:08

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: 回复: INFO: rcu detected stall in tc_modify_qd isc

On Wed, Jul 29, 2020 at 9:13 PM Vinicius Costa Gomes
<[email protected]> wrote:
>
> Hi,
>
> "Zhang, Qiang" <[email protected]> writes:
>
> > ________________________________________
> > 发件人: [email protected] <[email protected]> 代表 syzbot <[email protected]>
> > 发送时间: 2020年7月29日 13:53
> > 收件人: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> > 主题: INFO: rcu detected stall in tc_modify_qdisc
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
> > git tree: net
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12925e38900000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
> > compiler: gcc (GCC) 10.1.0-syz 20200507
> > syz repro:
> > https://syzkaller.appspot.com/x/repro.syz?x=16587f8c900000
>
> It seems that syzkaller is generating an schedule with too small
> intervals (3ns in this case) which causes a hrtimer busy-loop which
> starves other kernel threads.
>
> We could put some limits on the interval when running in software mode,
> but I don't like this too much, because we are talking about users with
> CAP_NET_ADMIN and they have easier ways to do bad things to the system.

Hi Vinicius,

Could you explain why you don't like the argument if it's for CAP_NET_ADMIN?
Good code should check arguments regardless I think and it's useful to
protect root from, say, programming bugs rather than kill the machine
on any bug and misconfiguration. What am I missing?

Also are we talking about CAP_NET_ADMIN in a user ns as well
(effectively nobody)?

2020-07-30 17:45:42

by Vinicius Costa Gomes

[permalink] [raw]

Subject: Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Hi,

Dmitry Vyukov <[email protected]> writes:

> On Wed, Jul 29, 2020 at 9:13 PM Vinicius Costa Gomes
> <[email protected]> wrote:
>>
>> Hi,
>>
>> "Zhang, Qiang" <[email protected]> writes:
>>
>> > ________________________________________
>> > 发件人: [email protected] <[email protected]> 代表 syzbot <[email protected]>
>> > 发送时间: 2020年7月29日 13:53
>> > 收件人: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
>> > 主题: INFO: rcu detected stall in tc_modify_qdisc
>> >
>> > Hello,
>> >
>> > syzbot found the following issue on:
>> >
>> > HEAD commit: 181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
>> > git tree: net
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=12925e38900000
>> > kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
>> > dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
>> > compiler: gcc (GCC) 10.1.0-syz 20200507
>> > syz repro:
>> > https://syzkaller.appspot.com/x/repro.syz?x=16587f8c900000
>>
>> It seems that syzkaller is generating an schedule with too small
>> intervals (3ns in this case) which causes a hrtimer busy-loop which
>> starves other kernel threads.
>>
>> We could put some limits on the interval when running in software mode,
>> but I don't like this too much, because we are talking about users with
>> CAP_NET_ADMIN and they have easier ways to do bad things to the system.
>
> Hi Vinicius,
>
> Could you explain why you don't like the argument if it's for CAP_NET_ADMIN?
> Good code should check arguments regardless I think and it's useful to
> protect root from, say, programming bugs rather than kill the machine
> on any bug and misconfiguration. What am I missing?

I admit that I am on the fence on that argument: do not let even root
crash the system (the point that my code is crashing the system gives
weight to this side) vs. root has great powers, they need to know what
they are doing.

The argument that I used to convince myself was: root can easily create
a bunch of processes and give them the highest priority and do
effectively the same thing as this issue, so I went with a the "they
need to know what they are doing side".

A bit more on the specifics here:

- Using a small interval size, is only a limitation of the taprio
software mode, when using hardware offloads (which I think most users
do), any interval size (supported by the hardware) can be used;

- Choosing a good lower limit for this seems kind of hard: something
below 1us would never work well, I think, but things 1us < x < 100us
will depend on the hardware/kernel config/system load, and this is the
range includes "useful" values for many systems.

Perhaps a middle ground would be to impose a limit based on the link
speed, the interval can never be smaller than the time it takes to send
the minimum ethernet frame (for 1G links this would be ~480ns, should be
enough to catch most programming mistakes). I am going to add this and
see how it looks like.

Sorry for the brain dump :-)

>
> Also are we talking about CAP_NET_ADMIN in a user ns as well
> (effectively nobody)?

Just checked, we are talking about CAP_NET_ADMIN in user namespace as
well.

Cheers,
--
Vinicius

2020-07-30 18:39:43

by Eric Dumazet

[permalink] [raw]

Subject: Re: 回复: INFO: rcu detected stall in tc _modify_qdisc

On 7/30/20 10:44 AM, Vinicius Costa Gomes wrote:
> Hi,
>
> Dmitry Vyukov <[email protected]> writes:
>
>> On Wed, Jul 29, 2020 at 9:13 PM Vinicius Costa Gomes
>> <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> "Zhang, Qiang" <[email protected]> writes:
>>>
>>>> ________________________________________
>>>> 发件人: [email protected] <[email protected]> 代表 syzbot <[email protected]>
>>>> 发送时间: 2020年7月29日 13:53
>>>> 收件人: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
>>>> 主题: INFO: rcu detected stall in tc_modify_qdisc
>>>>
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit: 181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
>>>> git tree: net
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12925e38900000
>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
>>>> compiler: gcc (GCC) 10.1.0-syz 20200507
>>>> syz repro:
>>>> https://syzkaller.appspot.com/x/repro.syz?x=16587f8c900000
>>>
>>> It seems that syzkaller is generating an schedule with too small
>>> intervals (3ns in this case) which causes a hrtimer busy-loop which
>>> starves other kernel threads.
>>>
>>> We could put some limits on the interval when running in software mode,
>>> but I don't like this too much, because we are talking about users with
>>> CAP_NET_ADMIN and they have easier ways to do bad things to the system.
>>
>> Hi Vinicius,
>>
>> Could you explain why you don't like the argument if it's for CAP_NET_ADMIN?
>> Good code should check arguments regardless I think and it's useful to
>> protect root from, say, programming bugs rather than kill the machine
>> on any bug and misconfiguration. What am I missing?
>
> I admit that I am on the fence on that argument: do not let even root
> crash the system (the point that my code is crashing the system gives
> weight to this side) vs. root has great powers, they need to know what
> they are doing.
>
> The argument that I used to convince myself was: root can easily create
> a bunch of processes and give them the highest priority and do
> effectively the same thing as this issue, so I went with a the "they
> need to know what they are doing side".
>
> A bit more on the specifics here:
>
> - Using a small interval size, is only a limitation of the taprio
> software mode, when using hardware offloads (which I think most users
> do), any interval size (supported by the hardware) can be used;
>
> - Choosing a good lower limit for this seems kind of hard: something
> below 1us would never work well, I think, but things 1us < x < 100us
> will depend on the hardware/kernel config/system load, and this is the
> range includes "useful" values for many systems.
>
> Perhaps a middle ground would be to impose a limit based on the link
> speed, the interval can never be smaller than the time it takes to send
> the minimum ethernet frame (for 1G links this would be ~480ns, should be
> enough to catch most programming mistakes). I am going to add this and
> see how it looks like.
>
> Sorry for the brain dump :-)

I do not know taprio details, but do you really need a periodic timer ?

Presumably there is no need to fire a timer before next packet departure time ?

2020-07-30 19:23:33

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: 回复: INFO: rcu detected stall in tc_modify_qd isc

On Thu, Jul 30, 2020 at 7:44 PM Vinicius Costa Gomes
<[email protected]> wrote:
>
> Hi,
>
> Dmitry Vyukov <[email protected]> writes:
>
> > On Wed, Jul 29, 2020 at 9:13 PM Vinicius Costa Gomes
> > <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> "Zhang, Qiang" <[email protected]> writes:
> >>
> >> > ________________________________________
> >> > 发件人: [email protected] <[email protected]> 代表 syzbot <[email protected]>
> >> > 发送时间: 2020年7月29日 13:53
> >> > 收件人: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> >> > 主题: INFO: rcu detected stall in tc_modify_qdisc
> >> >
> >> > Hello,
> >> >
> >> > syzbot found the following issue on:
> >> >
> >> > HEAD commit: 181964e6 fix a braino in cmsghdr_from_user_compat_to_kern()
> >> > git tree: net
> >> > console output: https://syzkaller.appspot.com/x/log.txt?x=12925e38900000
> >> > kernel config: https://syzkaller.appspot.com/x/.config?x=f87a5e4232fdb267
> >> > dashboard link: https://syzkaller.appspot.com/bug?extid=9f78d5c664a8c33f4cce
> >> > compiler: gcc (GCC) 10.1.0-syz 20200507
> >> > syz repro:
> >> > https://syzkaller.appspot.com/x/repro.syz?x=16587f8c900000
> >>
> >> It seems that syzkaller is generating an schedule with too small
> >> intervals (3ns in this case) which causes a hrtimer busy-loop which
> >> starves other kernel threads.
> >>
> >> We could put some limits on the interval when running in software mode,
> >> but I don't like this too much, because we are talking about users with
> >> CAP_NET_ADMIN and they have easier ways to do bad things to the system.
> >
> > Hi Vinicius,
> >
> > Could you explain why you don't like the argument if it's for CAP_NET_ADMIN?
> > Good code should check arguments regardless I think and it's useful to
> > protect root from, say, programming bugs rather than kill the machine
> > on any bug and misconfiguration. What am I missing?
>
> I admit that I am on the fence on that argument: do not let even root
> crash the system (the point that my code is crashing the system gives
> weight to this side) vs. root has great powers, they need to know what
> they are doing.
>
> The argument that I used to convince myself was: root can easily create
> a bunch of processes and give them the highest priority and do
> effectively the same thing as this issue, so I went with a the "they
> need to know what they are doing side".
>
> A bit more on the specifics here:
>
> - Using a small interval size, is only a limitation of the taprio
> software mode, when using hardware offloads (which I think most users
> do), any interval size (supported by the hardware) can be used;
>
> - Choosing a good lower limit for this seems kind of hard: something
> below 1us would never work well, I think, but things 1us < x < 100us
> will depend on the hardware/kernel config/system load, and this is the
> range includes "useful" values for many systems.
>
> Perhaps a middle ground would be to impose a limit based on the link
> speed, the interval can never be smaller than the time it takes to send
> the minimum ethernet frame (for 1G links this would be ~480ns, should be
> enough to catch most programming mistakes). I am going to add this and
> see how it looks like.
>
> Sorry for the brain dump :-)
>
> >
> > Also are we talking about CAP_NET_ADMIN in a user ns as well
> > (effectively nobody)?
>
> Just checked, we are talking about CAP_NET_ADMIN in user namespace as
> well.

OK, so this is not root/admin, this is just any user.

2020-07-30 21:02:53

by Vinicius Costa Gomes

[permalink] [raw]

Subject: Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Hi Eric,

Eric Dumazet <[email protected]> writes:

>> I admit that I am on the fence on that argument: do not let even root
>> crash the system (the point that my code is crashing the system gives
>> weight to this side) vs. root has great powers, they need to know what
>> they are doing.
>>
>> The argument that I used to convince myself was: root can easily create
>> a bunch of processes and give them the highest priority and do
>> effectively the same thing as this issue, so I went with a the "they
>> need to know what they are doing side".
>>
>> A bit more on the specifics here:
>>
>> - Using a small interval size, is only a limitation of the taprio
>> software mode, when using hardware offloads (which I think most users
>> do), any interval size (supported by the hardware) can be used;
>>
>> - Choosing a good lower limit for this seems kind of hard: something
>> below 1us would never work well, I think, but things 1us < x < 100us
>> will depend on the hardware/kernel config/system load, and this is the
>> range includes "useful" values for many systems.
>>
>> Perhaps a middle ground would be to impose a limit based on the link
>> speed, the interval can never be smaller than the time it takes to send
>> the minimum ethernet frame (for 1G links this would be ~480ns, should be
>> enough to catch most programming mistakes). I am going to add this and
>> see how it looks like.
>>
>> Sorry for the brain dump :-)
>
>
> I do not know taprio details, but do you really need a periodic timer
> ?

As we can control the transmission time of packets, you are right, I
don't.

Just a bit more detail about the current implementation taprio,
basically it has a sequence of { Traffic Classes that are open; Interval
} that repeats cyclicly, it uses an hrtimer to advance the pointer for
the current element, so during dequeue I can check if a traffic class is
"open" or "closed".

But again, if I calculate the 'skb->tstamp' of each packet during
enqueue, I don't need the hrtimer. What we have in the txtime-assisted
mode is half way there.

I think this is what you had in mind.

Cheers,
--
Vinicius

2020-07-30 22:04:49

by Vinicius Costa Gomes

[permalink] [raw]

Subject: Re: 回复: INFO: rcu detected stall in tc_modify_qdisc

Dmitry Vyukov <[email protected]> writes:

>> >
>> > Also are we talking about CAP_NET_ADMIN in a user ns as well
>> > (effectively nobody)?
>>
>> Just checked, we are talking about CAP_NET_ADMIN in user namespace as
>> well.
>
> OK, so this is not root/admin, this is just any user.

Yeah, will fix this.

Thanks,
--
Vinicius