2020-07-20 13:10:23

by jun qian

[permalink] [raw]
Subject: [RFC PATCH v2] Softirq:avoid large sched delay from the pending softirqs

From: jun qian <[email protected]>

When get the pending softirqs, it need to process all the pending
softirqs in the while loop. If the processing time of each pending
softirq is need more than 2 msec in this loop, or one of the softirq
will running a long time, according to the original code logic, it
will process all the pending softirqs without wakeuping ksoftirqd,
which will cause a relatively large scheduling delay on the
corresponding CPU, which we do not wish to see. The patch will check
the total time to process pending softirq, if the time exceeds 2 ms
we need to wakeup the ksofirqd to aviod large sched delay.

Signed-off-by: jun qian <[email protected]>
---
kernel/softirq.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index c4201b7f..f8e5be9 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -210,7 +210,7 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
* we want to handle softirqs as soon as possible, but they
* should not be able to lock up the box.
*/
-#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_TIME 2000 /* In microseconds */
#define MAX_SOFTIRQ_RESTART 10

#ifdef CONFIG_TRACE_IRQFLAGS
@@ -248,7 +248,8 @@ static inline void lockdep_softirq_end(bool in_hardirq) { }

asmlinkage __visible void __softirq_entry __do_softirq(void)
{
- unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
+ ktime_t end, start;
+ s64 delta;
unsigned long old_flags = current->flags;
int max_restart = MAX_SOFTIRQ_RESTART;
struct softirq_action *h;
@@ -256,6 +257,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
__u32 pending;
int softirq_bit;

+ start = ktime_get();
+
/*
* Mask out PF_MEMALLOC as the current task context is borrowed for the
* softirq. A softirq handled, such as network RX, might set PF_MEMALLOC
@@ -299,6 +302,15 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
}
h++;
pending >>= softirq_bit;
+
+ end = ktime_get();
+ delta = ktime_to_us(end - start);
+ /*
+ * the softirq's action has been running for too much time
+ * so it may need to wakeup the ksoftirqd
+ */
+ if (delta > MAX_SOFTIRQ_TIME && need_resched())
+ break;
}

if (__this_cpu_read(ksoftirqd) == current)
@@ -307,7 +319,9 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)

pending = local_softirq_pending();
if (pending) {
- if (time_before(jiffies, end) && !need_resched() &&
+ end = ktime_get();
+ delta = ktime_to_us(end - start);
+ if (delta < MAX_SOFTIRQ_TIME && !need_resched() &&
--max_restart)
goto restart;

--
1.8.3.1


2020-07-22 05:00:07

by jun qian

[permalink] [raw]
Subject: Re: [RFC PATCH v2] Softirq:avoid large sched delay from the pending softirqs

Hi Peter & Uladzislau

Are there any issues that have not been considered in this patch? Can
you give me some suggestions on this issue. If the situation I
described is indeed a problem,how about this modification. Thanks a
lot.

On Mon, Jul 20, 2020 at 9:09 PM <[email protected]> wrote:
>
> From: jun qian <[email protected]>
>
> When get the pending softirqs, it need to process all the pending
> softirqs in the while loop. If the processing time of each pending
> softirq is need more than 2 msec in this loop, or one of the softirq
> will running a long time, according to the original code logic, it
> will process all the pending softirqs without wakeuping ksoftirqd,
> which will cause a relatively large scheduling delay on the
> corresponding CPU, which we do not wish to see. The patch will check
> the total time to process pending softirq, if the time exceeds 2 ms
> we need to wakeup the ksofirqd to aviod large sched delay.
>
> Signed-off-by: jun qian <[email protected]>
> ---
> kernel/softirq.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c4201b7f..f8e5be9 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -210,7 +210,7 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
> * we want to handle softirqs as soon as possible, but they
> * should not be able to lock up the box.
> */
> -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
> +#define MAX_SOFTIRQ_TIME 2000 /* In microseconds */
> #define MAX_SOFTIRQ_RESTART 10
>
> #ifdef CONFIG_TRACE_IRQFLAGS
> @@ -248,7 +248,8 @@ static inline void lockdep_softirq_end(bool in_hardirq) { }
>
> asmlinkage __visible void __softirq_entry __do_softirq(void)
> {
> - unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> + ktime_t end, start;
> + s64 delta;
> unsigned long old_flags = current->flags;
> int max_restart = MAX_SOFTIRQ_RESTART;
> struct softirq_action *h;
> @@ -256,6 +257,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> __u32 pending;
> int softirq_bit;
>
> + start = ktime_get();
> +
> /*
> * Mask out PF_MEMALLOC as the current task context is borrowed for the
> * softirq. A softirq handled, such as network RX, might set PF_MEMALLOC
> @@ -299,6 +302,15 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> }
> h++;
> pending >>= softirq_bit;
> +
> + end = ktime_get();
> + delta = ktime_to_us(end - start);
> + /*
> + * the softirq's action has been running for too much time
> + * so it may need to wakeup the ksoftirqd
> + */
> + if (delta > MAX_SOFTIRQ_TIME && need_resched())
> + break;
> }
>
> if (__this_cpu_read(ksoftirqd) == current)
> @@ -307,7 +319,9 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>
> pending = local_softirq_pending();
> if (pending) {
> - if (time_before(jiffies, end) && !need_resched() &&
> + end = ktime_get();
> + delta = ktime_to_us(end - start);
> + if (delta < MAX_SOFTIRQ_TIME && !need_resched() &&
> --max_restart)
> goto restart;
>
> --
> 1.8.3.1
>

2020-07-22 18:08:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH v2] Softirq:avoid large sched delay from the pending softirqs

[email protected] writes:
> +
> + end = ktime_get();
> + delta = ktime_to_us(end - start);

What's the point of this conversion? That's a division for no value
because you can simply define the maximum time in nanoseconds with the
same effect, i.e.

ktime_t end = ktime_get() + MAX_SOFTIRQ_TIME_NS;

if (need_resched() && ktime_get() > end)
break;

So you can spare all that start, delta and conversion dance and keep the
code simple.

Also notice that need_resched() wants to be evaluated first because
there is no point to do the more expensive time read if need_resched()
is false.

Thanks,

tglx

2020-07-23 04:51:20

by jun qian

[permalink] [raw]
Subject: Re: [RFC PATCH v2] Softirq:avoid large sched delay from the pending softirqs

On Thu, Jul 23, 2020 at 2:05 AM Thomas Gleixner <[email protected]> wrote:
>
> [email protected] writes:
> > +
> > + end = ktime_get();
> > + delta = ktime_to_us(end - start);
>
> What's the point of this conversion? That's a division for no value
> because you can simply define the maximum time in nanoseconds with the
> same effect, i.e.
>
> ktime_t end = ktime_get() + MAX_SOFTIRQ_TIME_NS;
>
> if (need_resched() && ktime_get() > end)
> break;
>
> So you can spare all that start, delta and conversion dance and keep the
> code simple.
>
> Also notice that need_resched() wants to be evaluated first because
> there is no point to do the more expensive time read if need_resched()
> is false.
good suggestion,Thanks

I will make changes in the next version
>
> Thanks,
>
> tglx

2020-07-23 15:22:43

by kernel test robot

[permalink] [raw]
Subject: [Softirq] a76eadba0d: WARNING:at_net/mac80211/rx.c:#ieee80211_rx_napi[mac80211]

Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: a76eadba0db10b76d2e966059f25d01993e3e1a1 ("[RFC PATCH v2] Softirq:avoid large sched delay from the pending softirqs")
url: https://github.com/0day-ci/linux/commits/qianjun-kernel-gmail-com/Softirq-avoid-large-sched-delay-from-the-pending-softirqs/20200720-211120
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 5714ee50bb4375bd586858ad800b1d9772847452

in testcase: hwsim
with following parameters:

group: hwsim-28



on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 8 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+----------------------------------------------------------------------------+------------+------------+
| | 5714ee50bb | a76eadba0d |
+----------------------------------------------------------------------------+------------+------------+
| boot_successes | 11 | 9 |
| boot_failures | 1 | 18 |
| BUG:kernel_hang_in_boot_stage | 1 | |
| Kernel_panic-not_syncing:VFS:Unable_to_mount_root_fs_on_unknown-block(#,#) | 0 | 2 |
| WARNING:at_net/mac80211/rx.c:#ieee80211_rx_napi[mac80211] | 0 | 16 |
| RIP:ieee80211_rx_napi[mac80211] | 0 | 16 |
| INFO:rcu_sched_self-detected_stall_on_CPU | 0 | 6 |
| RIP:console_unlock | 0 | 6 |
| INFO:rcu_sched_detected_stalls_on_CPUs/tasks | 0 | 6 |
| RIP:x2apic_send_IPI | 0 | 5 |
| RIP:vprintk_emit | 0 | 5 |
| RIP:default_idle | 0 | 3 |
| RIP:io_serial_in | 0 | 1 |
+----------------------------------------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 106.856151] WARNING: CPU: 5 PID: 4569 at net/mac80211/rx.c:4708 ieee80211_rx_napi+0x44d/0x560 [mac80211]
[ 106.858369] Modules linked in: ccm mac80211_hwsim mac80211 cfg80211 rfkill libarc4 sr_mod cdrom sg ata_generic bochs_drm intel_rapl_msr drm_vram_helper ipmi_devintf drm_ttm_helper intel_rapl_common ttm ipmi_msghandler crct10dif_pclmul drm_kms_helper crc32_pclmul crc32c_intel ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev ata_piix parport_pc drm joydev parport serio_raw libata i2c_piix4 ip_tables
[ 106.873183] CPU: 5 PID: 4569 Comm: hostapd Not tainted 5.8.0-rc6-00002-ga76eadba0db10 #1
[ 106.879825] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 106.883913] RIP: 0010:ieee80211_rx_napi+0x44d/0x560 [mac80211]
[ 106.887468] Code: 45 44 24 04 89 44 24 04 e9 6d fd ff ff 0f 0b e9 72 fe ff ff 0f b6 45 4a 83 e8 01 3c 07 0f 87 97 fe ff ff 31 d2 e9 7a fc ff ff <0f> 0b e9 55 fe ff ff 4c 89 e7 e8 84 ff 0d cf 48 85 c0 0f 84 14 fe
[ 106.894436] RSP: 0018:ffffb0d280188e90 EFLAGS: 00010246
[ 106.899763] RAX: 0000000000000000 RBX: ffff90c0e39847e0 RCX: 0000000000000000
[ 106.903868] RDX: ffff90c0e3985ec8 RSI: 0000000000000000 RDI: ffff90c0e39847e0
[ 106.906345] RBP: ffff90c0e392e500 R08: 0000000000000000 R09: ffffffffc0873000
[ 106.911449] R10: ffff90c0e392fd00 R11: 0000000000000001 R12: ffff90c0e392e500
[ 106.913620] R13: 0000000000000000 R14: 0000000000000040 R15: ffffffff906050e0
[ 106.920097] FS: 00007ffbb8a4eb80(0000) GS:ffff90c17fd40000(0000) knlGS:0000000000000000
[ 106.922530] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.924537] CR2: 00007fdb4bae4d50 CR3: 00000003a393c000 CR4: 00000000000406e0
[ 106.926725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.928974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 106.941826] Call Trace:
[ 106.943761] <IRQ>
[ 106.945475] ? ieee80211_tx_status+0x71/0xa0 [mac80211]
[ 106.961847] ? kmem_cache_free+0x212/0x280
[ 106.964038] ieee80211_tasklet_handler+0xbc/0xd0 [mac80211]
[ 106.966220] tasklet_action_common+0x4f/0x100
[ 106.968344] __do_softirq+0xe7/0x32e
[ 106.970246] asm_call_on_stack+0x12/0x20
[ 106.972192] </IRQ>
[ 106.977831] do_softirq_own_stack+0x39/0x50
[ 106.979993] do_softirq+0x2b/0x30
[ 106.982020] __local_bh_enable_ip+0x4b/0x50
[ 106.984023] ieee80211_mgmt_tx+0x466/0x590 [mac80211]
[ 106.986185] ? skb_partial_csum_set+0x60/0x80
[ 106.988239] cfg80211_mlme_mgmt_tx+0x108/0x310 [cfg80211]
[ 107.005903] nl80211_tx_mgmt+0x292/0x3b0 [cfg80211]
[ 107.008082] genl_rcv_msg+0x1a4/0x2fc
[ 107.009886] ? genl_start+0x140/0x140
[ 107.011725] netlink_rcv_skb+0x49/0x110
[ 107.016692] genl_rcv+0x24/0x40
[ 107.018603] netlink_unicast+0x191/0x230
[ 107.020501] netlink_sendmsg+0x243/0x480
[ 107.022365] sock_sendmsg+0x5e/0x60
[ 107.032305] ____sys_sendmsg+0x1f3/0x260
[ 107.034175] ? copy_msghdr_from_user+0x5c/0x90
[ 107.039443] ___sys_sendmsg+0x81/0xc0
[ 107.041151] ? __generic_file_write_iter+0xf9/0x1d0
[ 107.042989] ? generic_file_write_iter+0x106/0x173
[ 107.044793] ? __cgroup_bpf_run_filter_setsockopt+0xb9/0x320
[ 107.046678] __sys_sendmsg+0x59/0xa0
[ 107.048170] do_syscall_64+0x4c/0x90
[ 107.049696] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 107.065381] RIP: 0033:0x7ffbb8b6d914
[ 107.068221] Code: Bad RIP value.
[ 107.069764] RSP: 002b:00007ffcae5590c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 107.071892] RAX: ffffffffffffffda RBX: 000055dba32b81b0 RCX: 00007ffbb8b6d914
[ 107.073939] RDX: 0000000000000000 RSI: 00007ffcae559100 RDI: 0000000000000007
[ 107.079830] RBP: 000055dba32c3b20 R08: 0000000000000004 R09: 00007ffbb8c2f3a0
[ 107.081785] R10: 00007ffcae5591dc R11: 0000000000000246 R12: 000055dba32b80c0
[ 107.086149] R13: 00007ffcae559100 R14: 00007ffcae559238 R15: 00007ffcae5591dc
[ 107.088023] ---[ end trace abbdb1b7b4425524 ]---


To reproduce:

# build kernel
cd linux
cp config-5.8.0-rc6-00002-ga76eadba0db10 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
lkp


Attachments:
(No filename) (6.83 kB)
config-5.8.0-rc6-00002-ga76eadba0db10 (160.83 kB)
job-script (5.09 kB)
dmesg.xz (168.15 kB)
hwsim (93.72 kB)
Download all attachments