From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker, Levin Alexander, Peter Zijlstra,
    Mauro Carvalho Chehab, Linus Torvalds, Hannes Frederic Sowa,
    "Paul E. McKenney", Wanpeng Li, Dmitry Safonov, Thomas Gleixner,
    Andrew Morton, Paolo Abeni, Radu Rendec, Ingo Molnar,
    Stanislaw Gruszka, Rik van Riel, Eric Dumazet, David Miller
Subject: [RFC PATCH 2/4] softirq: Per vector deferment to workqueue
Date: Fri, 19 Jan 2018 16:46:12 +0100
Message-Id: <1516376774-24076-3-git-send-email-frederic@kernel.org>
In-Reply-To: <1516376774-24076-1-git-send-email-frederic@kernel.org>
References: <1516376774-24076-1-git-send-email-frederic@kernel.org>

Some softirq vectors can be more CPU hungry than others. Networking in
particular may sometimes deal with packet storms and need more CPU than
IRQ tail processing can offer without inducing scheduler latencies. In
that case the current code defers to ksoftirqd, which behaves more
nicely. But this nice behaviour can be bad for other vectors that
usually need quick processing.

To solve this, defer to threading only the vectors that got re-enqueued
during IRQ tail processing and keep serving the others inline from the
IRQ tail. This is achieved using workqueues with per-CPU/per-vector
worklets.

Note ksoftirqd is not removed yet, as it is still needed for the
threaded IRQs mode.

Suggested-by: Linus Torvalds
Signed-off-by: Frederic Weisbecker
Cc: Dmitry Safonov
Cc: Eric Dumazet
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Andrew Morton
Cc: David Miller
Cc: Hannes Frederic Sowa
Cc: Ingo Molnar
Cc: Levin Alexander
Cc: Paolo Abeni
Cc: Paul E. McKenney
Cc: Radu Rendec
Cc: Rik van Riel
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Wanpeng Li
Cc: Mauro Carvalho Chehab
---
 include/linux/interrupt.h |   2 +
 kernel/sched/cputime.c    |   5 +-
 kernel/softirq.c          | 120 ++++++++++++++++++++++++++++++++++++++++++----
 net/ipv4/tcp_output.c     |   3 +-
 4 files changed, 119 insertions(+), 11 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 69c2382..92d044d 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -514,6 +514,8 @@ static inline struct task_struct *this_cpu_ksoftirqd(void)
 	return this_cpu_read(ksoftirqd);
 }
 
+extern int softirq_serving_workqueue(void);
+
 /* Tasklets --- multithreaded analogue of BHs.
 
    Main feature differing them of generic softirqs: tasklet
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index bac6ac9..30f70e5 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -71,7 +71,8 @@ void irqtime_account_irq(struct task_struct *curr)
 	 */
 	if (hardirq_count())
 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
-	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd() &&
+		 !softirq_serving_workqueue())
 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
@@ -375,7 +376,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 
 	cputime -= other;
 
-	if (this_cpu_ksoftirqd() == p) {
+	if (this_cpu_ksoftirqd() == p || softirq_serving_workqueue()) {
 		/*
 		 * ksoftirqd time do not get accounted in cpu_softirq_time.
 		 * So, we have to handle it separately here.
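The softirq.c hunks that follow walk the pending bitmask with the same
ffs()-based pattern __do_softirq() already uses: find the lowest set
bit, service that vector, then shift past it. A minimal userspace
sketch of that bit-walking pattern (dispatch_pending(), run_vector()
and the executed[] array are illustrative stand-ins, not kernel code):

```c
#include <assert.h>
#include <strings.h>	/* ffs() */

#define NR_SOFTIRQS 10

/* Records which vectors ran; stand-in for calling softirq actions. */
static int executed[NR_SOFTIRQS];

static void run_vector(int nr)
{
	executed[nr] = 1;
}

/*
 * Walk the pending bitmask, lowest vector first. "base" tracks how
 * many bits we have already shifted out, mirroring the kernel's
 * "h += softirq_bit - 1; ... h++;" pointer arithmetic.
 */
static void dispatch_pending(unsigned int pending)
{
	int base = 0;
	int bit;

	while ((bit = ffs(pending))) {
		int vec_nr = base + bit - 1;

		run_vector(vec_nr);
		base += bit;
		pending >>= bit;
	}
}
```

The shift-as-you-go form means each ffs() call scans an already-drained
mask, so the loop visits every set bit exactly once in ascending order.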
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c8c6841..becb1d9 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -62,6 +62,19 @@ const char * const softirq_to_name[NR_SOFTIRQS] = {
 	"TASKLET", "SCHED", "HRTIMER", "RCU"
 };
 
+struct vector {
+	int nr;
+	struct work_struct work;
+};
+
+struct softirq {
+	unsigned int pending_work_mask;
+	int work_running;
+	struct vector vector[NR_SOFTIRQS];
+};
+
+static DEFINE_PER_CPU(struct softirq, softirq_cpu);
+
 /*
  * we cannot loop indefinitely here to avoid userspace starvation,
  * but we also don't want to introduce a worst case 1/HZ latency
@@ -223,8 +236,77 @@ static inline bool lockdep_softirq_start(void) { return false; }
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+int softirq_serving_workqueue(void)
+{
+	return __this_cpu_read(softirq_cpu.work_running);
+}
+
+static void vector_work_func(struct work_struct *work)
+{
+	struct vector *vector = container_of(work, struct vector, work);
+	struct softirq *softirq = this_cpu_ptr(&softirq_cpu);
+	int vec_nr = vector->nr;
+	int vec_bit = BIT(vec_nr);
+	u32 pending;
+
+	local_irq_disable();
+	pending = local_softirq_pending();
+	account_irq_enter_time(current);
+	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
+	lockdep_softirq_enter();
+	set_softirq_pending(pending & ~vec_bit);
+	local_irq_enable();
+
+	if (pending & vec_bit) {
+		struct softirq_action *sa = &softirq_vec[vec_nr];
+
+		kstat_incr_softirqs_this_cpu(vec_nr);
+		softirq->work_running = 1;
+		trace_softirq_entry(vec_nr);
+		sa->action(sa);
+		trace_softirq_exit(vec_nr);
+		softirq->work_running = 0;
+	}
+
+	local_irq_disable();
+
+	pending = local_softirq_pending();
+	if (pending & vec_bit)
+		schedule_work_on(smp_processor_id(), &vector->work);
+	else
+		softirq->pending_work_mask &= ~vec_bit;
+
+	lockdep_softirq_exit();
+	account_irq_exit_time(current);
+	__local_bh_enable(SOFTIRQ_OFFSET);
+	local_irq_enable();
+}
+
+static void do_softirq_workqueue(u32 pending)
+{
+	struct softirq *softirq = this_cpu_ptr(&softirq_cpu);
+	struct softirq_action *h = softirq_vec;
+	int softirq_bit;
+
+	pending &= ~softirq->pending_work_mask;
+
+	while ((softirq_bit = ffs(pending))) {
+		struct vector *vector;
+		unsigned int vec_nr;
+
+		h += softirq_bit - 1;
+		vec_nr = h - softirq_vec;
+		softirq->pending_work_mask |= BIT(vec_nr);
+		vector = &softirq->vector[vec_nr];
+		schedule_work_on(smp_processor_id(), &vector->work);
+		h++;
+		pending >>= softirq_bit;
+	}
+}
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
+	struct softirq *softirq = this_cpu_ptr(&softirq_cpu);
 	unsigned long old_flags = current->flags;
 	struct softirq_action *h;
 	bool in_hardirq;
@@ -238,15 +320,18 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	 */
 	current->flags &= ~PF_MEMALLOC;
 
-	pending = local_softirq_pending();
+	/* Ignore vectors pending on workqueues, they have been punished */
+	pending = local_softirq_pending() & ~softirq->pending_work_mask;
 	account_irq_enter_time(current);
 
 	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
 	in_hardirq = lockdep_softirq_start();
 
 restart:
-	/* Reset the pending bitmask before enabling irqs */
-	set_softirq_pending(0);
+	/*
+	 * Reset the pending bitmask before enabling irqs but keep
+	 * those pending on workqueues so they get properly handled there.
+	 */
+	set_softirq_pending(softirq->pending_work_mask);
 
 	local_irq_enable();
 
@@ -282,12 +367,18 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	rcu_bh_qs();
 	local_irq_disable();
 
-	pending = local_softirq_pending();
+	pending = local_softirq_pending() & ~softirq->pending_work_mask;
 	if (pending) {
-		if (pending & executed || need_resched())
+		if (need_resched()) {
 			wakeup_softirqd();
-		else
-			goto restart;
+		} else {
+			/* Vectors that got re-enqueued are threaded */
+			if (executed & pending)
+				do_softirq_workqueue(executed & pending);
+			pending &= ~executed;
+			if (pending)
+				goto restart;
+		}
 	}
 
 	lockdep_softirq_end(in_hardirq);
@@ -624,10 +715,23 @@ void __init softirq_init(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
+		struct softirq *softirq;
+		int i;
+
 		per_cpu(tasklet_vec, cpu).tail =
 			&per_cpu(tasklet_vec, cpu).head;
 		per_cpu(tasklet_hi_vec, cpu).tail =
 			&per_cpu(tasklet_hi_vec, cpu).head;
+
+		softirq = &per_cpu(softirq_cpu, cpu);
+
+		for (i = 0; i < NR_SOFTIRQS; i++) {
+			struct vector *vector;
+
+			vector = &softirq->vector[i];
+			vector->nr = i;
+			INIT_WORK(&vector->work, vector_work_func);
+		}
 	}
 
 	open_softirq(TASKLET_SOFTIRQ, tasklet_action);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a4d214c..b4e4160 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -919,7 +919,8 @@ void tcp_wfree(struct sk_buff *skb)
 	 * - chance for incoming ACK (processed by another cpu maybe)
 	 *   to migrate this flow (skb->ooo_okay will be eventually set)
 	 */
-	if (refcount_read(&sk->sk_wmem_alloc) >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
+	if (refcount_read(&sk->sk_wmem_alloc) >= SKB_TRUESIZE(1) &&
+	    (this_cpu_ksoftirqd() == current || softirq_serving_workqueue()))
 		goto out;
 
 	for (oval = READ_ONCE(sk->sk_tsq_flags);; oval = nval) {
-- 
2.7.4
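For reference, the deferment policy this patch introduces can be
modeled in plain userspace C: vectors recorded in pending_work_mask are
owned by workqueue items and masked out of IRQ-tail processing, and a
vector joins that mask when it is found pending again right after it
ran. All names below are illustrative stand-ins, not the kernel code
above:

```c
#include <assert.h>

/*
 * Userspace model (hypothetical) of the per-CPU deferment state:
 * "pending" mirrors local_softirq_pending(), "pending_work_mask"
 * mirrors the new softirq_cpu.pending_work_mask.
 */
struct softirq_state {
	unsigned int pending;            /* raised softirq vectors */
	unsigned int pending_work_mask;  /* vectors punted to workqueue */
};

/* Vectors the IRQ tail may still service inline right now. */
static unsigned int inline_pending(const struct softirq_state *s)
{
	return s->pending & ~s->pending_work_mask;
}

/*
 * A vector that was re-enqueued while it ran (still pending after
 * execution) gets deferred: it moves under pending_work_mask until
 * its per-vector work item drains it.
 */
static void defer_to_workqueue(struct softirq_state *s,
			       unsigned int executed)
{
	s->pending_work_mask |= s->pending & executed;
}
```

This captures why a storming vector (say NET_RX) stops stealing IRQ
tail time while timers or tasklets raised afterwards are still served
inline with low latency.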