References: <20200718021331.940659-1-joshdon@google.com>
In-Reply-To: <20200718021331.940659-1-joshdon@google.com>
From: Andy Lutomirski
Date: Fri, 17 Jul 2020 20:14:14 -0700
Subject: Re: [RFC][PATCH] x86: optimization to avoid CAL+RES IPIs
To: Josh Don
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
    "H. Peter Anvin", Linux PM, LKML, "Rafael J. Wysocki",
    Daniel Lezcano, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Paul Turner
X-Mailing-List: linux-kernel@vger.kernel.org

> On Jul 17, 2020, at 7:13 PM, Josh Don wrote:
>
> From: Venkatesh Pallipadi
>
> smp_call_function_single and smp_send_reschedule send unconditional IPI
> to target CPU. However, if the target CPU is in some form of poll based
> idle, we can do IPI-less wakeups.
>
> Doing this has certain advantages:
> * Lower overhead on Async "no wait" IPI send path.
> * Avoiding actual interrupts reduces system non-idle cycles.
>
> Note that this only helps when target CPU is idle. When it is busy we
> will still send an IPI as before.

PeterZ and I fixed a whole series of bugs a few years ago, and remote
wakeups *should* already do this.  Did we miss something?  Did it
regress?

Even the call_function_single path ought to go through this:

void send_call_function_single_ipi(int cpu)
{
        struct rq *rq = cpu_rq(cpu);

        if (!set_nr_if_polling(rq->idle))
                arch_send_call_function_single_ipi(cpu);
        else
                trace_sched_wake_idle_without_ipi(cpu);
}

>
> *** RFC NOTE ***
> This patch breaks idle time accounting (and to a lesser degree, softirq
> accounting).
> This is because this patch violates the assumption that
> softirq can only be run either on the tail of a hard IRQ or inline on
> a non-idle thread via local_bh_enable(), since we can now process
> softirq inline within the idle loop. These issues can be resolved in a
> later version of this patch.
>
> Signed-off-by: Josh Don
> ---
> arch/x86/include/asm/mwait.h       |  5 +-
> arch/x86/include/asm/processor.h   |  1 +
> arch/x86/include/asm/thread_info.h |  2 +
> arch/x86/kernel/apic/ipi.c         |  8 +++
> arch/x86/kernel/smpboot.c          |  4 ++
> drivers/cpuidle/poll_state.c       |  5 +-
> include/linux/ipiless_wake.h       | 93 ++++++++++++++++++++++++++++++
> kernel/sched/idle.c                | 10 +++-
> 8 files changed, 124 insertions(+), 4 deletions(-)
> create mode 100644 include/linux/ipiless_wake.h
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index e039a933aca3..aed393f38a39 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -2,6 +2,7 @@
> #ifndef _ASM_X86_MWAIT_H
> #define _ASM_X86_MWAIT_H
>
> +#include
> #include
> #include
>
> @@ -109,6 +110,7 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
> {
>         if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
> +               enter_ipiless_idle();
>                 if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
>                         mb();
>                         clflush((void *)&current_thread_info()->flags);
> @@ -116,8 +118,9 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
>                 }
>
>                 __monitor((void *)&current_thread_info()->flags, 0, 0);
> -               if (!need_resched())
> +               if (!is_ipiless_wakeup_pending())
>                         __mwait(eax, ecx);
> +               exit_ipiless_idle();
>         }
>         current_clr_polling();
> }
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 03b7c4ca425a..045fc9bbd095 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -568,6 +568,7 @@
> static inline void arch_thread_struct_whitelist(unsigned long *offset,
>  * have to worry about atomic accesses.
>  */
> #define TS_COMPAT               0x0002  /* 32bit syscall active (64BIT)*/
> +#define TS_IPILESS_WAKEUP       0x0010  /* pending IPI-work on idle exit */

What's this for?

> +#define _TIF_IN_IPILESS_IDLE    (1 << TIF_IN_IPILESS_IDLE)

We already have TIF_POLLING_NRFLAG.  Why do we need this?

> #include "local.h"
> @@ -67,11 +68,18 @@ void native_smp_send_reschedule(int cpu)
>                 WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
>                 return;
>         }
> +
> +       if (try_ipiless_wakeup(cpu))
> +               return;
> +

I think this shouldn't be called if the target is idle unless we lost
a race.  What am I missing?

>         apic->send_IPI(cpu, RESCHEDULE_VECTOR);
> }
>
> void native_send_call_func_single_ipi(int cpu)
> {
> +       if (try_ipiless_wakeup(cpu))
> +               return;
> +
>         apic->send_IPI(cpu, CALL_FUNCTION_SINGLE_VECTOR);
> }

This function's caller already does this.

> +static inline void exit_ipiless_idle(void)
> +{
> +       if (!test_and_clear_thread_flag(TIF_IN_IPILESS_IDLE)) {

This has the unfortunate property that, if you try to wake the same
polling CPU twice in rapid succession, the second try will send an
IPI.  One might debate whether this is a real problem, but the
existing code does not have this issue.

I could poke at this more, but I'm wondering whether you might have
developed this against a rather old kernel and blindly forward-ported
it.

--Andy