Date: Wed, 15 Nov 2023 12:04:01 -0800
From: Jacob Pan
To: Peter Zijlstra
Cc: LKML, X86 Kernel, iommu@lists.linux.dev, Thomas Gleixner, Lu Baolu,
 kvm@vger.kernel.org, Dave Hansen, Joerg Roedel,
Peter Anvin" , Borislav Petkov , Ingo Molnar , Raj Ashok , "Tian, Kevin" , maz@kernel.org, seanjc@google.com, Robin Murphy , jacob.jun.pan@linux.intel.com Subject: Re: [PATCH RFC 09/13] x86/irq: Install posted MSI notification handler Message-ID: <20231115120401.3e02d977@jacob-builder> In-Reply-To: <20231115125624.GF3818@noisy.programming.kicks-ass.net> References: <20231112041643.2868316-1-jacob.jun.pan@linux.intel.com> <20231112041643.2868316-10-jacob.jun.pan@linux.intel.com> <20231115125624.GF3818@noisy.programming.kicks-ass.net> Organization: OTC X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Wed, 15 Nov 2023 12:00:35 -0800 (PST) Hi Peter, On Wed, 15 Nov 2023 13:56:24 +0100, Peter Zijlstra wrote: > On Sat, Nov 11, 2023 at 08:16:39PM -0800, Jacob Pan wrote: > > > +static __always_inline inline void handle_pending_pir(struct pi_desc > > *pid, struct pt_regs *regs) +{ > > __always_inline means that... (A) > > > + int i, vec = FIRST_EXTERNAL_VECTOR; > > + u64 pir_copy[4]; > > + > > + /* > > + * Make a copy of PIR which contains IRQ pending bits for > > vectors, > > + * then invoke IRQ handlers for each pending vector. > > + * If any new interrupts were posted while we are processing, > > will > > + * do again before allowing new notifications. The idea is to > > + * minimize the number of the expensive notifications if IRQs > > come > > + * in a high frequency burst. > > + */ > > + for (i = 0; i < 4; i++) > > + pir_copy[i] = raw_atomic64_xchg((atomic64_t > > *)&pid->pir_l[i], 0); + > > + /* > > + * Ideally, we should start from the high order bits set in > > the PIR > > + * since each bit represents a vector. Higher order bit > > position means > > + * the vector has higher priority. But external vectors are > > allocated > > + * based on availability not priority. > > + * > > + * EOI is included in the IRQ handlers call to apic_ack_irq, > > which > > + * allows higher priority system interrupt to get in between. > > + */ > > + for_each_set_bit_from(vec, (unsigned long *)&pir_copy[0], 256) > > + call_irq_handler(vec, regs); > > + > > +} > > + > > +/* > > + * Performance data shows that 3 is good enough to harvest 90+% of the > > benefit > > + * on high IRQ rate workload. > > + * Alternatively, could make this tunable, use 3 as default. > > + */ > > +#define MAX_POSTED_MSI_COALESCING_LOOP 3 > > + > > +/* > > + * For MSIs that are delivered as posted interrupts, the CPU > > notifications > > + * can be coalesced if the MSIs arrive in high frequency bursts. 
> > +
> > +/*
> > + * Performance data shows that 3 is good enough to harvest 90+% of the
> > + * benefit on a high IRQ rate workload.
> > + * Alternatively, this could be made tunable, with 3 as the default.
> > + */
> > +#define MAX_POSTED_MSI_COALESCING_LOOP 3
> > +
> > +/*
> > + * For MSIs that are delivered as posted interrupts, the CPU
> > + * notifications can be coalesced if the MSIs arrive in high frequency
> > + * bursts.
> > + */
> > +DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
> > +{
> > +	struct pt_regs *old_regs = set_irq_regs(regs);
> > +	struct pi_desc *pid;
> > +	int i = 0;
> > +
> > +	pid = this_cpu_ptr(&posted_interrupt_desc);
> > +
> > +	inc_irq_stat(posted_msi_notification_count);
> > +	irq_enter();
> > +
> > +	while (i++ < MAX_POSTED_MSI_COALESCING_LOOP) {
> > +		handle_pending_pir(pid, regs);
> > +
> > +		/*
> > +		 * If there are new interrupts posted in the PIR, do it
> > +		 * again. If nothing is pending, there is no need to wait
> > +		 * for more interrupts.
> > +		 */
> > +		if (is_pir_pending(pid))
> 
> So this reads those same 4 words we xchg in handle_pending_pir(), right?
> 
> > +			continue;
> > +		else
> > +			break;
> > +	}
> > +
> > +	/*
> > +	 * Clear the outstanding notification bit to allow new IRQ
> > +	 * notifications; do this last to maximize the window of interrupt
> > +	 * coalescing.
> > +	 */
> > +	pi_clear_on(pid);
> > +
> > +	/*
> > +	 * There could be a race between the PI notification and the
> > +	 * clearing of the ON bit; process the PIR bits one last time so
> > +	 * that handling the new interrupts is not delayed until the next
> > +	 * IRQ.
> > +	 */
> > +	if (unlikely(is_pir_pending(pid)))
> > +		handle_pending_pir(pid, regs);
> 
> (A) ... we get _two_ copies of that thing in this function. Does that
> make sense?
> 
> > +
> > +	apic_eoi();
> > +	irq_exit();
> > +	set_irq_regs(old_regs);
> > +}
> >  #endif /* X86_POSTED_MSI */
> 
> Would it not make more sense to write things something like:

It is a great idea; we can save the expensive xchg when pir[i] is 0. But I
had to tweak it a little to make it perform better.

> bool handle_pending_pir()
> {
> 	bool handled = false;
> 	u64 pir_copy[4];
> 
> 	for (i = 0; i < 4; i++) {
> 		if (!pid->pir_l[i]) {
> 			pir_copy[i] = 0;
> 			continue;
> 		}
> 
> 		pir_copy[i] = arch_xchg(&pid->pir_l[i], 0);

This interleaves cacheline reads and xchg, so I changed it to:

	for (i = 0; i < 4; i++)
		pir_copy[i] = pid->pir_l[i];

	for (i = 0; i < 4; i++) {
		if (pir_copy[i]) {
			pir_copy[i] = arch_xchg(&pid->pir_l[i], 0);
			handled = true;
		}
	}

With the DSA MEMFILL test (just one queue, one MSI), we save 3 xchg per
loop. Here is the performance comparison in IRQ rate:

	Original RFC:            9.29 M/sec
	Optimized (your email):  8.82 M/sec
	Tweaked (above):         9.54 M/sec

I need to test with more MSI vectors spread across all 4 u64s. I suspect
the benefit will decrease, since we would then need to do both the read
and the xchg for the non-zero entries.

> 		handled |= true;
> 	}
> 
> 	if (!handled)
> 		return handled;
> 
> 	for_each_set_bit()
> 		....
> 
> 	return handled;
> }
> 
> sysvec_posted_blah_blah()
> {
> 	bool done = false;
> 	bool handled;
> 
> 	for (;;) {
> 		handled = handle_pending_pir();
> 		if (done)
> 			break;
> 		if (!handled || ++loops > MAX_LOOPS) {
> 			pi_clear_on(pid);
> 			/* once more after clear_on */
> 			done = true;
> 		}
> 	}
> }
> 
> Hmm?

Thanks,

Jacob
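P.S. For anyone who wants to poke at the control flow outside the kernel,
here is a rough user-space model of the loop you sketched, folding in the
two-pass read-then-xchg tweak from above. It is only a sketch under stated
assumptions: C11 atomics, and the ON flag, MAX_LOOPS and handle_vector()
are stand-ins, not kernel interfaces.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_LOOPS 3

static _Atomic uint64_t pir[4];	/* posted-interrupt bitmap, as in the PID */
static atomic_bool on;		/* models the PID.ON notification bit */

static void handle_vector(int vec)	/* placeholder IRQ handler */
{
	printf("vector %d\n", vec);
}

static bool handle_pending_pir_model(void)
{
	uint64_t pir_copy[4];
	bool handled = false;
	int i;

	/* Pass 1: plain reads, so clean cachelines are never xchg'ed. */
	for (i = 0; i < 4; i++)
		pir_copy[i] = atomic_load(&pir[i]);

	/* Pass 2: xchg only the words that actually have bits set. */
	for (i = 0; i < 4; i++) {
		if (pir_copy[i]) {
			pir_copy[i] = atomic_exchange(&pir[i], 0);
			handled = true;
		}
	}

	if (!handled)
		return false;

	for (i = 0; i < 256; i++)
		if (pir_copy[i / 64] & (1ULL << (i % 64)))
			handle_vector(i);

	return true;
}

static void sysvec_model(void)
{
	bool done = false;
	int loops = 0;

	for (;;) {
		bool handled = handle_pending_pir_model();

		if (done)
			break;
		if (!handled || ++loops > MAX_LOOPS) {
			atomic_store(&on, false);	/* pi_clear_on() */
			/* one more pass to close the race with a new post */
			done = true;
		}
	}
}

int main(void)
{
	atomic_fetch_or(&pir[2], 1ULL << 5);	/* post vector 133 */
	atomic_store(&on, true);		/* notification outstanding */
	sysvec_model();
	return 0;
}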