Date: Tue, 2 Apr 2024 19:43:55 -0700
From: Jacob Pan
To: Zeng Guang
Cc: LKML, X86 Kernel, Peter Zijlstra, "iommu@lists.linux.dev",
    Thomas Gleixner, Lu Baolu, "kvm@vger.kernel.org", "Hansen, Dave",
    Joerg Roedel, "H. Peter Anvin", Borislav Petkov, Ingo Molnar,
    "Luse, Paul E", "Williams, Dan J", Jens Axboe, "Raj, Ashok",
    "Tian, Kevin", "maz@kernel.org", "seanjc@google.com", Robin Murphy,
    jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH 09/15] x86/irq: Install posted MSI notification handler
Message-ID: <20240402194355.72b2ade8@jacob-builder>
References: <20240126234237.547278-1-jacob.jun.pan@linux.intel.com>
    <20240126234237.547278-10-jacob.jun.pan@linux.intel.com>

Hi Zeng,

On Fri, 29 Mar 2024 15:32:00 +0800, Zeng Guang wrote:

> On 1/27/2024 7:42 AM, Jacob Pan wrote:
> > @@ -353,6 +360,111 @@ void intel_posted_msi_init(void)
> >  	pid->nv = POSTED_MSI_NOTIFICATION_VECTOR;
> >  	pid->ndst = this_cpu_read(x86_cpu_to_apicid);
> >  }
> > +
> > +/*
> > + * De-multiplexing posted interrupts is on the performance path, the code
> > + * below is written to optimize the cache performance based on the following
> > + * considerations:
> > + * 1.Posted interrupt descriptor (PID) fits in a cache line that is frequently
> > + *   accessed by both CPU and IOMMU.
> > + * 2.During posted MSI processing, the CPU needs to do 64-bit read and xchg
> > + *   for checking and clearing posted interrupt request (PIR), a 256 bit field
> > + *   within the PID.
> > + * 3.On the other side, the IOMMU does atomic swaps of the entire PID cache
> > + *   line when posting interrupts and setting control bits.
> > + * 4.The CPU can access the cache line a magnitude faster than the IOMMU.
> > + * 5.Each time the IOMMU does interrupt posting to the PIR will evict the PID
> > + *   cache line. The cache line states after each operation are as follows:
> > + *   CPU		IOMMU			PID Cache line state
> > + *   ---------------------------------------------------------------
> > + *...read64					exclusive
> > + *...lock xchg64				modified
> > + *...			post/atomic swap	invalid
> > + *...-------------------------------------------------------------
> > + *
> > + * To reduce L1 data cache miss, it is important to avoid contention with
> > + * IOMMU's interrupt posting/atomic swap. Therefore, a copy of PIR is used
> > + * to dispatch interrupt handlers.
> > + *
> > + * In addition, the code is trying to keep the cache line state consistent
> > + * as much as possible. e.g. when making a copy and clearing the PIR
> > + * (assuming non-zero PIR bits are present in the entire PIR), it does:
> > + *		read, read, read, read, xchg, xchg, xchg, xchg
> > + * instead of:
> > + *		read, xchg, read, xchg, read, xchg, read, xchg
> > + */
> > +static __always_inline inline bool handle_pending_pir(u64 *pir, struct pt_regs *regs)
> > +{
> > +	int i, vec = FIRST_EXTERNAL_VECTOR;
> > +	unsigned long pir_copy[4];
> > +	bool handled = false;
> > +
> > +	for (i = 0; i < 4; i++)
> > +		pir_copy[i] = pir[i];
> > +
> > +	for (i = 0; i < 4; i++) {
> > +		if (!pir_copy[i])
> > +			continue;
> > +
> > +		pir_copy[i] = arch_xchg(pir, 0);
>
> Here is a problem that pir_copy[i] will always be written as pir[0].
> This leads to handle spurious posted MSIs later.

Yes, you are right. It should be

	pir_copy[i] = arch_xchg(&pir[i], 0);

Will fix in v2, really appreciated.

> > +		handled = true;
> > +	}
> > +
> > +	if (handled) {
> > +		for_each_set_bit_from(vec, pir_copy, FIRST_SYSTEM_VECTOR)
> > +			call_irq_handler(vec, regs);
> > +	}
> > +
> > +	return handled;
> > +}
> > +
> > +/*
> > + * Performance data shows that 3 is good enough to harvest 90+% of the benefit
> > + * on high IRQ rate workload.
> > + */
> > +#define MAX_POSTED_MSI_COALESCING_LOOP 3
> > +
> > +/*
> > + * For MSIs that are delivered as posted interrupts, the CPU notifications
> > + * can be coalesced if the MSIs arrive in high frequency bursts.
> > + */
> > +DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
> > +{
> > +	struct pt_regs *old_regs = set_irq_regs(regs);
> > +	struct pi_desc *pid;
> > +	int i = 0;
> > +
> > +	pid = this_cpu_ptr(&posted_interrupt_desc);
> > +
> > +	inc_irq_stat(posted_msi_notification_count);
> > +	irq_enter();
> > +
> > +	/*
> > +	 * Max coalescing count includes the extra round of handle_pending_pir
> > +	 * after clearing the outstanding notification bit. Hence, at most
> > +	 * MAX_POSTED_MSI_COALESCING_LOOP - 1 loops are executed here.
> > +	 */
> > +	while (++i < MAX_POSTED_MSI_COALESCING_LOOP) {
> > +		if (!handle_pending_pir(pid->pir64, regs))
> > +			break;
> > +	}
> > +
> > +	/*
> > +	 * Clear outstanding notification bit to allow new IRQ notifications,
> > +	 * do this last to maximize the window of interrupt coalescing.
> > +	 */
> > +	pi_clear_on(pid);
> > +
> > +	/*
> > +	 * There could be a race of PI notification and the clearing of ON bit,
> > +	 * process PIR bits one last time such that handling the new interrupts
> > +	 * are not delayed until the next IRQ.
> > +	 */
> > +	handle_pending_pir(pid->pir64, regs);
> > +
> > +	apic_eoi();
> > +	irq_exit();
> > +	set_irq_regs(old_regs);
> >  }
> >  #endif /* X86_POSTED_MSI */

Thanks,

Jacob
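
P.S. For anyone following the thread, here is a minimal user-space sketch of the indexing problem and the fix discussed above. It is not the kernel code: C11 atomic_exchange() stands in for arch_xchg(), and PIR_WORDS, snapshot_pir_buggy() and snapshot_pir_fixed() are names made up for illustration. Both variants keep the "read, read, read, read, then xchg" ordering from the comment in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the 256-bit PIR is modeled as 4 x 64-bit words. */
#define PIR_WORDS 4

/*
 * Buggy variant, mirroring the quoted hunk: the exchange always targets
 * word 0, so copy[i] receives whatever word 0 holds at that moment and
 * word i itself is never cleared.
 */
static bool snapshot_pir_buggy(_Atomic uint64_t *pir, uint64_t *copy)
{
	bool handled = false;
	int i;

	/* Plain reads first, as in the patch. */
	for (i = 0; i < PIR_WORDS; i++)
		copy[i] = atomic_load(&pir[i]);

	for (i = 0; i < PIR_WORDS; i++) {
		if (!copy[i])
			continue;
		copy[i] = atomic_exchange(&pir[0], 0);	/* wrong word */
		handled = true;
	}
	return handled;
}

/* Fixed variant: exchange the word that was actually found non-zero. */
static bool snapshot_pir_fixed(_Atomic uint64_t *pir, uint64_t *copy)
{
	bool handled = false;
	int i;

	for (i = 0; i < PIR_WORDS; i++)
		copy[i] = atomic_load(&pir[i]);

	for (i = 0; i < PIR_WORDS; i++) {
		if (!copy[i])
			continue;
		copy[i] = atomic_exchange(&pir[i], 0);	/* clears word i */
		handled = true;
	}
	return handled;
}
```

With pending bits in words 0 and 2, the fixed variant returns both words in the copy and leaves the PIR all zero. The buggy variant overwrites copy[2] with the (already cleared) contents of word 0 and leaves the bit in word 2 set, so the copy no longer matches what was actually pending.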