In-Reply-To: <54ecc470-b205-ea86-1fc3-849c5b144b3b@redhat.com>
From: Nitesh Lal
Date: Thu, 29 Apr 2021 17:44:45 -0400
Subject: Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs
To: Jesse Brandeburg, Thomas Gleixner, frederic@kernel.org, juri.lelli@redhat.com, Marcelo Tosatti, abelits@marvell.com
Cc: Robin Murphy, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, bhelgaas@google.com, linux-pci@vger.kernel.org, rostedt@goodmis.org, mingo@kernel.org, peterz@infradead.org, davem@davemloft.net, akpm@linux-foundation.org, sfr@canb.auug.org.au, stephen@networkplumber.org, rppt@linux.vnet.ibm.com, jinyuqi@huawei.com, zhangshaokun@hisilicon.com, netdev@vger.kernel.org, chris.friesen@windriver.com

On Thu, Apr 15, 2021, at 6:11 PM Nitesh Narayan Lal wrote:
>
>
> On 4/14/21 12:11 PM, Jesse Brandeburg wrote:
> > Nitesh Narayan Lal wrote:
> >
> >>> The original issue as seen, was that if you rmmod/insmod a driver
> >>> *without* irqbalance running, the default irq mask is -1, which means
> >>> any CPU. The older kernels (this issue was patched in 2014) used to use
> >>> that affinity mask, but the value programmed into all the interrupt
> >>> registers "actual affinity" would end up delivering all interrupts to
> >>> CPU0,
> >> So does that mean the affinity mask for the IRQs was different wrt where
> >> the IRQs were actually delivered?
> >> Or, the affinity mask itself for the IRQs after rmmod, insmod was changed
> >> to 0 instead of -1?
> > The smp_affinity was 0xfff, and the kernel chooses which interrupt to
> > place the interrupt on, among any of the bits set.
> >
> >
> > Your description of the problem makes it obvious there is an issue. It
> > appears as if cpumask_local_spread() is the wrong function to use here.
> > If you have any suggestions please let me know.
> >
> > We had one other report of this problem as well (I'm not sure if it's
> > the same as your report)
> > https://lkml.org/lkml/2021/3/28/206
> > https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210125/023120.html

So, to understand further what the problem was with the older kernel
(based on Jesse's description) and whether it is still there, I did some
more digging. Following are some of the findings (kindly correct me if
there is a gap in my understanding):

Part-1: Why was there a problem with the older kernel?
------

With a kernel built on top of the tag v4.0.0 (with Jesse's patch reverted
and irqbalance disabled), if we observe /proc/irq for the ixgbe device
IRQs, there are two things to note:

# No separate effective affinity (it was only introduced as part of the
# 2017 IRQ re-work)
$ ls /proc/irq/86/
affinity_hint  node  p2p1  smp_affinity  smp_affinity_list  spurious

# Multiple CPUs are set in smp_affinity_list and the first CPU is CPU0:
$ /proc/irq/60/p2p1-TxRx-0   0,2,4,6,8,10,12,14,16,18,20,22
$ /proc/irq/61/p2p1-TxRx-1   0,2,4,6,8,10,12,14,16,18,20,22
$ /proc/irq/62/p2p1-TxRx-2   0,2,4,6,8,10,12,14,16,18,20,22
...

Now, if we read the commit message from Thomas's patch that was part of
this IRQ re-work:

fdba46ff: x86/apic: Get rid of multi CPU affinity
"
..
 2) Experiments have shown that the benefit of multi CPU affinity is close
    to zero and in some tests even worse than setting the affinity to a
    single CPU.

    The reason for this is that the delivery targets the APIC with the
    lowest ID first and only if that APIC is busy (servicing an interrupt,
    i.e. ISR is not empty) it hands it over to the next APIC. In the
    conducted tests the vast majority of interrupts ends up on the APIC
    with the lowest ID anyway, so there is no natural spreading of the
    interrupts possible.
"

I think this explains why, even if we have multiple CPUs in the SMP
affinity mask, the interrupts may only land on CPU0.

With Jesse's patch in the kernel, the initial affinity mask that included
multiple CPUs is overwritten with a single CPU. This CPU was previously
selected based on vector_index; later on, this was replaced with logic
where the CPU is fetched from cpumask_local_spread(). Hence, in this case,
the interrupts will be spread across different CPUs.

# listing the IRQ smp_affinity_list on the v4.0.0 kernel with Jesse's patch
$ /proc/irq/60/p2p1-TxRx-0   0
$ /proc/irq/61/p2p1-TxRx-1   1
$ /proc/irq/62/p2p1-TxRx-2   2
...
$ /proc/irq/65/p2p1-TxRx-5   5
$ /proc/irq/66/p2p1-TxRx-6   6
...
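To make the cpumask_local_spread() step concrete, below is a rough sketch
of how a driver-side per-queue CPU selection based on it typically looks.
This is illustrative only and not the actual ixgbe/i40e code:
cpumask_local_spread(), cpumask_of() and irq_set_affinity_hint() are real
kernel APIs, but the function name and the surrounding structure are made
up for this example.

#include <linux/cpumask.h>
#include <linux/interrupt.h>

/*
 * Illustrative sketch only (not the real driver code): pin each queue
 * vector's IRQ to a single CPU picked by cpumask_local_spread().
 */
static void example_spread_queue_irqs(const int *irqs, int nr_queues, int node)
{
        unsigned int cpu;
        int i;

        for (i = 0; i < nr_queues; i++) {
                /* i-th online CPU, preferring CPUs local to 'node' */
                cpu = cpumask_local_spread(i, node);

                /* single-CPU affinity hint instead of the old multi-CPU mask */
                irq_set_affinity_hint(irqs[i], cpumask_of(cpu));
        }
}

Since cpumask_local_spread() simply walks the online CPUs (local node
first, then the remote ones), queue i ends up on the i-th such CPU, which
matches the 0, 1, 2, ... spreading visible in the listing above.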
Part-2: Why this may not be a problem anymore?
------

With the latest kernel, if we check the effective_affinity_list for the
i40e IRQs without irqbalance and with Jesse's patch reverted, it is
already set to a single CPU that is not always 0. This CPU is retrieved
based on the vector allocation count, i.e., we get the CPU that has the
lowest vector allocation count (see the rough sketch at the end of this
mail).

$ /proc/irq/100/i40e-eno1-TxRx-5    28
$ /proc/irq/101/i40e-eno1-TxRx-6    30
$ /proc/irq/102/i40e-eno1-TxRx-7    32
...
$ /proc/irq/121/i40e-eno1-TxRx-18   16
$ /proc/irq/122/i40e-eno1-TxRx-19   18
...

@Jesse do you think the Part-1 findings explain the behavior that you have
observed in the past?

Also, let me know if there are any suggestions or experiments to try here.

--
Thanks
Nitesh
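P.S. here is the rough sketch, referenced under Part-2 above, of the
"lowest vector allocation count" selection. It is conceptual only and not
the kernel's actual matrix-allocator code (the real per-CPU accounting
lives in kernel/irq/matrix.c); example_pick_least_loaded_cpu() and the
alloc_cnt array are made up for illustration.

#include <linux/cpumask.h>

/*
 * Conceptual sketch only: among the CPUs allowed by 'mask', pick the one
 * with the fewest vectors already allocated, so that successive IRQs land
 * on different CPUs instead of piling up on CPU0.  'alloc_cnt' stands in
 * for the per-CPU allocation counters kept by the real allocator.
 */
static unsigned int example_pick_least_loaded_cpu(const struct cpumask *mask,
                                                  const unsigned int *alloc_cnt)
{
        unsigned int cpu, best = cpumask_first(mask);

        for_each_cpu(cpu, mask) {
                if (alloc_cnt[cpu] < alloc_cnt[best])
                        best = cpu;
        }

        /* an empty 'mask' yields nr_cpu_ids here; callers would have to check */
        return best;
}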