From: Thomas Gleixner
To: Dexuan Cui, 'Saeed Mahameed', 'Leon Romanovsky'
Cc: "'linux-pci@vger.kernel.org'", "'netdev@vger.kernel.org'", "'x86@kernel.org'", Haiyang Zhang, "'linux-kernel@vger.kernel.org'"
Subject: RE: [5.14-rc1] mlx5_core receives no interrupts with maxcpus=8
Date: Sat, 28 Aug 2021 22:44:09 +0200
Message-ID: <87tuj9guzq.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

Dexuan,

On Sat, Aug 28 2021 at 01:53, Thomas Gleixner wrote:
> On Thu, Aug 19 2021 at 20:41, Dexuan Cui wrote:
>>> Sorry for the late response! I checked the below sys file, and the output is
>>> exactly the same in the good/bad cases -- in both cases, I use maxcpus=8;
>>> the only difference in the good case is that I online and then offline CPU 8~31:
>>> for i in `seq 8 31`; do echo 1 > /sys/devices/system/cpu/cpu$i/online; done
>>> for i in `seq 8 31`; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
>>>
>>> # cat /sys/kernel/debug/irq/irqs/209

Yes, that looks correct.

>>
>> I tried the kernel parameter "intremap=nosid,no_x2apic_optout,nopost" but
>> it didn't help. Only "intremap=off" can work round the no interrupt issue.
>>
>> When the no interrupt issue happens, irq 209's effective_affinity_list is 5.
>> I modified modify_irte() to print the irte->low, irte->high, and I also printed
>> the irte_index for irq 209, and they were all normal to me, and they were
>> exactly the same in the bad case and the good case -- it looks like, with
>> "intremap=on maxcpus=8", MSI-X on CPU5 can't work for the NIC device
>> (MSI-X on CPU5 works for other devices like a NVMe controller) , and somehow
>> "onlining and then offlining CPU 8~31" can "fix" the issue, which is really weird.

Just for the record: maxcpus=N is a dangerous boot option as it leaves
the CPUs which were not brought up in a state where they can be hit by
MCE broadcasting without being able to act on it. Which means you're
operating the system out of spec.

According to your debug output the interrupt in question belongs to the
INTEL-IR-3 interrupt domain, which means it hangs off IOMMU3, aka DMAR
unit 3.

To which DMAR/remap unit are the other, unaffected devices connected?

Thanks,

        tglx
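
A rough sketch of one way the DMAR unit question could be checked. It
assumes the per-IRQ debugfs files under /sys/kernel/debug/irq/irqs/
contain a "domain:  INTEL-IR-<n>" line for remapped interrupts, as the
debug output discussed above suggests. The function name
irq_remap_units is made up for illustration and is not an existing tool.

```shell
#!/bin/sh
# Print "<irq> INTEL-IR-<n>" for every IRQ whose debugfs file names an
# INTEL-IR-<n> interrupt remapping domain.
#
# Assumption: the files contain a line of the form
#   "domain:  INTEL-IR-<n>"
# (possibly indented, in the parent part of the hierarchy). Names such as
# INTEL-IR-MSI-<n>-<m> are deliberately not matched.
#
# $1: directory holding the per-IRQ files
#     (defaults to /sys/kernel/debug/irq/irqs)
irq_remap_units() {
    dir="${1:-/sys/kernel/debug/irq/irqs}"
    for f in "$dir"/*; do
        # Skips the unexpanded glob when the directory is empty or missing.
        [ -f "$f" ] || continue
        unit=$(sed -n 's/^[[:space:]]*domain:[[:space:]]*\(INTEL-IR-[0-9][0-9]*\)[[:space:]]*$/\1/p' "$f" | head -n 1)
        if [ -n "$unit" ]; then
            printf '%s %s\n' "$(basename "$f")" "$unit"
        fi
    done
}
```

Run as root with debugfs mounted, e.g. "irq_remap_units | sort -k2".
Comparing the unit printed for irq 209 with the units printed for the
interrupts of the unaffected devices would show whether they sit behind
the same remapping unit.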