Date: Thu, 22 Feb 2018 22:38:49 +0100 (CET)
From: Thomas Gleixner
To: Tariq Toukan
cc: linux-kernel@vger.kernel.org, Maor Gottlieb
Subject: Re: WARNING and PANIC in irq_matrix_free
In-Reply-To: <1e79eaa9-202d-b48c-4463-ae195a4fc891@mellanox.com>
References: <1e79eaa9-202d-b48c-4463-ae195a4fc891@mellanox.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 21 Feb 2018, Tariq Toukan wrote:
> On 20/02/2018 8:18 PM, Thomas Gleixner wrote:
> > On Tue, 20 Feb 2018, Thomas Gleixner wrote:
> > > On Tue, 20 Feb 2018, Tariq Toukan wrote:
> > >
> > > Is there CPU hotplugging in play?
>
> No.

Ok.

> > >
> > > I'll come back to you tomorrow with a plan how to debug that after staring
> > > into the code some more.
> >
> > Do you have a rough idea what the test case is doing?
> >
>
> It arbitrary appears in different flows, like sending traffic or interface
> configuration changes.

Hmm. Looks like memory corruption, but I can't pin point it.

Find below a debug patch which should prevent the crash and might give us
some insight into the type of corruption.

Please enable the irq_matrix and vector allocation trace points.

 echo 1 >/sys/kernel/debug/tracing/events/irq_matrix/enable
 echo 1 >/sys/kernel/debug/tracing/events/irq_vectors/vector*/enable

When the problem triggers the bogus vector is printed and the trace is
frozen. Please provide dmesg and the tracebuffer output.

Thanks,

	tglx

8<--------------
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -822,6 +822,12 @@ static void free_moved_vector(struct api
 	unsigned int cpu = apicd->prev_cpu;
 	bool managed = apicd->is_managed;
 
+	if (vector < FIRST_EXTERNAL_VECTOR || vector >= FIRST_SYSTEM_VECTOR) {
+		tracing_off();
+		pr_err("Trying to clear prev_vector: %u\n", vector);
+		goto out;
+	}
+
 	/*
 	 * This should never happen. Managed interrupts are not
 	 * migrated except on CPU down, which does not involve the
@@ -833,6 +839,7 @@ static void free_moved_vector(struct api
 	trace_vector_free_moved(apicd->irq, cpu, vector, managed);
 	irq_matrix_free(vector_matrix, cpu, vector, managed);
 	per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
+out:
 	hlist_del_init(&apicd->clist);
 	apicd->prev_vector = 0;
 	apicd->move_in_progress = 0;
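
For gathering the requested dmesg and trace buffer output once the problem has triggered, a minimal sketch might look like the following. It is not part of the original mail: the function name collect_debug_output and the TRACEFS and OUTDIR variables are hypothetical, and the tracefs mount point is assumed to be the one used in the echo commands above. Since the debug patch calls tracing_off() when it sees a bogus vector, the tracing_on file is expected to read 0 after a hit, so the sketch saves that as well.

```shell
#!/bin/sh
# Illustrative helper only; names and paths below are assumptions.
# TRACEFS: tracefs mount point (same path as in the echo commands of the mail).
# OUTDIR:  hypothetical directory for the collected files.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"
OUTDIR="${OUTDIR:-/tmp/irq-matrix-debug}"

collect_debug_output() {
    mkdir -p "$OUTDIR" || return 1

    # Expected to contain 0 if the debug patch froze the trace.
    cat "$TRACEFS/tracing_on" > "$OUTDIR/tracing_on.txt" || return 1

    # Snapshot of the (frozen) trace buffer.
    cat "$TRACEFS/trace" > "$OUTDIR/trace.txt" || return 1

    # Kernel log, which should include the "Trying to clear prev_vector" line.
    # Reading it may need root; a failure is reported but not treated as fatal.
    dmesg > "$OUTDIR/dmesg.txt" 2>&1 || echo "dmesg failed (root needed?)" >&2

    return 0
}
```

Run as root after the pr_err message shows up, for example by sourcing the file and calling collect_debug_output, then attach the files from OUTDIR to the reply.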