Date: Wed, 30 May 2018 23:56:09 +0200 (CEST)
From: Thomas Gleixner
To: Song Liu
cc: Song Liu, Tariq Toukan, Dmitry Safonov <0x7f454c46@gmail.com>, open list, Maor Gottlieb, Kernel Team
Subject: Re: WARNING and PANIC in irq_matrix_free
In-Reply-To: <7954157A-E523-4041-825B-828A3A38E51B@fb.com>
References: <16f47fa4-1555-cddb-3dfb-7d56fb992ea1@mellanox.com> <09A22A95-6BBD-48EB-A2FE-42BF6244F751@fb.com> <3F47F523-64C5-422B-B9B0-73B8D105CF71@fb.com> <7954157A-E523-4041-825B-828A3A38E51B@fb.com>
User-Agent: Alpine 2.21 (DEB 202 2017-01-01)
X-Mailing-List: linux-kernel@vger.kernel.org

Song,

On
Tue, 29 May 2018, Song Liu wrote:
> > On May 29, 2018, at 1:35 AM, Thomas Gleixner wrote:
> >> Maybe we cannot enable all trace points under irq_vectors/ and irq_matrix.
> >
> > Right. Sorry, I forgot to say that we only need the irq_vectors ones which
> > are related to vector allocation, i.e.: irq_vectors/vector_*
>
> Here is the ftrace dump:

Thanks for providing the data!

> 19d... 1610359248us : vector_deactivate: irq=31 is_managed=0 can_reserve=1 reserve=0
> 19d... 1610359248us : vector_clear: irq=31 vector=33 cpu=20 prev_vector=0 prev_cpu=2
> 19d... 1610359249us : irq_matrix_free: bit=33 cpu=20 online=1 avl=201 alloc=0 managed=0 online_maps=56 global_avl=11241, global_rsvd=25, total_alloc=15

Here IRQ 31 is shutdown and the vector freed.

> 19d... 1610359249us : irq_matrix_reserve: online_maps=56 global_avl=11241, global_rsvd=26, total_alloc=15
> 19d... 1610359249us : vector_reserve: irq=31 ret=0
> 19d... 1610359249us : vector_config: irq=31 vector=239 cpu=0 apicdest=0x00000000

And set to the magic reservation vector 239 to catch spurious interrupts.

> 20dN.. 1610366654us : vector_activate: irq=31 is_managed=0 can_reserve=1 reserve=0
> 20dN.. 1610366654us : vector_alloc: irq=31 vector=4294967268 reserved=1 ret=0
> 20dN.. 1610366655us : irq_matrix_alloc: bit=33 cpu=9 online=1 avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, global_rsvd=28, total_alloc=16
> 20dN.. 1610366655us : vector_update: irq=31 vector=33 cpu=9 prev_vector=0 prev_cpu=20
> 20dN.. 1610366656us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
> 20dN.. 1610366656us : vector_config: irq=31 vector=33 cpu=9 apicdest=0x00000014

So here it gets initialized again and targets CPU9 now.

> 20dN.. 1610366662us : irq_matrix_alloc: bit=33 cpu=20 online=1 avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, global_rsvd=28, total_alloc=16
> 20dN.. 1610366662us : vector_update: irq=31 vector=33 cpu=20 prev_vector=33 prev_cpu=9
> 20dN.. 1610366662us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
> 20dN.. 1610366662us : vector_config: irq=31 vector=33 cpu=20 apicdest=0x0000002c

Here it is reconfigured to CPU 20. Now that update schedules vector 33 on
CPU9 for cleanup.

> 20dN.. 1610366666us : irq_matrix_alloc: bit=34 cpu=2 online=1 avl=199 alloc=2 managed=0 online_maps=56 global_avl=11239, global_rsvd=28, total_alloc=17
> 20dN.. 1610366666us : vector_update: irq=31 vector=34 cpu=2 prev_vector=33 prev_cpu=20
> 20dN.. 1610366666us : vector_alloc: irq=31 vector=34 reserved=1 ret=0
> 20dN.. 1610366666us : vector_config: irq=31 vector=34 cpu=2 apicdest=0x00000004

So here the shit hits the fan because that update schedules vector 33 on
CPU20 for cleanup while the previous cleanup for CPU9 has not been done
yet. Cute. or not so cute....

> 20dNh. 1610366669us : vector_free_moved: irq=31 cpu=20 vector=33 is_managed=0
> 20dNh. 1610366670us : irq_matrix_free: bit=33 cpu=20 online=1 avl=201 alloc=0 managed=0 online_maps=56 global_avl=11240, global_rsvd=28, total_alloc=16

And frees the CPU 20 vector

>  9d.h. 1610366696us : vector_free_moved: irq=31 cpu=20 vector=0 is_managed=0

And then CPU9 claims that it's queued for cleanup. Bah.

I'm still working on a fix as the elegant solution refused to work.

Thanks,

	tglx
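
The pattern analysed above (a second vector_update arriving before the cleanup scheduled by the previous one has run) can also be spotted mechanically in a trace. What follows is a minimal Python sketch, not part of the original mail and not kernel code. Only the event names and field names are taken from the quoted ftrace output; the function name find_overlapping_moves, the regular expressions, and the "overlap"/"stale" labels are assumptions made for illustration. It also assumes that vector_free_moved reporting vector=0 indicates a cleanup that found nothing left to free, which is how the last trace line is read above.

```python
import re
from collections import defaultdict

# Matches e.g. "20dN.. 1610366662us : vector_update: irq=31 vector=33 ..."
# An optional leading "> " quote marker is tolerated so mail text can be pasted.
EVENT_RE = re.compile(
    r"^\s*(?:>\s*)*(?P<cpu>\d+)\S*\s+(?P<ts>\d+)us\s*:\s*"
    r"(?P<event>\w+):\s*(?P<fields>.*)$"
)
FIELD_RE = re.compile(r"(\w+)=(\w+)")

# The vector_update / vector_free_moved events from the trace quoted above.
TRACE = """
20dN.. 1610366655us : vector_update: irq=31 vector=33 cpu=9 prev_vector=0 prev_cpu=20
20dN.. 1610366662us : vector_update: irq=31 vector=33 cpu=20 prev_vector=33 prev_cpu=9
20dN.. 1610366666us : vector_update: irq=31 vector=34 cpu=2 prev_vector=33 prev_cpu=20
20dNh. 1610366669us : vector_free_moved: irq=31 cpu=20 vector=33 is_managed=0
 9d.h. 1610366696us : vector_free_moved: irq=31 cpu=20 vector=0 is_managed=0
"""


def find_overlapping_moves(trace_lines):
    """Return a list of (kind, timestamp_us, irq, detail) tuples.

    kind == "overlap": a vector_update with a non-zero prev_vector arrived
        while an earlier move of the same IRQ was still awaiting cleanup;
        detail is a tuple of the (cpu, vector) pairs that were pending.
    kind == "stale": vector_free_moved ran but reported vector=0;
        detail is the CPU the event was logged on.
    """
    pending = defaultdict(list)  # irq -> [(prev_cpu, prev_vector), ...]
    findings = []
    for line in trace_lines:
        m = EVENT_RE.match(line)
        if not m:
            continue  # blank lines and commentary are skipped
        fields = dict(FIELD_RE.findall(m["fields"]))
        ts = int(m["ts"])
        if m["event"] == "vector_update":
            irq = int(fields["irq"])
            prev = (int(fields["prev_cpu"]), int(fields["prev_vector"]))
            if prev[1] == 0:
                continue  # nothing was assigned before, no cleanup needed
            if pending[irq]:
                findings.append(("overlap", ts, irq, tuple(pending[irq])))
            pending[irq].append(prev)
        elif m["event"] == "vector_free_moved":
            irq = int(fields["irq"])
            freed = (int(fields["cpu"]), int(fields["vector"]))
            if freed[1] == 0:
                findings.append(("stale", ts, irq, int(m["cpu"])))
            elif freed in pending[irq]:
                pending[irq].remove(freed)
    return findings


if __name__ == "__main__":
    for finding in find_overlapping_moves(TRACE.splitlines()):
        print(finding)
```

On the five events above the sketch reports one "overlap" at 1610366666us (the move to CPU 2 while vector 33 on CPU 9 was still pending) and one "stale" cleanup logged on CPU 9 at 1610366696us. It deliberately ignores vector_clear and vector_deactivate, so it is only suitable for short excerpts like this one.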