Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp8045340rwr; Wed, 10 May 2023 17:06:47 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4PEZgK3eN9WZNsysVa9LfXesW7XzMMbFCxWHFkrhEVSeHa8o2mMXAbqr9W7LukfX6TgFVv X-Received: by 2002:a17:902:ce91:b0:1ad:c1c2:7d1a with SMTP id f17-20020a170902ce9100b001adc1c27d1amr1206711plg.63.1683763607020; Wed, 10 May 2023 17:06:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683763607; cv=none; d=google.com; s=arc-20160816; b=puDfuFb/yT90h2aTmRRt3ZbiX4zJ9zAOzUuqTE8AiTJZ8AaRq3U1KEfZNfGUb/NGve k0aXAjTqwQNlQTcLKZreb0lG7n5EzXAM5sletCadx6GE1LMD8VheamzsXJ1XbRfS5zXz l8jzFYYOOTJXhrXltXOsJvqFfPsZD6cdfhdvS+hvLViybHwXAV7kDzuzL77WXXpHTGH5 5XlkdL75pz1WL3+Q4vbmrlTpCtW0hDfgPV6/g7IO3pc6q9nHCEx7PZubt0Wnf0xKB4FZ NF88EsL34WLELIxhClEvIz7zFGJEUQ7KQkDQFLrX+mYjXzTBzv8/kAxr77aiTfhrZU0s VUQQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:references :in-reply-to:subject:cc:to:dkim-signature:dkim-signature:from; bh=uaKrKcom2BFXuduCtXHjmPL2zo+W2OJy2ayP/8xBNV8=; b=WptF6F8VoRayDbP9oIXhSGNMO13Zyuj3iAViFefJpmVryCaSAfZK8r7FPPCgE4eVpf /OGXVP9yDtb7db4h0EE2X69i3OFnJ4tZylnaSb4vGAZfqGyiNc9IClRHndodCq9EGK7m 1dRuhSEdU+pKWUroPRCCMNk5x4s0HwDYE9zxqIlU8aCZR0EKmNR1ozCSvb71Kvssu+x5 m6jgzfTRd2CNA62sGysd9xEHVSZUXJ7l+9pfGoUntIVAGdwbJbMlujEzlC4E+Y4WOnZh SozmgZHlvVfv81bH0cgNyVaTEMT9CvaThz4yiBFYfoIIakM4T7CVENzYPA/yOVhjEu7j 9VkQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="XBfHxqn/"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=pBq2XPTb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x16-20020a170902821000b001a95c7e5ad9si5028251pln.352.2023.05.10.17.06.34; Wed, 10 May 2023 17:06:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="XBfHxqn/"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=pBq2XPTb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236041AbjEJXwe (ORCPT + 99 others); Wed, 10 May 2023 19:52:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229536AbjEJXwd (ORCPT ); Wed, 10 May 2023 19:52:33 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9BE3930CB for ; Wed, 10 May 2023 16:52:31 -0700 (PDT) From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1683762749; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=uaKrKcom2BFXuduCtXHjmPL2zo+W2OJy2ayP/8xBNV8=; b=XBfHxqn/hLdvoVjp6SThnbKPPOUC4JXB/+7PSNHC0ccdieoa4P5nucsg63QeexNswbq07I 30yQpVQC1jEmcg5VIRmhLpsauzhWKf+cizwLmXgj4f4sWiuJca0KpEVGbjDUalmNg+2GzY u0ZyAhVLHD2A590YwLHYunLQpfpJUfXecLNOvMR5Lw7SBF4ZpeztGr9QDDQHmvtyCkwcQm XTXpAkJeFEHMY4JQQ7ol2V8t6Lr4UWY7g1/5Kb0weYgRhhDt5XvwyCR+UfSTo5wrLnhyje +V/y8h0LHK4CeD5ZxRdlDR+nd01UMwk7RwrveUio4wnmOD6lKievc0fQW1H1mw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1683762749; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=uaKrKcom2BFXuduCtXHjmPL2zo+W2OJy2ayP/8xBNV8=; b=pBq2XPTbqdTYY0w4zaegRDbWep+egrwxn2M41WKRU4XqWV1alaSYaygKx9N2l0WFYbBpVg DqfKOwtSrLMNaKBQ== To: Nipun Gupta , "gregkh@linuxfoundation.org" , "maz@kernel.org" , "jgg@ziepe.ca" , "linux-kernel@vger.kernel.org" Cc: "git (AMD-Xilinx)" , "Anand, Harpreet" , "Jansen Van Vuuren, Pieter" , "Agarwal, Nikhil" , "Simek, Michal" , "Gangurde, Abhijit" , "Cascon, Pablo" , Jason Gunthorpe Subject: Re: [PATCH] cdx: add MSI support for CDX bus In-Reply-To: <87ednnes6o.ffs@tglx> References: <20230508140950.12717-1-nipun.gupta@amd.com> <874jom2ash.ffs@tglx> <87bkityxk3.ffs@tglx> <6dd142f8-5a8e-b62c-c629-a3a5859e73b3@amd.com> <87ednnes6o.ffs@tglx> Date: Thu, 11 May 2023 01:52:28 +0200 Message-ID: <878rdveocj.ffs@tglx> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Nipun! On Thu, May 11 2023 at 00:29, Thomas Gleixner wrote: > On Wed, May 10 2023 at 19:34, Nipun Gupta wrote: > #2) That's actually the worse part of it and completely broken versus > device setup > > probe() > cdx_msi_domain_alloc_irqs() > ... > request_irq() { > ... > irq_activate() > irq_chip_write_msi_msg() > ... > queue_work() > ... > } > > enable_irq_in_device() > > <- device raises interrupt and eventually uses an uninitialized > MSI message because the scheduled work has not yet completed. > > That's going to be a nightmare to debug and it's going to happen > once in a blue moon out in the field. Here is another failure scenario: cpu_down() // No scheduling possible as this happens in stomp_machine() context __cpu_disable() // Point of no return set_cpu_online(cpu, false); irq_migrate_all_off_this_cpu() ... set_affinity() That works with your current implementation by some definition of works because you schedule the affinity change async. But what makes sure that this takes effect _before_ the CPU goes into lala land and cannot respond to an interrupt anymore? Nothing, other than pure luck, which is not really a solid engineering principle. While the regular operations can be fixed via the bus_lock mechanics, this one falls flat on its nose because in that context this can't schedule anymore so acquiring a mutex or waiting for some async muck to complete is a no-no. Not the end of the world though. There is a way to handle that gracefully and we need that for the PCI/IMS implementations which will do that via command queues too. irq_migrate_all_off_this_cpu() is invoked from stomp_machine() context with interrupts disabled because the current x86 interrupt management hardware trainwreck requires it at least in the case that interrupt remapping is not available. Of course that got copied to all other architectures... Whether they really require it or not from a hardware POV is a different story. Though they all rely on the stomp_machine() context which prevents concurrent modifications. So from a historical POV all of this makes sense to some extent, but that does not prevent us to fix this and make it an two stage process which can actually handle both worlds. I'm way too tired to think that through, but you get the idea and you are welcome to beat me to it. Thanks, tglx