Date: Thu, 7 Dec 2023 10:54:02 -0800
From: Saeed Mahameed
To: Aron Silverton
Cc: Jakub Kicinski, Greg Kroah-Hartman, Jason Gunthorpe, David Ahern,
    Arnd Bergmann, Leon Romanovsky, Jiri Pirko, Leonid Bloch, Itay Avraham,
    linux-kernel@vger.kernel.org, Saeed Mahameed
Subject: Re: [PATCH V3 2/5] misc: mlx5ctl: Add mlx5ctl misc driver

On 07 Dec 10:41, Aron Silverton wrote:
>On Tue, Dec 05, 2023 at 08:48:55PM -0800, Jakub Kicinski wrote:
>> On Tue, 5 Dec 2023 11:11:00 -0600 Aron Silverton wrote:
>> > 1. As mentioned already, we recently faced a complex problem with RDMA
>> > in KVM and were getting nowhere trying to debug using the usual methods.
>> > Mellanox support was able to use this debug interface to see what was
>> > happening on the PCI bus and prove that the issue was caused by
>> > corrupted PCIe transactions.
>> > This finally put the investigation on the
>> > correct path. The debug interface was used consistently and extensively
>> > to test theories about what was happening in the system and, ultimately,
>> > allowed the problem to be solved.
>>
>> You hit on an important point, and what is also my experience working
>> at Meta. I may have even mentioned it in this thread already.
>> If there is a serious issue with a complex device, there are two ways
>> you can get support - dump all you can and send the dump to the vendor
>> or get on a live debugging session with their engineers. Users' ability
>> to debug those devices is practically non-existent. The idea that we
>> need access to FW internals is predicated on the assumption that we
>> have an ability to make sense of those internals.
>>
>> Once you're on a support call with the vendor - just load a custom
>> kernel, module, whatever, it's already extremely expensive manual labor.
>>
>> > 2. We've faced RDMA issues related to lost EQ doorbells, requiring
>> > complex debug, and ultimately root-caused as a defective CPU. Without
>> > interactive access to the device allowing us to test theories like,
>> > "what if we manually restart the EQ", we could not have proven this
>> > definitively.
>>
>> I'm not familiar with the RDMA debugging capabilities. Perhaps there
>> are some gaps there. The more proprietary the implementation the harder
>> it is to debug. An answer to that would be "try to keep as much as
>> possible open".. and interfaces which let closed user space talk to
>> closed FW take us in the opposite direction.
>>
>> FWIW good netdevice drivers have a selftest which tests IRQ generation
>> and EQ handling. I think that'd cover the case you're describing?
>> IDK if mlx5 has them, but if it doesn't definitely worth adding. And I
>> recommend running those on suspicious machines (ethtool -t, devlink has
>> some selftests, too)
>
>Essentially, a warning light, and that doesn't solve the underlying
>problem.
>We still need experts (e.g., vendors) to investigate with their
>toolsets when and where the problem occurs.
>
>I offered this as an example of one issue we solved. I cannot predict
>what kind of issues will pop up in the future, and writing a self-test
>for every possible situation is impossible by definition.
>
>>
>> > Firstly, We believe in working upstream and all of the advantages that
>> > that brings to all the distros as well as to us and our customers.
>> >
>> > Secondly, Our cloud business offers many types of machine instances,
>> > some with bare metal/vfio mlx5 devices, that require customer driven
>> > debug and we want our customers to have the freedom to choose which OS
>> > they want to use.
>>
>> I understand that having everything packaged and shipped together makes
>> life easier.
>
>I think it is a requirement. We operate with Secure Boot. The kernel is
>locked down. We don't have debugfs access, even if it were sufficient,
>and we cannot compile and load modules. Even without Secure Boot, there
>may not be a build environment available.
>
>We really need the module ready-to-go when the debug calls for it - no
>building, no reboots, no months long attempts to reproduce in some lab -
>just immediate availability of the debug interface on the affected
>machine.
>
>>
>> If the point of the kernel at this stage of its evolution is to collect
>> incompatible bits of vendor software, make sure they build cleanly and
>> ship them to distros - someone should tell me, and I will relent.
>
>I'm not sure I follow you... The mlx5ctl driver seems very compatible
>with the mlx5 device driver. I may be misunderstanding.
>

mlx5ctl is 100% compatible with the mlx5 ConnectX open spec [1] and
supports any mlx5-driven stack, not only netdev. It can expose millions
of objects and device states interactively. If we even tried to
accommodate some of these objects or states via debugfs, debugfs would
explode, not to mention that it is impossible to maintain a stable
debugfs output for such a huge data set. The mlx5ctl interface, by
contrast, speaks a clear and open ConnectX language, which is the whole
point of the driver.

ConnectX is a highly programmable device for the end user, and we have a
very open and accommodating policy: an advanced user who can read the
open spec [1] will also be able to self-debug their own RDMA/DPU/FPGA
apps or similar use cases.

Also, I would like to repeat: this is not touching netdev, and netdev's
policies do not apply to the greater kernel or RDMA. We have use cases
with pure-infiniband/DPU/FPGA cards that have no netdev at all, other
cases with pure virtio instances, and much more.

[1] https://network.nvidia.com/files/doc-2020/ethernet-adapters-programming-manual.pdf