References: <20210705130314.11519-1-ogabbay@kernel.org>
    <20210706142357.GN4604@ziepe.ca> <20210706152542.GP4604@ziepe.ca>
In-Reply-To: <20210706152542.GP4604@ziepe.ca>
From: Daniel Vetter
Date: Tue, 6 Jul 2021 17:49:01 +0200
Subject: Re: [PATCH v4 0/2] Add p2p via dmabuf to habanalabs
To: Jason Gunthorpe
Cc: Oded Gabbay, Oded Gabbay, "Linux-Kernel@Vger. Kernel. Org",
    Greg Kroah-Hartman, Sumit Semwal, Christian König, Gal Pressman,
    sleybo@amazon.com, Maling list - DRI developers, linux-rdma,
    Linux Media Mailing List, Doug Ledford, Dave Airlie, Alex Deucher,
    Leon Romanovsky, Christoph Hellwig, amd-gfx list,
    "moderated list:DMA BUFFER SHARING FRAMEWORK"
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 6, 2021 at 5:25 PM Jason Gunthorpe wrote:
> On Tue, Jul 06, 2021 at 04:39:19PM +0200, Daniel Vetter wrote:
> > On Tue, Jul 6, 2021 at 4:23 PM Jason Gunthorpe wrote:
> > >
> > > On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote:
> > >
> > > > If that means AI companies don't want to open our their hw specs
> > > > enough to allow that, so be it - all you get in that case is
> > > > offloading the kernel side of the stack for convenience, with zero
> > > > long term prospects to ever make this into a cross vendor subsystem
> > > > stack that does something useful.
> > >
> > > I don't think this is true at all - nouveau is probably the best
> > > example.
> > >
> > > nouveau reverse engineered a userspace stack for one of these devices.
> > >
> > > How much further ahead would they have been by now if they had a
> > > vendor supported, fully featured, open kernel driver to build the
> > > userspace upon?
> >
> > There is actually tons of example here, most of the arm socs have
> > fully open kernel drivers, supported by the vendor (out of tree).
>
> I choose nouveau because of this:
>
> $ git ls-files drivers/gpu/drm/arm/ | xargs wc -l
> 15039 total
> $ git ls-files drivers/gpu/drm/nouveau/ | xargs wc -l
> 204198 total

drm/arm is the arm display driver, which isn't actually shipping
anywhere afaik. Also it's not including the hdmi/dp output drivers,
those are generally external on socs, but integrated in discrete gpu.

The other thing to keep in mind is that one of these drivers supports
25 years of product generations, and the other one doesn't. So I think
adding it all up it's not that much different.

Last time I looked, if you look at just command submission and
rendering/compute, and don't include display, which heavily skews the
stats, it's about 10% kernel, 90% userspace driver parts. Not including
anything that's shared, which is most of it (compiler frontend,
intermediate optimizer, entire runtime/state tracker and all the
integration and glue pieces largely).

> At 13x the size of mali this is not just some easy to wire up memory
> manager and command submission. And after all that typing it still
> isn't very good. The fully supported AMD vendor driver is over 3
> million lines, so nouveau probably needs to grow several times.

AMD is 3 million lines in size because it includes per-generation
generated header files. And of course once you throw an entire vendor
team at a driver all those engineers will produce something, and
there's the usual problem that the last 10% of features produce about
90% of the complexity and code. E.g.
the kbase driver for arm mali gpu is 20x the size of the in-tree
panfrost driver - they need to keep typing to justify their continued
employment, or something like that. Usually it's because they reinvent
the world.

> My argument is that an in-tree open kernel driver is a big help to
> reverse engineering an open userspace. Having the vendors
> collaboration to build that monstrous thing can only help the end goal
> of an end to end open stack.

Not sure where this got lost, but we're totally fine with vendors using
the upstream driver together with their closed stack. And most of the
drivers we do have in upstream are actually, at least in parts,
supported by the vendor. E.g. if you'd have looked, the drm/arm driver
you picked is actually 100% written by ARM engineers. So kinda an
unfitting example.

> For instance a vendor with an in-tree driver has a strong incentive to
> sort out their FW licensing issues so it can be redistributed.

Nvidia has been claiming to try and sort out the FW problem for years.
They even managed to release a few things, but I think the last one is
2-3 years late now. Partially the reason is that they don't have a
stable api between the firmware and driver, it's all internal from the
same source tree, and they don't really want to change that.

> I'm not sure about this all or nothing approach. AFAIK DRM has the
> worst problems with out of tree drivers right now.

Well I guess someone could stand up a drivers/totally-not-gpu and just
let the flood in. Even duplicated drivers and everything included,
because the vendor drivers are better. Worth a shot, we've practically
started this already, I'm just not going to help with the cleanup.

> > Where it would have helped is if this open driver would come with
> > redistributable firmware, because that is right now the thing making
> > nouveau reverse-engineering painful enough to be non-feasible. Well
> > not the reverse-engineering, but the "shipping the result as a working
> > driver stack".
>
> I don't think much of the out of tree but open drivers. The goal must
> be to get vendors in tree.

Agreed. We actually got them in-tree largely. Nvidia even contributes
the oddball thing, and I think the tegra line is still fully supported
in upstream with the upstream driver. I'm not sure the bleak picture
you're drawing is reality, aside from Nvidia discrete gpu drivers being
a disaster with no redistributable firmware, no open kernel driver that
works, and nothing else really either.

> I would applaud Habana for getting an intree driver at least, even if
> the userspace is not what we'd all want to see.
>
> > I don't think the facts on the ground support your claim here, aside
> > from the practical problem that nvidia is unwilling to even create an
> > open driver to begin with. So there isn't anything to merge.
>
> The internet tells me there is nvgpu, it doesn't seem to have helped.

Not sure which one you mean, but every once in a while they open up a
few headers, or a few programming specs, or a small driver somewhere
for a very specific thing, and then it dies again or gets obfuscated
for the next platform, or just never gets updated. I've never seen
anything that comes remotely close to something complete, aside from
tegra socs, which are fully supported in upstream afaik.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch