From: Olof Johansson
Date: Wed, 23 Jan 2019 15:40:25 -0800
Subject: Re: [PATCH 00/15] Habana Labs kernel driver
To: Jerome Glisse
Cc: Dave Airlie, Oded Gabbay, Greg Kroah-Hartman, Daniel Vetter, LKML, ogabbay@habana.ai, Arnd Bergmann, fbarrat@linux.ibm.com, Andrew Donnellan
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 23, 2019 at 3:20 PM Jerome Glisse wrote:
>
> On Wed, Jan 23, 2019 at 03:04:33PM -0800, Olof Johansson wrote:
> > On Wed, Jan 23, 2019 at 2:45 PM Dave Airlie wrote:
> > >
> > > On Thu, 24 Jan 2019 at 08:32, Oded Gabbay wrote:
> > > >
> > > > On Thu, Jan 24, 2019 at 12:02 AM Dave Airlie wrote:
> > > > >
> > > > > Adding Daniel as well.
> > > > >
> > > > > Dave.
> > > > >
> > > > > On Thu, 24 Jan 2019 at 07:57, Dave Airlie wrote:
> > > > > >
> > > > > > On Wed, 23 Jan 2019 at 10:01, Oded Gabbay wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > For those who don't know me, my name is Oded Gabbay (Kernel Maintainer
> > > > > > > for AMD's amdkfd driver, worked at RedHat's Desktop group) and I work at
> > > > > > > Habana Labs since its inception two and a half years ago.
> > > > > >
> > > > > > Hey Oded,
> > > > > >
> > > > > > So this creates a driver with a userspace facing API via ioctls.
> > > > > > Although this isn't a "GPU" driver we have a rule in the graphics
> > > > > > drivers are for accelerators that we don't merge userspace API with an
> > > > > > appropriate userspace user.
> > > > > >
> > > > > > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
> > > > > >
> > > > > > I see nothing in these accelerator drivers that make me think we
> > > > > > should be treating them different.
> > > > > >
> > > > > > Having large closed userspaces that we have no insight into means we
> > > > > > get suboptimal locked for ever uAPIs. If someone in the future creates
> > > > > > an open source userspace, we will end up in a place where they get
> > > > > > suboptimal behaviour because they are locked into a uAPI that we can't
> > > > > > change.
> > > > > >
> > > > > > Dave.
> > > >
> > > > Hi Dave,
> > > > While I always appreciate your opinion and happy to hear it, I totally
> > > > disagree with you on this point.
> > > >
> > > > First of all, as you said, this device is NOT a GPU. Hence, I wasn't
> > > > aware that this rule might apply to this driver or to any other driver
> > > > outside of drm. Has this rule been applied to all the current drivers
> > > > in the kernel tree with userspace facing API via IOCTLs, which are not
> > > > in the drm subsystem ? I see the logic for GPUs as they drive the
> > > > display of the entire machine, but this is an accelerator for a
> > > > specific purpose, not something generic as GPU. I just don't see how
> > > > one can treat them in the same way.
> > >
> > > The logic isn't there for GPUs for those reason that we have an
> > > established library or that GPUs are in laptops.
> > > They are just where
> > > we learned the lessons of merging things whose primary reason for
> > > being in the kernel is to execute stuff from misc userspace stacks,
> > > where the uAPI has to remain stable indefinitely.
> > >
> > > a) security - without knowledge of what the accelerator can do how can
> > > we know if the API you expose isn't just a giant root hole?
> > >
> > > b) uAPI stability. Without a userspace for this, there is no way for
> > > anyone even if in possession of the hardware to validate the uAPI you
> > > provide and are asking the kernel to commit to supporting indefinitely
> > > is optimal or secure. If an open source userspace appears is it to be
> > > limited to API the closed userspace has created. It limits the future
> > > unnecessarily.
> > >
> > > > There is no way that "someone" will create a userspace
> > > > for our H/W without the intimate knowledge of the H/W or without the
> > > > ISA of our programmable cores. Maybe for large companies this request
> > > > is valid, but for startups complying to this request is not realistic.
> > >
> > > So what benefit does the Linux kernel get from having support for this
> > > feature upstream?
> > >
> > > If users can't access the necessary code to use it, why does this
> > > require to be maintained in the kernel.
> > >
> > > > To conclude, I think this approach discourage other companies from
> > > > open sourcing their drivers and is counter-productive. I'm not sure
> > > > you are aware of how difficult it is to convince startup management to
> > > > opensource the code...
> > >
> > > Oh I am, but I'm also more aware how quickly startups go away and
> > > leave the kernel holding a lot of code we don't know how to validate
> > > or use.
> > >
> > > I'm opening to being convinced but I think defining new userspace
> > > facing APIs is a task that we should take a lot more seriously going
> > > forward to avoid mistakes of the past.
> >
> > I think the most important thing here is to know that things are
> > likely to change quite a bit over the next couple of years, and that
> > we don't know yet what we actually need. If we hold off picking up
> > support for hardware while all of this is ironed out, we'll miss out
> > on being exposed to it, and will have a very tall hill to climb once
> > we try to convince vendors to come into the fold. It's also not been a
> > requirement for the other two drivers we have merged, as far as I can
> > tell (CAPI and OpenCAPI) so the cat's already out of the bag.
> >
> > I'd rather not get stuck in a stand-off needing the longterm solution
> > to pick up the short term contribution. That way we can move over to a
> > _new_ API once there's been a better chance of finding common grounds
> > and once things settle down a bit, instead of trying to bring some
> > larger legacy codebase for devices that people might no longer care
> > much about over to the newer APIs.
> >
> > It's better to be exposed to the HW and drivers now, than having
> > people build large elaborate out-of-tree software stacks for this.
> > It's also better to get them to come and collaborate now, instead of
> > pushing them away until things are perfect.
> >
> > Having a way to validate and exercise the userspace API is important,
> > including ability to change it if needed. Would it be possible to open
> > up the lowest userspace pieces (driver interactions), even if some
> > other layers might not yet be, to exercise the device/kernel/userspace
> > interfaces without "live" workload, etc?
>
> Yes and to exercise the userspace API you need at very least to
> know the ISA so that you can write program for the accelerator.
> You also need to know the set of commands the hardware has. The
> ioctl and how to create a userspace that interact with the kernel
> is the easy part, the hard part is the compiler.
>
> So if we want any kind of freedom to play with the UAPI, enhance
> it or change it in anyway we must be free to build program for the
> device ourself.
>
> I believe that the GPU sub-system requirement are a good guideline
> to follow and the only exception with drivers/ that i am aware of
> is the fpga. Everything else in driver as either an open source
> userspace, expose a common API (like network) or is so simple that
> anyone can write a userspace for it.

Once we have a common framework I agree that we need enough tools to
exercise everything needed. I don't agree that this includes full
sources to everything. We don't expect this for most PCIe cards today
either.

If the GPU subsystem is to be followed, I fear that we will end up
with Nvidia-equivalent vendors from day 1, where they will just build
a bigger and bigger software stack on the side instead of joining in,
and someone will need to best-effort bridge the gap by reverse
engineering.

I don't want that situation long-term, which is why I think it's
reasonable to be more relaxed during the early days, with upfront,
clear expectations for the longer term that hardware/kernel
interfaces need to be exercisable.

> For any complex device that execute program we should really enforce
> the open source userspace so that we can properly audit the driver
> as otherwise we only have half of the story with no idea what the
> other half might implies.

What you're demanding is open userspace _and_ firmware, since without
firmware sources you can't audit any on-chip behavior either (in
reality, most commands passed down are likely parsed by said
firmware).


-Olof