From: Oded Gabbay
Date: Thu, 4 Aug 2022 20:48:28 +0300
Subject: Re: New subsystem for acceleration devices
To: Jason Gunthorpe
Cc: Dave Airlie, dri-devel, Greg Kroah-Hartman, Yuji Ishikawa, Jiho Chu, Arnd Bergmann, "Linux-Kernel@Vger. Kernel. Org"
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 4, 2022 at 5:50 PM Jason Gunthorpe wrote:
>
> On Thu, Aug 04, 2022 at 10:43:42AM +0300, Oded Gabbay wrote:
>
> > After all, memory management services, or common device chars handling
> > I can get from other subsystems (e.g. rdma) as well. I'm sure I could
> > model my uAPI to be rdma uAPI compliant (I can define proprietary uAPI
> > there as well), but this doesn't mean I belong there, right ?
>
> You sure can, but there is still an expectation, eg in RDMA, that your
> device has a similarity to the established standards (like roce in
> habana's case) that RDMA is geared to support.
>
> I think the most important thing to establish a new subsystem is
> to actually identify what commonalities it is supposed to be
> providing. Usually this is driven by some standards body, but the
> AI/ML space hasn't gone in that direction at all yet.
I agree. In the AI world no such standard exists and I don't see
anything on the horizon. There are the AI frameworks/compilers, which
are 30,000 feet above us, and there is CUDA, which is closed-source and
I have no idea what it does inside.
>
> We don't need a "subsystem" to have a bunch of drivers expose chardevs
> with completely unique ioctls.
I totally agree with this sentence and this is *exactly* why
personally I don't want to use DRM: when I look at the long
list of common IOCTLs in drm.h, I don't find anything that I can use.
Each one is either not relevant at all to my h/w or is something
that our h/w implemented differently.

This is in contrast to rdma, where, as you said, we have the ibverbs
API. So, when you asked that we write an IBverbs driver I understood
the reasoning. There is a common user-space library which talks to the
rdma drivers, all the rdma applications use that library, and once I
write a (somewhat) standard driver, I can hopefully enjoy
all that.

>
> The flip is true of DRM - DRM is pretty general. I bet I could
> implement an RDMA device under DRM - but that doesn't mean it should
> be done.
>
> My biggest concern is that this subsystem not turn into a back door
> for a bunch of abuse of kernel APIs going forward. Though things are
How do you suggest we make sure it won't happen?

> better now, we still see this in DRM where expediency or performance
> justifies hacky shortcuts instead of good in-kernel architecture.
> At least DRM has reliable and experienced review these days.
Definitely. DRM has some parts that are really well written. For
example, the whole char device handling with sysfs/debugfs and managed
resources code. This is something I would gladly either use or
copy-paste into the hw accel subsystem.
And of course more pairs of eyes looking at the code will usually
produce better code.

I think that it is clear from my previous email what I intended to
provide. A clean, simple framework for devices to register with, get
services for the most basic stuff such as device char handling,
sysfs/debugfs. Later on, add more simple stuff such as memory manager
and uapi for memory handling. I guess someone can say all that exists
in drm, but like I said it exists in other subsystems as well.

I want to be perfectly honest and say there is nothing special here
for AI. It's actually the opposite, it is a generic framework for
compute only. Think of it as an easier path to upstream if you just
want to do compute acceleration. Maybe in the future it will be more,
but I can't predict the future.

If that's not enough for a new subsystem, fair enough, I'll withdraw my offer.

Thanks,
Oded

>
> Jason