Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp242867ybt; Fri, 19 Jun 2020 00:27:05 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxrBm4eYIpvIiXEqZ9y+Liwq1Xgbi76vyBc24d8gGxpKulCY7tFbr3YPk80hPwSXcv0De+t X-Received: by 2002:aa7:c558:: with SMTP id s24mr1859245edr.237.1592551625334; Fri, 19 Jun 2020 00:27:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1592551625; cv=none; d=google.com; s=arc-20160816; b=PwFni9z6CsAR7qzEvwOM+eD5IA5qmhxECTLgDJTzq0yN0croHxwidAzoLyNstVYxaZ SUQ8N9iTJ/qXi6oly9bO22iXUQ3oO+h8k+xTd9XiCx97hBVlnJGCFXYOz2Xqea6OY/Ly oedm2EbH9ZlkOkQbKQWgwmHHBfBBWgQSa0sj+vAtzAnVH3QSJ/pqsrGYpDceIxwn0/IR wp3bolaObs/ZdruY2jvHV/SLWkCXOw6bKLqlW7PFR7PZQ7AcrYIcFjvPDWJZ59m5LkOT ZxLu6fisocon40RLactEaLDhpk1Rg9fjng3d2MjcSfSCu/qJqU6MVaYzS3xfxDEa+Zze Q+rg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=DqEsfHHZMDjVjBSRzofGz02dqQOVqzrxROmLfkgGgfU=; b=jO6qWiHJfVvncLTRJqsWl8PbCWqaJD5G7P4F7Bcp0b7iSUu6a24ZHDR6ZMYUd0zvDr qJXPxTJ39P+qr3euiK/Tx3kF2rBp0o/ecfdUlXz4QSTl2tDPd6JJqlbLOdZXE5/+Cgfy +q31wZZccMEXtnv9P6UtR1oSOfAUpdGpNpU5X8UbADCm99MJFUdmFaJ2RvNqqtT8wKzQ feHebx+s8B7ECYoeLyLBf7UB2FPFnZG7aCPHZm5tceFB6rNVX5LPsZffYRsgwCaSyuds +nzn29S+wPIajiRzHeuRMcEz/QRl1b0SKw99EMsz8/EFbEoA4Jlsy1ga9S9/uS64M/c1 L1KQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=A5HUcVec; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id k18si4092265edk.48.2020.06.19.00.26.42; Fri, 19 Jun 2020 00:27:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@ffwll.ch header.s=google header.b=A5HUcVec; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730650AbgFSHWZ (ORCPT + 99 others); Fri, 19 Jun 2020 03:22:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41542 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730523AbgFSHWV (ORCPT ); Fri, 19 Jun 2020 03:22:21 -0400 Received: from mail-oi1-x241.google.com (mail-oi1-x241.google.com [IPv6:2607:f8b0:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B7797C06174E for ; Fri, 19 Jun 2020 00:22:21 -0700 (PDT) Received: by mail-oi1-x241.google.com with SMTP id a21so7625465oic.8 for ; Fri, 19 Jun 2020 00:22:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=DqEsfHHZMDjVjBSRzofGz02dqQOVqzrxROmLfkgGgfU=; b=A5HUcVec8RYu1bZm00Qb0nJy7vZOa/JILGkLgAuTZ8La1+bbfcPUqNdntXVxb3kR5C Pun8MPh8/elm4LjPbSFd8pHjYzxyAljbGO8JxS8ebVDq1mtIlkNyW1F1f57hRQ1lpcHC J28+KzCYOCc4U1wxOh8Jq++Bii/KSrEjFghhI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DqEsfHHZMDjVjBSRzofGz02dqQOVqzrxROmLfkgGgfU=; b=P1G7LqIJL0LXJf+M4QHGlFsSdjAYkYApUy6vJP0BTv/ZFrL73MKctyJJn/t7oj0irR AGDA8ipQPoW5ktf75sfA0R7aSZs5TES6W8xUFetnjGo2vglpXMAOD/V0nW4Bc9mYbYTj 0AI48f0H0NnhY7Fr3wxQbB3CmKMdDjtDjSiY/PgQFY5nW0fxKaTjoD31zuA2SvzDE8I5 uv/ha4mBTMeQ1IgAbUYQss0RU/wI0lyF8vcy1DvHA3L3ABiOJPjN9bCeLeZRkCvLqO2e vSk+ae2H4KiUXWGudoQRG4YxCOifdWWslPEBW+uGN3tYM3m3gS0nJ/HQP/V/vZ/EMNqU L/6g== X-Gm-Message-State: AOAM530Owqi9l3NnTyEWxY6yMtTG26QFD64SrLEaWeVw+xjJ7ne2dDy2 bOVymyX3oAyb5idngmFQNAxtzthdG5w8pep2egJi/A== X-Received: by 2002:aca:ad97:: with SMTP id w145mr2091861oie.128.1592551341034; Fri, 19 Jun 2020 00:22:21 -0700 (PDT) MIME-Version: 1.0 References: <20200604081224.863494-1-daniel.vetter@ffwll.ch> <20200604081224.863494-5-daniel.vetter@ffwll.ch> <20200611083430.GD20149@phenom.ffwll.local> <20200611141515.GW6578@ziepe.ca> <20200616120719.GL20149@phenom.ffwll.local> <20200617152835.GF6578@ziepe.ca> <20200618150051.GS20149@phenom.ffwll.local> <20200618172338.GM6578@ziepe.ca> In-Reply-To: <20200618172338.GM6578@ziepe.ca> From: Daniel Vetter Date: Fri, 19 Jun 2020 09:22:09 +0200 Message-ID: Subject: Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations To: Jason Gunthorpe Cc: =?UTF-8?Q?Thomas_Hellstr=C3=B6m_=28Intel=29?= , DRI Development , linux-rdma , Intel Graphics Development , Maarten Lankhorst , LKML , amd-gfx list , "moderated list:DMA BUFFER SHARING FRAMEWORK" , Thomas Hellstrom , Daniel Vetter , "open list:DMA BUFFER SHARING FRAMEWORK" , =?UTF-8?Q?Christian_K=C3=B6nig?= , Mika Kuoppala Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 19, 2020 at 8:58 AM Jason Gunthorpe wrote: > > On Thu, Jun 18, 2020 at 05:00:51PM +0200, Daniel Vetter wrote: > > On Wed, Jun 17, 2020 at 12:28:35PM -0300, Jason Gunthorpe wrote: > > > On Wed, Jun 17, 2020 at 08:48:50AM +0200, Daniel Vetter wrote: > > > > > > > Now my understanding for rdma is that if you don't have hw page fault > > > > support, > > > > > > The RDMA ODP feature is restartable HW page faulting just like nouveau > > > has. The classical MR feature doesn't have this. Only mlx5 HW supports > > > ODP today. > > > > > > > It's only gpus (I think) which are in this awkward in-between spot > > > > where dynamic memory management really is much wanted, but the hw > > > > kinda sucks. Aside, about 10+ years ago we had a similar problem with > > > > gpu hw, but for security: Many gpu didn't have any kinds of page > > > > tables to isolate different clients from each another. drivers/gpu > > > > fixed this by parsing&validating what userspace submitted to make sure > > > > it's only every accessing its own buffers. Most gpus have become > > > > reasonable nowadays and do have proper per-process pagetables (gpu > > > > process, not the pasid stuff), but even today there's still some of > > > > the old model left in some of the smallest SoC. > > > > > > But I still don't understand why a dma fence is needed inside the GPU > > > driver itself in the notifier. > > > > > > Surely the GPU driver can block and release the notifier directly from > > > its own command processing channel? > > > > > > Why does this fence and all it entails need to leak out across > > > drivers? > > > > So 10 years ago we had this world of every gpu driver is its own bucket, > > nothing leaks out to the world. But the world had a different idea how > > gpus where supposed to work, with stuff like: > > Sure, I understand DMA fence, but why does a *notifier* need it? > > The job of the notifier is to guarentee that the device it is > connected to is not doing DMA before it returns. > > That just means you need to prove that device is done with the buffer. > > As I've understood GPU that means you need to show that the commands > associated with the buffer have completed. This is all local stuff > within the driver, right? Why use fence (other than it already exists) Because that's the end-of-dma thing. And it's cross-driver for the above reasons, e.g. - device A renders some stuff. Userspace gets dma_fence A out of that (well sync_file or one of the other uapi interfaces, but you get the idea) - userspace (across process or just different driver) issues more rendering for device B, which depends upon the rendering done on device A. So dma_fence A is an dependency and will block this dma operation. Userspace (and the kernel) gets dma_fence B out of this - because unfortunate reasons, the same rendering on device B also needs a userptr buffer, which means that dma_fence B is also the one that the mmu_range_notifier needs to wait on before it can tell core mm that it can go ahead and release those pages - unhappiness ensues, because now the mmu notifier from device B can get hung up on the dma operation device A is doing If you want to avoid this either a) have less shitty hw (not an option, gpus are gpus, it is slowly getting better though) or b) force userspace to stall before handing over to next device (about as uncool) or c) just pin all the memory always, who cares (also rather unpopular, gpus tend to use all the memory they can get). I guess the thing with gpus is that dma operations aren't like read/writes for pretty much everything else, but essentially compute contexts (usually implemented as ringbuffers where you stream stuff into) with cross everything dependencies. This even holds within a single gpu, since pretty much all modern gpus have multiple different engines special on different things. And yup that's directly exposed to userspace, for vulkan and other low-level gpu apis even directly to applications. So dma operation for gpu isn't just "done when the read/write finishes", but pulls in an entire chain of dependencies and ordering that needs to happen before it can even start. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch