Date: Thu, 15 Aug 2019 14:16:22 -0300
From: Jason Gunthorpe
To: Jerome Glisse
Cc: Daniel Vetter, Michal Hocko, Andrew Morton, LKML, Linux MM,
    DRI Development, Intel Graphics Development, Peter Zijlstra,
    Ingo Molnar, David Rientjes, Christian König, Masahiro Yamada,
    Wei Wang, Andy Shevchenko, Thomas Gleixner, Jann Horn, Feng Tang,
    Kees Cook, Randy Dunlap, Daniel Vetter
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
Message-ID: <20190815171622.GL21596@ziepe.ca>
References: <20190814202027.18735-1-daniel.vetter@ffwll.ch>
    <20190814202027.18735-3-daniel.vetter@ffwll.ch>
    <20190814134558.fe659b1a9a169c0150c3e57c@linux-foundation.org>
    <20190815084429.GE9477@dhcp22.suse.cz>
    <20190815130415.GD21596@ziepe.ca>
    <20190815143759.GG21596@ziepe.ca>
    <20190815151028.GJ21596@ziepe.ca>
    <20190815163238.GA30781@redhat.com>
In-Reply-To: <20190815163238.GA30781@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 15, 2019 at 12:32:38PM -0400, Jerome Glisse wrote:
> On Thu, Aug 15, 2019 at 12:10:28PM -0300, Jason Gunthorpe wrote:
> > On Thu, Aug 15, 2019 at 04:43:38PM +0200, Daniel Vetter wrote:
> >
> > > You have to wait for the gpu to finish current processing in
> > > invalidate_range_start. Otherwise there's no point to any of this
> > > really. So the wait_event/dma_fence_wait are unavoidable really.
> >
> > I don't envy your task :|
> >
> > But, what you describe sure sounds like a 'registration cache' model,
> > not the 'shadow pte' model of coherency.
> >
> > The key difference is that a registration cache is allowed to become
> > incoherent with the VMAs because it holds page pins. It is a
> > programming bug in userspace to change VA mappings via mmap/munmap/etc
> > while the device is working on that VA, but it does not harm system
> > integrity because of the page pin.
> >
> > The cache ensures that each initiated operation sees a DMA setup that
> > matches the current VA map when the operation is initiated and allows
> > expensive device DMA setups to be re-used.
> >
> > A 'shadow pte' model (ie hmm) *really* needs device support to
> > directly block DMA access - ie trigger 'device page fault'. ie the
> > invalidate_start should inform the device to enter a fault mode and
> > that is it. If the device can't do that, then the driver probably
> > shouldn't pursue this level of coherency. The driver would quickly get
> > into the messy locking problems like dma_fence_wait from a notifier.
>
> I think here we do not agree on the hardware requirement. For GPU
> we will always need to be able to wait for some GPU fence from inside
> the notifier callback, there is just no way around that for many of
> the GPUs today (I do not see any indication of that changing).

I didn't say you couldn't wait, I was trying to say that the wait
should only be contingent on the HW itself. Ie you can wait on a GPU
page table lock, and you can wait on a GPU page table flush completion
via IRQ.

What is troubling is to wait till some other thread gets a GPU command
completion and decr's a kref on the DMA buffer - which kinda looks
like what this dma_fence() stuff is all about. A driver like that
would have to be super careful to ensure consistent forward progress
toward dma ref == 0 when the system is under reclaim.

ie by running its entire IRQ flow under fs_reclaim locking.

> associated with the mm_struct. In all GPU drivers so far it is a short
> lived lock and nothing blocking is done while holding it (it is just
> about updating page table directory really whether it is filling it or
> clearing it).

The main blocking I expect in a shadow PTE flow is waiting for the HW
to complete invalidations of its PTE cache.

> > It is important to identify what model you are going for as defining a
> > 'registration cache' coherence expectation allows the driver to skip
> > blocking in invalidate_range_start. All it does is invalidate the
> > cache so that future operations pick up the new VA mapping.
> >
> > Intel's HFI RDMA driver uses this model extensively, and I think it is
> > well proven, within some limitations of course.
> >
> > At least, 'registration cache' is the only use model I know of where
> > it is acceptable to skip invalidate_range_end.
>
> Here GPUs are not in the registration cache model, I know it might look
> like it because of GUP but GUP was used just because hmm did not exist
> at the time.

It is not because of GUP, it is because of the lack of
invalidate_range_end. A driver cannot correctly implement the SPTE
model without invalidate_range_end, even if it holds the page pins via
GUP.

So, I've been assuming the few drivers without invalidate_range_end
are trying to do registration caching, rather than assuming they are
broken.

Jason
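
To make the 'registration cache' expectation above concrete, here is a
minimal user-space toy model in Python. It is only a sketch and not
kernel code: the names Registration, RegistrationCache, get() and the
setups counter are hypothetical and do not come from any driver. It
assumes an in-flight operation keeps using the registration it was
handed (the page pin is what makes that safe), while the notifier only
marks overlapping entries stale and never blocks.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Registration:
    """One cached DMA setup for a VA range [start, end). Hypothetical type."""
    start: int
    end: int
    generation: int      # VA-map generation this setup was built against
    valid: bool = True   # cleared by the notifier, never waited on


class RegistrationCache:
    """Toy registration cache: invalidation is non-blocking by design."""

    def __init__(self):
        self.entries = []
        self.generation = 0
        self.setups = 0  # counts the (expensive) DMA setups performed

    def invalidate_range_start(self, start, end):
        # Stand-in for the notifier callback: mark overlapping entries
        # stale and return. No waiting on in-flight device work.
        self.generation += 1
        for reg in self.entries:
            if reg.valid and reg.start < end and start < reg.end:
                reg.valid = False

    def get(self, start, end):
        # Called when a new operation is initiated: reuse a valid setup
        # if one exists, otherwise build a fresh one from the current map.
        for reg in self.entries:
            if reg.valid and reg.start == start and reg.end == end:
                return reg
        # Drop stale entries from the cache; operations that already hold
        # one keep their reference and simply finish with the old setup.
        self.entries = [r for r in self.entries if r.valid]
        reg = Registration(start, end, self.generation)
        self.setups += 1
        self.entries.append(reg)
        return reg
```

In this model an operation started before the invalidation keeps its
old Registration object, and only operations initiated afterwards pick
up a new one. A shadow PTE model could not be written this way, since
it would have to stop device access to the range before the callback
returns.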