Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1365142ybt; Thu, 25 Jun 2020 04:26:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxvLjFyIN/2QInpFlzw7j0boazaDqoTJNbFEF/r9xGcX90+qeldcGeVuT+v1HMkiYqDlnOq X-Received: by 2002:a17:906:dbe5:: with SMTP id yd5mr15150484ejb.328.1593084398368; Thu, 25 Jun 2020 04:26:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1593084398; cv=none; d=google.com; s=arc-20160816; b=t898aRBreWQrpCsUr8HDTi4CWh8D7r+7Qf3xCNL0vTVPpy4L1d3RMg13rGWDwqlwnB sf6XBigdywsM8qAH087ai8Mw403hUY7t7KQr4ju5EOUd7gZy+noc1DL2GwTIzYxbboaS IkO73+rIcGprF3B1K+8KehDD7lCdWLUYqycc+59Btgl7kHBt3O+SwOiMXh5MUZki+23L rJAwb6EdK2NHzUf9JDiKChtsZ8LQ5BrOfMq8DMEwscDunGrns8kpDZ+6x7pe9MWcQEn4 X+KHNSygko1kQO+8sNB+QsR79TGef12bkIKUCpRGBxHDj4aLbD5f0JMvRQsHgvyvDO24 nk9Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=65LXemqk5y8rxtIO7a6bw9rfAs6r8hmREYRkhXqu0oU=; b=Viw+/Be7NFy8NEFpAeAILbBH+l5Dc61ArJxoM6RdFu+YOEcpBAoQINabTLxEKmZKUM y37vlT0UJAUe3Eo53j3VMfCobGdsawvImkFdhKZGW7uASv8YTao7nnYf7ZN2rVPP50xH +xPhDSZraHYUHhz7TuX0RRxXtrJnPkoUAUJQ2uuwV+AQ+vTRdU+IRuAvvv8m9e2sEM7w YxU7hPBJN+miuC8rBeB4YXOEz9kGuRsX8PFqifL0wPnoDKdKKPnxE2UdJMj1yBHl9w9i 0ItpukeTu+WNWJk+7GFaMFrnjjgdKv85Hat6kW8LJ1w4U85NOnuzI1onoC6UhaXcwQsQ moRA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id b19si2948020eju.654.2020.06.25.04.26.14; Thu, 25 Jun 2020 04:26:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404183AbgFYLY6 (ORCPT + 99 others); Thu, 25 Jun 2020 07:24:58 -0400 Received: from mx2.suse.de ([195.135.220.15]:55444 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404042AbgFYLY6 (ORCPT ); Thu, 25 Jun 2020 07:24:58 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id E7CB1AD60; Thu, 25 Jun 2020 11:24:55 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id AB4511E1274; Thu, 25 Jun 2020 13:24:55 +0200 (CEST) Date: Thu, 25 Jun 2020 13:24:55 +0200 From: Jan Kara To: John Hubbard Cc: Jason Gunthorpe , Chris Wilson , linux-mm@kvack.org, linux-kernel@vger.kernel.org, intel-gfx@lists.freedesktop.org, Andrew Morton , Jan Kara , =?iso-8859-1?B?Suly9G1l?= Glisse , Claudio Imbrenda , "Kirill A . Shutemov" Subject: Re: [PATCH] mm: Skip opportunistic reclaim for dma pinned pages Message-ID: <20200625112455.GC17788@quack2.suse.cz> References: <20200624191417.16735-1-chris@chris-wilson.co.uk> <20200624192116.GO6578@ziepe.ca> <44708b2e-479f-7d58-fe01-29cfd6c70bdb@nvidia.com> <20200624232047.GP6578@ziepe.ca> <887ac706-65f0-3089-b51b-47aabf7d3847@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <887ac706-65f0-3089-b51b-47aabf7d3847@nvidia.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 24-06-20 17:11:30, John Hubbard wrote: > On 2020-06-24 16:20, Jason Gunthorpe wrote: > > > I do like this code change, though. And I *think* it's actually safe to > > > do this, as it stays away from writeback or other filesystem activity. > > > But let me double check that, in case I'm forgetting something. > > ...OK, I've checked, and I like it a little bit less now. Mainly for > structural reasons, though. I think it would work correctly. But > here's a concern: try_to_unmap() should only fail to unmap if there is a > reason to not unmap. Having a page be pinned for dma is a reason to not > *free* a page, and it's also a reason to be careful about writeback and > page buffers for writeback and such. But I'm not sure that it's a reason > to fail to remove mappings. > > True, most (all?) of the reasons that we remove mappings, generally are > for things that are not allowed while a page is dma-pinned...at least, > today. But still, there's nothing fundamental about a mapping that > should prevent it from coming or going while a page is undergoing > dma. > > So, it's merely a convenient, now-misnamed location in the call stack > to fail out. That's not great. It might be better, as Jason hints at > below, to fail out a little earlier, instead. That would lead to a more > places to call page_maybe_dma_pinned(), but that's not a real problem, > because it's still a small number of places. Agreed, I think that shrink_page_list() as Michal writes is a better place for the page_may_be_dma_pinned() check. But other than that I agree it is good to avoid all the unnecessary work of preparing a page for reclaim when we can now check there's no hope of reclaiming it (at this time). > > I don't know, but could it be that try_to_unmap() has to be done > > before checking the refcount as each mapping is included in the > > refcount? ie we couldn't know a DMA pin was active in advance? > > > > Now that we have your pin stuff we can detect a DMA pin without doing > > all the unmaps? > > > > Once something calls pin_user_page*(), then the pages will be marked > as dma-pinned, yes. So no, there is no need to wait until try_to_unmap() > to find out. Yeah, I'm pretty sure the page ref check happens so late because earlier it was impossible to tell whether there are "external" page references reclaim cannot get rid of - filesystem may (but need not!) hold a page reference for its private data, page tables will hold references as well, etc. Now with page_may_be_dma_pinned() we can detect at least one class of external references (i.e. pins) early so it's stupid not to do that. > A final note: depending on where page_maybe_dma_pinned() ends up > getting called, this might prevent a fair number of the problems that > Jan originally reported [1], and that I also reported separately! > > Well, not all of the problems, and only after the filesystems get > converted to call pin_user_pages() (working on that next), but...I think > it would actually avoid the crash our customer reported back in early > 2018. Even though we don't have the full file lease + pin_user_pages() > solution in place. > > That's because reclaim is what triggers the problems that we saw. And > with this patch, we bail out of reclaim early. I agree that with this change, some races will become much less likely for some usecases. But as you say, it's not a full solution. Honza -- Jan Kara SUSE Labs, CR