Date: Mon, 9 Jul 2018 21:47:40 +0200
From: Jan Kara
To: Matthew Wilcox
Cc: Jan Kara, Nicholas Piggin, john.hubbard@gmail.com, Michal Hocko,
    Christopher Lameter, Jason Gunthorpe, Dan Williams, Al Viro,
    linux-mm@kvack.org, LKML, linux-rdma, linux-fsdevel@vger.kernel.org,
    John Hubbard
Subject: Re: [PATCH 0/2] mm/fs: put_user_page() proposal
Message-ID: <20180709194740.rymbt2fzohbdmpye@quack2.suse.cz>
In-Reply-To: <20180709171651.GE2662@bombadil.infradead.org>

On Mon 09-07-18 10:16:51, Matthew Wilcox wrote:
> On Mon, Jul 09, 2018 at 06:08:06PM +0200, Jan Kara wrote:
> > On Mon 09-07-18 18:49:37, Nicholas Piggin wrote:
> > > The problem with blocking in clear_page_dirty_for_io is that the fs is
> > > holding the page lock (or locks) and possibly others too. If you
> > > expect to have a bunch of long term references hanging around on the
> > > page, then there will be hangs and deadlocks everywhere. And if you do
> > > not have such long term references, then page lock (or some similar lock
> > > bit) for the duration of the DMA should be about enough?
> >
> > There are two separate questions:
> >
> > 1) How to identify pages pinned for DMA? We have no bit in struct page to
> > use and we cannot reuse page lock as that immediately creates lock
> > inversions e.g. in direct IO code (which could be fixed but then good luck
> > with auditing all the other GUP users). Matthew had an idea and John
> > implemented it based on removing page from LRU and using that space in
> > struct page. So we at least have a way to identify pages that are pinned
> > and can track their pin count.
> >
> > 2) What to do when some page is pinned but we need to do e.g.
> > clear_page_dirty_for_io(). After some more thinking I agree with you that
> > just blocking waiting for page to unpin will create deadlocks like:
>
> Why are we trying to writeback a page that is pinned?  It's presumed to
> be continuously redirtied by its pinner.  We can't evict it.

So what should be the result of fsync(file) when some of the file's pages
are pinned, e.g. by running direct IO? If we just skip those pages, we'll
lie to userspace that the data was committed while it was not (and it's not
only about data that has landed in those pages via DMA: you can have the
first 1k of a page modified by normal IO in parallel to DMA modifying the
second 1k chunk).
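
To make the "skip pinned pages" problem concrete, here is a toy userspace
model. It is only a sketch and not kernel code: struct toy_page,
toy_fsync_skip_pinned() and toy_count_dirty() are made-up names for
illustration, and "writeback" is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page cache page; NOT the kernel's struct page. */
struct toy_page {
	bool dirty;	/* holds data that is not yet on disk */
	bool pinned;	/* held by a GUP user, e.g. direct IO doing DMA */
};

/*
 * Hypothetical "skip pinned pages" fsync policy: write back every dirty
 * page that is not pinned and report success no matter what was skipped.
 */
int toy_fsync_skip_pinned(struct toy_page *pages, size_t n)
{
	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (pages[i].dirty && !pages[i].pinned)
			pages[i].dirty = false;	/* pretend it was written */
	}
	return 0;	/* "success", even if pinned pages stayed dirty */
}

/* Count the pages whose data has still not reached the disk. */
size_t toy_count_dirty(const struct toy_page *pages, size_t n)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].dirty)
			left++;
	}
	return left;
}
```

With one dirty pinned page in the array, toy_fsync_skip_pinned() returns 0
while toy_count_dirty() still reports one page, which is exactly the case
where userspace is told its data is safe when it is not.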
If fsync(2) returns an error, that would be really unexpected by userspace
and most apps will just not handle it correctly. So what else can you do
than block?

> > ext4_writepages()                         ext4_direct_IO_write()
> >                                             __blockdev_direct_IO()
> >                                               iov_iter_get_pages()
> >                                                 - pins page
> >   handle = ext4_journal_start_with_reserve(inode, ...)
> >     - starts transaction
> >   ...
> >   lock_page(page)
> >   mpage_submit_page()
> >     clear_page_dirty_for_io(page) -> blocks on pin
>
> I don't think it should block.  It should fail.

See above...

								Honza
-- 
Jan Kara
SUSE Labs, CR
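
For reference, the "fail instead of block" option can be sketched with the
same kind of toy userspace model. This is an illustration only, under the
assumption that a refusal is reported as -EBUSY; toy_clear_dirty_for_io()
and toy_fsync_fail_on_pinned() are hypothetical names and do not mirror the
signature or return value of the real clear_page_dirty_for_io().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page cache page; NOT the kernel's struct page. */
struct toy_page {
	bool dirty;	/* holds data that is not yet on disk */
	bool pinned;	/* held by a GUP user, e.g. direct IO doing DMA */
};

/*
 * Hypothetical non-blocking variant: refuse to start writeback on a
 * pinned page instead of waiting for it to be unpinned.
 */
int toy_clear_dirty_for_io(struct toy_page *page)
{
	assert(page != NULL);

	if (page->pinned)
		return -EBUSY;	/* assumption: refusal is reported as -EBUSY */
	page->dirty = false;	/* pretend the page was written back */
	return 0;
}

/*
 * Hypothetical fsync built on top of it: write what can be written and
 * hand the first error back to the caller.
 */
int toy_fsync_fail_on_pinned(struct toy_page *pages, size_t n)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].dirty)
			continue;
		int err = toy_clear_dirty_for_io(&pages[i]);
		if (err && !ret)
			ret = err;	/* remember the first failure only */
	}
	return ret;
}
```

Nothing can deadlock here because nothing waits, but the cost moves to the
caller: the fsync in this model returns -EBUSY whenever a dirty page is
pinned, which is the error most applications will not handle correctly.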