Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752245AbbHPTmM (ORCPT ); Sun, 16 Aug 2015 15:42:12 -0400
Received: from mail.parknet.co.jp ([210.171.160.6]:49232 "EHLO
	mail.parknet.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751238AbbHPTmK (ORCPT );
	Sun, 16 Aug 2015 15:42:10 -0400
From: OGAWA Hirofumi
To: Jan Kara
Cc: Daniel Phillips, David Lang, Rik van Riel, tux3@tux3.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [FYI] tux3: Core changes
References: <20150526090058.GA8024@quack.suse.cz>
	<5564D60E.6000306@phunq.net>
	<20150527084138.GD2590@quack.suse.cz>
	<87a8vtdqfz.fsf@mail.parknet.co.jp>
	<20150623161247.GP2427@quack.suse.cz>
	<87k2ueepd6.fsf@mail.parknet.co.jp>
	<20150709160528.GK2900@quack.suse.cz>
	<874mklaqbn.fsf@mail.parknet.co.jp>
	<20150803134251.GC9657@quack.suse.cz>
	<87r3nck27h.fsf@mail.parknet.co.jp>
	<20150810124525.GC3768@quack.suse.cz>
Date: Mon, 17 Aug 2015 04:42:04 +0900
In-Reply-To: <20150810124525.GC3768@quack.suse.cz> (Jan Kara's message of
	"Mon, 10 Aug 2015 14:45:25 +0200")
Message-ID: <87a8trxboz.fsf@mail.parknet.co.jp>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2802
Lines: 65

Jan Kara writes:

> On Sun 09-08-15 22:42:42, OGAWA Hirofumi wrote:
>> Jan Kara writes:
>>
>> > I'm not sure about which ENOSPC issue you are speaking BTW. Can you
>> > please elaborate?
>>
>> 1. GUP simulate page fault, and prepare to modify
>> 2. writeback clear dirty, and make PTE read-only
>> 3. snapshot/reflink make block cow
>
> I assume by point 3. you mean that snapshot / reflink happens now and thus
> the page / block is marked as COW. Am I right?

Right.

>> 4. driver called GUP modifies page, and dirty page without simulate page fault
>
> OK, but this doesn't hit ENOSPC because as you correctly write in point 4.,
> the page gets modified without triggering another page fault so COW for the
> modified page isn't triggered. Modified page contents will be in both the
> original and the reflinked file, won't it?

And the result above can be ENOSPC too, depending on the implementation
and the race condition. Also, if the FS converts zeroed blocks to holes
like hammerfs does, ENOSPC simply happens. I.e. another process uses all
the space, but then there is no ->page_mkwrite() callback to check for
ENOSPC.

> And I agree that the fact that snapshotted file's original contents can
> still get modified is a bug. A one which is difficult to fix.

Yes, that is why I think this logic is an issue, before page forking.

>> So it sounds like yet another "stable page". I.e. unpredictable
>> performance. (BTW, by recall of "stable page", noticed "stable page"
>> would not provide stabled page data for that logic too.)
>>
>> Well, assuming "elevated refcount == threshold + waitq/wakeup", so
>> IMO, it is not attractive. Rather the last option if there is no
>> others as design choice.
>
> I agree the performance will be less predictable and that is not good. But
> changing what is visible in the file when writeback races with GUP is a
> worse problem to me.
>
> Maybe if GUP marked pages it got ref for so that we could trigger the slow
> behavior only for them (Peter Zijlstra proposed in [1] an infrastructure so
> that pages pinned by get_user_pages() would be properly accounted and then
> we could use PG_mlocked and elevated refcount as a more reliable indication
> of pages that need special handling).

I haven't read Peter's patchset fully, but it looks good, and it is maybe
similar to the strategy I currently have in mind. Also, I'm thinking of
adding a callback for the FS at the start and end of GUP's pin window.
(Just as an example, the FS can use the callback to stop writeback if it
wants.)

Thanks.
--
OGAWA Hirofumi
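
The four steps discussed in this message, and the idea of a callback around
GUP's pin window, can be sketched as a toy model in Python. This is only an
illustration under stated assumptions, not kernel code: Page, ToyFS,
gup_pin(), driver_write() and the use_pin_hooks flag are hypothetical names,
and page_mkwrite() here merely stands in for the ->page_mkwrite() callback.
The model assumes one possible implementation, in which a COW block that was
never reserved through page_mkwrite() is allocated late, at writeback time.
It also assumes that an FS using the pin hooks would postpone both writeback
and reflink while a pin is held.

```python
import errno
from dataclasses import dataclass


@dataclass
class Page:
    data: bytes = b"old"
    dirty: bool = False
    writable: bool = False  # stands in for a writable PTE
    pins: int = 0           # references taken through GUP


class ToyFS:
    """Hypothetical filesystem model; none of these names are kernel APIs."""

    def __init__(self, free_blocks, use_pin_hooks=False):
        self.free_blocks = free_blocks
        self.cow = False                    # block shared by snapshot/reflink
        self.use_pin_hooks = use_pin_hooks  # assumed pin-window callback
        self.mkwrite_calls = 0

    def _reserve_if_cow(self):
        # Allocate a new block for a COW write, or fail with ENOSPC.
        if self.cow:
            if self.free_blocks == 0:
                raise OSError(errno.ENOSPC, "no space left for COW")
            self.free_blocks -= 1
            self.cow = False

    def page_mkwrite(self, page):
        # The only point where the FS can check space before a write.
        self.mkwrite_calls += 1
        self._reserve_if_cow()
        page.writable = True
        page.dirty = True

    def gup_pin(self, page):
        # Step 1: GUP simulates a write fault and takes a reference.
        if not page.writable:
            self.page_mkwrite(page)
        page.pins += 1

    def _blocked(self, page):
        return self.use_pin_hooks and page.pins > 0

    def writeback(self, page):
        # Step 2: clear dirty and make the mapping read-only.
        if self._blocked(page):
            return False
        self._reserve_if_cow()  # late allocation: may raise ENOSPC
        page.dirty = False
        page.writable = False
        return True

    def reflink(self, page):
        # Step 3: mark the block as COW.
        if self._blocked(page):
            return False
        self.cow = True
        return True


def driver_write(page, data):
    # Step 4: the driver writes through its pinned reference, with no fault.
    page.data = data
    page.dirty = True


def run_race(use_pin_hooks):
    fs = ToyFS(free_blocks=1, use_pin_hooks=use_pin_hooks)
    page = Page()
    fs.gup_pin(page)              # 1
    cleaned = fs.writeback(page)  # 2
    shared = fs.reflink(page)     # 3
    fs.free_blocks = 0            # another process uses all the space
    driver_write(page, b"new")    # 4
    try:
        fs.writeback(page)
        error = None
    except OSError as exc:
        error = exc.errno
    return cleaned, shared, fs.mkwrite_calls, error
```

Calling run_race(False) follows the four steps as listed: page_mkwrite()
runs once, during the pin, so nothing checks space after the reflink, and
the later writeback fails with ENOSPC in this model. Calling run_race(True)
shows the assumed effect of the pin-window hooks: writeback and reflink are
both postponed while the page is pinned, so the race does not start.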