Date: Mon, 10 Aug 2015 14:45:25 +0200
From: Jan Kara
To: OGAWA Hirofumi
Cc: Jan Kara, Daniel Phillips, David Lang, Rik van Riel, tux3@tux3.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [FYI] tux3: Core changes
Message-ID: <20150810124525.GC3768@quack.suse.cz>
In-Reply-To: <87r3nck27h.fsf@mail.parknet.co.jp>

On Sun 09-08-15 22:42:42, OGAWA Hirofumi wrote:
> Jan Kara writes:
>
> > I'm not sure about which ENOSPC issue you are speaking BTW. Can you
> > please elaborate?
>
> 1. GUP simulate page fault, and prepare to modify
> 2. writeback clear dirty, and make PTE read-only
> 3. snapshot/reflink make block cow

I assume by point 3 you mean that snapshot / reflink happens now and thus
the page / block is marked as COW. Am I right?

> 4. driver called GUP modifies page, and dirty page without simulate page fault

OK, but this doesn't hit ENOSPC because, as you correctly write in point 4,
the page gets modified without triggering another page fault, so COW for the
modified page isn't triggered.
Modified page contents will be in both the original and the reflinked file,
won't they?

And I agree that the fact that the snapshotted file's original contents can
still get modified is a bug. One that is difficult to fix.

> >> If you claim, there is strange logic widely used already, and of course,
> >> we can't simply break it because of compatibility. I would be able to
> >> agree. But your claim sounds like that logic is sane and well designed
> >> behavior. So I disagree.
> >
> > To me the rule: "Do not detach a page from a radix tree if it has an elevated
> > refcount unless explicitly requested by a syscall" looks like a sane one.
> > Yes.
> >
> >> > And frankly I fail to see why you and Daniel care so much about this
> >> > corner case because from performance POV it's IMHO a non-issue and you
> >> > bother with page forking because of performance, don't you?
> >>
> >> Trying to penalize the corner case path, instead of normal path, should
> >> try at first. Penalizing normal path to allow corner case path is insane
> >> basically.
> >>
> >> Make normal path faster and more reliable is what we are trying.
> >
> > Elevated refcount of a page is in my opinion a corner case path. That's why
> > I think that penalizing that case by waiting for IO instead of forking is
> > acceptable cost for the improved compatibility & maintainability of the
> > code.
>
> What is "elevated refcount"? What is difference with normal refcount?
> Are you saying "refcount >= specified threshold + waitq/wakeup" or
> such? If so, it is not the path. It is the state. IOW, some group may
> not hit much, but some group may hit much, on normal path.

Yes, by "elevated refcount" I meant refcount > 2 (one for pagecache, one
for your code inspecting the page).

> So it sounds like yet another "stable page". I.e. unpredictable
> performance. (BTW, by recall of "stable page", noticed "stable page"
> would not provide stabled page data for that logic too.)
>
> Well, assuming "elevated refcount == threshold + waitq/wakeup", so
> IMO, it is not attractive. Rather the last option if there is no
> others as design choice.

I agree the performance will be less predictable and that is not good. But
changing what is visible in the file when writeback races with GUP is a
worse problem to me.

Maybe GUP could mark the pages it got a reference for, so that we could
trigger the slow behavior only for them (Peter Zijlstra proposed in [1] an
infrastructure so that pages pinned by get_user_pages() would be properly
accounted, and then we could use PG_mlocked and elevated refcount as a more
reliable indication of pages that need special handling).

                                                                Honza

[1] http://thread.gmane.org/gmane.linux.kernel.mm/117679
-- 
Jan Kara
SUSE Labs, CR
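
For readers following the thread, the four-step sequence discussed above
(GUP, writeback, reflink, driver write) can be walked through with a small
toy model. This is only a sketch under loud assumptions: Page, the dict-based
"files", and every function name below are hypothetical illustrations, not
kernel or tux3 code, and the model ignores locking, I/O, and real page
tables. It shows that the driver write in step 4 takes no fault and so
triggers no COW, and it encodes the "refcount > 2" test described in the
message.

```python
class Page:
    """Toy page-cache page; every field is a simplification (assumption)."""

    def __init__(self, data):
        self.data = data
        self.dirty = False
        self.pte_writable = False
        self.shared = False   # set once a snapshot/reflink shares the block
        self.refcount = 1     # the page cache's own reference


def gup_for_write(page):
    """Step 1: GUP simulates a write fault and takes a reference."""
    page.pte_writable = True
    page.dirty = True
    page.refcount += 1


def writeback(page):
    """Step 2: writeback clears dirty and makes the PTE read-only."""
    page.dirty = False
    page.pte_writable = False


def reflink(file):
    """Step 3: the snapshot shares the same block, now marked COW."""
    file["block"].shared = True
    return {"block": file["block"]}


def driver_write(page, data):
    """Step 4: write through the GUP reference; the PTE is never consulted."""
    page.data = data
    page.dirty = True
    return False  # no page fault was taken, so no COW either


def cpu_write(file, data):
    """Ordinary store via the page table, for contrast; True if it faulted."""
    page = file["block"]
    faulted = not page.pte_writable
    if faulted and page.shared:
        # COW: the writer gets a private copy. In a real filesystem this
        # allocation is where ENOSPC could surface.
        page = Page(page.data)
        file["block"] = page
    page.pte_writable = True
    page.data = data
    page.dirty = True
    return faulted


def has_elevated_refcount(page):
    """refcount > 2: one for pagecache, one for the inspecting code."""
    page.refcount += 1            # reference held while inspecting
    try:
        return page.refcount > 2
    finally:
        page.refcount -= 1


def run_race():
    """Replay steps 1-4 in order and return the resulting state."""
    page = Page("old")
    original = {"block": page}
    gup_for_write(page)
    writeback(page)
    snapshot = reflink(original)
    faulted = driver_write(page, "new")
    return page, original, snapshot, faulted
```

Calling run_race() leaves the snapshot and the original pointing at the same
modified page, whereas cpu_write() on a write-protected shared page faults
and gives the writer its own copy. The has_elevated_refcount() check is the
kind of test that would let writeback take the slow path only for such pages.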