Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752974AbbHCNnB (ORCPT ); Mon, 3 Aug 2015 09:43:01 -0400 Received: from mx2.suse.de ([195.135.220.15]:49907 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751288AbbHCNm7 (ORCPT ); Mon, 3 Aug 2015 09:42:59 -0400 Date: Mon, 3 Aug 2015 15:42:51 +0200 From: Jan Kara To: OGAWA Hirofumi Cc: Jan Kara , Daniel Phillips , David Lang , Rik van Riel , tux3@tux3.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [FYI] tux3: Core changes Message-ID: <20150803134251.GC9657@quack.suse.cz> References: <20150526070910.GA3307@quack.suse.cz> <20150526090058.GA8024@quack.suse.cz> <5564D60E.6000306@phunq.net> <20150527084138.GD2590@quack.suse.cz> <87a8vtdqfz.fsf@mail.parknet.co.jp> <20150623161247.GP2427@quack.suse.cz> <87k2ueepd6.fsf@mail.parknet.co.jp> <20150709160528.GK2900@quack.suse.cz> <874mklaqbn.fsf@mail.parknet.co.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <874mklaqbn.fsf@mail.parknet.co.jp> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4286 Lines: 84 On Fri 31-07-15 13:44:44, OGAWA Hirofumi wrote: > Jan Kara writes: > > >> > Yes, if userspace truncates the file, the situation we end up with is > >> > basically the same. However for truncate to happen some malicious process > >> > has to come and truncate the file - a failure scenario that is acceptable > >> > for most use cases since it doesn't happen unless someone is actively > >> > trying to screw you. With page forking it is enough for flusher thread > >> > to start writeback for that page to trigger the problem - event that is > >> > basically bound to happen without any other userspace application > >> > interfering. > >> > >> Acceptable conclusion is where came from? That pseudocode logic doesn't > >> say about usage at all. And even if assume it is acceptable, as far as I > >> can see, for example /proc/sys/vm/drop_caches is enough to trigger, or a > >> page on non-exists block (sparse file. i.e. missing disk space check in > >> your logic). And if really no any lock/check, there would be another > >> races. > > > > So drop_caches won't cause any issues because it avoids mmaped pages. > > Also page reclaim or page migration don't cause any issues because > > they avoid pages with increased refcount (and increased refcount would stop > > drop_caches from reclaiming the page as well if it was not for the mmaped > > check before). Generally, elevated page refcount currently guarantees page > > isn't migrated, reclaimed, or otherwise detached from the mapping (except > > for truncate where the combination of mapping-index becomes invalid) and > > your page forking would change that assumption - which IMHO has a big > > potential for some breakage somewhere. > > Lifetime and visibility from user are different topic. The issue here > is visibility. Of course, those has relation more or less though, > refcount doesn't stop to drop page from radix-tree at all. Well, refcount prevents dropping page from a radix-tree in some cases - memory pressure, page migration to name the most prominent ones. It doesn't prevent page from being dropped because of truncate, that is correct. In general, the rule we currently obey is that kernel doesn't detach a page with increased refcount from a radix tree unless there is a syscall asking kernel to do that. > Well, anyway, your claim seems to be assuming the userspace app > workarounds the issues. And it sounds like still not workarounds the > ENOSPC issue (validate at page fault/GUP) even if assuming userspace > behave as perfect. Calling it as kernel assumption is strange. Realistically, I don't think userspace apps workaround anything. They just do what happens to work. Nobody happens to delete files while application works on it and expect application to gracefully handle that. So everyone is happy. I'm not sure about which ENOSPC issue you are speaking BTW. Can you please ellaborate? > If you claim, there is strange logic widely used already, and of course, > we can't simply break it because of compatibility. I would be able to > agree. But your claim sounds like that logic is sane and well designed > behavior. So I disagree. To me the rule: "Do not detach a page from a radix tree if it has an elevated refcount unless explicitely requested by a syscall" looks like a sane one. Yes. > > And frankly I fail to see why you and Daniel care so much about this > > corner case because from performance POV it's IMHO a non-issue and you > > bother with page forking because of performance, don't you? > > Trying to penalize the corner case path, instead of normal path, should > try at first. Penalizing normal path to allow corner case path is insane > basically. > > Make normal path faster and more reliable is what we are trying. Elevated refcount of a page is in my opinion a corner case path. That's why I think that penalizing that case by waiting for IO instead of forking is acceptable cost for the improved compatibility & maintainability of the code. Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/