Date: Thu, 27 Feb 2020 20:26:46 -0800
From: Matthew Wilcox
To: Hugh Dickins
Cc: "Kirill A. Shutemov", Andrew Morton, Yang Shi, Alexander Duyck,
    "Michael S. Tsirkin", David Hildenbrand, "Kirill A. Shutemov",
    Andrea Arcangeli, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] huge tmpfs: try to split_huge_page() when punching hole
Message-ID: <20200228042646.GF29971@bombadil.infradead.org>
References: <20200227084704.aolem5nktpricrzo@box>

On Thu, Feb 27, 2020 at 08:04:21PM -0800, Hugh Dickins wrote:
> It's good to consider the implications for hole-punch on a persistent
> filesystem cached with THPs (or lower order compound pages); but I
> disagree that they should behave differently from this patch.
>
> The hole-punch is fundamentally directed at freeing up the storage, yes;
> but its page cache must also be removed, otherwise you have the user
> writing into cache which is not backed by storage, and potentially losing
> the data later. So a hole must be punched in the compound page in that
> case too: in fact, it's then much more important that split_huge_page()
> succeeds - not obvious what the fallback should be if it fails (perhaps
> in that case the compound page must be kept, but all its pmds removed,
> and info on holes kept in spare fields of the compound page, to prevent
> writes and write faults without calling back into the filesystem:
> soluble, but more work than tmpfs needs today)(and perhaps when that
> extra work is done, we would choose to rely on it rather than
> immediately splitting; but it will involve discounting the holes).

Ooh, a topic that reasonable people can disagree on!

The current prototype I have will allocate (huge) pages and then ask the
filesystem to fill them. The filesystem may well find that the extent
is a hole, and if it is, it will fill the page with zeroes.
Then, the application may write to those pages, and if it does, the
filesystem will be notified to create an on-disk extent for that write.

I haven't looked at the hole-punch path in detail, but presumably it
notifies the filesystem to create a hole extent and zeroes out the
pagecache for that range (possibly by removing entire pages, and with
memset for partial pages). Then a subsequent write to the hole will
cause the filesystem to allocate a new non-hole extent, just like the
previous case.

I think it's reasonable for the page cache to interpret a hole-punch
request as being a hint that the hole is unlikely to be accessed again,
so allocating new smaller pages for that region of the file (or just
writing back & dropping the covering page altogether) would seem like
a reasonable implementation decision.

However, it also seems reasonable that just a memset() of the affected
region, leaving the page intact, would be an acceptable implementation,
as long as writes to the newly-created hole cause the page to become
dirtied and thus writeback to be in effect. It probably wouldn't be as
good an implementation, but it shouldn't lose writes as you suggest
above.

I'm not sure I'd choose to split a large page into smaller pages. I
think I'd prefer to allocate lower-order pages and memcpy() the data
over. Again, that's an implementation choice, and not something that
should be visible outside the implementation.

[1] http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/xarray-pagecache
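
To make the two page-cache options discussed above concrete, here is a
minimal userspace sketch. It is a toy model only, not code from the
prototype or from the kernel: struct toy_large_page, the toy_* functions
and the SMALL_PAGE/NR_SMALL sizes are all hypothetical names chosen for
illustration. Option A zeroes the punched range and keeps the large page
(a later write dirties it); option B allocates lower-order pages and
memcpy()s the data over, dropping pages that lie entirely in the hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_PAGE 4096u			/* toy base page size (assumption) */
#define NR_SMALL   8u				/* toy large page = 8 base pages */
#define LARGE_PAGE (SMALL_PAGE * NR_SMALL)

/* Toy stand-in for one large page-cache page; not a kernel structure. */
struct toy_large_page {
	unsigned char data[LARGE_PAGE];
	bool dirty;
};

/* Option A: keep the large page intact and zero the punched range. */
static void toy_punch_memset(struct toy_large_page *lp, size_t off, size_t len)
{
	assert(off <= LARGE_PAGE && len <= LARGE_PAGE - off);
	memset(lp->data + off, 0, len);
}

/* A later write into the hole dirties the page, so writeback would see it. */
static void toy_write(struct toy_large_page *lp, size_t off,
		      const void *src, size_t len)
{
	assert(off <= LARGE_PAGE && len <= LARGE_PAGE - off);
	memcpy(lp->data + off, src, len);
	lp->dirty = true;
}

/*
 * Option B: copy the large page into base pages instead of splitting it.
 * Base pages fully inside the hole are dropped (NULL); partially covered
 * ones are copied and then zeroed over the punched part.
 * Returns 0 on success, -1 if an allocation fails.
 */
static int toy_punch_copy(const struct toy_large_page *lp, size_t off,
			  size_t len, unsigned char *small[NR_SMALL])
{
	size_t hole_end = off + len;

	assert(off <= LARGE_PAGE && len <= LARGE_PAGE - off);
	for (size_t i = 0; i < NR_SMALL; i++) {
		size_t pstart = i * SMALL_PAGE, pend = pstart + SMALL_PAGE;
		size_t start = off > pstart ? off : pstart;
		size_t end = hole_end < pend ? hole_end : pend;

		if (start == pstart && end == pend) {
			small[i] = NULL;	/* whole base page is in the hole */
			continue;
		}
		small[i] = malloc(SMALL_PAGE);
		if (!small[i]) {
			while (i--)		/* undo earlier allocations */
				free(small[i]);
			return -1;
		}
		memcpy(small[i], lp->data + pstart, SMALL_PAGE);
		if (start < end)		/* partial overlap: zero that part */
			memset(small[i] + (start - pstart), 0, end - start);
	}
	return 0;
}
```

In this model neither option is visible to the caller beyond which
buffers exist afterwards, which matches the point that the choice is an
implementation detail. Real page-cache code would also have to handle
locking, mappings and writeback, none of which is modelled here.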