Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752325AbcDQBVs (ORCPT ); Sat, 16 Apr 2016 21:21:48 -0400
Received: from mail-pf0-f181.google.com ([209.85.192.181]:34796 "EHLO
        mail-pf0-f181.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1751943AbcDQBVr (ORCPT );
        Sat, 16 Apr 2016 21:21:47 -0400
Date: Sat, 16 Apr 2016 18:21:37 -0700 (PDT)
From: Hugh Dickins
X-X-Sender: hugh@eggly.anvils
To: "Kirill A. Shutemov"
cc: Hugh Dickins, Andrew Morton, "Kirill A. Shutemov", Andrea Arcangeli,
        Andres Lagar-Cavilla, Yang Shi, Ning Qu, Stephen Rothwell,
        kernel test robot, Xiong Zhou, Matthew Wilcox, Greg Thelen,
        linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH mmotm 5/5] huge tmpfs: add shmem_pmd_fault()
In-Reply-To: <20160417004626.GA5169@node.shutemov.name>
Message-ID:
References: <20160417004626.GA5169@node.shutemov.name>
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2414
Lines: 45

On Sun, 17 Apr 2016, Kirill A. Shutemov wrote:
> On Sat, Apr 16, 2016 at 04:41:33PM -0700, Hugh Dickins wrote:
> > The pmd_fault() method gives the filesystem an opportunity to place
> > a trans huge pmd entry at *pmd, before any pagetable is exposed (and
> > an opportunity to split it on COW fault): now use it for huge tmpfs.
> >
> > This patch is a little raw: with more time before LSF/MM, I would
> > probably want to dress it up better - the shmem_mapping() calls look
> > a bit ugly; it's odd to want FAULT_FLAG_MAY_HUGE and VM_FAULT_HUGE just
> > for a private conversation between shmem_fault() and shmem_pmd_fault();
> > and there might be a better distribution of work between those two, but
> > prising apart that series of huge tests is not to be done in a hurry.
> >
> > Good for now, presents the new way, but might be improved later.
> >
> > This patch still leaves the huge tmpfs map_team_by_pmd() allocating a
> > pagetable while holding page lock, but other filesystems are no longer
> > doing so; and we've not yet settled whether huge tmpfs should (like anon
> > THP) or should not (like DAX) participate in deposit/withdraw protocol.
> >
> > Signed-off-by: Hugh Dickins
>
> Just for record: I don't like ->pmd_fault() approach because it results in
> two requests to file system (two shmem_fault() in this case) if we don't
> have a huge page to map: one for huge page (failed) and then one for small.
> I think this case should be rather common: all mounts without huge pages
> enabled. I expect performance regression from this too.

Yes, I did consider that when making the switchover.  But it's only when
pmd_none(*pmd), not the other 511 times; and the caches have been primed
for the pte fallback.  So I didn't expect it to matter, and to be
outweighed by having map_pages() back in its old position.  Ah, you'll
point out that map_pages() makes it a smaller ratio than 511:1.

But if someone speeds up pmd_fault(), or replaces it by a better strategy,
so much the better - I found it a little odd, doing two very different
things, one of which (splitting) must be done in a non-fault context too.

Anyway, I await judgement from the robot.

And note your point about regressing mounts without huge pages enabled:
maybe I should add an early VM_FAULT_FALLBACK for that case, or perhaps
it will end up in the vma flags instead of my shmem_mapping() check.

Hugh