Subject: Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section
From: Yang Shi
Date: Thu, 22 Mar 2018 09:46:38 -0700
To: Laurent Dufour, Matthew Wilcox
Cc: Michal Hocko, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On 3/22/18 9:18 AM, Laurent Dufour wrote:
>
> On 22/03/2018 17:05, Matthew Wilcox wrote:
>> On Thu, Mar 22, 2018 at 04:54:52PM +0100, Laurent Dufour wrote:
>>> On 22/03/2018 16:40, Matthew Wilcox wrote:
>>>> On Thu, Mar 22, 2018 at 04:32:00PM +0100, Laurent Dufour wrote:
>>>>> Regarding the page fault, why not relying on the PTE locking ?
>>>>>
>>>>> When munmap() will unset the PTE it will have to held the PTE lock, so this
>>>>> will serialize the access.
>>>>> If the page fault occurs before the mmap(MAP_FIXED), the page mapped will be
>>>>> removed when mmap(MAP_FIXED) would do the cleanup. Fair enough.
>>>> The page fault handler will walk the VMA tree to find the correct
>>>> VMA and then find that the VMA is marked as deleted. If it assumes
>>>> that the VMA has been deleted because of munmap(), then it can raise
>>>> SIGSEGV immediately. But if the VMA is marked as deleted because of
>>>> mmap(MAP_FIXED), it must wait until the new VMA is in place.
>>> I'm wondering if such a complexity is required.
>>> If the user space process try to access the page being overwritten through
>>> mmap(MAP_FIXED) by another thread, there is no guarantee that it will
>>> manipulate the *old* page or *new* one.
>> Right; but it must return one or the other, it can't segfault.
> Good point, I missed that...
>
>>> I'd think this is up to the user process to handle that concurrency.
>>> What needs to be guaranteed is that once mmap(MAP_FIXED) returns the old page
>>> are no more there, which is done through the mmap_sem and PTE locking.
>> Yes, and allowing the fault handler to return the *old* page risks the
>> old page being reinserted into the page tables after the unmapping task
>> has done its work.
> The PTE locking should prevent that.
>
>> It's *really* rare to page-fault on a VMA which is in the middle of
>> being replaced. Why are you trying to optimise it?
> I was not trying to optimize it, but to not wait in the page fault handler.
> This could become tricky in the case the VMA is removed once mmap(MAP_FIXED) is
> done and before the waiting page fault got woken up. This means that the
> removed VMA structure will have to remain until all the waiters are woken up
> which implies ref_count or similar.

We may not need a ref_count. After removing the "locked-for-deletion" vmas
once mmap(MAP_FIXED) is done, just wake up the waiting page fault so that it
looks the vma up again; it will then find the new vma installed by
mmap(MAP_FIXED), right? I'm not sure whether a completion can do this, since
I'm not very familiar with it :-(

Yang

>
>>>> I think I was wrong to describe VMAs as being *deleted*. I think we
>>>> instead need the concept of a *locked* VMA that page faults will block on.
>>>> Conceptually, it's a per-VMA rwsem, but I'd use a completion instead of
>>>> an rwsem since the only reason to write-lock the VMA is because it is
>>>> being deleted.
>>> Such a lock would only makes sense in the case of mmap(MAP_FIXED) since when
>>> the VMA is removed there is no need to wait. Isn't it ?
>> I can't think of another reason. I suppose we could mark the VMA as
>> locked-for-deletion or locked-for-replacement and have the SIGSEGV happen
>> early. But I'm not sure that optimising for SIGSEGVs is a worthwhile
>> use of our time. Just always have the pagefault sleep for a deleted VMA.
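
To make the "wake up and re-lookup" idea above concrete, here is a rough userspace model in Python. It is only a sketch under loud assumptions: Vma, AddressSpace, lock_range(), replace_range() and fault() are hypothetical names made up for illustration, none of them is a kernel API, and a single condition variable stands in for both mmap_sem and whatever wait primitive the kernel would use. Partial overlaps (VMA splitting) and PTE locking are not modelled. The point it tries to show is that a sleeping fault never touches the old VMA after it wakes, so the old structure would not have to be kept alive for the waiters.

```python
import threading


class Vma:
    """Toy stand-in for a VMA: an address range, a tag, and a 'locked' mark."""

    def __init__(self, start, end, tag):
        self.start, self.end, self.tag = start, end, tag
        self.locked = False  # hypothetical "locked-for-replacement" mark


class AddressSpace:
    """Toy address space; one Condition plays the role of lock + wait queue."""

    def __init__(self):
        self._cond = threading.Condition()
        self._vmas = []

    def insert(self, vma):
        with self._cond:
            self._vmas.append(vma)

    def _find(self, addr):
        # Caller must hold self._cond.
        for vma in self._vmas:
            if vma.start <= addr < vma.end:
                return vma
        return None

    def lock_range(self, start, end):
        """First half of the toy mmap(MAP_FIXED): mark overlapping VMAs."""
        with self._cond:
            for vma in self._vmas:
                if vma.start < end and start < vma.end:
                    vma.locked = True

    def replace_range(self, start, end, tag):
        """Second half: drop marked VMAs, install the new one, wake waiters."""
        with self._cond:
            self._vmas = [v for v in self._vmas if not v.locked]
            self._vmas.append(Vma(start, end, tag))
            self._cond.notify_all()

    def fault(self, addr):
        """Toy page fault: tag of the VMA covering addr, or None (SIGSEGV)."""
        with self._cond:
            while True:
                vma = self._find(addr)  # fresh lookup on every pass
                if vma is None:
                    return None
                if not vma.locked:
                    return vma.tag
                # Sleep until a replacement finishes, then loop and look up
                # again; the old Vma object is not used after the wakeup.
                self._cond.wait()


def demo():
    mm = AddressSpace()
    mm.insert(Vma(0x1000, 0x3000, "old"))
    mm.lock_range(0x1000, 0x3000)
    seen = []
    t = threading.Thread(target=lambda: seen.append(mm.fault(0x2000)))
    t.start()
    mm.replace_range(0x1000, 0x3000, "new")
    t.join()
    return seen[0]
```

Whether the faulting thread goes to sleep first or only runs after the replacement has finished, it ends up with the new mapping, because the check and the wait happen under the same lock and every wakeup is followed by a fresh lookup. This says nothing about whether a struct completion is the right kernel primitive for the job; that question stays open.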