Date: Thu, 22 Mar 2018 08:40:55 -0700
From: Matthew Wilcox
To: Laurent Dufour
Cc: Yang Shi, Michal Hocko, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section
Message-ID: <20180322154055.GB28468@bombadil.infradead.org>
In-Reply-To: <18a727fd-f006-9fae-d9ca-74b9004f0a8b@linux.vnet.ibm.com>

On Thu, Mar 22, 2018 at 04:32:00PM +0100, Laurent Dufour wrote:
> On 21/03/2018 23:46, Matthew Wilcox wrote:
> > On Wed, Mar 21, 2018 at 02:45:44PM -0700, Yang Shi wrote:
> >> Marking vma as deleted sounds good. The problem for my current approach is
> >> the concurrent page fault may succeed if it accesses the not yet unmapped
> >> section. Marking deleted vma could tell page fault the vma is not valid
> >> anymore, then return SIGSEGV.
> >>
> >>> does not care; munmap will need to wait for the existing munmap operation
> >>
> >> Why doesn't mmap care? How about MAP_FIXED? It may fail unexpectedly, right?
> >
> > The other thing about MAP_FIXED that we'll need to handle is unmapping
> > conflicts atomically. Say a program has a 200GB mapping and then
> > mmap(MAP_FIXED) another 200GB region on top of it. So I think page faults
> > are also going to have to wait for deleted vmas (then retry the fault)
> > rather than immediately raising SIGSEGV.
>
> Regarding the page fault, why not rely on the PTE locking?
>
> When munmap() unsets the PTE it will have to hold the PTE lock, so this
> will serialize the access.
> If the page fault occurs before the mmap(MAP_FIXED), the page mapped will be
> removed when mmap(MAP_FIXED) does the cleanup.

Fair enough.

The page fault handler will walk the VMA tree to find the correct VMA and
then find that the VMA is marked as deleted. If it assumes that the VMA was
deleted by munmap(), it can raise SIGSEGV immediately. But if the VMA is
marked as deleted because of mmap(MAP_FIXED), the handler must wait until
the new VMA is in place, so seeing the mark alone is not enough to justify
a SIGSEGV.

I think I was wrong to describe VMAs as being *deleted*. I think we instead
need the concept of a *locked* VMA that page faults will block on.
Conceptually, it's a per-VMA rwsem, but I'd use a completion instead of an
rwsem since the only reason to write-lock the VMA is that it is being
deleted.
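A minimal userspace sketch of the "locked VMA" idea, written with POSIX threads so it can run outside the kernel. The kernel's struct completion is emulated here with a mutex and a condition variable; the helper names mirror the kernel's init_completion(), wait_for_completion() and complete_all(), but this is not the kernel implementation. struct vma, the one-slot tree_slot "tree", handle_fault(), vma_lock_for_teardown(), vma_finish_teardown() and fault_thread() are hypothetical names chosen for illustration, not the real mm code. The point it shows: a fault that finds a locked VMA does not guess between munmap() and mmap(MAP_FIXED); it blocks on the completion and then retries the lookup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion (mutex + condvar). */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Once completed, every current and future waiter proceeds. */
static void complete_all(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical VMA: 'locked' is non-NULL while the VMA is being torn down. */
struct vma {
    unsigned long start, end;
    struct completion *locked;
};

/* Hypothetical one-slot "VMA tree" and the lock protecting it. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static struct vma *tree_slot;

enum fault_result { FAULT_OK, FAULT_SIGSEGV };

static enum fault_result handle_fault(unsigned long addr)
{
    for (;;) {
        pthread_mutex_lock(&tree_lock);
        struct vma *vma = tree_slot;
        if (!vma || addr < vma->start || addr >= vma->end) {
            pthread_mutex_unlock(&tree_lock);
            return FAULT_SIGSEGV;       /* nothing is mapped here */
        }
        struct completion *c = vma->locked;
        pthread_mutex_unlock(&tree_lock);
        if (!c)
            return FAULT_OK;            /* the normal fault path would run here */
        /* Locked: munmap() and mmap(MAP_FIXED) look the same, so wait, then retry. */
        wait_for_completion(c);
    }
}

/* "Write-lock" the VMA: from now on, faults on it block instead of proceeding. */
static void vma_lock_for_teardown(struct vma *vma, struct completion *c)
{
    assert(vma != NULL);
    init_completion(c);
    pthread_mutex_lock(&tree_lock);
    vma->locked = c;
    pthread_mutex_unlock(&tree_lock);
}

/* Install the replacement (NULL models munmap()) and release all waiters. */
static void vma_finish_teardown(struct vma *replacement, struct completion *c)
{
    pthread_mutex_lock(&tree_lock);
    tree_slot = replacement;
    pthread_mutex_unlock(&tree_lock);
    complete_all(c);
}

/* Helper for running handle_fault() on another thread. */
struct fault_arg {
    unsigned long addr;
    enum fault_result result;
};

static void *fault_thread(void *p)
{
    struct fault_arg *a = p;
    a->result = handle_fault(a->addr);
    return NULL;
}
```

For example, if one thread calls vma_lock_for_teardown() on the mapped VMA while another runs handle_fault() on an address inside it, the fault ends with FAULT_OK when vma_finish_teardown() installs a new VMA covering that address (the MAP_FIXED case) and with FAULT_SIGSEGV when it installs NULL (the munmap() case). Because the VMA is only ever write-locked on its way out, there is never a transition back to "unlocked"; signalling the completion once is enough, which is the argument for a completion rather than an rwsem.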