Subject: Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section
From: Yang Shi
To: Matthew Wilcox
Cc: Michal Hocko, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 21 Mar 2018 14:45:44 -0700
In-Reply-To: <20180321172932.GE4780@bombadil.infradead.org>

On 3/21/18 10:29 AM, Matthew Wilcox wrote:
> On Wed, Mar 21, 2018 at 09:31:22AM -0700, Yang Shi wrote:
>> On 3/21/18 6:08 AM, Michal Hocko wrote:
>>> Yes, this definitely sucks. One way to work that around is to split the
>>> unmap to two phases. One to drop all the pages. That would only need
>>> mmap_sem for read and then tear down the mapping with the mmap_sem for
>>> write. This wouldn't help for parallel mmap_sem writers but those really
>>> need a different approach (e.g. the range locking).
>> page fault might sneak in to map a page which has been unmapped before?
>>
>> range locking should help a lot on manipulating small sections of a large
>> mapping in parallel or multiple small mappings. It may not achieve too much
>> for single large mapping.
> I don't think we need range locking. What if we do munmap this way:
>
> Take the mmap_sem for write
> Find the VMA
>   If the VMA is large(*)
>     Mark the VMA as deleted
>     Drop the mmap_sem
>     zap all of the entries
>     Take the mmap_sem
>   Else
>     zap all of the entries
>   Continue finding VMAs
> Drop the mmap_sem
>
> Now we need to change everywhere which looks up a VMA to see if it needs
> to care the the VMA is deleted (page faults, eg will need to SIGBUS; mmap

Marking the VMA as deleted sounds good. The problem with my current
approach is that a concurrent page fault may succeed if it accesses a
section that has not been unmapped yet. Marking the VMA as deleted would
tell the page fault handler that the VMA is no longer valid, so it can
return SIGSEGV.

> does not care; munmap will need to wait for the existing munmap operation

Why doesn't mmap care? What about MAP_FIXED? It may fail unexpectedly,
right?

Thanks,
Yang

> to complete), but it gives us the atomicity, at least on a per-VMA basis.
>
> We could also do:
>
> Take the mmap_sem for write
> Mark all VMAs in the range as deleted & modify any partial VMAs
> Drop mmap_sem
> zap pages from deleted VMAs
>
> That would give us the same atomicity that we have today.
>
> Deleted VMAs would need a pointer to a completion, so operations that
> need to wait can queue themselves up. I'd recommend we use the low bit
> of vm_file and treat it as a pointer to a struct completion if set.
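The low-bit idea above can be sketched in user space as follows. This
assumes the completion object is at least 2-byte aligned, so bit 0 of a
valid pointer is always zero. `struct tagged_vma`, `struct mock_completion`
and the helpers are hypothetical; the two completion helpers only mimic the
semantics of the kernel's `complete_all()` and `wait_for_completion()`. The
sketch does not address where the original `vm_file` would be kept once
the field is overwritten.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model types, not kernel structures. */
struct mock_file { int dummy; };

struct mock_completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

#define MOCK_COMPLETION_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

struct tagged_vma {
    void *vm_file;  /* a struct mock_file *, or a tagged completion pointer */
};

#define VMA_DELETED_BIT ((uintptr_t)1)

static bool vma_is_deleted(const struct tagged_vma *vma)
{
    return ((uintptr_t)vma->vm_file & VMA_DELETED_BIT) != 0;
}

/* Overwrite vm_file with the completion pointer, tagged with bit 0. */
static void vma_mark_deleted(struct tagged_vma *vma, struct mock_completion *c)
{
    assert(((uintptr_t)c & VMA_DELETED_BIT) == 0);  /* needs alignment >= 2 */
    vma->vm_file = (void *)((uintptr_t)c | VMA_DELETED_BIT);
}

/* Decode the completion, or NULL if the VMA is not being deleted. */
static struct mock_completion *vma_completion(const struct tagged_vma *vma)
{
    if (!vma_is_deleted(vma))
        return NULL;
    return (struct mock_completion *)((uintptr_t)vma->vm_file & ~VMA_DELETED_BIT);
}

/* Teardown side: wake every queued waiter (like complete_all()). */
static void mock_complete_all(struct mock_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Waiter side, e.g. a second munmap (like wait_for_completion()). */
static void mock_wait_for_completion(struct mock_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}
```

A lookup that finds `vma_is_deleted()` true would drop mmap_sem, call
`mock_wait_for_completion(vma_completion(vma))` and retry once the munmap
in progress has called `mock_complete_all()`.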