Subject: Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section
From: Yang Shi
To: Matthew Wilcox
Cc: Michal Hocko, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Wed, 21 Mar 2018 15:40:09 -0700
Message-ID: <274f9d37-3dee-2bff-b1fd-1ca7fa41f1ca@linux.alibaba.com>
In-Reply-To: <20180321221502.GA3969@bombadil.infradead.org>
References: <1521581486-99134-1-git-send-email-yang.shi@linux.alibaba.com>
 <1521581486-99134-2-git-send-email-yang.shi@linux.alibaba.com>
 <20180321130833.GM23100@dhcp22.suse.cz>
 <20180321172932.GE4780@bombadil.infradead.org>
 <20180321221502.GA3969@bombadil.infradead.org>

On 3/21/18 3:15 PM, Matthew Wilcox wrote:
> On Wed, Mar 21, 2018 at 02:45:44PM -0700, Yang Shi wrote:
>> On 3/21/18 10:29 AM, Matthew Wilcox wrote:
>>> On Wed, Mar 21, 2018 at 09:31:22AM -0700, Yang Shi wrote:
>>>> On 3/21/18 6:08 AM, Michal Hocko wrote:
>>>>> Yes, this definitely sucks. One way to work that around is to split the
>>>>> unmap to two phases. One to drop all the pages. That would only need
>>>>> mmap_sem for read and then tear down the mapping with the mmap_sem for
>>>>> write. This wouldn't help for parallel mmap_sem writers but those really
>>>>> need a different approach (e.g. the range locking).
>>>> page fault might sneak in to map a page which has been unmapped before?
>>>>
>>>> range locking should help a lot on manipulating small sections of a large
>>>> mapping in parallel or multiple small mappings. It may not achieve too much
>>>> for single large mapping.
>>> I don't think we need range locking. What if we do munmap this way:
>>>
>>> Take the mmap_sem for write
>>> Find the VMA
>>>   If the VMA is large(*)
>>>     Mark the VMA as deleted
>>>     Drop the mmap_sem
>>>     zap all of the entries
>>>     Take the mmap_sem
>>>   Else
>>>     zap all of the entries
>>> Continue finding VMAs
>>> Drop the mmap_sem
>>>
>>> Now we need to change everywhere which looks up a VMA to see if it needs
>>> to care that the VMA is deleted (page faults, eg will need to SIGBUS; mmap
>> Marking the vma as deleted sounds good. The problem with my current approach
>> is that a concurrent page fault may succeed if it accesses a section that is
>> not yet unmapped. Marking the vma as deleted could tell the page fault that
>> the vma is not valid anymore, so it returns SIGSEGV.
>>
>>> does not care; munmap will need to wait for the existing munmap operation
>> Why doesn't mmap care? How about MAP_FIXED? It may fail unexpectedly, right?
> Oh, I forgot about MAP_FIXED. Yes, MAP_FIXED should wait for the munmap
> to finish. But a regular mmap can just pretend that it happened before
> the munmap call and avoid the deleted VMAs.

But my test shows a race condition for reduced size mmap, which calls
do_munmap(). It may need to wait for the munmap to finish too. So, in my
patches, I just make the do_munmap() called from mmap() hold mmap_sem the
whole time.

Thanks,
Yang
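
For readers following the thread, the scheme discussed above (mark a large
VMA as deleted, drop mmap_sem while zapping, and have page faults reject
deleted VMAs) can be modelled in user space. What follows is only a toy
sketch in Python, not kernel code and not the code from the patch series.
Its assumptions: a plain threading.Lock stands in for mmap_sem (the real one
is a reader/writer semaphore), only whole VMAs are unmapped (no splitting),
LARGE_PAGES is a made-up cutoff because the "(*)" footnote is not quoted
here, and the names AddressSpace, Vma, mmap_fixed and during_zap are
hypothetical. The fault path returns SIGSEGV as in my reply; Matthew's
message suggests SIGBUS instead.

```python
import threading
from dataclasses import dataclass, field

LARGE_PAGES = 4  # hypothetical cutoff for "large"; not taken from the thread


@dataclass(eq=False)  # compare VMAs by identity, not by field values
class Vma:
    start: int                 # first page of the mapping
    end: int                   # one past the last page
    deleted: bool = False      # set while a lock-dropping munmap is in flight
    pages: set = field(default_factory=set)  # pages currently mapped


class AddressSpace:
    def __init__(self):
        # Stand-in for mmap_sem; the real one is a reader/writer semaphore.
        self.mmap_sem = threading.Lock()
        self.vmas = []

    def _find_vma(self, page):
        for vma in self.vmas:
            if vma.start <= page < vma.end:
                return vma
        return None

    def fault(self, page):
        """Page fault: refuse to map into a missing or deleted VMA."""
        with self.mmap_sem:
            vma = self._find_vma(page)
            if vma is None or vma.deleted:
                return "SIGSEGV"
            vma.pages.add(page)
            return "OK"

    def _do_munmap(self, start, end, may_drop_lock, during_zap=None):
        # Caller must hold mmap_sem on entry; it is held again on return.
        for vma in [v for v in self.vmas if v.start < end and start < v.end]:
            if may_drop_lock and vma.end - vma.start >= LARGE_PAGES:
                vma.deleted = True
                self.mmap_sem.release()
                vma.pages.clear()      # zap entries without the lock
                if during_zap:
                    during_zap()       # test hook: simulate concurrent work
                self.mmap_sem.acquire()
            else:
                vma.pages.clear()      # small VMA: zap under the lock
            if vma in self.vmas:
                self.vmas.remove(vma)

    def munmap(self, start, end, during_zap=None):
        with self.mmap_sem:
            self._do_munmap(start, end, True, during_zap)

    def mmap_fixed(self, start, end):
        """MAP_FIXED-style mapping: the inner unmap never drops the lock."""
        with self.mmap_sem:
            self._do_munmap(start, end, False)
            vma = Vma(start, end)
            self.vmas.append(vma)
            return vma
```

The during_zap hook exists only so that a single-threaded test can run a
fault in the window where the lock is dropped. Passing may_drop_lock=False
from mmap_fixed() mirrors the choice described above of keeping mmap_sem held
for the do_munmap() called from mmap().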