From: Yang Shi
To: Michal Hocko, Andrew Morton
Cc: willy@infradead.org, ldufour@linux.vnet.ibm.com, peterz@infradead.org, mingo@redhat.com, acme@kernel.org, alexander.shishkin@linux.intel.com, jolsa@redhat.com, namhyung@kernel.org, tglx@linutronix.de, hpa@zytor.com, linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping
Date: Tue, 3 Jul 2018 09:53:29 -0700
In-Reply-To: <20180703060921.GA16767@dhcp22.suse.cz>

On 7/2/18 11:09 PM, Michal Hocko wrote:
> On Mon 02-07-18 13:48:45, Andrew Morton wrote:
>> On Mon, 2 Jul 2018 16:05:02 +0200 Michal Hocko wrote:
>>
>>> On Fri 29-06-18 20:15:47, Andrew Morton wrote:
>>> [...]
>>>> Would one of your earlier designs have addressed all usecases? I
>>>> expect the dumb unmap-a-little-bit-at-a-time approach would have?
>>> It has been already pointed out that this will not work.
>> I said "one of". There were others.
> Well, I was aware only about two potential solutions. Either do the
> heavy lifting under the shared lock and do the rest with the exclusive
> one and this, drop the lock per parts. Maybe I have missed others?

There is another one, which I presented at the LSFMM summit. But it
actually turns out to look very similar to the one currently under review.

Yang

>
>>> You simply
>>> cannot drop the mmap_sem during unmap because another thread could
>>> change the address space under your feet. So you need some form of
>>> VM_DEAD and handle concurrent and conflicting address space operations.
>> Unclear that this is a problem. If a thread does an unmap of a range
>> of virtual address space, there's no guarantee that upon return some
>> other thread has not already mapped new stuff into that address range.
>> So what's changed?
> Well, consider the following scenario:
> Thread A = calling mmap(NULL, sizeA)
> Thread B = calling munmap(addr, sizeB)
>
> They do not use any external synchronization and rely on the atomic
> munmap. Thread B only munmaps range that it knows belongs to it (e.g.
> called mmap in the past). It should be clear that ThreadA should not
> get an address from the addr, sizeB range, right? In the most simple case
> it will not happen. But let's say that the addr, sizeB range has
> unmapped holes for what ever reasons. Now anytime munmap drops the
> exclusive lock after handling one VMA, Thread A might find its sizeA
> range and use it. ThreadB then might remove this new range as soon as it
> gets its exclusive lock again.
>
> Is such a code safe? No it is not and I would call it fragile at best
> but people tend to do weird things and atomic munmap behavior is
> something they can easily depend on.
>
> Another example would be an atomic address range probing by
> MAP_FIXED_NOREPLACE. It would simply break for similar reasons.
>
> I remember my attempt to make MAP_LOCKED consistent with mlock (if the
> population fails then return -ENOMEM) and that required to drop the
> shared mmap_sem and take it in exclusive mode (because we do not
> have upgrade_read) and Linus was strongly against [1][2] for very
> similar reasons. If you drop the lock you simply do not know what
> happened under your feet.
>
> [1] http://lkml.kernel.org/r/CA+55aFydkG-BgZzry5DrTzueVh9VvEcVJdLV8iOyUphQk=0vpw@mail.gmail.com
> [2] http://lkml.kernel.org/r/CA+55aFyajquhGhw59qNWKGK4dBV0TPmDD7-1XqPo7DZWvO_hPg@mail.gmail.com
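
To make the Thread A / Thread B scenario quoted above concrete, here is a
toy user-space model in plain Python. It is only a sketch, not kernel code,
and everything in it is an assumption made for illustration: the
AddressSpace class, the first-fit placement policy, the page numbers, and
the use of a generator "yield" to stand in for dropping and retaking the
exclusive mmap_sem after each VMA. It compares an munmap that completes
before Thread A's mmap runs with one that lets Thread A in after the first
VMA has been removed.

```python
"""Toy model of the mmap/munmap interleaving quoted above. Not kernel code."""


class AddressSpace:
    """Page-granular address space; a hypothetical stand-in for an mm."""

    def __init__(self, size):
        self.size = size
        self.vmas = {}  # start page -> (end page, owner label)

    def map_at(self, start, length, owner):
        self.vmas[start] = (start + length, owner)
        return start

    def find_free(self, length):
        """First-fit search for an unmapped gap (an assumed policy)."""
        addr = 0
        for start in sorted(self.vmas):
            if start - addr >= length:
                return addr
            addr = max(addr, self.vmas[start][0])
        return addr if self.size - addr >= length else None

    def mmap(self, length, owner):
        """Model of mmap(NULL, length): the address is chosen for the caller."""
        addr = self.find_free(length)
        return None if addr is None else self.map_at(addr, length, owner)

    def owner_at(self, addr):
        for start, (end, owner) in self.vmas.items():
            if start <= addr < end:
                return owner
        return None

    def _in_range(self, addr, length):
        end = addr + length
        return [s for s in sorted(self.vmas)
                if addr <= s and self.vmas[s][0] <= end]

    def munmap_atomic(self, addr, length):
        """Remove every VMA in the range in one step (lock held throughout)."""
        for start in self._in_range(addr, length):
            del self.vmas[start]

    def munmap_per_vma(self, addr, length):
        """Remove one VMA per step; each yield models dropping the lock."""
        while True:
            victims = self._in_range(addr, length)
            if not victims:
                return
            del self.vmas[victims[0]]
            yield


def run(interleaved):
    mm = AddressSpace(16)
    mm.map_at(0, 4, "other")   # unrelated mapping below Thread B's range
    mm.map_at(4, 2, "B")       # Thread B's range is pages [4, 10) ...
    mm.map_at(8, 2, "B")       # ... with an unmapped hole at [6, 8)
    if interleaved:
        steps = mm.munmap_per_vma(4, 6)
        next(steps)            # B removes its first VMA, then drops the lock
        a = mm.mmap(2, "A")    # A's mmap runs in the window
        for _ in steps:        # B retakes the lock and finishes the range
            pass
    else:
        mm.munmap_atomic(4, 6)
        a = mm.mmap(2, "A")    # A's mmap runs only after munmap returned
    return a, mm.owner_at(a)


if __name__ == "__main__":
    print("atomic munmap:  ", run(interleaved=False))
    print("per-VMA munmap: ", run(interleaved=True))
```

In this model, run() returns the address Thread A was given together with
the owner of that address once Thread B's munmap has finished. An owner of
None in the interleaved case means Thread A's new mapping was torn down by
Thread B, which is the hazard described in the quoted text. The model only
covers these two orderings and says nothing about real kernel locking.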