Subject: Re: [RFC v2 PATCH 2/2] mm: mmap: zap pages with read mmap_sem for large mapping
To: Peter Zijlstra
Cc: Michal Hocko, Nadav Amit, Matthew Wilcox, ldufour@linux.vnet.ibm.com, Andrew Morton, Ingo Molnar, acme@kernel.org, alexander.shishkin@linux.intel.com, jolsa@redhat.com, namhyung@kernel.org, "open list:MEMORY MANAGEMENT", linux-kernel@vger.kernel.org
From: Yang Shi
Date: Tue, 26 Jun 2018 18:03:34 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/26/18 12:43 AM, Peter Zijlstra wrote:
> On Mon, Jun 25, 2018 at 05:06:23PM -0700, Yang Shi wrote:
>> By looking this deeper, we may not be able to cover all the unmapping range
>> for VM_DEAD, for example, if the start addr is in the middle of a vma. We
>> can't set VM_DEAD to that vma since that would trigger SIGSEGV for still
>> mapped area.
>>
>> splitting can't be done with read mmap_sem held, so maybe just set VM_DEAD
>> to non-overlapped vmas. Access to overlapped vmas (first and last) will
>> still have undefined behavior.
> Acquire mmap_sem for writing, split, mark VM_DEAD, drop mmap_sem. Acquire
> mmap_sem for reading, madv_free drop mmap_sem. Acquire mmap_sem for
> writing, free everything left, drop mmap_sem.
>
> ?
>
> Sure, you acquire the lock 3 times, but both write instances should be
> 'short', and I suppose you can do a demote between 1 and 2 if you care.

Thanks, Peter. Yes, after looking at the code and trying two different
approaches, this one looks like the most straightforward. Splitting the vma
up front saves a lot of pain later. Holding mmap_sem for write to do this
before zapping the mappings seems worth the cost, since the write critical
section is very short.

Also, VM_DEAD can be set exclusively under the write mmap_sem without racing
with page faults, which gives us consistent behavior for the race between PF
and munmap. And we don't need to care about overlapping vmas, since they
have already been split.

Yang