Date: Tue, 3 Jul 2018 08:09:21 +0200
From: Michal Hocko
To: Andrew Morton
Cc: Yang Shi, willy@infradead.org, ldufour@linux.vnet.ibm.com,
 peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
 alexander.shishkin@linux.intel.com, jolsa@redhat.com,
 namhyung@kernel.org, tglx@linutronix.de, hpa@zytor.com,
 linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping
Message-ID: <20180703060921.GA16767@dhcp22.suse.cz>
References: <1530311985-31251-1-git-send-email-yang.shi@linux.alibaba.com>
 <1530311985-31251-5-git-send-email-yang.shi@linux.alibaba.com>
 <20180629183501.9e30c26135f11853245c56c7@linux-foundation.org>
 <084aeccb-2c54-2299-8bf0-29a10cc0186e@linux.alibaba.com>
 <20180629201547.5322cfc4b52d19a0443daec2@linux-foundation.org>
 <20180702140502.GZ19043@dhcp22.suse.cz>
 <20180702134845.c4f536dead5374b443e24270@linux-foundation.org>

On Mon 02-07-18 13:48:45, Andrew Morton wrote:
> On Mon, 2 Jul 2018 16:05:02 +0200 Michal Hocko wrote:
>
> > On Fri 29-06-18 20:15:47, Andrew Morton wrote:
> > [...]
> > > Would one of your earlier designs have addressed all usecases? I
> > > expect the dumb unmap-a-little-bit-at-a-time approach would have?
> >
> > It has been already pointed out that this will not work.
>
> I said "one of". There were others.

Well, I was only aware of two potential solutions: either do the heavy
lifting under the shared lock and the rest under the exclusive one (what
this patch does), or drop the lock per part. Maybe I have missed others?

> > You simply
> > cannot drop the mmap_sem during unmap because another thread could
> > change the address space under your feet. So you need some form of
> > VM_DEAD and handle concurrent and conflicting address space operations.
>
> Unclear that this is a problem. If a thread does an unmap of a range
> of virtual address space, there's no guarantee that upon return some
> other thread has not already mapped new stuff into that address range.
> So what's changed?

Well, consider the following scenario:
Thread A = calling mmap(NULL, sizeA)
Thread B = calling munmap(addr, sizeB)

They do not use any external synchronization and rely on munmap being
atomic. Thread B only unmaps a range that it knows belongs to it (e.g. one
it obtained from an earlier mmap call). It should be clear that Thread A
should not get an address from the [addr, addr + sizeB) range while
Thread B's munmap is still in progress, right? In the simplest case it
will not happen. But let's say that the [addr, addr + sizeB) range has
unmapped holes for whatever reason.
Now any time munmap drops the exclusive lock after handling one VMA,
Thread A might find room for its sizeA mapping inside that range and use
it. Thread B might then remove this new mapping as soon as it gets the
exclusive lock again.

Is such code safe? No, it is not, and I would call it fragile at best,
but people tend to do weird things, and atomic munmap behavior is
something they can easily depend on.

Another example would be atomic address range probing with
MAP_FIXED_NOREPLACE. It would simply break for similar reasons.

I remember my attempt to make MAP_LOCKED consistent with mlock (if the
population fails then return -ENOMEM). That required dropping the shared
mmap_sem and taking it in exclusive mode (because we do not have
upgrade_read), and Linus was strongly against it [1][2] for very similar
reasons. If you drop the lock, you simply do not know what happened under
your feet.

[1] http://lkml.kernel.org/r/CA+55aFydkG-BgZzry5DrTzueVh9VvEcVJdLV8iOyUphQk=0vpw@mail.gmail.com
[2] http://lkml.kernel.org/r/CA+55aFyajquhGhw59qNWKGK4dBV0TPmDD7-1XqPo7DZWvO_hPg@mail.gmail.com
-- 
Michal Hocko
SUSE Labs
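The Thread A / Thread B scenario can be made concrete with a deterministic, single-threaded toy model. This is hypothetical illustration code, not the kernel's munmap: the slot-per-page address space, first-fit placement for mmap(NULL, ...), and the `window` callback that stands in for "the lock is dropped and another thread runs" are all assumptions. Thread B owns two mappings in [0, 6) with a hole at [2, 4); Thread A asks for 3 pages.

```c
#include <assert.h>

/* Toy address space (hypothetical): one slot per page, value = owner id,
 * 0 = unmapped. Not kernel code. */
#define NPAGES 8
static int space[NPAGES];

/* Stand-in for mmap(NULL, size): first-fit search for `size` free slots. */
static int toy_mmap(int owner, int size)
{
    for (int start = 0; start + size <= NPAGES; start++) {
        int free_run = 1;
        for (int i = start; i < start + size; i++)
            if (space[i] != 0) { free_run = 0; break; }
        if (free_run) {
            for (int i = start; i < start + size; i++)
                space[i] = owner;
            return start;
        }
    }
    return -1;
}

typedef void (*window_fn)(void);

/* Stand-in for munmap(start, len): removes whatever is mapped in
 * [start, start + len). If `window` is non-NULL, the "lock" is dropped
 * after each contiguous mapping is removed and `window` models another
 * thread running in that gap. */
static void toy_munmap(int start, int len, window_fn window)
{
    int end = start + len;
    int cursor = start;
    while (cursor < end) {
        while (cursor < end && space[cursor] == 0)
            cursor++;                    /* skip holes */
        if (cursor == end)
            break;
        int owner = space[cursor];
        while (cursor < end && space[cursor] == owner)
            space[cursor++] = 0;         /* unmap one "VMA" */
        if (window)
            window();                    /* lock dropped here */
    }
}

static int a_addr = -1;

/* Thread A: mmap(NULL, 3 pages), performed only once. */
static void thread_a(void)
{
    if (a_addr < 0)
        a_addr = toy_mmap(2, 3);
}

/* Returns how many of Thread A's 3 pages are still mapped by A after
 * Thread B's munmap of [0, 6). */
static int run(int drop_lock)
{
    int init[NPAGES] = {1, 1, 0, 0, 1, 1, 3, 3}; /* B = 1, other = 3 */
    for (int i = 0; i < NPAGES; i++)
        space[i] = init[i];
    a_addr = -1;
    toy_munmap(0, 6, drop_lock ? thread_a : 0);
    if (!drop_lock)
        thread_a();                      /* A runs after the atomic munmap */
    assert(a_addr >= 0);
    int alive = 0;
    for (int i = a_addr; i < a_addr + 3; i++)
        alive += (space[i] == 2);
    return alive;
}
```

With the atomic variant, `run(0)` leaves A with all 3 pages. When the lock is dropped after the first mapping, `run(1)` places A at [0, 3), reaching past B's cursor into the hole, and B's next step unmaps page 2 from under A, so only 2 pages survive.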
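For the MAP_FIXED_NOREPLACE point, a minimal sketch of the probing pattern, assuming Linux with glibc: `claim_range` and `page_size` are hypothetical helper names. The call either claims the whole range (it was free) or fails with EEXIST, and that all-or-nothing answer is only meaningful if a concurrent munmap is atomic too. Kernels older than 4.17 ignore the flag and treat the address as a hint, so the returned address is checked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
/* asm-generic value (e.g. x86); assumption: some architectures differ. */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

static size_t page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    return (size_t)ps;
}

/* Hypothetical helper: try to claim [addr, addr + len) in one step.
 * Returns addr on success; returns NULL with errno set (EEXIST if the
 * range was occupied) and leaves the address space unchanged otherwise. */
static void *claim_range(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;                 /* errno set by mmap, e.g. EEXIST */
    if (p != addr) {
        /* Pre-4.17 kernel: flag ignored, addr used only as a hint. */
        munmap(p, len);
        errno = EEXIST;
        return NULL;
    }
    return p;
}
```

A probe against an address that is already mapped fails with EEXIST; once that page is unmapped, the same call claims it.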