Date: Wed, 4 Jul 2018 10:13:47 +0200
From: Michal Hocko
To: Yang Shi
Cc: Andrew Morton, willy@infradead.org, ldufour@linux.vnet.ibm.com,
	peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	alexander.shishkin@linux.intel.com, jolsa@redhat.com,
	namhyung@kernel.org, tglx@linutronix.de, hpa@zytor.com,
	linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping
Message-ID: <20180704081347.GG22503@dhcp22.suse.cz>
On Tue 03-07-18 11:22:17, Yang Shi wrote:
> 
> 
> On 7/2/18 11:09 PM, Michal Hocko wrote:
> > On Mon 02-07-18 13:48:45, Andrew Morton wrote:
> > > On Mon, 2 Jul 2018 16:05:02 +0200 Michal Hocko wrote:
> > > > On Fri 29-06-18 20:15:47, Andrew Morton wrote:
> > > > [...]
> > > > > Would one of your earlier designs have addressed all usecases? I
> > > > > expect the dumb unmap-a-little-bit-at-a-time approach would have?
> > > > It has been already pointed out that this will not work.
> > > I said "one of". There were others.
> > Well, I was aware of only two potential solutions: either do the
> > heavy lifting under the shared lock and the rest with the exclusive
> > one, or this one, dropping the lock in parts. Maybe I have missed
> > others?
> > 
> > > > You simply
> > > > cannot drop the mmap_sem during unmap because another thread could
> > > > change the address space under your feet. So you need some form of
> > > > VM_DEAD and handle concurrent and conflicting address space operations.
> > > Unclear that this is a problem. If a thread does an unmap of a range
> > > of virtual address space, there's no guarantee that upon return some
> > > other thread has not already mapped new stuff into that address range.
> > > So what's changed?
> > Well, consider the following scenario:
> > Thread A = calling mmap(NULL, sizeA)
> > Thread B = calling munmap(addr, sizeB)
> > 
> > They do not use any external synchronization and rely on munmap being
> > atomic. Thread B only munmaps a range that it knows belongs to it (e.g.
> > it called mmap in the past). It should be clear that Thread A must not
> > get an address from the [addr, addr+sizeB) range, right? In the simplest
> > case this will not happen. But say the [addr, addr+sizeB) range has
> > unmapped holes for whatever reason. Now, any time munmap drops the
> > exclusive lock after handling one VMA, Thread A might find its sizeA
> > range there and use it. Thread B might then remove this new range as
> > soon as it gets the exclusive lock again.
> 
> I'm a little bit confused here. If Thread B has already unmapped that
> range, then Thread A uses it. That doesn't sound like a problem, since
> Thread B should just go ahead and handle the next range when it gets its
> exclusive lock again, right? I don't see why Thread B would revisit that
> range to remove it.

Not if the new range overlaps with the follow-up range that Thread B
unmaps next. Example:

B: munmap [XXXXX] [XXXXXX] [XXXXXXXXXX]
B: breaks the lock after processing the first vma.
A: mmap [XXXXXXXXXXXX]
B: munmap retakes the lock and revalidates from the last vm_end, because
   the old vma->vm_next might be gone
B: [XXX][XXXXX] [XXXXXXXXXX]

So you munmap part of A's new range. Sure, you can play some tricks and
skip over vmas that do not start above your last vma->vm_end, or
something like that, but I expect there are other cans of worms hidden
there.
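To make the race concrete, here is a minimal userspace sketch of the
pattern the scenario above assumes. It is illustrative only: the sizes,
the thread helpers, and the earlier mmap that produced addr_b are made
up for the example, not taken from the patch.

/*
 * Thread B relies on munmap() being atomic with respect to
 * Thread A's mmap(NULL, ...).
 */
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

#define SIZE_A	(4UL << 20)	/* Thread A's new mapping */
#define SIZE_B	(64UL << 20)	/* Thread B's old range, possibly with holes */

static void *addr_b;		/* from an earlier mmap by Thread B */

static void *thread_a(void *arg)
{
	/*
	 * Must never be handed an address inside [addr_b, addr_b + SIZE_B)
	 * while Thread B's munmap is still in flight.
	 */
	void *p = mmap(NULL, SIZE_A, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		perror("mmap");
	return p;
}

static void *thread_b(void *arg)
{
	/*
	 * If munmap drops mmap_sem per VMA, A's fresh mapping can land in
	 * a hole of this range and be torn down by the rest of this call.
	 */
	if (munmap(addr_b, SIZE_B))
		perror("munmap");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	addr_b = mmap(NULL, SIZE_B, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr_b == MAP_FAILED)
		return 1;
	pthread_create(&a, NULL, thread_a, NULL);
	pthread_create(&b, NULL, thread_b, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}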
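And a C-style pseudocode sketch of the lock-breaking loop from the
example above. This is schematic only, not the actual patch;
zap_one_vma() is a hypothetical stand-in for the real teardown, and the
locking is simplified.

	down_write(&mm->mmap_sem);
	vma = find_vma(mm, start);
	while (vma && vma->vm_start < end) {
		unsigned long last_end = vma->vm_end;

		zap_one_vma(vma);		/* hypothetical: unmap this one VMA */
		up_write(&mm->mmap_sem);	/* B: breaks the lock here */
		/* A: mmap(NULL, sizeA) can be placed inside [start, end) now */
		down_write(&mm->mmap_sem);
		/*
		 * The old vma->vm_next may be gone, so revalidate from
		 * last_end; this can pick up A's brand new mapping and
		 * munmap part of it.
		 */
		vma = find_vma(mm, last_end);
	}
	up_write(&mm->mmap_sem);

That is exactly the partial munmap of A's range shown in the diagram.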
-- 
Michal Hocko
SUSE Labs