Subject: Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping
From: Yang Shi
To: Andrew Morton
Cc: mhocko@kernel.org, willy@infradead.org, ldufour@linux.vnet.ibm.com,
 peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
 alexander.shishkin@linux.intel.com, jolsa@redhat.com, namhyung@kernel.org,
 tglx@linutronix.de, hpa@zytor.com, linux-mm@kvack.org, x86@kernel.org,
 linux-kernel@vger.kernel.org
References: <1530311985-31251-1-git-send-email-yang.shi@linux.alibaba.com> <1530311985-31251-5-git-send-email-yang.shi@linux.alibaba.com> <20180629183501.9e30c26135f11853245c56c7@linux-foundation.org>
 <084aeccb-2c54-2299-8bf0-29a10cc0186e@linux.alibaba.com> <20180629201547.5322cfc4b52d19a0443daec2@linux-foundation.org>
Message-ID: <06df816f-b8b7-f6c0-3710-baad99fb3213@linux.alibaba.com>
Date: Mon, 2 Jul 2018 17:01:12 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/29/18 9:26 PM, Yang Shi wrote:
>
>
> On 6/29/18 8:15 PM, Andrew Morton wrote:
>> On Fri, 29 Jun 2018 19:28:15 -0700 Yang Shi wrote:
>>
>>>
>>>> we're adding a bunch of code to 32-bit kernels which will never be
>>>> executed.
>>>>
>>>> I'm thinking it would be better to be much more explicit with "#ifdef
>>>> CONFIG_64BIT" in this code, rather than relying upon the above magic.
>>>>
>>>> But I tend to think that the fact that we haven't solved anything on
>>>> locked vmas or on uprobed mappings is a shostopper for the whole
>>>> approach :(
>>> I agree it is not that perfect. But, it still could improve the most
>>> use cases.
>> Well, those unaddressed usecases will need to be fixed at some point.
>
> Yes, definitely.
>
>> What's our plan for that?
>
> As I mentioned in the earlier email, locked and hugetlb cases might be
> able to be solved by separating vm_flags update and actual unmap. I
> will look into it further later.

After looking into this further, I think both mlocked and hugetlb vmas
can be handled.

For mlocked vmas it is easy: we acquire mmap_sem for write before
unmapping, so the VM_LOCKED flag can be cleared at that point and the
unmap done afterwards, just like the regular path does.

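The ordering for mlocked vmas can be sketched as a small user-space model. This is only an illustration under stated assumptions, not kernel code: every name below (`model_mm`, `model_vma`, `MODEL_VM_LOCKED`, the helpers) is hypothetical, the lock is a plain enum rather than a real rw-semaphore, and switching from the write lock to the read lock is modeled as a simple assignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel flag; the bit value is made up. */
#define MODEL_VM_LOCKED 0x1UL

/* Toy model of mmap_sem: just records how it is currently held. */
enum model_lock { UNLOCKED, READ_LOCKED, WRITE_LOCKED };

struct model_mm  { enum model_lock mmap_sem; };
struct model_vma { unsigned long vm_flags; bool mapped; };

/* vm_flags may only be changed while mmap_sem is held for write. */
static bool model_clear_locked(struct model_mm *mm, struct model_vma *vma)
{
    if (mm->mmap_sem != WRITE_LOCKED)
        return false;
    vma->vm_flags &= ~MODEL_VM_LOCKED;
    return true;
}

/*
 * Zapping pages is allowed under a read or write lock, but in this
 * model it refuses to run while the vma is still marked locked.
 */
static bool model_zap(struct model_mm *mm, struct model_vma *vma)
{
    if (mm->mmap_sem == UNLOCKED || (vma->vm_flags & MODEL_VM_LOCKED))
        return false;
    vma->mapped = false;
    return true;
}

/* Clear the lock flag under the write lock, then zap under the read lock. */
static bool model_munmap_large(struct model_mm *mm, struct model_vma *vma)
{
    assert(mm->mmap_sem == UNLOCKED);   /* caller must not hold the lock */

    mm->mmap_sem = WRITE_LOCKED;
    bool ok = model_clear_locked(mm, vma);

    mm->mmap_sem = READ_LOCKED;         /* stand-in for switching to read */
    ok = ok && model_zap(mm, vma);

    mm->mmap_sem = UNLOCKED;
    return ok;
}
```

The point of the model is only the ordering: the flag update happens where the write lock is already held, so the later zap never needs to touch vm_flags.
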
For hugetlb vmas, the VM_MAYSHARE flag is only checked by
huge_pmd_share(), reached via hugetlb_fault()->huge_pte_alloc(). The
other call site is
dup_mm()->copy_page_range()->copy_hugetlb_page_range(), and we don't
care about that call chain in this case. So we may extend the VM_DEAD
check to hugetlb_fault(). Michal suggested checking VM_DEAD in
check_stable_address_space(), so that function would be called in
hugetlb_fault() too (it is not in the current code); the page fault
handler would then bail out before huge_pte_alloc() is called. With
this trick, we don't have to care about when vm_flags is updated: we
can unmap hugetlb vmas in the read mmap_sem critical section, and
update vm_flags with write mmap_sem held either afterwards or before
the unmap.

Yang

>
> From my point of view, uprobe mapping sounds not that vital.
>
>>
>> Would one of your earlier designs have addressed all usecases? I
>> expect the dumb unmap-a-little-bit-at-a-time approach would have?
>
> Yes. The v1 design does unmap with holding write map_sem. So, the
> vm_flags update is not a problem.
>
> Thanks,
> Yang
>
>>
>>> For the locked vmas and hugetlb vmas, unmapping operations need modify
>>> vm_flags. But, I'm wondering we might be able to separate unmap and
>>> vm_flags update. Because we know they will be unmapped right away, the
>>> vm_flags might be able to be updated in write mmap_sem critical section
>>> before the actual unmap is called or after it. This is just off the top
>>> of my head.
>>>
>>> For uprobed mappings, I'm not sure how vital it is to this case.
>>>
>>> Thanks,
>>> Yang
>>>
>
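
The hugetlb idea in the reply above (check VM_DEAD early in the fault path so the handler bails out before huge_pte_alloc()) can be sketched the same way. Again this is a user-space illustration under assumptions, not the kernel implementation: `SKETCH_VM_DEAD`, `sketch_vma`, the result enum and both functions are hypothetical names, and a counter stands in for the huge_pte_alloc() call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the proposed VM_DEAD flag; bit value made up. */
#define SKETCH_VM_DEAD 0x2UL

struct sketch_vma { unsigned long vm_flags; };

enum sketch_fault_result { SKETCH_FAULT_HANDLED, SKETCH_FAULT_BAILED };

/* Models a stable-address-space check that also rejects dead vmas. */
static bool sketch_address_space_stable(const struct sketch_vma *vma)
{
    return !(vma->vm_flags & SKETCH_VM_DEAD);
}

/*
 * Models the fault path: the stability check runs first, so a dead vma
 * never reaches the page-table allocation step. *pte_allocs counts how
 * often that step would have run.
 */
static enum sketch_fault_result
sketch_hugetlb_fault(const struct sketch_vma *vma, int *pte_allocs)
{
    assert(vma && pte_allocs);

    if (!sketch_address_space_stable(vma))
        return SKETCH_FAULT_BAILED;

    (*pte_allocs)++;    /* stands in for huge_pte_alloc() */
    return SKETCH_FAULT_HANDLED;
}
```

Because the bail-out happens before the allocation step, the fault path in this model never gets as far as the code that would look at VM_MAYSHARE for a dead vma, which is why the timing of the vm_flags update stops mattering.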