Date: Sat, 24 Mar 2018 14:24:00 -0400
From: Jerome Glisse
To: Matthew Wilcox
Cc: Yang Shi, Michal Hocko, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Laurent Dufour
Subject: Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section
Message-ID: <20180324182359.GB4928@redhat.com>
References: <1521581486-99134-1-git-send-email-yang.shi@linux.alibaba.com> <1521581486-99134-2-git-send-email-yang.shi@linux.alibaba.com> <20180321130833.GM23100@dhcp22.suse.cz> <20180321172932.GE4780@bombadil.infradead.org>
In-Reply-To: <20180321172932.GE4780@bombadil.infradead.org>
User-Agent: Mutt/1.9.2 (2017-12-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 21, 2018 at 10:29:32AM -0700, Matthew Wilcox wrote:
> On Wed, Mar 21, 2018 at 09:31:22AM -0700, Yang Shi wrote:
> > On 3/21/18 6:08 AM, Michal Hocko wrote:
> > > Yes, this definitely sucks. One way to work that around is to split the
> > > unmap to two phases. One to drop all the pages. That would only need
> > > mmap_sem for read and then tear down the mapping with the mmap_sem for
> > > write. This wouldn't help for parallel mmap_sem writers but those really
> > > need a different approach (e.g. the range locking).
> >
> > page fault might sneak in to map a page which has been unmapped before?
> >
> > range locking should help a lot on manipulating small sections of a large
> > mapping in parallel or multiple small mappings. It may not achieve too much
> > for single large mapping.
>
> I don't think we need range locking.
> What if we do munmap this way:
>
> Take the mmap_sem for write
> Find the VMA
>   If the VMA is large(*)
>     Mark the VMA as deleted
>     Drop the mmap_sem
>     zap all of the entries
>     Take the mmap_sem
>   Else
>     zap all of the entries
> Continue finding VMAs
> Drop the mmap_sem
>
> Now we need to change everywhere which looks up a VMA to see if it needs
> to care the the VMA is deleted (page faults, eg will need to SIGBUS; mmap
> does not care; munmap will need to wait for the existing munmap operation
> to complete), but it gives us the atomicity, at least on a per-VMA basis.
>

What about something that should fix all issues:

    struct list_head to_free_puds;
    ...
    down_write(&mm->mmap_sem);
    ...
    unmap_vmas(&tlb, vma, start, end, &to_free_puds);
    arch_unmap(mm, vma, start, end);
    /* Fix up all other VM information */
    remove_vma_list(mm, vma);
    ...
    up_write(&mm->mmap_sem);
    ...
    zap_pud_list(rss_update_info, to_free_puds);
    update_rss(rss_update_info);

We collect the puds that need to be freed/zapped: under the write sem and
the CPU page table lock, we set the page table PUD entry to pud_none and
add the pud to the list of puds that need zapping. We only collect puds
that are fully covered by the range; partially covered puds get the usual
treatment. Everything behaves as it does today, except that we do not free
memory at that point.

Care must be taken with the anon vma, and we should probably not free the
vma struct before the zap either, but all other mm structs can be updated.
The rss_counter would also need to be updated after the puds are zapped.

We would need special code to zap the pud list: there is no need to take
locks or to do special arch tlb flushing, ptep_get_clear, ... when walking
down those puds. So it should scale a lot better too.

Did I miss something?

Cheers,
Jérôme