From: Yang Shi
To: mhocko@kernel.org, willy@infradead.org, ldufour@linux.vnet.ibm.com,
    akpm@linux-foundation.org, peterz@infradead.org, mingo@redhat.com,
    acme@kernel.org, alexander.shishkin@linux.intel.com, jolsa@redhat.com,
    namhyung@kernel.org
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: [RFC v2 0/2] mm: zap pages with read mmap_sem in munmap for large mapping
Date: Tue, 19 Jun 2018 07:34:14 +0800
Message-Id: <1529364856-49589-1-git-send-email-yang.shi@linux.alibaba.com>

Background:
Recently, when we ran some vm scalability tests on machines with large
memory, we ran into a couple of mmap_sem scalability issues when
unmapping large memory spaces. Please refer to
https://lkml.org/lkml/2017/12/14/733 and
https://lkml.org/lkml/2018/2/20/576.

History:
akpm then suggested unmapping a large mapping section by section,
dropping mmap_sem after each section, to mitigate the problem (see
https://lkml.org/lkml/2018/3/6/784).

The V1 patch series was submitted to the mailing list per Andrew's
suggestion (see https://lkml.org/lkml/2018/3/20/786). I received a lot
of great feedback and suggestions.

The topic was then discussed at the LSFMM 2018 summit. There, Michal
Hocko suggested (as he also did in the v1 patch review) trying a "two
phases" approach: zap pages with read mmap_sem held, then do the vma
cleanup with write mmap_sem held (for details of the discussion, see
https://lwn.net/Articles/753269/).

So I came up with the V2 patch series per this suggestion. I don't call
madvise(MADV_DONTNEED) directly, since it behaves a little differently
from munmap; instead I use unmap_region(), as do_munmap() does.

The patches may need more cleanup and refactoring, but it seems better
to let the community start reviewing them early to make sure I'm on the
right track.

Regression and performance data:
The tests were run on a machine with 32 cores of E5-2680 @ 2.70GHz and
384GB memory.

Regression test: full LTP and trinity (munmap), with the threshold set
to 4K in the code (for the regression test only) so that the new code
gets better coverage, since the trinity (munmap) test manipulates 4K
mappings. No regression was reported, and the system survived the
trinity (munmap) test for 4 hours until I aborted it.

Throughput of page faults (#/s) with the below stress-ng test:
stress-ng --mmap 0 --mmap-bytes 80G --mmap-file --metrics --perf --timeout 600s

pristine         patched         delta
89.41K/sec       97.29K/sec      +8.8%

The number looks a little bit better than v1.

Yang Shi (2):
  uprobes: make vma_has_uprobes non-static
  mm: mmap: zap pages with read mmap_sem for large mapping

 include/linux/uprobes.h |   7 ++++
 kernel/events/uprobes.c |   2 +-
 mm/mmap.c               | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 155 insertions(+), 2 deletions(-)