References: <20200930222130.4175584-1-kaleshsingh@google.com> <20200930223207.5xepuvu6wr6xw5bb@black.fi.intel.com> <20201001122706.jp2zr23a43hfomyg@black.fi.intel.com> <20201002053547.7roe7b4mpamw4uk2@black.fi.intel.com>
In-Reply-To: <20201002053547.7roe7b4mpamw4uk2@black.fi.intel.com>
From: Lokesh Gidra
Date: Thu, 1 Oct 2020 23:39:53 -0700
Subject: Re: [PATCH 0/5] Speed up mremap on large regions
To: "Kirill A. Shutemov"
Cc: Kalesh Singh, Suren Baghdasaryan, Minchan Kim, Joel Fernandes,
    "Cc: Android Kernel", Catalin Marinas, Will Deacon, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, "the arch/x86 maintainers",
    "H. Peter Anvin", Andrew Morton, Shuah Khan, "Aneesh Kumar K.V",
    Kees Cook, Peter Zijlstra, Sami Tolvanen, Masahiro Yamada,
    Arnd Bergmann, Frederic Weisbecker, Krzysztof Kozlowski,
    Hassan Naveed, Christian Brauner, Mark Rutland, Mike Rapoport,
    Gavin Shan, Zhenyu Ye, Jia He, John Hubbard, William Kucharski,
    Sandipan Das, Ralph Campbell, Mina Almasry, Ram Pai, Dave Hansen,
    Kamalesh Babulal, Masami Hiramatsu, Brian Geffon, SeongJae Park,
    linux-kernel, "moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)",
    "open list:MEMORY MANAGEMENT", "open list:KERNEL SELFTEST FRAMEWORK"
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 1, 2020 at 10:36 PM Kirill A. Shutemov wrote:
>
> On Thu, Oct 01, 2020 at 05:09:02PM -0700, Lokesh Gidra wrote:
> > On Thu, Oct 1, 2020 at 9:00 AM Kalesh Singh wrote:
> > >
> > > On Thu, Oct 1, 2020 at 8:27 AM Kirill A. Shutemov wrote:
> > > >
> > > > On Wed, Sep 30, 2020 at 03:42:17PM -0700, Lokesh Gidra wrote:
> > > > > On Wed, Sep 30, 2020 at 3:32 PM Kirill A. Shutemov wrote:
> > > > > >
> > > > > > On Wed, Sep 30, 2020 at 10:21:17PM +0000, Kalesh Singh wrote:
> > > > > > > mremap time can be optimized by moving entries at the
> > > > > > > PMD/PUD level if the source and destination addresses are
> > > > > > > PMD/PUD-aligned and PMD/PUD-sized. Enable moving at the PMD
> > > > > > > and PUD levels on arm64 and x86. Other architectures where
> > > > > > > this type of move is supported and known to be safe can also
> > > > > > > opt in to these optimizations by enabling HAVE_MOVE_PMD and
> > > > > > > HAVE_MOVE_PUD.
> > > > > > >
> > > > > > > Observed performance improvements for remapping a
> > > > > > > PUD-aligned 1GB-sized region on x86 and arm64:
> > > > > > >
> > > > > > > - HAVE_MOVE_PMD is already enabled on x86 : N/A
> > > > > > > - Enabling HAVE_MOVE_PUD on x86           : ~13x speed up
> > > > > > >
> > > > > > > - Enabling HAVE_MOVE_PMD on arm64         : ~8x speed up
> > > > > > > - Enabling HAVE_MOVE_PUD on arm64         : ~19x speed up
> > > > > > >
> > > > > > > Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD give a total of
> > > > > > > ~150x speed up on arm64.
> > > > > >
> > > > > > Is there a *real* workload that benefits from HAVE_MOVE_PUD?
> > > > > >
> > > > > We have a Java garbage collector under development which
> > > > > requires moving physical pages of a multi-gigabyte heap using
> > > > > mremap. During this move, the application threads have to be
> > > > > paused for correctness. It is critical to keep this pause as
> > > > > short as possible to avoid jitter during user interaction. This
> > > > > is where HAVE_MOVE_PUD will greatly help.
> > > >
> > > > Any chance to quantify the effect of mremap() with and without
> > > > HAVE_MOVE_PUD?
> > > >
> > > > I doubt it's a major contributor to the GC pause. I expect you
> > > > would need to move tens of gigs to see a sizable effect. And if
> > > > your GC routinely moves tens of gigs, maybe the problem is
> > > > somewhere else?
> > > >
> > > > I'm asking for numbers, because an increase in complexity comes
> > > > with a cost. If it doesn't provide a substantial benefit to a
> > > > real workload, maintaining the code forever doesn't make sense.
> >
> > mremap is indeed the biggest contributor to the GC pause. It has to
> > take place in what is typically known as a 'stop-the-world' pause,
> > wherein all application threads are paused. During this pause the GC
> > thread flips the GC roots (threads' stacks, globals, etc.) and then
> > resumes the threads along with concurrent compaction of the heap.
> > This GC-root flip differs depending on which compaction algorithm is
> > being used.
> >
> > In our case it involves updating object references in the threads'
> > stacks and remapping the Java heap to a different location. The
> > threads' stacks can be handled in parallel with the mremap.
> > Therefore, the dominant factor is indeed the cost of mremap. From
> > patches 2 and 4, it is clear that remapping 1GB without this
> > optimization takes ~9ms on arm64.
> >
> > Although this mremap has to happen only once per GC cycle, and the
> > typical size is not going to be more than a GB or two, pausing
> > application threads for ~9ms is guaranteed to cause jitter. OTOH,
> > with this optimization, the mremap is reduced to ~60us, which is a
> > totally acceptable pause time.
> >
> > Unfortunately, the implementation of the new GC algorithm hasn't yet
> > reached the point where I can quantify the effect of this
> > optimization. But I can confirm that without this optimization the
> > new GC will not be approved.
>
> IIUC, the 9ms -> 90us improvement is attributed to the combination of
> HAVE_MOVE_PMD and HAVE_MOVE_PUD, right? I expect HAVE_MOVE_PMD to be
> reasonable for some workloads, but the marginal benefit of
> HAVE_MOVE_PUD is in doubt. Do you see it as useful for your workload?
>
Yes, 9ms -> 90us is with both combined. Past experience has been that
even a ~1ms stop-the-world pause is prone to cause jitter.
HAVE_MOVE_PMD alone takes us only so far, so HAVE_MOVE_PUD is required
to bring the mremap cost down to an acceptable level.
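To make the comparison concrete in isolation, here is a minimal,
self-contained sketch (illustrative only, not code from the series)
that times a single mremap() of a region meeting the cover letter's
alignment requirements. It assumes 4K pages, where a PUD maps 1 GiB,
and a kernel built with HAVE_MOVE_PUD; misaligning either address by a
page forces the slower per-PTE path, which is one way to reproduce the
gap discussed above.

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define PUD_SIZE (1UL << 30)    /* 1 GiB, one PUD entry with 4K pages */

/* Return a PUD-aligned mapping by over-allocating and trimming the
 * unaligned head and tail. */
static void *map_aligned(size_t size, size_t align)
{
    char *raw = mmap(NULL, size + align, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    uintptr_t start = ((uintptr_t)raw + align - 1) & ~(align - 1);
    if (start > (uintptr_t)raw)
        munmap(raw, start - (uintptr_t)raw);
    if ((uintptr_t)raw + align > start)
        munmap((char *)start + size, (uintptr_t)raw + align - start);
    return (void *)start;
}

int main(void)
{
    char *src = map_aligned(PUD_SIZE, PUD_SIZE);
    char *dst = map_aligned(PUD_SIZE, PUD_SIZE); /* aligned target */
    struct timespec t0, t1;

    if (!src || !dst)
        return 1;
    memset(src, 0xaa, PUD_SIZE);    /* fault in the page tables */

    /* The "stop-the-world" window: just the mremap() itself. With
     * both addresses 1 GiB aligned, HAVE_MOVE_PUD lets the kernel
     * move one PUD entry instead of 512 PMD entries or 262144 PTEs.
     * MREMAP_FIXED atomically replaces the reservation at dst. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    void *moved = mremap(src, PUD_SIZE, PUD_SIZE,
                         MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (moved == MAP_FAILED) {
        perror("mremap");
        return 1;
    }
    printf("moved 1 GiB in %.1f us\n",
           (t1.tv_sec - t0.tv_sec) * 1e6 +
           (t1.tv_nsec - t0.tv_nsec) / 1e3);
    return 0;
}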
Ideally, I was hoping that the functionality of HAVE_MOVE_PMD could be
extended to all levels of the hierarchical page table, simplifying the
implementation in the process. But unfortunately, judging from patch 3,
that doesn't seem to be possible.

> --
> Kirill A. Shutemov
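For what it's worth, the unification wished for above would presumably
look like a single level-dispatching helper. A hypothetical
kernel-style sketch follows (not code from the series: move_normal_pmd()
exists from the earlier HAVE_MOVE_PMD work and move_normal_pud() follows
the series' naming, while the enum and the void * parameters are
invented here purely for illustration):

/* Hypothetical sketch only: dispatch a page-table-entry move by level.
 * Per the discussion above, patch 3 suggests the levels cannot simply
 * be unified like this, so treat it as the shape of the idea rather
 * than working kernel code. */
enum pgt_level { MOVE_NORMAL_PMD, MOVE_NORMAL_PUD };

static bool move_pgt_entry(enum pgt_level level,
                           struct vm_area_struct *vma,
                           unsigned long old_addr, unsigned long new_addr,
                           void *old_entry, void *new_entry)
{
    switch (level) {
    case MOVE_NORMAL_PMD:   /* one 2 MiB (4K pages) entry at a time */
        return move_normal_pmd(vma, old_addr, new_addr,
                               old_entry, new_entry);
    case MOVE_NORMAL_PUD:   /* one 1 GiB entry at a time */
        return move_normal_pud(vma, old_addr, new_addr,
                               old_entry, new_entry);
    }
    return false;
}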