References: <20220429201131.3397875-1-yosryahmed@google.com> <20220429201131.3397875-2-yosryahmed@google.com> <87ilqoi77b.wl-maz@kernel.org>
From: Yosry Ahmed
Date: Fri, 13 May 2022 09:22:56 -0700
Subject: Re: [PATCH v4 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
To: Sean Christopherson
Cc: Johannes Weiner, Marc Zyngier, Tejun Heo, Zefan Li, James Morse, Alexandru Elisei, Suzuki K Poulose, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt, Oliver Upton, cgroups@vger.kernel.org, Linux Kernel Mailing List, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Linux-MM
X-Mailing-List: linux-kernel@vger.kernel.org

Thanks everyone for participating in this discussion and looking into this.

On Fri, May 13, 2022 at 9:12 AM Sean Christopherson wrote:
>
> On Fri, May 13, 2022, Johannes Weiner wrote:
> > On Thu, May 12, 2022 at 11:29:38PM +0000, Sean Christopherson wrote:
> > > On Thu, May 12, 2022, Johannes Weiner wrote:
> > > > On Mon, May 02, 2022 at 11:46:26AM -0700, Yosry Ahmed wrote:
> > > > > On Mon, May 2, 2022 at 3:01 AM Marc Zyngier wrote:
> > > > > > What do you plan to do for IOMMU page tables? After all, they serve
> > > > > > the exact same purpose, and I'd expect these to be handled the same
> > > > > > way (i.e. why is this KVM specific?).
> > > > >
> > > > > The reason this was named NR_SECONDARY_PAGETABLE instead of
> > > > > NR_KVM_PAGETABLE is exactly that. To leave room to incrementally
> > > > > account other types of secondary page tables to this stat. It is just
> > > > > that we are currently interested in the KVM MMU usage.
> > > >
> > > > Do you actually care at the supervisor level that this memory is used
> > > > for guest page tables?
> > >
> > > Hmm, yes?
> > > KVM does have a decent number of large-ish allocations that aren't
> > > for page tables, but except for page tables, the number/size of those allocations
> > > scales linearly with either the number of vCPUs or the amount of memory assigned
> > > to the VM (with no room for improvement barring KVM changes).
> > >
> > > Off the top of my head, KVM's secondary page tables are the only allocations that
> > > don't scale linearly, especially when nested virtualization is in use.
> >
> > Thanks, that's useful information.
> >
> > Are these other allocations accounted somewhere? If not, are they
> > potential containment holes that will need fixing eventually?
>
> All allocations that are tied to specific VM/vCPU are tagged GFP_KERNEL_ACCOUNT,
> so we should be good on that front.
>
> > > > It seems to me you primarily care that it is reported *somewhere*
> > > > (hence the piggybacking off of NR_PAGETABLE at first). And whether
> > > > it's page tables or iommu tables or whatever else allocated for the
> > > > purpose of virtualization, it doesn't make much of a difference to the
> > > > host/cgroup that is tracking it, right?
> > > >
> > > > (The proximity to nr_pagetable could also be confusing. A high page
> > > > table count can be a hint to userspace to enable THP. It seems
> > > > actionable in a different way than a high number of kvm page tables or
> > > > iommu page tables.)
> > >
> > > I don't know about iommu page tables, but on the KVM side a high count can also
> > > be a good signal that enabling THP would be beneficial.
> >
> > Well, maybe.
> >
> > It might help, but ultimately it's the process that's in control in
> > all cases: it's unmovable kernel memory allocated to manage virtual
> > address space inside the task.
> >
> > So I'm still a bit at a loss whether these things should all be lumped
> > in together or kept separately.
> > meminfo and memory.stat are permanent
> > ABI, so we should try to establish in advance whether the new item is
> > really a first-class consumer or part of something bigger.
> >
> > The patch initially piggybacked on NR_PAGETABLE. I found an email of
> > you asking why it couldn't be a separate item, but it didn't provide a
> > reasoning for that decision. Could you share your thoughts on that?
>
> It was mostly an honest question, I too am trying to understand what userspace
> wants to do with this information. I was/am also trying to understand the benefits
> of doing the tracking through page_state and not a dedicated KVM stat. E.g. KVM
> already has specific stats for the number of leaf pages mapped into a VM, why not
> do the same for non-leaf pages?

Let me cast some light on this. The reason this started out piggybacked
on NR_PAGETABLE is that we had a remnant of an old internal
implementation of NR_PAGETABLE, from before it was introduced upstream,
that accounted KVM secondary page tables as normal page tables. This
made me think this behavior was preferable. Personally, I wanted to make
it a separate thing from the beginning. When I found opinions here that
also suggested a separate stat, I went ahead with that.

As for where to put this information, it does not have to be
NR_SECONDARY_PAGETABLE. Ultimately, people working on KVM are the ones
that will interpret and act upon this data, so if you have somewhere
else in mind please let me know, Sean.