Received: by 2002:a05:7412:419a:b0:f3:1519:9f41 with SMTP id i26csp4200874rdh; Tue, 28 Nov 2023 15:04:34 -0800 (PST) X-Google-Smtp-Source: AGHT+IEQfobclbE4115GOejKUJoaY+rgGAIfYH4ky0AXkPOo+vt4RVpYhzofM/bS4+S2tpQPx7x9 X-Received: by 2002:a17:90b:80b:b0:285:866f:ca0d with SMTP id bk11-20020a17090b080b00b00285866fca0dmr15399722pjb.46.1701212674148; Tue, 28 Nov 2023 15:04:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701212674; cv=none; d=google.com; s=arc-20160816; b=GKUokUO9o2/66y0EGAyuUdmwr1tQwLVf9Io3EvCZDuAvviApcNLaoiEPiDnr9HkSek mF/dlBJ8AZ834EchLnIIAl2Avglzf5Q/1YzSK7S8NFq9Xi6rHKUePHCjYSxu/Kyoi5TS wWWrsNAQCsknVZUFP81vZjxP0/wopuZeFXN5El4hX/am4b8J68LtuO37SspWmSgqLPA6 fouDfyBqr8o7Qa8h3cssXz8lE4dzflN60FbUK+oXkzDFw7bLUN2aJIoCvmlppEBGwU9Y rp9t09k+5y47io5lI3eLRTnCl66wcfnIER9deJUTVq7CNlRjfGSODYchP9Osk9HrQ5Zp RjGw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=43/GorJHph6wZf7ST1s9EfwOKwV7OvTpzkca3ztNxsA=; fh=VjIkwtvqI158LRmgm3fSiefb+bONLgKjSoxmKS7Uvjg=; b=dKSwsoPrvL/2Tm6lclqRtuVBGikpZ1AXpuTbAhgkeMx0rX2wmgAsosIWauFV1atNrq PxgOxbTpk0QuS3AAVvAGV5lW06Agffy1l1ewDBMeNYDPWh0DEIsXv3BlyWKKjqrsUsN9 WueQFyvZaxrFLfzsGCuDTqWZb5V+nE1gO3mzjV0wFNkU4CQjO1p224vVlBegisCSOKpz qR4ru/QyQ+L/1DpnEj56pOEtrF4+dpRJhoSf0fLzPOxw4hl7F4tjl6upECCQQvEAKiUX QPjgx6I7gBeb1pQzDNkKbfpSsQpmvvNmKWVH+JbhYbst2gGvoMKvGHUQtjwPvbOv1/2R Kr+Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=T4+k+4UI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from groat.vger.email (groat.vger.email. [23.128.96.35]) by mx.google.com with ESMTPS id cq12-20020a17090af98c00b00276b35bc424si19386pjb.26.2023.11.28.15.04.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 15:04:34 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=T4+k+4UI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 645C780A90F0; Tue, 28 Nov 2023 15:04:30 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345957AbjK1XEI (ORCPT + 99 others); Tue, 28 Nov 2023 18:04:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60240 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229595AbjK1XEH (ORCPT ); Tue, 28 Nov 2023 18:04:07 -0500 Received: from mail-oo1-xc2d.google.com (mail-oo1-xc2d.google.com [IPv6:2607:f8b0:4864:20::c2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7F36183 for ; Tue, 28 Nov 2023 15:04:12 -0800 (PST) Received: by mail-oo1-xc2d.google.com with SMTP id 006d021491bc7-58d564b98c9so2252854eaf.2 for ; Tue, 28 Nov 2023 15:04:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1701212652; x=1701817452; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=43/GorJHph6wZf7ST1s9EfwOKwV7OvTpzkca3ztNxsA=; b=T4+k+4UIC04cOOUR2nvsU3u7pNoVVtWCRDRoeU582COZQW2ayY+qUXL3KQdPvLCbza VH1pT9x4sKAeCs9phGD+jh2rDGTtT6lCf+JsqtWLxgBXOO+e3wYPxGpqAyC+JSNCFKmB vzrhJ+lQlSvRgweKlSV5qiYjs+/LqHfca1OuVtBAC5+4iZqhTpnCc4xuh/OHo+wXJOSx Funu3YeeGPrrwqfLv4VV9jiQ4dHL2GWOGeeeVX9R7QUh92Lp/tCRdnva8MZQOYk0m8Al juJzaiCk7kKVw5t4rSexsTnuQYEfcHxlV2jjujm7rkS1Zpa+hScOzyUMvxNGtC5DAWYl bPtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701212652; x=1701817452; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=43/GorJHph6wZf7ST1s9EfwOKwV7OvTpzkca3ztNxsA=; b=lNS3t+nw7u+yXP1dL/H/TlCPpSZm7/YQ+czq+R088C6dHfKNS3dfRIT1/lSje9cb5m q+zVBAoIfyu9DfaDi3JX2AZ8sQc9LRA94f3T9VhoYKPs1MJtl41dwqqn00d1//L3fjch MCSJtsIqArjk02+MVu6GlVUEifkMQbBgMLxkYqUShPpYtzl2RPPQCZXAciQLm6LwzEXQ 3Jbt4H/AfZmx+ZT0jBpV9JNP4rfpjcsoJuzonKu4ogpKzyqkic2z1BDys+cBh+CRgjnh K9+2zjkRf0CTwrpLOJqSyRPdsF1KFM8ZZ4rd/AfDxje7dUty4VeDlyD7RvPZq4cnZbRF Vuow== X-Gm-Message-State: AOJu0Yx+nRGzauOauDis7q50EbqrPAC1kTjm6ulKcJX0DElfgdtbouJo woKe4rahPVTpuB6lOJTTSqzk0ZWu8hcFsJy8HTs1ZQ== X-Received: by 2002:a05:6359:67a9:b0:16d:bd74:19c9 with SMTP id sq41-20020a05635967a900b0016dbd7419c9mr14576065rwb.16.1701212651861; Tue, 28 Nov 2023 15:04:11 -0800 (PST) MIME-Version: 1.0 References: <20231128204938.1453583-1-pasha.tatashin@soleen.com> In-Reply-To: From: Yosry Ahmed Date: Tue, 28 Nov 2023 15:03:30 -0800 Message-ID: Subject: Re: [PATCH 00/16] IOMMU memory observability To: Pasha Tatashin Cc: akpm@linux-foundation.org, alex.williamson@redhat.com, alim.akhtar@samsung.com, alyssa@rosenzweig.io, asahi@lists.linux.dev, baolu.lu@linux.intel.com, bhelgaas@google.com, cgroups@vger.kernel.org, corbet@lwn.net, david@redhat.com, dwmw2@infradead.org, hannes@cmpxchg.org, heiko@sntech.de, iommu@lists.linux.dev, jasowang@redhat.com, jernej.skrabec@gmail.com, jgg@ziepe.ca, jonathanh@nvidia.com, joro@8bytes.org, kevin.tian@intel.com, krzysztof.kozlowski@linaro.org, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-rockchip@lists.infradead.org, linux-samsung-soc@vger.kernel.org, linux-sunxi@lists.linux.dev, linux-tegra@vger.kernel.org, lizefan.x@bytedance.com, marcan@marcan.st, mhiramat@kernel.org, mst@redhat.com, m.szyprowski@samsung.com, netdev@vger.kernel.org, paulmck@kernel.org, rdunlap@infradead.org, robin.murphy@arm.com, samuel@sholland.org, suravee.suthikulpanit@amd.com, sven@svenpeter.dev, thierry.reding@gmail.com, tj@kernel.org, tomas.mudrunka@gmail.com, vdumpa@nvidia.com, virtualization@lists.linux.dev, wens@csie.org, will@kernel.org, yu-cheng.yu@intel.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Tue, 28 Nov 2023 15:04:30 -0800 (PST) On Tue, Nov 28, 2023 at 2:32=E2=80=AFPM Pasha Tatashin wrote: > > On Tue, Nov 28, 2023 at 4:34=E2=80=AFPM Yosry Ahmed wrote: > > > > On Tue, Nov 28, 2023 at 12:49=E2=80=AFPM Pasha Tatashin > > wrote: > > > > > > From: Pasha Tatashin > > > > > > IOMMU subsystem may contain state that is in gigabytes. Majority of t= hat > > > state is iommu page tables. Yet, there is currently, no way to observ= e > > > how much memory is actually used by the iommu subsystem. > > > > > > This patch series solves this problem by adding both observability to > > > all pages that are allocated by IOMMU, and also accountability, so > > > admins can limit the amount if via cgroups. > > > > > > The system-wide observability is using /proc/meminfo: > > > SecPageTables: 438176 kB > > > > > > Contains IOMMU and KVM memory. > > > > > > Per-node observability: > > > /sys/devices/system/node/nodeN/meminfo > > > Node N SecPageTables: 422204 kB > > > > > > Contains IOMMU and KVM memory memory in the given NUMA node. > > > > > > Per-node IOMMU only observability: > > > /sys/devices/system/node/nodeN/vmstat > > > nr_iommu_pages 105555 > > > > > > Contains number of pages IOMMU allocated in the given node. > > > > Does it make sense to have a KVM-only entry there as well? > > > > In that case, if SecPageTables in /proc/meminfo is found to be > > suspiciously high, it should be easy to tell which component is > > contributing most usage through vmstat. I understand that users can do > > the subtraction, but we wouldn't want userspace depending on that, in > > case a third class of "secondary" page tables emerges that we want to > > add to SecPageTables. The in-kernel implementation can do the > > subtraction for now if it makes sense though. > > Hi Yosry, > > Yes, another counter for KVM could be added. On the other hand KVM > only can be computed by subtracting one from another as there are only > two types of secondary page tables, KVM and IOMMU: > > /sys/devices/system/node/node0/meminfo > Node 0 SecPageTables: 422204 kB > > /sys/devices/system/node/nodeN/vmstat > nr_iommu_pages 105555 > > KVM only =3D SecPageTables - nr_iommu_pages * PAGE_SIZE / 1024 > Right, but as I mention above, if userspace starts depending on this equation, we won't be able to add any more classes of "secondary" page tables to SecPageTables. I'd like to avoid that if possible. We can do the subtraction in the kernel.