Message-ID: <52de3aca-41b1-471e-8f87-1a77de547510@arm.com>
Date: Wed, 29 Nov 2023 16:48:43 +0000
Subject: Re: [PATCH 08/16] iommu/fsl: use page allocation function provided by iommu-pages.h
From: Robin Murphy
To: Jason Gunthorpe, Pasha Tatashin
Cc: akpm@linux-foundation.org, alex.williamson@redhat.com,
 alim.akhtar@samsung.com, alyssa@rosenzweig.io, asahi@lists.linux.dev,
 baolu.lu@linux.intel.com, bhelgaas@google.com, cgroups@vger.kernel.org,
 corbet@lwn.net, david@redhat.com, dwmw2@infradead.org, hannes@cmpxchg.org,
 heiko@sntech.de, iommu@lists.linux.dev, jasowang@redhat.com,
 jernej.skrabec@gmail.com, jonathanh@nvidia.com, joro@8bytes.org,
 kevin.tian@intel.com, krzysztof.kozlowski@linaro.org, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-rockchip@lists.infradead.org,
 linux-samsung-soc@vger.kernel.org, linux-sunxi@lists.linux.dev,
 linux-tegra@vger.kernel.org, lizefan.x@bytedance.com, marcan@marcan.st,
 mhiramat@kernel.org, mst@redhat.com, m.szyprowski@samsung.com,
 netdev@vger.kernel.org, paulmck@kernel.org, rdunlap@infradead.org,
 samuel@sholland.org, suravee.suthikulpanit@amd.com, sven@svenpeter.dev,
 thierry.reding@gmail.com, tj@kernel.org, tomas.mudrunka@gmail.com,
 vdumpa@nvidia.com, virtualization@lists.linux.dev, wens@csie.org,
 will@kernel.org, yu-cheng.yu@intel.com
References: <20231128204938.1453583-1-pasha.tatashin@soleen.com>
 <20231128204938.1453583-9-pasha.tatashin@soleen.com>
 <1c6156de-c6c7-43a7-8c34-8239abee3978@arm.com>
 <20231128235037.GC1312390@ziepe.ca>
In-Reply-To: <20231128235037.GC1312390@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On 28/11/2023 11:50 pm, Jason Gunthorpe wrote:
> On Tue, Nov 28, 2023 at 06:00:13PM -0500, Pasha Tatashin wrote:
>> On Tue, Nov 28, 2023 at 5:53 PM Robin Murphy wrote:
>>>
>>> On 2023-11-28 8:49 pm, Pasha Tatashin wrote:
>>>> Convert iommu/fsl_pamu.c to use the new page allocation functions
>>>> provided in iommu-pages.h.
>>>
>>> Again, this is not a pagetable. This thing doesn't even *have* pagetables.
>>>
>>> Similar to patches #1 and #2 where you're lumping in configuration
>>> tables which belong to the IOMMU driver itself, as opposed to pagetables
>>> which effectively belong to an IOMMU domain's user. But then there are
>>> still drivers where you're *not* accounting similar configuration
>>> structures, so I really struggle to see how this metric is useful when
>>> it's so completely inconsistent in what it's counting :/
>>
>> The whole IOMMU subsystem allocates a significant amount of kernel
>> locked memory that we want to at least observe. The new field in
>> vmstat does just that: it reports ALL buddy allocator memory that
>> IOMMU allocates. However, for accounting purposes, I agree, we need to
>> do better, and separate at least iommu pagetables from the rest.
>>
>> We can separate the metric into two:
>> iommu pagetable only
>> iommu everything
>>
>> or into three:
>> iommu pagetable only
>> iommu dma
>> iommu everything
>>
>> What do you think?
>
> I think I said this at LPC - if you want to have fine grained
> accounting of memory by owner you need to go talk to the cgroup people
> and come up with something generic. Adding ever open coded finer
> category breakdowns just for iommu doesn't make alot of sense.
>
> You can make some argument that the pagetable memory should be counted
> because kvm counts it's shadow memory, but I wouldn't go into further
> detail than that with hand coded counters..

Right, pagetable memory is interesting since it's something that any
random kernel user can indirectly allocate via iommu_domain_alloc() and
iommu_map(), and some of those users may even be doing so on behalf of
userspace. I have no objection to accounting and potentially applying
limits to *that*.

Beyond that, though, there is nothing special about "the IOMMU
subsystem". The amount of memory an IOMMU driver needs to allocate for
itself in order to function is not of interest beyond curiosity, it just
is what it is; limiting it would only break the IOMMU, and if a user
thinks it's "too much", the only actionable thing that might help is to
physically remove devices from the system.

Similar for DMA buffers; it might be intriguing to account those, but
it's not really an actionable metric - in the overwhelming majority of
cases you can't simply tell a driver to allocate less than what it
needs. And that is of course assuming we were to account *all* DMA
buffers, since whether they happen to have an IOMMU translation or not
is irrelevant (we'd have already accounted the pagetables as pagetables
if so).

I bet "the networking subsystem" also consumes significant memory on the
same kind of big systems where IOMMU pagetables would be of any concern.
I believe some of the "serious" NICs can easily run up hundreds of
megabytes if not gigabytes worth of queues, SKB pools, etc. - would you
propose accounting those too?

Thanks,
Robin.