Date: Wed, 16 Aug 2023 16:10:02 +0100
From: Robin Murphy
Subject: Re: [PATCH 0/2] iommu/iova: Make the rcache depot properly flexible
To: John Garry, joro@8bytes.org
Cc: will@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
    zhangzekun11@huawei.com

On 15/08/2023 2:35 pm, John Garry wrote:
> On 15/08/2023 12:11, Robin Murphy wrote:
>>>
>>> This threshold is the number of online CPUs, right?
>>
>> Yes, that's nominally half of the current fixed size (based on all the
>> performance figures from the original series seemingly coming from a
>> 16-thread machine,
>
> If you are talking about
> https://lore.kernel.org/linux-iommu/20230811130246.42719-1-zhangzekun11@huawei.com/,

No, I mean the *original* rcache patch submission, and its associated
paper:

https://lore.kernel.org/linux-iommu/cover.1461135861.git.mad@cs.technion.ac.il/

> then I think it's a 256-CPU system and the DMA controller has 16 HW
> queues. The 16 HW queues are relevant as the per-completion-queue
> interrupt handler runs on a fixed CPU from the set of 16 CPUs in the HW
> queue interrupt handler affinity mask. And what this means is that,
> while any CPU may alloc an IOVA, only those 16 CPUs handling each HW
> queue interrupt will be freeing IOVAs.
>
>> but seemed like a fair compromise. I am of course keen to see how
>> real-world testing actually pans out.
>>
>>>> it's enough of a challenge to get my 4-core dev board with spinning
>>>> disk and gigabit ethernet to push anything into a depot at all :)
>>>>
>>>
>>> I have to admit that I was hoping to also see a more aggressive
>>> reclaim strategy, where we also trim the per-CPU rcaches when not in
>>> use. Leizhen proposed something like this a long time ago.
>>
>> Don't think I haven't been having various elaborate ideas for making
>> it cleverer with multiple thresholds and self-tuning, however I have
>> managed to restrain myself :)
>
> OK, understood. My main issue WRT scalability is that the total number
> of cacheable IOVAs (CPU and depot rcaches) scales up with the number of
> CPUs, but many DMA controllers have a fixed maximum number of in-flight
> requests.
>
> Consider a SCSI storage controller on a 256-CPU system. The in-flight
> limit for this example controller is 4096, which would typically never
> even be used up, or may not even be usable.
>
> For this device, we need 4096 * 6 [IOVA rcache range] = ~24K cached
> IOVAs if we were to pre-allocate them all - obviously I am ignoring
> that we have the per-CPU rcache for speed and that it would not make
> sense to share one set. However, according to the current IOVA driver,
> we can in theory cache up to ((256 [CPUs] * 2 [loaded + prev]) + 32
> [depot size]) * 6 [rcache range] * 128 [IOVAs per mag] = ~420K IOVAs.
> That's ~17x what we would ever need.
>
> Something like NVMe is different, as its total requests can scale up
> with the CPU count, but only to a limit. I am not sure about network
> controllers.

Remember that this threshold only represents a point at which we
consider the cache to have grown "big enough" to start background
reclaim - over the short term it is neither an upper nor a lower limit
on the cache capacity itself. Indeed it will be larger than the working
set of some workloads, but then it still wants to be enough of a buffer
to be useful for others which do make big bursts of allocations only
periodically.
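For anyone following along, the background reclaim itself amounts to a
self-rearming piece of delayed work per rcache, along the lines of the
sketch below. Note this is a paraphrase rather than the literal patch,
so the helper names (iova_depot_pop(), IOVA_DEPOT_DELAY, the iovad
back-pointer) should be treated as illustrative:

static void iova_depot_work_func(struct work_struct *work)
{
	struct iova_rcache *rcache = container_of(work, typeof(*rcache),
						  work.work);
	struct iova_magazine *mag = NULL;
	unsigned long flags;

	/* Trim one magazine if the depot is still above the threshold */
	spin_lock_irqsave(&rcache->lock, flags);
	if (rcache->depot_size > num_online_cpus())
		mag = iova_depot_pop(rcache);
	spin_unlock_irqrestore(&rcache->lock, flags);

	if (mag) {
		/* Give the IOVAs back to the domain's rbtree... */
		iova_magazine_free_pfns(mag, rcache->iovad);
		iova_magazine_free(mag);
		/* ...and re-arm in case the depot is still "too big" */
		schedule_delayed_work(&rcache->work, IOVA_DEPOT_DELAY);
	}
}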
> Anyway, this is just something which I think should be considered -
> which I guess already has been.

Indeed, I would tend to assume that machines with hundreds of CPUs are
less likely to be constrained on overall memory and/or IOVA space, so
tuning for a more responsive cache should be more beneficial than any
potential wastage is detrimental.

Cheers,
Robin.
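P.S. Spelling out the worst-case arithmetic from John's SCSI example
above, for anyone skimming (all numbers are his; only the working is
added here):

    needed if fully pre-allocated:
        4096 requests * 6 rcache ranges            =  24576  (~24K)
    theoretically cacheable today:
        ((256 CPUs * 2 [loaded + prev]) + 32 [depot])
            * 6 ranges * 128 IOVAs/mag = 544 * 768 = 417792  (~420K)
    ratio:
        417792 / 24576                             =  17x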