Subject: Re: [PATCH v4] iommu: Optimise PCI SAC address trick
From: Robin Murphy
To: John Garry, Jakub Kicinski, Joerg Roedel
Cc: will@kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Linus Torvalds
Date: Thu, 15 Jun 2023 12:41:56 +0100
Message-ID: <99c1e8ab-a064-c770-072f-23ef9e9abb82@arm.com>
In-Reply-To: <568df53c-41a7-94d7-6662-f8f7c72e5178@oracle.com>
References: <20230613105850.30172085@kernel.org> <4f9184c5-e6a2-08da-f44a-3000b6cdfe35@oracle.com> <198a73b0-d7c0-57d6-5ef9-4e9dddb6365b@arm.com> <568df53c-41a7-94d7-6662-f8f7c72e5178@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023-06-15 11:11, John Garry wrote:
> On 15/06/2023 10:04, Robin Murphy wrote:
>>> Since we're at rc6 time and a cautious approach was wanted to merge
>>> this change, I doubt that this will be merged for this cycle. That's
>>> quite unfortunate.
>>>
>>> Please note what I mentioned earlier about using
>>> dma_opt_mapping_size(). This API is used by some block storage
>>> drivers to avoid your same problem, by clamping max_sectors_kb at
>>> this size - see the sysfs-block Doc for info there. Maybe it can be
>>> used similarly for network drivers.
>>
>> It's not the same problem - in this case the mappings are already
>> small enough to use the rcaches, and it seems more to do with the
>> total number of unusable cached IOVAs being enough to keep the 32-bit
>> space almost-but-not-quite full most of the time, defeating the
>> max32_alloc_size optimisation whenever the caches run out of the
>> right size entries.
>
> Sure, not the same problem.
>
> However, when we switched storage drivers to use dma_opt_mapping_size(),
> performance was similar to iommu.forcedac=1 - that's what I found,
> anyway.
>
> This tells me that even though IOVA allocator performance is poor
> when the 32b space fills, it was those large IOVAs which don't fit in
> the rcache which were the major contributor to hogging the CPU in the
> allocator.
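[The dma_opt_mapping_size() clamping John describes can be sketched in userspace terms as below. In the kernel, dma_opt_mapping_size(dev) returns the largest mapping size that still fits the IOVA rcaches, and storage drivers clamp their maximum transfer size to it; the function and parameter names here are illustrative, not the real driver code.]

```c
#include <stddef.h>

#define SECTOR_SHIFT 9  /* 512-byte sectors, as in the block layer */

/*
 * Sketch of what block storage drivers do with dma_opt_mapping_size():
 * clamp the queue's maximum transfer size so that every DMA mapping
 * still fits the IOVA rcaches. opt_mapping_size stands in for the
 * value the real dma_opt_mapping_size(dev) would return.
 */
static unsigned int clamp_max_sectors(unsigned int hw_max_sectors,
                                      size_t opt_mapping_size)
{
	unsigned int opt_sectors = opt_mapping_size >> SECTOR_SHIFT;

	return hw_max_sectors < opt_sectors ? hw_max_sectors : opt_sectors;
}
```

With a hypothetical 128KB optimal mapping size, a controller advertising 2048-sector transfers would be clamped to 256 sectors, which is the max_sectors_kb effect visible in sysfs-block.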
The root cause is that every time the last usable 32-bit IOVA is allocated, the *next* PCI caller to hit the rbtree for a SAC allocation is burdened with walking the whole 32-bit subtree to determine that it's full again and re-set max32_alloc_size. That's the overhead that forcedac avoids.

In the storage case with larger buffers, dma_opt_mapping_size() also means you spend less time in the rbtree, but that's because you're inherently hitting it less often, since most allocations can now hopefully be fulfilled by the caches. That's obviously moot when the mappings are already small enough to be cached and the only reason for hitting the rbtree is overflow/underflow in the depot, because the working set is sufficiently large and the allocation pattern sufficiently "bursty".

Thanks,
Robin.
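[The fail-fast behaviour Robin describes can be modelled in a few lines of userspace C. This is a simplified model, not the kernel's drivers/iommu/iova.c: the expensive 32-bit subtree walk is stood in for by a counter and a free-space total, but the shape is the same - one failed walk records the failed size in max32_alloc_size so subsequent requests of at least that size bail out immediately, and a free re-arms the allocator, at which point the next caller pays for a full walk again.]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the IOVA rbtree's fail-fast watermark. */
struct iova_domain_model {
	size_t free_space;        /* stand-in for the 32-bit subtree contents */
	size_t max32_alloc_size;  /* requests >= this are known to fail */
	unsigned long tree_walks; /* how many times we paid for a full walk */
};

static bool model_alloc(struct iova_domain_model *iovad, size_t size)
{
	/* Fail fast: a previous walk proved this size cannot fit. */
	if (size >= iovad->max32_alloc_size)
		return false;

	iovad->tree_walks++;      /* the expensive rbtree walk */
	if (size > iovad->free_space) {
		/* Record the failure so the next caller skips the walk. */
		iovad->max32_alloc_size = size;
		return false;
	}
	iovad->free_space -= size;
	return true;
}

static void model_free(struct iova_domain_model *iovad, size_t size)
{
	iovad->free_space += size;
	/* Freeing may have opened space: re-arm the allocator. */
	iovad->max32_alloc_size = SIZE_MAX;
}
```

The pathological case in this thread corresponds to the space sitting almost-but-not-quite full: each free re-arms the watermark, so the next unlucky caller pays for a full (failing) walk all over again.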