Date: Thu, 2 Nov 2023 10:47:08 +0100
Subject: Re:
[PATCH] swiotlb: reduce area lock contention for non-primary IO TLB pools
To: Petr Tesarik, Christoph Hellwig, Marek Szyprowski, Robin Murphy, "open list:DMA MAPPING HELPERS", open list
CC: Wangkefeng, Roberto Sassu
References: <20231102094445.1738-1-petrtesarik@huaweicloud.com>
From: Petr Tesarik
In-Reply-To: <20231102094445.1738-1-petrtesarik@huaweicloud.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

just to make it clear, this patch is orthogonal to and independent from
the handling of decrypted pages, sent a few minutes earlier.

Petr T

On 11/2/2023 10:44 AM, Petr Tesarik wrote:
> From: Petr Tesarik
>
> If multiple areas and multiple IO TLB pools exist, first iterate the
> current CPU specific area in all pools. Then move to the next area index.
>
> This is best illustrated by a diagram:
>
>         area 0 |  area 1 | ... | area M |
> pool 0    A         B              C
> pool 1    D         E
> ...
> pool N    F         G              H
>
> Currently, each pool is searched before moving on to the next pool,
> i.e. the search order is A, B ... C, D, E ... F, G ... H. With this patch,
> each area is searched in all pools before moving on to the next area,
> i.e. the search order is A, D ... F, B, E ... G ... C ... H.
>
> Note that preemption is not disabled, and raw_smp_processor_id() may not
> return a stable result, but it is called only once to determine the initial
> area index. The search will iterate over all areas eventually, even if the
> current task is preempted.
>
> Next, some pools may have fewer (but not more) areas than default_nareas.
> Skip such pools if the distance from the initial area index is greater than
> or equal to pool->nareas. This logic ensures that for every pool the search
> starts in the initial CPU's own area and never tries any area twice.
>
> To verify performance impact, I booted the kernel with a minimum pool
> size ("swiotlb=512,4,force"), so multiple pools get allocated, and I ran
> these benchmarks:
>
> - small: single-threaded I/O of 4 KiB blocks,
> - big: single-threaded I/O of 64 KiB blocks,
> - 4way: 4-way parallel I/O of 4 KiB blocks.
>
> The "var" column in the tables below is the coefficient of variation over 5
> runs of the test, the "diff" column is the relative difference against base
> in read-write I/O bandwidth (MiB/s).
>
> Tested on an x86 VM against a QEMU virtio SATA driver backed by a RAM-based
> block device on the host:
>
>          base    patched
>          var     var     diff
> small    0.69%   0.62%   +25.4%
> big      2.14%   2.27%   +25.7%
> 4way     2.65%   1.70%   +23.6%
>
> Tested on a Raspberry Pi against a class-10 A1 microSD card:
>
>          base    patched
>          var     var     diff
> small    0.53%   1.96%   -0.3%
> big      0.02%   0.57%   +0.8%
> 4way     6.17%   0.40%   +0.3%
>
> These results confirm that there is a significant performance boost in the
> software IO TLB slot allocation itself. Where performance is dominated by
> actual hardware, there is no measurable change.
>
> Signed-off-by: Petr Tesarik
> ---
>  kernel/dma/swiotlb.c | 90 +++++++++++++++++++++++++++-----------------
>  1 file changed, 55 insertions(+), 35 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index a1c3dabed19f..35d603ec0329 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -954,7 +954,7 @@ static void dec_used(struct io_tlb_mem *mem, unsigned int nslots)
>  #endif /* CONFIG_DEBUG_FS */
>
>  /**
> - * swiotlb_area_find_slots() - search for slots in one IO TLB memory area
> + * swiotlb_search_pool_area() - search one memory area in one pool
>   * @dev: Device which maps the buffer.
>   * @pool: Memory pool to be searched.
>   * @area_index: Index of the IO TLB memory area to be searched.
> @@ -969,7 +969,7 @@ static void dec_used(struct io_tlb_mem *mem, unsigned int nslots)
>   *
>   * Return: Index of the first allocated slot, or -1 on error.
>   */
> -static int swiotlb_area_find_slots(struct device *dev, struct io_tlb_pool *pool,
> +static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool,
>  		int area_index, phys_addr_t orig_addr, size_t alloc_size,
>  		unsigned int alloc_align_mask)
>  {
> @@ -1063,41 +1063,50 @@ static int swiotlb_area_find_slots(struct device *dev, struct io_tlb_pool *pool,
>  	return slot_index;
>  }
>
> +#ifdef CONFIG_SWIOTLB_DYNAMIC
> +
>  /**
> - * swiotlb_pool_find_slots() - search for slots in one memory pool
> + * swiotlb_search_area() - search one memory area in all pools
>   * @dev: Device which maps the buffer.
> - * @pool: Memory pool to be searched.
> + * @start_cpu: Start CPU number.
> + * @cpu_offset: Offset from @start_cpu.
>   * @orig_addr: Original (non-bounced) IO buffer address.
>   * @alloc_size: Total requested size of the bounce buffer,
>   *		including initial alignment padding.
>   * @alloc_align_mask: Required alignment of the allocated buffer.
> + * @retpool: Used memory pool, updated on return.
>   *
> - * Search through one memory pool to find a sequence of slots that match the
> + * Search one memory area in all pools for a sequence of slots that match the
>   * allocation constraints.
>   *
>   * Return: Index of the first allocated slot, or -1 on error.
>   */
> -static int swiotlb_pool_find_slots(struct device *dev, struct io_tlb_pool *pool,
> -		phys_addr_t orig_addr, size_t alloc_size,
> -		unsigned int alloc_align_mask)
> +static int swiotlb_search_area(struct device *dev, int start_cpu,
> +		int cpu_offset, phys_addr_t orig_addr, size_t alloc_size,
> +		unsigned int alloc_align_mask, struct io_tlb_pool **retpool)
>  {
> -	int start = raw_smp_processor_id() & (pool->nareas - 1);
> -	int i = start, index;
> -
> -	do {
> -		index = swiotlb_area_find_slots(dev, pool, i, orig_addr,
> -				alloc_size, alloc_align_mask);
> -		if (index >= 0)
> -			return index;
> -		if (++i >= pool->nareas)
> -			i = 0;
> -	} while (i != start);
> +	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> +	struct io_tlb_pool *pool;
> +	int area_index;
> +	int index = -1;
>
> -	return -1;
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(pool, &mem->pools, node) {
> +		if (cpu_offset >= pool->nareas)
> +			continue;
> +		area_index = (start_cpu + cpu_offset) & (pool->nareas - 1);
> +		index = swiotlb_search_pool_area(dev, pool, area_index,
> +				orig_addr, alloc_size,
> +				alloc_align_mask);
> +		if (index >= 0) {
> +			*retpool = pool;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return index;
>  }
>
> -#ifdef CONFIG_SWIOTLB_DYNAMIC
> -
>  /**
>   * swiotlb_find_slots() - search for slots in the whole swiotlb
>   * @dev: Device which maps the buffer.
> @@ -1121,18 +1130,17 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
>  	unsigned long nslabs;
>  	unsigned long flags;
>  	u64 phys_limit;
> +	int cpu, i;
>  	int index;
>
> -	rcu_read_lock();
> -	list_for_each_entry_rcu(pool, &mem->pools, node) {
> -		index = swiotlb_pool_find_slots(dev, pool, orig_addr,
> -				alloc_size, alloc_align_mask);
> -		if (index >= 0) {
> -			rcu_read_unlock();
> +	cpu = raw_smp_processor_id();
> +	for (i = 0; i < default_nareas; ++i) {
> +		index = swiotlb_search_area(dev, cpu, i, orig_addr, alloc_size,
> +				alloc_align_mask, &pool);
> +		if (index >= 0)
>  			goto found;
> -		}
>  	}
> -	rcu_read_unlock();
> +
>  	if (!mem->can_grow)
>  		return -1;
>
> @@ -1145,8 +1153,8 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
>  	if (!pool)
>  		return -1;
>
> -	index = swiotlb_pool_find_slots(dev, pool, orig_addr,
> -			alloc_size, alloc_align_mask);
> +	index = swiotlb_search_pool_area(dev, pool, 0, orig_addr,
> +			alloc_size, alloc_align_mask);
>  	if (index < 0) {
>  		swiotlb_dyn_free(&pool->rcu);
>  		return -1;
> @@ -1189,9 +1197,21 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
>  		size_t alloc_size, unsigned int alloc_align_mask,
>  		struct io_tlb_pool **retpool)
>  {
> -	*retpool = &dev->dma_io_tlb_mem->defpool;
> -	return swiotlb_pool_find_slots(dev, *retpool,
> -			orig_addr, alloc_size, alloc_align_mask);
> +	struct io_tlb_pool *pool;
> +	int start, i;
> +	int index;
> +
> +	*retpool = pool = &dev->dma_io_tlb_mem->defpool;
> +	i = start = raw_smp_processor_id() & (pool->nareas - 1);
> +	do {
> +		index = swiotlb_search_pool_area(dev, pool, i, orig_addr,
> +				alloc_size, alloc_align_mask);
> +		if (index >= 0)
> +			return index;
> +		if (++i >= pool->nareas)
> +			i = 0;
> +	} while (i != start);
> +	return -1;
>  }
>
>  #endif /* CONFIG_SWIOTLB_DYNAMIC */
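
The search order described in the commit message can be sketched in user
space. This is only an illustration under stated assumptions, not the kernel
implementation: struct sketch_pool and sketch_search_order() are hypothetical
names, the pool list is a plain array, and no locking, RCU, or slot
allocation is modelled. It mirrors the two loops from the patch (outer loop
over the CPU offset, inner loop over the pools) and only records which
(pool, area) pairs would be visited, in which order.

```c
#include <assert.h>

/* Hypothetical stand-in for struct io_tlb_pool; only the area count matters. */
struct sketch_pool {
	unsigned int nareas;	/* assumed to be a power of two */
};

/*
 * Record the (pool, area) pairs in the order the patched search would try
 * them. Returns the number of pairs written to out_pool[] and out_area[].
 */
static int sketch_search_order(const struct sketch_pool *pools, int npools,
			       unsigned int default_nareas,
			       unsigned int start_cpu,
			       int *out_pool, int *out_area, int max_out)
{
	int n = 0;

	/* Outer loop: distance from the initial area index. */
	for (unsigned int off = 0; off < default_nareas; ++off) {
		/* Inner loop: the same area offset in every pool. */
		for (int p = 0; p < npools; ++p) {
			unsigned int nareas = pools[p].nareas;

			/* The mask below only works for powers of two. */
			assert(nareas != 0 && (nareas & (nareas - 1)) == 0);

			/* Pool has fewer areas: all of them were tried already. */
			if (off >= nareas)
				continue;
			if (n >= max_out)
				return n;

			out_pool[n] = p;
			out_area[n] = (int)((start_cpu + off) & (nareas - 1));
			++n;
		}
	}
	return n;
}
```

For example, with two pools of 4 and 2 areas, default_nareas = 4 and a
starting CPU of 3, the recorded order is (pool 0, area 3), (pool 1, area 1),
(pool 0, area 0), (pool 1, area 0), (pool 0, area 1), (pool 0, area 2).
Every pool starts in the area that belongs to the starting CPU, and no area
is tried twice.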