From: John Garry
To: ,
CC: , , , , , John Garry
Subject: [PATCH v4 3/3] iommu/iova: Flush CPU rcache for when a depot fills
Date: Thu, 10 Dec 2020 02:23:09 +0800
Message-ID: <1607538189-237944-4-git-send-email-john.garry@huawei.com>
X-Mailer: git-send-email 2.8.1
In-Reply-To: <1607538189-237944-1-git-send-email-john.garry@huawei.com>
References: <1607538189-237944-1-git-send-email-john.garry@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain
X-Mailing-List: linux-kernel@vger.kernel.org

Leizhen reported some time ago that IOVA performance may degrade over time
[0], but unfortunately his solution to fix this problem was not given
attention.
To summarize, the issue is that as time goes by, the CPU rcache and depot
rcache continue to grow. As such, IOVA RB tree access time also continues
to grow. At a certain point, a depot may become full, and some CPU rcaches
may also be full when an attempt is made to insert another IOVA. For this
scenario, currently the "loaded" CPU rcache is freed and a new one is
created. This freeing means that many IOVAs in the RB tree need to be
freed, which makes IO throughput performance fall off a cliff in some
storage scenarios:

Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops]
Jobs: 12 (f=12): [RRRRRRRRRRRR] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops]

And it continues in this fashion, without recovering. Note that in this
example it took 16 hours of testing for this to occur. Also note that IO
throughput gradually becomes more unstable leading up to this point.

This problem is only seen for non-strict mode. For strict mode, the
rcaches stay quite compact.

As a solution to this issue, judge that the IOVA caches have grown too big
when cached magazines need to be freed, and just flush all the CPU rcaches
instead. The depot rcaches, however, are not flushed, as they can be used
to immediately replenish active CPUs.

In future, some IOVA compaction could be implemented to solve the
instability issue, although I figure that could be quite complex to
implement.
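For context, the caching hierarchy referred to above is roughly the
following, paraphrased from drivers/iommu/iova.c around this kernel
version. This is only an illustrative sketch (sizes and field order may
differ between kernel versions), not part of this patch:

	/* Sketch of the IOVA rcache layout, assuming <linux/spinlock.h>
	 * and <linux/percpu.h>. Each CPU holds up to two magazines of
	 * cached IOVA pfns; full magazines overflow into a per-size
	 * "depot" shared by all CPUs.
	 */
	#define IOVA_MAG_SIZE		128	/* pfns per magazine */
	#define MAX_GLOBAL_MAGS		32	/* magazines per depot */

	struct iova_magazine {
		unsigned long size;
		unsigned long pfns[IOVA_MAG_SIZE];
	};

	struct iova_cpu_rcache {
		spinlock_t lock;
		struct iova_magazine *loaded;	/* magazine being filled/drained */
		struct iova_magazine *prev;	/* one spare magazine */
	};

	struct iova_rcache {
		spinlock_t lock;
		unsigned long depot_size;
		struct iova_magazine *depot[MAX_GLOBAL_MAGS];
		struct iova_cpu_rcache __percpu *cpu_rcaches;
	};

Only when an insert finds both CPU magazines full and the depot full does
the slow path described above (previously: free the loaded magazine's
IOVAs back to the RB tree; with this patch: flush all CPU rcaches) kick in.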
[0] https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leizhen@huawei.com/

Analyzed-by: Zhen Lei
Reported-by: Xiang Chen
Tested-by: Xiang Chen
Signed-off-by: John Garry
Reviewed-by: Zhen Lei
---
 drivers/iommu/iova.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 732ee687e0e2..39b7488de8bb 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -841,7 +841,6 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 				 struct iova_rcache *rcache,
 				 unsigned long iova_pfn)
 {
-	struct iova_magazine *mag_to_free = NULL;
 	struct iova_cpu_rcache *cpu_rcache;
 	bool can_insert = false;
 	unsigned long flags;
@@ -863,13 +862,12 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 				if (cpu_rcache->loaded)
 					rcache->depot[rcache->depot_size++] =
 							cpu_rcache->loaded;
-			} else {
-				mag_to_free = cpu_rcache->loaded;
+				can_insert = true;
+				cpu_rcache->loaded = new_mag;
 			}
 			spin_unlock(&rcache->lock);
-
-			cpu_rcache->loaded = new_mag;
-			can_insert = true;
+			if (!can_insert)
+				iova_magazine_free(new_mag);
 		}
 	}
 
@@ -878,10 +876,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 
 	spin_unlock_irqrestore(&cpu_rcache->lock, flags);
 
-	if (mag_to_free) {
-		iova_magazine_free_pfns(mag_to_free, iovad);
-		iova_magazine_free(mag_to_free);
-	}
+	if (!can_insert)
+		free_all_cpu_cached_iovas(iovad);
 
 	return can_insert;
 }
-- 
2.26.2
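Note: free_all_cpu_cached_iovas() is not defined in the hunks above; it is
introduced earlier in this series. As a rough sketch of what such a helper
is expected to do, assuming the existing free_cpu_cached_iovas() helper in
iova.c, it simply walks the online CPUs and returns each CPU's cached
magazines to the IOVA RB tree:

	/* Illustrative sketch only; the real helper is added elsewhere
	 * in this series. Drains every online CPU's rcache back into
	 * the RB tree via the existing free_cpu_cached_iovas().
	 */
	static void free_all_cpu_cached_iovas(struct iova_domain *iovad)
	{
		unsigned int cpu;

		for_each_online_cpu(cpu)
			free_cpu_cached_iovas(cpu, iovad);
	}

This is what makes the new failure path above cheap to reason about: if an
insert cannot find room in either the CPU magazines or the depot, the CPU
caches are considered to have grown too big and are flushed wholesale,
while the depot magazines are kept to refill active CPUs.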