Subject: Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA
To: Ganapatrao Kulkarni, joro@8bytes.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org
Cc: tomasz.nowicki@cavium.com, jnair@caviumnetworks.com, Robert.Richter@cavium.com, Vadim.Lomovtsev@cavium.com, Jan.Glauber@cavium.com, gklkml16@gmail.com
References: <20180419171234.11053-1-ganapatrao.kulkarni@cavium.com>
From: Robin Murphy
Date: Mon, 23 Apr 2018 17:37:11 +0100
In-Reply-To: <20180419171234.11053-1-ganapatrao.kulkarni@cavium.com>
On 19/04/18 18:12, Ganapatrao Kulkarni wrote:
> The performance drop is observed with long hours of iperf testing using
> 40G cards. This is mainly due to long iterations in finding a free iova
> range in the 32-bit address space.
>
> In the current implementation, for 64-bit PCI devices there is always a
> first attempt to allocate an iova from the 32-bit (SAC preferred over
> DAC) address range. Once we run out of the 32-bit range, allocation
> falls back to the higher range; however, thanks to the cached32_node
> optimization this is not supposed to be painful. cached32_node always
> points to the most recently allocated 32-bit node. When the address
> range is full, it will be pointing to the last allocated node (the
> leaf node), so walking the rbtree to find an available range is not an
> expensive affair. However, this optimization does not behave well when
> one of the middle nodes is freed. In that case cached32_node is updated
> to point to the next iova range. The next iova allocation will consume
> that free range and again update cached32_node to itself. From then on,
> walking over the 32-bit range is more expensive.
>
> This patch updates the cached node to the leaf node when there is no
> free iova range left, which avoids unnecessarily long iterations.

The only trouble with this is that "allocation failed" doesn't uniquely
mean "space full". Say that after some time the 32-bit space ends up
empty except for one page at 0x1000 and one at 0x80000000, then somebody
tries to allocate 2GB. If we move the cached node down to the leftmost
entry when that fails, all subsequent allocation attempts are now going
to fail despite the space being 99.9999% free!

I can see a couple of ways to solve that general problem of free space
above the cached node getting lost, but neither of them helps with the
case where there is genuinely insufficient space (and if anything would
make it even slower).
In terms of the optimisation you want here, i.e. fail fast when an
allocation cannot possibly succeed, the only reliable idea which comes
to mind is free-PFN accounting. I might give that a go myself to see
how ugly it looks.

Robin.

> Signed-off-by: Ganapatrao Kulkarni
> ---
>  drivers/iommu/iova.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 83fe262..e6ee2ea 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -201,6 +201,12 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>  	} while (curr && new_pfn <= curr_iova->pfn_hi);
>  
>  	if (limit_pfn < size || new_pfn < iovad->start_pfn) {
> +		/* No more cached node points to free hole, update to leaf node.
> +		 */
> +		struct iova *prev_iova;
> +
> +		prev_iova = rb_entry(prev, struct iova, node);
> +		__cached_rbnode_insert_update(iovad, prev_iova);
>  		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>  		return -ENOMEM;
>  	}
>