From: Ganapatrao Kulkarni
Date: Mon, 4 Jun 2018 09:36:04 +0530
Subject: Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA
To: Robin Murphy
Cc: Ganapatrao Kulkarni, Joerg Roedel, iommu@lists.linux-foundation.org, LKML (linux-kernel@vger.kernel.org), tomasz.nowicki@cavium.com, jnair@caviumnetworks.com, Robert Richter, Vadim.Lomovtsev@cavium.com, Jan.Glauber@cavium.com
References: <20180419171234.11053-1-ganapatrao.kulkarni@cavium.com>

ping??
On Mon, May 21, 2018 at 6:45 AM, Ganapatrao Kulkarni wrote:
> On Thu, Apr 26, 2018 at 3:15 PM, Ganapatrao Kulkarni wrote:
>> Hi Robin,
>>
>> On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni wrote:
>>> On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy wrote:
>>>> On 19/04/18 18:12, Ganapatrao Kulkarni wrote:
>>>>>
>>>>> The performance drop is observed during long-hours iperf testing with
>>>>> 40G cards. It is mainly due to long iterations when searching for a
>>>>> free iova range in the 32-bit address space.
>>>>>
>>>>> In the current implementation, 64-bit PCI devices always first attempt
>>>>> to allocate an iova from the 32-bit address range (SAC preferred over
>>>>> DAC). Once the 32-bit range is exhausted, allocation falls back to the
>>>>> higher range, and thanks to the cached32_node optimization this is not
>>>>> supposed to be painful: cached32_node always points to the most
>>>>> recently allocated 32-bit node, so when the range is full it points to
>>>>> the last allocated (leaf) node and walking the rbtree to find an
>>>>> available range is not an expensive affair. However, this optimization
>>>>> does not behave well when one of the middle nodes is freed. In that
>>>>> case cached32_node is updated to point to the next iova range; the next
>>>>> iova allocation consumes that free range and again updates
>>>>> cached32_node to itself. From then on, walking the 32-bit range is much
>>>>> more expensive.
>>>>>
>>>>> This patch updates the cached node to the leaf node when no free iova
>>>>> range is left, which avoids unnecessarily long iterations.
>>>>
>>>>
>>>> The only trouble with this is that "allocation failed" doesn't uniquely
>>>> mean "space full". Say that after some time the 32-bit space ends up
>>>> empty except for one page at 0x1000 and one at 0x80000000, then somebody
>>>> tries to allocate 2GB. If we move the cached node down to the leftmost
>>>> entry when that fails, all subsequent allocation attempts are now going
>>>> to fail despite the space being 99.9999% free!
>>>>
>>>> I can see a couple of ways to solve that general problem of free space
>>>> above the cached node getting lost, but neither of them helps with the
>>>> case where there is genuinely insufficient space (and if anything would
>>>> make it even slower). In terms of the optimisation you want here, i.e.
>>>> fail fast when an allocation cannot possibly succeed, the only reliable
>>>> idea which comes to mind is free-PFN accounting. I might give that a go
>>>> myself to see how ugly it looks.
>>
>> For this testing, a dual-port Intel 40G card (XL710) was used, with both
>> ports connected in loopback. I ran iperf servers and clients on both
>> ports (using NAT to route packets out on the intended ports). Ten iperf
>> clients were invoked every 60 seconds, in a loop, for hours on each port.
>> Initially the performance on both ports is close to line rate; however,
>> after the test had run for about 4 to 6 hours, performance dropped very
>> low (to a few hundred Mbps) on both connections.
>>
>> IMO, this is a common bug that should occur on other platforms too, and
>> it needs to be fixed at the earliest. Please let me know if you have a
>> better way to fix this; I am happy to test your patch!
>
> any update on this issue?
>>
>>>
>>> I see two problems in the current implementation:
>>> 1. We don't replenish the 32-bit range until the first attempt of the
>>> second (64-bit) allocation fails.
>>> 2. A per-CPU cache might not yield a good hit rate on platforms with a
>>> larger number of CPUs.
>>>
>>> However, irrespective of the current issues, it makes sense to update
>>> the cached node as done in this patch when there is a failure to get an
>>> iova range using the current cached pointer, which otherwise forces the
>>> unnecessary, time-consuming do-while iterations until a replenish
>>> happens!
>>>
>>> thanks
>>> Ganapat
>>>
>>>>
>>>> Robin.
>>>>
>>>>
>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>> ---
>>>>>  drivers/iommu/iova.c | 6 ++++++
>>>>>  1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>>>> index 83fe262..e6ee2ea 100644
>>>>> --- a/drivers/iommu/iova.c
>>>>> +++ b/drivers/iommu/iova.c
>>>>> @@ -201,6 +201,12 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>>>  	} while (curr && new_pfn <= curr_iova->pfn_hi);
>>>>>  	if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>>>>> +		/* Cached node no longer points to a free hole; update
>>>>> +		 * it to the leaf node. */
>>>>> +		struct iova *prev_iova;
>>>>> +
>>>>> +		prev_iova = rb_entry(prev, struct iova, node);
>>>>> +		__cached_rbnode_insert_update(iovad, prev_iova);
>>>>>  		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>>>>>  		return -ENOMEM;
>>>>>  	}
>>>>>
>>>>
>>
>> thanks
>> Ganapat
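
For reference, the "free-PFN accounting" idea Robin mentions above could look
something like the sketch below. This is a standalone userspace model, not the
drivers/iommu/iova.c code: the struct and function names (toy_iova_domain,
toy_alloc_iova, toy_free_iova) are made up for illustration, and only the
counter-based fail-fast check is modelled, not the rbtree walk or the cached
node handling.

/*
 * Sketch: keep a running count of free PFNs in the domain so a request
 * that cannot possibly fit is rejected immediately, before any tree walk.
 * Note this only catches a genuine shortage of total free space; it does
 * not detect fragmentation (e.g. Robin's example of two isolated pages),
 * which is exactly the case it is meant to leave to the normal walk.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_iova_domain {
	unsigned long start_pfn;	/* lowest allocatable PFN */
	unsigned long end_pfn;		/* highest allocatable PFN */
	unsigned long free_pfns;	/* running total of free PFNs */
};

static void toy_init_domain(struct toy_iova_domain *d,
			    unsigned long start_pfn, unsigned long end_pfn)
{
	d->start_pfn = start_pfn;
	d->end_pfn = end_pfn;
	d->free_pfns = end_pfn - start_pfn + 1;
}

/* Returns false without any tree walk when the request cannot fit. */
static bool toy_alloc_iova(struct toy_iova_domain *d, unsigned long size)
{
	if (size > d->free_pfns)
		return false;	/* fail fast: genuinely insufficient space */

	/*
	 * A real allocator would now walk the rbtree (starting from a
	 * cached node) to find a suitably aligned hole; only the
	 * accounting is modelled here.
	 */
	d->free_pfns -= size;
	return true;
}

static void toy_free_iova(struct toy_iova_domain *d, unsigned long size)
{
	d->free_pfns += size;
}

int main(void)
{
	struct toy_iova_domain dom;

	/* Model a 32-bit space of 4K pages: PFNs 0x1 .. 0xfffff. */
	toy_init_domain(&dom, 0x1, 0xfffff);

	/* Consume most of the space ... */
	toy_alloc_iova(&dom, 0xf0000);

	/* ... then a 2GB request (0x80000 PFNs of 4K) is rejected from
	 * the counter alone, with no tree walk at all. */
	printf("2GB request fits? %d\n", toy_alloc_iova(&dom, 0x80000));

	toy_free_iova(&dom, 0xf0000);
	printf("after freeing, 2GB request fits? %d\n",
	       toy_alloc_iova(&dom, 0x80000));
	return 0;
}

The appeal of this approach is that the counter is cheap to maintain on every
alloc/free, so the "cannot possibly succeed" case stops costing a full walk of
the 32-bit subtree, while genuinely fragmented-but-nonempty spaces still fall
through to the normal search.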