From: Ganapatrao Kulkarni
Date: Mon, 21 May 2018 06:45:38 +0530
Subject: Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA
To: Robin Murphy
Cc: Ganapatrao Kulkarni, Joerg Roedel, iommu@lists.linux-foundation.org, LKML, tomasz.nowicki@cavium.com, jnair@caviumnetworks.com, Robert Richter, Vadim.Lomovtsev@cavium.com, Jan.Glauber@cavium.com
References: <20180419171234.11053-1-ganapatrao.kulkarni@cavium.com>

On Thu, Apr 26, 2018 at 3:15 PM, Ganapatrao Kulkarni wrote:
> Hi Robin,
>
> On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni wrote:
>> On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy wrote:
>>> On 19/04/18 18:12, Ganapatrao Kulkarni wrote:
>>>>
>>>> The performance drop is observed with long-duration iperf testing using 40G
>>>> cards. This is mainly due to long iterations in finding a free IOVA
>>>> range in the 32-bit address space.
>>>>
>>>> In the current implementation, for 64-bit PCI devices there is always a
>>>> first attempt to allocate an IOVA from the 32-bit (SAC preferred over
>>>> DAC) address range. Once the 32-bit range runs out, allocation moves to
>>>> the higher range, and thanks to the cached32_node optimization this is
>>>> not supposed to be painful: cached32_node always points to the most
>>>> recently allocated 32-bit node. When the address range is full, it
>>>> points to the last allocated node (the leaf node), so walking the
>>>> rbtree to find an available range is not an expensive affair. However,
>>>> this optimization does not behave well when one of the middle nodes is
>>>> freed. In that case cached32_node is updated to point to the next IOVA
>>>> range. The next IOVA allocation consumes that free range and again
>>>> updates cached32_node to itself. From then on, walking over the 32-bit
>>>> range is much more expensive.
>>>>
>>>> This patch updates the cached node to the leaf node when there is no
>>>> free IOVA range left, which avoids unnecessarily long iterations.
>>>
>>>
>>> The only trouble with this is that "allocation failed" doesn't uniquely
>>> mean "space full". Say that after some time the 32-bit space ends up
>>> empty except for one page at 0x1000 and one at 0x80000000, then somebody
>>> tries to allocate 2GB. If we move the cached node down to the leftmost
>>> entry when that fails, all subsequent allocation attempts are now going
>>> to fail despite the space being 99.9999% free!
>>>
>>> I can see a couple of ways to solve that general problem of free space
>>> above the cached node getting lost, but neither of them helps with the
>>> case where there is genuinely insufficient space (and if anything would
>>> make it even slower). In terms of the optimisation you want here, i.e.
>>> fail fast when an allocation cannot possibly succeed, the only reliable
>>> idea which comes to mind is free-PFN accounting. I might give that a go
>>> myself to see how ugly it looks.
>
> For this testing, a dual-port Intel 40G card (XL710) was used, with both
> ports connected in loopback. iperf servers and clients ran on both ports
> (NAT was used to route packets out on the intended ports). Ten iperf
> clients were invoked every 60 seconds, in a loop, for hours, on each
> port. Initially the performance on both ports was close to line rate,
> but after the test had run for about 4 to 6 hours, throughput on both
> connections dropped very low (to a few hundred Mbps).
>
> IMO, this is a generic bug that should show up on other platforms too,
> and it needs to be fixed at the earliest.
> Please let me know if you have a better way to fix this; I am happy to
> test your patch!

Any update on this issue?

>
>>
>> I see two problems in the current implementation:
>> 1. We don't replenish the 32-bit range until the first attempt of a
>>    subsequent (64-bit) allocation fails.
>> 2. Having a per-CPU cache might not yield a good hit rate on platforms
>>    with a larger number of CPUs.
>>
>> However, irrespective of the current issues, it makes sense to update
>> the cached node as done in this patch when the current cached pointer
>> fails to yield a free IOVA range; otherwise we are forced into the
>> unnecessary, time-consuming do-while iterations until a replenish
>> happens!
>>
>> thanks
>> Ganapat
>>
>>>
>>> Robin.
>>>
>>>
>>>> Signed-off-by: Ganapatrao Kulkarni
>>>> ---
>>>>  drivers/iommu/iova.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>>> index 83fe262..e6ee2ea 100644
>>>> --- a/drivers/iommu/iova.c
>>>> +++ b/drivers/iommu/iova.c
>>>> @@ -201,6 +201,12 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>>  	} while (curr && new_pfn <= curr_iova->pfn_hi);
>>>>  
>>>>  	if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>>>> +		/* No more cached node points to free hole, update to leaf node.
>>>> +		 */
>>>> +		struct iova *prev_iova;
>>>> +
>>>> +		prev_iova = rb_entry(prev, struct iova, node);
>>>> +		__cached_rbnode_insert_update(iovad, prev_iova);
>>>>  		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>>>>  		return -ENOMEM;
>>>>  	}
>>>>
>>>
>
> thanks
> Ganapat
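
For anyone following along, below is a minimal, self-contained user-space
sketch of the free-PFN accounting idea Robin mentions above: keep a running
count of free PFNs in the 32-bit space so a request larger than the remaining
free space can be rejected in O(1), before any rbtree walk. All names here
(toy_iova_domain, free_pfn_32, toy_may_allocate) are made up for illustration;
this is not the drivers/iommu/iova.c API, and a real implementation would have
to hook the counter updates into the existing alloc/free paths under the
rbtree lock.

/*
 * Toy model of free-PFN accounting for a 32-bit IOVA space.
 * Hypothetical names; not the actual kernel implementation.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_iova_domain {
	unsigned long free_pfn_32;	/* free PFNs below the 32-bit limit */
};

/* Called whenever a range of 'size' PFNs is handed out from the 32-bit space. */
static void toy_account_alloc(struct toy_iova_domain *d, unsigned long size)
{
	d->free_pfn_32 -= size;
}

/* Called whenever a range of 'size' PFNs is returned to the 32-bit space. */
static void toy_account_free(struct toy_iova_domain *d, unsigned long size)
{
	d->free_pfn_32 += size;
}

/*
 * Fail-fast check: if fewer free PFNs remain than the request needs, no
 * amount of tree walking can satisfy it, so bail out immediately.
 */
static bool toy_may_allocate(const struct toy_iova_domain *d, unsigned long size)
{
	return size <= d->free_pfn_32;
}

int main(void)
{
	/* 4GB of IOVA space with 4KB pages -> 1M PFNs. */
	struct toy_iova_domain d = { .free_pfn_32 = 1UL << 20 };

	toy_account_alloc(&d, (1UL << 20) - 2);	/* leave only two free PFNs */
	printf("2-PFN request allowed: %d\n", toy_may_allocate(&d, 2));
	printf("512K-PFN request allowed: %d\n", toy_may_allocate(&d, 1UL << 19));

	toy_account_free(&d, 1UL << 19);
	printf("512K-PFN request allowed after free: %d\n",
	       toy_may_allocate(&d, 1UL << 19));
	return 0;
}

Note that such a counter only makes the "genuinely out of space" case cheap to
detect; a fragmented space can still defeat a request that passes the check,
so the rbtree walk (and the cached-node handling discussed above) is still
needed for everything else.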