From: Ganapatrao Kulkarni
Date: Thu, 26 Apr 2018 15:15:00 +0530
Subject: Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA
To: Robin Murphy
Cc: Ganapatrao Kulkarni, Joerg Roedel, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org, tomasz.nowicki@cavium.com, jnair@caviumnetworks.com, Robert Richter, Vadim.Lomovtsev@cavium.com, Jan.Glauber@cavium.com
References: <20180419171234.11053-1-ganapatrao.kulkarni@cavium.com>
List-ID: linux-kernel@vger.kernel.org

Hi Robin,

On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni wrote:
> On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy wrote:
>> On 19/04/18 18:12, Ganapatrao Kulkarni wrote:
>>>
>>> The performance drop is observed after long hours of iperf testing
>>> using 40G cards. It is mainly due to long iterations in finding a
>>> free iova range in the 32-bit address space.
>>>
>>> In the current implementation, for 64-bit PCI devices there is always
>>> a first attempt to allocate the iova from the 32-bit range (SAC is
>>> preferred over DAC). Once the 32-bit range is exhausted, allocation
>>> falls back to the higher range, and thanks to the cached32_node
>>> optimization this is not supposed to be painful.
>>> cached32_node always points to the most recently allocated 32-bit
>>> node. When the address range is full, it points to the last allocated
>>> node (a leaf node), so walking the rbtree to find the available range
>>> is not an expensive affair. However, this optimization does not behave
>>> well when one of the middle nodes is freed: in that case cached32_node
>>> is updated to point to the next iova range. The next iova allocation
>>> consumes that free range and again updates cached32_node to itself.
>>> From then on, walking the 32-bit range is much more expensive.
>>>
>>> This patch updates the cached node to the leaf node when no free iova
>>> range is left, which avoids unnecessarily long iterations.
>>
>>
>> The only trouble with this is that "allocation failed" doesn't uniquely
>> mean "space full". Say that after some time the 32-bit space ends up
>> empty except for one page at 0x1000 and one at 0x80000000, then
>> somebody tries to allocate 2GB. If we move the cached node down to the
>> leftmost entry when that fails, all subsequent allocation attempts are
>> now going to fail despite the space being 99.9999% free!
>>
>> I can see a couple of ways to solve that general problem of free space
>> above the cached node getting lost, but neither of them helps with the
>> case where there is genuinely insufficient space (and if anything would
>> make it even slower). In terms of the optimisation you want here, i.e.
>> fail fast when an allocation cannot possibly succeed, the only reliable
>> idea which comes to mind is free-PFN accounting. I might give that a go
>> myself to see how ugly it looks.

For this testing, a dual-port Intel 40G card (XL710) was used, with both
ports connected in loopback. I ran an iperf server and clients on both
ports (using NAT to route packets out on the intended ports). Ten iperf
clients were invoked every 60 seconds, in a loop, for hours, on each port.
Initially the performance on both ports was close to line rate; however,
after the test had run for about 4 to 6 hours, the throughput dropped to a
few hundred Mbps on both connections. IMO this is a generic bug that
should be reproducible on other platforms too, and it needs to be fixed at
the earliest. Please let me know if you have a better way to fix this; I
am happy to test your patch!

> I see two problems in the current implementation:
> 1. We don't replenish the 32-bit range until the first attempt at a
>    second (64-bit) allocation fails.
> 2. Having a per-CPU cache might not yield a good hit rate on platforms
>    with a larger number of CPUs.
>
> However, irrespective of the current issues, it makes sense to update
> the cached node as done in this patch when we fail to get an iova range
> using the current cached pointer, since that failure otherwise forces
> the time-consuming do-while iterations until a replenish happens!
>
> thanks
> Ganapat
>
>>
>> Robin.
>>
>>
>>> Signed-off-by: Ganapatrao Kulkarni
>>> ---
>>>  drivers/iommu/iova.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 83fe262..e6ee2ea 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -201,6 +201,12 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>  	} while (curr && new_pfn <= curr_iova->pfn_hi);
>>>
>>>  	if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>>> +		/* No more cached node points to free hole, update to leaf node.
>>> +		 */
>>> +		struct iova *prev_iova;
>>> +
>>> +		prev_iova = rb_entry(prev, struct iova, node);
>>> +		__cached_rbnode_insert_update(iovad, prev_iova);
>>>  		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>>>  		return -ENOMEM;
>>>  	}
>>
thanks
Ganapat