From: Ganapatrao Kulkarni
Date: Thu, 12 Jul 2018 13:15:22 +0530
Subject: Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA
To: Robin Murphy
Cc: Ganapatrao Kulkarni, Joerg Roedel, iommu@lists.linux-foundation.org, LKML, tomasz.nowicki@cavium.com, jnair@caviumnetworks.com, Robert Richter, Vadim.Lomovtsev@cavium.com, Jan.Glauber@cavium.com
References: <20180419171234.11053-1-ganapatrao.kulkarni@cavium.com>
Content-Type: text/plain; charset="UTF-8"

Hi Robin,

On Mon, Jun 4, 2018 at 9:36 AM, Ganapatrao Kulkarni wrote:
> ping??
>
> On Mon, May 21, 2018 at 6:45 AM, Ganapatrao Kulkarni wrote:
>> On Thu, Apr 26, 2018 at 3:15 PM, Ganapatrao Kulkarni wrote:
>>> Hi Robin,
>>>
>>> On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni wrote:
>>>> On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy wrote:
>>>>> On 19/04/18 18:12, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>> The performance drop is observed during long-hours iperf testing using
>>>>>> 40G cards. It is mainly due to long iterations when finding a free iova
>>>>>> range in the 32-bit address space.
>>>>>>
>>>>>> In the current implementation, for 64-bit PCI devices there is always a
>>>>>> first attempt to allocate the iova from the 32-bit address range (SAC is
>>>>>> preferred over DAC). Once the 32-bit range is exhausted, allocation falls
>>>>>> back to the higher range; thanks to the cached32_node optimization this
>>>>>> is not supposed to be painful. cached32_node always points to the most
>>>>>> recently allocated 32-bit node. When the address range is full, it points
>>>>>> to the last allocated node (a leaf node), so walking the rbtree to find an
>>>>>> available range is not an expensive affair. However, this optimization
>>>>>> does not behave well when one of the middle nodes is freed. In that case
>>>>>> cached32_node is updated to point to the next iova range. The next iova
>>>>>> allocation consumes that free range and again updates cached32_node to
>>>>>> itself. From then on, walking the 32-bit range becomes expensive.
>>>>>>
>>>>>> This patch updates the cached node to the leaf node when there is no free
>>>>>> iova range left, which avoids unnecessarily long iterations.
>>>>>
>>>>> The only trouble with this is that "allocation failed" doesn't uniquely mean
>>>>> "space full". Say that after some time the 32-bit space ends up empty except
>>>>> for one page at 0x1000 and one at 0x80000000, then somebody tries to
>>>>> allocate 2GB. If we move the cached node down to the leftmost entry when
>>>>> that fails, all subsequent allocation attempts are now going to fail despite
>>>>> the space being 99.9999% free!
>>>>>
>>>>> I can see a couple of ways to solve that general problem of free space above
>>>>> the cached node getting lost, but neither of them helps with the case where
>>>>> there is genuinely insufficient space (and if anything would make it even
>>>>> slower). In terms of the optimisation you want here, i.e. fail fast when an
>>>>> allocation cannot possibly succeed, the only reliable idea which comes to
>>>>> mind is free-PFN accounting. I might give that a go myself to see how ugly
>>>>> it looks.

Did you get any chance to look into this issue?
I am waiting for your suggestion/patch for this issue!

>>>
>>> For this testing, a dual-port Intel 40G card (XL710) was used, with both
>>> ports connected in loop-back. I ran iperf servers and clients on both
>>> ports (using NAT to route packets out on the intended ports). Ten iperf
>>> clients were invoked every 60 seconds in a loop, for hours, on each port.
>>> Initially the performance on both ports was close to line rate; however,
>>> after the test had run for about 4 to 6 hours, the performance dropped
>>> very low (to a few hundred Mbps) on both connections.
>>>
>>> IMO, this is a generic bug that should be reproducible on other platforms
>>> too, and it needs to be fixed at the earliest.
>>> Please let me know if you have a better way to fix this; I am happy to
>>> test your patch!
>>
>> Any update on this issue?
>>>
>>>>
>>>> I see two problems in the current implementation:
>>>> 1. We don't replenish the 32-bit range until the first attempt at the
>>>> second (64-bit) allocation fails.
>>>> 2. Having a per-CPU cache might not yield a good hit rate on platforms
>>>> with a larger number of CPUs.
>>>>
>>>> However, irrespective of the current issues, it makes sense to update the
>>>> cached node as done in this patch when we fail to get an iova range using
>>>> the current cached pointer, which otherwise forces unnecessary,
>>>> time-consuming do-while iterations until a replenish happens!
>>>>
>>>> thanks
>>>> Ganapat
>>>>
>>>>>
>>>>> Robin.
>>>>>
>>>>>
>>>>>> Signed-off-by: Ganapatrao Kulkarni
>>>>>> ---
>>>>>>  drivers/iommu/iova.c | 6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>>>>> index 83fe262..e6ee2ea 100644
>>>>>> --- a/drivers/iommu/iova.c
>>>>>> +++ b/drivers/iommu/iova.c
>>>>>> @@ -201,6 +201,12 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>>>>         } while (curr && new_pfn <= curr_iova->pfn_hi);
>>>>>>         if (limit_pfn < size || new_pfn < iovad->start_pfn) {
>>>>>> +               /* No more cached node points to free hole, update to leaf node.
>>>>>> +                */
>>>>>> +               struct iova *prev_iova;
>>>>>> +
>>>>>> +               prev_iova = rb_entry(prev, struct iova, node);
>>>>>> +               __cached_rbnode_insert_update(iovad, prev_iova);
>>>>>>                 spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>>>>>>                 return -ENOMEM;
>>>>>>         }
>>>>>>
>>>>>
>>>
>>> thanks
>>> Ganapat

thanks
Ganapat
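
For reference, a minimal sketch of the "fail fast via free-PFN accounting"
idea Robin mentions above might look like the following. The structure and
function names here are invented purely for illustration; they are not taken
from drivers/iommu/iova.c:

#include <linux/types.h>

/*
 * Illustrative sketch only: remember the smallest allocation size that has
 * failed below the 32-bit limit, so later requests at least that large can
 * fail immediately instead of walking the whole rbtree again. Any free
 * below the limit clears the hint, since it may have opened enough space.
 */
struct iova_space_hint {
	unsigned long failed_alloc_size;	/* in pfns; 0 = no failure recorded */
};

static bool alloc_would_fail(struct iova_space_hint *hint, unsigned long size)
{
	/* A request this large (or larger) has already failed since the last free. */
	return hint->failed_alloc_size && size >= hint->failed_alloc_size;
}

static void record_alloc_failure(struct iova_space_hint *hint, unsigned long size)
{
	/* Keep the smallest size known to fail. */
	if (!hint->failed_alloc_size || size < hint->failed_alloc_size)
		hint->failed_alloc_size = size;
}

static void record_free_below_limit(struct iova_space_hint *hint)
{
	hint->failed_alloc_size = 0;	/* space was returned; allow retries */
}

Such a hint stays conservative with respect to the 2GB example above: a
failed oversized request only short-circuits requests of that size or larger,
so smaller allocations still proceed, and the hint is reset as soon as
anything below the limit is freed.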