Subject: Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA
To: Ganapatrao Kulkarni
Cc: Ganapatrao Kulkarni, Joerg Roedel, iommu@lists.linux-foundation.org, LKML, tomasz.nowicki@cavium.com, jnair@caviumnetworks.com, Robert Richter, Vadim.Lomovtsev@cavium.com, Jan.Glauber@cavium.com
References: <20180419171234.11053-1-ganapatrao.kulkarni@cavium.com>
From: Robin Murphy
Message-ID: <3ed2046c-6912-9380-7ea4-4d921981c64c@arm.com>
Date: Wed, 25 Jul 2018 15:20:47 +0100
On 12/07/18 08:45, Ganapatrao Kulkarni wrote:
> Hi Robin,
>
> On Mon, Jun 4, 2018 at 9:36 AM, Ganapatrao Kulkarni wrote:
>> ping??
>>
>> On Mon, May 21, 2018 at 6:45 AM, Ganapatrao Kulkarni wrote:
>>> On Thu, Apr 26, 2018 at 3:15 PM, Ganapatrao Kulkarni wrote:
>>>> Hi Robin,
>>>>
>>>> On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni wrote:
>>>>> On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy wrote:
>>>>>> On 19/04/18 18:12, Ganapatrao Kulkarni wrote:
>>>>>>>
>>>>>>> A performance drop is observed during long-running iperf tests using 40G
>>>>>>> cards. This is mainly due to long iterations when finding a free iova
>>>>>>> range in the 32-bit address space.
>>>>>>>
>>>>>>> In the current implementation, for 64-bit PCI devices there is always a
>>>>>>> first attempt to allocate an iova from the 32-bit (SAC preferred over DAC)
>>>>>>> address range. Once we run out of the 32-bit range, allocation falls back
>>>>>>> to the higher range; thanks to the cached32_node optimization this is not
>>>>>>> supposed to be painful. cached32_node always points to the most recently
>>>>>>> allocated 32-bit node. When the address range is full, it points to the
>>>>>>> last allocated node (the leaf node), so walking the rbtree to find an
>>>>>>> available range is not expensive. However, this optimization does not
>>>>>>> behave well when one of the middle nodes is freed. In that case
>>>>>>> cached32_node is updated to point to the next iova range; the next iova
>>>>>>> allocation consumes that free range and again updates cached32_node to
>>>>>>> itself. From then on, walking over the 32-bit range is more expensive.
>>>>>>>
>>>>>>> This patch updates the cached node to the leaf node when there is no free
>>>>>>> iova range left, which avoids unnecessarily long iterations.
>>>>>>
>>>>>> The only trouble with this is that "allocation failed" doesn't uniquely mean
>>>>>> "space full". Say that after some time the 32-bit space ends up empty except
>>>>>> for one page at 0x1000 and one at 0x80000000, then somebody tries to
>>>>>> allocate 2GB. If we move the cached node down to the leftmost entry when
>>>>>> that fails, all subsequent allocation attempts are now going to fail despite
>>>>>> the space being 99.9999% free!
>>>>>>
>>>>>> I can see a couple of ways to solve that general problem of free space above
>>>>>> the cached node getting lost, but neither of them helps with the case where
>>>>>> there is genuinely insufficient space (and if anything would make it even
>>>>>> slower). In terms of the optimisation you want here, i.e. fail fast when an
>>>>>> allocation cannot possibly succeed, the only reliable idea which comes to
>>>>>> mind is free-PFN accounting. I might give that a go myself to see how ugly
>>>>>> it looks.
>
> Did you get any chance to look into this issue?
> I am waiting for your suggestion/patch for this issue!

I got as far as [1], but I wasn't sure how much I liked it, since it still
seems a little invasive for such a specific case (plus I can't remember if
it's actually been debugged or not). I think in the end I started wondering
whether it's even worth bothering with the 32-bit optimisation for PCIe
devices - 4 extra bytes' worth of TLP is surely a lot less significant than
every transaction taking up to 50% more bus cycles was for legacy PCI.

Robin.
[1] http://www.linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a8e0e4af10ebebb3669750e05bf0028e5bd6afe8
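
For what it's worth, a minimal standalone sketch of that free-PFN accounting
idea (plain userspace C, all names hypothetical - this is neither the existing
iova code nor what [1] actually does): keep a running count of free PFNs below
the 32-bit boundary so a request which cannot possibly fit fails immediately,
without walking the whole rbtree.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical accounting state; field names are made up for this sketch. */
struct iova_acct {
    unsigned long dma_32bit_pfn;   /* first PFN above the 32-bit range */
    unsigned long free_32bit_pfns; /* free PFNs currently below it */
};

/* Cheap necessary (but not sufficient) check before searching the rbtree. */
static bool may_fit_32bit(const struct iova_acct *a, unsigned long size)
{
    return size <= a->free_32bit_pfns;
}

/* Update the count when [pfn_lo, pfn_hi] is allocated. */
static void account_alloc(struct iova_acct *a,
                          unsigned long pfn_lo, unsigned long pfn_hi)
{
    if (pfn_lo < a->dma_32bit_pfn) {
        unsigned long hi = pfn_hi < a->dma_32bit_pfn ?
                           pfn_hi : a->dma_32bit_pfn - 1;
        a->free_32bit_pfns -= hi - pfn_lo + 1;
    }
}

/* Update the count when [pfn_lo, pfn_hi] is freed again. */
static void account_free(struct iova_acct *a,
                         unsigned long pfn_lo, unsigned long pfn_hi)
{
    if (pfn_lo < a->dma_32bit_pfn) {
        unsigned long hi = pfn_hi < a->dma_32bit_pfn ?
                           pfn_hi : a->dma_32bit_pfn - 1;
        a->free_32bit_pfns += hi - pfn_lo + 1;
    }
}

int main(void)
{
    /* 4KB pages: PFNs 0..0xfffff cover the 32-bit space, all free. */
    struct iova_acct a = { 0x100000, 0x100000 };

    account_alloc(&a, 0x1, 0x1);         /* one page at 0x1000 */
    account_alloc(&a, 0x80000, 0x80000); /* one page at 0x80000000 */

    /* The 2GB (0x80000-PFN) request from the example above: the total
     * free count says it may still fit, so we keep searching rather
     * than declaring the space full forever. */
    printf("2GB may fit: %d\n", may_fit_32bit(&a, 0x80000));

    account_free(&a, 0x1, 0x1);          /* give the first page back */
    printf("2GB may fit now: %d\n", may_fit_32bit(&a, 0x80000));
    return 0;
}

Note the check is only a necessary condition: in the fragmented example above
the 2GB request still passes it and then fails in the actual search, but the
genuinely-full case is caught without touching the tree at all.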