In-Reply-To: <3ed2046c-6912-9380-7ea4-4d921981c64c@arm.com>
References: <20180419171234.11053-1-ganapatrao.kulkarni@cavium.com> <3ed2046c-6912-9380-7ea4-4d921981c64c@arm.com>
From: Ganapatrao Kulkarni
Date: Fri, 27 Jul 2018 18:26:46 +0530
Subject: Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA
To: Robin Murphy
Cc: Ganapatrao Kulkarni, Joerg Roedel, iommu@lists.linux-foundation.org, LKML, tomasz.nowicki@cavium.com, jnair@caviumnetworks.com, Robert Richter, Vadim.Lomovtsev@cavium.com, Jan.Glauber@cavium.com
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Robin,

On Wed, Jul 25, 2018 at 7:50 PM, Robin Murphy wrote:
> On 12/07/18 08:45, Ganapatrao Kulkarni wrote:
>>
>> Hi Robin,
>>
>>
>> On Mon, Jun 4, 2018 at 9:36 AM, Ganapatrao Kulkarni wrote:
>>>
>>> ping??
>>>
>>> On Mon, May 21, 2018 at 6:45 AM, Ganapatrao Kulkarni wrote:
>>>>
>>>> On Thu, Apr 26, 2018 at 3:15 PM, Ganapatrao Kulkarni wrote:
>>>>>
>>>>> Hi Robin,
>>>>>
>>>>> On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni wrote:
>>>>>>
>>>>>> On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy wrote:
>>>>>>>
>>>>>>> On 19/04/18 18:12, Ganapatrao Kulkarni wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> The performance drop is observed with long hours iperf testing using 40G
>>>>>>>> cards. This is mainly due to long iterations in finding the free iova
>>>>>>>> range in 32bit address space.
>>>>>>>>
>>>>>>>> In current implementation for 64bit PCI devices, there is always first
>>>>>>>> attempt to allocate iova from 32bit(SAC preferred over DAC) address
>>>>>>>> range. Once we run out 32bit range, there is allocation from higher range,
>>>>>>>> however due to cached32_node optimization it does not suppose to be
>>>>>>>> painful. cached32_node always points to recently allocated 32-bit node.
>>>>>>>> When address range is full, it will be pointing to last allocated node
>>>>>>>> (leaf node), so walking rbtree to find the available range is not
>>>>>>>> expensive affair. However this optimization does not behave well when
>>>>>>>> one of the middle node is freed. In that case cached32_node is updated
>>>>>>>> to point to next iova range. The next iova allocation will consume free
>>>>>>>> range and again update cached32_node to itself. From now on, walking
>>>>>>>> over 32-bit range is more expensive.
>>>>>>>>
>>>>>>>> This patch adds fix to update cached node to leaf node when there are no
>>>>>>>> iova free range left, which avoids unnecessary long iterations.
>>>>>>>
>>>>>>>
>>>>>>> The only trouble with this is that "allocation failed" doesn't uniquely mean
>>>>>>> "space full". Say that after some time the 32-bit space ends up empty except
>>>>>>> for one page at 0x1000 and one at 0x80000000, then somebody tries to
>>>>>>> allocate 2GB. If we move the cached node down to the leftmost entry when
>>>>>>> that fails, all subsequent allocation attempts are now going to fail despite
>>>>>>> the space being 99.9999% free!
>>>>>>>
>>>>>>> I can see a couple of ways to solve that general problem of free space above
>>>>>>> the cached node getting lost, but neither of them helps with the case where
>>>>>>> there is genuinely insufficient space (and if anything would make it even
>>>>>>> slower). In terms of the optimisation you want here, i.e. fail fast when an
>>>>>>> allocation cannot possibly succeed, the only reliable idea which comes to
>>>>>>> mind is free-PFN accounting. I might give that a go myself to see how ugly
>>>>>>> it looks.
>>
>>
>> did you get any chance to look in to this issue?
>> i am waiting for your suggestion/patch for this issue!
>
>
> I got as far as [1], but I wasn't sure how much I liked it, since it still
> seems a little invasive for such a specific case (plus I can't remember if
> it's actually been debugged or not). I think in the end I started wondering
> whether it's even worth bothering with the 32-bit optimisation for PCIe
> devices - 4 extra bytes worth of TLP is surely a lot less significant than
> every transaction taking up to 50% more bus cycles was for legacy PCI.

How about tracking whether the previous attempt to get an IOVA from the
32-bit range failed, and skipping further 32-bit attempts if it did?
Attempts would resume once the 32-bit range is replenished.
I have created a patch for this [2].

[2] https://github.com/gpkulkarni/linux/commit/e2343a3e1f55cdeb5694103dd354bcb881dc65c3

Note that testing of this patch is still in progress.

>
> Robin.
>
> [1]
> http://www.linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a8e0e4af10ebebb3669750e05bf0028e5bd6afe8

thanks
Ganapat
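
For illustration only, the fail-fast idea proposed above (remember that the
last 32-bit attempt failed, skip the search until something is freed) can be
modelled in user space roughly as follows. This is a toy sketch under stated
assumptions, not the patch at [2] and not kernel code: struct toy_domain,
toy_search(), toy_alloc_32bit(), toy_free() and TOY_PFNS_32BIT are all
hypothetical names, and a flat array stands in for the rbtree.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PFNS_32BIT 16  /* hypothetical: pages in the toy "32-bit" range */

/* Toy stand-in for an IOVA domain; NOT the kernel's struct iova_domain. */
struct toy_domain {
	bool used[TOY_PFNS_32BIT];  /* true if the page frame is allocated */
	bool last_32bit_failed;     /* set when a 32-bit allocation fails */
	unsigned long scans;        /* number of full searches performed */
};

/* First-fit search for 'size' contiguous free pages; start PFN or -1. */
static long toy_search(struct toy_domain *d, size_t size)
{
	size_t run = 0;

	d->scans++;
	for (size_t pfn = 0; pfn < TOY_PFNS_32BIT; pfn++) {
		run = d->used[pfn] ? 0 : run + 1;
		if (run == size)
			return (long)(pfn + 1 - size);
	}
	return -1;
}

/* Allocate from the toy range, skipping the search after a failure. */
static long toy_alloc_32bit(struct toy_domain *d, size_t size)
{
	long start;

	/* Fail fast: nothing has been freed since the last failed attempt. */
	if (size == 0 || d->last_32bit_failed)
		return -1;

	start = toy_search(d, size);
	if (start < 0) {
		d->last_32bit_failed = true;  /* remember the failure */
		return -1;
	}
	for (size_t i = 0; i < size; i++)
		d->used[start + i] = true;
	return start;
}

/* Free a range; this is the "replenish" event that re-enables searching. */
static void toy_free(struct toy_domain *d, size_t start, size_t size)
{
	for (size_t i = 0; i < size && start + i < TOY_PFNS_32BIT; i++)
		d->used[start + i] = false;
	d->last_32bit_failed = false;
}
```

In a real allocator the caller would fall back to the higher address range
when toy_alloc_32bit() returns -1. Note that a plain flag also suppresses
small requests after one large request fails, which is the "allocation
failed" vs "space full" concern quoted above; recording the size of the
failed request instead of a boolean would be one way to narrow that.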