Subject: Re: [RFC PATCH] iommu/iova: Add a best-fit algorithm
To: isaacm@codeaurora.org
Cc: iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org, kernel-team@android.com, pratikp@codeaurora.org, Liam Mark
References: <1581721602-17010-1-git-send-email-isaacm@codeaurora.org> <7239ddd532e94a4371289f3be23c66a3@codeaurora.org>
From: Robin Murphy
Message-ID: <195d44d1-ff92-06fd-8ce8-75cd12d47c43@arm.com>
Date: Thu, 20 Feb 2020 18:42:27 +0000
In-Reply-To: <7239ddd532e94a4371289f3be23c66a3@codeaurora.org>
On 20/02/2020 6:38 am, isaacm@codeaurora.org wrote:
> On 2020-02-17 08:03, Robin Murphy wrote:
>> On 14/02/2020 11:06 pm, Isaac J. Manjarres wrote:
>>> From: Liam Mark
>>>
>>> Using the best-fit algorithm, instead of the first-fit
>>> algorithm, may reduce fragmentation when allocating
>>> IOVAs.
>>
>> What kind of pathological allocation patterns make that a serious
>> problem? Is there any scope for simply changing the order of things
>> in the callers? Do these drivers also run under other DMA API
>> backends (e.g. 32-bit Arm)?
>>
> The use cases where the IOVA space has been fragmented have
> non-deterministic allocation patterns, and thus it's not feasible to
> change the allocation order to avoid fragmenting the IOVA space.

What about combining smaller buffers into larger individual
allocations; any scope for that sort of thing? Certainly, if you're
consistently allocating small things less than PAGE_SIZE then DMA
pools would be useful to avoid wanton memory wastage in general (see
[1] below for a rough sketch).

> From what we've observed, the use cases involve allocations of two
> types of buffer: one type between 1 KB and 4 MB in size, and another
> type between 1 KB and 400 MB in size.
>
> The pathological scenarios seem to arise when there are many (100+)
> randomly distributed non-power-of-two allocations, which in some
> cases leave behind holes of up to 100+ MB in the IOVA space.
>
> Here are some examples that show the state of the IOVA space under
> which failure to allocate an IOVA was observed:
>
> Instance 1:
>     Currently mapped total size : ~1.3 GB
>     Free space available : ~2 GB
>     Map for ~162 MB fails.
>     Max contiguous space available : < 162 MB
>
> Instance 2:
>     Currently mapped total size : ~950 MB
>     Free space available : ~2.3 GB
>     Map for ~320 MB fails.
>     Max contiguous space available : ~189 MB
>
> Instance 3:
>     Currently mapped total size : ~1.2 GB
>     Free space available : ~2.7 GB
>     Map for ~162 MB fails.
>     Max contiguous space available : < 162 MB
>
> We are still in the process of collecting data with the best-fit
> algorithm enabled, to provide some numbers showing that it results
> in less IOVA space fragmentation.

Thanks for those examples, and I'd definitely like to see the
comparative figures. To dig a bit further: at the point where things
start failing, where are the cached nodes pointing? IIRC there is
still a pathological condition where empty space between limit_pfn
and cached32_node gets 'lost' if nothing in between is freed, so the
bigger the range of allocation sizes, the worse the effect, e.g.
(considering an empty domain, pfn 0 *not* reserved, 32-bit
limit_pfn):

alloc 4K: succeeds, cached32_node now at 4G-4K
alloc 2G: succeeds, cached32_node now at 0
alloc 4K: fails, despite almost 2G of contiguous free space within
limit_pfn (and max32_alloc_size==1 now fast-forwards *any* further
allocation attempt to failure)

([2] below sketches that sequence from a caller's point of view.)

If you're falling foul of this case (I was never sure how realistic a
problem it would be in practice), there are at least a couple of much
less invasive tweaks I can think of that would be worth exploring.

> To answer your question about whether this driver runs under other
> DMA API backends: yes, it does, such as 32-bit ARM.
OK, that's what I suspected :)

AFAICS arch/arm's __alloc_iova() is also a first-fit algorithm, so if
you get better behaviour there then it would suggest that this aspect
isn't really the most important issue.

Certainly, the fact that the "best fit" logic here also happens to
ignore the cached nodes does start drawing me back to the point
above.

Robin.
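
[1] A minimal sketch of the DMA pool idea, purely illustrative - the
"dev" pointer, the pool name and the sizes are made up, and error
handling is omitted:

	#include <linux/dmapool.h>
	#include <linux/sizes.h>

	struct dma_pool *pool;
	dma_addr_t handle;
	void *buf;

	/* One pool backs all the sub-page buffers, so each small
	 * allocation no longer burns a whole IOVA page of its own. */
	pool = dma_pool_create("small-bufs", dev, SZ_1K, SZ_1K, 0);

	buf = dma_pool_alloc(pool, GFP_KERNEL, &handle);
	/* ... program "handle" into the hardware and use "buf" ... */
	dma_pool_free(pool, buf, handle);
	dma_pool_destroy(pool);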
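
[2] Roughly how that failure sequence looks from a caller's point of
view - again illustrative and untested, assuming an otherwise-empty
32-bit domain as described above (needs <linux/dma-mapping.h> and
<linux/sizes.h>):

	dma_addr_t a, b, c;
	void *x, *y, *z;

	/* Succeeds; cached32_node now points just below 4G */
	x = dma_alloc_coherent(dev, SZ_4K, &a, GFP_KERNEL);

	/* Succeeds; cached32_node now effectively at 0 */
	y = dma_alloc_coherent(dev, SZ_2G, &b, GFP_KERNEL);

	/* Fails despite ~2G of contiguous free space below limit_pfn;
	 * max32_alloc_size then short-circuits every later attempt
	 * until something is freed */
	z = dma_alloc_coherent(dev, SZ_4K, &c, GFP_KERNEL);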