Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp920625ybv; Thu, 20 Feb 2020 09:36:08 -0800 (PST) X-Google-Smtp-Source: APXvYqyoJyIrnKtu3UQKe4vHqkZBN1z0U/hivh3vhMuF7XVKMvntNMLoHhVqg2Ef+to3xZ52pbux X-Received: by 2002:aca:4106:: with SMTP id o6mr2906173oia.173.1582220167967; Thu, 20 Feb 2020 09:36:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1582220167; cv=none; d=google.com; s=arc-20160816; b=duxDksIPslB5mnqOkgSSnyv/u9gmnO9EFryLF2e1ty08EpFMeVkHGvCEGTbOFnM0T0 P/tq56Dc+wC4KSagA5k2w7vS34chdYJj0+hsc27iLgCjFdJuTtrRM/HARLBJs7PchHnF iDdRaB7Vxr40bFggKBKyvbWfOrwbRasBCvyLXtIHY0ACQwaM8KooSZ8DlbfLYv69YREd erj4nK0KoFkSOGe5X4oR/N7dp2oRR63832UHuGx5c7h4g33GWHPfi+8huT0aQj8YAXer 6izy96Iooyqruq8sRjFP2KGNWni2o4otogVg4c8YIw/cm5WGk8kkLdVPPaVKnijpzqCy suEw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=7A9+vlw15kPI34OxCuq+OFrZzu4t4Hlhu7Sa20zwFqo=; b=kHb5isBNMM1ltyjYNiVnnwiiM8pjU0torIhcBJ2kHpSLHzwJHUlEXSOsJDxZvA5vCB UzHOSeYgD3SCDPyjxFrXu2L3rTs5Htvjnw0mtr1l3L8qnQsZYppmtKTa7onnPA2FHytI btaJBWxAS7VKBWsw7S3pMO07SWgNJwlCjvDcQoDb6ldkijlSjWPhYAT8VXmJBsXb+d47 yrXSNJOIux5uCWQ1lCVzeB0lwhGQXKmiF65PrLrU/Tu76fzzEUJGPtu7YueiuHNJFFsV kPbD0LuTpaaeut2Hu8+wWa8m90ugUVq5Out2O5tAvH4FlWUjMYnImHd/FSqLqgOGGaLC A+Xg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n11si17020otk.290.2020.02.20.09.35.55; Thu, 20 Feb 2020 09:36:07 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728528AbgBTRfl (ORCPT + 99 others); Thu, 20 Feb 2020 12:35:41 -0500 Received: from foss.arm.com ([217.140.110.172]:47962 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726959AbgBTRfl (ORCPT ); Thu, 20 Feb 2020 12:35:41 -0500 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AC7F831B; Thu, 20 Feb 2020 09:35:40 -0800 (PST) Received: from [10.1.196.37] (e121345-lin.cambridge.arm.com [10.1.196.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 8EFF93F68F; Thu, 20 Feb 2020 09:35:39 -0800 (PST) Subject: Re: [RFC PATCH] iommu/iova: Support limiting IOVA alignment To: Liam Mark , Will Deacon Cc: Joerg Roedel , "Isaac J. Manjarres" , Pratik Patel , iommu@lists.linux-foundation.org, kernel-team@android.com, linux-kernel@vger.kernel.org References: <20200219123704.GC19400@willie-the-truck> From: Robin Murphy Message-ID: Date: Thu, 20 Feb 2020 17:35:38 +0000 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=iso-8859-7; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 19/02/2020 11:22 pm, Liam Mark wrote: > On Wed, 19 Feb 2020, Will Deacon wrote: > >> On Mon, Feb 17, 2020 at 04:46:14PM +0000, Robin Murphy wrote: >>> On 14/02/2020 8:30 pm, Liam Mark wrote: >>>> >>>> When the IOVA framework applies IOVA alignment it aligns all >>>> IOVAs to the smallest PAGE_SIZE order which is greater than or >>>> equal to the requested IOVA size. >>>> >>>> We support use cases that requires large buffers (> 64 MB in >>>> size) to be allocated and mapped in their stage 1 page tables. >>>> However, with this alignment scheme we find ourselves running >>>> out of IOVA space for 32 bit devices, so we are proposing this >>>> config, along the similar vein as CONFIG_CMA_ALIGNMENT for CMA >>>> allocations. >>> >>> As per [1], I'd really like to better understand the allocation patterns >>> that lead to such a sparsely-occupied address space to begin with, given >>> that the rbtree allocator is supposed to try to maintain locality as far as >>> possible, and the rcaches should further improve on that. Are you also >>> frequently cycling intermediate-sized buffers which are smaller than 64MB >>> but still too big to be cached? Are there a lot of non-power-of-two >>> allocations? >> >> Right, information on the allocation pattern would help with this change >> and also the choice of IOVA allocation algorithm. Without it, we're just >> shooting in the dark. >> > > Thanks for the responses. > > I am looking into how much of our allocation pattern details I can share. > > My general understanding is that this issue occurs on a 32bit devices > which have additional restrictions on the IOVA range they can use within those > 32bits. > > An example is a use case which involves allocating a lot of buffers ~80MB > is size, the current algorithm will require an alignment of 128MB for > those buffers. My understanding is that it simply can't accommodate the number of 80MB > buffers that are required because the of amount of IOVA space which can't > be used because of the 128MB alignment requirement. OK, that's a case I can understand :) >>>> Add CONFIG_IOMMU_LIMIT_IOVA_ALIGNMENT to limit the alignment of >>>> IOVAs to some desired PAGE_SIZE order, specified by >>>> CONFIG_IOMMU_IOVA_ALIGNMENT. This helps reduce the impact of >>>> fragmentation caused by the current IOVA alignment scheme, and >>>> gives better IOVA space utilization. >>> >>> Even if the general change did prove reasonable, this IOVA allocator is not >>> owned by the DMA API, so entirely removing the option of strict >>> size-alignment feels a bit uncomfortable. Personally I'd replace the bool >>> argument with an actual alignment value to at least hand the authority out >>> to individual callers. >>> >>> Furthermore, even in DMA API terms, is anyone really ever going to bother >>> tuning that config? Since iommu-dma is supposed to be a transparent layer, >>> arguably it shouldn't behave unnecessarily differently from CMA, so simply >>> piggy-backing off CONFIG_CMA_ALIGNMENT would seem logical. >> >> Agreed, reusing CONFIG_CMA_ALIGNMENT makes a lot of sense here as callers >> relying on natural alignment of DMA buffer allocations already have to >> deal with that limitation. We could fix it as an optional parameter at >> init time (init_iova_domain()), and have the DMA IOMMU implementation >> pass it in there. >> > > My concern with using CONFIG_CMA_ALIGNMENT alignment is that for us this > would either involve further fragmenting our CMA regions (moving our CMA > max alignment from 1MB to max 2MB) or losing so of our 2MB IOVA block > mappings (changing our IOVA max alignment form 2MB to 1MB). > > At least for us CMA allocations are often not DMA mapped into stage 1 page > tables so moving the CMA max alignment to 2MB in our case would, I think, > only provide the disadvantage of having to increase the size our CMA > regions to accommodate this large alignment (which isn?t optimal for > memory utilization since CMA regions can't satisfy unmovable page > allocations). > > As an alternative would it be possible for the dma-iommu layer to use the > size of the allocation and the domain pgsize_bitmap field to pick a max > IOVA alignment, which it can pass in for that IOVA allocation, which will > maximize block mappings but not waste IOVA space? Given that we already have DMA_ATTR_ALOC_SINGLE_PAGES for video drivers and suchlike that know enough to know they want "large buffer" allocation behaviour, would it suffice to have a similar attribute that says "I'm not too fussed about alignment"? That way there's no visible change for anyone who doesn't opt in and might be relying on the existing behaviour, intentionally or otherwise. Then if necessary, the implementation can consider both flags together to decide whether to try to round down to the next block size or just shove it in anywhere. Robin.