Date: Thu, 21 Sep 2017 16:26:39 +0200
From: Christoph Hellwig
To: Robin Murphy
Cc: Ganapatrao Kulkarni, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux-foundation.org,
	linux-mm@kvack.org, Christoph Hellwig, Marek Szyprowski,
	Will.Deacon@arm.com, lorenzo.pieralisi@arm.com, hanjun.guo@linaro.org,
	joro@8bytes.org, vbabka@suse.cz, akpm@linux-foundation.org,
	mhocko@suse.com, Tomasz.Nowicki@cavium.com, Robert.Richter@cavium.com,
	jnair@caviumnetworks.com, gklkml16@gmail.com
Subject: Re: [PATCH 3/4] iommu/arm-smmu-v3: Use NUMA memory allocations for
	stream tables and command queues
Message-ID: <20170921142639.GA18211@lst.de>
References: <20170921085922.11659-1-ganapatrao.kulkarni@cavium.com>
	<20170921085922.11659-4-ganapatrao.kulkarni@cavium.com>

On Thu, Sep 21, 2017 at 12:58:04PM +0100, Robin Murphy wrote:
> Christoph, Marek; how reasonable do you think it is to expect
> dma_alloc_coherent() to be inherently NUMA-aware on NUMA-capable
> systems? SWIOTLB looks fairly straightforward to fix up (for the simple
> allocation case; I'm not sure it's even worth it for bounce-buffering),
> but the likes of CMA might be a little trickier...

I think allocating memory local to the device's node is a good default.
I'm not sure we would still need a variant that takes an explicit node,
though. On the one hand, devices like NVMe or RDMA NICs have queues that
are assigned to specific CPUs and thus have an inherent affinity to
particular nodes. On the other hand, the device still has to reach that
memory over PCIe, so passing an explicit node would only make sense if
the host accessed the DMA memory far more often than the device does,
and I'm not sure we have any devices where that is the case (such a
device would not be optimally designed to start with).
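
For illustration, here is a rough sketch of what a node-local default
could look like. This is not the actual dma_alloc_coherent()
implementation: dev_to_node(), alloc_pages_node(), get_order(),
page_to_phys() and page_address() are real kernel APIs, but the helper
name alloc_coherent_node_local() and the surrounding logic are made up
for this example, and a real implementation would still have to produce
a proper device-visible DMA address and handle cache coherency:

/* Sketch only: an allocation path that defaults to the device's node. */
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/mm.h>

static void *alloc_coherent_node_local(struct device *dev, size_t size,
				       dma_addr_t *dma_handle, gfp_t gfp)
{
	/*
	 * dev_to_node() returns NUMA_NO_NODE when no affinity is known,
	 * in which case alloc_pages_node() falls back to the local node.
	 */
	int node = dev_to_node(dev);
	struct page *page;

	page = alloc_pages_node(node, gfp, get_order(size));
	if (!page)
		return NULL;

	/*
	 * Placeholder: real code must translate this to a device-visible
	 * DMA address (and deal with IOMMU mappings and coherency).
	 */
	*dma_handle = (dma_addr_t)page_to_phys(page);
	return page_address(page);
}

Callers that really know better than dev_to_node() (say, a driver
binding a queue to a remote node's CPUs) are exactly the case I am
questioning above: the device-side PCIe accesses would then cross the
interconnect on every DMA.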