Subject: Re: [PATCH] dma-debug: dynamic allocation of hash table
To: Eric Dumazet
Cc: Christoph Hellwig, Joerg Roedel, iommu@lists.linux-foundation.org,
    Eric Dumazet, Geert Uytterhoeven, linux-kernel
References: <20200130191049.190569-1-edumazet@google.com>
From: Robin Murphy
Date: Fri, 31 Jan 2020 12:30:42 +0000
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020-01-31 12:17 am, Eric Dumazet wrote:
> On Thu, Jan 30, 2020 at 3:46 PM Robin Murphy wrote:
>>
>> Hi Eric,
>>
>> On 2020-01-30 7:10 pm, Eric Dumazet via iommu wrote:
>>> Increasing the size of dma_entry_hash by 327680 bytes has run into
>>> some bootloader limitations.
>>
>> [ That might warrant some further explanation - I don't quite follow
>> how this would relate to a bootloader specifically :/ ]
>
> I had no details; please look at the prior thread where this has been
> discussed:
>
> https://www.spinics.net/lists/linux-renesas-soc/msg46157.html

Thanks for the pointer - that had passed me by, as it doesn't seem to
have been CCed to the IOMMU list or to me as a reviewer.

>>> Simply use dynamic allocation instead, and take this opportunity to
>>> increase the hash table to 65536 buckets. Finally my 40Gbit mlx4
>>> NIC can sustain line rate with CONFIG_DMA_API_DEBUG=y.
>>
>> That's pretty cool, but given that making the table bigger caused a
>> problem in the first place, I can't help but wonder whether making it
>> bigger yet again in the name of a fix is really the wisest move. How
>> might this impact DMA debugging on 32-bit embedded systems with
>> limited vmalloc space and even less RAM, for instance? More to the
>> point, does vmalloc() even work for !CONFIG_MMU builds? Obviously we
>> don't want things to be *needlessly* slow if avoidable, but is there
>> a genuine justification for needing to optimise what is fundamentally
>> an invasive heavyweight correctness check - e.g. has it helped expose
>> race conditions that were otherwise masked?
>
> Not sure what you are saying.
>
> vmalloc() _is_ supported by all arches, even !CONFIG_MMU

OK, that's good - I wasn't sure off-hand, and I was on the wrong machine
to go digging into the source.

> I cannot test all platforms, and this is a debugging feature no one
> uses in production.

...which is exactly why I'd prefer to see a stronger justification for
making a change which only benefits performance on large systems, while
potentially impacting usability on small ones.
>> That said, by moving to dynamic allocation maybe there's room to be
>> cleverer and make HASH_SIZE scale with, say, system memory size? (I
>> assume from the context it's not something we can expand on-demand
>> like we did for the dma_debug_entry pool)
>
> How can memory size serve as a proxy for the number of entries?
> Current 10Gbit NICs need about 256,000 entries in the table.
> Needless to say, the prior hash size was unusable.

Because it's actually a reasonable proxy for "system size" in this
context - there aren't many good reasons for a driver to maintain
hundreds of mappings of the *same* buffer, so in general the maximum
number of live mappings is effectively going to scale with the total
amount of memory, particularly at the smaller end. Consider it a pretty
safe assumption that an embedded system with RAM measured in megabytes
won't ever be running anything like an MLX4, let alone in a debug
situation. If those 256,000 mappings each represent a typical network
packet, that implies well over 300MB of data alone, not to mention the
size of the queues themselves, the actual DMA debug entries, and the
whole rest of the kernel beyond that one driver - I doubt it's
physically possible to 'need' 64K hash buckets without at the very
least 1GB of total RAM, quite likely much more.

> As I suggested one month ago, HASH_SIZE can be tuned by a developer
> eager to use this facility.
>
> 65536 slots are 768 KB on 32bit platforms.

...and when that represents ~5% of the total system RAM, it is a *lot*
less reasonable than even 12KB. As I said, it's great to make a debug
option more efficient such that what it observes is more representative
of the non-debug behaviour, but it mustn't come at the cost of making
the entire option unworkable for other users.

Robin.
>>> Fixes: 5e76f564572b ("dma-debug: increase HASH_SIZE")
>>> Signed-off-by: Eric Dumazet
>>> Reported-by: Geert Uytterhoeven
>>> Cc: Christoph Hellwig
>>> ---
>>>   kernel/dma/debug.c | 10 ++++++++--
>>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
>>> index 2031ed1ad7fa109bb8a8c290bbbc5f825362baba..a310dbb1515e92c081f8f3f9a7290dd5e53fc889 100644
>>> --- a/kernel/dma/debug.c
>>> +++ b/kernel/dma/debug.c
>>> @@ -27,7 +27,7 @@
>>>
>>>   #include
>>>
>>> -#define HASH_SIZE       16384ULL
>>> +#define HASH_SIZE       65536ULL
>>>   #define HASH_FN_SHIFT   13
>>>   #define HASH_FN_MASK    (HASH_SIZE - 1)
>>>
>>> @@ -90,7 +90,8 @@ struct hash_bucket {
>>>   };
>>>
>>>   /* Hash list to save the allocated dma addresses */
>>> -static struct hash_bucket dma_entry_hash[HASH_SIZE];
>>> +static struct hash_bucket *dma_entry_hash __read_mostly;
>>> +
>>>   /* List of pre-allocated dma_debug_entry's */
>>>   static LIST_HEAD(free_entries);
>>>   /* Lock for the list above */
>>> @@ -934,6 +935,10 @@ static int dma_debug_init(void)
>>>       if (global_disable)
>>>           return 0;
>>>
>>> +     dma_entry_hash = vmalloc(HASH_SIZE * sizeof(*dma_entry_hash));
>>> +     if (!dma_entry_hash)
>>> +             goto err;
>>> +
>>>       for (i = 0; i < HASH_SIZE; ++i) {
>>>           INIT_LIST_HEAD(&dma_entry_hash[i].list);
>>>           spin_lock_init(&dma_entry_hash[i].lock);
>>> @@ -950,6 +955,7 @@ static int dma_debug_init(void)
>>>           pr_warn("%d debug entries requested but only %d allocated\n",
>>>                   nr_prealloc_entries, nr_total_entries);
>>>       } else {
>>> +err:
>>>           pr_err("debugging out of memory error - disabled\n");
>>>           global_disable = true;