Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754342AbdGJPPj (ORCPT ); Mon, 10 Jul 2017 11:15:39 -0400 Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:36854 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754159AbdGJPPh (ORCPT ); Mon, 10 Jul 2017 11:15:37 -0400 Subject: Re: [PATCH] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables To: shankerd@codeaurora.org, Ganapatrao Kulkarni References: <1498405569-463-1-git-send-email-shankerd@codeaurora.org> <27b46938-ae23-9750-e0c7-09fa472d3297@arm.com> <09fa2b5a-9039-0902-4f57-6a6c2a5f7c37@arm.com> <6ea9d5df-d15f-a8a6-2442-e2f628b29816@codeaurora.org> Cc: Jason Cooper , Vikram Sethi , linux-kernel , "ganapatrao.kulkarni@cavium.com" , Thomas Gleixner , linux-arm-kernel , Jayachandran C From: Marc Zyngier Organization: ARM Ltd Message-ID: <03162741-2ddc-fe3d-7f5d-4eafb8209ef5@arm.com> Date: Mon, 10 Jul 2017 16:15:28 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <6ea9d5df-d15f-a8a6-2442-e2f628b29816@codeaurora.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5191 Lines: 115 On 10/07/17 15:57, Shanker Donthineni wrote: > Hi Marc, > > On 07/10/2017 08:50 AM, Marc Zyngier wrote: >> On 10/07/17 11:21, Ganapatrao Kulkarni wrote: >>> Hi Marc, >>> >>> On Mon, Jul 10, 2017 at 2:53 PM, Marc Zyngier wrote: >>>> On 10/07/17 10:08, Ganapatrao Kulkarni wrote: >>>>> On Mon, Jul 10, 2017 at 2:36 PM, Marc Zyngier wrote: >>>>>> On 10/07/17 09:48, Ganapatrao Kulkarni wrote: >>>>>>> Hi Marc, >>>>>>> >>>>>>> On Mon, Jul 3, 2017 at 8:23 PM, Marc Zyngier wrote: >>>>>>>> Hi Shanker, >>>>>>>> >>>>>>>> On 03/07/17 15:24, Shanker Donthineni wrote: >>>>>>>>> Hi Marc, >>>>>>>>> >>>>>>>>> On 06/30/2017 03:51 AM, Marc Zyngier wrote: >>>>>>>>>> On 30/06/17 04:01, Ganapatrao Kulkarni wrote: >>>>>>>>>>> On Fri, Jun 30, 2017 at 8:04 AM, Ganapatrao Kulkarni >>>>>>>>>>> wrote: >>>>>>>>>>>> Hi Shanker, >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Jun 25, 2017 at 9:16 PM, Shanker Donthineni >>>>>>>>>>>> wrote: >>>>>>>>>>>>> The NUMA node information is visible to ITS driver but not being used >>>>>>>>>>>>> other than handling errata. This patch allocates the memory for ITS >>>>>>>>>>>>> tables from the corresponding NUMA node using the appropriate NUMA >>>>>>>>>>>>> aware functions. >>>>>>>>>>> >>>>>>>>>>> IMHO, the description would have been more constructive? >>>>>>>>>>> >>>>>>>>>>> "All ITS tables are mapped by default to NODE 0 memory. >>>>>>>>>>> Adding changes to allocate memory from respective NUMA NODES of ITS devices. >>>>>>>>>>> This will optimize tables access and avoids unnecessary inter-node traffic." >>>>>>>>>> >>>>>>>>>> But more importantly, I'd like to see figures showing the actual benefit >>>>>>>>>> of this per-node allocation. Given that both of you guys have access to >>>>>>>>>> such platforms, please show me the numbers! >>>>>>>>>> >>>>>>>>> >>>>>>>>> I'll share the actual results which shows the improvement whenever >>>>>>>>> available on our next chips. Current version of Qualcomm qdf2400 doesn't >>>>>>>>> support multi socket configuration to capture results and share with you. >>>>>>>>> >>>>>>>>> Do you see any other issues with this patch apart from the performance >>>>>>>>> improvements. I strongly believe this brings the noticeable improvement >>>>>>>>> in numbers on systems where it has multi node memory/CPU configuration. >>>>>>>> >>>>>>>> I agree that it *could* show an improvement, but it very much depends on >>>>>>>> how often the ITS misses in its caches. For this kind of patches, I want >>>>>>>> to see two things: >>>>>>>> >>>>>>>> 1) It brings a measurable benefit on NUMA platforms >>>>>>> >>>>>>> Did some measurement of interrupt response time for LPIs and we don't >>>>>>> see any major >>>>>>> improvement due to caching of Tables. However, we have seen >>>>>>> improvements of around 5%. >>>>>> >>>>>> An improvement of what exactly? >>>>> >>>>> interrupt response time. >>>> >>>> Measured how? On which HW? Using which benchmark? >>> >>> This has been tested on ThunderX2. >>> We have instrumented gic-v3-its driver code to create dummy LPI device >>> with few vectors. >>> The LPI is induced from dummy device(through sysfs by writing to >>> TRANSLATOR reg). >>> The ISR routine(gic_handle_irq) being called to handle the induced LPI. >>> NODE 1 cpu is used to induce LPI and NODE 1 cpu/collection is mapped >>> in ITT to route this LPI. >>> >>> CPU timer counter are sampled at the time LPI is Induced and in ISR >>> routine to calculate interrupt response time. >>> the result shown improvement of 5% with this patch. >> >> And you call that a realistic measurement of the latency? Really? Sorry, >> but I cannot take you seriously here. >> >>> Do you have any recommended benchmarks to test the same? >> >> Run a standard benchmark such as netperf, post the result with and >> without that patch. The above is just plain ridiculous. >> > > The whole purpose of ACPI subtable "GIC Interrupt Translation Service (ITS) Affinity structure" > is to provide the proximity information to OS so that software will take advantage of NUMA > aware allocations to improve the read latency of ITS/GICR tables, not just for implementing > software workarounds. > > > I believe ITS driver should provide NUMA aware allocations just like x86 Linux drivers. How much > performance improvement we observer is based on the individual SOC implementation, inter NODE > latency, inter node traffic, cache capacity, and type of the test used to measure results. > > Please consider this patch irrespective of the test results running on a specific hardware. We > need this patch for upcoming Qualcomm server chips. "I believe" and "We need" are not a proof of the usefulness of this. We can argue all day, or you can provide a set of convincing results. Your choice. But I can guarantee you the the latter is a much better method than the former. If you (or Cavium) cannot be bothered to provide tangible results that this is useful, why should I take this at face value? This is just like any other improvement we make to the kernel. We back it *with data*. M. -- Jazz is not dead. It just smells funny...