Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753969AbdGJMa4 (ORCPT ); Mon, 10 Jul 2017 08:30:56 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:48180 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753829AbdGJMay (ORCPT ); Mon, 10 Jul 2017 08:30:54 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 7448861268 Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=shankerd@codeaurora.org Reply-To: shankerd@codeaurora.org Subject: Re: [PATCH] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables To: Ganapatrao Kulkarni , Marc Zyngier Cc: Jason Cooper , Vikram Sethi , linux-kernel , "ganapatrao.kulkarni@cavium.com" , Thomas Gleixner , linux-arm-kernel , Jayachandran C References: <1498405569-463-1-git-send-email-shankerd@codeaurora.org> <27b46938-ae23-9750-e0c7-09fa472d3297@arm.com> From: Shanker Donthineni Message-ID: Date: Mon, 10 Jul 2017 07:30:50 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4448 Lines: 114 Marc, Do you have any other concerns taking this patch? On 07/10/2017 05:21 AM, Ganapatrao Kulkarni wrote: > Hi Marc, > > On Mon, Jul 10, 2017 at 2:53 PM, Marc Zyngier wrote: >> On 10/07/17 10:08, Ganapatrao Kulkarni wrote: >>> On Mon, Jul 10, 2017 at 2:36 PM, Marc Zyngier wrote: >>>> On 10/07/17 09:48, Ganapatrao Kulkarni wrote: >>>>> Hi Marc, >>>>> >>>>> On Mon, Jul 3, 2017 at 8:23 PM, Marc Zyngier wrote: >>>>>> Hi Shanker, >>>>>> >>>>>> On 03/07/17 15:24, Shanker Donthineni wrote: >>>>>>> Hi Marc, >>>>>>> >>>>>>> On 06/30/2017 03:51 AM, Marc Zyngier wrote: >>>>>>>> On 30/06/17 04:01, Ganapatrao Kulkarni wrote: >>>>>>>>> On Fri, Jun 30, 2017 at 8:04 AM, Ganapatrao Kulkarni >>>>>>>>> wrote: >>>>>>>>>> Hi Shanker, >>>>>>>>>> >>>>>>>>>> On Sun, Jun 25, 2017 at 9:16 PM, Shanker Donthineni >>>>>>>>>> wrote: >>>>>>>>>>> The NUMA node information is visible to ITS driver but not being used >>>>>>>>>>> other than handling errata. This patch allocates the memory for ITS >>>>>>>>>>> tables from the corresponding NUMA node using the appropriate NUMA >>>>>>>>>>> aware functions. >>>>>>>>> >>>>>>>>> IMHO, the description would have been more constructive? >>>>>>>>> >>>>>>>>> "All ITS tables are mapped by default to NODE 0 memory. >>>>>>>>> Adding changes to allocate memory from respective NUMA NODES of ITS devices. >>>>>>>>> This will optimize tables access and avoids unnecessary inter-node traffic." >>>>>>>> >>>>>>>> But more importantly, I'd like to see figures showing the actual benefit >>>>>>>> of this per-node allocation. Given that both of you guys have access to >>>>>>>> such platforms, please show me the numbers! >>>>>>>> >>>>>>> >>>>>>> I'll share the actual results which shows the improvement whenever >>>>>>> available on our next chips. Current version of Qualcomm qdf2400 doesn't >>>>>>> support multi socket configuration to capture results and share with you. >>>>>>> >>>>>>> Do you see any other issues with this patch apart from the performance >>>>>>> improvements. I strongly believe this brings the noticeable improvement >>>>>>> in numbers on systems where it has multi node memory/CPU configuration. >>>>>> >>>>>> I agree that it *could* show an improvement, but it very much depends on >>>>>> how often the ITS misses in its caches. For this kind of patches, I want >>>>>> to see two things: >>>>>> >>>>>> 1) It brings a measurable benefit on NUMA platforms >>>>> >>>>> Did some measurement of interrupt response time for LPIs and we don't >>>>> see any major >>>>> improvement due to caching of Tables. However, we have seen >>>>> improvements of around 5%. >>>> >>>> An improvement of what exactly? >>> >>> interrupt response time. >> >> Measured how? On which HW? Using which benchmark? > > This has been tested on ThunderX2. > We have instrumented gic-v3-its driver code to create dummy LPI device > with few vectors. > The LPI is induced from dummy device(through sysfs by writing to > TRANSLATOR reg). > The ISR routine(gic_handle_irq) being called to handle the induced LPI. > NODE 1 cpu is used to induce LPI and NODE 1 cpu/collection is mapped > in ITT to route this LPI. > > CPU timer counter are sampled at the time LPI is Induced and in ISR > routine to calculate interrupt response time. > the result shown improvement of 5% with this patch. > > Do you have any recommended benchmarks to test the same? > Ganapatrao, Thanks for your efforts on instrumenting ITS driver code to show interrupt performance improvement of 5% on the ThunderX2 hardware. Actually the current ITS driver is not consistent on allocating memory for ITS/GICR tables, GICR pending tables are allocated from the corresponding NUMA node based on CPU proximity, but not the other tables. >> >> Give me the actual benchmark results. Don't expect me to accept this >> kind of hand-wavy statement. >> >> M. >> -- >> Jazz is not dead. It just smells funny... > > thanks > Ganapat > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > -- Shanker Donthineni Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.