Subject: Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync
From: qiaozhou
To: Vikram Mulukutla, Will Deacon
CC: Thomas Gleixner, John Stultz, LKML, Wang Wilbur, Marc Zyngier, Peter Zijlstra
Date: Mon, 25 Sep 2017 19:02:03 +0800
Message-ID: <8817730a-9581-240c-8de0-e6c96c20e9ec@asrmicro.com>

Hi Will,

Will this bodging patch be merged? It can solve the livelock issue on arm64
platforms (or at least improve it a lot).

I suspected that the CCI frequency might affect the contention between the
little and big cores, but on my platform it has little impact. In fact, it is
the frequency of the external DDR controller that affects the contention.
(My last reply has detailed data.)
The lock might be flushed out of the cache after the core enters WFE and then
have to be loaded from DDR back into the cache when the core is woken up.
(I guess that's why the external DDR frequency matters.)

Even with the lowest DDR frequency (78 MHz) on my platform, the maximum delay
for the little core to acquire the lock drops to ~10 ms with this bodging
patch, while without the patch the delay can reach the 10 s level in my
testing, as discussed previously.

So I'm wondering whether it will be pushed into mainline, or whether more data
is still needed?

Thanks a lot.

Best Regards
Qiao

On 2017-08-29 07:12, Vikram Mulukutla wrote:
> Hi Will,
>
> On 2017-08-25 12:48, Vikram Mulukutla wrote:
>> Hi Will,
>>
>> On 2017-08-15 11:40, Will Deacon wrote:
>>> Hi Vikram,
>>>
>>> On Thu, Aug 03, 2017 at 04:25:12PM -0700, Vikram Mulukutla wrote:
>>>> On 2017-07-31 06:13, Will Deacon wrote:
>>>> >On Fri, Jul 28, 2017 at 12:09:38PM -0700, Vikram Mulukutla wrote:
>>>> >>On 2017-07-28 02:28, Will Deacon wrote:
>>>> >>>On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
>>>> >>>>
>>>> >>>
>>>> >>This does seem to help. Here's some data after 5 runs with and without
>>>> >>the patch.
>>>> >
>>>> >Blimey, that does seem to make a difference. Shame it's so ugly! Would you
>>>> >be able to experiment with other values for CPU_RELAX_WFE_THRESHOLD? I had
>>>> >it set to 10000 in the diff I posted, but that might be higher than
>>>> >optimal.
>>>> >It would be interesting to see if it correlates with num_possible_cpus()
>>>> >for the highly contended case.
>>>> >
>>>> >Will
>>>>
>>>> Sorry for the late response - I should hopefully have some more data with
>>>> different thresholds before the week is finished or on Monday.
>>>
>>> Did you get anywhere with the threshold heuristic?
>>>
>>> Will
>>
>> Here's some data from experiments that I finally got to today. I decided
>> to recompile for every value of the threshold. Was doing a binary search
>> of sorts and then started reducing by orders of magnitude.
>> There are pairs of rows here:
>>
>
> Well, here's something interesting. I tried a different platform and found
> that the workaround doesn't help much at all, similar to Qiao's observation
> on his b.L chipset. Something to do with the WFE implementation or
> event-stream?