Date: Fri, 28 Jul 2017 12:09:38 -0700
From: Vikram Mulukutla
To: Will Deacon
Cc: qiaozhou, Thomas Gleixner, John Stultz, sboyd@codeaurora.org, LKML,
    Wang Wilbur, Marc Zyngier, Peter Zijlstra,
    linux-kernel-owner@vger.kernel.org, sudeep.holla@arm.com
Subject: Re: [Question]: try to fix contention between expire_timers and
    try_to_del_timer_sync
In-Reply-To: <20170728092831.GA24839@arm.com>
References: <3d2459c7-defd-a47e-6cea-007c10cecaac@asrmicro.com>
    <20170728092831.GA24839@arm.com>
Message-ID: <2aa9684cf9c889ee9fdc8550b4388af6@codeaurora.org>

On 2017-07-28 02:28, Will Deacon wrote:
> On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
>>
>> I think we should have this discussion now - I brought this up earlier [1]
>> and I promised a test case that I completely forgot about - but here it
>> is (attached). Essentially a Big CPU in an acquire-check-release loop
>> will have an unfair advantage over a little CPU concurrently attempting
>> to acquire the same lock, in spite of the ticket implementation. If the
>> Big CPU needs the little CPU to make forward progress : livelock.
>>
>>
>> One solution was to use udelay(1) in such loops instead of cpu_relax(),
>> but that's not very 'relaxing'. I'm not sure if there's something we
>> could do within the ticket spin-lock implementation to deal with this.
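
For readers without the attachment, the acquire-check-release pattern described in the quoted text can be approximated in userspace. This is only a rough sketch under stated assumptions, not the attached test case: `struct ticket_lock`, `struct worker`, `worker_fn` and `run_contention` are hypothetical names, `sched_yield()` stands in for `cpu_relax()`, and two pthreads stand in for the Big and little CPUs. It makes no attempt to pin threads or set CPU frequencies, and it runs a fixed number of iterations per thread so that it terminates deterministically.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userspace ticket lock; not the kernel's arch_spinlock_t. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

struct worker {
    struct ticket_lock *lock;
    unsigned long *shared;     /* state guarded by the lock */
    unsigned long iterations;  /* acquire-check-release loops to run */
    unsigned long counter;     /* number of times lock acquired */
    uint64_t max_wait_ns;      /* max time taken to acquire lock */
};

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Take a ticket, then wait until it is being served. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1u,
                                                memory_order_relaxed);
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        sched_yield();  /* assumption: userspace stand-in for cpu_relax() */
}

static void ticket_lock_release(struct ticket_lock *l)
{
    /* Only the holder writes owner, so this hands off to the next ticket. */
    atomic_fetch_add_explicit(&l->owner, 1u, memory_order_release);
}

static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    for (unsigned long i = 0; i < w->iterations; i++) {
        uint64_t t0 = now_ns();
        ticket_lock_acquire(w->lock);
        uint64_t waited = now_ns() - t0;

        if (waited > w->max_wait_ns)
            w->max_wait_ns = waited;
        (*w->shared)++;  /* "check" step: touch state guarded by the lock */
        w->counter++;
        ticket_lock_release(w->lock);
    }
    return NULL;
}

/* Run two contending workers; returns 0 on success, -1 on thread failure. */
int run_contention(unsigned long iterations, struct worker out[2],
                   unsigned long *shared_total)
{
    struct ticket_lock lock;
    unsigned long shared = 0;
    pthread_t tid[2];
    int created = 0;

    assert(out != NULL && shared_total != NULL);
    atomic_init(&lock.next, 0u);
    atomic_init(&lock.owner, 0u);

    for (int i = 0; i < 2; i++) {
        out[i] = (struct worker){ .lock = &lock, .shared = &shared,
                                  .iterations = iterations };
        if (pthread_create(&tid[i], NULL, worker_fn, &out[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < 2; i++) {
        out[i].lock = NULL;    /* the lock lived on this stack frame */
        out[i].shared = NULL;
    }
    *shared_total = shared;
    return created == 2 ? 0 : -1;
}
```

Calling `run_contention()` and printing each worker's `max_wait_ns` and `counter` gives the same two columns per CPU as the tables in this mail (built with `-pthread`). Because the iteration count is fixed, only the maximum wait is informative here; observing the Big/little asymmetry itself would need the threads pinned to cores with very different clock rates.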
>
> Does bodging cpu_relax to back-off to wfe after a while help? The event
> stream will wake it up if nothing else does. Nasty patch below, but I'd be
> interested to know whether or not it helps.
>
> Will
>

This does seem to help. Here's some data after 5 runs with and without
the patch.

time = max time taken to acquire lock
counter = number of times lock acquired

cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
Without the cpu_relax() bodging patch:
=====================================================
cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
==========|==============|===========|==============|
  117893us|       2349144|        2us|       6748236|
  571260us|       2125651|        2us|       7643264|
   19780us|       2392770|        2us|       5987203|
   19948us|       2395413|        2us|       5977286|
   19822us|       2429619|        2us|       5768252|
   19888us|       2444940|        2us|       5675657|
=====================================================

cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
With the cpu_relax() bodging patch:
=====================================================
cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
==========|==============|===========|==============|
       3us|       2737438|        2us|       6907147|
       2us|       2742478|        2us|       6902241|
     132us|       2745636|        2us|       6876485|
       3us|       2744554|        2us|       6898048|
       3us|       2741391|        2us|       6882901|
=====================================================

The patch also seems to have helped with fairness in general,
allowing more work to be done if the CPU frequencies are more
closely matched (I don't know if this translates to real world
performance - probably not). The counter values are higher with
the patch.

time = max time taken to acquire lock
counter = number of times lock acquired

cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
Without the cpu_relax() bodging patch:
=====================================================
cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
==========|==============|===========|==============|
       2us|       5240654|        1us|       5339009|
       2us|       5287797|       97us|       5327073|
       2us|       5237634|        1us|       5334694|
       2us|       5236676|       88us|       5333582|
      84us|       5285880|       84us|       5329489|
=====================================================

cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
With the cpu_relax() bodging patch:
=====================================================
cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
==========|==============|===========|==============|
     140us|      10449121|        1us|      11154596|
       1us|      10757081|        1us|      11479395|
      83us|      10237109|        1us|      10902557|
       2us|       9871101|        1us|      10514313|
       2us|       9758763|        1us|      10391849|
=====================================================

Thanks,
Vikram

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project