From: Linus Torvalds
Date: Mon, 6 May 2019 10:13:43 -0700
Subject: Re: [EXT] Re: [RFC] Disable lockref on arm64
To: Jayachandran Chandrasekharan Nair
Cc: Jan Glauber, catalin.marinas@arm.com, will.deacon@arm.com,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org

On Sun, May 5, 2019 at 11:13 PM Jayachandran Chandrasekharan Nair wrote:
>
> > It's not normal, and it's not inevitable.
>
> If you look at the code, the CAS failure is followed by a yield
> before retrying the CAS. Yield on arm64 is expected to be a hint
> to release resources so that other threads/cores can make progress.
> Under heavy contention, I expect the current code to behave the way
> I noted in my last mail, with the fairness issue as well.

Yes, this is a good point.

It's entirely possible that _particularly_ for CAS loops - where we get an updated value that caused the CAS to fail - we should not yield in between every CAS, exactly so that we don't force the CPU to flush the cacheline that it just got and make the CAS loop do lots of unnecessary work and just keep looping.

That said, I think right now all ARM implementations actually treat the yield instruction as a pure nop, so I assume this isn't actually the root of the ThunderX2 problems.

But we could do something like "unroll the cmpxchg loop twice, only yield every other iteration", exactly for the case where cpu_relax() might encourage the CPU to actually release the cacheline.

If you find that something like that would help ThunderX2, we can most certainly try that kind of thing. It makes sense, and unrolling the cmpxchg loop once might even help with branch prediction: the common case might be that the *first* cmpxchg succeeds, and that's what the branch predictors get primed for, but if the first one actually fails, we fall through to the contention case, and now maybe the branch predictor ends up being primed for *that* case for the second one.

Yes, the above is wild handwaving - but it's the kind of thing we could easily do if the hardware people have a good argument for it.
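To make the handwaving concrete, the shape would be something like the sketch below - written against plain C11 atomics rather than the actual cmpxchg64_relaxed() loop in lib/lockref.c, with made-up names, and completely untested:

/*
 * Rough sketch of "unroll the cmpxchg loop twice, only yield every
 * other iteration".  Plain C11 atomics stand in for the kernel
 * primitives; count_inc() and the non-arm64 fallback are made up.
 */
#include <stdatomic.h>
#include <stdint.h>

static inline void cpu_relax(void)
{
#if defined(__aarch64__)
	__asm__ __volatile__("yield" ::: "memory");	/* the hint in question */
#else
	__asm__ __volatile__("" ::: "memory");		/* compiler barrier only */
#endif
}

static void count_inc(_Atomic uint64_t *word)
{
	uint64_t old = atomic_load_explicit(word, memory_order_relaxed);

	for (;;) {
		/* Common case: the first cmpxchg succeeds. */
		if (atomic_compare_exchange_weak_explicit(word, &old, old + 1,
				memory_order_relaxed, memory_order_relaxed))
			return;

		/*
		 * Retry immediately, without the yield hint, so the CPU
		 * is not encouraged to give up the cacheline it just
		 * acquired.  (On CAS failure, 'old' has been reloaded.)
		 */
		if (atomic_compare_exchange_weak_explicit(word, &old, old + 1,
				memory_order_relaxed, memory_order_relaxed))
			return;

		/* Yield only once per two failed attempts. */
		cpu_relax();
	}
}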
I'm not asking for _perfect_ hardware. I'm just asking for hardware to not be actively antagonistic to something fundamental like a cmpxchg loop.

> Your larger point seems to be that the hardware has to be smarter to
> scale standard locking implementations when adding cores, and to
> be graceful even in extremely high contention cases. Yes, this
> is something we should be looking at for ThunderX3.

Yes. And my point really is that no core should ever be in the situation that it fetches a cacheline, only to then release it immediately again without having made any progress.

You do want a *certain* amount of stickiness to the cachelines. Of course, you very much do not want to overdo it - I'm talking about keeping the cacheline around for tens of cycles, not hundreds: enough that small code sequences which are in and out quickly don't see bouncing at that level. And you *don't* want the stickiness to be such that once one CPU has the cacheline, it stays with that CPU (or that socket) for as long as the CPU hammers it.

So for example, in the case of a cmpxchg loop, maybe you'll get a couple of branch mispredicts the first few times round the loop: first because the original value came from a plain read and the cacheline was shared rather than exclusive, and then because the branch predictor saw previous cases where there was no contention and the loop exited immediately. So for the first couple of iterations the core might be "slow enough" not to get the cmpxchg loop working well. But not a lot more than that (again, unless something special happens, like an interrupt storm that keeps interrupting the loop).

End result: I do think the cache coherency needs to be a *bit* smarter than just "oh, another core wants this line, I will now immediately try to satisfy it". And yes, those kinds of things get more important the more cores you have. You don't need a whole lot of smarts if you have just a couple of cores, because you'll be making progress regardless.

And note that I really think this kind of support absolutely falls on hardware. Software simply _cannot_ make good tuning decisions using backoff or whatever. Yes, yes, I know people have done things like that historically (Sparc and SunOS were famous for it), but part of doing them historically was that

 (a) they didn't know better,

 (b) the software was often tuned to particular hardware and loads (or at least a very small set of hardware), and

 (c) there weren't necessarily better hardware options available -

but none of the above is true today. Things like backoff loops in software are not only hardware-specific, they tend to have seriously bad fairness behavior (and they depend on everybody honoring the backoff, which can be a security issue), so then you have to add even more complexity to deal with the fairness issues, and your cmpxchg loop becomes something truly disgusting.
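Something like this, purely as an illustration - reusing the cpu_relax() from the sketch above, with every constant invented, which is exactly the problem:

/*
 * The kind of software backoff loop I mean.  The magic numbers are
 * made up, implicitly tuned for one machine and one load, and the
 * scheme only works if every participant honors the backoff.
 */
static void count_inc_backoff(_Atomic uint64_t *word)
{
	uint64_t old = atomic_load_explicit(word, memory_order_relaxed);
	unsigned int spins = 8;			/* magic number #1 */

	while (!atomic_compare_exchange_weak_explicit(word, &old, old + 1,
			memory_order_relaxed, memory_order_relaxed)) {
		unsigned int i;

		for (i = 0; i < spins; i++)
			cpu_relax();
		/*
		 * Exponential backoff: whoever has already failed the
		 * longest now waits the longest - the fairness problem.
		 */
		if (spins < 1024)		/* magic number #2 */
			spins *= 2;
	}
}

Which is why I say: "do it right in hardware".

               Linus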