Subject: Should this_cpu_read() be volatile?
From: Nadav Amit
Date: Fri, 7 Dec 2018 16:40:52 -0800
To: Peter Zijlstra
Cc: Matthew Wilcox, Vlastimil Babka, Linux-MM, LKML, X86 ML, Ingo Molnar, Thomas Gleixner

[Resend, changing title & adding lkml and some others ]

On Dec 7, 2018, at 3:12 PM, Nadav Amit wrote:

[ We can start a new thread, since I have the tendency to hijack threads. ]

> On Dec 7, 2018, at 12:45 AM, Peter Zijlstra wrote:
>
> On Thu, Dec 06, 2018 at 09:26:24AM -0800, Nadav Amit wrote:
>>> On Dec 6, 2018, at 2:25 AM, Peter Zijlstra wrote:
>>>
>>> On Thu, Dec 06, 2018 at 12:28:26AM -0800, Nadav Amit wrote:
>>>> [ +Peter ]
>>>>
[snip]
>>>>
>>>> *But* there is one thing that may require some attention - patch
>>>> b59167ac7bafd ("x86/percpu: Fix this_cpu_read()") set ordering constraints
>>>> on the VM_ARGS() evaluation. And this patch also imposes, it appears,
>>>> (unnecessary) constraints on other pieces of code.
>>>>
>>>> These constraints are due to the addition of the volatile keyword for
>>>> this_cpu_read() by the patch. This affects at least 68 functions in my
>>>> kernel build, some of which are hot (I think), e.g., finish_task_switch(),
>>>> smp_x86_platform_ipi() and select_idle_sibling().
>>>>
>>>> Peter, perhaps the solution was too big of a hammer? Is it possible instead
>>>> to create a separate "this_cpu_read_once()" with the volatile keyword? Such
>>>> a function can be used for native_sched_clock() and other seqlocks, etc.
>>>
>>> No. Like the commit says, this_cpu_read() _must_ imply READ_ONCE(). If
>>> you want something else, use something else, there's plenty other
>>> options available.
>>>
>>> There's this_cpu_op_stable(), but also __this_cpu_read() and
>>> raw_this_cpu_read() (which currently don't differ from this_cpu_read()
>>> but could).
>>
>> Would setting the inline assembly memory operand both as input and output be
>> better than using "volatile"?
>
> I don't know.. I'm forever befuddled by the exact semantics of gcc
> inline asm.
>
>> I think that if you do that, the compiler would treat this_cpu_read()
>> as something that changes the per-cpu variable, which would make it invalid
>> to re-read the value. At the same time, it would not prevent reordering the
>> read with other stuff.
>
> So the thing is; as I wrote, the generic version of this_cpu_*() is:
>
>	local_irq_save();
>	__this_cpu_*();
>	local_irq_restore();
>
> And per local_irq_{save,restore}() including compiler barriers that
> cannot be reordered around either.
>
> And per the principle of least surprise, I think our primitives should
> have similar semantics.

I guess so, but as you'll see below, the end result is ugly.

> I'm actually having difficulty finding the this_cpu_read() in any of the
> functions you mention, so I cannot make any concrete suggestions other
> than pointing at the alternative functions available.

So I got deeper into the code to understand a couple of differences. In the
case of select_idle_sibling(), the patch (Peter's) increases the function
code size by 123 bytes (over the baseline of 986). The per-cpu variable is
accessed through the following call chain:

	select_idle_sibling()
	=> select_idle_cpu()
	=> local_clock()
	=> raw_smp_processor_id()

And results in 2 more calls to sched_clock_cpu(), as the compiler assumes the
processor id changes in between (which obviously wouldn't happen). There may
be more changes around, which I didn't fully analyze. But at the very least,
reading the processor id should not get "volatile".

As for finish_task_switch(), the impact is only a few bytes, but still
unnecessary. It appears that with your patch preempt_count() causes multiple
reads of __preempt_count in this code:

	if (WARN_ONCE(preempt_count() != 2*PREEMPT_DISABLE_OFFSET,
		      "corrupted preempt_count: %s/%d/0x%x\n",
		      current->comm, current->pid, preempt_count()))
		preempt_count_set(FORK_PREEMPT_COUNT);

Again, this is unwarranted, as the preemption count should not be changed in
any interrupt.
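
To make the double read concrete, here is a minimal user-space sketch of the
same shape as the WARN_ONCE() check above. It is only an analogy: it uses a
plain C volatile access instead of the kernel's inline asm, and every name in
it (fake_preempt_count, READ_VOLATILE_INT, count_volatile, count_plain,
check_volatile, check_plain) is made up for illustration and is not a kernel
symbol.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the per-cpu __preempt_count. It is an
 * ordinary global here, not a real per-cpu variable.
 */
static int fake_preempt_count = 2;

/* Volatile access: every evaluation must perform its own load. */
#define READ_VOLATILE_INT(x) (*(const volatile int *)&(x))

/* Rough model of a volatile read: two calls mean two loads. */
static inline int count_volatile(void)
{
	return READ_VOLATILE_INT(fake_preempt_count);
}

/* Rough model of a non-volatile read: the compiler may merge loads. */
static inline int count_plain(void)
{
	return fake_preempt_count;
}

/*
 * Same shape as the WARN_ONCE() check: the count is evaluated once for
 * the comparison and once more as a format argument on the warning path.
 * Returns 1 if the count did not match, 0 otherwise.
 */
static int check_volatile(int expected)
{
	if (count_volatile() != expected) {
		fprintf(stderr, "corrupted count: 0x%x\n",
			(unsigned int)count_volatile());
		return 1;
	}
	return 0;
}

/* Identical logic, but the second read may reuse the first value. */
static int check_plain(int expected)
{
	if (count_plain() != expected) {
		fprintf(stderr, "corrupted count: 0x%x\n",
			(unsigned int)count_plain());
		return 1;
	}
	return 0;
}
```

Both variants return the same result. The difference is only in the generated
code: building with something like "gcc -O2 -S" and comparing the two
functions should show a second load of fake_preempt_count on the warning path
of check_volatile(), while check_plain() is free to reuse the value it
already loaded for the comparison.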