Date: Thu, 11 Oct 2018 10:40:47 +0200
From: Peter Zijlstra
To: Eric Dumazet
Cc: linux-kernel, Eric Dumazet, Thomas Gleixner, Ingo Molnar,
	"H. Peter Anvin", Borislav Petkov
Subject: Re: [PATCH] x86/tsc: use real seqcount_latch in cyc2ns_read_begin()
Message-ID: <20181011084047.GA9885@hirez.programming.kicks-ass.net>
References: <20181011003336.168941-1-edumazet@google.com>
 <20181011073133.GZ5663@hirez.programming.kicks-ass.net>
In-Reply-To: <20181011073133.GZ5663@hirez.programming.kicks-ass.net>

On Thu, Oct 11, 2018 at 09:31:33AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 10, 2018 at 05:33:36PM -0700, Eric Dumazet wrote:
> > While looking at native_sched_clock() disassembly I had
> > the surprise to see the compiler (gcc 7.3 here) had
> > optimized out the loop, meaning the code is broken.
> >
> > Using the documented and approved API not only fixes the bug,
> > it also makes the code more readable.
> >
> > Replacing five this_cpu_read() by one this_cpu_ptr() makes
> > the generated code smaller.
>
> Does not for me, that is, the resulting asm is actually larger
>
> You're quite right the loop went missing; no idea wth that compiler is
> smoking (gcc-8.2 for me). In order to eliminate that loop it needs to
> think that two consecutive loads of this_cpu_read(cyc2ns.seq.sequence)
> will return the same value. But this_cpu_read() is an asm() statement,
> it _should_ not assume such.
>
> We assume that this_cpu_read() implies READ_ONCE() in a number of
> locations, this really should not happen.
>
> The reason it was written using this_cpu_read() is so that it can use
> %gs: prefixed instructions and avoid ever loading that percpu offset
> and doing manual address computation.
>
> Let me prod at this with a sharp stick.
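For reference, the "documented and approved API" in the quoted changelog is
the seqcount latch (per $SUBJECT). The patch itself is not quoted in this
mail, but the latch read-side pattern documented above
raw_write_seqcount_latch() in include/linux/seqlock.h looks roughly like so
(a sketch only, with made-up c2n_* names standing in for the cyc2ns bits; it
is not the patch under discussion):

#include <linux/types.h>
#include <linux/seqlock.h>

/* stand-in for the real struct cyc2ns_data from arch/x86/include/asm/timer.h */
struct c2n_data {
	u32 mul;
	u32 shift;
	u64 offset;
};

struct c2n_latch {
	struct c2n_data	data[2];	/* the two latched copies */
	seqcount_t	seq;
};

/*
 * Latch read side: the low bit of the sequence selects whichever copy is
 * currently stable, so readers never wait for the writer; they only retry
 * if the writer flipped the latch while they were copying.
 */
static void c2n_latch_read(struct c2n_latch *l, struct c2n_data *out)
{
	unsigned int seq, idx;

	do {
		seq = raw_read_seqcount_latch(&l->seq);
		idx = seq & 1;
		*out = l->data[idx];
	} while (read_seqcount_retry(&l->seq, seq));
}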
OK, so apart from the inlining being crap, which is fixed by:

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 6490f618e096..638491062fea 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -57,7 +57,7 @@ struct cyc2ns {
 
 static DEFINE_PER_CPU_ALIGNED(struct cyc2ns, cyc2ns);
 
-void cyc2ns_read_begin(struct cyc2ns_data *data)
+void __always_inline cyc2ns_read_begin(struct cyc2ns_data *data)
 {
 	int seq, idx;
 
@@ -74,7 +74,7 @@ void cyc2ns_read_begin(struct cyc2ns_data *data)
 	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.sequence)));
 }
 
-void cyc2ns_read_end(void)
+void __always_inline cyc2ns_read_end(void)
 {
 	preempt_enable_notrace();
 }
@@ -103,7 +103,7 @@ void cyc2ns_read_end(void)
  * -johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
 
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
 {
 	struct cyc2ns_data data;
 	unsigned long long ns;

That gets us:

native_sched_clock:
	pushq	%rbp	#
	movq	%rsp, %rbp	#,

	... jump label ...

	rdtsc
	salq	$32, %rdx	#, tmp110
	orq	%rax, %rdx	# low, tmp110

	movl	%gs:cyc2ns+32(%rip),%ecx	# cyc2ns.seq.sequence, pfo_ret__
	andl	$1, %ecx	#, idx
	salq	$4, %rcx	#, tmp116
	movl	%gs:cyc2ns(%rcx),%esi	# cyc2ns.data[idx_14].cyc2ns_mul, pfo_ret__
	movl	%esi, %esi	# pfo_ret__, pfo_ret__
	movq	%rsi, %rax	# pfo_ret__, tmp133
	mulq	%rdx	# _23
	movq	%gs:cyc2ns+8(%rcx),%rdi	# cyc2ns.data[idx_14].cyc2ns_offset, pfo_ret__
	addq	$cyc2ns, %rcx	#, tmp117
	movl	%gs:4(%rcx),%ecx	# cyc2ns.data[idx_14].cyc2ns_shift, pfo_ret__
	shrdq	%rdx, %rax	# pfo_ret__,, tmp134
	shrq	%cl, %rdx	# pfo_ret__,
	testb	$64, %cl	#, pfo_ret__
	cmovne	%rdx, %rax	#,, tmp134
	addq	%rdi, %rax	# pfo_ret__,
	popq	%rbp	#
	ret
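Note that in the listing above the sequence is still loaded only once and
the retry loop is gone: because the underlying this_cpu_read() asm() is not
volatile, gcc is free to CSE two reads with identical inputs and conclude
the sequence cannot have changed. A minimal standalone illustration of that
effect (not kernel code, names made up; try gcc -O2 -S):

static unsigned int seq;		/* stand-in for cyc2ns.seq.sequence */

static inline unsigned int read_seq(void)
{
	unsigned int ret;

	/* like the percpu read: no volatile, output depends only on "m" */
	asm("movl %1, %0" : "=r" (ret) : "m" (seq));
	return ret;
}

unsigned int reader(void)
{
	unsigned int s1, s2;

	do {
		s1 = read_seq();
		/* ... copy data out here ... */
		s2 = read_seq();
	} while (s1 != s2);	/* gcc may fold s2 into s1 and drop the loop */

	return s1;
}

Marking the asm volatile removes that licence, which is what the next diff
does.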
If we then fix the percpu mess, with:

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index e9202a0de8f0..1a19d11cfbbd 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -185,22 +185,22 @@ do {									\
 	typeof(var) pfo_ret__;				\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b "__percpu_arg(1)",%0"		\
+		asm volatile(op "b "__percpu_arg(1)",%0"\
 		    : "=q" (pfo_ret__)			\
 		    : "m" (var));			\
 		break;					\
 	case 2:						\
-		asm(op "w "__percpu_arg(1)",%0"		\
+		asm volatile(op "w "__percpu_arg(1)",%0"\
 		    : "=r" (pfo_ret__)			\
 		    : "m" (var));			\
 		break;					\
 	case 4:						\
-		asm(op "l "__percpu_arg(1)",%0"		\
+		asm volatile(op "l "__percpu_arg(1)",%0"\
 		    : "=r" (pfo_ret__)			\
 		    : "m" (var));			\
 		break;					\
 	case 8:						\
-		asm(op "q "__percpu_arg(1)",%0"		\
+		asm volatile(op "q "__percpu_arg(1)",%0"\
 		    : "=r" (pfo_ret__)			\
 		    : "m" (var));			\
 		break;					\

That turns into:

native_sched_clock:
	pushq	%rbp	#
	movq	%rsp, %rbp	#,

	... jump label ...

	rdtsc
	salq	$32, %rdx	#, tmp110
	orq	%rax, %rdx	# low, tmp110

	movq	%rdx, %r10	# tmp110, _23
	movq	$cyc2ns, %r9	#, tmp136
.L235:
	movl	%gs:cyc2ns+32(%rip),%eax	# cyc2ns.seq.sequence, pfo_ret__
	movl	%eax, %ecx	# pfo_ret__, idx
	andl	$1, %ecx	#, idx
	salq	$4, %rcx	#, tmp116
	addq	%r9, %rcx	# tmp136, tmp117
	movq	%gs:8(%rcx),%rdi	# cyc2ns.data[idx_14].cyc2ns_offset, pfo_ret__
	movl	%gs:(%rcx),%esi	# cyc2ns.data[idx_14].cyc2ns_mul, pfo_ret__
	movl	%gs:4(%rcx),%ecx	# cyc2ns.data[idx_14].cyc2ns_shift, pfo_ret__
	movl	%gs:cyc2ns+32(%rip),%r8d	# cyc2ns.seq.sequence, pfo_ret__
	cmpl	%r8d, %eax	# pfo_ret__, pfo_ret__
	jne	.L235	#,
	movl	%esi, %esi	# pfo_ret__, pfo_ret__
	movq	%rsi, %rax	# pfo_ret__, tmp133
	mulq	%r10	# _23
	shrdq	%rdx, %rax	# pfo_ret__,, tmp134
	shrq	%cl, %rdx	# pfo_ret__,
	testb	$64, %cl	#, pfo_ret__
	cmovne	%rdx, %rax	#,, tmp134
	addq	%rdi, %rax	# pfo_ret__,
	popq	%rbp	#
	ret

which is exactly right. Except perhaps for the mess that mul_u64_u32_shr()
turns into.
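For the record, mul_u64_u32_shr() computes (a * mul) >> shift without losing
the high bits of the 64x32 product. On x86_64 it takes the unsigned __int128
path, and the shrdq/shrq/testb/cmovne dance above looks like gcc open-coding
the variable 128-bit right shift for it. Roughly (a sketch of the two
variants in include/linux/math64.h; the exact kernel code may differ):

#include <linux/types.h>

#ifdef CONFIG_ARCH_SUPPORTS_INT128
/* with a 128-bit type the whole thing is a single multiply and a shift */
static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
{
	return (u64)(((unsigned __int128)a * mul) >> shift);
}
#else
/* generic fallback: split 'a' into halves, multiply each, recombine */
static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
{
	u32 ah = a >> 32, al = a;
	u64 ret;

	ret = ((u64)al * mul) >> shift;
	if (ah)
		ret += ((u64)ah * mul) << (32 - shift);

	return ret;
}
#endif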