Received: by 2002:a25:e74b:0:0:0:0:0 with SMTP id e72csp3151056ybh; Sat, 25 Jul 2020 12:43:25 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzehWXOvQ/iedhNgopbRQZbcs1+mGwGm/Y4e8U9KYgkqDI3nXsdbkdLW7RR2VpMDzTSlweb X-Received: by 2002:aa7:db4f:: with SMTP id n15mr14215255edt.193.1595706205381; Sat, 25 Jul 2020 12:43:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1595706205; cv=none; d=google.com; s=arc-20160816; b=S8eFd/68KIAxVP4zIWFUCJgeDuaFxc/zDQRdambuFHAaKwuUCO6OdkINZnDFhtb9S6 HRrOQyRY/RyT7RpOKRPj3F+VyXsCFFaFQnJ90p0SQRU648GKTXo/gmRCELme4YPjTxEs nfpe4pvjYd7/U1nlQgFB8OXwtTXMK6IssjsVQ5rqBsORvq4XMMVUj3bJrTnQH9EStA05 +omtX5l/ZTVIZLL9IzXZkpdcl+3f84Ny0LMIWFvGmEJDNwDdYT73XAUJMp79/K1m8j4q goahOeSHLbM+XVXc4l0rmWlNAq+H7SgexDRWt3fs6VJvBIbRD/kvQ0z36LCvOaSbHhFT aCvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=4Ffcjjap3DxsaA3A/5TApp2xdOnvXOUywtudTVEVogM=; b=DMbnWoPvbEMTtUnpbNbkN2ml9H4GyIUj8g13B2ar28RXn0trkLcfr/tumxOeiL+u3A JUl6mVxfhb5bBQCef9tY7G98xvpFqmXbV26BOw7Mtw4w633IeA1JPPAJpGxD8xDd04// 9sIAqBWwcnGI15wMK2KvkX/LxRMJhNoSXwpNP7OdZ5RNxr6wPnPelaWJe/20eIs7UA57 Irlj+Ac9cfZjSafjmu2z5ldX5Qz7Pa1Rrl7TCtlULdZFJMciEXyNzm8QBQqkAUkDJsjJ tyKKzwMGwWu+aoaGIDsUKpMoidzlEQlTUGLOlXcxrW8ZcEotadmzDnORHZg8fCdTCG/A 0bfw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Wx8JwBnf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id z13si2877506ejc.295.2020.07.25.12.42.44; Sat, 25 Jul 2020 12:43:25 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=Wx8JwBnf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727034AbgGYTjK (ORCPT + 99 others); Sat, 25 Jul 2020 15:39:10 -0400 Received: from mail.kernel.org ([198.145.29.99]:41400 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726447AbgGYTjK (ORCPT ); Sat, 25 Jul 2020 15:39:10 -0400 Received: from paulmck-ThinkPad-P72.home (50-39-111-31.bvtn.or.frontiernet.net [50.39.111.31]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 25451206D8; Sat, 25 Jul 2020 19:39:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1595705949; bh=HExWakA+miNWIPWnX/iqiAyGRgx5GHyT9HO5mLqYU+Q=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=Wx8JwBnfEZu6V2RbKgm3EMhAXAJwA2/Qm7sPAQHwLAV6IHtYZeTab621VK43bPgaT 5gOuwTDlbgK5/fdkjML+MOMEyoS+F/DxkfaZ2cE3I49BrIhgcphdv81z1EG05SUuEZ 5GiPtwZN+tfnSLx7wO3dRCVynRZCKzCXH7JyVrQk= Received: by paulmck-ThinkPad-P72.home (Postfix, from userid 1000) id 0C9343522767; Sat, 25 Jul 2020 12:39:09 -0700 (PDT) Date: Sat, 25 Jul 2020 12:39:09 -0700 From: "Paul E. McKenney" To: Peter Zijlstra Cc: Marco Elver , Andrey Konovalov , Alexander Potapenko , Dmitry Vyukov , kasan-dev , LKML Subject: Re: [PATCH] kcsan: Add option to allow watcher interruptions Message-ID: <20200725193909.GB9247@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org References: <20200220141551.166537-1-elver@google.com> <20200220185855.GY2935@paulmck-ThinkPad-P72> <20200220213317.GA35033@google.com> <20200725145623.GZ9247@paulmck-ThinkPad-P72> <20200725174430.GH10769@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200725174430.GH10769@hirez.programming.kicks-ass.net> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jul 25, 2020 at 07:44:30PM +0200, Peter Zijlstra wrote: > On Sat, Jul 25, 2020 at 05:17:43PM +0200, Marco Elver wrote: > > On Sat, 25 Jul 2020 at 16:56, Paul E. McKenney wrote: > > > On Thu, Feb 20, 2020 at 10:33:17PM +0100, Marco Elver wrote: > > > > On Thu, 20 Feb 2020, Paul E. McKenney wrote: > > > > > On Thu, Feb 20, 2020 at 03:15:51PM +0100, Marco Elver wrote: > > > > > > Add option to allow interrupts while a watchpoint is set up. This can be > > > > > > enabled either via CONFIG_KCSAN_INTERRUPT_WATCHER or via the boot > > > > > > parameter 'kcsan.interrupt_watcher=1'. > > [...] > > > > > > As an example, the first data race that this found: > > > > > > > > > > > > write to 0xffff88806b3324b8 of 4 bytes by interrupt on cpu 0: > > > > > > rcu_preempt_read_enter kernel/rcu/tree_plugin.h:353 [inline] > > > > > > __rcu_read_lock+0x3c/0x50 kernel/rcu/tree_plugin.h:373 > > [...] > > > > > > read to 0xffff88806b3324b8 of 4 bytes by task 6131 on cpu 0: | > > > > > > rcu_preempt_read_enter kernel/rcu/tree_plugin.h:353 [inline] ----+ > > [...] > > > > > > > > > > > > The writer is doing 'current->rcu_read_lock_nesting++'. The read is as > > > > > > vulnerable to compiler optimizations and would therefore conclude this > > > > > > is a valid data race. > > > > > > > > > > Heh! That one is a fun one! It is on a very hot fastpath. READ_ONCE() > > > > > and WRITE_ONCE() are likely to be measurable at the system level. > > > > > > > > > > Thoughts on other options? > > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h > > > > index c6ea81cd41890..e0595abd50c0f 100644 > > > > --- a/kernel/rcu/tree_plugin.h > > > > +++ b/kernel/rcu/tree_plugin.h > > > > @@ -350,17 +350,17 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp) > > > > > > > > static void rcu_preempt_read_enter(void) > > > > { > > > > - current->rcu_read_lock_nesting++; > > > > + local_inc(¤t->rcu_read_lock_nesting); > > > > } > > > > > > > > static void rcu_preempt_read_exit(void) > > > > { > > > > - current->rcu_read_lock_nesting--; > > > > + local_dec(¤t->rcu_read_lock_nesting); > > > > } > > > > > > > > static void rcu_preempt_depth_set(int val) > > > > { > > > > - current->rcu_read_lock_nesting = val; > > > > + local_set(¤t->rcu_read_lock_nesting, val); > > > > > I agree that this removes the data races, and that the code for x86 is > > > quite nice, but aren't rcu_read_lock() and rcu_read_unlock() going to > > > have heavyweight atomic operations on many CPUs? > > > > > > Maybe I am stuck with arch-specific code in rcu_read_lock() and > > > rcu_preempt_read_exit(). I suppose worse things could happen. > > > > Peter also mentioned to me that while local_t on x86 generates > > reasonable code, on other architectures it's terrible. So I think > > something else is needed, and feel free to discard the above idea. > > With sufficient enough reasoning, how bad would a 'data_race(..)' be? > > Right, so local_t it atrocious on many architectures, they fall back to > atomic_t. > > Even architectures that have optimized variants (eg. Power), they're > quite a lot more expensive than what we actually need here. > > Only architectures like x86 that have single instruction memops can > generate anywhere near the code that we'd want here. > > So the thing is, since RCU count is 0 per context (an IRQ must have an > equal amount of rcu_read_unlock() as it has rcu_read_lock()), interrupts > are not in fact a problem, even on load-store (RISC) architectures > (preempt_count has the same thing). True enough! > So the addition/subtraction in rcu_preempt_read_{enter,exit}() doesn't > need to be atomic vs interrupts. The only thing we really do need is > them being single-copy-atomic. > > The problem with READ/WRITE_ONCE is that if we were to use it, we'd end > up with a load-store, even on x86, which is sub-optimal. Agreed. > I suppose the 'correct' code here would be something like: > > *((volatile int *)¤t->rcu_read_lock_nesting)++; > > then the compiler can either do a single memop (x86 and the like) or a > load-store that is free from tearing. Hah!!! That is the original ACCESS_ONCE(), isn't it? ;-) ACCESS_ONCE(current->rcu_read_lock_nesting)++; But open-coding makes sense unless a lot of other places need something similar. Besides, open-coding allows me to defer bikeshedding on the name, given that there are actually two accesses. :-/ Ah, but that gets compiler warnings: kernel/rcu/tree_plugin.h:354:52: error: lvalue required as increment operand *((volatile int *)¤t->rcu_read_lock_nesting)++; Let's try the old ACCESS_ONCE(). Dialing back to v3.0: #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) So: (*(volatile int *)&(current->rcu_read_lock_nesting))++; This gets me the following for __rcu_read_lock(): 00000000000000e0 <__rcu_read_lock>: e0: 48 8b 14 25 00 00 00 mov 0x0,%rdx e7: 00 e8: 8b 82 e0 02 00 00 mov 0x2e0(%rdx),%eax ee: 83 c0 01 add $0x1,%eax f1: 89 82 e0 02 00 00 mov %eax,0x2e0(%rdx) f7: c3 retq f8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) ff: 00 One might hope for a dec instruction, but this isn't bad. We do lose a few instructions compared to the C-language case due to differences in address calculation: 00000000000000e0 <__rcu_read_lock>: e0: 48 8b 04 25 00 00 00 mov 0x0,%rax e7: 00 e8: 83 80 e0 02 00 00 01 addl $0x1,0x2e0(%rax) ef: c3 retq For the relevant portion of __rcu_read_unlock(), this gets us the following: 00000000000027f0 <__rcu_read_unlock>: 27f0: 48 8b 3c 25 00 00 00 mov 0x0,%rdi 27f7: 00 27f8: 53 push %rbx 27f9: 8b 87 e0 02 00 00 mov 0x2e0(%rdi),%eax 27ff: 8d 50 ff lea -0x1(%rax),%edx 2802: 85 c0 test %eax,%eax 2804: 89 97 e0 02 00 00 mov %edx,0x2e0(%rdi) 280a: 75 0a jne 2816 <__rcu_read_unlock+0x26> Here we have a load-subtract-store, but given that we need to test the value, this seems reasonable to me. We again lose a few instructions compared to the C-language case, and again due to address calculation: 00000000000027e0 <__rcu_read_unlock>: 27e0: 53 push %rbx 27e1: 48 8b 3c 25 00 00 00 mov 0x0,%rdi 27e8: 00 27e9: 83 af e0 02 00 00 01 subl $0x1,0x2e0(%rdi) 27f0: 75 0a jne 27fc <__rcu_read_unlock+0x1c> Thoughts? Thanx, Paul