Received: by 2002:a25:e7d8:0:0:0:0:0 with SMTP id e207csp94416ybh; Fri, 6 Mar 2020 16:49:00 -0800 (PST) X-Google-Smtp-Source: ADFU+vujR7z0TVJXB79fDqEljiQZH4qKhXnQl19Jft+EPciGZeGLHFqiIi2nx0ppnUFyFHjHr7wZ X-Received: by 2002:a05:6830:18ee:: with SMTP id d14mr4508882otf.298.1583542139981; Fri, 06 Mar 2020 16:48:59 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1583542139; cv=none; d=google.com; s=arc-20160816; b=Qfv+mt5K7bEsouRtGQZdwx7dFaWlmRkqfNzZ69qc5sBKcXrzZUV1rzSSa9txA1Jaa0 auPKugKCoEad1GHOd5g3+iiaHt1cDOui8YWnnzzFIt8kM0OS0+Js3wBRIkzLXpqFGYQE XWtXy6tF0u1qpBVHT3BqkbcG9KiIjHyXDU7BZcDlBA9HMfCb0vOxNg2/ScIbqNSe7myU sQggLxekxesmDeUg+B4Td18p4C7QZZaADfJHwb0+UkZywjpeL/NT7v0r5Yl+cbcp41m9 /KEPhvW84lZSz0MGyyr8ok+r9v1WUdi9Me7SJ7/wgE4d5YuVoWjXqZPYRN/sCmeT6O2z mhNA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:date:references :in-reply-to:subject:cc:to:from; bh=z6APkzK6CpJ/0qT0mowM0ODgtYzCrA3iEd0Og05TY9g=; b=ngTRxkhBXFCVeuVHPVbq+1ahxAHVj4oom1aKxK2Vjh5IieEF48kIhC/1aOs+t90fiM QFH0orAGxx9O1DqzDbWXMdxe9STloMRqoyoqMviFcjTJR8igYlfsXRkcsstz9BtUFKYF GX3DHgv+GpsLpnXJJjh0aJPZpkGr2dGNEhUIj5XnJ89bJHVYDgh2JNTvRpux8uejXqum 3YiUDyZp0WWpPtM2g5PFC0OsOH24hLUc9Msf171EUIgdk0RxXKP680Jibpi009fbHQJx BvkaIWhpmzH4Evb8jR1BygBaUGt5cAOnRatC3p9xznKlbRY5QVjvyrmRS3oBmjpa5NPW dj1w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t23si510776oij.139.2020.03.06.16.48.36; Fri, 06 Mar 2020 16:48:59 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726314AbgCGApj (ORCPT + 99 others); Fri, 6 Mar 2020 19:45:39 -0500 Received: from Galois.linutronix.de ([193.142.43.55]:54755 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726231AbgCGApj (ORCPT ); Fri, 6 Mar 2020 19:45:39 -0500 Received: from p5de0bf0b.dip0.t-ipconnect.de ([93.224.191.11] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1jANav-000056-47; Sat, 07 Mar 2020 01:45:33 +0100 Received: by nanos.tec.linutronix.de (Postfix, from userid 1000) id 1C2B2104088; Sat, 7 Mar 2020 01:45:32 +0100 (CET) From: Thomas Gleixner To: Qian Cai Cc: fweisbec@gmail.com, mingo@kernel.org, elver@google.com, linux-kernel@vger.kernel.org, Peter Zijlstra Subject: Re: [PATCH] tick/sched: fix data races at tick_do_timer_cpu In-Reply-To: <1583506974.7365.160.camel@lca.pw> References: <87tv34laqq.fsf@nanos.tec.linutronix.de> <1C65422C-FFA4-4651-893B-300FAF9C49DE@lca.pw> <1583506974.7365.160.camel@lca.pw> Date: Sat, 07 Mar 2020 01:45:32 +0100 Message-ID: <878skdat77.fsf@nanos.tec.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Qian Cai writes: > On Wed, 2020-03-04 at 06:20 -0500, Qian Cai wrote: >> Looks at tick_sched_do_timer(), for example, >> >> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) { >> #ifdef CONFIG_NO_HZ_FULL >> WARN_ON(tick_nohz_full_running); >> #endif >> tick_do_timer_cpu = cpu; >> } >> #endif >> >> /* Check, if the jiffies need an update */ >> if (tick_do_timer_cpu == cpu) >> tick_do_update_jiffies64(now); >> >> Could we rule out all compilers and archs will not optimize it if !CONFIG_NO_HZ_FULL to, >> >> if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE) || tick_do_timer_cpu == cpu) >> tick_do_update_jiffies64(now); >> >> So it could save a branch or/and realized that tick_do_timer_cpu is >> not used later in this cpu, so it could discard the store? Sorry, I can't make any sense of this. If a compiler optimizes this in the way you are describing then it is seriously broken. If that happens then lots of code will just fall apart in colourful ways. >> I am not all that familiar with all other 14 places if it is possible >> to happen concurrently as well, so I took a pragmatic approach that >> for now only deal with the places that KCSAN confirmed, and then look >> forward for an incremental approach if there are more places needs >> treatments later once confirmed. Stop this please. I just wasted days to understand why code was sprinkled with random rcu*enter/exit() invocations until I figured out that the reason was exactly what you are proposing: Curing the symptoms That's the worst engineering principle ever, really. Just shutting up the tool without doing a proper analysis is a guaranteed recipe for disaster because it is going to paper over real or latent bugs. Read the git log history carefully to find prime examples for this. So no, neither shutting up just the places KCSAN complained about and ignoring the rest nor your new proposal > If you don't think that will be happening and dislike using READ/WRITE_ONCE() > for documentation purpose, we could annotate those using the data_race() macro. > Is that more acceptable? is leading anywhere. It's both based on cargo cult programming and lacks proper root cause analysis. I'm surely not ignoring the output of KCSAN, but please understand that any attempt to silence its output purely based on mechanical duct tape is not going anywhere. I wasted actually months in the past to decode a subtle wreckage which was caused by the back then 'silence uninitilized variable warnings' frenzy. One of these blindly applied annotations papered over a real bug and made it horribly hard to find. As I'm busy with other urgent nightmares right now, this has to wait until I get some spare cycles or you come up with some proper analysis. Thanks, tglx