Date: Wed, 18 Nov 2020 14:33:33 +0100
From: Marco Elver
To: Peter Zijlstra
Cc: Will Deacon, Mel Gorman, Davidlohr Bueso,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    paulmck@kernel.org
Subject: Re: [PATCH] sched: Fix data-race in wakeup
Message-ID: <20201118133333.GA1506553@elver.google.com>
In-Reply-To: <20201117092936.GA3121406@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov
17, 2020 at 10:29AM +0100, Peter Zijlstra wrote:
[...]
>
> Now the million dollar question is why KCSAN hasn't run into this. Hrmph.
>
> kernel/sched/Makefile:KCSAN_SANITIZE := n
>
> might have something to do with that, I suppose.

For the record, I tried to reproduce this data race. I found a
read/write race on this bitfield, but not yet that write/write race
(perhaps I wasn't running the right workload).

| read to 0xffff8d4e2ce39aac of 1 bytes by task 5269 on cpu 3:
|  __sched_setscheduler+0x4a9/0x1070 kernel/sched/core.c:5297
|  sched_setattr kernel/sched/core.c:5512 [inline]
|  ...
|
| write to 0xffff8d4e2ce39aac of 1 bytes by task 5268 on cpu 1:
|  __schedule+0x296/0xab0 kernel/sched/core.c:4462 prev->sched_contributes_to_load =
|  schedule+0xd1/0x130 kernel/sched/core.c:4601
|  ...
|
| Full report: https://paste.debian.net/hidden/07a50732/

Getting to the above race also required some effort, as 1) I kept
hitting other unrelated data races in the scheduler and had to silence
those first to be able to make progress, and 2) I had to enable KCSAN
only for scheduler code so that all other data races were ignored.
Then I let syzkaller run for a few minutes.

Also note that our default KCSAN config is suboptimal. For serious
debugging, I'd recommend the same config that rcutorture uses with the
--kcsan flag, specifically:

	CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n,
	CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n

to get the full picture.

However, as a first step, it'd be nice to eventually remove the
KCSAN_SANITIZE := n from kernel/sched/Makefile when things are less
noisy (so that syzbot and default builds can start finding more serious
issues, too).

Thanks,
-- Marco
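
For reference, the two options recommended above could be collected in a
config fragment roughly like the following. This is only a sketch:
CONFIG_KCSAN=y is assumed as the base setting, the fragment file name in
the comment is hypothetical, and the complete set of options that
rcutorture's --kcsan flag applies is not reproduced here. Scheduler code
would also remain uninstrumented for as long as kernel/sched/Makefile
carries KCSAN_SANITIZE := n.

```
# kcsan-debug.config (hypothetical file name, for illustration only)
# Assumed base setting: build the kernel with KCSAN enabled.
CONFIG_KCSAN=y
# Also report races where the observed value did not change.
CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n
# Do not treat plain aligned writes up to word size as atomic.
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n
```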