Received: by 2002:a25:e7d8:0:0:0:0:0 with SMTP id e207csp3996306ybh; Tue, 17 Mar 2020 10:15:36 -0700 (PDT) X-Google-Smtp-Source: ADFU+vvUtUdaaedXM1MU0b6VL4/LhBFQ3CtNAN0pX02wArBFgw/3uIL3aAbYiVXX45Ry5OHqLzQL X-Received: by 2002:a9d:247:: with SMTP id 65mr234442otb.364.1584465335865; Tue, 17 Mar 2020 10:15:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584465335; cv=none; d=google.com; s=arc-20160816; b=Ta3fRHnWGIs8EpgXD1A1nA998sastV+nnrVng2VITc4KDAuOF1qtDpUwg1nlK5jNwl 5tCnxSHSePW3npZuX1tloKVa4zaIT6lSxIu8XtDoDHvviYmKrDawiJAlGp+2RCLIuwes Lb4ysWHHgg5uE1sf/XOhhpHOX8ANmAupypKfm+IHKnpar40TmG4lYTjUGPdfVAAtHxs+ GCH2HP5D4ULykSFigWrxd87/mQe2h79deQ0car7Pq+T+2Eu71ZvoLN5PkYX0gGSlnsKe 7psrAE8t86ADqTYd9aQiLarPBgSQiCsSGw2krr1QfkQqsqOQWTBTxZeI7IWDiR4azNSg PgmQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=W9B5D8RNzLHSs5ex7KSZazRjBGKa0YOcmwTm3BEQz9M=; b=N9n2VgTlYc3+w+VtdBOU2UaSJca/RS/8AeXOAJfq7Xso47xeOIVqZ846JspJ2zbAZA F5LNl1ditAW4PW3bVd6WHtVewraCoEs8Ytuk/4W+nKHB5xFpaPCNTm3pWIBjcVdlUXeq fjBLK8W6D2fNEXwUCj6PJXuY1L4AcyLEnp6gXgkjoiisdyAkSvHxgHt/U0UcCIj9RukR ncuDAOn5+G24Ut6AlQEGoFn3gJjcCcBaPTxdB88krilQCtAqJxWsQdDkX1PiOubdEJ5X MJNxo9AbBG8fU3wBEyi/K4AVi+ZvNr+UBwxVH3fj15oGtKHOygREiWMeKf4/MWZDi44Y IElg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=mR6aXKPi; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 103si2196363otv.23.2020.03.17.10.15.19; Tue, 17 Mar 2020 10:15:35 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=mR6aXKPi; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726783AbgCQRNW (ORCPT + 99 others); Tue, 17 Mar 2020 13:13:22 -0400 Received: from mail.kernel.org ([198.145.29.99]:58646 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726407AbgCQRNV (ORCPT ); Tue, 17 Mar 2020 13:13:21 -0400 Received: from paulmck-ThinkPad-P72.home (50-39-105-78.bvtn.or.frontiernet.net [50.39.105.78]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 57CFC20738; Tue, 17 Mar 2020 17:13:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1584465200; bh=BqCVv2OsKZz+I4IGlw/e0EIks1KktHt7HapHjxJ5TfM=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=mR6aXKPiuaEP/zjSjTImPnNGr7rETk3A8xHAPHt8BJ6L58JGejapK+uKrSswNtweU NvGWK+r0zzOFtnb1mhZJPCLP8naq2JcDT0wQX57uE8HGpuZpvwMWkbLeSEDVNG0hQR MQP68hXV5LAbq5Jw8L/vE+0U6SrxPcp7J8wJsWhk= Received: by paulmck-ThinkPad-P72.home (Postfix, from userid 1000) id 30E1C35226E2; Tue, 17 Mar 2020 10:13:20 -0700 (PDT) Date: Tue, 17 Mar 2020 10:13:20 -0700 From: "Paul E. McKenney" To: Marco Elver Cc: LKML , kasan-dev , kernel-team@fb.com, Ingo Molnar , Andrey Konovalov , Alexander Potapenko , Dmitry Vyukov , Qian Cai , Boqun Feng Subject: Re: [PATCH kcsan 27/32] kcsan: Add option to allow watcher interruptions Message-ID: <20200317171320.GI3199@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org References: <20200309190359.GA5822@paulmck-ThinkPad-P72> <20200309190420.6100-27-paulmck@kernel.org> <20200312180328.GA4772@paulmck-ThinkPad-P72> <20200312180414.GA8024@paulmck-ThinkPad-P72> <20200316154535.GX3199@paulmck-ThinkPad-P72> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 16, 2020 at 05:22:34PM +0100, Marco Elver wrote: > On Mon, 16 Mar 2020 at 16:45, Paul E. McKenney wrote: > > > > On Mon, Mar 16, 2020 at 02:56:38PM +0100, Marco Elver wrote: > > > On Fri, 13 Mar 2020 at 16:28, Marco Elver wrote: > > > > > > > > On Thu, 12 Mar 2020 at 19:04, Paul E. McKenney wrote: > > > > > > > > > > On Thu, Mar 12, 2020 at 11:03:28AM -0700, Paul E. McKenney wrote: > > > > > > On Mon, Mar 09, 2020 at 12:04:15PM -0700, paulmck@kernel.org wrote: > > > > > > > From: Marco Elver > > > > > > > > > > > > > > Add option to allow interrupts while a watchpoint is set up. This can be > > > > > > > enabled either via CONFIG_KCSAN_INTERRUPT_WATCHER or via the boot > > > > > > > parameter 'kcsan.interrupt_watcher=1'. > > > > > > > > > > > > > > Note that, currently not all safe per-CPU access primitives and patterns > > > > > > > are accounted for, which could result in false positives. For example, > > > > > > > asm-generic/percpu.h uses plain operations, which by default are > > > > > > > instrumented. On interrupts and subsequent accesses to the same > > > > > > > variable, KCSAN would currently report a data race with this option. > > > > > > > > > > > > > > Therefore, this option should currently remain disabled by default, but > > > > > > > may be enabled for specific test scenarios. > > > > > > > > > > > > > > To avoid new warnings, changes all uses of smp_processor_id() to use the > > > > > > > raw version (as already done in kcsan_found_watchpoint()). The exact SMP > > > > > > > processor id is for informational purposes in the report, and > > > > > > > correctness is not affected. > > > > > > > > > > > > > > Signed-off-by: Marco Elver > > > > > > > Signed-off-by: Paul E. McKenney > > > > > > > > > > > > And I get silent hangs that bisect to this patch when running the > > > > > > following rcutorture command, run in the kernel source tree on a > > > > > > 12-hardware-thread laptop: > > > > > > > > > > > > bash tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --duration 10 --kconfig "CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y" --configs TREE03 > > > > > > > > > > > > It works fine on some (but not all) of the other rcutorture test > > > > > > scenarios. It fails on TREE01, TREE02, TREE03, TREE09. The common thread > > > > > > is that these are the TREE scenarios are all PREEMPT=y. So are RUDE01, > > > > > > SRCU-P, TASKS01, and TASKS03, but these scenarios are not hammering > > > > > > on Tree RCU, and thus have far less interrupt activity and the like. > > > > > > Given that it is an interrupt-related feature being added by this commit, > > > > > > this seems like expected (mis)behavior. > > > > > > > > > > > > Can you reproduce this? If not, are there any diagnostics I can add to > > > > > > my testing? Or a diagnostic patch I could apply? > > > > > > > > I think I can reproduce it. Let me debug some more, so far I haven't > > > > found anything yet. > > > > > > > > What I do know is that it's related to reporting. Turning kcsan_report > > > > into a noop makes the test run to completion. > > > > > > > > > I should hasten to add that this feature was quite helpful in recent work! > > > > > > > > Good to know. :-) We can probably keep this patch, since the default > > > > config doesn't turn this on. But I will try to see what's up with the > > > > hangs, and hopefully find a fix. > > > > > > So this one turned out to be quite interesting. We can get deadlocks > > > if we can set up multiple watchpoints per task in case it's > > > interrupted and the interrupt sets up another watchpoint, and there > > > are many concurrent races happening; because the other_info struct in > > > report.c may never be released if an interrupt blocks the consumer due > > > to waiting for other_info to become released. > > > > Been there, done that! ;-) > > > > > Give me another day or 2 to come up with a decent fix. > > > > My thought is to send a pull request for the commits up to but not > > including this patch, allowing ample development and testing time for > > the fix. My concern with sending this, even with a fix, is that any > > further bugs might cast a shadow on the whole series, further slowing > > acceptance into mainline. > > > > Fair enough? > > That's fine. I think the features changes can stay on -rcu/kcsan-dev > for now, but the documentation updates don't depend on them. > If it'd be useful, the updated documentation could be moved before > this patch to -rcu/kcsan, so we'd have > > kcsan: Add current->state to implicitly atomic accesses > kcsan: Add option for verbose reporting > kcsan: Add option to allow watcher interruptions > -- cut -- > kcsan: Update API documentation in kcsan-checks.h > kcsan: Update Documentation/dev-tools/kcsan.rst > kcsan: Fix a typo in a comment > .. rest of series .. > > Although I'm fine with either. Given my churn with a recent merge window, I am more reluctant than I might otherwise be to do that sort of rearrangement. Sorry to be so cowardly! Thanx, Paul