Received: by 2002:a25:824b:0:0:0:0:0 with SMTP id d11csp462743ybn; Wed, 25 Sep 2019 02:41:16 -0700 (PDT) X-Google-Smtp-Source: APXvYqy79FtFpRXWisbL1NYUREFzHvglXka1lWAckIvgIwZEujDen5vBXBXFKaHL1+fviEHlFhWO X-Received: by 2002:a2e:7e05:: with SMTP id z5mr3571350ljc.8.1569404476327; Wed, 25 Sep 2019 02:41:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1569404476; cv=none; d=google.com; s=arc-20160816; b=nSVzNxuqKMX6vZz5KW6TM+V4i1hvYfIeXDPPi3406fW+TqFwxLoaRo2AUhBAD8lxl+ cOyWznLM9Uv1qHHjibMN7IYMk/0mHgtMXiMufTtI6hY/MbNCGk5DYYgdqbKFXi7FAFnQ UIcVTGE/ojQFkPywhhfMTMTaVlwwM3yyIsd09w8aAYcHnXys7iCY0Pm6NJ32EoZN/ZEM 6stgxr7ndjDbXx8bMZ+ngD8UEqh9d6FM+hckep6+gf2/JYoamZvYvOZM6MyjV+0RwFoy 7Bz38fjk7i5S8LDwWNSeH3n/JjBpRvFiIFYI9vhNhkPTe3T6Coequt8Ji03Gn/HuiFvx mtnA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=1wmLHWhdWEkcWYQfJJdjt2kpx2xUUhXVzrZUl6Mqcvk=; b=g7/qKewMNqefzOjEKaiJ5ltP5bedRimWMI7uZ0m/WRdUBtC0ROJM3/y9c3n4+FQpBz kBAK0h5NXrgW7pMwhkWAMzsXJU9w+b3riU+7wl2K4Tv7sO5aYOAh7ElVvH0kDNuK2Hvu mH95qkyc7dL1IBZ8XLVzbxQmPTh2kDX5Q0bAxTX056+Bo68hzpH8fXVgZNRrEroitfvK 7KvQc1yAw3Ref4Lm9z/zHkF+GkSEotFy6RCKtwtldF6DQXmXsbZYAUVK5M6Vx1CBAhUf 6BXtf1zaaVDK8NtobA61EnNWiQ1uFf030KMS/fOeLVMV574t1qCRMIb9NJqZ32NJbkfu r6PA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=oKV8sdMm; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 4si2529937ejc.382.2019.09.25.02.40.53; Wed, 25 Sep 2019 02:41:16 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=oKV8sdMm; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2406269AbfIWI7S (ORCPT + 99 others); Mon, 23 Sep 2019 04:59:18 -0400 Received: from mail-qt1-f193.google.com ([209.85.160.193]:45578 "EHLO mail-qt1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405374AbfIWI7R (ORCPT ); Mon, 23 Sep 2019 04:59:17 -0400 Received: by mail-qt1-f193.google.com with SMTP id c21so16146402qtj.12 for ; Mon, 23 Sep 2019 01:59:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=1wmLHWhdWEkcWYQfJJdjt2kpx2xUUhXVzrZUl6Mqcvk=; b=oKV8sdMm08lK+EjLtR8yNeyI4IyvkP6OrjmovokbPP9TprfyF7pcE5CmP/ErfTmFQJ CSrgM4dwbkaN51hRrAlQsXR7WvYcY0OmWk31Y9ZDsDcPjEJc2qkkvm7QcBHoAQ4Btqjm B09QkYxU6VbJWH2woeZYkSSCm7crCvOvxX/Hbt595TSzYlGQxePSCLE4hundQFWA96Jj 1BwOXUVL6PWAjK5Mao/1NRlmLGUXKEnsibfB2qIkSXnbhB0kFX2c7tz2vX38e1qIcdxB CPSal/fgJ+0UOGkPD1UQP/fcKoZca2ByrGz/i32hGznd27/80bY+weHhbp1+/aW+ZvgY oCBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=1wmLHWhdWEkcWYQfJJdjt2kpx2xUUhXVzrZUl6Mqcvk=; b=LLIQvqq8khjalA9HI3i30jK1qKqE3w/C8ljIXySPsXSYmlrzbkutTa4oL2fOSBC2uy SBX3CZ2k47wc/flmrmUp+7xpRNe3MWYTf0+8cRQxuzT6ZcnoylV/xBvir6vl7lWP7932 HhJkti2zNvmmiZ67Ud6BoZCtgApFtyf3AFIgmsCD7jqW/jEfhvxJCejyC+WpPC028Zps k2wiL2k5yYMOAi2s5EzzIiAF6qwx1THSjkV6bXtAgPszJL9wpYe4LUdjf9p+J6DqiHFC zvD1K8R2tdlffLdtxKGz5ZiVc8MM+OC2rrz0/r7v8Ms+r9l9JG2iqS5qfCKo7437m5ai JlRA== X-Gm-Message-State: APjAAAW/LZjx9Up2Z6aT2kSBFnKdHK31KVp7GN6yOTjUnFDtLeT4et2R uhsIAeGgjrNLmzG4s3og0AEf/1IcM6ok7i3lDz5weg== X-Received: by 2002:a05:6214:801:: with SMTP id df1mr16880596qvb.54.1569229156108; Mon, 23 Sep 2019 01:59:16 -0700 (PDT) MIME-Version: 1.0 References: <20190920155420.rxiflqdrpzinncpy@willie-the-truck> <20190923043113.GA1080@tardis> <20190923085409.GB1080@tardis> In-Reply-To: <20190923085409.GB1080@tardis> From: Dmitry Vyukov Date: Mon, 23 Sep 2019 10:59:04 +0200 Message-ID: Subject: Re: Kernel Concurrency Sanitizer (KCSAN) To: Boqun Feng Cc: Will Deacon , Marco Elver , kasan-dev , LKML , Andrey Konovalov , Alexander Potapenko , "Paul E. McKenney" , Paul Turner , Daniel Axtens , Anatol Pomazau , Andrea Parri , Alan Stern , LKMM Maintainers -- Akira Yokosawa , Nicholas Piggin , Daniel Lustig , Jade Alglave , Luc Maranget Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Sep 23, 2019 at 10:54 AM Boqun Feng wrote: > > On Mon, Sep 23, 2019 at 10:21:38AM +0200, Dmitry Vyukov wrote: > > On Mon, Sep 23, 2019 at 6:31 AM Boqun Feng wrote: > > > > > > On Fri, Sep 20, 2019 at 04:54:21PM +0100, Will Deacon wrote: > > > > Hi Marco, > > > > > > > > On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote: > > > > > We would like to share a new data-race detector for the Linux kernel: > > > > > Kernel Concurrency Sanitizer (KCSAN) -- > > > > > https://github.com/google/ktsan/wiki/KCSAN (Details: > > > > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst) > > > > > > > > > > To those of you who we mentioned at LPC that we're working on a > > > > > watchpoint-based KTSAN inspired by DataCollider [1], this is it (we > > > > > renamed it to KCSAN to avoid confusion with KTSAN). > > > > > [1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf > > > > > > > > Oh, spiffy! > > > > > > > > > In the coming weeks we're planning to: > > > > > * Set up a syzkaller instance. > > > > > * Share the dashboard so that you can see the races that are found. > > > > > * Attempt to send fixes for some races upstream (if you find that the > > > > > kcsan-with-fixes branch contains an important fix, please feel free to > > > > > point it out and we'll prioritize that). > > > > > > > > Curious: do you take into account things like alignment and/or access size > > > > when looking at READ_ONCE/WRITE_ONCE? Perhaps you could initially prune > > > > naturally aligned accesses for which __native_word() is true? > > > > > > > > > There are a few open questions: > > > > > * The big one: most of the reported races are due to unmarked > > > > > accesses; prioritization or pruning of races to focus initial efforts > > > > > to fix races might be required. Comments on how best to proceed are > > > > > welcome. We're aware that these are issues that have recently received > > > > > attention in the context of the LKMM > > > > > (https://lwn.net/Articles/793253/). > > > > > > > > This one is tricky. What I think we need to avoid is an onslaught of > > > > patches adding READ_ONCE/WRITE_ONCE without a concrete analysis of the > > > > code being modified. My worry is that Joe Developer is eager to get their > > > > first patch into the kernel, so runs this tool and starts spamming > > > > maintainers with these things to the point that they start ignoring KCSAN > > > > reports altogether because of the time they take up. > > > > > > > > I suppose one thing we could do is to require each new READ_ONCE/WRITE_ONCE > > > > to have a comment describing the racy access, a bit like we do for memory > > > > barriers. Another possibility would be to use atomic_t more widely if > > > > there is genuine concurrency involved. > > > > > > > > > > Instead of commenting READ_ONCE/WRITE_ONCE()s, how about adding > > > anotations for data fields/variables that might be accessed without > > > holding a lock? Because if all accesses to a variable are protected by > > > proper locks, we mostly don't need to worry about data races caused by > > > not using READ_ONCE/WRITE_ONCE(). Bad things happen when we write to a > > > variable using locks but read it outside a lock critical section for > > > better performance, for example, rcu_node::qsmask. I'm thinking so maybe > > > we can introduce a new annotation similar to __rcu, maybe call it > > > __lockfree ;-) as follow: > > > > > > struct rcu_node { > > > ... > > > unsigned long __lockfree qsmask; > > > ... > > > } > > > > > > , and __lockfree indicates that by design the maintainer of this data > > > structure or variable believe there will be accesses outside lock > > > critical sections. Note that not all accesses to __lockfree field, need > > > to be READ_ONCE/WRITE_ONCE(), if the developer manages to build a > > > complex but working wake/wait state machine so that it could not be > > > accessed in the same time, READ_ONCE()/WRITE_ONCE() is not needed. > > > > > > If we have such an annotation, I think it won't be hard for configuring > > > KCSAN to only examine accesses to variables with this annotation. Also > > > this annotation could help other checkers in the future. > > > > > > If KCSAN (at the least the upstream version) only check accesses with > > > such an anotation, "spamming with KCSAN warnings/fixes" will be the > > > choice of each maintainer ;-) > > > > > > Thoughts? > > > > But doesn't this defeat the main goal of any race detector -- finding > > concurrent accesses to complex data structures, e.g. forgotten > > spinlock around rbtree manipulation? Since rbtree is not meant to > > concurrent accesses, it won't have __lockfree annotation, and thus we > > will ignore races on it... > > Maybe, but for forgotten locks detection, we already have lockdep and > also sparse can help a little. They don't do this at all, or to the necessary degree. > Having a __lockfree annotation could be > benefical for KCSAN to focus on checking the accesses whose race > conditions could only be detected by KCSAN at this time. I think this > could help KCSAN find problem more easily (and fast). > > Out of curiosity, does KCSAN ever find a problem with forgotten locks > involved? I didn't see any in the -with-fixes branch (that's > understandable, given the seriousness, the fixes of this kind of > problems could already be submitted to upstream once KCSAN found it.) This one comes to mind: https://www.spinics.net/lists/linux-mm/msg92677.html Maybe some others here, but I don't remember which ones now: https://github.com/google/ktsan/wiki/KTSAN-Found-Bugs