Date: Sat, 30 Nov 2019 16:08:53 +0100
From: Christian Brauner
To: Will Deacon
Cc: Rasmus Villemoes, linux-kernel@vger.kernel.org, bsingharora@gmail.com,
    dvyukov@google.com, elver@google.com, parri.andrea@gmail.com,
    stable@vger.kernel.org,
    syzbot+c5d03165a1bd1dead0c1@syzkaller.appspotmail.com,
    syzkaller-bugs@googlegroups.com
Subject: Re: [PATCH v6] taskstats: fix data-race
Message-ID: <20191130150851.r6lgwwatu42ad6i4@wittgenstein>
References: <20191009114809.8643-1-christian.brauner@ubuntu.com>
    <20191021113327.22365-1-christian.brauner@ubuntu.com>
    <20191021130417.5yi7pxpigsydz5po@wittgenstein>
    <20191129175604.GA29789@willie-the-truck>
In-Reply-To: <20191129175604.GA29789@willie-the-truck>
X-Mailing-List:
linux-kernel@vger.kernel.org

On Fri, Nov 29, 2019 at 05:56:05PM +0000, Will Deacon wrote:
> On Mon, Oct 21, 2019 at 03:04:18PM +0200, Christian Brauner wrote:
> > On Mon, Oct 21, 2019 at 02:19:01PM +0200, Rasmus Villemoes wrote:
> > > On 21/10/2019 13.33, Christian Brauner wrote:
> > > > The first approach used smp_load_acquire() and smp_store_release().
> > > > However, after having discussed this it seems that the data dependency
> > > > for kmem_cache_alloc() would be fixed by WRITE_ONCE().
> > > > Furthermore, the smp_load_acquire() would only manage to order the stats
> > > > check before the thread_group_empty() check. So it seems just using
> > > > READ_ONCE() and WRITE_ONCE() will do the job and I wanted to bring this
> > > > up for discussion at least.
> > > >
> > > > /* v6 */
> > > > - Christian Brauner :
> > > >   - bring up READ_ONCE()/WRITE_ONCE() approach for discussion
> > > > ---
> > > >  kernel/taskstats.c | 26 +++++++++++++++-----------
> > > >  1 file changed, 15 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/kernel/taskstats.c b/kernel/taskstats.c
> > > > index 13a0f2e6ebc2..111bb4139aa2 100644
> > > > --- a/kernel/taskstats.c
> > > > +++ b/kernel/taskstats.c
> > > > @@ -554,25 +554,29 @@ static int taskstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
> > > >  static struct taskstats *taskstats_tgid_alloc(struct task_struct *tsk)
> > > >  {
> > > >  	struct signal_struct *sig = tsk->signal;
> > > > -	struct taskstats *stats;
> > > > +	struct taskstats *stats_new, *stats;
> > > >
> > > > -	if (sig->stats || thread_group_empty(tsk))
> > > > -		goto ret;
> > > > +	/* Pairs with WRITE_ONCE() below. */
> > > > +	stats = READ_ONCE(sig->stats);
> > > > +	if (stats || thread_group_empty(tsk))
> > > > +		return stats;
> > > >
> > > >  	/* No problem if kmem_cache_zalloc() fails */
> > > > -	stats = kmem_cache_zalloc(taskstats_cache, GFP_KERNEL);
> > > > +	stats_new = kmem_cache_zalloc(taskstats_cache, GFP_KERNEL);
> > > >
> > > >  	spin_lock_irq(&tsk->sighand->siglock);
> > > > -	if (!sig->stats) {
> > > > -		sig->stats = stats;
> > > > -		stats = NULL;
> > > > +	if (!stats) {
> > > > +		stats = stats_new;
> > > > +		/* Pairs with READ_ONCE() above. */
> > > > +		WRITE_ONCE(sig->stats, stats_new);
> > > > +		stats_new = NULL;
> > >
> > > No idea about the memory ordering issues, but don't you need to
> > > load/check sig->stats again? Otherwise it seems that two threads might
> > > both see !sig->stats, both allocate a stats_new, and both
> > > unconditionally in turn assign their stats_new to sig->stats. Then the
> > > first assignment ends up becoming a memory leak (and any writes through
> > > that pointer done by the caller end up in /dev/null...)
> >
> > Trigger hand too fast. I guess you're thinking sm like:
> >
> > diff --git a/kernel/taskstats.c b/kernel/taskstats.c
> > index 13a0f2e6ebc2..c4e1ed11e785 100644
> > --- a/kernel/taskstats.c
> > +++ b/kernel/taskstats.c
> > @@ -554,25 +554,27 @@ static int taskstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
> >  static struct taskstats *taskstats_tgid_alloc(struct task_struct *tsk)
> >  {
> >  	struct signal_struct *sig = tsk->signal;
> > -	struct taskstats *stats;
> > +	struct taskstats *stats_new, *stats;
> >
> > -	if (sig->stats || thread_group_empty(tsk))
> > -		goto ret;
> > +	stats = READ_ONCE(sig->stats);
>
> This probably wants to be an acquire, since both the memcpy() later on
> in taskstats_exit() and the accesses in {b,x}acct_add_tsk() appear to
> read from the taskstats structure without the sighand->siglock held and
> therefore may miss zeroed allocation from the zalloc() below, I think.
>
> > +	if (stats || thread_group_empty(tsk))
> > +		return stats;
> >
> > -	/* No problem if kmem_cache_zalloc() fails */
> > -	stats = kmem_cache_zalloc(taskstats_cache, GFP_KERNEL);
> > +	stats_new = kmem_cache_zalloc(taskstats_cache, GFP_KERNEL);
> >
> >  	spin_lock_irq(&tsk->sighand->siglock);
> > -	if (!sig->stats) {
> > -		sig->stats = stats;
> > -		stats = NULL;
> > +	stats = READ_ONCE(sig->stats);
>
> You hold the spinlock here, so I don't think you need the READ_ONCE().
>
> > +	if (!stats) {
> > +		stats = stats_new;
> > +		WRITE_ONCE(sig->stats, stats_new);
>
> You probably want a release here to publish the zeroes from the zalloc()
> (back to my first comment). With those changes:
>
> Reviewed-by: Will Deacon

Thanks, this is basically what we had in v5. I'll rework and send this
after the merge window closes.

>
> However, this caused me to look at do_group_exit() and we appear to have
> racy accesses on sig->flags there thanks to signal_group_exit(). I worry
> that might run quite deep, and can probably be looked at separately.

Yeah, we should look into this but separate from this patch.

Thanks for taking a look at this! Much appreciated!
Christian
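
For reference, the acquire/release pairing suggested in the review
comments can be sketched in userspace with C11 atomics. This is only an
illustration, not the kernel patch: struct stats, struct sig, sig_init()
and stats_get_or_alloc() are hypothetical stand-ins, a pthread mutex
plays the role of sighand->siglock, calloc() stands in for
kmem_cache_zalloc(), and the thread_group_empty() check is left out.
The acquire load and release store here only roughly correspond to
smp_load_acquire() and smp_store_release().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct stats {
	long counters[8];
};

struct sig {
	pthread_mutex_t lock;		/* plays the role of sighand->siglock */
	_Atomic(struct stats *) stats;	/* plays the role of sig->stats */
};

static void sig_init(struct sig *sig)
{
	int rc = pthread_mutex_init(&sig->lock, NULL);

	assert(rc == 0);
	(void)rc;
	atomic_init(&sig->stats, NULL);
}

static struct stats *stats_get_or_alloc(struct sig *sig)
{
	struct stats *stats_new, *stats;

	/*
	 * Lockless fast path. The acquire pairs with the release store
	 * below, so a non-NULL pointer implies the zeroed contents of
	 * the allocation are visible to this thread as well.
	 */
	stats = atomic_load_explicit(&sig->stats, memory_order_acquire);
	if (stats)
		return stats;

	/* No problem if the allocation fails; NULL is simply returned. */
	stats_new = calloc(1, sizeof(*stats_new));

	pthread_mutex_lock(&sig->lock);
	/*
	 * Re-check under the lock so only one allocation is installed.
	 * A relaxed load suffices because all stores happen with the
	 * lock held.
	 */
	stats = atomic_load_explicit(&sig->stats, memory_order_relaxed);
	if (!stats) {
		stats = stats_new;
		/* Publish the zeroed structure to lockless readers. */
		atomic_store_explicit(&sig->stats, stats_new,
				      memory_order_release);
		stats_new = NULL;
	}
	pthread_mutex_unlock(&sig->lock);

	/* Drop our allocation if another thread installed one first. */
	free(stats_new);
	return stats;
}
```

The re-check under the lock is what avoids the double assignment and
leak raised earlier in the thread; the release store is what lets
readers that do not take the lock see the zeroes.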