References: <20201118180030.22764-1-valentin.schneider@arm.com>
 <20201118180030.22764-3-valentin.schneider@arm.com>
User-agent: mu4e 0.9.17; emacs 26.3
From: Valentin Schneider
To: James Morse
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Fenghua Yu,
    Reinette Chatre, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    "H. Peter Anvin"
Subject: Re: [PATCH 2/2] x86/intel_rdt: Plug task_work vs task_struct {rmid,closid} update race
Date: Fri, 20 Nov 2020 15:54:53 +0000

Hi James,

On 20/11/20 14:53, James Morse wrote:
> Hi Valentin,
>
> On 18/11/2020 18:00, Valentin Schneider wrote:
>> Upon moving a task to a new control / monitor group, said task's {closid,
>> rmid} fields are updated *after* triggering the move_myself() task_work
>> callback. This can cause said callback to miss the update, e.g. if the
>> triggering thread got preempted before fiddling with task_struct, or if
>> the targeted task was already on its way to return to userspace.
>
> So, if move_myself() runs after task_work_add() but before tsk is written
> to. Sounds fun!
>
>> Update the task_struct's {closid, rmid} tuple *before* invoking
>> task_work_add(). As they can happen concurrently, wrap {closid, rmid}
>> accesses with READ_ONCE() and WRITE_ONCE(). Highlight the required
>> ordering with a pair of comments.
>
> ... and this one is if move_myself() or __resctrl_sched_in() runs while tsk
> is being written to on another CPU. It might get torn values, or multiple
> reads might get different values.
>
> The READ_ONCE/WRITE_ONCEry would have been easier to read as a separate
> patch, as you touch all sites and move/change some of them.
>

True, I initially only fixed up the reads/writes involved with
__rdtgroup_move_task(), but ended up coccinelle'ing the whole lot - which I
should have then moved to a dedicated patch. Thanks for powering through it,
I'll send a v2 with a neater split.

> Regardless:
> Reviewed-by: James Morse
>

Thanks!

> I don't 'get' memory-ordering, so one curiosity below:
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index b6b5b95df833..135a51529f70 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -524,11 +524,13 @@ static void move_myself(struct callback_head *head)
>>  	 * If resource group was deleted before this task work callback
>>  	 * was invoked, then assign the task to root group and free the
>>  	 * resource group.
>> +	 *
>> +	 * See pairing atomic_inc() in __rdtgroup_move_task()
>>  	 */
>>  	if (atomic_dec_and_test(&rdtgrp->waitcount) &&
>>  	    (rdtgrp->flags & RDT_DELETED)) {
>> -		current->closid = 0;
>> -		current->rmid = 0;
>> +		WRITE_ONCE(current->closid, 0);
>> +		WRITE_ONCE(current->rmid, 0);
>>  		kfree(rdtgrp);
>>  	}
>>
>> @@ -553,14 +555,32 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
>
>>  	/*
>>  	 * Take a refcount, so rdtgrp cannot be freed before the
>>  	 * callback has been invoked.
>> +	 *
>> +	 * Also ensures above {closid, rmid} writes are observed by
>> +	 * move_myself(), as it can run immediately after task_work_add().
>> +	 * Otherwise old values may be loaded, and the move will only actually
>> +	 * happen at the next context switch.
>
> But __resctrl_sched_in() can still occur at any time and READ_ONCE() a pair
> of values that don't go together?

Yes, the thought did cross my mind...
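To spell out what I had in mind, the window is something like this (hand-wavy
interleaving sketch, not actual code):

  /*
   * __rdtgroup_move_task()                  __resctrl_sched_in()
   *
   * WRITE_ONCE(tsk->closid, new_closid);
   *                                         closid = READ_ONCE(tsk->closid);
   *                                         rmid   = READ_ONCE(tsk->rmid);
   * WRITE_ONCE(tsk->rmid, new_rmid);
   *
   * The reader ends up with the new closid but the old rmid, i.e. a
   * {closid, rmid} pair that was never assigned to the task as a whole.
   */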
> I don't think this is a problem for RDT as with old-rmid the task was a
> member of that monitor-group previously, and 'freed' rmids are kept in
> limbo for a while after.
> (old-closid is the same as the task having not schedule()d since the
> change, which is fine).
>
> For MPAM, this is more annoying, as changing just the closid may put the
> task in a monitoring group that never existed, meaning it's surprise-dirty
> later.
>
> If this all makes sense, I guess the fix (for much later) is to union
> closid/rmid, and WRITE_ONCE() them together where necessary.
> (I've made a note for when I next pass that part of the MPAM tree)
>

It does make sense to me - I've put a sketch of what I'd picture that union
looking like at the bottom of this mail. One more question back to you: can
RDT exist on an X86_32 system? It shouldn't be a stopper, but it would be an
inconvenience.

FWIW kernel/sched/fair.c uses two synced u64's for this; see

  struct cfs_rq { .min_vruntime, .min_vruntime_copy }

and

  kernel/sched/fair.c:update_min_vruntime()
  kernel/sched/fair.c:migrate_task_rq_fair()

(that pattern is also sketched at the bottom of this mail)

>
>>  	 *
>>  	 * Pairs with atomic_dec() in move_myself().
>>  	 */
>>  	atomic_inc(&rdtgrp->waitcount);
>> +
>>  	ret = task_work_add(tsk, &callback->work, TWA_RESUME);
>>  	if (ret) {
>>  		/*
>
>
> Thanks!
>
> James
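Re the closid/rmid union - a pure sketch of what I'd picture, with invented
field and helper names, so don't read too much into those:

  /* Hypothetical: a single u64 in task_struct replacing {closid, rmid} */
  static inline void task_set_closid_rmid(struct task_struct *tsk,
  					  u32 closid, u32 rmid)
  {
  	/* One store; readers can never see a mismatched pair */
  	WRITE_ONCE(tsk->closid_rmid, ((u64)closid << 32) | rmid);
  }

  static inline u32 task_closid(struct task_struct *tsk)
  {
  	return READ_ONCE(tsk->closid_rmid) >> 32;
  }

  static inline u32 task_rmid(struct task_struct *tsk)
  {
  	return (u32)READ_ONCE(tsk->closid_rmid);
  }

... modulo the X86_32 question above: on 32-bit, a 64-bit {READ,WRITE}_ONCE()
is done as two accesses, so this would want something like the two-copy
scheme below anyway.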
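And the fair.c pattern I was referring to looks roughly like this
(paraphrased from kernel/sched/fair.c, trimmed; may not match the tree
you're looking at):

  /* Writer, update_min_vruntime(): */
  	cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
  #ifndef CONFIG_64BIT
  	smp_wmb();
  	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
  #endif

  /* Reader, migrate_task_rq_fair(): */
  #ifndef CONFIG_64BIT
  	u64 min_vruntime_copy;

  	do {
  		min_vruntime_copy = cfs_rq->min_vruntime_copy;
  		smp_rmb();
  		min_vruntime = cfs_rq->min_vruntime;
  	} while (min_vruntime != min_vruntime_copy);
  #else
  	min_vruntime = cfs_rq->min_vruntime;
  #endif

The writer publishes the copy after an smp_wmb(), and the reader retries
until both reads agree, which gives a tear-free 64-bit value on 32-bit.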