Subject: Re: [PATCH 2/2] x86/intel_rdt: Plug task_work vs task_struct {rmid,closid} update race
From: James Morse
To: Valentin Schneider
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, Fenghua Yu, Reinette Chatre,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin"
Date: Fri, 20 Nov 2020 14:53:16 +0000
References: <20201118180030.22764-1-valentin.schneider@arm.com>
    <20201118180030.22764-3-valentin.schneider@arm.com>
In-Reply-To: <20201118180030.22764-3-valentin.schneider@arm.com>

Hi Valentin,

On 18/11/2020 18:00, Valentin Schneider wrote:
> Upon moving a task to a new control / monitor group, said task's {closid,
> rmid} fields are updated *after* triggering the move_myself() task_work
> callback. This can cause said callback to miss the update, e.g. if the
> triggering thread got preempted before fiddling with task_struct, or if the
> targeted task was already on its way to return to userspace.

So, if move_myself() runs after task_work_add() but before tsk is written to.
Sounds fun!
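To spell out the interleaving I'm picturing (illustrative only, the writes are
paraphrased from the patch):

	/*
	 * CPU0: __rdtgroup_move_task()       CPU1: tsk returning to user
	 *
	 * task_work_add(tsk, ...)
	 *                                    move_myself()
	 *                                      reads tsk->{closid, rmid}  <- old values
	 * tsk->closid = <new closid>
	 * tsk->rmid   = <new rmid>
	 *                                    exits to user with stale ids
	 */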
Peter Anvin" References: <20201118180030.22764-1-valentin.schneider@arm.com> <20201118180030.22764-3-valentin.schneider@arm.com> From: James Morse Message-ID: Date: Fri, 20 Nov 2020 14:53:16 +0000 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: <20201118180030.22764-3-valentin.schneider@arm.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-GB Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Valentin, On 18/11/2020 18:00, Valentin Schneider wrote: > Upon moving a task to a new control / monitor group, said task's {closid, > rmid} fields are updated *after* triggering the move_myself() task_work > callback. This can cause said callback to miss the update, e.g. if the > triggering thread got preempted before fiddling with task_struct, or if the > targeted task was already on its way to return to userspace. So, if move_myself() runs after task_work_add() but before tsk is written to. Sounds fun! > Update the task_struct's {closid, rmid} tuple *before* invoking > task_work_add(). As they can happen concurrently, wrap {closid, rmid} > accesses with READ_ONCE() and WRITE_ONCE(). Highlight the required ordering > with a pair of comments. ... and this one is if move_myself() or __resctrl_sched_in() runs while tsk is being written to on another CPU. It might get torn values, or multiple-reads get different values. The READ_ONCE/WRITE_ONCEry would have been easier to read as a separate patch as you touch all sites, and move/change some of them. Regardless: Reviewed-by: James Morse I don't 'get' memory-ordering, so one curiosity below: > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > index b6b5b95df833..135a51529f70 100644 > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > @@ -524,11 +524,13 @@ static void move_myself(struct callback_head *head) > * If resource group was deleted before this task work callback > * was invoked, then assign the task to root group and free the > * resource group. > + * > + * See pairing atomic_inc() in __rdtgroup_move_task() > */ > if (atomic_dec_and_test(&rdtgrp->waitcount) && > (rdtgrp->flags & RDT_DELETED)) { > - current->closid = 0; > - current->rmid = 0; > + WRITE_ONCE(current->closid, 0); > + WRITE_ONCE(current->rmid, 0); > kfree(rdtgrp); > } > > @@ -553,14 +555,32 @@ static int __rdtgroup_move_task(struct task_struct *tsk, > /* > * Take a refcount, so rdtgrp cannot be freed before the > * callback has been invoked. > + * > + * Also ensures above {closid, rmid} writes are observed by > + * move_myself(), as it can run immediately after task_work_add(). > + * Otherwise old values may be loaded, and the move will only actually > + * happen at the next context switch. But __resctrl_sched_in() can still occur at anytime and READ_ONCE() a pair of values that don't go together? I don't think this is a problem for RDT as with old-rmid the task was a member of that monitor-group previously, and 'freed' rmid are kept in limbo for a while after. (old-closid is the same as the task having not schedule()d since the change, which is fine). For MPAM, this is more annoying as changing just the closid may put the task in a monitoring group that never existed, meaning its surprise dirty later. If this all makes sense, I guess the fix (for much later) is to union closid/rmid, and WRITE_ONCE() them together where necessary. 
(I've made a note for when I next pass that part of the MPAM tree)

> +	 *
> +	 * Pairs with atomic_dec() in move_myself().
>  	 */
>  	atomic_inc(&rdtgrp->waitcount);
> +
>  	ret = task_work_add(tsk, &callback->work, TWA_RESUME);
>  	if (ret) {
>  		/*

Thanks!

James