From: Benjamin Segall
To: Mathias Krause
Cc: Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli,
    Michal Koutný, Dietmar Eggemann, Steven Rostedt, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider,
    linux-kernel@vger.kernel.org, Odin Ugedal, Kevin Tanguy, Brad Spengler
Subject: Re: [PATCH] sched/fair: Prevent dead task groups from regaining cfs_rq's
References: <20211103190613.3595047-1-minipli@grsecurity.net>
Date: Thu, 04 Nov 2021 13:46:33 -0700
In-Reply-To: (Mathias Krause's message of "Thu, 4 Nov 2021 16:13:17 +0100")

Mathias Krause writes:

> On 04.11.21 09:50, Vincent Guittot wrote:
>> On Wed, 3 Nov 2021 at 23:04, Benjamin Segall wrote:
>>>
>>> Mathias Krause writes:
>>>
>>>> Kevin is reporting crashes which point to a use-after-free of a cfs_rq
>>>> in update_blocked_averages(). Initial debugging revealed that we've live
>>>> cfs_rq's (on_list=1) in an about to be kfree()'d task group in
>>>> free_fair_sched_group(). However, it was unclear how that can happen.
>>>> [...]
>>>> Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
>>>> Cc: Odin Ugedal
>>>> Cc: Michal Koutný
>>>> Reported-by: Kevin Tanguy
>>>> Suggested-by: Brad Spengler
>>>> Signed-off-by: Mathias Krause
>>>> ---
>>>>  kernel/sched/core.c | 18 +++++++++++++++---
>>>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 978460f891a1..60125a6c9d1b 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -9506,13 +9506,25 @@ void sched_offline_group(struct task_group *tg)
>>>>  {
>>>>          unsigned long flags;
>>>>
>>>> -        /* End participation in shares distribution: */
>>>> -        unregister_fair_sched_group(tg);
>>>> -
>>>> +        /*
>>>> +         * Unlink first, to avoid walk_tg_tree_from() from finding us (via
>>>> +         * sched_cfs_period_timer()).
>>>> +         */
>>>>          spin_lock_irqsave(&task_group_lock, flags);
>>>>          list_del_rcu(&tg->list);
>>>>          list_del_rcu(&tg->siblings);
>>>>          spin_unlock_irqrestore(&task_group_lock, flags);
>>>> +
>>>> +        /*
>>>> +         * Wait for all pending users of this task group to leave their RCU
>>>> +         * critical section to ensure no new user will see our dying task
>>>> +         * group any more. Specifically ensure that tg_unthrottle_up() won't
>>>> +         * add decayed cfs_rq's to it.
>>>> +         */
>>>> +        synchronize_rcu();
>>>
>>> I was going to suggest that we could just clear all of avg.load_sum/etc, but
>>> that breaks the speculative on_list read. Currently the final avg update
>>> just races, but that's not good enough if we wanted to rely on it to
>>> prevent UAF. synchronize_rcu() doesn't look so bad if the alternative is
>>> taking every rqlock anyways.
>>>
>>> I do wonder if we can move the relevant part of
>>> unregister_fair_sched_group into sched_free_group_rcu. After all
>>> for_each_leaf_cfs_rq_safe is not _rcu and update_blocked_averages does
>>> in fact hold the rqlock (though print_cfs_stats thinks it is _rcu and
>>> should be updated).
>>
>> I was wondering the same thing.
>> we would have to move unregister_fair_sched_group() completely in
>> sched_free_group_rcu() and probably in cpu_cgroup_css_free() too.
>
> Well, the point is, print_cfs_stats() pretty much relies on the list to
> be stable, i.e. safe to traverse. It doesn't take locks while walking it
> (beside the RCU lock), so if we would modify it concurrently, bad things
> would happen.

Yeah, my idea was that print_cfs_stats is just debug info so it wouldn't
be a disaster to hold the lock, but I forgot that we probably can't hold
it over all that writing.
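
As an aside, the ordering the patch relies on is the usual RCU
"unpublish, wait, then tear down" sequence. Below is a minimal,
self-contained sketch of that pattern, purely for illustration; it is
not the actual kernel/sched code, and my_group, my_groups, my_reader()
and my_offline_group() are made-up names:

/* Illustrative only; not kernel/sched code. */
#include <linux/list.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/slab.h>

struct my_group {
        struct list_head list;          /* linked on an RCU-traversed list */
};

static LIST_HEAD(my_groups);
static DEFINE_SPINLOCK(my_groups_lock);

/* Readers walk the list under rcu_read_lock(), without my_groups_lock. */
static void my_reader(void)
{
        struct my_group *grp;

        rcu_read_lock();
        list_for_each_entry_rcu(grp, &my_groups, list) {
                /* may still see a group that is concurrently being offlined */
        }
        rcu_read_unlock();
}

static void my_offline_group(struct my_group *grp)
{
        unsigned long flags;

        /* 1) Unlink, so new readers can no longer find the group. */
        spin_lock_irqsave(&my_groups_lock, flags);
        list_del_rcu(&grp->list);
        spin_unlock_irqrestore(&my_groups_lock, flags);

        /*
         * 2) Wait for readers that may already be inside their RCU
         *    critical section and still hold a pointer to grp.
         */
        synchronize_rcu();

        /*
         * 3) Only now is it safe to tear down or free state that those
         *    readers could have touched.
         */
        kfree(grp);
}

Without the wait in step 2, the kfree() could run while a reader is
still walking the old list, which is the same use-after-free shape as
the cfs_rq case above. The visible hunk applies that ordering to
sched_offline_group(): unlink under task_group_lock first, then
synchronize_rcu() before the rest of the teardown runs.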