From: Mathias Krause
To: Michal Koutný, Vincent Guittot, Odin Ugedal
Cc: Kevin Tanguy, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-kernel
References: <20211011172236.11223-1-mkoutny@suse.com> <20211013142643.GA48428@blackbody.suse.cz> <20211102160228.GA57072@blackbody.suse.cz> <73b4bddb-335b-1f25-a203-199be546e44a@grsecurity.net>
Subject: Re: task_group unthrottling and removal race (was Re: [PATCH] sched/fair: Use rq->lock when checking cfs_rq list) presence
Date: Wed, 3 Nov 2021 11:51:12 +0100
In-Reply-To: <73b4bddb-335b-1f25-a203-199be546e44a@grsecurity.net>
X-Mailing-List:
linux-kernel@vger.kernel.org

Heh, sometimes a good night's sleep helps untangle the knot in the head!

On 03.11.21 at 10:51, Mathias Krause wrote:
> [snip]
>
> We tried the below patch which, unfortunately, doesn't fix the issue. So
> there must be something else. :(
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 978460f891a1..afee07e9faf9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9506,13 +9506,17 @@ void sched_offline_group(struct task_group *tg)
>  {
>  	unsigned long flags;
>  
> -	/* End participation in shares distribution: */
> -	unregister_fair_sched_group(tg);
> -
> +	/*
> +	 * Unlink first, to avoid walk_tg_tree_from() from finding us
> +	 * (via sched_cfs_period_timer()).
> +	 */
>  	spin_lock_irqsave(&task_group_lock, flags);
>  	list_del_rcu(&tg->list);
>  	list_del_rcu(&tg->siblings);
>  	spin_unlock_irqrestore(&task_group_lock, flags);
> +
> +	/* End participation in shares distribution: */

Adding synchronize_rcu() here ensures that all concurrent RCU "readers"
have finished what they're doing, so we can unlink safely. That was,
apparently, the missing piece.

> +	unregister_fair_sched_group(tg);
>  }
>  
>  static void sched_change_group(struct task_struct *tsk, int type)
>

Now, synchronize_rcu() is quite a heavy hammer, so using an RCU callback
should be more appropriate. I'll hack something up and post a proper
patch, if you don't beat me to it.

Mathias
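
For illustration only, the "unlink first, defer the rest to an RCU callback" ordering can be modelled in plain userspace C. This is a single-threaded toy under loud assumptions: everything prefixed toy_ is hypothetical, it is neither the kernel's RCU API nor the patch that was eventually posted, and the reader counter merely stands in for a grace period.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct rcu_head (hypothetical, NOT the real API). */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

static int toy_readers;                  /* readers inside a read-side section */
static struct toy_rcu_head *toy_pending; /* callbacks waiting for readers to leave */

void toy_read_lock(void)
{
	toy_readers++;
}

/* Run the queued callbacks, but only once no reader is left. */
void toy_run_callbacks(void)
{
	if (toy_readers > 0)
		return;

	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		/* Fetch the next entry first; the callback owns 'head'. */
		toy_pending = head->next;
		head->func(head);
	}
}

void toy_read_unlock(void)
{
	toy_readers--;
	toy_run_callbacks();
}

/* Queue 'func' for later and return immediately, without waiting. */
void toy_call_rcu(struct toy_rcu_head *head,
		  void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/* Toy task group: only the two states discussed in this thread. */
struct toy_group {
	int linked;     /* still reachable by tree walkers */
	int registered; /* still participates in shares distribution */
	struct toy_rcu_head rcu;
};

/* Deferred half: runs only after every earlier reader has finished. */
void toy_unregister_cb(struct toy_rcu_head *head)
{
	/* Hand-rolled container_of(): step back from the member to the group. */
	struct toy_group *tg = (struct toy_group *)
		((char *)head - offsetof(struct toy_group, rcu));

	tg->registered = 0;
}

void toy_offline_group(struct toy_group *tg)
{
	tg->linked = 0;                            /* unlink first */
	toy_call_rcu(&tg->rcu, toy_unregister_cb); /* defer the unregister step */
}
```

With a reader still active when toy_offline_group() runs, the group is unlinked right away but stays registered until the matching toy_read_unlock(). The caller never blocks, which is the point of preferring a callback over synchronize_rcu().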