References: <1545879866-27809-1-git-send-email-xiexiuqi@huawei.com> <20181227102107.GA21156@linaro.org>
From: Linus Torvalds
Date: Thu, 27 Dec 2018 10:15:17 -0800
Subject: Re: [PATCH] sched: fix infinity loop in update_blocked_averages
To: Vincent Guittot
Cc: Sargun Dhillon, Xie XiuQi, Ingo Molnar, Peter Zijlstra, xiezhipeng1@huawei.com, huawei.libin@huawei.com, linux-kernel, Dmitry Adamushko, Tejun Heo
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 27, 2018 at 9:02 AM Vincent Guittot wrote:
>
> In the original behavior, the cfs_rq was removed from the list only
> when the cgroup was removed.
> patch a9e7f6544b9c (sched/fair: Fix O(nr_cgroups) in load balance
> path) has added an optimization which removes the cfs_rq when there
> was no blocked load to update in order to optimize the loop but it
> has introduced a race condition that creates this infinite loop. The
> patch fixes the problem by removing the optimization.
> I will look at re-adding the optimization once I have a fix for
> the race condition

Hmm. What's the race? We seem to take the rq lock for all the cases,
but maybe I'm missing something?

That commit a9e7f6544b9c is a year and a half old, why did this start
being reported now?

[ goes off and looks ]

Oh. unthrottle_cfs_rq -> enqueue_entity -> list_add_leaf_cfs_rq()
doesn't actually seem to hold the rq lock at all. It's just called
under a rcu read lock.

So it all seems to depend on that "on_list" flag for exclusion. Which
seems fundamentally racy, since it's not protected by a lock.

So yeah, the whole logic seems to depend on "on_list is sticky and
stays set until the whole task group is destroyed". So commit
a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")
would appear to be entirely wrong, because on_list isn't actually
protected by a lock, and that can confuse things.

But that still makes me go "how come this is only noticed 18 months
after the fact"? So I'm probably still missing something.

Tejun? PeterZ? Tell me why I'm being dense.

               Linus
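
To make the suspected failure mode concrete, here is a minimal user-space
sketch of a check-then-act race on an unlocked "on_list" style flag. It is
not the kernel code and makes no claim about where the real bug is: the
names toy_cfs_rq, toy_list_add, toy_bounded_walk and toy_simulate are all
hypothetical, the list is a stripped-down imitation of a circular
doubly-linked list, and the "two CPUs" are just two hand-interleaved steps
in one thread. It only shows that if two callers both observe the flag as
clear before either one sets it, the same node gets added twice, the node
ends up pointing at itself, and a walk of the list never gets back to the
head.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy circular doubly-linked list, modelled loosely on the usual idiom. */
struct list_head {
    struct list_head *next, *prev;
};

static void toy_list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert 'new' right after 'head'. No locking, no sanity checks. */
static void toy_list_add(struct list_head *new, struct list_head *head)
{
    struct list_head *next = head->next;

    next->prev = new;
    new->next = next;
    new->prev = head;
    head->next = new;
}

/* Hypothetical stand-in for a runqueue with an "am I on the list" flag. */
struct toy_cfs_rq {
    struct list_head node;
    bool on_list;
};

/*
 * Walk the list, giving up after 'limit' entries.
 * Returns the number of entries, or -1 if the walk did not get back to
 * the head within the limit (i.e. the list is corrupted into a cycle).
 */
static int toy_bounded_walk(const struct list_head *head, int limit)
{
    int steps = 0;

    assert(limit > 0);
    for (const struct list_head *p = head->next; p != head; p = p->next) {
        if (++steps > limit)
            return -1;
    }
    return steps;
}

/*
 * Two callers, A and B, each do "if (!on_list) { add; on_list = true; }".
 * serialized == true:  B checks the flag only after A has finished,
 *                      as it would if both ran under the same lock.
 * serialized == false: B checks the flag before A has set it.
 */
static int toy_simulate(bool serialized)
{
    struct list_head leaf_list;
    struct toy_cfs_rq rq = { .on_list = false };
    bool a_sees_off, b_sees_off = false;

    toy_list_init(&leaf_list);
    toy_list_init(&rq.node);

    a_sees_off = !rq.on_list;
    if (!serialized)
        b_sees_off = !rq.on_list;   /* B races with A's check */

    if (a_sees_off) {
        toy_list_add(&rq.node, &leaf_list);
        rq.on_list = true;
    }

    if (serialized)
        b_sees_off = !rq.on_list;   /* B sees A's update */

    if (b_sees_off) {
        toy_list_add(&rq.node, &leaf_list);  /* second add of the same node */
        rq.on_list = true;
    }

    return toy_bounded_walk(&leaf_list, 16);
}
```

Calling toy_simulate(true) from a small main() walks a one-entry list,
while toy_simulate(false) hits the walk limit because the doubly-added
node's next pointer refers to itself. Whether the real code can reach that
interleaving is exactly the open question above.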