Date: Thu, 27 Dec 2018 17:15:24 -0800
From: Tejun Heo
To: Linus Torvalds
Cc: Vincent Guittot, Sargun Dhillon, Xie XiuQi, Ingo Molnar, Peter Zijlstra, xiezhipeng1@huawei.com, huawei.libin@huawei.com, linux-kernel, Dmitry Adamushko, Rik van Riel
Subject: Re: [PATCH] sched: fix infinity loop in update_blocked_averages
Message-ID: <20181228011524.GF2509588@devbig004.ftw2.facebook.com>
References: <1545879866-27809-1-git-send-email-xiexiuqi@huawei.com> <20181227102107.GA21156@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Happy holidays, everyone.

(cc'ing Rik, who has been looking at the scheduler code a lot lately)

On Thu, Dec 27, 2018 at 10:15:17AM -0800, Linus Torvalds wrote:
> [ goes off and looks ]
>
> Oh. unthrottle_cfs_rq -> enqueue_entity -> list_add_leaf_cfs_rq()
> doesn't actually seem to hold the rq lock at all. It's just called
> under a rcu read lock.

I'm pretty sure enqueue_entity() *has* to be called with rq lock.
unthrottle_cfs_rq() is called from tg_set_cfs_bandwidth(),
distribute_cfs_runtime() and unthrottle_offline_cfs_rqs().  The first two
grab the rq_lock just around the calls and the last one has a lockdep
assert on the rq_lock.  What am I missing?

> So it all seems to depend on that "on_list" flag for exclusion. Which
> seems fundamentally racy, since it's not protected by a lock.

The only place on_list is accessed without holding rq_lock is
unregister_fair_sched_group().  It's a minor optimization on a
relatively cold path (group destruction), so if it's racy there, I
think we can take out that optimization.  I'd be surprised if anyone
notices that.

That said, I don't think it's broken.  A false positive on on_list is
fine, and I can't see how a false negative would happen given that the
only event which can set it is the sched entity getting scheduled and
there's no way the removal path can race against that transition.

> But that still makes me go "how come is this only noticed 18 months
> after the fact"?

Unless I'm totally confused, which is definitely possible, I don't
think there's a race condition and the only bug is the
tmp_alone_branch pointer being left dangling, which maybe doesn't
happen all that much?

Thanks.

--
tejun
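
The locking argument above can be illustrated with a small user-space model. This is only a sketch of the pattern as described in this message, not kernel code: every `toy_*` name is hypothetical, the "lock" is a plain held flag standing in for rq_lock, and the `assert()` calls stand in for a lockdep assertion. It is single-threaded; a real concurrent version would need a real lock and a properly annotated unlocked read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's types. */
struct toy_rq {
    bool lock_held;   /* models "rq_lock is held" */
    int  nr_leaf;     /* how many cfs_rqs are on the leaf list */
};

struct toy_cfs_rq {
    struct toy_rq *rq;
    bool on_list;     /* only ever changed with the lock held */
};

static void toy_rq_lock(struct toy_rq *rq)
{
    assert(!rq->lock_held);   /* no recursive locking in this model */
    rq->lock_held = true;
}

static void toy_rq_unlock(struct toy_rq *rq)
{
    assert(rq->lock_held);
    rq->lock_held = false;
}

/* Enqueue side: callers must already hold the lock. */
static void toy_list_add_leaf(struct toy_cfs_rq *cfs_rq)
{
    assert(cfs_rq->rq->lock_held);   /* stand-in for a lockdep assert */
    if (!cfs_rq->on_list) {
        cfs_rq->on_list = true;
        cfs_rq->rq->nr_leaf++;
    }
}

/*
 * Removal side with the unlocked fast path: peek at on_list first and
 * skip taking the lock when the entry was never added.
 * Returns true if the lock had to be taken.
 */
static bool toy_unregister(struct toy_cfs_rq *cfs_rq)
{
    if (!cfs_rq->on_list)            /* unlocked check */
        return false;

    toy_rq_lock(cfs_rq->rq);
    if (cfs_rq->on_list) {           /* re-check under the lock */
        cfs_rq->on_list = false;
        cfs_rq->rq->nr_leaf--;
    }
    toy_rq_unlock(cfs_rq->rq);
    return true;
}
```

In this model a stale "true" from the unlocked check only costs an extra lock round trip, because the flag is re-checked under the lock. A stale "false" would skip the removal, which is why the argument rests on nothing being able to set the flag while the group is being destroyed.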