Date: Thu, 10 Mar 2016 13:54:17 +0100
From: Peter Zijlstra
To: Niklas Cassel
Cc: tj@kernel.org, "linux-kernel@vger.kernel.org"
Subject: Re: [BUG] sched: leaf_cfs_rq_list use after free
Message-ID: <20160310125417.GW6344@twins.programming.kicks-ass.net>
References: <56D9664D.8080503@axis.com>
In-Reply-To: <56D9664D.8080503@axis.com>

On Fri, Mar 04, 2016 at 11:41:17AM +0100, Niklas Cassel wrote:
> A snippet of the trace_printks I've added when analyzing the problem.
> The prints show that a certain cfs_rq gets readded after it has been removed,
> and that update_blocked_averages uses the cfs_rq which has already been freed:
>
> systemd-1 [000] 22.664453: bprint: alloc_fair_sched_group: allocated cfs_rq 0x8efb0780 tg 0x8efb1800 tg->css.id 0
> systemd-1 [000] 22.664479: bprint: alloc_fair_sched_group: allocated cfs_rq 0x8efb1680 tg 0x8efb1800 tg->css.id 0
> systemd-1 [000] 22.664481: bprint: cpu_cgroup_css_alloc: tg 0x8efb1800 tg->css.id 0
> systemd-1 [000] 22.664547: bprint: cpu_cgroup_css_online: tg 0x8efb1800 tg->css.id 80
> systemd-874 [001] 27.389000: bprint: list_add_leaf_cfs_rq: cfs_rq 0x8efb1680 cpu 1 on_list 0x0
> migrate_cert-820 [001] 27.421337: bprint: update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x1
> kworker/0:1-24 [000] 27.421356: bprint: cpu_cgroup_css_offline: tg 0x8efb1800 tg->css.id 80

So we take the cgroup offline

> kworker/0:1-24 [000] 27.421445: bprint: list_del_leaf_cfs_rq: cfs_rq 0x8efb1680 cpu 1 on_list 0x1

Remove our cfs_rq from the list

> migrate_cert-820 [001] 27.421506: bprint: list_add_leaf_cfs_rq: cfs_rq 0x8efb1680 cpu 1 on_list 0x0

And stuff it back on again -> *FAIL*

> system-status-815 [001] 27.491358: bprint: update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x1
> kworker/0:1-24 [000] 27.501561: bprint: cpu_cgroup_css_free: tg 0x8efb1800 tg->css.id 80
> migrate_cert-820 [001] 27.511337: bprint: update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x1
> ksoftirqd/0-3 [000] 27.521830: bprint: free_fair_sched_group: freeing cfs_rq 0x8efb0780 tg 0x8efb1800 tg->css.id 80
> ksoftirqd/0-3 [000] 27.521857: bprint: free_fair_sched_group: freeing cfs_rq 0x8efb1680 tg 0x8efb1800 tg->css.id 80
> logger-1252 [001] 27.531355: bprint: update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x6b6b6b6b
>
>
> I've reproduced this on v4.4, but I've also managed to reproduce the bug
> after cherry-picking the following patches
> (all but one were marked for v4.4 stable):
>
> 6fe1f34 sched/cgroup: Fix cgroup entity load tracking tear-down
> d6e022f workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
> 041bd12 Revert "workqueue: make sure delayed work run in local cpu"
> 8bb5ef7 cgroup: make sure a parent css isn't freed before its children
> aa226ff cgroup: make sure a parent css isn't offlined before its children
> e93ad19 cpuset: make mm migration asynchronous

Hmm, that is most unfortunate indeed. Can you describe a reliable
reproducer?
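For reference, the thing that then trips over the freed cfs_rq is
update_blocked_averages(); roughly like this (a simplified sketch
paraphrased from the fair.c of that era, not the verbatim source; the
throttling check and PELT details are elided):

/*
 * Simplified sketch of update_blocked_averages(); paraphrased, not the
 * exact v4.4 kernel/sched/fair.c.
 */
static void update_blocked_averages(int cpu)
{
        struct rq *rq = cpu_rq(cpu);
        struct cfs_rq *cfs_rq;
        unsigned long flags;

        raw_spin_lock_irqsave(&rq->lock, flags);
        update_rq_clock(rq);

        /*
         * Walks every cfs_rq still linked on rq->leaf_cfs_rq_list; if a
         * freed cfs_rq is still (or again) on that list, this walk
         * dereferences freed memory.
         */
        for_each_leaf_cfs_rq(rq, cfs_rq) {
                if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq))
                        update_tg_load_avg(cfs_rq, 0);
        }

        raw_spin_unlock_irqrestore(&rq->lock, flags);
}

The on_list value of 0x6b6b6b6b in the last update_blocked_averages line
of the trace is the slab poison pattern, so by that point the cfs_rq
memory had already been freed when the walk read it.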
So we only call list_add_leaf_cfs_rq() through enqueue_task_fair(), which means someone is still running inside that cgroup.

TJ, we only call offline when the cgroup is empty, don't we?
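Roughly the path I mean (again a simplified sketch, not the verbatim
v4.4 source; enqueue_entity(), for_each_sched_entity() and cfs_rq_of()
are the usual fair.c helpers, everything else is elided):

/*
 * Simplified sketch: enqueue_task_fair() walks the task's group
 * hierarchy bottom-up, and enqueue_entity() puts a cfs_rq (back) on
 * rq->leaf_cfs_rq_list the moment it gets its first runnable entity.
 */
static void enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
                           int flags)
{
        /* ... vruntime/load accounting elided ... */
        account_entity_enqueue(cfs_rq, se);
        se->on_rq = 1;

        /*
         * First runnable entity on this cfs_rq: link it on the leaf
         * list. If css_offline already did list_del_leaf_cfs_rq(),
         * on_list is 0 again and we happily re-link a group that is
         * about to be freed.
         */
        if (cfs_rq->nr_running == 1)
                list_add_leaf_cfs_rq(cfs_rq);
}

static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
        struct sched_entity *se = &p->se;
        struct cfs_rq *cfs_rq;

        /* Walk bottom-up through the task's group hierarchy. */
        for_each_sched_entity(se) {
                if (se->on_rq)
                        break;
                cfs_rq = cfs_rq_of(se);
                enqueue_entity(cfs_rq, se, flags);
        }
        /* ... */
}

Note that list_add_leaf_cfs_rq() only re-links when !cfs_rq->on_list,
which is exactly the state list_del_leaf_cfs_rq() leaves behind, so any
enqueue into the group after the offline puts it straight back on the
list.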