From: Niklas Cassel
To: , , "linux-kernel@vger.kernel.org"
Subject: [BUG] sched: leaf_cfs_rq_list use after free
Message-ID: <56D9664D.8080503@axis.com>
Date: Fri, 4 Mar 2016 11:41:17 +0100

Hello

I've stumbled upon a use-after-free bug related to CONFIG_FAIR_GROUP_SCHED /
rq->cfs_rq->leaf_cfs_rq_list in v4.4.

Normally, a cfs_rq is removed from the leaf_cfs_rq_list immediately and
cfs_rq->on_list is set to 0. The cfs_rq itself is freed later, via
call_rcu(&tg->rcu, free_sched_group_rcu).

When we crash, the cfs_rq is likewise removed from the leaf_cfs_rq_list and
cfs_rq->on_list is set to 0. However, the cfs_rq is then re-added to the list
(cfs_rq->on_list is set back to 1) before call_rcu(&tg->rcu,
free_sched_group_rcu) is called. The cfs_rq is therefore freed, and filled
with 0x6b6b6b6b by SLUB_DEBUG, while it is still on the leaf_cfs_rq_list.

Because the freed cfs_rq is still on the list, the next call to
update_blocked_averages() iterates over the list and accesses members of an
object that has already been freed.
[ 27.531374] Unable to handle kernel paging request at virtual address 6b6b706b
[ 27.538596] pgd = 8cea8000
[ 27.541295] [6b6b706b] *pgd=00000000
[ 27.544870] Internal error: Oops: 1 [#1] PREEMPT SMP ARM
[ 27.564025] CPU: 1 PID: 1252 Comm: logger Tainted: G O 4.4.0 #2
[ 27.571064] Hardware name: Axis ARTPEC-6 Platform
[ 27.575759] task: b9586540 ti: 8c84c000 task.ti: 8c84c000
[ 27.581155] PC is at update_blocked_averages+0xcc/0x748
[ 27.586372] LR is at update_blocked_averages+0xbc/0x748
[ 27.591589] pc : [<80051d78>] lr : [<80051d68>] psr: 200c0193
sp : 8c84dce8 ip : 00000500 fp : 8efb1680
[ 27.603056] r10: 00000006 r9 : 80847788 r8 : 6b6b6b6b
[ 27.608271] r7 : 00000007 r6 : ffff958a r5 : 00000007 r4 : ffff958a
[ 27.614789] r3 : 6b6b6b6b r2 : 00000101 r1 : 00000000 r0 : 00000003
[ 27.621308] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[ 27.628521] Control: 10c5387d Table: 0cea804a DAC: 00000055
[ 27.634257] Process logger (pid: 1252, stack limit = 0x8c84c210)
[ 27.640254] Stack: (0x8c84dce8 to 0x8c84e000)
[ 27.644604] dce0: 6b6b6b6b 00000103 bad39440 80048250 00000000 bad398d0
[ 27.652774] dd00: bf6cf0d0 00000001 807e2c48 bad398d0 00000000 8054e7c8 ffff4582 bf6cec00
[ 27.660944] dd20: 00000001 8004825c 00000100 807dc400 8c84de40 bf6cb340 bad87ebc 00000100
[ 27.669114] dd40: afb50401 200c0113 00000200 807dc400 807e2100 ffff958a 00000007 8083916c
[ 27.677283] dd60: 00000100 00000006 0000001c 80058748 bf6cb340 8054e810 00000000 00000001
[ 27.685452] dd80: 807dc400 bf6cec00 00000001 bf6cec00 8083916c 00000001 c0803100 807dc400
[ 27.693622] dda0: 807e209c 000000a0 00000007 8083916c 00000100 00000006 0000001c 800282a0
[ 27.701791] ddc0: 00000001 bf6d2a80 b95fac00 0000000a ffff958b 00400000 bacf7000 807dc400
[ 27.709961] dde0: 00000000 00000000 0000001b bf0188c0 00000001 c0803100 b95fac00 80028830
[ 27.718130] de00: 807dc400 8006ca14 c0802100 c080210c 807e2db0 8081a140 8c84de40 80009420
[ 27.726300] de20: 8054e780 80122048 800c0013 ffffffff 8c84de74 00000001 00100073 800142c0
[ 27.734469] de40: b95ace70 b9586540 00000000 00000000 600c0013 00000000 024080c0 8010e8e0
[ 27.742639] de60: 00000001 00000001 00100073 b95fac00 00000000 8c84de90 8054e780 80122048
[ 27.750808] de80: 800c0013 ffffffff b95fac00 80122044 bad00640 8011c418 000001f6 b95acb70
[ 27.758978] dea0: 76f42000 b95acb70 76f42000 b95acb68 76f43000 8ce48780 00100073 8010e8e0
[ 27.767148] dec0: 00100073 00000000 b95fac00 00000000 00000000 00000001 b9421000 00000001
[ 27.775317] dee0: 00000000 76f46000 00000000 00000000 8001e8b8 76f42000 00000003 00000003
[ 27.783486] df00: b95fac00 8ce48780 00000001 00001000 807e2c64 8010efb4 00000000 00000000
[ 27.791656] df20: 0000004d 00000073 8c84df50 8ce487c4 b95fac00 00000003 00000013 00000000
[ 27.799825] df40: 8c84c000 b95fac00 7ece0b44 800faf84 00000002 00000000 00000000 8c84df64
[ 27.807995] df60: b95fac00 00000000 00000002 00000003 00000013 00000000 00000000 8010d4e8
[ 27.816163] df80: 00000002 00000000 00000003 00000003 00000000 00000003 000000c0 800104e4
[ 27.824333] dfa0: 00000020 800104b0 00000003 00000000 00000000 00000013 00000003 00000002
[ 27.832502] dfc0: 00000003 00000000 00000003 000000c0 0007ecd0 76f45958 76f45574 7ece0b44
[ 27.840671] dfe0: 00000000 7ece09fc 76f2e814 76f368d8 400c0010 00000000 00000000 00000000
[ 27.848847] [<80051d78>] (update_blocked_averages) from [<80058748>] (rebalance_domains+0x38/0x2cc)
[ 27.857889] [<80058748>] (rebalance_domains) from [<800282a0>] (__do_softirq+0x98/0x354)
[ 27.865975] [<800282a0>] (__do_softirq) from [<80028830>] (irq_exit+0xb0/0x11c)
[ 27.873281] [<80028830>] (irq_exit) from [<8006ca14>] (__handle_domain_irq+0x60/0xb8)
[ 27.881106] [<8006ca14>] (__handle_domain_irq) from [<80009420>] (gic_handle_irq+0x48/0x94)
[ 27.889452] [<80009420>] (gic_handle_irq) from [<800142c0>] (__irq_svc+0x40/0x74)
[ 27.896924] Exception stack(0x8c84de40 to 0x8c84de88)
[ 27.901969] de40: b95ace70 b9586540 00000000 00000000 600c0013 00000000 024080c0 8010e8e0
[ 27.910139] de60: 00000001 00000001 00100073 b95fac00 00000000 8c84de90 8054e780 80122048
[ 27.918306] de80: 800c0013 ffffffff
[ 27.921793] [<800142c0>] (__irq_svc) from [<80122048>] (__slab_alloc.constprop.9+0x28/0x2c)
[ 27.930139] [<80122048>] (__slab_alloc.constprop.9) from [<8011c418>] (kmem_cache_alloc+0x14c/0x204)
[ 27.939265] [<8011c418>] (kmem_cache_alloc) from [<8010e8e0>] (mmap_region+0x29c/0x680)
[ 27.947262] [<8010e8e0>] (mmap_region) from [<8010efb4>] (do_mmap+0x2f0/0x378)
[ 27.954481] [<8010efb4>] (do_mmap) from [<800faf84>] (vm_mmap_pgoff+0x74/0xa4)
[ 27.961699] [<800faf84>] (vm_mmap_pgoff) from [<8010d4e8>] (SyS_mmap_pgoff+0x94/0xf0)
[ 27.969524] [<8010d4e8>] (SyS_mmap_pgoff) from [<800104b0>] (__sys_trace_return+0x0/0x10)
[ 27.977694] Code: e59b8078 e59b309c e3a0cc05 e3580000 (e18300dc)

Below is a snippet of the trace_printk()s I added while analyzing the problem.
The prints show that a cfs_rq (0x8efb1680) is re-added to the list after it
has been removed, and that update_blocked_averages() then uses that cfs_rq
after it has been freed:

systemd-1 [000] 22.664453: bprint: alloc_fair_sched_group: allocated cfs_rq 0x8efb0780 tg 0x8efb1800 tg->css.id 0
systemd-1 [000] 22.664479: bprint: alloc_fair_sched_group: allocated cfs_rq 0x8efb1680 tg 0x8efb1800 tg->css.id 0
systemd-1 [000] 22.664481: bprint: cpu_cgroup_css_alloc: tg 0x8efb1800 tg->css.id 0
systemd-1 [000] 22.664547: bprint: cpu_cgroup_css_online: tg 0x8efb1800 tg->css.id 80
systemd-874 [001] 27.389000: bprint: list_add_leaf_cfs_rq: cfs_rq 0x8efb1680 cpu 1 on_list 0x0
migrate_cert-820 [001] 27.421337: bprint: update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x1
kworker/0:1-24 [000] 27.421356: bprint: cpu_cgroup_css_offline: tg 0x8efb1800 tg->css.id 80
kworker/0:1-24 [000] 27.421445: bprint: list_del_leaf_cfs_rq: cfs_rq 0x8efb1680 cpu 1 on_list 0x1
migrate_cert-820 [001] 27.421506: bprint: list_add_leaf_cfs_rq: cfs_rq 0x8efb1680 cpu 1 on_list 0x0
system-status-815 [001] 27.491358: bprint: update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x1
kworker/0:1-24 [000] 27.501561: bprint: cpu_cgroup_css_free: tg 0x8efb1800 tg->css.id 80
migrate_cert-820 [001] 27.511337: bprint: update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x1
ksoftirqd/0-3 [000] 27.521830: bprint: free_fair_sched_group: freeing cfs_rq 0x8efb0780 tg 0x8efb1800 tg->css.id 80
ksoftirqd/0-3 [000] 27.521857: bprint: free_fair_sched_group: freeing cfs_rq 0x8efb1680 tg 0x8efb1800 tg->css.id 80
logger-1252 [001] 27.531355: bprint: update_blocked_averages: cfs_rq 0x8efb1680 cpu 1 on_list 0x6b6b6b6b

I've reproduced this on v4.4, but I've also managed to reproduce the bug
after cherry-picking the following patches (all but one were marked for
v4.4 stable):

6fe1f34 sched/cgroup: Fix cgroup entity load tracking tear-down
d6e022f workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
041bd12 Revert "workqueue: make sure delayed work run in local cpu"
8bb5ef7 cgroup: make sure a parent css isn't freed before its children
aa226ff cgroup: make sure a parent css isn't offlined before its children
e93ad19 cpuset: make mm migration asynchronous