Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030795AbaGRNOa (ORCPT ); Fri, 18 Jul 2014 09:14:30 -0400 Received: from bband-dyn38.178-41-141.t-com.sk ([178.41.141.38]:17741 "EHLO ip4-83-240-18-248.cust.nbox.cz" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S935041AbaGRNOZ (ORCPT ); Fri, 18 Jul 2014 09:14:25 -0400 From: Jiri Slaby To: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Yasuaki Ishimatsu , Tejun Heo , Jiri Slaby Subject: [PATCH 3.12 134/170] workqueue: zero cpumask of wq_numa_possible_cpumask on init Date: Fri, 18 Jul 2014 14:12:19 +0200 Message-Id: <0abca2c33101429cda2aae2a1c8aa2e9085c2df5.1405685481.git.jslaby@suse.cz> X-Mailer: git-send-email 2.0.0 In-Reply-To: <48e8cad86bb1241c08bdaa80db022c25068ff8e0.1405685481.git.jslaby@suse.cz> References: <48e8cad86bb1241c08bdaa80db022c25068ff8e0.1405685481.git.jslaby@suse.cz> In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Yasuaki Ishimatsu 3.12-stable review patch. If anyone has any objections, please let me know. =============== commit 5a6024f1604eef119cf3a6fa413fe0261a81a8f3 upstream. When hot-adding and onlining CPU, kernel panic occurs, showing following call trace. BUG: unable to handle kernel paging request at 0000000000001d08 IP: [] __alloc_pages_nodemask+0x9d/0xb10 PGD 0 Oops: 0000 [#1] SMP ... Call Trace: [] ? cpumask_next_and+0x35/0x50 [] ? find_busiest_group+0x113/0x8f0 [] ? deactivate_slab+0x349/0x3c0 [] new_slab+0x91/0x300 [] __slab_alloc+0x2bb/0x482 [] ? copy_process.part.25+0xfc/0x14c0 [] ? load_balance+0x218/0x890 [] ? sched_clock+0x9/0x10 [] ? trace_clock_local+0x9/0x10 [] kmem_cache_alloc_node+0x8c/0x200 [] copy_process.part.25+0xfc/0x14c0 [] ? trace_buffer_unlock_commit+0x4d/0x60 [] ? kthread_create_on_node+0x140/0x140 [] do_fork+0xbc/0x360 [] kernel_thread+0x26/0x30 [] kthreadd+0x2c2/0x300 [] ? kthread_create_on_cpu+0x60/0x60 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_cpu+0x60/0x60 In my investigation, I found the root cause is wq_numa_possible_cpumask. All entries of wq_numa_possible_cpumask is allocated by alloc_cpumask_var_node(). And these entries are used without initializing. So these entries have wrong value. When hot-adding and onlining CPU, wq_update_unbound_numa() is called. wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq() calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set as follow: 3592 /* if cpumask is contained inside a NUMA node, we belong to that node */ 3593 if (wq_numa_enabled) { 3594 for_each_node(node) { 3595 if (cpumask_subset(pool->attrs->cpumask, 3596 wq_numa_possible_cpumask[node])) { 3597 pool->node = node; 3598 break; 3599 } 3600 } 3601 } But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong node is selected. As a result, kernel panic occurs. By this patch, all entries of wq_numa_possible_cpumask are allocated by zalloc_cpumask_var_node to initialize them. And the panic disappeared. Signed-off-by: Yasuaki Ishimatsu Reviewed-by: Lai Jiangshan Signed-off-by: Tejun Heo Fixes: bce903809ab3 ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]") Signed-off-by: Jiri Slaby --- kernel/workqueue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index e0b695893579..3fafbbb31927 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -5029,7 +5029,7 @@ static void __init wq_numa_init(void) BUG_ON(!tbl); for_each_node(node) - BUG_ON(!alloc_cpumask_var_node(&tbl[node], GFP_KERNEL, + BUG_ON(!zalloc_cpumask_var_node(&tbl[node], GFP_KERNEL, node_online(node) ? node : NUMA_NO_NODE)); for_each_possible_cpu(cpu) { -- 2.0.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/