Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757903AbbFQHgB (ORCPT ); Wed, 17 Jun 2015 03:36:01 -0400
Received: from cnc3.corp-email.com ([203.166.176.46]:1674 "EHLO cnc3.corp-email.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757784AbbFQHfz (ORCPT ); Wed, 17 Jun 2015 03:35:55 -0400
X-Greylist: delayed 1464 seconds by postgrey-1.27 at vger.kernel.org; Wed, 17 Jun 2015 03:35:55 EDT
From: gongzg
To: ,
CC: GongZhaogang , SongXiumiao
Subject: [PATCH] Hotplug: fix the bug that the system is down,when memory is not in node0 and cpu is logically hotadded.
Date: Wed, 17 Jun 2015 11:46:14 -0400
Message-ID: <1434555974-16352-1-git-send-email-gongzhaogang@inspur.com>
X-Mailer: git-send-email 1.7.1
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.166.15.236]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 10271
Lines: 165

From: GongZhaogang

Analysing the call trace of the bug, we find that create_worker()
allocates memory from node0. Because node0 is offline, the allocation
fails. Add a condition that checks that the node is online, so that the
system only allocates memory from an online node.

The bug information follows:

[root@localhost ~]# echo 1 > /sys/devices/system/cpu/cpu90/online
[ 225.611209] smpboot: Booting Node 2 Processor 90 APIC 0x40
[18446744029.482996] kvm: enabling virtualization on CPU90
[ 225.725503] TSC synchronization [CPU#43 -> CPU#90]:
[ 225.730952] Measured 672516581900 cycles TSC warp between CPUs, turning off TSC clock.
[ 225.739800] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[ 225.755126] BUG: unable to handle kernel paging request at 0000000000001b08
[ 225.762931] IP: [] __alloc_pages_nodemask+0xb7/0x940
[ 225.770247] PGD 449bb0067 PUD 46110e067 PMD 0
[ 225.775248] Oops: 0000 [#1] SMP
[ 225.778875] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
[ 225.868198] CPU: 43 PID: 5400 Comm: bash Not tainted 4.0.0-rc4-bug-fixed-remove #16
[ 225.876754] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
[ 225.888122] task: ffff88045a3d8da0 ti: ffff880446120000 task.ti: ffff880446120000
[ 225.896484] RIP: 0010:[] [] __alloc_pages_nodemask+0xb7/0x940
[ 225.906509] RSP: 0018:ffff880446123918 EFLAGS: 00010246
[ 225.912443] RAX: 0000000000001b00 RBX: 0000000000000010 RCX: 0000000000000000
[ 225.920416] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000002052d0
[ 225.928388] RBP: ffff880446123a08 R08: ffff880460eca0c0 R09: 0000000060eca101
[ 225.936361] R10: ffff88046d007300 R11: ffffffff8108dd31 R12: 000000000001002a
[ 225.944334] R13: 00000000002052d0 R14: 0000000000000001 R15: 00000000000040d0
[ 225.952306] FS: 00007f9386450740(0000) GS:ffff88046db60000(0000) knlGS:0000000000000000
[ 225.961346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 225.967765] CR2: 0000000000001b08 CR3: 00000004612a3000 CR4: 00000000001407e0
[ 225.975735] Stack:
[ 225.977981] 00000000002052d0 0000000000000000 0000000000000003 ffff88045a3d8da0
[ 225.986291] ffff880446123988 ffffffff811c7f81 ffff88045a3d8da0 0000000000000000
[ 225.994597] 000080d000000002 ffff88046d005500 000000000003000f 002052d0002052d0
[ 226.002904] Call Trace:
[ 226.005645] [] ? alloc_pages_current+0x91/0x100
[ 226.012557] [] ? deactivate_slab+0x383/0x400
[ 226.019173] [] new_slab+0xa7/0x460
[ 226.024826] [] __slab_alloc+0x310/0x470
[ 226.030960] [] ? get_from_free_list+0x46/0x60
[ 226.037679] [] ? alloc_worker+0x21/0x50
[ 226.043812] [] kmem_cache_alloc_node_trace+0x91/0x250
[ 226.051299] [] alloc_worker+0x21/0x50
[ 226.057236] [] create_worker+0x53/0x1e0
[ 226.063357] [] alloc_unbound_pwq+0x2a2/0x510
[ 226.069974] [] wq_update_unbound_numa+0x1b4/0x220
[ 226.077076] [] workqueue_cpu_up_callback+0x308/0x3d0
[ 226.084468] [] notifier_call_chain+0x4e/0x80
[ 226.091084] [] __raw_notifier_call_chain+0xe/0x10
[ 226.098189] [] cpu_notify+0x23/0x50
[ 226.103929] [] _cpu_up+0x188/0x1a0
[ 226.109574] [] cpu_up+0x89/0xb0
[ 226.114923] [] cpu_subsys_online+0x40/0x90
[ 226.121350] [] device_online+0x6d/0xa0
[ 226.127382] [] online_store+0x95/0xa0
[ 226.133322] [] dev_attr_store+0x18/0x30
[ 226.139457] [] sysfs_kf_write+0x3d/0x50
[ 226.145586] [] kernfs_fop_write+0x12a/0x180
[ 226.152109] [] vfs_write+0xb7/0x1f0
[ 226.157853] [] ? do_audit_syscall_entry+0x6c/0x70
[ 226.164954] [] SyS_write+0x55/0xd0
[ 226.170595] [] system_call_fastpath+0x12/0x17
[ 226.177306] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3 f8 83 e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b 45 88 <48> 83 78 08 00 0f 84 51 01 00 00 b8 01
[ 226.199175] RIP [] __alloc_pages_nodemask+0xb7/0x940
[ 226.206576] RSP
[ 226.210471] CR2: 0000000000001b08
[ 226.227939] ---[ end trace 30d753e1e1124696 ]---
[ 226.412591] Kernel panic - not syncing: Fatal exception
[ 226.430948] Kernel Offset: disabled
[ 226.434845] drm_kms_helper: panic occurred, switching back to text console
[ 226.618325] ---[ end Kernel panic - not syncing: Fatal exception
[ 226.625047] ------------[ cut here ]------------
[ 226.630213] WARNING: CPU: 43 PID: 5400 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5d/0x60()
[ 226.640999] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
[ 226.730275] CPU: 43 PID: 5400 Comm: bash Tainted: G D 4.0.0-rc4-bug-fixed-remove #16
[ 226.740189] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
[ 226.751558] 0000000000000000 00000000aa535e80 ffff88046db63d58 ffffffff8167aa08
[ 226.759865] 0000000000000000 0000000000000000 ffff88046db63d98 ffffffff810772da
[ 226.768173] ffff88046db63d98 0000000000000000 ffff88046d615380 000000000000002b
[ 226.776480] Call Trace:
[ 226.779212] [] dump_stack+0x45/0x57
[ 226.785657] [] warn_slowpath_common+0x8a/0xc0
[ 226.792367] [] warn_slowpath_null+0x1a/0x20
[ 226.798886] [] native_smp_send_reschedule+0x5d/0x60
[ 226.806182] [] trigger_load_balance+0x145/0x1b0
[ 226.813093] [] scheduler_tick+0x9c/0xe0
[ 226.819228] [] update_process_times+0x51/0x60
[ 226.825946] [] tick_sched_handle.isra.18+0x25/0x60
[ 226.833143] [] tick_sched_timer+0x44/0x80
[ 226.839467] [] __run_hrtimer+0x77/0x1d0
[ 226.845590] [] ? tick_sched_handle.isra.18+0x60/0x60
[ 226.852980] [] hrtimer_interrupt+0x103/0x230
[ 226.859596] [] local_apic_timer_interrupt+0x39/0x60
[ 226.866883] [] smp_apic_timer_interrupt+0x45/0x60
[ 226.873982] [] apic_timer_interrupt+0x6d/0x80
[ 226.880690] [] ? panic+0x1c3/0x204
[ 226.887036] [] ? panic+0x1bc/0x204
[ 226.892682] [] oops_end+0x109/0x120
[ 226.898422] [] no_context+0x2ee/0x366
[ 226.904359] [] __bad_area_nosemaphore+0x73/0x1cc
[ 226.911361] [] bad_area+0x44/0x4c
[ 226.916910] [] __do_page_fault+0x2ea/0x420
[ 226.923331] [] do_page_fault+0x31/0x70
[ 226.929364] [] page_fault+0x28/0x30
[ 226.935106] [] ? alloc_worker+0x21/0x50
[ 226.941235] [] ? __alloc_pages_nodemask+0xb7/0x940
[ 226.948430] [] ? __alloc_pages_nodemask+0x225/0x940
[ 226.955725] [] ? alloc_pages_current+0x91/0x100
[ 226.962624] [] ? deactivate_slab+0x383/0x400
[ 226.969239] [] new_slab+0xa7/0x460
[ 226.974885] [] __slab_alloc+0x310/0x470
[ 226.981015] [] ? get_from_free_list+0x46/0x60
[ 226.987727] [] ? alloc_worker+0x21/0x50
[ 226.993851] [] kmem_cache_alloc_node_trace+0x91/0x250
[ 227.001340] [] alloc_worker+0x21/0x50
[ 227.007275] [] create_worker+0x53/0x1e0
[ 227.013404] [] alloc_unbound_pwq+0x2a2/0x510
[ 227.020019] [] wq_update_unbound_numa+0x1b4/0x220
[ 227.027112] [] workqueue_cpu_up_callback+0x308/0x3d0
[ 227.034502] [] notifier_call_chain+0x4e/0x80
[ 227.041117] [] __raw_notifier_call_chain+0xe/0x10
[ 227.048219] [] cpu_notify+0x23/0x50
[ 227.053961] [] _cpu_up+0x188/0x1a0
[ 227.059597] [] cpu_up+0x89/0xb0
[ 227.064950] [] cpu_subsys_online+0x40/0x90
[ 227.071372] [] device_online+0x6d/0xa0
[ 227.077395] [] online_store+0x95/0xa0
[ 227.083332] [] dev_attr_store+0x18/0x30
[ 227.089460] [] sysfs_kf_write+0x3d/0x50
[ 227.095589] [] kernfs_fop_write+0x12a/0x180
[ 227.102108] [] vfs_write+0xb7/0x1f0
[ 227.107850] [] ? do_audit_syscall_entry+0x6c/0x70
[ 227.114950] [] SyS_write+0x55/0xd0
[ 227.120595] [] system_call_fastpath+0x12/0x17
[ 227.127306] ---[ end trace 30d753e1e1124697 ]---

Signed-off-by: SongXiumiao
---
 kernel/workqueue.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 586ad91..22d194c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3253,7 +3253,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 	if (wq_numa_enabled) {
 		for_each_node(node) {
 			if (cpumask_subset(pool->attrs->cpumask,
-					   wq_numa_possible_cpumask[node])) {
+					   wq_numa_possible_cpumask[node])
+					   && node_online(node)) {
 				pool->node = node;
 				break;
 			}
--
1.7.1
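
For illustration only, the node selection that the hunk above changes can
be sketched in user-space C. This is a rough model, not kernel code:
struct node_info, mask_subset(), pick_pool_node() and NO_NODE are
hypothetical stand-ins for the per-node possible cpumask, cpumask_subset(),
the loop in get_unbound_pool() and the "no node chosen" case. CPU masks are
modelled as plain 64-bit bitmaps, which is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_NODE (-1) /* hypothetical marker: no suitable node was found */

/* Hypothetical stand-in for the kernel's per-node state. */
struct node_info {
	uint64_t possible_cpus; /* bit i set: CPU i may belong to this node */
	bool online;            /* models the result of node_online(node) */
};

/* True if every bit set in a is also set in b (models cpumask_subset()). */
static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Return the first node whose possible CPUs cover pool_cpus and which is
 * online. Without the online check, an offline node could be returned and
 * later used as the allocation target.
 */
static int pick_pool_node(uint64_t pool_cpus,
			  const struct node_info *nodes, int nr_nodes)
{
	for (int node = 0; node < nr_nodes; node++) {
		if (mask_subset(pool_cpus, nodes[node].possible_cpus) &&
		    nodes[node].online)
			return node;
	}
	return NO_NODE;
}
```

In this model, a pool whose CPUs all belong to an offline node gets no node
at all rather than the offline one, so a later allocation is not directed
at a node without usable memory.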