Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp2069405imu; Sat, 10 Nov 2018 07:10:15 -0800 (PST) X-Google-Smtp-Source: AJdET5fN8TvBa6Iu9+NZGVyNW4Smb8aABq629KoqQol8pSkmupflLUkMaLrBwMjQaLJalg8sxtMo X-Received: by 2002:a17:902:7c87:: with SMTP id y7-v6mr13188661pll.232.1541862615146; Sat, 10 Nov 2018 07:10:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1541862615; cv=none; d=google.com; s=arc-20160816; b=L/GrlwahbXsEmAwVh1xDT/zpwM89KSaQk3+xFlRZyb4i19M+XuqUPaRSWmWGw0Hnzt Guh15PXUySdxe/V/Nk2rH9bhXtZU3GdqEC7z+DJwqecnyVa7nGCyXdbU3bE6bhDgh7YW AA9KxLLVsTH012U9rOrveWx7qy08Nu6Th4OQGBVnqc28C0KWeOZCGefwII0p34x2s1KD JzUQ4eiSALS/opVIdBs39n8XTuLYH4TAJ7x2QXtfwSP6vYJHePnHPt7JODKg6GxuVUK5 7paoBYfOFxjLTXe6TMsib50MmCxzn1HgGgIrFfRupnePPbsw8ZB2bIOLsvoAR3n//a0r VZhg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version; bh=4T0VImvTxpRSiXeQNZ7u+yeJm6zyB8SeUgx235OhHNI=; b=CihUlSUkv5j4KYQ3RnGUyfelUMU4jjigCrzMUce29V/wcwV9y7UXAa7NnUQBTpukkH xVtW/TzjSX13zf+u3To0t50trE+SHDHPgeISYPYrG4ilu8cIT7C4o0yQc6V9WmbVfj5H /bjIIGgypEl6uv14yI+/5uRUiRcjdCRmGPPQdt1cyleQxmahPnYEIJhmLgI2qLmtorck VMkiJg4jTzaObZIEB13/adhQuKMvU92DTsjusu93fwyxLELtwfBCgmTzsLq7miaCw3TA UspOu0RbcZqMiRE21VbSJFVHT7yDse2h8KmFHh7qUpLNQleporB4kXs6WL5+qkFAx5Rv FQ8A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m187-v6si12442371pfm.159.2018.11.10.07.09.59; Sat, 10 Nov 2018 07:10:15 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726352AbeKKAxg convert rfc822-to-8bit (ORCPT + 99 others); Sat, 10 Nov 2018 19:53:36 -0500 Received: from mout.gmx.net ([212.227.17.21]:59195 "EHLO mout.gmx.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726068AbeKKAxg (ORCPT ); Sat, 10 Nov 2018 19:53:36 -0500 Received: from [192.168.1.153] ([74.104.183.64]) by mail.gmx.com (mrgmx102 [212.227.17.174]) with ESMTPSA (Nemesis) id 0MDW9x-1gIQT21CbV-00Gs7v; Sat, 10 Nov 2018 16:08:13 +0100 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.0 \(3445.100.39\)) Subject: Re: kmemleak: Early log buffer exceeded (525980) during boot From: Qian Cai In-Reply-To: <1541712198.12945.12.camel@gmx.us> Date: Sat, 10 Nov 2018 10:08:10 -0500 Cc: Catalin Marinas , linux-mm@kvack.org Content-Transfer-Encoding: 8BIT Message-Id: References: <1541712198.12945.12.camel@gmx.us> To: open list X-Mailer: Apple Mail (2.3445.100.39) X-Provags-ID: V03:K1:Uz4CQ1Qtqd3cbv731I9IJ6Df0yovs+8L1OcYMB4181O1dTQj9dR AGHxfLVfgAh7WS8uXtAIb7ZeDOx0AcikbInRYYQS1ksdYgWrlpFlq4KU/ZBd3dQsgDsiNb0 wykQGRJNLSD9RtT1HlR2i2JXgxWl3YV0bU2xiEqiGUZp119rOtVzWOPV1igutXdqGlNv5mJ SysjJDhNSFaZALOCPvHZw== X-Spam-Flag: NO X-UI-Out-Filterresults: notjunk:1;V01:K0:XkkIm6Uq/XE=:f/uS67dmRZTgwEQo4SZPNa ELeOcdos6iOjDF0qGCUsD1dH/xVl8u7LEBSTeL9ByQdH+R7OaSLF2BG4iuItzS+m18zB0wZE/ VoCFT15DDUS9+JHXU6tv4nxOc0r3gXDbjeh7wgQprdcPhbSL3ATGad2DtoPvt0s7bCw9xlgbl Bs9mAwkiXVFrAnYUAG6h5MWliwW0mnkpJ2mQvBxBa01fbiZqDIF1PpVG3emIkp1B3Mrmj7YfC 8TMYqm+p8vKHuDAFc6QJVdbUiGschyuLbhh31gwZn7gwgpwqcUujZ/6GgL76Hy4AJE2goLfvu NeWPy1l4kcOKkHn6YaGvFIp5O73mGYt1JpyXUIoJZnAgN7xCvmDJ04Z8cz67pM55mJiTasQT1 FUfc3utCbHR2XxwMNp9/mlLyDqzsQab6583aCCSCU2Ei9EG65qRNquhonSYxc/kZZSF6TeeQX 56Z4BJ2+BOi76hKBBn3utb/jxVW/jW7Muj+h+SRx8XxieZesrKI5zQYG02hc7dXSIXCe+vDla +XMWpo72CaLqzoFIthcO7JOrZ3boKlsJfGbyY57+cJgoIlOzpn/5lMUUH/DTym/v5WEm1WVPP ceaxLIYo+y57h8tyF2ReTf8vbzFWO/pU7720FT+/NKsDbF9Uu4Fmxy5NJ3IXSRrsFYz0rUwE+ FaAtxGkvWguSP4KoMQxi6/kpuJFjbBMjhoyGrGzHkfnZCRR9DYZBCB0FDxdyxNIEBp+jJWyo5 pKBCTcCK9SSZ+6gsWnIeALeT+wb0FOO2Rli/Rr2oopeQsqGq1JW3BEgVq9aguXMGVAd7rI1W+ dmKIuTVaqAw4rtBN6mNl+tH3dmkMPCAnHtG/I+HWdkVWQUJZ142+BYAj+bBsM6p/V1P/INIoU DnYE3fphCNSZiyn2MpB9NBzopOOL8J9IWhDfnycYuIbqDhB+QBWl2YRobNgSrI Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Nov 8, 2018, at 4:23 PM, Qian Cai wrote: > > The maximum value for DEBUG_KMEMLEAK_EARLY_LOG_SIZE is only 40000, so it > disables kmemleak every time on this aarch64 server running the latest mainline > (b00d209). > > # echo scan > /sys/kernel/debug/kmemleak > -bash: echo: write error: Device or resource busy > > Any idea on how to enable kmemleak there? I have managed to hard-code DEBUG_KMEMLEAK_EARLY_LOG_SIZE to 600000, diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 877de4fa0720..c10119102c10 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -280,7 +280,7 @@ struct early_log { /* early logging buffer and current position */ static struct early_log - early_log[CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE] __initdata; + early_log[600000] __initdata; static int crt_early_log __initdata; static void kmemleak_disable(void); Even though kmemleak is enabled, there are continuous soft-lockups and eventually a kernel panic. Is it normal that kmemleak not going to work with large systems (this aarch64 server has 64-CPU and 100G memory)? [39418.164687] watchdog: BUG: soft lockup - CPU#20 stuck for 23s! [kworker/20:1:361] [39418.172172] Modules linked in: vfat fat ses enclosure ipmi_ssif ghash_ce sha2_ce sha256_arm64 sha1_ce sg ipmi_si ipmi_devintf ipmi_msghandler sbsa_gwdt sch_fq_codel xfs libcrc32c marvell mpt3sas raid_class mlx5_core hibmc_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe hisi_sas_v2_hw igb hisi_sas_main libsas hns_dsaf mlxfw hns_enet_drv devlink i2c_designware_platform mdio i2c_algo_bit scsi_transport_sas ehci_platform i2c_designware_core hns_mdio hnae dm_mirror dm_region_hash dm_log dm_mod [39418.218197] irq event stamp: 56 [39418.221346] hardirqs last enabled at (55): [] _raw_spin_unlock_irq+0x3c/0x78 [39418.230040] hardirqs last disabled at (56): [] __schedule+0x1d4/0xda0 [39418.238040] softirqs last enabled at (0): [] copy_process.isra.2+0xda0/0x2e20 [39418.246820] softirqs last disabled at (0): [<0000000000000000>] (null) [39418.254215] CPU: 20 PID: 361 Comm: kworker/20:1 Tainted: G B W L 4.20.0-rc1+ #3 [39418.262472] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018 [39418.269695] Workqueue: events free_obj_work [39418.273875] pstate: 20000005 (nzCv daif -PAN -UAO) [39418.278662] pc : _raw_spin_unlock_irqrestore+0x84/0x88 [39418.283795] lr : _raw_spin_unlock_irqrestore+0x80/0x88 [39418.288926] sp : ffff801ec3556a80 [39418.292234] x29: ffff801ec3556a80 x28: 1ffff003d86aad98 [39418.297542] x27: ffff2000111a1000 x26: ffff2000111bcd80 [39418.302850] x25: ffff801ebc5886c8 x24: ffff801ebc5886d0 [39418.308158] x23: ffff8019ce23b280 x22: ffff2000111a1c40 [39418.313465] x21: ffff2000087dc9fc x20: ffff2000111bcd88 [39418.318773] x19: 0000000000000000 x18: 0000000000000000 [39418.324080] x17: 0000000000000000 x16: 0000000000000000 [39418.329387] x15: 0000000000001880 x14: ffffffffffff8650 [39418.334694] x13: 0000000000000000 x12: 000000000000005c [39418.340001] x11: 00000000f2f2f2f2 x10: dfff200000000000 [39418.345309] x9 : ffff20000f669848 x8 : 00000000f2f2f200 [39418.350616] x7 : ffff2000087dc7ec x6 : 0000000041b58ab3 [39418.355924] x5 : dfff200000000000 x4 : dfff200000000000 [39418.361231] x3 : 1ffff003d86aace4 x2 : 0000000000000003 [39418.366538] x1 : bdf5a6abd51de100 x0 : 0000000000000000 [39418.371846] Call trace: [39418.374292] _raw_spin_unlock_irqrestore+0x84/0x88 [39418.379079] __debug_object_init+0x39c/0x918 [39418.383342] debug_object_activate+0x218/0x370 [39418.387782] __call_rcu+0xdc/0xad0 [39418.391178] call_rcu+0x30/0x40 [39418.394315] put_object+0x50/0x68 [39418.397624] __delete_object+0xfc/0x140 [39418.401455] delete_object_full+0x2c/0x38 [39418.405459] kmemleak_free+0xa4/0xb0 [39418.409030] kmem_cache_free+0x2e4/0x3a8 [39418.412947] free_obj_work+0x300/0x468 [39418.416692] process_one_work+0x60c/0xd90 [39418.420695] worker_thread+0x13c/0xa70 [39418.424440] kthread+0x1c4/0x1d0 [39418.427663] ret_from_fork+0x10/0x1c [39423.435417] BUG: workqueue lockup - pool cpus=20 node=1 flags=0x0 nice=0 stuck for 3608s! [39423.443658] Showing busy workqueues and worker pools: [39423.449354] workqueue events: flags=0x0 [39423.453272] pwq 64: cpus=32 node=2 flags=0x0 nice=0 active=1/256 [39423.459456] in-flight: 917:key_garbage_collector [39423.465289] pwq 40: cpus=20 node=1 flags=0x0 nice=0 active=2/256 [39423.471476] in-flight: 361:free_obj_work [39423.475749] pending: free_obj_work [39423.480087] pwq 26: cpus=13 node=0 flags=0x0 nice=0 active=1/256 [39423.486273] in-flight: 377:hns_nic_service_task [hns_enet_drv] [39423.493235] workqueue mm_percpu_wq: flags=0x8 [39423.498302] pwq 40: cpus=20 node=1 flags=0x0 nice=0 active=1/256 [39423.504488] pending: vmstat_update [39423.509247] workqueue ixgbe: flags=0xe000a [39423.513345] pwq 128: cpus=0-63 flags=0x4 nice=0 active=1/1 [39423.519007] in-flight: 16574:ixgbe_service_task [ixgbe] [39423.526001] pool 26: cpus=13 node=0 flags=0x0 nice=0 hung=0s workers=2 idle: 78 [39423.533349] pool 40: cpus=20 node=1 flags=0x0 nice=0 hung=3608s workers=3 idle: 906 114 [39423.542008] pool 64: cpus=32 node=2 flags=0x0 nice=0 hung=71s workers=3 idle: 9912 1737 [39454.155305] BUG: workqueue lockup - pool cpus=20 node=1 flags=0x0 nice=0 stuck for 3638s! [39454.163547] Showing busy workqueues and worker pools: [39454.169248] workqueue events: flags=0x0 [39454.173164] pwq 64: cpus=32 node=2 flags=0x0 nice=0 active=1/256 [39454.179349] in-flight: 917:key_garbage_collector [39454.185164] pwq 40: cpus=20 node=1 flags=0x0 nice=0 active=2/256 [39454.191351] in-flight: 361:free_obj_work [39454.195624] pending: free_obj_work [39454.199973] pwq 26: cpus=13 node=0 flags=0x0 nice=0 active=1/256 [39454.206159] in-flight: 377:hns_nic_service_task [hns_enet_drv] [39454.213117] workqueue mm_percpu_wq: flags=0x8 [39454.218185] pwq 40: cpus=20 node=1 flags=0x0 nice=0 active=1/256 [39454.224371] pending: vmstat_update [39454.229707] pool 26: cpus=13 node=0 flags=0x0 nice=0 hung=0s workers=2 idle: 78 [39454.237672] pool 40: cpus=20 node=1 flags=0x0 nice=0 hung=3638s workers=3 idle: 906 114 [39454.246320] pool 64: cpus=32 node=2 flags=0x0 nice=0 hung=26s workers=3 idle: 9912 1737 [39480.842570] WARNING: CPU: 8 PID: 2290 at mm/page_alloc.c:4270 __alloc_pages_nodemask+0x1f9c/0x2028 [39480.845824] kmemleak: Cannot allocate a kmemleak_object structure [39482.325003] Kernel panic - not syncing: corrupted stack end detected inside scheduler [39482.332839] CPU: 1 PID: 1319 Comm: systemd-journal Tainted: G B W L 4.20.0-rc1+ #3 [39482.341358] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018 [39482.348577] Call trace: [39482.351029] dump_backtrace+0x0/0x2c8 [39482.354688] show_stack+0x24/0x30 [39482.358001] dump_stack+0x118/0x19c [39482.361484] panic+0x1b8/0x31c [39482.364536] schedule+0x0/0x240 [39482.367673] preempt_schedule_common+0x3c/0x78 [39482.372110] _cond_resched+0xfc/0x108 [39482.375769] shrink_node_memcg+0x368/0x9c0 [39482.379861] shrink_node+0x200/0x940 [39482.383431] do_try_to_free_pages+0x234/0x7d0 [39482.387784] try_to_free_pages+0x228/0x6b0 [39482.391877] __alloc_pages_nodemask+0xcbc/0x2028 [39482.396492] alloc_pages_vma+0x1a4/0x208 [39482.400412] __read_swap_cache_async+0x4fc/0x858 [39482.405025] read_swap_cache_async+0xa4/0x100 [39482.409378] swap_cluster_readahead+0x598/0x650 [39482.413904] shmem_swapin+0xd4/0x150 [39482.417477] shmem_getpage_gfp+0xf50/0x1c48 [39482.421657] shmem_fault+0x140/0x340 [39482.425229] __do_fault+0xd0/0x440 [39482.428625] do_fault+0x54c/0xf48 [39482.431935] __handle_mm_fault+0x4c0/0x928 [39482.436028] handle_mm_fault+0x30c/0x4b8 [39482.439947] do_page_fault+0x294/0x658 [39482.443690] do_translation_fault+0x98/0xa8 [39482.447871] do_mem_abort+0x64/0xf0 [39482.451354] el0_da+0x24/0x28 [39482.454476] SMP: stopping secondary CPUs [39482.459038] Kernel Offset: disabled [39482.462522] CPU features: 0x2,21806008 [39482.466264] Memory Limit: none [39482.469435] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]--- [39483.029013] WARNING: CPU: 1 PID: 1319 at kernel/sched/core.c:1168 set_task_cpu+0x5ac/0x5f8 [39483.037272] Modules linked in: vfat fat ses enclosure ipmi_ssif ghash_ce sha2_ce sha256_arm64 sha1_ce sg ipmi_si ipmi_devintf ipmi_msghandler sbsa_gwdt sch_fq_codel xfs libcrc32c marvell mpt3sas raid_class mlx5_core hibmc_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe hisi_sas_v2_hw igb hisi_sas_main libsas hns_dsaf mlxfw hns_enet_drv devlink i2c_designware_platform mdio i2c_algo_bit scsi_transport_sas ehci_platform i2c_designware_core hns_mdio hnae dm_mirror dm_region_hash dm_log dm_mod [39483.083295] CPU: 1 PID: 1319 Comm: systemd-journal Tainted: G B W L 4.20.0-rc1+ #3 [39483.091812] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018 [39483.099027] pstate: 60000085 (nZCv daIf -PAN -UAO) [39483.103812] pc : set_task_cpu+0x5ac/0x5f8 [39483.107815] lr : set_task_cpu+0x124/0x5f8 [39483.111817] sp : ffff8016fb0440c0 [39483.115125] x29: ffff8016fb0440c0 x28: 0000000000000000 [39483.120433] x27: 0000000000000002 x26: 0000000000000004 [39483.125740] x25: ffff20000f6718b8 x24: ffff80161ac85244 [39483.131047] x23: ffff20000f669848 x22: 1ffff002df608826 [39483.136354] x21: 0000000000000002 x20: 0000000000000000 [39483.141661] x19: ffff80161ac85200 x18: ffff1011daf8d930 [39483.146968] x17: 0000000000000000 x16: 0000000000000000 [39483.152274] x15: ffff1011daf8d930 x14: 6863732065646973 [39483.157581] x13: 6e69206465746365 x12: 74656420646e6520 [39483.162887] x11: 00000000f2f2f2f2 x10: dfff200000000000 [39483.168194] x9 : ffff20000f669848 x8 : 00000000f2f2f200 [39483.173501] x7 : 0000000041b58ab3 x6 : 0000000041b58ab3 [39483.178808] x5 : dfff200000000000 x4 : dfff200000000000 [39483.184114] x3 : 00000000f2f2f2f2 x2 : 0000000000000007 [39483.189421] x1 : ffff20000f6718b8 x0 : 0000000000000000 [39483.194728] Call trace: [39483.197170] set_task_cpu+0x5ac/0x5f8 [39483.200826] try_to_wake_up+0x380/0xa48 [39483.204656] wake_up_process+0x28/0x38 [39483.208400] hrtimer_wakeup+0x30/0x40 [39483.212056] __run_hrtimer+0x20c/0x988 [39483.215800] __hrtimer_run_queues+0x16c/0x1f8 [39483.220151] hrtimer_interrupt+0x18c/0x370 [39483.224243] arch_timer_handler_phys+0x44/0x58 [39483.228682] handle_percpu_devid_irq+0x1a0/0x720 [39483.233293] generic_handle_irq+0x44/0x60 [39483.237296] __handle_domain_irq+0x88/0xf0 [39483.241386] gic_handle_irq+0xc0/0x168 [39483.245128] el1_irq+0xb4/0x140 [39483.248265] __delay+0x70/0xd8 [39483.251313] __const_udelay+0x34/0x40 [39483.254969] panic+0x30c/0x31c [39483.258018] schedule+0x0/0x240 [39483.261154] preempt_schedule_common+0x3c/0x78 [39483.265591] _cond_resched+0xfc/0x108 [39483.269249] shrink_node_memcg+0x368/0x9c0 [39483.273339] shrink_node+0x200/0x940 [39483.276909] do_try_to_free_pages+0x234/0x7d0 [39483.281260] try_to_free_pages+0x228/0x6b0 [39483.285351] __alloc_pages_nodemask+0xcbc/0x2028 [39483.289963] alloc_pages_vma+0x1a4/0x208 [39483.293880] __read_swap_cache_async+0x4fc/0x858 [39483.298492] read_swap_cache_async+0xa4/0x100 [39483.302844] swap_cluster_readahead+0x598/0x650 [39483.307368] shmem_swapin+0xd4/0x150 [39483.310938] shmem_getpage_gfp+0xf50/0x1c48 [39483.315115] shmem_fault+0x140/0x340 [39483.318684] __do_fault+0xd0/0x440 [39483.322080] do_fault+0x54c/0xf48 [39483.325390] __handle_mm_fault+0x4c0/0x928 [39483.329480] handle_mm_fault+0x30c/0x4b8 [39483.333397] do_page_fault+0x294/0x658 [39483.337141] do_translation_fault+0x98/0xa8 [39483.341318] do_mem_abort+0x64/0xf0 [39483.344800] el0_da+0x24/0x28 [39483.347761] irq event stamp: 0 [39483.350810] hardirqs last enabled at (0): [<0000000000000000>] (null) [39483.358200] hardirqs last disabled at (0): [] copy_process.isra.2+0xda0/0x2e20 [39483.366979] softirqs last enabled at (0): [] copy_process.isra.2+0xda0/0x2e20 [39483.375756] softirqs last disabled at (0): [<0000000000000000>] (null) [39483.383144] ---[ end trace 0d8bbddd3044cabe ]---