From: "Yang Shi"
To: longman@redhat.com, tglx@linutronix.de
Cc: "Yang Shi" ,
Subject: [PATCH 2/2 v8] lib: debugobjects: touch watchdog to avoid softlockup when !CONFIG_PREEMPT
Date: Wed, 29 Nov 2017 03:12:16 +0800
Message-Id:
 <1511896336-103831-2-git-send-email-yang.s@alibaba-inc.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1511896336-103831-1-git-send-email-yang.s@alibaba-inc.com>
References: <1511896336-103831-1-git-send-email-yang.s@alibaba-inc.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

There are nested loops on the debug objects free path. They can sometimes
run for more than a hundred thousand iterations, which occasionally causes
a soft lockup with !CONFIG_PREEMPT, like below:

NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [stress-ng-getde:110342]
Modules linked in: binfmt_misc(E) tcp_diag(E) inet_diag(E) bonding(E) intel_rapl(E) iosf_mbi(E) x86_pkg_temp_thermal(E) coretemp(E) iTCO_wdt(E) iTCO_vendor_support(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) dcdbas(E) ghash_clmulni_intel(E) aesni_intel(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) ipmi_devintf(E) sg(E) cryptd(E) pcspkr(E) mei_me(E) lpc_ich(E) ipmi_si(E) mfd_core(E) mei(E) shpchp(E) wmi(E) ipmi_msghandler(E) acpi_power_meter(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) jbd2(E) mbcache(E) sd_mod(E) mgag200(E) igb(E) drm_kms_helper(E) ixgbe(E) syscopyarea(E) mdio(E) sysfillrect(E) sysimgblt(E) ptp(E) fb_sys_fops(E) pps_core(E) ttm(E) drm(E) crc32c_intel(E) i2c_algo_bit(E) i2c_core(E) megaraid_sas(E) dca(E)
irq event stamp: 4340444
hardirqs last enabled at (4340443): [] _raw_spin_unlock_irqrestore+0x36/0x60
hardirqs last disabled at (4340444): [] apic_timer_interrupt+0x91/0xa0
softirqs last enabled at (4340398): [] __do_softirq+0x349/0x50e
softirqs last disabled at (4340391): [] irq_exit+0xf5/0x110
CPU: 15 PID: 110342 Comm: stress-ng-getde Tainted: G E 4.9.44-003.ali3000.alios7.x86_64.debug #1
Hardware name: Dell Inc. PowerEdge R720xd/0X6FFV, BIOS 1.6.0 03/07/2013
task: ffff884cbb0d0000 task.stack: ffff884cabc70000
RIP: 0010:[] [] _raw_spin_unlock_irqrestore+0x3b/0x60
RSP: 0018:ffff884cabc77b78 EFLAGS: 00000292
RAX: ffff884cbb0d0000 RBX: 0000000000000292 RCX: 0000000000000000
RDX: ffff884cbb0d0000 RSI: 0000000000000001 RDI: 0000000000000292
RBP: ffff884cabc77b88 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8357a0d8
R13: ffff884cabc77bc8 R14: ffffffff8357a0d0 R15: 00000000000000fc
FS: 00002aee845fd2c0(0000) GS:ffff8852bd400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002991808 CR3: 0000005123abf000 CR4: 00000000000406e0
Stack:
 ffff884ff4fe0000 ffff884ff4fd8000 ffff884cabc77c00 ffffffff8141177e
 0000000000000202 ffff884cbb0d0000 ffff884cabc77bc8 0000000000000006
 ffff884ff4fda000 ffffffff8357a0d8 0000000000000000 91f5d976f6020b6c
Call Trace:
 [] debug_check_no_obj_freed+0x13e/0x220
 [] __free_pages_ok+0x1f1/0x5c0
 [] __free_pages+0x25/0x40
 [] __free_slab+0x19b/0x270
 [] discard_slab+0x39/0x50
 [] __slab_free+0x207/0x270
 [] ___cache_free+0xa6/0xb0
 [] qlist_free_all+0x47/0x80
 [] quarantine_reduce+0x159/0x190
 [] kasan_kmalloc+0xaf/0xc0
 [] kasan_slab_alloc+0x12/0x20
 [] kmem_cache_alloc+0xfa/0x360
 [] ? getname_flags+0x4f/0x1f0
 [] getname_flags+0x4f/0x1f0
 [] getname+0x12/0x20
 [] do_sys_open+0xf9/0x210
 [] SyS_open+0x1e/0x20
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
Code: 7f 18 53 48 8b 55 08 48 89 f3 be 01 00 00 00 e8 3c cd 92 ff 4c 89 e7 e8 f4 0e 93 ff f6 c7 02 74 1b e8 3a ac 92 ff 48 89 df 57 9d <66> 66 90 66 90 65 ff 0d d1 ff 83 7e 5b 41 5c 5d c3 48 89 df 57

The code path might be called in either atomic or non-atomic context, so
touch the softlockup watchdog instead of calling cond_resched(), which
might sleep. However, it is unnecessary to touch the watchdog on every
loop, so just touch it every 10000 (best estimate) loops.
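
The throttling described above can be modelled in user space. What follows
is a minimal sketch, not the kernel code: touch_watchdog_stub(),
should_touch() and walk_buckets() are hypothetical names introduced here for
illustration, and the stub only counts calls where the patch would call
touch_softlockup_watchdog(). The condition mirrors the one added to
__debug_check_no_obj_freed(), with the interval and the suppress flag passed
in as parameters instead of being a constant and a static variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for touch_softlockup_watchdog(): it only counts calls. */
static unsigned long watchdog_touches;

static void touch_watchdog_stub(void)
{
	watchdog_touches++;
}

/*
 * Same shape as the condition in the patch: touch only when suppression is
 * enabled, the counter is strictly above the interval, and the counter is
 * an exact multiple of the interval.
 */
static bool should_touch(unsigned long loops, unsigned long interval, int suppress)
{
	assert(interval > 0);
	return suppress != 0 && loops > interval && (loops % interval) == 0;
}

/*
 * Illustrative model of the outer loop: chain_len[i] plays the role of the
 * per-bucket object count that the kernel adds to max_loops after walking
 * one hash chain. Returns the final loop counter.
 */
static unsigned long walk_buckets(const unsigned int *chain_len, size_t nbuckets,
				  unsigned long interval, int suppress)
{
	unsigned long max_loops = 0;

	for (size_t i = 0; i < nbuckets; i++) {
		max_loops += chain_len[i];

		if (should_touch(max_loops, interval, suppress))
			touch_watchdog_stub();
	}

	return max_loops;
}
```

With chains of length 1 the counter lands on every multiple, so 30000
iterations at an interval of 10000 touch the stub twice, at 20000 and 30000,
because the condition requires the counter to exceed the interval. Since the
counter advances by each chain length rather than by one, it can step over a
multiple without touching; feeding the sketch longer chains shows that.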

Also introduce a new knob: /sys/kernel/debug/debug_objects/suppress_lockup.
A value of 0 means the watchdog is not touched, so the softlockup message is
not suppressed; a non-zero value means the softlockup message is suppressed
by touching the watchdog. The default value is zero.

Signed-off-by: Yang Shi
CC: Waiman Long
CC: Thomas Gleixner
---
v1 --> v2:
 * Added suppress_lockup knob in debugfs per Waiman's suggestion

 lib/debugobjects.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 166488d..f009a21 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #define ODEBUG_HASH_BITS 14
 #define ODEBUG_HASH_SIZE (1 << ODEBUG_HASH_BITS)
@@ -67,6 +68,8 @@ struct debug_bucket {
 static int debug_objects_allocated;
 static int debug_objects_freed;
 
+static int suppress_lockup;
+
 static void free_obj_work(struct work_struct *work);
 static DECLARE_WORK(debug_obj_work, free_obj_work);
 
@@ -768,6 +771,10 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
 			debug_objects_maxchain = cnt;
 
 		max_loops += cnt;
+
+		if (max_loops > 10000 && ((max_loops % 10000) == 0)
+		    && suppress_lockup != 0)
+			touch_softlockup_watchdog();
 	}
 
 	if (max_loops > debug_objects_maxloops)
@@ -810,9 +817,23 @@ static int debug_stats_open(struct inode *inode, struct file *filp)
 	.release = single_release,
 };
 
+static int suppress_lockup_get(void *data, u64 *val)
+{
+	*val = (u64) suppress_lockup;
+	return 0;
+}
+
+static int suppress_lockup_set(void *data, u64 val)
+{
+	suppress_lockup = (u32) val;
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(suppress_lockup_fops,
+			 suppress_lockup_get, suppress_lockup_set, "%llu\n");
+
 static int __init debug_objects_init_debugfs(void)
 {
-	struct dentry *dbgdir, *dbgstats;
+	struct dentry *dbgdir, *dbgstats, *dbglockup;
 
 	if (!debug_objects_enabled)
 		return 0;
@@ -826,10 +847,15 @@ static int __init debug_objects_init_debugfs(void)
 	if (!dbgstats)
 		goto err;
 
+	dbglockup = debugfs_create_file_unsafe("suppress_lockup", 0644, dbgdir,
+					       NULL, &suppress_lockup_fops);
+	if (!dbglockup)
+		goto err;
+
 	return 0;
 
 err:
-	debugfs_remove(dbgdir);
+	debugfs_remove_recursive(dbgdir);
 
 	return -ENOMEM;
 }
-- 
1.8.3.1