Received: by 10.192.165.156 with SMTP id m28csp6354imm; Tue, 17 Apr 2018 05:43:34 -0700 (PDT) X-Google-Smtp-Source: AIpwx481NUziC3QRVlSFbwmXlLMnYiNXuUOiqYjyFZxRklLkmvzDXT09zNC90B1WAdh9Ju3RTiHF X-Received: by 10.98.131.69 with SMTP id h66mr1841349pfe.255.1523969014160; Tue, 17 Apr 2018 05:43:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1523969014; cv=none; d=google.com; s=arc-20160816; b=PouN9nUo5Y5YaGVJhvOm5Mhhm9fHoS2bHPJw+vd5fud1CqBZjmGgLc+teofWsYVrZU 401ZNHbZgEaRwo8mxkAnDNUP3+12fxLY+UZrNVHNSwoyAY2+JR00BPkkJ3FMisVV84Av yJ2iCL3bYM45WEpPHNMon85hN92nCuS5e00OILjlP4wgMESwTho95NaO5V+1GyuvMeot //FaHS8zVj1dIjjbtPyD9Z3HRULKPyRjd/l0z/lydDDQu6xqcU/toebxNiteGmD0Lhbj w3uOZVifX7hcMx5bPX4mUA9LHJkbnI9JjZ0SFJUF2PRlpLcvjzuFQ9oNmQncNMCI+Q+O GY5Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=nSFjpSeV50QRSZ6dthi6a0YoJLIOvxu9EZ0sYmb2qhg=; b=ER7ReyR5TtCZHyklvt2TKq10Y4prVGhpuAOMc/2ZZTrWoSMsHrzE+9+LL8ozzzgvh2 3ttT0wT24zQlrO86XyvnCxwGcsBkQRVNWzJPRj4VSqt8HtqE9UUzMZngN29btJEMk5bJ 1OZkzA6/pqIHkuSMJu9txvmSEFkt3HX0NfHjm3R6CE5jkZ4uv4CpwkH/J955fi+b7qiX AKhyqy8BYKxiSZ1SHhb7ZMcmIlKuK3dHR0cg/mM7R47OrXRVzEke5nzi+ExVhXwirEhk 4B7Cyan8F0XSKsFUc1fw4ty+1ZPRvObm/MmwTUHfxsHk/TnChdKDNMZ4yzpahZ/KPG3G xyHQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f8si11373743pgr.419.2018.04.17.05.43.20; Tue, 17 Apr 2018 05:43:34 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753015AbeDQMXA (ORCPT + 99 others); Tue, 17 Apr 2018 08:23:00 -0400 Received: from mx2.suse.de ([195.135.220.15]:41224 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752546AbeDQMW7 (ORCPT ); Tue, 17 Apr 2018 08:22:59 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 48E67AE3E; Tue, 17 Apr 2018 12:22:58 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 185F91E0531; Tue, 17 Apr 2018 14:22:58 +0200 (CEST) Date: Tue, 17 Apr 2018 14:22:58 +0200 From: Jan Kara To: Pavlos Parissis Cc: Jan Kara , mszeredi@redhat.com, amir73il@gmail.com, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: Re: kernel 4.14.x crash around fsnotify_mark_connector Message-ID: <20180417122258.gj25idq7w7xm5dgc@quack2.suse.cz> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170421 (1.8.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On Tue 17-04-18 12:39:32, Pavlos Parissis wrote: > In one of our production servers where we run kernel version 4.14.32, I noticed > the following: Thanks for the report! So I believe this is the same underlying problem as for the softlockups you were observing - there I believe the linked list just gets corrupted which causes the list iteration to never finish. But this more points to a place where could be some use-after-free issues (either in fsnotify or somewhere else). I'll look into the code. Honza > > ---- > [138630.417246] cache_from_obj: Wrong slab cache. fsnotify_mark_connector but > object is from kmalloc-2048(361:anycast-healthchecker.service) > [138630.477075] ------------[ cut here ]------------ > [138630.500028] WARNING: CPU: 14 PID: 26002 at mm/slab.h:377 > kmem_cache_free+0x133/0x1c0 > [138630.538130] Modules linked in: netconsole binfmt_misc sctp_diag sctp dccp_diag > dccp tcp_diag udp_diag inet_diag unix_diag cfg80211 rfkill 8021q garp mrp xfs loop > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd > glue_helper cryptd iTCO_wdt intel_cstate hpwdt hpilo iTCO_vendor_support sg > intel_rapl_perf ipmi_si pcspkr ioatdma shpchp lpc_ich i2c_i801 ipmi_devintf dca > mfd_core wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables > ext4 mbcache jbd2 i2c_algo_bit sd_mod drm_kms_helper syscopyarea sysfillrect > sysimgblt fb_sys_fops ttm bnx2x serio_raw mdio libcrc32c crc32c_intel hpsa ptp drm > scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod dax > [138630.871763] CPU: 14 PID: 26002 Comm: kworker/u66:3 Not tainted > 4.14.32-1.el7.x86_64 #1 > [138630.910864] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/25/2017 > [138630.943498] Workqueue: events_unbound fsnotify_connector_destroy_workfn > [138630.976569] task: ffff88203b62ae80 task.stack: ffffc90008a20000 > [138631.005895] RIP: 0010:kmem_cache_free+0x133/0x1c0 > [138631.029230] RSP: 0018:ffffc90008a23e20 EFLAGS: 00010246 > [138631.055217] RAX: 000000000000007c RBX: ffff882000000000 RCX: 0000000000000000 > [138631.091346] RDX: 0000000000000000 RSI: ffff88203f3969d8 RDI: ffff88203f3969d8 > [138631.127139] RBP: ffffc90008a23e38 R08: 0000000000000001 R09: 000000000000057d > [138631.162575] R10: ffff88102b906cf0 R11: ffff88102b913ea0 R12: ffff88103f579980 > [138631.198018] R13: ffff88203bd57c00 R14: ffff88203bd63f00 R15: ffffffff82128388 > [138631.233371] FS: 0000000000000000(0000) GS:ffff88203f380000(0000) > knlGS:0000000000000000 > [138631.273426] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [138631.301980] CR2: 0000000003600fd8 CR3: 000000000200a006 CR4: 00000000003606e0 > [138631.338783] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [138631.374525] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [138631.409797] Call Trace: > [138631.422387] fsnotify_connector_destroy_workfn+0x68/0x80 > [138631.448516] process_one_work+0x15c/0x380 > [138631.468352] worker_thread+0x4d/0x3e0 > [138631.486705] kthread+0x10c/0x140 > [138631.503091] ? max_active_store+0x80/0x80 > [138631.523217] ? __kthread_parkme+0x70/0x70 > [138631.543338] ? do_syscall_64+0x79/0x1b0 > [138631.562518] ret_from_fork+0x35/0x40 > [138631.580199] Code: 4c 3b a7 d8 00 00 00 0f 84 11 ff ff ff 48 8b 4f 60 49 8b 54 > 24 60 48 c7 c6 c0 19 c3 81 48 c7 c7 c0 9c e7 81 31 c0 e8 90 a5 eb ff <0f> 0b4c 89 > e7 e9 e9 fe ff ff 65 8b 05 2c 3c de 7e 89 c0 48 0f > [138631.672489] ---[ end trace 6748d86d682915c2 ]--- > [138631.695691] cache_from_obj: Wrong slab cache. fsnotify_mark_connector but > object is from kmalloc-2048(361:anycast-healthchecker.service) > --- > > The kernel was fine and server was responsive. > A coworker mentioned that it may come from > https://github.com/torvalds/linux/blob/master/fs/notify/mark.c#L160 > and commit > 08991e83b728 ("fsnotify: Free fsnotify_mark_connector when there is no mark > attached") arrived with 4.12-rc1 may be the cause of it. > > I am wondering if it is related to the issue I reported > https://lkml.org/lkml/2018/4/16/506. But, that issue causes the whole server to > crash, which is not the case for above kernel dump. > > > Cheers, > Pavlos > -- Jan Kara SUSE Labs, CR