Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754770AbbLJJ2J (ORCPT ); Thu, 10 Dec 2015 04:28:09 -0500
Received: from mail-wm0-f53.google.com ([74.125.82.53]:33161 "EHLO
        mail-wm0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751092AbbLJJ2G (ORCPT );
        Thu, 10 Dec 2015 04:28:06 -0500
Subject: Re: corruption causing crash in __queue_work
To: Tejun Heo
References: <566819D8.5090804@kyup.com>
        <20151209160803.GK30240@mtj.duckdns.org>
        <56685573.1020805@kyup.com>
        <20151209162744.GN30240@mtj.duckdns.org>
Cc: "Linux-Kernel@Vger. Kernel. Org", SiteGround Operations
From: Nikolay Borisov
Message-ID: <566945A2.1050208@kyup.com>
Date: Thu, 10 Dec 2015 11:28:02 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <20151209162744.GN30240@mtj.duckdns.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6230
Lines: 85

On 12/09/2015 06:27 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, Dec 09, 2015 at 06:23:15PM +0200, Nikolay Borisov wrote:
>> I think we are seeing this at least daily on at least 1 server (we have
>> multiple servers like that). So adding printk's would likely be the way
>> to go, anything in particular you might be interested in knowing? I see
>> RCU stuff around so might be tricky race condition.
>
> Printing out the workqueue's pointer, name, pwq's pointer, the node
> being installed for and the installed pointer should give us enough
> clues. There's RCU involved but the pointers shouldn't be becoming
> NULLs unless we're installing NULL ptrs.
The debug patch has been rolled out on one server, and several more are
in the process of getting it. Here is what it prints:

WQ: ffff88046f00ba00 (events_unbound) old_pwq: (null) new_pwq: ffff88046f00d300 node: 0
WQ: ffff88046f00be00 (events_power_efficient) old_pwq: (null) new_pwq: ffff88046f00d400 node: 0
WQ: ffff88046d71c000 (events_freezable_power_) old_pwq: (null) new_pwq: ffff88046f00d500 node: 0
WQ: ffff88046ce9ca00 (khelper) old_pwq: (null) new_pwq: ffff88046f00d600 node: 0
WQ: ffff88046ce9c000 (netns) old_pwq: (null) new_pwq: ffff88046f00d700 node: 0
WQ: ffff88046ce9d400 (perf) old_pwq: (null) new_pwq: ffff88046f00d800 node: 0
WQ: ffff88046c408000 (writeback) old_pwq: (null) new_pwq: ffff88046c800000 node: 0
WQ: ffff88046c409200 (kacpi_hotplug) old_pwq: (null) new_pwq: ffff88046c42e200 node: 0
WQ: ffff880468455600 (scsi_tmf_0) old_pwq: (null) new_pwq: ffff88046c801f00 node: 0
WQ: ffff8804687f4400 (scsi_tmf_1) old_pwq: (null) new_pwq: ffff88046caa6700 node: 0
WQ: ffff8804687f4c00 (scsi_tmf_2) old_pwq: (null) new_pwq: ffff88046caa6900 node: 0
WQ: ffff8804687f5400 (scsi_tmf_3) old_pwq: (null) new_pwq: ffff88046caa6b00 node: 0
WQ: ffff8804687f5c00 (scsi_tmf_4) old_pwq: (null) new_pwq: ffff88046caa6d00 node: 0
WQ: ffff8804687f6400 (scsi_tmf_5) old_pwq: (null) new_pwq: ffff88046caa7000 node: 0
WQ: ffff8804687f6c00 (scsi_tmf_6) old_pwq: (null) new_pwq: ffff88046caa7300 node: 0
WQ: ffff880467964000 (kdmremove) old_pwq: (null) new_pwq: ffff880467a3c800 node: 0
WQ: ffff880467965000 (deferwq) old_pwq: (null) new_pwq: ffff880467a3c100 node: 0
WQ: ffff8804669bc600 (ib_addr) old_pwq: (null) new_pwq: ffff88046845a600 node: 0
WQ: ffff88007d167e00 (qib0_0) old_pwq: (null) new_pwq: ffff880466c19800 node: 0
WQ: ffff88007d165a00 (qib0_1) old_pwq: (null) new_pwq: ffff880466c18e00 node: 0
WQ: ffff88007d165200 (ib_mad1) old_pwq: (null) new_pwq: ffff880466c19d00 node: 0
WQ: ffff8804665d2000 (ib_mad2) old_pwq: (null) new_pwq: ffff880466c18a00 node: 0
WQ: ffff8804667d7600 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880469806100 node: 0
WQ: ffff880079a9fc00 (edac-poller) old_pwq: (null) new_pwq: ffff88007d5ebf00 node: 0
WQ: ffff88046b47cc00 (kvm-irqfd-cleanup) old_pwq: (null) new_pwq: ffff8804651f0f00 node: 0
WQ: ffff8804694baa00 (kloopd0) old_pwq: (null) new_pwq: ffff88046949d100 node: 0
WQ: ffff880079a9cc00 (kloopd1) old_pwq: (null) new_pwq: ffff8804698cb900 node: 0
WQ: ffff88046809dc00 (kloopd2) old_pwq: (null) new_pwq: ffff88046957aa00 node: 0
WQ: ffff88046809c000 (kloopd3) old_pwq: (null) new_pwq: ffff8804650acc00 node: 0
WQ: ffff880466f3b000 (kloopd4) old_pwq: (null) new_pwq: ffff880469575900 node: 0
WQ: ffff88046809e800 (kloopd5) old_pwq: (null) new_pwq: ffff880469888200 node: 0
WQ: ffff88046809de00 (kloopd6) old_pwq: (null) new_pwq: ffff880469827400 node: 0
WQ: ffff88007d5f1c00 (dm_bufio_cache) old_pwq: (null) new_pwq: ffff8804673dda00 node: 0
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff880079955100 node: 0
WQ: ffff8804672d0800 (dm-thin) old_pwq: (null) new_pwq: ffff88046baed800 node: 0
WQ: ffff88046993fa00 (dm-thin) old_pwq: (null) new_pwq: ffff8804650ff100 node: 0
WQ: ffff88046993d400 (dm-thin) old_pwq: (null) new_pwq: ffff88046949d600 node: 0
WQ: ffff88046993e400 (dm-thin) old_pwq: (null) new_pwq: ffff88046b833000 node: 0
WQ: ffff880466466400 (dm-thin) old_pwq: (null) new_pwq: ffff88007da60d00 node: 0
WQ: ffff88046b3eb200 (dm-thin) old_pwq: (null) new_pwq: ffff88046633d200 node: 0
WQ: ffff8804672d0600 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880079955400 node: 0
WQ: ffff88046b3eb600 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880465684900 node: 0
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff8800799ee900 node: 0
WQ: ffff880466f39a00 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880469849e00 node: 0
WQ: ffff880467b0cc00 (dm-thin) old_pwq: (null) new_pwq: ffff88007d52fa00 node: 0
WQ: ffff8804672d4e00 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff88046ca07f00 node: 0
WQ: ffff880079a9ca00 (dm-thin) old_pwq: (null) new_pwq: ffff8802d1be9e00 node: 0
WQ: ffff880466175000 (dm-thin) old_pwq: (null) new_pwq: ffff8802d8efec00 node: 0
WQ: ffff880403f28400 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff8802e224dd00 node: 0
WQ: ffff880403f29a00 (dm-thin) old_pwq: (null) new_pwq: ffff880465685300 node: 0
WQ: ffff8804672d6c00 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff880466d69300 node: 0
WQ: ffff880466f3ba00 (dm-thin) old_pwq: (null) new_pwq: ffff880469576500 node: 0
WQ: ffff8804672d4600 (dm-thin) old_pwq: (null) new_pwq: ffff8802d1a1ee00 node: 0
WQ: ffff8803ccf5c200 (ext4-rsv-conversion) old_pwq: (null) new_pwq: ffff8804657b3200 node: 0

Is this format OK? I have also observed the exact same crash on a
machine running a 4.1.12 kernel.

>
> Thanks.
>
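
For anyone wanting to sift through this kind of output offline, here is a
minimal userspace sketch in Python. It is not part of the debug patch (which
is not shown in this message); the record format is inferred only from the
lines printed above, and the names RECORD, parse_records, repeated_workqueues
and null_installs are made up for illustration. It parses the records,
groups them by workqueue address, and lists any record where the installed
pointer is NULL, which is the case Tejun mentions. The sample lines are
copied from the output above, with one record deliberately wrapped to show
that line breaks inside a record are tolerated.

```python
import re
from collections import defaultdict

# One record of the debug output, as inferred from the lines in this mail:
# "WQ: <wq ptr> (<name>) old_pwq: <ptr|(null)> new_pwq: <ptr|(null)> node: <n>"
RECORD = re.compile(
    r"WQ:\s+(?P<wq>[0-9a-f]+)\s+\((?P<name>[^)]*)\)\s+"
    r"old_pwq:\s+(?P<old>\(null\)|[0-9a-f]+)\s+"
    r"new_pwq:\s+(?P<new>\(null\)|[0-9a-f]+)\s+"
    r"node:\s+(?P<node>-?\d+)"
)

# Three records copied from the output above; the second one is wrapped.
SAMPLE = """
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff880079955100 node: 0
WQ: ffff88046b3eb600 (ext4-rsv-conversion)
old_pwq: (null) new_pwq: ffff880465684900 node: 0
WQ: ffff88046c42a400 (dm-thin) old_pwq: (null) new_pwq: ffff8800799ee900 node: 0
"""


def parse_records(text):
    """Return one dict per record; "(null)" pointers become None."""
    records = []
    for match in RECORD.finditer(text):
        rec = match.groupdict()
        rec["old"] = None if rec["old"] == "(null)" else rec["old"]
        rec["new"] = None if rec["new"] == "(null)" else rec["new"]
        rec["node"] = int(rec["node"])
        records.append(rec)
    return records


def repeated_workqueues(records):
    """Map each workqueue address seen more than once to its records."""
    by_wq = defaultdict(list)
    for rec in records:
        by_wq[rec["wq"]].append(rec)
    return {wq: recs for wq, recs in by_wq.items() if len(recs) > 1}


def null_installs(records):
    """Records where a NULL pwq pointer was installed."""
    return [rec for rec in records if rec["new"] is None]


if __name__ == "__main__":
    recs = parse_records(SAMPLE)
    for wq, group in repeated_workqueues(recs).items():
        print(wq, [r["new"] for r in group])
    print("NULL installs:", null_installs(recs))
```

A workqueue address showing up more than once is not necessarily a problem
in itself; the grouping is only meant to make such records easy to find and
compare against the crash data.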