Date: Wed, 11 Jun 2014 12:32:29 -0400
From: Vivek Goyal
To: Joe Lawrence
Cc: linux-kernel@vger.kernel.org, Tejun Heo, Cgroups
Subject: Re: docker crashes rcuos in __blkg_release_rcu
Message-ID: <20140611163229.GA12974@redhat.com>
References: <20140609174708.GA31499@redhat.com> <20140609182728.GB31499@redhat.com> <20140610143906.0d2f35d0@jlaw-desktop.mno.stratus.com>
In-Reply-To: <20140610143906.0d2f35d0@jlaw-desktop.mno.stratus.com>

On Tue, Jun 10, 2014 at 02:39:06PM -0400, Joe Lawrence wrote:
>
> Hi Vivek,
>
> Thanks for taking a look. For extra debugging, I wrote a quick set of
> kprobes that:
>
>   1 - On blkg_alloc entry, save the request_queue's kobj address in a
>       list
>   2 - On kobject_put entry, dump the stack if the kobj is found in that
>       list
>
> and this was the trace for the final kobject put for the
> request_queue before a crash:
>
> JL: kobject_put kobj(queue) @ ffff88084d89c9e8, refcount=1
> ------------[ cut here ]------------
> WARNING: CPU: 27 PID: 11060 at /h/jlawrenc/kprobes/docker/probes_blk.c:166 kret_entry_kobject_put+0x47/0x50 [docker_debug]()
> [ ... snip modules ... ]
> CPU: 27 PID: 11060 Comm: docker Tainted: G W OE 3.15.0 #1
> Hardware name: Stratus ftServer 6400/G7LAZ, BIOS BIOS Version 6.3:57 12/25/2013
>  0000000000000000 0000000093cbdc81 ffff88104196fae8 ffffffff8162738d
>  0000000000000000 ffff88104196fb20 ffffffff8106d81d ffff88084d89c9e8
>  ffff881041912cd0 ffffffffa0181020 ffff88104196fbe0 ffffffffa01810c8
> Call Trace:
>  [] dump_stack+0x45/0x56
>  [] warn_slowpath_common+0x7d/0xa0
>  [] warn_slowpath_null+0x1a/0x20
>  [] kret_entry_kobject_put+0x47/0x50 [docker_debug]
>  [] pre_handler_kretprobe+0x9e/0x1c0
>  [] opt_pre_handler+0x4f/0x90
>  [] optimized_callback+0x97/0xb0
>  [] ? kobject_put+0x1/0x60
>  [] ? blk_cleanup_queue+0x101/0x1a0
>  [] ? __dm_destroy+0x1db/0x260 [dm_mod]
>  [] ? dm_destroy+0x13/0x20 [dm_mod]
>  [] ? dev_remove+0x11e/0x180 [dm_mod]
>  [] ? dev_suspend+0x250/0x250 [dm_mod]
>  [] ? ctl_ioctl+0x255/0x500 [dm_mod]
>  [] ? do_wp_page+0x38f/0x750
>  [] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
>  [] ? do_vfs_ioctl+0x2e0/0x4a0
>  [] ? file_has_perm+0xa6/0xb0
>  [] ? SyS_ioctl+0x81/0xa0
>  [] ? system_call_fastpath+0x16/0x1b
> ---[ end trace b4b8112437afdac8 ]---
>
> so I think when dm_destroy() is called, it leads to the request_queue
> in question going away.
>
> > I am wondering if we need to take a reference on the queue
> > (blk_get_queue()) in blkg_alloc(), to make sure request queue is
> > still around when blkg is being freed.
>
> I experimented with this and the crash does go away (and the docker
> invocation completes successfully). I wasn't sure where the
> accompanying blk_put_queue() should go. If I put it in blkg_free, the
> kref accounting doesn't seem to even out, ie they never fall to zero.

CC cgroups list.

Ok, I think I figured out why reference counting does not seem to even
out. There are two ways to destroy blkg. Either device goes away and
blk_release_queue() will take care of removing blkg or cgroup is
deleted and that will take care of cleaning up blkg.
I think the only exception is the root blkg: the root cgroup cannot be
deleted, so the root blkg is cleaned up only when the request queue
goes away.

Now if a blkg holds a reference to the queue, then blk_release_queue()
never gets called, and the root blkg can't be cleaned up until the
queue goes away. So this is a chicken-and-egg situation. Even a
non-root blkg will not be cleaned up until its cgroup goes away.

Tejun, any thoughts on how to solve this issue? Delaying the blkg
release to RCU context and then expecting the queue to still be present
is what causes this problem.

Thanks
Vivek
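
To make the chicken-and-egg point concrete, here is a minimal userspace
sketch of the reference cycle. It is a toy model only: the toy_* structs
and functions are hypothetical stand-ins, not the kernel's request_queue
or blkcg_gq code, and it assumes the queue starts with a single owner
reference. It models just the refcount deadlock, not the RCU
use-after-free itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; not the kernel's definitions. */
struct toy_blkg;

struct toy_queue {
	int refs;                   /* stands in for the queue kobject refcount */
	bool released;              /* true once the release path has run */
	struct toy_blkg *root_blkg; /* torn down only from the release path */
};

struct toy_blkg {
	struct toy_queue *q;
	bool holds_queue_ref;       /* models blk_get_queue() in blkg_alloc() */
	bool freed;
};

static void toy_queue_put(struct toy_queue *q);

/* Models blkg_free() with the experimental blk_put_queue() added. */
static void toy_blkg_free(struct toy_blkg *blkg)
{
	blkg->freed = true;
	if (blkg->holds_queue_ref) {
		blkg->holds_queue_ref = false;
		toy_queue_put(blkg->q);
	}
}

/* Dropping the last reference runs the release path, which is the
 * only place the root blkg gets cleaned up. */
static void toy_queue_put(struct toy_queue *q)
{
	assert(q->refs > 0);
	if (--q->refs == 0) {
		q->released = true;
		if (q->root_blkg && !q->root_blkg->freed)
			toy_blkg_free(q->root_blkg);
	}
}

static void toy_blkg_alloc(struct toy_blkg *blkg, struct toy_queue *q,
			   bool pin_queue)
{
	blkg->q = q;
	blkg->freed = false;
	blkg->holds_queue_ref = pin_queue;
	if (pin_queue)
		q->refs++;          /* the extra reference under discussion */
	q->root_blkg = blkg;
}

/* Returns true if the device owner's final put tears everything down. */
static bool toy_teardown_completes(bool pin_queue)
{
	struct toy_queue q = { .refs = 1, .released = false, .root_blkg = NULL };
	struct toy_blkg root;

	toy_blkg_alloc(&root, &q, pin_queue);
	toy_queue_put(&q);          /* e.g. the put seen under dm_destroy() */
	return q.released && root.freed;
}
```

In this model toy_teardown_completes(false) returns true, while
toy_teardown_completes(true) returns false: once the root blkg pins the
queue, the owner's put leaves the count at one, the release path never
runs, and nothing is left to free the root blkg.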