Date: Tue, 18 Jun 2013 08:10:14 -0700 (PDT)
From: Sage Weil
To: majianpeng
cc: ceph-devel, linux-kernel
Subject: Re: [PATCH] ceph: fix sleeping function called from invalid context.
In-Reply-To: <201306181930448773810@gmail.com>
References: <201306181930448773810@gmail.com>

On Tue, 18 Jun 2013, majianpeng wrote:
> [ 1121.231883] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
> [ 1121.231935] in_atomic(): 1, irqs_disabled(): 0, pid: 9831, name: mv
> [ 1121.231971] 1 lock held by mv/9831:
> [ 1121.231973] #0: (&(&ci->i_ceph_lock)->rlock){+.+...}, at:[] ceph_getxattr+0x58/0x1d0 [ceph]
> [ 1121.231998] CPU: 3 PID: 9831 Comm: mv Not tainted 3.10.0-rc6+ #215
> [ 1121.232000] Hardware name: To Be Filled By O.E.M.
> To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
> [ 1121.232027] ffff88006d355a80 ffff880092f69ce0 ffffffff8168348c ffff880092f69cf8
> [ 1121.232045] ffffffff81070435 ffff88006d355a20 ffff880092f69d20 ffffffff816899ba
> [ 1121.232052] 0000000300000004 ffff8800b76911d0 ffff88006d355a20 ffff880092f69d68
> [ 1121.232056] Call Trace:
> [ 1121.232062] [] dump_stack+0x19/0x1b
> [ 1121.232067] [] __might_sleep+0xe5/0x110
> [ 1121.232071] [] down_read+0x2a/0x98
> [ 1121.232080] [] ceph_vxattrcb_layout+0x60/0xf0 [ceph]
> [ 1121.232088] [] ceph_getxattr+0x9f/0x1d0 [ceph]
> [ 1121.232093] [] vfs_getxattr+0xa8/0xd0
> [ 1121.232097] [] getxattr+0xab/0x1c0
> [ 1121.232100] [] ? final_putname+0x22/0x50
> [ 1121.232104] [] ? kmem_cache_free+0xb0/0x260
> [ 1121.232107] [] ? final_putname+0x22/0x50
> [ 1121.232110] [] ? trace_hardirqs_on+0xd/0x10
> [ 1121.232114] [] ? sysret_check+0x1b/0x56
> [ 1121.232120] [] SyS_fgetxattr+0x6c/0xc0
> [ 1121.232125] [] system_call_fastpath+0x16/0x1b
> [ 1121.232129] BUG: scheduling while atomic: mv/9831/0x10000002
> [ 1121.232154] 1 lock held by mv/9831:
> [ 1121.232156] #0: (&(&ci->i_ceph_lock)->rlock){+.+...}, at:
> [] ceph_getxattr+0x58/0x1d0 [ceph]
>
> I think move the ci->i_ceph_lock down is safe because we can't free
> ceph_inode_info at there.
>
> Signed-off-by: Jianpeng Ma
> ---
> fs/ceph/xattr.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> index 9b6b2b6..4efde06 100644
> --- a/fs/ceph/xattr.c
> +++ b/fs/ceph/xattr.c
> @@ -675,7 +675,6 @@ ssize_t ceph_getxattr(struct dentry *dentry, const char *name, void *value,
>          if (!ceph_is_valid_xattr(name))
>                  return -ENODATA;
>
> -        spin_lock(&ci->i_ceph_lock);
>          dout("getxattr %p ver=%lld index_ver=%lld\n", inode,
>               ci->i_xattrs.version, ci->i_xattrs.index_version);

Unfortunately these intervening lines need i_ceph_lock to prevent the
i_xattrs struct contents from shifting underneath us.
It is more expensive for the general getxattr case, but a simpler fix is
to take map_sem outside of i_ceph_lock.

I think the best solution would be to pass an argument to the vxattrcb
callbacks indicating whether map_sem is held. If the callback needs to
look at the map (for the layout xattr) and it isn't held yet, return
EAGAIN and have the caller take the lock and retry.

Alternatively, it could check the xattr name in the caller and decide
whether to take the lock, although that is a bit less elegant and
maintainable.

Either way, I think the right solution here is conditionally taking
map_sem in ceph_getxattr...

sage

>
> @@ -683,9 +682,10 @@ ssize_t ceph_getxattr(struct dentry *dentry, const char *name, void *value,
>          vxattr = ceph_match_vxattr(inode, name);
>          if (vxattr && !(vxattr->exists_cb && !vxattr->exists_cb(ci))) {
>                  err = vxattr->getxattr_cb(ci, value, size);
> -                goto out;
> +                return err;
>          }
>
> +        spin_lock(&ci->i_ceph_lock);
>          if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1) &&
>              (ci->i_xattrs.index_version >= ci->i_xattrs.version)) {
>                  goto get_xattr;
> --
> 1.8.3.rc1.44.gb387c77
>
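
For illustration only, the EAGAIN-and-retry idea from the reply above
might look roughly like the userspace toy model below. This is not kernel
code and not the fix that was eventually merged: every toy_* name, the
map_sem_held parameter, and the returned strings are hypothetical
stand-ins, and the two locks are modelled as plain flags so that the
control flow can be compiled and exercised on its own. The assert in
toy_down_read() plays the role of __might_sleep() in the trace: it trips
if the sleeping lock is taken while the spinlock is held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Toy stand-ins for the kernel locks; all names here are hypothetical. */
static bool spin_held;   /* models ci->i_ceph_lock (a spinlock) */
static int map_readers;  /* models read holders of map_sem (an rwsem) */

static void toy_spin_lock(void)   { assert(!spin_held); spin_held = true; }
static void toy_spin_unlock(void) { assert(spin_held); spin_held = false; }

/* A sleeping lock must not be taken while the spinlock is held. */
static void toy_down_read(void) { assert(!spin_held); map_readers++; }
static void toy_up_read(void)   { assert(map_readers > 0); map_readers--; }

/* Hypothetical callback type: the extra flag says whether map_sem is held. */
typedef ssize_t (*toy_vxattr_cb)(char *val, size_t size, bool map_sem_held);

/* Needs the map: asks the caller to retry with map_sem held. */
static ssize_t toy_layout_cb(char *val, size_t size, bool map_sem_held)
{
    if (!map_sem_held)
        return -EAGAIN;
    return snprintf(val, size, "pool=%s", "data"); /* made-up value */
}

/* Does not need the map: ignores the flag. */
static ssize_t toy_simple_cb(char *val, size_t size, bool map_sem_held)
{
    (void)map_sem_held;
    return snprintf(val, size, "%d", 42); /* made-up value */
}

/* Caller: take map_sem only if a callback asks for it, outside the spinlock. */
static ssize_t toy_getxattr(toy_vxattr_cb cb, char *val, size_t size)
{
    bool map_sem_held = false;
    ssize_t err;

retry:
    toy_spin_lock();
    err = cb(val, size, map_sem_held);
    toy_spin_unlock();

    if (err == -EAGAIN && !map_sem_held) {
        /* Spinlock is dropped, so sleeping here is allowed. */
        toy_down_read();
        map_sem_held = true;
        goto retry;
    }
    if (map_sem_held)
        toy_up_read();
    return err;
}
```

In this model the common case (toy_simple_cb) never touches map_sem,
while the layout-style callback costs one extra pass through the
spinlock. Calling toy_getxattr(toy_layout_cb, buf, sizeof(buf)) makes two
passes and leaves both locks released on return.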