With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
see a hang when accessing some NFS exported XFS filesystems. Local
access to these filesystems ahead of time works without problems.
This does not occur with 2.6.23.1. The filesystem does not appear to
be corrupt.
The call chain for the wedged process is:
[ 1462.911256] nfsd D ffffffff80547840 4760 2966 2
[ 1462.911283] ffff81010414d4d0 0000000000000046 0000000000000000 ffff81010414d610
[ 1462.911322] ffff810104cbc6e0 ffff81010414d480 ffffffff80746dc0 ffffffff80746dc0
[ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
[ 1462.911391] Call Trace:
[ 1462.911417] [<ffffffff8052e638>] __down+0xe9/0x101
[ 1462.911437] [<ffffffff8022cc80>] default_wake_function+0x0/0xe
[ 1462.911458] [<ffffffff8052e275>] __down_failed+0x35/0x3a
[ 1462.911480] [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
[ 1462.911501] [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
[ 1462.911522] [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45
[ 1462.911543] [<ffffffff8035ad5b>] _xfs_buf_find+0x1ba/0x24d
[ 1462.911564] [<ffffffff8035ae48>] xfs_buf_get_flags+0x5a/0x14b
[ 1462.911586] [<ffffffff8035b490>] xfs_buf_read_flags+0x12/0x86
[ 1462.911607] [<ffffffff8034ecf6>] xfs_trans_read_buf+0x4c/0x2cf
[ 1462.911629] [<ffffffff803292be>] xfs_da_do_buf+0x41b/0x65b
[ 1462.911652] [<ffffffff80329568>] xfs_da_read_buf+0x24/0x29
[ 1462.911673] [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
[ 1462.911694] [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
[ 1462.911717] [<ffffffff8032c718>] xfs_dir2_block_lookup+0x15/0x8e
[ 1462.911738] [<ffffffff8032b8e1>] xfs_dir_lookup+0xd2/0x12c
[ 1462.911761] [<ffffffff8036d658>] submit_bio+0x10d/0x114
[ 1462.911781] [<ffffffff8034fb56>] xfs_dir_lookup_int+0x2c/0xc5
[ 1462.911802] [<ffffffff802507a2>] lockdep_init_map+0x90/0x495
[ 1462.911823] [<ffffffff80353436>] xfs_lookup+0x44/0x6f
[ 1462.911843] [<ffffffff8035e364>] xfs_vn_lookup+0x29/0x60
[ 1462.915246] [<ffffffff8028856c>] __lookup_hash+0xe5/0x109
[ 1462.915267] [<ffffffff802893dd>] lookup_one_len+0x41/0x4e
[ 1462.915289] [<ffffffff80303d05>] compose_entry_fh+0xc1/0x117
[ 1462.915311] [<ffffffff80303f4c>] encode_entry+0x17c/0x38b
[ 1462.915333] [<ffffffff80261e4e>] find_or_create_page+0x3f/0xc9
[ 1462.915355] [<ffffffff8035a2c0>] _xfs_buf_lookup_pages+0x2c1/0x2f6
[ 1462.915377] [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
[ 1462.915399] [<ffffffff8027e632>] cache_alloc_refill+0x1ba/0x4b9
[ 1462.915424] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
[ 1462.915448] [<ffffffff8030416b>] nfs3svc_encode_entry_plus+0x10/0x13
[ 1462.915469] [<ffffffff8032c67c>] xfs_dir2_block_getdents+0x15b/0x1e2
[ 1462.915491] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
[ 1462.915514] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
[ 1462.915534] [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
[ 1462.915557] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
[ 1462.915579] [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
[ 1462.915599] [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
[ 1462.915619] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
[ 1462.915642] [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
[ 1462.915663] [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
[ 1462.915686] [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
[ 1462.915706] [<ffffffff805215cd>] svc_process+0x3f8/0x717
[ 1462.915729] [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
[ 1462.915749] [<ffffffff8020c648>] child_rip+0xa/0x12
[ 1462.915769] [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
[ 1462.915792] [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
[ 1462.915812] [<ffffffff8020c63e>] child_rip+0x0/0x12
Over time other processes pile up behind this.
[ 1462.910728] nfsd D ffffffffffffffff 5440 2965 2
[ 1462.910769] ffff8101040cdd40 0000000000000046 0000000000000001 ffff810103471900
[ 1462.910812] ffff8101029a72c0 ffff8101040cdcf0 ffffffff80746dc0 ffffffff80746dc0
[ 1462.910852] ffffffff80744020 ffffffff80746dc0 ffff81010008e0c0 ffff8101012a1040
[ 1462.910882] Call Trace:
[ 1462.910909] [<ffffffff802fbadf>] nfsd_permission+0x95/0xeb
[ 1462.910931] [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
[ 1462.910950] [<ffffffff8052d729>] mutex_lock_nested+0x165/0x27c
[ 1462.910971] [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
[ 1462.910994] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
[ 1462.911015] [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
[ 1462.911037] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
[ 1462.911057] [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
[ 1462.911079] [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
[ 1462.911102] [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
[ 1462.911122] [<ffffffff805215cd>] svc_process+0x3f8/0x717
[ 1462.911143] [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
[ 1462.911165] [<ffffffff8020c648>] child_rip+0xa/0x12
[ 1462.911184] [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
[ 1462.911206] [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
[ 1462.911225] [<ffffffff8020c63e>] child_rip+0x0/0x12
Any suggestions other than to bisect this? (Bisection might be
painful as it crosses the x86-merge.)
I wonder if this is a similar hang to what Christian was seeing here:
http://lkml.org/lkml/2007/11/13/319
Benny
On Nov. 14, 2007, 9:04 +0200, Chris Wedgwood <[email protected]> wrote:
> With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> see a hang when accessing some NFS exported XFS filesystems. Local
> access to these filesystems ahead of time works without problems.
>
> This does not occur with 2.6.23.1. The filesystem does not appear to
> be corrupt.
>
>
> The call chain for the wedged process is:
>
> [ 1462.911256] nfsd D ffffffff80547840 4760 2966 2
> [ 1462.911283] ffff81010414d4d0 0000000000000046 0000000000000000 ffff81010414d610
> [ 1462.911322] ffff810104cbc6e0 ffff81010414d480 ffffffff80746dc0 ffffffff80746dc0
> [ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
> [ 1462.911391] Call Trace:
> [ 1462.911417] [<ffffffff8052e638>] __down+0xe9/0x101
> [ 1462.911437] [<ffffffff8022cc80>] default_wake_function+0x0/0xe
> [ 1462.911458] [<ffffffff8052e275>] __down_failed+0x35/0x3a
> [ 1462.911480] [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
> [ 1462.911501] [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
> [ 1462.911522] [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45
> [ 1462.911543] [<ffffffff8035ad5b>] _xfs_buf_find+0x1ba/0x24d
> [ 1462.911564] [<ffffffff8035ae48>] xfs_buf_get_flags+0x5a/0x14b
> [ 1462.911586] [<ffffffff8035b490>] xfs_buf_read_flags+0x12/0x86
> [ 1462.911607] [<ffffffff8034ecf6>] xfs_trans_read_buf+0x4c/0x2cf
> [ 1462.911629] [<ffffffff803292be>] xfs_da_do_buf+0x41b/0x65b
> [ 1462.911652] [<ffffffff80329568>] xfs_da_read_buf+0x24/0x29
> [ 1462.911673] [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
> [ 1462.911694] [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
> [ 1462.911717] [<ffffffff8032c718>] xfs_dir2_block_lookup+0x15/0x8e
> [ 1462.911738] [<ffffffff8032b8e1>] xfs_dir_lookup+0xd2/0x12c
> [ 1462.911761] [<ffffffff8036d658>] submit_bio+0x10d/0x114
> [ 1462.911781] [<ffffffff8034fb56>] xfs_dir_lookup_int+0x2c/0xc5
> [ 1462.911802] [<ffffffff802507a2>] lockdep_init_map+0x90/0x495
> [ 1462.911823] [<ffffffff80353436>] xfs_lookup+0x44/0x6f
> [ 1462.911843] [<ffffffff8035e364>] xfs_vn_lookup+0x29/0x60
> [ 1462.915246] [<ffffffff8028856c>] __lookup_hash+0xe5/0x109
> [ 1462.915267] [<ffffffff802893dd>] lookup_one_len+0x41/0x4e
> [ 1462.915289] [<ffffffff80303d05>] compose_entry_fh+0xc1/0x117
> [ 1462.915311] [<ffffffff80303f4c>] encode_entry+0x17c/0x38b
> [ 1462.915333] [<ffffffff80261e4e>] find_or_create_page+0x3f/0xc9
> [ 1462.915355] [<ffffffff8035a2c0>] _xfs_buf_lookup_pages+0x2c1/0x2f6
> [ 1462.915377] [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
> [ 1462.915399] [<ffffffff8027e632>] cache_alloc_refill+0x1ba/0x4b9
> [ 1462.915424] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> [ 1462.915448] [<ffffffff8030416b>] nfs3svc_encode_entry_plus+0x10/0x13
> [ 1462.915469] [<ffffffff8032c67c>] xfs_dir2_block_getdents+0x15b/0x1e2
> [ 1462.915491] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> [ 1462.915514] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> [ 1462.915534] [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
> [ 1462.915557] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> [ 1462.915579] [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
> [ 1462.915599] [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
> [ 1462.915619] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> [ 1462.915642] [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
> [ 1462.915663] [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
> [ 1462.915686] [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
> [ 1462.915706] [<ffffffff805215cd>] svc_process+0x3f8/0x717
> [ 1462.915729] [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
> [ 1462.915749] [<ffffffff8020c648>] child_rip+0xa/0x12
> [ 1462.915769] [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
> [ 1462.915792] [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
> [ 1462.915812] [<ffffffff8020c63e>] child_rip+0x0/0x12
>
> Over time other processes pile up behind this.
>
> [ 1462.910728] nfsd D ffffffffffffffff 5440 2965 2
> [ 1462.910769] ffff8101040cdd40 0000000000000046 0000000000000001 ffff810103471900
> [ 1462.910812] ffff8101029a72c0 ffff8101040cdcf0 ffffffff80746dc0 ffffffff80746dc0
> [ 1462.910852] ffffffff80744020 ffffffff80746dc0 ffff81010008e0c0 ffff8101012a1040
> [ 1462.910882] Call Trace:
> [ 1462.910909] [<ffffffff802fbadf>] nfsd_permission+0x95/0xeb
> [ 1462.910931] [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
> [ 1462.910950] [<ffffffff8052d729>] mutex_lock_nested+0x165/0x27c
> [ 1462.910971] [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
> [ 1462.910994] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> [ 1462.911015] [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
> [ 1462.911037] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> [ 1462.911057] [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
> [ 1462.911079] [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
> [ 1462.911102] [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
> [ 1462.911122] [<ffffffff805215cd>] svc_process+0x3f8/0x717
> [ 1462.911143] [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
> [ 1462.911165] [<ffffffff8020c648>] child_rip+0xa/0x12
> [ 1462.911184] [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
> [ 1462.911206] [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
> [ 1462.911225] [<ffffffff8020c63e>] child_rip+0x0/0x12
>
>
> Any suggestions other than to bisect this? (Bisection might be
> painful as it crosses the x86-merge.)
On Tue, Nov 13, 2007 at 11:04:00PM -0800, Chris Wedgwood wrote:
> With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> see a hang when accessing some NFS exported XFS filesystems. Local
> access to these filesystems ahead of time works without problems.
>
> This does not occur with 2.6.23.1. The filesystem does not appear
> to be corrupt.
After some bisection pain (sg broken in the middle and XFS not
compiling in other places) the regression seems to be:
commit 051e7cd44ab8f0f7c2958371485b4a1ff64a8d1b
Author: Christoph Hellwig <[email protected]>
Date: Tue Aug 28 13:58:24 2007 +1000
[XFS] use filldir internally
There have been a lot of changes since this so reverting it and
retesting as-is won't work. I'll have to see what I can come up with
after some sleep.
I'm not building/testing with dmapi --- perhaps that makes a
difference here? I would think it would have broken with xfsqa but
the number of bug reports seems small so far.
On Wed, Nov 14, 2007 at 09:43:40AM +0200, Benny Halevy wrote:
> I wonder if this is a similar hang to what Christian was seeing here:
> http://lkml.org/lkml/2007/11/13/319
Ah, thanks for noticing that. Christian Kujau, is /data an xfs
partition? There are a bunch of xfs commits in
^92d15c2ccbb3e31a3fc71ad28fdb55e1319383c0
^291702f017efdfe556cb87b8530eb7d1ff08cbae
^1d677a6dfaac1d1cf51a7f58847077240985faf2
^fba956c46a72f9e7503fd464ffee43c632307e31
^bbf25010f1a6b761914430f5fca081ec8c7accd1
6e800af233e0bdf108efb7bd23c11ea6fa34cdeb
7b1915a989ea4d426d0fd98974ab80f30ef1d779
c223701cf6c706f42840631c1ca919a18e6e2800
f77bf01425b11947eeb3b5b54685212c302741b8
which was the range remaining for him to bisect.
--b.
>
> Benny
>
> On Nov. 14, 2007, 9:04 +0200, Chris Wedgwood <[email protected]> wrote:
> > With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> > see a hang when accessing some NFS exported XFS filesystems. Local
> > access to these filesystems ahead of time works without problems.
> >
> > This does not occur with 2.6.23.1. The filesystem does not appear to
> > be corrupt.
> >
> >
> > The call chain for the wedged process is:
> >
> > [ 1462.911256] nfsd D ffffffff80547840 4760 2966 2
> > [ 1462.911283] ffff81010414d4d0 0000000000000046 0000000000000000 ffff81010414d610
> > [ 1462.911322] ffff810104cbc6e0 ffff81010414d480 ffffffff80746dc0 ffffffff80746dc0
> > [ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
> > [ 1462.911391] Call Trace:
> > [ 1462.911417] [<ffffffff8052e638>] __down+0xe9/0x101
> > [ 1462.911437] [<ffffffff8022cc80>] default_wake_function+0x0/0xe
> > [ 1462.911458] [<ffffffff8052e275>] __down_failed+0x35/0x3a
> > [ 1462.911480] [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
> > [ 1462.911501] [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
> > [ 1462.911522] [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45
> > [ 1462.911543] [<ffffffff8035ad5b>] _xfs_buf_find+0x1ba/0x24d
> > [ 1462.911564] [<ffffffff8035ae48>] xfs_buf_get_flags+0x5a/0x14b
> > [ 1462.911586] [<ffffffff8035b490>] xfs_buf_read_flags+0x12/0x86
> > [ 1462.911607] [<ffffffff8034ecf6>] xfs_trans_read_buf+0x4c/0x2cf
> > [ 1462.911629] [<ffffffff803292be>] xfs_da_do_buf+0x41b/0x65b
> > [ 1462.911652] [<ffffffff80329568>] xfs_da_read_buf+0x24/0x29
> > [ 1462.911673] [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
> > [ 1462.911694] [<ffffffff8032be40>] xfs_dir2_block_lookup_int+0x4d/0x1ab
> > [ 1462.911717] [<ffffffff8032c718>] xfs_dir2_block_lookup+0x15/0x8e
> > [ 1462.911738] [<ffffffff8032b8e1>] xfs_dir_lookup+0xd2/0x12c
> > [ 1462.911761] [<ffffffff8036d658>] submit_bio+0x10d/0x114
> > [ 1462.911781] [<ffffffff8034fb56>] xfs_dir_lookup_int+0x2c/0xc5
> > [ 1462.911802] [<ffffffff802507a2>] lockdep_init_map+0x90/0x495
> > [ 1462.911823] [<ffffffff80353436>] xfs_lookup+0x44/0x6f
> > [ 1462.911843] [<ffffffff8035e364>] xfs_vn_lookup+0x29/0x60
> > [ 1462.915246] [<ffffffff8028856c>] __lookup_hash+0xe5/0x109
> > [ 1462.915267] [<ffffffff802893dd>] lookup_one_len+0x41/0x4e
> > [ 1462.915289] [<ffffffff80303d05>] compose_entry_fh+0xc1/0x117
> > [ 1462.915311] [<ffffffff80303f4c>] encode_entry+0x17c/0x38b
> > [ 1462.915333] [<ffffffff80261e4e>] find_or_create_page+0x3f/0xc9
> > [ 1462.915355] [<ffffffff8035a2c0>] _xfs_buf_lookup_pages+0x2c1/0x2f6
> > [ 1462.915377] [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
> > [ 1462.915399] [<ffffffff8027e632>] cache_alloc_refill+0x1ba/0x4b9
> > [ 1462.915424] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> > [ 1462.915448] [<ffffffff8030416b>] nfs3svc_encode_entry_plus+0x10/0x13
> > [ 1462.915469] [<ffffffff8032c67c>] xfs_dir2_block_getdents+0x15b/0x1e2
> > [ 1462.915491] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> > [ 1462.915514] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> > [ 1462.915534] [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
> > [ 1462.915557] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> > [ 1462.915579] [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
> > [ 1462.915599] [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
> > [ 1462.915619] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> > [ 1462.915642] [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
> > [ 1462.915663] [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
> > [ 1462.915686] [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
> > [ 1462.915706] [<ffffffff805215cd>] svc_process+0x3f8/0x717
> > [ 1462.915729] [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
> > [ 1462.915749] [<ffffffff8020c648>] child_rip+0xa/0x12
> > [ 1462.915769] [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
> > [ 1462.915792] [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
> > [ 1462.915812] [<ffffffff8020c63e>] child_rip+0x0/0x12
> >
> > Over time other processes pile up behind this.
> >
> > [ 1462.910728] nfsd D ffffffffffffffff 5440 2965 2
> > [ 1462.910769] ffff8101040cdd40 0000000000000046 0000000000000001 ffff810103471900
> > [ 1462.910812] ffff8101029a72c0 ffff8101040cdcf0 ffffffff80746dc0 ffffffff80746dc0
> > [ 1462.910852] ffffffff80744020 ffffffff80746dc0 ffff81010008e0c0 ffff8101012a1040
> > [ 1462.910882] Call Trace:
> > [ 1462.910909] [<ffffffff802fbadf>] nfsd_permission+0x95/0xeb
> > [ 1462.910931] [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
> > [ 1462.910950] [<ffffffff8052d729>] mutex_lock_nested+0x165/0x27c
> > [ 1462.910971] [<ffffffff8052ec6b>] _spin_unlock+0x1f/0x49
> > [ 1462.910994] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> > [ 1462.911015] [<ffffffff8028c9dd>] vfs_readdir+0x46/0x93
> > [ 1462.911037] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> > [ 1462.911057] [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
> > [ 1462.911079] [<ffffffff80303158>] nfsd3_proc_readdirplus+0x114/0x204
> > [ 1462.911102] [<ffffffff802f8b82>] nfsd_dispatch+0xde/0x1b6
> > [ 1462.911122] [<ffffffff805215cd>] svc_process+0x3f8/0x717
> > [ 1462.911143] [<ffffffff802f9148>] nfsd+0x1a9/0x2c1
> > [ 1462.911165] [<ffffffff8020c648>] child_rip+0xa/0x12
> > [ 1462.911184] [<ffffffff80520af8>] __svc_create_thread+0xea/0x1eb
> > [ 1462.911206] [<ffffffff802f8f9f>] nfsd+0x0/0x2c1
> > [ 1462.911225] [<ffffffff8020c63e>] child_rip+0x0/0x12
> >
> >
> > Any suggestions other than to bisect this? (Bisection might be
> > painful as it crosses the x86-merge.)
>
On Tue, Nov 13, 2007 at 11:04:00PM -0800, Chris Wedgwood wrote:
> With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> see a hang when accessing some NFS exported XFS filesystems. Local
> access to these filesystems ahead of time works without problems.
>
> This does not occur with 2.6.23.1. The filesystem does not appear to
> be corrupt.
>
> [ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
> [ 1462.911391] Call Trace:
> [ 1462.911417] [<ffffffff8052e638>] __down+0xe9/0x101
> [ 1462.911437] [<ffffffff8022cc80>] default_wake_function+0x0/0xe
> [ 1462.911458] [<ffffffff8052e275>] __down_failed+0x35/0x3a
> [ 1462.911480] [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
> [ 1462.911501] [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
> [ 1462.911522] [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45
this is bp->b_sema which lookup wants.
> [ 1462.915534] [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
> [ 1462.915557] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> [ 1462.915579] [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
> [ 1462.915599] [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
> [ 1462.915619] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> [ 1462.915642] [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
and this is the nasty nfsd case where a filldir callback calls back
into lookup. I suspect we're somehow holding b_sema already. Previously
this was okay because we weren't inside the actual readdir code when
calling filldir but operated on a copy of the data.
This gem has bitten other filesystems before; I'll see if I can find a
way around it.
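For illustration only, here is a minimal userspace sketch of the pattern
described above: a non-recursive lock held across a callback that
re-enters the code which takes the same lock. Every name in it
(toy_readdir, toy_filldir, toy_lookup, buf_lock) is invented and merely
stands in for the xfs_buf_lock()/b_sema, ->filldir and
lookup_one_len()/->lookup steps seen in the traces; it is not kernel code.

/*
 * Minimal userspace sketch of the re-entrancy deadlock.
 * Build with: cc demo.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;

/* stands in for ->lookup(): it needs the directory buffer locked */
static void toy_lookup(void)
{
        pthread_mutex_lock(&buf_lock);  /* never succeeds: already held below */
        pthread_mutex_unlock(&buf_lock);
}

/* stands in for nfsd's readdirplus filldir callback */
static int toy_filldir(const char *name)
{
        printf("entry: %s\n", name);
        toy_lookup();                   /* re-enters the "filesystem" */
        return 0;
}

/* stands in for a readdir that calls filldir while the buffer is locked */
static void toy_readdir(void)
{
        pthread_mutex_lock(&buf_lock);  /* lock the directory buffer */
        toy_filldir("somefile");        /* callback runs with the lock held */
        pthread_mutex_unlock(&buf_lock);
}

int main(void)
{
        toy_readdir();  /* hangs: the default mutex is not recursive */
        return 0;
}

When run, it prints the entry name and then hangs, which is the userspace
analogue of the D-state nfsd threads in the traces above.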
On Wed, Nov 14, 2007 at 03:29:52PM +0000, Christoph Hellwig wrote:
> On Tue, Nov 13, 2007 at 11:04:00PM -0800, Chris Wedgwood wrote:
> > With 2.6.24-rc2 (amd64) I sometimes (usually but perhaps not always)
> > see a hang when accessing some NFS exported XFS filesystems. Local
> > access to these filesystems ahead of time works without problems.
> >
> > This does not occur with 2.6.23.1. The filesystem does not appear to
> > be corrupt.
> >
>
> > [ 1462.911360] ffffffff80744020 ffffffff80746dc0 ffff81010129c140 ffff8101000ad100
> > [ 1462.911391] Call Trace:
> > [ 1462.911417] [<ffffffff8052e638>] __down+0xe9/0x101
> > [ 1462.911437] [<ffffffff8022cc80>] default_wake_function+0x0/0xe
> > [ 1462.911458] [<ffffffff8052e275>] __down_failed+0x35/0x3a
> > [ 1462.911480] [<ffffffff8035ac25>] _xfs_buf_find+0x84/0x24d
> > [ 1462.911501] [<ffffffff8035ad34>] _xfs_buf_find+0x193/0x24d
> > [ 1462.911522] [<ffffffff803599b1>] xfs_buf_lock+0x43/0x45
>
> this is bp->b_sema which lookup wants.
>
> > [ 1462.915534] [<ffffffff8032b6da>] xfs_readdir+0x91/0xb6
> > [ 1462.915557] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> > [ 1462.915579] [<ffffffff8035be9d>] xfs_file_readdir+0x31/0x40
> > [ 1462.915599] [<ffffffff8028c9f8>] vfs_readdir+0x61/0x93
> > [ 1462.915619] [<ffffffff8030415b>] nfs3svc_encode_entry_plus+0x0/0x13
> > [ 1462.915642] [<ffffffff802fc78e>] nfsd_readdir+0x6d/0xc5
>
> and this is the nasty nfsd case where a filldir callback calls back
> into lookup. I suspect we're somehow holding b_sema already. Previously
> this was okay because we weren't inside the actual readdir code when
> calling filldir but operated on a copy of the data.
>
> This gem has bitten other filesystems before; I'll see if I can find a
> way around it.
This must have come up before; feel free to remind me: is there any way
to make the interface easier to use? (E.g. would it help if the filldir
callback could be passed a dentry?)
--b.
On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote:
> This must have come up before; feel free to remind me: is there any way
> to make the interface easier to use? (E.g. would it help if the filldir
> callback could be passed a dentry?)
The best thing for the filesystem would be to have a readdirplus
(or have it folded into readdir) instead of calling into lookup
from ->filldir.
On Wed, Nov 14, 2007 at 05:44:19PM +0000, Christoph Hellwig wrote:
> On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote:
> > This must have come up before; feel free to remind me: is there any way
> > to make the interface easier to use? (E.g. would it help if the filldir
> > callback could be passed a dentry?)
>
> The best thing for the filesystem would be to have a readdirplus
> (or have it folded into readdir) instead of calling into lookup
> from ->filldir.
And the readdirplus would pass a dentry to its equivalent of ->filldir?
Or something else?
--b.
On Wed, Nov 14, 2007 at 12:53:22PM -0500, J. Bruce Fields wrote:
> On Wed, Nov 14, 2007 at 05:44:19PM +0000, Christoph Hellwig wrote:
> > On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote:
> > > This must have come up before; feel free to remind me: is there any way
> > > to make the interface easier to use? (E.g. would it help if the filldir
> > > callback could be passed a dentry?)
> >
> > The best thing for the filesystem would be to have a readdirplus
> > (or have it folded into readdir) instead of calling into lookup
> > from ->filldir.
>
> And the readdirplus would pass a dentry to its equivalent of ->filldir?
> Or something else?
Personally I'd prefer it to only grow a struct stat, or rather its members.
But the nfsd code currently expects a dentry, so this might require some
major refactoring.
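As a purely hypothetical sketch of the direction being discussed here
(none of these names exist in the kernel or in nfsd), a readdirplus-style
callback could carry the attributes the filesystem already has in hand,
so the caller would never need to re-enter ->lookup while directory
buffers are locked:

/* Hypothetical interface sketch only; userspace, all names invented. */
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* like filldir_t, but with the entry's attributes supplied by the fs */
typedef int (*filldirplus_t)(void *buf, const char *name, int namlen,
                             int64_t offset, uint64_t ino,
                             unsigned int d_type,
                             const struct stat *attrs);

/* toy consumer: prints what an nfsd-like caller would encode */
static int print_entry(void *buf, const char *name, int namlen,
                       int64_t offset, uint64_t ino,
                       unsigned int d_type, const struct stat *attrs)
{
        (void)buf; (void)offset; (void)d_type;
        printf("%.*s ino=%llu size=%lld\n", namlen, name,
               (unsigned long long)ino,
               attrs ? (long long)attrs->st_size : -1LL);
        return 0;
}

int main(void)
{
        /* a filesystem's readdirplus would invoke the callback like this */
        struct stat st = { .st_size = 42 };
        filldirplus_t cb = print_entry;

        cb(NULL, "example", 7, 0, 123, 0 /* DT_UNKNOWN */, &st);
        return 0;
}

Even in this toy form the caveat above is visible: the existing nfsd code
expects a dentry rather than a bag of attributes, so adopting anything
like this would need the refactoring Christoph mentions.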
On Wed, Nov 14, 2007 at 06:02:41PM +0000, Christoph Hellwig wrote:
> On Wed, Nov 14, 2007 at 12:53:22PM -0500, J. Bruce Fields wrote:
> > On Wed, Nov 14, 2007 at 05:44:19PM +0000, Christoph Hellwig wrote:
> > > On Wed, Nov 14, 2007 at 12:39:22PM -0500, J. Bruce Fields wrote:
> > > > This must have come up before; feel free to remind me: is there any way
> > > > to make the interface easier to use? (E.g. would it help if the filldir
> > > > callback could be passed a dentry?)
> > >
> > > The best thing for the filesystem would be to have a readdirplus
> > > (or have it folded into readdir) instead of calling into lookup
> > > from ->filldir.
> >
> > And the readdirplus would pass a dentry to its equivalent of ->filldir?
> > Or something else?
>
> Personally I'd prefer it to only grow a struct stat, or rather its members.
> But the nfsd code currently expects a dentry, so this might require some
> major refactoring.
Well, we need to check for mountpoints, for example, so I don't see any
way out of needing a dentry. What's the drawback?
--b.
On Wed, 14 Nov 2007, J. Bruce Fields wrote:
> On Wed, Nov 14, 2007 at 09:43:40AM +0200, Benny Halevy wrote:
>> I wonder if this is a similar hang to what Christian was seeing here:
>> http://lkml.org/lkml/2007/11/13/319
>
> Ah, thanks for noticing that. Christian Kujau, is /data an xfs
> partition?
Sorry for the late reply :\
Yes, the nfsd process only got stuck when I did ls(1) (with or without -l)
on an NFS share which contained an XFS partition. I did not pay attention
to the underlying fs at first, so I just ls'ed my shares and noticed that
it got stuck. Now that you mention it, I tried again with a (git-wise)
current 2.6 kernel and the same .config: http://nerdbynature.de/bits/2.6.24-rc2/nfsd/
Running ls on an ext3 or jfs backed NFS share succeeded; running ls on an
XFS backed NFS share did not. The sysrq-t output (see dmesg.2.gz) looks
like yours (to my untrained eye):
nfsd D c04131c0 0 8535 2
e7ea97b8 00000046 e7ea9000 c04131c0 e7ea97b8 e697e7e0 00000282 e697e7e8
e7ea97e4 c0409ebc f71f3500 00000001 f71f3500 c0115540 e697e804 e697e804
e697e7e0 8f082000 00000001 e7ea97f4 c0409cc2 00000004 00000062 e7ea9800
Nov 14 23:07:14 sheep kernel: [ 1870.124185] Call Trace:
[<c0409ebc>] __down+0x7c/0xd0
[<c0409cc2>] __down_failed+0xa/0x10
[<c0296d46>] xfs_buf_lock+0x46/0x50
[<c02985a2>] _xfs_buf_find+0xf2/0x190
[<c0298694>] xfs_buf_get_flags+0x54/0x120
[<c029877d>] xfs_buf_read_flags+0x1d/0x80
[<c0289afa>] xfs_trans_read_buf+0x4a/0x350
[<c025e049>] xfs_da_do_buf+0x409/0x760
[<c025e42f>] xfs_da_read_buf+0x2f/0x40
[<c02634f2>] xfs_dir2_leaf_lookup_int+0x172/0x270
[<c02637ce>] xfs_dir2_leaf_lookup+0x1e/0x90
[<c02608e4>] xfs_dir_lookup+0xe4/0x100
[<c028abde>] xfs_dir_lookup_int+0x2e/0x100
[<c028eee2>] xfs_lookup+0x62/0x90
[<c029b644>] xfs_vn_lookup+0x34/0x70
[<c016de06>] __lookup_hash+0xb6/0x100
[<c016ee6e>] lookup_one_len+0x4e/0x50
[<f9037769>] compose_entry_fh+0x59/0x120 [nfsd]
[<f9037c29>] encode_entry+0x329/0x3c0 [nfsd]
[<f9037cfb>] nfs3svc_encode_entry_plus+0x3b/0x50 [nfsd]
[<c02639b4>] xfs_dir2_leaf_getdents+0x174/0x900
[<c026070a>] xfs_readdir+0xba/0xd0
[<c0298d74>] xfs_file_readdir+0x44/0x70
[<c01726ae>] vfs_readdir+0x7e/0xa0
[<f902e6b3>] nfsd_readdir+0x73/0xe0 [nfsd]
[<f9036eea>] nfsd3_proc_readdirplus+0xda/0x200 [nfsd]
[<f902a2db>] nfsd_dispatch+0x11b/0x210 [nfsd]
[<f920f2ac>] svc_process+0x41c/0x760 [sunrpc]
[<f902a8c4>] nfsd+0x164/0x2a0 [nfsd]
[<c0103507>] kernel_thread_helper+0x7/0x10
>> Any suggestions other than to bisect this? (Bisection might be
>> painful as it crosses the x86-merge.)
Make that "impossible" for me, as I could not boot the bisected kernel and
marking versions as "bad" for unrelated things seems to invalidate the
results. However, from ~2500 revisions (2.6.24-rc2 to 2.6.23.1) down to
~20 or so in just 10 builds, that's pretty awesome.
Christian.
--
BOFH excuse #321:
Scheduled global CPU outage
On Wed, 14 Nov 2007, Chris Wedgwood wrote:
> After some bisection pain (sg broken in the middle and XFS not
> compiling in other places) the regression seems to be:
>
> commit 051e7cd44ab8f0f7c2958371485b4a1ff64a8d1b
> Author: Christoph Hellwig <[email protected]>
> Date: Tue Aug 28 13:58:24 2007 +1000
Following a git-bisect howto[0], I tried to revert this commit:
# git checkout master
# git revert 051e7cd44ab8f0f7c2958371485b4a1ff64a8d1b
Auto-merged fs/xfs/linux-2.6/xfs_file.c
CONFLICT (content): Merge conflict in fs/xfs/linux-2.6/xfs_file.c
Auto-merged fs/xfs/linux-2.6/xfs_vnode.h
CONFLICT (content): Merge conflict in fs/xfs/linux-2.6/xfs_vnode.h
Auto-merged fs/xfs/xfs_dir2.c
CONFLICT (content): Merge conflict in fs/xfs/xfs_dir2.c
Auto-merged fs/xfs/xfs_dir2.h
Auto-merged fs/xfs/xfs_dir2_block.c
Auto-merged fs/xfs/xfs_dir2_sf.c
Auto-merged fs/xfs/xfs_vnodeops.c
CONFLICT (content): Merge conflict in fs/xfs/xfs_vnodeops.c
Automatic revert failed. After resolving the conflicts,
mark the corrected paths with 'git add <paths>' and commit the result.
Any ideas?
Christian
[0] is this still up-to-date?
http://kernel.org/pub/software/scm/git/docs/v1.4.4.4/howto/isolate-bugs-with-bisect.txt
--
BOFH excuse #423:
It's not RFC-822 compliant.
On Wed, 14 Nov 2007, Christian Kujau wrote:
> Yes, the nfsd process only got stuck when I did ls(1) (with or without -l) on
> an NFS share which contained an XFS partition.
Since NFS was not working (the nfsd processes were already in D state), to
mount a CIFS share from the very same server (and the same client). I'm
exporting the same /data share (JFS), but, since it's smbd I don't have to
export every single submount (as it is with NFS):
* with NFS:
server:/data (jfs)
server:/data/sub (xfs)
* with CIFS:
server:/data (containing both the jfs and the xfs partition as one
single share to mount)
Upon accessing the /data/sub part of the CIFS share, the client hung,
waiting for the server to respond (the [cifs] kernel thread on the client
was spinning, waiting for i/o). On the server, similar things as with the
nfsd processes happened (although I know that the smbd (Samba) processes
are running completely in userspace):
http://nerdbynature.de/bits/2.6.24-rc2/nfsd/debug.3.txt.gz
Sysrq-t again on the server:
http://nerdbynature.de/bits/2.6.24-rc2/nfsd/dmesg.3.gz
smbd D c04131c0 0 22782 3039
e242ad60 00000046 e242a000 c04131c0 00000001 e7875264 00000246 e7f88a80
e242ada8 c040914c 00000000 00000002 c016dc64 e7a3b7b8 e242a000 e7875284
00000000 c016dc64 f7343d88 f6337e90 e7f88a80 e7875264 e242ad88 e7a3b7b8
Call Trace:
[<c040914c>] mutex_lock_nested+0xcc/0x2c0
[<c016dc64>] do_lookup+0xa4/0x190
[<c016f6f9>] __link_path_walk+0x749/0xd10
[<c016fd04>] link_path_walk+0x44/0xc0
[<c016fd98>] path_walk+0x18/0x20
[<c016ff98>] do_path_lookup+0x78/0x1c0
[<c0170998>] __user_walk_fd+0x38/0x60
[<c0169bd1>] vfs_stat_fd+0x21/0x50
[<c0169ca1>] vfs_stat+0x11/0x20
[<c0169cc4>] sys_stat64+0x14/0x30
[<c01028d6>] sysenter_past_esp+0x5f/0xa5
=======================
So, it's really not NFS but ?FS related?
Christian.
--
BOFH excuse #199:
the curls in your keyboard cord are losing electricity.
On Thu, November 15, 2007 08:51, Christian Kujau wrote:
> Since NFS was not working (the nfsd processes were already in D state),
> to mount a CIFS share from the very same server (and the same client).
That should read:
Since NFS was not working (the nfsd processes were already in D state), I
decided to mount a CIFS share from the very same server (and the same
client). [...]
C.
--
BOFH excuse #442:
Trojan horse ran out of hay
On Thu, 15 Nov 2007, Christian Kujau wrote:
> Upon accessing the /data/sub part of the CIFS share, the client hung, waiting
> for the server to respond (the [cifs] kernel thread on the client was
> spinning, waiting for i/o). On the server, similar things as with the nfsd
> processes happened
Turns out that CIFS only hung because the server was already stuck
due to the nfsd/XFS issue. After rebooting the server, I was
able to access the CIFS shares (the xfs partition too) just fine. Yes, the
xfs partition itself has been checked too and no errors were found.
C.
--
BOFH excuse #348:
We're on Token Ring, and it looks like the token got loose.
On Thu, Nov 15, 2007 at 08:51:36AM +0100, Christian Kujau wrote:
> [<c040914c>] mutex_lock_nested+0xcc/0x2c0
> [<c016dc64>] do_lookup+0xa4/0x190
> [<c016f6f9>] __link_path_walk+0x749/0xd10
> [<c016fd04>] link_path_walk+0x44/0xc0
> [<c016fd98>] path_walk+0x18/0x20
> [<c016ff98>] do_path_lookup+0x78/0x1c0
> [<c0170998>] __user_walk_fd+0x38/0x60
> [<c0169bd1>] vfs_stat_fd+0x21/0x50
> [<c0169ca1>] vfs_stat+0x11/0x20
> [<c0169cc4>] sys_stat64+0x14/0x30
> [<c01028d6>] sysenter_past_esp+0x5f/0xa5
nfsd has already wedged and holds a lock, so this is expected.
I'm not sure what you're doing here, but a viable work-around for now
might be to use nfsv2 mounts, something like
mount -o vers=2 ...
or to keep v3 and disable readdirplus doing something like:
mount -o vers=3,nordirplus ...
The latter I didn't test, but it was suggested on #linuxfs.
On Fri, November 16, 2007 01:34, Chris Wedgwood wrote:
> I'm not sure what you're doing here, but a viable work-around for now
> might be to use nfsv2 mounts, something like
>
> mount -o vers=2 ...
> or to keep v3 and disable readdirplus doing something like:
> mount -o vers=3,nordirplus ...
OK, I'll try this. I hope this can be fixed somehow before 2.6.24...
Thank you for your time,
Christian.
--
BOFH excuse #442:
Trojan horse ran out of hay
On Fri, Nov 16, 2007 at 10:17:17AM +0100, Christian Kujau wrote:
> OK, I'll try this. I hope this can be fixed somehow before 2.6.24...
Well, one simple nasty idea would be something like:
diff --git a/fs/Kconfig b/fs/Kconfig
index 429a002..da231fd 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1604,7 +1604,7 @@ config NFS_FS
config NFS_V3
bool "Provide NFSv3 client support"
- depends on NFS_FS
+ depends on NFS_FS && !XFS
help
Say Y here if you want your NFS client to be able to speak version
3 of the NFS protocol.
So people who are likely to be affected just side-step the issue until
it's resolved.
On Fri, 2007-11-16 at 03:03 -0800, Chris Wedgwood wrote:
> On Fri, Nov 16, 2007 at 10:17:17AM +0100, Christian Kujau wrote:
>
> > OK, I'll try this. I hope this can be fixed somehow before 2.6.24...
>
> Well, one simple nasty idea would be something like:
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 429a002..da231fd 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -1604,7 +1604,7 @@ config NFS_FS
>
> config NFS_V3
> bool "Provide NFSv3 client support"
> - depends on NFS_FS
> + depends on NFS_FS && !XFS
> help
> Say Y here if you want your NFS client to be able to speak version
> 3 of the NFS protocol.
>
> So people who are likely to be affected just side-step the issue until
> it's resolved.
Very funny, but disabling XFS on the client won't help.
Trond
On Fri, Nov 16, 2007 at 09:19:32AM -0500, Trond Myklebust wrote:
> Very funny, but disabling XFS on the client won't help.
Oops, I meant it for NFSD... and I'm somewhat serious. I'm not
saying it's a good long term solution, but a potentially safer
short-term workaround.
On Fri, 16 Nov 2007, Chris Wedgwood wrote:
> Oops, I meant it for NFSD... and I'm somewhat serious. I'm not
> saying it's a good long term solution, but a potentially safer
> short-term workaround.
I've opened http://bugzilla.kernel.org/show_bug.cgi?id=9400 to track this
one (and to not forget about it :)).
I wonder why so few people are seeing this, I'd have assumed that
NFSv3 && XFS is not sooo exotic...
Christian.
--
BOFH excuse #273:
The cord jumped over and hit the power switch.
On Sun, 18 Nov 2007, Christian Kujau wrote:
> On Fri, 16 Nov 2007, Chris Wedgwood wrote:
>> Oops, I meant it for NFSD... and I'm somewhat serious. I'm not
>> saying it's a good long term solution, but a potentially safer
>> short-term workaround.
>
> I've opened http://bugzilla.kernel.org/show_bug.cgi?id=9400 to track this one
> (and to not forget about it :)).
>
> I wonder why so few people are seeing this, I'd have assumed that
> NFSv3 && XFS is not sooo exotic...
Still on 2.6.23.x here (also use nfsv3 + xfs).
>
> Christian.
> --
> BOFH excuse #273:
>
> The cord jumped over and hit the power switch.
>
>
On Sun, 18 Nov 2007, Justin Piszcz wrote:
>> I wonder why so few people are seeing this, I'd have assumed that
>> NFSv3 && XFS is not sooo exotic...
> Still on 2.6.23.x here (also use nfsv3 + xfs).
So, it's the "too few people are testing -rc kernels" issue again :(
Christian.
--
BOFH excuse #118:
the router thinks its a printer.
On Wed, Nov 14, 2007 at 01:08:38PM -0500, J. Bruce Fields wrote:
> > Personally I'd prefer it to only grow a struct stat, or rather its members.
> > But the nfsd code currently expects a dentry, so this might require some
> > major refactoring.
>
> Well, we need to check for mountpoints, for example, so I don't see any
> way out of needing a dentry. What's the drawback?
You're right - we'd probably need the dentry. The drawback is that
we need to always get it in the dcache. Which might be a good thing
depending on the workload.
On Wed, Nov 21, 2007 at 03:07:46PM +0000, Christoph Hellwig wrote:
> On Wed, Nov 14, 2007 at 01:08:38PM -0500, J. Bruce Fields wrote:
> > > Personally I'd prefer it to only grow a struct stat, or rather its members.
> > > But the nfsd code currently expects a dentry, so this might require some
> > > major refactoring.
> >
> > Well, we need to check for mountpoints, for example, so I don't see any
> > way out of needing a dentry. What's the drawback?
>
> You're right - we'd probably need the dentry. The drawback is that
> we need to always get it in the dcache. Which might be a good thing
> depending on the workload.
In any case, if the new api were only used by nfsd for now, then there'd
be no change here.
Seems like it might be worth a try.
--b.
The current readdir implementation deadlocks on btree buffer locks
because nfsd calls back into ->lookup from the filldir callback. The
only short-term fix for this is to revert to the old inefficient
double-buffering scheme.
This patch does exactly that and reverts xfs_file_readdir to what's
basically the 2.6.23 version minus the uio and vnops junk.
I'll try to find something more optimal for 2.6.25 or at least find a
way to use the proper version for local access.
Signed-off-by: Christoph Hellwig <[email protected]>
Index: linux-2.6/fs/xfs/linux-2.6/xfs_file.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_file.c 2007-11-25 11:41:20.000000000 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_file.c 2007-11-25 17:14:27.000000000 +0100
@@ -218,6 +218,15 @@
}
#endif /* CONFIG_XFS_DMAPI */
+/*
+ * Unfortunately we can't just use the clean and simple readdir implementation
+ * below, because nfs might call back into ->lookup from the filldir callback
+ * and that will deadlock the low-level btree code.
+ *
+ * Hopefully we'll find a better workaround that allows to use the optimal
+ * version at least for local readdirs for 2.6.25.
+ */
+#if 0
STATIC int
xfs_file_readdir(
struct file *filp,
@@ -249,6 +258,121 @@
return -error;
return 0;
}
+#else
+
+struct hack_dirent {
+ int namlen;
+ loff_t offset;
+ u64 ino;
+ unsigned int d_type;
+ char name[];
+};
+
+struct hack_callback {
+ char *dirent;
+ size_t len;
+ size_t used;
+};
+
+STATIC int
+xfs_hack_filldir(
+ void *__buf,
+ const char *name,
+ int namlen,
+ loff_t offset,
+ u64 ino,
+ unsigned int d_type)
+{
+ struct hack_callback *buf = __buf;
+ struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used);
+
+ if (buf->used + sizeof(struct hack_dirent) + namlen > buf->len)
+ return -EINVAL;
+
+ de->namlen = namlen;
+ de->offset = offset;
+ de->ino = ino;
+ de->d_type = d_type;
+ memcpy(de->name, name, namlen);
+ buf->used += sizeof(struct hack_dirent) + namlen;
+ return 0;
+}
+
+STATIC int
+xfs_file_readdir(
+ struct file *filp,
+ void *dirent,
+ filldir_t filldir)
+{
+ struct inode *inode = filp->f_path.dentry->d_inode;
+ xfs_inode_t *ip = XFS_I(inode);
+ struct hack_callback buf;
+ struct hack_dirent *de;
+ int error;
+ loff_t size;
+ int eof = 0;
+ xfs_off_t start_offset, curr_offset, offset;
+
+ /*
+ * Try fairly hard to get memory
+ */
+ buf.len = PAGE_CACHE_SIZE;
+ do {
+ buf.dirent = kmalloc(buf.len, GFP_KERNEL);
+ if (buf.dirent)
+ break;
+ buf.len >>= 1;
+ } while (buf.len >= 1024);
+
+ if (!buf.dirent)
+ return -ENOMEM;
+
+ curr_offset = filp->f_pos;
+ if (curr_offset == 0x7fffffff)
+ offset = 0xffffffff;
+ else
+ offset = filp->f_pos;
+
+ while (!eof) {
+ int reclen;
+ start_offset = offset;
+
+ buf.used = 0;
+ error = -xfs_readdir(ip, &buf, buf.len, &offset,
+ xfs_hack_filldir);
+ if (error || offset == start_offset) {
+ size = 0;
+ break;
+ }
+
+ size = buf.used;
+ de = (struct hack_dirent *)buf.dirent;
+ while (size > 0) {
+ if (filldir(dirent, de->name, de->namlen,
+ curr_offset & 0x7fffffff,
+ de->ino, de->d_type)) {
+ goto done;
+ }
+
+ reclen = sizeof(struct hack_dirent) + de->namlen;
+ size -= reclen;
+ curr_offset = de->offset /* & 0x7fffffff */;
+ de = (struct hack_dirent *)((char *)de + reclen);
+ }
+ }
+
+ done:
+ if (!error) {
+ if (size == 0)
+ filp->f_pos = offset & 0x7fffffff;
+ else if (de)
+ filp->f_pos = curr_offset;
+ }
+
+ kfree(buf.dirent);
+ return error;
+}
+#endif
STATIC int
xfs_file_mmap(
On Sun, Nov 25, 2007 at 04:30:14PM +0000, Christoph Hellwig wrote:
> The current readdir implementation deadlocks on btree buffer
> locks because nfsd calls back into ->lookup from the filldir
> callback. The only short-term fix for this is to revert to the old
> inefficient double-buffering scheme.
This seems to work really well here.
> This patch does exactly that and reverts xfs_file_readdir to what's
> basically the 2.6.23 version minus the uio and vnops junk.
This should probably be submitted for inclusion in stable-2.6.24.
Perhaps a version with the #if 0 [...] stuff dropped? (I'm happy to
send a patch for that if you prefer).
On Sun, 25 Nov 2007, Christoph Hellwig wrote:
> This patch does exactly that and reverts xfs_file_readdir to what's
> basically the 2.6.23 version minus the uio and vnops junk.
Thanks, works here too (without nordirplus as a mount option).
Am I supposed to close the bug[0] or do you guys want to leave this
open to track the Real Fix (TM) for 2.6.25?
Again, thank you for the fix!
Christian.
[0] http://bugzilla.kernel.org/show_bug.cgi?id=9400
--
BOFH excuse #112:
The monitor is plugged into the serial port
Christoph Hellwig wrote:
> The current readdir implementation deadlocks on btree buffer locks
> because nfsd calls back into ->lookup from the filldir callback. The
> only short-term fix for this is to revert to the old inefficient
> double-buffering scheme.
>
Probably why Steve did this: :)
xfs_file.c
----------------------------
revision 1.40
date: 2001/03/15 23:33:20; author: lord; state: Exp; lines: +54 -17
modid: 2.4.x-xfs:slinx:90125a
Change linvfs_readdir to allocate a buffer, call xfs to fill it, and
then call the filldir function on each entry. This is instead of doing the
filldir deep in the bowels of xfs which causes locking problems.
----------------------------
Yes, it looks like it is done equivalently to before (minus the uio stuff etc).
I don't know what the 7fff* masking is about, but we did that previously.
I hadn't come across the name[] struct field before; I was used to name[0]
(or name[1] in times gone by) but found that it is a kosher way of doing
the variable-length string at the end too.
Hmmm, I don't see the point of the "eof" local var now.
Previously bhv_vop_readdir() returned eof.
I presume if we don't move the offset (offset == startoffset) then
we're done and break out?
So we lost eof when going to the filldir in the getdents code etc...
--Tim
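Since the name[] idiom may be unfamiliar, here is a tiny standalone
illustration of a C99 flexible array member; the struct below is an
invented stand-in, not the hack_dirent from the patch itself.

/* Userspace illustration of the "char name[];" flexible array member:
 * the structure is allocated with extra space and the trailing array
 * occupies it. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct dirent_like {
        int     namlen;
        char    name[];         /* C99 flexible array member */
};

static struct dirent_like *make_entry(const char *name)
{
        size_t len = strlen(name);
        /* allocate the header plus room for the name (and NUL) after it */
        struct dirent_like *de = malloc(sizeof(*de) + len + 1);

        if (!de)
                return NULL;
        de->namlen = (int)len;
        memcpy(de->name, name, len + 1);
        return de;
}

int main(void)
{
        struct dirent_like *de = make_entry("example");

        if (de) {
                printf("%.*s (%d bytes)\n", de->namlen, de->name, de->namlen);
                free(de);
        }
        return 0;
}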
> This patch does exactly that and reverts xfs_file_readdir to what's
> basically the 2.6.23 version minus the uio and vnops junk.
>
> I'll try to find something more optimal for 2.6.25 or at least find a
> way to use the proper version for local access.
>
>
> Signed-off-by: Christoph Hellwig <[email protected]>
>
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_file.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_file.c 2007-11-25 11:41:20.000000000 +0100
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_file.c 2007-11-25 17:14:27.000000000 +0100
> @@ -218,6 +218,15 @@
> }
> #endif /* CONFIG_XFS_DMAPI */
>
> +/*
> + * Unfortunately we can't just use the clean and simple readdir implementation
> + * below, because nfs might call back into ->lookup from the filldir callback
> + * and that will deadlock the low-level btree code.
> + *
> + * Hopefully we'll find a better workaround that allows to use the optimal
> + * version at least for local readdirs for 2.6.25.
> + */
> +#if 0
> STATIC int
> xfs_file_readdir(
> struct file *filp,
> @@ -249,6 +258,121 @@
> return -error;
> return 0;
> }
> +#else
> +
> +struct hack_dirent {
> + int namlen;
> + loff_t offset;
> + u64 ino;
> + unsigned int d_type;
> + char name[];
> +};
> +
> +struct hack_callback {
> + char *dirent;
> + size_t len;
> + size_t used;
> +};
> +
> +STATIC int
> +xfs_hack_filldir(
> + void *__buf,
> + const char *name,
> + int namlen,
> + loff_t offset,
> + u64 ino,
> + unsigned int d_type)
> +{
> + struct hack_callback *buf = __buf;
> + struct hack_dirent *de = (struct hack_dirent *)(buf->dirent + buf->used);
> +
> + if (buf->used + sizeof(struct hack_dirent) + namlen > buf->len)
> + return -EINVAL;
> +
> + de->namlen = namlen;
> + de->offset = offset;
> + de->ino = ino;
> + de->d_type = d_type;
> + memcpy(de->name, name, namlen);
> + buf->used += sizeof(struct hack_dirent) + namlen;
> + return 0;
> +}
> +
> +STATIC int
> +xfs_file_readdir(
> + struct file *filp,
> + void *dirent,
> + filldir_t filldir)
> +{
> + struct inode *inode = filp->f_path.dentry->d_inode;
> + xfs_inode_t *ip = XFS_I(inode);
> + struct hack_callback buf;
> + struct hack_dirent *de;
> + int error;
> + loff_t size;
> + int eof = 0;
> + xfs_off_t start_offset, curr_offset, offset;
> +
> + /*
> + * Try fairly hard to get memory
> + */
> + buf.len = PAGE_CACHE_SIZE;
> + do {
> + buf.dirent = kmalloc(buf.len, GFP_KERNEL);
> + if (buf.dirent)
> + break;
> + buf.len >>= 1;
> + } while (buf.len >= 1024);
> +
> + if (!buf.dirent)
> + return -ENOMEM;
> +
> + curr_offset = filp->f_pos;
> + if (curr_offset == 0x7fffffff)
> + offset = 0xffffffff;
> + else
> + offset = filp->f_pos;
> +
> + while (!eof) {
> + int reclen;
> + start_offset = offset;
> +
> + buf.used = 0;
> + error = -xfs_readdir(ip, &buf, buf.len, &offset,
> + xfs_hack_filldir);
> + if (error || offset == start_offset) {
> + size = 0;
> + break;
> + }
> +
> + size = buf.used;
> + de = (struct hack_dirent *)buf.dirent;
> + while (size > 0) {
> + if (filldir(dirent, de->name, de->namlen,
> + curr_offset & 0x7fffffff,
> + de->ino, de->d_type)) {
> + goto done;
> + }
> +
> + reclen = sizeof(struct hack_dirent) + de->namlen;
> + size -= reclen;
> + curr_offset = de->offset /* & 0x7fffffff */;
> + de = (struct hack_dirent *)((char *)de + reclen);
> + }
> + }
> +
> + done:
> + if (!error) {
> + if (size == 0)
> + filp->f_pos = offset & 0x7fffffff;
> + else if (de)
> + filp->f_pos = curr_offset;
> + }
> +
> + kfree(buf.dirent);
> + return error;
> +}
> +#endif
>
> STATIC int
> xfs_file_mmap(
On Fri, Nov 30, 2007 at 12:45:05AM +0100, Christian Kujau wrote:
> On Sun, 25 Nov 2007, Christoph Hellwig wrote:
> >This patch does exactly that and reverts xfs_file_readdir to what's
> >basically the 2.6.23 version minus the uio and vnops junk.
>
> Thanks, works here too (without nordirplus as a mount option).
> Am I supposed to close the bug[0] or do you guys want to leave this
> open to track the Real Fix (TM) for 2.6.25?
I've been giving the fix some QA - that change appears to have caused
a different regression as well, so I'm holding off for a little bit
until we know the cause of the other regression before deciding
whether to take this fix or back the entire change out.
Either way we'll include the fix in 2.6.24....
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Wow, was it really that long ago!
Looks like the readdir is in the bowels of the btree code when filldir
gets called here; there are probably locks on several buffers in the
btree at this point. This will only show up for large directories I bet.
The xfs readdir code has the complete xfs inode number in its hands at
this point (filldir is not necessarily getting all the bits of it). All
we are doing the lookup for really is to get the inode number back again
so we can get the inode and get the attributes. Rather dumb really.
There has got to be a way of doing a callout structure here so that the
inode number can be pushed through filldir and back into an fs specific
call. The fs then can do a lookup by id - which is what it does most of
the time for resolving nfs handles anyway. Should be more efficient than
the current scheme.
Just rambling, not a single line of code was consulted in writing this
message.
You want to make a big fat btree directory for testing this stuff. Make
sure it gets at least a couple of layers of node blocks.
Steve
On Nov 30, 2007, at 1:22 AM, Timothy Shimmin wrote:
> Christoph Hellwig wrote:
>> The current readdir implementation deadlocks on btree buffer locks
>> because nfsd calls back into ->lookup from the filldir callback. The
>> only short-term fix for this is to revert to the old inefficient
>> double-buffering scheme.
>
> Probably why Steve did this: :)
>
> xfs_file.c
> ----------------------------
> revision 1.40
> date: 2001/03/15 23:33:20; author: lord; state: Exp; lines: +54 -17
> modid: 2.4.x-xfs:slinx:90125a
> Change linvfs_readdir to allocate a buffer, call xfs to fill it, and
> then call the filldir function on each entry. This is instead of
> doing the
> filldir deep in the bowels of xfs which causes locking problems.
> ----------------------------
>
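A hedged sketch of the callout idea Steve floats above, with every name
invented for illustration: readdir hands the caller the full 64-bit inode
number, and the caller resolves attributes through a filesystem-supplied
lookup-by-id hook instead of re-entering a name-based ->lookup.

/* Hypothetical, userspace-only sketch; nothing here exists in the kernel. */
#include <stdint.h>
#include <stdio.h>

struct toy_attrs {
        uint64_t ino;
        uint64_t size;
};

/* per-filesystem hook: resolve an inode number to attributes */
typedef int (*get_by_ino_t)(uint64_t ino, struct toy_attrs *out);

/* stand-in for an XFS-like implementation (normally an iget by number) */
static int toy_xfs_get_by_ino(uint64_t ino, struct toy_attrs *out)
{
        out->ino = ino;
        out->size = 4096;       /* pretend we read this from the inode */
        return 0;
}

/* stand-in for the nfsd side of encoding a readdirplus entry */
static void encode_entry(const char *name, uint64_t ino, get_by_ino_t get)
{
        struct toy_attrs attrs;

        if (get(ino, &attrs) == 0)
                printf("%s ino=%llu size=%llu\n", name,
                       (unsigned long long)attrs.ino,
                       (unsigned long long)attrs.size);
}

int main(void)
{
        /* no name lookup, and no directory buffer needs to stay locked */
        encode_entry("example", 123, toy_xfs_get_by_ino);
        return 0;
}

Whether something like this is workable is exactly the open question in
the thread; as noted further down, it would mean adding a new operation
for the nfs server to use.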
On Fri, Nov 30, 2007 at 04:36:25PM -0600, Stephen Lord wrote:
> Looks like the readdir is in the bowels of the btree code when
> filldir gets called here, there are probably locks on several
> buffers in the btree at this point. This will only show up for large
> directories I bet.
I see it for fairly small directories: larger than what you can stuff
into an inode but less than a block (I haven't checked, but I'm fairly
sure that's the case).
> Just rambling, not a single line of code was consulted in writing
> this message.
Can you explain why the offset is capped and treated in an 'odd way'
at all?
+ curr_offset = filp->f_pos;
+ if (curr_offset == 0x7fffffff)
+ offset = 0xffffffff;
+ else
+ offset = filp->f_pos;
and later the offset to filldir is masked. Is that some restriction
in filldir?
On Nov 30, 2007, at 5:04 PM, Chris Wedgwood wrote:
> On Fri, Nov 30, 2007 at 04:36:25PM -0600, Stephen Lord wrote:
>
>> Looks like the readdir is in the bowels of the btree code when
>> filldir gets called here, there are probably locks on several
>> buffers in the btree at this point. This will only show up for large
>> directories I bet.
>
> I see it for fairly small directories. Larger than what you can stuff
> into an inode but less than a block (I'm not checking but fairly sure
> that's the case).
I told you I did not read any code..... once a directory is out of the
inode and into disk blocks, there will be a lock on the buffer while the
contents are copied out.
>
>> Just rambling, not a single line of code was consulted in writing
>> this message.
>
> Can you explain why the offset is capped and treated in an 'odd way'
> at all?
>
> + curr_offset = filp->f_pos;
> + if (curr_offset == 0x7fffffff)
> + offset = 0xffffffff;
> + else
> + offset = filp->f_pos;
>
> and later the offset to filldir is masked. Is that some restriction
> in filldir?
Too long ago to remember exact reasons. The only thing I do recall is
issues with the glibc readdir code, which wanted to remember positions in
a dir and seek backwards. It was translating structures and could end up
with more data from the kernel than would fit in the user buffer. This
may have something to do with that, and with the special values used as
eof markers in the getdents output and the signed 32-bit arguments to
lseek. In the original xfs directory code, the offset of an entry was a
64-bit hash+offset value, and that really confused things when glibc
attempted to do math on it.
I also recall that the offsets in the directory fields had different
meanings on different OS's. Sometimes it was the offset of the entry
itself, sometimes it was the offset of the next entry; that was one of
the reasons for the translation layer, I think.
Steve
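To make the recollection above concrete, here is a small userspace
illustration with a made-up cookie value: a 64-bit hash+offset directory
position does not survive signed 32-bit seek arithmetic, which is
plausibly what the 0x7fffffff masking in the patch above is guarding
against.

/* Userspace illustration only; the cookie value is made up. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* pretend directory cookie: hash in the high bits, offset low */
        uint64_t cookie = ((uint64_t)0x8a3f19c2 << 32) | 0x120;
        int32_t  pos32  = (int32_t)cookie;      /* what 32-bit seek math keeps */

        printf("cookie          = 0x%016llx\n", (unsigned long long)cookie);
        printf("as signed 32    = %d\n", (int)pos32);
        printf("masked position = 0x%08llx\n",
               (unsigned long long)(cookie & 0x7fffffff));

        /* seeking back with pos32 has lost the hash half of the cookie, so
         * the entry it named can no longer be located from it */
        return 0;
}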
On Fri, Nov 30, 2007 at 06:22:09PM +1100, Timothy Shimmin wrote:
> Hmmm, don't see the point of "eof" local var now.
> Previously bhv_vop_readdir() returned eof.
> I presume if we don't move the offset (offset == startoffset) then
> we're done and break out?
> So we lost eof when going to the filldir in the getdents code etc...
Yes, it's just copy & paste. We can trivially kill the variable.
On Fri, Nov 30, 2007 at 04:36:25PM -0600, Stephen Lord wrote:
> Wow, was it really that long ago!
>
> Looks like the readdir is in the bowels of the btree code when filldir gets
> called
> here, there are probably locks on several buffers in the btree at this
> point. This
> will only show up for large directories I bet.
Chris saw it with block-form directories. I've verified it works fine
with short-form directories, and it looks like it could happen with the
leaf code as well. I also remember gfs2 running into a similar problem.
> The xfs readdir code has the complete xfs inode number in its hands at
> this point (filldir is not necessarily getting all the bits of it). All
> we are doing the lookup for really is to get the inode number back again
> so we can get the inode and get the attributes. Rather dumb really.
> There has got to be a way of doing a callout structure here so that the
> inode number can be pushed through filldir and back into an fs specific
> call. The fs then can do a lookup by id - which is what it does most of
> the time for resolving nfs handles anyway. Should be more efficient than
> the current scheme.
Yes, a lot more efficient. But it means adding a new operation for use
by the nfs server.