(this is ext3)
On Fri, 9 Jan 2009 16:36:45 -0800
"Pallipadi, Venkatesh" <[email protected]> wrote:
>
> I started seeing this BUG on one of my test systems during file system unmount
> on the way to shutdown/reboot. The problem is present with latest git and not
> present in 2.6.28.
>
> Hoping that someone has already seen this and fix available before I go
> git bisect way...
>
> Let me know if you need any more detais about the bug or test system.
>
> Thanks,
> Venki
>
>
> [ 212.222623] ADDRCONF(NETDEV_UP): eth0: link is not ready
> [ 213.824126] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: R
> X/TX
> [ 213.832083] 0000:06:00.0: eth0: 10/100 speed: disabling TSO
> [ 213.842423] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [ 579.820973] sb orphan head is 291585
> [ 579.824856] sb_info orphan list:
> [ 579.828373] inode sda3:291585 at ffff880220449c00: mode 100600, nlink 0, ne
> xt 0
> [ 579.836381] ------------[ cut here ]------------
> [ 579.841264] kernel BUG at /home/venkip/src/linus/linux-2.6/fs/ext3/super.c:42
> 8!
> [ 579.849065] invalid opcode: 0000 [#1] SMP
> [ 579.853602] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:0b:01.
> 0/irq
> [ 579.861638] CPU 0
> [ 579.863974] Modules linked in:
> [ 579.867387] Pid: 7027, comm: umount Not tainted 2.6.28-05716-gfe0bdec-dirty #
> 587
> [ 579.875244] RIP: 0010:[<ffffffff80321151>] [<ffffffff80321151>] ext3_put_sup
> er+0x198/0x1f6
> [ 579.884162] RSP: 0018:ffff88022c9a7e18 EFLAGS: 00010287
> [ 579.889753] RAX: ffff880220449b68 RBX: ffff880229151278 RCX: 0000000000000000
> [ 579.897162] RDX: 0000000000007675 RSI: 0000000000000001 RDI: 000000000000000f
> [ 579.904573] RBP: ffff88022c9a7e48 R08: 0000000000000086 R09: 0000000000000086
> [ 579.911982] R10: ffffffff8024ee4b R11: 0000000000000086 R12: ffff880229150000
> [ 579.919410] R13: ffff88022ae6b000 R14: ffff880229151278 R15: ffff88022cbf8498
> [ 579.926820] FS: 00007fa202bdd6d0(0000) GS:ffffffff80a83080(0000) knlGS:00000
> 00000000000
> [ 579.935408] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 579.941422] CR2: 000000000059d4b8 CR3: 00000002299d7000 CR4: 00000000000406e0
> [ 579.948832] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 579.956244] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 579.963669] Process umount (pid: 7027, threadinfo ffff88022c9a6000, task ffff
> 88022993e180)
> [ 579.972422] Stack:
> [ 579.974695] 0000000000000000 ffffffff80a70c80 ffff88022ae6b000 ffffffff806c2
> 440
> [ 579.982414] 0000000000000008 ffffffff80a70c80 ffff88022c9a7e68 ffffffff802a5
> e6a
> [ 579.990643] ffff88022a9d5040 0000000000000003 ffff88022c9a7e88 ffffffff802a5
> f18
> [ 579.999317] Call Trace:
> [ 580.002027] [<ffffffff802a5e6a>] generic_shutdown_super+0x63/0xf7
> [ 580.008553] [<ffffffff802a5f18>] kill_block_super+0x1a/0x32
> [ 580.014584] [<ffffffff802a5fe5>] deactivate_super+0x4c/0x61
> [ 580.020609] [<ffffffff802b8e4a>] mntput_no_expire+0x10d/0x13d
> [ 580.026814] [<ffffffff802b93ec>] sys_umount+0x2c0/0x2ed
> [ 580.032473] [<ffffffff8020b65b>] system_call_fastpath+0x16/0x1b
> [ 580.038826] Code: ff ff 48 81 c6 50 04 00 00 89 04 24 31 c0 e8 c8 4f 37 00 48
> 8b 1b 48 8b 03 4c 39 f3 0f 18 08 75 a9 4d 39 b4 24 78 12 00 00 74 04 <0f> 0b eb
> fe 49 8b bd d0 01 00 00 e8 0c 18 fa ff 49 8b bc 24 90
> [ 580.062926] RIP [<ffffffff80321151>] ext3_put_super+0x198/0x1f6
Well that's not good. I don't recall us making any changes which
affect the orphan list handling. Perhaps "filesystem freeze: add error
handling of write_super_lockfs/unlockfs", but only indirectly.
Does Arjan's new async stuff play with filesystems at umount/shutdown
time? Don't think so.
A bisect would be nice, please ;)
On Tue, Jan 13, 2009 at 04:48:42PM -0800, Andrew Morton wrote:
> Well that's not good. I don't recall us making any changes which
> affect the orphan list handling. Perhaps "filesystem freeze: add error
> handling of write_super_lockfs/unlockfs", but only indirectly.
>
> Does Arjan's new async stuff play with filesystems at umount/shutdown
> time? Don't think so.
Well, Arjan's commit, efaee192: "async: make the final inode deletion
an asynchronous event", does change how inodes get deleted, and this
looks like a race where an inode is getting deleted during the umount.
So I would try reverting commit efaee192 and see if it fixes things
before starting a full bisect...
- Ted
Theodore Tso wrote:
> On Tue, Jan 13, 2009 at 04:48:42PM -0800, Andrew Morton wrote:
>> Well that's not good. I don't recall us making any changes which
>> affect the orphan list handling. Perhaps "filesystem freeze: add error
>> handling of write_super_lockfs/unlockfs", but only indirectly.
>>
>> Does Arjan's new async stuff play with filesystems at umount/shutdown
>> time? Don't think so.
>
> Well, Arjan's commit, efaee192: "async: make the final inode deletion
> an asynchronous event", does change how inodes get deleted, and this
> looks like a race where an inode is getting deleted during the umount.
>
> So I would try reverting commit efaee192 and see if it fixes things
> before starting a full bisect...
the commit is already reverted before rc1
On Wed, Jan 14, 2009 at 02:48:13AM +0000, Arjan van de Ven wrote:
>> Well, Arjan's commit, efaee192: "async: make the final inode deletion
>> an asynchronous event", does change how inodes get deleted, and this
>> looks like a race where an inode is getting deleted during the umount.
>>
>> So I would try reverting commit efaee192 and see if it fixes things
>> before starting a full bisect...
>
> the commit is already reverted before rc1
>
Ah, right. I see, the async infrastructure is still in fs/super.c,
but the actual code to insert deleted inodes onto the s_async_list was
removed in commit b32714b. Sorry, that confused me.
OK, so assuming that Venkatesh was using something post-rc1, I can't
suggest anything other than a full bisect. Sorry....
- Ted
On Tue, Jan 13, 2009 at 08:40:59PM -0800, Theodore Tso wrote:
> On Wed, Jan 14, 2009 at 02:48:13AM +0000, Arjan van de Ven wrote:
> >> Well, Arjan's commit, efaee192: "async: make the final inode deletion
> >> an asynchronous event", does change how inodes get deleted, and this
> >> looks like a race where an inode is getting deleted during the umount.
> >>
> >> So I would try reverting commit efaee192 and see if it fixes things
> >> before starting a full bisect...
> >
> > the commit is already reverted before rc1
> >
>
> Ah, right. I see, the async infrastructure is still in fs/super.c,
> but the actual code to insert deleted inodes onto the s_async_list was
> removed in commit b32714b. Sorry, that confused me.
>
> OK, so assuming that Venkatesh was using something post-rc1, I can't
> suggest anything other than a full bisect. Sorry....
>
Below is the result of full bisect
38d47c1b7075bd7ec3881141bb3629da58f88dab is first bad commit
commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
Author: Peter Zijlstra <[email protected]>
Date: Fri Sep 26 19:32:20 2008 +0200
futex: rely on get_user_pages() for shared futexes
On the way of getting rid of the mmap_sem requirement for shared futexes,
start by relying on get_user_pages().
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Nick Piggin <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
:040000 040000 029e79e0a7421438c2a7437dd210a1acf40b6c29 b581716762c1952c0f515fac642690514e7224b7 M include
:040000 040000 5f604b60974dbb9b0ac1a0910234f28c43a5e691 10c576cabe7eae661501ec38861a0f7488d5353b M kernel
--
git-bisect start
# good: [4a6908a3a050aacc9c3a2f36b276b46c0629ad91] Linux 2.6.28
git-bisect good 4a6908a3a050aacc9c3a2f36b276b46c0629ad91
# bad: [c59765042f53a79a7a65585042ff463b69cb248c] Linux 2.6.29-rc1
git-bisect bad c59765042f53a79a7a65585042ff463b69cb248c
# bad: [590cf28580c999c8ba70dc39b40bab09d69e2630] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
git-bisect bad 590cf28580c999c8ba70dc39b40bab09d69e2630
# good: [0191b625ca5a46206d2fb862bb08f36f2fcb3b31] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git-bisect good 0191b625ca5a46206d2fb862bb08f36f2fcb3b31
# good: [134179823b3ca9c8b98e0631906459dbb022ff9b] V4L/DVB (10130): use USB API functions rather than constants
git-bisect good 134179823b3ca9c8b98e0631906459dbb022ff9b
# good: [47992cbdaef2f18a47871b2ed01ad27f568c8b73] Merge branch 'for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/ycmiao/pxa-linux-2.6 into devel
git-bisect good 47992cbdaef2f18a47871b2ed01ad27f568c8b73
# bad: [6de71484cf9561edb45224f659a9db38b6056d5e] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6
git-bisect bad 6de71484cf9561edb45224f659a9db38b6056d5e
# bad: [179475a3b46f86e2d06f83e2312218ac3f0cf3a7] Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git-bisect bad 179475a3b46f86e2d06f83e2312218ac3f0cf3a7
# bad: [cc37d3d20604f3759d269247b022616f710aa52d] Merge branch 'core/futexes' into core/core
git-bisect bad cc37d3d20604f3759d269247b022616f710aa52d
# good: [7918baa555140989eeee1270f48533987d48fdba] mutex: __used is needed for function referenced only from inline asm
git-bisect good 7918baa555140989eeee1270f48533987d48fdba
# bad: [b19b3c74c7bbec45a848631b8f970ac110665a01] Merge branches 'core/debug', 'core/futexes', 'core/locking', 'core/rcu', 'core/signal', 'core/urgent' and 'core/xen' into core/core
git-bisect bad b19b3c74c7bbec45a848631b8f970ac110665a01
# bad: [42569c39917a08e8de1e8b5685463be7b74baebd] futex: fixup get_futex_key() for private futexes
git-bisect bad 42569c39917a08e8de1e8b5685463be7b74baebd
# bad: [61270708ecf1cda148e84fbf6e0703ee5aa81814] futex: reduce mmap_sem usage
git-bisect bad 61270708ecf1cda148e84fbf6e0703ee5aa81814
# bad: [38d47c1b7075bd7ec3881141bb3629da58f88dab] futex: rely on get_user_pages() for shared futexes
git-bisect bad 38d47c1b7075bd7ec3881141bb3629da58f88dab
---
On Wed, 2009-01-14 at 11:16 -0800, Pallipadi, Venkatesh wrote:
> On Tue, Jan 13, 2009 at 08:40:59PM -0800, Theodore Tso wrote:
> > On Wed, Jan 14, 2009 at 02:48:13AM +0000, Arjan van de Ven wrote:
> > >> Well, Arjan's commit, efaee192: "async: make the final inode deletion
> > >> an asynchronous event", does change how inodes get deleted, and this
> > >> looks like a race where an inode is getting deleted during the umount.
> > >>
> > >> So I would try reverting commit efaee192 and see if it fixes things
> > >> before starting a full bisect...
> > >
> > > the commit is already reverted before rc1
> > >
> >
> > Ah, right. I see, the async infrastructure is still in fs/super.c,
> > but the actual code to insert deleted inodes onto the s_async_list was
> > removed in commit b32714b. Sorry, that confused me.
> >
> > OK, so assuming that Venkatesh was using something post-rc1, I can't
> > suggest anything other than a full bisect. Sorry....
> >
>
> Below is the result of full bisect
>
> 38d47c1b7075bd7ec3881141bb3629da58f88dab is first bad commit
> commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Sep 26 19:32:20 2008 +0200
>
> futex: rely on get_user_pages() for shared futexes
>
> On the way of getting rid of the mmap_sem requirement for shared futexes,
> start by relying on get_user_pages().
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Acked-by: Nick Piggin <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> :040000 040000 029e79e0a7421438c2a7437dd210a1acf40b6c29 b581716762c1952c0f515fac642690514e7224b7 M include
> :040000 040000 5f604b60974dbb9b0ac1a0910234f28c43a5e691 10c576cabe7eae661501ec38861a0f7488d5353b M kernel
However does a futex change make ext3 crap its pants?
Is there anything more to it than start the machine, and reboot?
On Wed, Jan 14, 2009 at 11:29:37AM -0800, Peter Zijlstra wrote:
> On Wed, 2009-01-14 at 11:16 -0800, Pallipadi, Venkatesh wrote:
> > On Tue, Jan 13, 2009 at 08:40:59PM -0800, Theodore Tso wrote:
> > > On Wed, Jan 14, 2009 at 02:48:13AM +0000, Arjan van de Ven wrote:
> > > >> Well, Arjan's commit, efaee192: "async: make the final inode deletion
> > > >> an asynchronous event", does change how inodes get deleted, and this
> > > >> looks like a race where an inode is getting deleted during the umount.
> > > >>
> > > >> So I would try reverting commit efaee192 and see if it fixes things
> > > >> before starting a full bisect...
> > > >
> > > > the commit is already reverted before rc1
> > > >
> > >
> > > Ah, right. I see, the async infrastructure is still in fs/super.c,
> > > but the actual code to insert deleted inodes onto the s_async_list was
> > > removed in commit b32714b. Sorry, that confused me.
> > >
> > > OK, so assuming that Venkatesh was using something post-rc1, I can't
> > > suggest anything other than a full bisect. Sorry....
> > >
> >
> > Below is the result of full bisect
> >
> > 38d47c1b7075bd7ec3881141bb3629da58f88dab is first bad commit
> > commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
> > Author: Peter Zijlstra <[email protected]>
> > Date: Fri Sep 26 19:32:20 2008 +0200
> >
> > futex: rely on get_user_pages() for shared futexes
> >
> > On the way of getting rid of the mmap_sem requirement for shared futexes,
> > start by relying on get_user_pages().
> >
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > Acked-by: Nick Piggin <[email protected]>
> > Signed-off-by: Ingo Molnar <[email protected]>
> >
> > :040000 040000 029e79e0a7421438c2a7437dd210a1acf40b6c29 b581716762c1952c0f515fac642690514e7224b7 M include
> > :040000 040000 5f604b60974dbb9b0ac1a0910234f28c43a5e691 10c576cabe7eae661501ec38861a0f7488d5353b M kernel
>
> However does a futex change make ext3 crap its pants?
>
> Is there anything more to it than start the machine, and reboot?
Just system startup and reboot is enough to reproduce the problem. And 100%
reproducible. So, does seem to be any timing involved either.
Thanks,
Venki
On Wed, 2009-01-14 at 11:38 -0800, Pallipadi, Venkatesh wrote:
> > > Below is the result of full bisect
> > > futex: rely on get_user_pages() for shared futexes
> > However does a futex change make ext3 crap its pants?
> >
> > Is there anything more to it than start the machine, and reboot?
>
> Just system startup and reboot is enough to reproduce the problem. And 100%
> reproducible. So, does seem to be any timing involved either.
That's very odd indeed, non of my test systems nor Ingo's ever triggered
this, and that patch has been in -tip for a rather long time.
CommitDate: Tue Sep 30 12:35:20 2008 +0200
That's 3.5 months of test time.. most curious. What kind of userland do
you have that triggers this?
On Wed, Jan 14, 2009 at 11:42:49AM -0800, Peter Zijlstra wrote:
> On Wed, 2009-01-14 at 11:38 -0800, Pallipadi, Venkatesh wrote:
>
> > > > Below is the result of full bisect
>
> > > > futex: rely on get_user_pages() for shared futexes
>
> > > However does a futex change make ext3 crap its pants?
> > >
> > > Is there anything more to it than start the machine, and reboot?
> >
> > Just system startup and reboot is enough to reproduce the problem. And 100%
> > reproducible. So, does seem to be any timing involved either.
>
> That's very odd indeed, non of my test systems nor Ingo's ever triggered
> this, and that patch has been in -tip for a rather long time.
>
> CommitDate: Tue Sep 30 12:35:20 2008 +0200
>
> That's 3.5 months of test time.. most curious. What kind of userland do
> you have that triggers this?
>
This is a Core 2 (X5460) DP server system with SuSE SLES 10 installation.
I have 3 ext3 partitions mounted; root, home and my own masterboot that I use to
switch between different OS installations.
On Wed, Jan 14, 2009 at 08:29:37PM +0100, Peter Zijlstra wrote:
> > 38d47c1b7075bd7ec3881141bb3629da58f88dab is first bad commit
> > commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
> > Author: Peter Zijlstra <[email protected]>
> > Date: Fri Sep 26 19:32:20 2008 +0200
> >
> > futex: rely on get_user_pages() for shared futexes
> >
> > On the way of getting rid of the mmap_sem requirement for shared futexes,
> > start by relying on get_user_pages().
> >
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > Acked-by: Nick Piggin <[email protected]>
> > Signed-off-by: Ingo Molnar <[email protected]>
> >
> However does a futex change make ext3 crap its pants?
I agree, this doesn't make much sense. I've looked at the patch, and
I don't see how this would cause an ext3 orphaned-inode list handling
problem
Are you sure the bisect was done correctly? Have you tried reverting
that one commit, or otherwise conclusively shown that a kernel with
this commit fails, and one without this commit works just fine?
- Ted
On Wed, 2009-01-14 at 13:20 -0800, Theodore Tso wrote:
> On Wed, Jan 14, 2009 at 08:29:37PM +0100, Peter Zijlstra wrote:
> > > 38d47c1b7075bd7ec3881141bb3629da58f88dab is first bad commit
> > > commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
> > > Author: Peter Zijlstra <[email protected]>
> > > Date: Fri Sep 26 19:32:20 2008 +0200
> > >
> > > futex: rely on get_user_pages() for shared futexes
> > >
> > > On the way of getting rid of the mmap_sem requirement for shared futexes,
> > > start by relying on get_user_pages().
> > >
> > > Signed-off-by: Peter Zijlstra <[email protected]>
> > > Acked-by: Nick Piggin <[email protected]>
> > > Signed-off-by: Ingo Molnar <[email protected]>
> > >
> > However does a futex change make ext3 crap its pants?
>
> I agree, this doesn't make much sense. I've looked at the patch, and
> I don't see how this would cause an ext3 orphaned-inode list handling
> problem
>
> Are you sure the bisect was done correctly? Have you tried reverting
> that one commit, or otherwise conclusively shown that a kernel with
> this commit fails, and one without this commit works just fine?
>
Unfortunately, I cannot revert this patch alone from upstream git.
But I consistently see
upstream git: Always produces this oops on reboot
checkout of 38d47c1b: Always produces this oops on reboot
checkout of 94aca1da (one patch before the above commit): Reboots fine
without the oops.
This is petty specific to the particular userspace, looks like.
I only see this on SLES10 installation. Also, I need a non-root user
logged in at least once after boot through X to see this problem. I was
always seeing this as I had autologin on local terminal and was remotely
rebooting the system. If I just boot to init 3 or boot to init 5 with no
user logged in or boot to init 5 with root logged in, I do not see this
problem.
Thanks,
Venki
On Wed, 2009-01-21 at 12:10 -0800, Pallipadi, Venkatesh wrote:
> On Wed, 2009-01-14 at 13:20 -0800, Theodore Tso wrote:
> > On Wed, Jan 14, 2009 at 08:29:37PM +0100, Peter Zijlstra wrote:
> > > > 38d47c1b7075bd7ec3881141bb3629da58f88dab is first bad commit
> > > > commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
> > > > Author: Peter Zijlstra <[email protected]>
> > > > Date: Fri Sep 26 19:32:20 2008 +0200
> > > >
> > > > futex: rely on get_user_pages() for shared futexes
> > > >
> > > > On the way of getting rid of the mmap_sem requirement for shared futexes,
> > > > start by relying on get_user_pages().
> > > >
> > > > Signed-off-by: Peter Zijlstra <[email protected]>
> > > > Acked-by: Nick Piggin <[email protected]>
> > > > Signed-off-by: Ingo Molnar <[email protected]>
> > > >
> > > However does a futex change make ext3 crap its pants?
> >
> > I agree, this doesn't make much sense. I've looked at the patch, and
> > I don't see how this would cause an ext3 orphaned-inode list handling
> > problem
> >
> > Are you sure the bisect was done correctly? Have you tried reverting
> > that one commit, or otherwise conclusively shown that a kernel with
> > this commit fails, and one without this commit works just fine?
> >
>
> Unfortunately, I cannot revert this patch alone from upstream git.
> But I consistently see
> upstream git: Always produces this oops on reboot
> checkout of 38d47c1b: Always produces this oops on reboot
> checkout of 94aca1da (one patch before the above commit): Reboots fine
> without the oops.
>
> This is petty specific to the particular userspace, looks like.
> I only see this on SLES10 installation. Also, I need a non-root user
> logged in at least once after boot through X to see this problem. I was
> always seeing this as I had autologin on local terminal and was remotely
> rebooting the system. If I just boot to init 3 or boot to init 5 with no
> user logged in or boot to init 5 with root logged in, I do not see this
> problem.
Ted, could this happen due an extra iput()?
In that case, Venki, does the below patch fix it?
Credit goes to Darren for spotting this.
---
kernel/futex.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index f89d373..f4132ab 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -929,7 +929,7 @@ out_unlock:
/* drop_futex_key_refs() must be called outside the spinlocks. */
while (--drop_count >= 0)
- drop_futex_key_refs(&key1);
+ drop_futex_key_refs(&key2);
out_put_keys:
put_futex_key(fshared, &key2);
Peter Zijlstra wrote:
> On Wed, 2009-01-21 at 12:10 -0800, Pallipadi, Venkatesh wrote:
>> On Wed, 2009-01-14 at 13:20 -0800, Theodore Tso wrote:
>>> On Wed, Jan 14, 2009 at 08:29:37PM +0100, Peter Zijlstra wrote:
>>>>> 38d47c1b7075bd7ec3881141bb3629da58f88dab is first bad commit
>>>>> commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
>>>>> Author: Peter Zijlstra <[email protected]>
>>>>> Date: Fri Sep 26 19:32:20 2008 +0200
>>>>>
>>>>> futex: rely on get_user_pages() for shared futexes
>>>>>
>>>>> On the way of getting rid of the mmap_sem requirement for shared futexes,
>>>>> start by relying on get_user_pages().
>>>>>
>>>>> Signed-off-by: Peter Zijlstra <[email protected]>
>>>>> Acked-by: Nick Piggin <[email protected]>
>>>>> Signed-off-by: Ingo Molnar <[email protected]>
>>>>>
>>>> However does a futex change make ext3 crap its pants?
>>> I agree, this doesn't make much sense. I've looked at the patch, and
>>> I don't see how this would cause an ext3 orphaned-inode list handling
>>> problem
>>>
>>> Are you sure the bisect was done correctly? Have you tried reverting
>>> that one commit, or otherwise conclusively shown that a kernel with
>>> this commit fails, and one without this commit works just fine?
>>>
>> Unfortunately, I cannot revert this patch alone from upstream git.
>> But I consistently see
>> upstream git: Always produces this oops on reboot
>> checkout of 38d47c1b: Always produces this oops on reboot
>> checkout of 94aca1da (one patch before the above commit): Reboots fine
>> without the oops.
>>
>> This is petty specific to the particular userspace, looks like.
>> I only see this on SLES10 installation. Also, I need a non-root user
>> logged in at least once after boot through X to see this problem. I was
>> always seeing this as I had autologin on local terminal and was remotely
>> rebooting the system. If I just boot to init 3 or boot to init 5 with no
>> user logged in or boot to init 5 with root logged in, I do not see this
>> problem.
>
> Ted, could this happen due an extra iput()?
>
> In that case, Venki, does the below patch fix it?
>
> Credit goes to Darren for spotting this.
>
> ---
> kernel/futex.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index f89d373..f4132ab 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -929,7 +929,7 @@ out_unlock:
>
> /* drop_futex_key_refs() must be called outside the spinlocks. */
> while (--drop_count >= 0)
> - drop_futex_key_refs(&key1);
> + drop_futex_key_refs(&key2);
Unfortunately, I realized later that this code was indeed correct and I
asked Ingo to pull my patch implementing the above change. Quoting my
previous mail on the subject:
"I believe what is happening here is that the requeue loop requeues each
waiter from one futex (key1) to another (key2). It rightly takes a
reference to the futex at key2 and then decrements the references to
key1 by drop_count (since the waiters now reference key2, not key1).
The newly taken key2 references will be dropped in futex_wait() when
each waiter is woken up and takes the futex."
However, there are still two patches in linux-tip/core/futexes that
addresses get|put symmetry of futex keys:
90621c40cc4ab7b0a414311ce37e7cc7173403b6
42d35d48ce7cefb9429880af19d1c329d1554e7a
However, the first is an addition of a WARN_ON (which is unlikely to
catch this issue as it was geared toward catching puts on failed gets).
The latter mostly adds puts where they were missing, so also unlikely
to help.
--
Darren
>
> out_put_keys:
> put_futex_key(fshared, &key2);
>
>
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
On Mon, 2009-01-26 at 08:39 -0800, Darren Hart wrote:
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index f89d373..f4132ab 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -929,7 +929,7 @@ out_unlock:
> >
> > /* drop_futex_key_refs() must be called outside the spinlocks. */
> > while (--drop_count >= 0)
> > - drop_futex_key_refs(&key1);
> > + drop_futex_key_refs(&key2);
>
> Unfortunately, I realized later that this code was indeed correct and I
> asked Ingo to pull my patch implementing the above change. Quoting my
> previous mail on the subject:
>
> "I believe what is happening here is that the requeue loop requeues each
> waiter from one futex (key1) to another (key2). It rightly takes a
> reference to the futex at key2 and then decrements the references to
> key1 by drop_count (since the waiters now reference key2, not key1).
> The newly taken key2 references will be dropped in futex_wait() when
> each waiter is woken up and takes the futex."
Argh, that wants a comment..
Peter Zijlstra wrote:
> On Mon, 2009-01-26 at 08:39 -0800, Darren Hart wrote:
>
>>> diff --git a/kernel/futex.c b/kernel/futex.c
>>> index f89d373..f4132ab 100644
>>> --- a/kernel/futex.c
>>> +++ b/kernel/futex.c
>>> @@ -929,7 +929,7 @@ out_unlock:
>>>
>>> /* drop_futex_key_refs() must be called outside the spinlocks. */
>>> while (--drop_count >= 0)
>>> - drop_futex_key_refs(&key1);
>>> + drop_futex_key_refs(&key2);
>> Unfortunately, I realized later that this code was indeed correct and I
>> asked Ingo to pull my patch implementing the above change. Quoting my
>> previous mail on the subject:
>>
>> "I believe what is happening here is that the requeue loop requeues each
>> waiter from one futex (key1) to another (key2). It rightly takes a
>> reference to the futex at key2 and then decrements the references to
>> key1 by drop_count (since the waiters now reference key2, not key1).
>> The newly taken key2 references will be dropped in futex_wait() when
>> each waiter is woken up and takes the futex."
>
> Argh, that wants a comment..
>
Agreed. How about this.
futex: comment requeue key reference symmantics
From: Darren Hart <[email protected]>
We've tripped over the futex_requeue drop_count refering to key2
instead of key1. The code is actually correct, but is non-intuitive.
This patch adds an explicit comment explaining the requeue.
Signed-off-by: Darren Hart <[email protected]>
---
kernel/futex.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 8af1002..3655cad 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -969,7 +969,12 @@ out_unlock:
if (hb1 != hb2)
spin_unlock(&hb2->lock);
- /* drop_futex_key_refs() must be called outside the spinlocks. */
+ /*
+ * drop_futex_key_refs() must be called outside the spinlocks. During
+ * the requeue we moved futex_q's from the hash bucket at key1 to the
+ * one at key2 and updated their key pointer. We no longer need to
+ * hold the references to key1.
+ */
while (--drop_count >= 0)
drop_futex_key_refs(&key1);
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team