2009-03-11 13:57:30

by Laurent GUERBY

[permalink] [raw]
Subject: [2.6.29-rc5][BUG] swapon on vfat file gets stuck on inode lock

Hi,

With 2.6.29-rc5 (on an ARM platform but I don't think it's significant)
when I try to enable swap on a file which is on an USB mounted vfat
partition the swapon syscall gets stuck:

swapon D c03780d4 0 22361 1
[<c0377db0>] (schedule+0x0/0x3ac) from [<c0378e50>] (__mutex_lock_slowpath+0x94/0xf4)
[<c0378dbc>] (__mutex_lock_slowpath+0x0/0xf4) from [<c0378c40>] (mutex_lock+0x20/0x24)
r6:df49e808 r5:00000000 r4:00000000
[<c0378c20>] (mutex_lock+0x0/0x24) from [<c013bb2c>] (_fat_bmap+0x28/0x68)
[<c013bb04>] (_fat_bmap+0x0/0x68) from [<c00af910>] (bmap+0x2c/0x40)
r6:0005ffff r5:00000000 r4:00000000
[<c00af8e4>] (bmap+0x0/0x40) from [<c0094b84>] (sys_swapon+0x630/0xcbc)
r5:c0541510 r4:00300000
[<c0094554>] (sys_swapon+0x0/0xcbc) from [<c0029960>] (ret_fast_syscall+0x0/0x2c)

Looking around in the kernel sources it looks like the inode mutex is
taken both by mm/swapfile.c and by fs/fat/inode.c, the latter one since
this patch from november 2008:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa93ca18a8b0da4e26bd9491ad144cd14d22f8ec

When I use a loopback device on the very same file the swapon call works
so there's an easy workaround for this issue.

Sincerely,

Laurent
http://guerby.org/blog


2009-03-11 15:54:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: [2.6.29-rc5][BUG] swapon on vfat file gets stuck on inode lock



On Wed, 11 Mar 2009, Laurent GUERBY wrote:
>
> With 2.6.29-rc5 (on an ARM platform but I don't think it's significant)
> when I try to enable swap on a file which is on an USB mounted vfat
> partition the swapon syscall gets stuck:

Oh wow. Are you _sure_ you want to do that? That's crazy.

But:

> swapon D c03780d4 0 22361 1
> [<c0377db0>] (schedule+0x0/0x3ac) from [<c0378e50>] (__mutex_lock_slowpath+0x94/0xf4)
> [<c0378dbc>] (__mutex_lock_slowpath+0x0/0xf4) from [<c0378c40>] (mutex_lock+0x20/0x24) r6:df49e808 r5:00000000 r4:00000000
> [<c0378c20>] (mutex_lock+0x0/0x24) from [<c013bb2c>] (_fat_bmap+0x28/0x68)
> [<c013bb04>] (_fat_bmap+0x0/0x68) from [<c00af910>] (bmap+0x2c/0x40) r6:0005ffff r5:00000000 r4:00000000
> [<c00af8e4>] (bmap+0x0/0x40) from [<c0094b84>] (sys_swapon+0x630/0xcbc) r5:c0541510 r4:00300000
> [<c0094554>] (sys_swapon+0x0/0xcbc) from [<c0029960>] (ret_fast_syscall+0x0/0x2c)
>
> Looking around in the kernel sources it looks like the inode mutex is
> taken both by mm/swapfile.c and by fs/fat/inode.c, the latter one since
> this patch from november 2008:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa93ca18a8b0da4e26bd9491ad144cd14d22f8ec

Yes, clearly there's a deadlock there which was hidden before. And FAT
technically does the locking right, so that bmap() doesn't race with
somebody changing the file.

That said, other filesystems don't have this problem, simply because they
just ignore the race, knowing that bmap is inherently racy in that
situation _anyway_ (ie the value we return is clearly going to race
_after_ we release the lock even if we do the lookup with the lock held!).

So the right thing to do would appear to be to just remove the silly
locking in fat_bmap. It's not helping, and it's clearly hurting your
(crazy) case. In the _normal_ paths (ie a regular read/write) we handle
locking on a per-page basis anyway.

I dunno. No other filesystem has _any_ locking in their bmap that I can
see, so I strongly suspect that fat doesn't need it either.

IOW, I'm almost 100% sure that the right fix is this trivial one, but I'd
like somebody else to double-check my thinking.

Linus

---
fs/fat/inode.c | 9 +--------
1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 6b74d09..2fcb269 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -199,14 +199,7 @@ static ssize_t fat_direct_IO(int rw, struct kiocb *iocb,

static sector_t _fat_bmap(struct address_space *mapping, sector_t block)
{
- sector_t blocknr;
-
- /* fat_get_cluster() assumes the requested blocknr isn't truncated. */
- mutex_lock(&mapping->host->i_mutex);
- blocknr = generic_block_bmap(mapping, block, fat_get_block);
- mutex_unlock(&mapping->host->i_mutex);
-
- return blocknr;
+ return generic_block_bmap(mapping, block, fat_get_block);
}

static const struct address_space_operations fat_aops = {

2009-03-11 16:52:53

by Laurent GUERBY

[permalink] [raw]
Subject: Re: [2.6.29-rc5][BUG] swapon on vfat file gets stuck on inode lock

On Wed, 2009-03-11 at 08:53 -0700, Linus Torvalds wrote:
> On Wed, 11 Mar 2009, Laurent GUERBY wrote:
> > With 2.6.29-rc5 (on an ARM platform but I don't think it's significant)
> > when I try to enable swap on a file which is on an USB mounted vfat
> > partition the swapon syscall gets stuck:
>
> Oh wow. Are you _sure_ you want to do that? That's crazy.

I was testing a device with no swap (root over NFS) and GCC bootstrap
failed because of lack of RAM. So I first tried to add swap on file over
NFS, judging from various warnings on the net and that the system froze
after about 4 hours it was in the "crazy" category. After that I grabbed
an USB HDD which happened to be vfat formated, hence second freeze but
on swapon process this time.

I promise next time I'll stick to local ext3 on x86 :).

Laurent

2009-03-11 17:16:07

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [2.6.29-rc5][BUG] swapon on vfat file gets stuck on inode lock

Linus Torvalds <[email protected]> writes:

> Yes, clearly there's a deadlock there which was hidden before. And FAT
> technically does the locking right, so that bmap() doesn't race with
> somebody changing the file.
>
> That said, other filesystems don't have this problem, simply because they
> just ignore the race, knowing that bmap is inherently racy in that
> situation _anyway_ (ie the value we return is clearly going to race
> _after_ we release the lock even if we do the lookup with the lock held!).
>
> So the right thing to do would appear to be to just remove the silly
> locking in fat_bmap. It's not helping, and it's clearly hurting your
> (crazy) case. In the _normal_ paths (ie a regular read/write) we handle
> locking on a per-page basis anyway.
>
> I dunno. No other filesystem has _any_ locking in their bmap that I can
> see, so I strongly suspect that fat doesn't need it either.
>
> IOW, I'm almost 100% sure that the right fix is this trivial one, but I'd
> like somebody else to double-check my thinking.

I'm sure that path touch the metadata without locking (so, reused entry
can not be for that inode anymore). However, I guess the result doesn't
become any fs corruption, so and other fs is ignoring the possibly wrong
result of bmap().

I'm thinking to use this patch instead of removing.


[PATCH] Fix _fat_bmap() locking

On swapon() path, it has already i_mutex. So, this uses i_alloc_sem
instead of it.

Signed-off-by: OGAWA Hirofumi <[email protected]>
---

fs/fat/inode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN fs/fat/inode.c~fat_bmap-locking-fix fs/fat/inode.c
--- linux-2.6/fs/fat/inode.c~fat_bmap-locking-fix 2009-03-12 00:47:15.000000000 +0900
+++ linux-2.6-hirofumi/fs/fat/inode.c 2009-03-12 00:47:42.000000000 +0900
@@ -202,9 +202,9 @@ static sector_t _fat_bmap(struct address
sector_t blocknr;

/* fat_get_cluster() assumes the requested blocknr isn't truncated. */
- mutex_lock(&mapping->host->i_mutex);
+ down_read(&mapping->host->i_alloc_sem);
blocknr = generic_block_bmap(mapping, block, fat_get_block);
- mutex_unlock(&mapping->host->i_mutex);
+ up_read(&mapping->host->i_alloc_sem);

return blocknr;
}
_
--
OGAWA Hirofumi <[email protected]>

2009-03-11 18:13:49

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [2.6.29-rc5][BUG] swapon on vfat file gets stuck on inode lock

OGAWA Hirofumi <[email protected]> writes:

> I'm sure that path touch the metadata without locking (so, reused entry
> can not be for that inode anymore). However, I guess the result doesn't
> become any fs corruption, so and other fs is ignoring the possibly wrong
> result of bmap().
>
> I'm thinking to use this patch instead of removing.

Ok. Simple stress test was done. BTW, since fat caches the metadata on
that path, fat would be sensitive to race more than other fs.

Anyway, if you are ok, please apply. Otherwise, I'm going to prepare for
this patch to next merge window.

Thanks.

> [PATCH] Fix _fat_bmap() locking
>
> On swapon() path, it has already i_mutex. So, this uses i_alloc_sem
> instead of it.
>
> Signed-off-by: OGAWA Hirofumi <[email protected]>
> ---
>
> fs/fat/inode.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff -puN fs/fat/inode.c~fat_bmap-locking-fix fs/fat/inode.c
> --- linux-2.6/fs/fat/inode.c~fat_bmap-locking-fix 2009-03-12 00:47:15.000000000 +0900
> +++ linux-2.6-hirofumi/fs/fat/inode.c 2009-03-12 00:47:42.000000000 +0900
> @@ -202,9 +202,9 @@ static sector_t _fat_bmap(struct address
> sector_t blocknr;
>
> /* fat_get_cluster() assumes the requested blocknr isn't truncated. */
> - mutex_lock(&mapping->host->i_mutex);
> + down_read(&mapping->host->i_alloc_sem);
> blocknr = generic_block_bmap(mapping, block, fat_get_block);
> - mutex_unlock(&mapping->host->i_mutex);
> + up_read(&mapping->host->i_alloc_sem);
>
> return blocknr;
> }
> _

--
OGAWA Hirofumi <[email protected]>

2009-03-11 18:21:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [2.6.29-rc5][BUG] swapon on vfat file gets stuck on inode lock



On Thu, 12 Mar 2009, OGAWA Hirofumi wrote:
>
> Ok. Simple stress test was done. BTW, since fat caches the metadata on
> that path, fat would be sensitive to race more than other fs.
>
> Anyway, if you are ok, please apply. Otherwise, I'm going to prepare for
> this patch to next merge window.

It looks good to me, and obviously safer than my version. Will apply.

Thanks,

Linus

2009-03-16 08:34:45

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [2.6.29-rc5][BUG] swapon on vfat file gets stuck on inode lock

On Wed, Mar 11, 2009 at 08:53:14AM -0700, Linus Torvalds wrote:
> I dunno. No other filesystem has _any_ locking in their bmap that I can
> see, so I strongly suspect that fat doesn't need it either.

XFS has proper locking using it's own locks.

Then again Peter had patchset somewhere to implement proper aops for
swap instead of abusing -bmap to implement swap over NFS, which should
be able to fix this for fat, too.