2005-01-25 19:24:41

by Attila Body

[permalink] [raw]
Subject: UDF madness

Hi,

I've spent some time (2 weekwnds) on this issue, while I was able to
realize that not the packet-writing but the UDF driver is broken

here is the recipie to reproduve the issue:

dd if=/dev/zer of=udf.img bs=1024k count=3000
mkudffs udf.img
mount -o loop udf.img /mnt/tmp
cd /mnt/tmp
tar xjvf /usr/src/linux-2.6.11-rc2
cd /
umount /mnt/tmp

On the end I get a never ending umount process in D state. sysreq-T
tells the following about the umount process:



umount D C03F93E0 0 5848 5655 (NOTLB)
c797dd68 00200082 ca674560 c03f93e0 00000040 00000000 c81e327c 0000000c
000ae62d 925f0dc0 000f43e3 ca674560 ca6746b0 c797c000 cebf184c
00200282
ca674560 c02ea6e0 cebf1854 00000001 ca674560 c01170c0 cebf1854
cebf1854
Call Trace:
[<c02ea6e0>] __down+0x90/0x110
[<c01170c0>] default_wake_function+0x0/0x20
[<c017c511>] __mark_inode_dirty+0xd1/0x1c0
[<c02ea8ab>] __down_failed+0x7/0xc
[<e0d1fda0>] .text.lock.balloc+0x8/0x128 [udf]
[<e0d25767>] udf_write_aext+0xf7/0x170 [udf]
[<e0d1fbac>] udf_free_blocks+0xcc/0x100 [udf]
[<e0d2cda4>] extent_trunc+0x124/0x180 [udf]
[<e0d2d025>] udf_discard_prealloc+0x225/0x2e0 [udf]
[<e0d21301>] udf_clear_inode+0x41/0x50 [udf]
[<c017285e>] clear_inode+0xde/0x120
[<c01728df>] dispose_list+0x3f/0xb0
[<c0172a92>] invalidate_inodes+0x62/0x90
[<c015e9ad>] generic_shutdown_super+0x5d/0x140
[<c015f69d>] kill_block_super+0x2d/0x50
[<c015e84d>] deactivate_super+0x6d/0x90
[<c0175e6f>] sys_umount+0x3f/0xa0
[<c014bc8d>] do_munmap+0x13d/0x180
[<c014bd14>] sys_munmap+0x44/0x70
[<c0175ee7>] sys_oldumount+0x17/0x20
[<c0103263>] syscall_call+0x7/0xb

any ideas (or patches?)

Thanks,

compi


2005-01-27 04:11:48

by Andrew Morton

[permalink] [raw]
Subject: Re: UDF madness

Attila Body <[email protected]> wrote:
>
> I've spent some time (2 weekwnds) on this issue, while I was able to
> realize that not the packet-writing but the UDF driver is broken
>
> here is the recipie to reproduve the issue:
>
> dd if=/dev/zer of=udf.img bs=1024k count=3000
> mkudffs udf.img
> mount -o loop udf.img /mnt/tmp
> cd /mnt/tmp
> tar xjvf /usr/src/linux-2.6.11-rc2
> cd /
> umount /mnt/tmp
>
> On the end I get a never ending umount process in D state. sysreq-T
> tells the following about the umount process:
>
>
>
> umount D C03F93E0 0 5848 5655 (NOTLB)
> c797dd68 00200082 ca674560 c03f93e0 00000040 00000000 c81e327c 0000000c
> 000ae62d 925f0dc0 000f43e3 ca674560 ca6746b0 c797c000 cebf184c
> 00200282
> ca674560 c02ea6e0 cebf1854 00000001 ca674560 c01170c0 cebf1854
> cebf1854
> Call Trace:
> [<c02ea6e0>] __down+0x90/0x110
> [<c01170c0>] default_wake_function+0x0/0x20
> [<c017c511>] __mark_inode_dirty+0xd1/0x1c0
> [<c02ea8ab>] __down_failed+0x7/0xc
> [<e0d1fda0>] .text.lock.balloc+0x8/0x128 [udf]
> [<e0d25767>] udf_write_aext+0xf7/0x170 [udf]
> [<e0d1fbac>] udf_free_blocks+0xcc/0x100 [udf]
> [<e0d2cda4>] extent_trunc+0x124/0x180 [udf]
> [<e0d2d025>] udf_discard_prealloc+0x225/0x2e0 [udf]
> [<e0d21301>] udf_clear_inode+0x41/0x50 [udf]
> [<c017285e>] clear_inode+0xde/0x120
> [<c01728df>] dispose_list+0x3f/0xb0
> [<c0172a92>] invalidate_inodes+0x62/0x90
> [<c015e9ad>] generic_shutdown_super+0x5d/0x140

Yes, me too. generic_shutdown_super() takes lock_super(). And udf uses
lock_super for protecting its block allocation data strutures. Trivial
deadlock on unmount.

Filesystems really shouldn't be using lock_super() for internal purposes,
and the main filesystems have been taught to not do that any more, but UDF
is a holdout.

It seems that this deadlock was introduced on Jan 5 by the "udf: fix
reservation discarding" patch which added the udf_discard_prealloc() call
into udf_clear_inode(). The below dopey patch prevents the deadlock, but
perhaps we can think of something more appealing. Ideally, use a new lock
altogether?

(I had UDF deadlock on me once during the untar. That was a multi-process
lockup and is probably due to a lock ranking bug. Will poke at that a bit
further).


--- 25/fs/udf/balloc.c~udf-umount-deadlock-fix 2005-01-26 19:45:38.351735200 -0800
+++ 25-akpm/fs/udf/balloc.c 2005-01-26 19:50:35.951493160 -0800
@@ -155,8 +155,17 @@ static void udf_bitmap_free_blocks(struc
unsigned long i;
int bitmap_nr;
unsigned long overflow;
+ int took_lock_super = 0;
+
+ if (sb->s_flags & MS_ACTIVE) {
+ /*
+ * On the umount path, generic_shutdown_super() holds
+ * lock_super for us
+ */
+ lock_super(sb);
+ took_lock_super = 1;
+ }

- lock_super(sb);
if (bloc.logicalBlockNum < 0 ||
(bloc.logicalBlockNum + count) > UDF_SB_PARTLEN(sb, bloc.partitionReferenceNum))
{
@@ -215,7 +224,8 @@ error_return:
sb->s_dirt = 1;
if (UDF_SB_LVIDBH(sb))
mark_buffer_dirty(UDF_SB_LVIDBH(sb));
- unlock_super(sb);
+ if (took_lock_super)
+ unlock_super(sb);
return;
}

_

2005-01-27 07:57:57

by Al Viro

[permalink] [raw]
Subject: Re: UDF madness

On Wed, Jan 26, 2005 at 08:11:41PM -0800, Andrew Morton wrote:
> Yes, me too. generic_shutdown_super() takes lock_super(). And udf uses
> lock_super for protecting its block allocation data strutures. Trivial
> deadlock on unmount.
>
> Filesystems really shouldn't be using lock_super() for internal purposes,
> and the main filesystems have been taught to not do that any more, but UDF
> is a holdout.
>
> It seems that this deadlock was introduced on Jan 5 by the "udf: fix
> reservation discarding" patch which added the udf_discard_prealloc() call
> into udf_clear_inode(). The below dopey patch prevents the deadlock, but
> perhaps we can think of something more appealing. Ideally, use a new lock
> altogether?

AFAICS, we probably could kill lock_super() in generic_shutdown_super().

Note that all callers of lock_super() in VFS hold ->s_umount and
generic_shutdown_super() is called with ->s_umount held exclusively.
And internal fs code doing it at that point == guaranteed deadlock like
UDF one, unless we abuse it as synchronization mechanism for per-fs kernel
thread.
Note that fs users of file_fsync() are definitely not going to be
involved into contention here - they need opened file => held active
reference to superblock.
So we are left only with fs-internal asynchronous callers of
lock_super(). UDF, UFS, sysv, hpfs and ext2 are out of question - they
don't have async callers of that sort. ext3... maybe, I'm not familiar
with resize code in there. In any case, that'd better be fixed in
ext3 if such abuse exists.

I really wonder if we need lock_super() in VFS callers - or at all,
since fs-internal stuff would better be handled by fs-private sempahores.
The only real use is to give exclusion between VFS-triggered ->write_super()
and fs-internal uses (in a handful of filesystems). Could as well use
fs-private semaphore and grab/release it in ->write_super() of that fs...
Another suspect is ->s_flags protection - that one is going to get messy
as hell, since lock_super() is certainly not held in many places that change
->s_flags...

2005-01-27 09:30:46

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: UDF madness

Hi,

On Thu, 2005-01-27 at 07:57, Al Viro wrote:

> Note that fs users of file_fsync() are definitely not going to be
> involved into contention here - they need opened file => held active
> reference to superblock.

> So we are left only with fs-internal asynchronous callers of
> lock_super(). UDF, UFS, sysv, hpfs and ext2 are out of question - they
> don't have async callers of that sort. ext3... maybe, I'm not familiar
> with resize code in there. In any case, that'd better be fixed in
> ext3 if such abuse exists.

ext3 resize is like file_fsync() --- it's an ioctl, so there's a
guaranteed open file at that point.

But while the resize code originally used lock_super() to protect from
races against allocation/deallocation, the ordering fixes in it means
that that's all safe anyway now. The lock_super() is currently only
really used to guard against two threads doing resize at the same time,
and if it's a problem, it would be trivial to use a different lock.

--Stephen


2005-01-27 09:59:24

by Al Viro

[permalink] [raw]
Subject: Re: UDF madness

On Thu, Jan 27, 2005 at 09:30:04AM +0000, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, 2005-01-27 at 07:57, Al Viro wrote:
>
> > Note that fs users of file_fsync() are definitely not going to be
> > involved into contention here - they need opened file => held active
> > reference to superblock.
>
> > So we are left only with fs-internal asynchronous callers of
> > lock_super(). UDF, UFS, sysv, hpfs and ext2 are out of question - they
> > don't have async callers of that sort. ext3... maybe, I'm not familiar
> > with resize code in there. In any case, that'd better be fixed in
> > ext3 if such abuse exists.
>
> ext3 resize is like file_fsync() --- it's an ioctl, so there's a
> guaranteed open file at that point.
>
> But while the resize code originally used lock_super() to protect from
> races against allocation/deallocation, the ordering fixes in it means
> that that's all safe anyway now. The lock_super() is currently only
> really used to guard against two threads doing resize at the same time,
> and if it's a problem, it would be trivial to use a different lock.

That's fine - it does make sense to move to separate lock, but in any
case we are safe against generic_shutdown_super(). And it means that
at least there (which is the source of UDF problems) lock_super() can die.

2005-01-27 10:04:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: UDF madness

> Yes, me too. generic_shutdown_super() takes lock_super(). And udf uses
> lock_super for protecting its block allocation data strutures. Trivial
> deadlock on unmount.
>
> Filesystems really shouldn't be using lock_super() for internal purposes,
> and the main filesystems have been taught to not do that any more, but UDF
> is a holdout.
>
> It seems that this deadlock was introduced on Jan 5 by the "udf: fix
> reservation discarding" patch which added the udf_discard_prealloc() call
> into udf_clear_inode(). The below dopey patch prevents the deadlock, but
> perhaps we can think of something more appealing. Ideally, use a new lock
> altogether?

Yes, the lock_super usage in UDF only protects it's block allocation and
doesn't try to cross-protect anything with the VFS usage of it.

I'll cook up a patch to add a balloc_mutex to the UDF superblock and
give it some testing.

2005-01-29 15:34:39

by Christoph Hellwig

[permalink] [raw]
Subject: Re: UDF madness

On Wed, Jan 26, 2005 at 08:11:41PM -0800, Andrew Morton wrote:
> Yes, me too. generic_shutdown_super() takes lock_super(). And udf uses
> lock_super for protecting its block allocation data strutures. Trivial
> deadlock on unmount.
>
> Filesystems really shouldn't be using lock_super() for internal purposes,
> and the main filesystems have been taught to not do that any more, but UDF
> is a holdout.
>
> It seems that this deadlock was introduced on Jan 5 by the "udf: fix
> reservation discarding" patch which added the udf_discard_prealloc() call
> into udf_clear_inode(). The below dopey patch prevents the deadlock, but
> perhaps we can think of something more appealing. Ideally, use a new lock
> altogether?
>
> (I had UDF deadlock on me once during the untar. That was a multi-process
> lockup and is probably due to a lock ranking bug. Will poke at that a bit
> further).

Okay, below is a fix to switch udf to it's own private locking. It's
safe because it doesn't intefer with VFS lock_super usage anywhere.

udf_free_inode has some more updates than simply switching the used
lock:

- clear_inode() call moved outside locked section to avoid another
deadlock
- unused variable ino killed
- is_directory moved into the conditional it's actually used in


Signed-off-by: Christoph Hellwig <[email protected]>

(note that I see memory corruption in UDF_I_DATA(inode), but I've
reproduced that with a kernel without all recent udf changes. I'll
debug that one further)

--- 1.13/fs/udf/balloc.c 2004-10-20 10:37:15 +02:00
+++ edited/fs/udf/balloc.c 2005-01-29 16:11:49 +01:00
@@ -148,6 +148,7 @@ static void udf_bitmap_free_blocks(struc
struct udf_bitmap *bitmap,
kernel_lb_addr bloc, uint32_t offset, uint32_t count)
{
+ struct udf_sb_info *sbi = UDF_SB(sb);
struct buffer_head * bh = NULL;
unsigned long block;
unsigned long block_group;
@@ -156,7 +157,7 @@ static void udf_bitmap_free_blocks(struc
int bitmap_nr;
unsigned long overflow;

- lock_super(sb);
+ down(&sbi->s_alloc_sem);
if (bloc.logicalBlockNum < 0 ||
(bloc.logicalBlockNum + count) > UDF_SB_PARTLEN(sb, bloc.partitionReferenceNum))
{
@@ -215,7 +216,7 @@ error_return:
sb->s_dirt = 1;
if (UDF_SB_LVIDBH(sb))
mark_buffer_dirty(UDF_SB_LVIDBH(sb));
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
return;
}

@@ -224,13 +225,13 @@ static int udf_bitmap_prealloc_blocks(st
struct udf_bitmap *bitmap, uint16_t partition, uint32_t first_block,
uint32_t block_count)
{
+ struct udf_sb_info *sbi = UDF_SB(sb);
int alloc_count = 0;
int bit, block, block_group, group_start;
int nr_groups, bitmap_nr;
struct buffer_head *bh;

- lock_super(sb);
-
+ down(&sbi->s_alloc_sem);
if (first_block < 0 || first_block >= UDF_SB_PARTLEN(sb, partition))
goto out;

@@ -279,7 +280,7 @@ out:
mark_buffer_dirty(UDF_SB_LVIDBH(sb));
}
sb->s_dirt = 1;
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
return alloc_count;
}

@@ -287,6 +288,7 @@ static int udf_bitmap_new_block(struct s
struct inode * inode,
struct udf_bitmap *bitmap, uint16_t partition, uint32_t goal, int *err)
{
+ struct udf_sb_info *sbi = UDF_SB(sb);
int newbit, bit=0, block, block_group, group_start;
int end_goal, nr_groups, bitmap_nr, i;
struct buffer_head *bh = NULL;
@@ -294,7 +296,7 @@ static int udf_bitmap_new_block(struct s
int newblock = 0;

*err = -ENOSPC;
- lock_super(sb);
+ down(&sbi->s_alloc_sem);

repeat:
if (goal < 0 || goal >= UDF_SB_PARTLEN(sb, partition))
@@ -367,7 +369,7 @@ repeat:
}
if (i >= (nr_groups*2))
{
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
return newblock;
}
if (bit < sb->s_blocksize << 3)
@@ -376,7 +378,7 @@ repeat:
bit = udf_find_next_one_bit(bh->b_data, sb->s_blocksize << 3, group_start << 3);
if (bit >= sb->s_blocksize << 3)
{
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
return 0;
}

@@ -390,7 +392,7 @@ got_block:
*/
if (inode && DQUOT_ALLOC_BLOCK(inode, 1))
{
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
*err = -EDQUOT;
return 0;
}
@@ -413,13 +415,13 @@ got_block:
mark_buffer_dirty(UDF_SB_LVIDBH(sb));
}
sb->s_dirt = 1;
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
*err = 0;
return newblock;

error_return:
*err = -EIO;
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
return 0;
}

@@ -428,6 +430,7 @@ static void udf_table_free_blocks(struct
struct inode * table,
kernel_lb_addr bloc, uint32_t offset, uint32_t count)
{
+ struct udf_sb_info *sbi = UDF_SB(sb);
uint32_t start, end;
uint32_t nextoffset, oextoffset, elen;
kernel_lb_addr nbloc, obloc, eloc;
@@ -435,7 +438,7 @@ static void udf_table_free_blocks(struct
int8_t etype;
int i;

- lock_super(sb);
+ down(&sbi->s_alloc_sem);
if (bloc.logicalBlockNum < 0 ||
(bloc.logicalBlockNum + count) > UDF_SB_PARTLEN(sb, bloc.partitionReferenceNum))
{
@@ -669,7 +672,7 @@ static void udf_table_free_blocks(struct

error_return:
sb->s_dirt = 1;
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
return;
}

@@ -678,6 +681,7 @@ static int udf_table_prealloc_blocks(str
struct inode *table, uint16_t partition, uint32_t first_block,
uint32_t block_count)
{
+ struct udf_sb_info *sbi = UDF_SB(sb);
int alloc_count = 0;
uint32_t extoffset, elen, adsize;
kernel_lb_addr bloc, eloc;
@@ -694,8 +698,7 @@ static int udf_table_prealloc_blocks(str
else
return 0;

- lock_super(sb);
-
+ down(&sbi->s_alloc_sem);
extoffset = sizeof(struct unallocSpaceEntry);
bloc = UDF_I_LOCATION(table);

@@ -739,7 +742,7 @@ static int udf_table_prealloc_blocks(str
mark_buffer_dirty(UDF_SB_LVIDBH(sb));
sb->s_dirt = 1;
}
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
return alloc_count;
}

@@ -747,6 +750,7 @@ static int udf_table_new_block(struct su
struct inode * inode,
struct inode *table, uint16_t partition, uint32_t goal, int *err)
{
+ struct udf_sb_info *sbi = UDF_SB(sb);
uint32_t spread = 0xFFFFFFFF, nspread = 0xFFFFFFFF;
uint32_t newblock = 0, adsize;
uint32_t extoffset, goal_extoffset, elen, goal_elen = 0;
@@ -763,8 +767,7 @@ static int udf_table_new_block(struct su
else
return newblock;

- lock_super(sb);
-
+ down(&sbi->s_alloc_sem);
if (goal < 0 || goal >= UDF_SB_PARTLEN(sb, partition))
goal = 0;

@@ -814,7 +817,7 @@ static int udf_table_new_block(struct su
if (spread == 0xFFFFFFFF)
{
udf_release_data(goal_bh);
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
return 0;
}

@@ -830,7 +833,7 @@ static int udf_table_new_block(struct su
if (inode && DQUOT_ALLOC_BLOCK(inode, 1))
{
udf_release_data(goal_bh);
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
*err = -EDQUOT;
return 0;
}
@@ -849,7 +852,7 @@ static int udf_table_new_block(struct su
}

sb->s_dirt = 1;
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);
*err = 0;
return newblock;
}
--- 1.13/fs/udf/ialloc.c 2005-01-05 03:48:14 +01:00
+++ edited/fs/udf/ialloc.c 2005-01-29 16:32:16 +01:00
@@ -35,11 +35,8 @@

void udf_free_inode(struct inode * inode)
{
- struct super_block * sb = inode->i_sb;
- int is_directory;
- unsigned long ino;
-
- ino = inode->i_ino;
+ struct super_block *sb = inode->i_sb;
+ struct udf_sb_info *sbi = UDF_SB(sb);

/*
* Note: we must free any quota before locking the superblock,
@@ -48,36 +45,32 @@ void udf_free_inode(struct inode * inode
DQUOT_FREE_INODE(inode);
DQUOT_DROP(inode);

- lock_super(sb);
-
- is_directory = S_ISDIR(inode->i_mode);
-
clear_inode(inode);

- if (UDF_SB_LVIDBH(sb))
- {
- if (is_directory)
+ down(&sbi->s_alloc_sem);
+ if (sbi->s_lvidbh) {
+ if (S_ISDIR(inode->i_mode))
UDF_SB_LVIDIU(sb)->numDirs =
cpu_to_le32(le32_to_cpu(UDF_SB_LVIDIU(sb)->numDirs) - 1);
else
UDF_SB_LVIDIU(sb)->numFiles =
cpu_to_le32(le32_to_cpu(UDF_SB_LVIDIU(sb)->numFiles) - 1);

- mark_buffer_dirty(UDF_SB_LVIDBH(sb));
+ mark_buffer_dirty(sbi->s_lvidbh);
}
- unlock_super(sb);
+ up(&sbi->s_alloc_sem);

udf_free_blocks(sb, NULL, UDF_I_LOCATION(inode), 0, 1);
}

struct inode * udf_new_inode (struct inode *dir, int mode, int * err)
{
- struct super_block *sb;
+ struct super_block *sb = dir->i_sb;
+ struct udf_sb_info *sbi = UDF_SB(sb);
struct inode * inode;
int block;
uint32_t start = UDF_I_LOCATION(dir).logicalBlockNum;

- sb = dir->i_sb;
inode = new_inode(sb);

if (!inode)
@@ -94,8 +87,8 @@ struct inode * udf_new_inode (struct ino
iput(inode);
return NULL;
}
- lock_super(sb);

+ down(&sbi->s_alloc_sem);
UDF_I_UNIQUE(inode) = 0;
UDF_I_LENEXTENTS(inode) = 0;
UDF_I_NEXT_ALLOC_BLOCK(inode) = 0;
@@ -160,8 +153,8 @@ struct inode * udf_new_inode (struct ino
UDF_I_CRTIME(inode) = current_fs_time(inode->i_sb);
insert_inode_hash(inode);
mark_inode_dirty(inode);
+ up(&sbi->s_alloc_sem);

- unlock_super(sb);
if (DQUOT_ALLOC_INODE(inode))
{
DQUOT_DROP(inode);
--- 1.48/fs/udf/super.c 2005-01-05 03:48:18 +01:00
+++ edited/fs/udf/super.c 2005-01-29 16:23:52 +01:00
@@ -1504,6 +1504,8 @@ static int udf_fill_super(struct super_b
sb->s_fs_info = sbi;
memset(UDF_SB(sb), 0x00, sizeof(struct udf_sb_info));

+ init_MUTEX(&sbi->s_alloc_sem);
+
if (!udf_parse_options((char *)options, &uopt))
goto error_out;

===== include/linux/udf_fs_sb.h 1.5 vs edited =====
--- 1.5/include/linux/udf_fs_sb.h 2002-11-19 01:49:50 +01:00
+++ edited/include/linux/udf_fs_sb.h 2005-01-29 16:11:49 +01:00
@@ -18,6 +18,8 @@
#ifndef _UDF_FS_SB_H
#define _UDF_FS_SB_H 1

+#include <asm/semaphore.h>
+
#pragma pack(1)

#define UDF_MAX_BLOCK_LOADED 8
@@ -113,6 +115,8 @@ struct udf_sb_info

/* VAT inode */
struct inode *s_vat;
+
+ struct semaphore s_alloc_sem;
};

#endif /* _UDF_FS_SB_H */