Hi,
I have been hunting a UDF bug that occasionally results in generation
of an Allocation Extent Descriptor with an incorrect tagLocation. So
far I haven't been able to see a path through the code that could
cause that. But, I noticed some inconsistency in locking during
AED generation and wonder if it could result in random corruption.
The function udf_update_inode() has this general pattern:
bh = udf_tgetblk(...); // calls sb_getblk()
lock_buffer(bh);
memset(bh->b_data, 0, inode->i_sb->s_blocksize);
// <snip>other code to populate FE/EFE data in the block</snip>
set_buffer_uptodate(bh);
unlock_buffer(bh);
mark_buffer_dirty(bh);
This I can understand - the lock is held for as long as the buffer
contents are being assembled.
In contrast, udf_setup_indirect_aext(), which constructs an AED,
has this sequence:
bh = udf_tgetblk(...); // calls sb_getblk()
lock_buffer(bh);
memset(bh->b_data, 0, inode->i_sb->s_blocksize);
set_buffer_uptodate(bh);
unlock_buffer(bh);
mark_buffer_dirty_inode(bh);
// <snip>other code to populate AED data in the block</snip>
In this case the population of the block occurs without
the protection of the lock.
Because the block has been marked dirty, does this mean that
writeback could occur at any point during population?
There is one path through udf_setup_indirect_aext() where
mark_buffer_dirty_inode() gets called again after population is
complete, which I suppose could heal a partial writeout, but there is
also another path in which the buffer does not get marked dirty again.
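To make the concern concrete, here is a minimal user-space sketch of the
two orderings. It is not kernel code: struct model_buffer, the model_*
functions and MODEL_BLOCK_SIZE are all hypothetical names invented for
illustration, writeback is reduced to "copy the data to a fake disk if the
dirty flag is set", and the worst-case writeback timing is simply assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 16 /* hypothetical block size for the model */

/* Hypothetical stand-in for a buffer_head: in-memory data, a fake disk
 * copy, and a dirty flag. This is an assumption, not the kernel struct. */
struct model_buffer {
	unsigned char data[MODEL_BLOCK_SIZE];
	unsigned char on_disk[MODEL_BLOCK_SIZE];
	bool dirty;
};

/* Models writeback: a dirty buffer is copied out and becomes clean. */
void model_writeback(struct model_buffer *b)
{
	if (!b->dirty)
		return;
	memcpy(b->on_disk, b->data, MODEL_BLOCK_SIZE);
	b->dirty = false;
}

/* Models "populate the AED": fill the block with a recognizable pattern. */
void model_populate(struct model_buffer *b)
{
	memset(b->data, 0xAE, MODEL_BLOCK_SIZE);
}

/* Ordering seen in udf_setup_indirect_aext(): zero, mark dirty, then
 * populate. Writeback is assumed to run right after the dirty marking.
 * Returns true if the fake disk ends up matching the populated block. */
bool model_dirty_then_populate(bool redirty_after)
{
	struct model_buffer b = {0};

	b.dirty = true;       /* marked dirty while still all zeroes */
	model_writeback(&b);  /* assumed worst-case timing */
	model_populate(&b);
	if (redirty_after)
		b.dirty = true; /* the path that dirties the buffer again */
	model_writeback(&b);
	return memcmp(b.on_disk, b.data, MODEL_BLOCK_SIZE) == 0;
}

/* Ordering seen in udf_update_inode(): populate first, dirty last. */
bool model_populate_then_dirty(void)
{
	struct model_buffer b = {0};

	model_writeback(&b);  /* nothing to write: not dirty yet */
	model_populate(&b);
	b.dirty = true;
	model_writeback(&b);
	return memcmp(b.on_disk, b.data, MODEL_BLOCK_SIZE) == 0;
}

/* Checks the three cases discussed above. */
void model_check_all(void)
{
	assert(!model_dirty_then_populate(false)); /* stale zeroes on disk */
	assert(model_dirty_then_populate(true));   /* healed by re-dirtying */
	assert(model_populate_then_dirty());       /* never exposed */
}
```

Calling model_check_all() from a main() exercises all three cases. In this
model, only the path that never marks the buffer dirty again leaves the
zeroed block on the fake disk.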
Regards,
------------------------------------------------------------------------
Steven J. Magnani "I claim this network for MARS!
http://www.digidescorp.com Earthling, return my space modulator!"
#include <standard.disclaimer>
Hi!
On Sat 23-03-19 15:14:05, Steve Magnani wrote:
> I have been hunting a UDF bug that occasionally results in generation
> of an Allocation Extent Descriptor with an incorrect tagLocation. So
> far I haven't been able to see a path through the code that could
> cause that. But, I noticed some inconsistency in locking during
> AED generation and wonder if it could result in random corruption.
>
> The function udf_update_inode() has this general pattern:
>
> bh = udf_tgetblk(...); // calls sb_getblk()
> lock_buffer(bh);
> memset(bh->b_data, 0, inode->i_sb->s_blocksize);
> // <snip>other code to populate FE/EFE data in the block</snip>
> set_buffer_uptodate(bh);
> unlock_buffer(bh);
> mark_buffer_dirty(bh);
>
> This I can understand - the lock is held for as long as the buffer
> contents are being assembled.
>
> In contrast, udf_setup_indirect_aext(), which constructs an AED,
> has this sequence:
>
> bh = udf_tgetblk(...); // calls sb_getblk()
> lock_buffer(bh);
> memset(bh->b_data, 0, inode->i_sb->s_blocksize);
>
> set_buffer_uptodate(bh);
> unlock_buffer(bh);
> mark_buffer_dirty_inode(bh);
>
> // <snip>other code to populate AED data in the block</snip>
>
> In this case the population of the block occurs without
> the protection of the lock.
>
> Because the block has been marked dirty, does this mean that
> writeback could occur at any point during population?
Yes. Thanks for noticing this!
> There is one path through udf_setup_indirect_aext() where
> mark_buffer_dirty_inode() gets called again after population is
> complete, which I suppose could heal a partial writeout, but there is
> also another path in which the buffer does not get marked dirty again.
Generally, we add new extents to the created indirect extent which dirties
the buffer and that should fix the problem. But you are definitely right
that the code is suspicious and should be fixed. Will you send a patch?
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 3/25/19 11:42 AM, Jan Kara wrote:
> Hi!
>
> On Sat 23-03-19 15:14:05, Steve Magnani wrote:
>> ...
>>
>> In contrast, udf_setup_indirect_aext(), which constructs an AED,
>> has this sequence:
>>
>> bh = udf_tgetblk(...); // calls sb_getblk()
>> lock_buffer(bh);
>> memset(bh->b_data, 0, inode->i_sb->s_blocksize);
>>
>> set_buffer_uptodate(bh);
>> unlock_buffer(bh);
>> mark_buffer_dirty_inode(bh);
>>
>> // <snip>other code to populate AED data in the block</snip>
>>
>> In this case the population of the block occurs without
>> the protection of the lock.
>>
>> Because the block has been marked dirty, does this mean that
>> writeback could occur at any point during population?
> Yes. Thanks for noticing this!
>
>> There is one path through udf_setup_indirect_aext() where
>> mark_buffer_dirty_inode() gets called again after population is
>> complete, which I suppose could heal a partial writeout, but there is
>> also another path in which the buffer does not get marked dirty again.
> Generally, we add new extents to the created indirect extent which dirties
> the buffer and that should fix the problem. But you are definitely right
> that the code is suspicious and should be fixed. Will you send a patch?
>
> Honza
Sure. There's at least one other place where it looked like there might
be a similar issue.
Steve
Jan -
On 3/25/19 11:42 AM, Jan Kara wrote:
> Hi!
>
> On Sat 23-03-19 15:14:05, Steve Magnani wrote:
>> I have been hunting a UDF bug that occasionally results in generation
>> of an Allocation Extent Descriptor with an incorrect tagLocation. So
>> far I haven't been able to see a path through the code that could
>> cause that. But, I noticed some inconsistency in locking during
>> AED generation and wonder if it could result in random corruption.
>>
>> The function udf_update_inode() has this general pattern:
>>
>> bh = udf_tgetblk(...); // calls sb_getblk()
>> lock_buffer(bh);
>> memset(bh->b_data, 0, inode->i_sb->s_blocksize);
>> // <snip>other code to populate FE/EFE data in the block</snip>
>> set_buffer_uptodate(bh);
>> unlock_buffer(bh);
>> mark_buffer_dirty(bh);
>>
>> This I can understand - the lock is held for as long as the buffer
>> contents are being assembled.
>>
>> In contrast, udf_setup_indirect_aext(), which constructs an AED,
>> has this sequence:
>>
>> bh = udf_tgetblk(...); // calls sb_getblk()
>> lock_buffer(bh);
>> memset(bh->b_data, 0, inode->i_sb->s_blocksize);
>>
>> set_buffer_uptodate(bh);
>> unlock_buffer(bh);
>> mark_buffer_dirty_inode(bh);
>>
>> // <snip>other code to populate AED data in the block</snip>
>>
>> In this case the population of the block occurs without
>> the protection of the lock.
>>
>> Because the block has been marked dirty, does this mean that
>> writeback could occur at any point during population?
> Yes. Thanks for noticing this!
>
>> There is one path through udf_setup_indirect_aext() where
>> mark_buffer_dirty_inode() gets called again after population is
>> complete, which I suppose could heal a partial writeout, but there is
>> also another path in which the buffer does not get marked dirty again.
> Generally, we add new extents to the created indirect extent which dirties
> the buffer and that should fix the problem. But you are definitely right
> that the code is suspicious and should be fixed. Will you send a patch?
I did a little archaeology to see how the code evolved to this point.
It's been like this a long time.
I also did some research to understand why filesystems use lock_buffer()
sometimes but not others. For example, the FAT driver never calls it. I
ran across this thread from 2011:
https://lkml.org/lkml/2011/5/16/402
...from which I conclude that while it is correct in a strict sense to
hold a lock on a buffer any time its contents are being modified,
performance considerations make it preferable (or at least reasonable)
to make some modifications without a lock provided it's known that a
subsequent write-out will "fix" any potential partial write out before
anyone else tries to read the block. I doubt that UDF sees common use
with DIF/DIX block devices, which might make a decision in favor of
performance a little easier. Since the FAT driver doesn't contain
Darrick's proposed changes I assume a decision was made that performance
was more important there.
Certainly the call to udf_setup_indirect_aext() from udf_add_aext()
meets those criteria. But udf_table_free_blocks() may not dirty the AED
block.
So if this looks reasonable I will resend as a formal patch:
--- a/fs/udf/inode.c 2019-03-30 11:28:38.637759458 -0500
+++ b/fs/udf/inode.c 2019-03-30 11:33:00.357761250 -0500
@@ -1873,9 +1873,6 @@ int udf_setup_indirect_aext(struct inode
return -EIO;
lock_buffer(bh);
memset(bh->b_data, 0x00, sb->s_blocksize);
- set_buffer_uptodate(bh);
- unlock_buffer(bh);
- mark_buffer_dirty_inode(bh, inode);
aed = (struct allocExtDesc *)(bh->b_data);
if (!UDF_QUERY_FLAG(sb, UDF_FLAG_STRICT)) {
@@ -1890,6 +1887,9 @@ int udf_setup_indirect_aext(struct inode
udf_new_tag(bh->b_data, TAG_IDENT_AED, ver, 1, block,
sizeof(struct tag));
+ set_buffer_uptodate(bh);
+ unlock_buffer(bh);
+
nepos.block = neloc;
nepos.offset = sizeof(struct allocExtDesc);
nepos.bh = bh;
@@ -1913,6 +1913,8 @@ int udf_setup_indirect_aext(struct inode
} else {
__udf_add_aext(inode, epos, &nepos.block,
sb->s_blocksize | EXT_NEXT_EXTENT_ALLOCDECS, 0);
+ /* Make sure completed AED gets written out */
+ mark_buffer_dirty_inode(nepos.bh, inode);
}
brelse(epos->bh);
------------------------------------------------------------------------
Steven J. Magnani "I claim this network for MARS!
http://www.digidescorp.com Earthling, return my space modulator!"
#include <standard.disclaimer>
Hi,
On Sat 30-03-19 14:49:46, Steve Magnani wrote:
> On 3/25/19 11:42 AM, Jan Kara wrote:
> > Hi!
> >
> > On Sat 23-03-19 15:14:05, Steve Magnani wrote:
> > > I have been hunting a UDF bug that occasionally results in generation
> > > of an Allocation Extent Descriptor with an incorrect tagLocation. So
> > > far I haven't been able to see a path through the code that could
> > > cause that. But, I noticed some inconsistency in locking during
> > > AED generation and wonder if it could result in random corruption.
> > >
> > > The function udf_update_inode() has this general pattern:
> > >
> > > bh = udf_tgetblk(...); // calls sb_getblk()
> > > lock_buffer(bh);
> > > memset(bh->b_data, 0, inode->i_sb->s_blocksize);
> > > // <snip>other code to populate FE/EFE data in the block</snip>
> > > set_buffer_uptodate(bh);
> > > unlock_buffer(bh);
> > > mark_buffer_dirty(bh);
> > >
> > > This I can understand - the lock is held for as long as the buffer
> > > contents are being assembled.
> > >
> > > In contrast, udf_setup_indirect_aext(), which constructs an AED,
> > > has this sequence:
> > >
> > > bh = udf_tgetblk(...); // calls sb_getblk()
> > > lock_buffer(bh);
> > > memset(bh->b_data, 0, inode->i_sb->s_blocksize);
> > >
> > > set_buffer_uptodate(bh);
> > > unlock_buffer(bh);
> > > mark_buffer_dirty_inode(bh);
> > >
> > > // <snip>other code to populate AED data in the block</snip>
> > >
> > > In this case the population of the block occurs without
> > > the protection of the lock.
> > >
> > > Because the block has been marked dirty, does this mean that
> > > writeback could occur at any point during population?
> > Yes. Thanks for noticing this!
> >
> > > There is one path through udf_setup_indirect_aext() where
> > > mark_buffer_dirty_inode() gets called again after population is
> > > complete, which I suppose could heal a partial writeout, but there is
> > > also another path in which the buffer does not get marked dirty again.
> > Generally, we add new extents to the created indirect extent which dirties
> > the buffer and that should fix the problem. But you are definitely right
> > that the code is suspicious and should be fixed. Will you send a patch?
>
> I did a little archaeology to see how the code evolved to this point. It's
> been like this a long time.
>
> I also did some research to understand why filesystems use lock_buffer()
> sometimes but not others. For example, the FAT driver never calls it. I ran
> across this thread from 2011:
>
> https://lkml.org/lkml/2011/5/16/402
>
> ...from which I conclude that while it is correct in a strict sense to hold
> a lock on a buffer any time its contents are being modified, performance
> considerations make it preferable (or at least reasonable) to make some
> modifications without a lock provided it's known that a subsequent write-out
> will "fix" any potential partial write out before anyone else tries to read
> the block.
Understood, but neither UDF nor FAT is really that performance critical.
If you are looking for performance, you'd certainly pick a different
filesystem. UDF is mainly for data interchange, so it should work
reasonably for copy-in / copy-out style workloads; the rest isn't that
important. So there, correctness and simplicity are preferred over
performance.
> I doubt that UDF sees common use with DIF/DIX block devices,
> which might make a decision in favor of performance a little easier. Since
> the FAT driver doesn't contain Darrick's proposed changes I assume a
> decision was made that performance was more important there.
>
> Certainly the call to udf_setup_indirect_aext() from udf_add_aext() meets
> those criteria. But udf_table_free_blocks() may not dirty the AED block.
>
> So if this looks reasonable I will resend as a formal patch:
>
> --- a/fs/udf/inode.c 2019-03-30 11:28:38.637759458 -0500
> +++ b/fs/udf/inode.c 2019-03-30 11:33:00.357761250 -0500
> @@ -1873,9 +1873,6 @@ int udf_setup_indirect_aext(struct inode
> return -EIO;
> lock_buffer(bh);
> memset(bh->b_data, 0x00, sb->s_blocksize);
> - set_buffer_uptodate(bh);
> - unlock_buffer(bh);
> - mark_buffer_dirty_inode(bh, inode);
> aed = (struct allocExtDesc *)(bh->b_data);
> if (!UDF_QUERY_FLAG(sb, UDF_FLAG_STRICT)) {
> @@ -1890,6 +1887,9 @@ int udf_setup_indirect_aext(struct inode
> udf_new_tag(bh->b_data, TAG_IDENT_AED, ver, 1, block,
> sizeof(struct tag));
> + set_buffer_uptodate(bh);
> + unlock_buffer(bh);
> +
> nepos.block = neloc;
> nepos.offset = sizeof(struct allocExtDesc);
> nepos.bh = bh;
> @@ -1913,6 +1913,8 @@ int udf_setup_indirect_aext(struct inode
> } else {
> __udf_add_aext(inode, epos, &nepos.block,
> sb->s_blocksize | EXT_NEXT_EXTENT_ALLOCDECS, 0);
> + /* Make sure completed AED gets written out */
> + mark_buffer_dirty_inode(nepos.bh, inode);
Why do you mark the buffer as dirty only here? I'd just mark it dirty right
after unlocking. If __udf_add_aext() or udf_write_aext() modifies the
buffer, it will mark it as dirty as well... Thanks!
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR