There is one case can cause data corruption.
- write 4k to fileA
- fsync fileA, 4k data is writebacked to lbaA
- write 4k to fileA
- kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
- write 4k to fileB
- kworker flush 4k to lbaA due to SSR
- SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
data
One solution is tracking all fsynced file's block history, and disallow
SSR overwrite on newly invalidated block on that file.
However, during recovery, no matter the dnode is flushed or fsynced, all
previous dnodes until last fsynced one in node chain can be recovered,
that means we need to record all block change in flushed dnode, which
will cause heavy cost, so let's just use simple fix by forbidding SSR
overwrite directly.
Signed-off-by: Chao Yu <[email protected]>
---
fs/f2fs/segment.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9d9d9a050d59..69b3b553ee6b 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2205,9 +2205,11 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
if (!f2fs_test_and_set_bit(offset, se->discard_map))
sbi->discard_blks--;
- /* don't overwrite by SSR to keep node chain */
- if (IS_NODESEG(se->type) &&
- !is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
+ /*
+ * SSR should never reuse block which is checkpointed
+ * or newly invalidated.
+ */
+ if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
se->ckpt_valid_blocks++;
}
--
2.18.0.rc1
On 2019/8/16 11:03, Chao Yu wrote:
> There is one case can cause data corruption.
>
> - write 4k to fileA
> - fsync fileA, 4k data is writebacked to lbaA
> - write 4k to fileA
> - kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
> - write 4k to fileB
> - kworker flush 4k to lbaA due to SSR
> - SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
> data
>
> One solution is tracking all fsynced file's block history, and disallow
> SSR overwrite on newly invalidated block on that file.
>
> However, during recovery, no matter the dnode is flushed or fsynced, all
> previous dnodes until last fsynced one in node chain can be recovered,
> that means we need to record all block change in flushed dnode, which
> will cause heavy cost, so let's just use simple fix by forbidding SSR
> overwrite directly.
>
Jaegeuk,
Please help to add below missed tag to keep this patch being merged in stable
kernel.
Fixes: 5b6c6be2d878 ("f2fs: use SSR for warm node as well")
Thanks,
> Signed-off-by: Chao Yu <[email protected]>
> ---
> fs/f2fs/segment.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 9d9d9a050d59..69b3b553ee6b 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2205,9 +2205,11 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
> if (!f2fs_test_and_set_bit(offset, se->discard_map))
> sbi->discard_blks--;
>
> - /* don't overwrite by SSR to keep node chain */
> - if (IS_NODESEG(se->type) &&
> - !is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> + /*
> + * SSR should never reuse block which is checkpointed
> + * or newly invalidated.
> + */
> + if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
> se->ckpt_valid_blocks++;
> }
>
On 08/19, Chao Yu wrote:
> On 2019/8/16 11:03, Chao Yu wrote:
> > There is one case can cause data corruption.
> >
> > - write 4k to fileA
> > - fsync fileA, 4k data is writebacked to lbaA
> > - write 4k to fileA
> > - kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
> > - write 4k to fileB
> > - kworker flush 4k to lbaA due to SSR
> > - SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
> > data
> >
> > One solution is tracking all fsynced file's block history, and disallow
> > SSR overwrite on newly invalidated block on that file.
> >
> > However, during recovery, no matter the dnode is flushed or fsynced, all
> > previous dnodes until last fsynced one in node chain can be recovered,
> > that means we need to record all block change in flushed dnode, which
> > will cause heavy cost, so let's just use simple fix by forbidding SSR
> > overwrite directly.
> >
>
> Jaegeuk,
>
> Please help to add below missed tag to keep this patch being merged in stable
> kernel.
>
> Fixes: 5b6c6be2d878 ("f2fs: use SSR for warm node as well")
Done.
>
> Thanks,
>
> > Signed-off-by: Chao Yu <[email protected]>
> > ---
> > fs/f2fs/segment.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 9d9d9a050d59..69b3b553ee6b 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -2205,9 +2205,11 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
> > if (!f2fs_test_and_set_bit(offset, se->discard_map))
> > sbi->discard_blks--;
> >
> > - /* don't overwrite by SSR to keep node chain */
> > - if (IS_NODESEG(se->type) &&
> > - !is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> > + /*
> > + * SSR should never reuse block which is checkpointed
> > + * or newly invalidated.
> > + */
> > + if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> > if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
> > se->ckpt_valid_blocks++;
> > }
> >
On 2019/8/20 9:00, Jaegeuk Kim wrote:
> On 08/19, Chao Yu wrote:
>> On 2019/8/16 11:03, Chao Yu wrote:
>>> There is one case can cause data corruption.
>>>
>>> - write 4k to fileA
>>> - fsync fileA, 4k data is writebacked to lbaA
>>> - write 4k to fileA
>>> - kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
>>> - write 4k to fileB
>>> - kworker flush 4k to lbaA due to SSR
>>> - SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
>>> data
>>>
>>> One solution is tracking all fsynced file's block history, and disallow
>>> SSR overwrite on newly invalidated block on that file.
>>>
>>> However, during recovery, no matter the dnode is flushed or fsynced, all
>>> previous dnodes until last fsynced one in node chain can be recovered,
>>> that means we need to record all block change in flushed dnode, which
>>> will cause heavy cost, so let's just use simple fix by forbidding SSR
>>> overwrite directly.
>>>
>>
>> Jaegeuk,
>>
>> Please help to add below missed tag to keep this patch being merged in stable
>> kernel.
>>
>> Fixes: 5b6c6be2d878 ("f2fs: use SSR for warm node as well")
>
> Done.
Thanks! :)
Thanks,
On Fri, Aug 16, 2019 at 11:03:34AM +0800, Chao Yu wrote:
> There is one case can cause data corruption.
>
> - write 4k to fileA
> - fsync fileA, 4k data is writebacked to lbaA
> - write 4k to fileA
> - kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
> - write 4k to fileB
> - kworker flush 4k to lbaA due to SSR
> - SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
> data
>
> One solution is tracking all fsynced file's block history, and disallow
> SSR overwrite on newly invalidated block on that file.
>
> However, during recovery, no matter the dnode is flushed or fsynced, all
> previous dnodes until last fsynced one in node chain can be recovered,
> that means we need to record all block change in flushed dnode, which
> will cause heavy cost, so let's just use simple fix by forbidding SSR
> overwrite directly.
>
> Signed-off-by: Chao Yu <[email protected]>
> ---
> fs/f2fs/segment.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 9d9d9a050d59..69b3b553ee6b 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2205,9 +2205,11 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
> if (!f2fs_test_and_set_bit(offset, se->discard_map))
> sbi->discard_blks--;
>
> - /* don't overwrite by SSR to keep node chain */
> - if (IS_NODESEG(se->type) &&
> - !is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> + /*
> + * SSR should never reuse block which is checkpointed
> + * or newly invalidated.
> + */
> + if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
> se->ckpt_valid_blocks++;
> }
> --
FYI, this commit caused xfstests generic/064 to start failing:
$ kvm-xfstests -c f2fs generic/064
...
generic/064 3s ... [13:36:37][ 5.946293] run fstests generic/064 at 2019-09-26 13:36:37
[13:36:41]- output mismatch (see /results/f2fs/results-default/generic/064.out.bad)
--- tests/generic/064.out 2019-09-18 04:53:46.000000000 -0700
+++ /results/f2fs/results-default/generic/064.out.bad 2019-09-26 13:36:41.533018683 -0700
@@ -1,2 +1,3 @@
QA output created by 064
Extent count after inserts is in range
+extents mismatched before = 1 after = 50
...
(Run 'diff -u /root/xfstests/tests/generic/064.out /results/f2fs/results-default/generic/064.out.bad' to see the entire diff)
Ran: generic/064
Failures: generic/064
Failed 1 of 1 tests
On 09/26, Eric Biggers wrote:
> On Fri, Aug 16, 2019 at 11:03:34AM +0800, Chao Yu wrote:
> > There is one case can cause data corruption.
> >
> > - write 4k to fileA
> > - fsync fileA, 4k data is writebacked to lbaA
> > - write 4k to fileA
> > - kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
> > - write 4k to fileB
> > - kworker flush 4k to lbaA due to SSR
> > - SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
> > data
> >
> > One solution is tracking all fsynced file's block history, and disallow
> > SSR overwrite on newly invalidated block on that file.
> >
> > However, during recovery, no matter the dnode is flushed or fsynced, all
> > previous dnodes until last fsynced one in node chain can be recovered,
> > that means we need to record all block change in flushed dnode, which
> > will cause heavy cost, so let's just use simple fix by forbidding SSR
> > overwrite directly.
> >
> > Signed-off-by: Chao Yu <[email protected]>
> > ---
> > fs/f2fs/segment.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 9d9d9a050d59..69b3b553ee6b 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -2205,9 +2205,11 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
> > if (!f2fs_test_and_set_bit(offset, se->discard_map))
> > sbi->discard_blks--;
> >
> > - /* don't overwrite by SSR to keep node chain */
> > - if (IS_NODESEG(se->type) &&
> > - !is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> > + /*
> > + * SSR should never reuse block which is checkpointed
> > + * or newly invalidated.
> > + */
> > + if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> > if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
> > se->ckpt_valid_blocks++;
> > }
> > --
>
> FYI, this commit caused xfstests generic/064 to start failing:
Yup, I was looking at this.
>
> $ kvm-xfstests -c f2fs generic/064
> ...
> generic/064 3s ... [13:36:37][ 5.946293] run fstests generic/064 at 2019-09-26 13:36:37
> [13:36:41]- output mismatch (see /results/f2fs/results-default/generic/064.out.bad)
> --- tests/generic/064.out 2019-09-18 04:53:46.000000000 -0700
> +++ /results/f2fs/results-default/generic/064.out.bad 2019-09-26 13:36:41.533018683 -0700
> @@ -1,2 +1,3 @@
> QA output created by 064
> Extent count after inserts is in range
> +extents mismatched before = 1 after = 50
> ...
> (Run 'diff -u /root/xfstests/tests/generic/064.out /results/f2fs/results-default/generic/064.out.bad' to see the entire diff)
> Ran: generic/064
> Failures: generic/064
> Failed 1 of 1 tests
On 09/26, Jaegeuk Kim wrote:
> On 09/26, Eric Biggers wrote:
> > On Fri, Aug 16, 2019 at 11:03:34AM +0800, Chao Yu wrote:
> > > There is one case can cause data corruption.
> > >
> > > - write 4k to fileA
> > > - fsync fileA, 4k data is writebacked to lbaA
> > > - write 4k to fileA
> > > - kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
> > > - write 4k to fileB
> > > - kworker flush 4k to lbaA due to SSR
> > > - SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
> > > data
> > >
> > > One solution is tracking all fsynced file's block history, and disallow
> > > SSR overwrite on newly invalidated block on that file.
> > >
> > > However, during recovery, no matter the dnode is flushed or fsynced, all
> > > previous dnodes until last fsynced one in node chain can be recovered,
> > > that means we need to record all block change in flushed dnode, which
> > > will cause heavy cost, so let's just use simple fix by forbidding SSR
> > > overwrite directly.
> > >
> > > Signed-off-by: Chao Yu <[email protected]>
> > > ---
> > > fs/f2fs/segment.c | 8 +++++---
> > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > index 9d9d9a050d59..69b3b553ee6b 100644
> > > --- a/fs/f2fs/segment.c
> > > +++ b/fs/f2fs/segment.c
> > > @@ -2205,9 +2205,11 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
> > > if (!f2fs_test_and_set_bit(offset, se->discard_map))
> > > sbi->discard_blks--;
> > >
> > > - /* don't overwrite by SSR to keep node chain */
> > > - if (IS_NODESEG(se->type) &&
> > > - !is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> > > + /*
> > > + * SSR should never reuse block which is checkpointed
> > > + * or newly invalidated.
> > > + */
> > > + if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
> > > if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
> > > se->ckpt_valid_blocks++;
> > > }
> > > --
> >
> > FYI, this commit caused xfstests generic/064 to start failing:
>
> Yup, I was looking at this.
It seems fcollapse couldn't allocate blocks sequential when rewriting blocks.
We need to adjust like this:
---
tests/generic/064 | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tests/generic/064 b/tests/generic/064
index 1ace14b6..058258d5 100755
--- a/tests/generic/064
+++ b/tests/generic/064
@@ -72,7 +72,9 @@ done
extent_after=`_count_extents $dest`
if [ $extent_before -ne $extent_after ]; then
- echo "extents mismatched before = $extent_before after = $extent_after"
+ if [ "$FSTYP" != "f2fs" ] || [ $extent_before -ne 1 ] || [ $extent_after -ne 50 ]; then
+ echo "extents mismatched before = $extent_before after = $extent_after"
+ fi
fi
# compare original file and test file.
--
2.19.0.605.g01d371f741-goog