If sbi->ckpt->next_free_nid is not NAT block aligned and there are free
nids in that NAT block between the start of the block and next_free_nid,
then those free nids will not be scanned in scan_nat_page(). This results
in a mismatch between nm_i->available_nids and the sum of
nm_i->free_nid_count of all NAT blocks scanned, so nm_i->available_nids
will always be greater than the sum of free nids in all the blocks.
Under this condition, if we use all the currently scanned free nids,
f2fs_alloc_nid() will loop forever, because nm_i->available_nids is still
not zero but nm_i->free_nid_count of that partially scanned NAT block is
zero.

Fix this by aligning nm_i->next_scan_nid to the first nid of the
corresponding NAT block.

Signed-off-by: Sahitya Tummala <[email protected]>
---
fs/f2fs/node.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 9bbaa26..d615e59 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -2402,6 +2402,8 @@ static int __f2fs_build_free_nids(struct f2fs_sb_info *sbi,
 			if (IS_ERR(page)) {
 				ret = PTR_ERR(page);
 			} else {
+				if (nid % NAT_ENTRY_PER_BLOCK)
+					nid = NAT_BLOCK_OFFSET(nid) * NAT_ENTRY_PER_BLOCK;
 				ret = scan_nat_page(sbi, page, nid);
 				f2fs_put_page(page, 1);
 			}
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
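
To make the arithmetic in the patch concrete, here is a minimal user-space sketch, not the kernel code itself. It assumes 455 NAT entries per block (the usual value for 4 KiB blocks); nat_block_start() and count_scanned_free() are hypothetical helpers, the latter a simplified stand-in for what scan_nat_page() sees when it starts partway into a block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 455 NAT entries per block (4 KiB block layout). */
#define NAT_ENTRY_PER_BLOCK	455u
#define NAT_BLOCK_OFFSET(nid)	((nid) / NAT_ENTRY_PER_BLOCK)

typedef uint32_t nid_t;

/* Round nid down to the first nid of the NAT block that contains it. */
static nid_t nat_block_start(nid_t nid)
{
	return NAT_BLOCK_OFFSET(nid) * NAT_ENTRY_PER_BLOCK;
}

/*
 * Hypothetical stand-in for scan_nat_page(): count the free entries that
 * are visited when one NAT block is scanned starting at start_nid.
 * is_free[] holds one flag per entry of that block.
 */
static unsigned int count_scanned_free(const unsigned char *is_free,
				       nid_t start_nid)
{
	unsigned int i = start_nid % NAT_ENTRY_PER_BLOCK;
	unsigned int n = 0;

	assert(is_free);
	/* Entries before the in-block offset of start_nid are never visited. */
	for (; i < NAT_ENTRY_PER_BLOCK; i++)
		n += is_free[i] ? 1 : 0;
	return n;
}
```

With free entries at in-block offsets 3 and 100, a scan starting at nid 955 (offset 45) counts only one of them, while a scan starting at nat_block_start(955), which is 910, counts both. That missed entry is the kind of gap between the scanned free nids and nm_i->available_nids described above.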
On 2020/8/14 16:05, Sahitya Tummala wrote:
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 9bbaa26..d615e59 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -2402,6 +2402,8 @@ static int __f2fs_build_free_nids(struct f2fs_sb_info *sbi,
> if (IS_ERR(page)) {
> ret = PTR_ERR(page);
> } else {
> + if (nid % NAT_ENTRY_PER_BLOCK)
> + nid = NAT_BLOCK_OFFSET(nid) * NAT_ENTRY_PER_BLOCK;
How about moving this logic to the beginning of __f2fs_build_free_nids(),
after the nid reset?

BTW, it looks like we can add unlikely() to this condition?

Thanks,
> ret = scan_nat_page(sbi, page, nid);
> f2fs_put_page(page, 1);
> }
>
On Tue, Aug 18, 2020 at 04:29:05PM +0800, Chao Yu wrote:
> On 2020/8/14 16:05, Sahitya Tummala wrote:
> >diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> >index 9bbaa26..d615e59 100644
> >--- a/fs/f2fs/node.c
> >+++ b/fs/f2fs/node.c
> >@@ -2402,6 +2402,8 @@ static int __f2fs_build_free_nids(struct f2fs_sb_info *sbi,
> > if (IS_ERR(page)) {
> > ret = PTR_ERR(page);
> > } else {
> >+ if (nid % NAT_ENTRY_PER_BLOCK)
> >+ nid = NAT_BLOCK_OFFSET(nid) * NAT_ENTRY_PER_BLOCK;
>
> How about moving this logic to the beginning of __f2fs_build_free_nids(),
> after nid reset?
>
Sure, I will move it.
> BTW, it looks we can add unlikely in this judgment condition?
But it may not be unlikely, as it can happen whenever a checkpoint is done,
based on the next available free nid in next_free_nid(), which can happen
quite a few times, right?

Hitting the infinite-loop condition because of this may be rare and difficult
to reproduce, but in my opinion the check itself may not be unlikely.

Thanks,
>
> Thanks,
>
> > ret = scan_nat_page(sbi, page, nid);
> > f2fs_put_page(page, 1);
> > }
> >
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
On 2020/8/18 17:55, Sahitya Tummala wrote:
> On Tue, Aug 18, 2020 at 04:29:05PM +0800, Chao Yu wrote:
>> On 2020/8/14 16:05, Sahitya Tummala wrote:
>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>> index 9bbaa26..d615e59 100644
>>> --- a/fs/f2fs/node.c
>>> +++ b/fs/f2fs/node.c
>>> @@ -2402,6 +2402,8 @@ static int __f2fs_build_free_nids(struct f2fs_sb_info *sbi,
>>> if (IS_ERR(page)) {
>>> ret = PTR_ERR(page);
>>> } else {
>>> + if (nid % NAT_ENTRY_PER_BLOCK)
>>> + nid = NAT_BLOCK_OFFSET(nid) * NAT_ENTRY_PER_BLOCK;
>>
>> How about moving this logic to the beginning of __f2fs_build_free_nids(),
>> after nid reset?
>>
>
> Sure, I will move it.
>
>> BTW, it looks we can add unlikely in this judgment condition?
>
> But it may not be an unlikely as it can happen whenever checkpoint is done,
> based on the next available free nid in function next_free_nid(), which can happen
> quite a few times, right?
Oh, yes, I missed that place, please ignore that suggestion.. :)
Thanks,
>
> Hitting the loop forever issue condition due to this could be a rare/difficult to
> reproduce but this check itself may not be unlikely in my opinion.
>
> Thanks,
>
>>
>> Thanks,
>>
>>> ret = scan_nat_page(sbi, page, nid);
>>> f2fs_put_page(page, 1);
>>> }
>>>
>
On Tue, Aug 18, 2020 at 03:25:47PM +0530, Sahitya Tummala wrote:
> On Tue, Aug 18, 2020 at 04:29:05PM +0800, Chao Yu wrote:
> > On 2020/8/14 16:05, Sahitya Tummala wrote:
> > >diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > >index 9bbaa26..d615e59 100644
> > >--- a/fs/f2fs/node.c
> > >+++ b/fs/f2fs/node.c
> > >@@ -2402,6 +2402,8 @@ static int __f2fs_build_free_nids(struct f2fs_sb_info *sbi,
> > > if (IS_ERR(page)) {
> > > ret = PTR_ERR(page);
> > > } else {
> > >+ if (nid % NAT_ENTRY_PER_BLOCK)
> > >+ nid = NAT_BLOCK_OFFSET(nid) * NAT_ENTRY_PER_BLOCK;
> >
> > How about moving this logic to the beginning of __f2fs_build_free_nids(),
> > after nid reset?
> >
>
> Sure, I will move it.
>
> > BTW, it looks we can add unlikely in this judgment condition?
>
> But it may not be an unlikely as it can happen whenever checkpoint is done,
> based on the next available free nid in function next_free_nid(), which can happen
> quite a few times, right?
>
> Hitting the loop forever issue condition due to this could be a rare/difficult to
> reproduce but this check itself may not be unlikely in my opinion.
>
Sorry, I was wrong above. During CP we update only ckpt->next_free_nid, not
nm_i->next_scan_nid, which is set only once during boot up.
So yes, I will mark the condition as unlikely.

Thanks,
> Thanks,
>
> >
> > Thanks,
> >
> > > ret = scan_nat_page(sbi, page, nid);
> > > f2fs_put_page(page, 1);
> > > }
> > >
>
--
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
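
For reference, the outcome of the discussion above (do the alignment once at the start of __f2fs_build_free_nids(), after the nid reset, and mark it unlikely) could look roughly like the user-space sketch below. This is an illustration under assumptions, not the follow-up patch: scan_start_nid() is a hypothetical helper, the "nid reset" is assumed to mean wrapping to 0 once the scan position reaches max_nid, unlikely() is defined locally via the GCC/Clang __builtin_expect builtin, and 455 entries per NAT block is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's unlikely() hint (GCC/Clang builtin). */
#define unlikely(x)		__builtin_expect(!!(x), 0)

/* Assumption: 455 NAT entries per block (4 KiB block layout). */
#define NAT_ENTRY_PER_BLOCK	455u
#define NAT_BLOCK_OFFSET(nid)	((nid) / NAT_ENTRY_PER_BLOCK)

typedef uint32_t nid_t;

/* Hypothetical helper: compute the nid at which a free-nid scan begins. */
static nid_t scan_start_nid(nid_t next_scan_nid, nid_t max_nid)
{
	nid_t nid = next_scan_nid;

	assert(max_nid > 0);

	/* Assumed form of the nid reset: wrap around past the last nid. */
	if (unlikely(nid >= max_nid))
		nid = 0;

	/* Start from the first nid of the NAT block so no entry is skipped. */
	if (unlikely(nid % NAT_ENTRY_PER_BLOCK))
		nid = NAT_BLOCK_OFFSET(nid) * NAT_ENTRY_PER_BLOCK;

	return nid;
}
```

Doing the alignment once before the scan loop, rather than inside it, keeps the per-block path unchanged; the hint only affects branch layout and does not change the result.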