LinuxLists.cc - [PATCH] f2fs: retry to update the inode page given EIO

2023-01-06 01:15:33

by Jaegeuk Kim

[permalink] [raw]

Subject: [PATCH] f2fs: retry to update the inode page given EIO

2023-01-11 01:48:10

by Jaegeuk Kim

[permalink] [raw]

Subject: Re: [PATCH v2] f2fs: retry to update the inode page given EIO

2023-01-11 13:06:45

by Chao Yu

[permalink] [raw]

Subject: Re: [f2fs-dev] [PATCH v2] f2fs: retry to update the inode page given EIO

On 2023/1/11 9:20, Jaegeuk Kim wrote:
> In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with
> f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered.
> As a result, we don't need to stop checkpoint right away given single EIO.

f2fs_handle_page_eio() only covers the case that EIO occurs on the same
page, should we cover the case EIO occurs on different pages?

Thanks,

>
> Cc: [email protected]
> Signed-off-by: Randall Huang <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
>
> Change log from v1:
> - fix a bug
>
> fs/f2fs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index ff6cf66ed46b..2ed7a621fdf1 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode)
> if (IS_ERR(node_page)) {
> int err = PTR_ERR(node_page);
>
> - if (err == -ENOMEM) {
> + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) {
> cond_resched();
> goto retry;
> } else if (err != -ENOENT) {

2023-01-11 19:17:22

by Jaegeuk Kim

[permalink] [raw]

Subject: Re: [f2fs-dev] [PATCH v2] f2fs: retry to update the inode page given EIO

On 01/11, Chao Yu wrote:
> On 2023/1/11 9:20, Jaegeuk Kim wrote:
> > In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with
> > f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered.
> > As a result, we don't need to stop checkpoint right away given single EIO.
>
> f2fs_handle_page_eio() only covers the case that EIO occurs on the same
> page, should we cover the case EIO occurs on different pages?

Which case are you looking at?

>
> Thanks,
>
> >
> > Cc: [email protected]
> > Signed-off-by: Randall Huang <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> >
> > Change log from v1:
> > - fix a bug
> >
> > fs/f2fs/inode.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> > index ff6cf66ed46b..2ed7a621fdf1 100644
> > --- a/fs/f2fs/inode.c
> > +++ b/fs/f2fs/inode.c
> > @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode)
> > if (IS_ERR(node_page)) {
> > int err = PTR_ERR(node_page);
> > - if (err == -ENOMEM) {
> > + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) {
> > cond_resched();
> > goto retry;
> > } else if (err != -ENOENT) {

2023-01-11 20:25:50

by Jaegeuk Kim

[permalink] [raw]

Subject: Re: [f2fs-dev] [PATCH v3] f2fs: retry to update the inode page given EIO

In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with
f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered.
As a result, we don't need to stop checkpoint right away given single EIO.

Cc: [email protected]
Signed-off-by: Randall Huang <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---

Change log from v2:
- set EIO to cover the case of data corruption given by buggy UFS driver

fs/f2fs/inode.c | 2 +-
fs/f2fs/node.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index ff6cf66ed46b..2ed7a621fdf1 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode)
if (IS_ERR(node_page)) {
int err = PTR_ERR(node_page);

- if (err == -ENOMEM) {
+ if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) {
cond_resched();
goto retry;
} else if (err != -ENOENT) {
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 558b318f7316..1629dc300c61 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1455,7 +1455,7 @@ static struct page *__get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid,
ofs_of_node(page), cpver_of_node(page),
next_blkaddr_of_node(page));
set_sbi_flag(sbi, SBI_NEED_FSCK);
- err = -EINVAL;
+ err = -EIO;
out_err:
ClearPageUptodate(page);
out_put_err:
--
2.39.0.314.g84b9a713c41-goog

2023-01-12 10:27:17

by Chao Yu

[permalink] [raw]

Subject: Re: [f2fs-dev] [PATCH v2] f2fs: retry to update the inode page given EIO

On 2023/1/12 2:50, Jaegeuk Kim wrote:
> On 01/11, Chao Yu wrote:
>> On 2023/1/11 9:20, Jaegeuk Kim wrote:
>>> In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with
>>> f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered.
>>> As a result, we don't need to stop checkpoint right away given single EIO.
>>
>> f2fs_handle_page_eio() only covers the case that EIO occurs on the same
>> page, should we cover the case EIO occurs on different pages?
>
> Which case are you looking at?

- __get_node_page(PageA) - __get_node_page(PageB)
- f2fs_handle_page_eio
- sbi->page_eio_ofs[type] = PageA->index
- f2fs_handle_page_eio
- sbi->page_eio_ofs[type] = PageB->index

In such race case, it may has low probability to set CP_ERROR_FLAG as we expect?

Thanks,

>
>>
>> Thanks,
>>
>>>
>>> Cc: [email protected]
>>> Signed-off-by: Randall Huang <[email protected]>
>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>> ---
>>>
>>> Change log from v1:
>>> - fix a bug
>>>
>>> fs/f2fs/inode.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>>> index ff6cf66ed46b..2ed7a621fdf1 100644
>>> --- a/fs/f2fs/inode.c
>>> +++ b/fs/f2fs/inode.c
>>> @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode)
>>> if (IS_ERR(node_page)) {
>>> int err = PTR_ERR(node_page);
>>> - if (err == -ENOMEM) {
>>> + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) {
>>> cond_resched();
>>> goto retry;
>>> } else if (err != -ENOENT) {

2023-01-13 00:24:39

by Jaegeuk Kim

[permalink] [raw]

Subject: Re: [f2fs-dev] [PATCH v2] f2fs: retry to update the inode page given EIO

On 01/12, Chao Yu wrote:
> On 2023/1/12 2:50, Jaegeuk Kim wrote:
> > On 01/11, Chao Yu wrote:
> > > On 2023/1/11 9:20, Jaegeuk Kim wrote:
> > > > In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with
> > > > f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered.
> > > > As a result, we don't need to stop checkpoint right away given single EIO.
> > >
> > > f2fs_handle_page_eio() only covers the case that EIO occurs on the same
> > > page, should we cover the case EIO occurs on different pages?
> >
> > Which case are you looking at?
>
> - __get_node_page(PageA) - __get_node_page(PageB)
> - f2fs_handle_page_eio
> - sbi->page_eio_ofs[type] = PageA->index
> - f2fs_handle_page_eio
> - sbi->page_eio_ofs[type] = PageB->index
>
> In such race case, it may has low probability to set CP_ERROR_FLAG as we expect?

Do you see that case in products?
I'm trying to avoid setting CP_ERROR_FLAG here.

>
> Thanks,
>
> >
> > >
> > > Thanks,
> > >
> > > >
> > > > Cc: [email protected]
> > > > Signed-off-by: Randall Huang <[email protected]>
> > > > Signed-off-by: Jaegeuk Kim <[email protected]>
> > > > ---
> > > >
> > > > Change log from v1:
> > > > - fix a bug
> > > >
> > > > fs/f2fs/inode.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> > > > index ff6cf66ed46b..2ed7a621fdf1 100644
> > > > --- a/fs/f2fs/inode.c
> > > > +++ b/fs/f2fs/inode.c
> > > > @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode)
> > > > if (IS_ERR(node_page)) {
> > > > int err = PTR_ERR(node_page);
> > > > - if (err == -ENOMEM) {
> > > > + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) {
> > > > cond_resched();
> > > > goto retry;
> > > > } else if (err != -ENOENT) {

2023-01-28 03:11:53

by Chao Yu

[permalink] [raw]

Subject: Re: [f2fs-dev] [PATCH v2] f2fs: retry to update the inode page given EIO

On 2023/1/13 8:01, Jaegeuk Kim wrote:
> On 01/12, Chao Yu wrote:
>> On 2023/1/12 2:50, Jaegeuk Kim wrote:
>>> On 01/11, Chao Yu wrote:
>>>> On 2023/1/11 9:20, Jaegeuk Kim wrote:
>>>>> In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with
>>>>> f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered.
>>>>> As a result, we don't need to stop checkpoint right away given single EIO.
>>>>
>>>> f2fs_handle_page_eio() only covers the case that EIO occurs on the same
>>>> page, should we cover the case EIO occurs on different pages?
>>>
>>> Which case are you looking at?
>>
>> - __get_node_page(PageA) - __get_node_page(PageB)
>> - f2fs_handle_page_eio
>> - sbi->page_eio_ofs[type] = PageA->index
>> - f2fs_handle_page_eio
>> - sbi->page_eio_ofs[type] = PageB->index
>>
>> In such race case, it may has low probability to set CP_ERROR_FLAG as we expect?
>
> Do you see that case in products?

Not yet.

> I'm trying to avoid setting CP_ERROR_FLAG here.

Copied, how about using the same logic for node page as meta page, like
we did in f2fs_get_meta_page_retry()?

Thanks,

>
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>> Cc: [email protected]
>>>>> Signed-off-by: Randall Huang <[email protected]>
>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>> ---
>>>>>
>>>>> Change log from v1:
>>>>> - fix a bug
>>>>>
>>>>> fs/f2fs/inode.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>>>>> index ff6cf66ed46b..2ed7a621fdf1 100644
>>>>> --- a/fs/f2fs/inode.c
>>>>> +++ b/fs/f2fs/inode.c
>>>>> @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode)
>>>>> if (IS_ERR(node_page)) {
>>>>> int err = PTR_ERR(node_page);
>>>>> - if (err == -ENOMEM) {
>>>>> + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) {
>>>>> cond_resched();
>>>>> goto retry;
>>>>> } else if (err != -ENOENT) {

2023-01-30 23:30:33

by Jaegeuk Kim

[permalink] [raw]

Subject: Re: [f2fs-dev] [PATCH v3] f2fs: retry to update the inode page given EIO

If the storage gives a corrupted node block due to short power failure and
reset, f2fs stops the entire operations by setting the checkpoint failure flag.

Let's give more chances to live by re-issuing IOs for a while in such critical
path.

Cc: [email protected]
Suggested-by: Randall Huang <[email protected]>
Suggested-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---

Change log from v2:
- adopting the retry logic

fs/f2fs/inode.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index e1d6e021e82b..7bf660d4cad9 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -724,18 +724,19 @@ void f2fs_update_inode_page(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct page *node_page;
+ int count = 0;
retry:
node_page = f2fs_get_node_page(sbi, inode->i_ino);
if (IS_ERR(node_page)) {
int err = PTR_ERR(node_page);

- if (err == -ENOMEM) {
- cond_resched();
+ /* The node block was truncated. */
+ if (err == -ENOENT)
+ return;
+
+ if (err == -ENOMEM || ++count <= DEFAULT_RETRY_IO_COUNT)
goto retry;
- } else if (err != -ENOENT) {
- f2fs_stop_checkpoint(sbi, false,
- STOP_CP_REASON_UPDATE_INODE);
- }
+ f2fs_stop_checkpoint(sbi, false, STOP_CP_REASON_UPDATE_INODE);
return;
}
f2fs_update_inode(inode, node_page);
--
2.39.1.456.gfc5497dd1b-goog

2023-01-30 23:31:08

by Jaegeuk Kim

[permalink] [raw]

Subject: Re: [f2fs-dev] [PATCH v2] f2fs: retry to update the inode page given EIO

On 01/28, Chao Yu wrote:
> On 2023/1/13 8:01, Jaegeuk Kim wrote:
> > On 01/12, Chao Yu wrote:
> > > On 2023/1/12 2:50, Jaegeuk Kim wrote:
> > > > On 01/11, Chao Yu wrote:
> > > > > On 2023/1/11 9:20, Jaegeuk Kim wrote:
> > > > > > In f2fs_update_inode_page, f2fs_get_node_page handles EIO along with
> > > > > > f2fs_handle_page_eio that stops checkpoint, if the disk couldn't be recovered.
> > > > > > As a result, we don't need to stop checkpoint right away given single EIO.
> > > > >
> > > > > f2fs_handle_page_eio() only covers the case that EIO occurs on the same
> > > > > page, should we cover the case EIO occurs on different pages?
> > > >
> > > > Which case are you looking at?
> > >
> > > - __get_node_page(PageA) - __get_node_page(PageB)
> > > - f2fs_handle_page_eio
> > > - sbi->page_eio_ofs[type] = PageA->index
> > > - f2fs_handle_page_eio
> > > - sbi->page_eio_ofs[type] = PageB->index
> > >
> > > In such race case, it may has low probability to set CP_ERROR_FLAG as we expect?
> >
> > Do you see that case in products?
>
> Not yet.
>
> > I'm trying to avoid setting CP_ERROR_FLAG here.
>
> Copied, how about using the same logic for node page as meta page, like
> we did in f2fs_get_meta_page_retry()?

Yeah.. let me send v3 and test a bit. Thanks~ :)

>
> Thanks,
>
> >
> > >
> > > Thanks,
> > >
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > >
> > > > > > Cc: [email protected]
> > > > > > Signed-off-by: Randall Huang <[email protected]>
> > > > > > Signed-off-by: Jaegeuk Kim <[email protected]>
> > > > > > ---
> > > > > >
> > > > > > Change log from v1:
> > > > > > - fix a bug
> > > > > >
> > > > > > fs/f2fs/inode.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> > > > > > index ff6cf66ed46b..2ed7a621fdf1 100644
> > > > > > --- a/fs/f2fs/inode.c
> > > > > > +++ b/fs/f2fs/inode.c
> > > > > > @@ -719,7 +719,7 @@ void f2fs_update_inode_page(struct inode *inode)
> > > > > > if (IS_ERR(node_page)) {
> > > > > > int err = PTR_ERR(node_page);
> > > > > > - if (err == -ENOMEM) {
> > > > > > + if (err == -ENOMEM || (err == -EIO && !f2fs_cp_error(sbi))) {
> > > > > > cond_resched();
> > > > > > goto retry;
> > > > > > } else if (err != -ENOENT) {

2023-01-31 03:40:44

by Chao Yu

[permalink] [raw]

Subject: Re: [f2fs-dev] [PATCH v3] f2fs: retry to update the inode page given EIO

On 2023/1/31 7:30, Jaegeuk Kim wrote:
> If the storage gives a corrupted node block due to short power failure and
> reset, f2fs stops the entire operations by setting the checkpoint failure flag.
>
> Let's give more chances to live by re-issuing IOs for a while in such critical
> path.
>
> Cc: [email protected]
> Suggested-by: Randall Huang <[email protected]>
> Suggested-by: Chao Yu <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>

Reviewed-by: Chao Yu <[email protected]>

Thanks,