2023-06-13 21:00:57

by Jaegeuk Kim

[permalink] [raw]
Subject: [PATCH] f2fs: do not issue small discard commands during checkpoint

If there're huge # of small discards, this will increase checkpoint latency
insanely. Let's issue small discards only by trim.

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/segment.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 0c0c033c4bdd..ef46bb085385 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2178,7 +2178,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
}
mutex_unlock(&dirty_i->seglist_lock);

- if (!f2fs_block_unit_discard(sbi))
+ if (!f2fs_block_unit_discard(sbi) || !force)
goto wakeup;

/* send small discards */
@@ -2192,8 +2192,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
sbi->blocks_per_seg, cur_pos);
len = next_pos - cur_pos;

- if (f2fs_sb_has_blkzoned(sbi) ||
- (force && len < cpc->trim_minlen))
+ if (f2fs_sb_has_blkzoned(sbi) || len < cpc->trim_minlen)
goto skip;

f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
--
2.41.0.162.gfafddb0af9-goog



2023-06-14 06:17:44

by Daejun Park

[permalink] [raw]
Subject: RE: [f2fs-dev] [PATCH] f2fs: do not issue small discard commands during checkpoint

Hi Jaegeuk,

> If there're huge # of small discards, this will increase checkpoint latency
> insanely. Let's issue small discards only by trim.
>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/segment.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 0c0c033c4bdd..ef46bb085385 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2178,7 +2178,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>          }
>          mutex_unlock(&dirty_i->seglist_lock);
>
> -        if (!f2fs_block_unit_discard(sbi))
> +        if (!f2fs_block_unit_discard(sbi) || !force)

If we don't handle the discard entries here, dcc->entry_list will still have them,
so stale discard entries may be handled by trim, causing incorrect discards to be issued.
Therefore, I think this patch should also prevent the creation of discard entries
in add_discard_addrs(), unless it is a checkpoint caused by trim.
This would further reduce the latency caused by the creation of a discard entry.

Thanks,
Daejun


2023-06-14 16:37:07

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH] f2fs: do not issue small discard commands during checkpoint

Hi Daejun,

On 06/14, Daejun Park wrote:
> Hi Jaegeuk,
>
> > If there're huge # of small discards, this will increase checkpoint latency
> > insanely. Let's issue small discards only by trim.
> >
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > fs/f2fs/segment.c | 5 ++---
> > 1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 0c0c033c4bdd..ef46bb085385 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -2178,7 +2178,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > ? ? ? ? ?}
> > ? ? ? ? ?mutex_unlock(&dirty_i->seglist_lock);
> >
> > - ? ? ? ?if (!f2fs_block_unit_discard(sbi))
> > + ? ? ? ?if (!f2fs_block_unit_discard(sbi) || !force)
>
> If we don't handle the discard entries here, dcc->entry_list will still have them,
> so stale discard entries may be handled by trim, causing incorrect discards to be issued.
> Therefore, I think this patch should also prevent the creation of discard entries
> in add_discard_addrs(), unless it is a checkpoint caused by trim.
> This would further reduce the latency caused by the creation of a discard entry.

I found this causes some objects were not reclaimed when removing the module.
Hence I'm testing v2.

>
> Thanks,
> Daejun
>

2023-06-14 17:13:22

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH v2] f2fs: do not issue small discard commands during checkpoint

If there're huge # of small discards, this will increase checkpoint latency
insanely. Let's issue small discards only by trim.

Signed-off-by: Jaegeuk Kim <[email protected]>
---

Change log from v1:
- move the skip logic to avoid dangling objects

fs/f2fs/segment.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 8c7af8b4fc47..0457d620011f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
len = next_pos - cur_pos;

if (f2fs_sb_has_blkzoned(sbi) ||
- (force && len < cpc->trim_minlen))
+ !force || len < cpc->trim_minlen)
goto skip;

f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
--
2.41.0.162.gfafddb0af9-goog


2023-07-03 02:54:02

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 2023/6/15 0:10, Jaegeuk Kim wrote:
> If there're huge # of small discards, this will increase checkpoint latency
> insanely. Let's issue small discards only by trim.
>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
>
> Change log from v1:
> - move the skip logic to avoid dangling objects
>
> fs/f2fs/segment.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 8c7af8b4fc47..0457d620011f 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> len = next_pos - cur_pos;
>
> if (f2fs_sb_has_blkzoned(sbi) ||
> - (force && len < cpc->trim_minlen))
> + !force || len < cpc->trim_minlen)
> goto skip;

Sorry for late reply.

We have a configuration for such case, what do you think of setting
max_small_discards to zero? otherwise, w/ above change, max_small_discards
logic may be broken?

What: /sys/fs/f2fs/<disk>/max_small_discards
Date: November 2013
Contact: "Jaegeuk Kim" <[email protected]>
Description: Controls the issue rate of discard commands that consist of small
blocks less than 2MB. The candidates to be discarded are cached until
checkpoint is triggered, and issued during the checkpoint.
By default, it is disabled with 0.

Or, if we prefer to disable small_discards by default, what about below change:

From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
From: Chao Yu <[email protected]>
Date: Mon, 3 Jul 2023 09:06:53 +0800
Subject: [PATCH] Revert "f2fs: enable small discard by default"

This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
to disable small discard by default, so that if there're huge number of
small discards, it will decrease checkpoint's latency obviously.

Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
commands during checkpoint"), due to it breaks small discard feature which
may be configured via sysfs entry max_small_discards.

Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
Signed-off-by: Chao Yu <[email protected]>
---
fs/f2fs/segment.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 14c822e5c9c9..0a313368f18b 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
len = next_pos - cur_pos;

if (f2fs_sb_has_blkzoned(sbi) ||
- !force || len < cpc->trim_minlen)
+ (force && len < cpc->trim_minlen))
goto skip;

f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
@@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
atomic_set(&dcc->queued_discard, 0);
atomic_set(&dcc->discard_cmd_cnt, 0);
dcc->nr_discards = 0;
- dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
+ dcc->max_discards = 0;
dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
--
2.40.1



>
> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-04 11:03:20

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 07/03, Chao Yu wrote:
> On 2023/6/15 0:10, Jaegeuk Kim wrote:
> > If there're huge # of small discards, this will increase checkpoint latency
> > insanely. Let's issue small discards only by trim.
> >
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> >
> > Change log from v1:
> > - move the skip logic to avoid dangling objects
> >
> > fs/f2fs/segment.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 8c7af8b4fc47..0457d620011f 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > len = next_pos - cur_pos;
> > if (f2fs_sb_has_blkzoned(sbi) ||
> > - (force && len < cpc->trim_minlen))
> > + !force || len < cpc->trim_minlen)
> > goto skip;
>
> Sorry for late reply.
>
> We have a configuration for such case, what do you think of setting
> max_small_discards to zero? otherwise, w/ above change, max_small_discards
> logic may be broken?
>
> What: /sys/fs/f2fs/<disk>/max_small_discards
> Date: November 2013
> Contact: "Jaegeuk Kim" <[email protected]>
> Description: Controls the issue rate of discard commands that consist of small
> blocks less than 2MB. The candidates to be discarded are cached until
> checkpoint is triggered, and issued during the checkpoint.
> By default, it is disabled with 0.
>
> Or, if we prefer to disable small_discards by default, what about below change:

I think small_discards is fine, but need to avoid long checkpoint latency only.

>
> From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
> From: Chao Yu <[email protected]>
> Date: Mon, 3 Jul 2023 09:06:53 +0800
> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>
> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
> to disable small discard by default, so that if there're huge number of
> small discards, it will decrease checkpoint's latency obviously.
>
> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
> commands during checkpoint"), due to it breaks small discard feature which
> may be configured via sysfs entry max_small_discards.
>
> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
> Signed-off-by: Chao Yu <[email protected]>
> ---
> fs/f2fs/segment.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 14c822e5c9c9..0a313368f18b 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> len = next_pos - cur_pos;
>
> if (f2fs_sb_has_blkzoned(sbi) ||
> - !force || len < cpc->trim_minlen)
> + (force && len < cpc->trim_minlen))
> goto skip;
>
> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
> atomic_set(&dcc->queued_discard, 0);
> atomic_set(&dcc->discard_cmd_cnt, 0);
> dcc->nr_discards = 0;
> - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
> + dcc->max_discards = 0;
> dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
> dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
> dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
> --
> 2.40.1
>
>
>
> > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-04 15:09:05

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 2023/7/4 18:53, Jaegeuk Kim wrote:
> On 07/03, Chao Yu wrote:
>> On 2023/6/15 0:10, Jaegeuk Kim wrote:
>>> If there're huge # of small discards, this will increase checkpoint latency
>>> insanely. Let's issue small discards only by trim.
>>>
>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>> ---
>>>
>>> Change log from v1:
>>> - move the skip logic to avoid dangling objects
>>>
>>> fs/f2fs/segment.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index 8c7af8b4fc47..0457d620011f 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>> len = next_pos - cur_pos;
>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>> - (force && len < cpc->trim_minlen))
>>> + !force || len < cpc->trim_minlen)
>>> goto skip;
>>
>> Sorry for late reply.
>>
>> We have a configuration for such case, what do you think of setting
>> max_small_discards to zero? otherwise, w/ above change, max_small_discards
>> logic may be broken?
>>
>> What: /sys/fs/f2fs/<disk>/max_small_discards
>> Date: November 2013
>> Contact: "Jaegeuk Kim" <[email protected]>
>> Description: Controls the issue rate of discard commands that consist of small
>> blocks less than 2MB. The candidates to be discarded are cached until
>> checkpoint is triggered, and issued during the checkpoint.
>> By default, it is disabled with 0.
>>
>> Or, if we prefer to disable small_discards by default, what about below change:
>
> I think small_discards is fine, but need to avoid long checkpoint latency only.

I didn't get you, do you mean we can still issue small discard by
fstrim, so small_discards functionality is fine?

Thanks,

>
>>
>> From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
>> From: Chao Yu <[email protected]>
>> Date: Mon, 3 Jul 2023 09:06:53 +0800
>> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>>
>> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
>> to disable small discard by default, so that if there're huge number of
>> small discards, it will decrease checkpoint's latency obviously.
>>
>> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
>> commands during checkpoint"), due to it breaks small discard feature which
>> may be configured via sysfs entry max_small_discards.
>>
>> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
>> Signed-off-by: Chao Yu <[email protected]>
>> ---
>> fs/f2fs/segment.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index 14c822e5c9c9..0a313368f18b 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>> len = next_pos - cur_pos;
>>
>> if (f2fs_sb_has_blkzoned(sbi) ||
>> - !force || len < cpc->trim_minlen)
>> + (force && len < cpc->trim_minlen))
>> goto skip;
>>
>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
>> atomic_set(&dcc->queued_discard, 0);
>> atomic_set(&dcc->discard_cmd_cnt, 0);
>> dcc->nr_discards = 0;
>> - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
>> + dcc->max_discards = 0;
>> dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
>> dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
>> dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
>> --
>> 2.40.1
>>
>>
>>
>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-05 17:46:28

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 07/04, Chao Yu wrote:
> On 2023/7/4 18:53, Jaegeuk Kim wrote:
> > On 07/03, Chao Yu wrote:
> > > On 2023/6/15 0:10, Jaegeuk Kim wrote:
> > > > If there're huge # of small discards, this will increase checkpoint latency
> > > > insanely. Let's issue small discards only by trim.
> > > >
> > > > Signed-off-by: Jaegeuk Kim <[email protected]>
> > > > ---
> > > >
> > > > Change log from v1:
> > > > - move the skip logic to avoid dangling objects
> > > >
> > > > fs/f2fs/segment.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > index 8c7af8b4fc47..0457d620011f 100644
> > > > --- a/fs/f2fs/segment.c
> > > > +++ b/fs/f2fs/segment.c
> > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > len = next_pos - cur_pos;
> > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > - (force && len < cpc->trim_minlen))
> > > > + !force || len < cpc->trim_minlen)
> > > > goto skip;
> > >
> > > Sorry for late reply.
> > >
> > > We have a configuration for such case, what do you think of setting
> > > max_small_discards to zero? otherwise, w/ above change, max_small_discards
> > > logic may be broken?
> > >
> > > What: /sys/fs/f2fs/<disk>/max_small_discards
> > > Date: November 2013
> > > Contact: "Jaegeuk Kim" <[email protected]>
> > > Description: Controls the issue rate of discard commands that consist of small
> > > blocks less than 2MB. The candidates to be discarded are cached until
> > > checkpoint is triggered, and issued during the checkpoint.
> > > By default, it is disabled with 0.
> > >
> > > Or, if we prefer to disable small_discards by default, what about below change:
> >
> > I think small_discards is fine, but need to avoid long checkpoint latency only.
>
> I didn't get you, do you mean we can still issue small discard by
> fstrim, so small_discards functionality is fine?

You got the point.

>
> Thanks,
>
> >
> > >
> > > From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
> > > From: Chao Yu <[email protected]>
> > > Date: Mon, 3 Jul 2023 09:06:53 +0800
> > > Subject: [PATCH] Revert "f2fs: enable small discard by default"
> > >
> > > This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
> > > to disable small discard by default, so that if there're huge number of
> > > small discards, it will decrease checkpoint's latency obviously.
> > >
> > > Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
> > > commands during checkpoint"), due to it breaks small discard feature which
> > > may be configured via sysfs entry max_small_discards.
> > >
> > > Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
> > > Signed-off-by: Chao Yu <[email protected]>
> > > ---
> > > fs/f2fs/segment.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > index 14c822e5c9c9..0a313368f18b 100644
> > > --- a/fs/f2fs/segment.c
> > > +++ b/fs/f2fs/segment.c
> > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > len = next_pos - cur_pos;
> > >
> > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > - !force || len < cpc->trim_minlen)
> > > + (force && len < cpc->trim_minlen))
> > > goto skip;
> > >
> > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
> > > @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
> > > atomic_set(&dcc->queued_discard, 0);
> > > atomic_set(&dcc->discard_cmd_cnt, 0);
> > > dcc->nr_discards = 0;
> > > - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
> > > + dcc->max_discards = 0;
> > > dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
> > > dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
> > > dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
> > > --
> > > 2.40.1
> > >
> > >
> > >
> > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-06 01:08:58

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 2023/7/6 1:30, Jaegeuk Kim wrote:
> On 07/04, Chao Yu wrote:
>> On 2023/7/4 18:53, Jaegeuk Kim wrote:
>>> On 07/03, Chao Yu wrote:
>>>> On 2023/6/15 0:10, Jaegeuk Kim wrote:
>>>>> If there're huge # of small discards, this will increase checkpoint latency
>>>>> insanely. Let's issue small discards only by trim.
>>>>>
>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>> ---
>>>>>
>>>>> Change log from v1:
>>>>> - move the skip logic to avoid dangling objects
>>>>>
>>>>> fs/f2fs/segment.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>> index 8c7af8b4fc47..0457d620011f 100644
>>>>> --- a/fs/f2fs/segment.c
>>>>> +++ b/fs/f2fs/segment.c
>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>> len = next_pos - cur_pos;
>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>> - (force && len < cpc->trim_minlen))
>>>>> + !force || len < cpc->trim_minlen)
>>>>> goto skip;
>>>>
>>>> Sorry for late reply.
>>>>
>>>> We have a configuration for such case, what do you think of setting
>>>> max_small_discards to zero? otherwise, w/ above change, max_small_discards
>>>> logic may be broken?
>>>>
>>>> What: /sys/fs/f2fs/<disk>/max_small_discards
>>>> Date: November 2013
>>>> Contact: "Jaegeuk Kim" <[email protected]>
>>>> Description: Controls the issue rate of discard commands that consist of small
>>>> blocks less than 2MB. The candidates to be discarded are cached until
>>>> checkpoint is triggered, and issued during the checkpoint.
>>>> By default, it is disabled with 0.
>>>>
>>>> Or, if we prefer to disable small_discards by default, what about below change:
>>>
>>> I think small_discards is fine, but need to avoid long checkpoint latency only.
>>
>> I didn't get you, do you mean we can still issue small discard by
>> fstrim, so small_discards functionality is fine?
>
> You got the point.

Well, actually, what I mean is max_small_discards sysfs entry's functionality
is broken. Now, the entry can not be used to control number of small discards
committed by checkpoint.

I think there is another way to achieve "avoid long checkpoint latency caused
by committing huge # of small discards", the way is we can set max_small_discards
to small value or zero, w/ such configuration, it will take checkpoint much less
time or no time to committing small discard due to below control logic:

f2fs_flush_sit_entries()
{
...
if (!(cpc->reason & CP_DISCARD)) {
cpc->trim_start = segno;
add_discard_addrs(sbi, cpc, false);
}
...
}

add_discard_addrs()
{
...
while (force || SM_I(sbi)->dcc_info->nr_discards <=
SM_I(sbi)->dcc_info->max_discards) {

It will break the loop once nr_discards is larger than max_discards, if
max_discards is set to zero, checkpoint won't take time to handle small discards.

...
if (!de) {
de = f2fs_kmem_cache_alloc(discard_entry_slab,
GFP_F2FS_ZERO, true, NULL);
de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
list_add_tail(&de->list, head);
}
...
}
...

Thanks,

>
>>
>> Thanks,
>>
>>>
>>>>
>>>> From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
>>>> From: Chao Yu <[email protected]>
>>>> Date: Mon, 3 Jul 2023 09:06:53 +0800
>>>> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>>>>
>>>> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
>>>> to disable small discard by default, so that if there're huge number of
>>>> small discards, it will decrease checkpoint's latency obviously.
>>>>
>>>> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
>>>> commands during checkpoint"), due to it breaks small discard feature which
>>>> may be configured via sysfs entry max_small_discards.
>>>>
>>>> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
>>>> Signed-off-by: Chao Yu <[email protected]>
>>>> ---
>>>> fs/f2fs/segment.c | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>> index 14c822e5c9c9..0a313368f18b 100644
>>>> --- a/fs/f2fs/segment.c
>>>> +++ b/fs/f2fs/segment.c
>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>> len = next_pos - cur_pos;
>>>>
>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>> - !force || len < cpc->trim_minlen)
>>>> + (force && len < cpc->trim_minlen))
>>>> goto skip;
>>>>
>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>>>> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
>>>> atomic_set(&dcc->queued_discard, 0);
>>>> atomic_set(&dcc->discard_cmd_cnt, 0);
>>>> dcc->nr_discards = 0;
>>>> - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
>>>> + dcc->max_discards = 0;
>>>> dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
>>>> dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
>>>> dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
>>>> --
>>>> 2.40.1
>>>>
>>>>
>>>>
>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-11 08:16:25

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 2023/7/6 8:46, Chao Yu wrote:
> On 2023/7/6 1:30, Jaegeuk Kim wrote:
>> On 07/04, Chao Yu wrote:
>>> On 2023/7/4 18:53, Jaegeuk Kim wrote:
>>>> On 07/03, Chao Yu wrote:
>>>>> On 2023/6/15 0:10, Jaegeuk Kim wrote:
>>>>>> If there're huge # of small discards, this will increase checkpoint latency
>>>>>> insanely. Let's issue small discards only by trim.
>>>>>>
>>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>>> ---
>>>>>>
>>>>>>     Change log from v1:
>>>>>>      - move the skip logic to avoid dangling objects
>>>>>>
>>>>>>     fs/f2fs/segment.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>> index 8c7af8b4fc47..0457d620011f 100644
>>>>>> --- a/fs/f2fs/segment.c
>>>>>> +++ b/fs/f2fs/segment.c
>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>                 len = next_pos - cur_pos;
>>>>>>                 if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>> -                (force && len < cpc->trim_minlen))
>>>>>> +                    !force || len < cpc->trim_minlen)
>>>>>>                     goto skip;
>>>>>
>>>>> Sorry for late reply.
>>>>>
>>>>> We have a configuration for such case, what do you think of setting
>>>>> max_small_discards to zero? otherwise, w/ above change, max_small_discards
>>>>> logic may be broken?
>>>>>
>>>>> What:           /sys/fs/f2fs/<disk>/max_small_discards
>>>>> Date:           November 2013
>>>>> Contact:        "Jaegeuk Kim" <[email protected]>
>>>>> Description:    Controls the issue rate of discard commands that consist of small
>>>>>                   blocks less than 2MB. The candidates to be discarded are cached until
>>>>>                   checkpoint is triggered, and issued during the checkpoint.
>>>>>                   By default, it is disabled with 0.
>>>>>
>>>>> Or, if we prefer to disable small_discards by default, what about below change:
>>>>
>>>> I think small_discards is fine, but need to avoid long checkpoint latency only.
>>>
>>> I didn't get you, do you mean we can still issue small discard by
>>> fstrim, so small_discards functionality is fine?
>>
>> You got the point.
>
> Well, actually, what I mean is max_small_discards sysfs entry's functionality
> is broken. Now, the entry can not be used to control number of small discards
> committed by checkpoint.
>
> I think there is another way to achieve "avoid long checkpoint latency caused
> by committing huge # of small discards", the way is we can set max_small_discards
> to small value or zero, w/ such configuration, it will take checkpoint much less
> time or no time to committing small discard due to below control logic:

Jaegeuk, any comments?

Thanks,

>
> f2fs_flush_sit_entries()
> {
> ...
>             if (!(cpc->reason & CP_DISCARD)) {
>                 cpc->trim_start = segno;
>                 add_discard_addrs(sbi, cpc, false);
>             }
> ...
> }
>
> add_discard_addrs()
> {
> ...
>     while (force || SM_I(sbi)->dcc_info->nr_discards <=
>                 SM_I(sbi)->dcc_info->max_discards) {
>
> It will break the loop once nr_discards is larger than max_discards, if
> max_discards is set to zero, checkpoint won't take time to handle small discards.
>
> ...
>         if (!de) {
>             de = f2fs_kmem_cache_alloc(discard_entry_slab,
>                         GFP_F2FS_ZERO, true, NULL);
>             de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
>             list_add_tail(&de->list, head);
>         }
> ...
>     }
> ...
>
> Thanks,
>
>>
>>>
>>> Thanks,
>>>
>>>>
>>>>>
>>>>>   From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
>>>>> From: Chao Yu <[email protected]>
>>>>> Date: Mon, 3 Jul 2023 09:06:53 +0800
>>>>> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>>>>>
>>>>> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
>>>>> to disable small discard by default, so that if there're huge number of
>>>>> small discards, it will decrease checkpoint's latency obviously.
>>>>>
>>>>> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
>>>>> commands during checkpoint"), due to it breaks small discard feature which
>>>>> may be configured via sysfs entry max_small_discards.
>>>>>
>>>>> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
>>>>> Signed-off-by: Chao Yu <[email protected]>
>>>>> ---
>>>>>    fs/f2fs/segment.c | 4 ++--
>>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>> index 14c822e5c9c9..0a313368f18b 100644
>>>>> --- a/fs/f2fs/segment.c
>>>>> +++ b/fs/f2fs/segment.c
>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>                len = next_pos - cur_pos;
>>>>>
>>>>>                if (f2fs_sb_has_blkzoned(sbi) ||
>>>>> -                    !force || len < cpc->trim_minlen)
>>>>> +                (force && len < cpc->trim_minlen))
>>>>>                    goto skip;
>>>>>
>>>>>                f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>>>>> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
>>>>>        atomic_set(&dcc->queued_discard, 0);
>>>>>        atomic_set(&dcc->discard_cmd_cnt, 0);
>>>>>        dcc->nr_discards = 0;
>>>>> -    dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
>>>>> +    dcc->max_discards = 0;
>>>>>        dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
>>>>>        dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
>>>>>        dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
>>>>> --
>>>>> 2.40.1
>>>>>
>>>>>
>>>>>
>>>>>>                 f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>
>
> _______________________________________________
> Linux-f2fs-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

2023-07-11 17:02:58

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 07/06, Chao Yu wrote:
> On 2023/7/6 1:30, Jaegeuk Kim wrote:
> > On 07/04, Chao Yu wrote:
> > > On 2023/7/4 18:53, Jaegeuk Kim wrote:
> > > > On 07/03, Chao Yu wrote:
> > > > > On 2023/6/15 0:10, Jaegeuk Kim wrote:
> > > > > > If there're huge # of small discards, this will increase checkpoint latency
> > > > > > insanely. Let's issue small discards only by trim.
> > > > > >
> > > > > > Signed-off-by: Jaegeuk Kim <[email protected]>
> > > > > > ---
> > > > > >
> > > > > > Change log from v1:
> > > > > > - move the skip logic to avoid dangling objects
> > > > > >
> > > > > > fs/f2fs/segment.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > > index 8c7af8b4fc47..0457d620011f 100644
> > > > > > --- a/fs/f2fs/segment.c
> > > > > > +++ b/fs/f2fs/segment.c
> > > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > > len = next_pos - cur_pos;
> > > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > > - (force && len < cpc->trim_minlen))
> > > > > > + !force || len < cpc->trim_minlen)
> > > > > > goto skip;
> > > > >
> > > > > Sorry for late reply.
> > > > >
> > > > > We have a configuration for such case, what do you think of setting
> > > > > max_small_discards to zero? otherwise, w/ above change, max_small_discards
> > > > > logic may be broken?
> > > > >
> > > > > What: /sys/fs/f2fs/<disk>/max_small_discards
> > > > > Date: November 2013
> > > > > Contact: "Jaegeuk Kim" <[email protected]>
> > > > > Description: Controls the issue rate of discard commands that consist of small
> > > > > blocks less than 2MB. The candidates to be discarded are cached until
> > > > > checkpoint is triggered, and issued during the checkpoint.
> > > > > By default, it is disabled with 0.
> > > > >
> > > > > Or, if we prefer to disable small_discards by default, what about below change:
> > > >
> > > > I think small_discards is fine, but need to avoid long checkpoint latency only.
> > >
> > > I didn't get you, do you mean we can still issue small discard by
> > > fstrim, so small_discards functionality is fine?
> >
> > You got the point.
>
> Well, actually, what I mean is max_small_discards sysfs entry's functionality
> is broken. Now, the entry can not be used to control number of small discards
> committed by checkpoint.

Could you descrbie this problem first?

>
> I think there is another way to achieve "avoid long checkpoint latency caused
> by committing huge # of small discards", the way is we can set max_small_discards
> to small value or zero, w/ such configuration, it will take checkpoint much less
> time or no time to committing small discard due to below control logic:
>
> f2fs_flush_sit_entries()
> {
> ...
> if (!(cpc->reason & CP_DISCARD)) {
> cpc->trim_start = segno;
> add_discard_addrs(sbi, cpc, false);
> }
> ...
> }
>
> add_discard_addrs()
> {
> ...
> while (force || SM_I(sbi)->dcc_info->nr_discards <=
> SM_I(sbi)->dcc_info->max_discards) {
>
> It will break the loop once nr_discards is larger than max_discards, if
> max_discards is set to zero, checkpoint won't take time to handle small discards.
>
> ...
> if (!de) {
> de = f2fs_kmem_cache_alloc(discard_entry_slab,
> GFP_F2FS_ZERO, true, NULL);
> de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
> list_add_tail(&de->list, head);
> }
> ...
> }
> ...
>
> Thanks,
>
> >
> > >
> > > Thanks,
> > >
> > > >
> > > > >
> > > > > From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
> > > > > From: Chao Yu <[email protected]>
> > > > > Date: Mon, 3 Jul 2023 09:06:53 +0800
> > > > > Subject: [PATCH] Revert "f2fs: enable small discard by default"
> > > > >
> > > > > This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
> > > > > to disable small discard by default, so that if there're huge number of
> > > > > small discards, it will decrease checkpoint's latency obviously.
> > > > >
> > > > > Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
> > > > > commands during checkpoint"), due to it breaks small discard feature which
> > > > > may be configured via sysfs entry max_small_discards.
> > > > >
> > > > > Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
> > > > > Signed-off-by: Chao Yu <[email protected]>
> > > > > ---
> > > > > fs/f2fs/segment.c | 4 ++--
> > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > index 14c822e5c9c9..0a313368f18b 100644
> > > > > --- a/fs/f2fs/segment.c
> > > > > +++ b/fs/f2fs/segment.c
> > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > len = next_pos - cur_pos;
> > > > >
> > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > - !force || len < cpc->trim_minlen)
> > > > > + (force && len < cpc->trim_minlen))
> > > > > goto skip;
> > > > >
> > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
> > > > > @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
> > > > > atomic_set(&dcc->queued_discard, 0);
> > > > > atomic_set(&dcc->discard_cmd_cnt, 0);
> > > > > dcc->nr_discards = 0;
> > > > > - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
> > > > > + dcc->max_discards = 0;
> > > > > dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
> > > > > dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
> > > > > dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
> > > > > --
> > > > > 2.40.1
> > > > >
> > > > >
> > > > >
> > > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-12 02:04:31

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 2023/7/12 0:37, Jaegeuk Kim wrote:
> On 07/06, Chao Yu wrote:
>> On 2023/7/6 1:30, Jaegeuk Kim wrote:
>>> On 07/04, Chao Yu wrote:
>>>> On 2023/7/4 18:53, Jaegeuk Kim wrote:
>>>>> On 07/03, Chao Yu wrote:
>>>>>> On 2023/6/15 0:10, Jaegeuk Kim wrote:
>>>>>>> If there're huge # of small discards, this will increase checkpoint latency
>>>>>>> insanely. Let's issue small discards only by trim.
>>>>>>>
>>>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>>>> ---
>>>>>>>
>>>>>>> Change log from v1:
>>>>>>> - move the skip logic to avoid dangling objects
>>>>>>>
>>>>>>> fs/f2fs/segment.c | 2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>> index 8c7af8b4fc47..0457d620011f 100644
>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>> len = next_pos - cur_pos;
>>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>> - (force && len < cpc->trim_minlen))
>>>>>>> + !force || len < cpc->trim_minlen)
>>>>>>> goto skip;
>>>>>>
>>>>>> Sorry for late reply.
>>>>>>
>>>>>> We have a configuration for such case, what do you think of setting
>>>>>> max_small_discards to zero? otherwise, w/ above change, max_small_discards
>>>>>> logic may be broken?
>>>>>>
>>>>>> What: /sys/fs/f2fs/<disk>/max_small_discards
>>>>>> Date: November 2013
>>>>>> Contact: "Jaegeuk Kim" <[email protected]>
>>>>>> Description: Controls the issue rate of discard commands that consist of small
>>>>>> blocks less than 2MB. The candidates to be discarded are cached until
>>>>>> checkpoint is triggered, and issued during the checkpoint.
>>>>>> By default, it is disabled with 0.
>>>>>>
>>>>>> Or, if we prefer to disable small_discards by default, what about below change:
>>>>>
>>>>> I think small_discards is fine, but need to avoid long checkpoint latency only.
>>>>
>>>> I didn't get you, do you mean we can still issue small discard by
>>>> fstrim, so small_discards functionality is fine?
>>>
>>> You got the point.
>>
>> Well, actually, what I mean is max_small_discards sysfs entry's functionality
>> is broken. Now, the entry can not be used to control number of small discards
>> committed by checkpoint.
>
> Could you descrbie this problem first?

Oh, alright, actually, I've described this problem literally, but maybe it's not
clear, let me give some examples as below:

echo 0 > /sys/fs/f2fs/vdb/max_small_discards
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2

echo 100 > /sys/fs/f2fs/vdb/max_small_discards
rm /mnt/f2fs/file
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2

Before the patch:

Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .

Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 3 1 . . . . . .

After the patch:
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .

Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .

So, now max_small_discards can not be used to control small discard number
cached by checkpoint.

Thanks,

>
>>
>> I think there is another way to achieve "avoid long checkpoint latency caused
>> by committing huge # of small discards", the way is we can set max_small_discards
>> to small value or zero, w/ such configuration, it will take checkpoint much less
>> time or no time to committing small discard due to below control logic:
>>
>> f2fs_flush_sit_entries()
>> {
>> ...
>> if (!(cpc->reason & CP_DISCARD)) {
>> cpc->trim_start = segno;
>> add_discard_addrs(sbi, cpc, false);
>> }
>> ...
>> }
>>
>> add_discard_addrs()
>> {
>> ...
>> while (force || SM_I(sbi)->dcc_info->nr_discards <=
>> SM_I(sbi)->dcc_info->max_discards) {
>>
>> It will break the loop once nr_discards is larger than max_discards, if
>> max_discards is set to zero, checkpoint won't take time to handle small discards.
>>
>> ...
>> if (!de) {
>> de = f2fs_kmem_cache_alloc(discard_entry_slab,
>> GFP_F2FS_ZERO, true, NULL);
>> de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
>> list_add_tail(&de->list, head);
>> }
>> ...
>> }
>> ...
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
>>>>>> From: Chao Yu <[email protected]>
>>>>>> Date: Mon, 3 Jul 2023 09:06:53 +0800
>>>>>> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>>>>>>
>>>>>> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
>>>>>> to disable small discard by default, so that if there're huge number of
>>>>>> small discards, it will decrease checkpoint's latency obviously.
>>>>>>
>>>>>> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
>>>>>> commands during checkpoint"), due to it breaks small discard feature which
>>>>>> may be configured via sysfs entry max_small_discards.
>>>>>>
>>>>>> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
>>>>>> Signed-off-by: Chao Yu <[email protected]>
>>>>>> ---
>>>>>> fs/f2fs/segment.c | 4 ++--
>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>> index 14c822e5c9c9..0a313368f18b 100644
>>>>>> --- a/fs/f2fs/segment.c
>>>>>> +++ b/fs/f2fs/segment.c
>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>> len = next_pos - cur_pos;
>>>>>>
>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>> - !force || len < cpc->trim_minlen)
>>>>>> + (force && len < cpc->trim_minlen))
>>>>>> goto skip;
>>>>>>
>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>>>>>> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
>>>>>> atomic_set(&dcc->queued_discard, 0);
>>>>>> atomic_set(&dcc->discard_cmd_cnt, 0);
>>>>>> dcc->nr_discards = 0;
>>>>>> - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
>>>>>> + dcc->max_discards = 0;
>>>>>> dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
>>>>>> dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
>>>>>> dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
>>>>>> --
>>>>>> 2.40.1
>>>>>>
>>>>>>
>>>>>>
>>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-12 16:22:24

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 07/12, Chao Yu wrote:
> On 2023/7/12 0:37, Jaegeuk Kim wrote:
> > On 07/06, Chao Yu wrote:
> > > On 2023/7/6 1:30, Jaegeuk Kim wrote:
> > > > On 07/04, Chao Yu wrote:
> > > > > On 2023/7/4 18:53, Jaegeuk Kim wrote:
> > > > > > On 07/03, Chao Yu wrote:
> > > > > > > On 2023/6/15 0:10, Jaegeuk Kim wrote:
> > > > > > > > If there're huge # of small discards, this will increase checkpoint latency
> > > > > > > > insanely. Let's issue small discards only by trim.
> > > > > > > >
> > > > > > > > Signed-off-by: Jaegeuk Kim <[email protected]>
> > > > > > > > ---
> > > > > > > >
> > > > > > > > Change log from v1:
> > > > > > > > - move the skip logic to avoid dangling objects
> > > > > > > >
> > > > > > > > fs/f2fs/segment.c | 2 +-
> > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > > > > index 8c7af8b4fc47..0457d620011f 100644
> > > > > > > > --- a/fs/f2fs/segment.c
> > > > > > > > +++ b/fs/f2fs/segment.c
> > > > > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > > > > len = next_pos - cur_pos;
> > > > > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > > > > - (force && len < cpc->trim_minlen))
> > > > > > > > + !force || len < cpc->trim_minlen)
> > > > > > > > goto skip;
> > > > > > >
> > > > > > > Sorry for late reply.
> > > > > > >
> > > > > > > We have a configuration for such case, what do you think of setting
> > > > > > > max_small_discards to zero? otherwise, w/ above change, max_small_discards
> > > > > > > logic may be broken?
> > > > > > >
> > > > > > > What: /sys/fs/f2fs/<disk>/max_small_discards
> > > > > > > Date: November 2013
> > > > > > > Contact: "Jaegeuk Kim" <[email protected]>
> > > > > > > Description: Controls the issue rate of discard commands that consist of small
> > > > > > > blocks less than 2MB. The candidates to be discarded are cached until
> > > > > > > checkpoint is triggered, and issued during the checkpoint.
> > > > > > > By default, it is disabled with 0.
> > > > > > >
> > > > > > > Or, if we prefer to disable small_discards by default, what about below change:
> > > > > >
> > > > > > I think small_discards is fine, but need to avoid long checkpoint latency only.
> > > > >
> > > > > I didn't get you, do you mean we can still issue small discard by
> > > > > fstrim, so small_discards functionality is fine?
> > > >
> > > > You got the point.
> > >
> > > Well, actually, what I mean is max_small_discards sysfs entry's functionality
> > > is broken. Now, the entry can not be used to control number of small discards
> > > committed by checkpoint.
> >
> > Could you descrbie this problem first?
>
> Oh, alright, actually, I've described this problem literally, but maybe it's not
> clear, let me give some examples as below:
>
> echo 0 > /sys/fs/f2fs/vdb/max_small_discards
> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
> sync
> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>
> echo 100 > /sys/fs/f2fs/vdb/max_small_discards
> rm /mnt/f2fs/file
> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
> sync
> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>
> Before the patch:
>
> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> 0 . . . . . . . .
>
> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> 0 3 1 . . . . . .
>
> After the patch:
> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> 0 . . . . . . . .
>
> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> 0 . . . . . . . .
>
> So, now max_small_discards can not be used to control small discard number
> cached by checkpoint.

Since we do not submit small discards anymore during checkpoint. Why not relying
on the discard thread to issue them?

>
> Thanks,
>
> >
> > >
> > > I think there is another way to achieve "avoid long checkpoint latency caused
> > > by committing huge # of small discards", the way is we can set max_small_discards
> > > to small value or zero, w/ such configuration, it will take checkpoint much less
> > > time or no time to committing small discard due to below control logic:
> > >
> > > f2fs_flush_sit_entries()
> > > {
> > > ...
> > > if (!(cpc->reason & CP_DISCARD)) {
> > > cpc->trim_start = segno;
> > > add_discard_addrs(sbi, cpc, false);
> > > }
> > > ...
> > > }
> > >
> > > add_discard_addrs()
> > > {
> > > ...
> > > while (force || SM_I(sbi)->dcc_info->nr_discards <=
> > > SM_I(sbi)->dcc_info->max_discards) {
> > >
> > > It will break the loop once nr_discards is larger than max_discards, if
> > > max_discards is set to zero, checkpoint won't take time to handle small discards.
> > >
> > > ...
> > > if (!de) {
> > > de = f2fs_kmem_cache_alloc(discard_entry_slab,
> > > GFP_F2FS_ZERO, true, NULL);
> > > de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
> > > list_add_tail(&de->list, head);
> > > }
> > > ...
> > > }
> > > ...
> > >
> > > Thanks,
> > >
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > >
> > > > > > >
> > > > > > > From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
> > > > > > > From: Chao Yu <[email protected]>
> > > > > > > Date: Mon, 3 Jul 2023 09:06:53 +0800
> > > > > > > Subject: [PATCH] Revert "f2fs: enable small discard by default"
> > > > > > >
> > > > > > > This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
> > > > > > > to disable small discard by default, so that if there're huge number of
> > > > > > > small discards, it will decrease checkpoint's latency obviously.
> > > > > > >
> > > > > > > Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
> > > > > > > commands during checkpoint"), due to it breaks small discard feature which
> > > > > > > may be configured via sysfs entry max_small_discards.
> > > > > > >
> > > > > > > Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
> > > > > > > Signed-off-by: Chao Yu <[email protected]>
> > > > > > > ---
> > > > > > > fs/f2fs/segment.c | 4 ++--
> > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > > > index 14c822e5c9c9..0a313368f18b 100644
> > > > > > > --- a/fs/f2fs/segment.c
> > > > > > > +++ b/fs/f2fs/segment.c
> > > > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > > > len = next_pos - cur_pos;
> > > > > > >
> > > > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > > > - !force || len < cpc->trim_minlen)
> > > > > > > + (force && len < cpc->trim_minlen))
> > > > > > > goto skip;
> > > > > > >
> > > > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
> > > > > > > @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
> > > > > > > atomic_set(&dcc->queued_discard, 0);
> > > > > > > atomic_set(&dcc->discard_cmd_cnt, 0);
> > > > > > > dcc->nr_discards = 0;
> > > > > > > - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
> > > > > > > + dcc->max_discards = 0;
> > > > > > > dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
> > > > > > > dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
> > > > > > > dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
> > > > > > > --
> > > > > > > 2.40.1
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-13 02:12:44

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 2023/7/12 23:55, Jaegeuk Kim wrote:
> On 07/12, Chao Yu wrote:
>> On 2023/7/12 0:37, Jaegeuk Kim wrote:
>>> On 07/06, Chao Yu wrote:
>>>> On 2023/7/6 1:30, Jaegeuk Kim wrote:
>>>>> On 07/04, Chao Yu wrote:
>>>>>> On 2023/7/4 18:53, Jaegeuk Kim wrote:
>>>>>>> On 07/03, Chao Yu wrote:
>>>>>>>> On 2023/6/15 0:10, Jaegeuk Kim wrote:
>>>>>>>>> If there're huge # of small discards, this will increase checkpoint latency
>>>>>>>>> insanely. Let's issue small discards only by trim.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> Change log from v1:
>>>>>>>>> - move the skip logic to avoid dangling objects
>>>>>>>>>
>>>>>>>>> fs/f2fs/segment.c | 2 +-
>>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>>> index 8c7af8b4fc47..0457d620011f 100644
>>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>>> len = next_pos - cur_pos;
>>>>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>>> - (force && len < cpc->trim_minlen))
>>>>>>>>> + !force || len < cpc->trim_minlen)
>>>>>>>>> goto skip;
>>>>>>>>
>>>>>>>> Sorry for late reply.
>>>>>>>>
>>>>>>>> We have a configuration for such case, what do you think of setting
>>>>>>>> max_small_discards to zero? otherwise, w/ above change, max_small_discards
>>>>>>>> logic may be broken?
>>>>>>>>
>>>>>>>> What: /sys/fs/f2fs/<disk>/max_small_discards
>>>>>>>> Date: November 2013
>>>>>>>> Contact: "Jaegeuk Kim" <[email protected]>
>>>>>>>> Description: Controls the issue rate of discard commands that consist of small
>>>>>>>> blocks less than 2MB. The candidates to be discarded are cached until
>>>>>>>> checkpoint is triggered, and issued during the checkpoint.
>>>>>>>> By default, it is disabled with 0.
>>>>>>>>
>>>>>>>> Or, if we prefer to disable small_discards by default, what about below change:
>>>>>>>
>>>>>>> I think small_discards is fine, but need to avoid long checkpoint latency only.
>>>>>>
>>>>>> I didn't get you, do you mean we can still issue small discard by
>>>>>> fstrim, so small_discards functionality is fine?
>>>>>
>>>>> You got the point.
>>>>
>>>> Well, actually, what I mean is max_small_discards sysfs entry's functionality
>>>> is broken. Now, the entry can not be used to control number of small discards
>>>> committed by checkpoint.
>>>
>>> Could you descrbie this problem first?
>>
>> Oh, alright, actually, I've described this problem literally, but maybe it's not
>> clear, let me give some examples as below:
>>
>> echo 0 > /sys/fs/f2fs/vdb/max_small_discards
>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>> sync
>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>
>> echo 100 > /sys/fs/f2fs/vdb/max_small_discards
>> rm /mnt/f2fs/file
>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>> sync
>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>
>> Before the patch:
>>
>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>> 0 . . . . . . . .
>>
>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>> 0 3 1 . . . . . .
>>
>> After the patch:
>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>> 0 . . . . . . . .
>>
>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>> 0 . . . . . . . .
>>
>> So, now max_small_discards can not be used to control small discard number
>> cached by checkpoint.

Let me explain more:

Previously, we have two mechanisms to cache & submit small discards:

a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards, and checkpoint
will cache small discard candidates w/ configured maximum number.

b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small discard
candidates w/ configured discard granularity, but w/o limitation of number. FSTRIM
interface is asynchronized, so it won't submit discard directly.

Finally, discard thread will submit them in background periodically.

So what I mean is the mechanism a) is broken, since no matter how we configure the
sysfs entry /sys/fs/f2fs/vdb/max_small_discards, checkpoint will not cache small
discard candidates any more.

So, it needs to fix max_small_discards sysfs functionality? or just drop the
functionality?

>
> Since we do not submit small discards anymore during checkpoint. Why not relying
> on the discard thread to issue them?

Sorry, I'm not sure I get your point, do you mean max_small_discards functionality
is obsoleted, so it recommended to use fstrim to cache & submit small discards?

Let me know, if I'm missing something or misunderstanding the point.

Thanks,

>
>>
>> Thanks,
>>
>>>
>>>>
>>>> I think there is another way to achieve "avoid long checkpoint latency caused
>>>> by committing huge # of small discards", the way is we can set max_small_discards
>>>> to small value or zero, w/ such configuration, it will take checkpoint much less
>>>> time or no time to committing small discard due to below control logic:
>>>>
>>>> f2fs_flush_sit_entries()
>>>> {
>>>> ...
>>>> if (!(cpc->reason & CP_DISCARD)) {
>>>> cpc->trim_start = segno;
>>>> add_discard_addrs(sbi, cpc, false);
>>>> }
>>>> ...
>>>> }
>>>>
>>>> add_discard_addrs()
>>>> {
>>>> ...
>>>> while (force || SM_I(sbi)->dcc_info->nr_discards <=
>>>> SM_I(sbi)->dcc_info->max_discards) {
>>>>
>>>> It will break the loop once nr_discards is larger than max_discards, if
>>>> max_discards is set to zero, checkpoint won't take time to handle small discards.
>>>>
>>>> ...
>>>> if (!de) {
>>>> de = f2fs_kmem_cache_alloc(discard_entry_slab,
>>>> GFP_F2FS_ZERO, true, NULL);
>>>> de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
>>>> list_add_tail(&de->list, head);
>>>> }
>>>> ...
>>>> }
>>>> ...
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
>>>>>>>> From: Chao Yu <[email protected]>
>>>>>>>> Date: Mon, 3 Jul 2023 09:06:53 +0800
>>>>>>>> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>>>>>>>>
>>>>>>>> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
>>>>>>>> to disable small discard by default, so that if there're huge number of
>>>>>>>> small discards, it will decrease checkpoint's latency obviously.
>>>>>>>>
>>>>>>>> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
>>>>>>>> commands during checkpoint"), due to it breaks small discard feature which
>>>>>>>> may be configured via sysfs entry max_small_discards.
>>>>>>>>
>>>>>>>> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
>>>>>>>> Signed-off-by: Chao Yu <[email protected]>
>>>>>>>> ---
>>>>>>>> fs/f2fs/segment.c | 4 ++--
>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>> index 14c822e5c9c9..0a313368f18b 100644
>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>> len = next_pos - cur_pos;
>>>>>>>>
>>>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>> - !force || len < cpc->trim_minlen)
>>>>>>>> + (force && len < cpc->trim_minlen))
>>>>>>>> goto skip;
>>>>>>>>
>>>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>>>>>>>> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
>>>>>>>> atomic_set(&dcc->queued_discard, 0);
>>>>>>>> atomic_set(&dcc->discard_cmd_cnt, 0);
>>>>>>>> dcc->nr_discards = 0;
>>>>>>>> - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
>>>>>>>> + dcc->max_discards = 0;
>>>>>>>> dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
>>>>>>>> dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
>>>>>>>> dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
>>>>>>>> --
>>>>>>>> 2.40.1
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-18 04:01:16

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

Comments?

On 2023/7/13 9:31, Chao Yu wrote:
> On 2023/7/12 23:55, Jaegeuk Kim wrote:
>> On 07/12, Chao Yu wrote:
>>> On 2023/7/12 0:37, Jaegeuk Kim wrote:
>>>> On 07/06, Chao Yu wrote:
>>>>> On 2023/7/6 1:30, Jaegeuk Kim wrote:
>>>>>> On 07/04, Chao Yu wrote:
>>>>>>> On 2023/7/4 18:53, Jaegeuk Kim wrote:
>>>>>>>> On 07/03, Chao Yu wrote:
>>>>>>>>> On 2023/6/15 0:10, Jaegeuk Kim wrote:
>>>>>>>>>> If there're huge # of small discards, this will increase checkpoint latency
>>>>>>>>>> insanely. Let's issue small discards only by trim.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>>       Change log from v1:
>>>>>>>>>>        - move the skip logic to avoid dangling objects
>>>>>>>>>>
>>>>>>>>>>       fs/f2fs/segment.c | 2 +-
>>>>>>>>>>       1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>>>> index 8c7af8b4fc47..0457d620011f 100644
>>>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>>>>                   len = next_pos - cur_pos;
>>>>>>>>>>                   if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>>>> -                (force && len < cpc->trim_minlen))
>>>>>>>>>> +                    !force || len < cpc->trim_minlen)
>>>>>>>>>>                       goto skip;
>>>>>>>>>
>>>>>>>>> Sorry for late reply.
>>>>>>>>>
>>>>>>>>> We have a configuration for such case, what do you think of setting
>>>>>>>>> max_small_discards to zero? otherwise, w/ above change, max_small_discards
>>>>>>>>> logic may be broken?
>>>>>>>>>
>>>>>>>>> What:           /sys/fs/f2fs/<disk>/max_small_discards
>>>>>>>>> Date:           November 2013
>>>>>>>>> Contact:        "Jaegeuk Kim" <[email protected]>
>>>>>>>>> Description:    Controls the issue rate of discard commands that consist of small
>>>>>>>>>                     blocks less than 2MB. The candidates to be discarded are cached until
>>>>>>>>>                     checkpoint is triggered, and issued during the checkpoint.
>>>>>>>>>                     By default, it is disabled with 0.
>>>>>>>>>
>>>>>>>>> Or, if we prefer to disable small_discards by default, what about below change:
>>>>>>>>
>>>>>>>> I think small_discards is fine, but need to avoid long checkpoint latency only.
>>>>>>>
>>>>>>> I didn't get you, do you mean we can still issue small discard by
>>>>>>> fstrim, so small_discards functionality is fine?
>>>>>>
>>>>>> You got the point.
>>>>>
>>>>> Well, actually, what I mean is max_small_discards sysfs entry's functionality
>>>>> is broken. Now, the entry can not be used to control number of small discards
>>>>> committed by checkpoint.
>>>>
>>>> Could you descrbie this problem first?
>>>
>>> Oh, alright, actually, I've described this problem literally, but maybe it's not
>>> clear, let me give some examples as below:
>>>
>>> echo 0 > /sys/fs/f2fs/vdb/max_small_discards
>>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>>> sync
>>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>>
>>> echo 100 > /sys/fs/f2fs/vdb/max_small_discards
>>> rm /mnt/f2fs/file
>>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>>> sync
>>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>>
>>> Before the patch:
>>>
>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>    0         .       .       .       .       .       .       .       .
>>>
>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>    0         3       1       .       .       .       .       .       .
>>>
>>> After the patch:
>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>    0         .       .       .       .       .       .       .       .
>>>
>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>    0         .       .       .       .       .       .       .       .
>>>
>>> So, now max_small_discards can not be used to control small discard number
>>> cached by checkpoint.
>
> Let me explain more:
>
> Previously, we have two mechanisms to cache & submit small discards:
>
> a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards, and checkpoint
> will cache small discard candidates w/ configured maximum number.
>
> b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small discard
> candidates w/ configured discard granularity, but w/o limitation of number. FSTRIM
> interface is asynchronized, so it won't submit discard directly.
>
> Finally, discard thread will submit them in background periodically.
>
> So what I mean is the mechanism a) is broken, since no matter how we configure the
> sysfs entry /sys/fs/f2fs/vdb/max_small_discards, checkpoint will not cache small
> discard candidates any more.
>
> So, it needs to fix max_small_discards sysfs functionality? or just drop the
> functionality?
>
>>
>> Since we do not submit small discards anymore during checkpoint. Why not relying
>> on the discard thread to issue them?
>
> Sorry, I'm not sure I get your point, do you mean max_small_discards functionality
> is obsoleted, so it recommended to use fstrim to cache & submit small discards?
>
> Let me know, if I'm missing something or misunderstanding the point.
>
> Thanks,
>
>>
>>>
>>> Thanks,
>>>
>>>>
>>>>>
>>>>> I think there is another way to achieve "avoid long checkpoint latency caused
>>>>> by committing huge # of small discards", the way is we can set max_small_discards
>>>>> to small value or zero, w/ such configuration, it will take checkpoint much less
>>>>> time or no time to committing small discard due to below control logic:
>>>>>
>>>>> f2fs_flush_sit_entries()
>>>>> {
>>>>> ...
>>>>>             if (!(cpc->reason & CP_DISCARD)) {
>>>>>                 cpc->trim_start = segno;
>>>>>                 add_discard_addrs(sbi, cpc, false);
>>>>>             }
>>>>> ...
>>>>> }
>>>>>
>>>>> add_discard_addrs()
>>>>> {
>>>>> ...
>>>>>     while (force || SM_I(sbi)->dcc_info->nr_discards <=
>>>>>                 SM_I(sbi)->dcc_info->max_discards) {
>>>>>
>>>>> It will break the loop once nr_discards is larger than max_discards, if
>>>>> max_discards is set to zero, checkpoint won't take time to handle small discards.
>>>>>
>>>>> ...
>>>>>         if (!de) {
>>>>>             de = f2fs_kmem_cache_alloc(discard_entry_slab,
>>>>>                         GFP_F2FS_ZERO, true, NULL);
>>>>>             de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
>>>>>             list_add_tail(&de->list, head);
>>>>>         }
>>>>> ...
>>>>>     }
>>>>> ...
>>>>>
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>     From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
>>>>>>>>> From: Chao Yu <[email protected]>
>>>>>>>>> Date: Mon, 3 Jul 2023 09:06:53 +0800
>>>>>>>>> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>>>>>>>>>
>>>>>>>>> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
>>>>>>>>> to disable small discard by default, so that if there're huge number of
>>>>>>>>> small discards, it will decrease checkpoint's latency obviously.
>>>>>>>>>
>>>>>>>>> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
>>>>>>>>> commands during checkpoint"), due to it breaks small discard feature which
>>>>>>>>> may be configured via sysfs entry max_small_discards.
>>>>>>>>>
>>>>>>>>> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
>>>>>>>>> Signed-off-by: Chao Yu <[email protected]>
>>>>>>>>> ---
>>>>>>>>>      fs/f2fs/segment.c | 4 ++--
>>>>>>>>>      1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>>> index 14c822e5c9c9..0a313368f18b 100644
>>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>>>                  len = next_pos - cur_pos;
>>>>>>>>>
>>>>>>>>>                  if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>>> -                    !force || len < cpc->trim_minlen)
>>>>>>>>> +                (force && len < cpc->trim_minlen))
>>>>>>>>>                      goto skip;
>>>>>>>>>
>>>>>>>>>                  f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>>>>>>>>> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
>>>>>>>>>          atomic_set(&dcc->queued_discard, 0);
>>>>>>>>>          atomic_set(&dcc->discard_cmd_cnt, 0);
>>>>>>>>>          dcc->nr_discards = 0;
>>>>>>>>> -    dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
>>>>>>>>> +    dcc->max_discards = 0;
>>>>>>>>>          dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
>>>>>>>>>          dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
>>>>>>>>>          dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
>>>>>>>>> --
>>>>>>>>> 2.40.1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>                   f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>
>
> _______________________________________________
> Linux-f2fs-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

2023-07-21 21:16:21

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 07/13, Chao Yu wrote:
> On 2023/7/12 23:55, Jaegeuk Kim wrote:
> > On 07/12, Chao Yu wrote:
> > > On 2023/7/12 0:37, Jaegeuk Kim wrote:
> > > > On 07/06, Chao Yu wrote:
> > > > > On 2023/7/6 1:30, Jaegeuk Kim wrote:
> > > > > > On 07/04, Chao Yu wrote:
> > > > > > > On 2023/7/4 18:53, Jaegeuk Kim wrote:
> > > > > > > > On 07/03, Chao Yu wrote:
> > > > > > > > > On 2023/6/15 0:10, Jaegeuk Kim wrote:
> > > > > > > > > > If there're huge # of small discards, this will increase checkpoint latency
> > > > > > > > > > insanely. Let's issue small discards only by trim.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Jaegeuk Kim <[email protected]>
> > > > > > > > > > ---
> > > > > > > > > >
> > > > > > > > > > Change log from v1:
> > > > > > > > > > - move the skip logic to avoid dangling objects
> > > > > > > > > >
> > > > > > > > > > fs/f2fs/segment.c | 2 +-
> > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > > > > > > index 8c7af8b4fc47..0457d620011f 100644
> > > > > > > > > > --- a/fs/f2fs/segment.c
> > > > > > > > > > +++ b/fs/f2fs/segment.c
> > > > > > > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > > > > > > len = next_pos - cur_pos;
> > > > > > > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > > > > > > - (force && len < cpc->trim_minlen))
> > > > > > > > > > + !force || len < cpc->trim_minlen)
> > > > > > > > > > goto skip;
> > > > > > > > >
> > > > > > > > > Sorry for late reply.
> > > > > > > > >
> > > > > > > > > We have a configuration for such case, what do you think of setting
> > > > > > > > > max_small_discards to zero? otherwise, w/ above change, max_small_discards
> > > > > > > > > logic may be broken?
> > > > > > > > >
> > > > > > > > > What: /sys/fs/f2fs/<disk>/max_small_discards
> > > > > > > > > Date: November 2013
> > > > > > > > > Contact: "Jaegeuk Kim" <[email protected]>
> > > > > > > > > Description: Controls the issue rate of discard commands that consist of small
> > > > > > > > > blocks less than 2MB. The candidates to be discarded are cached until
> > > > > > > > > checkpoint is triggered, and issued during the checkpoint.
> > > > > > > > > By default, it is disabled with 0.
> > > > > > > > >
> > > > > > > > > Or, if we prefer to disable small_discards by default, what about below change:
> > > > > > > >
> > > > > > > > I think small_discards is fine, but need to avoid long checkpoint latency only.
> > > > > > >
> > > > > > > I didn't get you, do you mean we can still issue small discard by
> > > > > > > fstrim, so small_discards functionality is fine?
> > > > > >
> > > > > > You got the point.
> > > > >
> > > > > Well, actually, what I mean is max_small_discards sysfs entry's functionality
> > > > > is broken. Now, the entry can not be used to control number of small discards
> > > > > committed by checkpoint.
> > > >
> > > > Could you descrbie this problem first?
> > >
> > > Oh, alright, actually, I've described this problem literally, but maybe it's not
> > > clear, let me give some examples as below:
> > >
> > > echo 0 > /sys/fs/f2fs/vdb/max_small_discards
> > > xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
> > > xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
> > > sync
> > > cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
> > >
> > > echo 100 > /sys/fs/f2fs/vdb/max_small_discards
> > > rm /mnt/f2fs/file
> > > xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
> > > xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
> > > sync
> > > cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
> > >
> > > Before the patch:
> > >
> > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > 0 . . . . . . . .
> > >
> > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > 0 3 1 . . . . . .
> > >
> > > After the patch:
> > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > 0 . . . . . . . .
> > >
> > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > 0 . . . . . . . .
> > >
> > > So, now max_small_discards can not be used to control small discard number
> > > cached by checkpoint.
>
> Let me explain more:
>
> Previously, we have two mechanisms to cache & submit small discards:
>
> a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards, and checkpoint
> will cache small discard candidates w/ configured maximum number.
>
> b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small discard
> candidates w/ configured discard granularity, but w/o limitation of number. FSTRIM
> interface is asynchronized, so it won't submit discard directly.
>
> Finally, discard thread will submit them in background periodically.
>
> So what I mean is the mechanism a) is broken, since no matter how we configure the
> sysfs entry /sys/fs/f2fs/vdb/max_small_discards, checkpoint will not cache small
> discard candidates any more.

Ok, it seems what I encountered before was adding this small discard even
after issuing it by checkpoint. Thoughts?

node=0], cp [data=0, node=0, meta=0], app [read=0 (direct=0, buffered=0), mapped=0], compr(buffered=0, mapped=0)], fs [data=0, (gc_data=0, cdata=0), node=0, meta=0]
f2fs_discard-25-752 [003] ..... 9744.173085: f2fs_iostat_latency: dev = (254,51), iotype [peak lat.(ms)/avg lat.(ms)/count], rd_data [0/0/0], rd_node [0/0/0], rd_meta [0/0/0], wr_sync_data [0/0/0], wr_sync_node [0/0/0], wr_sync_meta [0/0/0], wr_async_data [0/0/0], wr_async_node [0/0/0], wr_async_meta [0/0/0]
f2fs_discard-25-752 [003] ..... 9744.173089: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0c6, blklen = 0x1
f2fs_discard-25-752 [003] ..... 9744.173111: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
f2fs_discard-25-752 [003] ..... 9744.173126: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c116, blklen = 0x1
f2fs_discard-25-752 [003] ..... 9744.173140: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c121, blklen = 0x1
f2fs_discard-25-752 [003] ..... 9744.173154: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c15b, blklen = 0x1
f2fs_discard-25-752 [003] ..... 9744.173169: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c16e, blklen = 0x1
f2fs_discard-25-752 [003] ..... 9744.173183: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c181, blklen = 0x1
f2fs_discard-25-752 [004] ..... 9744.175272: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0a9, blklen = 0x1
f2fs_discard-25-752 [004] ..... 9744.175345: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0c6, blklen = 0x1
f2fs_discard-25-752 [004] ..... 9744.175348: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
f2fs_discard-25-752 [004] ..... 9744.175352: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c116, blklen = 0x1
f2fs_discard-25-752 [004] ..... 9744.175354: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c121, blklen = 0x1
f2fs_discard-25-752 [004] ..... 9744.175360: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c15b, blklen = 0x1
f2fs_discard-25-752 [004] ..... 9744.175362: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c16e, blklen = 0x1
f2fs_discard-25-752 [004] ..... 9744.175367: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c181, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.228775: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c197, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.228950: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c1ad, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.228965: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c1b1, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.228994: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c1b3, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.229006: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c27a, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.229017: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c2a3, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.229028: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c2ab, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.229041: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c304, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.230811: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c197, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.230926: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c1ad, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.230929: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c1b1, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.230932: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c1b3, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.230940: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c27a, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.230945: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c2a3, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.230947: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c2ab, blklen = 0x1
f2fs_discard-25-752 [002] ..... 9744.230952: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c304, blklen = 0x1

>
> So, it needs to fix max_small_discards sysfs functionality? or just drop the
> functionality?
>
> >
> > Since we do not submit small discards anymore during checkpoint. Why not relying
> > on the discard thread to issue them?
>
> Sorry, I'm not sure I get your point, do you mean max_small_discards functionality
> is obsoleted, so it recommended to use fstrim to cache & submit small discards?
>
> Let me know, if I'm missing something or misunderstanding the point.
>
> Thanks,
>
> >
> > >
> > > Thanks,
> > >
> > > >
> > > > >
> > > > > I think there is another way to achieve "avoid long checkpoint latency caused
> > > > > by committing huge # of small discards", the way is we can set max_small_discards
> > > > > to small value or zero, w/ such configuration, it will take checkpoint much less
> > > > > time or no time to committing small discard due to below control logic:
> > > > >
> > > > > f2fs_flush_sit_entries()
> > > > > {
> > > > > ...
> > > > > if (!(cpc->reason & CP_DISCARD)) {
> > > > > cpc->trim_start = segno;
> > > > > add_discard_addrs(sbi, cpc, false);
> > > > > }
> > > > > ...
> > > > > }
> > > > >
> > > > > add_discard_addrs()
> > > > > {
> > > > > ...
> > > > > while (force || SM_I(sbi)->dcc_info->nr_discards <=
> > > > > SM_I(sbi)->dcc_info->max_discards) {
> > > > >
> > > > > It will break the loop once nr_discards is larger than max_discards, if
> > > > > max_discards is set to zero, checkpoint won't take time to handle small discards.
> > > > >
> > > > > ...
> > > > > if (!de) {
> > > > > de = f2fs_kmem_cache_alloc(discard_entry_slab,
> > > > > GFP_F2FS_ZERO, true, NULL);
> > > > > de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
> > > > > list_add_tail(&de->list, head);
> > > > > }
> > > > > ...
> > > > > }
> > > > > ...
> > > > >
> > > > > Thanks,
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
> > > > > > > > > From: Chao Yu <[email protected]>
> > > > > > > > > Date: Mon, 3 Jul 2023 09:06:53 +0800
> > > > > > > > > Subject: [PATCH] Revert "f2fs: enable small discard by default"
> > > > > > > > >
> > > > > > > > > This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
> > > > > > > > > to disable small discard by default, so that if there're huge number of
> > > > > > > > > small discards, it will decrease checkpoint's latency obviously.
> > > > > > > > >
> > > > > > > > > Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
> > > > > > > > > commands during checkpoint"), due to it breaks small discard feature which
> > > > > > > > > may be configured via sysfs entry max_small_discards.
> > > > > > > > >
> > > > > > > > > Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
> > > > > > > > > Signed-off-by: Chao Yu <[email protected]>
> > > > > > > > > ---
> > > > > > > > > fs/f2fs/segment.c | 4 ++--
> > > > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > > > > > index 14c822e5c9c9..0a313368f18b 100644
> > > > > > > > > --- a/fs/f2fs/segment.c
> > > > > > > > > +++ b/fs/f2fs/segment.c
> > > > > > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > > > > > len = next_pos - cur_pos;
> > > > > > > > >
> > > > > > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > > > > > - !force || len < cpc->trim_minlen)
> > > > > > > > > + (force && len < cpc->trim_minlen))
> > > > > > > > > goto skip;
> > > > > > > > >
> > > > > > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
> > > > > > > > > @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
> > > > > > > > > atomic_set(&dcc->queued_discard, 0);
> > > > > > > > > atomic_set(&dcc->discard_cmd_cnt, 0);
> > > > > > > > > dcc->nr_discards = 0;
> > > > > > > > > - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
> > > > > > > > > + dcc->max_discards = 0;
> > > > > > > > > dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
> > > > > > > > > dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
> > > > > > > > > dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
> > > > > > > > > --
> > > > > > > > > 2.40.1
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-07-25 13:48:51

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 2023/7/22 4:23, Jaegeuk Kim wrote:
> On 07/13, Chao Yu wrote:
>> On 2023/7/12 23:55, Jaegeuk Kim wrote:
>>> On 07/12, Chao Yu wrote:
>>>> On 2023/7/12 0:37, Jaegeuk Kim wrote:
>>>>> On 07/06, Chao Yu wrote:
>>>>>> On 2023/7/6 1:30, Jaegeuk Kim wrote:
>>>>>>> On 07/04, Chao Yu wrote:
>>>>>>>> On 2023/7/4 18:53, Jaegeuk Kim wrote:
>>>>>>>>> On 07/03, Chao Yu wrote:
>>>>>>>>>> On 2023/6/15 0:10, Jaegeuk Kim wrote:
>>>>>>>>>>> If there're huge # of small discards, this will increase checkpoint latency
>>>>>>>>>>> insanely. Let's issue small discards only by trim.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>>>>>>>> ---
>>>>>>>>>>>
>>>>>>>>>>> Change log from v1:
>>>>>>>>>>> - move the skip logic to avoid dangling objects
>>>>>>>>>>>
>>>>>>>>>>> fs/f2fs/segment.c | 2 +-
>>>>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>>>>> index 8c7af8b4fc47..0457d620011f 100644
>>>>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>>>>> len = next_pos - cur_pos;
>>>>>>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>>>>> - (force && len < cpc->trim_minlen))
>>>>>>>>>>> + !force || len < cpc->trim_minlen)
>>>>>>>>>>> goto skip;
>>>>>>>>>>
>>>>>>>>>> Sorry for late reply.
>>>>>>>>>>
>>>>>>>>>> We have a configuration for such case, what do you think of setting
>>>>>>>>>> max_small_discards to zero? otherwise, w/ above change, max_small_discards
>>>>>>>>>> logic may be broken?
>>>>>>>>>>
>>>>>>>>>> What: /sys/fs/f2fs/<disk>/max_small_discards
>>>>>>>>>> Date: November 2013
>>>>>>>>>> Contact: "Jaegeuk Kim" <[email protected]>
>>>>>>>>>> Description: Controls the issue rate of discard commands that consist of small
>>>>>>>>>> blocks less than 2MB. The candidates to be discarded are cached until
>>>>>>>>>> checkpoint is triggered, and issued during the checkpoint.
>>>>>>>>>> By default, it is disabled with 0.
>>>>>>>>>>
>>>>>>>>>> Or, if we prefer to disable small_discards by default, what about below change:
>>>>>>>>>
>>>>>>>>> I think small_discards is fine, but need to avoid long checkpoint latency only.
>>>>>>>>
>>>>>>>> I didn't get you, do you mean we can still issue small discard by
>>>>>>>> fstrim, so small_discards functionality is fine?
>>>>>>>
>>>>>>> You got the point.
>>>>>>
>>>>>> Well, actually, what I mean is max_small_discards sysfs entry's functionality
>>>>>> is broken. Now, the entry can not be used to control number of small discards
>>>>>> committed by checkpoint.
>>>>>
>>>>> Could you descrbie this problem first?
>>>>
>>>> Oh, alright, actually, I've described this problem literally, but maybe it's not
>>>> clear, let me give some examples as below:
>>>>
>>>> echo 0 > /sys/fs/f2fs/vdb/max_small_discards
>>>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>>>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>>>> sync
>>>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>>>
>>>> echo 100 > /sys/fs/f2fs/vdb/max_small_discards
>>>> rm /mnt/f2fs/file
>>>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>>>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>>>> sync
>>>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>>>
>>>> Before the patch:
>>>>
>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>> 0 . . . . . . . .
>>>>
>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>> 0 3 1 . . . . . .
>>>>
>>>> After the patch:
>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>> 0 . . . . . . . .
>>>>
>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>> 0 . . . . . . . .
>>>>
>>>> So, now max_small_discards can not be used to control small discard number
>>>> cached by checkpoint.
>>
>> Let me explain more:
>>
>> Previously, we have two mechanisms to cache & submit small discards:
>>
>> a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards, and checkpoint
>> will cache small discard candidates w/ configured maximum number.
>>
>> b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small discard
>> candidates w/ configured discard granularity, but w/o limitation of number. FSTRIM
>> interface is asynchronized, so it won't submit discard directly.
>>
>> Finally, discard thread will submit them in background periodically.
>>
>> So what I mean is the mechanism a) is broken, since no matter how we configure the
>> sysfs entry /sys/fs/f2fs/vdb/max_small_discards, checkpoint will not cache small
>> discard candidates any more.
>
> Ok, it seems what I encountered before was adding this small discard even
> after issuing it by checkpoint. Thoughts?

Do you mean: in f2fs_clear_prefree_segments(), small discard may overlap
segment granularity discard?

e.g.
- f2fs_clear_prefree_segments
- f2fs_issue_discard(0, 512) --- segment granularity discard
- f2fs_issue_discard(0, 1) --- small discard
- f2fs_issue_discard(5, 1) --- small discard

Thanks,

>
> node=0], cp [data=0, node=0, meta=0], app [read=0 (direct=0, buffered=0), mapped=0], compr(buffered=0, mapped=0)], fs [data=0, (gc_data=0, cdata=0), node=0, meta=0]
> f2fs_discard-25-752 [003] ..... 9744.173085: f2fs_iostat_latency: dev = (254,51), iotype [peak lat.(ms)/avg lat.(ms)/count], rd_data [0/0/0], rd_node [0/0/0], rd_meta [0/0/0], wr_sync_data [0/0/0], wr_sync_node [0/0/0], wr_sync_meta [0/0/0], wr_async_data [0/0/0], wr_async_node [0/0/0], wr_async_meta [0/0/0]
> f2fs_discard-25-752 [003] ..... 9744.173089: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0c6, blklen = 0x1
> f2fs_discard-25-752 [003] ..... 9744.173111: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
> f2fs_discard-25-752 [003] ..... 9744.173126: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c116, blklen = 0x1
> f2fs_discard-25-752 [003] ..... 9744.173140: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c121, blklen = 0x1
> f2fs_discard-25-752 [003] ..... 9744.173154: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c15b, blklen = 0x1
> f2fs_discard-25-752 [003] ..... 9744.173169: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c16e, blklen = 0x1
> f2fs_discard-25-752 [003] ..... 9744.173183: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c181, blklen = 0x1
> f2fs_discard-25-752 [004] ..... 9744.175272: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0a9, blklen = 0x1
> f2fs_discard-25-752 [004] ..... 9744.175345: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0c6, blklen = 0x1
> f2fs_discard-25-752 [004] ..... 9744.175348: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
> f2fs_discard-25-752 [004] ..... 9744.175352: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c116, blklen = 0x1
> f2fs_discard-25-752 [004] ..... 9744.175354: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c121, blklen = 0x1
> f2fs_discard-25-752 [004] ..... 9744.175360: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c15b, blklen = 0x1
> f2fs_discard-25-752 [004] ..... 9744.175362: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c16e, blklen = 0x1
> f2fs_discard-25-752 [004] ..... 9744.175367: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c181, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.228775: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c197, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.228950: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c1ad, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.228965: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c1b1, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.228994: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c1b3, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.229006: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c27a, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.229017: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c2a3, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.229028: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c2ab, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.229041: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c304, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.230811: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c197, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.230926: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c1ad, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.230929: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c1b1, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.230932: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c1b3, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.230940: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c27a, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.230945: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c2a3, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.230947: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c2ab, blklen = 0x1
> f2fs_discard-25-752 [002] ..... 9744.230952: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c304, blklen = 0x1
>
>>
>> So, it needs to fix max_small_discards sysfs functionality? or just drop the
>> functionality?
>>
>>>
>>> Since we do not submit small discards anymore during checkpoint. Why not relying
>>> on the discard thread to issue them?
>>
>> Sorry, I'm not sure I get your point, do you mean max_small_discards functionality
>> is obsoleted, so it recommended to use fstrim to cache & submit small discards?
>>
>> Let me know, if I'm missing something or misunderstanding the point.
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> I think there is another way to achieve "avoid long checkpoint latency caused
>>>>>> by committing huge # of small discards", the way is we can set max_small_discards
>>>>>> to small value or zero, w/ such configuration, it will take checkpoint much less
>>>>>> time or no time to committing small discard due to below control logic:
>>>>>>
>>>>>> f2fs_flush_sit_entries()
>>>>>> {
>>>>>> ...
>>>>>> if (!(cpc->reason & CP_DISCARD)) {
>>>>>> cpc->trim_start = segno;
>>>>>> add_discard_addrs(sbi, cpc, false);
>>>>>> }
>>>>>> ...
>>>>>> }
>>>>>>
>>>>>> add_discard_addrs()
>>>>>> {
>>>>>> ...
>>>>>> while (force || SM_I(sbi)->dcc_info->nr_discards <=
>>>>>> SM_I(sbi)->dcc_info->max_discards) {
>>>>>>
>>>>>> It will break the loop once nr_discards is larger than max_discards, if
>>>>>> max_discards is set to zero, checkpoint won't take time to handle small discards.
>>>>>>
>>>>>> ...
>>>>>> if (!de) {
>>>>>> de = f2fs_kmem_cache_alloc(discard_entry_slab,
>>>>>> GFP_F2FS_ZERO, true, NULL);
>>>>>> de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
>>>>>> list_add_tail(&de->list, head);
>>>>>> }
>>>>>> ...
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
>>>>>>>>>> From: Chao Yu <[email protected]>
>>>>>>>>>> Date: Mon, 3 Jul 2023 09:06:53 +0800
>>>>>>>>>> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>>>>>>>>>>
>>>>>>>>>> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
>>>>>>>>>> to disable small discard by default, so that if there're huge number of
>>>>>>>>>> small discards, it will decrease checkpoint's latency obviously.
>>>>>>>>>>
>>>>>>>>>> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
>>>>>>>>>> commands during checkpoint"), due to it breaks small discard feature which
>>>>>>>>>> may be configured via sysfs entry max_small_discards.
>>>>>>>>>>
>>>>>>>>>> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
>>>>>>>>>> Signed-off-by: Chao Yu <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> fs/f2fs/segment.c | 4 ++--
>>>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>>>> index 14c822e5c9c9..0a313368f18b 100644
>>>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>>>> len = next_pos - cur_pos;
>>>>>>>>>>
>>>>>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>>>> - !force || len < cpc->trim_minlen)
>>>>>>>>>> + (force && len < cpc->trim_minlen))
>>>>>>>>>> goto skip;
>>>>>>>>>>
>>>>>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>>>>>>>>>> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
>>>>>>>>>> atomic_set(&dcc->queued_discard, 0);
>>>>>>>>>> atomic_set(&dcc->discard_cmd_cnt, 0);
>>>>>>>>>> dcc->nr_discards = 0;
>>>>>>>>>> - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
>>>>>>>>>> + dcc->max_discards = 0;
>>>>>>>>>> dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
>>>>>>>>>> dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
>>>>>>>>>> dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
>>>>>>>>>> --
>>>>>>>>>> 2.40.1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-08-04 21:05:57

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 07/25, Chao Yu wrote:
> On 2023/7/22 4:23, Jaegeuk Kim wrote:
> > On 07/13, Chao Yu wrote:
> > > On 2023/7/12 23:55, Jaegeuk Kim wrote:
> > > > On 07/12, Chao Yu wrote:
> > > > > On 2023/7/12 0:37, Jaegeuk Kim wrote:
> > > > > > On 07/06, Chao Yu wrote:
> > > > > > > On 2023/7/6 1:30, Jaegeuk Kim wrote:
> > > > > > > > On 07/04, Chao Yu wrote:
> > > > > > > > > On 2023/7/4 18:53, Jaegeuk Kim wrote:
> > > > > > > > > > On 07/03, Chao Yu wrote:
> > > > > > > > > > > On 2023/6/15 0:10, Jaegeuk Kim wrote:
> > > > > > > > > > > > If there're huge # of small discards, this will increase checkpoint latency
> > > > > > > > > > > > insanely. Let's issue small discards only by trim.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Jaegeuk Kim <[email protected]>
> > > > > > > > > > > > ---
> > > > > > > > > > > >
> > > > > > > > > > > > Change log from v1:
> > > > > > > > > > > > - move the skip logic to avoid dangling objects
> > > > > > > > > > > >
> > > > > > > > > > > > fs/f2fs/segment.c | 2 +-
> > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > > > > > > > > index 8c7af8b4fc47..0457d620011f 100644
> > > > > > > > > > > > --- a/fs/f2fs/segment.c
> > > > > > > > > > > > +++ b/fs/f2fs/segment.c
> > > > > > > > > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > > > > > > > > len = next_pos - cur_pos;
> > > > > > > > > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > > > > > > > > - (force && len < cpc->trim_minlen))
> > > > > > > > > > > > + !force || len < cpc->trim_minlen)
> > > > > > > > > > > > goto skip;
> > > > > > > > > > >
> > > > > > > > > > > Sorry for late reply.
> > > > > > > > > > >
> > > > > > > > > > > We have a configuration for such case, what do you think of setting
> > > > > > > > > > > max_small_discards to zero? otherwise, w/ above change, max_small_discards
> > > > > > > > > > > logic may be broken?
> > > > > > > > > > >
> > > > > > > > > > > What: /sys/fs/f2fs/<disk>/max_small_discards
> > > > > > > > > > > Date: November 2013
> > > > > > > > > > > Contact: "Jaegeuk Kim" <[email protected]>
> > > > > > > > > > > Description: Controls the issue rate of discard commands that consist of small
> > > > > > > > > > > blocks less than 2MB. The candidates to be discarded are cached until
> > > > > > > > > > > checkpoint is triggered, and issued during the checkpoint.
> > > > > > > > > > > By default, it is disabled with 0.
> > > > > > > > > > >
> > > > > > > > > > > Or, if we prefer to disable small_discards by default, what about below change:
> > > > > > > > > >
> > > > > > > > > > I think small_discards is fine, but need to avoid long checkpoint latency only.
> > > > > > > > >
> > > > > > > > > I didn't get you, do you mean we can still issue small discard by
> > > > > > > > > fstrim, so small_discards functionality is fine?
> > > > > > > >
> > > > > > > > You got the point.
> > > > > > >
> > > > > > > Well, actually, what I mean is max_small_discards sysfs entry's functionality
> > > > > > > is broken. Now, the entry can not be used to control number of small discards
> > > > > > > committed by checkpoint.
> > > > > >
> > > > > > Could you descrbie this problem first?
> > > > >
> > > > > Oh, alright, actually, I've described this problem literally, but maybe it's not
> > > > > clear, let me give some examples as below:
> > > > >
> > > > > echo 0 > /sys/fs/f2fs/vdb/max_small_discards
> > > > > xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
> > > > > xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
> > > > > sync
> > > > > cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
> > > > >
> > > > > echo 100 > /sys/fs/f2fs/vdb/max_small_discards
> > > > > rm /mnt/f2fs/file
> > > > > xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
> > > > > xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
> > > > > sync
> > > > > cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
> > > > >
> > > > > Before the patch:
> > > > >
> > > > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > > > 0 . . . . . . . .
> > > > >
> > > > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > > > 0 3 1 . . . . . .
> > > > >
> > > > > After the patch:
> > > > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > > > 0 . . . . . . . .
> > > > >
> > > > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > > > 0 . . . . . . . .
> > > > >
> > > > > So, now max_small_discards can not be used to control small discard number
> > > > > cached by checkpoint.
> > >
> > > Let me explain more:
> > >
> > > Previously, we have two mechanisms to cache & submit small discards:
> > >
> > > a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards, and checkpoint
> > > will cache small discard candidates w/ configured maximum number.
> > >
> > > b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small discard
> > > candidates w/ configured discard granularity, but w/o limitation of number. FSTRIM
> > > interface is asynchronized, so it won't submit discard directly.
> > >
> > > Finally, discard thread will submit them in background periodically.
> > >
> > > So what I mean is the mechanism a) is broken, since no matter how we configure the
> > > sysfs entry /sys/fs/f2fs/vdb/max_small_discards, checkpoint will not cache small
> > > discard candidates any more.
> >
> > Ok, it seems what I encountered before was adding this small discard even
> > after issuing it by checkpoint. Thoughts?
>
> Do you mean: in f2fs_clear_prefree_segments(), small discard may overlap
> segment granularity discard?

I didn't dig enough tho, don't think so. Somehow I got a loop as below which
said the same LBA was issued and added back repeatedly, not seen this short log
unfortunately.

>
> e.g.
> - f2fs_clear_prefree_segments
> - f2fs_issue_discard(0, 512) --- segment granularity discard
> - f2fs_issue_discard(0, 1) --- small discard
> - f2fs_issue_discard(5, 1) --- small discard
>
> Thanks,
>
> >
> > node=0], cp [data=0, node=0, meta=0], app [read=0 (direct=0, buffered=0), mapped=0], compr(buffered=0, mapped=0)], fs [data=0, (gc_data=0, cdata=0), node=0, meta=0]
> > f2fs_discard-25-752 [003] ..... 9744.173085: f2fs_iostat_latency: dev = (254,51), iotype [peak lat.(ms)/avg lat.(ms)/count], rd_data [0/0/0], rd_node [0/0/0], rd_meta [0/0/0], wr_sync_data [0/0/0], wr_sync_node [0/0/0], wr_sync_meta [0/0/0], wr_async_data [0/0/0], wr_async_node [0/0/0], wr_async_meta [0/0/0]
> > f2fs_discard-25-752 [003] ..... 9744.173089: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0c6, blklen = 0x1
> > f2fs_discard-25-752 [003] ..... 9744.173111: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
> > f2fs_discard-25-752 [003] ..... 9744.173126: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c116, blklen = 0x1
> > f2fs_discard-25-752 [003] ..... 9744.173140: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c121, blklen = 0x1
> > f2fs_discard-25-752 [003] ..... 9744.173154: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c15b, blklen = 0x1
> > f2fs_discard-25-752 [003] ..... 9744.173169: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c16e, blklen = 0x1
> > f2fs_discard-25-752 [003] ..... 9744.173183: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c181, blklen = 0x1
> > f2fs_discard-25-752 [004] ..... 9744.175272: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0a9, blklen = 0x1
> > f2fs_discard-25-752 [004] ..... 9744.175345: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0c6, blklen = 0x1
> > f2fs_discard-25-752 [004] ..... 9744.175348: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
> > f2fs_discard-25-752 [004] ..... 9744.175352: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c116, blklen = 0x1
> > f2fs_discard-25-752 [004] ..... 9744.175354: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c121, blklen = 0x1
> > f2fs_discard-25-752 [004] ..... 9744.175360: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c15b, blklen = 0x1
> > f2fs_discard-25-752 [004] ..... 9744.175362: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c16e, blklen = 0x1
> > f2fs_discard-25-752 [004] ..... 9744.175367: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c181, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.228775: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c197, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.228950: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c1ad, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.228965: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c1b1, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.228994: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c1b3, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.229006: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c27a, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.229017: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c2a3, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.229028: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c2ab, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.229041: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c304, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.230811: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c197, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.230926: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c1ad, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.230929: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c1b1, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.230932: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c1b3, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.230940: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c27a, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.230945: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c2a3, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.230947: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c2ab, blklen = 0x1
> > f2fs_discard-25-752 [002] ..... 9744.230952: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c304, blklen = 0x1
> >
> > >
> > > So, it needs to fix max_small_discards sysfs functionality? or just drop the
> > > functionality?
> > >
> > > >
> > > > Since we do not submit small discards anymore during checkpoint. Why not relying
> > > > on the discard thread to issue them?
> > >
> > > Sorry, I'm not sure I get your point, do you mean max_small_discards functionality
> > > is obsoleted, so it recommended to use fstrim to cache & submit small discards?
> > >
> > > Let me know, if I'm missing something or misunderstanding the point.
> > >
> > > Thanks,
> > >
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > >
> > > > > > >
> > > > > > > I think there is another way to achieve "avoid long checkpoint latency caused
> > > > > > > by committing huge # of small discards", the way is we can set max_small_discards
> > > > > > > to small value or zero, w/ such configuration, it will take checkpoint much less
> > > > > > > time or no time to committing small discard due to below control logic:
> > > > > > >
> > > > > > > f2fs_flush_sit_entries()
> > > > > > > {
> > > > > > > ...
> > > > > > > if (!(cpc->reason & CP_DISCARD)) {
> > > > > > > cpc->trim_start = segno;
> > > > > > > add_discard_addrs(sbi, cpc, false);
> > > > > > > }
> > > > > > > ...
> > > > > > > }
> > > > > > >
> > > > > > > add_discard_addrs()
> > > > > > > {
> > > > > > > ...
> > > > > > > while (force || SM_I(sbi)->dcc_info->nr_discards <=
> > > > > > > SM_I(sbi)->dcc_info->max_discards) {
> > > > > > >
> > > > > > > It will break the loop once nr_discards is larger than max_discards, if
> > > > > > > max_discards is set to zero, checkpoint won't take time to handle small discards.
> > > > > > >
> > > > > > > ...
> > > > > > > if (!de) {
> > > > > > > de = f2fs_kmem_cache_alloc(discard_entry_slab,
> > > > > > > GFP_F2FS_ZERO, true, NULL);
> > > > > > > de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
> > > > > > > list_add_tail(&de->list, head);
> > > > > > > }
> > > > > > > ...
> > > > > > > }
> > > > > > > ...
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
> > > > > > > > > > > From: Chao Yu <[email protected]>
> > > > > > > > > > > Date: Mon, 3 Jul 2023 09:06:53 +0800
> > > > > > > > > > > Subject: [PATCH] Revert "f2fs: enable small discard by default"
> > > > > > > > > > >
> > > > > > > > > > > This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
> > > > > > > > > > > to disable small discard by default, so that if there're huge number of
> > > > > > > > > > > small discards, it will decrease checkpoint's latency obviously.
> > > > > > > > > > >
> > > > > > > > > > > Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
> > > > > > > > > > > commands during checkpoint"), due to it breaks small discard feature which
> > > > > > > > > > > may be configured via sysfs entry max_small_discards.
> > > > > > > > > > >
> > > > > > > > > > > Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
> > > > > > > > > > > Signed-off-by: Chao Yu <[email protected]>
> > > > > > > > > > > ---
> > > > > > > > > > > fs/f2fs/segment.c | 4 ++--
> > > > > > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > > > > > > > index 14c822e5c9c9..0a313368f18b 100644
> > > > > > > > > > > --- a/fs/f2fs/segment.c
> > > > > > > > > > > +++ b/fs/f2fs/segment.c
> > > > > > > > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > > > > > > > len = next_pos - cur_pos;
> > > > > > > > > > >
> > > > > > > > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > > > > > > > - !force || len < cpc->trim_minlen)
> > > > > > > > > > > + (force && len < cpc->trim_minlen))
> > > > > > > > > > > goto skip;
> > > > > > > > > > >
> > > > > > > > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
> > > > > > > > > > > @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
> > > > > > > > > > > atomic_set(&dcc->queued_discard, 0);
> > > > > > > > > > > atomic_set(&dcc->discard_cmd_cnt, 0);
> > > > > > > > > > > dcc->nr_discards = 0;
> > > > > > > > > > > - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
> > > > > > > > > > > + dcc->max_discards = 0;
> > > > > > > > > > > dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
> > > > > > > > > > > dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
> > > > > > > > > > > dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
> > > > > > > > > > > --
> > > > > > > > > > > 2.40.1
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-08-07 02:37:25

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 2023/8/5 4:52, Jaegeuk Kim wrote:
> On 07/25, Chao Yu wrote:
>> On 2023/7/22 4:23, Jaegeuk Kim wrote:
>>> On 07/13, Chao Yu wrote:
>>>> On 2023/7/12 23:55, Jaegeuk Kim wrote:
>>>>> On 07/12, Chao Yu wrote:
>>>>>> On 2023/7/12 0:37, Jaegeuk Kim wrote:
>>>>>>> On 07/06, Chao Yu wrote:
>>>>>>>> On 2023/7/6 1:30, Jaegeuk Kim wrote:
>>>>>>>>> On 07/04, Chao Yu wrote:
>>>>>>>>>> On 2023/7/4 18:53, Jaegeuk Kim wrote:
>>>>>>>>>>> On 07/03, Chao Yu wrote:
>>>>>>>>>>>> On 2023/6/15 0:10, Jaegeuk Kim wrote:
>>>>>>>>>>>>> If there're huge # of small discards, this will increase checkpoint latency
>>>>>>>>>>>>> insanely. Let's issue small discards only by trim.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>
>>>>>>>>>>>>> Change log from v1:
>>>>>>>>>>>>> - move the skip logic to avoid dangling objects
>>>>>>>>>>>>>
>>>>>>>>>>>>> fs/f2fs/segment.c | 2 +-
>>>>>>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>>>>>>> index 8c7af8b4fc47..0457d620011f 100644
>>>>>>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>>>>>>> len = next_pos - cur_pos;
>>>>>>>>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>>>>>>> - (force && len < cpc->trim_minlen))
>>>>>>>>>>>>> + !force || len < cpc->trim_minlen)
>>>>>>>>>>>>> goto skip;
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for late reply.
>>>>>>>>>>>>
>>>>>>>>>>>> We have a configuration for such case, what do you think of setting
>>>>>>>>>>>> max_small_discards to zero? otherwise, w/ above change, max_small_discards
>>>>>>>>>>>> logic may be broken?
>>>>>>>>>>>>
>>>>>>>>>>>> What: /sys/fs/f2fs/<disk>/max_small_discards
>>>>>>>>>>>> Date: November 2013
>>>>>>>>>>>> Contact: "Jaegeuk Kim" <[email protected]>
>>>>>>>>>>>> Description: Controls the issue rate of discard commands that consist of small
>>>>>>>>>>>> blocks less than 2MB. The candidates to be discarded are cached until
>>>>>>>>>>>> checkpoint is triggered, and issued during the checkpoint.
>>>>>>>>>>>> By default, it is disabled with 0.
>>>>>>>>>>>>
>>>>>>>>>>>> Or, if we prefer to disable small_discards by default, what about below change:
>>>>>>>>>>>
>>>>>>>>>>> I think small_discards is fine, but need to avoid long checkpoint latency only.
>>>>>>>>>>
>>>>>>>>>> I didn't get you, do you mean we can still issue small discard by
>>>>>>>>>> fstrim, so small_discards functionality is fine?
>>>>>>>>>
>>>>>>>>> You got the point.
>>>>>>>>
>>>>>>>> Well, actually, what I mean is max_small_discards sysfs entry's functionality
>>>>>>>> is broken. Now, the entry can not be used to control number of small discards
>>>>>>>> committed by checkpoint.
>>>>>>>
>>>>>>> Could you descrbie this problem first?
>>>>>>
>>>>>> Oh, alright, actually, I've described this problem literally, but maybe it's not
>>>>>> clear, let me give some examples as below:
>>>>>>
>>>>>> echo 0 > /sys/fs/f2fs/vdb/max_small_discards
>>>>>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>>>>>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>>>>>> sync
>>>>>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>>>>>
>>>>>> echo 100 > /sys/fs/f2fs/vdb/max_small_discards
>>>>>> rm /mnt/f2fs/file
>>>>>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>>>>>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>>>>>> sync
>>>>>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>>>>>
>>>>>> Before the patch:
>>>>>>
>>>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>>>> 0 . . . . . . . .
>>>>>>
>>>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>>>> 0 3 1 . . . . . .
>>>>>>
>>>>>> After the patch:
>>>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>>>> 0 . . . . . . . .
>>>>>>
>>>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>>>> 0 . . . . . . . .
>>>>>>
>>>>>> So, now max_small_discards can not be used to control small discard number
>>>>>> cached by checkpoint.
>>>>
>>>> Let me explain more:
>>>>
>>>> Previously, we have two mechanisms to cache & submit small discards:
>>>>
>>>> a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards, and checkpoint
>>>> will cache small discard candidates w/ configured maximum number.
>>>>
>>>> b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small discard
>>>> candidates w/ configured discard granularity, but w/o limitation of number. FSTRIM
>>>> interface is asynchronized, so it won't submit discard directly.
>>>>
>>>> Finally, discard thread will submit them in background periodically.
>>>>
>>>> So what I mean is the mechanism a) is broken, since no matter how we configure the
>>>> sysfs entry /sys/fs/f2fs/vdb/max_small_discards, checkpoint will not cache small
>>>> discard candidates any more.
>>>
>>> Ok, it seems what I encountered before was adding this small discard even
>>> after issuing it by checkpoint. Thoughts?
>>
>> Do you mean: in f2fs_clear_prefree_segments(), small discard may overlap
>> segment granularity discard?
>
> I didn't dig enough tho, don't think so. Somehow I got a loop as below which
> said the same LBA was issued and added back repeatedly, not seen this short log
> unfortunately.
>
>>
>> e.g.
>> - f2fs_clear_prefree_segments
>> - f2fs_issue_discard(0, 512) --- segment granularity discard
>> - f2fs_issue_discard(0, 1) --- small discard
>> - f2fs_issue_discard(5, 1) --- small discard
>>
>> Thanks,

[snip]

>>> f2fs_discard-25-752 [003] ..... 9744.173111: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
>>> f2fs_discard-25-752 [004] ..... 9744.175348: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1

I don't see any loop via above log, am I missing something?

The traces were printed in below call paths, and it printed the same LBAs as
expected?

- issue_discard_thread
- __issue_discard_cmd
- __submit_discard_cmd
- trace_f2fs_issue_discard
9744.173111: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
- __wait_all_discard_cmd
- __wait_discard_cmd_range
- __remove_discard_cmd
- trace_f2fs_remove_discard
9744.175348: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1

Thanks,

>>>
>>>>
>>>> So, it needs to fix max_small_discards sysfs functionality? or just drop the
>>>> functionality?
>>>>
>>>>>
>>>>> Since we do not submit small discards anymore during checkpoint. Why not relying
>>>>> on the discard thread to issue them?
>>>>
>>>> Sorry, I'm not sure I get your point, do you mean max_small_discards functionality
>>>> is obsoleted, so it recommended to use fstrim to cache & submit small discards?
>>>>
>>>> Let me know, if I'm missing something or misunderstanding the point.
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> I think there is another way to achieve "avoid long checkpoint latency caused
>>>>>>>> by committing huge # of small discards", the way is we can set max_small_discards
>>>>>>>> to small value or zero, w/ such configuration, it will take checkpoint much less
>>>>>>>> time or no time to committing small discard due to below control logic:
>>>>>>>>
>>>>>>>> f2fs_flush_sit_entries()
>>>>>>>> {
>>>>>>>> ...
>>>>>>>> if (!(cpc->reason & CP_DISCARD)) {
>>>>>>>> cpc->trim_start = segno;
>>>>>>>> add_discard_addrs(sbi, cpc, false);
>>>>>>>> }
>>>>>>>> ...
>>>>>>>> }
>>>>>>>>
>>>>>>>> add_discard_addrs()
>>>>>>>> {
>>>>>>>> ...
>>>>>>>> while (force || SM_I(sbi)->dcc_info->nr_discards <=
>>>>>>>> SM_I(sbi)->dcc_info->max_discards) {
>>>>>>>>
>>>>>>>> It will break the loop once nr_discards is larger than max_discards, if
>>>>>>>> max_discards is set to zero, checkpoint won't take time to handle small discards.
>>>>>>>>
>>>>>>>> ...
>>>>>>>> if (!de) {
>>>>>>>> de = f2fs_kmem_cache_alloc(discard_entry_slab,
>>>>>>>> GFP_F2FS_ZERO, true, NULL);
>>>>>>>> de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
>>>>>>>> list_add_tail(&de->list, head);
>>>>>>>> }
>>>>>>>> ...
>>>>>>>> }
>>>>>>>> ...
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
>>>>>>>>>>>> From: Chao Yu <[email protected]>
>>>>>>>>>>>> Date: Mon, 3 Jul 2023 09:06:53 +0800
>>>>>>>>>>>> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>>>>>>>>>>>>
>>>>>>>>>>>> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
>>>>>>>>>>>> to disable small discard by default, so that if there're huge number of
>>>>>>>>>>>> small discards, it will decrease checkpoint's latency obviously.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
>>>>>>>>>>>> commands during checkpoint"), due to it breaks small discard feature which
>>>>>>>>>>>> may be configured via sysfs entry max_small_discards.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
>>>>>>>>>>>> Signed-off-by: Chao Yu <[email protected]>
>>>>>>>>>>>> ---
>>>>>>>>>>>> fs/f2fs/segment.c | 4 ++--
>>>>>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>>>>>> index 14c822e5c9c9..0a313368f18b 100644
>>>>>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>>>>>> len = next_pos - cur_pos;
>>>>>>>>>>>>
>>>>>>>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>>>>>> - !force || len < cpc->trim_minlen)
>>>>>>>>>>>> + (force && len < cpc->trim_minlen))
>>>>>>>>>>>> goto skip;
>>>>>>>>>>>>
>>>>>>>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>>>>>>>>>>>> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
>>>>>>>>>>>> atomic_set(&dcc->queued_discard, 0);
>>>>>>>>>>>> atomic_set(&dcc->discard_cmd_cnt, 0);
>>>>>>>>>>>> dcc->nr_discards = 0;
>>>>>>>>>>>> - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
>>>>>>>>>>>> + dcc->max_discards = 0;
>>>>>>>>>>>> dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
>>>>>>>>>>>> dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
>>>>>>>>>>>> dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.40.1
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-08-07 20:41:15

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 08/07, Chao Yu wrote:
> On 2023/8/5 4:52, Jaegeuk Kim wrote:
> > On 07/25, Chao Yu wrote:
> > > On 2023/7/22 4:23, Jaegeuk Kim wrote:
> > > > On 07/13, Chao Yu wrote:
> > > > > On 2023/7/12 23:55, Jaegeuk Kim wrote:
> > > > > > On 07/12, Chao Yu wrote:
> > > > > > > On 2023/7/12 0:37, Jaegeuk Kim wrote:
> > > > > > > > On 07/06, Chao Yu wrote:
> > > > > > > > > On 2023/7/6 1:30, Jaegeuk Kim wrote:
> > > > > > > > > > On 07/04, Chao Yu wrote:
> > > > > > > > > > > On 2023/7/4 18:53, Jaegeuk Kim wrote:
> > > > > > > > > > > > On 07/03, Chao Yu wrote:
> > > > > > > > > > > > > On 2023/6/15 0:10, Jaegeuk Kim wrote:
> > > > > > > > > > > > > > If there're huge # of small discards, this will increase checkpoint latency
> > > > > > > > > > > > > > insanely. Let's issue small discards only by trim.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Jaegeuk Kim <[email protected]>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Change log from v1:
> > > > > > > > > > > > > > - move the skip logic to avoid dangling objects
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > fs/f2fs/segment.c | 2 +-
> > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > > > > > > > > > > index 8c7af8b4fc47..0457d620011f 100644
> > > > > > > > > > > > > > --- a/fs/f2fs/segment.c
> > > > > > > > > > > > > > +++ b/fs/f2fs/segment.c
> > > > > > > > > > > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > > > > > > > > > > len = next_pos - cur_pos;
> > > > > > > > > > > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > > > > > > > > > > - (force && len < cpc->trim_minlen))
> > > > > > > > > > > > > > + !force || len < cpc->trim_minlen)
> > > > > > > > > > > > > > goto skip;
> > > > > > > > > > > > >
> > > > > > > > > > > > > Sorry for late reply.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We have a configuration for such case, what do you think of setting
> > > > > > > > > > > > > max_small_discards to zero? otherwise, w/ above change, max_small_discards
> > > > > > > > > > > > > logic may be broken?
> > > > > > > > > > > > >
> > > > > > > > > > > > > What: /sys/fs/f2fs/<disk>/max_small_discards
> > > > > > > > > > > > > Date: November 2013
> > > > > > > > > > > > > Contact: "Jaegeuk Kim" <[email protected]>
> > > > > > > > > > > > > Description: Controls the issue rate of discard commands that consist of small
> > > > > > > > > > > > > blocks less than 2MB. The candidates to be discarded are cached until
> > > > > > > > > > > > > checkpoint is triggered, and issued during the checkpoint.
> > > > > > > > > > > > > By default, it is disabled with 0.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Or, if we prefer to disable small_discards by default, what about below change:
> > > > > > > > > > > >
> > > > > > > > > > > > I think small_discards is fine, but need to avoid long checkpoint latency only.
> > > > > > > > > > >
> > > > > > > > > > > I didn't get you, do you mean we can still issue small discard by
> > > > > > > > > > > fstrim, so small_discards functionality is fine?
> > > > > > > > > >
> > > > > > > > > > You got the point.
> > > > > > > > >
> > > > > > > > > Well, actually, what I mean is max_small_discards sysfs entry's functionality
> > > > > > > > > is broken. Now, the entry can not be used to control number of small discards
> > > > > > > > > committed by checkpoint.
> > > > > > > >
> > > > > > > > Could you descrbie this problem first?
> > > > > > >
> > > > > > > Oh, alright, actually, I've described this problem literally, but maybe it's not
> > > > > > > clear, let me give some examples as below:
> > > > > > >
> > > > > > > echo 0 > /sys/fs/f2fs/vdb/max_small_discards
> > > > > > > xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
> > > > > > > xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
> > > > > > > sync
> > > > > > > cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
> > > > > > >
> > > > > > > echo 100 > /sys/fs/f2fs/vdb/max_small_discards
> > > > > > > rm /mnt/f2fs/file
> > > > > > > xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
> > > > > > > xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
> > > > > > > sync
> > > > > > > cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
> > > > > > >
> > > > > > > Before the patch:
> > > > > > >
> > > > > > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > > > > > 0 . . . . . . . .
> > > > > > >
> > > > > > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > > > > > 0 3 1 . . . . . .
> > > > > > >
> > > > > > > After the patch:
> > > > > > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > > > > > 0 . . . . . . . .
> > > > > > >
> > > > > > > Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
> > > > > > > 0 . . . . . . . .
> > > > > > >
> > > > > > > So, now max_small_discards can not be used to control small discard number
> > > > > > > cached by checkpoint.
> > > > >
> > > > > Let me explain more:
> > > > >
> > > > > Previously, we have two mechanisms to cache & submit small discards:
> > > > >
> > > > > a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards, and checkpoint
> > > > > will cache small discard candidates w/ configured maximum number.
> > > > >
> > > > > b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small discard
> > > > > candidates w/ configured discard granularity, but w/o limitation of number. FSTRIM
> > > > > interface is asynchronized, so it won't submit discard directly.
> > > > >
> > > > > Finally, discard thread will submit them in background periodically.
> > > > >
> > > > > So what I mean is the mechanism a) is broken, since no matter how we configure the
> > > > > sysfs entry /sys/fs/f2fs/vdb/max_small_discards, checkpoint will not cache small
> > > > > discard candidates any more.
> > > >
> > > > Ok, it seems what I encountered before was adding this small discard even
> > > > after issuing it by checkpoint. Thoughts?
> > >
> > > Do you mean: in f2fs_clear_prefree_segments(), small discard may overlap
> > > segment granularity discard?
> >
> > I didn't dig enough tho, don't think so. Somehow I got a loop as below which
> > said the same LBA was issued and added back repeatedly, not seen this short log
> > unfortunately.
> >
> > >
> > > e.g.
> > > - f2fs_clear_prefree_segments
> > > - f2fs_issue_discard(0, 512) --- segment granularity discard
> > > - f2fs_issue_discard(0, 1) --- small discard
> > > - f2fs_issue_discard(5, 1) --- small discard
> > >
> > > Thanks,
>
> [snip]
>
> > > > f2fs_discard-25-752 [003] ..... 9744.173111: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
> > > > f2fs_discard-25-752 [004] ..... 9744.175348: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
>
> I don't see any loop via above log, am I missing something?
>
> The traces were printed in below call paths, and it printed the same LBAs as
> expected?

I said the log was cut to show the loop enoughly. I'll try to reproduce this
later locally.

>
> - issue_discard_thread
> - __issue_discard_cmd
> - __submit_discard_cmd
> - trace_f2fs_issue_discard
> 9744.173111: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
> - __wait_all_discard_cmd
> - __wait_discard_cmd_range
> - __remove_discard_cmd
> - trace_f2fs_remove_discard
> 9744.175348: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
>
> Thanks,
>
> > > >
> > > > >
> > > > > So, it needs to fix max_small_discards sysfs functionality? or just drop the
> > > > > functionality?
> > > > >
> > > > > >
> > > > > > Since we do not submit small discards anymore during checkpoint. Why not relying
> > > > > > on the discard thread to issue them?
> > > > >
> > > > > Sorry, I'm not sure I get your point, do you mean max_small_discards functionality
> > > > > is obsoleted, so it recommended to use fstrim to cache & submit small discards?
> > > > >
> > > > > Let me know, if I'm missing something or misunderstanding the point.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > I think there is another way to achieve "avoid long checkpoint latency caused
> > > > > > > > > by committing huge # of small discards", the way is we can set max_small_discards
> > > > > > > > > to small value or zero, w/ such configuration, it will take checkpoint much less
> > > > > > > > > time or no time to committing small discard due to below control logic:
> > > > > > > > >
> > > > > > > > > f2fs_flush_sit_entries()
> > > > > > > > > {
> > > > > > > > > ...
> > > > > > > > > if (!(cpc->reason & CP_DISCARD)) {
> > > > > > > > > cpc->trim_start = segno;
> > > > > > > > > add_discard_addrs(sbi, cpc, false);
> > > > > > > > > }
> > > > > > > > > ...
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > add_discard_addrs()
> > > > > > > > > {
> > > > > > > > > ...
> > > > > > > > > while (force || SM_I(sbi)->dcc_info->nr_discards <=
> > > > > > > > > SM_I(sbi)->dcc_info->max_discards) {
> > > > > > > > >
> > > > > > > > > It will break the loop once nr_discards is larger than max_discards, if
> > > > > > > > > max_discards is set to zero, checkpoint won't take time to handle small discards.
> > > > > > > > >
> > > > > > > > > ...
> > > > > > > > > if (!de) {
> > > > > > > > > de = f2fs_kmem_cache_alloc(discard_entry_slab,
> > > > > > > > > GFP_F2FS_ZERO, true, NULL);
> > > > > > > > > de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
> > > > > > > > > list_add_tail(&de->list, head);
> > > > > > > > > }
> > > > > > > > > ...
> > > > > > > > > }
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
> > > > > > > > > > > > > From: Chao Yu <[email protected]>
> > > > > > > > > > > > > Date: Mon, 3 Jul 2023 09:06:53 +0800
> > > > > > > > > > > > > Subject: [PATCH] Revert "f2fs: enable small discard by default"
> > > > > > > > > > > > >
> > > > > > > > > > > > > This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
> > > > > > > > > > > > > to disable small discard by default, so that if there're huge number of
> > > > > > > > > > > > > small discards, it will decrease checkpoint's latency obviously.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
> > > > > > > > > > > > > commands during checkpoint"), due to it breaks small discard feature which
> > > > > > > > > > > > > may be configured via sysfs entry max_small_discards.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
> > > > > > > > > > > > > Signed-off-by: Chao Yu <[email protected]>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > > fs/f2fs/segment.c | 4 ++--
> > > > > > > > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > > > > > > > > > > > > index 14c822e5c9c9..0a313368f18b 100644
> > > > > > > > > > > > > --- a/fs/f2fs/segment.c
> > > > > > > > > > > > > +++ b/fs/f2fs/segment.c
> > > > > > > > > > > > > @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> > > > > > > > > > > > > len = next_pos - cur_pos;
> > > > > > > > > > > > >
> > > > > > > > > > > > > if (f2fs_sb_has_blkzoned(sbi) ||
> > > > > > > > > > > > > - !force || len < cpc->trim_minlen)
> > > > > > > > > > > > > + (force && len < cpc->trim_minlen))
> > > > > > > > > > > > > goto skip;
> > > > > > > > > > > > >
> > > > > > > > > > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
> > > > > > > > > > > > > @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
> > > > > > > > > > > > > atomic_set(&dcc->queued_discard, 0);
> > > > > > > > > > > > > atomic_set(&dcc->discard_cmd_cnt, 0);
> > > > > > > > > > > > > dcc->nr_discards = 0;
> > > > > > > > > > > > > - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
> > > > > > > > > > > > > + dcc->max_discards = 0;
> > > > > > > > > > > > > dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
> > > > > > > > > > > > > dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
> > > > > > > > > > > > > dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.40.1
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,

2023-08-08 02:23:31

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: do not issue small discard commands during checkpoint

On 2023/8/8 3:55, Jaegeuk Kim wrote:
> On 08/07, Chao Yu wrote:
>> On 2023/8/5 4:52, Jaegeuk Kim wrote:
>>> On 07/25, Chao Yu wrote:
>>>> On 2023/7/22 4:23, Jaegeuk Kim wrote:
>>>>> On 07/13, Chao Yu wrote:
>>>>>> On 2023/7/12 23:55, Jaegeuk Kim wrote:
>>>>>>> On 07/12, Chao Yu wrote:
>>>>>>>> On 2023/7/12 0:37, Jaegeuk Kim wrote:
>>>>>>>>> On 07/06, Chao Yu wrote:
>>>>>>>>>> On 2023/7/6 1:30, Jaegeuk Kim wrote:
>>>>>>>>>>> On 07/04, Chao Yu wrote:
>>>>>>>>>>>> On 2023/7/4 18:53, Jaegeuk Kim wrote:
>>>>>>>>>>>>> On 07/03, Chao Yu wrote:
>>>>>>>>>>>>>> On 2023/6/15 0:10, Jaegeuk Kim wrote:
>>>>>>>>>>>>>>> If there're huge # of small discards, this will increase checkpoint latency
>>>>>>>>>>>>>>> insanely. Let's issue small discards only by trim.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Change log from v1:
>>>>>>>>>>>>>>> - move the skip logic to avoid dangling objects
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> fs/f2fs/segment.c | 2 +-
>>>>>>>>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>>>>>>>>> index 8c7af8b4fc47..0457d620011f 100644
>>>>>>>>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>>>>>>>>> len = next_pos - cur_pos;
>>>>>>>>>>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>>>>>>>>> - (force && len < cpc->trim_minlen))
>>>>>>>>>>>>>>> + !force || len < cpc->trim_minlen)
>>>>>>>>>>>>>>> goto skip;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry for late reply.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have a configuration for such case, what do you think of setting
>>>>>>>>>>>>>> max_small_discards to zero? otherwise, w/ above change, max_small_discards
>>>>>>>>>>>>>> logic may be broken?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What: /sys/fs/f2fs/<disk>/max_small_discards
>>>>>>>>>>>>>> Date: November 2013
>>>>>>>>>>>>>> Contact: "Jaegeuk Kim" <[email protected]>
>>>>>>>>>>>>>> Description: Controls the issue rate of discard commands that consist of small
>>>>>>>>>>>>>> blocks less than 2MB. The candidates to be discarded are cached until
>>>>>>>>>>>>>> checkpoint is triggered, and issued during the checkpoint.
>>>>>>>>>>>>>> By default, it is disabled with 0.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Or, if we prefer to disable small_discards by default, what about below change:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think small_discards is fine, but need to avoid long checkpoint latency only.
>>>>>>>>>>>>
>>>>>>>>>>>> I didn't get you, do you mean we can still issue small discard by
>>>>>>>>>>>> fstrim, so small_discards functionality is fine?
>>>>>>>>>>>
>>>>>>>>>>> You got the point.
>>>>>>>>>>
>>>>>>>>>> Well, actually, what I mean is max_small_discards sysfs entry's functionality
>>>>>>>>>> is broken. Now, the entry can not be used to control number of small discards
>>>>>>>>>> committed by checkpoint.
>>>>>>>>>
>>>>>>>>> Could you descrbie this problem first?
>>>>>>>>
>>>>>>>> Oh, alright, actually, I've described this problem literally, but maybe it's not
>>>>>>>> clear, let me give some examples as below:
>>>>>>>>
>>>>>>>> echo 0 > /sys/fs/f2fs/vdb/max_small_discards
>>>>>>>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>>>>>>>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>>>>>>>> sync
>>>>>>>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>>>>>>>
>>>>>>>> echo 100 > /sys/fs/f2fs/vdb/max_small_discards
>>>>>>>> rm /mnt/f2fs/file
>>>>>>>> xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
>>>>>>>> xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
>>>>>>>> sync
>>>>>>>> cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
>>>>>>>>
>>>>>>>> Before the patch:
>>>>>>>>
>>>>>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>>>>>> 0 . . . . . . . .
>>>>>>>>
>>>>>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>>>>>> 0 3 1 . . . . . .
>>>>>>>>
>>>>>>>> After the patch:
>>>>>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>>>>>> 0 . . . . . . . .
>>>>>>>>
>>>>>>>> Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
>>>>>>>> 0 . . . . . . . .
>>>>>>>>
>>>>>>>> So, now max_small_discards can not be used to control small discard number
>>>>>>>> cached by checkpoint.
>>>>>>
>>>>>> Let me explain more:
>>>>>>
>>>>>> Previously, we have two mechanisms to cache & submit small discards:
>>>>>>
>>>>>> a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards, and checkpoint
>>>>>> will cache small discard candidates w/ configured maximum number.
>>>>>>
>>>>>> b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small discard
>>>>>> candidates w/ configured discard granularity, but w/o limitation of number. FSTRIM
>>>>>> interface is asynchronized, so it won't submit discard directly.
>>>>>>
>>>>>> Finally, discard thread will submit them in background periodically.
>>>>>>
>>>>>> So what I mean is the mechanism a) is broken, since no matter how we configure the
>>>>>> sysfs entry /sys/fs/f2fs/vdb/max_small_discards, checkpoint will not cache small
>>>>>> discard candidates any more.
>>>>>
>>>>> Ok, it seems what I encountered before was adding this small discard even
>>>>> after issuing it by checkpoint. Thoughts?
>>>>
>>>> Do you mean: in f2fs_clear_prefree_segments(), small discard may overlap
>>>> segment granularity discard?
>>>
>>> I didn't dig enough tho, don't think so. Somehow I got a loop as below which
>>> said the same LBA was issued and added back repeatedly, not seen this short log
>>> unfortunately.
>>>
>>>>
>>>> e.g.
>>>> - f2fs_clear_prefree_segments
>>>> - f2fs_issue_discard(0, 512) --- segment granularity discard
>>>> - f2fs_issue_discard(0, 1) --- small discard
>>>> - f2fs_issue_discard(5, 1) --- small discard
>>>>
>>>> Thanks,
>>
>> [snip]
>>
>>>>> f2fs_discard-25-752 [003] ..... 9744.173111: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
>>>>> f2fs_discard-25-752 [004] ..... 9744.175348: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
>>
>> I don't see any loop via above log, am I missing something?
>>
>> The traces were printed in below call paths, and it printed the same LBAs as
>> expected?
>
> I said the log was cut to show the loop enoughly. I'll try to reproduce this

Oh, I misunderstood it...

> later locally.

Alright.

Thanks,

>
>>
>> - issue_discard_thread
>> - __issue_discard_cmd
>> - __submit_discard_cmd
>> - trace_f2fs_issue_discard
>> 9744.173111: f2fs_issue_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
>> - __wait_all_discard_cmd
>> - __wait_discard_cmd_range
>> - __remove_discard_cmd
>> - trace_f2fs_remove_discard
>> 9744.175348: f2fs_remove_discard: dev = (254,51), blkstart = 0x18c0ca, blklen = 0x1
>>
>> Thanks,
>>
>>>>>
>>>>>>
>>>>>> So, it needs to fix max_small_discards sysfs functionality? or just drop the
>>>>>> functionality?
>>>>>>
>>>>>>>
>>>>>>> Since we do not submit small discards anymore during checkpoint. Why not relying
>>>>>>> on the discard thread to issue them?
>>>>>>
>>>>>> Sorry, I'm not sure I get your point, do you mean max_small_discards functionality
>>>>>> is obsoleted, so it recommended to use fstrim to cache & submit small discards?
>>>>>>
>>>>>> Let me know, if I'm missing something or misunderstanding the point.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think there is another way to achieve "avoid long checkpoint latency caused
>>>>>>>>>> by committing huge # of small discards", the way is we can set max_small_discards
>>>>>>>>>> to small value or zero, w/ such configuration, it will take checkpoint much less
>>>>>>>>>> time or no time to committing small discard due to below control logic:
>>>>>>>>>>
>>>>>>>>>> f2fs_flush_sit_entries()
>>>>>>>>>> {
>>>>>>>>>> ...
>>>>>>>>>> if (!(cpc->reason & CP_DISCARD)) {
>>>>>>>>>> cpc->trim_start = segno;
>>>>>>>>>> add_discard_addrs(sbi, cpc, false);
>>>>>>>>>> }
>>>>>>>>>> ...
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> add_discard_addrs()
>>>>>>>>>> {
>>>>>>>>>> ...
>>>>>>>>>> while (force || SM_I(sbi)->dcc_info->nr_discards <=
>>>>>>>>>> SM_I(sbi)->dcc_info->max_discards) {
>>>>>>>>>>
>>>>>>>>>> It will break the loop once nr_discards is larger than max_discards, if
>>>>>>>>>> max_discards is set to zero, checkpoint won't take time to handle small discards.
>>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>> if (!de) {
>>>>>>>>>> de = f2fs_kmem_cache_alloc(discard_entry_slab,
>>>>>>>>>> GFP_F2FS_ZERO, true, NULL);
>>>>>>>>>> de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
>>>>>>>>>> list_add_tail(&de->list, head);
>>>>>>>>>> }
>>>>>>>>>> ...
>>>>>>>>>> }
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From eb89d9b56e817e3046d7fa17165b12416f09d456 Mon Sep 17 00:00:00 2001
>>>>>>>>>>>>>> From: Chao Yu <[email protected]>
>>>>>>>>>>>>>> Date: Mon, 3 Jul 2023 09:06:53 +0800
>>>>>>>>>>>>>> Subject: [PATCH] Revert "f2fs: enable small discard by default"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This reverts commit d618ebaf0aa83d175658aea5291e0c459d471d39 in order
>>>>>>>>>>>>>> to disable small discard by default, so that if there're huge number of
>>>>>>>>>>>>>> small discards, it will decrease checkpoint's latency obviously.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, this patch reverts 9ac00e7cef10 ("f2fs: do not issue small discard
>>>>>>>>>>>>>> commands during checkpoint"), due to it breaks small discard feature which
>>>>>>>>>>>>>> may be configured via sysfs entry max_small_discards.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
>>>>>>>>>>>>>> Signed-off-by: Chao Yu <[email protected]>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> fs/f2fs/segment.c | 4 ++--
>>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>>>>>>>> index 14c822e5c9c9..0a313368f18b 100644
>>>>>>>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>>>>>>>> @@ -2193,7 +2193,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
>>>>>>>>>>>>>> len = next_pos - cur_pos;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> if (f2fs_sb_has_blkzoned(sbi) ||
>>>>>>>>>>>>>> - !force || len < cpc->trim_minlen)
>>>>>>>>>>>>>> + (force && len < cpc->trim_minlen))
>>>>>>>>>>>>>> goto skip;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
>>>>>>>>>>>>>> @@ -2269,7 +2269,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
>>>>>>>>>>>>>> atomic_set(&dcc->queued_discard, 0);
>>>>>>>>>>>>>> atomic_set(&dcc->discard_cmd_cnt, 0);
>>>>>>>>>>>>>> dcc->nr_discards = 0;
>>>>>>>>>>>>>> - dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
>>>>>>>>>>>>>> + dcc->max_discards = 0;
>>>>>>>>>>>>>> dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
>>>>>>>>>>>>>> dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
>>>>>>>>>>>>>> dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> 2.40.1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,