Vlastimil Babka figured out that when the fragmentation score does not
go down across a proactive compaction run, i.e. when no progress is
made, the next proactive compaction wakeup is deferred for
1 << COMPACT_MAX_DEFER_SHIFT, i.e. 64 wakeups, each at an interval of
HPAGE_FRAG_CHECK_INTERVAL_MSEC (=500 ms). On each of these wakeups,
kcompactd merely decrements the 'proactive_defer' counter and goes back
to sleep, i.e. it is woken up just to decrement a counter. The same
deferral time can be achieved by simply sleeping for
HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids
the unnecessary wakeups of the kcompactd thread and also removes the
need for the 'proactive_defer' counter.
Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
Signed-off-by: Charan Teja Reddy <[email protected]>
---
Changes in V1:
o Removed the 'proactive_defer' thread counter by optimizing proactive
  compaction deferrals.
o This is a resend; it was earlier combined with other changes posted
  at https://lore.kernel.org/patchwork/patch/1448789/
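
As an aside, the deferral arithmetic itself is unchanged. The sketch
below is a standalone userspace illustration, not kernel code; it
assumes COMPACT_MAX_DEFER_SHIFT is 6 (consistent with the 64 wakeups
noted in the changelog) and uses the HPAGE_FRAG_CHECK_INTERVAL_MSEC
value of 500 stated above:

/*
 * Standalone illustration only; the constant values here are
 * assumptions mirroring what the changelog states (500 ms interval,
 * shift of 6), not definitions taken from kernel headers.
 */
#include <stdio.h>

#define COMPACT_MAX_DEFER_SHIFT		6
#define HPAGE_FRAG_CHECK_INTERVAL_MSEC	500

int main(void)
{
	/* Old scheme: 1 << 6 = 64 wakeups that only decrement a counter. */
	unsigned int wakeups = 1 << COMPACT_MAX_DEFER_SHIFT;
	unsigned int old_total_ms = wakeups * HPAGE_FRAG_CHECK_INTERVAL_MSEC;

	/* New scheme: a single sleep with the interval shifted left. */
	unsigned int new_timeout_ms =
		HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT;

	/* Both defer the next proactive attempt by 32000 ms (32 s). */
	printf("old: %u wakeups x %u ms = %u ms\n",
	       wakeups, HPAGE_FRAG_CHECK_INTERVAL_MSEC, old_total_ms);
	printf("new: single timeout of %u ms\n", new_timeout_ms);

	return 0;
}
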
mm/compaction.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 621508e..db00dbf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2885,7 +2885,8 @@ static int kcompactd(void *p)
{
pg_data_t *pgdat = (pg_data_t *)p;
struct task_struct *tsk = current;
- unsigned int proactive_defer = 0;
+ long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
+ long timeout = default_timeout;
const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
@@ -2902,23 +2903,30 @@ static int kcompactd(void *p)
trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
- kcompactd_work_requested(pgdat),
- msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
+ kcompactd_work_requested(pgdat), timeout)) {
psi_memstall_enter(&pflags);
kcompactd_do_work(pgdat);
psi_memstall_leave(&pflags);
+ /*
+ * Reset the timeout value. The defer timeout by
+ * proactive compaction can effectively lost
+ * here but that is fine as the condition of the
+ * zone changed substantionally and carrying on
+ * with the previous defer is not useful.
+ */
+ timeout = default_timeout;
continue;
}
- /* kcompactd wait timeout */
+ /*
+ * Start the proactive work with default timeout. Based
+ * on the fragmentation score, this timeout is updated.
+ */
+ timeout = default_timeout;
if (should_proactive_compact_node(pgdat)) {
unsigned int prev_score, score;
- if (proactive_defer) {
- proactive_defer--;
- continue;
- }
prev_score = fragmentation_score_node(pgdat);
proactive_compact_node(pgdat);
score = fragmentation_score_node(pgdat);
@@ -2926,8 +2934,9 @@ static int kcompactd(void *p)
* Defer proactive compaction if the fragmentation
* score did not go down i.e. no progress made.
*/
- proactive_defer = score < prev_score ?
- 0 : 1 << COMPACT_MAX_DEFER_SHIFT;
+ if (unlikely(score >= prev_score))
+ timeout =
+ default_timeout << COMPACT_MAX_DEFER_SHIFT;
}
}
--
On Wed, 21 Jul 2021 17:43:19 +0530 Charan Teja Reddy <[email protected]> wrote:
> Vlastimil Babka figured out that when the fragmentation score does not
> go down across a proactive compaction run, i.e. when no progress is
> made, the next proactive compaction wakeup is deferred for
> 1 << COMPACT_MAX_DEFER_SHIFT, i.e. 64 wakeups, each at an interval of
> HPAGE_FRAG_CHECK_INTERVAL_MSEC (=500 ms). On each of these wakeups,
> kcompactd merely decrements the 'proactive_defer' counter and goes back
> to sleep, i.e. it is woken up just to decrement a counter. The same
> deferral time can be achieved by simply sleeping for
> HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids
> the unnecessary wakeups of the kcompactd thread and also removes the
> need for the 'proactive_defer' counter.
>
> @@ -2902,23 +2903,30 @@ static int kcompactd(void *p)
>
> trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
> if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
> - kcompactd_work_requested(pgdat),
> - msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
> + kcompactd_work_requested(pgdat), timeout)) {
>
> psi_memstall_enter(&pflags);
> kcompactd_do_work(pgdat);
> psi_memstall_leave(&pflags);
> + /*
> + * Reset the timeout value. The defer timeout by
> + * proactive compaction can effectively lost
> + * here but that is fine as the condition of the
> + * zone changed substantionally and carrying on
> + * with the previous defer is not useful.
> + */
> + timeout = default_timeout;
> continue;
I find this comment hard to follow. Is this better?
--- a/mm/compaction.c~mm-compaction-optimize-proactive-compaction-deferrals-fix
+++ a/mm/compaction.c
@@ -2909,11 +2909,11 @@ static int kcompactd(void *p)
kcompactd_do_work(pgdat);
psi_memstall_leave(&pflags);
/*
- * Reset the timeout value. The defer timeout by
- * proactive compaction can effectively lost
- * here but that is fine as the condition of the
- * zone changed substantionally and carrying on
- * with the previous defer is not useful.
+ * Reset the timeout value. The defer timeout from
+ * proactive compaction is lost here but that is fine
+ * as the condition of the zone changing substantionally
+ * then carrying on with the previous defer interval is
+ * not useful.
*/
timeout = default_timeout;
continue;
_
On 7/21/21 10:18 PM, Andrew Morton wrote:
> On Wed, 21 Jul 2021 17:43:19 +0530 Charan Teja Reddy <[email protected]> wrote:
>
>> Vlastimil Babka figured out that when the fragmentation score does not
>> go down across a proactive compaction run, i.e. when no progress is
>> made, the next proactive compaction wakeup is deferred for
>> 1 << COMPACT_MAX_DEFER_SHIFT, i.e. 64 wakeups, each at an interval of
>> HPAGE_FRAG_CHECK_INTERVAL_MSEC (=500 ms). On each of these wakeups,
>> kcompactd merely decrements the 'proactive_defer' counter and goes back
>> to sleep, i.e. it is woken up just to decrement a counter. The same
>> deferral time can be achieved by simply sleeping for
>> HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids
>> the unnecessary wakeups of the kcompactd thread and also removes the
>> need for the 'proactive_defer' counter.
Acked-by: Vlastimil Babka <[email protected]>
>>
>> @@ -2902,23 +2903,30 @@ static int kcompactd(void *p)
>>
>> trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
>> if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
>> - kcompactd_work_requested(pgdat),
>> - msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
>> + kcompactd_work_requested(pgdat), timeout)) {
>>
>> psi_memstall_enter(&pflags);
>> kcompactd_do_work(pgdat);
>> psi_memstall_leave(&pflags);
>> + /*
>> + * Reset the timeout value. The defer timeout by
>> + * proactive compaction can effectively lost
>> + * here but that is fine as the condition of the
>> + * zone changed substantionally and carrying on
>> + * with the previous defer is not useful.
>> + */
>> + timeout = default_timeout;
>> continue;
>
> I find this comment hard to follow. Is this better?
Yes, thanks.
> --- a/mm/compaction.c~mm-compaction-optimize-proactive-compaction-deferrals-fix
> +++ a/mm/compaction.c
> @@ -2909,11 +2909,11 @@ static int kcompactd(void *p)
> kcompactd_do_work(pgdat);
> psi_memstall_leave(&pflags);
> /*
> - * Reset the timeout value. The defer timeout by
> - * proactive compaction can effectively lost
> - * here but that is fine as the condition of the
> - * zone changed substantionally and carrying on
> - * with the previous defer is not useful.
> + * Reset the timeout value. The defer timeout from
> + * proactive compaction is lost here but that is fine
> + * as the condition of the zone changing substantionally
> + * then carrying on with the previous defer interval is
> + * not useful.
> */
> timeout = default_timeout;
> continue;
> _
>
On 7/21/21 6:13 AM, Charan Teja Reddy wrote:
> Vlastimil Babka figured out that when the fragmentation score does not
> go down across a proactive compaction run, i.e. when no progress is
> made, the next proactive compaction wakeup is deferred for
> 1 << COMPACT_MAX_DEFER_SHIFT, i.e. 64 wakeups, each at an interval of
> HPAGE_FRAG_CHECK_INTERVAL_MSEC (=500 ms). On each of these wakeups,
> kcompactd merely decrements the 'proactive_defer' counter and goes back
> to sleep, i.e. it is woken up just to decrement a counter. The same
> deferral time can be achieved by simply sleeping for
> HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids
> the unnecessary wakeups of the kcompactd thread and also removes the
> need for the 'proactive_defer' counter.
>
> Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
> Signed-off-by: Charan Teja Reddy <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
> ---
> Changes in V1:
> o Removed the 'proactive_defer' thread counter by optimizing proactive
>   compaction deferrals.
> o This is a resend; it was earlier combined with other changes posted
>   at https://lore.kernel.org/patchwork/patch/1448789/
>
> mm/compaction.c | 29 +++++++++++++++++++----------
> 1 file changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 621508e..db00dbf 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2885,7 +2885,8 @@ static int kcompactd(void *p)
> {
> pg_data_t *pgdat = (pg_data_t *)p;
> struct task_struct *tsk = current;
> - unsigned int proactive_defer = 0;
> + long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
> + long timeout = default_timeout;
>
> const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
>
> @@ -2902,23 +2903,30 @@ static int kcompactd(void *p)
>
> trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
> if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
> - kcompactd_work_requested(pgdat),
> - msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
> + kcompactd_work_requested(pgdat), timeout)) {
>
> psi_memstall_enter(&pflags);
> kcompactd_do_work(pgdat);
> psi_memstall_leave(&pflags);
> + /*
> + * Reset the timeout value. The defer timeout by
> + * proactive compaction can effectively lost
> + * here but that is fine as the condition of the
> + * zone changed substantionally and carrying on
> + * with the previous defer is not useful.
> + */
> + timeout = default_timeout;
> continue;
> }
>
> - /* kcompactd wait timeout */
> + /*
> + * Start the proactive work with default timeout. Based
> + * on the fragmentation score, this timeout is updated.
> + */
> + timeout = default_timeout;
> if (should_proactive_compact_node(pgdat)) {
> unsigned int prev_score, score;
>
> - if (proactive_defer) {
> - proactive_defer--;
> - continue;
> - }
> prev_score = fragmentation_score_node(pgdat);
> proactive_compact_node(pgdat);
> score = fragmentation_score_node(pgdat);
> @@ -2926,8 +2934,9 @@ static int kcompactd(void *p)
> * Defer proactive compaction if the fragmentation
> * score did not go down i.e. no progress made.
> */
> - proactive_defer = score < prev_score ?
> - 0 : 1 << COMPACT_MAX_DEFER_SHIFT;
> + if (unlikely(score >= prev_score))
> + timeout =
> + default_timeout << COMPACT_MAX_DEFER_SHIFT;
> }
> }
>
>
On Wed, 21 Jul 2021, Charan Teja Reddy wrote:
> Vlastimil Babka figured out that when the fragmentation score does not
> go down across a proactive compaction run, i.e. when no progress is
> made, the next proactive compaction wakeup is deferred for
> 1 << COMPACT_MAX_DEFER_SHIFT, i.e. 64 wakeups, each at an interval of
> HPAGE_FRAG_CHECK_INTERVAL_MSEC (=500 ms). On each of these wakeups,
> kcompactd merely decrements the 'proactive_defer' counter and goes back
> to sleep, i.e. it is woken up just to decrement a counter. The same
> deferral time can be achieved by simply sleeping for
> HPAGE_FRAG_CHECK_INTERVAL_MSEC << COMPACT_MAX_DEFER_SHIFT, which avoids
> the unnecessary wakeups of the kcompactd thread and also removes the
> need for the 'proactive_defer' counter.
>
> Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
> Signed-off-by: Charan Teja Reddy <[email protected]>
With Andrew's comment fixup:
Acked-by: David Rientjes <[email protected]>
Thanks, Charan.