2021-11-26 14:52:25

by SeongJae Park

Subject: [PATCH v3 0/2] mm/damon: Fix fake /proc/loadavg reports

This patchset fixes DAMON's fake load report issue. The first patch
adds yet another variant of usleep_range() for this fix, and the second
patch fixes the DAMON issue by making it use the newly introduced
function.

I think these need to be applied to v5.15.y, but the second patch cannot
be cleanly applied there as is. I will back-port this to v5.15.y and post
it once this is merged into mainline. If you think this is not
appropriate for the stable tree, please let me know.

Changelog
---------

From v2
(https://lore.kernel.org/linux-mm/[email protected]/)
- Cc Oleksandr (Oleksandr Natalenko)
- Add 'Suggested-by:' for Andrew Morton on the first patch

From v1
(https://lore.kernel.org/linux-mm/[email protected]/)
- Avoid copy-and-pasting usleep_delay() in DAMON code (Andrew Morton)

SeongJae Park (2):
timers: Implement usleep_idle_range()
mm/damon/core: Fix fake load reports due to uninterruptible sleeps

include/linux/delay.h | 14 +++++++++++++-
kernel/time/timer.c | 16 +++++++++-------
mm/damon/core.c | 6 +++---
3 files changed, 25 insertions(+), 11 deletions(-)

--
2.17.1



2021-11-26 14:52:29

by SeongJae Park

Subject: [PATCH v3 2/2] mm/damon/core: Fix fake load reports due to uninterruptible sleeps

Because DAMON sleeps in uninterruptible mode, /proc/loadavg reports fake
load while DAMON is turned on, even though it is doing nothing. This can
confuse users[1]. To avoid this, this commit makes DAMON sleep in idle
mode.

[1] https://lore.kernel.org/all/[email protected]/

Fixes: 2224d8485492 ("mm: introduce Data Access MONitor (DAMON)")
Reported-by: Oleksandr Natalenko <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> # 5.15.x
---
mm/damon/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/damon/core.c b/mm/damon/core.c
index daacd9536c7c..8cd8fddc931e 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -979,9 +979,9 @@ static unsigned long damos_wmark_wait_us(struct damos *scheme)
 static void kdamond_usleep(unsigned long usecs)
 {
 	if (usecs > 100 * 1000)
-		schedule_timeout_interruptible(usecs_to_jiffies(usecs));
+		schedule_timeout_idle(usecs_to_jiffies(usecs));
 	else
-		usleep_range(usecs, usecs + 1);
+		usleep_idle_range(usecs, usecs + 1);
 }

/* Returns negative error code if it's not activated but should return */
@@ -1036,7 +1036,7 @@ static int kdamond_fn(void *data)
 				ctx->callback.after_sampling(ctx))
 			done = true;

-		usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
+		kdamond_usleep(ctx->sample_interval);

 		if (ctx->primitive.check_accesses)
 			max_nr_accesses = ctx->primitive.check_accesses(ctx);
--
2.17.1


2021-11-26 15:06:32

by SeongJae Park

Subject: [PATCH v3 1/2] timers: Implement usleep_idle_range()

Some kernel threads, such as DAMON's, may need to sleep repeatedly at
microsecond granularity. Because usleep_range() sleeps in uninterruptible
state, however, such threads make /proc/loadavg report fake load.

To help in such cases, this commit implements a variant of usleep_range()
called usleep_idle_range(). It is the same as usleep_range() but sets the
state of the current task to TASK_IDLE while sleeping.

Suggested-by: Andrew Morton <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
---
include/linux/delay.h | 14 +++++++++++++-
kernel/time/timer.c | 16 +++++++++-------
2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/include/linux/delay.h b/include/linux/delay.h
index 8eacf67eb212..039e7e0c7378 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -20,6 +20,7 @@
*/

#include <linux/math.h>
+#include <linux/sched.h>

extern unsigned long loops_per_jiffy;

@@ -58,7 +59,18 @@ void calibrate_delay(void);
void __attribute__((weak)) calibration_delay_done(void);
void msleep(unsigned int msecs);
unsigned long msleep_interruptible(unsigned int msecs);
-void usleep_range(unsigned long min, unsigned long max);
+void usleep_range_state(unsigned long min, unsigned long max,
+			unsigned int state);
+
+static inline void usleep_range(unsigned long min, unsigned long max)
+{
+	usleep_range_state(min, max, TASK_UNINTERRUPTIBLE);
+}
+
+static inline void usleep_idle_range(unsigned long min, unsigned long max)
+{
+	usleep_range_state(min, max, TASK_IDLE);
+}

static inline void ssleep(unsigned int seconds)
{
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index e3d2c23c413d..85f1021ad459 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -2054,26 +2054,28 @@ unsigned long msleep_interruptible(unsigned int msecs)
EXPORT_SYMBOL(msleep_interruptible);

/**
- * usleep_range - Sleep for an approximate time
- * @min: Minimum time in usecs to sleep
- * @max: Maximum time in usecs to sleep
+ * usleep_range_state - Sleep for an approximate time in a given state
+ * @min: Minimum time in usecs to sleep
+ * @max: Maximum time in usecs to sleep
+ * @state: State the current task will be in while sleeping
*
* In non-atomic context where the exact wakeup time is flexible, use
- * usleep_range() instead of udelay(). The sleep improves responsiveness
+ * usleep_range_state() instead of udelay(). The sleep improves responsiveness
* by avoiding the CPU-hogging busy-wait of udelay(), and the range reduces
* power usage by allowing hrtimers to take advantage of an already-
* scheduled interrupt instead of scheduling a new one just for this sleep.
*/
-void __sched usleep_range(unsigned long min, unsigned long max)
+void __sched usleep_range_state(unsigned long min, unsigned long max,
+				unsigned int state)
 {
 	ktime_t exp = ktime_add_us(ktime_get(), min);
 	u64 delta = (u64)(max - min) * NSEC_PER_USEC;

 	for (;;) {
-		__set_current_state(TASK_UNINTERRUPTIBLE);
+		__set_current_state(state);
 		/* Do not return before the requested sleep time has elapsed */
 		if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
 			break;
 	}
 }
-EXPORT_SYMBOL(usleep_range);
+EXPORT_SYMBOL(usleep_range_state);
--
2.17.1


2021-11-26 21:28:12

by Thomas Gleixner

Subject: Re: [PATCH v3 1/2] timers: Implement usleep_idle_range()

On Fri, Nov 26 2021 at 14:50, SeongJae Park wrote:
> Some kernel threads, such as DAMON's, may need to sleep repeatedly at
> microsecond granularity. Because usleep_range() sleeps in uninterruptible
> state, however, such threads make /proc/loadavg report fake load.
>
> To help in such cases, this commit implements a variant of usleep_range()
> called usleep_idle_range(). It is the same as usleep_range() but sets the
> state of the current task to TASK_IDLE while sleeping.
>
> Suggested-by: Andrew Morton <[email protected]>
> Signed-off-by: SeongJae Park <[email protected]>

Andrew, I assume you want to pick that up along with the mm fix, right?

Reviewed-by: Thomas Gleixner <[email protected]>

2021-11-27 13:11:48

by Oleksandr Natalenko

Subject: Re: [PATCH v3 0/2] mm/damon: Fix fake /proc/loadavg reports

Hello.

On Friday, 26 November 2021 15:50:13 CET SeongJae Park wrote:
> This patchset fixes DAMON's fake load report issue. The first patch
> adds yet another variant of usleep_range() for this fix, and the second
> patch fixes the DAMON issue by making it use the newly introduced
> function.
>
> I think these need to be applied to v5.15.y, but the second patch cannot
> be cleanly applied there as is. I will back-port this to v5.15.y and post
> it once this is merged into mainline. If you think this is not
> appropriate for the stable tree, please let me know.
>
> Changelog
> ---------
>
> From v2
> (https://lore.kernel.org/linux-mm/[email protected]/)
> - Cc Oleksandr (Oleksandr Natalenko)
> - Add 'Suggested-by:' for Andrew Morton on the first patch
>
> From v1
> (https://lore.kernel.org/linux-mm/[email protected]/)
> - Avoid copy-and-pasting usleep_delay() in DAMON code (Andrew Morton)
>
> SeongJae Park (2):
> timers: Implement usleep_idle_range()
> mm/damon/core: Fix fake load reports due to uninterruptible sleeps
>
> include/linux/delay.h | 14 +++++++++++++-
> kernel/time/timer.c | 16 +++++++++-------
> mm/damon/core.c | 6 +++---
> 3 files changed, 25 insertions(+), 11 deletions(-)

For the series:

Tested-by: Oleksandr Natalenko <[email protected]>

Thank you.

--
Oleksandr Natalenko (post-factum)