From: Zhang Qiao <[email protected]>
When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
to the base level (around cfs_rq->min_vruntime), so that the entity
doesn't gain extra boost when placed backwards.
However, if the entity being placed wasn't executed for a long time, its
vruntime may get too far behind (e.g. while cfs_rq was executing a
low-weight hog), which can inverse the vruntime comparison due to s64
overflow. This results in the entity being placed with its original
vruntime way forwards, so that it will effectively never get to the cpu.
To prevent that, ignore the vruntime of the entity being placed if it
didn't execute for much longer than the characteristic sheduler time
scale.
[rkagan: formatted, adjusted commit log, comments, cutoff value]
Co-developed-by: Roman Kagan <[email protected]>
Signed-off-by: Roman Kagan <[email protected]>
---
@zhangqiao22, I took the liberty to put you as the author of the patch,
as this is essentially what you posted for discussion, with minor
tweaks. Please stamp with your s-o-b if you're ok with it.
kernel/sched/fair.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0f8736991427..d6cf131ebb0b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4656,6 +4656,7 @@ static void
place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
u64 vruntime = cfs_rq->min_vruntime;
+ u64 sleep_time;
/*
* The 'current' period is already promised to the current tasks,
@@ -4685,8 +4686,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
vruntime -= thresh;
}
- /* ensure we never gain time by being placed backwards. */
- se->vruntime = max_vruntime(se->vruntime, vruntime);
+ /*
+ * Pull vruntime of the entity being placed to the base level of
+ * cfs_rq, to prevent boosting it if placed backwards. If the entity
+ * slept for a long time, don't even try to compare its vruntime with
+ * the base as it may be too far off and the comparison may get
+ * inversed due to s64 overflow.
+ */
+ sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
+ if ((s64)sleep_time > 60 * NSEC_PER_SEC)
+ se->vruntime = vruntime;
+ else
+ se->vruntime = max_vruntime(se->vruntime, vruntime);
}
static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
--
2.34.1
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
Hi Roman,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on tip/sched/core]
[also build test WARNING on linus/master v6.2-rc5 next-20230127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Roman-Kagan/sched-fair-sanitize-vruntime-of-entity-being-placed/20230128-130846
patch link: https://lore.kernel.org/r/20230127163230.3339408-1-rkagan%40amazon.de
patch subject: [PATCH] sched/fair: sanitize vruntime of entity being placed
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230128/[email protected]/config)
compiler: m68k-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/2db31a18bcb88c280481672d7721f7e003d8df5a
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Roman-Kagan/sched-fair-sanitize-vruntime-of-entity-being-placed/20230128-130846
git checkout 2db31a18bcb88c280481672d7721f7e003d8df5a
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash kernel/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
kernel/sched/fair.c:688:5: warning: no previous prototype for 'sched_update_scaling' [-Wmissing-prototypes]
688 | int sched_update_scaling(void)
| ^~~~~~~~~~~~~~~~~~~~
kernel/sched/fair.c: In function 'place_entity':
>> kernel/sched/fair.c:4697:34: warning: integer overflow in expression of type 'long int' results in '-129542144' [-Woverflow]
4697 | if ((s64)sleep_time > 60 * NSEC_PER_SEC)
| ^
vim +4697 kernel/sched/fair.c
4654
4655 static void
4656 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
4657 {
4658 u64 vruntime = cfs_rq->min_vruntime;
4659 u64 sleep_time;
4660
4661 /*
4662 * The 'current' period is already promised to the current tasks,
4663 * however the extra weight of the new task will slow them down a
4664 * little, place the new task so that it fits in the slot that
4665 * stays open at the end.
4666 */
4667 if (initial && sched_feat(START_DEBIT))
4668 vruntime += sched_vslice(cfs_rq, se);
4669
4670 /* sleeps up to a single latency don't count. */
4671 if (!initial) {
4672 unsigned long thresh;
4673
4674 if (se_is_idle(se))
4675 thresh = sysctl_sched_min_granularity;
4676 else
4677 thresh = sysctl_sched_latency;
4678
4679 /*
4680 * Halve their sleep time's effect, to allow
4681 * for a gentler effect of sleepers:
4682 */
4683 if (sched_feat(GENTLE_FAIR_SLEEPERS))
4684 thresh >>= 1;
4685
4686 vruntime -= thresh;
4687 }
4688
4689 /*
4690 * Pull vruntime of the entity being placed to the base level of
4691 * cfs_rq, to prevent boosting it if placed backwards. If the entity
4692 * slept for a long time, don't even try to compare its vruntime with
4693 * the base as it may be too far off and the comparison may get
4694 * inversed due to s64 overflow.
4695 */
4696 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> 4697 if ((s64)sleep_time > 60 * NSEC_PER_SEC)
4698 se->vruntime = vruntime;
4699 else
4700 se->vruntime = max_vruntime(se->vruntime, vruntime);
4701 }
4702
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
Hi Roman,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on tip/sched/core]
[also build test WARNING on linus/master v6.2-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Roman-Kagan/sched-fair-sanitize-vruntime-of-entity-being-placed/20230128-130846
patch link: https://lore.kernel.org/r/20230127163230.3339408-1-rkagan%40amazon.de
patch subject: [PATCH] sched/fair: sanitize vruntime of entity being placed
config: hexagon-randconfig-r041-20230123 (https://download.01.org/0day-ci/archive/20230128/[email protected]/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 4196ca3278f78c6e19246e54ab0ecb364e37d66a)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/2db31a18bcb88c280481672d7721f7e003d8df5a
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Roman-Kagan/sched-fair-sanitize-vruntime-of-entity-being-placed/20230128-130846
git checkout 2db31a18bcb88c280481672d7721f7e003d8df5a
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/sched/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
In file included from kernel/sched/fair.c:28:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from kernel/sched/fair.c:28:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from kernel/sched/fair.c:28:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
>> kernel/sched/fair.c:4697:27: warning: overflow in expression; result is -129542144 with type 'long' [-Winteger-overflow]
if ((s64)sleep_time > 60 * NSEC_PER_SEC)
^
kernel/sched/fair.c:6078:6: warning: no previous prototype for function 'init_cfs_bandwidth' [-Wmissing-prototypes]
void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
^
kernel/sched/fair.c:6078:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
^
static
kernel/sched/fair.c:12498:6: warning: no previous prototype for function 'free_fair_sched_group' [-Wmissing-prototypes]
void free_fair_sched_group(struct task_group *tg) { }
^
kernel/sched/fair.c:12498:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void free_fair_sched_group(struct task_group *tg) { }
^
static
kernel/sched/fair.c:12500:5: warning: no previous prototype for function 'alloc_fair_sched_group' [-Wmissing-prototypes]
int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
^
kernel/sched/fair.c:12500:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
^
static
kernel/sched/fair.c:12505:6: warning: no previous prototype for function 'online_fair_sched_group' [-Wmissing-prototypes]
void online_fair_sched_group(struct task_group *tg) { }
^
kernel/sched/fair.c:12505:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void online_fair_sched_group(struct task_group *tg) { }
^
static
kernel/sched/fair.c:12507:6: warning: no previous prototype for function 'unregister_fair_sched_group' [-Wmissing-prototypes]
void unregister_fair_sched_group(struct task_group *tg) { }
^
kernel/sched/fair.c:12507:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void unregister_fair_sched_group(struct task_group *tg) { }
^
static
kernel/sched/fair.c:535:20: warning: unused function 'list_del_leaf_cfs_rq' [-Wunused-function]
static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
^
kernel/sched/fair.c:556:19: warning: unused function 'tg_is_idle' [-Wunused-function]
static inline int tg_is_idle(struct task_group *tg)
^
kernel/sched/fair.c:6051:20: warning: unused function 'cfs_bandwidth_used' [-Wunused-function]
static inline bool cfs_bandwidth_used(void)
^
kernel/sched/fair.c:6059:20: warning: unused function 'sync_throttle' [-Wunused-function]
static inline void sync_throttle(struct task_group *tg, int cpu) {}
^
kernel/sched/fair.c:6084:37: warning: unused function 'tg_cfs_bandwidth' [-Wunused-function]
static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
^
kernel/sched/fair.c:6088:20: warning: unused function 'destroy_cfs_bandwidth' [-Wunused-function]
static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
^
18 warnings generated.
vim +/long +4697 kernel/sched/fair.c
4654
4655 static void
4656 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
4657 {
4658 u64 vruntime = cfs_rq->min_vruntime;
4659 u64 sleep_time;
4660
4661 /*
4662 * The 'current' period is already promised to the current tasks,
4663 * however the extra weight of the new task will slow them down a
4664 * little, place the new task so that it fits in the slot that
4665 * stays open at the end.
4666 */
4667 if (initial && sched_feat(START_DEBIT))
4668 vruntime += sched_vslice(cfs_rq, se);
4669
4670 /* sleeps up to a single latency don't count. */
4671 if (!initial) {
4672 unsigned long thresh;
4673
4674 if (se_is_idle(se))
4675 thresh = sysctl_sched_min_granularity;
4676 else
4677 thresh = sysctl_sched_latency;
4678
4679 /*
4680 * Halve their sleep time's effect, to allow
4681 * for a gentler effect of sleepers:
4682 */
4683 if (sched_feat(GENTLE_FAIR_SLEEPERS))
4684 thresh >>= 1;
4685
4686 vruntime -= thresh;
4687 }
4688
4689 /*
4690 * Pull vruntime of the entity being placed to the base level of
4691 * cfs_rq, to prevent boosting it if placed backwards. If the entity
4692 * slept for a long time, don't even try to compare its vruntime with
4693 * the base as it may be too far off and the comparison may get
4694 * inversed due to s64 overflow.
4695 */
4696 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> 4697 if ((s64)sleep_time > 60 * NSEC_PER_SEC)
4698 se->vruntime = vruntime;
4699 else
4700 se->vruntime = max_vruntime(se->vruntime, vruntime);
4701 }
4702
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
在 2023/1/28 0:32, Roman Kagan 写道:
> From: Zhang Qiao <[email protected]>
>
> When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> to the base level (around cfs_rq->min_vruntime), so that the entity
> doesn't gain extra boost when placed backwards.
>
> However, if the entity being placed wasn't executed for a long time, its
> vruntime may get too far behind (e.g. while cfs_rq was executing a
> low-weight hog), which can inverse the vruntime comparison due to s64
> overflow. This results in the entity being placed with its original
> vruntime way forwards, so that it will effectively never get to the cpu.
>
> To prevent that, ignore the vruntime of the entity being placed if it
> didn't execute for much longer than the characteristic sheduler time
> scale.
>
Signed-off-by: Zhang Qiao <[email protected]>
> [rkagan: formatted, adjusted commit log, comments, cutoff value]
> Co-developed-by: Roman Kagan <[email protected]>
> Signed-off-by: Roman Kagan <[email protected]>
> ---
> @zhangqiao22, I took the liberty to put you as the author of the patch,
> as this is essentially what you posted for discussion, with minor
> tweaks. Please stamp with your s-o-b if you're ok with it.
>
> kernel/sched/fair.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0f8736991427..d6cf131ebb0b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4656,6 +4656,7 @@ static void
> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> {
> u64 vruntime = cfs_rq->min_vruntime;
> + u64 sleep_time;
>
> /*
> * The 'current' period is already promised to the current tasks,
> @@ -4685,8 +4686,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> vruntime -= thresh;
> }
>
> - /* ensure we never gain time by being placed backwards. */
> - se->vruntime = max_vruntime(se->vruntime, vruntime);
> + /*
> + * Pull vruntime of the entity being placed to the base level of
> + * cfs_rq, to prevent boosting it if placed backwards. If the entity
> + * slept for a long time, don't even try to compare its vruntime with
> + * the base as it may be too far off and the comparison may get
> + * inversed due to s64 overflow.
> + */
> + sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> + if ((s64)sleep_time > 60 * NSEC_PER_SEC)
In order to avoid overflowing, it'd better be "60LL * NSEC_PER_SEC"
Thanks,
Qiao.
> + se->vruntime = vruntime;
> + else
> + se->vruntime = max_vruntime(se->vruntime, vruntime);
> }
>
> static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
>
Hi Roman, Qiao,
On 2023-01-27 at 17:32:30 +0100, Roman Kagan wrote:
> From: Zhang Qiao <[email protected]>
>
> When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> to the base level (around cfs_rq->min_vruntime), so that the entity
> doesn't gain extra boost when placed backwards.
>
> However, if the entity being placed wasn't executed for a long time, its
> vruntime may get too far behind (e.g. while cfs_rq was executing a
> low-weight hog), which can inverse the vruntime comparison due to s64
> overflow. This results in the entity being placed with its original
> vruntime way forwards, so that it will effectively never get to the cpu.
>
Looks interesting,
case 1:
se->vruntime = 1, cfs_rq->min_vruntime = ULONG_MAX
==> max = 1
case 2:
se->vruntime = 1, cfs_rq->min_vruntime = LONG_MAX
==> max = LONG_MAX
May I know if the issue you described above is in case 1? We want
the max to be ULONG_MAX but it returns 1 because of s64
comparison? Then max = 1 is incorrectly used as se's vruntime?
Could you please elaborate a little more about this issue?
> To prevent that, ignore the vruntime of the entity being placed if it
> didn't execute for much longer than the characteristic sheduler time
> scale.
>
> [rkagan: formatted, adjusted commit log, comments, cutoff value]
> Co-developed-by: Roman Kagan <[email protected]>
> Signed-off-by: Roman Kagan <[email protected]>
> ---
> @zhangqiao22, I took the liberty to put you as the author of the patch,
> as this is essentially what you posted for discussion, with minor
> tweaks. Please stamp with your s-o-b if you're ok with it.
>
> kernel/sched/fair.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0f8736991427..d6cf131ebb0b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4656,6 +4656,7 @@ static void
> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> {
> u64 vruntime = cfs_rq->min_vruntime;
> + u64 sleep_time;
>
> /*
> * The 'current' period is already promised to the current tasks,
> @@ -4685,8 +4686,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> vruntime -= thresh;
> }
>
> - /* ensure we never gain time by being placed backwards. */
> - se->vruntime = max_vruntime(se->vruntime, vruntime);
> + /*
> + * Pull vruntime of the entity being placed to the base level of
> + * cfs_rq, to prevent boosting it if placed backwards. If the entity
> + * slept for a long time, don't even try to compare its vruntime with
> + * the base as it may be too far off and the comparison may get
> + * inversed due to s64 overflow.
> + */
> + sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
If I understand correctly, se->exec_start is just updated by enqueue_entity()->update_curr(cfs_rq),
then place_entity() in invoked here, I'm not sure if sleep_time above
could reflect the real sleep time. Maybe something like:
rq_clock_task(rq_of(cfs_rq)) - se->time_stamp_dequeued ?
thanks,
Chenyu
Hi, Chenyu,
在 2023/1/29 1:27, Chen Yu 写道:
> Hi Roman, Qiao,
> On 2023-01-27 at 17:32:30 +0100, Roman Kagan wrote:
>> From: Zhang Qiao <[email protected]>
>>
>> When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
>> to the base level (around cfs_rq->min_vruntime), so that the entity
>> doesn't gain extra boost when placed backwards.
>>
>> However, if the entity being placed wasn't executed for a long time, its
>> vruntime may get too far behind (e.g. while cfs_rq was executing a
>> low-weight hog), which can inverse the vruntime comparison due to s64
>> overflow. This results in the entity being placed with its original
>> vruntime way forwards, so that it will effectively never get to the cpu.
>>
> Looks interesting,
> case 1:
> se->vruntime = 1, cfs_rq->min_vruntime = ULONG_MAX
> ==> max = 1
> case 2:
> se->vruntime = 1, cfs_rq->min_vruntime = LONG_MAX
> ==> max = LONG_MAX
>
> May I know if the issue you described above is in case 1? We want
> the max to be ULONG_MAX but it returns 1 because of s64
> comparison? Then max = 1 is incorrectly used as se's vruntime?
> Could you please elaborate a little more about this issue?
Yes, the issue is in case 1.
For more detailed discussion, can see https://lkml.org/lkml/2022/12/21/435.
>> To prevent that, ignore the vruntime of the entity being placed if it
>> didn't execute for much longer than the characteristic sheduler time
>> scale.
>>
>> [rkagan: formatted, adjusted commit log, comments, cutoff value]
>> Co-developed-by: Roman Kagan <[email protected]>
>> Signed-off-by: Roman Kagan <[email protected]>
>> ---
>> @zhangqiao22, I took the liberty to put you as the author of the patch,
>> as this is essentially what you posted for discussion, with minor
>> tweaks. Please stamp with your s-o-b if you're ok with it.
>>
>> kernel/sched/fair.c | 15 +++++++++++++--
>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 0f8736991427..d6cf131ebb0b 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4656,6 +4656,7 @@ static void
>> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>> {
>> u64 vruntime = cfs_rq->min_vruntime;
>> + u64 sleep_time;
>>
>> /*
>> * The 'current' period is already promised to the current tasks,
>> @@ -4685,8 +4686,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>> vruntime -= thresh;
>> }
>>
>> - /* ensure we never gain time by being placed backwards. */
>> - se->vruntime = max_vruntime(se->vruntime, vruntime);
>> + /*
>> + * Pull vruntime of the entity being placed to the base level of
>> + * cfs_rq, to prevent boosting it if placed backwards. If the entity
>> + * slept for a long time, don't even try to compare its vruntime with
>> + * the base as it may be too far off and the comparison may get
>> + * inversed due to s64 overflow.
>> + */
>> + sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> If I understand correctly, se->exec_start is just updated by enqueue_entity()->update_curr(cfs_rq),
When a task go to sleep, se->exec_start will update at dequeue_entity()->update_curr(cfs_rq).
And enqueue_entity()->update_curr(cfs_rq) just update current se.
Thank,
Qiao.
> then place_entity() in invoked here, I'm not sure if sleep_time above
> could reflect the real sleep time. Maybe something like:
> rq_clock_task(rq_of(cfs_rq)) - se->time_stamp_dequeued ?
>
> thanks,
> Chenyu
> .
>
On 2023-01-29 at 09:27:07 +0800, Zhang Qiao wrote:
> Hi, Chenyu,
>
> On 2023/1/29 1:27, Chen Yu wrote:
> > Hi Roman, Qiao,
> > On 2023-01-27 at 17:32:30 +0100, Roman Kagan wrote:
> >> From: Zhang Qiao <[email protected]>
[snip]
> > If I understand correctly, se->exec_start is just updated by enqueue_entity()->update_curr(cfs_rq),
>
> When a task go to sleep, se->exec_start will update at dequeue_entity()->update_curr(cfs_rq).
> And enqueue_entity()->update_curr(cfs_rq) just update current se.
>
You are right. update_curr(cfs_rq) only updates the current running one and the se
has not been queued yet.
thanks,
Chenyu