From: Vincent Guittot
Date: Thu, 27 May 2021 11:35:40 +0200
Subject: Re: [PATCH 1/3] sched/fair: Add tg_load_contrib cfs_rq decay checking
To: Odin Ugedal
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
    "open list:CONTROL GROUP (CGROUP)", linux-kernel
References: <20210518125202.78658-1-odin@uged.al> <20210518125202.78658-2-odin@uged.al>

On Thu, 27 May 2021 at 09:51, Odin Ugedal wrote:
>
> Hi,
>
> Did some quick testing with bpftrace, and it looks like it is
> "update_tg_cfs_load" that does this;
>
> kretfunc:attach_entity_load_avg: cfs_rq: 0xffff8a3e6773e400,
> se.load_sum: 0, se.load_avg: 0, se->load.weight: 1048576 diff.sum: 0,
> diff.load: 0, new_sum: 0, new_load: 0, period_contrib: 821
>
> // quite some time passes
>
> kretfunc:update_tg_cfs_load: cfs_rq: 0xffff8a3e6773e400, old load_sum
> 0, old load_avg: 0, new load_sum: 47876096, new load_avg: 1022
> tg_load_avg_contrib: 0, period_contrib: 82, on_list: 0, throttled: 0/0
> stack:
>     bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
>     bpf_get_stackid_raw_tp+115
>     bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
>     bpf_trampoline_6442466486_1+144
>     update_tg_cfs_load+5
>     update_load_avg+450
>     enqueue_entity+91
>     enqueue_task_fair+134
>     move_queued_task+172
>     affine_move_task+1282
>     __set_cpus_allowed_ptr+317
>     update_tasks_cpumask+70
>     update_cpumasks_hier+448
>     update_cpumask+345
>     cpuset_write_resmask+796
>     cgroup_file_write+162
>     kernfs_fop_write_iter+284
>     new_sync_write+345
>     vfs_write+511
>     ksys_write+103
>     do_syscall_64+51
>     entry_SYSCALL_64_after_hwframe+68
>
> // There might be more stuff happening here
> // __update_load_avg_cfs_rq runs way too often to be able to
> // trace it without filtering, and didn't look into that.
>
> kretfunc:update_tg_cfs_load: cfs_rq: 0xffff8a3e6773e400, old load_sum
> 48117265, old load_avg: 1025, new load_sum: 0, new load_avg: 2
> tg_load_avg_contrib: 1022, period_contrib: 223, on_list: 1, throttled: 1/1
> stack:
>     bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
>     bpf_get_stackid_raw_tp+115
>     bpf_prog_a37cf01922e02958_update_tg_cfs_l+504
>     bpf_trampoline_6442466486_1+144
>     update_tg_cfs_load+5
>     update_load_avg+450
>     update_blocked_averages+723
>     newidle_balance+533
>     pick_next_task_fair+57
>     __schedule+376
>     schedule+91
>     schedule_hrtimeout_range_clock+164
>     do_sys_poll+1043
>     __x64_sys_poll+179
>     do_syscall_64+51
>     entry_SYSCALL_64_after_hwframe+68
>
> This particular example resulted in a 75/25 share of cpu time for the
> two groups.
>
> From this, it also looks like it doesn't matter if new load_avg is 0 or not
> (when new load_sum is 0), since tg_load_avg_contrib will not be properly
> removed either way (without this patch). There might be some other precision
>
> > I have added a trace in cfs_rq_is_decayed() but I'm not able to
> > reproduce a situation where load_sum == 0 but not load_avg even with
> > the script in the cover letter
>
> Hmm, strange. I am able to reproduce on both v5.12 and v5.13. I am running
> on a non-SMT 4-core CPU (4670k), with CONFIG_HZ=250. Since this is timing
> sensitive, you might need to tweak the timings for either the CFS bw
> period/quota, or the time waiting between changing what cpu it runs on.
> If you have more than 4 cpus, you can also try to start on CPU 0 and move
> it through all cores and then onto CPU n.
> Maybe that makes it possible to reproduce.

I finally got it this morning with your script, and I confirm that the
problem of load_sum == 0 but load_avg != 0 comes from update_tg_cfs_load().

Then it seems that we don't call update_tg_load_avg() for this cfs_rq in
__update_blocked_fair() because of a recent update while propagating the
child's load changes. At the end we remove the cfs_rq from the list without
updating its contribution.

I'm going to prepare a patch to fix this.

> Since bpftrace doesn't work properly in v5.13, this particular test was
> on v5.12 with cherry-pick of 0258bdfaff5bd13c4d2383150b7097aecd6b6d82 and
> the other patch in this patchset.
>
> testscript.bt:
>
> kfunc:attach_entity_load_avg {@a_sum[tid] =
> args->cfs_rq->avg.load_sum; @a_load[tid] =
> args->cfs_rq->avg.load_avg;}
> kretfunc:attach_entity_load_avg {printf("%s: cfs_rq: %p, se.load_sum:
> %llu, se.load_avg: %llu, se->load.weight: %llu diff.sum: %llu,
> diff.load: %llu, new_sum: %llu, new_load: %llu, period_contrib:
> %llu\n", probe, args->cfs_rq, args->se->avg.load_sum,
> args->se->avg.load_avg,
> args->se->load.weight, args->cfs_rq->avg.load_sum - @a_sum[tid],
> args->cfs_rq->avg.load_avg - @a_load[tid], args->cfs_rq->avg.load_sum,
> args->cfs_rq->avg.load_avg, args->cfs_rq->avg.period_contrib)}
>
> kfunc:update_tg_cfs_load {@beg_load_avg[tid] =
> args->cfs_rq->avg.load_avg; @beg_load_sum[tid] =
> args->cfs_rq->avg.load_sum}
> kretfunc:update_tg_cfs_load {printf("%s: cfs_rq: %p, old load_sum
> %llu, old load_avg: %llu, new load_sum: %llu, new load_avg: %llu
> tg_load_avg_contrib: %llu, period_contrib: %llu, on_list: %d,
> throttled: %d/%d stack: %s\n", probe, args->cfs_rq, @beg_load_sum[tid],
> @beg_load_avg[tid], args->cfs_rq->avg.load_sum,
> args->cfs_rq->avg.load_avg, args->cfs_rq->tg_load_avg_contrib,
> args->cfs_rq->avg.period_contrib, args->cfs_rq->on_list,
> args->cfs_rq->throttled, args->cfs_rq->throttle_count, kstack)}
>
> Thanks
> Odin
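
For readers following along: below is a rough standalone illustration (plain
userspace C, not kernel code) of how adjusting load_sum and load_avg with
independently computed deltas, as update_tg_cfs_load()-style code does, can
leave load_sum at zero while load_avg is still non-zero. The starting values
are copied from the second kretfunc:update_tg_cfs_load dump above; the delta
values are made up so that the end state matches that dump, they are not the
real PELT arithmetic.

/* Rough illustration only: not the actual kernel computation. */
#include <stdio.h>

struct mock_avg {
    unsigned long load_sum;
    unsigned long load_avg;
};

/* clamp-at-zero subtraction, in the spirit of fair.c's sub_positive() */
static void sub_positive(unsigned long *val, unsigned long delta)
{
    *val = (*val > delta) ? *val - delta : 0;
}

int main(void)
{
    /* state from the second trace above */
    struct mock_avg avg = { .load_sum = 48117265, .load_avg = 1025 };

    /* remove the group se's contribution: the sum delta happens to cover
     * the whole sum, while the avg delta is slightly smaller because it
     * was derived separately (different divider/rounding state) */
    sub_positive(&avg.load_sum, 48117265);
    sub_positive(&avg.load_avg, 1023);

    /* prints "load_sum=0 load_avg=2": load_sum looks fully decayed while
     * load_avg (and tg_load_avg_contrib, 1022 in the trace) is not */
    printf("load_sum=%lu load_avg=%lu\n", avg.load_sum, avg.load_avg);
    return 0;
}

The point is only that the two fields can stop agreeing once they are updated
with separately derived deltas; exactly where the real deltas diverge is what
the thread above is narrowing down.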
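The patch itself is not quoted in this thread, but going by its subject
("sched/fair: Add tg_load_contrib cfs_rq decay checking"), the idea appears
to be to not treat a cfs_rq as decayed while it still has a non-zero
tg_load_avg_contrib, so it is not dropped from the leaf list before that
contribution has been removed. The sketch below models that idea as
standalone C with a tiny mock struct; the baseline checks mirror what
cfs_rq_is_decayed() does around v5.12 from memory, and the extra check is an
assumption based on the subject line, not the actual patch (in the kernel,
tg_load_avg_contrib only exists under CONFIG_FAIR_GROUP_SCHED, so a real
change would have to account for that).

/* Standalone sketch, not the actual kernel patch. */
#include <stdbool.h>
#include <stdio.h>

struct mock_cfs_rq {
    unsigned long weight;               /* cfs_rq->load.weight */
    unsigned long load_sum;             /* cfs_rq->avg.load_sum */
    unsigned long util_sum;             /* cfs_rq->avg.util_sum */
    unsigned long runnable_sum;         /* cfs_rq->avg.runnable_sum */
    unsigned long tg_load_avg_contrib;  /* cfs_rq->tg_load_avg_contrib */
};

static bool cfs_rq_is_decayed(const struct mock_cfs_rq *cfs_rq)
{
    if (cfs_rq->weight)
        return false;
    if (cfs_rq->load_sum)
        return false;
    if (cfs_rq->util_sum)
        return false;
    if (cfs_rq->runnable_sum)
        return false;
    /* assumed addition: keep the cfs_rq on the leaf list until its
     * contribution to tg->load_avg has actually been removed */
    if (cfs_rq->tg_load_avg_contrib)
        return false;
    return true;
}

int main(void)
{
    /* state from the second trace: the *_sum fields have decayed to zero,
     * but 1022 is still accounted in the task group's load */
    struct mock_cfs_rq rq = { .tg_load_avg_contrib = 1022 };

    /* with the extra check this prints "no", i.e. the cfs_rq would stay on
     * the leaf list instead of leaking its contribution */
    printf("decayed: %s\n", cfs_rq_is_decayed(&rq) ? "yes" : "no");
    return 0;
}

An alternative direction, which Vincent hints at above with "I'm going to
prepare a patch to fix this", is to keep load_sum and load_avg synced so this
state never arises in the first place; the sketch here only addresses the
list-removal side.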