From: Odin Ugedal
Date: Tue, 27 Apr 2021 13:24:00 +0200
Subject: Re: [PATCH 0/1] sched/fair: Fix unfairness caused by missing load decay
To: Vincent Guittot
Cc: Odin Ugedal, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
    open list:CONTROL GROUP (CGROUP), linux-kernel
References: <20210425080902.11854-1-odin@uged.al>

Hi,

> I wanted to say one v5.12-rcX version to make sure this is still a
> valid problem on latest version

Ahh, I see. No problem. :) Thank you so much for taking the time to
look at this!

> I confirm that I can see a ratio of 4ms vs 204ms running time with
> the patch below.

(I assume you are talking about the bash code for reproducing the
issue, not the actual sched patch.)

> But when I look more deeply in my trace (I have instrumented the
> code), it seems that the 2 stress-ng don't belong to the same cgroup
> but remained in cg-1 and cg-2 which explains such running time
> difference.

(Mail reply number two to your previous mail might also help surface it.)

I am not sure if I have stated it clearly, or if we are talking about
the same thing. It _is_ the intention that the two procs should not be
in the same cgroup. In the same way as people create "containers", each
proc runs in a separate cgroup in the example. The issue is not the
balancing between the procs themselves, but rather between the
cgroups/sched_entities inside the cgroup hierarchy, due to the fact
that the vruntime of those sched_entities ends up being calculated with
more load than it is supposed to.

If you have any thoughts about how to phrase the patch itself to make
it easier to understand, feel free to suggest them.

Given the last cgroup v1 script, I get this:

- cat /proc/<pid>/cgroup | grep cpu
  11:cpu,cpuacct:/slice/cg-1/sub
  3:cpuset:/slice

- cat /proc/<pid>/cgroup | grep cpu
  11:cpu,cpuacct:/slice/cg-2/sub
  3:cpuset:/slice

The cgroup hierarchy will then roughly be like this (using cgroup v2
terms, because I find them easier to reason about):

slice/
  cg-1/
    cpu.weight: 100
    sub/
      cpu.weight: 1
      cpuset.cpus: 1
      cgroup.procs - stress process 1 here
  cg-2/
    cpu.weight: 100
    sub/
      cpu.weight: 10000
      cpuset.cpus: 1
      cgroup.procs - stress process 2 here

This should result in a 50/50 split, due to the fact that cg-1 and
cg-2 both have a weight of 100 and "live" inside the /slice cgroup.
The inner weight should not matter, since there is only one cgroup at
that level.
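Roughly, the v1 setup behind that hierarchy looks like the sketch
below. This is a simplified illustration rather than the exact script
from the cover letter: the controller mount points, the stress-ng
invocation, and the exact shares values are assumptions, and v1
cpu.shares cannot go below 2, so the v2 weight of 1 is only
approximated.

#!/bin/bash
# Sketch: build the hierarchy from the text on cgroup v1. The procs end
# up in cpuset:/slice and cpu,cpuacct:/slice/cg-N/sub, matching the
# /proc/<pid>/cgroup output above. Run as root.
CPU=/sys/fs/cgroup/cpu,cpuacct
CPUSET=/sys/fs/cgroup/cpuset

mkdir -p "$CPUSET/slice"
echo 1 > "$CPUSET/slice/cpuset.cpus"   # pin the slice to cpu 1
echo 0 > "$CPUSET/slice/cpuset.mems"

for cg in cg-1 cg-2; do
  mkdir -p "$CPU/slice/$cg/sub"
  echo 100 > "$CPU/slice/$cg/cpu.shares"       # equal outer weights
done
echo 2     > "$CPU/slice/cg-1/sub/cpu.shares"  # ~ v2 weight 1 (v1 minimum is 2)
echo 10000 > "$CPU/slice/cg-2/sub/cpu.shares"  # v2 weight 10000

# One CPU hog per inner cgroup, attached to both hierarchies.
for cg in cg-1 cg-2; do
  (
    echo $BASHPID > "$CPUSET/slice/cgroup.procs"
    echo $BASHPID > "$CPU/slice/$cg/sub/cgroup.procs"
    exec stress-ng --cpu 1 --timeout 10
  ) &
done
wait

# With the fix applied, the two should report roughly equal CPU time.
cat "$CPU/slice/cg-1/sub/cpuacct.usage" "$CPU/slice/cg-2/sub/cpuacct.usage"

While the two stress processes run, the per-cfs_rq tg_load_avg_contrib
values can also be inspected via /proc/sched_debug (with
CONFIG_SCHED_DEBUG enabled) to see the stale load contribution directly.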
> So your script doesn't reproduce the bug you want to highlight. That
> being said, I can also see a diff between the contrib of the cpu0 in
> the tg_load. I'm going to look further

There can definitely be some other issues involved, and I am pretty
sure you have way more knowledge about the scheduler than me... :)
However, I am pretty sure that it is in fact showing the issue I am
talking about, and applying the patch does indeed make it impossible
to reproduce on my systems.

Odin