Subject: Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
To: 禹舟键 (Yuzhoujian)
Cc: mingo@redhat.com, Peter Zijlstra, Wind Yu, linux-kernel@vger.kernel.org
References: <1548236816-18712-1-git-send-email-ufo19890607@gmail.com> <3680160f-a439-02a3-3d40-56de18096c4b@linux.alibaba.com> <6625b261-d199-6b37-de65-1b576dcbad5b@linux.alibaba.com>
From: 王贇 (Michael Wang) <yun.wang@linux.alibaba.com>
Message-ID: <4840cfe6-161a-5d2e-361d-732bedcaee87@linux.alibaba.com>
Date: Wed, 30 Jan 2019 09:53:51 +0800

On 2019/1/28 3:21 PM, 禹舟键 wrote:
[snip]
>> No offense, but I'm afraid you misunderstand the problem we are trying to
>> solve with wait_sum. If your purpose is to have a way to tell whether there
>> is sufficient CPU inside a container, please try lxcfs + top: if there is
>> almost no idle time and the load is high, then the CPU resource is not
>> sufficient.
>
> Hmm... maybe I didn't make it clear. We need to dynamically adjust the
> number of CPUs for a container based on the running state of the tasks
> inside it. If we find that the tasks' wait_sum is really high, we will add
> more CPU cores to the container; otherwise we will take some CPUs away from
> it. In a word, we want to ensure 'co-scheduling' for high-priority
> containers.

I understand that you want to use task wait time, which is a raw metric, but
IMHO when tasks wait more, idle time will be lower and load will be higher;
those are more general metrics for telling whether tasks are starving for CPU
than per-task wait time on the runqueue, and we rely on them too.

The only issue we had previously is that we didn't know what caused the low
idle and high load: it could be a wrong resource assignment or cgroup
competition. Now, with wait_sum, we can first make sure the competition is
low; then, if idle is still low and load is still high inside the container,
it is time to assign more CPUs.

>> Frankly speaking, this sounds like a supplement rather than a missing
>> piece. Although we don't rely on lxcfs and modify the kernel ourselves to
>> support the container environment, I still don't think this kind of
>> solution should be in the kernel.
>
> I don't care whether this value is considered a supplement or a missing
> piece; I only care about how I can assess the running state inside a
> container. I think lxcfs is a good solution for improving the visibility of
> container resources, but it is not good enough at the moment.
>
> /proc/cpuinfo
> /proc/diskstats
> /proc/meminfo
> /proc/stat
> /proc/swaps
> /proc/uptime
>
> We can read these procfs files inside a container, but they still cannot
> reflect real-time information. Please think about the following scenario: a
> 'rabbit' process generates 2000 tasks every 30ms, and these child tasks run
> for only 1~5ms and then exit. How can we detect this thrashing workload
> without a hierarchical wait_sum?

As mentioned, we implement the isolation ourselves, so we see the isolated
idle and load information inside the container rather than the host data. We
don't rely on lxcfs, but we know it does similar work. So for what you get by
reading /proc/stat: does it report the isolated idle data? You would need an
isolated /proc/loadavg too.

Anyway, IMHO this is a special requirement of the container environment, not
a general solution to a kernel problem, so I would suggest either helping to
improve lxcfs so it is useful for your production environment, or making the
modification in your own kernel.

Regards,
Michael Wang

> Thanks,
> Yuzhoujian
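For reference, the per-task figure debated above is already visible from
userspace: /proc/<pid>/schedstat exposes three values (time on CPU in ns,
time spent waiting on a runqueue in ns, and number of timeslices) when
schedstats are enabled. Below is a minimal sketch of how one might aggregate
that runqueue-wait time per container today; the cgroup-v1 mount point, the
container cgroup name, and the sampling interval are illustrative assumptions,
not part of the patch or of either party's tooling.

```python
#!/usr/bin/env python3
"""Rough sketch: approximate a container's runqueue wait time from userspace.

Assumptions (illustrative only): the cpu controller is a cgroup-v1 hierarchy
mounted at /sys/fs/cgroup/cpu, CONFIG_SCHEDSTATS is enabled so the wait field
of /proc/<tid>/schedstat is populated, and the container of interest lives in
a cgroup named "mycontainer".
"""
import time

CGROUP = "/sys/fs/cgroup/cpu/mycontainer"   # hypothetical container cgroup
INTERVAL = 1.0                              # sampling period in seconds


def wait_sum_ns(cgroup):
    """Sum runqueue wait time (ns) over all tasks currently in the cgroup.

    /proc/<tid>/schedstat holds three numbers: time on CPU (ns), time spent
    waiting on a runqueue (ns), and the number of timeslices run.
    """
    total = 0
    with open(f"{cgroup}/tasks") as f:           # one TID per line
        tids = [line.strip() for line in f if line.strip()]
    for tid in tids:
        try:
            with open(f"/proc/{tid}/schedstat") as f:
                _on_cpu, wait_ns, _slices = f.read().split()
            total += int(wait_ns)
        except (FileNotFoundError, ProcessLookupError, ValueError):
            # Short-lived tasks (like the 'rabbit' children in the thread)
            # may exit between listing and reading; their accumulated wait
            # time is simply lost to this sampler.
            continue
    return total


if __name__ == "__main__":
    prev = wait_sum_ns(CGROUP)
    while True:
        time.sleep(INTERVAL)
        cur = wait_sum_ns(CGROUP)
        print(f"runqueue wait delta over {INTERVAL}s: {(cur - prev) / 1e6:.1f} ms")
        prev = cur
```

Because the sum only covers tasks that still exist at sampling time, the
2000 short-lived children in the 'rabbit' scenario largely vanish from it,
which is exactly the blind spot the hierarchical wait_sum in the patch is
meant to close.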