Subject: Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2
To: Johannes Weiner
Cc: Ingo Molnar, Peter Zijlstra, "akpm@linux-foundation.org", Linus Torvalds, Tejun Heo, surenb@google.com, Vinayak Menon, Christoph Lameter, Mike Galbraith, Shakeel Butt, linux-mm, cgroups@vger.kernel.org, "linux-kernel@vger.kernel.org", kernel-team@fb.com
References: <20180712172942.10094-1-hannes@cmpxchg.org> <20180724151519.GA11598@cmpxchg.org>
From: "Singh, Balbir"
Message-ID: <268c2b08-6c90-de2b-d693-1270bb186713@gmail.com>
Date: Thu, 26 Jul 2018 11:07:32 +1000
In-Reply-To: <20180724151519.GA11598@cmpxchg.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/25/18 1:15 AM, Johannes Weiner wrote:
> Hi Balbir,
>
> On Tue, Jul 24, 2018 at 07:14:02AM +1000, Balbir Singh wrote:
>> Does the mechanism scale?
>> I am a little concerned about how frequently
>> this infrastructure is monitored/read/acted upon.
>
> I expect most users to poll in the frequency ballpark of the running
> averages (10s, 1m, 5m). Our OOMD defaults to 5s polling of the 10s
> average; we collect the 1m average once per minute from our machines
> and cgroups to log the system/workload health trends in our fleet.
>
> Suren has been experimenting with adaptive polling down to the
> millisecond range on Android.
>

I think this is a bad way of doing things. Polling only adds overhead;
there needs to be an event-driven mechanism, and the selection of the
events needs to happen in user space.

>> Why aren't existing mechanisms sufficient
>
> Our existing stuff gives a lot of indication when something *may* be
> an issue, like the rate of page reclaim, the number of refaults, the
> average number of active processes, one task waiting on a resource.
>
> But the real difference between an issue and a non-issue is how much
> it affects your overall goal of making forward progress or reacting to
> a request in time. And that's the only thing users really care
> about. It doesn't matter whether my system is doing 2314 or 6723 page
> refaults per minute, or scanned 8495 pages recently. I need to know
> whether I'm losing 1% or 20% of my time on overcommitted memory.
>
> Delayacct is time-based, so it's a step in the right direction, but it
> doesn't aggregate tasks and CPUs into compound productivity states to
> tell you if only parts of your workload are seeing delays (which is
> often tolerable for the purpose of ensuring maximum HW utilization) or
> your system overall is not making forward progress. That aggregation
> isn't something you can do in userspace with polled delayacct data.

By aggregation, do you mean cgroup aggregation?

>
>> -- why is the avg delay calculation in the kernel?
>
> For one, as per above, most users will probably be using the standard
> averaging windows, and we already have this highly optimized
> infrastructure from the load average. I don't see why we shouldn't use
> that instead of exporting an obscure number that requires most users
> to have an additional library or copy-paste the loadavg code.
>
> I also mentioned the OOM killer as a likely in-kernel user of the
> pressure percentages to protect from memory livelocks out of the box,
> in which case we have to do this calculation in the kernel anyway.
>
>> There is no talk about the overhead this introduces in general, maybe
>> the details are in the patches. I'll read through them
>
> I sent an email on benchmarks and overhead in one of the subthreads, I
> will include that information in the cover letter in v3.
>
> https://lore.kernel.org/lkml/20180718215644.GB2838@cmpxchg.org/

Thanks, I'll take a look.

Balbir Singh.
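
For reference, the loadavg-style running averages discussed in the thread
(10s, 1m, 5m) can be sketched roughly as below. This is only an
illustration in floating point, not the kernel's fixed-point code or the
code from the patch series; the 2s sampling period and all function names
are assumptions made up for the example.

```python
import math

# Assumed sampling period (hypothetical); the windows are the ones named
# in the thread: 10s, 1m and 5m.
SAMPLE_PERIOD_S = 2.0
WINDOWS_S = (10.0, 60.0, 300.0)


def decay_factor(period_s, window_s):
    """Weight kept from the previous average after one sampling period."""
    return math.exp(-period_s / window_s)


def update_average(avg, sample, period_s, window_s):
    """Fold one new sample into an exponentially decaying running average."""
    k = decay_factor(period_s, window_s)
    return avg * k + sample * (1.0 - k)


def run(samples, period_s=SAMPLE_PERIOD_S, windows_s=WINDOWS_S):
    """Feed all samples in order and return one final average per window."""
    avgs = [0.0] * len(windows_s)
    for sample in samples:
        avgs = [
            update_average(avg, sample, period_s, window)
            for avg, window in zip(avgs, windows_s)
        ]
    return avgs


if __name__ == "__main__":
    # Each sample is a hypothetical "percent of time stalled" for one period.
    print(run([20.0] * 5))
```

With a constant stall share as input, all three averages move toward that
value, and the 10s average gets there first. A consumer that wants the
standard windows only has to read the resulting numbers, which is the
argument made above for keeping the calculation in one place.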