Date: Wed, 22 Aug 2018 13:28:25 -0400
From: Johannes Weiner
To: Peter Zijlstra
Cc: Ingo Molnar, Andrew Morton, Linus Torvalds, Tejun Heo,
	Suren Baghdasaryan, Daniel Drake, Vinayak Menon,
	Christopher Lameter, Mike Galbraith, Shakeel Butt,
	Peter Enderborg, linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 8/9] psi: pressure stall information for CPU, memory, and IO
Message-ID: <20180822172825.GA1317@cmpxchg.org>
References: <20180801151958.32590-1-hannes@cmpxchg.org>
	<20180801151958.32590-9-hannes@cmpxchg.org>
	<20180803172139.GE2494@hirez.programming.kicks-ass.net>
	<20180821201115.GB24538@cmpxchg.org>
	<20180822091024.GU24124@hirez.programming.kicks-ass.net>
In-Reply-To: <20180822091024.GU24124@hirez.programming.kicks-ass.net>

On Wed, Aug 22, 2018 at 11:10:24AM +0200, Peter Zijlstra wrote:
> On Tue, Aug 21, 2018 at 04:11:15PM -0400, Johannes Weiner wrote:
> > On Fri, Aug 03, 2018 at 07:21:39PM +0200, Peter Zijlstra wrote:
> > > On Wed, Aug 01, 2018 at 11:19:57AM -0400, Johannes Weiner wrote:
> > > > +			time = READ_ONCE(groupc->times[s]);
> > > > +			/*
> > > > +			 * In addition to already concluded states, we
> > > > +			 * also incorporate currently active states on
> > > > +			 * the CPU, since states may last for many
> > > > +			 * sampling periods.
> > > > +			 *
> > > > +			 * This way we keep our delta sampling buckets
> > > > +			 * small (u32) and our reported pressure close
> > > > +			 * to what's actually happening.
> > > > +			 */
> > > > +			if (test_state(groupc->tasks, cpu, s)) {
> > > > +				/*
> > > > +				 * We can race with a state change and
> > > > +				 * need to make sure the state_start
> > > > +				 * update is ordered against the
> > > > +				 * updates to the live state and the
> > > > +				 * time buckets (groupc->times).
> > > > +				 *
> > > > +				 * 1. If we observe task state that
> > > > +				 * needs to be recorded, make sure we
> > > > +				 * see state_start from when that
> > > > +				 * state went into effect or we'll
> > > > +				 * count time from the previous state.
> > > > +				 *
> > > > +				 * 2. If the time delta has already
> > > > +				 * been added to the bucket, make sure
> > > > +				 * we don't see it in state_start or
> > > > +				 * we'll count it twice.
> > > > +				 *
> > > > +				 * If the time delta is out of
> > > > +				 * state_start but not in the time
> > > > +				 * bucket yet, we'll miss it entirely
> > > > +				 * and handle it in the next period.
> > > > +				 */
> > > > +				smp_rmb();
> > > > +				time += cpu_clock(cpu) - groupc->state_start;
> > > > +			}
> > >
> > > As is, groupc->state_start needs a READ_ONCE() above and a WRITE_ONCE()
> > > below. But like stated earlier, doing an update in scheduler_tick() is
> > > probably easier.
> >
> > I've wrapped these in READ_ONCE/WRITE_ONCE.
>
> I just realized, these are u64, so READ_ONCE/WRITE_ONCE will not work
> correct on 32bit.

Ah, right.

Actually, that race described in the comment above - "If the time
delta is out of state_start but not in the time bucket yet, we'll miss
it entirely and handle it in the next period" - can cause bogus time
samples if a state persists for more than 2s. If we observed a live
state and included it in our private copy of the time bucket
(times_prev), then missing the delta while it is in transit to the
time bucket during the next aggregation leaves times_prev ahead of
'time', which causes the delta to underflow into a bogusly large
sample.

Memory barriers alone cannot guarantee full coherency here (neither
seeing the delta twice, nor missing it entirely), so I'm switching
this over to seqcount to make sure the aggregator sees something
sensible.

And then I don't need the READ_ONCE/WRITE_ONCE.
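
To illustrate the seqcount idea, here is a minimal userspace sketch.
It assumes C11 atomics and a single writer per record, and it is not
the kernel's seqcount_t API or the code from the patch. The struct and
every function name below are hypothetical. The writer bumps a
sequence counter around each update, and the reader retries until it
gets a snapshot that did not overlap an update, so it can neither miss
the delta in transit nor count it twice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-CPU record, loosely modeled on the fields discussed above. */
struct cpu_sample {
	atomic_uint seq;              /* odd while the writer is mid-update */
	_Atomic uint64_t time;        /* time accumulated by concluded states */
	_Atomic uint64_t state_start; /* timestamp at which the live state began */
	_Atomic int active;           /* nonzero while a state is live */
};

/* Open a write section. Assumes exactly one writer per record. */
static void write_begin(struct cpu_sample *c)
{
	unsigned int seq = atomic_load_explicit(&c->seq, memory_order_relaxed);

	assert(!(seq & 1)); /* no update may already be open */
	atomic_store_explicit(&c->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Close a write section; the release store publishes the data stores. */
static void write_end(struct cpu_sample *c)
{
	unsigned int seq = atomic_load_explicit(&c->seq, memory_order_relaxed);

	atomic_store_explicit(&c->seq, seq + 1, memory_order_release);
}

/* Writer: a state goes live at time 'now'. */
static void state_begin(struct cpu_sample *c, uint64_t now)
{
	write_begin(c);
	atomic_store_explicit(&c->state_start, now, memory_order_relaxed);
	atomic_store_explicit(&c->active, 1, memory_order_relaxed);
	write_end(c);
}

/* Writer: the live state ends; move its delta into the time bucket. */
static void state_end(struct cpu_sample *c, uint64_t now)
{
	uint64_t start = atomic_load_explicit(&c->state_start, memory_order_relaxed);
	uint64_t time = atomic_load_explicit(&c->time, memory_order_relaxed);

	write_begin(c);
	atomic_store_explicit(&c->time, time + (now - start), memory_order_relaxed);
	atomic_store_explicit(&c->active, 0, memory_order_relaxed);
	write_end(c);
}

/* Reader (aggregator): retry until the snapshot did not overlap a write. */
static uint64_t sample_time(struct cpu_sample *c, uint64_t now)
{
	unsigned int seq0, seq1;
	uint64_t time, start;
	int active;

	do {
		seq0 = atomic_load_explicit(&c->seq, memory_order_acquire);
		time = atomic_load_explicit(&c->time, memory_order_relaxed);
		start = atomic_load_explicit(&c->state_start, memory_order_relaxed);
		active = atomic_load_explicit(&c->active, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		seq1 = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((seq0 & 1) || seq0 != seq1);

	/* Concluded time plus the still-running state, taken from one snapshot. */
	return active ? time + (now - start) : time;
}
```

Because 'time', 'state_start' and 'active' are all read inside one
retry loop, the value returned for a live state never goes backwards
once the state concludes, which is the property the barrier-only
version could not provide. The data fields are atomics here only to
keep the sketch well-defined under the C11 memory model.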