Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp151132ybe; Wed, 4 Sep 2019 17:01:03 -0700 (PDT) X-Google-Smtp-Source: APXvYqzcxh8nrN0RikGk9WKP+Zxz1tYuNohPC+RNLUdIPrP39IypKy2ykC+bFfWb0bZYXJQ4PF5k X-Received: by 2002:a17:90a:1b46:: with SMTP id q64mr813401pjq.97.1567641663203; Wed, 04 Sep 2019 17:01:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1567641663; cv=none; d=google.com; s=arc-20160816; b=b7rEmnX/58TAYPSTP0QwJ5EkNNFNifUeDnJY/H8nA9LA54AajEClpc19M5wFkyMSw0 gGit4+Zz0wpNjJK9fsX1xjBOfZsA0kRIlGHGy+qTbZBbfs615vCqsvkC4CuIQxrzs0hv HLvkZzpG7EO6ZBi3CoioDQu0A110GzJdLfxWRaTaO9p3nvG42O1VBiuJ+yD0V7czEzbC WK2o6/kwWn7DB7sITJp0QX9RI94FNBXCu3XJ9JQY7wA9SmvScX4nkGFbCuaBUh1VJZDT DJlnvZ1F36SEEdhanoA0zZChYxlhq9IWhT1CxcbwD1JLqT50XKdmlb5lWa7PX8femxYG 7XXA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=+X0h1ts+MVZV3XmIkyVf8C7h9fPhIeM62LiNUi/8G3A=; b=kWqOUIJmQTkH55NQaID71NoHPuUTsDD92LdTHIfeXX4JpjblEiOBZfquak6Vkb8DoB ukv1jyHsUg88inv0vL4N09VmU72E5aJL/7Dp5jGCFwhdcHimYq8NMXmp41kCf2mmL2lE wqy/2jpZuudT2cFemOF6lxYvsh7VIW0q8ZApsaT0rt2utJjjXQebtqDS4BcSLXCFzVmZ u1hss7UqOM56ZB77GvWyh6R8edBoMtuzfeYAS5XPdy031BYx35MtpSvbrh1KrU0jzTmq SpyjBuDHj1/yJJVshZ0s1PnAkWvkAMNNW7wQQuyz2taVDcSrQJ5zABXOdNq4cjiVr0r1 8umg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=fIXcKTuK; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v2si277373pfm.195.2019.09.04.17.00.46; Wed, 04 Sep 2019 17:01:03 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=fIXcKTuK; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730630AbfIDX76 (ORCPT + 99 others); Wed, 4 Sep 2019 19:59:58 -0400 Received: from mail-pf1-f195.google.com ([209.85.210.195]:38497 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730196AbfIDX75 (ORCPT ); Wed, 4 Sep 2019 19:59:57 -0400 Received: by mail-pf1-f195.google.com with SMTP id h195so471650pfe.5 for ; Wed, 04 Sep 2019 16:59:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=+X0h1ts+MVZV3XmIkyVf8C7h9fPhIeM62LiNUi/8G3A=; b=fIXcKTuKnw1UQvuGKxVfWjVBLyghLw9pQEsEMRAmxScf17+aBm3Cvg9tH3Hv/qq4N8 XY3oHrgvMGnbEjRBcNRLZWnoBnDDaGy8fwfMIdo47scstvvgyR5PJU9HTQSZ5XeFp3mO NYlWoLk6u5yQzec20/dwKo37T07QPkcJtAnlbjF6xT1SS3yepMYpGkFrSSoZ7ANWPm3u XHIlwi//MzAp/RUrsh/BDboGHEDtIslWQG2G4DuB5D2aQQ9eWAf9ME24K/JCMuE3dARF f1CewAzQ8PFVk/yA7tw5y9Cm5mTJTRUkBrTE52Dwkey+N0yzq7NFTT7QtwvXnsJmvG2W HUXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=+X0h1ts+MVZV3XmIkyVf8C7h9fPhIeM62LiNUi/8G3A=; b=LLPwA/JWEZq4FQlsKYuZ/MQGS/MNbhjX4d3V5Yd7OimUG5yvyQVk9tbMxaJXJMeJIG Z0lpIPMddrgD5WEvjsqoVc9ya1kRieZD3YUll/7CXmCKoXXgL5yC9oMxNolQjw+9Q9Td ICGbf08QQo/kc3QdfIojv1dXZ+lgn6ByC9Q97Gms3GaF9krhODxy5s0IeYGZeUVMtnih WshPDvSUnZ0geZdd7gOJJb6X0G7Zw5mEpH+qnOGdihJNSI6tRYFN8/MBGRtMixtRj0h2 m13jESmzEVHtuKUgHD7ceePL4lHwbhfneqEc33EQzfIDMgGL9MagvlSTiiZEile8W3sq CzmA== X-Gm-Message-State: APjAAAXpS9rXqWcv+iotsluGROP/zAahbK9oZerf8ElqKJc9Nk+UDVe7 O2CZjITXMDMjeh1xjiGhpOTIcQ== X-Received: by 2002:aa7:8251:: with SMTP id e17mr356053pfn.189.1567641596235; Wed, 04 Sep 2019 16:59:56 -0700 (PDT) Received: from sspatil-glaptop2.roam.corp.google.com (96-90-215-179-static.hfc.comcastbusiness.net. [96.90.215.179]) by smtp.gmail.com with ESMTPSA id k14sm214018pgi.20.2019.09.04.16.59.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 Sep 2019 16:59:55 -0700 (PDT) Date: Wed, 4 Sep 2019 16:59:53 -0700 From: sspatil@google.com To: dancol@google.com, joel@joelfernandes.org, surenb@google.com, linux-kernel@vger.kernel.org, timmurray@google.com, carmenjackson@google.com, mayankgupta@google.com, rostedt@goodmis.org, minchan@kernel.org, akpm@linux-foundation.org, kernel-team@android.com, aneesh.kumar@linux.ibm.com, dan.j.williams@intel.com, jglisse@redhat.com, linux-mm@kvack.org, willy@infradead.org, mhocko@suse.cz, rcampbell@nvidia.com, vbabka@suse.cz, sspatil+mutt@google.com Cc: Joel Fernandes , Suren Baghdasaryan , LKML , Tim Murray , Carmen Jackson , Mayank Gupta , Steven Rostedt , Minchan Kim , Andrew Morton , kernel-team , "Aneesh Kumar K.V" , Dan Williams , Jerome Glisse , linux-mm , Matthew Wilcox , Michal Hocko , Ralph Campbell , Vlastimil Babka Subject: Re: [PATCH v2] mm: emit tracepoint when RSS changes by threshold Message-ID: <20190904235953.GH14859@google.com> References: <20190903200905.198642-1-joel@joelfernandes.org> <20190904051549.GB256568@google.com> <20190904145941.GF240514@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 04, 2019 at 10:15:10AM -0700, 'Daniel Colascione' via kernel-team wrote: > On Wed, Sep 4, 2019 at 7:59 AM Joel Fernandes wrote: > > > > On Tue, Sep 03, 2019 at 10:42:53PM -0700, Daniel Colascione wrote: > > > On Tue, Sep 3, 2019 at 10:15 PM Joel Fernandes wrote: > > > > > > > > On Tue, Sep 03, 2019 at 09:51:20PM -0700, Daniel Colascione wrote: > > > > > On Tue, Sep 3, 2019 at 9:45 PM Suren Baghdasaryan wrote: > > > > > > > > > > > > On Tue, Sep 3, 2019 at 1:09 PM Joel Fernandes (Google) > > > > > > wrote: > > > > > > > > > > > > > > Useful to track how RSS is changing per TGID to detect spikes in RSS and > > > > > > > memory hogs. Several Android teams have been using this patch in various > > > > > > > kernel trees for half a year now. Many reported to me it is really > > > > > > > useful so I'm posting it upstream. > > > > > > > > > > It's also worth being able to turn off the per-task memory counter > > > > > caching, otherwise you'll have two levels of batching before the > > > > > counter gets updated, IIUC. > > > > > > > > I prefer to keep split RSS accounting turned on if it is available. > > > > > > Why? AFAIK, nobody's produced numbers showing that split accounting > > > has a real benefit. > > > > I am not too sure. Have you checked the original patches that added this > > stuff though? It seems to me the main win would be on big systems that have > > to pay for atomic updates. > > I looked into this issue the last time I mentioned split mm > accounting. See [1]. It's my sense that the original change was > inadequately justified; Michal Hocko seems to agree. I've tried > disabling split rss accounting locally on a variety of systems --- > Android, laptop, desktop --- and failed to notice any difference. It's > possible that some difference appears at a scale beyond that to which > I have access, but if the benefit of split rss accounting is limited > to these cases, split rss accounting shouldn't be on by default, since > it comes at a cost in consistency. > > [1] https://lore.kernel.org/linux-mm/20180227100234.GF15357@dhcp22.suse.cz/ > > > > > I think > > > > discussing split RSS accounting is a bit out of scope of this patch as well. > > > > > > It's in-scope, because with split RSS accounting, allocated memory can > > > stay accumulated in task structs for an indefinite time without being > > > flushed to the mm. As a result, if you take the stream of virtual > > > memory management system calls that program makes on one hand, and VM > > > counter values on the other, the two don't add up. For various kinds > > > of robustness (trace self-checking, say) it's important that various > > > sources of data add up. > > > > > > If we're adding a configuration knob that controls how often VM > > > counters get reflected in system trace points, we should also have a > > > knob to control delayed VM counter operations. The whole point is for > > > users to be able to specify how precisely they want VM counter changes > > > reported to analysis tools. > > > > We're not adding more configuration knobs. > > This position doesn't seem to be the thread consensus yet. > > > > > Any improvements on that front can be a follow-up. > > > > > > > > Curious, has split RSS accounting shown you any issue with this patch? > > > > > > Split accounting has been a source of confusion for a while now: it > > > causes that numbers-don't-add-up problem even when sampling from > > > procfs instead of reading memory tracepoint data. > > > > I think you can just disable split RSS accounting if it does not work well > > for your configuration. > > There's no build-time configuration for split RSS accounting. It's not > reasonable to expect people to carry patches just to get their memory > usage numbers to add up. sure, may be send a patch to make one in that case or for deleting the split RSS accounting code like you said below. > > > Also AFAIU, every TASK_RSS_EVENTS_THRESH the page fault code does sync the > > counters. So it does not indefinitely lurk. > > If a thread incurs TASK_RSS_EVENTS_THRESH - 1 page faults and then > sleeps for a week, all memory counters observable from userspace will > be wrong for a week. Multiply this potential error by the number of > threads on a typical system and you have to conclude that split RSS > accounting produces a lot of potential uncertainty. What are we > getting in exchange for this uncertainty? > > > The tracepoint's main intended > > use is to detect spikes which provides ample opportunity to sync the cache. > > The intended use is measuring memory levels of various processes over > time, not just detecting "spikes". In order to make sense of the > resulting data series, we need to be able to place error bars on it. > The presence of split RSS accounting makes those error bars much > larger than they have to be. > > > You could reduce TASK_RSS_EVENTS_THRESH in your kernel, or even just disable > > split RSS accounting if that suits you better. That would solve all the > > issues you raised, not just any potential ones that you raised here for this > > tracepoint. > > I think we should just delete the split RSS accounting code unless > someone can demonstrate that it's a measurable win on a typical > system. The first priority of any system should be correctness. > Consistency is a kind of correctness. Departures from correctness > coming only from quantitatively-justifiable need. I think you make some good points for correctness, but I still don't see how all that relates to _this_ change. We currently do want an ability to get these rss spikes in the traces (as the patch description shows). You seem to be arguing about the correctness of split RSS accounting. I would suggest to send a patch to delete the split RSS accounting code and take these *very valid* arguments there? I am struggling to see the point of derailing this _specific_ change for that. - ssp > > -- > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. >