Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp1151733ybb; Fri, 3 Apr 2020 19:58:53 -0700 (PDT) X-Google-Smtp-Source: APiQypIGJEYrnrcCe0taLjwuDLk6FMvbQg89G+LaPUeEmrnep97psRADjG27Rj3ffIavfrp7L4AA X-Received: by 2002:a05:6830:19ea:: with SMTP id t10mr8790180ott.200.1585969133364; Fri, 03 Apr 2020 19:58:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585969133; cv=none; d=google.com; s=arc-20160816; b=QjeSTFiRw3/4KNLH/yKKLn7SxZKGbMIFcZf7kZ6T8PzOr9HDmsPcN8B7RmOHglsZp1 6TbV4MGbbwge/+xUcBWqpE7Y/5HaGVeD0PBEBkBHIdG/g1pNkvy5gSZl+urSq6ILbagX 0mv7KYhbRLExxB6S3NpaVW7hjQSJmyOnI8ZlFV31rlwuGsc9UEjD7aQ9ZJHj1rP6bspG KrJsqhMBGN0TL05MqC/gIv7gIsiDtxOTTEjcV5D3Z4fb5i3PgCzhepWaaH7NDbriMOWn pIPx2II9SwnZt//JmEG/62BOHj+aFOe6a6v12fK8qHBRpQM7gDuV6sDb+3siH20oEKnX 34jg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=k/GSiXHRl4dofFNbOAtNMPylFu49nhTT6CSicdQWJaI=; b=akn3T1XEvcvWhtAcQDaL+Juvzf2hIA0b8MBps7rwomxG6gLAtB9iK0bdTeZBKh2cSo mt8ddJKjqk3WQ7bce/L2MOSiFm4S52vr7gg7FILINC5k+jzKWuIFy6A33y67SL7E+TBg r6UsbgMCU6JqJSfDruxsP0aLXujFf5NNbVprZ50XV3YTvaJ7eucvkOMvOrU75uLWCWwi 9ErS79duDXVwAOh7xqFTdtB8EZUAutHI61+PJy7Ka2SQQNkbIWdZ/0szflP0341Xpl+8 pvFCoh8K6hzfR35IXOin4xExAj+qoEkkvgqpUosPKJVy4ITNwe1NEJeEIfh0NHsZsks/ qPKA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=efO09LaU; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a12si4585747otf.272.2020.04.03.19.58.40; Fri, 03 Apr 2020 19:58:53 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=efO09LaU; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726312AbgDDCzU (ORCPT + 99 others); Fri, 3 Apr 2020 22:55:20 -0400 Received: from mail-il1-f194.google.com ([209.85.166.194]:44596 "EHLO mail-il1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726208AbgDDCzU (ORCPT ); Fri, 3 Apr 2020 22:55:20 -0400 Received: by mail-il1-f194.google.com with SMTP id j69so9366308ila.11; Fri, 03 Apr 2020 19:55:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=k/GSiXHRl4dofFNbOAtNMPylFu49nhTT6CSicdQWJaI=; b=efO09LaUfSIK5axgkCEl5IKkOWZXHGIh52uBx+UyOOiNV1/XQ5uRRmiR02MfO+n7KI DOXcMQkyAa6ZDqbSXmoCF0cFCgJulXkXq5+bxFv+ooy4w7cneNMXPdbs32eRIzi0mX6h vI+k7wgM5E696NUzveZcTKaHhnfpG/gWRwHPsM24NqUZN1RYZNxPEG582Fu3e6sojul+ zOzkNKooC/2hn+5MSCzWIfXUrMHDO3UZwsNXk0bkz2gBcgtol6SoV7VPSQsIVNnkB4hF stu/zg3b74eNHEIpG5W6Kr7uA5Y5le77LybPy5YJtmVkhCCaCVcrSNBo9ZW8RC+pLQkG cBRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=k/GSiXHRl4dofFNbOAtNMPylFu49nhTT6CSicdQWJaI=; b=rmKF41EyY8ndMi4HozsiR334hI9/qxCz2XuLalkBoSLXQg9uUWI0cSAqGguei0jhYC 5UeYSdXhD/eNAVPRWvtb0RkVcFg49tVLsE2WRZUjcI1N2eNt48ztaGOC4Dje6eX3ZMMv lIMc5SLZByUadMJeVLjJ1Lpquy/VsAPDcj0MSPOKCl8bwbu+Tp0M5KDg3M/BIHALNJC/ MqGgKzBRvlNT41PldDFGCnehSTsQLpSTzZA5i2tbCTnn5CuVaTRBiFxhHAE9ia+rhzdt e+O5Ut2YNTg61vAttR1YR6YOim8JqHWFzTN5xrPBLmu8kUm5vBqtP1a99WYkK6glH/Z4 BYRw== X-Gm-Message-State: AGi0PuaW5wn4qaobFl+ZldiChHFa3byMQPrzBa1GSy7bAEJnU/lYGeyB o0L8UVEfM+7Vx9W6umKNk4lLQizBEuLm739Hm9I= X-Received: by 2002:a05:6e02:811:: with SMTP id u17mr11414585ilm.93.1585968918595; Fri, 03 Apr 2020 19:55:18 -0700 (PDT) MIME-Version: 1.0 References: <1585221127-11458-1-git-send-email-laoar.shao@gmail.com> <20200326143102.GB342070@cmpxchg.org> <20200331151103.GB2089@cmpxchg.org> <20200403154808.GB69203@cmpxchg.org> In-Reply-To: <20200403154808.GB69203@cmpxchg.org> From: Yafang Shao Date: Sat, 4 Apr 2020 10:54:42 +0800 Message-ID: Subject: Re: [PATCH 0/2] psi: enhance psi with the help of ebpf To: Johannes Weiner Cc: Peter Zijlstra , Andrew Morton , Michal Hocko , Jens Axboe , mgorman@suse.de, Steven Rostedt , mingo@redhat.com, Linux MM , linux-block@vger.kernel.org, LKML Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 3, 2020 at 11:48 PM Johannes Weiner wrote: > > On Wed, Apr 01, 2020 at 09:22:24AM +0800, Yafang Shao wrote: > > On Tue, Mar 31, 2020 at 11:11 PM Johannes Weiner wrote: > > > On Fri, Mar 27, 2020 at 09:17:59AM +0800, Yafang Shao wrote: > > > > On Thu, Mar 26, 2020 at 10:31 PM Johannes Weiner wrote: > > With the newly added facility, we can know when these events occur > > and how long each event takes. > > Then we can use these datas to generate a Latency Heat Map[1] and to > > compare whether these latencies match with the application latencies > > recoreded in its log - for example the slow query log in mysql. If the > > refault latencies match with the slow query log, then these slow > > queries are caused by these workingset refault. If the refault > > latencies don't match with slow query log, IOW much smaller than the > > slow query log, then the slow query log isn't caused by the > > workingset refault. > > Okay, you want to use it much finer-grained to understand individual > end-to-end latencies for your services, rather than "the system is > melting down and I want to know why." That sounds valid to me. > Right, individual end-to-end latencies are very important for the latency sensitive services. > > > > > Can you elaborate a bit how you are using this information? It's not > > > > > quite clear to me from the example in patch #2. > > > > > > > > > > > > > From the traced data in patch #2, we can find that the high latencies > > > > of user tasks are always type 7 of memstall , which is > > > > MEMSTALL_WORKINGSET_THRASHING, and then we should look into the > > > > details of wokingset of the user tasks and think about how to improve > > > > it - for example, by reducing the workingset. > > > > > > That's an analyses we run frequently as well: we see high pressure, > > > and then correlate it with the events. > > > > > > High rate of refaults? The workingset is too big. > > > > > > High rate of compaction work? Somebody is asking for higher order > > > pages under load; check THP events next. > > > > > > etc. > > > > > > This works fairly reliably. I'm curious what the extra per-event > > > latency breakdown would add and where it would be helpful. > > > > > > I'm not really opposed to your patches it if it is, I just don't see > > > the usecase right now. > > > > > > > As I explained above, the rate of these events can't give us a full > > view of the memory pressure. High memory pressure may not caused by > > high rate of vmstat events, while it can be caused by low rate of > > events but with high latencies. Latencies are the application really > > concerns, that's why PSI is very useful for us. > > > > Furthermore, there're some events are not recored in vmstat. e.g. > > > > typf of memstall vmstat event > > > > MEMSTALL_KSWAPD pageoutrun, {pgscan, > > pgsteal}_kswapd > > MEMSTALL_RECLAIM_DIRECT {pgscan,steal}_direct > > MEMSTALL_RECLAIM_MEMCG /* no event */ > > MEMSTALL_RECLAIM_HIGH /* no event */ > > MEMSTALL_KCOMPACTD compact_daemon_wake > > MEMSTALL_COMPACT compact_{stall, fail, success} > > MEMSTALL_WORKINGSET_REFAULT workingset_refault > > MEMSTALL_WORKINGSET_THRASH /* no event */ > > MEMSTALL_MEMDELAY /* no event */ > > MEMSTALL_SWAPIO pswpin > > > > After we add these types of memstall, we don't need to add these > > missed events one by one. > > I'm a bit concerned about the maintainability of these things. It > makes moving code around harder, and it forces somebody who has no > interest in this debugging facility to thing about the categories. > > And naming them is hard even for somebody who does care. I'm not a fan > of MEMSTALL_MEMDELAY, for example because it's way too > non-descript. The distinction between MEMSTALL_WORKINGSET_REFAULT and > MEMSTALL_WORKINGSET_THRASH is dubious as well. > Agree with you that the naming is not good. > These are recipes for bit rot and making the code harder to hack on. > > I see two options to do this better: One is to use stack traces as > identifiers instead of a made-up type. The other is to use the calling > function as the id (see how kmalloc_track_caller() utilizes _RET_IP_). > > bpftrace can use the stack as a map key. So this should already work > without any kernel modifications, using @start[tid, kstack]? > If we don't make any kernel modifications, it is not easy to get whehter the psi_memstall_{enter, leave} is nested or not. The nested psi_memstall_{enter, leave} may make some noises. Seems the first option is better. With _RET_IP_ we can also get the caller. So how about adding tracepoints for psi_memstall_{enter, leave} as bellow ? @@ -904,7 +904,7 @@ void psi_memstall_enter(unsigned long *flags, enum memstall_types type) if (*flags) return; + trace_psi_memstall_enter(_RET_IP_); @@ -944,7 +943,7 @@ void psi_memstall_leave(unsigned long *flags, enum memstall_types type) if (*flags) return; + trace_psi_memstall_leave(_RET_IP_); Thanks Yafang