Received: by 2002:a25:e74b:0:0:0:0:0 with SMTP id e72csp1366256ybh; Thu, 16 Jul 2020 10:05:28 -0700 (PDT) X-Google-Smtp-Source: ABdhPJytLIube6JsH5qHJvFklRXYR/F/TIXdabIPOPAG9o6ffeKdU6C5SyOf+L2E5toj2MZhMCMH X-Received: by 2002:a17:906:35cd:: with SMTP id p13mr4654298ejb.172.1594919128744; Thu, 16 Jul 2020 10:05:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594919128; cv=none; d=google.com; s=arc-20160816; b=Su0vc0bVH0LpY3+gv6WvZWIN1SSXitSR373CPod1W2YiJnDjT1SyxRudX4i7k+Vuc9 ktKTsdAoe9zrC9poG/MDsKdtOhBUcbvUabrFmUwvSooS22mk1qPCWySgbxPF1kLZGr3I 5TJfttJYphFjWANjIv5/zEatFfk9mWkORmUoFUCRgtlyREZOrRIBOaQ/vf8B7mEMXU2e LjbZ9yM+1JpuUu2jFhB7Ht06qBu5XXOS0qXQWgiuDar8xpMYHkW+wbIFMI4WVFerCjzO DbVXgZqEWkmBtnDvo1NaFLvv7hSZG6on6aSy0L3x7j0nqJwrfg5dmjHTuiTDahD8Ui6L ryYA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=g2kCsm/bm1ve9jG+Cs+m54b7wlSjss2b9ylTUfr3H5Q=; b=lZGfNNWdP8UgD/XtfWOJ1nospIX2rSVULPfrgGF5o2mHMIR2k8270HaerFWht7yG3w bLmnqh/4tnZKBqTx8qfWrObRHCel9ZiR3W3UUmxutxPMBIIK5fg7emZ0AkOoDqjvh8TQ D6PBwFvbHTIEoJ8Kj/97cYg1rsS+/5qibTqgCrzgWmhJXVjiKydfKYFNJY7y32pUFhhL 8r78T67Dvk6BkABJAp6nU1hzocrGUi848EtDdG7uumiuBjGIYjHbclk5tXZOhYCs5UNc qe+qXu6LWhmXabfoWCNkn4eF08hWpiHLm9FGPUl87muAQq6IUii10LYF6Hw8MmaEiGGY jHzQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=SmOJInW8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id d6si3597645eds.492.2020.07.16.10.05.03; Thu, 16 Jul 2020 10:05:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=SmOJInW8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728774AbgGPRE6 (ORCPT + 99 others); Thu, 16 Jul 2020 13:04:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43562 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725867AbgGPRE5 (ORCPT ); Thu, 16 Jul 2020 13:04:57 -0400 Received: from mail-lj1-x242.google.com (mail-lj1-x242.google.com [IPv6:2a00:1450:4864:20::242]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C2758C061755 for ; Thu, 16 Jul 2020 10:04:56 -0700 (PDT) Received: by mail-lj1-x242.google.com with SMTP id e4so8719239ljn.4 for ; Thu, 16 Jul 2020 10:04:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=g2kCsm/bm1ve9jG+Cs+m54b7wlSjss2b9ylTUfr3H5Q=; b=SmOJInW8UzmPU3mqjDPCmjZFhN/t2ZScfhEKYZb3jaL+M1UhR0jdgzDcrSW7i3qQYP FhShRV+wjnwoaJP1VNmeb4BKME47gsr56p2OPQm0wati9CLgyCvsGy80KwrjimucLuF7 gV0/qhIwmOB9cH0MpkgXUXNCF6NkFNwUuF8FMqLCaamd4smxNNEcZUAk/jPPmwGjvxWr E3aFdMyb+qiHFZbjT2MGpl/Yva9H4VkF5ZuYTKBEgXn+kpzvQ5WBPbd+ozFtrD3UF2Zx lneXnovYbFNNoHepJ8V37MuOgdvD7ON+QFe2Jiice5+Cvx2ZyAxhl74GwT9JOVl/loqs MA4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=g2kCsm/bm1ve9jG+Cs+m54b7wlSjss2b9ylTUfr3H5Q=; b=CYPqv91ybsg9kJSW5PjnvV0G3zrzNeI2AbJgxi6murLVtDBeosUrcHC7Of8bjPmUak Pt52FeucIjMRQ0ybnLzYLTo5+VpuGpebo4Y2+R2osmS5jnA9/nv1zJLlILt4UXx2dJB8 KOVtstjO1vKY5dPwp08/pZ15lQOdNX+P8PGDHSq360Q93wNdYxXHvKZLnsJHs23VGLLu wa/Ch7UZCb6AnxyrfERKINs2R2e+qgFnUuiN1UF0Lq4AUxJf4mtVEndbV4vi1/lbRsnt jzt1hS6/ZBr6dxOTex7VLXuZk0QsaI/0aUR/Ubi9VbYfZhDKCOOamyhZM0qjQSBmyfDb xtAQ== X-Gm-Message-State: AOAM532Th7zODNfVZJBlekizS6TnaRWEx8MPg+bA08f42WKksQbH8w2t kpmMtqewz3msUC7hGppBmRbWlr/lBz8SITudi29yyw== X-Received: by 2002:a05:651c:10f:: with SMTP id a15mr2396471ljb.192.1594919094710; Thu, 16 Jul 2020 10:04:54 -0700 (PDT) MIME-Version: 1.0 References: <1585649077-10896-1-git-send-email-laoar.shao@gmail.com> In-Reply-To: From: Shakeel Butt Date: Thu, 16 Jul 2020 10:04:43 -0700 Message-ID: Subject: Re: [PATCH v2 0/2] psi: enhance psi with the help of ebpf To: Yafang Shao Cc: Johannes Weiner , "Peter Zijlstra (Intel)" , Andrew Morton , Linux MM , "open list:BLOCK LAYER" , LKML Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 15, 2020 at 8:19 PM Yafang Shao wrote: > > On Thu, Jul 16, 2020 at 12:36 AM Shakeel Butt wrote: > > > > Hi Yafang, > > > > On Tue, Mar 31, 2020 at 3:05 AM Yafang Shao wrote: > > > > > > PSI gives us a powerful way to anaylze memory pressure issue, but we can > > > make it more powerful with the help of tracepoint, kprobe, ebpf and etc. > > > Especially with ebpf we can flexiblely get more details of the memory > > > pressure. > > > > > > In orderc to achieve this goal, a new parameter is added into > > > psi_memstall_{enter, leave}, which indicates the specific type of a > > > memstall. There're totally ten memstalls by now, > > > MEMSTALL_KSWAPD > > > MEMSTALL_RECLAIM_DIRECT > > > MEMSTALL_RECLAIM_MEMCG > > > MEMSTALL_RECLAIM_HIGH > > > MEMSTALL_KCOMPACTD > > > MEMSTALL_COMPACT > > > MEMSTALL_WORKINGSET_REFAULT > > > MEMSTALL_WORKINGSET_THRASH > > > MEMSTALL_MEMDELAY > > > MEMSTALL_SWAPIO > > > With the help of kprobe or tracepoint to trace this newly added agument we > > > can know which type of memstall it is and then do corresponding > > > improvement. I can also help us to analyze the latency spike caused by > > > memory pressure. > > > > > > But note that we can't use it to build memory pressure for a specific type > > > of memstall, e.g. memcg pressure, compaction pressure and etc, because it > > > doesn't implement various types of task->in_memstall, e.g. > > > task->in_memcgstall, task->in_compactionstall and etc. > > > > > > Although there're already some tracepoints can help us to achieve this > > > goal, e.g. > > > vmscan:mm_vmscan_kswapd_{wake, sleep} > > > vmscan:mm_vmscan_direct_reclaim_{begin, end} > > > vmscan:mm_vmscan_memcg_reclaim_{begin, end} > > > /* no tracepoint for memcg high reclaim*/ > > > compcation:mm_compaction_kcompactd_{wake, sleep} > > > compcation:mm_compaction_begin_{begin, end} > > > /* no tracepoint for workingset refault */ > > > /* no tracepoint for workingset thrashing */ > > > /* no tracepoint for use memdelay */ > > > /* no tracepoint for swapio */ > > > but psi_memstall_{enter, leave} gives us a unified entrance for all > > > types of memstall and we don't need to add many begin and end tracepoints > > > that hasn't been implemented yet. > > > > > > Patch #2 gives us an example of how to use it with ebpf. With the help of > > > ebpf we can trace a specific task, application, container and etc. It also > > > can help us to analyze the spread of latencies and whether they were > > > clustered at a point of time or spread out over long periods of time. > > > > > > To summarize, with the pressure data in /proc/pressure/memroy we know that > > > the system is under memory pressure, and then with the newly added tracing > > > facility in this patchset we can get the reason of this memory pressure, > > > and then thinks about how to make the change. > > > The workflow can be illustrated as bellow. > > > > > > REASON ACTION > > > | compcation | improve compcation | > > > | vmscan | improve vmscan | > > > Memory pressure -| workingset | improve workingset | > > > | etc | ... | > > > > > > > I have not looked at the patch series in detail but I wanted to get > > your thoughts if it is possible to achieve what I am trying to do with > > this patch series. > > > > At the moment I am only interested in global reclaim and I wanted to > > enable alerts like "alert if there is process stuck in global reclaim > > for x seconds in last y seconds window" or "alert if all the processes > > are stuck in global reclaim for some z seconds". > > > > I see that using this series I can identify global reclaim but I am > > wondering if alert or notifications are possible. Android is using psi > > monitors for such alerts but it does not use cgroups, so, most of the > > memstalls are related to global reclaim stall. For cgroup environment, > > do we need for add support to psi monitor similar to this patch > > series? > > > > Hi Shakeel, > > We use the PSI tracepoints in our kernel to analyze the individual > latency caused by memory pressure, but the PSI tracepoints are > implemented with a new version as bellow: > trace_psi_memstall_enter(_RET_IP_); > trace_psi_memstall_leave(_RET_IP_); > And then using the _RET_IP_ to identify the specific PSI type. > > If the _RET_IP_ is at try_to_free_mem_cgroup_pages(), then it means > the pressure caused by the memory cgroup, IOW, the limit of memcg is > reached and it has to do memcg reclaim. Otherwise we can consider it > as global memory pressure. > try_to_free_mem_cgroup_pages > psi_memstall_enter > if (static_branch_likely(&psi_disabled)) > return; > *flags = current->in_memstall; > if (*flags) > return; > trace_psi_memstall_enter(_RET_IP_); <<<<< memcg pressure > Thanks for the response. I am looking for 'always on' monitoring. More specifically defining the system level SLIs based on PSI. My concern with ftrace is its global shared state and also it is not really for 'always on' monitoring. You have mentioned ebpf. Is ebpf fine for 'always on' monitoring and is it possible to notify user space by ebpf on specific conditions (e.g. a process stuck in global reclaim for 60 seconds)? thanks, Shakeel