From: Suren Baghdasaryan
Date: Mon, 23 Nov 2020 21:39:43 -0800
Subject: [PATCH 2/2] mm/madvise: add process_madvise MADV_DONTNEED support
To: surenb@google.com
Cc: akpm@linux-foundation.org, mhocko@kernel.org, mhocko@suse.com,
    rientjes@google.com, willy@infradead.org, hannes@cmpxchg.org,
    guro@fb.com, riel@surriel.com, minchan@kernel.org,
    christian@brauner.io, oleg@redhat.com, timmurray@google.com,
    linux-api@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com
Message-Id: <20201124053943.1684874-3-surenb@google.com>
In-Reply-To: <20201124053943.1684874-1-surenb@google.com>
References: <20201124053943.1684874-1-surenb@google.com>
X-Mailer: git-send-email 2.29.2.454.gaff20da3a2-goog
X-Mailing-List: linux-kernel@vger.kernel.org

In modern systems it is not unusual to have a system component that
monitors memory conditions and is tasked with keeping system memory
pressure under control. One way to accomplish that is to kill
non-essential processes to free up memory for more important ones.
Examples of this are Facebook's OOM killer daemon, oomd, and Android's
low memory killer daemon, lmkd.

For such a system component it is important to be able to free memory
quickly and efficiently. Unfortunately, the time a process takes to free
its memory after receiving a SIGKILL can vary with the state of the
process (for example, uninterruptible sleep), its size, and the OPP level
of the core the process is running on. In such a situation it is
desirable to be able to free the memory of the process being killed in a
more controlled way.

Enable MADV_DONTNEED to be used with process_madvise() on a dying
process to reclaim its memory. This allows userspace system components
like oomd and lmkd to free the memory of the target process in a more
predictable way.

Signed-off-by: Suren Baghdasaryan
---
 mm/madvise.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 1aa074a46524..11306534369e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -995,6 +996,18 @@ process_madvise_behavior_valid(int behavior)
 	switch (behavior) {
 	case MADV_COLD:
 	case MADV_PAGEOUT:
+	case MADV_DONTNEED:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool madvise_destructive(int behavior)
+{
+	switch (behavior) {
+	case MADV_DONTNEED:
+	case MADV_FREE:
 		return true;
 	default:
 		return false;
@@ -1006,6 +1019,10 @@ static bool can_range_madv_lru_vma(struct vm_area_struct *vma, int behavior)
 	if (!can_madv_lru_vma(vma))
 		return false;
 
+	/* For destructive madvise, skip shared file-backed VMAs */
+	if (madvise_destructive(behavior))
+		return vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED);
+
 	return true;
 }
 
@@ -1239,6 +1256,23 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		goto release_task;
 	}
 
+	if (madvise_destructive(behavior)) {
+		/* Allow destructive madvise only on dying processes */
+		if (!signal_group_exit(task->signal)) {
+			ret = -EINVAL;
+			goto release_mm;
+		}
+		/* Ensure no competition with OOM-killer to avoid contention */
+		if (unlikely(mm_is_oom_victim(mm)) ||
+		    unlikely(test_bit(MMF_OOM_SKIP, &mm->flags))) {
+			/* Already being reclaimed */
+			ret = 0;
+			goto release_mm;
+		}
+		/* Mark mm as unstable */
+		set_bit(MMF_UNSTABLE, &mm->flags);
+	}
+
 	/*
 	 * For range madvise only the entire address space is supported for now
 	 * and input iovec is ignored.
-- 
2.29.2.454.gaff20da3a2-goog
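
For readers following the control flow, the checks this patch adds can be
modeled in plain userspace C roughly as below. This is only a sketch, not
kernel code: every model_* and GATE_* name is hypothetical, the booleans
stand in for signal_group_exit(), mm_is_oom_victim(), the MMF_OOM_SKIP
test and the MMF_UNSTABLE bit, and the VMA helper assumes that the
existing can_madv_lru_vma() check has already passed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the advice values handled by the patch. */
enum model_advice {
	MODEL_MADV_COLD,
	MODEL_MADV_PAGEOUT,
	MODEL_MADV_DONTNEED,
	MODEL_MADV_FREE,
};

/* Outcome of the syscall-level gate added by the last hunk. */
enum model_gate_result {
	GATE_PROCEED,           /* continue with the madvise operation */
	GATE_REJECT,            /* -EINVAL: target process is not dying */
	GATE_ALREADY_RECLAIMED, /* return 0: OOM killer already owns the mm */
};

/* Booleans standing in for the kernel state the patch inspects. */
struct model_task {
	bool group_exit; /* signal_group_exit(task->signal) */
	bool oom_victim; /* mm_is_oom_victim(mm) */
	bool oom_skip;   /* test_bit(MMF_OOM_SKIP, &mm->flags) */
	bool unstable;   /* MMF_UNSTABLE, set when the gate lets us through */
};

/* Mirrors madvise_destructive(): advice that discards page contents. */
bool model_destructive(enum model_advice advice)
{
	return advice == MODEL_MADV_DONTNEED || advice == MODEL_MADV_FREE;
}

/* Mirrors the block added to the process_madvise() syscall body. */
enum model_gate_result model_gate(enum model_advice advice,
				  struct model_task *t)
{
	if (!model_destructive(advice))
		return GATE_PROCEED;
	if (!t->group_exit)
		return GATE_REJECT;
	if (t->oom_victim || t->oom_skip)
		return GATE_ALREADY_RECLAIMED;
	t->unstable = true;
	return GATE_PROCEED;
}

/* Mirrors the VMA filter: destructive advice skips shared file mappings. */
bool model_vma_allowed(enum model_advice advice, bool anonymous, bool shared)
{
	if (model_destructive(advice))
		return anonymous || !shared;
	return true;
}

/* Small usage example exercising the three gate outcomes. */
void model_examples(void)
{
	struct model_task alive = { .group_exit = false };
	struct model_task dying = { .group_exit = true };
	struct model_task victim = { .group_exit = true, .oom_victim = true };

	assert(model_gate(MODEL_MADV_DONTNEED, &alive) == GATE_REJECT);
	assert(model_gate(MODEL_MADV_DONTNEED, &dying) == GATE_PROCEED);
	assert(dying.unstable);
	assert(model_gate(MODEL_MADV_DONTNEED, &victim) == GATE_ALREADY_RECLAIMED);
}
```

Note that non-destructive advice (MADV_COLD, MADV_PAGEOUT) bypasses the
gate entirely, so existing callers see no change in behavior.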