From: Joel Fernandes
Date: Fri, 30 Mar 2018 09:37:58 -0700
Subject: Re: [PATCH v1] kernel/trace:check the val against the available mem
To: Steven Rostedt
Cc: Zhaoyang Huang, Ingo Molnar, LKML, kernel-patch-test@lists.linaro.org, Andrew Morton, Michal Hocko, "open list:MEMORY MANAGEMENT", Vlastimil Babka, Michal Hocko
In-Reply-To: <20180330102038.2378925b@gandalf.local.home>
References: <1522320104-6573-1-git-send-email-zhaoyang.huang@spreadtrum.com> <20180330102038.2378925b@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Steve,

On Fri, Mar 30, 2018 at 7:20 AM, Steven Rostedt wrote:
>
> [ Adding memory management folks to discuss the issue ]
>
> On Thu, 29 Mar 2018 18:41:44 +0800
> Zhaoyang Huang wrote:
>
>> It is reported that some user app would like to echo a huge
>> number to "/sys/kernel/debug/tracing/buffer_size_kb" regardless
>> of the available memory, which will cause the coinstantaneous
>> page allocation failed and introduce OOM. The commit checking the
>> val against the available mem first to avoid the consequence allocation.
>>
>> Signed-off-by: Zhaoyang Huang
>> ---
>>  kernel/trace/trace.c | 39 ++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 38 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 2d0ffcc..a4a4237 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -43,6 +43,8 @@
>>  #include
>>  #include
>>
>> +#include
>> +#include
>>  #include "trace.h"
>>  #include "trace_output.h"
>>
>> @@ -5967,6 +5969,39 @@ static ssize_t tracing_splice_read_pipe(struct file *filp,
>>  	return ret;
>>  }
>>
>> +static long get_available_mem(void)
>> +{
>> +	struct sysinfo i;
>> +	long available;
>> +	unsigned long pagecache;
>> +	unsigned long wmark_low = 0;
>> +	unsigned long pages[NR_LRU_LISTS];
>> +	struct zone *zone;
>> +	int lru;
>> +
>> +	si_meminfo(&i);
>> +	si_swapinfo(&i);
>> +
>> +	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
>> +		pages[lru] = global_page_state(NR_LRU_BASE + lru);
>> +
>> +	for_each_zone(zone)
>> +		wmark_low += zone->watermark[WMARK_LOW];
>> +
>> +	available = i.freeram - wmark_low;
>> +
>> +	pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
>> +	pagecache -= min(pagecache / 2, wmark_low);
>> +	available += pagecache;
>> +
>> +	available += global_page_state(NR_SLAB_RECLAIMABLE) -
>> +		min(global_page_state(NR_SLAB_RECLAIMABLE) / 2, wmark_low);
>> +
>> +	if (available < 0)
>> +		available = 0;
>> +	return available;
>> +}
>> +
>
> As I stated in my other reply, the above function does not belong in
> tracing.
>
> That said, it appears you are having issues that were caused by the
> change by commit 848618857d2 ("tracing/ring_buffer: Try harder to
> allocate"), where we replaced NORETRY with RETRY_MAYFAIL. The point of
> NORETRY was to keep allocations of the tracing ring-buffer from causing
> OOMs. But the RETRY was too strong in that case, because there were

Yes this was discussed with -mm folks.
Basically the problem we were seeing is that on devices with tonnes of
free memory (free in the sense of reclaimable, i.e. held by page cache),
that memory was not being used, so allocating the ring buffer failed
unnecessarily on systems that otherwise had lots of memory.

> those that wanted to allocate large ring buffers but it would fail due
> to memory being used that could be reclaimed. Supposedly, RETRY_MAYFAIL
> is to allocate with reclaim but still allow to fail, and isn't suppose
> to trigger an OOM. From my own tests, this is obviously not the case.
>

IIRC, the OOM that my patch was trying to avoid was being triggered
in the path/context of the write to buffer_size_kb itself (when not
doing the NORETRY), not by other processes.

> Perhaps this is because the ring buffer allocates one page at a time,
> and by doing so, it can get every last available page, and if anything
> in the mean time does an allocation without MAYFAIL, it will cause an
> OOM. For example, when I stressed this I triggered this:
>
> pool invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
> pool cpuset=/ mems_allowed=0
> CPU: 7 PID: 1040 Comm: pool Not tainted 4.16.0-rc4-test+ #663
> Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> Call Trace:
> dump_stack+0x8e/0xce
> dump_header.isra.30+0x6e/0x28f
> ? _raw_spin_unlock_irqrestore+0x30/0x60
> oom_kill_process+0x218/0x400
> ? has_capability_noaudit+0x17/0x20
> out_of_memory+0xe3/0x5c0
> __alloc_pages_slowpath+0xa8e/0xe50
> __alloc_pages_nodemask+0x206/0x220
> alloc_pages_current+0x6a/0xe0
> __page_cache_alloc+0x6a/0xa0
> filemap_fault+0x208/0x5f0
> ? __might_sleep+0x4a/0x80
> ext4_filemap_fault+0x31/0x44
> __do_fault+0x20/0xd0
> __handle_mm_fault+0xc08/0x1160
> handle_mm_fault+0x76/0x110
> __do_page_fault+0x299/0x580
> do_page_fault+0x2d/0x110
> ? page_fault+0x2f/0x50
> page_fault+0x45/0x50

But this OOM is not in the path of the buffer_size_kb write, right?
So then what does it have to do with buffer_size_kb write failure?

I guess the original issue reported is that the buffer_size_kb write
causes *other* applications to fail allocation. So in that case,
capping the amount that ftrace writes makes sense. Basically my point
is I don't see how the patch you mentioned introduces the problem here
- in the sense the patch just makes ftrace allocate from memory it
couldn't before and to try harder.

>
> I wonder if I should have the ring buffer allocate groups of pages, to
> avoid this. Or try to allocate with NORETRY, one page at a time, and
> when that fails, allocate groups of pages with RETRY_MAYFAIL, and that
> may keep it from causing an OOM?
>

I don't see immediately how that can prevent an OOM in other
applications here? If ftrace allocates lots of memory with
RETRY_MAYFAIL, then we would still OOM in other applications if memory
isn't available. Sorry if I missed something.

Thanks,

- Joel
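
For reference, the arithmetic in the quoted get_available_mem() can be
tried out in userspace. The following is only a rough sketch under
stated assumptions, not kernel code: struct mem_snapshot,
estimate_available_pages() and cap_buffer_size_kb() are hypothetical
names, the counters are passed in as plain numbers (in pages) instead
of being read via si_meminfo()/global_page_state(), and the capping
helper is one possible reading of "capping the amount that ftrace
writes", not what the patch itself does.

```c
#include <assert.h>

/* Hypothetical snapshot of the counters the quoted patch reads, in pages. */
struct mem_snapshot {
	long free_pages;        /* i.freeram in the patch */
	long wmark_low;         /* sum of zone->watermark[WMARK_LOW] */
	long active_file;       /* pages[LRU_ACTIVE_FILE] */
	long inactive_file;     /* pages[LRU_INACTIVE_FILE] */
	long slab_reclaimable;  /* NR_SLAB_RECLAIMABLE */
};

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/* Same steps as the quoted get_available_mem(), on plain numbers. */
long estimate_available_pages(const struct mem_snapshot *m)
{
	/* Free memory above the low watermark. */
	long available = m->free_pages - m->wmark_low;

	/* Count file page cache, minus the smaller of half of it and wmark_low. */
	long pagecache = m->active_file + m->inactive_file;
	pagecache -= min_long(pagecache / 2, m->wmark_low);
	available += pagecache;

	/* Same treatment for reclaimable slab. */
	available += m->slab_reclaimable -
		     min_long(m->slab_reclaimable / 2, m->wmark_low);

	return available < 0 ? 0 : available;
}

/* Assumed policy: clamp the requested size (KB) to the estimate. */
long cap_buffer_size_kb(long requested_kb, long available_pages,
			long page_size_kb)
{
	assert(page_size_kb > 0);
	long available_kb = available_pages * page_size_kb;

	return requested_kb < available_kb ? requested_kb : available_kb;
}
```

With free_pages=1000, wmark_low=200, active_file=300, inactive_file=100
and slab_reclaimable=50, the estimate is 800 + 200 + 25 = 1025 pages,
so with 4 KB pages a huge request would be clamped to 4100 KB. Note that
this only bounds the request at the time of the write; it does not
address the question above of other allocations racing with the ring
buffer.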