Received: by 10.213.65.68 with SMTP id h4csp382856imn; Fri, 30 Mar 2018 07:22:10 -0700 (PDT) X-Google-Smtp-Source: AIpwx49uQlrnVrwEVFneflfDQXqUnVDE0Ux3mSE5f4mKAg4XrR8LYx/7QVF4DzoD5CQ5VWioZ+FC X-Received: by 10.98.10.131 with SMTP id 3mr10101115pfk.112.1522419730132; Fri, 30 Mar 2018 07:22:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1522419730; cv=none; d=google.com; s=arc-20160816; b=0DrGxqnQMqC2TWeEIk7kdL4uk7W27OkVoN55zRCuRVhOdiDSCA4NpBl1E5spqSjUk1 dx0HBq9JMUKZPMLPAf88OiczSwU4KUBTPVucj3IH/BcbqU6ClJ6leZ/dcieLrcUtl2up nkIrR4iCDfgnwt4ZVtFv7jFx4pFBMQnGkz8JAO2j4wlJjqirZQauR9T6rH88kk8aouiz XJ/vCf8ur7awnbtDtGs2jiKwsEIF4YhW+QjrJlWhZdcf61G/CDbuRqWI4ir3V4GgtU4d Lbtd8H/MGTYE4Z4z3J+Dx2eJk7Jr6kvoQgEXTl9q28iQX2KRepDEwHFYRxxgkAQF44sO Hgtg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dmarc-filter:arc-authentication-results; bh=RgkV/8VmT12psWGGoEQNMGLpPdw3BSQcxrnw9mKg1XI=; b=V8yku5FnZC5AU98aFtkdsJC/nvgTPxUAadYvYK9HaEr52hPtGpnewFtuSOchE6eS1G vbQ/3ex73A0GZrYWkteLCwde3GKNbc7IBLNVPLCu0Gm3JIWPHKZFhDLXvT6GWNgvfKfm wKJE3EwMm52WuA3BPqehACleyUjtqqWK1ywrd9peR4HQLZozMyOKuX01PGTupA3znIMB kTLqcUIJxgoSIWBeDFKkb+9+fT4YUdU394+3SWM8UC9/vfcIV1CxbRaLkma0DEs7StD+ UIVsUsSlUV/VipSK8t2UY0ipoBwvtbVM5cHj2IWz2Mb/2+WxCafN7Cqn74YXXEDM6hqO Ug8g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a11-v6si8234154plp.363.2018.03.30.07.21.55; Fri, 30 Mar 2018 07:22:10 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751384AbeC3OUn (ORCPT + 99 others); Fri, 30 Mar 2018 10:20:43 -0400 Received: from mail.kernel.org ([198.145.29.99]:32906 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750794AbeC3OUl (ORCPT ); Fri, 30 Mar 2018 10:20:41 -0400 Received: from gandalf.local.home (cpe-66-24-56-78.stny.res.rr.com [66.24.56.78]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 1D9BD21777; Fri, 30 Mar 2018 14:20:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1D9BD21777 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=goodmis.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=rostedt@goodmis.org Date: Fri, 30 Mar 2018 10:20:38 -0400 From: Steven Rostedt To: Zhaoyang Huang Cc: Ingo Molnar , linux-kernel@vger.kernel.org, kernel-patch-test@lists.linaro.org, Andrew Morton , Joel Fernandes , Michal Hocko , linux-mm@kvack.org, Vlastimil Babka , Michal Hocko Subject: Re: [PATCH v1] kernel/trace:check the val against the available mem Message-ID: <20180330102038.2378925b@gandalf.local.home> In-Reply-To: <1522320104-6573-1-git-send-email-zhaoyang.huang@spreadtrum.com> References: <1522320104-6573-1-git-send-email-zhaoyang.huang@spreadtrum.com> X-Mailer: Claws Mail 3.16.0 (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [ Adding memory management folks to discuss the issue ] On Thu, 29 Mar 2018 18:41:44 +0800 Zhaoyang Huang wrote: > It is reported that some user app would like to echo a huge > number to "/sys/kernel/debug/tracing/buffer_size_kb" regardless > of the available memory, which will cause the coinstantaneous > page allocation failed and introduce OOM. The commit checking the > val against the available mem first to avoid the consequence allocation. > > Signed-off-by: Zhaoyang Huang > --- > kernel/trace/trace.c | 39 ++++++++++++++++++++++++++++++++++++++- > 1 file changed, 38 insertions(+), 1 deletion(-) > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 2d0ffcc..a4a4237 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -43,6 +43,8 @@ > #include > #include > > +#include > +#include > #include "trace.h" > #include "trace_output.h" > > @@ -5967,6 +5969,39 @@ static ssize_t tracing_splice_read_pipe(struct file *filp, > return ret; > } > > +static long get_available_mem(void) > +{ > + struct sysinfo i; > + long available; > + unsigned long pagecache; > + unsigned long wmark_low = 0; > + unsigned long pages[NR_LRU_LISTS]; > + struct zone *zone; > + int lru; > + > + si_meminfo(&i); > + si_swapinfo(&i); > + > + for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++) > + pages[lru] = global_page_state(NR_LRU_BASE + lru); > + > + for_each_zone(zone) > + wmark_low += zone->watermark[WMARK_LOW]; > + > + available = i.freeram - wmark_low; > + > + pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE]; > + pagecache -= min(pagecache / 2, wmark_low); > + available += pagecache; > + > + available += global_page_state(NR_SLAB_RECLAIMABLE) - > + min(global_page_state(NR_SLAB_RECLAIMABLE) / 2, wmark_low); > + > + if (available < 0) > + available = 0; > + return available; > +} > + As I stated in my other reply, the above function does not belong in tracing. That said, it appears you are having issues that were caused by the change by commit 848618857d2 ("tracing/ring_buffer: Try harder to allocate"), where we replaced NORETRY with RETRY_MAYFAIL. The point of NORETRY was to keep allocations of the tracing ring-buffer from causing OOMs. But the RETRY was too strong in that case, because there were those that wanted to allocate large ring buffers but it would fail due to memory being used that could be reclaimed. Supposedly, RETRY_MAYFAIL is to allocate with reclaim but still allow to fail, and isn't suppose to trigger an OOM. From my own tests, this is obviously not the case. Perhaps this is because the ring buffer allocates one page at a time, and by doing so, it can get every last available page, and if anything in the mean time does an allocation without MAYFAIL, it will cause an OOM. For example, when I stressed this I triggered this: pool invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 pool cpuset=/ mems_allowed=0 CPU: 7 PID: 1040 Comm: pool Not tainted 4.16.0-rc4-test+ #663 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 Call Trace: dump_stack+0x8e/0xce dump_header.isra.30+0x6e/0x28f ? _raw_spin_unlock_irqrestore+0x30/0x60 oom_kill_process+0x218/0x400 ? has_capability_noaudit+0x17/0x20 out_of_memory+0xe3/0x5c0 __alloc_pages_slowpath+0xa8e/0xe50 __alloc_pages_nodemask+0x206/0x220 alloc_pages_current+0x6a/0xe0 __page_cache_alloc+0x6a/0xa0 filemap_fault+0x208/0x5f0 ? __might_sleep+0x4a/0x80 ext4_filemap_fault+0x31/0x44 __do_fault+0x20/0xd0 __handle_mm_fault+0xc08/0x1160 handle_mm_fault+0x76/0x110 __do_page_fault+0x299/0x580 do_page_fault+0x2d/0x110 ? page_fault+0x2f/0x50 page_fault+0x45/0x50 I wonder if I should have the ring buffer allocate groups of pages, to avoid this. Or try to allocate with NORETRY, one page at a time, and when that fails, allocate groups of pages with RETRY_MAYFAIL, and that may keep it from causing an OOM? Thoughts? -- Steve > static ssize_t > tracing_entries_write(struct file *filp, const char __user *ubuf, > size_t cnt, loff_t *ppos) > @@ -5975,13 +6010,15 @@ static ssize_t tracing_splice_read_pipe(struct file *filp, > struct trace_array *tr = inode->i_private; > unsigned long val; > int ret; > + long available; > > + available = get_available_mem(); > ret = kstrtoul_from_user(ubuf, cnt, 10, &val); > if (ret) > return ret; > > /* must have at least 1 entry */ > - if (!val) > + if (!val || (val > available)) > return -EINVAL; > > /* value is in KB */