From: Zhaoyang Huang
Date: Tue, 10 Apr 2018 16:38:32 +0800
Subject: Re: [PATCH v1] ringbuffer: Don't choose the process with adj equal OOM_SCORE_ADJ_MIN
To: Michal Hocko
Cc: Steven Rostedt, Ingo Molnar, LKML
In-Reply-To: <20180410081231.GV21835@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 10, 2018 at 4:12 PM, Michal Hocko wrote:
> On Tue 10-04-18 16:04:40, Zhaoyang Huang wrote:
>> On Tue, Apr 10, 2018 at 3:49 PM, Michal Hocko wrote:
>> > On Tue 10-04-18 14:39:35, Zhaoyang Huang wrote:
>> >> On Tue, Apr 10, 2018 at 2:14 PM, Michal Hocko wrote:
> [...]
>> >> > OOM_SCORE_ADJ_MIN means "hide the process from the OOM killer completely".
>> >> > So what exactly do you want to achieve here? Because from the above it
>> >> > sounds like opposite things. /me confused...
>> >> >
>> >> Steve's patch intend to have the process be OOM's victim when it
>> >> over-allocating pages for ring buffer. I amend a patch over to protect
>> >> process with OOM_SCORE_ADJ_MIN from doing so. Because it will make
>> >> such process to be selected by current OOM's way of
>> >> selecting.(consider OOM_FLAG_ORIGIN first before the adj)
>> >
>> > I just wouldn't really care unless there is an existing and reasonable
>> > usecase for an application which updates the ring buffer size _and_ it
>> > is OOM disabled at the same time.
>> There is indeed such kind of test case on my android system, which is
>> known as CTS and Monkey etc.
>
> Does the test simulate a real workload? I mean we have two things here
>
> oom disabled task and an updater of the ftrace ring buffer to a
> potentially large size. The second can be completely isolated to a
> different context, no? So why do they run in the single user process
> context?

OK, I think there is some misunderstanding here, so let me try to
explain in more detail. There is just one thing here, not two. The
updater is originally an OOM-disabled task with
adj == OOM_SCORE_ADJ_MIN. With Steven's patch, it periodically becomes
an OOM-killable task, because set_current_oom_origin() is called for
any user process that is enlarging the ring buffer. What my patch does
is limit this to user processes with adj > -1000.

>
>> Furthermore, I think we should make the
>> patch to be as safest as possible. Why do we leave a potential risk
>> here? There is no side effect for my patch.
>
> I do not have the full context. Could you point me to your patch?

Here are Steven's patch and mine (my change first, then Steven's):

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5f38398..1005d73 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1135,7 +1135,7 @@ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu)
 {
 	struct buffer_page *bpage, *tmp;
-	bool user_thread = current->mm != NULL;
+	bool user_thread = (current->mm != NULL && current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN); //by zhaoyang
 	gfp_t mflags;
 	long i;

-----------------------------------------------------------------------------------------------------
 {
 	struct buffer_page *bpage, *tmp;
+	bool user_thread = current->mm != NULL;
+	gfp_t mflags;
 	long i;

-	/* Check if the available memory is there first */
+	/*
+	 * Check if the available memory is there first.
+	 * Note, si_mem_available() only gives us a rough estimate of available
+	 * memory. It may not be accurate. But we don't care, we just want
+	 * to prevent doing any allocation when it is obvious that it is
+	 * not going to succeed.
+	 */
 	i = si_mem_available();
 	if (i < nr_pages)
 		return -ENOMEM;

+	/*
+	 * __GFP_RETRY_MAYFAIL flag makes sure that the allocation fails
+	 * gracefully without invoking oom-killer and the system is not
+	 * destabilized.
+	 */
+	mflags = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
+
+	/*
+	 * If a user thread allocates too much, and si_mem_available()
+	 * reports there's enough memory, even though there is not.
+	 * Make sure the OOM killer kills this thread. This can happen
+	 * even with RETRY_MAYFAIL because another task may be doing
+	 * an allocation after this task has taken all memory.
+	 * This is the task the OOM killer needs to take out during this
+	 * loop, even if it was triggered by an allocation somewhere else.
+	 */
+	if (user_thread)
+		set_current_oom_origin();
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
-		/*
-		 * __GFP_RETRY_MAYFAIL flag makes sure that the allocation fails
-		 * gracefully without invoking oom-killer and the system is not
-		 * destabilized.
-		 */
+
 		bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
-				    GFP_KERNEL | __GFP_RETRY_MAYFAIL,
-				    cpu_to_node(cpu));
+				    mflags, cpu_to_node(cpu));
 		if (!bpage)
 			goto free_pages;

 		list_add(&bpage->list, pages);

-		page = alloc_pages_node(cpu_to_node(cpu),
-					GFP_KERNEL | __GFP_RETRY_MAYFAIL, 0);
+		page = alloc_pages_node(cpu_to_node(cpu), mflags, 0);
 		if (!page)
 			goto free_pages;
 		bpage->page = page_address(page);
 		rb_init_page(bpage->page);
+
+		if (user_thread && fatal_signal_pending(current))
+			goto free_pages;
 	}
+	if (user_thread)
+		clear_current_oom_origin();

 	return 0;

@@ -1199,6 +1225,8 @@ static int __rb_allocate_pages(long nr_pages, struct list_head *pages, int cpu)
 		list_del_init(&bpage->list);
 		free_buffer_page(bpage);
 	}
+	if (user_thread)
+		clear_current_oom_origin();

 	return -ENOMEM;
 }

> --
> Michal Hocko
> SUSE Labs
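
To make the selection order discussed above concrete, here is a small userspace sketch. It is not kernel code. It only models the ordering described in this thread: the OOM origin flag is considered before adj, so a task with adj == OOM_SCORE_ADJ_MIN can still be picked while the flag is set. The struct model_task type and all model_* / user_thread_* / points_during_resize names are hypothetical, made up for this illustration, and the "usage" field is just a stand-in for the real badness calculation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical stand-in for the few task fields discussed in this thread. */
struct model_task {
	bool has_mm;          /* models current->mm != NULL */
	bool oom_origin;      /* models the flag set by set_current_oom_origin() */
	int oom_score_adj;    /* models current->signal->oom_score_adj */
	unsigned long usage;  /* simplified stand-in for the badness score */
};

/*
 * Simplified model of the selection order: the origin flag is checked
 * first and wins outright; only afterwards does adj hide the task.
 * A result of 0 means "never selected".
 */
static unsigned long model_oom_points(const struct model_task *t)
{
	assert(t != NULL);
	if (t->oom_origin)
		return ULONG_MAX;
	if (t->oom_score_adj == OOM_SCORE_ADJ_MIN)
		return 0;
	return t->usage;
}

/* user_thread as computed in Steven's patch. */
static bool user_thread_steven(const struct model_task *t)
{
	return t->has_mm;
}

/* user_thread as computed with the amended check. */
static bool user_thread_zhaoyang(const struct model_task *t)
{
	return t->has_mm && t->oom_score_adj != OOM_SCORE_ADJ_MIN;
}

/*
 * Models the allocation window: the origin flag is set only when
 * user_thread is true. The task is passed by value, so the caller's
 * copy is left unchanged.
 */
static unsigned long points_during_resize(struct model_task t, bool user_thread)
{
	t.oom_origin = user_thread;
	return model_oom_points(&t);
}
```

For a user task with adj == OOM_SCORE_ADJ_MIN, points_during_resize() in this model gives ULONG_MAX with user_thread_steven() and 0 with user_thread_zhaoyang(), which is the difference the amended check is meant to make.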