Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp3885453yba; Tue, 7 May 2019 08:33:48 -0700 (PDT) X-Google-Smtp-Source: APXvYqxYRD+rfpHMAerk0x2i9iRlPH1xLqvEevk4P2vR9dwLYuUM/sqbjCpXeSJvXUB1HvOi38GO X-Received: by 2002:a17:902:e708:: with SMTP id co8mr9735826plb.141.1557243228754; Tue, 07 May 2019 08:33:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1557243228; cv=none; d=google.com; s=arc-20160816; b=BuAlrqTDpH2Fryn/MR85oJbCWV84r6Z+1mYfgYH3OisOjtzc4Bq8rl6TWTIW7vNBoa be2Fs6DI5o8113mMX/PtWnjU0wenZfvPD+tZrW+NpxkzX/RN1Ws+d1HoATBMmlfPuKeQ tEiq7LbJvL2w2zalHSdmETh0GE1ZACNe1F2i2RpSt1Szfkj6jNN97X/+XDMcrslsGj9z sThzI+ZLWxUYEI5iosTP+9CDVdKKAhPpsgxLD2moIE3/UVceL/kpvlLNyJijFNM4RPtQ jSALrkI4A9v516rSGHQQFmyZ/0B2MqAwWH29UYZ548lPi8/GEW/L4y8GhWVNuX7QiaYX /e1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=5pI5PWgKV+F3OehC1OfcnlFzaNkCdk4j+T3RVss1qZ4=; b=LuwbXa008ynZWa4mtfYeFe9baM7NDSBj2lMjdkzb34ojWdAHjyfGazbhejSabjQjPu JTsClf3kDqvwhHlNQrFMZSk9K9Oj+XAyplxfhb/J58m4kjxsCMgHbRxgap6RwMHtDgBw gXdvwKTftEG3oR+dkeCK1WViDyuHNuQpjNQDsh/Ltt/1NoRy4aW0Ela/EyHoaRjj/gF9 z8nwm3+0OcqrakLhdnm4Xra7f66h3YRu4SOr5sah3n5wR8JQhYrv3dF6jP9M1pBDhI7Q 7tteVTqN0vwdTav9aEbm4rmiXTv/XZj8CLLG0dBnC9ZMspPcsdRgdt4S+daHD+CyISmN ESfw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f24si17656226pgv.511.2019.05.07.08.33.32; Tue, 07 May 2019 08:33:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726755AbfEGPcB (ORCPT + 99 others); Tue, 7 May 2019 11:32:01 -0400 Received: from mx1.redhat.com ([209.132.183.28]:47908 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726335AbfEGPcB (ORCPT ); Tue, 7 May 2019 11:32:01 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BB055308424E; Tue, 7 May 2019 15:31:59 +0000 (UTC) Received: from dhcp-27-174.brq.redhat.com (unknown [10.43.17.159]) by smtp.corp.redhat.com (Postfix) with SMTP id 648915C225; Tue, 7 May 2019 15:31:55 +0000 (UTC) Received: by dhcp-27-174.brq.redhat.com (nbSMTP-1.00) for uid 1000 oleg@redhat.com; Tue, 7 May 2019 17:31:59 +0200 (CEST) Date: Tue, 7 May 2019 17:31:54 +0200 From: Oleg Nesterov To: Sultan Alsawaf Cc: Christian Brauner , Daniel Colascione , Suren Baghdasaryan , Steven Rostedt , Tim Murray , Michal Hocko , Greg Kroah-Hartman , Arve =?iso-8859-1?B?SGr4bm5lduVn?= , Todd Kjos , Martijn Coenen , Ingo Molnar , Peter Zijlstra , LKML , "open list:ANDROID DRIVERS" , linux-mm , kernel-team , Andy Lutomirski , "Serge E. Hallyn" , Kees Cook , Joel Fernandes Subject: Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android Message-ID: <20190507153154.GA5750@redhat.com> References: <20190317015306.GA167393@google.com> <20190317114238.ab6tvvovpkpozld5@brauner.io> <20190318002949.mqknisgt7cmjmt7n@brauner.io> <20190318235052.GA65315@google.com> <20190319221415.baov7x6zoz7hvsno@brauner.io> <20190319231020.tdcttojlbmx57gke@brauner.io> <20190320015249.GC129907@google.com> <20190507021622.GA27300@sultan-box.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190507021622.GA27300@sultan-box.localdomain> User-Agent: Mutt/1.5.24 (2015-08-30) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Tue, 07 May 2019 15:32:00 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org I am not going to comment the intent, but to be honest I am skeptical too. On 05/06, Sultan Alsawaf wrote: > > +static unsigned long find_victims(struct victim_info *varr, int *vindex, > + int vmaxlen, int min_adj, int max_adj) > +{ > + unsigned long pages_found = 0; > + int old_vindex = *vindex; > + struct task_struct *tsk; > + > + for_each_process(tsk) { > + struct task_struct *vtsk; > + unsigned long tasksize; > + short oom_score_adj; > + > + /* Make sure there's space left in the victim array */ > + if (*vindex == vmaxlen) > + break; > + > + /* Don't kill current, kthreads, init, or duplicates */ > + if (same_thread_group(tsk, current) || > + tsk->flags & PF_KTHREAD || > + is_global_init(tsk) || > + vtsk_is_duplicate(varr, *vindex, tsk)) > + continue; > + > + vtsk = find_lock_task_mm(tsk); Did you test this patch with lockdep enabled? If I read the patch correctly, lockdep should complain. vtsk_is_duplicate() ensures that we do not take the same ->alloc_lock twice or more, but lockdep can't know this. > +static void scan_and_kill(unsigned long pages_needed) > +{ > + static DECLARE_WAIT_QUEUE_HEAD(victim_waitq); > + struct victim_info victims[MAX_VICTIMS]; > + int i, nr_to_kill = 0, nr_victims = 0; > + unsigned long pages_found = 0; > + atomic_t victim_count; > + > + /* > + * Hold the tasklist lock so tasks don't disappear while scanning. This > + * is preferred to holding an RCU read lock so that the list of tasks > + * is guaranteed to be up to date. Keep preemption disabled until the > + * SIGKILLs are sent so the victim kill process isn't interrupted. > + */ > + read_lock(&tasklist_lock); > + preempt_disable(); read_lock() disables preemption, every task_lock() too, so this looks unnecessary. > + for (i = 1; i < ARRAY_SIZE(adj_prio); i++) { > + pages_found += find_victims(victims, &nr_victims, MAX_VICTIMS, > + adj_prio[i], adj_prio[i - 1]); > + if (pages_found >= pages_needed || nr_victims == MAX_VICTIMS) > + break; > + } > + > + /* > + * Calculate the number of tasks that need to be killed and quickly > + * release the references to those that'll live. > + */ > + for (i = 0, pages_found = 0; i < nr_victims; i++) { > + struct victim_info *victim = &victims[i]; > + struct task_struct *vtsk = victim->tsk; > + > + /* The victims' mm lock is taken in find_victims; release it */ > + if (pages_found >= pages_needed) { > + task_unlock(vtsk); > + continue; > + } > + > + /* > + * Grab a reference to the victim so it doesn't disappear after > + * the tasklist lock is released. > + */ > + get_task_struct(vtsk); The comment doesn't look correct. the victim can't dissapear until task_unlock() below, it can't pass exit_mm(). > + pages_found += victim->size; > + nr_to_kill++; > + } > + read_unlock(&tasklist_lock); > + > + /* Kill the victims */ > + victim_count = (atomic_t)ATOMIC_INIT(nr_to_kill); > + for (i = 0; i < nr_to_kill; i++) { > + struct victim_info *victim = &victims[i]; > + struct task_struct *vtsk = victim->tsk; > + > + pr_info("Killing %s with adj %d to free %lu kiB\n", vtsk->comm, > + vtsk->signal->oom_score_adj, > + victim->size << (PAGE_SHIFT - 10)); > + > + /* Configure the victim's mm to notify us when it's freed */ > + vtsk->mm->slmk_waitq = &victim_waitq; > + vtsk->mm->slmk_counter = &victim_count; > + > + /* Accelerate the victim's death by forcing the kill signal */ > + do_send_sig_info(SIGKILL, SIG_INFO_TYPE, vtsk, true); ^^^^ this should be PIDTYPE_TGID > + > + /* Finally release the victim's mm lock */ > + task_unlock(vtsk); > + } > + preempt_enable_no_resched(); See above. And I don't understand how can _no_resched() really help... > + > + /* Try to speed up the death process now that we can schedule again */ > + for (i = 0; i < nr_to_kill; i++) { > + struct task_struct *vtsk = victims[i].tsk; > + > + /* Increase the victim's priority to make it die faster */ > + set_user_nice(vtsk, MIN_NICE); > + > + /* Allow the victim to run on any CPU */ > + set_cpus_allowed_ptr(vtsk, cpu_all_mask); > + > + /* Finally release the victim reference acquired earlier */ > + put_task_struct(vtsk); > + } > + > + /* Wait until all the victims die */ > + wait_event(victim_waitq, !atomic_read(&victim_count)); Can't we avoid the new slmk_waitq/slmk_counter members in mm_struct? I mean, can't we export victim_waitq and victim_count and, say, set/test MMF_OOM_VICTIM. In fact I think you should try to re-use mark_oom_victim() at least. Oleg.