Date: Fri, 17 Jan 2020 12:52:25 +0100
From: Michal Hocko
To: Minchan Kim
Cc: Andrew Morton, LKML, linux-mm, linux-api@vger.kernel.org,
    oleksandr@redhat.com, Suren Baghdasaryan, Tim Murray,
    Daniel Colascione, Sandeep Patil, Sonny Rao, Brian Geffon,
    Johannes Weiner, Shakeel Butt, John Dias, ktkhai@virtuozzo.com,
    christian.brauner@ubuntu.com, sjpark@amazon.de
Subject: Re: [PATCH v2 2/5] mm: introduce external memory hinting API
Message-ID: <20200117115225.GV19428@dhcp22.suse.cz>
References: <20200116235953.163318-1-minchan@kernel.org>
    <20200116235953.163318-3-minchan@kernel.org>
In-Reply-To: <20200116235953.163318-3-minchan@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 16-01-20 15:59:50, Minchan Kim wrote:
> There is usecase that System Management Software(SMS) want to give
> a memory hint like MADV_[COLD|PAGEOUT] to other processes and
> in the case of Android, it is the ActivityManagerService.
>
> It's similar in spirit to madvise(MADV_DONTNEED), but the information
> required to make the reclaim decision is not known to the app. Instead,
> it is known to the centralized userspace daemon(ActivityManagerService),
> and that daemon must be able to initiate reclaim on its own without
> any app involvement.
>
> To solve the issue, this patch introduces new syscall process_madvise(2).
> It uses pidfd of an external process to give the hint.
>
> 	int process_madvise(int pidfd, void *addr, size_t length, int advise,
> 			unsigned long flag);
>
> Since it could affect other process's address range, only privileged
> process(CAP_SYS_PTRACE) or something else(e.g., being the same UID)
> gives it the right to ptrace the process could use it successfully.
> The flag argument is reserved for future use if we need to extend the
> API.
>
> I think supporting all hints madvise has/will supported/support to
> process_madvise is rather risky. Because we are not sure all hints make
> sense from external process and implementation for the hint may rely on
> the caller being in the current context so it could be error-prone.
> Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch.
>
> If someone want to add other hints, we could hear the usecase and
> review it for each hint. It's more safe for maintenance rather than
> introducing a buggy syscall but hard to fix it later.

I have brought this up when we discussed this in the past but there is
no reflection on that here so let me bring that up again.

I believe that the interface has an inherent problem that it is racy.
The external entity needs to know the address space layout of the target
process to do anything useful on it. The address space is however under
the full control of the target process and the external entity has no
means to find out that the layout has changed. So
time-to-check-time-to-act is an inherent problem.

This is a serious design flaw and it should be explained why it doesn't
matter or how to use the interface properly to prevent that problem.
-- 
Michal Hocko
SUSE Labs
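
The check-then-act window described above can be illustrated with a
minimal sketch. It is an assumption-laden simulation, not a use of the
proposed syscall: both roles run in one process, process_madvise() is
never called, and the names range_snapshot and
stale_snapshot_hits_new_mapping are hypothetical, introduced only for
this illustration. It shows that an (addr, length) pair recorded at
check time can still look valid at act time while naming a different
mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical record of what an external manager saw at check time. */
struct range_snapshot {
	void *addr;     /* start of the range observed in the target */
	size_t length;  /* length of the range observed in the target */
};

/*
 * Simulates check -> layout change -> act inside a single process.
 * Returns 1 if the stale snapshot now names a different mapping,
 * 0 if it does not, and -1 if mmap() fails.
 */
int stale_snapshot_hits_new_mapping(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);

	/* "Target" creates a mapping that holds cold cache data. */
	char *cache = mmap(NULL, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (cache == MAP_FAILED)
		return -1;
	memset(cache, 'C', len);

	/* Time of check: "manager" records the layout it observed. */
	struct range_snapshot snap = { cache, len };

	/*
	 * The target changes its layout without telling anyone:
	 * MAP_FIXED replaces the old mapping at the same address.
	 */
	char *hot = mmap(cache, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (hot == MAP_FAILED) {
		munmap(cache, len);
		return -1;
	}
	assert(hot == cache); /* MAP_FIXED places it exactly at cache */
	memset(hot, 'H', len);

	/*
	 * Time of act: the recorded address is unchanged, yet it now
	 * refers to hot data rather than the cold cache that was seen.
	 */
	int raced = (snap.addr == (void *)hot) &&
		    (((char *)snap.addr)[0] == 'H');

	munmap(hot, snap.length);
	return raced;
}
```

A hint applied through snap at this point would land on the new
mapping, and nothing in the recorded pair lets the manager notice.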