Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp380338ybk; Sat, 9 May 2020 05:51:29 -0700 (PDT) X-Google-Smtp-Source: APiQypK2nglxqtngZwETg7cHOptNH6YipHEsL/o9UGOFRVGwpr7u4w92I2Ul0aiq2lR/Jnn3rIOJ X-Received: by 2002:a17:906:16ce:: with SMTP id t14mr6059768ejd.366.1589028688919; Sat, 09 May 2020 05:51:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1589028688; cv=none; d=google.com; s=arc-20160816; b=B6uHb6OoAtqW5wpC3HpkorjvslXFxDRhlA/uYGRmMt9W/sbVLIkfsXR3nGDeqASmjb 3dPDT5vNtEfskoxJanLc54INWsO+rR4/gBvib+RQZ3vR9WknVOmVFmVWsREtItQAXGWt ovSTR5hvjR/8TKfo8EpXg4JDnVoXAN3nT0juDqmcRMR7OAIrLchq3ClG+NJKdZMe5Th/ AxJwoP0wzLe3wMwCDkFbOsvDkjWNYAXWb1yYOS0XwpyvUYdVWpRtrrScXnWrtb9OfS9R SLxkWjESsafxndGNtWfMAmQppqGt/yhJ629LW1punGjMAqJSLmMTImTFWEYsUZdVJUfc eStg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=RalBfNpoGyDMrqXsVdbFNymjQdr1HVkdgl0QQ8Ulg0s=; b=HD8xaesIUyxr5tS/We79820WC4Bxce/TcfLMN8ZLVcZgwuHqxPzmen1M5/LBp7hf0x qwNb0v2I1L2nduKtCLj10A+fCgWC6sS8MAyIY+jyDrtSLBivhIpL/Do0ojx7vIVLSw2P sK8KB5nStXFQPB+Kc52ywFYzU6xng6XuMji6uUgrJvcrBytALYRy1ojNeQBoFXBcBiiL n/7s+tJrFTly4Kt3icR4UluxGRsPexWD7CEb8uUIYZLksLC+UHvemD/fFyA7Yr7yBcqz FjqreZhBa+8GOwFV+SHKyiAJ8PamqDwgtBCvO3eisXQVqqVm/dJ9PpRpKH2n4wPv7bdr nxuA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id dk9si2624255edb.403.2020.05.09.05.50.51; Sat, 09 May 2020 05:51:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727051AbgEIMsZ (ORCPT + 99 others); Sat, 9 May 2020 08:48:25 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:42669 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726017AbgEIMsZ (ORCPT ); Sat, 9 May 2020 08:48:25 -0400 Received: from ip5f5af183.dynamic.kabel-deutschland.de ([95.90.241.131] helo=wittgenstein) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1jXOtv-0003Jb-3t; Sat, 09 May 2020 12:48:19 +0000 Date: Sat, 9 May 2020 14:48:17 +0200 From: Christian Brauner To: Andrew Morton Cc: Minchan Kim , Vlastimil Babka , LKML , linux-mm , linux-api@vger.kernel.org, oleksandr@redhat.com, Suren Baghdasaryan , Tim Murray , Daniel Colascione , Sandeep Patil , Sonny Rao , Brian Geffon , Michal Hocko , Johannes Weiner , Shakeel Butt , John Dias , Joel Fernandes , Jann Horn , alexander.h.duyck@linux.intel.com, sj38.park@gmail.com, Christian Brauner , Kirill Tkhai Subject: Re: [PATCH v7 5/7] mm: support both pid and pidfd for process_madvise Message-ID: <20200509124817.xmrvsrq3mla6b76k@wittgenstein> References: <20200302193630.68771-1-minchan@kernel.org> <20200302193630.68771-6-minchan@kernel.org> <14089609-5fb1-b082-716f-c2e129d27c48@suse.cz> <20200311004251.GB87930@google.com> <20200508183653.GB125527@google.com> <20200508160415.65ff359a9e312c613336587b@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20200508160415.65ff359a9e312c613336587b@linux-foundation.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 08, 2020 at 04:04:15PM -0700, Andrew Morton wrote: > On Fri, 8 May 2020 11:36:53 -0700 Minchan Kim wrote: > > > > > ... > > > > Per Vlastimil's request, I changed "which and advise" with "idtype and > > advice" in function prototype of description. > > Could you replace the part in the description? Code is never changed. > > > > Done, but... > > > > > ... > > > > There is a demand[1] to support pid as well pidfd for process_madvise to > > reduce unnecessary syscall to get pidfd if the user has control of the > > target process(ie, they could guarantee the process is not gone or pid is > > not reused). > > > > This patch aims for supporting both options like waitid(2). So, the > > syscall is currently, > > > > int process_madvise(idtype_t idtype, id_t id, void *addr, > > size_t length, int advice, unsigned long flags); > > > > @which is actually idtype_t for userspace libray and currently, it > > supports P_PID and P_PIDFD. > > What does "@which is actually idtype_t for userspace libray" mean? Can > you clarify and expand? If I may clarify, the only case where we've supported both pidfd and pid in the same system call is waitid() to avoid adding a dedicated system call for waiting and because waitid() already had this (imho insane) argument type switching. The idtype_t thing comes from waitid() and is located int sys/wait.h and is defined as "The type idtype_t is defined as an enumeration type whose possible values include at least the following: P_ALL P_PID P_PGID " int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options); If idtype is P_PID, waitid() shall wait for the child with a process ID equal to (pid_t)id. If idtype is P_PGID, waitid() shall wait for any child with a process group ID equal to (pid_t)id. If idtype is P_ALL, waitid() shall wait for any children and id is ignored. I'm personally not a fan of this idtype_t thing and think this should just have been > > int pidfd_madvise(int pidfd, void *addr, > > size_t length, int advice, unsigned long flags); and call it a day. Also, if I may ask, why is the flag argument "unsigned long"? That's pretty unorthodox. The expectation is that flag arguments are not word-size dependent and should usually use "unsigned int". All new system calls follow this pattern too. The current syscall layout will mean that on 64 bit systems you have 64 flag bits and on 32 bit you have 32 flag bits, I think. That has just recently led to some problems with the clone() syscall (fixed in [1] which I'm sending Monday) which has the same weird word-size-dependent flag argument layout. If a system does sign-extension and a userspace api or glibc uses e.g. an int for the flag argument in the system call wrapper - which is fairly common - you can get sign extended and then you end up with garbage in the upper 32 bits of your system call. > > Also, does this userspace library exist? If so, where is it? [1]: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/commit/?h=fixes&id=3f2c788a13143620c5471ac96ac4f033fc9ac3f3 Christian