Received: by 2002:a05:6a10:6006:0:0:0:0 with SMTP id w6csp1397349pxa; Fri, 28 Aug 2020 11:28:56 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxUqtJg/7ySidF8lpLE2g1VVnpYF4a3U8z56PoDtXXr1ZZnz/+XewWH3QpJSpWH3c3ZVQmt X-Received: by 2002:a05:6402:3102:: with SMTP id dc2mr29738edb.152.1598639335816; Fri, 28 Aug 2020 11:28:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1598639335; cv=none; d=google.com; s=arc-20160816; b=eiN+yWLyw3+lZyAiBj/unEOBsWovprhcKqmMqi1m11uVCprFV9FKh6nDXM4Kbynxaw ap+Fru3sJOeuIykQXelkXa5bvqgNgERrISPsyXhri0ofzUXoObYhgL3GgwowZImss3yq 9TLW+lV/xYno5/IytsqKyhk0kmkIOsIyAWEpR3MVMVdgN8qKLcSK3P0ydW8TFmlenxKA AT21SNHOSQSnFOBVrcJEUTzkxKg+ddupC9bu2KuzMGiCNxf368127sLS0UlIUTZn6xeU ZXXokRreDDgSpmqsjCWxhGbI0pOXErV4r+JBnSBuKR1xnkY/UNEB9E07mNhZoiOWvkUq PqzQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=tNcRyj9RyFCqhML1vaB3/0lTIN4FgUKY5VeSnjUCZmc=; b=X9J0fkJjs72kPI437D96H5udcvkmztEHtoRyssCyqIuXpwZnhn2nYZLRzEF35GBp9w R9l6Jke0V0Ez3GvMut0Qg6e2jWpti65XPIoIa4wF5Xu3hs0pejgJwsChUUqTjQD3Sd1Y X3WhqBGf+WoR07wfvSSd/rDU2XwkYrwG7cWARZqWKwX8PM6/bA9w2aPY70LC9/jD3gSV erQtXvJCFiRAdWS1wFgWuJ78vrp/pyGBlWdKtAsOpC+pMaod+yawiNiYSIELI3CqDRYe ZS8sgamVkatK46XJptkrJnm8q2mNGov7Db83ot0HqVF0m4tc2VWRHBIHR36irRdjIf6h D02A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@brauner.io header.s=google header.b=fko4qsnU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bv7si182675ejb.523.2020.08.28.11.28.30; Fri, 28 Aug 2020 11:28:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@brauner.io header.s=google header.b=fko4qsnU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727923AbgH1SZu (ORCPT + 99 others); Fri, 28 Aug 2020 14:25:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726010AbgH1SZr (ORCPT ); Fri, 28 Aug 2020 14:25:47 -0400 Received: from mail-ej1-x644.google.com (mail-ej1-x644.google.com [IPv6:2a00:1450:4864:20::644]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EAFC0C061264 for ; Fri, 28 Aug 2020 11:25:46 -0700 (PDT) Received: by mail-ej1-x644.google.com with SMTP id o18so371967eje.7 for ; Fri, 28 Aug 2020 11:25:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=brauner.io; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=tNcRyj9RyFCqhML1vaB3/0lTIN4FgUKY5VeSnjUCZmc=; b=fko4qsnU8XsIG5wTzcpyBnhk2ZNW7LMMDcERg5LQjJnk/nhoReV6uTCwCciCdnm9XQ td3jm8s3IxQzbIWItUTVIyQw2DIhHO70YwD88A3kwPuTc/XP41FHOFg1Mj4qIEk+GJE3 lm12IAf3+JJzxbSyHX0MWuIwn/UsBFpZ5OZyCKjWx2inRtZv8dNaGYwhRZuvkPS7ojIs SVhPktB5+5MeeN94MPT5Xik4kyzOzDx+bmzMMtXd2VhwoeexqHsLj8I/lFCNu1rZsWl+ 0G0FoHC1mk8ny8gadNdX1AU5d2L3B4r9kSmr+1ZuFYSJ6wPbxkQcfyr3xxn7zPnQNEa9 Rxbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=tNcRyj9RyFCqhML1vaB3/0lTIN4FgUKY5VeSnjUCZmc=; b=FFVSaozlqwAqeSbfk6Vx1QTi9A+cOWU7WFW5aEM6ICNHgeeHq0hoUKE/KjtEwFOJQj EBgJuTXSBT5FZh/F+pjegP/hsT9e+xBAOo1fnNQVHStjEd/aW27nTV6OtiW+S7E9nubi F4AVx3z0OiC/IYRnJZ9+4RSYASCHazAwRCppVFQBvsANX5s0Ba1ufkTExGD4hZ9ohN0P 66P2UG3nS5lYX/cIxmKktxOVJOaKKkaslLZhLtdCfmvs/3LmEd/Ajhi0SsnjRkiMfjDO g/SOynIidClv71bhU4r47dh2B7i+nkZDslM7QON8EVTRDeRqjIX9Bte1Rj1ct6dngMwu zyNA== X-Gm-Message-State: AOAM531VXZibv2+FUW6U54p3AxGM7SSFSxtI0NnIwybtWCYBWnnzxHQP 2dON4ngyIIoEp3zGQMTbiOgWzz/pCOl6HHV25QLjTg== X-Received: by 2002:a17:906:178d:: with SMTP id t13mr3428857eje.410.1598639145504; Fri, 28 Aug 2020 11:25:45 -0700 (PDT) MIME-Version: 1.0 References: <20200622192900.22757-1-minchan@kernel.org> <20200622192900.22757-4-minchan@kernel.org> <9c339413-68c7-344e-dd01-327cb988d385@kernel.dk> In-Reply-To: <9c339413-68c7-344e-dd01-327cb988d385@kernel.dk> From: Christian Brauner Date: Fri, 28 Aug 2020 20:25:34 +0200 Message-ID: Subject: Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall: an external memory hinting API To: Jens Axboe Cc: Arnd Bergmann , Minchan Kim , Andrew Morton , LKML , linux-mm , Linux API , Oleksandr Natalenko , Suren Baghdasaryan , Tim Murray , Sandeep Patil , Sonny Rao , Brian Geffon , Michal Hocko , Johannes Weiner , Shakeel Butt , John Dias , Joel Fernandes , Jann Horn , alexander.h.duyck@linux.intel.com, SeongJae Park , David Rientjes , Arjun Roy , Vlastimil Babka , Daniel Colascione , Kirill Tkhai , SeongJae Park , linux-man Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Aug 28, 2020 at 8:24 PM Jens Axboe wrote: > > On 8/28/20 11:40 AM, Arnd Bergmann wrote: > > On Mon, Jun 22, 2020 at 9:29 PM Minchan Kim wrote: > >> So finally, the API is as follows, > >> > >> ssize_t process_madvise(int pidfd, const struct iovec *iovec, > >> unsigned long vlen, int advice, unsigned int flags); > > > > I had not followed the discussion earlier and only now came across > > the syscall in linux-next, sorry for stirring things up this late. > > > >> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl > >> index 94bf4958d114..8f959d90338a 100644 > >> --- a/arch/x86/entry/syscalls/syscall_64.tbl > >> +++ b/arch/x86/entry/syscalls/syscall_64.tbl > >> @@ -364,6 +364,7 @@ > >> 440 common watch_mount sys_watch_mount > >> 441 common watch_sb sys_watch_sb > >> 442 common fsinfo sys_fsinfo > >> +443 64 process_madvise sys_process_madvise > >> > >> # > >> # x32-specific system call numbers start at 512 to avoid cache impact > >> @@ -407,3 +408,4 @@ > >> 545 x32 execveat compat_sys_execveat > >> 546 x32 preadv2 compat_sys_preadv64v2 > >> 547 x32 pwritev2 compat_sys_pwritev64v2 > >> +548 x32 process_madvise compat_sys_process_madvise > > > > I think we should not add any new x32-specific syscalls. Instead I think > > the compat_sys_process_madvise/sys_process_madvise can be > > merged into one. > > > >> + mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS); > >> + if (IS_ERR_OR_NULL(mm)) { > >> + ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH; > >> + goto release_task; > >> + } > > > > Minor point: Having to use IS_ERR_OR_NULL() tends to be fragile, > > and I would try to avoid that. Can mm_access() be changed to > > itself return PTR_ERR(-ESRCH) instead of NULL to improve its > > calling conventions? I see there are only three other callers. > > > > > >> + ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter); > >> + if (ret >= 0) { > >> + ret = do_process_madvise(pidfd, &iter, behavior, flags); > >> + kfree(iov); > >> + } > >> + return ret; > >> +} > >> + > >> +#ifdef CONFIG_COMPAT > > ... > >> + > >> + ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), > >> + &iov, &iter); > >> + if (ret >= 0) { > >> + ret = do_process_madvise(pidfd, &iter, behavior, flags); > >> + kfree(iov); > >> + } > > > > Every syscall that passes an iovec seems to do this. If we make import_iovec() > > handle both cases directly, this syscall and a number of others can > > be simplified, and you avoid the x32 entry point I mentioned above > > > > Something like (untested) > > > > index dad8d0cfaaf7..0de4ddff24c1 100644 > > --- a/lib/iov_iter.c > > +++ b/lib/iov_iter.c > > @@ -1683,8 +1683,13 @@ ssize_t import_iovec(int type, const struct > > iovec __user * uvector, > > { > > ssize_t n; > > struct iovec *p; > > - n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs, > > - *iov, &p); > > + > > + if (in_compat_syscall()) I suggested the exact same solutions roughly 1.5 weeks ago. :) Fun when I saw you mentioning this in BBB I knew exactly what you were referring too. :) Christian