From: Jann Horn
Date: Wed, 14 Apr 2021 13:24:30 +0200
Subject: Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
To: Florian Weimer
Cc: Andrei Vagin, kernel list, Linux API, linux-um@lists.infradead.org,
 criu@openvz.org, Andrei Vagin, Andrew Morton, Andy Lutomirski,
 Anton Ivanov, Christian Brauner, Dmitry Safonov <0x7f454c46@gmail.com>,
 Ingo Molnar, Jeff Dike, Mike Rapoport, Michael Kerrisk, Oleg Nesterov,
 Peter Zijlstra, Richard Weinberger, Thomas Gleixner
References: <20210414055217.543246-1-avagin@gmail.com> <87blahb1pr.fsf@oldenburg.str.redhat.com>
In-Reply-To: <87blahb1pr.fsf@oldenburg.str.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 14, 2021 at 12:27 PM Florian Weimer wrote:
>
> * Andrei Vagin:
>
> > We already have process_vm_readv and process_vm_writev to read and write
> > to a process memory faster than we can do this with ptrace. And now it
> > is time for process_vm_exec that allows executing code in an address
> > space of another process. We can do this with ptrace but it is much
> > slower.
> >
> > = Use-cases =
>
> We also have some vaguely related within the same address space: running
> code on another thread, without modifying its stack, while it has signal
> handlers blocked, and without causing system calls to fail with EINTR.
> This can be used to implement certain kinds of memory barriers.

That's what the membarrier() syscall is for, right? Unless you don't
want to register all threads for expedited membarrier use?

> It is
> also necessary to implement set*id with POSIX semantics in userspace.
> (Linux only changes the current thread credentials, POSIX requires
> process-wide changes.) We currently use a signal for set*id, but it has
> issues (it can be blocked, the signal could come from somewhere, etc.).
> We can't use signals for barriers because of the EINTR issue, and
> because the signal context is stored on the stack.

This essentially becomes a question of "how much is set*id allowed to
block and what level of guarantee should there be by the time it
returns that no threads will perform privileged actions anymore after
it returns", right?

Like, if some piece of kernel code grabs a pointer to the current
credentials or acquires a temporary reference to some privileged
resource, then blocks on reading an argument from userspace, and then
performs a privileged action using the previously-grabbed credentials
or resource, what behavior do you want? Should setuid() block until
that privileged action has completed? Should it abort that action
(which is kinda what you get with the signals approach)? Should it
just return immediately even though an attacker who can write to
process memory at that point might still be able to influence a
privileged operation that hasn't read all its inputs yet? Should the
kernel be designed to keep track of whether it is currently holding a
privileged resource?
Or should the kernel just specifically permit credential changes in
specific places where it is known that a task might block for a long
time and it is not holding any privileged resources (kinda like the
approach taken for freezer stuff)?

If userspace wants multithreaded setuid() without syscall aborting,
things get gnarly really fast; and having an interface to remotely
perform operations under another task's context isn't really relevant
to the core problem here, I think.
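
For readers unfamiliar with the membarrier() interface mentioned above,
here is a minimal sketch of private expedited use. It is not code from
this thread. It assumes a kernel and UAPI headers that provide
MEMBARRIER_CMD_PRIVATE_EXPEDITED (Linux 4.14 or later), and the names
sys_membarrier() and expedited_barrier() are made up for illustration.
As far as I know glibc ships no membarrier() wrapper, so the call goes
through syscall(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: raw membarrier(2) via syscall(2). */
int sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
    return (int)syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/*
 * Hypothetical helper: register the calling process for private
 * expedited barriers, then issue one.
 * Returns 0 on success, -1 with errno set on failure.
 */
int expedited_barrier(void)
{
    /* QUERY returns a bitmask of the commands this kernel supports. */
    int supported = sys_membarrier(MEMBARRIER_CMD_QUERY, 0, 0);
    if (supported < 0)
        return -1; /* e.g. ENOSYS if the syscall is unavailable */

    if (!(supported & MEMBARRIER_CMD_PRIVATE_EXPEDITED)) {
        errno = EOPNOTSUPP;
        return -1;
    }

    /* Registration is required once before the expedited command. */
    if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0) < 0)
        return -1;

    /* Returns after the other running threads of this process have
     * gone through a full memory barrier. */
    return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0);
}
```

A caller would invoke expedited_barrier() on the slow side of an
asymmetric barrier scheme. No signal is delivered to the other threads,
so nothing is written to their stacks and their system calls are not
interrupted with EINTR.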