From: Peter Collingbourne
Date: Fri, 24 Sep 2021 14:50:04 -0700
Subject: Re: [PATCH] kernel: introduce prctl(PR_LOG_UACCESS)
To: Jann Horn
Cc: Kees Cook, "Eric W. Biederman", Catalin Marinas, Will Deacon, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Thomas Gleixner, Andy Lutomirski, Andrew Morton, Masahiro Yamada, Sami Tolvanen, YiFei Zhu, Colin Ian King, Mark Rutland, Frederic Weisbecker, Viresh Kumar, Andrey Konovalov, Gabriel Krisman Bertazi, Balbir Singh, Chris Hyser, Daniel Vetter, Chris Wilson, Arnd Bergmann, Dmitry Vyukov, Christian Brauner, Alexey Gladkov, Ran Xiaokai, David Hildenbrand, Xiaofeng Cao, Cyrill Gorcunov, Thomas Cedeno, Marco Elver, Alexander Potapenko, Linux Kernel Mailing List, Linux ARM, Evgenii Stepanov
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 22, 2021 at 8:59 AM Jann Horn wrote:
>
> On Wed, Sep 22, 2021 at 5:30 PM Kees Cook wrote:
> > On Wed, Sep 22, 2021 at 09:23:10AM -0500, Eric W. Biederman wrote:
> > > Peter Collingbourne writes:
> > >
> > > > This patch introduces a kernel feature known as uaccess logging.
> > > > With uaccess logging, the userspace program passes the address and size
> > > > of a so-called uaccess buffer to the kernel via a prctl(). The prctl()
> > > > is a request for the kernel to log any uaccesses made during the next
> > > > syscall to the uaccess buffer. When the next syscall returns, the address
> > > > one past the end of the logged uaccess buffer entries is written to the
> > > > location specified by the third argument to the prctl(). In this way,
> > > > the userspace program may enumerate the uaccesses logged to the access
> > > > buffer to determine which accesses occurred.
> > > > [...]
> > > > 3) Kernel fuzzing. We may use the list of reported kernel accesses to
> > > > guide a kernel fuzzing tool such as syzkaller (so that it knows which
> > > > parts of user memory to fuzz), as an alternative to providing the tool
> > > > with a list of syscalls and their uaccesses (which again thanks to
> > > > (2) may not be accurate).
> > >
> > > How is logging the kernel's activity like this not a significant
> > > information leak? How is this safe for unprivileged users?
> >
> > This does result in userspace being able to "watch" the kernel progress
> > through a syscall. I'd say it's less dangerous than userfaultfd, but
> > still worrisome. (And userfaultfd is normally disabled[1] for unprivileged
> > users trying to interpose the kernel accessing user memory.)
> >
> > Regardless, this is a pretty useful tool for this kind of fuzzing.
> > Perhaps the timing exposure could be mitigated by having the kernel
> > collect the record in a separate kernel-allocated buffer and flush the
> > results to userspace at syscall exit? (This would solve the
> > copy_to_user() recursion issue too.)

Seems reasonable.
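Kees's collect-then-flush idea can be modelled in a small userspace C program. This is only a sketch under stated assumptions. The entry layout, the names log_arm(), log_uaccess() and log_flush(), the fixed KBUF_MAX cap, and fake_copy_to_user() (a memcpy() standing in for copy_to_user()) are all hypothetical. None of them come from the patch.

The sketch shows two properties. First, the per-access hook writes only to a staging buffer and never to user memory. Second, logging is disarmed before the single flush at syscall exit. Together these mean the flush's own copy is never logged, so no recursion can occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout; the patch's actual format may differ. */
struct uaccess_entry {
	uint64_t addr;  /* user address accessed */
	uint64_t size;  /* bytes accessed */
	uint64_t flags; /* UACCESS_READ or UACCESS_WRITE (hypothetical) */
};

enum { UACCESS_READ = 1, UACCESS_WRITE = 2 };

#define KBUF_MAX 64 /* hypothetical cap on entries staged per syscall */

/* Hypothetical per-task state (would hang off task_struct in a kernel). */
struct uaccess_log_state {
	struct uaccess_entry kbuf[KBUF_MAX]; /* kernel-side staging buffer */
	size_t kcount;                       /* entries staged so far */
	size_t dropped;                      /* accesses that did not fit */
	struct uaccess_entry *ubuf;          /* user buffer given at prctl() time */
	size_t ucap;                         /* its capacity, clamped to KBUF_MAX */
	bool active;                         /* armed for the current syscall */
};

static void log_uaccess(struct uaccess_log_state *s, uint64_t addr,
			uint64_t size, uint64_t flags);

/* Stand-in for copy_to_user(): every uaccess goes through the logging hook. */
static void fake_copy_to_user(struct uaccess_log_state *s, void *dst,
			      const void *src, size_t n)
{
	log_uaccess(s, (uint64_t)(uintptr_t)dst, n, UACCESS_WRITE);
	memcpy(dst, src, n);
}

/* prctl()-time setup: remember the user buffer and clamp its size. */
static void log_arm(struct uaccess_log_state *s, struct uaccess_entry *ubuf,
		    size_t ucap)
{
	s->ubuf = ubuf;
	s->ucap = ucap < KBUF_MAX ? ucap : KBUF_MAX;
	s->kcount = 0;
	s->dropped = 0;
	s->active = true;
}

/* Logging hook: writes only to the staging buffer, never to user memory,
 * so it never calls back into itself. */
static void log_uaccess(struct uaccess_log_state *s, uint64_t addr,
			uint64_t size, uint64_t flags)
{
	if (!s->active)
		return;
	if (s->kcount == s->ucap) {
		s->dropped++;
		return;
	}
	s->kbuf[s->kcount++] = (struct uaccess_entry){ addr, size, flags };
}

/* Syscall-exit hook: disarm first so the flush is not itself logged, then
 * copy all staged entries out at once. Returns one past the last entry. */
static struct uaccess_entry *log_flush(struct uaccess_log_state *s)
{
	s->active = false;
	assert(s->kcount <= s->ucap);
	fake_copy_to_user(s, s->ubuf, s->kbuf, s->kcount * sizeof(s->kbuf[0]));
	return s->ubuf + s->kcount;
}
```

In this shape, userspace sees nothing until the syscall returns. The clamp in log_arm() bounds how many entries the kernel stages for a single syscall.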
I suppose that in terms of timing information we're already (unavoidably)
exposing how long the syscall took overall, and we probably shouldn't
deliberately expose more than that. That being said, I'm wondering
whether a kernel-allocated buffer would have security implications of
its own, if userspace could then manipulate the kernel into allocating a
large buffer (either at prctl() time or by getting the kernel to perform
a large number of uaccesses). Perhaps that can be mitigated by limiting
the size of the uaccess buffer provided at prctl() time.

> Other than what Kees has already said, the only security concern I
> have with that patch should be trivial to fix: If the ->uaccess_buffer
> machinery writes to current's memory, it must be reset during
> execve(), before switching to the new mm, to prevent the old task from
> causing the kernel to scribble into the new mm.

Yes, that's a good point. I'll fix that in the next version.

> One aspect that might benefit from some clarification on intended
> behavior is: what should happen if there are BPF tracing programs
> running (possibly as part of some kind of system-wide profiling or
> such) that poke around in userspace memory with BPF's uaccess helpers
> (especially "bpf_copy_from_user")?

I think we should probably ignore those accesses, since we cannot know
a priori whether they are directly associated with the syscall, and
this is, after all, a best-effort mechanism.

> > I'm pondering what else might be getting exposed by creating this level
> > of probing... kernel addresses would already be getting rejected, so
> > they wouldn't show up in the buffer. Hmm. Jann, any thoughts here?
> >
> > Some other thoughts:
> >
> > Instead of reimplementing copy_*_user() with a new wrapper that
> > bypasses some checks and adds others and has to stay in sync, etc,
> > how about just adding a "recursion" flag? Something like:
> >
> >     copy_from_user(...)
> >         instrument_copy_from_user(...)
> >             uaccess_buffer_log_read(...)
> >                 if (current->uaccess_buffer.writing)
> >                     return;
> >                 uaccess_buffer_log(...)
> >                     current->uaccess_buffer.writing = true;
> >                     copy_to_user(...)
> >                     current->uaccess_buffer.writing = false;
> >
> > How about using this via seccomp instead of a per-syscall prctl? This
> > would mean you would have very specific control over which syscalls
> > should get the uaccess tracing, and wouldn't need to deal with
> > the signal mask (I think). I would imagine something similar to
> > SECCOMP_FILTER_FLAG_LOG, maybe SECCOMP_FILTER_FLAG_UACCESS_TRACE, and
> > add a new top-level seccomp command, (like SECCOMP_GET_NOTIF_SIZES)
> > maybe named SECCOMP_SET_UACCESS_TRACE_BUFFER.
> >
> > This would likely only make sense for SECCOMP_RET_TRACE or _TRAP if the
> > program wants to collect the results after every syscall. And maybe this
> > won't make any sense across exec (losing the mm that was used during
> > SECCOMP_SET_UACCESS_TRACE_BUFFER). Hmmm.
>
> And then I guess your plan would be that userspace would be expected
> to use the userspace instruction pointer
> (seccomp_data::instruction_pointer) to indicate instructions that
> should be traced?
>
> Or instead of seccomp, you could do it kinda like
> https://www.kernel.org/doc/html/latest/admin-guide/syscall-user-dispatch.html
> , with a prctl that specifies a specific instruction pointer?

Given a choice between these two options, I would prefer the prctl(),
because userspace programs may already be using seccomp filters and
sanitizers shouldn't interfere with them.

However, in either the seccomp filter or the prctl() case, you still
have the problem of deciding where to log to. Keep in mind that you
would need to prevent asynchronous signals that arrive between the
syscall and the point where we read the log from triggering additional
syscalls (through the same syscall wrapper) that may overwrite the log.
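The overwrite hazard in the preceding paragraph can be reproduced with a deterministic C model. This is a sketch under several assumptions:

- There is one log location shared by every wrapped syscall.
- fake_syscall_logged() is a hypothetical stand-in for the kernel, and it replaces the log's contents on each call.
- fake_signal_handler() is a plain function call that stands in for a signal handler. It runs after the syscall returns but before the caller reads the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log record: one user range touched by a syscall. */
struct log_entry {
	uint64_t addr;
	uint64_t size;
};

#define LOG_CAP 8

/* A single log location shared by every wrapped syscall. */
static struct log_entry shared_log[LOG_CAP];
static size_t shared_len;

/* Stand-in for the kernel: record that the syscall touched one range,
 * replacing whatever the shared log held before. */
static void fake_syscall_logged(uint64_t addr, uint64_t size)
{
	shared_len = 0;
	shared_log[shared_len++] = (struct log_entry){ addr, size };
}

/* Stand-in for a signal handler that issues its own wrapped syscall. */
static void fake_signal_handler(void)
{
	fake_syscall_logged(0xdead0000, 4);
}

/* Caller: make a syscall, optionally get "interrupted", then read the log. */
static struct log_entry read_own_log(void (*interrupt)(void))
{
	fake_syscall_logged(0x1000, 64);
	if (interrupt)
		interrupt(); /* arrives between syscall return and log read */
	assert(shared_len == 1);
	return shared_log[0];
}
```

Without the interruption, the caller reads its own record. With fake_signal_handler injected, it silently reads the handler's record instead.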
Preventing this means the log location would need to be per-syscall, so
it would need to live on the stack (or equivalent, like a heap-allocated
buffer whose lifetime is tied to that of the syscall wrapper function).
So then, how do you notify the kernel of that location? One possibility
is an arch-specific augmentation of the syscall calling convention, as I
mentioned in the initial message. But if you're doing that, you might
as well dispense with the seccomp filter or prctl() entirely and
communicate the request for a log purely via the calling convention.

It seems that any approach that avoids making multiple syscalls per
logged syscall is going to be arch-specific to some degree. So I think
I would prefer to start with a "slow but simple" API, and let
architectures provide their own arch-specific mechanisms if they wish
to optimize it.

Peter
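The per-call log location discussed above can be sketched as a small C model of the "slow but simple" two-step flow. First the wrapper arms logging for one syscall, which stands in for the prctl(). Then it makes the syscall, and the kernel writes a pointer one past the last entry, as in the patch description.

Everything here is hypothetical:

- the request and summary structs;
- fake_arm(), which stands in for the prctl();
- fake_syscall(), which stands in for the syscall;
- the pending_signal hook.

The sketch only models a signal that arrives after the syscall returns. It does not model a signal that arrives between the arming step and the syscall itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log record: one user range touched by the syscall. */
struct log_entry {
	uint64_t addr;
	uint64_t size;
};

/* Hypothetical per-call request: where to log, and where the kernel
 * stores the pointer one past the last entry it wrote. */
struct log_request {
	struct log_entry *buf;
	size_t cap;
	struct log_entry **end_out;
};

struct log_summary {
	size_t count;
	uint64_t first_addr;
	uint64_t total_bytes;
};

/* Simulated kernel state: a request armed for the next syscall only. */
static struct log_request *armed;

/* Stand-in for the prctl(): arm logging for the next syscall. */
static void fake_arm(struct log_request *req)
{
	armed = req;
}

/* Stand-in for a syscall that reads n 8-byte user ranges starting at base. */
static void fake_syscall(uint64_t base, size_t n)
{
	struct log_request *req = armed;
	size_t i;

	armed = NULL; /* a request covers exactly one syscall */
	if (!req)
		return;
	for (i = 0; i < n && i < req->cap; i++)
		req->buf[i] = (struct log_entry){ base + 0x100 * i, 8 };
	*req->end_out = req->buf + i;
}

/* Test hook: if set, called once right after the syscall returns, like a
 * signal handler that runs before the log is read. */
static void (*pending_signal)(void);

#define PER_CALL_CAP 16

static struct log_summary logged_syscall(uint64_t base, size_t n)
{
	struct log_entry buf[PER_CALL_CAP]; /* log lives in this stack frame */
	struct log_entry *end = buf;
	struct log_request req = { buf, PER_CALL_CAP, &end };
	struct log_summary sum = { 0, 0, 0 };

	fake_arm(&req);
	fake_syscall(base, n);

	if (pending_signal) {
		void (*handler)(void) = pending_signal;
		pending_signal = NULL;
		handler(); /* a nested wrapper call gets its own frame and log */
	}

	/* Walk the entries from the start of the buffer to the end pointer. */
	assert(end >= buf && end <= buf + PER_CALL_CAP);
	for (const struct log_entry *e = buf; e < end; e++) {
		if (sum.count++ == 0)
			sum.first_addr = e->addr;
		sum.total_bytes += e->size;
	}
	return sum;
}

static struct log_summary handler_result;

/* Hypothetical signal handler that makes its own logged syscall. */
static void nested_handler(void)
{
	handler_result = logged_syscall(0xdead0000, 1);
}
```

Because each wrapper invocation owns its buffer, the nested call made by nested_handler() logs into its own frame. The outer caller still sees its own entries. The price is the extra arming step before every logged syscall.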