From: Jann Horn
Date: Wed, 12 Feb 2020 17:54:35 +0100
Subject: Re: [PATCH v2 0/6] Harden userfaultfd
To: Kees Cook
Cc: Daniel Colascione, Tim Murray, Nosh Minwalla, Nick Kralevich, Lokesh Gidra, kernel list, Linux API, SElinux list, Andrea Arcangeli, Mike Rapoport, Peter Xu, linux-security-module
References: <20200211225547.235083-1-dancol@google.com> <202002112332.BE71455@keescook>
In-Reply-To: <202002112332.BE71455@keescook>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 12, 2020 at 8:51 AM Kees Cook wrote:
> On Tue, Feb 11, 2020 at 02:55:41PM -0800, Daniel Colascione wrote:
> > Let userfaultfd opt out of handling kernel-mode faults
> > Add a new sysctl for limiting userfaultfd to user mode faults
>
> Now this I'm very interested in. Can you go into more detail about two
> things:
[...]
> - Why is this needed in addition to the existing vm.unprivileged_userfaultfd
>   sysctl? (And should this maybe just be another setting for that
>   sysctl, like "2"?)
>
> As to the mechanics of the change, I'm not sure I like the idea of adding
> a UAPI flag for this. Why not just retain the permission check done at
> open() and if kernelmode faults aren't allowed, ignore them? This would
> require no changes to existing programs and gains the desired defense.
> (And, I think, the sysctl value could be bumped to "2" as that's a
> better default state -- does qemu actually need kernelmode traps?)

I think this might be necessary for I/O emulation? As in, if before
getting migrated, the guest writes some data into a buffer, then the
guest gets migrated, and then while the postcopy migration stuff is
still running, the guest tells QEMU to write that data from
guest-physical memory to disk or whatever; I think in that case, QEMU
will do something like a pwrite() syscall where the userspace pointer
points into the memory area containing guest-physical memory, which
would return -EFAULT if userfaultfd was restricted to userspace
accesses.

This was described in this old presentation about why userfaultfd is
better than a SIGSEGV handler:
https://drive.google.com/file/d/0BzyAwvVlQckeSzlCSDFmRHVybzQ/view
(slide 6)
(recording at https://youtu.be/pC8cWWRVSPw?t=463)
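
To make the -EFAULT failure mode concrete, here is a minimal sketch of the syscall-level behaviour. It does not use userfaultfd at all: a PROT_NONE mapping stands in (as an assumption, for illustration only) for a page whose kernel-mode fault cannot be resolved. The helper name pwrite_from_inaccessible_page() and the temporary file path are hypothetical and not taken from the patch series or from QEMU.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call pwrite() with a source buffer that the kernel
 * cannot read, and return the errno it produced (0 if the write
 * unexpectedly succeeded).
 *
 * The PROT_NONE page is only a stand-in for "a fault taken in kernel mode
 * that nobody resolves"; it is not the userfaultfd code path itself.
 */
int pwrite_from_inaccessible_page(void)
{
    /* Scratch file to write into; unlinked immediately so nothing is left behind. */
    char path[] = "/tmp/efault-demo-XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);

    /* One page that neither user space nor the kernel's user-copy routines may read. */
    long page = sysconf(_SC_PAGESIZE);
    void *buf = mmap(NULL, (size_t)page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(buf != MAP_FAILED);

    /* The kernel faults while copying from buf and fails the syscall. */
    errno = 0;
    ssize_t n = pwrite(fd, buf, (size_t)page, 0);
    int err = (n < 0) ? errno : 0;

    munmap(buf, (size_t)page);
    close(fd);
    return err;
}
```

Calling the helper from a main() and comparing its return value against EFAULT shows the error a program would have to handle. With a userfaultfd-registered region that still accepts kernel-mode faults, the same pwrite() would instead block until the fault handler supplies the page.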