Date: Tue, 5 Nov 2019 11:24:24 -0500
From: Andrea Arcangeli
To: Andy Lutomirski
Cc: Daniel Colascione, Mike Rapoport, linux-kernel, Andrew Morton,
	Jann Horn, Linus Torvalds, Lokesh Gidra, Nick Kralevich,
	Nosh Minwalla, Pavel Emelyanov, Tim Murray, Linux API, linux-mm
Subject: Re: [PATCH 1/1] userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
Message-ID: <20191105162424.GH30717@redhat.com>
References: <1572967777-8812-1-git-send-email-rppt@linux.ibm.com>
	<1572967777-8812-2-git-send-email-rppt@linux.ibm.com>

On Tue, Nov 05, 2019 at 08:00:26AM -0800, Andy Lutomirski wrote:
> On Tue, Nov 5, 2019 at 7:55 AM Daniel Colascione wrote:
> >
> > On Tue, Nov 5, 2019 at 7:29 AM Mike Rapoport wrote:
> > >
> > > Current implementation of UFFD_FEATURE_EVENT_FORK modifies the file
> > > descriptor table from the read() implementation of uffd, which may have
> > > security implications for unprivileged use of the userfaultfd.
> > >
> > > Limit availability of UFFD_FEATURE_EVENT_FORK only for callers that have
> > > CAP_SYS_PTRACE.
> >
> > Thanks. But shouldn't we be doing the capability check at
> > userfaultfd(2) time (when we do the other permission checks), not
> > later, in the API ioctl?
>
> The ioctl seems reasonable to me.
> In particular, if there is anyone
> who creates a userfaultfd as root and then drop permissions, a later
> ioctl could unexpectedly enable FORK.
>
> This assumes that the code in question is only reachable through
> ioctl() and not write().

write isn't implemented. Until UFFDIO_API runs, all other implemented
syscalls are disabled (i.e. all other ioctls, poll and read). You can
quickly verify all three of those checks by searching for
UFFD_STATE_WAIT_API.

UFFDIO_API is the place where the handshake with userland happens.
Userland asks for certain features and the kernel implementation of
userfaultfd answers yes or no.

Normally we would only ever return -EINVAL on a request for a feature
that isn't available in the running kernel (equivalent to -ENOSYS if
the syscall is entirely missing on an even older kernel). -EPERM is
more informative, as it tells userland that the feature is actually in
the kernel and just requires more permissions.

We could have returned -EINVAL too, but it wouldn't have made a
difference to non-privileged CRIU, and we're not aware of other users
that could benefit from -EINVAL instead of -EPERM.

This is the relevant CRIU userland:

	if (ioctl(uffd, UFFDIO_API, &uffdio_api)) {
		pr_perror("Failed to get uffd API");
		goto err;
	}

Unfortunately this is an ABI break, but it is preferable to a clean
removal of the feature, because it at least won't break CRIU
deployments running with the PTRACE privilege. A clean removal, while
not ABI breaking, would have prevented all CRIU users from running
after a kernel upgrade.

The long term plan is to introduce a UFFD_FEATURE_EVENT_FORK2 feature
flag that uses an ioctl to receive the child uffd. It'll consume more
CPU, but it won't require the PTRACE privilege anymore.

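As a rough sketch of how userland could interpret the result of the
UFFDIO_API handshake described above: the enum and the helper name
classify_uffd_api() are hypothetical and are not taken from CRIU or
from any kernel header. The sketch assumes the caller passes in the
ioctl return value and the errno saved right after the call.

```c
#include <errno.h>

/* Hypothetical outcome codes; not part of any kernel or CRIU header. */
enum uffd_api_outcome {
	UFFD_API_OK,		/* handshake succeeded */
	UFFD_API_NEED_PRIV,	/* EPERM: feature is in the kernel, caller lacks privilege */
	UFFD_API_UNSUPPORTED,	/* EINVAL: API version or feature unknown to this kernel */
	UFFD_API_OTHER_ERROR	/* any other errno */
};

/*
 * Classify the result of ioctl(uffd, UFFDIO_API, &uffdio_api).
 * ret is the ioctl return value, err is errno saved right after it.
 */
enum uffd_api_outcome classify_uffd_api(int ret, int err)
{
	if (ret == 0)
		return UFFD_API_OK;

	switch (err) {
	case EPERM:
		return UFFD_API_NEED_PRIV;
	case EINVAL:
		return UFFD_API_UNSUPPORTED;
	default:
		return UFFD_API_OTHER_ERROR;
	}
}
```

A caller could invoke classify_uffd_api(ret, errno) right after the
ioctl and, on UFFD_API_NEED_PRIV, report that UFFD_FEATURE_EVENT_FORK
needs more privilege instead of treating the kernel as too old.
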
Overall, any suid or SCM_RIGHTS fd-receiving app that doesn't check the
retval of open/socket, or of whatever fd "installing" syscall it uses,
is not robust and is prone to break over time, as more people edit the
code or as any library call changes behavior internally. So if there's
any practical issue caused by this, it should be fixed in userland too
for higher robustness. If you stick to std::fs and std::net in your
userland, robustness against issues like this is enforced by the
language.

Thanks,
Andrea