Subject: Re: [RFC PATCH] io_uring: add support for IORING_OP_IOCTL
To: Pavel Begunkov, Jann Horn
Cc: io-uring, kernel list
From: Jens Axboe
Date: Sat, 14 Dec 2019 11:52:15 -0700
Message-ID: <1f995281-4a56-a7de-d20b-14b0f64536c0@kernel.dk>
In-Reply-To: <9b4f56c1-dce9-1acd-2775-e64a3955d8ee@gmail.com>
References: <9b4f56c1-dce9-1acd-2775-e64a3955d8ee@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/14/19 10:56 AM, Pavel Begunkov wrote:
>
> On 14/12/2019 20:12, Jann Horn wrote:
>> On Sat, Dec 14, 2019 at 4:30 PM Pavel Begunkov wrote:
>>> This works almost like ioctl(2), except it doesn't support a bunch of
>>> common opcodes, (e.g. FIOCLEX and FIBMAP, see ioctl.c), and goes
>>> straight to a device specific implementation.
>>>
>>> The case in mind is dma-buf, drm and other ioctl-centric interfaces.
>>>
>>> Not-yet Signed-off-by: Pavel Begunkov
>>> ---
>>>
>>> It clearly needs some testing first, though works fine with dma-buf,
>>> but I'd like to discuss whether the use cases are convincing enough,
>>> and is it ok to desert some ioctl opcodes. For the last point it's
>>> fairly easy to add, maybe except three requiring fd (e.g. FIOCLEX)
>>>
>>> P.S. Probably, it won't benefit enough to consider using io_uring
>>> in drm/mesa, but anyway.
>> [...]
>>> +static int io_ioctl(struct io_kiocb *req,
>>> +                    struct io_kiocb **nxt, bool force_nonblock)
>>> +{
>>> +        const struct io_uring_sqe *sqe = req->sqe;
>>> +        unsigned int cmd = READ_ONCE(sqe->ioctl_cmd);
>>> +        unsigned long arg = READ_ONCE(sqe->ioctl_arg);
>>> +        int ret;
>>> +
>>> +        if (!req->file)
>>> +                return -EBADF;
>>> +        if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> +                return -EINVAL;
>>> +        if (unlikely(sqe->ioprio || sqe->addr || sqe->buf_index
>>> +                || sqe->rw_flags))
>>> +                return -EINVAL;
>>> +        if (force_nonblock)
>>> +                return -EAGAIN;
>>> +
>>> +        ret = security_file_ioctl(req->file, cmd, arg);
>>> +        if (!ret)
>>> +                ret = (int)vfs_ioctl(req->file, cmd, arg);
>>
>> This isn't going to work. For several of the syscalls that were added,
>> special care had to be taken to avoid bugs - like for RECVMSG, for the
>> upcoming OPEN/CLOSE stuff, and so on.
>>
>> And in principle, ioctls handlers can do pretty much all of the things
>> syscalls can do, and more. They can look at the caller's PID, they can
>> open and close (well, technically that's slightly unsafe, but IIRC
>> autofs does it anyway) things in the file descriptor table, they can
>> give another process access to the calling process in some way, and so
>> on. If you just allow calling arbitrary ioctls through io_uring, you
>> will certainly get bugs, and probably security bugs, too.
>>
>> Therefore, I would prefer to see this not happen at all; and if you do
>> have a usecase where you think the complexity is worth it, then I
>> think you'll have to add new infrastructure that allows each
>> file_operations instance to opt in to having specific ioctls called
>> via this mechanism, or something like that, and ensure that each of
>> the exposed ioctls only performs operations that are safe from uring
>> worker context.
>
> Sounds like hell of a problem. Thanks for sorting this out!

While the ioctl approach is tempting, for the use cases where it makes
sense, I think we should just add an ioctl-type opcode and have the
sub-opcode be somewhere else in the sqe, because I do think there's a
large opportunity to expose a fast API that works with ioctl-like
mechanisms.

If we have IORING_OP_IOCTL, set aside an sqe field for the per-driver
(or per-user) sub-opcode, and add a file_operations method for sending
these to the fd, then we'll have a much better (and faster + async) API
than ioctls. We could add fops->uring_issue() or something, and that
gets passed the io_kiocb. When it completes, ->uring_issue() posts a
completion by calling io_uring_complete_req() or something.

Outside of the issues that Jann outlined, ioctls are also such a
decades-old mess that we have to do the -EAGAIN punt for all of them,
like you did in your patch. If it's opt-in like ->uring_issue(), then
care could be taken to do this right and just have it return -EAGAIN
only if it does need async context.

        ret = fops->uring_issue(req, force_nonblock);
        if (ret == -EAGAIN) {
                ... usual punt ...
        }

I think working on this would be great, and some of the more
performance-sensitive ioctl cases should flock to it.

-- 
Jens Axboe
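
For illustration only, the opt-in dispatch described above can be sketched as a small userspace mock. Everything here is hypothetical: `mock_fops`, `mock_req`, `mock_complete_req()`, `mock_submit()` and the demo driver are stand-ins invented for this sketch, not kernel types or an existing API, and the "punt" is modelled as a plain blocking retry rather than a real worker thread. It only shows the shape of the idea: files without the hook are rejected, and a handler returns -EAGAIN only when it actually needs async context.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are real kernel types. */
struct mock_req;

struct mock_fops {
    /* Opt-in hook; NULL means this file does not accept uring commands. */
    int (*uring_issue)(struct mock_req *req, bool force_nonblock);
};

struct mock_req {
    const struct mock_fops *fops;
    unsigned int sub_opcode;  /* per-driver command carried in the "sqe" */
    int result;               /* filled in when the request completes */
    bool completed;
    bool punted;              /* set if the request needed async context */
};

/* Stand-in for posting a completion. */
static void mock_complete_req(struct mock_req *req, int res)
{
    req->result = res;
    req->completed = true;
}

/* Demo driver: sub-opcode 1 never blocks, sub-opcode 2 may block. */
static int demo_uring_issue(struct mock_req *req, bool force_nonblock)
{
    switch (req->sub_opcode) {
    case 1:
        mock_complete_req(req, 42);
        return 0;
    case 2:
        if (force_nonblock)
            return -EAGAIN;   /* only this command asks for a punt */
        mock_complete_req(req, 7);
        return 0;
    default:
        return -EINVAL;
    }
}

static const struct mock_fops demo_fops = { .uring_issue = demo_uring_issue };
static const struct mock_fops legacy_fops = { .uring_issue = NULL };

/* Try inline first; on -EAGAIN, retry with blocking allowed. */
static int mock_submit(struct mock_req *req)
{
    int ret;

    assert(req != NULL && req->fops != NULL);
    if (!req->fops->uring_issue)
        return -EOPNOTSUPP;   /* file has not opted in */

    ret = req->fops->uring_issue(req, true);
    if (ret == -EAGAIN) {
        /* A real implementation would hand this to a worker instead. */
        req->punted = true;
        ret = req->fops->uring_issue(req, false);
    }
    return ret;
}
```

In this mock, a request with sub-opcode 1 completes inline, one with sub-opcode 2 is retried with blocking allowed, and a request against `legacy_fops` fails with -EOPNOTSUPP, in contrast to the unconditional -EAGAIN in the quoted patch.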