Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp5131355iob; Mon, 9 May 2022 09:13:46 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxqonDNzswEuRLuxYhOvn4ChMXqF+bfBpq63zCsyARwRtVz99wCPF2I6XAVSQITzUmanTud X-Received: by 2002:a05:6214:21e6:b0:45a:95a5:e82a with SMTP id p6-20020a05621421e600b0045a95a5e82amr14174181qvj.104.1652112826784; Mon, 09 May 2022 09:13:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652112826; cv=none; d=google.com; s=arc-20160816; b=dIu0x8w7cdvE9SkhOqER8jVXFTpDmqkPa4aPuywEwnNCQMRSOaZvMQpsv6LQRd4PNn /DvjkQK2wAu+3h5XEsrrNA4LW9hBDK85/BvyD/tWAM6DFjCI+h7sqT369BFd23ejXzMQ gUE3/kflRb21tAazni868cIuCBg866gkuWVM1m0HFhJLa8GMh5P+g12y8OELqU4YHa9y 8TKoSQzLFAJgTthtSUghLPqybb3intafvlzU8NVUakLb0x1syBMnylcTDedLHeVEmZSJ qIkp6i8P9qGjND4ZYHuLicK9N++vC3+Jk49rWu1O1iBwl5ldjwTZKnFc1GgAaoU04Fs6 Ap9Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=fCvLOr8+eHkzeHyWMYW7RF71t3Gl1LCKsc2kzaAVZOM=; b=rpmfUQZTbbu4qnt3RnCkdb3botodGFTIO476lS2dlyrN6g/D77uqDQ2IuYjO2ZWfNA y04Osfx4s8f/8f5hEj6PGrT/vPFS6ppYkX4kGdxd71Lg6TWbElNvX/8URZus61bb6Pyb 5wUyjt9MX2b14nUsNaWWnANyRt95xjs8v0ix8M+fIgg48W93Nl3MkI8fDYAQfeZpUyy5 UQnaSKja1nhB+4pqq5fT/FPpTOEjFzpUD9N9QEArCN6UKp9nBaRgADQYL9y+jLYyV02S qMQXsUDp2hoLJ6E3CgclTtCxh0OaMg2NOyTUkYGU8qkwUDka4g+fqZP6C+0UxnZurTH8 216g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel-dk.20210112.gappssmtp.com header.s=20210112 header.b=QlT+8onn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id a17-20020ac85b91000000b002f35c0bf9e8si8119944qta.533.2022.05.09.09.13.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 May 2022 09:13:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel-dk.20210112.gappssmtp.com header.s=20210112 header.b=QlT+8onn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id ABE7F27A8A2; Mon, 9 May 2022 09:09:23 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238957AbiEIQNJ (ORCPT + 99 others); Mon, 9 May 2022 12:13:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55544 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238893AbiEIQNH (ORCPT ); Mon, 9 May 2022 12:13:07 -0400 Received: from mail-io1-xd32.google.com (mail-io1-xd32.google.com [IPv6:2607:f8b0:4864:20::d32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DEC322764F1 for ; Mon, 9 May 2022 09:09:12 -0700 (PDT) Received: by mail-io1-xd32.google.com with SMTP id f4so15837952iov.2 for ; Mon, 09 May 2022 09:09:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112; h=message-id:date:mime-version:user-agent:subject:content-language:to :cc:references:from:in-reply-to:content-transfer-encoding; bh=fCvLOr8+eHkzeHyWMYW7RF71t3Gl1LCKsc2kzaAVZOM=; b=QlT+8onnX75/tEvfJkYh9kLNZ+hsPPZPdpIbHZXl8kpppooGxKYmuoC6vLZclgirkt wTUJPGqs70408FK5mHAO6rN9wR9jArSexSqpr6Zv4jIqknRuMdyECi9FKHFSFDVoLPPl pZDYCqKxHpQUtd2fiQH7UKzkf1FxOOgh+o8EheT71+7rFS7zEoRna5dzBgP8KNcJCJVt 0ALV6L2qoolorwBEsM8jk2TyDagowC44jDNzNgJ317AhCVW3oR+3ULHcLKoc/ehyGVSe PNDbAyAOih/dVPedbHz/cvsnG+m6NWT9W+OBW0zbsEnECBTCokUFrm5/lL3Qqa8HR5w7 dCvQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to :content-transfer-encoding; bh=fCvLOr8+eHkzeHyWMYW7RF71t3Gl1LCKsc2kzaAVZOM=; b=s71We/o1xEmHdBKnAts/5l3vE4l36dnuNVkcBwebJ+avtEcJ8e/7/lnP8p9DWu0f8F l3qGVhn4LcTzRkQhI0QuMT37sWrwlfcxpU+6FlfFENpyTygFWyeD5mqBaTbQW84wLsFR Cd2a+GEyv1dE5+UiSjSq+EDj/n3n4BvBr9ZEEOCe7I6mC9u+m+CM65h6LEXfE1qKMAAt 5iAXM8XbbR7/tR5oWWFlvhQ4qs4pv2BjEGw2GGDovCRbse+R0nEuRLtftNDNDbfO9nou l6wTufVLWdCQavBFyb+0QNQ2Q3omGBRXm10KhZU38P76fhQk5X4o872eUPLi7r6qr/he qIaQ== X-Gm-Message-State: AOAM530fmllIErVx3RTWBZxnisOTTa+DRshz9YvwxEBpI0rT2BUvfkTw 9O8d0Ud5A3WVcYY/+luGECBJVU5vowNIyA== X-Received: by 2002:a05:6638:372c:b0:32b:604c:ec04 with SMTP id k44-20020a056638372c00b0032b604cec04mr7108749jav.84.1652112552162; Mon, 09 May 2022 09:09:12 -0700 (PDT) Received: from [192.168.1.172] ([207.135.234.126]) by smtp.gmail.com with ESMTPSA id cp20-20020a056638481400b0032b75b98013sm3695366jab.148.2022.05.09.09.09.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 09 May 2022 09:09:11 -0700 (PDT) Message-ID: <9c833e12-fd09-fe7d-d4f2-e916c6ce4524@kernel.dk> Date: Mon, 9 May 2022 10:09:10 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1 Subject: Re: [RFC PATCH] ubd: add io_uring based userspace block driver Content-Language: en-US To: Ming Lei Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, Gabriel Krisman Bertazi , ZiyangZhang , Xiaoguang Wang References: <20220509092312.254354-1-ming.lei@redhat.com> From: Jens Axboe In-Reply-To: <20220509092312.254354-1-ming.lei@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, NICE_REPLY_A,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/9/22 3:23 AM, Ming Lei wrote: > This is the driver part of userspace block driver(ubd driver), the other > part is userspace daemon part(ubdsrv)[1]. > > The two parts communicate by io_uring's IORING_OP_URING_CMD with one > shared cmd buffer for storing io command, and the buffer is read only for > ubdsrv, each io command is indexed by io request tag directly, and > is written by ubd driver. > > For example, when one READ io request is submitted to ubd block driver, ubd > driver stores the io command into cmd buffer first, then completes one > IORING_OP_URING_CMD for notifying ubdsrv, and the URING_CMD is issued to > ubd driver beforehand by ubdsrv for getting notification of any new io request, > and each URING_CMD is associated with one io request by tag. > > After ubdsrv gets the io command, it translates and handles the ubd io > request, such as, for the ubd-loop target, ubdsrv translates the request > into same request on another file or disk, like the kernel loop block > driver. In ubdsrv's implementation, the io is still handled by io_uring, > and share same ring with IORING_OP_URING_CMD command. When the target io > request is done, the same IORING_OP_URING_CMD is issued to ubd driver for > both committing io request result and getting future notification of new > io request. > > Another thing done by ubd driver is to copy data between kernel io > request and ubdsrv's io buffer: > > 1) before ubsrv handles WRITE request, copy the request's data into > ubdsrv's userspace io buffer, so that ubdsrv can handle the write > request > > 2) after ubsrv handles READ request, copy ubdsrv's userspace io buffer > into this READ request, then ubd driver can complete the READ request > > Zero copy may be switched if mm is ready to support it. > > ubd driver doesn't handle any logic of the specific user space driver, > so it should be small/simple enough. This is pretty interesting! Just one small thing I noticed, since you want to make sure batching is Good Enough: > +static blk_status_t ubd_queue_rq(struct blk_mq_hw_ctx *hctx, > + const struct blk_mq_queue_data *bd) > +{ > + struct ubd_queue *ubq = hctx->driver_data; > + struct request *rq = bd->rq; > + struct ubd_io *io = &ubq->ios[rq->tag]; > + struct ubd_rq_data *data = blk_mq_rq_to_pdu(rq); > + blk_status_t res; > + > + if (ubq->aborted) > + return BLK_STS_IOERR; > + > + /* this io cmd slot isn't active, so have to fail this io */ > + if (WARN_ON_ONCE(!(io->flags & UBD_IO_FLAG_ACTIVE))) > + return BLK_STS_IOERR; > + > + /* fill iod to slot in io cmd buffer */ > + res = ubd_setup_iod(ubq, rq); > + if (res != BLK_STS_OK) > + return BLK_STS_IOERR; > + > + blk_mq_start_request(bd->rq); > + > + /* mark this cmd owned by ubdsrv */ > + io->flags |= UBD_IO_FLAG_OWNED_BY_SRV; > + > + /* > + * clear ACTIVE since we are done with this sqe/cmd slot > + * > + * We can only accept io cmd in case of being not active. > + */ > + io->flags &= ~UBD_IO_FLAG_ACTIVE; > + > + /* > + * run data copy in task work context for WRITE, and complete io_uring > + * cmd there too. > + * > + * This way should improve batching, meantime pinning pages in current > + * context is pretty fast. > + */ > + task_work_add(ubq->ubq_daemon, &data->work, TWA_SIGNAL); > + > + return BLK_STS_OK; > +} It'd be better to use bd->last to indicate what kind of signaling you need here. TWA_SIGNAL will force an immediate transition if the app is running in userspace, which may not be what you want. Also see: https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.19/io_uring&id=e788be95a57a9bebe446878ce9bf2750f6fe4974 But regardless of signaling needed, you don't need it except if bd->last is true. Would need a commit_rqs() as well, but that's trivial. More importantly, what prevents ubq->ubq_daemon from going away after it's been assigned? I didn't look at the details, but is this relying on io_uring being closed to cancel pending requests? That should work, but we need some way to ensure that ->ubq_daemon is always valid here. -- Jens Axboe