Date: Thu, 19 May 2022 21:33:34 +0800
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    Harris James R, io-uring@vger.kernel.org, Gabriel Krisman Bertazi,
    ZiyangZhang, Xiaoguang Wang, Stefan Hajnoczi, ming.lei@redhat.com
Subject: Re: [PATCH V2 0/1] ubd: add io_uring based userspace block driver
In-Reply-To: <20220517055358.3164431-1-ming.lei@redhat.com>

On Tue, May 17, 2022 at 01:53:57PM +0800, Ming Lei wrote:
> Hello Guys,
>
> The ubd driver is a kernel driver for implementing generic userspace block
> devices/drivers. It delivers IO requests from the ubd block device
> (/dev/ubdbN) to the ubd server[1], the userspace part of ubd, which
> communicates with the ubd driver and handles the specific IO logic in its
> target module.
>
> The other thing the ubd driver handles is copying data between the userspace
> buffer and the request/bio pages, or doing zero copy if mm is ready to
> support it in the future. The ubd driver doesn't handle any IO logic of the
> specific driver, so it is small and simple; all IO logic is done by the
> target code in ubdserver.
>
> Those two are the main jobs done by the ubd driver.
>
> The ubd driver helps move IO logic into userspace, where development is
> easier and more effective than in the kernel. For example, ubd-loop takes
> < 200 lines of loop-specific code to get basically the same functionality as
> the kernel loop block driver, while the performance is still good. ubdsrv[1]
> provides a built-in test for comparing both by running "make test T=loop".
>
> Another example is high performance qcow2 support[2], which could be built
> with the ubd framework more easily than doing it inside the kernel.
>
> Also, more people have expressed interest in userspace block drivers[3]:
> Gabriel Krisman Bertazi proposed this topic for lsf/mm/ebpf 2022 and
> mentioned a requirement from Google. Ziyang Zhang from Alibaba said they
> "plan to replace TCMU by UBD as a new choice" because UBD can get better
> throughput than TCMU even with a single queue[4], and UBD is simple. There
> are also userspace storage services for providing storage to containers.
>
> It is io_uring based: IO requests are delivered to userspace via the newly
> added io_uring command, which has proved very efficient for NVMe passthrough
> IO, getting better IOPS than io_uring(READ/WRITE). Meanwhile, one shared/mmap
> buffer is used for sharing IO descriptors with userspace; the buffer is
> read-only for userspace, and each IO takes just 24 bytes so far.
> It is suggested to use io_uring in userspace (the target part of the ubd
> server) to handle IO requests too. It is still easy for ubdserver to support
> IO handling by non-io_uring; this work isn't done yet, but it can be
> supported easily with the help of eventfd.
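
To make the 24-byte descriptor in the shared buffer concrete, here is a
minimal sketch of how a server could index it by request tag. The field names
and their order are assumptions for illustration only; the cover letter states
just the 24-byte size and that the buffer is read-only for userspace. A bytes
object stands in for the mmap'ed region, and IoDesc, read_desc and queue_depth
are hypothetical names.

```python
import ctypes


class IoDesc(ctypes.Structure):
    # Hypothetical layout: only the 24-byte total is taken from the text.
    _fields_ = [
        ("op_flags", ctypes.c_uint32),      # operation code plus flags
        ("nr_sectors", ctypes.c_uint32),    # request length in sectors
        ("start_sector", ctypes.c_uint64),  # first sector on the device
        ("addr", ctypes.c_uint64),          # userspace data buffer address
    ]


DESC_SIZE = ctypes.sizeof(IoDesc)


def read_desc(buf, tag):
    """Copy descriptor number `tag` out of a read-only descriptor buffer."""
    return IoDesc.from_buffer_copy(buf, tag * DESC_SIZE)


# Stand-in for the shared region of one queue: bytes is immutable, much like
# a mapping that userspace may only read.
queue_depth = 4
descs = (IoDesc * queue_depth)()
descs[2] = IoDesc(op_flags=1, nr_sectors=8, start_sector=2048, addr=0)
shared = bytes(descs)

d = read_desc(shared, 2)
print(DESC_SIZE, d.nr_sectors, d.start_sector)
```

With one fixed-size slot per tag, the server needs no extra copy of the IO
command: it only reads slot `tag` once it is told that the request is ready.
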
>
> This way is efficient since no extra IO command copy is required and no
> sleep is needed when transferring IO commands to userspace. The
> communication protocol is also simple and efficient: one single command,
> UBD_IO_COMMIT_AND_FETCH_REQ, can handle both fetching an IO request desc and
> committing the command result in one trip. IO handling is often batched after
> a single io_uring_enter() returns; both IO requests from the ubd server
> target and IO commands can be handled as a whole batch.
>
> Remove RFC now because the ubd driver code has had lots of cleanup,
> enhancements and bug fixes since V1:
>
> - cleanup uapi: remove ubd specific error code, switch to linux error code,
>   remove one command op, remove one field from cmd_desc
>
> - add monitor mechanism to handle ubq_daemon being killed; ubdsrv[1]
>   includes builtin tests covering heavy IO while deleting ubd / killing
>   ubq_daemon at the same time, and V2 passes both tests
>   (make test T=generic); the abort/stop mechanism is simple
>
> - fix MQ command buffer mmap bug, and now 'xfstests -g auto' works well on
>   MQ ubd-loop devices (test/scratch)
>
> - improve batching submission as suggested by Jens
>
> - improve handling for starting device, replace random wait/poll with
>   completion
>
> - all kinds of cleanup, bug fix,..
>
> And the patch by patch change since V1 can be found in the following
> tree:
>
> https://github.com/ming1/linux/commits/my_for-5.18-ubd-devel_v2

BTW, a one-line fix[1] has been added to the above branch, which clearly
improves performance on small BS (< 128k) tests. If anyone runs a performance
test, please include this fix.

[1] https://github.com/ming1/linux/commit/fa91354b418e83953304a3efad4ee6ac40ea6110

Thanks,
Ming