Subject: Re: [PATCH v4 6/6] io_uring: add support for zone-append
To: Kanchan Joshi
Cc: Kanchan Joshi, viro@zeniv.linux.org.uk, bcrl@kvack.org, Matthew Wilcox,
 Christoph Hellwig, Damien Le Moal, asml.silence@gmail.com,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-aio@kvack.org, io-uring@vger.kernel.org, linux-block@vger.kernel.org,
 linux-api@vger.kernel.org, SelvaKumar S, Nitesh Shetty, Javier Gonzalez
References: <1595605762-17010-1-git-send-email-joshi.k@samsung.com>
 <1595605762-17010-7-git-send-email-joshi.k@samsung.com>
From: Jens Axboe
Date: Mon, 27 Jul 2020 14:34:28 -0600
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/27/20 1:16 PM, Kanchan Joshi wrote:
>
> On Fri, Jul 24, 2020 at 10:00 PM Jens Axboe wrote:
>>
>> On 7/24/20 9:49 AM, Kanchan Joshi wrote:
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 7809ab2..6510cf5 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1284,8 +1301,15 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
>>>          cqe = io_get_cqring(ctx);
>>>          if (likely(cqe)) {
>>>                  WRITE_ONCE(cqe->user_data, req->user_data);
>>> -                WRITE_ONCE(cqe->res, res);
>>> -                WRITE_ONCE(cqe->flags, cflags);
>>> +                if (unlikely(req->flags & REQ_F_ZONE_APPEND)) {
>>> +                        if (likely(res > 0))
>>> +                                WRITE_ONCE(cqe->res64, req->rw.append_offset);
>>> +                        else
>>> +                                WRITE_ONCE(cqe->res64, res);
>>> +                } else {
>>> +                        WRITE_ONCE(cqe->res, res);
>>> +                        WRITE_ONCE(cqe->flags, cflags);
>>> +                }
>>
>> This would be nice to keep out of the fast path, if possible.
>
> I was thinking of keeping a function-pointer (in io_kiocb) during
> submission. That would have avoided this check......but argument count
> differs, so it did not add up.

But that'd grow the io_kiocb just for this use case, which is arguably
even worse. Unless you can keep it in the per-request private data, but
there's no more room there for the regular read/write side.

>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index 92c2269..2580d93 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -156,8 +156,13 @@ enum {
>>>   */
>>>  struct io_uring_cqe {
>>>          __u64   user_data;      /* sqe->data submission passed back */
>>> -        __s32   res;            /* result code for this event */
>>> -        __u32   flags;
>>> +        union {
>>> +                struct {
>>> +                        __s32   res;    /* result code for this event */
>>> +                        __u32   flags;
>>> +                };
>>> +                __s64   res64;  /* appending offset for zone append */
>>> +        };
>>>  };
>>
>> Is this a compatible change, both for now but also going forward? You
>> could randomly have IORING_CQE_F_BUFFER set, or any other future flags.
>
> Sorry, I didn't quite understand the concern. CQE_F_BUFFER is not
> used/set for write currently, so it looked compatible at this point.

Not worried about that, since we won't ever use that for writes. But it
is a potential headache down the line for other flags, if they apply to
normal writes.

> Yes, no room for future flags for this operation.
> Do you see any other way to enable this support in io-uring?

Honestly I think the only viable option is as we discussed previously,
pass in a pointer to a 64-bit type where we can copy the additional
completion information to.

>> Layout would also be different between big and little endian, so not
>> even that easy to set aside a flag for this. But even if that was done,
>> we'd still have this weird API where liburing or the app would need to
>> distinguish this cqe from all others based on... the user_data? Hence
>> liburing can't do it, only the app would be able to.
>>
>> Just seems like a hack to me.
>
> Yes, only user_data to distinguish. Do liburing helpers need to look
> at cqe->res (and decide something) before returning the cqe to
> application?

They generally don't, outside of the internal timeout. But it's an issue
for the API, as it forces applications to handle the CQEs a certain way.
Normally there's flexibility. This makes the append writes behave
differently than everything else, which is never a good idea.

-- 
Jens Axboe