Message-ID: <09f1a8e9-d9ad-4b40-885b-21e1c2ba147b@gmail.com>
Date: Mon, 8 Apr 2024 00:46:44 +0100
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH 1/3] io_uring: Add REQ_F_CQE_SKIP support for
 io_uring zerocopy
To: Oliver Crumrine, axboe@kernel.dk
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
References: <6850f08d-0e89-4eb3-bbfb-bdcc5d4e1b78@gmail.com>
Content-Language: en-US
From: Pavel Begunkov
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 4/7/24 20:14, Oliver Crumrine wrote:
> Oliver Crumrine wrote:
>> Pavel Begunkov wrote:
>>> On 4/5/24 21:04, Oliver Crumrine wrote:
>>>> Pavel Begunkov wrote:
>>>>> On 4/4/24 23:17, Oliver Crumrine wrote:
>>>>>> In his patch to enable zerocopy networking for io_uring, Pavel Begunkov
>>>>>> specifically disabled REQ_F_CQE_SKIP, as (at least from my
>>>>>> understanding) the userspace program wouldn't receive the
>>>>>> IORING_CQE_F_MORE flag in the result value.
>>>>>
>>>>> No. IORING_CQE_F_MORE means there will be another CQE from this
>>>>> request, so a single CQE without IORING_CQE_F_MORE is trivially
>>>>> fine.
>>>>>
>>>>> The problem is the semantics, because by suppressing the first
>>>>> CQE you're losing the result value. You might rely on WAITALL
>>>> That's already happening with io_send.
>>>
>>> Right, and it's still annoying and hard to use
>> Another solution might be something where there is a counter that stores
>> how many CQEs with REQ_F_CQE_SKIP have been processed. Before exiting,
>> userspace could call a function like: io_wait_completions(int completions)
>> which would wait until everything is done, and then userspace could peek
>> the completion ring.
>>>
>>>>> as other sends and "fail" (in terms of io_uring) the request
>>>>> in case of a partial send posting 2 CQEs, but that's not a great
>>>>> way and it's getting userspace complicated pretty easily.
>>>>>
>>>>> In short, it was left out for later because there is a
>>>>> better way to implement it, but it should be done carefully
>>>> Maybe we could put the return values in the notifs? That would be a
>>>> discrepancy between io_send and io_send_zc, though.
>>>
>>> Yes. And yes, having a custom flavour is not good. It'd only
>>> be well usable if apart from returning the actual result
>>> it also guarantees there will be one and only one CQE, then
>>> the userspace doesn't have to do the dancing with counting
>>> and checking F_MORE. In fact, I outlined before how a generic
>>> solution may look like:
>>>
>>> https://github.com/axboe/liburing/issues/824
>>>
>>> The only interesting part, IMHO, is to be able to merge the
>>> main completion with its notification. Below is an old stash
>>> rebased onto for-6.10. The only thing missing is relinking,
>>> but maybe we don't even care about it. I need to cover it
>>> well with tests.
>> The patch looks pretty good. The only potential issue is that you store
>> the res of the normal CQE into the notif CQE. This overwrites the
>> IORING_CQE_F_NOTIF with IORING_CQE_F_MORE. This means that the notif would
>> indicate to userspace that there will be another CQE, of which there
>> won't.
> I was wrong here; Mixed up flags and result value.

Right, it's fine. And it's synchronised by the ubuf refcounting,
though it might get more complicated if I try out some counting
optimisations.

FWIW, it shouldn't give any performance wins. The heavy stuff is
notifications waking the task, which is still there. I can even
imagine that having separate CQEs might be more flexible and would
allow more efficient CQ batching.

-- 
Pavel Begunkov
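
For reference, the "dancing with counting and checking F_MORE" that
userspace does today for a zerocopy send can be sketched roughly as
below. This is only an illustration, not code from the patch: the
struct and function names (sketch_cqe, zc_send_state, zc_handle_cqe,
zc_done) are made up for the example, and the two flag values are
copied from the io_uring uapi header so the snippet builds without
liburing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values mirrored from linux/io_uring.h (assumed unchanged). */
#define CQE_F_MORE  (1U << 1) /* IORING_CQE_F_MORE: another CQE will follow */
#define CQE_F_NOTIF (1U << 3) /* IORING_CQE_F_NOTIF: zerocopy notification  */

/* Hypothetical stand-in for the fields of struct io_uring_cqe used here. */
struct sketch_cqe {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
};

/* Hypothetical per-request bookkeeping kept by the application. */
struct zc_send_state {
	int32_t result;       /* bytes sent or -errno, from the main CQE */
	bool    have_result;  /* main CQE has been seen */
	bool    buf_released; /* buffer may be reused or freed */
};

/* Feed one CQE belonging to this request into its state. */
static void zc_handle_cqe(struct zc_send_state *st,
			  const struct sketch_cqe *cqe)
{
	if (cqe->flags & CQE_F_NOTIF) {
		/* Notification: the kernel no longer references the buffer. */
		st->buf_released = true;
		return;
	}

	/* Main completion: the only CQE that carries the send result. */
	st->result = cqe->res;
	st->have_result = true;

	/* No F_MORE means no notification will follow for this request. */
	if (!(cqe->flags & CQE_F_MORE))
		st->buf_released = true;
}

/* The request is finished once both pieces of information are in. */
static bool zc_done(const struct zc_send_state *st)
{
	return st->have_result && st->buf_released;
}
```

If the main CQE were suppressed, have_result would never be set in
this scheme, which is the lost-result problem mentioned above. A
merged completion would reduce the two branches to one.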