From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Pavel Begunkov, Dylan Yudaken, Jens Axboe, Sasha Levin, io-uring@vger.kernel.org
Subject: [PATCH AUTOSEL 6.0 44/67] io_uring: fix CQE reordering
Date: Wed, 12 Oct 2022 20:15:25 -0400
Message-Id: <20221013001554.1892206-44-sashal@kernel.org>
In-Reply-To: <20221013001554.1892206-1-sashal@kernel.org>
References: <20221013001554.1892206-1-sashal@kernel.org>
X-stable: review

From: Pavel Begunkov

[ Upstream commit aa1df3a360a0c50e0f0086a785d75c2785c29967 ]

Overflowing CQEs may result in reordering, which is buggy in case of
links, F_MORE and so on. If we guarantee that we don't reorder for the
unlikely event of a CQ ring overflow, then we can further extend this
to not have to terminate multishot requests if it happens. For other
operations, like zerocopy sends, we have no choice but to honor CQE
ordering.
Reported-by: Dylan Yudaken
Signed-off-by: Pavel Begunkov
Link: https://lore.kernel.org/r/ec3bc55687b0768bbe20fb62d7d06cfced7d7e70.1663892031.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
---
 io_uring/io_uring.c | 12 ++++++++++--
 io_uring/io_uring.h | 12 +++++++++---
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 242d896c00f3..13af6b56ebd2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -567,7 +567,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 
 	io_cq_lock(ctx);
 	while (!list_empty(&ctx->cq_overflow_list)) {
-		struct io_uring_cqe *cqe = io_get_cqe(ctx);
+		struct io_uring_cqe *cqe = io_get_cqe_overflow(ctx, true);
 		struct io_overflow_cqe *ocqe;
 
 		if (!cqe && !force)
@@ -694,12 +694,19 @@ bool io_req_cqe_overflow(struct io_kiocb *req)
  * control dependency is enough as we're using WRITE_ONCE to
  * fill the cq entry
  */
-struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx)
+struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow)
 {
 	struct io_rings *rings = ctx->rings;
 	unsigned int off = ctx->cached_cq_tail & (ctx->cq_entries - 1);
 	unsigned int free, queued, len;
 
+	/*
+	 * Posting into the CQ when there are pending overflowed CQEs may break
+	 * ordering guarantees, which will affect links, F_MORE users and more.
+	 * Force overflow the completion.
+	 */
+	if (!overflow && (ctx->check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT)))
+		return NULL;
+
 	/* userspace may cheat modifying the tail, be safe and do min */
 	queued = min(__io_cqring_events(ctx), ctx->cq_entries);
@@ -2228,6 +2235,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 
 	do {
 		io_cqring_overflow_flush(ctx);
+
 		if (io_cqring_events(ctx) >= min_events)
 			return 0;
 		if (!io_run_task_work())
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 2f73f83af960..45809ae6f64e 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -24,7 +24,7 @@ enum {
 	IOU_STOP_MULTISHOT = -ECANCELED,
 };
 
-struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx);
+struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow);
 bool io_req_cqe_overflow(struct io_kiocb *req);
 int io_run_task_work_sig(void);
 void io_req_complete_failed(struct io_kiocb *req, s32 res);
@@ -91,7 +91,8 @@ static inline void io_cq_lock(struct io_ring_ctx *ctx)
 
 void io_cq_unlock_post(struct io_ring_ctx *ctx);
 
-static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
+static inline struct io_uring_cqe *io_get_cqe_overflow(struct io_ring_ctx *ctx,
+						       bool overflow)
 {
 	if (likely(ctx->cqe_cached < ctx->cqe_sentinel)) {
 		struct io_uring_cqe *cqe = ctx->cqe_cached;
@@ -103,7 +104,12 @@ static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
 		return cqe;
 	}
 
-	return __io_get_cqe(ctx);
+	return __io_get_cqe(ctx, overflow);
+}
+
+static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
+{
+	return io_get_cqe_overflow(ctx, false);
 }
 
 static inline bool __io_fill_cqe_req(struct io_ring_ctx *ctx,
-- 
2.35.1
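
For context (not part of the patch itself): the ordering guarantee being
fixed is most visible to userspace consumers of multishot completions.
Below is a minimal sketch assuming liburing; drain_multishot() is a
hypothetical helper and the setup and error handling are illustrative
only. The consumer treats the first CQE without IORING_CQE_F_MORE as the
end of the multishot sequence, which is only sound if CQEs are observed
in the order the kernel posted them -- exactly the invariant an
out-of-order overflow flush would break.

/*
 * Minimal sketch (assumes liburing; drain_multishot() is hypothetical):
 * consume CQEs for a previously submitted multishot request. The loop
 * terminates on the first CQE that lacks IORING_CQE_F_MORE, so it relies
 * on CQEs being delivered in posting order.
 */
#include <liburing.h>
#include <stdio.h>

static int drain_multishot(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	for (;;) {
		int ret = io_uring_wait_cqe(ring, &cqe);

		if (ret < 0)
			return ret;

		printf("res=%d flags=0x%x\n", cqe->res, cqe->flags);

		if (!(cqe->flags & IORING_CQE_F_MORE)) {
			/* no further CQEs will be posted for this request */
			int res = cqe->res;

			io_uring_cqe_seen(ring, cqe);
			return res;
		}
		io_uring_cqe_seen(ring, cqe);
	}
}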