From: Jann Horn
Date: Fri, 1 Dec 2023 17:41:13 +0100
Subject: io_uring: incorrect assumption about mutex behavior on unlock?
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: kernel list, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long

mutex_unlock() has a different API contract compared to spin_unlock().
spin_unlock() can be used to release ownership of an object, so that
as soon as the spinlock is unlocked, another task is allowed to free
the object containing the spinlock.
mutex_unlock() does not support this kind of usage: The caller of
mutex_unlock() must ensure that the mutex stays alive until
mutex_unlock() has returned.
(See the thread which discusses adding documentation about this.)

(POSIX userspace mutexes are different from kernel mutexes, in
userspace this pattern is allowed.)

io_ring_exit_work() has a comment that seems to assume that the
uring_lock (which is a mutex) can be used as if the spinlock-style API
contract applied:

    /*
     * Some may use context even when all refs and requests have been put,
     * and they are free to do so while still holding uring_lock or
     * completion_lock, see io_req_task_submit(). Apart from other work,
     * this lock/unlock section also waits them to finish.
     */
    mutex_lock(&ctx->uring_lock);

I couldn't find any way in which io_req_task_submit() actually still
relies on this. I think io_fallback_req_func() now relies on it,
though I'm not sure whether that's intentional.
ctx->fallback_work is flushed in io_ring_ctx_wait_and_kill(), but I
think it can probably be restarted later on via:

io_ring_exit_work
 -> io_move_task_work_from_local
 -> io_req_normal_work_add
 -> io_fallback_tw(sync=false)
 -> schedule_delayed_work

I think it is probably guaranteed that ctx->refs is non-zero when we
enter io_fallback_req_func, since I think we can't enter
io_fallback_req_func with an empty ctx->fallback_llist, and the
requests queued up on ctx->fallback_llist have to hold refcounted
references to the ctx. But by the time we reach the mutex_unlock(), I
think we're not guaranteed to hold any references on the ctx anymore,
and so the ctx could theoretically be freed in the middle of the
mutex_unlock() call?

I think that to make this code properly correct, it might be necessary
to either add another flush_delayed_work() call after ctx->refs has
dropped to zero and we know that the fallback work can't be restarted
anymore, or create an extra ctx->refs reference that is dropped in
io_fallback_req_func() after the mutex_unlock(). (Though I guess it's
probably unlikely that this goes wrong in practice.)
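
To illustrate the second option (an extra reference that is dropped
only after the unlock has returned), here is a minimal userspace
sketch in C using pthreads. It is only a model and not io_uring code:
struct ctx, ctx_get(), ctx_put(), fallback_work() and run_demo() are
hypothetical names made up for this example. Since POSIX mutexes
permit freeing the object right after unlock, this sketch cannot
reproduce the kernel problem; it only shows the ordering of the
unlock and the final reference drop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the io_uring ctx: a refcount plus a mutex. */
struct ctx {
    atomic_int refs;
    pthread_mutex_t lock;   /* plays the role of uring_lock */
    int work_done;
};

/* Set to 1 once the last reference has been dropped and the ctx freed. */
static atomic_int ctx_freed;

static struct ctx *ctx_alloc(void)
{
    struct ctx *c = calloc(1, sizeof(*c));

    if (!c)
        return NULL;
    atomic_init(&c->refs, 1);           /* the creator's reference */
    pthread_mutex_init(&c->lock, NULL);
    return c;
}

static void ctx_get(struct ctx *c)
{
    atomic_fetch_add(&c->refs, 1);
}

static void ctx_put(struct ctx *c)
{
    /* atomic_fetch_sub() returns the old value: 1 means we were last. */
    if (atomic_fetch_sub(&c->refs, 1) == 1) {
        pthread_mutex_destroy(&c->lock);
        free(c);
        atomic_store(&ctx_freed, 1);
    }
}

/* Models the fallback worker: lock, do the work, unlock, then put. */
static void *fallback_work(void *arg)
{
    struct ctx *c = arg;

    pthread_mutex_lock(&c->lock);
    c->work_done = 1;   /* stands in for running the queued requests */
    pthread_mutex_unlock(&c->lock);

    /*
     * The extra reference is dropped only after the unlock call has
     * returned, so the ctx cannot be freed while unlock is running.
     */
    ctx_put(c);
    return NULL;
}

/* Returns 0 if the ctx was freed exactly when the last ref went away. */
static int run_demo(void)
{
    struct ctx *c = ctx_alloc();
    pthread_t t;

    if (!c)
        return -1;
    atomic_store(&ctx_freed, 0);

    ctx_get(c);         /* extra reference owned by the worker */
    if (pthread_create(&t, NULL, fallback_work, c) != 0) {
        ctx_put(c);     /* undo the worker's reference */
        ctx_put(c);     /* drop the creator's reference */
        return -1;
    }

    ctx_put(c);         /* exit path drops its ref; may race with worker */
    pthread_join(t, NULL);
    return atomic_load(&ctx_freed) ? 0 : 1;
}
```

Calling run_demo() starts one worker and lets the "exit path" drop
its reference concurrently. Whichever side drops the last reference
frees the ctx, and the worker's ctx_put() is always ordered after its
unlock.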