From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Nadav Amit, Jens Axboe
Subject: [PATCH 5.13 145/175] io-wq: fix race between worker exiting and activating free worker
Date: Tue, 10 Aug 2021 19:30:53 +0200
Message-Id: <20210810173005.743233009@linuxfoundation.org>
In-Reply-To: <20210810173000.928681411@linuxfoundation.org>
References: <20210810173000.928681411@linuxfoundation.org>
X-Mailer: git-send-email 2.32.0
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

From: Jens Axboe

commit 83d6c39310b6d11199179f6384c2b0a415389597 upstream.

Nadav correctly reports that we have a race between a worker exiting and
new work being queued. This can lead to work being queued behind an
existing worker that could be sleeping on an event before it can run to
completion, and hence introduce potentially large latency gaps if we hit
this race condition:

cpu0					cpu1
----					----
io_wqe_worker()
schedule_timeout()
 // timed out
					io_wqe_enqueue()
					io_wqe_wake_worker()
					// work_flags & IO_WQ_WORK_CONCURRENT
					io_wqe_activate_free_worker()
io_worker_exit()

Fix this by having the exiting worker go through the normal decrement of
a running worker, which will spawn a new one if needed. The free worker
activation is modified to only return success if we were able to find a
sleeping worker - if not, we keep looking through the list. If we fail,
we create a new worker as per usual.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/io-uring/BFF746C0-FEDE-4646-A253-3021C57C26C9@gmail.com/
Reported-by: Nadav Amit
Tested-by: Nadav Amit
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

---
 fs/io-wq.c |   38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -131,6 +131,7 @@ struct io_cb_cancel_data {
 };
 
 static void create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index);
+static void io_wqe_dec_running(struct io_worker *worker);
 
 static bool io_worker_get(struct io_worker *worker)
 {
@@ -169,26 +170,21 @@ static void io_worker_exit(struct io_wor
 {
 	struct io_wqe *wqe = worker->wqe;
 	struct io_wqe_acct *acct = io_wqe_get_acct(worker);
-	unsigned flags;
 
 	if (refcount_dec_and_test(&worker->ref))
 		complete(&worker->ref_done);
 	wait_for_completion(&worker->ref_done);
 
-	preempt_disable();
-	current->flags &= ~PF_IO_WORKER;
-	flags = worker->flags;
-	worker->flags = 0;
-	if (flags & IO_WORKER_F_RUNNING)
-		atomic_dec(&acct->nr_running);
-	worker->flags = 0;
-	preempt_enable();
-
 	raw_spin_lock_irq(&wqe->lock);
-	if (flags & IO_WORKER_F_FREE)
+	if (worker->flags & IO_WORKER_F_FREE)
 		hlist_nulls_del_rcu(&worker->nulls_node);
 	list_del_rcu(&worker->all_list);
 	acct->nr_workers--;
+	preempt_disable();
+	io_wqe_dec_running(worker);
+	worker->flags = 0;
+	current->flags &= ~PF_IO_WORKER;
+	preempt_enable();
 	raw_spin_unlock_irq(&wqe->lock);
 
 	kfree_rcu(worker, rcu);
@@ -215,15 +211,19 @@ static bool io_wqe_activate_free_worker(
 	struct hlist_nulls_node *n;
 	struct io_worker *worker;
 
-	n = rcu_dereference(hlist_nulls_first_rcu(&wqe->free_list));
-	if (is_a_nulls(n))
-		return false;
-
-	worker = hlist_nulls_entry(n, struct io_worker, nulls_node);
-	if (io_worker_get(worker)) {
-		wake_up_process(worker->task);
+	/*
+	 * Iterate free_list and see if we can find an idle worker to
+	 * activate. If a given worker is on the free_list but in the process
+	 * of exiting, keep trying.
+	 */
+	hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) {
+		if (!io_worker_get(worker))
+			continue;
+		if (wake_up_process(worker->task)) {
+			io_worker_release(worker);
+			return true;
+		}
 		io_worker_release(worker);
-		return true;
 	}
 
 	return false;