From: Yunlong Xing <yunlong.xing@unisoc.com>
Subject: [PATCH] workqueue: Fix pool->nr_running type back to atomic
Date: Tue, 6 Feb 2024 16:00:24 +0800
Message-ID: <20240206080024.2373490-1-yunlong.xing@unisoc.com>
X-Mailer: git-send-email 2.25.1
X-Mailing-List: linux-kernel@vger.kernel.org

In a CPU-hotplug test, when the core is plugged back in,
set_cpus_allowed_ptr() may fail while restoring the cpus_mask of a
per-cpu worker, so the worker's cpus_mask remains wq_unbound_cpumask
until the core is hotplugged again. Workers of the same per-cpu pool
can then run concurrently on different CPUs and modify nr_running at
the same time, causing a race condition.

Crash ps info:

[RU] PID: 19966  TASK: ffffff802a71a580  CPU: 6  COMMAND: "kworker/6:1"
[ID] PID: 2620   TASK: ffffff80d451a580  CPU: 0  COMMAND: "kworker/6:2"

workqueue_online_cpu()
  ->rebind_workers()
    ->set_cpus_allowed_ptr()   // restoring cpus_mask failed;
                               // kworker/6:2 cpus_mask stays 0xFF
worker enters idle

T1: kworker/6:1 (CPU6)                   T2: kworker/6:2 (CPU0)

1: worker_clr_flags(worker,
       WORKER_PREP | WORKER_REBOUND)
   ->pool->nr_running++;  (1)

2: wq_worker_sleeping()
   ->pool->nr_running--;  (0)

3: wq_worker_running()                   worker_clr_flags(worker,
   ->pool->nr_running++;  (1)                WORKER_PREP | WORKER_REBOUND)
                                         ->pool->nr_running++;  (1)

   // Two workers running on two CPUs modify nr_running at the same
   // time; a race condition occurs. nr_running should be 2, but in
   // this case it is 1.

4: worker_set_flags(worker, WORKER_PREP)
   ->pool->nr_running--;  (0)

                                         5: worker_set_flags(worker,
                                                WORKER_PREP)
                                            ->pool->nr_running--;  (-1)

The complete debug log:

[70285.393470] wqdb kworker/6:1:19966 cpu6 pool 6 ffffff817f311900 nr_running++ is 1, by clr 264
[70285.393484] wqdb kworker/6:1:19966 cpu6 pool 6 ffffff817f311900 nr_running-- is 0, by sleep
[70285.399883] wqdb kworker/6:1:19966 cpu6 pool 6 ffffff817f311900 nr_running++ is 1, by run
[70285.399882] wqdb kworker/6:2:2620 cpu0 pool 6 ffffff817f311900 nr_running++ is 1, by clr 264
[70285.399894] wqdb kworker/6:2:2620 cpu0 pool 6 ffffff817f311900 nr_running-- is 0, by set 8
[70285.400013] wqdb kworker/6:1:19966 cpu6 pool 6 ffffff817f311900 nr_running-- is -1, by set 8
[70285.400017] wqdb kworker/6:1:19966 cpu6 pool 6 ffffff817f311900 nr_running_error is -1

Restore nr_running to an atomic_t and use atomic operations wherever
nr_running is accessed.
Fixes: bc35f7ef9628 ("workqueue: Convert the type of pool->nr_running to int")
Cc: stable@vger.kernel.org
Signed-off-by: Yunlong Xing <yunlong.xing@unisoc.com>
---
 kernel/workqueue.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 76e60faed892..e74d9b83322c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -161,12 +161,10 @@ struct worker_pool {
 	bool			cpu_stall;	/* WD: stalled cpu bound pool */
 
 	/*
-	 * The counter is incremented in a process context on the associated CPU
-	 * w/ preemption disabled, and decremented or reset in the same context
-	 * but w/ pool->lock held. The readers grab pool->lock and are
-	 * guaranteed to see if the counter reached zero.
+	 * Workers associated with the same CPU may be running on different
+	 * CPUs, so atomic_t is needed.
 	 */
-	int			nr_running;
+	atomic_t		nr_running;
 
 	struct list_head	worklist;	/* L: list of pending works */
 
@@ -832,7 +830,7 @@ static bool work_is_canceling(struct work_struct *work)
  */
 static bool need_more_worker(struct worker_pool *pool)
 {
-	return !list_empty(&pool->worklist) && !pool->nr_running;
+	return !list_empty(&pool->worklist) && !atomic_read(&pool->nr_running);
 }
 
 /* Can I start working? Called from busy but !running workers. */
@@ -844,7 +842,7 @@ static bool may_start_working(struct worker_pool *pool)
 /* Do I need to keep working? Called from currently running workers. */
 static bool keep_working(struct worker_pool *pool)
 {
-	return !list_empty(&pool->worklist) && (pool->nr_running <= 1);
+	return !list_empty(&pool->worklist) && (atomic_read(&pool->nr_running) <= 1);
 }
 
 /* Do we need a new worker? Called from manager. */
@@ -879,7 +877,7 @@ static inline void worker_set_flags(struct worker *worker, unsigned int flags)
 	/* If transitioning into NOT_RUNNING, adjust nr_running. */
 	if ((flags & WORKER_NOT_RUNNING) &&
 	    !(worker->flags & WORKER_NOT_RUNNING)) {
-		pool->nr_running--;
+		atomic_dec(&pool->nr_running);
 	}
 
 	worker->flags |= flags;
@@ -908,7 +906,7 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
 	 */
 	if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
 		if (!(worker->flags & WORKER_NOT_RUNNING))
-			pool->nr_running++;
+			atomic_inc(&pool->nr_running);
 }
 
 /* Return the first idle worker. Called with pool->lock held. */
@@ -951,7 +949,7 @@ static void worker_enter_idle(struct worker *worker)
 		mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
 
 	/* Sanity check nr_running. */
-	WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
+	WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && atomic_read(&pool->nr_running));
 }
 
 /**
@@ -1262,7 +1260,7 @@ void wq_worker_running(struct task_struct *task)
 	 */
 	preempt_disable();
 	if (!(worker->flags & WORKER_NOT_RUNNING))
-		worker->pool->nr_running++;
+		atomic_inc(&worker->pool->nr_running);
 	preempt_enable();
 
 	/*
@@ -1313,7 +1311,7 @@ void wq_worker_sleeping(struct task_struct *task)
 		return;
 	}
 
-	pool->nr_running--;
+	atomic_dec(&pool->nr_running);
 	if (kick_pool(pool))
 		worker->current_pwq->stats[PWQ_STAT_CM_WAKEUP]++;
 
@@ -5418,7 +5416,7 @@ static void unbind_workers(int cpu)
 		 * an unbound (in terms of concurrency management) pool which
 		 * are served by workers tied to the pool.
 		 */
-		pool->nr_running = 0;
+		atomic_set(&pool->nr_running, 0);
 
 		/*
 		 * With concurrency management just turned off, a busy
-- 
2.25.1