Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp5131766imu; Tue, 8 Jan 2019 12:05:53 -0800 (PST) X-Google-Smtp-Source: ALg8bN7Uv6UVAVeaYg74jQOeAY95iUpOh+5hPUMXyp5kvnA/7NZ71vCmYhEVZo4T850Cr5Sd3/0u X-Received: by 2002:a17:902:380c:: with SMTP id l12mr3051130plc.326.1546977953171; Tue, 08 Jan 2019 12:05:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1546977953; cv=none; d=google.com; s=arc-20160816; b=wZCHkMnfsg1GrnwXimJ6M10tZFwx1Di/9hdTQh3ggR/Q+/yCklJGwae7sS3llZtwKO ynLm+uZ4LEiiWORKKvS7U93njztiZg8yg1YSjs4Mr5pDtTVWkVSR6omRV03uenxgOXql OPJ0ogwI+9ZDna7vI1y02fnXDKb1KEa3d5VYtPSNpUVdxFvgrm0yWardj2oBRXHzJmv0 NaTKRQb09sTR8OHZrocYOMmwM8kvy2UCCD03LvABk8TTExOikKaiBQl7RucGNI+NsCoq wrC/3Feu20OirsZ8Nuu49qYnrLdrkEPfHP/fRyp423E4T/DPufi4D4bvzvywGwEhUf8O w5Mw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=3UPEJbA0u19zMg8+MfMIWo/O3C/Zqy3pN88UF7OKrMM=; b=Xb8lnB83mHi86ZSEC2h/yb2eXXtIbEVF8v0qxcbgQEjY0LHSS2PrynyjMWhHh5iK14 UKk+2e6kl0YRRZiq/GLFXToSmur6+ddG2UV0OLLqFgmTfc8UocYiHTXet6lRjZH9vOBY RjcMw5liTlFvuMmf+aoBtG59xoy/VMpIC4vqrlrQYPeLpKSvdw4ZDrTU/2qTn9UzS9Pi 3NrfsaOrzjm0+WwB+BroImqfWlly35rP6RsdifrFKUN5xKfSWHJ6hSqvELAdiwzv2mSB bTG5tGW7/or9ZbLjLpnXWJdUuBbpquh445tg/haI4BsRwgXvxUANIKk6XXOCc39Ya9ar jRAQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=scLC5Tei; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l66si32319086pfi.5.2019.01.08.12.05.37; Tue, 08 Jan 2019 12:05:53 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=scLC5Tei; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730407AbfAHUCW (ORCPT + 99 others); Tue, 8 Jan 2019 15:02:22 -0500 Received: from mail.kernel.org ([198.145.29.99]:36254 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730268AbfAHT3D (ORCPT ); Tue, 8 Jan 2019 14:29:03 -0500 Received: from sasha-vm.mshome.net (c-73-47-72-35.hsd1.nh.comcast.net [73.47.72.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A3EE22183F; Tue, 8 Jan 2019 19:29:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1546975742; bh=Z8MumRHjEYYAs2Zm5KUZdX9CeyWPAbUkAtQM98nDZhY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=scLC5Tei6Gb4XmtK2V9lun0DUS1xHGN6vp+SB38lBf6FtelV8asGbQ2U8W2sq7eyD MdUZ+w/hZDuIiM9nt5tUfWEJh/ef8xuhx/T1NYX+35V/i4d6v7pKcHu3MrL6+zJlAg GRfMVdzzeK+HlGyXRh4ub4DlWjAda0B9azvGj+/8= From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Nikos Tsironis , Ilias Tsitsimpis , Mike Snitzer , Sasha Levin Subject: [PATCH AUTOSEL 4.20 089/117] dm kcopyd: Fix bug causing workqueue stalls Date: Tue, 8 Jan 2019 14:25:57 -0500 Message-Id: <20190108192628.121270-89-sashal@kernel.org> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20190108192628.121270-1-sashal@kernel.org> References: <20190108192628.121270-1-sashal@kernel.org> MIME-Version: 1.0 X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Nikos Tsironis [ Upstream commit d7e6b8dfc7bcb3f4f3a18313581f67486a725b52 ] When using kcopyd to run callbacks through dm_kcopyd_do_callback() or submitting copy jobs with a source size of 0, the jobs are pushed directly to the complete_jobs list, which could be under processing by the kcopyd thread. As a result, the kcopyd thread can continue running completed jobs indefinitely, without releasing the CPU, as long as someone keeps submitting new completed jobs through the aforementioned paths. Processing of work items, queued for execution on the same CPU as the currently running kcopyd thread, is thus stalled for excessive amounts of time, hurting performance. Running the following test, from the device mapper test suite [1], dmtest run --suite snapshot -n parallel_io_to_many_snaps_N , with 8 active snapshots, we get, in dmesg, messages like the following: [68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s! [68899.949282] Showing busy workqueues and worker pools: [68899.949288] workqueue events: flags=0x0 [68899.949295] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949306] pending: vmstat_shepherd, cache_reap [68899.949331] workqueue mm_percpu_wq: flags=0x8 [68899.949337] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949345] pending: vmstat_update [68899.949387] workqueue dm_bufio_cache: flags=0x8 [68899.949392] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [68899.949400] pending: work_fn [dm_bufio] [68899.949423] workqueue kcopyd: flags=0x8 [68899.949429] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949437] pending: do_work [dm_mod] [68899.949452] workqueue kcopyd: flags=0x8 [68899.949458] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949466] in-flight: 13:do_work [dm_mod] [68899.949474] pending: do_work [dm_mod] [68899.949487] workqueue kcopyd: flags=0x8 [68899.949493] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949501] pending: do_work [dm_mod] [68899.949515] workqueue kcopyd: flags=0x8 [68899.949521] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949529] pending: do_work [dm_mod] [68899.949541] workqueue kcopyd: flags=0x8 [68899.949547] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949555] pending: do_work [dm_mod] [68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084 Fix this by splitting the complete_jobs list into two parts: A user facing part, named callback_jobs, and one used internally by kcopyd, retaining the name complete_jobs. dm_kcopyd_do_callback() and dispatch_job() now push their jobs to the callback_jobs list, which is spliced to the complete_jobs list once, every time the kcopyd thread wakes up. This prevents kcopyd from hogging the CPU indefinitely and causing workqueue stalls. Re-running the aforementioned test: * Workqueue stalls are eliminated * The maximum writing time among all targets is reduced from 09m37.10s to 06m04.85s and the total run time of the test is reduced from 10m43.591s to 7m19.199s [1] https://github.com/jthornber/device-mapper-test-suite Signed-off-by: Nikos Tsironis Signed-off-by: Ilias Tsitsimpis Signed-off-by: Mike Snitzer Signed-off-by: Sasha Levin --- drivers/md/dm-kcopyd.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c index 2fc4213e02b5..671c24332802 100644 --- a/drivers/md/dm-kcopyd.c +++ b/drivers/md/dm-kcopyd.c @@ -56,15 +56,17 @@ struct dm_kcopyd_client { atomic_t nr_jobs; /* - * We maintain three lists of jobs: + * We maintain four lists of jobs: * * i) jobs waiting for pages * ii) jobs that have pages, and are waiting for the io to be issued. - * iii) jobs that have completed. + * iii) jobs that don't need to do any IO and just run a callback + * iv) jobs that have completed. * - * All three of these are protected by job_lock. + * All four of these are protected by job_lock. */ spinlock_t job_lock; + struct list_head callback_jobs; struct list_head complete_jobs; struct list_head io_jobs; struct list_head pages_jobs; @@ -625,6 +627,7 @@ static void do_work(struct work_struct *work) struct dm_kcopyd_client *kc = container_of(work, struct dm_kcopyd_client, kcopyd_work); struct blk_plug plug; + unsigned long flags; /* * The order that these are called is *very* important. @@ -633,6 +636,10 @@ static void do_work(struct work_struct *work) * list. io jobs call wake when they complete and it all * starts again. */ + spin_lock_irqsave(&kc->job_lock, flags); + list_splice_tail_init(&kc->callback_jobs, &kc->complete_jobs); + spin_unlock_irqrestore(&kc->job_lock, flags); + blk_start_plug(&plug); process_jobs(&kc->complete_jobs, kc, run_complete_job); process_jobs(&kc->pages_jobs, kc, run_pages_job); @@ -650,7 +657,7 @@ static void dispatch_job(struct kcopyd_job *job) struct dm_kcopyd_client *kc = job->kc; atomic_inc(&kc->nr_jobs); if (unlikely(!job->source.count)) - push(&kc->complete_jobs, job); + push(&kc->callback_jobs, job); else if (job->pages == &zero_page_list) push(&kc->io_jobs, job); else @@ -858,7 +865,7 @@ void dm_kcopyd_do_callback(void *j, int read_err, unsigned long write_err) job->read_err = read_err; job->write_err = write_err; - push(&kc->complete_jobs, job); + push(&kc->callback_jobs, job); wake(kc); } EXPORT_SYMBOL(dm_kcopyd_do_callback); @@ -888,6 +895,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro return ERR_PTR(-ENOMEM); spin_lock_init(&kc->job_lock); + INIT_LIST_HEAD(&kc->callback_jobs); INIT_LIST_HEAD(&kc->complete_jobs); INIT_LIST_HEAD(&kc->io_jobs); INIT_LIST_HEAD(&kc->pages_jobs); @@ -939,6 +947,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc) /* Wait for completion of all jobs submitted by this client. */ wait_event(kc->destroyq, !atomic_read(&kc->nr_jobs)); + BUG_ON(!list_empty(&kc->callback_jobs)); BUG_ON(!list_empty(&kc->complete_jobs)); BUG_ON(!list_empty(&kc->io_jobs)); BUG_ON(!list_empty(&kc->pages_jobs)); -- 2.19.1