Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754394AbcDZXQv (ORCPT ); Tue, 26 Apr 2016 19:16:51 -0400 Received: from shadbolt.e.decadent.org.uk ([88.96.1.126]:40604 "EHLO shadbolt.e.decadent.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754375AbcDZXQs (ORCPT ); Tue, 26 Apr 2016 19:16:48 -0400 Content-Type: text/plain; charset="UTF-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0 From: Ben Hutchings To: linux-kernel@vger.kernel.org, stable@vger.kernel.org CC: akpm@linux-foundation.org, "Shaohua Li" , "Nate Dailey" Date: Wed, 27 Apr 2016 01:02:24 +0200 Message-ID: X-Mailer: LinuxStableQueue (scripts by bwh) Subject: [PATCH 3.2 041/115] raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang In-Reply-To: X-SA-Exim-Connect-IP: 2a02:8426:ae4:c500:9cba:69ae:962d:6167 X-SA-Exim-Mail-From: ben@decadent.org.uk X-SA-Exim-Scanned: No (on shadbolt.decadent.org.uk); SAEximRunCond expanded to false Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2201 Lines: 59 3.2.80-rc1 review patch. If anyone has any objections, please let me know. ------------------ From: Nate Dailey commit ccfc7bf1f09d6190ef86693ddc761d5fe3fa47cb upstream. If raid1d is handling a mix of read and write errors, handle_read_error's call to freeze_array can get stuck. This can happen because, though the bio_end_io_list is initially drained, writes can be added to it via handle_write_finished as the retry_list is processed. These writes contribute to nr_pending but are not included in nr_queued. If a later entry on the retry_list triggers a call to handle_read_error, freeze array hangs waiting for nr_pending == nr_queued+extra. The writes on the bio_end_io_list aren't included in nr_queued so the condition will never be satisfied. To prevent the hang, include bio_end_io_list writes in nr_queued. There's probably a better way to handle decrementing nr_queued, but this seemed like the safest way to avoid breaking surrounding code. I'm happy to supply the script I used to repro this hang. Fixes: 55ce74d4bfe1b(md/raid1: ensure device failure recorded before write request returns.) Signed-off-by: Nate Dailey Signed-off-by: Shaohua Li Signed-off-by: Ben Hutchings --- drivers/md/raid1.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1976,6 +1976,7 @@ static void handle_write_finished(struct if (fail) { spin_lock_irq(&conf->device_lock); list_add(&r1_bio->retry_list, &conf->bio_end_io_list); + conf->nr_queued++; spin_unlock_irq(&conf->device_lock); md_wakeup_thread(conf->mddev->thread); } else { @@ -2090,8 +2091,10 @@ static void raid1d(struct mddev *mddev) LIST_HEAD(tmp); spin_lock_irqsave(&conf->device_lock, flags); if (!test_bit(MD_CHANGE_PENDING, &mddev->flags)) { - list_add(&tmp, &conf->bio_end_io_list); - list_del_init(&conf->bio_end_io_list); + while (!list_empty(&conf->bio_end_io_list)) { + list_move(conf->bio_end_io_list.prev, &tmp); + conf->nr_queued--; + } } spin_unlock_irqrestore(&conf->device_lock, flags); while (!list_empty(&tmp)) {