From: Ben Hutchings
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
CC: akpm@linux-foundation.org, "Michael Lyle", "Jens Axboe", "Coly Li", "Rui Hua"
Date: Wed, 28 Feb 2018 15:20:18 +0000
X-Mailer: LinuxStableQueue (scripts by bwh)
Subject: [PATCH 3.16 011/254] bcache: recover data from backing when data is clean
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List:
linux-kernel@vger.kernel.org

3.16.55-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Rui Hua

commit e393aa2446150536929140739f09c6ecbcbea7f0 upstream.

When a read request hits clean data in the cache device, a situation
called a cache read race can occur in bcache (see the comment at the
tail of cache_look_up(); the following explanation is copied from
there):

    The bucket we're reading from might be reused while our bio is in
    flight, and we could then end up reading the wrong data. We guard
    against this by checking (in bch_cache_read_endio()) if the pointer
    is stale again; if so, we treat it as an error (s->iop.error =
    -EINTR) and reread from the backing device (but we don't pass that
    error up anywhere)

Note that a cache read race happens under normal circumstances, not
only when the SSD has failed. It is counted and shown in
/sys/fs/bcache/XXX/internal/cache_read_races.

Without this patch, in writeback mode we never reread from the backing
device when a cache read race happens until the whole cache device is
clean, because the condition
(s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in
cached_dev_read_error(). In this situation s->iop.error (= -EINTR) is
passed up, and in the end the user receives -EINTR when its bio ends.
This is not suitable, and looks weird to upper-layer applications.

In this patch, we use s->read_dirty_data to judge whether the read
request hit dirty data in the cache device; it is safe to reread data
from the backing device when the read request hit clean data. This not
only handles cache read races, but also recovers data when a read
request to the cache device fails.

[edited by mlyle to fix up whitespace, commit log title, comment spelling]

Fixes: d59b23795933 ("bcache: only permit to recovery read error when cache device is clean")
Signed-off-by: Hua Rui
Reviewed-by: Michael Lyle
Reviewed-by: Coly Li
Signed-off-by: Michael Lyle
Signed-off-by: Jens Axboe
Signed-off-by: Ben Hutchings
---
 drivers/md/bcache/request.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -698,16 +698,15 @@ static void cached_dev_read_error(struct
 {
 	struct search *s = container_of(cl, struct search, cl);
 	struct bio *bio = &s->bio.bio;
-	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
 
 	/*
-	 * If cache device is dirty (dc->has_dirty is non-zero), then
-	 * recovery a failed read request from cached device may get a
-	 * stale data back. So read failure recovery is only permitted
-	 * when cache device is clean.
+	 * If read request hit dirty data (s->read_dirty_data is true),
+	 * then recovery a failed read request from cached device may
+	 * get a stale data back. So read failure recovery is only
+	 * permitted when read request hit clean data in cache device,
+	 * or when cache read race happened.
 	 */
-	if (s->recoverable &&
-	    (dc && !atomic_read(&dc->has_dirty))) {
+	if (s->recoverable && !s->read_dirty_data) {
 		/* Retry from the backing device: */
 		trace_bcache_read_retry(s->orig_bio);
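
For reviewers who want to see the behavioural difference in isolation,
here is a rough userspace sketch of the two retry conditions. It is not
kernel code: struct read_state and both helper names are made up for
illustration, the three booleans only stand in for s->recoverable,
s->read_dirty_data and a non-zero dc->has_dirty, and the old code's
NULL check on dc is left out.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model, not a kernel structure. Each field
 * stands in for one value read by cached_dev_read_error().
 */
struct read_state {
	bool recoverable;	/* models s->recoverable */
	bool read_dirty_data;	/* models s->read_dirty_data */
	bool cache_has_dirty;	/* models atomic_read(&dc->has_dirty) != 0 */
};

/* Old condition: retry only while the whole cache device is clean. */
static bool may_retry_old(const struct read_state *rs)
{
	return rs->recoverable && !rs->cache_has_dirty;
}

/* New condition: retry whenever this particular read hit no dirty data. */
static bool may_retry_new(const struct read_state *rs)
{
	return rs->recoverable && !rs->read_dirty_data;
}
```

In this model, a read that hit only clean data while the cache device
holds dirty data elsewhere (the writeback case from the commit message)
is refused a retry by may_retry_old() and allowed one by
may_retry_new(). A read that did hit dirty data is refused by the new
condition.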