Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp3096464ybb; Mon, 30 Mar 2020 20:51:51 -0700 (PDT) X-Google-Smtp-Source: ADFU+vuGQcPZFGHXU3mgJ7ZS72l7AP/EkfO3o73xgcFQ8UBaeZXqiHvt2OkO1s87SOV269EwC951 X-Received: by 2002:a9d:4b98:: with SMTP id k24mr1733866otf.26.1585626711311; Mon, 30 Mar 2020 20:51:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585626711; cv=none; d=google.com; s=arc-20160816; b=zfu7OAyRD2NSgSyxBkra4MRQSbVeN14U5MGlUFisGNOZkN1Jb3mz2xUwcprtLH0ys9 XKp8wbVsPhxdeTPxweye7pMg7/gwy1OspbNERSok+L1WDuhnYTJPYt2M6RK9lK8OXDSu qHh14wazh8jUDhf4iBGfjvvRxDQNHgkwAeGOWbCzYlBDOx2ay3KA4GMDQZ85d9Cz8R4c KlccbbjrQTbz/jgcG1MBCgS5PSjh+KhXkE59VMAeA07/l77g12rl1Q/CRg07pZNmlRFb XHnemumsA9OEY3pvLNlEpJwuKmWFQm9vp7bCrAK7v40T9LuAJfXJEZoiVLNCtY+VLMHR 1vmg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=rgHcUe/KCuSOiydEaw9XNaMro3e075EYAppk43NME0k=; b=llmcgavIFkWFkixtOhemGT5da9I+TNTfGE0FhVWMMJfPicQcTJxiAXLWEITSJdob28 wYaiosFNjImBDU2BqIPEp5LbLIIX0vGXkBXmP9hP2hL38k+pfeaZiSFXwOrC/4K8518J oe2D+CNgXEc7xxNTM87HqaX6u1v5pGkm7lw1y2s5haUQkyIKlYyJyaH2Zyh95lwnziB7 qvpxsX5cBW7W0URgSAK06yOUGuMnjR9C64JcfPbGx4W4mgvTSxPST9ynthz33QpWbpz4 1YEB+h8AYrkBomOLth3goQzG1zJBCiLk6wLyxr1jy0QUilJi19qI85gNTEhcxFnUsxe3 LWIQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=sO+vJ2Io; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n22si1527949otq.10.2020.03.30.20.51.36; Mon, 30 Mar 2020 20:51:51 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=sO+vJ2Io; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729429AbgCaDue (ORCPT + 99 others); Mon, 30 Mar 2020 23:50:34 -0400 Received: from mail.kernel.org ([198.145.29.99]:36268 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729013AbgCaDue (ORCPT ); Mon, 30 Mar 2020 23:50:34 -0400 Received: from localhost (unknown [104.132.1.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 47F4E2072D; Tue, 31 Mar 2020 03:50:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1585626633; bh=h7evqLoRS+TtiH8NA805NMyh6rHNmX6o4iSJvYr/zS4=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=sO+vJ2Io2Kcpv5t7uyDGmmYqFc0l7Z3oJpImlqRF4r1m7JY6zbfUh3WAtn29Bmiia 0kLxt8g+ZJoMGQZF+t0VaDasZ5P6YBrD9z9iPdvPnx/wd57tkVSNSRTa40gmBddH8j ptZTzV/MIb8DbNlOMmNOuXd9pCK1LNvH0GFu4aJM= Date: Mon, 30 Mar 2020 20:50:32 -0700 From: Jaegeuk Kim To: Sahitya Tummala Cc: Chao Yu , linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: Re: [f2fs-dev] [PATCH] f2fs: fix long latency due to discard during umount Message-ID: <20200331035032.GA79749@google.com> References: <29d4adc4-482d-3d92-1470-3405989ea231@huawei.com> <20200326133700.GR20234@codeaurora.org> <2b0d8d4c-a981-4edc-d8ca-fe199a63ea79@huawei.com> <20200327030542.GS20234@codeaurora.org> <20200330065335.GT20234@codeaurora.org> <9adc5c7e-7936-bac7-58b1-50631f8ac5eb@huawei.com> <5ec3b2e1-162c-e62d-1834-100c8ae39ff7@huawei.com> <20200330105122.GV20234@codeaurora.org> <20200331031027.GY20234@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200331031027.GY20234@codeaurora.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/31, Sahitya Tummala wrote: > > Sure Chao. Let us put this patch on hold for now then. > > Jaeguek, > > Please drop this patch from your tree for time being as it needs > further discussion. Yeah, I dropped it. Thanks, > > Thanks, > > On Tue, Mar 31, 2020 at 09:46:30AM +0800, Chao Yu wrote: > > Hi Sahitya, > > > > On 2020/3/30 18:51, Sahitya Tummala wrote: > > > Hi Chao, > > > > > > On Mon, Mar 30, 2020 at 06:16:40PM +0800, Chao Yu wrote: > > >> On 2020/3/30 16:38, Chao Yu wrote: > > >>> Hi Sahitya, > > >>> > > >>> Bad news, :( I guess we didn't catch the root cause, as after applying v3, > > >>> I still can reproduce this issue: > > >>> > > >>> generic/003 10s ... 30s > > >> > > >> I use zram as backend device of fstest, > > >> > > >> Call Trace: > > >> dump_stack+0x66/0x8b > > >> f2fs_submit_discard_endio+0x88/0xa0 [f2fs] > > >> generic_make_request_checks+0x70/0x5f0 > > >> generic_make_request+0x3e/0x2e0 > > >> submit_bio+0x72/0x140 > > >> __submit_discard_cmd.isra.50+0x4a8/0x710 [f2fs] > > >> __issue_discard_cmd+0x171/0x3a0 [f2fs] > > >> > > >> Does this mean zram uses single queue, so we may always fail to submit 'nowait' > > >> IO due to below condition: > > >> > > >> /* > > >> * Non-mq queues do not honor REQ_NOWAIT, so complete a bio > > >> * with BLK_STS_AGAIN status in order to catch -EAGAIN and > > >> * to give a chance to the caller to repeat request gracefully. > > >> */ > > >> if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_mq(q)) { > > >> status = BLK_STS_AGAIN; > > >> goto end_io; > > >> } > > >> > > > > > > Yes, I have also just figured out that as the reason. But most of the real block > > > devic drivers support MQ. Can we thus fix this case by checking for MQ status > > > before enabling REQ_NOWAIT as below? Please share your comments. > > > > > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > > > index cda7935..e7e2ffe 100644 > > > --- a/fs/f2fs/segment.c > > > +++ b/fs/f2fs/segment.c > > > @@ -1131,7 +1131,9 @@ static int __submit_discard_cmd(struct f2fs_sb_info *sbi, > > > > > > flag = dpolicy->sync ? REQ_SYNC : 0; > > > - flag |= dpolicy->type == DPOLICY_UMOUNT ? REQ_NOWAIT : 0; > > > + > > > + if (sbi->sb->s_bdev->bd_queue && queue_is_mq(sbi->sb->s_bdev->bd_queue)) > > > + flag |= dpolicy->type == DPOLICY_UMOUNT ? REQ_NOWAIT : 0; > > > > IMO, it's too tight to couple with block layer logic? however, I don't have > > any better idea about the solution. > > > > Anyway, I guess we can Cc to Jan and block mailing list for comments to see > > whether there is a better solution. > > > > Thoughts? > > > > Thanks, > > > > > > > > if (dc->state != D_PREP) > > > return 0; > > > > > > Thanks, > > > > > >> > > >> > > >>> > > >>> Thanks, > > >>> > > >>> On 2020/3/30 14:53, Sahitya Tummala wrote: > > >>>> Hi Chao, > > >>>> > > >>>> On Fri, Mar 27, 2020 at 08:35:42AM +0530, Sahitya Tummala wrote: > > >>>>> On Fri, Mar 27, 2020 at 09:51:43AM +0800, Chao Yu wrote: > > >>>>>> > > >>>>>> With this patch, most of xfstest cases cost 5 * n second longer than before. > > >>>>>> > > >>>>>> E.g. generic/003, during umount(), we looped into retrying one bio > > >>>>>> submission. > > >>>>>> > > >>>>>> [61279.829724] F2FS-fs (zram1): Found nat_bits in checkpoint > > >>>>>> [61279.885337] F2FS-fs (zram1): Mounted with checkpoint version = 5cf3cb8e > > >>>>>> [61281.912832] submit discard bio start [23555,1] > > >>>>>> [61281.912835] f2fs_submit_discard_endio [23555,1] err:-11 > > >>>>>> [61281.912836] submit discard bio end [23555,1] > > >>>>>> [61281.912836] move dc to retry list [23555,1] > > >>>>>> > > >>>>>> ... > > >>>>>> > > >>>>>> [61286.881212] submit discard bio start [23555,1] > > >>>>>> [61286.881217] f2fs_submit_discard_endio [23555,1] err:-11 > > >>>>>> [61286.881223] submit discard bio end [23555,1] > > >>>>>> [61286.881224] move dc to retry list [23555,1] > > >>>>>> [61286.905198] submit discard bio start [23555,1] > > >>>>>> [61286.905203] f2fs_submit_discard_endio [23555,1] err:-11 > > >>>>>> [61286.905205] submit discard bio end [23555,1] > > >>>>>> [61286.905206] move dc to retry list [23555,1] > > >>>>>> [61286.929157] F2FS-fs (zram1): Issue discard(23555, 23555, 1) failed, ret: -11 > > >>>>>> > > >>>>>> Could you take a look at this issue? > > >>>>> > > >>>>> Let me check and get back on this. > > >>>> > > >>>> I found the issue. The dc with multiple bios is getting requeued again and > > >>>> again in case if one of its bio gets -EAGAIN error. Even the successfully > > >>>> completed bios are getting requeued again resulting into long latency. > > >>>> I have fixed it by splitting the dc in such case so that we can requeue only > > >>>> the leftover bios into a new dc and retry that later within the 5 sec timeout. > > >>>> > > >>>> Please help to review v3 posted and if it looks good, I would like to request > > >>>> you to test the earlier regression scenario with it to check the result again? > > >>>> > > >>>> thanks, > > >>>> > > >>>>> > > >>>>> Thanks, > > >>>>> > > >>>>>> > > >>>>>> Thanks, > > >>>>>> > > >>>>>>> > > >>>>>>> Thanks, > > >>>>>>> > > >>>>>>>> Thanks, > > >>>>>>>> > > >>>>>>>>> + break; > > >>>>>>>>> + } > > >>>>>>>>> + } > > >>>>>>>>> > > >>>>>>>>> atomic_inc(&dcc->issued_discard); > > >>>>>>>>> > > >>>>>>>>> @@ -1463,6 +1477,40 @@ static unsigned int __issue_discard_cmd_orderly(struct f2fs_sb_info *sbi, > > >>>>>>>>> return issued; > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> +static bool __should_discard_retry(struct f2fs_sb_info *sbi, > > >>>>>>> s> > + struct discard_policy *dpolicy) > > >>>>>>>>> +{ > > >>>>>>>>> + struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info; > > >>>>>>>>> + struct discard_cmd *dc, *tmp; > > >>>>>>>>> + bool retry = false; > > >>>>>>>>> + unsigned long flags; > > >>>>>>>>> + > > >>>>>>>>> + if (dpolicy->type != DPOLICY_UMOUNT) > > >>>>>>>>> + f2fs_bug_on(sbi, 1); > > >>>>>>>>> + > > >>>>>>>>> + mutex_lock(&dcc->cmd_lock); > > >>>>>>>>> + list_for_each_entry_safe(dc, tmp, &(dcc->retry_list), list) { > > >>>>>>>>> + if (dpolicy->timeout != 0 && > > >>>>>>>>> + f2fs_time_over(sbi, dpolicy->timeout)) { > > >>>>>>>>> + retry = false; > > >>>>>>>>> + break; > > >>>>>>>>> + } > > >>>>>>>>> + > > >>>>>>>>> + spin_lock_irqsave(&dc->lock, flags); > > >>>>>>>>> + if (!dc->bio_ref) { > > >>>>>>>>> + dc->state = D_PREP; > > >>>>>>>>> + dc->error = 0; > > >>>>>>>>> + reinit_completion(&dc->wait); > > >>>>>>>>> + __relocate_discard_cmd(dcc, dc); > > >>>>>>>>> + retry = true; > > >>>>>>>>> + } > > >>>>>>>>> + spin_unlock_irqrestore(&dc->lock, flags); > > >>>>>>>>> + } > > >>>>>>>>> + mutex_unlock(&dcc->cmd_lock); > > >>>>>>>>> + > > >>>>>>>>> + return retry; > > >>>>>>>>> +} > > >>>>>>>>> + > > >>>>>>>>> static int __issue_discard_cmd(struct f2fs_sb_info *sbi, > > >>>>>>>>> struct discard_policy *dpolicy) > > >>>>>>>>> { > > >>>>>>>>> @@ -1470,12 +1518,13 @@ static int __issue_discard_cmd(struct f2fs_sb_info *sbi, > > >>>>>>>>> struct list_head *pend_list; > > >>>>>>>>> struct discard_cmd *dc, *tmp; > > >>>>>>>>> struct blk_plug plug; > > >>>>>>>>> - int i, issued = 0; > > >>>>>>>>> + int i, err, issued = 0; > > >>>>>>>>> bool io_interrupted = false; > > >>>>>>>>> > > >>>>>>>>> if (dpolicy->timeout != 0) > > >>>>>>>>> f2fs_update_time(sbi, dpolicy->timeout); > > >>>>>>>>> > > >>>>>>>>> +retry: > > >>>>>>>>> for (i = MAX_PLIST_NUM - 1; i >= 0; i--) { > > >>>>>>>>> if (dpolicy->timeout != 0 && > > >>>>>>>>> f2fs_time_over(sbi, dpolicy->timeout)) > > >>>>>>>>> @@ -1509,7 +1558,10 @@ static int __issue_discard_cmd(struct f2fs_sb_info *sbi, > > >>>>>>>>> break; > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> - __submit_discard_cmd(sbi, dpolicy, dc, &issued); > > >>>>>>>>> + err = __submit_discard_cmd(sbi, dpolicy, dc, &issued); > > >>>>>>>>> + if (err == -EAGAIN) > > >>>>>>>>> + congestion_wait(BLK_RW_ASYNC, > > >>>>>>>>> + DEFAULT_IO_TIMEOUT); > > >>>>>>>>> > > >>>>>>>>> if (issued >= dpolicy->max_requests) > > >>>>>>>>> break; > > >>>>>>>>> @@ -1522,6 +1574,10 @@ static int __issue_discard_cmd(struct f2fs_sb_info *sbi, > > >>>>>>>>> break; > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> + if (!list_empty(&dcc->retry_list) && > > >>>>>>>>> + __should_discard_retry(sbi, dpolicy)) > > >>>>>>>>> + goto retry; > > >>>>>>>>> + > > >>>>>>>>> if (!issued && io_interrupted) > > >>>>>>>>> issued = -1; > > >>>>>>>>> > > >>>>>>>>> @@ -1613,6 +1669,12 @@ static unsigned int __wait_discard_cmd_range(struct f2fs_sb_info *sbi, > > >>>>>>>>> goto next; > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> + if (dpolicy->type == DPOLICY_UMOUNT && > > >>>>>>>>> + !list_empty(&dcc->retry_list)) { > > >>>>>>>>> + wait_list = &dcc->retry_list; > > >>>>>>>>> + goto next; > > >>>>>>>>> + } > > >>>>>>>>> + > > >>>>>>>>> return trimmed; > > >>>>>>>>> } > > >>>>>>>>> > > >>>>>>>>> @@ -2051,6 +2113,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi) > > >>>>>>>>> for (i = 0; i < MAX_PLIST_NUM; i++) > > >>>>>>>>> INIT_LIST_HEAD(&dcc->pend_list[i]); > > >>>>>>>>> INIT_LIST_HEAD(&dcc->wait_list); > > >>>>>>>>> + INIT_LIST_HEAD(&dcc->retry_list); > > >>>>>>>>> INIT_LIST_HEAD(&dcc->fstrim_list); > > >>>>>>>>> mutex_init(&dcc->cmd_lock); > > >>>>>>>>> atomic_set(&dcc->issued_discard, 0); > > >>>>>>>>> > > >>>>>>> > > >>>>> > > >>>>> -- > > >>>>> -- > > >>>>> Sent by a consultant of the Qualcomm Innovation Center, Inc. > > >>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. > > >>>> > > >>> > > >>> > > >>> _______________________________________________ > > >>> Linux-f2fs-devel mailing list > > >>> Linux-f2fs-devel@lists.sourceforge.net > > >>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel > > >>> . > > >>> > > > > > -- > -- > Sent by a consultant of the Qualcomm Innovation Center, Inc. > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.