Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp1872152pxk; Sun, 13 Sep 2020 20:22:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxngOZevpWuWAFIMVrbqtaMahIyMau2wDbrrtk6mkt1bpeEp6AjDWbCGwk9sIvlWvH2J3XK X-Received: by 2002:a17:907:72c2:: with SMTP id du2mr13055311ejc.512.1600053749301; Sun, 13 Sep 2020 20:22:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600053749; cv=none; d=google.com; s=arc-20160816; b=apoX16Z0bB/6jXD6flIoQY/ntLKyVzj0jwg3wK0fLowdPGT/vgVZekYxDQAifwgrmh y1luKFYNsCOW1Mf+GUKKHVZo5ss0Oh10gioDa9HfTlC4LlpCrXYVjnnwGmRVcZjazljM AlNpzzGpdAJwXTv/jfeIhEmmoN2ayrUrk71bgRtN6y7ivGfVc6C+/MjPjAEmcWu0zoc6 UvH6B4OaNAaNyXSF6c+RUkswGDADgAmVqAe8V7ZXH+5e0Fjj7GxNggT4WtHpTdAhFUqe 1gsZTXqQTxHVlpBNXgpeQ0Mr5vizJ6F/2kdVInAz8ATnT1cmEKQ2R/N1/c2IIbXzi0Cb YkSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:cc:references:to :subject; bh=ATmQZEQDqDDD+sC54lHEFqlYuT94MQF2Oyn29PCSh3E=; b=dvTYyAovswuG5Y4/UaTagRg/JI1E9RNuJ9ksVmnZBBp2rEKMhJAh0e7ct1Fde2Cszl O7I6k9H6oT8zKzl1CLoiJZVKif+7Vqlu4JdIn6BbkmUkgMVXjaiBAQP1+P+ASnkcf7K8 RfP0qMoJGl3XPhzbxH5aIjt801sA+/+SGGFSm0ldwokYv3mZioJe2EzWWevpz0XQpM2q SD0JyEtX7FFfng/dW8vTzKUiLj3TfP00QmYAFtzcEXNMwdL9rQe3umL/5fDBCwIQVBZC vUmN+krg/ixV8cYqztnG/WxPe6FhyTVUtiaTtEVI7XhwfZ5WqAATLM631meAPlmFWcRe j2Jw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id w7si5987816ejf.3.2020.09.13.20.21.55; Sun, 13 Sep 2020 20:22:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725981AbgINDVw (ORCPT + 99 others); Sun, 13 Sep 2020 23:21:52 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:11824 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725965AbgINDVv (ORCPT ); Sun, 13 Sep 2020 23:21:51 -0400 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id 18639A43D1BD9D69EB85; Mon, 14 Sep 2020 11:21:48 +0800 (CST) Received: from [10.174.179.86] (10.174.179.86) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.487.0; Mon, 14 Sep 2020 11:21:46 +0800 Subject: Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks To: Ritesh Harjani , Jan Kara References: <20200910091252.525346-1-yebin10@huawei.com> <20200910161753.GB17362@quack2.suse.cz> <0e48b9bb-f839-4b3b-dbce-45755618df97@linux.ibm.com> CC: , , , From: yebin Message-ID: <5F5EE1CA.9090306@huawei.com> Date: Mon, 14 Sep 2020 11:21:46 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <0e48b9bb-f839-4b3b-dbce-45755618df97@linux.ibm.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.174.179.86] X-CFilter-Loop: Reflected Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org In fact, we didn't free the available space, and other processes couldn't get it even if they tried again. According to your opinions, I made two different revisions. Which one do you think is better? (1)Free PAs before repeat diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 132c118d12e1..4ab76882350d 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4189,7 +4189,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, INIT_LIST_HEAD(&list); repeat: ext4_lock_group(sb, group); - this_cpu_inc(discard_pa_seq); list_for_each_entry_safe(pa, tmp, &grp->bb_prealloc_list, pa_group_list) { spin_lock(&pa->pa_lock); @@ -4215,22 +4214,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, list_add(&pa->u.pa_tmp_list, &list); } - /* if we still need more blocks and some PAs were used, try again */ - if (free < needed && busy) { - busy = 0; - ext4_unlock_group(sb, group); - cond_resched(); - goto repeat; - } - - /* found anything to free? */ - if (list_empty(&list)) { - BUG_ON(free != 0); - mb_debug(sb, "Someone else may have freed PA for this group %u\n", - group); - goto out; - } - /* now free all selected PAs */ list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) { @@ -4248,6 +4231,14 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback); } + /* if we still need more blocks and some PAs were used, try again */ + if (free < needed && busy) { + busy = 0; + ext4_unlock_group(sb, group); + cond_resched(); + goto repeat; + } + out: ext4_unlock_group(sb, group); ext4_mb_unload_buddy(&e4b); -- (2) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 132c118d12e1..188772bbf679 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4189,7 +4189,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, INIT_LIST_HEAD(&list); repeat: ext4_lock_group(sb, group); - this_cpu_inc(discard_pa_seq); list_for_each_entry_safe(pa, tmp, &grp->bb_prealloc_list, pa_group_list) { spin_lock(&pa->pa_lock); @@ -4217,6 +4216,8 @@ ext4_mb_discard_group_preallocations(struct super_block *sb, /* if we still need more blocks and some PAs were used, try again */ if (free < needed && busy) { + if (free) + this_cpu_inc(discard_pa_seq); busy = 0; ext4_unlock_group(sb, group); cond_resched(); On 2020/9/11 21:20, Ritesh Harjani wrote: > Hello Ye, > > Please excuse if there is something horribly wrong with my email > formatting. Have yesterday received this laptop and still setting up > few things. > > On 9/10/20 9:47 PM, Jan Kara wrote: >> On Thu 10-09-20 17:12:52, Ye Bin wrote: >>> As we test disk offline/online with running fsstress, we find fsstress >>> process is keeping running state. >>> kworker/u32:3-262 [004] ...1 140.787471: >>> ext4_mb_discard_preallocations: dev 8,32 needed 114 >>> .... >>> kworker/u32:3-262 [004] ...1 140.787471: >>> ext4_mb_discard_preallocations: dev 8,32 needed 114 >>> >>> ext4_mb_new_blocks >>> repeat: >>> ext4_mb_discard_preallocations_should_retry(sb, ac, &seq) >>> freed = ext4_mb_discard_preallocations >>> ext4_mb_discard_group_preallocations >>> this_cpu_inc(discard_pa_seq); >>> ---> freed == 0 >>> seq_retry = ext4_get_discard_pa_seq_sum >>> for_each_possible_cpu(__cpu) >>> __seq += per_cpu(discard_pa_seq, __cpu); >>> if (seq_retry != *seq) { >>> *seq = seq_retry; >>> ret = true; >>> } >>> >>> As we see seq_retry is sum of discard_pa_seq every cpu, if >>> ext4_mb_discard_group_preallocations return zero discard_pa_seq in this >>> cpu maybe increase one, so condition "seq_retry != *seq" have always >>> been met. >>> To Fix this problem, in ext4_mb_discard_group_preallocations >>> function increase >>> discard_pa_seq only when it found preallocation to discard. >>> >>> Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for >>> freeing PA to improve ENOSPC handling") >>> Signed-off-by: Ye Bin >> >> Thanks for the patch. One comment below. >> >>> --- >>> fs/ext4/mballoc.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c >>> index f386fe62727d..fd55264dc3fe 100644 >>> --- a/fs/ext4/mballoc.c >>> +++ b/fs/ext4/mballoc.c >>> @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct >>> super_block *sb, >>> INIT_LIST_HEAD(&list); >>> repeat: >>> ext4_lock_group(sb, group); >>> - this_cpu_inc(discard_pa_seq); >>> list_for_each_entry_safe(pa, tmp, >>> &grp->bb_prealloc_list, pa_group_list) { >>> spin_lock(&pa->pa_lock); >>> @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct >>> super_block *sb, >>> goto out; >>> } >>> + /* only increase when find reallocation to discard */ >>> + this_cpu_inc(discard_pa_seq); >>> + >> >> This is a good place to increment the counter but I think you also >> need to >> handle the case: >> >> if (free < needed && busy) { >> busy = 0; >> ext4_unlock_group(sb, group); >> cond_resched(); >> goto repeat; >> } >> >> We can unlock the group here after removing some preallocations and thus >> other processes checking discard_pa_seq could miss we did this. In >> fact I >> think the code is somewhat buggy here and we should also discard extents >> accumulated on "list" so far before unlocking the group. Ritesh? >> > > mmm, so even though this code is not discarding those buffers b4 > unlocking, but it has still removed those from the grp->bb_prealloc_list > and added it to the local list. And since it will at some point get > scheduled and start operating from repeat: label so functionally wise > this should be ok. Am I missing anything? > > Although I agree, that if we remove at least the current pa's before > unlocking the group may be a good idea, but we should also check > why was this done like this at the first place. > > > I agree with Jan, that we should increment discard_pa_seq once we > actually have something > to discard. I should have written a comment here to explain why we did > this here. > But I think commit msg should have all the history (since I have a > habit of writing long commit msgs ;) > > But IIRC, it was done since in case if there is a parallel thread > which is discarding > all the preallocations so the current thread may return 0 since it > checks the > list_empty(&grp->bb_prealloc_list) check in > ext4_mb_discard_group_preallocations() and returns 0 directly. > > And why the discard_pa_seq counter at other places may not help since > we remove the pa nodes from > grp->bb_prealloc_list into a local list and then start operating on > that. So meanwhile some thread may comes and just checks that the list > is empty and return 0 while some other thread may start discarding from > it's local list. > So I guess the main problem was that in the current code we remove > the pa from grp->bb_prealloc_list and add it to local list. So if > someone else comes > and checks that grp->bb_prealloc_list is empty then it will directly > return 0. > > So, maybe we could do something like this then? > > repeat: > ext4_lock_group(sb, group); > - this_cpu_inc(discard_pa_seq); > list_for_each_entry_safe(pa, tmp, > &grp->bb_prealloc_list, pa_group_list) {<...> > > + if (!free) > + this_cpu_inc(discard_pa_seq); // we should do this here > before calling list_del(&pa->pa_group_list); > > /* we can trust pa_free ... */ > free += pa->pa_free; > spin_unlock(&pa->pa_lock); > > list_del(&pa->pa_group_list); > list_add(&pa->u.pa_tmp_list, &list); > } > > I have some test cases around this to test for cases which were > failing. Since in world of parallelism you can't be 100% certain of some > corner case (like this one you just reported). > But, I don't have my other box rite now where I kept all of those - > due to some technical issues. I think I should be able to get those by > next week, if not, I anyways will setup my current machine for testing > this. > > -ritesh > . >