From: Wen Yang <wenyang@linux.alibaba.com>
To: tytso@mit.edu, Andreas Dilger
Cc: Wen Yang, Ritesh Harjani, Baoyou Xie, linux-ext4@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH] fs/ext4: prevent the CPU from being 100% occupied in ext4_mb_discard_group_preallocations
Date: Sun, 18 Apr 2021 18:28:34 +0800
Message-Id: <20210418102834.29589-1-wenyang@linux.alibaba.com>

The kworker has occupied 100% of the CPU for several days:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU
%MEM     TIME+ COMMAND
68086 root      20   0       0      0      0 R 100.0  0.0   9718:18 kworker/u64:11

And the stack obtained through sysrq is as follows:

[20613144.850426] task: ffff8800b5e08000 task.stack: ffffc9001342c000
[20613144.850427] RIP: 0010:[] [] ext4_mb_discard_group_preallocations+0x1b3/0x480 [ext4]
...
[20613144.850435] Stack:
[20613144.850435]  ffff881942d6a6e8 ffff8813bb5f72d0 00000001a02427cf 0000000000000140
[20613144.850436]  ffff880f80618000 0000000000000000 ffffc9001342f770 ffffc9001342f770
[20613144.850437]  ffffea0056360dc0 ffff88158d837000 ffffea0045155f80 ffff88114557e000
[20613144.850438] Call Trace:
[20613144.850439]  [] ext4_mb_new_blocks+0x429/0x550 [ext4]
[20613144.850439]  [] ext4_ext_map_blocks+0xb5e/0xf30 [ext4]
[20613144.850440]  [] ? numa_zonelist_order_handler+0xa1/0x1c0
[20613144.850441]  [] ext4_map_blocks+0x172/0x620 [ext4]
[20613144.850441]  [] ? ext4_writepages+0x4cd/0xf00 [ext4]
[20613144.850442]  [] ext4_writepages+0x7e5/0xf00 [ext4]
[20613144.850442]  [] ? wb_position_ratio+0x1f0/0x1f0
[20613144.850443]  [] do_writepages+0x1e/0x30
[20613144.850444]  [] __writeback_single_inode+0x45/0x320
[20613144.850444]  [] writeback_sb_inodes+0x272/0x600
[20613144.850445]  [] __writeback_inodes_wb+0x92/0xc0
[20613144.850445]  [] wb_writeback+0x268/0x300
[20613144.850446]  [] wb_workfn+0xb4/0x380
[20613144.850447]  [] process_one_work+0x189/0x420
[20613144.850447]  [] worker_thread+0x4e/0x4b0
[20613144.850448]  [] ? process_one_work+0x420/0x420
[20613144.850448]  [] kthread+0xe6/0x100
[20613144.850449]  [] ? kthread_park+0x60/0x60
[20613144.850450]  [] ret_from_fork+0x39/0x50

The thread holding a reference to this pa has been waiting for its IO to return:

PID: 15140  TASK: ffff88004d6dc300  CPU: 16  COMMAND: "kworker/u64:1"
[ffffc900273e7518] __schedule at ffffffff8173ca3b
[ffffc900273e75a0] schedule at ffffffff8173cfb6
[ffffc900273e75b8] io_schedule at ffffffff810bb75a
[ffffc900273e75e0] bit_wait_io at ffffffff8173d8d1
[ffffc900273e75f8] __wait_on_bit_lock at ffffffff8173d4e9
[ffffc900273e7638] out_of_line_wait_on_bit_lock at ffffffff8173d742
[ffffc900273e76b0] __lock_buffer at ffffffff81288c32
[ffffc900273e76c8] do_get_write_access at ffffffffa00dd177 [jbd2]
[ffffc900273e7728] jbd2_journal_get_write_access at ffffffffa00dd3a3 [jbd2]
[ffffc900273e7750] __ext4_journal_get_write_access at ffffffffa023b37b [ext4]
[ffffc900273e7788] ext4_mb_mark_diskspace_used at ffffffffa0242a0b [ext4]
[ffffc900273e77f0] ext4_mb_new_blocks at ffffffffa0244100 [ext4]
[ffffc900273e7860] ext4_ext_map_blocks at ffffffffa02389ae [ext4]
[ffffc900273e7950] ext4_map_blocks at ffffffffa0204b52 [ext4]
[ffffc900273e79d0] ext4_writepages at ffffffffa0208675 [ext4]
[ffffc900273e7b30] do_writepages at ffffffff811c487e
[ffffc900273e7b40] __writeback_single_inode at ffffffff81280265
[ffffc900273e7b90] writeback_sb_inodes at ffffffff81280ab2
[ffffc900273e7c90] __writeback_inodes_wb at ffffffff81280ed2
[ffffc900273e7cd8] wb_writeback at ffffffff81281238
[ffffc900273e7d80] wb_workfn at ffffffff812819f4
[ffffc900273e7e18] process_one_work at ffffffff810a5dc9
[ffffc900273e7e60] worker_thread at ffffffff810a60ae
[ffffc900273e7ec0] kthread at ffffffff810ac696
[ffffc900273e7f50] ret_from_fork at ffffffff81741dd9

Our bare-metal servers use multiple hard disks: the Linux kernel runs on the
system disk, while business workloads run on several disks virtualized by the
BM hypervisor. The IO above never returns because the process handling it in
the BM hypervisor has failed.
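To illustrate the failure mode, below is a minimal userspace sketch. It is an
analogy only, not the kernel code, and every name in it is invented for
illustration: sched_yield() stands in for cond_resched(), which gives up the
CPU only when another task is runnable, so on an otherwise idle CPU the retry
loop spins at 100% while the reference holder is blocked; nanosleep() stands
in for schedule_timeout_uninterruptible(HZ/100), which really sleeps ~10ms per
retry and lets the CPU go idle.

/* cond_resched_demo.c — hypothetical userspace analogy, not kernel code.
 * Thread A holds a "pa" reference while stuck in slow "IO"; thread B
 * keeps retrying the discard, as ext4_mb_discard_group_preallocations does.
 * Build: cc -pthread cond_resched_demo.c
 */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

static atomic_int pa_count = 1;   /* stands in for pa->pa_count */
static int use_sleep;             /* 0: old behaviour, 1: patched behaviour */

static void *io_waiter(void *arg) /* the thread blocked on IO */
{
	(void)arg;
	sleep(5);                     /* the "IO"; imagine it never returning */
	atomic_store(&pa_count, 0);   /* finally drops the pa reference */
	return NULL;
}

static void *discarder(void *arg) /* the retry loop: "repeat: ... goto repeat" */
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	(void)arg;
	while (atomic_load(&pa_count) > 0) {
		if (use_sleep)
			nanosleep(&ts, NULL); /* like schedule_timeout_uninterruptible(HZ/100) */
		else
			sched_yield();        /* like cond_resched(): returns at once on an idle CPU */
	}
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t a, b;

	use_sleep = argc > 1;         /* pass any argument for the patched mode */
	pthread_create(&a, NULL, io_waiter, NULL);
	pthread_create(&b, NULL, discarder, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Run without arguments, it pins one core at 100% for the five seconds the "IO"
takes; with an argument, the CPU stays essentially idle. In the kernel the
situation is worse, since the IO never completes at all.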
CPU resources on these cloud servers are precious, and a server that has been
running for a long time cannot simply be rebooted. This patch therefore makes
a small optimization so the CPU is no longer 100% occupied: instead of calling
cond_resched() and immediately retrying, the discard loop now sleeps for
HZ/100 jiffies (about 10ms) per retry via schedule_timeout_uninterruptible(),
letting the CPU go idle while the pa holder is still blocked. The busy flag is
also reset at the top of each retry, and the in-use pa is logged through
ext4_mb_show_pa() to aid debugging.

Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger
Cc: Ritesh Harjani
Cc: Baoyou Xie
Cc: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/ext4/mballoc.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a02fadf..c73f212 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -351,6 +351,8 @@ static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
 						ext4_group_t group);
 static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
 
+static inline void ext4_mb_show_pa(struct super_block *sb);
+
 /*
  * The algorithm using this percpu seq counter goes below:
  * 1. We sample the percpu discard_pa_seq counter before trying for block
@@ -4217,9 +4219,9 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac)
 	struct ext4_prealloc_space *pa, *tmp;
 	struct list_head list;
 	struct ext4_buddy e4b;
+	int free_total = 0;
+	int busy, free;
 	int err;
-	int busy = 0;
-	int free, free_total = 0;
 
 	mb_debug(sb, "discard preallocation for group %u\n", group);
 	if (list_empty(&grp->bb_prealloc_list))
@@ -4247,6 +4249,7 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac)
 
 	INIT_LIST_HEAD(&list);
 repeat:
+	busy = 0;
 	free = 0;
 	ext4_lock_group(sb, group);
 	list_for_each_entry_safe(pa, tmp,
@@ -4255,6 +4258,8 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac)
 		if (atomic_read(&pa->pa_count)) {
 			spin_unlock(&pa->pa_lock);
 			busy = 1;
+			mb_debug(sb, "used pa while discarding for group %u\n", group);
+			ext4_mb_show_pa(sb);
 			continue;
 		}
 		if (pa->pa_deleted) {
@@ -4299,8 +4304,7 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac)
 	/* if we still need more blocks and some PAs were used, try again */
 	if (free_total < needed && busy) {
 		ext4_unlock_group(sb, group);
-		cond_resched();
-		busy = 0;
+		schedule_timeout_uninterruptible(HZ/100);
 		goto repeat;
 	}
 	ext4_unlock_group(sb, group);
-- 
1.8.3.1