Date: Thu, 15 Sep 2022
    08:28:26 +0100
From: Matthew Wilcox
To: Hongchen Zhang
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/vmscan: don't scan adjust too much if current is not kswapd
References: <20220914023318.549118-1-zhanghongchen@loongson.cn> <20220914155142.bf388515a39fb45bae987231@linux-foundation.org> <6bcb4883-03d0-88eb-4c42-84fff0a9a141@loongson.cn>
In-Reply-To: <6bcb4883-03d0-88eb-4c42-84fff0a9a141@loongson.cn>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 15, 2022 at 09:19:48AM +0800, Hongchen Zhang wrote:
> [ 3748.453561] INFO: task float_bessel:77920 blocked for more than 120
> seconds.
> [ 3748.460839] Not tainted 5.15.0-46-generic #49-Ubuntu
> [ 3748.466490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 3748.474618] task:float_bessel state:D stack: 0 pid:77920 ppid:
> 77327 flags:0x00004002
> [ 3748.483358] Call Trace:
> [ 3748.485964]
> [ 3748.488150] __schedule+0x23d/0x590
> [ 3748.491804] schedule+0x4e/0xc0
> [ 3748.495038] rwsem_down_read_slowpath+0x336/0x390
> [ 3748.499886] ? copy_user_enhanced_fast_string+0xe/0x40
> [ 3748.505181] down_read+0x43/0xa0
> [ 3748.508518] do_user_addr_fault+0x41c/0x670
> [ 3748.512799] exc_page_fault+0x77/0x170
> [ 3748.516673] asm_exc_page_fault+0x26/0x30
> [ 3748.520824] RIP: 0010:copy_user_enhanced_fast_string+0xe/0x40
> [ 3748.526764] Code: 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 31 c0 0f
> 01 ca c3 cc cc cc cc 0f 1f 00 0f 01 cb 83 fa 40 0f 82 70 ff ff ff 89 d1
> a4 31 c0 0f 01 ca c3 cc cc cc cc 66 08
> [ 3748.546120] RSP: 0018:ffffaa9248fffb90 EFLAGS: 00050206
> [ 3748.551495] RAX: 00007f99faa1a010 RBX: ffffaa9248fffd88 RCX:
> 0000000000000010
> [ 3748.558828] RDX: 0000000000001000 RSI: ffff9db397ab8ff0 RDI:
> 00007f99faa1a000
> [ 3748.566160] RBP: ffffaa9248fffbf0 R08: ffffcc2fc2965d80 R09:
> 0000000000000014
> [ 3748.573492] R10: 0000000000000000 R11: 0000000000000014 R12:
> 0000000000001000
> [ 3748.580858] R13: 0000000000001000 R14: 0000000000000000 R15:
> ffffaa9248fffd98
> [ 3748.588196] ? copy_page_to_iter+0x10e/0x400
> [ 3748.592614] filemap_read+0x174/0x3e0

Interesting; it wasn't the process itself which triggered the page fault;
the process called read() and the kernel took the page fault to satisfy
the read() call.

> [ 3748.596354] ? ima_file_check+0x6a/0xa0
> [ 3748.600301] generic_file_read_iter+0xe5/0x150
> [ 3748.604884] ext4_file_read_iter+0x5b/0x190
> [ 3748.609164] ? aa_file_perm+0x102/0x250
> [ 3748.613125] new_sync_read+0x10d/0x1a0
> [ 3748.617009] vfs_read+0x103/0x1a0
> [ 3748.620423] ksys_read+0x67/0xf0
> [ 3748.623743] __x64_sys_read+0x19/0x20
> [ 3748.627511] do_syscall_64+0x59/0xc0
> [ 3748.631203] ? syscall_exit_to_user_mode+0x27/0x50
> [ 3748.636144] ? do_syscall_64+0x69/0xc0
> [ 3748.639992] ? exit_to_user_mode_prepare+0x96/0xb0
> [ 3748.644931] ? irqentry_exit_to_user_mode+0x9/0x20
> [ 3748.649872] ? irqentry_exit+0x1d/0x30
> [ 3748.653737] ? exc_page_fault+0x89/0x170
> [ 3748.657795] entry_SYSCALL_64_after_hwframe+0x61/0xcb
> [ 3748.663030] RIP: 0033:0x7f9a852989cc
> [ 3748.666713] RSP: 002b:00007f9a8497dc90 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000000
> [ 3748.674487] RAX: ffffffffffffffda RBX: 00007f9a8497f5c0 RCX:
> 00007f9a852989cc
> [ 3748.681817] RDX: 0000000000027100 RSI: 00007f99faa18010 RDI:
> 0000000000000061
> [ 3748.689150] RBP: 00007f9a8497dd60 R08: 0000000000000000 R09:
> 00007f99faa18010
> [ 3748.696493] R10: 0000000000000000 R11: 0000000000000246 R12:
> 00007f99faa18010
> [ 3748.703841] R13: 00005605e11c406f R14: 0000000000000001 R15:
> 0000000000027100

ORIG_RAX is 0, which matches sys_read.
RDI is file descriptor 0x61
RSI is plausibly a userspace pointer, 0x7f99faa18010
RDX is the length, 0x27100 or 160kB.

That all seems reasonable.

What I really want to know is who is _holding_ the lock.  We stash
a pointer to the task_struct in 'owner', so we could clearly find this
out in the 'blocked for too long' report, and print their stack trace.

You must have done something like this already in order to deduce that
it was the direct reclaim path that was the problem?
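
The register reading above can be cross-checked with a few lines of Python. This is only a sketch and not part of the original exchange: the helper name decode_read_regs is made up for illustration, and it assumes the x86-64 Linux syscall convention (syscall number in ORIG_RAX, first three arguments in RDI, RSI, RDX, with syscall 0 being read) plus a 47-bit user address space. The register values are copied from the user-mode dump in the quoted trace.

```python
# Register values copied from the user-mode register dump in the quoted trace.
regs = {
    "ORIG_RAX": 0x0000000000000000,
    "RDI": 0x0000000000000061,
    "RSI": 0x00007F99FAA18010,
    "RDX": 0x0000000000027100,
}

# Assumption: x86-64 Linux, where syscall number 0 is read(fd, buf, count).
SYS_READ = 0
# Assumption: 4-level paging, so user pointers lie below 2**47.
USER_ADDR_LIMIT = 1 << 47


def decode_read_regs(regs):
    """Interpret a register dump as the arguments of read() (hypothetical helper)."""
    if regs["ORIG_RAX"] != SYS_READ:
        raise ValueError("ORIG_RAX does not match read()")
    return {
        "fd": regs["RDI"],
        "buf": regs["RSI"],
        "count": regs["RDX"],
        "buf_is_user": regs["RSI"] < USER_ADDR_LIMIT,
    }


args = decode_read_regs(regs)
print(f"read(fd={args['fd']}, buf={args['buf']:#x}, count={args['count']})")
print(f"count is {args['count'] / 1000:.0f} kB; buf in user range: {args['buf_is_user']}")
```

With the values from the trace, the descriptor comes out as 97 (0x61) and the count as 160000 bytes, which agrees with the "160kB" figure above.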