Subject: Re: [v2 PATCH 1/3] mm: hwpoison: don't drop slab caches for offlining non-LRU page
To: Yang Shi, naoya.horiguchi@nec.com, osalvador@suse.de, tdmackey@twitter.com, willy@infradead.org, akpm@linux-foundation.org, corbet@lwn.net
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20210819054116.266126-1-shy828301@gmail.com>
From: David Hildenbrand
Organization: Red Hat
Message-ID: <12bf652b-7042-25d7-4910-276a6e422111@redhat.com>
Date: Fri, 20 Aug 2021 09:08:39 +0200
In-Reply-To: <20210819054116.266126-1-shy828301@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 19.08.21 07:41, Yang Shi wrote:
> In the current implementation of soft offline, if a non-LRU page is
> encountered, all the slab caches are dropped in an attempt to free the
> page so that it can be offlined. But if the page is not a slab page, all
> that effort is wasted. Even if it is a slab page, there is no guarantee
> that the page can be freed at all.
>
> However, the side effects and cost are quite high. Doing so not only
> drops the slab caches, but may also drop a significant amount of page
> cache associated with the inode caches. Most of the working set can be
> lost just to offline a single page. And the offline is not guaranteed to
> succeed at all; I really doubt the success rate for real-life workloads.
>
> Worse still, the system may lock up and become unusable, since releasing
> the page cache may queue a huge amount of work for memcg release.
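
To make the flow described above easier to follow, here is a toy Python model. It is not kernel code: the Page fields and the offline_old() / offline_new() names are made up for illustration, the LRU path is simply assumed to succeed, and only the decision structure is taken from the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Page:
    """Hypothetical stand-in for the page properties the description mentions."""
    on_lru: bool
    is_slab: bool
    freeable: bool  # would dropping the slab caches actually free this page?


def offline_old(page: Page) -> Tuple[bool, bool]:
    """Behaviour as described: returns (offlined, slab_caches_dropped)."""
    if page.on_lru:
        # Simplification: the regular LRU path is assumed to succeed.
        return True, False
    # Non-LRU page: every slab cache is dropped, whatever the page type is.
    caches_dropped = True
    # The drop only helps if the page was a slab page AND could be freed.
    became_free = page.is_slab and page.freeable
    return became_free, caches_dropped


def offline_new(page: Page) -> Tuple[bool, bool]:
    """Proposed behaviour in this toy model: no slab drop for non-LRU pages."""
    if page.on_lru:
        return True, False
    return False, False
```

For a non-LRU, non-slab page, offline_old() returns (False, True): the caches are gone and the page is still not offlined. offline_new() returns (False, False) for the same page, so the failure costs nothing. The price is that a freeable slab page is no longer offlined in this model.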
>
> We actually ran into such an unpleasant case in our production
> environment. First, the workqueue running memory_failure_work_func
> locked up as shown below:
>
> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 53s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=14/256 refcnt=15
>     in-flight: 409271:memory_failure_work_func
>     pending: kfree_rcu_work, kfree_rcu_monitor, kfree_rcu_work, rht_deferred_worker, rht_deferred_worker, rht_deferred_worker, rht_deferred_worker, kfree_rcu_work, kfree_rcu_work, kfree_rcu_work, kfree_rcu_work, drain_local_stock, kfree_rcu_work
> workqueue mm_percpu_wq: flags=0x8
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
>     pending: vmstat_update
> workqueue cgroup_destroy: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1 refcnt=12072
>     pending: css_release_work_fn
>
> There were over 12K css_release_work_fn items queued, which caused a few
> lockups due to contention on the worker pool lock with IRQs disabled,
> for example:
>
> NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> Modules linked in: amd64_edac_mod edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xt_DSCP iptable_mangle kvm_amd bpfilter vfat fat acpi_ipmi i2c_piix4 usb_storage ipmi_si k10temp i2c_core ipmi_devintf ipmi_msghandler acpi_cpufreq sch_fq_codel xfs libcrc32c crc32c_intel mlx5_core mlxfw nvme xhci_pci ptp nvme_core pps_core xhci_hcd
> CPU: 1 PID: 205500 Comm: kworker/1:0 Tainted: G L 5.10.32-t1.el7.twitter.x86_64 #1
> Hardware name: TYAN F5AMT /z /S8026GM2NRE-CGN, BIOS V8.030 03/30/2021
> Workqueue: events memory_failure_work_func
> RIP: 0010:queued_spin_lock_slowpath+0x41/0x1a0
> Code: 41 f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 1b 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 f6 c4 01 75 04 c6 47
> RSP: 0018:ffff9b2ac278f900 EFLAGS: 00000002
> RAX: 0000000000480101 RBX: ffff8ce98ce71800 RCX: 0000000000000084
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8ce98ce6a140
> RBP: 00000000000284c8 R08: ffffd7248dcb6808 R09: 0000000000000000
> R10: 0000000000000003 R11: ffff9b2ac278f9b0 R12: 0000000000000001
> R13: ffff8cb44dab9c00 R14: ffffffffbd1ce6a0 R15: ffff8cacaa37f068
> FS: 0000000000000000(0000) GS:ffff8ce98ce40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fcf6e8cb000 CR3: 0000000a0c60a000 CR4: 0000000000350ee0
> Call Trace:
>  __queue_work+0xd6/0x3c0
>  queue_work_on+0x1c/0x30
>  uncharge_batch+0x10e/0x110
>  mem_cgroup_uncharge_list+0x6d/0x80
>  release_pages+0x37f/0x3f0
>  __pagevec_release+0x1c/0x50
>  __invalidate_mapping_pages+0x348/0x380
>  ? xfs_alloc_buftarg+0xa4/0x120 [xfs]
>  inode_lru_isolate+0x10a/0x160
>  ? iput+0x1d0/0x1d0
>  __list_lru_walk_one+0x7b/0x170
>  ? iput+0x1d0/0x1d0
>  list_lru_walk_one+0x4a/0x60
>  prune_icache_sb+0x37/0x50
>  super_cache_scan+0x123/0x1a0
>  do_shrink_slab+0x10c/0x2c0
>  shrink_slab+0x1f1/0x290
>  drop_slab_node+0x4d/0x70
>  soft_offline_page+0x1ac/0x5b0
>  ? dev_mce_log+0xee/0x110
>  ? notifier_call_chain+0x39/0x90
>  memory_failure_work_func+0x6a/0x90
>  process_one_work+0x19e/0x340
>  ? process_one_work+0x340/0x340
>  worker_thread+0x30/0x360
>  ? process_one_work+0x340/0x340
>  kthread+0x116/0x130
>
> The lockup made the machine quite unusable. It also wiped out most of
> the working set: the reclaimable slab caches were reduced from 12G to
> 300MB, and the page cache decreased from 17G to 4G.
>
> But the most disappointing thing is that all this effort didn't even
> offline the page; it just returned:
>
> soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
>
> It seems the aggressive behavior for non-LRU pages doesn't pay off, so
> it doesn't make much sense to keep it considering the terrible side
> effects.
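
For anyone triaging similar reports, below is a small Python sketch that pulls the backlog out of a workqueue dump like the one quoted above and puts the cache loss into percentages. The helper names, the refcnt threshold of 1000, and the 1G = 1024 MB conversion are my own assumptions. Like the report, it treats refcnt as a rough proxy for the amount of queued work.

```python
import re

# Lines copied from the quoted dump (in-flight/pending lines left out).
SAMPLE_DUMP = """\
> workqueue events: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=14/256 refcnt=15
> workqueue mm_percpu_wq: flags=0x8
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> workqueue cgroup_destroy: flags=0x0
>   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1 refcnt=12072
"""

WQ_RE = re.compile(r"workqueue\s+(\S+):")
PWQ_RE = re.compile(r"active=(\d+)/(\d+)\s+refcnt=(\d+)")


def backlogged_workqueues(dump, refcnt_threshold=1000):
    """Return {workqueue name: refcnt} for pwqs above the (assumed) threshold."""
    result = {}
    current = None
    for raw in dump.splitlines():
        line = raw.lstrip("> \t")      # drop the mail quote prefix and indent
        m = WQ_RE.match(line)
        if m:
            current = m.group(1)       # remember which workqueue we are in
            continue
        m = PWQ_RE.search(line)
        if m and current is not None:
            refcnt = int(m.group(3))
            if refcnt > refcnt_threshold:
                result[current] = max(refcnt, result.get(current, 0))
    return result


def reduction_percent(before, after):
    """Percentage lost going from `before` to `after` (same unit for both)."""
    return 100.0 * (before - after) / before


print(backlogged_workqueues(SAMPLE_DUMP))            # {'cgroup_destroy': 12072}
print(round(reduction_percent(12 * 1024, 300), 1))   # slab: 97.6
print(round(reduction_percent(17, 4), 1))            # page cache: 76.5
```

So roughly 98% of the reclaimable slab and about three quarters of the page cache were thrown away for an offline attempt that failed anyway.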
>
> Reported-by: David Mackey
> Cc: Naoya Horiguchi
> Cc: Oscar Salvador
> Cc: David Hildenbrand
> Signed-off-by: Yang Shi
> ---

Acked-by: David Hildenbrand

-- 
Thanks,

David / dhildenb