> On Mar 5, 2020, at 9:50 PM, [email protected] wrote:
>
>
> The patch titled
> Subject: mm/vmscan: remove unnecessary lruvec adding
> has been removed from the -mm tree. Its filename was
> mm-vmscan-remove-unnecessary-lruvec-adding.patch
>
> This patch was dropped because it had testing failures
Andrew, do you have more information about this failure? I hit a bug
here under memory pressure and am wondering if this is related
which might save me some time digging…
[ 4389.727184][ T6600] mem_cgroup_update_lru_size(00000000bb31aaed, 0, -7): lru_size -1
[ 4389.735272][ T6600] WARNING: CPU: 9 PID: 6600 at mm/memcontrol.c:1287 mem_cgroup_update_lru_size+0x17d/0x1b0
[ 4389.745210][ T6600] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_amd kvm ses enclosure irqbypass dax_pmem dax_pmem_core efivars acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
[ 4389.771620][ T6600] CPU: 9 PID: 6600 Comm: oom01 Tainted: G L 5.6.0-rc4-next-20200305+ #4
[ 4389.781209][ T6600] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[ 4389.790577][ T6600] RIP: 0010:mem_cgroup_update_lru_size+0x17d/0x1b0
[ 4389.797108][ T6600] Code: d9 c7 e5 ff 49 89 d9 45 89 e0 44 89 f1 4c 89 ea 48 c7 c6 a0 86 81 83 48 c7 c7 9e 07 9e 83 c6 05 90 53 18 01 01 e8 25 a5 c8 ff <0f> 0b eb bc 48 89 de 48 c7 c7 80 e7 ce 83 e8 10 14 23 00 e9 e1 fe
[ 4389.816750][ T6600] RSP: 0018:ffffbf7b0adc3598 EFLAGS: 00010082
[ 4389.822793][ T6600] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 0000000000000000
[ 4389.830737][ T6600] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffbf7b0adc341c
[ 4389.838685][ T6600] RBP: ffffbf7b0adc35d8 R08: 0000000000000000 R09: 0000bf7b0adc341c
[ 4389.846631][ T6600] R10: 0000bf7b0adc33a8 R11: 0000bf7b0adc341f R12: 00000000fffffff9
[ 4389.854556][ T6600] R13: ffff978a77534400 R14: 0000000000000000 R15: 0000000000000000
[ 4389.862525][ T6600] FS: 00007f64a8f3b700(0000) GS:ffff979272880000(0000) knlGS:0000000000000000
[ 4389.871498][ T6600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4389.878065][ T6600] CR2: 00007f632d210000 CR3: 000000067ee08000 CR4: 00000000003406e0
[ 4389.885986][ T6600] Call Trace:
[ 4389.889259][ T6600] isolate_lru_pages+0x6c5/0xfd0
[ 4389.894227][ T6600] ? __const_udelay+0x3c/0x40
[ 4389.898935][ T6600] shrink_inactive_list+0x18a/0x860
[ 4389.904182][ T6600] shrink_lruvec+0x5d9/0xb70
[ 4389.908736][ T6600] ? find_held_lock+0x35/0xa0
[ 4389.913382][ T6600] ? percpu_ref_put_many+0xdd/0x1c0
[ 4389.918579][ T6600] shrink_node+0x2d6/0xca0
[ 4389.923032][ T6600] do_try_to_free_pages+0x1f7/0x9a0
[ 4389.928226][ T6600] try_to_free_pages+0x252/0x5b0
[ 4389.933112][ T6600] __alloc_pages_slowpath+0x458/0x1290
[ 4389.938548][ T6600] __alloc_pages_nodemask+0x3bb/0x450
[ 4389.943889][ T6600] alloc_pages_vma+0x8a/0x2c0
[ 4389.948631][ T6600] do_anonymous_page+0x16e/0x6f0
[ 4389.953523][ T6600] ? __lock_acquire+0x443/0x37c0
[ 4389.958426][ T6600] __handle_mm_fault+0xce1/0xd50
[ 4389.963415][ T6600] handle_mm_fault+0xfc/0x2f0
[ 4389.968055][ T6600] do_page_fault+0x263/0x6f9
[ 4389.972629][ T6600] page_fault+0x34/0x40
[ 4389.976741][ T6600] RIP: 0033:0x411ab0
[ 4389.980600][ T6600] Code: 89 de e8 83 16 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 75 1c ff ff 31 d2 48 98 90 <c6> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 4390.000293][ T6600] RSP: 002b:00007f64a8f3aec0 EFLAGS: 00010206
[ 4390.006320][ T6600] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f837e05cb77
[ 4390.014254][ T6600] RDX: 00000000052d6000 RSI: 00000000c0000000 RDI: 0000000000000000
[ 4390.022213][ T6600] RBP: 00007f6327f3a000 R08: 00000000ffffffff R09: 0000000000000000
[ 4390.030150][ T6600] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[ 4390.038104][ T6600] R13: 00007ffd7960ec0f R14: 0000000000000000 R15: 00007f64a8f3afc0
[ 4390.046046][ T6600] irq event stamp: 400622
[ 4390.050376][ T6600] hardirqs last enabled at (400621): [<ffffffff82b94df7>] free_unref_page_list+0x1c7/0x2b0
[ 4390.060430][ T6600] hardirqs last disabled at (400622): [<ffffffff832d8fbc>] _raw_spin_lock_irq+0x1c/0x60
[ 4390.070144][ T6600] softirqs last enabled at (400510): [<ffffffff8360034c>] __do_softirq+0x34c/0x57c
[ 4390.079487][ T6600] softirqs last disabled at (400501): [<ffffffff828c68d2>] irq_exit+0xa2/0xc0
[ 4390.088394][ T6600] ---[ end trace eb6136217ea3d652 ]---
[ 4390.093976][ T6600] ------------[ cut here ]------------
[ 4390.099379][ T6600] kernel BUG at mm/memcontrol.c:1288!
[ 4390.104712][ T6600] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 4390.111523][ T6600] CPU: 9 PID: 6600 Comm: oom01 Tainted: G W L 5.6.0-rc4-next-20200305+ #4
[ 4390.121105][ T6600] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[ 4390.130485][ T6600] RIP: 0010:mem_cgroup_update_lru_size+0x13d/0x1b0
[ 4390.136987][ T6600] Code: 00 48 85 db 79 b7 48 c7 c7 78 32 db 83 e8 7b cd e5 ff 44 0f b6 3d db 53 18 01 41 80 ff 01 0f 87 e3 69 00 00 41 83 e7 01 74 0e <0f> 0b 48 c7 c7 70 e7 ce 83 e8 47 17 23 00 48 c7 c7 78 32 db 83 e8
[ 4390.156680][ T6600] RSP: 0018:ffffbf7b0adc3598 EFLAGS: 00010082
[ 4390.162716][ T6600] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 0000000000000000
[ 4390.170664][ T6600] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffbf7b0adc341c
[ 4390.178598][ T6600] RBP: ffffbf7b0adc35d8 R08: 0000000000000000 R09: 0000bf7b0adc341c
[ 4390.186551][ T6600] R10: 0000bf7b0adc33a8 R11: 0000bf7b0adc341f R12: 00000000fffffff9
[ 4390.194468][ T6600] R13: ffff978a77534400 R14: 0000000000000000 R15: 0000000000000000
[ 4390.202478][ T6600] FS: 00007f64a8f3b700(0000) GS:ffff979272880000(0000) knlGS:0000000000000000
[ 4390.211380][ T6600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.217923][ T6600] CR2: 00007f632d210000 CR3: 000000067ee08000 CR4: 00000000003406e0
[ 4390.225852][ T6600] Call Trace:
[ 4390.229064][ T6600] isolate_lru_pages+0x6c5/0xfd0
[ 4390.233926][ T6600] ? __const_udelay+0x3c/0x40
[ 4390.238594][ T6600] shrink_inactive_list+0x18a/0x860
[ 4390.243779][ T6600] shrink_lruvec+0x5d9/0xb70
[ 4390.248312][ T6600] ? find_held_lock+0x35/0xa0
[ 4390.252945][ T6600] ? percpu_ref_put_many+0xdd/0x1c0
[ 4390.258106][ T6600] shrink_node+0x2d6/0xca0
[ 4390.262472][ T6600] do_try_to_free_pages+0x1f7/0x9a0
[ 4390.267627][ T6600] try_to_free_pages+0x252/0x5b0
[ 4390.272527][ T6600] __alloc_pages_slowpath+0x458/0x1290
[ 4390.277953][ T6600] __alloc_pages_nodemask+0x3bb/0x450
[ 4390.283264][ T6600] alloc_pages_vma+0x8a/0x2c0
[ 4390.287889][ T6600] do_anonymous_page+0x16e/0x6f0
[ 4390.292760][ T6600] ? __lock_acquire+0x443/0x37c0
[ 4390.297650][ T6600] __handle_mm_fault+0xce1/0xd50
[ 4390.302551][ T6600] handle_mm_fault+0xfc/0x2f0
[ 4390.307177][ T6600] do_page_fault+0x263/0x6f9
[ 4390.311780][ T6600] page_fault+0x34/0x40
[ 4390.315899][ T6600] RIP: 0033:0x411ab0
[ 4390.319854][ T6600] Code: 89 de e8 83 16 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 75 1c ff ff 31 d2 48 98 90 <c6> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 4390.339502][ T6600] RSP: 002b:00007f64a8f3aec0 EFLAGS: 00010206
[ 4390.345521][ T6600] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f837e05cb77
[ 4390.353463][ T6600] RDX: 00000000052d6000 RSI: 00000000c0000000 RDI: 0000000000000000
[ 4390.361389][ T6600] RBP: 00007f6327f3a000 R08: 00000000ffffffff R09: 0000000000000000
[ 4390.369318][ T6600] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[ 4390.377256][ T6600] R13: 00007ffd7960ec0f R14: 0000000000000000 R15: 00007f64a8f3afc0
[ 4390.385241][ T6600] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_amd kvm ses enclosure irqbypass dax_pmem dax_pmem_core efivars acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
[ 4390.412408][ T6600] ---[ end trace eb6136217ea3d653 ]---
[ 4390.417817][ T6600] RIP: 0010:mem_cgroup_update_lru_size+0x13d/0x1b0
[ 4390.424306][ T6600] Code: 00 48 85 db 79 b7 48 c7 c7 78 32 db 83 e8 7b cd e5 ff 44 0f b6 3d db 53 18 01 41 80 ff 01 0f 87 e3 69 00 00 41 83 e7 01 74 0e <0f> 0b 48 c7 c7 70 e7 ce 83 e8 47 17 23 00 48 c7 c7 78 32 db 83 e8
[ 4390.443957][ T6600] RSP: 0018:ffffbf7b0adc3598 EFLAGS: 00010082
[ 4390.449975][ T6600] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 0000000000000000
[ 4390.457930][ T6600] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffbf7b0adc341c
[ 4390.465853][ T6600] RBP: ffffbf7b0adc35d8 R08: 0000000000000000 R09: 0000bf7b0adc341c
[ 4390.473808][ T6600] R10: 0000bf7b0adc33a8 R11: 0000bf7b0adc341f R12: 00000000fffffff9
[ 4390.481743][ T6600] R13: ffff978a77534400 R14: 0000000000000000 R15: 0000000000000000
[ 4390.489718][ T6600] FS: 00007f64a8f3b700(0000) GS:ffff979272880000(0000) knlGS:0000000000000000
[ 4390.498624][ T6600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.505162][ T6600] CR2: 00007f632d210000 CR3: 000000067ee08000 CR4: 00000000003406e0
[ 4390.513086][ T6600] Kernel panic - not syncing: Fatal exception
[ 4391.870599][ T6600] Shutting down cpus with NMI
[ 4391.875212][ T6600] Kernel Offset: 0x1800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 4391.886841][ T6600] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> ------------------------------------------------------
> From: Alex Shi <[email protected]>
> Subject: mm/vmscan: remove unnecessary lruvec adding
>
> Patch series "per lruvec lru_lock for memcg", v9.
>
> A partial merge. The first 6 patches from a 20 patch series. Some code
> cleanups and minimal optimizations.
>
>
> This patch (of 6):
>
> We don't have to add a freeable page to the lru and then remove it again.
> This change saves a couple of operations and makes the move clearer.
>
> The SetPageLRU needs to be kept here for list integrity.
> Otherwise:
> #0 move_pages_to_lru #1 release_pages
> if (put_page_testzero())
> if !put_page_testzero
> !PageLRU //skip lru_lock
> list_add(&page->lru,)
> list_add(&page->lru,) //corrupt
>
> [[email protected]: coding style fixes]
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Alex Shi <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Konstantin Khlebnikov <[email protected]>
> Cc: Daniel Jordan <[email protected]>
> Cc: Yang Shi <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Mike Kravetz <[email protected]>
> Cc: Vladimir Davydov <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> mm/vmscan.c | 32 +++++++++++++++++++++-----------
> 1 file changed, 21 insertions(+), 11 deletions(-)
>
> --- a/mm/vmscan.c~mm-vmscan-remove-unnecessary-lruvec-adding
> +++ a/mm/vmscan.c
> @@ -1838,26 +1838,29 @@ static unsigned noinline_for_stack move_
> while (!list_empty(list)) {
> page = lru_to_page(list);
> VM_BUG_ON_PAGE(PageLRU(page), page);
> + list_del(&page->lru);
> if (unlikely(!page_evictable(page))) {
> - list_del(&page->lru);
> spin_unlock_irq(&pgdat->lru_lock);
> putback_lru_page(page);
> spin_lock_irq(&pgdat->lru_lock);
> continue;
> }
> - lruvec = mem_cgroup_page_lruvec(page, pgdat);
>
> + /*
> + * The SetPageLRU needs to be kept here for list integrity.
> + * Otherwise:
> + * #0 move_pages_to_lru #1 release_pages
> + * if (put_page_testzero())
> + * if !put_page_testzero
> + * !PageLRU //skip lru_lock
> + * list_add(&page->lru,)
> + * list_add(&page->lru,) //corrupt
> + */
> SetPageLRU(page);
> - lru = page_lru(page);
> -
> - nr_pages = hpage_nr_pages(page);
> - update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
> - list_move(&page->lru, &lruvec->lists[lru]);
>
> - if (put_page_testzero(page)) {
> + if (unlikely(put_page_testzero(page))) {
> __ClearPageLRU(page);
> __ClearPageActive(page);
> - del_page_from_lru_list(page, lruvec, lru);
>
> if (unlikely(PageCompound(page))) {
> spin_unlock_irq(&pgdat->lru_lock);
> @@ -1865,9 +1868,16 @@ static unsigned noinline_for_stack move_
> spin_lock_irq(&pgdat->lru_lock);
> } else
> list_add(&page->lru, &pages_to_free);
> - } else {
> - nr_moved += nr_pages;
> + continue;
> }
> +
> + lruvec = mem_cgroup_page_lruvec(page, pgdat);
> + lru = page_lru(page);
> + nr_pages = hpage_nr_pages(page);
> +
> + update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
> + list_add(&page->lru, &lruvec->lists[lru]);
> + nr_moved += nr_pages;
> }
>
> /*
> _
>
> Patches currently in -mm which might be from [email protected] are
>
> ocfs2-remove-fs_ocfs2_nm.patch
> ocfs2-remove-unused-macros.patch
> ocfs2-use-ocfs2_sec_bits-in-macro.patch
> ocfs2-remove-dlm_lock_is_remote.patch
> ocfs2-remove-useless-err.patch
> mm-memcg-fold-lock_page_lru-into-commit_charge.patch
> mm-page_idle-no-unlikely-double-check-for-idle-page-counting.patch
> mm-thp-move-lru_add_page_tail-func-to-huge_memoryc.patch
> mm-thp-clean-up-lru_add_page_tail.patch
> mm-thp-narrow-lru-locking.patch
>
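The interleaving in the patch comment is easier to follow as a replay. The sketch below is a toy userspace model, not kernel code: `struct toy_page`, `toy_put_testzero()` and `toy_replay()` are hypothetical, and the column assignment is one reading of the flattened diagram (#1 drops the final reference and tests PG_lru while #0 still has to link the page onto its LRU list). It counts how many lists end up holding `page->lru`; more than one means the list_head was linked twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page, reduced to what the race needs. */
struct toy_page {
	int refcount;   /* models the page reference count */
	bool lru;       /* models PG_lru */
	int list_links; /* number of lists currently holding page->lru */
};

/* Models put_page_testzero(): drop one reference, true if it was the last. */
static bool toy_put_testzero(struct toy_page *page)
{
	return --page->refcount == 0;
}

/*
 * Replays one interleaving of #0 move_pages_to_lru() and #1 release_pages().
 * set_lru_first selects the patch's ordering (SetPageLRU before the put);
 * false models moving it after the put. Returns page.list_links at the end.
 */
static int toy_replay(bool set_lru_first)
{
	struct toy_page page = { .refcount = 2, .lru = false, .list_links = 0 };
	bool last;

	/* #0, under lru_lock */
	if (set_lru_first)
		page.lru = true;              /* SetPageLRU() */
	last = toy_put_testzero(&page);
	assert(!last);                        /* #0 did not hold the last ref */

	/* #1 runs in the window and drops the final reference */
	last = toy_put_testzero(&page);
	assert(last);

	if (page.lru) {
		/* #1 must take lru_lock, so #0 finishes linking the page first */
		page.list_links++;            /* #0: list_add() to the LRU */
		page.lru = false;             /* #1: __ClearPageLRU() */
		page.list_links--;            /* #1: del_page_from_lru_list() */
		page.list_links++;            /* #1: list_add() to pages_to_free */
	} else {
		/* !PageLRU: #1 skips lru_lock entirely */
		page.list_links++;            /* #1: list_add() to pages_to_free */
		page.lru = true;              /* #0: SetPageLRU(), too late */
		page.list_links++;            /* #0: list_add() to the LRU: corrupt */
	}
	return page.list_links;
}
```

With the patch's ordering, toy_replay(true) returns 1; with SetPageLRU moved after the put, toy_replay(false) returns 2, i.e. the same list_head sits on two lists.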
On Thu, Mar 05, 2020 at 10:32:18PM -0500, Qian Cai wrote:
> > On Mar 5, 2020, at 9:50 PM, [email protected] wrote:
> > The patch titled
> > Subject: mm/vmscan: remove unnecessary lruvec adding
> > has been removed from the -mm tree. Its filename was
> > mm-vmscan-remove-unnecessary-lruvec-adding.patch
> >
> > This patch was dropped because it had testing failures
>
> Andrew, do you have more information about this failure? I hit a bug
> here under memory pressure and am wondering if this is related
> which might save me some time digging…
See Hugh's message from a few minutes ago:
Subject: Re: [PATCH v9 00/21] per lruvec lru_lock for memcg
> On Mar 5, 2020, at 10:38 PM, Matthew Wilcox <[email protected]> wrote:
>
> On Thu, Mar 05, 2020 at 10:32:18PM -0500, Qian Cai wrote:
>>> On Mar 5, 2020, at 9:50 PM, [email protected] wrote:
>>> The patch titled
>>> Subject: mm/vmscan: remove unnecessary lruvec adding
>>> has been removed from the -mm tree. Its filename was
>>> mm-vmscan-remove-unnecessary-lruvec-adding.patch
>>>
>>> This patch was dropped because it had testing failures
>>
>> Andrew, do you have more information about this failure? I hit a bug
>> here under memory pressure and am wondering if this is related
>> which might save me some time digging…
>
> See Hugh's message from a few minutes ago:
>
> Subject: Re: [PATCH v9 00/21] per lruvec lru_lock for memcg
I don’t see it on lore.kernel or anywhere. Private email?
On Thu, 5 Mar 2020, Qian Cai wrote:
> > On Mar 5, 2020, at 10:38 PM, Matthew Wilcox <[email protected]> wrote:
> >
> > On Thu, Mar 05, 2020 at 10:32:18PM -0500, Qian Cai wrote:
> >>> On Mar 5, 2020, at 9:50 PM, [email protected] wrote:
> >>> The patch titled
> >>> Subject: mm/vmscan: remove unnecessary lruvec adding
> >>> has been removed from the -mm tree. Its filename was
> >>> mm-vmscan-remove-unnecessary-lruvec-adding.patch
> >>>
> >>> This patch was dropped because it had testing failures
> >>
> >> Andrew, do you have more information about this failure? I hit a bug
> >> here under memory pressure and am wondering if this is related
> >> which might save me some time digging…
Very likely related.
> >
> > See Hugh's message from a few minutes ago:
Thanks Matthew.
> >
> > Subject: Re: [PATCH v9 00/21] per lruvec lru_lock for memcg
>
> I don’t see it on lore.kernel or anywhere. Private email?
You're right, sorry I didn't notice, lots of ccs but
neither lkml nor linux-mm were on that thread from the start:
From [email protected] Thu Mar 5 18:16:06 2020
Date: Thu, 5 Mar 2020 18:15:40 -0800 (PST)
From: Hugh Dickins <[email protected]>
To: Andrew Morton <[email protected]>, Alex Shi <[email protected]>
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], Fengguang Wu <[email protected]>, Rong Chen <[email protected]>
Subject: Re: [PATCH v9 00/21] per lruvec lru_lock for memcg
On Tue, 3 Mar 2020, Alex Shi wrote:
> On 2020/3/3 6:12 AM, Andrew Morton wrote:
> >> Thanks to Intel 0day and to Rong Chen, Fengguang Wu,
> >> and Yun Wang for testing support.
> > I'm not seeing a lot of evidence of review and test activity yet. But
> > I think I'll grab patches 01-06 as they look like fairly
> > straightforward improvements.
>
> cc Fengguang and Rong Chen
>
> I did some local functional testing and ran kselftest; they all look fine.
> 0day only warns me if some case fails. Is no news good news? :)
And now the bad news.
Andrew, please revert those six (or seven as they ended up in mmotm).
5.6-rc4-mm1 without them runs my tmpfs+loop+swapping+memcg+ksm kernel
build loads fine (did four hours just now), but 5.6-rc4-mm1 itself
crashed just after starting - seconds or minutes I didn't see,
but it did not complete an iteration.
I thought maybe those six would be harmless (though I've not looked
at them at all); but knew already that the full series is not good yet:
I gave it a try over 5.6-rc4 on Monday, and crashed very soon on simpler
testing, in different ways from what hits mmotm.
The first thing wrong with the full set was when I tried tmpfs+loop+
swapping kernel builds in "mem=700M cgroup_disabled=memory", of course
with CONFIG_DEBUG_LIST=y. That soon collapsed in a splurge of OOM kills
and list_del corruption messages: __list_del_entry_valid < list_del <
__page_cache_release < __put_page < put_page < __try_to_reclaim_swap <
free_swap_and_cache < shmem_free_swap < shmem_undo_range.
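CONFIG_DEBUG_LIST is what turns this kind of damage into list_del corruption reports. As I understand lib/list_debug.c, __list_del_entry_valid() refuses to unlink an entry unless both neighbours point back at it (it also checks the LIST_POISON values, omitted here). Below is a minimal userspace model with re-implemented, hypothetical toy_* list helpers, showing that a list_head linked onto a second list without list_del() is typically caught later, when a former neighbour is unlinked.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal doubly linked list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void toy_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, like the kernel's list_add(). */
static void toy_list_add(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
}

/* The neighbour checks of __list_del_entry_valid(); poison checks omitted. */
static bool toy_list_del_entry_valid(const struct list_head *entry)
{
	return entry->prev->next == entry && entry->next->prev == entry;
}

/*
 * Puts "page" and "other" on an LRU list, optionally re-links "page"
 * onto a free list without unlinking it first (the bug), then reports
 * whether a debug-checked list_del() of "other" would be accepted.
 */
static bool toy_other_entry_valid(bool double_add)
{
	struct list_head lru, free_list, other, page;

	toy_list_init(&lru);
	toy_list_init(&free_list);
	toy_list_add(&other, &lru);             /* lru: other */
	toy_list_add(&page, &lru);              /* lru: page, other */
	if (double_add)
		toy_list_add(&page, &free_list); /* page.next now points into free_list */

	return toy_list_del_entry_valid(&other);
}
```

toy_other_entry_valid(false) is true, while toy_other_entry_valid(true) is false: other.prev still points at page, but page.next no longer points back at other.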
When I next tried with "mem=1G" and memcg enabled (but not being used),
that managed some iterations, no OOM kills, no list_del warnings (was
it swapping? perhaps, perhaps not, I was trying to go easy on it just
to see if "cgroup_disabled=memory" had been the problem); but when
rebooting after that, again list_del corruption messages and crash
(I didn't note them down).
So I didn't take much notice of what the mmotm crash backtrace showed
(but IIRC shmem and swap were in it).
Alex, I'm afraid you're focusing too much on performance results,
without doing the basic testing needed - I thought we had given you
some hints on the challenging areas (swapping, move_charge_at_immigrate,
page migration) when we attached a *correctly working* 5.3 version back
on 23rd August:
https://lore.kernel.org/linux-mm/[email protected]/
(Correctly working, except for two patches I had mistakenly dropped
as unnecessary in earlier rebases, which our discussions with Johannes
later showed to be very necessary, though their races are rarely seen.)
I have not had the time (and do not expect to have the time) to review
your series: maybe it's one or two small fixes away from being complete,
or maybe it's still fundamentally flawed, I do not know. I had naively
hoped that you would help with a patchset that worked, rather than
cutting it down into something which does not.
Submitting your series to routine testing is much easier for me than
reviewing it: but then, yes, it's a pity that I don't find the time
to report the results on intervening versions, which also crashed.
What I have to do now, is set aside time today and tomorrow, to package
up the old scripts I use, describe them and their environment, and send
them to you (cc akpm in case I fall under a bus): so that you can
reproduce the crashes for yourself, and get to work on them.
Hugh
On 2020/3/6 12:17 PM, Hugh Dickins wrote:
> On Thu, 5 Mar 2020, Qian Cai wrote:
>>> On Mar 5, 2020, at 10:38 PM, Matthew Wilcox <[email protected]> wrote:
>>>
>>> On Thu, Mar 05, 2020 at 10:32:18PM -0500, Qian Cai wrote:
>>>>> On Mar 5, 2020, at 9:50 PM, [email protected] wrote:
>>>>> The patch titled
>>>>> Subject: mm/vmscan: remove unnecessary lruvec adding
>>>>> has been removed from the -mm tree. Its filename was
>>>>> mm-vmscan-remove-unnecessary-lruvec-adding.patch
>>>>>
>>>>> This patch was dropped because it had testing failures
>>>> Andrew, do you have more information about this failure? I hit a bug
>>>> here under memory pressure and am wondering if this is related
>>>> which might save me some time digging…
> Very likely related.
>
Hi all,
Apologies for the trouble!
And many thanks to you all for the reports!
Obviously, I missed the memory stress testing that I should have done. Apologies again!
Qian Cai,
Which test case are you using? Could you share the steps to reproduce it?
Hugh,
Many thanks for your help! I will look for some memory stress cases while waiting for yours.
Thank you all!
Alex
> On Mar 5, 2020, at 11:42 PM, Alex Shi <[email protected]> wrote:
>
>
>
> On 2020/3/6 12:17 PM, Hugh Dickins wrote:
>> On Thu, 5 Mar 2020, Qian Cai wrote:
>>>> On Mar 5, 2020, at 10:38 PM, Matthew Wilcox <[email protected]> wrote:
>>>>
>>>> On Thu, Mar 05, 2020 at 10:32:18PM -0500, Qian Cai wrote:
>>>>>> On Mar 5, 2020, at 9:50 PM, [email protected] wrote:
>>>>>> The patch titled
>>>>>> Subject: mm/vmscan: remove unnecessary lruvec adding
>>>>>> has been removed from the -mm tree. Its filename was
>>>>>> mm-vmscan-remove-unnecessary-lruvec-adding.patch
>>>>>>
>>>>>> This patch was dropped because it had testing failures
>>>>> Andrew, do you have more information about this failure? I hit a bug
>>>>> here under memory pressure and am wondering if this is related
>>>>> which might save me some time digging…
>> Very likely related.
>>
>
> Hi all,
>
> Apologies for the trouble!
> And many thanks to you all for the reports!
> Obviously, I missed the memory stress testing that I should have done. Apologies again!
>
> Qian Cai,
> Which test case are you using? Could you share the steps to reproduce it?
LTP oom01 in a tight loop with swap,
# i=0; while :; do echo $((i++)); oom01; sleep 5; done
>
> Hugh,
> Many thanks for your help! I will look for some memory stress cases while waiting for yours.
>
>
> Thank you all!
> Alex
On 2020/3/6 11:32 AM, Qian Cai wrote:
>
>> On Mar 5, 2020, at 9:50 PM, [email protected] wrote:
>>
>>
>> The patch titled
>> Subject: mm/vmscan: remove unnecessary lruvec adding
>> has been removed from the -mm tree. Its filename was
>> mm-vmscan-remove-unnecessary-lruvec-adding.patch
>>
>> This patch was dropped because it had testing failures
> Andrew, do you have more information about this failure? I hit a bug
> here under memory pressure and am wondering if this is related
> which might save me some time digging…
>
> [ 4389.727184][ T6600] mem_cgroup_update_lru_size(00000000bb31aaed, 0, -7): lru_size -1
This failure looks like it comes from an update_lru_size() call that is missing or
misplaced, but what I changed in this patch seems unlikely to cause it.
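For reference, the first line of Qian's trace, quoted above, says mem_cgroup_update_lru_size() was asked to remove 7 pages from LRU list 0 (LRU_INACTIVE_ANON, assuming the 5.6 enum lru_list ordering) and the per-memcg count went to -1, so the count held 6 while 7 pages were being isolated. Below is a simplified userspace model of that sanity check, assuming the 5.6-era logic: negative updates are applied before the check, and the size must be non-negative and agree with whether the list is empty. toy_update_lru_size() is a hypothetical name, not the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Simplified model of the check in mem_cgroup_update_lru_size().
 * Returns false where the kernel would WARN (and, with CONFIG_DEBUG_VM,
 * hit the VM_BUG_ON that produced "kernel BUG at mm/memcontrol.c:1288").
 */
static bool toy_update_lru_size(long *lru_size, bool list_empty,
				int lru, int nr_pages)
{
	bool ok;

	/* Removals are applied before the check, additions after it. */
	if (nr_pages < 0)
		*lru_size += nr_pages;

	ok = *lru_size >= 0 && list_empty == (*lru_size == 0);
	if (!ok) {
		printf("toy_update_lru_size(%d, %d): lru_size %ld\n",
		       lru, nr_pages, *lru_size);
		*lru_size = 0; /* modelled on the kernel resetting the counter */
	}

	if (nr_pages > 0)
		*lru_size += nr_pages;
	return ok;
}
```

Starting from a count of 6, removing 7 pages prints "lru_size -1" and fails the check; starting from 7 with the list now empty, it passes.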
Anyway, Qian, could you do me a favor and try again with this patch removed?
I am also trying LTP's oom01 case:
# i=0; while :; do echo $((i++)); oom01; sleep 5; done
It has run fine dozens of times on my qemu and hardware machines, on akpm's branch at commit f2cbd107a99b,
which includes my 6 patches.
Andrew,
What are the steps to reproduce this test failure?
Thanks a lot, everyone!
Alex
On 2020/3/6 5:04 PM, Alex Shi wrote:
>
>
> On 2020/3/6 11:32 AM, Qian Cai wrote:
>>
>>> On Mar 5, 2020, at 9:50 PM, [email protected] wrote:
>>>
>>>
>>> The patch titled
>>> Subject: mm/vmscan: remove unnecessary lruvec adding
>>> has been removed from the -mm tree. Its filename was
>>> mm-vmscan-remove-unnecessary-lruvec-adding.patch
>>>
>>> This patch was dropped because it had testing failures
>> Andrew, do you have more information about this failure? I hit a bug
>> here under memory pressure and am wondering if this is related
>> which might save me some time digging…
>>
>> [ 4389.727184][ T6600] mem_cgroup_update_lru_size(00000000bb31aaed, 0, -7): lru_size -1
>
> This failure looks like it comes from an update_lru_size() call that is missing or
> misplaced, but what I changed in this patch seems unlikely to cause it.
>
> Anyway, Qian, could you do me a favor and try again with this patch removed?
Compared with this patch's change, 'c8cba0cc2a80 mm/thp: narrow lru locking' is more
likely to be the bad one. Maybe it's because the lru unlock was moved from before
remap_page(head) to before ClearPageCompound(); I guess the unlock should be moved after
ClearPageCompound(), or moved back to its original place.
But I still cannot reproduce this bug. Awkward!
Alex
---
line 2605 mm/huge_memory.c:
	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
	ClearPageCompound(head);
	split_page_owner(head, HPAGE_PMD_ORDER);
	/* See comment in __split_huge_page_tail() */
	if (PageAnon(head)) {
		/* Additional pin to swap cache */
		if (PageSwapCache(head)) {
			page_ref_add(head, 2);
			xa_unlock(&swap_cache->i_pages);
		} else {
			page_ref_inc(head);
		}
	} else {
		/* Additional pin to page cache */
		page_ref_add(head, 2);
		xa_unlock(&head->mapping->i_pages);
	}
	remap_page(head);
On 2020/3/6 12:17 PM, Hugh Dickins wrote:
>>>
>>> Subject: Re: [PATCH v9 00/21] per lruvec lru_lock for memcg
>>
>> I don’t see it on lore.kernel or anywhere. Private email?
>
> You're right, sorry I didn't notice, lots of ccs but
> neither lkml nor linux-mm were on that thread from the start:
My fault: I thought people would usually comment on the individual patches. I will take care of this from now on.
>
> And now the bad news.
>
> Andrew, please revert those six (or seven as they ended up in mmotm).
> 5.6-rc4-mm1 without them runs my tmpfs+loop+swapping+memcg+ksm kernel
> build loads fine (did four hours just now), but 5.6-rc4-mm1 itself
> crashed just after starting - seconds or minutes I didn't see,
> but it did not complete an iteration.
>
> I thought maybe those six would be harmless (though I've not looked
> at them at all); but knew already that the full series is not good yet:
> I gave it a try over 5.6-rc4 on Monday, and crashed very soon on simpler
> testing, in different ways from what hits mmotm.
>
> The first thing wrong with the full set was when I tried tmpfs+loop+
> swapping kernel builds in "mem=700M cgroup_disabled=memory", of course
> with CONFIG_DEBUG_LIST=y. That soon collapsed in a splurge of OOM kills
> and list_del corruption messages: __list_del_entry_valid < list_del <
> __page_cache_release < __put_page < put_page < __try_to_reclaim_swap <
> free_swap_and_cache < shmem_free_swap < shmem_undo_range.
I have been running kernel builds in a "mem=700M cgroup_disabled=memory" qemu-kvm guest
with a swapfile for 3 hours, hoping to catch something while waiting for your
kind reproduction scripts. Thanks, Hugh!
>
> When I next tried with "mem=1G" and memcg enabled (but not being used),
> that managed some iterations, no OOM kills, no list_del warnings (was
> it swapping? perhaps, perhaps not, I was trying to go easy on it just
> to see if "cgroup_disabled=memory" had been the problem); but when
> rebooting after that, again list_del corruption messages and crash
> (I didn't note them down).
>
> So I didn't take much notice of what the mmotm crash backtrace showed
> (but IIRC shmem and swap were in it).
Is there somewhere I can get the mmotm crash backtrace?
>
> Alex, I'm afraid you're focusing too much on performance results,
> without doing the basic testing needed - I thought we had given you
> some hints on the challenging areas (swapping, move_charge_at_immigrate,
> page migration) when we attached a *correctly working* 5.3 version back
> on 23rd August:
>
> https://lore.kernel.org/linux-mm/[email protected]/
>
> (Correctly working, except for two patches I had mistakenly dropped
> as unnecessary in earlier rebases, which our discussions with Johannes
> later showed to be very necessary, though their races are rarely seen.)
>
Did you mean Johannes's question about the race on page->memcg in an earlier email?
"> I don't see what prevents the lruvec from changing under compaction,
> neither in your patches nor in Hugh's. Maybe I'm missing something?"
https://lkml.org/lkml/2019/11/22/2153
Since then, I have tried 2 solutions to protect page->memcg: first, using
lock_page_memcg() (wrong), and second, the new solution of taking the PageLRU bit
as a precondition for page isolation, which may work for memcg migration, page
migration in compaction, etc. Would you like to give some comments on this?
> I have not had the time (and do not expect to have the time) to review
> your series: maybe it's one or two small fixes away from being complete,
> or maybe it's still fundamentally flawed, I do not know. I had naively
> hoped that you would help with a patchset that worked, rather than
> cutting it down into something which does not.
Sorry, Hugh, I didn't know you had a per-memcg lru_lock patchset before I sent
out my first version.
> Submitting your series to routine testing is much easier for me than
> reviewing it: but then, yes, it's a pity that I don't find the time
> to report the results on intervening versions, which also crashed.
>
> What I have to do now, is set aside time today and tomorrow, to package
> up the old scripts I use, describe them and their environment, and send
> them to you (cc akpm in case I fall under a bus): so that you can
> reproduce the crashes for yourself, and get to work on them.
>
Thanks in advance for your upcoming test scripts; I believe they will help a lot.
BTW, I have tried my best to organize these patches to make them straightforward, so
a senior expert like you should not need much time to go through the whole series
and give some precious comments!
I am looking forward to hearing your comments. :)
Thanks
Alex
On Thu, Mar 05, 2020 at 08:17:46PM -0800, Hugh Dickins wrote:
> On Tue, 3 Mar 2020, Alex Shi wrote:
> > On 2020/3/3 6:12 AM, Andrew Morton wrote:
> > >> Thanks to Intel 0day and to Rong Chen, Fengguang Wu,
> > >> and Yun Wang for testing support.
> > > I'm not seeing a lot of evidence of review and test activity yet. But
> > > I think I'll grab patches 01-06 as they look like fairly
> > > straightforward improvements.
> >
> > cc Fengguang and Rong Chen
> >
> > I did some local functional testing and ran kselftest; they all look fine.
> > 0day only warns me if some case fails. Is no news good news? :)
>
> And now the bad news.
>
> Andrew, please revert those six (or seven as they ended up in mmotm).
> 5.6-rc4-mm1 without them runs my tmpfs+loop+swapping+memcg+ksm kernel
> build loads fine (did four hours just now), but 5.6-rc4-mm1 itself
> crashed just after starting - seconds or minutes I didn't see,
> but it did not complete an iteration.
>
> I thought maybe those six would be harmless (though I've not looked
> at them at all); but knew already that the full series is not good yet:
> I gave it a try over 5.6-rc4 on Monday, and crashed very soon on simpler
> testing, in different ways from what hits mmotm.
>
> The first thing wrong with the full set was when I tried tmpfs+loop+
> swapping kernel builds in "mem=700M cgroup_disabled=memory", of course
> with CONFIG_DEBUG_LIST=y. That soon collapsed in a splurge of OOM kills
> and list_del corruption messages: __list_del_entry_valid < list_del <
> __page_cache_release < __put_page < put_page < __try_to_reclaim_swap <
> free_swap_and_cache < shmem_free_swap < shmem_undo_range.
>
> When I next tried with "mem=1G" and memcg enabled (but not being used),
> that managed some iterations, no OOM kills, no list_del warnings (was
> it swapping? perhaps, perhaps not, I was trying to go easy on it just
> to see if "cgroup_disabled=memory" had been the problem); but when
> rebooting after that, again list_del corruption messages and crash
> (I didn't note them down).
>
> So I didn't take much notice of what the mmotm crash backtrace showed
> (but IIRC shmem and swap were in it).
>
> Alex, I'm afraid you're focusing too much on performance results,
> without doing the basic testing needed - I thought we had given you
> some hints on the challenging areas (swapping, move_charge_at_immigrate,
> page migration) when we attached a *correctly working* 5.3 version back
> on 23rd August:
>
> https://lore.kernel.org/linux-mm/[email protected]/
>
> (Correctly working, except for two patches I had mistakenly dropped
> as unnecessary in earlier rebases, which our discussions with Johannes
> later showed to be very necessary, though their races are rarely seen.)
>
> I have not had the time (and do not expect to have the time) to review
> your series: maybe it's one or two small fixes away from being complete,
> or maybe it's still fundamentally flawed, I do not know. I had naively
> hoped that you would help with a patchset that worked, rather than
> cutting it down into something which does not.
I'm a bit confused by this. I, and I believe Alex, kept going down a
different path because it didn't sound like there was a solution to
the compaction race. As I remember, the conversation ended on this:
: Your race here (again, lruvec lock taken then PageLRU observed, but
: page->mem_cgroup changed in between) really questions my whole scheme:
: I am not going to propose a solution now, I'll have to go back and
: recheck my assumptions all over. Certainly isolate_migratepage_block()
: has a harder job than any other, but I need to re-review it all.
https://lore.kernel.org/lkml/[email protected]/
That's certainly why I kept looking and eventually proposed using
PageLRU clearing as a lock. Maybe there is a better way to do it, but
I didn't see it.
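To make "PageLRU clearing as a lock" concrete, here is a minimal userspace sketch in C11. It is not the actual patch: `struct model_page`, `test_clear_page_lru()`, `try_isolate()` and `try_move_memcg()` are hypothetical names. The assumption it illustrates is that every path which unlinks a page from its LRU or changes its memcg must first atomically clear PG_LRU, so whoever wins that test-and-clear owns the page and sees a stable memcg before looking up and locking the lruvec.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit, for this model only. */
#define PG_LRU (1u << 0)

/* Toy stand-in for struct page: an atomic flags word and an owning memcg. */
struct model_page {
	atomic_uint flags;
	int memcg_id; /* plays the role of page->mem_cgroup */
};

static void model_page_init(struct model_page *page, int memcg_id, bool on_lru)
{
	atomic_init(&page->flags, on_lru ? PG_LRU : 0u);
	page->memcg_id = memcg_id;
}

/* Atomically clear PG_LRU; returns true only for the caller that saw it set. */
static bool test_clear_page_lru(struct model_page *page)
{
	unsigned int old = atomic_fetch_and(&page->flags, ~PG_LRU);
	return (old & PG_LRU) != 0;
}

static void set_page_lru(struct model_page *page)
{
	atomic_fetch_or(&page->flags, PG_LRU);
}

/*
 * Isolation (reclaim, compaction): winning test_clear_page_lru() is what
 * grants ownership. Because try_move_memcg() must win the same bit,
 * memcg_id cannot change underneath us, so the lruvec we would lock next
 * (not modeled here) is the right one.
 */
static bool try_isolate(struct model_page *page, int *memcg_out)
{
	if (!test_clear_page_lru(page))
		return false; /* already isolated by someone else, or not on an LRU */
	*memcg_out = page->memcg_id;
	return true;
}

/* Recharging a page to another memcg must take the same "lock" first. */
static bool try_move_memcg(struct model_page *page, int new_memcg)
{
	if (!test_clear_page_lru(page))
		return false;
	page->memcg_id = new_memcg;
	set_page_lru(page); /* simplified putback onto the new memcg's LRU */
	return true;
}
```

In this model a path that loses the test-and-clear simply backs off, so a memcg change cannot slip in between reading memcg_id and acting on it. Whether the real series applies that rule on every path is a separate question.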
An LRU list corruption in page_cache_release() suggests a bug in the
way this new locking scheme works or is applied - rather than a
gratuitous divergence from your series that could have been avoided.
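For context on how such corruptions get reported: with CONFIG_DEBUG_LIST=y, list deletion checks that the entry's neighbors still point back at it and that the entry has not already been poisoned by an earlier list_del(). Below is a simplified userspace model of that check, not the kernel's implementation; the poison values and the names `list_del_entry_valid()` and `list_del_checked()` are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Circular doubly linked list node, in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

/* Illustrative poison values written into a deleted entry. */
#define POISON_NEXT ((struct list_head *)0x100)
#define POISON_PREV ((struct list_head *)0x122)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert @entry right after @head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Debug check before unlinking: is @entry still a consistent list member? */
static bool list_del_entry_valid(const struct list_head *entry)
{
	if (entry->next == POISON_NEXT || entry->prev == POISON_PREV) {
		fprintf(stderr, "list_del corruption: entry was already deleted\n");
		return false;
	}
	if (entry->prev->next != entry || entry->next->prev != entry) {
		fprintf(stderr, "list_del corruption: neighbors do not point back\n");
		return false;
	}
	return true;
}

/* Unlink and poison @entry; leave the list untouched if the check fails. */
static bool list_del_checked(struct list_head *entry)
{
	if (!list_del_entry_valid(entry))
		return false;
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = POISON_NEXT;
	entry->prev = POISON_PREV;
	return true;
}
```

Deleting the same entry twice is refused with a corruption message instead of scribbling over the list; a page taken off an LRU twice, or relinked under the wrong lock, produces this kind of inconsistency.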
> Submitting your series to routine testing is much easier for me than
> reviewing it: but then, yes, it's a pity that I don't find the time
> to report the results on intervening versions, which also crashed.
>
> What I have to do now, is set aside time today and tomorrow, to package
> up the old scripts I use, describe them and their environment, and send
> them to you (cc akpm in case I fall under a bus): so that you can
> reproduce the crashes for yourself, and get to work on them.
I think that would be very useful. tmpfs+loop+swapping+memcg+ksm
kernel builds aren't exactly a go-to test case for most mm developers
(although maybe they should be!)
> On Mar 6, 2020, at 6:58 AM, Alex Shi <[email protected]> wrote:
>
>
>
> On 2020/3/6 at 5:04 PM, Alex Shi wrote:
>>
>>
>> On 2020/3/6 at 11:32 AM, Qian Cai wrote:
>>>
>>>> On Mar 5, 2020, at 9:50 PM, [email protected] wrote:
>>>>
>>>>
>>>> The patch titled
>>>> Subject: mm/vmscan: remove unnecessary lruvec adding
>>>> has been removed from the -mm tree. Its filename was
>>>> mm-vmscan-remove-unnecessary-lruvec-adding.patch
>>>>
>>>> This patch was dropped because it had testing failures
>>> Andrew, do you have more information about this failure? I hit a bug
>>> here under memory pressure and am wondering if this is related
>>> which might save me some time digging…
>>>
>>> [ 4389.727184][ T6600] mem_cgroup_update_lru_size(00000000bb31aaed, 0, -7): lru_size -1
>>
>> This bug seems to be caused by a missing or misplaced update_lru_size(),
>> but what I changed in this patch seems unlikely to cause it.
>>
>> Anyway, Qian, could you do me a favor and try again with this patch removed?
>
> Compared to this patch's change, 'c8cba0cc2a80 mm/thp: narrow lru locking' is more
> likely to be the culprit. Maybe it's because the lru unlock was moved from before
> remap_page(head) to before ClearPageCompound(); I guess the unlock should be moved
> after ClearPageCompound() or back to its original place.
I can only confirm that after reverting those 6 patches, I am no longer able to reproduce it.
>
> But I still cannot reproduce this bug. Awkward!
>
> Alex
>
> ---
> line 2605 mm/huge_memory.c:
> 	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
>
> 	ClearPageCompound(head);
>
> 	split_page_owner(head, HPAGE_PMD_ORDER);
>
> 	/* See comment in __split_huge_page_tail() */
> 	if (PageAnon(head)) {
> 		/* Additional pin to swap cache */
> 		if (PageSwapCache(head)) {
> 			page_ref_add(head, 2);
> 			xa_unlock(&swap_cache->i_pages);
> 		} else {
> 			page_ref_inc(head);
> 		}
> 	} else {
> 		/* Additional pin to page cache */
> 		page_ref_add(head, 2);
> 		xa_unlock(&head->mapping->i_pages);
> 	}
>
> 	remap_page(head);
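The warning Qian quoted above, `mem_cgroup_update_lru_size(00000000bb31aaed, 0, -7): lru_size -1`, says that an update of -7 pages (assuming the arguments are lruvec, LRU list index, nr_pages) drove a per-memcg LRU counter from 6 to -1, i.e. one more page was removed than had been accounted. Below is a simplified model of that accounting, loosely following the shape of mem_cgroup_update_lru_size() as I understand it (subtract, check, then add); `update_lru_size()` is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Model of a per-memcg, per-LRU page counter. Removals (negative
 * nr_pages) are applied before the sanity check and additions after it,
 * so an underflow is reported at the update that causes it.
 * Returns 0 on success, -1 if the counter went negative.
 */
static int update_lru_size(long *lru_size, int lru, int nr_pages)
{
	if (nr_pages < 0)
		*lru_size += nr_pages;

	if (*lru_size < 0) {
		fprintf(stderr, "update_lru_size(%d, %d): lru_size %ld\n",
			lru, nr_pages, *lru_size);
		*lru_size = 0; /* reset so later updates start from a sane value */
		return -1;
	}

	if (nr_pages > 0)
		*lru_size += nr_pages;
	return 0;
}
```

Feeding it +6 and then -7 on list 0 prints `update_lru_size(0, -7): lru_size -1`, the same arithmetic as the splat. That is why a missing or misplaced size update elsewhere is the suspect, rather than the removal that trips the warning.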
On 2020/3/7 at 10:27 AM, Qian Cai wrote:
>> Compared to this patch's change, 'c8cba0cc2a80 mm/thp: narrow lru locking' is more
>> likely to be the culprit. Maybe it's because the lru unlock was moved from before
>> remap_page(head) to before ClearPageCompound(); I guess the unlock should be moved
>> after ClearPageCompound() or back to its original place.
> I can only confirm that after reverting those 6 patches, I am no longer able to reproduce it.
>
Hi Qian,

Thanks for the response!

Could you try reverting just the patch 'mm/thp: narrow lru locking'? Or
could you share info about your tests so I can reproduce it, such as the
kernel config, system environment and machine type? I have run hundreds
of cycles of oom01, but the akpm kernel (f2cbd107a99b) still survived.

Here are my LTP mm test results: 75 cases ran in total, 2 failed, 9 were
skipped and the rest passed, and the kernel still worked well after the
test run on yesterday's akpm head, f2cbd107a99b.

Many thanks for the help!
Alex
=====
Test Start Time: Fri Mar 6 20:49:59 2020
-----------------------------------------
Testcase               Result  Exit Value
--------               ------  ----------
mm01                   PASS    0
mm02                   PASS    0
mtest01                PASS    0
mtest01w               PASS    0
mtest05                PASS    0
mtest06                PASS    0
mtest06_2              PASS    0
mtest06_3              PASS    0
mem01                  PASS    0
mem02                  PASS    0
mem03                  PASS    0
page01                 PASS    0
page02                 PASS    0
data_space             PASS    0
stack_space            PASS    0
shmt02                 PASS    0
shmt03                 PASS    0
shmt04                 PASS    0
shmt05                 PASS    0
shmt06                 PASS    0
shmt07                 PASS    0
shmt08                 PASS    0
shmt09                 PASS    0
shmt10                 PASS    0
shm_test01             PASS    0
mallocstress01         PASS    0
mmapstress01           PASS    0
mmapstress02           PASS    0
mmapstress03           PASS    0
mmapstress04           PASS    0
mmapstress05           PASS    0
mmapstress06           PASS    0
mmapstress07           PASS    0
mmapstress08           PASS    0
mmapstress09           PASS    0
mmapstress10           PASS    0
mmap10                 PASS    0
mmap10_1               PASS    0
mmap10_2               PASS    0
mmap10_3               PASS    0
mmap10_4               PASS    0
ksm01                  FAIL    2
ksm01_1                FAIL    1
ksm02                  CONF    32
ksm02_1                CONF    32
ksm03                  PASS    0
ksm03_1                PASS    0
ksm04                  CONF    32
ksm04_1                CONF    32
ksm05                  PASS    0
ksm06                  CONF    32
ksm06_1                CONF    32
ksm06_2                CONF    32
oom01                  PASS    0
oom02                  CONF    32
oom03                  PASS    0
oom04                  PASS    0
oom05                  PASS    0
swapping01             PASS    0
thp01                  PASS    0
thp02                  PASS    0
thp03                  PASS    0
vma01                  PASS    0
vma02                  PASS    0
vma03                  CONF    32
vma04                  PASS    0
vma05                  PASS    0
overcommit_memory01    PASS    0
overcommit_memory02    PASS    0
overcommit_memory03    PASS    0
overcommit_memory04    PASS    0
overcommit_memory05    PASS    0
overcommit_memory06    PASS    0
max_map_count          PASS    0
min_free_kbytes        PASS    0
-----------------------------------------------
Total Tests: 75
Total Skipped Tests: 9
Total Failures: 2
Kernel Version: 5.6.0-rc4-06724-gf2cbd107a99b
Machine Architecture: x86_64
Hostname: alexshi-test
> On Mar 6, 2020, at 10:26 PM, Alex Shi <[email protected]> wrote:
>
> On 2020/3/7 at 10:27 AM, Qian Cai wrote:
>>> Compared to this patch's change, 'c8cba0cc2a80 mm/thp: narrow lru locking' is more
>>> likely to be the culprit. Maybe it's because the lru unlock was moved from before
>>> remap_page(head) to before ClearPageCompound(); I guess the unlock should be moved
>>> after ClearPageCompound() or back to its original place.
>> I can only confirm that after reverting those 6 patches, I am no longer able to reproduce it.
>>
>
> Hi Qian,
>
> Thanks for the response!
> Could you try reverting just the patch 'mm/thp: narrow lru locking'? Or could you
> share info about your tests so I can reproduce it, such as the kernel config, system environment and machine type?
> I have run hundreds of cycles of oom01, but the akpm kernel (f2cbd107a99b) still survived.
Kernel config: https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
Machine: HPE ProLiant DL385 Gen10
CPU: AMD EPYC 7601 32-Core Processor
Memory: 65536 MB
Disk: 400 GB
Processors: 128
Cores: 64
Sockets: 2
Kernel: linux-next 20200306