LinuxLists.cc - kernel BUG at mm/swap

2019-05-29 04:06:46

Subject: kernel BUG at mm/swap_state.c:170!

Hi folks.
I am observed kernel panic after update to git tag 5.2-rc2.
This crash happens at memory pressing when swap being used.

Unfortunately in journalctl saved only this:

May 29 08:02:02 localhost.localdomain kernel: page:ffffe90958230000
refcount:1 mapcount:1 mapping:ffff8f3ffeb36949 index:0x625002ab2
May 29 08:02:02 localhost.localdomain kernel: anon
May 29 08:02:02 localhost.localdomain kernel: flags:
0x17fffe00080034(uptodate|lru|active|swapbacked)
May 29 08:02:02 localhost.localdomain kernel: raw: 0017fffe00080034
ffffe90944640888 ffffe90956e208c8 ffff8f3ffeb36949
May 29 08:02:02 localhost.localdomain kernel: raw: 0000000625002ab2
0000000000000000 0000000100000000 ffff8f41aeeff000
May 29 08:02:02 localhost.localdomain kernel: page dumped because:
VM_BUG_ON_PAGE(entry != page)
May 29 08:02:02 localhost.localdomain kernel: page->mem_cgroup:ffff8f41aeeff000
May 29 08:02:02 localhost.localdomain kernel: ------------[ cut here
]------------
May 29 08:02:02 localhost.localdomain kernel: kernel BUG at mm/swap_state.c:170!

--
Best Regards,
Mike Gavrilov.

2019-05-29 17:33:58

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Wed, 29 May 2019 at 09:05, Mikhail Gavrilov
<[email protected]> wrote:
>
> Hi folks.
> I am observed kernel panic after update to git tag 5.2-rc2.
> This crash happens at memory pressing when swap being used.
>
> Unfortunately in journalctl saved only this:
>

Now I captured better trace.

: page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689
index:0xfecec363
: anon
: flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
: raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689
: raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000
: page dumped because: VM_BUG_ON_PAGE(entry != page)
: page->mem_cgroup:ffff978433ace000
: ------------[ cut here ]------------
: kernel BUG at mm/swap_state.c:170!
: invalid opcode: 0000 [#1] SMP NOPTI
: CPU: 1 PID: 221 Comm: kswapd0 Not tainted 5.2.0-0.rc2.git0.1.fc31.x86_64 #1
: Hardware name: System manufacturer System Product Name/ROG STRIX
X470-I GAMING, BIOS 2202 04/11/2019
: RIP: 0010:__delete_from_swap_cache+0x20d/0x240
: Code: 30 65 48 33 04 25 28 00 00 00 75 4a 48 83 c4 38 5b 5d 41 5c 41
5d 41 5e 41 5f c3 48 c7 c6 2f dc 0f 8a 48 89 c7 e8 93 1b fd ff <0f> 0b
48 c7 c6 a8 74 0f 8a e8 85 1b fd ff 0f 0b 48 c7 c6 a8 7d 0f
: RSP: 0018:ffffa982036e7980 EFLAGS: 00010046
: RAX: 0000000000000021 RBX: 0000000000000040 RCX: 0000000000000006
: RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff97843d657900
: RBP: 0000000000000001 R08: ffffa982036e7835 R09: 0000000000000535
: R10: ffff97845e21a46c R11: ffffa982036e7835 R12: ffff978426387120
: R13: 0000000000000000 R14: ffffd6d34dff0040 R15: ffffd6d34dff0000
: FS: 0000000000000000(0000) GS:ffff97843d640000(0000) knlGS:0000000000000000
: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
: CR2: 00002cba88ef5000 CR3: 000000078a97c000 CR4: 00000000003406e0
: Call Trace:
: delete_from_swap_cache+0x46/0xa0
: try_to_free_swap+0xbc/0x110
: swap_writepage+0x13/0x70
: pageout.isra.0+0x13c/0x350
: shrink_page_list+0xc14/0xdf0
: shrink_inactive_list+0x1e5/0x3c0
: shrink_node_memcg+0x202/0x760
: ? do_shrink_slab+0x52/0x2c0
: shrink_node+0xe0/0x470
: balance_pgdat+0x2d1/0x510
: kswapd+0x220/0x420
: ? finish_wait+0x80/0x80
: kthread+0xfb/0x130
: ? balance_pgdat+0x510/0x510
: ? kthread_park+0x90/0x90
: ret_from_fork+0x22/0x40
: Modules linked in: uinput rfcomm fuse xt_CHECKSUM xt_MASQUERADE tun
bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT
ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4
xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security cmac nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter ip_tables bnep sunrpc vfat fat edac_mce_amd
arc4 kvm_amd rtwpci snd_hda_codec_realtek rtw88 kvm eeepc_wmi
snd_hda_codec_generic asus_wmi sparse_keymap ledtrig_audio
snd_hda_codec_hdmi video wmi_bmof mac80211 snd_hda_intel uvcvideo
snd_hda_codec videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
irqbypass snd_usb_audio videobuf2_common snd_hda_core videodev
snd_usbmidi_lib snd_seq snd_hwdep snd_rawmidi snd_seq_device btusb
snd_pcm crct10dif_pclmul btrtl crc32_pclmul btbcm btintel bluetooth
: cfg80211 snd_timer ghash_clmulni_intel joydev snd k10temp soundcore
media sp5100_tco ccp i2c_piix4 ecdh_generic rfkill ecc gpio_amdpt
pcc_cpufreq gpio_generic acpi_cpufreq binfmt_misc hid_logitech_hidpp
hid_logitech_dj uas usb_storage hid_sony ff_memless amdgpu
amd_iommu_v2 gpu_sched ttm drm_kms_helper igb nvme dca drm
crc32c_intel i2c_algo_bit nvme_core wmi pinctrl_amd
: ---[ end trace 3840e49b1d8d2c24 ]---

$ /usr/src/kernels/`uname -r`/scripts/faddr2line
/lib/debug/lib/modules/`uname -r`/vmlinux
__delete_from_swap_cache+0x20d
__delete_from_swap_cache+0x20d/0x240:
__delete_from_swap_cache at mm/swap_state.c:170 (discriminator 1)

--
Best Regards,
Mike Gavrilov.

2019-05-29 18:11:19

by Michal Hocko

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Wed 29-05-19 22:32:08, Mikhail Gavrilov wrote:
> On Wed, 29 May 2019 at 09:05, Mikhail Gavrilov
> <[email protected]> wrote:
> >
> > Hi folks.
> > I am observed kernel panic after update to git tag 5.2-rc2.
> > This crash happens at memory pressing when swap being used.
> >
> > Unfortunately in journalctl saved only this:
> >
>
> Now I captured better trace.
>
> : page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689
> index:0xfecec363
> : anon
> : flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
> : raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689
> : raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000
> : page dumped because: VM_BUG_ON_PAGE(entry != page)
> : page->mem_cgroup:ffff978433ace000
> : ------------[ cut here ]------------
> : kernel BUG at mm/swap_state.c:170!

Do you see the same with 5.2-rc1 resp. 5.1?
--
Michal Hocko
SUSE Labs

2019-05-29 19:29:54

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Wed, 29 May 2019 at 23:09, Michal Hocko <[email protected]> wrote:
>
> On Wed 29-05-19 22:32:08, Mikhail Gavrilov wrote:
> > On Wed, 29 May 2019 at 09:05, Mikhail Gavrilov
> > <[email protected]> wrote:
> > >
> > > Hi folks.
> > > I am observed kernel panic after update to git tag 5.2-rc2.
> > > This crash happens at memory pressing when swap being used.
> > >
> > > Unfortunately in journalctl saved only this:
> > >
> >
> > Now I captured better trace.
> >
> > : page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689
> > index:0xfecec363
> > : anon
> > : flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
> > : raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689
> > : raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000
> > : page dumped because: VM_BUG_ON_PAGE(entry != page)
> > : page->mem_cgroup:ffff978433ace000
> > : ------------[ cut here ]------------
> > : kernel BUG at mm/swap_state.c:170!
>
> Do you see the same with 5.2-rc1 resp. 5.1?

On 5.2-rc1 I has another another issue
https://www.spinics.net/lists/linux-ext4/msg65661.html

--
Best Regards,
Mike Gavrilov.

2019-06-05 20:10:53

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Wed, 29 May 2019 at 23:09, Michal Hocko <[email protected]> wrote:
>
> Do you see the same with 5.2-rc1 resp. 5.1?

The problem still occurs at 5.2-rc3.
Unfortunately hard reproducible does not allow to make bisect.
Any ideas what is wrong?

--
Best Regards,
Mike Gavrilov.

2019-06-11 04:00:08

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Wed, 29 May 2019 at 23:09, Michal Hocko <[email protected]> wrote:
>
>
> Do you see the same with 5.2-rc1 resp. 5.1?

I can say with 100% certainty that kernel tag 5.1 is not affected by this bug.

Say anything about 5.2 rc1 is very difficult because occurs another
problem due to which all file systems are switched to read only mode.

And I am sure that since 5.2 rc2 this issue is begin occurring.

I also able recorded much more kernel logs with netconsole and option
memblock=debug. (attached as file here)

Please help me.

Attachments:

dmesg.txt (177.71 kB)

2019-06-16 10:13:16

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

Hi,
I finished today bisecting kernel.
And first bad commit for me was cd736d8b67fb22a85a68c1ee8020eb0d660615ec

Can you look into this?

$ git bisect log
git bisect start
# good: [a188339ca5a396acc588e5851ed7e19f66b0ebd9] Linux 5.2-rc1
git bisect good a188339ca5a396acc588e5851ed7e19f66b0ebd9
# good: [a188339ca5a396acc588e5851ed7e19f66b0ebd9] Linux 5.2-rc1
git bisect good a188339ca5a396acc588e5851ed7e19f66b0ebd9
# bad: [cd6c84d8f0cdc911df435bb075ba22ce3c605b07] Linux 5.2-rc2
git bisect bad cd6c84d8f0cdc911df435bb075ba22ce3c605b07
# bad: [060358de993f24562e884e265c4c57864a3a4141] treewide: Replace
GPLv2 boilerplate/reference with SPDX - rule 125
git bisect bad 060358de993f24562e884e265c4c57864a3a4141
# bad: [d53e860fd46f3d95c437bb67518f7374500de467] Merge branch 'linus'
of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad d53e860fd46f3d95c437bb67518f7374500de467
# bad: [34dcf6a1902ac214149a2742250ff03aa5346f3e] net: caif: fix the
value of size argument of snprintf
git bisect bad 34dcf6a1902ac214149a2742250ff03aa5346f3e
# bad: [c7d5ec26ea4adf450d9ab2b794e7735761a93af1] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
git bisect bad c7d5ec26ea4adf450d9ab2b794e7735761a93af1
# good: [3d21b6525caeae45f08e2d3a07ddfdef64882b8b] selftests/bpf: add
prog detach to flow_dissector test
git bisect good 3d21b6525caeae45f08e2d3a07ddfdef64882b8b
# bad: [3ebe1bca58c85325c97a22d4fc3f5b5420752e6f] ppp: deflate: Fix
possible crash in deflate_init
git bisect bad 3ebe1bca58c85325c97a22d4fc3f5b5420752e6f
# bad: [d0a7e8cb3c9d7d4fa2bcdd557be19f0841e2a3be] NFC: Orphan the subsystem
git bisect bad d0a7e8cb3c9d7d4fa2bcdd557be19f0841e2a3be
# bad: [0fe9f173d6cda95874edeb413b1fa9907b5ae830] net: Always descend into dsa/
git bisect bad 0fe9f173d6cda95874edeb413b1fa9907b5ae830
# bad: [cd736d8b67fb22a85a68c1ee8020eb0d660615ec] tcp: fix retrans
timestamp on passive Fast Open
git bisect bad cd736d8b67fb22a85a68c1ee8020eb0d660615ec
# first bad commit: [cd736d8b67fb22a85a68c1ee8020eb0d660615ec] tcp:
fix retrans timestamp on passive Fast Open

--
Best Regards,
Mike Gavrilov.

On Tue, 11 Jun 2019 at 08:59, Mikhail Gavrilov
<[email protected]> wrote:
>
> On Wed, 29 May 2019 at 23:09, Michal Hocko <[email protected]> wrote:
> >
> >
> > Do you see the same with 5.2-rc1 resp. 5.1?
>
> I can say with 100% certainty that kernel tag 5.1 is not affected by this bug.
>
> Say anything about 5.2 rc1 is very difficult because occurs another
> problem due to which all file systems are switched to read only mode.
>
> And I am sure that since 5.2 rc2 this issue is begin occurring.
>
> I also able recorded much more kernel logs with netconsole and option
> memblock=debug. (attached as file here)
>
> Please help me.

2019-06-17 12:18:28

by Vlastimil Babka

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On 6/16/19 12:12 PM, Mikhail Gavrilov wrote:
> Hi,
> I finished today bisecting kernel.
> And first bad commit for me was cd736d8b67fb22a85a68c1ee8020eb0d660615ec

That's commit "tcp: fix retrans timestamp on passive Fast Open" which is
almost certainly not the culprit.

> Can you look into this?
>
>
> $ git bisect log
> git bisect start
> # good: [a188339ca5a396acc588e5851ed7e19f66b0ebd9] Linux 5.2-rc1
> git bisect good a188339ca5a396acc588e5851ed7e19f66b0ebd9
> # good: [a188339ca5a396acc588e5851ed7e19f66b0ebd9] Linux 5.2-rc1
> git bisect good a188339ca5a396acc588e5851ed7e19f66b0ebd9

You told bisect that 5.2-rc1 is good, but it probably isn't.
What you probably need to do is:
git bisect good v5.1
git bisect bad v5.2-rc2

The presence of the other ext4 bug complicates the bisect, however.
According to tytso in the thread you linked, it should be fixed by
commit 0a944e8a6c66, while the bug was introduced by commit
345c0dbf3a30. So in each step of bisect, before building the kernel, you
should cherry-pick the fix if the bug is there:

git merge-base --is-ancestor 345c0dbf3a30 HEAD && git cherry-pick 0a944e8a6c66

Also in case you see a completely different problem in some bisect step, try
'git bisect skip' instead of guessing if it's good or bad.
Hopefully that will lead to a better result.

2019-06-17 12:47:47

by Vlastimil Babka

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On 5/29/19 7:32 PM, Mikhail Gavrilov wrote:
> On Wed, 29 May 2019 at 09:05, Mikhail Gavrilov
> <[email protected]> wrote:
>>
>> Hi folks.
>> I am observed kernel panic after update to git tag 5.2-rc2.
>> This crash happens at memory pressing when swap being used.
>>
>> Unfortunately in journalctl saved only this:
>>
>
> Now I captured better trace.

The VM_BUG_ON_PAGE has been touched in 5.2-rc1 by commit
5fd4ca2d84b2 ("mm: page cache: store only head pages in i_pages")
CCing relevant people and keeping rest of mail for reference.

> : page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689
> index:0xfecec363
> : anon
> : flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
> : raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689
> : raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000
> : page dumped because: VM_BUG_ON_PAGE(entry != page)
> : page->mem_cgroup:ffff978433ace000
> : ------------[ cut here ]------------
> : kernel BUG at mm/swap_state.c:170!
> : invalid opcode: 0000 [#1] SMP NOPTI
> : CPU: 1 PID: 221 Comm: kswapd0 Not tainted 5.2.0-0.rc2.git0.1.fc31.x86_64 #1
> : Hardware name: System manufacturer System Product Name/ROG STRIX
> X470-I GAMING, BIOS 2202 04/11/2019
> : RIP: 0010:__delete_from_swap_cache+0x20d/0x240
> : Code: 30 65 48 33 04 25 28 00 00 00 75 4a 48 83 c4 38 5b 5d 41 5c 41
> 5d 41 5e 41 5f c3 48 c7 c6 2f dc 0f 8a 48 89 c7 e8 93 1b fd ff <0f> 0b
> 48 c7 c6 a8 74 0f 8a e8 85 1b fd ff 0f 0b 48 c7 c6 a8 7d 0f
> : RSP: 0018:ffffa982036e7980 EFLAGS: 00010046
> : RAX: 0000000000000021 RBX: 0000000000000040 RCX: 0000000000000006
> : RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff97843d657900
> : RBP: 0000000000000001 R08: ffffa982036e7835 R09: 0000000000000535
> : R10: ffff97845e21a46c R11: ffffa982036e7835 R12: ffff978426387120
> : R13: 0000000000000000 R14: ffffd6d34dff0040 R15: ffffd6d34dff0000
> : FS: 0000000000000000(0000) GS:ffff97843d640000(0000) knlGS:0000000000000000
> : CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> : CR2: 00002cba88ef5000 CR3: 000000078a97c000 CR4: 00000000003406e0
> : Call Trace:
> : delete_from_swap_cache+0x46/0xa0
> : try_to_free_swap+0xbc/0x110
> : swap_writepage+0x13/0x70
> : pageout.isra.0+0x13c/0x350
> : shrink_page_list+0xc14/0xdf0
> : shrink_inactive_list+0x1e5/0x3c0
> : shrink_node_memcg+0x202/0x760
> : ? do_shrink_slab+0x52/0x2c0
> : shrink_node+0xe0/0x470
> : balance_pgdat+0x2d1/0x510
> : kswapd+0x220/0x420
> : ? finish_wait+0x80/0x80
> : kthread+0xfb/0x130
> : ? balance_pgdat+0x510/0x510
> : ? kthread_park+0x90/0x90
> : ret_from_fork+0x22/0x40
> : Modules linked in: uinput rfcomm fuse xt_CHECKSUM xt_MASQUERADE tun
> bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT
> ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4
> xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
> ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
> iptable_security cmac nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter
> ip6_tables iptable_filter ip_tables bnep sunrpc vfat fat edac_mce_amd
> arc4 kvm_amd rtwpci snd_hda_codec_realtek rtw88 kvm eeepc_wmi
> snd_hda_codec_generic asus_wmi sparse_keymap ledtrig_audio
> snd_hda_codec_hdmi video wmi_bmof mac80211 snd_hda_intel uvcvideo
> snd_hda_codec videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
> irqbypass snd_usb_audio videobuf2_common snd_hda_core videodev
> snd_usbmidi_lib snd_seq snd_hwdep snd_rawmidi snd_seq_device btusb
> snd_pcm crct10dif_pclmul btrtl crc32_pclmul btbcm btintel bluetooth
> : cfg80211 snd_timer ghash_clmulni_intel joydev snd k10temp soundcore
> media sp5100_tco ccp i2c_piix4 ecdh_generic rfkill ecc gpio_amdpt
> pcc_cpufreq gpio_generic acpi_cpufreq binfmt_misc hid_logitech_hidpp
> hid_logitech_dj uas usb_storage hid_sony ff_memless amdgpu
> amd_iommu_v2 gpu_sched ttm drm_kms_helper igb nvme dca drm
> crc32c_intel i2c_algo_bit nvme_core wmi pinctrl_amd
> : ---[ end trace 3840e49b1d8d2c24 ]---
>
>
> $ /usr/src/kernels/`uname -r`/scripts/faddr2line
> /lib/debug/lib/modules/`uname -r`/vmlinux
> __delete_from_swap_cache+0x20d
> __delete_from_swap_cache+0x20d/0x240:
> __delete_from_swap_cache at mm/swap_state.c:170 (discriminator 1)
>
>
>
>
> --
> Best Regards,
> Mike Gavrilov.
>

2019-06-17 20:11:00

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Mon, 17 Jun 2019 at 17:17, Vlastimil Babka <[email protected]> wrote:
>
> That's commit "tcp: fix retrans timestamp on passive Fast Open" which is
> almost certainly not the culprit.

Yes, I seen also content of this commit.
And it looks like madness.
But I can proving that my bisect are properly created.
Here I saved all dmesg output from all bisecting steps:
https://mega.nz/#F!00wFHACA!nmaLgkkbrlt46DteERjl7Q
And only one of them ended without crash message "kernel BUG at
mm/swap_state.c:170!"
This is step5 with commit 3d21b6525cae.

I tried to cause kernel panic several times when kenel compiled from
commit 3d21b6525cae would be launched and all my attempts was be
unsuccessful.

So I can say that commit 3d21b6525cae is enough stable for me and I
now sitting on it.

> You told bisect that 5.2-rc1 is good, but it probably isn't.
> What you probably need to do is:
> git bisect good v5.1
> git bisect bad v5.2-rc2
>
> The presence of the other ext4 bug complicates the bisect, however.
> According to tytso in the thread you linked, it should be fixed by
> commit 0a944e8a6c66, while the bug was introduced by commit
> 345c0dbf3a30. So in each step of bisect, before building the kernel, you
> should cherry-pick the fix if the bug is there:
>
> git merge-base --is-ancestor 345c0dbf3a30 HEAD && git cherry-pick 0a944e8a6c66

Oh, thanks for advise. But I am used another solution.
(I applied the patch every time when bisect move to new step)

> Also in case you see a completely different problem in some bisect step, try
> 'git bisect skip' instead of guessing if it's good or bad.
> Hopefully that will lead to a better result.

If you take a look all my dmesg logs you can sure that all bad steps
ended with crash "kernel BUG at mm/swap_state.c:170!".

And yes, I look again at commit cd736d8b67fb still don't understand
how it can broke my system.

--
Best Regards,
Mike Gavrilov.

2019-06-30 21:45:50

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Mon, 17 Jun 2019 at 17:17, Vlastimil Babka <[email protected]> wrote:
>
>
> You told bisect that 5.2-rc1 is good, but it probably isn't.
> What you probably need to do is:
> git bisect good v5.1
> git bisect bad v5.2-rc2
>

$ git bisect log
git bisect start
# good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
# bad: [cd6c84d8f0cdc911df435bb075ba22ce3c605b07] Linux 5.2-rc2
git bisect bad cd6c84d8f0cdc911df435bb075ba22ce3c605b07
# good: [f4d9a23d3dad0252f375901bf4ff6523a2c97241] sparc64: simplify
reduce_memory() function
git bisect good f4d9a23d3dad0252f375901bf4ff6523a2c97241
# good: [4dbf09fea60d158e60a30c419e0cfa1ea138dd57] Merge tag
'mtd/for-5.2' of
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mtd/linux
git bisect good 4dbf09fea60d158e60a30c419e0cfa1ea138dd57
# bad: [22c58fd70ca48a29505922b1563826593b08cc00] Merge tag
'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 22c58fd70ca48a29505922b1563826593b08cc00
# skip: [414147d99b928c574ed76e9374a5d2cb77866a29] Merge tag
'pci-v5.2-changes' of
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
git bisect skip 414147d99b928c574ed76e9374a5d2cb77866a29
# bad: [841964e86acf48317db0469daf821e44554d8b0c] autofs: update mount
control expire desription with AUTOFS_EXP_FORCED
git bisect bad 841964e86acf48317db0469daf821e44554d8b0c
# good: [7e9890a3500d95c01511a4c45b7e7192dfa47ae2] Merge tag
'ovl-update-5.2' of
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
git bisect good 7e9890a3500d95c01511a4c45b7e7192dfa47ae2
# good: [c7a1c2bbb65e25551d585fba0fd36a01e0a22690] Merge branch 'pci/trivial'
git bisect good c7a1c2bbb65e25551d585fba0fd36a01e0a22690
# good: [9dc2108d667da44c7b147b185b64e31c0a60f583] ocfs2: use common
file type conversion
git bisect good 9dc2108d667da44c7b147b185b64e31c0a60f583
# bad: [ef4d6f6b275c498f8e5626c99dbeefdc5027f843]
include/linux/bitops.h: sanitize rotate primitives
git bisect bad ef4d6f6b275c498f8e5626c99dbeefdc5027f843
# bad: [19343b5bdd16ad4ae6b845ef829f68b683c4dfb5] mm/page-writeback:
introduce tracepoint for wait_on_page_writeback()
git bisect bad 19343b5bdd16ad4ae6b845ef829f68b683c4dfb5
# bad: [2d0adf7e0d7ac1e18da874c5b19ef30a0db59658] mm/hugetlb: get rid
of NODEMASK_ALLOC
git bisect bad 2d0adf7e0d7ac1e18da874c5b19ef30a0db59658
# skip: [2b59e01a3aa665f751d1410b99fae9336bd424e1] mm/cma.c: fix the
bitmap status to show failed allocation reason
git bisect skip 2b59e01a3aa665f751d1410b99fae9336bd424e1
# good: [916ac0527837aa0be46d82804f93dd46f03aaedc] slub: use slab_list
instead of lru
git bisect good 916ac0527837aa0be46d82804f93dd46f03aaedc
# skip: [5e65af19e89ac33dc83e1869c78b33ed7099469b]
mm/page_isolation.c: remove redundant pfn_valid_within() in
__first_valid_page()
git bisect skip 5e65af19e89ac33dc83e1869c78b33ed7099469b
# skip: [2b487e59f00aaa885ebf9c47d44d09f3ef4df80e] mm: memcontrol:
push down mem_cgroup_node_nr_lru_pages()
git bisect skip 2b487e59f00aaa885ebf9c47d44d09f3ef4df80e
# bad: [a861bbce27634160ae0330126b4ef001d6941c8f] sh: advertise
gigantic page support
git bisect bad a861bbce27634160ae0330126b4ef001d6941c8f
# skip: [d3ba3ae19751e476b0840a0c9a673a5766fa3219]
mm/memory_hotplug.c: fix the wrong usage of N_HIGH_MEMORY
git bisect skip d3ba3ae19751e476b0840a0c9a673a5766fa3219
# bad: [54c7a8916a887f357088f99e9c3a7720cd57d2c8] initramfs: free
initrd memory if opening /initrd.image fails
git bisect bad 54c7a8916a887f357088f99e9c3a7720cd57d2c8
# skip: [7af75561e17132b20b5bc047d222f34b3e7a3e6e] mm/gup: add
FOLL_LONGTERM capability to GUP fast
git bisect skip 7af75561e17132b20b5bc047d222f34b3e7a3e6e
# skip: [9851ac13592df77958ae7bac6ba39e71420c38ec] mm: move
nr_deactivate accounting to shrink_active_list()
git bisect skip 9851ac13592df77958ae7bac6ba39e71420c38ec
# good: [16cb0ec75b346ec4fce11c5ce40d68b173f4e2f4] slab: use slab_list
instead of lru
git bisect good 16cb0ec75b346ec4fce11c5ce40d68b173f4e2f4
# skip: [9fdf4aa156733e3f075a9d7d0b026648b3874afe] IB/hfi1: use the
new FOLL_LONGTERM flag to get_user_pages_fast()
git bisect skip 9fdf4aa156733e3f075a9d7d0b026648b3874afe
# bad: [113b7dfd827175977ea71cc4a29c1ac24acb9fce] mm: memcontrol:
quarantine the mem_cgroup_[node_]nr_lru_pages() API
git bisect bad 113b7dfd827175977ea71cc4a29c1ac24acb9fce
# good: [517f9f1ee5ed0a05d0f6f884f6d9b5c46ac5a810] mm/slab.c: remove
unneed check in cpuup_canceled
git bisect good 517f9f1ee5ed0a05d0f6f884f6d9b5c46ac5a810
# bad: [664b21e717cfe4781137263f2555da335549210e] IB/qib: use the new
FOLL_LONGTERM flag to get_user_pages_fast()
git bisect bad 664b21e717cfe4781137263f2555da335549210e
# skip: [886cf1901db962cee5f8b82b9b260079a5e8a4eb] mm: move
recent_rotated pages calculation to shrink_inactive_list()
git bisect skip 886cf1901db962cee5f8b82b9b260079a5e8a4eb
# skip: [b798bec4741bdd80224214fdd004c8e52698e425] mm/gup: change
write parameter to flags in fast walk
git bisect skip b798bec4741bdd80224214fdd004c8e52698e425
# skip: [63931eb97508cd67515dbcc049defaebd7b1fcd0] mm, page_alloc:
disallow __GFP_COMP in alloc_pages_exact()
git bisect skip 63931eb97508cd67515dbcc049defaebd7b1fcd0
# good: [f0fd50504a54f5548eb666dc16ddf8394e44e4b7] mm/cma_debug.c: fix
the break condition in cma_maxchunk_get()
git bisect good f0fd50504a54f5548eb666dc16ddf8394e44e4b7
# skip: [a222f341586834073c2bbea225be38216eb5d993] mm: generalize
putback scan functions
git bisect skip a222f341586834073c2bbea225be38216eb5d993
# bad: [f372d89e5dbbf2bc8e37089bacd526afd4e1d6c2] mm: remove
pages_to_free argument of move_active_pages_to_lru()
git bisect bad f372d89e5dbbf2bc8e37089bacd526afd4e1d6c2
# bad: [5fd4ca2d84b249f0858ce28cf637cf25b61a398f] mm: page cache:
store only head pages in i_pages
git bisect bad 5fd4ca2d84b249f0858ce28cf637cf25b61a398f
# good: [cefdca0a86be517bc390fc4541e3674b8e7803b0] userfaultfd/sysctl:
add vm.unprivileged_userfaultfd
git bisect good cefdca0a86be517bc390fc4541e3674b8e7803b0
# first bad commit: [5fd4ca2d84b249f0858ce28cf637cf25b61a398f] mm:
page cache: store only head pages in i_pages

All kernel logs uploaded here:
https://mega.nz/#F!00wFHACA!nmaLgkkbrlt46DteERjl7Q

> The VM_BUG_ON_PAGE has been touched in 5.2-rc1 by commit
> 5fd4ca2d84b2 ("mm: page cache: store only head pages in i_pages")
> CCing relevant people and keeping rest of mail for reference.

Yes, this is exactly right commit.
But when I did bisect I am found yet another problem.
When hit memory limit and starting paging the virtual machine in KVM is hanged.
If after it I reboot computer then the system will completely hang.
All this commits I marked as skip in my bisecting.
And the backtrace in kernel log looks like as below:

[ 5729.146324] sysrq: Show Blocked State
[ 5729.151171] task PC stack pid father
[ 5729.156081] khugepaged D 0 98 2 0x80004000
[ 5729.160941] Call Trace:
[ 5729.165803] ? __schedule+0x29f/0x680
[ 5729.170669] schedule+0x33/0x90
[ 5729.175543] __rwsem_down_write_failed_common+0x1a6/0x470
[ 5729.180424] down_write+0x39/0x40
[ 5729.185301] khugepaged+0xe46/0x2380
[ 5729.190179] ? syscall_return_via_sysret+0x10/0x7f
[ 5729.195069] ? __switch_to_asm+0x34/0x70
[ 5729.199951] ? __switch_to_asm+0x40/0x70
[ 5729.204825] ? __switch_to_asm+0x34/0x70
[ 5729.209692] ? __switch_to_asm+0x40/0x70
[ 5729.214553] ? __switch_to_asm+0x34/0x70
[ 5729.219407] ? finish_wait+0x80/0x80
[ 5729.224261] kthread+0xfb/0x130
[ 5729.229115] ? collapse_shmem+0xe00/0xe00
[ 5729.233968] ? kthread_park+0x90/0x90
[ 5729.238823] ret_from_fork+0x22/0x40
[ 5729.243784] umount D 0 30362 1 0x00004004
[ 5729.248649] Call Trace:
[ 5729.253499] ? __schedule+0x29f/0x680
[ 5729.258354] ? account_entity_enqueue+0xc5/0xf0
[ 5729.263217] schedule+0x33/0x90
[ 5729.268072] schedule_timeout+0x209/0x300
[ 5729.272930] wait_for_completion+0x10d/0x160
[ 5729.277796] ? wake_up_q+0x60/0x60
[ 5729.282654] __flush_work+0x11f/0x1d0
[ 5729.287512] ? worker_attach_to_pool+0x90/0x90
[ 5729.292378] lru_add_drain_all+0x106/0x140
[ 5729.297240] invalidate_bdev+0x3c/0x50
[ 5729.302100] ext4_put_super+0x1e6/0x380
[ 5729.306961] generic_shutdown_super+0x6c/0x100
[ 5729.311809] kill_block_super+0x21/0x50
[ 5729.316648] deactivate_locked_super+0x36/0x70
[ 5729.321491] cleanup_mnt+0x3b/0x80
[ 5729.326325] task_work_run+0x87/0xa0
[ 5729.331156] exit_to_usermode_loop+0xc2/0xd0
[ 5729.335993] do_syscall_64+0x14f/0x170
[ 5729.340815] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 5729.345646] RIP: 0033:0x7f996dde665b
[ 5729.350477] Code: Bad RIP value.
[ 5729.355304] RSP: 002b:00007ffe57829d78 EFLAGS: 00000246 ORIG_RAX:
00000000000000a6
[ 5729.360166] RAX: 0000000000000000 RBX: 00007f996df0f1e4 RCX: 00007f996dde665b
[ 5729.365032] RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 00005618b787c380
[ 5729.369903] RBP: 00005618b787b460 R08: 0000000000000000 R09: 00007ffe57828af0
[ 5729.374781] R10: 00007f996de6bc40 R11: 0000000000000246 R12: 00005618b787c380
[ 5729.379657] R13: 0000000000000000 R14: 00005618b787b558 R15: 00005618b787b670
[ 5729.384535] swapoff D 0 30364 1 0x00004004
[ 5729.389419] Call Trace:
[ 5729.394285] ? __schedule+0x29f/0x680
[ 5729.399136] schedule+0x33/0x90
[ 5729.403982] rwsem_down_read_failed+0x12a/0x1c0
[ 5729.408835] try_to_unuse+0x116/0x6c0
[ 5729.413684] __do_sys_swapoff+0x1d0/0x660
[ 5729.418535] do_syscall_64+0x5b/0x170
[ 5729.423380] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 5729.428222] RIP: 0033:0x7f390519464b
[ 5729.433158] Code: Bad RIP value.
[ 5729.437869] RSP: 002b:00007ffd7fa96f08 EFLAGS: 00000206 ORIG_RAX:
00000000000000a8
[ 5729.442604] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f390519464b
[ 5729.447336] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000562ba52ef4a0
[ 5729.452069] RBP: 00007ffd7fa97f23 R08: 0000562ba52f04f0 R09: 0000000000000060
[ 5729.456745] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
[ 5729.461356] R13: 0000562ba52ef4a0 R14: 0000000000000000 R15: 0000000000000000
[ 7016.819033] sysrq: Kill All Tasks
[ 7016.825376] kauditd_printk_skb: 1 callbacks suppressed
[ 7016.825377] audit: type=1131 audit(1561806917.764:517): pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-journald comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=failed'
[ 7016.835240] systemd[1]: systemd-udevd.service: Main process exited,
code=killed, status=9/KILL
[ 7016.839643] systemd[1]: systemd-udevd.service: Failed with result 'signal'.
[ 7016.842886] audit: type=1131 audit(1561806917.781:518): pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=failed'
[ 7016.842925] systemd[1]: systemd-journald.service: Service has no
hold-off time (RestartSec=0), scheduling restart.
[ 7016.848111] systemd[1]: systemd-journald.service: Scheduled restart
job, restart counter is at 4.
[ 7016.850579] systemd[1]: systemd-udevd.service: Service has no
hold-off time (RestartSec=0), scheduling restart.
[ 7016.853030] systemd[1]: systemd-udevd.service: Scheduled restart
job, restart counter is at 3.
[ 7016.855447] systemd[1]: Stopped udev Kernel Device Manager.
[ 7016.858073] audit: type=1130 audit(1561806917.797:519): pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
[ 7016.862154] audit: type=1131 audit(1561806917.797:520): pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
[ 7016.867367] systemd[1]: Starting udev Kernel Device Manager...
[ 7016.872311] systemd[1]: systemd-journal-flush.service: Succeeded.
[ 7016.874695] systemd[1]: Stopped Flush Journal to Persistent Storage.
[ 7016.879546] audit: type=1131 audit(1561806917.818:521): pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-journal-flush comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
[ 7016.888393] audit: type=1130 audit(1561806917.827:522): pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-journald comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
[ 7016.893011] audit: type=1131 audit(1561806917.827:523): pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-journald comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
[ 7017.088027] audit: type=1130 audit(1561806918.027:524): pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
[ 7017.105306] audit: type=1305 audit(1561806918.044:525): op=set
audit_enabled=1 old=1 auid=4294967295 ses=4294967295
subj=system_u:system_r:syslogd_t:s0 res=1
[ 7017.113711] audit: type=1130 audit(1561806918.052:526): pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='unit=systemd-journald comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
[ 7026.699959] sysrq: Resetting
[ 7026.704037] ACPI MEMORY or I/O RESET_REG.

The third problem I noticed a long time ago. Symptomes: applications
can use only half of the swap partition. I have a swap partition size
RAM * 2. The applications start crashing when the swap partition is
used more than half. And this is not OOM.
For example for testing each commit I completely filled RAM, and
after sure that system is not hanged I launched compiling next bisect
iteration. And here happens crashes of gcc if swap is used over 32GB.
(Full swap size is 64GB)

>
> The presence of the other ext4 bug complicates the bisect, however.
> According to tytso in the thread you linked, it should be fixed by
> commit 0a944e8a6c66, while the bug was introduced by commit
> 345c0dbf3a30. So in each step of bisect, before building the kernel, you
> should cherry-pick the fix if the bug is there:
>
> git merge-base --is-ancestor 345c0dbf3a30 HEAD && git cherry-pick 0a944e8a6c66
>

Commands above lead to cyclic bisecting on one commit. So I used `git
stash` and `git stash pop` for applying ext4 patch.

--
Best Regards,
Mike Gavrilov.

2019-07-05 15:22:14

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

Hey folks.
Excuse me, is anybody read my previous message?
5.2-rc7 is still affected by this issue [the logs in file
dmesg-5.2rc7-0.1.tar.xz] and I worry that stable 5.2 would be released
with this bug because there is almost no time left and I didn't see
the attention to this problem.
I confirm that reverting commit 5fd4ca2d84b2 on top of the rc7 tag is
help fix it [the logs in file dmesg-5.2rc7-0.2.tar.xz].
I am still awaiting any feedback here.
Thanks.

--
Best Regards,
Mike Gavrilov.

Attachments:

dmesg-5.2rc7-0.1.tar.xz (176.28 kB)
dmesg-5.2rc7-0.2.tar.xz (14.11 kB)
Download all attachments

2019-07-05 23:03:57

by Jan Kara

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Fri 05-07-19 20:19:48, Mikhail Gavrilov wrote:
> Hey folks.
> Excuse me, is anybody read my previous message?
> 5.2-rc7 is still affected by this issue [the logs in file
> dmesg-5.2rc7-0.1.tar.xz] and I worry that stable 5.2 would be released
> with this bug because there is almost no time left and I didn't see
> the attention to this problem.
> I confirm that reverting commit 5fd4ca2d84b2 on top of the rc7 tag is
> help fix it [the logs in file dmesg-5.2rc7-0.2.tar.xz].
> I am still awaiting any feedback here.

Yeah, I guess revert of 5fd4ca2d84b2 at this point is probably the best we
can do. Let's CC Linus, Andrew, and Greg (Linus is travelling AFAIK so I'm
not sure whether Greg won't do release for him).

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2019-07-06 02:42:26

by Linus Torvalds

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Fri, Jul 5, 2019 at 4:03 PM Jan Kara <[email protected]> wrote:
>
> Yeah, I guess revert of 5fd4ca2d84b2 at this point is probably the best we
> can do. Let's CC Linus, Andrew, and Greg (Linus is travelling AFAIK so I'm
> not sure whether Greg won't do release for him).

I'm back home now, although possibly jetlagged.

The revert looks trivial (a conflict due to find_get_entries_tag()
having been removed in the meantime), and I guess that's the right
thing to do right now.

Matthew, comments?

Linus

2019-07-22 01:47:58

by huang ying

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

Hi, Mikhail,

On Wed, May 29, 2019 at 12:05 PM Mikhail Gavrilov
<[email protected]> wrote:
>
> Hi folks.
> I am observed kernel panic after update to git tag 5.2-rc2.
> This crash happens at memory pressing when swap being used.
>
> Unfortunately in journalctl saved only this:
>
> May 29 08:02:02 localhost.localdomain kernel: page:ffffe90958230000
> refcount:1 mapcount:1 mapping:ffff8f3ffeb36949 index:0x625002ab2
> May 29 08:02:02 localhost.localdomain kernel: anon
> May 29 08:02:02 localhost.localdomain kernel: flags:
> 0x17fffe00080034(uptodate|lru|active|swapbacked)
> May 29 08:02:02 localhost.localdomain kernel: raw: 0017fffe00080034
> ffffe90944640888 ffffe90956e208c8 ffff8f3ffeb36949
> May 29 08:02:02 localhost.localdomain kernel: raw: 0000000625002ab2
> 0000000000000000 0000000100000000 ffff8f41aeeff000
> May 29 08:02:02 localhost.localdomain kernel: page dumped because:
> VM_BUG_ON_PAGE(entry != page)
> May 29 08:02:02 localhost.localdomain kernel: page->mem_cgroup:ffff8f41aeeff000
> May 29 08:02:02 localhost.localdomain kernel: ------------[ cut here
> ]------------
> May 29 08:02:02 localhost.localdomain kernel: kernel BUG at mm/swap_state.c:170!

I am trying to reproduce this bug. Can you give me some information
about your test case?

Best Regards,
Huang, Ying

2019-07-22 07:46:01

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Mon, 22 Jul 2019 at 06:37, huang ying <[email protected]> wrote:
>
> I am trying to reproduce this bug. Can you give me some information
> about your test case?

It not easy, but I try to explain:

1. I have the system with 32Gb RAM, 64GB swap and after boot, I always
launch follow applications:
a. Google Chrome dev channel
Note: here you should have 3 windows full of tabs on my
monitor 118 tabs in each window.
Don't worry modern Chrome browser is wise and load tabs only on demand.
We will use this feature later (on the last step).
b. Firefox Nightly ASAN this build with enabled address sanitizer.
c. Virtual Machine Manager (virt-manager) and start a virtual
machine with Windows 10 (2048 MiB RAM allocated)
d. Evolution
e. Steam client
f. Telegram client
g. DeadBeef music player

After all launched applications 15GB RAM should be allocated.

2. This step the most difficult, because we should by using Firefox
allocated 27-28GB RAM.
I use the infinite scroll on sites Facebook, VK, Pinterest, Tumblr
and open many tabs in Firefox as I could.
Note: our goal is 27-28GB allocated RAM in the system.

3. When we hit our goal in the second step now go to Google Chrome and
click as fast as you can on all unloaded tabs.
As usual, after 60 tabs this issue usually happens. 100%
reproducible for me.

Of course, I tried to simplify my workflow case by using stress-ng but
without success.

I hope it will help to make autotests.

--
Best Regards,
Mike Gavrilov.

2019-07-22 07:53:45

by Huang, Ying

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

Mikhail Gavrilov <[email protected]> writes:

> On Mon, 22 Jul 2019 at 06:37, huang ying <[email protected]> wrote:
>>
>> I am trying to reproduce this bug. Can you give me some information
>> about your test case?
>
> It not easy, but I try to explain:
>
> 1. I have the system with 32Gb RAM, 64GB swap and after boot, I always
> launch follow applications:
> a. Google Chrome dev channel
> Note: here you should have 3 windows full of tabs on my
> monitor 118 tabs in each window.
> Don't worry modern Chrome browser is wise and load tabs only on demand.
> We will use this feature later (on the last step).
> b. Firefox Nightly ASAN this build with enabled address sanitizer.
> c. Virtual Machine Manager (virt-manager) and start a virtual
> machine with Windows 10 (2048 MiB RAM allocated)
> d. Evolution
> e. Steam client
> f. Telegram client
> g. DeadBeef music player
>
> After all launched applications 15GB RAM should be allocated.
>
> 2. This step the most difficult, because we should by using Firefox
> allocated 27-28GB RAM.
> I use the infinite scroll on sites Facebook, VK, Pinterest, Tumblr
> and open many tabs in Firefox as I could.
> Note: our goal is 27-28GB allocated RAM in the system.
>
> 3. When we hit our goal in the second step now go to Google Chrome and
> click as fast as you can on all unloaded tabs.
> As usual, after 60 tabs this issue usually happens. 100%
> reproducible for me.
>
> Of course, I tried to simplify my workflow case by using stress-ng but
> without success.
>
> I hope it will help to make autotests.

Yes. This is quite complex. Is the transparent huge page enabled in
your system? You can check the output of

$ cat /sys/kernel/mm/transparent_hugepage/enabled

And, whether is the swap device you use a SSD or NVMe disk (not HDD)?

Best Regards,
Huang, Ying

> --
> Best Regards,
> Mike Gavrilov.

2019-07-22 08:20:07

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Mon, 22 Jul 2019 at 12:53, Huang, Ying <[email protected]> wrote:
>
> Yes. This is quite complex. Is the transparent huge page enabled in
> your system? You can check the output of
>
> $ cat /sys/kernel/mm/transparent_hugepage/enabled

always [madvise] never

> And, whether is the swap device you use a SSD or NVMe disk (not HDD)?

NVMe INTEL Optane 905P SSDPE21D480GAM3

--
Best Regards,
Mike Gavrilov.

2019-07-23 13:32:43

by Huang, Ying

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

Mikhail Gavrilov <[email protected]> writes:

> On Mon, 22 Jul 2019 at 12:53, Huang, Ying <[email protected]> wrote:
>>
>> Yes. This is quite complex. Is the transparent huge page enabled in
>> your system? You can check the output of
>>
>> $ cat /sys/kernel/mm/transparent_hugepage/enabled
>
> always [madvise] never
>
>> And, whether is the swap device you use a SSD or NVMe disk (not HDD)?
>
> NVMe INTEL Optane 905P SSDPE21D480GAM3

Thanks! I have found another (easier way) to reproduce the panic.
Could you try the below patch on top of v5.2-rc2? It can fix the panic
for me.

Best Regards,
Huang, Ying

-----------------------------------8<----------------------------------
From 5e519c2de54b9fd4b32b7a59e47ce7f94beb8845 Mon Sep 17 00:00:00 2001
From: Huang Ying <[email protected]>
Date: Tue, 23 Jul 2019 08:49:57 +0800
Subject: [PATCH] dbg xa head

---
mm/huge_memory.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f8bce9a6b32..c6ca1c7157ed 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2482,6 +2482,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
struct page *head = compound_head(page);
pg_data_t *pgdat = page_pgdat(head);
struct lruvec *lruvec;
+ struct address_space *swap_cache = NULL;
+ unsigned long offset;
int i;

lruvec = mem_cgroup_page_lruvec(head, pgdat);
@@ -2489,6 +2491,14 @@ static void __split_huge_page(struct page *page, struct list_head *list,
/* complete memcg works before add pages to LRU */
mem_cgroup_split_huge_fixup(head);

+ if (PageAnon(head) && PageSwapCache(head)) {
+ swp_entry_t entry = { .val = page_private(head) };
+
+ offset = swp_offset(entry);
+ swap_cache = swap_address_space(entry);
+ xa_lock(&swap_cache->i_pages);
+ }
+
for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond i_size: drop them from page cache */
@@ -2501,6 +2511,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
} else if (!PageAnon(page)) {
__xa_store(&head->mapping->i_pages, head[i].index,
head + i, 0);
+ } else if (swap_cache) {
+ __xa_store(&swap_cache->i_pages, offset + i,
+ head + i, 0);
}
}

@@ -2508,9 +2521,10 @@ static void __split_huge_page(struct page *page, struct list_head *list,
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
/* Additional pin to swap cache */
- if (PageSwapCache(head))
+ if (PageSwapCache(head)) {
page_ref_add(head, 2);
- else
+ xa_unlock(&swap_cache->i_pages);
+ } else
page_ref_inc(head);
} else {
/* Additional pin to page cache */
--
2.20.1

2019-07-25 06:34:01

by Mikhail Gavrilov

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Tue, 23 Jul 2019 at 10:08, Huang, Ying <[email protected]> wrote:
>
> Thanks! I have found another (easier way) to reproduce the panic.
> Could you try the below patch on top of v5.2-rc2? It can fix the panic
> for me.
>

Thanks! Amazing work! The patch fixes the issue completely. The system
worked at a high load of 16 hours without failures.

But still seems to me that page cache is being too actively crowded
out with a lack of memory. Since, in addition to the top speed SSD on
which the swap is located, there is also the slow HDD in the system
that just starts to rustle continuously when swap being used. It would
seem better to push some of the RAM onto a fast SSD into the swap
partition than to leave the slow HDD without a cache.

https://imgur.com/a/e8TIkBa

But I am afraid it will be difficult to implement such an algorithm
that analyzes the waiting time for the file I/O and waiting for paging
(memory) and decides to leave parts in memory where the waiting time
is more higher it would be more efficient for systems with several
drives with access speeds can vary greatly. By waiting time I mean
waiting time reading/writing to storage multiplied on the count of
hits. Thus, we will not just keep in memory the most popular parts of
the memory/disk, but also those parts of which read/write where was
most costly.

--
Best Regards,
Mike Gavrilov.

2019-07-25 13:48:32

by Matthew Wilcox

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

On Tue, Jul 23, 2019 at 01:08:42PM +0800, Huang, Ying wrote:
> @@ -2489,6 +2491,14 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> /* complete memcg works before add pages to LRU */
> mem_cgroup_split_huge_fixup(head);
>
> + if (PageAnon(head) && PageSwapCache(head)) {
> + swp_entry_t entry = { .val = page_private(head) };
> +
> + offset = swp_offset(entry);
> + swap_cache = swap_address_space(entry);
> + xa_lock(&swap_cache->i_pages);
> + }
> +
> for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
> __split_huge_page_tail(head, i, lruvec, list);
> /* Some pages can be beyond i_size: drop them from page cache */
> @@ -2501,6 +2511,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> } else if (!PageAnon(page)) {
> __xa_store(&head->mapping->i_pages, head[i].index,
> head + i, 0);
> + } else if (swap_cache) {
> + __xa_store(&swap_cache->i_pages, offset + i,
> + head + i, 0);

I tried something along these lines (though I think I messed up the offset
calculation which is why it wasn't working for me). My other concern
was with the case where SWAPFILE_CLUSTER was less than HPAGE_PMD_NR.
Don't we need to drop the lock and look up a new swap_cache if offset >=
SWAPFILE_CLUSTER?

2019-07-25 16:17:25

by Huang, Ying

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

Mikhail Gavrilov <[email protected]> writes:

> On Tue, 23 Jul 2019 at 10:08, Huang, Ying <[email protected]> wrote:
>>
>> Thanks! I have found another (easier way) to reproduce the panic.
>> Could you try the below patch on top of v5.2-rc2? It can fix the panic
>> for me.
>>
>
> Thanks! Amazing work! The patch fixes the issue completely. The system
> worked at a high load of 16 hours without failures.

Thanks a lot for your help!

Hi, Matthew and Kirill,

I think we can fold this fix patch into your original patch and try
again.

> But still seems to me that page cache is being too actively crowded
> out with a lack of memory. Since, in addition to the top speed SSD on
> which the swap is located, there is also the slow HDD in the system
> that just starts to rustle continuously when swap being used. It would
> seem better to push some of the RAM onto a fast SSD into the swap
> partition than to leave the slow HDD without a cache.
>
> https://imgur.com/a/e8TIkBa
>
> But I am afraid it will be difficult to implement such an algorithm
> that analyzes the waiting time for the file I/O and waiting for paging
> (memory) and decides to leave parts in memory where the waiting time
> is more higher it would be more efficient for systems with several
> drives with access speeds can vary greatly. By waiting time I mean
> waiting time reading/writing to storage multiplied on the count of
> hits. Thus, we will not just keep in memory the most popular parts of
> the memory/disk, but also those parts of which read/write where was
> most costly.

Yes. This is a valid problem. I remember Johannes has a solution long
ago, but I don't know why he give up that. Some information can be
found in the following URL.

https://lwn.net/Articles/690079/

Best Regards,
Huang, Ying

> --
> Best Regards,
> Mike Gavrilov.

2019-07-26 03:21:58

by Huang, Ying

[permalink] [raw]

Subject: Re: kernel BUG at mm/swap_state.c:170!

Matthew Wilcox <[email protected]> writes:

> On Tue, Jul 23, 2019 at 01:08:42PM +0800, Huang, Ying wrote:
>> @@ -2489,6 +2491,14 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>> /* complete memcg works before add pages to LRU */
>> mem_cgroup_split_huge_fixup(head);
>>
>> + if (PageAnon(head) && PageSwapCache(head)) {
>> + swp_entry_t entry = { .val = page_private(head) };
>> +
>> + offset = swp_offset(entry);
>> + swap_cache = swap_address_space(entry);
>> + xa_lock(&swap_cache->i_pages);
>> + }
>> +
>> for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
>> __split_huge_page_tail(head, i, lruvec, list);
>> /* Some pages can be beyond i_size: drop them from page cache */
>> @@ -2501,6 +2511,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>> } else if (!PageAnon(page)) {
>> __xa_store(&head->mapping->i_pages, head[i].index,
>> head + i, 0);
>> + } else if (swap_cache) {
>> + __xa_store(&swap_cache->i_pages, offset + i,
>> + head + i, 0);
>
> I tried something along these lines (though I think I messed up the offset
> calculation which is why it wasn't working for me). My other concern
> was with the case where SWAPFILE_CLUSTER was less than HPAGE_PMD_NR.
> Don't we need to drop the lock and look up a new swap_cache if offset >=
> SWAPFILE_CLUSTER?

In swapfile.c, there is

#ifdef CONFIG_THP_SWAP
#define SWAPFILE_CLUSTER HPAGE_PMD_NR
...
#else
#define SWAPFILE_CLUSTER 256
...
#endif

So, if a THP is in swap cache, then SWAPFILE_CLUSTER equals
HPAGE_PMD_NR.

And there is one swap address space for each 64M swap space. So one THP
will be in one swap address space.

In swap.h, there is

/* One swap address space for each 64M swap space */
#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT)

Best Regards,
Huang, Ying