2022-08-11 16:54:19

by Peter Xu

[permalink] [raw]
Subject: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd

Carry over the dirty bit from pmd to pte when a huge pmd splits. It
shouldn't be a correctness issue since when pmd_dirty() we'll have the page
marked dirty anyway, however having dirty bit carried over helps the next
initial writes of split ptes on some archs like x86.

Reviewed-by: Huang Ying <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
mm/huge_memory.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3222b40a0f6d..2f68e034ddec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2027,7 +2027,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
pgtable_t pgtable;
pmd_t old_pmd, _pmd;
bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false;
- bool anon_exclusive = false;
+ bool anon_exclusive = false, dirty = false;
unsigned long addr;
int i;

@@ -2116,8 +2116,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
uffd_wp = pmd_swp_uffd_wp(old_pmd);
} else {
page = pmd_page(old_pmd);
- if (pmd_dirty(old_pmd))
+ if (pmd_dirty(old_pmd)) {
+ dirty = true;
SetPageDirty(page);
+ }
write = pmd_write(old_pmd);
young = pmd_young(old_pmd);
soft_dirty = pmd_soft_dirty(old_pmd);
@@ -2183,6 +2185,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = pte_wrprotect(entry);
if (!young)
entry = pte_mkold(entry);
+ /* NOTE: this may set soft-dirty too on some archs */
+ if (dirty)
+ entry = pte_mkdirty(entry);
if (soft_dirty)
entry = pte_mksoft_dirty(entry);
if (uffd_wp)
--
2.32.0


2022-10-21 16:26:48

by Anatoly Pugachev

[permalink] [raw]
Subject: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd)

On Thu, Aug 11, 2022 at 12:13:28PM -0400, Peter Xu wrote:
> Carry over the dirty bit from pmd to pte when a huge pmd splits. It
> shouldn't be a correctness issue since when pmd_dirty() we'll have the page
> marked dirty anyway, however having dirty bit carried over helps the next
> initial writes of split ptes on some archs like x86.
>
> Reviewed-by: Huang Ying <[email protected]>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> mm/huge_memory.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)


Hello!

Tried to update my debian sparc64 sid (unstable) linux distro to latest
version of available packages, got dpkg segfault...

$ apt update -y
...
Unpacking linux-image-sparc64-smp (6.0.2-1) ...
E: Sub-process /usr/bin/dpkg received a segmentation fault.

Downgraded dpkg from 1.21.9 to 1.21.8 / 1.21.7 (2-3 monthes old
versions) - still getting segfault on package install (which was never
an issue before, even on this old dpkg versions).

Tried to gdb backtrace core file, which is unlucky :


root@ttip:/# apt install -y linux-image-sparc64-smp ccache qemu-utils xdelta qemu-system-x86 distcc qemu-efi-aarch64 pkg-kde-tools
...
Preparing to unpack .../2-linux-image-6.0.0-1-sparc64-smp_6.0.2-1_sparc64.deb ...
Unpacking linux-image-6.0.0-1-sparc64-smp (6.0.2-1) ...
Selecting previously unselected package linux-image-sparc64-smp.
Preparing to unpack .../3-linux-image-sparc64-smp_6.0.2-1_sparc64.deb ...
Unpacking linux-image-sparc64-smp (6.0.2-1) ...
E: Sub-process /usr/bin/dpkg received a segmentation fault.
root@ttip:/# ls -l core.4751
-rw------- 1 root root 25042944 Oct 21 14:38 core.4751
root@ttip:/# gdb -q -c core.4751
GNU gdb (Debian 12.1-4) 12.1
[New LWP 4751]
Core was generated by `/usr/bin/dpkg --status-fd 15 --no-triggers --unpack --auto-deconfigure --recurs'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xfff800010089cde4 in ?? ()
(gdb) bt
#0 0xfff800010089cde4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)


rebooted from my compiled kernel 6.1.0-rc1 to older (debian) kernel -
5.19.0-2-sparc64-smp

dpkg installed packages without any problems. Removed just installed
packages, rebooted to 6.1.0-rc1 and tried to install packages, dpkg got
segfault again.

Recompiled 6.1.0-rc1 with gcc-11 instead of gcc-12, still segfaults...
... bisect time ...

mator@ttip:~/linux-2.6$ git bisect log
# bad: [9abf2313adc1ca1b6180c508c25f22f9395cc780] Linux 6.1-rc1
# good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
git bisect start 'v6.1-rc1' 'v6.0'
# good: [18fd049731e67651009f316195da9281b756f2cf] Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
git bisect good 18fd049731e67651009f316195da9281b756f2cf
# good: [4c540c92b46497dcda59203eea78e4620bc96f47] RISC-V: Add mvendorid, marchid, and mimpid to /proc/cpuinfo output
git bisect good 4c540c92b46497dcda59203eea78e4620bc96f47
# bad: [27bc50fc90647bbf7b734c3fc306a5e61350da53] Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad 27bc50fc90647bbf7b734c3fc306a5e61350da53
# good: [ada3bfb6492a6d0d3eca50f3b61315fe032efc72] Merge tag 'tpmdd-next-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
git bisect good ada3bfb6492a6d0d3eca50f3b61315fe032efc72
# bad: [5f7fa13fa858c17580ed513bd5e0a4b36d68fdd6] mm: add pageblock_align() macro
git bisect bad 5f7fa13fa858c17580ed513bd5e0a4b36d68fdd6
# bad: [54a611b605901c7d5d05b6b8f5d04a6ceb0962aa] Maple Tree: add new data structure
git bisect bad 54a611b605901c7d5d05b6b8f5d04a6ceb0962aa
# good: [59298997df89e19aad426d4ae0a7e5037074da5a] x86/uaccess: avoid check_object_size() in copy_from_user_nmi()
git bisect good 59298997df89e19aad426d4ae0a7e5037074da5a
# good: [04c6b79ae4f0bcbd96afd7cea5e1a8848162438e] btrfs: convert __process_pages_contig() to use filemap_get_folios_contig()
git bisect good 04c6b79ae4f0bcbd96afd7cea5e1a8848162438e
# good: [da29499124cd2221539b235c1f93c7d93faf6565] mm, hwpoison: use __PageMovable() to detect non-lru movable pages
git bisect good da29499124cd2221539b235c1f93c7d93faf6565
# bad: [eed9a328aa1ae6ac1edaa026957e6882f57de0dd] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
git bisect bad eed9a328aa1ae6ac1edaa026957e6882f57de0dd
# bad: [f347c9d2697fcbbb64e077f7113a3887a181b8c0] filemap: make the accounting of thrashing more consistent
git bisect bad f347c9d2697fcbbb64e077f7113a3887a181b8c0
# good: [eba4d770efc86a3710e36b828190858abfa3bb74] mm/swap: comment all the ifdef in swapops.h
git bisect good eba4d770efc86a3710e36b828190858abfa3bb74
# bad: [2e3468778dbe3ec389a10c21a703bb8e5be5cfbc] mm: remember young/dirty bit for page migrations
git bisect bad 2e3468778dbe3ec389a10c21a703bb8e5be5cfbc
# bad: [0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb] mm/thp: carry over dirty bit when thp splits on pmd
git bisect bad 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb
# good: [0d206b5d2e0d7d7f09ac9540e3ab3e35a34f536e] mm/swap: add swp_offset_pfn() to fetch PFN from swap entry
git bisect good 0d206b5d2e0d7d7f09ac9540e3ab3e35a34f536e
# first bad commit: [0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb] mm/thp: carry over dirty bit when thp splits on pmd


mator@ttip:~/linux-2.6$ git bisect good
0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb is the first bad commit
commit 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb
Author: Peter Xu <[email protected]>
Date: Thu Aug 11 12:13:28 2022 -0400

mm/thp: carry over dirty bit when thp splits on pmd

Carry over the dirty bit from pmd to pte when a huge pmd splits. It
shouldn't be a correctness issue since when pmd_dirty() we'll have the
page marked dirty anyway, however having dirty bit carried over helps the
next initial writes of split ptes on some archs like x86.

Link: https://lkml.kernel.org/r/[email protected]



So, v6.0-rc3-176-g0d206b5d2e0d) does not segfault dpkg,
v6.0-rc3-177-g0ccf7f168e17 segfaults it on package install.

dpkg test was (apt) install/remove some packages, segfaults only on install
(not remove).

Reverted 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb from top of v6.1-rc1 and
tried to compile kernel, but got error

mm/huge_memory.c: In function ‘__split_huge_pmd_locked’:
mm/huge_memory.c:2129:17: error: ‘dirty’ undeclared (first use in this function)
2129 | dirty = is_migration_entry_dirty(entry);
| ^~~~~
mm/huge_memory.c:2129:17: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [scripts/Makefile.build:250: mm/huge_memory.o] Error 1

So can't test v6.1-rc1 with patch reverted...

2022-10-23 13:38:39

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd) #forregzbot

[Note: this mail is primarily send for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few templates
paragraphs you might have encountered already already in similar form.]

Hi, this is your Linux kernel regression tracker.CCing the regression
mailing list, as it should be in the loop for all regressions, as
explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

On 21.10.22 18:06, Anatoly Pugachev wrote:
> On Thu, Aug 11, 2022 at 12:13:28PM -0400, Peter Xu wrote:
>> Carry over the dirty bit from pmd to pte when a huge pmd splits. It
>> shouldn't be a correctness issue since when pmd_dirty() we'll have the page
>> marked dirty anyway, however having dirty bit carried over helps the next
>> initial writes of split ptes on some archs like x86.
>>
>> Reviewed-by: Huang Ying <[email protected]>
>> Signed-off-by: Peter Xu <[email protected]>
>> ---
>> mm/huge_memory.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>
>
> Hello!
>
> Tried to update my debian sparc64 sid (unstable) linux distro to latest
> version of available packages, got dpkg segfault...

Thanks for the report. To be sure below issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot ^introduced 0ccf7f168e17bb7
#regzbot title mm: sparc64: dpkg fails on sparc64 since "mm/thp: Carry
over dirty bit when thp splits on pmd)"
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained for
in the Linux kernel's documentation; above webpage explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

> $ apt update -y
> ...
> Unpacking linux-image-sparc64-smp (6.0.2-1) ...
> E: Sub-process /usr/bin/dpkg received a segmentation fault.
>
> Downgraded dpkg from 1.21.9 to 1.21.8 / 1.21.7 (2-3 monthes old
> versions) - still getting segfault on package install (which was never
> an issue before, even on this old dpkg versions).
>
> Tried to gdb backtrace core file, which is unlucky :
>
>
> root@ttip:/# apt install -y linux-image-sparc64-smp ccache qemu-utils xdelta qemu-system-x86 distcc qemu-efi-aarch64 pkg-kde-tools
> ...
> Preparing to unpack .../2-linux-image-6.0.0-1-sparc64-smp_6.0.2-1_sparc64.deb ...
> Unpacking linux-image-6.0.0-1-sparc64-smp (6.0.2-1) ...
> Selecting previously unselected package linux-image-sparc64-smp.
> Preparing to unpack .../3-linux-image-sparc64-smp_6.0.2-1_sparc64.deb ...
> Unpacking linux-image-sparc64-smp (6.0.2-1) ...
> E: Sub-process /usr/bin/dpkg received a segmentation fault.
> root@ttip:/# ls -l core.4751
> -rw------- 1 root root 25042944 Oct 21 14:38 core.4751
> root@ttip:/# gdb -q -c core.4751
> GNU gdb (Debian 12.1-4) 12.1
> [New LWP 4751]
> Core was generated by `/usr/bin/dpkg --status-fd 15 --no-triggers --unpack --auto-deconfigure --recurs'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0xfff800010089cde4 in ?? ()
> (gdb) bt
> #0 0xfff800010089cde4 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb)
>
>
> rebooted from my compiled kernel 6.1.0-rc1 to older (debian) kernel -
> 5.19.0-2-sparc64-smp
>
> dpkg installed packages without any problems. Removed just installed
> packages, rebooted to 6.1.0-rc1 and tried to install packages, dpkg got
> segfault again.
>
> Recompiled 6.1.0-rc1 with gcc-11 instead of gcc-12, still segfaults...
> ... bisect time ...
>
> mator@ttip:~/linux-2.6$ git bisect log
> # bad: [9abf2313adc1ca1b6180c508c25f22f9395cc780] Linux 6.1-rc1
> # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
> git bisect start 'v6.1-rc1' 'v6.0'
> # good: [18fd049731e67651009f316195da9281b756f2cf] Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> git bisect good 18fd049731e67651009f316195da9281b756f2cf
> # good: [4c540c92b46497dcda59203eea78e4620bc96f47] RISC-V: Add mvendorid, marchid, and mimpid to /proc/cpuinfo output
> git bisect good 4c540c92b46497dcda59203eea78e4620bc96f47
> # bad: [27bc50fc90647bbf7b734c3fc306a5e61350da53] Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad 27bc50fc90647bbf7b734c3fc306a5e61350da53
> # good: [ada3bfb6492a6d0d3eca50f3b61315fe032efc72] Merge tag 'tpmdd-next-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
> git bisect good ada3bfb6492a6d0d3eca50f3b61315fe032efc72
> # bad: [5f7fa13fa858c17580ed513bd5e0a4b36d68fdd6] mm: add pageblock_align() macro
> git bisect bad 5f7fa13fa858c17580ed513bd5e0a4b36d68fdd6
> # bad: [54a611b605901c7d5d05b6b8f5d04a6ceb0962aa] Maple Tree: add new data structure
> git bisect bad 54a611b605901c7d5d05b6b8f5d04a6ceb0962aa
> # good: [59298997df89e19aad426d4ae0a7e5037074da5a] x86/uaccess: avoid check_object_size() in copy_from_user_nmi()
> git bisect good 59298997df89e19aad426d4ae0a7e5037074da5a
> # good: [04c6b79ae4f0bcbd96afd7cea5e1a8848162438e] btrfs: convert __process_pages_contig() to use filemap_get_folios_contig()
> git bisect good 04c6b79ae4f0bcbd96afd7cea5e1a8848162438e
> # good: [da29499124cd2221539b235c1f93c7d93faf6565] mm, hwpoison: use __PageMovable() to detect non-lru movable pages
> git bisect good da29499124cd2221539b235c1f93c7d93faf6565
> # bad: [eed9a328aa1ae6ac1edaa026957e6882f57de0dd] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
> git bisect bad eed9a328aa1ae6ac1edaa026957e6882f57de0dd
> # bad: [f347c9d2697fcbbb64e077f7113a3887a181b8c0] filemap: make the accounting of thrashing more consistent
> git bisect bad f347c9d2697fcbbb64e077f7113a3887a181b8c0
> # good: [eba4d770efc86a3710e36b828190858abfa3bb74] mm/swap: comment all the ifdef in swapops.h
> git bisect good eba4d770efc86a3710e36b828190858abfa3bb74
> # bad: [2e3468778dbe3ec389a10c21a703bb8e5be5cfbc] mm: remember young/dirty bit for page migrations
> git bisect bad 2e3468778dbe3ec389a10c21a703bb8e5be5cfbc
> # bad: [0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb] mm/thp: carry over dirty bit when thp splits on pmd
> git bisect bad 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb
> # good: [0d206b5d2e0d7d7f09ac9540e3ab3e35a34f536e] mm/swap: add swp_offset_pfn() to fetch PFN from swap entry
> git bisect good 0d206b5d2e0d7d7f09ac9540e3ab3e35a34f536e
> # first bad commit: [0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb] mm/thp: carry over dirty bit when thp splits on pmd
>
>
> mator@ttip:~/linux-2.6$ git bisect good
> 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb is the first bad commit
> commit 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb
> Author: Peter Xu <[email protected]>
> Date: Thu Aug 11 12:13:28 2022 -0400
>
> mm/thp: carry over dirty bit when thp splits on pmd
>
> Carry over the dirty bit from pmd to pte when a huge pmd splits. It
> shouldn't be a correctness issue since when pmd_dirty() we'll have the
> page marked dirty anyway, however having dirty bit carried over helps the
> next initial writes of split ptes on some archs like x86.
>
> Link: https://lkml.kernel.org/r/[email protected]
>
>
>
> So, v6.0-rc3-176-g0d206b5d2e0d) does not segfault dpkg,
> v6.0-rc3-177-g0ccf7f168e17 segfaults it on package install.
>
> dpkg test was (apt) install/remove some packages, segfaults only on install
> (not remove).
>
> Reverted 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb from top of v6.1-rc1 and
> tried to compile kernel, but got error
>
> mm/huge_memory.c: In function ‘__split_huge_pmd_locked’:
> mm/huge_memory.c:2129:17: error: ‘dirty’ undeclared (first use in this function)
> 2129 | dirty = is_migration_entry_dirty(entry);
> | ^~~~~
> mm/huge_memory.c:2129:17: note: each undeclared identifier is reported only once for each function it appears in
> make[2]: *** [scripts/Makefile.build:250: mm/huge_memory.o] Error 1
>
> So can't test v6.1-rc1 with patch reverted...

2022-10-23 20:07:09

by Peter Xu

[permalink] [raw]
Subject: Re: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd)

On Fri, Oct 21, 2022 at 07:06:03PM +0300, Anatoly Pugachev wrote:
> On Thu, Aug 11, 2022 at 12:13:28PM -0400, Peter Xu wrote:
> > Carry over the dirty bit from pmd to pte when a huge pmd splits. It
> > shouldn't be a correctness issue since when pmd_dirty() we'll have the page
> > marked dirty anyway, however having dirty bit carried over helps the next
> > initial writes of split ptes on some archs like x86.
> >
> > Reviewed-by: Huang Ying <[email protected]>
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > mm/huge_memory.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
>
>
> Hello!

Hi, Anatoly,

>
> Tried to update my debian sparc64 sid (unstable) linux distro to latest
> version of available packages, got dpkg segfault...
>
> $ apt update -y
> ...
> Unpacking linux-image-sparc64-smp (6.0.2-1) ...
> E: Sub-process /usr/bin/dpkg received a segmentation fault.
>
> Downgraded dpkg from 1.21.9 to 1.21.8 / 1.21.7 (2-3 monthes old
> versions) - still getting segfault on package install (which was never
> an issue before, even on this old dpkg versions).
>
> Tried to gdb backtrace core file, which is unlucky :
>
>
> root@ttip:/# apt install -y linux-image-sparc64-smp ccache qemu-utils xdelta qemu-system-x86 distcc qemu-efi-aarch64 pkg-kde-tools
> ...
> Preparing to unpack .../2-linux-image-6.0.0-1-sparc64-smp_6.0.2-1_sparc64.deb ...
> Unpacking linux-image-6.0.0-1-sparc64-smp (6.0.2-1) ...
> Selecting previously unselected package linux-image-sparc64-smp.
> Preparing to unpack .../3-linux-image-sparc64-smp_6.0.2-1_sparc64.deb ...
> Unpacking linux-image-sparc64-smp (6.0.2-1) ...
> E: Sub-process /usr/bin/dpkg received a segmentation fault.
> root@ttip:/# ls -l core.4751
> -rw------- 1 root root 25042944 Oct 21 14:38 core.4751
> root@ttip:/# gdb -q -c core.4751
> GNU gdb (Debian 12.1-4) 12.1
> [New LWP 4751]
> Core was generated by `/usr/bin/dpkg --status-fd 15 --no-triggers --unpack --auto-deconfigure --recurs'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0xfff800010089cde4 in ?? ()
> (gdb) bt
> #0 0xfff800010089cde4 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb)
>
>
> rebooted from my compiled kernel 6.1.0-rc1 to older (debian) kernel -
> 5.19.0-2-sparc64-smp
>
> dpkg installed packages without any problems. Removed just installed
> packages, rebooted to 6.1.0-rc1 and tried to install packages, dpkg got
> segfault again.
>
> Recompiled 6.1.0-rc1 with gcc-11 instead of gcc-12, still segfaults...
> ... bisect time ...
>
> mator@ttip:~/linux-2.6$ git bisect log
> # bad: [9abf2313adc1ca1b6180c508c25f22f9395cc780] Linux 6.1-rc1
> # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
> git bisect start 'v6.1-rc1' 'v6.0'
> # good: [18fd049731e67651009f316195da9281b756f2cf] Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> git bisect good 18fd049731e67651009f316195da9281b756f2cf
> # good: [4c540c92b46497dcda59203eea78e4620bc96f47] RISC-V: Add mvendorid, marchid, and mimpid to /proc/cpuinfo output
> git bisect good 4c540c92b46497dcda59203eea78e4620bc96f47
> # bad: [27bc50fc90647bbf7b734c3fc306a5e61350da53] Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad 27bc50fc90647bbf7b734c3fc306a5e61350da53
> # good: [ada3bfb6492a6d0d3eca50f3b61315fe032efc72] Merge tag 'tpmdd-next-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
> git bisect good ada3bfb6492a6d0d3eca50f3b61315fe032efc72
> # bad: [5f7fa13fa858c17580ed513bd5e0a4b36d68fdd6] mm: add pageblock_align() macro
> git bisect bad 5f7fa13fa858c17580ed513bd5e0a4b36d68fdd6
> # bad: [54a611b605901c7d5d05b6b8f5d04a6ceb0962aa] Maple Tree: add new data structure
> git bisect bad 54a611b605901c7d5d05b6b8f5d04a6ceb0962aa
> # good: [59298997df89e19aad426d4ae0a7e5037074da5a] x86/uaccess: avoid check_object_size() in copy_from_user_nmi()
> git bisect good 59298997df89e19aad426d4ae0a7e5037074da5a
> # good: [04c6b79ae4f0bcbd96afd7cea5e1a8848162438e] btrfs: convert __process_pages_contig() to use filemap_get_folios_contig()
> git bisect good 04c6b79ae4f0bcbd96afd7cea5e1a8848162438e
> # good: [da29499124cd2221539b235c1f93c7d93faf6565] mm, hwpoison: use __PageMovable() to detect non-lru movable pages
> git bisect good da29499124cd2221539b235c1f93c7d93faf6565
> # bad: [eed9a328aa1ae6ac1edaa026957e6882f57de0dd] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
> git bisect bad eed9a328aa1ae6ac1edaa026957e6882f57de0dd
> # bad: [f347c9d2697fcbbb64e077f7113a3887a181b8c0] filemap: make the accounting of thrashing more consistent
> git bisect bad f347c9d2697fcbbb64e077f7113a3887a181b8c0
> # good: [eba4d770efc86a3710e36b828190858abfa3bb74] mm/swap: comment all the ifdef in swapops.h
> git bisect good eba4d770efc86a3710e36b828190858abfa3bb74
> # bad: [2e3468778dbe3ec389a10c21a703bb8e5be5cfbc] mm: remember young/dirty bit for page migrations
> git bisect bad 2e3468778dbe3ec389a10c21a703bb8e5be5cfbc
> # bad: [0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb] mm/thp: carry over dirty bit when thp splits on pmd
> git bisect bad 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb
> # good: [0d206b5d2e0d7d7f09ac9540e3ab3e35a34f536e] mm/swap: add swp_offset_pfn() to fetch PFN from swap entry
> git bisect good 0d206b5d2e0d7d7f09ac9540e3ab3e35a34f536e
> # first bad commit: [0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb] mm/thp: carry over dirty bit when thp splits on pmd
>
>
> mator@ttip:~/linux-2.6$ git bisect good
> 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb is the first bad commit
> commit 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb
> Author: Peter Xu <[email protected]>
> Date: Thu Aug 11 12:13:28 2022 -0400
>
> mm/thp: carry over dirty bit when thp splits on pmd
>
> Carry over the dirty bit from pmd to pte when a huge pmd splits. It
> shouldn't be a correctness issue since when pmd_dirty() we'll have the
> page marked dirty anyway, however having dirty bit carried over helps the
> next initial writes of split ptes on some archs like x86.
>
> Link: https://lkml.kernel.org/r/[email protected]
>
>
>
> So, v6.0-rc3-176-g0d206b5d2e0d) does not segfault dpkg,
> v6.0-rc3-177-g0ccf7f168e17 segfaults it on package install.
>
> dpkg test was (apt) install/remove some packages, segfaults only on install
> (not remove).
>
> Reverted 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb from top of v6.1-rc1 and
> tried to compile kernel, but got error
>
> mm/huge_memory.c: In function ‘__split_huge_pmd_locked’:
> mm/huge_memory.c:2129:17: error: ‘dirty’ undeclared (first use in this function)
> 2129 | dirty = is_migration_entry_dirty(entry);
> | ^~~~~
> mm/huge_memory.c:2129:17: note: each undeclared identifier is reported only once for each function it appears in
> make[2]: *** [scripts/Makefile.build:250: mm/huge_memory.o] Error 1
>
> So can't test v6.1-rc1 with patch reverted...

Sorry to know this, and thanks for the report and debugging. The revert
won't work because dirty variable is used in later patch for the swap path
too. I've attached a partial (and minimum) revert, feel free to try.

I had a feeling that it's somehow related to the special impl of sparc64
pte_mkdirty() where a kernel patching mechanism is used to share code
between sun4[uv]. I'd assume your machine is sun4v? As that's the one
that needs the patching, iiuc.

The sparc64 impl goes back to commit cf627156c450 ("[SPARC64]: Use inline
patching for critical PTE operations.", 2006-03-20). I believe it works
solidly for all these years, so I really have no quick clue on why that can
fail with the new code added.

I think the magic is done with sun4v_patch_2insn_range(). What I can think
of is this thp patch can definitely add much more places of the kernel that
will need patching, because both __split_huge_pmd() and split_huge_pmd()
are defined as macros not functions. However I don't see a problem for it
so far, e.g., I don't see a limitation of __sun4v_2insn_patch_end growing
to satisfy all those new spots.

I'm copying David Miller who implemented the sparc64 pte operations. I
know he's probably always very busy, but just in case there'll be quick
answers so we don't need the revert patch but just make it work for sparc64
too. Currently with the revert patch we'll start to loose dirty bit again
like before on many archs when thp split, but I assume that's so far better
than breaking any arch or making an arch specific ifdef so we can revisit.

Thanks,

--
Peter Xu


Attachments:
(No filename) (8.58 kB)
0001-Partly-revert-mm-thp-carry-over-dirty-bit-when-thp-s.patch (1.68 kB)
Download all attachments

2022-10-25 10:52:20

by Anatoly Pugachev

[permalink] [raw]
Subject: Re: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd)

On Sun, Oct 23, 2022 at 10:53 PM Peter Xu <[email protected]> wrote:
> On Fri, Oct 21, 2022 at 07:06:03PM +0300, Anatoly Pugachev wrote:
> >
> > Link: https://lkml.kernel.org/r/[email protected]
> >
> > So, v6.0-rc3-176-g0d206b5d2e0d) does not segfault dpkg,
> > v6.0-rc3-177-g0ccf7f168e17 segfaults it on package install.
> >
> > dpkg test was (apt) install/remove some packages, segfaults only on install
> > (not remove).
> >
> > Reverted 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb from top of v6.1-rc1 and
> > tried to compile kernel, but got error
> >
> > mm/huge_memory.c: In function ‘__split_huge_pmd_locked’:
> > mm/huge_memory.c:2129:17: error: ‘dirty’ undeclared (first use in this function)
> > 2129 | dirty = is_migration_entry_dirty(entry);
> > | ^~~~~
> > mm/huge_memory.c:2129:17: note: each undeclared identifier is reported only once for each function it appears in
> > make[2]: *** [scripts/Makefile.build:250: mm/huge_memory.o] Error 1
> >
> > So can't test v6.1-rc1 with patch reverted...
>
> Sorry to know this, and thanks for the report and debugging. The revert
> won't work because dirty variable is used in later patch for the swap path
> too. I've attached a partial (and minimum) revert, feel free to try.

Peter,

tested again with 6.1.0-rc2 already, non patched kernel segfaulting
dpkg, using your patch makes dpkg
(or kernel) to behave properly.
Thanks!

> I had a feeling that it's somehow related to the special impl of sparc64
> pte_mkdirty() where a kernel patching mechanism is used to share code
> between sun4[uv]. I'd assume your machine is sun4v? As that's the one
> that needs the patching, iiuc.

kernel boot log reports
ARCH: SUN4V

2022-10-25 15:32:37

by Peter Xu

[permalink] [raw]
Subject: Re: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd)

On Tue, Oct 25, 2022 at 01:22:45PM +0300, Anatoly Pugachev wrote:
> On Sun, Oct 23, 2022 at 10:53 PM Peter Xu <[email protected]> wrote:
> > On Fri, Oct 21, 2022 at 07:06:03PM +0300, Anatoly Pugachev wrote:
> > >
> > > Link: https://lkml.kernel.org/r/[email protected]
> > >
> > > So, v6.0-rc3-176-g0d206b5d2e0d) does not segfault dpkg,
> > > v6.0-rc3-177-g0ccf7f168e17 segfaults it on package install.
> > >
> > > dpkg test was (apt) install/remove some packages, segfaults only on install
> > > (not remove).
> > >
> > > Reverted 0ccf7f168e17bb7eb5a322397ba5a841f4fbaccb from top of v6.1-rc1 and
> > > tried to compile kernel, but got error
> > >
> > > mm/huge_memory.c: In function ‘__split_huge_pmd_locked’:
> > > mm/huge_memory.c:2129:17: error: ‘dirty’ undeclared (first use in this function)
> > > 2129 | dirty = is_migration_entry_dirty(entry);
> > > | ^~~~~
> > > mm/huge_memory.c:2129:17: note: each undeclared identifier is reported only once for each function it appears in
> > > make[2]: *** [scripts/Makefile.build:250: mm/huge_memory.o] Error 1
> > >
> > > So can't test v6.1-rc1 with patch reverted...
> >
> > Sorry to know this, and thanks for the report and debugging. The revert
> > won't work because dirty variable is used in later patch for the swap path
> > too. I've attached a partial (and minimum) revert, feel free to try.
>
> Peter,
>
> tested again with 6.1.0-rc2 already, non patched kernel segfaulting
> dpkg, using your patch makes dpkg
> (or kernel) to behave properly.
> Thanks!

Thanks for the quick feedback.

>
> > I had a feeling that it's somehow related to the special impl of sparc64
> > pte_mkdirty() where a kernel patching mechanism is used to share code
> > between sun4[uv]. I'd assume your machine is sun4v? As that's the one
> > that needs the patching, iiuc.
>
> kernel boot log reports
> ARCH: SUN4V

Then it's expected but unfortunate too, as QEMU doesn't seem to have
support on sun4v so I cannot even try that out with a VM.

https://wiki.qemu.org/Documentation/Platforms/SPARC

I'd also expect there's nothing useful in either dmesg or relevant logs
because it's segv, but please share if you find anything that may be
helpful.

Maybe we need to have the minimum revert for v6.1 before we have more
clues.

Thanks,

--
Peter Xu


2022-11-01 13:32:27

by Anatoly Pugachev

[permalink] [raw]
Subject: Re: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd)

On Tue, Oct 25, 2022 at 5:43 PM Peter Xu <[email protected]> wrote:
>
> On Tue, Oct 25, 2022 at 01:22:45PM +0300, Anatoly Pugachev wrote:
> > On Sun, Oct 23, 2022 at 10:53 PM Peter Xu <[email protected]> wrote:
> > > On Fri, Oct 21, 2022 at 07:06:03PM +0300, Anatoly Pugachev wrote:
> > > >
> > > > Link: https://lkml.kernel.org/r/[email protected]
>
> Maybe we need to have the minimum revert for v6.1 before we have more
> clues.

Just a quick update on 6.1.0-rc3

Tested again with 6.1.0-rc3, segfaults dpkg... applied patch - no dpkg
segfaults.

2022-11-02 19:14:31

by Peter Xu

[permalink] [raw]
Subject: Re: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd)

On Tue, Nov 01, 2022 at 04:13:20PM +0300, Anatoly Pugachev wrote:
> On Tue, Oct 25, 2022 at 5:43 PM Peter Xu <[email protected]> wrote:
> >
> > On Tue, Oct 25, 2022 at 01:22:45PM +0300, Anatoly Pugachev wrote:
> > > On Sun, Oct 23, 2022 at 10:53 PM Peter Xu <[email protected]> wrote:
> > > > On Fri, Oct 21, 2022 at 07:06:03PM +0300, Anatoly Pugachev wrote:
> > > > >
> > > > > Link: https://lkml.kernel.org/r/[email protected]
> >
> > Maybe we need to have the minimum revert for v6.1 before we have more
> > clues.
>
> Just a quick update on 6.1.0-rc3
>
> Tested again with 6.1.0-rc3, segfaults dpkg... applied patch - no dpkg
> segfaults.

Andrew, shall we apply the minimum revert for this patch for now? The
one-liner was attached in this email I replied to Anatoly:

https://lore.kernel.org/all/Y1Wbi4yyVvDtg4zN@x1n/

Thanks,

--
Peter Xu


2022-11-02 20:02:20

by Andrew Morton

[permalink] [raw]
Subject: Re: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd)

On Wed, 2 Nov 2022 14:34:17 -0400 Peter Xu <[email protected]> wrote:

> > Tested again with 6.1.0-rc3, segfaults dpkg... applied patch - no dpkg
> > segfaults.
>
> Andrew, shall we apply the minimum revert for this patch for now? The
> one-liner was attached in this email I replied to Anatoly:
>
> https://lore.kernel.org/all/Y1Wbi4yyVvDtg4zN@x1n/

Oh. I missed that in the email flood.

I added the Fixes: and queued it, thanks.

2022-11-04 11:16:46

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd) #forregzbot

On 23.10.22 15:33, Thorsten Leemhuis wrote:
> On 21.10.22 18:06, Anatoly Pugachev wrote:
>> Tried to update my debian sparc64 sid (unstable) linux distro to latest
>> version of available packages, got dpkg segfault...
> #regzbot ^introduced 0ccf7f168e17bb7
> #regzbot title mm: sparc64: dpkg fails on sparc64 since "mm/thp: Carry
> over dirty bit when thp splits on pmd)"
> #regzbot ignore-activity

#regzbot fixed-by: 434e3d15d92b

2022-11-13 19:01:34

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: dpkg fails on sparc64 (was: [PATCH v4 4/7] mm/thp: Carry over dirty bit when thp splits on pmd) #forregzbot



On 04.11.22 11:39, Thorsten Leemhuis wrote:
> On 23.10.22 15:33, Thorsten Leemhuis wrote:
>> On 21.10.22 18:06, Anatoly Pugachev wrote:
>>> Tried to update my debian sparc64 sid (unstable) linux distro to latest
>>> version of available packages, got dpkg segfault...
>> #regzbot ^introduced 0ccf7f168e17bb7
>> #regzbot title mm: sparc64: dpkg fails on sparc64 since "mm/thp: Carry
>> over dirty bit when thp splits on pmd)"
>> #regzbot ignore-activity
>
> #regzbot fixed-by: 434e3d15d92b

#regzbot fixed-by: 624a2c94f5b7