A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86.
Disable the per-VMA locks config to prevent this issue while the problem
is being investigated. This is expected to be a temporary measure.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/[email protected]

Reported-by: Jiri Slaby <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Reported-by: Jacob Young <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: [email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
 mm/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 09130434e30d..0abc6c71dd89 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1224,8 +1224,9 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
 	def_bool n
 
 config PER_VMA_LOCK
-	def_bool y
+	bool "Enable per-vma locking during page fault handling."
 	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
+	depends on BROKEN
 	help
 	  Allow per-vma locking during page fault handling.
--
2.41.0.255.g8b1d071c50-goog
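
For readers less familiar with this idiom: init/Kconfig defines BROKEN as
a promptless symbol that nothing selects, so it can never become y, and
any option that depends on it becomes impossible to enable through any
configuration path. A minimal Kconfig sketch of the mechanism (context
only, not part of the posted patch):

    # BROKEN has no prompt and no default, and nothing selects it,
    # so it is always n.
    config BROKEN
    	bool

    # Anything gated on it is therefore never offered and can never
    # be enabled, regardless of the other dependencies.
    config PER_VMA_LOCK
    	bool "Enable per-vma locking during page fault handling."
    	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
    	depends on BROKEN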
On Wed, Jul 5, 2023 at 10:16 AM David Hildenbrand <[email protected]> wrote:
>
> On 05.07.23 19:12, Suren Baghdasaryan wrote:
> > [...]
> Do we have any testing results (that don't reveal other issues :) ) for
> patch #1? Not sure if we really want to mark it broken if patch #1 fixes
> the issue.
I tested the fix using the only reproducer provided in the reports
plus kernel compilation and my fork stress test. All looked good and
stable but I don't know if other reports had the same issue or
something different.
I think the urgency to disable the feature stems from the timeline
being very close to when distributions will start using the 6.4 stable
kernel version.
On 05.07.23 19:12, Suren Baghdasaryan wrote:
> [...]
>
Do we have any testing results (that don't reveal other issues :) ) for
patch #1? Not sure if we really want to mark it broken if patch #1 fixes
the issue.
--
Cheers,
David / dhildenb
On 05.07.23 19:22, Suren Baghdasaryan wrote:
> [...]
>
> I tested the fix using the only reproducer provided in the reports
> plus kernel compilation and my fork stress test. All looked good and
> stable but I don't know if other reports had the same issue or
> something different.
Can you point me at the other reports, so I can quickly scan them?
--
Cheers,
David / dhildenb
On Wed, Jul 5, 2023 at 10:24 AM David Hildenbrand <[email protected]> wrote:
>
> [...]
>
> Can you point me at the other reports, so I can quickly scan them?
by Jacob Young: https://bugzilla.kernel.org/show_bug.cgi?id=217624
by Jiri Slaby: https://lore.kernel.org/all/[email protected]/
by Holger Hoffstätte:
https://lore.kernel.org/all/[email protected]/
only saying that Firefox started crashing after upgrading to 6.4.1
On Wed, Jul 5, 2023 at 11:09 AM Suren Baghdasaryan <[email protected]> wrote:
>
> [...]
> > Can you point me at the other reports, so I can quickly scan them?
>
> by Jacob Young: https://bugzilla.kernel.org/show_bug.cgi?id=217624
> by Jiri Slaby: https://lore.kernel.org/all/[email protected]/
From the strace in https://lore.kernel.org/all/[email protected]/
it looks like clone3() was involved, so this seems quite likely to be
the same issue.
> by Holger Hoffstätte:
> https://lore.kernel.org/all/[email protected]/
> only saying that Firefox started crashing after upgrading to 6.4.1
>
On 05.07.23 22:25, Peter Xu wrote:
> [...]
>
> There'll still be an attempt at a final fix, am I right? As IIRC allowing
> page faults during fork() is one of the major goals of vma lock.
At least not that I am aware of (and people who care about that should
really work on scalable fork() alternatives, like that io_uring fork()
thingy).
My understanding is that CONFIG_PER_VMA_LOCK wants to speed up
concurrent page faults *after* fork() [or rather, after new process
creation], IOW, when we have a lot of mmap() activity going on while
some threads of the new process are already active and don't actually
touch what's getting newly mmaped.
--
Cheers,
David / dhildenb
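
To illustrate David's description, here is an invented C sketch of the
workload CONFIG_PER_VMA_LOCK targets (not a benchmark from this thread):
threads fault on an already-established region while another thread
churns unrelated mappings, so with per-VMA locks the faults need not
contend on mmap_lock:

    #include <pthread.h>
    #include <stddef.h>
    #include <sys/mman.h>

    #define REGION  (256UL << 20)           /* arbitrary, invented sizes */
    #define SCRATCH (1UL << 20)

    static char *region;

    /* First touch of each page takes the page-fault path; with
     * CONFIG_PER_VMA_LOCK those faults can be served under the
     * affected VMA's own lock instead of mmap_lock. */
    static void *faulter(void *arg)
    {
        size_t start = (size_t)arg * 4096;

        for (size_t off = start; off < REGION; off += 4 * 4096)
            region[off] = 1;
        return NULL;
    }

    /* Concurrent mmap()/munmap() churn, which takes mmap_lock for
     * write and would otherwise serialize the faulting threads. */
    static void *mapper(void *arg)
    {
        for (int i = 0; i < 10000; i++) {
            void *p = mmap(NULL, SCRATCH, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            munmap(p, SCRATCH);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t f[4], m;

        region = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        pthread_create(&m, NULL, mapper, NULL);
        for (long i = 0; i < 4; i++)
            pthread_create(&f[i], NULL, faulter, (void *)i);
        for (int i = 0; i < 4; i++)
            pthread_join(f[i], NULL);
        pthread_join(m, NULL);
        return 0;
    }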
On Wed, Jul 05, 2023 at 10:22:27AM -0700, Suren Baghdasaryan wrote:
> [...]
>
> I tested the fix using the only reproducer provided in the reports
> plus kernel compilation and my fork stress test. All looked good and
> stable but I don't know if other reports had the same issue or
> something different.
The commit log seems slightly confusing. It mostly says the bug was still
not solved, but I assume patch 1 is the current "fix"; it's just not clear
whether there are any other potential issues?

According to the stable tree rules:
- It must fix a problem that causes a build error (but not for things
marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
security issue, or some "oh, that's not good" issue. In short, something
critical.

I think it means the vma lock will never be fixed in 6.4, and it can't be
(because after this patch it'll be BROKEN, this patch is Cc'ed to stable,
and we can't fix BROKEN things in stable trees).

No problem at all from my side, just making sure this is what you wanted..

There'll still be an attempt at a final fix, am I right? As IIRC allowing
page faults during fork() is one of the major goals of vma lock.
Thanks,
--
Peter Xu
On Wed, Jul 5, 2023 at 1:25 PM Peter Xu <[email protected]> wrote:
>
> [...]
>
> I think it means the vma lock will never be fixed in 6.4, and it can't be
> (because after this patch it'll be BROKEN, this patch is Cc'ed to stable,
> and we can't fix BROKEN things in stable trees).
I was hoping we could re-enable VMA locks in 6.4 once we get more
confirmations that the problem is gone. Is that not possible once the
BROKEN dependency is merged?
>
> No problem at all from my side, just making sure this is what you wanted..
>
> There'll still be an attempt at a final fix, am I right? As IIRC allowing
> page faults during fork() is one of the major goals of vma lock.
I think we can further optimize the locking rules here (see discussion
in https://lore.kernel.org/all/[email protected]/)
but for now we want the most effective and simple way to fix the
memory corruption problem.
Thanks,
Suren.
On Wed, Jul 5, 2023 at 1:37 PM David Hildenbrand <[email protected]> wrote:
>
> [...]
> >
> > There'll still be an attempt at a final fix, am I right? As IIRC allowing
> > page faults during fork() is one of the major goals of vma lock.
>
> At least not that I am aware of (and people who care about that should
> really work on scalable fork() alternatives, like that io_uring fork()
> thingy).
>
> My understanding is that CONFIG_PER_VMA_LOCK wants to speed up
> concurrent page faults *after* fork() [or rather, after new process
> creation], IOW, when we have a lot of mmap() activity going on while
> some threads of the new process are already active and don't actually
> touch what's getting newly mmaped.
Getting as much concurrency as we can is the goal. If we can allow
some page faults during fork, I would take that too. But for now let's
deploy the simplest and safest approach.
On Wed, Jul 05, 2023 at 04:25:21PM -0400, Peter Xu wrote:
> There'll still be an attempt at a final fix, am I right? As IIRC allowing
> page faults during fork() is one of the major goals of vma lock.
Good grief, no. Why would we want to optimise something that happens
so rarely? The goal is, as usual, more performance. Satisfying page
faults while mmap()/munmap()/mprotect() are happening is worthwhile.
Those happen a lot more than fork().
In this case though, there's also a priority-inversion problem that
we're trying to solve where process A (high priority) calls mmap() while
process B (low priority) is reading /proc/$pid/smaps and now (because
rwsems are fair), none of process A's other threads can satisfy any page
faults until process B is scheduled.
Where on earth did you get the idea that we cared even a little bit
about the performance of page fault during fork()?
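
A userspace analogue of the inversion Matthew describes, as an invented
sketch: a writer-preferring rwlock stands in for the kernel's fair rwsem.
The glibc-specific attribute is needed because glibc's default rwlock
prefers readers and would let the "page fault" reader jump the queue:

    /* Invented illustration: once a writer is queued, later readers
     * must wait behind it, and so behind the slow reader too. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_rwlock_t lock;

    static void *slow_reader(void *a)   /* plays /proc/$pid/smaps */
    {
        pthread_rwlock_rdlock(&lock);
        sleep(5);                       /* long, low-priority scan */
        pthread_rwlock_unlock(&lock);
        return NULL;
    }

    static void *writer(void *a)        /* plays mmap() */
    {
        sleep(1);                       /* let the slow reader in first */
        pthread_rwlock_wrlock(&lock);   /* queues behind slow_reader */
        pthread_rwlock_unlock(&lock);
        return NULL;
    }

    static void *fast_reader(void *a)   /* plays a page fault */
    {
        sleep(2);
        pthread_rwlock_rdlock(&lock);   /* waits for the queued writer,
                                         * hence for the slow reader */
        puts("fault served only now");
        pthread_rwlock_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_rwlockattr_t attr;
        pthread_t t[3];

        pthread_rwlockattr_init(&attr);
        /* Emulate the fair, writer-queued behavior of kernel rwsems. */
        pthread_rwlockattr_setkind_np(&attr,
            PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
        pthread_rwlock_init(&lock, &attr);

        pthread_create(&t[0], NULL, slow_reader, NULL);
        pthread_create(&t[1], NULL, writer, NULL);
        pthread_create(&t[2], NULL, fast_reader, NULL);
        for (int i = 0; i < 3; i++)
            pthread_join(t[i], NULL);
        return 0;
    }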
On Wed, Jul 5, 2023 at 2:28 PM Matthew Wilcox <[email protected]> wrote:
>
> On Wed, Jul 05, 2023 at 04:25:21PM -0400, Peter Xu wrote:
> > There'll still be an attempt at a final fix, am I right? As IIRC allowing
> > page faults during fork() is one of the major goals of vma lock.
>
> Good grief, no. Why would we want to optimise something that happens
> so rarely? The goal is, as usual, more performance. Satisfying page
> faults while mmap()/munmap()/mprotect() are happening is worthwhile.
> Those happen a lot more than fork().
>
> In this case though, there's also a priority-inversion problem that
> we're trying to solve where process A (high priority) calls mmap() while
> process B (low priority) is reading /proc/$pid/smaps and now (because
> rwsems are fair), none of process A's other threads can satisfy any page
> faults until process B is scheduled.
>
> Where on earth did you get the idea that we cared even a little bit
> about the performance of page fault during fork()?
I think the original reasoning for Android, improving app launch time,
could have made that impression. However, the speed-up there comes not
from allowing page faults in the parent process (Zygote) while it forks
a child, but rather from the child being able to fault pages and
establish its mappings concurrently.
On Wed, Jul 05, 2023 at 10:27:56PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 05, 2023 at 04:25:21PM -0400, Peter Xu wrote:
> > There'll still be an attempt at a final fix, am I right? As IIRC allowing
> > page faults during fork() is one of the major goals of vma lock.
>
> Good grief, no. Why would we want to optimise something that happens
> so rarely? The goal is, as usual, more performance. Satisfying page
> faults while mmap()/munmap()/mprotect() are happening is worthwhile.
> Those happen a lot more than fork().
>
> In this case though, there's also a priority-inversion problem that
> we're trying to solve where process A (high priority) calls mmap() while
> process B (low priority) is reading /proc/$pid/smaps and now (because
> rwsems are fair), none of process A's other threads can satisfy any page
> faults until process B is scheduled.
Is it possible to extend vma lock to things like smaps?
>
> Where on earth did you get the idea that we cared even a little bit
> about the performance of page fault during fork()?
My memory, from talking to someone at the conference who mentioned such
a use case. But my memory may simply be wrong; in that case it's my
fault, but I hope it's still fine to just ask here.
--
Peter Xu
On Wed, 5 Jul 2023 13:33:26 -0700 Suren Baghdasaryan <[email protected]> wrote:
> I was hoping we could re-enable VMA locks in 6.4 once we get more
> confirmations that the problem is gone. Is that not possible once the
> BROKEN dependency is merged?
I think "no". By doing this we're effectively backporting a minor
performance optimization, which isn't a thing we'd normally do.
On Wed, Jul 5, 2023 at 5:24 PM Andrew Morton <[email protected]> wrote:
>
> On Wed, 5 Jul 2023 13:33:26 -0700 Suren Baghdasaryan <[email protected]> wrote:
>
> > I was hoping we could re-enable VMA locks in 6.4 once we get more
> > confirmations that the problem is gone. Is that not possible once the
> > BROKEN dependency is merged?
>
> I think "no". By doing this we're effectively backporting a minor
> performance optimization, which isn't a thing we'd normally do.
In that case, maybe for 6.4 we send the fix and only disable it by
default without marking BROKEN? That way we still have a way to enable
it if desired?
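
A sketch of what that alternative could look like (an assumption about
the shape, not a posted patch): a Kconfig bool with no default is n
unless explicitly enabled, so the feature would remain available but
off by default:

    config PER_VMA_LOCK
    	bool "Enable per-vma locking during page fault handling"
    	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
    	# no "default y" and no BROKEN gate: defaults to n, but
    	# users and distros can still opt in
    	help
    	  Allow per-vma locking during page fault handling.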
On Wed, Jul 5, 2023 at 5:30 PM Suren Baghdasaryan <[email protected]> wrote:
>
> [...]
>
> In that case, maybe for 6.4 we send the fix and only disable it by
> default without marking BROKEN? That way we still have a way to enable
> it if desired?
I'm preparing the next version with Liam's corrections. If the above
option I suggested is acceptable I can send a modified second patch
which would not have BROKEN dependency.
>
> >
On Wed, 5 Jul 2023 17:32:09 -0700 Suren Baghdasaryan <[email protected]> wrote:
> [...]
>
> I'm preparing the next version with Liam's corrections. If the above
> option I suggested is acceptable I can send a modified second patch
> which would not have BROKEN dependency.
I think just mark it broken and move on. At some later time we can
consider backporting the fixes into 6.4.x and reenabling, but I don't
think it's likely that we'll do this.
On Wed, Jul 5, 2023 at 5:44 PM Andrew Morton <[email protected]> wrote:
>
> [...]
>
> I think just mark it broken and move on. At some later time we can
> consider backporting the fixes into 6.4.x and reenabling, but I don't
> think it's likely that we'll do this.
Uh, ok. I'll send the next version shortly with the patch fixing the
issue and another one marking it BROKEN. Hopefully in the next release
we can roll it out more carefully, removing the BROKEN dependency but
keeping it disabled by default?
On Wed, Jul 5, 2023 at 5:49 PM Suren Baghdasaryan <[email protected]> wrote:
>
> [...]
>
> Uh, ok. I'll send the next version shortly with the patch fixing the
> issue and another one marking it BROKEN. Hopefully in the next release
> we can roll it out more carefully, removing the BROKEN dependency but
> keeping it disabled by default?
v4 is posted at
https://lore.kernel.org/all/[email protected]/
Thanks,
Suren.