Hi,
A routine run of the tests in net-next also gave this mm unit-test error.
root@defiant:tools/testing/selftests/mm# ./uffd-unit-tests
Testing UFFDIO_API (with syscall)... done
Testing UFFDIO_API (with /dev/userfaultfd)... done
Testing register-ioctls on anon... done
Testing register-ioctls on shmem... done
Testing register-ioctls on shmem-private... done
Testing register-ioctls on hugetlb... skipped [reason: memory allocation failed]
Testing register-ioctls on hugetlb-private... skipped [reason: memory allocation failed]
Testing zeropage on anon... done
Testing zeropage on shmem... done
Testing zeropage on shmem-private... done
Testing zeropage on hugetlb... skipped [reason: memory allocation failed]
Testing zeropage on hugetlb-private... skipped [reason: memory allocation failed]
Testing move on anon... done
Testing move-pmd on anon... done
Testing move-pmd-split on anon... done
Testing wp-fork on anon... done
Testing wp-fork on shmem... done
Testing wp-fork on shmem-private... done
Testing wp-fork on hugetlb... skipped [reason: memory allocation failed]
Testing wp-fork on hugetlb-private... skipped [reason: memory allocation failed]
Testing wp-fork-with-event on anon... done
Testing wp-fork-with-event on shmem... done
Testing wp-fork-with-event on shmem-private... done
Testing wp-fork-with-event on hugetlb... skipped [reason: memory allocation failed]
Testing wp-fork-with-event on hugetlb-private... skipped [reason: memory allocation failed]
Testing wp-fork-pin on anon... done
Testing wp-fork-pin on shmem... done
Testing wp-fork-pin on shmem-private... done
Testing wp-fork-pin on hugetlb... skipped [reason: memory allocation failed]
Testing wp-fork-pin on hugetlb-private... skipped [reason: memory allocation failed]
Testing wp-fork-pin-with-event on anon... done
Testing wp-fork-pin-with-event on shmem... done
Testing wp-fork-pin-with-event on shmem-private... done
Testing wp-fork-pin-with-event on hugetlb... skipped [reason: memory allocation failed]
Testing wp-fork-pin-with-event on hugetlb-private... skipped [reason: memory allocation failed]
Testing wp-unpopulated on anon... done
Testing minor on shmem... done
Testing minor on hugetlb... skipped [reason: memory allocation failed]
Testing minor-wp on shmem... done
Testing minor-wp on hugetlb... skipped [reason: memory allocation failed]
Testing minor-collapse on shmem... done
Testing sigbus on anon... done
Testing sigbus on shmem... done
Testing sigbus on shmem-private... done
Testing sigbus on hugetlb... skipped [reason: memory allocation failed]
Testing sigbus on hugetlb-private... skipped [reason: memory allocation failed]
Testing sigbus-wp on anon... done
Testing sigbus-wp on shmem... done
Testing sigbus-wp on shmem-private... done
Testing sigbus-wp on hugetlb... skipped [reason: memory allocation failed]
Testing sigbus-wp on hugetlb-private... skipped [reason: memory allocation failed]
Testing events on anon... done
Testing events on shmem... done
Testing events on shmem-private... done
Testing events on hugetlb... skipped [reason: memory allocation failed]
Testing events on hugetlb-private... skipped [reason: memory allocation failed]
Testing events-wp on anon... done
Testing events-wp on shmem... done
Testing events-wp on shmem-private... done
Testing events-wp on hugetlb... skipped [reason: memory allocation failed]
Testing events-wp on hugetlb-private... skipped [reason: memory allocation failed]
Testing poison on anon... done
Testing poison on shmem... done
Testing poison on shmem-private... done
Testing poison on hugetlb... skipped [reason: memory allocation failed]
Testing poison on hugetlb-private... skipped [reason: memory allocation failed]
Userfaults unit tests: pass=42, skip=24, fail=0 (total=66)
root@defiant:tools/testing/selftests/mm# grep -i huge /proc/meminfo
It resulted in alarming errors in the syslog:
Mar 9 19:48:24 defiant kernel: [77187.055103] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4631e000
Mar 9 19:48:24 defiant kernel: [77187.055132] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46320000
Mar 9 19:48:24 defiant kernel: [77187.055160] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46322000
Mar 9 19:48:24 defiant kernel: [77187.055189] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46324000
Mar 9 19:48:24 defiant kernel: [77187.055218] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46326000
Mar 9 19:48:24 defiant kernel: [77187.055250] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46328000
Mar 9 19:48:24 defiant kernel: [77187.055278] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4632a000
Mar 9 19:48:24 defiant kernel: [77187.055307] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4632c000
Mar 9 19:48:24 defiant kernel: [77187.055336] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4632e000
Mar 9 19:48:24 defiant kernel: [77187.055366] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46330000
Mar 9 19:48:24 defiant kernel: [77187.055395] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46332000
Mar 9 19:48:24 defiant kernel: [77187.055423] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46334000
Mar 9 19:48:24 defiant kernel: [77187.055452] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46336000
Mar 9 19:48:24 defiant kernel: [77187.055480] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46338000
Mar 9 19:48:24 defiant kernel: [77187.055509] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4633a000
Mar 9 19:48:24 defiant kernel: [77187.055538] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4633c000
Mar 9 19:48:24 defiant kernel: [77187.055567] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4633e000
Mar 9 19:48:24 defiant kernel: [77187.055597] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46340000
At this point, it could be a problem with my box's memory chips, or something with HUGETLB.
However, since the "classic" allocations were successful, the problem might be in huge pages or,
if I understood correctly, in the deliberate poisoning of pages?
Please also find the strace of the run attached.
Best regards,
Mirsad Todorovac
On 09.03.24 20:12, Mirsad Todorovac wrote:
> Hi,
>
> A routine run of the tests in net-next also gave this mm unit-test error.
>
> [... uffd-unit-tests output and MCE syslog lines trimmed; quoted in full above ...]
>
> At this point, it could be a problem with my box's memory chips, or something with HUGETLB.
>
> However, since the "classic" allocations were successful, the problem might be in huge pages or,
> if I understood correctly, in the deliberate poisoning of pages?
>
Isn't that just the (expected) side effect of UFFDIO_POISON tests?
IOW, there is no problem here. We are poisoning virtual memory locations
(not actual memory) and expect a SIGBUS on next access. While testing
that, we receive these messages.
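For illustration, a minimal userspace sketch of what the poison tests boil
down to (assuming a kernel with UFFDIO_POISON, i.e. 6.6+, and either root or
unprivileged userfaultfd enabled; error checking omitted for brevity -- this
is not the selftest's actual code):

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	/* API handshake; UFFD_FEATURE_POISON advertises UFFDIO_POISON support. */
	struct uffdio_api api = { .api = UFFD_API, .features = UFFD_FEATURE_POISON };
	ioctl(uffd, UFFDIO_API, &api);

	char *mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)mem, .len = page },
		.mode  = UFFDIO_REGISTER_MODE_MISSING,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	/* Mark the page poisoned: no real page, no real hardware error. */
	struct uffdio_poison poison = {
		.range = { .start = (unsigned long)mem, .len = page },
	};
	ioctl(uffd, UFFDIO_POISON, &poison);

	/* The next access hits the poison marker: the kernel sends SIGBUS
	 * and logs the "MCE: Killing ..." line seen in the report above. */
	return mem[0];
}

So each poisoned page the test touches should account for exactly one of
those dmesg lines.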
The "ugly" thing here seems to be that we can trigger repeated pr_err()
from user space. There is no rate-limiting in place. Maybe UFFDIO_POISON
requires root permissions so this cannot be exploited by unprivileged
user space to flood the system log?
CCing Axel
--
Cheers,
David / dhildenb
On Mon, Mar 11, 2024 at 10:31:41AM +0100, David Hildenbrand wrote:
> On 09.03.24 20:12, Mirsad Todorovac wrote:
> > [... original report (test output, MCE syslog, and question) trimmed; quoted in full above ...]
> >
>
> Isn't that just the (expected) side effect of UFFDIO_POISON tests?
>
> IOW, there is no problem here. We are poisoning virtual memory locations
> (not actual memory) and expect a SIGBUS on next access. While testing that,
> we receive these messages.
Correct.
>
> The "ugly" thing here seems to be that we can trigger repeated pr_err() from
> user space. There is no rate-limiting in place. Maybe UFFDIO_POISON requires
> root permissions so this cannot be exploited by unprivileged user space to
> flood the system log?
>
> CCing Axel
This is pretty unfortunate.
I'm not too concerned about flooding whoever kicks off the selftests, but
indeed this looks like something anyone could use to trigger such endless
reports in dmesg.
The issue with requiring a privilege is that any hypervisor that needs to
use this to emulate memory errors will also need that privilege, and that
can be a problem.
Logically such "hwpoison errors" are not real, so they don't need to be
reported in dmesg, but right now we deliberately make them behave exactly
like a real hw error to share the code path, iiuc (e.g. on MCE injections).
One option is to use a different marker reflecting that such a hwpoison
error is internal, so we don't need to report it in dmesg. That would also
require (besides another bit in the pte markers) one extra VM_FAULT_* flag
just for such reports. It might be slightly overkill, but I don't see a
better way; not reporting HWPOISON would complicate at least the kvm use
case even more.
Or... does syslog have its own protection in general against such printk
floods? It would be easier if flooding weren't a concern, but I'm not sure
about that.
Thanks,
--
Peter Xu
On 11.03.24 15:35, Peter Xu wrote:
> On Mon, Mar 11, 2024 at 10:31:41AM +0100, David Hildenbrand wrote:
>> On 09.03.24 20:12, Mirsad Todorovac wrote:
>>> [... original report trimmed; quoted in full above ...]
>>>
>>
>> Isn't that just the (expected) side effect of UFFDIO_POISON tests?
>>
>> IOW, there is no problem here. We are poisoning virtual memory locations
>> (not actual memory) and expect a SIGBUS on next access. While testing that,
>> we receive these messages.
>
> Correct.
>
>>
>> The "ugly" thing here seems to be that we can trigger repeated pr_err() from
>> user space. There is no rate-limiting in place. Maybe UFFDIO_POISON requires
>> root permissions so this cannot be exploited by unprivileged user space to
>> flood the system log?
>>
>> CCing Axel
>
> This is pretty unfortunate.
>
> I'm not concerned too much on flooding whoever kicks off the selftests, but
> indeed this seems to be able to be used by anyone to trigger such endless
> reports in dmesg.
Right.
>
> The issue with requiring a privilege means any hypervisor that will need to
> use this to emulate memory errors will also require such privilege, and it
> can be a problem.
>
Yes, we don't want that.
> Logically such "hwpoison errors" are not real so it is not needed to be
> reported in dmesg, but now we're leveraging it to be exactly the same as a
> real hw error to share the code path, iiuc (e.g. on MCE injections).
>
> One option is to use a different marker reflecting that such hwpoison error
> is internal, so we don't need to report in dmesg. That'll also require
> (besides another bit in pte markers) one extra VM_FAULT_* flag just for
> such reports. Might be slightly an overkill, but I don't see another
> better way; not reporting HWPOISON will complicate at least kvm use case
> even more.
>
> Or.. does syslog has its own protection in general for such printk floods?
> It'll be easier if that's not a concern to flood then, but I'm not sure
> from that regard.
From what I know, flooding is considered problematic and we fix it up
using "Fixes:" commits. See 1b0a151c10a6d823f033023b9fdd9af72a89591b as
one "recent" example.
Usually we switch to the _ratelimited() functions; maybe
pr_warn_ratelimited() is good enough? But we'd lose some details in a
"real" MCE storm.
--
Cheers,
David / dhildenb
On Mon, Mar 11, 2024 at 03:48:14PM +0100, David Hildenbrand wrote:
> On 11.03.24 15:35, Peter Xu wrote:
> > On Mon, Mar 11, 2024 at 10:31:41AM +0100, David Hildenbrand wrote:
> > > On 09.03.24 20:12, Mirsad Todorovac wrote:
> > > > [... original report trimmed; quoted in full above ...]
> > > >
> > >
> > > Isn't that just the (expected) side effect of UFFDIO_POISON tests?
> > >
> > > IOW, there is no problem here. We are poisoning virtual memory locations
> > > (not actual memory) and expect a SIGBUS on next access. While testing that,
> > > we receive these messages.
> >
> > Correct.
> >
> > >
> > > The "ugly" thing here seems to be that we can trigger repeated pr_err() from
> > > user space. There is no rate-limiting in place. Maybe UFFDIO_POISON requires
> > > root permissions so this cannot be exploited by unprivileged user space to
> > > flood the system log?
> > >
> > > CCing Axel
> >
> > This is pretty unfortunate.
> >
> > I'm not concerned too much on flooding whoever kicks off the selftests, but
> > indeed this seems to be able to be used by anyone to trigger such endless
> > reports in dmesg.
>
> Right.
>
> >
> > The issue with requiring a privilege means any hypervisor that will need to
> > use this to emulate memory errors will also require such privilege, and it
> > can be a problem.
> >
>
> Yes, we don't want that.
>
> > Logically such "hwpoison errors" are not real so it is not needed to be
> > reported in dmesg, but now we're leveraging it to be exactly the same as a
> > real hw error to share the code path, iiuc (e.g. on MCE injections).
> >
> > One option is to use a different marker reflecting that such hwpoison error
> > is internal, so we don't need to report in dmesg. That'll also require
> > (besides another bit in pte markers) one extra VM_FAULT_* flag just for
> > such reports. Might be slightly an overkill, but I don't see another
> > better way; not reporting HWPOISON will complicate at least kvm use case
> > even more.
> >
> > Or.. does syslog has its own protection in general for such printk floods?
> > It'll be easier if that's not a concern to flood then, but I'm not sure
> > from that regard.
>
> From what I know, flooding is considered problematic and we fix it up using
> "Fixes:" commits. See 1b0a151c10a6d823f033023b9fdd9af72a89591b as one
> "recent" example.
>
>
> Usually we switch to the _ratelimited() functions, maybe
> pr_warn_ratelimited() is good enough? But we'd lose some details on a "real"
> MCE storm, though.
Yeah, I didn't consider that previously because I thought leaking MCE
addresses might be a problem.
But thinking about it again, it would be great if pr_err_ratelimited()
works here (I think we'd still want to report them with "err", not
"warnings", btw).
I don't worry too much about an MCE storm, as in that case explicit
addresses may not be necessary if the whole system is at risk. What I
don't know, however, is whether the addresses still matter if e.g. two
consecutive MCEs are reported in a small time window, and whether those
addresses are a concern in that case if some got lost.
My MCE experience is pretty limited, so I don't have an answer to that.
Maybe it can be verified by proposing a patch like that and seeing
whether there are any objections to making it rate limited. I'll leave
that to Axel to decide how to move forward.
--
Peter Xu
On Mon, Mar 11, 2024 at 8:12 AM Peter Xu <[email protected]> wrote:
>
> On Mon, Mar 11, 2024 at 03:48:14PM +0100, David Hildenbrand wrote:
> > On 11.03.24 15:35, Peter Xu wrote:
> > > On Mon, Mar 11, 2024 at 10:31:41AM +0100, David Hildenbrand wrote:
> > > > On 09.03.24 20:12, Mirsad Todorovac wrote:
> > > > > [... original report trimmed; quoted in full above ...]
> > > > >
> > > >
> > > > Isn't that just the (expected) side effect of UFFDIO_POISON tests?
> > > >
> > > > IOW, there is no problem here. We are poisoning virtual memory locations
> > > > (not actual memory) and expect a SIGBUS on next access. While testing that,
> > > > we receive these messages.
> > >
> > > Correct.
> > >
> > > >
> > > > The "ugly" thing here seems to be that we can trigger repeated pr_err() from
> > > > user space. There is no rate-limiting in place. Maybe UFFDIO_POISON requires
> > > > root permissions so this cannot be exploited by unprivileged user space to
> > > > flood the system log?
> > > >
> > > > CCing Axel
> > >
> > > This is pretty unfortunate.
> > >
> > > I'm not concerned too much on flooding whoever kicks off the selftests, but
> > > indeed this seems to be able to be used by anyone to trigger such endless
> > > reports in dmesg.
> >
> > Right.
> >
> > >
> > > The issue with requiring a privilege means any hypervisor that will need to
> > > use this to emulate memory errors will also require such privilege, and it
> > > can be a problem.
> > >
> >
> > Yes, we don't want that.
> >
> > > Logically such "hwpoison errors" are not real so it is not needed to be
> > > reported in dmesg, but now we're leveraging it to be exactly the same as a
> > > real hw error to share the code path, iiuc (e.g. on MCE injections).
> > >
> > > One option is to use a different marker reflecting that such hwpoison error
> > > is internal, so we don't need to report in dmesg. That'll also require
> > > (besides another bit in pte markers) one extra VM_FAULT_* flag just for
> > > such reports. Might be slightly an overkill, but I don't see another
> > > better way; not reporting HWPOISON will complicate at least kvm use case
> > > even more.
> > >
> > > Or.. does syslog has its own protection in general for such printk floods?
> > > It'll be easier if that's not a concern to flood then, but I'm not sure
> > > from that regard.
> >
> > From what I know, flooding is considered problematic and we fix it up using
> > "Fixes:" commits. See 1b0a151c10a6d823f033023b9fdd9af72a89591b as one
> > "recent" example.
> >
> >
> > Usually we switch to the _ratelimited() functions, maybe
> > pr_warn_ratelimited() is good enough? But we'd lose some details on a "real"
> > MCE storm, though.
>
> Yeah, I didn't consider that previously because I thought leaking MCE
> addresses might be a problem.
>
> But now thinking it again, it'll be great if pr_err_ratelimited() works
> here (I think we'd still want to report them with "err" not "warnings",
> btw).
>
> I don't worry too much on MCE storm, as in that case explicit addresses may
> not be necessary if the whole system is on risk. What I don't know however
> is whether the addresses may still matter if e.g. two continuous MCEs are
> reported in a small time window, and whether those addresses are a concern
> in that case if some got lost.
>
> My MCE experience is pretty limited, so I don't have an answer to that.
> Maybe it can be verified by proposing a patch like that and see whether
> there can be any objections making it rate limtied. I'll leave that to
> Axel to decide how to move forward.
I'd prefer not to require root or CAP_SYS_ADMIN or similar for
UFFDIO_POISON, because those control access to lots more things
besides, which we don't necessarily want the process using UFFD to be
able to do. :/
Ratelimiting seems fairly reasonable to me. I do see the concern about
dropping some addresses though. Perhaps we can mitigate that concern
by defining our own ratelimit interval/burst configuration? Another
idea would be to only ratelimit it if !CONFIG_DEBUG_VM or similar. Not
sure if that's considered valid or not. :)
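To sketch the "own ratelimit interval/burst configuration" idea
(illustrative only -- the state name, the numbers and the helper below are
made up, not existing kernel code; only DEFINE_RATELIMIT_STATE(),
__ratelimit() and pr_err() are real interfaces):

#include <linux/printk.h>
#include <linux/ratelimit.h>
#include <linux/sched.h>

/* Allow a larger burst than DEFAULT_RATELIMIT_BURST (10) before suppressing,
 * while still bounding what user space can pump into the log. */
static DEFINE_RATELIMIT_STATE(hwpoison_kill_rs, 5 * HZ, 100);

static void report_hwpoison_kill(struct task_struct *tsk, unsigned long address)
{
	if (__ratelimit(&hwpoison_kill_rs))
		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
		       tsk->comm, tsk->pid, address);
}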
>
> --
> Peter Xu
>
On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
> I'd prefer not to require root or CAP_SYS_ADMIN or similar for
> UFFDIO_POISON, because those control access to lots more things
> besides, which we don't necessarily want the process using UFFD to be
> able to do. :/
>
> Ratelimiting seems fairly reasonable to me. I do see the concern about
> dropping some addresses though.
Do you know how much an admin could rely on such addresses? How
frequently would MCEs normally be generated on a sane system?
> Perhaps we can mitigate that concern by defining our own ratelimit
> interval/burst configuration?
Any details?
> Another idea would be to only ratelimit it if !CONFIG_DEBUG_VM or
> similar. Not sure if that's considered valid or not. :)
This, OTOH, sounds like overkill.
I just checked the details of the ratelimit code again; by default it
has:
#define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
#define DEFAULT_RATELIMIT_BURST 10
So it allows a burst of 10 rather than 2. IIUC that means even 10
consecutive MCEs within the 5-second interval won't get suppressed; only
from the 11th onwards. I think that makes it even less of a concern to
directly use pr_err_ratelimited().
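To make that concrete, the change would essentially be a one-liner at the
site that prints the message (sketch only; in this report the string
appears to come from the arch fault path, but wherever it lives the
pattern is the same):

	/* today: unconditional, so UFFDIO_POISON users can flood the log */
	pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
	       current->comm, current->pid, address);

	/* rate-limited: with the defaults above, short bursts still print in
	 * full and anything dropped is summarized as "callbacks suppressed" */
	pr_err_ratelimited("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
			   current->comm, current->pid, address);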
Thanks,
--
Peter Xu
On Mon, Mar 11, 2024 at 12:28 PM Peter Xu <[email protected]> wrote:
>
> On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
> > I'd prefer not to require root or CAP_SYS_ADMIN or similar for
> > UFFDIO_POISON, because those control access to lots more things
> > besides, which we don't necessarily want the process using UFFD to be
> > able to do. :/
I agree; UFFDIO_POISON should not require CAP_SYS_ADMIN.
> >
> > Ratelimiting seems fairly reasonable to me. I do see the concern about
> > dropping some addresses though.
>
> Do you know how much could an admin rely on such addresses? How frequent
> would MCE generate normally in a sane system?
I'm not sure how much admins rely on the addresses themselves. +cc
Jiaqi Yan
It's possible for a sane hypervisor dealing with a buggy guest / guest
userspace to trigger lots of these pr_errs. Consider the case where a
guest userspace uses HugeTLB-1G, finds poison (which HugeTLB used to
ignore), and then ignores SIGBUS. It will keep getting MCEs /
SIGBUSes.
The sane hypervisor will use UFFDIO_POISON to prevent the guest from
re-accessing *real* poison, but we will still get the pr_err, and we
still keep injecting MCEs into the guest. We have observed scenarios
like this before.
>
> > Perhaps we can mitigate that concern by defining our own ratelimit
> > interval/burst configuration?
>
> Any details?
>
> > Another idea would be to only ratelimit it if !CONFIG_DEBUG_VM or
> > similar. Not sure if that's considered valid or not. :)
>
> This, OTOH, sounds like an overkill..
>
> I just checked again on the detail of ratelimit code, where we by default
> it has:
>
> #define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
> #define DEFAULT_RATELIMIT_BURST 10
>
> So it allows a 10 times burst rather than 2.. IIUC it means even if
> there're continous 10 MCEs it won't get suppressed, until the 11th came, in
> 5 seconds interval. I think it means it's possibly even less of a concern
> to directly use pr_err_ratelimited().
I'm okay with any rate limiting everyone agrees on. IMO, silencing
these pr_errs if they came from UFFDIO_POISON (or, perhaps, if they
did not come from real hardware MCE events) sounds like the most
correct thing to do, but I don't mind. Just don't make UFFDIO_POISON
require CAP_SYS_ADMIN. :)
Thanks.
On Mon, Mar 11, 2024 at 2:27 PM James Houghton <[email protected]> wrote:
>
> On Mon, Mar 11, 2024 at 12:28 PM Peter Xu <[email protected]> wrote:
> >
> > On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
> > > I'd prefer not to require root or CAP_SYS_ADMIN or similar for
> > > UFFDIO_POISON, because those control access to lots more things
> > > besides, which we don't necessarily want the process using UFFD to be
> > > able to do. :/
>
> I agree; UFFDIO_POISON should not require CAP_SYS_ADMIN.
+1.
>
> > >
> > > Ratelimiting seems fairly reasonable to me. I do see the concern about
> > > dropping some addresses though.
> >
> > Do you know how much could an admin rely on such addresses? How frequent
> > would MCE generate normally in a sane system?
>
> I'm not sure about how much admins rely on the address themselves. +cc
> Jiaqi Yan
I think admins mostly care about MCEs from **real** hardware. For
example, they may choose to perform some maintenance if the number of
hardware DIMM errors, keyed by PFN, exceeds some threshold. And I
think mcelog or /sys/devices/system/node/node${X}/memory_failure are
better tools than dmesg for that. In the case where all memory errors
are emulated by the hypervisor after a live migration, these dmesgs may
confuse admins into thinking there is a DIMM error on the host when
actually there is not. In that sense, silencing the ones emulated by
UFFDIO_POISON makes sense (if it is not too complicated to do).
The SIGBUS (and the logged "MCE: Killing %s:%d due to hardware memory
corruption fault at %lx\n") emitted by the fault handler due to
UFFDIO_POISON is less useful to admins AFAIK. It is for sure crucial to
userspace / vmm / hypervisor, but the SIGBUS sent already contains the
poisoned address (in si_addr from force_sig_mceerr).
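A rough userspace sketch of what the VMM can pull out of the signal itself
(illustrative only; printing from a signal handler is not strictly
async-signal-safe):

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>

static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	/* BUS_MCEERR_AR is what both real poison and UFFDIO_POISON faults
	 * deliver on access; si_addr is the faulting VA and si_addr_lsb
	 * the granularity (page / hugepage shift). */
	if (si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO)
		fprintf(stderr, "poison at %p (lsb %d)\n",
			si->si_addr, si->si_addr_lsb);
}

int main(void)
{
	struct sigaction sa = {
		.sa_sigaction = sigbus_handler,
		.sa_flags     = SA_SIGINFO,
	};

	sigaction(SIGBUS, &sa, NULL);
	/* ... guest memory accesses ... */
	return 0;
}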
>
> It's possible for a sane hypervisor dealing with a buggy guest / guest
> userspace to trigger lots of these pr_errs. Consider the case where a
> guest userspace uses HugeTLB-1G, finds poison (which HugeTLB used to
> ignore), and then ignores SIGBUS. It will keep getting MCEs /
> SIGBUSes.
>
> The sane hypervisor will use UFFDIO_POISON to prevent the guest from
> re-accessing *real* poison, but we will still get the pr_err, and we
> still keep injecting MCEs into the guest. We have observed scenarios
> like this before.
>
> >
> > > Perhaps we can mitigate that concern by defining our own ratelimit
> > > interval/burst configuration?
> >
> > Any details?
> >
> > > Another idea would be to only ratelimit it if !CONFIG_DEBUG_VM or
> > > similar. Not sure if that's considered valid or not. :)
> >
> > This, OTOH, sounds like an overkill..
> >
> > I just checked again on the detail of ratelimit code, where we by default
> > it has:
> >
> > #define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
> > #define DEFAULT_RATELIMIT_BURST 10
> >
> > So it allows a 10 times burst rather than 2.. IIUC it means even if
> > there're continous 10 MCEs it won't get suppressed, until the 11th came, in
> > 5 seconds interval. I think it means it's possibly even less of a concern
> > to directly use pr_err_ratelimited().
>
> I'm okay with any rate limiting everyone agrees on. IMO, silencing
> these pr_errs if they came from UFFDIO_POISON (or, perhaps, if they
> did not come from real hardware MCE events) sounds like the most
> correct thing to do, but I don't mind. Just don't make UFFDIO_POISON
> require CAP_SYS_ADMIN. :)
>
> Thanks.
On Mon, Mar 11, 2024 at 03:28:28PM -0700, Jiaqi Yan wrote:
> On Mon, Mar 11, 2024 at 2:27 PM James Houghton <[email protected]> wrote:
> >
> > On Mon, Mar 11, 2024 at 12:28 PM Peter Xu <[email protected]> wrote:
> > >
> > > On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
> > > > I'd prefer not to require root or CAP_SYS_ADMIN or similar for
> > > > UFFDIO_POISON, because those control access to lots more things
> > > > besides, which we don't necessarily want the process using UFFD to be
> > > > able to do. :/
> >
> > I agree; UFFDIO_POISON should not require CAP_SYS_ADMIN.
>
> +1.
>
>
> >
> > > >
> > > > Ratelimiting seems fairly reasonable to me. I do see the concern about
> > > > dropping some addresses though.
> > >
> > > Do you know how much could an admin rely on such addresses? How frequent
> > > would MCE generate normally in a sane system?
> >
> > I'm not sure about how much admins rely on the address themselves. +cc
> > Jiaqi Yan
>
> I think admins mostly care about MCEs from **real** hardware. For
> example they may choose to perform some maintenance if the number of
> hardware DIMM errors, keyed by PFN, exceeds some threshold. And I
> think mcelog or /sys/devices/system/node/node${X}/memory_failure are
> better tools than dmesg. In the case all memory errors are emulated by
> hypervisor after a live migration, these dmesgs may confuse admins to
> think there is dimm error on host but actually it is not the case. In
> this sense, silencing these emulated by UFFDIO_POISON makes sense (if
> not too complicated to do).
Now we have three types of such errors: (1) PFN poisoned, (2) swapin error,
(3) emulated. Both (1) and (2) deserve a global message dump, while (3)
should be process-internal; nobody else needs to care except the
process itself (via the signal + meta info).
If we want to differentiate (2) vs. (3), we may need one more pte marker bit
to show whether such poison is "global" or "local" (as of now (2) and (3)
share the same PTE_MARKER_POISONED bit); a swapin error can still be seen
as a "global" error (instead of a memory error it can be a disk error, and
the error message still applies to it, describing a corrupted VA). Another
VM_FAULT_* flag is also needed to reflect that locality, so that the global
broadcast is skipped for "local" poison faults.
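On the marker side that could look roughly like below (only the first two
bits exist today; the third is hypothetical and just for illustration):

/* include/linux/swapops.h today (swapin errors and UFFDIO_POISON
 * currently share BIT(1)): */
#define PTE_MARKER_UFFD_WP	BIT(0)
#define PTE_MARKER_POISONED	BIT(1)

/* hypothetical new bit so the userspace-emulated ("local") case can be
 * told apart from the "global" swapin-error case: */
#define PTE_MARKER_UFFD_POISON	BIT(2)

with UFFDIO_POISON installing the new marker instead of
PTE_MARKER_POISONED, and both still resolving to a SIGBUS at fault time.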
>
> SIGBUS (and logged "MCE: Killing %s:%d due to hardware memory
> corruption fault at %lx\n") emit by fault handler due to UFFDIO_POISON
> are less useful to admins AFAIK. They are for sure crucial to
> userspace / vmm / hypervisor, but the SIGBUS sent already contains the
> poisoned address (in si_addr from force_sig_mceerr).
>
> >
> > It's possible for a sane hypervisor dealing with a buggy guest / guest
> > userspace to trigger lots of these pr_errs. Consider the case where a
> > guest userspace uses HugeTLB-1G, finds poison (which HugeTLB used to
> > ignore), and then ignores SIGBUS. It will keep getting MCEs /
> > SIGBUSes.
> >
> > The sane hypervisor will use UFFDIO_POISON to prevent the guest from
> > re-accessing *real* poison, but we will still get the pr_err, and we
> > still keep injecting MCEs into the guest. We have observed scenarios
> > like this before.
> >
> > >
> > > > Perhaps we can mitigate that concern by defining our own ratelimit
> > > > interval/burst configuration?
> > >
> > > Any details?
> > >
> > > > Another idea would be to only ratelimit it if !CONFIG_DEBUG_VM or
> > > > similar. Not sure if that's considered valid or not. :)
> > >
> > > This, OTOH, sounds like an overkill..
> > >
> > > I just checked again on the detail of ratelimit code, where we by default
> > > it has:
> > >
> > > #define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
> > > #define DEFAULT_RATELIMIT_BURST 10
> > >
> > > So it allows a 10 times burst rather than 2.. IIUC it means even if
> > > there're continous 10 MCEs it won't get suppressed, until the 11th came, in
> > > 5 seconds interval. I think it means it's possibly even less of a concern
> > > to directly use pr_err_ratelimited().
> >
> > I'm okay with any rate limiting everyone agrees on. IMO, silencing
> > these pr_errs if they came from UFFDIO_POISON (or, perhaps, if they
> > did not come from real hardware MCE events) sounds like the most
> > correct thing to do, but I don't mind. Just don't make UFFDIO_POISON
> > require CAP_SYS_ADMIN. :)
> >
> > Thanks.
>
--
Peter Xu
On Tue, Mar 12, 2024 at 8:38 AM Peter Xu <[email protected]> wrote:
>
> On Mon, Mar 11, 2024 at 03:28:28PM -0700, Jiaqi Yan wrote:
> > On Mon, Mar 11, 2024 at 2:27 PM James Houghton <[email protected]> wrote:
> > >
> > > On Mon, Mar 11, 2024 at 12:28 PM Peter Xu <[email protected]> wrote:
> > > >
> > > > On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
> > > > > I'd prefer not to require root or CAP_SYS_ADMIN or similar for
> > > > > UFFDIO_POISON, because those control access to lots more things
> > > > > besides, which we don't necessarily want the process using UFFD to be
> > > > > able to do. :/
> > >
> > > I agree; UFFDIO_POISON should not require CAP_SYS_ADMIN.
> >
> > +1.
> >
> >
> > >
> > > > >
> > > > > Ratelimiting seems fairly reasonable to me. I do see the concern about
> > > > > dropping some addresses though.
> > > >
> > > > Do you know how much could an admin rely on such addresses? How frequent
> > > > would MCE generate normally in a sane system?
> > >
> > > I'm not sure about how much admins rely on the address themselves. +cc
> > > Jiaqi Yan
> >
> > I think admins mostly care about MCEs from **real** hardware. For
> > example they may choose to perform some maintenance if the number of
> > hardware DIMM errors, keyed by PFN, exceeds some threshold. And I
> > think mcelog or /sys/devices/system/node/node${X}/memory_failure are
> > better tools than dmesg. In the case all memory errors are emulated by
> > hypervisor after a live migration, these dmesgs may confuse admins to
> > think there is dimm error on host but actually it is not the case. In
> > this sense, silencing these emulated by UFFDIO_POISON makes sense (if
> > not too complicated to do).
>
> Now we have three types of such error: (1) PFN poisoned, (2) swapin error,
> (3) emulated. Both 1+2 should deserve a global message dump, while (3)
> should be process-internal, and nobody else should need to care except the
> process itself (via the signal + meta info).
>
> If we want to differenciate (2) v.s. (3), we may need 1 more pte marker bit
> to show whether such poison is "global" or "local" (while as of now 2+3
> shares the usage of the same PTE_MARKER_POISONED bit); a swapin error can
> still be seen as a "global" error (instead of a mem error, it can be a disk
> error, and the err msg still applies to it describing a VA corrupt).
> Another VM_FAULT_* flag is also needed to reflect that locality, then
> ignore a global broadcast for "local" poison faults.
It's easy to implement, as long as folks aren't too offended by taking
one more bit. :) I can send a patch for this on Monday if there are no
objections.
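Roughly what I have in mind for the fault-handling side (the
VM_FAULT_HWPOISON_LOCAL name is made up; the pr_err today lives in the
arch fault code, e.g. arch/x86/mm/fault.c):

	/* handle_pte_marker() would return the extra bit for the new
	 * "local" marker, and the arch code would skip the global dmesg
	 * print for it while still delivering the SIGBUS: */
	if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
		if (!(fault & VM_FAULT_HWPOISON_LOCAL))
			pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
			       current->comm, current->pid, address);
		force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);
		return;
	}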
>
> >
> > SIGBUS (and logged "MCE: Killing %s:%d due to hardware memory
> > corruption fault at %lx\n") emit by fault handler due to UFFDIO_POISON
> > are less useful to admins AFAIK. They are for sure crucial to
> > userspace / vmm / hypervisor, but the SIGBUS sent already contains the
> > poisoned address (in si_addr from force_sig_mceerr).
> >
> > >
> > > It's possible for a sane hypervisor dealing with a buggy guest / guest
> > > userspace to trigger lots of these pr_errs. Consider the case where a
> > > guest userspace uses HugeTLB-1G, finds poison (which HugeTLB used to
> > > ignore), and then ignores SIGBUS. It will keep getting MCEs /
> > > SIGBUSes.
> > >
> > > The sane hypervisor will use UFFDIO_POISON to prevent the guest from
> > > re-accessing *real* poison, but we will still get the pr_err, and we
> > > still keep injecting MCEs into the guest. We have observed scenarios
> > > like this before.
> > >
> > > >
> > > > > Perhaps we can mitigate that concern by defining our own ratelimit
> > > > > interval/burst configuration?
> > > >
> > > > Any details?
> > > >
> > > > > Another idea would be to only ratelimit it if !CONFIG_DEBUG_VM or
> > > > > similar. Not sure if that's considered valid or not. :)
> > > >
> > > > This, OTOH, sounds like an overkill..
> > > >
> > > > I just checked again on the detail of ratelimit code, where we by default
> > > > it has:
> > > >
> > > > #define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
> > > > #define DEFAULT_RATELIMIT_BURST 10
> > > >
> > > > So it allows a 10 times burst rather than 2.. IIUC it means even if
> > > > there're continous 10 MCEs it won't get suppressed, until the 11th came, in
> > > > 5 seconds interval. I think it means it's possibly even less of a concern
> > > > to directly use pr_err_ratelimited().
> > >
> > > I'm okay with any rate limiting everyone agrees on. IMO, silencing
> > > these pr_errs if they came from UFFDIO_POISON (or, perhaps, if they
> > > did not come from real hardware MCE events) sounds like the most
> > > correct thing to do, but I don't mind. Just don't make UFFDIO_POISON
> > > require CAP_SYS_ADMIN. :)
> > >
> > > Thanks.
> >
>
> --
> Peter Xu
>