perf test 26 in virgin master inspired the below.
The oops is not brand new, apparently arriving sometime after 6.1, as
the stock 6.1.21 Raspian/PiOS kernel does not reproduce but the rpi-
6.4.y pi kernel (https://github.com/raspberrypi/linux.git) branch does.
[ 65.416656] Unable to handle kernel paging request at virtual address ffffffd080a00000
[ 65.416680] Mem abort info:
[ 65.416685] ESR = 0x0000000096000007
[ 65.416690] EC = 0x25: DABT (current EL), IL = 32 bits
[ 65.416696] SET = 0, FnV = 0
[ 65.416701] EA = 0, S1PTW = 0
[ 65.416705] FSC = 0x07: level 3 translation fault
[ 65.416710] Data abort info:
[ 65.416715] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 65.416720] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 65.416725] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 65.416731] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000000f51000
[ 65.416737] [ffffffd080a00000] pgd=10000001fffff003, p4d=10000001fffff003, pud=10000001fffff003, pmd=10000001ffffe003, pte=0000000000000000
[ 65.416758] Internal error: Oops: 0000000096000007 [#1] SMP
[ 65.416764] Modules linked in: rfcomm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep vc4 brcmfmac_wcc 8021q brcmfmac garp stp llc snd_soc_hdmi_codec drm_display_helper hci_uart drm_dma_helper drm_kms_helper btbcm bluetooth cfg80211 bcm2835_v4l2(C) cec videobuf2_vmalloc snd_soc_core videobuf2_memops bcm2835_mmal_vchiq(C) videobuf2_v4l2 videodev snd_pcm_dmaengine snd_bcm2835(C) v3d snd_pcm videobuf2_common snd_timer sg ecdh_generic ecc drm_shmem_helper mc rfkill snd gpu_sched brcmutil i2c_brcmstb raspberrypi_hwmon pwm_bcm2835 i2c_bcm2835 libaes uio_pdrv_genirq uio nvmem_rmem drm nfsd fuse i2c_dev drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[ 65.416903] CPU: 0 PID: 1769 Comm: objdump Tainted: G C 6.5.0.g0a8db05-master #1
[ 65.416909] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 65.416913] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 65.416919] pc : __arch_copy_to_user+0x180/0x240
[ 65.416929] lr : _copy_to_iter+0xe8/0x4d0
[ 65.416936] sp : ffffffc08219b970
[ 65.416939] x29: ffffffc08219b970 x28: ffffffd080a00000 x27: ffffffd081d5cd30
[ 65.416948] x26: 0000000000000000 x25: ffffffd080a00000 x24: 0000000000000000
[ 65.416956] x23: ffffffc08219bd40 x22: ffffffd081c00000 x21: ffffffc08219bd40
[ 65.416964] x20: ffffffd081c005c0 x19: 0000000000000400 x18: 0000000000000000
[ 65.416972] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffd080a00000
[ 65.416980] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 65.416987] x11: 0000008000000000 x10: ffffffd081c005c0 x9 : 0000000000000230
[ 65.416995] x8 : 0000000004000000 x7 : 0000007e08000000 x6 : 000000559ae52840
[ 65.417003] x5 : 000000559ae52c40 x4 : 0000000000000000 x3 : 0000000000000400
[ 65.417010] x2 : 0000000000000380 x1 : ffffffd080a00000 x0 : 000000559ae52840
[ 65.417018] Call trace:
[ 65.417021] __arch_copy_to_user+0x180/0x240
[ 65.417026] read_kcore_iter+0x5b4/0x830
[ 65.417035] proc_reg_read_iter+0x8c/0xe8
[ 65.417042] vfs_read+0x200/0x2a8
[ 65.417049] ksys_read+0x70/0x108
[ 65.417054] __arm64_sys_read+0x24/0x38
[ 65.417060] invoke_syscall+0x50/0x128
[ 65.417067] el0_svc_common.constprop.0+0x68/0x120
[ 65.417073] do_el0_svc+0x40/0xb8
[ 65.417079] el0_svc+0x30/0x98
[ 65.417086] el0t_64_sync_handler+0xc0/0xc8
[ 65.417092] el0t_64_sync+0x190/0x198
[ 65.417098] Code: d503201f d503201f d503201f d503201f (a8c12027)
Hi Mike,
On Fri, Jul 28, 2023 at 06:06:33AM +0200, Mike Galbraith wrote:
> perf test 26 in virgin master inspired the below.
Ah, so this is the 6.5-based panic message below?
> The oops is not brand new, apparently arriving sometime after 6.1, as
> the stock 6.1.21 Raspian/PiOS kernel does not reproduce but the rpi-
> 6.4.y pi kernel (https://github.com/raspberrypi/linux.git) branch does.
>
> [ 65.416656] Unable to handle kernel paging request at virtual address ffffffd080a00000
> [ 65.416680] Mem abort info:
> [ 65.416685] ESR = 0x0000000096000007
> [ 65.416690] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 65.416696] SET = 0, FnV = 0
> [ 65.416701] EA = 0, S1PTW = 0
> [ 65.416705] FSC = 0x07: level 3 translation fault
> [ 65.416710] Data abort info:
> [ 65.416715] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
> [ 65.416720] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 65.416725] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 65.416731] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000000f51000
> [ 65.416737] [ffffffd080a00000] pgd=10000001fffff003, p4d=10000001fffff003, pud=10000001fffff003, pmd=10000001ffffe003, pte=0000000000000000
> [ 65.416758] Internal error: Oops: 0000000096000007 [#1] SMP
> [ 65.416764] Modules linked in: rfcomm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep vc4 brcmfmac_wcc 8021q brcmfmac garp stp llc snd_soc_hdmi_codec drm_display_helper hci_uart drm_dma_helper drm_kms_helper btbcm bluetooth cfg80211 bcm2835_v4l2(C) cec videobuf2_vmalloc snd_soc_core videobuf2_memops bcm2835_mmal_vchiq(C) videobuf2_v4l2 videodev snd_pcm_dmaengine snd_bcm2835(C) v3d snd_pcm videobuf2_common snd_timer sg ecdh_generic ecc drm_shmem_helper mc rfkill snd gpu_sched brcmutil i2c_brcmstb raspberrypi_hwmon pwm_bcm2835 i2c_bcm2835 libaes uio_pdrv_genirq uio nvmem_rmem drm nfsd fuse i2c_dev drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
> [ 65.416903] CPU: 0 PID: 1769 Comm: objdump Tainted: G C 6.5.0.g0a8db05-master #1
> [ 65.416909] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
> [ 65.416913] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 65.416919] pc : __arch_copy_to_user+0x180/0x240
> [ 65.416929] lr : _copy_to_iter+0xe8/0x4d0
> [ 65.416936] sp : ffffffc08219b970
> [ 65.416939] x29: ffffffc08219b970 x28: ffffffd080a00000 x27: ffffffd081d5cd30
> [ 65.416948] x26: 0000000000000000 x25: ffffffd080a00000 x24: 0000000000000000
> [ 65.416956] x23: ffffffc08219bd40 x22: ffffffd081c00000 x21: ffffffc08219bd40
> [ 65.416964] x20: ffffffd081c005c0 x19: 0000000000000400 x18: 0000000000000000
> [ 65.416972] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffd080a00000
> [ 65.416980] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [ 65.416987] x11: 0000008000000000 x10: ffffffd081c005c0 x9 : 0000000000000230
> [ 65.416995] x8 : 0000000004000000 x7 : 0000007e08000000 x6 : 000000559ae52840
> [ 65.417003] x5 : 000000559ae52c40 x4 : 0000000000000000 x3 : 0000000000000400
> [ 65.417010] x2 : 0000000000000380 x1 : ffffffd080a00000 x0 : 000000559ae52840
> [ 65.417018] Call trace:
> [ 65.417021] __arch_copy_to_user+0x180/0x240
> [ 65.417026] read_kcore_iter+0x5b4/0x830
> [ 65.417035] proc_reg_read_iter+0x8c/0xe8
> [ 65.417042] vfs_read+0x200/0x2a8
> [ 65.417049] ksys_read+0x70/0x108
> [ 65.417054] __arm64_sys_read+0x24/0x38
> [ 65.417060] invoke_syscall+0x50/0x128
> [ 65.417067] el0_svc_common.constprop.0+0x68/0x120
> [ 65.417073] do_el0_svc+0x40/0xb8
> [ 65.417079] el0_svc+0x30/0x98
> [ 65.417086] el0t_64_sync_handler+0xc0/0xc8
> [ 65.417092] el0t_64_sync+0x190/0x198
Looking at this quickly with Mark, the most likely explanation is that
a bogus kernel address is being passed as the source pointer to
copy_to_user(). On a whim, are you able to revert 4c91c07c93bb ("mm:
vmalloc: convert vread() to vread_iter()") and see if the issue persists? If
not, maybe you can try the commit before?
Otherwise, we'll see if we can reproduce the issue locally.
Cheers,
Will
On Fri, 2023-07-28 at 15:18 +0100, Will Deacon wrote:
> Hi Mike,
>
> On Fri, Jul 28, 2023 at 06:06:33AM +0200, Mike Galbraith wrote:
> > perf test 26 in virgin master inspired the below.
>
> Ah, so this is the 6.5-based panic message below?
Yeah, that was virgin master.
The oops arrived with 2e1c0170771e, _but_ it turns out to be somehow
config related. The obese distro config rolled forward does not oops,
much leaner local config does, despite cute little rpi4 working just
fine.. for everything _except_ 'perf test 26' :)
Hohum, send report to round file. I'm looking for the option out of
curiosity, but it's not gonna be a long search.
-Mike
On Fri, Jul 28, 2023 at 05:12:43PM +0200, Mike Galbraith wrote:
> On Fri, 2023-07-28 at 15:18 +0100, Will Deacon wrote:
> > On Fri, Jul 28, 2023 at 06:06:33AM +0200, Mike Galbraith wrote:
> > > perf test 26 in virgin master inspired the below.
> >
> > Ah, so this is the 6.5-based panic message below?
>
> Yeah, that was virgin master.
>
> The oops arrived with 2e1c0170771e, _but_ it turns out to be somehow
> config related. The obese distro config rolled forward does not oops,
> much leaner local config does, despite cute little rpi4 working just
> fine.. for everything _except_ 'perf test 26' :)
The commit mentions HARDENED_USERCOPY, so maybe that? In any case, if
you can share your broken config someplace I can try to give it a spin.
Thanks,
Will
On Fri, 2023-07-28 at 16:23 +0100, Will Deacon wrote:
> In any case, if you can share your broken config someplace I can try
> to give it a spin.
Busted master config attached.
-Mike
On Fri, 2023-07-28 at 17:12 +0200, Mike Galbraith wrote:
>
> The oops arrived with 2e1c0170771e, _but_ it turns out to be somehow
> config related. The obese distro config rolled forward does not oops...
It would have had I actually set PROC_KCORE before building the thing.
-Mike
On Fri, 2023-07-28 at 15:18 +0100, Will Deacon wrote:
>
> Looking at this quickly with Mark, the most likely explanation is that
> a bogus kernel address is being passed as the source pointer to
> copy_to_user(). On a whim, are you able to revert 4c91c07c93bb ("mm:
> vmalloc: convert vread() to vread_iter()") and see if the issue persists? If
> not, maybe you can try the commit before?
Ok, did that. Neither helped, I had to revert 2e1c0170771e
6.1.42 clean
6.2.16 not so clean
[ 180.564427] __vm_enough_memory: pid: 1935, comm: objdump, no enough memory for the allocation
[ 180.564445] __vm_enough_memory: pid: 1935, comm: objdump, no enough memory for the allocation
6.4.7 oops
+Revert-mm-vmalloc-convert-vread-to-vread_iter.patch oops
+Revert-iov_iter-add-copy_page_to_iter_nofault.patch oops
+Revert-fs-proc-kcore-convert-read_kcore-to-read_kcore_iter.patch oops
+Revert-fs-proc-kcore-avoid-bounce-buffer-for-ktext-data.patch back to 6.2 induced alloc failures
[ 48.718233] __vm_enough_memory: pid: 1747, comm: objdump, not enough memory for the allocation
[ 48.718264] __vm_enough_memory: pid: 1747, comm: objdump, not enough memory for the allocation
6.4.7 plus only arm64 revert (sorta) of e025ab842ec3 does the same
[ 37.587688] Unhandled KCORE type: 0
[ 37.611042] __vm_enough_memory: pid: 1736, comm: objdump, not enough memory for the allocation
[ 37.611073] __vm_enough_memory: pid: 1736, comm: objdump, not enough memory for the allocation
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -554,6 +554,8 @@ static ssize_t read_kcore_iter(struct ki
fallthrough;
case KCORE_VMEMMAP:
case KCORE_TEXT:
+ if (!kern_addr_valid(start))
+ goto invalid_addr;
/*
* We use _copy_to_iter() to bypass usermode hardening
* which would otherwise prevent this operation.
@@ -563,6 +565,7 @@ static ssize_t read_kcore_iter(struct ki
goto out;
}
break;
+invalid_addr:
default:
pr_warn_once("Unhandled KCORE type: %d\n", m->type);
if (iov_iter_zero(tsz, iter) != tsz) {
On Sun, 2023-07-30 at 07:54 +0200, Mike Galbraith wrote:
>
> 6.1.42 clean
Disregard, git bisect bad bad bad... just informed me that that is
simply not true :/
On Fri, 2023-07-28 at 15:18 +0100, Will Deacon wrote:
>
> Looking at this quickly with Mark, the most likely explanation is that
> a bogus kernel address is being passed as the source pointer to
> copy_to_user().
'start' in read_kcore_iter() is bogus a LOT when running perf test 26,
and that back to at least 5.15. Seems removal of bogon-proofing gave a
toothless old bug teeth, but seemingly only to perf test 26. Rummaging
around with crash vmlinux /proc/kcore seems to be bogon free anyway.
Someone should perhaps take a peek at perf. Bogons aside, it also
doesn't seem to care deeply about kernel response. Whether the kernel
oops or I bat 945 bogons aside, it says 'OK'. That seems a tad odd.
-Mike
[+Lorenzo, Kefeng and others]
On Sun, Jul 30, 2023 at 06:09:15PM +0200, Mike Galbraith wrote:
> On Fri, 2023-07-28 at 15:18 +0100, Will Deacon wrote:
> >
> > Looking at this quickly with Mark, the most likely explanation is that
> > a bogus kernel address is being passed as the source pointer to
> > copy_to_user().
>
> 'start' in read_kcore_iter() is bogus a LOT when running perf test 26,
> and that back to at least 5.15. Seems removal of bogon-proofing gave a
> toothless old bug teeth, but seemingly only to perf test 26. Rummaging
> around with crash vmlinux /proc/kcore seems to be bogon free anyway.
>
> Someone should perhaps take a peek at perf. Bogons aside, it also
> doesn't seem to care deeply about kernel response. Whether the kernel
> oops or I bat 945 bogons aside, it says 'OK'. That seems a tad odd.
Aha, so I think I triggered the issue you're seeing under QEMU (log
below). perf (unhelpfully) doesn't have stable test numbers, so it's
test 21 in my case. However, it only explodes if I run it as root, since
/proc/kcore is 0400 on my system.
The easiest way to trigger the problem is simply:
# objdump -d /proc/kcore
Looking at the history, I wonder whether this is because of a combination
of:
e025ab842ec3 ("mm: remove kern_addr_valid() completely")
which removed the kern_addr_valid() check on the basis that kcore used
copy_from_kernel_nofault() anyway, and:
2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
which replaced the copy_from_kernel_nofault() with _copy_to_user().
So with both of those applied, we're missing the address check on arm64.
Will
--->8
[ 40.136085] Unable to handle kernel paging request at virtual address ffffffdcaabf0000
[ 40.136453] Mem abort info:
[ 40.136570] ESR = 0x0000000096000006
[ 40.136737] EC = 0x25: DABT (current EL), IL = 32 bits
[ 40.136946] SET = 0, FnV = 0
[ 40.137085] EA = 0, S1PTW = 0
[ 40.137232] FSC = 0x06: level 2 translation fault
[ 40.137441] Data abort info:
[ 40.137573] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[ 40.137801] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 40.138005] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 40.138257] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000e51c5000
[ 40.138516] [ffffffdcaabf0000] pgd=100000023ffff003, p4d=100000023ffff003, pud=100000023ffff003, pmd=0000000000000000
[ 40.139218] Internal error: Oops: 0000000096000006 [#1] SMP
[ 40.139515] Modules linked in:
[ 40.139780] CPU: 1 PID: 225 Comm: objdump Not tainted 6.5.0-rc3-v8 #5
[ 40.140038] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 40.140439] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 40.140710] pc : __arch_copy_to_user+0x180/0x240
[ 40.140944] lr : _copy_to_iter+0xe8/0x4b0
[ 40.141115] sp : ffffffc0805b3950
[ 40.141256] x29: ffffffc0805b3950 x28: ffffffdcaabf0000 x27: ffffffdcabfff228
[ 40.141568] x26: ffffffdcaabf0000 x25: ffffffc0805b3d68 x24: 0000000000000001
[ 40.141841] x23: ffffff80c0e2d880 x22: ffffffc0805b3d40 x21: 0000000000000400
[ 40.142119] x20: ffffffdcabe9cd40 x19: 0000000000000400 x18: 0000000000000000
[ 40.142385] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffdcaabf0000
[ 40.142665] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 40.142933] x11: 0000008000000000 x10: ffffffdcabe9cd40 x9 : 0000000000000230
[ 40.143206] x8 : 0000000000101000 x7 : 0000007e08000000 x6 : 00000055c51a9840
[ 40.143479] x5 : 00000055c51a9c40 x4 : 0000000000000000 x3 : 0000000000000400
[ 40.143761] x2 : 0000000000000380 x1 : ffffffdcaabf0000 x0 : 00000055c51a9840
[ 40.144124] Call trace:
[ 40.144258] __arch_copy_to_user+0x180/0x240
[ 40.144433] read_kcore_iter+0x718/0x810
[ 40.144564] proc_reg_read_iter+0x8c/0xe8
[ 40.144720] vfs_read+0x204/0x2a8
[ 40.144846] ksys_read+0x74/0x108
[ 40.144975] __arm64_sys_read+0x24/0x38
[ 40.145113] invoke_syscall+0x4c/0x118
[ 40.145267] el0_svc_common.constprop.0+0x68/0x120
[ 40.145437] do_el0_svc+0x40/0xa8
[ 40.145568] el0_svc+0x30/0x98
[ 40.145702] el0t_64_sync_handler+0xc0/0xc8
[ 40.145866] el0t_64_sync+0x190/0x198
[ 40.146103] Code: d503201f d503201f d503201f d503201f (a8c12027)
[ 40.146518] ---[ end trace 0000000000000000 ]---
On Mon, 2023-07-31 at 11:43 +0100, Will Deacon wrote:
> The easiest way to trigger the problem is simply:
>
> # objdump -d /proc/kcore
>
> Looking at the history, I wonder whether this is because of a combination
> of:
>
> e025ab842ec3 ("mm: remove kern_addr_valid() completely")
>
> which removed the kern_addr_valid() check on the basis that kcore used
> copy_from_kernel_nofault() anyway, and:
>
> 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
>
> which replaced the copy_from_kernel_nofault() with _copy_to_user().
>
> So with both of those applied, we're missing the address check on arm64.
Hm, I see trace_printk() in the future of self and x86_64 desktop box.
While my attempt to bisect the two arm64 allocation failures was doomed
by an apparent inability to reliably type sudo perf test 26, I still
suspect they might be a part of this fossilized bug.. see last two with
funny looking addr/len.
dmesg|tail -2
[ 1506.732999] __vm_enough_memory: pid: 3824, comm: objdump, not enough memory for the allocation of 64913409 pages
[ 1506.733050] __vm_enough_memory: pid: 3824, comm: objdump, not enough memory for the allocation of 64913441 pages
cat trace|grep 3824
...
objdump-3824 [000] ....... 1506.708497: vm_unmapped_area: addr=0x7fa9ac8000 err=0 total_vm=0x52f flags=0x1 len=0x2e7000 lo=0x1000 hi=0x7faa24e000 mask=0x0 ofs=0x0
objdump-3824 [000] ....... 1506.708657: vm_unmapped_area: addr=0x7faa213000 err=0 total_vm=0x816 flags=0x1 len=0x7000 lo=0x1000 hi=0x7faa24e000 mask=0x0 ofs=0x0
objdump-3824 [000] ....... 1506.709740: vm_unmapped_area: addr=0x7fa86f7000 err=0 total_vm=0x81d flags=0x1 len=0x13d1000 lo=0x1000 hi=0x7faa24e000 mask=0x0 ofs=0x0
objdump-3824 [000] ....... 1506.732991: vm_unmapped_area: addr=0x17995df000 err=0 total_vm=0x831 flags=0x1 len=0x3de8001000 lo=0x1000 hi=0x7faa24e000 mask=0x0 ofs=0x0
objdump-3824 [000] ....... 1506.733049: vm_unmapped_area: addr=0x17995bf000 err=0 total_vm=0x831 flags=0x1 len=0x3de8021000 lo=0x1000 hi=0x7faa24e000 mask=0x0 ofs=0x0
objdump-3824 [000] ....... 1506.733846: exit_mmap: mt_mod ffffff8149991980, DESTROY
On Mon, Jul 31, 2023 at 11:43:40AM +0100, Will Deacon wrote:
> [+Lorenzo, Kefeng and others]
>
> On Sun, Jul 30, 2023 at 06:09:15PM +0200, Mike Galbraith wrote:
> > On Fri, 2023-07-28 at 15:18 +0100, Will Deacon wrote:
> > >
> > > Looking at this quickly with Mark, the most likely explanation is that
> > > a bogus kernel address is being passed as the source pointer to
> > > copy_to_user().
> >
> > 'start' in read_kcore_iter() is bogus a LOT when running perf test 26,
> > and that back to at least 5.15. Seems removal of bogon-proofing gave a
> > toothless old bug teeth, but seemingly only to perf test 26. Rummaging
> > around with crash vmlinux /proc/kcore seems to be bogon free anyway.
> >
> > Someone should perhaps take a peek at perf. Bogons aside, it also
> > doesn't seem to care deeply about kernel response. Whether the kernel
> > oops or I bat 945 bogons aside, it says 'OK'. That seems a tad odd.
>
> Aha, so I think I triggered the issue you're seeing under QEMU (log
> below). perf (unhelpfully) doesn't have stable test numbers, so it's
> test 21 in my case. However, it only explodes if I run it as root, since
> /proc/kcore is 0400 on my system.
>
> The easiest way to trigger the problem is simply:
>
> # objdump -d /proc/kcore
>
> Looking at the history, I wonder whether this is because of a combination
> of:
>
> e025ab842ec3 ("mm: remove kern_addr_valid() completely")
>
> which removed the kern_addr_valid() check on the basis that kcore used
> copy_from_kernel_nofault() anyway, and:
>
> 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
>
> which replaced the copy_from_kernel_nofault() with _copy_to_user().
>
> So with both of those applied, we're missing the address check on arm64.
Digging into this a little more, the fault occurs because kcore is
treating everything from '_text' to '_end' as KCORE_TEXT and expects it
to be mapped linearly. However, there's plenty of stuff we _don't_ map
in that range on arm64 (e.g. .head.text, the pKVM hypervisor, the entry
trampoline) so kcore is broken.
One hack is to limit KCORE_TEXT to actually point at the kernel text
(see below), but this is a user-visible change in behaviour for things
like .data so I think it would be better to restore the old behaviour
of handling the faults.
Lorenzo?
Will
--->8
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 9cb32e1a78a0..3696a209c1ec 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -635,7 +635,7 @@ static struct kcore_list kcore_text;
*/
static void __init proc_kcore_text_init(void)
{
- kclist_add(&kcore_text, _text, _end - _text, KCORE_TEXT);
+ kclist_add(&kcore_text, _stext, _etext - _stext, KCORE_TEXT);
}
#else
static void __init proc_kcore_text_init(void)
On Mon, 2023-07-31 at 12:52 +0100, Will Deacon wrote:
>
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index 9cb32e1a78a0..3696a209c1ec 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -635,7 +635,7 @@ static struct kcore_list kcore_text;
> */
> static void __init proc_kcore_text_init(void)
> {
> - kclist_add(&kcore_text, _text, _end - _text, KCORE_TEXT);
> + kclist_add(&kcore_text, _stext, _etext - _stext, KCORE_TEXT);
> }
> #else
> static void __init proc_kcore_text_init(void)
That did kill all 945 bogus start instances, and in a brief test drive,
did not appear to have annoyed crash.
The two odd arm64 specific huge objdump allocations persist fwtw.
Unrelated fossilized bug?.. who knows.
-Mike
On Mon, 31 Jul 2023 at 12:52, Will Deacon <[email protected]> wrote:
>
> On Mon, Jul 31, 2023 at 11:43:40AM +0100, Will Deacon wrote:
> > [+Lorenzo, Kefeng and others]
> >
> > On Sun, Jul 30, 2023 at 06:09:15PM +0200, Mike Galbraith wrote:
> > > On Fri, 2023-07-28 at 15:18 +0100, Will Deacon wrote:
> > > >
> > > > Looking at this quickly with Mark, the most likely explanation is that
> > > > a bogus kernel address is being passed as the source pointer to
> > > > copy_to_user().
> > >
> > > 'start' in read_kcore_iter() is bogus a LOT when running perf test 26,
> > > and that back to at least 5.15. Seems removal of bogon-proofing gave a
> > > toothless old bug teeth, but seemingly only to perf test 26. Rummaging
> > > around with crash vmlinux /proc/kcore seems to be bogon free anyway.
> > >
> > > Someone should perhaps take a peek at perf. Bogons aside, it also
> > > doesn't seem to care deeply about kernel response. Whether the kernel
> > > oops or I bat 945 bogons aside, it says 'OK'. That seems a tad odd.
> >
> > Aha, so I think I triggered the issue you're seeing under QEMU (log
> > below). perf (unhelpfully) doesn't have stable test numbers, so it's
> > test 21 in my case. However, it only explodes if I run it as root, since
> > /proc/kcore is 0400 on my system.
> >
> > The easiest way to trigger the problem is simply:
> >
> > # objdump -d /proc/kcore
> >
> > Looking at the history, I wonder whether this is because of a combination
> > of:
> >
> > e025ab842ec3 ("mm: remove kern_addr_valid() completely")
> >
> > which removed the kern_addr_valid() check on the basis that kcore used
> > copy_from_kernel_nofault() anyway, and:
> >
> > 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
> >
> > which replaced the copy_from_kernel_nofault() with _copy_to_user().
> >
> > So with both of those applied, we're missing the address check on arm64.
>
> Digging into this a little more, the fault occurs because kcore is
> treating everything from '_text' to '_end' as KCORE_TEXT and expects it
> to be mapped linearly. However, there's plenty of stuff we _don't_ map
> in that range on arm64 (e.g. .head.text, the pKVM hypervisor, the entry
> trampoline) so kcore is broken.
>
> One hack is to limit KCORE_TEXT to actually point at the kernel text
> (see below), but this is a user-visible change in behaviour for things
> like .data so I think it would be better to restore the old behaviour
> of handling the faults.
>
> Lorenzo?
FYI there is a parallel discussion at
https://lore.kernel.org/all/ZHc2fm+9daF6cgCE@krava/ :)
[sorry lei isn't playing ball so will have to reply from gmail,
apologies if this breaks formatting]
It'd be a real pity to have to revert that behaviour, as using a
bounce buffer is such a hack and means you have to iterate through a
page at a time...
Either that or a change such that for KCORE_TEXT specifically we
reinstate the bounce buffer and use copy_from_kernel_nofault().
It definitely is a bug in kcore to have ranges of memory that are not
mapped marked as readable. What kind of behaviour changes do you
anticipate exactly with your prospective change re: .data? The
fallthroughs?
kcore as a whole needs some love and attention I think.
An alternative is to implement some version of
copy_from_kernel_nofault() in the iterator code.
However TL;DR - I think we probably do need a semi-revert and to just
make the ktext do a bounce buffer thing. I definitely want to keep the
use of iterators so I would really not want to revert anything else.
>
> Will
>
> --->8
>
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index 9cb32e1a78a0..3696a209c1ec 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -635,7 +635,7 @@ static struct kcore_list kcore_text;
> */
> static void __init proc_kcore_text_init(void)
> {
> - kclist_add(&kcore_text, _text, _end - _text, KCORE_TEXT);
> + kclist_add(&kcore_text, _stext, _etext - _stext, KCORE_TEXT);
> }
> #else
> static void __init proc_kcore_text_init(void)
>
--
Lorenzo Stoakes
https://ljs.io
On Mon, 31 Jul 2023 at 22:08, Lorenzo Stoakes <[email protected]> wrote:
>
> On Mon, 31 Jul 2023 at 12:52, Will Deacon <[email protected]> wrote:
> >
> > On Mon, Jul 31, 2023 at 11:43:40AM +0100, Will Deacon wrote:
> > > [+Lorenzo, Kefeng and others]
> > >
> > > On Sun, Jul 30, 2023 at 06:09:15PM +0200, Mike Galbraith wrote:
> > > > On Fri, 2023-07-28 at 15:18 +0100, Will Deacon wrote:
> > > > >
> > > > > Looking at this quickly with Mark, the most likely explanation is that
> > > > > a bogus kernel address is being passed as the source pointer to
> > > > > copy_to_user().
> > > >
> > > > 'start' in read_kcore_iter() is bogus a LOT when running perf test 26,
> > > > and that back to at least 5.15. Seems removal of bogon-proofing gave a
> > > > toothless old bug teeth, but seemingly only to perf test 26. Rummaging
> > > > around with crash vmlinux /proc/kcore seems to be bogon free anyway.
> > > >
> > > > Someone should perhaps take a peek at perf. Bogons aside, it also
> > > > doesn't seem to care deeply about kernel response. Whether the kernel
> > > > oops or I bat 945 bogons aside, it says 'OK'. That seems a tad odd.
> > >
> > > Aha, so I think I triggered the issue you're seeing under QEMU (log
> > > below). perf (unhelpfully) doesn't have stable test numbers, so it's
> > > test 21 in my case. However, it only explodes if I run it as root, since
> > > /proc/kcore is 0400 on my system.
> > >
> > > The easiest way to trigger the problem is simply:
> > >
> > > # objdump -d /proc/kcore
> > >
> > > Looking at the history, I wonder whether this is because of a combination
> > > of:
> > >
> > > e025ab842ec3 ("mm: remove kern_addr_valid() completely")
> > >
> > > which removed the kern_addr_valid() check on the basis that kcore used
> > > copy_from_kernel_nofault() anyway, and:
> > >
> > > 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
> > >
> > > which replaced the copy_from_kernel_nofault() with _copy_to_user().
> > >
> > > So with both of those applied, we're missing the address check on arm64.
> >
> > Digging into this a little more, the fault occurs because kcore is
> > treating everything from '_text' to '_end' as KCORE_TEXT and expects it
> > to be mapped linearly. However, there's plenty of stuff we _don't_ map
> > in that range on arm64 (e.g. .head.text, the pKVM hypervisor, the entry
> > trampoline) so kcore is broken.
> >
> > One hack is to limit KCORE_TEXT to actually point at the kernel text
> > (see below), but this is a user-visible change in behaviour for things
> > like .data so I think it would be better to restore the old behaviour
> > of handling the faults.
> >
> > Lorenzo?
>
> FYI there is a parallel discussion at
> https://lore.kernel.org/all/ZHc2fm+9daF6cgCE@krava/ :)
>
> [sorry lei isn't playing ball so will have to reply from gmail,
> apologies if this breaks formatting]
>
> It'd be a real pity to have to revert that behaviour, as using a
> bounce buffer is such a hack and means you have to iterate through a
> page at a time...
> Either that or a change such that for KCORE_TEXT specifically we
> reinstate the bounce buffer and use copy_from_kernel_nofault().
>
> It definitely is a bug in kcore to have ranges of memory that are not
> mapped marked as readable. What kind of behaviour changes do you
> anticipate exactly with your prospective change re: .data? The
> fallthroughs?
>
> kcore as a whole needs some love and attention I think.
>
> An alternative is to implement some version of
> copy_from_kernel_nofault() in the iterator code.
>
> However TL;DR - I think we probably do need a semi-revert and to just
> make the ktext do a bounce buffer thing. I definitely want to keep the
> use of iterators so I would really not want to revert anything else.
>
>
> >
> > Will
> >
> > --->8
> >
> > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> > index 9cb32e1a78a0..3696a209c1ec 100644
> > --- a/fs/proc/kcore.c
> > +++ b/fs/proc/kcore.c
> > @@ -635,7 +635,7 @@ static struct kcore_list kcore_text;
> > */
> > static void __init proc_kcore_text_init(void)
> > {
> > - kclist_add(&kcore_text, _text, _end - _text, KCORE_TEXT);
> > + kclist_add(&kcore_text, _stext, _etext - _stext, KCORE_TEXT);
> > }
> > #else
> > static void __init proc_kcore_text_init(void)
> >
>
>
Posted a fix at:-
https://lore.kernel.org/all/[email protected]/
Please give that a go and indicate whether that resolves the issue.
This is in effect a partial revert in order to use the
copy_from_kernel_nofault() function to avoid faults on reading
unmapped regions, and sadly necessitates the use of a bounce buffer.
Hopefully this should be something resembling the smallest change we
can do to resolve the problem.
On Mon, 2023-07-31 at 22:54 +0100, Lorenzo Stoakes wrote:
>
> Posted a fix at:-
>
> https://lore.kernel.org/all/[email protected]/
>
> Please give that a go and indicate whether that resolves the issue.
Yup, that fixed the oops issue.
-Mike
On Tue, 2023-08-01 at 01:27 +0200, Mike Galbraith wrote:
> On Mon, 2023-07-31 at 22:54 +0100, Lorenzo Stoakes wrote:
> >
> > Posted a fix at:-
> >
> > https://lore.kernel.org/all/[email protected]/
> >
> > Please give that a go and indicate whether that resolves the issue.
>
> Yup, that fixed the oops issue.
Heh, using Jiri's pick a spot from /proc/kallsyms test...
root@rpi4:~# cat /proc/kallsyms | grep ksys_read
ffffffe950baf7f8 T ksys_readahead
ffffffe950c38708 T ksys_read
root@rpi4:~# objdump -d --start-address=0xffffffe950c38708 --stop-address=0xffffffe950c3870a /proc/kcore
/proc/kcore: file format elf64-littleaarch64
Disassembly of section load1:
ffffffe950c38708 <load1+0x238708>:
ffffffe950c38708: Address 0xffffffe950c38708 is out of bounds.
objdump: error: /proc/kcore(load2) is too large (0x3de8000000 bytes)
objdump: Reading section load2 failed because: memory exhausted
root@rpi4:~# dmesg|tail -2
[ 979.003019] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
[ 979.003080] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
On Tue, 2023-08-01 at 01:48 +0200, Mike Galbraith wrote:
> On Tue, 2023-08-01 at 01:27 +0200, Mike Galbraith wrote:
> > On Mon, 2023-07-31 at 22:54 +0100, Lorenzo Stoakes wrote:
> > >
> > > Posted a fix at:-
> > >
> > > https://lore.kernel.org/all/[email protected]/
> > >
> > > Please give that a go and indicate whether that resolves the issue.
> >
> > Yup, that fixed the oops issue.
>
> Heh, using Jiri's pick a spot from /proc/kallsyms test...
>
> root@rpi4:~# cat /proc/kallsyms | grep ksys_read
> ffffffe950baf7f8 T ksys_readahead
> ffffffe950c38708 T ksys_read
> root@rpi4:~# objdump -d --start-address=0xffffffe950c38708 --stop-address=0xffffffe950c3870a /proc/kcore
>
> /proc/kcore: file format elf64-littleaarch64
>
>
> Disassembly of section load1:
>
> ffffffe950c38708 <load1+0x238708>:
> ffffffe950c38708: Address 0xffffffe950c38708 is out of bounds.
>
> objdump: error: /proc/kcore(load2) is too large (0x3de8000000 bytes)
> objdump: Reading section load2 failed because: memory exhausted
> root@rpi4:~# dmesg|tail -2
> [ 979.003019] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
> [ 979.003080] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
crash> p ksys_read
ksys_read = $2 =
{ssize_t (unsigned int, char *, size_t)} 0xffffffe950c38710 <ksys_read>
crash> dis 0xffffffe950c38710 10
0xffffffe950c38710 <ksys_read+8>: paciasp
0xffffffe950c38714 <ksys_read+12>: stp x29, x30, [sp, #-64]!
0xffffffe950c38718 <ksys_read+16>: mrs x3, sp_el0
0xffffffe950c3871c <ksys_read+20>: mov x29, sp
0xffffffe950c38720 <ksys_read+24>: stp x21, x22, [sp, #32]
0xffffffe950c38724 <ksys_read+28>: mov x21, x1
0xffffffe950c38728 <ksys_read+32>: mov x22, x2
0xffffffe950c3872c <ksys_read+36>: ldr x1, [x3, #1320]
0xffffffe950c38730 <ksys_read+40>: str x1, [sp, #56]
0xffffffe950c38734 <ksys_read+44>: mov x1, #0x0 // #0
crash>
Ah, so arm64 kallsyms fibs a little.. x64_64 doesn't.
root@rpi4:~# objdump -d --start-address=0xffffffe950c38710 --stop-address=0xffffffe950c38734 /proc/kcore
/proc/kcore: file format elf64-littleaarch64
Disassembly of section load1:
ffffffe950c38710 <load1+0x238710>:
ffffffe950c38710: d503233f paciasp
ffffffe950c38714: a9bc7bfd stp x29, x30, [sp, #-64]!
ffffffe950c38718: d5384103 mrs x3, sp_el0
ffffffe950c3871c: 910003fd mov x29, sp
ffffffe950c38720: a9025bf5 stp x21, x22, [sp, #32]
ffffffe950c38724: aa0103f5 mov x21, x1
ffffffe950c38728: aa0203f6 mov x22, x2
ffffffe950c3872c: f9429461 ldr x1, [x3, #1320]
ffffffe950c38730: f9001fe1 str x1, [sp, #56]
objdump: error: /proc/kcore(load2) is too large (0x3de8000000 bytes)
objdump: Reading section load2 failed because: memory exhausted
root@rpi4:~#
On Tue, 2023-08-01 at 01:48 +0200, Mike Galbraith wrote:
>
> root@rpi4:~# dmesg|tail -2
> [ 979.003019] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
> [ 979.003080] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
Ancient bug droppings are due to VMALLOC_END-VMALLOC_START=265885319168.
-Mike
On Tue, Aug 01, 2023 at 06:34:15AM +0200, Mike Galbraith wrote:
> On Tue, 2023-08-01 at 01:48 +0200, Mike Galbraith wrote:
> >
> > root@rpi4:~# dmesg|tail -2
> > [? 979.003019] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
> > [? 979.003080] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
>
> Ancient bug droppings are due to VMALLOC_END-VMALLOC_START=265885319168.
FWIW, I see the same thing on my x86 laptop. I'm guessing objdump tries
to read all of the vmalloc space into memory, which is going to end
pretty badly!
Will
On Tue, 2023-08-01 at 08:27 +0100, Will Deacon wrote:
> On Tue, Aug 01, 2023 at 06:34:15AM +0200, Mike Galbraith wrote:
> > On Tue, 2023-08-01 at 01:48 +0200, Mike Galbraith wrote:
> > >
> > > root@rpi4:~# dmesg|tail -2
> > > [ 979.003019] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
> > > [ 979.003080] __vm_enough_memory: pid: 12439, comm: objdump, not enough memory for the allocation
> >
> > Ancient bug droppings are due to VMALLOC_END-VMALLOC_START=265885319168.
>
> FWIW, I see the same thing on my x86 laptop. I'm guessing objdump tries
> to read all of the vmalloc space into memory, which is going to end
> pretty badly!
Weird, I don't see that even booting my 16G desktop box with mem=4G,
with a crashdump kernel setup and the GUI up and running...
homer:/root # echo 0 > /proc/sys/vm/overcommit_memory
homer:/root # free -h
total used free shared buff/cache available
Mem: 3.2Gi 1.6Gi 667Mi 18Mi 1.1Gi 1.6Gi
Swap: 4.0Gi 0B 4.0Gi
homer:/root # swapoff -a
homer:/root # perf test 26
26: Object code reading : Ok
homer:/root # dmesg | tail
[ 9.379431] br0: port 1(eth0) entered blocking state
[ 9.379434] br0: port 1(eth0) entered forwarding state
[ 9.379527] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready
[ 9.388643] NET: Registered PF_PACKET protocol family
[ 17.225053] usblp0: removed
[ 17.226278] usblp 2-10:1.1: usblp0: USB Bidirectional printer dev 5 if 1 alt 0 proto 2 vid 0x04E8 pid 0x342E
[ 17.911334] r8169 0000:03:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
[ 18.430787] NFSD: Using UMH upcall client tracking operations.
[ 18.430802] NFSD: starting 90-second grace period (net f0000000)
[ 20.578718] logitech-hidpp-device 0003:046D:401B.0006: HID++ 2.0 device connected.
homer:/root # uname -r
6.4.7-stable
homer:/root #
Same for a limited objdump -d, though unrestricted will oom instantly.
-Mike