Attempting to build strace-4.18 as sparcv9 code and run its test suite
on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails
reliably in three test cases (sched.gen, sched_xetattr.gen, and poll)
because two test binaries (sched_xetattr and poll) OOPS the kernel and
get killed. Sample dmesg from 4.13-rc2:
[42912.270398] Unable to handle kernel NULL pointer dereference
[42912.327717] tsk->{mm,active_mm}->context = 000000000000136a
[42912.383789] tsk->{mm,active_mm}->pgd = fff0000227db4000
[42912.435247] \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
[42912.559982] sched_xetattr(21866): Oops [#1]
[42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1
[42912.672138] task: fff0000229a5c380 task.stack: fff0000227dec000
[42912.732876] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Not tainted
[42912.845079] TPC: <__bzero+0x20/0xc0>
[42912.874870] g0: 0000000000000000 g1: 0000000000000000 g2: 0000003000000000 g3: 00000000008ca100
[42912.972120] g4: fff0000229a5c380 g5: fff000023ef44000 g6: fff0000227dec000 g7: 0000000000000030
[42913.069446] o0: 0000000000000030 o1: fff0000227defe70 o2: 0000000000000000 o3: 0000000000000030
[42913.166765] o4: fff0000227defe70 o5: 0000000000000000 sp: fff0000227def5c1 ret_pc: 0000000000474fa4
[42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150>
[42913.311046] l0: 00000000f7b6caa8 l1: 00000000cccccccd l2: 00000000ffc2e7d4 l3: 00000000f7b6c888
[42913.408293] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 00000000f7ba2000
[42913.505627] i0: 0000000000000000 i1: 00000000f79f1ffc i2: 0000000000000000 i3: 0000000000000000
[42913.602940] i4: fff0000227defe70 i5: fff0000227defe70 i6: fff0000227def6a1 i7: 00000000004061b4
[42913.700268] I7: <linux_sparc_syscall32+0x34/0x60>
[42913.744966] Call Trace:
[42913.759938] [00000000004061b4] linux_sparc_syscall32+0x34/0x60
[42913.820656] Disabling lock debugging due to kernel taint
[42913.873374] Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60
[42913.940953] Caller[0000000000010ed0]: 0x10ed0
[42913.981081] Instruction DUMP:
[42913.981085] c56a2000
[42914.002817] 808a2003
[42914.016643] 02480006
[42914.030363] <d42a2000>
[42914.044094] 90022001
[42914.057816] 808a2003
[42914.071539] 1247fffd
[42914.085269] 92226001
[42914.098992] 808a2007
[42914.471525] Unable to handle kernel NULL pointer dereference
[42914.528830] tsk->{mm,active_mm}->context = 00000000000017cd
[42914.584862] tsk->{mm,active_mm}->pgd = fff0000227b78000
[42914.636319] \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
[42914.761013] sched_xetattr(22483): Oops [#2]
[42914.798837] CPU: 0 PID: 22483 Comm: sched_xetattr Tainted: G D 4.13.0-rc2 #1
[42914.889222] task: fff000123c73bc00 task.stack: fff0001238998000
[42914.949915] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Tainted: G D
[42915.078076] TPC: <__bzero+0x20/0xc0>
[42915.107875] g0: 0000000000000000 g1: 0000000000000000 g2: 0000003000000000 g3: 00000000008ca100
[42915.205205] g4: fff000123c73bc00 g5: fff000023ef44000 g6: fff0001238998000 g7: 0000000000000030
[42915.302532] o0: 0000000000000030 o1: fff000123899be70 o2: 0000000000000000 o3: 0000000000000030
[42915.399851] o4: fff000123899be70 o5: 0000000000000000 sp: fff000123899b5c1 ret_pc: 0000000000474fa4
[42915.501731] RPC: <SyS_sched_setattr+0xb0/0x150>
[42915.544033] l0: 00000000f784caa8 l1: 00000000cccccccd l2: 00000000ff91c7d4 l3: 00000000f784c888
[42915.641289] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 00000000f7882000
[42915.738582] i0: 0000000000000000 i1: 00000000f76d1ffc i2: 0000000000000000 i3: 0000000000000000
[42915.835827] i4: fff000123899be70 i5: fff000123899be70 i6: fff000123899b6a1 i7: 00000000004061b4
[42915.933160] I7: <linux_sparc_syscall32+0x34/0x60>
[42915.977822] Call Trace:
[42915.992698] [00000000004061b4] linux_sparc_syscall32+0x34/0x60
[42916.053335] Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60
[42916.120934] Caller[0000000000010ed0]: 0x10ed0
[42916.161052] Instruction DUMP:
[42916.161056] c56a2000
[42916.182878] 808a2003
[42916.196607] 02480006
[42916.210330] <d42a2000>
[42916.224052] 90022001
[42916.237781] 808a2003
[42916.251502] 1247fffd
[42916.265224] 92226001
[42916.278955] 808a2007
[42918.071476] ------------[ cut here ]------------
[42918.115146] WARNING: CPU: 0 PID: 23177 at arch/sparc/kernel/sys_sparc32.c:150 compat_SyS_sparc_sigaction+0x34/0x4c
[42918.234167] Modules linked in: af_packet ipv6 hid_generic tg3 hwmon i2c_ali1535 ohci_pci ptp ohci_hcd evdev i2c_core pps_core flash sr_mod cdrom pata_ali libata
[42918.405845] CPU: 0 PID: 23177 Comm: sigaction Tainted: G D 4.13.0-rc2 #1
[42918.491645] Call Trace:
[42918.506518] [0000000000455b18] __warn+0xb4/0xc4
[42918.549976] [00000000004449e4] compat_SyS_sparc_sigaction+0x34/0x4c
[42918.616319] [00000000004061b4] linux_sparc_syscall32+0x34/0x60
[42918.677014] ---[ end trace 4800f70b0fef934e ]---
[42947.617187] Unable to handle kernel NULL pointer dereference
[42947.674440] tsk->{mm,active_mm}->context = 00000000000018d3
[42947.730560] tsk->{mm,active_mm}->pgd = fff0000202a04000
[42947.782020] \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
[42947.906726] poll(31644): Oops [#3]
[42947.934244] CPU: 0 PID: 31644 Comm: poll Tainted: G D W 4.13.0-rc2 #1
[42948.014399] task: fff000023c68cb00 task.stack: fff0000227adc000
[42948.075064] TSTATE: 0000004411001601 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Tainted: G D W
[42948.203275] TPC: <__bzero+0x20/0xc0>
[42948.233069] g0: fff000123c5a8828 g1: 0000000000000000 g2: 0000000000000000 g3: 00000000008ca100
[42948.330322] g4: fff000023c68cb00 g5: fff000023ef44000 g6: fff0000227adc000 g7: 0000000000000008
[42948.427651] o0: 000000000000000c o1: fff0000227adfa80 o2: 0000000000000000 o3: 000000000000000c
[42948.524959] o4: fff0000227adfa7c o5: 00000000000000fb sp: fff0000227adf181 ret_pc: 0000000000516ee0
[42948.626876] RPC: <do_sys_poll+0x80/0x3c0>
[42948.662408] l0: 0000000000000002 l1: 00000000014000c0 l2: 00000000000003fe l3: fff0000227adfa7c
[42948.759738] l4: 0000000000000000 l5: 0000000000000000 l6: 000000000000006d l7: ffffffffffffffea
[42948.857064] i0: 00000000f7dbdff8 i1: 0000000000000002 i2: fff0000227adfe90 i3: fff0000227adfa70
[42948.954389] i4: 000ffffdd8520590 i5: fff0000227adfa70 i6: fff0000227adf5e1 i7: 00000000005177f8
[42949.051703] I7: <SyS_poll+0x74/0xd0>
[42949.081507] Call Trace:
[42949.096407] [00000000005177f8] SyS_poll+0x74/0xd0
[42949.142242] [00000000004061b4] linux_sparc_syscall32+0x34/0x60
[42949.202876] Caller[00000000005177f8]: SyS_poll+0x74/0xd0
[42949.255596] Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60
[42949.323177] Caller[0000000000010a20]: 0x10a20
[42949.363284] Instruction DUMP:
[42949.363288] c56a2000
[42949.385037] 808a2003
[42949.398841] 02480006
[42949.412564] <d42a2000>
[42949.426287] 90022001
[42949.440034] 808a2003
[42949.453739] 1247fffd
[42949.467465] 92226001
[42949.481188] 808a2007
[42965.393520] pc[534]: segfault at 1085c ip 000000000001085c (rpc 0000000000010854) sp 00000000ffba8da8 error 30001 in pc[20000+2000]
This occurs reliably with the 4.13-rc2, 4.13-rc1, and 4.12.0 kernels.
With 4.11.0 and older kernels the binaries get some EFAULTs but they
survive that, and there are also no OOPSes.
/Mikael
From: Mikael Pettersson <[email protected]>
Date: Thu, 27 Jul 2017 21:45:25 +0200
> Attempting to build strace-4.18 as sparcv9 code and run its test suite
> on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails
> reliably in three test cases (sched.gen, sched_xetattr.gen, and poll)
> because two test binaries (sched_xetattr and poll) OOPS the kernel and
> get killed. Sample dmesg from 4.13-rc2:
>
> [42912.270398] Unable to handle kernel NULL pointer dereference
> [42912.327717] tsk->{mm,active_mm}->context = 000000000000136a
> [42912.383789] tsk->{mm,active_mm}->pgd = fff0000227db4000
> [42912.435247] \|/ ____ \|/
> "@'/ .. \`@"
> /_| \__/ |_\
> \__U_/
> [42912.559982] sched_xetattr(21866): Oops [#1]
> [42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1
> [42912.672138] task: fff0000229a5c380 task.stack: fff0000227dec000
> [42912.732876] TSTATE: 0000004411001603 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Not tainted
> [42912.845079] TPC: <__bzero+0x20/0xc0>
> [42912.874870] g0: 0000000000000000 g1: 0000000000000000 g2: 0000003000000000 g3: 00000000008ca100
> [42912.972120] g4: fff0000229a5c380 g5: fff000023ef44000 g6: fff0000227dec000 g7: 0000000000000030
> [42913.069446] o0: 0000000000000030 o1: fff0000227defe70 o2: 0000000000000000 o3: 0000000000000030
> [42913.166765] o4: fff0000227defe70 o5: 0000000000000000 sp: fff0000227def5c1 ret_pc: 0000000000474fa4
> [42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150>
This looks really strange. It is a memset() call with the buffer pointer
and length arguments reversed.
What exact command did you give to configure and build strace-4.18 so that
I can try to reproduce this?
Thanks.
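
One way to cross-check that reading of the oops is to decode the word
marked <d42a2000> in the Instruction DUMP, which is the faulting
instruction. The sketch below is a minimal decoder for SPARC format-3
load/store words; the function names and the small opcode table are
illustrative only (they are not from this thread) and cover just a few
integer loads and stores.

```python
# Minimal decoder for SPARC format-3 memory instructions (op == 3).
# Only a handful of integer load/store opcodes are listed here; this is
# an illustrative sketch, not a full disassembler.

# Register numbers 0-7 are %g0-%g7, 8-15 %o0-%o7, 16-23 %l0-%l7, 24-31 %i0-%i7.
REGS = [f"%{bank}{n}" for bank in "goli" for n in range(8)]

OP3_NAMES = {
    0x00: "lduw", 0x01: "ldub", 0x02: "lduh", 0x03: "ldd",
    0x04: "stw",  0x05: "stb",  0x06: "sth",  0x07: "std",
    0x0B: "ldx",  0x0E: "stx",
}


def sign_extend13(value):
    """Interpret a 13-bit field as a signed two's-complement number."""
    return value - 0x2000 if value & 0x1000 else value


def decode_mem_insn(word):
    """Return assembly text for a format-3 load/store instruction word."""
    if (word >> 30) & 0x3 != 3:
        raise ValueError(f"not a load/store instruction: {word:#010x}")
    op3 = (word >> 19) & 0x3F
    if op3 not in OP3_NAMES:
        raise ValueError(f"op3 {op3:#04x} is not in this sketch's table")
    name = OP3_NAMES[op3]
    rd = REGS[(word >> 25) & 0x1F]
    rs1 = REGS[(word >> 14) & 0x1F]
    if (word >> 13) & 1:                 # i = 1: register + signed immediate
        addr = f"[{rs1} + {sign_extend13(word & 0x1FFF)}]"
    else:                                # i = 0: register + register
        addr = f"[{rs1} + {REGS[word & 0x1F]}]"
    # Stores name the data register first, loads name the address first.
    if name.startswith("st"):
        return f"{name} {rd}, {addr}"
    return f"{name} {addr}, {rd}"


print(decode_mem_insn(0xD42A2000))
```

The faulting word decodes to a byte store through %o0. In the register
dump %o0 is 0x30 while %o1 holds a kernel stack address, which is
consistent with the pointer and length having reached __bzero swapped.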
David Miller writes:
> This looks really strange. It is a memset() call with the buffer pointer
> and length arguments reversed.
>
> What exact command did you give to configure and build strace-4.18 so that
> I can try to reproduce this?
It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the
build log the following should do it:
export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc'
./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu \
  --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr \
  --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share \
  --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec \
  --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man \
  --infodir=/usr/share/info
make -j2
make -j2 -k check VERBOSE=1
/Mikael
From: Mikael Pettersson <[email protected]>
Date: Fri, 28 Jul 2017 10:45:15 +0200
> It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the
> build log the following should do it:
>
> export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc'
> ./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu \
>   --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr \
>   --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share \
>   --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec \
>   --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man \
>   --infodir=/usr/share/info
> make -j2
> make -j2 -k check VERBOSE=1
I guess your gcc is emitting 64-bit code by default?
I ask because simply using that configure line doesn't cause any problems
for me, and I get 32-bit binaries from the build.
From: David Miller <[email protected]>
Date: Fri, 28 Jul 2017 11:27:41 -0700 (PDT)
> I guess your gcc is emitting 64-bit code by default?
>
> Because simply using that configure line doesn't cause any problems for me and
> I get 32-bit binaries from the build.
I've also just done a forced 64-bit build with
CC="gcc -m64" ./configure --build=sparc64-unknown-linux-gnu ...
and it built just fine and the testsuite ran without incident.
I cannot reproduce your crashes at all.
Please provide me with the binaries you have which trigger the OOPS
and tell me exactly how to run them.
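
Since the question of 32-bit versus 64-bit test binaries came up, the
ELF header of each binary can be inspected directly. The following is a
minimal sketch, assuming only the standard ELF header layout; the helper
names are made up for illustration and the machine table lists just the
SPARC-related e_machine values.

```python
import struct

ELF_CLASSES = {1: "32-bit", 2: "64-bit"}           # e_ident[EI_CLASS]
# SPARC-related e_machine values as defined in the ELF specification.
SPARC_MACHINES = {2: "EM_SPARC", 18: "EM_SPARC32PLUS", 43: "EM_SPARCV9"}


def describe_elf_header(header):
    """Summarise the class and machine fields of an ELF header (>= 20 bytes)."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in ELF_CLASSES or ei_data not in (1, 2):
        raise ValueError("unrecognised EI_CLASS or EI_DATA")
    byte_order = "<" if ei_data == 1 else ">"      # 1 = little-endian, 2 = big-endian
    # e_type and e_machine are the two 16-bit fields after the 16-byte e_ident.
    _e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    machine = SPARC_MACHINES.get(e_machine, f"e_machine={e_machine}")
    return f"{ELF_CLASSES[ei_class]} {machine}"


def describe_elf_file(path):
    """Read just enough of a file to describe its ELF header."""
    with open(path, "rb") as f:
        return describe_elf_header(f.read(20))
```

Calling describe_elf_file() on the sched_xetattr and poll test binaries
(wherever the build placed them) would show whether a given build
produced 32-bit or 64-bit objects.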
On Fri, Jul 28, 2017 at 11:45 AM, Mikael Pettersson
<[email protected]> wrote:
> It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but according to the
> build log the following should do it:
>
> export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -mcpu=ultrasparc'
> ./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu \
>   --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr \
>   --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share \
>   --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec \
>   --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man \
>   --infodir=/usr/share/info
> make -j2
> make -j2 -k check VERBOSE=1
Can't reproduce it here on a Debian sparc64 LDOM.
I used the git version of strace ( https://github.com/strace/strace ):
strace$ make -j24 check VERBOSE=1
...
============================================================================
Testsuite summary for strace 4.18.0.134.805d
============================================================================
# TOTAL: 443
# PASS: 389
# SKIP: 40
# XFAIL: 0
# FAIL: 14
# XPASS: 0
# ERROR: 0
while in kernel logs (journalctl -k -f):
Jul 29 12:49:22 ttip kernel: mmap: remap_file_page (77341) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.txt.
Jul 29 12:49:22 ttip kernel: capability: warning: `caps' uses deprecated v2 capabilities in a way that may be insecure
Jul 29 12:49:22 ttip kernel: capability: warning: `caps' uses 32-bit capabilities (legacy support in use)
Jul 29 12:49:22 ttip kernel: ------------[ cut here ]------------
Jul 29 12:49:22 ttip kernel: WARNING: CPU: 18 PID: 78388 at arch/sparc/kernel/sys_sparc32.c:150 compat_SyS_sparc_sigaction+0x3c/0x60
Jul 29 12:49:22 ttip kernel: Modules linked in: tcp_diag inet_diag xfrm_user xfrm_algo nfnetlink netlink_diag xt_tcpudp xt_multiport xt_conntrack tun iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xfs camellia_sparc64 des_sparc64 des_generic aes_sparc64 n2_rng md5_sparc64 rng_core flash sha512_sparc64 sha256_sparc64 sha1_sparc64 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat nf_conntrack libcrc32c crc32c_generic ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_sparc64
Jul 29 12:49:22 ttip kernel: CPU: 18 PID: 78388 Comm: sigaction Not tainted 4.13.0-rc2-00220-g0a07b238e5f4 #376
Jul 29 12:49:22 ttip kernel: Call Trace:
Jul 29 12:49:22 ttip kernel: [000000000046c074] __warn+0xb4/0xe0
Jul 29 12:49:22 ttip kernel: [000000000046c120] warn_slowpath_null+0x20/0x40
Jul 29 12:49:22 ttip kernel: [000000000044b7bc] compat_SyS_sparc_sigaction+0x3c/0x60
Jul 29 12:49:22 ttip kernel: [00000000004061d4] linux_sparc_syscall32+0x34/0x60
Jul 29 12:49:22 ttip kernel: ---[ end trace 1ad5184278304e6d ]---
Jul 29 12:49:25 ttip kernel: pc[83378]: segfault at 70000974 ip 0000000070000974 (rpc 000000007000096c) sp 00000000ffdd9438 error 30001 in pc[70010000+2000]
Anatoly Pugachev writes:
> cant' reproduce it here on debian sparc64 LDOM:
DaveM was also unable to reproduce it.
I'll be investigating a possible kernel miscompile next.
Mikael Pettersson writes:
> DaveM was also unable to reproduce it.
>
> I'll be investigating a possible kernel miscompile next.
I don't think it's a miscompile.
First I recompiled 4.13-rc2 with each of gcc-7, gcc-6, and gcc-5, each
bootstrapped and regtested from the head of the respective FSF GCC branch:
no change, kernel 4.11 works while kernels >= 4.12 OOPS. So a miscompile
seems unlikely.
Then I ran a git bisect between v4.11 (good) and v4.12 (bad), booting
each kernel and trying the problematic strace test binaries. That
identified the following as the first bad commit:
commit 31af2f36d50e3b9b2fb7f17aa430c11c91f946c4
Author: Al Viro <[email protected]>
Date: Tue Mar 21 17:04:45 2017 -0400
sparc: switch to RAW_COPY_USER
... and drop zeroing in sparc32.
Signed-off-by: Al Viro <[email protected]>
That touches the CPU-model-specific assembly code in arch/sparc/lib/ for
copy_{to,from}_user and changes how it's wired into the rest of the kernel.
There is different code for different UltraSPARC and Niagara generations,
so if there is a bug in e.g. the USIII code, you won't see it on Niagara.
Unfortunately I don't see anything obviously wrong in Al's patch...
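
For readers unfamiliar with the procedure, the bisection described above
is essentially a binary search between a known-good and a known-bad
point. The sketch below illustrates the idea only; it is not how git
bisect is implemented (git works on a commit graph, not a flat list),
and the commit labels and the oopses() callback are hypothetical
stand-ins for "boot this kernel and run the strace test binaries".

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first
    bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid       # the first bad commit is at mid or earlier
        else:
            good = mid      # the first bad commit is after mid
    return commits[bad]


# Hypothetical linear history; "c5" plays the role of the offending commit.
history = ["v4.11"] + [f"c{i}" for i in range(1, 9)] + ["v4.12"]
tested = []


def oopses(commit):
    """Stand-in for booting a kernel and checking whether the tests oops."""
    tested.append(commit)
    return history.index(commit) >= history.index("c5")


print(first_bad(history, oopses), "found after testing", len(tested), "kernels")
```

Each step halves the remaining range, so the number of kernels to build
and boot grows only logarithmically with the number of commits between
the two endpoints.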
/Mikael
On Mon, Jul 31, 2017 at 8:14 PM, Mikael Pettersson <[email protected]> wrote:
> That touches the CPU model specific assembly code in arch/sparc/lib/ for
> copy_{to,from}_user and changes how it's wired into the rest of the kernel.
> There's different code for different UltraSPARC and Niagara generations,
> so if there is a bug in e.g. the USIII code, you won't see it on Niagara.
Just to let you know, I have just reproduced this OOPS on my v215 running Debian sparc64:
Aug 01 00:34:56 v215 kernel: capability: warning: `caps' uses deprecated v2 capabilities in a way that may be insecure
Aug 01 00:34:56 v215 kernel: capability: warning: `caps' uses 32-bit capabilities (legacy support in use)
Aug 01 00:35:00 v215 kernel: Netfilter messages via NETLINK v0.30.
Aug 01 00:35:00 v215 kernel: Initializing XFRM netlink socket
Aug 01 00:35:09 v215 kernel: mmap: remap_file_page (1155) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.txt.
Aug 01 00:35:10 v215 kernel: Unable to handle kernel NULL pointer dereference
Aug 01 00:35:10 v215 kernel: tsk->{mm,active_mm}->context = 0000000000000de6
Aug 01 00:35:10 v215 kernel: tsk->{mm,active_mm}->pgd = fff000123d478000
Aug 01 00:35:10 v215 kernel: \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
Aug 01 00:35:11 v215 kernel: sched_xetattr(1527): Oops [#1]
Aug 01 00:35:11 v215 kernel: CPU: 1 PID: 1527 Comm: sched_xetattr Not tainted 4.12.0 #365
Aug 01 00:35:11 v215 kernel: task: fff0001231d41340 task.stack: fff000123dfc4000
Aug 01 00:35:11 v215 kernel: TSTATE: 0000004411001604 TPC: 0000000000a121fc TNPC: 0000000000a12210 Y: 00000000 Not tainted
Aug 01 00:35:11 v215 kernel: TPC: <__bzero+0x20/0xc0>
Aug 01 00:35:11 v215 kernel: g0: fff000123dfc7d20 g1: 0000000000000000 g2: 0000003000000000 g3: 0000000000000000
Aug 01 00:35:11 v215 kernel: g4: fff0001231d41340 g5: fff000123ed08000 g6: fff000123dfc4000 g7: 0000000000000030
Aug 01 00:35:11 v215 kernel: o0: 0000000000000030 o1: fff000123dfc7e70 o2: 0000000000000000 o3: 0000000000000030
Aug 01 00:35:11 v215 kernel: o4: fff000123dfc7e70 o5: 000000000000000a sp: fff000123dfc75c1 ret_pc: 000000000049b294
Aug 01 00:35:11 v215 kernel: RPC: <SyS_sched_setattr+0x174/0x1a0>
Aug 01 00:35:11 v215 kernel: l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000000000 l3: 0000000000000000
Aug 01 00:35:11 v215 kernel: l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 00000000f7d58000
Aug 01 00:35:12 v215 kernel: i0: 0000000000000000 i1: 00000000f7bc5ffc i2: 0000000000000000 i3: fff000123dfc7e70
Aug 01 00:35:12 v215 kernel: i4: 0000000000000000 i5: fff000123dfc7e70 i6: fff000123dfc76a1 i7: 00000000004061b4
Aug 01 00:35:12 v215 kernel: I7: <linux_sparc_syscall32+0x34/0x60>
Aug 01 00:35:12 v215 kernel: Call Trace:
Aug 01 00:35:12 v215 kernel: [00000000004061b4] linux_sparc_syscall32+0x34/0x60
Aug 01 00:35:12 v215 kernel: Disabling lock debugging due to kernel taint
Aug 01 00:35:12 v215 kernel: Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60
Aug 01 00:35:12 v215 kernel: Caller[000000007000117c]: 0x7000117c
Aug 01 00:35:12 v215 kernel: Instruction DUMP:
Aug 01 00:35:12 v215 kernel: c56a2000
Aug 01 00:35:12 v215 kernel: 808a2003
Aug 01 00:35:12 v215 kernel: 02480006
Aug 01 00:35:12 v215 kernel: <d42a2000>
Aug 01 00:35:12 v215 kernel: 90022001
Aug 01 00:35:12 v215 kernel: 808a2003
Aug 01 00:35:12 v215 kernel: 1247fffd
Aug 01 00:35:12 v215 kernel: 92226001
Aug 01 00:35:12 v215 kernel: 808a2007
Aug 01 00:35:12 v215 kernel:
Aug 01 00:35:13 v215 kernel: Unable to handle kernel NULL pointer dereference
Aug 01 00:35:13 v215 kernel: tsk->{mm,active_mm}->context = 00000000000012cb
Aug 01 00:35:14 v215 kernel: tsk->{mm,active_mm}->pgd = fff0001230a12000
Aug 01 00:35:14 v215 kernel: \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
Aug 01 00:35:14 v215 kernel: sched_xetattr(2216): Oops [#2]
Aug 01 00:35:14 v215 kernel: CPU: 0 PID: 2216 Comm: sched_xetattr
Tainted: G D 4.12.0 #365
Aug 01 00:35:14 v215 kernel: task: fff0001231d41340 task.stack: fff0001232754000
Aug 01 00:35:14 v215 kernel: TSTATE: 0000004411001601 TPC:
0000000000a121fc TNPC: 0000000000a12210 Y: 00000000 Tainted: G
D
Aug 01 00:35:14 v215 kernel: TPC: <__bzero+0x20/0xc0>
Aug 01 00:35:14 v215 kernel: g0: fff0001232757d20 g1: 0000000000000000
g2: 0000003000000000 g3: 0000000000000000
Aug 01 00:35:14 v215 kernel: g4: fff0001231d41340 g5: fff000123eb08000
g6: fff0001232754000 g7: 0000000000000030
Aug 01 00:35:14 v215 kernel: o0: 0000000000000030 o1: fff0001232757e70
o2: 0000000000000000 o3: 0000000000000030
Aug 01 00:35:14 v215 kernel: o4: fff0001232757e70 o5: 000000000000000a
sp: fff00012327575c1 ret_pc: 000000000049b294
Aug 01 00:35:14 v215 kernel: RPC: <SyS_sched_setattr+0x174/0x1a0>
Aug 01 00:35:14 v215 kernel: l0: 0000000000000000 l1: 0000000000000000
l2: 0000000000000000 l3: 0000000000000000
Aug 01 00:35:14 v215 kernel: l4: 0000000000000000 l5: 0000000000000000
l6: 0000000000000000 l7: 00000000f7cdc000
Aug 01 00:35:14 v215 kernel: i0: 0000000000000000 i1: 00000000f7b49ffc
i2: 0000000000000000 i3: fff0001232757e70
Aug 01 00:35:15 v215 kernel: i4: 0000000000000000 i5: fff0001232757e70
i6: fff00012327576a1 i7: 00000000004061b4
Aug 01 00:35:15 v215 kernel: I7: <linux_sparc_syscall32+0x34/0x60>
Aug 01 00:35:15 v215 kernel: Call Trace:
Aug 01 00:35:15 v215 kernel: [00000000004061b4] linux_sparc_syscall32+0x34/0x60
Aug 01 00:35:15 v215 kernel: Caller[00000000004061b4]:
linux_sparc_syscall32+0x34/0x60
Aug 01 00:35:15 v215 kernel: Caller[000000007000117c]: 0x7000117c
Aug 01 00:35:15 v215 kernel: Instruction DUMP:
Aug 01 00:35:15 v215 kernel: c56a2000
Aug 01 00:35:15 v215 kernel: 808a2003
Aug 01 00:35:15 v215 kernel: 02480006
Aug 01 00:35:15 v215 kernel: <d42a2000>
Aug 01 00:35:15 v215 kernel: 90022001
Aug 01 00:35:15 v215 kernel: 808a2003
Aug 01 00:35:15 v215 kernel: 1247fffd
Aug 01 00:35:15 v215 kernel: 92226001
Aug 01 00:35:15 v215 kernel: 808a2007
Aug 01 00:35:15 v215 kernel:
Aug 01 00:35:16 v215 kernel: ------------[ cut here ]------------
Aug 01 00:35:16 v215 kernel: WARNING: CPU: 0 PID: 2900 at
arch/sparc/kernel/sys_sparc32.c:150
compat_SyS_sparc_sigaction+0x54/0x80
Aug 01 00:35:16 v215 kernel: Modules linked in: xfrm_user xfrm_algo
tcp_diag inet_diag af_packet_diag netlink_diag unix_diag nfnetlink
ohci_pci ata_generic tg3 ohci_hcd ehci_pci ptp ehci_hcd pps_core
libphy usbcore pata_ali libata sg flash jitterentropy_rng ip_tables
x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto raid10 raid456
libcrc32c crc32c_generic async_raid6_recov async_memcpy async_pq
raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod
dm_mod dax sd_mod mptsas scsi_transport_sas mptscsih scsi_mod mptbase
Aug 01 00:35:17 v215 kernel: CPU: 0 PID: 2900 Comm: sigaction Tainted:
G D 4.12.0 #365
Aug 01 00:35:17 v215 kernel: Call Trace:
Aug 01 00:35:17 v215 kernel: [000000000046b900] __warn+0xc0/0xe0
Aug 01 00:35:17 v215 kernel: [000000000046b9e0] warn_slowpath_null+0x20/0x40
Aug 01 00:35:17 v215 kernel: [000000000044b614]
compat_SyS_sparc_sigaction+0x54/0x80
Aug 01 00:35:17 v215 kernel: [00000000004061b4] linux_sparc_syscall32+0x34/0x60
Aug 01 00:35:17 v215 kernel: ---[ end trace 0413ef9096de5564 ]---
Aug 01 00:35:41 v215 kernel: Unable to handle kernel NULL pointer dereference
Aug 01 00:35:41 v215 kernel: tsk->{mm,active_mm}->context = 00000000000015f9
Aug 01 00:35:41 v215 kernel: tsk->{mm,active_mm}->pgd = fff0001230ab6000
Aug 01 00:35:41 v215 kernel: \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
Aug 01 00:35:42 v215 kernel: poll(11551): Oops [#3]
Aug 01 00:35:42 v215 kernel: CPU: 1 PID: 11551 Comm: poll Tainted: G
D W 4.12.0 #365
Aug 01 00:35:42 v215 kernel: task: fff000123c9113a0 task.stack: fff0001232e7c000
Aug 01 00:35:42 v215 kernel: TSTATE: 0000004411001603 TPC:
0000000000a121fc TNPC: 0000000000a12210 Y: 00000000 Tainted: G
D W
Aug 01 00:35:42 v215 kernel: TPC: <__bzero+0x20/0xc0>
Aug 01 00:35:42 v215 kernel: g0: fff000123cfce548 g1: 0000000000000000
g2: 0000000000000000 g3: 0000000000000000
Aug 01 00:35:42 v215 kernel: g4: fff000123c9113a0 g5: fff000123ed08000
g6: fff0001232e7c000 g7: 0000000000000008
Aug 01 00:35:42 v215 kernel: o0: 000000000000000c o1: fff0001232e7fa80
o2: 0000000000000000 o3: 000000000000000c
Aug 01 00:35:42 v215 kernel: o4: fff0001232e7fa7c o5: 00000000000000fb
sp: fff0001232e7f1a1 ret_pc: 0000000000630ad4
Aug 01 00:35:42 v215 kernel: RPC: <do_sys_poll+0xd4/0x460>
Aug 01 00:35:42 v215 kernel: l0: 0000000000000002 l1: 00000000014000c0
l2: 00000000000003fe l3: 000fffedcd180590
Aug 01 00:35:43 v215 kernel: l4: fff0001232e7fa7c l5: 00000000f78346f4
l6: 0000000000000002 l7: 00000000f7968000
Aug 01 00:35:43 v215 kernel: i0: 00000000f77e1ff8 i1: 0000000000000002
i2: fff0001232e7fe90 i3: fff0001232e7fa70
Aug 01 00:35:43 v215 kernel: i4: 0000000000000002 i5: 00000000f77e1ff8
i6: fff0001232e7f5e1 i7: 00000000006315d8
Aug 01 00:35:43 v215 kernel: I7: <SyS_poll+0x78/0x100>
Aug 01 00:35:43 v215 kernel: Call Trace:
Aug 01 00:35:43 v215 kernel: [00000000006315d8] SyS_poll+0x78/0x100
Aug 01 00:35:43 v215 kernel: [00000000004061b4] linux_sparc_syscall32+0x34/0x60
Aug 01 00:35:43 v215 kernel: Caller[00000000006315d8]: SyS_poll+0x78/0x100
Aug 01 00:35:43 v215 kernel: Caller[00000000004061b4]:
linux_sparc_syscall32+0x34/0x60
Aug 01 00:35:43 v215 kernel: Caller[0000000070000ba8]: 0x70000ba8
Aug 01 00:35:43 v215 kernel: Instruction DUMP:
Aug 01 00:35:43 v215 kernel: c56a2000
Aug 01 00:35:43 v215 kernel: 808a2003
Aug 01 00:35:44 v215 kernel: 02480006
Aug 01 00:35:44 v215 kernel: <d42a2000>
Aug 01 00:35:44 v215 kernel: 90022001
Aug 01 00:35:44 v215 kernel: 808a2003
Aug 01 00:35:44 v215 kernel: 1247fffd
Aug 01 00:35:44 v215 kernel: 92226001
Aug 01 00:35:44 v215 kernel: 808a2007
Aug 01 00:35:44 v215 kernel:
Aug 01 00:35:55 v215 kernel: pc[12811]: segfault at 70000974 ip
0000000070000974 (rpc 000000007000096c) sp 00000000ffa69488 error
30001 in pc[70010000+2000]
...
============================================================================
Testsuite summary for strace 4.18.0.134.805d
============================================================================
# TOTAL: 443
# PASS: 387
# SKIP: 39
# XFAIL: 0
# FAIL: 17
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
From: Anatoly Pugachev <[email protected]>
Date: Tue, 1 Aug 2017 00:48:07 +0300
> Aug 01 00:35:11 v215 kernel: sched_xetattr(1527): Oops [#1]
> Aug 01 00:35:11 v215 kernel: CPU: 1 PID: 1527 Comm: sched_xetattr Not
> tainted 4.12.0 #365
> Aug 01 00:35:11 v215 kernel: task: fff0001231d41340 task.stack: fff000123dfc4000
> Aug 01 00:35:11 v215 kernel: TSTATE: 0000004411001604 TPC:
> 0000000000a121fc TNPC: 0000000000a12210 Y: 00000000 Not tainted
> Aug 01 00:35:11 v215 kernel: TPC: <__bzero+0x20/0xc0>
> Aug 01 00:35:11 v215 kernel: g0: fff000123dfc7d20 g1: 0000000000000000
> g2: 0000003000000000 g3: 0000000000000000
> Aug 01 00:35:11 v215 kernel: g4: fff0001231d41340 g5: fff000123ed08000
> g6: fff000123dfc4000 g7: 0000000000000030
> Aug 01 00:35:11 v215 kernel: o0: 0000000000000030 o1: fff000123dfc7e70
> o2: 0000000000000000 o3: 0000000000000030
> Aug 01 00:35:11 v215 kernel: o4: fff000123dfc7e70 o5: 000000000000000a
> sp: fff000123dfc75c1 ret_pc: 000000000049b294
> Aug 01 00:35:11 v215 kernel: RPC: <SyS_sched_setattr+0x174/0x1a0>
Please run gdb on this kernel image and tell it:
(gdb) x/20i 0x49b294 - 16
Thanks.
I think perhaps one of Al Viro's changes in the bisected commit causes
a branch to either have an overflowed offset field, or get mispatched
to the wrong destination.
On Tue, Aug 1, 2017 at 12:51 AM, David Miller <[email protected]> wrote:
> Please run gdb on this kernel image and tell it:
>
> (gdb) x/20i 0x49b294 - 16
>
> Thanks.
>
> I think perhaps one of Al Viro's changes in the bisected commit causes
> a branch to either have an overflowed offset field, or get mispatched
> to the wrong destination.
David,
I don't know how to run gdb against a running kernel, but as I understood it:
root@v215:strace# gzip -dc /boot/vmlinuz-4.12.0 > vmlinux
root@v215:strace# gdb -q vmlinux
Reading symbols from vmlinux...(no debugging symbols found)...done.
(gdb) x/20i 0x49b294 - 16
0x49b284 <_start+619140>: mov -22, %o0
0x49b288 <_start+619144>: sub %i5, %o0, %o0
0x49b28c <_start+619148>: mov %i3, %o2
0x49b290 <_start+619152>: clr %o1
0x49b294 <_start+619156>: call 0xa121b8 <_start+6349240>
0x49b298 <_start+619160>: add %o0, 0x30, %o0
0x49b29c <_start+619164>: cmp %i3, 0
0x49b2a0 <_start+619168>: be %icc, 0x49b20c <_start+619020>
0x49b2a4 <_start+619172>: mov -14, %i0
0x49b2a8 <_start+619176>: rett %i7 + 8
0x49b2ac <_start+619180>: nop
0x49b2b0 <_start+619184>: b,a %xcc, 0x49b2c0 <_start+619200>
0x49b2b4 <_start+619188>: nop
0x49b2b8 <_start+619192>: nop
0x49b2bc <_start+619196>: nop
0x49b2c0 <_start+619200>: save %sp, -176, %sp
0x49b2c4 <_start+619204>: call 0xa136c0 <_start+6354624>
0x49b2c8 <_start+619208>: nop
0x49b2cc <_start+619212>: cmp %i0, 0
0x49b2d0 <_start+619216>: bl,pn %icc, 0x49b318 <_start+619288>
0x49b2d4 <_start+619220>: mov -22, %o0
(gdb)
From: Anatoly Pugachev <[email protected]>
Date: Tue, 1 Aug 2017 01:01:47 +0300
> I don't know how to run gdb against a running kernel, but as I understood it:
>
> root@v215:strace# gzip -dc /boot/vmlinuz-4.12.0 > vmlinux
> root@v215:strace# gdb -q vmlinux
> Reading symbols from vmlinux...(no debugging symbols found)...done.
> (gdb) x/20i 0x49b294 - 16
Unfortunately you need to do this on the built kernel image, before it
has been stripped of all of its symbols.
Mikael, you built your kernels, right?
Go into one of your OOPS's and extract the "RPC: " hex value, and run
the gdb command:
bash$ cd src/linux
bash$ gdb ./vmlinux
(gdb) x/10i 0x${RPC_HEX_VALUE} - 16
Thanks.
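The lookup described above can be scripted. What follows is only a sketch: the helper name `gdb_command_for_rpc` and the regular expression are made up for illustration, and it assumes the `RPC: <symbol+0xoff/0xsize>` layout that the oopses in this thread use. The symbol form of the address only resolves in gdb when vmlinux still has its symbols.

```python
import re

# Matches e.g. "RPC: <do_sys_poll+0x80/0x3c0>" as printed in a sparc64 oops.
RPC_RE = re.compile(
    r"RPC: <(?P<sym>[A-Za-z0-9_.]+)\+(?P<off>0x[0-9a-f]+)/0x[0-9a-f]+>"
)

def gdb_command_for_rpc(oops_text, count=10, back=16):
    """Return a gdb 'x/Ni' command for the first RPC line, or None if absent."""
    m = RPC_RE.search(oops_text)
    if m is None:
        return None
    # Start a few instructions before the return PC so the call setup is visible.
    return "x/%di %s+%s - %d" % (count, m.group("sym"), m.group("off"), back)

if __name__ == "__main__":
    line = "[42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150>"
    print(gdb_command_for_rpc(line))
```

The printed line can be pasted at the `(gdb)` prompt after loading the unstripped image.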
David Miller writes:
> Unfortunately you need to do this on the built kernel image, before it
> has been stripped of all of its symbols.
>
> Mikael, you built your kernels, right?
>
> Go into one of your OOPS's and extract the "RPC: " hex value, and run
> the gdb command:
>
> bash$ cd src/linux
> bash$ gdb ./vmlinux
> (gdb) x/10i 0x${RPC_HEX_VALUE} - 16
>
> Thanks.
Ok, with 4.13-rc3 I got
[ 240.085153] Unable to handle kernel NULL pointer dereference
[ 240.142397] tsk->{mm,active_mm}->context = 000000000000044a
[ 240.198531] tsk->{mm,active_mm}->pgd = fff000023c784000
[ 240.250112] \|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
[ 240.374879] poll(724): Oops [#1]
[ 240.400132] CPU: 0 PID: 724 Comm: poll Not tainted 4.13.0-rc3 #1
[ 240.462002] task: fff000123cc71e00 task.stack: fff000123c894000
[ 240.522717] TSTATE: 0000004411001605 TPC: 00000000007570fc TNPC: 0000000000757110 Y: 00000000 Not tainted
[ 240.634921] TPC: <__bzero+0x20/0xc0>
[ 240.664747] g0: fff000123c897081 g1: 0000000000000000 g2: 0000000000000000 g3: 00000000008ca100
[ 240.762068] g4: fff000123cc71e00 g5: fff000023ef44000 g6: fff000123c894000 g7: 0000000000000008
[ 240.859389] o0: 000000000000000c o1: fff000123c897a80 o2: 0000000000000000 o3: 000000000000000c
[ 240.956718] o4: fff000123c897a7c o5: 00000000000000fb sp: fff000123c897181 ret_pc: 0000000000516ee0
[ 241.058627] RPC: <do_sys_poll+0x80/0x3c0>
[ 241.094166] l0: 0000000000000002 l1: 00000000014000c0 l2: 00000000000003fe l3: fff000123c897a7c
[ 241.191506] l4: 0000000000000000 l5: 0000000000000000 l6: 000000000000006d l7: ffffffffffffffea
[ 241.288822] i0: 00000000f7d93ff8 i1: 0000000000000002 i2: fff000123c897e90 i3: fff000123c897a70
[ 241.386141] i4: 000fffedc3768590 i5: fff000123c897a70 i6: fff000123c8975e1 i7: 00000000005177f8
[ 241.483468] I7: <SyS_poll+0x74/0xd0>
[ 241.513292] Call Trace:
[ 241.528265] [00000000005177f8] SyS_poll+0x74/0xd0
[ 241.574140] [00000000004061b4] linux_sparc_syscall32+0x34/0x60
[ 241.634847] Disabling lock debugging due to kernel taint
[ 241.687555] Caller[00000000005177f8]: SyS_poll+0x74/0xd0
[ 241.740276] Caller[00000000004061b4]: linux_sparc_syscall32+0x34/0x60
[ 241.807855] Caller[0000000000010a20]: 0x10a20
[ 241.847983] Instruction DUMP:
[ 241.847987] c56a2000
[ 241.869824] 808a2003
[ 241.883651] 02480006
[ 241.897475] <d42a2000>
[ 241.911207] 90022001
[ 241.925032] 808a2003
[ 241.938755] 1247fffd
[ 241.952484] 92226001
[ 241.966310] 808a2007
So the RPC should be do_sys_poll+0x80, right? gdb on the original vmlinux then said:
(gdb) x/10i do_sys_poll+0x80-16
0x516ed0 <do_sys_poll+112>: brz %o0, 0x5170fc <do_sys_poll+668>
0x516ed4 <do_sys_poll+116>: mov %o0, %o2
0x516ed8 <do_sys_poll+120>: sub %i4, %o0, %i4
0x516edc <do_sys_poll+124>: clr %o1
0x516ee0 <do_sys_poll+128>: call 0x7570b8 <memset>
0x516ee4 <do_sys_poll+132>: add %l3, %i4, %o0
0x516ee8 <do_sys_poll+136>: b %xcc, 0x5170b0 <do_sys_poll+592>
0x516eec <do_sys_poll+140>: mov -14, %l7
0x516ef0 <do_sys_poll+144>: mov %l2, %o0
0x516ef4 <do_sys_poll+148>: movleu %xcc, %l0, %o0
(gdb)
/Mikael
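The bracketed word in the Instruction DUMP, <d42a2000>, is the instruction at the faulting TPC, and it can be decoded by hand. The sketch below is a minimal decoder for a handful of SPARC format-3 store opcodes only; the function names are made up for illustration and it is not a general disassembler.

```python
# Register banks in SPARC numbering order: 0-7 globals, 8-15 outs,
# 16-23 locals, 24-31 ins.
BANKS = ["%g", "%o", "%l", "%i"]

# op3 field values for the integer stores handled by this sketch.
STORES = {0x04: "stw", 0x05: "stb", 0x06: "sth", 0x0e: "stx"}

def reg_name(n):
    return "%s%d" % (BANKS[n >> 3], n & 7)

def decode_store(word):
    """Decode one 32-bit SPARC store instruction into assembler text."""
    if (word >> 30) != 3:
        raise ValueError("not a load/store (op != 3)")
    rd = (word >> 25) & 0x1f
    op3 = (word >> 19) & 0x3f
    rs1 = (word >> 14) & 0x1f
    if op3 not in STORES:
        raise ValueError("op3 %#x is not handled by this sketch" % op3)
    if (word >> 13) & 1:
        # Immediate form: sign-extend the 13-bit displacement.
        simm13 = word & 0x1fff
        if simm13 & 0x1000:
            simm13 -= 0x2000
        addr = "[%s + %d]" % (reg_name(rs1), simm13)
    else:
        addr = "[%s + %s]" % (reg_name(rs1), reg_name(word & 0x1f))
    return "%s %s, %s" % (STORES[op3], reg_name(rd), addr)

if __name__ == "__main__":
    print(decode_store(0xd42a2000))
```

With %o0 = 0xc and %o2 = 0 in the register dump, that is a one-byte store of zero to address 0xc, which matches the "NULL pointer dereference" report.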
Hi Mikael.
I think this translates to the following code
from linux/uaccess.h
first part is the inlined _copy_from_user()
>
> (gdb) x/10i do_sys_poll+0x80-16
> 0x516ed0 <do_sys_poll+112>: brz %o0, 0x5170fc <do_sys_poll+668>
if (unlikely(res))
> 0x516ed4 <do_sys_poll+116>: mov %o0, %o2
> 0x516ed8 <do_sys_poll+120>: sub %i4, %o0, %i4
> 0x516edc <do_sys_poll+124>: clr %o1
> 0x516ee0 <do_sys_poll+128>: call 0x7570b8 <memset>
> 0x516ee4 <do_sys_poll+132>: add %l3, %i4, %o0
memset(to + (n - res), 0, res);
and this part is from the inlined copy_from_user()
> 0x516ee8 <do_sys_poll+136>: b %xcc, 0x5170b0 <do_sys_poll+592>
jump to end of function
> 0x516eec <do_sys_poll+140>: mov -14, %l7
> 0x516ef0 <do_sys_poll+144>: mov %l2, %o0
> 0x516ef4 <do_sys_poll+148>: movleu %xcc, %l0, %o0
} else if (!__builtin_constant_p(n))
copy_user_overflow(sz, n);
In the generic implementation we now use the return value
of raw_copy_from_user(), which we did not do before said patch.
So I suspect that what we see here is that:
1) With the patch from Al we start to use the return value of raw_copy_from_user.
2) The return value is wrong in the sparc implementation, so boom.
3) We only trigger this on old HW because the return value is correct in some,
but not all, of the implementations of raw_copy_from_user.
Davem fixed this in a series of patches that requires some sparc
assembler knowledge to decipher.
The return value was fixed in ee841d0aff649164080e445e84885015958d8ff4
for the Ultra III as used by SUN Blade 2500.
And if I am right then this fix fails with the parameters used
in our case with strace.
Mikael - could you try to edit U3patch.S like this:
Change the following lines:
cheetah_patch_copyops:
ULTRA3_DO_PATCH(memcpy, U3memcpy)
ULTRA3_DO_PATCH(___copy_from_user, U3copy_from_user)
ULTRA3_DO_PATCH(___copy_to_user, U3copy_to_user)
retl
To:
cheetah_patch_copyops:
ULTRA3_DO_PATCH(memcpy, GENmemcpy)
ULTRA3_DO_PATCH(raw_copy_from_user, GENcopy_from_user)
ULTRA3_DO_PATCH(raw_copy_to_user, GENcopy_to_user)
retl
In other words, use the generic versions, which I assume
are OK on Ultra III, but slower.
Sam
On Tue, Aug 01, 2017 at 10:58:29PM +0200, Sam Ravnborg wrote:
> Hi Mikael.
>
> I think this translates to the following code
> from linux/uaccess.h
>
> first part is the inlined _copy_from_user()
>
> >
> > (gdb) x/10i do_sys_poll+0x80-16
> > 0x516ed0 <do_sys_poll+112>: brz %o0, 0x5170fc <do_sys_poll+668>
> if (unlikely(res))
>
> > 0x516ed4 <do_sys_poll+116>: mov %o0, %o2
> > 0x516ed8 <do_sys_poll+120>: sub %i4, %o0, %i4
> > 0x516edc <do_sys_poll+124>: clr %o1
> > 0x516ee0 <do_sys_poll+128>: call 0x7570b8 <memset>
> > 0x516ee4 <do_sys_poll+132>: add %l3, %i4, %o0
> memset(to + (n - res), 0, res);
And memset calls down to bzero, where %o0=buf, %o1=len
%o0 = 0xc
%o1 = 0xfff000123c897a80
%o2 = 0x0
%o3 = 0xc
So from this we know that:
res = 0xfff000123c897a80
to + (n - 0xfff000123c897a80) = 0xc
The value "fff000123c897a80" really looks like a constructed address
from somewhere in the strace code, and where this constructed address
is used to provoke some unusual behaviour.
The "fff0" part may be a sparc thing.
So far the analysis seems to match the initial conclusion that
in this special case we try to zero out the remaining memory
based on the return value of raw_copy_from_user.
And therefore we use the return value (res) which triggers the oops.
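The arithmetic can be checked outside the kernel. The sketch below models 64-bit pointer arithmetic with a mask; `memset_dest` is a hypothetical helper that mirrors `to + (n - res)`. The value used for `to` is an assumption: it is what %o4 and %l3 hold in the 4.13-rc3 oops, picked because it sits just below the bogus `res`.

```python
MASK64 = (1 << 64) - 1

def memset_dest(to, n, res):
    """First argument of memset(to + (n - res), 0, res), wrapped to 64 bits."""
    return (to + (n - res)) & MASK64

res = 0xfff000123c897a80    # %o1 in the oops: the length handed to bzero
dest = 0xc                  # %o0 in the oops: the address being zeroed

# dest = to + n - res (mod 2^64), so to + n can be recovered directly.
to_plus_n = (dest + res) & MASK64

# ASSUMPTION: `to` is the value seen in %o4/%l3, not something the oops states.
to_guess = 0xfff000123c897a7c
n_guess = (to_plus_n - to_guess) & MASK64

if __name__ == "__main__":
    print("to + n      = %#x" % to_plus_n)
    print("n (guess)   = %d" % n_guess)
    print("res - to    = %d" % (res - to_guess))
    print("memset dest = %#x" % memset_dest(to_guess, n_guess, res))
```

Under that assumption n comes out as 16, which would fit two 8-byte struct pollfd entries (%i1 is 2 in the oops), and `res` is not a byte count at all but a kernel address 4 bytes above `to`.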
So rather than manipulating the assembler code as suggested
in the previous mail this simpler patch could be tested:
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index acdd6f915a8d..13d299ff1f21 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -115,7 +115,7 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
res = raw_copy_from_user(to, from, n);
}
if (unlikely(res))
- memset(to + (n - res), 0, res);
+ void: /*memset(to + (n - res), 0, res);*/
return res;
}
#else
It would be good to know if this makes the oops go away.
And maybe you could try to print the parameters
supplied to _copy_from_user in case memset would be called,
so we have an idea what error path is taken.
I have tried to decipher U3memcpy.S - but that is non-trivial.
So it would be good to have a bit more data to verify the theory.
Sam
From: Sam Ravnborg <[email protected]>
Date: Wed, 2 Aug 2017 23:36:47 +0200
> And memset calls down to bzero, where %o0=buf, %o1=len
>
> %o0 = 0xc
> %o1 = 0xfff000123c897a80
> %o2 = 0x0
> %o3 = 0xc
>
> So from this we know that:
> res = 0xfff000123c897a80
> to + (n - 0xfff000123c897a80) = 0xc
>
> The value "fff000123c897a80" really looks like a constructed address
> from somewhere in the strace code, and where this constructed address
> is used to provoke some unusual behaviour.
> The "fff0" part may be a sparc thing.
>
> So far the analysis seems to match the initial conclusion that
> in this special case we try to zero out the remaining memory
> based on the return value of raw_copy_from_user.
> And therefore we use the return value (res) which triggers the oops.
Yes, the return value is bogus.
> So rather than manipulating with the assembler code as suggested
> in the previous mail this simpler patch could be tested:
...
> - memset(to + (n - res), 0, res);
> + void: /*memset(to + (n - res), 0, res);*/
Need a semicolon rather than a colon there :-)
Sam Ravnborg writes:
> It would be good to know if this makes the oops go away.
>
> And maybe you could try to print the parameters
> supplied to _copy_from_user in case memset would be called,
> so we have an idea what error path is taken.
>
> I have tried to decipher U3memcpy.S - but that is non-trivial.
> So it would be good to have a bit more data to verify the theory.
I applied the following:
--- linux-4.13-rc3/include/linux/uaccess.h.~1~ 2017-08-01 08:49:48.397819726 +0200
+++ linux-4.13-rc3/include/linux/uaccess.h 2017-08-03 21:33:11.009634421 +0200
@@ -4,6 +4,8 @@
#include <linux/sched.h>
#include <linux/thread_info.h>
#include <linux/kasan-checks.h>
+#include <linux/ratelimit.h>
+#include <linux/printk.h>
#define VERIFY_READ 0
#define VERIFY_WRITE 1
@@ -115,7 +117,9 @@ _copy_from_user(void *to, const void __u
res = raw_copy_from_user(to, from, n);
}
if (unlikely(res))
- memset(to + (n - res), 0, res);
+ {
+ printk_ratelimited("_copy_from_user(%p, %p, %lu) res %lu\n", to, from, n, res);
+ }
return res;
}
#else
With that in place the kernel booted fine.
When I then ran the `poll' strace test binary, the OOPS was replaced by:
[ 140.589913] _copy_from_user(fff000123c8dfa7c, (null), 240) res 240
[ 140.753162] _copy_from_user(fff000123c8dfa7c, 00000000f7e4a000, 8) res 8
[ 140.824155] _copy_from_user(fff000123c8dfa7c, 00000000f7e49ff8, 16) res 18442240552407530112
That last `res' doesn't look good.
/Mikael
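Printing `res` in hex shows what it really is. raw_copy_from_user() returns the number of bytes it could not copy, so a sane result lies between 0 and n. The sketch below applies that check to the three logged calls; the names `LOGGED_CALLS` and `res_is_sane` are made up for illustration.

```python
# (to, from, n, res) copied from the three printk lines above.
LOGGED_CALLS = [
    (0xfff000123c8dfa7c, 0x0, 240, 240),
    (0xfff000123c8dfa7c, 0x00000000f7e4a000, 8, 8),
    (0xfff000123c8dfa7c, 0x00000000f7e49ff8, 16, 18442240552407530112),
]

def res_is_sane(n, res):
    """A 'bytes not copied' count can never exceed the requested length."""
    return 0 <= res <= n

if __name__ == "__main__":
    for to, src, n, res in LOGGED_CALLS:
        if res_is_sane(n, res):
            print("n=%d res=%d ok" % (n, res))
        else:
            # Show the bogus value as an address relative to the kernel buffer.
            print("n=%d res=%#x BOGUS (to + %d)" % (n, res, res - to))
```

The third value is 0xfff000123c8dfa80, that is `to + 4`: a kernel address rather than a length.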
From: Mikael Pettersson <[email protected]>
Date: Thu, 3 Aug 2017 22:02:57 +0200
> With that in place the kernel booted fine.
> When I then ran the `poll' strace test binary, the OOPS was replaced by:
>
> [ 140.589913] _copy_from_user(fff000123c8dfa7c, (null), 240) res 240
> [ 140.753162] _copy_from_user(fff000123c8dfa7c, 00000000f7e4a000, 8) res 8
> [ 140.824155] _copy_from_user(fff000123c8dfa7c, 00000000f7e49ff8, 16) res 18442240552407530112
>
> That last `res' doesn't look good.
Please test this patch:
diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S
index 54f9870..5a8cb37 100644
--- a/arch/sparc/lib/U3memcpy.S
+++ b/arch/sparc/lib/U3memcpy.S
@@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
ENTRY(U3_retl_o2_and_7_plus_GS)
and %o2, 7, %o2
retl
- add %o2, GLOBAL_SPARE, %o2
+ add %o2, GLOBAL_SPARE, %o0
ENDPROC(U3_retl_o2_and_7_plus_GS)
ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
add GLOBAL_SPARE, 8, GLOBAL_SPARE
and %o2, 7, %o2
retl
- add %o2, GLOBAL_SPARE, %o2
+ add %o2, GLOBAL_SPARE, %o0
ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
#endif
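To see why a one-register change matters: a leaf routine on sparc hands its result back in %o0, and the instruction in the delay slot of retl still executes before the caller resumes. The toy model below treats the registers as a dictionary. The register contents are purely illustrative (the thread does not show them at the time of the fault); the stale %o0 is set to the bogus `res` from the debug output only to make the effect visible, and `gs` stands in for GLOBAL_SPARE.

```python
def stub_before_fix(regs):
    """and %o2, 7, %o2 ; retl ; add %o2, GLOBAL_SPARE, %o2"""
    r = dict(regs)
    r["o2"] = r["o2"] & 7
    r["o2"] = r["o2"] + r["gs"]   # delay slot: the sum lands in %o2
    return r["o0"]                # caller reads %o0, which was never written

def stub_after_fix(regs):
    """and %o2, 7, %o2 ; retl ; add %o2, GLOBAL_SPARE, %o0"""
    r = dict(regs)
    r["o2"] = r["o2"] & 7
    r["o0"] = r["o2"] + r["gs"]   # delay slot: the sum lands in %o0
    return r["o0"]

# Illustrative values only, not taken from a register dump of the stub.
EXAMPLE_REGS = {"o0": 0xfff000123c8dfa80, "o2": 0, "gs": 8}

if __name__ == "__main__":
    print("before fix: %#x" % stub_before_fix(EXAMPLE_REGS))
    print("after fix:  %d" % stub_after_fix(EXAMPLE_REGS))
```

In this model the old stub returns whatever happened to be in %o0, while the patched one returns the computed count.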
Hi Davem.
On Thu, Aug 03, 2017 at 02:57:48PM -0700, David Miller wrote:
> Please test this patch:
>
> diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S
> index 54f9870..5a8cb37 100644
> --- a/arch/sparc/lib/U3memcpy.S
> +++ b/arch/sparc/lib/U3memcpy.S
> @@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
> ENTRY(U3_retl_o2_and_7_plus_GS)
> and %o2, 7, %o2
> retl
> - add %o2, GLOBAL_SPARE, %o2
> + add %o2, GLOBAL_SPARE, %o0
> ENDPROC(U3_retl_o2_and_7_plus_GS)
> ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
> add GLOBAL_SPARE, 8, GLOBAL_SPARE
> and %o2, 7, %o2
> retl
> - add %o2, GLOBAL_SPARE, %o2
> + add %o2, GLOBAL_SPARE, %o0
> ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
> #endif
>
Patch looks obviously correct, and I am a bit irritated that
I did not see this myself.
Reviewed-by: Sam Ravnborg <[email protected]>
I will send another patch that fixes/adds a few comments to the same file.
Sam
David Miller writes:
> Please test this patch:
>
> diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S
> index 54f9870..5a8cb37 100644
> --- a/arch/sparc/lib/U3memcpy.S
> +++ b/arch/sparc/lib/U3memcpy.S
> @@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
> ENTRY(U3_retl_o2_and_7_plus_GS)
> and %o2, 7, %o2
> retl
> - add %o2, GLOBAL_SPARE, %o2
> + add %o2, GLOBAL_SPARE, %o0
> ENDPROC(U3_retl_o2_and_7_plus_GS)
> ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
> add GLOBAL_SPARE, 8, GLOBAL_SPARE
> and %o2, 7, %o2
> retl
> - add %o2, GLOBAL_SPARE, %o2
> + add %o2, GLOBAL_SPARE, %o0
> ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
> #endif
>
Backing out my debugging patch and adding this one instead
gave me a working kernel that doesn't OOPS. Thanks.
Tested-by: Mikael Pettersson <[email protected]>
From: Mikael Pettersson <[email protected]>
Date: Fri, 4 Aug 2017 10:02:25 +0200
> Backing out my debugging patch and adding this one instead
> gave me a working kernel that doesn't OOPS. Thanks.
>
> Tested-by: Mikael Pettersson <[email protected]>
Great, thanks for testing.
This is the final patch I committed:
====================
From 0ede1c401332173ab0693121dc6cde04a4dbf131 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <[email protected]>
Date: Fri, 4 Aug 2017 09:47:52 -0700
Subject: [PATCH] sparc64: Fix exception handling in UltraSPARC-III memcpy.
Mikael Pettersson reported that some test programs in the strace-4.18
testsuite cause an OOPS.
After some debugging it turns out that garbage values are returned
when an exception occurs, causing the fixup memset() to be run with
bogus arguments.
The problem is that two of the exception handler stubs write the
successfully copied length into the wrong register.
Fixes: ee841d0aff64 ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.")
Reported-by: Mikael Pettersson <[email protected]>
Tested-by: Mikael Pettersson <[email protected]>
Reviewed-by: Sam Ravnborg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
arch/sparc/lib/U3memcpy.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S
index 54f98706b03b..5a8cb37f0a3b 100644
--- a/arch/sparc/lib/U3memcpy.S
+++ b/arch/sparc/lib/U3memcpy.S
@@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
ENTRY(U3_retl_o2_and_7_plus_GS)
and %o2, 7, %o2
retl
- add %o2, GLOBAL_SPARE, %o2
+ add %o2, GLOBAL_SPARE, %o0
ENDPROC(U3_retl_o2_and_7_plus_GS)
ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
add GLOBAL_SPARE, 8, GLOBAL_SPARE
and %o2, 7, %o2
retl
- add %o2, GLOBAL_SPARE, %o2
+ add %o2, GLOBAL_SPARE, %o0
ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
#endif
--
2.13.3