On 05/24/2017 09:19 AM, Florian Fainelli wrote:
> On 05/24/2017 12:02 AM, Richard Weinberger wrote:
>> Florian,
>>
>> On 24.05.2017 at 02:32, Florian Fainelli wrote:
>>> Building a statically linked UML kernel on a CentOS 6.9 host resulted in
>>> the following linking failure (GCC 4.4, glibc-2.12):
>>>
>>> /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libpthread.a(libpthread.o):
>>> In function `siglongjmp':
>>> (.text+0x8490): multiple definition of `longjmp'
>>> arch/x86/um/built-in.o:/local/users/fainelli/openwrt/trunk/build_dir/target-x86_64_musl/linux-uml/linux-4.4.69/arch/x86/um/setjmp_64.S:44:
>>> first defined here
>>> /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libpthread.a(libpthread.o):
>>> In function `sem_open':
>>> (.text+0x77cd): warning: the use of `mktemp' is dangerous, better use
>>> `mkstemp'
>>> collect2: ld returned 1 exit status
>>> make[4]: *** [vmlinux] Error 1
>>>
>>> Adopt a solution similar to the one done for vmap where we define
>>> longjmp/setjmp to be kernel_longjmp/setjmp. In the process, make sure we
>>> do rename the functions in arch/x86/um/setjmp_*.S accordingly.
>>
>> What is not clear to me is why you are facing this build issue while other users,
>> including me, are not.
>
> Presumably because we are not using the same glibc version? The one I
> have installed on this machine is glibc-2.12, do you want me to attach a
> copy of it?
Richard, what do we do with this?
--
Florian
Florian,
On 01.06.2017 at 21:38, Florian Fainelli wrote:
>> Presumably because we are not using the same glibc version? The one I
>> have installed on this machine is glibc-2.12, do you want me to attach a
>> copy of it?
>
> Richard, what do we do with this?
I'd like to see the issues that Thomas sees also get addressed.
Thanks,
//richard
On 06/01/2017 01:11 PM, Richard Weinberger wrote:
> Florian,
>
> On 01.06.2017 at 21:38, Florian Fainelli wrote:
>>> Presumably because we are not using the same glibc version? The one I
>>> have installed on this machine is glibc-2.12, do you want me to attach a
>>> copy of it?
>>
>> Richard, what do we do with this?
>
> I'd like to see the issues that Thomas sees also get addressed.
Sure, but that seems orthogonal? In the absence of an answer from Eli,
either you could take my patch or just send reverts of Eli's two
commits, whichever you prefer.
--
Florian
On 01.06.2017 at 22:15, Florian Fainelli wrote:
> On 06/01/2017 01:11 PM, Richard Weinberger wrote:
>> Florian,
>>
>> On 01.06.2017 at 21:38, Florian Fainelli wrote:
>>>> Presumably because we are not using the same glibc version? The one I
>>>> have installed on this machine is glibc-2.12, do you want me to attach a
>>>> copy of it?
>>>
>>> Richard, what do we do with this?
>>
>> I'd like to see the issues that Thomas sees also get addressed.
>
> Sure, but that seems orthogonal? In the absence of an answer from Eli,
> either you could take my patch or just send reverts of Eli's two
> commits, whichever you prefer.
Or you and Thomas could investigate. :-)
Thanks,
//richard
On 06/01/2017 01:17 PM, Richard Weinberger wrote:
> On 01.06.2017 at 22:15, Florian Fainelli wrote:
>> On 06/01/2017 01:11 PM, Richard Weinberger wrote:
>>> Florian,
>>>
>>> On 01.06.2017 at 21:38, Florian Fainelli wrote:
>>>>> Presumably because we are not using the same glibc version? The one I
>>>>> have installed on this machine is glibc-2.12, do you want me to attach a
>>>>> copy of it?
>>>>
>>>> Richard, what do we do with this?
>>>
>>> I'd like to see the issues that Thomas sees also get addressed.
>>
>> Sure, but that seems orthogonal? In the absence of an answer from Eli,
>> either you could take my patch or just send reverts of Eli's two
>> commits, whichever you prefer.
>
> Or you and Thomas could investigate. :-)
Honestly, I don't know what you want me to investigate; my host
machine is old (2.6.32) and does not support PTRACE_GETREGSET or
friends, nor does it have _xstate, so with that, we either don't use
those period, which would be a revert, or we just conditionally build
support for that (my patch) and everyone is happy.
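
For illustration only, the "conditionally build support" option can be
sketched as a compile-time probe on the host headers. This is not the patch
under discussion: the names sketch_get_regset and SKETCH_HAVE_REGSET are made
up here, and the sketch assumes that PTRACE_GETREGSET is visible to the
preprocessor whenever the host's <sys/ptrace.h> provides it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Compile-time probe: 1 if the host headers expose the regset request. */
#ifdef PTRACE_GETREGSET
#define SKETCH_HAVE_REGSET 1
#else
#define SKETCH_HAVE_REGSET 0
#endif

/*
 * Hypothetical helper (not taken from any patch in this thread): read one
 * register set of the traced process `pid` into `buf`.
 * Returns 0 on success and a negative errno value on failure.
 */
int sketch_get_regset(pid_t pid, int note_type, void *buf, size_t len)
{
	assert(buf != NULL);
#if SKETCH_HAVE_REGSET
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	if (ptrace(PTRACE_GETREGSET, pid, (void *)(long)note_type, &iov) < 0)
		return -errno;
	return 0;
#else
	/* Old headers: report "not supported" so the caller can fall back. */
	(void)pid;
	(void)note_type;
	(void)len;
	return -ENOSYS;
#endif
}
```

A caller built against old headers would see -ENOSYS and could fall back to
the older floating point register requests. Note that a compile-time probe
says nothing about the kernel the binary later runs on, so a runtime check
may still be needed.
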
I don't know what issue Thomas is having (he is now CC'd) and I
still don't understand why you insist on conflating the symbol clash
while statically linking with support for newer x86 FPU stuff...
--
Florian
On 01.06.2017 at 22:40, Florian Fainelli wrote:
>>> Sure, but that seems orthogonal? In the absence of an answer from Eli,
>>> either you could take my patch or just send reverts of Eli's two
>>> commits, whichever you prefer.
>>
>> Or you and Thomas could investigate. :-)
>
> Honestly, I don't know what do you want me to investigate, my host
> machine is old (2.6.32) and does not support PTRACE_GETREGSET or
> friends, nor does it have _xstate, so with that, we either don't use
> those period, which would be a revert, or we just conditionally build
> support for that (my patch) and everyone is happy.
This is exactly why we have this mess right now. Everybody is just focusing
on his own stuff.
> I don't know what the issue Thomas is having (he is now CC'd) and I
> still don't understand why you insist on conflating the symbol clash
> while statically linking with support for newer x86 FPU stuff...
The said commits introduced issues, you face some, Thomas is facing some.
I want them to get fixed or at least understood before we apply new patches.
Thanks,
//richard
On 06/01/2017 01:44 PM, Richard Weinberger wrote:
> On 01.06.2017 at 22:40, Florian Fainelli wrote:
>>>> Sure, but that seems orthogonal? In the absence of an answer from Eli,
>>>> either you could take my patch or just send reverts of Eli's two
>>>> commits, whichever you prefer.
>>>
>>> Or you and Thomas could investigate. :-)
>>
>> Honestly, I don't know what do you want me to investigate, my host
>> machine is old (2.6.32) and does not support PTRACE_GETREGSET or
>> friends, nor does it have _xstate, so with that, we either don't use
>> those period, which would be a revert, or we just conditionally build
>> support for that (my patch) and everyone is happy.
>
> This is exactly why we have this mess right now. Everybody is just focusing
> on his own stuff.
No, we have this mess right now because you applied patches earlier and
are only now realizing, from people's attempts to fix things, that they
broke other people's systems. Don't blame it on the people trying to fix
things; that is the worst possible attitude.
>
>> I don't know what the issue Thomas is having (he is now CC'd) and I
>> still don't understand why you insist on conflating the symbol clash
>> while statically linking with support for newer x86 FPU stuff...
>
> The said commits introduced issues, you face some, Thomas is facing some.
> I want them to get fixed or at least understood before we apply new patches.
Well, I would very much like to know what Thomas' issues are, a link to
his findings would be helpful to begin with instead of just waving that
flag.
--
Florian
Florian,
On 01.06.2017 at 22:53, Florian Fainelli wrote:
>> This is exactly why we have this mess right now. Everybody is just focusing
>> on his own stuff.
>
> No, we have this mess right now because you applied patches earlier and
> are only now realizing, from people's attempts to fix things, that they
> broke other people's systems. Don't blame it on the people trying to fix
> things; that is the worst possible attitude.
Let's calm down a bit.
All I want is that we understand all issues before applying partial fixes.
>>> I don't know what the issue Thomas is having (he is now CC'd) and I
>>> still don't understand why you insist on conflating the symbol clash
>>> while statically linking with support for newer x86 FPU stuff...
>>
>> The said commits introduced issues, you face some, Thomas is facing some.
>> I want them to get fixed or at least understood before we apply new patches.
>
> Well, I would very much like to know what Thomas' issues are, a link to
> his findings would be helpful to begin with instead of just waving that
> flag.
Sorry, I thought you were CC'ed.
Thomas, please speak up. AFAIR UML fails to boot on one of your new laptops.
Thanks,
//richard
On Thursday, 01.06.2017 at 22:58 +0200, Richard Weinberger wrote:
>
> Sorry, I thought you are CC'ed.
> Thomas please speak up. AFAIR UML fails to boot on one of your new
> Laptops.
Hi,
Yes, the first userspace process fails here:
void userspace(struct uml_pt_regs *regs)
{
	int err, status, op, pid = userspace_pid[0];
	/* To prevent races if using_sysemu changes under us.*/
	int local_using_sysemu;
	siginfo_t si;

	/* Handle any immediate reschedules or signals */
	interrupt_end();

	while (1) {
		/*
		 * This can legitimately fail if the process loads a
		 * bogus value into a segment register. It will
		 * segfault and PTRACE_GETREGS will read that value
		 * out of the process. However, PTRACE_SETREGS will
		 * fail. In this case, there is nothing to do but
		 * just kill the process.
		 */
		if (ptrace(PTRACE_SETREGS, pid, 0, regs->gp))
			fatal_sigsegv();

		if (put_fp_registers(pid, regs->fp))
->			fatal_sigsegv();
put_fp_registers() fails with errno 4, if I recall correctly.
I haven't yet investigated further why the xstate ptrace call fails.
kind regards
thomas
> Thanks,
> //richard
On 06/01/2017 02:25 PM, Thomas Meyer wrote:
> On Thursday, 01.06.2017 at 22:58 +0200, Richard Weinberger wrote:
>>
>> Sorry, I thought you are CC'ed.
>> Thomas please speak up. AFAIR UML fails to boot on one of your new
>> Laptops.
>
> Hi,
>
> yes, the first userspace process fails here:
>
> void userspace(struct uml_pt_regs *regs)
> {
> int err, status, op, pid = userspace_pid[0];
> /* To prevent races if using_sysemu changes under us.*/
> int local_using_sysemu;
> siginfo_t si;
>
> /* Handle any immediate reschedules or signals */
> interrupt_end();
>
> while (1) {
>
> /*
> * This can legitimately fail if the process loads a
> * bogus value into a segment register. It will
> * segfault and PTRACE_GETREGS will read that value
> * out of the process. However, PTRACE_SETREGS will
> * fail. In this case, there is nothing to do but
> * just kill the process.
> */
> if (ptrace(PTRACE_SETREGS, pid, 0, regs->gp))
> fatal_sigsegv();
>
> if (put_fp_registers(pid, regs->fp))
> -> fatal_sigsegv();
>
> the put_fp_registers fails with errno 4 if I recall correctly.
>
> I didn't investigate yet further, why the xstate ptrace call fails.
Which of the branches is put_fp_registers() taking? The
restore_fpx_registers() or restore_fp_registers()? 4 would be EINTR...
What kernel version is used on your host running the UML binary?
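
For reference, an errno number can be mapped back to its symbolic name with a
few lines of C. The helpers below are a hypothetical sketch (the names
sketch_errno_name and sketch_describe_errno are invented here); they cover
EINTR, the value discussed above, and EFAULT for comparison. On Linux, EINTR
is 4 and EFAULT is 14.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: symbolic name for a small set of errno values. */
const char *sketch_errno_name(int err)
{
	switch (err) {
	case EINTR:
		return "EINTR";
	case EFAULT:
		return "EFAULT";
	default:
		return "(other)";
	}
}

/* Format "<number> <NAME> (<libc description>)" into `out` and return it. */
char *sketch_describe_errno(int err, char *out, size_t len)
{
	assert(out != NULL && len > 0);
	snprintf(out, len, "%d %s (%s)", err, sketch_errno_name(err),
		 strerror(err));
	return out;
}
```

Calling sketch_describe_errno(4, buf, sizeof(buf)) fills buf with the number,
the name EINTR and the libc's description of that error.
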
Thanks
--
Florian
Thomas,
On 02.06.2017 at 07:49, Florian Fainelli wrote:
>> the put_fp_registers fails with errno 4 if I recall correctly.
>>
>> I didn't investigate yet further, why the xstate ptrace call fails.
>
> Which of the branches is put_fp_registers() taking? The
> restore_fpx_registers() or restore_fp_registers()? 4 would be EINTR...
> What kernel version is used on your host running the UML binary?
Another question: is this x86_64 UML on an x86_64 host?
Or i386 on x86_64, i386 on i386?
Thanks,
//richard
On Thursday, 01.06.2017 at 22:49 -0700, Florian Fainelli wrote:
>
> On 06/01/2017 02:25 PM, Thomas Meyer wrote:
> > On Thursday, 01.06.2017 at 22:58 +0200, Richard Weinberger wrote:
> > >
> > > Sorry, I thought you are CC'ed.
> > > Thomas please speak up. AFAIR UML fails to boot on one of your
> > > new
> > > Laptops.
> >
> > Hi,
> >
> > yes, the first userspace process fails here:
> >
> > void userspace(struct uml_pt_regs *regs)
> > {
> > int err, status, op, pid = userspace_pid[0];
> > /* To prevent races if using_sysemu changes under us.*/
> > int local_using_sysemu;
> > siginfo_t si;
> >
> > /* Handle any immediate reschedules or signals */
> > interrupt_end();
> >
> > while (1) {
> >
> > /*
> > * This can legitimately fail if the process loads a
> > * bogus value into a segment register. It will
> > * segfault and PTRACE_GETREGS will read that value
> > * out of the process. However, PTRACE_SETREGS will
> > * fail. In this case, there is nothing to do but
> > * just kill the process.
> > */
> > if (ptrace(PTRACE_SETREGS, pid, 0, regs->gp))
> > fatal_sigsegv();
> >
> > if (put_fp_registers(pid, regs->fp))
> > -> fatal_sigsegv();
> >
> > the put_fp_registers fails with errno 4 if I recall correctly.
> >
> > I didn't investigate yet further, why the xstate ptrace call
> > fails.
>
> Which of the branches is put_fp_registers() taking?
#0 restore_fp_registers (pid=2226, fp_regs=0xafcbf738) at arch/x86/um/os-Linux/registers.c:57
#1 0x0000000060084c80 in put_fp_registers (pid=<optimized out>, regs=<optimized out>) at arch/x86/um/os-Linux/registers.c:124
#2 0x00000000600814e1 in userspace (regs=0xafcbf660) at arch/um/os-Linux/skas/process.c:329
#3 0x0000000060070fc1 in new_thread_handler () at arch/um/kernel/process.c:134
#4 0x0000000000000000 in ?? ()
> The restore_fpx_registers() or restore_fp_registers()?
> 4 would be EINTR...
Yes, strange, indeed.
> What kernel version is used on your host running the UML binary?
It's a VirtualBox with Fedora 25 and "Linux localhost.localdomain 4.10.15-200.fc25.x86_64 #1 SMP Mon May 8 18:46:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux"
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
stepping : 9
cpu MHz : 2904.002
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp
lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3
cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor
lahf_lm abm 3dnowprefetch rdseed clflushopt
bugs :
bogomips : 5808.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
I see this in the kernel log:
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:595 fpu__init_system_xstate+0x4d0/0x877
[ 0.000000] XSAVE consistency problem, dumping leaves
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.10.15-200.fc25.x86_64 #1
[ 0.000000] Call Trace:
[ 0.000000] dump_stack+0x63/0x86
[ 0.000000] __warn+0xcb/0xf0
[ 0.000000] warn_slowpath_fmt+0x5f/0x80
[ 0.000000] ? xfeature_size+0x5a/0x78
[ 0.000000] fpu__init_system_xstate+0x4d0/0x877
[ 0.000000] ? msr_clear_bit+0x3a/0xa0
[ 0.000000] ? 0xffffffffa3000000
[ 0.000000] fpu__init_system+0x194/0x1be
[ 0.000000] early_cpu_init+0xf7/0xf9
[ 0.000000] setup_arch+0xba/0xcf0
[ 0.000000] ? printk+0x57/0x73
[ 0.000000] ? early_idt_handler_array+0x120/0x120
[ 0.000000] start_kernel+0xb2/0x48a
[ 0.000000] ? early_idt_handler_array+0x120/0x120
[ 0.000000] x86_64_start_reservations+0x24/0x26
[ 0.000000] x86_64_start_kernel+0x14d/0x170
[ 0.000000] start_cpu+0x14/0x14
[ 0.000000] ---[ end trace d5213d72358dda94 ]---
[ 0.000000] CPUID[0d, 00]: eax=00000007 ebx=00000440 ecx=00000440 edx=00000000
[...]
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 1088 bytes, using 'standard' format.
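
As a reading aid for the "CPUID[0d, 00]" line above, here is a minimal sketch
that decodes the four register values. The struct and function names are
invented for this example. The field meanings follow the usual definition of
CPUID leaf 0xD, sub-leaf 0: EAX and EDX hold the low and high halves of the
supported feature mask, EBX the save area size for the currently enabled
features, and ECX the size if every supported feature were enabled. The
feature names are the ones printed in the log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Register values of CPUID leaf 0xD, sub-leaf 0. */
struct sketch_cpuid_0d {
	uint32_t eax;	/* low 32 bits of the supported feature mask */
	uint32_t ebx;	/* bytes needed for the currently enabled features */
	uint32_t ecx;	/* bytes needed if all supported features were enabled */
	uint32_t edx;	/* high 32 bits of the supported feature mask */
};

/* Values copied from the VirtualBox host log above. */
const struct sketch_cpuid_0d sketch_vbox_leaf = {
	.eax = 0x00000007, .ebx = 0x00000440,
	.ecx = 0x00000440, .edx = 0x00000000,
};

/* Combine EDX:EAX into the full 64-bit feature mask. */
uint64_t sketch_feature_mask(const struct sketch_cpuid_0d *leaf)
{
	assert(leaf != NULL);
	return ((uint64_t)leaf->edx << 32) | leaf->eax;
}

/* Print the three lowest feature bits by name, plus the EBX size. */
void sketch_print_leaf(const struct sketch_cpuid_0d *leaf)
{
	static const char *const names[] = {
		"x87 floating point registers",
		"SSE registers",
		"AVX registers",
	};
	uint64_t mask = sketch_feature_mask(leaf);

	for (unsigned int bit = 0; bit < 3; bit++)
		if (mask & (1ULL << bit))
			printf("feature 0x%03llx: %s\n", 1ULL << bit, names[bit]);
	printf("enabled-feature area: %u bytes\n", (unsigned int)leaf->ebx);
}
```

With the values from the log, the mask is 0x7 and EBX is 0x440, which is the
1088 bytes the kernel reports as its context size.
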
UML kernel is:
v4.12-rc3-69-g9ea15a5
CONFIG_UML_X86=y
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
> Thanks
Thomas,
On 02.06.2017 at 10:04, Thomas Meyer wrote:
> On Thursday, 01.06.2017 at 22:49 -0700, Florian Fainelli wrote:
> I see this in the kernel log:
>
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:595 fpu__init_system_xstate+0x4d0/0x877
> [ 0.000000] XSAVE consistency problem, dumping leaves
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.10.15-200.fc25.x86_64 #1
> [ 0.000000] Call Trace:
> [ 0.000000] dump_stack+0x63/0x86
> [ 0.000000] __warn+0xcb/0xf0
> [ 0.000000] warn_slowpath_fmt+0x5f/0x80
> [ 0.000000] ? xfeature_size+0x5a/0x78
> [ 0.000000] fpu__init_system_xstate+0x4d0/0x877
> [ 0.000000] ? msr_clear_bit+0x3a/0xa0
> [ 0.000000] ? 0xffffffffa3000000
> [ 0.000000] fpu__init_system+0x194/0x1be
> [ 0.000000] early_cpu_init+0xf7/0xf9
> [ 0.000000] setup_arch+0xba/0xcf0
> [ 0.000000] ? printk+0x57/0x73
> [ 0.000000] ? early_idt_handler_array+0x120/0x120
> [ 0.000000] start_kernel+0xb2/0x48a
> [ 0.000000] ? early_idt_handler_array+0x120/0x120
> [ 0.000000] x86_64_start_reservations+0x24/0x26
> [ 0.000000] x86_64_start_kernel+0x14d/0x170
> [ 0.000000] start_cpu+0x14/0x14
> [ 0.000000] ---[ end trace d5213d72358dda94 ]---
> [ 0.000000] CPUID[0d, 00]: eax=00000007 ebx=00000440 ecx=00000440 edx=00000000
Does this also happen with a mainline kernel? Also on KVM or bare metal?
I'd hate for UML to be failing because of this while we are hunting a ghost...
Thanks,
//richard
On Friday, 02.06.2017 at 10:30 +0200, Richard Weinberger wrote:
> Thomas,
>
> On 02.06.2017 at 10:04, Thomas Meyer wrote:
> > On Thursday, 01.06.2017 at 22:49 -0700, Florian Fainelli wrote:
> > I see this in the kernel log:
> >
> > [ 0.000000] ------------[ cut here ]------------
> > [ 0.000000] WARNING: CPU: 0 PID: 0 at
> > arch/x86/kernel/fpu/xstate.c:595
> > fpu__init_system_xstate+0x4d0/0x877
> > [ 0.000000] XSAVE consistency problem, dumping leaves
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.10.15-
> > 200.fc25.x86_64 #1
> > [ 0.000000] Call Trace:
> > [ 0.000000] dump_stack+0x63/0x86
> > [ 0.000000] __warn+0xcb/0xf0
> > [ 0.000000] warn_slowpath_fmt+0x5f/0x80
> > [ 0.000000] ? xfeature_size+0x5a/0x78
> > [ 0.000000] fpu__init_system_xstate+0x4d0/0x877
> > [ 0.000000] ? msr_clear_bit+0x3a/0xa0
> > [ 0.000000] ? 0xffffffffa3000000
> > [ 0.000000] fpu__init_system+0x194/0x1be
> > [ 0.000000] early_cpu_init+0xf7/0xf9
> > [ 0.000000] setup_arch+0xba/0xcf0
> > [ 0.000000] ? printk+0x57/0x73
> > [ 0.000000] ? early_idt_handler_array+0x120/0x120
> > [ 0.000000] start_kernel+0xb2/0x48a
> > [ 0.000000] ? early_idt_handler_array+0x120/0x120
> > [ 0.000000] x86_64_start_reservations+0x24/0x26
> > [ 0.000000] x86_64_start_kernel+0x14d/0x170
> > [ 0.000000] start_cpu+0x14/0x14
> > [ 0.000000] ---[ end trace d5213d72358dda94 ]---
> > [ 0.000000] CPUID[0d, 00]: eax=00000007 ebx=00000440
> > ecx=00000440 edx=00000000
>
> Does this also happen with a mainline kernel? Also on KVM or bare
> metal?
Hi,
I just booted on bare metal, same machine, but the host is Fedora 26
"Linux localhost.localdomain 4.11.3-300.fc26.x86_64 #1 SMP Thu May 25 18:43:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux"
kernel log from host:
[ 0.000000] Linux version 4.11.3-300.fc26.x86_64 ([email protected]) (gcc version 7.1.1 20170503 (Red Hat 7.1.1-1) (GCC) ) #1 SMP Thu May 25 18:43:57 UTC 2017
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
The same error happens in restore_fp_registers() with errno 4.
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
stepping : 9
microcode : 0x38
cpu MHz : 499.853
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni
pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr
pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow
vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms
invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves
dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5808.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
I'll try a host kernel from master, once I find some time.
> Not that UML fails because of this and we're hunting a ghost...
>
> Thanks,
> //richard
>
On Friday, 02.06.2017 at 10:30 +0200, Richard Weinberger wrote:
> Thomas,
>
> On 02.06.2017 at 10:04, Thomas Meyer wrote:
> > On Thursday, 01.06.2017 at 22:49 -0700, Florian Fainelli wrote:
> > I see this in the kernel log:
> >
> > [ 0.000000] ------------[ cut here ]------------
> > [ 0.000000] WARNING: CPU: 0 PID: 0 at
> > arch/x86/kernel/fpu/xstate.c:595
> > fpu__init_system_xstate+0x4d0/0x877
> > [ 0.000000] XSAVE consistency problem, dumping leaves
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.10.15-
> > 200.fc25.x86_64 #1
> > [ 0.000000] Call Trace:
> > [ 0.000000] dump_stack+0x63/0x86
> > [ 0.000000] __warn+0xcb/0xf0
> > [ 0.000000] warn_slowpath_fmt+0x5f/0x80
> > [ 0.000000] ? xfeature_size+0x5a/0x78
> > [ 0.000000] fpu__init_system_xstate+0x4d0/0x877
> > [ 0.000000] ? msr_clear_bit+0x3a/0xa0
> > [ 0.000000] ? 0xffffffffa3000000
> > [ 0.000000] fpu__init_system+0x194/0x1be
> > [ 0.000000] early_cpu_init+0xf7/0xf9
> > [ 0.000000] setup_arch+0xba/0xcf0
> > [ 0.000000] ? printk+0x57/0x73
> > [ 0.000000] ? early_idt_handler_array+0x120/0x120
> > [ 0.000000] start_kernel+0xb2/0x48a
> > [ 0.000000] ? early_idt_handler_array+0x120/0x120
> > [ 0.000000] x86_64_start_reservations+0x24/0x26
> > [ 0.000000] x86_64_start_kernel+0x14d/0x170
> > [ 0.000000] start_cpu+0x14/0x14
> > [ 0.000000] ---[ end trace d5213d72358dda94 ]---
> > [ 0.000000] CPUID[0d, 00]: eax=00000007 ebx=00000440
> > ecx=00000440 edx=00000000
>
> Does this also happen with a mainline kernel?
Yes, same error on current master on bare metal:
[ 5.300000] Key type encrypted registered
[ 5.300000] This architecture does not have kernel memory protection.
[ 5.300000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
* master 3b1e342be265 Merge tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux
> Also on KVM or bare metal?
> Not that UML fails because of this and we're hunting a ghost...
>
> Thanks,
> //richard
>
On Friday, 02.06.2017 at 09:38 +0200, Richard Weinberger wrote:
> Thomas,
>
> On 02.06.2017 at 07:49, Florian Fainelli wrote:
> > > the put_fp_registers fails with errno 4 if I recall correctly.
> > >
> > > I didn't investigate yet further, why the xstate ptrace call
> > > fails.
> >
> > Which of the branches is put_fp_registers() taking? The
> > restore_fpx_registers() or restore_fp_registers()? 4 would be
> > EINTR...
> > What kernel version is used on your host running the UML binary?
>
> Another question, is this x86_64 UML on a x86_64 host?
Yes, and strace shows this:
ptrace(PTRACE_CONT, 21664, NULL, SIG_0) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=21664, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
wait4(21664, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], WSTOPPED|__WALL, NULL) = 21664
ptrace(PTRACE_SETREGS, 21664, NULL, 0x60f7fa20) = 0
ptrace(PTRACE_CONT, 21664, NULL, SIG_0) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=21664, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
wait4(21664, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], WSTOPPED|__WALL, NULL) = 21664
ptrace(PTRACE_SETREGS, 21664, NULL, 0xb18bc4e0) = 0
ptrace(PTRACE_SETREGSET, 21664, NT_X86_XSTATE, [{iov_base=0xb18bc5b8, iov_len=832}]) = -1 EFAULT (Bad address)
ioctl(1, TCGETS, {B38400 -opost -isig -icanon -echo ...}) = 0
I don't know why gdb shows errno as 4...
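
The numbers posted so far in this thread allow a quick cross-check, sketched
below with invented names. The AVX component's offset plus its size comes to
exactly the iov_len that UML passes to PTRACE_SETREGSET, while both hosts
report a larger context size. This is only arithmetic on the logs; whether
the size mismatch is what makes the kernel return EFAULT is an assumption
that nothing here confirms.

```c
#include <assert.h>
#include <stddef.h>

/* Numbers copied from the logs posted in this thread. */
enum {
	SKETCH_AVX_OFFSET = 576,	/* xstate_offset[2] */
	SKETCH_AVX_SIZE   = 256,	/* xstate_sizes[2] */
	SKETCH_IOV_LEN    = 832,	/* iov_len given to PTRACE_SETREGSET */
	SKETCH_VBOX_CTX   = 1088,	/* VirtualBox host, 'standard' format */
	SKETCH_METAL_CTX  = 960,	/* bare-metal host, 'compacted' format */
};

/* First byte past a state component that starts at `offset`. */
size_t sketch_area_end(size_t offset, size_t size)
{
	return offset + size;
}

/* 1 if a buffer of `len` bytes is at least as large as the host's size. */
int sketch_buffer_covers(size_t len, size_t host_ctx_size)
{
	assert(host_ctx_size > 0);
	return len >= host_ctx_size;
}
```

Here sketch_area_end(576, 256) equals the 832 bytes seen in strace, and
sketch_buffer_covers() returns 0 for both host sizes.
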
> Or i386 on x86_64, i386 on i386?
>
> Thanks,
> //richard
Thomas,
On 03.06.2017 at 23:25, Thomas Meyer wrote:
> ptrace(PTRACE_CONT, 21664, NULL, SIG_0) = 0
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=21664, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
> wait4(21664, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], WSTOPPED|__WALL, NULL) = 21664
> ptrace(PTRACE_SETREGS, 21664, NULL, 0x60f7fa20) = 0
> ptrace(PTRACE_CONT, 21664, NULL, SIG_0) = 0
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=21664, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
> wait4(21664, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], WSTOPPED|__WALL, NULL) = 21664
> ptrace(PTRACE_SETREGS, 21664, NULL, 0xb18bc4e0) = 0
> ptrace(PTRACE_SETREGSET, 21664, NT_X86_XSTATE, [{iov_base=0xb18bc5b8, iov_len=832}]) = -1 EFAULT (Bad address)
> ioctl(1, TCGETS, {B38400 -opost -isig -icanon -echo ...}) = 0
>
> don't know why gdb shows errno as 4...
Interesting. This makes much more sense than -EINTR.
I've reserved an i7 machine at $dayjob and hope to be able to reproduce this soon.
Let's get this mess resolved and apply all the fixes. :-)
Thanks,
//richard