2014-01-26 09:09:27

by Thomas Bächler

Subject: 3.13: <module> disagrees about version of symbol <symbol>

Good morning,

I am trying to build Linux 3.13 on i686 with CONFIG_MODVERSIONS enabled
(for configuration, see [2]). Upon booting it in a VM, I discovered that
I was unable to load several kernel modules, like ext4:

ext4: disagrees about version of symbol d_tmpfile
ext4: Unknown symbol d_tmpfile (err -22)
ext4: disagrees about version of symbol iov_shorten
ext4: Unknown symbol iov_shorten (err -22)
ext4: disagrees about version of symbol in_group_p
ext4: Unknown symbol in_group_p (err -22)
ext4: disagrees about version of symbol do_sync_read
ext4: Unknown symbol do_sync_read (err -22)
ext4: disagrees about version of symbol current_fs_time
ext4: Unknown symbol current_fs_time (err -22)
ext4: disagrees about version of symbol generic_write_sync
ext4: Unknown symbol generic_write_sync (err -22)
ext4: disagrees about version of symbol generic_getxattr
ext4: Unknown symbol generic_getxattr (err -22)

This looks exactly like the problem experienced by Tetsuo Handa in [1].
However, for me, his solution, i.e. setting
CONFIG_PHYSICAL_ALIGN=0x1000000
instead of
CONFIG_PHYSICAL_ALIGN=0x100000
doesn't help and the symptoms stay the same (and, according to the
documentation and to Kbuild, both are valid values on i686).
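For orientation: 0x100000 is 1 MiB and 0x1000000 is 16 MiB. As far as I understand the 32-bit compressed-kernel entry code (arch/x86/boot/compressed/head_32.S), a relocatable kernel rounds the address it was loaded at up to this alignment before decompressing, so the setting changes where the kernel ends up in memory. The following is a minimal sketch of that arithmetic under that assumption. The function names are hypothetical, and the minimum address passed in below (standing in for LOAD_PHYSICAL_ADDR) is only an example value:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align; align must be a power of two. */
uint32_t align_up(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of where a relocatable i686 kernel decompresses itself:
 * the load address rounded up to CONFIG_PHYSICAL_ALIGN, but never below
 * the build-time minimum (LOAD_PHYSICAL_ADDR in the kernel sources).
 */
uint32_t decompress_target(uint32_t loaded_at, uint32_t physical_align,
			   uint32_t load_physical_addr)
{
	uint32_t target = align_up(loaded_at, physical_align);

	return target < load_physical_addr ? load_physical_addr : target;
}
```

For an image loaded at 0x1234567 with a minimum of 0x1000000, decompress_target() returns 0x1300000 with the 1 MiB alignment and 0x2000000 with the 16 MiB one.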

The affected symbols seem to be exactly those that do not get a CRC
during build:

$ grep 0x000000 Module.symvers
0x00000000 task_nice vmlinux EXPORT_SYMBOL
0x00000000 alloc_pages_current vmlinux EXPORT_SYMBOL
0x00000000 iov_shorten vmlinux EXPORT_SYMBOL
0x00000000 filp_close vmlinux EXPORT_SYMBOL
0x00000000 perf_event_create_kernel_counter vmlinux EXPORT_SYMBOL_GPL
0x00000000 do_sync_read vmlinux EXPORT_SYMBOL
0x00000000 finish_open vmlinux EXPORT_SYMBOL
0x00000000 vfs_fsync_range vmlinux EXPORT_SYMBOL
0x00000000 path_is_under vmlinux EXPORT_SYMBOL
0x00000000 kern_mount_data vmlinux EXPORT_SYMBOL_GPL
0x00000000 mnt_set_expiry vmlinux EXPORT_SYMBOL
0x00000000 in_group_p vmlinux EXPORT_SYMBOL
0x00000000 sys_close vmlinux EXPORT_SYMBOL
0x00000000 generic_getxattr vmlinux EXPORT_SYMBOL
0x00000000 sigset_from_compat vmlinux EXPORT_SYMBOL_GPL
0x00000000 vm_brk vmlinux EXPORT_SYMBOL
0x00000000 iterate_fd vmlinux EXPORT_SYMBOL
0x00000000 __page_file_mapping vmlinux EXPORT_SYMBOL_GPL
0x00000000 get_unmapped_area vmlinux EXPORT_SYMBOL
0x00000000 ns_capable vmlinux EXPORT_SYMBOL
0x00000000 compat_alloc_user_space vmlinux EXPORT_SYMBOL_GPL
0x00000000 current_fs_time vmlinux EXPORT_SYMBOL
0x00000000 vfs_test_lock vmlinux EXPORT_SYMBOL_GPL
0x00000000 schedule_timeout vmlinux EXPORT_SYMBOL
0x00000000 register_exec_domain vmlinux EXPORT_SYMBOL
0x00000000 generic_write_sync vmlinux EXPORT_SYMBOL
0x00000000 inode_add_bytes vmlinux EXPORT_SYMBOL
0x00000000 softirq_work_list vmlinux EXPORT_SYMBOL
0x00000000 __symbol_put vmlinux EXPORT_SYMBOL
0x00000000 sock_register vmlinux EXPORT_SYMBOL
0x00000000 d_tmpfile vmlinux EXPORT_SYMBOL
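The same list can be extracted programmatically. Below is a minimal C sketch that assumes the whitespace-separated `<crc> <symbol> <module> <export type>` line layout shown above. The function names are hypothetical. Unlike `grep 0x000000`, which would also match a CRC that merely begins with six zeros, it reports only CRCs that are exactly zero:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one Module.symvers line ("<crc> <symbol> <module> <export type>").
 * If the CRC is exactly zero, copy the symbol name into sym and return 1;
 * otherwise (non-zero CRC or unparsable line) return 0.
 */
int symvers_zero_crc(const char *line, char *sym, size_t symlen)
{
	unsigned long crc;
	char name[128];

	assert(sym != NULL && symlen > 0);
	/* %lx accepts the optional 0x prefix used in the file. */
	if (sscanf(line, "%lx %127s", &crc, name) != 2)
		return 0;
	if (crc != 0)
		return 0;
	snprintf(sym, symlen, "%s", name);
	return 1;
}

/* Print each zero-CRC symbol read from fp; return how many were found. */
int list_zero_crcs(FILE *fp)
{
	char line[512];
	char sym[128];
	int count = 0;

	while (fgets(line, sizeof(line), fp) != NULL) {
		if (symvers_zero_crc(line, sym, sizeof(sym))) {
			printf("%s\n", sym);
			count++;
		}
	}
	return count;
}
```

If called from a small main() (omitted here) with a FILE * opened on Module.symvers, list_zero_crcs() would print d_tmpfile, iov_shorten and the other names listed above.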

Bisecting the problem leads me to the exact same commit that Tetsuo
identified in September, namely

commit 83460ec8dcac14142e7860a01fa59c267ac4657c
Author: Andi Kleen <[email protected]>
Date: Tue Nov 12 15:08:36 2013 -0800

syscalls.h: use gcc alias instead of assembler aliases for syscalls

In fact, reverting this commit gives me a kernel that boots just fine
and loads all modules.

The CRC being 0x0 should not cause a mismatch; after all, it is 0x0 in
both the kernel and the module, so there must be another problem on i686
(Tetsuo talks about this in his emails).

However, the 0x0 CRCs are incorrect as well.
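To illustrate why a 0x0 CRC on both sides should still match, here is a deliberately simplified model of the MODVERSIONS comparison. It is not the kernel's check_version() code, and the struct and function names are hypothetical. A module records, for each symbol it imports, the CRC it was built against, and at load time that value is compared with the CRC the running kernel reports. If the kernel-side value reads back as anything other than the 0x0 recorded at build time (for instance, if the memory holding it was altered), the comparison fails. The loader then prints the "disagrees about version" message seen above, and the following "Unknown symbol ... (err -22)" line corresponds to -EINVAL:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a module's __versions table. */
struct version_entry {
	unsigned long crc;	/* CRC recorded when the module was built */
	const char *name;	/* exported symbol the module uses */
};

/*
 * Simplified model of the MODVERSIONS check: find symname in the module's
 * version table and compare the recorded CRC with the CRC the running
 * kernel reports for that export. Returns 0 on a match and -EINVAL on a
 * mismatch. A symbol with no entry is simply accepted in this model.
 */
int check_version_model(const char *modname,
			const struct version_entry *versions, size_t n,
			const char *symname, unsigned long kernel_crc)
{
	size_t i;

	assert(versions != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (strcmp(versions[i].name, symname) != 0)
			continue;
		if (versions[i].crc == kernel_crc)
			return 0;
		printf("%s: disagrees about version of symbol %s\n",
		       modname, symname);
		return -EINVAL;
	}
	return 0;
}
```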

As it stands, there is no way to run a modular 3.13 kernel on i686.
What's the correct solution here?

[1] http://www.serverphorums.com/read.php?12,770337
[2] https://projects.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux


2014-01-26 14:23:36

by Tetsuo Handa

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

Thomas Bächler wrote:
> This looks exactly like the problem experienced by Tetsuo Handa in [1].
> However, for me, his solution, i.e. setting
> CONFIG_PHYSICAL_ALIGN=0x1000000
> instead of
> CONFIG_PHYSICAL_ALIGN=0x100000
> doesn't help and the symptoms stay the same (and, according to the
> documentation and to Kbuild, both are valid values on i686).

I tried your config with "make localmodconfig" and saw the symptoms. I changed
CONFIG_PHYSICAL_ALIGN from 0x100000 to 0x1000000 and no longer see the symptoms.
Did you save your config after changing CONFIG_PHYSICAL_ALIGN?

2014-01-28 07:54:00

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 26.01.2014 15:22, Tetsuo Handa wrote:
> Thomas Bächler wrote:
>> This looks exactly like the problem experienced by Tetsuo Handa in [1].
>> However, for me, his solution, i.e. setting
>> CONFIG_PHYSICAL_ALIGN=0x1000000
>> instead of
>> CONFIG_PHYSICAL_ALIGN=0x100000
>> doesn't help and the symptoms stay the same (and, according to the
>> documentation and to Kbuild, both are valid values on i686).
>
> I tried your config with "make localmodconfig" and saw the symptoms. I changed
> CONFIG_PHYSICAL_ALIGN from 0x100000 to 0x1000000 and no longer see the symptoms.

No idea why this worked for you. Anyway, if
CONFIG_PHYSICAL_ALIGN=0x1000000 is necessary, then Kconfig should
enforce it.

And none of this changes that symbols without CRC are a bug.

> Did you save your config after changing CONFIG_PHYSICAL_ALIGN?

Well, of course.



2014-04-05 01:13:34

by Andi Kleen

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On Tue, Apr 01, 2014 at 01:38:10AM +0200, Thomas Bächler wrote:
> On 01.04.2014 01:34, Andi Kleen wrote:
> >> This problem persists in v3.14, i.e. I still have to revert
> >> 83460ec8dcac14142e7860a01fa59c267ac4657c in order to get a working
> >> kernel on i686. I would really appreciate it if someone would actually read
> >> my original mail from about 3 months ago and write an answer.
> >
> > Can you resend it please?
>
> It's available here: https://lkml.org/lkml/2014/1/26/22
>
> For convenience, here is a copy-and-paste of the full text:

I did some experiments now and I can't find any 32-bit modules
that do not load with 32-bit MODVERSIONS on or off with
a current tree.

Do you have a specific config?
Specific compiler version?

Thanks,

-Andi

2014-04-05 14:29:39

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 05.04.2014 03:13, Andi Kleen wrote:
> On Tue, Apr 01, 2014 at 01:38:10AM +0200, Thomas Bächler wrote:
>> On 01.04.2014 01:34, Andi Kleen wrote:
>>>> This problem persists in v3.14, i.e. I still have to revert
>>>> 83460ec8dcac14142e7860a01fa59c267ac4657c in order to get a working
>>>> kernel on i686. I would really appreciate it if someone would actually read
>>>> my original mail from about 3 months ago and write an answer.
>>>
>>> Can you resend it please?
>>
>> It's available here: https://lkml.org/lkml/2014/1/26/22
>>
>> For convenience, here is a copy-and-paste of the full text:
>
> I did some experiments now and I can't find any 32-bit modules
> that do not load with 32-bit MODVERSIONS on or off with
> a current tree.
>
> Do you have a specific config?
> Specific compiler version?

Using gcc 4.8 from Arch Linux with the configuration at [1] and Linux 3.14.

[1] https://projects.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux


2014-04-05 17:24:26

by Tetsuo Handa

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

Thomas Bächler wrote:
> >> For convenience, here is a copy-and-paste of the full text:
> >
> > I did some experiments now and I can't find any 32-bit modules
> > that do not load with 32-bit MODVERSIONS on or off with
> > a current tree.
> >
> > Do you have a specific config?
> > Specific compiler version?
>
> Using gcc 4.8 from Arch Linux with the configuration at [1] and Linux 3.14.
>
> [1] https://projects.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
>

I installed ArchLinux i686 and compiled
https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.14.tar.xz using [1]
as of commit bbf3d68a without any changes.

It seems to me that everything is working well.
Which modules can't you load?

[root@localhost ~]# ls -lrt /boot/*3.14.0-ARCH*
-rw-r--r-- 1 root root 3619792 Apr 6 10:44 /boot/vmlinuz-3.14.0-ARCH
-rw-r--r-- 1 root root 3076964 Apr 6 11:08 /boot/initramfs-3.14.0-ARCH.img
[root@localhost ~]# cat /proc/version
Linux version 3.14.0-ARCH (root@localhost) (gcc version 4.8.2 20140206 (prerelease) (GCC) ) #3 SMP PREEMPT Sun Apr 6 10:28:24 JST 2014
[root@localhost ~]# lsmod | grep ext4
ext4 434862 1
crc16 1091 1 ext4
mbcache 4458 1 ext4
jbd2 70413 1 ext4

2014-04-05 21:47:12

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 05.04.2014 19:23, Tetsuo Handa wrote:
> Thomas Bächler wrote:
>>>> For convenience, here is a copy-and-paste of the full text:
>>>
>>> I did some experiments now and I can't find any 32-bit modules
>>> that do not load with 32-bit MODVERSIONS on or off with
>>> a current tree.
>>>
>>> Do you have a specific config?
>>> Specific compiler version?
>>
>> Using gcc 4.8 from Arch Linux with the configuration at [1] and Linux 3.14.
>>
>> [1] https://projects.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
>>
>
> I installed ArchLinux i686 and compiled
> https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.14.tar.xz using [1]
> as of commit bbf3d68a without any changes.
>
> It seems to me that everything is working well.
> Which modules can't you load?

If I remove the revert of the offending commit, it fails loading ext4,
among others.



2014-04-07 17:39:44

by Andi Kleen

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On Sat, Apr 05, 2014 at 04:29:31PM +0200, Thomas Bächler wrote:
> On 05.04.2014 03:13, Andi Kleen wrote:
> > On Tue, Apr 01, 2014 at 01:38:10AM +0200, Thomas Bächler wrote:
> >> On 01.04.2014 01:34, Andi Kleen wrote:
> >>>> This problem persists in v3.14, i.e. I still have to revert
> >>>> 83460ec8dcac14142e7860a01fa59c267ac4657c in order to get a working
> >>>> kernel on i686. I would really appreciate it if someone would actually read
> >>>> my original mail from about 3 months ago and write an answer.
> >>>
> >>> Can you resend it please?
> >>
> >> It's available here: https://lkml.org/lkml/2014/1/26/22
> >>
> >> For convenience, here is a copy-and-paste of the full text:
> >
> > I did some experiments now and I can't find any 32-bit modules
> > that do not load with 32-bit MODVERSIONS on or off with
> > a current tree.
> >
> > Do you have a specific config?
> > Specific compiler version?
>
> Using gcc 4.8 from Arch Linux with the configuration at [1] and Linux 3.14.

I tested this configuration (with gcc 4.8 on FC20/19) and it loads
ext4 and all the other modules without any problems.

Base tree:

commit e06df6a7eae1ab1ef4deb076aeeaed90e948e5c0
Merge: c0fc3cb 9dd721c
Author: Linus Torvalds <[email protected]>
Date: Mon Mar 31 12:34:49 2014 -0700

Must be something really archlinux specific. Please do some debugging.
Also please double check that all your test procedures are correct.

-Andi

2014-04-07 17:46:37

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 07.04.2014 19:30, Andi Kleen wrote:
>>> Do you have a specific config?
>>> Specific compiler version?
>>
>> Using gcc 4.8 from Arch Linux with the configuration at [1] and Linux 3.14.
>
> I tested this configuration (with gcc 4.8 on FC20/19) and it loads
> ext4 and all the other modules without any problems.
>
> Base tree:
>
> commit e06df6a7eae1ab1ef4deb076aeeaed90e948e5c0
> Merge: c0fc3cb 9dd721c
> Author: Linus Torvalds <[email protected]>
> Date: Mon Mar 31 12:34:49 2014 -0700
>
> Must be something really archlinux specific. Please do some debugging.
> Also please double check that all your test procedures are correct.

Tetsuo was kind enough to install Arch Linux and reproduce the exact
procedure I use to create the kernel (which includes the automated
creation of a pristine build environment) - his kernel booted just fine.
I will do more tests today on two of my own computers to narrow this down.



2014-04-07 20:10:53

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 07.04.2014 19:46, Thomas Bächler wrote:
> On 07.04.2014 19:30, Andi Kleen wrote:
>>>> Do you have a specific config?
>>>> Specific compiler version?
>>>
>>> Using gcc 4.8 from Arch Linux with the configuration at [1] and Linux 3.14.
>>
>> I tested this configuration (with gcc 4.8 on FC20/19) and it loads
>> ext4 and all the other modules without any problems.
>>
>> Base tree:
>>
>> commit e06df6a7eae1ab1ef4deb076aeeaed90e948e5c0
>> Merge: c0fc3cb 9dd721c
>> Author: Linus Torvalds <[email protected]>
>> Date: Mon Mar 31 12:34:49 2014 -0700
>>
>> Must be something really archlinux specific. Please do some debugging.
>> Also please double check that all your test procedures are correct.
>
> Tetsuo was kind enough to install Arch Linux and reproduce the exact
> procedure I use to create the kernel (which includes the automated
> creation of a pristine build environment) - his kernel booted just fine.
> I will do more tests today on two of my own computers to narrow this down.

I think I found out why nobody could reproduce the problem.

I did a few more tests and it turns out that the problem only occurs
when I boot the kernel with UEFI (using Gummiboot+EFISTUB). Now, except
for OVMF virtual machines, there are barely any 32-bit UEFI machines
around, so nobody noticed. When I boot the kernel with 32-bit BIOS, it
boots fine.

Just to clarify: As mentioned in my first mail, some symbols still get a
0x0 CRC (which I still think is wrong), but the mismatch does not occur
in BIOS mode.

On x86_64, the problem does not occur at all.



2014-04-07 21:25:59

by Matt Fleming

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 7 April 2014 21:42, Andi Kleen <[email protected]> wrote:
>
> This sounds like the UEFI boot corrupts some memory?

Hmpf, yeah. I'll take a look in the morning.

Thomas, you mentioned earlier in this thread that you're running a
32-bit VM. Any chance you're using OVMF? That would make this much
easier to track down.

2014-04-07 21:30:40

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 07.04.2014 23:25, Fleming, Matt wrote:
> On 7 April 2014 21:42, Andi Kleen <[email protected]> wrote:
>>
>> This sounds like the UEFI boot corrupts some memory?
>
> Hmpf, yeah. I'll take a look in the morning.
>
> Thomas, you mentioned earlier in this thread that you're running a
> 32-bit VM. Any chance you're using OVMF? That would make this much
> easier to track down.

Yes, the virtual machine is running OVMF svn 15280 (which is the last
version I built).

qemu-system-x86_64 -enable-kvm -watchdog i6300esb \
  -device virtio-net-pci,netdev=n1,mac=1a:2b:3c:4d:8e:2f \
  -netdev vde,sock=/run/vde/qemu.ctl,id=n1 \
  -device virtio-scsi-pci,id=scsi \
  -drive file=./vm32-uefi.img,if=none,id=hd -device scsi-hd,drive=hd \
  -enable-kvm -m 2G -cpu kvm32 -smp 4,cores=4 -k de -balloon virtio \
  -name vm32-uefi -usb -nodefaults -vga cirrus -monitor vc \
  -pflash ./vm32-uefi-ovmf.img


2014-04-07 21:53:51

by Tetsuo Handa

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

Fleming, Matt wrote:
> On 7 April 2014 21:42, Andi Kleen <[email protected]> wrote:
> >
> > This sounds like the UEFI boot corrupts some memory?
>
> Hmpf, yeah. I'll take a look in the morning.
>
> Thomas, you mentioned earlier in this thread that you're running a
> 32-bit VM. Any chance you're using OVMF? That would make this much
> easier to track down.
>

I'm not familiar with UEFI boot, but it could happen because what
I experienced with BIOS boot was an address-dependent behavior.

https://lkml.org/lkml/2013/9/4/188

2014-04-07 23:07:32

by Andi Kleen

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On Mon, Apr 07, 2014 at 10:10:46PM +0200, Thomas Bächler wrote:
> On 07.04.2014 19:46, Thomas Bächler wrote:
> > On 07.04.2014 19:30, Andi Kleen wrote:
> >>>> Do you have a specific config?
> >>>> Specific compiler version?
> >>>
> >>> Using gcc 4.8 from Arch Linux with the configuration at [1] and Linux 3.14.
> >>
> >> I tested this configuration (with gcc 4.8 on FC20/19) and it loads
> >> ext4 and all the other modules without any problems.
> >>
> >> Base tree:
> >>
> >> commit e06df6a7eae1ab1ef4deb076aeeaed90e948e5c0
> >> Merge: c0fc3cb 9dd721c
> >> Author: Linus Torvalds <[email protected]>
> >> Date: Mon Mar 31 12:34:49 2014 -0700
> >>
> >> Must be something really archlinux specific. Please do some debugging.
> >> Also please double check that all your test procedures are correct.
> >
> > Tetsuo was kind enough to install Arch Linux and reproduce the exact
> > procedure I use to create the kernel (which includes the automated
> > creation of a pristine build environment) - his kernel booted just fine.
> > I will do more tests today on two of my own computers to narrow this down.
>
> I think I found out why nobody could reproduce the problem.
>
> I did a few more tests and it turns out that the problem only occurs
> when I boot the kernel with UEFI (using Gummiboot+EFISTUB). Now, except
> for OVMF virtual machines, there are barely any 32-bit UEFI machines
> around, so nobody noticed. When I boot the kernel with 32-bit BIOS, it
> boots fine.
>
> Just to clarify: As mentioned in my first mail, some symbols still get a
> 0x0 CRC (which I still think is wrong), but the mismatch does not occur
> in BIOS mode.
>
> On x86_64, the problem does not occur at all.

Thanks.

This sounds like the UEFI boot corrupts some memory?

Copying some UEFI experts.

Maybe we need some CRC checksum checking in the kernel?

-Andi

--
[email protected] -- Speaking for myself only

2014-04-08 12:14:15

by Matt Fleming

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On Tue, 08 Apr, at 06:46:49AM, Tetsuo Handa wrote:
> Fleming, Matt wrote:
> > On 7 April 2014 21:42, Andi Kleen <[email protected]> wrote:
> > >
> > > This sounds like the UEFI boot corrupts some memory?
> >
> > Hmpf, yeah. I'll take a look in the morning.
> >
> > Thomas, you mentioned earlier in this thread that you're running a
> > 32-bit VM. Any chance you're using OVMF? That would make this much
> > easier to track down.
> >
>
> I'm not familiar with UEFI boot, but it could happen because what
> I experienced with BIOS boot was an address-dependent behavior.
>
> https://lkml.org/lkml/2013/9/4/188

OK, that's a pretty good clue, thanks Tetsuo.

Thomas, could you try this patch? It seems the use of code32_start in
the EFI boot stub was totally wrong for the case where the boot stub
relocates the kernel - you're likely to hit this path if using the EFI
boot stub directly from the EFI shell or gummiboot.

It was pointing at the start of the kernel image and not the protected
mode code.


diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 1e6146137f8e..65e7d7e0ef1b 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -1016,6 +1016,9 @@ void setup_graphics(struct boot_params *boot_params)
* Because the x86 boot code expects to be passed a boot_params we
* need to create one ourselves (usually the bootloader would create
* one for us).
+ *
+ * The caller is responsible for filling out ->code32_start in the
+ * returned boot_params.
*/
struct boot_params *make_boot_params(struct efi_config *c)
{
@@ -1081,8 +1084,6 @@ struct boot_params *make_boot_params(struct efi_config *c)
hdr->vid_mode = 0xffff;
hdr->boot_flag = 0xAA55;

- hdr->code32_start = (__u64)(unsigned long)image->image_base;
-
hdr->type_of_loader = 0x21;

/* Convert unicode cmdline to ascii */
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index de9d4200d305..cbed1407a5cd 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -59,6 +59,7 @@ ENTRY(efi_pe_entry)
call make_boot_params
cmpl $0, %eax
je fail
+ movl %esi, BP_code32_start(%eax)
popl %ecx
pushl %eax
pushl %ecx
@@ -90,12 +91,7 @@ fail:
hlt
jmp fail
2:
- call 3f
-3:
- popl %eax
- subl $3b, %eax
- subl BP_pref_address(%esi), %eax
- add BP_code32_start(%esi), %eax
+ movl BP_code32_start(%esi), %eax
leal preferred_addr(%eax), %eax
jmp *%eax

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 57e58a5fa210..0d558ee899ae 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -261,6 +261,8 @@ ENTRY(efi_pe_entry)
cmpq $0,%rax
je fail
mov %rax, %rsi
+ leaq startup_32(%rip), %rax
+ movl %eax, BP_code32_start(%rsi)
jmp 2f /* Skip the relocation */

handover_entry:
@@ -284,12 +286,7 @@ fail:
hlt
jmp fail
2:
- call 3f
-3:
- popq %rax
- subq $3b, %rax
- subq BP_pref_address(%rsi), %rax
- add BP_code32_start(%esi), %eax
+ movl BP_code32_start(%esi), %eax
leaq preferred_addr(%rax), %rax
jmp *%rax

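The address arithmetic changed by the head_32.S hunk can be modelled in C. This is a simplified sketch that assumes the compressed kernel is linked at address 0, so a label's link-time value equals its offset from startup_32. The function names and example addresses are hypothetical. The removed sequence works out where startup_32 is currently running (the run-time address of label 3 minus its link-time value), subtracts pref_address, adds code32_start, and finally adds the offset of preferred_addr. The added line simply jumps to preferred_addr inside the image that starts at code32_start. The two agree when pref_address equals the address startup_32 was really running at. Suppose instead that the subtracted base is the start of the PE image (the value Matt says code32_start used to be seeded with) while code32_start already points at the relocated copy. In that case, the old result is shifted by the distance between the image start and startup_32:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Removed sequence: call/pop gives the run-time address of label 3;
 * subtracting its link-time value yields where startup_32 is running.
 * Unsigned 32-bit wrap-around mirrors the subl/addl instructions, so
 * intermediate "negative" values still produce the right final sum.
 */
uint32_t old_jump_target(uint32_t runtime_label3, uint32_t link_label3,
			 uint32_t pref_address, uint32_t code32_start,
			 uint32_t preferred_addr_off)
{
	uint32_t running_at;

	assert(runtime_label3 >= link_label3);
	running_at = runtime_label3 - link_label3;
	return running_at - pref_address + code32_start + preferred_addr_off;
}

/* Added sequence: preferred_addr inside the image starting at code32_start. */
uint32_t new_jump_target(uint32_t code32_start, uint32_t preferred_addr_off)
{
	return code32_start + preferred_addr_off;
}
```

In this example, startup_32 runs at 0x3004000, the relocated copy is at 0x1000000 and preferred_addr is at offset 0x150, so both functions return 0x1000150. If the image start 0x3000000 is passed as pref_address instead, old_jump_target() returns 0x1004150, which is 0x4000 past the intended target.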

--
Matt Fleming, Intel Open Source Technology Center

2014-04-08 18:57:12

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 08.04.2014 14:14, Matt Fleming wrote:
> On Tue, 08 Apr, at 06:46:49AM, Tetsuo Handa wrote:
>> Fleming, Matt wrote:
>>> On 7 April 2014 21:42, Andi Kleen <[email protected]> wrote:
>>>>
>>>> This sounds like the UEFI boot corrupts some memory?
>>>
>>> Hmpf, yeah. I'll take a look in the morning.
>>>
>>> Thomas, you mentioned earlier in this thread that you're running a
>>> 32-bit VM. Any chance you're using OVMF? That would make this much
>>> easier to track down.
>>>
>>
>> I'm not familiar with UEFI boot, but it could happen because what
>> I experienced with BIOS boot was an address-dependent behavior.
>>
>> https://lkml.org/lkml/2013/9/4/188
>
> OK, that's a pretty good clue, thanks Tetsuo.
>
> Thomas, could you try this patch? It seems the use of code32_start in
> the EFI boot stub was totally wrong for the case where the boot stub
> relocates the kernel - you're likely to hit this path if using the EFI
> boot stub directly from the EFI shell or gummiboot.
>
> It was pointing at the start of the kernel image and not the protected
> mode code.

Hello Matt,

I am unable to backport this to 3.14 for lack of assembler magic. While
I can test this with git master, I eventually still need a version that
is backported to 3.14. Any chance you could provide that, too?



2014-04-08 20:04:57

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 08.04.2014 20:57, Thomas Bächler wrote:
>>>> Thomas, you mentioned earlier in this thread that you're running a
>>>> 32-bit VM. Any chance you're using OVMF? That would make this much
>>>> easier to track down.
>>>>
>>>
>>> I'm not familiar with UEFI boot, but it could happen because what
>>> I experienced with BIOS boot was an address-dependent behavior.
>>>
>>> https://lkml.org/lkml/2013/9/4/188
>>
>> OK, that's a pretty good clue, thanks Tetsuo.
>>
>> Thomas, could you try this patch? It seems the use of code32_start in
>> the EFI boot stub was totally wrong for the case where the boot stub
>> relocates the kernel - you're likely to hit this path if using the EFI
>> boot stub directly from the EFI shell or gummiboot.
>>
>> It was pointing at the start of the kernel image and not the protected
>> mode code.
>
> Hello Matt,
>
> I am unable to backport this to 3.14 for lack of assembler magic. While
> I can test this with git master, I eventually still need a version that
> is backported to 3.14. Any chance you could provide that, too?

Hello again Matt,

with linux.git master, I cannot reproduce the problem at all (with or
without your patch). In fact, all the 0x0 CRCs on symbols are gone, and
those were the symbols that were broken after all.

FWIW, with your patch the kernel still boots.



2014-04-09 08:25:52

by Matt Fleming

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On Tue, 08 Apr, at 10:04:48PM, Thomas Bächler wrote:
>
> Hello again Matt,
>
> with linux.git master, I cannot reproduce the problem at all (with or
> without your patch). In fact, all the 0x0 CRCs on symbols are gone, and
> those were the symbols that were broken after all.
>
> FWIW, with your patch the kernel still boots.

Could you try this version? It's against v3.14.


diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index a7677babf946..78cbb2db5a85 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -425,6 +425,9 @@ void setup_graphics(struct boot_params *boot_params)
* Because the x86 boot code expects to be passed a boot_params we
* need to create one ourselves (usually the bootloader would create
* one for us).
+ *
+ * The caller is responsible for filling out ->code32_start in the
+ * returned boot_params.
*/
struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table)
{
@@ -483,8 +486,6 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table)
hdr->vid_mode = 0xffff;
hdr->boot_flag = 0xAA55;

- hdr->code32_start = (__u64)(unsigned long)image->image_base;
-
hdr->type_of_loader = 0x21;

/* Convert unicode cmdline to ascii */
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 9116aac232c7..f45ab7a36fb6 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -50,6 +50,13 @@ ENTRY(efi_pe_entry)
pushl %eax
pushl %esi
pushl %ecx
+
+ call reloc
+reloc:
+ popl %ecx
+ subl reloc, %ecx
+ movl %ecx, BP_code32_start(%eax)
+
sub $0x4, %esp

ENTRY(efi_stub_entry)
@@ -63,12 +70,7 @@ ENTRY(efi_stub_entry)
hlt
jmp 1b
2:
- call 3f
-3:
- popl %eax
- subl $3b, %eax
- subl BP_pref_address(%esi), %eax
- add BP_code32_start(%esi), %eax
+ movl BP_code32_start(%esi), %eax
leal preferred_addr(%eax), %eax
jmp *%eax

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c5c1ae0997e7..b10fa66a2540 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -217,6 +217,8 @@ ENTRY(efi_pe_entry)
cmpq $0,%rax
je 1f
mov %rax, %rdx
+ leaq startup_32(%rip), %rax
+ movl %eax, BP_code32_start(%rdx)
popq %rsi
popq %rdi

@@ -230,12 +232,7 @@ ENTRY(efi_stub_entry)
hlt
jmp 1b
2:
- call 3f
-3:
- popq %rax
- subq $3b, %rax
- subq BP_pref_address(%rsi), %rax
- add BP_code32_start(%esi), %eax
+ movl BP_code32_start(%esi), %eax
leaq preferred_addr(%rax), %rax
jmp *%rax


--
Matt Fleming, Intel Open Source Technology Center

2014-04-09 08:30:41

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 09.04.2014 10:25, Matt Fleming wrote:
> On Tue, 08 Apr, at 10:04:48PM, Thomas Bächler wrote:
>>
>> Hello again Matt,
>>
>> with linux.git master, I cannot reproduce the problem at all (with or
>> without your patch). In fact, all the 0x0 CRCs on symbols are gone, and
>> those were the symbols that were broken after all.
>>
>> FWIW, with your patch the kernel still boots.
>
> Could you try this version? It's against v3.14.

I'll do that tonight and report back (now + ~10 hours).

In the meantime, I figured out which commit fixed the 0x00000000 symbol
CRCs in 3.14+, making this bug invisible - it was
dc53324060f324e8af6867f57bf4891c13c6ef18 in the Linus tree.



2014-04-09 18:01:19

by Thomas Bächler

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On 09.04.2014 10:30, Thomas Bächler wrote:
> On 09.04.2014 10:25, Matt Fleming wrote:
>> On Tue, 08 Apr, at 10:04:48PM, Thomas Bächler wrote:
>>>
>>> Hello again Matt,
>>>
>>> with linux.git master, I cannot reproduce the problem at all (with or
>>> without your patch). In fact, all the 0x0 CRCs on symbols are gone, and
>>> those were the symbols that were broken after all.
>>>
>>> FWIW, with your patch the kernel still boots.
>>
>> Could you try this version? It's against v3.14.
>
> I'll do that tonight and report back (now + ~10 hours).

Hello again Matt,

that patch seems to help. Thank you so much.

(I am going to apply this patch and backport dc53324060, too, so
everything should be in order then.)

Regards
Thomas

> In the meantime, I figured out which commit fixed the 0x00000000 symbol
> CRCs in 3.14+, making this bug invisible - it was
> dc53324060f324e8af6867f57bf4891c13c6ef18 in the Linus tree.



2014-04-09 20:43:25

by Matt Fleming

Subject: Re: 3.13: <module> disagrees about version of symbol <symbol>

On Wed, 09 Apr, at 08:01:02PM, Thomas Bächler wrote:
>
> Hello again Matt,
>
> that patch seems to help. Thank you so much.
>
> (I am going to apply this patch and backport dc53324060, too, so
> everything should be in order then.)

That's great to hear, thanks for testing. I'll get this applied.

--
Matt Fleming, Intel Open Source Technology Center