On Thu, Aug 29, 2019 at 10:19:55AM -0500, Meyer, Kyle wrote:
> Hi Steve,
>
>
> My patch series was accepted so I don't have much to work on currently, is
> there anything I can help you with?
>
>
> Thank you
Why yes, there is!
Loading the tip of the tree kernel, in my case on top of SLES12sp4, I
could not get the kdump kernel to load properly.
I've got a workaround for it (reverting a commit), but I'm working
on narrowing it down to a proper fix rather than a revert. I've
already involved the linux list; some details are below. Note that I
typo'd the commit hash there (missed the first character on the
copy / paste); it should be b059f801a937.
If you could see if you can reproduce the same problem to start with,
you could help me.
Load up SLES12sp4.
Make sure the kernel command line is using crashkernel=512M,high.
Build and install the community kernel.
Reboot into that kernel.
Run "systemctl status kdump" until kdump installation completes. I
get a failure; do you? If not, we need to figure out why.
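
The command-line check in the steps above can be scripted. This is only a minimal sketch: the helper name crashkernel_settings is made up for illustration, and it assumes the text passed in is the running kernel's command line as read from /proc/cmdline.

```python
def crashkernel_settings(cmdline):
    """Return the values of every crashkernel= parameter on a kernel command line."""
    prefix = "crashkernel="
    return [
        token[len(prefix):]
        for token in cmdline.split()
        if token.startswith(prefix)
    ]
```

Calling it with the contents of /proc/cmdline should return a list that includes "512M,high" if the system is set up as described. An empty list means no crashkernel reservation was requested.
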
If you run dmesg | tail, you should also see a kexec relocation
overflow message.
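
To pick that message out of a longer log, something like the following may help. It is a sketch, not part of kexec or kdump: the function names are hypothetical, and the regular expression is based only on the message format quoted further down. It also checks whether the reported value fits in a signed 32-bit field, on the assumption that relocation type 11 is R_X86_64_32S (a sign-extended 32-bit relocation) as listed in elf.h.

```python
import re

# Matches e.g. "kexec: Overflow in relocation type 11 value 0x17f7affd000"
OVERFLOW_RE = re.compile(
    r"kexec: Overflow in relocation type (\d+) value (0x[0-9a-fA-F]+)"
)


def find_overflows(log_text):
    """Return a (relocation_type, value) pair for each overflow message found."""
    return [
        (int(m.group(1)), int(m.group(2), 16))
        for m in OVERFLOW_RE.finditer(log_text)
    ]


def fits_signed_32(value):
    """True if value can be stored in a sign-extended 32-bit relocation field."""
    return -(1 << 31) <= value < (1 << 31)
```

Both values reported in this thread, 0x11fffd000 and 0x17f7affd000, are larger than 0x7fffffff, so fits_signed_32 returns False for each of them.
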
After you get that far, we'll see where I'm at and what you can do to
help.
--> Steve
On Wed, Aug 28, 2019 at 02:42:26PM -0500, Steve Wahl wrote:
> Please CC me on responses to this.
>
> I normally would do more diligence on this, but the timing is such
> that I think it's better to get this out sooner.
>
> With the tip of the tree from https://github.com/torvalds/linux.git (a
> few days old, most recent commit fetched is
> bb7ba8069de933d69cb45dd0a5806b61033796a3), I'm seeing "kexec: Overflow
> in relocation type 11 value 0x11fffd000" when I try to load a crash
> kernel with kdump. This seems to be caused by commit
> 059f801a937d164e03b33c1848bb3dca67c0b04, which changed the compiler
> flags used to compile purgatory.ro, apparently creating 32 bit
> relocations for things that aren't necessarily reachable with a 32 bit
> reference. My guess is this only occurs when the crash kernel is
> located outside 32-bit addressable physical space.
>
> I have so far verified that the problem occurs with that commit, and
> does not occur with the previous commit. For this commit, Thomas
> Gleixner mentioned a few of the changed flags should have been looked
> at twice. I have not gone so far as to figure out which flags cause
> the problem.
>
> The hardware in use is a HPE Superdome Flex with 48 * 32GiB dimms
> (total 1536 GiB).
>
> One example of the exact error messages seen:
>
> 2019-08-28T13:42:39.308110-05:00 uv4test14 kernel: [ 45.137743] kexec: Overflow in relocation type 11 value 0x17f7affd000
> 2019-08-28T13:42:39.308123-05:00 uv4test14 kernel: [ 45.137749] kexec-bzImage64: Loading purgatory failed
>
> --> Steve Wahl
> --
> Steve Wahl, Hewlett Packard Enterprise
On Thu, Aug 29, 2019 at 10:19:55AM -0500, Meyer, Kyle wrote:
> Hi Steve,
>
>
> My patch series was accepted so I don't have much to work on currently, is
> there anything I can help you with?
>
>
> Thank you
>
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> From: Wahl, Steve <[email protected]>
> Sent: Friday, August 23, 2019 4:16:38 PM
> To: Meyer, Kyle <[email protected]>
> Subject: Re: The patch
>
>
> Unfortunately, uv4test23 was mostly taken by someone else today.
>
>
> On uv4test14, I first tried with my own copy of the upstream kernel, then I
> snuck on to uv4test23 and grabbed a copy of your kernel directory. I still
> keep getting relocation errors when kexec tries to load the crash kernel. Did
> you ever see anything like this?
>
>
> uv4test14:~ # dmesg | grep kexec
> [ 141.497797] kexec: Overflow in relocation type 11 value 0x17f7affd000
> [ 141.497802] kexec-bzImage64: Loading purgatory failed
> [ 480.183448] kexec: Overflow in relocation type 11 value 0x17f7affd000
> [ 480.183453] kexec-bzImage64: Loading purgatory failed
> [ 512.094071] kexec: Overflow in relocation type 11 value 0x17f7affd000
> [ 512.094076] kexec-bzImage64: Loading purgatory failed
>
> --> Steve
>
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> From: Meyer, Kyle <[email protected]>
> Sent: Thursday, August 22, 2019 10:06:11 AM
> To: Wahl, Steve <[email protected]>
> Subject: Re: The patch
>
>
> I have uv4test23 reserved until 5:00 today; it's booted up with the upstream
> kernel on fs0. I've already changed hpe-auto-config also. Feel free to use it
> anytime!
>
>
> Thanks
>
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> From: Meyer, Kyle <[email protected]>
> Sent: Wednesday, August 21, 2019 5:18:05 PM
> To: Wahl, Steve <[email protected]>
> Subject: Re: The patch
>
>
> Hi Steve,
>
>
> Thanks for sending me that. I'll come in early tomorrow morning and get another
> machine booted up with the upstream kernel. I got one running and consistently
> crashing, but someone has it reserved after me.
>
>
> Thanks
>
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> From: Wahl, Steve <[email protected]>
> Sent: Wednesday, August 21, 2019 4:30:10 PM
> To: Meyer, Kyle <[email protected]>
> Subject: The patch
>
>
> When the time comes, the attached file is the change I'm running that matters.
>
>
> I have other stuff that dumps the page tables, but this is the meat. Raw, not
> suitable for submitting upstream, of course.
>
>
> --> Steve
>
--
Steve Wahl, Hewlett Packard Enterprise