2014-01-22 11:28:39

by Wang Nan

Subject: [PATCH 0/3] Bugfix for kdump on arm

This patch series introduces 3 bugfixes for kdump (and kexec) on the ARM platform.

kdump on ARM is in fact broken (at least for omap4460). After a month of hard
work, and with the help of a JTAG debugger, we finally made kdump work
reliably.


The patches are as follows. The first 2 patches form a group: they allow
ioremap_nocache() to be used on reserved pages on the ARM platform (which is
prohibited by commit 309caa9cc) and then use ioremap_nocache() to copy the code
kexec requires. The last one is for the crash dump kernel: it allows the kernel
to be loaded in the middle of kernel-aware physical memory. Without it, the
crash dump kernel must be configured very carefully to boot.

Wang Nan (3):
ARM: Permit ioremap() to map reserved pages
ARM: kexec: copying code to ioremapped area
ARM: allow kernel to be loaded in middle of phymem

arch/arm/kernel/machine_kexec.c | 18 ++++++++++++++++--
arch/arm/mm/init.c | 21 ++++++++++++++++++++-
arch/arm/mm/ioremap.c | 2 +-
arch/arm/mm/mmu.c | 13 +++++++++++++
kernel/kexec.c | 40 +++++++++++++++++++++++++++++++++++-----
mm/page_alloc.c | 7 +++++--
6 files changed, 90 insertions(+), 11 deletions(-)


Signed-off-by: Wang Nan <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Russell King <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Geng Hui <[email protected]>

--
1.8.4


2014-01-22 11:28:55

by Wang Nan

Subject: [PATCH 2/3] ARM: kexec: copying code to ioremapped area

ARM's kdump is actually broken (at least for omap4460), mainly because of a
cache problem: flush_icache_range() can't reliably ensure that the copied data
correctly reaches RAM. After the MMU is turned off and we jump to the
trampoline, kexec always fails due to random undefined instructions.

This patch uses ioremap to make sure the destination of every memcpy() is
uncacheable memory, including the copies of the target kernel and the
trampoline.

Signed-off-by: Wang Nan <[email protected]>
Cc: <[email protected]> # 3.4+
Cc: Eric Biederman <[email protected]>
Cc: Russell King <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Geng Hui <[email protected]>
---
arch/arm/kernel/machine_kexec.c | 18 ++++++++++++++++--
kernel/kexec.c | 40 +++++++++++++++++++++++++++++++++++-----
2 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f0d180d..ba0a5a8 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -144,6 +144,7 @@ void machine_kexec(struct kimage *image)
unsigned long page_list;
unsigned long reboot_code_buffer_phys;
unsigned long reboot_entry = (unsigned long)relocate_new_kernel;
+ void __iomem *reboot_entry_remap;
unsigned long reboot_entry_phys;
void *reboot_code_buffer;

@@ -171,9 +172,22 @@ void machine_kexec(struct kimage *image)


/* copy our kernel relocation code to the control code page */
- reboot_entry = fncpy(reboot_code_buffer,
- reboot_entry,
+ reboot_entry_remap = ioremap_nocache(reboot_code_buffer_phys,
+ relocate_new_kernel_size);
+ if (reboot_entry_remap == NULL) {
+ pr_warn("startup code may not be reliably flushed\n");
+ reboot_entry_remap = (void __iomem *)reboot_code_buffer;
+ }
+
+ reboot_entry = fncpy(reboot_entry_remap, reboot_entry,
relocate_new_kernel_size);
+ reboot_entry = (unsigned long)reboot_code_buffer +
+ (reboot_entry -
+ (unsigned long)reboot_entry_remap);
+
+ if (reboot_entry_remap != (void __iomem *)reboot_code_buffer)
+ iounmap(reboot_entry_remap);
+
reboot_entry_phys = (unsigned long)reboot_entry +
(reboot_code_buffer_phys - (unsigned long)reboot_code_buffer);

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 9c97016..3e92999 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -806,6 +806,7 @@ static int kimage_load_normal_segment(struct kimage *image,
while (mbytes) {
struct page *page;
char *ptr;
+ void __iomem *ioptr;
size_t uchunk, mchunk;

page = kimage_alloc_page(image, GFP_HIGHUSER, maddr);
@@ -818,7 +819,17 @@ static int kimage_load_normal_segment(struct kimage *image,
if (result < 0)
goto out;

- ptr = kmap(page);
+ /*
+ * Try ioremap to make sure the copied data reliably reaches
+ * RAM. If it fails (some archs don't allow ioremapping RAM),
+ * use kmap instead.
+ */
+ ioptr = ioremap(page_to_pfn(page) << PAGE_SHIFT,
+ PAGE_SIZE);
+ if (ioptr != NULL)
+ ptr = ioptr;
+ else
+ ptr = kmap(page);
/* Start with a clear page */
clear_page(ptr);
ptr += maddr & ~PAGE_MASK;
@@ -827,7 +838,10 @@ static int kimage_load_normal_segment(struct kimage *image,
uchunk = min(ubytes, mchunk);

result = copy_from_user(ptr, buf, uchunk);
- kunmap(page);
+ if (ioptr != NULL)
+ iounmap(ioptr);
+ else
+ kunmap(page);
if (result) {
result = -EFAULT;
goto out;
@@ -846,7 +860,7 @@ static int kimage_load_crash_segment(struct kimage *image,
{
/* For crash dumps kernels we simply copy the data from
* user space to it's destination.
- * We do things a page at a time for the sake of kmap.
+ * We do things a page at a time for the sake of ioremap/kmap.
*/
unsigned long maddr;
size_t ubytes, mbytes;
@@ -861,6 +875,7 @@ static int kimage_load_crash_segment(struct kimage *image,
while (mbytes) {
struct page *page;
char *ptr;
+ void __iomem *ioptr;
size_t uchunk, mchunk;

page = pfn_to_page(maddr >> PAGE_SHIFT);
@@ -868,7 +883,18 @@ static int kimage_load_crash_segment(struct kimage *image,
result = -ENOMEM;
goto out;
}
- ptr = kmap(page);
+ /*
+ * Try ioremap to make sure the copied data reliably reaches
+ * RAM. If it fails (some archs don't allow ioremapping RAM),
+ * use kmap instead.
+ */
+ ioptr = ioremap_nocache(page_to_pfn(page) << PAGE_SHIFT,
+ PAGE_SIZE);
+ if (ioptr != NULL)
+ ptr = ioptr;
+ else
+ ptr = kmap(page);
+
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
PAGE_SIZE - (maddr & ~PAGE_MASK));
@@ -879,7 +905,11 @@ static int kimage_load_crash_segment(struct kimage *image,
}
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
- kunmap(page);
+ if (ioptr != NULL)
+ iounmap(ioptr);
+ else
+ kunmap(page);
+
if (result) {
result = -EFAULT;
goto out;
--
1.8.4

2014-01-22 11:28:52

by Wang Nan

Subject: [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem

This patch allows the kernel to be loaded in the middle of kernel-aware
physical memory. Before this patch, users had to use mem= or the device tree
to deceive the kernel about the start address of physical memory.

This feature is useful in some special cases, for example building a crash
dump kernel. Without it, the kernel command line, atags and device tree must
be adjusted very carefully, which is sometimes impossible.

Signed-off-by: Wang Nan <[email protected]>
Cc: <[email protected]> # 3.4+
Cc: Eric Biederman <[email protected]>
Cc: Russell King <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Geng Hui <[email protected]>
---
arch/arm/mm/init.c | 21 ++++++++++++++++++++-
arch/arm/mm/mmu.c | 13 +++++++++++++
mm/page_alloc.c | 7 +++++--
3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 3e8f106..4952726 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -334,9 +334,28 @@ void __init arm_memblock_init(struct meminfo *mi,
{
int i;

- for (i = 0; i < mi->nr_banks; i++)
+ for (i = 0; i < mi->nr_banks; i++) {
memblock_add(mi->bank[i].start, mi->bank[i].size);

+ /*
+ * In some special cases, for example when building a crashdump
+ * kernel, we want the kernel to be loaded in the middle of
+ * physical memory. In such cases, the physical memory before
+ * PHYS_OFFSET is awkward: it can't be directly mapped (its
+ * address would be smaller than PAGE_OFFSET and would disturb
+ * the user address space) and it can't be mapped as HighMem
+ * either. We reserve such pages here. The only way to access
+ * those pages is ioremap.
+ */
+ if (mi->bank[i].start < PHYS_OFFSET) {
+ unsigned long reserv_size = PHYS_OFFSET -
+ mi->bank[i].start;
+ if (reserv_size > mi->bank[i].size)
+ reserv_size = mi->bank[i].size;
+ memblock_reserve(mi->bank[i].start, reserv_size);
+ }
+ }
+
/* Register the kernel text, kernel data and initrd with memblock. */
#ifdef CONFIG_XIP_KERNEL
memblock_reserve(__pa(_sdata), _end - _sdata);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 580ef2d..2a17c24 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1308,6 +1308,19 @@ static void __init map_lowmem(void)
if (start >= end)
break;

+ /*
+ * If this memblock contains memory before PAGE_OFFSET, that
+ * memory shouldn't be directly mapped, see the code in
+ * create_mapping(). However, memory after PAGE_OFFSET is
+ * occupied by the kernel and still needs to be mapped.
+ */
+ if (__phys_to_virt(start) < PAGE_OFFSET) {
+ if (__phys_to_virt(end) > PAGE_OFFSET)
+ start = __virt_to_phys(PAGE_OFFSET);
+ else
+ break;
+ }
+
map.pfn = __phys_to_pfn(start);
map.virtual = __phys_to_virt(start);
map.length = end - start;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5248fe0..d2959e3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4840,10 +4840,13 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
*/
if (pgdat == NODE_DATA(0)) {
mem_map = NODE_DATA(0)->node_mem_map;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+ /*
+ * In case of CONFIG_HAVE_MEMBLOCK_NODE_MAP, or when the kernel
+ * is loaded in the middle of physical memory, mem_map should
+ * be adjusted.
+ */
if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
}
#endif
#endif /* CONFIG_FLAT_NODE_MEM_MAP */
--
1.8.4

2014-01-22 11:29:21

by Wang Nan

Subject: [PATCH 1/3] ARM: Permit ioremap() to map reserved pages

This patch relaxes the restriction set by commit 309caa9cc, which
prohibits ioremap() on all kernel-managed pages.

Other architectures, such as x86 and (some specific platforms of) powerpc,
allow such mappings.

ioremap()ing pages is an efficient way to sidestep ARM's intricate cache
maintenance. This feature will be used by ARM kexec support to ensure that
copied data reaches RAM even without cache flushing, because we found that
flush_cache_xxx() can't reliably flush code to memory.

Signed-off-by: Wang Nan <[email protected]>
Cc: <[email protected]> # 3.4+
Cc: Eric Biederman <[email protected]>
Cc: Russell King <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Geng Hui <[email protected]>
---
arch/arm/mm/ioremap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index f123d6e..98b1c10 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -298,7 +298,7 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
/*
* Don't allow RAM to be mapped - this causes problems with ARMv6+
*/
- if (WARN_ON(pfn_valid(pfn)))
+ if (WARN_ON(pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))))
return NULL;

area = get_vm_area_caller(size, VM_IOREMAP, caller);
--
1.8.4

2014-01-22 11:39:50

by Will Deacon

Subject: Re: [PATCH 1/3] ARM: Permit ioremap() to map reserved pages

On Wed, Jan 22, 2014 at 11:25:14AM +0000, Wang Nan wrote:
> This patch relaxes the restriction set by commit 309caa9cc, which
> prohibits ioremap() on all kernel-managed pages.
>
> Other architectures, such as x86 and (some specific platforms of) powerpc,
> allow such mappings.
>
> ioremap()ing pages is an efficient way to sidestep ARM's intricate cache
> maintenance. This feature will be used by ARM kexec support to ensure that
> copied data reaches RAM even without cache flushing, because we found that
> flush_cache_xxx() can't reliably flush code to memory.
>
> Signed-off-by: Wang Nan <[email protected]>
> Cc: <[email protected]> # 3.4+
> Cc: Eric Biederman <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Geng Hui <[email protected]>
> ---
> arch/arm/mm/ioremap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
> index f123d6e..98b1c10 100644
> --- a/arch/arm/mm/ioremap.c
> +++ b/arch/arm/mm/ioremap.c
> @@ -298,7 +298,7 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
> /*
> * Don't allow RAM to be mapped - this causes problems with ARMv6+
> */
> - if (WARN_ON(pfn_valid(pfn)))
> + if (WARN_ON(pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))))

Since reserved pages can still be mapped, how does this avoid the cacheable
alias issue fixed by 309caa9cc6ff ("ARM: Prohibit ioremap() on kernel
managed RAM")?

Will

2014-01-22 11:42:24

by Russell King - ARM Linux

Subject: Re: [PATCH 1/3] ARM: Permit ioremap() to map reserved pages

On Wed, Jan 22, 2014 at 07:25:14PM +0800, Wang Nan wrote:
> This patch relaxes the restriction set by commit 309caa9cc, which
> prohibits ioremap() on all kernel-managed pages.
>
> Other architectures, such as x86 and (some specific platforms of) powerpc,
> allow such mappings.
>
> ioremap()ing pages is an efficient way to sidestep ARM's intricate cache
> maintenance. This feature will be used by ARM kexec support to ensure that
> copied data reaches RAM even without cache flushing, because we found that
> flush_cache_xxx() can't reliably flush code to memory.

Yes, let's bypass the check and allow this in violation of the
architecture specification by allowing mapping the same memory with
different types, which leads to unpredictable behaviour. Yes, that's
a very good idea, because what we want to do is far more important than
following the requirements of the architecture.

So... NAK.

Yes, flush_cache_xxx() doesn't flush back to physical RAM; that's not
what it's defined to do - it flushes just enough of the cache to ensure
that page table updates are safe (such as when tearing down a page
mapping). So it's hardly surprising that this doesn't work.

If you want to be able to have DMA access to memory, then you need to
use an API which has been designed for that purpose, and if there isn't
one, then you need to discuss your requirements, rather than trying to
hack around the problem.

The issue here will be that the APIs we currently have for DMA become
extremely expensive when you want to deal with (eg) all system RAM.
Or, there's flush_cache_all() which should flush all levels of cache
in the system, and thus push all data back to RAM.
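
A minimal sketch, assuming the ARM cache hooks of this era, of what pushing a
buffer all the way out to RAM involves (the helper name is invented;
__cpuc_flush_dcache_area() and outer_flush_range() are the underlying hooks):

#include <linux/types.h>
#include <asm/cacheflush.h>
#include <asm/outercache.h>

/* Hypothetical helper: clean one buffer out to physical RAM. */
static void flush_buffer_to_ram(void *vaddr, phys_addr_t paddr, size_t len)
{
	/* Clean + invalidate the inner (CPU) data cache for the range. */
	__cpuc_flush_dcache_area(vaddr, len);

	/* Then flush the outer cache (e.g. the PL310 on OMAP4), which
	 * flush_icache_range() never touches. */
	outer_flush_range(paddr, paddr + len);
}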

Now, why are you copying your patches to the stable people? That makes
no sense - they haven't been reviewed and they haven't been integrated
into an existing kernel. So, they don't meet the basic requirements
for stable tree submission...

--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

2014-01-22 11:56:57

by Wang Nan

Subject: Re: [PATCH 1/3] ARM: Permit ioremap() to map reserved pages

On 2014/1/22 19:42, Russell King - ARM Linux wrote:
> On Wed, Jan 22, 2014 at 07:25:14PM +0800, Wang Nan wrote:
>> This patch relaxes the restriction set by commit 309caa9cc, which
>> prohibits ioremap() on all kernel-managed pages.
>>
>> Other architectures, such as x86 and (some specific platforms of) powerpc,
>> allow such mappings.
>>
>> ioremap()ing pages is an efficient way to sidestep ARM's intricate cache
>> maintenance. This feature will be used by ARM kexec support to ensure that
>> copied data reaches RAM even without cache flushing, because we found that
>> flush_cache_xxx() can't reliably flush code to memory.
>
> Yes, let's bypass the check and allow this in violation of the
> architecture specification by allowing mapping the same memory with
> different types, which leads to unpredictable behaviour. Yes, that's
> a very good idea, because what we want to do is far more important than
> following the requirements of the architecture.
>
> So... NAK.
>
> Yes, flush_cache_xxx() doesn't flush back to physical RAM; that's not
> what it's defined to do - it flushes just enough of the cache to ensure
> that page table updates are safe (such as when tearing down a page
> mapping). So it's hardly surprising that this doesn't work.
>
> If you want to be able to have DMA access to memory, then you need to
> use an API which has been designed for that purpose, and if there isn't
> one, then you need to discuss your requirements, rather than trying to
> hack around the problem.

So what is the correct API designed for this purpose?

>
> The issue here will be that the APIs we currently have for DMA become
> extremely expensive when you want to deal with (eg) all system RAM.
> Or, there's flush_cache_all() which should flush all levels of cache
> in the system, and thus push all data back to RAM.
>
> Now, why are you copying your patches to the stable people? That makes
> no sense - they haven't been reviewed and they haven't been integrated
> into an existing kernel. So, they don't meet the basic requirements
> for stable tree submission...
>

2014-01-22 13:05:05

by Wang Nan

Subject: Re: [PATCH 2/3] ARM: kexec: copying code to ioremapped area

On 2014/1/22 20:56, Vaibhav Bedia wrote:
> On Wed, Jan 22, 2014 at 6:25 AM, Wang Nan <[email protected]> wrote:
>> ARM's kdump is actually broken (at least for omap4460), mainly because of a
>> cache problem: flush_icache_range() can't reliably ensure that the copied data
>> correctly reaches RAM. After the MMU is turned off and we jump to the
>> trampoline, kexec always fails due to random undefined instructions.
>>
>> This patch uses ioremap to make sure the destination of every memcpy() is
>> uncacheable memory, including the copies of the target kernel and the
>> trampoline.
>
> AFAIK ioremap() on RAM is forbidden on ARM, and the device memory that
> ioremap() ends up creating is not meant for executable code.
>
> Doesn't this trigger the WARN_ON() in __arm_ioremap_pfn_caller()?

This patch depends on the previous one:

ARM: Permit ioremap() to map reserved pages

However, Russell is opposed to it.

2014-01-22 13:28:09

by Russell King - ARM Linux

Subject: Re: [PATCH 2/3] ARM: kexec: copying code to ioremapped area

On Wed, Jan 22, 2014 at 07:25:15PM +0800, Wang Nan wrote:
> ARM's kdump is actually broken (at least for omap4460), mainly because of a
> cache problem: flush_icache_range() can't reliably ensure that the copied data
> correctly reaches RAM.

Quite right too. Your mistake here is thinking that flush_icache_range()
should push it to RAM. That's incorrect.

flush_icache_range() is there to deal with such things as loadable modules
and self modifying code, where the MMU is not being turned off. Hence, it
only flushes to the point of coherency between the I and D caches, and
any further levels of cache between that point and memory are not touched.
Why should it touch any more levels - it's not the function's purpose.

> After the MMU is turned off and we jump to the trampoline, kexec always
> fails due to random undefined instructions.

We already have code in the kernel which deals with shutting the MMU off.
An instance of how this can be done is illustrated in the soft_restart()
code path, and kexec already uses this.

One of the first things soft_restart() does is turn off the outer cache -
which OMAP4 does have, but this can only be done if there is a single CPU
running. If there's multiple CPUs running, then the outer cache can't be
disabled, and that's the most likely cause of the problem you're seeing.
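
For reference, the soft_restart() path mentioned above looks roughly like this
in kernels of that era (abridged from arch/arm/kernel/process.c; details vary
by version):

static void __soft_restart(void *addr)
{
	phys_reset_t phys_reset;

	setup_mm_for_reboot();	/* switch to a flat 1:1 mapping */
	flush_cache_all();	/* clean + invalidate inner caches */
	cpu_proc_fin();		/* turn off caching */
	flush_cache_all();	/* push out any further dirty data */

	/* Jump to the reset code via its physical address. */
	phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
	phys_reset((unsigned long)addr);
	BUG();			/* should never get here */
}

void soft_restart(unsigned long addr)
{
	u64 *stack = soft_restart_stack + ARRAY_SIZE(soft_restart_stack);

	raw_local_irq_disable();
	local_fiq_disable();

	/* Disable the outer (L2) cache only as the last man standing. */
	if (num_online_cpus() == 1)
		outer_disable();

	call_with_stack(__soft_restart, (void *)addr, (void *)stack);
	BUG();			/* should never get here */
}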

--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

2014-01-23 02:20:47

by Wang Nan

Subject: Re: [PATCH 2/3] ARM: kexec: copying code to ioremapped area

On 2014/1/22 21:27, Russell King - ARM Linux wrote:
> On Wed, Jan 22, 2014 at 07:25:15PM +0800, Wang Nan wrote:
>> ARM's kdump is actually broken (at least for omap4460), mainly because of a
>> cache problem: flush_icache_range() can't reliably ensure that the copied data
>> correctly reaches RAM.
>
> Quite right too. Your mistake here is thinking that flush_icache_range()
> should push it to RAM. That's incorrect.
>
> flush_icache_range() is there to deal with such things as loadable modules
> and self modifying code, where the MMU is not being turned off. Hence, it
> only flushes to the point of coherency between the I and D caches, and
> any further levels of cache between that point and memory are not touched.
> Why should it touch any more levels - it's not the function's purpose.
>
>> After the MMU is turned off and we jump to the trampoline, kexec always
>> fails due to random undefined instructions.
>
> We already have code in the kernel which deals with shutting the MMU off.
> An instance of how this can be done is illustrated in the soft_restart()
> code path, and kexec already uses this.
>
> One of the first things soft_restart() does is turn off the outer cache -
> which OMAP4 does have, but this can only be done if there is a single CPU
> running. If there's multiple CPUs running, then the outer cache can't be
> disabled, and that's the most likely cause of the problem you're seeing.
>

You are right: commit b25f3e1c (OMAP4/highbank: Flush L2 cache before disabling)
solves my problem; it flushes the outer cache before disabling it. I have tested
it in both UP and SMP situations and it works (actually, omap4 is not yet ready
to support kexec in the SMP case; I inserted an empty cpu_kill() to make it
work), so the first 2 patches are unneeded.
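
The substance of that fix, sketched here from its one-line description rather
than its literal diff (the helper name is invented): flush the outer cache
before turning it off, so no dirty lines are stranded when the MMU goes down.

#include <asm/cacheflush.h>
#include <asm/outercache.h>

/* Hypothetical helper showing the required ordering. */
static void l2_shutdown_for_kexec(void)
{
	flush_cache_all();	/* clean + invalidate the inner caches */
	outer_flush_all();	/* push dirty outer (PL310) lines to RAM */
	outer_disable();	/* only now is it safe to turn the L2 off */
}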

What about the 3rd one (ARM: allow kernel to be loaded in middle of phymem)?

2014-01-23 19:15:14

by Nicolas Pitre

Subject: Re: [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem

On Wed, 22 Jan 2014, Wang Nan wrote:

> This patch allows the kernel to be loaded in the middle of kernel-aware
> physical memory. Before this patch, users had to use mem= or the device tree
> to deceive the kernel about the start address of physical memory.
>
> This feature is useful in some special cases, for example building a crash
> dump kernel. Without it, the kernel command line, atags and device tree must
> be adjusted very carefully, which is sometimes impossible.

With CONFIG_ARM_PATCH_PHYS_VIRT the value of PHYS_OFFSET is determined
dynamically by rounding the kernel image start address down to the
previous 16MB boundary. In the case of a crash kernel, it might be
cleaner to simply readjust __pv_phys_offset during early boot and call
fixup_pv_table(), and then reserve away the memory from the previous
kernel. That will let you access that memory directly (with gdb, for
example), and no pointer address translation will be required.
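
A minimal illustration of the rounding described above (a standalone helper
invented for illustration, not a kernel function):

#include <linux/sizes.h>	/* SZ_16M */

/* With CONFIG_ARM_PATCH_PHYS_VIRT, early boot effectively derives
 * PHYS_OFFSET by rounding the image load address down to 16MB. */
static unsigned long phys_offset_from_load_addr(unsigned long load_addr)
{
	return load_addr & ~(SZ_16M - 1UL);
}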


> Signed-off-by: Wang Nan <[email protected]>
> Cc: <[email protected]> # 3.4+
> Cc: Eric Biederman <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Geng Hui <[email protected]>
> ---
> arch/arm/mm/init.c | 21 ++++++++++++++++++++-
> arch/arm/mm/mmu.c | 13 +++++++++++++
> mm/page_alloc.c | 7 +++++--
> 3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 3e8f106..4952726 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -334,9 +334,28 @@ void __init arm_memblock_init(struct meminfo *mi,
> {
> int i;
>
> - for (i = 0; i < mi->nr_banks; i++)
> + for (i = 0; i < mi->nr_banks; i++) {
> memblock_add(mi->bank[i].start, mi->bank[i].size);
>
> + /*
> + * In some special cases, for example when building a crashdump
> + * kernel, we want the kernel to be loaded in the middle of
> + * physical memory. In such cases, the physical memory before
> + * PHYS_OFFSET is awkward: it can't be directly mapped (its
> + * address would be smaller than PAGE_OFFSET and would disturb
> + * the user address space) and it can't be mapped as HighMem
> + * either. We reserve such pages here. The only way to access
> + * those pages is ioremap.
> + */
> + if (mi->bank[i].start < PHYS_OFFSET) {
> + unsigned long reserv_size = PHYS_OFFSET -
> + mi->bank[i].start;
> + if (reserv_size > mi->bank[i].size)
> + reserv_size = mi->bank[i].size;
> + memblock_reserve(mi->bank[i].start, reserv_size);
> + }
> + }
> +
> /* Register the kernel text, kernel data and initrd with memblock. */
> #ifdef CONFIG_XIP_KERNEL
> memblock_reserve(__pa(_sdata), _end - _sdata);
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 580ef2d..2a17c24 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1308,6 +1308,19 @@ static void __init map_lowmem(void)
> if (start >= end)
> break;
>
> + /*
> + * If this memblock contains memory before PAGE_OFFSET, that
> + * memory shouldn't be directly mapped, see the code in
> + * create_mapping(). However, memory after PAGE_OFFSET is
> + * occupied by the kernel and still needs to be mapped.
> + */
> + if (__phys_to_virt(start) < PAGE_OFFSET) {
> + if (__phys_to_virt(end) > PAGE_OFFSET)
> + start = __virt_to_phys(PAGE_OFFSET);
> + else
> + break;
> + }
> +
> map.pfn = __phys_to_pfn(start);
> map.virtual = __phys_to_virt(start);
> map.length = end - start;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5248fe0..d2959e3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4840,10 +4840,13 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
> */
> if (pgdat == NODE_DATA(0)) {
> mem_map = NODE_DATA(0)->node_mem_map;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> + /*
> + * In case of CONFIG_HAVE_MEMBLOCK_NODE_MAP, or when the kernel
> + * is loaded in the middle of physical memory, mem_map should
> + * be adjusted.
> + */
> if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
> mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
> -#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> }
> #endif
> #endif /* CONFIG_FLAT_NODE_MEM_MAP */
> --
> 1.8.4

2014-01-23 19:32:15

by Russell King - ARM Linux

Subject: Re: [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem

On Thu, Jan 23, 2014 at 02:15:07PM -0500, Nicolas Pitre wrote:
> On Wed, 22 Jan 2014, Wang Nan wrote:
>
> > This patch allows the kernel to be loaded in the middle of kernel-aware
> > physical memory. Before this patch, users had to use mem= or the device tree
> > to deceive the kernel about the start address of physical memory.
> >
> > This feature is useful in some special cases, for example building a crash
> > dump kernel. Without it, the kernel command line, atags and device tree must
> > be adjusted very carefully, which is sometimes impossible.
>
> > With CONFIG_ARM_PATCH_PHYS_VIRT the value of PHYS_OFFSET is determined
> > dynamically by rounding the kernel image start address down to the
> > previous 16MB boundary. In the case of a crash kernel, it might be
> > cleaner to simply readjust __pv_phys_offset during early boot and call
> > fixup_pv_table(), and then reserve away the memory from the previous
> > kernel. That will let you access that memory directly (with gdb, for
> > example), and no pointer address translation will be required.

We already have support in the kernel to ignore memory below the calculated
PHYS_OFFSET. See 571b14375019c3a66ef70d4d4a7083f4238aca30.

--
FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up. Estimation
in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
Estimate before purchase was "up to 13.2Mbit".

2014-01-23 20:01:13

by Nicolas Pitre

Subject: Re: [PATCH 3/3] ARM: allow kernel to be loaded in middle of phymem

On Thu, 23 Jan 2014, Russell King - ARM Linux wrote:

> On Thu, Jan 23, 2014 at 02:15:07PM -0500, Nicolas Pitre wrote:
> > On Wed, 22 Jan 2014, Wang Nan wrote:
> >
> > > This patch allows the kernel to be loaded in the middle of kernel-aware
> > > physical memory. Before this patch, users had to use mem= or the device tree
> > > to deceive the kernel about the start address of physical memory.
> > >
> > > This feature is useful in some special cases, for example building a crash
> > > dump kernel. Without it, the kernel command line, atags and device tree must
> > > be adjusted very carefully, which is sometimes impossible.
> >
> > With CONFIG_ARM_PATCH_PHYS_VIRT the value of PHYS_OFFSET is determined
> > dynamically by rounding the kernel image start address down to the
> > previous 16MB boundary. In the case of a crash kernel, it might be
> > cleaner to simply readjust __pv_phys_offset during early boot and call
> > fixup_pv_table(), and then reserve away the memory from the previous
> > kernel. That will let you access that memory directly (with gdb, for
> > example), and no pointer address translation will be required.
>
> We already have support in the kernel to ignore memory below the calculated
> PHYS_OFFSET. See 571b14375019c3a66ef70d4d4a7083f4238aca30.

Sure. Anyway what I'm suggesting above would require that the crash
kernel be linked at a different virtual address for that to work.
That's probably more trouble than simply mapping the otherwise still
unmapped memory from the crashed kernel.


Nicolas