This is version 6 of the IRQ stack series on arm64.
A major change between v5 and v6 is how the IRQ stack is allocated. Up
to v5, the space was allocated via generic VM APIs, such as
__get_free_pages() or kmalloc(). In this version, PERCPU is responsible
for the allocation instead, since it helps to 1) handle the stack
pointer with a minimum number of memory accesses and 2) unify memory
allocation regardless of page size. (ARM64 now supports three page
sizes: 4KB, 16KB, and 64KB.)
Unfortunately, the generic percpu code has to be touched a little to
support PERCPU stack allocation. That is, 'atom_size' should be
adjusted on 4KB page systems because the stack pointer logic works on
the assumption that the IRQ stack is aligned to its own size. Although
this is not mandated by the ARMv8 architecture, the restriction
facilitates the IRQ re-entrance check and the call trace linkage
between the process stack and the IRQ one.
It would be redundant to introduce an ARM64-specific
setup_per_cpu_areas() for a single parameter change, 'atom_size'. This
is why this series updates the generic percpu layer instead. On the
other hand, it is doubtable to define a new definition for the sake of
a single architecture, so it is arguable which approach is better. (I
tried to get feedback via linux-mm, but no comments were left.)
For Patch1 and Patch2, a v6 tag, not a v1 one, is used in order to
align with the history of this IRQ work. Please let me know if this
violates patch submission rules.
Any comments are greatly welcome.
Thanks in advance!
Best Regards
Jungseok Lee
Changes since v5:
- Introduced a new definition for 'atom_size' configuration
- Used PERCPU for stack allocation, per Catalin
Changes since v4:
- Supported 64KB page system
- Introduced IRQ_STACK_* macro, per Catalin
- Rebased on top of for-next/core
Changes since v3:
- Expanded stack trace to support IRQ stack
- Added more comments
Changes since v2:
- Optimised the current_thread_info function by removing the masking
  operation and the volatile keyword, per James and Catalin
- Reworked the IRQ re-entrance check logic using top-bit comparison of
  stacks, per James
- Added sp_el0 update in cpu_resume, per James
- Selected HAVE_IRQ_EXIT_ON_IRQ_STACK to expose this feature explicitly
- Added a Tested-by tag from James
- Added comments on sp_el0 as a helper message
Changes since v1:
- Rebased on top of v4.3-rc1
- Removed Kconfig about IRQ stack, per James
- Used PERCPU for IRQ stack, per James
- Tried to allocate IRQ stack when CPU is about to start up, per James
- Moved sp_el0 update into kernel_entry macro, per James
- Dropped S_SP removal patch, per Mark and James
Jungseok Lee (3):
percpu: remove PERCPU_ENOUGH_ROOM which is stale definition
percpu: add PERCPU_ATOM_SIZE for a generic percpu area setup
arm64: Introduce IRQ stack
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/irq.h | 6 +++
arch/arm64/include/asm/percpu.h | 6 +++
arch/arm64/include/asm/thread_info.h | 10 +++-
arch/arm64/kernel/entry.S | 42 ++++++++++++++--
arch/arm64/kernel/head.S | 5 ++
arch/arm64/kernel/irq.c | 2 +
arch/arm64/kernel/sleep.S | 3 ++
arch/arm64/kernel/smp.c | 4 +-
include/linux/percpu.h | 6 +--
mm/percpu.c | 6 +--
11 files changed, 75 insertions(+), 16 deletions(-)
--
2.5.0
As pure cleanup, this patch removes PERCPU_ENOUGH_ROOM which is not
used any more. That is, no code refers to the definition.
Signed-off-by: Jungseok Lee <[email protected]>
---
include/linux/percpu.h | 6 ------
1 file changed, 6 deletions(-)
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index caebf2a..4bc6daf 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -18,12 +18,6 @@
#define PERCPU_MODULE_RESERVE 0
#endif
-#ifndef PERCPU_ENOUGH_ROOM
-#define PERCPU_ENOUGH_ROOM \
- (ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
- PERCPU_MODULE_RESERVE)
-#endif
-
/* minimum unit size, also is the maximum supported allocation size */
#define PCPU_MIN_UNIT_SIZE PFN_ALIGN(32 << 10)
--
2.5.0
There is no room to adjust 'atom_size' now when a generic percpu area
is used. It would be redundant to write down an architecture-specific
setup_per_cpu_areas() in order to only change the 'atom_size'. Thus,
this patch adds a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE
by default. The value could be updated if needed by architecture.
Signed-off-by: Jungseok Lee <[email protected]>
---
include/linux/percpu.h | 4 ++++
mm/percpu.c | 6 +++---
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 4bc6daf..57a2f16 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -18,6 +18,10 @@
#define PERCPU_MODULE_RESERVE 0
#endif
+#ifndef PERCPU_ATOM_SIZE
+#define PERCPU_ATOM_SIZE PAGE_SIZE
+#endif
+
/* minimum unit size, also is the maximum supported allocation size */
#define PCPU_MIN_UNIT_SIZE PFN_ALIGN(32 << 10)
diff --git a/mm/percpu.c b/mm/percpu.c
index a63b4d8..cd1e0ec 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2201,8 +2201,8 @@ void __init setup_per_cpu_areas(void)
* what the legacy allocator did.
*/
rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
- PERCPU_DYNAMIC_RESERVE, PAGE_SIZE, NULL,
- pcpu_dfl_fc_alloc, pcpu_dfl_fc_free);
+ PERCPU_DYNAMIC_RESERVE, PERCPU_ATOM_SIZE,
+ NULL, pcpu_dfl_fc_alloc, pcpu_dfl_fc_free);
if (rc < 0)
panic("Failed to initialize percpu areas.");
@@ -2231,7 +2231,7 @@ void __init setup_per_cpu_areas(void)
ai = pcpu_alloc_alloc_info(1, 1);
fc = memblock_virt_alloc_from_nopanic(unit_size,
- PAGE_SIZE,
+ PERCPU_ATOM_SIZE,
__pa(MAX_DMA_ADDRESS));
if (!ai || !fc)
panic("Failed to allocate memory for percpu areas.");
--
2.5.0
Currently, kernel context and interrupts are handled using a single
kernel stack navigated by sp_el1. This forces a system to use a 16KB
stack, not an 8KB one. This restriction can make low-memory platforms
suffer from memory pressure accompanied by performance degradation.
This patch addresses the issue by introducing a separate percpu IRQ
stack to handle both hard and soft interrupts, with two ground rules:
- Utilize sp_el0 in EL1 context, which is not used currently
- Do not complicate current_thread_info calculation
The core concept is to retrieve struct thread_info directly from
sp_el0. This approach helps to prevent the text section size from
growing significantly, as it removes the masking operation using
THREAD_SIZE in tons of places.
[Thanks to James Morse for his valuable feedback, which greatly helped
in figuring out a better implementation. - Jungseok]
Cc: AKASHI Takahiro <[email protected]>
Tested-by: James Morse <[email protected]>
Signed-off-by: Jungseok Lee <[email protected]>
---
Note that this change has been tested with 4 different combos:
- THREAD_SIZE = 16KB, IRQ_STACK_SIZE = 16KB
- THREAD_SIZE = 16KB, IRQ_STACK_SIZE = 8KB
- THREAD_SIZE = 8KB, IRQ_STACK_SIZE = 16KB
- THREAD_SIZE = 8KB, IRQ_STACK_SIZE = 8KB
I've reviewed the do_softirq_own_stack() approach which Catalin
mentioned, but it is questionable whether it would reduce the maximum
stack depth much. That is why this hunk does not change its direction.
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/irq.h | 6 +++
arch/arm64/include/asm/percpu.h | 6 +++
arch/arm64/include/asm/thread_info.h | 10 +++-
arch/arm64/kernel/entry.S | 42 ++++++++++++++--
arch/arm64/kernel/head.S | 5 ++
arch/arm64/kernel/irq.c | 2 +
arch/arm64/kernel/sleep.S | 3 ++
arch/arm64/kernel/smp.c | 4 +-
9 files changed, 70 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4d8a5b2..de4e4c9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -69,6 +69,7 @@ config ARM64
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_GENERIC_DMA_COHERENT
select HAVE_HW_BREAKPOINT if PERF_EVENTS
+ select HAVE_IRQ_EXIT_ON_IRQ_STACK
select HAVE_MEMBLOCK
select HAVE_PATA_PLATFORM
select HAVE_PERF_EVENTS
diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index 0916929..d4c23bd 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -1,6 +1,11 @@
#ifndef __ASM_IRQ_H
#define __ASM_IRQ_H
+#define IRQ_STACK_SIZE 16384
+#define IRQ_STACK_START_SP (IRQ_STACK_SIZE - 16)
+
+#ifndef __ASSEMBLY__
+
#include <linux/irqchip/arm-gic-acpi.h>
#include <asm-generic/irq.h>
@@ -21,3 +26,4 @@ static inline void acpi_irq_init(void)
#define acpi_irq_init acpi_irq_init
#endif
+#endif
diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
index 0a456be..c581ed4 100644
--- a/arch/arm64/include/asm/percpu.h
+++ b/arch/arm64/include/asm/percpu.h
@@ -16,6 +16,12 @@
#ifndef __ASM_PERCPU_H
#define __ASM_PERCPU_H
+#ifdef CONFIG_ARM64_4K_PAGES
+#include <asm/irq.h>
+
+#define PERCPU_ATOM_SIZE IRQ_STACK_SIZE
+#endif
+
static inline void set_my_cpu_offset(unsigned long off)
{
asm volatile("msr tpidr_el1, %0" :: "r" (off) : "memory");
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 90c7ff2..abd64bd 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -73,10 +73,16 @@ register unsigned long current_stack_pointer asm ("sp");
*/
static inline struct thread_info *current_thread_info(void) __attribute_const__;
+/*
+ * struct thread_info can be accessed directly via sp_el0.
+ */
static inline struct thread_info *current_thread_info(void)
{
- return (struct thread_info *)
- (current_stack_pointer & ~(THREAD_SIZE - 1));
+ unsigned long sp_el0;
+
+ asm ("mrs %0, sp_el0" : "=r" (sp_el0));
+
+ return (struct thread_info *)sp_el0;
}
#define thread_saved_pc(tsk) \
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 7ed3d75..d24acfc 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -27,6 +27,7 @@
#include <asm/cpufeature.h>
#include <asm/errno.h>
#include <asm/esr.h>
+#include <asm/irq.h>
#include <asm/thread_info.h>
#include <asm/unistd.h>
@@ -88,7 +89,8 @@
.if \el == 0
mrs x21, sp_el0
- get_thread_info tsk // Ensure MDSCR_EL1.SS is clear,
+ mov tsk, sp
+ and tsk, tsk, #~(THREAD_SIZE - 1) // Ensure MDSCR_EL1.SS is clear,
ldr x19, [tsk, #TI_FLAGS] // since we can unmask debug
disable_step_tsk x19, x20 // exceptions when scheduling.
.else
@@ -108,6 +110,13 @@
.endif
/*
+ * Set sp_el0 to current thread_info.
+ */
+ .if \el == 0
+ msr sp_el0, tsk
+ .endif
+
+ /*
* Registers that may be useful after this macro is invoked:
*
* x21 - aborted SP
@@ -164,8 +173,28 @@ alternative_endif
.endm
.macro get_thread_info, rd
- mov \rd, sp
- and \rd, \rd, #~(THREAD_SIZE - 1) // top of stack
+ mrs \rd, sp_el0
+ .endm
+
+ .macro irq_stack_entry
+ adr_l x19, irq_stacks
+ mrs x20, tpidr_el1
+ add x20, x19, x20
+ mov x23, sp
+ and x23, x23, #~(IRQ_STACK_SIZE - 1)
+ cmp x20, x23 // check irq re-entrance
+ mov x19, sp
+ mov x23, #IRQ_STACK_START_SP
+ add x23, x20, x23 // x23 = top of irq stack
+ csel x23, x19, x23, eq
+ mov sp, x23
+ .endm
+
+ /*
+ * x19 is preserved between irq_stack_entry and irq_stack_exit.
+ */
+ .macro irq_stack_exit
+ mov sp, x19
.endm
/*
@@ -183,10 +212,11 @@ tsk .req x28 // current thread_info
* Interrupt handling.
*/
.macro irq_handler
- adrp x1, handle_arch_irq
- ldr x1, [x1, #:lo12:handle_arch_irq]
+ ldr_l x1, handle_arch_irq
mov x0, sp
+ irq_stack_entry
blr x1
+ irq_stack_exit
.endm
.text
@@ -599,6 +629,8 @@ ENTRY(cpu_switch_to)
ldp x29, x9, [x8], #16
ldr lr, [x8]
mov sp, x9
+ and x9, x9, #~(THREAD_SIZE - 1)
+ msr sp_el0, x9
ret
ENDPROC(cpu_switch_to)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 514c1cc..353376e 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -424,6 +424,9 @@ __mmap_switched:
b 1b
2:
adr_l sp, initial_sp, x4
+ mov x4, sp
+ and x4, x4, #~(THREAD_SIZE - 1)
+ msr sp_el0, x4 // Save thread_info
str_l x21, __fdt_pointer, x5 // Save FDT pointer
str_l x24, memstart_addr, x6 // Save PHYS_OFFSET
mov x29, #0
@@ -604,6 +607,8 @@ ENDPROC(secondary_startup)
ENTRY(__secondary_switched)
ldr x0, [x21] // get secondary_data.stack
mov sp, x0
+ and x0, x0, #~(THREAD_SIZE - 1)
+ msr sp_el0, x0 // save thread_info
mov x29, #0
b secondary_start_kernel
ENDPROC(__secondary_switched)
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 9f17ec0..943d106 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -30,6 +30,8 @@
unsigned long irq_err_count;
+DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE);
+
int arch_show_interrupts(struct seq_file *p, int prec)
{
show_ipi_list(p, prec);
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index f586f7c..e33fe33 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -173,6 +173,9 @@ ENTRY(cpu_resume)
/* load physical address of identity map page table in x1 */
adrp x1, idmap_pg_dir
mov sp, x2
+ /* save thread_info */
+ and x2, x2, #~(THREAD_SIZE - 1)
+ msr sp_el0, x2
/*
* cpu_do_resume expects x0 to contain context physical address
* pointer and x1 to contain physical address of 1:1 page tables
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 2bbdc0e..4908923 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -91,8 +91,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
int ret;
/*
- * We need to tell the secondary core where to find its stack and the
- * page tables.
+ * We need to tell the secondary core where to find its process stack
+ * and the page tables.
*/
secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
__flush_dcache_area(&secondary_data, sizeof(secondary_data));
--
2.5.0
On Sun, 1 Nov 2015, Jungseok Lee wrote:
> As pure cleanup, this patch removes PERCPU_ENOUGH_ROOM which is not
> used any more. That is, no code refers to the definition.
Acked-by: Christoph Lameter <[email protected]>
On Sun, 1 Nov 2015, Jungseok Lee wrote:
> There is no room to adjust 'atom_size' now when a generic percpu area
> is used. It would be redundant to write down an architecture-specific
> setup_per_cpu_areas() in order to only change the 'atom_size'. Thus,
> this patch adds a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE
> by default. The value could be updated if needed by architecture.
What is atom_size? Why would you want a difference allocation size here?
The percpu area is virtually mapped regardless. So you will have
contiguous addresses even without atom_size.
On Mon, Nov 02, 2015 at 10:10:23AM -0600, Christoph Lameter wrote:
> On Sun, 1 Nov 2015, Jungseok Lee wrote:
>
> > There is no room to adjust 'atom_size' now when a generic percpu area
> > is used. It would be redundant to write down an architecture-specific
> > setup_per_cpu_areas() in order to only change the 'atom_size'. Thus,
> > this patch adds a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE
> > by default. The value could be updated if needed by architecture.
>
> What is atom_size? Why would you want a difference allocation size here?
> The percpu area is virtually mapped regardless. So you will have
> contiguous addresses even without atom_size.
I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
the approach (and the definition of PERCPU_ATOM_SIZE), therefore
rendering this patch unnecessary. IIUC, this is used to enforce some
alignment of the per-CPU IRQ stack to be able to check whether the
current stack is process or IRQ on exception entry. But there are other,
less intrusive ways to achieve the same (e.g. x86).
--
Catalin
On Mon, 2 Nov 2015, Catalin Marinas wrote:
> I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> rendering this patch unnecessary. IIUC, this is used to enforce some
> alignment of the per-CPU IRQ stack to be able to check whether the
> current stack is process or IRQ on exception entry. But there are other,
> less intrusive ways to achieve the same (e.g. x86).
The percpu allocator allows the specification of alignment requirements.
On Mon, Nov 02, 2015 at 10:48:17AM -0600, Christoph Lameter wrote:
> On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> > the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> > rendering this patch unnecessary. IIUC, this is used to enforce some
> > alignment of the per-CPU IRQ stack to be able to check whether the
> > current stack is process or IRQ on exception entry. But there are other,
> > less intrusive ways to achieve the same (e.g. x86).
>
> The percpu allocator allows the specification of alignment requirements.
Patch 3/3 does something like this:
DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE)
where IRQ_STACK_SIZE > PAGE_SIZE. AFAICT, setup_per_cpu_areas() doesn't
guarantee alignment greater than PAGE_SIZE.
--
Catalin
On Mon, 2 Nov 2015, Catalin Marinas wrote:
> On Mon, Nov 02, 2015 at 10:48:17AM -0600, Christoph Lameter wrote:
> > On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > > I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> > > the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> > > rendering this patch unnecessary. IIUC, this is used to enforce some
> > > alignment of the per-CPU IRQ stack to be able to check whether the
> > > current stack is process or IRQ on exception entry. But there are other,
> > > less intrusive ways to achieve the same (e.g. x86).
> >
> > The percpu allocator allows the specification of alignment requirements.
>
> Patch 3/3 does something like this:
>
> DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE)
>
> where IRQ_STACK_SIZE > PAGE_SIZE. AFAICT, setup_per_cpu_areas() doesn't
> guarantee alignment greater than PAGE_SIZE.
And we cannot use percpu_alloc() instead? Aligning the whole of the percpu
area because one allocation requires it?
On Mon, Nov 02, 2015 at 12:11:33PM -0600, Christoph Lameter wrote:
> On Mon, 2 Nov 2015, Catalin Marinas wrote:
>
> > On Mon, Nov 02, 2015 at 10:48:17AM -0600, Christoph Lameter wrote:
> > > On Mon, 2 Nov 2015, Catalin Marinas wrote:
> > > > I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> > > > the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> > > > rendering this patch unnecessary. IIUC, this is used to enforce some
> > > > alignment of the per-CPU IRQ stack to be able to check whether the
> > > > current stack is process or IRQ on exception entry. But there are other,
> > > > less intrusive ways to achieve the same (e.g. x86).
> > >
> > > The percpu allocator allows the specification of alignment requirements.
> >
> > Patch 3/3 does something like this:
> >
> > DEFINE_PER_CPU(char [IRQ_STACK_SIZE], irq_stacks) __aligned(IRQ_STACK_SIZE)
> >
> > where IRQ_STACK_SIZE > PAGE_SIZE. AFAICT, setup_per_cpu_areas() doesn't
> > guarantee alignment greater than PAGE_SIZE.
>
> And we cannot use percpu_alloc() instead? Aligning the whole of the percpu
> area because one allocation requires it?
I haven't tried but it seems that pcpu_alloc() has a WARN() when align >
PAGE_SIZE and it would fail.
As I said in a previous reply, I don't think this patch is necessary,
mainly because I don't particularly like the logic for detecting the IRQ
stack re-entrance based on the stack pointer alignment.
--
Catalin
On Sun, Nov 01, 2015 at 07:46:15AM +0000, Jungseok Lee wrote:
> As pure cleanup, this patch removes PERCPU_ENOUGH_ROOM which is not
> used any more. That is, no code refers to the definition.
>
> Signed-off-by: Jungseok Lee <[email protected]>
Applied to percpu/for-4.4.
Thanks.
--
tejun
On Nov 3, 2015, at 1:22 AM, Catalin Marinas wrote:
Hi Catalin,
> On Mon, Nov 02, 2015 at 10:10:23AM -0600, Christoph Lameter wrote:
>> On Sun, 1 Nov 2015, Jungseok Lee wrote:
>>
>>> There is no room to adjust 'atom_size' now when a generic percpu area
>>> is used. It would be redundant to write down an architecture-specific
>>> setup_per_cpu_areas() in order to only change the 'atom_size'. Thus,
>>> this patch adds a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE
>>> by default. The value could be updated if needed by architecture.
>>
>> What is atom_size? Why would you want a difference allocation size here?
>> The percpu area is virtually mapped regardless. So you will have
>> contiguous addresses even without atom_size.
>
> I haven't looked at the patch 3/3 in detail but I'm pretty sure I'll NAK
> the approach (and the definition of PERCPU_ATOM_SIZE), therefore
> rendering this patch unnecessary. IIUC, this is used to enforce some
> alignment of the per-CPU IRQ stack to be able to check whether the
> current stack is process or IRQ on exception entry. But there are other,
> less intrusive ways to achieve the same (e.g. x86).
First of all, thanks for clarification!
That is why I chose the word, 'doubtable', in the cover letter. I will
give up this approach. I've been paranoid for over a week about the
"another pointer read" which you mentioned [1]. This wrong idea was my
conclusion drawn from your feedback; I think I've failed to follow you
here.
Most ideas came from the x86 implementation when I started this work.
v2 [2] might be the closest to the x86 approach. At that time, a
count-based method was used for the IRQ re-entrance check. But the
count was considered a redundant variable since we already have
preempt_count. As a result, the top-bit comparison idea, which is the
origin of this IRQ_STACK_SIZE alignment requirement, took over the
re-entrance check. Like x86, if we pick up the count method again, we
could achieve the goal without this unnecessary alignment. What is
your opinion?
I copy and paste the x86 code (arch/x86/entry/entry_64.S) below for
convenience. It has a comment on why the redundancy is allowed.
----8<----
.macro interrupt func
cld
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
testb $3, CS(%rsp)
jz 1f
/*
* IRQ from user mode. Switch to kernel gsbase and inform context
* tracking that we're in kernel mode.
*/
SWAPGS
#ifdef CONFIG_CONTEXT_TRACKING
call enter_from_user_mode
#endif
1:
/*
* Save previous stack pointer, optionally switch to interrupt stack.
* irq_count is used to check if a CPU is already on an interrupt stack
* or not. While this is essentially redundant with preempt_count it is
* a little cheaper to use a separate counter in the PDA (short of
* moving irq_enter into assembly, which would be too much work)
*/
movq %rsp, %rdi
incl PER_CPU_VAR(irq_count)
cmovzq PER_CPU_VAR(irq_stack_ptr), %rsp
pushq %rdi
/* We entered an interrupt context - irqs are off: */
TRACE_IRQS_OFF
call \func /* rdi points to pt_regs */
.endm
/*
* The interrupt stubs push (~vector+0x80) onto the stack and
* then jump to common_interrupt.
*/
.p2align CONFIG_X86_L1_CACHE_SHIFT
common_interrupt:
ASM_CLAC
addq $-0x80, (%rsp) /* Adjust vector to [-256, -1] range */
interrupt do_IRQ
----8<----
Additionally, I've been thinking about do_softirq_own_stack(), which
is another of your comments [3]. Recently, I've realized there is a
possibility that I misunderstood your intention. Did you mean that the
irq_handler hook is not enough, and that do_softirq_own_stack() should
be implemented as well? If so, this is another failure of mine.. It
perfectly makes sense.
I hope these are the last two pieces of this interesting feature.
Thanks again!
Best Regards
Jungseok Lee
[1] https://lkml.org/lkml/2015/10/19/596
[2] http://article.gmane.org/gmane.linux.kernel/2037257
[3] http://article.gmane.org/gmane.linux.kernel/2041877
On Nov 3, 2015, at 1:10 AM, Christoph Lameter wrote:
Dear Christoph,
> On Sun, 1 Nov 2015, Jungseok Lee wrote:
>
>> There is no room to adjust 'atom_size' now when a generic percpu area
>> is used. It would be redundant to write down an architecture-specific
>> setup_per_cpu_areas() in order to only change the 'atom_size'. Thus,
>> this patch adds a new definition, PERCPU_ATOM_SIZE, which is PAGE_SIZE
>> by default. The value could be updated if needed by architecture.
>
> What is atom_size? Why would you want a difference allocation size here?
> The percpu area is virtually mapped regardless. So you will have
> contiguous addresses even without atom_size.
I think Catalin has already written a perfect explanation. I'd like
memory with an alignment greater than PAGE_SIZE, but __per_cpu_offset[]
is only PAGE_SIZE aligned under the generic setup_per_cpu_areas().
That is, secondary cores cannot get that kind of space.
Thanks for taking a look at this doubtable change!
Best Regards
Jungseok Lee
On Nov 3, 2015, at 4:10 AM, Tejun Heo wrote:
Dear Tejun,
> On Sun, Nov 01, 2015 at 07:46:15AM +0000, Jungseok Lee wrote:
>> As pure cleanup, this patch removes PERCPU_ENOUGH_ROOM which is not
>> used any more. That is, no code refers to the definition.
>>
>> Signed-off-by: Jungseok Lee <[email protected]>
>
> Applied to percpu/for-4.4.
Thanks for taking care of this!
Best Regards
Jungseok Lee
Hi Jungseok,
On 03/11/15 13:49, Jungseok Lee wrote:
> Additionally, I've been thinking of do_softirq_own_stack() which is your
> another comment [3]. Recently, I've realized there is possibility that
> I misunderstood your intention. Did you mean that irq_handler hook is not
> enough? Should do_softirq_own_stack() be implemented together?
I've been putting together a version to illustrate this, I aim to post it
before the end of this week...
Thanks,
James
Hello,
On Tue, Nov 03, 2015 at 11:12:51PM +0900, Jungseok Lee wrote:
> On Nov 3, 2015, at 4:10 AM, Tejun Heo wrote:
>
> Dear Tejun,
>
> > On Sun, Nov 01, 2015 at 07:46:15AM +0000, Jungseok Lee wrote:
> >> As pure cleanup, this patch removes PERCPU_ENOUGH_ROOM which is not
> >> used any more. That is, no code refers to the definition.
> >>
> >> Signed-off-by: Jungseok Lee <[email protected]>
> >
> > Applied to percpu/for-4.4.
>
> Thanks for taking care of this!
Can you please refresh the patch so that it also drops
PERCPU_ENOUGH_ROOM definition from ia64?
Thanks.
--
tejun
On Nov 4, 2015, at 7:07 AM, Tejun Heo wrote:
> Hello,
Hello,
> On Tue, Nov 03, 2015 at 11:12:51PM +0900, Jungseok Lee wrote:
>> On Nov 3, 2015, at 4:10 AM, Tejun Heo wrote:
>>
>> Dear Tejun,
>>
>>> On Sun, Nov 01, 2015 at 07:46:15AM +0000, Jungseok Lee wrote:
>>>> As pure cleanup, this patch removes PERCPU_ENOUGH_ROOM which is not
>>>> used any more. That is, no code refers to the definition.
>>>>
>>>> Signed-off-by: Jungseok Lee <[email protected]>
>>>
>>> Applied to percpu/for-4.4.
>>
>> Thanks for taking care of this!
>
> Can you please refresh the patch so that it also drops
> PERCPU_ENOUGH_ROOM definition from ia64?
Sure! I will do re-spin soon.
Best Regards
Jungseok Lee
On Nov 4, 2015, at 2:58 AM, James Morse wrote:
> Hi Jungseok,
Hi James,
> On 03/11/15 13:49, Jungseok Lee wrote:
>> Additionally, I've been thinking of do_softirq_own_stack() which is your
>> another comment [3]. Recently, I've realized there is possibility that
>> I misunderstood your intention. Did you mean that irq_handler hook is not
>> enough? Should do_softirq_own_stack() be implemented together?
>
> I've been putting together a version to illustrate this, I aim to post it
> before the end of this week...
It sounds great! I will wait for your version.
Best Regards
Jungseok Lee