Hi,
While reviewing Ard's VMAP_STACK series [1], I tried to put together some notes
based on my prior aborted attempts, and tricked myself into turning them into
this series. I suspect we'll want bits of both.
Like Ard's series, this doesn't use EL1t mode, and instead performs a check
early in el1_sync. However, there are a few differences:
* This series frees up SP_EL0, and inverts the current<->percpu relationship
rather than using a GPR for current.
* The out-of-bounds detection *only* considers the SP. Stray accesses below the
SP will be handled as regular faults, unless handling these triggers a stack
overflow.
* There is a dedicated handler for the stack out-of-bounds case (as with x86),
rather than piggy-backing on the usual fault handling code.
* The overflow checks consider IRQ stacks, by keeping track of which stack a
task is currently using. This assumes all stacks are the same size (which
happens to be true today), but we should make that explicit by using common
definitions. Thanks to James Morse for pointing out that we need to handle
this.
Currently the IRQ stacks don't have guaranteed guard pages, as they're
regular compile-time percpu reservations. We'll want to rework those so that
they have guards.
I haven't audited the backtracing code, but I suspect we'll need to fix up any
stack walking code so that it understands there are now three possible
stacks that a task may be using, and so that we can walk emergency->irq->task
stack traces.
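As an illustration only, the helpers such a walker would need might look
something like the below (names and exact bounds are guesses, not part of this
series):

static bool on_task_stack(struct task_struct *tsk, unsigned long sp)
{
	unsigned long low = (unsigned long)task_stack_page(tsk);

	return sp >= low && sp < low + THREAD_SIZE;
}

static bool on_irq_stack(unsigned long sp, int cpu)
{
	unsigned long low = (unsigned long)per_cpu(irq_stack, cpu);

	return sp >= low && sp < low + IRQ_STACK_SIZE;
}

static bool on_emergency_stack(unsigned long sp, int cpu)
{
	unsigned long low = (unsigned long)per_cpu(bad_stack, cpu);

	/* bad_stack is currently IRQ_STACK_SIZE longs; see the last patch */
	return sp >= low && sp < low + IRQ_STACK_SIZE;
}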
Otherwise, this series is rough around the edges, and has seen only the most
trivial of testing on a Juno platform (booting 4K and 64K kernels with and
without a deliberate overflow).
I've pushed the series out to my git repo as arm64/vmap-stack [2].
Thanks,
Mark.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/518368.html
[2] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/vmap-stack
Mark Rutland (6):
arm64: use tpidr_el1 for current, free sp_el0
arm64: avoid open-coding THREAD_SIZE{,_ORDER}
arm64: pad stacks to PAGE_SIZE for VMAP_STACK
arm64: pass stack base to secondary_start_kernel
arm64: keep track of current stack
arm64: add VMAP_STACK and detect out-of-bounds SP
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/assembler.h | 11 +++++--
arch/arm64/include/asm/current.h | 6 ++--
arch/arm64/include/asm/percpu.h | 15 +++------
arch/arm64/include/asm/thread_info.h | 22 ++++++++++---
arch/arm64/kernel/asm-offsets.c | 4 +++
arch/arm64/kernel/entry.S | 61 ++++++++++++++++++++++++++++++------
arch/arm64/kernel/head.S | 13 ++++++--
arch/arm64/kernel/process.c | 20 +++++-------
arch/arm64/kernel/smp.c | 2 +-
arch/arm64/kernel/traps.c | 21 +++++++++++++
11 files changed, 130 insertions(+), 46 deletions(-)
--
1.9.1
In subsequent patches, we'll want the base of the secondary stack in
secondary_start_kernel.
Pass the stack base down, as we do in the primary path, and add the
offset in __secondary_switched. Unfortunately, we can't encode
THREAD_START_SP in an add immediate, so use a mov immediate, which has
a greater range.
This is far from a hot path, so the overhead shouldn't matter.
Signed-off-by: Mark Rutland <[email protected]>
---
arch/arm64/kernel/head.S | 3 ++-
arch/arm64/kernel/smp.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index a58ecda..db77cac 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -613,7 +613,8 @@ __secondary_switched:
adr_l x0, secondary_data
ldr x1, [x0, #CPU_BOOT_STACK] // get secondary_data.stack
- mov sp, x1
+ mov x3, #THREAD_START_SP
+ add sp, x1, x3
ldr x2, [x0, #CPU_BOOT_TASK]
msr tpidr_el1, x2
mov x29, #0
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 6e0e16a..269c957 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -154,7 +154,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
* page tables.
*/
secondary_data.task = idle;
- secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
+ secondary_data.stack = task_stack_page(idle);
update_cpu_boot_status(CPU_MMU_OFF);
__flush_dcache_area(&secondary_data, sizeof(secondary_data));
--
1.9.1
Today we use TPIDR_EL1 for our percpu offset, and SP_EL0 for current
(and current::thread_info, which is at offset 0).
Using SP_EL0 in this way prevents us from using EL1 thread mode, where
SP_EL0 is not addressable (since it's used as the active SP). It also
means we can't use SP_EL0 for other purposes (e.g. as a
scratch-register).
This patch frees up SP_EL0 for such usage, by storing the percpu offset
in current::thread_info, and using TPIDR_EL1 to store current. As we no
longer need to update SP_EL0 at EL0 exception boundaries, this allows us
to delete some code.
This new organisation means that we need to perform an additional load
to acquire the percpu offset. However, our assembly constraints allow
current to be cached, and therefore allow the offset to be cached.
Additionally, in most cases where we need the percpu offset, we also
need to fiddle with the preempt count or other data stored in
current::thread_info, so this data should already be hot in the caches.
Signed-off-by: Mark Rutland <[email protected]>
---
arch/arm64/include/asm/assembler.h | 11 ++++++++---
arch/arm64/include/asm/current.h | 6 +++---
arch/arm64/include/asm/percpu.h | 15 ++++-----------
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry.S | 11 ++---------
arch/arm64/kernel/head.S | 4 ++--
arch/arm64/kernel/process.c | 16 ++++------------
8 files changed, 25 insertions(+), 40 deletions(-)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 1b67c37..f7da6b5 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -229,6 +229,11 @@
#endif
.endm
+ .macro get_this_cpu_offset dst
+ mrs \dst, tpidr_el1
+ ldr \dst, [\dst, #TSK_TI_PCP]
+ .endm
+
/*
* @dst: Result of per_cpu(sym, smp_processor_id())
* @sym: The name of the per-cpu variable
@@ -236,7 +241,7 @@
*/
.macro adr_this_cpu, dst, sym, tmp
adr_l \dst, \sym
- mrs \tmp, tpidr_el1
+ get_this_cpu_offset \tmp
add \dst, \dst, \tmp
.endm
@@ -247,7 +252,7 @@
*/
.macro ldr_this_cpu dst, sym, tmp
adr_l \dst, \sym
- mrs \tmp, tpidr_el1
+ get_this_cpu_offset \tmp
ldr \dst, [\dst, \tmp]
.endm
@@ -438,7 +443,7 @@
* Return the current thread_info.
*/
.macro get_thread_info, rd
- mrs \rd, sp_el0
+ mrs \rd, tpidr_el1
.endm
/*
diff --git a/arch/arm64/include/asm/current.h b/arch/arm64/include/asm/current.h
index f6580d4..54b271a 100644
--- a/arch/arm64/include/asm/current.h
+++ b/arch/arm64/include/asm/current.h
@@ -13,11 +13,11 @@
*/
static __always_inline struct task_struct *get_current(void)
{
- unsigned long sp_el0;
+ unsigned long cur;
- asm ("mrs %0, sp_el0" : "=r" (sp_el0));
+ asm ("mrs %0, tpidr_el1" : "=r" (cur));
- return (struct task_struct *)sp_el0;
+ return (struct task_struct *)cur;
}
#define current get_current()
diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
index 3bd498e..05cf0f8 100644
--- a/arch/arm64/include/asm/percpu.h
+++ b/arch/arm64/include/asm/percpu.h
@@ -18,23 +18,16 @@
#include <asm/stack_pointer.h>
+#include <linux/thread_info.h>
+
static inline void set_my_cpu_offset(unsigned long off)
{
- asm volatile("msr tpidr_el1, %0" :: "r" (off) : "memory");
+ current_thread_info()->pcp_offset = off;
}
static inline unsigned long __my_cpu_offset(void)
{
- unsigned long off;
-
- /*
- * We want to allow caching the value, so avoid using volatile and
- * instead use a fake stack read to hazard against barrier().
- */
- asm("mrs %0, tpidr_el1" : "=r" (off) :
- "Q" (*(const unsigned long *)current_stack_pointer));
-
- return off;
+ return current_thread_info()->pcp_offset;
}
#define __my_cpu_offset __my_cpu_offset()
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 46c3b93..141f13e9 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -50,6 +50,7 @@ struct thread_info {
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
u64 ttbr0; /* saved TTBR0_EL1 */
#endif
+ unsigned long pcp_offset;
int preempt_count; /* 0 => preemptable, <0 => bug */
};
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index b3bb7ef..17001be 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -38,6 +38,7 @@ int main(void)
BLANK();
DEFINE(TSK_TI_FLAGS, offsetof(struct task_struct, thread_info.flags));
DEFINE(TSK_TI_PREEMPT, offsetof(struct task_struct, thread_info.preempt_count));
+ DEFINE(TSK_TI_PCP, offsetof(struct task_struct, thread_info.pcp_offset));
DEFINE(TSK_TI_ADDR_LIMIT, offsetof(struct task_struct, thread_info.addr_limit));
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0));
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index b738880..773b3fea 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -92,7 +92,7 @@
.if \el == 0
mrs x21, sp_el0
- ldr_this_cpu tsk, __entry_task, x20 // Ensure MDSCR_EL1.SS is clear,
+ get_thread_info tsk // Ensure MDSCR_EL1.SS is clear,
ldr x19, [tsk, #TSK_TI_FLAGS] // since we can unmask debug
disable_step_tsk x19, x20 // exceptions when scheduling.
@@ -147,13 +147,6 @@ alternative_else_nop_endif
.endif
/*
- * Set sp_el0 to current thread_info.
- */
- .if \el == 0
- msr sp_el0, tsk
- .endif
-
- /*
* Registers that may be useful after this macro is invoked:
*
* x21 - aborted SP
@@ -734,7 +727,7 @@ ENTRY(cpu_switch_to)
ldp x29, x9, [x8], #16
ldr lr, [x8]
mov sp, x9
- msr sp_el0, x1
+ msr tpidr_el1, x1
ret
ENDPROC(cpu_switch_to)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 973df7d..a58ecda 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -324,7 +324,7 @@ __primary_switched:
adrp x4, init_thread_union
add sp, x4, #THREAD_SIZE
adr_l x5, init_task
- msr sp_el0, x5 // Save thread_info
+ msr tpidr_el1, x5 // Save thread_info
adr_l x8, vectors // load VBAR_EL1 with virtual
msr vbar_el1, x8 // vector table address
@@ -615,7 +615,7 @@ __secondary_switched:
ldr x1, [x0, #CPU_BOOT_STACK] // get secondary_data.stack
mov sp, x1
ldr x2, [x0, #CPU_BOOT_TASK]
- msr sp_el0, x2
+ msr tpidr_el1, x2
mov x29, #0
b secondary_start_kernel
ENDPROC(__secondary_switched)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index ae2a835..4212da3 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -323,18 +323,10 @@ void uao_thread_switch(struct task_struct *next)
}
}
-/*
- * We store our current task in sp_el0, which is clobbered by userspace. Keep a
- * shadow copy so that we can restore this upon entry from userspace.
- *
- * This is *only* for exception entry from EL0, and is not valid until we
- * __switch_to() a user task.
- */
-DEFINE_PER_CPU(struct task_struct *, __entry_task);
-
-static void entry_task_switch(struct task_struct *next)
+/* Ensure the new task has this CPU's offset */
+void pcp_thread_switch(struct task_struct *next)
{
- __this_cpu_write(__entry_task, next);
+ next->thread_info.pcp_offset = current_thread_info()->pcp_offset;
}
/*
@@ -349,8 +341,8 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
tls_thread_switch(next);
hw_breakpoint_thread_switch(next);
contextidr_thread_switch(next);
- entry_task_switch(next);
uao_thread_switch(next);
+ pcp_thread_switch(next);
/*
* Complete any pending TLB or cache maintenance on this CPU in case
--
1.9.1
Currently we define THREAD_SIZE_ORDER dependent on which arm64-specific
page size kconfig symbol was selected. This is unfortunate, as it hides
the relationship between THREAD_SIZE_ORDER and THREAD_SIZE, and makes it
more painful than necessary to modify the thread size, as we will
need to do for some debug configurations.
This patch follows arch/metag's approach of consistently defining
THREAD_SIZE in terms of THREAD_SIZE_ORDER. This avoids having ifdefs for
particular page size configurations, and allows us to change a single
definition to change the thread size.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/thread_info.h | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 141f13e9..6d0c59a 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -23,13 +23,17 @@
#include <linux/compiler.h>
-#ifdef CONFIG_ARM64_4K_PAGES
-#define THREAD_SIZE_ORDER 2
-#elif defined(CONFIG_ARM64_16K_PAGES)
+#include <asm/page.h>
+
+#define THREAD_SHIFT 14
+
+#if THREAD_SHIFT >= PAGE_SHIFT
+#define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT)
+#else
#define THREAD_SIZE_ORDER 0
#endif
-#define THREAD_SIZE 16384
+#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
#define THREAD_START_SP (THREAD_SIZE - 16)
#ifndef __ASSEMBLY__
--
1.9.1
Signed-off-by: Mark Rutland <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/entry.S | 43 +++++++++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/traps.c | 21 +++++++++++++++++++++
3 files changed, 65 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b2024db..5cbd961 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1,5 +1,6 @@
config ARM64
def_bool y
+ select HAVE_ARCH_VMAP_STACK
select ACPI_CCA_REQUIRED if ACPI
select ACPI_GENERIC_GSI if ACPI
select ACPI_GTDT if ACPI
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 7c8b164..e0fdb65 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -396,11 +396,54 @@ el1_error_invalid:
inv_entry 1, BAD_ERROR
ENDPROC(el1_error_invalid)
+#ifdef CONFIG_VMAP_STACK
+.macro detect_bad_stack
+ msr sp_el0, x0
+ get_thread_info x0
+ ldr x0, [x0, #TSK_TI_CUR_STK]
+ sub x0, sp, x0
+ and x0, x0, #~(THREAD_SIZE - 1)
+ cbnz x0, __bad_stack
+ mrs x0, sp_el0
+.endm
+
+__bad_stack:
+ /*
+ * Stash the bad SP, and free up another GPR. We no longer care about
+ * EL0 state, since this thread cannot recover.
+ */
+ mov x0, sp
+ msr tpidrro_el0, x0
+ msr tpidr_el0, x1
+
+ /* Move to the emergency stack */
+ adr_this_cpu x0, bad_stack, x1
+ mov x1, #THREAD_START_SP
+ add sp, x0, x1
+
+ /* Restore GPRs and log them to pt_regs */
+ mrs x0, sp_el0
+ mrs x1, tpidr_el0
+ kernel_entry 1
+
+ /* restore the bad SP to pt_regs */
+ mrs x1, tpidrro_el0
+ str x1, [sp, #S_SP]
+
+ /* Time to die */
+ mov x0, sp
+ b handle_bad_stack
+#else
+.macro detect_bad_stack
+.endm
+#endif
+
/*
* EL1 mode handlers.
*/
.align 6
el1_sync:
+ detect_bad_stack
kernel_entry 1
mrs x1, esr_el1 // read the syndrome register
lsr x24, x1, #ESR_ELx_EC_SHIFT // exception class
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 0805b44..84b00e3 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -683,6 +683,27 @@ asmlinkage void bad_el0_sync(struct pt_regs *regs, int reason, unsigned int esr)
force_sig_info(info.si_signo, &info, current);
}
+#ifdef CONFIG_VMAP_STACK
+DEFINE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], bad_stack) __aligned(16);
+
+asmlinkage void handle_bad_stack(struct pt_regs *regs)
+{
+ unsigned long tsk_stk = (unsigned long)current->stack;
+ unsigned long irq_stk = (unsigned long)per_cpu(irq_stack, smp_processor_id());
+
+ console_verbose();
+ pr_emerg("Stack out-of-bounds!\n"
+ "\tsp: 0x%016lx\n"
+ "\ttsk stack: [0x%016lx..0x%016lx]\n"
+ "\tirq stack: [0x%016lx..0x%016lx]\n",
+ kernel_stack_pointer(regs),
+ tsk_stk, tsk_stk + THREAD_SIZE,
+ irq_stk, irq_stk + THREAD_SIZE);
+ show_regs(regs);
+ panic("stack out-of-bounds");
+}
+#endif
+
void __pte_error(const char *file, int line, unsigned long val)
{
pr_err("%s:%d: bad pte %016lx.\n", file, line, val);
--
1.9.1
To reliably check stack bounds, we'll need to know whether we're on a
task stack, or an IRQ stack.
Stash the base of the current stack in thread_info so that we have this
information.
Signed-off-by: Mark Rutland <[email protected]>
---
arch/arm64/include/asm/thread_info.h | 3 +++
arch/arm64/kernel/asm-offsets.c | 3 +++
arch/arm64/kernel/entry.S | 7 +++++++
arch/arm64/kernel/head.S | 6 ++++++
arch/arm64/kernel/process.c | 4 ++++
5 files changed, 23 insertions(+)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 3684f86..ae4f44b 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -62,6 +62,9 @@ struct thread_info {
#endif
unsigned long pcp_offset;
int preempt_count; /* 0 => preemptable, <0 => bug */
+#ifdef CONFIG_VMAP_STACK
+ unsigned long current_stack;
+#endif
};
#define INIT_THREAD_INFO(tsk) \
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 17001be..10c8ffa 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -40,6 +40,9 @@ int main(void)
DEFINE(TSK_TI_PREEMPT, offsetof(struct task_struct, thread_info.preempt_count));
DEFINE(TSK_TI_PCP, offsetof(struct task_struct, thread_info.pcp_offset));
DEFINE(TSK_TI_ADDR_LIMIT, offsetof(struct task_struct, thread_info.addr_limit));
+#ifdef CONFIG_VMAP_STACK
+ DEFINE(TSK_TI_CUR_STK, offsetof(struct task_struct, thread_info.current_stack));
+#endif
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0));
#endif
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 773b3fea..7c8b164 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -258,6 +258,9 @@ alternative_else_nop_endif
/* switch to the irq stack */
mov sp, x26
+#ifdef CONFIG_VMAP_STACK
+ str x25, [tsk, #TSK_TI_CUR_STK]
+#endif
/*
* Add a dummy stack frame, this non-standard format is fixed up
@@ -275,6 +278,10 @@ alternative_else_nop_endif
*/
.macro irq_stack_exit
mov sp, x19
+#ifdef CONFIG_VMAP_STACK
+ and x19, x19, #~(THREAD_SIZE - 1)
+ str x19, [tsk, #TSK_TI_CUR_STK]
+#endif
.endm
/*
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index db77cac..3363846 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -325,6 +325,9 @@ __primary_switched:
add sp, x4, #THREAD_SIZE
adr_l x5, init_task
msr tpidr_el1, x5 // Save thread_info
+#ifdef CONFIG_VMAP_STACK
+ str x4, [x5, #TSK_TI_CUR_STK]
+#endif
adr_l x8, vectors // load VBAR_EL1 with virtual
msr vbar_el1, x8 // vector table address
@@ -616,6 +619,9 @@ __secondary_switched:
mov x3, #THREAD_START_SP
add sp, x1, x3
ldr x2, [x0, #CPU_BOOT_TASK]
+#ifdef CONFIG_VMAP_STACK
+ str x1, [x2, #TSK_TI_CUR_STK]
+#endif
msr tpidr_el1, x2
mov x29, #0
b secondary_start_kernel
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4212da3..5dc5797 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -294,6 +294,10 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
ptrace_hw_copy_thread(p);
+#ifdef CONFIG_VMAP_STACK
+ p->thread_info.current_stack = (unsigned long)p->stack;
+#endif
+
return 0;
}
--
1.9.1
Our THREAD_SIZE may be smaller than PAGE_SIZE. With VMAP_STACK, we can't
allow stacks to share a page with anything else, so we may as well pad
them up to PAGE_SIZE, and have 64K stacks when we have 64K pages.
Signed-off-by: Mark Rutland <[email protected]>
---
arch/arm64/include/asm/thread_info.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 6d0c59a..3684f86 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -25,7 +25,13 @@
#include <asm/page.h>
-#define THREAD_SHIFT 14
+#define __THREAD_SHIFT 14
+
+#if defined(CONFIG_VMAP_STACK) && (__THREAD_SHIFT < PAGE_SHIFT)
+#define THREAD_SHIFT PAGE_SHIFT
+#else
+#define THREAD_SHIFT __THREAD_SHIFT
+#endif
#if THREAD_SHIFT >= PAGE_SHIFT
#define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT)
--
1.9.1
Hi Mark,
On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
> Signed-off-by: Mark Rutland <[email protected]>
> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/kernel/entry.S | 43 +++++++++++++++++++++++++++++++++++++++++++
> arch/arm64/kernel/traps.c | 21 +++++++++++++++++++++
> 3 files changed, 65 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index b2024db..5cbd961 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1,5 +1,6 @@
> config ARM64
> def_bool y
> + select HAVE_ARCH_VMAP_STACK
> select ACPI_CCA_REQUIRED if ACPI
> select ACPI_GENERIC_GSI if ACPI
> select ACPI_GTDT if ACPI
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 7c8b164..e0fdb65 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -396,11 +396,54 @@ el1_error_invalid:
> inv_entry 1, BAD_ERROR
> ENDPROC(el1_error_invalid)
>
> +#ifdef CONFIG_VMAP_STACK
> +.macro detect_bad_stack
> + msr sp_el0, x0
> + get_thread_info x0
> + ldr x0, [x0, #TSK_TI_CUR_STK]
> + sub x0, sp, x0
> + and x0, x0, #~(THREAD_SIZE - 1)
> + cbnz x0, __bad_stack
> + mrs x0, sp_el0
The typical prologue looks like
stp x29, x30, [sp, #-xxx]!
stp x27, x28, [sp, #xxx]
...
mov x29, sp
which means that in most cases where we do run off the stack, sp will
still be pointing into it when the exception is taken. This means we
will fault recursively in the handler before having had the chance to
accurately record the exception context.
Given that the max displacement of a store instruction is 512 bytes,
and that the frame size we are about to stash exceeds that, should we
already consider it a stack fault if sp is within 512 bytes (or
S_FRAME_SIZE) of the base of the stack?
> +.endm
> +
> +__bad_stack:
> + /*
> + * Stash the bad SP, and free up another GPR. We no longer care about
> + * EL0 state, since this thread cannot recover.
> + */
> + mov x0, sp
> + msr tpidrro_el0, x0
> + msr tpidr_el0, x1
> +
> + /* Move to the emergency stack */
> + adr_this_cpu x0, bad_stack, x1
> + mov x1, #THREAD_START_SP
> + add sp, x0, x1
> +
> + /* Restore GPRs and log them to pt_regs */
> + mrs x0, sp_el0
> + mrs x1, tpidr_el0
> + kernel_entry 1
> +
> + /* restore the bad SP to pt_regs */
> + mrs x1, tpidrro_el0
> + str x1, [sp, #S_SP]
> +
> + /* Time to die */
> + mov x0, sp
> + b handle_bad_stack
> +#else
> +.macro detect_bad_stack
> +.endm
> +#endif
> +
> /*
> * EL1 mode handlers.
> */
> .align 6
> el1_sync:
> + detect_bad_stack
> kernel_entry 1
> mrs x1, esr_el1 // read the syndrome register
> lsr x24, x1, #ESR_ELx_EC_SHIFT // exception class
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 0805b44..84b00e3 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -683,6 +683,27 @@ asmlinkage void bad_el0_sync(struct pt_regs *regs, int reason, unsigned int esr)
> force_sig_info(info.si_signo, &info, current);
> }
>
> +#ifdef CONFIG_VMAP_STACK
> +DEFINE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], bad_stack) __aligned(16);
> +
Surely, we don't need a 16 KB or 64 KB stack here?
> +asmlinkage void handle_bad_stack(struct pt_regs *regs)
> +{
> + unsigned long tsk_stk = (unsigned long)current->stack;
> + unsigned long irq_stk = (unsigned long)per_cpu(irq_stack, smp_processor_id());
> +
> + console_verbose();
> + pr_emerg("Stack out-of-bounds!\n"
> + "\tsp: 0x%016lx\n"
> + "\ttsk stack: [0x%016lx..0x%016lx]\n"
> + "\tirq stack: [0x%016lx..0x%016lx]\n",
> + kernel_stack_pointer(regs),
> + tsk_stk, tsk_stk + THREAD_SIZE,
> + irq_stk, irq_stk + THREAD_SIZE);
> + show_regs(regs);
> + panic("stack out-of-bounds");
> +}
> +#endif
> +
> void __pte_error(const char *file, int line, unsigned long val)
> {
> pr_err("%s:%d: bad pte %016lx.\n", file, line, val);
> --
> 1.9.1
>
Hi Mark,
On 12/07/17 23:32, Mark Rutland wrote:
> Currently we define THREAD_SIZE_ORDER dependent on which arm64-specific
> page size kconfig symbol was selected. This is unfortunate, as it hides
> the relationship between THREAD_SIZE_ORDER and THREAD_SIZE, and makes it
> painful more painful than necessary to modify the thread size as we will
> need to do for some debug configurations.
>
> This patch follows arch/metag's approach of consistently defining
> THREAD_SIZE in terms of THREAD_SIZE_ORDER. This avoids having ifdefs for
> particular page size configurations, and allows us to change a single
> definition to change the thread size.
I think this has unintended side effects for 64K page systems. (or at least not
yet intended)
Today:
> #ifdef CONFIG_ARM64_4K_PAGES
> #define THREAD_SIZE_ORDER 2
> #elif defined(CONFIG_ARM64_16K_PAGES)
> #define THREAD_SIZE_ORDER 0
> #endif
Means THREAD_SIZE_ORDER is unset on 64K, and THREAD_SIZE is always:
> #define THREAD_SIZE 16384
/kernel/fork.c matches this with its:
> # if THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK)
[...]
> #else
[...]
> void thread_stack_cache_init(void)
> {
> thread_stack_cache = kmem_cache_create("thread_stack", THREAD_SIZE,
> THREAD_SIZE, 0, NULL);
> BUG_ON(thread_stack_cache == NULL);
> }
> #endif
To create a kmemcache to share 64K pages as 16K stacks.
After this patch:
> #define THREAD_SHIFT 14
>
> #if THREAD_SHIFT >= PAGE_SHIFT
> #define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT)
> #else
> #define THREAD_SIZE_ORDER 0
> #endif
Means THREAD_SIZE_ORDER is 0, and:
> #define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
gives us a 64K THREAD_SIZE.
Thanks,
James
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 141f13e9..6d0c59a 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -23,13 +23,17 @@
>
> #include <linux/compiler.h>
>
> -#ifdef CONFIG_ARM64_4K_PAGES
> -#define THREAD_SIZE_ORDER 2
> -#elif defined(CONFIG_ARM64_16K_PAGES)
> +#include <asm/page.h>
> +
> +#define THREAD_SHIFT 14
> +
> +#if THREAD_SHIFT >= PAGE_SHIFT
> +#define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT)
> +#else
> #define THREAD_SIZE_ORDER 0
> #endif
>
> -#define THREAD_SIZE 16384
> +#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
> #define THREAD_START_SP (THREAD_SIZE - 16)
>
> #ifndef __ASSEMBLY__
>
On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
> Hi Mark,
Hi,
> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
> > +#ifdef CONFIG_VMAP_STACK
> > +.macro detect_bad_stack
> > + msr sp_el0, x0
> > + get_thread_info x0
> > + ldr x0, [x0, #TSK_TI_CUR_STK]
> > + sub x0, sp, x0
> > + and x0, x0, #~(THREAD_SIZE - 1)
> > + cbnz x0, __bad_stack
> > + mrs x0, sp_el0
>
> The typical prologue looks like
>
> stp x29, x30, [sp, #-xxx]!
> stp x27, x28, [sp, #xxx]
> ...
> mov x29, sp
>
> which means that in most cases where we do run off the stack, sp will
> still be pointing into it when the exception is taken. This means we
> will fault recursively in the handler before having had the chance to
> accurately record the exception context.
True; I had mostly been thinking about kernel_entry, where we do an
explicit subtraction from the SP before any stores.
> Given that the max displacement of a store instruction is 512 bytes,
> and that the frame size we are about to stash exceeds that, should we
> already consider it a stack fault if sp is within 512 bytes (or
> S_FRAME_SIZE) of the base of the stack?
Good point.
I've flip-flopped on this point while writing this reply.
My original line of thinking was that it was best to rely on the
recursive fault to push the SP out-of-bounds. That keeps the overflow
detection simple/fast, and hopefully robust to unexpected exceptions,
(expected?) probes to the guard page, etc.
I also agree that it's annoying to lose the information associated with the
initial fault.
My fear is that we can't catch those cases robustly and efficiently. At
minimum, I believe we'd need to check:
* FAR_EL1 is out-of-bounds for the stack. You have a suitable check for
this.
* FAR_EL1 is valid (looking at the ESR_ELx.{EC,ISS}, etc). I'm not sure
exactly what we need to check here, and I'm not sure what we want to
do about reserved ESR_ELx encodings.
* The base register for the access was the SP (e.g. so this isn't a
probe_kernel_read() or similar).
... so my current feeling is that relying on the recursive fault is the
best bet, even if we lose some information from the initial fault.
Along with that, we should ensure that we get a reliable backtrace, so
that we have the PC from the initial fault, and can acquire the relevant
regs from a dump of the stack and/or the regs at the point of the
recursive fault.
FWIW, currently this series gives you something like:
[ 0.263544] Stack out-of-bounds!
[ 0.263544] sp: 0xffff000009fbfed0
[ 0.263544] tsk stack: [0xffff000009fc0000..0xffff000009fd0000]
[ 0.263544] irq stack: [0xffff80097fe100a0..0xffff80097fe200a0]
[ 0.304862] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.12.0-00006-g0c4fb26-dirty #73
[ 0.312830] Hardware name: ARM Juno development board (r1) (DT)
[ 0.318847] task: ffff800940d8a200 task.stack: ffff000009fc0000
[ 0.324872] PC is at el1_sync+0x20/0xc8
[ 0.328773] LR is at force_overflow+0xc/0x18
[ 0.333113] pc : [<ffff000008082460>] lr : [<ffff00000808c75c>] pstate: 600003c5
[ 0.340636] sp : ffff000009fbfed0
[ 0.344004] x29: ffff000009fc0000 x28: 0000000000000000
[ 0.349409] x27: 0000000000000000 x26: 0000000000000000
[ 0.354812] x25: 0000000000000000 x24: 0000000000000000
[ 0.360214] x23: 0000000000000000 x22: 0000000000000000
[ 0.365617] x21: 0000000000000000 x20: 0000000000000001
[ 0.371020] x19: 0000000000000001 x18: 0000000000000030
[ 0.376422] x17: 0000000000000000 x16: 0000000000000000
[ 0.381826] x15: 0000000000000008 x14: 000000000fb506bc
[ 0.387228] x13: 0000000000000000 x12: 0000000000000000
[ 0.392631] x11: 0000000000000000 x10: 0000000000000141
[ 0.398034] x9 : 0000000000000000 x8 : ffff80097fdf93e8
[ 0.403437] x7 : ffff80097fdf9410 x6 : 0000000000000001
[ 0.408839] x5 : ffff000008ebcb80 x4 : ffff000008eb65d8
[ 0.414242] x3 : 00000000000f4240 x2 : 0000000000000002
[ 0.419644] x1 : ffff800940d8a200 x0 : 0000000000000001
[ 0.425048] Kernel panic - not syncing: stack out-of-bounds
[ 0.430714] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.12.0-00006-g0c4fb26-dirty #73
[ 0.438679] Hardware name: ARM Juno development board (r1) (DT)
[ 0.444697] Call trace:
[ 0.447185] [<ffff000008086f68>] dump_backtrace+0x0/0x230
[ 0.452676] [<ffff00000808725c>] show_stack+0x14/0x20
[ 0.457815] [<ffff00000838760c>] dump_stack+0x9c/0xc0
[ 0.462953] [<ffff00000816d5a0>] panic+0x11c/0x294
[ 0.467825] [<ffff000008087a70>] __pte_error+0x0/0x28
[ 0.472961] [<ffff00000808c75c>] force_overflow+0xc/0x18
[ 0.478364] SMP: stopping secondary CPUs
[ 0.482356] ---[ end Kernel panic - not syncing: stack out-of-bounds
... that __pte_error() is because the last instruction in handle_bad_stack is a
tail-call to panic, and __pte_error happens to be next in the text.
I haven't yet dug into why the stacktrace ends abruptly. I think I need
to update stack walkers to understand the new stack, but I may also have
forgotten to do something with the frame record in the entry path.
[...]
> > +#ifdef CONFIG_VMAP_STACK
> > +DEFINE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], bad_stack) __aligned(16);
> > +
>
> Surely, we don't need a 16 KB or 64 KB stack here?
For most cases, we do not need such a big stack. We can probably drop
this down to something much smaller (1K, as with your series, sounds
sufficient).
The one case I was worried about was overflows on the emergency stack
itself. I believe that for dumping memory we might need to fix up
exceptions, and if that goes wrong we could go recursive.
I'd planned to update current_stack when jumping to the emergency stack,
and use the same (initial) bounds detection, requiring the emergency
stack to be the same size. In the case of an emergency stack overflow,
we'd go to a (stackless) wfi/wfe loop.
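(As a sketch, that stackless parking loop could be as simple as:

	/* park the CPU; we cannot take another exception safely */
0:	wfe
	b	0b

... assuming further interrupts are masked or irrelevant at that point.)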
However, I deleted bits of that code while trying to debug an unrelated
issue, and didn't restore it.
I guess it depends on whether we want to try to handle that case.
Thanks,
Mark.
On Thu, Jul 13, 2017 at 11:18:35AM +0100, James Morse wrote:
> Hi Mark,
>
> On 12/07/17 23:32, Mark Rutland wrote:
> > Currently we define THREAD_SIZE_ORDER dependent on which arm64-specific
> > page size kconfig symbol was selected. This is unfortunate, as it hides
> > the relationship between THREAD_SIZE_ORDER and THREAD_SIZE, and makes it
> > painful more painful than necessary to modify the thread size as we will
> > need to do for some debug configurations.
> >
> > This patch follows arch/metag's approach of consistently defining
> > THREAD_SIZE in terms of THREAD_SIZE_ORDER. This avoids having ifdefs for
> > particular page size configurations, and allows us to change a single
> > definition to change the thread size.
>
> I think this has unintended side effects for 64K page systems. (or at least not
> yet intended)
>
> Today:
> > #ifdef CONFIG_ARM64_4K_PAGES
> > #define THREAD_SIZE_ORDER 2
> > #elif defined(CONFIG_ARM64_16K_PAGES)
> > #define THREAD_SIZE_ORDER 0
> > #endif
>
> Means THREAD_SIZE_ORDER is unset on 64K, and THREAD_SIZE is always:
> > #define THREAD_SIZE 16384
>
> /kernel/fork.c matches this with its:
> > # if THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK)
> [...]
> > #else
> [...]
> > void thread_stack_cache_init(void)
> > {
> > thread_stack_cache = kmem_cache_create("thread_stack", THREAD_SIZE,
> > THREAD_SIZE, 0, NULL);
> > BUG_ON(thread_stack_cache == NULL);
> > }
> > #endif
>
> To create a kmemcache to share 64K pages as 16K stacks.
>
>
> After this patch:
> > #define THREAD_SHIFT 14
> >
> > #if THREAD_SHIFT >= PAGE_SHIFT
> > #define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT)
> > #else
> > #define THREAD_SIZE_ORDER 0
> > #endif
>
> Means THREAD_SIZE_ORDER is 0, and:
> > #define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
>
> gives us a 64K THREAD_SIZE.
Yes; I'd gotten confused as to what I was doing here. Thanks for
spotting that.
I've folded this and the next patch, with the resultant logic being as
below, which I think fixes this.
Thanks,
Mark.
---->8----
#define MIN_THREAD_SHIFT 14
/*
* Each VMAP stack is a separate VMALLOC allocation, which is at least
* PAGE_SIZE.
*/
#if defined(CONFIG_VMAP_STACK) && (MIN_THREAD_SHIFT < PAGE_SHIFT)
#define THREAD_SHIFT PAGE_SHIFT
#else
#define THREAD_SHIFT MIN_THREAD_SHIFT
#endif
#if THREAD_SHIFT >= PAGE_SHIFT
#define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT)
#endif
#define THREAD_SIZE (1UL << THREAD_SHIFT)
#define THREAD_START_SP (THREAD_SIZE - 16)
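For reference, the values this yields (assuming the definitions above):

  4K pages:           THREAD_SHIFT 14, THREAD_SIZE 16K, THREAD_SIZE_ORDER 2
  16K pages:          THREAD_SHIFT 14, THREAD_SIZE 16K, THREAD_SIZE_ORDER 0
  64K pages, !VMAP:   THREAD_SHIFT 14, THREAD_SIZE 16K, THREAD_SIZE_ORDER undefined (as today)
  64K pages, VMAP:    THREAD_SHIFT 16, THREAD_SIZE 64K, THREAD_SIZE_ORDER 0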
On 13 July 2017 at 11:49, Mark Rutland <[email protected]> wrote:
> On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
>> Hi Mark,
>
> Hi,
>
>> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
>> > +#ifdef CONFIG_VMAP_STACK
>> > +.macro detect_bad_stack
>> > + msr sp_el0, x0
>> > + get_thread_info x0
>> > + ldr x0, [x0, #TSK_TI_CUR_STK]
>> > + sub x0, sp, x0
>> > + and x0, x0, #~(THREAD_SIZE - 1)
>> > + cbnz x0, __bad_stack
>> > + mrs x0, sp_el0
>>
>> The typical prologue looks like
>>
>> stp x29, x30, [sp, #-xxx]!
>> stp x27, x28, [sp, #xxx]
>> ...
>> mov x29, sp
>>
>> which means that in most cases where we do run off the stack, sp will
>> still be pointing into it when the exception is taken. This means we
>> will fault recursively in the handler before having had the chance to
>> accurately record the exception context.
>
> True; I had mostly been thinking about kernel_entry, where we do an
> explicit subtraction from the SP before any stores.
>
>> Given that the max displacement of a store instruction is 512 bytes,
>> and that the frame size we are about to stash exceeds that, should we
>> already consider it a stack fault if sp is within 512 bytes (or
>> S_FRAME_SIZE) of the base of the stack?
>
> Good point.
>
> I've flip-flopped on this point while writing this reply.
>
> My original line of thinking was that it was best to rely on the
> recursive fault to push the SP out-of-bounds. That keeps the overflow
> detection simple/fast, and hopefully robust to unexpected exceptions,
> (expected?) probes to the guard page, etc.
>
> I also agree that it's annoying to lose the information associated with the
> initial fault.
>
> My fear is that we can't catch those cases robustly and efficiently. At
> minimum, I believe we'd need to check:
>
> * FAR_EL1 is out-of-bounds for the stack. You have a suitable check for
> this.
>
> * FAR_EL1 is valid (looking at the ESR_ELx.{EC,ISS}, etc). I'm not sure
> exactly what we need to check here, and I'm not sure what we want to
> do about reserved ESR_ELx encodings.
>
> * The base register for the access was the SP (e.g. so this isn't a
> probe_kernel_read() or similar).
>
> ... so my current feeling is that relying on the recursive fault is the
> best bet, even if we lose some information from the initial fault.
>
There are two related issues at play here that we shouldn't conflate:
- checking whether we have sufficient stack space left to be able to
handle the exception in the first place,
- figuring out whether *this* exception was caused by a faulting
dereference of the stack pointer (which could be with writeback, or
even via some intermediate register: x29 is often used as a pseudo
stack pointer IIRC, although it should never point below sp itself)
Given that the very first stp in kernel_entry will fault if we have
less than S_FRAME_SIZE bytes of stack left, I think we should check
that we have at least that much space available. That way, the context
is preserved, and we could restart the outer exception if we wanted
to, or point our pt_regs pointer to it etc.
When and how we diagnose the condition as a kernel stack overflow is a
separate issue, and can well wait until we're in C code.
> Along with that, we should ensure that we get a reliable backtrace, so
> that we have the PC from the initial fault, and can acquire the relevant
> regs from a dump of the stack and/or the regs at the point of the
> recursive fault.
>
> FWIW, currently this series gives you something like:
>
> [ 0.263544] Stack out-of-bounds!
> [ 0.263544] sp: 0xffff000009fbfed0
> [ 0.263544] tsk stack: [0xffff000009fc0000..0xffff000009fd0000]
> [ 0.263544] irq stack: [0xffff80097fe100a0..0xffff80097fe200a0]
> [ 0.304862] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.12.0-00006-g0c4fb26-dirty #73
> [ 0.312830] Hardware name: ARM Juno development board (r1) (DT)
> [ 0.318847] task: ffff800940d8a200 task.stack: ffff000009fc0000
> [ 0.324872] PC is at el1_sync+0x20/0xc8
> [ 0.328773] LR is at force_overflow+0xc/0x18
> [ 0.333113] pc : [<ffff000008082460>] lr : [<ffff00000808c75c>] pstate: 600003c5
> [ 0.340636] sp : ffff000009fbfed0
> [ 0.344004] x29: ffff000009fc0000 x28: 0000000000000000
> [ 0.349409] x27: 0000000000000000 x26: 0000000000000000
> [ 0.354812] x25: 0000000000000000 x24: 0000000000000000
> [ 0.360214] x23: 0000000000000000 x22: 0000000000000000
> [ 0.365617] x21: 0000000000000000 x20: 0000000000000001
> [ 0.371020] x19: 0000000000000001 x18: 0000000000000030
> [ 0.376422] x17: 0000000000000000 x16: 0000000000000000
> [ 0.381826] x15: 0000000000000008 x14: 000000000fb506bc
> [ 0.387228] x13: 0000000000000000 x12: 0000000000000000
> [ 0.392631] x11: 0000000000000000 x10: 0000000000000141
> [ 0.398034] x9 : 0000000000000000 x8 : ffff80097fdf93e8
> [ 0.403437] x7 : ffff80097fdf9410 x6 : 0000000000000001
> [ 0.408839] x5 : ffff000008ebcb80 x4 : ffff000008eb65d8
> [ 0.414242] x3 : 00000000000f4240 x2 : 0000000000000002
> [ 0.419644] x1 : ffff800940d8a200 x0 : 0000000000000001
> [ 0.425048] Kernel panic - not syncing: stack out-of-bounds
> [ 0.430714] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.12.0-00006-g0c4fb26-dirty #73
> [ 0.438679] Hardware name: ARM Juno development board (r1) (DT)
> [ 0.444697] Call trace:
> [ 0.447185] [<ffff000008086f68>] dump_backtrace+0x0/0x230
> [ 0.452676] [<ffff00000808725c>] show_stack+0x14/0x20
> [ 0.457815] [<ffff00000838760c>] dump_stack+0x9c/0xc0
> [ 0.462953] [<ffff00000816d5a0>] panic+0x11c/0x294
> [ 0.467825] [<ffff000008087a70>] __pte_error+0x0/0x28
> [ 0.472961] [<ffff00000808c75c>] force_overflow+0xc/0x18
> [ 0.478364] SMP: stopping secondary CPUs
> [ 0.482356] ---[ end Kernel panic - not syncing: stack out-of-bounds
>
> ... that __pte_error() is because the last instruction in handle_bad_stack is a
> tail-call to panic, and __pte_error happens to be next in the text.
>
> I haven't yet dug into why the stacktrace ends abruptly. I think I need
> to update stack walkers to understand the new stack, but I may also have
> forgotten to do something with the frame record in the entry path.
>
> [...]
>
>> > +#ifdef CONFIG_VMAP_STACK
>> > +DEFINE_PER_CPU(unsigned long [IRQ_STACK_SIZE/sizeof(long)], bad_stack) __aligned(16);
>> > +
>>
>> Surely, we don't need a 16 KB or 64 KB stack here?
>
> For most cases, we do not need such a big stack. We can probably drop
> this down to something much smaller (1K, as with your series, sounds
> sufficient).
>
> The one case I was worried about was overflows on the emergency stack
> itself. I believe that for dumping memory we might need to fix up
> exceptions, and if that goes wrong we could go recursive.
>
> I'd planned to update current_stack when jumping to the emergency stack,
> and use the same (initial) bounds detection, requiring the emergency
> stack to be the same size. In the case of an emergency stack overflow,
> we'd go to a (stackless) wfi/wfe loop.
>
> However, I deleted bits of that code while trying to debug an unrelated
> issue, and didn't restore it.
>
> I guess it depends on whether we want to try to handle that case.
>
> Thanks,
> Mark.
On Thu, Jul 13, 2017 at 12:49:48PM +0100, Ard Biesheuvel wrote:
> On 13 July 2017 at 11:49, Mark Rutland <[email protected]> wrote:
> > On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
> >> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
> >> The typical prologue looks like
> >>
> >> stp x29, x30, [sp, #-xxx]!
> >> stp x27, x28, [sp, #xxx]
> >> ...
> >> mov x29, sp
> >>
> >> which means that in most cases where we do run off the stack, sp will
> >> still be pointing into it when the exception is taken. This means we
> >> will fault recursively in the handler before having had the chance to
> >> accurately record the exception context.
> >> Given that the max displacement of a store instruction is 512 bytes,
> >> and that the frame size we are about to stash exceeds that, should we
> >> already consider it a stack fault if sp is within 512 bytes (or
> >> S_FRAME_SIZE) of the base of the stack?
> > My original line of thinking was that it was best to rely on the
> > recursive fault to push the SP out-of-bounds. That keeps the overflow
> > detection simple/fast, and hopefully robust to unexpected exceptions,
> > (expected?) probes to the guard page, etc.
> >
> > I also agree that it's annoying to lose the information associated with the
> > initial fault.
> >
> > My fear is that we can't catch those cases robustly and efficiently. At
> > minimum, I believe we'd need to check:
> >
> > * FAR_EL1 is out-of-bounds for the stack. You have a suitable check for
> > this.
> >
> > * FAR_EL1 is valid (looking at the ESR_ELx.{EC,ISS}, etc). I'm not sure
> > exactly what we need to check here, and I'm not sure what we want to
> > do about reserved ESR_ELx encodings.
> >
> > * The base register for the access was the SP (e.g. so this isn't a
> > probe_kernel_read() or similar).
> >
> > ... so my current feeling is that relying on the recursive fault is the
> > best bet, even if we lose some information from the initial fault.
>
> There are two related issues at play here that we shouldn't conflate:
> - checking whether we have sufficient stack space left to be able to
> handle the exception in the first place,
> - figuring out whether *this* exception was caused by a faulting
> dereference of the stack pointer (which could be with writeback, or
> even via some intermediate register: x29 is often used as a pseudo
> stack pointer IIRC, although it should never point below sp itself)
Sure; I agree these are separate properties (my robustness and
efficiency concerns fall with the latter).
> Given that the very first stp in kernel_entry will fault if we have
> less than S_FRAME_SIZE bytes of stack left, I think we should check
> that we have at least that much space available.
I was going to reply saying that I didn't agree, but in writing up
examples, I mostly convinced myself that this is the right thing to do.
So I mostly agree!
This would mean we treat the first impossible-to-handle exception as
that fatal case, which is similar to x86's double-fault, triggered when
the HW can't stack the regs. All other cases are just arbitrary faults.
However, to provide that consistently, we'll need to perform this check
at every exception boundary, or some of those cases will result in a
recursive fault first.
So I think there are three choices:
1) In el1_sync, only check SP bounds, and live with the recursive
faults.
2) in el1_sync, check there's room for the regs, and live with the
recursive faults for overflow on other exceptions.
3) In all EL1 entry paths, check there's room for the regs.
> That way, the context is preserved, and we could restart the outer
> exception if we wanted to, or point our pt_regs pointer to it etc.
>
> When and how we diagnose the condition as a kernel stack overflow is a
> separate issue, and can well wait until we're in C code.
I believe that determining whether the exception was caused by a stack
overflow is not something we can do robustly or efficiently.
You mentioned the x29 pseudo-sp case, and there are other cases where
the SP value is proxied:
mov x0, sp
ldr x0, [x0, x1]
Or unrelated accesses that hit the guard page:
adrp x0, some_vmalloc_object
add x0, x0, #:lo12:some_vmalloc_object
mov x1, #bogus_offset
ldr x0, [x0, x1]
As above, I think it's helpful to think of this as something closer to a
double-fault handler (i.e. aiming to catch when we can't take the
exception safely), rather than something that's trying to catch logical
stack overflows.
Thanks,
Mark.
On Thu, Jul 13, 2017 at 05:10:50PM +0100, Mark Rutland wrote:
> On Thu, Jul 13, 2017 at 12:49:48PM +0100, Ard Biesheuvel wrote:
> > On 13 July 2017 at 11:49, Mark Rutland <[email protected]> wrote:
> > > On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
> > >> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
> > Given that the very first stp in kernel_entry will fault if we have
> > less than S_FRAME_SIZE bytes of stack left, I think we should check
> > that we have at least that much space available.
>
> I was going to reply saying that I didn't agree, but in writing up
> examples, I mostly convinced myself that this is the right thing to do.
> So I mostly agree!
>
> This would mean we treat the first impossible-to-handle exception as
> that fatal case, which is similar to x86's double-fault, triggered when
> the HW can't stack the regs. All other cases are just arbitrary faults.
>
> However, to provide that consistently, we'll need to perform this check
> at every exception boundary, or some of those cases will result in a
> recursive fault first.
>
> So I think there are three choices:
>
> 1) In el1_sync, only check SP bounds, and live with the recursive
> faults.
>
> 2) in el1_sync, check there's room for the regs, and live with the
> recursive faults for overflow on other exceptions.
>
> 3) In all EL1 entry paths, check there's room for the regs.
FWIW, for the moment I've applied (2), as you suggested, to my
arm64/vmap-stack branch, adding an additional:
sub x0, x0, #S_FRAME_SIZE
... to the entry path.
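For reference, a sketch of the resulting detect_bad_stack sequence (the exact
placement of the new instruction is an assumption):

	msr	sp_el0, x0
	get_thread_info x0
	ldr	x0, [x0, #TSK_TI_CUR_STK]
	sub	x0, sp, x0			// offset of SP from the stack base
	sub	x0, x0, #S_FRAME_SIZE		// ... demanding room for the regs
	and	x0, x0, #~(THREAD_SIZE - 1)	// non-zero iff out of bounds
	cbnz	x0, __bad_stack
	mrs	x0, sp_el0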
I think it's worth trying (3) so that we consistently report these
cases, benchmarks permitting.
It's probably worth putting the fast-path check directly into the
vectors, where we currently only use 1/32 of the instruction slots
available to us.
> As above, I think it's helpful to think of this as something closer to a
> double-fault handler (i.e. aiming to catch when we can't take the
> exception safely), rather than something that's trying to catch logical
> stack overflows.
Does this make sense to you?
I've tried to reword the log output, as below, to give this impression.
[ 49.288232] Insufficient stack space to handle exception!
[ 49.288245] CPU: 5 PID: 2208 Comm: bash Not tainted 4.12.0-00005-ga781af2 #81
[ 49.300680] Hardware name: ARM Juno development board (r1) (DT)
[ 49.306549] task: ffff800974955100 task.stack: ffff00000d6f0000
[ 49.312426] PC is at recursive_loop+0x10/0x50
[ 49.316747] LR is at recursive_loop+0x34/0x50
[ 49.321066] pc : [<ffff000008588aa0>] lr : [<ffff000008588ac4>] pstate: 40000145
[ 49.328398] sp : ffff00000d6eff30
[ 49.331682] x29: ffff00000d6f0350 x28: ffff800974955100
[ 49.336953] x27: ffff000008942000 x26: ffff000008f0d758
[ 49.342223] x25: ffff00000d6f3eb8 x24: ffff00000d6f3eb8
[ 49.347493] x23: ffff000008f0d490 x22: 0000000000000009
[ 49.352764] x21: ffff800974a57000 x20: ffff000008f0d4e0
[ 49.358034] x19: 0000000000000013 x18: 0000ffffe7e2e4f0
[ 49.363304] x17: 0000ffff9c1256a4 x16: ffff0000081f8b88
[ 49.368574] x15: 00002a81b8000000 x14: 00000000fffffff0
[ 49.373845] x13: ffff000008f6278a x12: ffff000008e62818
[ 49.379115] x11: 0000000000000000 x10: 000000000000019e
[ 49.384385] x9 : 0000000000000004 x8 : ffff00000d6f0770
[ 49.389656] x7 : 1313131313131313 x6 : 000000000000019e
[ 49.394925] x5 : 0000000000000000 x4 : 0000000000000000
[ 49.400205] x3 : 0000000000000000 x2 : 0000000000000400
[ 49.405484] x1 : 0000000000000013 x0 : 0000000000000012
[ 49.410764] Task stack: [0xffff00000d6f0000..0xffff00000d6f4000]
[ 49.416728] IRQ stack: [0xffff80097ffb90a0..0xffff80097ffbd0a0]
[ 49.422692] ESR: 0x96000047 -- DABT (current EL)
[ 49.427277] FAR: 0xffff00000d6eff30
[ 49.430742] Kernel panic - not syncing: kernel stack overflow
[ 49.436451] CPU: 5 PID: 2208 Comm: bash Not tainted 4.12.0-00005-ga781af2 #81
[ 49.443534] Hardware name: ARM Juno development board (r1) (DT)
[ 49.449412] Call trace:
[ 49.451852] [<ffff0000080885f0>] dump_backtrace+0x0/0x230
[ 49.457218] [<ffff0000080888e4>] show_stack+0x14/0x20
[ 49.462240] [<ffff00000839be0c>] dump_stack+0x9c/0xc0
[ 49.467261] [<ffff000008175218>] panic+0x11c/0x294
[ 49.472024] [<ffff000008089184>] handle_bad_stack+0xe4/0xe8
[ 49.477561] [<ffff000008588ac4>] recursive_loop+0x34/0x50
[ 49.482926] SMP: stopping secondary CPUs
[ 49.487145] Kernel Offset: disabled
[ 49.490609] Memory Limit: none
[ 49.493649] ---[ end Kernel panic - not syncing: kernel stack overflow
... I still need to attack the backtracing to walk across stacks.
Thanks,
Mark.
On 13 July 2017 at 18:55, Mark Rutland <[email protected]> wrote:
> On Thu, Jul 13, 2017 at 05:10:50PM +0100, Mark Rutland wrote:
>> On Thu, Jul 13, 2017 at 12:49:48PM +0100, Ard Biesheuvel wrote:
>> > On 13 July 2017 at 11:49, Mark Rutland <[email protected]> wrote:
>> > > On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
>> > >> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
>
>> > Given that the very first stp in kernel_entry will fault if we have
>> > less than S_FRAME_SIZE bytes of stack left, I think we should check
>> > that we have at least that much space available.
>>
>> I was going to reply saying that I didn't agree, but in writing up
>> examples, I mostly convinced myself that this is the right thing to do.
>> So I mostly agree!
>>
>> This would mean we treat the first impossible-to-handle exception as
>> that fatal case, which is similar to x86's double-fault, triggered when
>> the HW can't stack the regs. All other cases are just arbitrary faults.
>>
>> However, to provide that consistently, we'll need to perform this check
>> at every exception boundary, or some of those cases will result in a
>> recursive fault first.
>>
>> So I think there are three choices:
>>
>> 1) In el1_sync, only check SP bounds, and live with the recursive
>> faults.
>>
>> 2) in el1_sync, check there's room for the regs, and live with the
>> recursive faults for overflow on other exceptions.
>>
>> 3) In all EL1 entry paths, check there's room for the regs.
>
> FWIW, for the moment I've applied (2), as you suggested, to my
> arm64/vmap-stack branch, adding an additional:
>
> sub x0, x0, #S_FRAME_SIZE
>
> ... to the entry path.
>
> I think it's worth trying (3) so that we consistently report these
> cases, benchmarks permitting.
>
OK, so here's a crazy idea: what if we
a) carve out a dedicated range in the VMALLOC area for stacks
b) for each stack, allocate a naturally aligned window of 2x the stack
size, and map the stack inside it, leaving the remaining space
unmapped
That way, we can compare SP (minus S_FRAME_SIZE) against a mask that
is a build time constant, to decide whether its value points into a
stack or not. Of course, it may be pointing into the wrong stack, but
that should not prevent us from taking the exception, and we can deal
with that later. It would give us a very cheap way to perform this
test on the hot paths.
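As a sketch of how cheap that check could be (reusing the sp_el0 stash that
detect_bad_stack already uses; which half of the 2x window holds the stack is
an assumption):

	msr	sp_el0, x0			// stash x0, as today
	sub	x0, sp, #S_FRAME_SIZE		// demand headroom for the regs
	tbz	x0, #THREAD_SHIFT, __bad_stack	// assumes the stack sits in the half
						// of the window where this bit is set,
						// the other half being the guard
	mrs	x0, sp_el0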
>> I believe that determining whether the exception was caused by a stack
>> overflow is not something we can do robustly or efficiently.
>>
Actually, if the stack pointer is within S_FRAME_SIZE of the base, and
the faulting address points into the guard page, that is a pretty
strong indicator that the stack overflowed. That shouldn't be too
costly?
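A rough sketch of what that diagnosis could look like in C (the helper name and
guard-page bound are assumptions; a real version would also want to sanity-check
ESR_EL1, per the concerns above):

static bool is_stack_overflow(struct pt_regs *regs)
{
	unsigned long sp = kernel_stack_pointer(regs);
	unsigned long far = read_sysreg(far_el1);
	unsigned long base = current->thread_info.current_stack;

	/* SP within a frame of the base, FAR in the guard page below it */
	return (sp >= base && sp < base + sizeof(struct pt_regs)) &&
	       (far >= base - PAGE_SIZE && far < base);
}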
> It's probably worth putting the fast-path check directly into the
> vectors, where we currently only use 1/32 of the instruction slots
> available to us.
>
>> As above, I think it's helpful to think of this as something closer to a
>> double-fault handler (i.e. aiming to catch when we can't take the
>> exception safely), rather than something that's trying to catch logical
>> stack overflows.
>
> Does this make sense to you?
>
> I've tried to reword the log output, as below, to give this impression.
>
> [ 49.288232] Insufficient stack space to handle exception!
This could be a separate warning, if we find out that the actual
exception was caused by something else.
> [ 49.288245] CPU: 5 PID: 2208 Comm: bash Not tainted 4.12.0-00005-ga781af2 #81
> [ 49.300680] Hardware name: ARM Juno development board (r1) (DT)
> [ 49.306549] task: ffff800974955100 task.stack: ffff00000d6f0000
> [ 49.312426] PC is at recursive_loop+0x10/0x50
> [ 49.316747] LR is at recursive_loop+0x34/0x50
> [ 49.321066] pc : [<ffff000008588aa0>] lr : [<ffff000008588ac4>] pstate: 40000145
> [ 49.328398] sp : ffff00000d6eff30
> [ 49.331682] x29: ffff00000d6f0350 x28: ffff800974955100
> [ 49.336953] x27: ffff000008942000 x26: ffff000008f0d758
> [ 49.342223] x25: ffff00000d6f3eb8 x24: ffff00000d6f3eb8
> [ 49.347493] x23: ffff000008f0d490 x22: 0000000000000009
> [ 49.352764] x21: ffff800974a57000 x20: ffff000008f0d4e0
> [ 49.358034] x19: 0000000000000013 x18: 0000ffffe7e2e4f0
> [ 49.363304] x17: 0000ffff9c1256a4 x16: ffff0000081f8b88
> [ 49.368574] x15: 00002a81b8000000 x14: 00000000fffffff0
> [ 49.373845] x13: ffff000008f6278a x12: ffff000008e62818
> [ 49.379115] x11: 0000000000000000 x10: 000000000000019e
> [ 49.384385] x9 : 0000000000000004 x8 : ffff00000d6f0770
> [ 49.389656] x7 : 1313131313131313 x6 : 000000000000019e
> [ 49.394925] x5 : 0000000000000000 x4 : 0000000000000000
> [ 49.400205] x3 : 0000000000000000 x2 : 0000000000000400
> [ 49.405484] x1 : 0000000000000013 x0 : 0000000000000012
> [ 49.410764] Task stack: [0xffff00000d6f0000..0xffff00000d6f4000]
> [ 49.416728] IRQ stack: [0xffff80097ffb90a0..0xffff80097ffbd0a0]
> [ 49.422692] ESR: 0x96000047 -- DABT (current EL)
> [ 49.427277] FAR: 0xffff00000d6eff30
> [ 49.430742] Kernel panic - not syncing: kernel stack overflow
> [ 49.436451] CPU: 5 PID: 2208 Comm: bash Not tainted 4.12.0-00005-ga781af2 #81
> [ 49.443534] Hardware name: ARM Juno development board (r1) (DT)
> [ 49.449412] Call trace:
> [ 49.451852] [<ffff0000080885f0>] dump_backtrace+0x0/0x230
> [ 49.457218] [<ffff0000080888e4>] show_stack+0x14/0x20
> [ 49.462240] [<ffff00000839be0c>] dump_stack+0x9c/0xc0
> [ 49.467261] [<ffff000008175218>] panic+0x11c/0x294
> [ 49.472024] [<ffff000008089184>] handle_bad_stack+0xe4/0xe8
> [ 49.477561] [<ffff000008588ac4>] recursive_loop+0x34/0x50
> [ 49.482926] SMP: stopping secondary CPUs
> [ 49.487145] Kernel Offset: disabled
> [ 49.490609] Memory Limit: none
> [ 49.493649] ---[ end Kernel panic - not syncing: kernel stack overflow
>
Yes, this looks nice.
> ... I still need to attack the backtracing to walk across stacks.
>
Yup
On Wed, Jul 12, 2017 at 11:32:58PM +0100, Mark Rutland wrote:
> Today we use TPIDR_EL1 for our percpu offset, and SP_EL0 for current
> (and current::thread_info, which is at offset 0).
>
> Using SP_EL0 in this way prevents us from using EL1 thread mode, where
> SP_EL0 is not addressable (since it's used as the active SP). It also
> means we can't use SP_EL0 for other purposes (e.g. as a
> scratch-register).
>
> This patch frees up SP_EL0 for such usage, by storing the percpu offset
> in current::thread_info, and using TPIDR_EL1 to store current. As we no
> longer need to update SP_EL0 at EL0 exception boundaries, this allows us
> to delete some code.
Does this mean we can just use asm-generic/percpu.h?
Will
On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
> On 13 July 2017 at 18:55, Mark Rutland <[email protected]> wrote:
> > On Thu, Jul 13, 2017 at 05:10:50PM +0100, Mark Rutland wrote:
> >> On Thu, Jul 13, 2017 at 12:49:48PM +0100, Ard Biesheuvel wrote:
> >> > On 13 July 2017 at 11:49, Mark Rutland <[email protected]> wrote:
> >> > > On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
> >> > >> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
> >
> >> > Given that the very first stp in kernel_entry will fault if we have
> >> > less than S_FRAME_SIZE bytes of stack left, I think we should check
> >> > that we have at least that much space available.
> >>
> >> I was going to reply saying that I didn't agree, but in writing up
> >> examples, I mostly convinced myself that this is the right thing to do.
> >> So I mostly agree!
> >>
> >> This would mean we treat the first impossible-to-handle exception as
> >> that fatal case, which is similar to x86's double-fault, triggered when
> >> the HW can't stack the regs. All other cases are just arbitrary faults.
> >>
> >> However, to provide that consistently, we'll need to perform this check
> >> at every exception boundary, or some of those cases will result in a
> >> recursive fault first.
> >>
> >> So I think there are three choices:
> >>
> >> 1) In el1_sync, only check SP bounds, and live with the recursive
> >> faults.
> >>
> >> 2) in el1_sync, check there's room for the regs, and live with the
> >> recursive faults for overflow on other exceptions.
> >>
> >> 3) In all EL1 entry paths, check there's room for the regs.
> >
> > FWIW, for the moment I've applied (2), as you suggested, to my
> > arm64/vmap-stack branch, adding an additional:
> >
> > sub x0, x0, #S_FRAME_SIZE
> >
> > ... to the entry path.
> >
> > I think it's worth trying (3) so that we consistently report these
> > cases, benchmarks permitting.
> >
>
> OK, so here's a crazy idea: what if we
> a) carve out a dedicated range in the VMALLOC area for stacks
> b) for each stack, allocate a naturally aligned window of 2x the stack
> size, and map the stack inside it, leaving the remaining space
> unmapped
This is not such a crazy idea. :)
In fact, it was one I toyed with before getting lost on a register
juggling tangent (see below).
> That way, we can compare SP (minus S_FRAME_SIZE) against a mask that
> is a build time constant, to decide whether its value points into a
> stack or not. Of course, it may be pointing into the wrong stack, but
> that should not prevent us from taking the exception, and we can deal
> with that later. It would give us a very cheap way to perform this
> test on the hot paths.
The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
on XZR rather than SP, so to do this we need to get the SP value into a
GPR.
Previously, I assumed this meant we needed to corrupt a GPR (and hence
stash that GPR in a sysreg), so I started writing code to free sysregs.
However, I now realise I was being thick, since we can stash the GPR
in the SP:
sub sp, sp, x0 // sp = orig_sp - x0
add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
sub x0, x0, #S_FRAME_SIZE
tb(nz) x0, #THREAD_SHIFT, overflow
add x0, x0, #S_FRAME_SIZE
sub x0, sp, x0
add sp, sp, x0
... so yes, this could work!
This means that we have to align the initial task, so the kernel Image
will grow by THREAD_SIZE. Likewise for IRQ stacks, unless we can rework
things such that we can dynamically allocate all of those.
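As a sanity check, here's a minimal userspace model of that bit test (a
sketch only: the THREAD_SHIFT and S_FRAME_SIZE values are illustrative,
and it assumes the stack sits in the upper half of a naturally aligned
2x THREAD_SIZE window, with the lower half left unmapped as the guard):
#include <assert.h>
#include <stdint.h>
#define THREAD_SHIFT 14                 /* 16K stacks, for illustration */
#define THREAD_SIZE  (1UL << THREAD_SHIFT)
#define S_FRAME_SIZE 0x130              /* illustrative pt_regs size */
/* in-bounds iff bit THREAD_SHIFT survives making room for the regs */
static int can_stack_regs(uint64_t sp)
{
        return ((sp - S_FRAME_SIZE) >> THREAD_SHIFT) & 1;
}
int main(void)
{
        uint64_t window = 0xffff000012340000UL;   /* 2x THREAD_SIZE aligned */
        uint64_t base = window + THREAD_SIZE;     /* lowest mapped stack byte */
        uint64_t top  = window + 2 * THREAD_SIZE; /* initial SP */
        assert(can_stack_regs(top));                 /* fresh stack */
        assert(can_stack_regs(base + S_FRAME_SIZE)); /* just enough room */
        assert(!can_stack_regs(base));               /* no room for the regs */
        assert(!can_stack_regs(base - 8));           /* already in the guard */
        return 0;
}
Of course, an SP pointing somewhere else entirely can still pass the
test (a false negative), but under these assumptions a passing SP in
its own window is guaranteed to have room to stack the regs.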
> >> I believe that determining whether the exception was caused by a stack
> >> overflow is not something we can do robustly or efficiently.
>
> Actually, if the stack pointer is within S_FRAME_SIZE of the base, and
> the faulting address points into the guard page, that is a pretty
> strong indicator that the stack overflowed. That shouldn't be too
> costly?
Sure, but that's still a heuristic. For example, that also catches an
unrelated vmalloc address gone wrong, while SP was close to the end of
the stack.
The important thing is whether we can *safely enter the exception* (i.e.
stack the regs), or whether this'll push the SP (further) out-of-bounds.
I think we agree that we can reliably and efficiently check this.
The general case of nominal "stack overflows" (e.g. large pre-index
decrements, proxied SP values, unrelated guard-page faults) is a
semantic minefield. I don't think we should add code to try to
distinguish these.
For that general case, if we can enter the exception then we can try to
handle the exception in the usual way, and either:
* The fault code determines the access was bad. We at least kill the
thread.
* We overflow the stack while trying to handle the exception, triggering
a new fault to triage.
To make it possible to distinguish and debug these, we need to fix the
backtracing code, but that's it.
Thanks,
Mark.
On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>> On 13 July 2017 at 18:55, Mark Rutland <[email protected]> wrote:
>> > On Thu, Jul 13, 2017 at 05:10:50PM +0100, Mark Rutland wrote:
>> >> On Thu, Jul 13, 2017 at 12:49:48PM +0100, Ard Biesheuvel wrote:
>> >> > On 13 July 2017 at 11:49, Mark Rutland <[email protected]> wrote:
>> >> > > On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
>> >> > >> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
>> >
>> >> > Given that the very first stp in kernel_entry will fault if we have
>> >> > less than S_FRAME_SIZE bytes of stack left, I think we should check
>> >> > that we have at least that much space available.
>> >>
>> >> I was going to reply saying that I didn't agree, but in writing up
>> >> examples, I mostly convinced myself that this is the right thing to do.
>> >> So I mostly agree!
>> >>
>> >> This would mean we treat the first impossible-to-handle exception as
>> >> that fatal case, which is similar to x86's double-fault, triggered when
>> >> the HW can't stack the regs. All other cases are just arbitrary faults.
>> >>
>> >> However, to provide that consistently, we'll need to perform this check
>> >> at every exception boundary, or some of those cases will result in a
>> >> recursive fault first.
>> >>
>> >> So I think there are three choices:
>> >>
>> >> 1) In el1_sync, only check SP bounds, and live with the recursive
>> >> faults.
>> >>
>> >> 2) in el1_sync, check there's room for the regs, and live with the
>> >> recursive faults for overflow on other exceptions.
>> >>
>> >> 3) In all EL1 entry paths, check there's room for the regs.
>> >
>> > FWIW, for the moment I've applied (2), as you suggested, to my
>> > arm64/vmap-stack branch, adding an additional:
>> >
>> > sub x0, x0, #S_FRAME_SIZE
>> >
>> > ... to the entry path.
>> >
>> > I think it's worth trying (3) so that we consistently report these
>> > cases, benchmarks permitting.
>> >
>>
>> OK, so here's a crazy idea: what if we
>> a) carve out a dedicated range in the VMALLOC area for stacks
>> b) for each stack, allocate a naturally aligned window of 2x the stack
>> size, and map the stack inside it, leaving the remaining space
>> unmapped
>
> This is not such a crazy idea. :)
>
> In fact, it was one I toyed with before getting lost on a register
> juggling tangent (see below).
>
>> That way, we can compare SP (minus S_FRAME_SIZE) against a mask that
>> is a build time constant, to decide whether its value points into a
>> stack or not. Of course, it may be pointing into the wrong stack, but
>> that should not prevent us from taking the exception, and we can deal
>> with that later. It would give us a very cheap way to perform this
>> test on the hot paths.
>
> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
> on XZR rather than SP, so to do this we need to get the SP value into a
> GPR.
>
> Previously, I assumed this meant we needed to corrupt a GPR (and hence
> stash that GPR in a sysreg), so I started writing code to free sysregs.
>
> However, I now realise I was being thick, since we can stash the GPR
> in the SP:
>
> sub sp, sp, x0 // sp = orig_sp - x0
> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
> sub x0, x0, #S_FRAME_SIZE
> tb(nz) x0, #THREAD_SHIFT, overflow
> add x0, x0, #S_FRAME_SIZE
> sub x0, sp, x0
> add sp, sp, x0
>
> ... so yes, this could work!
>
Nice!
> This means that we have to align the initial task, so the kernel Image
> will grow by THREAD_SIZE. Likewise for IRQ stacks, unless we can rework
> things such that we can dynamically allocate all of those.
>
We can't currently do that for 64k pages, since the segment alignment
is only 64k. But we should be able to patch that up, I think.
>> >> I believe that determining whether the exception was caused by a stack
>> >> overflow is not something we can do robustly or efficiently.
>>
>> Actually, if the stack pointer is within S_FRAME_SIZE of the base, and
>> the faulting address points into the guard page, that is a pretty
>> strong indicator that the stack overflowed. That shouldn't be too
>> costly?
>
> Sure, but that's still a heuristic. For example, that also catches an
> unrelated vmalloc address gone wrong, while SP was close to the end of
> the stack.
>
Yes, but the likelihood that an unrelated stray vmalloc access is
within 16 KB of a stack pointer that is close to its limit is
extremely low, so we should be able to live with the risk of
misidentifying it.
> The important thing is whether we can *safely enter the exception* (i.e.
> stack the regs), or whether this'll push the SP (further) out-of-bounds.
> I think we agree that we can reliably and efficiently check this.
>
Yes.
> The general case of nominal "stack overflows" (e.g. large pre-index
> decrements, proxied SP values, unrelated guard-page faults) is a
> semantic minefield. I don't think we should add code to try to
> distinguish these.
>
> For that general case, if we can enter the exception then we can try to
> handle the exception in the usual way, and either:
>
> * The fault code determines the access was bad. We at least kill the
> thread.
>
> * We overflow the stack while trying to handle the exception, triggering
> a new fault to triage.
>
> To make it possible to distinguish and debug these, we need to fix the
> backtracing code, but that's it.
>
> Thanks,
> Mark.
On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>> On 13 July 2017 at 18:55, Mark Rutland <[email protected]> wrote:
>>> > On Thu, Jul 13, 2017 at 05:10:50PM +0100, Mark Rutland wrote:
>>> >> On Thu, Jul 13, 2017 at 12:49:48PM +0100, Ard Biesheuvel wrote:
>>> >> > On 13 July 2017 at 11:49, Mark Rutland <[email protected]> wrote:
>>> >> > > On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
>>> >> > >> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
>>> >
>>> >> > Given that the very first stp in kernel_entry will fault if we have
>>> >> > less than S_FRAME_SIZE bytes of stack left, I think we should check
>>> >> > that we have at least that much space available.
>>> >>
>>> >> I was going to reply saying that I didn't agree, but in writing up
>>> >> examples, I mostly convinced myself that this is the right thing to do.
>>> >> So I mostly agree!
>>> >>
>>> >> This would mean we treat the first impossible-to-handle exception as
>>> >> that fatal case, which is similar to x86's double-fault, triggered when
>>> >> the HW can't stack the regs. All other cases are just arbitrary faults.
>>> >>
>>> >> However, to provide that consistently, we'll need to perform this check
>>> >> at every exception boundary, or some of those cases will result in a
>>> >> recursive fault first.
>>> >>
>>> >> So I think there are three choices:
>>> >>
>>> >> 1) In el1_sync, only check SP bounds, and live with the recursive
>>> >> faults.
>>> >>
>>> >> 2) in el1_sync, check there's room for the regs, and live with the
>>> >> recursive faults for overflow on other exceptions.
>>> >>
>>> >> 3) In all EL1 entry paths, check there's room for the regs.
>>> >
>>> > FWIW, for the moment I've applied (2), as you suggested, to my
>>> > arm64/vmap-stack branch, adding an additional:
>>> >
>>> > sub x0, x0, #S_FRAME_SIZE
>>> >
>>> > ... to the entry path.
>>> >
>>> > I think it's worth trying (3) so that we consistently report these
>>> > cases, benchmarks permitting.
>>> >
>>>
>>> OK, so here's a crazy idea: what if we
>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>> size, and map the stack inside it, leaving the remaining space
>>> unmapped
>>
>> This is not such a crazy idea. :)
>>
>> In fact, it was one I toyed with before getting lost on a register
>> juggling tangent (see below).
>>
>>> That way, we can compare SP (minus S_FRAME_SIZE) against a mask that
>>> is a build time constant, to decide whether its value points into a
>>> stack or not. Of course, it may be pointing into the wrong stack, but
>>> that should not prevent us from taking the exception, and we can deal
>>> with that later. It would give us a very cheap way to perform this
>>> test on the hot paths.
>>
>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>> on XZR rather than SP, so to do this we need to get the SP value into a
>> GPR.
>>
>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>
>> However, I now realise I was being thick, since we can stash the GPR
>> in the SP:
>>
>> sub sp, sp, x0 // sp = orig_sp - x0
>> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>> sub x0, x0, #S_FRAME_SIZE
>> tb(nz) x0, #THREAD_SHIFT, overflow
>> add x0, x0, #S_FRAME_SIZE
>> sub x0, sp, x0
You need a neg x0, x0 here I think
>> add sp, sp, x0
>>
>> ... so yes, this could work!
>>
>
> Nice!
>
... only, this requires a dedicated stack region, and so we'd need to
check whether sp is inside that window as well.
The easiest way would be to use a window whose start address is base2
aligned, but that means the beginning of the kernel VA range (where
KASAN currently lives, and cannot be moved afaik), or a window at the
top of the linear region. Neither looks very appealing.
So that means arbitrary low and high limits to compare against in this
entry path. That means more GPRs, I'm afraid.
>> This means that we have to align the initial task, so the kernel Image
>> will grow by THREAD_SIZE. Likewise for IRQ stacks, unless we can rework
>> things such that we can dynamically allocate all of those.
>>
>
> We can't currently do that for 64k pages, since the segment alignment
> is only 64k. But we should be able to patch that up I think
>
>>> >> I believe that determining whether the exception was caused by a stack
>>> >> overflow is not something we can do robustly or efficiently.
>>>
>>> Actually, if the stack pointer is within S_FRAME_SIZE of the base, and
>>> the faulting address points into the guard page, that is a pretty
>>> strong indicator that the stack overflowed. That shouldn't be too
>>> costly?
>>
>> Sure, but that's still a heuristic. For example, that also catches an
>> unrelated vmalloc address gone wrong, while SP was close to the end of
>> the stack.
>>
>
> Yes, but the likelihood that an unrelated stray vmalloc access is
> within 16 KB of a stack pointer that is close to its limit is
> extremely low, so we should be able to live with the risk of
> misidentifying it.
>
>> The important thing is whether we can *safely enter the exception* (i.e.
>> stack the regs), or whether this'll push the SP (further) out-of-bounds.
>> I think we agree that we can reliably and efficiently check this.
>>
>
> Yes.
>
>> The general case of nominal "stack overflows" (e.g. large pre-index
>> decrements, proxied SP values, unrelated guard-page faults) is a
>> semantic minefield. I don't think we should add code to try to
>> distinguish these.
>>
>> For that general case, if we can enter the exception then we can try to
>> handle the exception in the usual way, and either:
>>
>> * The fault code determines the access was bad. We at least kill the
>> thread.
>>
>> * We overflow the stack while trying to handle the exception, triggering
>> a new fault to triage.
>>
>> To make it possible to distinguish and debug these, we need to fix the
>> backtracing code, but that's it.
>>
>> Thanks,
>> Mark.
On Fri, Jul 14, 2017 at 11:48:20AM +0100, Ard Biesheuvel wrote:
> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
> > On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
> >> On 13 July 2017 at 18:55, Mark Rutland <[email protected]> wrote:
> >> > On Thu, Jul 13, 2017 at 05:10:50PM +0100, Mark Rutland wrote:
> >> >> On Thu, Jul 13, 2017 at 12:49:48PM +0100, Ard Biesheuvel wrote:
> >> >> > On 13 July 2017 at 11:49, Mark Rutland <[email protected]> wrote:
> >> >> > > On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
> >> >> > >> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
> > This means that we have to align the initial task, so the kernel Image
> > will grow by THREAD_SIZE. Likewise for IRQ stacks, unless we can rework
> > things such that we can dynamically allocate all of those.
> >
>
> We can't currently do that for 64k pages, since the segment alignment
> is only 64k. But we should be able to patch that up I think
I was assuming that the linker would bump up the segment alignment if a
more-aligned object were placed inside. I guess that doesn't happen in
all cases?
... or do you mean when the EFI stub relocates the kernel, assuming
relaxed alignment constraints?
> >> >> I believe that determining whether the exception was caused by a stack
> >> >> overflow is not something we can do robustly or efficiently.
> >>
> >> Actually, if the stack pointer is within S_FRAME_SIZE of the base, and
> >> the faulting address points into the guard page, that is a pretty
> >> strong indicator that the stack overflowed. That shouldn't be too
> >> costly?
> >
> > Sure, but that's still a heuristic. For example, that also catches an
> > unrelated vmalloc address gone wrong, while SP was close to the end of
> > the stack.
>
> Yes, but the likelihood that an unrelated stray vmalloc access is
> within 16 KB of a stack pointer that is close to its limit is
> extremely low, so we should be able to live with the risk of
> misidentifying it.
I guess, but at that point, why bother?
That gives us a fuzzy check for one specific "stack overflow", while not
catching the general case.
So long as we have a reliable stack trace, we can figure out that this
was the case, and we don't set the expectation that we're trying to
categorize the general case (minefield and all).
Thanks,
Mark.
On 14 July 2017 at 13:52, Mark Rutland <[email protected]> wrote:
> On Fri, Jul 14, 2017 at 11:48:20AM +0100, Ard Biesheuvel wrote:
>> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>> > On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>> >> On 13 July 2017 at 18:55, Mark Rutland <[email protected]> wrote:
>> >> > On Thu, Jul 13, 2017 at 05:10:50PM +0100, Mark Rutland wrote:
>> >> >> On Thu, Jul 13, 2017 at 12:49:48PM +0100, Ard Biesheuvel wrote:
>> >> >> > On 13 July 2017 at 11:49, Mark Rutland <[email protected]> wrote:
>> >> >> > > On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
>> >> >> > >> On 12 July 2017 at 23:33, Mark Rutland <[email protected]> wrote:
>> > This means that we have to align the initial task, so the kernel Image
>> > will grow by THREAD_SIZE. Likewise for IRQ stacks, unless we can rework
>> > things such that we can dynamically allocate all of those.
>> >
>>
>> We can't currently do that for 64k pages, since the segment alignment
>> is only 64k. But we should be able to patch that up I think
>
> I was assuming that the linker would bump up the segment alignment if a
> more-aligned object were placed inside. I guess that doesn't happen in
> all cases?
>
> ... or do you mean when the EFI stub relocates the kernel, assuming
> relaxed alignment constraints?
>
No, I mean under KASLR, which randomizes at SEGMENT_ALIGN granularity.
>> >> >> I believe that determining whether the exception was caused by a stack
>> >> >> overflow is not something we can do robustly or efficiently.
>> >>
>> >> Actually, if the stack pointer is within S_FRAME_SIZE of the base, and
>> >> the faulting address points into the guard page, that is a pretty
>> >> strong indicator that the stack overflowed. That shouldn't be too
>> >> costly?
>> >
>> > Sure, but that's still a heuristic. For example, that also catches an
>> > unrelated vmalloc address gone wrong, while SP was close to the end of
>> > the stack.
>>
>> Yes, but the likelihood that an unrelated stray vmalloc access is
>> within 16 KB of a stack pointer that is close to its limit is
>> extremely low, so we should be able to live with the risk of
>> misidentifying it.
>
> I guess, but at that point, why bother?
>
> That gives us a fuzzy check for one specific "stack overflow", while not
> catching the general case.
>
> So long as we have a reliable stack trace, we can figure out that was
> the case, and we don't set the expectation that we're trying to
> categorize the general case (minefield and all).
>
Yes. As long as the context is described accurately, there is no need
to make any inferences on behalf of the user.
On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
> On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
> > On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
> >> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
> >>> OK, so here's a crazy idea: what if we
> >>> a) carve out a dedicated range in the VMALLOC area for stacks
> >>> b) for each stack, allocate a naturally aligned window of 2x the stack
> >>> size, and map the stack inside it, leaving the remaining space
> >>> unmapped
> >> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
> >> on XZR rather than SP, so to do this we need to get the SP value into a
> >> GPR.
> >>
> >> Previously, I assumed this meant we needed to corrupt a GPR (and hence
> >> stash that GPR in a sysreg), so I started writing code to free sysregs.
> >>
> >> However, I now realise I was being thick, since we can stash the GPR
> >> in the SP:
> >>
> >> sub sp, sp, x0 // sp = orig_sp - x0
> >> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
> >> sub x0, x0, #S_FRAME_SIZE
> >> tb(nz) x0, #THREAD_SHIFT, overflow
> >> add x0, x0, #S_FRAME_SIZE
> >> sub x0, sp, x0
>
> You need a neg x0, x0 here I think
Oh, whoops. I'd mis-simplified things.
We can avoid that by storing orig_sp + orig_x0 in sp:
add sp, sp, x0 // sp = orig_sp + orig_x0
sub x0, sp, x0 // x0 = orig_sp
< check >
sub x0, sp, x0 // x0 = orig_x0
sub sp, sp, x0 // sp = orig_sp
... which works in a locally-built kernel where I've aligned all the
stacks.
> ... only, this requires a dedicated stack region, and so we'd need to
> check whether sp is inside that window as well.
>
> The easiest way would be to use a window whose start address is base2
> aligned, but that means the beginning of the kernel VA range (where
> KASAN currently lives, and cannot be moved afaik), or a window at the
> top of the linear region. Neither looks very appealing.
>
> So that means arbitrary low and high limits to compare against in this
> entry path. That means more GPRs I'm afraid.
Could you elaborate on that? I'm not sure that I follow.
My understanding was that the compromise with this approach is that we
only catch overflow/underflow within THREAD_SIZE of the stack, and can
get false negatives elsewhere. Otherwise, IIUC this is sufficient.
Are you after a more stringent check (like those from the two existing
proposals that caught all out-of-bounds accesses)?
Or am I missing something else?
Thanks,
Mark.
On 14 July 2017 at 15:06, Mark Rutland <[email protected]> wrote:
> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>> On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
>> > On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>> >> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>
>> >>> OK, so here's a crazy idea: what if we
>> >>> a) carve out a dedicated range in the VMALLOC area for stacks
>> >>> b) for each stack, allocate a naturally aligned window of 2x the stack
>> >>> size, and map the stack inside it, leaving the remaining space
>> >>> unmapped
>
>> >> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>> >> on XZR rather than SP, so to do this we need to get the SP value into a
>> >> GPR.
>> >>
>> >> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>> >> stash that GPR in a sysreg), so I started writing code to free sysregs.
>> >>
>> >> However, I now realise I was being thick, since we can stash the GPR
>> >> in the SP:
>> >>
>> >> sub sp, sp, x0 // sp = orig_sp - x0
>> >> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>
> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>
>> >> sub x0, x0, #S_FRAME_SIZE
>> >> tb(nz) x0, #THREAD_SHIFT, overflow
>> >> add x0, x0, #S_FRAME_SIZE
>> >> sub x0, sp, x0
>>
>> You need a neg x0, x0 here I think
>
> Oh, whoops. I'd mis-simplified things.
>
> We can avoid that by storing orig_sp + orig_x0 in sp:
>
> add sp, sp, x0 // sp = orig_sp + orig_x0
> sub x0, sp, x0 // x0 = orig_sp
> < check >
> sub x0, sp, x0 // x0 = orig_x0
> sub sp, sp, x0 // sp = orig_sp
>
> ... which works in a locally-built kernel where I've aligned all the
> stacks.
>
Yes, that looks correct to me now.
>> ... only, this requires a dedicated stack region, and so we'd need to
>> check whether sp is inside that window as well.
>>
>> The easiest way would be to use a window whose start address is base2
>> aligned, but that means the beginning of the kernel VA range (where
>> KASAN currently lives, and cannot be moved afaik), or a window at the
>> top of the linear region. Neither looks very appealing.
>>
>> So that means arbitrary low and high limits to compare against in this
>> entry path. That means more GPRs I'm afraid.
>
> Could you elaborate on that? I'm not sure that I follow.
>
> My understanding was that the compromise with this approach is that we
> only catch overflow/underflow within THREAD_SIZE of the stack, and can
> get false negatives elsewhere. Otherwise, IIUC this is sufficient.
>
> Are you after a more stringent check (like those from the two existing
> proposals that caught all out-of-bounds accesses)?
>
> Or am I missing something else?
>
No, not at all. I managed to confuse myself into thinking that we need
to validate the value of SP in some way, i.e., as we would when
dealing with an arbitrary faulting address.
On 14/07/17 15:06, Mark Rutland wrote:
> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>> On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
>>> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>
>>>>> OK, so here's a crazy idea: what if we
>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>> size, and map the stack inside it, leaving the remaining space
>>>>> unmapped
>
>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>> GPR.
>>>>
>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>
>>>> However, I now realise I was being thick, since we can stash the GPR
>>>> in the SP:
>>>>
>>>> sub sp, sp, x0 // sp = orig_sp - x0
>>>> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>
> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>
>>>> sub x0, x0, #S_FRAME_SIZE
>>>> tb(nz) x0, #THREAD_SHIFT, overflow
>>>> add x0, x0, #S_FRAME_SIZE
>>>> sub x0, sp, x0
>>
>> You need a neg x0, x0 here I think
>
> Oh, whoops. I'd mis-simplified things.
>
> We can avoid that by storing orig_sp + orig_x0 in sp:
>
> add sp, sp, x0 // sp = orig_sp + orig_x0
> sub x0, sp, x0 // x0 = orig_sp
> < check >
> sub x0, sp, x0 // x0 = orig_x0
Haven't you now forcibly cleared the top bit of x0 thanks to overflow?
Robin.
> sub sp, sp, x0 // sp = orig_sp
>
> ... which works in a locally-built kernel where I've aligned all the
> stacks.
>
>> ... only, this requires a dedicated stack region, and so we'd need to
>> check whether sp is inside that window as well.
>>
>> The easiest way would be to use a window whose start address is base2
>> aligned, but that means the beginning of the kernel VA range (where
>> KASAN currently lives, and cannot be moved afaik), or a window at the
>> top of the linear region. Neither looks very appealing.
>>
>> So that means arbitrary low and high limits to compare against in this
>> entry path. That means more GPRs I'm afraid.
>
> Could you elaborate on that? I'm not sure that I follow.
>
> My understanding was that the compromise with this approach is that we
> only catch overflow/underflow within THREAD_SIZE of the stack, and can
> get false negatives elsewhere. Otherwise, IIUC this is sufficient.
>
> Are you after a more stringent check (like those from the two existing
> proposals that caught all out-of-bounds accesses)?
>
> Or am I missing something else?
>
> Thanks,
> Mark.
On 14/07/17 15:39, Robin Murphy wrote:
> On 14/07/17 15:06, Mark Rutland wrote:
>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>>> On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
>>>> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>
>>>>>> OK, so here's a crazy idea: what if we
>>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>>> size, and map the stack inside it, leaving the remaining space
>>>>>> unmapped
>>
>>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>>> GPR.
>>>>>
>>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>>
>>>>> However, I now realise I was being thick, since we can stash the GPR
>>>>> in the SP:
>>>>>
>>>>> sub sp, sp, x0 // sp = orig_sp - x0
>>>>> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>>
>> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>>
>>>>> sub x0, x0, #S_FRAME_SIZE
>>>>> tb(nz) x0, #THREAD_SHIFT, overflow
>>>>> add x0, x0, #S_FRAME_SIZE
>>>>> sub x0, sp, x0
>>>
>>> You need a neg x0, x0 here I think
>>
>> Oh, whoops. I'd mis-simplified things.
>>
>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>
>> add sp, sp, x0 // sp = orig_sp + orig_x0
>> sub x0, sp, x0 // x0 = orig_sp
>> < check >
>> sub x0, sp, x0 // x0 = orig_x0
>
> Haven't you now forcibly cleared the top bit of x0 thanks to overflow?
...or maybe not. I still can't quite see it, but I suppose it must
cancel out somewhere, since Mr. Helpful C Program[1] has apparently
proven me mistaken :(
I guess that means I approve!
Robin.
[1]:
#include <assert.h>
#include <stdint.h>
int main(void) {
for (int i = 0; i < 256; i++) {
for (int j = 0; j < 256; j++) {
uint8_t x = i;
uint8_t y = j;
y = y + x;
x = y - x;
x = y - x;
y = y - x;
assert(x == i && y == j);
}
}
}
>> sub sp, sp, x0 // sp = orig_sp
>>
>> ... which works in a locally-built kernel where I've aligned all the
>> stacks.
>>
>>> ... only, this requires a dedicated stack region, and so we'd need to
>>> check whether sp is inside that window as well.
>>>
>>> The easiest way would be to use a window whose start address is base2
>>> aligned, but that means the beginning of the kernel VA range (where
>>> KASAN currently lives, and cannot be moved afaik), or a window at the
>>> top of the linear region. Neither looks very appealing.
>>>
>>> So that means arbitrary low and high limits to compare against in this
>>> entry path. That means more GPRs I'm afraid.
>>
>> Could you elaborate on that? I'm not sure that I follow.
>>
>> My understanding was that the compromise with this approach is that we
>> only catch overflow/underflow within THREAD_SIZE of the stack, and can
>> get false negatives elsewhere. Otherwise, IIUC this is sufficient.
>>
>> Are you after a more stringent check (like those from the two existing
>> proposals that caught all out-of-bounds accesses)?
>>
>> Or am I missing something else?
>>
>> Thanks,
>> Mark.
On 14 July 2017 at 16:03, Robin Murphy <[email protected]> wrote:
> On 14/07/17 15:39, Robin Murphy wrote:
>> On 14/07/17 15:06, Mark Rutland wrote:
>>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>>>> On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
>>>>> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>>>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>>
>>>>>>> OK, so here's a crazy idea: what if we
>>>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>>>> size, and map the stack inside it, leaving the remaining space
>>>>>>> unmapped
>>>
>>>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>>>> GPR.
>>>>>>
>>>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>>>
>>>>>> However, I now realise I was being thick, since we can stash the GPR
>>>>>> in the SP:
>>>>>>
>>>>>> sub sp, sp, x0 // sp = orig_sp - x0
>>>>>> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>>>
>>> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>>>
>>>>>> sub x0, x0, #S_FRAME_SIZE
>>>>>> tb(nz) x0, #THREAD_SHIFT, overflow
>>>>>> add x0, x0, #S_FRAME_SIZE
>>>>>> sub x0, sp, x0
>>>>
>>>> You need a neg x0, x0 here I think
>>>
>>> Oh, whoops. I'd mis-simplified things.
>>>
>>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>>
>>> add sp, sp, x0 // sp = orig_sp + orig_x0
>>> sub x0, sp, x0 // x0 = orig_sp
>>> < check >
>>> sub x0, sp, x0 // x0 = orig_x0
>>
>> Haven't you now forcibly cleared the top bit of x0 thanks to overflow?
>
> ...or maybe not. I still can't quite see it, but I suppose it must
> cancel out somewhere, since Mr. Helpful C Program[1] has apparently
> proven me mistaken :(
>
> I guess that means I approve!
>
> Robin.
>
> [1]:
> #include <assert.h>
> #include <stdint.h>
>
> int main(void) {
> for (int i = 0; i < 256; i++) {
> for (int j = 0; j < 256; j++) {
> uint8_t x = i;
> uint8_t y = j;
> y = y + x;
> x = y - x;
> x = y - x;
> y = y - x;
> assert(x == i && y == j);
> }
> }
> }
>
Yeah, I think the carry out in the first instruction can be ignored,
given that we don't care about the magnitude of the result, only about
the lower 64 bits. The subtraction that inverts it will be off by
exactly 2^64, which vanishes modulo 2^64.
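For instance, a quick uint64_t sketch with values chosen so the addition
really does wrap past 2^64 (the values themselves are arbitrary):
#include <assert.h>
#include <stdint.h>
int main(void)
{
        /* both values have their top bit set, so sp + x0 wraps */
        uint64_t orig_sp = 0xffff00000d6f0000UL;
        uint64_t orig_x0 = 0xdeadbeefcafef00dUL;
        uint64_t sp = orig_sp, x0 = orig_x0;
        sp = sp + x0;   /* the 2^64 carry out is simply dropped    */
        x0 = sp - x0;   /* == orig_sp: the dropped carry cancels   */
        /* <check would run here on x0> */
        x0 = sp - x0;   /* == orig_x0 */
        sp = sp - x0;   /* == orig_sp */
        assert(sp == orig_sp && x0 == orig_x0);
        return 0;
}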
On Fri, Jul 14, 2017 at 04:03:51PM +0100, Robin Murphy wrote:
> On 14/07/17 15:39, Robin Murphy wrote:
> > On 14/07/17 15:06, Mark Rutland wrote:
> >> add sp, sp, x0 // sp = orig_sp + orig_x0
> >> sub x0, sp, x0 // x0 = orig_sp
> >> < check >
> >> sub x0, sp, x0 // x0 = orig_x0
> >
> > Haven't you now forcibly cleared the top bit of x0 thanks to overflow?
>
> ...or maybe not. I still can't quite see it, but I suppose it must
> cancel out somewhere, since Mr. Helpful C Program[1] has apparently
> proven me mistaken :(
>
> I guess that means I approve!
>
> Robin.
>
> [1]:
> #include <assert.h>
> #include <stdint.h>
>
> int main(void) {
> for (int i = 0; i < 256; i++) {
> for (int j = 0; j < 256; j++) {
> uint8_t x = i;
> uint8_t y = j;
> y = y + x;
> x = y - x;
> x = y - x;
> y = y - x;
> assert(x == i && y == j);
> }
> }
> }
I guess we have our first Tested-by for this series. :)
Thanks for taking a look!
Mark.
On Fri, Jul 14, 2017 at 03:06:06PM +0100, Mark Rutland wrote:
> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
> > On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
> > > On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
> > >> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>
> > >>> OK, so here's a crazy idea: what if we
> > >>> a) carve out a dedicated range in the VMALLOC area for stacks
> > >>> b) for each stack, allocate a naturally aligned window of 2x the stack
> > >>> size, and map the stack inside it, leaving the remaining space
> > >>> unmapped
>
> > >> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
> > >> on XZR rather than SP, so to do this we need to get the SP value into a
> > >> GPR.
> > >>
> > >> Previously, I assumed this meant we needed to corrupt a GPR (and hence
> > >> stash that GPR in a sysreg), so I started writing code to free sysregs.
> > >>
> > >> However, I now realise I was being thick, since we can stash the GPR
> > >> in the SP:
> > >>
> > >> sub sp, sp, x0 // sp = orig_sp - x0
> > >> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>
> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>
> > >> sub x0, x0, #S_FRAME_SIZE
> > >> tb(nz) x0, #THREAD_SHIFT, overflow
> > >> add x0, x0, #S_FRAME_SIZE
> > >> sub x0, sp, x0
> >
> > You need a neg x0, x0 here I think
>
> Oh, whoops. I'd mis-simplified things.
>
> We can avoid that by storing orig_sp + orig_x0 in sp:
>
> add sp, sp, x0 // sp = orig_sp + orig_x0
> sub x0, sp, x0 // x0 = orig_sp
> < check >
> sub x0, sp, x0 // x0 = orig_x0
> sub sp, sp, x0 // sp = orig_sp
>
> ... which works in a locally-built kernel where I've aligned all the
> stacks.
FWIW, I've pushed out a somewhat cleaned-up (and slightly broken!)
version of said kernel source to my arm64/vmap-stack-align branch [1].
That's still missing the backtrace handling, IRQ stack alignment is
broken at least on 64K pages, and there's still more cleanup and rework
to do.
Thanks,
Mark.
[1] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/vmap-stack-align
On 14 July 2017 at 22:27, Mark Rutland <[email protected]> wrote:
> On Fri, Jul 14, 2017 at 03:06:06PM +0100, Mark Rutland wrote:
>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>> > On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
>> > > On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>> > >> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>
>> > >>> OK, so here's a crazy idea: what if we
>> > >>> a) carve out a dedicated range in the VMALLOC area for stacks
>> > >>> b) for each stack, allocate a naturally aligned window of 2x the stack
>> > >>> size, and map the stack inside it, leaving the remaining space
>> > >>> unmapped
>>
>> > >> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>> > >> on XZR rather than SP, so to do this we need to get the SP value into a
>> > >> GPR.
>> > >>
>> > >> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>> > >> stash that GPR in a sysreg), so I started writing code to free sysregs.
>> > >>
>> > >> However, I now realise I was being thick, since we can stash the GPR
>> > >> in the SP:
>> > >>
>> > >> sub sp, sp, x0 // sp = orig_sp - x0
>> > >> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>>
>> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>>
>> > >> sub x0, x0, #S_FRAME_SIZE
>> > >> tb(nz) x0, #THREAD_SHIFT, overflow
>> > >> add x0, x0, #S_FRAME_SIZE
>> > >> sub x0, sp, x0
>> >
>> > You need a neg x0, x0 here I think
>>
>> Oh, whoops. I'd mis-simplified things.
>>
>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>
>> add sp, sp, x0 // sp = orig_sp + orig_x0
>> sub x0, sp, x0 // x0 = orig_sp
>> < check >
>> sub x0, sp, x0 // x0 = orig_x0
>> sub sp, sp, x0 // sp = orig_sp
>>
>> ... which works in a locally-built kernel where I've aligned all the
>> stacks.
>
> FWIW, I've pushed out a somewhat cleaned-up (and slightly broken!)
> version of said kernel source to my arm64/vmap-stack-align branch [1].
> That's still missing the backtrace handling, IRQ stack alignment is
> broken at least on 64K pages, and there's still more cleanup and rework
> to do.
>
I have spent some time addressing the issues mentioned in the commit
log. Please take a look.
git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git vmap-arm64-mark
On 07/15/2017 05:03 PM, Ard Biesheuvel wrote:
> On 14 July 2017 at 22:27, Mark Rutland <[email protected]> wrote:
>> On Fri, Jul 14, 2017 at 03:06:06PM +0100, Mark Rutland wrote:
>>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>>>> On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
>>>>> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>>>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>>
>>>>>>> OK, so here's a crazy idea: what if we
>>>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>>>> size, and map the stack inside it, leaving the remaining space
>>>>>>> unmapped
>>>
>>>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>>>> GPR.
>>>>>>
>>>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>>>
>>>>>> However, I now realise I was being thick, since we can stash the GPR
>>>>>> in the SP:
>>>>>>
>>>>>> sub sp, sp, x0 // sp = orig_sp - x0
>>>>>> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>>>
>>> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>>>
>>>>>> sub x0, x0, #S_FRAME_SIZE
>>>>>> tb(nz) x0, #THREAD_SHIFT, overflow
>>>>>> add x0, x0, #S_FRAME_SIZE
>>>>>> sub x0, sp, x0
>>>>
>>>> You need a neg x0, x0 here I think
>>>
>>> Oh, whoops. I'd mis-simplified things.
>>>
>>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>>
>>> add sp, sp, x0 // sp = orig_sp + orig_x0
>>> sub x0, sp, x0 // x0 = orig_sp
>>> < check >
>>> sub x0, sp, x0 // x0 = orig_x0
>>> sub sp, sp, x0 // sp = orig_sp
>>>
>>> ... which works in a locally-built kernel where I've aligned all the
>>> stacks.
>>
>> FWIW, I've pushed out a somewhat cleaned-up (and slightly broken!)
>> version of said kernel source to my arm64/vmap-stack-align branch [1].
>> That's still missing the backtrace handling, IRQ stack alignment is
>> broken at least on 64K pages, and there's still more cleanup and rework
>> to do.
>>
>
> I have spent some time addressing the issues mentioned in the commit
> log. Please take a look.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git vmap-arm64-mark
>
I used vmap-arm64-mark to compile kernels for a few days. It seemed to
work well enough.
Thanks,
Laura
On 18 July 2017 at 22:53, Laura Abbott <[email protected]> wrote:
> On 07/15/2017 05:03 PM, Ard Biesheuvel wrote:
>> On 14 July 2017 at 22:27, Mark Rutland <[email protected]> wrote:
>>> On Fri, Jul 14, 2017 at 03:06:06PM +0100, Mark Rutland wrote:
>>>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>>>>> On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
>>>>>> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>>>>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>>>
>>>>>>>> OK, so here's a crazy idea: what if we
>>>>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>>>>> size, and map the stack inside it, leaving the remaining space
>>>>>>>> unmapped
>>>>
>>>>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>>>>> GPR.
>>>>>>>
>>>>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>>>>
>>>>>>> However, I now realise I was being thick, since we can stash the GPR
>>>>>>> in the SP:
>>>>>>>
>>>>>>> sub sp, sp, x0 // sp = orig_sp - x0
>>>>>>> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>>>>
>>>> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>>>>
>>>>>>> sub x0, x0, #S_FRAME_SIZE
>>>>>>> tb(nz) x0, #THREAD_SHIFT, overflow
>>>>>>> add x0, x0, #S_FRAME_SIZE
>>>>>>> sub x0, sp, x0
>>>>>
>>>>> You need a neg x0, x0 here I think
>>>>
>>>> Oh, whoops. I'd mis-simplified things.
>>>>
>>>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>>>
>>>> add sp, sp, x0 // sp = orig_sp + orig_x0
>>>> sub x0, sp, x0 // x0 = orig_sp
>>>> < check >
>>>> sub x0, sp, x0 // x0 = orig_x0
>>>> sub sp, sp, x0 // sp = orig_sp
>>>>
>>>> ... which works in a locally-built kernel where I've aligned all the
>>>> stacks.
>>>
>>> FWIW, I've pushed out a somewhat cleaned-up (and slightly broken!)
>>> version of said kernel source to my arm64/vmap-stack-align branch [1].
>>> That's still missing the backtrace handling, IRQ stack alignment is
>>> broken at least on 64K pages, and there's still more cleanup and rework
>>> to do.
>>>
>>
>> I have spent some time addressing the issues mentioned in the commit
>> log. Please take a look.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git vmap-arm64-mark
>>
>
> I used vmap-arm64-mark to compile kernels for a few days. It seemed to
> work well enough.
>
Thanks for giving this a spin. Any comments on the performance impact?
(if you happened to notice any)
On 07/19/2017 01:08 AM, Ard Biesheuvel wrote:
> On 18 July 2017 at 22:53, Laura Abbott <[email protected]> wrote:
>> On 07/15/2017 05:03 PM, Ard Biesheuvel wrote:
>>> On 14 July 2017 at 22:27, Mark Rutland <[email protected]> wrote:
>>>> On Fri, Jul 14, 2017 at 03:06:06PM +0100, Mark Rutland wrote:
>>>>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>>>>>> On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
>>>>>>> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>>>>>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>>>>
>>>>>>>>> OK, so here's a crazy idea: what if we
>>>>>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>>>>>> size, and map the stack inside it, leaving the remaining space
>>>>>>>>> unmapped
>>>>>
>>>>>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>>>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>>>>>> GPR.
>>>>>>>>
>>>>>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>>>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>>>>>
>>>>>>>> However, I now realise I was being thick, since we can stash the GPR
>>>>>>>> in the SP:
>>>>>>>>
>>>>>>>> sub sp, sp, x0 // sp = orig_sp - x0
>>>>>>>> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>>>>>
>>>>> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>>>>>
>>>>>>>> sub x0, x0, #S_FRAME_SIZE
>>>>>>>> tb(nz) x0, #THREAD_SHIFT, overflow
>>>>>>>> add x0, x0, #S_FRAME_SIZE
>>>>>>>> sub x0, sp, x0
>>>>>>
>>>>>> You need a neg x0, x0 here I think
>>>>>
>>>>> Oh, whoops. I'd mis-simplified things.
>>>>>
>>>>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>>>>
>>>>> add sp, sp, x0 // sp = orig_sp + orig_x0
>>>>> sub x0, sp, x0 // x0 = orig_sp
>>>>> < check >
>>>>> sub x0, sp, x0 // x0 = orig_x0
>>>>> sub sp, sp, x0 // sp = orig_sp
>>>>>
>>>>> ... which works in a locally-built kernel where I've aligned all the
>>>>> stacks.
>>>>
>>>> FWIW, I've pushed out a somewhat cleaned-up (and slightly broken!)
>>>> version of said kernel source to my arm64/vmap-stack-align branch [1].
>>>> That's still missing the backtrace handling, IRQ stack alignment is
>>>> broken at least on 64K pages, and there's still more cleanup and rework
>>>> to do.
>>>>
>>>
>>> I have spent some time addressing the issues mentioned in the commit
>>> log. Please take a look.
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git vmap-arm64-mark
>>>
>>
>> I used vmap-arm64-mark to compile kernels for a few days. It seemed to
>> work well enough.
>>
>
> Thanks for giving this a spin. Any comments on the performance impact?
> (if you happened to notice any)
>
I didn't notice any performance impact but I also wasn't trying that
hard. I did try this with a different configuration and ran into
stack space errors almost immediately:
[ 0.358026] smp: Brought up 1 node, 8 CPUs
[ 0.359359] SMP: Total of 8 processors activated.
[ 0.359542] CPU features: detected feature: 32-bit EL0 Support
[ 0.361781] Insufficient stack space to handle exception!
[ 0.362075] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.12.0-00018-ge9cf49d604ef-dirty #23
[ 0.362538] Hardware name: linux,dummy-virt (DT)
[ 0.362844] task: ffffffc03a8a3200 task.stack: ffffff8008e80000
[ 0.363389] PC is at __do_softirq+0x88/0x210
[ 0.363585] LR is at __do_softirq+0x78/0x210
[ 0.363859] pc : [<ffffff80080bfba8>] lr : [<ffffff80080bfb98>] pstate: 80000145
[ 0.364109] sp : ffffffc03bf65ea0
[ 0.364253] x29: ffffffc03bf66830 x28: 0000000000000002
[ 0.364547] x27: ffffff8008e83e20 x26: 00000000fffedb5a
[ 0.364777] x25: 0000000000000001 x24: 0000000000000000
[ 0.365017] x23: ffffff8008dc5900 x22: ffffff8008c37000
[ 0.365242] x21: 0000000000000003 x20: 0000000000000000
[ 0.365557] x19: ffffff8008d02000 x18: 0000000000040000
[ 0.365991] x17: 0000000000000000 x16: 0000000000000008
[ 0.366148] x15: ffffffc03a400228 x14: 0000000000000000
[ 0.366296] x13: ffffff8008a50b98 x12: ffffffc03a916480
[ 0.366442] x11: ffffff8008a50ba0 x10: 0000000000000008
[ 0.366624] x9 : 0000000000000004 x8 : ffffffc03bf6f630
[ 0.366779] x7 : 0000000000000020 x6 : 00000000fffedb5a
[ 0.366924] x5 : 00000000ffffffff x4 : 000000403326a000
[ 0.367071] x3 : 0000000000000101 x2 : ffffff8008ce8000
[ 0.367218] x1 : ffffff8008dc5900 x0 : 0000000000000200
[ 0.367382] Task stack: [0xffffff8008e80000..0xffffff8008e84000]
[ 0.367519] IRQ stack: [0xffffffc03bf62000..0xffffffc03bf66000]
[ 0.367687] ESR: 0x00000000 -- Unknown/Uncategorized
[ 0.367868] FAR: 0x0000000000000000
[ 0.368059] Kernel panic - not syncing: kernel stack overflow
[ 0.368252] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.12.0-00018-ge9cf49d604ef-dirty #23
[ 0.368427] Hardware name: linux,dummy-virt (DT)
[ 0.368612] Call trace:
[ 0.368774] [<ffffff8008087fd8>] dump_backtrace+0x0/0x228
[ 0.368979] [<ffffff80080882c8>] show_stack+0x10/0x20
[ 0.369270] [<ffffff80084602dc>] dump_stack+0x88/0xac
[ 0.369459] [<ffffff800816328c>] panic+0x120/0x278
[ 0.369582] [<ffffff8008088b40>] handle_bad_stack+0xd0/0xd8
[ 0.369799] [<ffffff80080bfb94>] __do_softirq+0x74/0x210
[ 0.370560] SMP: stopping secondary CPUs
[ 0.384269] Rebooting in 5 seconds..
The config is based on what I use for booting my HiKey Android
board. I haven't been able to narrow down exactly which
set of configs sets this off.
Thanks,
Laura
On 20 July 2017 at 00:32, Laura Abbott <[email protected]> wrote:
> On 07/19/2017 01:08 AM, Ard Biesheuvel wrote:
>> On 18 July 2017 at 22:53, Laura Abbott <[email protected]> wrote:
>>> On 07/15/2017 05:03 PM, Ard Biesheuvel wrote:
>>>> On 14 July 2017 at 22:27, Mark Rutland <[email protected]> wrote:
>>>>> On Fri, Jul 14, 2017 at 03:06:06PM +0100, Mark Rutland wrote:
>>>>>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>>>>>>> On 14 July 2017 at 11:48, Ard Biesheuvel <[email protected]> wrote:
>>>>>>>> On 14 July 2017 at 11:32, Mark Rutland <[email protected]> wrote:
>>>>>>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>>>>>
>>>>>>>>>> OK, so here's a crazy idea: what if we
>>>>>>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>>>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>>>>>>> size, and map the stack inside it, leaving the remaining space
>>>>>>>>>> unmapped
>>>>>>
>>>>>>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>>>>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>>>>>>> GPR.
>>>>>>>>>
>>>>>>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>>>>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>>>>>>
>>>>>>>>> However, I now realise I was being thick, since we can stash the GPR
>>>>>>>>> in the SP:
>>>>>>>>>
>>>>>>>>> sub sp, sp, x0 // sp = orig_sp - x0
>>>>>>>>> add x0, sp, x0 // x0 = x0 - (orig_sp - x0) == orig_sp
>>>>>>
>>>>>> That comment is off, and should say x0 = x0 + (orig_sp - x0) == orig_sp
>>>>>>
>>>>>>>>> sub x0, x0, #S_FRAME_SIZE
>>>>>>>>> tb(nz) x0, #THREAD_SHIFT, overflow
>>>>>>>>> add x0, x0, #S_FRAME_SIZE
>>>>>>>>> sub x0, sp, x0
>>>>>>>
>>>>>>> You need a neg x0, x0 here I think
>>>>>>
>>>>>> Oh, whoops. I'd mis-simplified things.
>>>>>>
>>>>>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>>>>>
>>>>>> add sp, sp, x0 // sp = orig_sp + orig_x0
>>>>>> sub x0, sp, x0 // x0 = orig_sp
>>>>>> < check >
>>>>>> sub x0, sp, x0 // x0 = orig_x0
>>>>>> sub sp, sp, x0 // sp = orig_sp
>>>>>>
>>>>>> ... which works in a locally-built kernel where I've aligned all the
>>>>>> stacks.
>>>>>
>>>>> FWIW, I've pushed out a somewhat cleaned-up (and slightly broken!)
>>>>> version of said kernel source to my arm64/vmap-stack-align branch [1].
>>>>> That's still missing the backtrace handling, IRQ stack alignment is
>>>>> broken at least on 64K pages, and there's still more cleanup and rework
>>>>> to do.
>>>>>
>>>>
>>>> I have spent some time addressing the issues mentioned in the commit
>>>> log. Please take a look.
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git vmap-arm64-mark
>>>>
>>>
>>> I used vmap-arm64-mark to compile kernels for a few days. It seemed to
>>> work well enough.
>>>
>>
>> Thanks for giving this a spin. Any comments on the performance impact?
>> (if you happened to notice any)
>>
>
> I didn't notice any performance impact but I also wasn't trying that
> hard. I did try this with a different configuration and ran into
> stackspace errors almost immediately:
>
> [ 0.358026] smp: Brought up 1 node, 8 CPUs
> [ 0.359359] SMP: Total of 8 processors activated.
> [ 0.359542] CPU features: detected feature: 32-bit EL0 Support
> [ 0.361781] Insufficient stack space to handle exception!
> [ 0.362075] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.12.0-00018-ge9cf49d604ef-dirty #23
> [ 0.362538] Hardware name: linux,dummy-virt (DT)
> [ 0.362844] task: ffffffc03a8a3200 task.stack: ffffff8008e80000
> [ 0.363389] PC is at __do_softirq+0x88/0x210
> [ 0.363585] LR is at __do_softirq+0x78/0x210
> [ 0.363859] pc : [<ffffff80080bfba8>] lr : [<ffffff80080bfb98>] pstate: 80000145
> [ 0.364109] sp : ffffffc03bf65ea0
> [ 0.364253] x29: ffffffc03bf66830 x28: 0000000000000002
> [ 0.364547] x27: ffffff8008e83e20 x26: 00000000fffedb5a
> [ 0.364777] x25: 0000000000000001 x24: 0000000000000000
> [ 0.365017] x23: ffffff8008dc5900 x22: ffffff8008c37000
> [ 0.365242] x21: 0000000000000003 x20: 0000000000000000
> [ 0.365557] x19: ffffff8008d02000 x18: 0000000000040000
> [ 0.365991] x17: 0000000000000000 x16: 0000000000000008
> [ 0.366148] x15: ffffffc03a400228 x14: 0000000000000000
> [ 0.366296] x13: ffffff8008a50b98 x12: ffffffc03a916480
> [ 0.366442] x11: ffffff8008a50ba0 x10: 0000000000000008
> [ 0.366624] x9 : 0000000000000004 x8 : ffffffc03bf6f630
> [ 0.366779] x7 : 0000000000000020 x6 : 00000000fffedb5a
> [ 0.366924] x5 : 00000000ffffffff x4 : 000000403326a000
> [ 0.367071] x3 : 0000000000000101 x2 : ffffff8008ce8000
> [ 0.367218] x1 : ffffff8008dc5900 x0 : 0000000000000200
> [ 0.367382] Task stack: [0xffffff8008e80000..0xffffff8008e84000]
> [ 0.367519] IRQ stack: [0xffffffc03bf62000..0xffffffc03bf66000]
The IRQ stack is not 16K aligned ...
> [ 0.367687] ESR: 0x00000000 -- Unknown/Uncategorized
> [ 0.367868] FAR: 0x0000000000000000
> [ 0.368059] Kernel panic - not syncing: kernel stack overflow
> [ 0.368252] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.12.0-00018-ge9cf49d604ef-dirty #23
> [ 0.368427] Hardware name: linux,dummy-virt (DT)
> [ 0.368612] Call trace:
> [ 0.368774] [<ffffff8008087fd8>] dump_backtrace+0x0/0x228
> [ 0.368979] [<ffffff80080882c8>] show_stack+0x10/0x20
> [ 0.369270] [<ffffff80084602dc>] dump_stack+0x88/0xac
> [ 0.369459] [<ffffff800816328c>] panic+0x120/0x278
> [ 0.369582] [<ffffff8008088b40>] handle_bad_stack+0xd0/0xd8
> [ 0.369799] [<ffffff80080bfb94>] __do_softirq+0x74/0x210
> [ 0.370560] SMP: stopping secondary CPUs
> [ 0.384269] Rebooting in 5 seconds..
>
> The config is based on what I use for booting my Hikey android
> board. I haven't been able to narrow down exactly which
> set of configs set this off.
>
... so for some reason, the percpu atom size change fails to take effect here.
Hi Ard,
On 20/07/17 06:35, Ard Biesheuvel wrote:
> On 20 July 2017 at 00:32, Laura Abbott <[email protected]> wrote:
>> I didn't notice any performance impact but I also wasn't trying that
>> hard. I did try this with a different configuration and ran into
>> stackspace errors almost immediately:
>>
>> [ 0.358026] smp: Brought up 1 node, 8 CPUs
>> [ 0.359359] SMP: Total of 8 processors activated.
>> [ 0.359542] CPU features: detected feature: 32-bit EL0 Support
>> [ 0.361781] Insufficient stack space to handle exception!
[...]
>> [ 0.367382] Task stack: [0xffffff8008e80000..0xffffff8008e84000]
>> [ 0.367519] IRQ stack: [0xffffffc03bf62000..0xffffffc03bf66000]
>
> The IRQ stack is not 16K aligned ...
>> [ 0.367687] ESR: 0x00000000 -- Unknown/Uncategorized
>> [ 0.367868] FAR: 0x0000000000000000
>> [ 0.368059] Kernel panic - not syncing: kernel stack overflow
>> [ 0.368252] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.12.0-00018-ge9cf49d604ef-dirty #23
>> [ 0.368427] Hardware name: linux,dummy-virt (DT)
>> [ 0.368612] Call trace:
>> [ 0.368774] [<ffffff8008087fd8>] dump_backtrace+0x0/0x228
>> [ 0.368979] [<ffffff80080882c8>] show_stack+0x10/0x20
>> [ 0.369270] [<ffffff80084602dc>] dump_stack+0x88/0xac
>> [ 0.369459] [<ffffff800816328c>] panic+0x120/0x278
>> [ 0.369582] [<ffffff8008088b40>] handle_bad_stack+0xd0/0xd8
>> [ 0.369799] [<ffffff80080bfb94>] __do_softirq+0x74/0x210
>> [ 0.370560] SMP: stopping secondary CPUs
>> [ 0.384269] Rebooting in 5 seconds..
>>
>> The config is based on what I use for booting my Hikey android
>> board. I haven't been able to narrow down exactly which
>> set of configs set this off.
>>
>
> ... so for some reason, the percpu atom size change fails to take effect here.
I'm not completely up to speed with these series, so this may be noise:
When we added the IRQ stack Jungseok Lee discovered that alignment greater than
PAGE_SIZE only applies to CPU0. Secondary CPUs read the per-cpu init data into a
page-aligned area, but any greater alignment requirement is lost.
Because of this, the IRQ stack was only 16-byte aligned, and struct thread_info
had to be discovered without depending on stack alignment.
Thanks,
James
On 20 July 2017 at 09:36, James Morse <[email protected]> wrote:
> Hi Ard,
>
> On 20/07/17 06:35, Ard Biesheuvel wrote:
>> On 20 July 2017 at 00:32, Laura Abbott <[email protected]> wrote:
>>> I didn't notice any performance impact but I also wasn't trying that
>>> hard. I did try this with a different configuration and ran into
>>> stackspace errors almost immediately:
>>>
>>> [ 0.358026] smp: Brought up 1 node, 8 CPUs
>>> [ 0.359359] SMP: Total of 8 processors activated.
>>> [ 0.359542] CPU features: detected feature: 32-bit EL0 Support
>>> [ 0.361781] Insufficient stack space to handle exception!
>
> [...]
>
>>> [ 0.367382] Task stack: [0xffffff8008e80000..0xffffff8008e84000]
>>> [ 0.367519] IRQ stack: [0xffffffc03bf62000..0xffffffc03bf66000]
>>
>> The IRQ stack is not 16K aligned ...
>
>>> [ 0.367687] ESR: 0x00000000 -- Unknown/Uncategorized
>>> [ 0.367868] FAR: 0x0000000000000000
>>> [ 0.368059] Kernel panic - not syncing: kernel stack overflow
>>> [ 0.368252] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.12.0-00018-ge9cf49d604ef-dirty #23
>>> [ 0.368427] Hardware name: linux,dummy-virt (DT)
>>> [ 0.368612] Call trace:
>>> [ 0.368774] [<ffffff8008087fd8>] dump_backtrace+0x0/0x228
>>> [ 0.368979] [<ffffff80080882c8>] show_stack+0x10/0x20
>>> [ 0.369270] [<ffffff80084602dc>] dump_stack+0x88/0xac
>>> [ 0.369459] [<ffffff800816328c>] panic+0x120/0x278
>>> [ 0.369582] [<ffffff8008088b40>] handle_bad_stack+0xd0/0xd8
>>> [ 0.369799] [<ffffff80080bfb94>] __do_softirq+0x74/0x210
>>> [ 0.370560] SMP: stopping secondary CPUs
>>> [ 0.384269] Rebooting in 5 seconds..
>>>
>>> The config is based on what I use for booting my Hikey android
>>> board. I haven't been able to narrow down exactly which
>>> set of configs set this off.
>>>
>>
>> ... so for some reason, the percpu atom size change fails to take effect here.
>
> I'm not completely up to speed with these series, so this may be noise:
>
> When we added the IRQ stack Jungseok Lee discovered that alignment greater than
> PAGE_SIZE only applies to CPU0. Secondary CPUs read the per-cpu init data into a
> page-aligned area, but any greater alignment requirement is lost.
>
> Because of this, the IRQ stack was only 16-byte aligned, and struct thread_info
> had to be discovered without depending on stack alignment.
>
We [attempted to] address that by increasing the per-CPU atom size to
THREAD_ALIGN if CONFIG_VMAP_STACK=y, but as I am typing this, I wonder
if that percolates all the way down to the actual vmap() calls. I will
investigate ...
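For context, the change being described is roughly the sketch below. It is
illustrative only: it assumes arm64's NUMA setup_per_cpu_areas() (with its
existing pcpu_cpu_distance/pcpu_fc_alloc/pcpu_fc_free helpers) simply passes a
larger atom size to pcpu_embed_first_chunk() when CONFIG_VMAP_STACK=y. As the
follow-ups show, this alone turns out not to be enough:

	void __init setup_per_cpu_areas(void)
	{
		unsigned long delta;
		unsigned int cpu;
		size_t atom_size;
		int rc;

		/*
		 * Sketch: use THREAD_ALIGN-sized per-CPU units when kernel
		 * stacks live in the vmalloc area, so the static IRQ stacks
		 * in the first chunk keep their alignment on all CPUs.
		 */
		atom_size = IS_ENABLED(CONFIG_VMAP_STACK) ? THREAD_ALIGN
							  : PAGE_SIZE;

		rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
					    PERCPU_DYNAMIC_RESERVE, atom_size,
					    pcpu_cpu_distance,
					    pcpu_fc_alloc, pcpu_fc_free);
		if (rc < 0)
			panic("Failed to initialize percpu areas.");

		delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
		for_each_possible_cpu(cpu)
			__per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
	}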
On 20 July 2017 at 09:56, Ard Biesheuvel <[email protected]> wrote:
> On 20 July 2017 at 09:36, James Morse <[email protected]> wrote:
>> Hi Ard,
>>
>> On 20/07/17 06:35, Ard Biesheuvel wrote:
>>> On 20 July 2017 at 00:32, Laura Abbott <[email protected]> wrote:
>>>> I didn't notice any performance impact but I also wasn't trying that
>>>> hard. I did try this with a different configuration and ran into
>>>> stackspace errors almost immediately:
>>>>
>>>> [ 0.358026] smp: Brought up 1 node, 8 CPUs
>>>> [ 0.359359] SMP: Total of 8 processors activated.
>>>> [ 0.359542] CPU features: detected feature: 32-bit EL0 Support
>>>> [ 0.361781] Insufficient stack space to handle exception!
>>
>> [...]
>>
>>>> [ 0.367382] Task stack: [0xffffff8008e80000..0xffffff8008e84000]
>>>> [ 0.367519] IRQ stack: [0xffffffc03bf62000..0xffffffc03bf66000]
>>>
>>> The IRQ stack is not 16K aligned ...
>>
>>>> [...]
>>>>
>>>> The config is based on what I use for booting my Hikey android
>>>> board. I haven't been able to narrow down exactly which
>>>> set of configs set this off.
>>>>
>>>
>>> ... so for some reason, the percpu atom size change fails to take effect here.
>>
>> I'm not completely up to speed with these series, so this may be noise:
>>
>> When we added the IRQ stack Jungseok Lee discovered that alignment greater than
>> PAGE_SIZE only applies to CPU0. Secondary CPUs read the per-cpu init data into a
>> page-aligned area, but any greater alignment requirement is lost.
>>
>> Because of this, the IRQ stack was only 16-byte aligned, and struct thread_info
>> had to be discovered without depending on stack alignment.
>>
>
> We [attempted to] address that by increasing the per-CPU atom size to
> THREAD_ALIGN if CONFIG_VMAP_STACK=y, but as I am typing this, I wonder
> if that percolates all the way down to the actual vmap() calls. I will
> investigate ...
The issue is easily reproducible in QEMU as well, when building from
the same config. I tracked it down to CONFIG_NUMA=y, which sets
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y, affecting the placement of
the static per-CPU data (including the IRQ stack).
However, what I hadn't realised is that the first chunk is referenced
via the linear mapping, so we will need to [vm]allocate the per-CPU
IRQ stacks explicitly, and record the address in a per-CPU pointer
variable instead.
I have updated my branch accordingly.
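Concretely, that points at something like the sketch below. The names
(irq_stack_ptr, IRQ_STACK_SIZE, THREAD_ALIGN) are illustrative, the function
would need to run before any IRQs can be taken (e.g. from init_IRQ()), and the
entry code and stack walker would also have to be updated to use the new
pointer:

	/* Per-CPU pointer to a vmalloc'ed IRQ stack, replacing the static
	 * per-CPU irq_stack array that lives in the linear map. */
	DEFINE_PER_CPU(unsigned long *, irq_stack_ptr);

	static void __init init_irq_stacks(void)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			unsigned long *p;

			/*
			 * Allocate each IRQ stack from the vmalloc area with
			 * the same alignment as task stacks, so the SP-based
			 * overflow check works and stray accesses below the
			 * stack are likely to hit unmapped vmalloc space.
			 */
			p = __vmalloc_node_range(IRQ_STACK_SIZE, THREAD_ALIGN,
						 VMALLOC_START, VMALLOC_END,
						 GFP_KERNEL, PAGE_KERNEL, 0,
						 cpu_to_node(cpu),
						 __builtin_return_address(0));
			if (!p)
				panic("Failed to allocate IRQ stack for CPU%d\n",
				      cpu);

			per_cpu(irq_stack_ptr, cpu) = p;
		}
	}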
On 07/20/2017 10:30 AM, Ard Biesheuvel wrote:
> On 20 July 2017 at 09:56, Ard Biesheuvel <[email protected]> wrote:
>> On 20 July 2017 at 09:36, James Morse <[email protected]> wrote:
>>> Hi Ard,
>>>
>>> On 20/07/17 06:35, Ard Biesheuvel wrote:
>>>> On 20 July 2017 at 00:32, Laura Abbott <[email protected]> wrote:
>>>>> I didn't notice any performance impact but I also wasn't trying that
>>>>> hard. I did try this with a different configuration and ran into
>>>>> stackspace errors almost immediately:
>>>>>
>>>>> [ 0.358026] smp: Brought up 1 node, 8 CPUs
>>>>> [ 0.359359] SMP: Total of 8 processors activated.
>>>>> [ 0.359542] CPU features: detected feature: 32-bit EL0 Support
>>>>> [ 0.361781] Insufficient stack space to handle exception!
>>>
>>> [...]
>>>
>>>>> [ 0.367382] Task stack: [0xffffff8008e80000..0xffffff8008e84000]
>>>>> [ 0.367519] IRQ stack: [0xffffffc03bf62000..0xffffffc03bf66000]
>>>>
>>>> The IRQ stack is not 16K aligned ...
>>>
>>>>> [...]
>>>>>
>>>>> The config is based on what I use for booting my Hikey android
>>>>> board. I haven't been able to narrow down exactly which
>>>>> set of configs set this off.
>>>>>
>>>>
>>>> ... so for some reason, the percpu atom size change fails to take effect here.
>>>
>>> I'm not completely up to speed with these series, so this may be noise:
>>>
>>> When we added the IRQ stack Jungseok Lee discovered that alignment greater than
>>> PAGE_SIZE only applies to CPU0. Secondary CPUs read the per-cpu init data into a
>>> page-aligned area, but any greater alignment requirement is lost.
>>>
>>> Because of this, the IRQ stack was only 16-byte aligned, and struct thread_info
>>> had to be discovered without depending on stack alignment.
>>>
>>
>> We [attempted to] address that by increasing the per-CPU atom size to
>> THREAD_ALIGN if CONFIG_VMAP_STACK=y, but as I am typing this, I wonder
>> if that percolates all the way down to the actual vmap() calls. I will
>> investigate ...
>
> The issue is easily reproducible in QEMU as well, when building from
> the same config. I tracked it down to CONFIG_NUMA=y, which sets
> CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y, affecting the placement of
> the static per-CPU data (including the IRQ stack).
>
> However, what I hadn't realised is that the first chunk is referenced
> via the linear mapping, so we will need to [vm]allocate the per-CPU
> IRQ stacks explicitly, and record the address in a per-CPU pointer
> variable instead.
>
> I have updated my branch accordingly.
>
Yep, this version works, both in QEMU and when booting Android.
Thanks,
Laura