2014-04-14 19:03:08

by Daniel Borkmann

Subject: [PATCH] seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF

Linus reports that on 32-bit x86 Chromium throws the following seccomp
resp. audit log messages:

audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
syscall=172 compat=0 ip=0xb2dd9852 code=0x30000

audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
compat=0 ip=0xb2dd9852 code=0x50000

These audit messages are being triggered via audit_seccomp() through
__secure_computing() in seccomp mode (BPF) filter with seccomp return
codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
during filter runtime. Moreover, Linus reports that x86_64 Chromium
seems fine.

The underlying issue that explains this is that the implementation of
populate_seccomp_data() is wrong. Our seccomp data structure sd that
is being shared with user ABI is:

struct seccomp_data {
	int nr;
	__u32 arch;
	__u64 instruction_pointer;
	__u64 args[6];
};

Therefore, a simple cast to 'unsigned long *' for storing the value of
the syscall argument via syscall_get_arguments() is just wrong as on
32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
at wrong offsets in args[] member, and thus i) could leak stack memory
to user space and ii) tampers with the logic of seccomp BPF programs
that read out and check for syscall arguments:

syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);

Tested on 32-bit x86 with Google Chrome, unfortunately only via remote
test machine through slow ssh X forwarding, but it fixes the issue on
my side. So fix it up by storing args in type-correct variables; gcc
is clever and optimizes the copy away in other cases, e.g. x86_64.

Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Reported-and-bisected-by: Linus Torvalds <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: James Morris <[email protected]>
Cc: Kees Cook <[email protected]>
---
Dave, do you want to pick this up?

kernel/seccomp.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index d8d046c..590c379 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -69,18 +69,17 @@ static void populate_seccomp_data(struct seccomp_data *sd)
 {
 	struct task_struct *task = current;
 	struct pt_regs *regs = task_pt_regs(task);
+	unsigned long args[6];
 
 	sd->nr = syscall_get_nr(task, regs);
 	sd->arch = syscall_get_arch();
-
-	/* Unroll syscall_get_args to help gcc on arm. */
-	syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
-	syscall_get_arguments(task, regs, 1, 1, (unsigned long *) &sd->args[1]);
-	syscall_get_arguments(task, regs, 2, 1, (unsigned long *) &sd->args[2]);
-	syscall_get_arguments(task, regs, 3, 1, (unsigned long *) &sd->args[3]);
-	syscall_get_arguments(task, regs, 4, 1, (unsigned long *) &sd->args[4]);
-	syscall_get_arguments(task, regs, 5, 1, (unsigned long *) &sd->args[5]);
-
+	syscall_get_arguments(task, regs, 0, 6, args);
+	sd->args[0] = args[0];
+	sd->args[1] = args[1];
+	sd->args[2] = args[2];
+	sd->args[3] = args[3];
+	sd->args[4] = args[4];
+	sd->args[5] = args[5];
 	sd->instruction_pointer = KSTK_EIP(task);
 }

--
1.7.11.7
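
For readers following along outside the kernel tree, the shape of the fix can be sketched in userspace. This is a minimal model, not the kernel code: `seccomp_data_model`, `fake_get_arguments()` and `populate_args_model()` are hypothetical names, and the "registers" are just an array of `unsigned long`. What it tries to show is that fetching into a correctly typed `unsigned long` array and then assigning into the 64-bit slots lets the compiler widen each value, whatever the native word size is.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the struct quoted in the changelog (hypothetical name). */
struct seccomp_data_model {
	int32_t  nr;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t args[6];
};

/*
 * Hypothetical stand-in for syscall_get_arguments(): copies n native
 * words, starting at argument index i, from a fake register array.
 */
static void fake_get_arguments(const unsigned long *regs, unsigned int i,
			       unsigned int n, unsigned long *out)
{
	assert(i + n <= 6);
	for (unsigned int k = 0; k < n; k++)
		out[k] = regs[i + k];
}

/*
 * Same pattern as the patch: fetch into unsigned long, then assign.
 * Each assignment zero-extends to 64 bits when unsigned long is narrower.
 */
void populate_args_model(struct seccomp_data_model *sd,
			 const unsigned long *regs)
{
	unsigned long args[6];

	fake_get_arguments(regs, 0, 6, args);
	for (unsigned int k = 0; k < 6; k++)
		sd->args[k] = args[k];
}
```

Calling `populate_args_model()` on a struct whose `args[]` was previously filled with junk leaves every slot equal to the corresponding register value, with no stale bytes surviving from before the call.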


2014-04-14 20:13:52

by Andy Lutomirski

Subject: Re: [PATCH] seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF

On 04/14/2014 12:02 PM, Daniel Borkmann wrote:
> Linus reports that on 32-bit x86 Chromium throws the following seccomp
> resp. audit log messages:
>
> audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
> gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
> pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
> syscall=172 compat=0 ip=0xb2dd9852 code=0x30000
>
> audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
> gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
> pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
> compat=0 ip=0xb2dd9852 code=0x50000
>
> These audit messages are being triggered via audit_seccomp() through
> __secure_computing() in seccomp mode (BPF) filter with seccomp return
> codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
> during filter runtime. Moreover, Linus reports that x86_64 Chromium
> seems fine.
>
> The underlying issue that explains this is that the implementation of
> populate_seccomp_data() is wrong. Our seccomp data structure sd that
> is being shared with user ABI is:
>
> struct seccomp_data {
> 	int nr;
> 	__u32 arch;
> 	__u64 instruction_pointer;
> 	__u64 args[6];
> };
>
> Therefore, a simple cast to 'unsigned long *' for storing the value of
> the syscall argument via syscall_get_arguments() is just wrong as on
> 32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
> at wrong offsets in args[] member, and thus i) could leak stack memory
> to user space and ii) tampers with the logic of seccomp BPF programs
> that read out and check for syscall arguments:
>
> syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);

I think this description is wrong. (unsigned long *) &sd->args[1] is
the right location, at least on 32-bit little-endian architectures.
((unsigned long *) &sd->args)[1] would be wrong, as I think you've
described, but that's not what the code does.

I think the real problem is that 32-bit BE is hosed, and on 32-bit LE,
the high bits aren't getting cleared.

I would make this change conditional on BITS_PER_LONG != 64, since this
probably severely pessimizes architectures like ia-64.

>
> Tested on 32-bit x86 with Google Chrome, unfortunately only via remote
> test machine through slow ssh X forwarding, but it fixes the issue on
> my side. So fix it up by storing args in type-correct variables; gcc
> is clever and optimizes the copy away in other cases, e.g. x86_64.
>
> Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
> Reported-and-bisected-by: Linus Torvalds <[email protected]>
> Signed-off-by: Daniel Borkmann <[email protected]>
> Signed-off-by: Alexei Starovoitov <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Eric Paris <[email protected]>
> Cc: James Morris <[email protected]>
> Cc: Kees Cook <[email protected]>
> ---
> Dave, do you want to pick this up?
>
> kernel/seccomp.c | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index d8d046c..590c379 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -69,18 +69,17 @@ static void populate_seccomp_data(struct seccomp_data *sd)
>  {
>  	struct task_struct *task = current;
>  	struct pt_regs *regs = task_pt_regs(task);
> +	unsigned long args[6];
>
>  	sd->nr = syscall_get_nr(task, regs);
>  	sd->arch = syscall_get_arch();
> -
> -	/* Unroll syscall_get_args to help gcc on arm. */
> -	syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
> -	syscall_get_arguments(task, regs, 1, 1, (unsigned long *) &sd->args[1]);
> -	syscall_get_arguments(task, regs, 2, 1, (unsigned long *) &sd->args[2]);
> -	syscall_get_arguments(task, regs, 3, 1, (unsigned long *) &sd->args[3]);
> -	syscall_get_arguments(task, regs, 4, 1, (unsigned long *) &sd->args[4]);
> -	syscall_get_arguments(task, regs, 5, 1, (unsigned long *) &sd->args[5]);
> -
> +	syscall_get_arguments(task, regs, 0, 6, args);
> +	sd->args[0] = args[0];
> +	sd->args[1] = args[1];
> +	sd->args[2] = args[2];
> +	sd->args[3] = args[3];
> +	sd->args[4] = args[4];
> +	sd->args[5] = args[5];
>  	sd->instruction_pointer = KSTK_EIP(task);
>  }
>
>
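
The two failure modes named in the reply above (32-bit BE broken outright, 32-bit LE left with uncleared high bits) are easier to see from the filter's side. A seccomp BPF program loads 32-bit words, so each 64-bit `args[]` entry is read as two words at fixed byte offsets. The sketch below computes those offsets for a userspace mirror of the struct. The names `seccomp_data_model`, `arg_lo_offset()` and `arg_hi_offset()` are hypothetical, and the layout assumes the usual packing with no padding before `args[]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the struct quoted in the changelog (hypothetical name). */
struct seccomp_data_model {
	int32_t  nr;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t args[6];
};

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
	uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/* Byte offset of the low 32 bits of args[n], as a filter would load it. */
size_t arg_lo_offset(unsigned int n)
{
	size_t base;

	assert(n < 6);
	base = offsetof(struct seccomp_data_model, args) + n * sizeof(uint64_t);
	return host_is_little_endian() ? base : base + sizeof(uint32_t);
}

/* Byte offset of the high 32 bits of args[n]. */
size_t arg_hi_offset(unsigned int n)
{
	size_t base;

	assert(n < 6);
	base = offsetof(struct seccomp_data_model, args) + n * sizeof(uint64_t);
	return host_is_little_endian() ? base + sizeof(uint32_t) : base;
}
```

A 32-bit store to the start of `args[n]` hits the word at `arg_lo_offset(n)` on little-endian and the word at `arg_hi_offset(n)` on big-endian. In both cases the other word keeps whatever was there before, which is what a filter comparing the full 64-bit value would trip over.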

2014-04-14 20:24:58

by David Miller

Subject: Re: [PATCH] seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF

From: Andy Lutomirski <[email protected]>
Date: Mon, 14 Apr 2014 13:13:45 -0700

> I think this description is wrong. (unsigned long *) &sd->args[1] is
> the right location, at least on 32-bit little-endian architectures.

It absolutely is not.

The thing is a u64, and we must respect that type in a completely
portable way.

Daniel's change is 100% correct, portable, and doesn't have any
ugly ifdef crap.

If you want to optimize this, and potentially break it again, do
it in the next merge window not now.

I'm going to apply Daniel's patch.

2014-04-14 20:29:15

by Andy Lutomirski

Subject: Re: [PATCH] seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF

On Mon, Apr 14, 2014 at 1:24 PM, David Miller <[email protected]> wrote:
> From: Andy Lutomirski <[email protected]>
> Date: Mon, 14 Apr 2014 13:13:45 -0700
>
>> I think this description is wrong. (unsigned long *) &sd->args[1] is
>> the right location, at least on 32-bit little-endian architectures.
>
> It absolutely is not.

Huh? It's a pointer to the right address, but the type is wrong.

The changelog says "on 32-bit x86 (or any other 32bit arch), it would
result in storing a0-a5 at wrong offsets in args[] member". Unless
I'm mistaken, this is incorrect: a0-a5 are at the correct offsets,
but they are stored with the wrong type, so the other bits in there
are garbage.

>
> The thing is a u64, and we must respect that type in a completely
> portable way.
>
> Daniel's change is 100% correct, portable, and doesn't have any
> ugly ifdef crap.
>

I have no problem with the patch itself. I'm suggesting that a better
changelog message would confuse other people reading the same patch
less.

--Andy
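
The distinction drawn above (right address, wrong type) can be reproduced with a small userspace experiment. This is only a sketch under stated assumptions: `ulong32` models `unsigned long` on a 32-bit architecture, `memcpy()` stands in for the store through the casted pointer, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models unsigned long on a 32-bit architecture (assumption of this demo). */
typedef uint32_t ulong32;

/*
 * Mimic the old code: write a 32-bit argument at the address of a 64-bit
 * slot and leave the remaining four bytes untouched.
 */
void store_like_old_code(uint64_t *slot, ulong32 arg)
{
	memcpy(slot, &arg, sizeof(arg));
}

/* Mimic the fix: a typed assignment that zero-extends to 64 bits. */
void store_like_fix(uint64_t *slot, ulong32 arg)
{
	*slot = arg;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
	uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/* Walks through both variants starting from a slot full of stale bytes. */
void demo_stale_high_bits(void)
{
	uint64_t slot = 0xdeadbeefcafef00dULL;

	store_like_old_code(&slot, 0x1234u);
	if (host_is_little_endian())
		assert(slot == 0xdeadbeef00001234ULL); /* stale high half */
	else
		assert(slot == 0x00001234cafef00dULL); /* value in wrong half */

	slot = 0xdeadbeefcafef00dULL;
	store_like_fix(&slot, 0x1234u);
	assert(slot == 0x1234ULL);
}
```

On a little-endian host the old-style store puts the argument in the low half but keeps the stale high half, which matches the "other bits in there are garbage" description. On a big-endian host the argument ends up in the high half instead.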

2014-04-15 06:31:54

by Alexei Starovoitov

Subject: Re: [PATCH] seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF

On Mon, Apr 14, 2014 at 1:28 PM, Andy Lutomirski <[email protected]> wrote:
> On Mon, Apr 14, 2014 at 1:24 PM, David Miller <[email protected]> wrote:
>> From: Andy Lutomirski <[email protected]>
>> Date: Mon, 14 Apr 2014 13:13:45 -0700
>>
>>> I think this description is wrong. (unsigned long *) &sd->args[1] is
>>> the right location, at least on 32-bit little-endian architectures.
>>
>> It absolutely is not.
>
> Huh? It's a pointer to the right address, but the type is wrong.
>
> The changelog says "on 32-bit x86 (or any other 32bit arch), it would
> result in storing a0-a5 at wrong offsets in args[] member". Unless
> I'm mistaken, this is incorrect: a0-a5 are at the correct offsets,
> but they are stored with the wrong type, so the other bits in there
> are garbage.

Agreed. Your description above is more correct than the log.
We were focusing on the bug itself, and the log came out a bit misleading
as a result of multiple iterations back and forth between me and Daniel.

Also, the log says:
"gcc is clever and optimizes the copy away in other cases, e.g. x86_64"
We actually checked the generated assembly, so the fix doesn't pessimize
64-bit architectures :)
This function is in the critical path for seccomp, so performance definitely
matters.

>>
>> The thing is a u64, and we must respect that type in a completely
>> portable way.
>>
>> Daniel's change is 100% correct, portable, and doesn't have any
>> ugly ifdef crap.
>>
>
> I have no problem with the patch itself. I'm suggesting that a better
> changelog message would confuse other people reading the same patch
> less.
>
> --Andy

2014-04-15 17:46:53

by Andy Lutomirski

Subject: Re: [PATCH] seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF

On Mon, Apr 14, 2014 at 11:31 PM, Alexei Starovoitov <[email protected]> wrote:
> On Mon, Apr 14, 2014 at 1:28 PM, Andy Lutomirski <[email protected]> wrote:
>> On Mon, Apr 14, 2014 at 1:24 PM, David Miller <[email protected]> wrote:
>>> From: Andy Lutomirski <[email protected]>
>>> Date: Mon, 14 Apr 2014 13:13:45 -0700
>>>
>>>> I think this description is wrong. (unsigned long *) &sd->args[1] is
>>>> the right location, at least on 32-bit little-endian architectures.
>>>
>>> It absolutely is not.
>>
>> Huh? It's a pointer to the right address, but the type is wrong.
>>
>> The changelog says "on 32-bit x86 (or any other 32bit arch), it would
>> result in storing a0-a5 at wrong offsets in args[] member". Unless
>> I'm mistaken, this is incorrect: a0-a5 are at the correct offsets,
>> but they are stored with the wrong type, so the other bits in there
>> are garbage.
>
> Agreed. Your description above is more correct than the log.
> We were focusing on the bug itself, and the log came out a bit misleading
> as a result of multiple iterations back and forth between me and Daniel.
>
> Also, the log says:
> "gcc is clever and optimizes the copy away in other cases, e.g. x86_64"
> We actually checked the generated assembly, so the fix doesn't pessimize
> 64-bit architectures :)
> This function is in the critical path for seccomp, so performance definitely
> matters.

Yeah, I'm not entirely sure what I was thinking when I wrote that
part. The new code should actually be much better than the old code
for weird architectures like ia-64.

For reference, ia-64 uses the unwinder (!) to look up arguments, so
the fewer times it gets invoked, the better.

--Andy

2014-04-15 22:52:45

by Linus Torvalds

Subject: Re: [PATCH] seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF

[ Sorry for delayed testing, I just came back home and didn't have
access to the affected 32-bit machine on the road ]

On Mon, Apr 14, 2014 at 12:02 PM, Daniel Borkmann <[email protected]> wrote:
> Linus reports that on 32-bit x86 Chromium throws the following seccomp
> resp. audit log messages:

Tested, and fixes the problem for me.

Thanks,

Linus

2014-04-15 23:17:49

by David Miller

Subject: Re: [PATCH] seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF

From: Linus Torvalds <[email protected]>
Date: Tue, 15 Apr 2014 15:52:42 -0700

> [ Sorry for delayed testing, I just came back home and didn't have
> access to the affected 32-bit machine on the road ]
>
> On Mon, Apr 14, 2014 at 12:02 PM, Daniel Borkmann <[email protected]> wrote:
>> Linus reports that on 32-bit x86 Chromium throws the following seccomp
>> resp. audit log messages:
>
> Tested, and fixes the problem for me.

I'll push this fix to you later this evening.