LinuxLists.cc - disable tsc with seccomp

2005-11-05 13:47:46

Subject: disable tsc with seccomp

Hello,

This changeset is backing out an useful feature I implemented some month
ago:

http://kernel.org/hg/linux-2.6/?cs=2fd4e5f089df

Anything that can strengthen security is needed, the covert channels are
theoretically possible and this is a fact, you don't need hyperthreading
for that.

I tried to convince you a few times privately but I failed, and now that
you made mainline less secure, I have to raise the topic on l-k since
all other attemps to convince you privately already failed.

As I told you a few times, in real life any admin that doesn't notice a
task running at 100% cpu load for months means there are more serious
problems in that server, than the risk of covert channel. Because of
that, covert channels remains mostly a theoretical problem in servers.

But with the CPUShare usage of seccomp, running untrusted bycode for
months at 100% cpu load is the norm, so we must disable all high
precision timing information that we can disable.

Infact we should disable MISC_ENABLE too at runtime (if possible).

Furthermore i386 still has the tsc disable with seccomp, so the fact my
patch is still applied to i386 and has been backed out only of x86-64 is
a nosense. Either we back out both (and I strongly disagree with that),
or we keep both applied (this is what I'm suggesting). Current status
makes no sense to me.

If the end result of this discussion will be that both patches are
backed out, I'll rewrite them with a config option (turned off by
default). So the CPUShare users that wants to be safer, can enable it
when compiling the kernel (plus crossing fingers in the hope that
distros would also enable it before compiling their kernels). A config
option would make it acceptable even in the worst case I hope.

I think it would have been nicer from your part to at least make it a
config option instead of dropping it right away, especially after I
explicitly asked you not to drop it.

Thanks.

2005-11-05 15:37:51

by Andi Kleen

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Saturday 05 November 2005 14:47, Andrea Arcangeli wrote:
> Hello,
>
> This changeset is backing out an useful feature I implemented some month
> ago:
>
> http://kernel.org/hg/linux-2.6/?cs=2fd4e5f089df
>
> Anything that can strengthen security is needed, the covert channels are
> theoretically possible and this is a fact, you don't need hyperthreading
> for that.

It was useless, you can get exactly the same information by using RDPMC
on perfctr 0 which always runs the NMI watchdog and counts all cycles too.

And even with that I don't want to have such checks in the context switch
for something that is at best theoretical. Letting it in would open the
floodgates for making the context switch really slow long term.

-Andi

2005-11-05 16:07:22

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Sat, Nov 05, 2005 at 04:37:44PM +0100, Andi Kleen wrote:
> It was useless, you can get exactly the same information by using RDPMC
> on perfctr 0 which always runs the NMI watchdog and counts all cycles too.

nmi watchdog is off in my system, but it was used to be very slow.
Anyway performance counters should be turned off too. They can be turned
off on a per task basis right? Just switching another cr4 bit or what?

The fact turning off the tsc is not enough to remove all timing info, is
sure not a good reason to remove that code IMHO, infact we should
disable more stuff if there are other ways to gather that information.

The fast path cost is constant and not measurable, no matter how much
stuff we disable in the slow path. So that's not a problem. We must
disable everything that can be disabled and that can provide potential
high precision timing info to userland. For example we could also flip
the ptes to non-present over the hpet mapping. Zero-cost! The only
_fixed_ cost is the one we already had before you backed out the
feature. But note that hpet is a very very low prio compared to tsc, the
precision is not high enough for it to matter, but certainly I would
welcome a patch flipping the hpet ptes too. I only care about turning
off timings sources that allows to count cycles.

2005-11-05 16:12:15

by Andi Kleen

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Saturday 05 November 2005 17:07, Andrea Arcangeli wrote:
> On Sat, Nov 05, 2005 at 04:37:44PM +0100, Andi Kleen wrote:
> > It was useless, you can get exactly the same information by using RDPMC
> > on perfctr 0 which always runs the NMI watchdog and counts all cycles
> > too.
>
> nmi watchdog is off in my system, but it was used to be very slow.

It is normally on on all x86-64 systems.

> Anyway performance counters should be turned off too. They can be turned
> off on a per task basis right? Just switching another cr4 bit or what?

I definitely don't want any code like this in the context switch. It is
critical and I don't want to pollute fast paths with stuff like this
that nobody needs.

-Andi

2005-11-05 16:31:39

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Sat, Nov 05, 2005 at 05:12:09PM +0100, Andi Kleen wrote:
> It is normally on on all x86-64 systems.

Can the performance counters be disabled for seccomp only right?

> I definitely don't want any code like this in the context switch. It is
> critical and I don't want to pollute fast paths with stuff like this
> that nobody needs.

287 registered CPUShare users will appreciate to compute more securely
thanks to this feature (about 10 up at any given time), and once I start
allowing transactions I hope much more users will need this (it's not
finished yet).

We have in the kernel lots of features that slowdown a bit and that
benefit only a part of the userbase. Even kmap only benefits people with
>1G of ram. Even the security_* api in the syscalls only benefit a part
of the userbase. There are infinite other examples. The point is that
none of this is measurable, _especially_ this one in the context switch,
context switches aren't as frequent as syscalls! It's only two
cachelines at every context switch, and they might be hot already since
we're mangling over the task struct of the prev/next regardless of my
patch. If something we could discuss how to pack the seccomp structure
in the task structure so that it's already hot. But calling my feature
"pollution" seems a bit exaggerated to me.

Plus Andrew would have never allowed it to go in, if this could have
impacted performance, you also should know this can't slowdown anything
and you're just talking about theory.

Of course if 1000 other people also adds their feature to the context
switch then it might become measurable, but this is the first time we
had to change the context switch to add more security on per-task basis,
so I don't see it happening any time soon. Plus if it happens we can
implement some technique to share the same cacheline for multiple
purposes so it remains a fixed cost (like an hook). If you tell me how
to use an hook today that avoids me to touch two cachelines in the
context switch fast path, let me know, I don't see it.

2005-11-05 17:04:15

by Andi Kleen

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Saturday 05 November 2005 17:31, Andrea Arcangeli wrote:
> On Sat, Nov 05, 2005 at 05:12:09PM +0100, Andi Kleen wrote:
> > It is normally on on all x86-64 systems.
>
> Can the performance counters be disabled for seccomp only right?

Yes, there is a bit to disable reading performance counters in ring 3.

But I promise you to complain about a patch to add setting it in the context
switch too :)

> > I definitely don't want any code like this in the context switch. It is
> > critical and I don't want to pollute fast paths with stuff like this
> > that nobody needs.
>
> 287 registered CPUShare users will appreciate to compute more securely
> thanks to this feature (about 10 up at any given time), and once I start
> allowing transactions I hope much more users will need this (it's not
> finished yet).

I don't believe they need it - the side channel attack is too theoretical for
their use case.

> We have in the kernel lots of features that slowdown a bit and that
> benefit only a part of the userbase. Even kmap only benefits people with
>
> >1G of ram. Even the security_* api in the syscalls only benefit a part
>
> of the userbase. There are infinite other examples. The point is that
> none of this is measurable,

LSM was actually quite measurable on some systems, the indirect
calls really hurt on IA64 on some of the network benchmarks.

> _especially_ this one in the context switch,
> context switches aren't as frequent as syscalls! It's only two
> cachelines at every context switch, and they might be hot

If they're not hot for some reason (e.g. cache pig in userspace) you're
talking about 1000+ cycles.

> Plus Andrew would have never allowed it to go in, if this could have
> impacted performance, you also should know this can't slowdown anything
> and you're just talking about theory.

The person talking about theory is you in my opinion with this basically
theoretical attack.

> Of course if 1000 other people also adds their feature to the context
> switch then it might become measurable, but this is the first time we
> had to change the context switch to add more security on per-task basis,

Better to stamp out any such attempts in the roots.

-Andi

2005-11-06 01:55:47

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Sat, Nov 05, 2005 at 06:04:08PM +0100, Andi Kleen wrote:
> But I promise you to complain about a patch to add setting it in the context
> switch too :)

If we include the tsc disable, the performance counter disable will be
completely zerocost for the fast path, so you shouldn't complain about
that, and we can keep the focus only on the tsc part :)

This invalidates your argument that the tsc patch can be disabled
because the same info can be collected using performance counters: if
you are ok with the tsc patch, the performance counter problem will be
solved too at zero additional cost for the fast path.

> LSM was actually quite measurable on some systems, the indirect
> calls really hurt on IA64 on some of the network benchmarks.

I can imagine that since they're executed during syscalls. But we're
talking about context switches here, so in average order of magnitude
less frequent than syscalls.

> > _especially_ this one in the context switch,
> > context switches aren't as frequent as syscalls! It's only two
> > cachelines at every context switch, and they might be hot
>
> If they're not hot for some reason (e.g. cache pig in userspace) you're
> talking about 1000+ cycles.

What I meant is that we could try to be careful enough in ordering the
task struct so that no additional cacheline has to be read from the cpu,
so it doesn't matter what runs in userland.

I'm all for discussing how to possibly eliminate the additional two
cachelines of working set in the cpu.

If you have to touch a close area during the context switch, the
cacheline of the close area has to become hot, no matter what pigs runs
in userland. So our objective would be to store the seccomp bitflags in the same
cacheline that is required to become hot for the context switch to
execute. If we can achieve that objective we'll be doing the check
nearly at zerocost.

> The person talking about theory is you in my opinion with this basically
> theoretical attack.

But there is a great deal of difference in the impact of the two theories.

If my theory happens in practice, a ssh or ssl key can be stolen.

If your theory happens in practice, there will be a masurable but for
sure very tiny slowdown in the fast path.

So if I'm right, things might go bad.

If you're right, nothing will go wrong. If you're right, it basically
means we need a config option around my code, but we shouldn't remove it
since it still provides value to the CPUShare users (I'm sure they'll be
happy to risk your claimed microslowdown, to be sure there is no covert
channel possible with seccomp).

I can't fix what I don't know about. But this thing I know about it, and
I want to do all I can to guarantee it's impossible to trigger it. The
only way to guarantee it is to apply my patch.

For normal execution of tasks, disabling the tsc sounds not needed, as
said in the previous email, CPUShare is the only project out there that
runs untrusted bytecode at 100% cpu load for months, with the
administrator acknowledging it as the "normal safe load". So CPUShare is
the only thing that has to be careful about preventing covert channels.
For anything else, it is more an administration problem than a kernel
problem.

> Better to stamp out any such attempts in the roots.

I see your point, but OTOH I'd like math guarantee of preventing covert
channels. I know for myself, that I'll feel safer in running CPUShare on
my systems, knowing a cover channel is mathematically impossible.

If you don't use CPUShare, to know a cover channel is impossible, you
only have to run `top`! So you definitely don't need to disable the tsc
to feel safe, if you're not running untrusted bytecode under seccomp at
full cpu load.

Hope this clarifies my point of view ;).

2005-11-21 16:44:03

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: disable tsc with seccomp

Since there was no feedback to my last post, I assume you agree, so
please backout the tsc disable so then I can plug the performane counter
disable on top of it (at zero additional runtime cost).

Also note: even if it may not be obvious because it's nicely hidden by
header files, if you disable CONFIG_SECCOMP, the tsc disable patch
becomes a noop and it's optimized away by gcc (like the rest of
seccomp).

So if you don't want to risk the microslowdown in your systems, just
disable SECCOMP and be done with it but please resurrect that patch so I
can fix the performance counters too after that.

Thanks!

2005-11-21 17:05:20

by Andi Kleen

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Mon, Nov 21, 2005 at 05:43:49PM +0100, Andrea Arcangeli wrote:
> Since there was no feedback to my last post, I assume you agree, so
> please backout the tsc disable so then I can plug the performane counter
> disable on top of it (at zero additional runtime cost).

Sorry I don't agree.

-Andi

2005-11-21 17:16:21

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Mon, Nov 21, 2005 at 06:05:17PM +0100, Andi Kleen wrote:
> On Mon, Nov 21, 2005 at 05:43:49PM +0100, Andrea Arcangeli wrote:
> > Since there was no feedback to my last post, I assume you agree, so
> > please backout the tsc disable so then I can plug the performane counter
> > disable on top of it (at zero additional runtime cost).
>
> Sorry I don't agree.

You've the config option, turn that off on your systems, what's the
problem with that?

Or does this mean I need to ship kernels myself with covert channels
made mathematically impossible with seccomp enabled? I'd rather avoid
having to ship special kernels to run CPUShare as safely as physically
possible. My time is already too short, so I hope I won't have to take
care of this additional burden.

2005-11-21 17:24:47

by Andi Kleen

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Mon, Nov 21, 2005 at 06:16:16PM +0100, Andrea Arcangeli wrote:
> On Mon, Nov 21, 2005 at 06:05:17PM +0100, Andi Kleen wrote:
> > On Mon, Nov 21, 2005 at 05:43:49PM +0100, Andrea Arcangeli wrote:
> > > Since there was no feedback to my last post, I assume you agree, so
> > > please backout the tsc disable so then I can plug the performane counter
> > > disable on top of it (at zero additional runtime cost).
> >
> > Sorry I don't agree.
>
> You've the config option, turn that off on your systems, what's the
> problem with that?
>
> Or does this mean I need to ship kernels myself with covert channels
> made mathematically impossible with seccomp enabled? I'd rather avoid

I don't believe theoretical, unlikely to be usable in any way
covert channels are a justification to make fast paths slower.
That's independent of CONFIG options.

And in addition your change doesn't even close that channel
in theory.

-Andi

2005-11-21 17:39:00

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Mon, Nov 21, 2005 at 06:24:44PM +0100, Andi Kleen wrote:
> I don't believe theoretical, unlikely to be usable in any way

Demonstrate it if you can.

The fact is that the closest thing to a demonstration that we have,
demonstrated the opposite of what you claim, so you'll have an hard
time to convince me about your point given you've no way to demonstrate
it.

I can guarantee that my machine is safe by running top, the only thing
that escapes the easy "top" check is CPUShare and I want to fix it,
since I can.

> And in addition your change doesn't even close that channel
> in theory.

The the PCE is red herring, since it's a zero-cost addition.

2005-11-21 18:40:25

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: disable tsc with seccomp

On Sat, Nov 05, 2005 at 04:37:44PM +0100, Andi Kleen wrote:
> It was useless, you can get exactly the same information by using
> RDPMC on perfctr 0 which always runs the NMI watchdog and counts all
> cycles too.

I can't see how you can claim that you can read stuff with rdpmc.

andrea@opteron:~> gdb ./a.out
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
This GDB was configured as "x86_64-suse-linux"...Using host libthread_db
library "/lib64/tls/libthread_db.so.1".

(gdb) r
Starting program: /home/andrea/a.out

Program received signal SIGSEGV, Segmentation fault.
0x00000000004004bc in main ()
(gdb) dis 0x00000000004004bc
warning: bad breakpoint number at or near '0x00000000004004bc'
(gdb) disassem 0x00000000004004bc
Dump of assembler code for function main:
0x00000000004004b8 <main+0>: push %rbp
0x00000000004004b9 <main+1>: mov %rsp,%rbp
0x00000000004004bc <main+4>: rdpmc

I get an immediate segfault if I try to execute that instruction. infact
the PCE bitflag is _already_ zero (checked with sysrq+P).

Perhaps you mean if you change the kernel to allow RDPMC then you have a
problem? But then it means your kernel modifications are buggy if they
break seccomp.

Please tell me how to generate a convert channel with an unmodified
2.6.15-rc2 plus the attached patch applied so I can fix it too. I also
moved the seccomp struct next to the scheduler data so the two
cachelines may be hot already and the theoretical overhead may go away
too.

And next time please bother to send me an email instead of silenty
backing out my recent patches from your tree, especially when your
backouts might decrease the security of some users of the kernel.
I was lucky that Andrew notified me about it. (thanks!)

In the below patch I also added a forced clear of the PCE just in case
somebody writes buggy kernel code (as an additional security measure so
if somebody writes buggy code like you seem to imply, seccomp still
won't break this way, the buggy code will break instead and it will
deserve it ;).

Signed-off-by: Andrea Arcangeli <[email protected]>

diff -r 6377b3f31134 include/linux/sched.h
--- a/include/linux/sched.h Mon Nov 21 06:06:28 2005 +0800
+++ b/include/linux/sched.h Mon Nov 21 20:04:38 2005 +0200
@@ -713,6 +719,8 @@
#ifdef CONFIG_SCHEDSTATS
struct sched_info sched_info;
#endif
+
+ seccomp_t seccomp;

struct list_head tasks;
/*
@@ -810,7 +818,6 @@

void *security;
struct audit_context *audit_context;
- seccomp_t seccomp;

/* Thread group tracking */
u32 parent_exec_id;
diff -r 6377b3f31134 arch/x86_64/kernel/process.c
--- a/arch/x86_64/kernel/process.c Mon Nov 21 06:06:28 2005 +0800
+++ b/arch/x86_64/kernel/process.c Mon Nov 21 20:04:38 2005 +0200
@@ -485,6 +485,33 @@
}

/*
+ * This function selects if the context switch from prev to next
+ * has to tweak the TSC disable bit in the cr4.
+ */
+static inline void disable_tsc(struct task_struct *prev_p,
+ struct task_struct *next_p)
+{
+ struct thread_info *prev, *next;
+
+ /*
+ * gcc should eliminate the ->thread_info dereference if
+ * has_secure_computing returns 0 at compile time (SECCOMP=n).
+ */
+ prev = prev_p->thread_info;
+ next = next_p->thread_info;
+
+ if (has_secure_computing(prev) || has_secure_computing(next)) {
+ /* slow path here */
+ if (has_secure_computing(prev) &&
+ !has_secure_computing(next)) {
+ write_cr4(read_cr4() & ~X86_CR4_TSD);
+ } else if (!has_secure_computing(prev) &&
+ has_secure_computing(next))
+ write_cr4((read_cr4() | X86_CR4_TSD) & ~X86_CR4_PCE);
+ }
+}
+
+/*
* This special macro can be used to load a debugging register
*/
#define loaddebug(thread,r) set_debug(thread->debugreg ## r, r)
@@ -603,6 +630,8 @@
memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
}
}
+
+ disable_tsc(prev_p, next_p);

return prev_p;
}