2016-10-26 20:54:23

by Pavel Machek

Subject: Getting interrupt every million cache misses

Hi!

I'd like to get an interrupt every million cache misses... to do a
printk() or something like that. As far as I can tell, modern hardware
should allow me to do that. AFAICT performance events subsystem can do
something like that, but I can't figure out where the code is / what I
should call.

Can someone help?

Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-10-27 09:16:20

by Peter Zijlstra

Subject: Re: Getting interrupt every million cache misses

On Thu, Oct 27, 2016 at 10:46:38AM +0200, Pavel Machek wrote:

> And actually, printk() is not needed, udelay(50msec) is. The reason is
> that DRAM becomes unreliable if about a million cache misses happen in
> under 64 msec -- so I'd like to slow the system down in such cases to
> prevent the bug from biting me.
>
> (Details are here:
> https://googleprojectzero.blogspot.cz/2015/03/exploiting-dram-rowhammer-bug-to-gain.html
> ). The bug is exploitable to get local root; it is also exploitable to
> gain local code execution from JavaScript... so it is rather severe.

Cute, a rowhammer defence.

So we can do in-kernel perf events too, see for example
kernel/watchdog.c:wd_hw_attr and its users.

I suppose you want PERF_COUNT_HW_CACHE_MISSES as config, although
depending on platform you could use better (u-arch specific) events.

2016-10-27 14:36:19

by Pavel Machek

Subject: Re: Getting interrupt every million cache misses

On Thu 2016-10-27 10:28:01, Peter Zijlstra wrote:
> On Wed, Oct 26, 2016 at 10:54:16PM +0200, Pavel Machek wrote:
> > Hi!
> >
> > I'd like to get an interrupt every million cache misses... to do a
> > printk() or something like that. As far as I can tell, modern hardware
> > should allow me to do that. AFAICT performance events subsystem can do
> > something like that, but I can't figure out where the code is / what I
> > should call.
> >
> > Can someone help?
>
> Can you go back one step and explain why you would want this? What use
> is a printk() on every 1e6-th cache miss.
>
> That is, why doesn't:
>
> $ perf record -e cache-misses -c 1000000 -a -- sleep 5
>
> suffice?

How to work around rowhammer, break my system _and_ make kernel perf
maintainers scream at the same time: (:-) )

I think I got the place now. Let me try...

Thanks,
Pavel


diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d31735f..ce83f5e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1495,6 +1495,11 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)

perf_sample_event_took(finish_clock - start_clock);

+ /* Here */
+ {
+ udelay(58000);
+ }
+
return ret;
}
NOKPROBE_SYMBOL(perf_event_nmi_handler);


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-10-27 14:33:24

by Peter Zijlstra

Subject: Re: Getting interrupt every million cache misses

On Thu, Oct 27, 2016 at 11:11:04AM +0200, Pavel Machek wrote:
> How to work around rowhammer, break my system _and_ make kernel perf
> maintainers scream at the same time: (:-) )
>
> I think I got the place now. Let me try...

Lol ;-)

>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index d31735f..ce83f5e 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1495,6 +1495,11 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>
> perf_sample_event_took(finish_clock - start_clock);
>
> + /* Here */
> + {
> + udelay(58000);
> + }
> +
> return ret;
> }
> NOKPROBE_SYMBOL(perf_event_nmi_handler);

Like you guess, not quite ;-)


I think you want to register a custom overflow handler with your event.

So you get something like:


struct perf_event_attr rh_attr = {
	.type = PERF_TYPE_HARDWARE,
	.config = PERF_COUNT_HW_CACHE_MISSES,
	.size = sizeof(struct perf_event_attr),
	.pinned = 1,
	.sample_period = 1000000,
};

static DEFINE_PER_CPU(struct perf_event *, rh_event);
static DEFINE_PER_CPU(u64, rh_timestamp);

static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
{
	u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
	u64 now = ktime_get_mono_fast_ns();
	s64 delta = now - *ts;

	*ts = now;

	if (delta > 64 * NSEC_PER_USEC)
		udelay(58000);
}

__init int my_module_init(void)
{
	int cpu;

	/* XXX borken vs hotplug */

	for_each_online_cpu(cpu) {
		struct perf_event *event;

		event = perf_event_create_kernel_counter(&rh_attr, cpu, NULL, rh_overflow, NULL);
		per_cpu(rh_event, cpu) = event;
		if (!event)
			/* meh */;
	}
	return 0;
}

__exit void my_module_exit(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		struct perf_event *event = per_cpu(rh_event, cpu);

		if (event)
			perf_event_release_kernel(event);
	}
}

2016-10-27 14:46:13

by Pavel Machek

Subject: Re: Getting interrupt every million cache misses

Hi!

> > I'd like to get an interrupt every million cache misses... to do a
> > printk() or something like that. As far as I can tell, modern hardware
> > should allow me to do that. AFAICT performance events subsystem can do
> > something like that, but I can't figure out where the code is / what I
> > should call.
> >
> > Can someone help?
>
> Can you go back one step and explain why you would want this? What use
> is a printk() on every 1e6-th cache miss.

First, thanks for quick reply.

And actually, printk() is not needed, udelay(50msec) is. The reason is
that DRAM becomes unreliable if about a million cache misses happen in
under 64 msec -- so I'd like to slow the system down in such cases to
prevent the bug from biting me.

(Details are here:
https://googleprojectzero.blogspot.cz/2015/03/exploiting-dram-rowhammer-bug-to-gain.html
). The bug is exploitable to get local root; it is also exploitable to
gain local code execution from JavaScript... so it is rather severe.

> That is, why doesn't:
>
> $ perf record -e cache-misses -c 1000000 -a -- sleep 5
>
> suffice?

Thanks for the pointer... I'd really like to do this from the kernel, so
that I can "almost synchronously" stop the execution when excessive
cache misses happen.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-10-27 15:19:30

by Peter Zijlstra

Subject: Re: Getting interrupt every million cache misses

On Wed, Oct 26, 2016 at 10:54:16PM +0200, Pavel Machek wrote:
> Hi!
>
> I'd like to get an interrupt every million cache misses... to do a
> printk() or something like that. As far as I can tell, modern hardware
> should allow me to do that. AFAICT performance events subsystem can do
> something like that, but I can't figure out where the code is / what I
> should call.
>
> Can someone help?

Can you go back one step and explain why you would want this? What use
is a printk() on every 1e6-th cache miss?

That is, why doesn't:

$ perf record -e cache-misses -c 1000000 -a -- sleep 5

suffice?

2016-10-27 20:41:03

by Kees Cook

Subject: Re: Getting interrupt every million cache misses

On Thu, Oct 27, 2016 at 2:33 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Oct 27, 2016 at 11:11:04AM +0200, Pavel Machek wrote:
>> How to work around rowhammer, break my system _and_ make kernel perf
>> maintainers scream at the same time: (:-) )
>>
>> I think I got the place now. Let me try...
>
> Lol ;-)
>
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index d31735f..ce83f5e 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -1495,6 +1495,11 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>>
>> perf_sample_event_took(finish_clock - start_clock);
>>
>> + /* Here */
>> + {
>> + udelay(58000);
>> + }
>> +
>> return ret;
>> }
>> NOKPROBE_SYMBOL(perf_event_nmi_handler);
>
> Like you guess, not quite ;-)
>
>
> I think you want to register a custom overflow handler with your event.
>
> So you get something like:
>
>
> struct perf_event_attr rh_attr = {
> .type = PERF_TYPE_HARDWARE,
> .config = PERF_COUNT_HW_CACHE_MISSES,
> .size = sizeof(struct perf_event_attr),
> .pinned = 1,
> .sample_period = 1000000,
> };
>
> static DEFINE_PER_CPU(struct perf_event *, rh_event);
> static DEFINE_PER_CPU(u64, rh_timestamp);
>
> static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
> {
> u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
> u64 now = ktime_get_mono_fast_ns();
> s64 delta = now - *ts;
>
> *ts = now;
>
> if (delta > 64 * NSEC_PER_USEC)
> udelay(58000);
> }
>
> __init int my_module_init()
> {
> int cpu;
>
> /* XXX borken vs hotplug */
>
> for_each_online_cpu(cpu) {
> struct perf_event *event = per_cpu(event, cpu);
>
> event = perf_event_create_kernel_counter(&rh_attr, cpu, NULL, rh_overflow, NULL);
> if (!event)
> /* meh */
> ;
>
> }
> }
>
> __exit void my_module_exit()
> {
> int cpu;
>
> for_each_online_cpu(cpu) {
> struct perf_event *event = per_cpu(event, cpu);
>
> if (event)
> perf_event_release_kernel(event);
> }
> }

This is pretty cool. Are there workloads other than rowhammer that
could trip this, and if so, how bad would this delay be for them?

At the very least, this could be behind a CONFIG for people that don't
have a way to fix their RAM refresh timings, etc.

-Kees

--
Kees Cook
Nexus Security

2016-10-27 21:28:10

by Pavel Machek

Subject: rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

> > if (event)
> > perf_event_release_kernel(event);
> > }
> > }
>
> This is pretty cool. Are there workloads other than rowhammer that
> could trip this, and if so, how bad would this delay be for them?
>
> At the very least, this could be behind a CONFIG for people that don't
> have a way to fix their RAM refresh timings, etc.

Yes, CONFIG_ is next.

Here's the patch, notice that I reversed the time handling logic -- it
should be correct now.

We can't tell cache misses on different addresses from cache misses on
the same address (rowhammer), so this will have false positives. But so
far, my machine seems to work.

Unfortunately, I don't have a machine suitable for testing nearby. Can
someone help with testing? [On the other hand... testing this is not
going to be easy. This will probably make the problem much harder to
reproduce in any case...]

I did run rowhammer, and yes, this did trigger and it was getting
delayed -- by a factor of 2. That is slightly low -- the slowdown should
be a factor of 8 to get guarantees, if I understand things correctly.

Oh and NMI gets quite angry, but that was to be expected.

[ 112.476009] perf: interrupt took too long (23660454 > 23654965), lowering kernel.perf_event_max_sample_rate to 250
[ 170.224007] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 55.844 msecs
[ 191.872007] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 55.845 msecs

Best regards,
Pavel

diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 2925188..130a185 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -2,7 +2,7 @@ ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
endif

-obj-y := core.o ring_buffer.o callchain.o
+obj-y := core.o ring_buffer.o callchain.o nohammer.o

obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_UPROBES) += uprobes.o
diff --git a/kernel/events/nohammer.c b/kernel/events/nohammer.c
new file mode 100644
index 0000000..01844d2
--- /dev/null
+++ b/kernel/events/nohammer.c
@@ -0,0 +1,66 @@
+/*
+ * Thanks to Peter Zijlstra <[email protected]>.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <linux/delay.h>
+
+struct perf_event_attr rh_attr = {
+	.type = PERF_TYPE_HARDWARE,
+	.config = PERF_COUNT_HW_CACHE_MISSES,
+	.size = sizeof(struct perf_event_attr),
+	.pinned = 1,
+	/* FIXME: it is 1000000 per cpu. */
+	.sample_period = 500000,
+};
+
+static DEFINE_PER_CPU(struct perf_event *, rh_event);
+static DEFINE_PER_CPU(u64, rh_timestamp);
+
+static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
+{
+	u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
+	u64 now = ktime_get_mono_fast_ns();
+	s64 delta = now - *ts;
+
+	*ts = now;
+
+	/* FIXME msec per usec, reverse logic? */
+	if (delta < 64 * NSEC_PER_MSEC)
+		mdelay(56);
+}
+
+static __init int my_module_init(void)
+{
+	int cpu;
+
+	/* XXX borken vs hotplug */
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event;
+
+		event = perf_event_create_kernel_counter(&rh_attr, cpu, NULL, rh_overflow, NULL);
+		if (!event)
+			pr_err("Not enough resources to initialize nohammer on cpu %d\n", cpu);
+		pr_info("Nohammer initialized on cpu %d\n", cpu);
+
+	}
+	return 0;
+}
+
+static __exit void my_module_exit(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event = per_cpu(rh_event, cpu);
+
+		if (event)
+			perf_event_release_kernel(event);
+	}
+	return;
+}
+
+module_init(my_module_init);
+module_exit(my_module_exit);


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-10-28 07:07:07

by Ingo Molnar

Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]


* Pavel Machek <[email protected]> wrote:

> +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
> +{
> + u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
> + u64 now = ktime_get_mono_fast_ns();
> + s64 delta = now - *ts;
> +
> + *ts = now;
> +
> + /* FIXME msec per usec, reverse logic? */
> + if (delta < 64 * NSEC_PER_MSEC)
> + mdelay(56);
> +}

I'd suggest making the absolute delay sysctl tunable, because 'wait 56 msecs' is
very magic, and do we know 100% that 56 msecs is what is needed everywhere?

Plus I'd also suggest exposing an 'NMI rowhammer delay count' in /proc/interrupts,
to make it easier to debug this. (Perhaps only show the line if the count is
nonzero.)

Finally, could we please also add a sysctl and Kconfig that allows this feature to
be turned on/off, with the default bootup value determined by the Kconfig value
(i.e. by the distribution)? Similar to CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE.

Thanks,

Ingo

2016-10-28 08:51:18

by Pavel Machek

Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]

On Fri 2016-10-28 09:07:01, Ingo Molnar wrote:
>
> * Pavel Machek <[email protected]> wrote:
>
> > +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
> > +{
> > + u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
> > + u64 now = ktime_get_mono_fast_ns();
> > + s64 delta = now - *ts;
> > +
> > + *ts = now;
> > +
> > + /* FIXME msec per usec, reverse logic? */
> > + if (delta < 64 * NSEC_PER_MSEC)
> > + mdelay(56);
> > +}
>
> I'd suggest making the absolute delay sysctl tunable, because 'wait 56 msecs' is
> very magic, and do we know it 100% that 56 msecs is what is needed
> everywhere?

I agree this needs to be tunable (and with the other suggestions). But
this is actually not the most important tunable: the detection
threshold (rh_attr.sample_period) should be way more important.

And yes, this will all need to be tunable, somehow. But let's verify
that this works first :-).

Thanks and best regards,
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (1.11 kB)
signature.asc (181.00 B)
Digital signature
Download all attachments

2016-10-28 08:59:43

by Ingo Molnar

Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]


* Pavel Machek <[email protected]> wrote:

> On Fri 2016-10-28 09:07:01, Ingo Molnar wrote:
> >
> > * Pavel Machek <[email protected]> wrote:
> >
> > > +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
> > > +{
> > > + u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
> > > + u64 now = ktime_get_mono_fast_ns();
> > > + s64 delta = now - *ts;
> > > +
> > > + *ts = now;
> > > +
> > > + /* FIXME msec per usec, reverse logic? */
> > > + if (delta < 64 * NSEC_PER_MSEC)
> > > + mdelay(56);
> > > +}
> >
> > I'd suggest making the absolute delay sysctl tunable, because 'wait 56 msecs' is
> > very magic, and do we know it 100% that 56 msecs is what is needed
> > everywhere?
>
> I agree this needs to be tunable (and with the other suggestions). But
> this is actually not the most important tunable: the detection
> threshold (rh_attr.sample_period) should be way more important.
>
> And yes, this will all need to be tunable, somehow. But lets verify
> that this works, first :-).

Yeah.

Btw., a 56 msecs NMI delay is pretty brutal in terms of latencies - it might
result in a smoother system to detect 100,000 cache misses and do a
~5.6 msecs delay instead?

(Assuming the shorter threshold does not trigger too often, of course.)

With all the tunables and statistics it would be possible to enumerate how
frequently the protection mechanism kicks in during regular workloads.

Thanks,

Ingo

2016-10-28 09:04:40

by Peter Zijlstra

Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]

On Fri, Oct 28, 2016 at 10:50:39AM +0200, Pavel Machek wrote:
> On Fri 2016-10-28 09:07:01, Ingo Molnar wrote:
> >
> > * Pavel Machek <[email protected]> wrote:
> >
> > > +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
> > > +{
> > > + u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
> > > + u64 now = ktime_get_mono_fast_ns();
> > > + s64 delta = now - *ts;
> > > +
> > > + *ts = now;
> > > +
> > > + /* FIXME msec per usec, reverse logic? */
> > > + if (delta < 64 * NSEC_PER_MSEC)
> > > + mdelay(56);
> > > +}
> >
> > I'd suggest making the absolute delay sysctl tunable, because 'wait 56 msecs' is
> > very magic, and do we know it 100% that 56 msecs is what is needed
> > everywhere?
>
> I agree this needs to be tunable (and with the other suggestions). But
> this is actually not the most important tunable: the detection
> threshold (rh_attr.sample_period) should be way more important.

So being totally ignorant of the details of how rowhammer abuses the DDR
thing, would it make sense to trigger more often and delay shorter? Or
is there some minimal delay required for things to settle or something?

2016-10-28 09:27:22

by Vegard Nossum

Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]

On 28 October 2016 at 11:04, Peter Zijlstra <[email protected]> wrote:
> On Fri, Oct 28, 2016 at 10:50:39AM +0200, Pavel Machek wrote:
>> On Fri 2016-10-28 09:07:01, Ingo Molnar wrote:
>> >
>> > * Pavel Machek <[email protected]> wrote:
>> >
>> > > +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
>> > > +{
>> > > + u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
>> > > + u64 now = ktime_get_mono_fast_ns();
>> > > + s64 delta = now - *ts;
>> > > +
>> > > + *ts = now;
>> > > +
>> > > + /* FIXME msec per usec, reverse logic? */
>> > > + if (delta < 64 * NSEC_PER_MSEC)
>> > > + mdelay(56);
>> > > +}
>> >
>> > I'd suggest making the absolute delay sysctl tunable, because 'wait 56 msecs' is
>> > very magic, and do we know it 100% that 56 msecs is what is needed
>> > everywhere?
>>
>> I agree this needs to be tunable (and with the other suggestions). But
>> this is actually not the most important tunable: the detection
>> threshold (rh_attr.sample_period) should be way more important.
>
> So being totally ignorant of the detail of how rowhammer abuses the DDR
> thing, would it make sense to trigger more often and delay shorter? Or
> is there some minimal delay required for things to settle or something.

Would it make sense to sample the counter on context switch, do some
accounting on a per-task cache miss counter, and slow down just the
single task(s) with a too high cache miss rate? That way there's no
global slowdown (which I assume would be the case here). The task's
slice of CPU would have to be taken into account because otherwise you
could have multiple cooperating tasks that each escape the limit but
taken together go above it.


Vegard

2016-10-28 09:35:52

by Ingo Molnar

Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]


* Vegard Nossum <[email protected]> wrote:

> Would it make sense to sample the counter on context switch, do some
> accounting on a per-task cache miss counter, and slow down just the
> single task(s) with a too high cache miss rate? That way there's no
> global slowdown (which I assume would be the case here). The task's
> slice of CPU would have to be taken into account because otherwise you
> could have multiple cooperating tasks that each escape the limit but
> taken together go above it.

Attackers could work around this by splitting the rowhammer workload between
multiple threads/processes.

I.e. the problem is that the risk may come from any 'unprivileged user-space
code', where the rowhammer workload might be spread over multiple threads,
processes or even users.

Thanks,

Ingo

2016-10-28 09:47:18

by Vegard Nossum

Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]

On 28 October 2016 at 11:35, Ingo Molnar <[email protected]> wrote:
>
> * Vegard Nossum <[email protected]> wrote:
>
>> Would it make sense to sample the counter on context switch, do some
>> accounting on a per-task cache miss counter, and slow down just the
>> single task(s) with a too high cache miss rate? That way there's no
>> global slowdown (which I assume would be the case here). The task's
>> slice of CPU would have to be taken into account because otherwise you
>> could have multiple cooperating tasks that each escape the limit but
>> taken together go above it.
>
> Attackers could work this around by splitting the rowhammer workload between
> multiple threads/processes.
>
> I.e. the problem is that the risk may come from any 'unprivileged user-space
> code', where the rowhammer workload might be spread over multiple threads,
> processes or even users.

That's why I emphasised the number of misses per CPU slice rather than
just the total number of misses. I assumed there must be at least one
task continuously hammering memory for a successful attack, in which
case it should be observable with as little as 1 slice of CPU (however
long that is), no?


Vegard

2016-10-28 09:52:19

by Mark Rutland

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi,

I missed the original, so I've lost some context.

Has this been tested on a system vulnerable to rowhammer, and if so, was
it reliable in mitigating the issue?

Which particular attack codebase was it tested against?

On Thu, Oct 27, 2016 at 11:27:47PM +0200, Pavel Machek wrote:
> --- /dev/null
> +++ b/kernel/events/nohammer.c
> @@ -0,0 +1,66 @@
> +/*
> + * Thanks to Peter Zijlstra <[email protected]>.
> + */
> +
> +#include <linux/perf_event.h>
> +#include <linux/module.h>
> +#include <linux/delay.h>
> +
> +struct perf_event_attr rh_attr = {
> + .type = PERF_TYPE_HARDWARE,
> + .config = PERF_COUNT_HW_CACHE_MISSES,
> + .size = sizeof(struct perf_event_attr),
> + .pinned = 1,
> + /* FIXME: it is 1000000 per cpu. */
> + .sample_period = 500000,
> +};

I'm not sure that this is general enough to live in core code, because:

* there are existing ways around this (e.g. in the drammer case, using a
non-cacheable mapping, which I don't believe would count as a cache
miss).

Given that, I'm very worried that this gives the false impression of
protection in cases where a software workaround of this sort is
insufficient or impossible.

* the precise semantics of performance counter events varies drastically
across implementations. PERF_COUNT_HW_CACHE_MISSES, might only map to
one particular level of cache, and/or may not be implemented on all
cores.

* On some implementations, it may be that the counters are not
interchangeable, and for those this would take away
PERF_COUNT_HW_CACHE_MISSES from existing users.

> +static DEFINE_PER_CPU(struct perf_event *, rh_event);
> +static DEFINE_PER_CPU(u64, rh_timestamp);
> +
> +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
> +{
> + u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
> + u64 now = ktime_get_mono_fast_ns();
> + s64 delta = now - *ts;
> +
> + *ts = now;
> +
> + /* FIXME msec per usec, reverse logic? */
> + if (delta < 64 * NSEC_PER_MSEC)
> + mdelay(56);
> +}

If I round-robin my attack across CPUs, how much does this help?

Thanks,
Mark.

2016-10-28 09:53:36

by Mark Rutland

Subject: Re: [kernel-hardening] Re: rowhammer protection [was Re: Getting interrupt every million cache misses]

On Fri, Oct 28, 2016 at 11:35:47AM +0200, Ingo Molnar wrote:
>
> * Vegard Nossum <[email protected]> wrote:
>
> > Would it make sense to sample the counter on context switch, do some
> > accounting on a per-task cache miss counter, and slow down just the
> > single task(s) with a too high cache miss rate? That way there's no
> > global slowdown (which I assume would be the case here). The task's
> > slice of CPU would have to be taken into account because otherwise you
> > could have multiple cooperating tasks that each escape the limit but
> > taken together go above it.
>
> Attackers could work this around by splitting the rowhammer workload between
> multiple threads/processes.

With the proposed approach, they could split across multiple CPUs
instead, no?

... or was that covered in a prior thread?

Thanks,
Mark.

2016-10-28 11:21:50

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

> I missed the original, so I've lost some context.

You can read it on lkml, but I guess you did not lose anything
important.

> Has this been tested on a system vulnerable to rowhammer, and if so, was
> it reliable in mitigating the issue?
>
> Which particular attack codebase was it tested against?

I have rowhammer-test here:

commit 9824453fff76e0a3f5d1ac8200bc6c447c4fff57
Author: Mark Seaborn <[email protected]>

I do not have a vulnerable machine near me, so no "real" tests, but
I'm pretty sure it will make the error no longer reproducible with the
newer version. [Help welcome ;-)]

> > +struct perf_event_attr rh_attr = {
> > + .type = PERF_TYPE_HARDWARE,
> > + .config = PERF_COUNT_HW_CACHE_MISSES,
> > + .size = sizeof(struct perf_event_attr),
> > + .pinned = 1,
> > + /* FIXME: it is 1000000 per cpu. */
> > + .sample_period = 500000,
> > +};
>
> I'm not sure that this is general enough to live in core code, because:

Well, I'd like to postpone the 'where does it live' debate to a later
stage. The problem is not arch-specific, and the solution is not too
arch-specific either. I believe we can use Kconfig to hide it from
users where it does not apply. Anyway, let's decide if it works, and
where, first.

> * the precise semantics of performance counter events varies drastically
> across implementations. PERF_COUNT_HW_CACHE_MISSES, might only map to
> one particular level of cache, and/or may not be implemented on all
> cores.

If it maps to one particular cache level, we are fine (or maybe we will
trigger the protection too often). If some cores are not counted, that's
bad.

> * On some implementations, it may be that the counters are not
> interchangeable, and for those this would take away
> PERF_COUNT_HW_CACHE_MISSES from existing users.

Yup. Note that with this kind of protection, one missing performance
counter is likely to be a small problem.

> > + *ts = now;
> > +
> > + /* FIXME msec per usec, reverse logic? */
> > + if (delta < 64 * NSEC_PER_MSEC)
> > + mdelay(56);
> > +}
>
> If I round-robin my attack across CPUs, how much does this help?

See below for the new explanation. With 2 CPUs, we are fine. On monster
big-little 8-core machines, we'd probably trigger the protection too
often.

Pavel

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e24e981..c6ffcaf 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -315,6 +315,7 @@ config PGTABLE_LEVELS

source "init/Kconfig"
source "kernel/Kconfig.freezer"
+source "kernel/events/Kconfig"

menu "Processor type and features"

diff --git a/kernel/events/Kconfig b/kernel/events/Kconfig
new file mode 100644
index 0000000..7359427
--- /dev/null
+++ b/kernel/events/Kconfig
@@ -0,0 +1,9 @@
+config NOHAMMER
+	tristate "Rowhammer protection"
+	help
+	  Enable rowhammer attack prevention. It will degrade system
+	  performance under attack so much that the attack should not
+	  be feasible.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called nohammer.
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 2925188..03a2785 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -4,6 +4,8 @@ endif

obj-y := core.o ring_buffer.o callchain.o

+obj-$(CONFIG_NOHAMMER) += nohammer.o
+
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_UPROBES) += uprobes.o

diff --git a/kernel/events/nohammer.c b/kernel/events/nohammer.c
new file mode 100644
index 0000000..d96bacd
--- /dev/null
+++ b/kernel/events/nohammer.c
@@ -0,0 +1,140 @@
+/*
+ * Attempt to prevent the rowhammer attack.
+ *
+ * On many new DRAM chips, repeated read access to nearby cells can cause the
+ * victim cell to flip bits. Unfortunately, that can be used to gain root
+ * on the affected machine, or to execute native code from JavaScript, escaping
+ * the sandbox.
+ *
+ * Fortunately, a lot of memory accesses are needed between DRAM refresh
+ * cycles. This is a rather unusual workload, and we can detect it and
+ * throttle the DRAM accesses before bit flips happen.
+ *
+ * Thanks to Peter Zijlstra <[email protected]>.
+ * Thanks to the presentation at Blackhat.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <linux/delay.h>
+
+static struct perf_event_attr rh_attr = {
+	.type = PERF_TYPE_HARDWARE,
+	.config = PERF_COUNT_HW_CACHE_MISSES,
+	.size = sizeof(struct perf_event_attr),
+	.pinned = 1,
+	.sample_period = 10000,
+};
+
+/*
+ * How often the DRAM is refreshed, in msec. Setting it too high is safe.
+ */
+static int dram_refresh_msec = 64;
+
+static DEFINE_PER_CPU(struct perf_event *, rh_event);
+static DEFINE_PER_CPU(u64, rh_timestamp);
+
+static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
+{
+	u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
+	u64 now = ktime_get_mono_fast_ns();
+	s64 delta = now - *ts;
+
+	*ts = now;
+
+	if (delta < dram_refresh_msec * NSEC_PER_MSEC)
+		mdelay(dram_refresh_msec);
+}
+
+static __init int rh_module_init(void)
+{
+ int cpu;
+
+/*
+ * DRAM refresh is every 64 msec. That is not enough to prevent rowhammer.
+ * Some vendors doubled the refresh rate to 32 msec, that helps a lot, but
+ * does not close the attack completely. 8 msec refresh would probably do
+ * that on almost all chips.
+ *
+ * A Thinkpad X60 can produce circa 12,200,000 cache misses a second, that's
+ * 780,800 cache misses per 64 msec window.
+ *
+ * The X60 is from a generation that is not yet vulnerable to rowhammer, and
+ * is a pretty slow machine. That means that this limit is probably very
+ * safe on newer machines.
+ */
+ int cache_misses_per_second = 12200000;
+
+/*
+ * Maximum permitted utilization of DRAM. Setting this to f will mean that
+ * when more than 1/f of maximum cache-miss performance is used, a delay will
+ * be inserted, with a similar effect on rowhammer as refreshing memory
+ * f times more often.
+ *
+ * Setting this to 8 should prevent the rowhammer attack.
+ */
+ int dram_max_utilization_factor = 8;
+
+ /*
+ * Hardware should be able to do approximately this many
+ * misses per refresh
+ */
+ int cache_miss_per_refresh = (cache_misses_per_second * dram_refresh_msec)/1000;
+
+ /*
+ * So we do not want more than this many accesses to DRAM per
+ * refresh.
+ */
+ int cache_miss_limit = cache_miss_per_refresh / dram_max_utilization_factor;
+
+/*
+ * DRAM is shared between CPUs, but these performance counters are per-CPU.
+ */
+ int max_attacking_cpus = 2;
+
+ /*
+ * We ignore counter overflows "too far away", but some of the
+ * events might have actually occurred recently. Thus an additional
+ * factor of 2.
+ */
+
+ rh_attr.sample_period = cache_miss_limit / (2*max_attacking_cpus);
+
+ printk("Rowhammer protection limit is set to %d cache misses per %d msec\n",
+ (int) rh_attr.sample_period, dram_refresh_msec);
+
+ /* XXX broken vs hotplug */
+
+ for_each_online_cpu(cpu) {
+ struct perf_event *event;
+
+ event = perf_event_create_kernel_counter(&rh_attr, cpu, NULL, rh_overflow, NULL);
+ if (IS_ERR(event)) {
+ per_cpu(rh_event, cpu) = NULL;
+ pr_err("Not enough resources to initialize nohammer on cpu %d\n", cpu);
+ continue;
+ }
+ per_cpu(rh_event, cpu) = event;
+ pr_info("Nohammer initialized on cpu %d\n", cpu);
+ }
+ return 0;
+}
+
+static __exit void rh_module_exit(void)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ struct perf_event *event = per_cpu(rh_event, cpu);
+
+ if (event)
+ perf_event_release_kernel(event);
+ }
+}
+
+module_init(rh_module_init);
+module_exit(rh_module_exit);
+
+MODULE_DESCRIPTION("Rowhammer protection");
+//MODULE_LICENSE("GPL v2+");
+MODULE_LICENSE("GPL");



2016-10-28 11:27:12

by Pavel Machek

Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

> > I agree this needs to be tunable (and with the other suggestions). But
> > this is actually not the most important tunable: the detection
> > threshold (rh_attr.sample_period) should be way more important.
>
> So being totally ignorant of the detail of how rowhammer abuses the DDR
> thing, would it make sense to trigger more often and delay shorter? Or
> is there some minimal delay required for things to settle or
> something.

We can trigger more often and delay shorter, but it will mean that
the protection will trigger with more false positives. I guess I'll
play with the constants to see how big the effect is.

BTW...

[ 6267.180092] INFO: NMI handler (perf_event_nmi_handler) took too
long to run: 63.501 msecs

but I'm doing mdelay(64). .5 msec is not a big difference, but...

Best regards,
Pavel


2016-10-28 11:55:51

by Pavel Machek

Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

> > I agree this needs to be tunable (and with the other suggestions). But
> > this is actually not the most important tunable: the detection
> > threshold (rh_attr.sample_period) should be way more important.
> >
> > And yes, this will all need to be tunable, somehow. But lets verify
> > that this works, first :-).
>
> Yeah.
>
> Btw., a 56 msec NMI delay is pretty brutal in terms of latencies - it might
> result in a smoother system to detect 100,000 cache misses and do a
> ~5.6 msecs delay instead?
>
> (Assuming the shorter threshold does not trigger too often, of
> course.)

Yeah, it is a brutal workaround for a nasty bug. The slowdown depends on the maximum utilization factor:

+/*
+ * Maximum permitted utilization of DRAM. Setting this to f will mean that
+ * when more than 1/f of maximum cache-miss performance is used, delay will
+ * be inserted, and will have similar effect on rowhammer as refreshing memory
+ * f times more often.
+ *
+ * Setting this to 8 should prevent the rowhammer attack.
+ */
+ int dram_max_utilization_factor = 8;

| test                          | no prot. | fact. 1 | fact. 2 | fact. 8 |
| linux-n900$ time ./mkit       | 1m35     | 1m47    | 2m07    | 6m37    |
| rowhammer-test (for 43200000) | 2.86     | 9.75    | 16.7307 | 59.3738 |

(With factors 1 and 2, and a 2-CPU attacker, we don't guarantee any protection.)

Best regards,
Pavel

2016-10-28 14:05:58

by Mark Rutland

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi,

On Fri, Oct 28, 2016 at 01:21:36PM +0200, Pavel Machek wrote:
> > Has this been tested on a system vulnerable to rowhammer, and if so, was
> > it reliable in mitigating the issue?
> >
> > Which particular attack codebase was it tested against?
>
> I have rowhammer-test here,
>
> commit 9824453fff76e0a3f5d1ac8200bc6c447c4fff57
> Author: Mark Seaborn <[email protected]>

... from which repo?

> I do not have vulnerable machine near me, so no "real" tests, but
> I'm pretty sure it will make the error no longer reproducible with the
> newer version. [Help welcome ;-)]

Even if we hope this works, I think we have to be very careful with that
kind of assertion. Until we have data as to its efficacy, I don't think
we should claim that this is an effective mitigation.

> > > +struct perf_event_attr rh_attr = {
> > > + .type = PERF_TYPE_HARDWARE,
> > > + .config = PERF_COUNT_HW_CACHE_MISSES,
> > > + .size = sizeof(struct perf_event_attr),
> > > + .pinned = 1,
> > > + /* FIXME: it is 1000000 per cpu. */
> > > + .sample_period = 500000,
> > > +};
> >
> > I'm not sure that this is general enough to live in core code, because:
>
> Well, I'd like to postpone debate 'where does it live' to the later
> stage. The problem is not arch-specific, the solution is not too
> arch-specific either. I believe we can use Kconfig to hide it from
> users where it does not apply. Anyway, lets decide if it works and
> where, first.

You seem to have forgotten the drammer case here, which this would not
have protected against. I'm not sure, but I suspect that we could have
similar issues with mappings using other attributes (e.g. write-through),
as these would cause the memory traffic without cache miss events.

> > * the precise semantics of performance counter events varies drastically
> > across implementations. PERF_COUNT_HW_CACHE_MISSES, might only map to
> > one particular level of cache, and/or may not be implemented on all
> > cores.
>
> If it maps to one particular cache level, we are fine (or maybe will
> trigger protection too often). If some cores are not counted, that's bad.

Perhaps, but that depends on a number of implementation details. If "too
often" means "all the time", people will turn this off when they could
otherwise have been protected (e.g. if we can accurately monitor the
last level of cache).

> > * On some implementations, it may be that the counters are not
> > interchangeable, and for those this would take away
> > PERF_COUNT_HW_CACHE_MISSES from existing users.
>
> Yup. Note that with this kind of protection, one missing performance
> counter is likely to be a small problem.

That depends. Who chooses when to turn this on? If it's down to the
distro, this can adversely affect users with perfectly safe DRAM.

> > > + /* FIXME msec per usec, reverse logic? */
> > > + if (delta < 64 * NSEC_PER_MSEC)
> > > + mdelay(56);
> > > +}
> >
> > If I round-robin my attack across CPUs, how much does this help?
>
> See below for new explanation. With 2 CPUs, we are fine. On monster
> big-little 8-core machines, we'd probably trigger protection too
> often.

We see larger core counts in mobile devices these days. In China,
octa-core phones are popular, for example. Servers go much larger.

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index e24e981..c6ffcaf 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -315,6 +315,7 @@ config PGTABLE_LEVELS
>
> source "init/Kconfig"
> source "kernel/Kconfig.freezer"
> +source "kernel/events/Kconfig"
>
> menu "Processor type and features"
>
> diff --git a/kernel/events/Kconfig b/kernel/events/Kconfig
> new file mode 100644
> index 0000000..7359427
> --- /dev/null
> +++ b/kernel/events/Kconfig
> @@ -0,0 +1,9 @@
> +config NOHAMMER
> + tristate "Rowhammer protection"
> + help
> + Enable rowhammer attack prevention. Will degrade system
> + performance under attack so much that attack should not
> + be feasible.


I think that this must make it clear that this is a best-effort approach
(i.e. it does not guarantee that an attack is not possible), and also
should make clear that said penalty may occur in other situations.

[...]

> +static struct perf_event_attr rh_attr = {
> + .type = PERF_TYPE_HARDWARE,
> + .config = PERF_COUNT_HW_CACHE_MISSES,
> + .size = sizeof(struct perf_event_attr),
> + .pinned = 1,
> + .sample_period = 10000,
> +};

What kind of overhead (just from taking the interrupt) will this come
with?

> +/*
> + * How often is the DRAM refreshed. Setting it too high is safe.
> + */

Stale comment? Given the check against delta below, this doesn't look to
be true.

> +static int dram_refresh_msec = 64;
> +
> +static DEFINE_PER_CPU(struct perf_event *, rh_event);
> +static DEFINE_PER_CPU(u64, rh_timestamp);
> +
> +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
> +{
> + u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
> + u64 now = ktime_get_mono_fast_ns();
> + s64 delta = now - *ts;
> +
> + *ts = now;
> +
> + if (delta < dram_refresh_msec * NSEC_PER_MSEC)
> + mdelay(dram_refresh_msec);
> +}

[...]

> +/*
> + * DRAM is shared between CPUs, but these performance counters are per-CPU.
> + */
> + int max_attacking_cpus = 2;

As above, many systems today have more than two CPUs. In the Drammer
paper, it looks like the majority had four.

Thanks
Mark.

2016-10-28 14:18:49

by Peter Zijlstra

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On Fri, Oct 28, 2016 at 03:05:22PM +0100, Mark Rutland wrote:
>
> > > * the precise semantics of performance counter events varies drastically
> > > across implementations. PERF_COUNT_HW_CACHE_MISSES, might only map to
> > > one particular level of cache, and/or may not be implemented on all
> > > cores.
> >
> > If it maps to one particular cache level, we are fine (or maybe will
> > trigger protection too often). If some cores are not counted, that's bad.
>
> Perhaps, but that depends on a number of implementation details. If "too
> often" means "all the time", people will turn this off when they could
> otherwise have been protected (e.g. if we can accurately monitor the
> last level of cache).

Right, so one of the things mentioned in the paper is x86 NT stores.
Those are not cached and I'm not at all sure they're accounted in the
event we use for cache misses.

2016-10-28 17:27:34

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

> On Fri, Oct 28, 2016 at 01:21:36PM +0200, Pavel Machek wrote:
> > > Has this been tested on a system vulnerable to rowhammer, and if so, was
> > > it reliable in mitigating the issue?
> > >
> > > Which particular attack codebase was it tested against?
> >
> > I have rowhammer-test here,
> >
> > commit 9824453fff76e0a3f5d1ac8200bc6c447c4fff57
> > Author: Mark Seaborn <[email protected]>
>
> ... from which repo?

https://github.com/mseaborn/rowhammer-test.git

> > I do not have vulnerable machine near me, so no "real" tests, but
> > I'm pretty sure it will make the error no longer reproducible with the
> > newer version. [Help welcome ;-)]
>
> Even if we hope this works, I think we have to be very careful with that
> kind of assertion. Until we have data as to its efficacy, I don't think
> we should claim that this is an effective mitigation.

On my hardware, rowhammer errors are not trivial to reproduce. It
takes time (minutes). I'm pretty sure this will be enough to stop the
exploit. If you have machines where rowhammer errors are really easy
to reproduce, testing on it would be welcome.

> > Well, I'd like to postpone debate 'where does it live' to the later
> > stage. The problem is not arch-specific, the solution is not too
> > arch-specific either. I believe we can use Kconfig to hide it from
> > users where it does not apply. Anyway, lets decide if it works and
> > where, first.
>
> You seem to have forgotten the drammer case here, which this would not
> have protected against. I'm not sure, but I suspect that we could have
> similar issues with mappings using other attributes (e.g write-through),
> as these would cause the memory traffic without cache miss events.

Can you get me example code for x86 or x86-64? If this is trivial to
workaround using movnt or something like that, it would be good to
know.

I did not go through the drammer paper in too great detail. They have
some kind of DMA-able memory, and they abuse it to do direct writes?
So you can "simply" stop providing DMA-able memory to the userland,
right? [Ok, bye bye accelerated graphics, I guess. But living w/o
graphics acceleration is preferable to remote root...]

OTOH... the exploit that scares me most is javascript sandbox
escape. I should be able to stop that... and other JIT escape cases
where untrusted code does not have access to special instructions.

On x86, there seems to be a "DATA_MEM_REFS" performance counter; if
cache misses do not account for movnt, this one should. Will need checking.

> Perhaps, but that depends on a number of implementation details. If "too
> often" means "all the time", people will turn this off when they could
> otherwise have been protected (e.g. if we can accurately monitor the
> last level of cache).

Yup. Doing it well is preferable to doing it badly.

> > > * On some implementations, it may be that the counters are not
> > > interchangeable, and for those this would take away
> > > PERF_COUNT_HW_CACHE_MISSES from existing users.
> >
> > Yup. Note that with this kind of protection, one missing performance
> > counter is likely to be small problem.
>
> That depends. Who chooses when to turn this on? If it's down to the
> distro, this can adversely affect users with perfectly safe DRAM.

You don't want this enabled on machines with working DRAM, there will
be performance impact.

> > > > + /* FIXME msec per usec, reverse logic? */
> > > > + if (delta < 64 * NSEC_PER_MSEC)
> > > > + mdelay(56);
> > > > +}
> > >
> > > If I round-robin my attack across CPUs, how much does this help?
> >
> > See below for new explanation. With 2 CPUs, we are fine. On monster
> > big-little 8-core machines, we'd probably trigger protection too
> > often.
>
> We see larger core counts in mobile devices these days. In China,
> octa-core phones are popular, for example. Servers go much larger.

Well, I can't help everyone :-(. On servers, there's ECC. On phones,
well, don't buy broken machines. This will work, but performance
impact will not be nice.

> > +static struct perf_event_attr rh_attr = {
...
> > + .sample_period = 10000,
> > +};
>
> What kind of overhead (just from taking the interrupt) will this come
> with?

This is not used, see below.

> > +/*
> > + * How often is the DRAM refreshed. Setting it too high is safe.
> > + */
>
> Stale comment? Given the check against delta below, this doesn't look to
> be true.

Thinko, actually. Too low is safe, AFAICT.

> > +/*
> > + * DRAM is shared between CPUs, but these performance counters are per-CPU.
> > + */
> > + int max_attacking_cpus = 2;
>
> As above, many systems today have more than two CPUs. In the drammmer
> paper, it looks like the majority had four.

We can set this automatically, and we should also take cpu hotplug
into account. But let's get it working first.

Actually in the ARM case (Drammer), it may be better to stop exploit
some other way. Turning off/redesigning GPU acceleration should work
there, right?

Best regards,
Pavel

2016-10-28 18:31:23

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On Fri 2016-10-28 16:18:40, Peter Zijlstra wrote:
> On Fri, Oct 28, 2016 at 03:05:22PM +0100, Mark Rutland wrote:
> >
> > > > * the precise semantics of performance counter events varies drastically
> > > > across implementations. PERF_COUNT_HW_CACHE_MISSES, might only map to
> > > > one particular level of cache, and/or may not be implemented on all
> > > > cores.
> > >
> > > If it maps to one particular cache level, we are fine (or maybe will
> > > trigger protection too often). If some cores are not counted, that's bad.
> >
> > Perhaps, but that depends on a number of implementation details. If "too
> > often" means "all the time", people will turn this off when they could
> > otherwise have been protected (e.g. if we can accurately monitor the
> > last level of cache).
>
> Right, so one of the things mentioned in the paper is x86 NT stores.
> Those are not cached and I'm not at all sure they're accounted in the
> event we use for cache misses.

Would you (or someone) have pointer to good documentation source on
available performance counters?

Rowhammer is normally done using reads (not writes), exploiting the fact
that you can modify memory just by reading it. But it may be possible
that writes have a similar effect, and that attacker cells can be far
enough from victim cells that it is a problem.

MOVNTDQA could be another problem, but hopefully that happens only on
memory types userland does not have access to.

Hmm, and according to a short test, movnt is not counted:

pavel@duo:/data/l/linux/tools$ sudo perf_3.16 stat
--event=cache-misses ./a.out
^C./a.out: Interrupt

Performance counter stats for './a.out':

61,271 cache-misses

11.605840031 seconds time elapsed

long long foo;

int main(void)
{
	foo = (long long)&foo;
	while (1) {
		asm volatile(
			"mov foo, %edi\n\t"
			"movnti %eax, (%edi)");
	}
}


Pavel

2016-10-28 18:48:24

by Peter Zijlstra

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On Fri, Oct 28, 2016 at 08:30:14PM +0200, Pavel Machek wrote:
> Would you (or someone) have pointer to good documentation source on
> available performance counters?

The Intel SDM has a section on them and the AMD Bios and Kernel
Developers Guide does too.

That is, they contain lists of available counters for the various parts
from these vendors and that's pretty much all there is.

2016-10-29 13:12:19

by Daniel Gruss

Subject: Re: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

I think that this idea to mitigate Rowhammer is not a good approach.

I wrote Rowhammer.js (we published a paper on that) and I had the first
reproducible bit flips on DDR4 at both, increased and default refresh
rates (published in our DRAMA paper).

We have researched the number of cache misses induced by different
applications in the past, and there are many applications that cause more
cache misses than Rowhammer (published in our Flush+Flush paper); they
just cause them on different rows.
Slowing down a system surely works, but you could also, as a mitigation
just make this CPU core run at the lowest possible frequency. That would
likely be more effective than the solution you suggest.

Now, every Rowhammer attack exploits not only the DRAM effects but also
the way the operating system organizes memory.

Some papers exploit page deduplication and disabling page deduplication
should be the default also for other reasons, such as information
disclosure attacks. If page deduplication is disabled, attacks like
Dedup est Machina and Flip Feng Shui are inherently not possible anymore.

Most other attacks target page tables (the Google exploit, Rowhammer.js,
Drammer). Now in Rowhammer.js we suggested a very simple fix, that is
just an extension of what Linux already does.
Unless out of memory, page tables and user pages are not placed in the
same 2MB region. We suggested that this behavior should be more strict
even in memory pressure situations. If the OS can only find a page table
that resides in the same 2MB region as a user page, the request should
fail instead and the process requesting it should go out of memory. More
generally, the attack surface is gone if the OS never places a page
table in proximity of less than 2MB to a user page.
That is a simple fix that does not cost any runtime performance. It
mitigates all these scary attacks and won't even incur a memory cost in
most situations.

2016-10-29 19:42:31

by Pavel Machek

Subject: Re: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

> I think that this idea to mitigate Rowhammer is not a good approach.

Well.. it does not have to be good if it is the best we have.

> I wrote Rowhammer.js (we published a paper on that) and I had the first
> reproducible bit flips on DDR4 at both, increased and default refresh rates
> (published in our DRAMA paper).

Congratulations. Now I'd like to take away your toys :-).

> We have researched the number of cache misses induced from different
> applications in the past and there are many applications that cause more
> cache misses than Rowhammer (published in our Flush+Flush paper) they just
> cause them on different rows.
> Slowing down a system surely works, but you could also, as a mitigation just
> make this CPU core run at the lowest possible frequency. That would likely
> be more effective than the solution you suggest.

Not in my testing. First, I'm not at all sure the lowest CPU speed would
make any difference at all (even a CPU at the lowest clock is way faster
than DRAM). Second, going to the lowest clock speed will reduce
performance.

[But if you can test it and it works... it would be nice to know. It
is very simple to implement w/o kernel changes.]

> Now, every Rowhammer attack exploits not only the DRAM effects but also the
> way the operating system organizes memory.
>
> Some papers exploit page deduplication and disabling page deduplication
> should be the default also for other reasons, such as information disclosure
> attacks. If page deduplication is disabled, attacks like Dedup est Machina
> and Flip Feng Shui are inherently not possible anymore.

No, sorry, not going to play this particular whack-a-mole game. Linux
is designed for working hardware, and with bit flips, something is
going to break. (Does Flip Feng Shui really depend on dedup?)

> Most other attacks target page tables (the Google exploit, Rowhammer.js,
> Drammer). Now in Rowhammer.js we suggested a very simple fix, that is just
> an extension of what Linux already does.
> Unless out of memory page tables and user pages are not placed in the same
> 2MB region. We suggested that this behavior should be more strict even in
> memory pressure situations. If the OS can only find a page table that
> resides in the same 2MB region as a user page, the request should fail
> instead and the process requesting it should go out of memory. More
> generally, the attack surface is gone if the OS never places a page table in
> proximity of less than 2MB to a user page.

But it will be nowhere near complete fix, right?

Will fix user attacking kernel, but not user1 attacking user2. You
could put each "user" into separate 2MB region, but then you'd have to
track who needs to go where. (Same uid is not enough, probably "can
ptrace"?)

But more importantly....

That'll still let a remote server gain permissions of a local user running
a web browser... using a javascript exploit, right? And that's actually
the attack that I find most scary. A local user to root exploit is bad, but
getting permissions of the web browser from a remote web server is very,
very, very bad.

> That is a simple fix that does not cost any runtime performance.

Simple? Not really, I'm afraid. Feel free to try to implement it.

Best regards,

Pavel


2016-10-29 20:05:21

by Daniel Gruss

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On 29.10.2016 21:42, Pavel Machek wrote:
> Congratulations. Now I'd like to take away your toys :-).

I would like you to do that, but I'm very confident you won't be
successful the way you're starting ;)

> Not in my testing.

Have you tried music/video reencoding? Games? Anything that works with a
decent amount of memory but not too much hard disk i/o?
Numbers are very clear there...

> First, I'm not at all sure lowest CPU speed would
> make any difference at all

It would. I've seen many bitflips but none where the CPU operated in the
lower frequency range.

> Second, going to lowest clock speed will reduce performance

As does the countermeasure you propose...

> No, sorry, not going to play this particular whack-a-mole game.

But you are already with the countermeasure you propose...

> Linux is designed for working hardware, and with bit flips, something is
> going to break. (Does Flip Feng Shui really depend on dedup?)

Deduplication should be disabled not because of bit flips but because of
information leakage (deduplication attacks, cache side-channel attacks, ...)

Yes, Flip Feng Shui requires deduplication and does not work without.
Disabling deduplication is what the authors recommend as a countermeasure.

> But it will be nowhere near complete fix, right?
>
> Will fix user attacking kernel, but not user1 attacking user2. You
> could put each "user" into separate 2MB region, but then you'd have to
> track who needs go go where. (Same uid is not enough, probably "can
> ptrace"?)

Exactly. But preventing user2kernel is already a good start, and you
would prevent that without any doubt and without any cost.

user2user is something else to think about and more complicated because
you have shared libraries + copy on write --> same problems as
deduplication. I think it might make sense to discuss whether separating
by uids or even pids would be viable.

> That'll still let remote server gain permissons of local user running
> web server... using javascript exploit right? And that's actually
> attack that I find most scary. Local user to root exploit is bad, but
> getting permissions of web browser from remote web server is very,
> very, very bad.

Rowhammer.js skips the browser... it goes JS to full phys. memory
access. Anyway, preventing Rowhammer from JS should be easy because even
the slightest slow down should be enough to prevent any Rowhammer attack
from JS.

>> That is a simple fix that does not cost any runtime performance.
>
> Simple? Not really, I'm afraid. Feel free to try to implement it.

I had a student who already implemented this in another OS; I'm
confident it can be done in Linux as well...


Cheers,
Daniel

2016-10-29 21:06:11

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

On Sat 2016-10-29 22:05:16, Daniel Gruss wrote:
> On 29.10.2016 21:42, Pavel Machek wrote:
> >Congratulations. Now I'd like to take away your toys :-).
>
> I'm would like you to do that, but I'm very confident you're not successful
> the way your starting ;)

:-). Lets see.

> >Not in my testing.
>
> Have you tried music/video reencoding? Games? Anything that works with a
> decent amount of memory but not too much hard disk i/o?
> Numbers are very clear there...

So far I did bzip2 and kernel compilation. I believe I can prevent
flips in rowhammer-test with bzip2 going from 4 seconds to 5
seconds... let me see.

If you have simple test that you'd like me to try, speak up. Best if
it takes cca 10 seconds to run.

> >First, I'm not at all sure lowest CPU speed would
> >make any difference at all
>
> It would. I've seen many bitflips but none where the CPU operated in the
> lower frequency range.

Ok, let me try that. Problem is that the machine I'm testing on takes
20 minutes to produce a bit flip...

> >Second, going to lowest clock speed will reduce performance
>
> As does the countermeasure you propose...

Yes. But hopefully not quite _as_ drastically. (going to lowest clock
would make bzip2 go from 4 to 12 seconds or so, right?)

> Yes, Flip Feng Shui requires deduplication and does not work without.
> Disabling deduplication is what the authors recommend as a
> countermeasure.

Ok, Flip Feng Shui is easy, then. :-).

> >But it will be nowhere near complete fix, right?
> >
> >Will fix user attacking kernel, but not user1 attacking user2. You
> >could put each "user" into separate 2MB region, but then you'd have to
> >track who needs go go where. (Same uid is not enough, probably "can
> >ptrace"?)
>
> Exactly. But preventing user2kernel is already a good start, and you would
> prevent that without any doubt and without any cost.

Well, it is only good start if the result is mergeable, and can be
used to prevent all attacks we care about.

> >That'll still let remote server gain permissons of local user running
> >web server... using javascript exploit right? And that's actually
> >attack that I find most scary. Local user to root exploit is bad, but
> >getting permissions of web browser from remote web server is very,
> >very, very bad.
>
> Rowhammer.js skips the browser... it goes JS to full phys. memory access.
> Anyway, preventing Rowhammer from JS should be easy because even the
> slightest slow down should be enough to prevent any Rowhammer attack from
> JS.

Are you sure? How much slowdown is enough to prevent the attack? (And
can I get patched chromium? Patched JVM? Patched qemu?) Dunno.. are
only just in time compilers affected? Or can I get for example pdf
document that does all the wrong memory accesses during rendering,
triggering buffer overrun in xpdf and arbitrary code execution?

Running userland on non-working machine is scary :-(.

Shall we introduce new syscall "get_mandatory_jit_slowdown()"?

I'd like a kernel patch that works around the rowhammer problem... in
the kernel. I'm willing to accept some slowdown (say, from 4 to 6
seconds for common tasks). I'd prefer the solution to be contained in
the kernel, presenting a working (but slower) machine to userspace. I
believe I can do that.
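
As a rough illustration of that plan, here is a sketch of the in-kernel
counter setup, modeled on the kernel/watchdog.c:wd_hw_attr pattern Peter
pointed at earlier in the thread. It is illustrative only: the sample
period and the 50 msec delay are made-up numbers, the rh_* names are
mine, a real patch needs per-CPU bookkeeping, teardown and error
handling, and whether a long mdelay() is acceptable from the PMU
interrupt context is one of the open questions.

```c
#include <linux/perf_event.h>
#include <linux/delay.h>
#include <linux/err.h>

static struct perf_event_attr rh_attr = {
	.type		= PERF_TYPE_HARDWARE,
	.config		= PERF_COUNT_HW_CACHE_MISSES,
	.size		= sizeof(struct perf_event_attr),
	.pinned		= 1,
	.sample_period	= 10000,	/* interrupt every 10k misses; tune me */
};

/* Called from the PMU interrupt once sample_period misses have occurred. */
static void rh_overflow(struct perf_event *event,
			struct perf_sample_data *data, struct pt_regs *regs)
{
	/*
	 * Too many cache misses in too short a time: stall this CPU so
	 * that fewer than the dangerous number of row activations can
	 * fit into one 64 msec refresh interval.
	 */
	mdelay(50);
}

static int rh_start_cpu(int cpu)
{
	struct perf_event *ev;

	ev = perf_event_create_kernel_counter(&rh_attr, cpu, NULL,
					      rh_overflow, NULL);
	return IS_ERR(ev) ? PTR_ERR(ev) : 0;
}
```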

> >>That is a simple fix that does not cost any runtime performance.
> >
> >Simple? Not really, I'm afraid. Feel free to try to implement it.
>
> I had a student who already implemented this in another OS, I'm confident it
> can be done in Linux as well...

Well, I'm not saying it's impossible. But I'd like to see the
implementation. It's definitely more work than nohammer.c. An order of
magnitude more, at least.

But yes, it will help with side channel attacks, etc. So yes, I'd like
to see the patch.

Best regards,

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-10-29 21:08:04

by Daniel Gruss

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On 29.10.2016 23:05, Pavel Machek wrote:
> So far I did bzip2 and kernel compilation. I believe I can prevent
> flips in rowhammer-test with bzip2 going from 4 seconds to 5
> seconds... let me see.

can you prevent bitflips in this one?
https://github.com/IAIK/rowhammerjs/tree/master/native

> Ok, let me try that. Problem is that the machine I'm testing on takes
> 20 minutes to produce bit flip...

will be lots faster with my code above ;)

2016-10-29 21:45:46

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On Sat 2016-10-29 23:07:59, Daniel Gruss wrote:
> On 29.10.2016 23:05, Pavel Machek wrote:
> >So far I did bzip2 and kernel compilation. I believe I can prevent
> >flips in rowhammer-test with bzip2 going from 4 seconds to 5
> >seconds... let me see.
>
> can you prevent bitflips in this one?
> https://github.com/IAIK/rowhammerjs/tree/master/native

Thanks for the pointer. Unfortunately, my test machine has a 64-bit
kernel but a 32-bit userland, so I can't compile it:

g++ -g -pthread -std=c++11 -O3 -o rowhammer rowhammer.cc
rowhammer.cc: In function ‘int main(int, char**)’:
rowhammer.cc:243:57: error: inconsistent operand constraints in an
‘asm’
asm volatile ("rdtscp" : "=a" (a), "=d" (d) : : "rcx");

I tried g++ -m64, but that does not seem to work here at all. I'll try
to find some way to compile it during the week.
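
For what it's worth, the constraint error comes from the "rcx" clobber,
which only names a valid register in 64-bit code. A hedged sketch of a
timestamp helper that builds in both modes might look like this
(`timestamp` is a hypothetical name; the key change is taking the
IA32_TSC_AUX value in "=c" as an output operand instead of clobbering
rcx, with a clock_gettime() fallback off x86):

```c
#include <stdint.h>
#include <time.h>

/*
 * Timestamp helper. On x86, rdtscp writes edx:eax (the TSC) and ecx
 * (the TSC_AUX value); listing "=c" as an output is legal in both 32-
 * and 64-bit mode, unlike a "rcx" clobber. Elsewhere, fall back to
 * CLOCK_MONOTONIC in nanoseconds.
 */
static uint64_t timestamp(void)
{
#if defined(__i386__) || defined(__x86_64__)
	uint32_t lo, hi, aux;

	__asm__ volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
	return ((uint64_t)hi << 32) | lo;
#else
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

Note that rdtscp itself is comparatively recent; a Core 2 era CPU such
as the E7400 may not support it at all, in which case plain rdtsc
(optionally preceded by a serializing cpuid) is the usual substitute.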

(BTW any idea which version would be right for this cpu?

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz
stepping : 10

It's Wolfdale-3M according to Wikipedia... that seems older than
ivy/sandy/haswell/skylake, so I'll just use the generic version...?)

> >Ok, let me try that. Problem is that the machine I'm testing on takes
> >20 minutes to produce bit flip...
>
> will be lots faster with my code above ;)

Yes, that will help :-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-10-29 21:50:01

by Daniel Gruss

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On 29.10.2016 23:45, Pavel Machek wrote:
> ivy/sandy/haswell/skylake, so I'll just use the generic version...?)

yes, generic might work, but i never tested it on anything that old...

on my system i have >30 bit flips per second (ivy bridge i5-3xxx) with
the rowhammer-ivy test... sometimes even more than 100 per second...

2016-10-29 22:01:38

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On Sat 2016-10-29 23:49:57, Daniel Gruss wrote:
> On 29.10.2016 23:45, Pavel Machek wrote:
> >ivy/sandy/haswell/skylake, so I'll just use the generic version...?)
>
> yes, generic might work, but i never tested it on anything that old...
>
> on my system i have >30 bit flips per second (ivy bridge i5-3xxx) with the
> rowhammer-ivy test... sometimes even more than 100 per second...

Hmm, maybe I'm glad I don't have a new machine :-).

I assume you still get _some_ bitflips with generic "rowhammer"?

Best regards,
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-10-29 22:02:50

by Daniel Gruss

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On 30.10.2016 00:01, Pavel Machek wrote:
> Hmm, maybe I'm glad I don't have a new machine :-).
>
> I assume you still get _some_ bitflips with generic "rowhammer"?

1 or 2 every 20-30 minutes...

2016-10-31 08:27:10

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

> On Fri, Oct 28, 2016 at 01:21:36PM +0200, Pavel Machek wrote:
> > > Has this been tested on a system vulnerable to rowhammer, and if so, was
> > > it reliable in mitigating the issue?
> > >
> > > Which particular attack codebase was it tested against?
> >
> > I have rowhammer-test here,
> >
> > commit 9824453fff76e0a3f5d1ac8200bc6c447c4fff57
> > Author: Mark Seaborn <[email protected]>
>
> ... from which repo?
>
> > I do not have vulnerable machine near me, so no "real" tests, but
> > I'm pretty sure it will make the error no longer reproducible with the
> > newer version. [Help welcome ;-)]
>
> Even if we hope this works, I think we have to be very careful with that
> kind of assertion. Until we have data as to its efficacy, I don't think
> we should claim that this is an effective mitigation.

Ok, so it turns out I was right. On my vulnerable machine, the bug is
normally reproducible in less than 500 iterations:

Iteration 432 (after 1013.31s)
error at 0xda7cf280: got 0xffffffffffffffef
Iteration 446 (after 1102.56s)
error at 0xec21ea00: got 0xffffffefffffffff
Iteration 206 (after 497.50s)
error at 0xd07d1438: got 0xffffffffffffffdf
Iteration 409 (after 1350.96s)
error at 0xbd3b9108: got 0xefffffffffffffff
Iteration 120 (after 326.08s)
error at 0xe398c438: got 0xffffffffffffffdf

With nohammer, I'm at 2300 iterations, and still no faults.

Daniel Gruss <[email protected]> claims he has an attack that can do 30
flips a second on modern hardware. I'm not going to buy broken
hardware just for a test. Code is at
https://github.com/IAIK/rowhammerjs/tree/master/native . Would someone
be willing to get it running on a vulnerable machine and test kernel
patches?

Thanks,

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-10-31 14:47:49

by Mark Rutland

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On Mon, Oct 31, 2016 at 09:27:05AM +0100, Pavel Machek wrote:
> > On Fri, Oct 28, 2016 at 01:21:36PM +0200, Pavel Machek wrote:
> > > > Has this been tested on a system vulnerable to rowhammer, and if so, was
> > > > it reliable in mitigating the issue?

> > > I do not have vulnerable machine near me, so no "real" tests, but
> > > I'm pretty sure it will make the error no longer reproducible with the
> > > newer version. [Help welcome ;-)]
> >
> > Even if we hope this works, I think we have to be very careful with that
> > kind of assertion. Until we have data as to its efficacy, I don't think
> > we should claim that this is an effective mitigation.
>
> Ok, so it turns out I was right. On my vulnerable machine, normally
> bug is reproducible in less than 500 iterations:

> With nohammer, I'm at 2300 iterations, and still no faults.

To be quite frank, this is anecdotal. It only shows that one particular attack is
made slower (or perhaps defeated), and doesn't show that the mitigation is
reliable or generally applicable (to other machines or other variants of the
attack).

Even if this happens to work on some machines, I still do not think one can
sell this as a generally applicable and reliable mitigation. Especially given
that others working in this area seem to have evidence otherwise, e.g. [1] (as
noted by spender in the LWN comments).

Thanks,
Mark.

[1] https://twitter.com/halvarflake/status/792314613568311296

2016-10-31 21:13:08

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On Mon 2016-10-31 14:47:39, Mark Rutland wrote:
> On Mon, Oct 31, 2016 at 09:27:05AM +0100, Pavel Machek wrote:
> > > On Fri, Oct 28, 2016 at 01:21:36PM +0200, Pavel Machek wrote:
> > > > > Has this been tested on a system vulnerable to rowhammer, and if so, was
> > > > > it reliable in mitigating the issue?
>
> > > > I do not have vulnerable machine near me, so no "real" tests, but
> > > > I'm pretty sure it will make the error no longer reproducible with the
> > > > newer version. [Help welcome ;-)]
> > >
> > > Even if we hope this works, I think we have to be very careful with that
> > > kind of assertion. Until we have data as to its efficacy, I don't think
> > > we should claim that this is an effective mitigation.
...
>
> To be quite frank, this is anecdotal. It only shows one particular attack is
> made slower (or perhaps defeated), and doesn't show that the mitigation is
> reliable or generally applicable (to other machines or other variants of the
> attack).

So... I said that I'm pretty sure it will fix the problem in my testing,
then you say that I should be careful with my words, I confirm it was
true, and now you complain that it is anecdotal?

Are you serious?

Of course I know that fixing rowhammer-test on my machine is quite a
low bar to ask. _And that's also why I said I'm pretty sure I'd pass
that bar_.

I'm still asking for help with testing, but all you do is claim that
"we can't be sure".

> Even if this happens to work on some machines, I still do not think one can
> sell this as a generally applicable and reliable mitigation. Especially given
> that others working in this area seem to have evidence otherwise, e.g. [1] (as
> noted by spender in the LWN comments).

Slowing this attack _is_ defeating it. It is enough to slow it 8
times, and it is gone, boom, not there any more.

Now... I have to figure out what to do with movnt. No currently known
attack uses movnt. Still, that one should be solved.

Other than that... this is not magic. The attack is quite well
understood. All you have to do is prevent more than 8 msec worth of
memory accesses per 64 msec refresh window. My patch can do that, and
it will work everywhere... you just won't like the fact that your
machine now works at 10% of its original performance.

Now, it is possible that researchers will come up with an attack that
only needs 2 msec worth of accesses. So we change the constants.
Performance will be even worse. It is also possible that even more
broken DRAM comes out. Same solution. Plus someone certainly has memory
that flips some bits even without help from funny access patterns. Too
bad. We can't help them.

Would it be less confusing if we redefined the task description from
"prevent rowhammer" to "prevent more than X memory accesses in 64
msec"?

Best regards,
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-10-31 22:09:17

by Mark Rutland

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On Mon, Oct 31, 2016 at 10:13:03PM +0100, Pavel Machek wrote:
> On Mon 2016-10-31 14:47:39, Mark Rutland wrote:
> > On Mon, Oct 31, 2016 at 09:27:05AM +0100, Pavel Machek wrote:
> > > > On Fri, Oct 28, 2016 at 01:21:36PM +0200, Pavel Machek wrote:
> > > > > > Has this been tested on a system vulnerable to rowhammer, and if so, was
> > > > > > it reliable in mitigating the issue?
> >
> > > > > I do not have vulnerable machine near me, so no "real" tests, but
> > > > > I'm pretty sure it will make the error no longer reproducible with the
> > > > > newer version. [Help welcome ;-)]
> > > >
> > > > Even if we hope this works, I think we have to be very careful with that
> > > > kind of assertion. Until we have data as to its efficacy, I don't think
> > > > we should claim that this is an effective mitigation.
> ...
> >
> > To be quite frank, this is anecdotal. It only shows one particular attack is
> > made slower (or perhaps defeated), and doesn't show that the mitigation is
> > reliable or generally applicable (to other machines or other variants of the
> > attack).
>
> So... I said that I'm pretty sure it will fix the problem in my testing,
> then you say that I should be careful with my words, I confirm it was
> true, and now you complain that it is anecdotal?

Clearly I have chosen my words poorly here. I believe that this may help
against some attacks on some machines and workloads, and I believe your results
for your machine.

My main concern was that this appears to be described as a general solution, as
in the Kconfig text:

Enable rowhammer attack prevention. Will degrade system
performance under attack so much that attack should not
be feasible.

... yet there are a number of reasons why this may not be the case given varied
attack mechanisms (e.g. using non-cacheable mappings, movnt, etc), given some
hardware configurations (e.g. "large" SMP machines or where timing is
marginal), given some workloads may incidentally trip often enough to be
severely penalised, and given that performance counter support is sufficiently
varied (across architectures, CPU implementations, and even boards using the
same CPU if one considers things like interrupt routing).

Given that, I think that makes an overly-strong, and perhaps misleading claim
(i.e. people could turn the option on and believe that they are protected, when
they are not, leaving them worse off). It isn't really possible to fail
gracefully here, and even if this is suitable for some hardware, very few
people are in a position to determine whether their hardware falls in that
category.

Unfortunately, I do not believe that there is a simple and/or general software
mitigation.

> Would it be less confusing if we redefined task description from
> "prevent rowhammer" to "prevent more than X memory accesses in 64
> msec"?

Definitely. Quantifying exactly what you're trying to defend against (and
therefore what you are not) would help to address at least one of my concerns.

Thanks,
Mark.

2016-11-01 06:34:05

by Ingo Molnar

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]


* Pavel Machek <[email protected]> wrote:

> I'm not going to buy broken hardware just for a test.

Can you suggest a method to find heavily rowhammer-affected hardware? Only by
testing it, or are there some chipset ID ranges or dmidecode info that will
pinpoint potentially affected machines?

Thanks,

Ingo

2016-11-01 07:20:34

by Daniel Micay

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On Tue, 2016-11-01 at 07:33 +0100, Ingo Molnar wrote:
> * Pavel Machek <[email protected]> wrote:
>
> > I'm not going to buy broken hardware just for a test.
>
> Can you suggest a method to find heavily rowhammer-affected hardware?
> Only by testing it, or are there some chipset ID ranges or dmidecode
> info that will pinpoint potentially affected machines?
>
> Thanks,
>
> Ingo

You can read the memory timing values, but you can't know if they're
reasonable for that hardware. Higher quality memory can have better
timings without being broken. The only relevant information would be the
memory model, combined with an expensive and time-consuming effort to
build a blacklist based on testing. It doesn't seem realistic, unless
it's done in a coarse way based on brand and date information.

I don't know how to get this data on Linux. The CPU-Z tool for Windows
knows how to obtain it but it's based on a proprietary library.

You definitely don't need to buy broken hardware to test a broken
hardware setup though. You just need a custom computer build whose
motherboard exposes the memory timing configuration. You can make it
more vulnerable by raising the refresh period (tREF). I wanted to play
around with that but haven't gotten around to it.



2016-11-01 07:54:00

by Daniel Gruss

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On 01.11.2016 07:33, Ingo Molnar wrote:
> Can you suggest a method to find heavily rowhammer-affected hardware? Only by
> testing it, or are there some chipset ID ranges or dmidecode info that will
> pinpoint potentially affected machines?

I have worked with many different systems both on running rowhammer
attacks and testing defense mechanisms. So far, every Ivy Bridge i5
(DDR3) that I had access to was susceptible to bit flips; you will have
the highest chances with an Ivy Bridge i5...

2016-11-01 08:10:51

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

> * Pavel Machek <[email protected]> wrote:
>
> > I'm not going to buy broken hardware just for a test.
>
> Can you suggest a method to find heavily rowhammer-affected hardware? Only by
> testing it, or are there some chipset ID ranges or dmidecode info that will
> pinpoint potentially affected machines?

Testing can be used. https://github.com/mseaborn/rowhammer-test.git
. It finds faults on 1 of 2 machines here (but takes half an
hour). Then, if your hardware is one of ivy/sandy/haswell/skylake,
https://github.com/IAIK/rowhammerjs.git can be used for much faster
attack (many flips a second).

Unfortunately, what I have here is:

cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz
stepping : 10
microcode : 0xa07

so rowhammerjs/native is not available for this system. The bit mapping
for memory hash functions would need to be reverse engineered for a
more effective attack.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2016-11-01 08:13:26

by Daniel Gruss

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

On 01.11.2016 09:10, Pavel Machek wrote:
> cpu family : 6
> model : 23
> model name : Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz
> stepping : 10
> microcode : 0xa07
>
> so rowhammerjs/native is not available for this system. Bit mapping
> for memory hash functions would need to be reverse engineered for more
> effective attack.

By coincidence, we wrote a tool to do that in software:
https://github.com/IAIK/drama ;)

2016-11-02 18:13:56

by Pavel Machek

Subject: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

Hi!

> On Fri, Oct 28, 2016 at 03:05:22PM +0100, Mark Rutland wrote:
> >
> > > > * the precise semantics of performance counter events varies drastically
> > > > across implementations. PERF_COUNT_HW_CACHE_MISSES, might only map to
> > > > one particular level of cache, and/or may not be implemented on all
> > > > cores.
> > >
> > > If it maps to one particular cache level, we are fine (or maybe will
> > > trigger protection too often). If some cores are not counted, that's bad.
> >
> > Perhaps, but that depends on a number of implementation details. If "too
> > often" means "all the time", people will turn this off when they could
> > otherwise have been protected (e.g. if we can accurately monitor the
> > last level of cache).
>
> Right, so one of the things mentioned in the paper is x86 NT stores.
> Those are not cached and I'm not at all sure they're accounted in the
> event we use for cache misses.

Well, I tried this... and the movnti is as fast as plain mov. Clearly
it is being cached here.

I guess we could switch to a different performance counter, such as

+	[PERF_COUNT_HW_BUS_CYCLES]	= 0xc06f, /* Non halted bus cycles: 0x013c */

if NT stores are indeed a problem. But so far I don't have any
indication they are, so I'd like to have a working example to test
against. (It does not have to produce bitflips, it would be enough to
produce enough memory traffic bypassing cache.)

Best regards,
Pavel

/*
 * gcc -O2 rowhammer.c -o rowhammer
 *
 * Check whether a movnti store is actually slower than a plain mov,
 * i.e. whether the non-temporal store really bypasses the cache.
 */

char pad[1024];		/* keep foo away from other variables' cache lines */
long long foo;
char pad2[1024];

int main(void)
{
	long long i;

	/* Flush foo from the cache before the measurement. */
	asm volatile(
		"mov $foo, %%edi \n\
		clflush (%%edi)" ::: "%edi", "memory");

	for (i = 0; i < 1000000000; i++) {
#if 1
		/* Non-temporal store to foo on every iteration. */
		asm volatile(
			"mov $foo, %%edi \n\
			movnti %%eax, (%%edi)" ::: "%edi", "memory");
#endif
		/* Empty loop body, for the baseline measurement: */
		// asm volatile( "" );
	}
	return 0;
}
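
One possible explanation for movnti looking as fast as a plain mov here
(this is my assumption, not something established in the thread):
repeated non-temporal stores to a single address can collapse in the
CPU's write-combining buffer, so they need not generate per-iteration
DRAM traffic at all. A sketch that actually streams past the cache
sweeps a whole buffer and fences afterwards (`nt_fill` is a
hypothetical helper; the SSE2 path assumes x86, with plain stores as
the fallback):

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Fill buf[0..n) using non-temporal stores where available. */
static void nt_fill(uint32_t *buf, size_t n, uint32_t val)
{
	size_t i;

#if defined(__SSE2__)
	for (i = 0; i < n; i++)
		_mm_stream_si32((int *)&buf[i], (int)val);
	_mm_sfence();	/* make the NT stores globally visible */
#else
	for (i = 0; i < n; i++)
		buf[i] = val;
#endif
}
```

Sweeping a buffer larger than the last-level cache with this, rather
than hammering one address, should make it clear whether the miss
counter sees NT-store traffic.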


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

