Hi!
It seems that it is easy to induce DRAM bit errors by doing repeated
reads from adjacent memory cells on common hw. Details are at
https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf
Older memory modules seem to work better, and ECC should detect
this. The paper has an inner loop that should trigger this.
Workarounds seem to be at the hardware level, and tricky, too.
Does anyone have an implementation of a detector? Any ideas how to
work around it in software?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
(I added the original researchers to the list.)
I see you have an FPGA-based detector, and probably a PC-based detector,
too. Would it be possible to share the sources of the PC-based one?
Thanks,
Pavel
On Wed 2014-12-24 17:38:23, Pavel Machek wrote:
> Hi!
>
> It seems that it is easy to induce DRAM bit errors by doing repeated
> reads from adjacent memory cells on common hw. Details are at
>
> https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf
>
> . Older memory modules seem to work better, and ECC should detect
> this. Paper has inner loop that should trigger this.
>
> Workarounds seem to be at hardware level, and tricky, too.
>
> Does anyone have implementation of detector? Any ideas how to work
> around it in software?
>
> Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Wed, Dec 24, 2014 at 8:38 AM, Pavel Machek <[email protected]> wrote:
> Hi!
>
> It seems that it is easy to induce DRAM bit errors by doing repeated
> reads from adjacent memory cells on common hw. Details are at
>
> https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf
>
> . Older memory modules seem to work better, and ECC should detect
> this. Paper has inner loop that should trigger this.
>
> Workarounds seem to be at hardware level, and tricky, too.
One mostly-effective solution would be to stop buying computers
without ECC. Unfortunately, no one seems to sell non-server chips
that can do ECC.
>
> Does anyone have implementation of detector? Any ideas how to work
> around it in software?
>
Platform-dependent page coloring with very strict, and impossible to
implement fully correctly, page allocation constraints?
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
On Wed 2014-12-24 09:13:32, Andy Lutomirski wrote:
> On Wed, Dec 24, 2014 at 8:38 AM, Pavel Machek <[email protected]> wrote:
> > Hi!
> >
> > It seems that it is easy to induce DRAM bit errors by doing repeated
> > reads from adjacent memory cells on common hw. Details are at
> >
> > https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf
> >
> > . Older memory modules seem to work better, and ECC should detect
> > this. Paper has inner loop that should trigger this.
> >
> > Workarounds seem to be at hardware level, and tricky, too.
>
> One mostly-effective solution would be to stop buying computers
> without ECC. Unfortunately, no one seems to sell non-server chips
> that can do ECC.
Or keep using old computers :-).
> > Does anyone have implementation of detector? Any ideas how to work
> > around it in software?
> >
>
> Platform-dependent page coloring with very strict, and impossible to
> implement fully correctly, page allocation constraints?
This seems to be at cacheline level, not at page level, if I
understand it correctly.
So the problem is: I have something mapped read-only, and I can
still cause bitflips in it.
Hmm. So it is pretty obviously a security problem, no need for
Java. Just flip some bits in a binary that root is going to run, and
it will crash for him. You can map binaries read-only, so you have
enough access.
As far as I understand it, the attached program could reproduce it on
affected machines?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Wed, Dec 24, 2014 at 9:25 AM, Pavel Machek <[email protected]> wrote:
> On Wed 2014-12-24 09:13:32, Andy Lutomirski wrote:
>> On Wed, Dec 24, 2014 at 8:38 AM, Pavel Machek <[email protected]> wrote:
>> > Hi!
>> >
>> > It seems that it is easy to induce DRAM bit errors by doing repeated
>> > reads from adjacent memory cells on common hw. Details are at
>> >
>> > https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf
>> >
>> > . Older memory modules seem to work better, and ECC should detect
>> > this. Paper has inner loop that should trigger this.
>> >
>> > Workarounds seem to be at hardware level, and tricky, too.
>>
>> One mostly-effective solution would be to stop buying computers
>> without ECC. Unfortunately, no one seems to sell non-server chips
>> that can do ECC.
>
> Or keep using old computers :-).
>
>> > Does anyone have implementation of detector? Any ideas how to work
>> > around it in software?
>> >
>>
>> Platform-dependent page coloring with very strict, and impossible to
>> implement fully correctly, page allocation constraints?
>
> This seems to be at cacheline level, not at page level, if I
> understand it correctly.
>
> So the problem is: I have something mapped read-only, and I can
> still cause bitflips in it.
>
> Hmm. So it is pretty obviously a security problem, no need for
> java. Just do some bit flips in binary root is going to run, and it
> will crash for him. You can map binaries read-only, so you have enough
> access.
Right. So we're mostly screwed.
>
> As far as I understand it, attached program could reproduce it on
> affected machines?
I thought that article suggested using addresses 8M (is that 8
megabytes?) apart for the two accesses.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
On Wed 2014-12-24 09:38:22, Andy Lutomirski wrote:
> On Wed, Dec 24, 2014 at 9:25 AM, Pavel Machek <[email protected]> wrote:
> > On Wed 2014-12-24 09:13:32, Andy Lutomirski wrote:
> >> On Wed, Dec 24, 2014 at 8:38 AM, Pavel Machek <[email protected]> wrote:
> >> > Hi!
> >> >
> >> > It seems that it is easy to induce DRAM bit errors by doing repeated
> >> > reads from adjacent memory cells on common hw. Details are at
> >> >
> >> > https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf
> >> >
> >> > . Older memory modules seem to work better, and ECC should detect
> >> > this. Paper has inner loop that should trigger this.
> >> >
> >> > Workarounds seem to be at hardware level, and tricky, too.
> >>
> >> One mostly-effective solution would be to stop buying computers
> >> without ECC. Unfortunately, no one seems to sell non-server chips
> >> that can do ECC.
> >
> > Or keep using old computers :-).
> >
> >> > Does anyone have implementation of detector? Any ideas how to work
> >> > around it in software?
> >> >
> >>
> >> Platform-dependent page coloring with very strict, and impossible to
> >> implement fully correctly, page allocation constraints?
> >
> > This seems to be at cacheline level, not at page level, if I
> > understand it correctly.
> >
> > So the problem is: I have something mapped read-only, and I can
> > still cause bitflips in it.
> >
> > Hmm. So it is pretty obviously a security problem, no need for
> > java. Just do some bit flips in binary root is going to run, and it
> > will crash for him. You can map binaries read-only, so you have enough
> > access.
> Right. So we're mostly screwed.
Well... We could periodically scrub (every few milliseconds) pages
mapped to userspace. We might be able to do some magic and disallow
cache flushes for userspace programs. We might be able to use
performance metrics to detect heavy readers.
We might be able to reprogram the DRAM controller to refresh more often.
Or we may switch to AMD systems as they seem to be less susceptible
:-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
> Try this test program: https://github.com/mseaborn/rowhammer-test
>
> It has reproduced bit flips on various machines.
>
> Your program won't be an effective test because you're just hammering
> addresses x and x+64, which will typically be in the same row of
> DRAM.
Yep, I found out I was wrong in the meantime.
> For the test to be effective, you have to pick addresses that are in
> different rows but in the same bank. A good way of doing that is just to
> pick random pairs of addresses (as the test program above does). If the
> machine has 16 banks of DRAM (as many of the machines I've tested on do),
> there will be a 1/16 chance that the two addresses are in the same
> bank.
How long does it normally take to reproduce something on a bad machine?
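The 1/16 figure quoted above can be sanity-checked with a small simulation. This is only a sketch under a simplifying assumption: each address is assigned a uniformly random bank, which ignores the real, chipset-specific address-to-bank mapping. The function names are made up for illustration. Landing in the same bank is necessary but not sufficient for a flip, so this only bounds how many random pairs are wasted, not how long a reproduction takes.

```python
import random

def same_bank_fraction(num_banks=16, trials=100_000, seed=0):
    """Estimate how often two random addresses fall into the same bank.

    Toy model (assumption): every address gets a uniformly random bank.
    """
    rng = random.Random(seed)
    hits = sum(rng.randrange(num_banks) == rng.randrange(num_banks)
               for _ in range(trials))
    return hits / trials

def expected_pairs_until_hit(num_banks=16):
    """Mean of a geometric distribution with success probability 1/num_banks."""
    return float(num_banks)

if __name__ == "__main__":
    print(same_bank_fraction())        # close to 1/16 = 0.0625
    print(expected_pairs_until_hit())  # 16.0
```

With 16 banks, roughly one random pair in 16 is usable, so on average about 16 pairs have to be tried before one shares a bank.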
> [Replying off-list just because I'm not subscribed to lkml and only saw
> this thread via the web, but feel free to reply on the list. :-) ]
Will do. (Actually, it is ok to reply to lkml even if you are not
subscribed; lkml is an open list.)
In the meantime, I created a test that actually uses physical memory,
8MB apart, as described in a footnote. It is attached. It should
work, but it needs a boot with specific config options and specific
kernel parameters.
[Unfortunately, I don't have a new-enough machine handy.]
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Wed 2014-12-24 11:47:50, Mark Seaborn wrote:
> Hi Pavel,
>
> Try this test program: https://github.com/mseaborn/rowhammer-test
>
> It has reproduced bit flips on various machines.
>
> Your program won't be an effective test because you're just hammering
> addresses x and x+64, which will typically be in the same row of DRAM.
>
> For the test to be effective, you have to pick addresses that are in
> different rows but in the same bank. A good way of doing that is just to
> pick random pairs of addresses (as the test program above does). If the
> machine has 16 banks of DRAM (as many of the machines I've tested on do),
> there will be a 1/16 chance that the two addresses are in the same bank.
>
> [Replying off-list just because I'm not subscribed to lkml and only saw
> this thread via the web, but feel free to reply on the list. :-) ]
Ok, so I thought my machine was too old to be affected. Apparently, it
is not :-(. (With rowhammer-test.)
Iteration 140 (after 328.76s)
48.805 nanosec per iteration: 2.1084 sec for 43200000 iterations
check
error at 0x890f1118: got 0xfeffffffffffffff
(check took 0.244179s)
** exited with status 256 (0x100)
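The output above can be decoded with a few lines of arithmetic. This is a sketch, assuming the test fills memory with all-ones before hammering (so the expected value is 0xffffffffffffffff); that fill pattern is an assumption, not something shown in the log. The helper names are hypothetical.

```python
def flipped_bits(expected, got, width=64):
    """Return the bit positions (0 = least significant) where the words differ."""
    diff = (expected ^ got) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]

def ns_per_iteration(seconds, iterations):
    """Convert a total run time into nanoseconds per iteration."""
    return seconds / iterations * 1e9

if __name__ == "__main__":
    # Assumed fill pattern vs. the value reported in the log.
    print(flipped_bits(0xffffffffffffffff, 0xfeffffffffffffff))  # [56]
    # Timing figures taken from the log line above.
    print(ns_per_iteration(2.1084, 43_200_000))
```

Under that assumption, exactly one bit (bit 56) went from 1 to 0, and the timing matches the roughly 48.8 ns per iteration that the log reports.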
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz
stepping : 10
microcode : 0xa07
cpu MHz : 1596.000
cache size : 3072 KB
Pavel
> Cheers,
> Mark
>
> Pavel Machek <pavel <at> ucw.cz> wrote:
> > On Wed 2014-12-24 09:13:32, Andy Lutomirski wrote:
> > > On Wed, Dec 24, 2014 at 8:38 AM, Pavel Machek <pavel <at> ucw.cz> wrote:
> > > > Hi!
> > > >
> > > > It seems that it is easy to induce DRAM bit errors by doing repeated
> > > > reads from adjacent memory cells on common hw. Details are at
> > > >
> > > > https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf
> > > >
> > > > . Older memory modules seem to work better, and ECC should detect
> > > > this. Paper has inner loop that should trigger this.
> > > >
> > > > Workarounds seem to be at hardware level, and tricky, too.
> > >
> > > One mostly-effective solution would be to stop buying computers
> > > without ECC. Unfortunately, no one seems to sell non-server chips
> > > that can do ECC.
> >
> > Or keep using old computers :-).
> >
> > > > Does anyone have implementation of detector? Any ideas how to work
> > > > around it in software?
> > > >
> > >
> > > Platform-dependent page coloring with very strict, and impossible to
> > > implement fully correctly, page allocation constraints?
> >
> > This seems to be at cacheline level, not at page level, if I
> > understand it correctly.
> >
> > So the problem is: I have something mapped read-only, and I can
> > still cause bitflips in it.
> >
> > Hmm. So it is pretty obviously a security problem, no need for
> > java. Just do some bit flips in binary root is going to run, and it
> > will crash for him. You can map binaries read-only, so you have enough
> > access.
> >
> > As far as I understand it, attached program could reproduce it on
> > affected machines?
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
>
> Try this test program: https://github.com/mseaborn/rowhammer-test
>
> It has reproduced bit flips on various machines.
>
> Your program won't be an effective test because you're just hammering
> addresses x and x+64, which will typically be in the same row of DRAM.
>
> For the test to be effective, you have to pick addresses that are in
> different rows but in the same bank. A good way of doing that is just to
> pick random pairs of addresses (as the test program above does). If the
> machine has 16 banks of DRAM (as many of the machines I've tested on do),
> there will be a 1/16 chance that the two addresses are in the same
> bank.
Ok. Row size is something like 8MB, right?
So we have a program that corrupts basically random memory on many
machines. That is not good. That means that an unprivileged user can
crash processes of other users.
It relies on hammering DRAM rows so fast that refresh is unable to keep
data consistent in adjacent rows. It relies on clflush: without that,
it would likely not be possible to force fast enough row switches.
Unfortunately, clflush is not a privileged instruction. Bad Intel.
Flushing the cache seems to be privileged on ARM (mcr p15). That means it
is probably impossible to exploit on ARM-based machines.
We could make DRAM refresh faster. That will incur a performance
penalty (<10%?), and is probably chipset-specific...?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Thu 2014-12-25 09:26:41, Bastien ROUCARIES wrote:
> On 25 Dec 2014 00:42, "Pavel Machek" <[email protected]> wrote:
> >
> > Hi!
> > >
> > > Try this test program: https://github.com/mseaborn/rowhammer-test
> > >
> > > It has reproduced bit flips on various machines.
> > >
> > > Your program won't be an effective test because you're just hammering
> > > addresses x and x+64, which will typically be in the same row of DRAM.
> > >
> > > For the test to be effective, you have to pick addresses that are in
> > > different rows but in the same bank. A good way of doing that is just to
> > > pick random pairs of addresses (as the test program above does). If the
> > > machine has 16 banks of DRAM (as many of the machines I've tested on do),
> > > there will be a 1/16 chance that the two addresses are in the same
> > > bank.
> >
> > Ok. Row size is something like 8MB, right?
> >
> > So we have a program that corrupts basically random memory on many
> > machines. That is not good. That means that an unprivileged user can
> > crash processes of other users.
> >
> > It relies on hammering DRAM rows so fast that refresh is unable to keep
> > data consistent in adjacent rows. It relies on clflush: without that,
> > it would likely not be possible to force fast enough row switches.
> >
> > Unfortunately, clflush is not a privileged instruction. Bad Intel.
> >
>
> Ask for a microcode update that penalizes clflush in userspace.
Indeed. Optionally making clflush a privileged instruction, or
artificially making that instruction slower, could do the trick.
Alternatively, lowering memory refresh intervals would reliably do the
same, but with bigger overhead. I guess documenting the controls for
common chipsets would do the trick, so the kernel can adjust values before
starting userspace.
Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi Pavel,
On Wed, Dec 24, 2014 at 05:38:23PM +0100, Pavel Machek wrote:
> Hi!
>
> It seems that it is easy to induce DRAM bit errors by doing repeated
> reads from adjacent memory cells on common hw. Details are at
>
> https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf
Extremely interesting stuff. I've always wondered if such modules
were *that* reliable given how picky they are about all timings.
> . Older memory modules seem to work better, and ECC should detect
> this. Paper has inner loop that should trigger this.
>
> Workarounds seem to be at hardware level, and tricky, too.
>
> Does anyone have implementation of detector? Any ideas how to work
> around it in software?
Maybe reserve a memory "canary" that is periodically scanned, and
watch for changes there. That will not tell you for sure that something
has not been done, but it will tell you for sure that bits were flipped.
Also I'm wondering whether perf counters on certain CPUs could be used
to detect an abnormal number of clflushes or even the memory access
pattern (will not work in multi-socket environments if a user has one
dedicated CPU though).
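The canary idea can be sketched in a few lines. This is only an illustration in userspace Python, with a hypothetical fill pattern and hypothetical function names; a real canary would have to control physical placement and make sure the scan reads DRAM rather than the CPU cache, which this sketch does not attempt.

```python
CANARY_BYTE = 0xFF  # hypothetical fill pattern, chosen for illustration

def make_canary(size):
    """Allocate a buffer filled with the known pattern."""
    return bytearray([CANARY_BYTE]) * size

def scan_canary(buf, expected=CANARY_BYTE):
    """Return (offset, flipped_bit_positions) for every corrupted byte."""
    report = []
    for offset, value in enumerate(buf):
        if value != expected:
            diff = value ^ expected
            report.append((offset, [b for b in range(8) if (diff >> b) & 1]))
    return report

if __name__ == "__main__":
    canary = make_canary(4096)
    canary[1000] ^= 0x01        # simulate a single bit flip
    print(scan_canary(canary))  # [(1000, [0])]
```

As noted above, an empty report proves nothing, but a non-empty one shows that bits were flipped.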
Thanks for sharing the link!
Willy
On 24 December 2014 at 15:41, Pavel Machek <[email protected]> wrote:
> > Try this test program: https://github.com/mseaborn/rowhammer-test
> >
> > It has reproduced bit flips on various machines.
...
> So we have a program that corrupts basically random memory on many
> machines. That is not good. That means that an unprivileged user can
> crash processes of other users.
...
> We could make DRAM refresh faster. That will incur performance
> penalty (<10%?), and is probably chipset-specific...?
Some machines already double the DRAM refresh rate in some cases.
For example, a presentation from Intel says:
"When non-pTRR compliant DIMMs are used, the E5-2600 v2 system
defaults into double refresh mode, which has longer memory
latency/DIMM access latency and can lower memory bandwidth by up to
2-4%.
...
* DDR3 DIMMs are affected by a pass gate charge migration issue (also
known as Row Hammer) that may result in a memory error.
* The Pseudo Target Row Refresh (pTRR) feature introduced on Ivy
Bridge processor families (2S/4S E5 v2, E7 v2) helps mitigate the
DDR3 pass gate issue by automatically refreshing victim rows."
-- from http://infobazy.gda.pl/2014/pliki/prezentacje/d2s2e4-Kaczmarski-Optymalna.pdf
("Thoughts on Intel Xeon E5-2600 v2 Product Family Performance
Optimisation – component selection guidelines", August 2014, Marcin
Kaczmarski)
Note that Target Row Refresh (TRR) is a DRAM feature that was added to
the recently-published LPDDR4 standard (where "LP" = "Low Power").
See http://www.jedec.org/standards-documents/results/jesd209-4
(registration is required to download the spec, but it's free). TRR
is basically a request that the CPU's memory controller can send to a
DRAM module to ask it to refresh a row's neighbours. I am not sure
how Pseudo TRR differs from TRR, though.
That presentation mentions one CPU (or CPU family), but I don't know
which other CPUs support these features (i.e. doubling the refresh
rate and/or using pTRR). Even if a CPU supports these features, it is
difficult to determine whether a machine's BIOS enables them. It is
the BIOS's responsibility to configure the CPU's memory controller at
startup.
Also, it is not clear how much doubling the DRAM refresh rate would
help prevent rowhammer-induced bit flips. Yoongu Kim et al.'s paper
shows that, for some DRAM modules, a refresh period of 32ms (instead
of the usual 64ms) is not short enough to reduce the error rate to
zero. See Figure 4 in
http://users.ece.cmu.edu/~yoonguk/papers/kim-isca14.pdf. I expect
that doubling the refresh rate is useful for reliability, but not
necessarily security. It would prevent accidental bit flips caused by
accidental row hammering, where programs accidentally generate a lot
of cache misses without using CLFLUSH. But it might not prevent a
determined attacker from generating bit flips that might be used for
taking control of a system.
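To get a feel for why halving the refresh period only halves the attacker's budget, here is a back-of-the-envelope sketch. It assumes that one loop iteration of the test corresponds to one row activation and reuses the ~48.8 ns per iteration reported earlier in this thread; both are assumptions for illustration, and the function name is hypothetical.

```python
def accesses_per_refresh_window(refresh_ms, ns_per_access):
    """Upper bound on accesses that fit into one refresh period."""
    return int(refresh_ms * 1e6 / ns_per_access)

if __name__ == "__main__":
    ns = 48.805  # per-iteration time taken from the test output in this thread
    for window_ms in (64, 32):
        print(window_ms, "ms:", accesses_per_refresh_window(window_ms, ns))
```

At that rate roughly 1.3 million accesses fit into a 64ms window and about half as many into 32ms, so a module that flips after fewer accesses than that stays vulnerable at the doubled refresh rate.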
Cheers,
Mark
On Wed, 24 Dec 2014, Pavel Machek wrote:
> Well... We could periodically scrub (every few milliseconds) pages
> mapped to userspace.
I.e. implement ECC in software. Would be extremely slow though.
> We might be able to do some magic and disallow cache flushes to
> userspace programs.
My understanding is that clflush is not strictly necessary; it only makes
the issue more likely to trigger.
If you modify the pattern so that it nearly fits into a cacheline (but not
really), you would be able to produce a similar (if not the same) cache
eviction pattern even without an explicit clflush. Right?
--
Jiri Kosina
SUSE Labs
On Mon 2014-12-29 13:13:17, Jiri Kosina wrote:
> On Wed, 24 Dec 2014, Pavel Machek wrote:
>
> > Well... We could periodically scrub (every few milliseconds) pages
> > mapped to userspace.
>
> I.e. implement ECC in software. Would be extremely slow though.
No, not really. If you read the cells that are about to go bad, you'll
refresh them. Agreed on extremely slow.
> > We might be able to do some magic and disallow cache flushes to
> > userspace programs.
>
> My understanding is that clflush is not strictly necessary; it only makes
> the issue more likely to trigger.
Umm. Not really, AFAICT.
So, the memory can take a "certain amount" of "neighboring
accesses". You need to do that amount before the next refresh.
> If you modify the pattern so that it nearly fits into a cacheline (but not
> really), you would be able to produce a similar (if not the same) cache
> eviction pattern even without an explicit clflush. Right?
No, I don't think so.
Well... you need to generate a certain amount of traffic on the address
lines, and it corrupts "neighboring" cells. I wish I knew more about
DRAM... If you read a cache line, you can't "break" it, as the read
refreshes it. You need to do a few milliseconds' worth of reads, AFAICT.
If you just keep reading cachelines, the cachelines you read will
not be "neighboring" enough to the "target" cells you want to break.
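The argument above can be written down as a toy model. This is a deliberately crude sketch, not a description of real DRAM: it assumes a victim row flips once it has seen a fixed number of neighboring accesses, and that either a refresh or a read of the victim itself resets that count. The threshold and the event names are hypothetical.

```python
def simulate_victim(events, threshold):
    """Return True if the victim row flips in this toy model.

    events: iterable of "hammer" (access to a neighboring row),
            "read_victim" (access to the victim row itself), or "refresh".
    threshold: hypothetical number of uninterrupted neighboring accesses
               the victim tolerates before a bit flips.
    """
    disturbance = 0
    for event in events:
        if event == "hammer":
            disturbance += 1
            if disturbance >= threshold:
                return True
        elif event in ("read_victim", "refresh"):
            disturbance = 0  # the row is rewritten, so the count starts over
        else:
            raise ValueError(f"unknown event: {event!r}")
    return False

if __name__ == "__main__":
    print(simulate_victim(["hammer"] * 10, threshold=10))                    # True
    print(simulate_victim(["hammer"] * 9 + ["refresh"] + ["hammer"] * 9,
                          threshold=10))                                     # False
```

In this model the attacker only wins by fitting the whole budget of neighboring accesses between two refreshes, which is the point made above about needing a sustained burst of reads.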
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html