From: Vegard Nossum
Date: Fri, 28 Oct 2016 11:27:17 +0200
Subject: Re: rowhammer protection [was Re: Getting interrupt every million cache misses]
To: Peter Zijlstra
Cc: Pavel Machek, Ingo Molnar, Kees Cook, Arnaldo Carvalho de Melo, kernel list, Ingo Molnar, Alexander Shishkin, "kernel-hardening@lists.openwall.com"
X-Mailing-List: linux-kernel@vger.kernel.org

On 28 October 2016 at 11:04, Peter Zijlstra wrote:
> On Fri, Oct 28, 2016 at 10:50:39AM +0200, Pavel Machek wrote:
>> On Fri 2016-10-28 09:07:01, Ingo Molnar wrote:
>> >
>> > * Pavel Machek wrote:
>> >
>> > > +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
>> > > +{
>> > > +	u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
>> > > +	u64 now = ktime_get_mono_fast_ns();
>> > > +	s64 delta = now - *ts;
>> > > +
>> > > +	*ts = now;
>> > > +
>> > > +	/* FIXME msec per usec, reverse logic? */
>> > > +	if (delta < 64 * NSEC_PER_MSEC)
>> > > +		mdelay(56);
>> > > +}
>> >
>> > I'd suggest making the absolute delay sysctl tunable, because 'wait 56 msecs' is
>> > very magic, and do we know it 100% that 56 msecs is what is needed
>> > everywhere?
>>
>> I agree this needs to be tunable (and with the other suggestions). But
>> this is actually not the most important tunable: the detection
>> threshold (rh_attr.sample_period) should be way more important.
>
> So being totally ignorant of the detail of how rowhammer abuses the DDR
> thing, would it make sense to trigger more often and delay shorter? Or
> is there some minimal delay required for things to settle or something.

Would it make sense to sample the counter on context switch, do some
accounting on a per-task cache miss counter, and slow down just the
single task(s) with too high a cache miss rate? That way there's no
global slowdown (which I assume would be the case here).

The task's slice of CPU would have to be taken into account, because
otherwise you could have multiple cooperating tasks that each escape
the limit but taken together go above it.

Vegard