From: Mark Rutland
To: linux-kernel@vger.kernel.org
Cc: Mark Rutland, Ingo Molnar, Jin Yao, Peter Zijlstra
Subject: [RFC PATCH] perf/core: don't sample kernel regs upon skid
Date: Mon, 2 Jul 2018 16:12:50 +0100
Message-Id: <20180702151250.14536-1-mark.rutland@arm.com>
X-Mailer: git-send-email 2.11.0
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Users can request that general purpose registers, instruction pointer,
etc, are sampled when a perf event counter overflows. To try to avoid
this resulting in kernel state being leaked, unprivileged users are
usually forbidden from opening events which count while the kernel is
running.

Unfortunately, this is not sufficient to avoid leaking kernel state. For
various reasons, there can be a delay between the overflow occurring and
the resulting overflow exception (e.g. an NMI) being taken. During this
window, other instructions may be executed, resulting in skid.

This skid means that a userspace-only event overflowing may result in an
exception being taken *after* entry to the kernel, allowing kernel
registers to be sampled. Depending on the amount of skid, this may only
leak the PC (breaking KASLR), or it may leak secrets which are currently
live in GPRs.

Let's avoid this by only sampling from the user registers when an event
is supposed to exclude the kernel, providing the illusion that the
overflow exception is taken from userspace.

We also have similar cases when sampling a guest, where we get the host
regs in some cases. It's not entirely clear to me how we should handle
these.

Signed-off-by: Mark Rutland
Cc: Ingo Molnar
Cc: Jin Yao
Cc: Peter Zijlstra
---
 kernel/events/core.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8f0434a9951a..2ab2548b2e66 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6361,6 +6361,32 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 	return callchain ?: &__empty_callchain;
 }
 
+static struct pt_regs *perf_get_sample_regs(struct perf_event *event,
+					    struct pt_regs *regs)
+{
+	/*
+	 * Due to interrupt latency (AKA "skid"), we may enter the kernel
+	 * before taking an overflow, even if the PMU is only counting user
+	 * events.
+	 *
+	 * If we're not counting kernel events, always use the user regs when
+	 * sampling.
+	 *
+	 * TODO: what do we do about sampling a guest's registers? The IP is
+	 * special-cased, but for the rest of the regs they'll get the
+	 * user/kernel regs depending on whether exclude_kernel is set, which
+	 * is nonsensical.
+	 *
+	 * We can't get at the full set of regs in all cases (e.g. Xen's PV PMU
+	 * can't provide the GPRs), so should we just zero the GPRs when in a
+	 * guest? Or skip outputting the regs in perf_output_sample?
+	 */
+	if (event->attr.exclude_kernel && !user_mode(regs))
+		return task_pt_regs(current);
+
+	return regs;
+}
+
 void perf_prepare_sample(struct perf_event_header *header,
 			 struct perf_sample_data *data,
 			 struct perf_event *event,
@@ -6368,6 +6394,8 @@ void perf_prepare_sample(struct perf_event_header *header,
 {
 	u64 sample_type = event->attr.sample_type;
 
+	regs = perf_get_sample_regs(event, regs);
+
 	header->type = PERF_RECORD_SAMPLE;
 	header->size = sizeof(*header) + event->header_size;
 
-- 
2.11.0
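
For anyone who wants to poke at the selection logic outside the kernel,
here is a rough userspace model of the check in perf_get_sample_regs().
It is only a sketch: struct model_regs, struct model_event,
model_get_sample_regs() and model_check_cases() are hypothetical names
made up for illustration, the from_user flag stands in for what
user_mode(regs) would report, and the user_regs argument stands in for
task_pt_regs(current). The pc values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct model_regs {
	bool from_user;		/* models the result of user_mode(regs) */
	unsigned long pc;	/* arbitrary value, for illustration only */
};

/* Hypothetical stand-in for the relevant bit of perf_event_attr. */
struct model_event {
	bool exclude_kernel;
};

/*
 * Models the check in the patch: if the event excludes the kernel but the
 * overflow exception was taken in kernel context (skid), hand back the
 * user regs instead. user_regs models task_pt_regs(current).
 */
const struct model_regs *
model_get_sample_regs(const struct model_event *event,
		      const struct model_regs *regs,
		      const struct model_regs *user_regs)
{
	if (event->exclude_kernel && !regs->from_user)
		return user_regs;

	return regs;
}

/* Walks through the three cases of interest. */
void model_check_cases(void)
{
	const struct model_regs user = { .from_user = true,  .pc = 0x1000UL };
	const struct model_regs kern = { .from_user = false, .pc = 0x2000UL };
	const struct model_event user_only = { .exclude_kernel = true };
	const struct model_event everything = { .exclude_kernel = false };

	/* Skid into the kernel on a user-only event: user regs are sampled. */
	assert(model_get_sample_regs(&user_only, &kern, &user) == &user);

	/* No skid: the exception regs are already user regs and are kept. */
	assert(model_get_sample_regs(&user_only, &user, &user) == &user);

	/* Kernel counting allowed: kernel regs are sampled as before. */
	assert(model_get_sample_regs(&everything, &kern, &user) == &kern);
}
```

Calling model_check_cases() from a test harness exercises all three
cases. The model deliberately ignores the guest case raised in the TODO
above, since it is not yet clear what the right behaviour is there.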