Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp456253imm; Fri, 15 Jun 2018 00:15:58 -0700 (PDT) X-Google-Smtp-Source: ADUXVKKDhYyUAYoir6rEsODbqRXcGjXsQH1YJFbe8/BiWREvrJiIfnppc99uuwNIQHAycdWwwJfv X-Received: by 2002:a17:902:9685:: with SMTP id n5-v6mr636188plp.15.1529046958732; Fri, 15 Jun 2018 00:15:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529046958; cv=none; d=google.com; s=arc-20160816; b=vYIS3zeE/d9EmFDvzA8okj5snI3/Cj86lvVeHW+3zPJTHDcUh1dFiDYndjXKc7Eb1r rPGBw0i9rxLoJga4X5Bcz2KpFOyWuBGK7dKBoCaVFgwEHOEH9v9RnHxpLPu3ROHYbAFP HgzgqtQhDSguntKi5y+HGLww43UpKxAH+i9DMB6z/yNRgomcSRUZjCBVUdMtIF6vunbd Z4+JapZCQ/KeHHqWjEDFuI7jF3R/65LbvbOtnImL2aYScMKCz6kp8osMokCqLPnL6y+X SBZVn+UoODL3La2LM+Ea8Hff5fXQ51j24jTFKXsQ3s1v4F390Z7P6nCg4PCC4LdgKEeB PrKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:arc-authentication-results; bh=eAYgJ7qG85oc5ReJqkJXJFYJteE0UNTFLobUSFIq6dU=; b=byLLdffRGh0ctxBXOKVbwLrelogA8+9Afc1oy04zE6fHAXFgtGxfmuDhJRIcaAA0Jt t1Y6whjuWLOz1WsgfZmn98uGEZ0LVdaMFuKijXdYkbhwJsqLdG+lCIjO+OS9MUVZ8UtY bYf/ACzvt7Pqono3cSB5ihPbDMtm5nOXIfg4GXCMbeX4Hz/Xy9+gSLeGvgyvwxtR2nAA 12YR9T4ye49kzfli58avAJgMnFfz1Hdi7u4gEd+EQBufIvbj4Z81XUqsSQMeSkpzNPPZ pwGzCFK1s99RublZNIQOt+fYAQbPgDsvJvv+rjW2S+Kf7jb+jpEAMtUqylhQ7arfqfR0 yM1g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m6-v6si6224844pgt.636.2018.06.15.00.15.41; Fri, 15 Jun 2018 00:15:58 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755729AbeFOHPN (ORCPT + 99 others); Fri, 15 Jun 2018 03:15:13 -0400 Received: from mga03.intel.com ([134.134.136.65]:17578 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754623AbeFOHPM (ORCPT ); Fri, 15 Jun 2018 03:15:12 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 15 Jun 2018 00:15:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,226,1526367600"; d="scan'208";a="237369902" Received: from yjin15-mobl.ccr.corp.intel.com (HELO [10.239.161.30]) ([10.239.161.30]) by fmsmga005.fm.intel.com with ESMTP; 15 Jun 2018 00:15:04 -0700 Subject: Re: [PATCH v1 1/2] perf/core: Use sysctl to turn on/off dropping leaked kernel samples To: Stephane Eranian Cc: Arnaldo Carvalho de Melo , Jiri Olsa , Peter Zijlstra , Ingo Molnar , Alexander Shishkin , me@kylehuey.com, LKML , Vince Weaver , Will Deacon , Namhyung Kim , Andi Kleen , "Liang, Kan" , "Jin, Yao" References: <1529057003-2212-1-git-send-email-yao.jin@linux.intel.com> <1529057003-2212-2-git-send-email-yao.jin@linux.intel.com> From: "Jin, Yao" Message-ID: <5ee8dba2-3346-615c-84a2-3a65203c06a8@linux.intel.com> Date: Fri, 15 Jun 2018 15:15:03 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 6/15/2018 1:59 PM, Stephane Eranian wrote: > On Thu, Jun 14, 2018 at 7:10 PM Jin Yao wrote: >> >> When doing sampling, for example: >> >> perf record -e cycles:u ... >> >> On workloads that do a lot of kernel entry/exits we see kernel >> samples, even though :u is specified. This is due to skid existing. >> >> This might be a security issue because it can leak kernel addresses even >> though kernel sampling support is disabled. >> >> One patch "perf/core: Drop kernel samples even though :u is specified" >> was posted in last year but it was reverted because it introduced a >> regression issue that broke the rr-project, which used sampling >> events to receive a signal on overflow. These signals were critical >> to the correct operation of rr. >> >> See '6a8a75f32357 ("Revert "perf/core: Drop kernel samples even >> though :u is specified"")' for detail. >> >> Now the idea is to use sysctl to control the dropping of leaked >> kernel samples. >> >> /sys/devices/cpu/perf_allow_sample_leakage: >> >> 0 - default, drop the leaked kernel samples. >> 1 - don't drop the leaked kernel samples. >> >> For rr it can write 1 to /sys/devices/cpu/perf_allow_sample_leakage. >> >> For example, >> >> root@skl:/tmp# cat /sys/devices/cpu/perf_allow_sample_leakage >> 0 >> root@skl:/tmp# perf record -e cycles:u ./div >> root@skl:/tmp# perf report --stdio >> >> ........ ....... ............. ................ >> >> 47.01% div div [.] main >> 20.74% div libc-2.23.so [.] __random_r >> 15.59% div libc-2.23.so [.] __random >> 8.68% div div [.] compute_flag >> 4.48% div libc-2.23.so [.] rand >> 3.50% div div [.] rand@plt >> 0.00% div ld-2.23.so [.] do_lookup_x >> 0.00% div ld-2.23.so [.] memcmp >> 0.00% div ld-2.23.so [.] _dl_start >> 0.00% div ld-2.23.so [.] _start >> >> There is no kernel symbol reported. >> >> root@skl:/tmp# echo 1 > /sys/devices/cpu/perf_allow_sample_leakage >> root@skl:/tmp# cat /sys/devices/cpu/perf_allow_sample_leakage >> 1 >> root@skl:/tmp# perf record -e cycles:u ./div >> root@skl:/tmp# perf report --stdio >> >> ........ ....... ................ ............. >> >> 47.53% div div [.] main >> 20.62% div libc-2.23.so [.] __random_r >> 15.32% div libc-2.23.so [.] __random >> 8.66% div div [.] compute_flag >> 4.53% div libc-2.23.so [.] rand >> 3.34% div div [.] rand@plt >> 0.00% div [kernel.vmlinux] [k] apic_timer_interrupt >> 0.00% div libc-2.23.so [.] intel_check_word >> 0.00% div ld-2.23.so [.] brk >> 0.00% div [kernel.vmlinux] [k] page_fault >> 0.00% div ld-2.23.so [.] _start >> >> We can see the kernel symbols apic_timer_interrupt and page_fault. >> > These kernel symbols do not match your description here. How much skid > do you think you have here? > You're saying you are at the user level, you get a counter overflow, > and the interrupted IP lands in the kernel > because you where there by the time the interrupt is delivered. How > many instructions does it take to get > from user land to apic_timer_interrupt() or page_fault()? These > functions are not right at the kernel entry, > I believe. So how did you get there, the skid must have been VERY big > or symbolization has a problem. > I'm testing with the latest perf/core branch (4.17+). Again I test with Linux's vmstat (not test with my application). perf record -e cycles:u vmstat 1 perf script -F ip 7f84e2b0bc30 7f84e2b0bc30 7f84e2b0bc30 7f84e2b0bc30 ffffffffb7a01070 7f84e2b243f0 7f84e2b11891 7f84e2b27f5e 7f84e25a3b26 7f84e25680f5 cat /proc/kallsyms | grep page_fault .... ffffffffb7a01070 T page_fault ffffffffb7a010a0 T async_page_fault .... So one sample (ip ffffffffb7a01070) hits on page_fault. Maybe you can have a try too. :) Thanks Jin Yao >> Signed-off-by: Jin Yao >> --- >> kernel/events/core.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 58 insertions(+) >> >> diff --git a/kernel/events/core.c b/kernel/events/core.c >> index 80cca2b..7867541 100644 >> --- a/kernel/events/core.c >> +++ b/kernel/events/core.c >> @@ -7721,6 +7721,28 @@ int perf_event_account_interrupt(struct perf_event *event) >> return __perf_event_account_interrupt(event, 1); >> } >> >> +static int perf_allow_sample_leakage __read_mostly; >> + >> +static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs) >> +{ >> + int allow_leakage = READ_ONCE(perf_allow_sample_leakage); >> + >> + if (allow_leakage) >> + return true; >> + >> + /* >> + * Due to interrupt latency (AKA "skid"), we may enter the >> + * kernel before taking an overflow, even if the PMU is only >> + * counting user events. >> + * To avoid leaking information to userspace, we must always >> + * reject kernel samples when exclude_kernel is set. >> + */ >> + if (event->attr.exclude_kernel && !user_mode(regs)) >> + return false; >> + >> + return true; >> +} >> + >> /* >> * Generic event overflow handling, sampling. >> */ >> @@ -7742,6 +7764,12 @@ static int __perf_event_overflow(struct perf_event *event, >> ret = __perf_event_account_interrupt(event, throttle); >> >> /* >> + * For security, drop the skid kernel samples if necessary. >> + */ >> + if (!sample_is_allowed(event, regs)) >> + return ret; >> + >> + /* >> * XXX event_limit might not quite work as expected on inherited >> * events >> */ >> @@ -9500,9 +9528,39 @@ perf_event_mux_interval_ms_store(struct device *dev, >> } >> static DEVICE_ATTR_RW(perf_event_mux_interval_ms); >> >> +static ssize_t >> +perf_allow_sample_leakage_show(struct device *dev, >> + struct device_attribute *attr, char *page) >> +{ >> + int allow_leakage = READ_ONCE(perf_allow_sample_leakage); >> + >> + return snprintf(page, PAGE_SIZE-1, "%d\n", allow_leakage); >> +} >> + >> +static ssize_t >> +perf_allow_sample_leakage_store(struct device *dev, >> + struct device_attribute *attr, >> + const char *buf, size_t count) >> +{ >> + int allow_leakage, ret; >> + >> + ret = kstrtoint(buf, 0, &allow_leakage); >> + if (ret) >> + return ret; >> + >> + if (allow_leakage != 0 && allow_leakage != 1) >> + return -EINVAL; >> + >> + WRITE_ONCE(perf_allow_sample_leakage, allow_leakage); >> + >> + return count; >> +} >> +static DEVICE_ATTR_RW(perf_allow_sample_leakage); >> + >> static struct attribute *pmu_dev_attrs[] = { >> &dev_attr_type.attr, >> &dev_attr_perf_event_mux_interval_ms.attr, >> + &dev_attr_perf_allow_sample_leakage.attr, >> NULL, >> }; >> ATTRIBUTE_GROUPS(pmu_dev); >> -- >> 2.7.4 >>