Subject: Re: [PATCH v1 2/3] perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
To: Leo Yan
References: <20210809112727.596876-1-leo.yan@linaro.org>
 <20210809112727.596876-3-leo.yan@linaro.org>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, Arnaldo Carvalho de Melo,
 Peter Zijlstra, Adrian Hunter, Ingo Molnar, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Will Deacon,
 Russell King, Catalin Marinas, Mathieu Poirier, Suzuki K Poulose,
 Mike Leach, John Garry, Andi Kleen, Riccardo Mancini, Jin Yao,
 Li Huafei,
 coresight@lists.linaro.org
From: James Clark
Message-ID: <2b4e0c07-a8df-cca6-6a94-328560f4b0c6@arm.com>
Date: Fri, 13 Aug 2021 17:22:31 +0100
In-Reply-To: <20210809112727.596876-3-leo.yan@linaro.org>

On 09/08/2021 12:27, Leo Yan wrote:
> +/*
> + * In the compat mode kernel runs in 64-bit and perf tool runs in 32-bit mode,
> + * 32-bit perf tool cannot access 64-bit value atomically, which might lead to
> + * the issues caused by the below sequence on multiple CPUs: when perf tool
> + * accesses either the load operation or the store operation for 64-bit value,
> + * on some architectures the operation is divided into two instructions, one
> + * is for accessing the low 32-bit value and another is for the high 32-bit;
> + * thus these two user operations can give the kernel chances to access the
> + * 64-bit value, and thus leads to the unexpected load values.
> + *
> + *   kernel (64-bit)                   user (32-bit)
> + *
> + *   if (LOAD ->aux_tail) { --,        LOAD ->aux_head_lo
> + *       STORE $aux_data      |      ,---->
> + *       FLUSH $aux_data      |      |     LOAD ->aux_head_hi
> + *       STORE ->aux_head   --|------`     smp_rmb()
> + *   }                        |            LOAD $data
> + *                            |            smp_mb()
> + *                            |            STORE ->aux_tail_lo
> + *                            `----------->
> + *                                         STORE ->aux_tail_hi
> + *
> + * For this reason, it's impossible for the perf tool to work correctly when
> + * the AUX head or tail is bigger than 4GB (more than 32 bits length); and we
> + * can not simply limit the AUX ring buffer to less than 4GB, the reason is
> + * the pointers can be increased monotonically, whatever the buffer size it is,
> + * at the end the head and tail can be bigger than 4GB and carry out to the
> + * high 32-bit.
> + *
> + * To mitigate the issues and improve the user experience, we can allow the
> + * perf tool working in certain conditions and bail out with error if detect
> + * any overflow cannot be handled.
> + *
> + * For reading the AUX head, it reads out the values for three times, and
> + * compares the high 4 bytes of the values between the first time and the last
> + * time, if there has no change for high 4 bytes injected by the kernel during
> + * the user reading sequence, it's safe for use the second value.
> + *
> + * When update the AUX tail and detects any carrying in the high 32 bits, it
> + * means there have two store operations in user space and it cannot promise
> + * the atomicity for 64-bit write, so return '-1' in this case to tell the
> + * caller an overflow error has happened.
> + */
> +u64 __weak compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> +{
> +        struct perf_event_mmap_page *pc = mm->userpg;
> +        u64 first, second, last;
> +        u64 mask = (u64)(UINT32_MAX) << 32;
> +
> +        do {
> +                first = READ_ONCE(pc->aux_head);
> +                /* Ensure all reads are done after we read the head */
> +                smp_rmb();
> +                second = READ_ONCE(pc->aux_head);
> +                /* Ensure all reads are done after we read the head */
> +                smp_rmb();
> +                last = READ_ONCE(pc->aux_head);
> +        } while ((first & mask) != (last & mask));
> +
> +        return second;
> +}
> +
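
(As background for the questions below: the tail-update counterpart
described in the comment, the compat_auxtrace_mmap__write_tail() named
in the patch subject, is not quoted in this reply. Going only by that
description, it might look roughly like the sketch below; this is an
illustration of the scheme rather than the patch's actual code, and it
assumes the same perf auxtrace declarations as the quoted function.)

int __weak compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
{
        struct perf_event_mmap_page *pc = mm->userpg;
        u64 mask = (u64)(UINT32_MAX) << 32;

        /*
         * A tail that has carried into the high 32 bits would need two
         * 32-bit stores from the 32-bit tool, which cannot be done
         * atomically, so report the overflow to the caller instead.
         */
        if (tail & mask)
                return -1;

        /* Ensure all data reads are done before the new tail is published */
        smp_mb();
        WRITE_ONCE(pc->aux_tail, tail);
        return 0;
}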

Hi Leo,

I had a couple of questions about the compat_auxtrace_mmap__read_head()
snippet quoted above.

If we're assuming that the high bytes of 'first' and 'last' are equal,
then 'second' is supposed to be somewhere in between or equal to
'first' and 'last'. If that's the case, wouldn't it be better to return
'last', because it's closer to the value at the time of reading? And
then, if 'last' is returned, why do a read for 'second' at all? Can
'second' be skipped, so that only 'first' and 'last' are read?

Also, maybe it won't make a difference, but is there a missing
smp_rmb() between the read of 'last' and the read of 'first' in the
next loop iteration?

Thanks
James
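
(To make the question above concrete, the two-read variant being asked
about might look roughly like the sketch below. The function name is
made up, and the code is only an illustration of the question, not code
from the patch or from perf; whether dropping 'second' like this is
actually safe is exactly what is being asked.)

/* Hypothetical sketch of the two-read variant asked about above. */
u64 compat_auxtrace_mmap__read_head_two_reads(struct auxtrace_mmap *mm)
{
        struct perf_event_mmap_page *pc = mm->userpg;
        u64 first, last;
        u64 mask = (u64)(UINT32_MAX) << 32;

        do {
                first = READ_ONCE(pc->aux_head);
                /* Order the two reads of the head against each other */
                smp_rmb();
                last = READ_ONCE(pc->aux_head);
        } while ((first & mask) != (last & mask));

        /* High 32 bits did not change between the two reads; return 'last' */
        return last;
}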