From: Ian Rogers
Date: Tue, 14 Jun 2022 16:51:28 -0700
Subject: Re: [PATCH v2 4/6] perf cpumap: Fix alignment for masks in event encoding
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, James Clark, Kees Cook,
 "Gustavo A. R. Silva", Adrian Hunter, Riccardo Mancini, German Gomez,
 Colin Ian King, Song Liu, Dave Marchevsky, Athira Rajeev,
 Alexey Bayduraev, Leo Yan, linux-perf-users, linux-kernel,
 Stephane Eranian
References: <20220614143353.1559597-1-irogers@google.com>
 <20220614143353.1559597-5-irogers@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 14, 2022 at 3:44 PM Namhyung Kim wrote:
>
> Hi Ian,
>
> On Tue, Jun 14, 2022 at 7:34 AM Ian Rogers wrote:
> >
> > A mask encoding of a cpu map is laid out as:
> >   u16 nr
> >   u16 long_size
> >   unsigned long mask[];
> > However, the mask may be 8-byte aligned meaning there is a 4-byte pad
> > after long_size. This means 32-bit and 64-bit builds see the mask as
> > being at different offsets. On top of this the structure is in the byte
> > data[] encoded as:
> >   u16 type
> >   char data[]
> > This means the mask's struct isn't the required 4 or 8 byte aligned, but
> > is offset by 2. Consequently the long reads and writes are causing
> > undefined behavior as the alignment is broken.
> >
> > Fix the mask struct by creating explicit 32 and 64-bit variants, use a
> > union to avoid data[] and casts; the struct must be packed so the
> > layout matches the existing perf.data layout. Taking an address of a
> > member of a packed struct breaks alignment so pass the packed
> > perf_record_cpu_map_data to functions, so they can access variables with
> > the right alignment.
> >
> > As the 64-bit version has 4 bytes of padding, optimizing writing to only
> > write the 32-bit version.
> >
> > Signed-off-by: Ian Rogers
> > ---
> >  tools/lib/perf/include/perf/event.h | 36 +++++++++++--
> >  tools/perf/tests/cpumap.c           | 19 ++++---
> >  tools/perf/util/cpumap.c            | 80 +++++++++++++++++++++++------
> >  tools/perf/util/cpumap.h            |  4 +-
> >  tools/perf/util/session.c           | 30 +++++------
> >  tools/perf/util/synthetic-events.c  | 34 +++++++-----
> >  6 files changed, 143 insertions(+), 60 deletions(-)
> >
> > diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
> > index e7758707cadd..d2d32589758a 100644
> > --- a/tools/lib/perf/include/perf/event.h
> > +++ b/tools/lib/perf/include/perf/event.h
> > @@ -6,6 +6,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include /* pid_t */
> >
> >  #define event_contains(obj, mem) ((obj).header.size > offsetof(typeof(obj), mem))
> > @@ -153,20 +154,47 @@ enum {
> >         PERF_CPU_MAP__MASK = 1,
> >  };
> >
> > +/*
> > + * Array encoding of a perf_cpu_map where nr is the number of entries in cpu[]
> > + * and each entry is a value for a CPU in the map.
> > + */
> >  struct cpu_map_entries {
> >         __u16 nr;
> >         __u16 cpu[];
> >  };
> >
> > -struct perf_record_record_cpu_map {
> > +/* Bitmap encoding of a perf_cpu_map where bitmap entries are 32-bit. */
> > +struct perf_record_mask_cpu_map32 {
> > +       /* Number of mask values. */
> >         __u16 nr;
> > +       /* Constant 4. */
> >         __u16 long_size;
> > -       unsigned long mask[];
> > +       /* Bitmap data. */
> > +       __u32 mask[];
> >  };
> >
> > -struct perf_record_cpu_map_data {
> > +/* Bitmap encoding of a perf_cpu_map where bitmap entries are 64-bit. */
> > +struct perf_record_mask_cpu_map64 {
> > +       /* Number of mask values. */
> > +       __u16 nr;
> > +       /* Constant 8. */
> > +       __u16 long_size;
> > +       /* Legacy padding. */
> > +       char __pad[4];
> > +       /* Bitmap data. */
> > +       __u64 mask[];
> > +};
> > +
> > +struct __packed perf_record_cpu_map_data {
> >         __u16 type;
> > -       char data[];
> > +       union {
> > +               /* Used when type == PERF_CPU_MAP__CPUS. */
> > +               struct cpu_map_entries cpus_data;
> > +               /* Used when type == PERF_CPU_MAP__MASK and long_size == 4. */
> > +               struct perf_record_mask_cpu_map32 mask32_data;
> > +               /* Used when type == PERF_CPU_MAP__MASK and long_size == 8. */
> > +               struct perf_record_mask_cpu_map64 mask64_data;
> > +       };
> >  };
>
> How about moving the 'type' to the union as well?
> This way we don't need to pack the entire struct
> and can have a common struct for 32 and 64 bit..
>
> struct cpu_map_entries {
>     __u16 type;
>     __u16 nr;
>     __u16 cpu[];
> };
>
> struct perf_record_mask_cpu_map {
>     __u16 type;
>     __u16 nr;
>     __u16 long_size;  // still needed?
>     __u16 pad;
>     unsigned long mask[];
> };
>
> // changed it to union
> union perf_record_cpu_map_data {
>     __u16 type;
>     struct cpu_map_entries cpus_data;
>     struct perf_record_mask_cpu_map mask_data;
> };

Thanks Namhyung,

Unfortunately this doesn't quite work, as I want the existing cpu map
encodings to keep working with this change, i.e. an old perf.data
should be readable by a newer perf with this change (the range encoding
will require that new perf.data files are read by versions of perf with
these changes). For this to work I need the layout to match the
existing unaligned code, so I either need to make the mask bytes and
memcpy, or use an attribute like packed. FWIW, this is a little more
efficient than the layout above, as with long_size == 4 the pad isn't
necessary, saving 2 bytes.

I think with the packed approach we can also add new unpacked variants
like above, although I'd be keen not to use a type that varies in size
like long. I guess at some future date we could remove the legacy
supporting packed versions so that packed or byte copying is
unnecessary.

I could use a union as you show above; unfortunately that will need
'struct perf_record_mask_cpu_map32' and 'struct
perf_record_mask_cpu_map64' to be packed or to use bytes. We'd lose one
use of packed just to introduce two others.
Potentially it is more of a breaking change for users of this code via
libperf.

These changes are something of a bug report along with fixes. If there
is a consensus that the right way to fix the bug is to break legacy
perf.data files then I'm happy to update the code accordingly (as you
show above).

Thanks,
Ian

> Thanks,
> Namhyung
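
For anyone following along, the "make the mask bytes and memcpy" option
mentioned in the reply can be sketched roughly as below. This is only an
illustration, not code from the patch: the function names
cpu_map_mask_test_cpu() and cpu_map_mask_write32(), the CPU_MAP_* offset
constants, and the use of <stdint.h> types in place of __u16/__u32/__u64
are all assumptions made for the sketch. The offsets follow the layout
described in the commit message (u16 type, u16 nr, u16 long_size, then
4 bytes of pad only in the 64-bit variant), and the data is assumed to
be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets from the start of the cpu map data (hypothetical names). */
enum {
	CPU_MAP_TYPE_OFF      = 0,  /* u16 type */
	CPU_MAP_NR_OFF        = 2,  /* u16 nr: number of mask words */
	CPU_MAP_LONG_SIZE_OFF = 4,  /* u16 long_size: 4 or 8 */
	CPU_MAP_MASK32_OFF    = 6,  /* u32 words follow long_size directly */
	CPU_MAP_MASK64_OFF    = 10, /* u64 words follow 4 bytes of legacy pad */
	CPU_MAP_TYPE_MASK     = 1,  /* PERF_CPU_MAP__MASK in the quoted diff */
};

static_assert(CPU_MAP_MASK64_OFF == CPU_MAP_MASK32_OFF + 4,
	      "64-bit variant carries 4 bytes of legacy padding");

/* Load a u16 from a possibly unaligned address without a cast. */
static uint16_t load_u16(const unsigned char *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Returns 1 if cpu is set in the mask, 0 if it is clear or beyond the
 * encoded words, and -1 if the buffer is truncated or not a mask encoding.
 */
int cpu_map_mask_test_cpu(const unsigned char *data, size_t len,
			  unsigned int cpu)
{
	uint16_t nr, long_size;
	size_t word;

	if (len < CPU_MAP_MASK32_OFF ||
	    load_u16(data + CPU_MAP_TYPE_OFF) != CPU_MAP_TYPE_MASK)
		return -1;

	nr = load_u16(data + CPU_MAP_NR_OFF);
	long_size = load_u16(data + CPU_MAP_LONG_SIZE_OFF);

	if (long_size == 4) {
		uint32_t w;

		if (len < CPU_MAP_MASK32_OFF + (size_t)nr * sizeof(w))
			return -1;
		word = cpu / 32;
		if (word >= nr)
			return 0;
		memcpy(&w, data + CPU_MAP_MASK32_OFF + word * sizeof(w), sizeof(w));
		return (int)((w >> (cpu % 32)) & 1u);
	}
	if (long_size == 8) {
		uint64_t w;

		if (len < CPU_MAP_MASK64_OFF + (size_t)nr * sizeof(w))
			return -1;
		word = cpu / 64;
		if (word >= nr)
			return 0;
		memcpy(&w, data + CPU_MAP_MASK64_OFF + word * sizeof(w), sizeof(w));
		return (int)((w >> (cpu % 64)) & 1u);
	}
	return -1;
}

/*
 * Write the compact 32-bit encoding (no pad). Returns the number of bytes
 * written, or 0 if buf is too small.
 */
size_t cpu_map_mask_write32(unsigned char *buf, size_t cap,
			    const uint32_t *words, uint16_t nr)
{
	const uint16_t type = CPU_MAP_TYPE_MASK, long_size = 4;
	size_t total = CPU_MAP_MASK32_OFF + (size_t)nr * sizeof(uint32_t);

	if (cap < total)
		return 0;
	memcpy(buf + CPU_MAP_TYPE_OFF, &type, sizeof(type));
	memcpy(buf + CPU_MAP_NR_OFF, &nr, sizeof(nr));
	memcpy(buf + CPU_MAP_LONG_SIZE_OFF, &long_size, sizeof(long_size));
	memcpy(buf + CPU_MAP_MASK32_OFF, words, (size_t)nr * sizeof(uint32_t));
	return total;
}
```

Because every access goes through memcpy(), the buffer may start at any
address, which is the same property the packed struct gives in the
patch. Writing only the 32-bit form costs 6 + 4 * nr bytes, while a
reader that still accepts long_size == 8 keeps older 64-bit encodings
readable.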