Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp776408rwb; Wed, 16 Nov 2022 07:32:14 -0800 (PST) X-Google-Smtp-Source: AA0mqf4iWR++LMIAVRhkXl7Wli6GYQvD6AwFyLbSGGYGRXz6LUCca9EMhKm7ZmlTg6IJLpC0ku6S X-Received: by 2002:a17:90a:4496:b0:20a:f638:3c9f with SMTP id t22-20020a17090a449600b0020af6383c9fmr4358832pjg.38.1668612733839; Wed, 16 Nov 2022 07:32:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668612733; cv=none; d=google.com; s=arc-20160816; b=cht1l4p7o6mw/Lqqz4pokt71XTS9gHwvFbcMir0qrP3QQsZ1T4PjAHQSagN+wRDFbD a01JuZf1SiNyOoxQTvu4FKxSQOMHINk/LU3k5Af+KwdKI53iQB+0D9Bji3M+s8+1jm8T 52wbH/brKsNjEuI/HSpJQnytMnxLD0xspnjwWuJddr2iZxIuB4N2tTkk+wa0c44YoYls 5CGKnBGsuVvuPNtYSW09HGfeW9X5zPWaMuNXBdRTJiBh885CNxg2wFGreiShc7dGi++I o82B5FJ5MLz6E+jpzkHEcLPYsxkjLi9wn+9FTFS+RYyQGNxA4k1ARwM8UYjiOkvPO/M7 RR4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date; bh=+6Sw1pw/L/oB216dGWEEUgjwy6muyOeb2hK4NNltt4U=; b=0RbCgW1xG/jhuq/SMEbzT/M002dacMDqpEu1ck2Hp3uwGYZY7SJ00FpUy0ZAavvPEh PFKi5yO7qHRMzAVx1qNzM5tMwrglzwSmzZRe3RkHBCS8hsFZhwlgTSuB3c7Q0ylbUi1s slHvfeOTa1SN54noSDss/A0eqcSQIYfzkupXYNYqamWP3EkvNIlQdeodV8oOE6RXX3Xe 4/cSR4aRwSD2WGQY+IMbZNlooMcVFCC7thyirmDyNuyjEfUiKFsnr2NQKSxFtWXC6v3O K0hdtLwhjaWgYqlK/KTR1hwLZoaAZmCmppAVLuzfLQp0HoNJh/z0ALv/DMk8bUuK6CIO qYKg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id lb12-20020a17090b4a4c00b00212d47deabesi2484245pjb.60.2022.11.16.07.31.59; Wed, 16 Nov 2022 07:32:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234735AbiKPPUS (ORCPT + 91 others); Wed, 16 Nov 2022 10:20:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39070 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234631AbiKPPTn (ORCPT ); Wed, 16 Nov 2022 10:19:43 -0500 Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 794064FF8D; Wed, 16 Nov 2022 07:19:40 -0800 (PST) Received: from fraeml713-chm.china.huawei.com (unknown [172.18.147.206]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4NC66V22R9z689R4; Wed, 16 Nov 2022 23:14:58 +0800 (CST) Received: from lhrpeml500005.china.huawei.com (7.191.163.240) by fraeml713-chm.china.huawei.com (10.206.15.32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Wed, 16 Nov 2022 16:19:38 +0100 Received: from localhost (10.202.227.76) by lhrpeml500005.china.huawei.com (7.191.163.240) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Wed, 16 Nov 2022 15:19:37 +0000 Date: Wed, 16 Nov 2022 15:19:36 +0000 From: Jonathan Cameron To: CC: Dan Williams , Steven Rostedt , Alison Schofield , "Vishal Verma" , Ben Widawsky , Davidlohr Bueso , , Subject: Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command Message-ID: <20221116151936.0000662f@Huawei.com> In-Reply-To: <20221110185758.879472-3-ira.weiny@intel.com> References: <20221110185758.879472-1-ira.weiny@intel.com> <20221110185758.879472-3-ira.weiny@intel.com> Organization: Huawei Technologies Research and Development (UK) Ltd. X-Mailer: Claws Mail 4.1.0 (GTK 3.24.33; x86_64-w64-mingw32) MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.202.227.76] X-ClientProxiedBy: lhrpeml500004.china.huawei.com (7.191.163.9) To lhrpeml500005.china.huawei.com (7.191.163.240) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 10 Nov 2022 10:57:49 -0800 ira.weiny@intel.com wrote: > From: Ira Weiny > > CXL devices have multiple event logs which can be queried for CXL event > records. Devices are required to support the storage of at least one > event record in each event log type. > > Devices track event log overflow by incrementing a counter and tracking > the time of the first and last overflow event seen. > > Software queries events via the Get Event Record mailbox command; CXL > rev 3.0 section 8.2.9.2.2. > > Issue the Get Event Record mailbox command on driver load. Trace each > record found with a generic record trace. Trace any overflow > conditions. > > The device can return up to 1MB worth of event records per query. This > presents complications with allocating a huge buffers to potentially > capture all the records. It is not anticipated that these event logs > will be very deep and reading them does not need to be performant. > Process only 3 records at a time. 3 records was chosen as it fits > comfortably on the stack to prevent dynamic allocation while still > cutting down on extra mailbox messages. > > This patch traces a raw event record only and leaves the specific event > record types to subsequent patches. > > Macros are created to use for tracing the common CXL Event header > fields. > > Cc: Steven Rostedt > Signed-off-by: Ira Weiny Hi Ira, A question inline about whether some of the conditions you are checking for can actually happen. Otherwise looks good to me. Jonathan > > --- > Change from RFC v2: > Support reading 3 events at once. > Reverse Jonathan's suggestion and check for positive number of > records. Because the record count may have been > returned as something > 3 based on what the device > thinks it can send back even though the core Linux mbox > processing truncates the data. > Alison and Dave Jiang > Change header uuid type to uuid_t for better user space > processing > Smita > Check status reg before reading log. > Steven > Prefix all trace points with 'cxl_' > Use static branch _enabled() calls > Jonathan > s/CXL_EVENT_TYPE_INFO/0 > s/{first,last}/{first,last}_ts > Remove Reserved field from header > Fix header issue for cxl_event_log_type_str() > > Change from RFC: > Remove redundant error message in get event records loop > s/EVENT_RECORD_DATA_LENGTH/CXL_EVENT_RECORD_DATA_LENGTH > Use hdr_uuid for the header UUID field > Use cxl_event_log_type_str() for the trace events > Create macros for the header fields and common entries of each event > Add reserved buffer output dump > Report error if event query fails > Remove unused record_cnt variable > Steven - reorder overflow record > Remove NOTE about checkpatch > Jonathan > check for exactly 1 record > s/v3.0/rev 3.0 > Use 3 byte fields for 24bit fields > Add 3.0 Maintenance Operation Class > Add Dynamic Capacity log type > Fix spelling > Dave Jiang/Dan/Alison > s/cxl-event/cxl > trace/events/cxl-events => trace/events/cxl.h > s/cxl_event_overflow/overflow > s/cxl_event/generic_event > --- > MAINTAINERS | 1 + > drivers/cxl/core/mbox.c | 70 +++++++++++++++++++ > drivers/cxl/cxl.h | 8 +++ > drivers/cxl/cxlmem.h | 73 ++++++++++++++++++++ > include/trace/events/cxl.h | 127 +++++++++++++++++++++++++++++++++++ > include/uapi/linux/cxl_mem.h | 1 + > 6 files changed, 280 insertions(+) > create mode 100644 include/trace/events/cxl.h > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c > index 16176b9278b4..a908b95a7de4 100644 > --- a/drivers/cxl/core/mbox.c > +++ b/drivers/cxl/core/mbox.c > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds, > + enum cxl_event_log_type type) > +{ > + struct cxl_get_event_payload payload; > + u16 pl_nr; > + > + do { > + u8 log_type = type; > + int rc; > + > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD, > + &log_type, sizeof(log_type), > + &payload, sizeof(payload)); > + if (rc) { > + dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d", > + cxl_event_log_type_str(type), rc); > + return; > + } > + > + pl_nr = le16_to_cpu(payload.record_count); > + if (trace_cxl_generic_event_enabled()) { > + u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS); Either I'm misreading the spec, or it can't be greater than NR_RECORDS. "The number of event records in the Event Records list...." Event Records being the field inside this payload which is not big enough to take more than CXL_GET_EVENT_NR_RECORDS and the intro to Get Event Records refers to the number being restricted by the mailbox output payload provided. I'm in favor of defense against broken hardware, but don't paper over any such error - scream about it. > + int i; > + > + for (i = 0; i < nr_rec; i++) > + trace_cxl_generic_event(dev_name(cxlds->dev), > + type, > + &payload.record[i]); > + } > + > + if (trace_cxl_overflow_enabled() && > + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)) > + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload); > + > + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS || Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned payload not the total number. > + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS); > +} > diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h > new file mode 100644 > index 000000000000..60dec9a84918 > --- /dev/null > +++ b/include/trace/events/cxl.h > @@ -0,0 +1,127 @@ > +#define CXL_EVT_TP_fast_assign(dname, l, hdr) \ > + __assign_str(dev_name, (dname)); \ > + __entry->log = (l); \ > + memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t)); \ > + __entry->hdr_length = (hdr).length; \ > + __entry->hdr_flags = get_unaligned_le24((hdr).flags); \ > + __entry->hdr_handle = le16_to_cpu((hdr).handle); \ > + __entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \ > + __entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \ > + __entry->hdr_maint_op_class = (hdr).maint_op_class > + Trivial: Maybe one blank line is enough? > + > +#define CXL_EVT_TP_printk(fmt, ...) \ > + TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' " \ > + "handle=%x related_handle=%x maint_op_class=%u" \ > + " : " fmt, \ > + __get_str(dev_name), cxl_event_log_type_str(__entry->log), \ > + __entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\ > + show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \ > + __entry->hdr_related_handle, __entry->hdr_maint_op_class, \ > + ##__VA_ARGS__)