Date: Thu, 16 Jun 2022 13:34:00 -0700
From: Alison Schofield
To: Davidlohr Bueso
Cc: "Williams, Dan J", "Weiny, Ira", "Verma, Vishal L", Ben Widawsky,
	Steven Rostedt, Ingo Molnar, "linux-cxl@vger.kernel.org",
	"linux-kernel@vger.kernel.org", "a.manzanares@samsung.com"
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support
Message-ID: <20220616203400.GA1529208@alison-desk>
References: <382a9c35ef43e89db85670637d88371f9197b7a2.1655250669.git.alison.schofield@intel.com>
	<20220616194334.pvorvoozt4rrzr66@offworld>
In-Reply-To: <20220616194334.pvorvoozt4rrzr66@offworld>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 16, 2022 at 12:43:34PM -0700, Davidlohr Bueso wrote:
> On Tue, 14 Jun 2022, alison.schofield@intel.com wrote:
>
> >From: Alison Schofield
> >
> >CXL devices that support persistent memory maintain a list of locations
> >that are poisoned or result in poison if the addresses are accessed by
> >the host.
> >
> >Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> >list as a set of Media Error Records that include the source of the
> >error, the starting device physical address and length.
> >The length is the number of adjacent DPAs in the record and is in
> >units of 64 bytes.
> >
> >Retrieve the list and log each Media Error Record as a trace event of
> >type cxl_poison_list.
> >
> >Signed-off-by: Alison Schofield
> >---
> > drivers/cxl/cxlmem.h    | 43 +++++++++++++++++++++++
> > drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 118 insertions(+)
>
> snip
> >+int cxl_mem_get_poison_list(struct device *dev)
> >+{
> >+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> >+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> >+	struct cxl_mbox_poison_payload_out *po;
> >+	struct cxl_mbox_poison_payload_in pi;
> >+	int nr_records = 0;
> >+	int rc, i;
> >+
> >+	if (range_len(&cxlds->pmem_range)) {
> >+		pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> >+		pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));

First off - you stopped at a bug here - pi.length needs to be in units
of 64 bytes.

>
> Do you ever see this changing to not always use the full pmem DPA range
> but allow arbitrary ones? I also assume this is the reason why you don't
> check the range vs cxlds->ram_range to prevent any overlaps, no?
>
> Thanks,
> Davidlohr

David - Great question!

I'm headed in this direction -

cxl list --media-errors -m mem1
	lists media errors for requested memdev

cxl list --media-errors -r region#
	lists region errors with HPA addresses
	(So here cxl tool will collect the poison for all the region's
	memdevs and do the DPA to HPA translation)

To answer your question, I wasn't thinking of limiting the range within
the memdev, but certainly could. And if we were taking in ranges, those
ranges would need to be checked.

$cxl list --media-errors -m mem1 --range-start= --range-end|len=

Now, if I left the sysfs interface as is, the driver will read the entire
poison list for the memdev and then cxl tool will filter it for the
range requested.

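To make the filter-in-the-tool option concrete, here is a rough sketch of what
the userspace side might look like. It is not taken from the patch or from the
cxl tool: struct media_error_rec, rec_overlaps() and filter_records() are
hypothetical names, and the record layout is simplified to the two fields the
commit message describes (starting DPA, and length in units of 64 bytes).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POISON_LEN_UNIT 64ULL	/* record length is counted in 64-byte units */

/* Hypothetical, simplified view of one Media Error Record. */
struct media_error_rec {
	uint64_t dpa;		/* starting device physical address, in bytes */
	uint32_t length;	/* number of adjacent 64-byte units */
};

/* Return true if the record overlaps the byte range [start, start + len). */
bool rec_overlaps(const struct media_error_rec *rec,
		  uint64_t start, uint64_t len)
{
	uint64_t rec_len = (uint64_t)rec->length * POISON_LEN_UNIT;

	if (len == 0 || rec_len == 0)
		return false;

	/* Compare distances rather than end addresses to avoid overflow. */
	if (rec->dpa >= start)
		return rec->dpa - start < len;
	return start - rec->dpa < rec_len;
}

/*
 * Compact recs[] in place so that only records overlapping the requested
 * range remain, and return how many were kept.
 */
size_t filter_records(struct media_error_rec *recs, size_t nr,
		      uint64_t start, uint64_t len)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++) {
		if (rec_overlaps(&recs[i], start, len))
			recs[kept++] = recs[i];
	}
	return kept;
}
```

With this approach the device still reports, and the driver still logs, every
record for the memdev; only the final listing shrinks.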
Or, maybe we should implement in libcxl (not sysfs), with memdev and
range options and only collect from the device the range requested.

Either one looks the same to the cxl tool user, but limiting the range
we send to the device would certainly cut down on unwanted records
being logged, retrieved, and examined.

I'd like to hear more from you and other community members.

Alison

>
> snip
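For the other option, where only the requested range is sent to the device,
the request would need the range check and the 64-byte length conversion
discussed above. The following is a minimal sketch under stated assumptions,
not the driver's code: struct dpa_range, struct poison_req and
build_poison_req() are hypothetical, the range is treated as inclusive (as the
kernel's struct range is), the 64-byte alignment requirement is an assumption
about what the device accepts, and the cpu_to_le64() conversion used in the
patch is left out to keep the example buildable in userspace.

```c
#include <stdint.h>

#define POISON_LEN_UNIT 64ULL	/* payload length is in units of 64 bytes */

/* Hypothetical stand-in for an inclusive [start, end] DPA range. */
struct dpa_range {
	uint64_t start;
	uint64_t end;		/* inclusive */
};

/* Hypothetical host-endian view of the Get Poison List input payload. */
struct poison_req {
	uint64_t offset;	/* starting DPA, in bytes */
	uint64_t length;	/* number of 64-byte units */
};

/*
 * Fill *req for the byte range [start, start + len) if it lies entirely
 * inside the pmem range. Returns 0 on success, -1 if the range is empty,
 * out of bounds, or not 64-byte aligned (alignment is an assumption here).
 */
int build_poison_req(const struct dpa_range *pmem,
		     uint64_t start, uint64_t len,
		     struct poison_req *req)
{
	if (len == 0 || start < pmem->start || start > pmem->end)
		return -1;

	/* Overflow-safe check that the last byte is within the pmem range. */
	if (len - 1 > pmem->end - start)
		return -1;

	if (start % POISON_LEN_UNIT || len % POISON_LEN_UNIT)
		return -1;

	req->offset = start;
	req->length = len / POISON_LEN_UNIT;	/* bytes -> 64-byte units */
	return 0;
}
```

Passing the whole pmem range gives the same request as the patch with the
unit bug fixed, so the full-list case falls out of the ranged one.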