Received: by 2002:a05:7412:bbc7:b0:fc:a2b0:25d7 with SMTP id kh7csp1039286rdb; Fri, 2 Feb 2024 11:29:24 -0800 (PST) X-Google-Smtp-Source: AGHT+IFrR7E2bw7KKU+TB4rR9pWe3wm19bDYtXOknDyT5TgOKzyHKJZtIg4ewNgJ5CeUgWcw3zU/ X-Received: by 2002:a17:90a:b015:b0:296:5c1a:f681 with SMTP id x21-20020a17090ab01500b002965c1af681mr151107pjq.5.1706902163692; Fri, 02 Feb 2024 11:29:23 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706902163; cv=pass; d=google.com; s=arc-20160816; b=FL5bVnDcApsCIY47PALHta9TDvydD1Npbn0HOzY3yh3hbLlpoI00R009NrQYU0ntPx zW8Q5en9Bvz1ydWLkGJjYay0vkLlptPSUiB/X6cF4kLR+NOwYLS0LE0Svrj9jQWHxU51 rUstAxsyYLggM6Ej6WthVv4rWEytXo2u5E7D/qLczxF/jRGaLebjS1go+Nu+XbjG2H8I W7WUV29iZ0PlWV7XbxBbnWIhfz95Fzt/nssTCLChcE0P5t5or+dHkCqtnuLskoTHK9R5 UY/KfYhKDosDnnvRr/Jfg74akClSLDw49dXCDJzTO90VHGrp6RcAGfKRH9c3ROSGorb6 dWAw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id:dkim-signature; bh=06LoBIpBsKBy2QprdyT9WQSgpJdIqcGNJyZIddLKHb4=; fh=V1iRbI0Qw81vEiVlqoi6EYHVpPYyGNNHeRZd1rtI/pg=; b=A0GfENgMzAeIkJkvFFl8sznG3O3FK5IsggqPjxvDF1l0WNtBSLHplygAyV/+SVwAYC 9TjDnGD7oaKPn5YuUbnRa0NL4m+twvYKI8ePks5LPrJX7PzAvH1nZ4pBKgAfBDtn+IRW 7J0R76SeLXcEBc9hyW3jhVWiW6iz4MOFG9ZNb3cOVZZX4VvllDzUE4vBYrngtcULa18r u30THy+sjwUF6OPcZ1s3kDeRM+y2of+0M44VSdUOw5SQmvbrSIkXaEARd+iaMqNV6Wof +i2pe3cTlh1oGimrdY4l5P/1/TcUHjXqmXpo0uzqxY/bBOhjFQOC9YIRAEqE02JnyCLo GZFQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=vuoQB1le; arc=pass (i=1 spf=pass spfdomain=efficios.com dkim=pass dkdomain=efficios.com dmarc=pass fromdomain=efficios.com); spf=pass (google.com: domain of linux-kernel+bounces-50479-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-50479-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com X-Forwarded-Encrypted: i=1; AJvYcCUaqfnPbBdiKSvvaZNhWExn42pDVhnC3jNOOJqdcf/3qSHEDrBiEnybmObTNmrW18vJjZQCpswF644b6Xg8Q7aBKh8oWhUnV4IlhwpRwA== Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id f18-20020a17090ac29200b002961238b4b9si360908pjt.186.2024.02.02.11.29.23 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 Feb 2024 11:29:23 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-50479-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=vuoQB1le; arc=pass (i=1 spf=pass spfdomain=efficios.com dkim=pass dkdomain=efficios.com dmarc=pass fromdomain=efficios.com); spf=pass (google.com: domain of linux-kernel+bounces-50479-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-50479-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 4C255289876 for ; Fri, 2 Feb 2024 19:29:23 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 130A080606; Fri, 2 Feb 2024 19:29:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="vuoQB1le" Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DDC8215E5AE; Fri, 2 Feb 2024 19:29:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=167.114.26.122 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706902152; cv=none; b=CW9IUT8prQznoqyWEoVXa/26XSlx9ByNQFSdwgM1QxhdbRB00Zyj3+Fjzaazky0gSnAnxv5/FaYcgzQ+ucxLtc8GKEVs4r+TEd5jSRgO8T5OBATvlAugZM20DXH/jaLiFR90JX/5t+TjBcNfuZCLcqq+thFbny2iZ6UI3jLN8fg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706902152; c=relaxed/simple; bh=P5L94LQ6Hg9uW7husBFpiivZAxhm5usar4GO4ypUsik=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=OtQ3WydL2Wk6zuPCX4umYA+OAsphlVmRAgKQO3C6RSZV4I6fF1hUMQcDg6cZ64R0K4TgMI5GDgOOcm2kk9y1PDrYjk0DFIVEgCCJuQIZlT+vKq7/HJxyG9aCTupBfoaXzuvgxI61orzwDQ1l3X62oZqINd4jLWmyy4F0sJjJzD4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com; spf=pass smtp.mailfrom=efficios.com; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b=vuoQB1le; arc=none smtp.client-ip=167.114.26.122 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=efficios.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1706902144; bh=P5L94LQ6Hg9uW7husBFpiivZAxhm5usar4GO4ypUsik=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=vuoQB1lekXlbjADkCcvYIvisuIDfVtul+Mv7g1QhfkY+Dn8Kqp5EqVVFl6WgUeCME W9MsuJgQBpNHIHIEZrM7HtgrA80tGoe3cEBvkf6bFPPH93r8ERPxoEGcU19Vj0+0N6 vvtibcNd6X2uUGqTdNGHLALH2lWeui4gSATNqcr9uoTDeHeflu3Bh9HHFnnUzjxPdr 2UeI6iLMjX38nTDhl/OfOu57xWLk8+a9WU2pcnEYgkgUyO9CpJPcbBkMD7EXg1Ghaj DFKnWfrxihPS7a21/dHY3lcOn/pX72Zc1QAG/Xba2Jl3NCt0qpKoejnR54b+IqXJUn nbll165jXqVbA== Received: from [172.16.0.134] (192-222-143-198.qc.cable.ebox.net [192.222.143.198]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4TRQnD3vfJzXHG; Fri, 2 Feb 2024 14:29:04 -0500 (EST) Message-ID: <6bdf6085-101d-47ef-86f4-87936622345a@efficios.com> Date: Fri, 2 Feb 2024 14:29:05 -0500 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime Content-Language: en-US To: Dan Williams , Arnd Bergmann , Dave Chinner Cc: linux-kernel@vger.kernel.org, Andrew Morton , Linus Torvalds , linux-mm@kvack.org, linux-arch@vger.kernel.org, Vishal Verma , Dave Jiang , Matthew Wilcox , Russell King , nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, dm-devel@lists.linux.dev References: <20240131162533.247710-1-mathieu.desnoyers@efficios.com> <20240131162533.247710-3-mathieu.desnoyers@efficios.com> <65bab567665f3_37ad2943c@dwillia2-xfh.jf.intel.com.notmuch> <0a38176b-c453-4be0-be83-f3e1bb897973@efficios.com> <65bac71a9659b_37ad29428@dwillia2-xfh.jf.intel.com.notmuch> <65bd284165177_2d43c29443@dwillia2-mobl3.amr.corp.intel.com.notmuch> From: Mathieu Desnoyers In-Reply-To: <65bd284165177_2d43c29443@dwillia2-mobl3.amr.corp.intel.com.notmuch> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 2024-02-02 12:37, Dan Williams wrote: > Mathieu Desnoyers wrote: [...] >> > >> The alternative route I intend to take is to audit all callers >> of alloc_dax() and make sure they all save the alloc_dax() return >> value in a struct dax_device * local variable first for the sake >> of checking for IS_ERR(). This will leave the xyz->dax_dev pointer >> initialized to NULL in the error case and simplify the rest of >> error checking. > > I could maybe get on board with that, but it needs a comment somewhere > about the asymmetric subtlety. Is this "somewhere" at every alloc_dax() call site, or do you have something else in mind ? > >> >> >>> return; >>> >>> if (dax_dev->holder_data != NULL) >>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c >>> index 4e8fdcb3f1c8..b69c9e442cf4 100644 >>> --- a/drivers/nvdimm/pmem.c >>> +++ b/drivers/nvdimm/pmem.c >>> @@ -560,17 +560,19 @@ static int pmem_attach_disk(struct device *dev, >>> dax_dev = alloc_dax(pmem, &pmem_dax_ops); >>> if (IS_ERR(dax_dev)) { >>> rc = PTR_ERR(dax_dev); >>> - goto out; >>> + if (rc != -EOPNOTSUPP) >>> + goto out; >> >> If I compare the before / after this change, if previously >> pmem_attach_disk() was called in a configuration with FS_DAX=n, it would >> result in a NULL pointer dereference. > > No, alloc_dax() only returns NULL CONFIG_DAX=n case, not the > CONFIG_FS_DAX=n case. Indeed, I was wrong there. > So that means that pmem devices on ARM have been > possible without FS_DAX. So, in order for alloc_dax() returning > ERR_PTR(-EOPNOTSUPP) to not regress pmem device availability this error > path needs to be changed. Good point. We're moving the depends on !(ARM || MIPS |PARC) from FS_DAX Kconfig to a runtime check in alloc_dax(), which is used whenever DAX=y, which includes configurations that had FS_DAX=n previously. I'll change the error path in pmem_attack_disk to treat -EOPNOTSUPP alloc_dax() return value as non-fatal. > >> This would be an error handling fix all by itself. Do we really want >> to return successfully if dax is unsupported, or should we return >> an error here ? > > Per above, there is no error handling fix, and pmem block device > available should not depend on alloc_dax() succeeding. I agree on treating alloc_dax() failure as non-fatal. There is however one error handling fix to nvdimm/pmem which I plan to introduce as an initial patch before this change: nvdimm/pmem: Fix leak on dax_add_host() failure Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done before setting pmem->dax_dev, which therefore issues the two following calls on NULL pointers: out_cleanup_dax: kill_dax(pmem->dax_dev); put_dax(pmem->dax_dev); Signed-off-by: Mathieu Desnoyers diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 4e8fdcb3f1c8..9fe358090720 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -566,12 +566,11 @@ static int pmem_attach_disk(struct device *dev, set_dax_nomc(dax_dev); if (is_nvdimm_sync(nd_region)) set_dax_synchronous(dax_dev); + pmem->dax_dev = dax_dev; rc = dax_add_host(dax_dev, disk); if (rc) goto out_cleanup_dax; dax_write_cache(dax_dev, nvdimm_has_cache(nd_region)); - pmem->dax_dev = dax_dev; - rc = device_add_disk(dev, disk, pmem_attribute_groups); if (rc) goto out_remove_host; > > The real question is what to do about device-dax. I *think* it is not > affected by cpu_dcache aliasing because it never accesses user mappings > through a kernel alias. I doubt device-dax is in use on these platforms, > but we might need another fixup for that if someone screams about the > alloc_dax() behavior change making them lose device-dax access. By "device-dax", I understand you mean drivers/dax/Kconfig:DEV_DAX. Based on your analysis, is alloc_dax() still the right spot where to place this runtime check ? Which call sites are responsible for invoking alloc_dax() for device-dax ? If we know which call sites do not intend to use the kernel linear mapping, we could introduce a flag (or a new variant of the alloc_dax() API) that would either enforce or skip the check. [...] >> >> Here what I'm seeing so far: >> >> - devm_release_mem_region() is never called after devm_request_mem_region(). Not >> on error, neither on teardown, > > devm_release_mem_region() is called from virtio_fs_probe() context. That I guess you mean "devm_request_mem_region()" here. > means that when virtio_fs_probe() returns an error the driver core will > automatically call devm_request_mem_region(). And "devm_release_mem_region()" here. > >> - pgmap is never freed on error after devm_kzalloc. > > That is what the "devm_" in devm_kzalloc() does, free the memory on > driver-probe failure, or after the driver remove callback is invoked. Got it. > >> >>> { >>> + struct dax_device *dax_dev __free(cleanup_dax) = NULL; >>> struct virtio_shm_region cache_reg; >>> struct dev_pagemap *pgmap; >>> bool have_cache; >>> @@ -804,6 +808,15 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) >>> if (!IS_ENABLED(CONFIG_FUSE_DAX)) >>> return 0; >>> >>> + dax_dev = alloc_dax(fs, &virtio_fs_dax_ops); >>> + if (IS_ERR(dax_dev)) { >>> + int rc = PTR_ERR(dax_dev); >>> + >>> + if (rc == -EOPNOTSUPP) >>> + return 0; >>> + return rc; >>> + } >> >> What is gained by moving this allocation here ? > > The gain is to fail early in virtio_fs_setup_dax() since the fundamental > dependency of alloc_dax() success is not met. For example why let the > setup progress to devm_memremap_pages() when alloc_dax() is going to > return ERR_PTR(-EOPNOTSUPP). What I don't know is whether there is a dependency requiring to do devm_request_mem_region(), devm_kzalloc(), devm_memremap_pages() before calling alloc_dax() ? Those 3 calls are used to populate: fs->window_phys_addr = (phys_addr_t) cache_reg.addr; fs->window_len = (phys_addr_t) cache_reg.len; and then alloc_dax() takes "fs" as private data parameter. So it's unclear to me whether we can swap the invocation order. I suspect that it is not an issue because it is only used to populate dax_dev->private, but I prefer to confirm this with you just to be on the safe side. [...] Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com