2022-12-17 04:04:00

by Luis Chamberlain

[permalink] [raw]
Subject: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

To unit test CXL today we make use of the ndctl meson test suite but
this requires the cxl_test driver. There is however is no kconfig option
for this driver as its required to be built as an external driver. To
debug CXL with this framework is not inuitive as it departs itself from
typical Linux kernel development debugging processes, requiring an
external module build which also happens to *rebuild* all CXL related
production drivers with some new magic incantations, and replacing
your production CXL modules with the new ones.

This is quite complex, departs ourselves from the typical build/boot
debugging process most folks are used to, and requires a manual error-prone
process which in some kernels / configurations can leads up to a kernel
crash [0].

We can replace this by having the requirements be defined through proper
kconfig symbols and having the cxl_test driver and requirements also become
part of the standard Linux kernel build process. This matches most other
kernel kernel debugging frameworks for subsystems, which don't require any
external modules.

Let's review the current strategy today, first, so nothing is lost:

* one must manually *build*, and then as a second step install
the cxl_test driver as an external modules:

make M=tools/testing/cxl/
sudo M=tools/testing/cxl/ modules_install

Provided your depmod.d was configured correctly on your Linux
distribution you will end up with a complete set of CXL production
modules and cxl_test mock drivers to let you now use the ndctl
test suite. To be clear, you will not only end up with cxl_test
but also with a complete set of module replacements for your CXL
environment.

This works by:

a) allowing the external module to re-define the __mock macro
to __weak, used on to_cxl_host_bridge() and allows the mock driver
to provide a replacement for that single call.

b) the external module build process *rebuilds* all production
modules *again* but uses the the binutils --wrap=symbol
feature [0] [1] to let the production CXL code use the mocked up
CXL features.

We can simplify all this considerably and do away with the external
modules requirements. The __mock stuff is raplaced by addressing the
to_cxl_host_bridge() mapping using a define based on your kernel
configuration. If using the production code you use the produciton
__to_cxl_host_bridge(), otherwise mock_to_cxl_host_bridge() will be
used. This is the *only* eyesore in the CXL code to enable use of the
mock driver.

The magic --wrap=symbol incantations are also just tucked in a new
production drivers/cxl/Makefile.mock which is only read when the kernel
has been configured for debugging using the CXl mock framework.

The last bit of work left is to move as built-in code shared code
between a production environment (non-debugging) and between what is
needed for the same code to run when doing mock debugging. Today the
requirements are small:

* The code to implement to_cxl_host_bridge()
* When mock debugging is enabled, just the code we need to
support mock_to_cxl_host_bridge()

For both cases this is needed you have CXL_ACPI enabled.

In the future if we wanted to then now use the kernel selftests,
for example a tools/testing/sefltests/cxl/ directory, we can easily
do so. This also enables us to separate out unit tests out from the
ndctl tree and allow unit tests to also be developed and written
upstream on the kernel.

Another benefit of this approach is that there is no bit rot,
in the sense that now bots can go willy nilly test building this
code, whereas before only those who knew the proper incantations
actually were building this code and loading it properly.

[0] https://lkml.kernel.org/r/[email protected]
[1] https://sourceware.org/binutils/docs-2.23.1/ld/Options.html#index-g_t_002d_002dwrap_003d_0040var_007bsymbol_007d-263
[2] https://lwn.net/Articles/558106/
Signed-off-by: Luis Chamberlain <[email protected]>
---

What do folks think?

The test results:

https://gist.github.com/mcgrof/2ab7f1601141faa5ac7b16240b4ea652

The summary of test results:

Summary of Failures:

1/5 ndctl:cxl / cxl-topology.sh FAIL 0.50s exit status 1
2/5 ndctl:cxl / cxl-region-sysfs.sh FAIL 0.44s exit status 1
5/5 ndctl:cxl / cxl-xor-region.sh FAIL 0.45s exit status 1

Ok: 2
Expected Fail: 0
Fail: 3
Unexpected Pass: 0
Skipped: 0
Timeout: 0

I don't quite get the failures yet, but hey it's a start.
This commit depends on Dan's patch:

https://lkmll.kernel.org/r/[email protected]

But I can build another RFC if folks want without it.

drivers/cxl/Kconfig | 23 +++++++
drivers/cxl/Makefile | 2 +
drivers/cxl/Makefile.mock | 15 +++++
drivers/cxl/acpi.c | 13 ----
drivers/cxl/core/Makefile | 4 ++
drivers/cxl/core/acpi.c | 30 +++++++++
drivers/cxl/cxl.h | 19 +++---
lib/Kconfig.debug | 8 +++
lib/Makefile | 1 +
lib/test_cxl/Makefile | 13 ++++
lib/test_cxl/acpi.c | 28 +++++++++
lib/test_cxl/core.c | 37 +++++++++++
.../testing/cxl/test => lib/test_cxl}/cxl.c | 7 ---
.../testing/cxl/test => lib/test_cxl}/mem.c | 0
.../testing/cxl/test => lib/test_cxl}/mock.c | 30 ---------
.../testing/cxl/test => lib/test_cxl}/mock.h | 0
tools/testing/cxl/Kbuild | 61 -------------------
tools/testing/cxl/config_check.c | 13 ----
tools/testing/cxl/cxl_acpi_test.c | 6 --
tools/testing/cxl/cxl_core_test.c | 6 --
tools/testing/cxl/cxl_mem_test.c | 6 --
tools/testing/cxl/cxl_pmem_test.c | 6 --
tools/testing/cxl/cxl_port_test.c | 6 --
tools/testing/cxl/mock_acpi.c | 35 -----------
tools/testing/cxl/test/Kbuild | 10 ---
tools/testing/cxl/watermark.h | 25 --------
26 files changed, 173 insertions(+), 231 deletions(-)
create mode 100644 drivers/cxl/Makefile.mock
create mode 100644 drivers/cxl/core/acpi.c
create mode 100644 lib/test_cxl/Makefile
create mode 100644 lib/test_cxl/acpi.c
create mode 100644 lib/test_cxl/core.c
rename {tools/testing/cxl/test => lib/test_cxl}/cxl.c (99%)
rename {tools/testing/cxl/test => lib/test_cxl}/mem.c (100%)
rename {tools/testing/cxl/test => lib/test_cxl}/mock.c (88%)
rename {tools/testing/cxl/test => lib/test_cxl}/mock.h (100%)
delete mode 100644 tools/testing/cxl/Kbuild
delete mode 100644 tools/testing/cxl/config_check.c
delete mode 100644 tools/testing/cxl/cxl_acpi_test.c
delete mode 100644 tools/testing/cxl/cxl_core_test.c
delete mode 100644 tools/testing/cxl/cxl_mem_test.c
delete mode 100644 tools/testing/cxl/cxl_pmem_test.c
delete mode 100644 tools/testing/cxl/cxl_port_test.c
delete mode 100644 tools/testing/cxl/mock_acpi.c
delete mode 100644 tools/testing/cxl/test/Kbuild
delete mode 100644 tools/testing/cxl/watermark.h

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 0ac53c422c31..051f2a531ce0 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -49,6 +49,11 @@ config CXL_MEM_RAW_COMMANDS

If developing CXL hardware or the driver say Y, otherwise say N.

+config CXL_ACPI_BUILTIN
+ bool
+ depends on CXL_ACPI
+ default y
+
config CXL_ACPI
tristate "CXL ACPI: Platform Support"
depends on ACPI
@@ -129,4 +134,22 @@ config CXL_REGION_INVALIDATION_TEST
If unsure, or if this kernel is meant for production environments,
say N.

+config CXL_DEBUG_MOCK_ACPI
+ bool
+ default y
+ depends on CXL_DEBUG_MOCK
+ depends on CXL_ACPI
+
+config CXL_DEBUG_MOCK
+ bool "Enables CXL debug mocking framework"
+ depends on CXL_BUS
+ depends on CXL_BUS!=y
+ depends on CXL_ACPI!=y
+ depends on CXL_PMEM!=y
+ select TEST_CXL
+ help
+ Enables CXL debugging mocking facilities. Enable this if you want
+ to test CXL with the ability to mock topologies otherwise not easily
+ possible with a real system or qemu.
+
endif
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index db321f48ba52..894488b4746a 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,4 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
+include $(srctree)/drivers/cxl/Makefile.mock
+
obj-y += core/
obj-$(CONFIG_CXL_PCI) += cxl_pci.o
obj-$(CONFIG_CXL_MEM) += cxl_mem.o
diff --git a/drivers/cxl/Makefile.mock b/drivers/cxl/Makefile.mock
new file mode 100644
index 000000000000..523acf01894a
--- /dev/null
+++ b/drivers/cxl/Makefile.mock
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0
+ifneq ($(CONFIG_CXL_DEBUG_MOCK),)
+ldflags-y += --wrap=acpi_table_parse_cedt
+ldflags-y += --wrap=is_acpi_device_node
+ldflags-y += --wrap=acpi_evaluate_integer
+ldflags-y += --wrap=acpi_pci_find_root
+ldflags-y += --wrap=nvdimm_bus_register
+ldflags-y += --wrap=devm_cxl_port_enumerate_dports
+ldflags-y += --wrap=devm_cxl_setup_hdm
+ldflags-y += --wrap=devm_cxl_add_passthrough_decoder
+ldflags-y += --wrap=devm_cxl_enumerate_decoders
+ldflags-y += --wrap=cxl_await_media_ready
+ldflags-y += --wrap=cxl_hdm_decode_init
+ldflags-y += --wrap=cxl_rcrb_to_component
+endif
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 657ef250d848..715766dac899 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -309,19 +309,6 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
return -ENOMEM;
}

-__mock struct acpi_device *to_cxl_host_bridge(struct device *host,
- struct device *dev)
-{
- struct acpi_device *adev = to_acpi_device(dev);
-
- if (!acpi_pci_find_root(adev->handle))
- return NULL;
-
- if (strcmp(acpi_device_hid(adev), "ACPI0016") == 0)
- return adev;
- return NULL;
-}
-
/*
* A host bridge is a dport to a CFMWS decode and it is a uport to the
* dport (PCIe Root Ports) in the host bridge.
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 79c7257f4107..969567113a42 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -1,4 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
+include $(srctree)/drivers/cxl/Makefile.mock
+
obj-$(CONFIG_CXL_BUS) += cxl_core.o
obj-$(CONFIG_CXL_SUSPEND) += suspend.o

@@ -11,3 +13,5 @@ cxl_core-y += mbox.o
cxl_core-y += pci.o
cxl_core-y += hdm.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
+
+obj-$(CONFIG_CXL_ACPI_BUILTIN) += acpi.o
diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
new file mode 100644
index 000000000000..b68cf5b45385
--- /dev/null
+++ b/drivers/cxl/core/acpi.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/memregion.h>
+#include <linux/workqueue.h>
+#include <linux/debugfs.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <linux/acpi.h>
+#include <cxlmem.h>
+#include <cxlpci.h>
+#include <cxl.h>
+#include "core.h"
+
+struct acpi_device *__to_cxl_host_bridge(struct device *host,
+ struct device *dev)
+{
+ struct acpi_device *adev = to_acpi_device(dev);
+
+ if (!acpi_pci_find_root(adev->handle))
+ return NULL;
+
+ if (strcmp(acpi_device_hid(adev), "ACPI0016") == 0)
+ return adev;
+ return NULL;
+}
+EXPORT_SYMBOL_NS_GPL(__to_cxl_host_bridge, CXL);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index ed2b0a2e80e2..e7cb044372fe 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -693,12 +693,17 @@ static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
}
#endif

-/*
- * Unit test builds overrides this to __weak, find the 'strong' version
- * of these symbols in tools/testing/cxl/.
- */
-#ifndef __mock
-#define __mock static
-#endif
+#ifdef CONFIG_CXL_ACPI_BUILTIN
+
+struct acpi_device *__to_cxl_host_bridge(struct device *host, struct device *dev);
+
+#ifdef CONFIG_CXL_DEBUG_MOCK_ACPI
+struct acpi_device *mock_to_cxl_host_bridge(struct device *host, struct device *dev);
+#define to_cxl_host_bridge mock_to_cxl_host_bridge
+#else
+#define to_cxl_host_bridge __to_cxl_host_bridge
+#endif /* CONFIG_CXL_DEBUG_MOCK_ACPI */
+
+#endif /* CONFIG_CXL_ACPI_BUILTIN */

#endif /* __CXL_H__ */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 67c86b20c96a..ef87a513e3db 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2104,6 +2104,14 @@ config CPUMASK_KUNIT_TEST

If unsure, say N.

+config TEST_CXL
+ tristate "Let's you load a CXL mock driver"
+ depends on CXL_BUS
+ depends on CXL_DEBUG_MOCK
+ default n
+ help
+ Enable this to allow you to compile a CXL mock driver.
+
config TEST_LIST_SORT
tristate "Linked list sorting test" if !KUNIT_ALL_TESTS
depends on KUNIT
diff --git a/lib/Makefile b/lib/Makefile
index 4d9461bfea42..a29f9632e817 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
+obj-$(CONFIG_CXL_DEBUG_MOCK) += test_cxl/
#
# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
diff --git a/lib/test_cxl/Makefile b/lib/test_cxl/Makefile
new file mode 100644
index 000000000000..b4f0f6d890ef
--- /dev/null
+++ b/lib/test_cxl/Makefile
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+ccflags-y := -I$(srctree)/drivers/cxl/
+
+obj-$(CONFIG_CXL_DEBUG_MOCK) += core.o
+obj-$(CONFIG_CXL_DEBUG_MOCK_ACPI) += acpi.o
+
+obj-$(CONFIG_TEST_CXL) += cxl_test.o
+obj-$(CONFIG_TEST_CXL) += cxl_mock.o
+obj-$(CONFIG_TEST_CXL) += cxl_mock_mem.o
+
+cxl_test-y := cxl.o
+cxl_mock-y := mock.o
+cxl_mock_mem-y := mem.o
diff --git a/lib/test_cxl/acpi.c b/lib/test_cxl/acpi.c
new file mode 100644
index 000000000000..18fbc32a96d3
--- /dev/null
+++ b/lib/test_cxl/acpi.c
@@ -0,0 +1,28 @@
+#include <linux/acpi.h>
+#include <linux/platform_device.h>
+#include <cxl.h>
+#include "mock.h"
+
+struct acpi_device *mock_to_cxl_host_bridge(struct device *host,
+ struct device *dev)
+{
+ int index;
+ struct acpi_device *found = NULL;
+ struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
+
+ if (ops && ops->is_mock_bridge(dev)) {
+ found = ACPI_COMPANION(dev);
+ goto out;
+ }
+
+ if (dev->bus == &platform_bus_type)
+ goto out;
+
+ found = __to_cxl_host_bridge(host, dev);
+out:
+ put_cxl_mock_ops(index);
+ return found;
+}
+EXPORT_SYMBOL_NS_GPL(mock_to_cxl_host_bridge, CXL);
+
+MODULE_IMPORT_NS(CXL);
diff --git a/lib/test_cxl/core.c b/lib/test_cxl/core.c
new file mode 100644
index 000000000000..1d37ab5e9305
--- /dev/null
+++ b/lib/test_cxl/core.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0-only
+//Copyright(c) 2021 Intel Corporation. All rights reserved.
+
+#include <linux/export.h>
+#include <linux/list.h>
+#include <linux/rculist.h>
+#include <linux/srcu.h>
+#include "mock.h"
+
+static LIST_HEAD(mock);
+DEFINE_SRCU(cxl_mock_srcu);
+
+struct cxl_mock_ops *get_cxl_mock_ops(int *index)
+{
+ *index = srcu_read_lock(&cxl_mock_srcu);
+ return list_first_or_null_rcu(&mock, struct cxl_mock_ops, list);
+}
+EXPORT_SYMBOL_GPL(get_cxl_mock_ops);
+
+void put_cxl_mock_ops(int index)
+{
+ srcu_read_unlock(&cxl_mock_srcu, index);
+}
+EXPORT_SYMBOL_GPL(put_cxl_mock_ops);
+
+void register_cxl_mock_ops(struct cxl_mock_ops *ops)
+{
+ list_add_rcu(&ops->list, &mock);
+}
+EXPORT_SYMBOL_GPL(register_cxl_mock_ops);
+
+void unregister_cxl_mock_ops(struct cxl_mock_ops *ops)
+{
+ list_del_rcu(&ops->list);
+ synchronize_srcu(&cxl_mock_srcu);
+}
+EXPORT_SYMBOL_GPL(unregister_cxl_mock_ops);
diff --git a/tools/testing/cxl/test/cxl.c b/lib/test_cxl/cxl.c
similarity index 99%
rename from tools/testing/cxl/test/cxl.c
rename to lib/test_cxl/cxl.c
index 920bd969c554..ef8071f48142 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/lib/test_cxl/cxl.c
@@ -10,7 +10,6 @@
#include <linux/mm.h>
#include <cxlmem.h>

-#include "../watermark.h"
#include "mock.h"

static int interleave_arithmetic;
@@ -1121,12 +1120,6 @@ static __init int cxl_test_init(void)
{
int rc, i;

- cxl_acpi_test();
- cxl_core_test();
- cxl_mem_test();
- cxl_pmem_test();
- cxl_port_test();
-
register_cxl_mock_ops(&cxl_mock_ops);

cxl_mock_pool = gen_pool_create(ilog2(SZ_2M), NUMA_NO_NODE);
diff --git a/tools/testing/cxl/test/mem.c b/lib/test_cxl/mem.c
similarity index 100%
rename from tools/testing/cxl/test/mem.c
rename to lib/test_cxl/mem.c
diff --git a/tools/testing/cxl/test/mock.c b/lib/test_cxl/mock.c
similarity index 88%
rename from tools/testing/cxl/test/mock.c
rename to lib/test_cxl/mock.c
index 5dface08e0de..fa2b2d5c7054 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/lib/test_cxl/mock.c
@@ -11,36 +11,6 @@
#include <cxlpci.h>
#include "mock.h"

-static LIST_HEAD(mock);
-
-void register_cxl_mock_ops(struct cxl_mock_ops *ops)
-{
- list_add_rcu(&ops->list, &mock);
-}
-EXPORT_SYMBOL_GPL(register_cxl_mock_ops);
-
-static DEFINE_SRCU(cxl_mock_srcu);
-
-void unregister_cxl_mock_ops(struct cxl_mock_ops *ops)
-{
- list_del_rcu(&ops->list);
- synchronize_srcu(&cxl_mock_srcu);
-}
-EXPORT_SYMBOL_GPL(unregister_cxl_mock_ops);
-
-struct cxl_mock_ops *get_cxl_mock_ops(int *index)
-{
- *index = srcu_read_lock(&cxl_mock_srcu);
- return list_first_or_null_rcu(&mock, struct cxl_mock_ops, list);
-}
-EXPORT_SYMBOL_GPL(get_cxl_mock_ops);
-
-void put_cxl_mock_ops(int index)
-{
- srcu_read_unlock(&cxl_mock_srcu, index);
-}
-EXPORT_SYMBOL_GPL(put_cxl_mock_ops);
-
bool __wrap_is_acpi_device_node(const struct fwnode_handle *fwnode)
{
struct acpi_device *adev =
diff --git a/tools/testing/cxl/test/mock.h b/lib/test_cxl/mock.h
similarity index 100%
rename from tools/testing/cxl/test/mock.h
rename to lib/test_cxl/mock.h
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
deleted file mode 100644
index 427174feeb7d..000000000000
--- a/tools/testing/cxl/Kbuild
+++ /dev/null
@@ -1,61 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-ldflags-y += --wrap=acpi_table_parse_cedt
-ldflags-y += --wrap=is_acpi_device_node
-ldflags-y += --wrap=acpi_evaluate_integer
-ldflags-y += --wrap=acpi_pci_find_root
-ldflags-y += --wrap=nvdimm_bus_register
-ldflags-y += --wrap=devm_cxl_port_enumerate_dports
-ldflags-y += --wrap=devm_cxl_setup_hdm
-ldflags-y += --wrap=devm_cxl_add_passthrough_decoder
-ldflags-y += --wrap=devm_cxl_enumerate_decoders
-ldflags-y += --wrap=cxl_await_media_ready
-ldflags-y += --wrap=cxl_hdm_decode_init
-ldflags-y += --wrap=cxl_rcrb_to_component
-
-DRIVERS := ../../../drivers
-CXL_SRC := $(DRIVERS)/cxl
-CXL_CORE_SRC := $(DRIVERS)/cxl/core
-ccflags-y := -I$(srctree)/drivers/cxl/
-ccflags-y += -D__mock=__weak
-
-obj-m += cxl_acpi.o
-
-cxl_acpi-y := $(CXL_SRC)/acpi.o
-cxl_acpi-y += mock_acpi.o
-cxl_acpi-y += config_check.o
-cxl_acpi-y += cxl_acpi_test.o
-
-obj-m += cxl_pmem.o
-
-cxl_pmem-y := $(CXL_SRC)/pmem.o
-cxl_pmem-y += $(CXL_SRC)/security.o
-cxl_pmem-y += config_check.o
-cxl_pmem-y += cxl_pmem_test.o
-
-obj-m += cxl_port.o
-
-cxl_port-y := $(CXL_SRC)/port.o
-cxl_port-y += config_check.o
-cxl_port-y += cxl_port_test.o
-
-
-obj-m += cxl_mem.o
-
-cxl_mem-y := $(CXL_SRC)/mem.o
-cxl_mem-y += config_check.o
-cxl_mem-y += cxl_mem_test.o
-
-obj-m += cxl_core.o
-
-cxl_core-y := $(CXL_CORE_SRC)/port.o
-cxl_core-y += $(CXL_CORE_SRC)/pmem.o
-cxl_core-y += $(CXL_CORE_SRC)/regs.o
-cxl_core-y += $(CXL_CORE_SRC)/memdev.o
-cxl_core-y += $(CXL_CORE_SRC)/mbox.o
-cxl_core-y += $(CXL_CORE_SRC)/pci.o
-cxl_core-y += $(CXL_CORE_SRC)/hdm.o
-cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
-cxl_core-y += config_check.o
-cxl_core-y += cxl_core_test.o
-
-obj-m += test/
diff --git a/tools/testing/cxl/config_check.c b/tools/testing/cxl/config_check.c
deleted file mode 100644
index de5e5b3652fd..000000000000
--- a/tools/testing/cxl/config_check.c
+++ /dev/null
@@ -1,13 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/bug.h>
-
-void check(void)
-{
- /*
- * These kconfig symbols must be set to "m" for cxl_test to load
- * and operate.
- */
- BUILD_BUG_ON(!IS_MODULE(CONFIG_CXL_BUS));
- BUILD_BUG_ON(!IS_MODULE(CONFIG_CXL_ACPI));
- BUILD_BUG_ON(!IS_MODULE(CONFIG_CXL_PMEM));
-}
diff --git a/tools/testing/cxl/cxl_acpi_test.c b/tools/testing/cxl/cxl_acpi_test.c
deleted file mode 100644
index 8602dc27c81c..000000000000
--- a/tools/testing/cxl/cxl_acpi_test.c
+++ /dev/null
@@ -1,6 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
-
-#include "watermark.h"
-
-cxl_test_watermark(cxl_acpi);
diff --git a/tools/testing/cxl/cxl_core_test.c b/tools/testing/cxl/cxl_core_test.c
deleted file mode 100644
index 464a9255e4d6..000000000000
--- a/tools/testing/cxl/cxl_core_test.c
+++ /dev/null
@@ -1,6 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
-
-#include "watermark.h"
-
-cxl_test_watermark(cxl_core);
diff --git a/tools/testing/cxl/cxl_mem_test.c b/tools/testing/cxl/cxl_mem_test.c
deleted file mode 100644
index ba7fb8a44288..000000000000
--- a/tools/testing/cxl/cxl_mem_test.c
+++ /dev/null
@@ -1,6 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
-
-#include "watermark.h"
-
-cxl_test_watermark(cxl_mem);
diff --git a/tools/testing/cxl/cxl_pmem_test.c b/tools/testing/cxl/cxl_pmem_test.c
deleted file mode 100644
index 3fd884fae537..000000000000
--- a/tools/testing/cxl/cxl_pmem_test.c
+++ /dev/null
@@ -1,6 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
-
-#include "watermark.h"
-
-cxl_test_watermark(cxl_pmem);
diff --git a/tools/testing/cxl/cxl_port_test.c b/tools/testing/cxl/cxl_port_test.c
deleted file mode 100644
index be183917a9f6..000000000000
--- a/tools/testing/cxl/cxl_port_test.c
+++ /dev/null
@@ -1,6 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
-
-#include "watermark.h"
-
-cxl_test_watermark(cxl_port);
diff --git a/tools/testing/cxl/mock_acpi.c b/tools/testing/cxl/mock_acpi.c
deleted file mode 100644
index 55813de26d46..000000000000
--- a/tools/testing/cxl/mock_acpi.c
+++ /dev/null
@@ -1,35 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
-
-#include <linux/platform_device.h>
-#include <linux/device.h>
-#include <linux/acpi.h>
-#include <cxl.h>
-#include "test/mock.h"
-
-struct acpi_device *to_cxl_host_bridge(struct device *host, struct device *dev)
-{
- int index;
- struct acpi_device *adev, *found = NULL;
- struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
-
- if (ops && ops->is_mock_bridge(dev)) {
- found = ACPI_COMPANION(dev);
- goto out;
- }
-
- if (dev->bus == &platform_bus_type)
- goto out;
-
- adev = to_acpi_device(dev);
- if (!acpi_pci_find_root(adev->handle))
- goto out;
-
- if (strcmp(acpi_device_hid(adev), "ACPI0016") == 0) {
- found = adev;
- dev_dbg(host, "found host bridge %s\n", dev_name(&adev->dev));
- }
-out:
- put_cxl_mock_ops(index);
- return found;
-}
diff --git a/tools/testing/cxl/test/Kbuild b/tools/testing/cxl/test/Kbuild
deleted file mode 100644
index 4e59e2c911f6..000000000000
--- a/tools/testing/cxl/test/Kbuild
+++ /dev/null
@@ -1,10 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-ccflags-y := -I$(srctree)/drivers/cxl/
-
-obj-m += cxl_test.o
-obj-m += cxl_mock.o
-obj-m += cxl_mock_mem.o
-
-cxl_test-y := cxl.o
-cxl_mock-y := mock.o
-cxl_mock_mem-y := mem.o
diff --git a/tools/testing/cxl/watermark.h b/tools/testing/cxl/watermark.h
deleted file mode 100644
index 9d81d4a5f6be..000000000000
--- a/tools/testing/cxl/watermark.h
+++ /dev/null
@@ -1,25 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
-#ifndef _TEST_CXL_WATERMARK_H_
-#define _TEST_CXL_WATERMARK_H_
-#include <linux/module.h>
-#include <linux/printk.h>
-
-int cxl_acpi_test(void);
-int cxl_core_test(void);
-int cxl_mem_test(void);
-int cxl_pmem_test(void);
-int cxl_port_test(void);
-
-/*
- * dummy routine for cxl_test to validate it is linking to the properly
- * mocked module and not the standard one from the base tree.
- */
-#define cxl_test_watermark(x) \
-int x##_test(void) \
-{ \
- pr_debug("%s for cxl_test\n", KBUILD_MODNAME); \
- return 0; \
-} \
-EXPORT_SYMBOL(x##_test)
-#endif /* _TEST_CXL_WATERMARK_H_ */
--
2.35.1


2022-12-17 05:17:38

by Dan Williams

[permalink] [raw]
Subject: RE: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

Luis Chamberlain wrote:
> To unit test CXL today we make use of the ndctl meson test suite but
> this requires the cxl_test driver. There is however is no kconfig option
> for this driver as its required to be built as an external driver. To
> debug CXL with this framework is not inuitive as it departs itself from
> typical Linux kernel development debugging processes, requiring an
> external module build which also happens to *rebuild* all CXL related
> production drivers with some new magic incantations, and replacing
> your production CXL modules with the new ones.
>
> This is quite complex, departs ourselves from the typical build/boot
> debugging process most folks are used to, and requires a manual error-prone
> process which in some kernels / configurations can leads up to a kernel
> crash [0].
>
> We can replace this by having the requirements be defined through proper
> kconfig symbols and having the cxl_test driver and requirements also become
> part of the standard Linux kernel build process. This matches most other
> kernel kernel debugging frameworks for subsystems, which don't require any
> external modules.
>
> Let's review the current strategy today, first, so nothing is lost:
>
> * one must manually *build*, and then as a second step install
> the cxl_test driver as an external modules:
>
> make M=tools/testing/cxl/
> sudo M=tools/testing/cxl/ modules_install
>
> Provided your depmod.d was configured correctly on your Linux
> distribution you will end up with a complete set of CXL production
> modules and cxl_test mock drivers to let you now use the ndctl
> test suite. To be clear, you will not only end up with cxl_test
> but also with a complete set of module replacements for your CXL
> environment.
>
> This works by:
>
> a) allowing the external module to re-define the __mock macro
> to __weak, used on to_cxl_host_bridge() and allows the mock driver
> to provide a replacement for that single call.
>
> b) the external module build process *rebuilds* all production
> modules *again* but uses the the binutils --wrap=symbol
> feature [0] [1] to let the production CXL code use the mocked up
> CXL features.
>
> We can simplify all this considerably and do away with the external
> modules requirements. The __mock stuff is raplaced by addressing the
> to_cxl_host_bridge() mapping using a define based on your kernel
> configuration. If using the production code you use the produciton
> __to_cxl_host_bridge(), otherwise mock_to_cxl_host_bridge() will be
> used. This is the *only* eyesore in the CXL code to enable use of the
> mock driver.
>
> The magic --wrap=symbol incantations are also just tucked in a new
> production drivers/cxl/Makefile.mock which is only read when the kernel
> has been configured for debugging using the CXl mock framework.
>
> The last bit of work left is to move as built-in code shared code
> between a production environment (non-debugging) and between what is
> needed for the same code to run when doing mock debugging. Today the
> requirements are small:
>
> * The code to implement to_cxl_host_bridge()
> * When mock debugging is enabled, just the code we need to
> support mock_to_cxl_host_bridge()
>
> For both cases this is needed you have CXL_ACPI enabled.
>
> In the future if we wanted to then now use the kernel selftests,
> for example a tools/testing/sefltests/cxl/ directory, we can easily
> do so. This also enables us to separate out unit tests out from the
> ndctl tree and allow unit tests to also be developed and written
> upstream on the kernel.
>
> Another benefit of this approach is that there is no bit rot,
> in the sense that now bots can go willy nilly test building this
> code, whereas before only those who knew the proper incantations
> actually were building this code and loading it properly.
>
> [0] https://lkml.kernel.org/r/[email protected]
> [1] https://sourceware.org/binutils/docs-2.23.1/ld/Options.html#index-g_t_002d_002dwrap_003d_0040var_007bsymbol_007d-263
> [2] https://lwn.net/Articles/558106/
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
>
> What do folks think?
>
> The test results:
>
> https://gist.github.com/mcgrof/2ab7f1601141faa5ac7b16240b4ea652
>
> The summary of test results:
>
> Summary of Failures:
>
> 1/5 ndctl:cxl / cxl-topology.sh FAIL 0.50s exit status 1
> 2/5 ndctl:cxl / cxl-region-sysfs.sh FAIL 0.44s exit status 1
> 5/5 ndctl:cxl / cxl-xor-region.sh FAIL 0.45s exit status 1

At least for me the presented rationale and these results leaves me
cold. Yes, there are sharp edges but 0day groks this scheme when it runs
tools/testing/nvdimm/ (nfit_test) tests. The run_qemu script automates
cxl_test and nfit_test this as well. So the complexity has not proven
prohibitive. In the case of tools/testing/nvdimm/ it has not proven
prohibitive for years. In other words the suggestion that the current
organization ultimately leads to bit rot has not been substantiated in
practice.

The proposed direction to move tests out of the ndctl.git repo into the
kernel solves the wrong problem. It overlooks the fact that the tests
are more tightly coupled to libcxl changes than kernel changes. So in
terms of benefits of code being colocated, tests + libcxl + tools in the
same repo is more impactful than tests + kernel in the same repo.

I know Jonathan has some latent ideas about building up a CXL regression
suite on top of QEMU, but even there it's not clear that would benefit
from being developed in linux.git given the tight coupling to QEMU.git.

>
> Ok: 2
> Expected Fail: 0
> Fail: 3
> Unexpected Pass: 0
> Skipped: 0
> Timeout: 0
>
> I don't quite get the failures yet, but hey it's a start.
> This commit depends on Dan's patch:
>
> https://lkmll.kernel.org/r/[email protected]
>
> But I can build another RFC if folks want without it.
>
> drivers/cxl/Kconfig | 23 +++++++
> drivers/cxl/Makefile | 2 +
> drivers/cxl/Makefile.mock | 15 +++++
> drivers/cxl/acpi.c | 13 ----
> drivers/cxl/core/Makefile | 4 ++
> drivers/cxl/core/acpi.c | 30 +++++++++
> drivers/cxl/cxl.h | 19 +++---
> lib/Kconfig.debug | 8 +++
> lib/Makefile | 1 +
> lib/test_cxl/Makefile | 13 ++++
> lib/test_cxl/acpi.c | 28 +++++++++
> lib/test_cxl/core.c | 37 +++++++++++
> .../testing/cxl/test => lib/test_cxl}/cxl.c | 7 ---
> .../testing/cxl/test => lib/test_cxl}/mem.c | 0
> .../testing/cxl/test => lib/test_cxl}/mock.c | 30 ---------
> .../testing/cxl/test => lib/test_cxl}/mock.h | 0

I do not think cxl_test is suitable to ship as an in-tree capability, it
is intentionally second class to keep test logic distinct from
production code.

2022-12-18 18:15:35

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

On Fri, 16 Dec 2022 20:55:19 -0800
Dan Williams <[email protected]> wrote:

> Luis Chamberlain wrote:
> > To unit test CXL today we make use of the ndctl meson test suite but
> > this requires the cxl_test driver. There is however is no kconfig option
> > for this driver as its required to be built as an external driver. To
> > debug CXL with this framework is not inuitive as it departs itself from
> > typical Linux kernel development debugging processes, requiring an
> > external module build which also happens to *rebuild* all CXL related
> > production drivers with some new magic incantations, and replacing
> > your production CXL modules with the new ones.
> >
> > This is quite complex, departs ourselves from the typical build/boot
> > debugging process most folks are used to, and requires a manual error-prone
> > process which in some kernels / configurations can leads up to a kernel
> > crash [0].
> >
> > We can replace this by having the requirements be defined through proper
> > kconfig symbols and having the cxl_test driver and requirements also become
> > part of the standard Linux kernel build process. This matches most other
> > kernel kernel debugging frameworks for subsystems, which don't require any
> > external modules.
> >
> > Let's review the current strategy today, first, so nothing is lost:
> >
> > * one must manually *build*, and then as a second step install
> > the cxl_test driver as an external modules:
> >
> > make M=tools/testing/cxl/
> > sudo M=tools/testing/cxl/ modules_install
> >
> > Provided your depmod.d was configured correctly on your Linux
> > distribution you will end up with a complete set of CXL production
> > modules and cxl_test mock drivers to let you now use the ndctl
> > test suite. To be clear, you will not only end up with cxl_test
> > but also with a complete set of module replacements for your CXL
> > environment.
> >
> > This works by:
> >
> > a) allowing the external module to re-define the __mock macro
> > to __weak, used on to_cxl_host_bridge() and allows the mock driver
> > to provide a replacement for that single call.
> >
> > b) the external module build process *rebuilds* all production
> > modules *again* but uses the the binutils --wrap=symbol
> > feature [0] [1] to let the production CXL code use the mocked up
> > CXL features.
> >
> > We can simplify all this considerably and do away with the external
> > modules requirements. The __mock stuff is raplaced by addressing the
> > to_cxl_host_bridge() mapping using a define based on your kernel
> > configuration. If using the production code you use the produciton
> > __to_cxl_host_bridge(), otherwise mock_to_cxl_host_bridge() will be
> > used. This is the *only* eyesore in the CXL code to enable use of the
> > mock driver.
> >
> > The magic --wrap=symbol incantations are also just tucked in a new
> > production drivers/cxl/Makefile.mock which is only read when the kernel
> > has been configured for debugging using the CXl mock framework.
> >
> > The last bit of work left is to move as built-in code shared code
> > between a production environment (non-debugging) and between what is
> > needed for the same code to run when doing mock debugging. Today the
> > requirements are small:
> >
> > * The code to implement to_cxl_host_bridge()
> > * When mock debugging is enabled, just the code we need to
> > support mock_to_cxl_host_bridge()
> >
> > For both cases this is needed you have CXL_ACPI enabled.
> >
> > In the future if we wanted to then now use the kernel selftests,
> > for example a tools/testing/sefltests/cxl/ directory, we can easily
> > do so. This also enables us to separate out unit tests out from the
> > ndctl tree and allow unit tests to also be developed and written
> > upstream on the kernel.
> >
> > Another benefit of this approach is that there is no bit rot,
> > in the sense that now bots can go willy nilly test building this
> > code, whereas before only those who knew the proper incantations
> > actually were building this code and loading it properly.
> >
> > [0] https://lkml.kernel.org/r/[email protected]
> > [1] https://sourceware.org/binutils/docs-2.23.1/ld/Options.html#index-g_t_002d_002dwrap_003d_0040var_007bsymbol_007d-263
> > [2] https://lwn.net/Articles/558106/
> > Signed-off-by: Luis Chamberlain <[email protected]>
> > ---
> >
> > What do folks think?
> >
> > The test results:
> >
> > https://gist.github.com/mcgrof/2ab7f1601141faa5ac7b16240b4ea652
> >
> > The summary of test results:
> >
> > Summary of Failures:
> >
> > 1/5 ndctl:cxl / cxl-topology.sh FAIL 0.50s exit status 1
> > 2/5 ndctl:cxl / cxl-region-sysfs.sh FAIL 0.44s exit status 1
> > 5/5 ndctl:cxl / cxl-xor-region.sh FAIL 0.45s exit status 1
>
> At least for me the presented rationale and these results leaves me
> cold. Yes, there are sharp edges but 0day groks this scheme when it runs
> tools/testing/nvdimm/ (nfit_test) tests. The run_qemu script automates
> cxl_test and nfit_test this as well. So the complexity has not proven
> prohibitive. In the case of tools/testing/nvdimm/ it has not proven
> prohibitive for years. In other words the suggestion that the current
> organization ultimately leads to bit rot has not been substantiated in
> practice.
>
> The proposed direction to move tests out of the ndctl.git repo into the
> kernel solves the wrong problem. It overlooks the fact that the tests
> are more tightly coupled to libcxl changes than kernel changes. So in
> terms of benefits of code being colocated, tests + libcxl + tools in the
> same repo is more impactful than tests + kernel in the same repo.
>
> I know Jonathan has some latent ideas about building up a CXL regression
> suite on top of QEMU, but even there it's not clear that would benefit
> from being developed in linux.git given the tight coupling to QEMU.git.

QEMU based CI should go two ways:
1) QEMU CI would typically pin particular kernel version and verify that
QEMU changed don't break that. If we need new features for a new test,
we move that kernel version used. Existing tests should never break
against a fixed kernel version as that's a regression in QEMU (or
maybe a bug elsewhere) Ultimately we should have this running in the
normal QEMU gitlab CI.
2) Kernel CI against QEMU would typically pin particular QEMU version
and check that kernel changes don't break. This will have rough edges
for a while yet as we are still adding mandatory features to the QEMU
emulation (e.g. events support). Again, as we add new features / tests
may need to move the QEMU version forwards to support them.

I don't think we much care about backwards compatibility so once we've
moved the pinned element forwards in the above, we won't care about the
old version. The aim here isn't really to ensure no regressions when
running on QEMU (though that CI is nice to have), but more that we have
no problems in kernel side of things.

This is a way off yet. Not seeing this as being part of linux.git.
The QEMU CI stuff will be in the qemu.git and Kernel CI stuff probably
sit out of tree - there shouldn't be a tight coupling beyond new tests
wanting to check available features etc. I might ask a friendly
CI project to add this to their normal runs.

I don't have strong feelings on cxl_test. Tend not to use it myself
and haven't yet contributed to it.

Jonathan

>
> >
> > Ok: 2
> > Expected Fail: 0
> > Fail: 3
> > Unexpected Pass: 0
> > Skipped: 0
> > Timeout: 0
> >
> > I don't quite get the failures yet, but hey it's a start.
> > This commit depends on Dan's patch:
> >
> > https://lkmll.kernel.org/r/[email protected]
> >
> > But I can build another RFC if folks want without it.
> >
> > drivers/cxl/Kconfig | 23 +++++++
> > drivers/cxl/Makefile | 2 +
> > drivers/cxl/Makefile.mock | 15 +++++
> > drivers/cxl/acpi.c | 13 ----
> > drivers/cxl/core/Makefile | 4 ++
> > drivers/cxl/core/acpi.c | 30 +++++++++
> > drivers/cxl/cxl.h | 19 +++---
> > lib/Kconfig.debug | 8 +++
> > lib/Makefile | 1 +
> > lib/test_cxl/Makefile | 13 ++++
> > lib/test_cxl/acpi.c | 28 +++++++++
> > lib/test_cxl/core.c | 37 +++++++++++
> > .../testing/cxl/test => lib/test_cxl}/cxl.c | 7 ---
> > .../testing/cxl/test => lib/test_cxl}/mem.c | 0
> > .../testing/cxl/test => lib/test_cxl}/mock.c | 30 ---------
> > .../testing/cxl/test => lib/test_cxl}/mock.h | 0
>
> I do not think cxl_test is suitable to ship as an in-tree capability, it
> is intentionally second class to keep test logic distinct from
> production code.

2022-12-19 18:57:18

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

On Sun, Dec 18, 2022 at 04:08:24PM +0000, Jonathan Cameron wrote:
> QEMU based CI should go two ways:
> 1) QEMU CI would typically pin particular kernel version and verify that
> QEMU changed don't break that. If we need new features for a new test,
> we move that kernel version used. Existing tests should never break
> against a fixed kernel version as that's a regression in QEMU (or
> maybe a bug elsewhere) Ultimately we should have this running in the
> normal QEMU gitlab CI.

Sounds sensible.

> 2) Kernel CI against QEMU would typically pin particular QEMU version
> and check that kernel changes don't break. This will have rough edges
> for a while yet as we are still adding mandatory features to the QEMU
> emulation (e.g. events support). Again, as we add new features / tests
> may need to move the QEMU version forwards to support them.

Sure - but for this today other than ensuring a kernel does not crash upon
bootup we also have cxl_test, but not much else.

We'll want to exand a set of target tests on CXL enabled nodes, without
cxl_test. Other than verifying the topology matches, we'll want to start
mimicking actual use cases / performance stuff.

> I don't think we much care about backwards compatibility so once we've
> moved the pinned element forwards in the above, we won't care about the
> old version.

Making tests simply skip if the feature is not available doens't take
much effort but forward thinking.

> The aim here isn't really to ensure no regressions when
> running on QEMU (though that CI is nice to have), but more that we have
> no problems in kernel side of things.

Sure.

> This is a way off yet. Not seeing this as being part of linux.git.
> The QEMU CI stuff will be in the qemu.git and Kernel CI stuff probably
> sit out of tree - there shouldn't be a tight coupling beyond new tests
> wanting to check available features etc. I might ask a friendly
> CI project to add this to their normal runs.

OK in case it helps, cxl-enabled qemu building bringup / ndctl building
and install is all now automated and integratead as part of kdevops so
patches welcomed to expand that coverage.

> I don't have strong feelings on cxl_test. Tend not to use it myself
> and haven't yet contributed to it.

Thanks, this is useful information.

Luis

2022-12-19 20:19:02

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

On Fri, Dec 16, 2022 at 08:55:19PM -0800, Dan Williams wrote:
> In other words the suggestion that the current
> organization ultimately leads to bit rot has not been substantiated in
> practice.

On top of this patch I just added a custom debug patch to my tree which
enables CXL_BUS and CXL_TEST by default when this is currently allowed
and it got quite a bit of kernel build warnings. Although some of these
are specific to my change, some of them do not seem to be related to
that and likely could benefit from fixing:

https://gist.github.com/mcgrof/73dce72939590c6edc9413b0384ae4c2

And so although you may not see some build warnings so far, it does not
negate my suggestion that having cxl_test as a proper upstream driver strategy
gets you more build testing / coverage.

> The proposed direction to move tests out of the ndctl.git repo into the
> kernel solves the wrong problem.

That's not in any way what I suggested and is not the point to my patch.

The proposed patch does not suggest to ditch ndctl unit tests but to
*enable* also sefltests to make use of cxl_test using the selftests
framework, which is very different. It is not saying "abandon" ndctl
unit tests, but rather, "also enable linux kernel selftests for CXL with
cxl_test".

But more importantly, it looks for the value of proper kernel
integration and making use of kconfig for the actual dependencies
and requirements. This is of high value.

In addition to this, one possible area I see of value with this change is the
ability to also use the wrap feature later, even without cxl_test to enable
error injection. What would this look like? You simply replace one built in
routine as you do with another which has sprinkled in should_fail() calls,
which otherwise would be an eyesore upstream. This shold also then not
depend the rest of cxl_test stuff, but can make use of building
alternative wrap routines which could be replacement for upstream ones.

Another benefit of this strategy is you can also test cxl_test *without*
the need for for requiring modules, which some folks prefer for testing.
At LSFMM this came up for instance and one of the biggest grudges with
testing some frameworks was the dependency on modules.

So requiring modules is also limitting from test scrope perspective.

> So in terms of benefits of code being colocated, tests + libcxl + tools in the
> same repo is more impactful than tests + kernel in the same repo.

That usage does not negate any possible benefit of enabling selftests upstream
too.

> I know Jonathan has some latent ideas about building up a CXL regression
> suite on top of QEMU, but even there it's not clear that would benefit
> from being developed in linux.git given the tight coupling to QEMU.git.

Definitely not. Such tests should exist outside of the kernel tree.

Luis


Attachments:
(No filename) (2.81 kB)
(No filename) (7.16 kB)
PGP Key 0x002C0B2920E6ABEF510EFE4FE1CA73A3C9594E24.
Download all attachments

2022-12-20 00:52:16

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

Luis Chamberlain wrote:
> On Fri, Dec 16, 2022 at 08:55:19PM -0800, Dan Williams wrote:
> > In other words the suggestion that the current
> > organization ultimately leads to bit rot has not been substantiated in
> > practice.
>
> On top of this patch I just added a custom debug patch to my tree which
> enables CXL_BUS and CXL_TEST by default when this is currently allowed
> and it got quite a bit of kernel build warnings. Although some of these
> are specific to my change, some of them do not seem to be related to
> that and likely could benefit from fixing:
>
> https://gist.github.com/mcgrof/73dce72939590c6edc9413b0384ae4c2
>
> And so although you may not see some build warnings so far, it does not
> negate my suggestion that having cxl_test as a proper upstream driver strategy
> gets you more build testing / coverage.

If autobuild coverage of test components is the main concern then
cxl_test can copy what nfit_test is doing with CONFIG_NVDIMM_TEST_BUILD.
No need for disruptive redesign of how this facility is integrated.

> > The proposed direction to move tests out of the ndctl.git repo into the
> > kernel solves the wrong problem.
>
> That's not in any way what I suggested and is not the point to my patch.
>
> The proposed patch does not suggest to ditch ndctl unit tests but to
> *enable* also sefltests to make use of cxl_test using the selftests
> framework, which is very different. It is not saying "abandon" ndctl
> unit tests, but rather, "also enable linux kernel selftests for CXL with
> cxl_test".

I think centralizing test scripts is a virtue, and right now the
momentum is with those located ndctl.git. This is why I jumped to the
conclusion that the long term direction would be to pick one location
for maintainer regression tests.

> But more importantly, it looks for the value of proper kernel
> integration and making use of kconfig for the actual dependencies
> and requirements. This is of high value.
>
> In addition to this, one possible area I see of value with this change is the
> ability to also use the wrap feature later, even without cxl_test to enable
> error injection. What would this look like? You simply replace one built in
> routine as you do with another which has sprinkled in should_fail() calls,
> which otherwise would be an eyesore upstream. This shold also then not
> depend the rest of cxl_test stuff, but can make use of building
> alternative wrap routines which could be replacement for upstream ones.
>
> Another benefit of this strategy is you can also test cxl_test *without*
> the need for for requiring modules, which some folks prefer for testing.
> At LSFMM this came up for instance and one of the biggest grudges with
> testing some frameworks was the dependency on modules.

I do think this is the void that QEMU CXL testing would attempt to fill.

2022-12-20 03:11:10

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

On Mon, Dec 19, 2022 at 04:27:10PM -0800, Dan Williams wrote:
> Luis Chamberlain wrote:
> > On Fri, Dec 16, 2022 at 08:55:19PM -0800, Dan Williams wrote:
> > > In other words the suggestion that the current
> > > organization ultimately leads to bit rot has not been substantiated in
> > > practice.
> >
> > On top of this patch I just added a custom debug patch to my tree which
> > enables CXL_BUS and CXL_TEST by default when this is currently allowed
> > and it got quite a bit of kernel build warnings. Although some of these
> > are specific to my change, some of them do not seem to be related to
> > that and likely could benefit from fixing:
> >
> > https://gist.github.com/mcgrof/73dce72939590c6edc9413b0384ae4c2
> >
> > And so although you may not see some build warnings so far, it does not
> > negate my suggestion that having cxl_test as a proper upstream driver strategy
> > gets you more build testing / coverage.
>
> If autobuild coverage of test components is the main concern then
> cxl_test can copy what nfit_test is doing with CONFIG_NVDIMM_TEST_BUILD.
> No need for disruptive redesign of how this facility is integrated.

I've itemized a list of gains of having this properly integrated. What
gains are there of this being an external module other than a few folks
are used to it and it been done before for other subsystems?

> > > The proposed direction to move tests out of the ndctl.git repo into the
> > > kernel solves the wrong problem.
> >
> > That's not in any way what I suggested and is not the point to my patch.
> >
> > The proposed patch does not suggest to ditch ndctl unit tests but to
> > *enable* also sefltests to make use of cxl_test using the selftests
> > framework, which is very different. It is not saying "abandon" ndctl
> > unit tests, but rather, "also enable linux kernel selftests for CXL with
> > cxl_test".
>
> I think centralizing test scripts is a virtue, and right now the
> momentum is with those located ndctl.git. This is why I jumped to the
> conclusion that the long term direction would be to pick one location
> for maintainer regression tests.

That's fine for ndctl unit tests.

> > But more importantly, it looks for the value of proper kernel
> > integration and making use of kconfig for the actual dependencies
> > and requirements. This is of high value.
> >
> > In addition to this, one possible area I see of value with this change is the
> > ability to also use the wrap feature later, even without cxl_test to enable
> > error injection. What would this look like? You simply replace one built in
> > routine as you do with another which has sprinkled in should_fail() calls,
> > which otherwise would be an eyesore upstream. This shold also then not
> > depend the rest of cxl_test stuff, but can make use of building
> > alternative wrap routines which could be replacement for upstream ones.
> >
> > Another benefit of this strategy is you can also test cxl_test *without*
> > the need for for requiring modules, which some folks prefer for testing.
> > At LSFMM this came up for instance and one of the biggest grudges with
> > testing some frameworks was the dependency on modules.
>
> I do think this is the void that QEMU CXL testing would attempt to fill.

Not for error injection support. You need proper kernel support for that.

Luis

2022-12-20 17:12:24

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

On Mon, 19 Dec 2022 10:20:55 -0800
Luis Chamberlain <[email protected]> wrote:

> On Sun, Dec 18, 2022 at 04:08:24PM +0000, Jonathan Cameron wrote:
> > QEMU based CI should go two ways:
> > 1) QEMU CI would typically pin particular kernel version and verify that
> > QEMU changed don't break that. If we need new features for a new test,
> > we move that kernel version used. Existing tests should never break
> > against a fixed kernel version as that's a regression in QEMU (or
> > maybe a bug elsewhere) Ultimately we should have this running in the
> > normal QEMU gitlab CI.
>
> Sounds sensible.
>
> > 2) Kernel CI against QEMU would typically pin particular QEMU version
> > and check that kernel changes don't break. This will have rough edges
> > for a while yet as we are still adding mandatory features to the QEMU
> > emulation (e.g. events support). Again, as we add new features / tests
> > may need to move the QEMU version forwards to support them.
>
> Sure - but for this today other than ensuring a kernel does not crash upon
> bootup we also have cxl_test, but not much else.
>
> We'll want to exand a set of target tests on CXL enabled nodes, without
> cxl_test. Other than verifying the topology matches, we'll want to start
> mimicking actual use cases / performance stuff.

Definitely good to mimic usecases, but in using the QEMU emulation,
performance is probably not sensible as there are some horrible slow
paths in the emulation such as how we do interleaving. Could be
improved with some caching of look ups, but the result wouldn't look
much like real devices.

Doing it on a limited set of hardware is viable, but it'll be a while
before we have all the fun options available.

>
> > I don't think we much care about backwards compatibility so once we've
> > moved the pinned element forwards in the above, we won't care about the
> > old version.
>
> Making tests simply skip if the feature is not available doens't take
> much effort but forward thinking.
>
> > The aim here isn't really to ensure no regressions when
> > running on QEMU (though that CI is nice to have), but more that we have
> > no problems in kernel side of things.
>
> Sure.
>
> > This is a way off yet. Not seeing this as being part of linux.git.
> > The QEMU CI stuff will be in the qemu.git and Kernel CI stuff probably
> > sit out of tree - there shouldn't be a tight coupling beyond new tests
> > wanting to check available features etc. I might ask a friendly
> > CI project to add this to their normal runs.
>
> OK in case it helps, cxl-enabled qemu building bringup / ndctl building
> and install is all now automated and integratead as part of kdevops so
> patches welcomed to expand that coverage.

Excellent. I'll take a look at kdevops sometime next year.

>
> > I don't have strong feelings on cxl_test. Tend not to use it myself
> > and haven't yet contributed to it.
>
> Thanks, this is useful information.

Different approaches. I can appreciate cxl_tests usecase, but I've mostly been hacking
on the corners that are tricky to test with a mocking driver and
it's easy to hack stuff into the QEMU emulation (for me - now :)

Jonathan

>
> Luis

2022-12-20 20:04:42

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

Luis Chamberlain wrote:
> On Mon, Dec 19, 2022 at 04:27:10PM -0800, Dan Williams wrote:
> > Luis Chamberlain wrote:
> > > On Fri, Dec 16, 2022 at 08:55:19PM -0800, Dan Williams wrote:
> > > > In other words the suggestion that the current
> > > > organization ultimately leads to bit rot has not been substantiated in
> > > > practice.
> > >
> > > On top of this patch I just added a custom debug patch to my tree which
> > > enables CXL_BUS and CXL_TEST by default when this is currently allowed
> > > and it got quite a bit of kernel build warnings. Although some of these
> > > are specific to my change, some of them do not seem to be related to
> > > that and likely could benefit from fixing:
> > >
> > > https://gist.github.com/mcgrof/73dce72939590c6edc9413b0384ae4c2
> > >
> > > And so although you may not see some build warnings so far, it does not
> > > negate my suggestion that having cxl_test as a proper upstream driver strategy
> > > gets you more build testing / coverage.
> >
> > If autobuild coverage of test components is the main concern then
> > cxl_test can copy what nfit_test is doing with CONFIG_NVDIMM_TEST_BUILD.
> > No need for disruptive redesign of how this facility is integrated.
>
> I've itemized a list of gains of having this properly integrated. What
> gains are there of this being an external module other than a few folks
> are used to it and it been done before for other subsystems?

Your crash report is a prime example of why this needs to stay an
external module. Any redefinition of what a symbol does via --wrap= is a
fragile proposition. The fact that crash signatures with cxl_test loaded
have the external module taint flag set is a feature. The --wrap= option
has no business within the main tree because it violates the valid
assumptions of other cxl_test-innocent developers.

The benefit that resonated with me during this discussion was more
compile test coverage for cxl_test components. However, that is achieved
by tools/testing/cxl/ adopting the same compile coverage scheme that
tools/testing/nvdimm/ has with CONFIG_NVDIMM_TEST_BUILD.

2023-01-04 19:57:25

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFC] cxl_test: upgrade as a first class citizen selftests capable driver

On Tue, Dec 20, 2022 at 11:51:41AM -0800, Dan Williams wrote:
> Luis Chamberlain wrote:
> > On Mon, Dec 19, 2022 at 04:27:10PM -0800, Dan Williams wrote:
> > > Luis Chamberlain wrote:
> > > > On Fri, Dec 16, 2022 at 08:55:19PM -0800, Dan Williams wrote:
> > > > > In other words the suggestion that the current
> > > > > organization ultimately leads to bit rot has not been substantiated in
> > > > > practice.
> > > >
> > > > On top of this patch I just added a custom debug patch to my tree which
> > > > enables CXL_BUS and CXL_TEST by default when this is currently allowed
> > > > and it got quite a bit of kernel build warnings. Although some of these
> > > > are specific to my change, some of them do not seem to be related to
> > > > that and likely could benefit from fixing:
> > > >
> > > > https://gist.github.com/mcgrof/73dce72939590c6edc9413b0384ae4c2
> > > >
> > > > And so although you may not see some build warnings so far, it does not
> > > > negate my suggestion that having cxl_test as a proper upstream driver strategy
> > > > gets you more build testing / coverage.
> > >
> > > If autobuild coverage of test components is the main concern then
> > > cxl_test can copy what nfit_test is doing with CONFIG_NVDIMM_TEST_BUILD.
> > > No need for disruptive redesign of how this facility is integrated.
> >
> > I've itemized a list of gains of having this properly integrated. What
> > gains are there of this being an external module other than a few folks
> > are used to it and it been done before for other subsystems?
>
> Your crash report is a prime example of why this needs to stay an
> external module.

The crash *can* be avoided completely *iff* the semantics over the
requirements are expressed clearly through kconfig. My follow up patch
to the top level Makefile INSTALL_MOD_DIR to use "updates" instead of
"extra" essentially exposed anyone other than folks using a specific
version of RHEL or Fedora *easily* can end up crashing with cxl_test.
That's I think a far much worse predicament.

> Any redefinition of what a symbol does via --wrap= is a
> fragile proposition.

It is what cxl_test does though. Supporting it as a module Vs built-in
has no difference except as exposing semantics and requirements clearly.

> The fact that crash signatures with cxl_test loaded
> have the external module taint flag set is a feature.

I helped review the patch that added the taint flag for all testing
modules, that it does not mean we can't add it for built-in. This can
easily be done for instance with a kconfig symbol which pegs the taint
for any test module as built-in. In fact if we're not tainint built-in
test modules that change should happen anyway.

Having a test module be built-in or not shoud in no way shape or form
affect your testing. If the driver *happens* to rely on module load
and unload to any clean state machine -- that should be fixed given
the slew of bugs I have found other test modules which follow similar
logic.

> The --wrap= option
> has no business within the main tree

If --wrap is really unreliable, the unreliable aspects should be
documented, however it seems in this case it is just seems for cxl
its only used for testing. And whether or not you test with built-in
or modules should have no effect over --wrap.

> because it violates the valid
> assumptions of other cxl_test-innocent developers.

By having cxl_test be a proper upstream driver you define the
requirements clearly, and the kconfig symbols enabled are sufficient
to only let that module be built if folks are ready to shoot themseves
on the foot. Today the semantics are not clear, and in fact relies on
a old distro INSTALL_MOD_DIR assumption.

> The benefit that resonated with me during this discussion was more
> compile test coverage for cxl_test components.

Those benefits still stand.

> However, that is achieved
> by tools/testing/cxl/ adopting the same compile coverage scheme that
> tools/testing/nvdimm/ has with CONFIG_NVDIMM_TEST_BUILD.

That does not by any means mean that CONFIG_NVDIMM_TEST_BUILD can also
be converted to allow built-in. In fact I'd argue the same for that too.
The same benefits applies there too.

Luis