This series makes it more convenient to adjust KFENCE parameters, not
only during debugging but also in production. The memory and CPU
budgets available to KFENCE differ considerably between production and
development stages. To cover these different demands with a single
kernel release, KFENCE parameters need to be dynamically adjustable.
Signed-off-by: Peng Liu <[email protected]>
Peng Liu (3):
kfence: Add a module parameter to adjust kfence objects
kfence: Optimize branches prediction when sample interval is zero
kfence: Make test case compatible with run time set sample interval
Documentation/dev-tools/kfence.rst | 14 ++--
include/linux/kfence.h | 10 ++-
mm/kfence/core.c | 113 ++++++++++++++++++++++++-----
mm/kfence/kfence.h | 2 +-
mm/kfence/kfence_test.c | 10 +--
5 files changed, 116 insertions(+), 33 deletions(-)
--
2.18.0.huawei.25
KFENCE is designed to be enabled in production kernels, but it can
also be useful in some debugging situations. For machines with limited
memory and CPU resources, KASAN is hard to run. Fortunately, KFENCE
can be a suitable candidate. For KFENCE running on a single machine,
the probability of discovering existing bugs increases with the number
of KFENCE objects, but so does the memory cost. To balance
bug-detection probability against memory cost, the number of KFENCE
objects needs to be adjusted to the memory resources available to a
compiled kernel Image. Adding a module parameter to adjust the number
of KFENCE objects makes KFENCE usable on different machines with the
same kernel Image.
In short, the following reasons motivate us to add this parameter.
1) In some debugging situations, it makes KFENCE more flexible.
2) For production machines with different memory and CPU sizes, it
reduces the burden of maintaining multiple kernel Image versions.
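For example (the values below are only illustrative), a single kernel
Image could then be booted with:
  kfence.sample_interval=100 kfence.num_objects=1023
on a large machine, or with a smaller kfence.num_objects on a
memory-constrained machine, without recompiling.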
The main change is replacing the config CONFIG_KFENCE_NUM_OBJECTS with
the variable kfence_num_objects so that it can be configured
dynamically. To make this work, kfence_metadata and alloc_covered are
now allocated with memblock_alloc. Since
"cat /sys/kernel/debug/kfence/objects" reads kfence_metadata, the
debugfs initialization must check whether kfence_metadata was
allocated successfully.
This patch (of 3):
The most important motivation for this patch series is to make
KFENCE easy to use in production scenarios.
Signed-off-by: Peng Liu <[email protected]>
---
Documentation/dev-tools/kfence.rst | 14 ++--
include/linux/kfence.h | 3 +-
mm/kfence/core.c | 108 ++++++++++++++++++++++++-----
mm/kfence/kfence.h | 2 +-
mm/kfence/kfence_test.c | 2 +-
5 files changed, 103 insertions(+), 26 deletions(-)
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
index ac6b89d1a8c3..e079dfb81dfe 100644
--- a/Documentation/dev-tools/kfence.rst
+++ b/Documentation/dev-tools/kfence.rst
@@ -41,12 +41,14 @@ guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.
-The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
-further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
-255), the number of available guarded objects can be controlled. Each object
-requires 2 pages, one for the object itself and the other one used as a guard
-page; object pages are interleaved with guard pages, and every object page is
-therefore surrounded by two guard pages.
+The size of the KFENCE memory pool is also an important parameter; it
+can be set via the kernel boot parameter ``kfence.num_objects``, and if
+the pool is exhausted, no further KFENCE allocations occur. Each object
+requires 2 pages, one for the object itself and the other one used as a
+guard page; object pages are interleaved with guard pages, and every
+object page is therefore surrounded by two guard pages. The default
+number of objects is configurable via the Kconfig option
+``CONFIG_KFENCE_NUM_OBJECTS`` (default 255).
The total memory dedicated to the KFENCE memory pool can be computed as::
diff --git a/include/linux/kfence.h b/include/linux/kfence.h
index 4b5e3679a72c..aec4f6b247b5 100644
--- a/include/linux/kfence.h
+++ b/include/linux/kfence.h
@@ -17,12 +17,13 @@
#include <linux/atomic.h>
#include <linux/static_key.h>
+extern unsigned long kfence_num_objects;
/*
* We allocate an even number of pages, as it simplifies calculations to map
* address to metadata indices; effectively, the very first page serves as an
* extended guard page, but otherwise has no special purpose.
*/
-#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)
+#define KFENCE_POOL_SIZE ((kfence_num_objects + 1) * 2 * PAGE_SIZE)
extern char *__kfence_pool;
DECLARE_STATIC_KEY_FALSE(kfence_allocation_key);
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 5ad40e3add45..4655bcc0306e 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -34,6 +34,9 @@
#include "kfence.h"
+#define MIN_KFENCE_OBJECTS 1
+#define MAX_KFENCE_OBJECTS 65535
+
/* Disables KFENCE on the first warning assuming an irrecoverable error. */
#define KFENCE_WARN_ON(cond) \
({ \
@@ -93,12 +96,53 @@ module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644)
char *__kfence_pool __ro_after_init;
EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */
+/*
+ * The number of kfence objects affects performance and bug-detection
+ * accuracy. The initial value of this global parameter is determined
+ * at compile time by CONFIG_KFENCE_NUM_OBJECTS.
+ */
+unsigned long kfence_num_objects = CONFIG_KFENCE_NUM_OBJECTS;
+EXPORT_SYMBOL(kfence_num_objects); /* Export for test modules. */
+
+static int param_set_num_objects(const char *val, const struct kernel_param *kp)
+{
+ unsigned long num;
+
+ if (system_state != SYSTEM_BOOTING)
+ return -EINVAL; /* Cannot adjust KFENCE objects number on-the-fly. */
+
+ if (kstrtoul(val, 0, &num) < 0)
+ return -EINVAL;
+
+ if (num < MIN_KFENCE_OBJECTS || num > MAX_KFENCE_OBJECTS) {
+ pr_warn("kfence_num_objects = %lu is not in valid range [%d,%d]\n",
+ num, MIN_KFENCE_OBJECTS, MAX_KFENCE_OBJECTS);
+ return -EINVAL;
+ }
+
+ *((unsigned long *)kp->arg) = num;
+ return 0;
+}
+
+static int param_get_num_objects(char *buffer, const struct kernel_param *kp)
+{
+ if (!READ_ONCE(kfence_enabled))
+ return sprintf(buffer, "0\n");
+
+ return param_get_ulong(buffer, kp);
+}
+
+static const struct kernel_param_ops num_objects_param_ops = {
+ .set = param_set_num_objects,
+ .get = param_get_num_objects,
+};
+module_param_cb(num_objects, &num_objects_param_ops, &kfence_num_objects, 0600);
+
/*
* Per-object metadata, with one-to-one mapping of object metadata to
* backing pages (in __kfence_pool).
*/
-static_assert(CONFIG_KFENCE_NUM_OBJECTS > 0);
-struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];
+struct kfence_metadata *kfence_metadata;
/* Freelist with available objects. */
static struct list_head kfence_freelist = LIST_HEAD_INIT(kfence_freelist);
@@ -124,11 +168,11 @@ atomic_t kfence_allocation_gate = ATOMIC_INIT(1);
* P(alloc_traces) = (1 - e^(-HNUM * (alloc_traces / SIZE)) ^ HNUM
*/
#define ALLOC_COVERED_HNUM 2
-#define ALLOC_COVERED_ORDER (const_ilog2(CONFIG_KFENCE_NUM_OBJECTS) + 2)
+#define ALLOC_COVERED_ORDER (const_ilog2(kfence_num_objects) + 2)
#define ALLOC_COVERED_SIZE (1 << ALLOC_COVERED_ORDER)
#define ALLOC_COVERED_HNEXT(h) hash_32(h, ALLOC_COVERED_ORDER)
#define ALLOC_COVERED_MASK (ALLOC_COVERED_SIZE - 1)
-static atomic_t alloc_covered[ALLOC_COVERED_SIZE];
+static atomic_t *alloc_covered;
/* Stack depth used to determine uniqueness of an allocation. */
#define UNIQUE_ALLOC_STACK_DEPTH ((size_t)8)
@@ -168,7 +212,7 @@ static_assert(ARRAY_SIZE(counter_names) == KFENCE_COUNTER_COUNT);
static inline bool should_skip_covered(void)
{
- unsigned long thresh = (CONFIG_KFENCE_NUM_OBJECTS * kfence_skip_covered_thresh) / 100;
+ unsigned long thresh = (kfence_num_objects * kfence_skip_covered_thresh) / 100;
return atomic_long_read(&counters[KFENCE_COUNTER_ALLOCATED]) > thresh;
}
@@ -236,7 +280,7 @@ static inline struct kfence_metadata *addr_to_metadata(unsigned long addr)
* error.
*/
index = (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1;
- if (index < 0 || index >= CONFIG_KFENCE_NUM_OBJECTS)
+ if (index < 0 || index >= kfence_num_objects)
return NULL;
return &kfence_metadata[index];
@@ -251,7 +295,7 @@ static inline unsigned long metadata_to_pageaddr(const struct kfence_metadata *m
/* Only call with a pointer into kfence_metadata. */
if (KFENCE_WARN_ON(meta < kfence_metadata ||
- meta >= kfence_metadata + CONFIG_KFENCE_NUM_OBJECTS))
+ meta >= kfence_metadata + kfence_num_objects))
return 0;
/*
@@ -577,7 +621,7 @@ static bool __init kfence_init_pool(void)
addr += PAGE_SIZE;
}
- for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
+ for (i = 0; i < kfence_num_objects; i++) {
struct kfence_metadata *meta = &kfence_metadata[i];
/* Initialize metadata. */
@@ -638,7 +682,7 @@ DEFINE_SHOW_ATTRIBUTE(stats);
*/
static void *start_object(struct seq_file *seq, loff_t *pos)
{
- if (*pos < CONFIG_KFENCE_NUM_OBJECTS)
+ if (*pos < kfence_num_objects)
return (void *)((long)*pos + 1);
return NULL;
}
@@ -650,7 +694,7 @@ static void stop_object(struct seq_file *seq, void *v)
static void *next_object(struct seq_file *seq, void *v, loff_t *pos)
{
++*pos;
- if (*pos < CONFIG_KFENCE_NUM_OBJECTS)
+ if (*pos < kfence_num_objects)
return (void *)((long)*pos + 1);
return NULL;
}
@@ -692,7 +736,10 @@ static int __init kfence_debugfs_init(void)
struct dentry *kfence_dir = debugfs_create_dir("kfence", NULL);
debugfs_create_file("stats", 0444, kfence_dir, NULL, &stats_fops);
- debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops);
+
+ /* Variable kfence_metadata may fail to allocate. */
+ if (kfence_metadata)
+ debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops);
return 0;
}
@@ -756,13 +803,40 @@ static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate);
void __init kfence_alloc_pool(void)
{
+ phys_addr_t metadata_size;
+ phys_addr_t covered_size;
+
if (!kfence_sample_interval)
return;
- __kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
+ metadata_size = sizeof(struct kfence_metadata) * kfence_num_objects;
+ kfence_metadata = memblock_alloc(metadata_size, PAGE_SIZE);
+ if (!kfence_metadata) {
+ pr_err("failed to allocate metadata\n");
+ return;
+ }
- if (!__kfence_pool)
+ covered_size = sizeof(atomic_t) * kfence_num_objects;
+ alloc_covered = memblock_alloc(covered_size, PAGE_SIZE);
+ if (!alloc_covered) {
+ pr_err("failed to allocate covered\n");
+ goto fail_covered;
+ }
+
+ __kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
+ if (!__kfence_pool) {
pr_err("failed to allocate pool\n");
+ goto fail_pool;
+ }
+
+ return;
+
+fail_pool:
+ memblock_free(alloc_covered, covered_size);
+ alloc_covered = NULL;
+fail_covered:
+ memblock_free(kfence_metadata, metadata_size);
+ kfence_metadata = NULL;
}
void __init kfence_init(void)
@@ -781,8 +855,8 @@ void __init kfence_init(void)
static_branch_enable(&kfence_allocation_key);
WRITE_ONCE(kfence_enabled, true);
queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
- pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
- CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool,
+ pr_info("initialized - using %lu bytes for %lu objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
+ kfence_num_objects, (void *)__kfence_pool,
(void *)(__kfence_pool + KFENCE_POOL_SIZE));
}
@@ -792,7 +866,7 @@ void kfence_shutdown_cache(struct kmem_cache *s)
struct kfence_metadata *meta;
int i;
- for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
+ for (i = 0; i < kfence_num_objects; i++) {
bool in_use;
meta = &kfence_metadata[i];
@@ -831,7 +905,7 @@ void kfence_shutdown_cache(struct kmem_cache *s)
}
}
- for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
+ for (i = 0; i < kfence_num_objects; i++) {
meta = &kfence_metadata[i];
/* See above. */
diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h
index 2a2d5de9d379..c6d37fad977a 100644
--- a/mm/kfence/kfence.h
+++ b/mm/kfence/kfence.h
@@ -91,7 +91,7 @@ struct kfence_metadata {
u32 alloc_stack_hash;
};
-extern struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];
+extern struct kfence_metadata *kfence_metadata;
/* KFENCE error types for report generation. */
enum kfence_error_type {
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index a22b1af85577..084e3a55aebb 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -623,7 +623,7 @@ static void test_gfpzero(struct kunit *test)
break;
test_free(buf2);
- if (i == CONFIG_KFENCE_NUM_OBJECTS) {
+ if (i == kfence_num_objects) {
kunit_warn(test, "giving up ... cannot get same object back\n");
return;
}
--
2.18.0.huawei.25
To release a single kernel with KFENCE, it is convenient to compile it
with CONFIG_KFENCE_SAMPLE_INTERVAL = 0. Production systems that do not
want KFENCE then behave just like the original kernel without KFENCE,
while KFENCE users can enable it by setting the kernel boot parameter
kfence.sample_interval. Hence, defaulting the KFENCE sample interval
to zero is convenient.
KFENCE already supports adjusting the sample interval via the kernel
boot parameter. However, branch prediction in kfence_alloc is poor
when CONFIG_KFENCE_SAMPLE_INTERVAL = 0 but the boot parameter
kfence.sample_interval != 0, because kfence_alloc is then compiled
with the expectation that it returns NULL. To improve branch
prediction in this situation, check kfence_enabled first.
Signed-off-by: Peng Liu <[email protected]>
---
include/linux/kfence.h | 5 ++++-
mm/kfence/core.c | 2 +-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/kfence.h b/include/linux/kfence.h
index aec4f6b247b5..bf91b76b87ee 100644
--- a/include/linux/kfence.h
+++ b/include/linux/kfence.h
@@ -17,6 +17,7 @@
#include <linux/atomic.h>
#include <linux/static_key.h>
+extern bool kfence_enabled;
extern unsigned long kfence_num_objects;
/*
* We allocate an even number of pages, as it simplifies calculations to map
@@ -115,7 +116,9 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags);
*/
static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
{
-#if defined(CONFIG_KFENCE_STATIC_KEYS) || CONFIG_KFENCE_SAMPLE_INTERVAL == 0
+ if (!kfence_enabled)
+ return NULL;
+#if defined(CONFIG_KFENCE_STATIC_KEYS)
if (!static_branch_unlikely(&kfence_allocation_key))
return NULL;
#else
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 4655bcc0306e..2301923182b8 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -48,7 +48,7 @@
/* === Data ================================================================= */
-static bool kfence_enabled __read_mostly;
+bool kfence_enabled __read_mostly;
static unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL;
--
2.18.0.huawei.25
The parameter kfence_sample_interval can be set via a boot parameter
or later from the shell. However, the KFENCE test only uses the
compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, so the test may not run as
the user intends. Make the KFENCE test compatible with a sample
interval set at run time.
Signed-off-by: Peng Liu <[email protected]>
---
include/linux/kfence.h | 2 ++
mm/kfence/core.c | 3 ++-
mm/kfence/kfence_test.c | 8 ++++----
3 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/linux/kfence.h b/include/linux/kfence.h
index bf91b76b87ee..0fc913a7f017 100644
--- a/include/linux/kfence.h
+++ b/include/linux/kfence.h
@@ -19,6 +19,8 @@
extern bool kfence_enabled;
extern unsigned long kfence_num_objects;
+extern unsigned long kfence_sample_interval;
+
/*
* We allocate an even number of pages, as it simplifies calculations to map
* address to metadata indices; effectively, the very first page serves as an
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 2301923182b8..e2fcae34cc84 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -50,7 +50,8 @@
bool kfence_enabled __read_mostly;
-static unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL;
+unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL;
+EXPORT_SYMBOL(kfence_sample_interval); /* Export for test modules. */
#ifdef MODULE_PARAM_PREFIX
#undef MODULE_PARAM_PREFIX
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index 084e3a55aebb..97ff3a133f11 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -268,13 +268,13 @@ static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocat
* 100x the sample interval should be more than enough to ensure we get
* a KFENCE allocation eventually.
*/
- timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL);
+ timeout = jiffies + msecs_to_jiffies(100 * kfence_sample_interval);
/*
* Especially for non-preemption kernels, ensure the allocation-gate
* timer can catch up: after @resched_after, every failed allocation
* attempt yields, to ensure the allocation-gate timer is scheduled.
*/
- resched_after = jiffies + msecs_to_jiffies(CONFIG_KFENCE_SAMPLE_INTERVAL);
+ resched_after = jiffies + msecs_to_jiffies(kfence_sample_interval);
do {
if (test_cache)
alloc = kmem_cache_alloc(test_cache, gfp);
@@ -608,7 +608,7 @@ static void test_gfpzero(struct kunit *test)
int i;
/* Skip if we think it'd take too long. */
- KFENCE_TEST_REQUIRES(test, CONFIG_KFENCE_SAMPLE_INTERVAL <= 100);
+ KFENCE_TEST_REQUIRES(test, kfence_sample_interval <= 100);
setup_test_cache(test, size, 0, NULL);
buf1 = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
@@ -739,7 +739,7 @@ static void test_memcache_alloc_bulk(struct kunit *test)
* 100x the sample interval should be more than enough to ensure we get
* a KFENCE allocation eventually.
*/
- timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL);
+ timeout = jiffies + msecs_to_jiffies(100 * kfence_sample_interval);
do {
void *objects[100];
int i, num = kmem_cache_alloc_bulk(test_cache, GFP_ATOMIC, ARRAY_SIZE(objects),
--
2.18.0.huawei.25
On Mon, Jan 24, 2022 at 02:52AM +0000, Peng Liu wrote:
> KFENCE is designed to be enabled in production kernels, but it can
> also be useful in some debugging situations. For machines with limited
> memory and CPU resources, KASAN is hard to run.
If these are arm64 based machines, see if CONFIG_KASAN_SW_TAGS works for
you. In future, we believe that CONFIG_KASAN_HW_TAGS will be suitable
for a variety of scenarios, including debugging scenarios of resource
constrained environments.
> Fortunately, KFENCE can be a suitable candidate. For KFENCE running
> on a single machine, the probability of discovering existing bugs
> increases with the number of KFENCE objects, but so does the memory
> cost. To balance bug-detection probability against memory cost, the
> number of KFENCE objects needs to be adjusted to the memory resources
> available to a compiled kernel Image. Adding a module parameter to
> adjust the number of KFENCE objects makes KFENCE usable on different
> machines with the same kernel Image.
>
> In short, the following reasons motivate us to add this parameter.
> 1) In some debugging situations, it makes KFENCE more flexible.
> 2) For production machines with different memory and CPU sizes, it
> reduces the burden of maintaining multiple kernel Image versions.
[...]
> This patch (of 3):
[ Note for future: No need to add "This patch (of X)" usually -- this is
added by maintainers if deemed appropriate, and usually includes the
cover letter. ]
> The most important motivation for this patch series is to make
> KFENCE easy to use in production scenarios.
>
> Signed-off-by: Peng Liu <[email protected]>
> ---
> Documentation/dev-tools/kfence.rst | 14 ++--
> include/linux/kfence.h | 3 +-
> mm/kfence/core.c | 108 ++++++++++++++++++++++++-----
> mm/kfence/kfence.h | 2 +-
> mm/kfence/kfence_test.c | 2 +-
> 5 files changed, 103 insertions(+), 26 deletions(-)
[...]
> diff --git a/include/linux/kfence.h b/include/linux/kfence.h
> index 4b5e3679a72c..aec4f6b247b5 100644
> --- a/include/linux/kfence.h
> +++ b/include/linux/kfence.h
> @@ -17,12 +17,13 @@
> #include <linux/atomic.h>
> #include <linux/static_key.h>
>
> +extern unsigned long kfence_num_objects;
> /*
> * We allocate an even number of pages, as it simplifies calculations to map
> * address to metadata indices; effectively, the very first page serves as an
> * extended guard page, but otherwise has no special purpose.
> */
> -#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)
> +#define KFENCE_POOL_SIZE ((kfence_num_objects + 1) * 2 * PAGE_SIZE)
> extern char *__kfence_pool;
I appreciate the effort, but you could have gotten a quicker answer if
you had first sent us an email to ask why adjustable number of objects
hasn't been done before. Because if it was trivial, we would have
already done it.
What you've done is turned KFENCE_POOL_SIZE into a function instead of a
constant (it still being ALL_CAPS is now also misleading).
This is important here:
/**
* is_kfence_address() - check if an address belongs to KFENCE pool
* @addr: address to check
*
* Return: true or false depending on whether the address is within the KFENCE
* object range.
*
* KFENCE objects live in a separate page range and are not to be intermixed
* with regular heap objects (e.g. KFENCE objects must never be added to the
* allocator freelists). Failing to do so may and will result in heap
* corruptions, therefore is_kfence_address() must be used to check whether
* an object requires specific handling.
*
* Note: This function may be used in fast-paths, and is performance critical.
* Future changes should take this into account; for instance, we want to avoid
* introducing another load and therefore need to keep KFENCE_POOL_SIZE a
* constant (until immediate patching support is added to the kernel).
*/
static __always_inline bool is_kfence_address(const void *addr)
{
/*
* The __kfence_pool != NULL check is required to deal with the case
* where __kfence_pool == NULL && addr < KFENCE_POOL_SIZE. Keep it in
* the slow-path after the range-check!
*/
return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && __kfence_pool);
}
Unfortunately I think you missed the "Note".
Which means that ultimately your patch adds another LOAD to the fast
path, which is not an acceptable trade-off.
This would mean your change would require benchmarking, but it'd also
mean we and everyone else would have to re-benchmark _all_ systems where
we've deployed KFENCE.
I think the only reasonable way forward is if you add immediate patching
support to the kernel as the "Note" suggests.
In the meantime, while not a single kernel image, we've found that
debug scenarios usually are best served with a custom debug kernel, as
there are other debug features that are only Kconfig-configurable. Thus,
having a special debug kernel that just configures KFENCE differently
shouldn't be an issue in the majority of cases.
Should this answer not be satisfying for you, the recently added feature
skipping already covered allocations (configurable via
kfence.skip_covered_thresh) alleviates some of the issue of a smaller
pool with a very low sample interval (viz. high sample rate).
The main thing to watch out for is KFENCE's actual sample rate vs
intended sample rate (per kfence.sample_interval). If you monitor
/sys/kernel/debug/kfence/stats, you can compute the actual sample rate.
If the actual sample rate becomes significantly lower than the intended
rate, only then does it make sense to increase the pool size. My
suggestion for you is therefore to run some experiments, while adjusting
kfence.sample_interval and kfence.skip_covered_thresh until you reach a
sample rate that is close to intended.
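For example, something along these lines can estimate the actual rate
(a rough, untested userspace sketch; the "total allocations" counter
name parsed from the stats file is an assumption and may need
adjusting for your kernel):
#include <stdio.h>
#include <unistd.h>
/* Read one counter from /sys/kernel/debug/kfence/stats. */
static long read_total_allocations(void)
{
	FILE *f = fopen("/sys/kernel/debug/kfence/stats", "r");
	char line[256];
	long val = -1;
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, " total allocations: %ld", &val) == 1)
			break;
	}
	fclose(f);
	return val;
}
int main(void)
{
	const int secs = 60;
	long before, after;
	before = read_total_allocations();
	sleep(secs);
	after = read_total_allocations();
	if (before < 0 || after < 0) {
		fprintf(stderr, "could not read kfence stats\n");
		return 1;
	}
	/* Compare with the intended ~1000 / kfence.sample_interval per second. */
	printf("actual sample rate: %.2f allocations/sec\n",
	       (double)(after - before) / secs);
	return 0;
}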
Thanks,
-- Marco
On Mon, 24 Jan 2022 at 03:37, Peng Liu <[email protected]> wrote:
>
> To release a single kernel with KFENCE, it is convenient to compile
> it with CONFIG_KFENCE_SAMPLE_INTERVAL = 0. Production systems that do
> not want KFENCE then behave just like the original kernel without
> KFENCE, while KFENCE users can enable it by setting the kernel boot
> parameter kfence.sample_interval. Hence, defaulting the KFENCE sample
> interval to zero is convenient.
>
> KFENCE already supports adjusting the sample interval via the kernel
> boot parameter. However, branch prediction in kfence_alloc is poor
> when CONFIG_KFENCE_SAMPLE_INTERVAL = 0 but the boot parameter
> kfence.sample_interval != 0, because kfence_alloc is then compiled
> with the expectation that it returns NULL. To improve branch
> prediction in this situation, check kfence_enabled first.
This patch doesn't make any sense. You're adding an unconditional LOAD
to the fast path.
And the choice of static_branch_unlikely() if
CONFIG_KFENCE_SAMPLE_INTERVAL == 0 is very much deliberate, as it
generates code that is preferable in the common case (KFENCE is
disabled).
Please see include/linux/jump_label.h:430. But even then, CPUs are
very good at dealing with unconditional branches, so the difference
really is a wash.
But that new LOAD is not acceptable.
Sorry, but Nack.
> Signed-off-by: Peng Liu <[email protected]>
> ---
> include/linux/kfence.h | 5 ++++-
> mm/kfence/core.c | 2 +-
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/kfence.h b/include/linux/kfence.h
> index aec4f6b247b5..bf91b76b87ee 100644
> --- a/include/linux/kfence.h
> +++ b/include/linux/kfence.h
> @@ -17,6 +17,7 @@
> #include <linux/atomic.h>
> #include <linux/static_key.h>
>
> +extern bool kfence_enabled;
> extern unsigned long kfence_num_objects;
> /*
> * We allocate an even number of pages, as it simplifies calculations to map
> @@ -115,7 +116,9 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags);
> */
> static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
> {
> -#if defined(CONFIG_KFENCE_STATIC_KEYS) || CONFIG_KFENCE_SAMPLE_INTERVAL == 0
> + if (!kfence_enabled)
> + return NULL;
> +#if defined(CONFIG_KFENCE_STATIC_KEYS)
> if (!static_branch_unlikely(&kfence_allocation_key))
> return NULL;
> #else
> diff --git a/mm/kfence/core.c b/mm/kfence/core.c
> index 4655bcc0306e..2301923182b8 100644
> --- a/mm/kfence/core.c
> +++ b/mm/kfence/core.c
> @@ -48,7 +48,7 @@
>
> /* === Data ================================================================= */
>
> -static bool kfence_enabled __read_mostly;
> +bool kfence_enabled __read_mostly;
>
> static unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL;
>
> --
> 2.18.0.huawei.25
>
On Mon, 24 Jan 2022 at 03:37, 'Peng Liu' via kasan-dev
<[email protected]> wrote:
>
> The parameter kfence_sample_interval can be set via a boot parameter
> or later from the shell. However, the KFENCE test only uses the
> compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, so the test may not run
> as the user intends. Make the KFENCE test compatible with a sample
> interval set at run time.
>
> Signed-off-by: Peng Liu <[email protected]>
> ---
> include/linux/kfence.h | 2 ++
> mm/kfence/core.c | 3 ++-
> mm/kfence/kfence_test.c | 8 ++++----
> 3 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/kfence.h b/include/linux/kfence.h
> index bf91b76b87ee..0fc913a7f017 100644
> --- a/include/linux/kfence.h
> +++ b/include/linux/kfence.h
> @@ -19,6 +19,8 @@
>
> extern bool kfence_enabled;
> extern unsigned long kfence_num_objects;
> +extern unsigned long kfence_sample_interval;
> +
> /*
> * We allocate an even number of pages, as it simplifies calculations to map
> * address to metadata indices; effectively, the very first page serves as an
> diff --git a/mm/kfence/core.c b/mm/kfence/core.c
> index 2301923182b8..e2fcae34cc84 100644
> --- a/mm/kfence/core.c
> +++ b/mm/kfence/core.c
> @@ -50,7 +50,8 @@
>
> bool kfence_enabled __read_mostly;
>
> -static unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL;
> +unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL;
> +EXPORT_SYMBOL(kfence_sample_interval); /* Export for test modules. */
While it would make some situations more convenient, I've wanted to
avoid exporting a new symbol just for the test. And in most cases it
only makes sense to run the test on a custom debug kernel.
Why do you need this?
Should you really need this, I suggest at least using
EXPORT_SYMBOL_GPL. Should you want it, you can resend this patch
standalone detached from the rest.
Thanks,
-- Marco
On Mon, 24 Jan 2022 at 12:24, liupeng (DM) <[email protected]> wrote:
>
>
> On 2022/1/24 16:19, Marco Elver wrote:
>
> On Mon, Jan 24, 2022 at 02:52AM +0000, Peng Liu wrote:
>
> KFENCE is designed to be enabled in production kernels, but it can
> also be useful in some debugging situations. For machines with limited
> memory and CPU resources, KASAN is hard to run.
>
> If these are arm64 based machines, see if CONFIG_KASAN_SW_TAGS works for
> you. In future, we believe that CONFIG_KASAN_HW_TAGS will be suitable
> for a variety of scenarios, including debugging scenarios of resource
> constrained environments.
>
> Thank you for your good suggestion, we will try it.
>
> Fortunately, KFENCE can be a suitable candidate. For KFENCE running
> on a single machine, the probability of discovering existing bugs
> increases with the number of KFENCE objects, but so does the memory
> cost. To balance bug-detection probability against memory cost, the
> number of KFENCE objects needs to be adjusted to the memory resources
> available to a compiled kernel Image. Adding a module parameter to
> adjust the number of KFENCE objects makes KFENCE usable on different
> machines with the same kernel Image.
>
> In short, the following reasons motivate us to add this parameter.
> 1) In some debugging situations, it makes KFENCE more flexible.
> 2) For production machines with different memory and CPU sizes, it
> reduces the burden of maintaining multiple kernel Image versions.
>
> [...]
>
> This patch (of 3):
>
> [ Note for future: No need to add "This patch (of X)" usually -- this is
> added by maintainers if deemed appropriate, and usually includes the
> cover letter. ]
>
> The most important motivation for this patch series is to make
> KFENCE easy to use in production scenarios.
>
> Signed-off-by: Peng Liu <[email protected]>
> ---
> Documentation/dev-tools/kfence.rst | 14 ++--
> include/linux/kfence.h | 3 +-
> mm/kfence/core.c | 108 ++++++++++++++++++++++++-----
> mm/kfence/kfence.h | 2 +-
> mm/kfence/kfence_test.c | 2 +-
> 5 files changed, 103 insertions(+), 26 deletions(-)
>
> [...]
>
> diff --git a/include/linux/kfence.h b/include/linux/kfence.h
> index 4b5e3679a72c..aec4f6b247b5 100644
> --- a/include/linux/kfence.h
> +++ b/include/linux/kfence.h
> @@ -17,12 +17,13 @@
> #include <linux/atomic.h>
> #include <linux/static_key.h>
>
> +extern unsigned long kfence_num_objects;
> /*
> * We allocate an even number of pages, as it simplifies calculations to map
> * address to metadata indices; effectively, the very first page serves as an
> * extended guard page, but otherwise has no special purpose.
> */
> -#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)
> +#define KFENCE_POOL_SIZE ((kfence_num_objects + 1) * 2 * PAGE_SIZE)
> extern char *__kfence_pool;
>
> I appreciate the effort, but you could have gotten a quicker answer if
> you had first sent us an email to ask why adjustable number of objects
> hasn't been done before. Because if it was trivial, we would have
> already done it.
>
> What you've done is turned KFENCE_POOL_SIZE into a function instead of a
> constant (it still being ALL_CAPS is now also misleading).
>
> This is important here:
>
> /**
> * is_kfence_address() - check if an address belongs to KFENCE pool
> * @addr: address to check
> *
> * Return: true or false depending on whether the address is within the KFENCE
> * object range.
> *
> * KFENCE objects live in a separate page range and are not to be intermixed
> * with regular heap objects (e.g. KFENCE objects must never be added to the
> * allocator freelists). Failing to do so may and will result in heap
> * corruptions, therefore is_kfence_address() must be used to check whether
> * an object requires specific handling.
> *
> * Note: This function may be used in fast-paths, and is performance critical.
> * Future changes should take this into account; for instance, we want to avoid
> * introducing another load and therefore need to keep KFENCE_POOL_SIZE a
> * constant (until immediate patching support is added to the kernel).
> */
> static __always_inline bool is_kfence_address(const void *addr)
> {
> /*
> * The __kfence_pool != NULL check is required to deal with the case
> * where __kfence_pool == NULL && addr < KFENCE_POOL_SIZE. Keep it in
> * the slow-path after the range-check!
> */
> return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && __kfence_pool);
> }
>
> Unfortunately I think you missed the "Note".
>
> Which means that ultimately your patch adds another LOAD to the fast
> path, which is not an acceptable trade-off.
>
> This would mean your change would require benchmarking, but it'd also
> mean we and everyone else would have to re-benchmark _all_ systems where
> we've deployed KFENCE.
>
> I think the only reasonable way forward is if you add immediate patching
> support to the kernel as the "Note" suggests.
>
> Could you give us more details about "immediate patching"?
Another option may be as follows:
Have a config for _max_ pool size. Always reserve max amount of
virtual address space, and do the range check for the max amount. But
actually allocate pages potentially for a smaller number of objects
(configured with a runtime parameter).
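A minimal sketch of that idea (CONFIG_KFENCE_MAX_NUM_OBJECTS below is
hypothetical, not something from the posted patches):
/* Reserve address space for the compile-time maximum... */
#define KFENCE_MAX_POOL_SIZE \
	((CONFIG_KFENCE_MAX_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)
/* ...while the boot parameter only decides how much of it is backed. */
static unsigned long kfence_num_objects = CONFIG_KFENCE_NUM_OBJECTS;
static __always_inline bool is_kfence_address(const void *addr)
{
	/*
	 * The range check still compares against a compile-time constant,
	 * so no extra load is added to the fast path; only the first
	 * (kfence_num_objects + 1) * 2 pages of the range would actually
	 * be populated with memory.
	 */
	return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_MAX_POOL_SIZE && __kfence_pool);
}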
> In the meantime, while not a single kernel image, we've found that
> debug scenarios usually are best served with a custom debug kernel, as
> there are other debug features that are only Kconfig-configurable. Thus,
> having a special debug kernel that just configures KFENCE differently
> shouldn't be an issue in the majority of cases.
>
> Should this answer not be satisfying for you, the recently added feature
> skipping already covered allocations (configurable via
> kfence.skip_covered_thresh) alleviates some of the issue of a smaller
> pool with a very low sample interval (viz. high sample rate).
>
> The main thing to watch out for is KFENCE's actual sample rate vs
> intended sample rate (per kfence.sample_interval). If you monitor
> /sys/kernel/debug/kfence/stats, you can compute the actual sample rate.
> If the actual sample rate becomes significantly lower than the intended
> rate, only then does it make sense to increase the pool size. My
> suggestion for you is therefore to run some experiments, while adjusting
> kfence.sample_interval and kfence.skip_covered_thresh until you reach a
> sample rate that is close to intended.
>
> Thanks,
> -- Marco
> .
>
> Thank you for your patient suggestions; they are really helpful and inspiring.
> We have integrated your latest work "skipping already covered allocations",
> and will do more experiments with KFENCE. Finally, we really hope you can
> give us a more detailed introduction to "immediate patching".
>
> Thanks,
> -- Peng Liu
> .
[ FYI, your reply was not plain text, so LKML may have rejected it. I
advise that you switch your email client for LKML emails to plain
text. ]
On Mon, 24 Jan 2022 at 12:24, liupeng (DM) <[email protected]> wrote:
[...]
> > I think the only reasonable way forward is if you add immediate patching
> > support to the kernel as the "Note" suggests.
>
> Could you give us more details about "immediate patching"?
[...]
> Thank you for your patient suggestions; they are really helpful and inspiring.
> We have integrated your latest work "skipping already covered allocations",
> and will do more experiments with KFENCE. Finally, we really hope you can
> give us a more detailed introduction to "immediate patching".
"Immediate patching" would, similar to "static branches" or
"alternatives" be based on code hot patching.
https://www.kernel.org/doc/html/latest/staging/static-keys.html
"Patching immediates" would essentially patch the immediate operands
of certain (limited) instructions. I think designing this properly to
work across various architectures (like static_keys/jump_label) is
very complex. So it may not be a viable near-term option.
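For context, a static key looks roughly like this (do_sampled_work()
is just a placeholder); immediate patching would apply the same
hot-patching idea to constant operands instead of branch directions:
#include <linux/static_key.h>
static DEFINE_STATIC_KEY_FALSE(my_feature_key);
static void do_sampled_work(void) { }
static void my_fast_path(void)
{
	/*
	 * static_branch_unlikely() compiles to a patchable jump/nop, so
	 * the disabled case costs no memory load and no data-dependent
	 * conditional branch.
	 */
	if (static_branch_unlikely(&my_feature_key))
		do_sampled_work();
}
static void my_feature_enable(void)
{
	/* Flipping the key hot-patches every branch site at runtime. */
	static_branch_enable(&my_feature_key);
}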
What Dmitry suggests using a constant virtual address carveout is more
realistic. But this means having to discuss with arch maintainers
which virtual address ranges can be reserved. The nice thing about
just relying on memblock and nothing else is that it is very portable
and simple. You can have a look at how KASAN deals with organizing its
shadow memory if you are interested.
Thanks,
-- Marco
On Mon, 24 Jan 2022 at 12:45, Marco Elver <[email protected]> wrote:
>
> [ FYI, your reply was not plain text, so LKML may have rejected it. I
> advise that you switch your email client for LKML emails to plain
> text. ]
>
> On Mon, 24 Jan 2022 at 12:24, liupeng (DM) <[email protected]> wrote:
> [...]
> > > I think the only reasonable way forward is if you add immediate patching
> > > support to the kernel as the "Note" suggests.
> >
> > Could you give us more details about "immediate patching"?
> [...]
> > Thank you for your patient suggestions; they are really helpful and inspiring.
> > We have integrated your latest work "skipping already covered allocations",
> > and will do more experiments with KFENCE. Finally, we really hope you can
> > give us a more detailed introduction to "immediate patching".
>
> "Immediate patching" would, similar to "static branches" or
> "alternatives" be based on code hot patching.
>
> https://www.kernel.org/doc/html/latest/staging/static-keys.html
>
> "Patching immediates" would essentially patch the immediate operands
> of certain (limited) instructions. I think designing this properly to
> work across various architectures (like static_keys/jump_label) is
> very complex. So it may not be a viable near-term option.
>
> What Dmitry suggests using a constant virtual address carveout is more
> realistic. But this means having to discuss with arch maintainers
> which virtual address ranges can be reserved. The nice thing about
> just relying on memblock and nothing else is that it is very portable
> and simple. You can have a look at how KASAN deals with organizing its
> shadow memory if you are interested.
Hmm, there may be more issues lurking here:
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
... and I'm guessing if we assign a fixed virtual address range it'll
live outside the linear mapping, which is likely to break certain
requirements of kmalloc()'d allocations in certain situations (a
problem we had with v1 of KFENCE on arm64).
So I don't even know if that's feasible. :-/
Thanks,
-- Marco
On Mon, 24 Jan 2022 at 13:19, liupeng (DM) <[email protected]> wrote:
[...]
> When the KFENCE pool size can be adjusted by boot parameters (an
> assumption for now), automatically testing and tuning KFENCE may be
> useful. So far, exporting kfence.sample_interval is not necessary.
I'm not opposed to the patch (I've also run into this issue, but not
too frequently) - feel free to just send it with EXPORT_SYMBOL_GPL.
Thanks,
-- Marco