2022-06-13 17:36:22

by Mel Gorman

Subject: [PATCH v4 00/7] Drain remote per-cpu directly

This replaces the existing version on mm-unstable. The biggest difference
is the last patch, which replaces local_lock entirely. The other changes
are minor fixes reported by Hugh and Vlastimil.

Changelog since v3
o Checkpatch fixes from mm-unstable (akpm)
o Replace local_lock with spinlock (akpm)
o Remove IRQ-disabled check in free_unref_page_list as it triggers
a false positive (hughd)
o Take an unlikely check out of the rmqueue fast path (vbabka)

Some setups, notably NOHZ_FULL CPUs, may be running realtime or
latency-sensitive applications that cannot tolerate interference due to
per-cpu drain work queued by __drain_all_pages(). Introduce a new
mechanism to remotely drain the per-cpu lists. It is made possible by
remotely locking 'struct per_cpu_pages' via its new per-cpu spinlock. This
has two advantages: the time to drain is more predictable, and other
unrelated tasks are not interrupted.

This series has the same intent as Nicolas' series "mm/page_alloc: Remote
per-cpu lists drain support" -- avoid interference of a high priority task
due to a workqueue item draining per-cpu page lists. While many workloads
can tolerate a brief interruption, it may cause a real-time task running
on a NOHZ_FULL CPU to miss a deadline and at minimum, the draining is
non-deterministic.

Currently an IRQ-safe local_lock protects the page allocator per-cpu
lists. The local_lock on its own prevents migration and the IRQ disabling
protects from corruption due to an interrupt arriving while a page
allocation is in progress.

This series adjusts the locking. A spinlock is added to struct
per_cpu_pages to protect the list contents while local_lock_irq is
ultimately replaced by just the spinlock in the final patch. This allows
a remote CPU to safely drain a remote per-cpu list. Follow-on work should
allow the spin_lock_irqsave calls to be converted to spin_lock to avoid
IRQs being disabled/enabled in most cases.

Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages
and when it is storing per-cpu pages.

Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking
this is not necessary but it avoids per_cpu_pages consuming another
cache line.

Patch 3 is a preparation patch to avoid code duplication.

Patch 4 is a simple micro-optimisation that improves code flow necessary for
a later patch to avoid code duplication.

Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still
relying on local_lock to prevent migration, stabilise the pcp
lookup and prevent IRQ reentrancy.

Patch 6 remote drains per-cpu pages directly instead of using a workqueue.

Patch 7 uses a normal spinlock instead of local_lock for remote draining.

include/linux/mm_types.h | 5 +
include/linux/mmzone.h | 12 +-
mm/page_alloc.c | 404 ++++++++++++++++++++++++---------------
3 files changed, 266 insertions(+), 155 deletions(-)

--
2.35.3


2022-06-13 17:42:53

by Mel Gorman

Subject: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

struct per_cpu_pages is no longer strictly local as PCP lists can be
drained remotely using a lock for protection. While the use of local_lock
works, it goes against the intent of local_lock which is for "pure
CPU local concurrency control mechanisms and not suited for inter-CPU
concurrency control" (Documentation/locking/locktypes.rst)

local_lock protects against migration between when the percpu pointer is
accessed and the pcp->lock acquired. The lock acquisition is a preemption
point so in the worst case, a task could migrate to another NUMA node
and accidentally allocate remote memory. The main requirement is to pin
the task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT.

Replace local_lock with helpers that pin a task to a CPU, lookup the
per-cpu structure and acquire the embedded lock. It's similar to local_lock
without breaking the intent behind the API. It is not a complete API
as only the parts needed for PCP-alloc are implemented but in theory,
the generic helpers could be promoted to a general API if there was
demand for an embedded lock within a per-cpu struct with a guarantee
that the per-cpu structure locked matches the running CPU and cannot use
get_cpu_var due to RT concerns. PCP requires these semantics to avoid
accidentally allocating remote memory.

Signed-off-by: Mel Gorman <[email protected]>
---
mm/page_alloc.c | 226 ++++++++++++++++++++++++++----------------------
1 file changed, 121 insertions(+), 105 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03882ce7765f..f10782ab7cc7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -126,13 +126,6 @@ typedef int __bitwise fpi_t;
static DEFINE_MUTEX(pcp_batch_high_lock);
#define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)

-struct pagesets {
- local_lock_t lock;
-};
-static DEFINE_PER_CPU(struct pagesets, pagesets) = {
- .lock = INIT_LOCAL_LOCK(lock),
-};
-
#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
/*
* On SMP, spin_trylock is sufficient protection.
@@ -147,6 +140,81 @@ static DEFINE_PER_CPU(struct pagesets, pagesets) = {
#define pcp_trylock_finish(flags) local_irq_restore(flags)
#endif

+/*
+ * Locking a pcp requires a PCP lookup followed by a spinlock. To avoid
+ * a migration causing the wrong PCP to be locked and remote memory being
+ * potentially allocated, pin the task to the CPU for the lookup+lock.
+ * preempt_disable is used on !RT because it is faster than migrate_disable.
+ * migrate_disable is used on RT because otherwise RT spinlock usage is
+ * interfered with and a high priority task cannot preempt the allocator.
+ */
+#ifndef CONFIG_PREEMPT_RT
+#define pcpu_task_pin() preempt_disable()
+#define pcpu_task_unpin() preempt_enable()
+#else
+#define pcpu_task_pin() migrate_disable()
+#define pcpu_task_unpin() migrate_enable()
+#endif
+
+/*
+ * Generic helper to look up and lock a per-cpu variable with an embedded spinlock.
+ * Return value should be used with equivalent unlock helper.
+ */
+#define pcpu_spin_lock(type, member, ptr) \
+({ \
+ type *_ret; \
+ pcpu_task_pin(); \
+ _ret = this_cpu_ptr(ptr); \
+ spin_lock(&_ret->member); \
+ _ret; \
+})
+
+#define pcpu_spin_lock_irqsave(type, member, ptr, flags) \
+({ \
+ type *_ret; \
+ pcpu_task_pin(); \
+ _ret = this_cpu_ptr(ptr); \
+ spin_lock_irqsave(&_ret->member, flags); \
+ _ret; \
+})
+
+#define pcpu_spin_trylock_irqsave(type, member, ptr, flags) \
+({ \
+ type *_ret; \
+ pcpu_task_pin(); \
+ _ret = this_cpu_ptr(ptr); \
+ if (!spin_trylock_irqsave(&_ret->member, flags)) \
+ _ret = NULL; \
+ _ret; \
+})
+
+#define pcpu_spin_unlock(member, ptr) \
+({ \
+ spin_unlock(&ptr->member); \
+ pcpu_task_unpin(); \
+})
+
+#define pcpu_spin_unlock_irqrestore(member, ptr, flags) \
+({ \
+ spin_unlock_irqrestore(&ptr->member, flags); \
+ pcpu_task_unpin(); \
+})
+
+/* struct per_cpu_pages specific helpers. */
+#define pcp_spin_lock(ptr) \
+ pcpu_spin_lock(struct per_cpu_pages, lock, ptr)
+
+#define pcp_spin_lock_irqsave(ptr, flags) \
+ pcpu_spin_lock_irqsave(struct per_cpu_pages, lock, ptr, flags)
+
+#define pcp_spin_trylock_irqsave(ptr, flags) \
+ pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, ptr, flags)
+
+#define pcp_spin_unlock(ptr) \
+ pcpu_spin_unlock(lock, ptr)
+
+#define pcp_spin_unlock_irqrestore(ptr, flags) \
+ pcpu_spin_unlock_irqrestore(lock, ptr, flags)
#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
DEFINE_PER_CPU(int, numa_node);
EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -1481,10 +1549,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
/* Ensure requested pindex is drained first. */
pindex = pindex - 1;

- /*
- * local_lock_irq held so equivalent to spin_lock_irqsave for
- * both PREEMPT_RT and non-PREEMPT_RT configurations.
- */
+ /* Caller must hold IRQ-safe pcp->lock so IRQs are disabled. */
spin_lock(&zone->lock);
isolated_pageblocks = has_isolate_pageblock(zone);

@@ -3052,10 +3117,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
{
int i, allocated = 0;

- /*
- * local_lock_irq held so equivalent to spin_lock_irqsave for
- * both PREEMPT_RT and non-PREEMPT_RT configurations.
- */
+ /* Caller must hold IRQ-safe pcp->lock so IRQs are disabled. */
spin_lock(&zone->lock);
for (i = 0; i < count; ++i) {
struct page *page = __rmqueue(zone, order, migratetype,
@@ -3367,30 +3429,17 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
return min(READ_ONCE(pcp->batch) << 2, high);
}

-/* Returns true if the page was committed to the per-cpu list. */
-static bool free_unref_page_commit(struct page *page, int migratetype,
- unsigned int order, bool locked)
+static void free_unref_page_commit(struct per_cpu_pages *pcp, struct zone *zone,
+ struct page *page, int migratetype,
+ unsigned int order)
{
- struct zone *zone = page_zone(page);
- struct per_cpu_pages *pcp;
int high;
int pindex;
bool free_high;
- unsigned long __maybe_unused UP_flags;

__count_vm_event(PGFREE);
- pcp = this_cpu_ptr(zone->per_cpu_pageset);
pindex = order_to_pindex(migratetype, order);

- if (!locked) {
- /* Protect against a parallel drain. */
- pcp_trylock_prepare(UP_flags);
- if (!spin_trylock(&pcp->lock)) {
- pcp_trylock_finish(UP_flags);
- return false;
- }
- }
-
list_add(&page->pcp_list, &pcp->lists[pindex]);
pcp->count += 1 << order;

@@ -3408,13 +3457,6 @@ static bool free_unref_page_commit(struct page *page, int migratetype,

free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high), pcp, pindex);
}
-
- if (!locked) {
- spin_unlock(&pcp->lock);
- pcp_trylock_finish(UP_flags);
- }
-
- return true;
}

/*
@@ -3422,10 +3464,12 @@ static bool free_unref_page_commit(struct page *page, int migratetype,
*/
void free_unref_page(struct page *page, unsigned int order)
{
- unsigned long flags;
+ struct per_cpu_pages *pcp;
+ struct zone *zone;
unsigned long pfn = page_to_pfn(page);
int migratetype;
- bool freed_pcp = false;
+ unsigned long flags;
+ unsigned long __maybe_unused UP_flags;

if (!free_unref_page_prepare(page, pfn, order))
return;
@@ -3446,12 +3490,16 @@ void free_unref_page(struct page *page, unsigned int order)
migratetype = MIGRATE_MOVABLE;
}

- local_lock_irqsave(&pagesets.lock, flags);
- freed_pcp = free_unref_page_commit(page, migratetype, order, false);
- local_unlock_irqrestore(&pagesets.lock, flags);
-
- if (unlikely(!freed_pcp))
+ zone = page_zone(page);
+ pcp_trylock_prepare(UP_flags);
+ pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
+ if (pcp) {
+ free_unref_page_commit(pcp, zone, page, migratetype, order);
+ pcp_spin_unlock_irqrestore(pcp, flags);
+ } else {
free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
+ }
+ pcp_trylock_finish(UP_flags);
}

/*
@@ -3500,20 +3548,20 @@ void free_unref_page_list(struct list_head *list)
if (list_empty(list))
return;

- local_lock_irqsave(&pagesets.lock, flags);
-
page = lru_to_page(list);
locked_zone = page_zone(page);
- pcp = this_cpu_ptr(locked_zone->per_cpu_pageset);
- spin_lock(&pcp->lock);
+ pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);

list_for_each_entry_safe(page, next, list, lru) {
struct zone *zone = page_zone(page);

/* Different zone, different pcp lock. */
if (zone != locked_zone) {
+ /* Leave IRQs enabled as a new lock is acquired. */
spin_unlock(&pcp->lock);
locked_zone = zone;
+
+ /* Preemption disabled by pcp_spin_lock_irqsave. */
pcp = this_cpu_ptr(zone->per_cpu_pageset);
spin_lock(&pcp->lock);
}
@@ -3528,33 +3576,19 @@ void free_unref_page_list(struct list_head *list)

trace_mm_page_free_batched(page);

- /*
- * If there is a parallel drain in progress, free to the buddy
- * allocator directly. This is expensive as the zone lock will
- * be acquired multiple times but if a drain is in progress
- * then an expensive operation is already taking place.
- *
- * TODO: Always false at the moment due to local_lock_irqsave
- * and is preparation for converting to local_lock.
- */
- if (unlikely(!free_unref_page_commit(page, migratetype, 0, true)))
- free_one_page(page_zone(page), page, page_to_pfn(page), 0, migratetype, FPI_NONE);
+ free_unref_page_commit(pcp, zone, page, migratetype, 0);

/*
* Guard against excessive IRQ disabled times when we get
* a large list of pages to free.
*/
if (++batch_count == SWAP_CLUSTER_MAX) {
- spin_unlock(&pcp->lock);
- local_unlock_irqrestore(&pagesets.lock, flags);
+ pcp_spin_unlock_irqrestore(pcp, flags);
batch_count = 0;
- local_lock_irqsave(&pagesets.lock, flags);
- pcp = this_cpu_ptr(locked_zone->per_cpu_pageset);
- spin_lock(&pcp->lock);
+ pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);
}
}
- spin_unlock(&pcp->lock);
- local_unlock_irqrestore(&pagesets.lock, flags);
+ pcp_spin_unlock_irqrestore(pcp, flags);
}

/*
@@ -3722,28 +3756,9 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
int migratetype,
unsigned int alloc_flags,
struct per_cpu_pages *pcp,
- struct list_head *list,
- bool locked)
+ struct list_head *list)
{
struct page *page;
- unsigned long __maybe_unused UP_flags;
-
- /*
- * spin_trylock is not necessary right now due to due to
- * local_lock_irqsave and is a preparation step for
- * a conversion to local_lock using the trylock to prevent
- * IRQ re-entrancy. If pcp->lock cannot be acquired, the caller
- * uses rmqueue_buddy.
- *
- * TODO: Convert local_lock_irqsave to local_lock.
- */
- if (unlikely(!locked)) {
- pcp_trylock_prepare(UP_flags);
- if (!spin_trylock(&pcp->lock)) {
- pcp_trylock_finish(UP_flags);
- return NULL;
- }
- }

do {
if (list_empty(list)) {
@@ -3776,10 +3791,6 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
} while (check_new_pcp(page, order));

out:
- if (!locked) {
- spin_unlock(&pcp->lock);
- pcp_trylock_finish(UP_flags);
- }

return page;
}
@@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
struct list_head *list;
struct page *page;
unsigned long flags;
+ unsigned long __maybe_unused UP_flags;

- local_lock_irqsave(&pagesets.lock, flags);
+ /*
+ * spin_trylock_irqsave is not necessary right now as it'll only be
+ * true when contending with a remote drain. It's in place as a
+ * preparation step before converting pcp locking to spin_trylock
+ * to protect against IRQ reentry.
+ */
+ pcp_trylock_prepare(UP_flags);
+ pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
+ if (!pcp)
+ return NULL;

/*
* On allocation, reduce the number of pages that are batch freed.
* See nr_pcp_free() where free_factor is increased for subsequent
* frees.
*/
- pcp = this_cpu_ptr(zone->per_cpu_pageset);
pcp->free_factor >>= 1;
list = &pcp->lists[order_to_pindex(migratetype, order)];
- page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list, false);
- local_unlock_irqrestore(&pagesets.lock, flags);
+ page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
+ pcp_spin_unlock_irqrestore(pcp, flags);
+ pcp_trylock_finish(UP_flags);
if (page) {
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
zone_statistics(preferred_zone, zone, 1);
@@ -5410,10 +5431,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
goto failed;

/* Attempt the batch allocation */
- local_lock_irqsave(&pagesets.lock, flags);
- pcp = this_cpu_ptr(zone->per_cpu_pageset);
+ pcp = pcp_spin_lock_irqsave(zone->per_cpu_pageset, flags);
pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];
- spin_lock(&pcp->lock);

while (nr_populated < nr_pages) {

@@ -5424,13 +5443,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
}

page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
- pcp, pcp_list, true);
+ pcp, pcp_list);
if (unlikely(!page)) {
/* Try and allocate at least one page */
- if (!nr_account) {
- spin_unlock(&pcp->lock);
+ if (!nr_account)
goto failed_irq;
- }
break;
}
nr_account++;
@@ -5443,8 +5460,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
nr_populated++;
}

- spin_unlock(&pcp->lock);
- local_unlock_irqrestore(&pagesets.lock, flags);
+ pcp_spin_unlock_irqrestore(pcp, flags);

__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
@@ -5453,7 +5469,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
return nr_populated;

failed_irq:
- local_unlock_irqrestore(&pagesets.lock, flags);
+ pcp_spin_unlock_irqrestore(pcp, flags);

failed:
page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
--
2.35.3

2022-06-13 17:43:43

by Mel Gorman

Subject: [PATCH 4/7] mm/page_alloc: Remove mistaken page == NULL check in rmqueue

If a page allocation fails, the ZONE_BOOSTED_WATERMARK should be tested,
cleared and kswapd woken whether the allocation attempt was via the PCP
or directly via the buddy list.

Remove the page == NULL check so the ZONE_BOOSTED_WATERMARK bit is checked
unconditionally. As it is unlikely that ZONE_BOOSTED_WATERMARK is set,
mark the branch accordingly.

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
mm/page_alloc.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 44d198af4b35..7fb262eeec2f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3777,12 +3777,10 @@ struct page *rmqueue(struct zone *preferred_zone,

page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
migratetype);
- if (unlikely(!page))
- return NULL;

out:
/* Separate test+clear to avoid unnecessary atomics */
- if (test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags)) {
+ if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
wakeup_kswapd(zone, 0, 0, zone_idx(zone));
}
--
2.35.3

2022-06-13 17:44:57

by Mel Gorman

Subject: [PATCH 6/7] mm/page_alloc: Remotely drain per-cpu lists

From: Nicolas Saenz Julienne <[email protected]>

Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu
drain work queued by __drain_all_pages(). So introduce a new mechanism to
remotely drain the per-cpu lists. It is made possible by remotely locking
'struct per_cpu_pages' new per-cpu spinlocks. A benefit of this new
scheme is that drain operations are now migration safe.

There was no observed performance degradation vs. the previous scheme.
Both netperf and hackbench were run in parallel to triggering the
__drain_all_pages(NULL, true) code path around ~100 times per second. The
new scheme performs a bit better (~5%), although the important point here
is there are no performance regressions vs. the previous mechanism.
Per-cpu lists draining happens only in slow paths.

Minchan Kim tested this independently and reported:

My workload is not NOHZ CPUs but run apps under heavy memory
pressure so they goes to direct reclaim and be stuck on
drain_all_pages until work on workqueue run.

unit: nanosecond
max(dur) avg(dur) count(dur)
166713013 487511.77786438033 1283

From traces, system encountered the drain_all_pages 1283 times and
worst case was 166ms and avg was 487us.

The other problem was alloc_contig_range in CMA. The PCP draining
takes several hundred millisecond sometimes though there is no
memory pressure or a few of pages to be migrated out but CPU were
fully booked.

Your patch perfectly removed those wasted time.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Tested-by: Minchan Kim <[email protected]>
Acked-by: Minchan Kim <[email protected]>
---
mm/page_alloc.c | 58 ++++---------------------------------------------
1 file changed, 4 insertions(+), 54 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 533fc5527582..03882ce7765f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -165,13 +165,7 @@ DEFINE_PER_CPU(int, _numa_mem_); /* Kernel "local memory" node */
EXPORT_PER_CPU_SYMBOL(_numa_mem_);
#endif

-/* work_structs for global per-cpu drains */
-struct pcpu_drain {
- struct zone *zone;
- struct work_struct work;
-};
static DEFINE_MUTEX(pcpu_drain_mutex);
-static DEFINE_PER_CPU(struct pcpu_drain, pcpu_drain);

#ifdef CONFIG_GCC_PLUGIN_LATENT_ENTROPY
volatile unsigned long latent_entropy __latent_entropy;
@@ -3105,9 +3099,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
* Called from the vmstat counter updater to drain pagesets of this
* currently executing processor on remote nodes after they have
* expired.
- *
- * Note that this function must be called with the thread pinned to
- * a single processor.
*/
void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
{
@@ -3132,10 +3123,6 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)

/*
* Drain pcplists of the indicated processor and zone.
- *
- * The processor must either be the current processor and the
- * thread pinned to the current processor or a processor that
- * is not online.
*/
static void drain_pages_zone(unsigned int cpu, struct zone *zone)
{
@@ -3154,10 +3141,6 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)

/*
* Drain pcplists of all zones on the indicated processor.
- *
- * The processor must either be the current processor and the
- * thread pinned to the current processor or a processor that
- * is not online.
*/
static void drain_pages(unsigned int cpu)
{
@@ -3170,9 +3153,6 @@ static void drain_pages(unsigned int cpu)

/*
* Spill all of this CPU's per-cpu pages back into the buddy allocator.
- *
- * The CPU has to be pinned. When zone parameter is non-NULL, spill just
- * the single zone's pages.
*/
void drain_local_pages(struct zone *zone)
{
@@ -3184,24 +3164,6 @@ void drain_local_pages(struct zone *zone)
drain_pages(cpu);
}

-static void drain_local_pages_wq(struct work_struct *work)
-{
- struct pcpu_drain *drain;
-
- drain = container_of(work, struct pcpu_drain, work);
-
- /*
- * drain_all_pages doesn't use proper cpu hotplug protection so
- * we can race with cpu offline when the WQ can move this from
- * a cpu pinned worker to an unbound one. We can operate on a different
- * cpu which is alright but we also have to make sure to not move to
- * a different one.
- */
- migrate_disable();
- drain_local_pages(drain->zone);
- migrate_enable();
-}
-
/*
* The implementation of drain_all_pages(), exposing an extra parameter to
* drain on all cpus.
@@ -3222,13 +3184,6 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
*/
static cpumask_t cpus_with_pcps;

- /*
- * Make sure nobody triggers this path before mm_percpu_wq is fully
- * initialized.
- */
- if (WARN_ON_ONCE(!mm_percpu_wq))
- return;
-
/*
* Do not drain if one is already in progress unless it's specific to
* a zone. Such callers are primarily CMA and memory hotplug and need
@@ -3278,14 +3233,11 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
}

for_each_cpu(cpu, &cpus_with_pcps) {
- struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu);
-
- drain->zone = zone;
- INIT_WORK(&drain->work, drain_local_pages_wq);
- queue_work_on(cpu, mm_percpu_wq, &drain->work);
+ if (zone)
+ drain_pages_zone(cpu, zone);
+ else
+ drain_pages(cpu);
}
- for_each_cpu(cpu, &cpus_with_pcps)
- flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work);

mutex_unlock(&pcpu_drain_mutex);
}
@@ -3294,8 +3246,6 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus)
* Spill all the per-cpu pages from all CPUs back into the buddy allocator.
*
* When zone parameter is non-NULL, spill just the single zone's pages.
- *
- * Note that this can be extremely slow as the draining happens in a workqueue.
*/
void drain_all_pages(struct zone *zone)
{
--
2.35.3

2022-06-13 17:55:04

by Mel Gorman

Subject: [PATCH 1/7] mm/page_alloc: Add page->buddy_list and page->pcp_list

The page allocator uses page->lru for storing pages on either buddy or PCP
lists. Create page->buddy_list and page->pcp_list as a union with
page->lru. This is simply to clarify what type of list a page is on in
the page allocator.

No functional change intended.

[[email protected]: fix page lru fields in macros]
Signed-off-by: Mel Gorman <[email protected]>
Tested-by: Minchan Kim <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Reviewed-by: Nicolas Saenz Julienne <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
include/linux/mm_types.h | 5 +++++
mm/page_alloc.c | 24 ++++++++++++------------
2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index b34ff2cdbc4f..c09b7f0555b8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -87,6 +87,7 @@ struct page {
*/
union {
struct list_head lru;
+
/* Or, for the Unevictable "LRU list" slot */
struct {
/* Always even, to negate PageTail */
@@ -94,6 +95,10 @@ struct page {
/* Count page's or folio's mlocks */
unsigned int mlock_count;
};
+
+ /* Or, free page */
+ struct list_head buddy_list;
+ struct list_head pcp_list;
};
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3df0485..247fa7502199 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -785,7 +785,7 @@ static inline bool set_page_guard(struct zone *zone, struct page *page,
return false;

__SetPageGuard(page);
- INIT_LIST_HEAD(&page->lru);
+ INIT_LIST_HEAD(&page->buddy_list);
set_page_private(page, order);
/* Guard pages are not available for any usage */
__mod_zone_freepage_state(zone, -(1 << order), migratetype);
@@ -928,7 +928,7 @@ static inline void add_to_free_list(struct page *page, struct zone *zone,
{
struct free_area *area = &zone->free_area[order];

- list_add(&page->lru, &area->free_list[migratetype]);
+ list_add(&page->buddy_list, &area->free_list[migratetype]);
area->nr_free++;
}

@@ -938,7 +938,7 @@ static inline void add_to_free_list_tail(struct page *page, struct zone *zone,
{
struct free_area *area = &zone->free_area[order];

- list_add_tail(&page->lru, &area->free_list[migratetype]);
+ list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
area->nr_free++;
}

@@ -952,7 +952,7 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
{
struct free_area *area = &zone->free_area[order];

- list_move_tail(&page->lru, &area->free_list[migratetype]);
+ list_move_tail(&page->buddy_list, &area->free_list[migratetype]);
}

static inline void del_page_from_free_list(struct page *page, struct zone *zone,
@@ -962,7 +962,7 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
if (page_reported(page))
__ClearPageReported(page);

- list_del(&page->lru);
+ list_del(&page->buddy_list);
__ClearPageBuddy(page);
set_page_private(page, 0);
zone->free_area[order].nr_free--;
@@ -1504,11 +1504,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
do {
int mt;

- page = list_last_entry(list, struct page, lru);
+ page = list_last_entry(list, struct page, pcp_list);
mt = get_pcppage_migratetype(page);

/* must delete to avoid corrupting pcp list */
- list_del(&page->lru);
+ list_del(&page->pcp_list);
count -= nr_pages;
pcp->count -= nr_pages;

@@ -3068,7 +3068,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
* for IO devices that can merge IO requests if the physical
* pages are ordered properly.
*/
- list_add_tail(&page->lru, list);
+ list_add_tail(&page->pcp_list, list);
allocated++;
if (is_migrate_cma(get_pcppage_migratetype(page)))
__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
@@ -3318,7 +3318,7 @@ void mark_free_pages(struct zone *zone)

for_each_migratetype_order(order, t) {
list_for_each_entry(page,
- &zone->free_area[order].free_list[t], lru) {
+ &zone->free_area[order].free_list[t], buddy_list) {
unsigned long i;

pfn = page_to_pfn(page);
@@ -3407,7 +3407,7 @@ static void free_unref_page_commit(struct page *page, int migratetype,
__count_vm_event(PGFREE);
pcp = this_cpu_ptr(zone->per_cpu_pageset);
pindex = order_to_pindex(migratetype, order);
- list_add(&page->lru, &pcp->lists[pindex]);
+ list_add(&page->pcp_list, &pcp->lists[pindex]);
pcp->count += 1 << order;

/*
@@ -3670,8 +3670,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
return NULL;
}

- page = list_first_entry(list, struct page, lru);
- list_del(&page->lru);
+ page = list_first_entry(list, struct page, pcp_list);
+ list_del(&page->pcp_list);
pcp->count -= 1 << order;
} while (check_new_pcp(page, order));

--
2.35.3

2022-06-15 22:56:08

by Marek Szyprowski

Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

Hi Mel,

On 13.06.2022 14:56, Mel Gorman wrote:
> struct per_cpu_pages is no longer strictly local as PCP lists can be
> drained remotely using a lock for protection. While the use of local_lock
> works, it goes against the intent of local_lock which is for "pure
> CPU local concurrency control mechanisms and not suited for inter-CPU
> concurrency control" (Documentation/locking/locktypes.rst)
>
> local_lock protects against migration between when the percpu pointer is
> accessed and the pcp->lock acquired. The lock acquisition is a preemption
> point so in the worst case, a task could migrate to another NUMA node
> and accidentally allocate remote memory. The main requirement is to pin
> the task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT.
>
> Replace local_lock with helpers that pin a task to a CPU, lookup the
> per-cpu structure and acquire the embedded lock. It's similar to local_lock
> without breaking the intent behind the API. It is not a complete API
> as only the parts needed for PCP-alloc are implemented but in theory,
> the generic helpers could be promoted to a general API if there was
> demand for an embedded lock within a per-cpu struct with a guarantee
> that the per-cpu structure locked matches the running CPU and cannot use
> get_cpu_var due to RT concerns. PCP requires these semantics to avoid
> accidentally allocating remote memory.
>
> Signed-off-by: Mel Gorman <[email protected]>

This patch landed in linux next-20220614 as commit 54bcdc6744e3
("mm/page_alloc: replace local_lock with normal spinlock").
Unfortunately it causes some serious issues when some kernel debugging
options (CONFIG_PROVE_LOCKING and CONFIG_DEBUG_ATOMIC_SLEEP) are
enabled. I've observed this on various ARM 64bit and 32bit boards.

In the logs I see lots of errors like:

BUG: sleeping function called from invalid context at
./include/linux/sched/mm.h:274

BUG: scheduling while atomic: systemd-udevd/288/0x00000002

BUG: sleeping function called from invalid context at mm/filemap.c:2647

however there are also a fatal ones like:

Unable to handle kernel paging request at virtual address 00000000017a87b4


The issues seems to be a bit random. Looks like memory trashing.
Reverting $subject on top of current linux-next fixes all those issues.


Let me know how I can help debug this.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-06-15 23:16:01

by Andrew Morton

Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On Thu, 16 Jun 2022 00:48:55 +0200 Marek Szyprowski <[email protected]> wrote:

> In the logs I see lots of errors like:
>
> BUG: sleeping function called from invalid context at
> ./include/linux/sched/mm.h:274
>
> BUG: scheduling while atomic: systemd-udevd/288/0x00000002
>
> BUG: sleeping function called from invalid context at mm/filemap.c:2647
>
> however there are also a fatal ones like:
>
> Unable to handle kernel paging request at virtual address 00000000017a87b4
>
>
> The issues seems to be a bit random. Looks like memory trashing.
> Reverting $subject on top of current linux-next fixes all those issues.
>
>

This?

--- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
+++ a/mm/page_alloc.c
@@ -183,8 +183,10 @@ static DEFINE_MUTEX(pcp_batch_high_lock)
type *_ret; \
pcpu_task_pin(); \
_ret = this_cpu_ptr(ptr); \
- if (!spin_trylock_irqsave(&_ret->member, flags)) \
+ if (!spin_trylock_irqsave(&_ret->member, flags)) { \
+ pcpu_task_unpin(); \
_ret = NULL; \
+ } \
_ret; \
})


I'll drop Mel's patch for next -next.

2022-06-15 23:16:45

by Yu Zhao

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On Mon, Jun 13, 2022 at 8:54 AM Mel Gorman <[email protected]> wrote:
...

> +#define pcpu_spin_trylock_irqsave(type, member, ptr, flags) \
> +({ \
> + type *_ret; \
> + pcpu_task_pin(); \
> + _ret = this_cpu_ptr(ptr); \
> + if (!spin_trylock_irqsave(&_ret->member, flags)) \
> + _ret = NULL; \

I'm getting "BUG: sleeping function called from invalid context" with
mm-everything-2022-06-14-19-05.

Perhaps missing a pcpu_task_unpin() here?

> + _ret; \
> +})

2022-06-16 03:29:04

by Yu Zhao

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On Wed, Jun 15, 2022 at 04:04:46PM -0700, Andrew Morton wrote:
> On Thu, 16 Jun 2022 00:48:55 +0200 Marek Szyprowski <[email protected]> wrote:
>
> > In the logs I see lots of errors like:
> >
> > BUG: sleeping function called from invalid context at
> > ./include/linux/sched/mm.h:274
> >
> > BUG: scheduling while atomic: systemd-udevd/288/0x00000002
> >
> > BUG: sleeping function called from invalid context at mm/filemap.c:2647
> >
> > however there are also a fatal ones like:
> >
> > Unable to handle kernel paging request at virtual address 00000000017a87b4
> >
> >
> > The issues seems to be a bit random. Looks like memory trashing.
> > Reverting $subject on top of current linux-next fixes all those issues.
> >
> >
>
> This?
>
> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
> +++ a/mm/page_alloc.c
> @@ -183,8 +183,10 @@ static DEFINE_MUTEX(pcp_batch_high_lock)
> type *_ret; \
> pcpu_task_pin(); \
> _ret = this_cpu_ptr(ptr); \
> - if (!spin_trylock_irqsave(&_ret->member, flags)) \
> + if (!spin_trylock_irqsave(&_ret->member, flags)) { \
> + pcpu_task_unpin(); \
> _ret = NULL; \
> + } \
> _ret; \
> })
>
>
> I'll drop Mel's patch for next -next.

While we are at it, please consider this cleanup:

mm/page_alloc.c | 48 +++++++++---------------------------------------
1 file changed, 9 insertions(+), 39 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e538dde2c1c0..a1b76d5fdf75 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -160,61 +160,31 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
* Generic helper to lookup and a per-cpu variable with an embedded spinlock.
* Return value should be used with equivalent unlock helper.
*/
-#define pcpu_spin_lock(type, member, ptr) \
-({ \
- type *_ret; \
- pcpu_task_pin(); \
- _ret = this_cpu_ptr(ptr); \
- spin_lock(&_ret->member); \
- _ret; \
-})
-
-#define pcpu_spin_lock_irqsave(type, member, ptr, flags) \
-({ \
- type *_ret; \
- pcpu_task_pin(); \
- _ret = this_cpu_ptr(ptr); \
- spin_lock_irqsave(&_ret->member, flags); \
- _ret; \
-})
-
-#define pcpu_spin_trylock_irqsave(type, member, ptr, flags) \
-({ \
- type *_ret; \
- pcpu_task_pin(); \
- _ret = this_cpu_ptr(ptr); \
- if (!spin_trylock_irqsave(&_ret->member, flags)) \
- _ret = NULL; \
- _ret; \
-})
-
-#define pcpu_spin_unlock(member, ptr) \
-({ \
- spin_unlock(&ptr->member); \
- pcpu_task_unpin(); \
-})
-
-#define pcpu_spin_unlock_irqrestore(member, ptr, flags) \
-({ \
- spin_unlock_irqrestore(&ptr->member, flags); \
- pcpu_task_unpin(); \
-})
-
-/* struct per_cpu_pages specific helpers. */
-#define pcp_spin_lock(ptr) \
- pcpu_spin_lock(struct per_cpu_pages, lock, ptr)
-
#define pcp_spin_lock_irqsave(ptr, flags) \
- pcpu_spin_lock_irqsave(struct per_cpu_pages, lock, ptr, flags)
+({ \
+ struct per_cpu_pages *_ret; \
+ pcpu_task_pin(); \
+ _ret = this_cpu_ptr(ptr); \
+ spin_lock_irqsave(&_ret->lock, flags); \
+ _ret; \
+})

#define pcp_spin_trylock_irqsave(ptr, flags) \
- pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, ptr, flags)
-
-#define pcp_spin_unlock(ptr) \
- pcpu_spin_unlock(lock, ptr)
+({ \
+ struct per_cpu_pages *_ret; \
+ pcpu_task_pin(); \
+ _ret = this_cpu_ptr(ptr); \
+ if (!spin_trylock_irqsave(&_ret->lock, flags)) \
+ _ret = NULL; \
+ _ret; \
+})

#define pcp_spin_unlock_irqrestore(ptr, flags) \
- pcpu_spin_unlock_irqrestore(lock, ptr, flags)
+({ \
+ spin_unlock_irqrestore(&ptr->lock, flags); \
+ pcpu_task_unpin(); \
+})
+
#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
DEFINE_PER_CPU(int, numa_node);
EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -3488,7 +3458,7 @@ void free_unref_page(struct page *page, unsigned int order)

zone = page_zone(page);
pcp_trylock_prepare(UP_flags);
- pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
+ pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
if (pcp) {
free_unref_page_commit(pcp, zone, page, migratetype, order);
pcp_spin_unlock_irqrestore(pcp, flags);

2022-06-16 17:02:52

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 6/7] mm/page_alloc: Remotely drain per-cpu lists

On 6/13/22 14:56, Mel Gorman wrote:
> From: Nicolas Saenz Julienne <[email protected]>
>
> Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu
> drain work queued by __drain_all_pages(). So introduce a new mechanism to
> remotely drain the per-cpu lists. It is made possible by remotely locking
> 'struct per_cpu_pages' new per-cpu spinlocks. A benefit of this new
> scheme is that drain operations are now migration safe.
>
> There was no observed performance degradation vs. the previous scheme.
> Both netperf and hackbench were run in parallel to triggering the
> __drain_all_pages(NULL, true) code path around ~100 times per second. The
> new scheme performs a bit better (~5%), although the important point here
> is there are no performance regressions vs. the previous mechanism.
> Per-cpu lists draining happens only in slow paths.
>
> Minchan Kim tested this independently and reported;
>
> My workload is not NOHZ CPUs but run apps under heavy memory
> pressure so they goes to direct reclaim and be stuck on
> drain_all_pages until work on workqueue run.
>
> unit: nanosecond
> max(dur) avg(dur) count(dur)
> 166713013 487511.77786438033 1283
>
> From traces, system encountered the drain_all_pages 1283 times and
> worst case was 166ms and avg was 487us.
>
> The other problem was alloc_contig_range in CMA. The PCP draining
> takes several hundred millisecond sometimes though there is no
> memory pressure or a few of pages to be migrated out but CPU were
> fully booked.
>
> Your patch perfectly removed those wasted time.
>
> Signed-off-by: Nicolas Saenz Julienne <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> Tested-by: Minchan Kim <[email protected]>
> Acked-by: Minchan Kim <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

2022-06-16 17:11:46

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On 6/13/22 14:56, Mel Gorman wrote:
> struct per_cpu_pages is no longer strictly local as PCP lists can be
> drained remotely using a lock for protection. While the use of local_lock
> works, it goes against the intent of local_lock which is for "pure
> CPU local concurrency control mechanisms and not suited for inter-CPU
> concurrency control" (Documentation/locking/locktypes.rst)
>
> local_lock protects against migration between when the percpu pointer is
> accessed and the pcp->lock acquired. The lock acquisition is a preemption
> point so in the worst case, a task could migrate to another NUMA node
> and accidentally allocate remote memory. The main requirement is to pin
> the task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT.
>
> Replace local_lock with helpers that pin a task to a CPU, lookup the
> per-cpu structure and acquire the embedded lock. It's similar to local_lock
> without breaking the intent behind the API. It is not a complete API
> as only the parts needed for PCP-alloc are implemented but in theory,
> the generic helpers could be promoted to a general API if there was
> demand for an embedded lock within a per-cpu struct with a guarantee
> that the per-cpu structure locked matches the running CPU and cannot use
> get_cpu_var due to RT concerns. PCP requires these semantics to avoid
> accidentally allocating remote memory.
>
> Signed-off-by: Mel Gorman <[email protected]>

...

> @@ -3367,30 +3429,17 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
> return min(READ_ONCE(pcp->batch) << 2, high);
> }
>
> -/* Returns true if the page was committed to the per-cpu list. */
> -static bool free_unref_page_commit(struct page *page, int migratetype,
> - unsigned int order, bool locked)
> +static void free_unref_page_commit(struct per_cpu_pages *pcp, struct zone *zone,
> + struct page *page, int migratetype,
> + unsigned int order)

Hmm given this drops the "bool locked" and bool return value again, my
suggestion for patch 5/7 would result in less churn as those woudn't need to
be introduced?

...

> @@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
> struct list_head *list;
> struct page *page;
> unsigned long flags;
> + unsigned long __maybe_unused UP_flags;
>
> - local_lock_irqsave(&pagesets.lock, flags);
> + /*
> + * spin_trylock_irqsave is not necessary right now as it'll only be
> + * true when contending with a remote drain. It's in place as a
> + * preparation step before converting pcp locking to spin_trylock
> + * to protect against IRQ reentry.
> + */
> + pcp_trylock_prepare(UP_flags);
> + pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
> + if (!pcp)

Besides the missing unpin Andrew fixed, I think also this is missing
pcp_trylock_finish(UP_flags); ?

> + return NULL;
>
> /*
> * On allocation, reduce the number of pages that are batch freed.
> * See nr_pcp_free() where free_factor is increased for subsequent
> * frees.
> */
> - pcp = this_cpu_ptr(zone->per_cpu_pageset);
> pcp->free_factor >>= 1;
> list = &pcp->lists[order_to_pindex(migratetype, order)];
> - page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list, false);
> - local_unlock_irqrestore(&pagesets.lock, flags);
> + page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
> + pcp_spin_unlock_irqrestore(pcp, flags);
> + pcp_trylock_finish(UP_flags);
> if (page) {
> __count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
> zone_statistics(preferred_zone, zone, 1);
> @@ -5410,10 +5431,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> goto failed;
>
> /* Attempt the batch allocation */
> - local_lock_irqsave(&pagesets.lock, flags);
> - pcp = this_cpu_ptr(zone->per_cpu_pageset);
> + pcp = pcp_spin_lock_irqsave(zone->per_cpu_pageset, flags);
> pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];
> - spin_lock(&pcp->lock);
>
> while (nr_populated < nr_pages) {
>
> @@ -5424,13 +5443,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> }
>
> page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
> - pcp, pcp_list, true);
> + pcp, pcp_list);
> if (unlikely(!page)) {
> /* Try and allocate at least one page */
> - if (!nr_account) {
> - spin_unlock(&pcp->lock);
> + if (!nr_account)
> goto failed_irq;
> - }
> break;
> }
> nr_account++;
> @@ -5443,8 +5460,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> nr_populated++;
> }
>
> - spin_unlock(&pcp->lock);
> - local_unlock_irqrestore(&pagesets.lock, flags);
> + pcp_spin_unlock_irqrestore(pcp, flags);
>
> __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
> zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
> @@ -5453,7 +5469,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> return nr_populated;
>
> failed_irq:
> - local_unlock_irqrestore(&pagesets.lock, flags);
> + pcp_spin_unlock_irqrestore(pcp, flags);
>
> failed:
> page = __alloc_pages(gfp, 0, preferred_nid, nodemask);

2022-06-16 21:47:59

by Yu Zhao

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On Thu, Jun 16, 2022 at 11:02 AM Vlastimil Babka <[email protected]> wrote:
>
> On 6/13/22 14:56, Mel Gorman wrote:
> > struct per_cpu_pages is no longer strictly local as PCP lists can be
> > drained remotely using a lock for protection. While the use of local_lock
> > works, it goes against the intent of local_lock which is for "pure
> > CPU local concurrency control mechanisms and not suited for inter-CPU
> > concurrency control" (Documentation/locking/locktypes.rst)
> >
> > local_lock protects against migration between when the percpu pointer is
> > accessed and the pcp->lock acquired. The lock acquisition is a preemption
> > point so in the worst case, a task could migrate to another NUMA node
> > and accidentally allocate remote memory. The main requirement is to pin
> > the task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT.
> >
> > Replace local_lock with helpers that pin a task to a CPU, lookup the
> > per-cpu structure and acquire the embedded lock. It's similar to local_lock
> > without breaking the intent behind the API. It is not a complete API
> > as only the parts needed for PCP-alloc are implemented but in theory,
> > the generic helpers could be promoted to a general API if there was
> > demand for an embedded lock within a per-cpu struct with a guarantee
> > that the per-cpu structure locked matches the running CPU and cannot use
> > get_cpu_var due to RT concerns. PCP requires these semantics to avoid
> > accidentally allocating remote memory.
> >
> > Signed-off-by: Mel Gorman <[email protected]>
>
> ...
>
> > @@ -3367,30 +3429,17 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
> > return min(READ_ONCE(pcp->batch) << 2, high);
> > }
> >
> > -/* Returns true if the page was committed to the per-cpu list. */
> > -static bool free_unref_page_commit(struct page *page, int migratetype,
> > - unsigned int order, bool locked)
> > +static void free_unref_page_commit(struct per_cpu_pages *pcp, struct zone *zone,
> > + struct page *page, int migratetype,
> > + unsigned int order)
>
> Hmm given this drops the "bool locked" and bool return value again, my
> suggestion for patch 5/7 would result in less churn as those woudn't need to
> be introduced?
>
> ...
>
> > @@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
> > struct list_head *list;
> > struct page *page;
> > unsigned long flags;
> > + unsigned long __maybe_unused UP_flags;
> >
> > - local_lock_irqsave(&pagesets.lock, flags);
> > + /*
> > + * spin_trylock_irqsave is not necessary right now as it'll only be
> > + * true when contending with a remote drain. It's in place as a
> > + * preparation step before converting pcp locking to spin_trylock
> > + * to protect against IRQ reentry.
> > + */
> > + pcp_trylock_prepare(UP_flags);
> > + pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
> > + if (!pcp)
>
> Besides the missing unpin Andrew fixed, I think also this is missing
> pcp_trylock_finish(UP_flags); ?

spin_trylock only fails when trylock_finish is a NOP.

2022-06-17 07:26:47

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

Hi Andrew,

On 16.06.2022 01:04, Andrew Morton wrote:
> On Thu, 16 Jun 2022 00:48:55 +0200 Marek Szyprowski <[email protected]> wrote:
>
>> In the logs I see lots of errors like:
>>
>> BUG: sleeping function called from invalid context at
>> ./include/linux/sched/mm.h:274
>>
>> BUG: scheduling while atomic: systemd-udevd/288/0x00000002
>>
>> BUG: sleeping function called from invalid context at mm/filemap.c:2647
>>
>> however there are also a fatal ones like:
>>
>> Unable to handle kernel paging request at virtual address 00000000017a87b4
>>
>>
>> The issues seems to be a bit random. Looks like memory trashing.
>> Reverting $subject on top of current linux-next fixes all those issues.
>>
>>
> This?
>
> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
> +++ a/mm/page_alloc.c
> @@ -183,8 +183,10 @@ static DEFINE_MUTEX(pcp_batch_high_lock)
> type *_ret; \
> pcpu_task_pin(); \
> _ret = this_cpu_ptr(ptr); \
> - if (!spin_trylock_irqsave(&_ret->member, flags)) \
> + if (!spin_trylock_irqsave(&_ret->member, flags)) { \
> + pcpu_task_unpin(); \
> _ret = NULL; \
> + } \
> _ret; \
> })
>
>
> I'll drop Mel's patch for next -next.

Yes, this fixes the issues I've observed. Feel free to add:

Tested-by: Marek Szyprowski <[email protected]>

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-06-17 08:00:56

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On 6/16/22 05:05, Yu Zhao wrote:
> On Wed, Jun 15, 2022 at 04:04:46PM -0700, Andrew Morton wrote:
>
> While we are at it, please consider this cleanup:

I suspect Mel had further plans for the API beyond this series.

...

> #define pcp_spin_trylock_irqsave(ptr, flags) \
> - pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, ptr, flags)
> -
> -#define pcp_spin_unlock(ptr) \
> - pcpu_spin_unlock(lock, ptr)
> +({ \
> + struct per_cpu_pages *_ret; \
> + pcpu_task_pin(); \
> + _ret = this_cpu_ptr(ptr); \
> + if (!spin_trylock_irqsave(&_ret->lock, flags)) \

Also missing the unpin?

> + _ret = NULL; \
> + _ret; \
> +})
>
> #define pcp_spin_unlock_irqrestore(ptr, flags) \
> - pcpu_spin_unlock_irqrestore(lock, ptr, flags)
> +({ \
> + spin_unlock_irqrestore(&ptr->lock, flags); \
> + pcpu_task_unpin(); \
> +})
> +
> #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> DEFINE_PER_CPU(int, numa_node);
> EXPORT_PER_CPU_SYMBOL(numa_node);
> @@ -3488,7 +3458,7 @@ void free_unref_page(struct page *page, unsigned int order)
>
> zone = page_zone(page);
> pcp_trylock_prepare(UP_flags);
> - pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
> + pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
> if (pcp) {
> free_unref_page_commit(pcp, zone, page, migratetype, order);
> pcp_spin_unlock_irqrestore(pcp, flags);

2022-06-17 08:01:04

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On 6/16/22 23:07, Yu Zhao wrote:
> On Thu, Jun 16, 2022 at 11:02 AM Vlastimil Babka <[email protected]> wrote:
>>
>>
>> > @@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
>> > struct list_head *list;
>> > struct page *page;
>> > unsigned long flags;
>> > + unsigned long __maybe_unused UP_flags;
>> >
>> > - local_lock_irqsave(&pagesets.lock, flags);
>> > + /*
>> > + * spin_trylock_irqsave is not necessary right now as it'll only be
>> > + * true when contending with a remote drain. It's in place as a
>> > + * preparation step before converting pcp locking to spin_trylock
>> > + * to protect against IRQ reentry.
>> > + */
>> > + pcp_trylock_prepare(UP_flags);
>> > + pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
>> > + if (!pcp)
>>
>> Besides the missing unpin Andrew fixed, I think also this is missing
>> pcp_trylock_finish(UP_flags); ?
>
> spin_trylock only fails when trylock_finish is a NOP.

True, so it's not an active bug, but I would still add it so the code isn't
confusing and doesn't depend on non-obvious details that might later change
and break it.

2022-06-17 10:24:22

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

Hi Mel,

On Mon, 2022-06-13 at 13:56 +0100, Mel Gorman wrote:
> @@ -3446,12 +3490,16 @@ void free_unref_page(struct page *page, unsigned int order)
> migratetype = MIGRATE_MOVABLE;
> }
>
> - local_lock_irqsave(&pagesets.lock, flags);
> - freed_pcp = free_unref_page_commit(page, migratetype, order, false);
> - local_unlock_irqrestore(&pagesets.lock, flags);
> -
> - if (unlikely(!freed_pcp))
> + zone = page_zone(page);
> + pcp_trylock_prepare(UP_flags);

Now that you're calling the *_irqsave() family of function you can drop
pcp_trylock_prepare/finish()

For the record in UP:

#define spin_trylock_irqsave(lock, flags) \
({ \
	local_irq_save(flags); \
	1; \
})

> + pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
> + if (pcp) {
> + free_unref_page_commit(pcp, zone, page, migratetype, order);
> + pcp_spin_unlock_irqrestore(pcp, flags);
> + } else {
> free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
> + }
> + pcp_trylock_finish(UP_flags);
> }
>
> /*

As Vlastimil mentioned elsewhere, I also wonder if it makes sense to just
bypass patch #5, especially as its intent isn't true anymore:

"As preparation for dealing with both of those problems, protect the lists
with a spinlock. The IRQ-unsafe version of the lock is used because IRQs
are already disabled by local_lock_irqsave. spin_trylock is used in
preparation for a time when local_lock could be used instead of
lock_lock_irqsave."

--
Nicolás Sáenz

2022-06-21 10:00:52

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On Fri, Jun 17, 2022 at 11:39:03AM +0200, Nicolas Saenz Julienne wrote:
> Hi Mel,
>
> On Mon, 2022-06-13 at 13:56 +0100, Mel Gorman wrote:
> > @@ -3446,12 +3490,16 @@ void free_unref_page(struct page *page, unsigned int order)
> > migratetype = MIGRATE_MOVABLE;
> > }
> >
> > - local_lock_irqsave(&pagesets.lock, flags);
> > - freed_pcp = free_unref_page_commit(page, migratetype, order, false);
> > - local_unlock_irqrestore(&pagesets.lock, flags);
> > -
> > - if (unlikely(!freed_pcp))
> > + zone = page_zone(page);
> > + pcp_trylock_prepare(UP_flags);
>
> Now that you're calling the *_irqsave() family of function you can drop
> pcp_trylock_prepare/finish()
>
> For the record in UP:
>
> #define spin_trylock_irqsave(lock, flags) \
> ({ \
> local_irq_save(flags); \
> 1;
> })
>

The missing patch that is deferred for a later release uses spin_trylock
so unless that is never merged because there is an unfixable flaw in it,
I'd prefer to leave the preparation in place.

> > + pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
> > + if (pcp) {
> > + free_unref_page_commit(pcp, zone, page, migratetype, order);
> > + pcp_spin_unlock_irqrestore(pcp, flags);
> > + } else {
> > free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
> > + }
> > + pcp_trylock_finish(UP_flags);
> > }
> >
> > /*
>
> As Vlastimil mentioned elsewhere, I also wonder if it makes sense to just
> bypass patch #5. Especially as its intent isn't true anymore:
>
> "As preparation for dealing with both of those problems, protect the lists
> with a spinlock. The IRQ-unsafe version of the lock is used because IRQs
> are already disabled by local_lock_irqsave. spin_trylock is used in
> preparation for a time when local_lock could be used instead of
> lock_lock_irqsave."
>

It's still true, the patch just isn't included as I wanted them to be
separated by time so a bisection that points to it is "obvious" instead
of pointing at the whole series as being a potential problem.

--
Mel Gorman
SUSE Labs

2022-06-21 10:00:52

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On Thu, Jun 16, 2022 at 07:01:53PM +0200, Vlastimil Babka wrote:
> On 6/13/22 14:56, Mel Gorman wrote:
> > struct per_cpu_pages is no longer strictly local as PCP lists can be
> > drained remotely using a lock for protection. While the use of local_lock
> > works, it goes against the intent of local_lock which is for "pure
> > CPU local concurrency control mechanisms and not suited for inter-CPU
> > concurrency control" (Documentation/locking/locktypes.rst)
> >
> > local_lock protects against migration between when the percpu pointer is
> > accessed and the pcp->lock acquired. The lock acquisition is a preemption
> > point so in the worst case, a task could migrate to another NUMA node
> > and accidentally allocate remote memory. The main requirement is to pin
> > the task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT.
> >
> > Replace local_lock with helpers that pin a task to a CPU, lookup the
> > per-cpu structure and acquire the embedded lock. It's similar to local_lock
> > without breaking the intent behind the API. It is not a complete API
> > as only the parts needed for PCP-alloc are implemented but in theory,
> > the generic helpers could be promoted to a general API if there was
> > demand for an embedded lock within a per-cpu struct with a guarantee
> > that the per-cpu structure locked matches the running CPU and cannot use
> > get_cpu_var due to RT concerns. PCP requires these semantics to avoid
> > accidentally allocating remote memory.
> >
> > Signed-off-by: Mel Gorman <[email protected]>
>
> ...
>
> > @@ -3367,30 +3429,17 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
> > return min(READ_ONCE(pcp->batch) << 2, high);
> > }
> >
> > -/* Returns true if the page was committed to the per-cpu list. */
> > -static bool free_unref_page_commit(struct page *page, int migratetype,
> > - unsigned int order, bool locked)
> > +static void free_unref_page_commit(struct per_cpu_pages *pcp, struct zone *zone,
> > + struct page *page, int migratetype,
> > + unsigned int order)
>
> Hmm given this drops the "bool locked" and bool return value again, my
> suggestion for patch 5/7 would result in less churn as those woudn't need to
> be introduced?
>

It would. I considered doing exactly that, but the change was significant
enough that the reviewed-bys and tested-bys would have to be dropped, which
I wanted to avoid. As multiple fixes are needed anyway, I'll do that now.

> ...
>
> > @@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
> > struct list_head *list;
> > struct page *page;
> > unsigned long flags;
> > + unsigned long __maybe_unused UP_flags;
> >
> > - local_lock_irqsave(&pagesets.lock, flags);
> > + /*
> > + * spin_trylock_irqsave is not necessary right now as it'll only be
> > + * true when contending with a remote drain. It's in place as a
> > + * preparation step before converting pcp locking to spin_trylock
> > + * to protect against IRQ reentry.
> > + */
> > + pcp_trylock_prepare(UP_flags);
> > + pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
> > + if (!pcp)
>
> Besides the missing unpin Andrew fixed, I think also this is missing
> pcp_trylock_finish(UP_flags); ?
>

Yes.

--
Mel Gorman
SUSE Labs

2022-06-21 10:05:10

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On Fri, Jun 17, 2022 at 09:57:06AM +0200, Vlastimil Babka wrote:
> On 6/16/22 23:07, Yu Zhao wrote:
> > On Thu, Jun 16, 2022 at 11:02 AM Vlastimil Babka <[email protected]> wrote:
> >>
> >>
> >> > @@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
> >> > struct list_head *list;
> >> > struct page *page;
> >> > unsigned long flags;
> >> > + unsigned long __maybe_unused UP_flags;
> >> >
> >> > - local_lock_irqsave(&pagesets.lock, flags);
> >> > + /*
> >> > + * spin_trylock_irqsave is not necessary right now as it'll only be
> >> > + * true when contending with a remote drain. It's in place as a
> >> > + * preparation step before converting pcp locking to spin_trylock
> >> > + * to protect against IRQ reentry.
> >> > + */
> >> > + pcp_trylock_prepare(UP_flags);
> >> > + pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
> >> > + if (!pcp)
> >>
> >> Besides the missing unpin Andrew fixed, I think also this is missing
> >> pcp_trylock_finish(UP_flags); ?
> >
> > spin_trylock only fails when trylock_finish is a NOP.
>
> True, so it's not an active bug, but I would still add it, so it's not
> confusing and depending on non-obvious details that might later change and
> break the code.

Yes. Even though it may work, it's still wrong.

--
Mel Gorman
SUSE Labs

2022-06-21 10:08:34

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On Tue, 2022-06-21 at 10:29 +0100, Mel Gorman wrote:
> On Fri, Jun 17, 2022 at 11:39:03AM +0200, Nicolas Saenz Julienne wrote:
> > Hi Mel,
> >
> > On Mon, 2022-06-13 at 13:56 +0100, Mel Gorman wrote:
> > > @@ -3446,12 +3490,16 @@ void free_unref_page(struct page *page, unsigned int order)
> > > migratetype = MIGRATE_MOVABLE;
> > > }
> > >
> > > - local_lock_irqsave(&pagesets.lock, flags);
> > > - freed_pcp = free_unref_page_commit(page, migratetype, order, false);
> > > - local_unlock_irqrestore(&pagesets.lock, flags);
> > > -
> > > - if (unlikely(!freed_pcp))
> > > + zone = page_zone(page);
> > > + pcp_trylock_prepare(UP_flags);
> >
> > Now that you're calling the *_irqsave() family of function you can drop
> > pcp_trylock_prepare/finish()
> >
> > For the record in UP:
> >
> > #define spin_trylock_irqsave(lock, flags) \
> > ({ \
> > local_irq_save(flags); \
> > 1;
> > })
> >
>
> The missing patch that is deferred for a later release uses spin_trylock
> so unless that is never merged because there is an unfixable flaw in it,
> I'd prefer to leave the preparation in place.
>
> > > + pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
> > > + if (pcp) {
> > > + free_unref_page_commit(pcp, zone, page, migratetype, order);
> > > + pcp_spin_unlock_irqrestore(pcp, flags);
> > > + } else {
> > > free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
> > > + }
> > > + pcp_trylock_finish(UP_flags);
> > > }
> > >
> > > /*
> >
> > As Vlastimil mentioned elsewhere, I also wonder if it makes sense to just
> > bypass patch #5. Especially as its intent isn't true anymore:
> >
> > "As preparation for dealing with both of those problems, protect the lists
> > with a spinlock. The IRQ-unsafe version of the lock is used because IRQs
> > are already disabled by local_lock_irqsave. spin_trylock is used in
> > preparation for a time when local_lock could be used instead of
> > lock_lock_irqsave."
> >
>
> It's still true, the patch just isn't included as I wanted them to be
> separated by time so a bisection that points to it is "obvious" instead
> of pointing at the whole series as being a potential problem.

Understood, I jumped straight into the code and missed your comment in the
cover letter.

Thanks!

--
Nicolás Sáenz

2022-06-21 10:21:20

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock

On Wed, Jun 15, 2022 at 04:04:46PM -0700, Andrew Morton wrote:
> On Thu, 16 Jun 2022 00:48:55 +0200 Marek Szyprowski <[email protected]> wrote:
>
> > In the logs I see lots of errors like:
> >
> > BUG: sleeping function called from invalid context at
> > ./include/linux/sched/mm.h:274
> >
> > BUG: scheduling while atomic: systemd-udevd/288/0x00000002
> >
> > BUG: sleeping function called from invalid context at mm/filemap.c:2647
> >
> > however there are also a fatal ones like:
> >
> > Unable to handle kernel paging request at virtual address 00000000017a87b4
> >
> >
> > The issues seems to be a bit random. Looks like memory trashing.
> > Reverting $subject on top of current linux-next fixes all those issues.
> >
> >
>
> This?
>
> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
> +++ a/mm/page_alloc.c
> @@ -183,8 +183,10 @@ static DEFINE_MUTEX(pcp_batch_high_lock)
> type *_ret; \
> pcpu_task_pin(); \
> _ret = this_cpu_ptr(ptr); \
> - if (!spin_trylock_irqsave(&_ret->member, flags)) \
> + if (!spin_trylock_irqsave(&_ret->member, flags)) { \
> + pcpu_task_unpin(); \
> _ret = NULL; \
> + } \
> _ret; \
> })
>

This is the correct fix. I *had* a fix for this, but it was in an unposted
patch that drops irqsave :(

--
Mel Gorman
SUSE Labs

2022-07-03 09:52:00

by Oliver Sang

[permalink] [raw]
Subject: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c



Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
patch link: https://lore.kernel.org/lkml/[email protected]

in testcase: kernel-selftests
version: kernel-selftests-x86_64-a10a197d-1_20220626
with following parameters:

sc_nr_hugepages: 2
group: vm
ucode: 0x500320a

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt


on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


[ 202.339609][T27281] BUG: sleeping function called from invalid context at mm/gup.c:1170
[ 202.339615][T27281] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 27281, name: compaction_test
[ 202.339617][T27281] preempt_count: 1, expected: 0
[ 202.339619][T27281] 1 lock held by compaction_test/27281:
[202.339622][T27281] #0: ffff88911e087828 (&mm->mmap_lock#2){++++}-{3:3}, at: __mm_populate (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/linux/mmap_lock.h:35 include/linux/mmap_lock.h:118 mm/gup.c:1611)
[ 202.339637][T27281] CPU: 78 PID: 27281 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1
[ 202.339641][T27281] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 202.339643][T27281] Call Trace:
[ 202.339645][T27281] <TASK>
[202.339650][T27281] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[202.339657][T27281] __might_resched.cold (kernel/sched/core.c:9792)
[202.339668][T27281] __get_user_pages (include/linux/sched.h:2059 mm/gup.c:1170)
[202.339682][T27281] ? get_gate_page (mm/gup.c:1099)
[202.339697][T27281] ? rwsem_down_read_slowpath (kernel/locking/rwsem.c:1487)
[202.339709][T27281] populate_vma_page_range (mm/gup.c:1518)
[202.339715][T27281] __mm_populate (mm/gup.c:1639)
[202.339720][T27281] ? faultin_vma_page_range (mm/gup.c:1595)
[202.339726][T27281] ? __up_write (arch/x86/include/asm/atomic64_64.h:172 (discriminator 23) include/linux/atomic/atomic-long.h:95 (discriminator 23) include/linux/atomic/atomic-instrumented.h:1348 (discriminator 23) kernel/locking/rwsem.c:1346 (discriminator 23))
[202.339736][T27281] vm_mmap_pgoff (include/linux/mm.h:2706 mm/util.c:557)
[202.339745][T27281] ? randomize_page (mm/util.c:542)
[202.339753][T27281] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[202.339757][T27281] ? syscall_enter_from_user_mode (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 kernel/entry/common.c:109)
[202.339768][T27281] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[202.339779][T27281] ? __local_bh_enable (kernel/softirq.c:357)
[202.339785][T27281] ? __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:415 kernel/softirq.c:600)
[202.339795][T27281] ? irqentry_exit_to_user_mode (kernel/entry/common.c:129 kernel/entry/common.c:309)
[202.339802][T27281] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[202.339806][T27281] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115)
[ 202.339810][T27281] RIP: 0033:0x7fdb25ea1b62
[ 202.339814][T27281] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64
All code
========
0: e4 e8 in $0xe8,%al
2: b2 4b mov $0x4b,%dl
4: 01 00 add %eax,(%rax)
6: 66 90 xchg %ax,%ax
8: 41 f7 c1 ff 0f 00 00 test $0xfff,%r9d
f: 75 27 jne 0x38
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 53 push %rbx
16: 89 cb mov %ecx,%ebx
18: 48 85 ff test %rdi,%rdi
1b: 74 3b je 0x58
1d: 41 89 da mov %ebx,%r10d
20: 48 89 ef mov %rbp,%rdi
23: b8 09 00 00 00 mov $0x9,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 66 ja 0x98
32: 5b pop %rbx
33: 5d pop %rbp
34: c3 retq
35: 0f 1f 00 nopl (%rax)
38: 48 8b 05 f9 52 0c 00 mov 0xc52f9(%rip),%rax # 0xc5338
3f: 64 fs

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 66 ja 0x6e
8: 5b pop %rbx
9: 5d pop %rbp
a: c3 retq
b: 0f 1f 00 nopl (%rax)
e: 48 8b 05 f9 52 0c 00 mov 0xc52f9(%rip),%rax # 0xc530e
15: 64 fs
[ 202.339817][T27281] RSP: 002b:00007ffc53280778 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[ 202.339820][T27281] RAX: ffffffffffffffda RBX: 0000000000002022 RCX: 00007fdb25ea1b62
[ 202.339822][T27281] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000
[ 202.339823][T27281] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
[ 202.339825][T27281] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170
[ 202.339826][T27281] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 202.339842][T27281] </TASK>
[ 202.571229][T27281] BUG: scheduling while atomic: compaction_test/27281/0x00000003
[ 202.571235][T27281] no locks held by compaction_test/27281.
[ 202.571236][T27281] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ttm ipmi_ssif drm_kms_helper syscopyarea ahci libahci sysfillrect acpi_ipmi intel_uncore mei_me joydev ipmi_si sysimgblt ioatdma libata i2c_i801 fb_sys_fops mei ipmi_devintf i2c_smbus intel_pch_thermal lpc_ich dca wmi ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 202.571302][T27281] CPU: 78 PID: 27281 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1
[ 202.571305][T27281] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 202.571307][T27281] Call Trace:
[ 202.571309][T27281] <TASK>
[202.571313][T27281] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[202.571321][T27281] __schedule_bug.cold (kernel/sched/core.c:5661)
[202.571328][T27281] schedule_debug (arch/x86/include/asm/preempt.h:35 kernel/sched/core.c:5688)
[202.571338][T27281] __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:40 kernel/sched/core.c:6324)
[202.571348][T27281] ? io_schedule_timeout (kernel/sched/core.c:6310)
[202.571352][T27281] ? vm_mmap_pgoff (include/linux/mm.h:2706 mm/util.c:557)
[202.571363][T27281] schedule (include/linux/instrumented.h:71 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:134 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1))
[202.571368][T27281] exit_to_user_mode_loop (kernel/entry/common.c:159)
[202.571374][T27281] exit_to_user_mode_prepare (kernel/entry/common.c:201)
[202.571377][T27281] syscall_exit_to_user_mode (kernel/entry/common.c:128 kernel/entry/common.c:296)
[202.571383][T27281] do_syscall_64 (arch/x86/entry/common.c:87)
[202.571387][T27281] ? __local_bh_enable (kernel/softirq.c:357)
[202.571392][T27281] ? __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:415 kernel/softirq.c:600)
[202.571400][T27281] ? irqentry_exit_to_user_mode (kernel/entry/common.c:129 kernel/entry/common.c:309)
[202.571407][T27281] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[202.571412][T27281] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115)
[ 202.571416][T27281] RIP: 0033:0x7fdb25ea1b62
[ 202.571421][T27281] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64
All code
========
0: e4 e8 in $0xe8,%al
2: b2 4b mov $0x4b,%dl
4: 01 00 add %eax,(%rax)
6: 66 90 xchg %ax,%ax
8: 41 f7 c1 ff 0f 00 00 test $0xfff,%r9d
f: 75 27 jne 0x38
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 53 push %rbx
16: 89 cb mov %ecx,%ebx
18: 48 85 ff test %rdi,%rdi
1b: 74 3b je 0x58
1d: 41 89 da mov %ebx,%r10d
20: 48 89 ef mov %rbp,%rdi
23: b8 09 00 00 00 mov $0x9,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 66 ja 0x98
32: 5b pop %rbx
33: 5d pop %rbp
34: c3 retq
35: 0f 1f 00 nopl (%rax)
38: 48 8b 05 f9 52 0c 00 mov 0xc52f9(%rip),%rax # 0xc5338
3f: 64 fs

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 66 ja 0x6e
8: 5b pop %rbx
9: 5d pop %rbp
a: c3 retq
b: 0f 1f 00 nopl (%rax)
e: 48 8b 05 f9 52 0c 00 mov 0xc52f9(%rip),%rax # 0xc530e
15: 64 fs
[ 202.571423][T27281] RSP: 002b:00007ffc53280778 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[ 202.571426][T27281] RAX: 00007fcc735a6000 RBX: 0000000000002022 RCX: 00007fdb25ea1b62
[ 202.571428][T27281] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000
[ 202.571429][T27281] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
[ 202.571431][T27281] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170
[ 202.571432][T27281] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 202.571446][T27281] </TASK>
[ 215.004337][ T1122]
[ 228.735493][ T1122]
[ 242.528575][ T1122]
[ 256.379123][ T1122]
[ 269.551898][ T569] BUG: sleeping function called from invalid context at mm/migrate.c:1380
[ 269.551906][ T569] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 569, name: kcompactd1
[ 269.551909][ T569] preempt_count: 1, expected: 0
[ 269.551912][ T569] no locks held by kcompactd1/569.
[ 269.551916][ T569] CPU: 72 PID: 569 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1
[ 269.551921][ T569] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 269.551924][ T569] Call Trace:
[ 269.551926][ T569] <TASK>
[ 269.551934][ T569] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 269.551945][ T569] __might_resched.cold (kernel/sched/core.c:9792)
[ 269.551958][ T569] migrate_pages (include/linux/sched.h:2059 mm/migrate.c:1380)
[ 269.551971][ T569] ? isolate_freepages (mm/compaction.c:1687)
[ 269.551978][ T569] ? split_map_pages (mm/compaction.c:1711)
[ 269.551994][ T569] ? buffer_migrate_page_norefs (mm/migrate.c:1345)
[ 269.552002][ T569] ? isolate_migratepages (mm/compaction.c:1959)
[ 269.552023][ T569] compact_zone (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/compaction.h:68 mm/compaction.c:2419)
[ 269.552054][ T569] ? compaction_suitable (mm/compaction.c:2292)
[ 269.552063][ T569] ? lock_acquire (kernel/locking/lockdep.c:466 kernel/locking/lockdep.c:5667 kernel/locking/lockdep.c:5630)
[ 269.552069][ T569] ? finish_wait (include/linux/list.h:134 include/linux/list.h:206 kernel/sched/wait.c:407)
[ 269.552082][ T569] proactive_compact_node (mm/compaction.c:2660 (discriminator 2))
[ 269.552089][ T569] ? compact_store (mm/compaction.c:2648)
[ 269.552115][ T569] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[ 269.552121][ T569] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 269.552134][ T569] kcompactd (mm/compaction.c:2011 mm/compaction.c:2031 mm/compaction.c:2978)
[ 269.552152][ T569] ? kcompactd_do_work (mm/compaction.c:2924)
[ 269.552161][ T569] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414)
[ 269.552174][ T569] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4))
[ 269.552178][ T569] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1))
[ 269.552183][ T569] ? kcompactd_do_work (mm/compaction.c:2924)
[ 269.552193][ T569] kthread (kernel/kthread.c:376)
[ 269.552196][ T569] ? kthread_complete_and_exit (kernel/kthread.c:331)
[ 269.552206][ T569] ret_from_fork (arch/x86/entry/entry_64.S:302)
[ 269.552235][ T569] </TASK>
[ 269.961505][ T568] BUG: scheduling while atomic: kcompactd0/568/0x00000028
[ 269.961512][ T568] no locks held by kcompactd0/568.
[ 269.961514][ T568] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ttm ipmi_ssif drm_kms_helper syscopyarea ahci libahci sysfillrect acpi_ipmi intel_uncore mei_me joydev ipmi_si sysimgblt ioatdma libata i2c_i801 fb_sys_fops mei ipmi_devintf i2c_smbus intel_pch_thermal lpc_ich dca wmi ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 269.961581][ T568] CPU: 13 PID: 568 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1
[ 269.961585][ T568] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 269.961587][ T568] Call Trace:
[ 269.961589][ T568] <TASK>
[ 269.961596][ T568] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 269.961606][ T568] __schedule_bug.cold (kernel/sched/core.c:5661)
[ 269.961615][ T568] schedule_debug (arch/x86/include/asm/preempt.h:35 kernel/sched/core.c:5688)
[ 269.961625][ T568] __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:40 kernel/sched/core.c:6324)
[ 269.961637][ T568] ? io_schedule_timeout (kernel/sched/core.c:6310)
[ 269.961641][ T568] ? find_held_lock (kernel/locking/lockdep.c:5156)
[ 269.961647][ T568] ? prepare_to_wait_event (kernel/sched/wait.c:334 (discriminator 15))
[ 269.961657][ T568] schedule (include/linux/instrumented.h:71 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:134 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1))
[ 269.961662][ T568] schedule_timeout (kernel/time/timer.c:1936)
[ 269.961668][ T568] ? usleep_range_state (kernel/time/timer.c:1897)
[ 269.961673][ T568] ? timer_migration_handler (kernel/time/timer.c:1859)
[ 269.961682][ T568] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 269.961687][ T568] ? prepare_to_wait_event (kernel/sched/wait.c:334 (discriminator 15))
[ 269.961695][ T568] kcompactd (include/linux/freezer.h:121 include/linux/freezer.h:193 mm/compaction.c:2950)
[ 269.961707][ T568] ? kcompactd_do_work (mm/compaction.c:2924)
[ 269.961713][ T568] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414)
[ 269.961720][ T568] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4))
[ 269.961724][ T568] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1))
[ 269.961727][ T568] ? kcompactd_do_work (mm/compaction.c:2924)
[ 269.961732][ T568] kthread (kernel/kthread.c:376)
[ 269.961735][ T568] ? kthread_complete_and_exit (kernel/kthread.c:331)
[ 269.961741][ T568] ret_from_fork (arch/x86/entry/entry_64.S:302)
[ 269.961758][ T568] </TASK>
[ 270.347843][ T569] BUG: scheduling while atomic: kcompactd1/569/0x00000017
[ 270.347849][ T569] no locks held by kcompactd1/569.
[ 270.347851][ T569] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ttm ipmi_ssif drm_kms_helper syscopyarea ahci libahci sysfillrect acpi_ipmi intel_uncore mei_me joydev ipmi_si sysimgblt ioatdma libata i2c_i801 fb_sys_fops mei ipmi_devintf i2c_smbus intel_pch_thermal lpc_ich dca wmi ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 270.347911][ T569] CPU: 72 PID: 569 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1
[ 270.347915][ T569] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 270.347917][ T569] Call Trace:
[ 270.347920][ T569] <TASK>
[ 270.347926][ T569] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[ 270.347935][ T569] __schedule_bug.cold (kernel/sched/core.c:5661)
[ 270.347944][ T569] schedule_debug (arch/x86/include/asm/preempt.h:35 kernel/sched/core.c:5688)
[ 270.347955][ T569] __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:40 kernel/sched/core.c:6324)
[ 270.347967][ T569] ? io_schedule_timeout (kernel/sched/core.c:6310)
[ 270.347970][ T569] ? find_held_lock (kernel/locking/lockdep.c:5156)
[ 270.347977][ T569] ? prepare_to_wait_event (kernel/sched/wait.c:334 (discriminator 15))
[ 270.347987][ T569] schedule (include/linux/instrumented.h:71 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:134 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1))
[ 270.347993][ T569] schedule_timeout (kernel/time/timer.c:1936)
[ 270.347999][ T569] ? usleep_range_state (kernel/time/timer.c:1897)
[ 270.348004][ T569] ? timer_migration_handler (kernel/time/timer.c:1859)
[ 270.348013][ T569] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 270.348018][ T569] ? prepare_to_wait_event (kernel/sched/wait.c:334 (discriminator 15))
[ 270.348025][ T569] kcompactd (include/linux/freezer.h:121 include/linux/freezer.h:193 mm/compaction.c:2950)
[ 270.348040][ T569] ? kcompactd_do_work (mm/compaction.c:2924)
[ 270.348045][ T569] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414)
[ 270.348053][ T569] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4))
[ 270.348057][ T569] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2196 (discriminator 1) kernel/sched/core.c:6502 (discriminator 1))
[ 270.348059][ T569] ? kcompactd_do_work (mm/compaction.c:2924)
[ 270.348065][ T569] kthread (kernel/kthread.c:376)
[ 270.348068][ T569] ? kthread_complete_and_exit (kernel/kthread.c:331)
[ 270.348073][ T569] ret_from_fork (arch/x86/entry/entry_64.S:302)
[ 270.348092][ T569] </TASK>
[ 270.616627][ T1122]
[ 270.768074][T27574] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
[ 270.768078][T27574] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 27574, name: date
[ 270.768080][T27574] preempt_count: 1, expected: 0
[ 270.768082][T27574] 1 lock held by date/27574:
[270.768084][T27574] #0: ffff88820bd53228 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/linux/mmap_lock.h:35 include/linux/mmap_lock.h:137 arch/x86/mm/fault.c:1338)
[ 270.768098][T27574] CPU: 4 PID: 27574 Comm: date Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1
[ 270.768101][T27574] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 270.768103][T27574] Call Trace:
[ 270.768104][T27574] <TASK>
[270.768108][T27574] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[270.768113][T27574] __might_resched.cold (kernel/sched/core.c:9792)
[270.768120][T27574] ? __pmd_alloc (mm/memory.c:5763 include/linux/mm.h:2304 include/linux/mm.h:2390 include/linux/mm.h:2426 include/asm-generic/pgalloc.h:129 mm/memory.c:5214)
[270.768125][T27574] kmem_cache_alloc (include/linux/sched/mm.h:274 mm/slab.h:723 mm/slub.c:3128 mm/slub.c:3222 mm/slub.c:3229 mm/slub.c:3239)
[270.768137][T27574] __pmd_alloc (mm/memory.c:5763 include/linux/mm.h:2304 include/linux/mm.h:2390 include/linux/mm.h:2426 include/asm-generic/pgalloc.h:129 mm/memory.c:5214)
[270.768144][T27574] __handle_mm_fault (include/linux/mm.h:2254 mm/memory.c:5003)
[270.768155][T27574] ? copy_page_range (mm/memory.c:4955)
[270.768159][T27574] ? __lock_release (kernel/locking/lockdep.c:5341)
[270.768172][T27574] ? lock_is_held_type (kernel/locking/lockdep.c:5406 kernel/locking/lockdep.c:5708)
[270.768181][T27574] ? handle_mm_fault (include/linux/rcupdate.h:274 include/linux/rcupdate.h:728 include/linux/memcontrol.h:1087 include/linux/memcontrol.h:1075 mm/memory.c:5120)
[270.768188][T27574] handle_mm_fault (mm/memory.c:5140)
[270.768195][T27574] do_user_addr_fault (arch/x86/mm/fault.c:1397)
[270.768206][T27574] exc_page_fault (arch/x86/include/asm/irqflags.h:29 arch/x86/include/asm/irqflags.h:70 arch/x86/include/asm/irqflags.h:130 arch/x86/mm/fault.c:1492 arch/x86/mm/fault.c:1540)
[270.768211][T27574] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
[270.768215][T27574] RIP: 0010:__clear_user (arch/x86/lib/usercopy_64.c:24)
[ 270.768220][T27574] Code: 00 00 00 e8 a2 28 56 ff 0f 01 cb 48 89 d8 48 c1 eb 03 48 89 ef 83 e0 07 48 89 d9 48 85 c9 74 19 66 2e 0f 1f 84 00 00 00 00 00 <48> c7 07 00 00 00 00 48 83 c7 08 ff c9 75 f1 48 89 c1 85 c9 74 0a
All code
========
0: 00 00 add %al,(%rax)
2: 00 e8 add %ch,%al
4: a2 28 56 ff 0f 01 cb movabs %al,0x8948cb010fff5628
b: 48 89
d: d8 48 c1 fmuls -0x3f(%rax)
10: eb 03 jmp 0x15
12: 48 89 ef mov %rbp,%rdi
15: 83 e0 07 and $0x7,%eax
18: 48 89 d9 mov %rbx,%rcx
1b: 48 85 c9 test %rcx,%rcx
1e: 74 19 je 0x39
20: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
27: 00 00 00
2a:* 48 c7 07 00 00 00 00 movq $0x0,(%rdi) <-- trapping instruction
31: 48 83 c7 08 add $0x8,%rdi
35: ff c9 dec %ecx
37: 75 f1 jne 0x2a
39: 48 89 c1 mov %rax,%rcx
3c: 85 c9 test %ecx,%ecx
3e: 74 0a je 0x4a

Code starting with the faulting instruction
===========================================
0: 48 c7 07 00 00 00 00 movq $0x0,(%rdi)
7: 48 83 c7 08 add $0x8,%rdi
b: ff c9 dec %ecx
d: 75 f1 jne 0x0
f: 48 89 c1 mov %rax,%rcx
12: 85 c9 test %ecx,%ecx
14: 74 0a je 0x20
[ 270.768223][T27574] RSP: 0018:ffffc900350dfb28 EFLAGS: 00050202
[ 270.768226][T27574] RAX: 0000000000000000 RBX: 00000000000001a4 RCX: 00000000000001a4
[ 270.768227][T27574] RDX: 0000000000000000 RSI: ffff88820bd53228 RDI: 00005649441d92e0
[ 270.768229][T27574] RBP: 00005649441d92e0 R08: ffff88a0589ec810 R09: ffffffff85f06fa7
[ 270.768231][T27574] R10: fffffbfff0be0df4 R11: 0000000000000001 R12: 0000000000000000
[ 270.768232][T27574] R13: 000000000001c498 R14: 00005649441d92e0 R15: 000000000001c2e0
[270.768249][T27574] ? __clear_user (arch/x86/include/asm/smap.h:39 arch/x86/lib/usercopy_64.c:23)
[270.768252][T27574] load_elf_binary (fs/binfmt_elf.c:143 fs/binfmt_elf.c:1244)
[270.768279][T27574] ? load_elf_interp+0xa80/0xa80
[270.768285][T27574] ? search_binary_handler (fs/exec.c:1728)
[270.768297][T27574] search_binary_handler (fs/exec.c:1728)
[270.768302][T27574] ? bprm_change_interp (fs/exec.c:1707)
[270.768310][T27574] ? exec_binprm (include/linux/rcupdate.h:274 include/linux/rcupdate.h:728 fs/exec.c:1761)
[270.768317][T27574] exec_binprm (fs/exec.c:1770)
[270.768325][T27574] bprm_execve (fs/exec.c:1920)
[270.768330][T27574] ? bprm_execve (fs/exec.c:1474 fs/exec.c:1806)
[270.768336][T27574] do_execveat_common+0x4c7/0x680
[270.768344][T27574] ? getname_flags (fs/namei.c:205)
[270.768350][T27574] __x64_sys_execve (fs/exec.c:2088)
[270.768356][T27574] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[270.768361][T27574] ? do_user_addr_fault (arch/x86/mm/fault.c:1422)
[270.768367][T27574] ? irqentry_exit_to_user_mode (kernel/entry/common.c:129 kernel/entry/common.c:309)
[270.768374][T27574] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[270.768379][T27574] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115)
[ 270.768384][T27574] RIP: 0033:0x7f1a7a7936c7
[ 270.768390][T27574] Code: Unable to access opcode bytes at RIP 0x7f1a7a79369d.

Code starting with the faulting instruction
===========================================
[ 270.768392][T27574] RSP: 002b:00007ffe741919f8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[ 270.768394][T27574] RAX: ffffffffffffffda RBX: 00005643b084c428 RCX: 00007f1a7a7936c7
[ 270.768396][T27574] RDX: 00005643b084ff48 RSI: 00005643b084c428 RDI: 00005643b0850208
[ 270.768397][T27574] RBP: 00005643b079246e R08: 00005643b0792470 R09: 00005643b079247b
[ 270.768398][T27574] R10: 000000000000006e R11: 0000000000000246 R12: 00005643b084ff48
[ 270.768400][T27574] R13: 0000000000000002 R14: 00005643b084ff48 R15: 00005643b0850208
[ 270.768415][T27574] </TASK>
[ 270.768815][T27574] BUG: scheduling while atomic: date/27574/0x00000002
[ 270.768818][T27574] no locks held by date/27574.
[ 270.768819][T27574] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ttm ipmi_ssif drm_kms_helper syscopyarea ahci libahci sysfillrect acpi_ipmi intel_uncore mei_me joydev ipmi_si sysimgblt ioatdma libata i2c_i801 fb_sys_fops mei ipmi_devintf i2c_smbus intel_pch_thermal lpc_ich dca wmi ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 270.768871][T27574] CPU: 4 PID: 27574 Comm: date Tainted: G S W 5.19.0-rc2-00007-g2bd8eec68f74 #1
[ 270.768874][T27574] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 270.768876][T27574] Call Trace:
[ 270.768878][T27574] <TASK>
[270.768881][T27574] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
[270.768886][T27574] __schedule_bug.cold (kernel/sched/core.c:5661)
[270.768892][T27574] schedule_debug (arch/x86/include/asm/preempt.h:35 kernel/sched/core.c:5688)
[270.768900][T27574] __schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:40 kernel/sched/core.c:6324)
[270.768907][T27574] ? rwlock_bug+0xc0/0xc0
[270.768913][T27574] ? io_schedule_timeout (kernel/sched/core.c:6310)
[270.768919][T27574] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[270.768923][T27574] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[270.768931][T27574] do_task_dead (kernel/sched/core.c:6447 (discriminator 4))
[270.768938][T27574] do_exit (include/trace/events/sched.h:333 kernel/exit.c:786)
[270.768948][T27574] do_group_exit (kernel/exit.c:906)
[270.768955][T27574] get_signal (kernel/signal.c:2857)
[270.768965][T27574] ? search_binary_handler (fs/exec.c:1707)
[270.768976][T27574] ? ptrace_signal (kernel/signal.c:2627)
[270.768980][T27574] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[270.768984][T27574] ? kasan_quarantine_put (arch/x86/include/asm/irqflags.h:45 (discriminator 1) arch/x86/include/asm/irqflags.h:80 (discriminator 1) arch/x86/include/asm/irqflags.h:138 (discriminator 1) mm/kasan/quarantine.c:242 (discriminator 1))
[270.768988][T27574] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 22))
[270.768998][T27574] arch_do_signal_or_restart (arch/x86/kernel/signal.c:869)
[270.769004][T27574] ? get_sigframe_size (arch/x86/kernel/signal.c:866)
[270.769009][T27574] ? do_execveat_common+0x1c0/0x680
[270.769022][T27574] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[270.769029][T27574] exit_to_user_mode_loop (kernel/entry/common.c:168)
[270.769035][T27574] exit_to_user_mode_prepare (kernel/entry/common.c:201)
[270.769039][T27574] syscall_exit_to_user_mode (kernel/entry/common.c:128 kernel/entry/common.c:296)
[270.769044][T27574] do_syscall_64 (arch/x86/entry/common.c:87)
[270.769050][T27574] ? do_user_addr_fault (arch/x86/mm/fault.c:1422)
[270.769057][T27574] ? irqentry_exit_to_user_mode (kernel/entry/common.c:129 kernel/entry/common.c:309)
[270.769064][T27574] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4526)
[270.769069][T27574] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115)
[ 270.769073][T27574] RIP: 0033:0x7f1a7a7936c7
[ 270.769076][T27574] Code: Unable to access opcode bytes at RIP 0x7f1a7a79369d.

Code starting with the faulting instruction
===========================================
[ 270.769077][T27574] RSP: 002b:00007ffe741919f8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[ 270.769080][T27574] RAX: fffffffffffffff2 RBX: 00005643b084c428 RCX: 00007f1a7a7936c7
[ 270.769082][T27574] RDX: 00005643b084ff48 RSI: 00005643b084c428 RDI: 00005643b0850208
[ 270.769083][T27574] RBP: 00005643b079246e R08: 00005643b0792470 R09: 00005643b079247b
[ 270.769084][T27574] R10: 000000000000006e R11: 0000000000000246 R12: 00005643b084ff48
[ 270.769086][T27574] R13: 0000000000000002 R14: 00005643b084ff48 R15: 00005643b0850208
[ 270.769100][T27574] </TASK>
[ 271.701080][ T1124] Segmentation fault
[ 271.701094][ T1124]
[ 284.402869][ T1122]


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (33.35 kB)
config-5.19.0-rc2-00007-g2bd8eec68f74 (170.94 kB)
job-script (6.17 kB)
dmesg.xz (52.53 kB)
kernel-selftests (90.20 kB)
job.yaml (5.03 kB)
reproduce (188.00 B)

2022-07-03 20:30:49

by Andrew Morton

Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:

> FYI, we noticed the following commit (built with gcc-11):
>
> commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> patch link: https://lore.kernel.org/lkml/[email protected]
>

Did this test include the followup patch
mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?


From: Mel Gorman <[email protected]>
Subject: mm/page_alloc: replace local_lock with normal spinlock -fix
Date: Mon, 27 Jun 2022 09:46:45 +0100

As noted by Yu Zhao, use pcp_spin_trylock_irqsave instead of
pcpu_spin_trylock_irqsave. This is a fix to the mm-unstable patch
mm-page_alloc-replace-local_lock-with-normal-spinlock.patch

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Reported-by: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
+++ a/mm/page_alloc.c
@@ -3497,7 +3497,7 @@ void free_unref_page(struct page *page,

zone = page_zone(page);
pcp_trylock_prepare(UP_flags);
- pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
+ pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
if (pcp) {
free_unref_page_commit(zone, pcp, page, migratetype, order);
pcp_spin_unlock_irqrestore(pcp, flags);
_

2022-07-05 14:10:50

by Oliver Sang

Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

Hi Andrew Morton,

On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:
>
> > FYI, we noticed the following commit (built with gcc-11):
> >
> > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > patch link: https://lore.kernel.org/lkml/[email protected]
> >
>
> Did this test include the followup patch
> mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?

No, we just fetched the original patch set and tested it as-is.

We have now applied the patch you pointed us to on top of 2bd8eec68f and found
that the issue still exists (dmesg attached FYI).

[ 204.416449][T27283] BUG: sleeping function called from invalid context at mm/gup.c:1170
[ 204.416455][T27283] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 27283, name: compaction_test
[ 204.416457][T27283] preempt_count: 1, expected: 0
[ 204.416460][T27283] 1 lock held by compaction_test/27283:
[ 204.416462][T27283] #0: ffff88918df83928 (&mm->mmap_lock#2){++++}-{3:3}, at: __mm_populate+0x1d0/0x300
[ 204.416477][T27283] CPU: 76 PID: 27283 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 204.416481][T27283] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 204.416483][T27283] Call Trace:
[ 204.416485][T27283] <TASK>
[ 204.416489][T27283] dump_stack_lvl+0x45/0x59
[ 204.416497][T27283] __might_resched.cold+0x15e/0x190
[ 204.416508][T27283] __get_user_pages+0x274/0x6c0
[ 204.416522][T27283] ? get_gate_page+0x640/0x640
[ 204.416538][T27283] ? rwsem_down_read_slowpath+0xb80/0xb80
[ 204.416548][T27283] populate_vma_page_range+0xd7/0x140
[ 204.416554][T27283] __mm_populate+0x178/0x300
[ 204.416560][T27283] ? faultin_vma_page_range+0x100/0x100
[ 204.416566][T27283] ? __up_write+0x13a/0x480
[ 204.416575][T27283] vm_mmap_pgoff+0x1a7/0x240
[ 204.416584][T27283] ? randomize_page+0x80/0x80
[ 204.416586][T27283] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 204.416595][T27283] ? lockdep_hardirqs_on_prepare+0x19a/0x380
[ 204.416600][T27283] ? syscall_enter_from_user_mode+0x21/0x80
[ 204.416609][T27283] do_syscall_64+0x59/0x80
[ 204.416617][T27283] ? irqentry_exit_to_user_mode+0xa/0x40
[ 204.416624][T27283] ? lockdep_hardirqs_on_prepare+0x19a/0x380
[ 204.416629][T27283] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 204.416633][T27283] RIP: 0033:0x7f10e01e2b62
[ 204.416637][T27283] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64
[ 204.416639][T27283] RSP: 002b:00007ffd771efe48 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[ 204.416642][T27283] RAX: ffffffffffffffda RBX: 0000000000002022 RCX: 00007f10e01e2b62
[ 204.416645][T27283] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000
[ 204.416646][T27283] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
[ 204.416648][T27283] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170
[ 204.416649][T27283] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 204.416666][T27283] </TASK>
[ 204.690617][T27283] BUG: scheduling while atomic: compaction_test/27283/0x00000004
[ 204.690624][T27283] no locks held by compaction_test/27283.
[ 204.690625][T27283] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 204.690688][T27283] CPU: 76 PID: 27283 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 204.690691][T27283] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 204.690694][T27283] Call Trace:
[ 204.690695][T27283] <TASK>
[ 204.690700][T27283] dump_stack_lvl+0x45/0x59
[ 204.690707][T27283] __schedule_bug.cold+0xcf/0xe0
[ 204.690714][T27283] schedule_debug+0x274/0x300
[ 204.690724][T27283] __schedule+0xf5/0x1740
[ 204.690733][T27283] ? io_schedule_timeout+0x180/0x180
[ 204.690737][T27283] ? vm_mmap_pgoff+0x1a7/0x240
[ 204.690748][T27283] schedule+0xea/0x240
[ 204.690753][T27283] exit_to_user_mode_loop+0x79/0x140
[ 204.690759][T27283] exit_to_user_mode_prepare+0xfc/0x180
[ 204.690762][T27283] syscall_exit_to_user_mode+0x19/0x80
[ 204.690768][T27283] do_syscall_64+0x69/0x80
[ 204.690773][T27283] ? __local_bh_enable+0x7a/0xc0
[ 204.690777][T27283] ? __do_softirq+0x52c/0x865
[ 204.690786][T27283] ? irqentry_exit_to_user_mode+0xa/0x40
[ 204.690792][T27283] ? lockdep_hardirqs_on_prepare+0x19a/0x380
[ 204.690798][T27283] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 204.690802][T27283] RIP: 0033:0x7f10e01e2b62
[ 204.690806][T27283] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64
[ 204.690808][T27283] RSP: 002b:00007ffd771efe48 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[ 204.690811][T27283] RAX: 00007f022d8e7000 RBX: 0000000000002022 RCX: 00007f10e01e2b62
[ 204.690813][T27283] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000
[ 204.690814][T27283] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
[ 204.690815][T27283] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170
[ 204.690817][T27283] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 204.690830][T27283] </TASK>
[ 216.734914][ T1147]
[ 230.207563][ T1147]
[ 244.124530][ T1147]
[ 257.808775][ T1147]
[ 271.803313][ T1147]
[ 272.181098][ T563] BUG: sleeping function called from invalid context at mm/migrate.c:1380
[ 272.181104][ T563] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 563, name: kcompactd0
[ 272.181107][ T563] preempt_count: 1, expected: 0
[ 272.181109][ T563] no locks held by kcompactd0/563.
[ 272.181112][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 272.181115][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 272.181117][ T563] Call Trace:
[ 272.181119][ T563] <TASK>
[ 272.181124][ T563] dump_stack_lvl+0x45/0x59
[ 272.181133][ T563] __might_resched.cold+0x15e/0x190
[ 272.181143][ T563] migrate_pages+0x2b1/0x1200
[ 272.181152][ T563] ? isolate_freepages+0x880/0x880
[ 272.181158][ T563] ? split_map_pages+0x4c0/0x4c0
[ 272.181167][ T563] ? buffer_migrate_page_norefs+0x40/0x40
[ 272.181172][ T563] ? isolate_migratepages+0x300/0x6c0
[ 272.181183][ T563] compact_zone+0xa3f/0x1640
[ 272.181200][ T563] ? compaction_suitable+0x200/0x200
[ 272.181205][ T563] ? lock_acquire+0x194/0x500
[ 272.181211][ T563] ? finish_wait+0xc5/0x280
[ 272.181220][ T563] proactive_compact_node+0xeb/0x180
[ 272.181224][ T563] ? compact_store+0xc0/0xc0
[ 272.181239][ T563] ? lockdep_hardirqs_on_prepare+0x19a/0x380
[ 272.181242][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 272.181252][ T563] kcompactd+0x500/0xc80
[ 272.181262][ T563] ? kcompactd_do_work+0x540/0x540
[ 272.181268][ T563] ? prepare_to_swait_exclusive+0x240/0x240
[ 272.181275][ T563] ? __kthread_parkme+0xd9/0x200
[ 272.181278][ T563] ? schedule+0xfe/0x240
[ 272.181282][ T563] ? kcompactd_do_work+0x540/0x540
[ 272.181288][ T563] kthread+0x28f/0x340
[ 272.181290][ T563] ? kthread_complete_and_exit+0x40/0x40
[ 272.181295][ T563] ret_from_fork+0x1f/0x30
[ 272.181313][ T563] </TASK>
[ 272.295259][ T2111] meminfo[2111]: segfault at 7ffc6e0e55e8 ip 00007fbdf6db8580 sp 00007ffc6e0e55f0 error 7 in libc-2.31.so[7fbdf6d12000+14b000]
[ 272.295314][ T2111] Code: 00 00 48 8b 15 11 29 0f 00 f7 d8 41 bd ff ff ff ff 64 89 02 66 0f 1f 44 00 00 85 ed 0f 85 80 00 00 00 44 89 e6 bf 02 00 00 00 <e8> 3b 9c fb ff 44 89 e8 5d 41 5c 41 5d c3 66 90 e8 eb 8a fb ff e8
[ 272.296053][ T2111] BUG: scheduling while atomic: meminfo/2111/0x00000002
[ 272.296056][ T2111] no locks held by meminfo/2111.
[ 272.296058][ T2111] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 272.296121][ T2111] CPU: 20 PID: 2111 Comm: meminfo Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 272.296125][ T2111] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 272.296127][ T2111] Call Trace:
[ 272.296128][ T2111] <TASK>
[ 272.296132][ T2111] dump_stack_lvl+0x45/0x59
[ 272.296141][ T2111] __schedule_bug.cold+0xcf/0xe0
[ 272.296150][ T2111] schedule_debug+0x274/0x300
[ 272.296160][ T2111] __schedule+0xf5/0x1740
[ 272.296169][ T2111] ? rwlock_bug+0xc0/0xc0
[ 272.296176][ T2111] ? io_schedule_timeout+0x180/0x180
[ 272.296181][ T2111] ? lockdep_hardirqs_on_prepare+0x19a/0x380
[ 272.296185][ T2111] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 272.296194][ T2111] do_task_dead+0xda/0x140
[ 272.296200][ T2111] do_exit+0x6a7/0xac0
[ 272.296210][ T2111] do_group_exit+0xb7/0x2c0
[ 272.296216][ T2111] get_signal+0x1b13/0x1cc0
[ 272.296226][ T2111] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 272.296230][ T2111] ? force_sig_info_to_task+0x30d/0x500
[ 272.296234][ T2111] ? ptrace_signal+0x700/0x700
[ 272.296245][ T2111] arch_do_signal_or_restart+0x77/0x300
[ 272.296252][ T2111] ? get_sigframe_size+0x40/0x40
[ 272.296257][ T2111] ? show_opcodes.cold+0x1c/0x21
[ 272.296270][ T2111] ? lockdep_hardirqs_on_prepare+0x19a/0x380
[ 272.296277][ T2111] exit_to_user_mode_loop+0xac/0x140
[ 272.296282][ T2111] exit_to_user_mode_prepare+0xfc/0x180
[ 272.296286][ T2111] irqentry_exit_to_user_mode+0x5/0x40
[ 272.296291][ T2111] asm_exc_page_fault+0x27/0x30
[ 272.296293][ T2111] RIP: 0033:0x7fbdf6db8580
[ 272.296297][ T2111] Code: Unable to access opcode bytes at RIP 0x7fbdf6db8556.
[ 272.296299][ T2111] RSP: 002b:00007ffc6e0e55f0 EFLAGS: 00010246
[ 272.296301][ T2111] RAX: 0000000000006bb3 RBX: 00007ffc6e0e56d0 RCX: 00007fbdf6db84bb
[ 272.296303][ T2111] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
[ 272.296305][ T2111] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fbdf6cea740
[ 272.296306][ T2111] R10: 00007fbdf6ceaa10 R11: 0000000000000246 R12: 0000000000000000
[ 272.296308][ T2111] R13: 0000000000006bb3 R14: 00005563332b3908 R15: 00007ffc6e0e56b0
[ 272.296323][ T2111] </TASK>
[ 272.296514][ T2150] gzip-meminfo[2150]: segfault at 7fd637199670 ip 00007fd637199670 sp 00007fffd9088698 error 14 in libc-2.31.so[7fd6370f3000+14b000]
[ 272.296560][ T2150] Code: Unable to access opcode bytes at RIP 0x7fd637199646.
[ 272.297682][ T2150] BUG: scheduling while atomic: gzip-meminfo/2150/0x00000002
[ 272.297686][ T2150] no locks held by gzip-meminfo/2150.
[ 272.297687][ T2150] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 272.297746][ T2150] CPU: 45 PID: 2150 Comm: gzip-meminfo Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 272.297749][ T2150] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 272.297751][ T2150] Call Trace:
[ 272.297752][ T2150] <TASK>
[ 272.297756][ T2150] dump_stack_lvl+0x45/0x59
[ 272.297762][ T2150] __schedule_bug.cold+0xcf/0xe0
[ 272.297768][ T2150] schedule_debug+0x274/0x300
[ 272.297775][ T2150] __schedule+0xf5/0x1740
[ 272.297783][ T2150] ? rwlock_bug+0xc0/0xc0
[ 272.297788][ T2150] ? io_schedule_timeout+0x180/0x180
[ 272.297794][ T2150] ? lockdep_hardirqs_on_prepare+0x19a/0x380
[ 272.297797][ T2150] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 272.297806][ T2150] do_task_dead+0xda/0x140
[ 272.297811][ T2150] do_exit+0x6a7/0xac0
[ 272.297819][ T2150] do_group_exit+0xb7/0x2c0
[ 272.297825][ T2150] get_signal+0x1b13/0x1cc0
[ 272.297833][ T2150] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 272.297838][ T2150] ? force_sig_info_to_task+0x30d/0x500
[ 272.297842][ T2150] ? ptrace_signal+0x700/0x700
[ 272.297854][ T2150] arch_do_signal_or_restart+0x77/0x300
[ 272.297859][ T2150] ? get_sigframe_size+0x40/0x40
[ 272.297864][ T2150] ? show_opcodes+0x97/0xc0
[ 272.297876][ T2150] ? lockdep_hardirqs_on_prepare+0x19a/0x380
[ 272.297883][ T2150] exit_to_user_mode_loop+0xac/0x140
[ 272.297887][ T2150] exit_to_user_mode_prepare+0xfc/0x180
[ 272.297890][ T2150] irqentry_exit_to_user_mode+0x5/0x40
[ 272.297894][ T2150] asm_exc_page_fault+0x27/0x30
[ 272.297897][ T2150] RIP: 0033:0x7fd637199670
[ 272.297900][ T2150] Code: Unable to access opcode bytes at RIP 0x7fd637199646.
[ 272.297901][ T2150] RSP: 002b:00007fffd9088698 EFLAGS: 00010246
[ 272.297904][ T2150] RAX: 0000000000000000 RBX: 00007fd63728e610 RCX: 0000000000000000
[ 272.297905][ T2150] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 272.297906][ T2150] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000001
[ 272.297908][ T2150] R10: fffffffffffff287 R11: 00007fd63710c660 R12: 00007fd63728e610
[ 272.297909][ T2150] R13: 0000000000000001 R14: 00007fd63728eae8 R15: 0000000000000000
[ 272.297923][ T2150] </TASK>
[ 272.340352][ T563] BUG: scheduling while atomic: kcompactd0/563/0x0000004d
[ 272.340356][ T563] no locks held by kcompactd0/563.
[ 272.340357][ T563] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 272.340433][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 272.340437][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 272.340438][ T563] Call Trace:
[ 272.340440][ T563] <TASK>
[ 272.340444][ T563] dump_stack_lvl+0x45/0x59
[ 272.340451][ T563] __schedule_bug.cold+0xcf/0xe0
[ 272.340459][ T563] schedule_debug+0x274/0x300
[ 272.340467][ T563] __schedule+0xf5/0x1740
[ 272.340477][ T563] ? io_schedule_timeout+0x180/0x180
[ 272.340481][ T563] ? find_held_lock+0x2c/0x140
[ 272.340486][ T563] ? prepare_to_wait_event+0xcd/0x6c0
[ 272.340496][ T563] schedule+0xea/0x240
[ 272.340501][ T563] schedule_timeout+0x11b/0x240
[ 272.340507][ T563] ? usleep_range_state+0x180/0x180
[ 272.340512][ T563] ? timer_migration_handler+0xc0/0xc0
[ 272.340520][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 272.340525][ T563] ? prepare_to_wait_event+0xcd/0x6c0
[ 272.340540][ T563] kcompactd+0x870/0xc80
[ 272.340554][ T563] ? kcompactd_do_work+0x540/0x540
[ 272.340560][ T563] ? prepare_to_swait_exclusive+0x240/0x240
[ 272.340567][ T563] ? __kthread_parkme+0xd9/0x200
[ 272.340571][ T563] ? schedule+0xfe/0x240
[ 272.340574][ T563] ? kcompactd_do_work+0x540/0x540
[ 272.340579][ T563] kthread+0x28f/0x340
[ 272.340582][ T563] ? kthread_complete_and_exit+0x40/0x40
[ 272.340588][ T563] ret_from_fork+0x1f/0x30
[ 272.340605][ T563] </TASK>
[ 272.799216][ T564] BUG: scheduling while atomic: kcompactd1/564/0x00000027
[ 272.799222][ T564] no locks held by kcompactd1/564.
[ 272.799224][ T564] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 272.799283][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 272.799287][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 272.799289][ T564] Call Trace:
[ 272.799292][ T564] <TASK>
[ 272.799299][ T564] dump_stack_lvl+0x45/0x59
[ 272.799309][ T564] __schedule_bug.cold+0xcf/0xe0
[ 272.799318][ T564] schedule_debug+0x274/0x300
[ 272.799329][ T564] __schedule+0xf5/0x1740
[ 272.799341][ T564] ? io_schedule_timeout+0x180/0x180
[ 272.799345][ T564] ? find_held_lock+0x2c/0x140
[ 272.799352][ T564] ? prepare_to_wait_event+0xcd/0x6c0
[ 272.799362][ T564] schedule+0xea/0x240
[ 272.799368][ T564] schedule_timeout+0x11b/0x240
[ 272.799374][ T564] ? usleep_range_state+0x180/0x180
[ 272.799379][ T564] ? timer_migration_handler+0xc0/0xc0
[ 272.799389][ T564] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 272.799394][ T564] ? prepare_to_wait_event+0xcd/0x6c0
[ 272.799402][ T564] kcompactd+0x870/0xc80
[ 272.799416][ T564] ? kcompactd_do_work+0x540/0x540
[ 272.799422][ T564] ? prepare_to_swait_exclusive+0x240/0x240
[ 272.799429][ T564] ? __kthread_parkme+0xd9/0x200
[ 272.799433][ T564] ? schedule+0xfe/0x240
[ 272.799436][ T564] ? kcompactd_do_work+0x540/0x540
[ 272.799442][ T564] kthread+0x28f/0x340
[ 272.799445][ T564] ? kthread_complete_and_exit+0x40/0x40
[ 272.799451][ T564] ret_from_fork+0x1f/0x30
[ 272.799469][ T564] </TASK>
[ 273.033327][ T563] BUG: scheduling while atomic: kcompactd0/563/0x00000003
[ 273.033331][ T563] no locks held by kcompactd0/563.
[ 273.033333][ T563] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 273.033428][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 273.033432][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 273.033434][ T563] Call Trace:
[ 273.033436][ T563] <TASK>
[ 273.033440][ T563] dump_stack_lvl+0x45/0x59
[ 273.033449][ T563] __schedule_bug.cold+0xcf/0xe0
[ 273.033457][ T563] schedule_debug+0x274/0x300
[ 273.033467][ T563] __schedule+0xf5/0x1740
[ 273.033477][ T563] ? io_schedule_timeout+0x180/0x180
[ 273.033481][ T563] ? find_held_lock+0x2c/0x140
[ 273.033487][ T563] ? prepare_to_wait_event+0xcd/0x6c0
[ 273.033498][ T563] schedule+0xea/0x240
[ 273.033503][ T563] schedule_timeout+0x11b/0x240
[ 273.033509][ T563] ? usleep_range_state+0x180/0x180
[ 273.033521][ T563] ? timer_migration_handler+0xc0/0xc0
[ 273.033530][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 273.033535][ T563] ? prepare_to_wait_event+0xcd/0x6c0
[ 273.033543][ T563] kcompactd+0x870/0xc80
[ 273.033557][ T563] ? kcompactd_do_work+0x540/0x540
[ 273.033563][ T563] ? prepare_to_swait_exclusive+0x240/0x240
[ 273.033570][ T563] ? __kthread_parkme+0xd9/0x200
[ 273.033574][ T563] ? schedule+0xfe/0x240
[ 273.033577][ T563] ? kcompactd_do_work+0x540/0x540
[ 273.033582][ T563] kthread+0x28f/0x340
[ 273.033585][ T563] ? kthread_complete_and_exit+0x40/0x40
[ 273.033590][ T563] ret_from_fork+0x1f/0x30
[ 273.033608][ T563] </TASK>
[ 273.319687][ T564] BUG: sleeping function called from invalid context at mm/migrate.c:1380
[ 273.319692][ T564] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 564, name: kcompactd1
[ 273.319694][ T564] preempt_count: 1, expected: 0
[ 273.319696][ T564] no locks held by kcompactd1/564.
[ 273.319699][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 273.319702][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 273.319704][ T564] Call Trace:
[ 273.319707][ T564] <TASK>
[ 273.319713][ T564] dump_stack_lvl+0x45/0x59
[ 273.319723][ T564] __might_resched.cold+0x15e/0x190
[ 273.319734][ T564] migrate_pages+0x2b1/0x1200
[ 273.319744][ T564] ? isolate_freepages+0x880/0x880
[ 273.319752][ T564] ? split_map_pages+0x4c0/0x4c0
[ 273.319762][ T564] ? buffer_migrate_page_norefs+0x40/0x40
[ 273.319767][ T564] ? isolate_migratepages+0x300/0x6c0
[ 273.319778][ T564] compact_zone+0xa3f/0x1640
[ 273.319795][ T564] ? compaction_suitable+0x200/0x200
[ 273.319800][ T564] ? lock_acquire+0x194/0x500
[ 273.319807][ T564] ? finish_wait+0xc5/0x280
[ 273.319816][ T564] proactive_compact_node+0xeb/0x180
[ 273.319820][ T564] ? compact_store+0xc0/0xc0
[ 273.319835][ T564] ? lockdep_hardirqs_on_prepare+0x19a/0x380
[ 273.319839][ T564] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 273.319850][ T564] kcompactd+0x500/0xc80
[ 273.319860][ T564] ? kcompactd_do_work+0x540/0x540
[ 273.319866][ T564] ? prepare_to_swait_exclusive+0x240/0x240
[ 273.319873][ T564] ? __kthread_parkme+0xd9/0x200
[ 273.319877][ T564] ? schedule+0xfe/0x240
[ 273.319882][ T564] ? kcompactd_do_work+0x540/0x540
[ 273.319888][ T564] kthread+0x28f/0x340
[ 273.319891][ T564] ? kthread_complete_and_exit+0x40/0x40
[ 273.319896][ T564] ret_from_fork+0x1f/0x30
[ 273.319914][ T564] </TASK>
[ 273.637490][ T564] BUG: scheduling while atomic: kcompactd1/564/0x00000041
[ 273.637496][ T564] no locks held by kcompactd1/564.
[ 273.637498][ T564] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
[ 273.637556][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
[ 273.637560][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[ 273.637562][ T564] Call Trace:
[ 273.637565][ T564] <TASK>
[ 273.637571][ T564] dump_stack_lvl+0x45/0x59
[ 273.637580][ T564] __schedule_bug.cold+0xcf/0xe0
[ 273.637589][ T564] schedule_debug+0x274/0x300
[ 273.637600][ T564] __schedule+0xf5/0x1740
[ 273.637612][ T564] ? io_schedule_timeout+0x180/0x180
[ 273.637616][ T564] ? find_held_lock+0x2c/0x140
[ 273.637622][ T564] ? prepare_to_wait_event+0xcd/0x6c0
[ 273.637633][ T564] schedule+0xea/0x240
[ 273.637638][ T564] schedule_timeout+0x11b/0x240
[ 273.637645][ T564] ? usleep_range_state+0x180/0x180
[ 273.637650][ T564] ? timer_migration_handler+0xc0/0xc0
[ 273.637659][ T564] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 273.637664][ T564] ? prepare_to_wait_event+0xcd/0x6c0
[ 273.637671][ T564] kcompactd+0x870/0xc80
[ 273.637687][ T564] ? kcompactd_do_work+0x540/0x540
[ 273.637692][ T564] ? prepare_to_swait_exclusive+0x240/0x240
[ 273.637700][ T564] ? __kthread_parkme+0xd9/0x200
[ 273.637704][ T564] ? schedule+0xfe/0x240
[ 273.637707][ T564] ? kcompactd_do_work+0x540/0x540
[ 273.637713][ T564] kthread+0x28f/0x340
[ 273.637716][ T564] ? kthread_complete_and_exit+0x40/0x40
[ 273.637722][ T564] ret_from_fork+0x1f/0x30
[ 273.637740][ T564] </TASK>
[ 285.377624][ T1147]



>
>
> From: Mel Gorman <[email protected]>
> Subject: mm/page_alloc: replace local_lock with normal spinlock -fix
> Date: Mon, 27 Jun 2022 09:46:45 +0100
>
> As noted by Yu Zhao, use pcp_spin_trylock_irqsave instead of
> pcpu_spin_trylock_irqsave. This is a fix to the mm-unstable patch
> mm-page_alloc-replace-local_lock-with-normal-spinlock.patch
>
> Link: https://lkml.kernel.org/r/[email protected]
> Signed-off-by: Mel Gorman <[email protected]>
> Reported-by: Yu Zhao <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
> +++ a/mm/page_alloc.c
> @@ -3497,7 +3497,7 @@ void free_unref_page(struct page *page,
>
> zone = page_zone(page);
> pcp_trylock_prepare(UP_flags);
> - pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
> + pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
> if (pcp) {
> free_unref_page_commit(zone, pcp, page, migratetype, order);
> pcp_spin_unlock_irqrestore(pcp, flags);
> _
>


Attachments:
(No filename) (28.01 kB)
dmesg.xz (53.17 kB)

2022-07-06 09:57:29

by Mel Gorman

Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> Hi Andrew Morton,
>
> On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:
> >
> > > FYI, we noticed the following commit (built with gcc-11):
> > >
> > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > patch link: https://lore.kernel.org/lkml/[email protected]
> > >
> >
> > Did this test include the followup patch
> > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
>
> no, we just fetched original patch set and test upon it.
>
> now we applied the patch you pointed to us upon 2bd8eec68f and found the issue
> still exist.
> (attached dmesg FYI)
>

Thanks Oliver.

The trace is odd in that it hits in GUP when the page allocator is no
longer active and the context is a syscall. First, is this definitely
the first patch where the problem occurs?

Second, it is possible for IRQs to be enabled, and an IRQ delivered,
before preemption is enabled. It's not clear why that would be a problem
other than the lack of symmetry, or how it could result in the reported
BUG, but we might as well rule it out. The following is build-tested only.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 934d1b5a5449..d0141e51e613 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -192,14 +192,14 @@ static DEFINE_MUTEX(pcp_batch_high_lock);

#define pcpu_spin_unlock(member, ptr) \
({ \
- spin_unlock(&ptr->member); \
pcpu_task_unpin(); \
+ spin_unlock(&ptr->member); \
})

#define pcpu_spin_unlock_irqrestore(member, ptr, flags) \
({ \
- spin_unlock_irqrestore(&ptr->member, flags); \
pcpu_task_unpin(); \
+ spin_unlock_irqrestore(&ptr->member, flags); \
})

/* struct per_cpu_pages specific helpers. */



2022-07-06 12:02:13

by Mel Gorman

Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > Hi Andrew Morton,
> >
> > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:
> > >
> > > > FYI, we noticed the following commit (built with gcc-11):
> > > >
> > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > patch link: https://lore.kernel.org/lkml/[email protected]
> > > >
> > >
> > > Did this test include the followup patch
> > > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> >
> > no, we just fetched original patch set and test upon it.
> >
> > now we applied the patch you pointed to us upon 2bd8eec68f and found the issue
> > still exist.
> > (attached dmesg FYI)
> >
>
> Thanks Oliver.
>
> The trace is odd in that it hits in GUP when the page allocator is no
> longer active and the context is a syscall. First, is this definitely
> the first patch the problem occurs?
>

I tried reproducing this on a 2-socket machine with Xeon
Gold 5218R CPUs. It was necessary to adjust the timeouts in both
vm/settings and kselftest/runner.sh to avoid the tests timing out. Testing
with a standard config on my original 5.19-rc3 baseline and the baseline
b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel
config with i915 disabled (it would not build) and the necessary storage
and network drivers enabled (for boot and access). The kernel log shows
a number of UBSAN-related warnings during boot and during some of the
tests, but otherwise compaction_test completed successfully, as did
the other VM tests.

Is this always reproducible?

--
Mel Gorman
SUSE Labs

2022-07-06 14:31:43

by Oliver Sang

[permalink] [raw]
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

hi, Mel Gorman,

On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > Hi Andrew Morton,
> >
> > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:
> > >
> > > > FYI, we noticed the following commit (built with gcc-11):
> > > >
> > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > patch link: https://lore.kernel.org/lkml/[email protected]
> > > >
> > >
> > > Did this test include the followup patch
> > > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> >
> > no, we just fetched original patch set and test upon it.
> >
> > now we applied the patch you pointed to us upon 2bd8eec68f and found the issue
> > still exist.
> > (attached dmesg FYI)
> >
>
> Thanks Oliver.
>
> The trace is odd in that it hits in GUP when the page allocator is no
> longer active and the context is a syscall. First, is this definitely
> the first patch the problem occurs?
>
> Second, it's possible for IRQs to be enabled and an IRQ delivered before
> preemption is enabled. It's not clear why that would be a problem other
> than lacking symmetry or how it could result in the reported BUG but
> might as well rule it out. This is build tested only

Do you want us to test the patch below?
If so, should we apply it on top of the patch
"mm/page_alloc: Replace local_lock with normal spinlock"
or
"mm/page_alloc: replace local_lock with normal spinlock -fix"?

>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 934d1b5a5449..d0141e51e613 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -192,14 +192,14 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
>
> #define pcpu_spin_unlock(member, ptr) \
> ({ \
> - spin_unlock(&ptr->member); \
> pcpu_task_unpin(); \
> + spin_unlock(&ptr->member); \
> })
>
> #define pcpu_spin_unlock_irqrestore(member, ptr, flags) \
> ({ \
> - spin_unlock_irqrestore(&ptr->member, flags); \
> pcpu_task_unpin(); \
> + spin_unlock_irqrestore(&ptr->member, flags); \
> })
>
> /* struct per_cpu_pages specific helpers. */
>
>
>

2022-07-06 14:33:37

by Oliver Sang

[permalink] [raw]
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

hi, Mel Gorman,

On Wed, Jul 06, 2022 at 12:53:29PM +0100, Mel Gorman wrote:
> On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> > On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > > Hi Andrew Morton,
> > >
> > > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:
> > > >
> > > > > FYI, we noticed the following commit (built with gcc-11):
> > > > >
> > > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > > patch link: https://lore.kernel.org/lkml/[email protected]
> > > > >
> > > >
> > > > Did this test include the followup patch
> > > > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> > >
> > > no, we just fetched original patch set and test upon it.
> > >
> > > now we applied the patch you pointed to us upon 2bd8eec68f and found the issue
> > > still exist.
> > > (attached dmesg FYI)
> > >
> >
> > Thanks Oliver.
> >
> > The trace is odd in that it hits in GUP when the page allocator is no
> > longer active and the context is a syscall. First, is this definitely
> > the first patch the problem occurs?
> >
>
> I tried reproducing this on a 2-socket machine with Xeon
> Gold 5218R CPUs. It was necessary to set timeouts in both
> vm/settings and kselftest/runner.sh to avoid timeouts. Testing with
> a standard config on my original 5.19-rc3 baseline and the baseline
> b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel
> config with i915 disabled (would not build) and necessary storage drivers
> and network drivers enabled (for boot and access). The kernel log shows
> a bunch of warnings related to UBSAN during boot and during some of the
> tests but otherwise compaction_test completed successfully as well as
> the other VM tests.
>
> Is this always reproducible?

Not always, but at a high rate.
We actually also observed other dmesg stats for both 2bd8eec68f74 and its
parent, but those dmesg.BUG:sleeping_function_called_from_invalid_context_at*
messages seem to happen only on 2bd8eec68f74 as well as the '-fix' commit.

=========================================================================================
compiler/group/kconfig/rootfs/sc_nr_hugepages/tbox_group/testcase/ucode:
gcc-11/vm/x86_64-rhel-8.3-kselftests/debian-11.1-x86_64-20220510.cgz/2/lkp-csl-2sp9/kernel-selftests/0x500320a

commit:
eec0ff5df294 ("mm/page_alloc: Remotely drain per-cpu lists")
2bd8eec68f74 ("mm/page_alloc: Replace local_lock with normal spinlock")
292baeb4c714 ("mm/page_alloc: replace local_lock with normal spinlock -fix")

eec0ff5df2945d19 2bd8eec68f740608db5ea58ecff 292baeb4c7149ac2cb844137481
---------------- --------------------------- ---------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | |
:20 75% 15:20 70% 14:21 dmesg.BUG:scheduling_while_atomic
:20 5% 1:20 0% :21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/binfmt_elf.c
:20 5% 1:20 10% 2:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/dcache.c
:20 5% 1:20 5% 1:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/freezer.h
:20 10% 2:20 25% 5:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/mmu_notifier.h
:20 5% 1:20 0% :21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h
:20 40% 8:20 40% 8:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
:20 10% 2:20 0% :21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
:20 10% 2:20 10% 2:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/strncpy_from_user.c
:20 55% 11:20 65% 13:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c
:20 15% 3:20 5% 1:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/memory.c
:20 60% 12:20 55% 11:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/migrate.c
:20 5% 1:20 5% 1:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c
:20 0% :20 5% 1:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/rmap.c
:20 15% 3:20 0% :21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/vmalloc.c
:20 45% 9:20 45% 9:21 dmesg.BUG:workqueue_leaked_lock_or_atomic
:20 25% 5:20 15% 3:21 dmesg.Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode=
:20 5% 1:20 0% :21 dmesg.RIP:__clear_user
20:20 0% 20:20 5% 21:21 dmesg.RIP:rcu_eqs_exit
20:20 0% 20:20 5% 21:21 dmesg.RIP:sched_clock_tick
:20 5% 1:20 0% :21 dmesg.RIP:smp_call_function_many_cond
20:20 0% 20:20 5% 21:21 dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit
20:20 0% 20:20 5% 21:21 dmesg.WARNING:at_kernel/sched/clock.c:#sched_clock_tick
:20 5% 1:20 0% :21 dmesg.WARNING:at_kernel/smp.c:#smp_call_function_many_cond
20:20 0% 20:20 5% 21:21 dmesg.WARNING:suspicious_RCU_usage
20:20 0% 20:20 5% 21:21 dmesg.boot_failures
9:20 -15% 6:20 -5% 8:21 dmesg.include/linux/rcupdate.h:#rcu_read_lock()used_illegally_while_idle
9:20 -15% 6:20 -5% 8:21 dmesg.include/linux/rcupdate.h:#rcu_read_unlock()used_illegally_while_idle
20:20 0% 20:20 5% 21:21 dmesg.include/trace/events/error_report.h:#suspicious_rcu_dereference_check()usage
20:20 0% 20:20 5% 21:21 dmesg.include/trace/events/lock.h:#suspicious_rcu_dereference_check()usage


>
> --
> Mel Gorman
> SUSE Labs

2022-07-06 15:09:19

by Mel Gorman

[permalink] [raw]
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

On Wed, Jul 06, 2022 at 10:25:30PM +0800, Oliver Sang wrote:
> hi, Mel Gorman,
>
> On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> > On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > > Hi Andrew Morton,
> > >
> > > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:
> > > >
> > > > > FYI, we noticed the following commit (built with gcc-11):
> > > > >
> > > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > > patch link: https://lore.kernel.org/lkml/[email protected]
> > > > >
> > > >
> > > > Did this test include the followup patch
> > > > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> > >
> > > no, we just fetched original patch set and test upon it.
> > >
> > > now we applied the patch you pointed to us upon 2bd8eec68f and found the issue
> > > still exist.
> > > (attached dmesg FYI)
> > >
> >
> > Thanks Oliver.
> >
> > The trace is odd in that it hits in GUP when the page allocator is no
> > longer active and the context is a syscall. First, is this definitely
> > the first patch the problem occurs?
> >
> > Second, it's possible for IRQs to be enabled and an IRQ delivered before
> > preemption is enabled. It's not clear why that would be a problem other
> > than lacking symmetry or how it could result in the reported BUG but
> > might as well rule it out. This is build tested only
>
> do you want us test below patch?
> if so, should we apply it upon the patch
> "mm/page_alloc: Replace local_lock with normal spinlock"
> or
> "mm/page_alloc: replace local_lock with normal spinlock -fix"?
>

On top of "mm/page_alloc: replace local_lock with normal spinlock -fix"
please. The -fix patch is cosmetic but it'd still be better to test on
top.

Thanks!

--
Mel Gorman
SUSE Labs

2022-07-06 15:09:36

by Mel Gorman

[permalink] [raw]
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

On Wed, Jul 06, 2022 at 10:21:36PM +0800, Oliver Sang wrote:
> > I tried reproducing this on a 2-socket machine with Xeon
> > Gold 5218R CPUs. It was necessary to set timeouts in both
> > vm/settings and kselftest/runner.sh to avoid timeouts. Testing with
> > a standard config on my original 5.19-rc3 baseline and the baseline
> > b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel
> > config with i915 disabled (would not build) and necessary storage drivers
> > and network drivers enabled (for boot and access). The kernel log shows
> > a bunch of warnings related to UBSAN during boot and during some of the
> > tests but otherwise compaction_test completed successfully as well as
> > the other VM tests.
> >
> > Is this always reproducible?
>
> not always but high rate.
> we actually also observed other dmesgs stats for both 2bd8eec68f74 and its
> parent

Ok, it's unclear what the "other dmesg stats" are, but given that they
happen for the parent: does 5.19-rc2 (your baseline) have the same messages
as 2bd8eec68f74^? Does the kselftests vm suite always pass there but
sometimes fail with 2bd8eec68f74?

> but those dmesg.BUG:sleeping_function_called_from_invalid_context_at*
> seem only happen on 2bd8eec68f74 as well as the '-fix' commit.
>

And roughly how often does it happen? I'm running it in a loop now to
see if I can trigger it locally.

--
Mel Gorman
SUSE Labs

2022-07-07 08:46:41

by Oliver Sang

[permalink] [raw]
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

Hi Mel Gorman,

On Wed, Jul 06, 2022 at 03:52:41PM +0100, Mel Gorman wrote:
> On Wed, Jul 06, 2022 at 10:21:36PM +0800, Oliver Sang wrote:
> > > I tried reproducing this on a 2-socket machine with Xeon
> > > Gold 5218R CPUs. It was necessary to set timeouts in both
> > > vm/settings and kselftest/runner.sh to avoid timeouts. Testing with
> > > a standard config on my original 5.19-rc3 baseline and the baseline
> > > b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel
> > > config with i915 disabled (would not build) and necessary storage drivers
> > > and network drivers enabled (for boot and access). The kernel log shows
> > > a bunch of warnings related to UBSAN during boot and during some of the
> > > tests but otherwise compaction_test completed successfully as well as
> > > the other VM tests.
> > >
> > > Is this always reproducible?
> >
> > not always but high rate.
> > we actually also observed other dmesgs stats for both 2bd8eec68f74 and its
> > parent
>
> Ok, it's unclear what the "other dmesg stats" are but given that it happens
> for the parent. Does 5.19-rc2 (your baseline) have the same messages as
> 2bd8eec68f74^?

Yeah, across multiple runs 5.19-rc2 has results similar to 2bd8eec68f74^,
while 2bd8eec68f74 looks quite similar to the '-fix' commit, which we
applied as 292baeb4c714.

Take the 'BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c'
we reported as an example:

v5.19-rc2 eec0ff5df2945d19039d16841b9 2bd8eec68f740608db5ea58ecff 292baeb4c7149ac2cb844137481
---------------- --------------------------- --------------------------- ---------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | | | |
:31 0% :20 55% 11:20 65% 13:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

The 'fail:runs' value means we observed the issue 'fail' times while
running 'runs' times.

For v5.19-rc2 the value is " :31", i.e. we ran the same jobs on
v5.19-rc2 31 times but never saw
"dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c".

For eec0ff5df2 (2bd8eec68f74^), the result is also clean across 20 runs.

But for both
2bd8eec68f74 ("mm/page_alloc: Replace local_lock with normal spinlock") and
292baeb4c714 ("mm/page_alloc: replace local_lock with normal spinlock -fix")
it reproduces in roughly half of the runs (11 out of 20 and 13 out of 21
runs respectively).

The full comparison of these 4 commits is at [1].

Generally, the dmesg.BUG:sleeping_function_called_from_invalid_context_at*
messages are absent on v5.19-rc2 and 2bd8eec68f74^,
but occur at a similar rate on 2bd8eec68f74 & 292baeb4c714.

But we also observed other issues, such as "dmesg.RIP:rcu_eqs_exit",
which almost always happen on all 4 commits (this is what I meant by
'other dmesg stats'; sorry for the confusion, and I will avoid this kind
of 'internal' wording in the future).

> Does the kselftests vm suite always pass but sometimes
> fails with 2bd8eec68f74?

Below are the results of the kselftests vm suite, so it's really as you
said: it sometimes fails with 2bd8eec68f74 (and also 292baeb4c714).
One example is
kernel-selftests.vm.run_vmtests.sh../userfaultfd_anon_20_16:
it always passes on v5.19-rc2 and 2bd8eec68f74^,
but fails 6 times out of 20 runs on 2bd8eec68f74, and 5 times out of
21 runs on 292baeb4c714.

But since this rate does not seem to match the issues above, we are not
sure whether they are related.


v5.19-rc2 eec0ff5df2945d19039d16841b9 2bd8eec68f740608db5ea58ecff 292baeb4c7149ac2cb844137481
---------------- --------------------------- --------------------------- ---------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | | | |
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.madv_populate.fail
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../gup_test_a.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../gup_test_ct_F_0x1_0_19_0x1000.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../gup_test_u.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../hugepage_mmap.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../hugepage_mremap_./huge/huge_mremap.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../hugepage_shm.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../hugepage_vmemmap.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../hugetlb_madvise_./huge/madvise_test.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../map_fixed_noreplace.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.run_vmtests.sh../map_hugetlb.pass
31:31 -35% 20:20 -30% 14:20 -20% 16:21 kernel-selftests.vm.run_vmtests.sh../userfaultfd_anon_20_16.pass
31:31 -35% 20:20 -30% 14:20 -20% 16:21 kernel-selftests.vm.run_vmtests.sh../userfaultfd_hugetlb_256_32.pass
29:31 -32% 19:20 -29% 13:20 -34% 12:21 kernel-selftests.vm.run_vmtests.sh../userfaultfd_shmem_20_16.pass
31:31 -35% 20:20 -35% 13:20 -25% 15:21 kernel-selftests.vm.run_vmtests.sh.fail
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.soft-dirty.pass
31:31 -35% 20:20 0% 20:20 5% 21:21 kernel-selftests.vm.split_huge_page_test.pass
>
> > but those dmesg.BUG:sleeping_function_called_from_invalid_context_at*
> > seem only happen on 2bd8eec68f74 as well as the '-fix' commit.
> >
>
> And roughly how often does it happen? I'm running it in a loop now to
> see if I can trigger it locally.

Just as above, the 'BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c'
happens at around a 50% rate on 2bd8eec68f74 & 292baeb4c714.


BTW, we will test the patch you mentioned in the other mail later and
update you with the results.


>
> --
> Mel Gorman
> SUSE Labs


[1]
v5.19-rc2 eec0ff5df2945d19039d16841b9 2bd8eec68f740608db5ea58ecff 292baeb4c7149ac2cb844137481
---------------- --------------------------- --------------------------- ---------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | | | |
:31 0% :20 75% 15:20 70% 14:21 dmesg.BUG:scheduling_while_atomic
:31 0% :20 5% 1:20 0% :21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/binfmt_elf.c
:31 0% :20 5% 1:20 10% 2:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/dcache.c
:31 0% :20 5% 1:20 5% 1:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/freezer.h
:31 0% :20 10% 2:20 25% 5:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/mmu_notifier.h
:31 0% :20 5% 1:20 0% :21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h
:31 0% :20 40% 8:20 40% 8:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
:31 0% :20 10% 2:20 0% :21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
:31 0% :20 10% 2:20 10% 2:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/strncpy_from_user.c
:31 0% :20 55% 11:20 65% 13:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c
:31 0% :20 15% 3:20 5% 1:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/memory.c
:31 0% :20 60% 12:20 55% 11:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/migrate.c
:31 0% :20 5% 1:20 5% 1:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c
:31 0% :20 0% :20 5% 1:21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/rmap.c
:31 0% :20 15% 3:20 0% :21 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/vmalloc.c
:31 0% :20 45% 9:20 45% 9:21 dmesg.BUG:workqueue_leaked_lock_or_atomic
:31 0% :20 25% 5:20 15% 3:21 dmesg.Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode=
:31 0% :20 5% 1:20 0% :21 dmesg.RIP:__clear_user
29:31 -29% 20:20 6% 20:20 11% 21:21 dmesg.RIP:rcu_eqs_exit
29:31 -29% 20:20 6% 20:20 11% 21:21 dmesg.RIP:sched_clock_tick
:31 0% :20 5% 1:20 0% :21 dmesg.RIP:smp_call_function_many_cond
29:31 -29% 20:20 6% 20:20 11% 21:21 dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit
29:31 -29% 20:20 6% 20:20 11% 21:21 dmesg.WARNING:at_kernel/sched/clock.c:#sched_clock_tick
:31 0% :20 5% 1:20 0% :21 dmesg.WARNING:at_kernel/smp.c:#smp_call_function_many_cond
29:31 -29% 20:20 6% 20:20 11% 21:21 dmesg.WARNING:suspicious_RCU_usage
29:31 -29% 20:20 6% 20:20 11% 21:21 dmesg.boot_failures
11:31 -6% 9:20 -5% 6:20 5% 8:21 dmesg.include/linux/rcupdate.h:#rcu_read_lock()used_illegally_while_idle
11:31 -6% 9:20 -5% 6:20 5% 8:21 dmesg.include/linux/rcupdate.h:#rcu_read_unlock()used_illegally_while_idle
29:31 -29% 20:20 6% 20:20 11% 21:21 dmesg.include/trace/events/error_report.h:#suspicious_rcu_dereference_check()usage
29:31 -29% 20:20 6% 20:20 11% 21:21 dmesg.include/trace/events/lock.h:#suspicious_rcu_dereference_check()usage

2022-07-07 22:09:36

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

On 7/5/22 15:51, Oliver Sang wrote:
> Hi Andrew Morton,
>
> On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
>> On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:
>>
>> > FYI, we noticed the following commit (built with gcc-11):
>> >
>> > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
>> > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
>> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
>> > patch link: https://lore.kernel.org/lkml/[email protected]
>> >
>>
>> Did this test include the followup patch
>> mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
>
> no, we just fetched original patch set and test upon it.

It appears you fetched v4, not v5. I noticed this because your report was
threaded into the v4 thread, and also from the github url: above.
In v4, pcpu_spin_trylock_irqsave() was missing an unpin, and indeed it's
missing in the github branch you were testing:

https://github.com/intel-lab-lkp/linux/commit/2bd8eec68f740608db5ea58ecff06965228764cb#diff-cef95765dfd76e5f9c9f0faebfa683edf904d0c3de71547ae8c3ea14418c1e38R187

v5 should be fine:

https://lore.kernel.org/lkml/[email protected]/

> now we applied the patch you pointed to us upon 2bd8eec68f and found the issue
> still exist.
> (attached dmesg FYI)
>
> [ 204.416449][T27283] BUG: sleeping function called from invalid context at mm/gup.c:1170
> [ 204.416455][T27283] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 27283, name: compaction_test
> [ 204.416457][T27283] preempt_count: 1, expected: 0
> [ 204.416460][T27283] 1 lock held by compaction_test/27283:
> [ 204.416462][T27283] #0: ffff88918df83928 (&mm->mmap_lock#2){++++}-{3:3}, at: __mm_populate+0x1d0/0x300
> [ 204.416477][T27283] CPU: 76 PID: 27283 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 204.416481][T27283] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 204.416483][T27283] Call Trace:
> [ 204.416485][T27283] <TASK>
> [ 204.416489][T27283] dump_stack_lvl+0x45/0x59
> [ 204.416497][T27283] __might_resched.cold+0x15e/0x190
> [ 204.416508][T27283] __get_user_pages+0x274/0x6c0
> [ 204.416522][T27283] ? get_gate_page+0x640/0x640
> [ 204.416538][T27283] ? rwsem_down_read_slowpath+0xb80/0xb80
> [ 204.416548][T27283] populate_vma_page_range+0xd7/0x140
> [ 204.416554][T27283] __mm_populate+0x178/0x300
> [ 204.416560][T27283] ? faultin_vma_page_range+0x100/0x100
> [ 204.416566][T27283] ? __up_write+0x13a/0x480
> [ 204.416575][T27283] vm_mmap_pgoff+0x1a7/0x240
> [ 204.416584][T27283] ? randomize_page+0x80/0x80
> [ 204.416586][T27283] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 204.416595][T27283] ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 204.416600][T27283] ? syscall_enter_from_user_mode+0x21/0x80
> [ 204.416609][T27283] do_syscall_64+0x59/0x80
> [ 204.416617][T27283] ? irqentry_exit_to_user_mode+0xa/0x40
> [ 204.416624][T27283] ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 204.416629][T27283] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 204.416633][T27283] RIP: 0033:0x7f10e01e2b62
> [ 204.416637][T27283] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f
> 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64
> [ 204.416639][T27283] RSP: 002b:00007ffd771efe48 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
> [ 204.416642][T27283] RAX: ffffffffffffffda RBX: 0000000000002022 RCX: 00007f10e01e2b62
> [ 204.416645][T27283] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000
> [ 204.416646][T27283] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
> [ 204.416648][T27283] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170
> [ 204.416649][T27283] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 204.416666][T27283] </TASK>
> [ 204.690617][T27283] BUG: scheduling while atomic: compaction_test/27283/0x00000004
> [ 204.690624][T27283] no locks held by compaction_test/27283.
> [ 204.690625][T27283] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common sk
> x_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_g
> eneric xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper
> ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb
> _sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 204.690688][T27283] CPU: 76 PID: 27283 Comm: compaction_test Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 204.690691][T27283] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 204.690694][T27283] Call Trace:
> [ 204.690695][T27283] <TASK>
> [ 204.690700][T27283] dump_stack_lvl+0x45/0x59
> [ 204.690707][T27283] __schedule_bug.cold+0xcf/0xe0
> [ 204.690714][T27283] schedule_debug+0x274/0x300
> [ 204.690724][T27283] __schedule+0xf5/0x1740
> [ 204.690733][T27283] ? io_schedule_timeout+0x180/0x180
> [ 204.690737][T27283] ? vm_mmap_pgoff+0x1a7/0x240
> [ 204.690748][T27283] schedule+0xea/0x240
> [ 204.690753][T27283] exit_to_user_mode_loop+0x79/0x140
> [ 204.690759][T27283] exit_to_user_mode_prepare+0xfc/0x180
> [ 204.690762][T27283] syscall_exit_to_user_mode+0x19/0x80
> [ 204.690768][T27283] do_syscall_64+0x69/0x80
> [ 204.690773][T27283] ? __local_bh_enable+0x7a/0xc0
> [ 204.690777][T27283] ? __do_softirq+0x52c/0x865
> [ 204.690786][T27283] ? irqentry_exit_to_user_mode+0xa/0x40
> [ 204.690792][T27283] ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 204.690798][T27283] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 204.690802][T27283] RIP: 0033:0x7f10e01e2b62
> [ 204.690806][T27283] Code: e4 e8 b2 4b 01 00 66 90 41 f7 c1 ff 0f 00 00 75 27 55 48 89 fd 53 89 cb 48 85 ff 74 3b 41 89 da 48 89 ef b8 09 00 00 00 0f
> 05 <48> 3d 00 f0 ff ff 77 66 5b 5d c3 0f 1f 00 48 8b 05 f9 52 0c 00 64
> [ 204.690808][T27283] RSP: 002b:00007ffd771efe48 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
> [ 204.690811][T27283] RAX: 00007f022d8e7000 RBX: 0000000000002022 RCX: 00007f10e01e2b62
> [ 204.690813][T27283] RDX: 0000000000000003 RSI: 0000000006400000 RDI: 0000000000000000
> [ 204.690814][T27283] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
> [ 204.690815][T27283] R10: 0000000000002022 R11: 0000000000000246 R12: 0000000000401170
> [ 204.690817][T27283] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 204.690830][T27283] </TASK>
> [ 216.734914][ T1147]
> [ 230.207563][ T1147]
> [ 244.124530][ T1147]
> [ 257.808775][ T1147]
> [ 271.803313][ T1147]
> [ 272.181098][ T563] BUG: sleeping function called from invalid context at mm/migrate.c:1380
> [ 272.181104][ T563] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 563, name: kcompactd0
> [ 272.181107][ T563] preempt_count: 1, expected: 0
> [ 272.181109][ T563] no locks held by kcompactd0/563.
> [ 272.181112][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 272.181115][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 272.181117][ T563] Call Trace:
> [ 272.181119][ T563] <TASK>
> [ 272.181124][ T563] dump_stack_lvl+0x45/0x59
> [ 272.181133][ T563] __might_resched.cold+0x15e/0x190
> [ 272.181143][ T563] migrate_pages+0x2b1/0x1200
> [ 272.181152][ T563] ? isolate_freepages+0x880/0x880
> [ 272.181158][ T563] ? split_map_pages+0x4c0/0x4c0
> [ 272.181167][ T563] ? buffer_migrate_page_norefs+0x40/0x40
> [ 272.181172][ T563] ? isolate_migratepages+0x300/0x6c0
> [ 272.181183][ T563] compact_zone+0xa3f/0x1640
> [ 272.181200][ T563] ? compaction_suitable+0x200/0x200
> [ 272.181205][ T563] ? lock_acquire+0x194/0x500
> [ 272.181211][ T563] ? finish_wait+0xc5/0x280
> [ 272.181220][ T563] proactive_compact_node+0xeb/0x180
> [ 272.181224][ T563] ? compact_store+0xc0/0xc0
> [ 272.181239][ T563] ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 272.181242][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 272.181252][ T563] kcompactd+0x500/0xc80
> [ 272.181262][ T563] ? kcompactd_do_work+0x540/0x540
> [ 272.181268][ T563] ? prepare_to_swait_exclusive+0x240/0x240
> [ 272.181275][ T563] ? __kthread_parkme+0xd9/0x200
> [ 272.181278][ T563] ? schedule+0xfe/0x240
> [ 272.181282][ T563] ? kcompactd_do_work+0x540/0x540
> [ 272.181288][ T563] kthread+0x28f/0x340
> [ 272.181290][ T563] ? kthread_complete_and_exit+0x40/0x40
> [ 272.181295][ T563] ret_from_fork+0x1f/0x30
> [ 272.181313][ T563] </TASK>
> [ 272.295259][ T2111] meminfo[2111]: segfault at 7ffc6e0e55e8 ip 00007fbdf6db8580 sp 00007ffc6e0e55f0 error 7 in libc-2.31.so[7fbdf6d12000+14b000]
> [ 272.295314][ T2111] Code: 00 00 48 8b 15 11 29 0f 00 f7 d8 41 bd ff ff ff ff 64 89 02 66 0f 1f 44 00 00 85 ed 0f 85 80 00 00 00 44 89 e6 bf 02 00 00 00 <e8> 3b 9c fb ff 44 89 e8 5d 41 5c 41 5d c3 66 90 e8 eb 8a fb ff e8
> [ 272.296053][ T2111] BUG: scheduling while atomic: meminfo/2111/0x00000002
> [ 272.296056][ T2111] no locks held by meminfo/2111.
> [ 272.296058][ T2111] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 272.296121][ T2111] CPU: 20 PID: 2111 Comm: meminfo Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 272.296125][ T2111] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 272.296127][ T2111] Call Trace:
> [ 272.296128][ T2111] <TASK>
> [ 272.296132][ T2111] dump_stack_lvl+0x45/0x59
> [ 272.296141][ T2111] __schedule_bug.cold+0xcf/0xe0
> [ 272.296150][ T2111] schedule_debug+0x274/0x300
> [ 272.296160][ T2111] __schedule+0xf5/0x1740
> [ 272.296169][ T2111] ? rwlock_bug+0xc0/0xc0
> [ 272.296176][ T2111] ? io_schedule_timeout+0x180/0x180
> [ 272.296181][ T2111] ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 272.296185][ T2111] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 272.296194][ T2111] do_task_dead+0xda/0x140
> [ 272.296200][ T2111] do_exit+0x6a7/0xac0
> [ 272.296210][ T2111] do_group_exit+0xb7/0x2c0
> [ 272.296216][ T2111] get_signal+0x1b13/0x1cc0
> [ 272.296226][ T2111] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 272.296230][ T2111] ? force_sig_info_to_task+0x30d/0x500
> [ 272.296234][ T2111] ? ptrace_signal+0x700/0x700
> [ 272.296245][ T2111] arch_do_signal_or_restart+0x77/0x300
> [ 272.296252][ T2111] ? get_sigframe_size+0x40/0x40
> [ 272.296257][ T2111] ? show_opcodes.cold+0x1c/0x21
> [ 272.296270][ T2111] ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 272.296277][ T2111] exit_to_user_mode_loop+0xac/0x140
> [ 272.296282][ T2111] exit_to_user_mode_prepare+0xfc/0x180
> [ 272.296286][ T2111] irqentry_exit_to_user_mode+0x5/0x40
> [ 272.296291][ T2111] asm_exc_page_fault+0x27/0x30
> [ 272.296293][ T2111] RIP: 0033:0x7fbdf6db8580
> [ 272.296297][ T2111] Code: Unable to access opcode bytes at RIP 0x7fbdf6db8556.
> [ 272.296299][ T2111] RSP: 002b:00007ffc6e0e55f0 EFLAGS: 00010246
> [ 272.296301][ T2111] RAX: 0000000000006bb3 RBX: 00007ffc6e0e56d0 RCX: 00007fbdf6db84bb
> [ 272.296303][ T2111] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
> [ 272.296305][ T2111] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fbdf6cea740
> [ 272.296306][ T2111] R10: 00007fbdf6ceaa10 R11: 0000000000000246 R12: 0000000000000000
> [ 272.296308][ T2111] R13: 0000000000006bb3 R14: 00005563332b3908 R15: 00007ffc6e0e56b0
> [ 272.296323][ T2111] </TASK>
> [ 272.296514][ T2150] gzip-meminfo[2150]: segfault at 7fd637199670 ip 00007fd637199670 sp 00007fffd9088698 error 14 in libc-2.31.so[7fd6370f3000+14b000]
> [ 272.296560][ T2150] Code: Unable to access opcode bytes at RIP 0x7fd637199646.
> [ 272.297682][ T2150] BUG: scheduling while atomic: gzip-meminfo/2150/0x00000002
> [ 272.297686][ T2150] no locks held by gzip-meminfo/2150.
> [ 272.297687][ T2150] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 272.297746][ T2150] CPU: 45 PID: 2150 Comm: gzip-meminfo Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 272.297749][ T2150] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 272.297751][ T2150] Call Trace:
> [ 272.297752][ T2150] <TASK>
> [ 272.297756][ T2150] dump_stack_lvl+0x45/0x59
> [ 272.297762][ T2150] __schedule_bug.cold+0xcf/0xe0
> [ 272.297768][ T2150] schedule_debug+0x274/0x300
> [ 272.297775][ T2150] __schedule+0xf5/0x1740
> [ 272.297783][ T2150] ? rwlock_bug+0xc0/0xc0
> [ 272.297788][ T2150] ? io_schedule_timeout+0x180/0x180
> [ 272.297794][ T2150] ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 272.297797][ T2150] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 272.297806][ T2150] do_task_dead+0xda/0x140
> [ 272.297811][ T2150] do_exit+0x6a7/0xac0
> [ 272.297819][ T2150] do_group_exit+0xb7/0x2c0
> [ 272.297825][ T2150] get_signal+0x1b13/0x1cc0
> [ 272.297833][ T2150] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 272.297838][ T2150] ? force_sig_info_to_task+0x30d/0x500
> [ 272.297842][ T2150] ? ptrace_signal+0x700/0x700
> [ 272.297854][ T2150] arch_do_signal_or_restart+0x77/0x300
> [ 272.297859][ T2150] ? get_sigframe_size+0x40/0x40
> [ 272.297864][ T2150] ? show_opcodes+0x97/0xc0
> [ 272.297876][ T2150] ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 272.297883][ T2150] exit_to_user_mode_loop+0xac/0x140
> [ 272.297887][ T2150] exit_to_user_mode_prepare+0xfc/0x180
> [ 272.297890][ T2150] irqentry_exit_to_user_mode+0x5/0x40
> [ 272.297894][ T2150] asm_exc_page_fault+0x27/0x30
> [ 272.297897][ T2150] RIP: 0033:0x7fd637199670
> [ 272.297900][ T2150] Code: Unable to access opcode bytes at RIP 0x7fd637199646.
> [ 272.297901][ T2150] RSP: 002b:00007fffd9088698 EFLAGS: 00010246
> [ 272.297904][ T2150] RAX: 0000000000000000 RBX: 00007fd63728e610 RCX: 0000000000000000
> [ 272.297905][ T2150] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 272.297906][ T2150] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000001
> [ 272.297908][ T2150] R10: fffffffffffff287 R11: 00007fd63710c660 R12: 00007fd63728e610
> [ 272.297909][ T2150] R13: 0000000000000001 R14: 00007fd63728eae8 R15: 0000000000000000
> [ 272.297923][ T2150] </TASK>
> [ 272.340352][ T563] BUG: scheduling while atomic: kcompactd0/563/0x0000004d
> [ 272.340356][ T563] no locks held by kcompactd0/563.
> [ 272.340357][ T563] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 272.340433][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 272.340437][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 272.340438][ T563] Call Trace:
> [ 272.340440][ T563] <TASK>
> [ 272.340444][ T563] dump_stack_lvl+0x45/0x59
> [ 272.340451][ T563] __schedule_bug.cold+0xcf/0xe0
> [ 272.340459][ T563] schedule_debug+0x274/0x300
> [ 272.340467][ T563] __schedule+0xf5/0x1740
> [ 272.340477][ T563] ? io_schedule_timeout+0x180/0x180
> [ 272.340481][ T563] ? find_held_lock+0x2c/0x140
> [ 272.340486][ T563] ? prepare_to_wait_event+0xcd/0x6c0
> [ 272.340496][ T563] schedule+0xea/0x240
> [ 272.340501][ T563] schedule_timeout+0x11b/0x240
> [ 272.340507][ T563] ? usleep_range_state+0x180/0x180
> [ 272.340512][ T563] ? timer_migration_handler+0xc0/0xc0
> [ 272.340520][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 272.340525][ T563] ? prepare_to_wait_event+0xcd/0x6c0
> [ 272.340540][ T563] kcompactd+0x870/0xc80
> [ 272.340554][ T563] ? kcompactd_do_work+0x540/0x540
> [ 272.340560][ T563] ? prepare_to_swait_exclusive+0x240/0x240
> [ 272.340567][ T563] ? __kthread_parkme+0xd9/0x200
> [ 272.340571][ T563] ? schedule+0xfe/0x240
> [ 272.340574][ T563] ? kcompactd_do_work+0x540/0x540
> [ 272.340579][ T563] kthread+0x28f/0x340
> [ 272.340582][ T563] ? kthread_complete_and_exit+0x40/0x40
> [ 272.340588][ T563] ret_from_fork+0x1f/0x30
> [ 272.340605][ T563] </TASK>
> [ 272.799216][ T564] BUG: scheduling while atomic: kcompactd1/564/0x00000027
> [ 272.799222][ T564] no locks held by kcompactd1/564.
> [ 272.799224][ T564] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 272.799283][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 272.799287][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 272.799289][ T564] Call Trace:
> [ 272.799292][ T564] <TASK>
> [ 272.799299][ T564] dump_stack_lvl+0x45/0x59
> [ 272.799309][ T564] __schedule_bug.cold+0xcf/0xe0
> [ 272.799318][ T564] schedule_debug+0x274/0x300
> [ 272.799329][ T564] __schedule+0xf5/0x1740
> [ 272.799341][ T564] ? io_schedule_timeout+0x180/0x180
> [ 272.799345][ T564] ? find_held_lock+0x2c/0x140
> [ 272.799352][ T564] ? prepare_to_wait_event+0xcd/0x6c0
> [ 272.799362][ T564] schedule+0xea/0x240
> [ 272.799368][ T564] schedule_timeout+0x11b/0x240
> [ 272.799374][ T564] ? usleep_range_state+0x180/0x180
> [ 272.799379][ T564] ? timer_migration_handler+0xc0/0xc0
> [ 272.799389][ T564] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 272.799394][ T564] ? prepare_to_wait_event+0xcd/0x6c0
> [ 272.799402][ T564] kcompactd+0x870/0xc80
> [ 272.799416][ T564] ? kcompactd_do_work+0x540/0x540
> [ 272.799422][ T564] ? prepare_to_swait_exclusive+0x240/0x240
> [ 272.799429][ T564] ? __kthread_parkme+0xd9/0x200
> [ 272.799433][ T564] ? schedule+0xfe/0x240
> [ 272.799436][ T564] ? kcompactd_do_work+0x540/0x540
> [ 272.799442][ T564] kthread+0x28f/0x340
> [ 272.799445][ T564] ? kthread_complete_and_exit+0x40/0x40
> [ 272.799451][ T564] ret_from_fork+0x1f/0x30
> [ 272.799469][ T564] </TASK>
> [ 273.033327][ T563] BUG: scheduling while atomic: kcompactd0/563/0x00000003
> [ 273.033331][ T563] no locks held by kcompactd0/563.
> [ 273.033333][ T563] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 273.033428][ T563] CPU: 63 PID: 563 Comm: kcompactd0 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 273.033432][ T563] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 273.033434][ T563] Call Trace:
> [ 273.033436][ T563] <TASK>
> [ 273.033440][ T563] dump_stack_lvl+0x45/0x59
> [ 273.033449][ T563] __schedule_bug.cold+0xcf/0xe0
> [ 273.033457][ T563] schedule_debug+0x274/0x300
> [ 273.033467][ T563] __schedule+0xf5/0x1740
> [ 273.033477][ T563] ? io_schedule_timeout+0x180/0x180
> [ 273.033481][ T563] ? find_held_lock+0x2c/0x140
> [ 273.033487][ T563] ? prepare_to_wait_event+0xcd/0x6c0
> [ 273.033498][ T563] schedule+0xea/0x240
> [ 273.033503][ T563] schedule_timeout+0x11b/0x240
> [ 273.033509][ T563] ? usleep_range_state+0x180/0x180
> [ 273.033521][ T563] ? timer_migration_handler+0xc0/0xc0
> [ 273.033530][ T563] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 273.033535][ T563] ? prepare_to_wait_event+0xcd/0x6c0
> [ 273.033543][ T563] kcompactd+0x870/0xc80
> [ 273.033557][ T563] ? kcompactd_do_work+0x540/0x540
> [ 273.033563][ T563] ? prepare_to_swait_exclusive+0x240/0x240
> [ 273.033570][ T563] ? __kthread_parkme+0xd9/0x200
> [ 273.033574][ T563] ? schedule+0xfe/0x240
> [ 273.033577][ T563] ? kcompactd_do_work+0x540/0x540
> [ 273.033582][ T563] kthread+0x28f/0x340
> [ 273.033585][ T563] ? kthread_complete_and_exit+0x40/0x40
> [ 273.033590][ T563] ret_from_fork+0x1f/0x30
> [ 273.033608][ T563] </TASK>
> [ 273.319687][ T564] BUG: sleeping function called from invalid context at mm/migrate.c:1380
> [ 273.319692][ T564] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 564, name: kcompactd1
> [ 273.319694][ T564] preempt_count: 1, expected: 0
> [ 273.319696][ T564] no locks held by kcompactd1/564.
> [ 273.319699][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 273.319702][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 273.319704][ T564] Call Trace:
> [ 273.319707][ T564] <TASK>
> [ 273.319713][ T564] dump_stack_lvl+0x45/0x59
> [ 273.319723][ T564] __might_resched.cold+0x15e/0x190
> [ 273.319734][ T564] migrate_pages+0x2b1/0x1200
> [ 273.319744][ T564] ? isolate_freepages+0x880/0x880
> [ 273.319752][ T564] ? split_map_pages+0x4c0/0x4c0
> [ 273.319762][ T564] ? buffer_migrate_page_norefs+0x40/0x40
> [ 273.319767][ T564] ? isolate_migratepages+0x300/0x6c0
> [ 273.319778][ T564] compact_zone+0xa3f/0x1640
> [ 273.319795][ T564] ? compaction_suitable+0x200/0x200
> [ 273.319800][ T564] ? lock_acquire+0x194/0x500
> [ 273.319807][ T564] ? finish_wait+0xc5/0x280
> [ 273.319816][ T564] proactive_compact_node+0xeb/0x180
> [ 273.319820][ T564] ? compact_store+0xc0/0xc0
> [ 273.319835][ T564] ? lockdep_hardirqs_on_prepare+0x19a/0x380
> [ 273.319839][ T564] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 273.319850][ T564] kcompactd+0x500/0xc80
> [ 273.319860][ T564] ? kcompactd_do_work+0x540/0x540
> [ 273.319866][ T564] ? prepare_to_swait_exclusive+0x240/0x240
> [ 273.319873][ T564] ? __kthread_parkme+0xd9/0x200
> [ 273.319877][ T564] ? schedule+0xfe/0x240
> [ 273.319882][ T564] ? kcompactd_do_work+0x540/0x540
> [ 273.319888][ T564] kthread+0x28f/0x340
> [ 273.319891][ T564] ? kthread_complete_and_exit+0x40/0x40
> [ 273.319896][ T564] ret_from_fork+0x1f/0x30
> [ 273.319914][ T564] </TASK>
> [ 273.637490][ T564] BUG: scheduling while atomic: kcompactd1/564/0x00000041
> [ 273.637496][ T564] no locks held by kcompactd1/564.
> [ 273.637498][ T564] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_intel sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ast drm_vram_helper drm_ttm_helper ipmi_ssif ttm drm_kms_helper ahci syscopyarea libahci sysfillrect mei_me intel_uncore acpi_ipmi i2c_i801 sysimgblt ioatdma ipmi_si mei libata joydev fb_sys_fops i2c_smbus lpc_ich intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse ip_tables
> [ 273.637556][ T564] CPU: 80 PID: 564 Comm: kcompactd1 Tainted: G S W 5.19.0-rc2-00008-g292baeb4c714 #1
> [ 273.637560][ T564] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
> [ 273.637562][ T564] Call Trace:
> [ 273.637565][ T564] <TASK>
> [ 273.637571][ T564] dump_stack_lvl+0x45/0x59
> [ 273.637580][ T564] __schedule_bug.cold+0xcf/0xe0
> [ 273.637589][ T564] schedule_debug+0x274/0x300
> [ 273.637600][ T564] __schedule+0xf5/0x1740
> [ 273.637612][ T564] ? io_schedule_timeout+0x180/0x180
> [ 273.637616][ T564] ? find_held_lock+0x2c/0x140
> [ 273.637622][ T564] ? prepare_to_wait_event+0xcd/0x6c0
> [ 273.637633][ T564] schedule+0xea/0x240
> [ 273.637638][ T564] schedule_timeout+0x11b/0x240
> [ 273.637645][ T564] ? usleep_range_state+0x180/0x180
> [ 273.637650][ T564] ? timer_migration_handler+0xc0/0xc0
> [ 273.637659][ T564] ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [ 273.637664][ T564] ? prepare_to_wait_event+0xcd/0x6c0
> [ 273.637671][ T564] kcompactd+0x870/0xc80
> [ 273.637687][ T564] ? kcompactd_do_work+0x540/0x540
> [ 273.637692][ T564] ? prepare_to_swait_exclusive+0x240/0x240
> [ 273.637700][ T564] ? __kthread_parkme+0xd9/0x200
> [ 273.637704][ T564] ? schedule+0xfe/0x240
> [ 273.637707][ T564] ? kcompactd_do_work+0x540/0x540
> [ 273.637713][ T564] kthread+0x28f/0x340
> [ 273.637716][ T564] ? kthread_complete_and_exit+0x40/0x40
> [ 273.637722][ T564] ret_from_fork+0x1f/0x30
> [ 273.637740][ T564] </TASK>
> [ 285.377624][ T1147]
>
>
>
>>
>>
>> From: Mel Gorman <[email protected]>
>> Subject: mm/page_alloc: replace local_lock with normal spinlock -fix
>> Date: Mon, 27 Jun 2022 09:46:45 +0100
>>
>> As noted by Yu Zhao, use pcp_spin_trylock_irqsave instead of
>> pcpu_spin_trylock_irqsave. This is a fix to the mm-unstable patch
>> mm-page_alloc-replace-local_lock-with-normal-spinlock.patch
>>
>> Link: https://lkml.kernel.org/r/[email protected]
>> Signed-off-by: Mel Gorman <[email protected]>
>> Reported-by: Yu Zhao <[email protected]>
>> Signed-off-by: Andrew Morton <[email protected]>
>> ---
>>
>> mm/page_alloc.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> --- a/mm/page_alloc.c~mm-page_alloc-replace-local_lock-with-normal-spinlock-fix
>> +++ a/mm/page_alloc.c
>> @@ -3497,7 +3497,7 @@ void free_unref_page(struct page *page,
>>
>> zone = page_zone(page);
>> pcp_trylock_prepare(UP_flags);
>> - pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags);
>> + pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
>> if (pcp) {
>> free_unref_page_commit(zone, pcp, page, migratetype, order);
>> pcp_spin_unlock_irqrestore(pcp, flags);
>> _
>>

2022-07-08 11:09:43

by Mel Gorman

[permalink] [raw]
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

On Thu, Jul 07, 2022 at 11:55:35PM +0200, Vlastimil Babka wrote:
> On 7/5/22 15:51, Oliver Sang wrote:
> > Hi Andrew Morton,
> >
> > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> >> On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:
> >>
> >> > FYI, we noticed the following commit (built with gcc-11):
> >> >
> >> > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> >> > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> >> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> >> > patch link: https://lore.kernel.org/lkml/[email protected]
> >> >
> >>
> >> Did this test include the followup patch
> >> mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> >
> > no, we just fetched original patch set and test upon it.
>
> It appears you fetched v4, not v5. I noticed it because your report was
> threaded into the v4 thread, and also from the github url: above.
> In v4, pcpu_spin_trylock_irqsave() was missing an unpin, and indeed it's
> missing in the github branch you were testing:
>

Thanks Vlastimil! This is my fault: I failed to verify that the code in
my tree, Andrew's tree, and what Oliver tested were the same, which is why
I could not find the missing unpin. I've gone through mm-unstable
commits be42c869b8e..4143c9b5266 and can confirm that they are now identical
to my own tree, which includes Andrew's fix for the smatch warning that
Dan reported.

# git diff HEAD^..mm-pcpspinnoirq-v6r1-mmunstable | wc -l
0

The only difference between my tree and Andrew's is that there is a head
commit for "mm/page_alloc: Do not disable IRQs for per-cpu allocations"
which has been put on hold for now.

--
Mel Gorman
SUSE Labs

2022-07-12 05:27:20

by Oliver Sang

[permalink] [raw]
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c

Hi Mel Gorman, Hi Vlastimil Babka,

thanks a lot for information!
Normally we would discard a report if we found there is a new version of a
patch set; however, for this one, we failed to fetch v5 from the mailing list.

Sorry for any inconvenience.

On Fri, Jul 08, 2022 at 11:56:03AM +0100, Mel Gorman wrote:
> On Thu, Jul 07, 2022 at 11:55:35PM +0200, Vlastimil Babka wrote:
> > On 7/5/22 15:51, Oliver Sang wrote:
> > > Hi Andrew Morton,
> > >
> > > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > >> On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot <[email protected]> wrote:
> > >>
> > >> > FYI, we noticed the following commit (built with gcc-11):
> > >> >
> > >> > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > >> > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > >> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > >> > patch link: https://lore.kernel.org/lkml/[email protected]
> > >> >
> > >>
> > >> Did this test include the followup patch
> > >> mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> > >
> > > no, we just fetched original patch set and test upon it.
> >
> > It appears you fetched v4, not v5. I noticed it because your report was
> > threaded into the v4 thread, and also from the github url: above.
> > In v4, pcpu_spin_trylock_irqsave() was missing an unpin, and indeed it's
> > missing in the github branch you were testing:
> >
>
> Thanks Vlastimil! This is my fault, I failed to verify that the code in
> my tree, Andrew's tree and what Oliver tested were the same so no wonder I
> could not find where the missing unpin was. I've gone through mm-unstable
> commits be42c869b8e..4143c9b5266 and can confirm that they are now identical
> to my own tree which includes Andrew's fix for the smatch warning that
> Dan reported.
>
> # git diff HEAD^..mm-pcpspinnoirq-v6r1-mmunstable | wc -l
> 0
>
> The only difference between my tree and Andrew's is that there is a head
> commit for "mm/page_alloc: Do not disable IRQs for per-cpu allocations"
> which has been put on hold for now.
>
> --
> Mel Gorman
> SUSE Labs