2023-07-07 20:26:28

by Jiaqi Yan

[permalink] [raw]
Subject: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
hugetlb folio is a raw HWPOISON page. This functionality relies on
RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list
becomes meaningless.

is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
with __get_huge_page_for_hwpoison, who iterates and inserts an entry to
raw_hwp_list. llist itself doesn't ensure insertion is synchornized with
the iterating used by __is_raw_hwp_list. Caller can minimize the
overhead of lock cycles by first checking if folio / head page's
HWPOISON flag is set.

Exports this functionality to be immediately used in the read operation
for hugetlbfs.

Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Jiaqi Yan <[email protected]>
---
include/linux/hugetlb.h | 19 +++++++++++++++++++
include/linux/mm.h | 7 +++++++
mm/hugetlb.c | 10 ++++++++++
mm/memory-failure.c | 34 ++++++++++++++++++++++++----------
4 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ca3c8e10f24a..4a745af98525 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1007,6 +1007,25 @@ void hugetlb_register_node(struct node *node);
void hugetlb_unregister_node(struct node *node);
#endif

+/*
+ * Struct raw_hwp_page represents information about "raw error page",
+ * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
+ */
+struct raw_hwp_page {
+ struct llist_node node;
+ struct page *page;
+};
+
+static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
+{
+ return (struct llist_head *)&folio->_hugetlb_hwpoison;
+}
+
+/*
+ * Check if a given raw @subpage in a hugepage @folio is HWPOISON.
+ */
+bool is_raw_hwp_subpage(struct folio *folio, struct page *subpage);
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 74f1be743ba2..edaa18b6f731 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3680,6 +3680,7 @@ extern const struct attribute_group memory_failure_attr_group;
extern void memory_failure_queue(unsigned long pfn, int flags);
extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
bool *migratable_cleared);
+extern bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage);
void num_poisoned_pages_inc(unsigned long pfn);
void num_poisoned_pages_sub(unsigned long pfn, long i);
struct task_struct *task_early_kill(struct task_struct *tsk, int force_early);
@@ -3694,6 +3695,12 @@ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
return 0;
}

+static inline bool __is_raw_hwp_subpage(struct folio *folio,
+ struct page *subpage)
+{
+ return false;
+}
+
static inline void num_poisoned_pages_inc(unsigned long pfn)
{
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bce28cca73a1..9c608d2f6630 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7373,6 +7373,16 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
return ret;
}

+bool is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
+{
+ bool ret;
+
+ spin_lock_irq(&hugetlb_lock);
+ ret = __is_raw_hwp_subpage(folio, subpage);
+ spin_unlock_irq(&hugetlb_lock);
+ return ret;
+}
+
void folio_putback_active_hugetlb(struct folio *folio)
{
spin_lock_irq(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a08677dcf953..5b6c8ceb13c0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1813,18 +1813,32 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
#endif /* CONFIG_FS_DAX */

#ifdef CONFIG_HUGETLB_PAGE
-/*
- * Struct raw_hwp_page represents information about "raw error page",
- * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
- */
-struct raw_hwp_page {
- struct llist_node node;
- struct page *page;
-};

-static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
+bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
{
- return (struct llist_head *)&folio->_hugetlb_hwpoison;
+ struct llist_head *raw_hwp_head;
+ struct raw_hwp_page *p, *tmp;
+ bool ret = false;
+
+ if (!folio_test_hwpoison(folio))
+ return false;
+
+ /*
+ * When RawHwpUnreliable is set, kernel lost track of which subpages
+ * are HWPOISON. So return as if ALL subpages are HWPOISONed.
+ */
+ if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+ return true;
+
+ raw_hwp_head = raw_hwp_list_head(folio);
+ llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
+ if (subpage == p->page) {
+ ret = true;
+ break;
+ }
+ }
+
+ return ret;
}

static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
--
2.41.0.255.g8b1d071c50-goog



2023-07-07 20:51:44

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a

This is incorrect naming. "subpage" was needed before we had the
folio concept, but it should not be used any more. We have folios
and pages now.

Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
hwp is already used as an acronym by acpi, intel_pstate, some clock
drivers, an ethernet driver, and a scsi driver.


2023-07-07 21:10:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

On Fri, 7 Jul 2023 21:31:39 +0100 Matthew Wilcox <[email protected]> wrote:

> On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
>
> This is incorrect naming. "subpage" was needed before we had the
> folio concept, but it should not be used any more. We have folios
> and pages now.
>
> Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> hwp is already used as an acronym by acpi, intel_pstate, some clock
> drivers, an ethernet driver, and a scsi driver.

Thanks, Matthew. I'll merge v3 so we hit next -next (which actually
won't be until Monday), and add a note that v4 is expected.


2023-07-10 00:27:22

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

On Fri, Jul 07, 2023 at 09:31:39PM +0100, Matthew Wilcox wrote:
> On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
>
> This is incorrect naming. "subpage" was needed before we had the
> folio concept, but it should not be used any more. We have folios
> and pages now.

I think we can address the raw hwpoison page by the offset in folio/hugepage
to eliminate the concept of "subpage".

>
> Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> hwp is already used as an acronym by acpi, intel_pstate, some clock
> drivers, an ethernet driver, and a scsi driver.

I originally introduced the abbreviation "hwp" to avoid using a lengthy
function name such as "folio_test_hugetlb_raw_hwpoison_unreliable()."
Therefore, I prefer using "rawhwp" instead of a longer form like
"raw_hwpoison," although I don't expect any confusion between "hwp" and
"rawhwp."
As for "hwp_walk", another user of "hwp" in in mm/memory-failure.c,
we can easily substitute it with "hwpoison_walk."

Thanks,
Naoya Horiguchi

2023-07-10 15:50:47

by Jiaqi Yan

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

On Sun, Jul 9, 2023 at 5:21 PM Naoya Horiguchi
<[email protected]> wrote:
>
> On Fri, Jul 07, 2023 at 09:31:39PM +0100, Matthew Wilcox wrote:
> > On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> >
> > This is incorrect naming. "subpage" was needed before we had the
> > folio concept, but it should not be used any more. We have folios
> > and pages now.
>

Thanks for your comment, Matthew.

> I think we can address the raw hwpoison page by the offset in folio/hugepage
> to eliminate the concept of "subpage".
>
> >
> > Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> > hwp is already used as an acronym by acpi, intel_pstate, some clock
> > drivers, an ethernet driver, and a scsi driver.
>
> I originally introduced the abbreviation "hwp" to avoid using a lengthy
> function name such as "folio_test_hugetlb_raw_hwpoison_unreliable()."
> Therefore, I prefer using "rawhwp" instead of a longer form like
> "raw_hwpoison," although I don't expect any confusion between "hwp" and
> "rawhwp."

These are names in my mind, what do you think?
* is_rawhwp_page_in_hugepage
* is_raw_hwpoison_page_in_hugepage // I prefer this one
* folio_test_hugetlb_raw_hwpoison_page

> As for "hwp_walk", another user of "hwp" in in mm/memory-failure.c,
> we can easily substitute it with "hwpoison_walk."

In this "hwp_walk" case, I also prefer "hwpoison" than "hwp". I can
create a separate renaming patch.

>
> Thanks,
> Naoya Horiguchi

2023-07-10 16:22:11

by Jiaqi Yan

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <[email protected]> wrote:
>
> On 2023/7/8 4:19, Jiaqi Yan wrote:
> > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> > hugetlb folio is a raw HWPOISON page. This functionality relies on
> > RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list
> > becomes meaningless.
> >
> > is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
> > with __get_huge_page_for_hwpoison, who iterates and inserts an entry to
> > raw_hwp_list. llist itself doesn't ensure insertion is synchornized with
> > the iterating used by __is_raw_hwp_list. Caller can minimize the
> > overhead of lock cycles by first checking if folio / head page's
> > HWPOISON flag is set.
> >
> > Exports this functionality to be immediately used in the read operation
> > for hugetlbfs.
> >
> > Reviewed-by: Mike Kravetz <[email protected]>
> > Reviewed-by: Naoya Horiguchi <[email protected]>
> > Signed-off-by: Jiaqi Yan <[email protected]>
> > ---
> > include/linux/hugetlb.h | 19 +++++++++++++++++++
> > include/linux/mm.h | 7 +++++++
> > mm/hugetlb.c | 10 ++++++++++
> > mm/memory-failure.c | 34 ++++++++++++++++++++++++----------
> > 4 files changed, 60 insertions(+), 10 deletions(-)
> > ...
> > -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > +bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
> > {
> > - return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > + struct llist_head *raw_hwp_head;
> > + struct raw_hwp_page *p, *tmp;
> > + bool ret = false;
> > +
> > + if (!folio_test_hwpoison(folio))
> > + return false;
> > +
> > + /*
> > + * When RawHwpUnreliable is set, kernel lost track of which subpages
> > + * are HWPOISON. So return as if ALL subpages are HWPOISONed.
> > + */
> > + if (folio_test_hugetlb_raw_hwp_unreliable(folio))
> > + return true;
> > +
> > + raw_hwp_head = raw_hwp_list_head(folio);
> > + llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
>
> Since we don't free the raw_hwp_list, does llist_for_each_entry works same as llist_for_each_entry_safe?
>
> > + if (subpage == p->page) {
> > + ret = true;
> > + break;
> > + }
> > + }
> > +
> > + return ret;
> > }
>
> It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> unpoison_memory __is_raw_hwp_subpage
> if (!folio_test_hwpoison(folio)) -- hwpoison is set
> folio_free_raw_hwp llist_for_each_entry_safe raw_hwp_list
> llist_del_all ..
> folio_test_clear_hwpoison
>

Thanks Miaohe for raising this concern.

> But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach here because there's a
> folio_mapping == NULL check before folio_free_raw_hwp.

I agree. But in near future I do want to make __is_raw_hwp_subpage
work for shared-mapping hugetlb, so it would be nice to work with
unpoison_memory. It doesn't seem to me that holding mf_mutex in
__is_raw_hwp_subpage is nice or even absolutely correct. Let me think
if I can come up with something in v4.

>
> Anyway, this patch looks good to me.
>
> Reviewed-by: Miaohe Lin <[email protected]>
> Thanks.
>

2023-07-11 09:18:55

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

On Mon, Jul 10, 2023 at 08:11:48AM -0700, Jiaqi Yan wrote:
> On Sun, Jul 9, 2023 at 5:21 PM Naoya Horiguchi
> <[email protected]> wrote:
> >
> > On Fri, Jul 07, 2023 at 09:31:39PM +0100, Matthew Wilcox wrote:
> > > On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> > >
> > > This is incorrect naming. "subpage" was needed before we had the
> > > folio concept, but it should not be used any more. We have folios
> > > and pages now.
> >
>
> Thanks for your comment, Matthew.
>
> > I think we can address the raw hwpoison page by the offset in folio/hugepage
> > to eliminate the concept of "subpage".
> >
> > >
> > > Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> > > hwp is already used as an acronym by acpi, intel_pstate, some clock
> > > drivers, an ethernet driver, and a scsi driver.
> >
> > I originally introduced the abbreviation "hwp" to avoid using a lengthy
> > function name such as "folio_test_hugetlb_raw_hwpoison_unreliable()."
> > Therefore, I prefer using "rawhwp" instead of a longer form like
> > "raw_hwpoison," although I don't expect any confusion between "hwp" and
> > "rawhwp."
>
> These are names in my mind, what do you think?
> * is_rawhwp_page_in_hugepage
> * is_raw_hwpoison_page_in_hugepage // I prefer this one

This one is fine to me.

> * folio_test_hugetlb_raw_hwpoison_page
>
> > As for "hwp_walk", another user of "hwp" in in mm/memory-failure.c,
> > we can easily substitute it with "hwpoison_walk."
>
> In this "hwp_walk" case, I also prefer "hwpoison" than "hwp". I can
> create a separate renaming patch.

Great, thank you.

- Naoya Horiguchi

2023-07-11 17:42:07

by Jiaqi Yan

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <[email protected]> wrote:
>
> On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <[email protected]> wrote:
> >
> > On 2023/7/8 4:19, Jiaqi Yan wrote:
> > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> > > hugetlb folio is a raw HWPOISON page. This functionality relies on
> > > RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list
> > > becomes meaningless.
> > >
> > > is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
> > > with __get_huge_page_for_hwpoison, who iterates and inserts an entry to
> > > raw_hwp_list. llist itself doesn't ensure insertion is synchornized with
> > > the iterating used by __is_raw_hwp_list. Caller can minimize the
> > > overhead of lock cycles by first checking if folio / head page's
> > > HWPOISON flag is set.
> > >
> > > Exports this functionality to be immediately used in the read operation
> > > for hugetlbfs.
> > >
> > > Reviewed-by: Mike Kravetz <[email protected]>
> > > Reviewed-by: Naoya Horiguchi <[email protected]>
> > > Signed-off-by: Jiaqi Yan <[email protected]>
> > > ---
> > > include/linux/hugetlb.h | 19 +++++++++++++++++++
> > > include/linux/mm.h | 7 +++++++
> > > mm/hugetlb.c | 10 ++++++++++
> > > mm/memory-failure.c | 34 ++++++++++++++++++++++++----------
> > > 4 files changed, 60 insertions(+), 10 deletions(-)
> > > ...
> > > -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > > +bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
> > > {
> > > - return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > > + struct llist_head *raw_hwp_head;
> > > + struct raw_hwp_page *p, *tmp;
> > > + bool ret = false;
> > > +
> > > + if (!folio_test_hwpoison(folio))
> > > + return false;
> > > +
> > > + /*
> > > + * When RawHwpUnreliable is set, kernel lost track of which subpages
> > > + * are HWPOISON. So return as if ALL subpages are HWPOISONed.
> > > + */
> > > + if (folio_test_hugetlb_raw_hwp_unreliable(folio))
> > > + return true;
> > > +
> > > + raw_hwp_head = raw_hwp_list_head(folio);
> > > + llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
> >
> > Since we don't free the raw_hwp_list, does llist_for_each_entry works same as llist_for_each_entry_safe?

Sorry I missed this comment. Yes they are the same but
llist_for_each_entry doesn't need "tmp". I will switch to
llist_for_each_entry in v4.

>
> >
> > > + if (subpage == p->page) {
> > > + ret = true;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + return ret;
> > > }
> >
> > It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> > unpoison_memory __is_raw_hwp_subpage
> > if (!folio_test_hwpoison(folio)) -- hwpoison is set
> > folio_free_raw_hwp llist_for_each_entry_safe raw_hwp_list
> > llist_del_all ..
> > folio_test_clear_hwpoison
> >
>
> Thanks Miaohe for raising this concern.
>
> > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach here because there's a
> > folio_mapping == NULL check before folio_free_raw_hwp.
>
> I agree. But in near future I do want to make __is_raw_hwp_subpage
> work for shared-mapping hugetlb, so it would be nice to work with
> unpoison_memory. It doesn't seem to me that holding mf_mutex in
> __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
> if I can come up with something in v4.

At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
before llist_for_each_entry, it will introduce a deadlock:

unpoison_memory __is_raw_hwp_subpage
held mf_mutex held hugetlb_lock
get_hwpoison_hugetlb_folio attempts mf_mutex
attempts hugetlb lock

Not for this patch series, but for future, is it a good idea to make
mf_mutex available to hugetlb code? Then enforce the order of locking
to be mf_mutex first, hugetlb_lock second? I believe this is the
current locking pattern / order for try_memory_failure_hugetlb.


>
> >
> > Anyway, this patch looks good to me.
> >
> > Reviewed-by: Miaohe Lin <[email protected]>
> > Thanks.
> >

2023-07-11 18:59:29

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

On 07/11/23 10:05, Jiaqi Yan wrote:
> On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <[email protected]> wrote:
> > On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <[email protected]> wrote:
> > > On 2023/7/8 4:19, Jiaqi Yan wrote:
> > >
> > > > + if (subpage == p->page) {
> > > > + ret = true;
> > > > + break;
> > > > + }
> > > > + }
> > > > +
> > > > + return ret;
> > > > }
> > >
> > > It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> > > unpoison_memory __is_raw_hwp_subpage
> > > if (!folio_test_hwpoison(folio)) -- hwpoison is set
> > > folio_free_raw_hwp llist_for_each_entry_safe raw_hwp_list
> > > llist_del_all ..
> > > folio_test_clear_hwpoison
> > >
> >
> > Thanks Miaohe for raising this concern.
> >
> > > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach here because there's a
> > > folio_mapping == NULL check before folio_free_raw_hwp.
> >
> > I agree. But in near future I do want to make __is_raw_hwp_subpage
> > work for shared-mapping hugetlb, so it would be nice to work with
> > unpoison_memory. It doesn't seem to me that holding mf_mutex in
> > __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
> > if I can come up with something in v4.
>
> At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
> before llist_for_each_entry, it will introduce a deadlock:
>
> unpoison_memory __is_raw_hwp_subpage
> held mf_mutex held hugetlb_lock
> get_hwpoison_hugetlb_folio attempts mf_mutex
> attempts hugetlb lock
>
> Not for this patch series, but for future, is it a good idea to make
> mf_mutex available to hugetlb code? Then enforce the order of locking
> to be mf_mutex first, hugetlb_lock second? I believe this is the
> current locking pattern / order for try_memory_failure_hugetlb.

I think only holding mf_mutex in __is_raw_hwp_subpage would be sufficient
to prevent races with unpoison_memory. memory failure code needs to take
both mf_mutex and hugetlb_lock. The hugetlb lock is to prevent hugetlb
page state changes. IIUC, __is_raw_hwp_subpage is only taking hugetlb_lock
to prevent races with memory failure code.

Of course, I could be missing something as there are subtle issues with
locking in the memory failure code.
--
Mike Kravetz

2023-07-11 23:01:51

by Jiaqi Yan

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON

On Tue, Jul 11, 2023 at 11:02 AM Mike Kravetz <[email protected]> wrote:
>
> On 07/11/23 10:05, Jiaqi Yan wrote:
> > On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <[email protected]> wrote:
> > > On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <[email protected]> wrote:
> > > > On 2023/7/8 4:19, Jiaqi Yan wrote:
> > > >
> > > > > + if (subpage == p->page) {
> > > > > + ret = true;
> > > > > + break;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + return ret;
> > > > > }
> > > >
> > > > It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> > > > unpoison_memory __is_raw_hwp_subpage
> > > > if (!folio_test_hwpoison(folio)) -- hwpoison is set
> > > > folio_free_raw_hwp llist_for_each_entry_safe raw_hwp_list
> > > > llist_del_all ..
> > > > folio_test_clear_hwpoison
> > > >
> > >
> > > Thanks Miaohe for raising this concern.
> > >
> > > > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach here because there's a
> > > > folio_mapping == NULL check before folio_free_raw_hwp.
> > >
> > > I agree. But in near future I do want to make __is_raw_hwp_subpage
> > > work for shared-mapping hugetlb, so it would be nice to work with
> > > unpoison_memory. It doesn't seem to me that holding mf_mutex in
> > > __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
> > > if I can come up with something in v4.
> >
> > At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
> > before llist_for_each_entry, it will introduce a deadlock:
> >
> > unpoison_memory __is_raw_hwp_subpage
> > held mf_mutex held hugetlb_lock
> > get_hwpoison_hugetlb_folio attempts mf_mutex
> > attempts hugetlb lock
> >
> > Not for this patch series, but for future, is it a good idea to make
> > mf_mutex available to hugetlb code? Then enforce the order of locking
> > to be mf_mutex first, hugetlb_lock second? I believe this is the
> > current locking pattern / order for try_memory_failure_hugetlb.
>
> I think only holding mf_mutex in __is_raw_hwp_subpage would be sufficient
> to prevent races with unpoison_memory. memory failure code needs to take
> both mf_mutex and hugetlb_lock. The hugetlb lock is to prevent hugetlb
> page state changes. IIUC, __is_raw_hwp_subpage is only taking hugetlb_lock
> to prevent races with memory failure code.

Thanks Mike, I think mf_mutex is a simple and correct solution. I will
drop hugetlb_lock from __is_raw_hwp_subpage in v4.

>
> Of course, I could be missing something as there are subtle issues with
> locking in the memory failure code.
> --
> Mike Kravetz