2022-04-24 15:35:18

by Miaohe Lin

[permalink] [raw]
Subject: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

There is a bug in unuse_pte(): when swap page happens to be unreadable,
page filled with random data is mapped into user address space. In case
of error, a special swap entry indicating swap read fails is set to the
page table. So the swapcache page can be freed and the user won't end up
with a permanently mounted swap because a sector is bad. And if the page
is accessed later, the user process will be killed so that corrupted data
is never consumed. On the other hand, if the page is never accessed, the
user won't even notice it.

Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
---
include/linux/swap.h | 7 ++++++-
include/linux/swapops.h | 10 ++++++++++
mm/memory.c | 5 ++++-
mm/swapfile.c | 11 +++++++++++
4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 5553189d0215..b82c196d8867 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -55,6 +55,10 @@ static inline int current_is_kswapd(void)
* actions on faults.
*/

+#define SWP_SWAPIN_ERROR_NUM 1
+#define SWP_SWAPIN_ERROR (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
+ SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \
+ SWP_PTE_MARKER_NUM)
/*
* PTE markers are used to persist information onto PTEs that are mapped with
* file-backed memories. As its name "PTE" hints, it should only be applied to
@@ -120,7 +124,8 @@ static inline int current_is_kswapd(void)

#define MAX_SWAPFILES \
((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
- SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM)
+ SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - \
+ SWP_PTE_MARKER_NUM - SWP_SWAPIN_ERROR_NUM)

/*
* Magic header for a swap area. The first part of the union is
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index a291f210e7f8..9d989ed049a6 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -108,6 +108,16 @@ static inline void *swp_to_radix_entry(swp_entry_t entry)
return xa_mk_value(entry.val);
}

+static inline swp_entry_t make_swapin_error_entry(struct page *page)
+{
+ return swp_entry(SWP_SWAPIN_ERROR, page_to_pfn(page));
+}
+
+static inline int is_swapin_error_entry(swp_entry_t entry)
+{
+ return swp_type(entry) == SWP_SWAPIN_ERROR;
+}
+
#if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset)
{
diff --git a/mm/memory.c b/mm/memory.c
index f4161fb07ffa..626f63858e0c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1488,7 +1488,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
/* Only drop the uffd-wp marker if explicitly requested */
if (!zap_drop_file_uffd_wp(details))
continue;
- } else if (is_hwpoison_entry(entry)) {
+ } else if (is_hwpoison_entry(entry) ||
+ is_swapin_error_entry(entry)) {
if (!should_zap_cows(details))
continue;
} else {
@@ -3728,6 +3729,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
} else if (is_hwpoison_entry(entry)) {
ret = VM_FAULT_HWPOISON;
+ } else if (is_swapin_error_entry(entry)) {
+ ret = VM_FAULT_SIGBUS;
} else if (is_pte_marker_entry(entry)) {
ret = handle_pte_marker(vmf);
} else {
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9398e915b36b..95b63f69f388 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1797,6 +1797,17 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
goto out;
}

+ if (unlikely(!PageUptodate(page))) {
+ pte_t pteval;
+
+ dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
+ pteval = swp_entry_to_pte(make_swapin_error_entry(page));
+ set_pte_at(vma->vm_mm, addr, pte, pteval);
+ swap_free(entry);
+ ret = 0;
+ goto out;
+ }
+
/* See do_swap_page() */
BUG_ON(!PageAnon(page) && PageMappedToDisk(page));
BUG_ON(PageAnon(page) && PageAnonExclusive(page));
--
2.23.0


Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On Mon, Apr 25, 2022 at 10:20:23AM +0800, Miaohe Lin wrote:
> On 2022/4/25 9:08, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
> >> There is a bug in unuse_pte(): when swap page happens to be unreadable,
> >> page filled with random data is mapped into user address space. In case
> >> of error, a special swap entry indicating swap read fails is set to the
> >> page table. So the swapcache page can be freed and the user won't end up
> >> with a permanently mounted swap because a sector is bad. And if the page
> >> is accessed later, the user process will be killed so that corrupted data
> >> is never consumed. On the other hand, if the page is never accessed, the
> >> user won't even notice it.
> >>
> >> Signed-off-by: Miaohe Lin <[email protected]>
> >> Acked-by: David Hildenbrand <[email protected]>
> >
> > Hi Miaohe,
> >
> > This bug sounds relatively serious to me, and it seems old, so is it worth
> > sending to -stable?
>
> This bug is really old but it's never seen yet because swapoff is supposed only to
> be done before rebooting the system. But swapoff can happen anytime. Poor guys might
> come across it and get wrong data. So I think it's worth sending to -stable.
>
> BTW: This patch should be revised in order to go to the stable version.

I sometimes have the same wonder, but I'm not sure about the rule. If you
choose to send another version, could you update subject line (subject line
is supposed to show what the patch does rather than what the problem is).

Thanks,
Naoya Horiguchi

2022-04-25 06:52:22

by Miaohe Lin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 2022/4/25 10:51, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Apr 25, 2022 at 10:20:23AM +0800, Miaohe Lin wrote:
>> On 2022/4/25 9:08, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
>>>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>>>> page filled with random data is mapped into user address space. In case
>>>> of error, a special swap entry indicating swap read fails is set to the
>>>> page table. So the swapcache page can be freed and the user won't end up
>>>> with a permanently mounted swap because a sector is bad. And if the page
>>>> is accessed later, the user process will be killed so that corrupted data
>>>> is never consumed. On the other hand, if the page is never accessed, the
>>>> user won't even notice it.
>>>>
>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>> Acked-by: David Hildenbrand <[email protected]>
>>>
>>> Hi Miaohe,
>>>
>>> This bug sounds relatively serious to me, and it seems old, so is it worth
>>> sending to -stable?
>>
>> This bug is really old but it's never seen yet because swapoff is supposed only to
>> be done before rebooting the system. But swapoff can happen anytime. Poor guys might
>> come across it and get wrong data. So I think it's worth sending to -stable.
>>
>> BTW: This patch should be revised in order to go to the stable version.
>
> I sometimes have the same wonder, but I'm not sure about the rule. If you
> choose to send another version, could you update subject line (subject line

What I mean is that SWP_PTE_MARKER is newly added and it will conflict with the stable version.
So this patch might need to be revised for specified stable version in order to fix the possible
conflict beforehand. Or that should be done when it goes to the stable ?

> is supposed to show what the patch does rather than what the problem is).

If a specified version for stable is required, I will do this.

Thanks!

>
> Thanks,
> Naoya Horiguchi
>

Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
> There is a bug in unuse_pte(): when swap page happens to be unreadable,
> page filled with random data is mapped into user address space. In case
> of error, a special swap entry indicating swap read fails is set to the
> page table. So the swapcache page can be freed and the user won't end up
> with a permanently mounted swap because a sector is bad. And if the page
> is accessed later, the user process will be killed so that corrupted data
> is never consumed. On the other hand, if the page is never accessed, the
> user won't even notice it.
>
> Signed-off-by: Miaohe Lin <[email protected]>
> Acked-by: David Hildenbrand <[email protected]>

Hi Miaohe,

This bug sounds relatively serious to me, and it seems old, so is it worth
sending to -stable?

Thanks,
Naoya Horiguchi

2022-04-25 09:22:07

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 25.04.22 09:41, [email protected] wrote:
> Hi, Miaohe,
>
> On Sun, 2022-04-24 at 17:11 +0800, Miaohe Lin wrote:
>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>> page filled with random data is mapped into user address space. In case
>> of error, a special swap entry indicating swap read fails is set to the
>> page table. So the swapcache page can be freed and the user won't end up
>> with a permanently mounted swap because a sector is bad. And if the page
>> is accessed later, the user process will be killed so that corrupted data
>> is never consumed. On the other hand, if the page is never accessed, the
>> user won't even notice it.
>>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> Acked-by: David Hildenbrand <[email protected]>
>> ---
>>  include/linux/swap.h | 7 ++++++-
>>  include/linux/swapops.h | 10 ++++++++++
>>  mm/memory.c | 5 ++++-
>>  mm/swapfile.c | 11 +++++++++++
>>  4 files changed, 31 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 5553189d0215..b82c196d8867 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -55,6 +55,10 @@ static inline int current_is_kswapd(void)
>>   * actions on faults.
>>   */
>>
>> +#define SWP_SWAPIN_ERROR_NUM 1
>> +#define SWP_SWAPIN_ERROR (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
>> + SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \
>> + SWP_PTE_MARKER_NUM)
>>
>>
>
> It appears wasteful to use another swap device number.

Do we really care?

We currently use 5 bits for swap types, so we have a total of 32.

SWP_HWPOISON_NUM -> 1
SWP_MIGRATION_NUM -> 3
SWP_PTE_MARKER_NUM -> 1
SWP_DEVICE_NUM -> 4
SWP_SWAPIN_ERROR_NUM -> 1

Which would leave us with 32 - 10 = 22 swap devices. IMHO that's plenty
for real life scenarios.

I'd prefer reworking this when we really run into trouble (and we could
think about using more bits for applicable architectures then, for
select 64bit architectures it might be fairly easily possible).

--
Thanks,

David / dhildenb

2022-04-25 09:27:01

by Miaohe Lin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 2022/4/25 9:08, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>> page filled with random data is mapped into user address space. In case
>> of error, a special swap entry indicating swap read fails is set to the
>> page table. So the swapcache page can be freed and the user won't end up
>> with a permanently mounted swap because a sector is bad. And if the page
>> is accessed later, the user process will be killed so that corrupted data
>> is never consumed. On the other hand, if the page is never accessed, the
>> user won't even notice it.
>>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> Acked-by: David Hildenbrand <[email protected]>
>
> Hi Miaohe,
>
> This bug sounds relatively serious to me, and it seems old, so is it worth
> sending to -stable?

This bug is really old but it's never seen yet because swapoff is supposed only to
be done before rebooting the system. But swapoff can happen anytime. Poor guys might
come across it and get wrong data. So I think it's worth sending to -stable.

BTW: This patch should be revised in order to go to the stable version.

Thanks!

>
> Thanks,
> Naoya Horiguchi
>

2022-04-25 09:39:08

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

Hi, Miaohe,

On Sun, 2022-04-24 at 17:11 +0800, Miaohe Lin wrote:
> There is a bug in unuse_pte(): when swap page happens to be unreadable,
> page filled with random data is mapped into user address space. In case
> of error, a special swap entry indicating swap read fails is set to the
> page table. So the swapcache page can be freed and the user won't end up
> with a permanently mounted swap because a sector is bad. And if the page
> is accessed later, the user process will be killed so that corrupted data
> is never consumed. On the other hand, if the page is never accessed, the
> user won't even notice it.
>
> Signed-off-by: Miaohe Lin <[email protected]>
> Acked-by: David Hildenbrand <[email protected]>
> ---
>  include/linux/swap.h | 7 ++++++-
>  include/linux/swapops.h | 10 ++++++++++
>  mm/memory.c | 5 ++++-
>  mm/swapfile.c | 11 +++++++++++
>  4 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 5553189d0215..b82c196d8867 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -55,6 +55,10 @@ static inline int current_is_kswapd(void)
>   * actions on faults.
>   */
>
> +#define SWP_SWAPIN_ERROR_NUM 1
> +#define SWP_SWAPIN_ERROR (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
> + SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \
> + SWP_PTE_MARKER_NUM)
>
>

It appears wasteful to use another swap device number. Is it possible
to use a special swap offset? For example, 0 or -1?

Best Regards,
Huang, Ying


[snip]


2022-04-25 09:51:15

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 25.04.22 03:08, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>> page filled with random data is mapped into user address space. In case
>> of error, a special swap entry indicating swap read fails is set to the
>> page table. So the swapcache page can be freed and the user won't end up
>> with a permanently mounted swap because a sector is bad. And if the page
>> is accessed later, the user process will be killed so that corrupted data
>> is never consumed. On the other hand, if the page is never accessed, the
>> user won't even notice it.
>>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> Acked-by: David Hildenbrand <[email protected]>
>
> Hi Miaohe,
>
> This bug sounds relatively serious to me, and it seems old, so is it worth
> sending to -stable?

I'm not sure if this is worth -stable, but no strong opinion.

The do_swap_page() part was added in 2005:

commit b81074800b98ac50b64d4c8d34e8abf0fda5e3d1
Author: Kirill Korotaev <[email protected]>
Date: Mon May 16 21:53:50 2005 -0700

[PATCH] do_swap_page() can map random data if swap read fails

There is a bug in do_swap_page(): when swap page happens to be unreadable,
page filled with random data is mapped into user address space. The fix is
to check for PageUptodate and send SIGBUS in case of error.

Signed-Off-By: Kirill Korotaev <[email protected]>
Signed-Off-By: Alexey Kuznetsov <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

So the do_swap_page() part has been fixed for quite a while already.

--
Thanks,

David / dhildenb

2022-04-25 10:56:48

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On Mon, 2022-04-25 at 09:49 +0200, David Hildenbrand wrote:
> On 25.04.22 09:41, [email protected] wrote:
> > Hi, Miaohe,
> >
> > On Sun, 2022-04-24 at 17:11 +0800, Miaohe Lin wrote:
> > > There is a bug in unuse_pte(): when swap page happens to be unreadable,
> > > page filled with random data is mapped into user address space. In case
> > > of error, a special swap entry indicating swap read fails is set to the
> > > page table. So the swapcache page can be freed and the user won't end up
> > > with a permanently mounted swap because a sector is bad. And if the page
> > > is accessed later, the user process will be killed so that corrupted data
> > > is never consumed. On the other hand, if the page is never accessed, the
> > > user won't even notice it.
> > >
> > > Signed-off-by: Miaohe Lin <[email protected]>
> > > Acked-by: David Hildenbrand <[email protected]>
> > > ---
> > >  include/linux/swap.h | 7 ++++++-
> > >  include/linux/swapops.h | 10 ++++++++++
> > >  mm/memory.c | 5 ++++-
> > >  mm/swapfile.c | 11 +++++++++++
> > >  4 files changed, 31 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > index 5553189d0215..b82c196d8867 100644
> > > --- a/include/linux/swap.h
> > > +++ b/include/linux/swap.h
> > > @@ -55,6 +55,10 @@ static inline int current_is_kswapd(void)
> > >   * actions on faults.
> > >   */
> > >
> > > +#define SWP_SWAPIN_ERROR_NUM 1
> > > +#define SWP_SWAPIN_ERROR (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
> > > + SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \
> > > + SWP_PTE_MARKER_NUM)
> > >
> > >
> >
> > It appears wasteful to use another swap device number.
>
> Do we really care?
>
> We currently use 5 bits for swap types, so we have a total of 32.
>
> SWP_HWPOISON_NUM -> 1
> SWP_MIGRATION_NUM -> 3
> SWP_PTE_MARKER_NUM -> 1
> SWP_DEVICE_NUM -> 4
> SWP_SWAPIN_ERROR_NUM -> 1
>
> Which would leave us with 32 - 10 = 22 swap devices. IMHO that's plenty
> for real life scenarios.

Creating multiple swap partitions on one disk can improve the
scalability of swap subsystem, although we usually don't have so many
disks for swap.

> I'd prefer reworking this when we really run into trouble (and we could
> think about using more bits for applicable architectures then, for
> select 64bit architectures it might be fairly easily possible).

Best Regards,
Huang, Ying


2022-04-25 13:29:43

by Miaohe Lin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 2022/4/25 16:01, David Hildenbrand wrote:
> On 25.04.22 09:55, [email protected] wrote:
>> On Mon, 2022-04-25 at 09:49 +0200, David Hildenbrand wrote:
>>> On 25.04.22 09:41, [email protected] wrote:
>>>> Hi, Miaohe,
>>>>
>>>> On Sun, 2022-04-24 at 17:11 +0800, Miaohe Lin wrote:
>>>>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>>>>> page filled with random data is mapped into user address space. In case
>>>>> of error, a special swap entry indicating swap read fails is set to the
>>>>> page table. So the swapcache page can be freed and the user won't end up
>>>>> with a permanently mounted swap because a sector is bad. And if the page
>>>>> is accessed later, the user process will be killed so that corrupted data
>>>>> is never consumed. On the other hand, if the page is never accessed, the
>>>>> user won't even notice it.
>>>>>
>>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>>> Acked-by: David Hildenbrand <[email protected]>
>>>>> ---
>>>>>  include/linux/swap.h | 7 ++++++-
>>>>>  include/linux/swapops.h | 10 ++++++++++
>>>>>  mm/memory.c | 5 ++++-
>>>>>  mm/swapfile.c | 11 +++++++++++
>>>>>  4 files changed, 31 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>>> index 5553189d0215..b82c196d8867 100644
>>>>> --- a/include/linux/swap.h
>>>>> +++ b/include/linux/swap.h
>>>>> @@ -55,6 +55,10 @@ static inline int current_is_kswapd(void)
>>>>>   * actions on faults.
>>>>>   */
>>>>>
>>>>> +#define SWP_SWAPIN_ERROR_NUM 1
>>>>> +#define SWP_SWAPIN_ERROR (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
>>>>> + SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \
>>>>> + SWP_PTE_MARKER_NUM)
>>>>>
>>>>>
>>>>
>>>> It appears wasteful to use another swap device number.
>>>
>>> Do we really care?
>>>
>>> We currently use 5 bits for swap types, so we have a total of 32.
>>>
>>> SWP_HWPOISON_NUM -> 1
>>> SWP_MIGRATION_NUM -> 3
>>> SWP_PTE_MARKER_NUM -> 1
>>> SWP_DEVICE_NUM -> 4
>>> SWP_SWAPIN_ERROR_NUM -> 1
>>>
>>> Which would leave us with 32 - 10 = 22 swap devices. IMHO that's plenty
>>> for real life scenarios.
>>
>> Creating multiple swap partitions on one disk can improve the
>> scalability of swap subsystem, although we usually don't have so many
>> disks for swap.
>
> Exactly, and IMHO if we have 22 or 23 doesn't make a real difference
> here ...

I tend to agree with David. Thanks both!

>

2022-04-25 16:57:07

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 25.04.22 09:55, [email protected] wrote:
> On Mon, 2022-04-25 at 09:49 +0200, David Hildenbrand wrote:
>> On 25.04.22 09:41, [email protected] wrote:
>>> Hi, Miaohe,
>>>
>>> On Sun, 2022-04-24 at 17:11 +0800, Miaohe Lin wrote:
>>>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>>>> page filled with random data is mapped into user address space. In case
>>>> of error, a special swap entry indicating swap read fails is set to the
>>>> page table. So the swapcache page can be freed and the user won't end up
>>>> with a permanently mounted swap because a sector is bad. And if the page
>>>> is accessed later, the user process will be killed so that corrupted data
>>>> is never consumed. On the other hand, if the page is never accessed, the
>>>> user won't even notice it.
>>>>
>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>> Acked-by: David Hildenbrand <[email protected]>
>>>> ---
>>>>  include/linux/swap.h | 7 ++++++-
>>>>  include/linux/swapops.h | 10 ++++++++++
>>>>  mm/memory.c | 5 ++++-
>>>>  mm/swapfile.c | 11 +++++++++++
>>>>  4 files changed, 31 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>> index 5553189d0215..b82c196d8867 100644
>>>> --- a/include/linux/swap.h
>>>> +++ b/include/linux/swap.h
>>>> @@ -55,6 +55,10 @@ static inline int current_is_kswapd(void)
>>>>   * actions on faults.
>>>>   */
>>>>
>>>> +#define SWP_SWAPIN_ERROR_NUM 1
>>>> +#define SWP_SWAPIN_ERROR (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
>>>> + SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \
>>>> + SWP_PTE_MARKER_NUM)
>>>>
>>>>
>>>
>>> It appears wasteful to use another swap device number.
>>
>> Do we really care?
>>
>> We currently use 5 bits for swap types, so we have a total of 32.
>>
>> SWP_HWPOISON_NUM -> 1
>> SWP_MIGRATION_NUM -> 3
>> SWP_PTE_MARKER_NUM -> 1
>> SWP_DEVICE_NUM -> 4
>> SWP_SWAPIN_ERROR_NUM -> 1
>>
>> Which would leave us with 32 - 10 = 22 swap devices. IMHO that's plenty
>> for real life scenarios.
>
> Creating multiple swap partitions on one disk can improve the
> scalability of swap subsystem, although we usually don't have so many
> disks for swap.

Exactly, and IMHO if we have 22 or 23 doesn't make a real difference
here ...

--
Thanks,

David / dhildenb

2022-04-25 17:25:51

by Miaohe Lin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 2022/4/25 15:45, David Hildenbrand wrote:
> On 25.04.22 03:08, HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
>>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>>> page filled with random data is mapped into user address space. In case
>>> of error, a special swap entry indicating swap read fails is set to the
>>> page table. So the swapcache page can be freed and the user won't end up
>>> with a permanently mounted swap because a sector is bad. And if the page
>>> is accessed later, the user process will be killed so that corrupted data
>>> is never consumed. On the other hand, if the page is never accessed, the
>>> user won't even notice it.
>>>
>>> Signed-off-by: Miaohe Lin <[email protected]>
>>> Acked-by: David Hildenbrand <[email protected]>
>>
>> Hi Miaohe,
>>
>> This bug sounds relatively serious to me, and it seems old, so is it worth
>> sending to -stable?
>
> I'm not sure if this is worth -stable, but no strong opinion.

I have no strong opinion too. I'm just afraid someone might run into it. But swapoff is
expected to be a rare operation anyway...

>
> The do_swap_page() part was added in 2005:
>
> commit b81074800b98ac50b64d4c8d34e8abf0fda5e3d1
> Author: Kirill Korotaev <[email protected]>
> Date: Mon May 16 21:53:50 2005 -0700
>
> [PATCH] do_swap_page() can map random data if swap read fails
>
> There is a bug in do_swap_page(): when swap page happens to be unreadable,
> page filled with random data is mapped into user address space. The fix is
> to check for PageUptodate and send SIGBUS in case of error.
>
> Signed-Off-By: Kirill Korotaev <[email protected]>
> Signed-Off-By: Alexey Kuznetsov <[email protected]>
> Acked-by: Hugh Dickins <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>
> So the do_swap_page() part has been fixed for quite a while already.

Does this mean only do_swap_page maps random data if swap read fails is observed from that time on?
So this might not be worth -stable as it's never seen more than a decade?

Thanks!

>

Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On Mon, Apr 25, 2022 at 04:47:41PM +0800, Miaohe Lin wrote:
> On 2022/4/25 15:45, David Hildenbrand wrote:
> > On 25.04.22 03:08, HORIGUCHI NAOYA(堀口 直也) wrote:
> >> On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
> >>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
> >>> page filled with random data is mapped into user address space. In case
> >>> of error, a special swap entry indicating swap read fails is set to the
> >>> page table. So the swapcache page can be freed and the user won't end up
> >>> with a permanently mounted swap because a sector is bad. And if the page
> >>> is accessed later, the user process will be killed so that corrupted data
> >>> is never consumed. On the other hand, if the page is never accessed, the
> >>> user won't even notice it.
> >>>
> >>> Signed-off-by: Miaohe Lin <[email protected]>
> >>> Acked-by: David Hildenbrand <[email protected]>
> >>
> >> Hi Miaohe,
> >>
> >> This bug sounds relatively serious to me, and it seems old, so is it worth
> >> sending to -stable?
> >
> > I'm not sure if this is worth -stable, but no strong opinion.
>
> I have no strong opinion too. I'm just afraid someone might run into it. But swapoff is
> expected to be a rare operation anyway...
>
> >
> > The do_swap_page() part was added in 2005:
> >
> > commit b81074800b98ac50b64d4c8d34e8abf0fda5e3d1
> > Author: Kirill Korotaev <[email protected]>
> > Date: Mon May 16 21:53:50 2005 -0700
> >
> > [PATCH] do_swap_page() can map random data if swap read fails
> >
> > There is a bug in do_swap_page(): when swap page happens to be unreadable,
> > page filled with random data is mapped into user address space. The fix is
> > to check for PageUptodate and send SIGBUS in case of error.
> >
> > Signed-Off-By: Kirill Korotaev <[email protected]>
> > Signed-Off-By: Alexey Kuznetsov <[email protected]>
> > Acked-by: Hugh Dickins <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
> > Signed-off-by: Linus Torvalds <[email protected]>
> >
> > So the do_swap_page() part has been fixed for quite a while already.
>
> Does this mean only do_swap_page maps random data if swap read fails is observed from that time on?
> So this might not be worth -stable as it's never seen more than a decade?

OK, both choices seems possible, so not sending to -stable is fine to me.
It's finally up to you.

Thanks,
Naoya Horiguchi

2022-04-26 08:49:40

by Miaohe Lin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 2022/4/26 8:31, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Apr 25, 2022 at 04:47:41PM +0800, Miaohe Lin wrote:
>> On 2022/4/25 15:45, David Hildenbrand wrote:
>>> On 25.04.22 03:08, HORIGUCHI NAOYA(堀口 直也) wrote:
>>>> On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
>>>>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>>>>> page filled with random data is mapped into user address space. In case
>>>>> of error, a special swap entry indicating swap read fails is set to the
>>>>> page table. So the swapcache page can be freed and the user won't end up
>>>>> with a permanently mounted swap because a sector is bad. And if the page
>>>>> is accessed later, the user process will be killed so that corrupted data
>>>>> is never consumed. On the other hand, if the page is never accessed, the
>>>>> user won't even notice it.
>>>>>
>>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>>> Acked-by: David Hildenbrand <[email protected]>
>>>>
>>>> Hi Miaohe,
>>>>
>>>> This bug sounds relatively serious to me, and it seems old, so is it worth
>>>> sending to -stable?
>>>
>>> I'm not sure if this is worth -stable, but no strong opinion.
>>
>> I have no strong opinion too. I'm just afraid someone might run into it. But swapoff is
>> expected to be a rare operation anyway...
>>
>>>
>>> The do_swap_page() part was added in 2005:
>>>
>>> commit b81074800b98ac50b64d4c8d34e8abf0fda5e3d1
>>> Author: Kirill Korotaev <[email protected]>
>>> Date: Mon May 16 21:53:50 2005 -0700
>>>
>>> [PATCH] do_swap_page() can map random data if swap read fails
>>>
>>> There is a bug in do_swap_page(): when swap page happens to be unreadable,
>>> page filled with random data is mapped into user address space. The fix is
>>> to check for PageUptodate and send SIGBUS in case of error.
>>>
>>> Signed-Off-By: Kirill Korotaev <[email protected]>
>>> Signed-Off-By: Alexey Kuznetsov <[email protected]>
>>> Acked-by: Hugh Dickins <[email protected]>
>>> Signed-off-by: Andrew Morton <[email protected]>
>>> Signed-off-by: Linus Torvalds <[email protected]>
>>>
>>> So the do_swap_page() part has been fixed for quite a while already.
>>
>> Does this mean only do_swap_page maps random data if swap read fails is observed from that time on?
>> So this might not be worth -stable as it's never seen more than a decade?
>
> OK, both choices seems possible, so not sending to -stable is fine to me.
> It's finally up to you.

I tend not to send it to -stable due to the above concern now.

Thanks!

>
> Thanks,
> Naoya Horiguchi
>

2022-05-10 11:07:57

by Miaohe Lin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 2022/5/10 14:17, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>> page filled with random data is mapped into user address space. In case
>> of error, a special swap entry indicating swap read fails is set to the
>> page table. So the swapcache page can be freed and the user won't end up
>> with a permanently mounted swap because a sector is bad. And if the page
>> is accessed later, the user process will be killed so that corrupted data
>> is never consumed. On the other hand, if the page is never accessed, the
>> user won't even notice it.
>>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> Acked-by: David Hildenbrand <[email protected]>
>
> When I reproduced the issue (generated read error with dm-dust), I saw
> infinite loop in the while loop in shmem_unuse_inode() (and this happens
> even with this patch). I confirmed that shmem_swapin_page() returns -EIO,
> but shmem_unuse_swap_entries() does not return the error to the callers,
> so the while loop in shmem_unuse_inode() seems not break.
>

Many thanks for your report! I didn't test the shmem case because I saw -EIO
is returned. So I just focus on the normal page case. Sorry about it. :(

> So maybe you need more code around shmem_unuse_inode() to handle the error?

I will try to reproduce it and come up a fixup patch asap! And if you like, you
can kindly solve this issue too. ;)

Thanks a lot!

>
> Thanks,
> Naoya Horiguchi
>


2022-05-10 15:10:50

by Miaohe Lin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 2022/5/10 14:17, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
>> page filled with random data is mapped into user address space. In case
>> of error, a special swap entry indicating swap read fails is set to the
>> page table. So the swapcache page can be freed and the user won't end up
>> with a permanently mounted swap because a sector is bad. And if the page
>> is accessed later, the user process will be killed so that corrupted data
>> is never consumed. On the other hand, if the page is never accessed, the
>> user won't even notice it.
>>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> Acked-by: David Hildenbrand <[email protected]>
>
> When I reproduced the issue (generated read error with dm-dust), I saw
> infinite loop in the while loop in shmem_unuse_inode() (and this happens
> even with this patch). I confirmed that shmem_swapin_page() returns -EIO,
> but shmem_unuse_swap_entries() does not return the error to the callers,
> so the while loop in shmem_unuse_inode() seems not break.

In the current implementation, try_to_unuse will keep trying to do shmem_unuse unless -ENOMEM is
returned from shmem_swapin_folio. This could be easily fixed by return -EIO when swapin error
occurs. But the user will end up with a permanently mounted swap just because a sector was bad.
One alternative is inventing a way to proceed the swapoff while preventing user from accessing
the wrong data. But this might complicate the code a lot and I need to learn more about shmem.
Any suggestion will be really grateful!

Thanks! :)

>
> So maybe you need more code around shmem_unuse_inode() to handle the error?
>
> Thanks,
> Naoya Horiguchi
>


Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On Sun, Apr 24, 2022 at 05:11:03PM +0800, Miaohe Lin wrote:
> There is a bug in unuse_pte(): when swap page happens to be unreadable,
> page filled with random data is mapped into user address space. In case
> of error, a special swap entry indicating swap read fails is set to the
> page table. So the swapcache page can be freed and the user won't end up
> with a permanently mounted swap because a sector is bad. And if the page
> is accessed later, the user process will be killed so that corrupted data
> is never consumed. On the other hand, if the page is never accessed, the
> user won't even notice it.
>
> Signed-off-by: Miaohe Lin <[email protected]>
> Acked-by: David Hildenbrand <[email protected]>

When I reproduced the issue (generated read error with dm-dust), I saw
infinite loop in the while loop in shmem_unuse_inode() (and this happens
even with this patch). I confirmed that shmem_swapin_page() returns -EIO,
but shmem_unuse_swap_entries() does not return the error to the callers,
so the while loop in shmem_unuse_inode() seems not break.

So maybe you need more code around shmem_unuse_inode() to handle the error?

Thanks,
Naoya Horiguchi

2022-05-13 14:19:35

by Miaohe Lin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On 2022/5/13 8:42, Andrew Morton wrote:
> On Tue, 10 May 2022 14:58:05 +0800 Miaohe Lin <[email protected]> wrote:
>
>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>> Acked-by: David Hildenbrand <[email protected]>
>>>
>>> When I reproduced the issue (generated read error with dm-dust), I saw
>>> infinite loop in the while loop in shmem_unuse_inode() (and this happens
>>> even with this patch). I confirmed that shmem_swapin_page() returns -EIO,
>>> but shmem_unuse_swap_entries() does not return the error to the callers,
>>> so the while loop in shmem_unuse_inode() seems not break.
>>>
>>
>> Many thanks for your report! I didn't test the shmem case because I saw -EIO
>> is returned. So I just focus on the normal page case. Sorry about it. :(
>>
>>> So maybe you need more code around shmem_unuse_inode() to handle the error?
>>
>> I will try to reproduce it and come up a fixup patch asap! And if you like, you
>> can kindly solve this issue too. ;)
>
> Seems that this patch didn't cause the infinite loop, so as far as I
> can tell it is good to be merged up. But the problem it solves isn't
> urgent and fixing that infinite loop might impact this change so I
> think I'll drop this version.

I will update and resend the corresponding patch series when I fix this infinite loop.

Thanks!

> .
>


2022-05-14 00:02:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] mm/swapfile: unuse_pte can map random data if swap read fails

On Tue, 10 May 2022 14:58:05 +0800 Miaohe Lin <[email protected]> wrote:

> >> Signed-off-by: Miaohe Lin <[email protected]>
> >> Acked-by: David Hildenbrand <[email protected]>
> >
> > When I reproduced the issue (generated read error with dm-dust), I saw
> > infinite loop in the while loop in shmem_unuse_inode() (and this happens
> > even with this patch). I confirmed that shmem_swapin_page() returns -EIO,
> > but shmem_unuse_swap_entries() does not return the error to the callers,
> > so the while loop in shmem_unuse_inode() seems not break.
> >
>
> Many thanks for your report! I didn't test the shmem case because I saw -EIO
> is returned. So I just focus on the normal page case. Sorry about it. :(
>
> > So maybe you need more code around shmem_unuse_inode() to handle the error?
>
> I will try to reproduce it and come up a fixup patch asap! And if you like, you
> can kindly solve this issue too. ;)

Seems that this patch didn't cause the infinite loop, so as far as I
can tell it is good to be merged up. But the problem it solves isn't
urgent and fixing that infinite loop might impact this change so I
think I'll drop this version.