2023-01-05 11:26:53

by Yishai Hadas

Subject: [PATCH] lib/scatterlist: Fix to merge contiguous pages into the last SG properly

When sg_alloc_append_table_from_pages() calls to pages_are_mergeable()
in its 'sgt_append->prv' flow to check whether it can merge contiguous
pages into the last SG, it passes the page arguments in the wrong order.

The first parameter should be the next candidate page to be merged to
the last page and not the opposite.

The current code leads to a corrupted SG which resulted in OOPs and
unexpected errors when non-contiguous pages are merged wrongly.

Fix to pass the page parameters in the right order.

Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages")
Signed-off-by: Yishai Hadas <[email protected]>
---
lib/scatterlist.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index a0ad2a7959b5..f72aa50c6654 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -476,7 +476,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append,
 		/* Merge contiguous pages into the last SG */
 		prv_len = sgt_append->prv->length;
 		last_pg = sg_page(sgt_append->prv);
-		while (n_pages && pages_are_mergeable(last_pg, pages[0])) {
+		while (n_pages && pages_are_mergeable(pages[0], last_pg)) {
 			if (sgt_append->prv->length + PAGE_SIZE > max_segment)
 				break;
 			sgt_append->prv->length += PAGE_SIZE;
--
2.18.1
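
A small userspace model may help show why the argument order matters. It is only a sketch: it assumes, as the patch description implies, that pages_are_mergeable(a, b) asks whether 'a' is the page immediately following 'b'. The names struct page_model, pfn and pages_are_mergeable_model() are hypothetical stand-ins for struct page, page_to_pfn() and the kernel helper, and the zone device pgmap check is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the frame number matters here. */
struct page_model {
	unsigned long pfn;
};

/*
 * Assumed semantics: true when 'a' directly follows 'b' in physical memory.
 * The zone device pgmap comparison is not modelled.
 */
static bool pages_are_mergeable_model(const struct page_model *a,
				      const struct page_model *b)
{
	return a->pfn == b->pfn + 1;
}

/* Fixed call order: the candidate page comes first, the last page second. */
static bool can_merge_fixed(const struct page_model *last_pg,
			    const struct page_model *candidate)
{
	return pages_are_mergeable_model(candidate, last_pg);
}

/* Order used before the fix: the last page comes first. */
static bool can_merge_swapped(const struct page_model *last_pg,
			      const struct page_model *candidate)
{
	return pages_are_mergeable_model(last_pg, candidate);
}
```

Under these assumptions, with the last page at pfn 100, the fixed order accepts a candidate at pfn 101, while the swapped order rejects it and instead accepts a candidate at pfn 99, which is how a non-contiguous page could end up merged into the last SG.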


2023-01-05 14:14:55

by Jason Gunthorpe

Subject: Re: [PATCH] lib/scatterlist: Fix to merge contiguous pages into the last SG properly

On Thu, Jan 05, 2023 at 01:23:39PM +0200, Yishai Hadas wrote:
> When sg_alloc_append_table_from_pages() calls to pages_are_mergeable()
> in its 'sgt_append->prv' flow to check whether it can merge contiguous
> pages into the last SG, it passes the page arguments in the wrong order.
>
> The first parameter should be the next candidate page to be merged to
> the last page and not the opposite.
>
> The current code leads to a corrupted SG which resulted in OOPs and
> unexpected errors when non-contiguous pages are merged wrongly.
>
> Fix to pass the page parameters in the right order.
>
> Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages")
> Signed-off-by: Yishai Hadas <[email protected]>
> ---
> lib/scatterlist.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Also, I'm looking more closely at '156 and this is not right either:

- unsigned long paddr =
- (page_to_pfn(sg_page(sgt_append->prv)) * PAGE_SIZE +
- sgt_append->prv->offset + sgt_append->prv->length) /
- PAGE_SIZE;
-
- while (n_pages && page_to_pfn(pages[0]) == paddr) {
+ last_pg = sg_page(sgt_append->prv);
+ while (n_pages && pages_are_mergeable(last_pg, pages[0])) {

This change will break things like multi-page combining, sub page
scenarios and maybe more.

The contiguity test here has to be done on phys; it should go back to
struct page to check if the pgmap is OK.

Can you fix it as well?

Thanks,
Jason
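
The concern about multi-page and sub-page entries can be illustrated with a rough userspace model. This is a sketch under stated assumptions, not the kernel code: struct sg_model, MODEL_PAGE_SIZE and the three helpers are hypothetical names, the page size is assumed to be 4096 bytes, and the first-page test assumes the argument order has already been corrected. The end-of-entry computation mirrors the removed paddr lines quoted above.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for this model */

/* Hypothetical stand-in for the last scatterlist entry (sgt_append->prv). */
struct sg_model {
	unsigned long first_pfn;	/* frame number of the entry's first page */
	unsigned long offset;		/* byte offset into the first page */
	unsigned long length;		/* bytes covered by the entry */
};

/* Frame number at which the entry ends, as in the removed paddr computation. */
static unsigned long sg_model_end_pfn(const struct sg_model *sg)
{
	return (sg->first_pfn * MODEL_PAGE_SIZE + sg->offset + sg->length) /
	       MODEL_PAGE_SIZE;
}

/* Physical contiguity test: the candidate must start where the entry ends. */
static bool can_append_by_phys(const struct sg_model *sg,
			       unsigned long next_pfn)
{
	return next_pfn == sg_model_end_pfn(sg);
}

/* Test that only compares against the entry's first page. */
static bool can_append_by_first_page(const struct sg_model *sg,
				     unsigned long next_pfn)
{
	return next_pfn == sg->first_pfn + 1;
}
```

In this model, an entry starting at pfn 100 that already covers three full pages ends at pfn 103. The physical test accepts a candidate at pfn 103 and rejects one at pfn 101, whereas the first-page test does the opposite. For an entry covering only 2048 bytes of pfn 100, the physical test rejects pfn 101 because the entry does not reach the end of its page, while the first-page test would still merge it.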

2023-01-05 17:04:33

by Yishai Hadas

Subject: Re: [PATCH] lib/scatterlist: Fix to merge contiguous pages into the last SG properly

On 05/01/2023 15:36, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2023 at 01:23:39PM +0200, Yishai Hadas wrote:
>> When sg_alloc_append_table_from_pages() calls to pages_are_mergeable()
>> in its 'sgt_append->prv' flow to check whether it can merge contiguous
>> pages into the last SG, it passes the page arguments in the wrong order.
>>
>> The first parameter should be the next candidate page to be merged to
>> the last page and not the opposite.
>>
>> The current code leads to a corrupted SG which resulted in OOPs and
>> unexpected errors when non-contiguous pages are merged wrongly.
>>
>> Fix to pass the page parameters in the right order.
>>
>> Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages")
>> Signed-off-by: Yishai Hadas <[email protected]>
>> ---
>> lib/scatterlist.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
> Reviewed-by: Jason Gunthorpe <[email protected]>

Thanks Jason

>
> Also, I'm looking more closely at '156 and this is not right either:
>
> - unsigned long paddr =
> - (page_to_pfn(sg_page(sgt_append->prv)) * PAGE_SIZE +
> - sgt_append->prv->offset + sgt_append->prv->length) /
> - PAGE_SIZE;
> -
> - while (n_pages && page_to_pfn(pages[0]) == paddr) {
> + last_pg = sg_page(sgt_append->prv);
> + while (n_pages && pages_are_mergeable(last_pg, pages[0])) {
>
> This change will break things like multi-page combining, sub page
> scenarios and maybe more.
>
> The contiguity test here has to be done on phys; it should go back to
> struct page to check if the pgmap is OK.
>
> Can you fix it as well?


Yes, I have a candidate patch locally, as you asked, on top of this one.

I would like to run some extra testing on it, and then I will send it.

Yishai

2023-01-05 20:25:46

by Jason Gunthorpe

Subject: Re: [PATCH] lib/scatterlist: Fix to merge contiguous pages into the last SG properly

On Thu, Jan 05, 2023 at 01:23:39PM +0200, Yishai Hadas wrote:
> When sg_alloc_append_table_from_pages() calls to pages_are_mergeable()
> in its 'sgt_append->prv' flow to check whether it can merge contiguous
> pages into the last SG, it passes the page arguments in the wrong order.
>
> The first parameter should be the next candidate page to be merged to
> the last page and not the opposite.
>
> The current code leads to a corrupted SG which resulted in OOPs and
> unexpected errors when non-contiguous pages are merged wrongly.
>
> Fix to pass the page parameters in the right order.
>
> Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages")
> Signed-off-by: Yishai Hadas <[email protected]>
> ---
> lib/scatterlist.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

rdma is pretty much the only user of this API and this bug is causing
bad data corruption, so I'm going to take it to the rdma tree and send
it tomorrow.

Which raises the question why the original patch was done at all,
nothing ever inputs pgmap pages into this function?

Thanks,
Jason

2023-01-05 20:27:47

by Logan Gunthorpe

Subject: Re: [PATCH] lib/scatterlist: Fix to merge contiguous pages into the last SG properly



On 2023-01-05 13:06, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2023 at 01:23:39PM +0200, Yishai Hadas wrote:
>> When sg_alloc_append_table_from_pages() calls to pages_are_mergeable()
>> in its 'sgt_append->prv' flow to check whether it can merge contiguous
>> pages into the last SG, it passes the page arguments in the wrong order.
>>
>> The first parameter should be the next candidate page to be merged to
>> the last page and not the opposite.
>>
>> The current code leads to a corrupted SG which resulted in OOPs and
>> unexpected errors when non-contiguous pages are merged wrongly.
>>
>> Fix to pass the page parameters in the right order.
>>
>> Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages")
>> Signed-off-by: Yishai Hadas <[email protected]>
>> ---
>> lib/scatterlist.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> rdma is pretty much the only user of this API and this bug is causing
> bad data corruption, so I'm going to take it to the rdma tree and send
> it tomorrow.
>
> Which raises the question why the original patch was done at all,
> nothing ever inputs pgmap pages into this function?

It was done solely because you had suggested it was necessary.

https://lore.kernel.org/all/[email protected]/

Though the patch was correct when I originally wrote it, and it
looks like I merged it poorly somewhere along the line (roughly v5 of
the series) when the paddr stuff was added. Sorry about that.
The paddr stuff was messy and really hard to understand.

Anyway, Yishai's first patch looks correct to me, but I guess we need to
fix it further. For what it's worth:

Reviewed-by: Logan Gunthorpe <[email protected]>

Logan

2023-01-05 20:28:04

by Jason Gunthorpe

Subject: Re: [PATCH] lib/scatterlist: Fix to merge contiguous pages into the last SG properly

On Thu, Jan 05, 2023 at 01:23:52PM -0700, Logan Gunthorpe wrote:
>
>
> On 2023-01-05 13:06, Jason Gunthorpe wrote:
> > On Thu, Jan 05, 2023 at 01:23:39PM +0200, Yishai Hadas wrote:
> >> When sg_alloc_append_table_from_pages() calls to pages_are_mergeable()
> >> in its 'sgt_append->prv' flow to check whether it can merge contiguous
> >> pages into the last SG, it passes the page arguments in the wrong order.
> >>
> >> The first parameter should be the next candidate page to be merged to
> >> the last page and not the opposite.
> >>
> >> The current code leads to a corrupted SG which resulted in OOPs and
> >> unexpected errors when non-contiguous pages are merged wrongly.
> >>
> >> Fix to pass the page parameters in the right order.
> >>
> >> Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages")
> >> Signed-off-by: Yishai Hadas <[email protected]>
> >> ---
> >> lib/scatterlist.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > rdma is pretty much the only user of this API and this bug is causing
> > bad data corruption, so I'm going to take it to the rdma tree and send
> > it tomorrow.
> >
> > Which raises the question why the original patch was done at all,
> > nothing ever inputs pgmap pages into this function?
>
> It was done solely because you had suggested it was necessary.
>
> https://lore.kernel.org/all/[email protected]/

Yes, but that was when I was expecting this would work with
FOLL_LONGTERM and PUP (pin_user_pages).

Jason

2023-01-05 20:39:40

by Jason Gunthorpe

Subject: Re: [PATCH] lib/scatterlist: Fix to merge contiguous pages into the last SG properly

On Thu, Jan 05, 2023 at 01:21:43PM -0700, Keith Busch wrote:
> On Thu, Jan 05, 2023 at 04:06:11PM -0400, Jason Gunthorpe wrote:
> > On Thu, Jan 05, 2023 at 01:23:39PM +0200, Yishai Hadas wrote:
> > > When sg_alloc_append_table_from_pages() calls to pages_are_mergeable()
> > > in its 'sgt_append->prv' flow to check whether it can merge contiguous
> > > pages into the last SG, it passes the page arguments in the wrong order.
> > >
> > > The first parameter should be the next candidate page to be merged to
> > > the last page and not the opposite.
> > >
> > > The current code leads to a corrupted SG which resulted in OOPs and
> > > unexpected errors when non-contiguous pages are merged wrongly.
> > >
> > > Fix to pass the page parameters in the right order.
> > >
> > > Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages")
> > > Signed-off-by: Yishai Hadas <[email protected]>
> > > ---
> > > lib/scatterlist.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > rdma is pretty much the only user of this API and this bug is causing
> > bad data corruption, so I'm going to take it to the rdma tree and send
> > it tomorrow.
> >
> > Which raises the question why the original patch was done at all,
> > nothing ever inputs pgmap pages into this function?
>
> This just takes any arbitrary user addresses, right? The user could
> provide addresses from mmap'ing pci resource files that resolve to pgmap
> pages.

No, it passes FOLL_LONGTERM and pin_user_pages will not return any pgmaps
in that case.

Jason

2023-01-05 21:12:38

by Keith Busch

Subject: Re: [PATCH] lib/scatterlist: Fix to merge contiguous pages into the last SG properly

On Thu, Jan 05, 2023 at 04:06:11PM -0400, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2023 at 01:23:39PM +0200, Yishai Hadas wrote:
> > When sg_alloc_append_table_from_pages() calls to pages_are_mergeable()
> > in its 'sgt_append->prv' flow to check whether it can merge contiguous
> > pages into the last SG, it passes the page arguments in the wrong order.
> >
> > The first parameter should be the next candidate page to be merged to
> > the last page and not the opposite.
> >
> > The current code leads to a corrupted SG which resulted in OOPs and
> > unexpected errors when non-contiguous pages are merged wrongly.
> >
> > Fix to pass the page parameters in the right order.
> >
> > Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages")
> > Signed-off-by: Yishai Hadas <[email protected]>
> > ---
> > lib/scatterlist.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> rdma is pretty much the only user of this API and this bug is causing
> bad data corruption, so I'm going to take it to the rdma tree and send
> it tomorrow.
>
> Which raises the question why the original patch was done at all,
> nothing ever inputs pgmap pages into this function?

This just takes any arbitrary user addresses, right? The user could
provide addresses from mmap'ing pci resource files that resolve to pgmap
pages.