2015-06-09 16:46:49

by Dan Williams

[permalink] [raw]
Subject: [PATCH 0/2] scatterlist cleanups

Hi Jens,

While working through a conversion of the scatterlist data structure
I noticed that some users were open coding scatterlist manipulations.
Christoph recommended sending these for 4.1-rc inclusion. These are
based on 4.1-rc7.

---

Dan Williams (2):
scatterlist: use sg_phys()
scatterlist: cleanup sg_chain() and sg_unmark_end()


arch/arm/mm/dma-mapping.c | 2 +-
arch/microblaze/kernel/dma.c | 2 +-
block/blk-merge.c | 2 +-
drivers/crypto/omap-sham.c | 2 +-
drivers/dma/imx-dma.c | 8 ++------
drivers/dma/ste_dma40.c | 5 +----
drivers/iommu/intel-iommu.c | 4 ++--
drivers/iommu/iommu.c | 2 +-
drivers/mmc/card/queue.c | 4 ++--
drivers/staging/android/ion/ion_chunk_heap.c | 4 ++--
include/crypto/scatterwalk.h | 9 ++-------
11 files changed, 16 insertions(+), 28 deletions(-)


2015-06-09 16:29:59

by Dan Williams

[permalink] [raw]
Subject: [PATCH 1/2] scatterlist: use sg_phys()

Coccinelle cleanup to replace open-coded sg-to-physical-address
translations. This is in preparation for introducing scatterlists that
reference __pfn_t.

// sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg)
// usage: make coccicheck COCCI=sg_phys.cocci MODE=patch

virtual patch
virtual report
virtual org

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg)) + sg->offset
+ sg_phys(sg)

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg))
+ sg_phys(sg) - sg->offset

Cc: Julia Lawall <[email protected]>
Cc: Russell King <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
arch/arm/mm/dma-mapping.c | 2 +-
arch/microblaze/kernel/dma.c | 2 +-
drivers/iommu/intel-iommu.c | 4 ++--
drivers/iommu/iommu.c | 2 +-
drivers/staging/android/ion/ion_chunk_heap.c | 4 ++--
5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7e7583ddd607..9f6ff6671f01 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
return -ENOMEM;

for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
- phys_addr_t phys = page_to_phys(sg_page(s));
+ phys_addr_t phys = sg_phys(s) - s->offset;
unsigned int len = PAGE_ALIGN(s->offset + s->length);

if (!is_coherent &&
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index ed7ba8a11822..dcb3c594d626 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -61,7 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
/* FIXME this part of code is untested */
for_each_sg(sgl, sg, nents, i) {
sg->dma_address = sg_phys(sg);
- __dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
+ __dma_sync(sg_phys(sg),
sg->length, direction);
}

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 68d43beccb7e..9b9ada71e0d3 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1998,7 +1998,7 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
sg_res = aligned_nrpages(sg->offset, sg->length);
sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + sg->offset;
sg->dma_length = sg->length;
- pteval = page_to_phys(sg_page(sg)) | prot;
+ pteval = (sg_phys(sg) - sg->offset) | prot;
phys_pfn = pteval >> VTD_PAGE_SHIFT;
}

@@ -3302,7 +3302,7 @@ static int intel_nontranslate_map_sg(struct device *hddev,

for_each_sg(sglist, sg, nelems, i) {
BUG_ON(!sg_page(sg));
- sg->dma_address = page_to_phys(sg_page(sg)) + sg->offset;
+ sg->dma_address = sg_phys(sg);
sg->dma_length = sg->length;
}
return nelems;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d4f527e56679..59808fc9110d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1147,7 +1147,7 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);

for_each_sg(sg, s, nents, i) {
- phys_addr_t phys = page_to_phys(sg_page(s)) + s->offset;
+ phys_addr_t phys = sg_phys(s);

/*
* We are mapping on IOMMU page boundaries, so offset within
diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index 3e6ec2ee6802..b7da5d142aa9 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -81,7 +81,7 @@ static int ion_chunk_heap_allocate(struct ion_heap *heap,
err:
sg = table->sgl;
for (i -= 1; i >= 0; i--) {
- gen_pool_free(chunk_heap->pool, page_to_phys(sg_page(sg)),
+ gen_pool_free(chunk_heap->pool, sg_phys(sg) - sg->offset,
sg->length);
sg = sg_next(sg);
}
@@ -109,7 +109,7 @@ static void ion_chunk_heap_free(struct ion_buffer *buffer)
DMA_BIDIRECTIONAL);

for_each_sg(table->sgl, sg, table->nents, i) {
- gen_pool_free(chunk_heap->pool, page_to_phys(sg_page(sg)),
+ gen_pool_free(chunk_heap->pool, sg_phys(sg) - sg->offset,
sg->length);
}
chunk_heap->allocated -= allocated_size;

2015-06-09 16:30:14

by Dan Williams

[permalink] [raw]
Subject: [PATCH 2/2] scatterlist: cleanup sg_chain() and sg_unmark_end()

Replace open-coded sg_chain() and sg_unmark_end() instances with the
aforementioned helpers.

Cc: Herbert Xu <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: Linus Walleij <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
block/blk-merge.c | 2 +-
drivers/crypto/omap-sham.c | 2 +-
drivers/dma/imx-dma.c | 8 ++------
drivers/dma/ste_dma40.c | 5 +----
drivers/mmc/card/queue.c | 4 ++--
include/crypto/scatterwalk.h | 9 ++-------
6 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index fd3fee81c23c..7bb4f260ca4b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -266,7 +266,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
if (rq->cmd_flags & REQ_WRITE)
memset(q->dma_drain_buffer, 0, q->dma_drain_size);

- sg->page_link &= ~0x02;
+ sg_unmark_end(sg);
sg = sg_next(sg);
sg_set_page(sg, virt_to_page(q->dma_drain_buffer),
q->dma_drain_size,
diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 4d63e0d4da9a..df8b23e19b90 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -582,7 +582,7 @@ static int omap_sham_xmit_dma(struct omap_sham_dev *dd, dma_addr_t dma_addr,
* the dmaengine may try to DMA the incorrect amount of data.
*/
sg_init_table(&ctx->sgl, 1);
- ctx->sgl.page_link = ctx->sg->page_link;
+ sg_assign_page(&ctx->sgl, sg_page(ctx->sg));
ctx->sgl.offset = ctx->sg->offset;
sg_dma_len(&ctx->sgl) = len32;
sg_dma_address(&ctx->sgl) = sg_dma_address(ctx->sg);
diff --git a/drivers/dma/imx-dma.c b/drivers/dma/imx-dma.c
index eed405976ea9..081fbfc87f6b 100644
--- a/drivers/dma/imx-dma.c
+++ b/drivers/dma/imx-dma.c
@@ -886,18 +886,14 @@ static struct dma_async_tx_descriptor *imxdma_prep_dma_cyclic(
sg_init_table(imxdmac->sg_list, periods);

for (i = 0; i < periods; i++) {
- imxdmac->sg_list[i].page_link = 0;
- imxdmac->sg_list[i].offset = 0;
+ sg_set_page(&imxdmac->sg_list[i], NULL, period_len, 0);
imxdmac->sg_list[i].dma_address = dma_addr;
sg_dma_len(&imxdmac->sg_list[i]) = period_len;
dma_addr += period_len;
}

/* close the loop */
- imxdmac->sg_list[periods].offset = 0;
- sg_dma_len(&imxdmac->sg_list[periods]) = 0;
- imxdmac->sg_list[periods].page_link =
- ((unsigned long)imxdmac->sg_list | 0x01) & ~0x02;
+ sg_chain(imxdmac->sg_list, periods + 1, imxdmac->sg_list);

desc->type = IMXDMA_DESC_CYCLIC;
desc->sg = imxdmac->sg_list;
diff --git a/drivers/dma/ste_dma40.c b/drivers/dma/ste_dma40.c
index 3c10f034d4b9..e8c00642cacb 100644
--- a/drivers/dma/ste_dma40.c
+++ b/drivers/dma/ste_dma40.c
@@ -2562,10 +2562,7 @@ dma40_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t dma_addr,
dma_addr += period_len;
}

- sg[periods].offset = 0;
- sg_dma_len(&sg[periods]) = 0;
- sg[periods].page_link =
- ((unsigned long)sg | 0x01) & ~0x02;
+ sg_chain(sg, periods + 1, sg);

txd = d40_prep_sg(chan, sg, sg, periods, direction,
DMA_PREP_INTERRUPT);
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 8efa3684aef8..fa525ee20470 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -469,7 +469,7 @@ static unsigned int mmc_queue_packed_map_sg(struct mmc_queue *mq,
sg_set_buf(__sg, buf + offset, len);
offset += len;
remain -= len;
- (__sg++)->page_link &= ~0x02;
+ sg_unmark_end(__sg++);
sg_len++;
} while (remain);
}
@@ -477,7 +477,7 @@ static unsigned int mmc_queue_packed_map_sg(struct mmc_queue *mq,
list_for_each_entry(req, &packed->list, queuelist) {
sg_len += blk_rq_map_sg(mq->queue, req, __sg);
__sg = sg + (sg_len - 1);
- (__sg++)->page_link &= ~0x02;
+ sg_unmark_end(__sg++);
}
sg_mark_end(sg + (sg_len - 1));
return sg_len;
diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h
index 20e4226a2e14..4529889b0f07 100644
--- a/include/crypto/scatterwalk.h
+++ b/include/crypto/scatterwalk.h
@@ -25,13 +25,8 @@
#include <linux/scatterlist.h>
#include <linux/sched.h>

-static inline void scatterwalk_sg_chain(struct scatterlist *sg1, int num,
- struct scatterlist *sg2)
-{
- sg_set_page(&sg1[num - 1], (void *)sg2, 0, 0);
- sg1[num - 1].page_link &= ~0x02;
- sg1[num - 1].page_link |= 0x01;
-}
+#define scatterwalk_sg_chain(prv, num, sgl) sg_chain(prv, num, sgl)
+#define scatterwalk_sg_next(sgl) sg_next(sgl)

static inline void scatterwalk_crypto_chain(struct scatterlist *head,
struct scatterlist *sg,

Subject: RE: [PATCH 1/2] scatterlist: use sg_phys()



> -----Original Message-----
> From: [email protected] [mailto:linux-kernel-
> [email protected]] On Behalf Of Dan Williams
> Sent: Tuesday, June 09, 2015 10:27 AM
> Subject: [PATCH 1/2] scatterlist: use sg_phys()
>
...
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index ed7ba8a11822..dcb3c594d626 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -61,7 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct
> scatterlist *sgl,
> /* FIXME this part of code is untested */
> for_each_sg(sgl, sg, nents, i) {
> sg->dma_address = sg_phys(sg);
> - __dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
> + __dma_sync(sg_phys(sg),
> sg->length, direction);
> }

That one ends up with weird indentation.



2015-06-10 05:38:36

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 2/2] scatterlist: cleanup sg_chain() and sg_unmark_end()

On Tue, Jun 09, 2015 at 12:27:15PM -0400, Dan Williams wrote:
>
> +#define scatterwalk_sg_chain(prv, num, sgl) sg_chain(prv, num, sgl)
> +#define scatterwalk_sg_next(sgl) sg_next(sgl)

Why are you reintroducing the scatterwalk_sg_next macro that we
just removed?

I also don't understand why this is so urgent that it has to go
into mainline at this late date.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-06-10 09:33:05

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 1/2] scatterlist: use sg_phys()

On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 7e7583ddd607..9f6ff6671f01 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
> return -ENOMEM;
>
> for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> - phys_addr_t phys = page_to_phys(sg_page(s));
> + phys_addr_t phys = sg_phys(s) - s->offset;

So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
which turns the above statement into:

page_to_phys(sg_page(s)) + s->offset - s->offset;

The compiler will probably optimize that away, but it still doesn't look
like an improvement.

> unsigned int len = PAGE_ALIGN(s->offset + s->length);
>
> if (!is_coherent &&
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index ed7ba8a11822..dcb3c594d626 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -61,7 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
> /* FIXME this part of code is untested */
> for_each_sg(sgl, sg, nents, i) {
> sg->dma_address = sg_phys(sg);
> - __dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
> + __dma_sync(sg_phys(sg),
> sg->length, direction);

Here the replacement makes sense, but the indentation is weird. It
could all be moved to one line, I guess.

> }
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 68d43beccb7e..9b9ada71e0d3 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1998,7 +1998,7 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
> sg_res = aligned_nrpages(sg->offset, sg->length);
> sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + sg->offset;
> sg->dma_length = sg->length;
> - pteval = page_to_phys(sg_page(sg)) | prot;
> + pteval = (sg_phys(sg) - sg->offset) | prot;

Here it doesn't make sense too. In general, please remove the cases
where you have to subtract sg->offset after the conversion.


Joerg

2015-06-10 16:00:40

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 1/2] scatterlist: use sg_phys()

On Wed, Jun 10, 2015 at 2:32 AM, Joerg Roedel <[email protected]> wrote:
> On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>> index 7e7583ddd607..9f6ff6671f01 100644
>> --- a/arch/arm/mm/dma-mapping.c
>> +++ b/arch/arm/mm/dma-mapping.c
>> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
>> return -ENOMEM;
>>
>> for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>> - phys_addr_t phys = page_to_phys(sg_page(s));
>> + phys_addr_t phys = sg_phys(s) - s->offset;
>
> So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
> which turns the above statement into:
>
> page_to_phys(sg_page(s)) + s->offset - s->offset;
>
> The compiler will probably optimize that away, but it still doesn't look
> like an improvement.

The goal is to eventually stop leaking struct page deep into the i/o
stack. Anything that relies on being able to retrieve a struct page
out of an sg entry needs to be converted. I think we need a new
helper for this case "sg_phys_aligned()?".

2015-06-10 16:32:04

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 1/2] scatterlist: use sg_phys()

On Wed, Jun 10, 2015 at 09:00:31AM -0700, Dan Williams wrote:
> On Wed, Jun 10, 2015 at 2:32 AM, Joerg Roedel <[email protected]> wrote:
> > On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
> >> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> >> index 7e7583ddd607..9f6ff6671f01 100644
> >> --- a/arch/arm/mm/dma-mapping.c
> >> +++ b/arch/arm/mm/dma-mapping.c
> >> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
> >> return -ENOMEM;
> >>
> >> for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> >> - phys_addr_t phys = page_to_phys(sg_page(s));
> >> + phys_addr_t phys = sg_phys(s) - s->offset;
> >
> > So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
> > which turns the above statement into:
> >
> > page_to_phys(sg_page(s)) + s->offset - s->offset;
> >
> > The compiler will probably optimize that away, but it still doesn't look
> > like an improvement.
>
> The goal is to eventually stop leaking struct page deep into the i/o
> stack. Anything that relies on being able to retrieve a struct page
> out of an sg entry needs to be converted. I think we need a new
> helper for this case "sg_phys_aligned()?".

Why? The aim of the code is not to detect whether the address is aligned
to a page (if it were, it'd be testing for a zero s->offset, or it would
be testing for an s->offset being a multiple of the page size).

Let's first understand the code that's being modified (which seems to be
something which isn't done very much today - people seem to just like
changing code for the hell of it.)

for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
phys_addr_t phys = page_to_phys(sg_page(s));
unsigned int len = PAGE_ALIGN(s->offset + s->length);

if (!is_coherent &&
!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length,
dir);

prot = __dma_direction_to_prot(dir);

ret = iommu_map(mapping->domain, iova, phys, len, prot);
if (ret < 0)
goto fail;
count += len >> PAGE_SHIFT;
iova += len;
}

What it's doing is trying to map each scatterlist entry via an IOMMU.
Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
page.

However, what says that the IOMMU page size is the same as the host CPU
page size - it may not be... so the above code is wrong for a completely
different reason.

What we _should_ be doing is finding out what the IOMMU minimum page
size is, and using that in conjunction with the sg_phys() of the
scatterlist entry to determine the start and length of the mapping
such that the IOMMU mapping covers the range described by the scatterlist
entry. (iommu_map() takes arguments which must be aligned to the IOMMU
minimum page size.)

However, what we can also see from the above is that we have other code
here using sg_page() - which is a necessity to be able to perform the
required DMA cache maintenance to ensure that the data is visible to the
DMA device. We need to kmap_atomic() these in order to flush them, and
there's no other way other than struct page to represent a highmem page.

So, I think your intent to want to remove struct page from the I/O
subsystem is a false hope, unless you want to end up rewriting lots of
architecture code, and you can come up with an alternative method to
represent highmem pages.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-06-10 16:57:16

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 1/2] scatterlist: use sg_phys()

On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
<[email protected]> wrote:
> On Wed, Jun 10, 2015 at 09:00:31AM -0700, Dan Williams wrote:
>> On Wed, Jun 10, 2015 at 2:32 AM, Joerg Roedel <[email protected]> wrote:
>> > On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
>> >> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>> >> index 7e7583ddd607..9f6ff6671f01 100644
>> >> --- a/arch/arm/mm/dma-mapping.c
>> >> +++ b/arch/arm/mm/dma-mapping.c
>> >> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
>> >> return -ENOMEM;
>> >>
>> >> for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>> >> - phys_addr_t phys = page_to_phys(sg_page(s));
>> >> + phys_addr_t phys = sg_phys(s) - s->offset;
>> >
>> > So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
>> > which turns the above statement into:
>> >
>> > page_to_phys(sg_page(s)) + s->offset - s->offset;
>> >
>> > The compiler will probably optimize that away, but it still doesn't look
>> > like an improvement.
>>
>> The goal is to eventually stop leaking struct page deep into the i/o
>> stack. Anything that relies on being able to retrieve a struct page
>> out of an sg entry needs to be converted. I think we need a new
>> helper for this case "sg_phys_aligned()?".
>
> Why? The aim of the code is not to detect whether the address is aligned
> to a page (if it were, it'd be testing for a zero s->offset, or it would
> be testing for an s->offset being a multiple of the page size).
>
> Let's first understand the code that's being modified (which seems to be
> something which isn't done very much today - people seem to just like
> changing code for the hell of it.)
>
> for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> phys_addr_t phys = page_to_phys(sg_page(s));
> unsigned int len = PAGE_ALIGN(s->offset + s->length);
>
> if (!is_coherent &&
> !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
> __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length,
> dir);
>
> prot = __dma_direction_to_prot(dir);
>
> ret = iommu_map(mapping->domain, iova, phys, len, prot);
> if (ret < 0)
> goto fail;
> count += len >> PAGE_SHIFT;
> iova += len;
> }
>
> What it's doing is trying to map each scatterlist entry via an IOMMU.
> Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
> page.
>
> However, what says that the IOMMU page size is the same as the host CPU
> page size - it may not be... so the above code is wrong for a completely
> different reason.
>
> What we _should_ be doing is finding out what the IOMMU minimum page
> size is, and using that in conjunction with the sg_phys() of the
> scatterlist entry to determine the start and length of the mapping
> such that the IOMMU mapping covers the range described by the scatterlist
> entry. (iommu_map() takes arguments which must be aligned to the IOMMU
> minimum page size.)
>
> However, what we can also see from the above is that we have other code
> here using sg_page() - which is a necessity to be able to perform the
> required DMA cache maintenance to ensure that the data is visible to the
> DMA device. We need to kmap_atomic() these in order to flush them, and
> there's no other way other than struct page to represent a highmem page.
>
> So, I think your intent to want to remove struct page from the I/O
> subsystem is a false hope, unless you want to end up rewriting lots of
> architecture code, and you can come up with an alternative method to
> represent highmem pages.

I think there will always be cases that need to do pfn_to_page() for
buffer management. Those configurations will be blocked from seeing
cases where a pfn is not struct page backed. So, you can have highmem
or dma to pmem, but not both. Christoph, this is why I have Kconfig
symbols (DEV_PFN etc) to gate whether an arch/config supports pfn-only
i/o.

2015-06-10 17:13:29

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH 1/2] scatterlist: use sg_phys()

On Wed, Jun 10, 2015 at 09:57:06AM -0700, Dan Williams wrote:
> On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
> <[email protected]> wrote:
> > Why? The aim of the code is not to detect whether the address is aligned
> > to a page (if it were, it'd be testing for a zero s->offset, or it would
> > be testing for an s->offset being a multiple of the page size).
> >
> > Let's first understand the code that's being modified (which seems to be
> > something which isn't done very much today - people seem to just like
> > changing code for the hell of it.)
> >
> > for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> > phys_addr_t phys = page_to_phys(sg_page(s));
> > unsigned int len = PAGE_ALIGN(s->offset + s->length);
> >
> > if (!is_coherent &&
> > !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
> > __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length,
> > dir);
> >
> > prot = __dma_direction_to_prot(dir);
> >
> > ret = iommu_map(mapping->domain, iova, phys, len, prot);
> > if (ret < 0)
> > goto fail;
> > count += len >> PAGE_SHIFT;
> > iova += len;
> > }
> >
> > What it's doing is trying to map each scatterlist entry via an IOMMU.
> > Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
> > page.
> >
> > However, what says that the IOMMU page size is the same as the host CPU
> > page size - it may not be... so the above code is wrong for a completely
> > different reason.
> >
> > What we _should_ be doing is finding out what the IOMMU minimum page
> > size is, and using that in conjunction with the sg_phys() of the
> > scatterlist entry to determine the start and length of the mapping
> > such that the IOMMU mapping covers the range described by the scatterlist
> > entry. (iommu_map() takes arguments which must be aligned to the IOMMU
> > minimum page size.)
> >
> > However, what we can also see from the above is that we have other code
> > here using sg_page() - which is a necessity to be able to perform the
> > required DMA cache maintenance to ensure that the data is visible to the
> > DMA device. We need to kmap_atomic() these in order to flush them, and
> > there's no other way other than struct page to represent a highmem page.
> >
> > So, I think your intent to want to remove struct page from the I/O
> > subsystem is a false hope, unless you want to end up rewriting lots of
> > architecture code, and you can come up with an alternative method to
> > represent highmem pages.
>
> I think there will always be cases that need to do pfn_to_page() for
> buffer management. Those configurations will be blocked from seeing
> cases where a pfn is not struct page backed. So, you can have highmem
> or dma to pmem, but not both. Christoph, this is why I have Kconfig
> symbols (DEV_PFN etc) to gate whether an arch/config supports pfn-only
> i/o.

Hmm, pmem... yea, in the SolidRun community, we've basically decided to
stick with my updated Marvell BMM layer rather than switch to pmem. I
forget the reasons why, but the decision was made after looking at what
pmem was doing...

In any case, let's not get bogged down in a peripheral issue.

What I'm objecting to is that the patches I've seen seem to be just
churn without any net benefit.

You can't simply make sg_page() return NULL after this change, because
you've done nothing with the remaining sg_page() callers to allow them
to gracefully handle that case.

What I'd like to see is a much fuller series of patches which show the
whole progression towards your end goal rather than a piecemeal
approach. Right now, it's not clear that there is any benefit to
this round of changes.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-06-10 17:25:48

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 1/2] scatterlist: use sg_phys()

On Wed, Jun 10, 2015 at 10:13 AM, Russell King - ARM Linux
<[email protected]> wrote:
> On Wed, Jun 10, 2015 at 09:57:06AM -0700, Dan Williams wrote:
>> On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
>> <[email protected]> wrote:
>> > Why? The aim of the code is not to detect whether the address is aligned
>> > to a page (if it were, it'd be testing for a zero s->offset, or it would
>> > be testing for an s->offset being a multiple of the page size).
>> >
>> > Let's first understand the code that's being modified (which seems to be
>> > something which isn't done very much today - people seem to just like
>> > changing code for the hell of it.)
>> >
>> > for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>> > phys_addr_t phys = page_to_phys(sg_page(s));
>> > unsigned int len = PAGE_ALIGN(s->offset + s->length);
>> >
>> > if (!is_coherent &&
>> > !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
>> > __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length,
>> > dir);
>> >
>> > prot = __dma_direction_to_prot(dir);
>> >
>> > ret = iommu_map(mapping->domain, iova, phys, len, prot);
>> > if (ret < 0)
>> > goto fail;
>> > count += len >> PAGE_SHIFT;
>> > iova += len;
>> > }
>> >
>> > What it's doing is trying to map each scatterlist entry via an IOMMU.
>> > Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
>> > page.
>> >
>> > However, what says that the IOMMU page size is the same as the host CPU
>> > page size - it may not be... so the above code is wrong for a completely
>> > different reason.
>> >
>> > What we _should_ be doing is finding out what the IOMMU minimum page
>> > size is, and using that in conjunction with the sg_phys() of the
>> > scatterlist entry to determine the start and length of the mapping
>> > such that the IOMMU mapping covers the range described by the scatterlist
>> > entry. (iommu_map() takes arguments which must be aligned to the IOMMU
>> > minimum page size.)
>> >
>> > However, what we can also see from the above is that we have other code
>> > here using sg_page() - which is a necessity to be able to perform the
>> > required DMA cache maintenance to ensure that the data is visible to the
>> > DMA device. We need to kmap_atomic() these in order to flush them, and
>> > there's no other way other than struct page to represent a highmem page.
>> >
>> > So, I think your intent to want to remove struct page from the I/O
>> > subsystem is a false hope, unless you want to end up rewriting lots of
>> > architecture code, and you can come up with an alternative method to
>> > represent highmem pages.
>>
>> I think there will always be cases that need to do pfn_to_page() for
>> buffer management. Those configurations will be blocked from seeing
>> cases where a pfn is not struct page backed. So, you can have highmem
>> or dma to pmem, but not both. Christoph, this is why I have Kconfig
>> symbols (DEV_PFN etc) to gate whether an arch/config supports pfn-only
>> i/o.
>
> Hmm, pmem... yea, in the SolidRun community, we've basically decided to
> stick with my updated Marvell BMM layer rather than switch to pmem. I
> forget the reasons why, but the decision was made after looking at what
> pmem was doing...

I'd of course be open to exploring if drivers/nvdimm/ could be made
more generally useful.

> In any case, let's not get bogged down in a peripheral issue.
>
> What I'm objecting to is that the patches I've seen seem to be just
> churn without any net benefit.
>
> You can't simply make sg_page() return NULL after this change, because
> you've done nothing with the remaining sg_page() callers to allow them
> to gracefully handle that case.
>
> What I'd like to see is a much fuller series of patches which show the
> whole progression towards your end goal rather than a piecemeal
> approach. Right now, it's not clear that there is any benefit to
> this round of changes.
>

Fair enough. I had them as part of a larger series [1]. Christoph
suggested that I break out the pure cleanups separately. I'll add you
to the next rev of that series.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-June/001094.html

2015-06-11 06:51:06

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH 1/2] scatterlist: use sg_phys()

On Wed, Jun 10, 2015 at 09:00:31AM -0700, Dan Williams wrote:
> > So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
> > which turns the above statement into:
> >
> > page_to_phys(sg_page(s)) + s->offset - s->offset;
> >
> > The compiler will probably optimize that away, but it still doesn't look
> > like an improvement.
>
> The goal is to eventually stop leaking struct page deep into the i/o
> stack. Anything that relies on being able to retrieve a struct page
> out of an sg entry needs to be converted. I think we need a new
> helper for this case "sg_phys_aligned()?".

You still have a reference to a struct page, because sg_phys() calls
sg_page() too. If you want to get rid of sg_page(), something like
sg_pfn() might be a more workable solution than sg_phys_(page_)aligned.

But maybe I am just missing the bigger scope of this, so I agree with
Russell that it is better to see a patch series which shows the
direction you want to go with this.


Joerg

2015-06-11 07:31:16

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/2] scatterlist: cleanup sg_chain() and sg_unmark_end()

On Wed, Jun 10, 2015 at 01:38:04PM +0800, Herbert Xu wrote:
> On Tue, Jun 09, 2015 at 12:27:15PM -0400, Dan Williams wrote:
> >
> > +#define scatterwalk_sg_chain(prv, num, sgl) sg_chain(prv, num, sgl)
> > +#define scatterwalk_sg_next(sgl) sg_next(sgl)
>
> Why are you reintroducing the scatterwalk_sg_next macro that we
> just removed?
>
> I also don't understand why this is so urgent that it has to go
> into mainline at this late date.

It allows getting a cleaner slate for the next merge window, which seems
useful on its own. The re-addition of scatterwalk_sg_next seems
unnecessary, but getting rid of the open-coded sg_chain is useful.

Maybe you can take it through the crypto tree and also kill off the
scatterwalk_sg_chain name as well while we're at it?

2015-06-11 07:35:16

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 2/2] scatterlist: cleanup sg_chain() and sg_unmark_end()

On Thu, Jun 11, 2015 at 09:31:07AM +0200, Christoph Hellwig wrote:
>
> It allows getting a cleaner slate for the next merge window, which seems
> useful on its own. The re-addition of scatterwalk_sg_next seems
> unnecessary, but getting rid of the open-coded sg_chain is useful.

Sure I can take the crypto bits.

> Maybe you can take it through the crypto tree and also kill off the
> scatterwalk_sg_chain name as well while we're at it?

Sure I'm happy to take such a patch.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt