2015-02-11 08:28:44

by Marek Szyprowski

Subject: Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

Hello,

On 2015-01-27 09:25, Sumit Semwal wrote:
> Add some helpers to share the constraints of devices while attaching
> to the dmabuf buffer.
>
> At each attach, the constraints are calculated based on the following:
> - max_segment_size, max_segment_count, segment_boundary_mask from
> device_dma_parameters.
>
> In case the attaching device's constraints don't match up, attach() fails.
>
> At detach, the constraints are recalculated based on the remaining
> attached devices.
>
> Two helpers are added:
> - dma_buf_get_constraints - which gives the current constraints as calculated
> during each attach on the buffer till the time,
> - dma_buf_recalc_constraints - which recalculates the constraints for all
> currently attached devices for the 'paranoid' ones amongst us.
>
> The idea of this patch is largely taken from Rob Clark's RFC at
> https://lkml.org/lkml/2012/7/19/285, and the comments received on it.
>
> Cc: Rob Clark <[email protected]>
> Signed-off-by: Sumit Semwal <[email protected]>
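
A minimal usage sketch of the two proposed helpers, in hypothetical driver
code (the max_segment_count field comes from patch 1/2 of this series):

static void show_constraints(struct dma_buf *dmabuf)
{
	struct device_dma_parameters cons;

	/* Constraints as accumulated over all attach() calls so far. */
	if (dma_buf_get_constraints(dmabuf, &cons))
		return;

	pr_info("seg count %u, seg size %u, boundary mask %#lx\n",
		cons.max_segment_count, cons.max_segment_size,
		cons.segment_boundary_mask);

	/* 'Paranoid' variant: recompute from the live attachment list. */
	if (!dma_buf_recalc_constraints(dmabuf, &cons))
		pr_info("recalculated seg count %u\n", cons.max_segment_count);
}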

The code looks okay, although it will probably work well only with typical
cases like 'contiguous memory needed' or 'no constraints at all' (iommu).

Acked-by: Marek Szyprowski <[email protected]>

> ---
> v3:
> - Thanks to Russell's comment, remove dma_mask and coherent_dma_mask from
> constraints' calculation; has a nice side effect of letting us use
> device_dma_parameters directly to list constraints.
> - update the debugfs output to show constraint info as well.
>
> v2: split constraints-sharing and allocation helpers
>
> drivers/dma-buf/dma-buf.c | 126 +++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/dma-buf.h | 7 +++
> 2 files changed, 132 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 5be225c2ba98..f363f1440803 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -264,6 +264,66 @@ static inline int is_dma_buf_file(struct file *file)
> return file->f_op == &dma_buf_fops;
> }
>
> +static inline void init_constraints(struct device_dma_parameters *cons)
> +{
> + cons->max_segment_count = (unsigned int)-1;
> + cons->max_segment_size = (unsigned int)-1;
> + cons->segment_boundary_mask = (unsigned long)-1;
> +}
> +
> +/*
> + * calc_constraints - calculates if the new attaching device's constraints
> + * match, with the constraints of already attached devices; if yes, returns
> + * the constraints; else return ERR_PTR(-EINVAL)
> + */
> +static int calc_constraints(struct device *dev,
> + struct device_dma_parameters *calc_cons)
> +{
> + struct device_dma_parameters cons = *calc_cons;
> +
> + cons.max_segment_count = min(cons.max_segment_count,
> + dma_get_max_seg_count(dev));
> + cons.max_segment_size = min(cons.max_segment_size,
> + dma_get_max_seg_size(dev));
> + cons.segment_boundary_mask &= dma_get_seg_boundary(dev);
> +
> + if (!cons.max_segment_count ||
> + !cons.max_segment_size ||
> + !cons.segment_boundary_mask) {
> + pr_err("Dev: %s's constraints don't match\n", dev_name(dev));
> + return -EINVAL;
> + }
> +
> + *calc_cons = cons;
> +
> + return 0;
> +}
> +
> +/*
> + * recalc_constraints - recalculates constraints for all attached devices;
> + * useful for detach() recalculation, and for dma_buf_recalc_constraints()
> + * helper.
> + * Returns recalculated constraints in recalc_cons, or error in the unlikely
> + * case when constraints of attached devices might have changed.
> + */
> +static int recalc_constraints(struct dma_buf *dmabuf,
> + struct device_dma_parameters *recalc_cons)
> +{
> + struct device_dma_parameters calc_cons;
> + struct dma_buf_attachment *attach;
> + int ret = 0;
> +
> + init_constraints(&calc_cons);
> +
> + list_for_each_entry(attach, &dmabuf->attachments, node) {
> + ret = calc_constraints(attach->dev, &calc_cons);
> + if (ret)
> + return ret;
> + }
> + *recalc_cons = calc_cons;
> + return 0;
> +}
> +
> /**
> * dma_buf_export_named - Creates a new dma_buf, and associates an anon file
> * with this buffer, so it can be exported.
> @@ -313,6 +373,9 @@ struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops,
> dmabuf->ops = ops;
> dmabuf->size = size;
> dmabuf->exp_name = exp_name;
> +
> + init_constraints(&dmabuf->constraints);
> +
> init_waitqueue_head(&dmabuf->poll);
> dmabuf->cb_excl.poll = dmabuf->cb_shared.poll = &dmabuf->poll;
> dmabuf->cb_excl.active = dmabuf->cb_shared.active = 0;
> @@ -422,7 +485,7 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> struct device *dev)
> {
> struct dma_buf_attachment *attach;
> - int ret;
> + int ret = 0;
>
> if (WARN_ON(!dmabuf || !dev))
> return ERR_PTR(-EINVAL);
> @@ -436,6 +499,9 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>
> mutex_lock(&dmabuf->lock);
>
> + if (calc_constraints(dev, &dmabuf->constraints))
> + goto err_constraints;
> +
> if (dmabuf->ops->attach) {
> ret = dmabuf->ops->attach(dmabuf, dev, attach);
> if (ret)
> @@ -448,6 +514,7 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>
> err_attach:
> kfree(attach);
> +err_constraints:
> mutex_unlock(&dmabuf->lock);
> return ERR_PTR(ret);
> }
> @@ -470,6 +537,8 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
> if (dmabuf->ops->detach)
> dmabuf->ops->detach(dmabuf, attach);
>
> + recalc_constraints(dmabuf, &dmabuf->constraints);
> +
> mutex_unlock(&dmabuf->lock);
> kfree(attach);
> }
> @@ -770,6 +839,56 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
> }
> EXPORT_SYMBOL_GPL(dma_buf_vunmap);
>
> +/**
> + * dma_buf_get_constraints - get the *current* constraints of the dmabuf,
> + * as calculated during each attach(); returns error on invalid inputs
> + *
> + * @dmabuf: [in] buffer to get constraints of
> + * @constraints: [out] current constraints are returned in this
> + */
> +int dma_buf_get_constraints(struct dma_buf *dmabuf,
> + struct device_dma_parameters *constraints)
> +{
> + if (WARN_ON(!dmabuf || !constraints))
> + return -EINVAL;
> +
> + mutex_lock(&dmabuf->lock);
> + *constraints = dmabuf->constraints;
> + mutex_unlock(&dmabuf->lock);
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(dma_buf_get_constraints);
> +
> +/**
> + * dma_buf_recalc_constraints - *recalculate* the constraints for the buffer
> + * afresh, from the list of currently attached devices; this could be useful
> + * to cross-check the current constraints, for exporters that might want to be
> + * 'paranoid' about the device constraints.
> + *
> + * returns error on invalid inputs
> + *
> + * @dmabuf: [in] buffer to get constraints of
> + * @constraints: [out] recalculated constraints are returned in this
> + */
> +int dma_buf_recalc_constraints(struct dma_buf *dmabuf,
> + struct device_dma_parameters *constraints)
> +{
> + struct device_dma_parameters calc_cons;
> + int ret = 0;
> +
> + if (WARN_ON(!dmabuf || !constraints))
> + return -EINVAL;
> +
> + mutex_lock(&dmabuf->lock);
> + ret = recalc_constraints(dmabuf, &calc_cons);
> + if (!ret)
> + *constraints = calc_cons;
> +
> + mutex_unlock(&dmabuf->lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(dma_buf_recalc_constraints);
> +
> #ifdef CONFIG_DEBUG_FS
> static int dma_buf_describe(struct seq_file *s)
> {
> @@ -801,6 +920,11 @@ static int dma_buf_describe(struct seq_file *s)
> buf_obj->file->f_flags, buf_obj->file->f_mode,
> file_count(buf_obj->file),
> buf_obj->exp_name);
> + seq_printf(s, "\tConstraints: Seg Count: %08u, Seg Size: %08u",
> + buf_obj->constraints.max_segment_count,
> + buf_obj->constraints.max_segment_size);
> + seq_printf(s, " seg boundary mask: %08lx\n",
> + buf_obj->constraints.segment_boundary_mask);
>
> seq_puts(s, "\tAttached Devices:\n");
> attach_count = 0;
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 694e1fe1c4b4..489ad9b2e5ae 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -34,6 +34,7 @@
> #include <linux/wait.h>
>
> struct device;
> +struct device_dma_parameters;
> struct dma_buf;
> struct dma_buf_attachment;
>
> @@ -116,6 +117,7 @@ struct dma_buf_ops {
> * @ops: dma_buf_ops associated with this buffer object.
> * @exp_name: name of the exporter; useful for debugging.
> * @list_node: node for dma_buf accounting and debugging.
> + * @constraints: calculated constraints of attached devices.
> * @priv: exporter specific private data for this buffer object.
> * @resv: reservation object linked to this dma-buf
> */
> @@ -130,6 +132,7 @@ struct dma_buf {
> void *vmap_ptr;
> const char *exp_name;
> struct list_head list_node;
> + struct device_dma_parameters constraints;
> void *priv;
> struct reservation_object *resv;
>
> @@ -211,4 +214,8 @@ void *dma_buf_vmap(struct dma_buf *);
> void dma_buf_vunmap(struct dma_buf *, void *vaddr);
> int dma_buf_debugfs_create_file(const char *name,
> int (*write)(struct seq_file *));
> +
> +int dma_buf_get_constraints(struct dma_buf *, struct device_dma_parameters *);
> +int dma_buf_recalc_constraints(struct dma_buf *,
> + struct device_dma_parameters *);
> #endif /* __DMA_BUF_H__ */

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland


2015-02-11 11:13:18

by Russell King - ARM Linux

Subject: Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

On Wed, Feb 11, 2015 at 09:28:37AM +0100, Marek Szyprowski wrote:
> Hello,
>
> On 2015-01-27 09:25, Sumit Semwal wrote:
> >Add some helpers to share the constraints of devices while attaching
> >to the dmabuf buffer.
> >
> >At each attach, the constraints are calculated based on the following:
> >- max_segment_size, max_segment_count, segment_boundary_mask from
> > device_dma_parameters.
> >
> >In case the attaching device's constraints don't match up, attach() fails.
> >
> >At detach, the constraints are recalculated based on the remaining
> >attached devices.
> >
> >Two helpers are added:
> >- dma_buf_get_constraints - which gives the current constraints as calculated
> > during each attach on the buffer till the time,
> >- dma_buf_recalc_constraints - which recalculates the constraints for all
> > currently attached devices for the 'paranoid' ones amongst us.
> >
> >The idea of this patch is largely taken from Rob Clark's RFC at
> >https://lkml.org/lkml/2012/7/19/285, and the comments received on it.
> >
> >Cc: Rob Clark <[email protected]>
> >Signed-off-by: Sumit Semwal <[email protected]>
>
> The code looks okay, although it will probably work well only with
> typical cases like 'contiguous memory needed' or 'no constraints at all'
> (iommu).

Which is a damn good reason to NAK it - by that admission, it's a half-baked
idea.

If all we want to know is whether the importer can accept only contiguous
memory or not, make a flag to do that, and allow the exporter to test this
flag. Don't over-engineer this to make it _seem_ like it can do something
that it actually totally fails with.

As I've already pointed out, there's a major problem if you have already
had a less restrictive attachment which has an active mapping, and a new
more restrictive attachment comes along later.

It seems from Rob's descriptions that we also need another flag in the
importer to indicate whether it wants to have a valid struct page in the
scatter list, or whether it (correctly) uses the DMA accessors on the
scatter list - so that exporters can reject importers which are buggy.
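
To make the flag idea concrete, a sketch of what it could look like - every
name below is hypothetical, nothing like this exists in the kernel today:

/* Hypothetical importer capability bits; they would live in a new
 * 'dmabuf_flags' member of struct device_dma_parameters. */
#define DMABUF_IMP_CONTIG_ONLY	BIT(0)	/* needs contiguous memory */
#define DMABUF_IMP_NEEDS_PAGES	BIT(1)	/* dereferences sg_page()  */

/* Exporter-side test in its attach() callback: an exporter that hands
 * out non-contiguous pages rejects contiguous-only importers up front. */
static int my_attach(struct dma_buf *dmabuf, struct device *dev,
		     struct dma_buf_attachment *attach)
{
	unsigned int flags = dev->dma_parms ?
			     dev->dma_parms->dmabuf_flags : 0;

	if (flags & DMABUF_IMP_CONTIG_ONLY)
		return -EINVAL;	/* cannot satisfy this importer */

	return 0;
}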

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-02-11 11:23:55

by Rob Clark

Subject: Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

On Wed, Feb 11, 2015 at 6:12 AM, Russell King - ARM Linux
<[email protected]> wrote:
> On Wed, Feb 11, 2015 at 09:28:37AM +0100, Marek Szyprowski wrote:
>> Hello,
>>
>> On 2015-01-27 09:25, Sumit Semwal wrote:
>> >Add some helpers to share the constraints of devices while attaching
>> >to the dmabuf buffer.
>> >
>> >At each attach, the constraints are calculated based on the following:
>> >- max_segment_size, max_segment_count, segment_boundary_mask from
>> > device_dma_parameters.
>> >
>> >In case the attaching device's constraints don't match up, attach() fails.
>> >
>> >At detach, the constraints are recalculated based on the remaining
>> >attached devices.
>> >
>> >Two helpers are added:
>> >- dma_buf_get_constraints - which gives the current constraints as calculated
>> > during each attach on the buffer till the time,
>> >- dma_buf_recalc_constraints - which recalculates the constraints for all
>> > currently attached devices for the 'paranoid' ones amongst us.
>> >
>> >The idea of this patch is largely taken from Rob Clark's RFC at
>> >https://lkml.org/lkml/2012/7/19/285, and the comments received on it.
>> >
>> >Cc: Rob Clark <[email protected]>
>> >Signed-off-by: Sumit Semwal <[email protected]>
>>
>> The code looks okay, although it will probably work well only with
>> typical cases like 'contiguous memory needed' or 'no constraints at all'
>> (iommu).
>
> Which is a damn good reason to NAK it - by that admission, it's a half-baked
> idea.
>
> If all we want to know is whether the importer can accept only contiguous
> memory or not, make a flag to do that, and allow the exporter to test this
> flag. Don't over-engineer this to make it _seem_ like it can do something
> that it actually totally fails with.

jfyi, I agree with that.. I think the flag is probably the right
approach to start with. At the end of the day it *is* still just an
in-kernel API (and not something that ends up as userspace ABI) so
when we come up with the use case to make it more generic we can. Vs.
making it look like something more generic when it isn't really yet.

> As I've already pointed out, there's a major problem if you have already
> had a less restrictive attachment which has an active mapping, and a new
> more restrictive attachment comes along later.
>
> It seems from Rob's descriptions that we also need another flag in the
> importer to indicate whether it wants to have a valid struct page in the
> scatter list, or whether it (correctly) uses the DMA accessors on the
> scatter list - so that exporters can reject importers which are buggy.

to be completely generic, we would really need a way that the device
could take over only just the last iommu (in case there were multiple
levels of address translation)..

I'm not completely sure, but I *think* the other ARM GPUs have their
own internal MMU for doing context switching, etc, so if there is an
additional IOMMU in front of them they may actually still want to use
the normal DMA APIs. Someone please contradict me if I am wrong. If
this ends up being an issue only for msm, then I'm completely ok with
the easier option of a less generic solution..

BR,
-R

>
> --
> FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
> according to speedtest.net.

2015-02-11 12:20:34

by Marek Szyprowski

Subject: Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

Hello,

On 2015-02-11 12:12, Russell King - ARM Linux wrote:
> On Wed, Feb 11, 2015 at 09:28:37AM +0100, Marek Szyprowski wrote:
>> On 2015-01-27 09:25, Sumit Semwal wrote:
>>> Add some helpers to share the constraints of devices while attaching
>>> to the dmabuf buffer.
>>>
>>> At each attach, the constraints are calculated based on the following:
>>> - max_segment_size, max_segment_count, segment_boundary_mask from
>>> device_dma_parameters.
>>>
>>> In case the attaching device's constraints don't match up, attach() fails.
>>>
>>> At detach, the constraints are recalculated based on the remaining
>>> attached devices.
>>>
>>> Two helpers are added:
>>> - dma_buf_get_constraints - which gives the current constraints as calculated
>>> during each attach on the buffer till the time,
>>> - dma_buf_recalc_constraints - which recalculates the constraints for all
>>> currently attached devices for the 'paranoid' ones amongst us.
>>>
>>> The idea of this patch is largely taken from Rob Clark's RFC at
>>> https://lkml.org/lkml/2012/7/19/285, and the comments received on it.
>>>
>>> Cc: Rob Clark <[email protected]>
>>> Signed-off-by: Sumit Semwal <[email protected]>
>> The code looks okay, although it will probably work well only with
>> typical cases like 'contiguous memory needed' or 'no constraints at all'
>> (iommu).
> Which is a damn good reason to NAK it - by that admission, it's a half-baked
> idea.
>
> If all we want to know is whether the importer can accept only contiguous
> memory or not, make a flag to do that, and allow the exporter to test this
> flag. Don't over-engineer this to make it _seem_ like it can do something
> that it actually totally fails with.
>
> As I've already pointed out, there's a major problem if you have already
> had a less restrictive attachment which has an active mapping, and a new
> more restrictive attachment comes along later.
>
> It seems from Rob's descriptions that we also need another flag in the
> importer to indicate whether it wants to have a valid struct page in the
> scatter list, or whether it (correctly) uses the DMA accessors on the
> scatter list - so that exporters can reject importers which are buggy.

Okay, but the flag-based approach also has limitations.

Frankly, if we want to make it really portable and sharable between devices,
then IMO we should get rid of struct scatterlist and replace it with a simple
array of pfns in dma_buf. This way at least the problem of the missing struct
page will be solved and the buffer representation will also be a bit more
compact.

Such a solution, however, also requires extending the dma-mapping API to
handle mapping and unmapping of such pfn arrays. The advantage of this
approach is the fact that it would be a completely new API, so it can be
designed well from the beginning.
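
A rough sketch of that direction - again, every name below is hypothetical,
since neither the structure nor the mapping entry points exist:

/* dma_buf would carry an array of pfns instead of an sg_table: */
struct dma_buf_pfns {
	unsigned long *pfns;	/* one entry per page of the buffer */
	unsigned int nr_pfns;
};

/* ... and the dma-mapping API would grow matching entry points that
 * return one bus address per pfn for the given device: */
int dma_map_pfns(struct device *dev, const unsigned long *pfns,
		 unsigned int nr_pfns, dma_addr_t *dma_addrs,
		 enum dma_data_direction dir);
void dma_unmap_pfns(struct device *dev, dma_addr_t *dma_addrs,
		    unsigned int nr_pfns, enum dma_data_direction dir);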

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2015-02-11 12:55:53

by Daniel Vetter

Subject: Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

On Wed, Feb 11, 2015 at 06:23:52AM -0500, Rob Clark wrote:
> On Wed, Feb 11, 2015 at 6:12 AM, Russell King - ARM Linux
> <[email protected]> wrote:
> > As I've already pointed out, there's a major problem if you have already
> > had a less restrictive attachment which has an active mapping, and a new
> > more restrictive attachment comes along later.
> >
> > It seems from Rob's descriptions that we also need another flag in the
> > importer to indicate whether it wants to have a valid struct page in the
> > scatter list, or whether it (correctly) uses the DMA accessors on the
> > scatter list - so that exporters can reject importers which are buggy.
>
> to be completely generic, we would really need a way that the device
> could take over only just the last iommu (in case there were multiple
> levels of address translation)..

I still hold that if the dma api steals the iommu your gpu needs for
context switching then that's a bug in the platform setup code. dma api
really doesn't have any concept of switchable hw contexts. So trying to
work around this brokenness by mandating it as a valid dma-buf use-case is
totally backwards.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2015-02-11 13:30:55

by Rob Clark

Subject: Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

On Wed, Feb 11, 2015 at 7:56 AM, Daniel Vetter <[email protected]> wrote:
> On Wed, Feb 11, 2015 at 06:23:52AM -0500, Rob Clark wrote:
>> On Wed, Feb 11, 2015 at 6:12 AM, Russell King - ARM Linux
>> <[email protected]> wrote:
>> > As I've already pointed out, there's a major problem if you have already
>> > had a less restrictive attachment which has an active mapping, and a new
>> > more restrictive attachment comes along later.
>> >
>> > It seems from Rob's descriptions that we also need another flag in the
>> > importer to indicate whether it wants to have a valid struct page in the
>> > scatter list, or whether it (correctly) uses the DMA accessors on the
>> > scatter list - so that exporters can reject importers which are buggy.
>>
>> to be completely generic, we would really need a way that the device
>> could take over only just the last iommu (in case there were multiple
>> levels of address translation)..
>
> I still hold that if the dma api steals the iommu your gpu needs for
> context switching then that's a bug in the platform setup code. dma api
> really doesn't have any concept of switchable hw contexts. So trying to
> work around this brokenness by mandating it as a valid dma-buf use-case is
> totally backwards.

sure, my only point is that if I'm the odd man out, I can live with a
hack (ie. requiring drm/msm to be aware enough of the platform to know
if there is >1 level of address translation and frob'ing my 'struct
device' accordingly)... no point in a generic solution for one user.
I like to be practical.

BR,
-R

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

2015-02-11 16:23:28

by Russell King - ARM Linux

Subject: Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote:
> Hello,
>
> On 2015-02-11 12:12, Russell King - ARM Linux wrote:
> >Which is a damn good reason to NAK it - by that admission, it's a half-baked
> >idea.
> >
> >If all we want to know is whether the importer can accept only contiguous
> >memory or not, make a flag to do that, and allow the exporter to test this
> >flag. Don't over-engineer this to make it _seem_ like it can do something
> >that it actually totally fails with.
> >
> >As I've already pointed out, there's a major problem if you have already
> >had a less restrictive attachment which has an active mapping, and a new
> >more restrictive attachment comes along later.
> >
> >It seems from Rob's descriptions that we also need another flag in the
> >importer to indicate whether it wants to have a valid struct page in the
> >scatter list, or whether it (correctly) uses the DMA accessors on the
> >scatter list - so that exporters can reject importers which are buggy.
>
> Okay, but the flag-based approach also has limitations.

Yes, the flag-based approach doesn't let you describe in detail what
the importer can accept - which, given the issues that I've raised
is a *good* thing. We won't be misleading anyone into thinking that
we can do something that's really half-baked, and which we have no
present requirement for.

This is precisely what Linus talks about when he says "don't over-
engineer" - if we over-engineer this, we end up with something that
sort-of works, and that's a bad thing.

The Keep It Simple approach here makes total sense - what are our
current requirements - to be able to say that an importer can only accept:
- contiguous memory rather than a scatterlist
- scatterlists with struct page pointers

Does solving that need us to compare all the constraints of each and
every importer, possibly ending up with constraints which can't be
satisfied? No. Does the flag approach satisfy the requirements? Yes.

> Frankly, if we want to make it really portable and sharable between devices,
> then IMO we should get rid of struct scatterlist and replace it with simple
> array of pfns in dma_buf. This way at least the problem of missing struct
> page will be solved and the buffer representation will be also a bit more
> compact.

... and move the mapping and unmapping of the PFNs to the importer,
which IMHO is where it should already be (so the importer can decide
when it should map the buffer itself independently of getting the
details of the buffer.)

> Such solution however also requires extending dma-mapping API to handle
> mapping and unmapping of such pfn arrays. The advantage of this approach
> is the fact that it will be completely new API, so it can be designed
> well from the beginning.

As far as I'm aware, there was no big discussion of the dma_buf API -
it's something that just appeared one day (I don't remember seeing it
discussed.) So, that may well be a good thing if it means we can get
these kinds of details better hammered out.

However, I don't share your view of "completely new API" - that would
be far too disruptive. I think we can modify the existing API, to
achieve our goals.

I don't think changing the dma-mapping API just for this case is really
on though - if we're passed a list of PFNs, they either must be all
associated with a struct page - iow, pfn_valid(pfn) returns true for
all of them or none of them. If none of them, then we need to be able
to convert those PFNs to a dma_addr_t for the target device (yes, that
may need the dma-mapping API augmenting.)

However, if they are associated with a struct page, then we should use
the established APIs and use a scatterlist, otherwise we're looking
at rewriting all IOMMUs and all DMA mapping implementations to handle
what would become a special case for dma_buf.

I'd rather... Keep It Simple.

So, maybe, as a first step, let's augment dma_buf with a pair of
functions which get the "raw" unmapped scatterlist:

struct sg_table *dma_buf_get_scatterlist(struct dma_buf_attachment *attach)
{
	struct sg_table *sg_table;

	might_sleep();

	if (!attach->dmabuf->ops->get_scatterlist)
		return ERR_PTR(-EINVAL);

	sg_table = attach->dmabuf->ops->get_scatterlist(attach);
	if (!sg_table)
		sg_table = ERR_PTR(-ENOMEM);

	return sg_table;
}

void dma_buf_put_scatterlist(struct dma_buf_attachment *attach,
			     struct sg_table *sg_table)
{
	might_sleep();

	attach->dmabuf->ops->put_scatterlist(attach, sg_table);
}

Implementations should arrange for dma_buf_get_scatterlist() to return
the EINVAL error pointer if they are unable to provide an unmapped
scatterlist (in other words, they are exporting a set of PFNs or
already-mapped dma_addr_t's.) This can be done by either not
implementing the get_scatterlist method, or by implementing it and
returning the aforementioned error pointer value.

Where these two are implemented and used, the importer is responsible
for calling dma_map_sg() and dma_unmap_sg() on the returned scatterlist
table.
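
A minimal sketch of that importer-side flow, using the pair proposed above
together with the established DMA API:

static int importer_use_buffer(struct dma_buf_attachment *attach)
{
	struct sg_table *sgt;
	int n;

	sgt = dma_buf_get_scatterlist(attach);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	/* The importer, not the exporter, maps for its own device. */
	n = dma_map_sg(attach->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	if (n == 0) {
		dma_buf_put_scatterlist(attach, sgt);
		return -ENOMEM;
	}

	/* ... program the device using the first n mapped entries ... */

	dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	dma_buf_put_scatterlist(attach, sgt);
	return 0;
}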

unsigned long *dma_buf_get_pfns(struct dma_buf_attachment *attach)
{
	might_sleep();

	if (!attach->dmabuf->ops->get_pfns)
		return ERR_PTR(-EINVAL);

	return attach->dmabuf->ops->get_pfns(attach);
}

void dma_buf_put_pfns(struct dma_buf_attachment *attach, unsigned long *pfns)
{
	might_sleep();

	attach->dmabuf->ops->put_pfns(attach, pfns);
}

Similar to the above, but this gets a list of PFNs. Each PFN entry prior
to the last describes a page starting at offset 0 extending to the end of
the page. The last PFN entry describes a page starting at offset 0 and
extending to the offset of the attachment size within the page. Again,
if not implemented or it is not possible to represent the buffer as PFNs,
it returns -EINVAL.
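
A sketch of how an importer might interpret such an array under those rules
(illustration only; walk_pfns() is a made-up helper):

static void walk_pfns(const unsigned long *pfns, size_t size)
{
	unsigned int i, n = DIV_ROUND_UP(size, PAGE_SIZE);

	for (i = 0; i < n; i++) {
		/* Every entry covers a full page except possibly the last. */
		size_t len = (i == n - 1) ?
			size - (size_t)(n - 1) * PAGE_SIZE : PAGE_SIZE;

		/* pfn_valid() distinguishes page-backed memory from raw
		 * bus addresses with no struct page behind them. */
		if (pfn_valid(pfns[i]))
			pr_debug("pfn %#lx (page %p), %zu bytes\n",
				 pfns[i], pfn_to_page(pfns[i]), len);
	}
}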

For the time being, we keep the existing dma_buf_map_attachment() and
dma_buf_unmap_attachment() while we transition users over to the new
interfaces.

We may wish to augment struct dma_buf_attachment with a couple of flags
to indicate which of these the attached dma_buf supports, so that drivers
can deal with these themselves.

We may also wish in the longer term to keep dma_buf_map_attachment() but
implement it as a wrapper around get_scatterlist() as a helper to provide
its existing functionality - providing a mapped scatterlist. Possibly
also including get_pfns() in that as well if we need to.
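
That wrapper might look something like this (a sketch only, with error
handling kept minimal):

struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
					enum dma_data_direction dir)
{
	struct sg_table *sgt = dma_buf_get_scatterlist(attach);

	if (IS_ERR(sgt))
		return sgt;

	/* Map on the caller's behalf to preserve the old behaviour. */
	if (!dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir)) {
		dma_buf_put_scatterlist(attach, sgt);
		return ERR_PTR(-ENOMEM);
	}

	return sgt;
}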

However, I would still like more thought put into Rob's issue to see
whether we can solve his problem with using the dma_addr_t in a more
elegant way. (I wish I had some hardware to experiment with for that.)

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-05-05 16:07:53

by Sumit Semwal

Subject: Re: [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

Hi Russell, everyone,

First up, sincere apologies for being awol for sometime; had some
personal / medical things to take care of, and then I thought I'd wait
for the merge window to get over before beginning to discuss this
again.

On 11 February 2015 at 21:53, Russell King - ARM Linux
<[email protected]> wrote:
> On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote:
>> Hello,
>>
>> On 2015-02-11 12:12, Russell King - ARM Linux wrote:
>> >Which is a damn good reason to NAK it - by that admission, it's a half-baked
>> >idea.
>> >
>> >If all we want to know is whether the importer can accept only contiguous
>> >memory or not, make a flag to do that, and allow the exporter to test this
>> >flag. Don't over-engineer this to make it _seem_ like it can do something
>> >that it actually totally fails with.
>> >
>> >As I've already pointed out, there's a major problem if you have already
>> >had a less restrictive attachment which has an active mapping, and a new
>> >more restrictive attachment comes along later.
>> >
>> >It seems from Rob's descriptions that we also need another flag in the
>> >importer to indicate whether it wants to have a valid struct page in the
>> >scatter list, or whether it (correctly) uses the DMA accessors on the
>> >scatter list - so that exporters can reject importers which are buggy.
>>
>> Okay, but the flag-based approach also has limitations.
>
> Yes, the flag-based approach doesn't let you describe in detail what
> the importer can accept - which, given the issues that I've raised
> is a *good* thing. We won't be misleading anyone into thinking that
> we can do something that's really half-baked, and which we have no
> present requirement for.
>
> This is precisely what Linus talks about when he says "don't over-
> engineer" - if we over-engineer this, we end up with something that
> sort-of works, and that's a bad thing.
>
> The Keep It Simple approach here makes total sense - what are our
> current requirements - to be able to say that an importer can only accept:
> - contiguous memory rather than a scatterlist
> - scatterlists with struct page pointers
>
> Does solving that need us to compare all the constraints of each and
> every importer, possibly ending up with constraints which can't be
> satisfied? No. Does the flag approach satisfy the requirements? Yes.
>

So, for basic constraint-sharing, we'll just go with the flag-based
approach, with a flag (best place for it is still dev->dma_params I
suppose) for denoting contiguous or scatterlist. Is that agreed, then?
Also, with this idea, of course, there won't be any helpers for trying
to calculate constraints; it would be totally the exporter's
responsibility to handle it via the attach() dma_buf_op if it wishes
to.

>> Frankly, if we want to make it really portable and sharable between devices,
>> then IMO we should get rid of struct scatterlist and replace it with simple
>> array of pfns in dma_buf. This way at least the problem of missing struct
>> page will be solved and the buffer representation will be also a bit more
>> compact.
>
> ... and move the mapping and unmapping of the PFNs to the importer,
> which IMHO is where it should already be (so the importer can decide
> when it should map the buffer itself independently of getting the
> details of the buffer.)
>
>> Such solution however also requires extending dma-mapping API to handle
>> mapping and unmapping of such pfn arrays. The advantage of this approach
>> is the fact that it will be completely new API, so it can be designed
>> well from the beginning.
>
> As far as I'm aware, there was no big discussion of the dma_buf API -
> it's something that just appeared one day (I don't remember seeing it
> discussed.) So, that may well be a good thing if it means we can get
> these kinds of details better hammered out.
>
> However, I don't share your view of "completely new API" - that would
> be far too disruptive. I think we can modify the existing API, to
> achieve our goals.
>
> I don't think changing the dma-mapping API just for this case is really
> on though - if we're passed a list of PFNs, they either must be all
> associated with a struct page - iow, pfn_valid(pfn) returns true for
> all of them or none of them. If none of them, then we need to be able
> to convert those PFNs to a dma_addr_t for the target device (yes, that
> may need the dma-mapping API augmenting.)
>
> However, if they are associated with a struct page, then we should use
> the established APIs and use a scatterlist, otherwise we're looking
> at rewriting all IOMMUs and all DMA mapping implementations to handle
> what would become a special case for dma_buf.
>
> I'd rather... Keep It Simple.
>
+1 for Keep it simple, and the idea overall. Though I suspect more
dma-buf users (dri / v4l friends?) should comment if this doesn't help
solve things on some platforms / subsystems that they care about.

> So, maybe, as a first step, let's augment dma_buf with a pair of
> functions which get the "raw" unmapped scatterlist:
>
> struct sg_table *dma_buf_get_scatterlist(struct dma_buf_attachment *attach)
> {
> struct sg_table *sg_table;
>
> might_sleep();
>
> if (!attach->dmabuf->ops->get_scatterlist)
> return ERR_PTR(-EINVAL);
>
> sg_table = attach->dmabuf->ops->get_scatterlist(attach);
> if (!sg_table)
> sg_table = ERR_PTR(-ENOMEM);
>
> return sg_table;
> }
>
> void dma_buf_put_scatterlist(struct dma_buf_attachment *attach,
> struct sg_table *sg_table)
> {
> might_sleep();
>
> attach->dmabuf->ops->put_scatterlist(attach, sg_table);
> }
>
> Implementations should arrange for dma_buf_get_scatterlist() to return
> the EINVAL error pointer if they are unable to provide an unmapped
> scatterlist (in other words, they are exporting a set of PFNs or
> already-mapped dma_addr_t's.) This can be done by either not
> implementing the get_scatterlist method, or by implementing it and
> returning the aforementioned error pointer value.
>
> Where these two are implemented and used, the importer is responsible
> for calling dma_map_sg() and dma_unmap_sg() on the returned scatterlist
> table.
>
> unsigned long *dma_buf_get_pfns(struct dma_buf_attachment *attach)
> {
> unsigned long *pfns;
>
> might_sleep();
>
> if (!attach->dmabuf->ops->get_pfns)
> return ERR_PTR(-EINVAL);
>
> return attach->dmabuf->ops->get_pfns(attach);
> }
>
> void dma_buf_put_pfns(struct dma_buf_attachment *attach, unsigned long *pfns)
> {
> might_sleep();
>
> attach->dmabuf->ops->put_pfns(attach, pfns);
> }
>
> Similar to the above, but this gets a list of PFNs. Each PFN entry prior
> to the last describes a page starting at offset 0 extending to the end of
> the page. The last PFN entry describes a page starting at offset 0 and
> extending to the offset of the attachment size within the page. Again,
> if not implemented or it is not possible to represent the buffer as PFNs,
> it returns -EINVAL.
>
> For the time being, we keep the existing dma_buf_map_attachment() and
> dma_buf_unmap_attachment() while we transition users over to the new
> interfaces.
>
> We may wish to augment struct dma_buf_attachment with a couple of flags
> to indicate which of these the attached dma_buf supports, so that drivers
> can deal with these themselves.
>
Could I please send a patchset first for the constraint-flags
addition, and then take this up in a following patch-set, once we have
agreement from other stakeholders as well?

Russell: would it be ok if we discuss it offline more? I am afraid I
am not as knowledgeable on this topic, and would probably request your
help / guidance here.

> We may also wish in the longer term to keep dma_buf_map_attachment() but
> implement it as a wrapper around get_scatterlist() as a helper to provide
> its existing functionality - providing a mapped scatterlist. Possibly
> also including get_pfns() in that as well if we need to.
>
> However, I would still like more thought put into Rob's issue to see
> whether we can solve his problem with using the dma_addr_t in a more
> elegant way. (I wish I had some hardware to experiment with for that.)
>
> --
> FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
> according to speedtest.net.

Best regards,
~Sumit.

2015-06-03 06:40:11

by Hans Verkuil

Subject: Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

Hi Sumit,

On 05/05/2015 04:41 PM, Sumit Semwal wrote:
> Hi Russell, everyone,
>
> First up, sincere apologies for being awol for sometime; had some
> personal / medical things to take care of, and then I thought I'd wait
> for the merge window to get over before beginning to discuss this
> again.
>
> On 11 February 2015 at 21:53, Russell King - ARM Linux
> <[email protected]> wrote:
>> On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote:
>>> Hello,
>>>
>>> On 2015-02-11 12:12, Russell King - ARM Linux wrote:
>>>> Which is a damn good reason to NAK it - by that admission, it's a half-baked
>>>> idea.
>>>>
>>>> If all we want to know is whether the importer can accept only contiguous
>>>> memory or not, make a flag to do that, and allow the exporter to test this
>>>> flag. Don't over-engineer this to make it _seem_ like it can do something
>>>> that it actually totally fails with.
>>>>
>>>> As I've already pointed out, there's a major problem if you have already
>>>> had a less restrictive attachment which has an active mapping, and a new
>>>> more restrictive attachment comes along later.
>>>>
>>>> It seems from Rob's descriptions that we also need another flag in the
>>>> importer to indicate whether it wants to have a valid struct page in the
>>>> scatter list, or whether it (correctly) uses the DMA accessors on the
>>>> scatter list - so that exporters can reject importers which are buggy.
>>>
>>> Okay, but the flag-based approach also has limitations.
>>
>> Yes, the flag-based approach doesn't let you describe in detail what
>> the importer can accept - which, given the issues that I've raised
>> is a *good* thing. We won't be misleading anyone into thinking that
>> we can do something that's really half-baked, and which we have no
>> present requirement for.
>>
>> This is precisely what Linus talks about when he says "don't over-
>> engineer" - if we over-engineer this, we end up with something that
>> sort-of works, and that's a bad thing.
>>
>> The Keep It Simple approach here makes total sense - what are our
>> current requirements - to be able to say that an importer can only accept:
>> - contiguous memory rather than a scatterlist
>> - scatterlists with struct page pointers
>>
>> Does solving that need us to compare all the constraints of each and
>> every importer, possibly ending up with constraints which can't be
>> satisfied? No. Does the flag approach satisfy the requirements? Yes.
>>
>
> So, for basic constraint-sharing, we'll just go with the flag based
> approach, with a flag (best place for it is still dev->dma_params I
> suppose) for denoting contiguous or scatterlist. Is that agreed, then?
> Also, with this idea, of course, there won't be any helpers for trying
> to calculate constraints; it would be totally the exporter's
> responsibility to handle it via the attach() dma_buf_op if it wishes
> to.

What's wrong with the proposed max_segment_count? Many media devices do have a
limited max_segment_count and that should be taken into account.

Although to be fair I have to say that in practice the limited segment counts
are all well above what is needed for a normal capture buffer, the limit might
still be reached if you have video strides > line width, so each line has its
own segment (or two, if there is a page boundary in the middle of the line).
It's a somewhat contrived use case, though I have done this once myself.

Anyway, the worst-case number of segments would be:

height * ((bytesperline + PAGE_SIZE - 1) / PAGE_SIZE)

That might well exceed the max segment count for some devices.
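
As a hypothetical example: a 1080-line buffer with a 5120-byte stride on
4 KiB pages needs up to 1080 * ceil(5120 / 4096) = 2160 segments in the
worst case - far beyond a device limited to, say, 256 segments.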

I think I have also seen devices in the past with a coarse-grained IOMMU that
uses very large segment sizes but has a very limited segment count. We don't
have media drivers like that, and to be honest I am not sure whether we should
bother too much with this corner case.

>>> Frankly, if we want to make it really portable and sharable between devices,
>>> then IMO we should get rid of struct scatterlist and replace it with simple
>>> array of pfns in dma_buf. This way at least the problem of missing struct
>>> page will be solved and the buffer representation will be also a bit more
>>> compact.
>>
>> ... and move the mapping and unmapping of the PFNs to the importer,
>> which IMHO is where it should already be (so the importer can decide
>> when it should map the buffer itself independently of getting the
>> details of the buffer.)
>>
>>> Such solution however also requires extending dma-mapping API to handle
>>> mapping and unmapping of such pfn arrays. The advantage of this approach
>>> is the fact that it will be completely new API, so it can be designed
>>> well from the beginning.
>>
>> As far as I'm aware, there was no big discussion of the dma_buf API -
>> it's something that just appeared one day (I don't remember seeing it
>> discussed.) So, that may well be a good thing if it means we can get
>> these kinds of details better hammered out.
>>
>> However, I don't share your view of "completely new API" - that would
>> be far too disruptive. I think we can modify the existing API, to
>> achieve our goals.
>>
>> I don't think changing the dma-mapping API just for this case is really
>> on though - if we're passed a list of PFNs, they either must be all
>> associated with a struct page - iow, pfn_valid(pfn) returns true for
>> all of them or none of them. If none of them, then we need to be able
>> to convert those PFNs to a dma_addr_t for the target device (yes, that
>> may need the dma-mapping API augmenting.)
>>
>> However, if they are associated with a struct page, then we should use
>> the established APIs and use a scatterlist, otherwise we're looking
>> at rewriting all IOMMUs and all DMA mapping implementations to handle
>> what would become a special case for dma_buf.
>>
>> I'd rather... Keep It Simple.
>>
> +1 for Keep it simple, and the idea overall. Though I suspect more
> dma-buf users (dri / v4l friends?) should comment if this doesn't help
> solve things on some platforms / subsystems that they care about.
>
>> So, maybe, as a first step, let's augment dma_buf with a pair of
>> functions which get the "raw" unmapped scatterlist:
>>
>> struct sg_table *dma_buf_get_scatterlist(struct dma_buf_attachment *attach)
>> {
>> struct sg_table *sg_table;
>>
>> might_sleep();
>>
>> if (!attach->dmabuf->ops->get_scatterlist)
>> return ERR_PTR(-EINVAL);
>>
>> sg_table = attach->dmabuf->ops->get_scatterlist(attach);
>> if (!sg_table)
>> sg_table = ERR_PTR(-ENOMEM);
>>
>> return sg_table;
>> }
>>
>> void dma_buf_put_scatterlist(struct dma_buf_attachment *attach,
>> struct sg_table *sg_table)
>> {
>> might_sleep();
>>
>> attach->dmabuf->ops->put_scatterlist(attach, sg_table);
>> }
>>
>> Implementations should arrange for dma_buf_get_scatterlist() to return
>> the EINVAL error pointer if they are unable to provide an unmapped
>> scatterlist (in other words, they are exporting a set of PFNs or
>> already-mapped dma_addr_t's.) This can be done by either not
>> implementing the get_scatterlist method, or by implementing it and
>> returning the aforementioned error pointer value.
>>
>> Where these two are implemented and used, the importer is responsible
>> for calling dma_map_sg() and dma_unmap_sg() on the returned scatterlist
>> table.
>>
>> unsigned long *dma_buf_get_pfns(struct dma_buf_attachment *attach)
>> {
>> unsigned long *pfns;
>>
>> might_sleep();
>>
>> if (!attach->dmabuf->ops->get_pfns)
>> return ERR_PTR(-EINVAL);
>>
>> return attach->dmabuf->ops->get_pfns(attach);
>> }
>>
>> void dma_buf_put_pfns(struct dma_buf_attachment *attach, unsigned long *pfns)
>> {
>> might_sleep();
>>
>> attach->dmabuf->ops->put_pfns(attach, pfns);
>> }
>>
>> Similar to the above, but this gets a list of PFNs. Each PFN entry prior
>> to the last describes a page starting at offset 0 extending to the end of
>> the page. The last PFN entry describes a page starting at offset 0 and
>> extending to the offset of the attachment size within the page. Again,
>> if not implemented or it is not possible to represent the buffer as PFNs,
>> it returns -EINVAL.
>>
>> For the time being, we keep the existing dma_buf_map_attachment() and
>> dma_buf_unmap_attachment() while we transition users over to the new
>> interfaces.
>>
>> We may wish to augment struct dma_buf_attachment with a couple of flags
>> to indicate which of these the attached dma_buf supports, so that drivers
>> can deal with these themselves.
>>
> Could I please send a patchset first for the constraint-flags
> addition, and then take this up in a following patch-set, once we have
> agreement from other stakeholders as well?

+1

One of the main problems end-users are faced with today is that they do not
know which device should be the exporter of buffers and which should be the
importer. This depends on the constraints and right now applications have
no way of knowing this. It's nuts that this hasn't been addressed yet since
it is the main complaint I am getting.

And it is something that the V4L2 API needs to export to userland anyway for
both the above and some other V4L2-specific reasons.

BTW, one thing I didn't see in the constraints discussion (but I didn't read
everything, so I may very well have missed it), is reporting whether memory
is opaque (i.e. cannot be mapped by non-trusted devices). No, I don't have
such hardware, but I know it exists. All the constraints mentioned so far
apply to such memory as well, but this is an additional flag (or perhaps some
trust ID). I don't think anyone made drivers for such hardware, although I
recently saw someone working on something like that.

Does anyone know about someone working in this area? Or existing code for this?

All I am saying is that this is sure to appear at some point.

Regards,

Hans

> Russell: would it be ok if we discuss it offline more? I am afraid I
> am not as knowledgeable on this topic, and would probably request your
> help / guidance here.
>
>> We may also wish in the longer term to keep dma_buf_map_attachment() but
>> implement it as a wrapper around get_scatterlist() as a helper to provide
>> its existing functionality - providing a mapped scatterlist. Possibly
>> also including get_pfns() in that as well if we need to.
>>
>> However, I would still like more thought put into Rob's issue to see
>> whether we can solve his problem with using the dma_addr_t in a more
>> elegant way. (I wish I had some hardware to experiment with for that.)
>>
>> --
>> FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
>> according to speedtest.net.
>
> Best regards,
> ~Sumit.
>

2015-06-03 08:41:39

by Russell King - ARM Linux

Subject: Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

On Wed, Jun 03, 2015 at 08:39:55AM +0200, Hans Verkuil wrote:
> Hi Sumit,
>
> On 05/05/2015 04:41 PM, Sumit Semwal wrote:
> > Hi Russell, everyone,
> >
> > First up, sincere apologies for being awol for sometime; had some
> > personal / medical things to take care of, and then I thought I'd wait
> > for the merge window to get over before beginning to discuss this
> > again.
> >
> > On 11 February 2015 at 21:53, Russell King - ARM Linux
> > <[email protected]> wrote:
> >> On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote:
> >>> Hello,
> >>>
> >>> On 2015-02-11 12:12, Russell King - ARM Linux wrote:
> >>>> Which is a damn good reason to NAK it - by that admission, it's a half-baked
> >>>> idea.
> >>>>
> >>>> If all we want to know is whether the importer can accept only contiguous
> >>>> memory or not, make a flag to do that, and allow the exporter to test this
> >>>> flag. Don't over-engineer this to make it _seem_ like it can do something
> >>>> that it actually totally fails with.
> >>>>
> >>>> As I've already pointed out, there's a major problem if you have already
> >>>> had a less restrictive attachment which has an active mapping, and a new
> >>>> more restrictive attachment comes along later.
> >>>>
> >>>> It seems from Rob's descriptions that we also need another flag in the
> >>>> importer to indicate whether it wants to have a valid struct page in the
> >>>> scatter list, or whether it (correctly) uses the DMA accessors on the
> >>>> scatter list - so that exporters can reject importers which are buggy.
> >>>
> >>> Okay, but the flag-based approach also has limitations.
> >>
> >> Yes, the flag-based approach doesn't let you describe in detail what
> >> the importer can accept - which, given the issues that I've raised
> >> is a *good* thing. We won't be misleading anyone into thinking that
> >> we can do something that's really half-baked, and which we have no
> >> present requirement for.
> >>
> >> This is precisely what Linus talks about when he says "don't over-
> >> engineer" - if we over-engineer this, we end up with something that
> >> sort-of works, and that's a bad thing.
> >>
> >> The Keep It Simple approach here makes total sense - what are our
> >> current requirements - to be able to say that an importer can only accept:
> >> - contiguous memory rather than a scatterlist
> >> - scatterlists with struct page pointers
> >>
> >> Does solving that need us to compare all the constraints of each and
> >> every importer, possibly ending up with constraints which can't be
> >> satisfied? No. Does the flag approach satisfy the requirements? Yes.
> >>
> >
> > So, for basic constraint-sharing, we'll just go with the flag based
> > approach, with a flag (best place for it is still dev->dma_params I
> > suppose) for denoting contiguous or scatterlist. Is that agreed, then?
> > Also, with this idea, of course, there won't be any helpers for trying
> > to calculate constraints; it would be totally the exporter's
> > responsibility to handle it via the attach() dma_buf_op if it wishes
> > to.
>
> What's wrong with the proposed max_segment_count? Many media devices do
> have a limited max_segment_count and that should be taken into account.

So what happens if you have a dma_buf exporter, and several dma_buf
importers. One dma_buf importer attaches to the exporter, and asks
for the buffer, and starts making use of the buffer. This export has
many scatterlist segments.

Another dma_buf importer attaches to the same buffer, and now asks for
the buffer, but the number of scatterlist segments exceeds its
requirement.

You can't reallocate the buffer because it's in-use by another importer.
There is no way to revoke the buffer from the other importer. So there
is no way to satisfy this importer's requirements.

What I'm showing is that the idea that exporting these parameters fixes
some problem is just an illusion - it may work for the single importer
case, but doesn't for the multiple importer case.

Importers really have two choices here: either they accept what the
exporter is giving them, or they reject it.

The other issue here is that DMA scatterlists are _not_ really that
determinable in terms of number of entries when it comes to systems with
system IOMMUs. System IOMMUs, which should be integrated into the DMA
API, are permitted to coalesce entries in the physical page range. For
example:

nsg = 128;
n = dma_map_sg(dev, sg, nsg, DMA_TO_DEVICE);

Here, n might be 4 if the system IOMMU has been able to coalesce the 128
entries down to 4 IOMMU entries - and that means for DMA purposes, only
the first four scatterlist entries should be walked (this is why
dma_map_sg() returns a positive integer when mapping.)

Each struct device has a set of parameters which control how the IOMMU
entries are coalesced:

struct device_dma_parameters {
	/*
	 * a low level driver may set these to teach IOMMU code about
	 * sg limitations.
	 */
	unsigned int max_segment_size;
	unsigned long segment_boundary_mask;
};

and this is independent of the dma_buf API. This doesn't indicate the
maximum number of segments, and as I've shown above, there is no way to
say "I want a scatterlist for this memory with only 32 segments", so it's
totally unclear how an exporter would limit that.

The only thing an exporter could do would be to fail the export if the
buffer didn't end up having fewer than the requested scatterlist entries,
which is something the importer can do too.
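
For illustration, that importer-side check could look like this
(MY_HW_MAX_SEGS is a made-up hardware limit):

#define MY_HW_MAX_SEGS	32

static int importer_check_segs(struct device *dev, struct sg_table *sgt)
{
	/* dma_map_sg() may coalesce; n is the count that matters for DMA. */
	int n = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_TO_DEVICE);

	if (n == 0)
		return -ENOMEM;
	if (n > MY_HW_MAX_SEGS) {
		/* Reject: still too fragmented for this device. */
		dma_unmap_sg(dev, sgt->sgl, sgt->nents, DMA_TO_DEVICE);
		return -EINVAL;
	}
	return 0;	/* proceed, walking only the first n entries */
}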

> One of the main problems end-users are faced with today is that they do not
> know which device should be the exporter of buffers and which should be the
> importer. This depends on the constraints and right now applications have
> no way of knowing this. It's nuts that this hasn't been addressed yet since
> it is the main complaint I am getting.

It's nuts that we've ended up in this situation in the first place. This
was bound to happen as soon as the dma_buf sharing was introduced, because
it immediately introduced this problem. I don't think there is any easy
solution to it, and what's being proposed with flags and other stuff is
just trying to paper over the problem.

What you're actually asking is that each dmabuf exporting subsystem needs
to publish their DMA parameters to userspace, and userspace then gets to
decide which dmabuf exporter should be used.

That's not a dmabuf problem, that's a subsystem problem, but even so, we
don't have a standardised way to export that information (and I'd suspect
that it would be very difficult to get agreements between subsystems on
a standard ioctl and/or data structure.) In my experience, getting cross-
subsystem agreement in the kernel with anything is very difficult, you
normally end up with 60% of people agreeing, and the other 40% going off
and doing something completely different because they object to it
(figures vary, 90% of all statistics are made up on the spot!)

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-06-03 09:39:17

by Hans Verkuil

Subject: Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

On 06/03/15 10:41, Russell King - ARM Linux wrote:
> On Wed, Jun 03, 2015 at 08:39:55AM +0200, Hans Verkuil wrote:
>> Hi Sumit,
>>
>> On 05/05/2015 04:41 PM, Sumit Semwal wrote:
>>> Hi Russell, everyone,
>>>
>>> First up, sincere apologies for being awol for sometime; had some
>>> personal / medical things to take care of, and then I thought I'd wait
>>> for the merge window to get over before beginning to discuss this
>>> again.
>>>
>>> On 11 February 2015 at 21:53, Russell King - ARM Linux
>>> <[email protected]> wrote:
>>>> On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote:
>>>>> Hello,
>>>>>
>>>>> On 2015-02-11 12:12, Russell King - ARM Linux wrote:
>>>>>> Which is a damn good reason to NAK it - by that admission, it's a half-baked
>>>>>> idea.
>>>>>>
>>>>>> If all we want to know is whether the importer can accept only contiguous
>>>>>> memory or not, make a flag to do that, and allow the exporter to test this
>>>>>> flag. Don't over-engineer this to make it _seem_ like it can do something
>>>>>> that it actually totally fails with.
>>>>>>
>>>>>> As I've already pointed out, there's a major problem if you have already
>>>>>> had a less restrictive attachment which has an active mapping, and a new
>>>>>> more restrictive attachment comes along later.
>>>>>>
>>>>>> It seems from Rob's descriptions that we also need another flag in the
>>>>>> importer to indicate whether it wants to have a valid struct page in the
>>>>>> scatter list, or whether it (correctly) uses the DMA accessors on the
>>>>>> scatter list - so that exporters can reject importers which are buggy.
>>>>>
>>>>> Okay, but the flag-based approach also has limitations.
>>>>
>>>> Yes, the flag-based approach doesn't let you describe in detail what
>>>> the importer can accept - which, given the issues that I've raised
>>>> is a *good* thing. We won't be misleading anyone into thinking that
>>>> we can do something that's really half-baked, and which we have no
>>>> present requirement for.
>>>>
>>>> This is precisely what Linus talks about when he says "don't over-
>>>> engineer" - if we over-engineer this, we end up with something that
>>>> sort-of works, and that's a bad thing.
>>>>
>>>> The Keep It Simple approach here makes total sense - what are our
>>>> current requirements - to be able to say that an importer can only accept:
>>>> - contiguous memory rather than a scatterlist
>>>> - scatterlists with struct page pointers
>>>>
>>>> Does solving that need us to compare all the constraints of each and
>>>> every importer, possibly ending up with constraints which can't be
>>>> satisfied? No. Does the flag approach satisfy the requirements? Yes.
>>>>
>>>
>>> So, for basic constraint-sharing, we'll just go with the flag-based
>>> approach, with a flag (best place for it is still dev->dma_params I
>>> suppose) for denoting contiguous or scatterlist. Is that agreed, then?
>>> Also, with this idea, of course, there won't be any helpers for trying
>>> to calculate constraints; it would be totally the exporter's
>>> responsibility to handle it via the attach() dma_buf_op if it wishes
>>> to.
>>
>> What's wrong with the proposed max_segment_count? Many media devices do
>> have a limited max_segment_count and that should be taken into account.
>
> So what happens if you have a dma_buf exporter, and several dma_buf
> importers. One dma_buf importer attaches to the exporter, and asks
> for the buffer, and starts making use of the buffer. This export has
> many scatterlist segments.
>
> Another dma_buf importer attaches to the same buffer, and now asks for
> the buffer, but the number of scatterlist segments exceeds its
> requirement.
>
> You can't reallocate the buffer because it's in-use by another importer.
> There is no way to revoke the buffer from the other importer. So there
> is no way to satisfy this importer's requirements.
>
> What I'm showing is that the idea that exporting these parameters fixes
> some problem is just an illusion - it may work for the single importer
> case, but doesn't for the multiple importer case.
>
> Importers really have two choices here: either they accept what the
> exporter is giving them, or they reject it.

I agree completely with that.

> The other issue here is that DMA scatterlists are _not_ really that
> determinable in terms of number of entries when it comes to systems with
> system IOMMUs. System IOMMUs, which should be integrated into the DMA
> API, are permitted to coalesce entries in the physical page range. For
> example:
>
> nsg = 128;
> n = dma_map_sg(dev, sg, nsg, DMA_TO_DEVICE);
>
> Here, n might be 4 if the system IOMMU has been able to coalesce the 128
> entries down to 4 IOMMU entries - and that means for DMA purposes, only
> the first four scatterlist entries should be walked (this is why
> dma_map_sg() returns a positive integer when mapping.)
>
> Each struct device has a set of parameters which control how the IOMMU
> entries are coalesced:
>
> struct device_dma_parameters {
> /*
> * a low level driver may set these to teach IOMMU code about
> * sg limitations.
> */
> unsigned int max_segment_size;
> unsigned long segment_boundary_mask;
> };
>
> and this is independent of the dma_buf API. This doesn't indicate the
> maximum number of segments, but as I've shown above, it's not something
> that you can say "I want a scatterlist for this memory with only 32
> segments" so it's totally unclear how an exporter would limit that.
>
> The only thing an exporter could do would be to fail the export if the
> buffer didn't end up having fewer than the requested scatterlist entries,
> which is something the importer can do too.

Right.
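
To spell that rule out for anyone less familiar with the DMA API, a
minimal, untested sketch (program_device() is a stand-in for the
driver's own hardware setup, not a real function):

static int map_and_program(struct device *dev, struct scatterlist *sgl,
                           int nents)
{
        struct scatterlist *sg;
        int i, count;

        count = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);
        if (count == 0)
                return -ENOMEM;

        /* sg_dma_address()/sg_dma_len() are only valid for the first
         * 'count' entries after mapping */
        for_each_sg(sgl, sg, count, i)
                program_device(sg_dma_address(sg), sg_dma_len(sg));

        /* unmapping takes the original nents, not the mapped count */
        dma_unmap_sg(dev, sgl, nents, DMA_TO_DEVICE);
        return 0;
}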

>> One of the main problems end-users are faced with today is that they do not
>> know which device should be the exporter of buffers and which should be the
>> importer. This depends on the constraints and right now applications have
>> no way of knowing this. It's nuts that this hasn't been addressed yet since
>> it is the main complaint I am getting.
>
> It's nuts that we've ended up in this situation in the first place. This
> was bound to happen as soon as the dma_buf sharing was introduced, because
> it immediately introduced this problem. I don't think there is any easy
> solution to it, and what's being proposed with flags and other stuff is
> just trying to paper over the problem.

This was the first thing raised in the initial discussions. My suggestion at
the time was to give userspace limited information about the buffer restrictions:
Physically contiguous, scatter-gather and 'weird'. But obviously you need
segment_boundary_mask and max_segment_size as well.

And the application can decide based on that info which device has the most
restrictive requirements and make that the exporter.

I am not sure whether there is any sense in exporting the max_segment_count
to userspace (probably not), but I see no reason why we can't set it internally.
That said, a single flag is OK for me as well.
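
Setting it internally is trivial once a driver wires up
dev->dma_parms; a sketch of that (dma_set_max_seg_count() is assumed
here as the counterpart to the getter added in patch 1/2 of this
series, the other two setters already exist in the DMA API):

static int example_set_dma_limits(struct device *dev)
{
        int ret;

        /* these return -EIO if dev->dma_parms was never allocated */
        ret = dma_set_max_seg_size(dev, SZ_64K);
        if (ret)
                return ret;

        ret = dma_set_seg_boundary(dev, SZ_4K - 1);
        if (ret)
                return ret;

        /* assumed setter matching dma_get_max_seg_count() from 1/2 */
        return dma_set_max_seg_count(dev, 32);
}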

> What you're actually asking is that each dmabuf exporting subsystem needs
> to publish their DMA parameters to userspace, and userspace then gets to
> decide which dmabuf exporter should be used.

Yes, that was the initial plan.

> That's not a dmabuf problem, that's a subsystem problem,

Well, yes, but it doesn't hurt to keep the DMA parameters that get
exposed in sync with what dma-buf uses.

> but even so, we
> don't have a standardised way to export that information (and I'd suspect
> that it would be very difficult to get agreements between subsystems on
> a standard ioctl and/or data structure.) In my experience, getting cross-
> subsystem agreement in the kernel with anything is very difficult, you
> normally end up with 60% of people agreeing, and the other 40% going off
> and doing something completely different because they object to it
> (figures vary, 90% of all statistics are made up on the spot!)

I don't care which ioctl or other mechanism a subsystem uses to expose the
information. Each subsystem should design their own method for that. Imposing
a standard API for that generally doesn't work for the reasons you give.

But deciding *which* minimum set of information is exposed is another
matter, and that should be synced. The easiest starting point for that
is for the device to store that information internally in
device_dma_parameters.

The various subsystems can then make APIs to expose that info.
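
Just as a strawman for that minimum set - all names and flag values
below are invented, this is not a proposal for an actual UAPI:

/* purely illustrative record of what a subsystem could expose */
struct hypothetical_buffer_caps {
        __u32 flags;                    /* contiguous / SG / 'weird' */
        __u32 max_segment_count;        /* possibly kernel-internal only */
        __u32 max_segment_size;
        __u64 segment_boundary_mask;
};

#define HYP_BUF_CAP_CONTIG      (1 << 0)
#define HYP_BUF_CAP_SG          (1 << 1)
#define HYP_BUF_CAP_WEIRD       (1 << 2)

Each subsystem would then carry this in whatever ioctl or sysfs form
suits it.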

Regards,

Hans

2015-06-04 05:24:33

by Sumit Semwal

[permalink] [raw]
Subject: Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms

On 3 June 2015 at 15:07, Hans Verkuil <[email protected]> wrote:
> On 06/03/15 10:41, Russell King - ARM Linux wrote:
>> On Wed, Jun 03, 2015 at 08:39:55AM +0200, Hans Verkuil wrote:
>>> Hi Sumit,
>>>
>>> On 05/05/2015 04:41 PM, Sumit Semwal wrote:
>>>> Hi Russell, everyone,
>>>>
>>>> First up, sincere apologies for being AWOL for some time; had some
>>>> personal / medical things to take care of, and then I thought I'd wait
>>>> for the merge window to get over before beginning to discuss this
>>>> again.
>>>>
>>>> On 11 February 2015 at 21:53, Russell King - ARM Linux
>>>> <[email protected]> wrote:
>>>>> On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On 2015-02-11 12:12, Russell King - ARM Linux wrote:
>>>>>>> Which is a damn good reason to NAK it - by that admission, it's a half-baked
>>>>>>> idea.
>>>>>>>
>>>>>>> If all we want to know is whether the importer can accept only contiguous
>>>>>>> memory or not, make a flag to do that, and allow the exporter to test this
>>>>>>> flag. Don't over-engineer this to make it _seem_ like it can do something
>>>>>>> that it actually totally fails with.
>>>>>>>
>>>>>>> As I've already pointed out, there's a major problem if you have already
>>>>>>> had a less restrictive attachment which has an active mapping, and a new
>>>>>>> more restrictive attachment comes along later.
>>>>>>>
>>>>>>> It seems from Rob's descriptions that we also need another flag in the
>>>>>>> importer to indicate whether it wants to have a valid struct page in the
>>>>>>> scatter list, or whether it (correctly) uses the DMA accessors on the
>>>>>>> scatter list - so that exporters can reject importers which are buggy.
>>>>>>
>>>>>> Okay, but a flag-based approach also has limitations.
>>>>>
>>>>> Yes, the flag-based approach doesn't let you describe in detail what
>>>>> the importer can accept - which, given the issues that I've raised,
>>>>> is a *good* thing. We won't be misleading anyone into thinking that
>>>>> we can do something that's really half-baked, and which we have no
>>>>> present requirement for.
>>>>>
>>>>> This is precisely what Linus talks about when he says "don't over-
>>>>> engineer" - if we over-engineer this, we end up with something that
>>>>> sort-of works, and that's a bad thing.
>>>>>
>>>>> The Keep It Simple approach here makes total sense - what are our
>>>>> current requirements - to be able to say that an importer can only accept:
>>>>> - contiguous memory rather than a scatterlist
>>>>> - scatterlists with struct page pointers
>>>>>
>>>>> Does solving that need us to compare all the constraints of each and
>>>>> every importer, possibly ending up with constraints which can't be
>>>>> satisfied? No. Does the flag approach satisfy the requirements? Yes.
>>>>>
>>>>
>>>> So, for basic constraint-sharing, we'll just go with the flag-based
>>>> approach, with a flag (best place for it is still dev->dma_params I
>>>> suppose) for denoting contiguous or scatterlist. Is that agreed, then?
>>>> Also, with this idea, of course, there won't be any helpers for trying
>>>> to calculate constraints; it would be totally the exporter's
>>>> responsibility to handle it via the attach() dma_buf_op if it wishes
>>>> to.
>>>
>>> What's wrong with the proposed max_segment_count? Many media devices do
>>> have a limited max_segment_count and that should be taken into account.
>>
>> So what happens if you have a dma_buf exporter, and several dma_buf
>> importers. One dma_buf importer attaches to the exporter, and asks
>> for the buffer, and starts making use of the buffer. This export has
>> many scatterlist segments.
>>
>> Another dma_buf importer attaches to the same buffer, and now asks for
>> the buffer, but the number of scatterlist segments exceeds its
>> requirement.

So, in the midst of all the various directions this discussion has
taken, I seem to have missed reiterating the base premise of this
suggestion [1] - that we can use this information to help implement
deferred allocation, so that all the importers can attach first, and
the exporter does the actual allocation on the first map() call.
This is also in line with the prescribed dma_buf_attach() /
dma_buf_map_attachment() sequence - ideally speaking, all
participating 'importers' of a dma_buf should only attach first, and
then map() at a 'later' time, usually right before actually using the
buffer. A rough sketch of such an exporter follows the note below.
Note: at present, neither the DRI nor the V4L subsystem does that;
while proposing this RFC I deliberately kept that separate, as it is a
related but orthogonal problem to solve. I guess I should address that
in parallel.
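
To make that concrete, a rough and untested sketch of what an
exporter's map_dma_buf op could look like with deferred allocation -
my_buffer and my_alloc_for_constraints() are invented for
illustration, they are not part of any existing API:

/* hypothetical exporter state */
struct my_buffer {
        struct mutex lock;
        bool allocated;
        struct device_dma_parameters cons;      /* merged at attach() */
        struct sg_table sgt;                    /* filled once allocated */
};

/* assumed helper: picks an allocator satisfying 'cons' (CMA if the
 * merged constraints demand contiguous memory, something looser
 * otherwise) and fills buf->sgt */
int my_alloc_for_constraints(struct my_buffer *buf);

static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
                                       enum dma_data_direction dir)
{
        struct my_buffer *buf = attach->dmabuf->priv;
        int ret = 0;

        mutex_lock(&buf->lock);
        if (!buf->allocated) {
                /* first map(): allocate for the merged constraints */
                ret = my_alloc_for_constraints(buf);
                if (!ret)
                        buf->allocated = true;
        }
        mutex_unlock(&buf->lock);
        if (ret)
                return ERR_PTR(ret);

        /* a real exporter would keep one mapped table per attachment */
        if (!dma_map_sg(attach->dev, buf->sgt.sgl, buf->sgt.orig_nents, dir))
                return ERR_PTR(-ENOMEM);

        return &buf->sgt;
}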
>>
>> You can't reallocate the buffer because it's in-use by another importer.
>> There is no way to revoke the buffer from the other importer. So there
>> is no way to satisfy this importer's requirements.
>>
You're right; but with deferred allocation, this constraint-sharing
can at least help decide on the most restrictive allocation at the
time of the first map(), and if an importer with more relaxed
constraints attaches later, it can still use the same buffer.
A more restrictive importer would still not be allowed, but in that
case the exporter can refuse the attach, and feedback to userspace is
possible.

>> What I'm showing is that the idea that exporting these parameters fixes
>> some problem is just an illusion - it may work for the single importer
>> case, but doesn't for the multiple importer case.
>>
>> Importers really have two choices here: either they accept what the
>> exporter is giving them, or they reject it.
>
> I agree completely with that.
>
In the non-deferred allocation case, these constraints have no
meaning, since the allocation always depends on whichever subsystem
attaches first, irrespective of the exporter's possible capability to
allocate from different allocators with different constraints; for
example, if a subsystem with relaxed constraints attaches first, a
later attach() request with more restrictive constraints will fail,
even though the exporter might have been able to allocate with the
more restricted constraints.

With deferred allocation, on the other hand, the exporter at least
gets the ability to choose the allocation mechanism based on the most
restrictive of the attached importers, so the order of attach() ceases
to matter.

>> The other issue here is that DMA scatterlists are _not_ really that
>> determinable in terms of number of entries when it comes to systems with
>> system IOMMUs. System IOMMUs, which should be integrated into the DMA
>> API, are permitted to coalesce entries in the physical page range. For
>> example:
>>
>> nsg = 128;
>> n = dma_map_sg(dev, sg, nsg, DMA_TO_DEVICE);
>>
>> Here, n might be 4 if the system IOMMU has been able to coalesce the 128
>> entries down to 4 IOMMU entries - and that means for DMA purposes, only
>> the first four scatterlist entries should be walked (this is why
>> dma_map_sg() returns a positive integer when mapping.)
>>
>> Each struct device has a set of parameters which control how the IOMMU
>> entries are coalesced:
>>
>> struct device_dma_parameters {
>> /*
>> * a low level driver may set these to teach IOMMU code about
>> * sg limitations.
>> */
>> unsigned int max_segment_size;
>> unsigned long segment_boundary_mask;
>> };
>>
>> and this is independent of the dma_buf API. This doesn't indicate the
>> maximum number of segments, but as I've shown above, it's not something
>> that you can say "I want a scatterlist for this memory with only 32
>> segments" so it's totally unclear how an exporter would limit that.
>>
>> The only thing an exporter could do would be to fail the export if the
>> buffer didn't end up having fewer than the requested scatterlist entries,
>> which is something the importer can do too.
>
You're right in that after allocation is done, an exporter can only
fail a more restrictive attach request; but I don't think the importer
has any way of knowing the details of the current allocation, does it?
Unless I misunderstood something.
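The closest I can see is the importer mapping the buffer and counting
the entries itself, as in this minimal, untested sketch ('limit'
stands in for whatever segment count the importing device can
handle) - but that only tells it something after the fact, not at
attach() time:

static struct sg_table *import_and_check(struct dma_buf_attachment *attach,
                                         unsigned int limit)
{
        struct sg_table *sgt;

        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt))
                return sgt;

        if (sgt->nents > limit) {
                /* more DMA segments than we can handle: back out */
                dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
                return ERR_PTR(-EINVAL);
        }

        return sgt;
}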
> Right.
>
>>> One of the main problems end-users are faced with today is that they do not
>>> know which device should be the exporter of buffers and which should be the
>>> importer. This depends on the constraints and right now applications have
>>> no way of knowing this. It's nuts that this hasn't been addressed yet since
>>> it is the main complaint I am getting.
>>
One way to try to solve this is the deferred allocation mechanism
described above; I hope it makes sense to you all, but if it doesn't,
please help me understand why not.

>> It's nuts that we've ended up in this situation in the first place. This
>> was bound to happen as soon as the dma_buf sharing was introduced, because
>> it immediately introduced this problem. I don't think there is any easy
>> solution to it, and what's being proposed with flags and other stuff is
>> just trying to paper over the problem.
>
> This was the first thing raised in the initial discussions. My suggestion at
> the time was to give userspace limited information about the buffer restrictions:
> Physically contiguous, scatter-gather and 'weird'. But obviously you need
> segment_boundary_mask and max_segment_size as well.
>
> And the application can decide based on that info which device has the most
> restrictive requirements and make that the exporter.
>
> I am not sure whether there is any sense in exporting the max_segment_count
> to userspace (probably not), but I see no reason why we can't set it internally.
> That said, a single flag is OK for me as well.
>
>> What you're actually asking is that each dmabuf exporting subsystem needs
>> to publish their DMA parameters to userspace, and userspace then gets to
>> decide which dmabuf exporter should be used.
>
> Yes, that was the initial plan.
>
In the absence of deferred allocation, that could be the other option.
With deferred allocation, we could try to keep this internal to the
kernel.

>> That's not a dmabuf problem, that's a subsystem problem,
>
> Well, yes, but it doesn't hurt to keep the DMA parameters that get
> exposed in sync with what dma-buf uses.
>
>> but even so, we
>> don't have a standardised way to export that information (and I'd suspect
>> that it would be very difficult to get agreements between subsystems on
>> a standard ioctl and/or data structure.) In my experience, getting cross-
>> subsystem agreement in the kernel with anything is very difficult, you
>> normally end up with 60% of people agreeing, and the other 40% going off
>> and doing something completely different because they object to it
>> (figures vary, 90% of all statistics are made up on the spot!)
>
> I don't care which ioctl or other mechanism a subsystem uses to expose the
> information. Each subsystem should design their own method for that. Imposing
> a standard API for that generally doesn't work for the reasons you give.
>
> But deciding *which* minimum set of information is exposed is another
> matter, and that should be synced. The easiest starting point for that
> is for the device to store that information internally in
> device_dma_parameters.
>
> The various subsystems can then make APIs to expose that info.
>
> Regards,
>
> Hans

Best regards,
Sumit.