2013-03-08 16:16:29

by Terje Bergstrom

Subject: Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

On 26.02.2013 11:48, Terje Bergström wrote:
> On 25.02.2013 17:24, Thierry Reding wrote:
>> You use two different styles to indent the function parameters. You
>> might want to stick to one, preferably aligning them with the first
>> parameter on the first line.
>
> I've generally favored "two tabs" indenting, but we'll anyway
> standardize on one.

We standardized on the convention used in tegradrm, i.e. aligning with
first parameter.
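
For example (a hypothetical prototype, just to show the alignment):

static int example_submit(struct host1x_channel *channel,
                          struct host1x_job *job);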

>> There's nothing in this function that requires a platform_device, so
>> passing struct device should be enough. Or maybe host1x_cdma should get
>> a struct device * field?
>
> I think we'll just start using struct device * in general in code.
> Arto's been already fixing a lot of these, so he might've already fixed
> this.

We did a sweep of the code, and now everything that can use struct
device * hopefully does. The side effect was getting rid of a lot of
casting, which is good.

>> Why don't you use any of the kernel's reference counting mechanisms?
>>
>>> +void host1x_channel_put(struct host1x_channel *ch)
>>> +{
>>> +	mutex_lock(&ch->reflock);
>>> +	if (ch->refcount == 1) {
>>> +		host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
>>> +		host1x_cdma_deinit(&ch->cdma);
>>> +	}
>>> +	ch->refcount--;
>>> +	mutex_unlock(&ch->reflock);
>>> +}
>>
>> I think you can do all of this using a kref.
>
> I think the original reason was that there's no reason to use atomic
> kref, as we anyway have to do mutual exclusion via mutex. But, using
> kref won't be any problem, so we could use that.

Actually, we ended up with a problem with this. kref assumes that once
the refcount goes to zero, the object gets destroyed. For ch->refcount,
going to zero is fine and simply indicates that the channel needs to be
initialized again on the next get. And since we need to do locking
anyway, we didn't do the conversion to kref.
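
To make that concrete, the get side of the pattern looks roughly like
this (an illustrative sketch only, reusing the field names from the
quoted put path; the actual patch may differ in detail):

struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
{
	int err = 0;

	mutex_lock(&ch->reflock);
	/* coming from zero just means the channel needs (re)initialization */
	if (ch->refcount == 0)
		err = host1x_cdma_init(&ch->cdma);
	if (!err)
		ch->refcount++;
	mutex_unlock(&ch->reflock);

	return err ? NULL : ch;
}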

>>> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
>>> +{
(...)
>>> +}
>>
>> I think the critical section could be shorter here. It's probably not
>> worth the extra trouble, though, given that channels are not often
>> allocated.
>
> Yeah, boot time isn't measured in microseconds. :-) But, if we just make
> allocated_channels an atomic, we should be able to drop chlist_mutex
> altogether and it could simplify the code.

There wasn't much we could have moved outside the critical section, so
we didn't touch this area.

>> Also, is it really necessary to abstract these into an ops structure? I
>> get that newer hardware revisions might require different ops for sync-
>> point handling because the register layout or number of syncpoints may
>> be different, but the CDMA and push buffer (below) concepts are pretty
>> much a software abstraction, and as such its implementation is unlikely
>> to change with some future hardware revision.
>
> Pushbuffer ops can become generic. There's only one catch - init uses
> the restart opcode. But the opcode is not going to change, so we can
> generalize that.

We ended up keeping the init as an operation, but the rest of the push
buffer ops became generic.

>>
>>> +/*
>>> + * Push two words to the push buffer
>>> + * Caller must ensure push buffer is not full
>>> + */
>>> +static void push_buffer_push_to(struct push_buffer *pb,
>>> +				struct mem_handle *handle,
>>> +				u32 op1, u32 op2)
>>> +{
>>> +	u32 cur = pb->cur;
>>> +	u32 *p = (u32 *)((u32)pb->mapped + cur);
>>
>> You do all this extra casting to make sure to increment by bytes and not
>> 32-bit words. How about you change pb->cur to contain the word index, so
>> that you don't have to go through hoops each time around.

When we changed DMASTART and DMAEND to actually denote the push buffer
area, we noticed that DMAGET and DMAPUT are actually relative to
DMASTART and DMAEND. That, together with the need to access both CPU and
device virtual addresses, meant that changing to word indexes didn't
actually simplify the code, so we kept using byte indexes.
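
To spell that out, the same byte offset can be applied to both views of
the buffer, roughly like this (pb->phys is an assumed field name for the
device-visible base address, following the style of the quoted code):

	u32 *cpu_ptr    = (u32 *)((u32)pb->mapped + pb->cur);	/* CPU virtual address */
	dma_addr_t dput = pb->phys + pb->cur;			/* device-visible address */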

>>
>>> +/*
>>> + * Return the number of two word slots free in the push buffer
>>> + */
>>> +static u32 push_buffer_space(struct push_buffer *pb)
>>> +{
>>> +	return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
>>> +}
>>
>> Why & (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than
>> PUSH_BUFFER_SIZE, can it?
>
> You're right, this function doesn't need to worry about wrapping.

Arto noticed this, but actually I was wrong - the wrapping is very much
possible. We just have to remember that when we're processing something at
the end of the push buffer, cur might be near the end while fence has
already wrapped around to the beginning.
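
For example, if PUSH_BUFFER_SIZE were 4096 and we had cur = 4088 with
fence already wrapped to 16, fence - cur would be negative, but masking
the u32 result with (PUSH_BUFFER_SIZE - 1) gives 24 bytes, i.e. three
free two-word slots.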

>>> diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
>> [...]
>>> +struct mem_handle;
>>> +struct platform_device;
>>> +
>>> +struct host1x_job_unpin_data {
>>> +	struct mem_handle *h;
>>> +	struct sg_table *mem;
>>> +};
>>> +
>>> +enum mem_mgr_flag {
>>> +	mem_mgr_flag_uncacheable = 0,
>>> +	mem_mgr_flag_write_combine = 1,
>>> +};
>>
>> I'd like to see this use a more object-oriented approach and more common
>> terminology. All of these handles are essentially buffer objects, so
>> maybe something like host1x_bo would be a nice and short name.

We did this a bit differently, but following pretty much the same
principles. We have host1x_mem_handle, which contains an ops pointer.
The handle gets encapsulated inside drm_gem_cma_object.

_bo structs usually seem to contain a drm_gem_object, so we thought it
better not to reuse that term.
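
In other words, the generic part is just an ops-carrying handle, roughly
like this (purely illustrative, not the actual declarations):

struct host1x_mem_handle {
	const struct host1x_mem_handle_ops *ops;	/* hypothetical ops struct name */
	/* ... */
};

On the DRM side such a handle is then paired with a drm_gem_cma_object;
the exact nesting is best checked from the code itself.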

Please check the code and let us know what you think. This pretty much
follows what Lucas proposed a while ago, and keeps the DRM-specific parts
neatly inside the drm directory.

Other than these, we should have implemented all changes that we agreed
to include. If something's missing, it's because there were so many that
we just dropped the ball.

Terje


2013-03-08 20:43:11

by Thierry Reding

Subject: Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

On Fri, Mar 08, 2013 at 06:16:16PM +0200, Terje Bergström wrote:
> On 26.02.2013 11:48, Terje Bergström wrote:
> > On 25.02.2013 17:24, Thierry Reding wrote:
[...]
> >>> +struct mem_handle;
> >>> +struct platform_device;
> >>> +
> >>> +struct host1x_job_unpin_data {
> >>> +	struct mem_handle *h;
> >>> +	struct sg_table *mem;
> >>> +};
> >>> +
> >>> +enum mem_mgr_flag {
> >>> +	mem_mgr_flag_uncacheable = 0,
> >>> +	mem_mgr_flag_write_combine = 1,
> >>> +};
> >>
> >> I'd like to see this use a more object-oriented approach and more common
> >> terminology. All of these handles are essentially buffer objects, so
> >> maybe something like host1x_bo would be a nice and short name.
>
> We did this a bit differently, but following pretty much the same
> principles. We have host1x_mem_handle, which contains an ops pointer.
> The handle gets encapsulated inside drm_gem_cma_object.
>
> _bo structs usually seem to contain a drm_gem_object, so we thought it
> better not to reuse that term.
>
> Please check the code and let us know what you think. This pretty much
> follows what Lucas proposed a while ago, and keeps the DRM-specific parts
> neatly inside the drm directory.

A bo is just a buffer object, so I don't see why the name shouldn't be
used. The name is in no way specific to DRM or GEM. But the point that I
was trying to make was that there is nothing to suggest that we couldn't
use drm_gem_object as the underlying scaffold to base all host1x buffer
objects on.

Furthermore I don't understand why you've chosen this approach. It is
completely different from what other drivers do and therefore makes it
more difficult to comprehend. That alone I could live with if there were
any advantages to that approach, but as far as I can tell there are
none.

Thierry



2013-03-11 06:29:29

by Terje Bergstrom

Subject: Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

On 08.03.2013 22:43, Thierry Reding wrote:
> A bo is just a buffer object, so I don't see why the name shouldn't
> be used. The name is in no way specific to DRM or GEM. But the point
> that I was trying to make was that there is nothing to suggest that
> we couldn't use drm_gem_object as the underlying scaffold to base all
> host1x buffer objects on.
>
> Furthermore I don't understand why you've chosen this approach. It
> is completely different from what other drivers do and therefore
> makes it more difficult to comprehend. That alone I could live with
> if there were any advantages to that approach, but as far as I can
> tell there are none.

I was following the plan we agreed on in an earlier email discussion with
you and Lucas:

On 29.11.2012 11:09, Lucas Stach wrote:
> We should aim for a clean split here. GEM handles are something which
> is really specific to how DRM works and as such should be constructed
> by tegradrm. nvhost should really just manage allocations/virtual
> address space and provide something that is able to back all the GEM
> handle operations.
>
> nvhost has really no reason at all to even know about GEM handles.
> If you back a GEM object by a nvhost object you can just peel out
> the nvhost handles from the GEM wrapper in the tegradrm submit ioctl
> handler and queue the job to nvhost using it's native handles.
>
> This way you would also be able to construct different handles (like
> GEM obj or V4L2 buffers) from the same backing nvhost object. Note
> that I'm not sure how useful this would be, but it seems like a
> reasonable design to me being able to do so.

With this structure, we are already prepared for non-DRM APIs. It's a
matter of code familiarity versus future expansion. The code paths for
both approaches are equally simple/complex, so neither has a direct
technical advantage in performance.

I know other DRM drivers have opted to hard-code the GEM dependency
throughout the code. Then again, host1x hardware manages much more than
graphics, so we need to think outside the DRM box, too.

Terje

2013-03-11 07:19:00

by Thierry Reding

Subject: Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

On Mon, Mar 11, 2013 at 08:29:59AM +0200, Terje Bergström wrote:
> On 08.03.2013 22:43, Thierry Reding wrote:
> > A bo is just a buffer object, so I don't see why the name shouldn't
> > be used. The name is in no way specific to DRM or GEM. But the point
> > that I was trying to make was that there is nothing to suggest that
> > we couldn't use drm_gem_object as the underlying scaffold to base all
> > host1x buffer objects on.
> >
> > Furthermore I don't understand why you've chosen this approach. It
> > is completely different from what other drivers do and therefore
> > makes it more difficult to comprehend. That alone I could live with
> > if there were any advantages to that approach, but as far as I can
> > tell there are none.
>
> I was following the plan we agreed on earlier in email discussion with
> you and Lucas:
>
> On 29.11.2012 11:09, Lucas Stach wrote:
> > We should aim for a clean split here. GEM handles are something which
> > is really specific to how DRM works and as such should be constructed
> > by tegradrm. nvhost should really just manage allocations/virtual
> > address space and provide something that is able to back all the GEM
> > handle operations.
> >
> > nvhost has really no reason at all to even know about GEM handles.
> > If you back a GEM object by a nvhost object you can just peel out
> > the nvhost handles from the GEM wrapper in the tegradrm submit ioctl
> > handler and queue the job to nvhost using it's native handles.
> >
> > This way you would also be able to construct different handles (like
> > GEM obj or V4L2 buffers) from the same backing nvhost object. Note
> > that I'm not sure how useful this would be, but it seems like a
> > reasonable design to me being able to do so.
>
> With this structure, we are already prepared for non-DRM APIs. It's a
> matter of code familiarity versus future expansion. The code paths for
> both approaches are equally simple/complex, so neither has a direct
> technical advantage in performance.
>
> I know other DRM drivers have opted to hard-code the GEM dependency
> throughout the code. Then again, host1x hardware manages much more than
> graphics, so we need to think outside the DRM box, too.

This sounds a bit over-engineered at this point in time. DRM is currently
the only user. Is anybody working on any non-DRM drivers that would use
this?

Even that aside, I don't think host1x_mem_handle is a good choice of
name here. The objects are much more than handles. They are in fact
buffer objects, which can optionally be attached to a handle. I also
think that using a void * to store the handle specific data isn't such a
good idea.

So how about the following proposal, which I think might satisfy both of
us:

struct host1x_bo;

struct host1x_bo_ops {
	struct host1x_bo *(*get)(struct host1x_bo *bo);
	void (*put)(struct host1x_bo *bo);
	dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
	...
};

struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
void host1x_bo_put(struct host1x_bo *bo);
dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
...

struct host1x_bo {
	const struct host1x_bo_ops *ops;
	...
};

struct tegra_drm_bo {
	struct host1x_bo base;
	...
};

That way you can get rid of the host1x_memmgr_create_handle() helper and
instead embed host1x_bo into driver-/framework-specific structures with
the necessary initialization.

It also allows you to interact directly with the objects instead of
having to go through the memmgr API. The memory manager doesn't really
exist anymore so keeping the name in the API is only confusing. Your
current proposal deals with memory handles directly already so it's
really just making the naming more consistent.

Thierry



2013-03-11 09:20:34

by Terje Bergstrom

Subject: Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

On 11.03.2013 09:18, Thierry Reding wrote:
> This sounds a bit over-engineered at this point in time. DRM is currently
> the only user. Is anybody working on any non-DRM drivers that would use
> this?

Well, this contains the beginning of that:

http://nv-tegra.nvidia.com/gitweb/?p=linux-2.6.git;a=blob;f=drivers/media/video/tegra_v4l2_camera.c;h=644d0be5380367aca4c826c49724c03aad08387c;hb=l4t/l4t-r16-r2

I don't want to give these guys any excuse not to port it over to host1x
code base. :-)

> Even that aside, I don't think host1x_mem_handle is a good choice of
> name here. The objects are much more than handles. They are in fact
> buffer objects, which can optionally be attached to a handle. I also
> think that using a void * to store the handle specific data isn't such a
> good idea.

Naming is not an issue for me - we can easily agree on using _bo.

> So how about the following proposal, which I think might satisfy both of
> us:
>
> struct host1x_bo;
>
> struct host1x_bo_ops {
> 	struct host1x_bo *(*get)(struct host1x_bo *bo);
> 	void (*put)(struct host1x_bo *bo);
> 	dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
> 	...
> };
>
> struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
> void host1x_bo_put(struct host1x_bo *bo);
> dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
> ...
>
> struct host1x_bo {
> 	const struct host1x_bo_ops *ops;
> 	...
> };
>
> struct tegra_drm_bo {
> 	struct host1x_bo base;
> 	...
> };
>
> That way you can get rid of the host1x_memmgr_create_handle() helper and
> instead embed host1x_bo into driver-/framework-specific structures with
> the necessary initialization.

This would make sense. We'll get back to you when we have enough of the
implementation done to understand it all. One consequence is that we
cannot use drm_gem_cma_create() anymore. We'll have to introduce a
function that does the same as drm_gem_cma_create(), but takes a
pre-allocated drm_gem_cma_object pointer. That way we can allocate the
struct ourselves and use the DRM CMA helper just to initialize the
drm_gem_cma_object.

The other way would be to just take a copy of the DRM CMA helper, but I'd
like to defer that to the next step, when we implement an IOMMU-aware
allocator.

> It also allows you to interact directly with the objects instead of
> having to go through the memmgr API. The memory manager doesn't really
> exist anymore so keeping the name in the API is only confusing. Your
> current proposal deals with memory handles directly already so it's
> really just making the naming more consistent.

The memmgr APIs are currently just shortcut wrappers around the ops, so in
that sense the memmgr does not really exist. I think it might still make
sense to keep static inline wrappers for calling the ops, but we could
rename them to host1x_bo_somethingandother. Then it'd follow the pattern
we are using for the hw ops in the latest set.
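
For example, such a wrapper would be little more than this (a sketch
against the ops-based host1x_bo proposed above, not the actual code):

static inline dma_addr_t host1x_bo_pin(struct host1x_bo *bo,
				       struct sg_table **sgt)
{
	return bo->ops->pin(bo, sgt);
}

static inline void host1x_bo_put(struct host1x_bo *bo)
{
	bo->ops->put(bo);
}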

Terje

2013-03-11 09:41:41

by Thierry Reding

Subject: Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

On Mon, Mar 11, 2013 at 11:21:05AM +0200, Terje Bergström wrote:
> On 11.03.2013 09:18, Thierry Reding wrote:
> > This sounds a bit over-engineered at this point in time. DRM is currently
> > the only user. Is anybody working on any non-DRM drivers that would use
> > this?
>
> Well, this contains the beginning of that:
>
> http://nv-tegra.nvidia.com/gitweb/?p=linux-2.6.git;a=blob;f=drivers/media/video/tegra_v4l2_camera.c;h=644d0be5380367aca4c826c49724c03aad08387c;hb=l4t/l4t-r16-r2
>
> I don't want to give these guys any excuse not to port it over to host1x
> code base. :-)

I was aware of that driver but I didn't realize it had been available
publicly. It's great to see this, though, and one more argument in
favour of not binding the host1x_bo too tightly to DRM/GEM.

> > So how about the following proposal, which I think might satisfy both of
> > us:
> >
> > struct host1x_bo;
> >
> > struct host1x_bo_ops {
> > 	struct host1x_bo *(*get)(struct host1x_bo *bo);
> > 	void (*put)(struct host1x_bo *bo);
> > 	dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
> > 	...
> > };
> >
> > struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
> > void host1x_bo_put(struct host1x_bo *bo);
> > dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
> > ...
> >
> > struct host1x_bo {
> > 	const struct host1x_bo_ops *ops;
> > 	...
> > };
> >
> > struct tegra_drm_bo {
> > 	struct host1x_bo base;
> > 	...
> > };
> >
> > That way you can get rid of the host1x_memmgr_create_handle() helper and
> > instead embed host1x_bo into driver-/framework-specific structures with
> > the necessary initialization.
>
> This would make sense. We'll get back to you when we have enough of the
> implementation done to understand it all. One consequence is that we
> cannot use drm_gem_cma_create() anymore. We'll have to introduce a
> function that does the same as drm_gem_cma_create(), but takes a
> pre-allocated drm_gem_cma_object pointer. That way we can allocate the
> struct ourselves and use the DRM CMA helper just to initialize the
> drm_gem_cma_object.

I certainly think that introducing a drm_gem_cma_object_init() function
shouldn't pose a problem. If you do, make sure to update the existing
drm_gem_cma_create() to use it. Having both lets users have the choice
to use drm_gem_cma_create() if they don't need to embed it, or
drm_gem_cma_object_init() otherwise.
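
Roughly like this, presumably (a sketch of the proposed split only;
drm_gem_cma_object_init() does not exist yet and the exact signatures
would mirror whatever drm_gem_cma_create() takes):

/* new: initialize a caller-allocated, possibly embedded object */
int drm_gem_cma_object_init(struct drm_device *drm,
			    struct drm_gem_cma_object *cma_obj,
			    unsigned int size);

struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
					      unsigned int size)
{
	struct drm_gem_cma_object *cma_obj;
	int ret;

	cma_obj = kzalloc(sizeof(*cma_obj), GFP_KERNEL);
	if (!cma_obj)
		return ERR_PTR(-ENOMEM);

	/* all the actual setup moves into the new init helper */
	ret = drm_gem_cma_object_init(drm, cma_obj, size);
	if (ret) {
		kfree(cma_obj);
		return ERR_PTR(ret);
	}

	return cma_obj;
}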

> The other way would be to just take a copy of the DRM CMA helper, but I'd
> like to defer that to the next step, when we implement an IOMMU-aware
> allocator.

I'm not sure I understand what you're saying, but if you add a function
as discussed above this shouldn't be necessary.

> > It also allows you to interact directly with the objects instead of
> > having to go through the memmgr API. The memory manager doesn't really
> > exist anymore so keeping the name in the API is only confusing. Your
> > current proposal deals with memory handles directly already so it's
> > really just making the naming more consistent.
>
> The memmgr APIs are currently just shortcut wrappers around the ops, so in
> that sense the memmgr does not really exist. I think it might still make
> sense to keep static inline wrappers for calling the ops, but we could
> rename them to host1x_bo_somethingandother. Then it'd follow the pattern
> we are using for the hw ops in the latest set.

Yes, that's exactly what I had in mind in the above proposal. They could
be inline, but it's probably also okay if they're not. They aren't meant
to be used very frequently so the extra function call shouldn't matter
much. It might be easier to add some additional checks if they aren't
inlined. I'm fine either way.

Thierry

