2022-08-15 10:14:04

by Christian König

Subject: Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

On 15.08.22 at 12:09, Dmitry Osipenko wrote:
> On 8/15/22 13:05, Christian König wrote:
>> On 15.08.22 at 11:54, Dmitry Osipenko wrote:
>>> Higher order pages allocated using alloc_pages() aren't refcounted and
>>> they need to be refcounted, otherwise it's impossible to map them by KVM.
>>> This patch sets the refcount of the tail pages and fixes the KVM memory
>>> mapping faults.
>>>
>>> Without this change, the guest virgl driver can't map host buffers into
>>> the guest and can't provide OpenGL 4.5 profile support to the guest. The
>>> host mappings are also needed for enabling the Venus driver using host
>>> GPU drivers that are utilizing TTM.
>>>
>>> Based on a patch proposed by Trigger Huang.
>> Well I can't count how often I have repeated this: This is an absolutely
>> clear NAK!
>>
>> TTM pages are not reference counted in the first place, and because of
>> this, giving them to virgl is illegal.
> Huh? The first page is refcounted when allocated; the tail pages are not.

No, they aren't. The first page just happens to be initialized with a
refcount of 1. This refcount is completely ignored and not used at all.

Incrementing the reference count and thereby mapping the page into some
other address space is illegal and corrupts the internal state tracking
of TTM.

>> Please immediately stop this completely broken approach. We have
>> discussed this multiple times now.
> Could you please give me a link to these discussions?

Not offhand; please search the dri-devel list for similar patches. This
has been brought up multiple times now.

Regards,
Christian.


2022-08-15 10:17:17

by Christian König

Subject: Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

On 15.08.22 at 12:11, Christian König wrote:
> On 15.08.22 at 12:09, Dmitry Osipenko wrote:
>> Huh? The first page is refcounted when allocated; the tail pages are not.
>
> No, they aren't. The first page just happens to be initialized with
> a refcount of 1. This refcount is completely ignored and not used at all.
>
> Incrementing the reference count and thereby mapping the page into
> some other address space is illegal and corrupts the internal state
> tracking of TTM.

See this comment in the source code as well:

        /* Don't set the __GFP_COMP flag for higher order allocations.
         * Mapping pages directly into an userspace process and calling
         * put_page() on a TTM allocated page is illegal.
         */

I have absolutely no idea how somebody had the idea he could do this.

Regards,
Christian.

2022-08-15 10:39:16

by Dmitry Osipenko

Subject: Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

On 8/15/22 13:14, Christian König wrote:
> See this comment in the source code as well:
>
>         /* Don't set the __GFP_COMP flag for higher order allocations.
>          * Mapping pages directly into an userspace process and calling
>          * put_page() on a TTM allocated page is illegal.
>          */
>
> I have absolutely no idea how somebody had the idea he could do this.

I saw this comment, but it doesn't make sense to me because it doesn't
explain why it's illegal. Hence it looks like a bogus comment, since the
refcounting certainly works, at least to some degree, because I haven't
noticed any problems in practice, maybe by luck :)

I'll try to dig out the older discussions, thank you for the quick reply!

--
Best regards,
Dmitry

2022-08-15 11:00:11

by Dmitry Osipenko

Subject: Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

On 8/15/22 13:18, Dmitry Osipenko wrote:
> I saw this comment, but it doesn't make sense to me because it doesn't
> explain why it's illegal. Hence it looks like a bogus comment, since the
> refcounting certainly works, at least to some degree, because I haven't
> noticed any problems in practice, maybe by luck :)
>
> I'll try to dig out the older discussions, thank you for the quick reply!

Are you sure it was really discussed in public previously? All I can
find are your two answers to similar patches, where you say that this is
the wrong solution, without an in-depth explanation or further
discussion.

Maybe it was discussed privately? If so, I will be happy to get more info
from you about the root of the problem so that I can start looking at how
to fix it properly. Where the problem lies isn't apparent to a TTM newbie
like me.

--
Best regards,
Dmitry

2022-08-15 11:00:32

by Christian König

Subject: Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

On 15.08.22 at 12:18, Dmitry Osipenko wrote:
> I saw this comment, but it doesn't make sense to me because it doesn't
> explain why it's illegal. Hence it looks like a bogus comment, since the
> refcounting certainly works, at least to some degree, because I haven't
> noticed any problems in practice, maybe by luck :)

Well, that's exactly the problem. It does not work; you are just lucky :)

I will provide a patch to set the reference count to zero even for
non-compound pages. Maybe that will yield more backtraces from abusers
of this interface.

Regards,
Christian.

>
> I'll try to dig out the older discussions, thank you for the quick reply!
>

2022-08-15 11:03:21

by Christian König

Subject: Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

On 15.08.22 at 12:47, Dmitry Osipenko wrote:
> Are you sure it was really discussed in public previously? All I can
> find are your two answers to similar patches, where you say that this is
> the wrong solution, without an in-depth explanation or further
> discussion.

Yeah, that's my problem as well: I can't find it offhand.

But yes, it was certainly discussed in public.

>
> Maybe it was discussed privately? If so, I will be happy to get more info
> from you about the root of the problem so that I can start looking at how
> to fix it properly. Where the problem lies isn't apparent to a TTM newbie
> like me.
>

Well, this is completely unfixable. The whole purpose of TTM is to keep
track of which parts of a buffer object are mapped where.

If you circumvent that and increase the page reference count yourself,
then that whole functionality can't work correctly anymore.

Regards,
Christian.

2022-08-15 11:51:50

by Dmitry Osipenko

Subject: Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

On 8/15/22 13:51, Christian König wrote:
> On 15.08.22 at 12:47, Dmitry Osipenko wrote:
>> Are you sure it was really discussed in public previously? All I can
>> find are your two answers to similar patches, where you say that this is
>> the wrong solution, without an in-depth explanation or further
>> discussion.
>
> Yeah, that's my problem as well: I can't find it offhand.
>
> But yes, it was certainly discussed in public.

If it was only CC'd to dri-devel, then it could be that the emails didn't
pass spam moderation :/

>> Maybe it was discussed privately? If so, I will be happy to get more info
>> from you about the root of the problem so that I can start looking at how
>> to fix it properly. Where the problem lies isn't apparent to a TTM newbie
>> like me.
>>
>
> Well, this is completely unfixable. The whole purpose of TTM is to keep
> track of which parts of a buffer object are mapped where.
>
> If you circumvent that and increase the page reference count yourself,
> then that whole functionality can't work correctly anymore.

Are you suggesting that the problem is that TTM doesn't see the KVM page
faults/mappings?

--
Best regards,
Dmitry