2019-08-13 09:37:46

by Feng Tang

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Thomas,

On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
> Hi Thomas,
>
> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
> > Hi,
> >
> > >>Actually we run the benchmark as a background process, do we need to
> > >>disable the cursor and test again?
> > >There's a worker thread that updates the display from the shadow buffer.
> > >The blinking cursor periodically triggers the worker thread, but the
> > >actual update is just the size of one character.
> > >
> > >The point of the test without output is to see if the regression comes
> > >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> > >from the worker thread. If the regression goes away after disabling the
> > >blinking cursor, then the worker thread is the problem. If it already
> > >goes away if there's simply no output from the test, the screen update
> > >is the problem. On my machine I have to disable the blinking cursor, so
> > >I think the worker causes the performance drop.
> >
> > We disabled redirecting stdout/stderr to /dev/kmsg,  and the regression is
> > gone.
> >
> > commit:
> >   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
> >   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer
> > emulation
> >
> > f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
> > ----------------  -------------------------- ---------------------------
> >          %stddev      change         %stddev
> >              \          |                \
> >      43785                       44481
> > vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
> >      43785                       44481        GEO-MEAN vm-scalability.median
>
> Till now, from Rong's tests:
> 1. Disabling cursor blinking doesn't cure the regression.
> 2. Disabling printing test results to the console can work around the
> regression.
>
> Also, if we set prefer_shadow to 0, the regression is also
> gone.

We also did some further breakdown of the time consumed by the
new code.

The drm_fb_helper_dirty_work() function sequentially calls:
1. drm_client_buffer_vmap (290 us)
2. drm_fb_helper_dirty_blit_real (19240 us)
3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver
4. drm_client_buffer_vunmap (215 us)

The average run time is listed after the function names.

From this, we can see that drm_fb_helper_dirty_blit_real() takes too
long (about 20 ms for each run). I guess this is the root cause
of the regression, as the original code doesn't use this dirty worker.
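
To make this concrete, here is a simplified sketch of the worker's flow,
abbreviated from drm_fb_helper.c in linux-next (locking and some error
handling are trimmed, so this is not the exact upstream code):

static void drm_fb_helper_dirty_work(struct work_struct *work)
{
	struct drm_fb_helper *helper = container_of(work, struct drm_fb_helper,
						    dirty_work);
	/* snapshot and reset the pending clip; done under helper->dirty_lock */
	struct drm_clip_rect clip = helper->dirty_clip;
	void *vaddr;

	if (helper->buffer) {
		/* 1. map the client buffer (~290 us in our runs) */
		vaddr = drm_client_buffer_vmap(helper->buffer);
		if (IS_ERR(vaddr))
			return;
		/* 2. copy the dirty lines from the shadow buffer (~19240 us) */
		drm_fb_helper_dirty_blit_real(helper, &clip);
	}
	/* 3. the driver's dirty callback -- NULL for the mgag200 driver */
	if (helper->fb->funcs->dirty)
		helper->fb->funcs->dirty(helper->fb, NULL, 0, 0, &clip, 1);
	/* 4. unmap again (~215 us) */
	if (helper->buffer)
		drm_client_buffer_vunmap(helper->buffer);
}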

As said in the last email, setting prefer_shadow to 0 can avoid
the regression. Could it be an option?

Thanks,
Feng

>
> --- a/drivers/gpu/drm/mgag200/mgag200_main.c
> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
> @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
> dev->mode_config.preferred_depth = 16;
> else
> dev->mode_config.preferred_depth = 32;
> - dev->mode_config.prefer_shadow = 1;
> + dev->mode_config.prefer_shadow = 0;
>
> And from the perf data, one obvious difference is that the good case doesn't
> call drm_fb_helper_dirty_work(), while the bad case does.
>
> Thanks,
> Feng
>
> > Best Regards,
> > Rong Chen


2019-08-16 06:56:49

by Feng Tang

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Thomas,

On Tue, Aug 13, 2019 at 05:36:16PM +0800, Feng Tang wrote:
> Hi Thomas,
>
> On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
> > Hi Thomas,
> >
> > On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
> > > Hi,
> > >
> > > >>Actually we run the benchmark as a background process, do we need to
> > > >>disable the cursor and test again?
> > > >There's a worker thread that updates the display from the shadow buffer.
> > > >The blinking cursor periodically triggers the worker thread, but the
> > > >actual update is just the size of one character.
> > > >
> > > >The point of the test without output is to see if the regression comes
> > > >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> > > >from the worker thread. If the regression goes away after disabling the
> > > >blinking cursor, then the worker thread is the problem. If it already
> > > >goes away if there's simply no output from the test, the screen update
> > > >is the problem. On my machine I have to disable the blinking cursor, so
> > > >I think the worker causes the performance drop.
> > >
> > > We disabled redirecting stdout/stderr to /dev/kmsg,  and the regression is
> > > gone.
> > >
> > > commit:
> > >   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
> > >   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer
> > > emulation
> > >
> > > f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
> > > ----------------  -------------------------- ---------------------------
> > >          %stddev      change         %stddev
> > >              \          |                \
> > >      43785                       44481
> > > vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
> > >      43785                       44481        GEO-MEAN vm-scalability.median
> >
> > Till now, from Rong's tests:
> > 1. Disabling cursor blinking doesn't cure the regression.
> > 2. Disabling printing test results to the console can work around the
> > regression.
> >
> > Also, if we set prefer_shadow to 0, the regression is also
> > gone.
>
> We also did some further breakdown of the time consumed by the
> new code.
>
> The drm_fb_helper_dirty_work() function sequentially calls:
> 1. drm_client_buffer_vmap (290 us)
> 2. drm_fb_helper_dirty_blit_real (19240 us)
> 3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver
> 4. drm_client_buffer_vunmap (215 us)
>
> The average run time is listed after the function names.
>
> From this, we can see that drm_fb_helper_dirty_blit_real() takes too
> long (about 20 ms for each run). I guess this is the root cause
> of the regression, as the original code doesn't use this dirty worker.
>
> As said in the last email, setting prefer_shadow to 0 can avoid
> the regression. Could it be an option?

Any comments on this? Thanks.

- Feng

>
> Thanks,
> Feng
>
> >
> > --- a/drivers/gpu/drm/mgag200/mgag200_main.c
> > +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
> > @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
> > dev->mode_config.preferred_depth = 16;
> > else
> > dev->mode_config.preferred_depth = 32;
> > - dev->mode_config.prefer_shadow = 1;
> > + dev->mode_config.prefer_shadow = 0;
> >
> > And from the perf data, one obvious difference is that the good case doesn't
> > call drm_fb_helper_dirty_work(), while the bad case does.
> >
> > Thanks,
> > Feng
> >
> > > Best Regards,
> > > Rong Chen

2019-08-22 17:39:31

by Thomas Zimmermann

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi

I was traveling and could not reply earlier. Sorry for taking so long.

On 13.08.19 at 11:36, Feng Tang wrote:
> Hi Thomas,
>
> On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
>> Hi Thomas,
>>
>> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
>>> Hi,
>>>
>>>>> Actually we run the benchmark as a background process, do we need to
>>>>> disable the cursor and test again?
>>>> There's a worker thread that updates the display from the shadow buffer.
>>>> The blinking cursor periodically triggers the worker thread, but the
>>>> actual update is just the size of one character.
>>>>
>>>> The point of the test without output is to see if the regression comes
>>>> from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
>>>> from the worker thread. If the regression goes away after disabling the
>>>> blinking cursor, then the worker thread is the problem. If it already
>>>> goes away if there's simply no output from the test, the screen update
>>>> is the problem. On my machine I have to disable the blinking cursor, so
>>>> I think the worker causes the performance drop.
>>>
>>> We disabled redirecting stdout/stderr to /dev/kmsg,  and the regression is
>>> gone.
>>>
>>> commit:
>>>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>>>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer
>>> emulation
>>>
>>> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
>>> ----------------  -------------------------- ---------------------------
>>>          %stddev      change         %stddev
>>>              \          |                \
>>>      43785                       44481
>>> vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
>>>      43785                       44481        GEO-MEAN vm-scalability.median
>>
>> Till now, from Rong's tests:
>> 1. Disabling cursor blinking doesn't cure the regression.
>> 2. Disabling printing test results to the console can work around the
>> regression.
>>
>> Also, if we set prefer_shadow to 0, the regression is also
>> gone.
>
> We also did some further breakdown of the time consumed by the
> new code.
>
> The drm_fb_helper_dirty_work() function sequentially calls:
> 1. drm_client_buffer_vmap (290 us)
> 2. drm_fb_helper_dirty_blit_real (19240 us)
> 3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver
> 4. drm_client_buffer_vunmap (215 us)
>

It's somewhat different from what I observed, but maybe I just couldn't
reproduce the problem correctly.

> The average run time is listed after the function names.
>
> From this, we can see that drm_fb_helper_dirty_blit_real() takes too
> long (about 20 ms for each run). I guess this is the root cause
> of the regression, as the original code doesn't use this dirty worker.

True, the original code uses a temporary buffer, but updates the display
immediately.

My guess is that this could be a caching problem. The worker runs on a
different CPU, which doesn't have the shadow buffer in cache.

> As said in the last email, setting prefer_shadow to 0 can avoid
> the regression. Could it be an option?

Unfortunately not. Without the shadow buffer, the console's display
buffer permanently resides in video memory. It consumes a significant
amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave
enough room for anything else.
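
(Rough arithmetic, assuming a full-HD console at 32 bpp: 1920 * 1080 * 4
bytes is about 7.9 MiB, i.e. half of a 16 MiB VRAM.)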

The best option is to not print to the console.

Best regards
Thomas

> Thanks,
> Feng
>
>>
>> --- a/drivers/gpu/drm/mgag200/mgag200_main.c
>> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
>> @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
>> dev->mode_config.preferred_depth = 16;
>> else
>> dev->mode_config.preferred_depth = 32;
>> - dev->mode_config.prefer_shadow = 1;
>> + dev->mode_config.prefer_shadow = 0;
>>
>> And from the perf data, one obvious difference is that the good case doesn't
>> call drm_fb_helper_dirty_work(), while the bad case does.
>>
>> Thanks,
>> Feng
>>
>>> Best Regards,
>>> Rong Chen

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)



2019-08-23 08:00:01

by Dave Airlie

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

On Fri, 23 Aug 2019 at 03:25, Thomas Zimmermann <[email protected]> wrote:
>
> Hi
>
> I was traveling and could not reply earlier. Sorry for taking so long.
>
> On 13.08.19 at 11:36, Feng Tang wrote:
> > Hi Thomas,
> >
> > On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
> >> Hi Thomas,
> >>
> >> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
> >>> Hi,
> >>>
> >>>>> Actually we run the benchmark as a background process, do we need to
> >>>>> disable the cursor and test again?
> >>>> There's a worker thread that updates the display from the shadow buffer.
> >>>> The blinking cursor periodically triggers the worker thread, but the
> >>>> actual update is just the size of one character.
> >>>>
> >>>> The point of the test without output is to see if the regression comes
> >>>> from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> >>>> from the worker thread. If the regression goes away after disabling the
> >>>> blinking cursor, then the worker thread is the problem. If it already
> >>>> goes away if there's simply no output from the test, the screen update
> >>>> is the problem. On my machine I have to disable the blinking cursor, so
> >>>> I think the worker causes the performance drop.
> >>>
> >>> We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is
> >>> gone.
> >>>
> >>> commit:
> >>>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
> >>>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer
> >>> emulation
> >>>
> >>> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
> >>> ----------------  -------------------------- ---------------------------
> >>>          %stddev      change         %stddev
> >>>              \          |                \
> >>>      43785                       44481
> >>> vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
> >>>      43785                       44481        GEO-MEAN vm-scalability.median
> >>
> >> Till now, from Rong's tests:
> >> 1. Disabling cursor blinking doesn't cure the regression.
> >> 2. Disabling printing test results to the console can work around the
> >> regression.
> >>
> >> Also, if we set prefer_shadow to 0, the regression is also
> >> gone.
> >
> > We also did some further breakdown of the time consumed by the
> > new code.
> >
> > The drm_fb_helper_dirty_work() function sequentially calls:
> > 1. drm_client_buffer_vmap (290 us)
> > 2. drm_fb_helper_dirty_blit_real (19240 us)
> > 3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver
> > 4. drm_client_buffer_vunmap (215 us)
> >
>
> It's somewhat different from what I observed, but maybe I just couldn't
> reproduce the problem correctly.
>
> > The average run time is listed after the function names.
> >
> > From this, we can see that drm_fb_helper_dirty_blit_real() takes too
> > long (about 20 ms for each run). I guess this is the root cause
> > of the regression, as the original code doesn't use this dirty worker.
>
> True, the original code uses a temporary buffer, but updates the display
> immediately.
>
> My guess is that this could be a caching problem. The worker runs on a
> different CPU, which doesn't have the shadow buffer in cache.
>
> > As said in the last email, setting prefer_shadow to 0 can avoid
> > the regression. Could it be an option?
>
> Unfortunately not. Without the shadow buffer, the console's display
> buffer permanently resides in video memory. It consumes a significant
> amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave
> enough room for anything else.
>
> The best option is to not print to the console.

Wait a second, I thought the driver did an eviction on modeset of the
scanned-out object; this was a deliberate design decision made when
writing those drivers. Has this been removed in favour of gem and
generic code paths?

Dave.

2019-08-23 22:48:21

by Thomas Zimmermann

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi

On 22.08.19 at 22:02, Dave Airlie wrote:
> On Fri, 23 Aug 2019 at 03:25, Thomas Zimmermann <[email protected]> wrote:
>>
>> Hi
>>
>> I was traveling and could not reply earlier. Sorry for taking so long.
>>
>> On 13.08.19 at 11:36, Feng Tang wrote:
>>> Hi Thomas,
>>>
>>> On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
>>>> Hi Thomas,
>>>>
>>>> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
>>>>> Hi,
>>>>>
>>>>>>> Actually we run the benchmark as a background process, do we need to
>>>>>>> disable the cursor and test again?
>>>>>> There's a worker thread that updates the display from the shadow buffer.
>>>>>> The blinking cursor periodically triggers the worker thread, but the
>>>>>> actual update is just the size of one character.
>>>>>>
>>>>>> The point of the test without output is to see if the regression comes
>>>>>> from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
>>>>>> from the worker thread. If the regression goes away after disabling the
>>>>>> blinking cursor, then the worker thread is the problem. If it already
>>>>>> goes away if there's simply no output from the test, the screen update
>>>>>> is the problem. On my machine I have to disable the blinking cursor, so
>>>>>> I think the worker causes the performance drop.
>>>>>
>>>>> We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is
>>>>> gone.
>>>>>
>>>>> commit:
>>>>>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>>>>>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer
>>>>> emulation
>>>>>
>>>>> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
>>>>> ----------------  -------------------------- ---------------------------
>>>>>          %stddev      change         %stddev
>>>>>              \          |                \
>>>>>      43785                       44481
>>>>> vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
>>>>>      43785                       44481        GEO-MEAN vm-scalability.median
>>>>
>>>> Till now, from Rong's tests:
>>>> 1. Disabling cursor blinking doesn't cure the regression.
>>>> 2. Disabling printing test results to the console can work around the
>>>> regression.
>>>>
>>>> Also, if we set prefer_shadow to 0, the regression is also
>>>> gone.
>>>
>>> We also did some further breakdown of the time consumed by the
>>> new code.
>>>
>>> The drm_fb_helper_dirty_work() function sequentially calls:
>>> 1. drm_client_buffer_vmap (290 us)
>>> 2. drm_fb_helper_dirty_blit_real (19240 us)
>>> 3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver
>>> 4. drm_client_buffer_vunmap (215 us)
>>>
>>
>> It's somewhat different from what I observed, but maybe I just couldn't
>> reproduce the problem correctly.
>>
>>> The average run time is listed after the function names.
>>>
>>> From this, we can see that drm_fb_helper_dirty_blit_real() takes too
>>> long (about 20 ms for each run). I guess this is the root cause
>>> of the regression, as the original code doesn't use this dirty worker.
>>
>> True, the original code uses a temporary buffer, but updates the display
>> immediately.
>>
>> My guess is that this could be a caching problem. The worker runs on a
>> different CPU, which doesn't have the shadow buffer in cache.
>>
>>> As said in the last email, setting prefer_shadow to 0 can avoid
>>> the regression. Could it be an option?
>>
>> Unfortunately not. Without the shadow buffer, the console's display
>> buffer permanently resides in video memory. It consumes a significant
>> amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave
>> enough room for anything else.
>>
>> The best option is to not print to the console.
>
> Wait a second, I thought the driver did an eviction on modeset of the
> scanned-out object; this was a deliberate design decision made when
> writing those drivers. Has this been removed in favour of gem and
> generic code paths?

Yes. We added this feature back for testing in [1]. It gave only an
improvement of ~1% compared to the original report. I wouldn't mind
landing this patch set, but it probably doesn't make a difference either way.

Best regards
Thomas

[1] https://lists.freedesktop.org/archives/dri-devel/2019-August/228950.html

>
> Dave.
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)



2019-08-24 05:16:55

by Feng Tang

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Thomas,

On Thu, Aug 22, 2019 at 07:25:11PM +0200, Thomas Zimmermann wrote:
> Hi
>
> I was traveling and could not reply earlier. Sorry for taking so long.

No problem! I guessed so :)

>
> On 13.08.19 at 11:36, Feng Tang wrote:
> > Hi Thomas,
> >
> > On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
> >> Hi Thomas,
> >>
> >> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
> >>> Hi,
> >>>
> >>>>> Actually we run the benchmark as a background process, do we need to
> >>>>> disable the cursor and test again?
> >>>> There's a worker thread that updates the display from the shadow buffer.
> >>>> The blinking cursor periodically triggers the worker thread, but the
> >>>> actual update is just the size of one character.
> >>>>
> >>>> The point of the test without output is to see if the regression comes
> >>>> from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> >>>> from the worker thread. If the regression goes away after disabling the
> >>>> blinking cursor, then the worker thread is the problem. If it already
> >>>> goes away if there's simply no output from the test, the screen update
> >>>> is the problem. On my machine I have to disable the blinking cursor, so
> >>>> I think the worker causes the performance drop.
> >>>
> >>> We disabled redirecting stdout/stderr to /dev/kmsg,  and the regression is
> >>> gone.
> >>>
> >>> commit:
> >>>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
> >>>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer
> >>> emulation
> >>>
> >>> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
> >>> ----------------  -------------------------- ---------------------------
> >>>          %stddev      change         %stddev
> >>>              \          |                \
> >>>      43785                       44481
> >>> vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
> >>>      43785                       44481        GEO-MEAN vm-scalability.median
> >>
> >> Till now, from Rong's tests:
> >> 1. Disabling cursor blinking doesn't cure the regression.
> >> 2. Disabling printing test results to the console can work around the
> >> regression.
> >>
> >> Also, if we set prefer_shadow to 0, the regression is also
> >> gone.
> >
> > We also did some further breakdown of the time consumed by the
> > new code.
> >
> > The drm_fb_helper_dirty_work() function sequentially calls:
> > 1. drm_client_buffer_vmap (290 us)
> > 2. drm_fb_helper_dirty_blit_real (19240 us)
> > 3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver
> > 4. drm_client_buffer_vunmap (215 us)
> >
>
> It's somewhat different from what I observed, but maybe I just couldn't
> reproduce the problem correctly.
>
> > The average run time is listed after the function names.
> >
> > From this, we can see that drm_fb_helper_dirty_blit_real() takes too
> > long (about 20 ms for each run). I guess this is the root cause
> > of the regression, as the original code doesn't use this dirty worker.
>
> True, the original code uses a temporary buffer, but updates the display
> immediately.
>
> My guess is that this could be a caching problem. The worker runs on a
> different CPU, which doesn't have the shadow buffer in cache.

Yes, that's my thought too. I profiled the working set size: for most of
the drm_fb_helper_dirty_blit_real() calls, it updates a 4096x768 buffer (3 MB),
and as it is called 30~40 times per second, it surely will affect the cache.
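
(As rough arithmetic: ~3 MB per blit at 30~40 blits per second is on the
order of 90~120 MB/s of copy traffic for the console alone.)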


> > As said in the last email, setting prefer_shadow to 0 can avoid
> > the regression. Could it be an option?
>
> Unfortunately not. Without the shadow buffer, the console's display
> buffer permanently resides in video memory. It consumes a significant
> amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave
> enough room for anything else.
>
> The best option is to not print to the console.

Do we have other options here?

My thought is that this is clearly a regression: the old driver works
fine, while the new version in linux-next doesn't. Also, for a frame
buffer console, writing dozens of lines of messages to it is not a rare
use case. We have many test platforms (servers/desktops/laptops)
with different kinds of GFX hardware, and this model has worked fine for
many years :)

Thanks,
Feng



> Best regards
> Thomas
>
> > Thanks,
> > Feng
> >
> >>
> >> --- a/drivers/gpu/drm/mgag200/mgag200_main.c
> >> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
> >> @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
> >> dev->mode_config.preferred_depth = 16;
> >> else
> >> dev->mode_config.preferred_depth = 32;
> >> - dev->mode_config.prefer_shadow = 1;
> >> + dev->mode_config.prefer_shadow = 0;
> >>
> >> And from the perf data, one obvious difference is that the good case doesn't
> >> call drm_fb_helper_dirty_work(), while the bad case does.
> >>
> >> Thanks,
> >> Feng
> >>
> >>> Best Regards,
> >>> Rong Chen
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> HRB 21284 (AG Nürnberg)
>



2019-08-26 10:52:11

by Thomas Zimmermann

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Feng

On 24.08.19 at 07:16, Feng Tang wrote:
> Hi Thomas,
>
> On Thu, Aug 22, 2019 at 07:25:11PM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> I was traveling and could not reply earlier. Sorry for taking so long.
>
> No problem! I guessed so :)
>
>>
>> On 13.08.19 at 11:36, Feng Tang wrote:
>>> Hi Thomas,
>>>
>>> On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
>>>> Hi Thomas,
>>>>
>>>> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
>>>>> Hi,
>>>>>
>>>>>>> Actually we run the benchmark as a background process, do
>>>>>>> we need to disable the cursor and test again?
>>>>>> There's a worker thread that updates the display from the
>>>>>> shadow buffer. The blinking cursor periodically triggers
>>>>>> the worker thread, but the actual update is just the size
>>>>>> of one character.
>>>>>>
>>>>>> The point of the test without output is to see if the
>>>>>> regression comes from the buffer update (i.e., the memcpy
>>>>>> from shadow buffer to VRAM), or from the worker thread. If
>>>>>> the regression goes away after disabling the blinking
>>>>>> cursor, then the worker thread is the problem. If it
>>>>>> already goes away if there's simply no output from the
>>>>>> test, the screen update is the problem. On my machine I
>>>>>> have to disable the blinking cursor, so I think the worker
>>>>>> causes the performance drop.
>>>>>
>>>>> We disabled redirecting stdout/stderr to /dev/kmsg, and the
>>>>> regression is gone.
>>>>>
>>>>> commit:
>>>>>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>>>>>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
>>>>> framebuffer emulation
>>>>>
>>>>> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
>>>>> ----------------  -------------------------- ---------------------------
>>>>>          %stddev      change         %stddev
>>>>>              \          |                \
>>>>>      43785                       44481
>>>>> vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
>>>>>      43785                       44481        GEO-MEAN vm-scalability.median
>>>>
>>>> Till now, from Rong's tests:
>>>> 1. Disabling cursor blinking doesn't cure the regression.
>>>> 2. Disabling printing test results to the console can work around the
>>>> regression.
>>>>
>>>> Also, if we set prefer_shadow to 0, the regression is also
>>>> gone.
>>>
>>> We also did some further breakdown of the time consumed by the
>>> new code.
>>>
>>> The drm_fb_helper_dirty_work() function sequentially calls:
>>> 1. drm_client_buffer_vmap (290 us)
>>> 2. drm_fb_helper_dirty_blit_real (19240 us)
>>> 3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver
>>> 4. drm_client_buffer_vunmap (215 us)
>>>
>>
>> It's somewhat different from what I observed, but maybe I just
>> couldn't reproduce the problem correctly.
>>
>>> The average run time is listed after the function names.
>>>
>>> From this, we can see that drm_fb_helper_dirty_blit_real() takes too
>>> long (about 20 ms for each run). I guess this is the root
>>> cause of the regression, as the original code doesn't use this
>>> dirty worker.
>>
>> True, the original code uses a temporary buffer, but updates the
>> display immediately.
>>
>> My guess is that this could be a caching problem. The worker runs
>> on a different CPU, which doesn't have the shadow buffer in cache.
>
> Yes, that's my thought too. I profiled the working set size, for most
> of the drm_fb_helper_dirty_blit_real(), it will update a buffer
> 4096x768(3 MB), and as it is called 30~40 times per second, it surely
> will affect the cache.
>
>
>>> As said in the last email, setting prefer_shadow to 0 can avoid
>>> the regression. Could it be an option?
>>
>> Unfortunately not. Without the shadow buffer, the console's
>> display buffer permanently resides in video memory. It consumes a
>> significant amount of that memory (say 8 MiB out of 16 MiB). That
>> doesn't leave enough room for anything else.
>>
>> The best option is to not print to the console.
>
> Do we have other options here?

I attached two patches. Both show an improvement, at least in my setup.
Could you please test them independently of each other and report back?

prefetch.patch prefetches the shadow buffer two scanlines ahead during
the blit function. The idea is to have the scanlines in cache when they
are supposed to go to hardware.
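
The patch itself is attached; as a sketch of the idea only (assuming the
blit loop in drm_fb_helper_dirty_blit_real(), with variable names
simplified -- not the literal patch):

#include <linux/prefetch.h>

	/* copy the dirty scanlines, hinting the CPU to pull the line
	 * after next into cache before it is needed */
	for (y = clip->y1; y < clip->y2; y++) {
		prefetch(src + 2 * fb->pitches[0]);
		memcpy(dst, src, len);
		src += fb->pitches[0];
		dst += fb->pitches[0];
	}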

schedule.patch schedules the dirty worker on the current CPU core (i.e.,
the one that did the drawing to the shadow buffer). Hopefully the shadow
buffer remains in cache meanwhile.
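
Again only as a sketch (assuming the scheduling site in
drm_fb_helper_dirty(); the attached patch is authoritative):

	/* instead of schedule_work(&helper->dirty_work), queue the worker
	 * on the CPU that just wrote to the shadow buffer */
	queue_work_on(raw_smp_processor_id(), system_wq, &helper->dirty_work);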

Best regards
Thomas

> My thought is that this is clearly a regression: the old driver
> works fine, while the new version in linux-next doesn't. Also, for a
> frame buffer console, writing dozens of lines of messages to it is not a
> rare use case. We have many test platforms
> (servers/desktops/laptops) with different kinds of GFX hardware, and
> this model has worked fine for many years :)
>
> Thanks, Feng
>
>
>
>> Best regards Thomas
>>
>>> Thanks, Feng
>>>
>>>>
>>>> --- a/drivers/gpu/drm/mgag200/mgag200_main.c
>>>> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
>>>> @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
>>>>  		dev->mode_config.preferred_depth = 16;
>>>>  	else
>>>>  		dev->mode_config.preferred_depth = 32;
>>>> -	dev->mode_config.prefer_shadow = 1;
>>>> +	dev->mode_config.prefer_shadow = 0;
>>>>
>>>> And from the perf data, one obvious difference is that the good case
>>>> doesn't call drm_fb_helper_dirty_work(), while the bad case does.
>>>>
>>>> Thanks, Feng
>>>>
>>>>> Best Regards, Rong Chen

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Attachments:
prefetch.patch (1.03 kB)
schedule.patch (0.99 kB)

2019-08-27 12:35:01

by Chen, Rong A

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Thomas,

On 8/26/2019 6:50 PM, Thomas Zimmermann wrote:
> Hi Feng
>
> On 24.08.19 at 07:16, Feng Tang wrote:
>> Hi Thomas,
>>
>> On Thu, Aug 22, 2019 at 07:25:11PM +0200, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> I was traveling and could not reply earlier. Sorry for taking so long.
>> No problem! I guessed so :)
>>
>>> On 13.08.19 at 11:36, Feng Tang wrote:
>>>> Hi Thomas,
>>>>
>>>> On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
>>>>> Hi Thomas,
>>>>>
>>>>> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
>>>>>> Hi,
>>>>>>
>>>>>>>> Actually we run the benchmark as a background process, do
>>>>>>>> we need to disable the cursor and test again?
>>>>>>> There's a worker thread that updates the display from the
>>>>>>> shadow buffer. The blinking cursor periodically triggers
>>>>>>> the worker thread, but the actual update is just the size
>>>>>>> of one character.
>>>>>>>
>>>>>>> The point of the test without output is to see if the
>>>>>>> regression comes from the buffer update (i.e., the memcpy
>>>>>>> from shadow buffer to VRAM), or from the worker thread. If
>>>>>>> the regression goes away after disabling the blinking
>>>>>>> cursor, then the worker thread is the problem. If it
>>>>>>> already goes away if there's simply no output from the
>>>>>>> test, the screen update is the problem. On my machine I
>>>>>>> have to disable the blinking cursor, so I think the worker
>>>>>>> causes the performance drop.
>>>>>> We disabled redirecting stdout/stderr to /dev/kmsg, and the
>>>>>> regression is gone.
>>>>>>
>>>>>> commit:
>>>>>>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>>>>>>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
>>>>>> framebuffer emulation
>>>>>>
>>>>>> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
>>>>>> ----------------  -------------------------- ---------------------------
>>>>>>          %stddev      change         %stddev
>>>>>>              \          |                \
>>>>>>      43785                       44481
>>>>>> vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
>>>>>>      43785                       44481        GEO-MEAN vm-scalability.median
>>>>> Till now, from Rong's tests:
>>>>> 1. Disabling cursor blinking doesn't cure the regression.
>>>>> 2. Disabling printing test results to the console can work around the
>>>>> regression.
>>>>>
>>>>> Also, if we set prefer_shadow to 0, the regression is also
>>>>> gone.
>>>> We also did some further breakdown of the time consumed by the
>>>> new code.
>>>>
>>>> The drm_fb_helper_dirty_work() function sequentially calls:
>>>> 1. drm_client_buffer_vmap (290 us)
>>>> 2. drm_fb_helper_dirty_blit_real (19240 us)
>>>> 3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver
>>>> 4. drm_client_buffer_vunmap (215 us)
>>>>
>>> It's somewhat different from what I observed, but maybe I just
>>> couldn't reproduce the problem correctly.
>>>
>>>> The average run time is listed after the function names.
>>>>
>>>> From this, we can see that drm_fb_helper_dirty_blit_real() takes too
>>>> long (about 20 ms for each run). I guess this is the root
>>>> cause of the regression, as the original code doesn't use this
>>>> dirty worker.
>>> True, the original code uses a temporary buffer, but updates the
>>> display immediately.
>>>
>>> My guess is that this could be a caching problem. The worker runs
>>> on a different CPU, which doesn't have the shadow buffer in cache.
>> Yes, that's my thought too. I profiled the working set size: for most
>> of the drm_fb_helper_dirty_blit_real() calls, it updates a 4096x768
>> buffer (3 MB), and as it is called 30~40 times per second, it surely
>> will affect the cache.
>>
>>
>>>> As said in the last email, setting prefer_shadow to 0 can avoid
>>>> the regression. Could it be an option?
>>> Unfortunately not. Without the shadow buffer, the console's
>>> display buffer permanently resides in video memory. It consumes a
>>> significant amount of that memory (say 8 MiB out of 16 MiB). That
>>> doesn't leave enough room for anything else.
>>>
>>> The best option is to not print to the console.
>> Do we have other options here?
> I attached two patches. Both show an improvement, at least in my setup.
> Could you please test them independently of each other and report back?
>
> prefetch.patch prefetches the shadow buffer two scanlines ahead during
> the blit function. The idea is to have the scanlines in cache when they
> are supposed to go to hardware.
>
> schedule.patch schedules the dirty worker on the current CPU core (i.e.,
> the one that did the drawing to the shadow buffer). Hopefully the shadow
> buffer remains in cache meanwhile.
>
> Best regards
> Thomas

Both patches have little impact on the performance from our side.

prefetch.patch:
commit:
  f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
  90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
framebuffer emulation
  77459f56994 prefetch shadow buffer two lines ahead of blit offset

f1f8555dfb9a70a2  90f479ae51afa45efab97afdde 77459f56994ab87ee5459920b3 testcase/testparams/testbox
----------------  -------------------------- -------------------------- ---------------------------
         %stddev      change         %stddev      change         %stddev
             \          |                \          |                \
     42912             -15%      36517             -17%      35515
vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01
     42912             -15%      36517             -17%      35515        GEO-MEAN vm-scalability.median

schedule.patch:
commit:
  f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
  90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
framebuffer emulation
  ccc5f095c61 schedule dirty worker on local core

f1f8555dfb9a70a2  90f479ae51afa45efab97afdde ccc5f095c61ff6eded0f0ab1b7 testcase/testparams/testbox
----------------  -------------------------- -------------------------- ---------------------------
         %stddev      change         %stddev      change         %stddev
             \          |                \          |                \
     42912             -15%      36517             -15%      36556 ±  4%
vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01
     42912             -15%      36517             -15%      36556        GEO-MEAN vm-scalability.median

Best Regards,
Rong Chen

2019-08-27 17:18:11

by Thomas Zimmermann

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi

On 27.08.19 at 14:33, Chen, Rong A wrote:
>
> Both patches have little impact on the performance from our side.

Thanks for testing. Too bad they don't solve the issue.

There's another patch attached. Could you please test this as well?
Thanks a lot!

The patch comes from Daniel Vetter after discussing the problem on IRC.
The idea of the patch is that the old mgag200 code might display far
fewer frames than the generic code, because mgag200 only prints from
non-atomic context. If we simulate this with the generic code, we should
see roughly the original performance.
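
As a sketch of the idea (assuming the update path in drm_fb_helper; the
attached patch is authoritative):

	/* only schedule the dirty worker from non-atomic context, which
	 * mimics the old mgag200 behaviour of skipping such updates */
	if (drm_can_sleep())
		schedule_work(&helper->dirty_work);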

Best regards
Thomas

>
> prefetch.patch:
> commit:
>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
> framebuffer emulation
>   77459f56994 prefetch shadow buffer two lines ahead of blit offset
>
> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde 77459f56994ab87ee5459920b3 testcase/testparams/testbox
> ----------------  -------------------------- -------------------------- ---------------------------
>          %stddev      change         %stddev      change         %stddev
>              \          |                \          |                \
>      42912             -15%      36517             -17%      35515
> vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01
>      42912             -15%      36517             -17%      35515        GEO-MEAN vm-scalability.median
>
> schedule.patch:
> commit:
>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
> framebuffer emulation
>   ccc5f095c61 schedule dirty worker on local core
>
> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde ccc5f095c61ff6eded0f0ab1b7 testcase/testparams/testbox
> ----------------  -------------------------- -------------------------- ---------------------------
>          %stddev      change         %stddev      change         %stddev
>              \          |                \          |                \
>      42912             -15%      36517             -15%      36556 ±  4%
> vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01
>      42912             -15%      36517             -15%      36556        GEO-MEAN vm-scalability.median
>
> Best Regards,
> Rong Chen

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Attachments:
usecansleep.patch (838.00 B)

2019-08-28 09:38:41

by Chen, Rong A

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Thomas,

On 8/28/19 1:16 AM, Thomas Zimmermann wrote:
> Hi
>
> On 27.08.19 at 14:33, Chen, Rong A wrote:
>> Both patches have little impact on the performance from our side.
> Thanks for testing. Too bad they don't solve the issue.
>
> There's another patch attached. Could you please test this as well?
> Thanks a lot!
>
> The patch comes from Daniel Vetter after discussing the problem on IRC.
> The idea of the patch is that the old mgag200 code might display far
> fewer frames than the generic code, because mgag200 only prints from
> non-atomic context. If we simulate this with the generic code, we should
> see roughly the original performance.
>
>

It's cool: the patch "usecansleep.patch" fixes the issue.

commit:
  f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
  90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
framebuffer emulation
  b976b04c2bc only schedule worker from non-atomic context

f1f8555dfb9a70a2  90f479ae51afa45efab97afdde b976b04c2bcf33148d6c7bc1a2 testcase/testparams/testbox
----------------  -------------------------- -------------------------- ---------------------------
         %stddev      change         %stddev      change         %stddev
             \          |                \          |                \
     42912             -15%      36517                       44093
vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01
     42912             -15%      36517                       44093        GEO-MEAN vm-scalability.median

Best Regards,
Rong Chen

2019-08-28 10:52:58

by Thomas Zimmermann

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi

On 28.08.19 at 11:37, Rong Chen wrote:
> Hi Thomas,
>
> On 8/28/19 1:16 AM, Thomas Zimmermann wrote:
>> Hi
>>
>> On 27.08.19 at 14:33, Chen, Rong A wrote:
>>> Both patches have little impact on the performance from our side.
>> Thanks for testing. Too bad they don't solve the issue.
>>
>> There's another patch attached. Could you please test this as well?
>> Thanks a lot!
>>
>> The patch comes from Daniel Vetter after discussing the problem on IRC.
>> The idea of the patch is that the old mgag200 code might display far
>> fewer frames than the generic code, because mgag200 only prints from
>> non-atomic context. If we simulate this with the generic code, we should
>> see roughly the original performance.
>>
>>
>
> It's cool: the patch "usecansleep.patch" fixes the issue.

Thank you for testing. But don't get too excited, because the patch
simulates a bug that was present in the original mgag200 code. A
significant number of frames are simply skipped. That is apparently the
reason why it's faster.

Best regards
Thomas

> commit:
>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
> framebuffer emulation
>   b976b04c2bc only schedule worker from non-atomic context
>
> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde b976b04c2bcf33148d6c7bc1a2 testcase/testparams/testbox
> ----------------  -------------------------- -------------------------- ---------------------------
>          %stddev      change         %stddev      change         %stddev
>              \          |                \          |                \
>      42912             -15%      36517                       44093
> vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01
>      42912             -15%      36517                       44093        GEO-MEAN vm-scalability.median
>
> Best Regards,
> Rong Chen

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)



2019-09-04 06:28:15

by Feng Tang

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Thomas,

On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote:
> Hi
>
> On 28.08.19 at 11:37, Rong Chen wrote:
> > Hi Thomas,
> >
> > On 8/28/19 1:16 AM, Thomas Zimmermann wrote:
> >> Hi
> >>
> >> On 27.08.19 at 14:33, Chen, Rong A wrote:
> >>> Both patches have little impact on the performance from our side.
> >> Thanks for testing. Too bad they don't solve the issue.
> >>
> >> There's another patch attached. Could you please test this as well?
> >> Thanks a lot!
> >>
> >> The patch comes from Daniel Vetter after discussing the problem on IRC.
> >> The idea of the patch is that the old mgag200 code might display far
> >> fewer frames than the generic code, because mgag200 only prints from
> >> non-atomic context. If we simulate this with the generic code, we should
> >> see roughly the original performance.
> >>
> >>
> >
> > It's cool: the patch "usecansleep.patch" fixes the issue.
>
> Thank you for testing. But don't get too excited, because the patch
> simulates a bug that was present in the original mgag200 code. A
> significant number of frames are simply skipped. That is apparently the
> reason why it's faster.

Thanks for the detailed info. So the original code skips the time-consuming
work inside atomic context on purpose. Is there any room to optimise it?
If 2 scheduled update workers are handled at almost the same time, can one be
skipped?

Thanks,
Feng

>
> Best regards
> Thomas

2019-09-04 06:55:01

by Thomas Zimmermann

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi

On 04.09.19 at 08:27, Feng Tang wrote:
>> Thank you for testing. But don't get too excited, because the patch
>> simulates a bug that was present in the original mgag200 code. A
>> significant number of frames are simply skipped. That is apparently the
>> reason why it's faster.
>
> Thanks for the detailed info. So the original code skips the time-consuming
> work inside atomic context on purpose. Is there any room to optimise it?
> If 2 scheduled update workers are handled at almost the same time, can one be
> skipped?

To my knowledge, there's only one instance of the worker. Re-scheduling
the worker before a previous instance has started will not create a second
instance. The worker's instance will complete all pending updates. So in
some way, skipping workers already happens.
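
Roughly, the damage handling looks like this (simplified from
drm_fb_helper_dirty(); not the exact code):

	/* merge the new damage into the single pending clip rectangle */
	spin_lock_irqsave(&helper->dirty_lock, flags);
	clip->x1 = min_t(u32, clip->x1, x);
	clip->y1 = min_t(u32, clip->y1, y);
	clip->x2 = max_t(u32, clip->x2, x + width);
	clip->y2 = max_t(u32, clip->y2, y + height);
	spin_unlock_irqrestore(&helper->dirty_lock, flags);

	/* a no-op if the work item is already queued, so a burst of
	 * updates collapses into a single worker run */
	schedule_work(&helper->dirty_work);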

Best regards
Thomas

>
> Thanks,
> Feng
>
>>
>> Best regards
>> Thomas

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)



2019-09-04 08:13:52

by Daniel Vetter

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <[email protected]> wrote:
>
> Hi
>
> On 04.09.19 at 08:27, Feng Tang wrote:
> >> Thank you for testing. But don't get too excited, because the patch
> >> simulates a bug that was present in the original mgag200 code. A
> >> significant number of frames are simply skipped. That is apparently the
> >> reason why it's faster.
> >
> > Thanks for the detailed info. So the original code skips the time-consuming
> > work inside atomic context on purpose. Is there any room to optimise it?
> > If 2 scheduled update workers are handled at almost the same time, can one be
> > skipped?
>
> To my knowledge, there's only one instance of the worker. Re-scheduling
> the worker before a previous instance has started will not create a second
> instance. The worker's instance will complete all pending updates. So in
> some way, skipping workers already happens.

So I think that the most frequent fbcon update from atomic context is the
blinking cursor. If you disable that one, you should be back to the old
performance level, I think, since just writing to dmesg happens from process
context, so that shouldn't change.

https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking

Bunch of tricks, but tbh I haven't tested them.

In any case, I still strongly advise that you don't print anything to dmesg
or fbcon while benchmarking, because dmesg/printk are anything but
fast, especially if a gpu driver is involved. There are some efforts to
make the dmesg/printk side less painful (untangling the console_lock
from printk), but fundamentally printing to the gpu from the kernel
through dmesg/fbcon won't be cheap. It's just not something we
optimize beyond "make sure it works for emergencies".
-Daniel

>
> Best regards
> Thomas
>
> >
> > Thanks,
> > Feng
> >
> >>
> >> Best regards
> >> Thomas
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> HRB 21284 (AG Nürnberg)
>



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-09-04 08:36:43

by Feng Tang

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Daniel,

On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <[email protected]> wrote:
> >
> > Hi
> >
> > On 04.09.19 at 08:27, Feng Tang wrote:
> > >> Thank you for testing. But don't get too excited, because the patch
> > >> simulates a bug that was present in the original mgag200 code. A
> > >> significant number of frames are simply skipped. That is apparently the
> > >> reason why it's faster.
> > >
> > > Thanks for the detailed info. So the original code skips the time-consuming
> > > work inside atomic context on purpose. Is there any room to optimise it?
> > > If 2 scheduled update workers are handled at almost the same time, can one be
> > > skipped?
> >
> > To my knowledge, there's only one instance of the worker. Re-scheduling
> > the worker before a previous instance has started will not create a second
> > instance. The worker's instance will complete all pending updates. So in
> > some way, skipping workers already happens.
>
> So I think that the most frequent fbcon update from atomic context is the
> blinking cursor. If you disable that one, you should be back to the old
> performance level, I think, since just writing to dmesg happens from process
> context, so that shouldn't change.

Hmm, then the old driver should also do most of its updates in
non-atomic context?

One other thing: I profiled that updating a 3 MB shadow buffer needs
20 ms, which translates to 150 MB/s bandwidth. Could it be related to
the cache setting of the DRM shadow buffer? Say, does the original code
use a cacheable buffer?


>
> https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking
>
> Bunch of tricks, but tbh I haven't tested them.

Thomas has suggested disabling the cursor with
echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink

We tried it that way, and there was no change in the performance data.

Thanks,
Feng

>
> In any case, I still strongly advise that you don't print anything to dmesg
> or fbcon while benchmarking, because dmesg/printk are anything but
> fast, especially if a gpu driver is involved. There are some efforts to
> make the dmesg/printk side less painful (untangling the console_lock
> from printk), but fundamentally printing to the gpu from the kernel
> through dmesg/fbcon won't be cheap. It's just not something we
> optimize beyond "make sure it works for emergencies".
> -Daniel
>
> >
> > Best regards
> > Thomas
> >
> > >
> > > Thanks,
> > > Feng
> > >
> > >>
> > >> Best regards
> > >> Thomas
> >
> > --
> > Thomas Zimmermann
> > Graphics Driver Developer
> > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> > HRB 21284 (AG Nürnberg)
> >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-09-04 08:45:26

by Thomas Zimmermann

Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi

On 04.09.19 at 10:35, Feng Tang wrote:
> Hi Daniel,
>
> On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
>> On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <[email protected]> wrote:
>>>
>>> Hi
>>>
>>> On 04.09.19 at 08:27, Feng Tang wrote:
>>>>> Thank you for testing. But don't get too excited, because the patch
>>>>> simulates a bug that was present in the original mgag200 code. A
>>>>> significant number of frames are simply skipped. That is apparently the
>>>>> reason why it's faster.
>>>>
>>>> Thanks for the detailed info. So the original code skips the time-consuming
>>>> work inside atomic context on purpose. Is there any room to optimise it?
>>>> If 2 scheduled update workers are handled at almost the same time, can one be
>>>> skipped?
>>>
>>> To my knowledge, there's only one instance of the worker. Re-scheduling
>>> the worker before a previous instance has started will not create a second
>>> instance. The worker's instance will complete all pending updates. So in
>>> some way, skipping workers already happens.
>>
>> So I think that the most frequent fbcon update from atomic context is the
>> blinking cursor. If you disable that one, you should be back to the old
>> performance level, I think, since just writing to dmesg happens from process
>> context, so that shouldn't change.
>
> Hmm, then the old driver should also do most of its updates in
> non-atomic context?
>
> One other thing: I profiled that updating a 3 MB shadow buffer needs
> 20 ms, which translates to 150 MB/s bandwidth. Could it be related to
> the cache setting of the DRM shadow buffer? Say, does the original code
> use a cacheable buffer?
>
>
>>
>> https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking
>>
>> Bunch of tricks, but tbh I haven't tested them.
>
> Thomas has suggested disabling the cursor with
> echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink
>
> We tried it that way, and there was no change in the performance data.

There are several ways of disabling the cursor. On my test system, I entered

tput civis

before the test and got better performance. Did you try this as well?

Best regards
Thomas

>
> Thanks,
> Feng
>
>>
>> In any case, I still strongly advise that you don't print anything to dmesg
>> or fbcon while benchmarking, because dmesg/printk are anything but
>> fast, especially if a gpu driver is involved. There are some efforts to
>> make the dmesg/printk side less painful (untangling the console_lock
>> from printk), but fundamentally printing to the gpu from the kernel
>> through dmesg/fbcon won't be cheap. It's just not something we
>> optimize beyond "make sure it works for emergencies".
>> -Daniel
>>
>>>
>>> Best regards
>>> Thomas
>>>
>>>>
>>>> Thanks,
>>>> Feng
>>>>
>>>>>
>>>>> Best regards
>>>>> Thomas
>>>
>>> --
>>> Thomas Zimmermann
>>> Graphics Driver Developer
>>> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
>>> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
>>> HRB 21284 (AG Nürnberg)
>>>
>>
>>
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)



2019-09-04 09:18:47

by Daniel Vetter

[permalink] [raw]
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <[email protected]> wrote:
>
> Hi Daniel,
>
> On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <[email protected]> wrote:
> > >
> > > Hi
> > >
> > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > >> Thank you for testing. But don't get too excited, because the patch
> > > >> simulates a bug that was present in the original mgag200 code. A
> > > >> significant number of frames are simply skipped. That is apparently the
> > > >> reason why it's faster.
> > > >
> > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > work inside atomic context on purpose. Is there any space to optmise it?
> > > > If 2 scheduled update worker are handled at almost same time, can one be
> > > > skipped?
> > >
> > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > the worker before a previous instance started, will not create a second
> > > instance. The worker's instance will complete all pending updates. So in
> > > some way, skipping workers already happens.
> >
> > So I think that the most often fbcon update from atomic context is the
> > blinking cursor. If you disable that one you should be back to the old
> > performance level I think, since just writing to dmesg is from process
> > context, so shouldn't change.
>
> Hmm, then for the old driver, it should also do the most update in
> non-atomic context?
>
> One other thing is, I profiled that updating a 3MB shadow buffer needs
> 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with
> the cache setting of DRM shadow buffer? say the orginal code use a
> cachable buffer?

Hm, that would indicate the write-combining got broken somewhere. This
should definitely be faster. Also, we shouldn't be transferring the whole
thing, except when scrolling ...


> > https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking
> >
> > Bunch of tricks, but tbh I haven't tested them.
>
> Thomas has suggested to disable curson by
> echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink
>
> We tried that way, and no change for the performance data.

Huh, if there are other atomic contexts for fbcon updates then I'm not
aware of them ... and if it's all the updates, then you wouldn't see a
whole lot on your screen, with either the old or the new fbdev support in
mgag200. I'm a bit confused ...
-Daniel

>
> Thanks,
> Feng
>
> >
> > In any case, I still strongly advice you don't print anything to dmesg
> > or fbcon while benchmarking, because dmesg/printf are anything but
> > fast, especially if a gpu driver is involved. There's some efforts to
> > make the dmesg/printk side less painful (untangling the console_lock
> > from printk), but fundamentally printing to the gpu from the kernel
> > through dmesg/fbcon won't be cheap. It's just not something we
> > optimize beyond "make sure it works for emergencies".
> > -Daniel
> >
> > >
> > > Best regards
> > > Thomas
> > >
> > > >
> > > > Thanks,
> > > > Feng
> > > >
> > > >>
> > > >> Best regards
> > > >> Thomas
> > >
> > > --
> > > Thomas Zimmermann
> > > Graphics Driver Developer
> > > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> > > HRB 21284 (AG Nürnberg)
> > >
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-09-04 11:17:57

by Dave Airlie

[permalink] [raw]
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

On Wed, 4 Sep 2019 at 19:17, Daniel Vetter <[email protected]> wrote:
>
> On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <[email protected]> wrote:
> >
> > Hi Daniel,
> >
> > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <[email protected]> wrote:
> > > >
> > > > Hi
> > > >
> > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > >> Thank you for testing. But don't get too excited, because the patch
> > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > >> significant number of frames are simply skipped. That is apparently the
> > > > >> reason why it's faster.
> > > > >
> > > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > > work inside atomic context on purpose. Is there any space to optmise it?
> > > > > If 2 scheduled update worker are handled at almost same time, can one be
> > > > > skipped?
> > > >
> > > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > > the worker before a previous instance started, will not create a second
> > > > instance. The worker's instance will complete all pending updates. So in
> > > > some way, skipping workers already happens.
> > >
> > > So I think that the most often fbcon update from atomic context is the
> > > blinking cursor. If you disable that one you should be back to the old
> > > performance level I think, since just writing to dmesg is from process
> > > context, so shouldn't change.
> >
> > Hmm, then for the old driver, it should also do the most update in
> > non-atomic context?
> >
> > One other thing is, I profiled that updating a 3MB shadow buffer needs
> > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with
> > the cache setting of DRM shadow buffer? say the orginal code use a
> > cachable buffer?
>
> Hm, that would indicate the write-combining got broken somewhere. This
> should definitely be faster. Also we shouldn't transfer the hole
> thing, except when scrolling ...

First rule of fbcon usage: you are always effectively scrolling.

Also, these devices might be on a PCIe x1 piece of wet string; not sure
if the numbers reflect that.

Dave.

2019-09-04 11:21:57

by Daniel Vetter

[permalink] [raw]
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie <[email protected]> wrote:
>
> On Wed, 4 Sep 2019 at 19:17, Daniel Vetter <[email protected]> wrote:
> >
> > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <[email protected]> wrote:
> > >
> > > Hi Daniel,
> > >
> > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <[email protected]> wrote:
> > > > >
> > > > > Hi
> > > > >
> > > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > > >> Thank you for testing. But don't get too excited, because the patch
> > > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > > >> significant number of frames are simply skipped. That is apparently the
> > > > > >> reason why it's faster.
> > > > > >
> > > > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > > > work inside atomic context on purpose. Is there any space to optmise it?
> > > > > > If 2 scheduled update worker are handled at almost same time, can one be
> > > > > > skipped?
> > > > >
> > > > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > > > the worker before a previous instance started, will not create a second
> > > > > instance. The worker's instance will complete all pending updates. So in
> > > > > some way, skipping workers already happens.
> > > >
> > > > So I think that the most often fbcon update from atomic context is the
> > > > blinking cursor. If you disable that one you should be back to the old
> > > > performance level I think, since just writing to dmesg is from process
> > > > context, so shouldn't change.
> > >
> > > Hmm, then for the old driver, it should also do the most update in
> > > non-atomic context?
> > >
> > > One other thing is, I profiled that updating a 3MB shadow buffer needs
> > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with
> > > the cache setting of DRM shadow buffer? say the orginal code use a
> > > cachable buffer?
> >
> > Hm, that would indicate the write-combining got broken somewhere. This
> > should definitely be faster. Also we shouldn't transfer the hole
> > thing, except when scrolling ...
>
> First rule of fbcon usage, you are always effectively scrolling.
>
> Also these devices might be on a PCIE 1x piece of wet string, not sure
> if the numbers reflect that.

pcie x1 1.0 is 250MB/s, so yeah, with a bit of inefficiency and
overhead it's not entirely out of the question that 150MB/s is actually
the hw limit. If it's really pcie x1 1.0, I have no idea where to check
that. Also, it might be worth double-checking that the gpu pci bar is
listed as wc in debugfs/x86/pat_memtype_list.
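
Spelled out, assuming it really is a single gen1 lane: 2.5 GT/s with
8b/10b encoding gives 250 MB/s of raw bandwidth per lane, and copying
3 MB in 20 ms works out to 3 / 0.020 = 150 MB/s, i.e. roughly 60% of the
theoretical lane limit before any protocol overhead.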
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-09-05 07:09:21

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Vetter,

On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote:
> On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie <[email protected]> wrote:
> >
> > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter <[email protected]> wrote:
> > >
> > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <[email protected]> wrote:
> > > >
> > > > Hi Daniel,
> > > >
> > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <[email protected]> wrote:
> > > > > >
> > > > > > Hi
> > > > > >
> > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > > > >> Thank you for testing. But don't get too excited, because the patch
> > > > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > > > >> significant number of frames are simply skipped. That is apparently the
> > > > > > >> reason why it's faster.
> > > > > > >
> > > > > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > > > > work inside atomic context on purpose. Is there any space to optmise it?
> > > > > > > If 2 scheduled update worker are handled at almost same time, can one be
> > > > > > > skipped?
> > > > > >
> > > > > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > > > > the worker before a previous instance started, will not create a second
> > > > > > instance. The worker's instance will complete all pending updates. So in
> > > > > > some way, skipping workers already happens.
> > > > >
> > > > > So I think that the most often fbcon update from atomic context is the
> > > > > blinking cursor. If you disable that one you should be back to the old
> > > > > performance level I think, since just writing to dmesg is from process
> > > > > context, so shouldn't change.
> > > >
> > > > Hmm, then for the old driver, it should also do the most update in
> > > > non-atomic context?
> > > >
> > > > One other thing is, I profiled that updating a 3MB shadow buffer needs
> > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with
> > > > the cache setting of DRM shadow buffer? say the orginal code use a
> > > > cachable buffer?
> > >
> > > Hm, that would indicate the write-combining got broken somewhere. This
> > > should definitely be faster. Also we shouldn't transfer the hole
> > > thing, except when scrolling ...
> >
> > First rule of fbcon usage, you are always effectively scrolling.
> >
> > Also these devices might be on a PCIE 1x piece of wet string, not sure
> > if the numbers reflect that.
>
> pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and
> overhead not entirely out of the question that 150MB/s is actually the
> hw limit. If it's really pcie 1x 1.0, no idea where to check that.
> Also might be worth to double-check that the gpu pci bar is listed as
> wc in debugfs/x86/pat_memtype_list.

Here is a dump of the device info and the pat_memtype_list, taken while
the machine is running another 0day task:

controller info
=================
03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 16
NUMA node: 0
Region 0: Memory at d0000000 (32-bit, prefetchable) [size=16M]
Region 1: Memory at d1800000 (32-bit, non-prefetchable) [size=16K]
Region 2: Memory at d1000000 (32-bit, non-prefetchable) [size=8M]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Kernel driver in use: mgag200
Kernel modules: mgag200


Related pat setting
===================
uncached-minus @ 0xc0000000-0xc0001000
uncached-minus @ 0xc0000000-0xd0000000
uncached-minus @ 0xc0008000-0xc0009000
uncached-minus @ 0xc0009000-0xc000a000
uncached-minus @ 0xc0010000-0xc0011000
uncached-minus @ 0xc0011000-0xc0012000
uncached-minus @ 0xc0012000-0xc0013000
uncached-minus @ 0xc0013000-0xc0014000
uncached-minus @ 0xc0018000-0xc0019000
uncached-minus @ 0xc0019000-0xc001a000
uncached-minus @ 0xc001a000-0xc001b000
write-combining @ 0xd0000000-0xd0300000
write-combining @ 0xd0000000-0xd1000000
uncached-minus @ 0xd1800000-0xd1804000
uncached-minus @ 0xd1900000-0xd1980000
uncached-minus @ 0xd1980000-0xd1981000
uncached-minus @ 0xd1a00000-0xd1a80000
uncached-minus @ 0xd1a80000-0xd1a81000
uncached-minus @ 0xd1f10000-0xd1f11000
uncached-minus @ 0xd1f11000-0xd1f12000
uncached-minus @ 0xd1f12000-0xd1f13000

Host bridge info
================
00:00.0 Host bridge: Intel Corporation Device 7853
Subsystem: Intel Corporation Device 0000
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 0
NUMA node: 0
Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L0s <512ns, L1 <4us
ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
RootCtl: ErrCorrectable+ ErrNon-Fatal+ ErrFatal+ PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [e0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 Len=03c <?>
Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?>
Capabilities: [250 v1] #19
Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?>
Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 Len=024 <?>


Thanks,
Feng


>
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-09-05 10:47:37

by Daniel Vetter

[permalink] [raw]
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

On Thu, Sep 5, 2019 at 8:58 AM Feng Tang <[email protected]> wrote:
>
> Hi Vetter,
>
> On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote:
> > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie <[email protected]> wrote:
> > >
> > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter <[email protected]> wrote:
> > > >
> > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <[email protected]> wrote:
> > > > >
> > > > > Hi Daniel,
> > > > >
> > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > > > > >> Thank you for testing. But don't get too excited, because the patch
> > > > > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > > > > >> significant number of frames are simply skipped. That is apparently the
> > > > > > > >> reason why it's faster.
> > > > > > > >
> > > > > > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > > > > > work inside atomic context on purpose. Is there any space to optmise it?
> > > > > > > > If 2 scheduled update worker are handled at almost same time, can one be
> > > > > > > > skipped?
> > > > > > >
> > > > > > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > > > > > the worker before a previous instance started, will not create a second
> > > > > > > instance. The worker's instance will complete all pending updates. So in
> > > > > > > some way, skipping workers already happens.
> > > > > >
> > > > > > So I think that the most often fbcon update from atomic context is the
> > > > > > blinking cursor. If you disable that one you should be back to the old
> > > > > > performance level I think, since just writing to dmesg is from process
> > > > > > context, so shouldn't change.
> > > > >
> > > > > Hmm, then for the old driver, it should also do the most update in
> > > > > non-atomic context?
> > > > >
> > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs
> > > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with
> > > > > the cache setting of DRM shadow buffer? say the orginal code use a
> > > > > cachable buffer?
> > > >
> > > > Hm, that would indicate the write-combining got broken somewhere. This
> > > > should definitely be faster. Also we shouldn't transfer the hole
> > > > thing, except when scrolling ...
> > >
> > > First rule of fbcon usage, you are always effectively scrolling.
> > >
> > > Also these devices might be on a PCIE 1x piece of wet string, not sure
> > > if the numbers reflect that.
> >
> > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and
> > overhead not entirely out of the question that 150MB/s is actually the
> > hw limit. If it's really pcie 1x 1.0, no idea where to check that.
> > Also might be worth to double-check that the gpu pci bar is listed as
> > wc in debugfs/x86/pat_memtype_list.
>
> Here is some dump of the device info and the pat_memtype_list, while it is
> running other 0day task:

Looks all good. I guess Dave is right that this is probably just a
really slow, really old pcie link, plus maybe some inefficiencies in the
mapping. Your 150MB/s, was that just the copy, or did your trace
measurement also include all the setup/map/unmap/teardown?
-Daniel

>
> controller info
> =================
> 03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
> Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1)
> Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Interrupt: pin A routed to IRQ 16
> NUMA node: 0
> Region 0: Memory at d0000000 (32-bit, prefetchable) [size=16M]
> Region 1: Memory at d1800000 (32-bit, non-prefetchable) [size=16K]
> Region 2: Memory at d1000000 (32-bit, non-prefetchable) [size=8M]
> Expansion ROM at 000c0000 [disabled] [size=128K]
> Capabilities: [dc] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
> DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> MaxPayload 128 bytes, MaxReadReq 128 bytes
> DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
> Address: 00000000 Data: 0000
> Kernel driver in use: mgag200
> Kernel modules: mgag200
>
>
> Related pat setting
> ===================
> uncached-minus @ 0xc0000000-0xc0001000
> uncached-minus @ 0xc0000000-0xd0000000
> uncached-minus @ 0xc0008000-0xc0009000
> uncached-minus @ 0xc0009000-0xc000a000
> uncached-minus @ 0xc0010000-0xc0011000
> uncached-minus @ 0xc0011000-0xc0012000
> uncached-minus @ 0xc0012000-0xc0013000
> uncached-minus @ 0xc0013000-0xc0014000
> uncached-minus @ 0xc0018000-0xc0019000
> uncached-minus @ 0xc0019000-0xc001a000
> uncached-minus @ 0xc001a000-0xc001b000
> write-combining @ 0xd0000000-0xd0300000
> write-combining @ 0xd0000000-0xd1000000
> uncached-minus @ 0xd1800000-0xd1804000
> uncached-minus @ 0xd1900000-0xd1980000
> uncached-minus @ 0xd1980000-0xd1981000
> uncached-minus @ 0xd1a00000-0xd1a80000
> uncached-minus @ 0xd1a80000-0xd1a81000
> uncached-minus @ 0xd1f10000-0xd1f11000
> uncached-minus @ 0xd1f11000-0xd1f12000
> uncached-minus @ 0xd1f12000-0xd1f13000
>
> Host bridge info
> ================
> 00:00.0 Host bridge: Intel Corporation Device 7853
> Subsystem: Intel Corporation Device 0000
> Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- >SERR- <PERR- INTx-
> Interrupt: pin A routed to IRQ 0
> NUMA node: 0
> Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
> DevCap: MaxPayload 128 bytes, PhantFunc 0
> ExtTag- RBE+
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> MaxPayload 128 bytes, MaxReadReq 128 bytes
> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L0s <512ns, L1 <4us
> ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> RootCtl: ErrCorrectable+ ErrNon-Fatal+ ErrFatal+ PMEIntEna- CRSVisible-
> RootCap: CRSVisible-
> RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> Capabilities: [e0] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [100 v1] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
> Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 Len=03c <?>
> Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?>
> Capabilities: [250 v1] #19
> Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?>
> Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 Len=024 <?>
>
>
> Thanks,
> Feng
>
>
> >
> > -Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-09-05 12:47:22

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

On Thu, Sep 05, 2019 at 06:37:47PM +0800, Daniel Vetter wrote:
> On Thu, Sep 5, 2019 at 8:58 AM Feng Tang <[email protected]> wrote:
> >
> > Hi Vetter,
> >
> > On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie <[email protected]> wrote:
> > > >
> > > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter <[email protected]> wrote:
> > > > >
> > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang <[email protected]> wrote:
> > > > > >
> > > > > > Hi Daniel,
> > > > > >
> > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > > > > > >> Thank you for testing. But don't get too excited, because the patch
> > > > > > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > > > > > >> significant number of frames are simply skipped. That is apparently the
> > > > > > > > >> reason why it's faster.
> > > > > > > > >
> > > > > > > > > Thanks for the detailed info, so the original code skips time-consuming
> > > > > > > > > work inside atomic context on purpose. Is there any space to optmise it?
> > > > > > > > > If 2 scheduled update worker are handled at almost same time, can one be
> > > > > > > > > skipped?
> > > > > > > >
> > > > > > > > To my knowledge, there's only one instance of the worker. Re-scheduling
> > > > > > > > the worker before a previous instance started, will not create a second
> > > > > > > > instance. The worker's instance will complete all pending updates. So in
> > > > > > > > some way, skipping workers already happens.
> > > > > > >
> > > > > > > So I think that the most often fbcon update from atomic context is the
> > > > > > > blinking cursor. If you disable that one you should be back to the old
> > > > > > > performance level I think, since just writing to dmesg is from process
> > > > > > > context, so shouldn't change.
> > > > > >
> > > > > > Hmm, then for the old driver, it should also do the most update in
> > > > > > non-atomic context?
> > > > > >
> > > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs
> > > > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with
> > > > > > the cache setting of DRM shadow buffer? say the orginal code use a
> > > > > > cachable buffer?
> > > > >
> > > > > Hm, that would indicate the write-combining got broken somewhere. This
> > > > > should definitely be faster. Also we shouldn't transfer the hole
> > > > > thing, except when scrolling ...
> > > >
> > > > First rule of fbcon usage, you are always effectively scrolling.
> > > >
> > > > Also these devices might be on a PCIE 1x piece of wet string, not sure
> > > > if the numbers reflect that.
> > >
> > > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and
> > > overhead not entirely out of the question that 150MB/s is actually the
> > > hw limit. If it's really pcie 1x 1.0, no idea where to check that.
> > > Also might be worth to double-check that the gpu pci bar is listed as
> > > wc in debugfs/x86/pat_memtype_list.
> >
> > Here is some dump of the device info and the pat_memtype_list, while it is
> > running other 0day task:
>
> Looks all good, I guess Dave is right with this probably only being a
> real slow, real old pcie link, plus maybe some inefficiencies in the
> mapping. Your 150MB/s, was that just the copy, or did you include all
> the setup/map/unmap/teardown too in your measurement in the trace?


The following is the breakdown; the 19240 us is the memory copy time.

drm_fb_helper_dirty_work() calls, in sequence:
1. drm_client_buffer_vmap (290 us)
2. drm_fb_helper_dirty_blit_real (19240 us)
3. helper->fb->funcs->dirty() ---> NULL for mgag200 driver
4. drm_client_buffer_vunmap (215 us)
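
So the blit alone runs at roughly 3 MB / 0.01924 s = 156 MB/s, and vmap
plus vunmap only add about 0.5 ms combined; essentially the whole 20 ms
is the memcpy into VRAM.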

Thanks,
Feng


> -Daniel
>
> >
> > controller info
> > =================
> > 03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
> > Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1)
> > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Interrupt: pin A routed to IRQ 16
> > NUMA node: 0
> > Region 0: Memory at d0000000 (32-bit, prefetchable) [size=16M]
> > Region 1: Memory at d1800000 (32-bit, non-prefetchable) [size=16K]
> > Region 2: Memory at d1000000 (32-bit, non-prefetchable) [size=8M]
> > Expansion ROM at 000c0000 [disabled] [size=128K]
> > Capabilities: [dc] Power Management version 2
> > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> > Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
> > DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> > ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
> > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > MaxPayload 128 bytes, MaxReadReq 128 bytes
> > DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
> > LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
> > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
> > Address: 00000000 Data: 0000
> > Kernel driver in use: mgag200
> > Kernel modules: mgag200
> >
> >
> > Related pat setting
> > ===================
> > uncached-minus @ 0xc0000000-0xc0001000
> > uncached-minus @ 0xc0000000-0xd0000000
> > uncached-minus @ 0xc0008000-0xc0009000
> > uncached-minus @ 0xc0009000-0xc000a000
> > uncached-minus @ 0xc0010000-0xc0011000
> > uncached-minus @ 0xc0011000-0xc0012000
> > uncached-minus @ 0xc0012000-0xc0013000
> > uncached-minus @ 0xc0013000-0xc0014000
> > uncached-minus @ 0xc0018000-0xc0019000
> > uncached-minus @ 0xc0019000-0xc001a000
> > uncached-minus @ 0xc001a000-0xc001b000
> > write-combining @ 0xd0000000-0xd0300000
> > write-combining @ 0xd0000000-0xd1000000
> > uncached-minus @ 0xd1800000-0xd1804000
> > uncached-minus @ 0xd1900000-0xd1980000
> > uncached-minus @ 0xd1980000-0xd1981000
> > uncached-minus @ 0xd1a00000-0xd1a80000
> > uncached-minus @ 0xd1a80000-0xd1a81000
> > uncached-minus @ 0xd1f10000-0xd1f11000
> > uncached-minus @ 0xd1f11000-0xd1f12000
> > uncached-minus @ 0xd1f12000-0xd1f13000
> >
> > Host bridge info
> > ================
> > 00:00.0 Host bridge: Intel Corporation Device 7853
> > Subsystem: Intel Corporation Device 0000
> > Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Interrupt: pin A routed to IRQ 0
> > NUMA node: 0
> > Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
> > DevCap: MaxPayload 128 bytes, PhantFunc 0
> > ExtTag- RBE+
> > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > MaxPayload 128 bytes, MaxReadReq 128 bytes
> > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> > LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L0s <512ns, L1 <4us
> > ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
> > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
> > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
> > RootCtl: ErrCorrectable+ ErrNon-Fatal+ ErrFatal+ PMEIntEna- CRSVisible-
> > RootCap: CRSVisible-
> > RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
> > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> > LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > Compliance De-emphasis: -6dB
> > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > Capabilities: [e0] Power Management version 3
> > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > Capabilities: [100 v1] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
> > Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 Len=03c <?>
> > Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?>
> > Capabilities: [250 v1] #19
> > Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?>
> > Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 Len=024 <?>
> >
> >
> > Thanks,
> > Feng
> >
> >
> > >
> > > -Daniel
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-09-10 18:22:54

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi

On 04.09.19 at 08:27, Feng Tang wrote:
> Hi Thomas,
>
> On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 28.08.19 um 11:37 schrieb Rong Chen:
>>> Hi Thomas,
>>>
>>> On 8/28/19 1:16 AM, Thomas Zimmermann wrote:
>>>> Hi
>>>>
>>>> Am 27.08.19 um 14:33 schrieb Chen, Rong A:
>>>>> Both patches have little impact on the performance from our side.
>>>> Thanks for testing. Too bad they doesn't solve the issue.
>>>>
>>>> There's another patch attached. Could you please tests this as well?
>>>> Thanks a lot!
>>>>
>>>> The patch comes from Daniel Vetter after discussing the problem on IRC.
>>>> The idea of the patch is that the old mgag200 code might display much
>>>> less frames that the generic code, because mgag200 only prints from
>>>> non-atomic context. If we simulate this with the generic code, we should
>>>> see roughly the original performance.
>>>>
>>>>
>>>
>>> It's cool, the patch "usecansleep.patch" can fix the issue.
>>
>> Thank you for testing. But don't get too excited, because the patch
>> simulates a bug that was present in the original mgag200 code. A
>> significant number of frames are simply skipped. That is apparently the
>> reason why it's faster.
>
> Thanks for the detailed info, so the original code skips time-consuming
> work inside atomic context on purpose. Is there any space to optmise it?
> If 2 scheduled update worker are handled at almost same time, can one be
> skipped?

We discussed ideas on IRC and decided that screen updates could be
synchronized with vblank intervals. This may provide some rate limiting
for the output.
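
As a rough sketch of the idea (illustrative only; the actual patch set
is structured differently, and the names below are made up):

/* Illustrative sketch, not the real patches: coalesce dirty updates
 * and let the worker flush to VRAM at most once per refresh interval.
 */
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

#define REFRESH_INTERVAL (HZ / 60)	/* ~16 ms at 60 Hz */

struct throttled_dirty {
	struct delayed_work work;
	unsigned long last_flush;	/* jiffies of the last VRAM update */
};

static void throttled_dirty_worker(struct work_struct *work)
{
	struct throttled_dirty *td = container_of(to_delayed_work(work),
						  struct throttled_dirty, work);

	/* vmap, blit all accumulated dirty clips to VRAM, vunmap ... */

	td->last_flush = jiffies;
}

static void throttled_dirty_schedule(struct throttled_dirty *td)
{
	unsigned long next = td->last_flush + REFRESH_INTERVAL;

	/*
	 * If a flush ran recently, delay the worker until a full refresh
	 * interval has passed. Everything marked dirty in the meantime is
	 * then handled by a single blit pass; scheduling an already-pending
	 * delayed work is a no-op, which is what coalesces the updates.
	 */
	if (time_is_after_jiffies(next))
		schedule_delayed_work(&td->work, next - jiffies);
	else
		schedule_delayed_work(&td->work, 0);
}

The real patches tie the update rate to the vblank interval rather than
a hard-coded 60 Hz; the sketch only shows the rate-limiting principle.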

If you like, you could try the patch set at [1]. It adds the respective
code to console and mgag200.

Best regards
Thomas

[1]
https://lists.freedesktop.org/archives/dri-devel/2019-September/234850.html

>
> Thanks,
> Feng
>
>>
>> Best regards
>> Thomas

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)



2019-09-16 10:40:18

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi Thomas,

On Mon, Sep 09, 2019 at 04:12:37PM +0200, Thomas Zimmermann wrote:
> Hi
>
> Am 04.09.19 um 08:27 schrieb Feng Tang:
> > Hi Thomas,
> >
> > On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote:
> >> Hi
> >>
> >> Am 28.08.19 um 11:37 schrieb Rong Chen:
> >>> Hi Thomas,
> >>>
> >>> On 8/28/19 1:16 AM, Thomas Zimmermann wrote:
> >>>> Hi
> >>>>
> >>>> Am 27.08.19 um 14:33 schrieb Chen, Rong A:
> >>>>> Both patches have little impact on the performance from our side.
> >>>> Thanks for testing. Too bad they doesn't solve the issue.
> >>>>
> >>>> There's another patch attached. Could you please tests this as well?
> >>>> Thanks a lot!
> >>>>
> >>>> The patch comes from Daniel Vetter after discussing the problem on IRC.
> >>>> The idea of the patch is that the old mgag200 code might display much
> >>>> less frames that the generic code, because mgag200 only prints from
> >>>> non-atomic context. If we simulate this with the generic code, we should
> >>>> see roughly the original performance.
> >>>>
> >>>>
> >>>
> >>> It's cool, the patch "usecansleep.patch" can fix the issue.
> >>
> >> Thank you for testing. But don't get too excited, because the patch
> >> simulates a bug that was present in the original mgag200 code. A
> >> significant number of frames are simply skipped. That is apparently the
> >> reason why it's faster.
> >
> > Thanks for the detailed info, so the original code skips time-consuming
> > work inside atomic context on purpose. Is there any space to optmise it?
> > If 2 scheduled update worker are handled at almost same time, can one be
> > skipped?
>
> We discussed ideas on IRC and decided that screen updates could be
> synchronized with vblank intervals. This may give some rate limiting to
> the output.
>
> If you like, you could try the patch set at [1]. It adds the respective
> code to console and mgag200.

I just tried the 2 patches: no obvious change (compared to the 18.8%
regression), either in the overall benchmark or in the micro-profiling.

90f479ae51afa45e            04a0983095feaee022cdd65e3e4
----------------            ---------------------------
       37236 ± 3%     +2.5%        38167 ± 3%   vm-scalability.median
        0.15 ± 24%   -25.1%         0.11 ± 23%  vm-scalability.median_stddev
        0.15 ± 23%   -25.1%         0.11 ± 22%  vm-scalability.stddev
    12767318 ± 4%     +2.5%     13089177 ± 3%   vm-scalability.throughput

Thanks,
Feng

>
> Best regards
> Thomas
>
> [1]
> https://lists.freedesktop.org/archives/dri-devel/2019-September/234850.html
>
> >
> > Thanks,
> > Feng
> >
> >>
> >> Best regards
> >> Thomas
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> HRB 21284 (AG Nürnberg)
>



2019-09-17 09:54:24

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

Hi

On 16.09.19 at 11:06, Feng Tang wrote:
> Hi Thomas,
>
> On Mon, Sep 09, 2019 at 04:12:37PM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 04.09.19 um 08:27 schrieb Feng Tang:
>>> Hi Thomas,
>>>
>>> On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote:
>>>> Hi
>>>>
>>>> Am 28.08.19 um 11:37 schrieb Rong Chen:
>>>>> Hi Thomas,
>>>>>
>>>>> On 8/28/19 1:16 AM, Thomas Zimmermann wrote:
>>>>>> Hi
>>>>>>
>>>>>> Am 27.08.19 um 14:33 schrieb Chen, Rong A:
>>>>>>> Both patches have little impact on the performance from our side.
>>>>>> Thanks for testing. Too bad they doesn't solve the issue.
>>>>>>
>>>>>> There's another patch attached. Could you please tests this as well?
>>>>>> Thanks a lot!
>>>>>>
>>>>>> The patch comes from Daniel Vetter after discussing the problem on IRC.
>>>>>> The idea of the patch is that the old mgag200 code might display much
>>>>>> less frames that the generic code, because mgag200 only prints from
>>>>>> non-atomic context. If we simulate this with the generic code, we should
>>>>>> see roughly the original performance.
>>>>>>
>>>>>>
>>>>>
>>>>> It's cool, the patch "usecansleep.patch" can fix the issue.
>>>>
>>>> Thank you for testing. But don't get too excited, because the patch
>>>> simulates a bug that was present in the original mgag200 code. A
>>>> significant number of frames are simply skipped. That is apparently the
>>>> reason why it's faster.
>>>
>>> Thanks for the detailed info, so the original code skips time-consuming
>>> work inside atomic context on purpose. Is there any space to optmise it?
>>> If 2 scheduled update worker are handled at almost same time, can one be
>>> skipped?
>>
>> We discussed ideas on IRC and decided that screen updates could be
>> synchronized with vblank intervals. This may give some rate limiting to
>> the output.
>>
>> If you like, you could try the patch set at [1]. It adds the respective
>> code to console and mgag200.
>
> I just tried the 2 patches, no obvious change (comparing to the
> 18.8% regression), both in overall benchmark and micro-profiling.
>
> 90f479ae51afa45e            04a0983095feaee022cdd65e3e4
> ----------------            ---------------------------
>        37236 ± 3%     +2.5%        38167 ± 3%   vm-scalability.median
>         0.15 ± 24%   -25.1%         0.11 ± 23%  vm-scalability.median_stddev
>         0.15 ± 23%   -25.1%         0.11 ± 22%  vm-scalability.stddev
>     12767318 ± 4%     +2.5%     13089177 ± 3%   vm-scalability.throughput

Thank you for testing. I wish we'd seen at least some improvement.

Best regards
Thomas

> Thanks,
> Feng
>
>>
>> Best regards
>> Thomas
>>
>> [1]
>> https://lists.freedesktop.org/archives/dri-devel/2019-September/234850.html
>>
>>>
>>> Thanks,
>>> Feng
>>>
>>>>
>>>> Best regards
>>>> Thomas
>>
>> --
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
>> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
>> HRB 21284 (AG Nürnberg)
>>
>
>
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

