Subject: OMAPFB: CMA allocation failures

Hi Tomi,

The patch at http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/131269.html modifies
the omapfb driver to use the DMA API to allocate framebuffer memory instead of preallocating VRAM.

With this patch applied, I see a lot of errors like the following while trying to play a video on
the N900 with Maemo 5 (Fremantle) on top of linux-3.12-rc1:

Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.879577] cma: dma_alloc_from_contiguous(cma c05f5844, count 192, align 8)
Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.914215] cma: dma_alloc_from_contiguous(): memory range at c07df000 is busy, retrying
Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.933502] cma: dma_alloc_from_contiguous(): memory range at c07e1000 is busy, retrying
Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.940032] cma: dma_alloc_from_contiguous(): memory range at c07e3000 is busy, retrying
Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.966644] cma: dma_alloc_from_contiguous(): memory range at c07e5000 is busy, retrying
Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.976867] cma: dma_alloc_from_contiguous(): memory range at c07e7000 is busy, retrying
Jan 1 06:33:27 Nokia-N900 kernel: [ 2055.038055] cma: dma_alloc_from_contiguous(): memory range at c07e9000 is busy, retrying
Jan 1 06:33:27 Nokia-N900 kernel: [ 2055.038116] cma: dma_alloc_from_contiguous(): returned (null)
Jan 1 06:33:27 Nokia-N900 kernel: [ 2055.038146] omapfb omapfb: failed to allocate framebuffer

It is definitely CMA that fails to allocate the memory most of the time, but I wonder how reliable
CMA is for use in omapfb. I even reserved 64MB for CMA, but that made no difference. If CMA is
disabled, the allocation still fails, as it is obviously highly unlikely that there will be such
a big chunk of contiguous free memory on a RAM-limited device like the N900.

One obvious solution is to simply revert the removal of the VRAM allocator, but that would mean
maintaining a separate tree, with all the implications that brings.

What would you advise on how to deal with the issue?

Regards,
Ivo


2013-10-14 06:04:51

by Tomi Valkeinen

Subject: Re: OMAPFB: CMA allocation failures

Hi,

On 12/10/13 17:43, Ивайло Димитров wrote:
> What would you advise on how to deal with the issue?

I've not seen such errors, and I'm no expert on CMA. But I guess the
contiguous memory area can get fragmented enough no matter how hard one
tries to avoid it. The old VRAM system had the same issue, although it
was quite difficult to hit it.

64MB does sound like quite a lot, though. I wonder what other drivers are
using CMA, and how they manage to allocate so much memory and
fragment it so badly... With double buffering, N900 should only need
something like 3MB for the frame buffer.

At a quick glance I didn't find any debugfs or such files to show
information about the CMA area. It'd be helpful to find out what's going
on there. Or maybe normal allocations are fragmenting the CMA area, but
for some reason they cannot be moved? Just guessing.

There's also dma_declare_contiguous() that could be used to reserve
memory for omapfb. I have not used it, and I have no idea if it would
help here. But it's something you could try.

Tomi



Subject: Re: OMAPFB: CMA allocation failures

Hi

>-------- Original message --------
>From: Tomi Valkeinen
>Subject: Re: OMAPFB: CMA allocation failures
>To: Ивайло Димитров
>Sent: Monday, 2013, October 14 09:04:35 EEST
>I've not seen such errors, and I'm no expert on CMA. But I guess the
>contiguous memory area can get fragmented enough no matter how hard one
>tries to avoid it. The old VRAM system had the same issue, although it
>was quite difficult to hit it.

I have been using my N900 as my daily (and only) device since the beginning of 2010 and have never
seen such an issue with video playback. As the maintainer of one of the community-supported kernels
for the N900 (kernel-power), I have never had such an issue reported either, with the stock kernel
or its derivatives, of course. The VRAM allocator seems virtually impossible to make fail, while
with CMA, OMAPFB fails on the first video after boot.

When you say you've not seen such an issue, did you actually test video playback? On what device,
and using which distro? Did you use DSP-accelerated decoding?

>64MB does sound like quite a lot, though. I wonder what other drivers are
>using CMA, and how they manage to allocate so much memory and
>fragment it so badly... With double buffering, N900 should only need
>something like 3MB for the frame buffer.

Sure, 64MB is a lot, but I just wanted to see whether it would make any difference. And for 720p,
3MB is not enough; something like 8MB is needed.

>At a quick glance I didn't find any debugfs or such files to show
>information about the CMA area. It'd be helpful to find out what's going
>on there. Or maybe normal allocations are fragmenting the CMA area, but
>for some reason they cannot be moved? Just guessing.

I was able to track down the failures to:
http://lxr.free-electrons.com/source/mm/migrate.c#L320

So it seems the problem is not that the CMA area gets fragmented, but rather that some pages cannot
be migrated. Unfortunately, my knowledge stops here. Should someone from the mm side be involved in
the issue as well? I am starting to think I am hitting some serious issue with CMA and/or mm on the
N900, as lack of free RAM is not the problem: after "echo 3 > /proc/sys/vm/drop_caches", free
reports more than 45MB of free RAM.

>There's also dma_declare_contiguous() that could be used to reserve
>memory for omapfb. I have not used it, and I have no idea if it would
>help here. But it's something you could try.

dma_declare_contiguous() won't help IMO: it just reserves a CMA area that is private to the
driver and is used instead of the global CMA area, and I don't see how that would help in my
case.

Anyway, what about reverting the VRAM allocator removal and converting it to the DMA API, the same
way the DMA coherent pool is allocated and managed? Or simply reverting the VRAM allocator removal :) ?

Regards,
Ivo

2013-10-15 07:37:13

by Tomi Valkeinen

Subject: Re: OMAPFB: CMA allocation failures

On 15/10/13 09:49, Ивайло Димитров wrote:

> I have been using my N900 as my daily (and only) device since the beginning of 2010 and have never
> seen such an issue with video playback. As the maintainer of one of the community-supported kernels
> for the N900 (kernel-power), I have never had such an issue reported either, with the stock kernel
> or its derivatives, of course. The VRAM allocator seems virtually impossible to make fail, while
> with CMA, OMAPFB fails on the first video after boot.

Yes, I think with normal fb use it's quite difficult to fragment VRAM
allocator too much.

> When you say you've not seen such an issue, did you actually test video playback? On what device,
> and using which distro? Did you use DSP-accelerated decoding?

No, I don't have a rootfs with DSP, and I quite rarely test video
playback. But the VRAM allocator was removed a year ago, and this is the
first time I've seen anyone have issues with CMA.

> I was able to track down the failures to:
> http://lxr.free-electrons.com/source/mm/migrate.c#L320
>
> So it seems the problem is not that the CMA area gets fragmented, but rather that some pages cannot
> be migrated. Unfortunately, my knowledge stops here. Should someone from the mm side be involved in
> the issue as well? I am starting to think I am hitting some serious issue with CMA and/or mm on the
> N900, as lack of free RAM is not the problem: after "echo 3 > /proc/sys/vm/drop_caches", free
> reports more than 45MB of free RAM.

I think we should somehow find out what the pages are that cannot be
migrated, and where they come from.

So there are "anonymous pages without mapping" with page_count(page) !=
1. I have to say I don't know what that means =). I need to find some
time to study the mm.

> dma_declare_contiguous() won't help IMO: it just reserves a CMA area that is private to the
> driver and is used instead of the global CMA area, and I don't see how that would help in my
> case.

If the issue is not about fragmentation, then I think you're right,
dma_declare_contiguous won't help.

> Anyway, what about reverting the VRAM allocator removal and converting it to the DMA API, the same
> way the DMA coherent pool is allocated and managed? Or simply reverting the VRAM allocator removal :) ?

Well, as I said, you're the first one to report any errors, after the
change has been in use for a year. Maybe people just haven't used recent
enough kernels, and the issue is only now starting to emerge, but I
wouldn't draw any conclusions yet.

If CMA had big generic issues, I think we would've seen them
earlier. So I'm guessing it's some driver or app in your setup
that's causing the issues. Maybe the driver/app is broken, or maybe that
specific behavior is not handled well by CMA. In either case I think we
need to identify what that driver/app is.

I wonder how I could try to reproduce this with a generic omap3 board...

Tomi



Subject: Re: OMAPFB: CMA allocation failures

Hi Tomi,

>I think we should somehow find out what the pages are that cannot be
>migrated, and where they come from.
>
>So there are "anonymous pages without mapping" with page_count(page) !=
>1. I have to say I don't know what that means =). I need to find some
>time to study the mm.

I added some more traces at the point of failure. The result:

page_count(page) == 2, page->flags == 0x0008025D, which is:

PG_locked, PG_referenced, PG_uptodate, PG_dirty, PG_active, PG_arch_1, PG_unevictable

Whatever those mean :). I have no idea how to identify where those pages come from.

>Well, as I said, you're the first one to report any errors, after the
>change has been in use for a year. Maybe people just haven't used recent
>enough kernels, and the issue is only now starting to emerge, but I
>wouldn't draw any conclusions yet.

I am (almost) sure I am the first one to test video playback on OMAP3 with DSP video
acceleration, using a recent kernel and Maemo 5 on the N900 :). So it is quite likely that the issue
was not reported earlier because no one has tested it thoroughly since the change.

>If CMA had big generic issues, I think we would've seen them
>earlier. So I'm guessing it's some driver or app in your setup
>that's causing the issues. Maybe the driver/app is broken, or maybe that
>specific behavior is not handled well by CMA. In either case I think we
>need to identify what that driver/app is.

What I do know is that there is heavy filesystem I/O at the same time: a thumbnailer process
running in the background tries to extract thumbnails from all video files on the system. There are
also other processes doing various jobs (e-mail fetching, IM account login, and so on), and in
addition Xorg mlocks parts of its address space. Of course, all of this happens with lots of memory
being swapped in and out. I guess all of this is related.

However, even after the system has settled, the CMA failures keep happening. It looks like some
pages are being allocated from CMA that should not be.

>I wonder how I could try to reproduce this with a generic omap3 board...

I can always reproduce it here (well, not on a generic board, but I guess testing in real-life
conditions is even better), so if you need specific tests, traces or anything else, I can do them
for you.

Regards,
Ivo

Subject: Re: OMAPFB: CMA allocation failures

Hi,

I wonder whether there has been any progress on the issue. Do you need me to send more data, or
should I raise the issue with the CMA maintainer?

Regards,
Ivo

>-------- Original message --------
>From: Ивайло Димитров
>Subject: Re: OMAPFB: CMA allocation failures
>To: Tomi Valkeinen
>Sent: Wednesday, 2013, October 16 09:33:51 EEST

2013-10-24 07:01:29

by Tomi Valkeinen

Subject: Re: OMAPFB: CMA allocation failures

Hi,

On 24/10/13 00:59, Ивайло Димитров wrote:
> Hi,
>
> I wonder whether there has been any progress on the issue. Do you need me to send more data, or
> should I raise the issue with the CMA maintainer?

No, I haven't had time to look at this. And frankly, I don't even have
an idea what to look for if I can't reproduce it. The issue is not about
the display but about DMA allocation, which I know very little about.

So yes, I suggest you try to discuss this with CMA/DMA people.

Tomi





2013-10-28 07:37:51

by Minchan Kim

Subject: Re: OMAPFB: CMA allocation failures

Hello,

On Tue, Oct 15, 2013 at 09:49:51AM +0300, Ивайло Димитров wrote:
> I was able to track down the failures to:
> http://lxr.free-electrons.com/source/mm/migrate.c#L320

That path is for anonymous page migration, so the culprit I can think of
is that you called get_user_pages on those anonymous pages to pin them.
Right?

If so, it's no surprise that the migration fails and CMA doesn't work.

--
Kind regards,
Minchan Kim

Subject: Re: OMAPFB: CMA allocation failures


Hi,


>-------- Original message --------
>From: Minchan Kim
>Subject: Re: OMAPFB: CMA allocation failures
>To: Ивайло Димитров
>Sent: Monday, 2013, October 28 09:37:48 EET
>
>
>Hello,
>
>On Tue, Oct 15, 2013 at 09:49:51AM +0300, Ивайло Димитров wrote:
>> I was able to track down the failures to:
>> http://lxr.free-electrons.com/source/mm/migrate.c#L320
>
>That path is for anonymous page migration, so the culprit I can think of
>is that you called get_user_pages on those anonymous pages to pin them.
>Right?
>

I grepped through the code and get_user_pages is called in lots of places, though I suspect the
SGX or DSP driver (or both) to be the ones to blame. Both of them are active and needed for
HW-accelerated video decoding.

>If so, it's no surprise that the migration fails and CMA doesn't work.
>
>--
>Kind regards,
>Minchan Kim
>

Well, if CMA is to be reliable, I would expect some logic to deal with get_user_pages effectively
making MIGRATE_CMA pages non-migratable, either by migrating them out of the CMA area before they
get pinned or by providing a mechanism to migrate them when needed. I am far from knowing the nuts
and bolts of MM and CMA, but so far I have failed to find any such logic. Without it, CMA may be
fine for allocating small buffers, but for the framebuffer memory needed for 720p playback (for
example) on a RAM-limited embedded device, it is too fragile, IMO.

BTW, some quick googling shows I am not the first one to encounter similar problems [0], [1], and I
don't see a solution for them.

However, back to omapfb: my understanding is that the way it currently uses CMA makes it prone to
allocation failures far beyond what is acceptable.

Tomi, what do you think about adding module parameters to allow pre-allocating framebuffer memory
from CMA during boot? Or re-implementing the VRAM allocator on top of CMA? As a nice side effect,
OMAPFB_GET_VRAM_INFO would no longer return fake values.

Regards,
Ivo

[0] http://lwn.net/Articles/541423/
[1] https://lkml.org/lkml/2012/11/29/69

2013-10-30 05:53:49

by Minchan Kim

Subject: Re: OMAPFB: CMA allocation failures

Hello, Ивайло


On Tue, Oct 29, 2013 at 02:47:35PM +0200, Ивайло Димитров wrote:
>
> Hi,
>
>
> >-------- Original message --------
> >From: Minchan Kim
> >Subject: Re: OMAPFB: CMA allocation failures
> >To: Ивайло Димитров
> >Sent: Monday, October 28, 2013 09:37:48 EET
> >
> >
> >Hello,
> >
> >On Tue, Oct 15, 2013 at 09:49:51AM +0300, Ивайло Димитров wrote:
> >> Hi
> >>
> > >> >-------- Original message --------
> > >> >From: Tomi Valkeinen
> > >> >Subject: Re: OMAPFB: CMA allocation failures
> > >> >To: Ивайло Димитров
> > >>
> > >> >Sent: Monday, October 14, 2013 09:04:35 EEST
> >> >
> >> >
> >> >Hi,
> >> >
> >> >On 12/10/13 17:43, Ивайло Димитров wrote:
> >> >> Hi Tomi,
> >> >>
> >> >> patch http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/131269.html modifies
> >> >> omapfb driver to use DMA API to allocate framebuffer memory instead of preallocating VRAM.
> >> >>
> >> >> With this patch I see a lot of:
> >> >>
> >> >> Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.879577] cma: dma_alloc_from_contiguous(cma c05f5844, count 192, align 8)
> >> >> Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.914215] cma: dma_alloc_from_contiguous(): memory range at c07df000 is busy, retrying
> >> >> Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.933502] cma: dma_alloc_from_contiguous(): memory range at c07e1000 is busy, retrying
> >> >> Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.940032] cma: dma_alloc_from_contiguous(): memory range at c07e3000 is busy, retrying
> >> >> Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.966644] cma: dma_alloc_from_contiguous(): memory range at c07e5000 is busy, retrying
> >> >> Jan 1 06:33:27 Nokia-N900 kernel: [ 2054.976867] cma: dma_alloc_from_contiguous(): memory range at c07e7000 is busy, retrying
> >> >> Jan 1 06:33:27 Nokia-N900 kernel: [ 2055.038055] cma: dma_alloc_from_contiguous(): memory range at c07e9000 is busy, retrying
> >> >> Jan 1 06:33:27 Nokia-N900 kernel: [ 2055.038116] cma: dma_alloc_from_contiguous(): returned (null)
> >> >> Jan 1 06:33:27 Nokia-N900 kernel: [ 2055.038146] omapfb omapfb: failed to allocate framebuffer
> >> >>
> > >> >> errors while trying to play a video on the N900 with Maemo 5 (Fremantle) on top of linux-3.12rc1.
> > >> >> It is definitely CMA that fails to allocate the memory most of the time, but I wonder
> > >> >> how reliable CMA is for use in omapfb. I even reserved 64MB for CMA, but that made no
> > >> >> difference. If CMA is disabled, the memory allocation still fails, as it is obviously highly
> > >> >> unlikely that there will be such a big chunk of contiguous free memory on a RAM-limited device
> > >> >> like the N900.
> >> >>
> >> >> One obvious solution is to just revert the removal of VRAM memory allocator, but that would
> >> >> mean I'll have to maintain a separate tree with all the implications that brings.
> >> >>
> >> >> What would you advise on how to deal with the issue?
> >> >
> >> >I've not seen such errors, and I'm no expert on CMA. But I guess the
> >> >contiguous memory area can get fragmented enough no matter how hard one
> >> >tries to avoid it. The old VRAM system had the same issue, although it
> >> >was quite difficult to hit it.
> >>
> > >> I have been using my N900 as my daily (and only) device since the beginning of 2010 and have
> > >> never seen such an issue with video playback. And as the maintainer of one of the
> > >> community-supported kernels for the N900 (kernel-power), I've never had such an issue reported
> > >> (on the stock kernel and its derivatives, of course). It seems the VRAM allocator virtually
> > >> never fails, while with CMA, OMAPFB fails on the first video after boot-up.
> >>
> > >> When you say you've not seen such an issue, did you actually test video playback? On what
> > >> device and using which distro? Did you use DSP-accelerated decoding?
> >>
> >> >64MB does sound quite a lot, though. I wonder what other drivers are
> >> >using CMA, and how do they manage to allocate so much memory and
> >> >fragment it so badly... With double buffering, N900 should only need
> >> >something like 3MB for the frame buffer.
> >>
> >> Sure, 64 MB is a lot, but I just wanted to see if that would make any difference. And for 720p
> >> 3MB is not enough, something like 8MB is needed.
> >>
> >> >With a quick glance I didn't find any debugfs or such files to show
> >> >information about the CMA area. It'd be helpful to find out what's going
> >> >on there. Or maybe normal allocations are fragmenting the CMA area, but
> >> >for some reason they cannot be moved? Just guessing.
> >>
> >> I was able to track down the failures to:
> >> http://lxr.free-electrons.com/source/mm/migrate.c#L320
> >
> >That path is for anonymous page migration, so the culprit I can think of
> >is that you called get_user_pages on those anonymous pages to pin them.
> >Right?
> >
>
> I grepped through the code and there are lots of places where get_user_pages is called, though
> I suspect the SGX or DSP driver (or both) is to blame. Both of them are active
> and needed for HW-accelerated video decoding.
>
> >If so, it's no surprise that the migration fails and CMA doesn't work.
> >
> >--
> >Kind regards,
> >Minchan Kim
> >
>
> Well, if CMA is to be reliable, I would expect some logic to take care of get_user_pages

First of all, CMA is never reliable.

> causing MIGRATE_CMA pages to be effectively made non-migratable, either by migrating them out of
> the CMA area before they get pinned or by providing a mechanism to migrate them when needed. I am far
> from knowing the nuts and bolts of MM and CMA, but so far I have failed to see any such logic. Without

If you read the links you attached below, you will see why such logic hasn't been accepted.

> it, CMA could be fine for allocating small buffers, but when it comes to the framebuffer memory
> needed for 720p playback (for example) on a RAM-limited embedded device, it is too fragile, IMO.

True.

>
> BTW, quick googling shows I am not the first one to encounter similar problems [0], [1], and I
> don't see a solution for them.
>
> However, back to omapfb: my understanding is that the way it uses CMA (in its current form)
> makes it prone to allocation failures far beyond what is acceptable.

Basically, a subsystem that cannot tolerate allocation failures shouldn't use CMA. Otherwise, your
platform should support killing processes to unpin some pages. Yes, I know it's not a 100% solution
and it's quite horrible, but I know some insane people have done it.

I just posted an idea:
http://marc.info/?l=linux-mm&m=138311160522311&w=2
If anybody is interested, maybe we will move in that direction.

Thanks.

>
> Tomi, what do you think about adding module parameters to allow pre-allocating framebuffer memory
> from CMA during boot? Or re-implementing the VRAM allocator on top of CMA? As a good side effect,
> OMAPFB_GET_VRAM_INFO would no longer return fake values.
>
> Regards,
> Ivo
>
> [0] http://lwn.net/Articles/541423/
> [1] https://lkml.org/lkml/2012/11/29/69
>

--
Kind regards,
Minchan Kim

2013-10-30 12:19:58

by Tomi Valkeinen

Subject: Re: OMAPFB: CMA allocation failures

On 2013-10-29 14:47, Ивайло Димитров wrote:

> However, back to omapfb - my understanding is that the way it uses CMA (in its current form) is
> prone to allocation failures way beyond acceptable.
>
> Tomi, what do you think about adding module parameters to allow pre-allocating framebuffer memory
> from CMA during boot? Or re-implement VRAM allocator to use CMA? As a good side-effect
> OMAPFB_GET_VRAM_INFO will no longer return fake values.

I really dislike the idea of adding the omap vram allocator back. Then
again, if the CMA doesn't work, something has to be done.

Pre-allocating is possible, but that won't work if there's any need to
re-allocate the framebuffers. Unless omapfb retained and managed the
pre-allocated buffers itself, but that would just be more or less
the old vram allocator again.

So, as I see it, the best option would be to have the standard dma_alloc
functions get the memory for omapfb from a private pool, which is not
used for anything else.

I wonder if that's possible already? It sounds quite trivial to me.
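As a rough illustration of the private-pool idea, the userspace sketch below reserves a fixed region once and hands out page-granular, first-fit contiguous ranges from it. All names and the 2048-page size (8 MiB with 4 KiB pages) are hypothetical, and it does not show how such a pool would be wired into the dma_alloc functions. Because nothing else allocates from the pool, the only fragmentation comes from the framebuffer's own allocations, and freeing a buffer makes its pages available for a differently sized re-allocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical private pool: a region reserved once at boot and used only
 * by the framebuffer driver. Page-granular, first-fit. */
#define POOL_PAGES 2048

struct fb_pool {
    bool used[POOL_PAGES];
};

static void fb_pool_init(struct fb_pool *p)
{
    for (int i = 0; i < POOL_PAGES; i++)
        p->used[i] = false;
}

/* Returns the first page index of `count` free contiguous pages and marks
 * them used, or returns -1 if no such run exists. */
static int fb_pool_alloc(struct fb_pool *p, int count)
{
    int run = 0;

    assert(count > 0);
    for (int i = 0; i < POOL_PAGES; i++) {
        run = p->used[i] ? 0 : run + 1;   /* length of the free run ending at i */
        if (run == count) {
            int start = i - count + 1;
            for (int j = start; j <= i; j++)
                p->used[j] = true;
            return start;
        }
    }
    return -1;
}

/* Returns a previously allocated range to the pool. */
static void fb_pool_free(struct fb_pool *p, int start, int count)
{
    for (int j = start; j < start + count; j++) {
        assert(p->used[j]);               /* catch double frees in the sketch */
        p->used[j] = false;
    }
}
```

For example, after a 750-page panel-sized buffer is allocated, an 1800-page request fails; once the first buffer is freed, the same request succeeds at page 0.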

Tomi


