2024-05-06 11:49:34

by Hans de Goede

[permalink] [raw]
Subject: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi dma-buf maintainers, et.al.,

Various people have been working on making complex/MIPI cameras work OOTB
with mainline Linux kernels and an opensource userspace stack.

The generic solution adds a software ISP (for Debayering and 3A) to
libcamera. Libcamera's API guarantees that buffers handed to applications
using it are dma-bufs so that these can be passed to e.g. a video encoder.

In order to meet this API guarantee the libcamera software ISP allocates
dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
the Fedora COPR repo for the PoC of this:
https://hansdegoede.dreamwidth.org/28153.html

I have added a simple udev rule to give physically present users access
to the dma_heap-s:

KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"

(and on Rasperry Pi devices any users in the video group get access)

This was just a quick fix for the PoC. Now that we are ready to move out
of the PoC phase and start actually integrating this into distributions
the question becomes if this is an acceptable solution; or if we need some
other way to deal with this ?

Specifically the question is if this will have any negative security
implications? I can certainly see this being used to do some sort of
denial of service attack on the system (1). This is especially true for
the cma heap which generally speaking is a limited resource.

But devices tagged for uaccess are only opened up to users who are
physcially present behind the machine and those can just hit
the powerbutton, so I don't believe that any *on purpose* DOS is part of
the thread model. Any accidental DOS would be a userspace stack bug.

Do you foresee any other negative security implications from allowing
physically present non root users to create (u)dma-bufs ?

Regards,

Hans


1) There are some limits in drivers/dma-buf/udmabuf.c and distributions
could narrow these.




2024-05-06 12:09:42

by Maxime Ripard

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi,

On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote:
> Hi dma-buf maintainers, et.al.,
>
> Various people have been working on making complex/MIPI cameras work OOTB
> with mainline Linux kernels and an opensource userspace stack.
>
> The generic solution adds a software ISP (for Debayering and 3A) to
> libcamera. Libcamera's API guarantees that buffers handed to applications
> using it are dma-bufs so that these can be passed to e.g. a video encoder.
>
> In order to meet this API guarantee the libcamera software ISP allocates
> dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
> the Fedora COPR repo for the PoC of this:
> https://hansdegoede.dreamwidth.org/28153.html

For the record, we're also considering using them for ARM KMS devices,
so it would be better if the solution wasn't only considering v4l2
devices.

> I have added a simple udev rule to give physically present users access
> to the dma_heap-s:
>
> KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
>
> (and on Rasperry Pi devices any users in the video group get access)
>
> This was just a quick fix for the PoC. Now that we are ready to move out
> of the PoC phase and start actually integrating this into distributions
> the question becomes if this is an acceptable solution; or if we need some
> other way to deal with this ?
>
> Specifically the question is if this will have any negative security
> implications? I can certainly see this being used to do some sort of
> denial of service attack on the system (1). This is especially true for
> the cma heap which generally speaking is a limited resource.

There's plenty of other ways to exhaust CMA, like allocating too much
KMS or v4l2 buffers. I'm not sure we should consider dma-heaps
differently than those if it's part of our threat model.

> But devices tagged for uaccess are only opened up to users who are
> physcially present behind the machine and those can just hit
> the powerbutton, so I don't believe that any *on purpose* DOS is part of
> the thread model.

How would that work for headless devices?

Maxime


Attachments:
(No filename) (2.11 kB)
signature.asc (281.00 B)
Download all attachments

2024-05-06 12:11:24

by Hans de Goede

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi Maxime,

On 5/6/24 2:05 PM, Maxime Ripard wrote:
> Hi,
>
> On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote:
>> Hi dma-buf maintainers, et.al.,
>>
>> Various people have been working on making complex/MIPI cameras work OOTB
>> with mainline Linux kernels and an opensource userspace stack.
>>
>> The generic solution adds a software ISP (for Debayering and 3A) to
>> libcamera. Libcamera's API guarantees that buffers handed to applications
>> using it are dma-bufs so that these can be passed to e.g. a video encoder.
>>
>> In order to meet this API guarantee the libcamera software ISP allocates
>> dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
>> the Fedora COPR repo for the PoC of this:
>> https://hansdegoede.dreamwidth.org/28153.html
>
> For the record, we're also considering using them for ARM KMS devices,
> so it would be better if the solution wasn't only considering v4l2
> devices.
>
>> I have added a simple udev rule to give physically present users access
>> to the dma_heap-s:
>>
>> KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
>>
>> (and on Rasperry Pi devices any users in the video group get access)
>>
>> This was just a quick fix for the PoC. Now that we are ready to move out
>> of the PoC phase and start actually integrating this into distributions
>> the question becomes if this is an acceptable solution; or if we need some
>> other way to deal with this ?
>>
>> Specifically the question is if this will have any negative security
>> implications? I can certainly see this being used to do some sort of
>> denial of service attack on the system (1). This is especially true for
>> the cma heap which generally speaking is a limited resource.
>
> There's plenty of other ways to exhaust CMA, like allocating too much
> KMS or v4l2 buffers. I'm not sure we should consider dma-heaps
> differently than those if it's part of our threat model.

Ack.

>> But devices tagged for uaccess are only opened up to users who are
>> physcially present behind the machine and those can just hit
>> the powerbutton, so I don't believe that any *on purpose* DOS is part of
>> the thread model.
>
> How would that work for headless devices?

The uaccess tag solution does not work for headless devices, but it
also should not hurt any headless scenarios.

Headless devices could use something like the video group solution
(dma_heap group?) which Raspberry Pi is using and them make sure that
any services which need access run as a user in that group.

This can co-exist with uaccess tags since those use ACLs not classic Unix
permissions.

Regards,

Hans



2024-05-06 14:02:55

by Hans de Goede

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi Sima,

On 5/6/24 3:38 PM, Daniel Vetter wrote:
> On Mon, May 06, 2024 at 02:05:12PM +0200, Maxime Ripard wrote:
>> Hi,
>>
>> On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote:
>>> Hi dma-buf maintainers, et.al.,
>>>
>>> Various people have been working on making complex/MIPI cameras work OOTB
>>> with mainline Linux kernels and an opensource userspace stack.
>>>
>>> The generic solution adds a software ISP (for Debayering and 3A) to
>>> libcamera. Libcamera's API guarantees that buffers handed to applications
>>> using it are dma-bufs so that these can be passed to e.g. a video encoder.
>>>
>>> In order to meet this API guarantee the libcamera software ISP allocates
>>> dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
>>> the Fedora COPR repo for the PoC of this:
>>> https://hansdegoede.dreamwidth.org/28153.html
>>
>> For the record, we're also considering using them for ARM KMS devices,
>> so it would be better if the solution wasn't only considering v4l2
>> devices.
>>
>>> I have added a simple udev rule to give physically present users access
>>> to the dma_heap-s:
>>>
>>> KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
>>>
>>> (and on Rasperry Pi devices any users in the video group get access)
>>>
>>> This was just a quick fix for the PoC. Now that we are ready to move out
>>> of the PoC phase and start actually integrating this into distributions
>>> the question becomes if this is an acceptable solution; or if we need some
>>> other way to deal with this ?
>>>
>>> Specifically the question is if this will have any negative security
>>> implications? I can certainly see this being used to do some sort of
>>> denial of service attack on the system (1). This is especially true for
>>> the cma heap which generally speaking is a limited resource.
>>
>> There's plenty of other ways to exhaust CMA, like allocating too much
>> KMS or v4l2 buffers. I'm not sure we should consider dma-heaps
>> differently than those if it's part of our threat model.
>
> So generally for an arm soc where your display needs cma, your render node
> doesn't. And user applications only have access to the later, while only
> the compositor gets a kms fd through logind. At least in drm aside from
> vc4 there's really no render driver that just gives you access to cma and
> allows you to exhaust that, you need to be a compositor with drm master
> access to the display.
>
> Which means we're mostly protected against bad applications, and that's
> not a threat the "user physically sits in front of the machine accounts
> for", and which giving cma access to everyone would open up. And with
> flathub/snaps/... this is very much an issue.

I agree that bad applications are an issue, but not for the flathub / snaps
case. Flatpacks / snaps run sandboxed and don't have access to a full /dev
so those should not be able to open /dev/dma_heap/* independent of
the ACLs on /dev/dma_heap/*. The plan is for cameras using the
libcamera software ISP to always be accessed through pipewire and
the camera portal, so in this case pipewere is taking the place of
the compositor in your kms vs render node example.

So this reduces the problem to bad apps packaged by regular distributions
and if any of those misbehave the distros should fix that.

So I think that for the denial of service side allowing physical
present users (but not sandboxed apps running as those users) to
access /dev/dma_heap/* should be ok.

My bigger worry is if dma_heap (u)dma-bufs can be abused in other
ways then causing a denial of service.

I guess that the answer there is that causing other security issues
should not be possible ?

Regards,

Hans


2024-05-06 19:07:40

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 06, 2024 at 02:05:12PM +0200, Maxime Ripard wrote:
> Hi,
>
> On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote:
> > Hi dma-buf maintainers, et.al.,
> >
> > Various people have been working on making complex/MIPI cameras work OOTB
> > with mainline Linux kernels and an opensource userspace stack.
> >
> > The generic solution adds a software ISP (for Debayering and 3A) to
> > libcamera. Libcamera's API guarantees that buffers handed to applications
> > using it are dma-bufs so that these can be passed to e.g. a video encoder.
> >
> > In order to meet this API guarantee the libcamera software ISP allocates
> > dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
> > the Fedora COPR repo for the PoC of this:
> > https://hansdegoede.dreamwidth.org/28153.html
>
> For the record, we're also considering using them for ARM KMS devices,
> so it would be better if the solution wasn't only considering v4l2
> devices.
>
> > I have added a simple udev rule to give physically present users access
> > to the dma_heap-s:
> >
> > KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
> >
> > (and on Rasperry Pi devices any users in the video group get access)
> >
> > This was just a quick fix for the PoC. Now that we are ready to move out
> > of the PoC phase and start actually integrating this into distributions
> > the question becomes if this is an acceptable solution; or if we need some
> > other way to deal with this ?
> >
> > Specifically the question is if this will have any negative security
> > implications? I can certainly see this being used to do some sort of
> > denial of service attack on the system (1). This is especially true for
> > the cma heap which generally speaking is a limited resource.
>
> There's plenty of other ways to exhaust CMA, like allocating too much
> KMS or v4l2 buffers. I'm not sure we should consider dma-heaps
> differently than those if it's part of our threat model.

So generally for an arm soc where your display needs cma, your render node
doesn't. And user applications only have access to the later, while only
the compositor gets a kms fd through logind. At least in drm aside from
vc4 there's really no render driver that just gives you access to cma and
allows you to exhaust that, you need to be a compositor with drm master
access to the display.

Which means we're mostly protected against bad applications, and that's
not a threat the "user physically sits in front of the machine accounts
for", and which giving cma access to everyone would open up. And with
flathub/snaps/... this is very much an issue.

So you need more, either:

- cgroups limits on dma-buf and dma-buf heaps. This has been bikeshedded
for years and is just not really moving.

- An allocator service which checks whether you're allowed to allocate
these special buffers. Android does that through binder.

Probably also some way to nuke applications that refuse to release buffers
when they're no longer the right application. cgroups is a lot more
convenient for that.
-Sima

> > But devices tagged for uaccess are only opened up to users who are
> > physcially present behind the machine and those can just hit
> > the powerbutton, so I don't believe that any *on purpose* DOS is part of
> > the thread model.
>
> How would that work for headless devices?
>
> Maxime



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-07 12:32:06

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 06, 2024 at 04:01:42PM +0200, Hans de Goede wrote:
> Hi Sima,
>
> On 5/6/24 3:38 PM, Daniel Vetter wrote:
> > On Mon, May 06, 2024 at 02:05:12PM +0200, Maxime Ripard wrote:
> >> Hi,
> >>
> >> On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote:
> >>> Hi dma-buf maintainers, et.al.,
> >>>
> >>> Various people have been working on making complex/MIPI cameras work OOTB
> >>> with mainline Linux kernels and an opensource userspace stack.
> >>>
> >>> The generic solution adds a software ISP (for Debayering and 3A) to
> >>> libcamera. Libcamera's API guarantees that buffers handed to applications
> >>> using it are dma-bufs so that these can be passed to e.g. a video encoder.
> >>>
> >>> In order to meet this API guarantee the libcamera software ISP allocates
> >>> dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
> >>> the Fedora COPR repo for the PoC of this:
> >>> https://hansdegoede.dreamwidth.org/28153.html
> >>
> >> For the record, we're also considering using them for ARM KMS devices,
> >> so it would be better if the solution wasn't only considering v4l2
> >> devices.
> >>
> >>> I have added a simple udev rule to give physically present users access
> >>> to the dma_heap-s:
> >>>
> >>> KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
> >>>
> >>> (and on Rasperry Pi devices any users in the video group get access)
> >>>
> >>> This was just a quick fix for the PoC. Now that we are ready to move out
> >>> of the PoC phase and start actually integrating this into distributions
> >>> the question becomes if this is an acceptable solution; or if we need some
> >>> other way to deal with this ?
> >>>
> >>> Specifically the question is if this will have any negative security
> >>> implications? I can certainly see this being used to do some sort of
> >>> denial of service attack on the system (1). This is especially true for
> >>> the cma heap which generally speaking is a limited resource.
> >>
> >> There's plenty of other ways to exhaust CMA, like allocating too much
> >> KMS or v4l2 buffers. I'm not sure we should consider dma-heaps
> >> differently than those if it's part of our threat model.
> >
> > So generally for an arm soc where your display needs cma, your render node
> > doesn't. And user applications only have access to the later, while only
> > the compositor gets a kms fd through logind. At least in drm aside from
> > vc4 there's really no render driver that just gives you access to cma and
> > allows you to exhaust that, you need to be a compositor with drm master
> > access to the display.
> >
> > Which means we're mostly protected against bad applications, and that's
> > not a threat the "user physically sits in front of the machine accounts
> > for", and which giving cma access to everyone would open up. And with
> > flathub/snaps/... this is very much an issue.
>
> I agree that bad applications are an issue, but not for the flathub / snaps
> case. Flatpacks / snaps run sandboxed and don't have access to a full /dev
> so those should not be able to open /dev/dma_heap/* independent of
> the ACLs on /dev/dma_heap/*. The plan is for cameras using the
> libcamera software ISP to always be accessed through pipewire and
> the camera portal, so in this case pipewere is taking the place of
> the compositor in your kms vs render node example.

Yeah essentially if you clarify to "set the permissions such that pipewire
can do allocations", then I think that makes sense. And is at the same
level as e.g. drm kms giving compsitors (but _only_ compositors) special
access rights.

> So this reduces the problem to bad apps packaged by regular distributions
> and if any of those misbehave the distros should fix that.
>
> So I think that for the denial of service side allowing physical
> present users (but not sandboxed apps running as those users) to
> access /dev/dma_heap/* should be ok.
>
> My bigger worry is if dma_heap (u)dma-bufs can be abused in other
> ways then causing a denial of service.
>
> I guess that the answer there is that causing other security issues
> should not be possible ?

Well pinned memory exhaustion is a very useful tool to make all kinds of
other kernel issues exploitable. Like if you have that you can weaponize
all kinds of kmalloc error paths (and since it's untracked memory the oom
killer will not get you of these issuees).

I think for the pipewire based desktop it'd be best if you only allow
pipewire to get at an fd for allocating from dma-heaps, kinda like logind
furnishes the kms master fd ... Still has the issue that you can't nuke
these buffers, but that's for another day. But at least from a "limit
attack surface" design pov I think this would be better than just handing
out access to the current user outright. But that's also not the worst
option I guess, as long as snaps/flatpacks only go through the pipewire
service.
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-07 13:40:20

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 06, 2024 at 01:49:17PM +0200, Hans de Goede wrote:
> Hi dma-buf maintainers, et.al.,
>
> Various people have been working on making complex/MIPI cameras work OOTB
> with mainline Linux kernels and an opensource userspace stack.
>
> The generic solution adds a software ISP (for Debayering and 3A) to
> libcamera. Libcamera's API guarantees that buffers handed to applications
> using it are dma-bufs so that these can be passed to e.g. a video encoder.
>
> In order to meet this API guarantee the libcamera software ISP allocates
> dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
> the Fedora COPR repo for the PoC of this:
> https://hansdegoede.dreamwidth.org/28153.html

Is there any reason for allocating DMA buffers for libcamera through
/dev/dma_heap/ rather than allocating them via corresponding media
device and then giving them away to DRM / VPU / etc?

At least this should solve the permission usecase: if the app can open
camera device, it can allocate a buffer.

> I have added a simple udev rule to give physically present users access
> to the dma_heap-s:
>
> KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
>
> (and on Rasperry Pi devices any users in the video group get access)
>
> This was just a quick fix for the PoC. Now that we are ready to move out
> of the PoC phase and start actually integrating this into distributions
> the question becomes if this is an acceptable solution; or if we need some
> other way to deal with this ?
>
> Specifically the question is if this will have any negative security
> implications? I can certainly see this being used to do some sort of
> denial of service attack on the system (1). This is especially true for
> the cma heap which generally speaking is a limited resource.
>
> But devices tagged for uaccess are only opened up to users who are
> physcially present behind the machine and those can just hit
> the powerbutton, so I don't believe that any *on purpose* DOS is part of
> the thread model. Any accidental DOS would be a userspace stack bug.
>
> Do you foresee any other negative security implications from allowing
> physically present non root users to create (u)dma-bufs ?
>
> Regards,
>
> Hans
>
>
> 1) There are some limits in drivers/dma-buf/udmabuf.c and distributions
> could narrow these.
>
>

--
With best wishes
Dmitry

2024-05-07 13:41:41

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 06, 2024 at 04:01:42PM +0200, Hans de Goede wrote:
> Hi Sima,
>
> On 5/6/24 3:38 PM, Daniel Vetter wrote:
> > On Mon, May 06, 2024 at 02:05:12PM +0200, Maxime Ripard wrote:
> >> Hi,
> >>
> >> On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote:
> >>> Hi dma-buf maintainers, et.al.,
> >>>
> >>> Various people have been working on making complex/MIPI cameras work OOTB
> >>> with mainline Linux kernels and an opensource userspace stack.
> >>>
> >>> The generic solution adds a software ISP (for Debayering and 3A) to
> >>> libcamera. Libcamera's API guarantees that buffers handed to applications
> >>> using it are dma-bufs so that these can be passed to e.g. a video encoder.
> >>>
> >>> In order to meet this API guarantee the libcamera software ISP allocates
> >>> dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
> >>> the Fedora COPR repo for the PoC of this:
> >>> https://hansdegoede.dreamwidth.org/28153.html
> >>
> >> For the record, we're also considering using them for ARM KMS devices,
> >> so it would be better if the solution wasn't only considering v4l2
> >> devices.
> >>
> >>> I have added a simple udev rule to give physically present users access
> >>> to the dma_heap-s:
> >>>
> >>> KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
> >>>
> >>> (and on Rasperry Pi devices any users in the video group get access)
> >>>
> >>> This was just a quick fix for the PoC. Now that we are ready to move out
> >>> of the PoC phase and start actually integrating this into distributions
> >>> the question becomes if this is an acceptable solution; or if we need some
> >>> other way to deal with this ?
> >>>
> >>> Specifically the question is if this will have any negative security
> >>> implications? I can certainly see this being used to do some sort of
> >>> denial of service attack on the system (1). This is especially true for
> >>> the cma heap which generally speaking is a limited resource.
> >>
> >> There's plenty of other ways to exhaust CMA, like allocating too much
> >> KMS or v4l2 buffers. I'm not sure we should consider dma-heaps
> >> differently than those if it's part of our threat model.
> >
> > So generally for an arm soc where your display needs cma, your render node
> > doesn't. And user applications only have access to the later, while only
> > the compositor gets a kms fd through logind. At least in drm aside from
> > vc4 there's really no render driver that just gives you access to cma and
> > allows you to exhaust that, you need to be a compositor with drm master
> > access to the display.
> >
> > Which means we're mostly protected against bad applications, and that's
> > not a threat the "user physically sits in front of the machine accounts
> > for", and which giving cma access to everyone would open up. And with
> > flathub/snaps/... this is very much an issue.
>
> I agree that bad applications are an issue, but not for the flathub / snaps
> case. Flatpacks / snaps run sandboxed and don't have access to a full /dev
> so those should not be able to open /dev/dma_heap/* independent of
> the ACLs on /dev/dma_heap/*. The plan is for cameras using the
> libcamera software ISP to always be accessed through pipewire and
> the camera portal, so in this case pipewere is taking the place of
> the compositor in your kms vs render node example.
>
> So this reduces the problem to bad apps packaged by regular distributions
> and if any of those misbehave the distros should fix that.
>
> So I think that for the denial of service side allowing physical
> present users (but not sandboxed apps running as those users) to
> access /dev/dma_heap/* should be ok.

What about an app built by the user? The machines can still be
multi-seat or have multiple users.

> My bigger worry is if dma_heap (u)dma-bufs can be abused in other
> ways then causing a denial of service.
>
> I guess that the answer there is that causing other security issues
> should not be possible ?
>
> Regards,
>
> Hans
>

--
With best wishes
Dmitry

2024-05-07 14:34:39

by Hans de Goede

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi Dmitry,

On 5/7/24 3:32 PM, Dmitry Baryshkov wrote:
> On Mon, May 06, 2024 at 01:49:17PM +0200, Hans de Goede wrote:
>> Hi dma-buf maintainers, et.al.,
>>
>> Various people have been working on making complex/MIPI cameras work OOTB
>> with mainline Linux kernels and an opensource userspace stack.
>>
>> The generic solution adds a software ISP (for Debayering and 3A) to
>> libcamera. Libcamera's API guarantees that buffers handed to applications
>> using it are dma-bufs so that these can be passed to e.g. a video encoder.
>>
>> In order to meet this API guarantee the libcamera software ISP allocates
>> dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
>> the Fedora COPR repo for the PoC of this:
>> https://hansdegoede.dreamwidth.org/28153.html
>
> Is there any reason for allocating DMA buffers for libcamera through
> /dev/dma_heap/ rather than allocating them via corresponding media
> device and then giving them away to DRM / VPU / etc?
>
> At least this should solve the permission usecase: if the app can open
> camera device, it can allocate a buffer.

This is with a software ISP, the input buffers with raw bayer data
come from a v4l2 device, but the output buffers with the processed
data are purely userspace managed in this case.

Regards,

Hans


2024-05-07 15:16:05

by Bryan O'Donoghue

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> providing data to VPU or DRM, then you should be able to get the buffer
> from the data-consuming device.

Because we don't necessarily know what the consuming device is, if any.

Could be VPU, could be Zoom/Hangouts via pipewire, could for argument
sake be GPU or DSP.

Also if we introduce a dependency on another device to allocate the
output buffers - say always taking the output buffer from the GPU, then
we've added another dependency which is more difficult to guarantee
across different arches.

---
bod

2024-05-07 15:19:59

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue
<[email protected]> wrote:
>
> On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > providing data to VPU or DRM, then you should be able to get the buffer
> > from the data-consuming device.
>
> Because we don't necessarily know what the consuming device is, if any.
>
> Could be VPU, could be Zoom/Hangouts via pipewire, could for argument
> sake be GPU or DSP.
>
> Also if we introduce a dependency on another device to allocate the
> output buffers - say always taking the output buffer from the GPU, then
> we've added another dependency which is more difficult to guarantee
> across different arches.

Yes. And it should be expected. It's a consumer who knows the
restrictions on the buffer. As I wrote, Zoom/Hangouts should not
require a DMA buffer at all. Applications should be able to allocate
the buffer out of the generic memory. GPUs might also have different
requirements. Consider GPUs with VRAM. It might be beneficial to
allocate a buffer out of VRAM rather than generic DMA mem.

--
With best wishes
Dmitry

2024-05-07 15:43:51

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, May 07, 2024 at 04:34:24PM +0200, Hans de Goede wrote:
> Hi Dmitry,
>
> On 5/7/24 3:32 PM, Dmitry Baryshkov wrote:
> > On Mon, May 06, 2024 at 01:49:17PM +0200, Hans de Goede wrote:
> >> Hi dma-buf maintainers, et.al.,
> >>
> >> Various people have been working on making complex/MIPI cameras work OOTB
> >> with mainline Linux kernels and an opensource userspace stack.
> >>
> >> The generic solution adds a software ISP (for Debayering and 3A) to
> >> libcamera. Libcamera's API guarantees that buffers handed to applications
> >> using it are dma-bufs so that these can be passed to e.g. a video encoder.
> >>
> >> In order to meet this API guarantee the libcamera software ISP allocates
> >> dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
> >> the Fedora COPR repo for the PoC of this:
> >> https://hansdegoede.dreamwidth.org/28153.html
> >
> > Is there any reason for allocating DMA buffers for libcamera through
> > /dev/dma_heap/ rather than allocating them via corresponding media
> > device and then giving them away to DRM / VPU / etc?
> >
> > At least this should solve the permission usecase: if the app can open
> > camera device, it can allocate a buffer.
>
> This is with a software ISP, the input buffers with raw bayer data
> come from a v4l2 device, but the output buffers with the processed
> data are purely userspace managed in this case.

Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
providing data to VPU or DRM, then you should be able to get the buffer
from the data-consuming device. If the data is further processed by
a userspace app, then it should not require DMA memory at all.

My main concern is that dma-heaps is both too generic and too limiting
for the generic library implementation.

--
With best wishes
Dmitry

2024-05-07 18:38:16

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, May 07, 2024 at 07:36:59PM +0200, Daniel Vetter wrote:
> On Tue, May 07, 2024 at 04:15:05PM +0100, Bryan O'Donoghue wrote:
> > On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > > providing data to VPU or DRM, then you should be able to get the buffer
> > > from the data-consuming device.
> >
> > Because we don't necessarily know what the consuming device is, if any.
>
> Well ... that's an entirely different issue. And it's unsolved.
>
> Currently the approach is to allocate where the constraints are usually
> most severe (like display, if you need that, or the camera module for
> input) and then just pray the stack works out without too much copying.
> All userspace (whether the generic glue or the userspace driver depends a
> bit upon the exact api) does need to have a copy fallback for these
> sharing cases, ideally with the copying accelerated by hw.
>
> If you try to solve this by just preemptive allocating everything as cma
> buffers, then you'll make the situation substantially worse (because now
> you're wasting tons of cma memory where you might not even need it).
> And without really solving the problem, since for some gpus that memory
> might be unusable (because you cannot scan that out on any discrete gpu,
> and sometimes not even on an integrated one).

I think we have a general agreement that the proposed solution is a
stop-gap measure for an unsolved issue.

Note that libcamera is already designed that way. The API is designed to
import buffers, using dma-buf file handles. If an application has a way
to allocate dma-buf instances through another means (from the display or
from a video encoder for instance), it should do so, and use those
buffers with libcamera.

For applications that don't have an easy way to get hold of dma-buf
instances, we have a buffer allocator helper as a side component. That
allocator uses the underlying camera capture device, and allocates
buffers from the V4L2 video device. It's only on platforms where we have
no hardware camera processing (or, rather, platforms where the hardware
vendors doesn't give us access to the camera hardware, such as recent
Intel SoCs, or Qualcomm SoCs used in ARM laptops) that we need to
allocate memory elsewhere.

In the long run, I want a centralized memory allocator accessible by
userspace applications (something similar in purpose to gralloc on
Android), and I want to get rid of buffer allocation in libcamera (and
even in V4L2, in the even longer term). That's the long run.

Shorter term, we have a problem to solve, and the best option we have
found so far is to rely on dma-buf heaps as a backend for the frame
buffer allocatro helper in libcamera for the use case described above.
This won't work in 100% of the cases, clearly. It's a stop-gap measure
until we can do better.

> > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument sake
> > be GPU or DSP.
> >
> > Also if we introduce a dependency on another device to allocate the output
> > buffers - say always taking the output buffer from the GPU, then we've added
> > another dependency which is more difficult to guarantee across different
> > arches.

--
Regards,

Laurent Pinchart

2024-05-07 18:41:11

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, May 07, 2024 at 06:19:18PM +0300, Dmitry Baryshkov wrote:
> On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue wrote:
> > On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > > providing data to VPU or DRM, then you should be able to get the buffer
> > > from the data-consuming device.
> >
> > Because we don't necessarily know what the consuming device is, if any.
> >
> > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument
> > sake be GPU or DSP.
> >
> > Also if we introduce a dependency on another device to allocate the
> > output buffers - say always taking the output buffer from the GPU, then
> > we've added another dependency which is more difficult to guarantee
> > across different arches.
>
> Yes. And it should be expected. It's a consumer who knows the
> restrictions on the buffer. As I wrote, Zoom/Hangouts should not
> require a DMA buffer at all.

Why not ? If you want to capture to a buffer that you then compose on
the screen without copying data, dma-buf is the way to go. That's the
Linux solution for buffer sharing.

> Applications should be able to allocate
> the buffer out of the generic memory.

If applications really want to copy data and degrade performance, they
are free to shoot themselves in the foot of course. Applications (or
compositors) need to support copying as a fallback in the worst case,
but all components should at least aim for the zero-copy case.

> GPUs might also have different
> requirements. Consider GPUs with VRAM. It might be beneficial to
> allocate a buffer out of VRAM rather than generic DMA mem.

Absolutely. For that we need a centralized device memory allocator in
userspace. An effort was started by James Jones in 2016, see [1]. It has
unfortunately stalled. If I didn't have a camera framework to develop, I
would try to tackle that issue :-)

[1] https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf

--
Regards,

Laurent Pinchart

2024-05-07 19:00:19

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 06, 2024 at 03:38:24PM +0200, Daniel Vetter wrote:
> On Mon, May 06, 2024 at 02:05:12PM +0200, Maxime Ripard wrote:
> > On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote:
> > > Hi dma-buf maintainers, et.al.,
> > >
> > > Various people have been working on making complex/MIPI cameras work OOTB
> > > with mainline Linux kernels and an opensource userspace stack.
> > >
> > > The generic solution adds a software ISP (for Debayering and 3A) to
> > > libcamera. Libcamera's API guarantees that buffers handed to applications
> > > using it are dma-bufs so that these can be passed to e.g. a video encoder.
> > >
> > > In order to meet this API guarantee the libcamera software ISP allocates
> > > dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
> > > the Fedora COPR repo for the PoC of this:
> > > https://hansdegoede.dreamwidth.org/28153.html
> >
> > For the record, we're also considering using them for ARM KMS devices,
> > so it would be better if the solution wasn't only considering v4l2
> > devices.
> >
> > > I have added a simple udev rule to give physically present users access
> > > to the dma_heap-s:
> > >
> > > KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
> > >
> > > (and on Rasperry Pi devices any users in the video group get access)
> > >
> > > This was just a quick fix for the PoC. Now that we are ready to move out
> > > of the PoC phase and start actually integrating this into distributions
> > > the question becomes if this is an acceptable solution; or if we need some
> > > other way to deal with this ?
> > >
> > > Specifically the question is if this will have any negative security
> > > implications? I can certainly see this being used to do some sort of
> > > denial of service attack on the system (1). This is especially true for
> > > the cma heap which generally speaking is a limited resource.
> >
> > There's plenty of other ways to exhaust CMA, like allocating too much
> > KMS or v4l2 buffers. I'm not sure we should consider dma-heaps
> > differently than those if it's part of our threat model.
>
> So generally for an arm soc where your display needs cma, your render node
> doesn't. And user applications only have access to the later, while only
> the compositor gets a kms fd through logind. At least in drm aside from
> vc4 there's really no render driver that just gives you access to cma and
> allows you to exhaust that, you need to be a compositor with drm master
> access to the display.
>
> Which means we're mostly protected against bad applications, and that's
> not a threat the "user physically sits in front of the machine accounts
> for", and which giving cma access to everyone would open up. And with
> flathub/snaps/... this is very much an issue.
>
> So you need more, either:
>
> - cgroups limits on dma-buf and dma-buf heaps. This has been bikeshedded
> for years and is just not really moving.
>
> - An allocator service which checks whether you're allowed to allocate
> these special buffers. Android does that through binder.

I would split the latter into a centralized frame buffer allocator
library for userspace (James Jones' Unix device memory allocator comes
to mind), providing dma-buf instances using whatever backend is
available and suitable (DMA heaps would likely be one of them), and a
safe way to make this allocator available to applications. Exposing it
through some system services could be useful, but with proper tracking
and accounting of memory allocated through DMA heaps (and DRM, and V4L2)
we could possibly do with just a library.

What I really want is to move memory allocation for devices to a
separate component, and make everything else a consumer of those
buffers.

> Probably also some way to nuke applications that refuse to release buffers
> when they're no longer the right application. cgroups is a lot more
> convenient for that.
>
> > > But devices tagged for uaccess are only opened up to users who are
> > > physcially present behind the machine and those can just hit
> > > the powerbutton, so I don't believe that any *on purpose* DOS is part of
> > > the thread model.
> >
> > How would that work for headless devices?

--
Regards,

Laurent Pinchart

2024-05-07 20:07:38

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, 7 May 2024 at 21:40, Laurent Pinchart
<[email protected]> wrote:
>
> On Tue, May 07, 2024 at 06:19:18PM +0300, Dmitry Baryshkov wrote:
> > On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue wrote:
> > > On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > > > providing data to VPU or DRM, then you should be able to get the buffer
> > > > from the data-consuming device.
> > >
> > > Because we don't necessarily know what the consuming device is, if any.
> > >
> > > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument
> > > sake be GPU or DSP.
> > >
> > > Also if we introduce a dependency on another device to allocate the
> > > output buffers - say always taking the output buffer from the GPU, then
> > > we've added another dependency which is more difficult to guarantee
> > > across different arches.
> >
> > Yes. And it should be expected. It's a consumer who knows the
> > restrictions on the buffer. As I wrote, Zoom/Hangouts should not
> > require a DMA buffer at all.
>
> Why not ? If you want to capture to a buffer that you then compose on
> the screen without copying data, dma-buf is the way to go. That's the
> Linux solution for buffer sharing.

Yes. But it should be allocated by the DRM driver. As Sima wrote,
there is no guarantee that the buffer allocated from dma-heaps is
accessible to the GPU.

>
> > Applications should be able to allocate
> > the buffer out of the generic memory.
>
> If applications really want to copy data and degrade performance, they
> are free to shoot themselves in the foot of course. Applications (or
> compositors) need to support copying as a fallback in the worst case,
> but all components should at least aim for the zero-copy case.

I'd say that they should aim for the optimal case. It might include
both zero-copying access from another DMA master or simple software
processing of some kind.

> > GPUs might also have different
> > requirements. Consider GPUs with VRAM. It might be beneficial to
> > allocate a buffer out of VRAM rather than generic DMA mem.
>
> Absolutely. For that we need a centralized device memory allocator in
> userspace. An effort was started by James Jones in 2016, see [1]. It has
> unfortunately stalled. If I didn't have a camera framework to develop, I
> would try to tackle that issue :-)

I'll review the talk. However the fact that the effort has stalled
most likely means that 'one fits them all' approach didn't really fly
well. We have too many usecases.

>
> [1] https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf

--
With best wishes
Dmitry

2024-05-07 20:09:53

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi,

Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> Shorter term, we have a problem to solve, and the best option we have
> found so far is to rely on dma-buf heaps as a backend for the frame
> buffer allocatro helper in libcamera for the use case described above.
> This won't work in 100% of the cases, clearly. It's a stop-gap measure
> until we can do better.

Considering the security concerned raised on this thread with dmabuf heap
allocation not be restricted by quotas, you'd get what you want quickly with
memfd + udmabuf instead (which is accounted already).

It was raised that distro don't enable udmabuf, but as stated there by Hans, in
any cases distro needs to take action to make the softISP works. This
alternative is easy and does not interfere in anyway with your future plan or
the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
safer memfd+udmabuf for the distro with security concerns.

And for the long term plan, we can certainly get closer by fixing that issue
with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
find common set of helpers to fix these exporters.

regards,
Nicolas

2024-05-07 20:15:51

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, May 07, 2024 at 10:59:42PM +0300, Dmitry Baryshkov wrote:
> On Tue, 7 May 2024 at 21:40, Laurent Pinchart wrote:
> > On Tue, May 07, 2024 at 06:19:18PM +0300, Dmitry Baryshkov wrote:
> > > On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue wrote:
> > > > On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > > > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > > > > providing data to VPU or DRM, then you should be able to get the buffer
> > > > > from the data-consuming device.
> > > >
> > > > Because we don't necessarily know what the consuming device is, if any.
> > > >
> > > > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument
> > > > sake be GPU or DSP.
> > > >
> > > > Also if we introduce a dependency on another device to allocate the
> > > > output buffers - say always taking the output buffer from the GPU, then
> > > > we've added another dependency which is more difficult to guarantee
> > > > across different arches.
> > >
> > > Yes. And it should be expected. It's a consumer who knows the
> > > restrictions on the buffer. As I wrote, Zoom/Hangouts should not
> > > require a DMA buffer at all.
> >
> > Why not ? If you want to capture to a buffer that you then compose on
> > the screen without copying data, dma-buf is the way to go. That's the
> > Linux solution for buffer sharing.
>
> Yes. But it should be allocated by the DRM driver. As Sima wrote,
> there is no guarantee that the buffer allocated from dma-heaps is
> accessible to the GPU.

No disagreement there. From a libcamera point of view, we want to import
buffers allocated externally. It's for use cases where applications
can't provide dma buf instances easily that we need to allocate them
through the libcamera buffer allocator helper. That helper has to a)
provide dma buf fds, and b) make a best effort to allocate buffers that
will have a decent chance of being usable by other devices. We're open
to exploring other solutions than dma heaps, although I wonder what the
dma heaps are for if nobody enables them :-)

> > > Applications should be able to allocate
> > > the buffer out of the generic memory.
> >
> > If applications really want to copy data and degrade performance, they
> > are free to shoot themselves in the foot of course. Applications (or
> > compositors) need to support copying as a fallback in the worst case,
> > but all components should at least aim for the zero-copy case.
>
> I'd say that they should aim for the optimal case. It might include
> both zero-copying access from another DMA master or simple software
> processing of some kind.
>
> > > GPUs might also have different
> > > requirements. Consider GPUs with VRAM. It might be beneficial to
> > > allocate a buffer out of VRAM rather than generic DMA mem.
> >
> > Absolutely. For that we need a centralized device memory allocator in
> > userspace. An effort was started by James Jones in 2016, see [1]. It has
> > unfortunately stalled. If I didn't have a camera framework to develop, I
> > would try to tackle that issue :-)
>
> I'll review the talk. However the fact that the effort has stalled
> most likely means that 'one fits them all' approach didn't really fly
> well. We have too many usecases.
>
> > [1] https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf

--
Regards,

Laurent Pinchart

2024-05-07 23:02:21

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, May 07, 2024 at 04:15:05PM +0100, Bryan O'Donoghue wrote:
> On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > providing data to VPU or DRM, then you should be able to get the buffer
> > from the data-consuming device.
>
> Because we don't necessarily know what the consuming device is, if any.

Well ... that's an entirely different issue. And it's unsolved.

Currently the approach is to allocate where the constraints are usually
most severe (like display, if you need that, or the camera module for
input) and then just pray the stack works out without too much copying.
All userspace (whether the generic glue or the userspace driver depends a
bit upon the exact api) does need to have a copy fallback for these
sharing cases, ideally with the copying accelerated by hw.

If you try to solve this by just preemptive allocating everything as cma
buffers, then you'll make the situation substantially worse (because now
you're wasting tons of cma memory where you might not even need it).
And without really solving the problem, since for some gpus that memory
might be unusable (because you cannot scan that out on any discrete gpu,
and sometimes not even on an integrated one).
-Sima

> Could be VPU, could be Zoom/Hangouts via pipewire, could for argument sake
> be GPU or DSP.
>
> Also if we introduce a dependency on another device to allocate the output
> buffers - say always taking the output buffer from the GPU, then we've added
> another dependency which is more difficult to guarantee across different
> arches.
>
> ---
> bod

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-08 08:41:48

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, May 07, 2024 at 10:59:42PM +0300, Dmitry Baryshkov wrote:
> On Tue, 7 May 2024 at 21:40, Laurent Pinchart
> <[email protected]> wrote:
> >
> > On Tue, May 07, 2024 at 06:19:18PM +0300, Dmitry Baryshkov wrote:
> > > On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue wrote:
> > > > On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > > > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > > > > providing data to VPU or DRM, then you should be able to get the buffer
> > > > > from the data-consuming device.
> > > >
> > > > Because we don't necessarily know what the consuming device is, if any.
> > > >
> > > > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument
> > > > sake be GPU or DSP.
> > > >
> > > > Also if we introduce a dependency on another device to allocate the
> > > > output buffers - say always taking the output buffer from the GPU, then
> > > > we've added another dependency which is more difficult to guarantee
> > > > across different arches.
> > >
> > > Yes. And it should be expected. It's a consumer who knows the
> > > restrictions on the buffer. As I wrote, Zoom/Hangouts should not
> > > require a DMA buffer at all.
> >
> > Why not ? If you want to capture to a buffer that you then compose on
> > the screen without copying data, dma-buf is the way to go. That's the
> > Linux solution for buffer sharing.
>
> Yes. But it should be allocated by the DRM driver. As Sima wrote,
> there is no guarantee that the buffer allocated from dma-heaps is
> accessible to the GPU.
>
> >
> > > Applications should be able to allocate
> > > the buffer out of the generic memory.
> >
> > If applications really want to copy data and degrade performance, they
> > are free to shoot themselves in the foot of course. Applications (or
> > compositors) need to support copying as a fallback in the worst case,
> > but all components should at least aim for the zero-copy case.
>
> I'd say that they should aim for the optimal case. It might include
> both zero-copying access from another DMA master or simple software
> processing of some kind.
>
> > > GPUs might also have different
> > > requirements. Consider GPUs with VRAM. It might be beneficial to
> > > allocate a buffer out of VRAM rather than generic DMA mem.
> >
> > Absolutely. For that we need a centralized device memory allocator in
> > userspace. An effort was started by James Jones in 2016, see [1]. It has
> > unfortunately stalled. If I didn't have a camera framework to develop, I
> > would try to tackle that issue :-)
>
> I'll review the talk. However the fact that the effort has stalled
> most likely means that 'one fits them all' approach didn't really fly
> well. We have too many usecases.

I think there's two reasons:

- It's a really hard problem with many aspects. Where you need to allocate
the buffer is just one of the myriad of issues a common allocator needs
to solve.

- Every linux-based os has their own solution for these, and the one that
suffers most has an entirely different one from everyone else: Android
uses binder services to allow apps to make these allocations, keep track
of them and make sure there's no abuse. And if there is, it can just
nuke the app.

Cheers, Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-08 11:02:24

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> Hi,
>
> Le mardi 07 mai 2024 ? 21:36 +0300, Laurent Pinchart a ?crit?:
> > Shorter term, we have a problem to solve, and the best option we have
> > found so far is to rely on dma-buf heaps as a backend for the frame
> > buffer allocatro helper in libcamera for the use case described above.
> > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > until we can do better.
>
> Considering the security concerned raised on this thread with dmabuf heap
> allocation not be restricted by quotas, you'd get what you want quickly with
> memfd + udmabuf instead (which is accounted already).
>
> It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> any cases distro needs to take action to make the softISP works. This
> alternative is easy and does not interfere in anyway with your future plan or
> the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> safer memfd+udmabuf for the distro with security concerns.
>
> And for the long term plan, we can certainly get closer by fixing that issue
> with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> find common set of helpers to fix these exporters.

Yeah if this is just for softisp, then memfd + udmabuf is also what I was
about to suggest. Not just as a stopgap, but as the real official thing.

udmabuf does kinda allow you to pin memory, but we can easily fix that by
adding the right accounting and then either let mlock rlimits or cgroups
kernel memory limits enforce good behavior.
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-08 11:57:15

by Daniel Stone

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi,

On Tue, 7 May 2024 at 12:15, Daniel Vetter <[email protected]> wrote:
> On Mon, May 06, 2024 at 04:01:42PM +0200, Hans de Goede wrote:
> > On 5/6/24 3:38 PM, Daniel Vetter wrote:
> > I agree that bad applications are an issue, but not for the flathub / snaps
> > case. Flatpacks / snaps run sandboxed and don't have access to a full /dev
> > so those should not be able to open /dev/dma_heap/* independent of
> > the ACLs on /dev/dma_heap/*. The plan is for cameras using the
> > libcamera software ISP to always be accessed through pipewire and
> > the camera portal, so in this case pipewere is taking the place of
> > the compositor in your kms vs render node example.
>
> Yeah essentially if you clarify to "set the permissions such that pipewire
> can do allocations", then I think that makes sense. And is at the same
> level as e.g. drm kms giving compsitors (but _only_ compositors) special
> access rights.

That would have the unfortunate side effect of making sandboxed apps
less efficient on some platforms, since they wouldn't be able to do
direct scanout anymore ...

Cheers,
Daniel

2024-05-08 12:17:37

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Wed, May 08, 2024 at 06:46:53AM +0100, Daniel Stone wrote:
> Hi,
>
> On Tue, 7 May 2024 at 12:15, Daniel Vetter <[email protected]> wrote:
> > On Mon, May 06, 2024 at 04:01:42PM +0200, Hans de Goede wrote:
> > > On 5/6/24 3:38 PM, Daniel Vetter wrote:
> > > I agree that bad applications are an issue, but not for the flathub / snaps
> > > case. Flatpacks / snaps run sandboxed and don't have access to a full /dev
> > > so those should not be able to open /dev/dma_heap/* independent of
> > > the ACLs on /dev/dma_heap/*. The plan is for cameras using the
> > > libcamera software ISP to always be accessed through pipewire and
> > > the camera portal, so in this case pipewere is taking the place of
> > > the compositor in your kms vs render node example.
> >
> > Yeah essentially if you clarify to "set the permissions such that pipewire
> > can do allocations", then I think that makes sense. And is at the same
> > level as e.g. drm kms giving compsitors (but _only_ compositors) special
> > access rights.
>
> That would have the unfortunate side effect of making sandboxed apps
> less efficient on some platforms, since they wouldn't be able to do
> direct scanout anymore ...

I was assuming that everyone goes through pipewire, and ideally that is
the only one that can even get at these special chardev.

If pipewire is only for sandboxed apps then yeah this aint great :-/
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-08 12:39:32

by Daniel Stone

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Wed, 8 May 2024 at 09:33, Daniel Vetter <[email protected]> wrote:
> On Wed, May 08, 2024 at 06:46:53AM +0100, Daniel Stone wrote:
> > That would have the unfortunate side effect of making sandboxed apps
> > less efficient on some platforms, since they wouldn't be able to do
> > direct scanout anymore ...
>
> I was assuming that everyone goes through pipewire, and ideally that is
> the only one that can even get at these special chardev.
>
> If pipewire is only for sandboxed apps then yeah this aint great :-/

No, PipeWire is fine, I mean graphical apps.

Right now, if your platform requires CMA for display, then the app
needs access to the GPU render node and the display node too, in order
to allocate buffers which the compositor can scan out directly. If it
only has access to the render nodes and not the display node, it won't
be able to allocate correctly, so its content will need a composition
pass, i.e. performance penalty for sandboxing. But if it can allocate
correctly, then hey, it can exhaust CMA just like heaps can.

Personally I think we'd be better off just allowing access and
figuring out cgroups later. It's not like the OOM story is great
generally, and hey, you can get there with just render nodes ...

Cheers,
Daniel

2024-05-08 16:54:36

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Wed, May 08, 2024 at 09:38:33AM +0100, Daniel Stone wrote:
> On Wed, 8 May 2024 at 09:33, Daniel Vetter <[email protected]> wrote:
> > On Wed, May 08, 2024 at 06:46:53AM +0100, Daniel Stone wrote:
> > > That would have the unfortunate side effect of making sandboxed apps
> > > less efficient on some platforms, since they wouldn't be able to do
> > > direct scanout anymore ...
> >
> > I was assuming that everyone goes through pipewire, and ideally that is
> > the only one that can even get at these special chardev.
> >
> > If pipewire is only for sandboxed apps then yeah this aint great :-/
>
> No, PipeWire is fine, I mean graphical apps.
>
> Right now, if your platform requires CMA for display, then the app
> needs access to the GPU render node and the display node too, in order
> to allocate buffers which the compositor can scan out directly. If it
> only has access to the render nodes and not the display node, it won't
> be able to allocate correctly, so its content will need a composition
> pass, i.e. performance penalty for sandboxing. But if it can allocate
> correctly, then hey, it can exhaust CMA just like heaps can.
>
> Personally I think we'd be better off just allowing access and
> figuring out cgroups later. It's not like the OOM story is great
> generally, and hey, you can get there with just render nodes ...

Imo the right fix is to ask the compositor to allocate the buffers in this
case, and then maybe have some kind of revoke/purge behaviour on these
buffers. Compositor has an actual idea of who's a candidate for direct
scanout after all, not the app. Or well at least force migrate the memory
from cma to shmem.

If you only whack cgroups on this issue you're still stuck in the world
where either all apps together can ddos the display or no one can
realistically direct scanout.

So yeah on the display side the problem isn't solved either, but we knew
that already.
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-08 21:51:39

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > Shorter term, we have a problem to solve, and the best option we have
> > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > buffer allocatro helper in libcamera for the use case described above.
> > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > until we can do better.
> >
> > Considering the security concerned raised on this thread with dmabuf heap
> > allocation not be restricted by quotas, you'd get what you want quickly with
> > memfd + udmabuf instead (which is accounted already).
> >
> > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > any cases distro needs to take action to make the softISP works. This
> > alternative is easy and does not interfere in anyway with your future plan or
> > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > safer memfd+udmabuf for the distro with security concerns.
> >
> > And for the long term plan, we can certainly get closer by fixing that issue
> > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > find common set of helpers to fix these exporters.
>
> Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> about to suggest. Not just as a stopgap, but as the real official thing.

Long term I still want a centralized memory allocator, at which point
libcamera should stop allocating buffers at all.

> udmabuf does kinda allow you to pin memory, but we can easily fix that by
> adding the right accounting and then either let mlock rlimits or cgroups
> kernel memory limits enforce good behavior.

--
Regards,

Laurent Pinchart

2024-05-08 21:53:02

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Thu, May 09, 2024 at 12:51:08AM +0300, Laurent Pinchart wrote:
> On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > Shorter term, we have a problem to solve, and the best option we have
> > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > buffer allocatro helper in libcamera for the use case described above.
> > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > until we can do better.
> > >
> > > Considering the security concerned raised on this thread with dmabuf heap
> > > allocation not be restricted by quotas, you'd get what you want quickly with
> > > memfd + udmabuf instead (which is accounted already).
> > >
> > > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > > any cases distro needs to take action to make the softISP works. This
> > > alternative is easy and does not interfere in anyway with your future plan or
> > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > safer memfd+udmabuf for the distro with security concerns.
> > >
> > > And for the long term plan, we can certainly get closer by fixing that issue
> > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > find common set of helpers to fix these exporters.
> >
> > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > about to suggest. Not just as a stopgap, but as the real official thing.
>
> Long term I still want a centralized memory allocator, at which point
> libcamera should stop allocating buffers at all.

And to be clear, udmabuf could be fine for the time being. At least as
long as we don't find any shortcoming while testing it :-)

> > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > adding the right accounting and then either let mlock rlimits or cgroups
> > kernel memory limits enforce good behavior.

--
Regards,

Laurent Pinchart

2024-05-08 21:55:12

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Wed, May 08, 2024 at 10:39:58AM +0200, Daniel Vetter wrote:
> On Tue, May 07, 2024 at 10:59:42PM +0300, Dmitry Baryshkov wrote:
> > On Tue, 7 May 2024 at 21:40, Laurent Pinchart wrote:
> > > On Tue, May 07, 2024 at 06:19:18PM +0300, Dmitry Baryshkov wrote:
> > > > On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue wrote:
> > > > > On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > > > > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > > > > > providing data to VPU or DRM, then you should be able to get the buffer
> > > > > > from the data-consuming device.
> > > > >
> > > > > Because we don't necessarily know what the consuming device is, if any.
> > > > >
> > > > > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument
> > > > > sake be GPU or DSP.
> > > > >
> > > > > Also if we introduce a dependency on another device to allocate the
> > > > > output buffers - say always taking the output buffer from the GPU, then
> > > > > we've added another dependency which is more difficult to guarantee
> > > > > across different arches.
> > > >
> > > > Yes. And it should be expected. It's a consumer who knows the
> > > > restrictions on the buffer. As I wrote, Zoom/Hangouts should not
> > > > require a DMA buffer at all.
> > >
> > > Why not ? If you want to capture to a buffer that you then compose on
> > > the screen without copying data, dma-buf is the way to go. That's the
> > > Linux solution for buffer sharing.
> >
> > Yes. But it should be allocated by the DRM driver. As Sima wrote,
> > there is no guarantee that the buffer allocated from dma-heaps is
> > accessible to the GPU.
> >
> > >
> > > > Applications should be able to allocate
> > > > the buffer out of the generic memory.
> > >
> > > If applications really want to copy data and degrade performance, they
> > > are free to shoot themselves in the foot of course. Applications (or
> > > compositors) need to support copying as a fallback in the worst case,
> > > but all components should at least aim for the zero-copy case.
> >
> > I'd say that they should aim for the optimal case. It might include
> > both zero-copying access from another DMA master or simple software
> > processing of some kind.
> >
> > > > GPUs might also have different
> > > > requirements. Consider GPUs with VRAM. It might be beneficial to
> > > > allocate a buffer out of VRAM rather than generic DMA mem.
> > >
> > > Absolutely. For that we need a centralized device memory allocator in
> > > userspace. An effort was started by James Jones in 2016, see [1]. It has
> > > unfortunately stalled. If I didn't have a camera framework to develop, I
> > > would try to tackle that issue :-)
> >
> > I'll review the talk. However the fact that the effort has stalled
> > most likely means that 'one fits them all' approach didn't really fly
> > well. We have too many usecases.
>
> I think there's two reasons:
>
> - It's a really hard problem with many aspects. Where you need to allocate
> the buffer is just one of the myriad of issues a common allocator needs
> to solve.

The other large problem is picking up an optimal pixel format. I wonder
if that could be decoupled from the allocation. That could help moving
forward.

> - Every linux-based os has their own solution for these, and the one that
> suffers most has an entirely different one from everyone else: Android
> uses binder services to allow apps to make these allocations, keep track
> of them and make sure there's no abuse. And if there is, it can just
> nuke the app.

--
Regards,

Laurent Pinchart

2024-05-09 10:39:22

by Daniel Stone

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi,

On Wed, 8 May 2024 at 16:49, Daniel Vetter <[email protected]> wrote:
> On Wed, May 08, 2024 at 09:38:33AM +0100, Daniel Stone wrote:
> > Right now, if your platform requires CMA for display, then the app
> > needs access to the GPU render node and the display node too, in order
> > to allocate buffers which the compositor can scan out directly. If it
> > only has access to the render nodes and not the display node, it won't
> > be able to allocate correctly, so its content will need a composition
> > pass, i.e. performance penalty for sandboxing. But if it can allocate
> > correctly, then hey, it can exhaust CMA just like heaps can.
> >
> > Personally I think we'd be better off just allowing access and
> > figuring out cgroups later. It's not like the OOM story is great
> > generally, and hey, you can get there with just render nodes ...
>
> Imo the right fix is to ask the compositor to allocate the buffers in this
> case, and then maybe have some kind of revoke/purge behaviour on these
> buffers. Compositor has an actual idea of who's a candidate for direct
> scanout after all, not the app. Or well at least force migrate the memory
> from cma to shmem.
>
> If you only whack cgroups on this issue you're still stuck in the world
> where either all apps together can ddos the display or no one can
> realistically direct scanout.

Mmm, back to DRI2. I can't say I'm wildly enthused about that, not
least because a client using GPU/codec/etc for those buffers would
have to communicate its requirements (alignment etc) forward to the
compositor in order for the compositor to allocate for it. Obviously
passing the constraints etc around isn't a solved problem yet, but it
is at least contained down in clients rather than making it back and
forth between client and compositor.

I'm extremely not-wild about the compositor migrating memory from CMA
to shmem behind the client's back, and tbh I'm not sure how that would
even work if the client has it pinned through whatever API it's
imported into.

Anyway, like Laurent says, if we're deciding that heaps can't be used
by generic apps (unlike DRM/V4L2/etc), then we need gralloc.

Cheers,
Daniel

2024-05-13 08:34:42

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 13, 2024 at 10:29:22AM +0200, Maxime Ripard wrote:
> On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > Hi,
> > >
> > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > Shorter term, we have a problem to solve, and the best option we have
> > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > buffer allocatro helper in libcamera for the use case described above.
> > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > until we can do better.
> > >
> > > Considering the security concerned raised on this thread with dmabuf heap
> > > allocation not be restricted by quotas, you'd get what you want quickly with
> > > memfd + udmabuf instead (which is accounted already).
> > >
> > > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > > any cases distro needs to take action to make the softISP works. This
> > > alternative is easy and does not interfere in anyway with your future plan or
> > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > safer memfd+udmabuf for the distro with security concerns.
> > >
> > > And for the long term plan, we can certainly get closer by fixing that issue
> > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > find common set of helpers to fix these exporters.
> >
> > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > about to suggest. Not just as a stopgap, but as the real official thing.
> >
> > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > adding the right accounting and then either let mlock rlimits or cgroups
> > kernel memory limits enforce good behavior.
>
> I think the main drawback with memfd is that it'll be broken for devices
> without an IOMMU, and while you said that it's uncommon for GPUs, it's
> definitely not for codecs and display engines.

If the application wants to share buffers between the camera and a
display engine or codec, it should arguably not use the libcamera
FrameBufferAllocator, but allocate the buffers from the display or the
encoder. memfd wouldn't be used in that case.

We need to eat our own dogfood though. If we want to push the
responsibility for buffer allocation in the buffer sharing case to the
application, we need to modify the cam application to do so when using
the KMS backend.

--
Regards,

Laurent Pinchart

2024-05-13 08:34:55

by Maxime Ripard

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > Hi,
> >
> > Le mardi 07 mai 2024 ? 21:36 +0300, Laurent Pinchart a ?crit?:
> > > Shorter term, we have a problem to solve, and the best option we have
> > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > buffer allocatro helper in libcamera for the use case described above.
> > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > until we can do better.
> >
> > Considering the security concerned raised on this thread with dmabuf heap
> > allocation not be restricted by quotas, you'd get what you want quickly with
> > memfd + udmabuf instead (which is accounted already).
> >
> > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > any cases distro needs to take action to make the softISP works. This
> > alternative is easy and does not interfere in anyway with your future plan or
> > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > safer memfd+udmabuf for the distro with security concerns.
> >
> > And for the long term plan, we can certainly get closer by fixing that issue
> > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > find common set of helpers to fix these exporters.
>
> Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> about to suggest. Not just as a stopgap, but as the real official thing.
>
> udmabuf does kinda allow you to pin memory, but we can easily fix that by
> adding the right accounting and then either let mlock rlimits or cgroups
> kernel memory limits enforce good behavior.

I think the main drawback with memfd is that it'll be broken for devices
without an IOMMU, and while you said that it's uncommon for GPUs, it's
definitely not for codecs and display engines.

Maxime


Attachments:
(No filename) (1.93 kB)
signature.asc (281.00 B)
Download all attachments

2024-05-13 08:40:08

by Maxime Ripard

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tue, May 07, 2024 at 10:59:42PM +0300, Dmitry Baryshkov wrote:
> On Tue, 7 May 2024 at 21:40, Laurent Pinchart
> <[email protected]> wrote:
> >
> > On Tue, May 07, 2024 at 06:19:18PM +0300, Dmitry Baryshkov wrote:
> > > On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue wrote:
> > > > On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > > > > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > > > > providing data to VPU or DRM, then you should be able to get the buffer
> > > > > from the data-consuming device.
> > > >
> > > > Because we don't necessarily know what the consuming device is, if any.
> > > >
> > > > Could be VPU, could be Zoom/Hangouts via pipewire, could for argument
> > > > sake be GPU or DSP.
> > > >
> > > > Also if we introduce a dependency on another device to allocate the
> > > > output buffers - say always taking the output buffer from the GPU, then
> > > > we've added another dependency which is more difficult to guarantee
> > > > across different arches.
> > >
> > > Yes. And it should be expected. It's a consumer who knows the
> > > restrictions on the buffer. As I wrote, Zoom/Hangouts should not
> > > require a DMA buffer at all.
> >
> > Why not ? If you want to capture to a buffer that you then compose on
> > the screen without copying data, dma-buf is the way to go. That's the
> > Linux solution for buffer sharing.
>
> Yes. But it should be allocated by the DRM driver. As Sima wrote,
> there is no guarantee that the buffer allocated from dma-heaps is
> accessible to the GPU.

And there is no guarantee that the buffer allocated from the GPU is
accessible to the display engine. In practice, I've yet to see an issue
with that assumption.

And there's the other elephant in the room that hasn't been addressed.
Buffers typically allocated by the data-consuming frameworks are
coherent buffers, which on arm/arm64 usually mean non-cacheable.

Performances are *terrible*. Meanwhile, dma-heaps and dma-buf provide
cacheable buffers with a cache synchronization API, which allow to have
it run much faster.

Maxime


Attachments:
(No filename) (2.09 kB)
signature.asc (281.00 B)
Download all attachments

2024-05-13 13:51:45

by Simon Ser

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Wednesday, May 8th, 2024 at 17:49, Daniel Vetter <[email protected]> wrote:

> On Wed, May 08, 2024 at 09:38:33AM +0100, Daniel Stone wrote:
>
> > On Wed, 8 May 2024 at 09:33, Daniel Vetter [email protected] wrote:
> >
> > > On Wed, May 08, 2024 at 06:46:53AM +0100, Daniel Stone wrote:
> > >
> > > > That would have the unfortunate side effect of making sandboxed apps
> > > > less efficient on some platforms, since they wouldn't be able to do
> > > > direct scanout anymore ...
> > >
> > > I was assuming that everyone goes through pipewire, and ideally that is
> > > the only one that can even get at these special chardev.
> > >
> > > If pipewire is only for sandboxed apps then yeah this aint great :-/
> >
> > No, PipeWire is fine, I mean graphical apps.
> >
> > Right now, if your platform requires CMA for display, then the app
> > needs access to the GPU render node and the display node too, in order
> > to allocate buffers which the compositor can scan out directly. If it
> > only has access to the render nodes and not the display node, it won't
> > be able to allocate correctly, so its content will need a composition
> > pass, i.e. performance penalty for sandboxing. But if it can allocate
> > correctly, then hey, it can exhaust CMA just like heaps can.
> >
> > Personally I think we'd be better off just allowing access and
> > figuring out cgroups later. It's not like the OOM story is great
> > generally, and hey, you can get there with just render nodes ...
>
> Imo the right fix is to ask the compositor to allocate the buffers in this
> case, and then maybe have some kind of revoke/purge behaviour on these
> buffers. Compositor has an actual idea of who's a candidate for direct
> scanout after all, not the app. Or well at least force migrate the memory
> from cma to shmem.
>
> If you only whack cgroups on this issue you're still stuck in the world
> where either all apps together can ddos the display or no one can
> realistically direct scanout.
>
> So yeah on the display side the problem isn't solved either, but we knew
> that already.

What makes scanout memory so special?

The way I see it, any kind of memory will always be a limited resource:
regular programs can exhaust system memory, as well as GPU VRAM, as well
as scanout memory. I think we need to have ways to limit/control/arbiter
the allocations regardless, and I don't think scanout memory should be a
special case here.

2024-05-13 13:52:00

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Le lundi 13 mai 2024 à 10:29 +0200, Maxime Ripard a écrit :
> On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > Hi,
> > >
> > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > Shorter term, we have a problem to solve, and the best option we have
> > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > buffer allocatro helper in libcamera for the use case described above.
> > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > until we can do better.
> > >
> > > Considering the security concerned raised on this thread with dmabuf heap
> > > allocation not be restricted by quotas, you'd get what you want quickly with
> > > memfd + udmabuf instead (which is accounted already).
> > >
> > > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > > any cases distro needs to take action to make the softISP works. This
> > > alternative is easy and does not interfere in anyway with your future plan or
> > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > safer memfd+udmabuf for the distro with security concerns.
> > >
> > > And for the long term plan, we can certainly get closer by fixing that issue
> > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > find common set of helpers to fix these exporters.
> >
> > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > about to suggest. Not just as a stopgap, but as the real official thing.
> >
> > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > adding the right accounting and then either let mlock rlimits or cgroups
> > kernel memory limits enforce good behavior.
>
> I think the main drawback with memfd is that it'll be broken for devices
> without an IOMMU, and while you said that it's uncommon for GPUs, it's
> definitely not for codecs and display engines.

In the context of libcamera, the allocation and the alignment done to the video
frame is done completely blindly. In that context, there is a lot more then just
the allocation type that can go wrong and will lead to a memory copy. The upside
of memfd, is that the read cache will help speeding up the copies if they are
needed.

Another important point is that this is only used if the application haven't
provided frames. If your embedded application is non-generic, and you have
permissions to access the right heap, the application can solve your specific
issue. But in the generic Linux space, Linux kernel API are just insufficient
for the "just work" scenario.

Nicolas

2024-05-13 14:00:32

by Maxime Ripard

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 13, 2024 at 09:42:00AM -0400, Nicolas Dufresne wrote:
> Le lundi 13 mai 2024 ? 10:29 +0200, Maxime Ripard a ?crit?:
> > On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > > Hi,
> > > >
> > > > Le mardi 07 mai 2024 ? 21:36 +0300, Laurent Pinchart a ?crit?:
> > > > > Shorter term, we have a problem to solve, and the best option we have
> > > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > buffer allocatro helper in libcamera for the use case described above.
> > > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > > until we can do better.
> > > >
> > > > Considering the security concerned raised on this thread with dmabuf heap
> > > > allocation not be restricted by quotas, you'd get what you want quickly with
> > > > memfd + udmabuf instead (which is accounted already).
> > > >
> > > > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > > > any cases distro needs to take action to make the softISP works. This
> > > > alternative is easy and does not interfere in anyway with your future plan or
> > > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > > safer memfd+udmabuf for the distro with security concerns.
> > > >
> > > > And for the long term plan, we can certainly get closer by fixing that issue
> > > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > > find common set of helpers to fix these exporters.
> > >
> > > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > > about to suggest. Not just as a stopgap, but as the real official thing.
> > >
> > > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > > adding the right accounting and then either let mlock rlimits or cgroups
> > > kernel memory limits enforce good behavior.
> >
> > I think the main drawback with memfd is that it'll be broken for devices
> > without an IOMMU, and while you said that it's uncommon for GPUs, it's
> > definitely not for codecs and display engines.
>
> In the context of libcamera, the allocation and the alignment done to the video
> frame is done completely blindly. In that context, there is a lot more then just
> the allocation type that can go wrong and will lead to a memory copy. The upside
> of memfd, is that the read cache will help speeding up the copies if they are
> needed.

dma-heaps provide cacheable buffers too...

> Another important point is that this is only used if the application haven't
> provided frames. If your embedded application is non-generic, and you have
> permissions to access the right heap, the application can solve your specific
> issue. But in the generic Linux space, Linux kernel API are just insufficient
> for the "just work" scenario.

... but they also provide semantics around the memory buffers that no
other allocation API do. There's at least the mediatek secure playback
series and another one that I've started to work on to allocate ECC
protected or unprotected buffers that are just the right use case for
the heaps, and the target frameworks aren't.

Maxime


Attachments:
(No filename) (3.25 kB)
signature.asc (281.00 B)
Download all attachments

2024-05-13 15:06:44

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Le lundi 13 mai 2024 à 15:51 +0200, Maxime Ripard a écrit :
> On Mon, May 13, 2024 at 09:42:00AM -0400, Nicolas Dufresne wrote:
> > Le lundi 13 mai 2024 à 10:29 +0200, Maxime Ripard a écrit :
> > > On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > > > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > > > Hi,
> > > > >
> > > > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > > > Shorter term, we have a problem to solve, and the best option we have
> > > > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > > buffer allocatro helper in libcamera for the use case described above.
> > > > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > > > until we can do better.
> > > > >
> > > > > Considering the security concerned raised on this thread with dmabuf heap
> > > > > allocation not be restricted by quotas, you'd get what you want quickly with
> > > > > memfd + udmabuf instead (which is accounted already).
> > > > >
> > > > > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > > > > any cases distro needs to take action to make the softISP works. This
> > > > > alternative is easy and does not interfere in anyway with your future plan or
> > > > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > > > safer memfd+udmabuf for the distro with security concerns.
> > > > >
> > > > > And for the long term plan, we can certainly get closer by fixing that issue
> > > > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > > > find common set of helpers to fix these exporters.
> > > >
> > > > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > > > about to suggest. Not just as a stopgap, but as the real official thing.
> > > >
> > > > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > > > adding the right accounting and then either let mlock rlimits or cgroups
> > > > kernel memory limits enforce good behavior.
> > >
> > > I think the main drawback with memfd is that it'll be broken for devices
> > > without an IOMMU, and while you said that it's uncommon for GPUs, it's
> > > definitely not for codecs and display engines.
> >
> > In the context of libcamera, the allocation and the alignment done to the video
> > frame is done completely blindly. In that context, there is a lot more then just
> > the allocation type that can go wrong and will lead to a memory copy. The upside
> > of memfd, is that the read cache will help speeding up the copies if they are
> > needed.
>
> dma-heaps provide cacheable buffers too...

Yes, and why we have cache hints in V4L2 now. There is no clue that softISP code
can read to make the right call. The required cache management in undefined
until all the importer are known. I also don't think heaps currently care to
adapt the dmabuf sync behaviour based on the different importers, or the
addition of a new importer. On top of which, there is insufficient information
on the device to really deduce what is needed.

>
> > Another important point is that this is only used if the application haven't
> > provided frames. If your embedded application is non-generic, and you have
> > permissions to access the right heap, the application can solve your specific
> > issue. But in the generic Linux space, Linux kernel API are just insufficient
> > for the "just work" scenario.
>
> ... but they also provide semantics around the memory buffers that no
> other allocation API do. There's at least the mediatek secure playback
> series and another one that I've started to work on to allocate ECC
> protected or unprotected buffers that are just the right use case for
> the heaps, and the target frameworks aren't.

Let's agree we are both off topic now. The libcamera softISP is currently purely
software, and cannot write to any form of protected memory. As for ECC, I would
hope this usage will be coded in the application and that this application has
been authorized to access the appropriate heaps.

And finally, none of this fixes the issue that the heap allocation are not being
accounted properly and allow of an easy memory DoS. So uaccess should be granted
with care, meaning that defaulting a "desktop" library to that, means it will
most of the time not work at all.

Nicolas

2024-05-13 15:10:21

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Le lundi 13 mai 2024 à 11:34 +0300, Laurent Pinchart a écrit :
> On Mon, May 13, 2024 at 10:29:22AM +0200, Maxime Ripard wrote:
> > On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > > Hi,
> > > >
> > > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > > Shorter term, we have a problem to solve, and the best option we have
> > > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > buffer allocatro helper in libcamera for the use case described above.
> > > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > > until we can do better.
> > > >
> > > > Considering the security concerned raised on this thread with dmabuf heap
> > > > allocation not be restricted by quotas, you'd get what you want quickly with
> > > > memfd + udmabuf instead (which is accounted already).
> > > >
> > > > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > > > any cases distro needs to take action to make the softISP works. This
> > > > alternative is easy and does not interfere in anyway with your future plan or
> > > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > > safer memfd+udmabuf for the distro with security concerns.
> > > >
> > > > And for the long term plan, we can certainly get closer by fixing that issue
> > > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > > find common set of helpers to fix these exporters.
> > >
> > > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > > about to suggest. Not just as a stopgap, but as the real official thing.
> > >
> > > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > > adding the right accounting and then either let mlock rlimits or cgroups
> > > kernel memory limits enforce good behavior.
> >
> > I think the main drawback with memfd is that it'll be broken for devices
> > without an IOMMU, and while you said that it's uncommon for GPUs, it's
> > definitely not for codecs and display engines.
>
> If the application wants to share buffers between the camera and a
> display engine or codec, it should arguably not use the libcamera
> FrameBufferAllocator, but allocate the buffers from the display or the
> encoder. memfd wouldn't be used in that case.
>
> We need to eat our own dogfood though. If we want to push the
> responsibility for buffer allocation in the buffer sharing case to the
> application, we need to modify the cam application to do so when using
> the KMS backend.
>

Agreed, and the new dmabuf feedback on wayland can also be used on top of this.

You'll hit the same limitation as we hit in GStreamer, which is that KMS driver
only offer allocation for render buffers and most of them are missing allocators
for YUV buffers, even though they can import in these formats. (kms allocators,
except dumb, which has other issues, are format aware).

Nicolas

2024-05-14 20:42:47

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 13, 2024 at 11:10:00AM -0400, Nicolas Dufresne wrote:
> Le lundi 13 mai 2024 à 11:34 +0300, Laurent Pinchart a écrit :
> > On Mon, May 13, 2024 at 10:29:22AM +0200, Maxime Ripard wrote:
> > > On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > > > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > > > Hi,
> > > > >
> > > > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > > > Shorter term, we have a problem to solve, and the best option we have
> > > > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > > buffer allocatro helper in libcamera for the use case described above.
> > > > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > > > until we can do better.
> > > > >
> > > > > Considering the security concerned raised on this thread with dmabuf heap
> > > > > allocation not be restricted by quotas, you'd get what you want quickly with
> > > > > memfd + udmabuf instead (which is accounted already).
> > > > >
> > > > > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > > > > any cases distro needs to take action to make the softISP works. This
> > > > > alternative is easy and does not interfere in anyway with your future plan or
> > > > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > > > safer memfd+udmabuf for the distro with security concerns.
> > > > >
> > > > > And for the long term plan, we can certainly get closer by fixing that issue
> > > > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > > > find common set of helpers to fix these exporters.
> > > >
> > > > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > > > about to suggest. Not just as a stopgap, but as the real official thing.
> > > >
> > > > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > > > adding the right accounting and then either let mlock rlimits or cgroups
> > > > kernel memory limits enforce good behavior.
> > >
> > > I think the main drawback with memfd is that it'll be broken for devices
> > > without an IOMMU, and while you said that it's uncommon for GPUs, it's
> > > definitely not for codecs and display engines.
> >
> > If the application wants to share buffers between the camera and a
> > display engine or codec, it should arguably not use the libcamera
> > FrameBufferAllocator, but allocate the buffers from the display or the
> > encoder. memfd wouldn't be used in that case.
> >
> > We need to eat our own dogfood though. If we want to push the
> > responsibility for buffer allocation in the buffer sharing case to the
> > application, we need to modify the cam application to do so when using
> > the KMS backend.
>
> Agreed, and the new dmabuf feedback on wayland can also be used on top of this.
>
> You'll hit the same limitation as we hit in GStreamer, which is that KMS driver
> only offer allocation for render buffers and most of them are missing allocators
> for YUV buffers, even though they can import in these formats. (kms allocators,
> except dumb, which has other issues, are format aware).

My experience on Arm platforms is that the KMS drivers offer allocation
for scanout buffers, not render buffers, and mostly using the dumb
allocator API. If the KMS device can scan out YUV natively, YUV buffer
allocation should be supported. Am I missing something here ?

--
Regards,

Laurent Pinchart

2024-05-14 20:45:23

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 13, 2024 at 11:06:24AM -0400, Nicolas Dufresne wrote:
> Le lundi 13 mai 2024 à 15:51 +0200, Maxime Ripard a écrit :
> > On Mon, May 13, 2024 at 09:42:00AM -0400, Nicolas Dufresne wrote:
> > > Le lundi 13 mai 2024 à 10:29 +0200, Maxime Ripard a écrit :
> > > > On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > > > > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > > > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > > > > Shorter term, we have a problem to solve, and the best option we have
> > > > > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > > > buffer allocatro helper in libcamera for the use case described above.
> > > > > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > > > > until we can do better.
> > > > > >
> > > > > > Considering the security concerned raised on this thread with dmabuf heap
> > > > > > allocation not be restricted by quotas, you'd get what you want quickly with
> > > > > > memfd + udmabuf instead (which is accounted already).
> > > > > >
> > > > > > It was raised that distro don't enable udmabuf, but as stated there by Hans, in
> > > > > > any cases distro needs to take action to make the softISP works. This
> > > > > > alternative is easy and does not interfere in anyway with your future plan or
> > > > > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > > > > safer memfd+udmabuf for the distro with security concerns.
> > > > > >
> > > > > > And for the long term plan, we can certainly get closer by fixing that issue
> > > > > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > > > > find common set of helpers to fix these exporters.
> > > > >
> > > > > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > > > > about to suggest. Not just as a stopgap, but as the real official thing.
> > > > >
> > > > > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > > > > adding the right accounting and then either let mlock rlimits or cgroups
> > > > > kernel memory limits enforce good behavior.
> > > >
> > > > I think the main drawback with memfd is that it'll be broken for devices
> > > > without an IOMMU, and while you said that it's uncommon for GPUs, it's
> > > > definitely not for codecs and display engines.
> > >
> > > In the context of libcamera, the allocation and the alignment done to the video
> > > frame is done completely blindly. In that context, there is a lot more then just
> > > the allocation type that can go wrong and will lead to a memory copy. The upside
> > > of memfd, is that the read cache will help speeding up the copies if they are
> > > needed.
> >
> > dma-heaps provide cacheable buffers too...
>
> Yes, and why we have cache hints in V4L2 now. There is no clue that softISP code
> can read to make the right call. The required cache management in undefined
> until all the importer are known. I also don't think heaps currently care to
> adapt the dmabuf sync behaviour based on the different importers, or the
> addition of a new importer. On top of which, there is insufficient information
> on the device to really deduce what is needed.
>
> > > Another important point is that this is only used if the application haven't
> > > provided frames. If your embedded application is non-generic, and you have
> > > permissions to access the right heap, the application can solve your specific
> > > issue. But in the generic Linux space, Linux kernel API are just insufficient
> > > for the "just work" scenario.
> >
> > ... but they also provide semantics around the memory buffers that no
> > other allocation API do. There's at least the mediatek secure playback
> > series and another one that I've started to work on to allocate ECC
> > protected or unprotected buffers that are just the right use case for
> > the heaps, and the target frameworks aren't.
>
> Let's agree we are both off topic now. The libcamera softISP is currently purely
> software, and cannot write to any form of protected memory. As for ECC, I would
> hope this usage will be coded in the application and that this application has
> been authorized to access the appropriate heaps.
>
> And finally, none of this fixes the issue that the heap allocation are not being
> accounted properly and allow of an easy memory DoS. So uaccess should be granted
> with care, meaning that defaulting a "desktop" library to that, means it will
> most of the time not work at all.

I think that issue should be fixed, regardless of whether or not we end
up using dma heaps for libcamera. If we do use them, maybe there will be
a higher incentive for somebody involved in this conversation to tackle
that problem first :-) And maybe, as a result, the rest of the Linux
community will consider with a more open mind usage of dma heaps on
desktop systems.

--
Regards,

Laurent Pinchart

2024-05-14 20:53:03

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi,

Le mardi 14 mai 2024 à 23:45 +0300, Laurent Pinchart a écrit :
> > And finally, none of this fixes the issue that the heap allocation are not being
> > accounted properly and allow of an easy memory DoS. So uaccess should be granted
> > with care, meaning that defaulting a "desktop" library to that, means it will
> > most of the time not work at all.
>
> I think that issue should be fixed, regardless of whether or not we end
> up using dma heaps for libcamera. If we do use them, maybe there will be
> a higher incentive for somebody involved in this conversation to tackle
> that problem first :-) And maybe, as a result, the rest of the Linux
> community will consider with a more open mind usage of dma heaps on
> desktop systems.

The strict reality is that if libcamera offer no alternatives, some OS will
enable it and reduce their security. I totally agree this issue needs to be
fixed regardless of libcamera, or even dma heaps. DMABuf allocation should be
accounted and limited to quotas whether it comes from a GPU, Display, V4L2 or
other type of supported devices. I would also not recommend dropping your heap
support (or preventing it from being merged) in libcamera.

Nicolas

2024-05-16 07:00:57

by Simon Ser

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Tuesday, May 14th, 2024 at 22:42, Laurent Pinchart <[email protected]> wrote:

> My experience on Arm platforms is that the KMS drivers offer allocation
> for scanout buffers, not render buffers, and mostly using the dumb
> allocator API. If the KMS device can scan out YUV natively, YUV buffer
> allocation should be supported. Am I missing something here ?

Note that dumb buffers are only intended for simple software-rendering
use-cases. Anything more complicated (e.g. involving GPU rendering)
should use another mechanism.

2024-05-16 10:18:36

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Thu, May 09, 2024 at 10:23:16AM +0100, Daniel Stone wrote:
> Hi,
>
> On Wed, 8 May 2024 at 16:49, Daniel Vetter <[email protected]> wrote:
> > On Wed, May 08, 2024 at 09:38:33AM +0100, Daniel Stone wrote:
> > > Right now, if your platform requires CMA for display, then the app
> > > needs access to the GPU render node and the display node too, in order
> > > to allocate buffers which the compositor can scan out directly. If it
> > > only has access to the render nodes and not the display node, it won't
> > > be able to allocate correctly, so its content will need a composition
> > > pass, i.e. performance penalty for sandboxing. But if it can allocate
> > > correctly, then hey, it can exhaust CMA just like heaps can.
> > >
> > > Personally I think we'd be better off just allowing access and
> > > figuring out cgroups later. It's not like the OOM story is great
> > > generally, and hey, you can get there with just render nodes ...
> >
> > Imo the right fix is to ask the compositor to allocate the buffers in this
> > case, and then maybe have some kind of revoke/purge behaviour on these
> > buffers. Compositor has an actual idea of who's a candidate for direct
> > scanout after all, not the app. Or well at least force migrate the memory
> > from cma to shmem.
> >
> > If you only whack cgroups on this issue you're still stuck in the world
> > where either all apps together can ddos the display or no one can
> > realistically direct scanout.
>
> Mmm, back to DRI2. I can't say I'm wildly enthused about that, not
> least because a client using GPU/codec/etc for those buffers would
> have to communicate its requirements (alignment etc) forward to the
> compositor in order for the compositor to allocate for it. Obviously
> passing the constraints etc around isn't a solved problem yet, but it
> is at least contained down in clients rather than making it back and
> forth between client and compositor.

I don't think you need the compositor to allocate the buffer from the
requirements, you only need a protocol that a) allocates a buffer of a
given size from a given heap and b) has some kinda of revoke provisions so
that the compositor can claw back the memory again when it needs it.

> I'm extremely not-wild about the compositor migrating memory from CMA
> to shmem behind the client's back, and tbh I'm not sure how that would
> even work if the client has it pinned through whatever API it's
> imported into.

Other option is revoke on cma buffers that are allocated by clients, for
the case the compositor needs it.

> Anyway, like Laurent says, if we're deciding that heaps can't be used
> by generic apps (unlike DRM/V4L2/etc), then we need gralloc.

gralloc doesn't really fix this, it's just abstraction around how/where
you allocate?

Anyway the current plan is that we all pretend this issue of CMA allocated
buffers don't exist and we let clients allocate without limits. Given that
we don't even have cgroups to sort out the mess for anything else I
wouldn't worry too much ...
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-16 11:21:21

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Thu, May 16, 2024 at 07:00:31AM +0000, Simon Ser wrote:
> On Tuesday, May 14th, 2024 at 22:42, Laurent Pinchart wrote:
>
> > My experience on Arm platforms is that the KMS drivers offer allocation
> > for scanout buffers, not render buffers, and mostly using the dumb
> > allocator API. If the KMS device can scan out YUV natively, YUV buffer
> > allocation should be supported. Am I missing something here ?
>
> Note that dumb buffers are only intended for simple software-rendering
> use-cases. Anything more complicated (e.g. involving GPU rendering)
> should use another mechanism.

Sure. Even if dumb buffers may work for GPU rendering in some cases,
there's no guarantee they will, so they shouldn't be used.

My comment was related to scanout buffers, as I was puzzled by Nicolas
mentioning how "KMS drivers only offer allocation for render buffers".
On Arm platforms the render buffers are allocated on the GPU's DRM
device as far as I understand, while the KMS drivers allocate scanout
buffers using the dumb buffers API.

--
Regards,

Laurent Pinchart

2024-05-16 11:27:42

by Laurent Pinchart

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi Nicolas,

On Wed, May 15, 2024 at 01:43:58PM -0400, [email protected] wrote:
> Le mardi 14 mai 2024 à 23:42 +0300, Laurent Pinchart a écrit :
> > > You'll hit the same limitation as we hit in GStreamer, which is that KMS driver
> > > only offer allocation for render buffers and most of them are missing allocators
> > > for YUV buffers, even though they can import in these formats. (kms allocators,
> > > except dumb, which has other issues, are format aware).
> >
> > My experience on Arm platforms is that the KMS drivers offer allocation
> > for scanout buffers, not render buffers, and mostly using the dumb
> > allocator API. If the KMS device can scan out YUV natively, YUV buffer
> > allocation should be supported. Am I missing something here ?
>
> There is two APIs, Dumb is the legacy allocation API, only used by display

Is it legacy only ? I understand the dumb buffers API to be officially
supported, to allocate scanout buffers suitable for software rendering.

> drivers indeed, and the API does not include a pixel format or a modifier. The
> allocation of YUV buffer has been made through a small hack,
>
> bpp = number of bits per component (of luma plane if multiple planes)
> width = width
> height = height * X
>
> Where X will vary, "3 / 2" is used for 420 subsampling, "2" for 422 and "3" for
> 444. It is far from idea, requires deep knowledge of each formats in the
> application

I'm not sure I see that as an issue, but our experiences and uses cases
may vary :-)

> and cannot allocate each planes seperatly.

For semi-planar or planar formats, unless I'm mistaken, you can either
allocate a single buffer and use it with appropriate offsets when
constructing your framebuffer (with DRM_IOCTL_MODE_ADDFB2), or allocate
one buffer per plane.

> The second is to use the driver specific allocation API. This is then abstracted
> by GBM. This allows allocating render buffers with notably modifiers and/or use
> cases. But no support for YUV formats or multi-planar formats.

GBM is the way to go for render buffers indeed. It has been designed
with only graphics buffer management use cases in mind, so it's
unfortunately not an option as a generic allocator, at least in its
current form.

--
Regards,

Laurent Pinchart

2024-05-16 13:52:04

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Mon, May 13, 2024 at 01:51:23PM +0000, Simon Ser wrote:
> On Wednesday, May 8th, 2024 at 17:49, Daniel Vetter <[email protected]> wrote:
>
> > On Wed, May 08, 2024 at 09:38:33AM +0100, Daniel Stone wrote:
> >
> > > On Wed, 8 May 2024 at 09:33, Daniel Vetter [email protected] wrote:
> > >
> > > > On Wed, May 08, 2024 at 06:46:53AM +0100, Daniel Stone wrote:
> > > >
> > > > > That would have the unfortunate side effect of making sandboxed apps
> > > > > less efficient on some platforms, since they wouldn't be able to do
> > > > > direct scanout anymore ...
> > > >
> > > > I was assuming that everyone goes through pipewire, and ideally that is
> > > > the only one that can even get at these special chardev.
> > > >
> > > > If pipewire is only for sandboxed apps then yeah this aint great :-/
> > >
> > > No, PipeWire is fine, I mean graphical apps.
> > >
> > > Right now, if your platform requires CMA for display, then the app
> > > needs access to the GPU render node and the display node too, in order
> > > to allocate buffers which the compositor can scan out directly. If it
> > > only has access to the render nodes and not the display node, it won't
> > > be able to allocate correctly, so its content will need a composition
> > > pass, i.e. performance penalty for sandboxing. But if it can allocate
> > > correctly, then hey, it can exhaust CMA just like heaps can.
> > >
> > > Personally I think we'd be better off just allowing access and
> > > figuring out cgroups later. It's not like the OOM story is great
> > > generally, and hey, you can get there with just render nodes ...
> >
> > Imo the right fix is to ask the compositor to allocate the buffers in this
> > case, and then maybe have some kind of revoke/purge behaviour on these
> > buffers. Compositor has an actual idea of who's a candidate for direct
> > scanout after all, not the app. Or well at least force migrate the memory
> > from cma to shmem.
> >
> > If you only whack cgroups on this issue you're still stuck in the world
> > where either all apps together can ddos the display or no one can
> > realistically direct scanout.
> >
> > So yeah on the display side the problem isn't solved either, but we knew
> > that already.
>
> What makes scanout memory so special?
>
> The way I see it, any kind of memory will always be a limited resource:
> regular programs can exhaust system memory, as well as GPU VRAM, as well
> as scanout memory. I think we need to have ways to limit/control/arbiter
> the allocations regardless, and I don't think scanout memory should be a
> special case here.

(Long w/en and I caught a cold)

It's not scanout that's special, it's cma memory that's special. Because
once you've allocated it, it's gone since it cannot be swapped out, and
there's not a lot of it to go around. Which means even if we'd have
cgroups for all the various gpu allocation heaps, you can't use cgroups to
manage cma in a meaningful way:

- You set the cgroup limits so low for apps that it's guaranteed that the
compositor will always be able to allocate enough scanout memory for
it's need. That will be low enough that apps can never allocate scanout
buffers themselves.

- Or you set the limit high enough so that apps can allocate enough, which
means (as soon as you have more than just one app and not a totally
bonkers amount of cma) that the compositor might not be able to allocate
anymore.

It's kinda shit situation, which is also why you need the compositor to be
able to revoke cma allocations it has handed to clients (like with drm
leases).

Or we just keep the current yolo situation.

For any other memory type than CMA most of the popular drivers at least
implement swapping, which gives you a ton more flexibility in setting up
limits in a way that actually work. But even there we'd need cgroups first
to make sure things don't go wrong too badly in the face of evil apps ...
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2024-05-21 08:43:57

by Maxime Ripard

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Thu, May 16, 2024 at 01:11:51PM GMT, [email protected] wrote:
> Le jeudi 16 mai 2024 ? 14:27 +0300, Laurent Pinchart a ?crit?:
> > Hi Nicolas,
> >
> > On Wed, May 15, 2024 at 01:43:58PM -0400, [email protected] wrote:
> > > Le mardi 14 mai 2024 ? 23:42 +0300, Laurent Pinchart a ?crit?:
> > > > > You'll hit the same limitation as we hit in GStreamer, which is that KMS driver
> > > > > only offer allocation for render buffers and most of them are missing allocators
> > > > > for YUV buffers, even though they can import in these formats. (kms allocators,
> > > > > except dumb, which has other issues, are format aware).
> > > >
> > > > My experience on Arm platforms is that the KMS drivers offer allocation
> > > > for scanout buffers, not render buffers, and mostly using the dumb
> > > > allocator API. If the KMS device can scan out YUV natively, YUV buffer
> > > > allocation should be supported. Am I missing something here ?
> > >
> > > There is two APIs, Dumb is the legacy allocation API, only used by display
> >
> > Is it legacy only ? I understand the dumb buffers API to be officially
> > supported, to allocate scanout buffers suitable for software rendering.
> >
> > > drivers indeed, and the API does not include a pixel format or a modifier. The
> > > allocation of YUV buffer has been made through a small hack,
> > >
> > > bpp = number of bits per component (of luma plane if multiple planes)
> > > width = width
> > > height = height * X
> > >
> > > Where X will vary, "3 / 2" is used for 420 subsampling, "2" for 422 and "3" for
> > > 444. It is far from idea, requires deep knowledge of each formats in the
> > > application
> >
> > I'm not sure I see that as an issue, but our experiences and uses cases
> > may vary :-)
>
> Its extra burden, and does not scale to all available pixel formats. My reply
> was for readers education as I feel like a lot of linux-media dev don't have a
> clue of what is going on at the rendering side. This ensure a minimum knowledge
> to everyone commenting.
>
> And yes, within the GFX community, Dumb allocation is to be killed and
> replacement completely in the future, it simply does not have a complete
> replacement yet.
>
> >
> > > and cannot allocate each planes seperatly.
> >
> > For semi-planar or planar formats, unless I'm mistaken, you can either
> > allocate a single buffer and use it with appropriate offsets when
> > constructing your framebuffer (with DRM_IOCTL_MODE_ADDFB2), or allocate
> > one buffer per plane.
>
> We have use cases were single allocation is undesirable, but I don't really feel
> like this is important enough for me to type this explanation. Ping me if you
> care.
> >
> > > The second is to use the driver specific allocation API. This is then abstracted
> > > by GBM. This allows allocating render buffers with notably modifiers and/or use
> > > cases. But no support for YUV formats or multi-planar formats.
> >
> > GBM is the way to go for render buffers indeed. It has been designed
> > with only graphics buffer management use cases in mind, so it's
> > unfortunately not an option as a generic allocator, at least in its
> > current form.
> >
>
> What I perhaps should have highlighted that is that all these allocators in the
> GFX (called DRM, but meh) subsystem abstract away some deep knowledge of the HW
> requirements. Heaps are lower level APIs that assume that userspace have this
> knowledge. The Android and ChromeOS solution is to take the implementation from
> the kernel and move it into userspace, see minigbm from chromeos, or gralloc
> from Android. As these two projects are device centric, they are not usable on
> generic Linux. Heaps might have some future, but not without other piece of the
> puzzle.
>
> To come back to you wanting heaps in libcamera, because it makes them better for
> rendered or display. Well today this is a lie you make to yourself, because this
> is just a tiny bit of the puzzle, it is pure luck if you allocate dmabuf is
> usable but a foreign device. At the end of the day, this is just a fallback to
> satisfy that application are not forced to allocate that memory in libcamera.

I mean, it's pure luck, but can you point to any platform supported
upstream where it wouldn't work?

> Thus, I strongly recommend the udmabuf in the short term. Finally, moving to
> heaps when the reported issue is resolved, as then it gives more options and
> reduce the number of layers.

udmabuf wouldn't work with any platform without an IOMMU. We have plenty
of those.

All things considered, while I agree that it isn't the ideal solution,
we really don't have a better (ie, works on a larger set of platforms)
solution at the moment or in the next 5 years.

Maxime


Attachments:
(No filename) (4.79 kB)
signature.asc (281.00 B)
Download all attachments

2024-05-22 13:05:01

by Nicolas Dufresne

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Le jeudi 16 mai 2024 à 14:20 +0300, Laurent Pinchart a écrit :
> On Thu, May 16, 2024 at 07:00:31AM +0000, Simon Ser wrote:
> > On Tuesday, May 14th, 2024 at 22:42, Laurent Pinchart wrote:
> >
> > > My experience on Arm platforms is that the KMS drivers offer allocation
> > > for scanout buffers, not render buffers, and mostly using the dumb
> > > allocator API. If the KMS device can scan out YUV natively, YUV buffer
> > > allocation should be supported. Am I missing something here ?
> >
> > Note that dumb buffers are only intended for simple software-rendering
> > use-cases. Anything more complicated (e.g. involving GPU rendering)
> > should use another mechanism.
>
> Sure. Even if dumb buffers may work for GPU rendering in some cases,
> there's no guarantee they will, so they shouldn't be used.
>
> My comment was related to scanout buffers, as I was puzzled by Nicolas
> mentioning how "KMS drivers only offer allocation for render buffers".
> On Arm platforms the render buffers are allocated on the GPU's DRM
> device as far as I understand, while the KMS drivers allocate scanout
> buffers using the dumb buffers API.
>

The message is getting distorted. I'm saying that not all supported formats have
an allocation API in DRM/KMS drivers. Most YUV formats needed for media handling
(GPU or scannout) are not supported.

Nicolas

p.s. I feel like commenters thinks its evident for userspace application to know
if they are doing scanout or GPU ... while in reality, they offload their
allocated buffer to a compositor which will have to dynamically juggle between
these two with its own heuristic.


2024-05-22 13:36:21

by Maxime Ripard

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

Hi,

On Mon, May 06, 2024 at 03:38:24PM GMT, Daniel Vetter wrote:
> On Mon, May 06, 2024 at 02:05:12PM +0200, Maxime Ripard wrote:
> > Hi,
> >
> > On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote:
> > > Hi dma-buf maintainers, et.al.,
> > >
> > > Various people have been working on making complex/MIPI cameras work OOTB
> > > with mainline Linux kernels and an opensource userspace stack.
> > >
> > > The generic solution adds a software ISP (for Debayering and 3A) to
> > > libcamera. Libcamera's API guarantees that buffers handed to applications
> > > using it are dma-bufs so that these can be passed to e.g. a video encoder.
> > >
> > > In order to meet this API guarantee the libcamera software ISP allocates
> > > dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
> > > the Fedora COPR repo for the PoC of this:
> > > https://hansdegoede.dreamwidth.org/28153.html
> >
> > For the record, we're also considering using them for ARM KMS devices,
> > so it would be better if the solution wasn't only considering v4l2
> > devices.
> >
> > > I have added a simple udev rule to give physically present users access
> > > to the dma_heap-s:
> > >
> > > KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
> > >
> > > (and on Rasperry Pi devices any users in the video group get access)
> > >
> > > This was just a quick fix for the PoC. Now that we are ready to move out
> > > of the PoC phase and start actually integrating this into distributions
> > > the question becomes if this is an acceptable solution; or if we need some
> > > other way to deal with this ?
> > >
> > > Specifically the question is if this will have any negative security
> > > implications? I can certainly see this being used to do some sort of
> > > denial of service attack on the system (1). This is especially true for
> > > the cma heap which generally speaking is a limited resource.
> >
> > There's plenty of other ways to exhaust CMA, like allocating too much
> > KMS or v4l2 buffers. I'm not sure we should consider dma-heaps
> > differently than those if it's part of our threat model.
>
> So generally for an arm soc where your display needs cma, your render node
> doesn't. And user applications only have access to the later, while only
> the compositor gets a kms fd through logind. At least in drm aside from
> vc4 there's really no render driver that just gives you access to cma and
> allows you to exhaust that, you need to be a compositor with drm master
> access to the display.
>
> Which means we're mostly protected against bad applications, and that's
> not a threat the "user physically sits in front of the machine accounts
> for", and which giving cma access to everyone would open up. And with
> flathub/snaps/... this is very much an issue.
>
> So you need more, either:
>
> - cgroups limits on dma-buf and dma-buf heaps. This has been bikeshedded
> for years and is just not really moving.

For reference, are you talking about:

https://lore.kernel.org/r/[email protected]

Or has there been a new version of that recently?

Maxime


Attachments:
(No filename) (3.12 kB)
signature.asc (281.00 B)
Download all attachments

2024-05-23 09:41:42

by Daniel Vetter

[permalink] [raw]
Subject: Re: Safety of opening up /dev/dma_heap/* to physically present users (udev uaccess tag) ?

On Wed, May 22, 2024 at 03:34:52PM +0200, Maxime Ripard wrote:
> Hi,
>
> On Mon, May 06, 2024 at 03:38:24PM GMT, Daniel Vetter wrote:
> > On Mon, May 06, 2024 at 02:05:12PM +0200, Maxime Ripard wrote:
> > > Hi,
> > >
> > > On Mon, May 06, 2024 at 01:49:17PM GMT, Hans de Goede wrote:
> > > > Hi dma-buf maintainers, et.al.,
> > > >
> > > > Various people have been working on making complex/MIPI cameras work OOTB
> > > > with mainline Linux kernels and an opensource userspace stack.
> > > >
> > > > The generic solution adds a software ISP (for Debayering and 3A) to
> > > > libcamera. Libcamera's API guarantees that buffers handed to applications
> > > > using it are dma-bufs so that these can be passed to e.g. a video encoder.
> > > >
> > > > In order to meet this API guarantee the libcamera software ISP allocates
> > > > dma-bufs from userspace through one of the /dev/dma_heap/* heaps. For
> > > > the Fedora COPR repo for the PoC of this:
> > > > https://hansdegoede.dreamwidth.org/28153.html
> > >
> > > For the record, we're also considering using them for ARM KMS devices,
> > > so it would be better if the solution wasn't only considering v4l2
> > > devices.
> > >
> > > > I have added a simple udev rule to give physically present users access
> > > > to the dma_heap-s:
> > > >
> > > > KERNEL=="system", SUBSYSTEM=="dma_heap", TAG+="uaccess"
> > > >
> > > > (and on Rasperry Pi devices any users in the video group get access)
> > > >
> > > > This was just a quick fix for the PoC. Now that we are ready to move out
> > > > of the PoC phase and start actually integrating this into distributions
> > > > the question becomes if this is an acceptable solution; or if we need some
> > > > other way to deal with this ?
> > > >
> > > > Specifically the question is if this will have any negative security
> > > > implications? I can certainly see this being used to do some sort of
> > > > denial of service attack on the system (1). This is especially true for
> > > > the cma heap which generally speaking is a limited resource.
> > >
> > > There's plenty of other ways to exhaust CMA, like allocating too much
> > > KMS or v4l2 buffers. I'm not sure we should consider dma-heaps
> > > differently than those if it's part of our threat model.
> >
> > So generally for an arm soc where your display needs cma, your render node
> > doesn't. And user applications only have access to the later, while only
> > the compositor gets a kms fd through logind. At least in drm aside from
> > vc4 there's really no render driver that just gives you access to cma and
> > allows you to exhaust that, you need to be a compositor with drm master
> > access to the display.
> >
> > Which means we're mostly protected against bad applications, and that's
> > not a threat the "user physically sits in front of the machine accounts
> > for", and which giving cma access to everyone would open up. And with
> > flathub/snaps/... this is very much an issue.
> >
> > So you need more, either:
> >
> > - cgroups limits on dma-buf and dma-buf heaps. This has been bikeshedded
> > for years and is just not really moving.
>
> For reference, are you talking about:
>
> https://lore.kernel.org/r/[email protected]
>
> Or has there been a new version of that recently?

I think the design feedback from Tejun has changed to that system memory
should be tracked with memcg instead (but that kinda leaves the open of
what to do with cma), and only device memory be tracked with a separate
cgroups controller.

But I'm also not sure whether that would actually solve all the
tracking/isolation requirements people tossed around or just gives us
something that wont get the job done.

Either way, yes I think that was the most recent code.
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch