It'd be nice to allow a process to send RT requests without granting
it the wide capabilities of CAP_SYS_ADMIN, and we already have a
capability which almost seems to fit this priority idea:
CAP_SYS_NICE. Would this fit there?
Being able to set IO priorities on a per-request or per-thread basis
(be it at async submission or via per-thread ioprio_set) is useful,
especially when userspace does its own prioritization/scheduling
before hitting the kernel, since it lets us signal to the kernel how
to order certain IOs. It'd be nice to separate this from ADMIN for
non-root processes, in a way that's less error-prone than e.g. having
a trusted launcher ionice the process and then drop priorities for
everything but prio requests.
khazhy
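For readers unfamiliar with the per-thread interface mentioned above, here is a minimal sketch of requesting IOPRIO_CLASS_RT for the calling thread through the raw ioprio_set syscall. The SKETCH_* constants and sketch_* helpers are illustrative names, not from the thread; the constant values are assumed to mirror include/uapi/linux/ioprio.h. On kernels where RT still requires CAP_SYS_ADMIN, an unprivileged caller should expect -EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed to mirror include/uapi/linux/ioprio.h; redefined under a
 * SKETCH_ prefix so the example does not depend on that header. */
#define SKETCH_IOPRIO_CLASS_SHIFT 13
#define SKETCH_IOPRIO_CLASS_RT    1
#define SKETCH_IOPRIO_CLASS_BE    2
#define SKETCH_IOPRIO_CLASS_IDLE  3
#define SKETCH_IOPRIO_WHO_PROCESS 1

/* Pack a class and a level (0 = highest, 7 = lowest) into one value. */
static int sketch_ioprio_value(int ioclass, int level)
{
    assert(level >= 0 && level < 8);
    return (ioclass << SKETCH_IOPRIO_CLASS_SHIFT) | level;
}

/* Set the IO priority of the calling thread (who == 0).
 * Returns 0 on success or a negative errno such as -EPERM. */
static int sketch_set_thread_ioprio(int ioclass, int level)
{
    long ret = syscall(SYS_ioprio_set, SKETCH_IOPRIO_WHO_PROCESS, 0,
                       sketch_ioprio_value(ioclass, level));
    return ret == -1 ? -errno : 0;
}
```

A caller would invoke sketch_set_thread_ioprio(SKETCH_IOPRIO_CLASS_RT, 4) before issuing the IO it wants prioritized, and fall back to best-effort if the call returns -EPERM.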
On 2020-08-20 17:35, Khazhismel Kumykov wrote:
> It'd be nice to allow a process to send RT requests without granting
> it the wide capabilities of CAP_SYS_ADMIN, and we already have a
> capability which almost seems to fit this priority idea:
> CAP_SYS_NICE. Would this fit there?
>
> Being able to set IO priorities on a per-request or per-thread basis
> (be it at async submission or via per-thread ioprio_set) is useful,
> especially when userspace does its own prioritization/scheduling
> before hitting the kernel, since it lets us signal to the kernel how
> to order certain IOs. It'd be nice to separate this from ADMIN for
> non-root processes, in a way that's less error-prone than e.g. having
> a trusted launcher ionice the process and then drop priorities for
> everything but prio requests.
Hi Khazhy,
In include/uapi/linux/capability.h I found the following:
/* Allow raising priority and setting priority on other (different
   UID) processes */
/* Allow use of FIFO and round-robin (realtime) scheduling on own
   processes and setting the scheduling algorithm used by another
   process. */
/* Allow setting cpu affinity on other processes */
#define CAP_SYS_NICE 23
If it is acceptable that every process that has permission to submit
IOPRIO_CLASS_RT I/O also has permission to modify the priority of
other processes, then extending CAP_SYS_NICE is an option. Another
possibility is to extend the block cgroup controller such that the
capability to submit IOPRIO_CLASS_RT I/O can be enabled through the
cgroup interface. There may be other approaches. I'm not sure what
the best approach is.
Bart.
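To make the CAP_SYS_NICE option concrete, the following is a userspace sketch that asks the kernel whether the calling thread holds a given capability in its effective set and then models the proposed policy: RT I/O is allowed with either CAP_SYS_ADMIN or CAP_SYS_NICE. The sketch_* function names are hypothetical, and the "admin or nice" rule is only the proposal under discussion here, not a statement of what any particular kernel enforces.

```c
#include <assert.h>
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if `cap` is in the calling thread's effective set,
 * 0 if it is not, and -1 on error or an out-of-range capability. */
static int sketch_has_effective_cap(int cap)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0, /* 0 means the calling thread */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = {{0}};

    if (cap < 0 || cap > CAP_LAST_CAP)
        return -1;
    assert(CAP_TO_INDEX(cap) < _LINUX_CAPABILITY_U32S_3);

    if (syscall(SYS_capget, &hdr, data) == -1)
        return -1;
    return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) != 0;
}

/* Model of the proposed rule (an assumption, see the note above):
 * IOPRIO_CLASS_RT is permitted with CAP_SYS_ADMIN or CAP_SYS_NICE.
 * Returns 1 if allowed, 0 if not, -1 if the capabilities cannot be read. */
static int sketch_may_submit_rt_io(void)
{
    int admin = sketch_has_effective_cap(CAP_SYS_ADMIN);
    int nice = sketch_has_effective_cap(CAP_SYS_NICE);

    if (admin < 0 || nice < 0)
        return -1;
    return admin || nice;
}
```

A launcher that keeps only CAP_SYS_NICE would see sketch_has_effective_cap(CAP_SYS_ADMIN) return 0 while sketch_may_submit_rt_io() still returns 1, which is the separation from ADMIN the original request asks for.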
On 8/22/20 7:58 PM, Bart Van Assche wrote:
> Hi Khazhy,
>
> In include/uapi/linux/capability.h I found the following:
>
> /* Allow raising priority and setting priority on other (different
> UID) processes */
> /* Allow use of FIFO and round-robin (realtime) scheduling on own
> processes and setting the scheduling algorithm used by another
> process. */
> /* Allow setting cpu affinity on other processes */
> #define CAP_SYS_NICE 23
>
> If it is acceptable that every process that has permission to submit
> IOPRIO_CLASS_RT I/O also has permission to modify the priority of
> other processes then extending CAP_SYS_NICE is an option. Another
> possibility is to extend the block cgroup controller such that the
> capability to submit IOPRIO_CLASS_RT I/O can be enabled through the
> cgroup interface. There may be other approaches. I'm not sure what
> the best approach is.
I think CAP_SYS_NICE fits pretty nicely, and I was actually planning on
using that for the io_uring SQPOLL side as well. So there is/will be
some precedent for tying it into IO related things, too. For this use
case, I think it's perfect.
--
Jens Axboe
On Sat, Aug 22, 2020 at 7:14 PM Jens Axboe wrote:
>
> On 8/22/20 7:58 PM, Bart Van Assche wrote:
> > Hi Khazhy,
> >
> > In include/uapi/linux/capability.h I found the following:
> >
> > /* Allow raising priority and setting priority on other (different
> > UID) processes */
> > /* Allow use of FIFO and round-robin (realtime) scheduling on own
> > processes and setting the scheduling algorithm used by another
> > process. */
> > /* Allow setting cpu affinity on other processes */
> > #define CAP_SYS_NICE 23
> >
> > If it is acceptable that every process that has permission to submit
> > IOPRIO_CLASS_RT I/O also has permission to modify the priority of
> > other processes then extending CAP_SYS_NICE is an option. Another
> > possibility is to extend the block cgroup controller such that the
> > capability to submit IOPRIO_CLASS_RT I/O can be enabled through the
> > cgroup interface. There may be other approaches. I'm not sure what
> > the best approach is.
I think it fits well with CAP_SYS_NICE, especially since that
capability already grants the ability to demote other processes to
IOPRIO_CLASS_IDLE, etc.
>
> I think CAP_SYS_NICE fits pretty nicely, and I was actually planning on
> using that for the io_uring SQPOLL side as well. So there is/will be
> some precedent for tying it into IO related things, too. For this use
> case, I think it's perfect.