LinuxLists.cc - systemd-rfkill regression on 5.11 and later kernels

2021-03-18 08:28:35

Subject: systemd-rfkill regression on 5.11 and later kernels

Hi,

we've received a bug report about rfkill change that was introduced in
5.11. While the systemd-rfkill expects the same size of both read and
write, the kernel rfkill write cuts off to the old 8 bytes while read
gives 9 bytes, hence it leads the error:
https://github.com/systemd/systemd/issues/18677
https://bugzilla.opensuse.org/show_bug.cgi?id=1183147

As far as I understand from the log in the commit 14486c82612a, this
sounds like the intended behavior. But if this was implemented in
that way just for the compatibility reason, it actually is worse,
introducing a regression.

Although this can be addressed easily in the systemd side, the current
kernel behavior needs reconsideration, IMO.

thanks,

Takashi

2021-03-18 09:29:43

by Emmanuel Grumbach

[permalink] [raw]

Subject: Re: systemd-rfkill regression on 5.11 and later kernels

Hi,

On Thu, Mar 18, 2021 at 10:31 AM Takashi Iwai <[email protected]> wrote:
>
> Hi,
>
> we've received a bug report about rfkill change that was introduced in
> 5.11. While the systemd-rfkill expects the same size of both read and
> write, the kernel rfkill write cuts off to the old 8 bytes while read
> gives 9 bytes, hence it leads the error:
> https://github.com/systemd/systemd/issues/18677
> https://bugzilla.opensuse.org/show_bug.cgi?id=1183147

If you use an old kernel that expects only 8 bytes and you write 9,
then yes, the kernel will read only 8.
If you use a new kernel (5.11) and you send 9 bytes, the kernel will
read all the 9 bytes, so I am not sure I understand the problem here.
If you have a new header file that makes you send 9 bytes, then, in
order to run against an old kernel (which seems to have been the case
with the report in github), then you must be ready to have the kernel
read less than 9 bytes.
What am I missing?

>
> As far as I understand from the log in the commit 14486c82612a, this
> sounds like the intended behavior. But if this was implemented in
> that way just for the compatibility reason, it actually is worse,
> introducing a regression.
>
> Although this can be addressed easily in the systemd side, the current
> kernel behavior needs reconsideration, IMO.
>
>
> thanks,
>
> Takashi

2021-03-18 09:37:28

by Johannes Berg

[permalink] [raw]

Subject: Re: systemd-rfkill regression on 5.11 and later kernels

Hi Takashi,

Oh yay :-(

> we've received a bug report about rfkill change that was introduced in
> 5.11. While the systemd-rfkill expects the same size of both read and
> write, the kernel rfkill write cuts off to the old 8 bytes while read
> gives 9 bytes, hence it leads the error:
> https://github.com/systemd/systemd/issues/18677
> https://bugzilla.opensuse.org/show_bug.cgi?id=1183147

> As far as I understand from the log in the commit 14486c82612a, this
> sounds like the intended behavior.

Not really? I don't even understand why we get this behaviour.

The code is this:

rfkill_fop_write():

...
/* we don't need the 'hard' variable but accept it */
if (count < RFKILL_EVENT_SIZE_V1 - 1)
return -EINVAL;

# this is actually 7 - RFKILL_EVENT_SIZE_V1 is 8
# (and obviously we get past this if and don't get -EINVAL

/*
* Copy as much data as we can accept into our 'ev' buffer,
* but tell userspace how much we've copied so it can determine
* our API version even in a write() call, if it cares.
*/
count = min(count, sizeof(ev));

# sizeof(ev) should be 9 since 'ev' is the new struct

if (copy_from_user(&ev, buf, count))
return -EFAULT;

...
ret = 0;
...
return ret ?: count;

Ah, no, I see. The bug says:

EDIT: above is with kernel-core-5.10.16-200.fc33.x86_64.

So you've recompiled systemd with 5.11 headers, but are running against
5.10 now, where the short write really was intentional - it lets you
detect that the new fields weren't handled by the kernel. If

The other issue is basically this (pre-fix) systemd code:

l = read(c.rfkill_fd, &event, sizeof(event));
...
if (l != RFKILL_EVENT_SIZE_V1) /* log/return error */

So ... honestly, I don't have all that much sympathy, when the uapi
header explicitly says we want to be able to change the size. But I
guess "no regressions" rules are hard, so ... dunno. I guess we can add
a version/size ioctl and keep using 8 bytes unless you send that?

johannes

2021-03-18 10:09:16

by Takashi Iwai

[permalink] [raw]

Subject: Re: systemd-rfkill regression on 5.11 and later kernels

On Thu, 18 Mar 2021 10:36:19 +0100,
Johannes Berg wrote:
>
> Hi Takashi,
>
> Oh yay :-(
>
> > we've received a bug report about rfkill change that was introduced in
> > 5.11. While the systemd-rfkill expects the same size of both read and
> > write, the kernel rfkill write cuts off to the old 8 bytes while read
> > gives 9 bytes, hence it leads the error:
> > https://github.com/systemd/systemd/issues/18677
> > https://bugzilla.opensuse.org/show_bug.cgi?id=1183147
>
> > As far as I understand from the log in the commit 14486c82612a, this
> > sounds like the intended behavior.
>
> Not really? I don't even understand why we get this behaviour.
>
> The code is this:
>
> rfkill_fop_write():
>
> ...
> /* we don't need the 'hard' variable but accept it */
> if (count < RFKILL_EVENT_SIZE_V1 - 1)
> return -EINVAL;
>
> # this is actually 7 - RFKILL_EVENT_SIZE_V1 is 8
> # (and obviously we get past this if and don't get -EINVAL
>
> /*
> * Copy as much data as we can accept into our 'ev' buffer,
> * but tell userspace how much we've copied so it can determine
> * our API version even in a write() call, if it cares.
> */
> count = min(count, sizeof(ev));
>
> # sizeof(ev) should be 9 since 'ev' is the new struct
>
> if (copy_from_user(&ev, buf, count))
> return -EFAULT;
>
> ...
> ret = 0;
> ...
> return ret ?: count;
>
>
>
>
> Ah, no, I see. The bug says:
>
> EDIT: above is with kernel-core-5.10.16-200.fc33.x86_64.
>
> So you've recompiled systemd with 5.11 headers, but are running against
> 5.10 now, where the short write really was intentional - it lets you
> detect that the new fields weren't handled by the kernel. If
>
>
> The other issue is basically this (pre-fix) systemd code:
>
> l = read(c.rfkill_fd, &event, sizeof(event));
> ...
> if (l != RFKILL_EVENT_SIZE_V1) /* log/return error */
>
>
>
> So ... honestly, I don't have all that much sympathy, when the uapi
> header explicitly says we want to be able to change the size. But I
> guess "no regressions" rules are hard, so ... dunno. I guess we can add
> a version/size ioctl and keep using 8 bytes unless you send that?

OK, I took a deeper look again, and actually there are two issues in
systemd-rfkill code:

* It expects 8 bytes returned from read while it reads a struct
rfkill_event record. If the code is rebuilt with the latest kernel
headers, it breaks due to the change of rfkill_event. That's the
error openSUSE bug report points to.

* When systemd-rfkill is built with the latest kernel headers but runs
on the old kernel code, the write size check fails as you mentioned
in the above. That's another part of the github issue.

So, with a kernel devs hat on, I share your feeling, that's an
application bug. OTOH, the extension of the rfkill_event is, well,
not really safe as expected.

IMO, if systemd-rfkill is the only one that hits such a problem, we
may let the systemd code fixed, as it's obviously buggy. But who
knows...

Is the extension of rfkill_event mandatory? Can the new entry
provided in a different way such as another sysfs record?
IOW, if we revert the change, would it break anything else new?

thanks,

Takashi

2021-03-19 22:16:35

by Johannes Berg

[permalink] [raw]

Subject: Re: systemd-rfkill regression on 5.11 and later kernels

On Thu, 2021-03-18 at 12:16 +0100, Johannes Berg wrote:
> > Yeah, that's a dilemma. An oft-seen trick is to add more bytes for
> > the future use, e.g. extend to 16 bytes and fill 0 for the remaining.
>
> Yeah, I guess I could stick a reserved[15] there, it's small enough.

Actually, that doesn't really help anything either.

If today I require that the reserved bytes are sent as 0 by userspace,
then any potential expansion that requires userspace to set it will
break when userspace does it and runs on an old kernel.

If I don't require the reserved bytes to be set to 0 then somebody will
invariably get it wrong and send garbage, and then we again cannot
extend it.

So ... that all seems pointless. I guess I'll send the patch as it is
now.

johannes