2021-04-14 00:21:38

by Hans de Goede

[permalink] [raw]
Subject: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hi,

I've been debugging a userspace rfkill issue today which boils down
to the "rfkill: add a reason to the HW rfkill state" patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14486c82612a177cb910980c70ba900827ca0894
breaking userspace.

It is too late to fix this now since we likely also have new
userspace depending on the new API, but I still thought I
should report this.

I've submitted a fix for the problematic userspace bits here:
https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/232

Let me quote the commit msg which explains the problem:

"""
Access to a /dev/foo device should never use buffered mode.

While debugging a gsd-rfkill issue I noticed in the g_debug output
that the rfkill-glib.c code now seems to be receiving bogus events.
Doing a strace I noticed some read(dev_rfkill_fd, buf, 8) calls,
even though we call:
g_io_channel_read_chars(..., sizeof(struct rfkill_event, ...)
Which requests 9 bytes.

The problem is the kernel expects us to read 1 event per read() system-call
and it will throw away excess data. The idea is here that the rfkill_event
struct can be extended by adding new fields at the end and then userspace
code compiled against older kernel headers will still work since it
will only read the fields it knows in a single call and the
extra fields are thrown away.

Since the rfkill-glib.c code was using buffered-io and asking
g_io_channel_read_chars for 9 bytes when compiled against recent
kernel headers, what would happen is that 2 events would be consumed
in 2 read(fd, buf, 8) syscalls and then the first byte of the
second event read would be appended to the previous event and
the remaining 7 bytes would be used as the first 7 bytes for the
next event (and eventually completed with the first 2 bytes of
the next event, etc.). Leading to completely bogus events.

Enabling unbuffered mode fixes this.

(before the kernel change the rfkill_event struct was 8 bytes large
which allowed us to get away with using buffered io here.)
"""

Note this is new userspace on a new kernel actually being broken.

I believe that the new userspace (expecting 9 bytes) on old kernel
will also be broken, since a naive userspace implementation will do:

if (read(fd, buf, sizeof(struct rfkill_event)) != sizeof(struct rfkill_event))
/* Do error */

Which means that after a recompile on a new kernel it will expect 9
bytes from a read call an if it gets only 8 then it will consider
that an error (or worse it could try to do a second read to make-up
for the missing byte). Note that gnome-settings-daemon still has
the new-userspace on old-kernel issue even after my fix...

I believe that all that we can do now is fix userspace where necessary :|
but this is something to keep in mind for future fixes.

Regards,

Hans


2021-04-14 12:36:44

by Grumbach, Emmanuel

[permalink] [raw]
Subject: RE: "rfkill: add a reason to the HW rfkill state" breaks userspace

>
> Hi,
>
> I've been debugging a userspace rfkill issue today which boils down to the
> "rfkill: add a reason to the HW rfkill state" patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i
> d=14486c82612a177cb910980c70ba900827ca0894
> breaking userspace.

This has been rolled back by:
https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git/commit/?id=71826654ce40112f0651b6f4e94c422354f4adb6
Other userspace broke (systemd) so Johannes rolled this back by default.
Userspace that is interested in the new byte will read 9 bytes.

>
> It is too late to fix this now since we likely also have new userspace
> depending on the new API, but I still thought I should report this.
>
> I've submitted a fix for the problematic userspace bits here:
> https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-
> /merge_requests/232
>
> Let me quote the commit msg which explains the problem:
>
> """
> Access to a /dev/foo device should never use buffered mode.
>
> While debugging a gsd-rfkill issue I noticed in the g_debug output that the
> rfkill-glib.c code now seems to be receiving bogus events.
> Doing a strace I noticed some read(dev_rfkill_fd, buf, 8) calls, even though
> we call:
> g_io_channel_read_chars(..., sizeof(struct rfkill_event, ...) Which requests 9
> bytes.
>
> The problem is the kernel expects us to read 1 event per read() system-call
> and it will throw away excess data. The idea is here that the rfkill_event
> struct can be extended by adding new fields at the end and then userspace
> code compiled against older kernel headers will still work since it will only
> read the fields it knows in a single call and the extra fields are thrown away.
>
> Since the rfkill-glib.c code was using buffered-io and asking
> g_io_channel_read_chars for 9 bytes when compiled against recent kernel
> headers, what would happen is that 2 events would be consumed in 2
> read(fd, buf, 8) syscalls and then the first byte of the second event read
> would be appended to the previous event and the remaining 7 bytes would
> be used as the first 7 bytes for the next event (and eventually completed
> with the first 2 bytes of the next event, etc.). Leading to completely bogus
> events.
>
> Enabling unbuffered mode fixes this.
>
> (before the kernel change the rfkill_event struct was 8 bytes large which
> allowed us to get away with using buffered io here.) """
>
> Note this is new userspace on a new kernel actually being broken.
>
> I believe that the new userspace (expecting 9 bytes) on old kernel will also
> be broken, since a naive userspace implementation will do:
>
> if (read(fd, buf, sizeof(struct rfkill_event)) != sizeof(struct
> rfkill_event))
> /* Do error */
>
> Which means that after a recompile on a new kernel it will expect 9 bytes
> from a read call an if it gets only 8 then it will consider that an error (or worse
> it could try to do a second read to make-up for the missing byte). Note that
> gnome-settings-daemon still has the new-userspace on old-kernel issue
> even after my fix...
>
> I believe that all that we can do now is fix userspace where necessary :| but
> this is something to keep in mind for future fixes.
>
> Regards,
>
> Hans

2021-04-14 14:10:55

by Johannes Berg

[permalink] [raw]
Subject: Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

On Wed, 2021-04-14 at 05:12 +0000, Grumbach, Emmanuel wrote:
> >
> > Hi,
> >
> > I've been debugging a userspace rfkill issue today which boils down
> > to the
> > "rfkill: add a reason to the HW rfkill state" patch:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i
> > d=14486c82612a177cb910980c70ba900827ca0894
> > breaking userspace.
>
> This has been rolled back by:
> https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git/commit/?id=71826654ce40112f0651b6f4e94c422354f4adb6
> Other userspace broke (systemd) so Johannes rolled this back by
> default.
> Userspace that is interested in the new byte will read 9 bytes.

Which, unfortunately, doesn't address *this* particular case, because it
uses gio and that will fill the buffer with arbitrary size?

When you (Hans) say you saw in strace a read of size 8, did you mean the
size passed to it, or the return size? I guess it must be the return
size, and the size passed to it was way larger.

The commit Emmanuel linked to fixes cases such as systemd that were just
completely garbage (reading with one size, and then checking they got
another), but it wouldn't fix this case.

Unfortunately, as you also said, it does seem a bit late now - it's been
released in various kernels since 5.10, and while the default rollback
will improve the situation somewhat, read(..., size>8) will still return
9 bytes rather than 8 as it used to. Switching that *also* back *should*
be safe, but who knows what other bugs were introduced in the meantime?

I certainly don't really have a major objection to rolling that also
back, but would it really help that much at this point? I guess it could
be going into 5.10/5.11 stable kernels though.

johannes

2021-04-14 14:47:47

by Hans de Goede

[permalink] [raw]
Subject: Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hi,

Adding Benjamin Berg who is one of the gnome-settings-daemon
maintainers to the Cc.

On 4/14/21 9:07 AM, Johannes Berg wrote:
> On Wed, 2021-04-14 at 05:12 +0000, Grumbach, Emmanuel wrote:
>>>
>>> Hi,
>>>
>>> I've been debugging a userspace rfkill issue today which boils down
>>> to the
>>> "rfkill: add a reason to the HW rfkill state" patch:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i
>>> d=14486c82612a177cb910980c70ba900827ca0894
>>> breaking userspace.
>>
>> This has been rolled back by:
>> https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git/commit/?id=71826654ce40112f0651b6f4e94c422354f4adb6
>> Other userspace broke (systemd) so Johannes rolled this back by
>> default.

I see, but this change is not in:

kernel-headers-5.11.11-300.fc34.x86_64

Meaning that basically all of Fedora 34 has been built with the "bad"
headers.

>> Userspace that is interested in the new byte will read 9 bytes.
>
> Which, unfortunately, doesn't address *this* particular case, because it
> uses gio and that will fill the buffer with arbitrary size?
>
> When you (Hans) say you saw in strace a read of size 8, did you mean the
> size passed to it, or the return size? I guess it must be the return
> size, and the size passed to it was way larger.

No for some reason, for some later read calls gio was actually passing
8 as size to the read() syscall. And g-s-d was compiled with headers
where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d
would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read
out of the 16 bytes it got to fill a single rfkill_event which is fine,
except that it uses the remaining 7 bytes as the first 7 bytes of the
next rfkill_event which it processes making that next event be completely
bogus.

I do believe this really is a g-s-d bug though, it should not have
been using a "buffered" gio-channel on a /dev/foo node; and so far it
only got away with this by the rfkill_event size being a nice power of 2
value.

As I mentioned in an email to Benjamin, g-s-d should really switch
to making direct read() calls on the fd circumventing the gio-channel
read code all together:

"Right, notice I just realized that even after my fix there still is an issue,
when running code compiled against new kernel headers gsd-rfkill will now
always expect 9 bytes per event. But when running on an older kernel that
will not be true.

To fix this the code should probably just directly call read() itself
on the fd (only using g_io_channel for the polling) and then accept anything
between 8 bytes and sizeof(struct rfkill_event) as valid return value from the
read() call..."

Notice that the problem which I described there will go away when compiled
against even newer kernel-headers where sizeof(struct rfkill_event) is
back to 8 again.

So question, for code like g-s-d, which does not care about the new
hard_block_reasons field. What would be the preferred / max compatibility
way to do the reads. Also keeping in mind that there are "bad" kernel
headers out there where sizeof(struct rfkill_event) == 9 ?

I think that this would be best:

ret = read(fd, &event, RFKILL_EVENT_SIZE_V1);
if (ret == RFKILL_EVENT_SIZE_V1) {
/* Valid event process it */
}

This should produce the same code regardless of the kernel-headers version
and should work on both old and new kernels, correct ?

> The commit Emmanuel linked to fixes cases such as systemd that were just
> completely garbage (reading with one size, and then checking they got
> another), but it wouldn't fix this case.
>
> Unfortunately, as you also said, it does seem a bit late now - it's been
> released in various kernels since 5.10, and while the default rollback
> will improve the situation somewhat, read(..., size>8) will still return
> 9 bytes rather than 8 as it used to. Switching that *also* back *should*
> be safe, but who knows what other bugs were introduced in the meantime?
>
> I certainly don't really have a major objection to rolling that also
> back, but would it really help that much at this point? I guess it could
> be going into 5.10/5.11 stable kernels though.

I don't think that rolling back the new extended-event support altogether
will help. Since this has been out there for 2 released kernel versions
now, I believe the best way to fix this is to fix userspace; and to fix
userspace in such a way (at least g-s-d) that this problem cannot
happen again.

Regards,

Hans

2021-04-14 15:05:12

by Johannes Berg

[permalink] [raw]
Subject: Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hi Hans,

> Adding Benjamin Berg who is one of the gnome-settings-daemon
> maintainers to the Cc.

:)
As you might imagine, I've talked to him ;-)
>
> kernel-headers-5.11.11-300.fc34.x86_64

Right, not yet.

> Meaning that basically all of Fedora 34 has been built with the "bad"
> headers.

Yay. But still, I think in this case it wouldn't help.

> > > Userspace that is interested in the new byte will read 9 bytes.
> >
> > Which, unfortunately, doesn't address *this* particular case, because it
> > uses gio and that will fill the buffer with arbitrary size?
> >
> > When you (Hans) say you saw in strace a read of size 8, did you mean the
> > size passed to it, or the return size? I guess it must be the return
> > size, and the size passed to it was way larger.
>
> No for some reason, for some later read calls gio was actually passing
> 8 as size to the read() syscall. 
>

Wait, that's weird? The (latest) gio code just calls
g_io_channel_fill_buffer() and that doesn't care about the size you
passed to g_io_channel_read_chars(), in fact it isn't even passed down,
as you might expect for stream reads?

> And g-s-d was compiled with headers
> where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d
> would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read
> out of the 16 bytes it got to fill a single rfkill_event which is fine,
> except that it uses the remaining 7 bytes as the first 7 bytes of the
> next rfkill_event which it processes making that next event be completely
> bogus.

I understand the issue. I just didn't expect it would get solved when
the headers contain a struct rkfill_event of size 8 again, since the
kernel would still return 9 bytes. In effect, you'd still get the same
issue, because the read in g_io_channel_fill_buffer() is 1024
(G_IO_NICE_BUF_SIZE).

>
> I do believe this really is a g-s-d bug though, it should not have
> been using a "buffered" gio-channel on a /dev/foo node; and so far it
> only got away with this by the rfkill_event size being a nice power of 2
> value.

I agree, but ... kernel regressions and all that, right?

> As I mentioned in an email to Benjamin, g-s-d should really switch
> to making direct read() calls on the fd circumventing the gio-channel
> read code all together:
>
> "Right, notice I just realized that even after my fix there still is an issue,
> when running code compiled against new kernel headers gsd-rfkill will now
> always expect 9 bytes per event. But when running on an older kernel that
> will not be true.
>
> To fix this the code should probably just directly call read() itself
> on the fd (only using g_io_channel for the polling) and then accept anything
> between 8 bytes and sizeof(struct rfkill_event) as valid return value from the
> read() call..."

I don't think this is related to gio, you can still use
g_io_channel_read_chars() since that just does the read for you:

if (!channel->use_buffer)
{
gsize tmp_bytes;

g_assert (!channel->read_buf || channel->read_buf->len == 0);

status = channel->funcs->io_read (channel, buf, count, &tmp_bytes, error);

if (bytes_read)
*bytes_read = tmp_bytes;

return status;
}


but checking that you actually got the *exact* size that you wanted is
indeed wrong (as well).

Sounds like I made the wrong call here - I was discussing this with
Benjamin a while ago and decided *not* to add an ioctl to opt in to the
new event type ... sounds like I should have.


Notice that the problem which I described there will go away when compiled
against even newer kernel-headers where sizeof(struct rfkill_event) is
back to 8 again.

Yes.

So question, for code like g-s-d, which does not care about the new
hard_block_reasons field. What would be the preferred / max compatibility
way to do the reads. Also keeping in mind that there are "bad" kernel
headers out there where sizeof(struct rfkill_event) == 9 ?

I think that this would be best:

ret = read(fd, &event, RFKILL_EVENT_SIZE_V1);
if (ret == RFKILL_EVENT_SIZE_V1) {
/* Valid event process it */
}

This should produce the same code regardless of the kernel-headers version
and should work on both old and new kernels, correct ?


I believe so, yes.

I don't think that rolling back the new extended-event support altogether
will help. Since this has been out there for 2 released kernel versions
now, I believe the best way to fix this is to fix userspace; and to fix
userspace in such a way (at least g-s-d) that this problem cannot
happen again.

Fair enough.

Thanks,
johannes

2021-04-14 15:15:08

by Benjamin Berg

[permalink] [raw]
Subject: Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hi,

On Wed, 2021-04-14 at 10:17 +0200, Hans de Goede wrote:
> Hi,
>
> Adding Benjamin Berg who is one of the gnome-settings-daemon
> maintainers to the Cc.
>
> On 4/14/21 9:07 AM, Johannes Berg wrote:
> > On Wed, 2021-04-14 at 05:12 +0000, Grumbach, Emmanuel wrote:
> > > >
> > > > Hi,
> > > >
> > > > I've been debugging a userspace rfkill issue today which boils
> > > > down
> > > > to the
> > > > "rfkill: add a reason to the HW rfkill state" patch:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i
> > > > d=14486c82612a177cb910980c70ba900827ca0894
> > > > breaking userspace.
> > >
> > > This has been rolled back by:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git/commit/?id=71826654ce40112f0651b6f4e94c422354f4adb6
> > > Other userspace broke (systemd) so Johannes rolled this back by
> > > default.
>
> I see, but this change is not in:
>
> kernel-headers-5.11.11-300.fc34.x86_64
>
> Meaning that basically all of Fedora 34 has been built with the "bad"
> headers.
>
> > > Userspace that is interested in the new byte will read 9 bytes.
> >
> > Which, unfortunately, doesn't address *this* particular case, because
> > it
> > uses gio and that will fill the buffer with arbitrary size?
> >
> > When you (Hans) say you saw in strace a read of size 8, did you mean
> > the
> > size passed to it, or the return size? I guess it must be the return
> > size, and the size passed to it was way larger.
>
> No for some reason, for some later read calls gio was actually passing
> 8 as size to the read() syscall. And g-s-d was compiled with headers
> where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d
> would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read
> out of the 16 bytes it got to fill a single rfkill_event which is fine,
> except that it uses the remaining 7 bytes as the first 7 bytes of the
> next rfkill_event which it processes making that next event be
> completely bogus.
>
> I do believe this really is a g-s-d bug though, it should not have
> been using a "buffered" gio-channel on a /dev/foo node; and so far it
> only got away with this by the rfkill_event size being a nice power of
> 2 value.
>
> As I mentioned in an email to Benjamin, g-s-d should really switch
> to making direct read() calls on the fd circumventing the gio-channel
> read code all together:
>
> "Right, notice I just realized that even after my fix there still is an
> issue, when running code compiled against new kernel headers gsd-rfkill
> will now always expect 9 bytes per event. But when running on an older
> kernel that will not be true.

This confuses me. i.e. if g-s-d is compiled against headers where the
struct is 9 bytes long, then we should just be getting a short read of
8 bytes. And that should be the same, whether we use plain read() or
g_io_channel_read_chars().

I was under the impression that with
https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/232
merged, the read should be fine on the GNOME side.

That said, the g-s-d event handler is checking the size of the read
against the struct size. This is obviously wrong, it should only check
that the read was successful (or check for >= V1 size).

Benjamin

> To fix this the code should probably just directly call read() itself
> on the fd (only using g_io_channel for the polling) and then accept
> anything
> between 8 bytes and sizeof(struct rfkill_event) as valid return value
> from the
> read() call..."
>
> Notice that the problem which I described there will go away when
> compiled
> against even newer kernel-headers where sizeof(struct rfkill_event) is
> back to 8 again.
>
> So question, for code like g-s-d, which does not care about the new
> hard_block_reasons field. What would be the preferred / max
> compatibility
> way to do the reads. Also keeping in mind that there are "bad" kernel
> headers out there where sizeof(struct rfkill_event) == 9 ?
>
> I think that this would be best:
>
>         ret = read(fd, &event, RFKILL_EVENT_SIZE_V1);
>         if (ret == RFKILL_EVENT_SIZE_V1) {
>                 /* Valid event process it */
>         }
>
> This should produce the same code regardless of the kernel-headers
> version
> and should work on both old and new kernels, correct ?
>
> > The commit Emmanuel linked to fixes cases such as systemd that were
> > just
> > completely garbage (reading with one size, and then checking they got
> > another), but it wouldn't fix this case.
> >
> > Unfortunately, as you also said, it does seem a bit late now - it's
> > been
> > released in various kernels since 5.10, and while the default
> > rollback
> > will improve the situation somewhat, read(..., size>8) will still
> > return
> > 9 bytes rather than 8 as it used to. Switching that *also* back
> > *should*
> > be safe, but who knows what other bugs were introduced in the
> > meantime?
> >
> > I certainly don't really have a major objection to rolling that also
> > back, but would it really help that much at this point? I guess it
> > could
> > be going into 5.10/5.11 stable kernels though.
>
> I don't think that rolling back the new extended-event support
> altogether
> will help. Since this has been out there for 2 released kernel versions
> now, I believe the best way to fix this is to fix userspace; and to fix
> userspace in such a way (at least g-s-d) that this problem cannot
> happen again.
>
> Regards,
>
> Hans
>


Attachments:
signature.asc (849.00 B)
This is a digitally signed message part

2021-04-14 15:16:10

by Hans de Goede

[permalink] [raw]
Subject: Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hi,

On 4/14/21 11:52 AM, Benjamin Berg wrote:
> Hi,
>
> On Wed, 2021-04-14 at 10:17 +0200, Hans de Goede wrote:
>> Hi,
>>
>> Adding Benjamin Berg who is one of the gnome-settings-daemon
>> maintainers to the Cc.
>>
>> On 4/14/21 9:07 AM, Johannes Berg wrote:
>>> On Wed, 2021-04-14 at 05:12 +0000, Grumbach, Emmanuel wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've been debugging a userspace rfkill issue today which boils
>>>>> down
>>>>> to the
>>>>> "rfkill: add a reason to the HW rfkill state" patch:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i
>>>>> d=14486c82612a177cb910980c70ba900827ca0894
>>>>> breaking userspace.
>>>>
>>>> This has been rolled back by:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git/commit/?id=71826654ce40112f0651b6f4e94c422354f4adb6
>>>> Other userspace broke (systemd) so Johannes rolled this back by
>>>> default.
>>
>> I see, but this change is not in:
>>
>> kernel-headers-5.11.11-300.fc34.x86_64
>>
>> Meaning that basically all of Fedora 34 has been built with the "bad"
>> headers.
>>
>>>> Userspace that is interested in the new byte will read 9 bytes.
>>>
>>> Which, unfortunately, doesn't address *this* particular case, because
>>> it
>>> uses gio and that will fill the buffer with arbitrary size?
>>>
>>> When you (Hans) say you saw in strace a read of size 8, did you mean
>>> the
>>> size passed to it, or the return size? I guess it must be the return
>>> size, and the size passed to it was way larger.
>>
>> No for some reason, for some later read calls gio was actually passing
>> 8 as size to the read() syscall. And g-s-d was compiled with headers
>> where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d
>> would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read
>> out of the 16 bytes it got to fill a single rfkill_event which is fine,
>> except that it uses the remaining 7 bytes as the first 7 bytes of the
>> next rfkill_event which it processes making that next event be
>> completely bogus.
>>
>> I do believe this really is a g-s-d bug though, it should not have
>> been using a "buffered" gio-channel on a /dev/foo node; and so far it
>> only got away with this by the rfkill_event size being a nice power of
>> 2 value.
>>
>> As I mentioned in an email to Benjamin, g-s-d should really switch
>> to making direct read() calls on the fd circumventing the gio-channel
>> read code all together:
>>
>> "Right, notice I just realized that even after my fix there still is an
>> issue, when running code compiled against new kernel headers gsd-rfkill
>> will now always expect 9 bytes per event. But when running on an older
>> kernel that will not be true.
>
> This confuses me. i.e. if g-s-d is compiled against headers where the
> struct is 9 bytes long, then we should just be getting a short read of
> 8 bytes. And that should be the same, whether we use plain read() or
> g_io_channel_read_chars().
>
> I was under the impression that with
> https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/232
> merged, the read should be fine on the GNOME side.
>
> That said, the g-s-d event handler is checking the size of the read
> against the struct size. This is obviously wrong, it should only check
> that the read was successful (or check for >= V1 size).

Right this is what I was referring to a g-s-d compiled against the
new headers with a struct size of 9 will read 8 bytes on an old kernel
and that will fail the len check, so it won't work.

But this will actually never happen as I just noticed that g-s-d
uses a private rfkill.h copy with the old 8 bytes struct definition.

So with the buffered-io disabled everything should work fine,
see my other email in this thread.

We should probably still fix / clean the code a bit though, as
you are working on in:
https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/234

Regards,

Hans




>
> Benjamin
>
>> To fix this the code should probably just directly call read() itself
>> on the fd (only using g_io_channel for the polling) and then accept
>> anything
>> between 8 bytes and sizeof(struct rfkill_event) as valid return value
>> from the
>> read() call..."
>>
>> Notice that the problem which I described there will go away when
>> compiled
>> against even newer kernel-headers where sizeof(struct rfkill_event) is
>> back to 8 again.
>>
>> So question, for code like g-s-d, which does not care about the new
>> hard_block_reasons field. What would be the preferred / max
>> compatibility
>> way to do the reads. Also keeping in mind that there are "bad" kernel
>> headers out there where sizeof(struct rfkill_event) == 9 ?
>>
>> I think that this would be best:
>>
>>         ret = read(fd, &event, RFKILL_EVENT_SIZE_V1);
>>         if (ret == RFKILL_EVENT_SIZE_V1) {
>>                 /* Valid event process it */
>>         }
>>
>> This should produce the same code regardless of the kernel-headers
>> version
>> and should work on both old and new kernels, correct ?
>>
>>> The commit Emmanuel linked to fixes cases such as systemd that were
>>> just
>>> completely garbage (reading with one size, and then checking they got
>>> another), but it wouldn't fix this case.
>>>
>>> Unfortunately, as you also said, it does seem a bit late now - it's
>>> been
>>> released in various kernels since 5.10, and while the default
>>> rollback
>>> will improve the situation somewhat, read(..., size>8) will still
>>> return
>>> 9 bytes rather than 8 as it used to. Switching that *also* back
>>> *should*
>>> be safe, but who knows what other bugs were introduced in the
>>> meantime?
>>>
>>> I certainly don't really have a major objection to rolling that also
>>> back, but would it really help that much at this point? I guess it
>>> could
>>> be going into 5.10/5.11 stable kernels though.
>>
>> I don't think that rolling back the new extended-event support
>> altogether
>> will help. Since this has been out there for 2 released kernel versions
>> now, I believe the best way to fix this is to fix userspace; and to fix
>> userspace in such a way (at least g-s-d) that this problem cannot
>> happen again.
>>
>> Regards,
>>
>> Hans
>>
>

2021-04-14 15:17:36

by Benjamin Berg

[permalink] [raw]
Subject: Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hi,

On Wed, 2021-04-14 at 12:29 +0200, Hans de Goede wrote:
> On 4/14/21 11:52 AM, Benjamin Berg wrote:
> > [SNIP]
> >
> > That said, the g-s-d event handler is checking the size of the read
> > against the struct size. This is obviously wrong, it should only
> > check
> > that the read was successful (or check for >= V1 size).
>
> Right this is what I was referring to a g-s-d compiled against the
> new headers with a struct size of 9 will read 8 bytes on an old
> kernel
> and that will fail the len check, so it won't work.
>
> But this will actually never happen as I just noticed that g-s-d
> uses a private rfkill.h copy with the old 8 bytes struct definition.
>
> So with the buffered-io disabled everything should work fine,
> see my other email in this thread.

OK, so at least that part of the fix should be backported to older
release branches.

> We should probably still fix / clean the code a bit though, as
> you are working on in:
>
> https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/234

Yeah, the code there is somewhat messy overall. It could probably be
improved more than the MR does, but hopefully at least the biggest
inconsistencies are gone now.

Benjamin


Attachments:
signature.asc (849.00 B)
This is a digitally signed message part

2021-04-14 15:18:11

by Hans de Goede

[permalink] [raw]
Subject: Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hi,

On 4/14/21 11:47 AM, Johannes Berg wrote:
> Hi Hans,
>
>> Adding Benjamin Berg who is one of the gnome-settings-daemon
>> maintainers to the Cc.
>
> :)
> As you might imagine, I've talked to him ;-)
>>
>> kernel-headers-5.11.11-300.fc34.x86_64
>
> Right, not yet.
>
>> Meaning that basically all of Fedora 34 has been built with the "bad"
>> headers.
>
> Yay. But still, I think in this case it wouldn't help.
>
>>>> Userspace that is interested in the new byte will read 9 bytes.
>>>
>>> Which, unfortunately, doesn't address *this* particular case, because it
>>> uses gio and that will fill the buffer with arbitrary size?
>>>
>>> When you (Hans) say you saw in strace a read of size 8, did you mean the
>>> size passed to it, or the return size? I guess it must be the return
>>> size, and the size passed to it was way larger.
>>
>> No for some reason, for some later read calls gio was actually passing
>> 8 as size to the read() syscall. 
>>
>
> Wait, that's weird? The (latest) gio code just calls
> g_io_channel_fill_buffer() and that doesn't care about the size you
> passed to g_io_channel_read_chars(), in fact it isn't even passed down,
> as you might expect for stream reads?

Ok, so I took another look at the strace and you are right.

There are 2 things happening here:

1. g-s-d has a private rfkill.h userspace-api copy with the old 8 byte
rfkill_event definition, which I missed.

2. g-s-d mixes read() and g_io_channel_read_chars() usage, using
read() to get the initial state and then later on using
g_io_channel_read_chars() on events. The initial state read()s
are the read(fd, buf, 8) calls which I saw in the strace.

So g-s-d was using / assuming a 8 byte struct size all the time and
my off-by-one by using buffered io theory was right, but works
differently then I expected. As you (Johannes) said, the problem is
g-s-d making read(fd, buf, 1024) calls, reading 9 bytes per event
and then returning 8 to the g_io_channel_read_chars() saving the
left over byte and pre-pending it to the next event, ruining
the next event(s).

Actually the code for getting the initial state seems to do the same
sizeof / RFKILL_EVENT_SIZE_V1 mix/match as systemd is doing
(probably copy and pasted from one to the other), but it is "saved"
somewhat by having its own rfkill.h copy:

len = read(fd, &event, sizeof(event));
if (len < 0) {
if (errno == EAGAIN)
break;
g_debug ("Reading of RFKILL events failed");
break;
}

if (len != RFKILL_EVENT_SIZE_V1) {
g_warning ("Wrong size of RFKILL event\n");
continue;
}

And then later on, when poll says there is new data:

status = g_io_channel_read_chars (source,
(char *) &event,
sizeof(event),
&read,
NULL);

while (status == G_IO_STATUS_NORMAL && read == sizeof(event)) {

So with my fix to turn off buffering all should be well now, but we
(g-s-d folks) should consider doing a follow-up patch to

plugins/rfkill/rfkill-glib.c

Replacing all the sizeof(event) occurences with RFKILL_EVENT_SIZE_V1 for
future proofing against rfkill.h changes (e.g. if someone syncs
the private copy with the kernel) although this should be safe with
the recent kernel-header change...

Benjamin, what is your take on doing a s/sizeof(event)/RFKILL_EVENT_SIZE_V1/ ?

At a minimum that would be good to do because it fixes this weirdness:

len = read(fd, &event, sizeof(event));
if (len < 0) {
if (errno == EAGAIN)
break;
g_debug ("Reading of RFKILL events failed");
break;
}

if (len != RFKILL_EVENT_SIZE_V1) {
g_warning ("Wrong size of RFKILL event\n");
continue;
}

Where the len param passed to read() and the one checked for are
not necessarily the same.

Regards,

Hans







>
>> And g-s-d was compiled with headers
>> where sizeof(struct rfkill_event) was 9. This is/was the issue, g-s-d
>> would do 2 read(fd, buf, 8) calls and then take the first 9 bytes read
>> out of the 16 bytes it got to fill a single rfkill_event which is fine,
>> except that it uses the remaining 7 bytes as the first 7 bytes of the
>> next rfkill_event which it processes making that next event be completely
>> bogus.
>
> I understand the issue. I just didn't expect it would get solved when
> the headers contain a struct rkfill_event of size 8 again, since the
> kernel would still return 9 bytes. In effect, you'd still get the same
> issue, because the read in g_io_channel_fill_buffer() is 1024
> (G_IO_NICE_BUF_SIZE).
>
>>
>> I do believe this really is a g-s-d bug though, it should not have
>> been using a "buffered" gio-channel on a /dev/foo node; and so far it
>> only got away with this by the rfkill_event size being a nice power of 2
>> value.
>
> I agree, but ... kernel regressions and all that, right?
>
>> As I mentioned in an email to Benjamin, g-s-d should really switch
>> to making direct read() calls on the fd circumventing the gio-channel
>> read code all together:
>>
>> "Right, notice I just realized that even after my fix there still is an issue,
>> when running code compiled against new kernel headers gsd-rfkill will now
>> always expect 9 bytes per event. But when running on an older kernel that
>> will not be true.
>>
>> To fix this the code should probably just directly call read() itself
>> on the fd (only using g_io_channel for the polling) and then accept anything
>> between 8 bytes and sizeof(struct rfkill_event) as valid return value from the
>> read() call..."
>
> I don't think this is related to gio, you can still use
> g_io_channel_read_chars() since that just does the read for you:
>
> if (!channel->use_buffer)
> {
> gsize tmp_bytes;
>
> g_assert (!channel->read_buf || channel->read_buf->len == 0);
>
> status = channel->funcs->io_read (channel, buf, count, &tmp_bytes, error);
>
> if (bytes_read)
> *bytes_read = tmp_bytes;
>
> return status;
> }
>
>
> but checking that you actually got the *exact* size that you wanted is
> indeed wrong (as well).
>
> Sounds like I made the wrong call here - I was discussing this with
> Benjamin a while ago and decided *not* to add an ioctl to opt in to the
> new event type ... sounds like I should have.
>
>
> Notice that the problem which I described there will go away when compiled
> against even newer kernel-headers where sizeof(struct rfkill_event) is
> back to 8 again.
>
> Yes.
>
> So question, for code like g-s-d, which does not care about the new
> hard_block_reasons field. What would be the preferred / max compatibility
> way to do the reads. Also keeping in mind that there are "bad" kernel
> headers out there where sizeof(struct rfkill_event) == 9 ?
>
> I think that this would be best:
>
> ret = read(fd, &event, RFKILL_EVENT_SIZE_V1);
> if (ret == RFKILL_EVENT_SIZE_V1) {
> /* Valid event process it */
> }
>
> This should produce the same code regardless of the kernel-headers version
> and should work on both old and new kernels, correct ?
>
>
> I believe so, yes.
>
> I don't think that rolling back the new extended-event support altogether
> will help. Since this has been out there for 2 released kernel versions
> now, I believe the best way to fix this is to fix userspace; and to fix
> userspace in such a way (at least g-s-d) that this problem cannot
> happen again.
>
> Fair enough.
>
> Thanks,
> johannes
>

2021-04-14 15:29:36

by Hans de Goede

[permalink] [raw]
Subject: Re: "rfkill: add a reason to the HW rfkill state" breaks userspace

Hi,

On 4/14/21 12:46 PM, Benjamin Berg wrote:
> Hi,
>
> On Wed, 2021-04-14 at 12:29 +0200, Hans de Goede wrote:
>> On 4/14/21 11:52 AM, Benjamin Berg wrote:
>>> [SNIP]
>>>
>>> That said, the g-s-d event handler is checking the size of the read
>>> against the struct size. This is obviously wrong, it should only
>>> check
>>> that the read was successful (or check for >= V1 size).
>>
>> Right this is what I was referring to a g-s-d compiled against the
>> new headers with a struct size of 9 will read 8 bytes on an old
>> kernel
>> and that will fail the len check, so it won't work.
>>
>> But this will actually never happen as I just noticed that g-s-d
>> uses a private rfkill.h copy with the old 8 bytes struct definition.
>>
>> So with the buffered-io disabled everything should work fine,
>> see my other email in this thread.
>
> OK, so at least that part of the fix should be backported to older
> release branches.

Ack, that fix should be backported to fix issues with newer
kernels and then everything should work fine.

Regards,

Hans