Hi Kent,
Looking around in the bcachefs code for possible causes of this KMSAN
bug report:
https://lore.kernel.org/lkml/[email protected]/
I notice the following pattern in the bcachefs structures: zero-length
arrays members are inserted in structures (not always at the end),
seemingly to achieve a result similar to what could be done with a
union:
fs/bcachefs/bcachefs_format.h:
struct bkey_packed {
__u64 _data[0];
/* Size of combined key and value, in u64s */
__u8 u64s;
[...]
};
likewise:
struct bkey_i {
__u64 _data[0];
struct bkey k;
struct bch_val v;
};
(and there are many more examples of this pattern in bcachefs)
AFAIK, the C11 standard states that array declarator constant expression
delimited by [ ] shall have a value greater than zero.
Effectively, we can verify that this code triggers an undefined behavior
with:
#include <stdio.h>
struct z {
int x[0];
int y;
int z;
} __attribute__((packed));
int main(void)
{
struct z a;
a.y = 1;
printf("%d\n", a.x[0]);
}
clang-15 -fsanitize=undefined -o a a.c
/a
a.c:14:17: runtime error: index 0 out of bounds for type 'int[0]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior a.c:14:17 in
1
Also, gcc warns that ISO C forbids zero-size arrays when compiling
with -pedantic:
gcc -std=c11 -pedantic -o a a.c
a.c:4:13: warning: ISO C forbids zero-size array ‘x’ [-Wpedantic]
4 | int x[0];
And clang states that this is only supported as an extension, even though
accessing it seems to be classified as an undefined behavior by UBSAN.
clang-15 -std=c11 -pedantic -o a a.c
a.c:4:8: warning: zero size arrays are an extension [-Wzero-length-array]
int x[0];
So I wonder if the issue reported by KMSAN could be caused by this
pattern ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
On Thu, May 23, 2024 at 01:53:42PM -0400, Mathieu Desnoyers wrote:
> Hi Kent,
>
> Looking around in the bcachefs code for possible causes of this KMSAN
> bug report:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> I notice the following pattern in the bcachefs structures: zero-length
> arrays members are inserted in structures (not always at the end),
> seemingly to achieve a result similar to what could be done with a
> union:
>
> fs/bcachefs/bcachefs_format.h:
>
> struct bkey_packed {
> __u64 _data[0];
>
> /* Size of combined key and value, in u64s */
> __u8 u64s;
> [...]
> };
>
> likewise:
>
> struct bkey_i {
> __u64 _data[0];
>
> struct bkey k;
> struct bch_val v;
> };
>
> (and there are many more examples of this pattern in bcachefs)
>
> AFAIK, the C11 standard states that array declarator constant expression
>
> Effectively, we can verify that this code triggers an undefined behavior
> with:
>
> #include <stdio.h>
>
> struct z {
> int x[0];
> int y;
> int z;
> } __attribute__((packed));
>
> int main(void)
> {
> struct z a;
>
> a.y = 1;
> printf("%d\n", a.x[0]);
> }
> delimited by [ ] shall have a value greater than zero.
Yet another example of the C people going absolutely nutty with
everything being undefined. Look, this isn't ok, we need to get work
done, and I've already wasted entirely too much time on ZLA vs. flex
array member nonsense.
There's a bunch of legit uses for zero length arrays, and your example,
where we're not even _assigning_ to x, is just batshit. Someone needs to
get his head examined.
> So I wonder if the issue reported by KMSAN could be caused by this
> pattern ?
Possibly; the KMSAN errors I've been looking at do look suspicious. But
it sounds like we need a real fix that involves defining proper
semantics, not compiler folks giving up and saying 'aiee!'.
IOW, clang/KMSAN are broken if they simply choke on a zero length array
being present.
[ Adding clang/llvm and KMSAN maintainers/reviewers in CC. ]
On 2024-05-24 11:28, Kent Overstreet wrote:
> On Thu, May 23, 2024 at 01:53:42PM -0400, Mathieu Desnoyers wrote:
>> Hi Kent,
>>
>> Looking around in the bcachefs code for possible causes of this KMSAN
>> bug report:
>>
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> I notice the following pattern in the bcachefs structures: zero-length
>> arrays members are inserted in structures (not always at the end),
>> seemingly to achieve a result similar to what could be done with a
>> union:
>>
>> fs/bcachefs/bcachefs_format.h:
>>
>> struct bkey_packed {
>> __u64 _data[0];
>>
>> /* Size of combined key and value, in u64s */
>> __u8 u64s;
>> [...]
>> };
>>
>> likewise:
>>
>> struct bkey_i {
>> __u64 _data[0];
>>
>> struct bkey k;
>> struct bch_val v;
>> };
>>
>> (and there are many more examples of this pattern in bcachefs)
>>
>> AFAIK, the C11 standard states that array declarator constant expression
>>
>> Effectively, we can verify that this code triggers an undefined behavior
>> with:
>>
>> #include <stdio.h>
>>
>> struct z {
>> int x[0];
>> int y;
>> int z;
>> } __attribute__((packed));
>>
>> int main(void)
>> {
>> struct z a;
>>
>> a.y = 1;
>> printf("%d\n", a.x[0]);
>> }
>> delimited by [ ] shall have a value greater than zero.
>
> Yet another example of the C people going absolutely nutty with
> everything being undefined. Look, this isn't ok, we need to get work
> done, and I've already wasted entirely too much time on ZLA vs. flex
> array member nonsense.
>
> There's a bunch of legit uses for zero length arrays, and your example,
> where we're not even _assigning_ to x, is just batshit. Someone needs to
> get his head examined.
>
>> So I wonder if the issue reported by KMSAN could be caused by this
>> pattern ?
>
> Possibly; the KMSAN errors I've been looking at do look suspicious. But
> it sounds like we need a real fix that involves defining proper
> semantics, not compiler folks giving up and saying 'aiee!'.
>
> IOW, clang/KMSAN are broken if they simply choke on a zero length array
> being present.
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
On 2024-05-24 11:35, Mathieu Desnoyers wrote:
> [ Adding clang/llvm and KMSAN maintainers/reviewers in CC. ]
>
> On 2024-05-24 11:28, Kent Overstreet wrote:
>> On Thu, May 23, 2024 at 01:53:42PM -0400, Mathieu Desnoyers wrote:
>>> Hi Kent,
>>>
>>> Looking around in the bcachefs code for possible causes of this KMSAN
>>> bug report:
>>>
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> I notice the following pattern in the bcachefs structures: zero-length
>>> arrays members are inserted in structures (not always at the end),
>>> seemingly to achieve a result similar to what could be done with a
>>> union:
>>>
>>> fs/bcachefs/bcachefs_format.h:
>>>
>>> struct bkey_packed {
>>> __u64 _data[0];
>>>
>>> /* Size of combined key and value, in u64s */
>>> __u8 u64s;
>>> [...]
>>> };
>>>
>>> likewise:
>>>
>>> struct bkey_i {
>>> __u64 _data[0];
>>>
>>> struct bkey k;
>>> struct bch_val v;
>>> };
>>>
>>> (and there are many more examples of this pattern in bcachefs)
>>>
>>> AFAIK, the C11 standard states that array declarator constant expression
>>>
>>> Effectively, we can verify that this code triggers an undefined behavior
>>> with:
>>>
>>> #include <stdio.h>
>>>
>>> struct z {
>>> int x[0];
>>> int y;
>>> int z;
>>> } __attribute__((packed));
>>>
>>> int main(void)
>>> {
>>> struct z a;
>>>
>>> a.y = 1;
>>> printf("%d\n", a.x[0]);
>>> }
>>> delimited by [ ] shall have a value greater than zero.
>>
>> Yet another example of the C people going absolutely nutty with
>> everything being undefined. Look, this isn't ok, we need to get work
>> done, and I've already wasted entirely too much time on ZLA vs. flex
>> array member nonsense.
>>
>> There's a bunch of legit uses for zero length arrays, and your example,
>> where we're not even _assigning_ to x, is just batshit. Someone needs to
>> get his head examined.
Notice how a.y is first set to 1, then a.x[0] is loaded, expecting to
alias with a.y.
This is the same aliasing pattern found in bcachefs, for instance here:
bcachefs_format.h:
struct jset {
[...]
__u8 encrypted_start[0];
__le16 _read_clock; /* no longer used */
__le16 _write_clock;
/* Sequence number of oldest dirty journal entry */
__le64 last_seq;
struct jset_entry start[0];
__u64 _data[];
} __packed __aligned(8);
where struct jset last_seq field is set by jset_validate():
jset->last_seq = jset->seq;
and where journal_read_bucket() uses the encrypted_start member as input:
ret = bch2_encrypt(c, JSET_CSUM_TYPE(j), journal_nonce(j),
j->encrypted_start,
vstruct_end(j) - (void *) j->encrypted_start);
Regards,
Mathieu
>>
>>> So I wonder if the issue reported by KMSAN could be caused by this
>>> pattern ?
>>
>> Possibly; the KMSAN errors I've been looking at do look suspicious. But
>> it sounds like we need a real fix that involves defining proper
>> semantics, not compiler folks giving up and saying 'aiee!'.
>>
>> IOW, clang/KMSAN are broken if they simply choke on a zero length array
>> being present.
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
On Fri, May 24, 2024 at 12:04:11PM -0400, Mathieu Desnoyers wrote:
> On 2024-05-24 11:35, Mathieu Desnoyers wrote:
> > [ Adding clang/llvm and KMSAN maintainers/reviewers in CC. ]
> >
> > On 2024-05-24 11:28, Kent Overstreet wrote:
> > > On Thu, May 23, 2024 at 01:53:42PM -0400, Mathieu Desnoyers wrote:
> > > > Hi Kent,
> > > >
> > > > Looking around in the bcachefs code for possible causes of this KMSAN
> > > > bug report:
> > > >
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > I notice the following pattern in the bcachefs structures: zero-length
> > > > arrays members are inserted in structures (not always at the end),
> > > > seemingly to achieve a result similar to what could be done with a
> > > > union:
> > > >
> > > > fs/bcachefs/bcachefs_format.h:
> > > >
> > > > struct bkey_packed {
> > > > __u64 _data[0];
> > > >
> > > > /* Size of combined key and value, in u64s */
> > > > __u8 u64s;
> > > > [...]
> > > > };
> > > >
> > > > likewise:
> > > >
> > > > struct bkey_i {
> > > > __u64 _data[0];
> > > >
> > > > struct bkey k;
> > > > struct bch_val v;
> > > > };
> > > >
> > > > (and there are many more examples of this pattern in bcachefs)
> > > >
> > > > AFAIK, the C11 standard states that array declarator constant expression
> > > >
> > > > Effectively, we can verify that this code triggers an undefined behavior
> > > > with:
> > > >
> > > > #include <stdio.h>
> > > >
> > > > struct z {
> > > > int x[0];
> > > > int y;
> > > > int z;
> > > > } __attribute__((packed));
> > > >
> > > > int main(void)
> > > > {
> > > > struct z a;
> > > >
> > > > a.y = 1;
> > > > printf("%d\n", a.x[0]);
> > > > }
> > > > delimited by [ ] shall have a value greater than zero.
> > >
> > > Yet another example of the C people going absolutely nutty with
> > > everything being undefined. Look, this isn't ok, we need to get work
> > > done, and I've already wasted entirely too much time on ZLA vs. flex
> > > array member nonsense.
> > >
> > > There's a bunch of legit uses for zero length arrays, and your example,
> > > where we're not even _assigning_ to x, is just batshit. Someone needs to
> > > get his head examined.
>
> Notice how a.y is first set to 1, then a.x[0] is loaded, expecting to
> alias with a.y.
>
> This is the same aliasing pattern found in bcachefs, for instance here:
>
> bcachefs_format.h:
>
> struct jset {
> [...]
> __u8 encrypted_start[0];
>
> __le16 _read_clock; /* no longer used */
> __le16 _write_clock;
>
> /* Sequence number of oldest dirty journal entry */
> __le64 last_seq;
>
>
> struct jset_entry start[0];
> __u64 _data[];
> } __packed __aligned(8);
>
> where struct jset last_seq field is set by jset_validate():
>
> jset->last_seq = jset->seq;
>
> and where journal_read_bucket() uses the encrypted_start member as input:
>
> ret = bch2_encrypt(c, JSET_CSUM_TYPE(j), journal_nonce(j),
> j->encrypted_start,
> vstruct_end(j) - (void *) j->encrypted_start);
Except we're just using it as a marker for an offset into the struct,
the same "aliasing" issue would apply if we were just using offsetof()
to calculate the offsets directly.
On Fri, May 24, 2024 at 7:30 PM Kent Overstreet
<[email protected]> wrote:
>
> On Fri, May 24, 2024 at 12:04:11PM -0400, Mathieu Desnoyers wrote:
> > On 2024-05-24 11:35, Mathieu Desnoyers wrote:
> > > [ Adding clang/llvm and KMSAN maintainers/reviewers in CC. ]
> > >
> > > On 2024-05-24 11:28, Kent Overstreet wrote:
> > > > On Thu, May 23, 2024 at 01:53:42PM -0400, Mathieu Desnoyers wrote:
> > > > > Hi Kent,
> > > > >
> > > > > Looking around in the bcachefs code for possible causes of this KMSAN
> > > > > bug report:
> > > > >
> > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > >
> > > > > I notice the following pattern in the bcachefs structures: zero-length
> > > > > arrays members are inserted in structures (not always at the end),
> > > > > seemingly to achieve a result similar to what could be done with a
> > > > > union:
> > > > >
> > > > > fs/bcachefs/bcachefs_format.h:
> > > > >
> > > > > struct bkey_packed {
> > > > > __u64 _data[0];
> > > > >
> > > > > /* Size of combined key and value, in u64s */
> > > > > __u8 u64s;
> > > > > [...]
> > > > > };
> > > > >
> > > > > likewise:
> > > > >
> > > > > struct bkey_i {
> > > > > __u64 _data[0];
> > > > >
> > > > > struct bkey k;
> > > > > struct bch_val v;
> > > > > };
I took a glance at the LLVM IR for fs/bcachefs/bset.c, and it defines
struct bkey_packed and bkey_i as:
%struct.bkey_packed = type { [0 x i64], i8, i8, i8, [0 x i8], [37 x i8] }
%struct.bkey_i = type { [0 x i64], %struct.bkey, %struct.bch_val }
, which more or less looks as expected, so I don't think it could be
causing problems with KMSAN right now.
Moreover, there are cases in e.g. include/linux/skbuff.h where
zero-length arrays are used for the same purpose, and KMSAN handles
them just fine.
Yet I want to point out that even GCC discourages the use of
zero-length arrays in the middle of a struct:
https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html, so Clang is not
unique here.
Regarding the original KMSAN bug, as noted in
https://lore.kernel.org/all/[email protected]/T/,
we might be missing the event of copying data from the disk to
bcachefs structs.
I'd appreciate help from someone knowledgeable about how disk I/O is
implemented in the kernel.
On Tue, May 28, 2024 at 01:36:11PM +0200, Alexander Potapenko wrote:
> On Fri, May 24, 2024 at 7:30 PM Kent Overstreet
> <[email protected]> wrote:
> >
> > On Fri, May 24, 2024 at 12:04:11PM -0400, Mathieu Desnoyers wrote:
> > > On 2024-05-24 11:35, Mathieu Desnoyers wrote:
> > > > [ Adding clang/llvm and KMSAN maintainers/reviewers in CC. ]
> > > >
> > > > On 2024-05-24 11:28, Kent Overstreet wrote:
> > > > > On Thu, May 23, 2024 at 01:53:42PM -0400, Mathieu Desnoyers wrote:
> > > > > > Hi Kent,
> > > > > >
> > > > > > Looking around in the bcachefs code for possible causes of this KMSAN
> > > > > > bug report:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > > >
> > > > > > I notice the following pattern in the bcachefs structures: zero-length
> > > > > > arrays members are inserted in structures (not always at the end),
> > > > > > seemingly to achieve a result similar to what could be done with a
> > > > > > union:
> > > > > >
> > > > > > fs/bcachefs/bcachefs_format.h:
> > > > > >
> > > > > > struct bkey_packed {
> > > > > > __u64 _data[0];
> > > > > >
> > > > > > /* Size of combined key and value, in u64s */
> > > > > > __u8 u64s;
> > > > > > [...]
> > > > > > };
> > > > > >
> > > > > > likewise:
> > > > > >
> > > > > > struct bkey_i {
> > > > > > __u64 _data[0];
> > > > > >
> > > > > > struct bkey k;
> > > > > > struct bch_val v;
> > > > > > };
>
> I took a glance at the LLVM IR for fs/bcachefs/bset.c, and it defines
> struct bkey_packed and bkey_i as:
>
> %struct.bkey_packed = type { [0 x i64], i8, i8, i8, [0 x i8], [37 x i8] }
> %struct.bkey_i = type { [0 x i64], %struct.bkey, %struct.bch_val }
>
> , which more or less looks as expected, so I don't think it could be
> causing problems with KMSAN right now.
> Moreover, there are cases in e.g. include/linux/skbuff.h where
> zero-length arrays are used for the same purpose, and KMSAN handles
> them just fine.
>
> Yet I want to point out that even GCC discourages the use of
> zero-length arrays in the middle of a struct:
> https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html, so Clang is not
> unique here.
>
> Regarding the original KMSAN bug, as noted in
> https://lore.kernel.org/all/[email protected]/T/,
> we might be missing the event of copying data from the disk to
> bcachefs structs.
> I'd appreciate help from someone knowledgeable about how disk I/O is
> implemented in the kernel.
If that was missing I'd expect everything to be breaking. What's the
helper that marks memory as initialized?
On Tue, May 28, 2024 at 5:02 PM Kent Overstreet
<[email protected]> wrote:
>
> On Tue, May 28, 2024 at 01:36:11PM +0200, Alexander Potapenko wrote:
> > On Fri, May 24, 2024 at 7:30 PM Kent Overstreet
> > <[email protected]> wrote:
> > >
> > > On Fri, May 24, 2024 at 12:04:11PM -0400, Mathieu Desnoyers wrote:
> > > > On 2024-05-24 11:35, Mathieu Desnoyers wrote:
> > > > > [ Adding clang/llvm and KMSAN maintainers/reviewers in CC. ]
> > > > >
> > > > > On 2024-05-24 11:28, Kent Overstreet wrote:
> > > > > > On Thu, May 23, 2024 at 01:53:42PM -0400, Mathieu Desnoyers wrote:
> > > > > > > Hi Kent,
> > > > > > >
> > > > > > > Looking around in the bcachefs code for possible causes of this KMSAN
> > > > > > > bug report:
> > > > > > >
> > > > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > > > >
> > > > > > > I notice the following pattern in the bcachefs structures: zero-length
> > > > > > > arrays members are inserted in structures (not always at the end),
> > > > > > > seemingly to achieve a result similar to what could be done with a
> > > > > > > union:
> > > > > > >
> > > > > > > fs/bcachefs/bcachefs_format.h:
> > > > > > >
> > > > > > > struct bkey_packed {
> > > > > > > __u64 _data[0];
> > > > > > >
> > > > > > > /* Size of combined key and value, in u64s */
> > > > > > > __u8 u64s;
> > > > > > > [...]
> > > > > > > };
> > > > > > >
> > > > > > > likewise:
> > > > > > >
> > > > > > > struct bkey_i {
> > > > > > > __u64 _data[0];
> > > > > > >
> > > > > > > struct bkey k;
> > > > > > > struct bch_val v;
> > > > > > > };
> >
> > I took a glance at the LLVM IR for fs/bcachefs/bset.c, and it defines
> > struct bkey_packed and bkey_i as:
> >
> > %struct.bkey_packed = type { [0 x i64], i8, i8, i8, [0 x i8], [37 x i8] }
> > %struct.bkey_i = type { [0 x i64], %struct.bkey, %struct.bch_val }
> >
> > , which more or less looks as expected, so I don't think it could be
> > causing problems with KMSAN right now.
> > Moreover, there are cases in e.g. include/linux/skbuff.h where
> > zero-length arrays are used for the same purpose, and KMSAN handles
> > them just fine.
> >
> > Yet I want to point out that even GCC discourages the use of
> > zero-length arrays in the middle of a struct:
> > https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html, so Clang is not
> > unique here.
> >
> > Regarding the original KMSAN bug, as noted in
> > https://lore.kernel.org/all/[email protected]/T/,
> > we might be missing the event of copying data from the disk to
> > bcachefs structs.
> > I'd appreciate help from someone knowledgeable about how disk I/O is
> > implemented in the kernel.
>
> If that was missing I'd expect everything to be breaking. What's the
> helper that marks memory as initialized?
There's kmsan_unpoison_memory()
(https://elixir.bootlin.com/linux/latest/source/include/linux/kmsan-checks.h#L37).
include/linux/kmsan.h also has several more specific helpers for
various subsystems - we probably need something like that.
I was expecting kmsan_handle_dma() to cover disk IO as well, but
apparently I was wrong.