syzkaller reports for memory leak when kobject_init_and_add()
returns an error in the function sysfs_slab_add() [1]
When this happened, the function kobject_put() is not called for the
corresponding kobject, which potentially leads to memory leak.
This patch fixes the issue by calling kobject_put() even if
kobject_init_and_add() fails.
[1]
BUG: memory leak
unreferenced object 0xffff8880a6d4be88 (size 8):
comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s)
hex dump (first 8 bytes):
70 69 64 5f 33 00 ff ff pid_3...
backtrace:
[<00000000a0980095>] kstrdup+0x35/0x70 mm/util.c:60
[<00000000ef0cff3f>] kstrdup_const+0x3d/0x50 mm/util.c:82
[<00000000e2461486>] kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
[<000000005d749e93>] kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289
[<0000000094e31519>] kobject_add_varg lib/kobject.c:384 [inline]
[<0000000094e31519>] kobject_init_and_add+0xd8/0x170 lib/kobject.c:473
[<0000000060f13e32>] sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811
[<00000000fe1d9a22>] __kmem_cache_create+0x50a/0x570 mm/slub.c:4384
[<000000006a71a1b4>] create_cache+0x113/0x1e0 mm/slab_common.c:407
[<0000000089491438>] kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505
[<000000008c992595>] kmem_cache_create+0xd/0x10 mm/slab_common.c:564
[<000000005320c4b6>] create_pid_cachep kernel/pid_namespace.c:54 [inline]
[<000000005320c4b6>] create_pid_namespace kernel/pid_namespace.c:96 [inline]
[<000000005320c4b6>] copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148
[<00000000fc8e1a2b>] create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95
[<0000000080f0c9a5>] unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229
[<0000000007e05aea>] ksys_unshare+0x3d2/0x770 kernel/fork.c:2969
[<00000000e04c8e4b>] __do_sys_unshare kernel/fork.c:3037 [inline]
[<00000000e04c8e4b>] __se_sys_unshare kernel/fork.c:3035 [inline]
[<00000000e04c8e4b>] __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035
[<000000005c4707c7>] do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295
Fixes: 80da026a8e5d ("mm/slub: fix slab double-free in case of duplicate sysfs filename")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wang Hai <[email protected]>
---
mm/slub.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c
index b762450f..63bd39c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5809,8 +5809,10 @@ static int sysfs_slab_add(struct kmem_cache *s)
s->kobj.kset = kset;
err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, "%s", name);
- if (err)
+ if (err) {
+ kobject_put(&s->kobj);
goto out;
+ }
err = sysfs_create_group(&s->kobj, &slab_attr_group);
if (err)
--
1.8.3.1
On Tue, Jun 02, 2020 at 07:50:33PM +0800, Wang Hai wrote:
> syzkaller reports for memory leak when kobject_init_and_add()
> returns an error in the function sysfs_slab_add() [1]
>
> When this happened, the function kobject_put() is not called for the
> corresponding kobject, which potentially leads to memory leak.
>
> This patch fixes the issue by calling kobject_put() even if
> kobject_init_and_add() fails.
I think this speaks to a deeper problem with kobject_init_and_add()
-- the need to call kobject_put() if it fails is not readily apparent
to most users. This same bug appears in the first three users of
kobject_init_and_add() that I checked --
arch/ia64/kernel/topology.c
drivers/firmware/dmi-sysfs.c
drivers/firmware/efi/esrt.c
drivers/scsi/iscsi_boot_sysfs.c
Some do get it right --
arch/powerpc/kernel/cacheinfo.c
drivers/gpu/drm/ttm/ttm_bo.c
drivers/gpu/drm/ttm/ttm_memory.c
drivers/infiniband/hw/mlx4/sysfs.c
I'd argue that the current behaviour is wrong, that kobject_init_and_add()
should call kobject_put() if the add fails. This would need a tree-wide
audit. But somebody needs to do that anyway because based on my random
sampling, half of the users currently get it wrong.
On 02/06/2020 15.10, Matthew Wilcox wrote:
> On Tue, Jun 02, 2020 at 07:50:33PM +0800, Wang Hai wrote:
>> syzkaller reports for memory leak when kobject_init_and_add()
>> returns an error in the function sysfs_slab_add() [1]
>>
>> When this happened, the function kobject_put() is not called for the
>> corresponding kobject, which potentially leads to memory leak.
>>
>> This patch fixes the issue by calling kobject_put() even if
>> kobject_init_and_add() fails.
>
> I think this speaks to a deeper problem with kobject_init_and_add()
> -- the need to call kobject_put() if it fails is not readily apparent
> to most users. This same bug appears in the first three users of
> kobject_init_and_add() that I checked --
> arch/ia64/kernel/topology.c
> drivers/firmware/dmi-sysfs.c
> drivers/firmware/efi/esrt.c
> drivers/scsi/iscsi_boot_sysfs.c
>
> Some do get it right --
> arch/powerpc/kernel/cacheinfo.c
> drivers/gpu/drm/ttm/ttm_bo.c
> drivers/gpu/drm/ttm/ttm_memory.c
> drivers/infiniband/hw/mlx4/sysfs.c
>
> I'd argue that the current behaviour is wrong, that kobject_init_and_add()
> should call kobject_put() if the add fails. This would need a tree-wide
> audit. But somebody needs to do that anyway because based on my random
> sampling, half of the users currently get it wrong.
>
At his point kobject doesn't own kmem-cache structure itself yet.
So calling kobject_put() will free kmem-cache and then it will be
freed second time on error path in create_cache().
I suppose freeing in case of error should be pushed from common
create_cache() into slab-specific __kmem_cache_create().
On Tue, Jun 02, 2020 at 05:10:35AM -0700, Matthew Wilcox wrote:
> On Tue, Jun 02, 2020 at 07:50:33PM +0800, Wang Hai wrote:
> > syzkaller reports for memory leak when kobject_init_and_add()
> > returns an error in the function sysfs_slab_add() [1]
> >
> > When this happened, the function kobject_put() is not called for the
> > corresponding kobject, which potentially leads to memory leak.
> >
> > This patch fixes the issue by calling kobject_put() even if
> > kobject_init_and_add() fails.
>
> I think this speaks to a deeper problem with kobject_init_and_add()
> -- the need to call kobject_put() if it fails is not readily apparent
> to most users. This same bug appears in the first three users of
> kobject_init_and_add() that I checked --
> arch/ia64/kernel/topology.c
> drivers/firmware/dmi-sysfs.c
> drivers/firmware/efi/esrt.c
> drivers/scsi/iscsi_boot_sysfs.c
>
> Some do get it right --
> arch/powerpc/kernel/cacheinfo.c
> drivers/gpu/drm/ttm/ttm_bo.c
> drivers/gpu/drm/ttm/ttm_memory.c
> drivers/infiniband/hw/mlx4/sysfs.c
Why are random individual drivers calling kobject* functions? That
speaks to a larger problem here...
Anyway, yes, it's a tricky function, but the issue usually is that the
kobject is embedded in something else and if you call init_and_add() you
want to tear things down _before_ the final put happens.
The good thing is, that function is really hard to get to fail except if
you abuse it with syzkaller :)
> I'd argue that the current behaviour is wrong, that kobject_init_and_add()
> should call kobject_put() if the add fails. This would need a tree-wide
> audit. But somebody needs to do that anyway because based on my random
> sampling, half of the users currently get it wrong.
As said above, this is "tricky", and might break things.
thanks,
greg k-h
On Tue, Jun 02, 2020 at 04:04:04PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Jun 02, 2020 at 05:10:35AM -0700, Matthew Wilcox wrote:
> > On Tue, Jun 02, 2020 at 07:50:33PM +0800, Wang Hai wrote:
> > > syzkaller reports for memory leak when kobject_init_and_add()
> > > returns an error in the function sysfs_slab_add() [1]
> > >
> > > When this happened, the function kobject_put() is not called for the
> > > corresponding kobject, which potentially leads to memory leak.
> > >
> > > This patch fixes the issue by calling kobject_put() even if
> > > kobject_init_and_add() fails.
> >
> > I think this speaks to a deeper problem with kobject_init_and_add()
> > -- the need to call kobject_put() if it fails is not readily apparent
> > to most users. This same bug appears in the first three users of
> > kobject_init_and_add() that I checked --
> > arch/ia64/kernel/topology.c
> > drivers/firmware/dmi-sysfs.c
> > drivers/firmware/efi/esrt.c
> > drivers/scsi/iscsi_boot_sysfs.c
> >
> > Some do get it right --
> > arch/powerpc/kernel/cacheinfo.c
> > drivers/gpu/drm/ttm/ttm_bo.c
> > drivers/gpu/drm/ttm/ttm_memory.c
> > drivers/infiniband/hw/mlx4/sysfs.c
>
> Why are random individual drivers calling kobject* functions? That
> speaks to a larger problem here...
There's around 120 callers in the kernel today ... large, indeed.
> Anyway, yes, it's a tricky function, but the issue usually is that the
> kobject is embedded in something else and if you call init_and_add() you
> want to tear things down _before_ the final put happens.
>
> The good thing is, that function is really hard to get to fail except if
> you abuse it with syzkaller :)
Yes ;-)
> > I'd argue that the current behaviour is wrong, that kobject_init_and_add()
> > should call kobject_put() if the add fails. This would need a tree-wide
> > audit. But somebody needs to do that anyway because based on my random
> > sampling, half of the users currently get it wrong.
>
> As said above, this is "tricky", and might break things.
My audit may not be correct then. The kobject_put() may be appropriately
being called at a higher level rather than in the same function as the
kobject_init_and_add().
On Tue, 2020-06-02 at 05:10 -0700, Matthew Wilcox wrote:
> On Tue, Jun 02, 2020 at 07:50:33PM +0800, Wang Hai wrote:
> > syzkaller reports for memory leak when kobject_init_and_add()
> > returns an error in the function sysfs_slab_add() [1]
> >
> > When this happened, the function kobject_put() is not called for
> > the
> > corresponding kobject, which potentially leads to memory leak.
> >
> > This patch fixes the issue by calling kobject_put() even if
> > kobject_init_and_add() fails.
>
> I think this speaks to a deeper problem with kobject_init_and_add()
> -- the need to call kobject_put() if it fails is not readily apparent
> to most users. This same bug appears in the first three users of
> kobject_init_and_add() that I checked --
> arch/ia64/kernel/topology.c
> drivers/firmware/dmi-sysfs.c
> drivers/firmware/efi/esrt.c
> drivers/scsi/iscsi_boot_sysfs.c
>
> Some do get it right --
> arch/powerpc/kernel/cacheinfo.c
> drivers/gpu/drm/ttm/ttm_bo.c
> drivers/gpu/drm/ttm/ttm_memory.c
> drivers/infiniband/hw/mlx4/sysfs.c
>
> I'd argue that the current behaviour is wrong,
Absolutely agree with this. We have a big meta pattern here where we
introduce functions with tortuous semantics then someone creates a
checker for the semantics and misuses come crawling out of the woodwork
leading to floods of patches, usually for little or never used error
paths, which really don't buy anything apart from theoretical
correctness. Just insisting on simple semantics would have avoided
this.
> that kobject_init_and_add() should call kobject_put() if the add
> fails. This would need a tree-wide audit. But somebody needs to do
> that anyway because based on my random sampling, half of the users
> currently get it wrong.
Well, the semantics of kobject_init() are free on fail, so these are
the ones everyone seems to be using. The semantics of kobject_add are
put on fail. The problem is that put on fail isn't necessarily correct
in the kobject_init() case: the release function may make assumptions
about the object hierarchy which aren't satisfied in the kobject_init()
failure case. This argues that kobject_init_and_add() can't ever have
correct semantics and we should eliminate it.
James
On Tue, Jun 02, 2020 at 08:25:14AM -0700, James Bottomley wrote:
> On Tue, 2020-06-02 at 05:10 -0700, Matthew Wilcox wrote:
> > On Tue, Jun 02, 2020 at 07:50:33PM +0800, Wang Hai wrote:
> > > syzkaller reports for memory leak when kobject_init_and_add()
> > > returns an error in the function sysfs_slab_add() [1]
> > >
> > > When this happened, the function kobject_put() is not called for
> > > the
> > > corresponding kobject, which potentially leads to memory leak.
> > >
> > > This patch fixes the issue by calling kobject_put() even if
> > > kobject_init_and_add() fails.
> >
> > I think this speaks to a deeper problem with kobject_init_and_add()
> > -- the need to call kobject_put() if it fails is not readily apparent
> > to most users. This same bug appears in the first three users of
> > kobject_init_and_add() that I checked --
> > arch/ia64/kernel/topology.c
> > drivers/firmware/dmi-sysfs.c
> > drivers/firmware/efi/esrt.c
> > drivers/scsi/iscsi_boot_sysfs.c
> >
> > Some do get it right --
> > arch/powerpc/kernel/cacheinfo.c
> > drivers/gpu/drm/ttm/ttm_bo.c
> > drivers/gpu/drm/ttm/ttm_memory.c
> > drivers/infiniband/hw/mlx4/sysfs.c
> >
> > I'd argue that the current behaviour is wrong,
>
> Absolutely agree with this. We have a big meta pattern here where we
> introduce functions with tortuous semantics then someone creates a
> checker for the semantics and misuses come crawling out of the woodwork
> leading to floods of patches, usually for little or never used error
> paths, which really don't buy anything apart from theoretical
> correctness. Just insisting on simple semantics would have avoided
> this.
I "introduced" this way back at the end of 2007. It's not exactly a new
function.
> > that kobject_init_and_add() should call kobject_put() if the add
> > fails. This would need a tree-wide audit. But somebody needs to do
> > that anyway because based on my random sampling, half of the users
> > currently get it wrong.
>
> Well, the semantics of kobject_init() are free on fail, so these are
> the ones everyone seems to be using. The semantics of kobject_add are
> put on fail. The problem is that put on fail isn't necessarily correct
> in the kobject_init() case: the release function may make assumptions
> about the object hierarchy which aren't satisfied in the kobject_init()
> failure case. This argues that kobject_init_and_add() can't ever have
> correct semantics and we should eliminate it.
At the time, it did reduce common functionality and error handling all
into a simpler function. And, given it's history, it must have somehow
worked for the past 12 years or so :)
Odds are, lots of the callers shouldn't be messing around with kobjects
in the first place. Originally it was only assumed that there would be
very few users. But it has spread to filesystems and firmware
subsystems. Drivers should never use it though, so it's a good hint
something is wrong there...
Anyway, patches to fix this up to make a "sane" api for kobjects is
always appreciated. Personally I don't have the time at the moment.
thanks,
greg k-h
On Tue, Jun 02, 2020 at 05:10:35AM -0700, Matthew Wilcox wrote:
> On Tue, Jun 02, 2020 at 07:50:33PM +0800, Wang Hai wrote:
> > syzkaller reports for memory leak when kobject_init_and_add()
> > returns an error in the function sysfs_slab_add() [1]
> >
> > When this happened, the function kobject_put() is not called for the
> > corresponding kobject, which potentially leads to memory leak.
> >
> > This patch fixes the issue by calling kobject_put() even if
> > kobject_init_and_add() fails.
>
> I think this speaks to a deeper problem with kobject_init_and_add()
> to most users. This same bug appears in the first three users of
> kobject_init_and_add() that I checked --
> arch/ia64/kernel/topology.c
> drivers/firmware/dmi-sysfs.c
> drivers/firmware/efi/esrt.c
> drivers/scsi/iscsi_boot_sysfs.c
>
> Some do get it right --
> arch/powerpc/kernel/cacheinfo.c
> drivers/gpu/drm/ttm/ttm_bo.c
> drivers/gpu/drm/ttm/ttm_memory.c
> drivers/infiniband/hw/mlx4/sysfs.c
>
> I'd argue that the current behaviour is wrong, that kobject_init_and_add()
> should call kobject_put() if the add fails.
There are APIs that auto-free their argument on failure and last times
I checked one, about half the tree had a tricky use-after free bug on
the error path.
> This would need a tree-wide audit. But somebody needs to do that
> anyway because based on my random sampling, half of the users
> currently get it wrong.
IMHO these functions that hide an 'init' inside (eg the switch from
kfree to refcount in the error unwind) are tricky. The caller must
switch from some kfree goto error unwind to a put goto error unwind,
which is very unnatural and strange.
I think it is better if the init is near the kalloc and the entire
error unwind always uses put. No switching.
Jason
On Tue, 2020-06-02 at 19:36 +0200, Greg Kroah-Hartman wrote:
> On Tue, Jun 02, 2020 at 08:25:14AM -0700, James Bottomley wrote:
> > On Tue, 2020-06-02 at 05:10 -0700, Matthew Wilcox wrote:
> > > On Tue, Jun 02, 2020 at 07:50:33PM +0800, Wang Hai wrote:
> > > > syzkaller reports for memory leak when kobject_init_and_add()
> > > > returns an error in the function sysfs_slab_add() [1]
> > > >
> > > > When this happened, the function kobject_put() is not called
> > > > for the corresponding kobject, which potentially leads to
> > > > memory leak.
> > > >
> > > > This patch fixes the issue by calling kobject_put() even if
> > > > kobject_init_and_add() fails.
> > >
> > > I think this speaks to a deeper problem with
> > > kobject_init_and_add()
> > > -- the need to call kobject_put() if it fails is not readily
> > > apparent
> > > to most users. This same bug appears in the first three users of
> > > kobject_init_and_add() that I checked --
> > > arch/ia64/kernel/topology.c
> > > drivers/firmware/dmi-sysfs.c
> > > drivers/firmware/efi/esrt.c
> > > drivers/scsi/iscsi_boot_sysfs.c
> > >
> > > Some do get it right --
> > > arch/powerpc/kernel/cacheinfo.c
> > > drivers/gpu/drm/ttm/ttm_bo.c
> > > drivers/gpu/drm/ttm/ttm_memory.c
> > > drivers/infiniband/hw/mlx4/sysfs.c
> > >
> > > I'd argue that the current behaviour is wrong,
> >
> > Absolutely agree with this. We have a big meta pattern here where
> > we introduce functions with tortuous semantics then someone creates
> > a checker for the semantics and misuses come crawling out of the
> > woodwork leading to floods of patches, usually for little or never
> > used error paths, which really don't buy anything apart from
> > theoretical correctness. Just insisting on simple semantics would
> > have avoided this.
>
> I "introduced" this way back at the end of 2007. It's not exactly a
> new function.
Heh, well, if it never fails, how you handle the failure become
unimportant semantics ...
> > > that kobject_init_and_add() should call kobject_put() if the add
> > > fails. This would need a tree-wide audit. But somebody needs to
> > > do that anyway because based on my random sampling, half of the
> > > users currently get it wrong.
> >
> > Well, the semantics of kobject_init() are free on fail, so these
> > are the ones everyone seems to be using. The semantics of
> > kobject_add are put on fail. The problem is that put on fail isn't
> > necessarily correct in the kobject_init() case: the release
> > function may make assumptions about the object hierarchy which
> > aren't satisfied in the kobject_init() failure case. This argues
> > that kobject_init_and_add() can't ever have correct semantics and
> > we should eliminate it.
>
> At the time, it did reduce common functionality and error handling
> all into a simpler function. And, given it's history, it must have
> somehow worked for the past 12 years or so :)
Well, like I said, as long as it never fails, no problem.
It was just Matthew saying "couldn't we make it do kobject_put()
itself?" that got me thinking that perhaps that wouldn't work with all
cases. So now we're discussing failure handling, we're into the
esoteric rabbit hole case that never happens.
> Odds are, lots of the callers shouldn't be messing around with
> kobjects in the first place. Originally it was only assumed that
> there would be very few users. But it has spread to filesystems and
> firmware subsystems. Drivers should never use it though, so it's a
> good hint something is wrong there...
>
> Anyway, patches to fix this up to make a "sane" api for kobjects is
> always appreciated. Personally I don't have the time at the moment.
I think the only way we can make the failure semantics consistent is to
have the kobject_init() ones (so kfree on failure). That means for the
add part, the function would have to unwind everything it did from init
on so kfree() is still an option. If people agree, then I can produce
the patch ... it's just the current drive to transform everyone who's
doing kfree() into kobject_put() would become wrong ...
James
On Tue, Jun 02, 2020 at 12:54:16PM -0700, James Bottomley wrote:
> On Tue, 2020-06-02 at 19:36 +0200, Greg Kroah-Hartman wrote:
> > On Tue, Jun 02, 2020 at 08:25:14AM -0700, James Bottomley wrote:
> > > On Tue, 2020-06-02 at 05:10 -0700, Matthew Wilcox wrote:
> > > > On Tue, Jun 02, 2020 at 07:50:33PM +0800, Wang Hai wrote:
> > > > > syzkaller reports for memory leak when kobject_init_and_add()
> > > > > returns an error in the function sysfs_slab_add() [1]
> > > > >
> > > > > When this happened, the function kobject_put() is not called
> > > > > for the corresponding kobject, which potentially leads to
> > > > > memory leak.
> > > > >
> > > > > This patch fixes the issue by calling kobject_put() even if
> > > > > kobject_init_and_add() fails.
> > > >
> > > > I think this speaks to a deeper problem with
> > > > kobject_init_and_add()
> > > > -- the need to call kobject_put() if it fails is not readily
> > > > apparent
> > > > to most users. This same bug appears in the first three users of
> > > > kobject_init_and_add() that I checked --
> > > > arch/ia64/kernel/topology.c
> > > > drivers/firmware/dmi-sysfs.c
> > > > drivers/firmware/efi/esrt.c
> > > > drivers/scsi/iscsi_boot_sysfs.c
> > > >
> > > > Some do get it right --
> > > > arch/powerpc/kernel/cacheinfo.c
> > > > drivers/gpu/drm/ttm/ttm_bo.c
> > > > drivers/gpu/drm/ttm/ttm_memory.c
> > > > drivers/infiniband/hw/mlx4/sysfs.c
> > > >
> > > > I'd argue that the current behaviour is wrong,
> > >
> > > Absolutely agree with this. We have a big meta pattern here where
> > > we introduce functions with tortuous semantics then someone creates
> > > a checker for the semantics and misuses come crawling out of the
> > > woodwork leading to floods of patches, usually for little or never
> > > used error paths, which really don't buy anything apart from
> > > theoretical correctness. Just insisting on simple semantics would
> > > have avoided this.
> >
> > I "introduced" this way back at the end of 2007. It's not exactly a
> > new function.
>
> Heh, well, if it never fails, how you handle the failure become
> unimportant semantics ...
>
> > > > that kobject_init_and_add() should call kobject_put() if the add
> > > > fails. This would need a tree-wide audit. But somebody needs to
> > > > do that anyway because based on my random sampling, half of the
> > > > users currently get it wrong.
> > >
> > > Well, the semantics of kobject_init() are free on fail, so these
> > > are the ones everyone seems to be using. The semantics of
> > > kobject_add are put on fail. The problem is that put on fail isn't
> > > necessarily correct in the kobject_init() case: the release
> > > function may make assumptions about the object hierarchy which
> > > aren't satisfied in the kobject_init() failure case. This argues
> > > that kobject_init_and_add() can't ever have correct semantics and
> > > we should eliminate it.
> >
> > At the time, it did reduce common functionality and error handling
> > all into a simpler function. And, given it's history, it must have
> > somehow worked for the past 12 years or so :)
>
> Well, like I said, as long as it never fails, no problem.
>
> It was just Matthew saying "couldn't we make it do kobject_put()
> itself?" that got me thinking that perhaps that wouldn't work with all
> cases. So now we're discussing failure handling, we're into the
> esoteric rabbit hole case that never happens.
>
> > Odds are, lots of the callers shouldn't be messing around with
> > kobjects in the first place. Originally it was only assumed that
> > there would be very few users. But it has spread to filesystems and
> > firmware subsystems. Drivers should never use it though, so it's a
> > good hint something is wrong there...
> >
> > Anyway, patches to fix this up to make a "sane" api for kobjects is
> > always appreciated. Personally I don't have the time at the moment.
>
> I think the only way we can make the failure semantics consistent is to
> have the kobject_init() ones (so kfree on failure). That means for the
> add part, the function would have to unwind everything it did from init
> on so kfree() is still an option. If people agree, then I can produce
> the patch ... it's just the current drive to transform everyone who's
> doing kfree() into kobject_put() would become wrong ...
Everyone should be putting their kfree into the kobject release anyway,
right?
Anyway, let's see your patch before I start to object further :)
thanks,
greg k-h
On Tue, 2020-06-02 at 22:07 +0200, Greg Kroah-Hartman wrote:
> On Tue, Jun 02, 2020 at 12:54:16PM -0700, James Bottomley wrote:
[...]
> > I think the only way we can make the failure semantics consistent
> > is to have the kobject_init() ones (so kfree on failure). That
> > means for the add part, the function would have to unwind
> > everything it did from init on so kfree() is still an option. If
> > people agree, then I can produce the patch ... it's just the
> > current drive to transform everyone who's doing kfree() into
> > kobject_put() would become wrong ...
>
> Everyone should be putting their kfree into the kobject release
> anyway, right?
No, that's the problem ... for a static kobject you can't free it; and
the release path may make assumption which aren't valid depending on
the kobject state.
> Anyway, let's see your patch before I start to object further :)
My first thought was "what? I got suckered into creating a patch",
thanks ;-) But now I look, all the error paths do unwind back to the
initial state, so kfree() on error looks to be completely correct. I
got confused by a bogus patch set like this:
https://lore.kernel.org/linux-scsi/[email protected]/
But it turns out the person sending the patch didn't understand the
network failure they quote:
b8eb718348b8 net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject
Has the problem precisely because the kobject is static. The release
path clears it and that allows it to be readded. I'll just reply to
the sender of the bogus patches.
James
On Tue, 2020-06-02 at 14:51 -0700, James Bottomley wrote:
> On Tue, 2020-06-02 at 22:07 +0200, Greg Kroah-Hartman wrote:
> > On Tue, Jun 02, 2020 at 12:54:16PM -0700, James Bottomley wrote:
>
> [...]
> > > I think the only way we can make the failure semantics consistent
> > > is to have the kobject_init() ones (so kfree on failure). That
> > > means for the add part, the function would have to unwind
> > > everything it did from init on so kfree() is still an option. If
> > > people agree, then I can produce the patch ... it's just the
> > > current drive to transform everyone who's doing kfree() into
> > > kobject_put() would become wrong ...
> >
> > Everyone should be putting their kfree into the kobject release
> > anyway, right?
>
> No, that's the problem ... for a static kobject you can't free it;
> and the release path may make assumption which aren't valid depending
> on the kobject state.
>
> > Anyway, let's see your patch before I start to object further :)
>
> My first thought was "what? I got suckered into creating a patch",
> thanks ;-) But now I look, all the error paths do unwind back to the
> initial state, so kfree() on error looks to be completely correct.
Actually, I spoke too soon. I did another analysis of the syzkaller
flow in b8eb718348b8 ("net-sysfs: Fix reference count leak in
rx|netdev_queue_add_kobject") and it turns out there is a single piece
of state that's not correctly unwound: the kobj->name which, thanks to
additions after kobject_init_and_add() was created, is now allocated
via kmalloc if it's not a rodata string and is always and freed in
kobject_cleanup via kfree_const(). This problem can be fixed by
unwinding the name allocation at the end of kobject_init_and_add() ...
or it could be unwound in kobject_add_varg, which would also make
kobject_add() unwind correctly.
The unwind step is to kfree_const(kobj->name); kobj->name = NULL; so it
won't interfere if the kobject_put() is called instead of a simple
kfree.
Would you prefer the unwind in kobject_init_and_add() like the patch
below or in kobject_add_varg()?
James
---
diff --git a/lib/kobject.c b/lib/kobject.c
index 65fa7bf70c57..9991baf43d27 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -472,6 +472,10 @@ int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
va_start(args, fmt);
retval = kobject_add_varg(kobj, parent, fmt, args);
va_end(args);
+ if (retval && kobj->name) {
+ kfree_const(kobj->name);
+ kobj->name = NULL;
+ }
return retval;
}
On Tue, Jun 02, 2020 at 02:51:10PM -0700, James Bottomley wrote:
> My first thought was "what? I got suckered into creating a patch",
> thanks ;-) But now I look, all the error paths do unwind back to the
> initial state, so kfree() on error looks to be completely correct.
It doesn't fully unwind if the kobject is put into a kset, then
another thread can get the kref during kset_find_obj() and kfree() won't
wait for the kref to go to 0. It must use put.
Jason
On Tue, 2020-06-02 at 21:22 -0300, Jason Gunthorpe wrote:
> On Tue, Jun 02, 2020 at 02:51:10PM -0700, James Bottomley wrote:
>
> > My first thought was "what? I got suckered into creating a patch",
> > thanks ;-) But now I look, all the error paths do unwind back to
> > the initial state, so kfree() on error looks to be completely
> > correct.
>
> It doesn't fully unwind if the kobject is put into a kset, then
> another thread can get the kref during kset_find_obj() and kfree()
> won't wait for the kref to go to 0. It must use put.
That does seem a bit contrived: the only failure kobject_add_internal()
can get after kobj_kset_join() is from directory creation. If
directory creation fails, no name appears in sysfs and no event for the
name is sent, how did another thread get the name to pass in to
kset_find_obj()?
James
On Wed, Jun 03, 2020 at 11:04:35AM -0700, James Bottomley wrote:
> On Tue, 2020-06-02 at 21:22 -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 02, 2020 at 02:51:10PM -0700, James Bottomley wrote:
> >
> > > My first thought was "what? I got suckered into creating a patch",
> > > thanks ;-) But now I look, all the error paths do unwind back to
> > > the initial state, so kfree() on error looks to be completely
> > > correct.
> >
> > It doesn't fully unwind if the kobject is put into a kset, then
> > another thread can get the kref during kset_find_obj() and kfree()
> > won't wait for the kref to go to 0. It must use put.
>
> That does seem a bit contrived: the only failure kobject_add_internal()
> can get after kobj_kset_join() is from directory creation. If
> directory creation fails, no name appears in sysfs and no event for the
> name is sent, how did another thread get the name to pass in to
> kset_find_obj()?
The other thread just guesses in a hostile way?
Eg it looks like the iommu stuff just feeds in user data to
kobj_kset_join().
Jason
On Wed, 2020-06-03 at 15:36 -0300, Jason Gunthorpe wrote:
> On Wed, Jun 03, 2020 at 11:04:35AM -0700, James Bottomley wrote:
> > On Tue, 2020-06-02 at 21:22 -0300, Jason Gunthorpe wrote:
> > > On Tue, Jun 02, 2020 at 02:51:10PM -0700, James Bottomley wrote:
> > >
> > > > My first thought was "what? I got suckered into creating a
> > > > patch", thanks ;-) But now I look, all the error paths do
> > > > unwind back to the initial state, so kfree() on error looks to
> > > > be completely correct.
> > >
> > > It doesn't fully unwind if the kobject is put into a kset, then
> > > another thread can get the kref during kset_find_obj() and
> > > kfree() won't wait for the kref to go to 0. It must use put.
> >
> > That does seem a bit contrived: the only failure
> > kobject_add_internal() can get after kobj_kset_join() is from
> > directory creation. If directory creation fails, no name appears
> > in sysfs and no event for the name is sent, how did another thread
> > get the name to pass in to kset_find_obj()?
>
> The other thread just guesses in a hostile way?
>
> Eg it looks like the iommu stuff just feeds in user data to
> kobj_kset_join().
Well, if we have to go down the rabbit hole this far, it turns out to
be fixable because of the state_in_sysfs flag:
@@ -899,7 +903,8 @@ struct kobject *kset_find_obj(struct kset *kset, const char *name)
spin_lock(&kset->list_lock);
list_for_each_entry(k, &kset->list, entry) {
- if (kobject_name(k) && !strcmp(kobject_name(k), name)) {
+ if (kobject_name(k) && k->state_in_sysfs &&
+ !strcmp(kobject_name(k), name)) {
ret = kobject_get_unless_zero(k);
break;
}
That would ensure the name can't be found until the sysfs directory
creation has succeeded, which would be the point from which
kobject_init_and_add() can't fail.
James
On Wed, Jun 03, 2020 at 12:02:08PM -0700, James Bottomley wrote:
> On Wed, 2020-06-03 at 15:36 -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 03, 2020 at 11:04:35AM -0700, James Bottomley wrote:
> > > On Tue, 2020-06-02 at 21:22 -0300, Jason Gunthorpe wrote:
> > > > On Tue, Jun 02, 2020 at 02:51:10PM -0700, James Bottomley wrote:
> > > >
> > > > > My first thought was "what? I got suckered into creating a
> > > > > patch", thanks ;-) But now I look, all the error paths do
> > > > > unwind back to the initial state, so kfree() on error looks to
> > > > > be completely correct.
> > > >
> > > > It doesn't fully unwind if the kobject is put into a kset, then
> > > > another thread can get the kref during kset_find_obj() and
> > > > kfree() won't wait for the kref to go to 0. It must use put.
> > >
> > > That does seem a bit contrived: the only failure
> > > kobject_add_internal() can get after kobj_kset_join() is from
> > > directory creation. If directory creation fails, no name appears
> > > in sysfs and no event for the name is sent, how did another thread
> > > get the name to pass in to kset_find_obj()?
> >
> > The other thread just guesses in a hostile way?
> >
> > Eg it looks like the iommu stuff just feeds in user data to
> > kobj_kset_join().
>
> Well, if we have to go down the rabbit hole this far, it turns out to
> be fixable because of the state_in_sysfs flag:
>
> @@ -899,7 +903,8 @@ struct kobject *kset_find_obj(struct kset *kset, const char *name)
> spin_lock(&kset->list_lock);
>
> list_for_each_entry(k, &kset->list, entry) {
> - if (kobject_name(k) && !strcmp(kobject_name(k), name)) {
> + if (kobject_name(k) && k->state_in_sysfs &&
> + !strcmp(kobject_name(k), name)) {
> ret = kobject_get_unless_zero(k);
> break;
> }
>
> That would ensure the name can't be found until the sysfs directory
> creation has succeeded, which would be the point from which
> kobject_init_and_add() can't fail.
Convoluted, and needs something on the store of state_in_sysfs too,
but could work.
It feels more robust to stick with the put though..
Jason
On Wed, 2020-06-03 at 16:30 -0300, Jason Gunthorpe wrote:
> On Wed, Jun 03, 2020 at 12:02:08PM -0700, James Bottomley wrote:
> > On Wed, 2020-06-03 at 15:36 -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 03, 2020 at 11:04:35AM -0700, James Bottomley wrote:
> > > > On Tue, 2020-06-02 at 21:22 -0300, Jason Gunthorpe wrote:
> > > > > On Tue, Jun 02, 2020 at 02:51:10PM -0700, James Bottomley
> > > > > wrote:
> > > > >
> > > > > > My first thought was "what? I got suckered into creating a
> > > > > > patch", thanks ;-) But now I look, all the error paths do
> > > > > > unwind back to the initial state, so kfree() on error looks
> > > > > > to
> > > > > > be completely correct.
> > > > >
> > > > > It doesn't fully unwind if the kobject is put into a kset,
> > > > > then
> > > > > another thread can get the kref during kset_find_obj() and
> > > > > kfree() won't wait for the kref to go to 0. It must use put.
> > > >
> > > > That does seem a bit contrived: the only failure
> > > > kobject_add_internal() can get after kobj_kset_join() is from
> > > > directory creation. If directory creation fails, no name
> > > > appears
> > > > in sysfs and no event for the name is sent, how did another
> > > > thread
> > > > get the name to pass in to kset_find_obj()?
> > >
> > > The other thread just guesses in a hostile way?
> > >
> > > Eg it looks like the iommu stuff just feeds in user data to
> > > kobj_kset_join().
> >
> > Well, if we have to go down the rabbit hole this far, it turns out
> > to
> > be fixable because of the state_in_sysfs flag:
> >
> > @@ -899,7 +903,8 @@ struct kobject *kset_find_obj(struct kset
> > *kset, const char *name)
> > spin_lock(&kset->list_lock);
> >
> > list_for_each_entry(k, &kset->list, entry) {
> > - if (kobject_name(k) && !strcmp(kobject_name(k),
> > name)) {
> > + if (kobject_name(k) && k->state_in_sysfs &&
> > + !strcmp(kobject_name(k), name)) {
> > ret = kobject_get_unless_zero(k);
> > break;
> > }
> >
> > That would ensure the name can't be found until the sysfs directory
> > creation has succeeded, which would be the point from which
> > kobject_init_and_add() can't fail.
>
> Convoluted, and needs something on the store of state_in_sysfs too,
> but could work.
The store of state_in_sysfs is already done in kobject_add_internal().
It's an existing flag people already use to tell if the kobject has
been exposed in sysfs. However, it's set after the sysfs directory
creation succeeds. This is the code with some debugging removed:
error = create_dir(kobj);
if (error) {
kobj_kset_leave(kobj);
kobject_put(parent);
kobj->parent = NULL;
...
} else
kobj->state_in_sysfs = 1;
return error;
So it's the very last thing set before kobject_add_internal() returns
success ... which is pretty much the last thing kobject_init_and_add()
does.
> It feels more robust to stick with the put though..
possibly ... like I said, the only concern with the put path is that
->release has state expectations that aren't met if
kobject_init_and_add fails. I think none of the callers that currently
does kfree() on error has a problem with this, but it should be
checked.
However, adding unwinding correctly keeps either kfree or put correct
in the event of kobject_init_and_add() failure.
James
On Wed, Jun 03, 2020 at 01:56:20PM -0700, James Bottomley wrote:
> The store of state_in_sysfs is already done in kobject_add_internal().
> It's an existing flag people already use to tell if the kobject has
> been exposed in sysfs. However, it's set after the sysfs directory
> creation succeeds. This is the code with some debugging removed:
>
> error = create_dir(kobj);
> if (error) {
> kobj_kset_leave(kobj);
> kobject_put(parent);
> kobj->parent = NULL;
> ...
> } else
> kobj->state_in_sysfs = 1;
I was thinking most probably this will need a lock or a
smp_store_release() ..
> > It feels more robust to stick with the put though..
>
> possibly ... like I said, the only concern with the put path is that
> ->release has state expectations that aren't met if
> kobject_init_and_add fails.
Certainly error unwind bugs related to put and release will exist, but
I suspect switching to kfree won't solve them, just move them to the
next function that fails and needs a put based unwind?
At least the patches I reviewed for RDM a from Wang Hai were all
correct and didn't seem to have release based errors.
Jason