The normal flow through the widget sysfs codepath is that
snd_hdac_refresh_widgets() is called once without the sysfs bool set
to set up codec->num_nodes and friends, then another time with the
bool set to actually allocate all the sysfs widgets. However, during
the first time allocation, hda_widget_sysfs_reinit() ignores the new
num_nodes passed in via parameter and just calls hda_widget_sysfs_init(),
using whatever was in codec->num_nodes before the update. This is not
correct in cases where num_nodes changes. Here's an example:
Sometime earlier:
snd_hdac_refresh_widgets(hdac, false)
sets codec->num_nodes to 2, widgets is still not allocated
Now:
snd_hdac_refresh_widgets(hdac, true)
hda_widget_sysfs_reinit(num_nodes=7)
hda_widget_sysfs_init()
widget_tree_create()
alloc(codec->num_nodes) // this is still 2
codec->num_nodes = 7
Pass num_nodes and start_nid down into widget_tree_create() so that
the right number of nodes are allocated in all cases.
Signed-off-by: Evan Green <[email protected]>
---
sound/hda/hdac_device.c | 2 +-
sound/hda/hdac_sysfs.c | 14 ++++++++------
sound/hda/local.h | 3 ++-
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/sound/hda/hdac_device.c b/sound/hda/hdac_device.c
index 6907dbefd08c..5e74acf45c81 100644
--- a/sound/hda/hdac_device.c
+++ b/sound/hda/hdac_device.c
@@ -144,7 +144,7 @@ int snd_hdac_device_register(struct hdac_device *codec)
if (err < 0)
return err;
mutex_lock(&codec->widget_lock);
- err = hda_widget_sysfs_init(codec);
+ err = hda_widget_sysfs_init(codec, codec->start_nid, codec->num_nodes);
mutex_unlock(&codec->widget_lock);
if (err < 0) {
device_del(&codec->dev);
diff --git a/sound/hda/hdac_sysfs.c b/sound/hda/hdac_sysfs.c
index 909d5ef1179c..1c4b98929d9c 100644
--- a/sound/hda/hdac_sysfs.c
+++ b/sound/hda/hdac_sysfs.c
@@ -358,7 +358,8 @@ static int add_widget_node(struct kobject *parent, hda_nid_t nid,
return 0;
}
-static int widget_tree_create(struct hdac_device *codec)
+static int widget_tree_create(struct hdac_device *codec,
+ hda_nid_t start_nid, int num_nodes)
{
struct hdac_widget_tree *tree;
int i, err;
@@ -372,12 +373,12 @@ static int widget_tree_create(struct hdac_device *codec)
if (!tree->root)
return -ENOMEM;
- tree->nodes = kcalloc(codec->num_nodes + 1, sizeof(*tree->nodes),
+ tree->nodes = kcalloc(num_nodes + 1, sizeof(*tree->nodes),
GFP_KERNEL);
if (!tree->nodes)
return -ENOMEM;
- for (i = 0, nid = codec->start_nid; i < codec->num_nodes; i++, nid++) {
+ for (i = 0, nid = start_nid; i < num_nodes; i++, nid++) {
err = add_widget_node(tree->root, nid, &widget_node_group,
&tree->nodes[i]);
if (err < 0)
@@ -396,14 +397,15 @@ static int widget_tree_create(struct hdac_device *codec)
}
/* call with codec->widget_lock held */
-int hda_widget_sysfs_init(struct hdac_device *codec)
+int hda_widget_sysfs_init(struct hdac_device *codec,
+ hda_nid_t start_nid, int num_nodes)
{
int err;
if (codec->widgets)
return 0; /* already created */
- err = widget_tree_create(codec);
+ err = widget_tree_create(codec, start_nid, num_nodes);
if (err < 0) {
widget_tree_free(codec);
return err;
@@ -428,7 +430,7 @@ int hda_widget_sysfs_reinit(struct hdac_device *codec,
int i;
if (!codec->widgets)
- return hda_widget_sysfs_init(codec);
+ return hda_widget_sysfs_init(codec, start_nid, num_nodes);
tree = kmemdup(codec->widgets, sizeof(*tree), GFP_KERNEL);
if (!tree)
diff --git a/sound/hda/local.h b/sound/hda/local.h
index 877631e39373..8936120ab4d9 100644
--- a/sound/hda/local.h
+++ b/sound/hda/local.h
@@ -28,7 +28,8 @@ static inline unsigned int get_wcaps_channels(u32 wcaps)
}
extern const struct attribute_group *hdac_dev_attr_groups[];
-int hda_widget_sysfs_init(struct hdac_device *codec);
+int hda_widget_sysfs_init(struct hdac_device *codec,
+ hda_nid_t start_nid, int num_nodes);
int hda_widget_sysfs_reinit(struct hdac_device *codec, hda_nid_t start_nid,
int num_nodes);
void hda_widget_sysfs_exit(struct hdac_device *codec);
--
2.20.1
On Tue, 25 Jun 2019 23:54:18 +0200,
Evan Green wrote:
>
> The normal flow through the widget sysfs codepath is that
> snd_hdac_refresh_widgets() is called once without the sysfs bool set
> to set up codec->num_nodes and friends, then another time with the
> bool set to actually allocate all the sysfs widgets. However, during
> the first time allocation, hda_widget_sysfs_reinit() ignores the new
> num_nodes passed in via parameter and just calls hda_widget_sysfs_init(),
> using whatever was in codec->num_nodes before the update. This is not
> correct in cases where num_nodes changes. Here's an example:
>
> Sometime earlier:
> snd_hdac_refresh_widgets(hdac, false)
> sets codec->num_nodes to 2, widgets is still not allocated
>
> Now:
> snd_hdac_refresh_widgets(hdac, true)
> hda_widget_sysfs_reinit(num_nodes=7)
> hda_widget_sysfs_init()
> widget_tree_create()
> alloc(codec->num_nodes) // this is still 2
> codec->num_nodes = 7
>
> Pass num_nodes and start_nid down into widget_tree_create() so that
> the right number of nodes are allocated in all cases.
>
> Signed-off-by: Evan Green <[email protected]>
Thanks for the patch. That's indeed a problem, but I guess a simpler
approach is just to return if sysfs didn't exist. If the sysfs
entries aren't present at the second call with sysfs=true, it implies
that the codec object will be exposed anyway later, and the sysfs will
be created there. So, something like below would work instead?
thanks,
Takashi
--- a/sound/hda/hdac_sysfs.c
+++ b/sound/hda/hdac_sysfs.c
@@ -428,7 +428,7 @@ int hda_widget_sysfs_reinit(struct hdac_device *codec,
int i;
if (!codec->widgets)
- return hda_widget_sysfs_init(codec);
+ return 0;
tree = kmemdup(codec->widgets, sizeof(*tree), GFP_KERNEL);
if (!tree)
On Wed, Jun 26, 2019 at 1:27 AM Takashi Iwai <[email protected]> wrote:
>
> On Tue, 25 Jun 2019 23:54:18 +0200,
> Evan Green wrote:
> >
> > The normal flow through the widget sysfs codepath is that
> > snd_hdac_refresh_widgets() is called once without the sysfs bool set
> > to set up codec->num_nodes and friends, then another time with the
> > bool set to actually allocate all the sysfs widgets. However, during
> > the first time allocation, hda_widget_sysfs_reinit() ignores the new
> > num_nodes passed in via parameter and just calls hda_widget_sysfs_init(),
> > using whatever was in codec->num_nodes before the update. This is not
> > correct in cases where num_nodes changes. Here's an example:
> >
> > Sometime earlier:
> > snd_hdac_refresh_widgets(hdac, false)
> > sets codec->num_nodes to 2, widgets is still not allocated
> >
> > Now:
> > snd_hdac_refresh_widgets(hdac, true)
> > hda_widget_sysfs_reinit(num_nodes=7)
> > hda_widget_sysfs_init()
> > widget_tree_create()
> > alloc(codec->num_nodes) // this is still 2
> > codec->num_nodes = 7
> >
> > Pass num_nodes and start_nid down into widget_tree_create() so that
> > the right number of nodes are allocated in all cases.
> >
> > Signed-off-by: Evan Green <[email protected]>
>
> Thanks for the patch. That's indeed a problem, but I guess a simpler
> approach is just to return if sysfs didn't exist. If the sysfs
> entries aren't present at the second call with sysfs=true, it implies
> that the codec object will be exposed anyway later, and the sysfs will
> be created there. So, something like below would work instead?
Hi Takashi,
Thanks for taking a look. I'm not sure you'd want to do that because
then you end up returning from sysfs_reinit without having allocated
any of the sysfs widgets. You'd be relying on the implicit behavior
that another call to init is coming later (despite having updated
num_nodes and start node), which is difficult to follow and easy to
break. In my opinion the slight bit of extra diffs is well worth the
clarity of having widget_tree_create always allocate the correct
start/count.
Actually, in looking at the widget lock patch, I don't think it's
sufficient either. It adds a lock around sysfs_reinit, but the setting
of codec->num_nodes and codec->start_nid is unprotected by the lock.
So you could have the two threads politely serialize through
sysfs_reinit, but then get reordered before setting codec->num_nodes,
landing you with an array whose length doesn't match num_nodes.
Let me craft up an additional patch to fix the locking.
-Evan
>
>
> thanks,
>
> Takashi
>
> --- a/sound/hda/hdac_sysfs.c
> +++ b/sound/hda/hdac_sysfs.c
> @@ -428,7 +428,7 @@ int hda_widget_sysfs_reinit(struct hdac_device *codec,
> int i;
>
> if (!codec->widgets)
> - return hda_widget_sysfs_init(codec);
> + return 0;
>
> tree = kmemdup(codec->widgets, sizeof(*tree), GFP_KERNEL);
> if (!tree)
On Wed, 26 Jun 2019 22:34:28 +0200,
Evan Green wrote:
>
> On Wed, Jun 26, 2019 at 1:27 AM Takashi Iwai <[email protected]> wrote:
> >
> > On Tue, 25 Jun 2019 23:54:18 +0200,
> > Evan Green wrote:
> > >
> > > The normal flow through the widget sysfs codepath is that
> > > snd_hdac_refresh_widgets() is called once without the sysfs bool set
> > > to set up codec->num_nodes and friends, then another time with the
> > > bool set to actually allocate all the sysfs widgets. However, during
> > > the first time allocation, hda_widget_sysfs_reinit() ignores the new
> > > num_nodes passed in via parameter and just calls hda_widget_sysfs_init(),
> > > using whatever was in codec->num_nodes before the update. This is not
> > > correct in cases where num_nodes changes. Here's an example:
> > >
> > > Sometime earlier:
> > > snd_hdac_refresh_widgets(hdac, false)
> > > sets codec->num_nodes to 2, widgets is still not allocated
> > >
> > > Now:
> > > snd_hdac_refresh_widgets(hdac, true)
> > > hda_widget_sysfs_reinit(num_nodes=7)
> > > hda_widget_sysfs_init()
> > > widget_tree_create()
> > > alloc(codec->num_nodes) // this is still 2
> > > codec->num_nodes = 7
> > >
> > > Pass num_nodes and start_nid down into widget_tree_create() so that
> > > the right number of nodes are allocated in all cases.
> > >
> > > Signed-off-by: Evan Green <[email protected]>
> >
> > Thanks for the patch. That's indeed a problem, but I guess a simpler
> > approach is just to return if sysfs didn't exist. If the sysfs
> > entries aren't present at the second call with sysfs=true, it implies
> > that the codec object will be exposed anyway later, and the sysfs will
> > be created there. So, something like below would work instead?
>
> Hi Takashi,
> Thanks for taking a look. I'm not sure you'd want to do that because
> then you end up returning from sysfs_reinit without having allocated
> any of the sysfs widgets. You'd be relying on the implicit behavior
> that another call to init is coming later (despite having updated
> num_nodes and start node), which is difficult to follow and easy to
> break. In my opinion the slight bit of extra diffs is well worth the
> clarity of having widget_tree_create always allocate the correct
> start/count.
Well, skipping is the right behavior, actually. The whole need of the
refresh function is just to refresh the widget list, and the current
behavior to create a sysfs is rather superfluous. This action has
never been used, so better to get removed for avoiding misuse.
> Actually, in looking at the widget lock patch, I don't think it's
> sufficient either. It adds a lock around sysfs_reinit, but the setting
> of codec->num_nodes and codec->start_nid is unprotected by the lock.
> So you could have the two threads politely serialize through
> sysfs_reinit, but then get reordered before setting codec->num_nodes,
> landing you with an array whose length doesn't match num_nodes.
The usage of snd_hdac_refresh_widgets() is supposed to be done only at
the codec probe phase, hence there is no lock done in the core code;
IOW, any concurrent access has to be protected in the caller side in
general.
Have you actually seen such concurrent accesses? If yes, that's a
problem in the usage.
thanks,
Takashi
>
> Let me craft up an additional patch to fix the locking.
> -Evan
>
> >
> >
> > thanks,
> >
> > Takashi
> >
> > --- a/sound/hda/hdac_sysfs.c
> > +++ b/sound/hda/hdac_sysfs.c
> > @@ -428,7 +428,7 @@ int hda_widget_sysfs_reinit(struct hdac_device *codec,
> > int i;
> >
> > if (!codec->widgets)
> > - return hda_widget_sysfs_init(codec);
> > + return 0;
> >
> > tree = kmemdup(codec->widgets, sizeof(*tree), GFP_KERNEL);
> > if (!tree)
>
On Wed, Jun 26, 2019 at 2:16 PM Takashi Iwai <[email protected]> wrote:
>
> On Wed, 26 Jun 2019 22:34:28 +0200,
> Evan Green wrote:
> >
> > On Wed, Jun 26, 2019 at 1:27 AM Takashi Iwai <[email protected]> wrote:
> > >
> > > On Tue, 25 Jun 2019 23:54:18 +0200,
> > > Evan Green wrote:
> > > >
> > > > The normal flow through the widget sysfs codepath is that
> > > > snd_hdac_refresh_widgets() is called once without the sysfs bool set
> > > > to set up codec->num_nodes and friends, then another time with the
> > > > bool set to actually allocate all the sysfs widgets. However, during
> > > > the first time allocation, hda_widget_sysfs_reinit() ignores the new
> > > > num_nodes passed in via parameter and just calls hda_widget_sysfs_init(),
> > > > using whatever was in codec->num_nodes before the update. This is not
> > > > correct in cases where num_nodes changes. Here's an example:
> > > >
> > > > Sometime earlier:
> > > > snd_hdac_refresh_widgets(hdac, false)
> > > > sets codec->num_nodes to 2, widgets is still not allocated
> > > >
> > > > Now:
> > > > snd_hdac_refresh_widgets(hdac, true)
> > > > hda_widget_sysfs_reinit(num_nodes=7)
> > > > hda_widget_sysfs_init()
> > > > widget_tree_create()
> > > > alloc(codec->num_nodes) // this is still 2
> > > > codec->num_nodes = 7
> > > >
> > > > Pass num_nodes and start_nid down into widget_tree_create() so that
> > > > the right number of nodes are allocated in all cases.
> > > >
> > > > Signed-off-by: Evan Green <[email protected]>
> > >
> > > Thanks for the patch. That's indeed a problem, but I guess a simpler
> > > approach is just to return if sysfs didn't exist. If the sysfs
> > > entries aren't present at the second call with sysfs=true, it implies
> > > that the codec object will be exposed anyway later, and the sysfs will
> > > be created there. So, something like below would work instead?
> >
> > Hi Takashi,
> > Thanks for taking a look. I'm not sure you'd want to do that because
> > then you end up returning from sysfs_reinit without having allocated
> > any of the sysfs widgets. You'd be relying on the implicit behavior
> > that another call to init is coming later (despite having updated
> > num_nodes and start node), which is difficult to follow and easy to
> > break. In my opinion the slight bit of extra diffs is well worth the
> > clarity of having widget_tree_create always allocate the correct
> > start/count.
>
> Well, skipping is the right behavior, actually. The whole need of the
> refresh function is just to refresh the widget list, and the current
> behavior to create a sysfs is rather superfluous. This action has
> never been used, so better to get removed for avoiding misuse.
Whoops, I sent out a v2 before seeing this. Sorry to jump the gun like that.
I don't quite follow what you mean by "current behavior to create a
sysfs is rather superfluous". Do you think we could delete this
conditional in re-init altogether? I wasn't totally sure, but it
seemed like if the conditional could possibly be activated, then the
behavior was also incorrect.
Actually, couldn't this happen if something goes through
widget_tree_free(), then something else goes through a reinit()? If
the reinit call doesn't have the same number of widgets as before,
then you'd need my patch to avoid initing with the wrong array size.
>
> > Actually, in looking at the widget lock patch, I don't think it's
> > sufficient either. It adds a lock around sysfs_reinit, but the setting
> > of codec->num_nodes and codec->start_nid is unprotected by the lock.
> > So you could have the two threads politely serialize through
> > sysfs_reinit, but then get reordered before setting codec->num_nodes,
> > landing you with an array whose length doesn't match num_nodes.
>
> The usage of snd_hdac_refresh_widgets() is supposed to be done only at
> the codec probe phase, hence there is no lock done in the core code;
> IOW, any concurrent access has to be protected in the caller side in
> general.
>
> Have you actually seen such concurrent accesses? If yes, that's a
> problem in the usage.
I got into staring at this code while trying to debug a KASAN
use-after-free in this code. I found the issue in this patch by
inspection, so I'm not 100% sure if it could ever happen. My
use-after-free appears to be fixed by the new widget_lock (I didn't
have that at the time), but I think at least my other patch in v2 is
necessary to make the widget_lock work correctly.
I'll document my use-after-free here in case it helps. The cleaned up
stacks racing to mess with sysfs look like this:
Thread A:
[ 28.296049] hda_widget_sysfs_reinit+0x184/0x552 [snd_hda_core]
[ 28.526242] snd_hdac_refresh_widgets+0x106/0x39c [snd_hda_core]
[ 28.542238] hdac_hdmi_dev_probe+0x19a8/0x1af5 [snd_soc_hdac_hdmi]
[ 28.586440] really_probe+0x3f4/0x58e
[ 28.590563] driver_probe_device+0xb1/0x1db
[ 28.595266] __driver_attach+0x14f/0x1f7
[ 28.599684] bus_for_each_dev+0x146/0x1a1
[ 28.621944] bus_add_driver+0x349/0x4f6
[ 28.626266] driver_register+0x1cb/0x328
[ 28.634411] do_one_initcall+0x3ea/0x903
[ 28.738132] do_init_module+0x1f9/0x56d
[ 28.742463] load_module+0x74d6/0x8d0a
[ 28.783270] __se_sys_finit_module+0x1d6/0x244
Thread B:
Thread B:
/mnt/host/source/src/third_party/kernel/v4.19/sound/hda/hdac_sysfs.c:423
hda_widget_sysfs_init+0x345/0x41d
[snd_hda_core]
[ 29.178793] snd_hdac_device_register+0x20/0x3a [snd_hda_core]
[ 29.185351] snd_hdac_ext_bus_device_init+0x151/0x240 [snd_hda_ext_core]
[ 29.211982] hda_codec_probe_bus+0x298/0x35a [snd_sof_intel_hda]
[ 29.228663] hda_dsp_probe+0x15aa/0x1da9 [snd_sof_intel_hda_common]
[ 29.257109] sof_probe_work+0x120/0x930 [snd_sof]
[ 29.284174] process_one_work+0x90b/0x11b6
[ 29.318222] worker_thread+0xad5/0xdcc
[ 29.336172] kthread+0x34e/0x35e
The original use-after-free:
[ 27.822086]
==================================================================
[ 27.830522] BUG: KASAN: use-after-free in
add_widget_node+0xc6/0xf3 [snd_hda_core]
[ 27.839073] Write of size 8 at addr ffff888405d72f10 by task
kworker/2:1/67
[ 27.846940]
[ 27.848645] CPU: 2 PID: 67 Comm: kworker/2:1 Not tainted 4.19.44 #93
[ 27.855832] Hardware name: Google Hatch/Hatch, BIOS
Google_Hatch.12225.0.2019_05_24_1436 05/24/2019
[ 27.866009] Workqueue: events sof_probe_work [snd_sof]
[ 27.871819] Call Trace:
[ 27.874612] dump_stack+0x122/0x1b5
[ 27.878574] ? do_raw_spin_lock+0xbd/0x1e9
[ 27.883216] ? show_regs_print_info+0x5/0x5
[ 27.887962] ? log_buf_vmcoreinfo_setup+0x131/0x131
[ 27.893483] ? _raw_spin_lock_irqsave+0xc5/0xfa
[ 27.898620] print_address_description+0x88/0x271
[ 27.903959] ? add_widget_node+0xc6/0xf3 [snd_hda_core]
[ 27.909876] kasan_report+0x274/0x29e
[ 27.914032] add_widget_node+0xc6/0xf3 [snd_hda_core]
[ 27.919759] hda_widget_sysfs_init+0x2b8/0x3a5 [snd_hda_core]
[ 27.926283] snd_hdac_device_register+0x20/0x3a [snd_hda_core]
[ 27.932897] snd_hdac_ext_bus_device_init+0x151/0x240 [snd_hda_ext_core]
[ 27.940485] ? snd_hdac_ext_bus_exit+0x44/0x44 [snd_hda_ext_core]
[ 27.947381] ? kmem_cache_alloc_trace+0x11d/0x19a
[ 27.952712] ? hda_codec_probe_bus+0x283/0x35a [snd_sof_intel_hda]
[ 27.959705] hda_codec_probe_bus+0x298/0x35a [snd_sof_intel_hda]
[ 27.966517] ? 0xffffffffa05c0000
[ 27.970288] ? snd_sof_pci_update_bits+0x53/0x64 [snd_sof]
[ 27.976515] hda_dsp_probe+0x15df/0x1de5 [snd_sof_intel_hda_common]
[ 27.983619] ? print_irqtrace_events+0x223/0x223
[ 27.988861] ? hda_dsp_get_status+0x1ba/0x1ba [snd_sof_intel_hda_common]
[ 27.996433] ? reacquire_held_locks+0x373/0x373
[ 28.001570] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 28.006817] ? __lock_is_held+0x61/0x11e
[ 28.011277] sof_probe_work+0x120/0x930 [snd_sof]
[ 28.016617] ? snd_sof_device_probe+0x47f/0x47f [snd_sof]
[ 28.022736] ? rcu_read_lock_sched_held+0x140/0x22a
[ 28.028260] ? __bpf_trace_rcu_utilization+0xa/0xa
[ 28.033689] ? read_word_at_a_time+0x12/0x18
[ 28.038539] process_one_work+0x90b/0x11b6
[ 28.043210] ? worker_detach_from_pool+0x1fa/0x1fa
[ 28.048636] ? is_mmconf_reserved+0x3bc/0x3bc
[ 28.053577] ? lock_downgrade+0x60a/0x60a
[ 28.058133] ? lockdep_hardirqs_on+0x6d8/0x6d8
[ 28.063163] ? _raw_spin_unlock_irq+0x83/0x100
[ 28.068208] ? do_raw_spin_lock+0xbd/0x1e9
[ 28.068549] cr50_spi spi-PRP0001:01: SPI transfer timed out
[ 28.072842] worker_thread+0xad5/0xdcc
[ 28.072861] ? _raw_spin_unlock+0xfa/0xfa
[ 28.072868] ? _raw_spin_lock_irqsave+0xc5/0xfa
[ 28.072898] ? pr_cont_work+0xe6/0xe6
[ 28.072905] kthread+0x34e/0x35e
[ 28.072914] ? pr_cont_work+0xe6/0xe6
[ 28.072922] ? kthread_blkcg+0xa2/0xa2
[ 28.109396] ret_from_fork+0x24/0x50
[ 28.113433]
[ 28.115130] Allocated by task 67:
[ 28.118894] kmem_cache_alloc_trace+0x11d/0x19a
[ 28.124038] hda_widget_sysfs_init+0x8a/0x3a5 [snd_hda_core]
[ 28.130444] snd_hdac_device_register+0x20/0x3a [snd_hda_core]
[ 28.137047] snd_hdac_ext_bus_device_init+0x151/0x240 [snd_hda_ext_core]
[ 28.144620] hda_codec_probe_bus+0x298/0x35a [snd_sof_intel_hda]
[ 28.151417] hda_dsp_probe+0x15df/0x1de5 [snd_sof_intel_hda_common]
[ 28.158508] sof_probe_work+0x120/0x930 [snd_sof]
[ 28.163829] process_one_work+0x90b/0x11b6
[ 28.168457] worker_thread+0xad5/0xdcc
[ 28.172693] kthread+0x34e/0x35e
[ 28.176349] ret_from_fork+0x24/0x50
[ 28.180388]
[ 28.182078] Freed by task 2983:
[ 28.185639] kfree+0x239/0x723
[ 28.189106] hda_widget_sysfs_reinit+0x47e/0x50a [snd_hda_core]
[ 28.195824] snd_hdac_refresh_widgets+0xf8/0x2a0 [snd_hda_core]
[ 28.202527] hdac_hdmi_dev_probe+0x19a8/0x1af5 [snd_soc_hdac_hdmi]
[ 28.209514] really_probe+0x237/0x58e
[ 28.213673] driver_probe_device+0xb1/0x1db
[ 28.218415] __driver_attach+0x14f/0x1f7
[ 28.222857] bus_for_each_dev+0x146/0x1a1
[ 28.227402] bus_add_driver+0x349/0x4f6
[ 28.231736] driver_register+0x1cb/0x328
[ 28.236164] do_one_initcall+0x3ea/0x903
[ 28.240611] do_init_module+0x1f9/0x56d
[ 28.244971] load_module+0x74d6/0x8d0a
[ 28.249218] __se_sys_finit_module+0x1d6/0x244
[ 28.254250] do_syscall_64+0xcd/0x120
[ 28.258389] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 28.264095]
[ 28.265794] The buggy address belongs to the object at ffff888405d72f08
[ 28.265794] which belongs to the cache kmalloc-32 of size 32
[ 28.279733] The buggy address is located 8 bytes inside of
[ 28.279733] 32-byte region [ffff888405d72f08, ffff888405d72f28)
[ 28.292627] The buggy address belongs to the page:
[ 28.298037] page:ffffea0010175c80 count:1 mapcount:0
mapping:ffff88840cc0c5c0 index:0x0 compound_mapcount: 0
[ 28.308498] cr50_spi spi-PRP0001:01: SPI transfer timed out
[ 28.309102] flags: 0x8000000000008100(slab|head)
[ 28.309111] raw: 8000000000008100 ffffea0010175c08
ffffea00102ef108 ffff88840cc0c5c0
[ 28.329340] raw: 0000000000000000 0000000000150015
00000001ffffffff 0000000000000000
[ 28.338091] page dumped because: kasan: bad access detected
[ 28.344381]
[ 28.346079] Memory state around the buggy address:
[ 28.351497] ffff888405d72e00: fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc fc
[ 28.359663] ffff888405d72e80: fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc fc
[ 28.367824] >ffff888405d72f00: fc fb fb fb fb fc fc fc fc fc fc
fc fc fc fc fc
[ 28.375987] ^
[ 28.380236] ffff888405d72f80: fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc fc
[ 28.388401] ffff888405d73000: fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc fc
Again, sorry about prematurely sending out the v2. This patch hasn't
changed, so feel free to ignore that part of the v2. The other patch
though is new, and I think is needed regardless of the state of this
patch.
-Evan
>
>
> thanks,
>
> Takashi
>
> >
> > Let me craft up an additional patch to fix the locking.
> > -Evan
> >
> > >
> > >
> > > thanks,
> > >
> > > Takashi
> > >
> > > --- a/sound/hda/hdac_sysfs.c
> > > +++ b/sound/hda/hdac_sysfs.c
> > > @@ -428,7 +428,7 @@ int hda_widget_sysfs_reinit(struct hdac_device *codec,
> > > int i;
> > >
> > > if (!codec->widgets)
> > > - return hda_widget_sysfs_init(codec);
> > > + return 0;
> > >
> > > tree = kmemdup(codec->widgets, sizeof(*tree), GFP_KERNEL);
> > > if (!tree)
> >
On Wed, 26 Jun 2019 23:59:33 +0200,
Evan Green wrote:
>
> On Wed, Jun 26, 2019 at 2:16 PM Takashi Iwai <[email protected]> wrote:
> >
> > On Wed, 26 Jun 2019 22:34:28 +0200,
> > Evan Green wrote:
> > >
> > > On Wed, Jun 26, 2019 at 1:27 AM Takashi Iwai <[email protected]> wrote:
> > > >
> > > > On Tue, 25 Jun 2019 23:54:18 +0200,
> > > > Evan Green wrote:
> > > > >
> > > > > The normal flow through the widget sysfs codepath is that
> > > > > snd_hdac_refresh_widgets() is called once without the sysfs bool set
> > > > > to set up codec->num_nodes and friends, then another time with the
> > > > > bool set to actually allocate all the sysfs widgets. However, during
> > > > > the first time allocation, hda_widget_sysfs_reinit() ignores the new
> > > > > num_nodes passed in via parameter and just calls hda_widget_sysfs_init(),
> > > > > using whatever was in codec->num_nodes before the update. This is not
> > > > > correct in cases where num_nodes changes. Here's an example:
> > > > >
> > > > > Sometime earlier:
> > > > > snd_hdac_refresh_widgets(hdac, false)
> > > > > sets codec->num_nodes to 2, widgets is still not allocated
> > > > >
> > > > > Now:
> > > > > snd_hdac_refresh_widgets(hdac, true)
> > > > > hda_widget_sysfs_reinit(num_nodes=7)
> > > > > hda_widget_sysfs_init()
> > > > > widget_tree_create()
> > > > > alloc(codec->num_nodes) // this is still 2
> > > > > codec->num_nodes = 7
> > > > >
> > > > > Pass num_nodes and start_nid down into widget_tree_create() so that
> > > > > the right number of nodes are allocated in all cases.
> > > > >
> > > > > Signed-off-by: Evan Green <[email protected]>
> > > >
> > > > Thanks for the patch. That's indeed a problem, but I guess a simpler
> > > > approach is just to return if sysfs didn't exist. If the sysfs
> > > > entries aren't present at the second call with sysfs=true, it implies
> > > > that the codec object will be exposed anyway later, and the sysfs will
> > > > be created there. So, something like below would work instead?
> > >
> > > Hi Takashi,
> > > Thanks for taking a look. I'm not sure you'd want to do that because
> > > then you end up returning from sysfs_reinit without having allocated
> > > any of the sysfs widgets. You'd be relying on the implicit behavior
> > > that another call to init is coming later (despite having updated
> > > num_nodes and start node), which is difficult to follow and easy to
> > > break. In my opinion the slight bit of extra diffs is well worth the
> > > clarity of having widget_tree_create always allocate the correct
> > > start/count.
> >
> > Well, skipping is the right behavior, actually. The whole need of the
> > refresh function is just to refresh the widget list, and the current
> > behavior to create a sysfs is rather superfluous. This action has
> > never been used, so better to get removed for avoiding misuse.
>
> Whoops, I sent out a v2 before seeing this. Sorry to jump the gun like that.
>
> I don't quite follow what you mean by "current behavior to create a
> sysfs is rather superfluous". Do you think we could delete this
> conditional in re-init altogether? I wasn't totally sure, but it
> seemed like if the conditional could possibly be activated, then the
> behavior was also incorrect.
>
> Actually, couldn't this happen if something goes through
> widget_tree_free(), then something else goes through a reinit()? If
> the reinit call doesn't have the same number of widgets as before,
> then you'd need my patch to avoid initing with the wrong array size.
I meant that hda_widget_sysfs_reinit() creates sysfs files if they
weren't present. hda_widget_sysfs_reinit() should do nothing if the
sysfs wasn't created beforehand -- like my suggested patch does.
After that change, "bool sysfs" argument can be de even dropped from
snd_hdac_refresh_widgets(). The very first call of this function is
with sysfs=false, but at this point, codec->widgets=NULL, so reinit()
would just skip. All later calls are with sysfs=true.
> > > Actually, in looking at the widget lock patch, I don't think it's
> > > sufficient either. It adds a lock around sysfs_reinit, but the setting
> > > of codec->num_nodes and codec->start_nid is unprotected by the lock.
> > > So you could have the two threads politely serialize through
> > > sysfs_reinit, but then get reordered before setting codec->num_nodes,
> > > landing you with an array whose length doesn't match num_nodes.
> >
> > The usage of snd_hdac_refresh_widgets() is supposed to be done only at
> > the codec probe phase, hence there is no lock done in the core code;
> > IOW, any concurrent access has to be protected in the caller side in
> > general.
> >
> > Have you actually seen such concurrent accesses? If yes, that's a
> > problem in the usage.
>
> I got into staring at this code while trying to debug a KASAN
> use-after-free in this code. I found the issue in this patch by
> inspection, so I'm not 100% sure if it could ever happen. My
> use-after-free appears to be fixed by the new widget_lock (I didn't
> have that at the time), but I think at least my other patch in v2 is
> necessary to make the widget_lock work correctly.
>
> I'll document my use-after-free here in case it helps. The cleaned up
> stacks racing to mess with sysfs look like this:
(snip)
OK, this information helps a lot for understanding the problem.
The parallel initialization hasn't been considered much, so far.
Let me check that in detail later. I'll postspone your v2 patch for a
while.
thanks,
Takashi
> Thread A:
> [ 28.296049] hda_widget_sysfs_reinit+0x184/0x552 [snd_hda_core]
> [ 28.526242] snd_hdac_refresh_widgets+0x106/0x39c [snd_hda_core]
> [ 28.542238] hdac_hdmi_dev_probe+0x19a8/0x1af5 [snd_soc_hdac_hdmi]
> [ 28.586440] really_probe+0x3f4/0x58e
> [ 28.590563] driver_probe_device+0xb1/0x1db
> [ 28.595266] __driver_attach+0x14f/0x1f7
> [ 28.599684] bus_for_each_dev+0x146/0x1a1
> [ 28.621944] bus_add_driver+0x349/0x4f6
> [ 28.626266] driver_register+0x1cb/0x328
> [ 28.634411] do_one_initcall+0x3ea/0x903
> [ 28.738132] do_init_module+0x1f9/0x56d
> [ 28.742463] load_module+0x74d6/0x8d0a
> [ 28.783270] __se_sys_finit_module+0x1d6/0x244
>
> Thread B:
> Thread B:
> /mnt/host/source/src/third_party/kernel/v4.19/sound/hda/hdac_sysfs.c:423
> hda_widget_sysfs_init+0x345/0x41d
> [snd_hda_core]
> [ 29.178793] snd_hdac_device_register+0x20/0x3a [snd_hda_core]
> [ 29.185351] snd_hdac_ext_bus_device_init+0x151/0x240 [snd_hda_ext_core]
> [ 29.211982] hda_codec_probe_bus+0x298/0x35a [snd_sof_intel_hda]
> [ 29.228663] hda_dsp_probe+0x15aa/0x1da9 [snd_sof_intel_hda_common]
> [ 29.257109] sof_probe_work+0x120/0x930 [snd_sof]
> [ 29.284174] process_one_work+0x90b/0x11b6
> [ 29.318222] worker_thread+0xad5/0xdcc
> [ 29.336172] kthread+0x34e/0x35e
>
> The original use-after-free:
> [ 27.822086]
> ==================================================================
> [ 27.830522] BUG: KASAN: use-after-free in
> add_widget_node+0xc6/0xf3 [snd_hda_core]
> [ 27.839073] Write of size 8 at addr ffff888405d72f10 by task
> kworker/2:1/67
> [ 27.846940]
> [ 27.848645] CPU: 2 PID: 67 Comm: kworker/2:1 Not tainted 4.19.44 #93
> [ 27.855832] Hardware name: Google Hatch/Hatch, BIOS
> Google_Hatch.12225.0.2019_05_24_1436 05/24/2019
> [ 27.866009] Workqueue: events sof_probe_work [snd_sof]
> [ 27.871819] Call Trace:
> [ 27.874612] dump_stack+0x122/0x1b5
> [ 27.878574] ? do_raw_spin_lock+0xbd/0x1e9
> [ 27.883216] ? show_regs_print_info+0x5/0x5
> [ 27.887962] ? log_buf_vmcoreinfo_setup+0x131/0x131
> [ 27.893483] ? _raw_spin_lock_irqsave+0xc5/0xfa
> [ 27.898620] print_address_description+0x88/0x271
> [ 27.903959] ? add_widget_node+0xc6/0xf3 [snd_hda_core]
> [ 27.909876] kasan_report+0x274/0x29e
> [ 27.914032] add_widget_node+0xc6/0xf3 [snd_hda_core]
> [ 27.919759] hda_widget_sysfs_init+0x2b8/0x3a5 [snd_hda_core]
> [ 27.926283] snd_hdac_device_register+0x20/0x3a [snd_hda_core]
> [ 27.932897] snd_hdac_ext_bus_device_init+0x151/0x240 [snd_hda_ext_core]
> [ 27.940485] ? snd_hdac_ext_bus_exit+0x44/0x44 [snd_hda_ext_core]
> [ 27.947381] ? kmem_cache_alloc_trace+0x11d/0x19a
> [ 27.952712] ? hda_codec_probe_bus+0x283/0x35a [snd_sof_intel_hda]
> [ 27.959705] hda_codec_probe_bus+0x298/0x35a [snd_sof_intel_hda]
> [ 27.966517] ? 0xffffffffa05c0000
> [ 27.970288] ? snd_sof_pci_update_bits+0x53/0x64 [snd_sof]
> [ 27.976515] hda_dsp_probe+0x15df/0x1de5 [snd_sof_intel_hda_common]
> [ 27.983619] ? print_irqtrace_events+0x223/0x223
> [ 27.988861] ? hda_dsp_get_status+0x1ba/0x1ba [snd_sof_intel_hda_common]
> [ 27.996433] ? reacquire_held_locks+0x373/0x373
> [ 28.001570] ? trace_hardirqs_on_thunk+0x1a/0x1c
> [ 28.006817] ? __lock_is_held+0x61/0x11e
> [ 28.011277] sof_probe_work+0x120/0x930 [snd_sof]
> [ 28.016617] ? snd_sof_device_probe+0x47f/0x47f [snd_sof]
> [ 28.022736] ? rcu_read_lock_sched_held+0x140/0x22a
> [ 28.028260] ? __bpf_trace_rcu_utilization+0xa/0xa
> [ 28.033689] ? read_word_at_a_time+0x12/0x18
> [ 28.038539] process_one_work+0x90b/0x11b6
> [ 28.043210] ? worker_detach_from_pool+0x1fa/0x1fa
> [ 28.048636] ? is_mmconf_reserved+0x3bc/0x3bc
> [ 28.053577] ? lock_downgrade+0x60a/0x60a
> [ 28.058133] ? lockdep_hardirqs_on+0x6d8/0x6d8
> [ 28.063163] ? _raw_spin_unlock_irq+0x83/0x100
> [ 28.068208] ? do_raw_spin_lock+0xbd/0x1e9
> [ 28.068549] cr50_spi spi-PRP0001:01: SPI transfer timed out
> [ 28.072842] worker_thread+0xad5/0xdcc
> [ 28.072861] ? _raw_spin_unlock+0xfa/0xfa
> [ 28.072868] ? _raw_spin_lock_irqsave+0xc5/0xfa
> [ 28.072898] ? pr_cont_work+0xe6/0xe6
> [ 28.072905] kthread+0x34e/0x35e
> [ 28.072914] ? pr_cont_work+0xe6/0xe6
> [ 28.072922] ? kthread_blkcg+0xa2/0xa2
> [ 28.109396] ret_from_fork+0x24/0x50
> [ 28.113433]
> [ 28.115130] Allocated by task 67:
> [ 28.118894] kmem_cache_alloc_trace+0x11d/0x19a
> [ 28.124038] hda_widget_sysfs_init+0x8a/0x3a5 [snd_hda_core]
> [ 28.130444] snd_hdac_device_register+0x20/0x3a [snd_hda_core]
> [ 28.137047] snd_hdac_ext_bus_device_init+0x151/0x240 [snd_hda_ext_core]
> [ 28.144620] hda_codec_probe_bus+0x298/0x35a [snd_sof_intel_hda]
> [ 28.151417] hda_dsp_probe+0x15df/0x1de5 [snd_sof_intel_hda_common]
> [ 28.158508] sof_probe_work+0x120/0x930 [snd_sof]
> [ 28.163829] process_one_work+0x90b/0x11b6
> [ 28.168457] worker_thread+0xad5/0xdcc
> [ 28.172693] kthread+0x34e/0x35e
> [ 28.176349] ret_from_fork+0x24/0x50
> [ 28.180388]
> [ 28.182078] Freed by task 2983:
> [ 28.185639] kfree+0x239/0x723
> [ 28.189106] hda_widget_sysfs_reinit+0x47e/0x50a [snd_hda_core]
> [ 28.195824] snd_hdac_refresh_widgets+0xf8/0x2a0 [snd_hda_core]
> [ 28.202527] hdac_hdmi_dev_probe+0x19a8/0x1af5 [snd_soc_hdac_hdmi]
> [ 28.209514] really_probe+0x237/0x58e
> [ 28.213673] driver_probe_device+0xb1/0x1db
> [ 28.218415] __driver_attach+0x14f/0x1f7
> [ 28.222857] bus_for_each_dev+0x146/0x1a1
> [ 28.227402] bus_add_driver+0x349/0x4f6
> [ 28.231736] driver_register+0x1cb/0x328
> [ 28.236164] do_one_initcall+0x3ea/0x903
> [ 28.240611] do_init_module+0x1f9/0x56d
> [ 28.244971] load_module+0x74d6/0x8d0a
> [ 28.249218] __se_sys_finit_module+0x1d6/0x244
> [ 28.254250] do_syscall_64+0xcd/0x120
> [ 28.258389] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 28.264095]
> [ 28.265794] The buggy address belongs to the object at ffff888405d72f08
> [ 28.265794] which belongs to the cache kmalloc-32 of size 32
> [ 28.279733] The buggy address is located 8 bytes inside of
> [ 28.279733] 32-byte region [ffff888405d72f08, ffff888405d72f28)
> [ 28.292627] The buggy address belongs to the page:
> [ 28.298037] page:ffffea0010175c80 count:1 mapcount:0
> mapping:ffff88840cc0c5c0 index:0x0 compound_mapcount: 0
> [ 28.308498] cr50_spi spi-PRP0001:01: SPI transfer timed out
> [ 28.309102] flags: 0x8000000000008100(slab|head)
> [ 28.309111] raw: 8000000000008100 ffffea0010175c08
> ffffea00102ef108 ffff88840cc0c5c0
> [ 28.329340] raw: 0000000000000000 0000000000150015
> 00000001ffffffff 0000000000000000
> [ 28.338091] page dumped because: kasan: bad access detected
> [ 28.344381]
> [ 28.346079] Memory state around the buggy address:
> [ 28.351497] ffff888405d72e00: fc fc fc fc fc fc fc fc fc fc fc
> fc fc fc fc fc
> [ 28.359663] ffff888405d72e80: fc fc fc fc fc fc fc fc fc fc fc
> fc fc fc fc fc
> [ 28.367824] >ffff888405d72f00: fc fb fb fb fb fc fc fc fc fc fc
> fc fc fc fc fc
> [ 28.375987] ^
> [ 28.380236] ffff888405d72f80: fc fc fc fc fc fc fc fc fc fc fc
> fc fc fc fc fc
> [ 28.388401] ffff888405d73000: fc fc fc fc fc fc fc fc fc fc fc
> fc fc fc fc fc
>
> Again, sorry about prematurely sending out the v2. This patch hasn't
> changed, so feel free to ignore that part of the v2. The other patch
> though is new, and I think is needed regardless of the state of this
> patch.
>
> -Evan
>
> >
> >
> > thanks,
> >
> > Takashi
> >
> > >
> > > Let me craft up an additional patch to fix the locking.
> > > -Evan
> > >
> > > >
> > > >
> > > > thanks,
> > > >
> > > > Takashi
> > > >
> > > > --- a/sound/hda/hdac_sysfs.c
> > > > +++ b/sound/hda/hdac_sysfs.c
> > > > @@ -428,7 +428,7 @@ int hda_widget_sysfs_reinit(struct hdac_device *codec,
> > > > int i;
> > > >
> > > > if (!codec->widgets)
> > > > - return hda_widget_sysfs_init(codec);
> > > > + return 0;
> > > >
> > > > tree = kmemdup(codec->widgets, sizeof(*tree), GFP_KERNEL);
> > > > if (!tree)
> > >
>
On Wed, Jun 26, 2019 at 10:26 PM Takashi Iwai <[email protected]> wrote:
>
> On Wed, 26 Jun 2019 23:59:33 +0200,
> Evan Green wrote:
> >
> > On Wed, Jun 26, 2019 at 2:16 PM Takashi Iwai <[email protected]> wrote:
> > >
> > > On Wed, 26 Jun 2019 22:34:28 +0200,
> > > Evan Green wrote:
> > > >
> > > > On Wed, Jun 26, 2019 at 1:27 AM Takashi Iwai <[email protected]> wrote:
> > > > >
> > > > > On Tue, 25 Jun 2019 23:54:18 +0200,
> > > > > Evan Green wrote:
> > > > > >
> > > > > > The normal flow through the widget sysfs codepath is that
> > > > > > snd_hdac_refresh_widgets() is called once without the sysfs bool set
> > > > > > to set up codec->num_nodes and friends, then another time with the
> > > > > > bool set to actually allocate all the sysfs widgets. However, during
> > > > > > the first time allocation, hda_widget_sysfs_reinit() ignores the new
> > > > > > num_nodes passed in via parameter and just calls hda_widget_sysfs_init(),
> > > > > > using whatever was in codec->num_nodes before the update. This is not
> > > > > > correct in cases where num_nodes changes. Here's an example:
> > > > > >
> > > > > > Sometime earlier:
> > > > > > snd_hdac_refresh_widgets(hdac, false)
> > > > > > sets codec->num_nodes to 2, widgets is still not allocated
> > > > > >
> > > > > > Now:
> > > > > > snd_hdac_refresh_widgets(hdac, true)
> > > > > > hda_widget_sysfs_reinit(num_nodes=7)
> > > > > > hda_widget_sysfs_init()
> > > > > > widget_tree_create()
> > > > > > alloc(codec->num_nodes) // this is still 2
> > > > > > codec->num_nodes = 7
> > > > > >
> > > > > > Pass num_nodes and start_nid down into widget_tree_create() so that
> > > > > > the right number of nodes are allocated in all cases.
> > > > > >
> > > > > > Signed-off-by: Evan Green <[email protected]>
> > > > >
> > > > > Thanks for the patch. That's indeed a problem, but I guess a simpler
> > > > > approach is just to return if sysfs didn't exist. If the sysfs
> > > > > entries aren't present at the second call with sysfs=true, it implies
> > > > > that the codec object will be exposed anyway later, and the sysfs will
> > > > > be created there. So, something like below would work instead?
> > > >
> > > > Hi Takashi,
> > > > Thanks for taking a look. I'm not sure you'd want to do that because
> > > > then you end up returning from sysfs_reinit without having allocated
> > > > any of the sysfs widgets. You'd be relying on the implicit behavior
> > > > that another call to init is coming later (despite having updated
> > > > num_nodes and start node), which is difficult to follow and easy to
> > > > break. In my opinion the slight bit of extra diffs is well worth the
> > > > clarity of having widget_tree_create always allocate the correct
> > > > start/count.
> > >
> > > Well, skipping is the right behavior, actually. The whole need of the
> > > refresh function is just to refresh the widget list, and the current
> > > behavior to create a sysfs is rather superfluous. This action has
> > > never been used, so better to get removed for avoiding misuse.
> >
> > Whoops, I sent out a v2 before seeing this. Sorry to jump the gun like that.
> >
> > I don't quite follow what you mean by "current behavior to create a
> > sysfs is rather superfluous". Do you think we could delete this
> > conditional in re-init altogether? I wasn't totally sure, but it
> > seemed like if the conditional could possibly be activated, then the
> > behavior was also incorrect.
> >
> > Actually, couldn't this happen if something goes through
> > widget_tree_free(), then something else goes through a reinit()? If
> > the reinit call doesn't have the same number of widgets as before,
> > then you'd need my patch to avoid initing with the wrong array size.
>
> I meant that hda_widget_sysfs_reinit() creates sysfs files if they
> weren't present. hda_widget_sysfs_reinit() should do nothing if the
> sysfs wasn't created beforehand -- like my suggested patch does.
>
> After that change, "bool sysfs" argument can be de even dropped from
> snd_hdac_refresh_widgets(). The very first call of this function is
> with sysfs=false, but at this point, codec->widgets=NULL, so reinit()
> would just skip. All later calls are with sysfs=true.
Hi Takashi,
I think I understand. That might work, but it would definitely require
my other patch in v2. The contributing factors to that are:
1. The count of widgets coming out of snd_hdac_get_sub_nodes() can change.
2. hda_widget_sysfs_init() races with at least one (and in my head I
just make it multiple) calls to snd_hdac_refresh_widgets().
3. codec->{num_nodes, start_nid, widgets} are all essentially one data
structure that needs to be protected.
Without my other patch in v2 [1], num_nodes and start_nid are not
updated at the same time as the widgets array.
The patch you suggest may work as long as the lock surrounds all of 1)
the read in snd_hdac_get_sub_nodes(), 2) the updating of widgets
array, and 3) updating start_nid, num_nodes.
However, by doing it the way you suggest we're effectively saying "if
codec->widgets doesn't exist, then reinit does nothing". I worry that
a future caller might try to do something that requires the widgets to
have actually been refreshed after calling refresh_widgets(), but end
up crashing because they had won the race against sysfs_init(), and
refresh_widgets() did nothing except set codec->num_nodes. Which seems
weird, you'd expect a function named refresh_widgets() would keep the
whole data structure consistent, and that sysfs_reinit() would return
having actually re-initialized sysfs. I was trying to preserve those
semantics with my patch, rather than introduce an implicit ordering
dependency between refresh_widgets() and sysfs_init().
>
>
> > > > Actually, in looking at the widget lock patch, I don't think it's
> > > > sufficient either. It adds a lock around sysfs_reinit, but the setting
> > > > of codec->num_nodes and codec->start_nid is unprotected by the lock.
> > > > So you could have the two threads politely serialize through
> > > > sysfs_reinit, but then get reordered before setting codec->num_nodes,
> > > > landing you with an array whose length doesn't match num_nodes.
> > >
> > > The usage of snd_hdac_refresh_widgets() is supposed to be done only at
> > > the codec probe phase, hence there is no lock done in the core code;
> > > IOW, any concurrent access has to be protected in the caller side in
> > > general.
> > >
> > > Have you actually seen such concurrent accesses? If yes, that's a
> > > problem in the usage.
> >
> > I got into staring at this code while trying to debug a KASAN
> > use-after-free in this code. I found the issue in this patch by
> > inspection, so I'm not 100% sure if it could ever happen. My
> > use-after-free appears to be fixed by the new widget_lock (I didn't
> > have that at the time), but I think at least my other patch in v2 is
> > necessary to make the widget_lock work correctly.
> >
> > I'll document my use-after-free here in case it helps. The cleaned up
> > stacks racing to mess with sysfs look like this:
> (snip)
>
> OK, this information helps a lot for understanding the problem.
> The parallel initialization hasn't been considered much, so far.
> Let me check that in detail later. I'll postspone your v2 patch for a
> while.
Ok. Please do consider the other patch in that series [1], as I
believe it is needed no matter which way we do this patch.
[1] https://lore.kernel.org/lkml/[email protected]/
-Evan