Hi Lukas,
With KASAN now working on ARM 32-bit, I was able to get the following
trace upon reboot which invokes bcm2835_spi_shutdown() calling
bcm2835_spi_remove(), the same can be triggered by doing a driver unbind:
# pwd
/sys/devices/platform/rdb/47e204800.spi/driver
# echo 47e204800.spi > unbind
How would you go about fixing this? This was not on a Rpi 4 but in
premise the same problem exists there.
Thanks!
[ 229.746516]
==================================================================
[ 229.754013] BUG: KASAN: use-after-free in bcm2835_dma_release+0x2c/0x260
[ 229.760820] Read of size 4 at addr e0f08358 by task reboot/157
[ 229.766727]
[ 229.768302] CPU: 0 PID: 157 Comm: reboot Not tainted
5.9.0-gdf4dd84a3f7d #27
[ 229.775445] Hardware name: Broadcom STB (Flattened Device Tree)
[ 229.781448] Backtrace:
[ 229.784017] [<c02120b4>] (dump_backtrace) from [<c02123d8>]
(show_stack+0x20/0x24)
[ 229.791738] r9:ffffffff r8:00000080 r7:c298e3c0 r6:400f0093
r5:00000000 r4:c298e3c0
[ 229.799655] [<c02123b8>] (show_stack) from [<c08852a0>]
(dump_stack+0xbc/0xe0)
[ 229.807050] [<c08851e4>] (dump_stack) from [<c04522bc>]
(print_address_description.constprop.3+0x3c/0x4b0)
[ 229.816863] r10:c2b771c0 r9:e46d9848 r8:e46d9854 r7:00000000
r6:c0b3ea3c r5:eeea5940
[ 229.824815] r4:e0f08358 r3:00000100
[ 229.828510] [<c0452280>] (print_address_description.constprop.3) from
[<c0452944>] (kasan_report+0x15c/0x178)
[ 229.838575] r8:e46d9854 r7:00000000 r6:c0b3ea3c r5:0000009d r4:e0f08358
[ 229.845411] [<c04527e8>] (kasan_report) from [<c0452f24>]
(__asan_load4+0x6c/0xbc)
[ 229.853109] r7:e0f08380 r6:e0f08000 r5:e0f08358 r4:e0f08380
[ 229.858898] [<c0452eb8>] (__asan_load4) from [<c0b3ea3c>]
(bcm2835_dma_release+0x2c/0x260)
[ 229.867318] [<c0b3ea10>] (bcm2835_dma_release) from [<c0b3ecd8>]
(bcm2835_spi_remove+0x68/0xa4)
[ 229.876166] r9:e46d9848 r8:e46d9854 r7:e0f083c0 r6:00000000
r5:e0f08000 r4:e0f08380
[ 229.884069] [<c0b3ec70>] (bcm2835_spi_remove) from [<c0b3ed30>]
(bcm2835_spi_shutdown+0x1c/0x38)
[ 229.892991] r7:c2fc7f40 r6:e46d9810 r5:c2a1d854 r4:e46d9800
[ 229.898788] [<c0b3ed14>] (bcm2835_spi_shutdown) from [<c0a17010>]
(platform_drv_shutdown+0x40/0x44)
[ 229.907958] r5:c2a1d854 r4:e46d9810
[ 229.911653] [<c0a16fd0>] (platform_drv_shutdown) from [<c0a0f91c>]
(device_shutdown+0x248/0x35c)
[ 229.920561] r5:e465b810 r4:e46d9814
[ 229.924255] [<c0a0f6d4>] (device_shutdown) from [<c0269418>]
(kernel_restart_prepare+0x4c/0x50)
[ 229.933103] r10:01234567 r9:fee1dead r8:dfdb3f60 r7:c2835240
r6:c2806d48 r5:00000000
[ 229.941045] r4:c2806d40
[ 229.943675] [<c02693cc>] (kernel_restart_prepare) from [<c0269528>]
(kernel_restart+0x1c/0x60)
[ 229.952405] r5:00000000 r4:00000000
[ 229.956084] [<c026950c>] (kernel_restart) from [<c0269810>]
(__do_sys_reboot+0x148/0x260)
[ 229.964380] r5:00000000 r4:bafb67c0
[ 229.968057] [<c02696c8>] (__do_sys_reboot) from [<c0269998>]
(sys_reboot+0x18/0x1c)
[ 229.975852] r10:00000058 r9:dfdb0000 r8:c0200228 r7:00000058
r6:00000000 r5:00000004
[ 229.983792] r4:00000002
[ 229.986422] [<c0269980>] (sys_reboot) from [<c0200060>]
(ret_fast_syscall+0x0/0x2c)
[ 229.994190] Exception stack(0xdfdb3fa8 to 0xdfdb3ff0)
[ 229.999350] 3fa0: 00000002 00000004 fee1dead
28121969 01234567 000a9864
[ 230.007669] 3fc0: 00000002 00000004 00000000 00000058 00000000
00000000 aedbe000 00000000
[ 230.015974] 3fe0: aecce8f0 b6a81cec 000982d4 aecce910
[ 230.021095]
[ 230.022636] Allocated by task 20:
[ 230.026039] kasan_save_stack+0x24/0x48
[ 230.029962] __kasan_kmalloc.constprop.1+0xb8/0xc4
[ 230.034842] kasan_kmalloc+0x10/0x14
[ 230.038495] __kmalloc+0x168/0x2f4
[ 230.041976] __spi_alloc_controller+0x30/0xc0
[ 230.046421] bcm2835_spi_probe+0x90/0x4cc
[ 230.050514] platform_drv_probe+0x70/0xc8
[ 230.054612] really_probe+0x184/0x728
[ 230.058361] driver_probe_device+0xa4/0x278
[ 230.062637] __device_attach_driver+0xe8/0x148
[ 230.067169] bus_for_each_drv+0x108/0x158
[ 230.071267] __device_attach+0x190/0x234
[ 230.075279] device_initial_probe+0x1c/0x20
[ 230.079551] bus_probe_device+0xdc/0xec
[ 230.083475] deferred_probe_work_func+0xd4/0x11c
[ 230.088196] process_one_work+0x420/0x8f0
[ 230.092293] worker_thread+0x4fc/0x91c
[ 230.096127] kthread+0x21c/0x22c
[ 230.099427] ret_from_fork+0x14/0x20
[ 230.103075] 0x0
[ 230.104957]
[ 230.106496] Freed by task 157:
[ 230.109627] kasan_save_stack+0x24/0x48
[ 230.113542] kasan_set_track+0x30/0x38
[ 230.117375] kasan_set_free_info+0x28/0x34
[ 230.121553] __kasan_slab_free+0x110/0x144
[ 230.125732] kasan_slab_free+0x14/0x18
[ 230.129556] kfree+0xbc/0x2b8
[ 230.132597] spi_controller_release+0x18/0x1c
[ 230.137037] device_release+0x4c/0xf0
[ 230.140781] kobject_put+0x14c/0x2d8
[ 230.144434] device_unregister+0x44/0x84
[ 230.148438] spi_unregister_controller+0xcc/0x124
[ 230.153233] bcm2835_spi_remove+0x5c/0xa4
[ 230.157328] bcm2835_spi_shutdown+0x1c/0x38
[ 230.161593] platform_drv_shutdown+0x40/0x44
[ 230.165949] device_shutdown+0x248/0x35c
[ 230.169953] kernel_restart_prepare+0x4c/0x50
[ 230.174391] kernel_restart+0x1c/0x60
[ 230.178131] __do_sys_reboot+0x148/0x260
[ 230.182132] sys_reboot+0x18/0x1c
[ 230.185519] ret_fast_syscall+0x0/0x2c
[ 230.189335] 0xb6a81cec
[ 230.191829]
[ 230.193380] The buggy address belongs to the object at e0f08000
[ 230.193380] which belongs to the cache kmalloc-2k of size 2048
[ 230.205354] The buggy address is located 856 bytes inside of
[ 230.205354] 2048-byte region [e0f08000, e0f08800)
[ 230.215907] The buggy address belongs to the page:
[ 230.220806] page:b990e388 refcount:1 mapcount:0 mapping:00000000
index:0x0 pfn:0x20f08
[ 230.228841] head:b990e388 order:3 compound_mapcount:0 compound_pincount:0
[ 230.235731] flags: 0x2010200(slab|head)
[ 230.239688] raw: 02010200 00000000 00000100 00000122 e4401800
00000000 80080008 00000000
[ 230.247895] raw: ffffffff 00000001
[ 230.251358] page dumped because: kasan: bad access detected
[ 230.257000]
[ 230.258534] Memory state around the buggy address:
[ 230.263412] e0f08200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.270038] e0f08280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.276662] >e0f08300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.283272] ^
[ 230.288759] e0f08380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.295384] e0f08400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 230.301992]
==================================================================
[ 230.309311] Disabling lock debugging due to kernel taint
[ 230.325568] reboot: Restarting system
--
Florian
On Tue, Oct 13, 2020 at 04:48:42PM -0700, Florian Fainelli wrote:
> With KASAN now working on ARM 32-bit, I was able to get the following
> trace upon reboot which invokes bcm2835_spi_shutdown() calling
> bcm2835_spi_remove(), the same can be triggered by doing a driver unbind:
Thank you for the report. Apparently the problem is that
spi_unregister_controller() drops the last ref on the controller,
causing it to be freed, and afterwards we access the controller's
private data, which is part of the same allocation as
struct spi_controller:
bcm2835_spi_remove()
spi_unregister_controller()
device_unregister()
put_device()
spi_controller_release() # spi_master_class.dev_release()
kfree(ctlr)
bcm2835_dma_release(ctlr, bs)
...
However, when I submitted commit 9dd277ff92d0, I double-checked that
the kfree() happens after bcm2835_spi_remove() has finished and I
even wrote in the commit message:
"Note that
the struct spi_controller as well as the driver-private data are not
freed until after bcm2835_spi_remove() has finished, so accessing them
is safe."
I'm puzzled now that it doesn't work as intended. I do not see any
recent commits which changed the behavior, so I must have made a
mistake and missed something.
The below patch should fix the issue. Could you verify that?
Unfortunately I do not have access to a RasPi currently.
An alternative to this patch would be a devm function which acquires
a ref on the spi controller on ->probe() and automatically releases it
after ->remove() has finished. This could be used by other SPI drivers
as well.
Thanks,
Lukas
-- >8 --
diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 41986ac..5254fda 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -1377,6 +1377,7 @@ static int bcm2835_spi_remove(struct platform_device *pdev)
bcm2835_debugfs_remove(bs);
+ spi_controller_get(ctlr);
spi_unregister_controller(ctlr);
bcm2835_dma_release(ctlr, bs);
@@ -1386,6 +1387,7 @@ static int bcm2835_spi_remove(struct platform_device *pdev)
BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);
clk_disable_unprepare(bs->clk);
+ spi_controller_put(ctlr);
return 0;
}
On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
> Apparently the problem is that spi_unregister_controller() drops the
> last ref on the controller, causing it to be freed, and afterwards we
> access the controller's private data, which is part of the same
> allocation as struct spi_controller:
>
> bcm2835_spi_remove()
> spi_unregister_controller()
> device_unregister()
> put_device()
> spi_controller_release() # spi_master_class.dev_release()
> kfree(ctlr)
> bcm2835_dma_release(ctlr, bs)
> ...
Also see these threads:
https://lore.kernel.org/linux-spi/[email protected]/T/#t
https://lore.kernel.org/linux-spi/[email protected]/T/#u
And here's how _not_ to fix it:
https://lore.kernel.org/linux-spi/[email protected]/T/#t
At least without some care to not break other things:
https://lore.kernel.org/linux-spi/[email protected]/T/#t
On 10/14/20 1:25 PM, Mark Brown wrote:
> On Wed, Oct 14, 2020 at 10:40:35PM +0300, Vladimir Oltean wrote:
>> On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
>
>>> Apparently the problem is that spi_unregister_controller() drops the
>>> last ref on the controller, causing it to be freed, and afterwards we
>>> access the controller's private data, which is part of the same
>>> allocation as struct spi_controller:
>
>>> bcm2835_spi_remove()
>>> spi_unregister_controller()
>>> device_unregister()
>>> put_device()
>>> spi_controller_release() # spi_master_class.dev_release()
>>> kfree(ctlr)
>>> bcm2835_dma_release(ctlr, bs)
>
>> Also see these threads:
>> https://lore.kernel.org/linux-spi/[email protected]/T/#t
>> https://lore.kernel.org/linux-spi/[email protected]/T/#u
>
> Right, the proposed patch is yet another way to fix the issue - it all
> comes back to the fact that you shouldn't be using the driver data after
> unregistering if it was allocated as part of allocating the controller.
> This framework feature is unfortunately quite error prone.
Lukas, your patch works fine for me and is only two lines, so maybe
better suited for stable. How about the attached patch?
--
Florian
On Wed, Oct 14, 2020 at 10:40:35PM +0300, Vladimir Oltean wrote:
> On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
> > Apparently the problem is that spi_unregister_controller() drops the
> > last ref on the controller, causing it to be freed, and afterwards we
> > access the controller's private data, which is part of the same
> > allocation as struct spi_controller:
> > bcm2835_spi_remove()
> > spi_unregister_controller()
> > device_unregister()
> > put_device()
> > spi_controller_release() # spi_master_class.dev_release()
> > kfree(ctlr)
> > bcm2835_dma_release(ctlr, bs)
> Also see these threads:
> https://lore.kernel.org/linux-spi/[email protected]/T/#t
> https://lore.kernel.org/linux-spi/[email protected]/T/#u
Right, the proposed patch is yet another way to fix the issue - it all
comes back to the fact that you shouldn't be using the driver data after
unregistering if it was allocated as part of allocating the controller.
This framework feature is unfortunately quite error prone.
[cc += Sascha]
On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:
> > On Wed, Oct 14, 2020 at 04:09:12PM +0200, Lukas Wunner wrote:
> > > Apparently the problem is that spi_unregister_controller() drops the
> > > last ref on the controller, causing it to be freed, and afterwards we
> > > access the controller's private data, which is part of the same
> > > allocation as struct spi_controller:
>
> Right, the proposed patch is yet another way to fix the issue - it all
> comes back to the fact that you shouldn't be using the driver data after
> unregistering if it was allocated as part of allocating the controller.
> This framework feature is unfortunately quite error prone.
How about holding a ref on the controller as long as the SPI driver
is bound to the controller's parent device? See below for a patch,
compile-tested only for lack of an SPI-equipped machine.
Makes sense or dumb idea?
If this approach is deemed to be a case of "midlayer fallacy",
we could alternatively do this in a library function which drivers
opt-in to. Or, given that the majority of drivers seems to be affected,
make it the default and allow drivers to opt-out.
-- >8 --
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 0cab239..5afa275 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2399,6 +2399,11 @@ static ssize_t slave_store(struct device *dev, struct device_attribute *attr,
extern struct class spi_slave_class; /* dummy */
#endif
+static void __spi_controller_put(void *ctlr)
+{
+ spi_controller_put(ctlr);
+}
+
/**
* __spi_alloc_controller - allocate an SPI master or slave controller
* @dev: the controller, possibly using the platform_bus
@@ -2414,6 +2419,7 @@ static ssize_t slave_store(struct device *dev, struct device_attribute *attr,
* This call is used only by SPI controller drivers, which are the
* only ones directly touching chip registers. It's how they allocate
* an spi_controller structure, prior to calling spi_register_controller().
+ * The structure is accessible as long as the SPI driver is bound to @dev.
*
* This must be called from context that can sleep.
*
@@ -2429,6 +2435,7 @@ struct spi_controller *__spi_alloc_controller(struct device *dev,
{
struct spi_controller *ctlr;
size_t ctlr_size = ALIGN(sizeof(*ctlr), dma_get_cache_alignment());
+ int ret;
if (!dev)
return NULL;
@@ -2449,6 +2456,13 @@ struct spi_controller *__spi_alloc_controller(struct device *dev,
pm_suspend_ignore_children(&ctlr->dev, true);
spi_controller_set_devdata(ctlr, (void *)ctlr + ctlr_size);
+ spi_controller_get(ctlr);
+ ret = devm_add_action(dev, __spi_controller_put, ctlr);
+ if (ret) {
+ kfree(ctlr);
+ return NULL;
+ }
+
return ctlr;
}
EXPORT_SYMBOL_GPL(__spi_alloc_controller);
On Thu, Oct 15, 2020 at 07:38:29AM +0200, Lukas Wunner wrote:
> On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:
> > Right, the proposed patch is yet another way to fix the issue - it all
> > comes back to the fact that you shouldn't be using the driver data after
> > unregistering if it was allocated as part of allocating the controller.
> > This framework feature is unfortunately quite error prone.
> How about holding a ref on the controller as long as the SPI driver
> is bound to the controller's parent device? See below for a patch,
> compile-tested only for lack of an SPI-equipped machine.
> Makes sense or dumb idea?
> If this approach is deemed to be a case of "midlayer fallacy",
> we could alternatively do this in a library function which drivers
> opt-in to. Or, given that the majority of drivers seems to be affected,
> make it the default and allow drivers to opt-out.
...
> + spi_controller_get(ctlr);
> + ret = devm_add_action(dev, __spi_controller_put, ctlr);
> + if (ret) {
> + kfree(ctlr);
This feels a bit icky - we're masking a standard use after free bug that
affects devm in general, not just this instance, and so while it will
work it doesn't feel great. If we did do this it'd need more comments
and should probably be conditional on using the feature. TBH I'm just
thinking it's better to just remove the feature, it's clearly error
prone and pretty redundant with devm. I'm not sure any memory savings
it's delivering are worth the sharp edges.
On Wed, Oct 14, 2020 at 02:20:16PM -0700, Florian Fainelli wrote:
> In bcm2835_spi_remove(), spi_controller_unregister() will free the ctlr
> reference which will lead to an use after free in bcm2835_release_dma().
>
> To avoid this use after free, allocate the bcm2835_spi structure with a
> different lifecycle than the spi_controller structure such that we
> unregister the SPI controller, free up all the resources and finally let
> device managed allocations free the bcm2835_spi structure.
[...]
> - if (ctlr->dma_tx) {
> - dmaengine_terminate_sync(ctlr->dma_tx);
> + if (dma_tx) {
> + dmaengine_terminate_sync(dma_tx);
>
> if (bs->fill_tx_desc)
> dmaengine_desc_free(bs->fill_tx_desc);
>
> if (bs->fill_tx_addr)
> - dma_unmap_page_attrs(ctlr->dma_tx->device->dev,
> + dma_unmap_page_attrs(dma_tx->device->dev,
> bs->fill_tx_addr, sizeof(u32),
> DMA_TO_DEVICE,
> DMA_ATTR_SKIP_CPU_SYNC);
>
> - dma_release_channel(ctlr->dma_tx);
> - ctlr->dma_tx = NULL;
> + dma_release_channel(dma_tx);
> }
You must set ctlr->dma_tx and ctlr->dma_rx to NULL because the driver
checks their value in a couple of places.
E.g. bcm2835_spi_setup() checks ctlr->dma_rx.
Likewise, the error paths of bcm2835_dma_init() and bcm2835_spi_probe()
call bcm2835_dma_release() and the latter checks ctlr->dma_tx and
ctlr->dma_rx to determine whether DMA was set up, hence needs to be
torn down.
> + bs = devm_kzalloc(&pdev->dev, sizeof(*bs), GFP_KERNEL);
> + if (!bs)
> + return -ENOMEM;
> +
> ctlr = spi_alloc_master(&pdev->dev, ALIGN(sizeof(*bs),
> dma_get_cache_alignment()));
You can set the second argument to spi_alloc_master() to 0
to conserve memory.
Thanks,
Lukas
On Thu, Oct 15, 2020 at 01:53:35PM +0100, Mark Brown wrote:
> On Thu, Oct 15, 2020 at 07:38:29AM +0200, Lukas Wunner wrote:
> > On Wed, Oct 14, 2020 at 09:25:05PM +0100, Mark Brown wrote:
> > How about holding a ref on the controller as long as the SPI driver
> > is bound to the controller's parent device? See below for a patch,
> > compile-tested only for lack of an SPI-equipped machine.
[...]
> > + spi_controller_get(ctlr);
> > + ret = devm_add_action(dev, __spi_controller_put, ctlr);
> > + if (ret) {
> > + kfree(ctlr);
>
> This feels a bit icky - we're masking a standard use after free bug that
> affects devm in general, not just this instance, and so while it will
> work it doesn't feel great. If we did do this it'd need more comments
> and should probably be conditional on using the feature. TBH I'm just
> thinking it's better to just remove the feature, it's clearly error
> prone and pretty redundant with devm. I'm not sure any memory savings
> it's delivering are worth the sharp edges.
A combined memory allocation for struct spi_controller and the private
data has more benefits than just memory savings: Having them adjacent
in memory reduces the chance of cache misses. Also, one can get from
one struct to the other with a cheap subtraction (using container_of())
instead of having to chase pointers. So it helps performance. And a
lack of pointers arguably helps security.
Most subsystems embed the controller struct in the private data, but
there *is* precedence for doing it the other way round. E.g. the IIO
subsystem likewise appends the private data to the controller struct.
So I think that's fine, it need not and should not be changed.
The problem is that the ->probe() and ->remove() code is currently
asymmetric, which is unintuitive: On ->probe(), there's an allocation
step and a registration step:
spi_alloc_master()
spi_register_controller()
Whereas on ->remove(), there's no step to free the master which would
balance the prior alloc step:
spi_unregister_controller()
That's because the spi_controller struct is ref-counted and the last
ref is usually dropped by spi_unregister_controller(). If the private
data is accessed after the spi_unregister_controller() step, a ref
needs to be held.
I maintain that it would be more intuitive to automatically hold a ref.
We could offer a devm_spi_alloc_master() function which holds this ref
and automatically releases it on unbind.
There are three drivers which call spi_alloc_master() with a size of zero
for the private data. In these three cases it is fine to free the
spi_controller struct upon spi_unregister_controller(). So these drivers
can continue to use spi_alloc_master(). All other drivers could be
changed to use the new devm_spi_alloc_master(), or I could scrutinize
each of them and convert to the new function only if necessary.
Does this sound more convincing to you?
Thanks,
Lukas
On Wed, Oct 28, 2020 at 10:59:46AM +0100, Lukas Wunner wrote:
> On Thu, Oct 15, 2020 at 01:53:35PM +0100, Mark Brown wrote:
> > This feels a bit icky - we're masking a standard use after free bug that
> > affects devm in general, not just this instance, and so while it will
> > work it doesn't feel great. If we did do this it'd need more comments
> A combined memory allocation for struct spi_controller and the private
> data has more benefits than just memory savings: Having them adjacent
> in memory reduces the chance of cache misses. Also, one can get from
> one struct to the other with a cheap subtraction (using container_of())
> instead of having to chase pointers. So it helps performance. And a
> lack of pointers arguably helps security.
The performance arguments don't seem super compelling either way TBH
given what SPI does, cache misses accessing the private data seem
unlikely to be perceptible when operations boil down to accesses on the
SPI bus.
> Most subsystems embed the controller struct in the private data, but
> there *is* precedence for doing it the other way round. E.g. the IIO
> subsystem likewise appends the private data to the controller struct.
> So I think that's fine, it need not and should not be changed.
Given their ages I suspect IIO copied SPI; I do think it's this reversal
that's confusing things.
> The problem is that the ->probe() and ->remove() code is currently
> asymmetric, which is unintuitive: On ->probe(), there's an allocation
> step and a registration step:
> spi_alloc_master()
> spi_register_controller()
> Whereas on ->remove(), there's no step to free the master which would
> balance the prior alloc step:
> spi_unregister_controller()
> That's because the spi_controller struct is ref-counted and the last
> ref is usually dropped by spi_unregister_controller(). If the private
> data is accessed after the spi_unregister_controller() step, a ref
> needs to be held.
I agree that it's the asymmetry here, the disagreement is about how to
fix it. If we keep the allocations combined then that probably makes
sense but I'm at best unclear on the merit of keeping the allocations
combined.
> I maintain that it would be more intuitive to automatically hold a ref.
> We could offer a devm_spi_alloc_master() function which holds this ref
> and automatically releases it on unbind.
I don't know that it's super intuitive to have to have an explicit free
in the driver - you could equally expect that having registered the
thing allocated with the core's custom allocation function with the core
that the core is now taking ownership of it (which is how SPI devices as
opposed to controllers work). That's what makes me lean towards just
doing separate allocations, there's no possibility of expectations about
transferring ownership. If it's *always* done with devm it kind of gets
hidden though so perhaps it's not so bad and my concern goes away...
> There are three drivers which call spi_alloc_master() with a size of zero
> for the private data. In these three cases it is fine to free the
> spi_controller struct upon spi_unregister_controller(). So these drivers
> can continue to use spi_alloc_master(). All other drivers could be
> changed to use the new devm_spi_alloc_master(), or I could scrutinize
> each of them and convert to the new function only if necessary.
It's only things that explicitly unregister the controller (rather than
using devm) that are going to be affected here, that's a *much* smaller
subset. Everything else will be done with driver specific code and
hence private data usage before the controler goes away, though it looks
like a bunch (though not all) of them have other issues and are using
devm when they shouldn't. That's a separate problem which ought to be
fixed anyway though. The removal paths aren't exactly heavily stressed,
especially not under load.
In any case for your proposal your plan makes sense, I mostly just want
to avoid ending up with people getting confused in the other direction
and introducing another set of bugs.