2013-10-04 01:40:40

by Fengguang Wu

[permalink] [raw]
Subject: [DMA-API] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248

Hi Russell,

I got the below dmesg and the first bad commit is

commit 2713c99438b00d67b4bd88eaf9713c8645c8daf7
Author: Russell King <[email protected]>
Date: Thu Jun 27 14:14:43 2013 +0100

DMA-API: dcdbas: update DMA mask handing

dcdbas was explicitly initializing DMA masks thusly:
dcdbas_pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
dcdbas_pdev->dev.dma_mask = &dcdbas_pdev->dev.coherent_dma_mask;
which bypasses the architecture check. Moreover, it is creating the
dcdbas_pdev device itself, and using the platform_device_register_full()
avoids some of this explicit initialization.

Convert the driver to use platform_device_register_full(), and as it
makes use of coherent DMA, also call dma_set_coherent_mask() to ensure
that the architecture gets to check the mask.

Signed-off-by: Russell King <[email protected]>

This BUG does not show up in upstream and linux-next, so either the
commit has not been merged or has been fixed somewhere.

[ 267.537083] sdhci-pltfm: SDHCI platform and OF driver helper
[ 267.602219] ledtrig-cpu: registered to indicate activity on CPUs
[ 267.656654] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248
[ 267.656689] IP: [<ffffffff810073c9>] dma_supported+0x9/0xa0
[ 267.656689] PGD 0
[ 267.656689] Oops: 0000 [#1] PREEMPT
[ 267.656689] CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.0-rc2-00154-g12c6060 #3
[ 267.656689] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 267.656689] task: ffff88000d868000 ti: ffff88000d864000 task.ti: ffff88000d864000
[ 267.656689] RIP: 0010:[<ffffffff810073c9>] [<ffffffff810073c9>] dma_supported+0x9/0xa0
[ 267.656689] RSP: 0000:ffff88000d865cb0 EFLAGS: 00000202
[ 267.656689] RAX: ffffffff814c0b90 RBX: 00000000fffffffb RCX: 0000000000000001
[ 267.656689] RDX: 0000000000000780 RSI: 00000000ffffffff RDI: 0000000000000010
[ 267.656689] RBP: ffff88000d865cb0 R08: 0000000000000001 R09: 0000000000000001
[ 267.656689] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 267.656689] R13: ffff88000f5eb400 R14: ffffffff81c85240 R15: 0000000000000000
[ 267.656689] FS: 0000000000000000(0000) GS:ffffffff81a6b000(0000) knlGS:0000000000000000
[ 267.656689] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 267.656689] CR2: 0000000000000248 CR3: 0000000001a5c000 CR4: 00000000000006b0
[ 267.656689] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 267.656689] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 267.656689] Stack:
[ 267.656689] ffff88000d865cd8 ffffffff816878d6 ffff88000f5eb410 ffffffff81d00128
[ 267.656689] ffff88000f5eb400 ffff88000d865d08 ffffffff814c0bce ffffffff814bf06e
[ 267.656689] ffff88000f5eb410 ffff88000f5eb410 ffffffff81d00128 ffff88000d865d30
[ 267.656689] Call Trace:
[ 267.656689] [<ffffffff816878d6>] dcdbas_probe+0x46/0xb0
[ 267.656689] [<ffffffff814c0bce>] platform_drv_probe+0x3e/0x70
[ 267.656689] [<ffffffff814bf06e>] ? driver_sysfs_add+0x6e/0xa0
[ 267.656689] [<ffffffff814bf495>] really_probe+0xc5/0x1e0
[ 267.656689] [<ffffffff814bf5d0>] ? driver_probe_device+0x20/0x20
[ 267.656689] [<ffffffff814bf625>] __device_attach+0x55/0x70
[ 267.656689] [<ffffffff814bdc6e>] bus_for_each_drv+0x5e/0xb0
[ 267.656689] [<ffffffff814bf328>] device_attach+0x78/0x90
[ 267.656689] [<ffffffff814bdee5>] bus_probe_device+0x35/0xd0
[ 267.656689] [<ffffffff814bbafd>] device_add+0x4ad/0x720
[ 267.656689] [<ffffffff814c11f0>] platform_device_add+0x180/0x210
[ 267.656689] [<ffffffff814c14a0>] platform_device_register_full+0xb0/0x110
[ 267.656689] [<ffffffff81d57880>] ? dcdrbu_init+0x15a/0x15a
[ 267.656689] [<ffffffff81d578a6>] dcdbas_init+0x26/0x51
[ 267.656689] [<ffffffff81d23e62>] do_one_initcall+0x7d/0x115
[ 267.656689] [<ffffffff81d2403d>] kernel_init_freeable+0x143/0x1cf
[ 267.656689] [<ffffffff81d23831>] ? do_early_param+0x8a/0x8a
[ 267.656689] [<ffffffff8174da90>] ? rest_init+0xc0/0xc0
[ 267.656689] [<ffffffff8174da99>] kernel_init+0x9/0x170
[ 267.656689] [<ffffffff817623ca>] ret_from_fork+0x7a/0xb0
[ 267.656689] [<ffffffff8174da90>] ? rest_init+0xc0/0xc0
[ 267.656689] Code: 0c 48 0f bd c6 8d 70 01 e8 05 24 0f 00 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 ff 48 89 e5 74 0c <48> 8b 87 38 02 00 00 48 85 c0 75 0b 48 8b 05 3c e0 a5 00 0f 1f
[ 267.656689] RIP [<ffffffff810073c9>] dma_supported+0x9/0xa0
[ 267.656689] RSP <ffff88000d865cb0>
[ 267.656689] CR2: 0000000000000248
[ 267.751209] ---[ end trace a1914743e6b14bdd ]---

git bisect start 12c6060d1380098bb69fad0c26200557d0763355 4a10c2ac2f368583138b774ca41fac4207911983 --
git bisect good 5dd16df0949a73a3dfd7bfb976d68d6ba2e0676e # 07:52 60+ Merge remote-tracking branch 'rcu/rcu/fixes' into kbuild_tmp
git bisect good 8706fffa0da105de430ae2492c37348c076140fa # 11:41 60+ DMA-API: usb: use new dma_coerce_mask_and_coherent()
git bisect good f516e2c9ecdcc717d25fe1533805a7960310c186 # 12:49 60+ Merge branch 'bnx2x'
git bisect bad 55871ea05974c2bbd3082bbae95448eea777873a # 12:54 0- ARM: DMA-API: better handing of DMA masks for coherent allocations
git bisect good 067f3c33c61816c4cb34f91f1a73dbd86fe6f867 # 17:37 60+ DMA-API: crypto: remove last references to 'static struct device *dev'
git bisect good cf792a24d10b66d903549fe911084fda0cc78bb9 # 18:46 60+ DMA-API: dma: edma.c: no need to explicitly initialize DMA masks
git bisect bad aff71e59b441bcef245d86e82903950d5581a3aa # 18:55 0- DMA-API: firmware/google/gsmi.c: avoid direct access to DMA masks
git bisect bad 2713c99438b00d67b4bd88eaf9713c8645c8daf7 # 18:57 0- DMA-API: dcdbas: update DMA mask handing
git bisect good cf792a24d10b66d903549fe911084fda0cc78bb9 # 00:20 180+ DMA-API: dma: edma.c: no need to explicitly initialize DMA masks
git bisect bad 12c6060d1380098bb69fad0c26200557d0763355 # 00:20 0- Merge remote-tracking branch 'arm-soc/to-build' into kbuild_tmp
git bisect good f1ee84840f837a5939f40cf07c3e6bceeb2a5432 # 05:48 180+ Revert "DMA-API: dcdbas: update DMA mask handing"
git bisect good 6d15ee492809d38bd62237b6d0f6a81d4dd12d15 # 06:22 180+ Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good a0cf1abc25ac197dd97b857c0f6341066a8cb1cf # 09:25 180+ Add linux-next specific files for 20130927

Thanks,
Fengguang


Attachments:
(No filename) (6.28 kB)
dmesg-quantal-ant-3:20130924081102:3.12.0-rc2-00154-g12c6060:3 (41.85 kB)
bisect-12c6060d1380098bb69fad0c26200557d0763355-x86_64-randconfig-c8-0924-BUG:-unable-to-handle-kernel-NULL-pointer-dereference-at-90445.log (31.87 kB)
config-3.12.0-rc2-00154-g12c6060 (72.70 kB)
Download all attachments

2013-10-04 10:41:51

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [DMA-API] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248

On Fri, Oct 04, 2013 at 09:40:17AM +0800, Fengguang Wu wrote:
> This BUG does not show up in upstream and linux-next, so either the
> commit has not been merged or has been fixed somewhere.

That's because I haven't pushed it out into linux-next yet. Thanks
for testing nevertheless.

> [ 267.537083] sdhci-pltfm: SDHCI platform and OF driver helper
> [ 267.602219] ledtrig-cpu: registered to indicate activity on CPUs
> [ 267.656654] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248
> [ 267.656689] IP: [<ffffffff810073c9>] dma_supported+0x9/0xa0

This seems to be saying that 'dev' was approximately a NULL pointer.

This is a wonderfully written driver this is... it's not really a
platform driver at all. It stores the platform device into the global
dcdbas_pdev, and then references it from within the driver. The problem
is caused by that happening before platform_device_register_full() has
returned and stored the platform device pointer. It's use of a
platform device is just to be able to have some sysfs attributes which
it can play around with (I wonder if anyone has reviewed what it does
with these...)

You can also find goodies like this here:

/*
* We have to free the buffer here instead of dcdbas_remove
* because only in module exit function we can be sure that
* all sysfs attributes belonging to this module have been
* released.
*/
smi_data_buf_free();
platform_device_unregister(dcdbas_pdev);
platform_driver_unregister(&dcdbas_driver);

which is quite a statement in itself, and if it's thought about, how
can putting smi_data_buf_free() before platform_device_unregister()
here satisfy that comment.

Well, short of dropping the patch, I think the easiest solution here is
this, which annoyingly makes the mess slightly worse:

diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
index a85fda2..0a751bf 100644
--- a/drivers/firmware/dcdbas.c
+++ b/drivers/firmware/dcdbas.c
@@ -545,6 +545,8 @@ static int dcdbas_probe(struct platform_device *dev)
host_control_action = HC_ACTION_NONE;
host_control_smi_type = HC_SMITYPE_NONE;

+ dcdbas_pdev = dev;
+
/*
* BIOS SMI calls require buffer addresses be in 32-bit address space.
* This is done by setting the DMA mask below.
@@ -588,6 +590,8 @@ static const struct platform_device_info dcdbas_dev_info __initdata = {
.dma_mask = DMA_BIT_MASK(32),
};

+static struct platform_device *dcdbas_pdev_reg;
+
/**
* dcdbas_init: initialize driver
*/
@@ -599,9 +603,9 @@ static int __init dcdbas_init(void)
if (error)
return error;

- dcdbas_pdev = platform_device_register_full(&dcdbas_dev_info);
- if (IS_ERR(dcdbas_pdev)) {
- error = PTR_ERR(dcdbas_pdev);
+ dcdbas_pdev_reg = platform_device_register_full(&dcdbas_dev_info);
+ if (IS_ERR(dcdbas_pdev_reg)) {
+ error = PTR_ERR(dcdbas_pdev_reg);
goto err_unregister_driver;
}

@@ -629,8 +633,9 @@ static void __exit dcdbas_exit(void)
* all sysfs attributes belonging to this module have been
* released.
*/
- smi_data_buf_free();
- platform_device_unregister(dcdbas_pdev);
+ if (dcdbas_pdev)
+ smi_data_buf_free();
+ platform_device_unregister(dcdbas_pdev_reg);
platform_driver_unregister(&dcdbas_driver);
}

Another is to try and pull the direct references to dcdbas_pdev up the
call chains... that's easier said than done because there is a global
function in this driver - dcdbas_smi_request. The best I can get to
is the below, which leaves three direct uses of dcdbas_pdev:

diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
index a85fda2..d9d215c 100644
--- a/drivers/firmware/dcdbas.c
+++ b/drivers/firmware/dcdbas.c
@@ -58,15 +58,15 @@ static unsigned int host_control_on_shutdown;
/**
* smi_data_buf_free: free SMI data buffer
*/
-static void smi_data_buf_free(void)
+static void smi_data_buf_free(struct device *dev)
{
if (!smi_data_buf)
return;

- dev_dbg(&dcdbas_pdev->dev, "%s: phys: %x size: %lu\n",
+ dev_dbg(dev, "%s: phys: %x size: %lu\n",
__func__, smi_data_buf_phys_addr, smi_data_buf_size);

- dma_free_coherent(&dcdbas_pdev->dev, smi_data_buf_size, smi_data_buf,
+ dma_free_coherent(dev, smi_data_buf_size, smi_data_buf,
smi_data_buf_handle);
smi_data_buf = NULL;
smi_data_buf_handle = 0;
@@ -77,7 +77,7 @@ static void smi_data_buf_free(void)
/**
* smi_data_buf_realloc: grow SMI data buffer if needed
*/
-static int smi_data_buf_realloc(unsigned long size)
+static int smi_data_buf_realloc(struct device *dev, unsigned long size)
{
void *buf;
dma_addr_t handle;
@@ -89,10 +89,9 @@ static int smi_data_buf_realloc(unsigned long size)
return -EINVAL;

/* new buffer is needed */
- buf = dma_alloc_coherent(&dcdbas_pdev->dev, size, &handle, GFP_KERNEL);
+ buf = dma_alloc_coherent(dev, size, &handle, GFP_KERNEL);
if (!buf) {
- dev_dbg(&dcdbas_pdev->dev,
- "%s: failed to allocate memory size %lu\n",
+ dev_dbg(dev, "%s: failed to allocate memory size %lu\n",
__func__, size);
return -ENOMEM;
}
@@ -102,7 +101,7 @@ static int smi_data_buf_realloc(unsigned long size)
memcpy(buf, smi_data_buf, smi_data_buf_size);

/* free any existing buffer */
- smi_data_buf_free();
+ smi_data_buf_free(dev);

/* set up new buffer for use */
smi_data_buf = buf;
@@ -110,7 +109,7 @@ static int smi_data_buf_realloc(unsigned long size)
smi_data_buf_phys_addr = (u32) virt_to_phys(buf);
smi_data_buf_size = size;

- dev_dbg(&dcdbas_pdev->dev, "%s: phys: %x size: %lu\n",
+ dev_dbg(dev, "%s: phys: %x size: %lu\n",
__func__, smi_data_buf_phys_addr, smi_data_buf_size);

return 0;
@@ -141,7 +140,7 @@ static ssize_t smi_data_buf_size_store(struct device *dev,

/* make sure SMI data buffer is at least buf_size */
mutex_lock(&smi_data_lock);
- ret = smi_data_buf_realloc(buf_size);
+ ret = smi_data_buf_realloc(dev, buf_size);
mutex_unlock(&smi_data_lock);
if (ret)
return ret;
@@ -166,6 +165,7 @@ static ssize_t smi_data_write(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr,
char *buf, loff_t pos, size_t count)
{
+ struct device *dev = kobj_to_dev(kobj);
ssize_t ret;

if ((pos + count) > MAX_SMI_DATA_BUF_SIZE)
@@ -173,7 +173,7 @@ static ssize_t smi_data_write(struct file *filp, struct kobject *kobj,

mutex_lock(&smi_data_lock);

- ret = smi_data_buf_realloc(pos + count);
+ ret = smi_data_buf_realloc(dev, pos + count);
if (ret)
goto out;

@@ -199,7 +199,7 @@ static ssize_t host_control_action_store(struct device *dev,

/* make sure buffer is available for host control command */
mutex_lock(&smi_data_lock);
- ret = smi_data_buf_realloc(sizeof(struct apm_cmd));
+ ret = smi_data_buf_realloc(dev, sizeof(struct apm_cmd));
mutex_unlock(&smi_data_lock);
if (ret)
return ret;
@@ -347,7 +347,7 @@ EXPORT_SYMBOL(dcdbas_smi_request);
*
* Caller must set up the host control command in smi_data_buf.
*/
-static int host_control_smi(void)
+static int host_control_smi(struct device *dev)
{
struct apm_cmd *apm_cmd;
u8 *data;
@@ -426,7 +426,7 @@ static int host_control_smi(void)
break;

default:
- dev_dbg(&dcdbas_pdev->dev, "%s: invalid SMI type %u\n",
+ dev_dbg(dev, "%s: invalid SMI type %u\n",
__func__, host_control_smi_type);
return -ENOSYS;
}
@@ -443,7 +443,7 @@ static int host_control_smi(void)
* use smi_data_buf at this point because the system has finished
* shutting down and no userspace apps are running.
*/
-static void dcdbas_host_control(void)
+static void dcdbas_host_control(struct device *dev)
{
struct apm_cmd *apm_cmd;
u8 action;
@@ -455,13 +455,12 @@ static void dcdbas_host_control(void)
host_control_action = HC_ACTION_NONE;

if (!smi_data_buf) {
- dev_dbg(&dcdbas_pdev->dev, "%s: no SMI buffer\n", __func__);
+ dev_dbg(dev, "%s: no SMI buffer\n", __func__);
return;
}

if (smi_data_buf_size < sizeof(struct apm_cmd)) {
- dev_dbg(&dcdbas_pdev->dev, "%s: SMI buffer too small\n",
- __func__);
+ dev_dbg(dev, "%s: SMI buffer too small\n", __func__);
return;
}

@@ -472,12 +471,12 @@ static void dcdbas_host_control(void)
apm_cmd->command = ESM_APM_POWER_CYCLE;
apm_cmd->reserved = 0;
*((s16 *)&apm_cmd->parameters.shortreq.parm[0]) = (s16) 0;
- host_control_smi();
+ host_control_smi(dev);
} else if (action & HC_ACTION_HOST_CONTROL_POWERCYCLE) {
apm_cmd->command = ESM_APM_POWER_CYCLE;
apm_cmd->reserved = 0;
*((s16 *)&apm_cmd->parameters.shortreq.parm[0]) = (s16) 20;
- host_control_smi();
+ host_control_smi(dev);
}
}

@@ -495,7 +494,7 @@ static int dcdbas_reboot_notify(struct notifier_block *nb, unsigned long code,
/* firmware is going to perform host control action */
printk(KERN_WARNING "Please wait for shutdown "
"action to complete...\n");
- dcdbas_host_control();
+ dcdbas_host_control(&dcdbas_pdev->dev);
}
break;
}
@@ -545,11 +544,13 @@ static int dcdbas_probe(struct platform_device *dev)
host_control_action = HC_ACTION_NONE;
host_control_smi_type = HC_SMITYPE_NONE;

+ dcdbas_pdev = dev;
+
/*
* BIOS SMI calls require buffer addresses be in 32-bit address space.
* This is done by setting the DMA mask below.
*/
- error = dma_set_coherent_mask(&dcdbas_pdev->dev, DMA_BIT_MASK(32));
+ error = dma_set_coherent_mask(&dev->dev, DMA_BIT_MASK(32));
if (error)
return error;

@@ -588,6 +589,8 @@ static const struct platform_device_info dcdbas_dev_info __initdata = {
.dma_mask = DMA_BIT_MASK(32),
};

+static struct platform_device *dcdbas_pdev_reg;
+
/**
* dcdbas_init: initialize driver
*/
@@ -599,9 +602,9 @@ static int __init dcdbas_init(void)
if (error)
return error;

- dcdbas_pdev = platform_device_register_full(&dcdbas_dev_info);
- if (IS_ERR(dcdbas_pdev)) {
- error = PTR_ERR(dcdbas_pdev);
+ dcdbas_pdev_reg = platform_device_register_full(&dcdbas_dev_info);
+ if (IS_ERR(dcdbas_pdev_reg)) {
+ error = PTR_ERR(dcdbas_pdev_reg);
goto err_unregister_driver;
}

@@ -629,8 +632,9 @@ static void __exit dcdbas_exit(void)
* all sysfs attributes belonging to this module have been
* released.
*/
- smi_data_buf_free();
- platform_device_unregister(dcdbas_pdev);
+ if (dcdbas_pdev)
+ smi_data_buf_free(&dcdbas_pdev->dev);
+ platform_device_unregister(dcdbas_pdev_reg);
platform_driver_unregister(&dcdbas_driver);
}

2013-10-07 08:38:31

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [DMA-API] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248

Fengguang,

Have you done anything with this at all? If not, don't expect me to
sort this out, because I can't test this. Thanks.

On Fri, Oct 04, 2013 at 11:41:36AM +0100, Russell King - ARM Linux wrote:
> On Fri, Oct 04, 2013 at 09:40:17AM +0800, Fengguang Wu wrote:
> > This BUG does not show up in upstream and linux-next, so either the
> > commit has not been merged or has been fixed somewhere.
>
> That's because I haven't pushed it out into linux-next yet. Thanks
> for testing nevertheless.
>
> > [ 267.537083] sdhci-pltfm: SDHCI platform and OF driver helper
> > [ 267.602219] ledtrig-cpu: registered to indicate activity on CPUs
> > [ 267.656654] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248
> > [ 267.656689] IP: [<ffffffff810073c9>] dma_supported+0x9/0xa0
>
> This seems to be saying that 'dev' was approximately a NULL pointer.
>
> This is a wonderfully written driver this is... it's not really a
> platform driver at all. It stores the platform device into the global
> dcdbas_pdev, and then references it from within the driver. The problem
> is caused by that happening before platform_device_register_full() has
> returned and stored the platform device pointer. It's use of a
> platform device is just to be able to have some sysfs attributes which
> it can play around with (I wonder if anyone has reviewed what it does
> with these...)
>
> You can also find goodies like this here:
>
> /*
> * We have to free the buffer here instead of dcdbas_remove
> * because only in module exit function we can be sure that
> * all sysfs attributes belonging to this module have been
> * released.
> */
> smi_data_buf_free();
> platform_device_unregister(dcdbas_pdev);
> platform_driver_unregister(&dcdbas_driver);
>
> which is quite a statement in itself, and if it's thought about, how
> can putting smi_data_buf_free() before platform_device_unregister()
> here satisfy that comment.
>
> Well, short of dropping the patch, I think the easiest solution here is
> this, which annoyingly makes the mess slightly worse:
>
> diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
> index a85fda2..0a751bf 100644
> --- a/drivers/firmware/dcdbas.c
> +++ b/drivers/firmware/dcdbas.c
> @@ -545,6 +545,8 @@ static int dcdbas_probe(struct platform_device *dev)
> host_control_action = HC_ACTION_NONE;
> host_control_smi_type = HC_SMITYPE_NONE;
>
> + dcdbas_pdev = dev;
> +
> /*
> * BIOS SMI calls require buffer addresses be in 32-bit address space.
> * This is done by setting the DMA mask below.
> @@ -588,6 +590,8 @@ static const struct platform_device_info dcdbas_dev_info __initdata = {
> .dma_mask = DMA_BIT_MASK(32),
> };
>
> +static struct platform_device *dcdbas_pdev_reg;
> +
> /**
> * dcdbas_init: initialize driver
> */
> @@ -599,9 +603,9 @@ static int __init dcdbas_init(void)
> if (error)
> return error;
>
> - dcdbas_pdev = platform_device_register_full(&dcdbas_dev_info);
> - if (IS_ERR(dcdbas_pdev)) {
> - error = PTR_ERR(dcdbas_pdev);
> + dcdbas_pdev_reg = platform_device_register_full(&dcdbas_dev_info);
> + if (IS_ERR(dcdbas_pdev_reg)) {
> + error = PTR_ERR(dcdbas_pdev_reg);
> goto err_unregister_driver;
> }
>
> @@ -629,8 +633,9 @@ static void __exit dcdbas_exit(void)
> * all sysfs attributes belonging to this module have been
> * released.
> */
> - smi_data_buf_free();
> - platform_device_unregister(dcdbas_pdev);
> + if (dcdbas_pdev)
> + smi_data_buf_free();
> + platform_device_unregister(dcdbas_pdev_reg);
> platform_driver_unregister(&dcdbas_driver);
> }
>
> Another is to try and pull the direct references to dcdbas_pdev up the
> call chains... that's easier said than done because there is a global
> function in this driver - dcdbas_smi_request. The best I can get to
> is the below, which leaves three direct uses of dcdbas_pdev:
>
> diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
> index a85fda2..d9d215c 100644
> --- a/drivers/firmware/dcdbas.c
> +++ b/drivers/firmware/dcdbas.c
> @@ -58,15 +58,15 @@ static unsigned int host_control_on_shutdown;
> /**
> * smi_data_buf_free: free SMI data buffer
> */
> -static void smi_data_buf_free(void)
> +static void smi_data_buf_free(struct device *dev)
> {
> if (!smi_data_buf)
> return;
>
> - dev_dbg(&dcdbas_pdev->dev, "%s: phys: %x size: %lu\n",
> + dev_dbg(dev, "%s: phys: %x size: %lu\n",
> __func__, smi_data_buf_phys_addr, smi_data_buf_size);
>
> - dma_free_coherent(&dcdbas_pdev->dev, smi_data_buf_size, smi_data_buf,
> + dma_free_coherent(dev, smi_data_buf_size, smi_data_buf,
> smi_data_buf_handle);
> smi_data_buf = NULL;
> smi_data_buf_handle = 0;
> @@ -77,7 +77,7 @@ static void smi_data_buf_free(void)
> /**
> * smi_data_buf_realloc: grow SMI data buffer if needed
> */
> -static int smi_data_buf_realloc(unsigned long size)
> +static int smi_data_buf_realloc(struct device *dev, unsigned long size)
> {
> void *buf;
> dma_addr_t handle;
> @@ -89,10 +89,9 @@ static int smi_data_buf_realloc(unsigned long size)
> return -EINVAL;
>
> /* new buffer is needed */
> - buf = dma_alloc_coherent(&dcdbas_pdev->dev, size, &handle, GFP_KERNEL);
> + buf = dma_alloc_coherent(dev, size, &handle, GFP_KERNEL);
> if (!buf) {
> - dev_dbg(&dcdbas_pdev->dev,
> - "%s: failed to allocate memory size %lu\n",
> + dev_dbg(dev, "%s: failed to allocate memory size %lu\n",
> __func__, size);
> return -ENOMEM;
> }
> @@ -102,7 +101,7 @@ static int smi_data_buf_realloc(unsigned long size)
> memcpy(buf, smi_data_buf, smi_data_buf_size);
>
> /* free any existing buffer */
> - smi_data_buf_free();
> + smi_data_buf_free(dev);
>
> /* set up new buffer for use */
> smi_data_buf = buf;
> @@ -110,7 +109,7 @@ static int smi_data_buf_realloc(unsigned long size)
> smi_data_buf_phys_addr = (u32) virt_to_phys(buf);
> smi_data_buf_size = size;
>
> - dev_dbg(&dcdbas_pdev->dev, "%s: phys: %x size: %lu\n",
> + dev_dbg(dev, "%s: phys: %x size: %lu\n",
> __func__, smi_data_buf_phys_addr, smi_data_buf_size);
>
> return 0;
> @@ -141,7 +140,7 @@ static ssize_t smi_data_buf_size_store(struct device *dev,
>
> /* make sure SMI data buffer is at least buf_size */
> mutex_lock(&smi_data_lock);
> - ret = smi_data_buf_realloc(buf_size);
> + ret = smi_data_buf_realloc(dev, buf_size);
> mutex_unlock(&smi_data_lock);
> if (ret)
> return ret;
> @@ -166,6 +165,7 @@ static ssize_t smi_data_write(struct file *filp, struct kobject *kobj,
> struct bin_attribute *bin_attr,
> char *buf, loff_t pos, size_t count)
> {
> + struct device *dev = kobj_to_dev(kobj);
> ssize_t ret;
>
> if ((pos + count) > MAX_SMI_DATA_BUF_SIZE)
> @@ -173,7 +173,7 @@ static ssize_t smi_data_write(struct file *filp, struct kobject *kobj,
>
> mutex_lock(&smi_data_lock);
>
> - ret = smi_data_buf_realloc(pos + count);
> + ret = smi_data_buf_realloc(dev, pos + count);
> if (ret)
> goto out;
>
> @@ -199,7 +199,7 @@ static ssize_t host_control_action_store(struct device *dev,
>
> /* make sure buffer is available for host control command */
> mutex_lock(&smi_data_lock);
> - ret = smi_data_buf_realloc(sizeof(struct apm_cmd));
> + ret = smi_data_buf_realloc(dev, sizeof(struct apm_cmd));
> mutex_unlock(&smi_data_lock);
> if (ret)
> return ret;
> @@ -347,7 +347,7 @@ EXPORT_SYMBOL(dcdbas_smi_request);
> *
> * Caller must set up the host control command in smi_data_buf.
> */
> -static int host_control_smi(void)
> +static int host_control_smi(struct device *dev)
> {
> struct apm_cmd *apm_cmd;
> u8 *data;
> @@ -426,7 +426,7 @@ static int host_control_smi(void)
> break;
>
> default:
> - dev_dbg(&dcdbas_pdev->dev, "%s: invalid SMI type %u\n",
> + dev_dbg(dev, "%s: invalid SMI type %u\n",
> __func__, host_control_smi_type);
> return -ENOSYS;
> }
> @@ -443,7 +443,7 @@ static int host_control_smi(void)
> * use smi_data_buf at this point because the system has finished
> * shutting down and no userspace apps are running.
> */
> -static void dcdbas_host_control(void)
> +static void dcdbas_host_control(struct device *dev)
> {
> struct apm_cmd *apm_cmd;
> u8 action;
> @@ -455,13 +455,12 @@ static void dcdbas_host_control(void)
> host_control_action = HC_ACTION_NONE;
>
> if (!smi_data_buf) {
> - dev_dbg(&dcdbas_pdev->dev, "%s: no SMI buffer\n", __func__);
> + dev_dbg(dev, "%s: no SMI buffer\n", __func__);
> return;
> }
>
> if (smi_data_buf_size < sizeof(struct apm_cmd)) {
> - dev_dbg(&dcdbas_pdev->dev, "%s: SMI buffer too small\n",
> - __func__);
> + dev_dbg(dev, "%s: SMI buffer too small\n", __func__);
> return;
> }
>
> @@ -472,12 +471,12 @@ static void dcdbas_host_control(void)
> apm_cmd->command = ESM_APM_POWER_CYCLE;
> apm_cmd->reserved = 0;
> *((s16 *)&apm_cmd->parameters.shortreq.parm[0]) = (s16) 0;
> - host_control_smi();
> + host_control_smi(dev);
> } else if (action & HC_ACTION_HOST_CONTROL_POWERCYCLE) {
> apm_cmd->command = ESM_APM_POWER_CYCLE;
> apm_cmd->reserved = 0;
> *((s16 *)&apm_cmd->parameters.shortreq.parm[0]) = (s16) 20;
> - host_control_smi();
> + host_control_smi(dev);
> }
> }
>
> @@ -495,7 +494,7 @@ static int dcdbas_reboot_notify(struct notifier_block *nb, unsigned long code,
> /* firmware is going to perform host control action */
> printk(KERN_WARNING "Please wait for shutdown "
> "action to complete...\n");
> - dcdbas_host_control();
> + dcdbas_host_control(&dcdbas_pdev->dev);
> }
> break;
> }
> @@ -545,11 +544,13 @@ static int dcdbas_probe(struct platform_device *dev)
> host_control_action = HC_ACTION_NONE;
> host_control_smi_type = HC_SMITYPE_NONE;
>
> + dcdbas_pdev = dev;
> +
> /*
> * BIOS SMI calls require buffer addresses be in 32-bit address space.
> * This is done by setting the DMA mask below.
> */
> - error = dma_set_coherent_mask(&dcdbas_pdev->dev, DMA_BIT_MASK(32));
> + error = dma_set_coherent_mask(&dev->dev, DMA_BIT_MASK(32));
> if (error)
> return error;
>
> @@ -588,6 +589,8 @@ static const struct platform_device_info dcdbas_dev_info __initdata = {
> .dma_mask = DMA_BIT_MASK(32),
> };
>
> +static struct platform_device *dcdbas_pdev_reg;
> +
> /**
> * dcdbas_init: initialize driver
> */
> @@ -599,9 +602,9 @@ static int __init dcdbas_init(void)
> if (error)
> return error;
>
> - dcdbas_pdev = platform_device_register_full(&dcdbas_dev_info);
> - if (IS_ERR(dcdbas_pdev)) {
> - error = PTR_ERR(dcdbas_pdev);
> + dcdbas_pdev_reg = platform_device_register_full(&dcdbas_dev_info);
> + if (IS_ERR(dcdbas_pdev_reg)) {
> + error = PTR_ERR(dcdbas_pdev_reg);
> goto err_unregister_driver;
> }
>
> @@ -629,8 +632,9 @@ static void __exit dcdbas_exit(void)
> * all sysfs attributes belonging to this module have been
> * released.
> */
> - smi_data_buf_free();
> - platform_device_unregister(dcdbas_pdev);
> + if (dcdbas_pdev)
> + smi_data_buf_free(&dcdbas_pdev->dev);
> + platform_device_unregister(dcdbas_pdev_reg);
> platform_driver_unregister(&dcdbas_driver);
> }
>

2013-10-07 12:18:18

by Fengguang Wu

[permalink] [raw]
Subject: Re: [DMA-API] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248

Hi Russell,

On Mon, Oct 07, 2013 at 09:38:13AM +0100, Russell King - ARM Linux wrote:
> Fengguang,
>
> Have you done anything with this at all? If not, don't expect me to
> sort this out, because I can't test this. Thanks.

Sorry it's my fault to overlook this! Just queued up the test..

Thanks,
Fengguang

> On Fri, Oct 04, 2013 at 11:41:36AM +0100, Russell King - ARM Linux wrote:
> > On Fri, Oct 04, 2013 at 09:40:17AM +0800, Fengguang Wu wrote:
> > > This BUG does not show up in upstream and linux-next, so either the
> > > commit has not been merged or has been fixed somewhere.
> >
> > That's because I haven't pushed it out into linux-next yet. Thanks
> > for testing nevertheless.
> >
> > > [ 267.537083] sdhci-pltfm: SDHCI platform and OF driver helper
> > > [ 267.602219] ledtrig-cpu: registered to indicate activity on CPUs
> > > [ 267.656654] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248
> > > [ 267.656689] IP: [<ffffffff810073c9>] dma_supported+0x9/0xa0
> >
> > This seems to be saying that 'dev' was approximately a NULL pointer.
> >
> > This is a wonderfully written driver this is... it's not really a
> > platform driver at all. It stores the platform device into the global
> > dcdbas_pdev, and then references it from within the driver. The problem
> > is caused by that happening before platform_device_register_full() has
> > returned and stored the platform device pointer. It's use of a
> > platform device is just to be able to have some sysfs attributes which
> > it can play around with (I wonder if anyone has reviewed what it does
> > with these...)
> >
> > You can also find goodies like this here:
> >
> > /*
> > * We have to free the buffer here instead of dcdbas_remove
> > * because only in module exit function we can be sure that
> > * all sysfs attributes belonging to this module have been
> > * released.
> > */
> > smi_data_buf_free();
> > platform_device_unregister(dcdbas_pdev);
> > platform_driver_unregister(&dcdbas_driver);
> >
> > which is quite a statement in itself, and if it's thought about, how
> > can putting smi_data_buf_free() before platform_device_unregister()
> > here satisfy that comment.
> >
> > Well, short of dropping the patch, I think the easiest solution here is
> > this, which annoyingly makes the mess slightly worse:
> >
> > diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
> > index a85fda2..0a751bf 100644
> > --- a/drivers/firmware/dcdbas.c
> > +++ b/drivers/firmware/dcdbas.c
> > @@ -545,6 +545,8 @@ static int dcdbas_probe(struct platform_device *dev)
> > host_control_action = HC_ACTION_NONE;
> > host_control_smi_type = HC_SMITYPE_NONE;
> >
> > + dcdbas_pdev = dev;
> > +
> > /*
> > * BIOS SMI calls require buffer addresses be in 32-bit address space.
> > * This is done by setting the DMA mask below.
> > @@ -588,6 +590,8 @@ static const struct platform_device_info dcdbas_dev_info __initdata = {
> > .dma_mask = DMA_BIT_MASK(32),
> > };
> >
> > +static struct platform_device *dcdbas_pdev_reg;
> > +
> > /**
> > * dcdbas_init: initialize driver
> > */
> > @@ -599,9 +603,9 @@ static int __init dcdbas_init(void)
> > if (error)
> > return error;
> >
> > - dcdbas_pdev = platform_device_register_full(&dcdbas_dev_info);
> > - if (IS_ERR(dcdbas_pdev)) {
> > - error = PTR_ERR(dcdbas_pdev);
> > + dcdbas_pdev_reg = platform_device_register_full(&dcdbas_dev_info);
> > + if (IS_ERR(dcdbas_pdev_reg)) {
> > + error = PTR_ERR(dcdbas_pdev_reg);
> > goto err_unregister_driver;
> > }
> >
> > @@ -629,8 +633,9 @@ static void __exit dcdbas_exit(void)
> > * all sysfs attributes belonging to this module have been
> > * released.
> > */
> > - smi_data_buf_free();
> > - platform_device_unregister(dcdbas_pdev);
> > + if (dcdbas_pdev)
> > + smi_data_buf_free();
> > + platform_device_unregister(dcdbas_pdev_reg);
> > platform_driver_unregister(&dcdbas_driver);
> > }
> >
> > Another is to try and pull the direct references to dcdbas_pdev up the
> > call chains... that's easier said than done because there is a global
> > function in this driver - dcdbas_smi_request. The best I can get to
> > is the below, which leaves three direct uses of dcdbas_pdev:
> >
> > diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
> > index a85fda2..d9d215c 100644
> > --- a/drivers/firmware/dcdbas.c
> > +++ b/drivers/firmware/dcdbas.c
> > @@ -58,15 +58,15 @@ static unsigned int host_control_on_shutdown;
> > /**
> > * smi_data_buf_free: free SMI data buffer
> > */
> > -static void smi_data_buf_free(void)
> > +static void smi_data_buf_free(struct device *dev)
> > {
> > if (!smi_data_buf)
> > return;
> >
> > - dev_dbg(&dcdbas_pdev->dev, "%s: phys: %x size: %lu\n",
> > + dev_dbg(dev, "%s: phys: %x size: %lu\n",
> > __func__, smi_data_buf_phys_addr, smi_data_buf_size);
> >
> > - dma_free_coherent(&dcdbas_pdev->dev, smi_data_buf_size, smi_data_buf,
> > + dma_free_coherent(dev, smi_data_buf_size, smi_data_buf,
> > smi_data_buf_handle);
> > smi_data_buf = NULL;
> > smi_data_buf_handle = 0;
> > @@ -77,7 +77,7 @@ static void smi_data_buf_free(void)
> > /**
> > * smi_data_buf_realloc: grow SMI data buffer if needed
> > */
> > -static int smi_data_buf_realloc(unsigned long size)
> > +static int smi_data_buf_realloc(struct device *dev, unsigned long size)
> > {
> > void *buf;
> > dma_addr_t handle;
> > @@ -89,10 +89,9 @@ static int smi_data_buf_realloc(unsigned long size)
> > return -EINVAL;
> >
> > /* new buffer is needed */
> > - buf = dma_alloc_coherent(&dcdbas_pdev->dev, size, &handle, GFP_KERNEL);
> > + buf = dma_alloc_coherent(dev, size, &handle, GFP_KERNEL);
> > if (!buf) {
> > - dev_dbg(&dcdbas_pdev->dev,
> > - "%s: failed to allocate memory size %lu\n",
> > + dev_dbg(dev, "%s: failed to allocate memory size %lu\n",
> > __func__, size);
> > return -ENOMEM;
> > }
> > @@ -102,7 +101,7 @@ static int smi_data_buf_realloc(unsigned long size)
> > memcpy(buf, smi_data_buf, smi_data_buf_size);
> >
> > /* free any existing buffer */
> > - smi_data_buf_free();
> > + smi_data_buf_free(dev);
> >
> > /* set up new buffer for use */
> > smi_data_buf = buf;
> > @@ -110,7 +109,7 @@ static int smi_data_buf_realloc(unsigned long size)
> > smi_data_buf_phys_addr = (u32) virt_to_phys(buf);
> > smi_data_buf_size = size;
> >
> > - dev_dbg(&dcdbas_pdev->dev, "%s: phys: %x size: %lu\n",
> > + dev_dbg(dev, "%s: phys: %x size: %lu\n",
> > __func__, smi_data_buf_phys_addr, smi_data_buf_size);
> >
> > return 0;
> > @@ -141,7 +140,7 @@ static ssize_t smi_data_buf_size_store(struct device *dev,
> >
> > /* make sure SMI data buffer is at least buf_size */
> > mutex_lock(&smi_data_lock);
> > - ret = smi_data_buf_realloc(buf_size);
> > + ret = smi_data_buf_realloc(dev, buf_size);
> > mutex_unlock(&smi_data_lock);
> > if (ret)
> > return ret;
> > @@ -166,6 +165,7 @@ static ssize_t smi_data_write(struct file *filp, struct kobject *kobj,
> > struct bin_attribute *bin_attr,
> > char *buf, loff_t pos, size_t count)
> > {
> > + struct device *dev = kobj_to_dev(kobj);
> > ssize_t ret;
> >
> > if ((pos + count) > MAX_SMI_DATA_BUF_SIZE)
> > @@ -173,7 +173,7 @@ static ssize_t smi_data_write(struct file *filp, struct kobject *kobj,
> >
> > mutex_lock(&smi_data_lock);
> >
> > - ret = smi_data_buf_realloc(pos + count);
> > + ret = smi_data_buf_realloc(dev, pos + count);
> > if (ret)
> > goto out;
> >
> > @@ -199,7 +199,7 @@ static ssize_t host_control_action_store(struct device *dev,
> >
> > /* make sure buffer is available for host control command */
> > mutex_lock(&smi_data_lock);
> > - ret = smi_data_buf_realloc(sizeof(struct apm_cmd));
> > + ret = smi_data_buf_realloc(dev, sizeof(struct apm_cmd));
> > mutex_unlock(&smi_data_lock);
> > if (ret)
> > return ret;
> > @@ -347,7 +347,7 @@ EXPORT_SYMBOL(dcdbas_smi_request);
> > *
> > * Caller must set up the host control command in smi_data_buf.
> > */
> > -static int host_control_smi(void)
> > +static int host_control_smi(struct device *dev)
> > {
> > struct apm_cmd *apm_cmd;
> > u8 *data;
> > @@ -426,7 +426,7 @@ static int host_control_smi(void)
> > break;
> >
> > default:
> > - dev_dbg(&dcdbas_pdev->dev, "%s: invalid SMI type %u\n",
> > + dev_dbg(dev, "%s: invalid SMI type %u\n",
> > __func__, host_control_smi_type);
> > return -ENOSYS;
> > }
> > @@ -443,7 +443,7 @@ static int host_control_smi(void)
> > * use smi_data_buf at this point because the system has finished
> > * shutting down and no userspace apps are running.
> > */
> > -static void dcdbas_host_control(void)
> > +static void dcdbas_host_control(struct device *dev)
> > {
> > struct apm_cmd *apm_cmd;
> > u8 action;
> > @@ -455,13 +455,12 @@ static void dcdbas_host_control(void)
> > host_control_action = HC_ACTION_NONE;
> >
> > if (!smi_data_buf) {
> > - dev_dbg(&dcdbas_pdev->dev, "%s: no SMI buffer\n", __func__);
> > + dev_dbg(dev, "%s: no SMI buffer\n", __func__);
> > return;
> > }
> >
> > if (smi_data_buf_size < sizeof(struct apm_cmd)) {
> > - dev_dbg(&dcdbas_pdev->dev, "%s: SMI buffer too small\n",
> > - __func__);
> > + dev_dbg(dev, "%s: SMI buffer too small\n", __func__);
> > return;
> > }
> >
> > @@ -472,12 +471,12 @@ static void dcdbas_host_control(void)
> > apm_cmd->command = ESM_APM_POWER_CYCLE;
> > apm_cmd->reserved = 0;
> > *((s16 *)&apm_cmd->parameters.shortreq.parm[0]) = (s16) 0;
> > - host_control_smi();
> > + host_control_smi(dev);
> > } else if (action & HC_ACTION_HOST_CONTROL_POWERCYCLE) {
> > apm_cmd->command = ESM_APM_POWER_CYCLE;
> > apm_cmd->reserved = 0;
> > *((s16 *)&apm_cmd->parameters.shortreq.parm[0]) = (s16) 20;
> > - host_control_smi();
> > + host_control_smi(dev);
> > }
> > }
> >
> > @@ -495,7 +494,7 @@ static int dcdbas_reboot_notify(struct notifier_block *nb, unsigned long code,
> > /* firmware is going to perform host control action */
> > printk(KERN_WARNING "Please wait for shutdown "
> > "action to complete...\n");
> > - dcdbas_host_control();
> > + dcdbas_host_control(&dcdbas_pdev->dev);
> > }
> > break;
> > }
> > @@ -545,11 +544,13 @@ static int dcdbas_probe(struct platform_device *dev)
> > host_control_action = HC_ACTION_NONE;
> > host_control_smi_type = HC_SMITYPE_NONE;
> >
> > + dcdbas_pdev = dev;
> > +
> > /*
> > * BIOS SMI calls require buffer addresses be in 32-bit address space.
> > * This is done by setting the DMA mask below.
> > */
> > - error = dma_set_coherent_mask(&dcdbas_pdev->dev, DMA_BIT_MASK(32));
> > + error = dma_set_coherent_mask(&dev->dev, DMA_BIT_MASK(32));
> > if (error)
> > return error;
> >
> > @@ -588,6 +589,8 @@ static const struct platform_device_info dcdbas_dev_info __initdata = {
> > .dma_mask = DMA_BIT_MASK(32),
> > };
> >
> > +static struct platform_device *dcdbas_pdev_reg;
> > +
> > /**
> > * dcdbas_init: initialize driver
> > */
> > @@ -599,9 +602,9 @@ static int __init dcdbas_init(void)
> > if (error)
> > return error;
> >
> > - dcdbas_pdev = platform_device_register_full(&dcdbas_dev_info);
> > - if (IS_ERR(dcdbas_pdev)) {
> > - error = PTR_ERR(dcdbas_pdev);
> > + dcdbas_pdev_reg = platform_device_register_full(&dcdbas_dev_info);
> > + if (IS_ERR(dcdbas_pdev_reg)) {
> > + error = PTR_ERR(dcdbas_pdev_reg);
> > goto err_unregister_driver;
> > }
> >
> > @@ -629,8 +632,9 @@ static void __exit dcdbas_exit(void)
> > * all sysfs attributes belonging to this module have been
> > * released.
> > */
> > - smi_data_buf_free();
> > - platform_device_unregister(dcdbas_pdev);
> > + if (dcdbas_pdev)
> > + smi_data_buf_free(&dcdbas_pdev->dev);
> > + platform_device_unregister(dcdbas_pdev_reg);
> > platform_driver_unregister(&dcdbas_driver);
> > }
> >

2013-10-08 00:13:46

by Fengguang Wu

[permalink] [raw]
Subject: Re: [DMA-API] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248

On Fri, Oct 04, 2013 at 11:41:36AM +0100, Russell King - ARM Linux wrote:
> On Fri, Oct 04, 2013 at 09:40:17AM +0800, Fengguang Wu wrote:
> > This BUG does not show up in upstream and linux-next, so either the
> > commit has not been merged or has been fixed somewhere.
>
> That's because I haven't pushed it out into linux-next yet. Thanks
> for testing nevertheless.
>
> > [ 267.537083] sdhci-pltfm: SDHCI platform and OF driver helper
> > [ 267.602219] ledtrig-cpu: registered to indicate activity on CPUs
> > [ 267.656654] BUG: unable to handle kernel NULL pointer dereference at 0000000000000248
> > [ 267.656689] IP: [<ffffffff810073c9>] dma_supported+0x9/0xa0
>
> This seems to be saying that 'dev' was approximately a NULL pointer.
>
> This is a wonderfully written driver this is... it's not really a
> platform driver at all. It stores the platform device into the global
> dcdbas_pdev, and then references it from within the driver. The problem
> is caused by that happening before platform_device_register_full() has
> returned and stored the platform device pointer. It's use of a
> platform device is just to be able to have some sysfs attributes which
> it can play around with (I wonder if anyone has reviewed what it does
> with these...)
>
> You can also find goodies like this here:
>
> /*
> * We have to free the buffer here instead of dcdbas_remove
> * because only in module exit function we can be sure that
> * all sysfs attributes belonging to this module have been
> * released.
> */
> smi_data_buf_free();
> platform_device_unregister(dcdbas_pdev);
> platform_driver_unregister(&dcdbas_driver);
>
> which is quite a statement in itself, and if it's thought about, how
> can putting smi_data_buf_free() before platform_device_unregister()
> here satisfy that comment.
>
> Well, short of dropping the patch, I think the easiest solution here is
> this, which annoyingly makes the mess slightly worse:

Russell, I confirm that the below patch fixes the kernel oops.
I tried 2000 boots and non of them has the oops any more.
Before patch, at commit 2713c99438b0, ALL kernel boots have the oops.

Tested-by: Fengguang Wu <[email protected]>

> diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
> index a85fda2..0a751bf 100644
> --- a/drivers/firmware/dcdbas.c
> +++ b/drivers/firmware/dcdbas.c
> @@ -545,6 +545,8 @@ static int dcdbas_probe(struct platform_device *dev)
> host_control_action = HC_ACTION_NONE;
> host_control_smi_type = HC_SMITYPE_NONE;
>
> + dcdbas_pdev = dev;
> +
> /*
> * BIOS SMI calls require buffer addresses be in 32-bit address space.
> * This is done by setting the DMA mask below.
> @@ -588,6 +590,8 @@ static const struct platform_device_info dcdbas_dev_info __initdata = {
> .dma_mask = DMA_BIT_MASK(32),
> };
>
> +static struct platform_device *dcdbas_pdev_reg;
> +
> /**
> * dcdbas_init: initialize driver
> */
> @@ -599,9 +603,9 @@ static int __init dcdbas_init(void)
> if (error)
> return error;
>
> - dcdbas_pdev = platform_device_register_full(&dcdbas_dev_info);
> - if (IS_ERR(dcdbas_pdev)) {
> - error = PTR_ERR(dcdbas_pdev);
> + dcdbas_pdev_reg = platform_device_register_full(&dcdbas_dev_info);
> + if (IS_ERR(dcdbas_pdev_reg)) {
> + error = PTR_ERR(dcdbas_pdev_reg);
> goto err_unregister_driver;
> }
>
> @@ -629,8 +633,9 @@ static void __exit dcdbas_exit(void)
> * all sysfs attributes belonging to this module have been
> * released.
> */
> - smi_data_buf_free();
> - platform_device_unregister(dcdbas_pdev);
> + if (dcdbas_pdev)
> + smi_data_buf_free();
> + platform_device_unregister(dcdbas_pdev_reg);
> platform_driver_unregister(&dcdbas_driver);
> }
>
> Another is to try and pull the direct references to dcdbas_pdev up the
> call chains... that's easier said than done because there is a global
> function in this driver - dcdbas_smi_request. The best I can get to
> is the below, which leaves three direct uses of dcdbas_pdev:
>
> diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
> index a85fda2..d9d215c 100644
> --- a/drivers/firmware/dcdbas.c
> +++ b/drivers/firmware/dcdbas.c
> @@ -58,15 +58,15 @@ static unsigned int host_control_on_shutdown;
> /**
> * smi_data_buf_free: free SMI data buffer
> */
> -static void smi_data_buf_free(void)
> +static void smi_data_buf_free(struct device *dev)
> {
> if (!smi_data_buf)
> return;
>
> - dev_dbg(&dcdbas_pdev->dev, "%s: phys: %x size: %lu\n",
> + dev_dbg(dev, "%s: phys: %x size: %lu\n",
> __func__, smi_data_buf_phys_addr, smi_data_buf_size);
>
> - dma_free_coherent(&dcdbas_pdev->dev, smi_data_buf_size, smi_data_buf,
> + dma_free_coherent(dev, smi_data_buf_size, smi_data_buf,
> smi_data_buf_handle);
> smi_data_buf = NULL;
> smi_data_buf_handle = 0;
> @@ -77,7 +77,7 @@ static void smi_data_buf_free(void)
> /**
> * smi_data_buf_realloc: grow SMI data buffer if needed
> */
> -static int smi_data_buf_realloc(unsigned long size)
> +static int smi_data_buf_realloc(struct device *dev, unsigned long size)
> {
> void *buf;
> dma_addr_t handle;
> @@ -89,10 +89,9 @@ static int smi_data_buf_realloc(unsigned long size)
> return -EINVAL;
>
> /* new buffer is needed */
> - buf = dma_alloc_coherent(&dcdbas_pdev->dev, size, &handle, GFP_KERNEL);
> + buf = dma_alloc_coherent(dev, size, &handle, GFP_KERNEL);
> if (!buf) {
> - dev_dbg(&dcdbas_pdev->dev,
> - "%s: failed to allocate memory size %lu\n",
> + dev_dbg(dev, "%s: failed to allocate memory size %lu\n",
> __func__, size);
> return -ENOMEM;
> }
> @@ -102,7 +101,7 @@ static int smi_data_buf_realloc(unsigned long size)
> memcpy(buf, smi_data_buf, smi_data_buf_size);
>
> /* free any existing buffer */
> - smi_data_buf_free();
> + smi_data_buf_free(dev);
>
> /* set up new buffer for use */
> smi_data_buf = buf;
> @@ -110,7 +109,7 @@ static int smi_data_buf_realloc(unsigned long size)
> smi_data_buf_phys_addr = (u32) virt_to_phys(buf);
> smi_data_buf_size = size;
>
> - dev_dbg(&dcdbas_pdev->dev, "%s: phys: %x size: %lu\n",
> + dev_dbg(dev, "%s: phys: %x size: %lu\n",
> __func__, smi_data_buf_phys_addr, smi_data_buf_size);
>
> return 0;
> @@ -141,7 +140,7 @@ static ssize_t smi_data_buf_size_store(struct device *dev,
>
> /* make sure SMI data buffer is at least buf_size */
> mutex_lock(&smi_data_lock);
> - ret = smi_data_buf_realloc(buf_size);
> + ret = smi_data_buf_realloc(dev, buf_size);
> mutex_unlock(&smi_data_lock);
> if (ret)
> return ret;
> @@ -166,6 +165,7 @@ static ssize_t smi_data_write(struct file *filp, struct kobject *kobj,
> struct bin_attribute *bin_attr,
> char *buf, loff_t pos, size_t count)
> {
> + struct device *dev = kobj_to_dev(kobj);
> ssize_t ret;
>
> if ((pos + count) > MAX_SMI_DATA_BUF_SIZE)
> @@ -173,7 +173,7 @@ static ssize_t smi_data_write(struct file *filp, struct kobject *kobj,
>
> mutex_lock(&smi_data_lock);
>
> - ret = smi_data_buf_realloc(pos + count);
> + ret = smi_data_buf_realloc(dev, pos + count);
> if (ret)
> goto out;
>
> @@ -199,7 +199,7 @@ static ssize_t host_control_action_store(struct device *dev,
>
> /* make sure buffer is available for host control command */
> mutex_lock(&smi_data_lock);
> - ret = smi_data_buf_realloc(sizeof(struct apm_cmd));
> + ret = smi_data_buf_realloc(dev, sizeof(struct apm_cmd));
> mutex_unlock(&smi_data_lock);
> if (ret)
> return ret;
> @@ -347,7 +347,7 @@ EXPORT_SYMBOL(dcdbas_smi_request);
> *
> * Caller must set up the host control command in smi_data_buf.
> */
> -static int host_control_smi(void)
> +static int host_control_smi(struct device *dev)
> {
> struct apm_cmd *apm_cmd;
> u8 *data;
> @@ -426,7 +426,7 @@ static int host_control_smi(void)
> break;
>
> default:
> - dev_dbg(&dcdbas_pdev->dev, "%s: invalid SMI type %u\n",
> + dev_dbg(dev, "%s: invalid SMI type %u\n",
> __func__, host_control_smi_type);
> return -ENOSYS;
> }
> @@ -443,7 +443,7 @@ static int host_control_smi(void)
> * use smi_data_buf at this point because the system has finished
> * shutting down and no userspace apps are running.
> */
> -static void dcdbas_host_control(void)
> +static void dcdbas_host_control(struct device *dev)
> {
> struct apm_cmd *apm_cmd;
> u8 action;
> @@ -455,13 +455,12 @@ static void dcdbas_host_control(void)
> host_control_action = HC_ACTION_NONE;
>
> if (!smi_data_buf) {
> - dev_dbg(&dcdbas_pdev->dev, "%s: no SMI buffer\n", __func__);
> + dev_dbg(dev, "%s: no SMI buffer\n", __func__);
> return;
> }
>
> if (smi_data_buf_size < sizeof(struct apm_cmd)) {
> - dev_dbg(&dcdbas_pdev->dev, "%s: SMI buffer too small\n",
> - __func__);
> + dev_dbg(dev, "%s: SMI buffer too small\n", __func__);
> return;
> }
>
> @@ -472,12 +471,12 @@ static void dcdbas_host_control(void)
> apm_cmd->command = ESM_APM_POWER_CYCLE;
> apm_cmd->reserved = 0;
> *((s16 *)&apm_cmd->parameters.shortreq.parm[0]) = (s16) 0;
> - host_control_smi();
> + host_control_smi(dev);
> } else if (action & HC_ACTION_HOST_CONTROL_POWERCYCLE) {
> apm_cmd->command = ESM_APM_POWER_CYCLE;
> apm_cmd->reserved = 0;
> *((s16 *)&apm_cmd->parameters.shortreq.parm[0]) = (s16) 20;
> - host_control_smi();
> + host_control_smi(dev);
> }
> }
>
> @@ -495,7 +494,7 @@ static int dcdbas_reboot_notify(struct notifier_block *nb, unsigned long code,
> /* firmware is going to perform host control action */
> printk(KERN_WARNING "Please wait for shutdown "
> "action to complete...\n");
> - dcdbas_host_control();
> + dcdbas_host_control(&dcdbas_pdev->dev);
> }
> break;
> }
> @@ -545,11 +544,13 @@ static int dcdbas_probe(struct platform_device *dev)
> host_control_action = HC_ACTION_NONE;
> host_control_smi_type = HC_SMITYPE_NONE;
>
> + dcdbas_pdev = dev;
> +
> /*
> * BIOS SMI calls require buffer addresses be in 32-bit address space.
> * This is done by setting the DMA mask below.
> */
> - error = dma_set_coherent_mask(&dcdbas_pdev->dev, DMA_BIT_MASK(32));
> + error = dma_set_coherent_mask(&dev->dev, DMA_BIT_MASK(32));
> if (error)
> return error;
>
> @@ -588,6 +589,8 @@ static const struct platform_device_info dcdbas_dev_info __initdata = {
> .dma_mask = DMA_BIT_MASK(32),
> };
>
> +static struct platform_device *dcdbas_pdev_reg;
> +
> /**
> * dcdbas_init: initialize driver
> */
> @@ -599,9 +602,9 @@ static int __init dcdbas_init(void)
> if (error)
> return error;
>
> - dcdbas_pdev = platform_device_register_full(&dcdbas_dev_info);
> - if (IS_ERR(dcdbas_pdev)) {
> - error = PTR_ERR(dcdbas_pdev);
> + dcdbas_pdev_reg = platform_device_register_full(&dcdbas_dev_info);
> + if (IS_ERR(dcdbas_pdev_reg)) {
> + error = PTR_ERR(dcdbas_pdev_reg);
> goto err_unregister_driver;
> }
>
> @@ -629,8 +632,9 @@ static void __exit dcdbas_exit(void)
> * all sysfs attributes belonging to this module have been
> * released.
> */
> - smi_data_buf_free();
> - platform_device_unregister(dcdbas_pdev);
> + if (dcdbas_pdev)
> + smi_data_buf_free(&dcdbas_pdev->dev);
> + platform_device_unregister(dcdbas_pdev_reg);
> platform_driver_unregister(&dcdbas_driver);
> }
>