Subject: Random crashes with v4l2_device_register_subdev

Hi,

Recently I received a report about crashes with mxb v4l driver
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/745213), there are
two slightly different reported crashes which I paste here:

1) BootDmesg.txt crash:

[ 21.210232] general protection fault: 0000 [#1] SMP
[ 21.210292] last sysfs file: /sys/bus/i2c/drivers/tda9840/uevent
[ 21.210357] CPU 0
[ 21.210379] Modules linked in: tda9840 tea6415c tea6420 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec saa7115 mxb(+) snd_seq_midi snd_hwdep snd_rawmidi snd_pcm snd_seq_midi_event snd_seq snd_timer snd_seq_device saa7146_vv saa7146 snd videobuf_dma_sg videobuf_core v4l2_common videodev edac_core soundcore shpchp snd_page_alloc sp5100_tco edac_mce_amd xhci_hcd v4l2_compat_ioctl32 k10temp i2c_piix4 asus_atk0110 lp parport radeon usbhid ttm hid drm_kms_helper drm ahci sym53c8xx firewire_ohci e1000 scsi_transport_spi r8169 pata_atiixp firewire_core i2c_algo_bit libahci crc_itu_t pata_via
[ 21.211030]
[ 21.211047] Pid: 812, comm: work_for_cpu Not tainted 2.6.38-7-generic #39-Ubuntu System manufacturer System Product Name/M4A88TD-V EVO/USB3
[ 21.211187] RIP: 0010:[<ffffffffa029e745>] [<ffffffffa029e745>] v4l2_device_register_subdev+0x95/0x170 [videodev]
[ 21.211304] RSP: 0018:ffff880209de5d10 EFLAGS: 00010202
[ 21.211362] RAX: 6564656572662e67 RBX: 00000000ffffffea RCX: ffff88020f38d0c8
[ 21.211438] RDX: ffff88020ea90e40 RSI: ffff88020b5ad380 RDI: ffff88020eb72018
[ 21.211514] RBP: ffff880209de5d40 R08: 0000000000000000 R09: dead000000200200
[ 21.211590] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88020b5ad380
[ 21.211667] R13: ffff88020eb72018 R14: ffffffffa03820e0 R15: ffff88020eb72018
[ 21.211743] FS: 00007f9933fa1720(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
[ 21.211831] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 21.211892] CR2: 000000000048c000 CR3: 000000020fb3b000 CR4: 00000000000006f0
[ 21.211968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 21.212045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 21.212121] Process work_for_cpu (pid: 812, threadinfo ffff880209de4000, task ffff88020d2716c0)
[ 21.212214] Stack:
[ 21.212236] ffff880209de5d40 ffffffff8146fc05 ffff88020b097c00 ffff88020b5ad380
[ 21.212318] ffff88020eb72018 ffffffffa03820e0 ffff880209de5d70 ffffffffa028f6b5
[ 21.212401] ffff880209de5d80 0000000000000000 0000000000000042 ffff88020eb52010
[ 21.212484] Call Trace:
[ 21.212513] [<ffffffff8146fc05>] ? i2c_new_device+0x135/0x1c0
[ 21.212578] [<ffffffffa028f6b5>] v4l2_i2c_new_subdev_board+0xf5/0x160 [v4l2_common]
[ 21.212663] [<ffffffffa028f795>] v4l2_i2c_new_subdev+0x75/0xa0 [v4l2_common]
[ 21.215825] [<ffffffffa0333714>] mxb_probe+0x124/0x260 [mxb]
[ 21.219016] [<ffffffffa03345a3>] mxb_attach+0x33/0x1f0 [mxb]
[ 21.220214] [<ffffffffa02e24c7>] saa7146_init_one+0x887/0x13c0 [saa7146]
[ 21.220214] [<ffffffff812fef4f>] local_pci_probe+0x5f/0xd0
[ 21.220214] [<ffffffff8107f890>] ? do_work_for_cpu+0x0/0x30
[ 21.220214] [<ffffffff8107f8a8>] do_work_for_cpu+0x18/0x30
[ 21.220214] [<ffffffff81086fe6>] kthread+0x96/0xa0
[ 21.220214] [<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
[ 21.220214] [<ffffffff81086f50>] ? kthread+0x0/0xa0
[ 21.220214] [<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
[ 21.220214] Code: f6 74 19 41 83 3e 02 0f 84 eb 00 00 00 49 8b 86 68 02 00 00 65 ff 00 66 66 66 66 90 49 8b 44 24 30 4d 89 6c 24 20 48 85 c0 74 13 <48> 8b 00 48 85 c0 74 0b 4c 89 e7 ff d0 85 c0 89 c3 75 85 49 8b
[ 21.220214] RIP [<ffffffffa029e745>] v4l2_device_register_subdev+0x95/0x170 [videodev]
[ 21.220214] RSP <ffff880209de5d10>
[ 21.260394] ---[ end trace 5adfede4ff2f2907 ]---

2) Another crash, OopsText.txt:

043b
IP: [<ffffffffa014c745>] v4l2_device_register_subdev+0x95/0x170 [videodev]
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/bus/i2c/drivers/tea6415c/uevent
CPU 0
Modules linked in: tea6415c binfmt_misc tea6420 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi saa7115 snd_seq_midi_event edac_core edac_mce_amd snd_seq mxb(+) saa7146_vv k10temp saa7146 videobuf_dma_sg videobuf_core asus_atk0110 v4l2_common snd_timer snd_seq_device videodev v4l2_compat_ioctl32 snd soundcore snd_page_alloc sp5100_tco i2c_piix4 xhci_hcd shpchp lp parport usbhid hid sym53c8xx ahci scsi_transport_spi e1000 libahci firewire_ohci r8169 pata_via pata_atiixp firewire_core crc_itu_t

Pid: 589, comm: work_for_cpu Not tainted 2.6.38-7-generic #39-Ubuntu System manufacturer System Product Name/M4A88TD-V EVO/USB3
RIP: 0010:[<ffffffffa014c745>] [<ffffffffa014c745>] v4l2_device_register_subdev+0x95/0x170 [videodev]
RSP: 0018:ffff88020e8dfd10 EFLAGS: 00010202
RAX: 000000000000043b RBX: 00000000ffffffea RCX: ffff88020e889908
RDX: ffff88020aeeb240 RSI: ffff88020f5a8200 RDI: ffff88020ff61a18
RBP: ffff88020e8dfd40 R08: 0000000000000000 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000004 R12: ffff88020f5a8200
R13: ffff88020ff61a18 R14: ffffffffa01f10e0 R15: ffff88020ff61a18
FS: 00007f0bb556e720(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000043b CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process work_for_cpu (pid: 589, threadinfo ffff88020e8de000, task ffff88020e8cdb00)
Stack:
ffff88020e8dfd40 ffffffff8146fc05 ffff88020691b800 ffff88020f5a8200
ffff88020ff61a18 ffffffffa01f10e0 ffff88020e8dfd70 ffffffffa01726b5
ffff88020e8dfd80 0000000000000000 0000000000000043 ffff88020dac6010
Call Trace:
[<ffffffff8146fc05>] ? i2c_new_device+0x135/0x1c0
[<ffffffffa01726b5>] v4l2_i2c_new_subdev_board+0xf5/0x160 [v4l2_common]
[<ffffffffa0172795>] v4l2_i2c_new_subdev+0x75/0xa0 [v4l2_common]
[<ffffffffa01d06f3>] mxb_probe+0x103/0x260 [mxb]
[<ffffffffa01d15a3>] mxb_attach+0x33/0x1f0 [mxb]
[<ffffffffa01924c7>] saa7146_init_one+0x887/0x13c0 [saa7146]
[<ffffffff812fef4f>] local_pci_probe+0x5f/0xd0
[<ffffffff8107f890>] ? do_work_for_cpu+0x0/0x30
[<ffffffff8107f8a8>] do_work_for_cpu+0x18/0x30
[<ffffffff81086fe6>] kthread+0x96/0xa0
[<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
[<ffffffff81086f50>] ? kthread+0x0/0xa0
[<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
Code: f6 74 19 41 83 3e 02 0f 84 eb 00 00 00 49 8b 86 68 02 00 00 65 ff 00 66 66 66 66 90 49 8b 44 24 30 4d 89 6c 24 20 48 85 c0 74 13 <48> 8b 00 48 85 c0 74 0b 4c 89 e7 ff d0 85 c0 89 c3 75 85 49 8b
RIP [<ffffffffa014c745>] v4l2_device_register_subdev+0x95/0x170 [videodev]
RSP <ffff88020e8dfd10>
CR2: 000000000000043b
---[ end trace f6215d41cb05d370 ]---


The crashs are on same place, v4l2_device_register_subdev+0x95/0x170

Using the debug symbols of kernel above [1] and source [2], we can see:
(gdb) l *(v4l2_device_register_subdev+0x95)
0x6775 is in v4l2_device_register_subdev (/build/buildd/linux-2.6.38/drivers/media/video/v4l2-device.c:132).
127 /* Warn if we apparently re-register a subdev */
128 WARN_ON(sd->v4l2_dev != NULL);
129 if (!try_module_get(sd->owner))
130 return -ENODEV;
131 sd->v4l2_dev = v4l2_dev;
132 if (sd->internal_ops && sd->internal_ops->registered) {
133 err = sd->internal_ops->registered(sd);
134 if (err)
135 return err;
136 }

So the crash points out to be in dereference of sd->internal_ops, as if
it was sd it would likely crash earlier as sd is used previously in the
code.

And indeed if we look at decodecode of the oops, it matches
sd->internal_ops->registered dereference where the crash happens:

0: f6 74 19 41 divb 0x41(%rcx,%rbx,1)
4: 83 3e 02 cmpl $0x2,(%rsi)
7: 0f 84 eb 00 00 00 je 0xf8
d: 49 8b 86 68 02 00 00 mov 0x268(%r14),%rax
14: 65 ff 00 incl %gs:(%rax)
17: 66 66 66 66 90 data32 data32 data32 xchg %ax,%ax
1c: 49 8b 44 24 30 mov 0x30(%r12),%rax
21: 4d 89 6c 24 20 mov %r13,0x20(%r12)
26: 48 85 c0 test %rax,%rax
29: 74 13 je 0x3e
2b:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
2e: 48 85 c0 test %rax,%rax
31: 74 0b je 0x3e
33: 4c 89 e7 mov %r12,%rdi
36: ff d0 callq *%rax
38: 85 c0 test %eax,%eax
3a: 89 c3 mov %eax,%ebx
3c: 75 85 jne 0xffffffffffffffc3
3e: 49 rex.WB
3f: 8b .byte 0x8b

(gdb) p &((struct v4l2_subdev *)0)->internal_ops
$1 = (const struct v4l2_subdev_internal_ops **) 0x30
(gdb) p &((struct v4l2_subdev_internal_ops *)0)->registered
$2 = (int (**)(struct v4l2_subdev *)) 0x0

So it dereferences and tests successfuly sd->internal_ops:
mov 0x30(%r12),%rax
...
test %rax,%rax
and when it tries to dereference sd->internal_ops->registered it
crashes:
mov (%rax),%rax

But looking at oopses above, look that RAX in each case has a different
value, in first it tries to dereference 0x6564656572662e67, in the other
it is 0x43b instead, so random values in sd->internal_ops

Now wonder why internal_ops got random values. Well, looking back in the
traces, specially at mxb_probe where each crash happens, we can see:

* for the first oops:
(gdb) l *(mxb_probe+0x124)
0x744 is in mxb_probe (/build/buildd/linux-2.6.38/drivers/media/video/mxb.c:197).
192 "tea6420", I2C_TEA6420_2, NULL);
193 mxb->tea6415c = v4l2_i2c_new_subdev(&dev->v4l2_dev, &mxb->i2c_adapter,
194 "tea6415c", I2C_TEA6415C, NULL);
195 mxb->tda9840 = v4l2_i2c_new_subdev(&dev->v4l2_dev, &mxb->i2c_adapter,
196 "tda9840", I2C_TDA9840, NULL);
197 mxb->tuner = v4l2_i2c_new_subdev(&dev->v4l2_dev, &mxb->i2c_adapter,
198 "tuner", I2C_TUNER, NULL);
199
200 /* check if all devices are present */
201 if (!mxb->tea6420_1 || !mxb->tea6420_2 || !mxb->tea6415c ||

* for the second oops:
(gdb) l *(mxb_probe+0x103)
0x723 is in mxb_probe (/build/buildd/linux-2.6.38/drivers/media/video/mxb.c:195).
190 "tea6420", I2C_TEA6420_1, NULL);
191 mxb->tea6420_2 = v4l2_i2c_new_subdev(&dev->v4l2_dev, &mxb->i2c_adapter,
192 "tea6420", I2C_TEA6420_2, NULL);
193 mxb->tea6415c = v4l2_i2c_new_subdev(&dev->v4l2_dev, &mxb->i2c_adapter,
194 "tea6415c", I2C_TEA6415C, NULL);
195 mxb->tda9840 = v4l2_i2c_new_subdev(&dev->v4l2_dev, &mxb->i2c_adapter,
196 "tda9840", I2C_TDA9840, NULL);
197 mxb->tuner = v4l2_i2c_new_subdev(&dev->v4l2_dev, &mxb->i2c_adapter,
198 "tuner", I2C_TUNER, NULL);
199

Hmm the crash seems to be at random, in the cases above probably
when getting tea6415c/tda9840.

Looking at the code then, notice how sd (struct v4l2_subdev *) is
allocated, for example in tda9840_probe we have
sd = kmalloc(sizeof(struct v4l2_subdev), GFP_KERNEL);

So it's not kzalloc, the same holds for tea6415c, its probe function
uses kmalloc as well.

And this is why sd->internal_ops should be getting a random value. I
don't see anywhere in current code where we clear sd->internal_ops on
initialization, and as many of these tuners etc. allocate using just
kmalloc, we get random data in sd.

A fix could be to drivers allocating/zeroing sd using kzalloc/memset
before use.

But seems current v4l code assumes drivers can use kmalloc, and
initialization is done in v4l2_subdev_init, so in a reply to this I
propose a patch to initialize sd->internal_ops to null in
v4l2_subdev_init, which should fix these random crashes.

[1] http://ddebs.ubuntu.com/pool/main/l/linux/linux-image-2.6.38-7-generic-dbgsym_2.6.38-7.39_amd64.ddeb
[2] git://kernel.ubuntu.com/ubuntu/ubuntu-natty.git
(pointing to Ubuntu-2.6.38-7.39 tag checkout)

--
[]'s
Herton


Subject: [PATCH] v4l: subdev: initialize sd->internal_ops in v4l2_subdev_init

Many v4l drivers currently don't initialize their struct v4l2_subdev
with zeros, so since the addition of internal_ops in commit 45f6f84, we
are at risk of random oopses when code in v4l2_device_register_subdev
tries to dereference sd->internal_ops->*, as can be shown by the report
at http://bugs.launchpad.net/bugs/745213

So make sure internal_ops is cleared in v4l2_subdev_init.

BugLink: http://bugs.launchpad.net/bugs/745213
Cc: <[email protected]> # .38.x
Signed-off-by: Herton Ronaldo Krzesinski <[email protected]>
---
drivers/media/video/v4l2-subdev.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/media/video/v4l2-subdev.c b/drivers/media/video/v4l2-subdev.c
index 0b80644..0f70c74 100644
--- a/drivers/media/video/v4l2-subdev.c
+++ b/drivers/media/video/v4l2-subdev.c
@@ -324,6 +324,7 @@ void v4l2_subdev_init(struct v4l2_subdev *sd, const struct v4l2_subdev_ops *ops)
sd->grp_id = 0;
sd->dev_priv = NULL;
sd->host_priv = NULL;
+ sd->internal_ops = NULL;
#if defined(CONFIG_MEDIA_CONTROLLER)
sd->entity.name = sd->name;
sd->entity.type = MEDIA_ENT_T_V4L2_SUBDEV;
--
1.7.1

2011-04-01 14:41:39

by Andy Walls

[permalink] [raw]
Subject: Re: Random crashes with v4l2_device_register_subdev

Herton Ronaldo Krzesinski <[email protected]> wrote:

>Hi,
>
>Recently I received a report about crashes with mxb v4l driver
>(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/745213), there
>are
>two slightly different reported crashes which I paste here:
>
>1) BootDmesg.txt crash:
>
>[ 21.210232] general protection fault: 0000 [#1] SMP
>[ 21.210292] last sysfs file: /sys/bus/i2c/drivers/tda9840/uevent
>[ 21.210357] CPU 0
>[ 21.210379] Modules linked in: tda9840 tea6415c tea6420
>snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
>saa7115 mxb(+) snd_seq_midi snd_hwdep snd_rawmidi snd_pcm
>snd_seq_midi_event snd_seq snd_timer snd_seq_device saa7146_vv saa7146
>snd videobuf_dma_sg videobuf_core v4l2_common videodev edac_core
>soundcore shpchp snd_page_alloc sp5100_tco edac_mce_amd xhci_hcd
>v4l2_compat_ioctl32 k10temp i2c_piix4 asus_atk0110 lp parport radeon
>usbhid ttm hid drm_kms_helper drm ahci sym53c8xx firewire_ohci e1000
>scsi_transport_spi r8169 pata_atiixp firewire_core i2c_algo_bit libahci
>crc_itu_t pata_via
>[ 21.211030]
>[ 21.211047] Pid: 812, comm: work_for_cpu Not tainted
>2.6.38-7-generic #39-Ubuntu System manufacturer System Product
>Name/M4A88TD-V EVO/USB3
>[ 21.211187] RIP: 0010:[<ffffffffa029e745>] [<ffffffffa029e745>]
>v4l2_device_register_subdev+0x95/0x170 [videodev]
>[ 21.211304] RSP: 0018:ffff880209de5d10 EFLAGS: 00010202
>[ 21.211362] RAX: 6564656572662e67 RBX: 00000000ffffffea RCX:
>ffff88020f38d0c8
>[ 21.211438] RDX: ffff88020ea90e40 RSI: ffff88020b5ad380 RDI:
>ffff88020eb72018
>[ 21.211514] RBP: ffff880209de5d40 R08: 0000000000000000 R09:
>dead000000200200
>[ 21.211590] R10: 0000000000000001 R11: 0000000000000001 R12:
>ffff88020b5ad380
>[ 21.211667] R13: ffff88020eb72018 R14: ffffffffa03820e0 R15:
>ffff88020eb72018
>[ 21.211743] FS: 00007f9933fa1720(0000) GS:ffff8800cfc00000(0000)
>knlGS:0000000000000000
>[ 21.211831] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>[ 21.211892] CR2: 000000000048c000 CR3: 000000020fb3b000 CR4:
>00000000000006f0
>[ 21.211968] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>0000000000000000
>[ 21.212045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>0000000000000400
>[ 21.212121] Process work_for_cpu (pid: 812, threadinfo
>ffff880209de4000, task ffff88020d2716c0)
>[ 21.212214] Stack:
>[ 21.212236] ffff880209de5d40 ffffffff8146fc05 ffff88020b097c00
>ffff88020b5ad380
>[ 21.212318] ffff88020eb72018 ffffffffa03820e0 ffff880209de5d70
>ffffffffa028f6b5
>[ 21.212401] ffff880209de5d80 0000000000000000 0000000000000042
>ffff88020eb52010
>[ 21.212484] Call Trace:
>[ 21.212513] [<ffffffff8146fc05>] ? i2c_new_device+0x135/0x1c0
>[ 21.212578] [<ffffffffa028f6b5>]
>v4l2_i2c_new_subdev_board+0xf5/0x160 [v4l2_common]
>[ 21.212663] [<ffffffffa028f795>] v4l2_i2c_new_subdev+0x75/0xa0
>[v4l2_common]
>[ 21.215825] [<ffffffffa0333714>] mxb_probe+0x124/0x260 [mxb]
>[ 21.219016] [<ffffffffa03345a3>] mxb_attach+0x33/0x1f0 [mxb]
>[ 21.220214] [<ffffffffa02e24c7>] saa7146_init_one+0x887/0x13c0
>[saa7146]
>[ 21.220214] [<ffffffff812fef4f>] local_pci_probe+0x5f/0xd0
>[ 21.220214] [<ffffffff8107f890>] ? do_work_for_cpu+0x0/0x30
>[ 21.220214] [<ffffffff8107f8a8>] do_work_for_cpu+0x18/0x30
>[ 21.220214] [<ffffffff81086fe6>] kthread+0x96/0xa0
>[ 21.220214] [<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
>[ 21.220214] [<ffffffff81086f50>] ? kthread+0x0/0xa0
>[ 21.220214] [<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
>[ 21.220214] Code: f6 74 19 41 83 3e 02 0f 84 eb 00 00 00 49 8b 86 68
>02 00 00 65 ff 00 66 66 66 66 90 49 8b 44 24 30 4d 89 6c 24 20 48 85 c0
>74 13 <48> 8b 00 48 85 c0 74 0b 4c 89 e7 ff d0 85 c0 89 c3 75 85 49 8b
>[ 21.220214] RIP [<ffffffffa029e745>]
>v4l2_device_register_subdev+0x95/0x170 [videodev]
>[ 21.220214] RSP <ffff880209de5d10>
>[ 21.260394] ---[ end trace 5adfede4ff2f2907 ]---
>
>2) Another crash, OopsText.txt:
>
>043b
>IP: [<ffffffffa014c745>] v4l2_device_register_subdev+0x95/0x170
>[videodev]
>PGD 0
>Oops: 0000 [#1] SMP
>last sysfs file: /sys/bus/i2c/drivers/tea6415c/uevent
>CPU 0
>Modules linked in: tea6415c binfmt_misc tea6420 snd_hda_codec_hdmi
>snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm
>snd_seq_midi snd_rawmidi saa7115 snd_seq_midi_event edac_core
>edac_mce_amd snd_seq mxb(+) saa7146_vv k10temp saa7146 videobuf_dma_sg
>videobuf_core asus_atk0110 v4l2_common snd_timer snd_seq_device
>videodev v4l2_compat_ioctl32 snd soundcore snd_page_alloc sp5100_tco
>i2c_piix4 xhci_hcd shpchp lp parport usbhid hid sym53c8xx ahci
>scsi_transport_spi e1000 libahci firewire_ohci r8169 pata_via
>pata_atiixp firewire_core crc_itu_t
>
>Pid: 589, comm: work_for_cpu Not tainted 2.6.38-7-generic #39-Ubuntu
>System manufacturer System Product Name/M4A88TD-V EVO/USB3
>RIP: 0010:[<ffffffffa014c745>] [<ffffffffa014c745>]
>v4l2_device_register_subdev+0x95/0x170 [videodev]
>RSP: 0018:ffff88020e8dfd10 EFLAGS: 00010202
>RAX: 000000000000043b RBX: 00000000ffffffea RCX: ffff88020e889908
>RDX: ffff88020aeeb240 RSI: ffff88020f5a8200 RDI: ffff88020ff61a18
>RBP: ffff88020e8dfd40 R08: 0000000000000000 R09: 0000000000000002
>R10: 0000000000000000 R11: 0000000000000004 R12: ffff88020f5a8200
>R13: ffff88020ff61a18 R14: ffffffffa01f10e0 R15: ffff88020ff61a18
>FS: 00007f0bb556e720(0000) GS:ffff8800cfc00000(0000)
>knlGS:0000000000000000
>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: 000000000000043b CR3: 0000000001a03000 CR4: 00000000000006f0
>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>Process work_for_cpu (pid: 589, threadinfo ffff88020e8de000, task
>ffff88020e8cdb00)
>Stack:
> ffff88020e8dfd40 ffffffff8146fc05 ffff88020691b800 ffff88020f5a8200
> ffff88020ff61a18 ffffffffa01f10e0 ffff88020e8dfd70 ffffffffa01726b5
> ffff88020e8dfd80 0000000000000000 0000000000000043 ffff88020dac6010
>Call Trace:
> [<ffffffff8146fc05>] ? i2c_new_device+0x135/0x1c0
>[<ffffffffa01726b5>] v4l2_i2c_new_subdev_board+0xf5/0x160 [v4l2_common]
> [<ffffffffa0172795>] v4l2_i2c_new_subdev+0x75/0xa0 [v4l2_common]
> [<ffffffffa01d06f3>] mxb_probe+0x103/0x260 [mxb]
> [<ffffffffa01d15a3>] mxb_attach+0x33/0x1f0 [mxb]
> [<ffffffffa01924c7>] saa7146_init_one+0x887/0x13c0 [saa7146]
> [<ffffffff812fef4f>] local_pci_probe+0x5f/0xd0
> [<ffffffff8107f890>] ? do_work_for_cpu+0x0/0x30
> [<ffffffff8107f8a8>] do_work_for_cpu+0x18/0x30
> [<ffffffff81086fe6>] kthread+0x96/0xa0
> [<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
> [<ffffffff81086f50>] ? kthread+0x0/0xa0
> [<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
>Code: f6 74 19 41 83 3e 02 0f 84 eb 00 00 00 49 8b 86 68 02 00 00 65 ff
>00 66 66 66 66 90 49 8b 44 24 30 4d 89 6c 24 20 48 85 c0 74 13 <48> 8b
>00 48 85 c0 74 0b 4c 89 e7 ff d0 85 c0 89 c3 75 85 49 8b
>RIP [<ffffffffa014c745>] v4l2_device_register_subdev+0x95/0x170
>[videodev]
> RSP <ffff88020e8dfd10>
>CR2: 000000000000043b
>---[ end trace f6215d41cb05d370 ]---
>
>
>The crashs are on same place, v4l2_device_register_subdev+0x95/0x170
>
>Using the debug symbols of kernel above [1] and source [2], we can see:
>(gdb) l *(v4l2_device_register_subdev+0x95)
>0x6775 is in v4l2_device_register_subdev
>(/build/buildd/linux-2.6.38/drivers/media/video/v4l2-device.c:132).
>127 /* Warn if we apparently re-register a subdev */
>128 WARN_ON(sd->v4l2_dev != NULL);
>129 if (!try_module_get(sd->owner))
>130 return -ENODEV;
>131 sd->v4l2_dev = v4l2_dev;
>132 if (sd->internal_ops && sd->internal_ops->registered) {
>133 err = sd->internal_ops->registered(sd);
>134 if (err)
>135 return err;
>136 }
>
>So the crash points out to be in dereference of sd->internal_ops, as if
>it was sd it would likely crash earlier as sd is used previously in the
>code.
>
>And indeed if we look at decodecode of the oops, it matches
>sd->internal_ops->registered dereference where the crash happens:
>
> 0: f6 74 19 41 divb 0x41(%rcx,%rbx,1)
> 4: 83 3e 02 cmpl $0x2,(%rsi)
> 7: 0f 84 eb 00 00 00 je 0xf8
> d: 49 8b 86 68 02 00 00 mov 0x268(%r14),%rax
> 14: 65 ff 00 incl %gs:(%rax)
> 17: 66 66 66 66 90 data32 data32 data32 xchg %ax,%ax
> 1c: 49 8b 44 24 30 mov 0x30(%r12),%rax
> 21: 4d 89 6c 24 20 mov %r13,0x20(%r12)
> 26: 48 85 c0 test %rax,%rax
> 29: 74 13 je 0x3e
>2b:* 48 8b 00 mov (%rax),%rax <-- trapping
>instruction
> 2e: 48 85 c0 test %rax,%rax
> 31: 74 0b je 0x3e
> 33: 4c 89 e7 mov %r12,%rdi
> 36: ff d0 callq *%rax
> 38: 85 c0 test %eax,%eax
> 3a: 89 c3 mov %eax,%ebx
> 3c: 75 85 jne 0xffffffffffffffc3
> 3e: 49 rex.WB
> 3f: 8b .byte 0x8b
>
>(gdb) p &((struct v4l2_subdev *)0)->internal_ops
>$1 = (const struct v4l2_subdev_internal_ops **) 0x30
>(gdb) p &((struct v4l2_subdev_internal_ops *)0)->registered
>$2 = (int (**)(struct v4l2_subdev *)) 0x0
>
>So it dereferences and tests successfuly sd->internal_ops:
>mov 0x30(%r12),%rax
>...
>test %rax,%rax
>and when it tries to dereference sd->internal_ops->registered it
>crashes:
>mov (%rax),%rax
>
>But looking at oopses above, look that RAX in each case has a different
>value, in first it tries to dereference 0x6564656572662e67, in the
>other
>it is 0x43b instead, so random values in sd->internal_ops
>
>Now wonder why internal_ops got random values. Well, looking back in
>the
>traces, specially at mxb_probe where each crash happens, we can see:
>
>* for the first oops:
>(gdb) l *(mxb_probe+0x124)
>0x744 is in mxb_probe
>(/build/buildd/linux-2.6.38/drivers/media/video/mxb.c:197).
>192 "tea6420", I2C_TEA6420_2, NULL);
>193 mxb->tea6415c = v4l2_i2c_new_subdev(&dev->v4l2_dev,
>&mxb->i2c_adapter,
>194 "tea6415c", I2C_TEA6415C, NULL);
>195 mxb->tda9840 = v4l2_i2c_new_subdev(&dev->v4l2_dev,
>&mxb->i2c_adapter,
>196 "tda9840", I2C_TDA9840, NULL);
>197 mxb->tuner = v4l2_i2c_new_subdev(&dev->v4l2_dev,
>&mxb->i2c_adapter,
>198 "tuner", I2C_TUNER, NULL);
>199
>200 /* check if all devices are present */
>201 if (!mxb->tea6420_1 || !mxb->tea6420_2 || !mxb->tea6415c ||
>
>* for the second oops:
>(gdb) l *(mxb_probe+0x103)
>0x723 is in mxb_probe
>(/build/buildd/linux-2.6.38/drivers/media/video/mxb.c:195).
>190 "tea6420", I2C_TEA6420_1, NULL);
>191 mxb->tea6420_2 = v4l2_i2c_new_subdev(&dev->v4l2_dev,
>&mxb->i2c_adapter,
>192 "tea6420", I2C_TEA6420_2, NULL);
>193 mxb->tea6415c = v4l2_i2c_new_subdev(&dev->v4l2_dev,
>&mxb->i2c_adapter,
>194 "tea6415c", I2C_TEA6415C, NULL);
>195 mxb->tda9840 = v4l2_i2c_new_subdev(&dev->v4l2_dev,
>&mxb->i2c_adapter,
>196 "tda9840", I2C_TDA9840, NULL);
>197 mxb->tuner = v4l2_i2c_new_subdev(&dev->v4l2_dev,
>&mxb->i2c_adapter,
>198 "tuner", I2C_TUNER, NULL);
>199
>
>Hmm the crash seems to be at random, in the cases above probably
>when getting tea6415c/tda9840.
>
>Looking at the code then, notice how sd (struct v4l2_subdev *) is
>allocated, for example in tda9840_probe we have
>sd = kmalloc(sizeof(struct v4l2_subdev), GFP_KERNEL);
>
>So it's not kzalloc, the same holds for tea6415c, its probe function
>uses kmalloc as well.
>
>And this is why sd->internal_ops should be getting a random value. I
>don't see anywhere in current code where we clear sd->internal_ops on
>initialization, and as many of these tuners etc. allocate using just
>kmalloc, we get random data in sd.
>
>A fix could be to drivers allocating/zeroing sd using kzalloc/memset
>before use.
>
>But seems current v4l code assumes drivers can use kmalloc, and
>initialization is done in v4l2_subdev_init, so in a reply to this I
>propose a patch to initialize sd->internal_ops to null in
>v4l2_subdev_init, which should fix these random crashes.
>
>[1]
>http://ddebs.ubuntu.com/pool/main/l/linux/linux-image-2.6.38-7-generic-dbgsym_2.6.38-7.39_amd64.ddeb
>[2] git://kernel.ubuntu.com/ubuntu/ubuntu-natty.git
> (pointing to Ubuntu-2.6.38-7.39 tag checkout)
>
>--
>[]'s
>Herton
>--
>To unsubscribe from this list: send the line "unsubscribe linux-media"
>in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html

The one RAX register lloks like it has ASCII bytes in it.

Sorry I can't look at these futher myself. I have urgent family needs this weekend.

Regards,
Andy

2011-04-01 15:05:57

by Hans Verkuil

[permalink] [raw]
Subject: Re: [PATCH] v4l: subdev: initialize sd->internal_ops in v4l2_subdev_init

On Friday, April 01, 2011 16:24:17 Herton Ronaldo Krzesinski wrote:
> Many v4l drivers currently don't initialize their struct v4l2_subdev
> with zeros, so since the addition of internal_ops in commit 45f6f84, we
> are at risk of random oopses when code in v4l2_device_register_subdev
> tries to dereference sd->internal_ops->*, as can be shown by the report
> at http://bugs.launchpad.net/bugs/745213
>
> So make sure internal_ops is cleared in v4l2_subdev_init.

NACK: we need to replace those kmalloc's with kzalloc. This patch will just
fix the internal_ops problem, but sd->entity isn't zeroed. It's going to be
a neverending problem unless we fix those kmalloc's. It is my fault, I guess:
I should always have required the use of kzalloc.

I grepped for this and the number of kmalloc's is quite small:

drivers/media/video/tda9840.c
drivers/media/video/upd64031a.c
drivers/media/video/m52790.c
drivers/media/video/tea6415c.c
drivers/media/video/tea6420.c
drivers/media/video/upd64083.c
drivers/media/radio/saa7706h.c
drivers/media/radio/tef6862.c

If you fix those kmalloc's, then I'll ack those changes. There is just one
relevant kmalloc in each file.

The reason this wasn't noticed before is that all these devices are all
pretty rare. All the more common device drivers use kzalloc. I am really
surprised to hear of mxb boards still in use! Just for the record: I have
one myself for testing, although I clearly never realized that I should
test that particular change with that board...

Thanks for doing such a great job of tracking this down!

Regards,

Hans

>
> BugLink: http://bugs.launchpad.net/bugs/745213
> Cc: <[email protected]> # .38.x
> Signed-off-by: Herton Ronaldo Krzesinski <[email protected]>
> ---
> drivers/media/video/v4l2-subdev.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/media/video/v4l2-subdev.c b/drivers/media/video/v4l2-subdev.c
> index 0b80644..0f70c74 100644
> --- a/drivers/media/video/v4l2-subdev.c
> +++ b/drivers/media/video/v4l2-subdev.c
> @@ -324,6 +324,7 @@ void v4l2_subdev_init(struct v4l2_subdev *sd, const struct v4l2_subdev_ops *ops)
> sd->grp_id = 0;
> sd->dev_priv = NULL;
> sd->host_priv = NULL;
> + sd->internal_ops = NULL;
> #if defined(CONFIG_MEDIA_CONTROLLER)
> sd->entity.name = sd->name;
> sd->entity.type = MEDIA_ENT_T_V4L2_SUBDEV;
>

Subject: Re: [PATCH] v4l: subdev: initialize sd->internal_ops in v4l2_subdev_init

On Fri, Apr 01, 2011 at 05:05:27PM +0200, Hans Verkuil wrote:
> On Friday, April 01, 2011 16:24:17 Herton Ronaldo Krzesinski wrote:
> > Many v4l drivers currently don't initialize their struct v4l2_subdev
> > with zeros, so since the addition of internal_ops in commit 45f6f84, we
> > are at risk of random oopses when code in v4l2_device_register_subdev
> > tries to dereference sd->internal_ops->*, as can be shown by the report
> > at http://bugs.launchpad.net/bugs/745213
> >
> > So make sure internal_ops is cleared in v4l2_subdev_init.
>
> NACK: we need to replace those kmalloc's with kzalloc. This patch will just
> fix the internal_ops problem, but sd->entity isn't zeroed. It's going to be
> a neverending problem unless we fix those kmalloc's. It is my fault, I guess:
> I should always have required the use of kzalloc.
>
> I grepped for this and the number of kmalloc's is quite small:
>
> drivers/media/video/tda9840.c
> drivers/media/video/upd64031a.c
> drivers/media/video/m52790.c
> drivers/media/video/tea6415c.c
> drivers/media/video/tea6420.c
> drivers/media/video/upd64083.c
> drivers/media/radio/saa7706h.c
> drivers/media/radio/tef6862.c
>
> If you fix those kmalloc's, then I'll ack those changes. There is just one
> relevant kmalloc in each file.
>
> The reason this wasn't noticed before is that all these devices are all
> pretty rare. All the more common device drivers use kzalloc. I am really
> surprised to hear of mxb boards still in use! Just for the record: I have
> one myself for testing, although I clearly never realized that I should
> test that particular change with that board...
>
> Thanks for doing such a great job of tracking this down!

No problem, I'll take a look and send a new patch.

>
> Regards,
>
> Hans
>

--
[]'s
Herton