2008-10-15 16:42:57

by Alan Jenkins

[permalink] [raw]
Subject: Re: [Linux-uvc-devel] [BUG] NULL pointer dereference caused by uvcvideo stress test

Laurent Pinchart wrote:
> Hi Alan,
>
> On Friday 26 September 2008, Alan Jenkins wrote:
>
>> This is is on v2.6.27-rc6. I originally saw it on todays -tip tree.
>>
>> # while true; do modprobe uvcvideo; modprobe -r uvcvideo; done
>>
>> After a few tens of cycles, modprobe gets stuck in "D" state, and the
>> following backtrace appears:
>>
>
> [snip]
>
> I can't reproduce the issue here on 2.6.27. Could you please test that
> version ?
>

Sure... still happens.

If you look at the trace, it happens as "hald-probe-video" opens the
video device. This is from Ubuntu 8.04. Possibly it's significant that
I use the camera first, to make sure it works (I use Kopete, the
settings dialogue includes a video test).

[ 267.056655] input: UVC Camera (eb1a:2761) as /class/input/input59
[ 267.069990] usbcore: registered new interface driver uvcvideo
[ 267.069990] USB Video Class driver (v0.1.0)
[ 267.299989] usbcore: deregistering interface driver uvcvideo
[ 267.336656] Linux video capture interface: v2.00
[ 267.349981] uvcvideo: Found UVC 1.00 device <unnamed> (eb1a:2761)
[ 267.353317] input: UVC Camera (eb1a:2761) as /class/input/input60
[ 267.363324] usbcore: registered new interface driver uvcvideo
[ 267.363324] USB Video Class driver (v0.1.0)
[ 267.373325] BUG: unable to handle kernel NULL pointer dereference at
00000030
[ 267.373325] IP: [<e03e13ec>] :videodev:video_open+0x8b/0xe4
[ 267.373325] *pdpt = 0000000011ebc001 *pde = 0000000000000000
[ 267.373325] Oops: 0000 [#1]
[ 267.373325] Modules linked in: uvcvideo(-) compat_ioctl32 videodev
v4l1_compat aes_i586 aes_generic af_packet i915 drm cpufreq_userspace
cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table
cpufreq_conservative wmi sbs sbshc ip6t_LOG nf_conntrack_ipv6 ipt_LOG
xt_limit ipt_addrtype xt_state xt_tcpudp xt_conntrack ip6table_filter
ip6_tables ipv6 nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_nat
nf_conntrack_ipv4 nf_conntrack_ftp nf_conntrack iptable_filter ip_tables
x_tables dm_crypt dm_mod fuse joydev arc4 ecb crypto_blkcipher ath5k
mac80211 led_class sg cfg80211 psmouse serio_raw snd_hda_intel
snd_pcm_oss snd_mixer_oss ata_generic snd_pcm snd_timer snd_page_alloc
snd_hwdep shpchp pci_hotplug intel_agp agpgart video output battery ac
eeepc_laptop backlight button evdev uhci_hcd ehci_hcd usb_storage
libusual pcspkr usbcore thermal processor fan [last unloaded: v4l1_compat]
[ 267.373325]
[ 267.373325] Pid: 7709, comm: hald-probe-vide Not tainted (2.6.27eeepc
#61)
[ 267.373325] EIP: 0060:[<e03e13ec>] EFLAGS: 00010246 CPU: 0
[ 267.373325] EIP is at video_open+0x8b/0xe4 [videodev]
[ 267.373325] EAX: 00000000 EBX: e03e49d0 ECX: e03e1361 EDX: d1e6c000
[ 267.373325] ESI: d1ea5c80 EDI: 00000000 EBP: debf41ec ESP: d1efde98
[ 267.373325] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 267.373325] Process hald-probe-vide (pid: 7709, ti=d1efc000
task=d1c586e0 task.ti=d1efc000)
[ 267.373325] Stack: 00000000 df1e17c0 00000000 debf41ec c01604e6
d1ea5c80 00000000 d1ea5c80
[ 267.373325] debf41ec d1efdf14 c0160406 c015d36d deae0580
def61700 d1efdf14 d1ea5c80
[ 267.373325] d1efdf14 00008001 c015d471 d1ea5c80 00000000
00000000 d1efdf14 c016692e
[ 267.373325] Call Trace:
[ 267.373325] [<c01604e6>] chrdev_open+0xe0/0xf6
[ 267.373325] [<c0160406>] chrdev_open+0x0/0xf6
[ 267.373325] [<c015d36d>] __dentry_open+0xf2/0x1da
[ 267.373325] [<c015d471>] nameidata_to_filp+0x1c/0x2c
[ 267.373325] [<c016692e>] do_filp_open+0x343/0x61e
[ 267.373325] [<c014e1ad>] handle_mm_fault+0x27d/0x528
[ 267.373325] [<c016df2d>] alloc_fd+0x46/0xad
[ 267.373325] [<c015d1a8>] do_sys_open+0x3f/0xb3
[ 267.373325] [<c015d260>] sys_open+0x1e/0x23
[ 267.373325] [<c01035c1>] sysenter_do_call+0x12/0x21
[ 267.373325] [<c0280000>] xfrm_state_find+0x3bb/0x4b1
[ 267.373325] =======================
[ 267.373325] Code: c4 0c 85 d2 74 6d 8b 02 8b 5e 10 85 c0 74 15 8b 00
85 c0 74 0b 83 38 02 74 0a ff 80 40 01 00 00 8b 02 eb 02 31 c0 89 46 10
31 ff <8b> 48 30 85 c9 74 36 89 f2 89 e8 ff d1 85 c0 89 c7 74 2a 8b 46
[ 267.373325] EIP: [<e03e13ec>] video_open+0x8b/0xe4 [videodev] SS:ESP
0068:d1efde98
[ 267.373325] ---[ end trace 1a1a7d2a0a55bf75 ]---
[ 267.379201] usbcore: deregistering interface driver uvcvideo


2008-10-15 18:17:47

by Laurent Pinchart

[permalink] [raw]
Subject: Re: [Linux-uvc-devel] [BUG] NULL pointer dereference caused by uvcvideo stress test

Hi Alan,

On Wednesday 15 October 2008, Alan Jenkins wrote:
> Laurent Pinchart wrote:
> > Hi Alan,
> >
> > On Friday 26 September 2008, Alan Jenkins wrote:
> >> This is is on v2.6.27-rc6. I originally saw it on todays -tip tree.
> >>
> >> # while true; do modprobe uvcvideo; modprobe -r uvcvideo; done
> >>
> >> After a few tens of cycles, modprobe gets stuck in "D" state, and the
> >> following backtrace appears:
> >
> > [snip]
> >
> > I can't reproduce the issue here on 2.6.27. Could you please test that
> > version ?
>
> Sure... still happens.

I had secretly hoped it would have disapearred :-)

> If you look at the trace, it happens as "hald-probe-video" opens the
> video device. This is from Ubuntu 8.04. Possibly it's significant that
> I use the camera first, to make sure it works (I use Kopete, the
> settings dialogue includes a video test).

The NULL pointer (or rather 0x00000030 pointer) dereference happens in
video_open:

file->f_op = fops_get(vfl->fops);
if (file->f_op->open)
err = file->f_op->open(inode, file);

file->f_op ends up being NULL. Either vfl->fops is NULL to begin with, or
fops_get failed to get a reference to the file_operations structure.

I'd be surprised if vfl->fops was NULL. To rule out that case, can you add a
BUG_ON(vfl->fops == NULL) before the call to fops_get ?

I'm not too familiar with the module loader, but a quick look at the code
shows that the module could be marked as being unloaded (MODULE_STATE_GOING)
before its exit function is called. If this is the case video_open would
still be called, as the video device would still be registered, but fops_get
would fail in try_module_get and return a NULL pointer. It seems the pointer
returned by fops_get should be tested in video_open.

I've CC'ed the v4l maintainer to get his opinion on this.

> [ 267.056655] input: UVC Camera (eb1a:2761) as /class/input/input59
> [ 267.069990] usbcore: registered new interface driver uvcvideo
> [ 267.069990] USB Video Class driver (v0.1.0)
> [ 267.299989] usbcore: deregistering interface driver uvcvideo
> [ 267.336656] Linux video capture interface: v2.00
> [ 267.349981] uvcvideo: Found UVC 1.00 device <unnamed> (eb1a:2761)
> [ 267.353317] input: UVC Camera (eb1a:2761) as /class/input/input60
> [ 267.363324] usbcore: registered new interface driver uvcvideo
> [ 267.363324] USB Video Class driver (v0.1.0)
> [ 267.373325] BUG: unable to handle kernel NULL pointer dereference at
> 00000030
> [ 267.373325] IP: [<e03e13ec>] :videodev:video_open+0x8b/0xe4
> [ 267.373325] *pdpt = 0000000011ebc001 *pde = 0000000000000000
> [ 267.373325] Oops: 0000 [#1]
> [ 267.373325] Modules linked in: uvcvideo(-) compat_ioctl32 videodev
> v4l1_compat aes_i586 aes_generic af_packet i915 drm cpufreq_userspace
> cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table
> cpufreq_conservative wmi sbs sbshc ip6t_LOG nf_conntrack_ipv6 ipt_LOG
> xt_limit ipt_addrtype xt_state xt_tcpudp xt_conntrack ip6table_filter
> ip6_tables ipv6 nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_nat
> nf_conntrack_ipv4 nf_conntrack_ftp nf_conntrack iptable_filter ip_tables
> x_tables dm_crypt dm_mod fuse joydev arc4 ecb crypto_blkcipher ath5k
> mac80211 led_class sg cfg80211 psmouse serio_raw snd_hda_intel
> snd_pcm_oss snd_mixer_oss ata_generic snd_pcm snd_timer snd_page_alloc
> snd_hwdep shpchp pci_hotplug intel_agp agpgart video output battery ac
> eeepc_laptop backlight button evdev uhci_hcd ehci_hcd usb_storage
> libusual pcspkr usbcore thermal processor fan [last unloaded: v4l1_compat]
> [ 267.373325]
> [ 267.373325] Pid: 7709, comm: hald-probe-vide Not tainted (2.6.27eeepc
> #61)
> [ 267.373325] EIP: 0060:[<e03e13ec>] EFLAGS: 00010246 CPU: 0
> [ 267.373325] EIP is at video_open+0x8b/0xe4 [videodev]
> [ 267.373325] EAX: 00000000 EBX: e03e49d0 ECX: e03e1361 EDX: d1e6c000
> [ 267.373325] ESI: d1ea5c80 EDI: 00000000 EBP: debf41ec ESP: d1efde98
> [ 267.373325] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> [ 267.373325] Process hald-probe-vide (pid: 7709, ti=d1efc000
> task=d1c586e0 task.ti=d1efc000)
> [ 267.373325] Stack: 00000000 df1e17c0 00000000 debf41ec c01604e6
> d1ea5c80 00000000 d1ea5c80
> [ 267.373325] debf41ec d1efdf14 c0160406 c015d36d deae0580
> def61700 d1efdf14 d1ea5c80
> [ 267.373325] d1efdf14 00008001 c015d471 d1ea5c80 00000000
> 00000000 d1efdf14 c016692e
> [ 267.373325] Call Trace:
> [ 267.373325] [<c01604e6>] chrdev_open+0xe0/0xf6
> [ 267.373325] [<c0160406>] chrdev_open+0x0/0xf6
> [ 267.373325] [<c015d36d>] __dentry_open+0xf2/0x1da
> [ 267.373325] [<c015d471>] nameidata_to_filp+0x1c/0x2c
> [ 267.373325] [<c016692e>] do_filp_open+0x343/0x61e
> [ 267.373325] [<c014e1ad>] handle_mm_fault+0x27d/0x528
> [ 267.373325] [<c016df2d>] alloc_fd+0x46/0xad
> [ 267.373325] [<c015d1a8>] do_sys_open+0x3f/0xb3
> [ 267.373325] [<c015d260>] sys_open+0x1e/0x23
> [ 267.373325] [<c01035c1>] sysenter_do_call+0x12/0x21
> [ 267.373325] [<c0280000>] xfrm_state_find+0x3bb/0x4b1
> [ 267.373325] =======================
> [ 267.373325] Code: c4 0c 85 d2 74 6d 8b 02 8b 5e 10 85 c0 74 15 8b 00
> 85 c0 74 0b 83 38 02 74 0a ff 80 40 01 00 00 8b 02 eb 02 31 c0 89 46 10
> 31 ff <8b> 48 30 85 c9 74 36 89 f2 89 e8 ff d1 85 c0 89 c7 74 2a 8b 46
> [ 267.373325] EIP: [<e03e13ec>] video_open+0x8b/0xe4 [videodev] SS:ESP
> 0068:d1efde98
> [ 267.373325] ---[ end trace 1a1a7d2a0a55bf75 ]---
> [ 267.379201] usbcore: deregistering interface driver uvcvideo

Best regards,

Laurent Pinchart

2008-10-15 18:53:43

by Alan Jenkins

[permalink] [raw]
Subject: Re: [Linux-uvc-devel] [BUG] NULL pointer dereference caused by uvcvideo stress test

Laurent Pinchart wrote:
> Hi Alan,
>
> On Wednesday 15 October 2008, Alan Jenkins wrote:
>
>> Laurent Pinchart wrote:
>>
>>> Hi Alan,
>>>
>>> On Friday 26 September 2008, Alan Jenkins wrote:
>>>
>>>> This is is on v2.6.27-rc6. I originally saw it on todays -tip tree.
>>>>
>>>> # while true; do modprobe uvcvideo; modprobe -r uvcvideo; done
>>>>
>>>> After a few tens of cycles, modprobe gets stuck in "D" state, and the
>>>> following backtrace appears:
>>>>
>>> [snip]
>>>
>>> I can't reproduce the issue here on 2.6.27. Could you please test that
>>> version ?
>>>
>> Sure... still happens.
>>
>
> I had secretly hoped it would have disapearred :-)
>
>
>> If you look at the trace, it happens as "hald-probe-video" opens the
>> video device. This is from Ubuntu 8.04. Possibly it's significant that
>> I use the camera first, to make sure it works (I use Kopete, the
>> settings dialogue includes a video test).
>>
>
> The NULL pointer (or rather 0x00000030 pointer) dereference happens in
> video_open:
>
> file->f_op = fops_get(vfl->fops);
> if (file->f_op->open)
> err = file->f_op->open(inode, file);
>
> file->f_op ends up being NULL. Either vfl->fops is NULL to begin with, or
> fops_get failed to get a reference to the file_operations structure.
>
> I'd be surprised if vfl->fops was NULL. To rule out that case, can you add a
> BUG_ON(vfl->fops == NULL) before the call to fops_get ?
>
> I'm not too familiar with the module loader, but a quick look at the code
> shows that the module could be marked as being unloaded (MODULE_STATE_GOING)
> before its exit function is called. If this is the case video_open would
> still be called, as the video device would still be registered, but fops_get
> would fail in try_module_get and return a NULL pointer. It seems the pointer
> returned by fops_get should be tested in video_open.
>
> I've CC'ed the v4l maintainer to get his opinion on this.
>

I put one before and one after

134 BUG_ON(vfl->fops == NULL);
135 file->f_op = fops_get(vfl->fops);
136 BUG_ON(file->f_op == NULL);

and the second one triggered

[ 245.379990] ------------[ cut here ]------------
[ 245.379990] kernel BUG at drivers/media/video/v4l2-dev.c:136!
[ 245.379990] invalid opcode: 0000 [#1]
[ 245.379990] Modules linked in: uvcvideo(-) compat_ioctl32 videodev
v4l1_compat aes_i586 aes_generic af_packet i915 drm cpufreq_userspace
cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table
cpufreq_conservative wmi sbs sbshc ip6t_LOG nf_conntrack_ipv6 ipt_LOG
xt_limit ipt_addrtype xt_state xt_tcpudp xt_conntrack ip6table_filter
ip6_tables ipv6 nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_nat
nf_conntrack_ipv4 nf_conntrack_ftp nf_conntrack iptable_filter ip_tables
x_tables dm_crypt dm_mod fuse joydev arc4 ecb crypto_blkcipher ath5k
mac80211 led_class cfg80211 psmouse serio_raw sg snd_hda_intel
ata_generic snd_pcm_oss snd_mixer_oss snd_pcm shpchp pci_hotplug
snd_timer snd_page_alloc snd_hwdep intel_agp agpgart video output
battery ac eeepc_laptop backlight button evdev uhci_hcd ehci_hcd
usb_storage libusual usbcore pcspkr thermal processor fan [last
unloaded: v4l1_compat]
[ 245.379990]
[ 245.379990] Pid: 11686, comm: hald-probe-vide Not tainted
(2.6.27eeepc #62)
[ 245.379990] EIP: 0060:[<e024f3f2>] EFLAGS: 00010246 CPU: 0
[ 245.379990] EIP is at video_open+0x8f/0xee [videodev]
[ 245.379990] EAX: e03fbf40 EBX: e02529dc ECX: 00000000 EDX: d1d20e00
[ 245.379990] ESI: d1f33780 EDI: ffffffed EBP: de4e182c ESP: d1fade98
[ 245.379990] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 245.379990] Process hald-probe-vide (pid: 11686, ti=d1fac000
task=d8e73810 task.ti=d1fac000)
[ 245.379990] Stack: 00000000 d1f20540 00000000 de4e182c c01604e6
d1f33780 00000000 d1f33780
[ 245.379990] de4e182c d1fadf14 c0160406 c015d36d deae2580
d22c8600 d1fadf14 d1f33780
[ 245.379990] d1fadf14 00008001 c015d471 d1f33780 00000000
00000000 d1fadf14 c016692e
[ 245.379990] Call Trace:
[ 245.379990] [<c01604e6>] chrdev_open+0xe0/0xf6
[ 245.379990] [<c0160406>] chrdev_open+0x0/0xf6
[ 245.379990] [<c015d36d>] __dentry_open+0xf2/0x1da
[ 245.379990] [<c015d471>] nameidata_to_filp+0x1c/0x2c
[ 245.379990] [<c016692e>] do_filp_open+0x343/0x61e
[ 245.379990] [<c014e1ad>] handle_mm_fault+0x27d/0x528
[ 245.379990] [<c016df2d>] alloc_fd+0x46/0xad
[ 245.379990] [<c015d1a8>] do_sys_open+0x3f/0xb3
[ 245.379990] [<c015d260>] sys_open+0x1e/0x23
[ 245.379990] [<c01035c1>] sysenter_do_call+0x12/0x21
[ 245.379990] [<c0280000>] xfrm_state_find+0x3bb/0x4b1
[ 245.379990] =======================
[ 245.379990] Code: 74 77 8b 02 8b 5e 10 85 c0 75 04 0f 0b eb fe 8b 00
85 c0 74 0d 31 c9 83 38 02 74 08 ff 80 40 01 00 00 8b 0a 85 c9 89 4e 10
75 04 <0f> 0b eb fe 8b 49 30 31 ff 85 c9 74 36 89 f2 89 e8 ff d1 85 c0
[ 245.379990] EIP: [<e024f3f2>] video_open+0x8f/0xee [videodev] SS:ESP
0068:d1fade98
[ 245.379990] ---[ end trace 2385a52acb7b9557 ]---

2008-10-15 21:19:21

by Laurent Pinchart

[permalink] [raw]
Subject: Re: [Linux-uvc-devel] [BUG] NULL pointer dereference caused by uvcvideo stress test

Hi Alan,

On Wednesday 15 October 2008, Alan Jenkins wrote:
> Laurent Pinchart wrote:
> > On Wednesday 15 October 2008, Alan Jenkins wrote:
> > > If you look at the trace, it happens as "hald-probe-video" opens the
> > > video device. This is from Ubuntu 8.04. Possibly it's significant that
> > > I use the camera first, to make sure it works (I use Kopete, the
> > > settings dialogue includes a video test).
> >
> > The NULL pointer (or rather 0x00000030 pointer) dereference happens in
> > video_open:
> >
> > file->f_op = fops_get(vfl->fops);
> > if (file->f_op->open)
> > err = file->f_op->open(inode, file);
> >
> > file->f_op ends up being NULL. Either vfl->fops is NULL to begin with, or
> > fops_get failed to get a reference to the file_operations structure.
> >
> > I'd be surprised if vfl->fops was NULL. To rule out that case, can you
> > add a BUG_ON(vfl->fops == NULL) before the call to fops_get ?
> >
> > I'm not too familiar with the module loader, but a quick look at the code
> > shows that the module could be marked as being unloaded
> > (MODULE_STATE_GOING) before its exit function is called. If this is the
> > case video_open would still be called, as the video device would still be
> > registered, but fops_get would fail in try_module_get and return a NULL
> > pointer. It seems the pointer returned by fops_get should be tested in
> > video_open.
> >
> > I've CC'ed the v4l maintainer to get his opinion on this.
>
> I put one before and one after
>
> 134 BUG_ON(vfl->fops == NULL);
> 135 file->f_op = fops_get(vfl->fops);
> 136 BUG_ON(file->f_op == NULL);
>
> and the second one triggered

This confirms my suspicion. Could you please try the attached patch ?

Best regards,

Laurent Pinchart


Attachments:
(No filename) (1.73 kB)
fops_get_check.patch (710.00 B)
Download all attachments

2008-10-16 10:01:43

by Alan Jenkins

[permalink] [raw]
Subject: Re: [Linux-uvc-devel] [BUG] NULL pointer dereference caused by uvcvideo stress test

Laurent Pinchart wrote:
> Hi Alan,
>
> On Wednesday 15 October 2008, Alan Jenkins wrote:
>
>> Laurent Pinchart wrote:
>>
>>> On Wednesday 15 October 2008, Alan Jenkins wrote:
>>>
>>>> If you look at the trace, it happens as "hald-probe-video" opens the
>>>> video device. This is from Ubuntu 8.04. Possibly it's significant that
>>>> I use the camera first, to make sure it works (I use Kopete, the
>>>> settings dialogue includes a video test).
>>>>
>>> The NULL pointer (or rather 0x00000030 pointer) dereference happens in
>>> video_open:
>>>
>>> file->f_op = fops_get(vfl->fops);
>>> if (file->f_op->open)
>>> err = file->f_op->open(inode, file);
>>>
>>> file->f_op ends up being NULL. Either vfl->fops is NULL to begin with, or
>>> fops_get failed to get a reference to the file_operations structure.
>>>
>>> I'd be surprised if vfl->fops was NULL. To rule out that case, can you
>>> add a BUG_ON(vfl->fops == NULL) before the call to fops_get ?
>>>
>>> I'm not too familiar with the module loader, but a quick look at the code
>>> shows that the module could be marked as being unloaded
>>> (MODULE_STATE_GOING) before its exit function is called. If this is the
>>> case video_open would still be called, as the video device would still be
>>> registered, but fops_get would fail in try_module_get and return a NULL
>>> pointer. It seems the pointer returned by fops_get should be tested in
>>> video_open.
>>>
>>> I've CC'ed the v4l maintainer to get his opinion on this.
>>>
>> I put one before and one after
>>
>> 134 BUG_ON(vfl->fops == NULL);
>> 135 file->f_op = fops_get(vfl->fops);
>> 136 BUG_ON(file->f_op == NULL);
>>
>> and the second one triggered
>>
>
> This confirms my suspicion. Could you please try the attached patch ?
>

Yup, that seems to fix it.

I wonder if there are more instances of this error in other subsystems.

Ta
Alan

2008-10-16 12:03:48

by Laurent Pinchart

[permalink] [raw]
Subject: Re: [Linux-uvc-devel] [BUG] NULL pointer dereference caused by uvcvideo stress test

Hi Alan,

On Thursday 16 October 2008, Alan Jenkins wrote:
> Laurent Pinchart wrote:
> > On Wednesday 15 October 2008, Alan Jenkins wrote:
> >> Laurent Pinchart wrote:
> >>> On Wednesday 15 October 2008, Alan Jenkins wrote:
> >>>> If you look at the trace, it happens as "hald-probe-video" opens the
> >>>> video device. This is from Ubuntu 8.04. Possibly it's significant
> >>>> that I use the camera first, to make sure it works (I use Kopete, the
> >>>> settings dialogue includes a video test).
> >>>
> >>> The NULL pointer (or rather 0x00000030 pointer) dereference happens in
> >>> video_open:
> >>>
> >>> file->f_op = fops_get(vfl->fops);
> >>> if (file->f_op->open)
> >>> err = file->f_op->open(inode, file);
> >>>
> >>> file->f_op ends up being NULL. Either vfl->fops is NULL to begin with,
> >>> or fops_get failed to get a reference to the file_operations structure.
> >>>
> >>> I'd be surprised if vfl->fops was NULL. To rule out that case, can you
> >>> add a BUG_ON(vfl->fops == NULL) before the call to fops_get ?
> >>>
> >>> I'm not too familiar with the module loader, but a quick look at the
> >>> code shows that the module could be marked as being unloaded
> >>> (MODULE_STATE_GOING) before its exit function is called. If this is the
> >>> case video_open would still be called, as the video device would still
> >>> be registered, but fops_get would fail in try_module_get and return a
> >>> NULL pointer. It seems the pointer returned by fops_get should be
> >>> tested in video_open.
> >>>
> >>> I've CC'ed the v4l maintainer to get his opinion on this.
> >>
> >> I put one before and one after
> >>
> >> 134 BUG_ON(vfl->fops == NULL);
> >> 135 file->f_op = fops_get(vfl->fops);
> >> 136 BUG_ON(file->f_op == NULL);
> >>
> >> and the second one triggered
> >
> > This confirms my suspicion. Could you please try the attached patch ?
>
> Yup, that seems to fix it.

Great.

> I wonder if there are more instances of this error in other subsystems.

>From a quick grep it seems the following subsystems are affected:

drivers/media/video
drivers/media/video/dvb/dvb-core
drivers/gpu/drm
sound/core

Unless the issue is critical and should be fixed before 2.6.28,
drivers/media/video won't matter as the v4l core has already been moved to
the cdev API in the kernel tree, removing the offending code.

Will you submit patches for the other three subsystems or would you like me to
take care of that ?

Best regards,

Laurent Pinchart

2008-10-16 12:22:40

by Alan Jenkins

[permalink] [raw]
Subject: Re: [Linux-uvc-devel] [BUG] NULL pointer dereference caused by uvcvideo stress test

Laurent Pinchart wrote:
> Hi Alan,
>
> On Thursday 16 October 2008, Alan Jenkins wrote:
>
>> Laurent Pinchart wrote:
>>
>>> On Wednesday 15 October 2008, Alan Jenkins wrote:
>>>
>>>> Laurent Pinchart wrote:
>>>>
>>>>> On Wednesday 15 October 2008, Alan Jenkins wrote:
>>>>>
>>>>>> If you look at the trace, it happens as "hald-probe-video" opens the
>>>>>> video device. This is from Ubuntu 8.04. Possibly it's significant
>>>>>> that I use the camera first, to make sure it works (I use Kopete, the
>>>>>> settings dialogue includes a video test).
>>>>>>
>>>>> The NULL pointer (or rather 0x00000030 pointer) dereference happens in
>>>>> video_open:
>>>>>
>>>>> file->f_op = fops_get(vfl->fops);
>>>>> if (file->f_op->open)
>>>>> err = file->f_op->open(inode, file);
>>>>>
>>>>> file->f_op ends up being NULL. Either vfl->fops is NULL to begin with,
>>>>> or fops_get failed to get a reference to the file_operations structure.
>>>>>
>>>>> I'd be surprised if vfl->fops was NULL. To rule out that case, can you
>>>>> add a BUG_ON(vfl->fops == NULL) before the call to fops_get ?
>>>>>
>>>>> I'm not too familiar with the module loader, but a quick look at the
>>>>> code shows that the module could be marked as being unloaded
>>>>> (MODULE_STATE_GOING) before its exit function is called. If this is the
>>>>> case video_open would still be called, as the video device would still
>>>>> be registered, but fops_get would fail in try_module_get and return a
>>>>> NULL pointer. It seems the pointer returned by fops_get should be
>>>>> tested in video_open.
>>>>>
>>>>> I've CC'ed the v4l maintainer to get his opinion on this.
>>>>>
>>>> I put one before and one after
>>>>
>>>> 134 BUG_ON(vfl->fops == NULL);
>>>> 135 file->f_op = fops_get(vfl->fops);
>>>> 136 BUG_ON(file->f_op == NULL);
>>>>
>>>> and the second one triggered
>>>>
>>> This confirms my suspicion. Could you please try the attached patch ?
>>>
>> Yup, that seems to fix it.
>>
>
> Great.
>
>
>> I wonder if there are more instances of this error in other subsystems.
>>
>
> From a quick grep it seems the following subsystems are affected:
>
> drivers/media/video
> drivers/media/video/dvb/dvb-core
> drivers/gpu/drm
> sound/core
>
> Unless the issue is critical and should be fixed before 2.6.28,
> drivers/media/video won't matter as the v4l core has already been moved to
> the cdev API in the kernel tree, removing the offending code.
>
> Will you submit patches for the other three subsystems or would you like me to
> take care of that ?
>

I need to start concentrating on project work. Feel free to take all
the hard work^W^W patch credits :).

It's not critical to me. This only happens in the stress test because
it unloads the module "too soon" after loading it, while HAL tries to
open the new device.

It's a completely artificial test. I ran it to see what happens with
input devices - the device numbers don't seem to be reallocated like
e.g. usb storage devices. Continually reloading the uvcvideo driver
means the number assigned to input device increments each time, and you
get "input67" and so on. I haven't worked out whether this is a bug
though. For all I know the numbers are reused eventually, once they've
run through the entire device minor space.

Alan

2008-10-24 14:31:43

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: [Linux-uvc-devel] [BUG] NULL pointer dereference caused by uvcvideo stress test

On Thu, 16 Oct 2008 11:01:27 +0100
Alan Jenkins <[email protected]> wrote:

> Laurent Pinchart wrote:
> > Hi Alan,
> >
> > On Wednesday 15 October 2008, Alan Jenkins wrote:
> >
> >> Laurent Pinchart wrote:
> >>
> >>> On Wednesday 15 October 2008, Alan Jenkins wrote:
> >>>
> >>>> If you look at the trace, it happens as "hald-probe-video" opens the
> >>>> video device. This is from Ubuntu 8.04. Possibly it's significant that
> >>>> I use the camera first, to make sure it works (I use Kopete, the
> >>>> settings dialogue includes a video test).
> >>>>
> >>> The NULL pointer (or rather 0x00000030 pointer) dereference happens in
> >>> video_open:
> >>>
> >>> file->f_op = fops_get(vfl->fops);
> >>> if (file->f_op->open)
> >>> err = file->f_op->open(inode, file);
> >>>
> >>> file->f_op ends up being NULL. Either vfl->fops is NULL to begin with, or
> >>> fops_get failed to get a reference to the file_operations structure.
> >>>
> >>> I'd be surprised if vfl->fops was NULL. To rule out that case, can you
> >>> add a BUG_ON(vfl->fops == NULL) before the call to fops_get ?
> >>>
> >>> I'm not too familiar with the module loader, but a quick look at the code
> >>> shows that the module could be marked as being unloaded
> >>> (MODULE_STATE_GOING) before its exit function is called. If this is the
> >>> case video_open would still be called, as the video device would still be
> >>> registered, but fops_get would fail in try_module_get and return a NULL
> >>> pointer. It seems the pointer returned by fops_get should be tested in
> >>> video_open.
> >>>
> >>> I've CC'ed the v4l maintainer to get his opinion on this.

Sorry for being late with this. Too much work here...

I suspect that you only hit this bug due to BKL removal from open/close fops.
maybe you're calling open() before having the device fully initialized?

Anyway, I think that the proposed check is interesting on other places where
there are similar code.

Cheers,
Mauro

2008-10-25 11:19:15

by Laurent Pinchart

[permalink] [raw]
Subject: Re: [Linux-uvc-devel] [BUG] NULL pointer dereference caused by uvcvideo stress test

Hi Mauro,

On Friday 24 October 2008, Mauro Carvalho Chehab wrote:
> On Thu, 16 Oct 2008 11:01:27 +0100
>
> Alan Jenkins <[email protected]> wrote:
> > Laurent Pinchart wrote:
> > > Hi Alan,
> > >
> > > On Wednesday 15 October 2008, Alan Jenkins wrote:
> > >> Laurent Pinchart wrote:
> > >>> On Wednesday 15 October 2008, Alan Jenkins wrote:
> > >>>> If you look at the trace, it happens as "hald-probe-video" opens the
> > >>>> video device. This is from Ubuntu 8.04. Possibly it's significant
> > >>>> that I use the camera first, to make sure it works (I use Kopete,
> > >>>> the settings dialogue includes a video test).
> > >>>
> > >>> The NULL pointer (or rather 0x00000030 pointer) dereference happens
> > >>> in video_open:
> > >>>
> > >>> file->f_op = fops_get(vfl->fops);
> > >>> if (file->f_op->open)
> > >>> err = file->f_op->open(inode, file);
> > >>>
> > >>> file->f_op ends up being NULL. Either vfl->fops is NULL to begin
> > >>> with, or fops_get failed to get a reference to the file_operations
> > >>> structure.
> > >>>
> > >>> I'd be surprised if vfl->fops was NULL. To rule out that case, can
> > >>> you add a BUG_ON(vfl->fops == NULL) before the call to fops_get ?
> > >>>
> > >>> I'm not too familiar with the module loader, but a quick look at the
> > >>> code shows that the module could be marked as being unloaded
> > >>> (MODULE_STATE_GOING) before its exit function is called. If this is
> > >>> the case video_open would still be called, as the video device would
> > >>> still be registered, but fops_get would fail in try_module_get and
> > >>> return a NULL pointer. It seems the pointer returned by fops_get
> > >>> should be tested in video_open.
> > >>>
> > >>> I've CC'ed the v4l maintainer to get his opinion on this.
>
> Sorry for being late with this. Too much work here...
>
> I suspect that you only hit this bug due to BKL removal from open/close
> fops. maybe you're calling open() before having the device fully
> initialized?

It's actually the other way around. open() is closed while unloading the
module. As v4l is being moved to cdev the bug will go away.

> Anyway, I think that the proposed check is interesting on other places
> where there are similar code.

Best regards,

Laurent Pinchart