2015-04-21 23:36:26

by Linus Torvalds

[permalink] [raw]
Subject: NULL ptr dereference in current i915 driver

So I just go the appended NULL pointer de-reference when trying to
look at a video from my GoPro.

The code disassembles to

0: 81 fb 00 04 00 00 cmp $0x400,%ebx
6: 41 89 07 mov %eax,(%r15)
9: 74 78 je 0x83
b: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi
10: e8 6e b3 1b c1 callq 0xffffffffc11bb383
15: 84 c0 test %al,%al
17: 74 4a je 0x63
19: 48 85 ed test %rbp,%rbp
1c: 75 b5 jne 0xffffffffffffffd3
1e: 48 8b 04 24 mov (%rsp),%rax
22: 49 8b 84 c4 98 01 00 mov 0x198(%r12,%rax,8),%rax
29: 00
2a:* 48 8b 28 mov (%rax),%rbp <-- trapping instruction
2d: 65 ff 05 1f e8 ef 3f incl %gs:0x3fefe81f(%rip) # 0x3fefe853
34: 48 b8 00 00 00 00 00 movabs $0x160000000000,%rax
3b: 16 00 00

which matches up with the asm code

cmpl $1024, %ebx #, act_pte
movl %eax, (%r15) # D.49217, *_26
je .L118 #,
.L110:
leaq 24(%rsp), %rdi #, tmp156
call __sg_page_iter_next #
testb %al, %al # D.49219
je .L119 #,
testq %rbp, %rbp # pt_vaddr
jne .L109 #,
movq (%rsp), %rax # %sfp, act_pt
movq 408(%r12,%rax,8), %rax # MEM[(struct i915_hw_ppgtt
*)vm_8(D)].D.36998.pd.page
movq (%rax), %rbp # _21->page, D.49221
#APP
# 72 "./arch/x86/include/asm/preempt.h" 1
incl %gs:__preempt_count(%rip) # __preempt_count
# 0 "" 2
#NO_APP
movabsq $24189255811072, %rax #, tmp150

which in turn seems to come from the C code

pt_vaddr =
kmap_atomic(ppgtt->pd.page_table[act_pt]->page);

(that "testq %rbp,%rbp; jne" just before the oopsing instruction group
is that "if (pt_vaddr == NULL)" test.

IOW, it looks like

ppgtt->pd.page_table[act_pt]

is NULL, and then trying to dereference ->page off of it is what
oopses (the preempt-count increment that comes after is the
"pagefault_disable()" in kmap_atomic, and the big constant we're
loading into %rax is part of "page_address(page)").

I have no idea why "ppgtt->pd.page_table[act_pt]" would be NULL, but
clearly it can be. Can somebody who knows this code look into it. I've
added a few people who have worked in this area recently, in addition
to the usual maintainer list..

Thanks,

Linus

---
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffc010c137>] gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: rfcomm fuse cmac ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute
bridge stp llc ebtable_filter ebtables ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables
iptable_mangle iptable_security iptable_raw bnep arc4 vfat fat
x86_pkg_temp_thermal pn544_mei mei_phy coretemp pn544 hci nfc
kvm_intel iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
snd_hda_codec_hdmi kvm snd_hda_codec_generic uvcvideo
videobuf2_vmalloc videobuf2_memops microcode videobuf2_core
snd_hda_intel v4l2_common hid_multitouch snd_hda_controller videodev
btusb snd_hda_codec iwlmvm media snd_hwdep mac80211 btbcm snd_seq
btintel bluetooth snd_seq_device joydev snd_pcm serio_raw
i2c_i801 iwlwifi cfg80211 snd_hda_core sony_laptop snd_timer snd
rfkill mei_me soundcore lpc_ich shpchp mei mfd_core dm_crypt
crct10dif_pclmul i915 crc32_pclmul crc32c_intel i2c_algo_bit
drm_kms_helper ghash_clmulni_intel drm i2c_core video
CPU: 1 PID: 2697 Comm: chrome Not tainted 4.0.0-09362-g1fc149933fd4 #8
Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013
task: ffff88010dc51b30 ti: ffff88003f328000 task.ti: ffff88003f328000
RIP: 0010:[<ffffffffc010c137>] [<ffffffffc010c137>]
gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
RSP: 0018:ffff88003f32b9a8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000075b1b
RDX: ffff88007d848990 RSI: 0000000000000001 RDI: ffff88003f32b9c0
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88003f6f7e58
R10: 000000000d836000 R11: 0000000000000000 R12: ffff8800d4164000
R13: 0000000000000000 R14: 0000000000000001 R15: ffff88003f7bbffc
FS: 00007f7f0ee94a00(0000) GS:ffff88011fa80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000005f607000 CR4: 00000000001407e0
Stack:
0000000000000201 000002010dc51b30 0000000000000000 ffff88007d848990
0000004000075b1b ffff880100000001 0000000000000fe0 ffff880011215900
0000000000000000 ffff88006cc4c380 ffff88003f6f0000 0000000000000001
Call Trace:
ggtt_bind_vma+0x97/0x110 [i915]
i915_vma_bind+0x40/0x410 [i915]
swiotlb_map_sg_attrs+0x74/0x140
i915_gem_object_do_pin+0x864/0x9f0 [i915]
mutex_lock+0x9/0x30
i915_gem_execbuffer_reserve_vma.isra.20+0x66/0x130 [i915]
i915_gem_execbuffer_reserve+0x2ec/0x320 [i915]
i915_gem_do_execbuffer.isra.27+0x5ee/0xf80 [i915]
mutex_optimistic_spin+0x16e/0x1f0
__mutex_lock_interruptible_slowpath+0x21/0x130
shmem_fault+0x57/0x1c0
drm_gem_object_lookup+0x14/0xa0 [drm]
i915_gem_execbuffer2+0xb2/0x2a0 [i915]
drm_ioctl+0x15a/0x580 [drm]
current_fs_time+0x9/0x50
do_vfs_ioctl+0x2e8/0x4f0
file_has_perm+0x77/0x80
syscall_trace_enter_phase1+0x116/0x140
SyS_ioctl+0x79/0x90
system_call_fastpath+0x12/0x6a
Code: 00 81 fb 00 04 00 00 41 89 07 74 78 48 8d 7c 24 18 e8 6e b3 1b
c1 84 c0 74 4a 48 85 ed 75 b5 48 8b 04 24 49 8b 84 c4 98 01 00 00 <48>
8b 28 65 ff 05 1f e8 ef 3f 48 b8 00 00 00 00 00 16 00 00 48
RIP gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
RSP <ffff88003f32b9a8>
CR2: 0000000000000000


2015-04-22 16:11:42

by Michel Thierry

[permalink] [raw]
Subject: Re: NULL ptr dereference in current i915 driver

On 4/22/2015 12:36 AM, Linus Torvalds wrote:
> So I just go the appended NULL pointer de-reference when trying to
> look at a video from my GoPro.
>
> The code disassembles to
>
> 0: 81 fb 00 04 00 00 cmp $0x400,%ebx
> 6: 41 89 07 mov %eax,(%r15)
> 9: 74 78 je 0x83
> b: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi
> 10: e8 6e b3 1b c1 callq 0xffffffffc11bb383
> 15: 84 c0 test %al,%al
> 17: 74 4a je 0x63
> 19: 48 85 ed test %rbp,%rbp
> 1c: 75 b5 jne 0xffffffffffffffd3
> 1e: 48 8b 04 24 mov (%rsp),%rax
> 22: 49 8b 84 c4 98 01 00 mov 0x198(%r12,%rax,8),%rax
> 29: 00
> 2a:* 48 8b 28 mov (%rax),%rbp <-- trapping instruction
> 2d: 65 ff 05 1f e8 ef 3f incl %gs:0x3fefe81f(%rip) # 0x3fefe853
> 34: 48 b8 00 00 00 00 00 movabs $0x160000000000,%rax
> 3b: 16 00 00
>
> which matches up with the asm code
>
> cmpl $1024, %ebx #, act_pte
> movl %eax, (%r15) # D.49217, *_26
> je .L118 #,
> .L110:
> leaq 24(%rsp), %rdi #, tmp156
> call __sg_page_iter_next #
> testb %al, %al # D.49219
> je .L119 #,
> testq %rbp, %rbp # pt_vaddr
> jne .L109 #,
> movq (%rsp), %rax # %sfp, act_pt
> movq 408(%r12,%rax,8), %rax # MEM[(struct i915_hw_ppgtt
> *)vm_8(D)].D.36998.pd.page
> movq (%rax), %rbp # _21->page, D.49221
> #APP
> # 72 "./arch/x86/include/asm/preempt.h" 1
> incl %gs:__preempt_count(%rip) # __preempt_count
> # 0 "" 2
> #NO_APP
> movabsq $24189255811072, %rax #, tmp150
>
> which in turn seems to come from the C code
>
> pt_vaddr =
> kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
>
> (that "testq %rbp,%rbp; jne" just before the oopsing instruction group
> is that "if (pt_vaddr == NULL)" test.
>
> IOW, it looks like
>
> ppgtt->pd.page_table[act_pt]
>
> is NULL, and then trying to dereference ->page off of it is what
> oopses (the preempt-count increment that comes after is the
> "pagefault_disable()" in kmap_atomic, and the big constant we're
> loading into %rax is part of "page_address(page)").
>
> I have no idea why "ppgtt->pd.page_table[act_pt]" would be NULL, but
> clearly it can be. Can somebody who knows this code look into it. I've
> added a few people who have worked in this area recently, in addition
> to the usual maintainer list..
>
> Thanks,
>
> Linus
>
> ---
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffffc010c137>] gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in: rfcomm fuse cmac ip6t_rpfilter ip6t_REJECT
> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
> nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute
> bridge stp llc ebtable_filter ebtables ip6table_mangle
> ip6table_security ip6table_raw ip6table_filter ip6_tables
> iptable_mangle iptable_security iptable_raw bnep arc4 vfat fat
> x86_pkg_temp_thermal pn544_mei mei_phy coretemp pn544 hci nfc
> kvm_intel iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
> snd_hda_codec_hdmi kvm snd_hda_codec_generic uvcvideo
> videobuf2_vmalloc videobuf2_memops microcode videobuf2_core
> snd_hda_intel v4l2_common hid_multitouch snd_hda_controller videodev
> btusb snd_hda_codec iwlmvm media snd_hwdep mac80211 btbcm snd_seq
> btintel bluetooth snd_seq_device joydev snd_pcm serio_raw
> i2c_i801 iwlwifi cfg80211 snd_hda_core sony_laptop snd_timer snd
> rfkill mei_me soundcore lpc_ich shpchp mei mfd_core dm_crypt
> crct10dif_pclmul i915 crc32_pclmul crc32c_intel i2c_algo_bit
> drm_kms_helper ghash_clmulni_intel drm i2c_core video
> CPU: 1 PID: 2697 Comm: chrome Not tainted 4.0.0-09362-g1fc149933fd4 #8
> Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013
> task: ffff88010dc51b30 ti: ffff88003f328000 task.ti: ffff88003f328000
> RIP: 0010:[<ffffffffc010c137>] [<ffffffffc010c137>]
> gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
> RSP: 0018:ffff88003f32b9a8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000075b1b
> RDX: ffff88007d848990 RSI: 0000000000000001 RDI: ffff88003f32b9c0
> RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88003f6f7e58
> R10: 000000000d836000 R11: 0000000000000000 R12: ffff8800d4164000
> R13: 0000000000000000 R14: 0000000000000001 R15: ffff88003f7bbffc
> FS: 00007f7f0ee94a00(0000) GS:ffff88011fa80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000005f607000 CR4: 00000000001407e0
> Stack:
> 0000000000000201 000002010dc51b30 0000000000000000 ffff88007d848990
> 0000004000075b1b ffff880100000001 0000000000000fe0 ffff880011215900
> 0000000000000000 ffff88006cc4c380 ffff88003f6f0000 0000000000000001
> Call Trace:
> ggtt_bind_vma+0x97/0x110 [i915]
> i915_vma_bind+0x40/0x410 [i915]
> swiotlb_map_sg_attrs+0x74/0x140
> i915_gem_object_do_pin+0x864/0x9f0 [i915]
> mutex_lock+0x9/0x30
> i915_gem_execbuffer_reserve_vma.isra.20+0x66/0x130 [i915]
> i915_gem_execbuffer_reserve+0x2ec/0x320 [i915]
> i915_gem_do_execbuffer.isra.27+0x5ee/0xf80 [i915]
> mutex_optimistic_spin+0x16e/0x1f0
> __mutex_lock_interruptible_slowpath+0x21/0x130
> shmem_fault+0x57/0x1c0
> drm_gem_object_lookup+0x14/0xa0 [drm]
> i915_gem_execbuffer2+0xb2/0x2a0 [i915]
> drm_ioctl+0x15a/0x580 [drm]
> current_fs_time+0x9/0x50
> do_vfs_ioctl+0x2e8/0x4f0
> file_has_perm+0x77/0x80
> syscall_trace_enter_phase1+0x116/0x140
> SyS_ioctl+0x79/0x90
> system_call_fastpath+0x12/0x6a
> Code: 00 81 fb 00 04 00 00 41 89 07 74 78 48 8d 7c 24 18 e8 6e b3 1b
> c1 84 c0 74 4a 48 85 ed 75 b5 48 8b 04 24 49 8b 84 c4 98 01 00 00 <48>
> 8b 28 65 ff 05 1f e8 ef 3f 48 b8 00 00 00 00 00 16 00 00 48
> RIP gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
> RSP <ffff88003f32b9a8>
> CR2: 0000000000000000
>

Hi,

I see a possible va re-allocation that could be the culprit, but the
change was commited just 2 days ago
(http://cgit.freedesktop.org/drm-intel/commit/?id=5c5f645773b6d147bf68c350674dc3ef4f8de83d).

-Michel

2015-04-22 16:45:48

by Mika Kuoppala

[permalink] [raw]
Subject: [PATCH] drm/i915: Add checks to i915_bind_vma

The current aliasing ppgtt implementation allocates
the page table structures on driver initialization
for the entire vm address space. Earlier the page tables
were allocated as array of struct pages, but introduction
of dynamic allocation of page structures changed the page
tables to be inside a page directory.

We have a detailed bug report where traversing of tables and
deferencing page_table[pte]->page oopses. This indicates that
our pre allocation of page tables has failed or that we get
corrupt vma which does not fit inside the vm area and throws
pte > 511.

Add more checks to catch the latter. Warn and bail out if
we get vma which is out of bounds for binding.

v2: Check vma node early (Chris)

Cc: Linus Torvalds <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Michel Thierry <[email protected]>
Signed-off-by: Mika Kuoppala <[email protected]>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0239fbf..2ffa8f6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2746,6 +2746,13 @@ i915_get_ggtt_vma_pages(struct i915_vma *vma)
int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
u32 flags)
{
+
+ if (WARN_ON(!drm_mm_node_allocated(&vma->node)))
+ return -EINVAL;
+
+ if (WARN_ON(vma->node.start > vma->vm->total - vma->node.size))
+ return -EINVAL;
+
if (i915_is_ggtt(vma->vm)) {
int ret = i915_get_ggtt_vma_pages(vma);

--
1.9.1

2015-04-23 13:14:33

by Josh Boyer

[permalink] [raw]
Subject: Re: [PATCH] drm/i915: Add checks to i915_bind_vma

On Wed, Apr 22, 2015 at 12:45 PM, Mika Kuoppala
<[email protected]> wrote:
> The current aliasing ppgtt implementation allocates
> the page table structures on driver initialization
> for the entire vm address space. Earlier the page tables
> were allocated as array of struct pages, but introduction
> of dynamic allocation of page structures changed the page
> tables to be inside a page directory.
>
> We have a detailed bug report where traversing of tables and
> deferencing page_table[pte]->page oopses. This indicates that
> our pre allocation of page tables has failed or that we get
> corrupt vma which does not fit inside the vm area and throws
> pte > 511.
>
> Add more checks to catch the latter. Warn and bail out if
> we get vma which is out of bounds for binding.
>
> v2: Check vma node early (Chris)
>
> Cc: Linus Torvalds <[email protected]>
> Cc: Chris Wilson <[email protected]>
> Cc: Michel Thierry <[email protected]>
> Signed-off-by: Mika Kuoppala <[email protected]>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0239fbf..2ffa8f6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2746,6 +2746,13 @@ i915_get_ggtt_vma_pages(struct i915_vma *vma)
> int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
> u32 flags)
> {
> +
> + if (WARN_ON(!drm_mm_node_allocated(&vma->node)))
> + return -EINVAL;
> +
> + if (WARN_ON(vma->node.start > vma->vm->total - vma->node.size))
> + return -EINVAL;
> +

Do you really need a full backtrace in these cases? From an end user
perspective, they're going to get a popup saying they have a kernel
problem or an automated tool will file a bug for them and they will
have no idea what any of this means.

I understand the need for the check, but could it be done in a way
that doesn't splat an oops on a user's system? The i915 driver has
tons of these kinds of WARN_ONs already and they don't seem to be
helping anything overall...

josh

> if (i915_is_ggtt(vma->vm)) {
> int ret = i915_get_ggtt_vma_pages(vma);
>
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/