Message-ID: <5CC8156F.1030403@windriver.com>
Date: Tue, 30 Apr 2019 17:29:19 +0800
From: Liwei Song
To: Takashi Iwai
CC: Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela, linux-kernel
Subject: Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
References: <1556604653-47363-1-git-send-email-liwei.song@windriver.com> <5CC8082F.4090903@windriver.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/30/2019 04:53 PM, Takashi Iwai wrote:
> On Tue, 30 Apr 2019 10:32:47 +0200,
> Liwei Song wrote:
>>
>> On 04/30/2019 03:31 PM, Takashi Iwai wrote:
>>> On Tue, 30 Apr 2019 08:10:53 +0200,
>>> Song liwei wrote:
>>>>
>>>> From: Liwei Song
>>>>
>>>> Fix the following BUG:
>>>>
>>>>  BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
>>>>  Workqueue: events azx_probe_work [snd_hda_intel]
>>>>  RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
>>>>  Call Trace:
>>>>
>>>>  azx_interrupt+0x78/0x140 [snd_hda_codec]
>>>>  __handle_irq_event_percpu+0x49/0x300
>>>>  handle_irq_event_percpu+0x23/0x60
>>>>  handle_irq_event+0x3c/0x60
>>>>  handle_edge_irq+0xdb/0x180
>>>>  handle_irq+0x23/0x30
>>>>  do_IRQ+0x6a/0x140
>>>>  common_interrupt+0xf/0xf
>>>>
>>>> The call trace occurred when running kdump on an NFS rootfs system.
>>>> The following calling sequence occurs when booting the second kernel:
>>>>
>>>> azx_first_init()
>>>>    --> azx_acquire_irq()
>>>>                       <-- interrupt comes in, azx_interrupt() is called
>>>>    --> hda_intel_init_chip()
>>>>        --> azx_init_chip()
>>>>            --> snd_hdac_bus_init_chip()
>>>>                 --> snd_hdac_bus_init_cmd_io();
>>>>                     --> init rirb.buf and corb.buf
>>>>
>>>> The interrupt arrived after azx_acquire_irq() while the RIRB had not yet
>>>> been initialized, so a NULL pointer is used when processing the interrupt.
>>>>
>>>> Check the value of RIRB to ensure it is not NULL, to avoid a special
>>>> case that may hang the system.
>>>>
>>>> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
>>>> Signed-off-by: Liwei Song
>>>
>>> Oh, that's indeed a race there.
>>>
>>> But I guess the check introduced by the patch is still error-prone.
>>> Basically the interrupt handling should be moved after the chip
>>> initialization.  I suppose that your platform uses a shared
>>> interrupt, not MSI?
>>
>> This is the information from /proc/interrupts:
>> 134: 0 102 0 0 IR-PCI-MSI 514048-edge snd_hda_intel:card0
>
> Hm, then it's interesting...
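
For illustration only, the guard proposed in the patch quoted above can be pictured with a small user-space sketch. This is not the kernel code: `struct ring` and `ring_update()` are hypothetical stand-ins for the response ring buffer and its update routine, and the ring layout is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the response ring buffer (not the kernel's struct). */
struct ring {
    uint32_t *buf;      /* NULL until the command I/O setup has run */
    unsigned int rp;    /* read position */
    unsigned int wp;    /* write position, advanced by the "hardware" */
    unsigned int size;  /* number of entries in buf */
};

/*
 * Hypothetical handler body: drain pending responses.
 * Returns the number of entries consumed, or -1 if the buffer has not
 * been set up yet (the case the NULL check is meant to catch).
 */
static int ring_update(struct ring *r, uint32_t *last)
{
    int handled = 0;

    if (r->buf == NULL || r->size == 0)   /* interrupt arrived before init */
        return -1;

    while (r->rp != r->wp) {
        r->rp = (r->rp + 1) % r->size;
        *last = r->buf[r->rp];
        handled++;
    }
    return handled;
}
```

Calling `ring_update()` on a zero-initialized `struct ring` returns -1 instead of dereferencing `buf`. As noted in the review, a check like this only hides the early interrupt; it does not remove the window in which it can arrive.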
>
>
>>> In any case, an alternative (and likely more certain) fix would be to move
>>> the azx_acquire_irq() call like the patch below (note: totally
>>> untested).  Could you check whether it works?
>>
>> Yes, it works.
>>
>> Considering that a previous patch like the one you provided introduced some issues,
>> I chose to check for the invalid value to lower the risk, but just as you mentioned,
>> it is not a good solution.
>>
>> commit 542cedec53c9e8b73f3f05bf8468823598c50489
>> Author: Yu Zhao
>> Date:   Tue Sep 11 15:12:46 2018 -0600
>>
>>     Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
>>
>>     This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
>>
>>     The patch doesn't fix accessing memory with null pointer in
>>     skl_interrupt().
>>
>>     There are two problems: 1) skl_init_chip() is called twice, before
>>     and after dma buffer is allocated. The first call sets bus->chip_init
>>     which prevents the second from initializing bus->corb.buf and
>>     rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
>>     interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
>>     There is a small window which skl_interrupt() can be called if irq
>>     has been acquired. If so, it crashes when using null dma buffer
>>     pointers.
>
> Actually this was followed by another fix, b61749a89f82,
>   sound: enable interrupt after dma buffer initialization
>
> and this moved the IRQ enablement after snd_hdac_bus_init_cmd_io().
>
> So I wonder how the irq gets triggered in your case.
> If it were a shared irq, it's understandable.  But for MSI, it should
> have been the isolated source.

I'm still working out how the irq was triggered. It is a little complex to
reproduce. First, the system must run with an NFS rootfs; without an NFS
rootfs the problem cannot be reproduced. Then, with kdump enabled, after
"echo c > /proc/sysrq-trigger" crashes the kernel, the kernel specified by
kdump boots, and the interrupt triggers soon after the azx interrupt handler
is registered.
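
As an aside, the first problem described in the revert quoted above (an init routine guarded by a "done" flag being called once before and once after the DMA area exists) can be sketched in user space as follows. The names `mini_bus`, `mini_init_chip()` and `rirb_ready_after_probe()` are hypothetical and only mirror the shape of the description, not the actual driver code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a bus object; field names are illustrative. */
struct mini_bus {
    bool chip_init;   /* set once init has run */
    void *rb_area;    /* backing DMA area, allocated separately */
    void *rirb_buf;   /* derived from rb_area during init */
};

/* Init guarded by a "done" flag: only the first call does any work. */
static void mini_init_chip(struct mini_bus *bus)
{
    if (bus->chip_init)
        return;
    bus->rirb_buf = bus->rb_area;   /* NULL if rb_area is not allocated yet */
    bus->chip_init = true;
}

/*
 * Runs init, allocation, init and reports whether rirb_buf ended up usable.
 * With alloc_first set, the area is allocated before the first init call.
 */
static bool rirb_ready_after_probe(bool alloc_first)
{
    static char area[64];
    struct mini_bus bus = { false, NULL, NULL };

    if (alloc_first)
        bus.rb_area = area;
    mini_init_chip(&bus);   /* first call */
    bus.rb_area = area;     /* allocation has happened by now in either order */
    mini_init_chip(&bus);   /* second call: no-op because chip_init is set */
    return bus.rirb_buf != NULL;
}
```

In this sketch `rirb_ready_after_probe(false)` leaves the buffer pointer NULL even though the area is allocated before the second call, which is the situation the revert message describes.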
>
> In any case, for the latest tree, the change I suggested would cover
> it better, although it's more radical, as you pointed out.

Got it, thanks.

Liwei.

>
>
> thanks,
>
> Takashi
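
For reference, the ordering question discussed in this thread (register the interrupt handler before or after the buffers are initialized) can be modelled with a short user-space sketch. Everything here is hypothetical: `fake_acquire_irq()` simply assumes that an interrupt is delivered the moment the handler is registered, which is how the report describes the failing boot, and none of these functions are the driver's.

```c
#include <stddef.h>

/* Hypothetical device state for the model. */
struct dev_state {
    int *rirb_buf;   /* NULL until buffers are initialized */
    int null_hits;   /* how often the handler saw a NULL buffer */
};

static int storage[8];

/* Model handler: only records whether it ran before initialization. */
static void fake_handler(struct dev_state *d)
{
    if (d->rirb_buf == NULL)
        d->null_hits++;
}

static void fake_init_buffers(struct dev_state *d)
{
    d->rirb_buf = storage;
}

/* Assumption: an interrupt fires as soon as the handler is registered. */
static void fake_acquire_irq(struct dev_state *d)
{
    fake_handler(d);
}

/* Handler registered first, buffers set up afterwards. */
static int probe_irq_first(void)
{
    struct dev_state d = { NULL, 0 };

    fake_acquire_irq(&d);
    fake_init_buffers(&d);
    return d.null_hits;
}

/* Buffers set up first, handler registered afterwards. */
static int probe_init_first(void)
{
    struct dev_state d = { NULL, 0 };

    fake_init_buffers(&d);
    fake_acquire_irq(&d);
    return d.null_hits;
}
```

Under that assumption, `probe_irq_first()` reports one handler run against a NULL buffer while `probe_init_first()` reports none, which is the motivation for moving the IRQ acquisition after chip initialization.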