Date: Tue, 30 Apr 2019 12:17:35 +0200
From: Takashi Iwai
To: Liwei Song
Cc: Yu Zhao, Mark Brown, Keyon Jie, Jaroslav Kysela, linux-kernel
Subject: Re: [PATCH] ALSA: hda: check RIRB to avoid use NULL pointer
In-Reply-To: <5CC8156F.1030403@windriver.com>
References: <1556604653-47363-1-git-send-email-liwei.song@windriver.com> <5CC8082F.4090903@windriver.com> <5CC8156F.1030403@windriver.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 30 Apr 2019 11:29:19 +0200,
Liwei Song wrote:
>
>
>
> On 04/30/2019 04:53 PM, Takashi Iwai wrote:
> > On Tue, 30 Apr 2019 10:32:47 +0200,
> > Liwei Song wrote:
> >>
> >>
> >>
> >> On 04/30/2019 03:31 PM, Takashi Iwai wrote:
> >>> On Tue, 30 Apr 2019 08:10:53 +0200,
> >>> Song liwei wrote:
> >>>>
> >>>> From: Liwei Song
> >>>>
> >>>> Fix the following BUG:
> >>>>
> >>>> BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
> >>>> Workqueue: events azx_probe_work [snd_hda_intel]
> >>>> RIP: 0010:snd_hdac_bus_update_rirb+0x80/0x160 [snd_hda_core]
> >>>> Call Trace:
> >>>>
> >>>>  azx_interrupt+0x78/0x140 [snd_hda_codec]
> >>>>  __handle_irq_event_percpu+0x49/0x300
> >>>>  handle_irq_event_percpu+0x23/0x60
> >>>>  handle_irq_event+0x3c/0x60
> >>>>  handle_edge_irq+0xdb/0x180
> >>>>  handle_irq+0x23/0x30
> >>>>  do_IRQ+0x6a/0x140
> >>>>  common_interrupt+0xf/0xf
> >>>>
> >>>> The call trace occurred when running kdump on an NFS rootfs system.
> >>>> The following calling sequence exists when booting the second kernel:
> >>>>
> >>>> azx_first_init()
> >>>>    --> azx_acquire_irq()
> >>>>        <-- interrupt comes in, azx_interrupt() is called
> >>>>    --> hda_intel_init_chip()
> >>>>       --> azx_init_chip()
> >>>>          --> snd_hdac_bus_init_chip()
> >>>>             --> snd_hdac_bus_init_cmd_io();
> >>>>                --> init rirb.buf and corb.buf
> >>>>
> >>>> The interrupt arrived after azx_acquire_irq() while RIRB was still
> >>>> uninitialized, so a NULL pointer is used when processing the
> >>>> interrupt.
> >>>>
> >>>> Check that RIRB is not NULL, to avoid special cases that may hang
> >>>> the system.
> >>>>
> >>>> Fixes: 14752412721c ("ALSA: hda - Add the controller helper codes to hda-core module")
> >>>> Signed-off-by: Liwei Song
> >>>
> >>> Oh, that's indeed a race there.
> >>>
> >>> But I guess the check introduced by the patch is still error-prone.
> >>> Basically the interrupt handling should be moved after the chip
> >>> initialization.  I suppose that your platform uses the shared
> >>> interrupt, not the MSI?
> >>
> >> This is the information from /proc/interrupts:
> >> 134: 0 102 0 0 IR-PCI-MSI 514048-edge snd_hda_intel:card0
> >
> > Hm, then it's interesting...
> >
> >
> >>> Anyway, an alternative (and likely more certain) fix would be to
> >>> move the azx_acquire_irq() call like the patch below (note: totally
> >>> untested).  Could you check whether it works?
> >>
> >> Yes, it works.
> >>
> >> A previous patch similar to the one you provided introduced some
> >> issues, so I chose to check for the invalid value to lower the risk,
> >> but as you mentioned, it is not a good solution.
> >>
> >> commit 542cedec53c9e8b73f3f05bf8468823598c50489
> >> Author: Yu Zhao
> >> Date: Tue Sep 11 15:12:46 2018 -0600
> >>
> >> Revert "ASoC: Intel: Skylake: Acquire irq after RIRB allocation"
> >>
> >> This reverts commit 12eeeb4f4733bbc4481d01df35933fc15beb8b19.
> >>
> >> The patch doesn't fix accessing memory with null pointer in
> >> skl_interrupt().
> >>
> >> There are two problems: 1) skl_init_chip() is called twice, before
> >> and after dma buffer is allocate. The first call sets bus->chip_init
> >> which prevents the second from initializing bus->corb.buf and
> >> rirb.buf from bus->rb.area. 2) snd_hdac_bus_init_chip() enables
> >> interrupt before snd_hdac_bus_init_cmd_io() initializing dma buffers.
> >> There is a small window which skl_interrupt() can be called if irq
> >> has been acquired. If so, it crashes when using null dma buffer
> >> pointers.
> >
> > Actually this was followed by another fix, b61749a89f82,
> >   sound: enable interrupt after dma buffer initialization
> >
> > and this moved the IRQ enablement after snd_hdac_bus_init_cmd_io().
> >
> > So I wonder how the irq gets triggered in your case.
> > If it were a shared irq, it's understandable.  But for MSI, it should
> > have been the isolated source.
>
> I'm still working out how the irq was triggered.
> It is a little complex to reproduce: first, it must run with an NFS
> rootfs; without an NFS rootfs it cannot be reproduced.
> Then, with kdump enabled, after "echo c > /proc/sysrq-trigger" crashes
> the kernel, the kernel specified by kdump boots, and the interrupt
> triggers soon after the azx interrupt is registered.

Ah, so it happens in a kdump kernel?  It implies that the interrupt
line may be still active (or confused).  Then it's no wonder a stale
interrupt comes up.

> > Anyway, for the latest tree, the change I suggested would cover
> > better, although it's more radical, as you pointed out.
>
> Got it, thanks.

OK, I'm going to submit and apply the proper patch.


thanks,

Takashi
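
For readers of the archive, the ordering problem discussed in this
thread can be modelled in a few lines of userspace C.  This is only a
sketch under stated assumptions: struct fake_bus, the FAKE_IRQ_* values
and every fake_*/probe_* function are hypothetical stand-ins invented
for illustration, not the real snd_hda code, and this is not the patch
that was submitted.  It contrasts the reported order (IRQ registered
before the RIRB buffer exists) with the two remedies discussed: a NULL
check in the handler, and registering the IRQ only after the buffers
are set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the controller state; NOT the kernel's struct hdac_bus. */
struct fake_bus {
    uint32_t *rirb_buf;   /* NULL until the command I/O buffers are set up */
    bool irq_registered;  /* true once the handler can be invoked */
};

/* Hypothetical result codes for the model, unrelated to the kernel's irqreturn_t. */
enum fake_irq_result {
    FAKE_IRQ_IGNORED = 0,     /* no handler installed, or handler bailed out */
    FAKE_IRQ_HANDLED = 1,     /* handler read from the response buffer */
    FAKE_IRQ_NULL_DEREF = -1  /* handler would have dereferenced NULL */
};

static uint32_t fake_rirb_storage[4];

/* Stand-in for the buffer setup step (the role of snd_hdac_bus_init_cmd_io()). */
static void fake_init_cmd_io(struct fake_bus *bus)
{
    bus->rirb_buf = fake_rirb_storage;
}

/* Stand-in for IRQ registration (the role of azx_acquire_irq()). */
static void fake_acquire_irq(struct fake_bus *bus)
{
    bus->irq_registered = true;
}

/* Stand-in for the interrupt handler; null_check models the guard from the patch. */
static enum fake_irq_result fake_interrupt(struct fake_bus *bus, bool null_check)
{
    assert(bus != NULL);
    if (!bus->irq_registered)
        return FAKE_IRQ_IGNORED;  /* nothing is registered, so nothing runs */
    if (bus->rirb_buf == NULL)
        return null_check ? FAKE_IRQ_IGNORED : FAKE_IRQ_NULL_DEREF;
    (void)bus->rirb_buf[0];       /* safe: the buffer exists */
    return FAKE_IRQ_HANDLED;
}

/* Reported order: IRQ first, a stale interrupt fires, buffers come last. */
static enum fake_irq_result probe_irq_first(struct fake_bus *bus, bool null_check)
{
    enum fake_irq_result r;

    fake_acquire_irq(bus);
    r = fake_interrupt(bus, null_check);  /* stale interrupt hits the window */
    fake_init_cmd_io(bus);
    return r;
}

/* Reordered: buffers first, so a stale interrupt finds no handler yet. */
static enum fake_irq_result probe_irq_last(struct fake_bus *bus)
{
    enum fake_irq_result r;

    fake_init_cmd_io(bus);
    r = fake_interrupt(bus, false);       /* stale interrupt, handler not installed */
    fake_acquire_irq(bus);
    return r;
}
```

With a zeroed struct fake_bus, probe_irq_first(&bus, false) reports the
would-be NULL dereference, probe_irq_first(&bus, true) drops the stale
interrupt, and probe_irq_last(&bus) never lets the handler run before
the buffer exists.  The last variant corresponds to the direction
suggested in the thread, since it removes the window instead of
guarding against it.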