Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp4066435pxb; Tue, 2 Nov 2021 03:34:03 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxvJK4u84CH5L+WKUy9iX9g1ItQGU5YKJwfnQVM827oxxUcjI/5W0KX21tPwB12U3tv0z2S X-Received: by 2002:a05:6402:2682:: with SMTP id w2mr50414370edd.185.1635849243223; Tue, 02 Nov 2021 03:34:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635849243; cv=none; d=google.com; s=arc-20160816; b=lWCmpmw+znmDiOqC7ei/FvkwMvvrlgyYdrinC7Um6zPjuFcIhDd/TzDCcrw1jfAXui orvBLx0elpMoj/rhePecAxZ8ksK9TKPbrKjCu/C/dRgGWDvSvavpf20lZ2NirTuZmdPP OPBFAVK90OdmtvlRuPlMkjeS6Yr59UoE1PZYiiYXWnuUOvJR4gpgxcTrJcfey6FjRVMP u8pVJXj5WGkBJcy21i++FYmJIsQWIXbxpPT3nWybp10ev2J/6Tzw4cdA4mV60atqcmxT aXChTDjOdewGa2OtXS9vrYjWrDxFy+h93j3hZoe5mrlMnYWXoPWJe1XKGSejiq1WQDYx QiTQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:subject:cc:to:from:message-id :date:dkim-signature:dkim-signature; bh=ayzifrmSx2RtPVVSxMWcHrbtaEbgK40zXrtwKXLG2m0=; b=fystUhpO0GOhLHUcyYmkpzuDVjQz7ua45qiM9aVIllFKd7rxlVkiA306kMqSL5fTO7 ZunkUCzPIEES5xCvZ3q3hpQp7UU0TovAxAqwBEh3HWIscH+c2JTwTc3TiS17Jg3mTBQu 1K5RGlMkKall9hwkk6Q/aWQLZ4mJPJMAqaqUrnvgbS8noS8t4KX8AoCflFwaBSpnBEjX iWaodslxjBIXIvlnRgC1AFv0b+t0Kdb4QX+T/nNQ+ZQvY3EIvMuQ9ErUrniuH2Wpz9lX CNirOn5MQgTEY0qFx3XmdyEZJViPyspU+RJMocR5Q4swGgSGBp3p4lrv1ErirgoPnbsq 5j7w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=yTJj+PFD; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519 header.b=CLjRDNho; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=suse.de Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bq5si24835561edb.617.2021.11.02.03.33.35; Tue, 02 Nov 2021 03:34:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=yTJj+PFD; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519 header.b=CLjRDNho; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=suse.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229931AbhKBKeW (ORCPT + 99 others); Tue, 2 Nov 2021 06:34:22 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:35916 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229720AbhKBKeV (ORCPT ); Tue, 2 Nov 2021 06:34:21 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 4896C212C6; Tue, 2 Nov 2021 10:31:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1635849106; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ayzifrmSx2RtPVVSxMWcHrbtaEbgK40zXrtwKXLG2m0=; b=yTJj+PFDZGlSRK8ri2AI9FfcWQC82EMpqw/SOl+YfYouvFisqd6TIRg0SbbKBFC6/N5Fwm /TcyJ0b4EyMUnb7mwZxx4A8nSBgayFmNyTcjiDLHC1osliatoG6GaoKJip922Y55Pwh1pj Hrg9x8/BNPpYWdl2/pF88iO/QZddzBA= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1635849106; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ayzifrmSx2RtPVVSxMWcHrbtaEbgK40zXrtwKXLG2m0=; b=CLjRDNhoo6EdtlA8OvHxgZHImZDYvacHLMsYmwV1PPWOJG6kaDQ0GdV3+O7yUFatEsiWRn gJKKv75OlrL7pNAg== Received: from alsa1.suse.de (alsa1.suse.de [10.160.4.42]) by relay2.suse.de (Postfix) with ESMTP id 38DA1A3B84; Tue, 2 Nov 2021 10:31:46 +0000 (UTC) Date: Tue, 02 Nov 2021 11:31:46 +0100 Message-ID: From: Takashi Iwai To: Zqiang Cc: tiwai@suse.com, alsa-devel@alsa-project.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] ALSA: seq: Fix RCU stall in snd_seq_write() In-Reply-To: <2d05ceab-b8b7-0c7b-f847-69950c6db14e@gmail.com> References: <20211102033222.3849-1-qiang.zhang1211@gmail.com> <2d05ceab-b8b7-0c7b-f847-69950c6db14e@gmail.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 Emacs/25.3 (x86_64-suse-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 02 Nov 2021 10:41:57 +0100, Zqiang wrote: > > > On 2021/11/2 下午4:33, Takashi Iwai wrote: > > On Tue, 02 Nov 2021 04:32:22 +0100, > > Zqiang wrote: > >> If we have a lot of cell object, this cycle may take a long time, and > >> trigger RCU stall. insert a conditional reschedule point to fix it. > >> > >> rcu: INFO: rcu_preempt self-detected stall on CPU > >> rcu: 1-....: (1 GPs behind) idle=9f5/1/0x4000000000000000 > >> softirq=16474/16475 fqs=4916 > >> (t=10500 jiffies g=19249 q=192515) > >> NMI backtrace for cpu 1 > >> ...... > >> asm_sysvec_apic_timer_interrupt > >> RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70 > >> spin_unlock_irqrestore > >> snd_seq_prioq_cell_out+0x1dc/0x360 > >> snd_seq_check_queue+0x1a6/0x3f0 > >> snd_seq_enqueue_event+0x1ed/0x3e0 > >> snd_seq_client_enqueue_event.constprop.0+0x19a/0x3c0 > >> snd_seq_write+0x2db/0x510 > >> vfs_write+0x1c4/0x900 > >> ksys_write+0x171/0x1d0 > >> do_syscall_64+0x35/0xb0 > >> > >> Reported-by: syzbot+bb950e68b400ab4f65f8@syzkaller.appspotmail.com > >> Signed-off-by: Zqiang > >> --- > >> sound/core/seq/seq_queue.c | 2 ++ > >> 1 file changed, 2 insertions(+) > >> > >> diff --git a/sound/core/seq/seq_queue.c b/sound/core/seq/seq_queue.c > >> index d6c02dea976c..f5b1e4562a64 100644 > >> --- a/sound/core/seq/seq_queue.c > >> +++ b/sound/core/seq/seq_queue.c > >> @@ -263,6 +263,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop) > >> if (!cell) > >> break; > >> snd_seq_dispatch_event(cell, atomic, hop); > >> + cond_resched(); > >> } > >> /* Process time queue... */ > >> @@ -272,6 +273,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop) > >> if (!cell) > >> break; > >> snd_seq_dispatch_event(cell, atomic, hop); > >> + cond_resched(); > > > > It's good to have cond_resched() in those places but it must be done > > more carefully, as the code path may be called from the non-atomic > > context, too. That is, it must have a check of atomic argument, and > > cond_resched() is applied only when atomic==false. > > > > But I still wonder how this gets a RCU stall out of sudden. Looking > > through https://syzkaller.appspot.com/bug?extid=bb950e68b400ab4f65f8 > > it's triggered by many cases since the end of September... > > I did not find useful information from the log,  through calltrace, I > guess it may be triggered by the long cycle time, which caused the > static state of the RCU to > > not be reported in time. Yes, I understand that logic. But I wonder why this gets triggered *now* out of sudden. The code has been present over decades, and I don't think the similar test case must have been performed by fuzzer. > I ignore the atomic parameter check,  I will resend v2 .   in > no-atomic context, we can insert > > cond_resched() to avoid this situation, but in atomic context, > > the RCU stall maybe still trigger. Right, so maybe it's better to have an upper limit for the processed cells, something like below (totally untested). Could you reproduce the problem locally? Otherwise it's all nothing but a guess... thanks, Takashi --- diff --git a/sound/core/seq/seq_queue.c b/sound/core/seq/seq_queue.c index d6c02dea976c..7f796ee62ee7 100644 --- a/sound/core/seq/seq_queue.c +++ b/sound/core/seq/seq_queue.c @@ -235,12 +235,15 @@ struct snd_seq_queue *snd_seq_queue_find_name(char *name) /* -------------------------------------------------------- */ +#define MAX_CELL_PROCESSES_IN_QUEUE 1000 + void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop) { unsigned long flags; struct snd_seq_event_cell *cell; snd_seq_tick_time_t cur_tick; snd_seq_real_time_t cur_time; + int processed = 0; if (q == NULL) return; @@ -263,6 +266,8 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop) if (!cell) break; snd_seq_dispatch_event(cell, atomic, hop); + if (++processed >= MAX_CELL_PROCESSES_IN_QUEUE) + return; /* the rest processed at the next batch */ } /* Process time queue... */ @@ -272,6 +277,8 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop) if (!cell) break; snd_seq_dispatch_event(cell, atomic, hop); + if (++processed >= MAX_CELL_PROCESSES_IN_QUEUE) + return; /* the rest processed at the next batch */ } /* free lock */