Received: by 2002:ab2:6d45:0:b0:1fb:d597:ff75 with SMTP id d5csp180941lqr; Wed, 5 Jun 2024 02:48:28 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWMP4T462sWAWh8yufvSn3IBKh7BRYcxBi5oGTr+4IZ4r5lqETuQ7qNQp/M5GCniQeJWmmWbvKhCokyiBPl0etYiYfnwXq9Cl8d+srtug== X-Google-Smtp-Source: AGHT+IEaHtAdkW+s6sSPBZcvNEfKbykxTp8tSgetEpg0XZlXdGtCoFs+AMLilKcBQ/H/wO8AHGuU X-Received: by 2002:a05:6214:5b07:b0:6ae:d417:b661 with SMTP id 6a1803df08f44-6b030a95ba8mr18621946d6.61.1717580908233; Wed, 05 Jun 2024 02:48:28 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1717580908; cv=pass; d=google.com; s=arc-20160816; b=hAZj8PHphIYgGhcw/8CPwgC+bnAQIJqDOesPVj9+BvbWHFPVLk18sIOEB15GWKSGOc J/7vchkVnblpwglQEyNnD0kozlD1OAzL/7y1JNr4hCdUJ75FVUqCsnuzuig0QQe+/XhW Hz/345k8KgncVlbdN8S5SeyjzWGJtdyEVkjLDRO/eA1jN/51SYrV6p9+GM29ogewr7Gh +DCMs6RG5pRcGAWFHdUCX7V8fd5YWertOwI4MRlmNSi3A4PqE8Lr+2UJ579NOgI4Q+G2 +/dNg6a8CdI+F3SKo+sBRrNJMOZmZ842eC/u9KPD/tLnZbHoB91VOcnwSUL66+PI5RVZ P1bA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:list-unsubscribe:list-subscribe:list-id:precedence :references:message-id:subject:cc:to:from:date:dkim-signature; bh=opY4p+BpYUBWLdwIooIw98BXTUX1WcY1hsxBx/R5iBM=; fh=DFs8VdoLIwaFVA8pd/24z+UH4HS6gWmX5l8W720ky4c=; b=Guf2h4eF+u5b6YRbtdp1kCv4tP5zGX9jwv8akd5VEGAFMrK+jPzcE9f3eugEBXPLGj FnM3/sJvBb06H4prCE1k3dLJlytSgc94TWthLFq4bePdw5PW1ikYllltJ04i0QY5+b85 AVB7iMhc3HTIeN5715DnnvTk/g96m8EdJtwAdCFq13WWXLDvyrQQiK+wE8y9Olv7jKFU ks2lyxPHY1apaYkkrKTEO4f9WrOJLz/H+T3wN1veAgh9VpN9XjhsuesT7ysfNkumullA c2onQgHQDFTSyXolEDe9MY5EFr8uw6v//T4xQamAmxQ2KHVx6Lz6guqmH0b2d51NrrO0 C2zQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=aQCoWDdN; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-202141-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-202141-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id d75a77b69052e-43ff23ac0ecsi5565461cf.159.2024.06.05.02.48.28 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Jun 2024 02:48:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-202141-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=aQCoWDdN; arc=pass (i=1 spf=pass spfdomain=redhat.com dkim=pass dkdomain=redhat.com dmarc=pass fromdomain=redhat.com); spf=pass (google.com: domain of linux-kernel+bounces-202141-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-202141-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 0CD8F1C2300F for ; Wed, 5 Jun 2024 09:48:19 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 07E4C18FDAA; Wed, 5 Jun 2024 09:48:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="aQCoWDdN" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 526DF18FC90 for ; Wed, 5 Jun 2024 09:48:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717580892; cv=none; b=saGaKEfgxQOTiQcz7XXNZBmWTIiDLPilWN4SRX4T9sOlPpjQU5Z5NKEzFEa6dgzlGU/TmyrEaK0mlRB8xm+KVI+qp7pqU9lT20bboSb+uDJ9O+1HdJW6MXpHF9Y2T9Oa64TP59qu8o2q3Bl0U1sXUF1hx+yu2nYvNBR5ur7qWqM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717580892; c=relaxed/simple; bh=C4iwgHy0Vtpi0Ntz+kAPqXOSt3BXAywIVancJW3S7TA=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=h33oItTvtHbqXZjRw42FGJAzebu48EoGV4BAXNP9aBPiBjWtO59YSKWomG4S+AE/0Ury+44ZZsev14n4ZP2zKoczRP4ob1uqmMlWb+1b9fUlxcTn4TBb8MgLVBlvYX9CSdm9npiglmNEP3xGALu7g1VTenOPug5uaX2GjJGX4B4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=aQCoWDdN; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1717580889; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=opY4p+BpYUBWLdwIooIw98BXTUX1WcY1hsxBx/R5iBM=; b=aQCoWDdNl1VR8+RkaBf1SDI8KesjZyjMRydry/QoMsBiXBCTp1aB9/qZeA9HIrUj6uvpAa cbSRUQ8nhI4ZaQp8zj1HCJ4OZkaRSxcCW58V1haq9PVFqkPdJwCODIdCdhUgLanFE6elj3 a4YhLQtlBisfuKOh/hSGauK515AbBKQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-333-4X7cLI-bOh-8HDmhpp1D3Q-1; Wed, 05 Jun 2024 05:48:03 -0400 X-MC-Unique: 4X7cLI-bOh-8HDmhpp1D3Q-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1F6BF101A520; Wed, 5 Jun 2024 09:48:03 +0000 (UTC) Received: from fedora (unknown [10.72.116.43]) by smtp.corp.redhat.com (Postfix) with ESMTPS id DFE9E492BF6; Wed, 5 Jun 2024 09:47:57 +0000 (UTC) Date: Wed, 5 Jun 2024 17:47:53 +0800 From: Ming Lei To: Changhui Zhong Cc: Li Nan , axboe@kernel.dk, ZiyangZhang@linux.alibaba.com, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yi.zhang@huawei.com, houtao1@huawei.com, yangerkun@huawei.com Subject: Re: [PATCH] ublk_drv: fix NULL pointer dereference in ublk_ctrl_start_recovery() Message-ID: References: <20240529095313.2568595-1-linan666@huaweicloud.com> <225f4c8e-0e2c-8f4b-f87d-69f4677af572@huaweicloud.com> <918f128b-f752-2d66-ca60-7d9c711ed928@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.10 On Wed, Jun 05, 2024 at 03:20:34PM +0800, Changhui Zhong wrote: > On Wed, Jun 5, 2024 at 9:41 AM Li Nan wrote: > > > > > > > > 在 2024/6/4 9:32, Changhui Zhong 写道: > > > On Mon, Jun 3, 2024 at 10:20 AM Li Nan wrote: > > >> > > >> > > >> > > >> 在 2024/6/3 8:39, Ming Lei 写道: > > >> > > >> [...] > > >> > > >>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c > > >>>> index 4e159948c912..99b621b2d40f 100644 > > >>>> --- a/drivers/block/ublk_drv.c > > >>>> +++ b/drivers/block/ublk_drv.c > > >>>> @@ -2630,7 +2630,8 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq) > > >>>> { > > >>>> int i; > > >>>> > > >>>> - WARN_ON_ONCE(!(ubq->ubq_daemon && ubq_daemon_is_dying(ubq))); > > >>>> + if (WARN_ON_ONCE(!(ubq->ubq_daemon && ubq_daemon_is_dying(ubq)))) > > >>>> + return; > > >>> > > >>> Yeah, it is one bug. However, it could be addressed by adding the check in > > >>> ublk_ctrl_start_recovery() and return immediately in case of NULL ubq->ubq_daemon, > > >>> what do you think about this way? > > >>> > > >> > > >> Check ub->nr_queues_ready seems better. How about: > > >> > > >> @@ -2662,6 +2662,8 @@ static int ublk_ctrl_start_recovery(struct > > >> ublk_device *ub, > > >> mutex_lock(&ub->mutex); > > >> if (!ublk_can_use_recovery(ub)) > > >> goto out_unlock; > > >> + if (!ub->nr_queues_ready) > > >> + goto out_unlock; > > >> /* > > >> * START_RECOVERY is only allowd after: > > >> * > > >> > > >>> > > >>> Thanks, > > >>> Ming > > >> > > >> -- > > >> Thanks, > > >> Nan > > >> > > > > > > > > > Hi,Nan > > > > > > After applying your new patch, I did not trigger "NULL pointer > > > dereference" and "Warning", > > > but hit task hung "Call Trace" info, please check > > > > > > [13617.812306] running generic/004 > > > [13622.293674] blk_print_req_error: 91 callbacks suppressed > > > [13622.293681] I/O error, dev ublkb4, sector 233256 op 0x1:(WRITE) > > > flags 0x8800 phys_seg 1 prio class 0 > > > [13622.308145] I/O error, dev ublkb4, sector 233256 op 0x0:(READ) > > > flags 0x0 phys_seg 2 prio class 0 > > > [13622.316923] I/O error, dev ublkb4, sector 233264 op 0x1:(WRITE) > > > flags 0x8800 phys_seg 1 prio class 0 > > > [13622.326048] I/O error, dev ublkb4, sector 233272 op 0x0:(READ) > > > flags 0x0 phys_seg 1 prio class 0 > > > [13622.334828] I/O error, dev ublkb4, sector 233272 op 0x1:(WRITE) > > > flags 0x8800 phys_seg 1 prio class 0 > > > [13622.343954] I/O error, dev ublkb4, sector 233312 op 0x0:(READ) > > > flags 0x0 phys_seg 1 prio class 0 > > > [13622.352733] I/O error, dev ublkb4, sector 233008 op 0x0:(READ) > > > flags 0x0 phys_seg 1 prio class 0 > > > [13622.361514] I/O error, dev ublkb4, sector 233112 op 0x0:(READ) > > > flags 0x0 phys_seg 1 prio class 0 > > > [13622.370292] I/O error, dev ublkb4, sector 233192 op 0x1:(WRITE) > > > flags 0x8800 phys_seg 1 prio class 0 > > > [13622.379419] I/O error, dev ublkb4, sector 233120 op 0x0:(READ) > > > flags 0x0 phys_seg 1 prio class 0 > > > [13641.069695] INFO: task fio:174413 blocked for more than 122 seconds. > > > [13641.076061] Not tainted 6.10.0-rc1+ #1 > > > [13641.080338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > > disables this message. > > > [13641.088164] task:fio state:D stack:0 pid:174413 > > > tgid:174413 ppid:174386 flags:0x00004002 > > > [13641.088168] Call Trace: > > > [13641.088170] > > > [13641.088171] __schedule+0x221/0x670 > > > [13641.088177] schedule+0x23/0xa0 > > > [13641.088179] io_schedule+0x42/0x70 > > > [13641.088181] blk_mq_get_tag+0x118/0x2b0 > > > [13641.088185] ? gup_fast_pgd_range+0x280/0x370 > > > [13641.088188] ? __pfx_autoremove_wake_function+0x10/0x10 > > > [13641.088192] __blk_mq_alloc_requests+0x194/0x3a0 > > > [13641.088194] blk_mq_submit_bio+0x241/0x6c0 > > > [13641.088196] __submit_bio+0x8a/0x1f0 > > > [13641.088199] submit_bio_noacct_nocheck+0x168/0x250 > > > [13641.088201] ? submit_bio_noacct+0x45/0x560 > > > [13641.088203] __blkdev_direct_IO_async+0x167/0x1a0 > > > [13641.088206] blkdev_write_iter+0x1c8/0x270 > > > [13641.088208] aio_write+0x11c/0x240 > > > [13641.088212] ? __rq_qos_issue+0x21/0x40 > > > [13641.088214] ? blk_mq_start_request+0x34/0x1a0 > > > [13641.088216] ? io_submit_one+0x68/0x380 > > > [13641.088218] ? kmem_cache_alloc_noprof+0x4e/0x320 > > > [13641.088221] ? fget+0x7c/0xc0 > > > [13641.088224] ? io_submit_one+0xde/0x380 > > > [13641.088226] io_submit_one+0xde/0x380 > > > [13641.088228] __x64_sys_io_submit+0x80/0x160 > > > [13641.088229] do_syscall_64+0x79/0x150 > > > [13641.088233] ? syscall_exit_to_user_mode+0x6c/0x1f0 > > > [13641.088237] ? do_io_getevents+0x8b/0xe0 > > > [13641.088238] ? syscall_exit_work+0xf3/0x120 > > > [13641.088241] ? syscall_exit_to_user_mode+0x6c/0x1f0 > > > [13641.088243] ? do_syscall_64+0x85/0x150 > > > [13641.088245] ? do_syscall_64+0x85/0x150 > > > [13641.088247] ? blk_mq_flush_plug_list.part.0+0x108/0x160 > > > [13641.088249] ? rseq_get_rseq_cs+0x1d/0x220 > > > [13641.088252] ? rseq_ip_fixup+0x6d/0x1d0 > > > [13641.088254] ? blk_finish_plug+0x24/0x40 > > > [13641.088256] ? syscall_exit_to_user_mode+0x6c/0x1f0 > > > [13641.088258] ? do_syscall_64+0x85/0x150 > > > [13641.088260] ? syscall_exit_to_user_mode+0x6c/0x1f0 > > > [13641.088262] ? do_syscall_64+0x85/0x150 > > > [13641.088264] ? syscall_exit_to_user_mode+0x6c/0x1f0 > > > [13641.088266] ? do_syscall_64+0x85/0x150 > > > [13641.088268] ? do_syscall_64+0x85/0x150 > > > [13641.088270] ? do_syscall_64+0x85/0x150 > > > [13641.088272] ? clear_bhb_loop+0x45/0xa0 > > > [13641.088275] ? clear_bhb_loop+0x45/0xa0 > > > [13641.088277] ? clear_bhb_loop+0x45/0xa0 > > > [13641.088279] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > [13641.088281] RIP: 0033:0x7ff92150713d > > > [13641.088283] RSP: 002b:00007ffca1ef81f8 EFLAGS: 00000246 ORIG_RAX: > > > 00000000000000d1 > > > [13641.088285] RAX: ffffffffffffffda RBX: 00007ff9217e2f70 RCX: 00007ff92150713d > > > [13641.088286] RDX: 000055863b694fe0 RSI: 0000000000000010 RDI: 00007ff92164d000 > > > [13641.088287] RBP: 00007ff92164d000 R08: 00007ff91936d000 R09: 0000000000000180 > > > [13641.088288] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000010 > > > [13641.088289] R13: 0000000000000000 R14: 000055863b694fe0 R15: 000055863b6970c0 > > > [13641.088291] > > > > > > Thanks, > > > Changhui > > > > > > > After applying the previous patch, will the test environment continue to > > execute test cases after WARN? > > a few days ago,test with the previous patch, the test environment > continued to execute test cases after WARN, > and I terminated the test when I observed a WARN,so I did not observe > the subsequent situation. > > > I am not sure whether this issue has always > > existed but was not tested becasue of WARN, or whether the new patch > > introduced it. > > today, I re-test previous patch, and let it run for a long time,I > observed WARN and task hung, > looks this issue already existed and not introduced by new patch. Hi Changhui, The hang is actually expected because recovery fails. Please pull the latest ublksrv and check if the issue can still be reproduced: https://github.com/ublk-org/ublksrv BTW, one ublksrv segfault and two test cleanup issues are fixed. Thanks, Ming