From: Maximilian Heyne
CC: Julien Grall, Maximilian Heyne, Andrew Panyakin, Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Rahul Singh, David Woodhouse, Viresh Kumar, Konrad Rzeszutek Wilk, Jeremy Fitzhardinge
Subject: [PATCH] xen/events: close evtchn after mapping cleanup
Date: Wed, 24 Jan 2024 16:31:28 +0000
Message-ID: <20240124163130.31324-1-mheyne@amazon.de>
X-Mailer: git-send-email 2.40.1
X-Mailing-List: linux-kernel@vger.kernel.org

shutdown_pirq and startup_pirq do not take the irq_mapping_update_lock
because they cannot, due to lock inversion. Both are called with
irq_desc->lock held. The lock order, however, is first
irq_mapping_update_lock and then irq_desc->lock.

This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
  channel:

  CPU0                        CPU1
  shutdown_pirq {
    xen_evtchn_close(e)
                              __startup_pirq {
                                EVTCHNOP_bind_pirq
                                  -> returns just freed evtchn e
                                set_evtchn_to_irq(e, irq)
                              }
    xen_irq_info_cleanup() {
      set_evtchn_to_irq(e, -1)
    }
  }

  Here e refers to the same event channel number in both contexts.
  After this race the evtchn_to_irq mapping for e is invalid (-1).

- __startup_pirq races with __unbind_from_irq in a similar way. Because
  __startup_pirq doesn't take irq_mapping_update_lock, it can grab the
  evtchn that __unbind_from_irq is currently freeing and cleaning up. In
  this case, even though the event channel is allocated, its mapping can
  be unset in evtchn_to_irq.

The fix is to first clean up the mappings and then close the event
channel. This way, when an event channel gets allocated, its potential
previous evtchn_to_irq mappings are guaranteed to be unset already. This
is also the reverse order of the allocation, where first the event
channel is allocated and then the mappings are set up.

On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of
NVMe devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device, which results in a call to shutdown_pirq.
With many NVMe devices it is therefore likely to hit this race during
boot, because there will be multiple calls to shutdown_pirq and
startup_pirq potentially running in parallel.

  ------------[ cut here ]------------
  blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
  kernel BUG at drivers/xen/events/events_base.c:499!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2.x86_64 #1
  Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
  Workqueue: nvme-reset-wq nvme_reset_work
  RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
  Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
  RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
  RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff
  RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00
  R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed
  R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002
  FS:  0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? set_affinity_irq+0xdc/0x1c0
   ? __die_body.cold+0x8/0xd
   ? die+0x2b/0x50
   ? do_trap+0x90/0x110
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? do_error_trap+0x65/0x80
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? exc_invalid_op+0x4e/0x70
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? asm_exc_invalid_op+0x12/0x20
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? bind_evtchn_to_cpu+0xc5/0xf0
   set_affinity_irq+0xdc/0x1c0
   irq_do_set_affinity+0x1d7/0x1f0
   irq_setup_affinity+0xd6/0x1a0
   irq_startup+0x8a/0xf0
   __setup_irq+0x639/0x6d0
   ? nvme_suspend+0x150/0x150
   request_threaded_irq+0x10c/0x180
   ? nvme_suspend+0x150/0x150
   pci_request_irq+0xa8/0xf0
   ? __blk_mq_free_request+0x74/0xa0
   queue_request_irq+0x6f/0x80
   nvme_create_queue+0x1af/0x200
   nvme_create_io_queues+0xbd/0xf0
   nvme_setup_io_queues+0x246/0x320
   ? nvme_irq_check+0x30/0x30
   nvme_reset_work+0x1c8/0x400
   process_one_work+0x1b0/0x350
   worker_thread+0x49/0x310
   ? process_one_work+0x350/0x350
   kthread+0x11b/0x140
   ? __kthread_bind_mask+0x60/0x60
   ret_from_fork+0x22/0x30
  Modules linked in:
  ---[ end trace a11715de1eee1873 ]---

Fixes: d46a78b05c0e ("xen: implement pirq type event channels")
Cc: stable@vger.kernel.org
Co-debugged-by: Andrew Panyakin
Signed-off-by: Maximilian Heyne
---
 drivers/xen/events/events_base.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index b8cfea7812d6..3b9f080109d7 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -923,8 +923,8 @@ static void shutdown_pirq(struct irq_data *data)
 		return;
 
 	do_mask(info, EVT_MASK_REASON_EXPLICIT);
-	xen_evtchn_close(evtchn);
 	xen_irq_info_cleanup(info);
+	xen_evtchn_close(evtchn);
 }
 
 static void enable_pirq(struct irq_data *data)
@@ -956,6 +956,7 @@ EXPORT_SYMBOL_GPL(xen_irq_from_gsi);
 static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
 {
 	evtchn_port_t evtchn;
+	bool close_evtchn = false;
 
 	if (!info) {
 		xen_irq_free_desc(irq);
@@ -975,7 +976,7 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
 		struct xenbus_device *dev;
 
 		if (!info->is_static)
-			xen_evtchn_close(evtchn);
+			close_evtchn = true;
 
 		switch (info->type) {
 		case IRQT_VIRQ:
@@ -995,6 +996,9 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
 		}
 
 		xen_irq_info_cleanup(info);
+
+		if (close_evtchn)
+			xen_evtchn_close(evtchn);
 	}
 
 	xen_free_irq(info);
-- 
2.40.1
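
The ordering argument in the commit message can be replayed in a small userspace model. The sketch below is not kernel code: struct model and all model_* helpers are hypothetical stand-ins for the hypervisor's port allocator, xen_evtchn_close(), set_evtchn_to_irq() and __startup_pirq, and the assumption that the allocator hands out the lowest free port is only there to make the reuse of a just-freed port deterministic. It runs the single interleaving described above (another CPU starting a pirq between the two teardown steps) for both orderings.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_EVTCHNS 4
#define NO_IRQ (-1)

/* Toy state: ports the "hypervisor" has allocated, plus the
 * guest-side evtchn -> irq table. All names are hypothetical. */
struct model {
	bool allocated[NR_EVTCHNS];
	int evtchn_to_irq[NR_EVTCHNS];
};

static void model_init(struct model *m)
{
	for (int i = 0; i < NR_EVTCHNS; i++) {
		m->allocated[i] = false;
		m->evtchn_to_irq[i] = NO_IRQ;
	}
}

/* Stand-in for EVTCHNOP_bind_pirq: returns the lowest free port
 * (an assumption made so that a freed port is reused at once). */
static int model_alloc(struct model *m)
{
	for (int i = 0; i < NR_EVTCHNS; i++) {
		if (!m->allocated[i]) {
			m->allocated[i] = true;
			return i;
		}
	}
	return -1;
}

/* Stand-in for xen_evtchn_close(): the port becomes reusable. */
static void model_close(struct model *m, int e)
{
	assert(e >= 0 && e < NR_EVTCHNS);
	m->allocated[e] = false;
}

/* Stand-in for set_evtchn_to_irq(). */
static void model_set_mapping(struct model *m, int e, int irq)
{
	assert(e >= 0 && e < NR_EVTCHNS);
	m->evtchn_to_irq[e] = irq;
}

/* Stand-in for __startup_pirq on the other CPU: allocate, then map. */
static int model_startup(struct model *m, int irq)
{
	int e = model_alloc(m);

	if (e >= 0)
		model_set_mapping(m, e, irq);
	return e;
}

/*
 * Tear down the port of irq 10 while another CPU starts irq 11
 * between the two teardown steps. Returns true if every allocated
 * port still has a valid mapping afterwards.
 */
static bool model_teardown_with_concurrent_startup(bool close_first)
{
	struct model m;

	model_init(&m);
	int e = model_startup(&m, 10);

	if (close_first) {
		/* Old order: close, (other CPU runs), clean up mapping. */
		model_close(&m, e);
		model_startup(&m, 11);            /* reuses port e */
		model_set_mapping(&m, e, NO_IRQ); /* wipes the new mapping */
	} else {
		/* New order: clean up mapping, (other CPU runs), close. */
		model_set_mapping(&m, e, NO_IRQ);
		model_startup(&m, 11);            /* e is busy: other port */
		model_close(&m, e);
	}

	for (int i = 0; i < NR_EVTCHNS; i++) {
		if (m.allocated[i] && m.evtchn_to_irq[i] == NO_IRQ)
			return false;
	}
	return true;
}
```

In this model, model_teardown_with_concurrent_startup(true) returns false because the reused port ends up allocated but unmapped, which corresponds to the invalid (-1) mapping described in the commit message. With the argument false the function returns true, since the port cannot be handed out again until its mapping has already been cleared.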