From: Eli Cohen
To: Honggang Li, "netdev@vger.kernel.org", "linux-rdma@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: RE: [linux-next PATCH] mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures
Date: Sun, 12 Apr 2015 15:04:16 +0000
References: <1428836751-23037-1-git-send-email-honli@redhat.com>
In-Reply-To: <1428836751-23037-1-git-send-email-honli@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Good catch, thanks!

There are more places in this file where PAGE_MASK is wrongly used. They need to be fixed as well.

Also, see below [Eli]

-----Original Message-----
From: Honggang Li [mailto:honli@redhat.com]
Sent: Sunday, April 12, 2015 2:06 PM
To: Eli Cohen; netdev@vger.kernel.org; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org
Cc: Honggang Li
Subject: [linux-next PATCH] mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures

If CONFIG_ARCH_DMA_ADDR_T_64BIT is enabled on 32-bit x86 systems and
physical memory is larger than 4GB, dma_map_page may return a valid
address that is greater than 0xffffffff. As a result, the mlx5 device
page allocator RB tree will be initialized with valid addresses greater
than 0xffffffff. However, (addr & PAGE_MASK) sets the high four bytes to
zero, so free_4k cannot find, and therefore cannot release, pages whose
addresses are above 4GB. This leaks memory. In addition, the mlx5_ib
module cannot release those pages when the user tries to remove the
module, and as a result the system hangs.

[root@rdma05 root]# dmesg | grep addr | head
addr = 3fe384000 addr & PAGE_MASK = fe384000
[root@rdma05 root]# rmmod mlx5_ib <---- hang on

---------------------- console log -----------------
mlx5_ib 0000:04:00.0: irq 138 for MSI/MSI-X
alloc irq_desc for 139 on node -1
alloc kstat_irqs on node -1
mlx5_ib 0000:04:00.0: irq 139 for MSI/MSI-X
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
---------------------- console log -----------------

Signed-off-by: Honggang Li
---
 drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index df22383..27c72da 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -211,12 +211,14 @@ static int alloc_4k(struct mlx5_core_dev *dev, u64 *addr)
 	return 0;
 }
 
+#define MLX5_U64_4K_PAGE_MASK ((~(u64)0U) << PAGE_SHIFT)
+
 static void free_4k(struct mlx5_core_dev *dev, u64 addr)
 {
 	struct fw_page *fwp;
 	int n;
 
-	fwp = find_fw_page(dev, addr & PAGE_MASK);
+	fwp = find_fw_page(dev, addr & MLX5_U64_4K_PAGE_MASK);
 	if (!fwp) {
 		mlx5_core_warn(dev, "page not found\n");
 		return;
@@ -241,7 +243,7 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr)
 static int alloc_system_page(struct mlx5_core_dev *dev, u16 func_id)
 {
 	struct page *page;
-	u64 addr;
+	u64 addr = 0;

[Eli] Why is this required?

 	int err;
 	int nid = dev_to_node(&dev->pdev->dev);
-- 
1.8.3.1