From: Laurent Dufour
To: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrea Arcangeli, mhocko@kernel.org
Subject: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned
Date: Tue, 20 Mar 2018 18:25:54 +0100
X-Mailer: git-send-email 2.7.4
Message-Id: <1521566754-30390-1-git-send-email-ldufour@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

When running the sampler detailed below, the kernel, if built with the VM
debug option turned on (as many distros do), panics with the following
message:

kernel BUG at
/build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
	8<--8<--8<--8< snip 8<--8<--8<--8<
CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G C E 4.15.0-10-generic #11-Ubuntu
NIP: c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
REGS: c000003fbcdcf810 TRAP: 0700 Tainted: G C E (4.15.0-10-generic)
MSR: 9000000000029033 CR: 24002222 XER: 20040000
CFAR: c00000000036ee44 SOFTE: 1
GPR00: c00000000036ee48 c000003fbcdcfa90 c0000000016ea600 c000003fbcdcfc40
GPR04: c000003fd9858950 00007115e4e00000 00007115e4e10000 0000000000000000
GPR08: 0000000000000010 0000000000010000 0000000000000000 0000000000000000
GPR12: 0000000000002000 c000000007a2c600 00000fe3985954d0 00007115e4e00000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 00000fe398595a94 000000000000a6fc c000003fd9858950 0000000000018554
GPR24: c000003fdcd84500 c0000000019acd00 00007115e4e10000 c000003fbcdcfc40
GPR28: 0000000000200000 00007115e4e00000 c000003fbc9ac600 c000003fd9858950
NIP [c00000000036e764] __unmap_hugepage_range+0xa4/0x760
LR [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50
Call Trace:
[c000003fbcdcfa90] [00007115e4e00000] 0x7115e4e00000 (unreliable)
[c000003fbcdcfb50] [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50
[c000003fbcdcfb80] [c00000000033497c] unmap_single_vma+0x11c/0x190
[c000003fbcdcfbd0] [c000000000334e14] unmap_vmas+0x94/0x140
[c000003fbcdcfc20] [c00000000034265c] exit_mmap+0x9c/0x1d0
[c000003fbcdcfce0] [c000000000105448] mmput+0xa8/0x1d0
[c000003fbcdcfd10] [c00000000010fad0] do_exit+0x360/0xc80
[c000003fbcdcfdd0] [c0000000001104c0] do_group_exit+0x60/0x100
[c000003fbcdcfe10] [c000000000110584] SyS_exit_group+0x24/0x30
[c000003fbcdcfe30] [c00000000000b184] system_call+0x58/0x6c
Instruction dump:
552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b090000 e9390010
7d2948f8 7d2a2838
0b0a0000 7d293038 <0b090000> e9230086 2fa90000 419e0468
---[ end trace ee88f958a1c62605 ]---

The panic is due to a VMA pointing to a hugetlb area while its
vma->vm_start or vma->vm_end field is not aligned to the huge page
boundaries.

The sampler just unmaps a part of the hugetlb area, leaving 2 VMAs which
are not well aligned. The same situation could be reached by calling
madvise(), as happens when running:
  stress-ng --shm-sysv 1

The hugetlb code assumes that the VMA is well aligned when it is
unmapped, so we must prevent such a VMA from being split or shrunk to a
misaligned address.

This patch prevents this by checking the new VMA's boundaries when a VMA
is modified through vma_adjust().

If this patch is applied, stable should be Cced.

--- Sampler used to hit the panic
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

/*
 * HPSIZE (the huge page size) and LENGTH (the size of the hugetlb
 * segment, in bytes) must be defined here; their original values are not
 * shown.
 */

unsigned long page_size;

int main(void)
{
	int shmid, ret = 1;
	void *addr;

	setbuf(stdout, NULL);
	page_size = getpagesize();

	shmid = shmget(0x1410, LENGTH, IPC_CREAT | SHM_HUGETLB | SHM_R | SHM_W);
	if (shmid < 0) {
		perror("shmget");
		exit(1);
	}
	printf("shmid: %d\n", shmid);

	addr = shmat(shmid, NULL, 0);
	if (addr == (void*)-1) {
		perror("shmat");
		goto out;
	}

	/*
	 * The following munmap() call will split the VMA in 2, leaving
	 * VMAs not aligned to the huge page size, which will trigger a
	 * check when shmdt() is called.
	 */
	if (munmap(addr + HPSIZE + page_size, page_size)) {
		perror("munmap");
		goto out;
	}

	if (shmdt(addr)) {
		perror("shmdt");
		goto out;
	}

	printf("test done.\n");
	ret = 0;

out:
	shmctl(shmid, IPC_RMID, NULL);
	return ret;
}
--- End of code

Signed-off-by: Laurent Dufour
---
 mm/mmap.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 188f195883b9..5dbf4b69a798 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -692,6 +692,17 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	long adjust_next = 0;
 	int remove_next = 0;
 
+	if (is_vm_hugetlb_page(vma)) {
+		/*
+		 * We must check against the huge page boundaries to not
+		 * create a misaligned VMA.
+		 */
+		struct hstate *h = hstate_vma(vma);
+
+		if (start & ~huge_page_mask(h) || end & ~huge_page_mask(h))
+			return -EINVAL;
+	}
+
 	if (next && !insert) {
 		struct vm_area_struct *exporter = NULL, *importer = NULL;
 
-- 
2.7.4
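
For readers who want to see the arithmetic behind the check, here is a
minimal userspace sketch of the same mask test, applied to the split the
sampler provokes. It is not kernel code: hpage_mask(),
range_is_hpage_aligned() and split_keeps_alignment() are hypothetical
helpers written for this illustration, and hpage_mask() only approximates
what huge_page_mask() returns for a power-of-two huge page size.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the kernel's huge_page_mask():
 * returns a mask that clears the offset inside a huge page.
 * hpage_size must be a non-zero power of two.
 */
static unsigned long hpage_mask(unsigned long hpage_size)
{
	assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);
	return ~(hpage_size - 1);
}

/*
 * Same test as the one added to __vma_adjust(): a boundary is misaligned
 * when any bit below the huge page size is set.
 */
static bool range_is_hpage_aligned(unsigned long start, unsigned long end,
				   unsigned long hpage_size)
{
	unsigned long mask = hpage_mask(hpage_size);

	return !((start & ~mask) || (end & ~mask));
}

/*
 * Models unmapping [hole_start, hole_start + hole_len) out of a mapping of
 * `length` bytes at `addr`. The hole leaves [addr, hole_start) and
 * [hole_end, addr + length). Returns true only if both remaining ranges
 * are huge page aligned.
 */
static bool split_keeps_alignment(unsigned long addr, unsigned long length,
				  unsigned long hole_start,
				  unsigned long hole_len,
				  unsigned long hpage_size)
{
	unsigned long hole_end = hole_start + hole_len;

	return range_is_hpage_aligned(addr, hole_start, hpage_size) &&
	       range_is_hpage_aligned(hole_end, addr + length, hpage_size);
}
```

As an illustration only, take an assumed 16 MB huge page (hp), an assumed
64 KB base page (pg) and a two-huge-page segment at a huge page aligned
addr; none of these sizes come from the report. Then
split_keeps_alignment(addr, 2 * hp, addr + hp + pg, pg, hp) returns false,
which is the case the sampler's munmap() creates, while punching out a
whole huge page with split_keeps_alignment(addr, 2 * hp, addr + hp, hp, hp)
returns true.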