Date: Fri, 12 Oct 2018 17:08:33 +0200
From: Martin Schwidefsky
To: Li Wang
Cc: Guenter Roeck, Janosch Frank, "Kirill A. Shutemov", Heiko Carstens, linux-kernel, Linux-MM
Subject: Re: s390: runtime warning about pgtables_bytes
In-Reply-To: <20181011150211.7d8c07ac@mschwideX1>
References: <20181011150211.7d8c07ac@mschwideX1>
Message-Id: <20181012170833.2a05f308@mschwideX1>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List:
 linux-kernel@vger.kernel.org

On Thu, 11 Oct 2018 15:02:11 +0200
Martin Schwidefsky wrote:

> On Thu, 11 Oct 2018 18:04:12 +0800
> Li Wang wrote:
>
> > When running an s390 system with LTP/cve-2017-17052.c[1], the following BUG
> > comes up repeatedly.
> > I remember this warning starting with kernel-4.16.0 and it still exists in
> > kernel-4.19-rc7.
> > Can anyone take a look?
> >
> > [ 2678.991496] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > [ 2679.001543] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > [ 2679.002453] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > [ 2679.003256] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > [ 2679.013689] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > [ 2679.024647] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > [ 2679.064408] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > [ 2679.133963] BUG: non-zero pgtables_bytes on freeing mm: 16384
> >
> > [1]:
> > https://github.com/linux-test-project/ltp/blob/master/testcases/cve/cve-2017-17052.c
>
> Confirmed, I see this bug with cve-2017-17052 on my LPAR as well.
> I'll look into it.

Ok, I think I understand the problem now. This is the patch I am testing
right now. It seems to fix the issue, but I had to change common mm code
for it.
--
From 9e3bc2e96930206ef1ece377e45224c51aca1799 Mon Sep 17 00:00:00 2001
From: Martin Schwidefsky
Date: Fri, 12 Oct 2018 16:32:29 +0200
Subject: [RFC][PATCH] s390/mm: fix mis-accounting of pgtable_bytes

In case a fork or a clone system call fails in copy_process and the error
handling does the mmput() at the bad_fork_cleanup_mm label, the
following warning messages will appear on the console:

  BUG: non-zero pgtables_bytes on freeing mm: 16384

The reason for that is the tricks we play with mm_inc_nr_puds() and
mm_inc_nr_pmds() in init_new_context().

A normal 64-bit process has 3 levels of page table; the p4d level and
the pud level are folded.
On process termination the free_pud_range() function in
mm/memory.c will subtract 16KB from pgtables_bytes with a mm_dec_nr_puds()
call, but there is not actually a pud table. The s390 version of
pud_free_tlb() recognizes this and does nothing; the region-3 table will be
freed with the pgd_free() call later on. But the mm_dec_nr_puds() is done
unconditionally; to counteract this the init_new_context() function has
an extra mm_inc_nr_puds() call.

Now with a failed fork or clone the free_pgtables() function is not
called, so there is no mm_dec_nr_puds(), but the mm_inc_nr_puds() has
already been done, which leads to the incorrect pgtables_bytes of 16384.
Nothing is broken by this, but the warning is annoying.

To get rid of the warning drop the mm_inc_nr_pmds() & mm_inc_nr_puds()
calls from init_new_context(), introduce the mm_pmd_folded(),
mm_pud_folded() and mm_p4d_folded() helpers, and add if-statements
to the functions mm_[inc|dec]_nr_[pmds|puds].

Signed-off-by: Martin Schwidefsky
---
 arch/s390/include/asm/mmu_context.h |  5 -----
 arch/s390/include/asm/pgalloc.h     |  6 ++---
 arch/s390/include/asm/pgtable.h     | 18 +++++++++++++++
 arch/s390/include/asm/tlb.h         |  6 ++---
 include/linux/mm.h                  | 44 ++++++++++++++++++++++++++++++++-----
 5 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index dbd689d556ce..ccbb53e22024 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -46,8 +46,6 @@ static inline int init_new_context(struct task_struct *tsk,
 		mm->context.asce_limit = STACK_TOP_MAX;
 		mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
 				   _ASCE_USER_BITS | _ASCE_TYPE_REGION3;
-		/* pgd_alloc() did not account this pud */
-		mm_inc_nr_puds(mm);
 		break;
 	case -PAGE_SIZE:
 		/* forked 5-level task, set new asce with new_mm->pgd */
@@ -63,9 +61,6 @@ static inline int init_new_context(struct task_struct *tsk,
 		/* forked 2-level compat task, set new asce with new mm->pgd */
 		mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
 				   _ASCE_USER_BITS | _ASCE_TYPE_SEGMENT;
-		/* pgd_alloc() did not account this pmd */
-		mm_inc_nr_pmds(mm);
-		mm_inc_nr_puds(mm);
 	}
 	crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm));
 	return 0;
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index f0f9bcf94c03..5ee733720a57 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -36,11 +36,11 @@ static inline void crst_table_init(unsigned long *crst, unsigned long entry)
 
 static inline unsigned long pgd_entry_type(struct mm_struct *mm)
 {
-	if (mm->context.asce_limit <= _REGION3_SIZE)
+	if (mm_pmd_folded(mm))
 		return _SEGMENT_ENTRY_EMPTY;
-	if (mm->context.asce_limit <= _REGION2_SIZE)
+	if (mm_pud_folded(mm))
 		return _REGION3_ENTRY_EMPTY;
-	if (mm->context.asce_limit <= _REGION1_SIZE)
+	if (mm_p4d_folded(mm))
 		return _REGION2_ENTRY_EMPTY;
 	return _REGION1_ENTRY_EMPTY;
 }
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 411d435e7a7d..063732414dfb 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -493,6 +493,24 @@ static inline int is_module_addr(void *addr)
 				 _REGION_ENTRY_PROTECT | \
 				 _REGION_ENTRY_NOEXEC)
 
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+	return mm->context.asce_limit <= _REGION1_SIZE;
+}
+#define mm_p4d_folded(mm) mm_p4d_folded(mm)
+
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+	return mm->context.asce_limit <= _REGION2_SIZE;
+}
+#define mm_pud_folded(mm) mm_pud_folded(mm)
+
+static inline bool mm_pmd_folded(struct mm_struct *mm)
+{
+	return mm->context.asce_limit <= _REGION3_SIZE;
+}
+#define mm_pmd_folded(mm) mm_pmd_folded(mm)
+
 static inline int mm_has_pgste(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index 457b7ba0fbb6..b31c779cf581 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -136,7 +136,7 @@ static inline void pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
 				unsigned long address)
 {
-	if (tlb->mm->context.asce_limit <= _REGION3_SIZE)
+	if (mm_pmd_folded(tlb->mm))
 		return;
 	pgtable_pmd_page_dtor(virt_to_page(pmd));
 	tlb_remove_table(tlb, pmd);
@@ -152,7 +152,7 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
 static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
 				unsigned long address)
 {
-	if (tlb->mm->context.asce_limit <= _REGION1_SIZE)
+	if (mm_p4d_folded(tlb->mm))
 		return;
 	tlb_remove_table(tlb, p4d);
 }
@@ -167,7 +167,7 @@ static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
 static inline void pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
 				unsigned long address)
 {
-	if (tlb->mm->context.asce_limit <= _REGION2_SIZE)
+	if (mm_pud_folded(tlb->mm))
 		return;
 	tlb_remove_table(tlb, pud);
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0416a7204be3..1e4a045f19ec 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -105,6 +105,34 @@ extern int mmap_rnd_compat_bits __read_mostly;
 #define mm_zero_struct_page(pp)  ((void)memset((pp), 0, sizeof(struct page)))
 #endif
 
+/*
+ * On some architectures it depends on the mm if the p4d/pud or pmd
+ * layer of the page table hierarchy is folded or not.
+ */
+#ifndef mm_p4d_folded
+#define mm_p4d_folded(mm) mm_p4d_folded(mm)
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+	return __is_defined(__PAGETABLE_P4D_FOLDED);
+}
+#endif
+
+#ifndef mm_pud_folded
+#define mm_pud_folded(mm) mm_pud_folded(mm)
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+	return __is_defined(__PAGETABLE_PUD_FOLDED);
+}
+#endif
+
+#ifndef mm_pmd_folded
+#define mm_pmd_folded(mm) mm_pmd_folded(mm)
+static inline bool mm_pmd_folded(struct mm_struct *mm)
+{
+	return __is_defined(__PAGETABLE_PMD_FOLDED);
+}
+#endif
+
 /*
  * Default maximum number of active map areas, this limits the number of vmas
  * per mm struct. Users can overwrite this number by sysctl but there is a
@@ -1710,7 +1738,7 @@ static inline int __p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
 int __p4d_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address);
 #endif
 
-#if defined(__PAGETABLE_PUD_FOLDED) || !defined(CONFIG_MMU)
+#if !defined(CONFIG_MMU)
 static inline int __pud_alloc(struct mm_struct *mm, p4d_t *p4d,
 						unsigned long address)
 {
@@ -1724,16 +1752,18 @@ int __pud_alloc(struct mm_struct *mm, p4d_t *p4d, unsigned long address);
 
 static inline void mm_inc_nr_puds(struct mm_struct *mm)
 {
-	atomic_long_add(PTRS_PER_PUD * sizeof(pud_t), &mm->pgtables_bytes);
+	if (!mm_pud_folded(mm))
+		atomic_long_add(PTRS_PER_PUD * sizeof(pud_t), &mm->pgtables_bytes);
 }
 
 static inline void mm_dec_nr_puds(struct mm_struct *mm)
 {
-	atomic_long_sub(PTRS_PER_PUD * sizeof(pud_t), &mm->pgtables_bytes);
+	if (!mm_pud_folded(mm))
+		atomic_long_sub(PTRS_PER_PUD * sizeof(pud_t), &mm->pgtables_bytes);
 }
 #endif
 
-#if defined(__PAGETABLE_PMD_FOLDED) || !defined(CONFIG_MMU)
+#if !defined(CONFIG_MMU)
 static inline int __pmd_alloc(struct mm_struct *mm, pud_t *pud,
 						unsigned long address)
 {
@@ -1748,12 +1778,14 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address);
 
 static inline void mm_inc_nr_pmds(struct mm_struct *mm)
 {
-	atomic_long_add(PTRS_PER_PMD * sizeof(pmd_t), &mm->pgtables_bytes);
+	if (!mm_pmd_folded(mm))
+		atomic_long_add(PTRS_PER_PMD * sizeof(pmd_t), &mm->pgtables_bytes);
 }
 
 static inline void mm_dec_nr_pmds(struct mm_struct *mm)
 {
-	atomic_long_sub(PTRS_PER_PMD * sizeof(pmd_t), &mm->pgtables_bytes);
+	if (!mm_pmd_folded(mm))
+		atomic_long_sub(PTRS_PER_PMD * sizeof(pmd_t), &mm->pgtables_bytes);
 }
 #endif
 
-- 
2.16.4

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
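
Illustration only, not part of the patch: the accounting imbalance described
in the commit message can be modeled in a few lines of user-space C. This is
a rough sketch under stated assumptions. struct toy_mm, TABLE_BYTES and the
old_*/new_* helpers are made-up names; the real kernel keeps the counter in
an atomic_long_t and decides folding per architecture.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed size of one region/segment table, matching the 16384 in the warning. */
#define TABLE_BYTES 16384L

/* Hypothetical, stripped-down stand-in for struct mm_struct. */
struct toy_mm {
	long pgtables_bytes;	/* models mm->pgtables_bytes */
	bool pud_folded;	/* models the result of mm_pud_folded(mm) */
};

/* Old scheme: the counter is adjusted unconditionally. */
static inline void old_inc_nr_puds(struct toy_mm *mm)
{
	mm->pgtables_bytes += TABLE_BYTES;
}

static inline void old_dec_nr_puds(struct toy_mm *mm)
{
	mm->pgtables_bytes -= TABLE_BYTES;
}

/* New scheme: a folded pud level is never accounted. */
static inline void new_inc_nr_puds(struct toy_mm *mm)
{
	if (!mm->pud_folded)
		mm->pgtables_bytes += TABLE_BYTES;
}

static inline void new_dec_nr_puds(struct toy_mm *mm)
{
	if (!mm->pud_folded)
		mm->pgtables_bytes -= TABLE_BYTES;
}

/*
 * Old behaviour for a 3-level process (pud folded). Returns the bytes still
 * accounted when the mm is freed; non-zero corresponds to the warning.
 */
static inline long old_lifecycle(bool fork_fails)
{
	struct toy_mm mm = { .pgtables_bytes = 0, .pud_folded = true };

	old_inc_nr_puds(&mm);		/* extra call in init_new_context() */
	if (!fork_fails)
		old_dec_nr_puds(&mm);	/* free_pud_range() via free_pgtables() */
	return mm.pgtables_bytes;
}

/* Patched behaviour: no compensation in init_new_context(), guarded helpers. */
static inline long new_lifecycle(bool fork_fails)
{
	struct toy_mm mm = { .pgtables_bytes = 0, .pud_folded = true };

	if (!fork_fails)
		new_dec_nr_puds(&mm);	/* no-op because the pud level is folded */
	return mm.pgtables_bytes;
}
```

In this model old_lifecycle(true) leaves 16384 bytes accounted, the value
printed in the warning, while new_lifecycle() ends at zero whether or not the
fork fails.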