Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp289773ybt; Wed, 17 Jun 2020 00:37:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJySQco5DLKmA5vPzB5+a3ciFTV1gSYJuepbGlgB2i0OlAeWvKrdVykBP8cOdvmyQybW7oRQ X-Received: by 2002:aa7:da4f:: with SMTP id w15mr5895795eds.384.1592379432861; Wed, 17 Jun 2020 00:37:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1592379432; cv=none; d=google.com; s=arc-20160816; b=FFCL0xlOJ/OJXlxjT+MJz3SyG3FeOy5XP2ugqe5sYrIVppg7g3gIXD+WTtN+qGOpEi lmY0h5ceNcDNkU7yWalSETU1vRFgSN99G8qt98PTOqjvB941JTuYxz2UdMrgIhhAgm5B kc7/15yNs2GtsLltNI4t8/kVIXMYLoWpuxnbWjFWnDL2FodgM8ggJqYa8Uwq66vb3yno By5B9iJW0qROnZWAhJY5w10E4ojFMpGsLiCJ6IQTN9DGK4oJEqyNZDzXROcnTab9ZA2F BzpDVC8aksbmm0hNCmkQAISNRvkaEyIumq6RzvTnxz9h2k6zgQ3Dw8tQ4KDoz+taI4dn ipqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=fwngVWkjY9fLUGwy3ivvNT4X5Xvq6DSDUdpByq/8Ym4=; b=RU+GtKnEF4R1iYQtBli2IH5z9S5T8lhlxnHSlcEWP9gQk3pSHzbzn3Guhq42rF8uc2 SLZxJGoPmYaClbr8aiTryLPW+mDqs7N9dkJOiTN6rK1pQ+Mi1Qa4+teEjusgXLB2lpEd qobiOZcAswuWzOeJePqhqZAzBlPF41OjmsVPfYz7VcR3/sf+E6SMgQDmNPwkbvDRdaxc b/9rAQ4s4t+4+Q5hLxFrmRn/+1rAh0jkadSmQadJL9vqdGjFFAeGlVysu+wY7jNcWnKd JSoYsTBAfSviITOIkHc+SFsHY+IaOtpIJWO2r9+uDnCMA61Fq4nMxNuqqWY7d4rL4Dzo lN8w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@atishpatra.org header.s=google header.b=dbNJ1iHZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a25si12259249ejb.697.2020.06.17.00.36.50; Wed, 17 Jun 2020 00:37:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@atishpatra.org header.s=google header.b=dbNJ1iHZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726270AbgFQHev (ORCPT + 99 others); Wed, 17 Jun 2020 03:34:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725894AbgFQHeu (ORCPT ); Wed, 17 Jun 2020 03:34:50 -0400 Received: from mail-wm1-x341.google.com (mail-wm1-x341.google.com [IPv6:2a00:1450:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF888C061573 for ; Wed, 17 Jun 2020 00:34:48 -0700 (PDT) Received: by mail-wm1-x341.google.com with SMTP id q25so877621wmj.0 for ; Wed, 17 Jun 2020 00:34:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=atishpatra.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=fwngVWkjY9fLUGwy3ivvNT4X5Xvq6DSDUdpByq/8Ym4=; b=dbNJ1iHZuhQ+KrxrQ57v93tTtorXC26arOGU1ZaPPFFFaaG0teuhLyDWIXJPGLAjh4 feWSRbSxsJ4CL+QUgOaEFCJTfL160TYhRjvwn5OZuvuBs824bOd2fF8tsgkFf5XUwWS9 sSepUeOHSQh/2elEzN8RkF4NAFTgvGfIau9To= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=fwngVWkjY9fLUGwy3ivvNT4X5Xvq6DSDUdpByq/8Ym4=; b=RV4+8XcbkyDVX/oThqm092vwdvM+X/zQdf+LECQa8/pXIvKJi1VIh20rsnW1+ebfyi /giTEWRiL901rYhF8218W1BVJ+5tHI3UW0CNYkEY0RNdhOMQc1/eE6I/FFE/jyqXQ2u7 L7t8gWjbsxOEeFO+3cVs4cpFD3bwFumyQdkHP5JGis9qf0I0p/mt15CnaXxir0R0lKaR ZLf5Qn9ptSWAyv1RlqQtGxB17hOvs3sIeQSeuRJR+XR53WuTir8XO4NNJQfng7YXzoNC +mpfk1NtMCfGnZ+gEKHs0ZR10tu8YskD7uUFwq0ZEjMnkg69mZP6vonvcv7xJdcJODXu /VbA== X-Gm-Message-State: AOAM533Jrb1WF3dG+9j7q1PmG7JrlwtmSvb4jMxUDpRNCtJuQM/Q8Dj6 r+IXrnipfcrHT25eaJRPqEuSM25RSESdCx/Ayb8a X-Received: by 2002:a1c:2e0e:: with SMTP id u14mr7119944wmu.55.1592379287504; Wed, 17 Jun 2020 00:34:47 -0700 (PDT) MIME-Version: 1.0 References: <20200616045108.GP75760@lianli.shorne-pla.net> <20200616191943.GA1401039@lianli.shorne-pla.net> <20200617053539.GB1401039@lianli.shorne-pla.net> <20200617060734.GC1401039@lianli.shorne-pla.net> In-Reply-To: From: Atish Patra Date: Wed, 17 Jun 2020 00:34:36 -0700 Message-ID: Subject: Re: mm lock issue while booting Linux on 5.8-rc1 for RISC-V To: Michel Lespinasse Cc: Stafford Horne , Palmer Dabbelt , linux-riscv , LKML , Bjorn Topel Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jun 16, 2020 at 11:30 PM Michel Lespinasse wrote: > > On Tue, Jun 16, 2020 at 11:07 PM Stafford Horne wrote: > > On Wed, Jun 17, 2020 at 02:35:39PM +0900, Stafford Horne wrote: > > > On Tue, Jun 16, 2020 at 01:47:24PM -0700, Michel Lespinasse wrote: > > > > This makes me wonder actually - maybe there is a latent bug that got > > > > exposed after my change added the rwsem_is_locked assertion to the > > > > lockdep_assert_held one. If that is the case, it may be helpful to > > > > bisect when that issue first appeared, by testing before my patchset > > > > with VM_BUG_ON(!rwsem_is_locked(&walk.mm->mmap_lock)) added to > > > > walk_page_range() / walk_page_range_novma() / walk_page_vma() ... > > > > > > Hello, > > > > > > I tried to bisect it, but I think this issue goes much further back. > > > > > > Just with the below patch booting fails all the way back to v5.7. > > > > > > What does this mean by they way, why would mmap_assert_locked() want to assert > > > that the rwsem_is_locked() is not true? > > It's the opposite - VM_BUG_ON(cond) triggers if cond is true, so in > other words it asserts that cond is false. Yeah, I agree it is kinda > confusing. But in our case, it asserts that the rwsem is locked, which > is what we want. > > > The openrisc code that was walking the page ranges was not locking mm. I have > > added the below patch to v5.8-rc1 and it seems to work fine. I will send a > > better patch in a bit. > > > > iff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c > > index c152a68811dd..bd5f05dd9174 100644 > > --- a/arch/openrisc/kernel/dma.c > > +++ b/arch/openrisc/kernel/dma.c > > @@ -74,8 +74,10 @@ void *arch_dma_set_uncached(void *cpu_addr, size_t size) > > * We need to iterate through the pages, clearing the dcache for > > * them and setting the cache-inhibit bit. > > */ > > + mmap_read_lock(&init_mm); > > error = walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops, > > NULL); > > + mmap_read_unlock(&init_mm); > > if (error) > > return ERR_PTR(error); > > return cpu_addr; > > @@ -85,9 +87,11 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size) > > { > > unsigned long va = (unsigned long)cpu_addr; > > > > + mmap_read_lock(&init_mm); > > /* walk_page_range shouldn't be able to fail here */ > > WARN_ON(walk_page_range(&init_mm, va, va + size, > > &clear_nocache_walk_ops, NULL)); > > + mmap_read_unlock(&init_mm); > > } > > Thanks a lot for getting to the bottom of this. I think this is the proper fix. A similar patch works for RISC-V as well. Thanks for debugging it. To sum it up, mm should be locked before walk_page_range and walk_page_range_novma. Here is a diff that works for RISC-V. I will send the patch soon. diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c index ec2c70f84994..289a9a5ea5b5 100644 --- a/arch/riscv/mm/pageattr.c +++ b/arch/riscv/mm/pageattr.c @@ -151,6 +151,7 @@ int set_memory_nx(unsigned long addr, int numpages) int set_direct_map_invalid_noflush(struct page *page) { + int ret; unsigned long start = (unsigned long)page_address(page); unsigned long end = start + PAGE_SIZE; struct pageattr_masks masks = { @@ -158,11 +159,16 @@ int set_direct_map_invalid_noflush(struct page *page) .clear_mask = __pgprot(_PAGE_PRESENT) }; - return walk_page_range(&init_mm, start, end, &pageattr_ops, &masks); + mmap_read_lock(&init_mm); + ret = walk_page_range(&init_mm, start, end, &pageattr_ops, &masks); + mmap_read_unlock(&init_mm); + + return ret; } int set_direct_map_default_noflush(struct page *page) { + int ret; unsigned long start = (unsigned long)page_address(page); unsigned long end = start + PAGE_SIZE; struct pageattr_masks masks = { @@ -170,7 +176,11 @@ int set_direct_map_default_noflush(struct page *page) .clear_mask = __pgprot(0) }; - return walk_page_range(&init_mm, start, end, &pageattr_ops, &masks); + mmap_read_lock(&init_mm); + ret = walk_page_range(&init_mm, start, end, &pageattr_ops, &masks); + mmap_read_unlock(&init_mm); + + return ret; } -- Regards, Atish