Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp1443733pxb; Mon, 22 Feb 2021 01:55:30 -0800 (PST) X-Google-Smtp-Source: ABdhPJxXPFG5VIuuO/3g5O6V1bP1Agzpplwel2eSmhIleAerWd8bBd8PiXYE6h6YcBQQuUxpH+co X-Received: by 2002:a17:906:b6c3:: with SMTP id ec3mr20298844ejb.200.1613987730219; Mon, 22 Feb 2021 01:55:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613987730; cv=none; d=google.com; s=arc-20160816; b=Hf/POy/CLh4cFNbLlTpkaQbhOeH0IN9XqRXWdX4p0yHt7HH3Ndq2yBwd4MqxuMzF8f 2Id5ykNnyLO2fezoYdEEQR1qbHKkA17fkOtdXL4haXHrnpDTaPDIEza+E72NI0DwseFG o/sZORvQ0GMwubVA+IkR12uM/B1VJKLRz+xAu9foP5G6dOcmJi7o8JXNCHAjfqaq+5FI qbD3tuc7xA0y0Az7EianAu5elJBQzDSJKYQGpbl3Btav6Uuc2p74xbIlK/4VXcHtkWo2 uEJFSfDGz6qD97Hc13V+IBS5dtZWmy4exZ1X1Q37GNm1TDqDjFuN5xIaw7wPX+uk2oXh JJ0A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:subject :organization:from:references:cc:to:dkim-signature; bh=d5eIr8u8FvWotSBBBi/ahMw0H/+9oyeliDYTeYUXMTY=; b=jcKdOHTwkDWFvrDMmM94G+muca5GuqsJdANmSDgQVzwCANOQxtWW5ZnOGbA7kZWvC9 GcOv6wSGUX2J90oRb3YW8Wts1Re34k2WFE+rr+Dwx7ReXSb6w8jsJ0Mq2IP3vZUTCHul 0fLeqKiXyVW3xBNqeg4ACMn3uoO5oJVkv9EO/oqg78hcOQxUam1rdief0ujhLrIPXgl+ Rp0lAEnzrdLDKZQKk8LQvuW3k7dYAOGL4GtaexdrkZGyK3G8Sas1+MKiBP/WeiaaOVY2 HJRyuHa9f3nz2MRFFtOry7r9R69J9AARUXI3iyGuBC+Sy55ps2DImKWsCUMZEBIC+nj/ BdYA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=GXqAOA+R; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h1si589858ejb.448.2021.02.22.01.55.07; Mon, 22 Feb 2021 01:55:30 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=GXqAOA+R; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230312AbhBVJyB (ORCPT + 99 others); Mon, 22 Feb 2021 04:54:01 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:56993 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230044AbhBVJyB (ORCPT ); Mon, 22 Feb 2021 04:54:01 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613987553; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=d5eIr8u8FvWotSBBBi/ahMw0H/+9oyeliDYTeYUXMTY=; b=GXqAOA+RYvIZJfEUn4nGiayIrj5840zqoWrctV44cKoS/RUfyIB51oWyvKgrqOOuFoj3SW B2t+2xYIsl8qGo/3X1cbvmWo6n02tKhW3lq9/EhGYTvYj+mwJvN8Esiq+xsQ3kjD2P0RnD MJAx13YimqMazBnXsP7uyJLADvg68fQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-30-3Uvdxx5pNT-S0cgB1-QOkw-1; Mon, 22 Feb 2021 04:52:31 -0500 X-MC-Unique: 3Uvdxx5pNT-S0cgB1-QOkw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7C76D801965; Mon, 22 Feb 2021 09:52:28 +0000 (UTC) Received: from [10.36.115.16] (ovpn-115-16.ams2.redhat.com [10.36.115.16]) by smtp.corp.redhat.com (Postfix) with ESMTP id 160D71001281; Mon, 22 Feb 2021 09:52:23 +0000 (UTC) To: George Kennedy , Andrey Konovalov Cc: Andrew Morton , Catalin Marinas , Vincenzo Frascino , Dmitry Vyukov , Konrad Rzeszutek Wilk , Will Deacon , Andrey Ryabinin , Alexander Potapenko , Marco Elver , Peter Collingbourne , Evgenii Stepanov , Branislav Rankov , Kevin Brodsky , Christoph Hellwig , kasan-dev , Linux ARM , Linux Memory Management List , LKML , Dhaval Giani References: <487751e1ccec8fcd32e25a06ce000617e96d7ae1.1613595269.git.andreyknvl@google.com> <797fae72-e3ea-c0b0-036a-9283fa7f2317@oracle.com> From: David Hildenbrand Organization: Red Hat GmbH Subject: Re: [PATCH] mm, kasan: don't poison boot memory Message-ID: <1ac78f02-d0af-c3ff-cc5e-72d6b074fc43@redhat.com> Date: Mon, 22 Feb 2021 10:52:23 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0 MIME-Version: 1.0 In-Reply-To: <797fae72-e3ea-c0b0-036a-9283fa7f2317@oracle.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 20.02.21 00:04, George Kennedy wrote: > > > On 2/19/2021 11:45 AM, George Kennedy wrote: >> >> >> On 2/18/2021 7:09 PM, Andrey Konovalov wrote: >>> On Fri, Feb 19, 2021 at 1:06 AM George Kennedy >>> wrote: >>>> >>>> >>>> On 2/18/2021 3:55 AM, David Hildenbrand wrote: >>>>> On 17.02.21 21:56, Andrey Konovalov wrote: >>>>>> During boot, all non-reserved memblock memory is exposed to the buddy >>>>>> allocator. Poisoning all that memory with KASAN lengthens boot time, >>>>>> especially on systems with large amount of RAM. This patch makes >>>>>> page_alloc to not call kasan_free_pages() on all new memory. >>>>>> >>>>>> __free_pages_core() is used when exposing fresh memory during system >>>>>> boot and when onlining memory during hotplug. This patch adds a new >>>>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through >>>>>> free_pages_prepare() from __free_pages_core(). >>>>>> >>>>>> This has little impact on KASAN memory tracking. >>>>>> >>>>>> Assuming that there are no references to newly exposed pages >>>>>> before they >>>>>> are ever allocated, there won't be any intended (but buggy) >>>>>> accesses to >>>>>> that memory that KASAN would normally detect. >>>>>> >>>>>> However, with this patch, KASAN stops detecting wild and large >>>>>> out-of-bounds accesses that happen to land on a fresh memory page >>>>>> that >>>>>> was never allocated. This is taken as an acceptable trade-off. >>>>>> >>>>>> All memory allocated normally when the boot is over keeps getting >>>>>> poisoned as usual. >>>>>> >>>>>> Signed-off-by: Andrey Konovalov >>>>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d >>>>> Not sure this is the right thing to do, see >>>>> >>>>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com >>>>> >>>>> >>>>> Reversing the order in which memory gets allocated + used during boot >>>>> (in a patch by me) might have revealed an invalid memory access during >>>>> boot. >>>>> >>>>> I suspect that that issue would no longer get detected with your >>>>> patch, as the invalid memory access would simply not get detected. >>>>> Now, I cannot prove that :) >>>> Since David's patch we're having trouble with the iBFT ACPI table, >>>> which >>>> is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c". KASAN >>>> detects that it is being used after free when ibft_init() accesses the >>>> iBFT table, but as of yet we can't find where it get's freed (we've >>>> instrumented calls to kunmap()). >>> Maybe it doesn't get freed, but what you see is a wild or a large >>> out-of-bounds access. Since KASAN marks all memory as freed during the >>> memblock->page_alloc transition, such bugs can manifest as >>> use-after-frees. >> >> It gets freed and re-used. By the time the iBFT table is accessed by >> ibft_init() the page has been over-written. >> >> Setting page flags like the following before the call to kmap() >> prevents the iBFT table page from being freed: > > Cleaned up version: > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c > index 0418feb..8f0a8e7 100644 > --- a/drivers/acpi/osl.c > +++ b/drivers/acpi/osl.c > @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address > pg_off, unsigned long pg_sz) > >      pfn = pg_off >> PAGE_SHIFT; >      if (should_use_kmap(pfn)) { > +        struct page *page = pfn_to_page(pfn); > + >          if (pg_sz > PAGE_SIZE) >              return NULL; > -        return (void __iomem __force *)kmap(pfn_to_page(pfn)); > +        SetPageReserved(page); > +        return (void __iomem __force *)kmap(page); >      } else >          return acpi_os_ioremap(pg_off, pg_sz); >  } > @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address > pg_off, void __iomem *vaddr) >      unsigned long pfn; > >      pfn = pg_off >> PAGE_SHIFT; > -    if (should_use_kmap(pfn)) > -        kunmap(pfn_to_page(pfn)); > -    else > +    if (should_use_kmap(pfn)) { > +        struct page *page = pfn_to_page(pfn); > + > +        ClearPageReserved(page); > +        kunmap(page); > +    } else >          iounmap(vaddr); >  } > > David, the above works, but wondering why it is now necessary. kunmap() > is not hit. What other ways could a page mapped via kmap() be unmapped? > Let me look into the code ... I have little experience with ACPI details, so bear with me. I assume that acpi_map()/acpi_unmap() map some firmware blob that is provided via firmware/bios/... to us. should_use_kmap() tells us whether a) we have a "struct page" and should kmap() that one b) we don't have a "struct page" and should ioremap. As it is a blob, the firmware should always reserve that memory region via memblock (e.g., memblock_reserve()), such that we either 1) don't create a memmap ("struct page") at all (-> case b) ) 2) if we have to create e memmap, we mark the page PG_reserved and *never* expose it to the buddy (-> case a) ) Are you telling me that in this case we might have a memmap for the HW blob that is *not* PG_reserved? In that case it most probably got exposed to the buddy where it can happily get allocated/freed. The latent BUG would be that that blob gets exposed to the system like ordinary RAM, and not reserved via memblock early during boot. Assuming that blob has a low physical address, with my patch it will get allocated/used a lot earlier - which would mean we trigger this latent BUG now more easily. There have been similar latent BUGs on ARM boards that my patch discovered where special RAM regions did not get marked as reserved via the device tree properly. Now, this is just a wild guess :) Can you dump the page when mapping (before PageReserved()) and when unmapping, to see what the state of that memmap is? -- Thanks, David / dhildenb