From: Dan Williams
Date: Tue, 5 Jun 2018 12:10:16 -0700
Subject: Re: KASAN vs ZONE_DEVICE (was: Re: [PATCH v2 2/7] dax: change bdev_dax_supported()...)
To: Andrey Ryabinin
Cc: Dave Chinner, "Darrick J. Wong", Mike Snitzer, linux-nvdimm, Linux Kernel Mailing List, linux-xfs, device-mapper development, linux-fsdevel, Dmitry Vyukov, Alexander Potapenko
In-Reply-To: <71c55f5c-deed-b307-9022-8a41dd898822@virtuozzo.com>
References: <71c55f5c-deed-b307-9022-8a41dd898822@virtuozzo.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 5, 2018 at 7:01 AM, Andrey Ryabinin wrote:
>
>
> On 06/05/2018 07:22 AM, Dan Williams wrote:
>> On Mon, Jun 4, 2018 at 8:32 PM, Dan Williams wrote:
>>> [ adding KASAN devs...]
>>>
>>> On Mon, Jun 4, 2018 at 4:40 PM, Dan Williams wrote:
>>>> On Sun, Jun 3, 2018 at 6:48 PM, Dan Williams wrote:
>>>>> On Sun, Jun 3, 2018 at 5:25 PM, Dave Chinner wrote:
>>>>>> On Mon, Jun 04, 2018 at 08:20:38AM +1000, Dave Chinner wrote:
>>>>>>> On Thu, May 31, 2018 at 09:02:52PM -0700, Dan Williams wrote:
>>>>>>>> On Thu, May 31, 2018 at 7:24 PM, Dave Chinner wrote:
>>>>>>>>> On Thu, May 31, 2018 at 06:57:33PM -0700, Dan Williams wrote:
>>>>>>>>>>> FWIW, XFS+DAX used to just work on this setup (I hadn't even
>>>>>>>>>>> installed ndctl until this morning!) but after changing the kernel
>>>>>>>>>>> it no longer works. That would make it a regression, yes?
>>>>>>>
>>>>>>> [....]
>>>>>>>
>>>>>>>>>> I suspect your kernel does not have CONFIG_ZONE_DEVICE enabled which
>>>>>>>>>> has the following dependencies:
>>>>>>>>>>
>>>>>>>>>> depends on MEMORY_HOTPLUG
>>>>>>>>>> depends on MEMORY_HOTREMOVE
>>>>>>>>>> depends on SPARSEMEM_VMEMMAP
>>>>>>>>>
>>>>>>>>> Filesystem DAX now has a dependency on memory hotplug?
>>>>>>>
>>>>>>> [....]
>>>>>>>
>>>>>>>>> OK, works now I've found the magic config incantations to turn
>>>>>>>>> everything I now need on.
>>>>>>>
>>>>>>> By enabling these options, my test VM now has a ~30s pause in the
>>>>>>> boot very soon after the nvdimm subsystem is initialised.
>>>>>>>
>>>>>>> [ 1.523718] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>>>>>>> [ 1.550353] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
>>>>>>> [ 1.552175] Non-volatile memory driver v1.3
>>>>>>> [ 2.332045] tsc: Refined TSC clocksource calibration: 2199.909 MHz
>>>>>>> [ 2.333280] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fb5dcd4620, max_idle_ns: 440795264143 ns
>>>>>>> [ 37.217453] brd: module loaded
>>>>>>> [ 37.225423] loop: module loaded
>>>>>>> [ 37.228441] virtio_blk virtio2: [vda] 10485760 512-byte logical blocks (5.37 GB/5.00 GiB)
>>>>>>> [ 37.245418] virtio_blk virtio3: [vdb] 146800640 512-byte logical blocks (75.2 GB/70.0 GiB)
>>>>>>> [ 37.255794] virtio_blk virtio4: [vdc] 1073741824000 512-byte logical blocks (550 TB/500 TiB)
>>>>>>> [ 37.265403] nd_pmem namespace1.0: unable to guarantee persistence of writes
>>>>>>> [ 37.265618] nd_pmem namespace0.0: unable to guarantee persistence of writes
>>>>>>>
>>>>>>> The system does not appear to be consuming CPU, but it is blocking
>>>>>>> NMIs so I can't get a CPU trace. For a VM that I rely on booting in
>>>>>>> a few seconds because I reboot it tens of times a day, this is a
>>>>>>> problem....
>>>>>>
>>>>>> And when I turn on KASAN, the kernel fails to boot to a login prompt
>>>>>> because:
>>>>>
>>>>> What's your qemu and kernel command line? I'll take a look at this first
>>>>> thing tomorrow.
>>>>
>>>> I was able to reproduce this crash by just turning on KASAN...
>>>> investigating. It would still help to have your config for our own
>>>> regression testing purposes; it makes sense for us to prioritize
>>>> "Dave's test config", similar to the priority of not breaking Linus'
>>>> laptop.
>>>
>>> I believe this is a bug in KASAN, or a bug in devm_memremap_pages(),
>>> depending on your point of view. At the very least it is a mismatch of
>>> assumptions. KASAN learns of hot added memory via the memory hotplug
>>> notifier.
>>> However, the devm_memremap_pages() implementation is
>>> intentionally limited to the "first half" of the memory hotplug
>>> procedure. I.e. it does just enough to set up the linear map for
>>> pfn_to_page() and initialize the "struct page" memmap, but then stops
>>> short of onlining the pages. This is why we are getting a NULL ptr
>>> deref and not a KASAN report, because KASAN has no shadow area set up
>>> for the linearly mapped pmem range.
>>>
>>> In terms of solving it we could refactor kasan_mem_notifier() so that
>>> devm_memremap_pages() can call it outside of the notifier... I'll give
>>> this a shot.
>>
>> Well, the attached patch got me slightly further, but only slightly...
>>
>> [ 14.998394] BUG: KASAN: unknown-crash in pmem_do_bvec+0x19e/0x790 [nd_pmem]
>> [ 15.000006] Read of size 4096 at addr ffff880200000000 by task systemd-udevd/915
>> [ 15.001991]
>> [ 15.002590] CPU: 15 PID: 915 Comm: systemd-udevd Tainted: G OE 4.17.0-rc5+ #1982
>> [ 15.004783] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
>> [ 15.007652] Call Trace:
>> [ 15.008339] dump_stack+0x9a/0xeb
>> [ 15.009344] print_address_description+0x73/0x280
>> [ 15.010524] kasan_report+0x258/0x380
>> [ 15.011528] ? pmem_do_bvec+0x19e/0x790 [nd_pmem]
>> [ 15.012747] memcpy+0x1f/0x50
>> [ 15.013659] pmem_do_bvec+0x19e/0x790 [nd_pmem]
>>
>> ...I've exhausted my limited kasan internals knowledge, any ideas what
>> it's missing?
>>
>
> Initialization is missing. kasan_mem_notifier() doesn't initialize shadow because
> it expects kasan_free_pages()/kasan_alloc_pages() will do that when a page is allocated/freed.
>
> So adding memset(shadow_start, 0, shadow_size); will make this work.
> But we shouldn't use kasan_mem_notifier() here, as that would mean wasting a lot of memory only
> to store zeroes.
>
> A better solution would be mapping kasan_zero_page in shadow.
> The draft patch below demonstrates the idea (build tested only).
>
>
> ---
> include/linux/kasan.h | 14 ++++++++++++++
> kernel/memremap.c | 10 ++++++++++
> mm/kasan/kasan_init.c | 46 ++++++++++++++++++++++++++++++++++++----------
> 3 files changed, 60 insertions(+), 10 deletions(-)

Thank you! This RFC patch works for me. For now we don't necessarily
need kasan_remove_zero_shadow(), but in the future we might
dynamically switch the same physical address between being mapped by
devm_memremap_pages() and by traditional memory hotplug.