From: Daniel Vetter
Date: Sat, 31 Oct 2020 15:40:35 +0100
Subject: Re: [PATCH v5 13/15] resource: Move devmem revoke code to resource framework
To: John Hubbard
Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM, linux-samsung-soc, "open list:DMA BUFFER SHARING FRAMEWORK", Greg Kroah-Hartman, Daniel Vetter, Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton, Jérôme Glisse, Jan Kara, Arnd Bergmann, David Hildenbrand, "Rafael J. Wysocki"
References: <20201030100815.2269-1-daniel.vetter@ffwll.ch> <20201030100815.2269-14-daniel.vetter@ffwll.ch> <787f2914-5777-4703-4bee-68c4c3742817@nvidia.com>
In-Reply-To: <787f2914-5777-4703-4bee-68c4c3742817@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 31, 2020 at 7:36 AM John Hubbard wrote:
>
> On 10/30/20 3:08 AM, Daniel Vetter wrote:
> > We want all iomem mmaps to consistently revoke ptes when the kernel
> > takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
> > pci bar mmaps available through procfs and sysfs, which currently do
> > not revoke mappings.
> >
> > To prepare for this, move the code from the /dev/kmem driver to
> > kernel/resource.c.
>
> This seems like it's doing a lot more than just code movement, right?
> Should we list some of that here?

It was meant to be just moving code, but then the inevitable bikeshed
showed up and I forgot to update the commit message properly. Will fix
that.

> Also, I'm seeing a crash due to this commit. More below:

Uh that's not good.

>
> >
> > Reviewed-by: Greg Kroah-Hartman
> > Signed-off-by: Daniel Vetter
> > Cc: Jason Gunthorpe
> > Cc: Kees Cook
> > Cc: Dan Williams
> > Cc: Andrew Morton
> > Cc: John Hubbard
> > Cc: Jérôme Glisse
> > Cc: Jan Kara
> > Cc: Dan Williams
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Arnd Bergmann
> > Cc: Greg Kroah-Hartman
> > Cc: Daniel Vetter
> > Cc: David Hildenbrand
> > Cc: "Rafael J. Wysocki"
> > Signed-off-by: Daniel Vetter
> > --
> > v3:
> > - add barrier for consistency and document why we don't have to check
> >   for NULL (Jason)
> > v4
> > - Adjust comments to reflect the general nature of this iomem revoke
> >   code now (Dan)
> > ---
> >  drivers/char/mem.c     |  85 +---------------------------------
> >  include/linux/ioport.h |   6 +--
> >  kernel/resource.c      | 101 +++++++++++++++++++++++++++++++++++++++++-
> >  3 files changed, 102 insertions(+), 90 deletions(-)
> >
> > diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> > index 7dcf9e4ea79d..43c871dc7477 100644
> > --- a/drivers/char/mem.c
> > +++ b/drivers/char/mem.c
> > @@ -31,9 +31,6 @@
> >  #include
> >  #include
> >  #include
> > -#include
> > -#include
> > -#include
> >
> >  #ifdef CONFIG_IA64
> >  # include
> > @@ -836,42 +833,6 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
> >         return ret;
> >  }
> >
> > -static struct inode *devmem_inode;
> > -
> > -#ifdef CONFIG_IO_STRICT_DEVMEM
> > -void revoke_devmem(struct resource *res)
> > -{
> > -       /* pairs with smp_store_release() in devmem_init_inode() */
> > -       struct inode *inode = smp_load_acquire(&devmem_inode);
> > -
> > -       /*
> > -        * Check that the initialization has completed. Losing the race
> > -        * is ok because it means drivers are claiming resources before
> > -        * the fs_initcall level of init and prevent /dev/mem from
> > -        * establishing mappings.
> > -        */
> > -       if (!inode)
> > -               return;
> > -
> > -       /*
> > -        * The expectation is that the driver has successfully marked
> > -        * the resource busy by this point, so devmem_is_allowed()
> > -        * should start returning false, however for performance this
> > -        * does not iterate the entire resource range.
> > -        */
> > -       if (devmem_is_allowed(PHYS_PFN(res->start)) &&
> > -           devmem_is_allowed(PHYS_PFN(res->end))) {
> > -               /*
> > -                * *cringe* iomem=relaxed says "go ahead, what's the
> > -                * worst that can happen?"
> > -                */
> > -               return;
> > -       }
> > -
> > -       unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
> > -}
> > -#endif
> > -
> >  static int open_port(struct inode *inode, struct file *filp)
> >  {
> >         int rc;
> > @@ -891,7 +852,7 @@ static int open_port(struct inode *inode, struct file *filp)
> >          * revocations when drivers want to take over a /dev/mem mapped
> >          * range.
> >          */
> > -       filp->f_mapping = inode->i_mapping;
> > +       filp->f_mapping = iomem_get_mapping();
>
>
> The problem is that iomem_get_mapping() returns NULL for the !CONFIG_IO_STRICT_DEVMEM
> case. And then we have pre-existing fs code that expects to go "up and over", like this:
>
>
> static int do_dentry_open(struct file *f,
>                           struct inode *inode,
>                           int (*open)(struct inode *, struct file *))
> {
>         ...
>
>         file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
>
> ...and it crashes on that line fairly early in bootup.
>
> Not sure what to suggest for this patch, but wanted to get this report out at least.

Old code seems to have worked by always setting up the inode (we still
do that) and always setting it (we don't do that anymore), just not
revoking the ptes when the Kconfig is not set. I'll fix that up and
remove the behaviour change here.
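
For illustration only (this is not the follow-up patch): a minimal userspace sketch in C of the behaviour described in the paragraph above, where the inode and its mapping are always available and only the revoke step depends on the config option. Every name prefixed with sketch_ and the SKETCH_STRICT_DEVMEM macro are hypothetical stand-ins, and the structs model only the two fields that the f_mapping->host->i_mapping walk touches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structs; only the
 * fields needed to follow f_mapping->host->i_mapping are modelled. */
struct inode;
struct address_space {
	struct inode *host;
};
struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
};

/* Assumption: 0 models a build without CONFIG_IO_STRICT_DEVMEM. */
#ifndef SKETCH_STRICT_DEVMEM
#define SKETCH_STRICT_DEVMEM 0
#endif

static struct inode *sketch_iomem_inode;
static int sketch_revoke_count;

/* The inode is set up unconditionally, as in the old /dev/mem code. */
static void sketch_init_inode(struct inode *inode)
{
	inode->i_data.host = inode;
	inode->i_mapping = &inode->i_data;
	sketch_iomem_inode = inode;
}

/* Always hands back a valid mapping once the inode exists. */
static struct address_space *sketch_get_mapping(void)
{
	return sketch_iomem_inode->i_mapping;
}

/* Only the revoke step depends on the config option. */
static void sketch_revoke(void)
{
#if SKETCH_STRICT_DEVMEM
	sketch_revoke_count++;
#endif
}

/* Mirrors the f->f_mapping->host->i_mapping walk in do_dentry_open(). */
static struct address_space *sketch_open_walk(struct address_space *f_mapping)
{
	return f_mapping->host->i_mapping;
}
```

With this split, the open path never sees a NULL f_mapping regardless of the config setting; a variant of sketch_get_mapping() that returned NULL in the non-strict case would fault in sketch_open_walk(), which matches the crash reported above.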
-Daniel

> thanks,
> --
> John Hubbard
> NVIDIA
>
> >
> >         return 0;
> >  }
> > @@ -1023,48 +984,6 @@ static char *mem_devnode(struct device *dev, umode_t *mode)
> >
> >  static struct class *mem_class;
> >
> > -static int devmem_fs_init_fs_context(struct fs_context *fc)
> > -{
> > -       return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
> > -}
> > -
> > -static struct file_system_type devmem_fs_type = {
> > -       .name = "devmem",
> > -       .owner = THIS_MODULE,
> > -       .init_fs_context = devmem_fs_init_fs_context,
> > -       .kill_sb = kill_anon_super,
> > -};
> > -
> > -static int devmem_init_inode(void)
> > -{
> > -       static struct vfsmount *devmem_vfs_mount;
> > -       static int devmem_fs_cnt;
> > -       struct inode *inode;
> > -       int rc;
> > -
> > -       rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt);
> > -       if (rc < 0) {
> > -               pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc);
> > -               return rc;
> > -       }
> > -
> > -       inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb);
> > -       if (IS_ERR(inode)) {
> > -               rc = PTR_ERR(inode);
> > -               pr_err("Cannot allocate inode for /dev/mem: %d\n", rc);
> > -               simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt);
> > -               return rc;
> > -       }
> > -
> > -       /*
> > -        * Publish /dev/mem initialized.
> > -        * Pairs with smp_load_acquire() in revoke_devmem().
> > -        */
> > -       smp_store_release(&devmem_inode, inode);
> > -
> > -       return 0;
> > -}
> > -
> >  static int __init chr_dev_init(void)
> >  {
> >         int minor;
> > @@ -1086,8 +1005,6 @@ static int __init chr_dev_init(void)
> >                  */
> >                 if ((minor == DEVPORT_MINOR) && !arch_has_dev_port())
> >                         continue;
> > -               if ((minor == DEVMEM_MINOR) && devmem_init_inode() != 0)
> > -                       continue;
> >
> >                 device_create(mem_class, NULL, MKDEV(MEM_MAJOR, minor),
> >                               NULL, devlist[minor].name);
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index 5135d4b86cd6..02a5466245c0 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -307,11 +307,7 @@ struct resource *devm_request_free_mem_region(struct device *dev,
> >  struct resource *request_free_mem_region(struct resource *base,
> >                 unsigned long size, const char *name);
> >
> > -#ifdef CONFIG_IO_STRICT_DEVMEM
> > -void revoke_devmem(struct resource *res);
> > -#else
> > -static inline void revoke_devmem(struct resource *res) { };
> > -#endif
> > +extern struct address_space *iomem_get_mapping(void);
> >
> >  #endif /* __ASSEMBLY__ */
> >  #endif /* _LINUX_IOPORT_H */
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 3ae2f56cc79d..5ecc3187fe2d 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -18,12 +18,15 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> > +#include
> >  #include
> >
> >
> > @@ -1115,6 +1118,58 @@ resource_size_t resource_alignment(struct resource *res)
> >
> >  static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
> >
> > +static struct inode *iomem_inode;
> > +
> > +#ifdef CONFIG_IO_STRICT_DEVMEM
> > +static void revoke_iomem(struct resource *res)
> > +{
> > +       /* pairs with smp_store_release() in iomem_init_inode() */
> > +       struct inode *inode = smp_load_acquire(&iomem_inode);
> > +
> > +       /*
> > +        * Check that the initialization has completed. Losing the race
> > +        * is ok because it means drivers are claiming resources before
> > +        * the fs_initcall level of init and prevent iomem_get_mapping users
> > +        * from establishing mappings.
> > +        */
> > +       if (!inode)
> > +               return;
> > +
> > +       /*
> > +        * The expectation is that the driver has successfully marked
> > +        * the resource busy by this point, so devmem_is_allowed()
> > +        * should start returning false, however for performance this
> > +        * does not iterate the entire resource range.
> > +        */
> > +       if (devmem_is_allowed(PHYS_PFN(res->start)) &&
> > +           devmem_is_allowed(PHYS_PFN(res->end))) {
> > +               /*
> > +                * *cringe* iomem=relaxed says "go ahead, what's the
> > +                * worst that can happen?"
> > +                */
> > +               return;
> > +       }
> > +
> > +       unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
> > +}
> > +struct address_space *iomem_get_mapping(void)
> > +{
> > +       /*
> > +        * This function is only called from file open paths, hence guaranteed
> > +        * that fs_initcalls have completed and no need to check for NULL. But
> > +        * since revoke_iomem can be called before the initcall we still need
> > +        * the barrier to appease checkers.
> > +        */
> > +       return smp_load_acquire(&iomem_inode)->i_mapping;
> > +}
> > +#else
> > +static void revoke_iomem(struct resource *res) {}
> > +struct address_space *iomem_get_mapping(void)
> > +{
> > +       return NULL;
> > +}
> > +#endif
> > +
> >  /**
> >   * __request_region - create a new busy resource region
> >   * @parent: parent resource descriptor
> > @@ -1182,7 +1237,7 @@ struct resource * __request_region(struct resource *parent,
> >         write_unlock(&resource_lock);
> >
> >         if (res && orig_parent == &iomem_resource)
> > -               revoke_devmem(res);
> > +               revoke_iomem(res);
> >
> >         return res;
> >  }
> > @@ -1782,4 +1837,48 @@ static int __init strict_iomem(char *str)
> >         return 1;
> >  }
> >
> > +static int iomem_fs_init_fs_context(struct fs_context *fc)
> > +{
> > +       return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
> > +}
> > +
> > +static struct file_system_type iomem_fs_type = {
> > +       .name = "iomem",
> > +       .owner = THIS_MODULE,
> > +       .init_fs_context = iomem_fs_init_fs_context,
> > +       .kill_sb = kill_anon_super,
> > +};
> > +
> > +static int __init iomem_init_inode(void)
> > +{
> > +       static struct vfsmount *iomem_vfs_mount;
> > +       static int iomem_fs_cnt;
> > +       struct inode *inode;
> > +       int rc;
> > +
> > +       rc = simple_pin_fs(&iomem_fs_type, &iomem_vfs_mount, &iomem_fs_cnt);
> > +       if (rc < 0) {
> > +               pr_err("Cannot mount iomem pseudo filesystem: %d\n", rc);
> > +               return rc;
> > +       }
> > +
> > +       inode = alloc_anon_inode(iomem_vfs_mount->mnt_sb);
> > +       if (IS_ERR(inode)) {
> > +               rc = PTR_ERR(inode);
> > +               pr_err("Cannot allocate inode for iomem: %d\n", rc);
> > +               simple_release_fs(&iomem_vfs_mount, &iomem_fs_cnt);
> > +               return rc;
> > +       }
> > +
> > +       /*
> > +        * Publish iomem revocation inode initialized.
> > +        * Pairs with smp_load_acquire() in revoke_iomem().
> > +        */
> > +       smp_store_release(&iomem_inode, inode);
> > +
> > +       return 0;
> > +}
> > +
> > +fs_initcall(iomem_init_inode);
> > +
> >  __setup("iomem=", strict_iomem);
> >
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch