Obviously, the subject is meant to be [PATCH v8 x/2 RESEND].
Best,
Valentin
On Fri, Sep 22, 2023 at 05:12:19PM +0500, Valentin Sinitsyn wrote:
> Obviously, the subject is meant to be [PATCH v8 x/2 RESEND].
Can you do a v9 then please?
I am behind in catching up on patches due to the merge window and then
travel, so please be patient. If you want to help out, please review
other pending patches to help yours move to the front.
thanks,
greg k-h
Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable
sysfs entries have started to receive their f_mapping from the iomem
pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
(and procfs) as well as in /dev/[k]mem.
This resulted in a userspace-visible regression:
1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
2. Use lseek(fd, 0, SEEK_END) to determine its size
Expected result: a PCI region size is returned.
Actual result: 0 is returned.
The reason is that PCI resource files residing in sysfs use
generic_file_llseek(), which relies on f_mapping->host inode to get the
file size. As f_mapping is now redefined, f_mapping->host points to an
anonymous zero-sized iomem_inode which has nothing to do with sysfs file
in question.
Implement a custom llseek method for sysfs PCI resources, which is
almost the same as proc_bus_pci_lseek() used for procfs entries.
This makes sysfs and procfs entries consistent with regards to seeking,
but also introduces userspace-visible changes to seeking PCI resources
in sysfs:
- SEEK_DATA and SEEK_HOLE are no longer supported;
- Seeking past the end of the file is prohibited while previously
offsets up to MAX_NON_LFS were accepted (reading from these offsets
was always invalid).
Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem")
Cc: [email protected]
Signed-off-by: Valentine Sinitsyn <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index d9eede2dbc0e..97c9c62d5e3e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -835,6 +835,19 @@ static const struct attribute_group pci_dev_config_attr_group = {
.is_bin_visible = pci_dev_config_attr_is_visible,
};
+/*
+ * llseek operation for mmappable PCI resources.
+ * May be left unused if the arch doesn't provide them.
+ */
+static __maybe_unused loff_t
+pci_llseek_resource(struct file *filep,
+ struct kobject *kobj __always_unused,
+ struct bin_attribute *attr,
+ loff_t offset, int whence)
+{
+ return fixed_size_llseek(filep, offset, whence, attr->size);
+}
+
#ifdef HAVE_PCI_LEGACY
/**
* pci_read_legacy_io - read byte(s) from legacy I/O port space
@@ -967,6 +980,8 @@ void pci_create_legacy_files(struct pci_bus *b)
b->legacy_io->attr.mode = 0600;
b->legacy_io->read = pci_read_legacy_io;
b->legacy_io->write = pci_write_legacy_io;
+ /* See pci_create_attr() for motivation */
+ b->legacy_io->llseek = pci_llseek_resource;
b->legacy_io->mmap = pci_mmap_legacy_io;
b->legacy_io->f_mapping = iomem_get_mapping;
pci_adjust_legacy_attr(b, pci_mmap_io);
@@ -981,6 +996,8 @@ void pci_create_legacy_files(struct pci_bus *b)
b->legacy_mem->size = 1024*1024;
b->legacy_mem->attr.mode = 0600;
b->legacy_mem->mmap = pci_mmap_legacy_mem;
+ /* See pci_create_attr() for motivation */
+ b->legacy_mem->llseek = pci_llseek_resource;
b->legacy_mem->f_mapping = iomem_get_mapping;
pci_adjust_legacy_attr(b, pci_mmap_mem);
error = device_create_bin_file(&b->dev, b->legacy_mem);
@@ -1199,8 +1216,15 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
res_attr->mmap = pci_mmap_resource_uc;
}
}
- if (res_attr->mmap)
+ if (res_attr->mmap) {
res_attr->f_mapping = iomem_get_mapping;
+ /*
+ * generic_file_llseek() consults f_mapping->host to determine
+ * the file size. As iomem_inode knows nothing about the
+ * attribute, it's not going to work, so override it as well.
+ */
+ res_attr->llseek = pci_llseek_resource;
+ }
res_attr->attr.name = res_attr_name;
res_attr->attr.mode = 0600;
res_attr->size = pci_resource_len(pdev, num);
--
2.34.1
As of now, seeking in sysfs files is handled by generic_file_llseek().
There are situations where one may want to customize seeking logic:
- Many sysfs entries are fixed files while generic_file_llseek() accepts
past-the-end positions. Not only being useless by itself, this
also means a bug in userspace code will trigger not at lseek(), but at
some later point making debugging harder.
- generic_file_llseek() relies on f_mapping->host to get the file size
which might not be correct for all sysfs entries.
See commit 636b21b50152 ("PCI: Revoke mappings like devmem") as an example.
Implement llseek method to override this behavior at sysfs attribute
level. The method is optional, and if it is absent,
generic_file_llseek() is called to preserve backwards compatibility.
Cc: [email protected]
Signed-off-by: Valentine Sinitsyn <[email protected]>
---
Changelog:
v9:
- Rebased on the latest Linus tree per GregKH's request
v8:
- This is real v7; previous v7 is a buggy earlier patch sent by
mistake
v7:
- Use proper locking in kernfs_fop_llseek()
v6:
- Mark pci_llseek_resource() as __maybe_unused
- Fix a typo in pci_create_legacy_files()
v5:
- Fix builds without PCI mmap support (e.g. Alpha)
v4:
- Fix builds which #define HAVE_PCI_LEGACY (e.g. PowerPC)
v3:
- Grammar fixes
- Add base-patch: and prerequisite-patch-id: to make kernel test
robot happy
v2:
- Add infrastructure to customize llseek per sysfs entry type
- Override llseek for PCI sysfs entries with fixed_file_llseek()--
fs/kernfs/file.c | 29 ++++++++++++++++++++++++++++-
fs/sysfs/file.c | 13 +++++++++++++
include/linux/kernfs.h | 1 +
include/linux/sysfs.h | 2 ++
4 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 180906c36f51..855e3f9d8dcc 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -903,6 +903,33 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
return ret;
}
+static loff_t kernfs_fop_llseek(struct file *file, loff_t offset, int whence)
+{
+ struct kernfs_open_file *of = kernfs_of(file);
+ const struct kernfs_ops *ops;
+ loff_t ret;
+
+ /*
+ * @of->mutex nests outside active ref and is primarily to ensure that
+ * the ops aren't called concurrently for the same open file.
+ */
+ mutex_lock(&of->mutex);
+ if (!kernfs_get_active(of->kn)) {
+ mutex_unlock(&of->mutex);
+ return -ENODEV;
+ }
+
+ ops = kernfs_ops(of->kn);
+ if (ops->llseek)
+ ret = ops->llseek(of, offset, whence);
+ else
+ ret = generic_file_llseek(file, offset, whence);
+
+ kernfs_put_active(of->kn);
+ mutex_unlock(&of->mutex);
+ return ret;
+}
+
static void kernfs_notify_workfn(struct work_struct *work)
{
struct kernfs_node *kn;
@@ -1005,7 +1032,7 @@ EXPORT_SYMBOL_GPL(kernfs_notify);
const struct file_operations kernfs_file_fops = {
.read_iter = kernfs_fop_read_iter,
.write_iter = kernfs_fop_write_iter,
- .llseek = generic_file_llseek,
+ .llseek = kernfs_fop_llseek,
.mmap = kernfs_fop_mmap,
.open = kernfs_fop_open,
.release = kernfs_fop_release,
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index a12ac0356c69..6b7652fb8050 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -167,6 +167,18 @@ static int sysfs_kf_bin_mmap(struct kernfs_open_file *of,
return battr->mmap(of->file, kobj, battr, vma);
}
+static loff_t sysfs_kf_bin_llseek(struct kernfs_open_file *of, loff_t offset,
+ int whence)
+{
+ struct bin_attribute *battr = of->kn->priv;
+ struct kobject *kobj = of->kn->parent->priv;
+
+ if (battr->llseek)
+ return battr->llseek(of->file, kobj, battr, offset, whence);
+ else
+ return generic_file_llseek(of->file, offset, whence);
+}
+
static int sysfs_kf_bin_open(struct kernfs_open_file *of)
{
struct bin_attribute *battr = of->kn->priv;
@@ -249,6 +261,7 @@ static const struct kernfs_ops sysfs_bin_kfops_mmap = {
.write = sysfs_kf_bin_write,
.mmap = sysfs_kf_bin_mmap,
.open = sysfs_kf_bin_open,
+ .llseek = sysfs_kf_bin_llseek,
};
int sysfs_add_file_mode_ns(struct kernfs_node *parent,
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 2a36f3218b51..99aaa050ccb7 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -316,6 +316,7 @@ struct kernfs_ops {
struct poll_table_struct *pt);
int (*mmap)(struct kernfs_open_file *of, struct vm_area_struct *vma);
+ loff_t (*llseek)(struct kernfs_open_file *of, loff_t offset, int whence);
};
/*
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index fd3fe5c8c17f..b717a70219f6 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -181,6 +181,8 @@ struct bin_attribute {
char *, loff_t, size_t);
ssize_t (*write)(struct file *, struct kobject *, struct bin_attribute *,
char *, loff_t, size_t);
+ loff_t (*llseek)(struct file *, struct kobject *, struct bin_attribute *,
+ loff_t, int);
int (*mmap)(struct file *, struct kobject *, struct bin_attribute *attr,
struct vm_area_struct *vma);
};
base-commit: 27bbf45eae9ca98877a2d52a92a188147cd61b07
--
2.34.1
On Mon, Sep 25, 2023 at 11:40:13AM +0300, Valentine Sinitsyn wrote:
> Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable
> sysfs entries have started to receive their f_mapping from the iomem
> pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
> (and procfs) as well as in /dev/[k]mem.
>
> This resulted in a userspace-visible regression:
>
> 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
> 2. Use lseek(fd, 0, SEEK_END) to determine its size
>
> Expected result: a PCI region size is returned.
> Actual result: 0 is returned.
>
> The reason is that PCI resource files residing in sysfs use
> generic_file_llseek(), which relies on f_mapping->host inode to get the
> file size. As f_mapping is now redefined, f_mapping->host points to an
> anonymous zero-sized iomem_inode which has nothing to do with sysfs file
> in question.
>
> Implement a custom llseek method for sysfs PCI resources, which is
> almost the same as proc_bus_pci_lseek() used for procfs entries.
>
> This makes sysfs and procfs entries consistent with regards to seeking,
> but also introduces userspace-visible changes to seeking PCI resources
> in sysfs:
>
> - SEEK_DATA and SEEK_HOLE are no longer supported;
> - Seeking past the end of the file is prohibited while previously
> offsets up to MAX_NON_LFS were accepted (reading from these offsets
> was always invalid).
>
> Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem")
> Cc: [email protected]
> Signed-off-by: Valentine Sinitsyn <[email protected]>
> Acked-by: Bjorn Helgaas <[email protected]>
> ---
> drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++-
> 1 file changed, 25 insertions(+), 1 deletion(-)
I'll take these now, for 6.7-rc1, but not mark them as fixes or cc:
stable as this is a new functionality, the code has never worked for
lseek on these files so it's not like anything was broken :)
thanks,
greg k-h
Valentine Sinitsyn wrote:
> Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable
> sysfs entries have started to receive their f_mapping from the iomem
> pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
> (and procfs) as well as in /dev/[k]mem.
>
> This resulted in a userspace-visible regression:
>
> 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
> 2. Use lseek(fd, 0, SEEK_END) to determine its size
>
> Expected result: a PCI region size is returned.
> Actual result: 0 is returned.
>
> The reason is that PCI resource files residing in sysfs use
> generic_file_llseek(), which relies on f_mapping->host inode to get the
> file size. As f_mapping is now redefined, f_mapping->host points to an
> anonymous zero-sized iomem_inode which has nothing to do with sysfs file
> in question.
>
> Implement a custom llseek method for sysfs PCI resources, which is
> almost the same as proc_bus_pci_lseek() used for procfs entries.
>
> This makes sysfs and procfs entries consistent with regards to seeking,
> but also introduces userspace-visible changes to seeking PCI resources
> in sysfs:
>
> - SEEK_DATA and SEEK_HOLE are no longer supported;
> - Seeking past the end of the file is prohibited while previously
> offsets up to MAX_NON_LFS were accepted (reading from these offsets
> was always invalid).
>
> Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem")
> Cc: [email protected]
> Signed-off-by: Valentine Sinitsyn <[email protected]>
> Acked-by: Bjorn Helgaas <[email protected]>
Thanks for doing this reorganization and the follow-up, looks good to me:
Reviewed-by: Dan Williams <[email protected]>
On 05.10.2023 16:41, Greg Kroah-Hartman wrote:
> On Mon, Sep 25, 2023 at 11:40:13AM +0300, Valentine Sinitsyn wrote:
>> Since commit 636b21b50152 ("PCI: Revoke mappings like devmem"), mmappable
>> sysfs entries have started to receive their f_mapping from the iomem
>> pseudo filesystem, so that CONFIG_IO_STRICT_DEVMEM is honored in sysfs
>> (and procfs) as well as in /dev/[k]mem.
>>
>> This resulted in a userspace-visible regression:
>>
>> 1. Open a sysfs PCI resource file (eg. /sys/bus/pci/devices/*/resource0)
>> 2. Use lseek(fd, 0, SEEK_END) to determine its size
>>
>> Expected result: a PCI region size is returned.
>> Actual result: 0 is returned.
>>
>> The reason is that PCI resource files residing in sysfs use
>> generic_file_llseek(), which relies on f_mapping->host inode to get the
>> file size. As f_mapping is now redefined, f_mapping->host points to an
>> anonymous zero-sized iomem_inode which has nothing to do with sysfs file
>> in question.
>>
>> Implement a custom llseek method for sysfs PCI resources, which is
>> almost the same as proc_bus_pci_lseek() used for procfs entries.
>>
>> This makes sysfs and procfs entries consistent with regards to seeking,
>> but also introduces userspace-visible changes to seeking PCI resources
>> in sysfs:
>>
>> - SEEK_DATA and SEEK_HOLE are no longer supported;
>> - Seeking past the end of the file is prohibited while previously
>> offsets up to MAX_NON_LFS were accepted (reading from these offsets
>> was always invalid).
>>
>> Fixes: 636b21b50152 ("PCI: Revoke mappings like devmem")
>> Cc: [email protected]
>> Signed-off-by: Valentine Sinitsyn <[email protected]>
>> Acked-by: Bjorn Helgaas <[email protected]>
>> ---
>> drivers/pci/pci-sysfs.c | 26 +++++++++++++++++++++++++-
>> 1 file changed, 25 insertions(+), 1 deletion(-)
>
> I'll take these now, for 6.7-rc1, but not mark them as fixes or cc:
Thanks, appreciated.
> stable as this is a new functionality, the code has never worked for
> lseek on these files so it's not like anything was broken :)
In fact, lseek() on PCI resource files in sysfs was broken since commit
636b21b50152. That was the reason why I started to investigate the
issue: one of our applications stopped working after a kernel update.
I'm not hundred percent sure if it belongs to stable, but it does fix a
user-visible regression.
Best,
Valentin