2024-01-11 16:03:19

by Marco Pagani

[permalink] [raw]
Subject: [RFC PATCH v5 0/1] fpga: improve protection against low-level control module unloading

This RFC proposes a solution to keep protecting the fpga manager against
the unloading of the low-level control modules while addressing the
limitations of the current implementation. Currently, the code assumes
that the low-level module registers a driver for the parent device that
is later used to take the module's refcount. This proposal removes this
limitation by adding a module owner field to the fpga_manager struct
that can be set while registering the manager.

For more context, please refer to these threads:
https://lore.kernel.org/linux-fpga/ZS6hhlvjUcqyv8zL@yilunxu-OptiPlex-7050
https://lore.kernel.org/linux-fpga/ZT9qENE9fE3Z0KCW@yilunxu-OptiPlex-7050

v5:
- Updated the documentation
- Removed kernel-doc comments for helper macros
v4:
- Use helper macros to set the owner module
v3:
- Improved locking
v2:
- Fixed protection against races during module removal

Marco Pagani (1):
fpga: add an owner and use it to take the low-level module's refcount

Documentation/driver-api/fpga/fpga-mgr.rst | 34 ++++----
drivers/fpga/fpga-mgr.c | 93 ++++++++++++++--------
include/linux/fpga/fpga-mgr.h | 28 +++++--
3 files changed, 102 insertions(+), 53 deletions(-)


base-commit: c849ecb2ae8413f86c84627cb0af06dffce4e215
--
2.43.0



2024-01-11 16:03:39

by Marco Pagani

[permalink] [raw]
Subject: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount

Add a module owner field to the fpga_manager struct to take the
low-level control module refcount instead of assuming that the parent
device has a driver and using its owner pointer. The owner is now
passed as an additional argument at registration time. To this end,
the functions for registration have been modified to take an additional
owner parameter and renamed to avoid conflicts. The old function names
are now used for helper macros that automatically set the module that
registers the fpga manager as the owner. This ensures compatibility
with existing low-level control modules and reduces the chances of
registering a manager without setting the owner.

To detect when the owner module pointer becomes stale, set the mops
pointer to null during fpga_mgr_unregister() and test it before taking
the module's refcount. Use a mutex to protect against a crash that can
happen if __fpga_mgr_get() gets suspended between testing the mops
pointer and taking the refcount while the low-level module is being
unloaded.

Update the documentation to keep it consistent with the new interface
for registering an fpga manager.

Other changes: opportunistically move put_device() from __fpga_mgr_get()
to fpga_mgr_get() and of_fpga_mgr_get() to improve code clarity since
the device refcount in taken in these functions.

Fixes: 654ba4cc0f3e ("fpga manager: ensure lifetime with of_fpga_mgr_get")
Suggested-by: Xu Yilun <[email protected]>
Signed-off-by: Marco Pagani <[email protected]>
---
Documentation/driver-api/fpga/fpga-mgr.rst | 34 ++++----
drivers/fpga/fpga-mgr.c | 93 ++++++++++++++--------
include/linux/fpga/fpga-mgr.h | 28 +++++--
3 files changed, 102 insertions(+), 53 deletions(-)

diff --git a/Documentation/driver-api/fpga/fpga-mgr.rst b/Documentation/driver-api/fpga/fpga-mgr.rst
index 49c0a9512653..8d2b79f696c1 100644
--- a/Documentation/driver-api/fpga/fpga-mgr.rst
+++ b/Documentation/driver-api/fpga/fpga-mgr.rst
@@ -24,7 +24,8 @@ How to support a new FPGA device
--------------------------------

To add another FPGA manager, write a driver that implements a set of ops. The
-probe function calls fpga_mgr_register() or fpga_mgr_register_full(), such as::
+probe function calls ``fpga_mgr_register()`` or ``fpga_mgr_register_full()``,
+such as::

static const struct fpga_manager_ops socfpga_fpga_ops = {
.write_init = socfpga_fpga_ops_configure_init,
@@ -69,10 +70,11 @@ probe function calls fpga_mgr_register() or fpga_mgr_register_full(), such as::
}

Alternatively, the probe function could call one of the resource managed
-register functions, devm_fpga_mgr_register() or devm_fpga_mgr_register_full().
-When these functions are used, the parameter syntax is the same, but the call
-to fpga_mgr_unregister() should be removed. In the above example, the
-socfpga_fpga_remove() function would not be required.
+register functions, ``devm_fpga_mgr_register()`` or
+``devm_fpga_mgr_register_full()``. When these functions are used, the
+parameter syntax is the same, but the call to ``fpga_mgr_unregister()`` should be
+removed. In the above example, the ``socfpga_fpga_remove()`` function would not be
+required.

The ops will implement whatever device specific register writes are needed to
do the programming sequence for this particular FPGA. These ops return 0 for
@@ -125,15 +127,19 @@ API for implementing a new FPGA Manager driver
* struct fpga_manager - the FPGA manager struct
* struct fpga_manager_ops - Low level FPGA manager driver ops
* struct fpga_manager_info - Parameter structure for fpga_mgr_register_full()
-* fpga_mgr_register_full() - Create and register an FPGA manager using the
+* __fpga_mgr_register_full() - Create and register an FPGA manager using the
fpga_mgr_info structure to provide the full flexibility of options
-* fpga_mgr_register() - Create and register an FPGA manager using standard
+* __fpga_mgr_register() - Create and register an FPGA manager using standard
arguments
-* devm_fpga_mgr_register_full() - Resource managed version of
- fpga_mgr_register_full()
-* devm_fpga_mgr_register() - Resource managed version of fpga_mgr_register()
+* __devm_fpga_mgr_register_full() - Resource managed version of
+ __fpga_mgr_register_full()
+* __devm_fpga_mgr_register() - Resource managed version of __fpga_mgr_register()
* fpga_mgr_unregister() - Unregister an FPGA manager

+Helper macros ``fpga_mgr_register_full()``, ``fpga_mgr_register()``,
+``devm_fpga_mgr_register_full()``, and ``devm_fpga_mgr_register()`` are available
+to ease the registration.
+
.. kernel-doc:: include/linux/fpga/fpga-mgr.h
:functions: fpga_mgr_states

@@ -147,16 +153,16 @@ API for implementing a new FPGA Manager driver
:functions: fpga_manager_info

.. kernel-doc:: drivers/fpga/fpga-mgr.c
- :functions: fpga_mgr_register_full
+ :functions: __fpga_mgr_register_full

.. kernel-doc:: drivers/fpga/fpga-mgr.c
- :functions: fpga_mgr_register
+ :functions: __fpga_mgr_register

.. kernel-doc:: drivers/fpga/fpga-mgr.c
- :functions: devm_fpga_mgr_register_full
+ :functions: __devm_fpga_mgr_register_full

.. kernel-doc:: drivers/fpga/fpga-mgr.c
- :functions: devm_fpga_mgr_register
+ :functions: __devm_fpga_mgr_register

.. kernel-doc:: drivers/fpga/fpga-mgr.c
:functions: fpga_mgr_unregister
diff --git a/drivers/fpga/fpga-mgr.c b/drivers/fpga/fpga-mgr.c
index 06651389c592..d7bfbdfdf2fc 100644
--- a/drivers/fpga/fpga-mgr.c
+++ b/drivers/fpga/fpga-mgr.c
@@ -664,20 +664,20 @@ static struct attribute *fpga_mgr_attrs[] = {
};
ATTRIBUTE_GROUPS(fpga_mgr);

-static struct fpga_manager *__fpga_mgr_get(struct device *dev)
+static struct fpga_manager *__fpga_mgr_get(struct device *mgr_dev)
{
struct fpga_manager *mgr;

- mgr = to_fpga_manager(dev);
+ mgr = to_fpga_manager(mgr_dev);

- if (!try_module_get(dev->parent->driver->owner))
- goto err_dev;
+ mutex_lock(&mgr->mops_mutex);

- return mgr;
+ if (!mgr->mops || !try_module_get(mgr->mops_owner))
+ mgr = ERR_PTR(-ENODEV);

-err_dev:
- put_device(dev);
- return ERR_PTR(-ENODEV);
+ mutex_unlock(&mgr->mops_mutex);
+
+ return mgr;
}

static int fpga_mgr_dev_match(struct device *dev, const void *data)
@@ -693,12 +693,18 @@ static int fpga_mgr_dev_match(struct device *dev, const void *data)
*/
struct fpga_manager *fpga_mgr_get(struct device *dev)
{
- struct device *mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev,
- fpga_mgr_dev_match);
+ struct fpga_manager *mgr;
+ struct device *mgr_dev;
+
+ mgr_dev = class_find_device(&fpga_mgr_class, NULL, dev, fpga_mgr_dev_match);
if (!mgr_dev)
return ERR_PTR(-ENODEV);

- return __fpga_mgr_get(mgr_dev);
+ mgr = __fpga_mgr_get(mgr_dev);
+ if (IS_ERR(mgr))
+ put_device(mgr_dev);
+
+ return mgr;
}
EXPORT_SYMBOL_GPL(fpga_mgr_get);

@@ -711,13 +717,18 @@ EXPORT_SYMBOL_GPL(fpga_mgr_get);
*/
struct fpga_manager *of_fpga_mgr_get(struct device_node *node)
{
- struct device *dev;
+ struct fpga_manager *mgr;
+ struct device *mgr_dev;

- dev = class_find_device_by_of_node(&fpga_mgr_class, node);
- if (!dev)
+ mgr_dev = class_find_device_by_of_node(&fpga_mgr_class, node);
+ if (!mgr_dev)
return ERR_PTR(-ENODEV);

- return __fpga_mgr_get(dev);
+ mgr = __fpga_mgr_get(mgr_dev);
+ if (IS_ERR(mgr))
+ put_device(mgr_dev);
+
+ return mgr;
}
EXPORT_SYMBOL_GPL(of_fpga_mgr_get);

@@ -727,7 +738,7 @@ EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
*/
void fpga_mgr_put(struct fpga_manager *mgr)
{
- module_put(mgr->dev.parent->driver->owner);
+ module_put(mgr->mops_owner);
put_device(&mgr->dev);
}
EXPORT_SYMBOL_GPL(fpga_mgr_put);
@@ -766,9 +777,10 @@ void fpga_mgr_unlock(struct fpga_manager *mgr)
EXPORT_SYMBOL_GPL(fpga_mgr_unlock);

/**
- * fpga_mgr_register_full - create and register an FPGA Manager device
+ * __fpga_mgr_register_full - create and register an FPGA Manager device
* @parent: fpga manager device from pdev
* @info: parameters for fpga manager
+ * @owner: owner module containing the ops
*
* The caller of this function is responsible for calling fpga_mgr_unregister().
* Using devm_fpga_mgr_register_full() instead is recommended.
@@ -776,7 +788,8 @@ EXPORT_SYMBOL_GPL(fpga_mgr_unlock);
* Return: pointer to struct fpga_manager pointer or ERR_PTR()
*/
struct fpga_manager *
-fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info)
+__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
+ struct module *owner)
{
const struct fpga_manager_ops *mops = info->mops;
struct fpga_manager *mgr;
@@ -803,6 +816,9 @@ fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *in
}

mutex_init(&mgr->ref_mutex);
+ mutex_init(&mgr->mops_mutex);
+
+ mgr->mops_owner = owner;

mgr->name = info->name;
mgr->mops = info->mops;
@@ -841,14 +857,15 @@ fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *in

return ERR_PTR(ret);
}
-EXPORT_SYMBOL_GPL(fpga_mgr_register_full);
+EXPORT_SYMBOL_GPL(__fpga_mgr_register_full);

/**
- * fpga_mgr_register - create and register an FPGA Manager device
+ * __fpga_mgr_register - create and register an FPGA Manager device
* @parent: fpga manager device from pdev
* @name: fpga manager name
* @mops: pointer to structure of fpga manager ops
* @priv: fpga manager private data
+ * @owner: owner module containing the ops
*
* The caller of this function is responsible for calling fpga_mgr_unregister().
* Using devm_fpga_mgr_register() instead is recommended. This simple
@@ -859,8 +876,8 @@ EXPORT_SYMBOL_GPL(fpga_mgr_register_full);
* Return: pointer to struct fpga_manager pointer or ERR_PTR()
*/
struct fpga_manager *
-fpga_mgr_register(struct device *parent, const char *name,
- const struct fpga_manager_ops *mops, void *priv)
+__fpga_mgr_register(struct device *parent, const char *name,
+ const struct fpga_manager_ops *mops, void *priv, struct module *owner)
{
struct fpga_manager_info info = { 0 };

@@ -868,9 +885,9 @@ fpga_mgr_register(struct device *parent, const char *name,
info.mops = mops;
info.priv = priv;

- return fpga_mgr_register_full(parent, &info);
+ return __fpga_mgr_register_full(parent, &info, owner);
}
-EXPORT_SYMBOL_GPL(fpga_mgr_register);
+EXPORT_SYMBOL_GPL(__fpga_mgr_register);

/**
* fpga_mgr_unregister - unregister an FPGA manager
@@ -888,6 +905,12 @@ void fpga_mgr_unregister(struct fpga_manager *mgr)
*/
fpga_mgr_fpga_remove(mgr);

+ mutex_lock(&mgr->mops_mutex);
+
+ mgr->mops = NULL;
+
+ mutex_unlock(&mgr->mops_mutex);
+
device_unregister(&mgr->dev);
}
EXPORT_SYMBOL_GPL(fpga_mgr_unregister);
@@ -900,9 +923,10 @@ static void devm_fpga_mgr_unregister(struct device *dev, void *res)
}

/**
- * devm_fpga_mgr_register_full - resource managed variant of fpga_mgr_register()
+ * __devm_fpga_mgr_register_full - resource managed variant of fpga_mgr_register()
* @parent: fpga manager device from pdev
* @info: parameters for fpga manager
+ * @owner: owner module containing the ops
*
* Return: fpga manager pointer on success, negative error code otherwise.
*
@@ -910,7 +934,8 @@ static void devm_fpga_mgr_unregister(struct device *dev, void *res)
* function will be called automatically when the managing device is detached.
*/
struct fpga_manager *
-devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info)
+__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
+ struct module *owner)
{
struct fpga_mgr_devres *dr;
struct fpga_manager *mgr;
@@ -919,7 +944,7 @@ devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_inf
if (!dr)
return ERR_PTR(-ENOMEM);

- mgr = fpga_mgr_register_full(parent, info);
+ mgr = __fpga_mgr_register_full(parent, info, owner);
if (IS_ERR(mgr)) {
devres_free(dr);
return mgr;
@@ -930,14 +955,15 @@ devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_inf

return mgr;
}
-EXPORT_SYMBOL_GPL(devm_fpga_mgr_register_full);
+EXPORT_SYMBOL_GPL(__devm_fpga_mgr_register_full);

/**
- * devm_fpga_mgr_register - resource managed variant of fpga_mgr_register()
+ * __devm_fpga_mgr_register - resource managed variant of fpga_mgr_register()
* @parent: fpga manager device from pdev
* @name: fpga manager name
* @mops: pointer to structure of fpga manager ops
* @priv: fpga manager private data
+ * @owner: owner module containing the ops
*
* Return: fpga manager pointer on success, negative error code otherwise.
*
@@ -946,8 +972,9 @@ EXPORT_SYMBOL_GPL(devm_fpga_mgr_register_full);
* device is detached.
*/
struct fpga_manager *
-devm_fpga_mgr_register(struct device *parent, const char *name,
- const struct fpga_manager_ops *mops, void *priv)
+__devm_fpga_mgr_register(struct device *parent, const char *name,
+ const struct fpga_manager_ops *mops, void *priv,
+ struct module *owner)
{
struct fpga_manager_info info = { 0 };

@@ -955,9 +982,9 @@ devm_fpga_mgr_register(struct device *parent, const char *name,
info.mops = mops;
info.priv = priv;

- return devm_fpga_mgr_register_full(parent, &info);
+ return __devm_fpga_mgr_register_full(parent, &info, owner);
}
-EXPORT_SYMBOL_GPL(devm_fpga_mgr_register);
+EXPORT_SYMBOL_GPL(__devm_fpga_mgr_register);

static void fpga_mgr_dev_release(struct device *dev)
{
diff --git a/include/linux/fpga/fpga-mgr.h b/include/linux/fpga/fpga-mgr.h
index 54f63459efd6..0a4f67b59fb5 100644
--- a/include/linux/fpga/fpga-mgr.h
+++ b/include/linux/fpga/fpga-mgr.h
@@ -201,6 +201,8 @@ struct fpga_manager_ops {
* @state: state of fpga manager
* @compat_id: FPGA manager id for compatibility check.
* @mops: pointer to struct of fpga manager ops
+ * @mops_mutex: protects mops from low-level module removal
+ * @mops_owner: module containing the mops
* @priv: low level driver private date
*/
struct fpga_manager {
@@ -210,6 +212,8 @@ struct fpga_manager {
enum fpga_mgr_states state;
struct fpga_compat_id *compat_id;
const struct fpga_manager_ops *mops;
+ struct mutex mops_mutex;
+ struct module *mops_owner;
void *priv;
};

@@ -230,18 +234,30 @@ struct fpga_manager *fpga_mgr_get(struct device *dev);

void fpga_mgr_put(struct fpga_manager *mgr);

+#define fpga_mgr_register_full(parent, info) \
+ __fpga_mgr_register_full(parent, info, THIS_MODULE)
struct fpga_manager *
-fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
+__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
+ struct module *owner);

+#define fpga_mgr_register(parent, name, mops, priv) \
+ __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
struct fpga_manager *
-fpga_mgr_register(struct device *parent, const char *name,
- const struct fpga_manager_ops *mops, void *priv);
+__fpga_mgr_register(struct device *parent, const char *name,
+ const struct fpga_manager_ops *mops, void *priv, struct module *owner);
+
void fpga_mgr_unregister(struct fpga_manager *mgr);

+#define devm_fpga_mgr_register_full(parent, info) \
+ __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
struct fpga_manager *
-devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
+__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
+ struct module *owner);
+#define devm_fpga_mgr_register(parent, name, mops, priv) \
+ __devm_fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
struct fpga_manager *
-devm_fpga_mgr_register(struct device *parent, const char *name,
- const struct fpga_manager_ops *mops, void *priv);
+__devm_fpga_mgr_register(struct device *parent, const char *name,
+ const struct fpga_manager_ops *mops, void *priv,
+ struct module *owner);

#endif /*_LINUX_FPGA_MGR_H */
--
2.43.0


2024-01-30 08:14:41

by Xu Yilun

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount

> +#define fpga_mgr_register_full(parent, info) \
> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
> struct fpga_manager *
> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> + struct module *owner);
>
> +#define fpga_mgr_register(parent, name, mops, priv) \
> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
> struct fpga_manager *
> -fpga_mgr_register(struct device *parent, const char *name,
> - const struct fpga_manager_ops *mops, void *priv);
> +__fpga_mgr_register(struct device *parent, const char *name,
> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
> +
> void fpga_mgr_unregister(struct fpga_manager *mgr);
>
> +#define devm_fpga_mgr_register_full(parent, info) \
> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
> struct fpga_manager *
> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> + struct module *owner);

Add a line here. I can do it myself if you agree.

There is still a RFC prefix for this patch. Are you ready to get it merged?
If yes, Acked-by: Xu Yilun <[email protected]>

Next time if you think patches are ready for serious review and merge, drop
the RFC prefix. That avoids an extra query.

Thanks,
Yilun

> +#define devm_fpga_mgr_register(parent, name, mops, priv) \
> + __devm_fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
> struct fpga_manager *
> -devm_fpga_mgr_register(struct device *parent, const char *name,
> - const struct fpga_manager_ops *mops, void *priv);
> +__devm_fpga_mgr_register(struct device *parent, const char *name,
> + const struct fpga_manager_ops *mops, void *priv,
> + struct module *owner);
>
> #endif /*_LINUX_FPGA_MGR_H */
> --
> 2.43.0
>
>

2024-02-02 17:46:02

by Marco Pagani

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount



On 2024-01-30 05:31, Xu Yilun wrote:
>> +#define fpga_mgr_register_full(parent, info) \
>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
>> struct fpga_manager *
>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>> + struct module *owner);
>>
>> +#define fpga_mgr_register(parent, name, mops, priv) \
>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
>> struct fpga_manager *
>> -fpga_mgr_register(struct device *parent, const char *name,
>> - const struct fpga_manager_ops *mops, void *priv);
>> +__fpga_mgr_register(struct device *parent, const char *name,
>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
>> +
>> void fpga_mgr_unregister(struct fpga_manager *mgr);
>>
>> +#define devm_fpga_mgr_register_full(parent, info) \
>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
>> struct fpga_manager *
>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>> + struct module *owner);
>
> Add a line here. I can do it myself if you agree.

Sure, that is fine by me. I also spotted a typo in the commit log body
(in taken -> is taken). Do you want me to send a v6, or do you prefer
to fix that in place?

>
> There is still a RFC prefix for this patch. Are you ready to get it merged?
> If yes, Acked-by: Xu Yilun <[email protected]>

I'm ready for the patch to be merged. However, I recently sent an RFC
to propose a safer implementation of try_module_get() that would
simplify the code and may also benefit other subsystems. What do you
think?

https://lore.kernel.org/linux-modules/[email protected]/

> Next time if you think patches are ready for serious review and merge, drop
> the RFC prefix. That avoids an extra query.

Okay, I'll do it like that next time.

Thanks,
Marco


2024-02-04 08:24:28

by Xu Yilun

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount

On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
>
>
> On 2024-01-30 05:31, Xu Yilun wrote:
> >> +#define fpga_mgr_register_full(parent, info) \
> >> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
> >> struct fpga_manager *
> >> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >> + struct module *owner);
> >>
> >> +#define fpga_mgr_register(parent, name, mops, priv) \
> >> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
> >> struct fpga_manager *
> >> -fpga_mgr_register(struct device *parent, const char *name,
> >> - const struct fpga_manager_ops *mops, void *priv);
> >> +__fpga_mgr_register(struct device *parent, const char *name,
> >> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
> >> +
> >> void fpga_mgr_unregister(struct fpga_manager *mgr);
> >>
> >> +#define devm_fpga_mgr_register_full(parent, info) \
> >> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
> >> struct fpga_manager *
> >> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >> + struct module *owner);
> >
> > Add a line here. I can do it myself if you agree.
>
> Sure, that is fine by me. I also spotted a typo in the commit log body
> (in taken -> is taken). Do you want me to send a v6, or do you prefer
> to fix that in place?

No need, I can fix it.

>
> >
> > There is still a RFC prefix for this patch. Are you ready to get it merged?
> > If yes, Acked-by: Xu Yilun <[email protected]>
>
> I'm ready for the patch to be merged. However, I recently sent an RFC
> to propose a safer implementation of try_module_get() that would
> simplify the code and may also benefit other subsystems. What do you
> think?
>
> https://lore.kernel.org/linux-modules/[email protected]/

I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
proposal is applied before the end of this cycle, we could re-evaluate
this patch.

Thanks,
Yilun

>
> > Next time if you think patches are ready for serious review and merge, drop
> > the RFC prefix. That avoids an extra query.
>
> Okay, I'll do it like that next time.
>
> Thanks,
> Marco
>

2024-02-05 17:48:19

by Marco Pagani

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount



On 2024-02-04 06:15, Xu Yilun wrote:
> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
>>
>>
>> On 2024-01-30 05:31, Xu Yilun wrote:
>>>> +#define fpga_mgr_register_full(parent, info) \
>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>> struct fpga_manager *
>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>> + struct module *owner);
>>>>
>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
>>>> struct fpga_manager *
>>>> -fpga_mgr_register(struct device *parent, const char *name,
>>>> - const struct fpga_manager_ops *mops, void *priv);
>>>> +__fpga_mgr_register(struct device *parent, const char *name,
>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
>>>> +
>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
>>>>
>>>> +#define devm_fpga_mgr_register_full(parent, info) \
>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>> struct fpga_manager *
>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>> + struct module *owner);
>>>
>>> Add a line here. I can do it myself if you agree.
>>
>> Sure, that is fine by me. I also spotted a typo in the commit log body
>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
>> to fix that in place?
>
> No need, I can fix it.
>
>>
>>>
>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
>>> If yes, Acked-by: Xu Yilun <[email protected]>
>>
>> I'm ready for the patch to be merged. However, I recently sent an RFC
>> to propose a safer implementation of try_module_get() that would
>> simplify the code and may also benefit other subsystems. What do you
>> think?
>>
>> https://lore.kernel.org/linux-modules/[email protected]/
>
> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
> proposal is applied before the end of this cycle, we could re-evaluate
> this patch.

That's fine by me.

Thanks,
Marco


2024-02-18 10:09:37

by Xu Yilun

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount

On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
>
>
> On 2024-02-04 06:15, Xu Yilun wrote:
> > On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
> >>
> >>
> >> On 2024-01-30 05:31, Xu Yilun wrote:
> >>>> +#define fpga_mgr_register_full(parent, info) \
> >>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
> >>>> struct fpga_manager *
> >>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >>>> + struct module *owner);
> >>>>
> >>>> +#define fpga_mgr_register(parent, name, mops, priv) \
> >>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
> >>>> struct fpga_manager *
> >>>> -fpga_mgr_register(struct device *parent, const char *name,
> >>>> - const struct fpga_manager_ops *mops, void *priv);
> >>>> +__fpga_mgr_register(struct device *parent, const char *name,
> >>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
> >>>> +
> >>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
> >>>>
> >>>> +#define devm_fpga_mgr_register_full(parent, info) \
> >>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
> >>>> struct fpga_manager *
> >>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >>>> + struct module *owner);
> >>>
> >>> Add a line here. I can do it myself if you agree.
> >>
> >> Sure, that is fine by me. I also spotted a typo in the commit log body
> >> (in taken -> is taken). Do you want me to send a v6, or do you prefer
> >> to fix that in place?
> >
> > No need, I can fix it.
> >
> >>
> >>>
> >>> There is still a RFC prefix for this patch. Are you ready to get it merged?
> >>> If yes, Acked-by: Xu Yilun <[email protected]>
> >>
> >> I'm ready for the patch to be merged. However, I recently sent an RFC
> >> to propose a safer implementation of try_module_get() that would
> >> simplify the code and may also benefit other subsystems. What do you
> >> think?
> >>
> >> https://lore.kernel.org/linux-modules/[email protected]/
> >
> > I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
> > proposal is applied before the end of this cycle, we could re-evaluate
> > this patch.
>
> That's fine by me.

Sorry, I still found issues about this solution.

void fpga_mgr_unregister(struct fpga_manager *mgr)
{
dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);

/*
* If the low level driver provides a method for putting fpga into
* a desired state upon unregister, do it.
*/
fpga_mgr_fpga_remove(mgr);

mutex_lock(&mgr->mops_mutex);

mgr->mops = NULL;

mutex_unlock(&mgr->mops_mutex);

device_unregister(&mgr->dev);
}

Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
fpga_manager dev without mops, this is not what the user want and cause
problem when using this fpga_manager dev for other FPGA APIs.

I have this concern when I was reviewing the same improvement for fpga
bridge. The change for fpga bridge seems workable, the mutex keeps hold
until fpga_bridge_put(). But I somewhat don't prefer the unregistration
been unnecessarily blocked for long term.

I think your try_module_get_safe() patch may finally solve the invalid
module owner issue. Some options now, we ignore the invalid module owner
issue (it exists before this change) and merge the rest of the
improvements, or we wait for your patch accepted then re-evaluate. I
prefer the former.

Thanks,
Yilun

>
> Thanks,
> Marco
>

2024-02-20 11:12:56

by Marco Pagani

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount



On 2024-02-18 11:05, Xu Yilun wrote:
> On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
>>
>>
>> On 2024-02-04 06:15, Xu Yilun wrote:
>>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
>>>>
>>>>
>>>> On 2024-01-30 05:31, Xu Yilun wrote:
>>>>>> +#define fpga_mgr_register_full(parent, info) \
>>>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>> struct fpga_manager *
>>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>> + struct module *owner);
>>>>>>
>>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
>>>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
>>>>>> struct fpga_manager *
>>>>>> -fpga_mgr_register(struct device *parent, const char *name,
>>>>>> - const struct fpga_manager_ops *mops, void *priv);
>>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
>>>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
>>>>>> +
>>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
>>>>>>
>>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
>>>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>> struct fpga_manager *
>>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>> + struct module *owner);
>>>>>
>>>>> Add a line here. I can do it myself if you agree.
>>>>
>>>> Sure, that is fine by me. I also spotted a typo in the commit log body
>>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
>>>> to fix that in place?
>>>
>>> No need, I can fix it.
>>>
>>>>
>>>>>
>>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
>>>>> If yes, Acked-by: Xu Yilun <[email protected]>
>>>>
>>>> I'm ready for the patch to be merged. However, I recently sent an RFC
>>>> to propose a safer implementation of try_module_get() that would
>>>> simplify the code and may also benefit other subsystems. What do you
>>>> think?
>>>>
>>>> https://lore.kernel.org/linux-modules/[email protected]/
>>>
>>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
>>> proposal is applied before the end of this cycle, we could re-evaluate
>>> this patch.
>>
>> That's fine by me.
>
> Sorry, I still found issues about this solution.
>
> void fpga_mgr_unregister(struct fpga_manager *mgr)
> {
> dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
>
> /*
> * If the low level driver provides a method for putting fpga into
> * a desired state upon unregister, do it.
> */
> fpga_mgr_fpga_remove(mgr);
>
> mutex_lock(&mgr->mops_mutex);
>
> mgr->mops = NULL;
>
> mutex_unlock(&mgr->mops_mutex);
>
> device_unregister(&mgr->dev);
> }
>
> Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
> So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
> fpga_manager dev without mops, this is not what the user want and cause
> problem when using this fpga_manager dev for other FPGA APIs.

How about moving mgr->mops = NULL from fpga_mgr_unregister() to
class->dev_release()? In that way, mops will be set to NULL only when the
manager dev refcount reaches 0.

If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
got the manager dev earlier using fpga_mgr_get(), or it would have bumped up
the module's refcount, preventing its removal in the first place. In this case,
when device_unregister() is called, it will trigger dev_release() since the
manager dev refcount has reached 0.

If fpga_mgr_unregister() is called elsewhere in the module that registered the
manager (1), we have two subcases:

a) someone got the manager dev earlier and bumped the module's refcount. Hence,
the ops are safe since the module cannot be removed until the manager dev is
released by calling (the last) put_device(). This, in turn, will trigger
class->dev_release().

b) no one got manager dev. In this case, class->dev_release() will be called
immediately.

(1) The caller of fpga_mgr_register_*() is responsible for calling
fpga_mgr_unregister(), as specified in the docs.

> I have this concern when I was reviewing the same improvement for fpga
> bridge. The change for fpga bridge seems workable, the mutex keeps hold
> until fpga_bridge_put(). But I somewhat don't prefer the unregistration
> been unnecessarily blocked for long term.

I also don't like the idea of potentially blocking the unregistration, but I
could not find a better solution for the bridge at the moment.

> I think your try_module_get_safe() patch may finally solve the invalid
> module owner issue. Some options now, we ignore the invalid module owner
> issue (it exists before this change) and merge the rest of the
> improvements, or we wait for your patch accepted then re-evaluate. I
> prefer the former.

Yeah, try_module_get_safe() would make things simpler and easier. I'm currently
working on a series of selftests to demonstrate that the function is safe from
deadlocks, as requested by the maintainer. I hope I can convince people of the
advantages.

Thanks,
Marco


2024-02-21 17:10:05

by Xu Yilun

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount

On Tue, Feb 20, 2024 at 12:11:26PM +0100, Marco Pagani wrote:
>
>
> On 2024-02-18 11:05, Xu Yilun wrote:
> > On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
> >>
> >>
> >> On 2024-02-04 06:15, Xu Yilun wrote:
> >>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
> >>>>
> >>>>
> >>>> On 2024-01-30 05:31, Xu Yilun wrote:
> >>>>>> +#define fpga_mgr_register_full(parent, info) \
> >>>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
> >>>>>> struct fpga_manager *
> >>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >>>>>> + struct module *owner);
> >>>>>>
> >>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
> >>>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
> >>>>>> struct fpga_manager *
> >>>>>> -fpga_mgr_register(struct device *parent, const char *name,
> >>>>>> - const struct fpga_manager_ops *mops, void *priv);
> >>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
> >>>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
> >>>>>> +
> >>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
> >>>>>>
> >>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
> >>>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
> >>>>>> struct fpga_manager *
> >>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >>>>>> + struct module *owner);
> >>>>>
> >>>>> Add a line here. I can do it myself if you agree.
> >>>>
> >>>> Sure, that is fine by me. I also spotted a typo in the commit log body
> >>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
> >>>> to fix that in place?
> >>>
> >>> No need, I can fix it.
> >>>
> >>>>
> >>>>>
> >>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
> >>>>> If yes, Acked-by: Xu Yilun <[email protected]>
> >>>>
> >>>> I'm ready for the patch to be merged. However, I recently sent an RFC
> >>>> to propose a safer implementation of try_module_get() that would
> >>>> simplify the code and may also benefit other subsystems. What do you
> >>>> think?
> >>>>
> >>>> https://lore.kernel.org/linux-modules/[email protected]/
> >>>
> >>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
> >>> proposal is applied before the end of this cycle, we could re-evaluate
> >>> this patch.
> >>
> >> That's fine by me.
> >
> > Sorry, I still found issues about this solution.
> >
> > void fpga_mgr_unregister(struct fpga_manager *mgr)
> > {
> > dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
> >
> > /*
> > * If the low level driver provides a method for putting fpga into
> > * a desired state upon unregister, do it.
> > */
> > fpga_mgr_fpga_remove(mgr);
> >
> > mutex_lock(&mgr->mops_mutex);
> >
> > mgr->mops = NULL;
> >
> > mutex_unlock(&mgr->mops_mutex);
> >
> > device_unregister(&mgr->dev);
> > }
> >
> > Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
> > So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
> > fpga_manager dev without mops, this is not what the user want and cause
> > problem when using this fpga_manager dev for other FPGA APIs.
>
> How about moving mgr->mops = NULL from fpga_mgr_unregister() to
> class->dev_release()? In that way, mops will be set to NULL only when the
> manager dev refcount reaches 0.

I'm afraid it doesn't help. The lifecycle of the module and the fpga
mgr dev is different.

We use mops = NULL to indicate module has been freed or will be freed in no
time. On the other hand mops != NULL means module is still there, so
that try_module_get() could be safely called. It is possible someone
has got fpga mgr dev but not the module yet, at that time the module is
unloaded, then try_module_get() triggers crash.

>
> If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
> got the manager dev earlier using fpga_mgr_get(), or it would have bumped up

No, someone may get the manager dev but not the module yet, and been
scheduled out.

> the module's refcount, preventing its removal in the first place. In this case,
> when device_unregister() is called, it will trigger dev_release() since the
> manager dev refcount has reached 0.
>
> If fpga_mgr_unregister() is called elsewhere in the module that registered the
> manager (1), we have two subcases:
>
> a) someone got the manager dev earlier and bumped the module's refcount. Hence,
> the ops are safe since the module cannot be removed until the manager dev is
> released by calling (the last) put_device(). This, in turn, will trigger
> class->dev_release().
>
> b) no one got manager dev. In this case, class->dev_release() will be called
> immediately.
>
> (1) The caller of fpga_mgr_register_*() is responsible for calling
> fpga_mgr_unregister(), as specified in the docs.
>
> > I have this concern when I was reviewing the same improvement for fpga
> > bridge. The change for fpga bridge seems workable, the mutex keeps hold
> > until fpga_bridge_put(). But I somewhat don't prefer the unregistration
> > been unnecessarily blocked for long term.
>
> I also don't like the idea of potentially blocking the unregistration, but I
> could not find a better solution for the bridge at the moment.
>
> > I think your try_module_get_safe() patch may finally solve the invalid
> > module owner issue. Some options now, we ignore the invalid module owner
> > issue (it exists before this change) and merge the rest of the
> > improvements, or we wait for your patch accepted then re-evaluate. I
> > prefer the former.
>
> Yeah, try_module_get_safe() would make things simpler and easier. I'm currently
> working on a series of selftests to demonstrate that the function is safe from
> deadlocks, as requested by the maintainer. I hope I can convince people of the
> advantages.
>
> Thanks,
> Marco
>

2024-02-27 11:56:37

by Marco Pagani

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount



On 2024-02-21 15:37, Xu Yilun wrote:
> On Tue, Feb 20, 2024 at 12:11:26PM +0100, Marco Pagani wrote:
>>
>>
>> On 2024-02-18 11:05, Xu Yilun wrote:
>>> On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
>>>>
>>>>
>>>> On 2024-02-04 06:15, Xu Yilun wrote:
>>>>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
>>>>>>
>>>>>>
>>>>>> On 2024-01-30 05:31, Xu Yilun wrote:
>>>>>>>> +#define fpga_mgr_register_full(parent, info) \
>>>>>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>>> struct fpga_manager *
>>>>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>>>> + struct module *owner);
>>>>>>>>
>>>>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
>>>>>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
>>>>>>>> struct fpga_manager *
>>>>>>>> -fpga_mgr_register(struct device *parent, const char *name,
>>>>>>>> - const struct fpga_manager_ops *mops, void *priv);
>>>>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
>>>>>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
>>>>>>>> +
>>>>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
>>>>>>>>
>>>>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
>>>>>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>>> struct fpga_manager *
>>>>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>>>> + struct module *owner);
>>>>>>>
>>>>>>> Add a line here. I can do it myself if you agree.
>>>>>>
>>>>>> Sure, that is fine by me. I also spotted a typo in the commit log body
>>>>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
>>>>>> to fix that in place?
>>>>>
>>>>> No need, I can fix it.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
>>>>>>> If yes, Acked-by: Xu Yilun <[email protected]>
>>>>>>
>>>>>> I'm ready for the patch to be merged. However, I recently sent an RFC
>>>>>> to propose a safer implementation of try_module_get() that would
>>>>>> simplify the code and may also benefit other subsystems. What do you
>>>>>> think?
>>>>>>
>>>>>> https://lore.kernel.org/linux-modules/[email protected]/
>>>>>
>>>>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
>>>>> proposal is applied before the end of this cycle, we could re-evaluate
>>>>> this patch.
>>>>
>>>> That's fine by me.
>>>
>>> Sorry, I still found issues about this solution.
>>>
>>> void fpga_mgr_unregister(struct fpga_manager *mgr)
>>> {
>>> dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
>>>
>>> /*
>>> * If the low level driver provides a method for putting fpga into
>>> * a desired state upon unregister, do it.
>>> */
>>> fpga_mgr_fpga_remove(mgr);
>>>
>>> mutex_lock(&mgr->mops_mutex);
>>>
>>> mgr->mops = NULL;
>>>
>>> mutex_unlock(&mgr->mops_mutex);
>>>
>>> device_unregister(&mgr->dev);
>>> }
>>>
>>> Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
>>> So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
>>> fpga_manager dev without mops, this is not what the user want and cause
>>> problem when using this fpga_manager dev for other FPGA APIs.
>>
>> How about moving mgr->mops = NULL from fpga_mgr_unregister() to
>> class->dev_release()? In that way, mops will be set to NULL only when the
>> manager dev refcount reaches 0.
>
> I'm afraid it doesn't help. The lifecycle of the module and the fpga
> mgr dev is different.
>
> We use mops = NULL to indicate module has been freed or will be freed in no
> time. On the other hand mops != NULL means module is still there, so
> that try_module_get() could be safely called. It is possible someone
> has got fpga mgr dev but not the module yet, at that time the module is
> unloaded, then try_module_get() triggers crash.
>
>>
>> If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
>> got the manager dev earlier using fpga_mgr_get(), or it would have bumped up
>
> No, someone may get the manager dev but not the module yet, and been
> scheduled out.
>

You are right. Overall, it's a bad idea. How about then using an additional
bool flag instead of "overloading" the mops pointer? Something like:

get:
if (!mgr->owner_valid || !try_module_get(mgr->mops_owner))

remove:
mgr->owner_valid = false;

Another possibility that comes to my mind would be to "overload" the owner
pointer itself by using the ERR_PTR/IS_ERR macros. However, it looks ugly
to me.

Thanks,
Marco


[...]


2024-02-28 07:15:11

by Xu Yilun

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount

On Tue, Feb 27, 2024 at 12:49:06PM +0100, Marco Pagani wrote:
>
>
> On 2024-02-21 15:37, Xu Yilun wrote:
> > On Tue, Feb 20, 2024 at 12:11:26PM +0100, Marco Pagani wrote:
> >>
> >>
> >> On 2024-02-18 11:05, Xu Yilun wrote:
> >>> On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
> >>>>
> >>>>
> >>>> On 2024-02-04 06:15, Xu Yilun wrote:
> >>>>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 2024-01-30 05:31, Xu Yilun wrote:
> >>>>>>>> +#define fpga_mgr_register_full(parent, info) \
> >>>>>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
> >>>>>>>> struct fpga_manager *
> >>>>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >>>>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >>>>>>>> + struct module *owner);
> >>>>>>>>
> >>>>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
> >>>>>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
> >>>>>>>> struct fpga_manager *
> >>>>>>>> -fpga_mgr_register(struct device *parent, const char *name,
> >>>>>>>> - const struct fpga_manager_ops *mops, void *priv);
> >>>>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
> >>>>>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
> >>>>>>>> +
> >>>>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
> >>>>>>>>
> >>>>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
> >>>>>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
> >>>>>>>> struct fpga_manager *
> >>>>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >>>>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >>>>>>>> + struct module *owner);
> >>>>>>>
> >>>>>>> Add a line here. I can do it myself if you agree.
> >>>>>>
> >>>>>> Sure, that is fine by me. I also spotted a typo in the commit log body
> >>>>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
> >>>>>> to fix that in place?
> >>>>>
> >>>>> No need, I can fix it.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
> >>>>>>> If yes, Acked-by: Xu Yilun <[email protected]>
> >>>>>>
> >>>>>> I'm ready for the patch to be merged. However, I recently sent an RFC
> >>>>>> to propose a safer implementation of try_module_get() that would
> >>>>>> simplify the code and may also benefit other subsystems. What do you
> >>>>>> think?
> >>>>>>
> >>>>>> https://lore.kernel.org/linux-modules/[email protected]/
> >>>>>
> >>>>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
> >>>>> proposal is applied before the end of this cycle, we could re-evaluate
> >>>>> this patch.
> >>>>
> >>>> That's fine by me.
> >>>
> >>> Sorry, I still found issues about this solution.
> >>>
> >>> void fpga_mgr_unregister(struct fpga_manager *mgr)
> >>> {
> >>> dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
> >>>
> >>> /*
> >>> * If the low level driver provides a method for putting fpga into
> >>> * a desired state upon unregister, do it.
> >>> */
> >>> fpga_mgr_fpga_remove(mgr);
> >>>
> >>> mutex_lock(&mgr->mops_mutex);
> >>>
> >>> mgr->mops = NULL;
> >>>
> >>> mutex_unlock(&mgr->mops_mutex);
> >>>
> >>> device_unregister(&mgr->dev);
> >>> }
> >>>
> >>> Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
> >>> So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
> >>> fpga_manager dev without mops, this is not what the user want and cause
> >>> problem when using this fpga_manager dev for other FPGA APIs.
> >>
> >> How about moving mgr->mops = NULL from fpga_mgr_unregister() to
> >> class->dev_release()? In that way, mops will be set to NULL only when the
> >> manager dev refcount reaches 0.
> >
> > I'm afraid it doesn't help. The lifecycle of the module and the fpga
> > mgr dev is different.
> >
> > We use mops = NULL to indicate module has been freed or will be freed in no
> > time. On the other hand mops != NULL means module is still there, so
> > that try_module_get() could be safely called. It is possible someone
> > has got fpga mgr dev but not the module yet, at that time the module is
> > unloaded, then try_module_get() triggers crash.
> >
> >>
> >> If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
> >> got the manager dev earlier using fpga_mgr_get(), or it would have bumped up
> >
> > No, someone may get the manager dev but not the module yet, and been
> > scheduled out.
> >
>
> You are right. Overall, it's a bad idea. How about then using an additional
> bool flag instead of "overloading" the mops pointer? Something like:
>
> get:
> if (!mgr->owner_valid || !try_module_get(mgr->mops_owner))
>
> remove:
> mgr->owner_valid = false;

I'm not quite sure which function is actually mentioned by "remove". I
assume it should be fpga_mgr_unregister(). IIUC this flag means no
more reference to fpga mgr, but existing references are still valid.

It works for me. But the name of this flag could be reconsidered to
avoid misunderstanding. The owner is still valid (we still need to put
the owner) but allows no more reference. Maybe "owner_inactive"?

I still wanna this owner reference change been splitted, so that
we could simply revert it when the try_module_get_safe() got accepted.

Thanks,
Yilun

>
> Another possibility that comes to my mind would be to "overload" the owner
> pointer itself by using the ERR_PTR/IS_ERR macros. However, it looks ugly
> to me.
>
> Thanks,
> Marco
>
>
> [...]
>

2024-02-29 10:37:27

by Marco Pagani

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount


On 2024-02-28 08:10, Xu Yilun wrote:
> On Tue, Feb 27, 2024 at 12:49:06PM +0100, Marco Pagani wrote:
>>
>>
>> On 2024-02-21 15:37, Xu Yilun wrote:
>>> On Tue, Feb 20, 2024 at 12:11:26PM +0100, Marco Pagani wrote:
>>>>
>>>>
>>>> On 2024-02-18 11:05, Xu Yilun wrote:
>>>>> On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
>>>>>>
>>>>>>
>>>>>> On 2024-02-04 06:15, Xu Yilun wrote:
>>>>>>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024-01-30 05:31, Xu Yilun wrote:
>>>>>>>>>> +#define fpga_mgr_register_full(parent, info) \
>>>>>>>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>>>>> struct fpga_manager *
>>>>>>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>>>>>> + struct module *owner);
>>>>>>>>>>
>>>>>>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
>>>>>>>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
>>>>>>>>>> struct fpga_manager *
>>>>>>>>>> -fpga_mgr_register(struct device *parent, const char *name,
>>>>>>>>>> - const struct fpga_manager_ops *mops, void *priv);
>>>>>>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
>>>>>>>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
>>>>>>>>>> +
>>>>>>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
>>>>>>>>>>
>>>>>>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
>>>>>>>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>>>>> struct fpga_manager *
>>>>>>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>>>>>> + struct module *owner);
>>>>>>>>>
>>>>>>>>> Add a line here. I can do it myself if you agree.
>>>>>>>>
>>>>>>>> Sure, that is fine by me. I also spotted a typo in the commit log body
>>>>>>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
>>>>>>>> to fix that in place?
>>>>>>>
>>>>>>> No need, I can fix it.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
>>>>>>>>> If yes, Acked-by: Xu Yilun <[email protected]>
>>>>>>>>
>>>>>>>> I'm ready for the patch to be merged. However, I recently sent an RFC
>>>>>>>> to propose a safer implementation of try_module_get() that would
>>>>>>>> simplify the code and may also benefit other subsystems. What do you
>>>>>>>> think?
>>>>>>>>
>>>>>>>> https://lore.kernel.org/linux-modules/[email protected]/
>>>>>>>
>>>>>>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
>>>>>>> proposal is applied before the end of this cycle, we could re-evaluate
>>>>>>> this patch.
>>>>>>
>>>>>> That's fine by me.
>>>>>
>>>>> Sorry, I still found issues about this solution.
>>>>>
>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr)
>>>>> {
>>>>> dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
>>>>>
>>>>> /*
>>>>> * If the low level driver provides a method for putting fpga into
>>>>> * a desired state upon unregister, do it.
>>>>> */
>>>>> fpga_mgr_fpga_remove(mgr);
>>>>>
>>>>> mutex_lock(&mgr->mops_mutex);
>>>>>
>>>>> mgr->mops = NULL;
>>>>>
>>>>> mutex_unlock(&mgr->mops_mutex);
>>>>>
>>>>> device_unregister(&mgr->dev);
>>>>> }
>>>>>
>>>>> Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
>>>>> So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
>>>>> fpga_manager dev without mops, this is not what the user want and cause
>>>>> problem when using this fpga_manager dev for other FPGA APIs.
>>>>
>>>> How about moving mgr->mops = NULL from fpga_mgr_unregister() to
>>>> class->dev_release()? In that way, mops will be set to NULL only when the
>>>> manager dev refcount reaches 0.
>>>
>>> I'm afraid it doesn't help. The lifecycle of the module and the fpga
>>> mgr dev is different.
>>>
>>> We use mops = NULL to indicate module has been freed or will be freed in no
>>> time. On the other hand mops != NULL means module is still there, so
>>> that try_module_get() could be safely called. It is possible someone
>>> has got fpga mgr dev but not the module yet, at that time the module is
>>> unloaded, then try_module_get() triggers crash.
>>>
>>>>
>>>> If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
>>>> got the manager dev earlier using fpga_mgr_get(), or it would have bumped up
>>>
>>> No, someone may get the manager dev but not the module yet, and been
>>> scheduled out.
>>>
>>
>> You are right. Overall, it's a bad idea. How about then using an additional
>> bool flag instead of "overloading" the mops pointer? Something like:
>>
>> get:
>> if (!mgr->owner_valid || !try_module_get(mgr->mops_owner))
>>
>> remove:
>> mgr->owner_valid = false;
>
> I'm not quite sure which function is actually mentioned by "remove". I
> assume it should be fpga_mgr_unregister().

Yes, I was referring to fpga_mgr_unregister().

> IIUC this flag means no more reference to fpga mgr, but existing
> references are still valid.

Yes.

>
> It works for me. But the name of this flag could be reconsidered to
> avoid misunderstanding. The owner is still valid (we still need to put
> the owner) but allows no more reference. Maybe "owner_inactive"?

Right, owner_valid might be misleading. How about removing any
reference to the owner module and name the flag unreg?

__fpga_mgr_get:
if (mgr->unreg || !try_module_get(mgr->mops_owner))
mgr = ERR_PTR(-ENODEV);

fpga_mgr_unregister:
mgr->unreg = true;

> I still wanna this owner reference change been splitted, so that
> we could simply revert it when the try_module_get_safe() got accepted.

I guess it may take some time to have try_module_get_safe() accepted.
What do you prefer to do with the bridge and the region in the
meantime?

Thanks,
Marco


2024-03-01 15:17:05

by Xu Yilun

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount

On Thu, Feb 29, 2024 at 11:37:10AM +0100, Marco Pagani wrote:
>
> On 2024-02-28 08:10, Xu Yilun wrote:
> > On Tue, Feb 27, 2024 at 12:49:06PM +0100, Marco Pagani wrote:
> >>
> >>
> >> On 2024-02-21 15:37, Xu Yilun wrote:
> >>> On Tue, Feb 20, 2024 at 12:11:26PM +0100, Marco Pagani wrote:
> >>>>
> >>>>
> >>>> On 2024-02-18 11:05, Xu Yilun wrote:
> >>>>> On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 2024-02-04 06:15, Xu Yilun wrote:
> >>>>>>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2024-01-30 05:31, Xu Yilun wrote:
> >>>>>>>>>> +#define fpga_mgr_register_full(parent, info) \
> >>>>>>>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
> >>>>>>>>>> struct fpga_manager *
> >>>>>>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >>>>>>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >>>>>>>>>> + struct module *owner);
> >>>>>>>>>>
> >>>>>>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
> >>>>>>>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
> >>>>>>>>>> struct fpga_manager *
> >>>>>>>>>> -fpga_mgr_register(struct device *parent, const char *name,
> >>>>>>>>>> - const struct fpga_manager_ops *mops, void *priv);
> >>>>>>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
> >>>>>>>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
> >>>>>>>>>> +
> >>>>>>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
> >>>>>>>>>>
> >>>>>>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
> >>>>>>>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
> >>>>>>>>>> struct fpga_manager *
> >>>>>>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
> >>>>>>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
> >>>>>>>>>> + struct module *owner);
> >>>>>>>>>
> >>>>>>>>> Add a line here. I can do it myself if you agree.
> >>>>>>>>
> >>>>>>>> Sure, that is fine by me. I also spotted a typo in the commit log body
> >>>>>>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
> >>>>>>>> to fix that in place?
> >>>>>>>
> >>>>>>> No need, I can fix it.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
> >>>>>>>>> If yes, Acked-by: Xu Yilun <[email protected]>
> >>>>>>>>
> >>>>>>>> I'm ready for the patch to be merged. However, I recently sent an RFC
> >>>>>>>> to propose a safer implementation of try_module_get() that would
> >>>>>>>> simplify the code and may also benefit other subsystems. What do you
> >>>>>>>> think?
> >>>>>>>>
> >>>>>>>> https://lore.kernel.org/linux-modules/[email protected]/
> >>>>>>>
> >>>>>>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
> >>>>>>> proposal is applied before the end of this cycle, we could re-evaluate
> >>>>>>> this patch.
> >>>>>>
> >>>>>> That's fine by me.
> >>>>>
> >>>>> Sorry, I still found issues about this solution.
> >>>>>
> >>>>> void fpga_mgr_unregister(struct fpga_manager *mgr)
> >>>>> {
> >>>>> dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
> >>>>>
> >>>>> /*
> >>>>> * If the low level driver provides a method for putting fpga into
> >>>>> * a desired state upon unregister, do it.
> >>>>> */
> >>>>> fpga_mgr_fpga_remove(mgr);
> >>>>>
> >>>>> mutex_lock(&mgr->mops_mutex);
> >>>>>
> >>>>> mgr->mops = NULL;
> >>>>>
> >>>>> mutex_unlock(&mgr->mops_mutex);
> >>>>>
> >>>>> device_unregister(&mgr->dev);
> >>>>> }
> >>>>>
> >>>>> Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
> >>>>> So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
> >>>>> fpga_manager dev without mops, this is not what the user want and cause
> >>>>> problem when using this fpga_manager dev for other FPGA APIs.
> >>>>
> >>>> How about moving mgr->mops = NULL from fpga_mgr_unregister() to
> >>>> class->dev_release()? In that way, mops will be set to NULL only when the
> >>>> manager dev refcount reaches 0.
> >>>
> >>> I'm afraid it doesn't help. The lifecycle of the module and the fpga
> >>> mgr dev is different.
> >>>
> >>> We use mops = NULL to indicate module has been freed or will be freed in no
> >>> time. On the other hand mops != NULL means module is still there, so
> >>> that try_module_get() could be safely called. It is possible someone
> >>> has got fpga mgr dev but not the module yet, at that time the module is
> >>> unloaded, then try_module_get() triggers crash.
> >>>
> >>>>
> >>>> If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
> >>>> got the manager dev earlier using fpga_mgr_get(), or it would have bumped up
> >>>
> >>> No, someone may get the manager dev but not the module yet, and been
> >>> scheduled out.
> >>>
> >>
> >> You are right. Overall, it's a bad idea. How about then using an additional
> >> bool flag instead of "overloading" the mops pointer? Something like:
> >>
> >> get:
> >> if (!mgr->owner_valid || !try_module_get(mgr->mops_owner))
> >>
> >> remove:
> >> mgr->owner_valid = false;
> >
> > I'm not quite sure which function is actually mentioned by "remove". I
> > assume it should be fpga_mgr_unregister().
>
> Yes, I was referring to fpga_mgr_unregister().
>
> > IIUC this flag means no more reference to fpga mgr, but existing
> > references are still valid.
>
> Yes.
>
> >
> > It works for me. But the name of this flag could be reconsidered to
> > avoid misunderstanding. The owner is still valid (we still need to put
> > the owner) but allows no more reference. Maybe "owner_inactive"?
>
> Right, owner_valid might be misleading. How about removing any
> reference to the owner module and name the flag unreg?

the full name "unregistered" is better.

>
> __fpga_mgr_get:
> if (mgr->unreg || !try_module_get(mgr->mops_owner))
> mgr = ERR_PTR(-ENODEV);
>
> fpga_mgr_unregister:
> mgr->unreg = true;
>
> > I still wanna this owner reference change been splitted, so that
> > we could simply revert it when the try_module_get_safe() got accepted.
>
> I guess it may take some time to have try_module_get_safe() accepted.
> What do you prefer to do with the bridge and the region in the
> meantime?

This issue could happen in little chance. I actually don't have much
preference, either way is good to me.

Thanks,
Yilun

>
> Thanks,
> Marco
>

2024-03-01 16:29:56

by Marco Pagani

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount



On 2024-03-01 16:12, Xu Yilun wrote:
> On Thu, Feb 29, 2024 at 11:37:10AM +0100, Marco Pagani wrote:
>>
>> On 2024-02-28 08:10, Xu Yilun wrote:
>>> On Tue, Feb 27, 2024 at 12:49:06PM +0100, Marco Pagani wrote:
>>>>
>>>>
>>>> On 2024-02-21 15:37, Xu Yilun wrote:
>>>>> On Tue, Feb 20, 2024 at 12:11:26PM +0100, Marco Pagani wrote:
>>>>>>
>>>>>>
>>>>>> On 2024-02-18 11:05, Xu Yilun wrote:
>>>>>>> On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024-02-04 06:15, Xu Yilun wrote:
>>>>>>>>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2024-01-30 05:31, Xu Yilun wrote:
>>>>>>>>>>>> +#define fpga_mgr_register_full(parent, info) \
>>>>>>>>>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>>>>>>> struct fpga_manager *
>>>>>>>>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>>>>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>>>>>>>> + struct module *owner);
>>>>>>>>>>>>
>>>>>>>>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
>>>>>>>>>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
>>>>>>>>>>>> struct fpga_manager *
>>>>>>>>>>>> -fpga_mgr_register(struct device *parent, const char *name,
>>>>>>>>>>>> - const struct fpga_manager_ops *mops, void *priv);
>>>>>>>>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
>>>>>>>>>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
>>>>>>>>>>>> +
>>>>>>>>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
>>>>>>>>>>>>
>>>>>>>>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
>>>>>>>>>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>>>>>>> struct fpga_manager *
>>>>>>>>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>>>>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>>>>>>>> + struct module *owner);
>>>>>>>>>>>
>>>>>>>>>>> Add a line here. I can do it myself if you agree.
>>>>>>>>>>
>>>>>>>>>> Sure, that is fine by me. I also spotted a typo in the commit log body
>>>>>>>>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
>>>>>>>>>> to fix that in place?
>>>>>>>>>
>>>>>>>>> No need, I can fix it.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
>>>>>>>>>>> If yes, Acked-by: Xu Yilun <[email protected]>
>>>>>>>>>>
>>>>>>>>>> I'm ready for the patch to be merged. However, I recently sent an RFC
>>>>>>>>>> to propose a safer implementation of try_module_get() that would
>>>>>>>>>> simplify the code and may also benefit other subsystems. What do you
>>>>>>>>>> think?
>>>>>>>>>>
>>>>>>>>>> https://lore.kernel.org/linux-modules/[email protected]/
>>>>>>>>>
>>>>>>>>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
>>>>>>>>> proposal is applied before the end of this cycle, we could re-evaluate
>>>>>>>>> this patch.
>>>>>>>>
>>>>>>>> That's fine by me.
>>>>>>>
>>>>>>> Sorry, I still found issues about this solution.
>>>>>>>
>>>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr)
>>>>>>> {
>>>>>>> dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
>>>>>>>
>>>>>>> /*
>>>>>>> * If the low level driver provides a method for putting fpga into
>>>>>>> * a desired state upon unregister, do it.
>>>>>>> */
>>>>>>> fpga_mgr_fpga_remove(mgr);
>>>>>>>
>>>>>>> mutex_lock(&mgr->mops_mutex);
>>>>>>>
>>>>>>> mgr->mops = NULL;
>>>>>>>
>>>>>>> mutex_unlock(&mgr->mops_mutex);
>>>>>>>
>>>>>>> device_unregister(&mgr->dev);
>>>>>>> }
>>>>>>>
>>>>>>> Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
>>>>>>> So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
>>>>>>> fpga_manager dev without mops, this is not what the user want and cause
>>>>>>> problem when using this fpga_manager dev for other FPGA APIs.
>>>>>>
>>>>>> How about moving mgr->mops = NULL from fpga_mgr_unregister() to
>>>>>> class->dev_release()? In that way, mops will be set to NULL only when the
>>>>>> manager dev refcount reaches 0.
>>>>>
>>>>> I'm afraid it doesn't help. The lifecycle of the module and the fpga
>>>>> mgr dev is different.
>>>>>
>>>>> We use mops = NULL to indicate module has been freed or will be freed in no
>>>>> time. On the other hand mops != NULL means module is still there, so
>>>>> that try_module_get() could be safely called. It is possible someone
>>>>> has got fpga mgr dev but not the module yet, at that time the module is
>>>>> unloaded, then try_module_get() triggers crash.
>>>>>
>>>>>>
>>>>>> If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
>>>>>> got the manager dev earlier using fpga_mgr_get(), or it would have bumped up
>>>>>
>>>>> No, someone may get the manager dev but not the module yet, and been
>>>>> scheduled out.
>>>>>
>>>>
>>>> You are right. Overall, it's a bad idea. How about then using an additional
>>>> bool flag instead of "overloading" the mops pointer? Something like:
>>>>
>>>> get:
>>>> if (!mgr->owner_valid || !try_module_get(mgr->mops_owner))
>>>>
>>>> remove:
>>>> mgr->owner_valid = false;
>>>
>>> I'm not quite sure which function is actually mentioned by "remove". I
>>> assume it should be fpga_mgr_unregister().
>>
>> Yes, I was referring to fpga_mgr_unregister().
>>
>>> IIUC this flag means no more reference to fpga mgr, but existing
>>> references are still valid.
>>
>> Yes.
>>
>>>
>>> It works for me. But the name of this flag could be reconsidered to
>>> avoid misunderstanding. The owner is still valid (we still need to put
>>> the owner) but allows no more reference. Maybe "owner_inactive"?
>>
>> Right, owner_valid might be misleading. How about removing any
>> reference to the owner module and name the flag unreg?
>
> the full name "unregistered" is better.

That's fine by me.

>
>>
>> __fpga_mgr_get:
>> if (mgr->unreg || !try_module_get(mgr->mops_owner))
>> mgr = ERR_PTR(-ENODEV);
>>
>> fpga_mgr_unregister:
>> mgr->unreg = true;
>>
>>> I still wanna this owner reference change been splitted, so that
>>> we could simply revert it when the try_module_get_safe() got accepted.
>>
>> I guess it may take some time to have try_module_get_safe() accepted.
>> What do you prefer to do with the bridge and the region in the
>> meantime?
>
> This issue could happen in little chance. I actually don't have much
> preference, either way is good to me.
>

Okay, I'll also send the patch for the region then.

Thanks,
Marco


2024-03-04 11:51:39

by Marco Pagani

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount



On 2024-02-28 08:10, Xu Yilun wrote:
> On Tue, Feb 27, 2024 at 12:49:06PM +0100, Marco Pagani wrote:
>>
>>
>> On 2024-02-21 15:37, Xu Yilun wrote:
>>> On Tue, Feb 20, 2024 at 12:11:26PM +0100, Marco Pagani wrote:
>>>>
>>>>
>>>> On 2024-02-18 11:05, Xu Yilun wrote:
>>>>> On Mon, Feb 05, 2024 at 06:47:34PM +0100, Marco Pagani wrote:
>>>>>>
>>>>>>
>>>>>> On 2024-02-04 06:15, Xu Yilun wrote:
>>>>>>> On Fri, Feb 02, 2024 at 06:44:01PM +0100, Marco Pagani wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024-01-30 05:31, Xu Yilun wrote:
>>>>>>>>>> +#define fpga_mgr_register_full(parent, info) \
>>>>>>>>>> + __fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>>>>> struct fpga_manager *
>>>>>>>>>> -fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>>>>>> +__fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>>>>>> + struct module *owner);
>>>>>>>>>>
>>>>>>>>>> +#define fpga_mgr_register(parent, name, mops, priv) \
>>>>>>>>>> + __fpga_mgr_register(parent, name, mops, priv, THIS_MODULE)
>>>>>>>>>> struct fpga_manager *
>>>>>>>>>> -fpga_mgr_register(struct device *parent, const char *name,
>>>>>>>>>> - const struct fpga_manager_ops *mops, void *priv);
>>>>>>>>>> +__fpga_mgr_register(struct device *parent, const char *name,
>>>>>>>>>> + const struct fpga_manager_ops *mops, void *priv, struct module *owner);
>>>>>>>>>> +
>>>>>>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr);
>>>>>>>>>>
>>>>>>>>>> +#define devm_fpga_mgr_register_full(parent, info) \
>>>>>>>>>> + __devm_fpga_mgr_register_full(parent, info, THIS_MODULE)
>>>>>>>>>> struct fpga_manager *
>>>>>>>>>> -devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info);
>>>>>>>>>> +__devm_fpga_mgr_register_full(struct device *parent, const struct fpga_manager_info *info,
>>>>>>>>>> + struct module *owner);
>>>>>>>>>
>>>>>>>>> Add a line here. I can do it myself if you agree.
>>>>>>>>
>>>>>>>> Sure, that is fine by me. I also spotted a typo in the commit log body
>>>>>>>> (in taken -> is taken). Do you want me to send a v6, or do you prefer
>>>>>>>> to fix that in place?
>>>>>>>
>>>>>>> No need, I can fix it.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is still a RFC prefix for this patch. Are you ready to get it merged?
>>>>>>>>> If yes, Acked-by: Xu Yilun <[email protected]>
>>>>>>>>
>>>>>>>> I'm ready for the patch to be merged. However, I recently sent an RFC
>>>>>>>> to propose a safer implementation of try_module_get() that would
>>>>>>>> simplify the code and may also benefit other subsystems. What do you
>>>>>>>> think?
>>>>>>>>
>>>>>>>> https://lore.kernel.org/linux-modules/[email protected]/
>>>>>>>
>>>>>>> I suggest take your fix to linux-fpga/for-next now. If your try_module_get()
>>>>>>> proposal is applied before the end of this cycle, we could re-evaluate
>>>>>>> this patch.
>>>>>>
>>>>>> That's fine by me.
>>>>>
>>>>> Sorry, I still found issues about this solution.
>>>>>
>>>>> void fpga_mgr_unregister(struct fpga_manager *mgr)
>>>>> {
>>>>> dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
>>>>>
>>>>> /*
>>>>> * If the low level driver provides a method for putting fpga into
>>>>> * a desired state upon unregister, do it.
>>>>> */
>>>>> fpga_mgr_fpga_remove(mgr);
>>>>>
>>>>> mutex_lock(&mgr->mops_mutex);
>>>>>
>>>>> mgr->mops = NULL;
>>>>>
>>>>> mutex_unlock(&mgr->mops_mutex);
>>>>>
>>>>> device_unregister(&mgr->dev);
>>>>> }
>>>>>
>>>>> Note that fpga_mgr_unregister() doesn't have to be called in module_exit().
>>>>> So if we do fpga_mgr_get() then fpga_mgr_unregister(), We finally had a
>>>>> fpga_manager dev without mops, this is not what the user want and cause
>>>>> problem when using this fpga_manager dev for other FPGA APIs.
>>>>
>>>> How about moving mgr->mops = NULL from fpga_mgr_unregister() to
>>>> class->dev_release()? In that way, mops will be set to NULL only when the
>>>> manager dev refcount reaches 0.
>>>
>>> I'm afraid it doesn't help. The lifecycle of the module and the fpga
>>> mgr dev is different.
>>>
>>> We use mops = NULL to indicate module has been freed or will be freed in no
>>> time. On the other hand mops != NULL means module is still there, so
>>> that try_module_get() could be safely called. It is possible someone
>>> has got fpga mgr dev but not the module yet, at that time the module is
>>> unloaded, then try_module_get() triggers crash.
>>>
>>>>
>>>> If fpga_mgr_unregister() is called from module_exit(), we are sure that nobody
>>>> got the manager dev earlier using fpga_mgr_get(), or it would have bumped up
>>>
>>> No, someone may get the manager dev but not the module yet, and been
>>> scheduled out.
>>>
>>
>> You are right. Overall, it's a bad idea. How about then using an additional
>> bool flag instead of "overloading" the mops pointer? Something like:
>>
>> get:
>> if (!mgr->owner_valid || !try_module_get(mgr->mops_owner))
>>
>> remove:
>> mgr->owner_valid = false;
>
> I'm not quite sure which function is actually mentioned by "remove". I
> assume it should be fpga_mgr_unregister(). IIUC this flag means no
> more reference to fpga mgr, but existing references are still valid.
>
> It works for me. But the name of this flag could be reconsidered to
> avoid misunderstanding. The owner is still valid (we still need to put
> the owner) but allows no more reference. Maybe "owner_inactive"?
>
> I still wanna this owner reference change been splitted, so that
> we could simply revert it when the try_module_get_safe() got accepted.
>

Just to be sure that I understood correctly, you want to split the
changes into two patches, like:

a) add module owner to the manager struct and take it in
__fpga_mgr_get(); move put_device() from __fpga_mgr_get() to
fpga_mgr_get() and of_fpga_mgr_get().

b) add the mutex and the unregistered flag for protection against races.

So that (b) can be reverted if try_module_get_safe() will be accepted?

> [...]

Thanks,
Marco


2024-03-04 17:10:32

by Xu Yilun

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount

> Just to be sure that I understood correctly, you want to split the
> changes into two patches, like:
>
> a) add module owner to the manager struct and take it in
> __fpga_mgr_get(); move put_device() from __fpga_mgr_get() to
> fpga_mgr_get() and of_fpga_mgr_get().
>
> b) add the mutex and the unregistered flag for protection against races.
>
> So that (b) can be reverted if try_module_get_safe() will be accepted?

Yes, that's what I mean.

>
> > [...]
>
> Thanks,
> Marco
>
>

2024-03-04 22:27:17

by Marco Pagani

[permalink] [raw]
Subject: Re: [RFC PATCH v5 1/1] fpga: add an owner and use it to take the low-level module's refcount



On 2024-03-04 14:49, Xu Yilun wrote:
>> Just to be sure that I understood correctly, you want to split the
>> changes into two patches, like:
>>
>> a) add module owner to the manager struct and take it in
>> __fpga_mgr_get(); move put_device() from __fpga_mgr_get() to
>> fpga_mgr_get() and of_fpga_mgr_get().
>>
>> b) add the mutex and the unregistered flag for protection against races.
>>
>> So that (b) can be reverted if try_module_get_safe() will be accepted?
>
> Yes, that's what I mean.
>

On second thought, I would prefer to send only the patches (a) for the
manager, bridge, and region for the moment. As you said, the chances of
having a race are slim (no crash has ever been reported to date). So,
after (a), I think it is worth focusing first on try_module_get_safe()
since it could also be useful for other subsystems. The (b) patches
could always be applied later if try_module_get_safe() is not accepted.
Thanks,
Marco