2020-01-06 18:59:24

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 0/3] drivers base: transport component error propagation

Hi,

This small series improves error propagation on the transport component
to prevent an inconsistent state in the iscsi module. The bug that
motivated this patch results in a hanging iscsi connection that cannot
be used or removed by userspace, since the session is in an inconsistent
state.

That said, I tested it using the TCP iscsi transport (and forcing errors
on the triggered function), which doesn't require a particularly complex
container structure, so it is not the best test for finding corner cases
on the atomic attribute_container_device trigger version.

Please let me know what you think.

Gabriel Krisman Bertazi (3):
drivers: base: Support atomic version of
attribute_container_device_trigger
drivers: base: Propagate errors through the transport component
iscsi: Fail session and connection on transport registration failure

drivers/base/attribute_container.c | 103 ++++++++++++++++++++++++++++
drivers/base/transport_class.c | 11 ++-
drivers/scsi/scsi_transport_iscsi.c | 18 ++++-
include/linux/attribute_container.h | 7 ++
include/linux/transport_class.h | 6 +-
5 files changed, 137 insertions(+), 8 deletions(-)

--
2.24.1


2020-01-06 18:59:31

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 1/3] drivers: base: Support atomic version of attribute_container_device_trigger

attribute_container_device_trigger invokes callbacks that may fail for
one or more classdev's, for instance, the transport_add_class_device
callback, called during transport creation, does memory allocation.
This information, though, is not propagated to upper layers, and any
driver using the attribute_container_device_trigger API will not know
whether any, some, or all callbacks succeeded.

This patch implements a safe version of this dispatcher, to either
succeed all the callbacks or revert to the original state.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
drivers/base/attribute_container.c | 103 ++++++++++++++++++++++++++++
include/linux/attribute_container.h | 7 ++
2 files changed, 110 insertions(+)

diff --git a/drivers/base/attribute_container.c b/drivers/base/attribute_container.c
index 20736aaa0e69..f7bd0f4db13d 100644
--- a/drivers/base/attribute_container.c
+++ b/drivers/base/attribute_container.c
@@ -236,6 +236,109 @@ attribute_container_remove_device(struct device *dev,
mutex_unlock(&attribute_container_mutex);
}

+static int
+do_attribute_container_device_trigger_safe(struct device *dev,
+ struct attribute_container *cont,
+ int (*fn)(struct attribute_container *,
+ struct device *, struct device *),
+ int (*undo)(struct attribute_container *,
+ struct device *, struct device *))
+{
+ int ret;
+ struct internal_container *ic, *failed;
+ struct klist_iter iter;
+
+ if (attribute_container_no_classdevs(cont))
+ return fn(cont, dev, NULL);
+
+ klist_for_each_entry(ic, &cont->containers, node, &iter) {
+ if (dev == ic->classdev.parent) {
+ ret = fn(cont, dev, &ic->classdev);
+ if (ret) {
+ failed = ic;
+ klist_iter_exit(&iter);
+ goto fail;
+ }
+ }
+ }
+ return 0;
+
+fail:
+ if (!undo)
+ return ret;
+
+ /* Attempt to undo the work partially done. */
+ klist_for_each_entry(ic, &cont->containers, node, &iter) {
+ if (ic == failed) {
+ klist_iter_exit(&iter);
+ break;
+ }
+ if (dev == ic->classdev.parent)
+ undo(cont, dev, &ic->classdev);
+ }
+ return ret;
+}
+
+/**
+ * attribute_container_device_trigger_safe - execute a trigger for each
+ * matching classdev or fail all of them.
+ *
+ * @dev: The generic device to run the trigger for
+ * @fn the function to execute for each classdev.
+ * @undo A function to undo the work previously done in case of error
+ *
+ * This function is a safe version of
+ * attribute_container_device_trigger. It stops on the first error and
+ * undo the partial work that has been done, on previous classdev. It
+ * is guaranteed that either they all succeeded, or none of them
+ * succeeded.
+ */
+int
+attribute_container_device_trigger_safe(struct device *dev,
+ int (*fn)(struct attribute_container *,
+ struct device *,
+ struct device *),
+ int (*undo)(struct attribute_container *,
+ struct device *,
+ struct device *))
+{
+ struct attribute_container *cont, *failed = NULL;
+ int ret = 0;
+
+ mutex_lock(&attribute_container_mutex);
+
+ list_for_each_entry(cont, &attribute_container_list, node) {
+
+ if (!cont->match(cont, dev))
+ continue;
+
+ ret = do_attribute_container_device_trigger_safe(dev, cont,
+ fn, undo);
+ if (ret) {
+ failed = cont;
+ break;
+ }
+ }
+
+ if (ret && !WARN_ON(!undo)) {
+ list_for_each_entry(cont, &attribute_container_list, node) {
+
+ if (failed == cont)
+ break;
+
+ if (!cont->match(cont, dev))
+ continue;
+
+ do_attribute_container_device_trigger_safe(dev, cont,
+ undo, NULL);
+ }
+ }
+
+ mutex_unlock(&attribute_container_mutex);
+ return ret;
+
+}
+
/**
* attribute_container_device_trigger - execute a trigger for each matching classdev
*
diff --git a/include/linux/attribute_container.h b/include/linux/attribute_container.h
index d12bb2153cd6..e4004d1e6725 100644
--- a/include/linux/attribute_container.h
+++ b/include/linux/attribute_container.h
@@ -54,6 +54,13 @@ void attribute_container_device_trigger(struct device *dev,
int (*fn)(struct attribute_container *,
struct device *,
struct device *));
+int attribute_container_device_trigger_safe(struct device *dev,
+ int (*fn)(struct attribute_container *,
+ struct device *,
+ struct device *),
+ int (*undo)(struct attribute_container *,
+ struct device *,
+ struct device *));
void attribute_container_trigger(struct device *dev,
int (*fn)(struct attribute_container *,
struct device *));
--
2.24.1

2020-01-06 18:59:44

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 3/3] iscsi: Fail session and connection on transport registration failure

If the transport cannot be registered, the session/connection creation
needs to be failed early to let the initiator know. Otherwise, the
system will have an outstanding connection that cannot be used nor
removed by open-iscsi. The result is similar to the error below,
triggered by injecting a failure in the transport's registration path.

openiscsi reports success:

root@debian-vm:~# iscsiadm -m node -T iqn:lun1 -p 127.0.0.1 -l
Logging in to [iface: default, target: iqn:lun1, portal: 127.0.0.1,3260]
Login to [iface: default, target: iqn:lun1, portal:127.0.0.1,3260] successful.

But cannot remove the session afterwards, since the kernel is in an
inconsistent state.

root@debian-vm:~# iscsiadm -m node -T iqn:lun1 -p 127.0.0.1 -u
iscsiadm: No matching sessions found

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
drivers/scsi/scsi_transport_iscsi.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index ed8d9709b9b9..2b732b7a9a5b 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -2093,7 +2093,12 @@ int iscsi_add_session(struct iscsi_cls_session *session, unsigned int target_id)
"could not register session's dev\n");
goto release_ida;
}
- transport_register_device(&session->dev);
+ err = transport_register_device(&session->dev);
+ if (err) {
+ iscsi_cls_session_printk(KERN_ERR, session,
+ "could not register transport's dev\n");
+ goto release_dev;
+ }

spin_lock_irqsave(&sesslock, flags);
list_add(&session->sess_list, &sesslist);
@@ -2103,6 +2108,8 @@ int iscsi_add_session(struct iscsi_cls_session *session, unsigned int target_id)
ISCSI_DBG_TRANS_SESSION(session, "Completed session adding\n");
return 0;

+release_dev:
+ device_del(&session->dev);
release_ida:
if (session->ida_used)
ida_simple_remove(&iscsi_sess_ida, session->target_id);
@@ -2263,7 +2270,12 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid)
"register connection's dev\n");
goto release_parent_ref;
}
- transport_register_device(&conn->dev);
+ err = transport_register_device(&conn->dev);
+ if (err) {
+ iscsi_cls_session_printk(KERN_ERR, session, "could not "
+ "register transport's dev\n");
+ goto release_conn_ref;
+ }

spin_lock_irqsave(&connlock, flags);
list_add(&conn->conn_list, &connlist);
@@ -2272,6 +2284,8 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid)
ISCSI_DBG_TRANS_CONN(conn, "Completed conn creation\n");
return conn;

+release_conn_ref:
+ put_device(&conn->dev);
release_parent_ref:
put_device(&session->dev);
free_conn:
--
2.24.1

2020-01-06 18:59:47

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: [PATCH 2/3] drivers: base: Propagate errors through the transport component

The transport registration may fail. Make sure the errors are
propagated to the callers.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
drivers/base/transport_class.c | 11 ++++++++---
include/linux/transport_class.h | 6 +++---
2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/base/transport_class.c b/drivers/base/transport_class.c
index 5ed86ded6e6b..ccc86206e508 100644
--- a/drivers/base/transport_class.c
+++ b/drivers/base/transport_class.c
@@ -30,6 +30,10 @@
#include <linux/attribute_container.h>
#include <linux/transport_class.h>

+static int transport_remove_classdev(struct attribute_container *cont,
+ struct device *dev,
+ struct device *classdev);
+
/**
* transport_class_register - register an initial transport class
*
@@ -172,10 +176,11 @@ static int transport_add_class_device(struct attribute_container *cont,
* routine is simply a trigger point used to add the device to the
* system and register attributes for it.
*/
-
-void transport_add_device(struct device *dev)
+int transport_add_device(struct device *dev)
{
- attribute_container_device_trigger(dev, transport_add_class_device);
+ return attribute_container_device_trigger_safe(dev,
+ transport_add_class_device,
+ transport_remove_classdev);
}
EXPORT_SYMBOL_GPL(transport_add_device);

diff --git a/include/linux/transport_class.h b/include/linux/transport_class.h
index a9c59761927b..63076fb835e3 100644
--- a/include/linux/transport_class.h
+++ b/include/linux/transport_class.h
@@ -62,16 +62,16 @@ struct transport_container {
container_of(x, struct transport_container, ac)

void transport_remove_device(struct device *);
-void transport_add_device(struct device *);
+int transport_add_device(struct device *);
void transport_setup_device(struct device *);
void transport_configure_device(struct device *);
void transport_destroy_device(struct device *);

-static inline void
+static inline int
transport_register_device(struct device *dev)
{
transport_setup_device(dev);
- transport_add_device(dev);
+ return transport_add_device(dev);
}

static inline void
--
2.24.1

2020-01-14 15:07:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 1/3] drivers: base: Support atomic version of attribute_container_device_trigger

On Mon, Jan 06, 2020 at 01:58:15PM -0500, Gabriel Krisman Bertazi wrote:
> attribute_container_device_trigger invokes callbacks that may fail for
> one or more classdev's, for instance, the transport_add_class_device
> callback, called during transport creation, does memory allocation.
> This information, though, is not propagated to upper layers, and any
> driver using the attribute_container_device_trigger API will not know
> whether any, some, or all callbacks succeeded.
>
> This patch implements a safe version of this dispatcher, to either
> succeed all the callbacks or revert to the original state.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Reviewed-by: Greg Kroah-Hartman <[email protected]>

2020-01-14 15:07:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 2/3] drivers: base: Propagate errors through the transport component

On Mon, Jan 06, 2020 at 01:58:16PM -0500, Gabriel Krisman Bertazi wrote:
> The transport registration may fail. Make sure the errors are
> propagated to the callers.
>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>

Reviewed-by: Greg Kroah-Hartman <[email protected]>

2020-01-14 15:07:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 0/3] drivers base: transport component error propagation

On Mon, Jan 06, 2020 at 01:58:14PM -0500, Gabriel Krisman Bertazi wrote:
> Hi,
>
> This small series improves error propagation on the transport component
> to prevent an inconsistent state in the iscsi module. The bug that
> motivated this patch results in a hanging iscsi connection that cannot
> be used or removed by userspace, since the session is in an inconsistent
> state.
>
> That said, I tested it using the TCP iscsi transport (and forcing errors
> on the triggered function), which doesn't require a particularly complex
> container structure, so it is not the best test for finding corner cases
> on the atomic attribute_container_device trigger version.
>
> Please let me know what you think.

Looks sane, feel free to take the first two patches through what ever
tree iscsi patches go through.

thanks,

greg k-h

2020-01-16 05:13:42

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH 0/3] drivers base: transport component error propagation


Gabriel,

> This small series improves error propagation on the transport
> component to prevent an inconsistent state in the iscsi module. The
> bug that motivated this patch results in a hanging iscsi connection
> that cannot be used or removed by userspace, since the session is in
> an inconsistent state.
>
> That said, I tested it using the TCP iscsi transport (and forcing
> errors on the triggered function), which doesn't require a
> particularly complex container structure, so it is not the best test
> for finding corner cases on the atomic attribute_container_device
> trigger version.

Applied to 5.6/scsi-queue, thanks!

--
Martin K. Petersen Oracle Linux Engineering