2014-02-20 20:44:38

by Tejun Heo

Subject: [PATCHSET wq/for-3.15] workqueue: remove PREPARE_[DELAYED_]WORK()

Peter Hurley noticed that since a2c1c57be8d9 ("workqueue: consider
work function when searching for busy work items"), a work item which
gets assigned a different work function would break out of the
non-reentrancy guarantee as workqueue would consider it a different
work item.

This is fragile and extremely subtle. PREPARE_[DELAYED_]WORK() have
never been used widely and their semantics have always been somewhat
iffy. If the work item is known not to be on queue when
PREPARE_WORK() is called, there's no difference from using
INIT_WORK(). If the work item may be queued at the time of
PREPARE_WORK(), we can't really tell whether the old or new function
will be executed the next time.

We really don't want this level of subtlety in the workqueue interface for
such marginal use cases. The previous patches converted all existing
users away from PREPARE_[DELAYED_]WORK(). Let's remove them.
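
Most of the per-driver conversions below follow roughly this pattern
(hypothetical "foo" names only, not code from any specific patch): keep
one fixed work function and dispatch through a per-object ->workfn
pointer which is set before queueing, instead of switching the work
function itself with PREPARE_WORK().

#include <linux/kernel.h>
#include <linux/workqueue.h>

struct foo_dev {
	work_func_t workfn;		/* handler selected at queueing time */
	struct work_struct work;	/* always runs foo_dev_workfn() */
};

/* the one fixed work function; never changes after INIT_WORK() */
static void foo_dev_workfn(struct work_struct *work)
{
	struct foo_dev *dev = container_of(work, struct foo_dev, work);

	dev->workfn(work);		/* dispatch to the selected handler */
}

/*
 * setup:     INIT_WORK(&dev->work, foo_dev_workfn);
 * queueing:  dev->workfn = foo_handler;
 *            queue_work(system_wq, &dev->work);
 * instead of PREPARE_WORK(&dev->work, foo_handler).
 */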

This patchset contains the following nine patches.

0001-wireless-rt2x00-don-t-use-PREPARE_WORK-in-rt2800usb..patch
0002-ps3-vuart-don-t-use-PREPARE_WORK.patch
0003-floppy-don-t-use-PREPARE_-DELAYED_-WORK.patch
0004-firewire-don-t-use-PREPARE_DELAYED_WORK.patch
0005-usb-don-t-use-PREPARE_DELAYED_WORK.patch
0006-nvme-don-t-use-PREPARE_WORK.patch
0007-afs-don-t-use-PREPARE_WORK.patch
0008-staging-fwserial-don-t-use-PREPARE_WORK.patch
0009-workqueue-remove-PREPARE_-DELAYED_-WORK.patch

0001-0008 convert existing users.

0009 removes them.

This patchset is on top of e95003c3f9ccb ("Merge tag 'nfs-for-3.14-4'
of git://git.linux-nfs.org/projects/trondmy/linux-nfs") and also
available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-kill-PREPARE_WORK

diffstat follows.

drivers/block/floppy.c | 42 +++++++++++++++++++++-----------
drivers/block/nvme-core.c | 18 +++++++++----
drivers/firewire/core-device.c | 22 +++++++++++-----
drivers/firewire/sbp2.c | 17 +++++++++---
drivers/net/wireless/rt2x00/rt2800usb.c | 2 -
drivers/ps3/ps3-vuart.c | 4 ---
drivers/staging/fwserial/fwserial.c | 13 +++++++--
drivers/staging/fwserial/fwserial.h | 1
drivers/usb/core/hub.c | 13 +++++++--
drivers/usb/core/hub.h | 1
fs/afs/internal.h | 1
fs/afs/rxrpc.c | 12 +++++++--
include/linux/firewire.h | 1
include/linux/nvme.h | 1
include/linux/workqueue.h | 15 +----------
15 files changed, 107 insertions(+), 56 deletions(-)

Thanks.

--
tejun


2014-02-20 20:44:41

by Tejun Heo

Subject: [PATCH 1/9] wireless/rt2x00: don't use PREPARE_WORK in rt2800usb.c

PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

Update rt2800usb.c to use INIT_WORK() instead of PREPARE_WORK(). As
the work item isn't in active use during rt2800usb_probe_hw(), this
doesn't cause any behavior difference.

It would probably be best to route this with other related updates
through the workqueue tree.

Only compile tested.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Ivo van Doorn <[email protected]>
Cc: Gertjan van Wingerde <[email protected]>
Cc: Helmut Schaa <[email protected]>
Cc: [email protected]
---
drivers/net/wireless/rt2x00/rt2800usb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/rt2x00/rt2800usb.c b/drivers/net/wireless/rt2x00/rt2800usb.c
index caddc1b..42a2e06 100644
--- a/drivers/net/wireless/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/rt2x00/rt2800usb.c
@@ -764,7 +764,7 @@ static int rt2800usb_probe_hw(struct rt2x00_dev *rt2x00dev)
/*
* Overwrite TX done handler
*/
- PREPARE_WORK(&rt2x00dev->txdone_work, rt2800usb_work_txdone);
+ INIT_WORK(&rt2x00dev->txdone_work, rt2800usb_work_txdone);

return 0;
}
--
1.8.5.3

2014-02-20 20:44:45

by Tejun Heo

Subject: [PATCH 3/9] floppy: don't use PREPARE_[DELAYED_]WORK

PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

floppy has been multiplexing floppy_work and fd_timer with multiple
work functions. Introduce floppy_work_workfn() and fd_timer_workfn(),
which invoke floppy_work_fn and fd_timer_fn respectively, always use
those two as the work functions, and update the users to set
floppy_work_fn and fd_timer_fn instead of overriding the work
functions using PREPARE_[DELAYED_]WORK().

It would probably be best to route this with other related updates
through the workqueue tree.

Lightly tested using qemu.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jiri Kosina <[email protected]>
---
drivers/block/floppy.c | 42 ++++++++++++++++++++++++++++--------------
1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 2023043..8f5565b 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -961,17 +961,31 @@ static void empty(void)
{
}

-static DECLARE_WORK(floppy_work, NULL);
+static void (*floppy_work_fn)(void);
+
+static void floppy_work_workfn(struct work_struct *work)
+{
+ floppy_work_fn();
+}
+
+static DECLARE_WORK(floppy_work, floppy_work_workfn);

static void schedule_bh(void (*handler)(void))
{
WARN_ON(work_pending(&floppy_work));

- PREPARE_WORK(&floppy_work, (work_func_t)handler);
+ floppy_work_fn = handler;
queue_work(floppy_wq, &floppy_work);
}

-static DECLARE_DELAYED_WORK(fd_timer, NULL);
+static void (*fd_timer_fn)(void) = NULL;
+
+static void fd_timer_workfn(struct work_struct *work)
+{
+ fd_timer_fn();
+}
+
+static DECLARE_DELAYED_WORK(fd_timer, fd_timer_workfn);

static void cancel_activity(void)
{
@@ -982,7 +996,7 @@ static void cancel_activity(void)

/* this function makes sure that the disk stays in the drive during the
* transfer */
-static void fd_watchdog(struct work_struct *arg)
+static void fd_watchdog(void)
{
debug_dcl(DP->flags, "calling disk change from watchdog\n");

@@ -993,7 +1007,7 @@ static void fd_watchdog(struct work_struct *arg)
reset_fdc();
} else {
cancel_delayed_work(&fd_timer);
- PREPARE_DELAYED_WORK(&fd_timer, fd_watchdog);
+ fd_timer_fn = fd_watchdog;
queue_delayed_work(floppy_wq, &fd_timer, HZ / 10);
}
}
@@ -1005,7 +1019,8 @@ static void main_command_interrupt(void)
}

/* waits for a delay (spinup or select) to pass */
-static int fd_wait_for_completion(unsigned long expires, work_func_t function)
+static int fd_wait_for_completion(unsigned long expires,
+ void (*function)(void))
{
if (FDCS->reset) {
reset_fdc(); /* do the reset during sleep to win time
@@ -1016,7 +1031,7 @@ static int fd_wait_for_completion(unsigned long expires, work_func_t function)

if (time_before(jiffies, expires)) {
cancel_delayed_work(&fd_timer);
- PREPARE_DELAYED_WORK(&fd_timer, function);
+ fd_timer_fn = function;
queue_delayed_work(floppy_wq, &fd_timer, expires - jiffies);
return 1;
}
@@ -1334,8 +1349,7 @@ static int fdc_dtr(void)
* Pause 5 msec to avoid trouble. (Needs to be 2 jiffies)
*/
FDCS->dtr = raw_cmd->rate & 3;
- return fd_wait_for_completion(jiffies + 2UL * HZ / 100,
- (work_func_t)floppy_ready);
+ return fd_wait_for_completion(jiffies + 2UL * HZ / 100, floppy_ready);
} /* fdc_dtr */

static void tell_sector(void)
@@ -1440,7 +1454,7 @@ static void setup_rw_floppy(void)
int flags;
int dflags;
unsigned long ready_date;
- work_func_t function;
+ void (*function)(void);

flags = raw_cmd->flags;
if (flags & (FD_RAW_READ | FD_RAW_WRITE))
@@ -1454,9 +1468,9 @@ static void setup_rw_floppy(void)
*/
if (time_after(ready_date, jiffies + DP->select_delay)) {
ready_date -= DP->select_delay;
- function = (work_func_t)floppy_start;
+ function = floppy_start;
} else
- function = (work_func_t)setup_rw_floppy;
+ function = setup_rw_floppy;

/* wait until the floppy is spinning fast enough */
if (fd_wait_for_completion(ready_date, function))
@@ -1486,7 +1500,7 @@ static void setup_rw_floppy(void)
inr = result();
cont->interrupt();
} else if (flags & FD_RAW_NEED_DISK)
- fd_watchdog(NULL);
+ fd_watchdog();
}

static int blind_seek;
@@ -1863,7 +1877,7 @@ static int start_motor(void (*function)(void))

/* wait_for_completion also schedules reset if needed. */
return fd_wait_for_completion(DRS->select_date + DP->select_delay,
- (work_func_t)function);
+ function);
}

static void floppy_ready(void)
--
1.8.5.3

2014-02-20 20:44:54

by Tejun Heo

Subject: [PATCH 9/9] workqueue: remove PREPARE_[DELAYED_]WORK()

Peter Hurley noticed that since a2c1c57be8d9 ("workqueue: consider
work function when searching for busy work items"), a work item which
gets assigned a different work function would break out of the
non-reentrancy guarantee as workqueue would consider it a different
work item.

This is fragile and extremely subtle. PREPARE_[DELAYED_]WORK() have
never been used widely and their semantics have always been somewhat
iffy. If the work item is known not to be on queue when
PREPARE_WORK() is called, there's no difference from using
INIT_WORK(). If the work item may be queued at the time of
PREPARE_WORK(), we can't really tell whether the old or new function
will be executed the next time.

We really don't want this level of subtlety in the workqueue interface for
such marginal use cases. The previous patches converted all existing
users away from PREPARE_[DELAYED_]WORK(). Let's remove them.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Peter Hurley <[email protected]>
Link: http://lkml.kernel.org/g/[email protected]
---
include/linux/workqueue.h | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 594521b..d499066 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -177,17 +177,6 @@ struct execute_work {
#define DECLARE_DEFERRABLE_WORK(n, f) \
struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f, TIMER_DEFERRABLE)

-/*
- * initialize a work item's function pointer
- */
-#define PREPARE_WORK(_work, _func) \
- do { \
- (_work)->func = (_func); \
- } while (0)
-
-#define PREPARE_DELAYED_WORK(_work, _func) \
- PREPARE_WORK(&(_work)->work, (_func))
-
#ifdef CONFIG_DEBUG_OBJECTS_WORK
extern void __init_work(struct work_struct *work, int onstack);
extern void destroy_work_on_stack(struct work_struct *work);
@@ -217,7 +206,7 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
(_work)->data = (atomic_long_t) WORK_DATA_INIT(); \
lockdep_init_map(&(_work)->lockdep_map, #_work, &__key, 0); \
INIT_LIST_HEAD(&(_work)->entry); \
- PREPARE_WORK((_work), (_func)); \
+ (_work)->func = (_func); \
} while (0)
#else
#define __INIT_WORK(_work, _func, _onstack) \
@@ -225,7 +214,7 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
__init_work((_work), _onstack); \
(_work)->data = (atomic_long_t) WORK_DATA_INIT(); \
INIT_LIST_HEAD(&(_work)->entry); \
- PREPARE_WORK((_work), (_func)); \
+ (_work)->func = (_func); \
} while (0)
#endif

--
1.8.5.3

2014-02-20 20:44:53

by Tejun Heo

Subject: [PATCH 8/9] staging/fwserial: don't use PREPARE_WORK

PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

fwtty_peer->work is multiplexed with multiple work functions.
Introduce fwserial_peer_workfn(), which invokes fwtty_peer->workfn,
always use it as the work function, and update the users to set the
->workfn field instead of overriding the work function using
PREPARE_WORK().

It would probably be best to route this with other related updates
through the workqueue tree.

Compile tested.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Peter Hurley <[email protected]>
---
drivers/staging/fwserial/fwserial.c | 13 ++++++++++---
drivers/staging/fwserial/fwserial.h | 1 +
2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/fwserial/fwserial.c b/drivers/staging/fwserial/fwserial.c
index 8af136e..b22142e 100644
--- a/drivers/staging/fwserial/fwserial.c
+++ b/drivers/staging/fwserial/fwserial.c
@@ -2036,6 +2036,13 @@ static void fwserial_auto_connect(struct work_struct *work)
schedule_delayed_work(&peer->connect, CONNECT_RETRY_DELAY);
}

+static void fwserial_peer_workfn(struct work_struct *work)
+{
+ struct fwtty_peer *peer = to_peer(work, work);
+
+ peer->workfn(work);
+}
+
/**
* fwserial_add_peer - add a newly probed 'serial' unit device as a 'peer'
* @serial: aggregate representing the specific fw_card to add the peer to
@@ -2100,7 +2107,7 @@ static int fwserial_add_peer(struct fw_serial *serial, struct fw_unit *unit)
peer->port = NULL;

init_timer(&peer->timer);
- INIT_WORK(&peer->work, NULL);
+ INIT_WORK(&peer->work, fwserial_peer_workfn);
INIT_DELAYED_WORK(&peer->connect, fwserial_auto_connect);

/* associate peer with specific fw_card */
@@ -2702,7 +2709,7 @@ static int fwserial_parse_mgmt_write(struct fwtty_peer *peer,

} else {
peer->work_params.plug_req = pkt->plug_req;
- PREPARE_WORK(&peer->work, fwserial_handle_plug_req);
+ peer->workfn = fwserial_handle_plug_req;
queue_work(system_unbound_wq, &peer->work);
}
break;
@@ -2731,7 +2738,7 @@ static int fwserial_parse_mgmt_write(struct fwtty_peer *peer,
fwtty_err(&peer->unit, "unplug req: busy\n");
rcode = RCODE_CONFLICT_ERROR;
} else {
- PREPARE_WORK(&peer->work, fwserial_handle_unplug_req);
+ peer->workfn = fwserial_handle_unplug_req;
queue_work(system_unbound_wq, &peer->work);
}
break;
diff --git a/drivers/staging/fwserial/fwserial.h b/drivers/staging/fwserial/fwserial.h
index 54f7f9b..98b853d 100644
--- a/drivers/staging/fwserial/fwserial.h
+++ b/drivers/staging/fwserial/fwserial.h
@@ -91,6 +91,7 @@ struct fwtty_peer {
struct rcu_head rcu;

spinlock_t lock;
+ work_func_t workfn;
struct work_struct work;
struct peer_work_params work_params;
struct timer_list timer;
--
1.8.5.3

2014-02-20 20:44:51

by Tejun Heo

Subject: [PATCH 5/9] usb: don't use PREPARE_DELAYED_WORK

PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

usb_hub->init_work is multiplexed with multiple work functions.
Introduce hub_init_workfn(), which invokes usb_hub->init_workfn,
always use it as the work function, and update the users to set the
->init_workfn field instead of overriding the work function using
PREPARE_DELAYED_WORK().

It looks like the work items are never queued while in-flight, so
simply using INIT_DELAYED_WORK() before each queueing could be enough.
This patch performs the equivalent conversion just in case, but we
probably want to clean it up later if that's the case.
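
As a sketch of that simpler alternative (not what this patch does),
each queueing site would just re-initialize the item with the next
function right before scheduling it, e.g.:

	/* assuming, as noted above, init_work is idle at this point */
	INIT_DELAYED_WORK(&hub->init_work, hub_init_func2);
	schedule_delayed_work(&hub->init_work, msecs_to_jiffies(delay));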

It would probably be best to route this with other related updates
through the workqueue tree.

Lightly tested.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
---
drivers/usb/core/hub.c | 13 ++++++++++---
drivers/usb/core/hub.h | 1 +
2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 64ea219..2bc61c0 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1040,7 +1040,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
*/
if (type == HUB_INIT) {
delay = hub_power_on(hub, false);
- PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func2);
+ hub->init_workfn = hub_init_func2;
schedule_delayed_work(&hub->init_work,
msecs_to_jiffies(delay));

@@ -1194,7 +1194,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)

/* Don't do a long sleep inside a workqueue routine */
if (type == HUB_INIT2) {
- PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func3);
+ hub->init_workfn = hub_init_func3;
schedule_delayed_work(&hub->init_work,
msecs_to_jiffies(delay));
return; /* Continues at init3: below */
@@ -1634,6 +1634,13 @@ static void hub_disconnect(struct usb_interface *intf)
kref_put(&hub->kref, hub_release);
}

+static void hub_init_workfn(struct work_struct *work)
+{
+ struct usb_hub *hub = container_of(to_delayed_work(work),
+ struct usb_hub, init_work);
+ hub->init_workfn(work);
+}
+
static int hub_probe(struct usb_interface *intf, const struct usb_device_id *id)
{
struct usb_host_interface *desc;
@@ -1728,7 +1735,7 @@ descriptor_error:
hub->intfdev = &intf->dev;
hub->hdev = hdev;
INIT_DELAYED_WORK(&hub->leds, led_work);
- INIT_DELAYED_WORK(&hub->init_work, NULL);
+ INIT_DELAYED_WORK(&hub->init_work, hub_init_workfn);
usb_get_intf(intf);

usb_set_intfdata (intf, hub);
diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
index df629a3..ef81463 100644
--- a/drivers/usb/core/hub.h
+++ b/drivers/usb/core/hub.h
@@ -72,6 +72,7 @@ struct usb_hub {
unsigned has_indicators:1;
u8 indicator[USB_MAXCHILDREN];
struct delayed_work leds;
+ work_func_t init_workfn;
struct delayed_work init_work;
struct usb_port **ports;
};
--
1.8.5.3

2014-02-20 20:44:49

by Tejun Heo

Subject: [PATCH 6/9] nvme: don't use PREPARE_WORK

PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

nvme_dev->reset_work is multiplexed with multiple work functions.
Introduce nvme_reset_workfn(), which invokes nvme_dev->reset_workfn,
always use it as the work function, and update the users to set the
->reset_workfn field instead of overriding the work function using
PREPARE_WORK().

It would probably be best to route this with other related updates
through the workqueue tree.

Compile tested.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: [email protected]
---
drivers/block/nvme-core.c | 18 ++++++++++++------
include/linux/nvme.h | 1 +
2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 51824d1..8459e4e 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -993,7 +993,7 @@ static void nvme_abort_cmd(int cmdid, struct nvme_queue *nvmeq)
dev_warn(&dev->pci_dev->dev,
"I/O %d QID %d timeout, reset controller\n", cmdid,
nvmeq->qid);
- PREPARE_WORK(&dev->reset_work, nvme_reset_failed_dev);
+ dev->reset_workfn = nvme_reset_failed_dev;
queue_work(nvme_workq, &dev->reset_work);
return;
}
@@ -1696,8 +1696,7 @@ static int nvme_kthread(void *data)
list_del_init(&dev->node);
dev_warn(&dev->pci_dev->dev,
"Failed status, reset controller\n");
- PREPARE_WORK(&dev->reset_work,
- nvme_reset_failed_dev);
+ dev->reset_workfn = nvme_reset_failed_dev;
queue_work(nvme_workq, &dev->reset_work);
continue;
}
@@ -2406,7 +2405,7 @@ static int nvme_dev_resume(struct nvme_dev *dev)
return ret;
if (ret == -EBUSY) {
spin_lock(&dev_list_lock);
- PREPARE_WORK(&dev->reset_work, nvme_remove_disks);
+ dev->reset_workfn = nvme_remove_disks;
queue_work(nvme_workq, &dev->reset_work);
spin_unlock(&dev_list_lock);
}
@@ -2435,6 +2434,12 @@ static void nvme_reset_failed_dev(struct work_struct *ws)
nvme_dev_reset(dev);
}

+static void nvme_reset_workfn(struct work_struct *work)
+{
+ struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work);
+ dev->reset_workfn(work);
+}
+
static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
int result = -ENOMEM;
@@ -2453,7 +2458,8 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto free;

INIT_LIST_HEAD(&dev->namespaces);
- INIT_WORK(&dev->reset_work, nvme_reset_failed_dev);
+ dev->reset_workfn = nvme_reset_failed_dev;
+ INIT_WORK(&dev->reset_work, nvme_reset_workfn);
dev->pci_dev = pdev;
pci_set_drvdata(pdev, dev);
result = nvme_set_instance(dev);
@@ -2553,7 +2559,7 @@ static int nvme_resume(struct device *dev)
struct nvme_dev *ndev = pci_get_drvdata(pdev);

if (nvme_dev_resume(ndev) && !work_busy(&ndev->reset_work)) {
- PREPARE_WORK(&ndev->reset_work, nvme_reset_failed_dev);
+ ndev->reset_workfn = nvme_reset_failed_dev;
queue_work(nvme_workq, &ndev->reset_work);
}
return 0;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 69ae03f..6b9aafe 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -87,6 +87,7 @@ struct nvme_dev {
struct list_head namespaces;
struct kref kref;
struct miscdevice miscdev;
+ work_func_t reset_workfn;
struct work_struct reset_work;
char name[12];
char serial[20];
--
1.8.5.3

2014-02-20 20:46:44

by Tejun Heo

Subject: [PATCH 7/9] afs: don't use PREPARE_WORK

PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

afs_call->async_work is multiplexed with multiple work functions.
Introduce afs_async_workfn(), which invokes afs_call->async_workfn,
always use it as the work function, and update the users to set the
->async_workfn field instead of overriding the work function using
PREPARE_WORK().

It would probably be best to route this with other related updates
through the workqueue tree.

Compile tested.

Signed-off-by: Tejun Heo <[email protected]>
Cc: David Howells <[email protected]>
Cc: [email protected]
---
fs/afs/internal.h | 1 +
fs/afs/rxrpc.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 6621f80..be75b50 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -75,6 +75,7 @@ struct afs_call {
const struct afs_call_type *type; /* type of call */
const struct afs_wait_mode *wait_mode; /* completion wait mode */
wait_queue_head_t waitq; /* processes awaiting completion */
+ work_func_t async_workfn;
struct work_struct async_work; /* asynchronous work processor */
struct work_struct work; /* actual work processor */
struct sk_buff_head rx_queue; /* received packets */
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 8ad8c2a..ef943df 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -644,7 +644,7 @@ static void afs_process_async_call(struct work_struct *work)

/* we can't just delete the call because the work item may be
* queued */
- PREPARE_WORK(&call->async_work, afs_delete_async_call);
+ call->async_workfn = afs_delete_async_call;
queue_work(afs_async_calls, &call->async_work);
}

@@ -663,6 +663,13 @@ void afs_transfer_reply(struct afs_call *call, struct sk_buff *skb)
call->reply_size += len;
}

+static void afs_async_workfn(struct work_struct *work)
+{
+ struct afs_call *call = container_of(work, struct afs_call, async_work);
+
+ call->async_workfn(work);
+}
+
/*
* accept the backlog of incoming calls
*/
@@ -685,7 +692,8 @@ static void afs_collect_incoming_call(struct work_struct *work)
return;
}

- INIT_WORK(&call->async_work, afs_process_async_call);
+ call->async_workfn = afs_process_async_call;
+ INIT_WORK(&call->async_work, afs_async_workfn);
call->wait_mode = &afs_async_incoming_call;
call->type = &afs_RXCMxxxx;
init_waitqueue_head(&call->waitq);
--
1.8.5.3

2014-02-20 20:47:16

by Tejun Heo

Subject: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

firewire core-device and sbp2 have been multiplexing work items with
multiple work functions. Introduce fw_device_workfn() and
sbp2_lu_workfn(), which invoke fw_device->workfn and
sbp2_logical_unit->workfn respectively, always use those two as the
work functions, and update the users to set the ->workfn fields
instead of overriding the work functions using
PREPARE_DELAYED_WORK().

It would probably be best to route this with other related updates
through the workqueue tree.

Compile tested.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Stefan Richter <[email protected]>
Cc: [email protected]
Cc: Chris Boot <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
drivers/firewire/core-device.c | 22 +++++++++++++++-------
drivers/firewire/sbp2.c | 17 +++++++++++++----
include/linux/firewire.h | 1 +
3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/firewire/core-device.c b/drivers/firewire/core-device.c
index de4aa40..2c6d5e1 100644
--- a/drivers/firewire/core-device.c
+++ b/drivers/firewire/core-device.c
@@ -916,7 +916,7 @@ static int lookup_existing_device(struct device *dev, void *data)
old->config_rom_retries = 0;
fw_notice(card, "rediscovered device %s\n", dev_name(dev));

- PREPARE_DELAYED_WORK(&old->work, fw_device_update);
+ old->workfn = fw_device_update;
fw_schedule_device_work(old, 0);

if (current_node == card->root_node)
@@ -1075,7 +1075,7 @@ static void fw_device_init(struct work_struct *work)
if (atomic_cmpxchg(&device->state,
FW_DEVICE_INITIALIZING,
FW_DEVICE_RUNNING) == FW_DEVICE_GONE) {
- PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
+ device->workfn = fw_device_shutdown;
fw_schedule_device_work(device, SHUTDOWN_DELAY);
} else {
fw_notice(card, "created device %s: GUID %08x%08x, S%d00\n",
@@ -1196,13 +1196,20 @@ static void fw_device_refresh(struct work_struct *work)
dev_name(&device->device), fw_rcode_string(ret));
gone:
atomic_set(&device->state, FW_DEVICE_GONE);
- PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
+ device->workfn = fw_device_shutdown;
fw_schedule_device_work(device, SHUTDOWN_DELAY);
out:
if (node_id == card->root_node->node_id)
fw_schedule_bm_work(card, 0);
}

+static void fw_device_workfn(struct work_struct *work)
+{
+ struct fw_device *device = container_of(to_delayed_work(work),
+ struct fw_device, work);
+ device->workfn(work);
+}
+
void fw_node_event(struct fw_card *card, struct fw_node *node, int event)
{
struct fw_device *device;
@@ -1252,7 +1259,8 @@ void fw_node_event(struct fw_card *card, struct fw_node *node, int event)
* power-up after getting plugged in. We schedule the
* first config rom scan half a second after bus reset.
*/
- INIT_DELAYED_WORK(&device->work, fw_device_init);
+ device->workfn = fw_device_init;
+ INIT_DELAYED_WORK(&device->work, fw_device_workfn);
fw_schedule_device_work(device, INITIAL_DELAY);
break;

@@ -1268,7 +1276,7 @@ void fw_node_event(struct fw_card *card, struct fw_node *node, int event)
if (atomic_cmpxchg(&device->state,
FW_DEVICE_RUNNING,
FW_DEVICE_INITIALIZING) == FW_DEVICE_RUNNING) {
- PREPARE_DELAYED_WORK(&device->work, fw_device_refresh);
+ device->workfn = fw_device_refresh;
fw_schedule_device_work(device,
device->is_local ? 0 : INITIAL_DELAY);
}
@@ -1283,7 +1291,7 @@ void fw_node_event(struct fw_card *card, struct fw_node *node, int event)
smp_wmb(); /* update node_id before generation */
device->generation = card->generation;
if (atomic_read(&device->state) == FW_DEVICE_RUNNING) {
- PREPARE_DELAYED_WORK(&device->work, fw_device_update);
+ device->workfn = fw_device_update;
fw_schedule_device_work(device, 0);
}
break;
@@ -1308,7 +1316,7 @@ void fw_node_event(struct fw_card *card, struct fw_node *node, int event)
device = node->data;
if (atomic_xchg(&device->state,
FW_DEVICE_GONE) == FW_DEVICE_RUNNING) {
- PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
+ device->workfn = fw_device_shutdown;
fw_schedule_device_work(device,
list_empty(&card->link) ? 0 : SHUTDOWN_DELAY);
}
diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index 281029d..7aef911 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -146,6 +146,7 @@ struct sbp2_logical_unit {
*/
int generation;
int retries;
+ work_func_t workfn;
struct delayed_work work;
bool has_sdev;
bool blocked;
@@ -864,7 +865,7 @@ static void sbp2_login(struct work_struct *work)
/* set appropriate retry limit(s) in BUSY_TIMEOUT register */
sbp2_set_busy_timeout(lu);

- PREPARE_DELAYED_WORK(&lu->work, sbp2_reconnect);
+ lu->workfn = sbp2_reconnect;
sbp2_agent_reset(lu);

/* This was a re-login. */
@@ -918,7 +919,7 @@ static void sbp2_login(struct work_struct *work)
* If a bus reset happened, sbp2_update will have requeued
* lu->work already. Reset the work from reconnect to login.
*/
- PREPARE_DELAYED_WORK(&lu->work, sbp2_login);
+ lu->workfn = sbp2_login;
}

static void sbp2_reconnect(struct work_struct *work)
@@ -952,7 +953,7 @@ static void sbp2_reconnect(struct work_struct *work)
lu->retries++ >= 5) {
dev_err(tgt_dev(tgt), "failed to reconnect\n");
lu->retries = 0;
- PREPARE_DELAYED_WORK(&lu->work, sbp2_login);
+ lu->workfn = sbp2_login;
}
sbp2_queue_work(lu, DIV_ROUND_UP(HZ, 5));

@@ -972,6 +973,13 @@ static void sbp2_reconnect(struct work_struct *work)
sbp2_conditionally_unblock(lu);
}

+static void sbp2_lu_workfn(struct work_struct *work)
+{
+ struct sbp2_logical_unit *lu = container_of(to_delayed_work(work),
+ struct sbp2_logical_unit, work);
+ lu->workfn(work);
+}
+
static int sbp2_add_logical_unit(struct sbp2_target *tgt, int lun_entry)
{
struct sbp2_logical_unit *lu;
@@ -998,7 +1006,8 @@ static int sbp2_add_logical_unit(struct sbp2_target *tgt, int lun_entry)
lu->blocked = false;
++tgt->dont_block;
INIT_LIST_HEAD(&lu->orb_list);
- INIT_DELAYED_WORK(&lu->work, sbp2_login);
+ lu->workfn = sbp2_login;
+ INIT_DELAYED_WORK(&lu->work, sbp2_lu_workfn);

list_add_tail(&lu->link, &tgt->lu_list);
return 0;
diff --git a/include/linux/firewire.h b/include/linux/firewire.h
index 5d7782e..c3683bd 100644
--- a/include/linux/firewire.h
+++ b/include/linux/firewire.h
@@ -200,6 +200,7 @@ struct fw_device {
unsigned irmc:1;
unsigned bc_implemented:2;

+ work_func_t workfn;
struct delayed_work work;
struct fw_attribute_group attribute_group;
};
--
1.8.5.3

2014-02-20 20:47:37

by Tejun Heo

Subject: [PATCH 2/9] ps3-vuart: don't use PREPARE_WORK

PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

ps3_vuart wasn't multiplexing the work item across multiple work
functions; it was using NULL for INIT_WORK() and then a single
PREPARE_WORK() call to set the work function. We can simply invoke
INIT_WORK() with the work function and remove the PREPARE_WORK()
usage.

Not tested at all.

It would probably be best to route this with other related updates
through the workqueue tree.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
drivers/ps3/ps3-vuart.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/ps3/ps3-vuart.c b/drivers/ps3/ps3-vuart.c
index fb73008..bc1e513 100644
--- a/drivers/ps3/ps3-vuart.c
+++ b/drivers/ps3/ps3-vuart.c
@@ -699,8 +699,6 @@ int ps3_vuart_read_async(struct ps3_system_bus_device *dev, unsigned int bytes)

BUG_ON(!bytes);

- PREPARE_WORK(&priv->rx_list.work.work, ps3_vuart_work);
-
spin_lock_irqsave(&priv->rx_list.lock, flags);
if (priv->rx_list.bytes_held >= bytes) {
dev_dbg(&dev->core, "%s:%d: schedule_work %xh bytes\n",
@@ -1052,7 +1050,7 @@ static int ps3_vuart_probe(struct ps3_system_bus_device *dev)
INIT_LIST_HEAD(&priv->rx_list.head);
spin_lock_init(&priv->rx_list.lock);

- INIT_WORK(&priv->rx_list.work.work, NULL);
+ INIT_WORK(&priv->rx_list.work.work, ps3_vuart_work);
priv->rx_list.work.trigger = 0;
priv->rx_list.work.dev = dev;

--
1.8.5.3

2014-02-20 20:57:55

by Greg Kroah-Hartman

Subject: Re: [PATCH 5/9] usb: don't use PREPARE_DELAYED_WORK

On Thu, Feb 20, 2014 at 03:44:27PM -0500, Tejun Heo wrote:
> PREPARE_[DELAYED_]WORK() are being phased out. They have few users
> and a nasty surprise in terms of reentrancy guarantee as workqueue
> considers work items to be different if they don't have the same work
> function.
>
> usb_hub->init_work is multiplexed with multiple work functions.
> Introduce hub_init_workfn() which invokes usb_hub->init_workfn and
> always use it as the work function and update the users to set the
> ->init_workfn field instead of overriding the work function using
> PREPARE_DELAYED_WORK().
>
> It looks like that the work items are never queued while in-flight, so
> simply using INIT_DELAYED_WORK() before each queueing could be enough.
> This patch performs equivalent conversion just in case but we probably
> wanna clean it up later if that's the case.

I think it should be fine to use INIT_DELAYED_WORK(), but Alan would
know best. Alan?

>
> It would probably be best to route this with other related updates
> through the workqueue tree.
>
> Lightly tested.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: [email protected]

Acked-by: Greg Kroah-Hartman <[email protected]>

> ---
> drivers/usb/core/hub.c | 13 ++++++++++---
> drivers/usb/core/hub.h | 1 +
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 64ea219..2bc61c0 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -1040,7 +1040,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
> */
> if (type == HUB_INIT) {
> delay = hub_power_on(hub, false);
> - PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func2);
> + hub->init_workfn = hub_init_func2;
> schedule_delayed_work(&hub->init_work,
> msecs_to_jiffies(delay));
>
> @@ -1194,7 +1194,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
>
> /* Don't do a long sleep inside a workqueue routine */
> if (type == HUB_INIT2) {
> - PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func3);
> + hub->init_workfn = hub_init_func3;
> schedule_delayed_work(&hub->init_work,
> msecs_to_jiffies(delay));
> return; /* Continues at init3: below */
> @@ -1634,6 +1634,13 @@ static void hub_disconnect(struct usb_interface *intf)
> kref_put(&hub->kref, hub_release);
> }
>
> +static void hub_init_workfn(struct work_struct *work)
> +{
> + struct usb_hub *hub = container_of(to_delayed_work(work),
> + struct usb_hub, init_work);
> + hub->init_workfn(work);
> +}
> +
> static int hub_probe(struct usb_interface *intf, const struct usb_device_id *id)
> {
> struct usb_host_interface *desc;
> @@ -1728,7 +1735,7 @@ descriptor_error:
> hub->intfdev = &intf->dev;
> hub->hdev = hdev;
> INIT_DELAYED_WORK(&hub->leds, led_work);
> - INIT_DELAYED_WORK(&hub->init_work, NULL);
> + INIT_DELAYED_WORK(&hub->init_work, hub_init_workfn);
> usb_get_intf(intf);
>
> usb_set_intfdata (intf, hub);
> diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
> index df629a3..ef81463 100644
> --- a/drivers/usb/core/hub.h
> +++ b/drivers/usb/core/hub.h
> @@ -72,6 +72,7 @@ struct usb_hub {
> unsigned has_indicators:1;
> u8 indicator[USB_MAXCHILDREN];
> struct delayed_work leds;
> + work_func_t init_workfn;
> struct delayed_work init_work;
> struct usb_port **ports;
> };
> --
> 1.8.5.3

2014-02-20 22:46:06

by David Howells

Subject: Re: [PATCH 7/9] afs: don't use PREPARE_WORK

Tejun Heo <[email protected]> wrote:

> PREPARE_[DELAYED_]WORK() are being phased out. They have few users
> and a nasty surprise in terms of reentrancy guarantee as workqueue
> considers work items to be different if they don't have the same work
> function.

Why so? Isn't the work item address sufficient to distinguish them?

David

2014-02-20 22:46:42

by Tejun Heo

Subject: Re: [PATCH 7/9] afs: don't use PREPARE_WORK

On Thu, Feb 20, 2014 at 10:00:05PM +0000, David Howells wrote:
> Tejun Heo <[email protected]> wrote:
>
> > PREPARE_[DELAYED_]WORK() are being phased out. They have few users
> > and a nasty surprise in terms of reentrancy guarantee as workqueue
> > considers work items to be different if they don't have the same work
> > function.
>
> Why so? Isn't the work item address sufficient to distinguish them?

Because we ended up introducing deadlocks through work items which are
freed and then recycled.

Thanks.

--
tejun

2014-02-21 01:44:52

by Peter Hurley

Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On 02/20/2014 03:44 PM, Tejun Heo wrote:
> PREPARE_[DELAYED_]WORK() are being phased out. They have few users
> and a nasty surprise in terms of reentrancy guarantee as workqueue
> considers work items to be different if they don't have the same work
> function.
>
> firewire core-device and sbp2 have been been multiplexing work items
> with multiple work functions. Introduce fw_device_workfn() and
> sbp2_lu_workfn() which invoke fw_device->workfn and
> sbp2_logical_unit->workfn respectively and always use the two
> functions as the work functions and update the users to set the
> ->workfn fields instead of overriding work functions using
> PREPARE_DELAYED_WORK().
>
> It would probably be best to route this with other related updates
> through the workqueue tree.
>
> Compile tested.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Stefan Richter <[email protected]>
> Cc: [email protected]
> Cc: Chris Boot <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> drivers/firewire/core-device.c | 22 +++++++++++++++-------
> drivers/firewire/sbp2.c | 17 +++++++++++++----
> include/linux/firewire.h | 1 +
> 3 files changed, 29 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/firewire/core-device.c b/drivers/firewire/core-device.c
> index de4aa40..2c6d5e1 100644
> --- a/drivers/firewire/core-device.c
> +++ b/drivers/firewire/core-device.c
> @@ -916,7 +916,7 @@ static int lookup_existing_device(struct device *dev, void *data)
> old->config_rom_retries = 0;
> fw_notice(card, "rediscovered device %s\n", dev_name(dev));
>
> - PREPARE_DELAYED_WORK(&old->work, fw_device_update);
> + old->workfn = fw_device_update;
> fw_schedule_device_work(old, 0);
>
> if (current_node == card->root_node)
> @@ -1075,7 +1075,7 @@ static void fw_device_init(struct work_struct *work)
> if (atomic_cmpxchg(&device->state,
> FW_DEVICE_INITIALIZING,
> FW_DEVICE_RUNNING) == FW_DEVICE_GONE) {
> - PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
> + device->workfn = fw_device_shutdown;
> fw_schedule_device_work(device, SHUTDOWN_DELAY);

Implied mb of test_and_set_bit() in queue_work_on() ensures that the
newly assigned work function is visible on all cpus before evaluating
whether or not the work can be queued.

Ok.

> } else {
> fw_notice(card, "created device %s: GUID %08x%08x, S%d00\n",
> @@ -1196,13 +1196,20 @@ static void fw_device_refresh(struct work_struct *work)
> dev_name(&device->device), fw_rcode_string(ret));
> gone:
> atomic_set(&device->state, FW_DEVICE_GONE);
> - PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
> + device->workfn = fw_device_shutdown;
> fw_schedule_device_work(device, SHUTDOWN_DELAY);
> out:
> if (node_id == card->root_node->node_id)
> fw_schedule_bm_work(card, 0);
> }
>
> +static void fw_device_workfn(struct work_struct *work)
> +{
> + struct fw_device *device = container_of(to_delayed_work(work),
> + struct fw_device, work);

I think this needs an smp_rmb() here.

> + device->workfn(work);
> +}
> +

Otherwise this cpu could speculatively load workfn before
set_work_pool_and_clear_pending(), which means that the old workfn
could have been loaded but PENDING was still set and caused queue_work_on()
to reject the work as already pending.

Result: the new work function never runs.
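
Concretely, the placement I mean is something like this (sketch only,
on top of this patch's fw_device_workfn()):

static void fw_device_workfn(struct work_struct *work)
{
	struct fw_device *device = container_of(to_delayed_work(work),
						struct fw_device, work);

	/* see above: keep the ->workfn load from being speculated
	 * ahead of the PENDING clearing in the workqueue core */
	smp_rmb();
	device->workfn(work);
}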

But this exposes a more general problem that I believe workqueue should
prevent; speculated loads and stores in the work item function should be
prevented from occurring before clearing PENDING in
set_work_pool_and_clear_pending().

IOW, the beginning of the work function should act like a barrier in
the same way that queue_work_on() (et al.) already does.

Regards,
Peter Hurley

2014-02-21 01:59:42

by Tejun Heo

Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

Hello,

On Thu, Feb 20, 2014 at 08:44:46PM -0500, Peter Hurley wrote:
> >+static void fw_device_workfn(struct work_struct *work)
> >+{
> >+ struct fw_device *device = container_of(to_delayed_work(work),
> >+ struct fw_device, work);
>
> I think this needs an smp_rmb() here.

The patch is equivalent transformation and the whole thing is
guaranteed to have gone through pool->lock. No explicit rmb
necessary.

> IOW, the beginning of the work function should act like a barrier in
> the same way that queue_work_on() (et. al.) already does.

workqueue already has enough barriers; otherwise, the whole kernel
would have crumbled long time ago.

Thanks.

--
tejun

2014-02-21 02:07:34

by Peter Hurley

Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On 02/20/2014 08:59 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Feb 20, 2014 at 08:44:46PM -0500, Peter Hurley wrote:
>>> +static void fw_device_workfn(struct work_struct *work)
>>> +{
>>> + struct fw_device *device = container_of(to_delayed_work(work),
>>> + struct fw_device, work);
>>
>> I think this needs an smp_rmb() here.
>
> The patch is equivalent transformation and the whole thing is
> guaranteed to have gone through pool->lock. No explicit rmb
> necessary.

The spin_unlock_irq(&pool->lock) only guarantees completion of
memory operations _before_ the unlock; memory operations which occur
_after_ the unlock may be speculated before the unlock.

IOW, unlock is not a memory barrier for operations that occur after.

>> IOW, the beginning of the work function should act like a barrier in
>> the same way that queue_work_on() (et. al.) already does.
>
> workqueue already has enough barriers; otherwise, the whole kernel
> would have crumbled long time ago.

See above.

Regards,
Peter Hurley

2014-02-21 02:13:47

by Tejun Heo

Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On Thu, Feb 20, 2014 at 09:07:27PM -0500, Peter Hurley wrote:
> On 02/20/2014 08:59 PM, Tejun Heo wrote:
> >Hello,
> >
> >On Thu, Feb 20, 2014 at 08:44:46PM -0500, Peter Hurley wrote:
> >>>+static void fw_device_workfn(struct work_struct *work)
> >>>+{
> >>>+ struct fw_device *device = container_of(to_delayed_work(work),
> >>>+ struct fw_device, work);
> >>
> >>I think this needs an smp_rmb() here.
> >
> >The patch is equivalent transformation and the whole thing is
> >guaranteed to have gone through pool->lock. No explicit rmb
> >necessary.
>
> The spin_unlock_irq(&pool->lock) only guarantees completion of
> memory operations _before_ the unlock; memory operations which occur
> _after_ the unlock may be speculated before the unlock.
>
> IOW, unlock is not a memory barrier for operations that occur after.

It's not just unlock. It's lock / unlock pair on the same lock from
both sides. Nothing can slip through that.

--
tejun

2014-02-21 05:13:24

by Peter Hurley

Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On 02/20/2014 09:13 PM, Tejun Heo wrote:
> On Thu, Feb 20, 2014 at 09:07:27PM -0500, Peter Hurley wrote:
>> On 02/20/2014 08:59 PM, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Thu, Feb 20, 2014 at 08:44:46PM -0500, Peter Hurley wrote:
>>>>> +static void fw_device_workfn(struct work_struct *work)
>>>>> +{
>>>>> + struct fw_device *device = container_of(to_delayed_work(work),
>>>>> + struct fw_device, work);
>>>>
>>>> I think this needs an smp_rmb() here.
>>>
>>> The patch is equivalent transformation and the whole thing is
>>> guaranteed to have gone through pool->lock. No explicit rmb
>>> necessary.
>>
>> The spin_unlock_irq(&pool->lock) only guarantees completion of
>> memory operations _before_ the unlock; memory operations which occur
>> _after_ the unlock may be speculated before the unlock.
>>
>> IOW, unlock is not a memory barrier for operations that occur after.
>
> It's not just unlock. It's lock / unlock pair on the same lock from
> both sides. Nothing can sip through that.

CPU 0 | CPU 1
|
INIT_WORK(fw_device_workfn) |
|
workfn = funcA |
queue_work_on() |
. | process_one_work()
. | ..
. | worker->current_func = work->func
. |
. | speculative load of workfn = funcA
. | .
workfn = funcB | .
queue_work_on() | .
local_irq_save() | .
test_and_set_bit() == 1 | .
| set_work_pool_and_clear_pending()
work is not queued | smp_wmb
funcB never runs | set_work_data()
| atomic_set()
| spin_unlock_irq()
|
| worker->current_func(work) @ fw_device_workfn
| workfn() @ funcA


The speculative load of workfn on CPU 1 is valid because no rmb will occur
between the load and the execution of workfn() on CPU 1.

Thus funcB will never execute because, in this circumstance, a second
worker is not queued (because PENDING had not yet been cleared).

Regards,
Peter Hurley

2014-02-21 09:37:09

by Jiri Kosina

Subject: Re: [PATCH 3/9] floppy: don't use PREPARE_[DELAYED_]WORK

On Thu, 20 Feb 2014, Tejun Heo wrote:

> PREPARE_[DELAYED_]WORK() are being phased out. They have few users
> and a nasty surprise in terms of reentrancy guarantee as workqueue
> considers work items to be different if they don't have the same work
> function.
>
> floppy has been multiplexing floppy_work and fd_timer with multiple
> work functions. Introduce floppy_work_workfn() and fd_timer_workfn()
> which invoke floppy_work_fn and fd_timer_fn respectively and always
> use the two functions as the work functions and update the users to
> set floppy_work_fn and fd_timer_fn instead of overriding work
> functions using PREPARE_[DELAYED_]WORK().
>
> It would probably be best to route this with other related updates
> through the workqueue tree.
>
> Lightly tested using qemu.
>
> Signed-off-by: Tejun Heo <[email protected]>

Acked-by: Jiri Kosina <[email protected]>

Thanks.

> ---
> drivers/block/floppy.c | 42 ++++++++++++++++++++++++++++--------------
> 1 file changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index 2023043..8f5565b 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -961,17 +961,31 @@ static void empty(void)
> {
> }
>
> -static DECLARE_WORK(floppy_work, NULL);
> +static void (*floppy_work_fn)(void);
> +
> +static void floppy_work_workfn(struct work_struct *work)
> +{
> + floppy_work_fn();
> +}
> +
> +static DECLARE_WORK(floppy_work, floppy_work_workfn);
>
> static void schedule_bh(void (*handler)(void))
> {
> WARN_ON(work_pending(&floppy_work));
>
> - PREPARE_WORK(&floppy_work, (work_func_t)handler);
> + floppy_work_fn = handler;
> queue_work(floppy_wq, &floppy_work);
> }
>
> -static DECLARE_DELAYED_WORK(fd_timer, NULL);
> +static void (*fd_timer_fn)(void) = NULL;
> +
> +static void fd_timer_workfn(struct work_struct *work)
> +{
> + fd_timer_fn();
> +}
> +
> +static DECLARE_DELAYED_WORK(fd_timer, fd_timer_workfn);
>
> static void cancel_activity(void)
> {
> @@ -982,7 +996,7 @@ static void cancel_activity(void)
>
> /* this function makes sure that the disk stays in the drive during the
> * transfer */
> -static void fd_watchdog(struct work_struct *arg)
> +static void fd_watchdog(void)
> {
> debug_dcl(DP->flags, "calling disk change from watchdog\n");
>
> @@ -993,7 +1007,7 @@ static void fd_watchdog(struct work_struct *arg)
> reset_fdc();
> } else {
> cancel_delayed_work(&fd_timer);
> - PREPARE_DELAYED_WORK(&fd_timer, fd_watchdog);
> + fd_timer_fn = fd_watchdog;
> queue_delayed_work(floppy_wq, &fd_timer, HZ / 10);
> }
> }
> @@ -1005,7 +1019,8 @@ static void main_command_interrupt(void)
> }
>
> /* waits for a delay (spinup or select) to pass */
> -static int fd_wait_for_completion(unsigned long expires, work_func_t function)
> +static int fd_wait_for_completion(unsigned long expires,
> + void (*function)(void))
> {
> if (FDCS->reset) {
> reset_fdc(); /* do the reset during sleep to win time
> @@ -1016,7 +1031,7 @@ static int fd_wait_for_completion(unsigned long expires, work_func_t function)
>
> if (time_before(jiffies, expires)) {
> cancel_delayed_work(&fd_timer);
> - PREPARE_DELAYED_WORK(&fd_timer, function);
> + fd_timer_fn = function;
> queue_delayed_work(floppy_wq, &fd_timer, expires - jiffies);
> return 1;
> }
> @@ -1334,8 +1349,7 @@ static int fdc_dtr(void)
> * Pause 5 msec to avoid trouble. (Needs to be 2 jiffies)
> */
> FDCS->dtr = raw_cmd->rate & 3;
> - return fd_wait_for_completion(jiffies + 2UL * HZ / 100,
> - (work_func_t)floppy_ready);
> + return fd_wait_for_completion(jiffies + 2UL * HZ / 100, floppy_ready);
> } /* fdc_dtr */
>
> static void tell_sector(void)
> @@ -1440,7 +1454,7 @@ static void setup_rw_floppy(void)
> int flags;
> int dflags;
> unsigned long ready_date;
> - work_func_t function;
> + void (*function)(void);
>
> flags = raw_cmd->flags;
> if (flags & (FD_RAW_READ | FD_RAW_WRITE))
> @@ -1454,9 +1468,9 @@ static void setup_rw_floppy(void)
> */
> if (time_after(ready_date, jiffies + DP->select_delay)) {
> ready_date -= DP->select_delay;
> - function = (work_func_t)floppy_start;
> + function = floppy_start;
> } else
> - function = (work_func_t)setup_rw_floppy;
> + function = setup_rw_floppy;
>
> /* wait until the floppy is spinning fast enough */
> if (fd_wait_for_completion(ready_date, function))
> @@ -1486,7 +1500,7 @@ static void setup_rw_floppy(void)
> inr = result();
> cont->interrupt();
> } else if (flags & FD_RAW_NEED_DISK)
> - fd_watchdog(NULL);
> + fd_watchdog();
> }
>
> static int blind_seek;
> @@ -1863,7 +1877,7 @@ static int start_motor(void (*function)(void))
>
> /* wait_for_completion also schedules reset if needed. */
> return fd_wait_for_completion(DRS->select_date + DP->select_delay,
> - (work_func_t)function);
> + function);
> }
>
> static void floppy_ready(void)
> --
> 1.8.5.3
>

--
Jiri Kosina
SUSE Labs

2014-02-21 10:03:10

by Tejun Heo

Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On Fri, Feb 21, 2014 at 12:13:16AM -0500, Peter Hurley wrote:
> CPU 0 | CPU 1
> |
> INIT_WORK(fw_device_workfn) |
> |
> workfn = funcA |
> queue_work_on() |
> . | process_one_work()
> . | ..
> . | worker->current_func = work->func
> . |
> . | speculative load of workfn = funcA
> . | .
> workfn = funcB | .
> queue_work_on() | .
> local_irq_save() | .
> test_and_set_bit() == 1 | .
> | set_work_pool_and_clear_pending()
> work is not queued | smp_wmb
> funcB never runs | set_work_data()
> | atomic_set()
> | spin_unlock_irq()
> |
> | worker->current_func(work) @ fw_device_workfn
> | workfn() @ funcA
>
>
> The speculative load of workfn on CPU 1 is valid because no rmb will occur
> between the load and the execution of workfn() on CPU 1.
>
> Thus funcB will never execute because, in this circumstance, a second
> worker is not queued (because PENDING had not yet been cleared).

There's no right or wrong execution. Executions of either funcA or
funcB are correct results. The only memory ordering guarantee
workqueue gives is that anything written before the work item is
queued will be visible when that instance starts executing. When a
work item is not queued, no ordering is guaranteed between the
queueing attempt and the execution of the existing instance. We can
add such guarantee, not sure how much it'd matter but it's not like
it's gonna cost a lot either.

This doesn't have much to do with the current series tho. In fact,
PREPARE_WORK can't ever be made to give such a guarantee. The function
pointer has to be fetched before clearing of PENDING.

--
tejun

2014-02-21 12:51:56

by Peter Hurley

Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On 02/21/2014 05:03 AM, Tejun Heo wrote:
> On Fri, Feb 21, 2014 at 12:13:16AM -0500, Peter Hurley wrote:
>> CPU 0 | CPU 1
>> |
>> INIT_WORK(fw_device_workfn) |
>> |
>> workfn = funcA |
>> queue_work_on() |
>> . | process_one_work()
>> . | ..
>> . | worker->current_func = work->func
>> . |
>> . | speculative load of workfn = funcA
>> . | .
>> workfn = funcB | .
>> queue_work_on() | .
>> local_irq_save() | .
>> test_and_set_bit() == 1 | .
>> | set_work_pool_and_clear_pending()
>> work is not queued | smp_wmb
>> funcB never runs | set_work_data()
>> | atomic_set()
>> | spin_unlock_irq()
>> |
>> | worker->current_func(work) @ fw_device_workfn
>> | workfn() @ funcA
>>
>>
>> The speculative load of workfn on CPU 1 is valid because no rmb will occur
>> between the load and the execution of workfn() on CPU 1.
>>
>> Thus funcB will never execute because, in this circumstance, a second
>> worker is not queued (because PENDING had not yet been cleared).
>
> There's no right or wrong execution. Executions of either funcA or
> funcB are correct results. The only memory ordering guarantee
> workqueue gives is that anything written before the work item is
> queued will be visible when that instance starts executing. When a
> work item is not queued, no ordering is guaranteed between the
> queueing attempt and the execution of the existing instance.

I think the vast majority of kernel code which uses the workqueue
assumes there is a memory ordering guarantee.

Meaning that if a work item is not queue-able then the previously
queued instance _has not yet started_ and so, by deduction, must be
able to see the newly written values.

Consider:

add something important to list to work on
queue work

or

update index in buffer indicating more data
queue work

Neither of these uses expect that the workqueue does not guarantee
that this latest data is acted upon.
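
In code, the first pattern looks something like this (hypothetical
names, sketch only):

	spin_lock(&dev->lock);
	list_add_tail(&req->node, &dev->pending);	/* publish new data */
	spin_unlock(&dev->lock);
	/* callers assume that even if this does not queue because the
	 * item is still PENDING, the instance that runs will see req */
	queue_work(dev->wq, &dev->work);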

Another way to look at this problem is that process_one_work()
doesn't become the existing instance _until_ PENDING is cleared.

> We can
> add such guarantee, not sure how much it'd matter but it's not like
> it's gonna cost a lot either.
>
> This doesn't have much to do with the current series tho. In fact,
> PREPARE_WORK can't ever be made to give such guarantee.

Yes, I agree that PREPARE_DELAYED_WORK was also broken usage with the
same problem. [And there are other bugs in that firewire device work
code which I'm working on.]

> The function pointer has to fetched before clearing of PENDING.

Why?

As long as the load takes place within the pool->lock, I don't think
it matters (especially now PREPARE_WORK is removed).

Regards,
Peter Hurley

2014-02-21 13:06:31

by Tejun Heo

Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

Hello,

On Fri, Feb 21, 2014 at 07:51:48AM -0500, Peter Hurley wrote:
> I think the vast majority of kernel code which uses the workqueue
> assumes there is a memory ordering guarantee.

Not really. Workqueues haven't even guaranteed non-reentrancy until
recently, forcing everybody to lock explicitly in the work function.
I don't think there'd be many, if any, use cases which depend on
ordering guarantee on duplicate queueing.

> Another way to look at this problem is that process_one_work()
> doesn't become the existing instance _until_ PENDING is cleared.

Sure, having that guarantee definitely is nicer and all we need seems
to be mb_after_unlock in the execution path. Please feel free to
submit a patch.

> >add such guarantee, not sure how much it'd matter but it's not like
> >it's gonna cost a lot either.
> >
> >This doesn't have much to do with the current series tho. In fact,
> >PREPARE_WORK can't ever be made to give such guarantee.
>
> Yes, I agree that PREPARE_DELAYED_WORK was also broken usage with the
> same problem. [And there are other bugs in that firewire device work
> code which I'm working on.]
>
> >The function pointer has to fetched before clearing of PENDING.
>
> Why?
>
> As long as the load takes place within the pool->lock, I don't think
> it matters (especially now PREPARE_WORK is removed).

Hmmm... I was talking about PREPARE_WORK(). Clearing PENDING means
that the work item is released from the worker context and may be
freed or reused at any time (hmm... this may not be true anymore as
non-syncing variants of cancel_work are gone), so clearing PENDING
should be the last access to the work item and thus we can't use that
as the barrier event for fetching its work function.
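
(For reference, the relevant ordering inside process_one_work(), as in
the timeline quoted earlier in the thread, simplified:)

	worker->current_func = work->func;	/* func fetched while item is owned */
	set_work_pool_and_clear_pending(work, pool->id);
						/* last access; item may be freed/reused */
	spin_unlock_irq(&pool->lock);
	worker->current_func(work);		/* run the previously fetched function */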

Thanks.

--
tejun

2014-02-21 15:06:08

by Alan Stern

Subject: Re: [PATCH 5/9] usb: don't use PREPARE_DELAYED_WORK

On Thu, 20 Feb 2014, Greg Kroah-Hartman wrote:

> On Thu, Feb 20, 2014 at 03:44:27PM -0500, Tejun Heo wrote:
> > PREPARE_[DELAYED_]WORK() are being phased out. They have few users
> > and a nasty surprise in terms of reentrancy guarantee as workqueue
> > considers work items to be different if they don't have the same work
> > function.
> >
> > usb_hub->init_work is multiplexed with multiple work functions.
> > Introduce hub_init_workfn() which invokes usb_hub->init_workfn and
> > always use it as the work function and update the users to set the
> > ->init_workfn field instead of overriding the work function using
> > PREPARE_DELAYED_WORK().
> >
> > It looks like that the work items are never queued while in-flight, so

They aren't. But one work item does get queued by the previous one's
work function.

> > simply using INIT_DELAYED_WORK() before each queueing could be enough.
> > This patch performs equivalent conversion just in case but we probably
> > wanna clean it up later if that's the case.
>
> I think it should be fine to use INIT_DELAYED_WORK(), but Alan would
> know best. Alan?

That's right; INIT_DELAYED_WORK() should be fine. Provided there's no
problem doing it from within the previous work routine.

Alan Stern

2014-02-21 15:07:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 5/9] usb: don't use PREPARE_DELAYED_WORK

On Fri, Feb 21, 2014 at 10:06:05AM -0500, Alan Stern wrote:
> > I think it should be fine to use INIT_DELAYED_WORK(), but Alan would
> > know best. Alan?
>
> That's right; INIT_DELAYED_WORK() should be fine. Provided there's no
> problem doing it from within the previous work routine.

Yeah, that's completely fine. Once a work item starts execution,
workqueue doesn't care about it at all - even freeing and recycling it
is fine.
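
So a sequence like the following is legitimate (a sketch with made-up
stage1/stage2 names, mirroring what the hub patch later in this thread
does from within hub_activate()):

static void stage2_fn(struct work_struct *work)
{
	/* second step of a multi-step init sequence */
}

static void stage1_fn(struct work_struct *work)
{
	struct delayed_work *dw = to_delayed_work(work);

	/* first step ... */

	/* Re-initializing and re-queueing from within the running work
	 * function is fine: once execution has started, the workqueue
	 * no longer tracks this item. */
	INIT_DELAYED_WORK(dw, stage2_fn);
	schedule_delayed_work(dw, msecs_to_jiffies(100));
}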

Thanks.

--
tejun

2014-02-21 15:13:32

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 8/9] staging/fwserial: don't use PREPARE_WORK

On 02/20/2014 03:44 PM, Tejun Heo wrote:
> PREPARE_[DELAYED_]WORK() are being phased out. They have few users
> and a nasty surprise in terms of reentrancy guarantee as workqueue
> considers work items to be different if they don't have the same work
> function.
>
> fwtty_peer->work is multiplexed with multiple work functions.
> Introduce fwserial_peer_workfn() which invokes fwtty_peer->workfn and
> always use it as the work function and update the users to set the
> ->workfn field instead of overriding the work function using
> PREPARE_WORK().
>
> It would probably be best to route this with other related updates
> through the workqueue tree.
>
> Compile tested.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Peter Hurley <[email protected]>

Acked-by: Peter Hurley <[email protected]>

2014-02-21 16:53:56

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

Hi Tejun,

On 02/21/2014 08:06 AM, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 21, 2014 at 07:51:48AM -0500, Peter Hurley wrote:
>> I think the vast majority of kernel code which uses the workqueue
>> assumes there is a memory ordering guarantee.
>
> Not really. Workqueues haven't even guaranteed non-reentrancy until
> recently, forcing everybody to lock explicitly in the work function.
> I don't think there'd be many, if any, use cases which depend on
> ordering guarantee on duplicate queueing.

I added some in 3.12 :)

>> Another way to look at this problem is that process_one_work()
>> doesn't become the existing instance _until_ PENDING is cleared.
>
> Sure, having that guarantee definitely is nicer and all we need seems
> to be mb_after_unlock in the execution path. Please feel free to
> submit a patch.

Ok, I can do that. But AFAIK it'll have to be an smp_rmb(); there is
no mb__after_unlock.

[ After thinking about it some, I don't think preventing speculative
writes before clearing PENDING is useful or necessary, so that's
why I'm suggesting only the rmb. ]

>>> add such guarantee, not sure how much it'd matter but it's not like
>>> it's gonna cost a lot either.
>>>
>>> This doesn't have much to do with the current series tho. In fact,
>>> PREPARE_WORK can't ever be made to give such guarantee.
>>
>> Yes, I agree that PREPARE_DELAYED_WORK was also broken usage with the
>> same problem. [And there are other bugs in that firewire device work
>> code which I'm working on.]
>>
>>> The function pointer has to fetched before clearing of PENDING.
>>
>> Why?
>>
>> As long as the load takes place within the pool->lock, I don't think
>> it matters (especially now PREPARE_WORK is removed).
>
> Hmmm... I was talking about PREPARE_WORK(). Clearing PENDING means
> that the work item is released from the worker context and may be
> freed or reused at any time (hmm... this may not be true anymore as
> non-syncing variants of cancel_work are gone), so clearing PENDING
> should be the last access to the work item and thus we can't use that
> as the barrier event for fetching its work function.

Yeah, it seems like the work item lifetime is at least guaranteed
while either PENDING is set _or_ while the pool->lock is held
after PENDING is cleared.

Regards,
Peter Hurley

2014-02-21 16:57:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

Yo,

On Fri, Feb 21, 2014 at 11:53:46AM -0500, Peter Hurley wrote:
> Ok, I can do that. But AFAIK it'll have to be an smp_rmb(); there is
> no mb__after unlock.

We do have smp_mb__after_unlock_lock().

> [ After thinking about it some, I don't think preventing speculative
> writes before clearing PENDING if useful or necessary, so that's
> why I'm suggesting only the rmb. ]

But smp_mb__after_unlock_lock() would be cheaper on most popular
archs, I think.

Thanks.

--
tejun

2014-02-21 20:51:19

by Stefan Richter

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On Feb 20 Tejun Heo wrote:
> PREPARE_[DELAYED_]WORK() are being phased out. They have few users
> and a nasty surprise in terms of reentrancy guarantee as workqueue
> considers work items to be different if they don't have the same work
> function.
>
> firewire core-device and sbp2 have been been multiplexing work items
> with multiple work functions. Introduce fw_device_workfn() and
> sbp2_lu_workfn() which invoke fw_device->workfn and
> sbp2_logical_unit->workfn respectively and always use the two
> functions as the work functions and update the users to set the
> ->workfn fields instead of overriding work functions using
> PREPARE_DELAYED_WORK().
>
> It would probably be best to route this with other related updates
> through the workqueue tree.
>
> Compile tested.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Stefan Richter <[email protected]>
> Cc: [email protected]

Acked-by: Stefan Richter <[email protected]>

And lightly runtime-tested as well.

This doesn't actually touch sbp-target; the following Cc's could be
dropped:

> Cc: Chris Boot <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> drivers/firewire/core-device.c | 22 +++++++++++++++-------
> drivers/firewire/sbp2.c | 17 +++++++++++++----
> include/linux/firewire.h | 1 +
> 3 files changed, 29 insertions(+), 11 deletions(-)

--
Stefan Richter
-=====-====- --=- =-=-=
http://arcgraph.de/sr/

2014-02-21 23:01:37

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On 02/21/2014 11:57 AM, Tejun Heo wrote:
> Yo,
>
> On Fri, Feb 21, 2014 at 11:53:46AM -0500, Peter Hurley wrote:
>> Ok, I can do that. But AFAIK it'll have to be an smp_rmb(); there is
>> no mb__after unlock.
>
> We do have smp_mb__after_unlock_lock().
>
>> [ After thinking about it some, I don't think preventing speculative
>> writes before clearing PENDING if useful or necessary, so that's
>> why I'm suggesting only the rmb. ]
>
> But smp_mb__after_unlock_lock() would be cheaper on most popular
> archs, I think.

smp_mb__after_unlock_lock() is only for ordering memory operations
between two spin-locked sections on either the same lock or by
the same task/cpu. Like:

i = 1
spin_unlock(lock1)
spin_lock(lock2)
smp_mb__after_unlock_lock()
j = 1

This guarantees that the store to j happens after the store to i.
Without it, a cpu can

spin_lock(lock2)
j = 1
i = 1
spin_unlock(lock1)

Regards,
Peter Hurley
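
Written against the in-kernel lock API, the example above becomes
(a sketch; lock1, lock2, i and j are placeholders):

static DEFINE_SPINLOCK(lock1);
static DEFINE_SPINLOCK(lock2);
static int i, j;

static void example(void)
{
	spin_lock(&lock1);
	i = 1;
	spin_unlock(&lock1);

	spin_lock(&lock2);
	smp_mb__after_unlock_lock();	/* order the store to i before the
					 * store to j, as seen by other CPUs */
	j = 1;
	spin_unlock(&lock2);
}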

2014-02-21 23:18:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On Fri, Feb 21, 2014 at 06:01:29PM -0500, Peter Hurley wrote:
> smp_mb__after_unlock_lock() is only for ordering memory operations
> between two spin-locked sections on either the same lock or by
> the same task/cpu. Like:
>
> i = 1
> spin_unlock(lock1)
> spin_lock(lock2)
> smp_mb__after_unlock_lock()
> j = 1
>
> This guarantees that the store to j happens after the store to i.
> Without it, a cpu can
>
> spin_lock(lock2)
> j = 1
> i = 1
> spin_unlock(lock1)

Hmmm? I'm pretty sure that's a full barrier. Local processor is
always in order (w.r.t. the compiler).

Thanks.

--
tejun

2014-02-21 23:19:24

by Geoff Levand

[permalink] [raw]
Subject: Re: [PATCH 2/9] ps3-vuart: don't use PREPARE_WORK

Hi,

On Thu, 2014-02-20 at 15:44 -0500, Tejun Heo wrote:
> ps3_vuart wasn't overriding the work item with multiple work functions
> but was using NULL for INIT_WORK() and then single PREPARE_WORK() to
> set the work function. We can simply invoke INIT_WORK() with the work
> function and remove the PREPARE_WORK() usage.
>
> It would probably be best to route this with other related updates
> through the workqueue tree.

I tested this on ps3 (DECR-1400) and it seems to work OK. Please queue
with your other workqueue changes. Thanks!

Acked-by: Geoff Levand <[email protected]>


2014-02-21 23:46:32

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On 02/21/2014 06:18 PM, Tejun Heo wrote:
> On Fri, Feb 21, 2014 at 06:01:29PM -0500, Peter Hurley wrote:
>> smp_mb__after_unlock_lock() is only for ordering memory operations
>> between two spin-locked sections on either the same lock or by
>> the same task/cpu. Like:
>>
>> i = 1
>> spin_unlock(lock1)
>> spin_lock(lock2)
>> smp_mb__after_unlock_lock()
>> j = 1
>>
>> This guarantees that the store to j happens after the store to i.
>> Without it, a cpu can
>>
>> spin_lock(lock2)
>> j = 1
>> i = 1
>> spin_unlock(lock1)
>
> Hmmm? I'm pretty sure that's a full barrier. Local processor is
> always in order (w.r.t. the compiler).

It's a long story but the short version is that
Documentation/memory-barriers.txt recently was overhauled to reflect
what cpus actually do and what the different archs actually
deliver.

Turns out that unlock + lock is not guaranteed by all archs to be
a full barrier. Thus the smp_mb__after_unlock_lock().

This is now all spelled out in memory-barriers.txt under the
sub-heading "IMPLICIT KERNEL MEMORY BARRIERS".

Regards,
Peter Hurley

2014-02-22 14:39:07

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

Hey,

On Fri, Feb 21, 2014 at 06:46:24PM -0500, Peter Hurley wrote:
> It's a long story but the short version is that
> Documentation/memory-barriers.txt recently was overhauled to reflect
> what cpus actually do and what the different archs actually
> deliver.
>
> Turns out that unlock + lock is not guaranteed by all archs to be
> a full barrier. Thus the smb_mb__after_unlock_lock().
>
> This is now all spelled out in memory-barriers.txt under the
> sub-heading "IMPLICIT KERNEL MEMORY BARRIERS".

So, that one is for unlock/lock sequence, not smp_mb__after_unlock().
Urgh... kinda dislike adding smp_rmb() there as it's one of the
barriers which translate to something weighty on most, if not all,
archs.

Thanks.

--
tejun

2014-02-22 14:49:02

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On 02/22/2014 09:38 AM, Tejun Heo wrote:
> Hey,
>
> On Fri, Feb 21, 2014 at 06:46:24PM -0500, Peter Hurley wrote:
>> It's a long story but the short version is that
>> Documentation/memory-barriers.txt recently was overhauled to reflect
>> what cpus actually do and what the different archs actually
>> deliver.
>>
>> Turns out that unlock + lock is not guaranteed by all archs to be
>> a full barrier. Thus the smb_mb__after_unlock_lock().
>>
>> This is now all spelled out in memory-barriers.txt under the
>> sub-heading "IMPLICIT KERNEL MEMORY BARRIERS".
>
> So, that one is for unlock/lock sequence, not smp_mb__after_unlock().
> Urgh... kinda dislike adding smp_rmb() there as it's one of the
> barriers which translate to something weighty on most, if not all,
> archs.

The ticket lock, which is used on x86 to implement spinlocks, has no
fence at all for the fast path (only the compiler barrier), so even on
x86 there has to be a real barrier.

IOW, there really isn't an arch that can get away without doing this
barrier because it follows an unlock.

Regards,
Peter Hurley

2014-02-22 15:00:05

by Tejun Heo

[permalink] [raw]
Subject: [PATCH v2 5/9] usb: don't use PREPARE_DELAYED_WORK

Hello,

If this is actually safe, let's do it from the get-go.

Thanks!
------- 8< -------
PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

usb_hub->init_work is multiplexed with multiple work functions;
however, the work item is never queued while in-flight, so we can
simply use INIT_DELAYED_WORK() before each queueing.

It would probably be best to route this with other related updates
through the workqueue tree.

Lightly tested.

v2: Greg and Alan confirm that the work item is never queued while
in-flight. Simply use INIT_DELAYED_WORK().

Signed-off-by: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: [email protected]
---
drivers/usb/core/hub.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1040,7 +1040,7 @@ static void hub_activate(struct usb_hub
*/
if (type == HUB_INIT) {
delay = hub_power_on(hub, false);
- PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func2);
+ INIT_DELAYED_WORK(&hub->init_work, hub_init_func2);
schedule_delayed_work(&hub->init_work,
msecs_to_jiffies(delay));

@@ -1194,7 +1194,7 @@ static void hub_activate(struct usb_hub

/* Don't do a long sleep inside a workqueue routine */
if (type == HUB_INIT2) {
- PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func3);
+ INIT_DELAYED_WORK(&hub->init_work, hub_init_func3);
schedule_delayed_work(&hub->init_work,
msecs_to_jiffies(delay));
return; /* Continues at init3: below */

2014-02-22 15:14:51

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] usb: don't use PREPARE_DELAYED_WORK

On Sat, 22 Feb 2014, Tejun Heo wrote:

> Hello,
>
> If this is actually safe, let's do it from the get-go.
>
> Thanks!
> ------- 8< -------
> PREPARE_[DELAYED_]WORK() are being phased out. They have few users
> and a nasty surprise in terms of reentrancy guarantee as workqueue
> considers work items to be different if they don't have the same work
> function.
>
> usb_hub->init_work is multiplexed with multiple work functions;
> however, the work item is never queued while in-flight, so we can
> simply use INIT_DELAYED_WORK() before each queueing.
>
> It would probably be best to route this with other related updates
> through the workqueue tree.
>
> Lightly tested.
>
> v2: Greg and Alan confirm that the work item is never queued while
> in-flight. Simply use INIT_DELAYED_WORK().
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Alan Stern <[email protected]>
> Cc: [email protected]
> ---
> drivers/usb/core/hub.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -1040,7 +1040,7 @@ static void hub_activate(struct usb_hub
> */
> if (type == HUB_INIT) {
> delay = hub_power_on(hub, false);
> - PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func2);
> + INIT_DELAYED_WORK(&hub->init_work, hub_init_func2);
> schedule_delayed_work(&hub->init_work,
> msecs_to_jiffies(delay));
>
> @@ -1194,7 +1194,7 @@ static void hub_activate(struct usb_hub
>
> /* Don't do a long sleep inside a workqueue routine */
> if (type == HUB_INIT2) {
> - PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func3);
> + INIT_DELAYED_WORK(&hub->init_work, hub_init_func3);
> schedule_delayed_work(&hub->init_work,
> msecs_to_jiffies(delay));
> return; /* Continues at init3: below */
>

This should work okay. But while you're making these changes, you
should remove the INIT_DELAYED_WORK(&hub->init_work, NULL) call in
hub_probe(). It is now unnecessary.

Is the cancel_delayed_work_sync(&hub->init_work) call in hub_quiesce()
going to get confused by all this?

It's worth mentioning that the only reason for the hub_init_func3 stuff
is, as the comment says, to avoid a long sleep (100 ms) inside a work
routine. With all the changes to the work queue infrastructure, maybe
this doesn't matter so much any more. If we got rid of it then there
wouldn't be any multiplexing, and this whole issue would become moot.

Alan Stern

2014-02-22 15:20:46

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] usb: don't use PREPARE_DELAYED_WORK

On 02/22/2014 10:14 AM, Alan Stern wrote:
> On Sat, 22 Feb 2014, Tejun Heo wrote:
>
>> Hello,
>>
>> If this is actually safe, let's do it from the get-go.
>>
>> Thanks!
>> ------- 8< -------
>> PREPARE_[DELAYED_]WORK() are being phased out. They have few users
>> and a nasty surprise in terms of reentrancy guarantee as workqueue
>> considers work items to be different if they don't have the same work
>> function.
>>
>> usb_hub->init_work is multiplexed with multiple work functions;
>> however, the work item is never queued while in-flight, so we can
>> simply use INIT_DELAYED_WORK() before each queueing.
>>
>> It would probably be best to route this with other related updates
>> through the workqueue tree.
>>
>> Lightly tested.
>>
>> v2: Greg and Alan confirm that the work item is never queued while
>> in-flight. Simply use INIT_DELAYED_WORK().
>>
>> Signed-off-by: Tejun Heo <[email protected]>
>> Cc: Greg Kroah-Hartman <[email protected]>
>> Cc: Alan Stern <[email protected]>
>> Cc: [email protected]
>> ---
>> drivers/usb/core/hub.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> --- a/drivers/usb/core/hub.c
>> +++ b/drivers/usb/core/hub.c
>> @@ -1040,7 +1040,7 @@ static void hub_activate(struct usb_hub
>> */
>> if (type == HUB_INIT) {
>> delay = hub_power_on(hub, false);
>> - PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func2);
>> + INIT_DELAYED_WORK(&hub->init_work, hub_init_func2);
>> schedule_delayed_work(&hub->init_work,
>> msecs_to_jiffies(delay));
>>
>> @@ -1194,7 +1194,7 @@ static void hub_activate(struct usb_hub
>>
>> /* Don't do a long sleep inside a workqueue routine */
>> if (type == HUB_INIT2) {
>> - PREPARE_DELAYED_WORK(&hub->init_work, hub_init_func3);
>> + INIT_DELAYED_WORK(&hub->init_work, hub_init_func3);
>> schedule_delayed_work(&hub->init_work,
>> msecs_to_jiffies(delay));
>> return; /* Continues at init3: below */
>>
>
> This should work okay. But while you're making these changes, you
> should remove the INIT_DELAYED_WORK(&hub->init_work, NULL) call in
> hub_probe(). It is now unnecessary.
>
> Is the cancel_delayed_work_sync(&hub->init_work) call in hub_quiesce()
> going to get confused by all this?
>
> It's worth mentioning that the only reason for the hub_init_func3 stuff
> is, as the comment says, to avoid a long sleep (100 ms) inside a work
> routine.

If a running hub init does not need to be single-threaded wrt
a different running hub init, then a single init work could be queued to
the system_unbound_wq which doesn't care about running times.
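
Purely illustrative (hub_init_workfn here is a hypothetical single work
function that runs the whole sequence):

INIT_DELAYED_WORK(&hub->init_work, hub_init_workfn);
queue_delayed_work(system_unbound_wq, &hub->init_work,
		   msecs_to_jiffies(delay));

The handler could then simply msleep() between the stages instead of
re-queueing itself with a different work function.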

> With all the changes to the work queue infrastructure, maybe
> this doesn't matter so much any more. If we got rid of it then there
> wouldn't be any multiplexing, and this whole issue would become moot.

2014-02-22 15:38:09

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] usb: don't use PREPARE_DELAYED_WORK

On Sat, Feb 22, 2014 at 10:14:48AM -0500, Alan Stern wrote:
> Is the cancel_delayed_work_sync(&hub->init_work) call in hub_quiesce()
> going to get confused by all this?

Yeah, you can't cancel a work item which hasn't been initialized.
Maybe move init of the first work function there? I don't think it
really matters tho.

> It's worth mentioning that the only reason for the hub_init_func3 stuff
> is, as the comment says, to avoid a long sleep (100 ms) inside a work
> routine. With all the changes to the work queue infrastructure, maybe
> this doesn't matter so much any more. If we got rid of it then there
> wouldn't be any multiplexing, and this whole issue would become moot.

I don't really think that'd be necessary. Just sleeping synchronously
should be fine. How many threads are we talking about?

Thanks.

--
tejun

2014-02-22 18:43:33

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK


On Fri, 2014-02-21 at 18:01 -0500, Peter Hurley wrote:
> On 02/21/2014 11:57 AM, Tejun Heo wrote:
> > Yo,
> >
> > On Fri, Feb 21, 2014 at 11:53:46AM -0500, Peter Hurley wrote:
> >> Ok, I can do that. But AFAIK it'll have to be an smp_rmb(); there is
> >> no mb__after unlock.
> >
> > We do have smp_mb__after_unlock_lock().
> >
> >> [ After thinking about it some, I don't think preventing speculative
> >> writes before clearing PENDING if useful or necessary, so that's
> >> why I'm suggesting only the rmb. ]
> >
> > But smp_mb__after_unlock_lock() would be cheaper on most popular
> > archs, I think.
>
> smp_mb__after_unlock_lock() is only for ordering memory operations
> between two spin-locked sections on either the same lock or by
> the same task/cpu. Like:
>
> i = 1
> spin_unlock(lock1)
> spin_lock(lock2)
> smp_mb__after_unlock_lock()
> j = 1
>
> This guarantees that the store to j happens after the store to i.
> Without it, a cpu can
>
> spin_lock(lock2)
> j = 1
> i = 1
> spin_unlock(lock1)

No the CPU cannot. If the CPU were allowed to reorder locking
sequences, we'd get speculation induced ABBA deadlocks. The rules are
quite simple: loads and stores cannot speculate out of critical
sections. All architectures have barriers in place to prevent this ...
I know from personal experience because the barriers on PARISC were
originally too weak and we did get some speculation out of the critical
sections, which was very nasty to debug.

Stuff may speculate into critical sections from non-critical but never
out of them and critical section boundaries may not reorder to cause an
overlap.

James


2014-02-22 18:48:09

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On 02/22/2014 01:43 PM, James Bottomley wrote:
>
> On Fri, 2014-02-21 at 18:01 -0500, Peter Hurley wrote:
>> On 02/21/2014 11:57 AM, Tejun Heo wrote:
>>> Yo,
>>>
>>> On Fri, Feb 21, 2014 at 11:53:46AM -0500, Peter Hurley wrote:
>>>> Ok, I can do that. But AFAIK it'll have to be an smp_rmb(); there is
>>>> no mb__after unlock.
>>>
>>> We do have smp_mb__after_unlock_lock().
>>>
>>>> [ After thinking about it some, I don't think preventing speculative
>>>> writes before clearing PENDING if useful or necessary, so that's
>>>> why I'm suggesting only the rmb. ]
>>>
>>> But smp_mb__after_unlock_lock() would be cheaper on most popular
>>> archs, I think.
>>
>> smp_mb__after_unlock_lock() is only for ordering memory operations
>> between two spin-locked sections on either the same lock or by
>> the same task/cpu. Like:
>>
>> i = 1
>> spin_unlock(lock1)
>> spin_lock(lock2)
>> smp_mb__after_unlock_lock()
>> j = 1
>>
>> This guarantees that the store to j happens after the store to i.
>> Without it, a cpu can
>>
>> spin_lock(lock2)
>> j = 1
>> i = 1
>> spin_unlock(lock1)
>
> No the CPU cannot. If the CPU were allowed to reorder locking
> sequences, we'd get speculation induced ABBA deadlocks. The rules are
> quite simple: loads and stores cannot speculate out of critical
> sections.

If you look carefully, you'll notice that the stores have not been
moved from their respective critical sections; simply that the two
critical sections overlap because they use different locks.

Regards,
Peter Hurley

PS - Your reply address is unroutable.

> All architectures have barriers in place to prevent this ...
> I know from personal experience because the barriers on PARISC were
> originally too weak and we did get some speculation out of the critical
> sections, which was very nasty to debug.
>
> Stuff may speculate into critical sections from non-critical but never
> out of them and critical section boundaries may not reorder to cause an
> overlap.

2014-02-22 18:52:22

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On Sat, 2014-02-22 at 13:48 -0500, Peter Hurley wrote:
> On 02/22/2014 01:43 PM, James Bottomley wrote:
> >
> > On Fri, 2014-02-21 at 18:01 -0500, Peter Hurley wrote:
> >> On 02/21/2014 11:57 AM, Tejun Heo wrote:
> >>> Yo,
> >>>
> >>> On Fri, Feb 21, 2014 at 11:53:46AM -0500, Peter Hurley wrote:
> >>>> Ok, I can do that. But AFAIK it'll have to be an smp_rmb(); there is
> >>>> no mb__after unlock.
> >>>
> >>> We do have smp_mb__after_unlock_lock().
> >>>
> >>>> [ After thinking about it some, I don't think preventing speculative
> >>>> writes before clearing PENDING if useful or necessary, so that's
> >>>> why I'm suggesting only the rmb. ]
> >>>
> >>> But smp_mb__after_unlock_lock() would be cheaper on most popular
> >>> archs, I think.
> >>
> >> smp_mb__after_unlock_lock() is only for ordering memory operations
> >> between two spin-locked sections on either the same lock or by
> >> the same task/cpu. Like:
> >>
> >> i = 1
> >> spin_unlock(lock1)
> >> spin_lock(lock2)
> >> smp_mb__after_unlock_lock()
> >> j = 1
> >>
> >> This guarantees that the store to j happens after the store to i.
> >> Without it, a cpu can
> >>
> >> spin_lock(lock2)
> >> j = 1
> >> i = 1
> >> spin_unlock(lock1)
> >
> > No the CPU cannot. If the CPU were allowed to reorder locking
> > sequences, we'd get speculation induced ABBA deadlocks. The rules are
> > quite simple: loads and stores cannot speculate out of critical
> > sections.
>
> If you look carefully, you'll notice that the stores have not been
> moved from their respective critical sections; simply that the two
> critical sections overlap because they use different locks.

You didn't look carefully enough at what I wrote. You may not reorder
critical sections so they overlap regardless of whether the locks are
independent or not. This is because we'd get ABBA deadlocks due to
speculation (A represents lock1 and B lock 2 in your example).

James

2014-02-22 19:03:57

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On 02/22/2014 01:52 PM, James Bottomley wrote:
> On Sat, 2014-02-22 at 13:48 -0500, Peter Hurley wrote:
>> On 02/22/2014 01:43 PM, James Bottomley wrote:
>>>
>>> On Fri, 2014-02-21 at 18:01 -0500, Peter Hurley wrote:
>>>> On 02/21/2014 11:57 AM, Tejun Heo wrote:
>>>>> Yo,
>>>>>
>>>>> On Fri, Feb 21, 2014 at 11:53:46AM -0500, Peter Hurley wrote:
>>>>>> Ok, I can do that. But AFAIK it'll have to be an smp_rmb(); there is
>>>>>> no mb__after unlock.
>>>>>
>>>>> We do have smp_mb__after_unlock_lock().
>>>>>
>>>>>> [ After thinking about it some, I don't think preventing speculative
>>>>>> writes before clearing PENDING if useful or necessary, so that's
>>>>>> why I'm suggesting only the rmb. ]
>>>>>
>>>>> But smp_mb__after_unlock_lock() would be cheaper on most popular
>>>>> archs, I think.
>>>>
>>>> smp_mb__after_unlock_lock() is only for ordering memory operations
>>>> between two spin-locked sections on either the same lock or by
>>>> the same task/cpu. Like:
>>>>
>>>> i = 1
>>>> spin_unlock(lock1)
>>>> spin_lock(lock2)
>>>> smp_mb__after_unlock_lock()
>>>> j = 1
>>>>
>>>> This guarantees that the store to j happens after the store to i.
>>>> Without it, a cpu can
>>>>
>>>> spin_lock(lock2)
>>>> j = 1
>>>> i = 1
>>>> spin_unlock(lock1)
>>>
>>> No the CPU cannot. If the CPU were allowed to reorder locking
>>> sequences, we'd get speculation induced ABBA deadlocks. The rules are
>>> quite simple: loads and stores cannot speculate out of critical
>>> sections.
>>
>> If you look carefully, you'll notice that the stores have not been
>> moved from their respective critical sections; simply that the two
>> critical sections overlap because they use different locks.
>
> You didn't look carefully enough at what I wrote. You may not reorder
> critical sections so they overlap regardless of whether the locks are
> independent or not. This is because we'd get ABBA deadlocks due to
> speculation (A represents lock1 and B lock 2 in your example).

On 11/26/2013 02:00 PM, Linus Torvalds wrote:
> On Tue, Nov 26, 2013 at 1:59 AM, Peter Zijlstra <[email protected]> wrote:
>>
>> If you now want to weaken this definition, then that needs consideration
>> because we actually rely on things like
>>
>> spin_unlock(l1);
>> spin_lock(l2);
>>
>> being full barriers.
>
> Btw, maybe we should just stop that assumption. The complexity of this
> discussion makes me go "maybe we should stop with subtle assumptions
> that happen to be obviously true on x86 due to historical
> implementations, but aren't obviously true even *there* any more with
> the MCS lock".
>
> We already have a concept of
>
> smp_mb__before_spinlock();
> spin_lock():
>
> for sequences where we *know* we need to make getting a spin-lock be a
> full memory barrier. It's free on x86 (and remains so even with the
> MCS lock, regardless of any subtle issues, if only because even the
> MCS lock starts out with a locked atomic, never mind the contention
> slow-case). Of course, that macro is only used inside the scheduler,
> and is actually documented to not really be a full memory barrier, but
> it handles the case we actually care about.
>
> IOW, where do we really care about the "unlock+lock" is a memory
> barrier? And could we make those places explicit, and then do
> something similar to the above to them?
>
> Linus
>

http://www.spinics.net/lists/linux-mm/msg65653.html

and the resultant text from Documentation/memory-barriers.txt

An ACQUIRE followed by a RELEASE may not be assumed to be full memory barrier
because it is possible for an access preceding the ACQUIRE to happen after the
ACQUIRE, and an access following the RELEASE to happen before the RELEASE, and
the two accesses can themselves then cross:

*A = a;
ACQUIRE M
RELEASE M
*B = b;

may occur as:

ACQUIRE M, STORE *B, STORE *A, RELEASE M

This same reordering can of course occur if the lock's ACQUIRE and RELEASE are
to the same lock variable, but only from the perspective of another CPU not
holding that lock.

In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
memory barrier because it is possible for a preceding RELEASE to pass a
later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
of the compiler. Note that deadlocks cannot be introduced by this
interchange because if such a deadlock threatened, the RELEASE would
simply complete.

If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This
will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
same variable. The smp_mb__after_unlock_lock() primitive is free on many
architectures. Without smp_mb__after_unlock_lock(), the critical sections
corresponding to the RELEASE and the ACQUIRE can cross:

*A = a;
RELEASE M
ACQUIRE N
*B = b;

could occur as:

ACQUIRE N, STORE *B, STORE *A, RELEASE M

With smp_mb__after_unlock_lock(), they cannot, so that:

*A = a;
RELEASE M
ACQUIRE N
smp_mb__after_unlock_lock();
*B = b;

will always occur as either of the following:

STORE *A, RELEASE, ACQUIRE, STORE *B
STORE *A, ACQUIRE, RELEASE, STORE *B

If the RELEASE and ACQUIRE were instead both operating on the same lock
variable, only the first of these two alternatives can occur.

Regards,
Peter Hurley

2014-02-22 23:03:08

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] usb: don't use PREPARE_DELAYED_WORK

On Sat, 22 Feb 2014, Tejun Heo wrote:

> On Sat, Feb 22, 2014 at 10:14:48AM -0500, Alan Stern wrote:
> > Is the cancel_delayed_work_sync(&hub->init_work) call in hub_quiesce()
> > going to get confused by all this?
>
> Yeah, you can't cancel a work item which hasn't been initialzed.
> Maybe move init of the first work function there? I don't think it
> really matters tho.
>
> > It's worth mentioning that the only reason for the hub_init_func3 stuff
> > is, as the comment says, to avoid a long sleep (100 ms) inside a work
> > routine. With all the changes to the work queue infrastructure, maybe
> > this doesn't matter so much any more. If we got rid of it then there
> > wouldn't be any multiplexing, and this whole issue would become moot.
>
> I don't really think that'd be necessary. Just sleeping synchronously
> should be fine. How many threads are we talking about?

One thread per hub (no more than 10 on a typical system). The code in
question is part of the hub driver's probe sequence.


On Sat, 22 Feb 2014, Peter Hurley wrote:

> If a running hub init does not need to be single-threaded wrt
> a different running hub init,

I'm not quite sure what that means, but the hub init threads are indeed
independent of each other.

> then a single init work could be queued to
> the system_unbound_wq which doesn't care about running times.

This sort of thing sounds like the best approach. Tejun, do you want
to rewrite the patch, getting rid of the hub_init_func3 and HUB_INIT3
business entirely? Or would you like me to do it?

Alan Stern

2014-02-23 01:29:00

by Stefan Richter

[permalink] [raw]
Subject: memory-barriers.txt again (was Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK)

Hi Paul,

in patch "Documentation/memory-barriers.txt: Downgrade UNLOCK+BLOCK" (sic),
you wrote:
+ Memory operations issued before the LOCK may be completed after the
+ LOCK operation has completed. An smp_mb__before_spinlock(), combined
+ with a following LOCK, orders prior loads against subsequent stores
+ and stores and prior stores against subsequent stores. Note that

Is there one "and stores" too many? Or was one "stores" mistyped and meant
to be something else? Or what else is meant?

@@ -1677,13 +1681,57 @@ LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
two accesses can themselves then cross:

*A = a;
- LOCK
- UNLOCK
+ LOCK M
+ UNLOCK M
*B = b;

may occur as:

- LOCK, STORE *B, STORE *A, UNLOCK
+ LOCK M, STORE *B, STORE *A, UNLOCK M
+
+This same reordering can of course occur if the LOCK and UNLOCK are
+to the same lock variable, but only from the perspective of another
+CPU not holding that lock.

The example says "LOCK M" and "UNLOCK M" (since the patch). I read
this as LOCK and UNLOCK to the same variable, M. Why does the
following sentence then say that "this same reordering can... occur
if the LOCK and UNLOCK are to the same lock variable"? This sentence
would make sense if the example had been about LOCK M, UNLOCK N.

+In short, an UNLOCK followed by a LOCK may -not- be assumed to be a full
+memory barrier because it is possible for a preceding UNLOCK to pass a
+later LOCK from the viewpoint of the CPU, but not from the viewpoint
+of the compiler. Note that deadlocks cannot be introduced by this
+interchange because if such a deadlock threatened, the UNLOCK would
+simply complete.

So rather than deadlock, "the UNLOCK would simply complete". But
/why/ does it complete? It is left unclear (to me at least), why
it would do so. IOW, what mechanism will make it always proceed
to the UNLOCK? Without knowing that, it is left entirely unclear
(to me) why the deadlock wouldn't happen.
--
Stefan Richter
-=====-====- --=- =-===
http://arcgraph.de/sr/

2014-02-23 04:29:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] usb: don't use PREPARE_DELAYED_WORK

Hey, Alan.

On Sat, Feb 22, 2014 at 06:03:04PM -0500, Alan Stern wrote:
> > then a single init work could be queued to
> > the system_unbound_wq which doesn't care about running times.
>
> This sort of thing sounds like the best approach. Tejun, do you want
> to rewrite the patch, getting rid of the hub_init_func3 and HUB_INIT3
> business entirely? Or would you like me to do it?

Doing the minimal patch in this series and then following up with
the update is probably better. I can put the patches in a separate
branch so that it can be easily pulled.

Thanks.

--
tejun

2014-02-23 16:37:55

by Paul E. McKenney

[permalink] [raw]
Subject: Re: memory-barriers.txt again (was Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK)

On Sun, Feb 23, 2014 at 02:23:03AM +0100, Stefan Richter wrote:
> Hi Paul,
>
> in patch "Documentation/memory-barriers.txt: Downgrade UNLOCK+BLOCK" (sic),
> you wrote:
> + Memory operations issued before the LOCK may be completed after the
> + LOCK operation has completed. An smp_mb__before_spinlock(), combined
> + with a following LOCK, orders prior loads against subsequent stores
> + and stores and prior stores against subsequent stores. Note that
>
> Is there a "and stores" too many? Or was one "stores" mistyped and meant
> to be something else? Or what else is meant?

Good catch! (This should also answer Peter Hurley's concern on another
email thread.)

The last "stores" on the third line should be a "loads":

Memory operations issued before the ACQUIRE may be
completed after the ACQUIRE operation has completed. An
smp_mb__before_spinlock(), combined with a following ACQUIRE,
orders prior loads against subsequent loads and stores and
also orders prior stores against subsequent stores. Note that
this is weaker than smp_mb()! The smp_mb__before_spinlock()
primitive is free on many architectures.

> @@ -1677,13 +1681,57 @@ LOCK, and an access following the UNLOCK to happen before the UNLOCK, and the
> two accesses can themselves then cross:
>
> *A = a;
> - LOCK
> - UNLOCK
> + LOCK M
> + UNLOCK M
> *B = b;
>
> may occur as:
>
> - LOCK, STORE *B, STORE *A, UNLOCK
> + LOCK M, STORE *B, STORE *A, UNLOCK M
> +
> +This same reordering can of course occur if the LOCK and UNLOCK are
> +to the same lock variable, but only from the perspective of another
> +CPU not holding that lock.
>
> The example says "LOCK M" and "UNLOCK M" (since the patch). I read
> this as LOCK and UNLOCK to the same variable, M. Why does the
> following sentence then say that "this same reordering can... occur
> if the LOCK and UNLOCK are to the same lock variable"? This sentence
> would make sense if the example had been about LOCK M, UNLOCK N.

Good point. How about the following?

When the ACQUIRE and RELEASE are a lock acquisition and release,
respectively, this same reordering can of course occur if the
lock's ACQUIRE and RELEASE are to the same lock variable, but
only from the perspective of another CPU not holding that lock.

> +In short, an UNLOCK followed by a LOCK may -not- be assumed to be a full
> +memory barrier because it is possible for a preceding UNLOCK to pass a
> +later LOCK from the viewpoint of the CPU, but not from the viewpoint
> +of the compiler. Note that deadlocks cannot be introduced by this
> +interchange because if such a deadlock threatened, the UNLOCK would
> +simply complete.
>
> So rather than deadlock, "the UNLOCK would simply complete". But
> /why/ does it complete? It is left unclear (to me at least), why
> it would do so. IOW, what mechanism will make it always proceed
> to the UNLOCK? Without knowing that, it is left entirely unclear
> (to me) why the deadlock wouldn't happen.

One key point is that we are only talking about the CPU doing the
interchanging, not the compiler. If the compiler (or, for that matter,
the developer) switched the operations, deadlock -could- occur.

But suppose the CPU interchanged the operations. In this case, the
unlock precedes the lock in the assembly code. The CPU simply elected
to try executing the lock operation first. If there is a deadlock,
this lock operation will simply spin (or try to sleep, but more on
that later). The CPU will eventually execute the unlock operation
(which again preceded the lock operation in the assembly code), which
will unravel the potential deadlock.

But what if the lock is a sleeplock? In that case, the code will
try to enter the scheduler, where it will eventually encounter a
memory barrier, which will force the unlock operation to complete,
again unraveling the deadlock.

Please see below for a patch against the current version of
Documentation/memory-barriers.txt. Does this update help?

Thanx, Paul

------------------------------------------------------------------------

commit aba6b0e82c9de53eb032844f1932599f148ff68d
Author: Paul E. McKenney <[email protected]>
Date: Sun Feb 23 08:34:24 2014 -0800

Documentation/memory-barriers.txt: Clarify release/acquire ordering

This commit fixes a couple of typos and clarifies what happens when
the CPU chooses to execute a later lock acquisition before a prior
lock release, in particular, why deadlock is avoided.

Reported-by: Peter Hurley <[email protected]>
Reported-by: James Bottomley <[email protected]>
Reported-by: Stefan Richter <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 9dde54c55b24..c8932e06edf1 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1674,12 +1674,12 @@ for each construct. These operations all imply certain barriers:
Memory operations issued after the ACQUIRE will be completed after the
ACQUIRE operation has completed.

- Memory operations issued before the ACQUIRE may be completed after the
- ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined
- with a following ACQUIRE, orders prior loads against subsequent stores and
- stores and prior stores against subsequent stores. Note that this is
- weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on
- many architectures.
+ Memory operations issued before the ACQUIRE may be completed after
+ the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
+ combined with a following ACQUIRE, orders prior loads against
+ subsequent loads and stores and also orders prior stores against
+ subsequent stores. Note that this is weaker than smp_mb()! The
+ smp_mb__before_spinlock() primitive is free on many architectures.

(2) RELEASE operation implication:

@@ -1717,23 +1717,47 @@ the two accesses can themselves then cross:

*A = a;
ACQUIRE M
- RELEASE M
+ RELEASE N
*B = b;

may occur as:

- ACQUIRE M, STORE *B, STORE *A, RELEASE M
-
-This same reordering can of course occur if the lock's ACQUIRE and RELEASE are
-to the same lock variable, but only from the perspective of another CPU not
-holding that lock.
-
-In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
-memory barrier because it is possible for a preceding RELEASE to pass a
-later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
-of the compiler. Note that deadlocks cannot be introduced by this
-interchange because if such a deadlock threatened, the RELEASE would
-simply complete.
+ ACQUIRE M, STORE *B, STORE *A, RELEASE N
+
+When the ACQUIRE and RELEASE are a lock acquisition and release,
+respectively, this same reordering can of course occur if the lock's
+ACQUIRE and RELEASE are to the same lock variable, but only from the
+perspective of another CPU not holding that lock.
+
+In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be
+a full memory barrier because it is possible for a preceding RELEASE
+to pass a later ACQUIRE from the viewpoint of the CPU, but not from the
+viewpoint of the compiler. Note that the CPU cannot introduce deadlocks
+with this interchange because if such a deadlock threatened, the RELEASE
+would simply complete.
+
+ Why does this work?
+
+ One key point is that we are only talking about the CPU doing
+ the interchanging, not the compiler. If the compiler (or, for
+ that matter, the developer) switched the operations, deadlock
+ -could- occur.
+
+ But suppose the CPU interchanged the operations. In this case,
+ the unlock precedes the lock in the assembly code. The CPU simply
+ elected to try executing the later lock operation first. If there
+ is a deadlock, this lock operation will simply spin (or try to
+ sleep, but more on that later). The CPU will eventually execute
+ the unlock operation (which again preceded the lock operation
+ in the assembly code), which will unravel the potential deadlock,
+ allowing the lock operation to succeed.
+
+ But what if the lock is a sleeplock? In that case, the code will
+ try to enter the scheduler, where it will eventually encounter
+ a memory barrier, which will force the earlier unlock operation
+ to complete, again unraveling the deadlock. There might be
+ a sleep-unlock race, but the locking primitive needs to resolve
+ such races properly in any case.

If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This

2014-02-23 20:06:06

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

On Sat, 2014-02-22 at 14:03 -0500, Peter Hurley wrote:
> If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
> ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This
> will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
> executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
> same variable. The smp_mb__after_unlock_lock() primitive is free on many
> architectures. Without smp_mb__after_unlock_lock(), the critical sections
> corresponding to the RELEASE and the ACQUIRE can cross:
>
> *A = a;
> RELEASE M
> ACQUIRE N
> *B = b;
>
> could occur as:
>
> ACQUIRE N, STORE *B, STORE *A, RELEASE M

Ah, OK, that's an error in the documentation. The example should read

*A = a;
RELEASE *N*
ACQUIRE *M*
*B = b;

The point being you can't have speculation that entangles critical
sections, as I've been saying, because that would speculate you into
ABBA deadlocks. Paul McKenney will submit a patch fixing the bug in
the documentation.

James

2014-02-23 20:35:44

by Peter Hurley

[permalink] [raw]
Subject: Re: memory-barriers.txt again (was Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK)

Hi Paul,

On 02/23/2014 11:37 AM, Paul E. McKenney wrote:
> commit aba6b0e82c9de53eb032844f1932599f148ff68d
> Author: Paul E. McKenney <[email protected]>
> Date: Sun Feb 23 08:34:24 2014 -0800
>
> Documentation/memory-barriers.txt: Clarify release/acquire ordering
>
> This commit fixes a couple of typos and clarifies what happens when
> the CPU chooses to execute a later lock acquisition before a prior
> lock release, in particular, why deadlock is avoided.
>
> Reported-by: Peter Hurley <[email protected]>
> Reported-by: James Bottomley <[email protected]>
> Reported-by: Stefan Richter <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 9dde54c55b24..c8932e06edf1 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1674,12 +1674,12 @@ for each construct. These operations all imply certain barriers:
> Memory operations issued after the ACQUIRE will be completed after the
> ACQUIRE operation has completed.
>
> - Memory operations issued before the ACQUIRE may be completed after the
> - ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined
> - with a following ACQUIRE, orders prior loads against subsequent stores and
> - stores and prior stores against subsequent stores. Note that this is
> - weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on
> - many architectures.
> + Memory operations issued before the ACQUIRE may be completed after
> + the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
> + combined with a following ACQUIRE, orders prior loads against
> + subsequent loads and stores and also orders prior stores against
> + subsequent stores. Note that this is weaker than smp_mb()! The
> + smp_mb__before_spinlock() primitive is free on many architectures.
>
> (2) RELEASE operation implication:
>
> @@ -1717,23 +1717,47 @@ the two accesses can themselves then cross:
>
> *A = a;
> ACQUIRE M
> - RELEASE M
> + RELEASE N
> *B = b;
>
> may occur as:
>
> - ACQUIRE M, STORE *B, STORE *A, RELEASE M

This example should remain as is; it refers to the porosity of a critical
section to loads and stores occurring outside that critical section, and
importantly that LOCK + UNLOCK is not a full barrier. It documents that
memory operations from either side of the critical section may cross
(in the absence of other specific memory barriers). IOW, it is the example
to implication #1 above.

Regards,
Peter Hurley

2014-02-23 22:32:31

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK

Hi James,

On 02/23/2014 03:05 PM, James Bottomley wrote:
> On Sat, 2014-02-22 at 14:03 -0500, Peter Hurley wrote:
>> If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
>> ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This
>> will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
>> executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
>> same variable. The smp_mb__after_unlock_lock() primitive is free on many
>> architectures. Without smp_mb__after_unlock_lock(), the critical sections
>> corresponding to the RELEASE and the ACQUIRE can cross:
>>
>> *A = a;
>> RELEASE M
>> ACQUIRE N
>> *B = b;
>>
>> could occur as:
>>
>> ACQUIRE N, STORE *B, STORE *A, RELEASE M
>
> Ah, OK, that's an error in the documentation.

AFAIK, Paul will not be changing the quoted text above.

> The example should read
>
> *A = a;
> RELEASE *N*
> ACQUIRE *M*
> *B = b;
>
> The point being you can't have speculation that entangles critical
> sections, as I've been saying, because that would speculate you into
> ABBA deadlocks. Paul McKenny will submit a patch fixing the bug in
> documentation.

The reason why there is no deadlock here is because the RELEASE M is
not dependent on the ACQUIRE N to complete.

If the attempt to ACQUIRE N is speculated before the RELEASE M, two
possibilities exist:
1. N is not owned, so the ACQUIRE is immediately successful, or
2. N is owned, so the attempted ACQUIRE is not immediately successful.

However, in both cases the RELEASE M will still complete, having already
been started (since it occurred before in the instruction stream).

Regards,
Peter Hurley



2014-02-23 23:50:24

by Paul E. McKenney

[permalink] [raw]
Subject: Re: memory-barriers.txt again (was Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK)

On Sun, Feb 23, 2014 at 03:35:31PM -0500, Peter Hurley wrote:
> Hi Paul,
>
> On 02/23/2014 11:37 AM, Paul E. McKenney wrote:
> >commit aba6b0e82c9de53eb032844f1932599f148ff68d
> >Author: Paul E. McKenney <[email protected]>
> >Date: Sun Feb 23 08:34:24 2014 -0800
> >
> > Documentation/memory-barriers.txt: Clarify release/acquire ordering
> >
> > This commit fixes a couple of typos and clarifies what happens when
> > the CPU chooses to execute a later lock acquisition before a prior
> > lock release, in particular, why deadlock is avoided.
> >
> > Reported-by: Peter Hurley <[email protected]>
> > Reported-by: James Bottomley <[email protected]>
> > Reported-by: Stefan Richter <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> >diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> >index 9dde54c55b24..c8932e06edf1 100644
> >--- a/Documentation/memory-barriers.txt
> >+++ b/Documentation/memory-barriers.txt
> >@@ -1674,12 +1674,12 @@ for each construct. These operations all imply certain barriers:
> > Memory operations issued after the ACQUIRE will be completed after the
> > ACQUIRE operation has completed.
> >
> >- Memory operations issued before the ACQUIRE may be completed after the
> >- ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined
> >- with a following ACQUIRE, orders prior loads against subsequent stores and
> >- stores and prior stores against subsequent stores. Note that this is
> >- weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on
> >- many architectures.
> >+ Memory operations issued before the ACQUIRE may be completed after
> >+ the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
> >+ combined with a following ACQUIRE, orders prior loads against
> >+ subsequent loads and stores and also orders prior stores against
> >+ subsequent stores. Note that this is weaker than smp_mb()! The
> >+ smp_mb__before_spinlock() primitive is free on many architectures.
> >
> > (2) RELEASE operation implication:
> >
> >@@ -1717,23 +1717,47 @@ the two accesses can themselves then cross:
> >
> > *A = a;
> > ACQUIRE M
> >- RELEASE M
> >+ RELEASE N
> > *B = b;
> >
> > may occur as:
> >
> >- ACQUIRE M, STORE *B, STORE *A, RELEASE M
>
> This example should remain as is; it refers to the porosity of a critical
> section to loads and stores occurring outside that critical section, and
> importantly that LOCK + UNLOCK is not a full barrier. It documents that
> memory operations from either side of the critical section may cross
> (in the absence of other specific memory barriers). IOW, it is the example
> to implication #1 above.

Good point, I needed to apply the changes further down. How does the
following updated patch look?

Thanx, Paul

------------------------------------------------------------------------

commit 528c2771288df7f98f9224a56b93bdb2db27ec70
Author: Paul E. McKenney <[email protected]>
Date: Sun Feb 23 08:34:24 2014 -0800

Documentation/memory-barriers.txt: Clarify release/acquire ordering

This commit fixes a couple of typos and clarifies what happens when
the CPU chooses to execute a later lock acquisition before a prior
lock release, in particular, why deadlock is avoided.

Reported-by: Peter Hurley <[email protected]>
Reported-by: James Bottomley <[email protected]>
Reported-by: Stefan Richter <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 9dde54c55b24..9ea6de4eb252 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1674,12 +1674,12 @@ for each construct. These operations all imply certain barriers:
Memory operations issued after the ACQUIRE will be completed after the
ACQUIRE operation has completed.

- Memory operations issued before the ACQUIRE may be completed after the
- ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined
- with a following ACQUIRE, orders prior loads against subsequent stores and
- stores and prior stores against subsequent stores. Note that this is
- weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on
- many architectures.
+ Memory operations issued before the ACQUIRE may be completed after
+ the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
+ combined with a following ACQUIRE, orders prior loads against
+ subsequent loads and stores and also orders prior stores against
+ subsequent stores. Note that this is weaker than smp_mb()! The
+ smp_mb__before_spinlock() primitive is free on many architectures.

(2) RELEASE operation implication:

@@ -1724,24 +1724,20 @@ may occur as:

ACQUIRE M, STORE *B, STORE *A, RELEASE M

-This same reordering can of course occur if the lock's ACQUIRE and RELEASE are
-to the same lock variable, but only from the perspective of another CPU not
-holding that lock.
-
-In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
-memory barrier because it is possible for a preceding RELEASE to pass a
-later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
-of the compiler. Note that deadlocks cannot be introduced by this
-interchange because if such a deadlock threatened, the RELEASE would
-simply complete.
+When the ACQUIRE and RELEASE are a lock acquisition and release,
+respectively, this same reordering can of course occur if the lock's
+ACQUIRE and RELEASE are to the same lock variable, but only from the
+perspective of another CPU not holding that lock. In short, a RELEASE
+followed by an ACQUIRE may -not- be assumed to be a full memory barrier.

If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This
will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
same variable. The smp_mb__after_unlock_lock() primitive is free on many
-architectures. Without smp_mb__after_unlock_lock(), the critical sections
-corresponding to the RELEASE and the ACQUIRE can cross:
+architectures. Without smp_mb__after_unlock_lock(), the CPU's execution of
+the critical sections corresponding to the RELEASE and the ACQUIRE can cross,
+so that:

*A = a;
RELEASE M
@@ -1752,7 +1748,36 @@ could occur as:

ACQUIRE N, STORE *B, STORE *A, RELEASE M

-With smp_mb__after_unlock_lock(), they cannot, so that:
+It might appear that this rearrangement could introduce a deadlock.
+However, this cannot happen because if such a deadlock threatened,
+the RELEASE would simply complete, thereby avoiding the deadlock.
+
+ Why does this work?
+
+ One key point is that we are only talking about the CPU doing
+ the interchanging, not the compiler. If the compiler (or, for
+ that matter, the developer) switched the operations, deadlock
+ -could- occur.
+
+ But suppose the CPU interchanged the operations. In this case,
+ the unlock precedes the lock in the assembly code. The CPU simply
+ elected to try executing the later lock operation first. If there
+ is a deadlock, this lock operation will simply spin (or try to
+ sleep, but more on that later). The CPU will eventually execute
+ the unlock operation (which again preceded the lock operation
+ in the assembly code), which will unravel the potential deadlock,
+ allowing the lock operation to succeed.
+
+ But what if the lock is a sleeplock? In that case, the code will
+ try to enter the scheduler, where it will eventually encounter
+ a memory barrier, which will force the earlier unlock operation
+ to complete, again unraveling the deadlock. There might be
+ a sleep-unlock race, but the locking primitive needs to resolve
+ such races properly in any case.
+
+With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
+For example, with the following code, the store to *A will always be
+seen by other CPUs before the store to *B:

*A = a;
RELEASE M
@@ -1760,13 +1785,18 @@ With smp_mb__after_unlock_lock(), they cannot, so that:
smp_mb__after_unlock_lock();
*B = b;

-will always occur as either of the following:
+The operations will always occur in one of the following orders:

- STORE *A, RELEASE, ACQUIRE, STORE *B
- STORE *A, ACQUIRE, RELEASE, STORE *B
+ STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
+ STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
+ ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B

-If the RELEASE and ACQUIRE were instead both operating on the same lock
-variable, only the first of these two alternatives can occur.
+If the RELEASE and ACQUIRE were instead both operating on the
+same lock variable, only the first of these two alternatives can
+occur. In addition, the more strongly ordered systems may rule out
+some of the above orders. But in any case, as noted earlier, the
+smp_mb__after_unlock_lock() ensures that the store to *A will always be
+seen as happening before the store to *B.

Locks and semaphores may not provide any guarantee of ordering on UP compiled
systems, and so cannot be counted on in such a situation to actually achieve
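
As a rough illustration of the RELEASE M / ACQUIRE N sequence the patch
describes (the lock names, variables, and function below are invented for
this sketch; spin_lock()/spin_unlock() supply the ACQUIRE/RELEASE
semantics, and smp_mb__after_unlock_lock() is the primitive under
discussion), such code might look like:

	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(lock_m);
	static DEFINE_SPINLOCK(lock_n);
	static int a, b;

	static void release_then_acquire(void)
	{
		spin_lock(&lock_m);
		a = 1;				/* *A = a */
		spin_unlock(&lock_m);		/* RELEASE M */
		spin_lock(&lock_n);		/* ACQUIRE N */
		smp_mb__after_unlock_lock();	/* upgrade the pair to a full barrier */
		b = 1;				/* *B = b */
		spin_unlock(&lock_n);
	}

With the smp_mb__after_unlock_lock() in place, other CPUs observe the store
to a before the store to b, matching the orderings listed in the patch;
without it, the two critical sections may overlap, as in the
"ACQUIRE N, STORE *B, STORE *A, RELEASE M" example.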

2014-02-24 00:10:03

by Peter Hurley

[permalink] [raw]
Subject: Re: memory-barriers.txt again (was Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK)

On 02/23/2014 06:50 PM, Paul E. McKenney wrote:
> On Sun, Feb 23, 2014 at 03:35:31PM -0500, Peter Hurley wrote:
>> Hi Paul,
>>
>> On 02/23/2014 11:37 AM, Paul E. McKenney wrote:
>>> commit aba6b0e82c9de53eb032844f1932599f148ff68d
>>> Author: Paul E. McKenney <[email protected]>
>>> Date: Sun Feb 23 08:34:24 2014 -0800
>>>
>>> Documentation/memory-barriers.txt: Clarify release/acquire ordering
>>>
>>> This commit fixes a couple of typos and clarifies what happens when
>>> the CPU chooses to execute a later lock acquisition before a prior
>>> lock release, in particular, why deadlock is avoided.
>>>
>>> Reported-by: Peter Hurley <[email protected]>
>>> Reported-by: James Bottomley <[email protected]>
>>> Reported-by: Stefan Richter <[email protected]>
>>> Signed-off-by: Paul E. McKenney <[email protected]>
>>>
>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
>>> index 9dde54c55b24..c8932e06edf1 100644
>>> --- a/Documentation/memory-barriers.txt
>>> +++ b/Documentation/memory-barriers.txt
>>> @@ -1674,12 +1674,12 @@ for each construct. These operations all imply certain barriers:
>>> Memory operations issued after the ACQUIRE will be completed after the
>>> ACQUIRE operation has completed.
>>>
>>> - Memory operations issued before the ACQUIRE may be completed after the
>>> - ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined
>>> - with a following ACQUIRE, orders prior loads against subsequent stores and
>>> - stores and prior stores against subsequent stores. Note that this is
>>> - weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on
>>> - many architectures.
>>> + Memory operations issued before the ACQUIRE may be completed after
>>> + the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
>>> + combined with a following ACQUIRE, orders prior loads against
>>> + subsequent loads and stores and also orders prior stores against
>>> + subsequent stores. Note that this is weaker than smp_mb()! The
>>> + smp_mb__before_spinlock() primitive is free on many architectures.
>>>
>>> (2) RELEASE operation implication:
>>>
>>> @@ -1717,23 +1717,47 @@ the two accesses can themselves then cross:
>>>
>>> *A = a;
>>> ACQUIRE M
>>> - RELEASE M
>>> + RELEASE N
>>> *B = b;
>>>
>>> may occur as:
>>>
>>> - ACQUIRE M, STORE *B, STORE *A, RELEASE M
>>
>> This example should remain as is; it refers to the porosity of a critical
>> section to loads and stores occurring outside that critical section, and
>> importantly that LOCK + UNLOCK is not a full barrier. It documents that
>> memory operations from either side of the critical section may cross
>> (in the absence of other specific memory barriers). IOW, it is the example
>> to implication #1 above.
>
> Good point, I needed to apply the changes further down. How does the
> following updated patch look?
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 528c2771288df7f98f9224a56b93bdb2db27ec70
> Author: Paul E. McKenney <[email protected]>
> Date: Sun Feb 23 08:34:24 2014 -0800
>
> Documentation/memory-barriers.txt: Clarify release/acquire ordering
>
> This commit fixes a couple of typos and clarifies what happens when
> the CPU chooses to execute a later lock acquisition before a prior
> lock release, in particular, why deadlock is avoided.
>
> Reported-by: Peter Hurley <[email protected]>
> Reported-by: James Bottomley <[email protected]>
> Reported-by: Stefan Richter <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 9dde54c55b24..9ea6de4eb252 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1674,12 +1674,12 @@ for each construct. These operations all imply certain barriers:
> Memory operations issued after the ACQUIRE will be completed after the
> ACQUIRE operation has completed.
>
> - Memory operations issued before the ACQUIRE may be completed after the
> - ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined
> - with a following ACQUIRE, orders prior loads against subsequent stores and
> - stores and prior stores against subsequent stores. Note that this is
> - weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on
> - many architectures.
> + Memory operations issued before the ACQUIRE may be completed after
> + the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
> + combined with a following ACQUIRE, orders prior loads against
> + subsequent loads and stores and also orders prior stores against
> + subsequent stores. Note that this is weaker than smp_mb()! The
> + smp_mb__before_spinlock() primitive is free on many architectures.
>
> (2) RELEASE operation implication:
>
> @@ -1724,24 +1724,20 @@ may occur as:
>
> ACQUIRE M, STORE *B, STORE *A, RELEASE M
>
> -This same reordering can of course occur if the lock's ACQUIRE and RELEASE are
> -to the same lock variable, but only from the perspective of another CPU not
> -holding that lock.
> -
> -In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
> -memory barrier because it is possible for a preceding RELEASE to pass a
> -later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
> -of the compiler. Note that deadlocks cannot be introduced by this
> -interchange because if such a deadlock threatened, the RELEASE would
> -simply complete.
> +When the ACQUIRE and RELEASE are a lock acquisition and release,
> +respectively, this same reordering can of course occur if the lock's
^^^^^^^
[delete?]

> +ACQUIRE and RELEASE are to the same lock variable, but only from the
> +perspective of another CPU not holding that lock.

In the above, are you introducing UNLOCK + LOCK not being a full barrier,
or are you further elaborating the non-barrier that is LOCK + UNLOCK?

If you mean the first, might I suggest something like,

"Similarly, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
memory barrier."

as the introductory sentence.

> In short, a RELEASE
> +followed by an ACQUIRE may -not- be assumed to be a full memory barrier.


> If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
> ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This
> will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
> executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
> same variable. The smp_mb__after_unlock_lock() primitive is free on many
> -architectures. Without smp_mb__after_unlock_lock(), the critical sections
> -corresponding to the RELEASE and the ACQUIRE can cross:
> +architectures. Without smp_mb__after_unlock_lock(), the CPU's execution of
> +the critical sections corresponding to the RELEASE and the ACQUIRE can cross,
> +so that:
>
> *A = a;
> RELEASE M
> @@ -1752,7 +1748,36 @@ could occur as:
>
> ACQUIRE N, STORE *B, STORE *A, RELEASE M
>
> -With smp_mb__after_unlock_lock(), they cannot, so that:
> +It might appear that this rearrangement could introduce a deadlock.
> +However, this cannot happen because if such a deadlock threatened,
> +the RELEASE would simply complete, thereby avoiding the deadlock.
> +
> + Why does this work?
> +
> + One key point is that we are only talking about the CPU doing
> + the interchanging, not the compiler. If the compiler (or, for
^
reordering?
> + that matter, the developer) switched the operations, deadlock
> + -could- occur.
> +
> + But suppose the CPU interchanged the operations. In this case,
^
reordered?

> + the unlock precedes the lock in the assembly code. The CPU simply
> + elected to try executing the later lock operation first. If there
> + is a deadlock, this lock operation will simply spin (or try to
> + sleep, but more on that later). The CPU will eventually execute
> + the unlock operation (which again preceded the lock operation
^^
[delete?]
> + in the assembly code), which will unravel the potential deadlock,
> + allowing the lock operation to succeed.
> +
> + But what if the lock is a sleeplock? In that case, the code will
> + try to enter the scheduler, where it will eventually encounter
> + a memory barrier, which will force the earlier unlock operation
> + to complete, again unraveling the deadlock. There might be
> + a sleep-unlock race, but the locking primitive needs to resolve
> + such races properly in any case.
> +
> +With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
> +For example, with the following code, the store to *A will always be
> +seen by other CPUs before the store to *B:
>
> *A = a;
> RELEASE M
> @@ -1760,13 +1785,18 @@ With smp_mb__after_unlock_lock(), they cannot, so that:
> smp_mb__after_unlock_lock();
> *B = b;
>
> -will always occur as either of the following:
> +The operations will always occur in one of the following orders:
>
> - STORE *A, RELEASE, ACQUIRE, STORE *B
> - STORE *A, ACQUIRE, RELEASE, STORE *B
> + STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
> + STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
> + ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
>
> -If the RELEASE and ACQUIRE were instead both operating on the same lock
> -variable, only the first of these two alternatives can occur.
> +If the RELEASE and ACQUIRE were instead both operating on the
> +same lock variable, only the first of these two alternatives can
> +occur. In addition, the more strongly ordered systems may rule out
> +some of the above orders. But in any case, as noted earlier, the
> +smp_mb__after_unlock_lock() ensures that the store to *A will always be
> +seen as happening before the store to *B.
>
> Locks and semaphores may not provide any guarantee of ordering on UP compiled
> systems, and so cannot be counted on in such a situation to actually achieve

Thanks for your work on these docs (and rcu and locks and multi-arch barriers
in general :) )

Regards,
Peter Hurley
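
To illustrate the distinction raised above between LOCK + UNLOCK not being
a full barrier (implication #1) and RELEASE + ACQUIRE not being one either,
here is a rough sketch; the lock names, variables, and functions are
invented for the example:

	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(m);
	static DEFINE_SPINLOCK(n);
	static int a, b;

	/*
	 * Implication #1: an ACQUIRE/RELEASE pair on the same lock is
	 * porous.  The accesses to a and b outside the critical section
	 * may migrate into it, although they cannot leak out the far side.
	 */
	static void lock_plus_unlock(void)
	{
		a = 1;
		spin_lock(&m);		/* ACQUIRE M */
		/* ... critical section ... */
		spin_unlock(&m);	/* RELEASE M */
		b = 1;
	}

	/*
	 * The rewritten paragraph: a RELEASE followed by an ACQUIRE (here
	 * on a different lock) is not a full barrier either; another CPU
	 * may observe the store to b before the store to a.
	 */
	static void unlock_plus_lock(void)
	{
		spin_lock(&m);
		a = 1;
		spin_unlock(&m);	/* RELEASE M */
		spin_lock(&n);		/* ACQUIRE N */
		b = 1;
		spin_unlock(&n);
	}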

2014-02-24 01:36:34

by Stefan Richter

[permalink] [raw]
Subject: Re: memory-barriers.txt again (was Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK)

On Feb 23 Paul E. McKenney wrote:
>>> Please see below for a patch against the current version of
>>> Documentation/memory-barriers.txt. Does this update help?

Thank you, this clarifies it.

[...]
A new nit:
> +The operations will always occur in one of the following orders:
>
> - STORE *A, RELEASE, ACQUIRE, STORE *B
> - STORE *A, ACQUIRE, RELEASE, STORE *B
> + STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
> + STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
> + ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
>
> -If the RELEASE and ACQUIRE were instead both operating on the same lock
> -variable, only the first of these two alternatives can occur.
> +If the RELEASE and ACQUIRE were instead both operating on the
> +same lock variable, only the first of these two alternatives can
> +occur.
^^^
...these {,three} alternatives...
--
Stefan Richter
-=====-====- --=- ==---
http://arcgraph.de/sr/

2014-02-24 16:26:18

by Paul E. McKenney

[permalink] [raw]
Subject: Re: memory-barriers.txt again (was Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK)

On Sun, Feb 23, 2014 at 07:09:55PM -0500, Peter Hurley wrote:
> On 02/23/2014 06:50 PM, Paul E. McKenney wrote:
> >On Sun, Feb 23, 2014 at 03:35:31PM -0500, Peter Hurley wrote:
> >>Hi Paul,
> >>
> >>On 02/23/2014 11:37 AM, Paul E. McKenney wrote:
> >>>commit aba6b0e82c9de53eb032844f1932599f148ff68d
> >>>Author: Paul E. McKenney <[email protected]>
> >>>Date: Sun Feb 23 08:34:24 2014 -0800
> >>>
> >>> Documentation/memory-barriers.txt: Clarify release/acquire ordering
> >>>
> >>> This commit fixes a couple of typos and clarifies what happens when
> >>> the CPU chooses to execute a later lock acquisition before a prior
> >>> lock release, in particular, why deadlock is avoided.
> >>>
> >>> Reported-by: Peter Hurley <[email protected]>
> >>> Reported-by: James Bottomley <[email protected]>
> >>> Reported-by: Stefan Richter <[email protected]>
> >>> Signed-off-by: Paul E. McKenney <[email protected]>
> >>>
> >>>diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> >>>index 9dde54c55b24..c8932e06edf1 100644
> >>>--- a/Documentation/memory-barriers.txt
> >>>+++ b/Documentation/memory-barriers.txt
> >>>@@ -1674,12 +1674,12 @@ for each construct. These operations all imply certain barriers:
> >>> Memory operations issued after the ACQUIRE will be completed after the
> >>> ACQUIRE operation has completed.
> >>>
> >>>- Memory operations issued before the ACQUIRE may be completed after the
> >>>- ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined
> >>>- with a following ACQUIRE, orders prior loads against subsequent stores and
> >>>- stores and prior stores against subsequent stores. Note that this is
> >>>- weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on
> >>>- many architectures.
> >>>+ Memory operations issued before the ACQUIRE may be completed after
> >>>+ the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
> >>>+ combined with a following ACQUIRE, orders prior loads against
> >>>+ subsequent loads and stores and also orders prior stores against
> >>>+ subsequent stores. Note that this is weaker than smp_mb()! The
> >>>+ smp_mb__before_spinlock() primitive is free on many architectures.
> >>>
> >>> (2) RELEASE operation implication:
> >>>
> >>>@@ -1717,23 +1717,47 @@ the two accesses can themselves then cross:
> >>>
> >>> *A = a;
> >>> ACQUIRE M
> >>>- RELEASE M
> >>>+ RELEASE N
> >>> *B = b;
> >>>
> >>> may occur as:
> >>>
> >>>- ACQUIRE M, STORE *B, STORE *A, RELEASE M
> >>
> >>This example should remain as is; it refers to the porosity of a critical
> >>section to loads and stores occurring outside that critical section, and
> >>importantly that LOCK + UNLOCK is not a full barrier. It documents that
> >>memory operations from either side of the critical section may cross
> >>(in the absence of other specific memory barriers). IOW, it is the example
> >>to implication #1 above.
> >
> >Good point, I needed to apply the changes further down. How does the
> >following updated patch look?
> >
> > Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >commit 528c2771288df7f98f9224a56b93bdb2db27ec70
> >Author: Paul E. McKenney <[email protected]>
> >Date: Sun Feb 23 08:34:24 2014 -0800
> >
> > Documentation/memory-barriers.txt: Clarify release/acquire ordering
> >
> > This commit fixes a couple of typos and clarifies what happens when
> > the CPU chooses to execute a later lock acquisition before a prior
> > lock release, in particular, why deadlock is avoided.
> >
> > Reported-by: Peter Hurley <[email protected]>
> > Reported-by: James Bottomley <[email protected]>
> > Reported-by: Stefan Richter <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> >diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> >index 9dde54c55b24..9ea6de4eb252 100644
> >--- a/Documentation/memory-barriers.txt
> >+++ b/Documentation/memory-barriers.txt
> >@@ -1674,12 +1674,12 @@ for each construct. These operations all imply certain barriers:
> > Memory operations issued after the ACQUIRE will be completed after the
> > ACQUIRE operation has completed.
> >
> >- Memory operations issued before the ACQUIRE may be completed after the
> >- ACQUIRE operation has completed. An smp_mb__before_spinlock(), combined
> >- with a following ACQUIRE, orders prior loads against subsequent stores and
> >- stores and prior stores against subsequent stores. Note that this is
> >- weaker than smp_mb()! The smp_mb__before_spinlock() primitive is free on
> >- many architectures.
> >+ Memory operations issued before the ACQUIRE may be completed after
> >+ the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
> >+ combined with a following ACQUIRE, orders prior loads against
> >+ subsequent loads and stores and also orders prior stores against
> >+ subsequent stores. Note that this is weaker than smp_mb()! The
> >+ smp_mb__before_spinlock() primitive is free on many architectures.
> >
> > (2) RELEASE operation implication:
> >
> >@@ -1724,24 +1724,20 @@ may occur as:
> >
> > ACQUIRE M, STORE *B, STORE *A, RELEASE M
> >
> >-This same reordering can of course occur if the lock's ACQUIRE and RELEASE are
> >-to the same lock variable, but only from the perspective of another CPU not
> >-holding that lock.
> >-
> >-In short, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
> >-memory barrier because it is possible for a preceding RELEASE to pass a
> >-later ACQUIRE from the viewpoint of the CPU, but not from the viewpoint
> >-of the compiler. Note that deadlocks cannot be introduced by this
> >-interchange because if such a deadlock threatened, the RELEASE would
> >-simply complete.
> >+When the ACQUIRE and RELEASE are a lock acquisition and release,
> >+respectively, this same reordering can of course occur if the lock's
> ^^^^^^^
> [delete?]

If you insist. ;-) Done.

> >+ACQUIRE and RELEASE are to the same lock variable, but only from the
> >+perspective of another CPU not holding that lock.
>
> In the above, are you introducing UNLOCK + LOCK not being a full barrier,
> or are you further elaborating the non-barrier that is LOCK + UNLOCK?
>
> If you mean the first, might I suggest something like,
>
> "Similarly, a RELEASE followed by an ACQUIRE may -not- be assumed to be a full
> memory barrier."
>
> as the introductory sentence.

Nope, just got the "ACQUIRE" and "RELEASE" backwards. The sentence
below now reads:

In short, an ACQUIRE followed by a RELEASE may -not- be assumed
to be a full memory barrier.

> > In short, a RELEASE
> >+followed by an ACQUIRE may -not- be assumed to be a full memory barrier.

But I do need a transition sentence before the following paragraph.

I added this to the beginning of the following paragraph:

Similarly, the reverse case of a RELEASE followed by an ACQUIRE
does not imply a full memory barrier.

> > If it is necessary for a RELEASE-ACQUIRE pair to produce a full barrier, the
> > ACQUIRE can be followed by an smp_mb__after_unlock_lock() invocation. This
> > will produce a full barrier if either (a) the RELEASE and the ACQUIRE are
> > executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on the
> > same variable. The smp_mb__after_unlock_lock() primitive is free on many
> >-architectures. Without smp_mb__after_unlock_lock(), the critical sections
> >-corresponding to the RELEASE and the ACQUIRE can cross:
> >+architectures. Without smp_mb__after_unlock_lock(), the CPU's execution of
> >+the critical sections corresponding to the RELEASE and the ACQUIRE can cross,
> >+so that:
> >
> > *A = a;
> > RELEASE M
> >@@ -1752,7 +1748,36 @@ could occur as:
> >
> > ACQUIRE N, STORE *B, STORE *A, RELEASE M
> >
> >-With smp_mb__after_unlock_lock(), they cannot, so that:
> >+It might appear that this rearrangement could introduce a deadlock.
> >+However, this cannot happen because if such a deadlock threatened,
> >+the RELEASE would simply complete, thereby avoiding the deadlock.
> >+
> >+ Why does this work?
> >+
> >+ One key point is that we are only talking about the CPU doing
> >+ the interchanging, not the compiler. If the compiler (or, for
> ^
> reordering?
> >+ that matter, the developer) switched the operations, deadlock
> >+ -could- occur.
> >+
> >+ But suppose the CPU interchanged the operations. In this case,
> ^
> reordered?

Agreed, avoiding random use of synonyms makes it clearer. These are
now changed as you suggest.

> >+ the unlock precedes the lock in the assembly code. The CPU simply
> >+ elected to try executing the later lock operation first. If there
> >+ is a deadlock, this lock operation will simply spin (or try to
> >+ sleep, but more on that later). The CPU will eventually execute
> >+ the unlock operation (which again preceded the lock operation
> ^^
> [delete?]

I guess I didn't explicitly call this out before, so I agree with deleting
"again". Done.

> >+ in the assembly code), which will unravel the potential deadlock,
> >+ allowing the lock operation to succeed.
> >+
> >+ But what if the lock is a sleeplock? In that case, the code will
> >+ try to enter the scheduler, where it will eventually encounter
> >+ a memory barrier, which will force the earlier unlock operation
> >+ to complete, again unraveling the deadlock. There might be
> >+ a sleep-unlock race, but the locking primitive needs to resolve
> >+ such races properly in any case.
> >+
> >+With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
> >+For example, with the following code, the store to *A will always be
> >+seen by other CPUs before the store to *B:
> >
> > *A = a;
> > RELEASE M
> >@@ -1760,13 +1785,18 @@ With smp_mb__after_unlock_lock(), they cannot, so that:
> > smp_mb__after_unlock_lock();
> > *B = b;
> >
> >-will always occur as either of the following:
> >+The operations will always occur in one of the following orders:
> >
> >- STORE *A, RELEASE, ACQUIRE, STORE *B
> >- STORE *A, ACQUIRE, RELEASE, STORE *B
> >+ STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
> >+ STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
> >+ ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
> >
> >-If the RELEASE and ACQUIRE were instead both operating on the same lock
> >-variable, only the first of these two alternatives can occur.
> >+If the RELEASE and ACQUIRE were instead both operating on the
> >+same lock variable, only the first of these two alternatives can
> >+occur. In addition, the more strongly ordered systems may rule out
> >+some of the above orders. But in any case, as noted earlier, the
> >+smp_mb__after_unlock_lock() ensures that the store to *A will always be
> >+seen as happening before the store to *B.
> >
> > Locks and semaphores may not provide any guarantee of ordering on UP compiled
> > systems, and so cannot be counted on in such a situation to actually achieve
>
> Thanks for your work on these docs (and rcu and locks and multi-arch barriers
> in general :) )

Glad you like them, and thank you for your review and comments!

Thanx, Paul
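
For reference, the smp_mb__before_spinlock() rule from the first hunk
quoted above can be sketched as follows (the lock, variables, and function
are invented for this example); the primitive, combined with the following
ACQUIRE, orders the prior load against the later accesses and the prior
store against the later store, but is weaker than a full smp_mb():

	#include <linux/spinlock.h>
	#include <linux/compiler.h>

	static DEFINE_SPINLOCK(lock);
	static int x, y, z;

	static void before_spinlock_sketch(void)
	{
		int prior = ACCESS_ONCE(x);	/* prior load */

		ACCESS_ONCE(y) = 1;		/* prior store */
		smp_mb__before_spinlock();	/* combined with the ACQUIRE below */
		spin_lock(&lock);		/* ACQUIRE */
		ACCESS_ONCE(z) = prior;		/* later store, ordered after both prior accesses */
		spin_unlock(&lock);
	}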

2014-02-24 16:27:38

by Paul E. McKenney

[permalink] [raw]
Subject: Re: memory-barriers.txt again (was Re: [PATCH 4/9] firewire: don't use PREPARE_DELAYED_WORK)

On Mon, Feb 24, 2014 at 01:32:54AM +0100, Stefan Richter wrote:
> On Feb 23 Paul E. McKenney wrote:
> >>> Please see below for a patch against the current version of
> >>> Documentation/memory-barriers.txt. Does this update help?
>
> Thank you, this clarifies it.
>
> [...]
> A new nit:
> > +The operations will always occur in one of the following orders:
> >
> > - STORE *A, RELEASE, ACQUIRE, STORE *B
> > - STORE *A, ACQUIRE, RELEASE, STORE *B
> > + STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
> > + STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
> > + ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
> >
> > -If the RELEASE and ACQUIRE were instead both operating on the same lock
> > -variable, only the first of these two alternatives can occur.
> > +If the RELEASE and ACQUIRE were instead both operating on the
> > +same lock variable, only the first of these two alternatives can
> > +occur.
> ^^^
> ...these {,three} alternatives...

Good catch! I chose the empty string.

Thanx, Paul