2011-02-09 22:38:55

by Ray Jui

[permalink] [raw]
Subject: RE: Questions about Linux workqueue in 2.6.36

Hi Tejun,

Thanks for your quick response.

You are right. The problem with the rescuer is that when it is leaving a gcwq,
it does not check to see whether it needs to wake up a worker to process subsequent work items. After applying your patch, I see pid 26 getting waken up by the rescuer and work items from mmc1 and mmc2 are now being processed properly.

I also did a quick profiling on the amount of time it takes to create a new worker on our system. It takes 15 ~ 20 ms. This is not surprising as it's an embedded system and it was during kernel boot up and the system was getting quite busy. Regardless, the current 10 ms initial mayday timeout has resolution issues on a lot of embedded systems with worse timer resolutions (e.g., 10 ms on our system).

Thanks for your help on this issue.

Ray Jui
Staff Software Engineer
604-233-9371
[email protected]
Broadcom Canada

-----Original Message-----
From: Tejun Heo [mailto:[email protected]] On Behalf Of Tejun Heo
Sent: Wednesday, February 09, 2011 2:29 AM
To: Ray Jui
Subject: Re: Questions about Linux workqueue in 2.6.36

Hello,

Would you mind cc'ing [email protected] when replying? I'm
quoting the whole message. If you cc lkml when replying, please do
likewise so that the context isn't lost.

On Tue, Feb 08, 2011 at 10:30:47AM -0800, Ray Jui wrote:
> We recently upgraded our system to 2.6.36.3 and are seeing a
> potential workqueue related issue that affects the SD card
> detection. Just wonder if you can provide some help here and that
> will be much appreciated. By the way, we are running 2.6.36.3 on an
> embedded system with ARM cortex A9.
>
> Here's some background information: Our system uses SD/MMC for 3
> different devices: 1) eMMC that contains the root file system
> (mmc0); 2) standard SD card for mass storage (mmc1); 3) WiFi through
> the SDIO interface (mmc2). We use the open source SDHCI driver with
> some mods which we believe have nothing to do with the problem that
> we are experiencing.
>
> Quick summary of the problem The SDHCI driver calls
> mmc_detect_change at the driver probe for initial card detection and
> initialization, and that eventually calls queue_delayed_work to
> submit the card detection work item to the workqueue created in the
> MMC core with API create_singlethread_workqueue, which now calls the
> new API alloc_workqueue with the two flags WQ_UNBOUND and WQ_RESUER
> set. What we are seeing is that the first work item (mmc0) gets
> queued up and processed properly. The 2nd and 3rd work items (mmc1
> and mmc2) also got queued up but were never processed. Note that 1)
> queue_delayed_work is called with a delay of zero, so it calls
> queue_work to queue the work directly instead of setting up a timer
> to queue it in the future; 2) work items from mmc0, mmc1, and mmc2
> are different work items (with different addresses) although they
> end up using the same function when the work is eventually being
> processed (i.e., mmc_rescan).
>
> Worker thread information:
> pid 4 root [kworker/0:0]
> pid 5 root [kworker/u:0]
> pid 26 root [kworker/u:1]
> pid 37 root [kworker/0:1]

Weird, there should be more workers during boot.

> Detailed work flow:
>
> 1. First mmc0 work is queued and kworker/u:0 (pid 5) is waken up
>
> 2. 2nd mmc0 work is submitted before the first one is processed, the
> kernel detects it (with bit WORK_STRUCT_PENDING_BIT) and drops
> the 2nd work
>
> 3. Somehow kworker/u:0 is put to sleep by the scheduler in the
> middle, now the system calls mayday and the rescuer thread kicks
> in to process the first mmc0 work

It's a bit weird. Unless something is already going wrong, the
manager should be able to create a new worker. Rescuers are called
iff the worker assuming the manager role can't create a new worker in
certain amount of time. Maybe timer isn't working properly or
something.

> 4. While the first mmc0 work is being processed, kworker/u:0 (pid 5)
> is waken up again, but then it found the work is being processed
> by the rescuer, so it goes back to sleep

Yeah, this looks like a proper bug. When rescuer is leaving a gcwq,
it should wake up a worker; otherwise, the execution slot that the
rescuer was occupying could get lost.

> 5. kworker/u:1 (pid 26) is being created
>
> 6. mmc1 work is queued, pid 26 waken up but then goes back to sleep
>
> 7. mmc2 work is queued, pid 26 waken up but then goes back to sleep
>
> 8. First mmc0 work is done and the rescuer goes back to sleep

Yeah, so, the rescuer should kick pid 26 at this point.

> 9. Pid 5 and 26 sleep forever and never wake up
>
> As you can see from the detailed work flow, mmc1 and mmc2 work items
> were never processed. After this happens, if I submit a new work
> item or flush the workqueue, that would wake up pid 26 which will
> process the queued work items for mmc1 and mmc2.
>
> Here are my questions:
>
> 1. This happens during kernel boot up, is that why the mmc0 work
> item is being processed on the rescuer thread? According to the
> workqueue documentation, rescuer is supposed to be used only when
> there's a "memory pressure" on the system. I have a feeling that
> the problem has something to do with the first work item being
> processed on the rescuer thread. If it is being processed on the
> normal kworker thread, then it likely be fine

I don't know why it's behaving like that. As I wrote above, rescuer
is summoned iff the manager (any worker can assume the role) can't
create another worker in some amount of time. It's currently set at
10ms. Hmm... if CONFIG_HZ is 100, this means that the timer can
trigger almost any time depending on where in the 10ms window the
allocation starts. This should probably be updated to somthing like
max(HZ / 100, 2).

> 2. Do you see any obvious cause of this problem? If not, would you
> please make some suggestions on how I should proceed with debugging
> this issue?

Yeah, please try the patch at the end. Please let me know if this
resolves the execution stall.

Thanks for the detailed analysis.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 11869fa..90a17ca 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2047,6 +2047,15 @@ repeat:
move_linked_works(work, scheduled, &n);

process_scheduled_works(rescuer);
+
+ /*
+ * Leave this gcwq. If keep_working() is %true, notify a
+ * regular worker; otherwise, we end up with 0 concurrency
+ * and stalling the execution.
+ */
+ if (keep_working(gcwq))
+ wake_up_worker(gcwq);
+
spin_unlock_irq(&gcwq->lock);
}


--
tejun


2011-02-14 13:04:47

by Tejun Heo

[permalink] [raw]
Subject: Re: Questions about Linux workqueue in 2.6.36

Hello,

On Wed, Feb 09, 2011 at 02:38:36PM -0800, Ray Jui wrote:
> You are right. The problem with the rescuer is that when it is
> leaving a gcwq, it does not check to see whether it needs to wake up
> a worker to process subsequent work items. After applying your
> patch, I see pid 26 getting waken up by the rescuer and work items
> from mmc1 and mmc2 are now being processed properly.

Cool, I'll send the patch to upstream and stable.

> I also did a quick profiling on the amount of time it takes to
> create a new worker on our system. It takes 15 ~ 20 ms. This is not
> surprising as it's an embedded system and it was during kernel boot
> up and the system was getting quite busy. Regardless, the current 10
> ms initial mayday timeout has resolution issues on a lot of embedded
> systems with worse timer resolutions (e.g., 10 ms on our system).

Can you please verify the second patch removes the unncessary rescuer
invocation?

Thanks.

--
tejun

2011-02-14 13:10:01

by Tejun Heo

[permalink] [raw]
Subject: [PATCH wq#fixes-2.6.38] workqueue: wake up a worker when a rescuer is leaving a gcwq

After executing the matching works, a rescuer leaves the gcwq whether
there are more pending works or not. This may decrease the
concurrency level to zero and stall execution until a new work item is
queued on the gcwq.

Make rescuer wake up a regular worker when it leaves a gcwq if there
are more works to execute, so that execution isn't stalled.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Ray Jui <[email protected]>
Cc: [email protected]
---
kernel/workqueue.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 11869fa..90a17ca 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2047,6 +2047,15 @@ repeat:
move_linked_works(work, scheduled, &n);

process_scheduled_works(rescuer);
+
+ /*
+ * Leave this gcwq. If keep_working() is %true, notify a
+ * regular worker; otherwise, we end up with 0 concurrency
+ * and stalling the execution.
+ */
+ if (keep_working(gcwq))
+ wake_up_worker(gcwq);
+
spin_unlock_irq(&gcwq->lock);
}

--
1.7.1

2011-02-15 17:57:48

by Ray Jui

[permalink] [raw]
Subject: RE: Questions about Linux workqueue in 2.6.36

Hi Tejun,

Regarding your second patch, which changes the initial mayday timeout to
MAYDAY_INITIAL_TIMEOUT = HZ / 100 >= 2 ? HZ / 100 : 2,

That's a minimum of 2 timer ticks, which is 20 ms in our system. Since the timer resolution in our system is 10 ms and it takes 15 ~ 20 ms to create a new kworker during kernel startup in our system, I verified that I need to change the minimum timer tick to 3 for the rescuer NOT to kick in our system during startup.

Thanks,

Ray

-----Original Message-----
From: Tejun Heo [mailto:[email protected]] On Behalf Of Tejun Heo
Sent: Monday, February 14, 2011 5:03 AM
To: Ray Jui
Cc: [email protected]
Subject: Re: Questions about Linux workqueue in 2.6.36

Hello,

On Wed, Feb 09, 2011 at 02:38:36PM -0800, Ray Jui wrote:
> You are right. The problem with the rescuer is that when it is
> leaving a gcwq, it does not check to see whether it needs to wake up
> a worker to process subsequent work items. After applying your
> patch, I see pid 26 getting waken up by the rescuer and work items
> from mmc1 and mmc2 are now being processed properly.

Cool, I'll send the patch to upstream and stable.

> I also did a quick profiling on the amount of time it takes to
> create a new worker on our system. It takes 15 ~ 20 ms. This is not
> surprising as it's an embedded system and it was during kernel boot
> up and the system was getting quite busy. Regardless, the current 10
> ms initial mayday timeout has resolution issues on a lot of embedded
> systems with worse timer resolutions (e.g., 10 ms on our system).

Can you please verify the second patch removes the unncessary rescuer
invocation?

Thanks.

--
tejun

2011-02-16 16:53:04

by Tejun Heo

[permalink] [raw]
Subject: Re: Questions about Linux workqueue in 2.6.36

Hello,

On Tue, Feb 15, 2011 at 09:57:21AM -0800, Ray Jui wrote:
> Regarding your second patch, which changes the initial mayday timeout to
> MAYDAY_INITIAL_TIMEOUT = HZ / 100 >= 2 ? HZ / 100 : 2,
>
> That's a minimum of 2 timer ticks, which is 20 ms in our
> system. Since the timer resolution in our system is 10 ms and it
> takes 15 ~ 20 ms to create a new kworker during kernel startup in
> our system, I verified that I need to change the minimum timer tick
> to 3 for the rescuer NOT to kick in our system during startup.

I think calling rescuers in that case is not perfect but okay. It'll
only happen while the level of workers is below stable level. It
would still be better to avoid calling in rescuers almost immediately
with 0 or 1 jiffy timeout.

Thanks.

--
tejun

2011-02-16 17:19:14

by Tejun Heo

[permalink] [raw]
Subject: [PATCH wq#fixes-2.6.38] workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long

MAYDAY_INITIAL_TIMEOUT is defined as HZ / 100 and depending on
configuration may end up 0 or 1. Even when it's 1, depending on when
the mayday timer is added in the current jiffy interval, it may expire
way before a jiffy has passed.

Make sure MAYDAY_INITIAL_TIMEOUT is at least two to guarantee that at
least a full jiffy has passed before calling rescuers.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Ray Jui <[email protected]>
Cc: [email protected]
---
kernel/workqueue.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 88a3e34..ee6578b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -79,7 +79,9 @@ enum {
MAX_IDLE_WORKERS_RATIO = 4, /* 1/4 of busy can be idle */
IDLE_WORKER_TIMEOUT = 300 * HZ, /* keep idle ones for 5 mins */

- MAYDAY_INITIAL_TIMEOUT = HZ / 100, /* call for help after 10ms */
+ MAYDAY_INITIAL_TIMEOUT = HZ / 100 >= 2 ? HZ / 100 : 2,
+ /* call for help after 10ms
+ (min two ticks) */
MAYDAY_INTERVAL = HZ / 10, /* and then every 100ms */
CREATE_COOLDOWN = HZ, /* time to breath after fail */
TRUSTEE_COOLDOWN = HZ / 10, /* for trustee draining */
--
1.7.1