After checking all possible call chains to fs_send() here,
my tool finds that fs_send() is never called in atomic context.
And this function is assigned to a function pointer "dev->ops->send",
which is only called by vcc_sendmsg() (net/atm/common.c)
through vcc->dev->ops->send(), and vcc_sendmsg() calls schedule(),
it indicates that fs_send() can call functions which may sleep.
Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <[email protected]>
---
drivers/atm/firestream.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/atm/firestream.c b/drivers/atm/firestream.c
index d97c056..cce6f9f 100644
--- a/drivers/atm/firestream.c
+++ b/drivers/atm/firestream.c
@@ -1184,7 +1184,7 @@ static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb)
vcc->last_skb = skb;
- td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC);
+ td = kmalloc (sizeof (struct FS_BPENTRY), GFP_KERNEL);
fs_dprintk (FS_DEBUG_ALLOC, "Alloc transd: %p(%zd)\n", td, sizeof (struct FS_BPENTRY));
if (!td) {
/* Oops out of mem */
--
1.7.9.5
On Fri, Jan 26, 2018 at 04:00:27PM +0800, Jia-Ju Bai wrote:
> After checking all possible call chains to fs_send() here,
> my tool finds that fs_send() is never called in atomic context.
> And this function is assigned to a function pointer "dev->ops->send",
> which is only called by vcc_sendmsg() (net/atm/common.c)
> through vcc->dev->ops->send(), and vcc_sendmsg() calls schedule(),
> it indicates that fs_send() can call functions which may sleep.
> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.
>
> This is found by a static analysis tool named DCNS written by myself.
The trouble is, places like
net/atm/raw.c:65: vcc->send = atm_send_aal0;
net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
? DROP_PACKET : 1;
bh_unlock_sock(sk_atm(vcc));
in pppoatm_send() definitely is called under a spinlock.
Looking through the driver (in advanced bitrot, as usual for drivers/atm),
I'd say that submit_queue() is fucked in head in the "queue full" case.
And judging by the history, had been thus since the original merge...
On 2018/1/26 20:05, Al Viro wrote:
> On Fri, Jan 26, 2018 at 04:00:27PM +0800, Jia-Ju Bai wrote:
>> After checking all possible call chains to fs_send() here,
>> my tool finds that fs_send() is never called in atomic context.
>> And this function is assigned to a function pointer "dev->ops->send",
>> which is only called by vcc_sendmsg() (net/atm/common.c)
>> through vcc->dev->ops->send(), and vcc_sendmsg() calls schedule(),
>> it indicates that fs_send() can call functions which may sleep.
>> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.
>>
>> This is found by a static analysis tool named DCNS written by myself.
> The trouble is, places like
> net/atm/raw.c:65: vcc->send = atm_send_aal0;
> net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
> net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
> mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
> ? DROP_PACKET : 1;
> bh_unlock_sock(sk_atm(vcc));
> in pppoatm_send() definitely is called under a spinlock.
>
> Looking through the driver (in advanced bitrot, as usual for drivers/atm),
> I'd say that submit_queue() is fucked in head in the "queue full" case.
> And judging by the history, had been thus since the original merge...
Thanks for reply :)
I am sorry for this false positive.
I think other ATM related patches that I submitted are also false
positives, sorry.
My tool did not handle this situation of passing function pointer, and I
will improve the tool...
Thanks,
Jia-Ju Bai
On 2018/1/26 21:56, Jia-Ju Bai wrote:
>
>
> On 2018/1/26 20:05, Al Viro wrote:
>> On Fri, Jan 26, 2018 at 04:00:27PM +0800, Jia-Ju Bai wrote:
>>> After checking all possible call chains to fs_send() here,
>>> my tool finds that fs_send() is never called in atomic context.
>>> And this function is assigned to a function pointer "dev->ops->send",
>>> which is only called by vcc_sendmsg() (net/atm/common.c)
>>> through vcc->dev->ops->send(), and vcc_sendmsg() calls schedule(),
>>> it indicates that fs_send() can call functions which may sleep.
>>> Thus GFP_ATOMIC is not necessary, and it can be replaced with
>>> GFP_KERNEL.
>>>
>>> This is found by a static analysis tool named DCNS written by myself.
>> The trouble is, places like
>> net/atm/raw.c:65: vcc->send = atm_send_aal0;
>> net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
>> net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
>> mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
>> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
>> ? DROP_PACKET : 1;
>> bh_unlock_sock(sk_atm(vcc));
>> in pppoatm_send() definitely is called under a spinlock.
>>
>> Looking through the driver (in advanced bitrot, as usual for
>> drivers/atm),
>> I'd say that submit_queue() is fucked in head in the "queue full" case.
>> And judging by the history, had been thus since the original merge...
>
> Thanks for reply :)
>
> I am sorry for this false positive.
> I think other ATM related patches that I submitted are also false
> positives, sorry.
> My tool did not handle this situation of passing function pointer, and
> I will improve the tool...
>
>
> Thanks,
> Jia-Ju Bai
I check the code again, and confirm only my patches about "send" are
false positives.
I think other my patches that are about "open" does not has this problem:
https://marc.info/?l=linux-kernel&m=151693791432626&w=2
https://marc.info/?l=linux-kernel&m=151695475503314&w=2
https://marc.info/?l=linux-kernel&m=151693150131512&w=2
I hope you can have a check :)
Thanks,
Jia-Ju Bai
From: Al Viro <[email protected]>
Date: Fri, 26 Jan 2018 12:05:22 +0000
> On Fri, Jan 26, 2018 at 04:00:27PM +0800, Jia-Ju Bai wrote:
>> After checking all possible call chains to fs_send() here,
>> my tool finds that fs_send() is never called in atomic context.
>> And this function is assigned to a function pointer "dev->ops->send",
>> which is only called by vcc_sendmsg() (net/atm/common.c)
>> through vcc->dev->ops->send(), and vcc_sendmsg() calls schedule(),
>> it indicates that fs_send() can call functions which may sleep.
>> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.
>>
>> This is found by a static analysis tool named DCNS written by myself.
>
> The trouble is, places like
> net/atm/raw.c:65: vcc->send = atm_send_aal0;
> net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
> net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
> mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
> ? DROP_PACKET : 1;
> bh_unlock_sock(sk_atm(vcc));
> in pppoatm_send() definitely is called under a spinlock.
>
> Looking through the driver (in advanced bitrot, as usual for drivers/atm),
> I'd say that submit_queue() is fucked in head in the "queue full" case.
> And judging by the history, had been thus since the original merge...
Jia-Ju, I'm probably not going to apply any of your GFP_KERNEL
conversions.
Al's analysis above is similar to how things looked for other patches
you submited of this nature.
So because of the lack of care and serious auditing you put into these
conversions, I really have no choice but to drop them from my queue
because on the whole they are adding bugs rather than improving the
code.
Thank you.
From: Jia-Ju Bai <[email protected]>
Date: Fri, 26 Jan 2018 22:17:08 +0800
>
>
> On 2018/1/26 21:56, Jia-Ju Bai wrote:
>>
>>
>> On 2018/1/26 20:05, Al Viro wrote:
>>> On Fri, Jan 26, 2018 at 04:00:27PM +0800, Jia-Ju Bai wrote:
>>>> After checking all possible call chains to fs_send() here,
>>>> my tool finds that fs_send() is never called in atomic context.
>>>> And this function is assigned to a function pointer "dev->ops->send",
>>>> which is only called by vcc_sendmsg() (net/atm/common.c)
>>>> through vcc->dev->ops->send(), and vcc_sendmsg() calls schedule(),
>>>> it indicates that fs_send() can call functions which may sleep.
>>>> Thus GFP_ATOMIC is not necessary, and it can be replaced with
>>>> GFP_KERNEL.
>>>>
>>>> This is found by a static analysis tool named DCNS written by myself.
>>> The trouble is, places like
>>> net/atm/raw.c:65: vcc->send = atm_send_aal0;
>>> net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
>>> net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
>>> mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
>>> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
>>> ? DROP_PACKET : 1;
>>> bh_unlock_sock(sk_atm(vcc));
>>> in pppoatm_send() definitely is called under a spinlock.
>>>
>>> Looking through the driver (in advanced bitrot, as usual for
>>> drivers/atm),
>>> I'd say that submit_queue() is fucked in head in the "queue full"
>>> case.
>>> And judging by the history, had been thus since the original merge...
>>
>> Thanks for reply :)
>>
>> I am sorry for this false positive.
>> I think other ATM related patches that I submitted are also false
>> positives, sorry.
>> My tool did not handle this situation of passing function pointer, and
>> I will improve the tool...
>>
>>
>> Thanks,
>> Jia-Ju Bai
>
> I check the code again, and confirm only my patches about "send" are
> false positives.
> I think other my patches that are about "open" does not has this
> problem:
> https://marc.info/?l=linux-kernel&m=151693791432626&w=2
> https://marc.info/?l=linux-kernel&m=151695475503314&w=2
> https://marc.info/?l=linux-kernel&m=151693150131512&w=2
>
> I hope you can have a check :)
No, _you_ have a check.
All of these patches will be dropped, sorry.
From: Jia-Ju Bai <[email protected]>
Date: Fri, 26 Jan 2018 21:56:32 +0800
> My tool did not handle this situation of passing function pointer, and
> I will improve the tool...
Never automate these kinds of changes.
Actually audit the change fully _yourself_ as a human after the tool
indicates a possibility to you.
This is why most of your changes have bugs, you rely on the tool too
much.
On 2018/1/27 0:07, David Miller wrote:
> From: Al Viro <[email protected]>
> Date: Fri, 26 Jan 2018 12:05:22 +0000
>
>> On Fri, Jan 26, 2018 at 04:00:27PM +0800, Jia-Ju Bai wrote:
>>> After checking all possible call chains to fs_send() here,
>>> my tool finds that fs_send() is never called in atomic context.
>>> And this function is assigned to a function pointer "dev->ops->send",
>>> which is only called by vcc_sendmsg() (net/atm/common.c)
>>> through vcc->dev->ops->send(), and vcc_sendmsg() calls schedule(),
>>> it indicates that fs_send() can call functions which may sleep.
>>> Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.
>>>
>>> This is found by a static analysis tool named DCNS written by myself.
>> The trouble is, places like
>> net/atm/raw.c:65: vcc->send = atm_send_aal0;
>> net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
>> net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
>> mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
>> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
>> ? DROP_PACKET : 1;
>> bh_unlock_sock(sk_atm(vcc));
>> in pppoatm_send() definitely is called under a spinlock.
>>
>> Looking through the driver (in advanced bitrot, as usual for drivers/atm),
>> I'd say that submit_queue() is fucked in head in the "queue full" case.
>> And judging by the history, had been thus since the original merge...
> Jia-Ju, I'm probably not going to apply any of your GFP_KERNEL
> conversions.
>
> Al's analysis above is similar to how things looked for other patches
> you submited of this nature.
>
> So because of the lack of care and serious auditing you put into these
> conversions, I really have no choice but to drop them from my queue
> because on the whole they are adding bugs rather than improving the
> code.
>
> Thank you.
Okay, I can understand that careless GFP_KERNEL conversions will
introduce bugs.
Your concern is right.
Sorry for my previous incorrect patches...
I will manually carefully check and audit my patches before I submit them.
Thanks,
Jia-Ju Bai
On Fri, Jan 26, 2018 at 11:07:39AM -0500, David Miller wrote:
> >> This is found by a static analysis tool named DCNS written by myself.
> >
> > The trouble is, places like
> > net/atm/raw.c:65: vcc->send = atm_send_aal0;
> > net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
> > net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
> > mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
> > ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
> > ? DROP_PACKET : 1;
> > bh_unlock_sock(sk_atm(vcc));
> > in pppoatm_send() definitely is called under a spinlock.
> >
> > Looking through the driver (in advanced bitrot, as usual for drivers/atm),
> > I'd say that submit_queue() is fucked in head in the "queue full" case.
> > And judging by the history, had been thus since the original merge...
>
> Jia-Ju, I'm probably not going to apply any of your GFP_KERNEL
> conversions.
>
> Al's analysis above is similar to how things looked for other patches
> you submited of this nature.
>
> So because of the lack of care and serious auditing you put into these
> conversions, I really have no choice but to drop them from my queue
> because on the whole they are adding bugs rather than improving the
> code.
FWIW, the tool *does* promise to be useful - as far as I understand it
looks for places where GFP_ATOMIC allocation goes with blocking operations
near every callchain leading there. And that indicates something fishy
going on - either pointless GFP_ATOMIC (in benign case), or something
much nastier: a callchain that would require GFP_ATOMIC. In that case
whatever blocking operation found along the way is a bug.
This time it has, AFAICS, caught a genuine bug in drivers/atm/firestream.c -
static void submit_qentry (struct fs_dev *dev, struct queue *q, struct FS_QENTRY *qe)
{
u32 wp;
struct FS_QENTRY *cqe;
/* XXX Sanity check: the write pointer can be checked to be
still the same as the value passed as qe... -- REW */
/* udelay (5); */
while ((wp = read_fs (dev, Q_WP (q->offset))) & Q_FULL) {
fs_dprintk (FS_DEBUG_TXQ, "Found queue at %x full. Waiting.\n",
q->offset);
schedule ();
}
...
static void submit_queue (struct fs_dev *dev, struct queue *q,
u32 cmd, u32 p1, u32 p2, u32 p3)
{
struct FS_QENTRY *qe;
qe = get_qentry (dev, q);
qe->cmd = cmd;
qe->p0 = p1;
qe->p1 = p2;
qe->p2 = p3;
submit_qentry (dev, q, qe);
...
static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb)
{
...
td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC);
...
submit_queue (dev, &dev->hp_txq,
QE_TRANSMIT_DE | vcc->channo,
virt_to_bus (td), 0,
virt_to_bus (td));
...
Either all callchains leading to fs_send() are in non-atomic contexts
(in which case that GFP_ATOMIC would be pointless) or there's one
where we cannot block. Any such would be a potential deadlock.
And the latter appears to be the case - fs_send() is atmdev_ops ->send(),
which can end up in atm_vcc ->send(), which can be called from under
->sk_lock.slock.
What would be really useful:
* list of "here's a list of locations where we do something
blocking; each callchain to this GFP_ATOMIC allocation passes either
upstream of one of those without leaving atomic context in between
or downstream without entering one".
* after that - backtracking these callchains further, watching
for ones in atomic contexts. Can be done manually, but if that tool
can assist in doing so, all the better. If we do find one, we have
found a deadlock - just take the blocking operation reported for
that callchain and that's it. If it's not an obvious false positive
(e.g.
if (!foo)
spin_lock(&splat);
.....
if (foo)
schedule();
), it's worth reporting to maintainers of the code in question.
* if all callchains reach obviously non-atomic contexts
(syscall entry, for example, or a kernel thread payload, or
a function documented to require a mutex held by caller, etc.) -
then a switch to GFP_KERNEL might be appropriate. With analysis
of callchains posted as you are posting that.
* either way, having the tool print the callchains out
would be a good idea - for examining them, for writing reports,
etc.
On 2018/1/27 1:08, Al Viro wrote:
> On Fri, Jan 26, 2018 at 11:07:39AM -0500, David Miller wrote:
>>>> This is found by a static analysis tool named DCNS written by myself.
>>> The trouble is, places like
>>> net/atm/raw.c:65: vcc->send = atm_send_aal0;
>>> net/atm/raw.c:74: vcc->send = vcc->dev->ops->send;
>>> net/atm/raw.c:83: vcc->send = vcc->dev->ops->send;
>>> mean extra call chains. It's *not* just vcc_sendmsg(), and e.g.
>>> ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
>>> ? DROP_PACKET : 1;
>>> bh_unlock_sock(sk_atm(vcc));
>>> in pppoatm_send() definitely is called under a spinlock.
>>>
>>> Looking through the driver (in advanced bitrot, as usual for drivers/atm),
>>> I'd say that submit_queue() is fucked in head in the "queue full" case.
>>> And judging by the history, had been thus since the original merge...
>> Jia-Ju, I'm probably not going to apply any of your GFP_KERNEL
>> conversions.
>>
>> Al's analysis above is similar to how things looked for other patches
>> you submited of this nature.
>>
>> So because of the lack of care and serious auditing you put into these
>> conversions, I really have no choice but to drop them from my queue
>> because on the whole they are adding bugs rather than improving the
>> code.
> FWIW, the tool *does* promise to be useful
Thanks, I am happy to hear that :)
> - as far as I understand it
> looks for places where GFP_ATOMIC allocation goes with blocking operations
> near every callchain leading there. And that indicates something fishy
> going on - either pointless GFP_ATOMIC (in benign case), or something
> much nastier: a callchain that would require GFP_ATOMIC. In that case
> whatever blocking operation found along the way is a bug.
In fact, my tool first collects all places of GFP_ATOMIC and mdelay in
the whole kernel code.
Then it starts analysis from the entry of each interrupt handler and
spin-lock function call in the whole kernel code,
to mark the places of GFP_ATOMIC and mdelay that are called in atomic
context.
The remaining unmarked places of GFP_ATOMIC and mdelay are reported and
can be replaced with GFP_KERNEL and mdelay (or usleep_range).
Though my tool has handled some common situations of function pointers,
But it does not well handle function pointer passing in this code
(vcc->send = vcc->dev->ops->send), so the tool needs to be improved.
I am sorry for my incorrect report...
> This time it has, AFAICS, caught a genuine bug in drivers/atm/firestream.c -
> static void submit_qentry (struct fs_dev *dev, struct queue *q, struct FS_QENTRY *qe)
> {
> u32 wp;
> struct FS_QENTRY *cqe;
>
> /* XXX Sanity check: the write pointer can be checked to be
> still the same as the value passed as qe... -- REW */
> /* udelay (5); */
> while ((wp = read_fs (dev, Q_WP (q->offset))) & Q_FULL) {
> fs_dprintk (FS_DEBUG_TXQ, "Found queue at %x full. Waiting.\n",
> q->offset);
> schedule ();
> }
> ...
> static void submit_queue (struct fs_dev *dev, struct queue *q,
> u32 cmd, u32 p1, u32 p2, u32 p3)
> {
> struct FS_QENTRY *qe;
>
> qe = get_qentry (dev, q);
> qe->cmd = cmd;
> qe->p0 = p1;
> qe->p1 = p2;
> qe->p2 = p3;
> submit_qentry (dev, q, qe);
> ...
> static int fs_send (struct atm_vcc *atm_vcc, struct sk_buff *skb)
> {
> ...
> td = kmalloc (sizeof (struct FS_BPENTRY), GFP_ATOMIC);
> ...
> submit_queue (dev, &dev->hp_txq,
> QE_TRANSMIT_DE | vcc->channo,
> virt_to_bus (td), 0,
> virt_to_bus (td));
> ...
>
> Either all callchains leading to fs_send() are in non-atomic contexts
> (in which case that GFP_ATOMIC would be pointless) or there's one
> where we cannot block. Any such would be a potential deadlock.
>
> And the latter appears to be the case - fs_send() is atmdev_ops ->send(),
> which can end up in atm_vcc ->send(), which can be called from under
> ->sk_lock.slock.
Yes, I think schedule() can cause a sleep-in-atomic-context bug.
> What would be really useful:
> * list of "here's a list of locations where we do something
> blocking; each callchain to this GFP_ATOMIC allocation passes either
> upstream of one of those without leaving atomic context in between
> or downstream without entering one".
> * after that - backtracking these callchains further, watching
> for ones in atomic contexts. Can be done manually, but if that tool
> can assist in doing so, all the better. If we do find one, we have
> found a deadlock - just take the blocking operation reported for
> that callchain and that's it. If it's not an obvious false positive
> (e.g.
> if (!foo)
> spin_lock(&splat);
> .....
> if (foo)
> schedule();
> ), it's worth reporting to maintainers of the code in question.
> * if all callchains reach obviously non-atomic contexts
> (syscall entry, for example, or a kernel thread payload, or
> a function documented to require a mutex held by caller, etc.) -
> then a switch to GFP_KERNEL might be appropriate. With analysis
> of callchains posted as you are posting that.
> * either way, having the tool print the callchains out
> would be a good idea - for examining them, for writing reports,
> etc.
Thanks for your very helpful advice :)
I will follow it in my patches.
Thanks,
Jia-Ju Bai