The channel level "_show" functions are vulnerable to race conditions.
Add a mutex lock and unlock around the call to the channel level "_show"
functions in vmbus_chan_attr_show().
This problem was discussed here: https://lkml.org/lkml/2018/10/18/830
Signed-off-by: Kimberly Brown <[email protected]>
---
drivers/hv/vmbus_drv.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 403fee01572c..e8189bc168da 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1414,6 +1414,7 @@ static ssize_t vmbus_chan_attr_show(struct kobject *kobj,
= container_of(attr, struct vmbus_chan_attribute, attr);
const struct vmbus_channel *chan
= container_of(kobj, struct vmbus_channel, kobj);
+ ssize_t ret;
if (!attribute->show)
return -EIO;
@@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct kobject *kobj,
if (chan->state != CHANNEL_OPENED_STATE)
return -EINVAL;
- return attribute->show(chan, buf);
+ mutex_lock(&vmbus_connection.channel_mutex);
+ ret = attribute->show(chan, buf);
+ mutex_unlock(&vmbus_connection.channel_mutex);
+ return ret;
}
static const struct sysfs_ops vmbus_chan_sysfs_ops = {
--
2.17.1
> From: Kimberly Brown <[email protected]>
> Sent: Monday, January 21, 2019 6:08 PM
> Subject: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions
>
> The channel level "_show" functions are vulnerable to race conditions.
> Add a mutex lock and unlock around the call to the channel level "_show"
> functions in vmbus_chan_attr_show().
>
> This problem was discussed here:
> https://lkml.org/lkml/2018/10/18/830
>
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1414,6 +1414,7 @@ static ssize_t vmbus_chan_attr_show(struct kobject
> *kobj,
> = container_of(attr, struct vmbus_chan_attribute, attr);
> const struct vmbus_channel *chan
> = container_of(kobj, struct vmbus_channel, kobj);
> + ssize_t ret;
>
> if (!attribute->show)
> return -EIO;
> @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> kobject *kobj,
> if (chan->state != CHANNEL_OPENED_STATE)
> return -EINVAL;
>
> - return attribute->show(chan, buf);
> + mutex_lock(&vmbus_connection.channel_mutex);
> + ret = attribute->show(chan, buf);
> + mutex_unlock(&vmbus_connection.channel_mutex);
> + return ret;
> }
It looks like the original issue is already fixed by this existing patch:
6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
as chan->state can't be CHANNEL_OPENED_STATE when
channel->ringbuffer_page is not set up yet, or has been freed.
Thanks,
-- Dexuan
On Tue, Jan 22, 2019 at 03:46:48AM +0000, Dexuan Cui wrote:
> > From: Kimberly Brown <[email protected]>
> > Sent: Monday, January 21, 2019 6:08 PM
> > Subject: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions
> >
> > The channel level "_show" functions are vulnerable to race conditions.
> > Add a mutex lock and unlock around the call to the channel level "_show"
> > functions in vmbus_chan_attr_show().
> >
> > This problem was discussed here:
> > https://lkml.org/lkml/2018/10/18/830
> >
> > --- a/drivers/hv/vmbus_drv.c
> > +++ b/drivers/hv/vmbus_drv.c
> > @@ -1414,6 +1414,7 @@ static ssize_t vmbus_chan_attr_show(struct kobject
> > *kobj,
> > = container_of(attr, struct vmbus_chan_attribute, attr);
> > const struct vmbus_channel *chan
> > = container_of(kobj, struct vmbus_channel, kobj);
> > + ssize_t ret;
> >
> > if (!attribute->show)
> > return -EIO;
> > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> > kobject *kobj,
> > if (chan->state != CHANNEL_OPENED_STATE)
> > return -EINVAL;
> >
> > - return attribute->show(chan, buf);
> > + mutex_lock(&vmbus_connection.channel_mutex);
> > + ret = attribute->show(chan, buf);
> > + mutex_unlock(&vmbus_connection.channel_mutex);
> > + return ret;
> > }
>
> It looks this patch is already able to fix the original issue:
> 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
> as chan->state can't be CHANNEL_OPENED_STATE when
> channel->ringbuffer_page is not set up yet, or has been freed.
>
> Thanks,
> -- Dexuan
I think that patch 6712cc9c2211 fixes the problem when the channel is
not set up yet, but it doesn't fix the problem when the channel is being
closed. The channel could be freed after the check that "chan->state" is
CHANNEL_OPENED_STATE, while the "attribute->show()" function is running.
Actually, there should be checks that "chan" is not null and that
"chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
need to fix that.
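Roughly, the revised function would look like this (a sketch only, still
using vmbus_connection.channel_mutex; as the rest of the thread shows,
the approach changes in later versions):

static ssize_t vmbus_chan_attr_show(struct kobject *kobj,
                                    struct attribute *attr, char *buf)
{
        const struct vmbus_chan_attribute *attribute
                = container_of(attr, struct vmbus_chan_attribute, attr);
        const struct vmbus_channel *chan
                = container_of(kobj, struct vmbus_channel, kobj);
        ssize_t ret;

        if (!attribute->show)
                return -EIO;

        mutex_lock(&vmbus_connection.channel_mutex);

        /* Sketch: check the state while holding the mutex, so the ring
         * buffers can't be freed between the check and attribute->show().
         */
        if (chan->state != CHANNEL_OPENED_STATE) {
                mutex_unlock(&vmbus_connection.channel_mutex);
                return -EINVAL;
        }

        ret = attribute->show(chan, buf);
        mutex_unlock(&vmbus_connection.channel_mutex);

        return ret;
}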
> From: Kimberly Brown <[email protected]>
> Sent: Monday, January 21, 2019 10:43 PM
> > > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> > > kobject *kobj,
> > > if (chan->state != CHANNEL_OPENED_STATE)
> > > return -EINVAL;
> > >
> > > - return attribute->show(chan, buf);
> > > + mutex_lock(&vmbus_connection.channel_mutex);
> > > + ret = attribute->show(chan, buf);
> > > + mutex_unlock(&vmbus_connection.channel_mutex);
> > > + return ret;
> > > }
> >
> > It looks this patch is already able to fix the original issue:
> > 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
> > as chan->state can't be CHANNEL_OPENED_STATE when
> > channel->ringbuffer_page is not set up yet, or has been freed.
> > -- Dexuan
>
> I think that patch 6712cc9c2211 fixes the problem when the channel is
> not set up yet, but it doesn't fix the problem when the channel is being
> closed
Yeah, now I see your point.
> The channel could be freed after the check that "chan->state" is
> CHANNEL_OPENED_STATE, while the "attribute->show()" function is running.
IMO the channel struct itself can't be freed while the "attribute->show()"
function is running, because I suppose the sysfs interface should have an extra
reference to channel->kobj (see vmbus_add_channel_kobj()), so before the sysfs
files are removed, the channel struct itself can't disappear.
(I didn't check the related code very carefully, so I could be wrong. :-) )
But as you pointed out, at least for sub-channels, channel->ringbuffer_page
can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
the "attribute->show()" could crash when the race happens.
Adding channel_mutex here seems to be able to fix the race for
sub-channels, as the same channel_mutex is used in vmbus_disconnect_ring().
For a primary channel, vmbus_close() -> vmbus_free_ring() can still
free channel->ringbuffer_page without notifying the "attribute->show()".
We may also need to acquire/release the channel_mutex in vmbus_close()?
> Actually, there should be checks that "chan" is not null and that
> "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> need to fix that.
I suppose "chan" can not be NULL here (see the above).
Checking "chan->state" may not help to completely fix the race, because
there is no locking/synchronization code in
vmbus_close_internal() when we test and change "channel->state".
I guess we may need to check if channel->ringbuffer_page is NULL in
the "attribute->show()".
PS, to prove that a race condition does exist and can really cause a panic or
something, I usually add some msleep() delays in different paths so that I
can reproduce the crash every time I take a special action, e.g. when I read
the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove
a patch can indeed help, at least it can fix the crash which would happen
without the patch. :-)
-- Dexuan
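For illustration, a hypothetical debug-only change along those lines (not
part of any posted patch) could widen the window in one of the
channel-level "_show" functions:

/* Debug-only illustration; msleep() needs <linux/delay.h>. */
static ssize_t read_avail_show(const struct vmbus_channel *channel, char *buf)
{
        const struct hv_ring_buffer_info *rbi = &channel->inbound;

        /* Give another CPU time to run vmbus_close() -> vmbus_free_ring()
         * while this sysfs read is still in flight.
         */
        msleep(5000);

        return sprintf(buf, "%u\n", hv_get_bytes_to_read(rbi));
}

Reading the channel's read_avail attribute (roughly
/sys/bus/vmbus/devices/<device>/channels/<relid>/read_avail) while
hv_netvsc is being unloaded then dereferences the freed ring buffer.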
On Tue, Jan 22, 2019 at 06:40:30PM +0000, Dexuan Cui wrote:
> > From: Kimberly Brown <[email protected]>
> > Sent: Monday, January 21, 2019 10:43 PM
> > > > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> > > > kobject *kobj,
> > > > if (chan->state != CHANNEL_OPENED_STATE)
> > > > return -EINVAL;
> > > >
> > > > - return attribute->show(chan, buf);
> > > > + mutex_lock(&vmbus_connection.channel_mutex);
> > > > + ret = attribute->show(chan, buf);
> > > > + mutex_unlock(&vmbus_connection.channel_mutex);
> > > > + return ret;
> > > > }
> > >
> > > It looks this patch is already able to fix the original issue:
> > > 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
> > > as chan->state can't be CHANNEL_OPENED_STATE when
> > > channel->ringbuffer_page is not set up yet, or has been freed.
> > > -- Dexuan
> >
> > I think that patch 6712cc9c2211 fixes the problem when the channel is
> > not set up yet, but it doesn't fix the problem when the channel is being
> > closed
> Yeah, now I see your point.
>
> > The channel could be freed after the check that "chan->state" is
> > CHANNEL_OPENED_STATE, while the "attribute->show()" function is running.
>
> IMO the channel struct itself can't be freed while the "attribute->show()"
> function is running, because I suppose the sysfs interface should have an extra
> reference to channel->kobj (see vmbus_add_channel_kobj()), so before the sysfs
> files are removed, the channel struct itself can't disappear.
> (I didn't check the related code very carefully, so I could be wrong. :-) )
>
I think that you're correct that the channel struct can't be freed while
the "attribute->show()" function is running. I tested this by calling
msleep() in chan_attr_show(), opening a subchannel sysfs file, and
unloading hv_netvsc. Unloading hv_netvsc should result in calls to
vmbus_chan_release() for each subchannel. I confirmed that
vmbus_chan_release() isn't called until the "attribute->show()" function
returns.
> But as you pointed, at least for sub-channels, channel->ringbuffer_page
> can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
> the "attribute->show()" could crash when the race happens.
> Adding channel_mutex here seems to be able to fix the race for
> sub-channels, as the same channel_mutex is used in vmbus_disconnect_ring().
>
> For a primary channel, vmbus_close() -> vmbus_free_ring() can still
> free channel->ringbuffer_page without notifying the "attribute->show()".
> We may also need to acquire/release the channel_mutex in vmbus_close()?
>
> > Actually, there should be checks that "chan" is not null and that
> > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> > need to fix that.
> I suppose "chan" can not be NULL here (see the above).
>
> Checking "chan->state" may not help to completely fix the race, because
> there is no locking/synchronization code in
> vmbus_close_internal() when we test and change "channel->state".
>
The calls to vmbus_close_internal() for the subchannels and the primary
channel are protected with channel_mutex in vmbus_disconnect_ring().
This prevents "channel->state" from changing while "attribute->show()" is
running.
> I guess we may need to check if channel->ringbuffer_page is NULL in
> the "attribute->show()".
>
For the primary channel, vmbus_free_ring() is called after the
return from vmbus_disconnect_ring(). Therefore, the primary channel's
state is changed before "channel->ringbuffer_page" is set to NULL.
Checking the channel state should be sufficient to prevent the ring
buffers from being freed while "attribute->show()" is running. The
ring buffers can't be freed until the channel's state is changed, and
the channel state change is protected by the mutex.
I think checking that "channel->ringbuffer_page" is not NULL would
also work, but, as you stated, we would need to acquire/release
channel_mutex in vmbus_close().
> PS, to prove that a race condition does exist and can really cause a panic or
> something, I usually add some msleep() delays in different paths so that I
> can reproduce the crash every time I take a special action, e.g. when I read
> the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove
> a patch can indeed help, at least it can fix the crash which would happen
> without the patch. :-)
>
Thanks! I was able to free the ring buffers while "attribute->show()"
was running, which caused a null pointer dereference bug. As expected,
the mutex lock fixed it.
> -- Dexuan
> From: Kimberly Brown <[email protected]>
> > ...
> > But as you pointed, at least for sub-channels, channel->ringbuffer_page
> > can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
> > the "attribute->show()" could crash when the race happens.
> > Adding channel_mutex here seems to be able to fix the race for
> > sub-channels, as the same channel_mutex is used in
> vmbus_disconnect_ring().
> >
> > For a primary channel, vmbus_close() -> vmbus_free_ring() can still
> > free channel->ringbuffer_page without notifying the "attribute->show()".
> > We may also need to acquire/release the channel_mutex in vmbus_close()?
> >
> > > Actually, there should be checks that "chan" is not null and that
> > > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> > > need to fix that.
> > I suppose "chan" can not be NULL here (see the above).
> >
> > Checking "chan->state" may not help to completely fix the race, because
> > there is no locking/synchronization code in
> > vmbus_close_internal() when we test and change "channel->state".
> >
>
> The calls to vmbus_close_internal() for the subchannels and the primary
> channel are protected with channel_mutex in vmbus_disconnect_ring().
> This prevents "channel->state" from changing while "attribute->show()" is
> running.
Ah, I think you're right.
> > I guess we may need to check if channel->ringbuffer_page is NULL in
> > the "attribute->show()".
> >
>
> For the primary channel, vmbus_free_ring() is called after the
> return from vmbus_disconnect_ring(). Therefore, the primary channel's
> state is changed before "channel->ringbuffer_page" is set to NULL.
> Checking the channel state should be sufficient to prevent the ring
> buffers from being freed while "attribute->show()" is running. The
> ring buffers can't be freed until the channel's state is changed, and
> the channel state change is protected by the mutex.
I think you're right (I noticed in a previous mail you mentioned you would
improve your patch to check "chan->state" with the mutex held).
> I think checking that "channel->ringbuffer_page" is not NULL would
> also work, but, as you stated, we would need to aquire/release
> channel_mutex in vmbus_close().
Then it looks unnecessary to check "channel->ringbuffer_page".
> > PS, to prove that a race condition does exist and can really cause a panic or
> > something, I usually add some msleep() delays in different paths so that I
> > can reproduce the crash every time I take a special action, e.g. when I read
> > the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove
> > a patch can indeed help, at least it can fix the crash which would happen
> > without the patch. :-)
> >
>
> Thanks! I was able to free the ring buffers while "attribute->show()"
> was running, which caused a null pointer dereference bug. As expected,
> the mutex lock fixed it.
Awesome!
-- Dexuan
On Tue, Jan 29, 2019 at 07:20:28PM +0000, Dexuan Cui wrote:
>> From: Kimberly Brown <[email protected]>
>> > ...
>> > But as you pointed, at least for sub-channels, channel->ringbuffer_page
>> > can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
>> > the "attribute->show()" could crash when the race happens.
>> > Adding channel_mutex here seems to be able to fix the race for
>> > sub-channels, as the same channel_mutex is used in
>> vmbus_disconnect_ring().
>> >
>> > For a primary channel, vmbus_close() -> vmbus_free_ring() can still
>> > free channel->ringbuffer_page without notifying the "attribute->show()".
>> > We may also need to acquire/release the channel_mutex in vmbus_close()?
>> >
>> > > Actually, there should be checks that "chan" is not null and that
>> > > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
>> > > need to fix that.
>> > I suppose "chan" can not be NULL here (see the above).
>> >
>> > Checking "chan->state" may not help to completely fix the race, because
>> > there is no locking/synchronization code in
>> > vmbus_close_internal() when we test and change "channel->state".
>> >
>>
>> The calls to vmbus_close_internal() for the subchannels and the primary
>> channel are protected with channel_mutex in vmbus_disconnect_ring().
>> This prevents "channel->state" from changing while "attribute->show()" is
>> running.
>Ah, I think you're right.
>
>> > I guess we may need to check if channel->ringbuffer_page is NULL in
>> > the "attribute->show()".
>> >
>>
>> For the primary channel, vmbus_free_ring() is called after the
>> return from vmbus_disconnect_ring(). Therefore, the primary channel's
>> state is changed before "channel->ringbuffer_page" is set to NULL.
>> Checking the channel state should be sufficient to prevent the ring
>> buffers from being freed while "attribute->show()" is running. The
>> ring buffers can't be freed until the channel's state is changed, and
>> the channel state change is protected by the mutex.
>I think you're right (I noticed in a previous mail you mentioned you would
>improve your patch to check "chan->state" with the mutax held).
>
>> I think checking that "channel->ringbuffer_page" is not NULL would
>> also work, but, as you stated, we would need to aquire/release
>> channel_mutex in vmbus_close().
>Then it looks unnecessary to check "channel->ringbuffer_page".
>
>> > PS, to prove that a race condition does exist and can really cause a panic or
>> > something, I usually add some msleep() delays in different paths so that I
>> > can reproduce the crash every time I take a special action, e.g. when I read
>> > the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove
>> > a patch can indeed help, at least it can fix the crash which would happen
>> > without the patch. :-)
>> >
>>
>> Thanks! I was able to free the ring buffers while "attribute->show()"
>> was running, which caused a null pointer dereference bug. As expected,
>> the mutex lock fixed it.
>Awesome!
I've queued this one for hyper-fixes, thanks all!
--
Thanks,
Sasha
From: Sasha Levin <[email protected]> Sent: Thursday, January 31, 2019 7:20 AM
>
> I've queued this one for hyper-fixes, thanks all!
>
Actually, please hold off on queuing this one. In a conversation I had yesterday with Kim, they had identified a deadlock. Kim was going to be looking at some revisions to avoid the deadlock.
Kim -- please confirm.
Michael
On Thu, Jan 31, 2019 at 04:45:35PM +0000, Michael Kelley wrote:
> From: Sasha Levin <[email protected]> Sent: Thursday, January 31, 2019 7:20 AM
> >
> > I've queued this one for hyper-fixes, thanks all!
> >
>
> Actually, please hold off on queuing this one. In a conversation I had yesterday with Kim, they had identified a deadlock. Kim was going to be looking at some revisions to avoid the deadlock.
>
> Kim -- please confirm.
>
> Michael
This is correct. I need to send a v2 of this patch which addresses two
issues:
1) Check channel->state inside the mutex acquire/release in
vmbus_chan_attr_show().
2) Prevent a deadlock that can occur between the proposed mutex_lock()
call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
I've identified two possible solutions to the deadlock:
1) Add a new mutex to the vmbus_channel struct which protects changes to
"channel->state". Use this new mutex in vmbus_chan_attr_show() instead of
"vmbus_connection.channel_mutex".
2) Use "vmbus_connection.channel_mutex" in vmbus_chan_attr_show() as
originally proposed, and acquire it with mutex_trylock(). If the mutex
cannot be acquired, return -EINVAL.
I'm leaning towards (2), using mutex_trylock(). "vmbus_connection.channel_mutex"
appears to be used only when a channel is being opened or closed, so
vmbus_chan_attr_show() should be able to acquire the mutex when the
channel is in use.
If anyone has suggestions, please let me know.
Thanks,
Kim
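For reference, option (2) inside vmbus_chan_attr_show() would be roughly
the following (a sketch only; as the rest of the thread shows, this
option was ultimately not pursued):

        /* Sketch: don't block if the channel is being opened, closed or
         * rescinded; just fail the sysfs read in that window.
         */
        if (!mutex_trylock(&vmbus_connection.channel_mutex))
                return -EINVAL;

        if (chan->state != CHANNEL_OPENED_STATE) {
                mutex_unlock(&vmbus_connection.channel_mutex);
                return -EINVAL;
        }

        ret = attribute->show(chan, buf);
        mutex_unlock(&vmbus_connection.channel_mutex);
        return ret;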
On Thu, Jan 31, 2019 at 12:47:07PM -0500, Kimberly Brown wrote:
>On Thu, Jan 31, 2019 at 04:45:35PM +0000, Michael Kelley wrote:
>> From: Sasha Levin <[email protected]> Sent: Thursday, January 31, 2019 7:20 AM
>> >
>> > I've queued this one for hyper-fixes, thanks all!
>> >
>>
>> Actually, please hold off on queuing this one. In a conversation I had yesterday with Kim, they had identified a deadlock. Kim was going to be looking at some revisions to avoid the deadlock.
>>
>> Kim -- please confirm.
>>
>> Michael
>
>This is correct. I need to send a v2 of this patch which addresses two
>issues:
Now dropped, thanks for the heads-up.
--
Thanks,
Sasha
> From: Kimberly Brown <[email protected]>
> Sent: Thursday, January 31, 2019 9:47 AM
> ...
> 2) Prevent a deadlock that can occur between the proposed mutex_lock()
> call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
Hi Kim,
Can you please share more details about the deadlock?
It's unclear to me how the deadlock happens.
> I've identified two possible solutions to the deadlock:
>
> 1) Add a new mutex to the vmbus_channel struct which protects changes to
> "channel->state". Use this new mutex in vmbus_chan_attr_show() instead of
> "vmbus_connection.channel_mutex".
>
> 2) Use "vmbus_connection.channel_mutex" in vmbus_chan_attr_show() as
> originally proposed, and acquire it with mutex_trylock(). If the mutex
> cannot be acquired, return -EINVAL.
It looks more like a workaround. IMO we should try to find a real fix. :-)
> I'm leaning towards (2), using mutex_trylock().
> "vmbus_connection.channel_mutex"
> appears to be used only when a channel is being opened or closed, so
> vmbus_chan_attr_show() should be able to acquire the mutex when the
> channel is in use.
>
> If anyone has suggestions, please let me know.
>
> Thanks,
> Kim
Thanks,
-- Dexuan
On Fri, Feb 01, 2019 at 06:24:24PM +0000, Dexuan Cui wrote:
> > From: Kimberly Brown <[email protected]>
> > Sent: Thursday, January 31, 2019 9:47 AM
> > ...
> > 2) Prevent a deadlock that can occur between the proposed mutex_lock()
> > call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
> Hi Kim,
> Can you please share more details about the deadlock?
> It's unclear to me how the deadlock happens.
>
Hi Dexuan,
I encountered the deadlock by:
1) Adding a call to msleep() before acquiring the mutex in
vmbus_chan_attr_show()
2) Opening a hv_netvsc subchannel's sysfs file
3) Removing hv_netvsc while the sysfs file is open
Here's the lockdep output:
[ 164.167699] hv_vmbus: unregistering driver hv_netvsc
[ 164.171660] ======================================================
[ 164.171661] WARNING: possible circular locking dependency detected
[ 164.171663] 5.0.0-rc1+ #58 Not tainted
[ 164.171664] ------------------------------------------------------
[ 164.171666] kworker/0:1/12 is trying to acquire lock:
[ 164.171668] 00000000664f9893 (kn->count#43){++++}, at: kernfs_remove+0x23/0x40
[ 164.171676]
but task is already holding lock:
[ 164.171677] 000000007b9e8443 (&vmbus_connection.channel_mutex){+.+.}, at: vmbus_onoffer_rescind+0x1ae/0x210 [hv_vmbus]
[ 164.171686]
which lock already depends on the new lock.
[ 164.171687]
the existing dependency chain (in reverse order) is:
[ 164.171688]
-> #1 (&vmbus_connection.channel_mutex){+.+.}:
[ 164.171694] __mutex_lock+0x65/0x9b0
[ 164.171696] mutex_lock_nested+0x1b/0x20
[ 164.171700] vmbus_chan_attr_show+0x3f/0x90 [hv_vmbus]
[ 164.171703] sysfs_kf_seq_show+0xa4/0x130
[ 164.171705] kernfs_seq_show+0x2d/0x30
[ 164.171708] seq_read+0xe2/0x410
[ 164.171711] kernfs_fop_read+0x14e/0x1a0
[ 164.171714] __vfs_read+0x3a/0x1a0
[ 164.171716] vfs_read+0x91/0x140
[ 164.171718] ksys_read+0x58/0xc0
[ 164.171721] __x64_sys_read+0x1a/0x20
[ 164.171724] do_syscall_64+0x65/0x1b0
[ 164.171727] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 164.171728]
-> #0 (kn->count#43){++++}:
[ 164.171732] lock_acquire+0xa3/0x180
[ 164.171734] __kernfs_remove+0x278/0x300
[ 164.171737] kernfs_remove+0x23/0x40
[ 164.171739] sysfs_remove_dir+0x51/0x60
[ 164.171741] kobject_del.part.7+0x13/0x40
[ 164.171743] kobject_put+0x6a/0x1a0
[ 164.171748] hv_process_channel_removal+0xfe/0x180 [hv_vmbus]
[ 164.171752] vmbus_onoffer_rescind+0x20a/0x210 [hv_vmbus]
[ 164.171756] vmbus_onmessage+0x5f/0x150 [hv_vmbus]
[ 164.171760] vmbus_onmessage_work+0x22/0x30 [hv_vmbus]
[ 164.171763] process_one_work+0x291/0x5c0
[ 164.171765] worker_thread+0x34/0x400
[ 164.171767] kthread+0x124/0x140
[ 164.171770] ret_from_fork+0x3a/0x50
[ 164.171771]
other info that might help us debug this:
[ 164.171772] Possible unsafe locking scenario:
[ 164.171773] CPU0 CPU1
[ 164.171775] ---- ----
[ 164.171776] lock(&vmbus_connection.channel_mutex);
[ 164.171777] lock(kn->count#43);
[ 164.171779] lock(&vmbus_connection.channel_mutex);
[ 164.171781] lock(kn->count#43);
[ 164.171783]
*** DEADLOCK ***
[ 164.171785] 3 locks held by kworker/0:1/12:
[ 164.171786] #0: 000000002128b29f ((wq_completion)"events"){+.+.}, at: process_one_work+0x20f/0x5c0
[ 164.171790] #1: 0000000041d2602c ((work_completion)(&ctx->work)){+.+.}, at: process_one_work+0x20f/0x5c0
[ 164.171794] #2: 000000007b9e8443 (&vmbus_connection.channel_mutex){+.+.}, at: vmbus_onoffer_rescind+0x1ae/0x210 [hv_vmbus]
[ 164.171799]
stack backtrace:
[ 164.171802] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.0.0-rc1+ #58
[ 164.171804] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
[ 164.171809] Workqueue: events vmbus_onmessage_work [hv_vmbus]
[ 164.171811] Call Trace:
[ 164.171816] dump_stack+0x8e/0xd5
[ 164.171819] print_circular_bug.isra.37+0x1e7/0x1f5
[ 164.171822] __lock_acquire+0x1427/0x1490
[ 164.171826] ? sched_clock+0x9/0x10
[ 164.171830] lock_acquire+0xa3/0x180
[ 164.171832] ? lock_acquire+0xa3/0x180
[ 164.171835] ? kernfs_remove+0x23/0x40
[ 164.171838] __kernfs_remove+0x278/0x300
[ 164.171840] ? kernfs_remove+0x23/0x40
[ 164.171843] kernfs_remove+0x23/0x40
[ 164.171846] sysfs_remove_dir+0x51/0x60
[ 164.171848] kobject_del.part.7+0x13/0x40
[ 164.171850] kobject_put+0x6a/0x1a0
[ 164.171855] hv_process_channel_removal+0xfe/0x180 [hv_vmbus]
[ 164.171859] vmbus_onoffer_rescind+0x20a/0x210 [hv_vmbus]
[ 164.171863] vmbus_onmessage+0x5f/0x150 [hv_vmbus]
[ 164.171868] vmbus_onmessage_work+0x22/0x30 [hv_vmbus]
[ 164.171870] process_one_work+0x291/0x5c0
[ 164.171874] worker_thread+0x34/0x400
[ 164.171877] kthread+0x124/0x140
[ 164.171879] ? process_one_work+0x5c0/0x5c0
[ 164.171881] ? kthread_park+0x90/0x90
[ 164.171884] ret_from_fork+0x3a/0x50
==========================================================================
> > I've identified two possible solutions to the deadlock:
> >
> > 1) Add a new mutex to the vmbus_channel struct which protects changes to
> > "channel->state". Use this new mutex in vmbus_chan_attr_show() instead of
> > "vmbus_connection.channel_mutex".
> >
> > 2) Use "vmbus_connection.channel_mutex" in vmbus_chan_attr_show() as
> > originally proposed, and acquire it with mutex_trylock(). If the mutex
> > cannot be acquired, return -EINVAL.
> It looks more like a workaround. IMO we should try to find a real fix. :-)
>
Good point. I think that the root cause of the deadlock problem is that
the calls to free_channel() in hv_process_channel_removal() are within a
section that's locked with vmbus_connection.channel_mutex. So, first, I
need to determine whether free_channel() needs to be called within the
locked section.
If free_channel() does need to be called within the locked section, then
there's no way to prevent the possibility of a deadlock when
vmbus_connection.channel_mutex is used in vmbus_chan_attr_show(). A
different solution, such as a new mutex to protect changes to
"channel->state", is necessary.
If free_channel() does not need to be in the locked section, then we may
be able to restructure the functions to prevent the deadlock.
I see that there was a race condition problem when
hv_process_channel_removal() was called on an already freed channel
(fixed in commit 192b2d78722f, "Fix bugs in rescind handling"), so I
suspect that free_channel() needs to stay in the locked section.
> > I'm leaning towards (2), using mutex_trylock().
> > "vmbus_connection.channel_mutex"
> > appears to be used only when a channel is being opened or closed, so
> > vmbus_chan_attr_show() should be able to acquire the mutex when the
> > channel is in use.
> >
> > If anyone has suggestions, please let me know.
> >
> > Thanks,
> > Kim
>
> Thanks,
> -- Dexuan
On Sat, Feb 02, 2019 at 03:07:35PM -0500, Kimberly Brown wrote:
>On Fri, Feb 01, 2019 at 06:24:24PM +0000, Dexuan Cui wrote:
>> > From: Kimberly Brown <[email protected]>
>> > Sent: Thursday, January 31, 2019 9:47 AM
>> > ...
>> > 2) Prevent a deadlock that can occur between the proposed mutex_lock()
>> > call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
>> Hi Kim,
>> Can you please share more details about the deadlock?
>> It's unclear to me how the deadlock happens.
>>
>
>Hi Dexuan,
>
>I encountered the deadlock by:
>1) Adding a call to msleep() before acquiring the mutex in
>vmbus_chan_attr_show()
>2) Opening a hv_netvsc subchannel's sysfs file
>3) Removing hv_netvsc while the sysfs file is opening
Dexuan, any objections to pulling this fix in?
--
Thanks,
Sasha
On Thu, Feb 14, 2019 at 08:54:31PM -0500, Sasha Levin wrote:
> On Sat, Feb 02, 2019 at 03:07:35PM -0500, Kimberly Brown wrote:
> > On Fri, Feb 01, 2019 at 06:24:24PM +0000, Dexuan Cui wrote:
> > > > From: Kimberly Brown <[email protected]>
> > > > Sent: Thursday, January 31, 2019 9:47 AM
> > > > ...
> > > > 2) Prevent a deadlock that can occur between the proposed mutex_lock()
> > > > call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
> > > Hi Kim,
> > > Can you please share more details about the deadlock?
> > > It's unclear to me how the deadlock happens.
> > >
> >
> > Hi Dexuan,
> >
> > I encountered the deadlock by:
> > 1) Adding a call to msleep() before acquiring the mutex in
> > vmbus_chan_attr_show()
> > 2) Opening a hv_netvsc subchannel's sysfs file
> > 3) Removing hv_netvsc while the sysfs file is opening
>
> Dexuan, any objections to pulling this fix in?
>
Hi Sasha,
This fix can cause a deadlock. I'm working on a different fix for the
original race condition problem.
Thanks,
Kim
> --
> Thanks,
> Sasha
This patchset fixes a race condition vulnerability in the "_show"
functions that access a channel ring buffer.
Changes in v2:
- In v1, I proposed using “vmbus_connection.channel_mutex” in the
“_show” functions to prevent the race condition. However, using this
mutex could result in a deadlock, so a new approach is needed.
- Patch 1 is new and consists of a code refactor.
- Patch 2 introduces a new mutex lock in the “vmbus_channel” struct,
and the new mutex is used to eliminate the race condition.
Kimberly Brown (2):
Drivers: hv: vmbus: Refactor chan->state if statement
Drivers: hv: vmbus: Add a channel ring buffer mutex lock
drivers/hv/channel.c | 5 ++
drivers/hv/channel_mgmt.c | 1 +
drivers/hv/ring_buffer.c | 11 +++-
drivers/hv/vmbus_drv.c | 118 ++++++++++++++++++++++++++------------
include/linux/hyperv.h | 10 +++-
5 files changed, 104 insertions(+), 41 deletions(-)
--
2.17.1
The chan->state "if statement" was introduced in commit 6712cc9c2211
("vmbus: don't return values for uninitalized channels"). That commit
states that the purpose of the chan->state "if statement" is to prevent
returning garbage or causing a kernel OOPS when the channel ring buffer
is not initialized. The changes in this patch provide the same
protection.
Refactor the chan->state “if statement” in vmbus_chan_attr_show():
- Instead of checking the channel state in the "if statement", check
whether the channel ring buffer pointer is NULL. Checking the
ring buffer pointer makes this code consistent with
hv_ringbuffer_get_debuginfo().
- Move the "if statement" to the four "_show" functions that access a
channel ring buffer. Only four of the channel-level "_show" functions
access a ring buffer. The ring buffer pointer does not need to be
checked before calling the other "_show" functions, and moving the
ring buffer pointer "if statement" to the "_show" functions that
access a ring buffer makes the purpose of the "if statement" clear.
Signed-off-by: Kimberly Brown <[email protected]>
---
drivers/hv/vmbus_drv.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index f290401f97e5..b02bcf1a9380 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1434,9 +1434,6 @@ static ssize_t vmbus_chan_attr_show(struct kobject *kobj,
if (!attribute->show)
return -EIO;
- if (chan->state != CHANNEL_OPENED_STATE)
- return -EINVAL;
-
return attribute->show(chan, buf);
}
@@ -1448,6 +1445,9 @@ static ssize_t out_mask_show(const struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->outbound;
+ if (!rbi->ring_buffer)
+ return -EINVAL;
+
return sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
}
static VMBUS_CHAN_ATTR_RO(out_mask);
@@ -1456,6 +1456,9 @@ static ssize_t in_mask_show(const struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->inbound;
+ if (!rbi->ring_buffer)
+ return -EINVAL;
+
return sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
}
static VMBUS_CHAN_ATTR_RO(in_mask);
@@ -1464,6 +1467,9 @@ static ssize_t read_avail_show(const struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->inbound;
+ if (!rbi->ring_buffer)
+ return -EINVAL;
+
return sprintf(buf, "%u\n", hv_get_bytes_to_read(rbi));
}
static VMBUS_CHAN_ATTR_RO(read_avail);
@@ -1472,6 +1478,9 @@ static ssize_t write_avail_show(const struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->outbound;
+ if (!rbi->ring_buffer)
+ return -EINVAL;
+
return sprintf(buf, "%u\n", hv_get_bytes_to_write(rbi));
}
static VMBUS_CHAN_ATTR_RO(write_avail);
--
2.17.1
The "_show" functions that access channel ring buffer data are
vulnerable to a race condition that can result in a NULL pointer
dereference. This problem was discussed here:
https://lkml.org/lkml/2018/10/18/779
To prevent this from occurring, add a new mutex lock,
"ring_buffer_mutex", to the vmbus_channel struct.
Acquire/release "ring_buffer_mutex" in the functions that can set the
ring buffer pointer to NULL: vmbus_free_ring() and __vmbus_open().
Acquire/release "ring_buffer_mutex" in the four channel-level "_show"
functions that access ring buffer data. Remove the "const" qualifier
from the "struct vmbus_channel *chan" parameter of the channel-level
"_show" functions so that "ring_buffer_mutex" can be acquired/released
in these functions.
Acquire/release "ring_buffer_mutex" in hv_ringbuffer_get_debuginfo().
Pass the channel pointer to hv_ringbuffer_get_debuginfo() so that
"ring_buffer_mutex" can be accessed in this function.
Signed-off-by: Kimberly Brown <[email protected]>
---
drivers/hv/channel.c | 5 ++
drivers/hv/channel_mgmt.c | 1 +
drivers/hv/ring_buffer.c | 11 +++-
drivers/hv/vmbus_drv.c | 111 ++++++++++++++++++++++++--------------
include/linux/hyperv.h | 10 +++-
5 files changed, 96 insertions(+), 42 deletions(-)
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 23381c41d087..7770e97e4202 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -82,8 +82,10 @@ EXPORT_SYMBOL_GPL(vmbus_setevent);
/* vmbus_free_ring - drop mapping of ring buffer */
void vmbus_free_ring(struct vmbus_channel *channel)
{
+ mutex_lock(&channel->ring_buffer_mutex);
hv_ringbuffer_cleanup(&channel->outbound);
hv_ringbuffer_cleanup(&channel->inbound);
+ mutex_unlock(&channel->ring_buffer_mutex);
if (channel->ringbuffer_page) {
__free_pages(channel->ringbuffer_page,
@@ -241,8 +243,11 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
newchannel->ringbuffer_gpadlhandle = 0;
error_clean_ring:
+ mutex_lock(&newchannel->ring_buffer_mutex);
hv_ringbuffer_cleanup(&newchannel->outbound);
hv_ringbuffer_cleanup(&newchannel->inbound);
+ mutex_unlock(&newchannel->ring_buffer_mutex);
+
newchannel->state = CHANNEL_OPEN_STATE;
return err;
}
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 62703b354d6d..769873cddfe5 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -329,6 +329,7 @@ static struct vmbus_channel *alloc_channel(void)
spin_lock_init(&channel->lock);
init_completion(&channel->rescind_event);
+ mutex_init(&channel->ring_buffer_mutex);
INIT_LIST_HEAD(&channel->sc_list);
INIT_LIST_HEAD(&channel->percpu_list);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 9e8b31ccc142..35de60d2c1e8 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -167,13 +167,18 @@ hv_get_ringbuffer_availbytes(const struct hv_ring_buffer_info *rbi,
/* Get various debug metrics for the specified ring buffer. */
int hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
- struct hv_ring_buffer_debug_info *debug_info)
+ struct hv_ring_buffer_debug_info *debug_info,
+ struct vmbus_channel *channel)
{
u32 bytes_avail_towrite;
u32 bytes_avail_toread;
- if (!ring_info->ring_buffer)
+ mutex_lock(&channel->ring_buffer_mutex);
+
+ if (!ring_info->ring_buffer) {
+ mutex_unlock(&channel->ring_buffer_mutex);
return -EINVAL;
+ }
hv_get_ringbuffer_availbytes(ring_info,
&bytes_avail_toread,
@@ -184,6 +189,8 @@ int hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
debug_info->current_write_index = ring_info->ring_buffer->write_index;
debug_info->current_interrupt_mask
= ring_info->ring_buffer->interrupt_mask;
+ mutex_unlock(&channel->ring_buffer_mutex);
+
return 0;
}
EXPORT_SYMBOL_GPL(hv_ringbuffer_get_debuginfo);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index b02bcf1a9380..1ff767795d0a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -345,9 +345,8 @@ static ssize_t out_intr_mask_show(struct device *dev,
if (!hv_dev->channel)
return -ENODEV;
-
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
- &outbound);
+ &outbound, hv_dev->channel);
if (ret < 0)
return ret;
@@ -366,7 +365,7 @@ static ssize_t out_read_index_show(struct device *dev,
return -ENODEV;
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
- &outbound);
+ &outbound, hv_dev->channel);
if (ret < 0)
return ret;
return sprintf(buf, "%d\n", outbound.current_read_index);
@@ -385,7 +384,7 @@ static ssize_t out_write_index_show(struct device *dev,
return -ENODEV;
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
- &outbound);
+ &outbound, hv_dev->channel);
if (ret < 0)
return ret;
return sprintf(buf, "%d\n", outbound.current_write_index);
@@ -404,7 +403,7 @@ static ssize_t out_read_bytes_avail_show(struct device *dev,
return -ENODEV;
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
- &outbound);
+ &outbound, hv_dev->channel);
if (ret < 0)
return ret;
return sprintf(buf, "%d\n", outbound.bytes_avail_toread);
@@ -423,7 +422,7 @@ static ssize_t out_write_bytes_avail_show(struct device *dev,
return -ENODEV;
ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
- &outbound);
+ &outbound, hv_dev->channel);
if (ret < 0)
return ret;
return sprintf(buf, "%d\n", outbound.bytes_avail_towrite);
@@ -440,7 +439,8 @@ static ssize_t in_intr_mask_show(struct device *dev,
if (!hv_dev->channel)
return -ENODEV;
- ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
+ ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound,
+ hv_dev->channel);
if (ret < 0)
return ret;
@@ -458,7 +458,8 @@ static ssize_t in_read_index_show(struct device *dev,
if (!hv_dev->channel)
return -ENODEV;
- ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
+ ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound,
+ hv_dev->channel);
if (ret < 0)
return ret;
@@ -476,7 +477,8 @@ static ssize_t in_write_index_show(struct device *dev,
if (!hv_dev->channel)
return -ENODEV;
- ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
+ ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound,
+ hv_dev->channel);
if (ret < 0)
return ret;
@@ -495,7 +497,8 @@ static ssize_t in_read_bytes_avail_show(struct device *dev,
if (!hv_dev->channel)
return -ENODEV;
- ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
+ ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound,
+ hv_dev->channel);
if (ret < 0)
return ret;
@@ -514,7 +517,8 @@ static ssize_t in_write_bytes_avail_show(struct device *dev,
if (!hv_dev->channel)
return -ENODEV;
- ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
+ ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound,
+ hv_dev->channel);
if (ret < 0)
return ret;
@@ -1409,7 +1413,7 @@ static void vmbus_chan_release(struct kobject *kobj)
struct vmbus_chan_attribute {
struct attribute attr;
- ssize_t (*show)(const struct vmbus_channel *chan, char *buf);
+ ssize_t (*show)(struct vmbus_channel *chan, char *buf);
ssize_t (*store)(struct vmbus_channel *chan,
const char *buf, size_t count);
};
@@ -1428,7 +1432,7 @@ static ssize_t vmbus_chan_attr_show(struct kobject *kobj,
{
const struct vmbus_chan_attribute *attribute
= container_of(attr, struct vmbus_chan_attribute, attr);
- const struct vmbus_channel *chan
+ struct vmbus_channel *chan
= container_of(kobj, struct vmbus_channel, kobj);
if (!attribute->show)
@@ -1441,58 +1445,89 @@ static const struct sysfs_ops vmbus_chan_sysfs_ops = {
.show = vmbus_chan_attr_show,
};
-static ssize_t out_mask_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t out_mask_show(struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->outbound;
+ ssize_t ret;
+
+ mutex_lock(&channel->ring_buffer_mutex);
- if (!rbi->ring_buffer)
+ if (!rbi->ring_buffer) {
+ mutex_unlock(&channel->ring_buffer_mutex);
return -EINVAL;
+ }
- return sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
+ ret = sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
+ mutex_unlock(&channel->ring_buffer_mutex);
+
+ return ret;
}
static VMBUS_CHAN_ATTR_RO(out_mask);
-static ssize_t in_mask_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t in_mask_show(struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->inbound;
+ ssize_t ret;
- if (!rbi->ring_buffer)
+ mutex_lock(&channel->ring_buffer_mutex);
+
+ if (!rbi->ring_buffer) {
+ mutex_unlock(&channel->ring_buffer_mutex);
return -EINVAL;
+ }
+
+ ret = sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
+ mutex_unlock(&channel->ring_buffer_mutex);
- return sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
+ return ret;
}
static VMBUS_CHAN_ATTR_RO(in_mask);
-static ssize_t read_avail_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t read_avail_show(struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->inbound;
+ ssize_t ret;
+
+ mutex_lock(&channel->ring_buffer_mutex);
- if (!rbi->ring_buffer)
+ if (!rbi->ring_buffer) {
+ mutex_unlock(&channel->ring_buffer_mutex);
return -EINVAL;
+ }
+
+ ret = sprintf(buf, "%u\n", hv_get_bytes_to_read(rbi));
+ mutex_unlock(&channel->ring_buffer_mutex);
- return sprintf(buf, "%u\n", hv_get_bytes_to_read(rbi));
+ return ret;
}
static VMBUS_CHAN_ATTR_RO(read_avail);
-static ssize_t write_avail_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t write_avail_show(struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->outbound;
+ ssize_t ret;
- if (!rbi->ring_buffer)
+ mutex_lock(&channel->ring_buffer_mutex);
+
+ if (!rbi->ring_buffer) {
+ mutex_unlock(&channel->ring_buffer_mutex);
return -EINVAL;
+ }
- return sprintf(buf, "%u\n", hv_get_bytes_to_write(rbi));
+ ret = sprintf(buf, "%u\n", hv_get_bytes_to_write(rbi));
+ mutex_unlock(&channel->ring_buffer_mutex);
+
+ return ret;
}
static VMBUS_CHAN_ATTR_RO(write_avail);
-static ssize_t show_target_cpu(const struct vmbus_channel *channel, char *buf)
+static ssize_t show_target_cpu(struct vmbus_channel *channel, char *buf)
{
return sprintf(buf, "%u\n", channel->target_cpu);
}
static VMBUS_CHAN_ATTR(cpu, S_IRUGO, show_target_cpu, NULL);
-static ssize_t channel_pending_show(const struct vmbus_channel *channel,
- char *buf)
+static ssize_t channel_pending_show(struct vmbus_channel *channel, char *buf)
{
if (!channel->offermsg.monitor_allocated)
return -EINVAL;
@@ -1503,8 +1538,7 @@ static ssize_t channel_pending_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(pending, S_IRUGO, channel_pending_show, NULL);
-static ssize_t channel_latency_show(const struct vmbus_channel *channel,
- char *buf)
+static ssize_t channel_latency_show(struct vmbus_channel *channel, char *buf)
{
if (!channel->offermsg.monitor_allocated)
return -EINVAL;
@@ -1515,19 +1549,19 @@ static ssize_t channel_latency_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(latency, S_IRUGO, channel_latency_show, NULL);
-static ssize_t channel_interrupts_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t channel_interrupts_show(struct vmbus_channel *channel, char *buf)
{
return sprintf(buf, "%llu\n", channel->interrupts);
}
static VMBUS_CHAN_ATTR(interrupts, S_IRUGO, channel_interrupts_show, NULL);
-static ssize_t channel_events_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t channel_events_show(struct vmbus_channel *channel, char *buf)
{
return sprintf(buf, "%llu\n", channel->sig_events);
}
static VMBUS_CHAN_ATTR(events, S_IRUGO, channel_events_show, NULL);
-static ssize_t channel_intr_in_full_show(const struct vmbus_channel *channel,
+static ssize_t channel_intr_in_full_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%llu\n",
@@ -1535,7 +1569,7 @@ static ssize_t channel_intr_in_full_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(intr_in_full, 0444, channel_intr_in_full_show, NULL);
-static ssize_t channel_intr_out_empty_show(const struct vmbus_channel *channel,
+static ssize_t channel_intr_out_empty_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%llu\n",
@@ -1543,7 +1577,7 @@ static ssize_t channel_intr_out_empty_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(intr_out_empty, 0444, channel_intr_out_empty_show, NULL);
-static ssize_t channel_out_full_first_show(const struct vmbus_channel *channel,
+static ssize_t channel_out_full_first_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%llu\n",
@@ -1551,7 +1585,7 @@ static ssize_t channel_out_full_first_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(out_full_first, 0444, channel_out_full_first_show, NULL);
-static ssize_t channel_out_full_total_show(const struct vmbus_channel *channel,
+static ssize_t channel_out_full_total_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%llu\n",
@@ -1559,7 +1593,7 @@ static ssize_t channel_out_full_total_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(out_full_total, 0444, channel_out_full_total_show, NULL);
-static ssize_t subchannel_monitor_id_show(const struct vmbus_channel *channel,
+static ssize_t subchannel_monitor_id_show(struct vmbus_channel *channel,
char *buf)
{
if (!channel->offermsg.monitor_allocated)
@@ -1569,8 +1603,7 @@ static ssize_t subchannel_monitor_id_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(monitor_id, S_IRUGO, subchannel_monitor_id_show, NULL);
-static ssize_t subchannel_id_show(const struct vmbus_channel *channel,
- char *buf)
+static ssize_t subchannel_id_show(struct vmbus_channel *channel, char *buf)
{
return sprintf(buf, "%u\n",
channel->offermsg.offer.sub_channel_index);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 64698ec8f2ac..6a6f79d7beba 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -934,6 +934,13 @@ struct vmbus_channel {
* full outbound ring buffer.
*/
u64 out_full_first;
+
+ /*
+ * The mutex lock that protects the channel's ring buffers. It's used to
+ * prevent the ring buffer pointers from being set to NULL while a
+ * function is accessing ring buffer data.
+ */
+ struct mutex ring_buffer_mutex;
};
static inline bool is_hvsock_channel(const struct vmbus_channel *c)
@@ -1207,7 +1214,8 @@ struct hv_ring_buffer_debug_info {
int hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
- struct hv_ring_buffer_debug_info *debug_info);
+ struct hv_ring_buffer_debug_info *debug_info,
+ struct vmbus_channel *channel);
/* Vmbus interface */
#define vmbus_driver_register(driver) \
--
2.17.1
From: Kimberly Brown <[email protected]> Sent: Thursday, February 21, 2019 7:47 PM
>
> The chan->state "if statement" was introduced in commit 6712cc9c2211
> ("vmbus: don't return values for uninitalized channels"). That commit
> states that the purpose of the chan->state "if statement" is to prevent
> returning garbage or causing a kernel OOPS when the channel ring buffer
> is not initialized. The changes in this patch provide the same
> protection.
>
> Refactor the chan->state “if statement” in vmbus_chan_attr_show():
> - Instead of checking the channel state in the "if statement", check
> whether the channel ring buffer pointer is NULL. Checking the
> ring buffer pointer makes this code consistent with
> hv_ringbuffer_get_debuginfo().
>
> - Move the "if statement" to the four "_show" functions that access a
> channel ring buffer. Only four of the channel-level "_show" functions
> access a ring buffer. The ring buffer pointer does not need to be
> checked before calling the other "_show" functions, and moving the
> ring buffer pointer "if statement" to the "_show" functions that
> access a ring buffer makes the purpose of the "if statement" clear.
>
> Signed-off-by: Kimberly Brown <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
From: Kimberly Brown <[email protected]> Sent: Thursday, February 21, 2019 7:47 PM
>
> The "_show" functions that access channel ring buffer data are
> vulnerable to a race condition that can result in a NULL pointer
> dereference. This problem was discussed here:
> https://lkml.org/lkml/2018/10/18/779
>
> To prevent this from occurring, add a new mutex lock,
> "ring_buffer_mutex", to the vmbus_channel struct.
>
> Acquire/release "ring_buffer_mutex" in the functions that can set the
> ring buffer pointer to NULL: vmbus_free_ring() and __vmbus_open().
>
> Acquire/release "ring_buffer_mutex" in the four channel-level "_show"
> functions that access ring buffer data. Remove the "const" qualifier
> from the "struct vmbus_channel *chan" parameter of the channel-level
> "_show" functions so that "ring_buffer_mutex" can be acquired/released
> in these functions.
>
> Acquire/release "ring_buffer_mutex" in hv_ringbuffer_get_debuginfo().
> Pass the channel pointer to hv_ringbuffer_get_debuginfo() so that
> "ring_buffer_mutex" can be accessed in this function.
>
> Signed-off-by: Kimberly Brown <[email protected]>
I've reviewed the code. I believe it is correct and fixes the race
condition. Unfortunately, the code ended up being messier than I
had hoped, and in particular, the need to pass the channel pointer
into the ring buffer functions is distasteful. An alternate idea is to
put the new mutex into the hv_ring_buffer_info structure. This results
in two mutexes, since there's a separate hv_ring_buffer_info structure for
the "in" ring and the "out" ring. But it makes the ring buffer functions
more self-contained and able to operate without knowledge of the
channel. The mutex can be obtained in hv_ringbuffer_cleanup() instead
of in the vmbus functions, and hv_ringbuffer_get_debuginfo() doesn't
need the channel pointer.
The "const" still has to dropped from the channel pointer because
the hv_ring_buffer_info structures are inline in the channel structure,
but that's less objectionable. The extra memory for two mutexes isn't
really a problem, and none of the code paths are performance
sensitive.
It's a tradeoff. I think I slightly prefer moving the mutex to the
hv_ring_buffer_info structure, but could also be persuaded to
take it like it is.
Thoughts?
Michael
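A rough sketch of that alternative (the field placement and the "_show"
change shown here are assumptions; the v3 patchset later in the thread
describes the actual implementation):

/* Sketch only: exact field name and placement are assumed. */
struct hv_ring_buffer_info {
        struct hv_ring_buffer *ring_buffer;
        /* ... existing fields ... */

        /* Protects ring_buffer from being unmapped/NULLed during reads. */
        struct mutex ring_buffer_mutex;
};

static ssize_t out_mask_show(struct vmbus_channel *channel, char *buf)
{
        struct hv_ring_buffer_info *rbi = &channel->outbound;
        ssize_t ret;

        mutex_lock(&rbi->ring_buffer_mutex);

        if (!rbi->ring_buffer) {
                mutex_unlock(&rbi->ring_buffer_mutex);
                return -EINVAL;
        }

        ret = sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
        mutex_unlock(&rbi->ring_buffer_mutex);

        return ret;
}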
On Sun, Feb 24, 2019 at 04:53:03PM +0000, Michael Kelley wrote:
> From: Kimberly Brown <[email protected]> Sent: Thursday, February 21, 2019 7:47 PM
> >
> > The "_show" functions that access channel ring buffer data are
> > vulnerable to a race condition that can result in a NULL pointer
> > dereference. This problem was discussed here:
> > https://lkml.org/lkml/2018/10/18/779
> >
> > To prevent this from occurring, add a new mutex lock,
> > "ring_buffer_mutex", to the vmbus_channel struct.
> >
> > Acquire/release "ring_buffer_mutex" in the functions that can set the
> > ring buffer pointer to NULL: vmbus_free_ring() and __vmbus_open().
> >
> > Acquire/release "ring_buffer_mutex" in the four channel-level "_show"
> > functions that access ring buffer data. Remove the "const" qualifier
> > from the "struct vmbus_channel *chan" parameter of the channel-level
> > "_show" functions so that "ring_buffer_mutex" can be acquired/released
> > in these functions.
> >
> > Acquire/release "ring_buffer_mutex" in hv_ringbuffer_get_debuginfo().
> > Pass the channel pointer to hv_ringbuffer_get_debuginfo() so that
> > "ring_buffer_mutex" can be accessed in this function.
> >
> > Signed-off-by: Kimberly Brown <[email protected]>
>
> I've reviewed the code. I believe it is correct and fixes the race
> condition. Unfortunately, the code ended up being messier than I
> had hoped, and in particular, the need to pass the channel pointer
> into the ring buffer functions is distasteful. An alternate idea is to
> put the new mutex into the hv_ring_buffer_info structure. This results
> in two mutex's since there's a separate hv_ring_buffer_info structure for
> the "in" ring and the "out" ring. But it makes the ring buffer functions
> more self-contained and able to operate without knowledge of the
> channel. The mutex can be obtained in hv_ringbuffer_cleanup() instead
> of in the vmbus functions, and hv_ringbuffer_get_debuginfo() doesn't
> need the channel pointer.
>
> The "const" still has to dropped from the channel pointer because
> the hv_ring_buffer_info structures are inline in the channel structure,
> but that's less objectionable. The extra memory for two mutex's isn't
> really a problem, and none of the code paths are performance
> sensitive.
>
> It's a tradeoff. I think I slightly prefer moving the mutex to the
> hv_ring_buffer_info structure, but could also be persuaded to
> take it like it is.
>
Thanks for the feedback! I don't have a compelling reason to keep the
lock in the vmbus_channel struct. I chose this approach because only one
lock would be required, rather than two. But, as you noted, using one
lock requires some tradeoffs.
I've looked through the changes that would be required to use two locks,
and I agree with you; I prefer using two locks. I'll submit a v3 for this
patch.
Thanks,
Kim
> Thoughts?
>
> Michael
>
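A rough sketch of what the ring-buffer side of the two-lock approach
could look like (assumed shape; the v3 patches below describe the actual
changes):

/* Sketch; the actual v3 patch may differ in details. */
void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
{
        /* Called once when the channel is allocated, before the ring
         * buffers can ever be set up or torn down.
         */
        mutex_init(&channel->inbound.ring_buffer_mutex);
        mutex_init(&channel->outbound.ring_buffer_mutex);
}

void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info)
{
        /* Hold the mutex while the ring buffer pointer is torn down, so a
         * concurrent "_show" either sees a valid pointer or NULL.
         */
        mutex_lock(&ring_info->ring_buffer_mutex);

        vunmap(ring_info->ring_buffer);
        ring_info->ring_buffer = NULL;

        mutex_unlock(&ring_info->ring_buffer_mutex);
}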
This patchset fixes a race condition in the "_show" functions that
access the channel ring buffers.
Changes in v3:
Patch 1: Drivers: hv: vmbus: Refactor chan->state if statement
- Added the “reviewed-by” line from v2.
Patch 2: Drivers: hv: vmbus: Set ring_info field to 0 and remove memset
- This patch is new. This change allows the new mutex locks in patch 3
to be initialized when the channel is initialized.
Patch 3: Drivers: hv: vmbus: Fix race condition with new
ring_buffer_info mutex
- Added two ring buffer info mutex locks instead of the single channel
mutex lock that was proposed in v2.
- Changed the mutex acquire/release calls as needed for the new ring
buffer info mutex locks.
- Updated the commit message.
Changes in v2:
- In v1, I proposed using “vmbus_connection.channel_mutex” in the
“_show” functions to prevent the race condition. However, using this
mutex could result in a deadlock, so a new approach is proposed in
this patchset.
- Patch 1 is new and consists of refactoring an if statement.
- Patch 2 introduces a new mutex lock in the “vmbus_channel” struct,
which is used to eliminate the race condition.
Kimberly Brown (3):
Drivers: hv: vmbus: Refactor chan->state if statement
Drivers: hv: vmbus: Set ring_info field to 0 and remove memset
Drivers: hv: vmbus: Fix race condition with new ring_buffer_info mutex
drivers/hv/channel_mgmt.c | 2 +
drivers/hv/hyperv_vmbus.h | 1 +
drivers/hv/ring_buffer.c | 22 ++++++++--
drivers/hv/vmbus_drv.c | 89 +++++++++++++++++++++++++++------------
include/linux/hyperv.h | 7 ++-
5 files changed, 88 insertions(+), 33 deletions(-)
--
2.17.1
The chan->state "if statement" was introduced in commit 6712cc9c2211
("vmbus: don't return values for uninitalized channels"). That commit
states that the purpose of the chan->state "if statement" is to prevent
returning garbage or causing a kernel OOPS when the channel ring buffer
is not initialized. The changes in this patch provide the same
protection.
Refactor the chan->state "if statement" in vmbus_chan_attr_show():
- Instead of checking the channel state in the "if statement", check
whether the channel ring buffer pointer is NULL. Checking the
ring buffer pointer makes this code consistent with
hv_ringbuffer_get_debuginfo().
- Move the "if statement" to the four "_show" functions that access a
channel ring buffer. Only four of the channel-level "_show" functions
access a ring buffer. The ring buffer pointer does not need to be
checked before calling the other "_show" functions, and moving the
ring buffer pointer "if statement" to the "_show" functions that
access a ring buffer makes the purpose of the "if statement" clear.
Signed-off-by: Kimberly Brown <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
---
drivers/hv/vmbus_drv.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index a7bb709870a8..7f15c41d952e 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1435,9 +1435,6 @@ static ssize_t vmbus_chan_attr_show(struct kobject *kobj,
if (!attribute->show)
return -EIO;
- if (chan->state != CHANNEL_OPENED_STATE)
- return -EINVAL;
-
return attribute->show(chan, buf);
}
@@ -1449,6 +1446,9 @@ static ssize_t out_mask_show(const struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->outbound;
+ if (!rbi->ring_buffer)
+ return -EINVAL;
+
return sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
}
static VMBUS_CHAN_ATTR_RO(out_mask);
@@ -1457,6 +1457,9 @@ static ssize_t in_mask_show(const struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->inbound;
+ if (!rbi->ring_buffer)
+ return -EINVAL;
+
return sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
}
static VMBUS_CHAN_ATTR_RO(in_mask);
@@ -1465,6 +1468,9 @@ static ssize_t read_avail_show(const struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->inbound;
+ if (!rbi->ring_buffer)
+ return -EINVAL;
+
return sprintf(buf, "%u\n", hv_get_bytes_to_read(rbi));
}
static VMBUS_CHAN_ATTR_RO(read_avail);
@@ -1473,6 +1479,9 @@ static ssize_t write_avail_show(const struct vmbus_channel *channel, char *buf)
{
const struct hv_ring_buffer_info *rbi = &channel->outbound;
+ if (!rbi->ring_buffer)
+ return -EINVAL;
+
return sprintf(buf, "%u\n", hv_get_bytes_to_write(rbi));
}
static VMBUS_CHAN_ATTR_RO(write_avail);
--
2.17.1
Fix a race condition that can result in a ring buffer pointer being set
to null while a "_show" function is reading the ring buffer's data. This
problem was discussed here: https://lkml.org/lkml/2018/10/18/779
To fix the race condition, add a new mutex lock to the
"hv_ring_buffer_info" struct. Add a new function,
"hv_ringbuffer_pre_init()", where a channel's inbound and outbound
ring_buffer_info mutex locks are initialized.
Acquire/release the locks in the "hv_ringbuffer_cleanup()" function,
which is where the ring buffer pointers are set to null.
Acquire/release the locks in the four channel-level "_show" functions
that access ring buffer data. Remove the "const" qualifier from the
"vmbus_channel" parameter and the "rbi" variable of the channel-level
"_show" functions so that the locks can be acquired/released in these
functions.
Acquire/release the locks in hv_ringbuffer_get_debuginfo(). Remove the
"const" qualifier from the "hv_ring_buffer_info" parameter so that the
locks can be acquired/released in this function.
Signed-off-by: Kimberly Brown <[email protected]>
---
drivers/hv/channel_mgmt.c | 2 +
drivers/hv/hyperv_vmbus.h | 1 +
drivers/hv/ring_buffer.c | 19 ++++++++-
drivers/hv/vmbus_drv.c | 82 +++++++++++++++++++++++++--------------
include/linux/hyperv.h | 7 +++-
5 files changed, 79 insertions(+), 32 deletions(-)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 9d7f9c1c60c7..14543059cc3e 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -336,6 +336,8 @@ static struct vmbus_channel *alloc_channel(void)
tasklet_init(&channel->callback_event,
vmbus_on_event, (unsigned long)channel);
+ hv_ringbuffer_pre_init(channel);
+
return channel;
}
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index a94aab94e0b5..e5467b821f41 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -193,6 +193,7 @@ extern void hv_synic_clockevents_cleanup(void);
/* Interface */
+void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
struct page *pages, u32 pagecnt);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 0386ff48c5ea..121a01c43298 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -166,14 +166,18 @@ hv_get_ringbuffer_availbytes(const struct hv_ring_buffer_info *rbi,
}
/* Get various debug metrics for the specified ring buffer. */
-int hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
+int hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
struct hv_ring_buffer_debug_info *debug_info)
{
u32 bytes_avail_towrite;
u32 bytes_avail_toread;
- if (!ring_info->ring_buffer)
+ mutex_lock(&ring_info->ring_buffer_mutex);
+
+ if (!ring_info->ring_buffer) {
+ mutex_unlock(&ring_info->ring_buffer_mutex);
return -EINVAL;
+ }
hv_get_ringbuffer_availbytes(ring_info,
&bytes_avail_toread,
@@ -184,10 +188,19 @@ int hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
debug_info->current_write_index = ring_info->ring_buffer->write_index;
debug_info->current_interrupt_mask
= ring_info->ring_buffer->interrupt_mask;
+ mutex_unlock(&ring_info->ring_buffer_mutex);
+
return 0;
}
EXPORT_SYMBOL_GPL(hv_ringbuffer_get_debuginfo);
+/* Initialize a channel's ring buffer info mutex locks */
+void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
+{
+ mutex_init(&channel->inbound.ring_buffer_mutex);
+ mutex_init(&channel->outbound.ring_buffer_mutex);
+}
+
/* Initialize the ring buffer. */
int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
struct page *pages, u32 page_cnt)
@@ -240,8 +253,10 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
/* Cleanup the ring buffer. */
void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info)
{
+ mutex_lock(&ring_info->ring_buffer_mutex);
vunmap(ring_info->ring_buffer);
ring_info->ring_buffer = NULL;
+ mutex_unlock(&ring_info->ring_buffer_mutex);
}
/* Write to the ring buffer. */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 7f15c41d952e..84f3a510b4c9 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1410,7 +1410,7 @@ static void vmbus_chan_release(struct kobject *kobj)
struct vmbus_chan_attribute {
struct attribute attr;
- ssize_t (*show)(const struct vmbus_channel *chan, char *buf);
+ ssize_t (*show)(struct vmbus_channel *chan, char *buf);
ssize_t (*store)(struct vmbus_channel *chan,
const char *buf, size_t count);
};
@@ -1429,7 +1429,7 @@ static ssize_t vmbus_chan_attr_show(struct kobject *kobj,
{
const struct vmbus_chan_attribute *attribute
= container_of(attr, struct vmbus_chan_attribute, attr);
- const struct vmbus_channel *chan
+ struct vmbus_channel *chan
= container_of(kobj, struct vmbus_channel, kobj);
if (!attribute->show)
@@ -1442,57 +1442,81 @@ static const struct sysfs_ops vmbus_chan_sysfs_ops = {
.show = vmbus_chan_attr_show,
};
-static ssize_t out_mask_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t out_mask_show(struct vmbus_channel *channel, char *buf)
{
- const struct hv_ring_buffer_info *rbi = &channel->outbound;
+ struct hv_ring_buffer_info *rbi = &channel->outbound;
+ ssize_t ret;
- if (!rbi->ring_buffer)
+ mutex_lock(&rbi->ring_buffer_mutex);
+ if (!rbi->ring_buffer) {
+ mutex_unlock(&rbi->ring_buffer_mutex);
return -EINVAL;
+ }
- return sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
+ ret = sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
+ mutex_unlock(&rbi->ring_buffer_mutex);
+ return ret;
}
static VMBUS_CHAN_ATTR_RO(out_mask);
-static ssize_t in_mask_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t in_mask_show(struct vmbus_channel *channel, char *buf)
{
- const struct hv_ring_buffer_info *rbi = &channel->inbound;
+ struct hv_ring_buffer_info *rbi = &channel->inbound;
+ ssize_t ret;
- if (!rbi->ring_buffer)
+ mutex_lock(&rbi->ring_buffer_mutex);
+ if (!rbi->ring_buffer) {
+ mutex_unlock(&rbi->ring_buffer_mutex);
return -EINVAL;
+ }
- return sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
+ ret = sprintf(buf, "%u\n", rbi->ring_buffer->interrupt_mask);
+ mutex_unlock(&rbi->ring_buffer_mutex);
+ return ret;
}
static VMBUS_CHAN_ATTR_RO(in_mask);
-static ssize_t read_avail_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t read_avail_show(struct vmbus_channel *channel, char *buf)
{
- const struct hv_ring_buffer_info *rbi = &channel->inbound;
+ struct hv_ring_buffer_info *rbi = &channel->inbound;
+ ssize_t ret;
- if (!rbi->ring_buffer)
+ mutex_lock(&rbi->ring_buffer_mutex);
+ if (!rbi->ring_buffer) {
+ mutex_unlock(&rbi->ring_buffer_mutex);
return -EINVAL;
+ }
- return sprintf(buf, "%u\n", hv_get_bytes_to_read(rbi));
+ ret = sprintf(buf, "%u\n", hv_get_bytes_to_read(rbi));
+ mutex_unlock(&rbi->ring_buffer_mutex);
+ return ret;
}
static VMBUS_CHAN_ATTR_RO(read_avail);
-static ssize_t write_avail_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t write_avail_show(struct vmbus_channel *channel, char *buf)
{
- const struct hv_ring_buffer_info *rbi = &channel->outbound;
+ struct hv_ring_buffer_info *rbi = &channel->outbound;
+ ssize_t ret;
- if (!rbi->ring_buffer)
+ mutex_lock(&rbi->ring_buffer_mutex);
+ if (!rbi->ring_buffer) {
+ mutex_unlock(&rbi->ring_buffer_mutex);
return -EINVAL;
+ }
- return sprintf(buf, "%u\n", hv_get_bytes_to_write(rbi));
+ ret = sprintf(buf, "%u\n", hv_get_bytes_to_write(rbi));
+ mutex_unlock(&rbi->ring_buffer_mutex);
+ return ret;
}
static VMBUS_CHAN_ATTR_RO(write_avail);
-static ssize_t show_target_cpu(const struct vmbus_channel *channel, char *buf)
+static ssize_t show_target_cpu(struct vmbus_channel *channel, char *buf)
{
return sprintf(buf, "%u\n", channel->target_cpu);
}
static VMBUS_CHAN_ATTR(cpu, S_IRUGO, show_target_cpu, NULL);
-static ssize_t channel_pending_show(const struct vmbus_channel *channel,
+static ssize_t channel_pending_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%d\n",
@@ -1501,7 +1525,7 @@ static ssize_t channel_pending_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(pending, S_IRUGO, channel_pending_show, NULL);
-static ssize_t channel_latency_show(const struct vmbus_channel *channel,
+static ssize_t channel_latency_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%d\n",
@@ -1510,19 +1534,19 @@ static ssize_t channel_latency_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(latency, S_IRUGO, channel_latency_show, NULL);
-static ssize_t channel_interrupts_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t channel_interrupts_show(struct vmbus_channel *channel, char *buf)
{
return sprintf(buf, "%llu\n", channel->interrupts);
}
static VMBUS_CHAN_ATTR(interrupts, S_IRUGO, channel_interrupts_show, NULL);
-static ssize_t channel_events_show(const struct vmbus_channel *channel, char *buf)
+static ssize_t channel_events_show(struct vmbus_channel *channel, char *buf)
{
return sprintf(buf, "%llu\n", channel->sig_events);
}
static VMBUS_CHAN_ATTR(events, S_IRUGO, channel_events_show, NULL);
-static ssize_t channel_intr_in_full_show(const struct vmbus_channel *channel,
+static ssize_t channel_intr_in_full_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%llu\n",
@@ -1530,7 +1554,7 @@ static ssize_t channel_intr_in_full_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(intr_in_full, 0444, channel_intr_in_full_show, NULL);
-static ssize_t channel_intr_out_empty_show(const struct vmbus_channel *channel,
+static ssize_t channel_intr_out_empty_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%llu\n",
@@ -1538,7 +1562,7 @@ static ssize_t channel_intr_out_empty_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(intr_out_empty, 0444, channel_intr_out_empty_show, NULL);
-static ssize_t channel_out_full_first_show(const struct vmbus_channel *channel,
+static ssize_t channel_out_full_first_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%llu\n",
@@ -1546,7 +1570,7 @@ static ssize_t channel_out_full_first_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(out_full_first, 0444, channel_out_full_first_show, NULL);
-static ssize_t channel_out_full_total_show(const struct vmbus_channel *channel,
+static ssize_t channel_out_full_total_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%llu\n",
@@ -1554,14 +1578,14 @@ static ssize_t channel_out_full_total_show(const struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR(out_full_total, 0444, channel_out_full_total_show, NULL);
-static ssize_t subchannel_monitor_id_show(const struct vmbus_channel *channel,
+static ssize_t subchannel_monitor_id_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%u\n", channel->offermsg.monitorid);
}
static VMBUS_CHAN_ATTR(monitor_id, S_IRUGO, subchannel_monitor_id_show, NULL);
-static ssize_t subchannel_id_show(const struct vmbus_channel *channel,
+static ssize_t subchannel_id_show(struct vmbus_channel *channel,
char *buf)
{
return sprintf(buf, "%u\n",
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 64698ec8f2ac..8b9a93c99c9b 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -141,6 +141,11 @@ struct hv_ring_buffer_info {
u32 ring_datasize; /* < ring_size */
u32 priv_read_index;
+ /*
+ * The ring buffer mutex lock. This lock prevents the ring buffer from
+ * being freed while the ring buffer is being accessed.
+ */
+ struct mutex ring_buffer_mutex;
};
@@ -1206,7 +1211,7 @@ struct hv_ring_buffer_debug_info {
};
-int hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
+int hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
struct hv_ring_buffer_debug_info *debug_info);
/* Vmbus interface */
--
2.17.1
Set "ring_info->priv_read_index" to 0. Now, all of ring_info's fields
are explicitly set in this function. The memset() call is no longer
necessary, so remove it.
Signed-off-by: Kimberly Brown <[email protected]>
---
drivers/hv/ring_buffer.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 9e8b31ccc142..0386ff48c5ea 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -197,8 +197,6 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
- memset(ring_info, 0, sizeof(struct hv_ring_buffer_info));
-
/*
* First page holds struct hv_ring_buffer, do wraparound mapping for
* the rest.
@@ -232,6 +230,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
reciprocal_value(ring_info->ring_size / 10);
ring_info->ring_datasize = ring_info->ring_size -
sizeof(struct hv_ring_buffer);
+ ring_info->priv_read_index = 0;
spin_lock_init(&ring_info->ring_lock);
--
2.17.1
On Thu, 14 Mar 2019 13:05:15 -0700
"Kimberly Brown" <[email protected]> wrote:
> Fix a race condition that can result in a ring buffer pointer being set
> to null while a "_show" function is reading the ring buffer's data. This
> problem was discussed here:
> https://lkml.org/lkml/2018/10/18/779
>
> To fix the race condition, add a new mutex lock to the
> "hv_ring_buffer_info" struct. Add a new function,
> "hv_ringbuffer_pre_init()", where a channel's inbound and outbound
> ring_buffer_info mutex locks are initialized.
>
> Acquire/release the locks in the "hv_ringbuffer_cleanup()" function,
> which is where the ring buffer pointers are set to null.
>
> Acquire/release the locks in the four channel-level "_show" functions
> that access ring buffer data. Remove the "const" qualifier from the
> "vmbus_channel" parameter and the "rbi" variable of the channel-level
> "_show" functions so that the locks can be acquired/released in these
> functions.
>
> Acquire/release the locks in hv_ringbuffer_get_debuginfo(). Remove the
> "const" qualifier from the "hv_ring_buffer_info" parameter so that the
> locks can be acquired/released in this function.
>
> Signed-off-by: Kimberly Brown <[email protected]>
>
> ... snip ...
Adding more locks will solve the problem but it seems like overkill.
Why not either use a reference count or an RCU style access for the
ring buffer?
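For context, an RCU-style scheme along the lines Stephen suggests might look
roughly like the sketch below. It is purely illustrative and is not what this
thread adopts: the __rcu annotation on ring_buffer is hypothetical, and
helpers such as hv_get_bytes_to_read() would have to be adapted to work on
the dereferenced pointer.

/* Hypothetical read side in a "_show" function (drivers/hv/vmbus_drv.c): */
static ssize_t read_avail_show(struct vmbus_channel *channel, char *buf)
{
	struct hv_ring_buffer_info *rbi = &channel->inbound;
	struct hv_ring_buffer *rb;
	ssize_t ret;

	rcu_read_lock();
	rb = rcu_dereference(rbi->ring_buffer);
	if (!rb) {
		rcu_read_unlock();
		return -EINVAL;
	}
	ret = sprintf(buf, "%u\n", hv_get_bytes_to_read(rbi));
	rcu_read_unlock();
	return ret;
}

/* Hypothetical teardown: unpublish the pointer, wait for readers, free. */
void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info)
{
	struct hv_ring_buffer *rb;

	rb = rcu_dereference_protected(ring_info->ring_buffer, true);
	RCU_INIT_POINTER(ring_info->ring_buffer, NULL);
	synchronize_rcu();
	vunmap(rb);
}

The read side adds no locking to the data path, which is the property Stephen
is after; the cost is converting every existing ring_buffer access to the RCU
accessors.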
On Thu, Mar 14, 2019 at 03:45:33PM -0700, Stephen Hemminger wrote:
> On Thu, 14 Mar 2019 13:05:15 -0700
> "Kimberly Brown" <[email protected]> wrote:
>
> > Fix a race condition that can result in a ring buffer pointer being set
> > to null while a "_show" function is reading the ring buffer's data. This
> > problem was discussed here:
> > https://lkml.org/lkml/2018/10/18/779
> >
> > To fix the race condition, add a new mutex lock to the
> > "hv_ring_buffer_info" struct. Add a new function,
> > "hv_ringbuffer_pre_init()", where a channel's inbound and outbound
> > ring_buffer_info mutex locks are initialized.
> >
> > ... snip ...
>
> Adding more locks will solve the problem but it seems like overkill.
> Why not either use a reference count or an RCU style access for the
> ring buffer?
I agree that a reference count or RCU could also solve this problem.
Using mutex locks seemed like the most straightforward solution, but
I'll certainly switch to a different approach if it's better!
Are you concerned about the extra memory required for the mutex locks,
read performance, or something else?
Thanks,
Kim
On Sat, 16 Mar 2019 21:49:28 -0400
Kimberly Brown <[email protected]> wrote:
> On Thu, Mar 14, 2019 at 03:45:33PM -0700, Stephen Hemminger wrote:
> > On Thu, 14 Mar 2019 13:05:15 -0700
> > "Kimberly Brown" <[email protected]> wrote:
> >
> > > Fix a race condition that can result in a ring buffer pointer being set
> > > to null while a "_show" function is reading the ring buffer's data. This
> > > problem was discussed here:
> > > https://lkml.org/lkml/2018/10/18/779
> > >
> > > To fix the race condition, add a new mutex lock to the
> > > "hv_ring_buffer_info" struct. Add a new function,
> > > "hv_ringbuffer_pre_init()", where a channel's inbound and outbound
> > > ring_buffer_info mutex locks are initialized.
> > >
> > > ... snip ...
> >
> > Adding more locks will solve the problem but it seems like overkill.
> > Why not either use a reference count or an RCU style access for the
> > ring buffer?
>
> I agree that a reference count or RCU could also solve this problem.
> Using mutex locks seemed like the most straightforward solution, but
> I'll certainly switch to a different approach if it's better!
>
> Are you concerned about the extra memory required for the mutex locks,
> read performance, or something else?
Locks in control path are ok, but my concern is performance of the
data path which puts packets in/out of rings. To keep reasonable performance,
no additional locking should be added in those paths.
So if data path is using RCU, can/should the control operations also
use it?
On Wed, Mar 20, 2019 at 01:06:19PM -0700, Stephen Hemminger wrote:
> On Sat, 16 Mar 2019 21:49:28 -0400
> Kimberly Brown <[email protected]> wrote:
>
> > On Thu, Mar 14, 2019 at 03:45:33PM -0700, Stephen Hemminger wrote:
> > > On Thu, 14 Mar 2019 13:05:15 -0700
> > > "Kimberly Brown" <[email protected]> wrote:
> > >
> > > > Fix a race condition that can result in a ring buffer pointer being set
> > > > to null while a "_show" function is reading the ring buffer's data. This
> > > > problem was discussed here:
> > > > https://lkml.org/lkml/2018/10/18/779
> > > >
> > > > To fix the race condition, add a new mutex lock to the
> > > > "hv_ring_buffer_info" struct. Add a new function,
> > > > "hv_ringbuffer_pre_init()", where a channel's inbound and outbound
> > > > ring_buffer_info mutex locks are initialized.
> > > >
> > > > ... snip ...
> > >
> > > Adding more locks will solve the problem but it seems like overkill.
> > > Why not either use a reference count or an RCU style access for the
> > > ring buffer?
> >
> > I agree that a reference count or RCU could also solve this problem.
> > Using mutex locks seemed like the most straightforward solution, but
> > I'll certainly switch to a different approach if it's better!
> >
> > Are you concerned about the extra memory required for the mutex locks,
> > read performance, or something else?
>
> Locks in control path are ok, but my concern is performance of the
> data path which puts packets in/out of rings. To keep reasonable performance,
> no additional locking should be added in those paths.
>
> So if data path is using RCU, can/should the control operations also
> use it?
The data path doesn't use RCU to protect the ring buffers.
From: Kimberly Brown <[email protected]> Sent: Wednesday, March 20, 2019 8:48 PM
> > > > Adding more locks will solve the problem but it seems like overkill.
> > > > Why not either use a reference count or an RCU style access for the
> > > > ring buffer?
> > >
> > > I agree that a reference count or RCU could also solve this problem.
> > > Using mutex locks seemed like the most straightforward solution, but
> > > I'll certainly switch to a different approach if it's better!
> > >
> > > Are you concerned about the extra memory required for the mutex locks,
> > > read performance, or something else?
> >
> > Locks in control path are ok, but my concern is performance of the
> > data path which puts packets in/out of rings. To keep reasonable performance,
> > no additional locking should be added in those paths.
> >
> > So if data path is using RCU, can/should the control operations also
> > use it?
>
> The data path doesn't use RCU to protect the ring buffers.
My $.02: The mutex is obtained only in the sysfs path and the "delete
ringbuffers" path, neither of which is performance or concurrency sensitive.
There's no change to any path that reads or writes data from/to the ring
buffers. It seems like the mutex is the most straightforward solution to
preventing sysfs from accessing the ring buffer info while the memory is
being freed as part of "delete ringbuffers".
Michael
On Thu, Mar 21, 2019 at 04:04:20PM +0000, Michael Kelley wrote:
> From: Kimberly Brown <[email protected]> Sent: Wednesday, March 20, 2019 8:48 PM
> > > > > Adding more locks will solve the problem but it seems like overkill.
> > > > > Why not either use a reference count or an RCU style access for the
> > > > > ring buffer?
> > > >
> > > > I agree that a reference count or RCU could also solve this problem.
> > > > Using mutex locks seemed like the most straightforward solution, but
> > > > I'll certainly switch to a different approach if it's better!
> > > >
> > > > Are you concerned about the extra memory required for the mutex locks,
> > > > read performance, or something else?
> > >
> > > Locks in control path are ok, but my concern is performance of the
> > > data path which puts packets in/out of rings. To keep reasonable performance,
> > > no additional locking should be added in those paths.
> > >
> > > So if data path is using RCU, can/should the control operations also
> > > use it?
> >
Hi Stephen,
Do you have any additional questions or suggestions for this race
condition and the mutex locks? I think that your initial questions were
addressed in the responses below. If there's anything else, please let
me know!
Thanks,
Kim
> > The data path doesn't use RCU to protect the ring buffers.
>
> My $.02: The mutex is obtained only in the sysfs path and the "delete
> ringbuffers" path, neither of which is performance or concurrency sensitive.
> There's no change to any path that reads or writes data from/to the ring
> buffers. It seems like the mutex is the most straightforward solution to
> preventing sysfs from accessing the ring buffer info while the memory is
> being freed as part of "delete ringbuffers".
>
> Michael
On Thu, 28 Mar 2019 00:30:57 -0400
Kimberly Brown <[email protected]> wrote:
> On Thu, Mar 21, 2019 at 04:04:20PM +0000, Michael Kelley wrote:
> > From: Kimberly Brown <[email protected]> Sent: Wednesday, March 20, 2019 8:48 PM
> > > > > > Adding more locks will solve the problem but it seems like overkill.
> > > > > > Why not either use a reference count or an RCU style access for the
> > > > > > ring buffer?
> > > > >
> > > > > I agree that a reference count or RCU could also solve this problem.
> > > > > Using mutex locks seemed like the most straightforward solution, but
> > > > > I'll certainly switch to a different approach if it's better!
> > > > >
> > > > > Are you concerned about the extra memory required for the mutex locks,
> > > > > read performance, or something else?
> > > >
> > > > Locks in control path are ok, but my concern is performance of the
> > > > data path which puts packets in/out of rings. To keep reasonable performance,
> > > > no additional locking should be added in those paths.
> > > >
> > > > So if data path is using RCU, can/should the control operations also
> > > > use it?
> > >
>
>
> Hi Stephen,
>
> Do you have any additional questions or suggestions for this race
> condition and the mutex locks? I think that your initial questions were
> addressed in the responses below. If there's anything else, please let
> me know!
>
> Thanks,
> Kim
>
>
> > > The data path doesn't use RCU to protect the ring buffers.
> >
> > My $.02: The mutex is obtained only in the sysfs path and the "delete
> > ringbuffers" path, neither of which is performance or concurrency sensitive.
> > There's no change to any path that reads or writes data from/to the ring
> > buffers. It seems like the mutex is the most straightforward solution to
> > preventing sysfs from accessing the ring buffer info while the memory is
> > being freed as part of "delete ringbuffers".
> >
> > Michael
I have no problem with the patch you did.
My discussion was more about the general issue of ring buffers being detached
from the device. I'm not sure it was even a good design choice, but that is
something that is hard to fix now.
From: Kimberly Brown <[email protected]> Sent: Thursday, March 14, 2019 1:05 PM
>
> Set "ring_info->priv_read_index" to 0. Now, all of ring_info's fields
> are explicitly set in this function. The memset() call is no longer
> necessary, so remove it.
>
> Signed-off-by: Kimberly Brown <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
From: Kimberly Brown <[email protected]> Sent: Thursday, March 14, 2019 1:05 PM
>
> Fix a race condition that can result in a ring buffer pointer being set
> to null while a "_show" function is reading the ring buffer's data. This
> problem was discussed here: https://lkml.org/lkml/2018/10/18/779
>
> To fix the race condition, add a new mutex lock to the
> "hv_ring_buffer_info" struct. Add a new function,
> "hv_ringbuffer_pre_init()", where a channel's inbound and outbound
> ring_buffer_info mutex locks are initialized.
>
> Acquire/release the locks in the "hv_ringbuffer_cleanup()" function,
> which is where the ring buffer pointers are set to null.
>
> Acquire/release the locks in the four channel-level "_show" functions
> that access ring buffer data. Remove the "const" qualifier from the
> "vmbus_channel" parameter and the "rbi" variable of the channel-level
> "_show" functions so that the locks can be acquired/released in these
> functions.
>
> Acquire/release the locks in hv_ringbuffer_get_debuginfo(). Remove the
> "const" qualifier from the "hv_ring_buffer_info" parameter so that the
> locks can be acquired/released in this function.
>
> Signed-off-by: Kimberly Brown <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
On Thu, Mar 14, 2019 at 04:04:52PM -0400, Kimberly Brown wrote:
>This patchset fixes a race condition in the "_show" functions that
>access the channel ring buffers.
>
>Changes in v3:
>Patch 1: Drivers: hv: vmbus: Refactor chan->state if statement
> - Added the “reviewed-by” line from v2.
>
>Patch 2: Drivers: hv: vmbus: Set ring_info field to 0 and remove memset
> - This patch is new. This change allows the new mutex locks in patch 3
> to be initialized when the channel is initialized.
>
>Patch 3: Drivers: hv: vmbus: Fix race condition with new
> ring_buffer_info mutex
> - Added two ring buffer info mutex locks instead of the single channel
> mutex lock that was proposed in v2.
> - Changed the mutex acquire/release calls as needed for the new ring
> buffer info mutex locks.
> - Updated the commit message.
>
>
>Changes in v2:
> - In v1, I proposed using “vmbus_connection.channel_mutex” in the
> “_show” functions to prevent the race condition. However, using this
> mutex could result in a deadlock, so a new approach is proposed in
> this patchset.
> - Patch 1 is new and consists of refactoring an if statement.
> - Patch 2 introduces a new mutex lock in the “vmbus_channel” struct,
> which is used to eliminate the race condition.
>
>Kimberly Brown (3):
> Drivers: hv: vmbus: Refactor chan->state if statement
> Drivers: hv: vmbus: Set ring_info field to 0 and remove memset
> Drivers: hv: vmbus: Fix race condition with new ring_buffer_info mutex
>
> drivers/hv/channel_mgmt.c | 2 +
> drivers/hv/hyperv_vmbus.h | 1 +
> drivers/hv/ring_buffer.c | 22 ++++++++--
> drivers/hv/vmbus_drv.c | 89 +++++++++++++++++++++++++++------------
> include/linux/hyperv.h | 7 ++-
> 5 files changed, 88 insertions(+), 33 deletions(-)
Queued up, thanks Kimberly!
--
Thanks,
Sasha