2012-02-06 09:51:48

by Christian Hoff

[permalink] [raw]
Subject: Pe: [PATCH v5 1/3] virtio-scsi: first version

Hello Paolo,

first let me say that your patch is working fine on my local clone of the
qemu repository.

Let me ask just one question about the format of the data being
transmitted over the virtqueue.

Paolo Bonzini wrote:
+ cmd->req.cmd = (struct virtio_scsi_cmd_req){
+ .lun[0] = 1,
+ .lun[1] = sc->device->id,
+ .lun[2] = (sc->device->lun >> 8) | 0x40,
+ .lun[3] = sc->device->lun & 0xff,
+ [...]
+ };

Can't we have seperate fields for the SCSI target ID and the LUN number
here? Putting all this into a single field seems confusing. The following
line of code (sc->device->lun >> 8) | 0x40 essentially means that LUN
numbers will be limited to 8+6 Bits=14 Bits for no obvious reason that I
can see. Maybe we could just split the LUN field up into two uint32 fields
for target ID and LUN number?

Also, lun[1] = sc->device->id means that only 255 SCSI target IDs will be
supported. Think about bigger usage scenarios, such as FCP networks with
several hundred HBAs in the net. If you want to have the target ID<->HBA
mapping the same as on the guest as on the host, then 255 virtual target
IDs could be a limit.

Sorry for coming up so late with these suggestions. I hope there is still
enough time left to discuss and address these problems.

Mit freundlichen Gr??en / Kind regards

Christian Hoff

Student - Applied Computer Science


Phone:
49-16098976-950
IBM Deutschland

E-Mail:
[email protected]
Am Fichtenberg 1


71083 Herrenberg


Germany


IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Reinhard Reschke,
Dieter Scholz, Gregor Pillen, Joachim Heel, Christian Noll
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 14562 / WEEE-Reg.-Nr. DE 99369940


2012-02-07 09:55:09

by Paolo Bonzini

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/06/2012 10:51 AM, Christian Hoff wrote:
> Hello Paolo,
>
> first let me say that your patch is working fine on my local clone of the
> qemu repository.
>
> Let me ask just one question about the format of the data being
> transmitted over the virtqueue.
>
> Paolo Bonzini wrote:
> + cmd->req.cmd = (struct virtio_scsi_cmd_req){
> + .lun[0] = 1,
> + .lun[1] = sc->device->id,
> + .lun[2] = (sc->device->lun>> 8) | 0x40,
> + .lun[3] = sc->device->lun& 0xff,
> + [...]
> + };
>
> Can't we have seperate fields for the SCSI target ID and the LUN number
> here? Putting all this into a single field seems confusing. The following
> line of code (sc->device->lun>> 8) | 0x40 essentially means that LUN
> numbers will be limited to 8+6 Bits=14 Bits for no obvious reason that I
> can see. Maybe we could just split the LUN field up into two uint32 fields
> for target ID and LUN number?

The 14-bit limitation can be lifted. SAM defines a 24-bit LUN format
too, but I've never seen it used in practice.

> Also, lun[1] = sc->device->id means that only 255 SCSI target IDs will be
> supported. Think about bigger usage scenarios, such as FCP networks with
> several hundred HBAs in the net. If you want to have the target ID<->HBA
> mapping the same as on the guest as on the host, then 255 virtual target
> IDs could be a limit.

I think you would hit other scalability limitations well before that. I
plan to give each target its own MSI-X interrupt, but there is no
infinite supplies of those either.

VMware only supports 15 targets and 255 LUNs per host, by comparison. I
think 255 targets and 16383 LUNs is already several times more than is
actually needed. But in any case, we could still use the fixed "1" byte
to go beyond 255 targets.

> Sorry for coming up so late with these suggestions. I hope there is still
> enough time left to discuss and address these problems.

Sure. :) I hope the above answer is satisfactory, though.

Paolo

2012-02-07 11:10:47

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 07, 2012 at 10:54:54AM +0100, Paolo Bonzini wrote:
> >Also, lun[1] = sc->device->id means that only 255 SCSI target IDs will be
> >supported. Think about bigger usage scenarios, such as FCP networks with
> >several hundred HBAs in the net. If you want to have the target ID<->HBA
> >mapping the same as on the guest as on the host, then 255 virtual target
> >IDs could be a limit.
>
> I think you would hit other scalability limitations well before
> that. I plan to give each target its own MSI-X interrupt, but there
> is no infinite supplies of those either.

virtio-pci generally lets guests share MSI-X vectors between queues,
why not allow this here?

--
MST

2012-02-07 11:26:17

by Paolo Bonzini

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/07/2012 12:10 PM, Michael S. Tsirkin wrote:
>>> Also, lun[1] = sc->device->id means that only 255 SCSI target IDs will be
>>> > >supported. Think about bigger usage scenarios, such as FCP networks with
>>> > >several hundred HBAs in the net. If you want to have the target ID<->HBA
>>> > >mapping the same as on the guest as on the host, then 255 virtual target
>>> > >IDs could be a limit.
>> >
>> > I think you would hit other scalability limitations well before
>> > that. I plan to give each target its own MSI-X interrupt, but there
>> > is no infinite supplies of those either.
> virtio-pci generally lets guests share MSI-X vectors between queues,
> why not allow this here?

Yes, of course. However, with dozens of queues, many of them will share
the same vector and all of them will be examined when you get the
interrupt. Even if you find the right balance between sharing (because
you have to) and separating (because of scalability), I wouldn't be
surprised if more than 255 targets do not work too well.

Anyway multiqueue is not even in this patchset, so there's more work to
do before we can worry. :)

Paolo

2012-02-07 11:56:47

by Christian Borntraeger

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

>> + cmd->req.cmd = (struct virtio_scsi_cmd_req){
>> + .lun[0] = 1,
>> + .lun[1] = sc->device->id,
>> + .lun[2] = (sc->device->lun>> 8) | 0x40,
>> + .lun[3] = sc->device->lun& 0xff,
>> + [...]
>> + };
>>
>> Can't we have seperate fields for the SCSI target ID and the LUN number
>> here? Putting all this into a single field seems confusing. The following
>> line of code (sc->device->lun>> 8) | 0x40 essentially means that LUN
>> numbers will be limited to 8+6 Bits=14 Bits for no obvious reason that I
>> can see. Maybe we could just split the LUN field up into two uint32 fields
>> for target ID and LUN number?
>
> The 14-bit limitation can be lifted. SAM defines a 24-bit LUN format too,
> but I've never seen it used in practice.

Why not lift that limitation before the first version is committed upstream?

As far as I see we have to allocate multiple target ids if we want
to provide multipath (e.g. 8 target ids if there are 8 pathes, thus limiting
ourselves to 64 targets, no?)

As a compromise between space/flexibility, cant we just split the 4 bytes
in a similar fashion as our major/minor numbers (12/20bit)?

In the mainframe area I have seen real-life problems hitting the 64k
device limit for disks (mostly due to multipathing), it was extended
afterwards, but every extension tends to make an interface less
appealing. (look at some virtio drivers and you will find places
were feature bits made the code "less pretty")
>
>> Also, lun[1] = sc->device->id means that only 255 SCSI target IDs will be
>> supported. Think about bigger usage scenarios, such as FCP networks with
>> several hundred HBAs in the net. If you want to have the target ID<->HBA
>> mapping the same as on the guest as on the host, then 255 virtual target
>> IDs could be a limit.
>
> I think you would hit other scalability limitations well before that.

Performance might be fixed later, but the interface is then settled.
[..]

> But in any case, we could still use the fixed "1" byte to go beyond 255 targets.

Again, why not now? Any extension would require a feature bit, no?
Is there a technical reason why a fixed 1 here is better than just using that space
as scsi id? (e.g. hash table sizes, lookup etc)

Regards

Christian

PS: what puzzles me as well, is the fact that struct virtio_scsi_cmd_req has lun[8]
in the structure. That would sum up as 5 bytes wasted of those 8, no?

2012-02-07 12:31:39

by Paolo Bonzini

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/07/2012 12:56 PM, Christian Borntraeger wrote:
>> The 14-bit limitation can be lifted. SAM defines a 24-bit LUN format too,
>> but I've never seen it used in practice.
>
> Why not lift that limitation before the first version is committed upstream?

Because nobody supports >14-bit LUNs. The in-kernel LIO target doesn't,
I don't know anything that does.

> As far as I see we have to allocate multiple target ids if we want
> to provide multipath (e.g. 8 target ids if there are 8 pathes, thus limiting
> ourselves to 64 targets, no?)

Well, 256/8 is actually 32, but yes. :) But it's more likely that you
would do multipathing and ALUA on the host, and present a single path to
the guest using the QEMU SCSI target. The guest would see a single disk
that just works.

> As a compromise between space/flexibility, cant we just split the 4 bytes
> in a similar fashion as our major/minor numbers (12/20bit)?

The structure of the LUN is defined by SAM, not by me. I specified a
mandatory subset for two reason: 1) simplicity of implementation; 2)
nobody supports the full hierarchical LUNs spec (SCSI has a lot of
fringe features), so you need to standardize on something to start with.

Also, you can always have more than one HBA. You do not even have to
hotplug the HBAs, you can start a guest with 8 HBAs on a single
multifunction PCI device, and add disks to them as you see fit.

> Again, why not now? Any extension would require a feature bit, no?

No, because the guest would simply scan a wider LUN space, and it would
not find anything on older hosts. It could also look at the
max_channel/max_target/max_lun fields in the configuration and print a
warning to the user that some targets may not be accessible.

Paolo

2012-02-07 13:18:20

by Christian Borntraeger

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 07/02/12 13:31, Paolo Bonzini wrote:
> The structure of the LUN is defined by SAM, not by me.

Ok, got it.

2012-02-07 14:00:10

by Christian Hoff

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

Hello Paolo,

On 07/02/12 13:31, Paolo Bonzini wrote:
> The structure of the LUN is defined by SAM, not by me.

Just had a look at the SAM LUN structure. It appears that you are using
the SCSI format for hierarchical LUNs(SAM 5 chapter 4.7.6) in order to
indicate that the LUN is actually attached to the host system and not to
the Qemu guest operating system itself.

What I am wondering is why you are doing this? The hierarchical SCSI
format is only retained while the LUN number is being transmitted over the
virtqueue, so there does not seem to be a technical necessity for using
this format in order to transmit SCSI target ID and LUN number information
to the host.

Instead the format has some disadvantages:
- It uses up 8 bytes where 3 bytes would be sufficient in order to store
both the target ID and LUN number information
- The format limits us to 255 target IDs. I agree that the LUN limit is
probably more a theoretical and not a practical one, but 255 target IDs
could become a limitation in the future.

Nonetheless I think that virtio-scsi is a useful project and addresses
many of the limitations imposed by virtio-block. The fact that I am still
persisting has more to do with interest in the project rather than wanting
to keep the code from going upstream. Thank you for your work on this
topic.

Mit freundlichen Gr??en / Kind regards

Christian Hoff

Student - Applied Computer Science


Phone:
49-16098976-950
IBM Deutschland

E-Mail:
[email protected]
Am Fichtenberg 1


71083 Herrenberg


Germany


IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Reinhard Reschke,
Dieter Scholz, Gregor Pillen, Joachim Heel, Christian Noll
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 14562 / WEEE-Reg.-Nr. DE 99369940




From:
[email protected]
To:
Paolo Bonzini <[email protected]>
Cc:
Christian Hoff/Germany/IBM@IBMDE, [email protected],
[email protected], [email protected], [email protected],
[email protected]
Date:
07/02/2012 14:18
Subject:
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version



On 07/02/12 13:31, Paolo Bonzini wrote:
> The structure of the LUN is defined by SAM, not by me.

Ok, got it.


2012-02-07 14:29:06

by Paolo Bonzini

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/07/2012 02:59 PM, Christian Hoff wrote:
> Instead the format has some disadvantages:
> - It uses up 8 bytes where 3 bytes would be sufficient in order to store
> both the target ID and LUN number information
> - The format limits us to 255 target IDs. I agree that the LUN limit is
> probably more a theoretical and not a practical one, but 255 target IDs
> could become a limitation in the future.

It also provides better upwards-compatibility in case the limitations
are actually hit. If I had used "uint8_t target; uint16_t lun;" an
extension would require a feature bit and a new struct. With 8-bytes,
you can just expand the definition. That pretty much sums it up.

But again, I don't think the limitations are serious. A MegaSAS header
has room for 256 targets too, VMWare has only 15, Hyper-V has 1 (and 2
channels, but I think that's an off-by-one), and you can always have
multiple HBAs on the same guest.

> Nonetheless I think that virtio-scsi is a useful project and addresses
> many of the limitations imposed by virtio-block. The fact that I am still
> persisting has more to do with interest in the project rather than wanting
> to keep the code from going upstream.

No problem. :)

Paolo

2012-02-08 13:37:34

by Christian Hoff

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

Paolo Bonzini wrote:
> Christian Hoff wrote:
> > Instead the format has some disadvantages:
> > - It uses up 8 bytes where 3 bytes would be sufficient in order to
store
> > both the target ID and LUN number information
> > - The format limits us to 255 target IDs. I agree that the LUN limit
is
> > probably more a theoretical and not a practical one, but 255 target
IDs
> > could become a limitation in the future.
>
> It also provides better upwards-compatibility in case the limitations
> are actually hit. If I had used "uint8_t target; uint16_t lun;" an
> extension would require a feature bit and a new struct. With 8-bytes,
> you can just expand the definition. That pretty much sums it up.

Ok, fair enough. This addresses my question.

Again, I have already done much testing with virtio-scsi and can confirm
that the code is working flawlessly. In my opinion, virtio-scsi is a
worthwhile addition to virtio-block and should be considered for inclusion
into mainline kernel code.

Mit freundlichen Gr??en / Kind regards

Christian Hoff

Student - Applied Computer Science


Phone:
49-16098976-950
IBM Deutschland

E-Mail:
[email protected]
Am Fichtenberg 1


71083 Herrenberg


Germany


IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Reinhard Reschke,
Dieter Scholz, Gregor Pillen, Joachim Heel, Christian Noll
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 14562 / WEEE-Reg.-Nr. DE 99369940

2012-02-09 09:25:27

by Paolo Bonzini

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/08/2012 02:37 PM, Christian Hoff wrote:
> Again, I have already done much testing with virtio-scsi and can confirm
> that the code is working flawlessly. In my opinion, virtio-scsi is a
> worthwhile addition to virtio-block and should be considered for inclusion
> into mainline kernel code.

Thank you very much!

James, will you include virtio-scsi in 3.4?

Thanks,

Paolo

2012-02-09 12:18:37

by Christian Hoff

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

Paolo Bonzini wrote:
> James, will you include virtio-scsi in 3.4?

The key arguments from my side for inclusion of virtio-scsi are:
- Support for SCSI multipathing, which can optionally be done on the host
operating system.
- We can now use other SCSI(non-block) devices and make them accessible to
the guest. For example we were able to virtualize a SCSI FCP tape using
virtio-scsi. One of the biggest constraints of virtio-block is that it is
limited to block devices, while virtio-scsi presents you with a more
generic solution.

So I consider virtio-scsi to be the cleaner approach to do virtualization
of SCSI commands. This is supported by the fact that virtio-block has a
maintenance problem and has to change its protocol from time to time to
support new features introduced by a new version of the SCSI standard.

So please consider including virtio-scsi into the mainline kernel tree.
This is good and conceptually clean code, and it has proven to work very
well in our internal testing.

Mit freundlichen Gr??en / Kind regards

Christian Hoff

Student - Applied Computer Science


Phone:
49-16098976-950
IBM Deutschland

E-Mail:
[email protected]
Am Fichtenberg 1


71083 Herrenberg


Germany


IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Reinhard Reschke,
Dieter Scholz, Gregor Pillen, Joachim Heel, Christian Noll
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 14562 / WEEE-Reg.-Nr. DE 99369940

2012-02-12 20:16:23

by James Bottomley

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Thu, 2012-02-09 at 10:25 +0100, Paolo Bonzini wrote:
> On 02/08/2012 02:37 PM, Christian Hoff wrote:
> > Again, I have already done much testing with virtio-scsi and can confirm
> > that the code is working flawlessly. In my opinion, virtio-scsi is a
> > worthwhile addition to virtio-block and should be considered for inclusion
> > into mainline kernel code.
>
> Thank you very much!
>
> James, will you include virtio-scsi in 3.4?

Well, no-one's yet answered the question I had about why. virtio-scsi
seems to be a basic duplication of virtio-blk except that it seems to
fix some problems virtio-blk has. Namely queue parameter discover,
which virtio-blk doesn't seem to do. There may also be a reason to cut
the stack lower down. Error handling is most often cited for this, but
no-one's satisfactorily explaned why it's better to do error handling in
the guest instead of the host.

Could someone please explain to me why you can't simply fix virtio-blk?
Or would virtio-blk maintainers give a reason why they're unwilling to
have it fixed?

This isn't a "no" by the way: we have absolutely hideous virtual drivers
for other virtualisation systems in SCSI which should also have been in
block, except that's not the way the various virt people think, so I'm
willing to extend KVM the same courtesy ... I'd just really like to know
that the virtio-blk situation is intractable before I do.

James

2012-02-12 23:41:46

by Rusty Russell

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Sun, 12 Feb 2012 14:16:17 -0600, James Bottomley <[email protected]> wrote:
> On Thu, 2012-02-09 at 10:25 +0100, Paolo Bonzini wrote:
> > On 02/08/2012 02:37 PM, Christian Hoff wrote:
> > > Again, I have already done much testing with virtio-scsi and can confirm
> > > that the code is working flawlessly. In my opinion, virtio-scsi is a
> > > worthwhile addition to virtio-block and should be considered for inclusion
> > > into mainline kernel code.
> >
> > Thank you very much!
> >
> > James, will you include virtio-scsi in 3.4?
>
> Well, no-one's yet answered the question I had about why. virtio-scsi
> seems to be a basic duplication of virtio-blk except that it seems to
> fix some problems virtio-blk has. Namely queue parameter discover,
> which virtio-blk doesn't seem to do. There may also be a reason to cut
> the stack lower down. Error handling is most often cited for this, but
> no-one's satisfactorily explaned why it's better to do error handling in
> the guest instead of the host.
>
> Could someone please explain to me why you can't simply fix virtio-blk?
> Or would virtio-blk maintainers give a reason why they're unwilling to
> have it fixed?

My concern is simple: virtio_blk covers the 99% of cases, with very
little complexity. To get that last 1%, we will end up re-specing much
of SCSI.

Having found someone who understand SCSI and is eager to maintain a
driver and spec, I am deeply tempted to partition the problem as simple
== virtio_blk, complex == virtio_scsi.

In fact, it would allow us to tighten the spec on VIRTIO_BLK_T_SCSI_CMD
to its actual use, which AFAICT is CDROMEJECT (maybe CDROMCLOSETRAY).

Cheers,
Rusty.

2012-02-13 07:06:09

by Christian Borntraeger

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 12/02/12 21:16, James Bottomley wrote:
> Well, no-one's yet answered the question I had about why.

Just to give one example from a different angle:
In the big datacenters tape libraries are still very important, and lots
of them have a scsi attachement. virtio-blk certainly is not the right
way to handle those. Furthermore it seems even pretty hard to craft
a virtio-tape since most of those libraries have vendor specific library
controls (via sg). We would need to duplicate scsi generic (hint, hint :-)

> virtio-scsi seems to be a basic duplication of virtio-blk except that it seems to
> fix some problems virtio-blk has. Namely queue parameter discover,
> which virtio-blk doesn't seem to do. There may also be a reason to cut
> the stack lower down. Error handling is most often cited for this, but
> no-one's satisfactorily explaned why it's better to do error handling in
> the guest instead of the host.
>
> Could someone please explain to me why you can't simply fix virtio-blk?

I dont think that virtio-scsi will replace virtio-blk everywhere. For non-scsi
block devices, image files or logical volumes virtio-blk seems to be the right
approach, I think.

> Or would virtio-blk maintainers give a reason why they're unwilling to
> have it fixed?

I dont consider virtio-blk broken. It just doesnt cover everything.

Christian

2012-02-13 07:58:05

by Dor Laor

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/13/2012 09:05 AM, Christian Borntraeger wrote:
> On 12/02/12 21:16, James Bottomley wrote:
>> Well, no-one's yet answered the question I had about why.
>
> Just to give one example from a different angle:
> In the big datacenters tape libraries are still very important, and lots
> of them have a scsi attachement. virtio-blk certainly is not the right
> way to handle those. Furthermore it seems even pretty hard to craft
> a virtio-tape since most of those libraries have vendor specific library
> controls (via sg). We would need to duplicate scsi generic (hint, hint :-)
>
>> virtio-scsi seems to be a basic duplication of virtio-blk except that it seems to
>> fix some problems virtio-blk has. Namely queue parameter discover,
>> which virtio-blk doesn't seem to do. There may also be a reason to cut
>> the stack lower down. Error handling is most often cited for this, but
>> no-one's satisfactorily explaned why it's better to do error handling in
>> the guest instead of the host.
>>
>> Could someone please explain to me why you can't simply fix virtio-blk?
>
> I dont think that virtio-scsi will replace virtio-blk everywhere. For non-scsi
> block devices, image files or logical volumes virtio-blk seems to be the right
> approach, I think.

+1

virtio-scsi is superior w.r.t:
- Device support: tapes, cdroms, other
- Does guest-host mapped multipath
- Supports plenty of virtual disks mapped to the guest w/o need for a
pci slot per each virtio-blk
- offload fancy/new/sophisticated scsi commands from the guest to the
storage array w/o need for qemu implementation. Example XCOPY.

There are some more goodies like ability to support windows guest
clustering w/o hacky versions of scsi pass through over virtio-blk.
virtio-blk is also a candidate to change the request based towards bio
based implementation, so sticking to it does not buy us too much.

>
>> Or would virtio-blk maintainers give a reason why they're unwilling to
>> have it fixed?
>
> I dont consider virtio-blk broken. It just doesnt cover everything.
>
> Christian
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-02-13 09:20:25

by Paolo Bonzini

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/12/2012 09:16 PM, James Bottomley wrote:
> Well, no-one's yet answered the question I had about why. virtio-scsi
> seems to be a basic duplication of virtio-blk except that it seems to
> fix some problems virtio-blk has. Namely queue parameter discover,
> which virtio-blk doesn't seem to do.

The biggest differences between virtio-blk and virtio-scsi are that:

1) how the feature set is defined. virtio-blk defines the feature set
of the device through a shared spec between the guest and the host. The
virtio-scsi spec does not define a feature set for the devices, only for
the transport. Introducing new features in the guest does not need to
be done specifically for virt, it can be done in generic code (sd.c).
This results in a large feature set and at the same time a very stable spec.

Right now virtio-blk covers common usecases nicely. However, the Linux
block layer _is_ growing support for new operations: discard is already
there, write same is in the works, extended copy will also come in due
time. Perhaps we'll add them to virtio-blk, perhaps not. If we will,
we will have to modify the spec, the host implementation, and the guest
drivers for each possible guest OS. virtio-scsi will support them
transparently. Depending on your configuration, it might work without
touching the host at all.


2) for disks with SCSI attachment, the native interface is exposed
precisely as it is in the host. I think we had some misunderstanding
WRT queue parameter discovery. My concern with virtio-blk's SG_IO
support is more general than that. It is that SG_IO accesses the host
disk, not the guest disk. They will have the same data, but they are
effectively different disks. For example they might have different
queue parameters, hence the misunderstanding.

People are mostly using the SG_IO interface for sane purposes. For
example you can ping the storage with INQUIRY commands to detect
problems on the NAS or SAN. For these usecases the difference does not
matter. However, there _are_ worrisome usecases for SG_IO that people
are looking at. For example installing vendor backup tools in their
guests. These tools send vendor-specific commands to the disks.
Nothing particularly insane about that, but we want them to do it using
a saner interface than VIRTIO_BLK_T_SCSI_CMD.


On top of this, only virtio-scsi obviously will support devices such as
tapes.

> There may also be a reason to cut the stack lower down. Error
> handling is most often cited for this, but no-one's satisfactorily
> explaned why it's better to do error handling in the guest instead of
> the host.

It's not necessarily better. However error handling in the host may
simply not be there. This is for example the case of NFS-based storage
with the "hard" option.

Paolo

2012-02-13 11:08:31

by Bart Van Assche

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Mon, Feb 13, 2012 at 8:05 AM, Christian Borntraeger
<[email protected]> wrote:
> On 12/02/12 21:16, James Bottomley wrote:
> > Could someone please explain to me why you can't simply fix virtio-blk?
>
> I dont think that virtio-scsi will replace virtio-blk everywhere. For non-scsi
> block devices, image files or logical volumes virtio-blk seems to be the right
> approach, I think.
>
> > Or would virtio-blk maintainers give a reason why they're unwilling to
> > have it fixed?
>
> I dont consider virtio-blk broken. It just doesnt cover everything.

Although I'm not sure whether that helps here: since about a year
there is software present in the upstream kernel that allows to use
any block device or even a file as a SCSI device.

Bart.

2012-02-13 12:40:12

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

Hi Dor, James & Co,

On Mon, 2012-02-13 at 09:57 +0200, Dor Laor wrote:
> On 02/13/2012 09:05 AM, Christian Borntraeger wrote:
> > On 12/02/12 21:16, James Bottomley wrote:
> >> Well, no-one's yet answered the question I had about why.
> >
> > Just to give one example from a different angle:
> > In the big datacenters tape libraries are still very important, and lots
> > of them have a scsi attachement. virtio-blk certainly is not the right
> > way to handle those. Furthermore it seems even pretty hard to craft
> > a virtio-tape since most of those libraries have vendor specific library
> > controls (via sg). We would need to duplicate scsi generic (hint, hint :-)
> >
> >> virtio-scsi seems to be a basic duplication of virtio-blk except that it seems to
> >> fix some problems virtio-blk has. Namely queue parameter discover,
> >> which virtio-blk doesn't seem to do. There may also be a reason to cut
> >> the stack lower down. Error handling is most often cited for this, but
> >> no-one's satisfactorily explaned why it's better to do error handling in
> >> the guest instead of the host.
> >>
> >> Could someone please explain to me why you can't simply fix virtio-blk?
> >
> > I dont think that virtio-scsi will replace virtio-blk everywhere. For non-scsi
> > block devices, image files or logical volumes virtio-blk seems to be the right
> > approach, I think.
>
> +1
>
> virtio-scsi is superior w.r.t:
> - Device support: tapes, cdroms, other

AFAICT any type of non TYPE_DISK struct scsi_device passthrough is going
to currently require virtio-scsi in order to work.

> - Does guest-host mapped multipath

The logic that comes with target_core_fabric_configfs.c and the native
target control plane gives a host-side (tcm_vhost) fabric driver generic
explict/implict ALUA multipath support by default.

I think there are some interesting possibilities for paravirtualized
ALUA multipath.. 8-)

> - Supports plenty of virtual disks mapped to the guest w/o need for a
> pci slot per each virtio-blk

Ouch, virtio-blk lacks multi-lun per pci slot support..?

> - offload fancy/new/sophisticated scsi commands from the guest to the
> storage array w/o need for qemu implementation. Example XCOPY.
>

...

> There are some more goodies like ability to support windows guest
> clustering w/o hacky versions of scsi pass through over virtio-blk.
> virtio-blk is also a candidate to change the request based towards bio
> based implementation, so sticking to it does not buy us too much.
>

MSFT cluster guests that require SPC-3 PR support can run today with
tcm_loop LLD SCSI LUNs + SG_IO/BSG + right megasas QEMU HBA emulation,
but I do agree this would be better served by virtio-scsi for guests
that require SPC-3 PR support or passthrough.

--nab






2012-02-13 12:54:26

by Dor Laor

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/13/2012 02:40 PM, Nicholas A. Bellinger wrote:
> Hi Dor, James& Co,
>
> On Mon, 2012-02-13 at 09:57 +0200, Dor Laor wrote:
>> On 02/13/2012 09:05 AM, Christian Borntraeger wrote:
>>> On 12/02/12 21:16, James Bottomley wrote:
>>>> Well, no-one's yet answered the question I had about why.
>>>
>>> Just to give one example from a different angle:
>>> In the big datacenters tape libraries are still very important, and lots
>>> of them have a scsi attachement. virtio-blk certainly is not the right
>>> way to handle those. Furthermore it seems even pretty hard to craft
>>> a virtio-tape since most of those libraries have vendor specific library
>>> controls (via sg). We would need to duplicate scsi generic (hint, hint :-)
>>>
>>>> virtio-scsi seems to be a basic duplication of virtio-blk except that it seems to
>>>> fix some problems virtio-blk has. Namely queue parameter discover,
>>>> which virtio-blk doesn't seem to do. There may also be a reason to cut
>>>> the stack lower down. Error handling is most often cited for this, but
>>>> no-one's satisfactorily explaned why it's better to do error handling in
>>>> the guest instead of the host.
>>>>
>>>> Could someone please explain to me why you can't simply fix virtio-blk?
>>>
>>> I dont think that virtio-scsi will replace virtio-blk everywhere. For non-scsi
>>> block devices, image files or logical volumes virtio-blk seems to be the right
>>> approach, I think.
>>
>> +1
>>
>> virtio-scsi is superior w.r.t:
>> - Device support: tapes, cdroms, other
>
> AFAICT any type of non TYPE_DISK struct scsi_device passthrough is going
> to currently require virtio-scsi in order to work.
>
>> - Does guest-host mapped multipath
>
> The logic that comes with target_core_fabric_configfs.c and the native
> target control plane gives a host-side (tcm_vhost) fabric driver generic
> explict/implict ALUA multipath support by default.
>
> I think there are some interesting possibilities for paravirtualized
> ALUA multipath.. 8-)
>
>> - Supports plenty of virtual disks mapped to the guest w/o need for a
>> pci slot per each virtio-blk
>
> Ouch, virtio-blk lacks multi-lun per pci slot support..?

Only if you use the pci multi-function option but that kills standard
hot unplug

>
>> - offload fancy/new/sophisticated scsi commands from the guest to the
>> storage array w/o need for qemu implementation. Example XCOPY.
>>
>
> ...
>
>> There are some more goodies like ability to support windows guest
>> clustering w/o hacky versions of scsi pass through over virtio-blk.
>> virtio-blk is also a candidate to change the request based towards bio
>> based implementation, so sticking to it does not buy us too much.
>>
>
> MSFT cluster guests that require SPC-3 PR support can run today with
> tcm_loop LLD SCSI LUNs + SG_IO/BSG + right megasas QEMU HBA emulation,
> but I do agree this would be better served by virtio-scsi for guests
> that require SPC-3 PR support or passthrough.
>
> --nab
>
>
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-02-13 13:00:24

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
> Only if you use the pci multi-function option but that kills
> standard hot unplug

It doesn't kill it as such, rather you can't unplug luns individually.

2012-02-13 13:13:39

by ronnie sahlberg

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
>> Only if you use the pci multi-function option but that kills
>> standard hot unplug
>
> It doesn't kill it as such, rather you can't unplug luns individually.

Isnt that just a consequence of the current implementation rather than
a SCSI limitation?

A different way to do hoplug could be to flag all devices as removable
in the standard inq page then
leave the LUN there persistently and what you remove/add is not the
LUN device itself but just the media in the device.

Instead of hot-plug remove the LUN, hot-plug becomes "media eject" or
"media insert".
The device remains present all time, you never remove it, but instead
hot-plug controls if the media is present or not.


This would require implementing at least START_STOP_UNIT and
PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.


regards
ronnie sahlberg

2012-02-13 13:17:49

by Paolo Bonzini

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/13/2012 02:13 PM, ronnie sahlberg wrote:
> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
>> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
>>> Only if you use the pci multi-function option but that kills
>>> standard hot unplug
>>
>> It doesn't kill it as such, rather you can't unplug luns individually.
>
> Isnt that just a consequence of the current implementation rather than
> a SCSI limitation?

We're talking about virtio-blk here. :)

Paolo

2012-02-13 13:18:58

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
> > On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
> >> Only if you use the pci multi-function option but that kills
> >> standard hot unplug
> >
> > It doesn't kill it as such, rather you can't unplug luns individually.
>
> Isnt that just a consequence of the current implementation rather than
> a SCSI limitation?

Yes.

> A different way to do hoplug could be to flag all devices as removable
> in the standard inq page then
> leave the LUN there persistently and what you remove/add is not the
> LUN device itself but just the media in the device.
>
> Instead of hot-plug remove the LUN, hot-plug becomes "media eject" or
> "media insert".
> The device remains present all time, you never remove it, but instead
> hot-plug controls if the media is present or not.
>
>
> This would require implementing at least START_STOP_UNIT and
> PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
>
>
> regards
> ronnie sahlberg

That would work.

2012-02-13 15:13:04

by Hannes Reinecke

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
> On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
>> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
>>> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
>>>> Only if you use the pci multi-function option but that kills
>>>> standard hot unplug
>>>
>>> It doesn't kill it as such, rather you can't unplug luns individually.
>>
>> Isnt that just a consequence of the current implementation rather than
>> a SCSI limitation?
>
> Yes.
>
>> A different way to do hoplug could be to flag all devices as removable
>> in the standard inq page then
>> leave the LUN there persistently and what you remove/add is not the
>> LUN device itself but just the media in the device.
>>
>> Instead of hot-plug remove the LUN, hot-plug becomes "media eject" or
>> "media insert".
>> The device remains present all time, you never remove it, but instead
>> hot-plug controls if the media is present or not.
>>
>>
>> This would require implementing at least START_STOP_UNIT and
>> PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
>>
>>
>> regards
>> ronnie sahlberg
>
> That would work.
>
Or we simply use the Peripheral Qualifier that the device is gone;
eg we could simply set PQ = 1, return sense code 0x25/00 and be done
with ...

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: J. Hawn, J. Guild, F. Imend?rffer, HRB 16746 (AG N?rnberg)

2012-02-13 20:42:53

by ronnie sahlberg

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke <[email protected]> wrote:
> On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
>> On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
>>> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
>>>> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
>>>>> Only if you use the pci multi-function option but that kills
>>>>> standard hot unplug
>>>>
>>>> It doesn't kill it as such, rather you can't unplug luns individually.
>>>
>>> Isnt that just a consequence of the current implementation rather than
>>> a SCSI limitation?
>>
>> Yes.
>>
>>> A different way to do hoplug could be to flag all devices as removable
>>> in the standard inq page then
>>> leave the LUN there persistently and what you remove/add is not the
>>> LUN device itself but just the media in the device.
>>>
>>> Instead of hot-plug remove the LUN, ?hot-plug becomes "media eject" or
>>> "media insert".
>>> The device remains present all time, you never remove it, but instead
>>> hot-plug controls if the media is present or not.
>>>
>>>
>>> This would require implementing at least START_STOP_UNIT and
>>> PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
>>>
>>>
>>> regards
>>> ronnie sahlberg
>>
>> That would work.
>>
> Or we simply use the Peripheral Qualifier that the device is gone;
> eg we could simply set PQ = 1, return sense code 0x25/00 and be done
> with ...
>

That is still similar to "rip a device out from the guest without notice"
and can cause the guest to be "surprised".


Removable media is standard feature in SCSI SBC (and other commandsets).
The nice part of removable media is that it activates a contract
between the device and the guest
to prevent removal of the media when the guest depends on the media
not being removed.

I.e. If you have a SBC device with the removable-media bit set,
this is used to tell the initiator "this media can be removed, be
prepared that this might happen".
So when you mount such a SBC device in the guest, the guest will issue
a "PREVENT_ALLOW_MEDIUM_REMOVAL"
to tell the device "this medium is in use and may not be removed".

This automatically provides you with a mechanism where any guest can
signal to qemu when qemu may or may not remove the device/medium.



In addition to implementing PREVENT_ALLOW_MEDIUM_REMOVAL emulation,
qemu would also need to check the prevent-allow status before it
allows the device to be removed.

If nothing else, using this approach will automatically provide a
channel from the guest kernel to qemu to tell qemu when a device may
be unplugged and when it is not safe to unplug the device.



regards
ronnie sahlberg

2012-02-13 20:53:29

by ronnie sahlberg

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg
<[email protected]> wrote:
> On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke <[email protected]> wrote:
>> On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
>>> On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
>>>> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
>>>>> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
>>>>>> Only if you use the pci multi-function option but that kills
>>>>>> standard hot unplug
>>>>>
>>>>> It doesn't kill it as such, rather you can't unplug luns individually.
>>>>
>>>> Isnt that just a consequence of the current implementation rather than
>>>> a SCSI limitation?
>>>
>>> Yes.
>>>
>>>> A different way to do hoplug could be to flag all devices as removable
>>>> in the standard inq page then
>>>> leave the LUN there persistently and what you remove/add is not the
>>>> LUN device itself but just the media in the device.
>>>>
>>>> Instead of hot-plug remove the LUN, ?hot-plug becomes "media eject" or
>>>> "media insert".
>>>> The device remains present all time, you never remove it, but instead
>>>> hot-plug controls if the media is present or not.
>>>>
>>>>
>>>> This would require implementing at least START_STOP_UNIT and
>>>> PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
>>>>
>>>>
>>>> regards
>>>> ronnie sahlberg
>>>
>>> That would work.
>>>
>> Or we simply use the Peripheral Qualifier that the device is gone;
>> eg we could simply set PQ = 1, return sense code 0x25/00 and be done
>> with ...
>>
>
> That is still similar to "rip a device out from the guest without notice"
> and can cause the guest to be "surprised".
>
>
> Removable media is standard feature in SCSI SBC (and other commandsets).
> The nice part of removable media is that it activates a contract
> between the device and the guest
> to prevent removal of the media when the guest depends on the media
> not being removed.
>
> I.e. ?If you have a SBC device with the removable-media bit set,
> this is used to tell the initiator "this media can be removed, be
> prepared that this might happen".
> So when you mount such a SBC device in the guest, the guest will issue
> a "PREVENT_ALLOW_MEDIUM_REMOVAL"
> to tell the device "this medium is in use and may not be removed".
>

What I mean is that if /dev/sdb is removable,
if you mount this as "mount /dev/sdb1 /mnt"
this will automatically cause the guest kernel to send a
PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal.

When you "umount /dev/sdb1" the kernel/guest will automagically send
PREVENT_ALLOW_MEDIUM_REMOVEAL to /dev/sdb and allow removal of the
media again.


If you capture this command and track the "prevent/allow removal
status" you automatically get a channel where qemu will
know when it is safe to unplug the device and when it is not safe to
unplug the device.
This is a nice feature.

2012-02-13 22:59:17

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 14, 2012 at 07:53:26AM +1100, ronnie sahlberg wrote:
> On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg
> <[email protected]> wrote:
> > On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke <[email protected]> wrote:
> >> On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
> >>> On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
> >>>> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
> >>>>> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
> >>>>>> Only if you use the pci multi-function option but that kills
> >>>>>> standard hot unplug
> >>>>>
> >>>>> It doesn't kill it as such, rather you can't unplug luns individually.
> >>>>
> >>>> Isnt that just a consequence of the current implementation rather than
> >>>> a SCSI limitation?
> >>>
> >>> Yes.
> >>>
> >>>> A different way to do hoplug could be to flag all devices as removable
> >>>> in the standard inq page then
> >>>> leave the LUN there persistently and what you remove/add is not the
> >>>> LUN device itself but just the media in the device.
> >>>>
> >>>> Instead of hot-plug remove the LUN, ?hot-plug becomes "media eject" or
> >>>> "media insert".
> >>>> The device remains present all time, you never remove it, but instead
> >>>> hot-plug controls if the media is present or not.
> >>>>
> >>>>
> >>>> This would require implementing at least START_STOP_UNIT and
> >>>> PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
> >>>>
> >>>>
> >>>> regards
> >>>> ronnie sahlberg
> >>>
> >>> That would work.
> >>>
> >> Or we simply use the Peripheral Qualifier that the device is gone;
> >> eg we could simply set PQ = 1, return sense code 0x25/00 and be done
> >> with ...
> >>
> >
> > That is still similar to "rip a device out from the guest without notice"
> > and can cause the guest to be "surprised".
> >
> >
> > Removable media is standard feature in SCSI SBC (and other commandsets).
> > The nice part of removable media is that it activates a contract
> > between the device and the guest
> > to prevent removal of the media when the guest depends on the media
> > not being removed.
> >
> > I.e. ?If you have a SBC device with the removable-media bit set,
> > this is used to tell the initiator "this media can be removed, be
> > prepared that this might happen".
> > So when you mount such a SBC device in the guest, the guest will issue
> > a "PREVENT_ALLOW_MEDIUM_REMOVAL"
> > to tell the device "this medium is in use and may not be removed".
> >
>
> What I mean is that if /dev/sdb is removable,
> if you mount this as "mount /dev/sdb1 /mnt"
> this will automatically cause the guest kernel to send a
> PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal.
>
> When you "umount /dev/sdb1" the kernel/guest will automagically send
> PREVENT_ALLOW_MEDIUM_REMOVEAL to /dev/sdb and allow removal of the
> media again.
>
>
> If you capture this command and track the "prevent/allow removal
> status" you automatically get a channel where qemu will
> know when it is safe to unplug the device and when it is not safe to
> unplug the device.
> This is a nice feature.

Presumably there's a way for device to notify the OS
that user requested removal, as well?

2012-02-13 23:31:03

by ronnie sahlberg

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 14, 2012 at 9:59 AM, Michael S. Tsirkin <[email protected]> wrote:
> On Tue, Feb 14, 2012 at 07:53:26AM +1100, ronnie sahlberg wrote:
>> On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg
>> <[email protected]> wrote:
>> > On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke <[email protected]> wrote:
>> >> On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
>> >>> On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
>> >>>> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
>> >>>>> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
>> >>>>>> Only if you use the pci multi-function option but that kills
>> >>>>>> standard hot unplug
>> >>>>>
>> >>>>> It doesn't kill it as such, rather you can't unplug luns individually.
>> >>>>
>> >>>> Isnt that just a consequence of the current implementation rather than
>> >>>> a SCSI limitation?
>> >>>
>> >>> Yes.
>> >>>
>> >>>> A different way to do hoplug could be to flag all devices as removable
>> >>>> in the standard inq page then
>> >>>> leave the LUN there persistently and what you remove/add is not the
>> >>>> LUN device itself but just the media in the device.
>> >>>>
>> >>>> Instead of hot-plug remove the LUN, ?hot-plug becomes "media eject" or
>> >>>> "media insert".
>> >>>> The device remains present all time, you never remove it, but instead
>> >>>> hot-plug controls if the media is present or not.
>> >>>>
>> >>>>
>> >>>> This would require implementing at least START_STOP_UNIT and
>> >>>> PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
>> >>>>
>> >>>>
>> >>>> regards
>> >>>> ronnie sahlberg
>> >>>
>> >>> That would work.
>> >>>
>> >> Or we simply use the Peripheral Qualifier that the device is gone;
>> >> eg we could simply set PQ = 1, return sense code 0x25/00 and be done
>> >> with ...
>> >>
>> >
>> > That is still similar to "rip a device out from the guest without notice"
>> > and can cause the guest to be "surprised".
>> >
>> >
>> > Removable media is standard feature in SCSI SBC (and other commandsets).
>> > The nice part of removable media is that it activates a contract
>> > between the device and the guest
>> > to prevent removal of the media when the guest depends on the media
>> > not being removed.
>> >
>> > I.e. ?If you have a SBC device with the removable-media bit set,
>> > this is used to tell the initiator "this media can be removed, be
>> > prepared that this might happen".
>> > So when you mount such a SBC device in the guest, the guest will issue
>> > a "PREVENT_ALLOW_MEDIUM_REMOVAL"
>> > to tell the device "this medium is in use and may not be removed".
>> >
>>
>> What I mean is that if /dev/sdb is removable,
>> if you mount this as ? "mount /dev/sdb1 /mnt"
>> this will automatically cause the guest kernel to send a
>> PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal.
>>
>> When you "umount /dev/sdb1" ? the kernel/guest will automagically send
>> PREVENT_ALLOW_MEDIUM_REMOVEAL to /dev/sdb and allow removal of the
>> media again.
>>
>>
>> If you capture this command and track the "prevent/allow removal
>> status" ?you automatically get a channel where qemu will
>> know when it is safe to unplug the device ?and when it is not safe to
>> unplug the device.
>> This is a nice feature.
>
> Presumably there's a way for device to notify the OS
> that user requested removal, as well?


I think that is done by responding with sense to one of the commands,
like the every few second TEST_UNIT_READY that the
initiator/guest-kernel will send.

5Ah 01h DT WROM BK OPERATOR MEDIUM REMOVAL REQUEST

This sense code should be the one to use.


I dont know if linux scsi initiator honors this or what it will do.



I guess something like this could work ?

IF device is marked as prevent-removal THEN
send OPERATOR SEND MEDIUM REMOVAL REQUEST to the initiator
wait xyz seconds
IF device is still marked as prevent-removal THEN
ask operator "guest refused to release the LUN, do you want to
forcefully remove it?"
ELSE
unmount the media
FI
ELSE
unmount the media
FI

2012-02-13 23:33:46

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 14, 2012 at 10:30:59AM +1100, ronnie sahlberg wrote:
> On Tue, Feb 14, 2012 at 9:59 AM, Michael S. Tsirkin <[email protected]> wrote:
> > On Tue, Feb 14, 2012 at 07:53:26AM +1100, ronnie sahlberg wrote:
> >> On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg
> >> <[email protected]> wrote:
> >> > On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke <[email protected]> wrote:
> >> >> On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
> >> >>> On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
> >> >>>> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
> >> >>>>> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
> >> >>>>>> Only if you use the pci multi-function option but that kills
> >> >>>>>> standard hot unplug
> >> >>>>>
> >> >>>>> It doesn't kill it as such, rather you can't unplug luns individually.
> >> >>>>
> >> >>>> Isnt that just a consequence of the current implementation rather than
> >> >>>> a SCSI limitation?
> >> >>>
> >> >>> Yes.
> >> >>>
> >> >>>> A different way to do hoplug could be to flag all devices as removable
> >> >>>> in the standard inq page then
> >> >>>> leave the LUN there persistently and what you remove/add is not the
> >> >>>> LUN device itself but just the media in the device.
> >> >>>>
> >> >>>> Instead of hot-plug remove the LUN, ?hot-plug becomes "media eject" or
> >> >>>> "media insert".
> >> >>>> The device remains present all time, you never remove it, but instead
> >> >>>> hot-plug controls if the media is present or not.
> >> >>>>
> >> >>>>
> >> >>>> This would require implementing at least START_STOP_UNIT and
> >> >>>> PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
> >> >>>>
> >> >>>>
> >> >>>> regards
> >> >>>> ronnie sahlberg
> >> >>>
> >> >>> That would work.
> >> >>>
> >> >> Or we simply use the Peripheral Qualifier that the device is gone;
> >> >> eg we could simply set PQ = 1, return sense code 0x25/00 and be done
> >> >> with ...
> >> >>
> >> >
> >> > That is still similar to "rip a device out from the guest without notice"
> >> > and can cause the guest to be "surprised".
> >> >
> >> >
> >> > Removable media is standard feature in SCSI SBC (and other commandsets).
> >> > The nice part of removable media is that it activates a contract
> >> > between the device and the guest
> >> > to prevent removal of the media when the guest depends on the media
> >> > not being removed.
> >> >
> >> > I.e. ?If you have a SBC device with the removable-media bit set,
> >> > this is used to tell the initiator "this media can be removed, be
> >> > prepared that this might happen".
> >> > So when you mount such a SBC device in the guest, the guest will issue
> >> > a "PREVENT_ALLOW_MEDIUM_REMOVAL"
> >> > to tell the device "this medium is in use and may not be removed".
> >> >
> >>
> >> What I mean is that if /dev/sdb is removable,
> >> if you mount this as ? "mount /dev/sdb1 /mnt"
> >> this will automatically cause the guest kernel to send a
> >> PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal.
> >>
> >> When you "umount /dev/sdb1" ? the kernel/guest will automagically send
> >> PREVENT_ALLOW_MEDIUM_REMOVEAL to /dev/sdb and allow removal of the
> >> media again.
> >>
> >>
> >> If you capture this command and track the "prevent/allow removal
> >> status" ?you automatically get a channel where qemu will
> >> know when it is safe to unplug the device ?and when it is not safe to
> >> unplug the device.
> >> This is a nice feature.
> >
> > Presumably there's a way for device to notify the OS
> > that user requested removal, as well?
>
>
> I think that is done by responding with sense to one of the commands,
> like the every few second TEST_UNIT_READY that the
> initiator/guest-kernel will send.

Does it do this even for mounted media?
I didn't realize ...

2012-02-14 00:10:31

by Rusty Russell

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Mon, 13 Feb 2012 10:19:56 +0100, Paolo Bonzini <[email protected]> wrote:
> block layer _is_ growing support for new operations: discard is already
> there, write same is in the works, extended copy will also come in due
> time. Perhaps we'll add them to virtio-blk, perhaps not.

FYI, I'd take patches for discard in virtio_blk today; it's a no-brainer
in a virtual devoce.

But I wouldn't want extended copy and write same.

Thanks,
Rusty.

2012-02-14 00:49:59

by ronnie sahlberg

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 14, 2012 at 10:33 AM, Michael S. Tsirkin <[email protected]> wrote:
> On Tue, Feb 14, 2012 at 10:30:59AM +1100, ronnie sahlberg wrote:
>> On Tue, Feb 14, 2012 at 9:59 AM, Michael S. Tsirkin <[email protected]> wrote:
>> > On Tue, Feb 14, 2012 at 07:53:26AM +1100, ronnie sahlberg wrote:
>> >> On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg
>> >> <[email protected]> wrote:
>> >> > On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke <[email protected]> wrote:
>> >> >> On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
>> >> >>> On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
>> >> >>>> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin <[email protected]> wrote:
>> >> >>>>> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
>> >> >>>>>> Only if you use the pci multi-function option but that kills
>> >> >>>>>> standard hot unplug
>> >> >>>>>
>> >> >>>>> It doesn't kill it as such, rather you can't unplug luns individually.
>> >> >>>>
>> >> >>>> Isnt that just a consequence of the current implementation rather than
>> >> >>>> a SCSI limitation?
>> >> >>>
>> >> >>> Yes.
>> >> >>>
>> >> >>>> A different way to do hoplug could be to flag all devices as removable
>> >> >>>> in the standard inq page then
>> >> >>>> leave the LUN there persistently and what you remove/add is not the
>> >> >>>> LUN device itself but just the media in the device.
>> >> >>>>
>> >> >>>> Instead of hot-plug remove the LUN, ?hot-plug becomes "media eject" or
>> >> >>>> "media insert".
>> >> >>>> The device remains present all time, you never remove it, but instead
>> >> >>>> hot-plug controls if the media is present or not.
>> >> >>>>
>> >> >>>>
>> >> >>>> This would require implementing at least START_STOP_UNIT and
>> >> >>>> PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
>> >> >>>>
>> >> >>>>
>> >> >>>> regards
>> >> >>>> ronnie sahlberg
>> >> >>>
>> >> >>> That would work.
>> >> >>>
>> >> >> Or we simply use the Peripheral Qualifier that the device is gone;
>> >> >> eg we could simply set PQ = 1, return sense code 0x25/00 and be done
>> >> >> with ...
>> >> >>
>> >> >
>> >> > That is still similar to "rip a device out from the guest without notice"
>> >> > and can cause the guest to be "surprised".
>> >> >
>> >> >
>> >> > Removable media is standard feature in SCSI SBC (and other commandsets).
>> >> > The nice part of removable media is that it activates a contract
>> >> > between the device and the guest
>> >> > to prevent removal of the media when the guest depends on the media
>> >> > not being removed.
>> >> >
>> >> > I.e. ?If you have a SBC device with the removable-media bit set,
>> >> > this is used to tell the initiator "this media can be removed, be
>> >> > prepared that this might happen".
>> >> > So when you mount such a SBC device in the guest, the guest will issue
>> >> > a "PREVENT_ALLOW_MEDIUM_REMOVAL"
>> >> > to tell the device "this medium is in use and may not be removed".
>> >> >
>> >>
>> >> What I mean is that if /dev/sdb is removable,
>> >> if you mount this as ? "mount /dev/sdb1 /mnt"
>> >> this will automatically cause the guest kernel to send a
>> >> PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal.
>> >>
>> >> When you "umount /dev/sdb1" ? the kernel/guest will automagically send
>> >> PREVENT_ALLOW_MEDIUM_REMOVEAL to /dev/sdb and allow removal of the
>> >> media again.
>> >>
>> >>
>> >> If you capture this command and track the "prevent/allow removal
>> >> status" ?you automatically get a channel where qemu will
>> >> know when it is safe to unplug the device ?and when it is not safe to
>> >> unplug the device.
>> >> This is a nice feature.
>> >
>> > Presumably there's a way for device to notify the OS
>> > that user requested removal, as well?
>>
>>
>> I think that is done by responding with sense ?to one of the commands,
>> like the every few second TEST_UNIT_READY that the
>> initiator/guest-kernel will send.
>
> Does it do this even for mounted media?
> I didn't realize ...

Yes it does. At least for SBC devices that are flagged as removable.
"Normal" SBC devices that are not removable might trigger different
behaviour in the kernel.
(endless string of TEST_UNIT_READY is a way to check if something
"unexpected" happens on the device.)

I just tested on a 3.0.0 kernel.

I have a mediachanger for SBC and the SBC is flagged as removable.
/dev/sdd is the iscsi lun with a removable SBC disk.

By just exposing this device to the kernel, the kernel keeps sending,
or if not the kernel maybe some other process trying to poll the
status?

every few seconds :
PREVENT_ALLOW_MEDIUM_REMOVAL prevent removal
PREVENT_ALLOW_MEDIUM_REMOVAL to immediatel change it back to allow
removal again
TEST_UNIT_READY


After I run this
mount /dev/sdd1 /mnt

The kernel sends a single
PREVENT_ALLOW_MEDIUM_REMOVAL to prevent removal
then every few seconds a
TEST_UNIT_READY


This is what it does for removable SBC disks. Maybe it does something
differently for "permanent" SBC disks.

regards
ronnie sahlberg

2012-02-14 01:11:57

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On Tue, Feb 14, 2012 at 11:49:55AM +1100, ronnie sahlberg wrote:
> By just exposing this device to the kernel, the kernel keeps sending,
> or if not the kernel maybe some other process trying to poll the
> status?
>
> every few seconds :
> PREVENT_ALLOW_MEDIUM_REMOVAL prevent removal
> PREVENT_ALLOW_MEDIUM_REMOVAL to immediatel change it back to allow
> removal again
> TEST_UNIT_READY
>
>
> After I run this
> mount /dev/sdd1 /mnt
>
> The kernel sends a single
> PREVENT_ALLOW_MEDIUM_REMOVAL to prevent removal
> then every few seconds a
> TEST_UNIT_READY
>
>
> This is what it does for removable SBC disks. Maybe it does something
> differently for "permanent" SBC disks.
>
> regards
> ronnie sahlberg

There used to be a daemon for this.

2012-02-14 09:57:57

by Paolo Bonzini

[permalink] [raw]
Subject: Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

On 02/14/2012 02:11 AM, Michael S. Tsirkin wrote:
> On Tue, Feb 14, 2012 at 11:49:55AM +1100, ronnie sahlberg wrote:
>> By just exposing this device to the kernel, the kernel keeps sending,
>> or if not the kernel maybe some other process trying to poll the
>> status?
>>
>> every few seconds :
>> PREVENT_ALLOW_MEDIUM_REMOVAL prevent removal
>> PREVENT_ALLOW_MEDIUM_REMOVAL to immediatel change it back to allow
>> removal again
>> TEST_UNIT_READY
>>
>>
>> After I run this
>> mount /dev/sdd1 /mnt
>>
>> The kernel sends a single
>> PREVENT_ALLOW_MEDIUM_REMOVAL to prevent removal
>> then every few seconds a
>> TEST_UNIT_READY

Sorry to interrupt you again guys, but: the discussion started with
virtio-blk hotplug and now we're talking about SCSI commands? Sure
somebody switched topic at some point :) and anyway this is irrelevant
to what virtio-blk can/cannot do.

BTW, for virtio-scsi the spec provides a way to do hotplug and hotunplug
without any polling, though it's not implemented yet in the driver.

Paolo