2018-02-13 10:02:20

by Harald Dunkel

Subject: running NFS in LXC

Hi folks,

To support an HA setup I have a block device mirrored via
DRBD on 2 hosts (write-through). The DRBD Primary runs
the NFS service, using a dedicated IP address. NFS is not
running on the DRBD Secondary.

To make this work I have to move several system files and
/var directories to a file system on the mirrored block
device as well. This is pretty clumsy and it breaks system
updates, etc.
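
Concretely, that currently looks something like this (with /srv/nfs
standing in for the mount point of the mirrored device; all paths are
just examples):

  # move the NFS state and exports onto the mirrored file system:
  mv /var/lib/nfs /srv/nfs/state/var-lib-nfs
  ln -s /srv/nfs/state/var-lib-nfs /var/lib/nfs
  mv /etc/exports /srv/nfs/state/exports
  ln -s /srv/nfs/state/exports /etc/exports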

Question: What if I run the NFS service in an LXC container
instead?

The idea is to put the whole service as a container on the
same block device as the NFS export partitions. In case of
a hardware failure on the Primary I can stop the NFS service,
promote the Secondary to the new Primary, and start the NFS
service again on the other hardware.
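
Roughly, I imagine the failover like this (the resource name "r0",
device /dev/drbd0, mount point /srv/nfs and container name "nfs" are
just placeholders):

  # on the old Primary, if it is still reachable:
  lxc-stop -n nfs
  umount /srv/nfs
  drbdadm secondary r0

  # on the Secondary taking over:
  drbdadm primary r0
  mount /dev/drbd0 /srv/nfs
  lxc-start -n nfs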

Do you expect any serious problems here? Please share your
thoughts. Every helpful comment is highly appreciated.


Harri


2018-02-13 18:28:31

by Benjamin Coddington

Subject: Re: running NFS in LXC

Hi Harri,

I think this is a very effective and simple approach to building out a
hardware-resilient NFS service.

I have some experience with this approach, running multiple (< 100) NFS
servers in VMs that could be started and moved between hardware hosts. If
you have the hardware for it (or want to use iSCSI), you don't need DRBD
- only a way to share your block devices between nodes and ensure that
each is only used by a single host at a time.
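
With iSCSI, for example, the takeover boils down to moving the
initiator session from one host to the other; a rough sketch with a
made-up target and portal:

  # release the device on the old host:
  iscsiadm -m node -T iqn.2018-02.example.com:nfs0 -p 192.0.2.10 --logout

  # claim it on the new host:
  iscsiadm -m node -T iqn.2018-02.example.com:nfs0 -p 192.0.2.10 --login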

We also created some automated orchestration that was able to deploy new
NFS servers and dynamically migrate parts of the filesystem tree to new
servers, which gave us a lot of capability to scale horizontally.

I really think the containerized or virtualized knfsd is a nice solution
to both hardware failure and horizontal scaling, if your use case allows
brief outages for part of the filesystem tree.

I'm happy to share more, or hear more about your experience.

Ben


On 13 Feb 2018, at 4:55, Harald Dunkel wrote:

> Hi folks,
>
> To support an HA setup I have a block device mirrored via
> DRBD on 2 hosts (write-through). The DRBD Primary runs
> the NFS service, using a dedicated IP address. NFS is not
> running on the DRBD Secondary.
>
> To make this work I have to move several system files and
> /var directories to a file system on the mirrored block
> device as well. This is pretty clumsy and it breaks system
> updates, etc.
>
> Question: What if I run the NFS service in an LXC container
> instead?
>
> The idea is to put the whole service as a container on the
> same block device as the NFS export partitions. In case of
> a hardware failure on the Primary I can stop the NFS service,
> promote the Secondary to the new Primary, and start the NFS
> service again on the other hardware.
>
> Do you expect any serious problems here? Please share your
> thoughts. Every helpful comment is highly appreciated.
>
>
> Harri

2018-02-14 07:06:27

by Harald Dunkel

Subject: Re: running NFS in LXC

Hi Ben,

I take this as a "no serious problems so far". Good to hear.
Which kernel are you using?

I like your suggestion to use a (redundant?) iSCSI server
instead of DRBD, but I have to live with the current hardware.
Surely I will consider it for the next NFS server configuration.
On the other hand, DRBD 8 has proven rock-solid.


Regards
Harri

2018-02-14 14:15:07

by Benjamin Coddington

Subject: Re: running NFS in LXC

On 14 Feb 2018, at 2:06, Harald Dunkel wrote:

> Hi Ben,
>
> I take this as a "no serious problems so far". Good to hear.
> Which kernel are you using?

This was years ago on a 2.6.32 series kernel. I don't expect you'll have
serious problems now, either. As far as I know, my last employer is
still using that architecture, but I couldn't tell you what software
versions they're on now.

We moved to the knfsd-in-a-container from an architecture that was
essentially a bunch of vanilla knfsds that could mount any of the
block devices; block devices were tied to IP addresses, and this was
all orchestrated by Pacemaker. The problem with that one was that when
a block device or filesystem was migrated, the server receiving that
filesystem had to be put into grace, which disrupted any existing NFS
serving that was going on.
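
For illustration, each migratable export was essentially a
filesystem-plus-IP resource group; in today's pcs syntax that would
look roughly like this (names, paths, and addresses made up):

  pcs resource create fs_a ocf:heartbeat:Filesystem \
      device=/dev/sdb1 directory=/export/a fstype=ext4 --group export_a
  pcs resource create ip_a ocf:heartbeat:IPaddr2 \
      ip=192.0.2.100 cidr_netmask=24 --group export_a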

Test things, let us know how it works!

Ben

2018-02-15 15:45:16

by J. Bruce Fields

Subject: Re: running NFS in LXC

On Wed, Feb 14, 2018 at 09:15:05AM -0500, Benjamin Coddington wrote:
> On 14 Feb 2018, at 2:06, Harald Dunkel wrote:
>
> >Hi Ben,
> >
> >I take this as a "no serious problems so far". Good to hear.
> >Which kernel are you using?
>
> This was years ago on a 2.6.32 series kernel. I don't expect you'll
> have serious problems now, either. As far as I know, my last employer
> is still using that architecture, but I couldn't tell you what
> software versions they're on now.
>
> We moved to the knfsd-in-a-container from an architecture that was
> essentially a bunch of vanilla knfsds that could mount any of
> the block devices; block devices were tied to IP addresses, and
> this was all orchestrated by Pacemaker. The problem with that one
> was that when a block device or filesystem was migrated, the server
> receiving that filesystem had to be put into grace, which disrupted
> any existing NFS serving that was going on.
>
> Test things, let us know how it works!

I think you were using KVM, right, Ben?

Harald is talking about LXC, and there are still a few problems there.

Jeff, do you object to going back to our plan B for reboot recovery (the
daemon)? The usermode helper containerization seems stalled and I have
to admit I'm probably not going to take it on myself. That might be the
only knfsd-in-a-container obstacle left.

--b.

2018-02-15 19:14:25

by Benjamin Coddington

Subject: Re: running NFS in LXC

On 15 Feb 2018, at 10:45, J. Bruce Fields wrote:

> On Wed, Feb 14, 2018 at 09:15:05AM -0500, Benjamin Coddington wrote:
>> On 14 Feb 2018, at 2:06, Harald Dunkel wrote:
>>
>>> Hi Ben,
>>>
>>> I take this as a "no serious problems so far". Good to hear.
>>> Which kernel are you using?
>>
>> This was years ago on a 2.6.32 series kernel. I don't expect you'll
>> have serious problems now, either. As far as I know, my last employer
>> is still using that architecture, but I couldn't tell you what
>> software versions they're on now.
>>
>> We moved to the knfsd-in-a-container from an architecture that was
>> essentially a bunch of vanilla knfsds that could mount any of
>> the block devices; block devices were tied to IP addresses, and
>> this was all orchestrated by Pacemaker. The problem with that one
>> was that when a block device or filesystem was migrated, the server
>> receiving that filesystem had to be put into grace, which disrupted
>> any existing NFS serving that was going on.
>>
>> Test things, let us know how it works!
>
> I think you were using KVM, right, Ben?

That's right.

> Harald is talking about LXC, and there are still a few problems there.

Ah, yes -- what a bonehead I am, I'd forgotten about containing the
upcalls. But if the state recovery db can be stored on the same filesystem
as the container, and there's only one knfsd container per "host", then I
think the HA model should still work.
Ben

> Jeff, do you object to going back to our plan B for reboot recovery (the
> daemon)? The usermode helper containerization seems stalled and I have to
> admit I'm probably not going to take it on myself. That might be the only
> knfsd-in-a-container obstacle left.
>
> --b.

2018-03-20 11:09:07

by Harald Dunkel

Subject: Re: running NFS in LXC

Hi JB,

On 02/15/18 16:45, J. Bruce Fields wrote:
>
> Harald is talking about LXC, and there are still a few problems there.
>
> Jeff, do you object to going back to our plan B for reboot recovery (the
> daemon)? The usermode helper containerization seems stalled and I have
> to admit I'm probably not going to take it on myself. That might be the
> only knfsd-in-a-container obstacle left.
>

I have no clue about plan B, but it would be very nice if this feature
(running the NFS server in a container) could be completed.


Regards
Harri

2018-03-20 12:01:29

by Jeffrey Layton

Subject: Re: running NFS in LXC

On Thu, 2018-02-15 at 10:45 -0500, J. Bruce Fields wrote:
> On Wed, Feb 14, 2018 at 09:15:05AM -0500, Benjamin Coddington wrote:
> > On 14 Feb 2018, at 2:06, Harald Dunkel wrote:
> >
> > > Hi Ben,
> > >
> > > I take this as a "no serious problems so far". Good to hear.
> > > Which kernel are you using?
> >
> > This was years ago on a 2.6.32 series kernel. I don't expect you'll
> > have serious problems now, either. As far as I know, my last
> > employer is still using that architecture, but I couldn't tell you
> > what software versions they're on now.
> >
> > We moved to the knfsd-in-a-container from an architecture that was
> > essentially a bunch of vanilla knfsds that could mount any of
> > the block devices; block devices were tied to IP addresses, and
> > this was all orchestrated by Pacemaker. The problem with that one
> > was that when a block device or filesystem was migrated, the server
> > receiving that filesystem had to be put into grace, which disrupted
> > any existing NFS serving that was going on.
> >
> > Test things, let us know how it works!
>
> I think you were using KVM, right, Ben?
>
> Harald is talking about LXC, and there are still a few problems there.
>
> Jeff, do you object to going back to our plan B for reboot recovery (the
> daemon)? The usermode helper containerization seems stalled and I have
> to admit I'm probably not going to take it on myself. That might be the
> only knfsd-in-a-container obstacle left.

Sorry for the late response. I've no objection to resurrecting that
approach if it helps this use case. The daemon and umh callout should be
able to share a lot of the same code and database if it's done properly.

--
Jeff Layton <[email protected]>

2018-03-20 13:22:11

by Scott Mayhew

Subject: Re: running NFS in LXC

On Tue, 20 Mar 2018, Jeff Layton wrote:

> On Thu, 2018-02-15 at 10:45 -0500, J. Bruce Fields wrote:
> > On Wed, Feb 14, 2018 at 09:15:05AM -0500, Benjamin Coddington wrote:
> > > On 14 Feb 2018, at 2:06, Harald Dunkel wrote:
> > >
> > > > Hi Ben,
> > > >
> > > > I take this as a "no serious problems so far". Good to hear.
> > > > Which kernel are you using?
> > >
> > > This was years ago on a 2.6.32 series kernel. I don't expect
> > > you'll have serious problems now, either. As far as I know, my
> > > last employer is still using that architecture, but I couldn't
> > > tell you what software versions they're on now.
> > >
> > > We moved to the knfsd-in-a-container from an architecture that was
> > > essentially a bunch of vanilla knfsds that could mount any of
> > > the block devices; block devices were tied to IP addresses, and
> > > this was all orchestrated by Pacemaker. The problem with that one
> > > was that when a block device or filesystem was migrated, the server
> > > receiving that filesystem had to be put into grace, which disrupted
> > > any existing NFS serving that was going on.
> > >
> > > Test things, let us know how it works!
> >
> > I think you were using KVM, right, Ben?
> >
> > Harald is talking about LXC, and there are still a few problems there.
> >
> > Jeff, do you object to going back to our plan B for reboot recovery (the
> > daemon)? The usermode helper containerization seems stalled and I have
> > to admit I'm probably not going to take it on myself. That might be the
> > only knfsd-in-a-container obstacle left.
>
> Sorry for the late response. I've no objection to resurrecting that
> approach if it helps this use-case. The daemon and umh callout should be
> able to share a lot of the same code and database if it's done properly.

I'm sorta in the early stages of resurrecting nfsdcld because I wanted
to tie it in to Corosync for use in Pacemaker clusters. I'm planning on
having the cluster-related stuff controlled by a command-line or nfs.conf
flag, but I think even in the standalone case the upcall message needs
to change to include the session flag (which the umh callout passes via
an environment variable)... but I was also looking at maybe adding the
server address and filesystem fsid to the message.
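
Something like this in nfs.conf, to be clear a purely hypothetical
knob at this point:

  # hypothetical [nfsdcld] section; no such option exists yet:
  [nfsdcld]
  cluster = 1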

>
> --
> Jeff Layton <[email protected]>