2006-02-20 08:11:32

by Dmitry Mishin

[permalink] [raw]
Subject: [PATCH 1/2] iptables 32bit compat layer

Hello,

This patch set extends current iptables compatibility layer in order to get
32bit iptables to work on 64bit kernel. Current layer is insufficient
due to alignment checks both in kernel and user space tools.

This patch introduces base compatibility interface for other ip_tables modules

--
Thanks,
Dmitry.


Attachments:
(No filename) (317.00 B)
diff-ms-ipt-compat-20060217 (45.96 kB)
Download all attachments

2006-02-20 08:14:38

by Dmitry Mishin

[permalink] [raw]
Subject: [PATCH 2/2] iptables 32bit compat layer

This patch introduces compatibility functions for some ip_tables matches and
targets, using interface provided in the first patch.

> Hello,
>
> This patch set extends current iptables compatibility layer in order to get
> 32bit iptables to work on 64bit kernel. Current layer is insufficient
> due to alignment checks both in kernel and user space tools.
>

--
Thanks,
Dmitry.


Attachments:
(No filename) (381.00 B)
diff-ms-ipt-compat2-20060217 (26.62 kB)
Download all attachments

2006-02-20 08:31:41

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/2] iptables 32bit compat layer

From: Mishin Dmitry <[email protected]>
Date: Mon, 20 Feb 2006 11:10:38 +0300

> This patch set extends current iptables compatibility layer in order
> to get 32bit iptables to work on 64bit kernel. Current layer is
> insufficient due to alignment checks both in kernel and user space
> tools.

Thanks a lot for doing this work Mishin.

2006-02-20 16:14:05

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] iptables 32bit compat layer

On Monday 20 February 2006 09:10, Mishin Dmitry wrote:
> --- ./include/linux/netfilter/x_tables.h.iptcompat??????2006-02-15 16:16:02.000000000 +0300
> +++ ./include/linux/netfilter/x_tables.h????????2006-02-15 18:53:09.000000000 +0300
> ?struct xt_match
> ?{
> ????????struct list_head list;
> @@ -118,6 +125,10 @@ struct xt_match
> ????????/* Called when entry of this type deleted. */
> ????????void (*destroy)(void *matchinfo, unsigned int matchinfosize);
> ?
> +#ifdef CONFIG_COMPAT
> +???????/* Called when userspace align differs from kernel space one */
> +???????int (*compat)(void *match, void **dstptr, int *size, int convert);
> +#endif
> ????????/* Set this to THIS_MODULE if you are a module, otherwise NULL */
> ????????struct module *me;
> ?};

Is CONFIG_COMPAT the right conditional here? If the code is only used
for architectures that have different aligments, it should not need be
compiled in for the other architectures.

> @@ -154,6 +165,10 @@ struct xt_target
> ????????/* Called when entry of this type deleted. */
> ????????void (*destroy)(void *targinfo, unsigned int targinfosize);
> ?
> +#ifdef CONFIG_COMPAT
> +???????/* Called when userspace align differs from kernel space one */
> +???????int (*compat)(void *target, void **dstptr, int *size, int convert);
> +#endif
> ????????/* Set this to THIS_MODULE if you are a module, otherwise NULL */
> ????????struct module *me;
> ?};
> @@ -233,6 +248,34 @@ extern void xt_proto_fini(int af);
> ?extern struct xt_table_info *xt_alloc_table_info(unsigned int size);
> ?extern void xt_free_table_info(struct xt_table_info *info);
> ?
> +#ifdef CONFIG_COMPAT
> +#include <net/compat.h>
> +
> +/* FIXME: this works only on 32 bit tasks
> + * need to change whole approach in order to calculate align as function of
> + * current task alignment */
> +
> +struct compat_xt_counters
> +{
> +???????u_int32_t cnt[4];
> +};

Hmm, maybe we should have something like

typedef u64 __attribute__((aligned(4))) compat_u64;

in order to get the right alignment on the architectures
where it makes a difference. Do all compiler versions
get that right?

> --- ./include/linux/netfilter_ipv4/ip_tables.h.iptcompat????????2006-02-15 16:06:41.000000000 +0300
> +++ ./include/linux/netfilter_ipv4/ip_tables.h??2006-02-15 16:37:12.000000000 +0300
> @@ -364,5 +365,62 @@ extern unsigned int ipt_do_table(struct
> ???????????????????????????????? void *userdata);
> ?
> ?#define IPT_ALIGN(s) XT_ALIGN(s)
> +
> +#ifdef CONFIG_COMPAT
> +#include <net/compat.h>
> +
> +struct compat_ipt_getinfo
> +{
> +???????char name[IPT_TABLE_MAXNAMELEN];
> +???????compat_uint_t valid_hooks;
> +???????compat_uint_t hook_entry[NF_IP_NUMHOOKS];
> +???????compat_uint_t underflow[NF_IP_NUMHOOKS];
> +???????compat_uint_t num_entries;
> +???????compat_uint_t size;
> +};

This structure looks like it does not need any
conversions. You should probably just use
struct ipt_getinfo then.

> +
> +struct compat_ipt_entry_match
> +{
> +???????union {
> +???????????????struct {
> +???????????????????????u_int16_t match_size;
> +???????????????????????char name[IPT_FUNCTION_MAXNAMELEN];
> +???????????????} user;
> +???????????????u_int16_t match_size;
> +???????} u;
> +???????unsigned char data[0];
> +};
> +
> +struct compat_ipt_entry_target
> +{
> +???????union {
> +???????????????struct {
> +???????????????????????u_int16_t target_size;
> +???????????????????????char name[IPT_FUNCTION_MAXNAMELEN];
> +???????????????} user;
> +???????????????u_int16_t target_size;
> +???????} u;
> +???????unsigned char data[0];
> +};

Dito

> +#define COMPAT_IPT_ALIGN(s) ???COMPAT_XT_ALIGN(s)
> +
> +extern int ipt_match_align_compat(void *match, void **dstptr,
> +???????????????int *size, int off, int convert);
> +extern int ipt_target_align_compat(void *target, void **dstptr,
> +???????????????int *size, int off, int convert);
> +
> +#endif /* CONFIG_COMPAT */
> ?#endif /*__KERNEL__*/
> ?#endif /* _IPTABLES_H */
> --- ./include/net/compat.h.iptcompat????2006-01-03 06:21:10.000000000 +0300
> +++ ./include/net/compat.h??????2006-02-15 18:45:49.000000000 +0300
> @@ -23,6 +23,14 @@ struct compat_cmsghdr {
> ????????compat_int_t????cmsg_type;
> ?};
> ?
> +#if defined(CONFIG_X86_64)
> +#define is_current_32bits() (current_thread_info()->flags & _TIF_IA32)
> +#elif defined(CONFIG_IA64)
> +#define is_current_32bits() (IS_IA32_PROCESS(ia64_task_regs(current)))
> +#else
> +#define is_current_32bits()????0
> +#endif
> +

This definition looks very wrong to me. For x86_64, the right thing to check
should be TS_COMPAT, no _TIF_IA32, since you can also call the 64 bit
syscall entry point from a i386 task running on x86_64. For most other
architectures, is_current_32bits returns something that is not reflected
in the name. I would e.g. expect the function to return '1' on i386 and
the correct task state on other compat platforms, instead of a bogus '0'.

There have been long discussions about the inclusions of the 'is_compat_task'
macro. Let's at least not define a second function that does almost the
same but gets it wrong.

I would much rather have either an extra 'compat' argument to to
sock_setsockopt and proto_ops->setsockopt than to spread the use
of is_compat_task further.

> ?#else /* defined(CONFIG_COMPAT) */
> ?#define compat_msghdr??msghdr??????????/* to avoid compiler warnings */
> ?#endif /* defined(CONFIG_COMPAT) */
> --- ./net/compat.c.iptcompat????2006-01-03 06:21:10.000000000 +0300
> +++ ./net/compat.c??????2006-02-15 16:38:45.000000000 +0300
> @@ -308,107 +308,6 @@ void scm_detach_fds_compat(struct msghdr
> ?}
> ?
> ?/*
> - * For now, we assume that the compatibility and native version
> - * of struct ipt_entry are the same - sfr. ?FIXME
> - */
> -struct compat_ipt_replace {
> -???????char????????????????????name[IPT_TABLE_MAXNAMELEN];
> -???????u32?????????????????????valid_hooks;
> -???????u32?????????????????????num_entries;
> -???????u32?????????????????????size;
> -???????u32?????????????????????hook_entry[NF_IP_NUMHOOKS];
> -???????u32?????????????????????underflow[NF_IP_NUMHOOKS];
> -???????u32?????????????????????num_counters;
> -???????compat_uptr_t???????????counters;???????/* struct ipt_counters * */
> -???????struct ipt_entry????????entries[0];
> -};

Is the FIXME above the only reason that the code needs to be changed?
What is the reason that you did not just address this in the
compat_sys_setsockopt implementation?

Arnd <><

2006-02-20 21:23:42

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/2] iptables 32bit compat layer

Mishin Dmitry <[email protected]> writes:

> Hello,
>
> This patch set extends current iptables compatibility layer in order to get
> 32bit iptables to work on 64bit kernel. Current layer is insufficient
> due to alignment checks both in kernel and user space tools.
>
> This patch introduces base compatibility interface for other ip_tables modules

Nice. But some issues with the implementation


+#if defined(CONFIG_X86_64)
+#define is_current_32bits() (current_thread_info()->flags & _TIF_IA32)

This should be is_compat_task(). And we don't do such ifdefs
in generic code. And what you actually need here is a
is_compat_task_with_funny_u64_alignment() (better name sought)

So I would suggest you add macros for that to the ia64 and x86-64
asm/compat.hs and perhaps a ARCH_HAS_FUNNY_U64_ALIGNMENT #define in there.

+ ret = 0;
+ switch (convert) {
+ case COMPAT_TO_USER:
+ pt = (struct ipt_entry_target *)target;

etc. that looks ugly. why can't you just define different functions
for that? We don't really need in kernel ioctl

+#ifdef CONFIG_COMPAT
+ down(&compat_ipt_mutex);
+#endif

Why does it need an own lock?

Overall the implementation looks very complicated. Are you sure
it wasn't possible to do this simpler?


-Andi

2006-02-21 09:04:57

by Dmitry Mishin

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 1/2] iptables 32bit compat layer

On Monday 20 February 2006 18:55, Arnd Bergmann wrote:
> On Monday 20 February 2006 09:10, Mishin Dmitry wrote:
> > --- ./include/linux/netfilter/x_tables.h.iptcompat??????2006-02-15
> > 16:16:02.000000000 +0300 +++
> > ./include/linux/netfilter/x_tables.h????????2006-02-15 18:53:09.000000000
> > +0300 struct xt_match
> > ?{
> > ????????struct list_head list;
> > @@ -118,6 +125,10 @@ struct xt_match
> > ????????/* Called when entry of this type deleted. */
> > ????????void (*destroy)(void *matchinfo, unsigned int matchinfosize);
> > ?
> > +#ifdef CONFIG_COMPAT
> > +???????/* Called when userspace align differs from kernel space one */
> > +???????int (*compat)(void *match, void **dstptr, int *size, int
> > convert); +#endif
> > ????????/* Set this to THIS_MODULE if you are a module, otherwise NULL */
> > ????????struct module *me;
> > ?};
>
> Is CONFIG_COMPAT the right conditional here? If the code is only used
> for architectures that have different aligments, it should not need be
> compiled in for the other architectures.
So, I'll define ARCH_HAS_FUNNY_64_ALIGNMENT in x86_64 and ia64 code and will
check it, as Andi suggested.

>
> > @@ -154,6 +165,10 @@ struct xt_target
> > ????????/* Called when entry of this type deleted. */
> > ????????void (*destroy)(void *targinfo, unsigned int targinfosize);
> > ?
> > +#ifdef CONFIG_COMPAT
> > +???????/* Called when userspace align differs from kernel space one */
> > +???????int (*compat)(void *target, void **dstptr, int *size, int
> > convert); +#endif
> > ????????/* Set this to THIS_MODULE if you are a module, otherwise NULL */
> > ????????struct module *me;
> > ?};
> > @@ -233,6 +248,34 @@ extern void xt_proto_fini(int af);
> > ?extern struct xt_table_info *xt_alloc_table_info(unsigned int size);
> > ?extern void xt_free_table_info(struct xt_table_info *info);
> > ?
> > +#ifdef CONFIG_COMPAT
> > +#include <net/compat.h>
> > +
> > +/* FIXME: this works only on 32 bit tasks
> > + * need to change whole approach in order to calculate align as function
> > of + * current task alignment */
> > +
> > +struct compat_xt_counters
> > +{
> > +???????u_int32_t cnt[4];
> > +};
>
> Hmm, maybe we should have something like
>
> typedef u64 __attribute__((aligned(4))) compat_u64;
>
> in order to get the right alignment on the architectures
> where it makes a difference. Do all compiler versions
> get that right?
good point. I don't know this and that's why tried to avoid use of 'aligned'
attribute.

>
> > ---
> > ./include/linux/netfilter_ipv4/ip_tables.h.iptcompat????????2006-02-15
> > 16:06:41.000000000 +0300 +++
> > ./include/linux/netfilter_ipv4/ip_tables.h??2006-02-15 16:37:12.000000000
> > +0300 @@ -364,5 +365,62 @@ extern unsigned int ipt_do_table(struct
> > ???????????????????????????????? void *userdata);
> > ?
> > ?#define IPT_ALIGN(s) XT_ALIGN(s)
> > +
> > +#ifdef CONFIG_COMPAT
> > +#include <net/compat.h>
> > +
> > +struct compat_ipt_getinfo
> > +{
> > +???????char name[IPT_TABLE_MAXNAMELEN];
> > +???????compat_uint_t valid_hooks;
> > +???????compat_uint_t hook_entry[NF_IP_NUMHOOKS];
> > +???????compat_uint_t underflow[NF_IP_NUMHOOKS];
> > +???????compat_uint_t num_entries;
> > +???????compat_uint_t size;
> > +};
>
> This structure looks like it does not need any
> conversions. You should probably just use
> struct ipt_getinfo then.
I just saw compat_uint_t use in net/compat.c and thought, that it is a good
style to use it. Does anybody know arch, where sizeof(compat_uint_t) != 4?

>
> > +
> > +struct compat_ipt_entry_match
> > +{
> > +???????union {
> > +???????????????struct {
> > +???????????????????????u_int16_t match_size;
> > +???????????????????????char name[IPT_FUNCTION_MAXNAMELEN];
> > +???????????????} user;
> > +???????????????u_int16_t match_size;
> > +???????} u;
> > +???????unsigned char data[0];
> > +};
> > +
> > +struct compat_ipt_entry_target
> > +{
> > +???????union {
> > +???????????????struct {
> > +???????????????????????u_int16_t target_size;
> > +???????????????????????char name[IPT_FUNCTION_MAXNAMELEN];
> > +???????????????} user;
> > +???????????????u_int16_t target_size;
> > +???????} u;
> > +???????unsigned char data[0];
> > +};
>
> Dito
Disagree, ipt_entry_match and ipt_entry_target contain pointers which make
their alignment equal 8 byte on 64bits architectures.

>
> > +#define COMPAT_IPT_ALIGN(s) ???COMPAT_XT_ALIGN(s)
> > +
> > +extern int ipt_match_align_compat(void *match, void **dstptr,
> > +???????????????int *size, int off, int convert);
> > +extern int ipt_target_align_compat(void *target, void **dstptr,
> > +???????????????int *size, int off, int convert);
> > +
> > +#endif /* CONFIG_COMPAT */
> > ?#endif /*__KERNEL__*/
> > ?#endif /* _IPTABLES_H */
> > --- ./include/net/compat.h.iptcompat????2006-01-03 06:21:10.000000000
> > +0300 +++ ./include/net/compat.h??????2006-02-15 18:45:49.000000000 +0300
> > @@ -23,6 +23,14 @@ struct compat_cmsghdr {
> > ????????compat_int_t????cmsg_type;
> > ?};
> > ?
> > +#if defined(CONFIG_X86_64)
> > +#define is_current_32bits() (current_thread_info()->flags & _TIF_IA32)
> > +#elif defined(CONFIG_IA64)
> > +#define is_current_32bits() (IS_IA32_PROCESS(ia64_task_regs(current)))
> > +#else
> > +#define is_current_32bits()????0
> > +#endif
> > +
>
> This definition looks very wrong to me. For x86_64, the right thing to
> check should be TS_COMPAT, no _TIF_IA32, since you can also call the 64 bit
> syscall entry point from a i386 task running on x86_64. For most other
> architectures, is_current_32bits returns something that is not reflected in
> the name. I would e.g. expect the function to return '1' on i386 and the
> correct task state on other compat platforms, instead of a bogus '0'.
>
> There have been long discussions about the inclusions of the
> 'is_compat_task' macro. Let's at least not define a second function that
> does almost the same but gets it wrong.
>
> I would much rather have either an extra 'compat' argument to to
> sock_setsockopt and proto_ops->setsockopt than to spread the use
> of is_compat_task further.
Another weak place in my code. is_compat_task() approach has one advantage -
it doesn't require a lot of current code modifications.
>
> > ?#else /* defined(CONFIG_COMPAT) */
> > ?#define compat_msghdr??msghdr??????????/* to avoid compiler warnings */
> > ?#endif /* defined(CONFIG_COMPAT) */
> > --- ./net/compat.c.iptcompat????2006-01-03 06:21:10.000000000 +0300
> > +++ ./net/compat.c??????2006-02-15 16:38:45.000000000 +0300
> > @@ -308,107 +308,6 @@ void scm_detach_fds_compat(struct msghdr
> > ?}
> > ?
> > ?/*
> > - * For now, we assume that the compatibility and native version
> > - * of struct ipt_entry are the same - sfr. ?FIXME
> > - */
> > -struct compat_ipt_replace {
> > -???????char????????????????????name[IPT_TABLE_MAXNAMELEN];
> > -???????u32?????????????????????valid_hooks;
> > -???????u32?????????????????????num_entries;
> > -???????u32?????????????????????size;
> > -???????u32?????????????????????hook_entry[NF_IP_NUMHOOKS];
> > -???????u32?????????????????????underflow[NF_IP_NUMHOOKS];
> > -???????u32?????????????????????num_counters;
> > -???????compat_uptr_t???????????counters;???????/* struct ipt_counters *
> > */ -???????struct ipt_entry????????entries[0];
> > -};
>
> Is the FIXME above the only reason that the code needs to be changed?
> What is the reason that you did not just address this in the
> compat_sys_setsockopt implementation?
Code above doesn't work. iptables with version >= 1.3 does alignment checks as
well as kernel does. So, we can't simply put entries with 8 bytes alignment
to userspace or with 4 bytes alignment to kernel - we need translate them
entry by entry. So, I tried to do this the most correct way - that userspace
will hide its alignment from kernel and vice versa, with not only
SET_REPLACE, but also GET_INFO, GET_ENTRIES and SET_COUNTERS translation.
First implementation was exactly in compat_sys_setsockopt, but David asked me
to do this in netfilter code itself.

>
> Arnd <><
>
> _______________________________________________
> Devel mailing list
> [email protected]
> https://openvz.org/mailman/listinfo/devel

--
Thanks,
Dmitry.

2006-02-21 09:24:35

by Dmitry Mishin

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 1/2] iptables 32bit compat layer

On Tuesday 21 February 2006 00:23, Andi Kleen wrote:
> Mishin Dmitry <[email protected]> writes:
> > Hello,
> >
> > This patch set extends current iptables compatibility layer in order to
> > get 32bit iptables to work on 64bit kernel. Current layer is insufficient
> > due to alignment checks both in kernel and user space tools.
> >
> > This patch introduces base compatibility interface for other ip_tables
> > modules
>
> Nice. But some issues with the implementation
>
>
> +#if defined(CONFIG_X86_64)
> +#define is_current_32bits() (current_thread_info()->flags & _TIF_IA32)
>
> This should be is_compat_task(). And we don't do such ifdefs
> in generic code. And what you actually need here is a
> is_compat_task_with_funny_u64_alignment() (better name sought)
>
> So I would suggest you add macros for that to the ia64 and x86-64
> asm/compat.hs and perhaps a ARCH_HAS_FUNNY_U64_ALIGNMENT #define in there.
agree.

>
> + ret = 0;
> + switch (convert) {
> + case COMPAT_TO_USER:
> + pt = (struct ipt_entry_target *)target;
>
> etc. that looks ugly. why can't you just define different functions
> for that? We don't really need in kernel ioctl
3 functions and the requirement that if defined one, than defined all of them?

>
> +#ifdef CONFIG_COMPAT
> + down(&compat_ipt_mutex);
> +#endif
>
> Why does it need an own lock?
Because it protects only compatibility translation. We spend a lot of time in
these cycles and I don't think that it is a good way to hold ipt_mutex for
this. The only reason of this lock is offset list - in the first iteration I
fill it, in the second - use it. If you know how to implement this better,
let me know.

>
> Overall the implementation looks very complicated. Are you sure
> it wasn't possible to do this simpler?
ughh...
I don't like this code as well. But seems that it is due to iptables code
itself, which was designed with no thoughts about compatibility in minds.

So, I see following approaches:
1) do translation before pass data to original do_replace or get_entries.
Disadvantage of such approach is additional 2 cycles through data.
2) do translation in compat_do_replace and compat_get_entries. Avoidance of
additional cycles, but some code duplication.
3) remove alignment checks in kernel - than we need only first time
translation from kernel to user. But such code will not work with both 32bit
and 64 bit iptables at the same time.

Any suggestions?

>
>
> -Andi
>
> _______________________________________________
> Devel mailing list
> [email protected]
> https://openvz.org/mailman/listinfo/devel

--
Thanks,
Dmitry.

2006-02-21 11:56:25

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [Devel] Re: [PATCH 1/2] iptables 32bit compat layer

On Tuesday 21 February 2006 10:04, Dmitry Mishin wrote:
> On Monday 20 February 2006 18:55, Arnd Bergmann wrote:

> > Is CONFIG_COMPAT the right conditional here? If the code is only used
> > for architectures that have different aligments, it should not need be
> > compiled in for the other architectures.
> So, I'll define ARCH_HAS_FUNNY_64_ALIGNMENT in x86_64 and ia64 code and will
> check it, as Andi suggested.
>

I think nowadays, unconditionally setting CONFIG_FUNNY_64_ALIGNMENT from
arch/{ia64,x86_64}/Kconfig would be the preferred way to a #define in
include/asm.

> > > ?#define IPT_ALIGN(s) XT_ALIGN(s)
> > > +
> > > +#ifdef CONFIG_COMPAT
> > > +#include <net/compat.h>
> > > +
> > > +struct compat_ipt_getinfo
> > > +{
> > > +???????char name[IPT_TABLE_MAXNAMELEN];
> > > +???????compat_uint_t valid_hooks;
> > > +???????compat_uint_t hook_entry[NF_IP_NUMHOOKS];
> > > +???????compat_uint_t underflow[NF_IP_NUMHOOKS];
> > > +???????compat_uint_t num_entries;
> > > +???????compat_uint_t size;
> > > +};
> >
> > This structure looks like it does not need any
> > conversions. You should probably just use
> > struct ipt_getinfo then.
> I just saw compat_uint_t use in net/compat.c and thought, that it is a good
> style to use it. Does anybody know arch, where sizeof(compat_uint_t) != 4?

No, the compat layer already heavily depends on the fact that compat_uint_t
is always the same as unsigned int.

> >
> > Dito
> Disagree, ipt_entry_match and ipt_entry_target contain pointers which make
> their alignment equal 8 byte on 64bits architectures.

Ah, I see.

> > I would much rather have either an extra 'compat' argument to to
> > sock_setsockopt and proto_ops->setsockopt than to spread the use
> > of is_compat_task further.
> Another weak place in my code. is_compat_task() approach has one advantage -
> it doesn't require a lot of current code modifications.
> >
> > Is the FIXME above the only reason that the code needs to be changed?
> > What is the reason that you did not just address this in the
> > compat_sys_setsockopt implementation?
> Code above doesn't work. iptables with version >= 1.3 does alignment checks as
> well as kernel does. So, we can't simply put entries with 8 bytes alignment
> to userspace or with 4 bytes alignment to kernel - we need translate them
> entry by entry. So, I tried to do this the most correct way - that userspace
> will hide its alignment from kernel and vice versa, with not only
> SET_REPLACE, but also GET_INFO, GET_ENTRIES and SET_COUNTERS translation.
> First implementation was exactly in compat_sys_setsockopt, but David asked me
> to do this in netfilter code itself.

Ok, I see the point there. It's probably best to push down all the conversions
from compat_sys_setsockopt down to the protocol specific parts, similar to what
we do for the ioctl handlers.

I'm thinking of something like

int compat_sock_setsockopt(struct socket *sock, int level, int optname,
char __user *optval, int optlen)
{
switch (optname) {
case SO_ATTACH_FILTER:
return do_set_attach_filter(fd, level, optname,
optval, optlen);
case SO_SNDTIMEO:
return do_set_sock_timeout(fd, level, optname,
optval, optlen);
default:
break;
}
return sock_setsockopt(sock, level, optname, optval, optlen);
}

asmlinkage long compat_sys_setsockopt(int fd, int level, int optname,
char __user *optval, int optlen)
{
int err;
struct socket *sock;

if (optlen < 0)
return -EINVAL;

if ((sock = sockfd_lookup(fd, &err))!=NULL)
{
err = security_socket_setsockopt(sock,level,optname);
if (err) {
sockfd_put(sock);
return err;
}

if (level == SOL_SOCKET)
err = compat_sock_setsockopt(sock, level,
optname, optval, optlen);
else if (sock->ops->compat_setsockopt)
err = sock->ops->compat_setsockopt(sock, level,
optname, optval, optlen);
else
err = sock->ops->setsockopt(sock, level,
optname, optval, optlen);
sockfd_put(sock);
}
return err;
}

int tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen)
{
int err = 0;

err = ip_setsockopt(sk, level, optname, optval, optlen);

#ifdef CONFIG_NETFILTER
if (err = -ENOPROTOOPT) {
lock_sock(sk);
err = nf_setsockopt(sk, PF_INET, optname, optval, optlen);
release_sock(sk);
}
#endif
return err;
}

int compat_tcp_setsockopt(struct sock *sk, int level, int optname, char __user *optval, int optlen)
{
int err = 0;

err = ip_setsockopt(sk, level, optname, optval, optlen);

#ifdef CONFIG_NETFILTER
if (err = -ENOPROTOOPT) {
lock_sock(sk);
err = compat_nf_setsockopt(sk, PF_INET, optname, optval, optlen);
release_sock(sk);
}
#endif
return err;
}

And the same for udp, raw, ipv6, decnet and each of those with getsockopt.
It is a bigger change, but it puts all the handlers where they belong
and it is more extensible to other sockopt handlers if we find more
fsckup in some of them.

Arnd <><

2006-03-07 14:07:32

by Dmitry Mishin

[permalink] [raw]
Subject: {get|set}sockopt compat layer

Hello, Arnd!

Sorry for such delay, was on vacancy. Here is a patch, introducing
compat_(get|set)sockopt handlers, as you proposed.

On Tuesday 21 February 2006 14:56, Arnd Bergmann wrote:
> On Tuesday 21 February 2006 10:04, Dmitry Mishin wrote:
> > On Monday 20 February 2006 18:55, Arnd Bergmann wrote:
> > > Is CONFIG_COMPAT the right conditional here? If the code is only used
> > > for architectures that have different aligments, it should not need be
> > > compiled in for the other architectures.
> >
> > So, I'll define ARCH_HAS_FUNNY_64_ALIGNMENT in x86_64 and ia64 code and
> > will check it, as Andi suggested.
>
> I think nowadays, unconditionally setting CONFIG_FUNNY_64_ALIGNMENT from
> arch/{ia64,x86_64}/Kconfig would be the preferred way to a #define in
> include/asm.
>
> > > > ?#define IPT_ALIGN(s) XT_ALIGN(s)
> > > > +
> > > > +#ifdef CONFIG_COMPAT
> > > > +#include <net/compat.h>
> > > > +
> > > > +struct compat_ipt_getinfo
> > > > +{
> > > > +???????char name[IPT_TABLE_MAXNAMELEN];
> > > > +???????compat_uint_t valid_hooks;
> > > > +???????compat_uint_t hook_entry[NF_IP_NUMHOOKS];
> > > > +???????compat_uint_t underflow[NF_IP_NUMHOOKS];
> > > > +???????compat_uint_t num_entries;
> > > > +???????compat_uint_t size;
> > > > +};
> > >
> > > This structure looks like it does not need any
> > > conversions. You should probably just use
> > > struct ipt_getinfo then.
> >
> > I just saw compat_uint_t use in net/compat.c and thought, that it is a
> > good style to use it. Does anybody know arch, where sizeof(compat_uint_t)
> > != 4?
>
> No, the compat layer already heavily depends on the fact that compat_uint_t
> is always the same as unsigned int.
>
> > > Dito
> >
> > Disagree, ipt_entry_match and ipt_entry_target contain pointers which
> > make their alignment equal 8 byte on 64bits architectures.
>
> Ah, I see.
>
> > > I would much rather have either an extra 'compat' argument to to
> > > sock_setsockopt and proto_ops->setsockopt than to spread the use
> > > of is_compat_task further.
> >
> > Another weak place in my code. is_compat_task() approach has one
> > advantage - it doesn't require a lot of current code modifications.
> >
> > > Is the FIXME above the only reason that the code needs to be changed?
> > > What is the reason that you did not just address this in the
> > > compat_sys_setsockopt implementation?
> >
> > Code above doesn't work. iptables with version >= 1.3 does alignment
> > checks as well as kernel does. So, we can't simply put entries with 8
> > bytes alignment to userspace or with 4 bytes alignment to kernel - we
> > need translate them entry by entry. So, I tried to do this the most
> > correct way - that userspace will hide its alignment from kernel and vice
> > versa, with not only SET_REPLACE, but also GET_INFO, GET_ENTRIES and
> > SET_COUNTERS translation. First implementation was exactly in
> > compat_sys_setsockopt, but David asked me to do this in netfilter code
> > itself.
>
> Ok, I see the point there. It's probably best to push down all the
> conversions from compat_sys_setsockopt down to the protocol specific parts,
> similar to what we do for the ioctl handlers.
>
> I'm thinking of something like
>
> int compat_sock_setsockopt(struct socket *sock, int level, int optname,
> char __user *optval, int optlen)
> {
> switch (optname) {
> case SO_ATTACH_FILTER:
> return do_set_attach_filter(fd, level, optname,
> optval, optlen);
> case SO_SNDTIMEO:
> return do_set_sock_timeout(fd, level, optname,
> optval, optlen);
> default:
> break;
> }
> return sock_setsockopt(sock, level, optname, optval, optlen);
> }
>
> asmlinkage long compat_sys_setsockopt(int fd, int level, int optname,
> char __user *optval, int optlen)
> {
> int err;
> struct socket *sock;
>
> if (optlen < 0)
> return -EINVAL;
>
> if ((sock = sockfd_lookup(fd, &err))!=NULL)
> {
> err = security_socket_setsockopt(sock,level,optname);
> if (err) {
> sockfd_put(sock);
> return err;
> }
>
> if (level == SOL_SOCKET)
> err = compat_sock_setsockopt(sock, level,
> optname, optval, optlen);
> else if (sock->ops->compat_setsockopt)
> err = sock->ops->compat_setsockopt(sock, level,
> optname, optval, optlen);
> else
> err = sock->ops->setsockopt(sock, level,
> optname, optval, optlen);
> sockfd_put(sock);
> }
> return err;
> }
>
> int tcp_setsockopt(struct sock *sk, int level, int optname, char __user
> *optval, int optlen) {
> int err = 0;
>
> err = ip_setsockopt(sk, level, optname, optval, optlen);
>
> #ifdef CONFIG_NETFILTER
> if (err = -ENOPROTOOPT) {
> lock_sock(sk);
> err = nf_setsockopt(sk, PF_INET, optname, optval, optlen);
> release_sock(sk);
> }
> #endif
> return err;
> }
>
> int compat_tcp_setsockopt(struct sock *sk, int level, int optname, char
> __user *optval, int optlen) {
> int err = 0;
>
> err = ip_setsockopt(sk, level, optname, optval, optlen);
>
> #ifdef CONFIG_NETFILTER
> if (err = -ENOPROTOOPT) {
> lock_sock(sk);
> err = compat_nf_setsockopt(sk, PF_INET, optname, optval, optlen);
> release_sock(sk);
> }
> #endif
> return err;
> }
>
> And the same for udp, raw, ipv6, decnet and each of those with getsockopt.
> It is a bigger change, but it puts all the handlers where they belong
> and it is more extensible to other sockopt handlers if we find more
> fsckup in some of them.
>
> Arnd <><

--
Thanks,
Dmitry.


Attachments:
(No filename) (5.30 kB)
diff-ms-sockopts-compat-20060307 (40.40 kB)
Download all attachments

2006-03-07 15:06:15

by Arnd Bergmann

[permalink] [raw]
Subject: Re: {get|set}sockopt compat layer

On Tuesday 07 March 2006 15:07, Dmitry Mishin wrote:
> Sorry for such delay, was on vacancy. Here is a patch, introducing
> compat_(get|set)sockopt handlers, as you proposed.

Looks pretty good to me, just a few nits I like to pick:

> --- ./include/linux/net.h.compat????????2006-03-07 11:22:27.000000000 +0300
> +++ ./include/linux/net.h???????2006-03-07 11:20:07.000000000 +0300
> @@ -149,6 +149,12 @@ struct proto_ops {
> ???????????????????????????????? ? ? ?int optname, char __user *optval, int optlen);
> ????????int?????????????(*getsockopt)(struct socket *sock, int level,
> ???????????????????????????????? ? ? ?int optname, char __user *optval, int __user *optlen);
> +#ifdef CONFIG_COMPAT
> +???????int?????????????(*compat_setsockopt)(struct socket *sock, int level,
> +??????????????????????????????? ? ? ?int optname, char __user *optval, int optlen);
> +???????int?????????????(*compat_getsockopt)(struct socket *sock, int level,
> +??????????????????????????????? ? ? ?int optname, char __user *optval, int __user *optlen);
> +#endif
> ????????int?????????????(*sendmsg) ? (struct kiocb *iocb, struct socket *sock,
> ???????????????????????????????? ? ? ?struct msghdr *m, size_t total_len);
> ????????int?????????????(*recvmsg) ? (struct kiocb *iocb, struct socket *sock,

For the compat_ioctl stuff, we don't have the function pointer inside an
#ifdef, the overhead is relatively small since there is only one of these
structures per module implementing a protocol, but it avoids having to
rebuild everything when changing CONFIG_COMPAT.

It's probably not a big issue either way, maybe davem has a stronger opinion
on it either way.

> --- ./include/linux/netfilter.h.compat??2006-03-06 12:06:34.000000000 +0300
> +++ ./include/linux/netfilter.h?2006-03-07 15:00:14.000000000 +0300
> @@ -2,6 +2,7 @@
> ?#define __LINUX_NETFILTER_H
> ?
> ?#ifdef __KERNEL__
> +#include <linux/config.h>
> ?#include <linux/init.h>
> ?#include <linux/types.h>
> ?#include <linux/skbuff.h>

You don't need to add new <linux/config.h> includes any more, these are
automatic now.

> @@ -80,10 +81,18 @@ struct nf_sockopt_ops
> ????????int set_optmin;
> ????????int set_optmax;
> ????????int (*set)(struct sock *sk, int optval, void __user *user, unsigned int len);
> +#ifdef CONFIG_COMPAT
> +???????int (*compat_set)(struct sock *sk, int optval,
> +???????????????????????void __user *user, unsigned int len);
> +#endif
> ?
> ????????int get_optmin;
> ????????int get_optmax;
> ????????int (*get)(struct sock *sk, int optval, void __user *user, int *len);
> +#ifdef CONFIG_COMPAT
> +???????int (*compat_get)(struct sock *sk, int optval,
> +???????????????????????void __user *user, int *len);
> +#endif
> ?
> ????????/* Number of users inside set() or get(). */
> ????????unsigned int use;

see above, same for some more of these.

> @@ -816,6 +826,12 @@ extern int sock_common_recvmsg(struct ki
> ???????????????????????? ? ? ? struct msghdr *msg, size_t size, int flags);
> ?extern int sock_common_setsockopt(struct socket *sock, int level, int optname,
> ???????????????????????????????? ?char __user *optval, int optlen);
> +#ifdef CONFIG_COMPAT
> +extern int compat_sock_common_getsockopt(struct socket *sock, int level,
> +???????????????int optname, char __user *optval, int __user *optlen);
> +extern int compat_sock_common_setsockopt(struct socket *sock, int level,
> +???????????????int optname, char __user *optval, int optlen);
> +#endif
> ?
> ?extern void sk_common_release(struct sock *sk);
> ?

Declarations don't belong inside #ifdef.

Arnd <><

2006-03-09 10:24:05

by Dmitry Mishin

[permalink] [raw]
Subject: Re: {get|set}sockopt compat layer

Hello, Arnd!

> For the compat_ioctl stuff, we don't have the function pointer inside an
> #ifdef, the overhead is relatively small since there is only one of these
> structures per module implementing a protocol, but it avoids having to
> rebuild everything when changing CONFIG_COMPAT.
>
> It's probably not a big issue either way, maybe davem has a stronger
> opinion on it either way.
>
Done.

--
Thanks,
Dmitry.


Attachments:
(No filename) (418.00 B)
diff-ms-sockopts-compat-20060309 (39.48 kB)
Download all attachments

2006-03-09 23:29:43

by David Miller

[permalink] [raw]
Subject: Re: {get|set}sockopt compat layer

From: Dmitry Mishin <[email protected]>
Date: Thu, 9 Mar 2006 13:23:59 +0300

> Hello, Arnd!
>
> > For the compat_ioctl stuff, we don't have the function pointer inside an
> > #ifdef, the overhead is relatively small since there is only one of these
> > structures per module implementing a protocol, but it avoids having to
> > rebuild everything when changing CONFIG_COMPAT.
> >
> > It's probably not a big issue either way, maybe davem has a stronger
> > opinion on it either way.
> >
> Done.

I think this looks fine but it doesn't apply cleanly to the
current net-2.6.17 tree.

Could you cook up a fresh patch, and send it with a complete
changelog entry and appropriate Signed-off-by: lines?

Thanks a lot Dmitry.

2006-03-10 11:21:20

by Dmitry Mishin

[permalink] [raw]
Subject: [PATCH] {get|set}sockopt compatibility layer

This patch extends {get|set}sockopt compatibility layer in order to move
protocol specific parts to their place and avoid huge universal net/compat.c
file in the future.

Signed-off-by: Dmitry Mishin <[email protected]>

--
Thanks,
Dmitry.


Attachments:
(No filename) (241.00 B)
diff-ms-sockopts-compat-20060310 (54.93 kB)
Download all attachments

2006-03-10 11:34:57

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] {get|set}sockopt compatibility layer

From: Dmitry Mishin <[email protected]>
Date: Fri, 10 Mar 2006 14:21:10 +0300

> This patch extends {get|set}sockopt compatibility layer in order to move
> protocol specific parts to their place and avoid huge universal net/compat.c
> file in the future.
>
> Signed-off-by: Dmitry Mishin <[email protected]>

Applied, thanks Dmitry.

Please give "-p1" format patches in the future, I fixed your's
up by hand so could feed it to git.

Thanks again.