> Detected by ubsan:
>
> UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1345:3
> shift exponent 64 is too large for 32-bit type 'int'
> CPU: 2 PID: 1167 Comm: perf_pmu Not tainted 4.14.0-rc5+ #532
> Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
> Call Trace:
> <IRQ>
> dump_stack+0xab/0xfe
> ? _atomic_dec_and_lock+0x112/0x112
> ? get_unsigned_val+0x48/0x91
> ubsan_epilogue+0x9/0x49
> __ubsan_handle_shift_out_of_bounds+0x1ea/0x241
> ? __ubsan_handle_load_invalid_value+0x136/0x136
> ? _raw_spin_unlock_irqrestore+0x32/0x50
> ? qi_submit_sync+0x642/0x820
> ? qi_flush_dev_iotlb+0x158/0x1b0
> qi_flush_dev_iotlb+0x158/0x1b0
> ? qi_flush_iotlb+0x110/0x110
> ? do_raw_spin_lock+0x93/0x130
> iommu_flush_dev_iotlb+0xff/0x170
> iommu_flush_iova+0x168/0x220
> iova_domain_flush+0x2b/0x50
> fq_flush_timeout+0xa6/0x1e0
> ? fq_ring_free+0x260/0x260
> ? call_timer_fn+0xfd/0x600
> call_timer_fn+0x160/0x600
> ? fq_ring_free+0x260/0x260
> ? trace_timer_cancel+0x1d0/0x1d0
> ? _raw_spin_unlock_irq+0x29/0x40
> ? fq_ring_free+0x260/0x260
> ? fq_ring_free+0x260/0x260
> run_timer_softirq+0x3bb/0x9a0
> ? timer_fixup_init+0x30/0x30
> ? __lock_is_held+0x35/0x150
> ? sched_clock_cpu+0x14/0x180
> __do_softirq+0x159/0x9c7
> irq_exit+0x118/0x170
> smp_apic_timer_interrupt+0xda/0x530
> apic_timer_interrupt+0x9d/0xb0
> </IRQ>
> RIP: 0010:__asan_load4+0xa/0x80
> RSP: 0018:ffff88012f647380 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> RAX: ffff7fffffffffff RBX: ffff88012f647548 RCX: ffffffff9a07ea01
> RDX: dffffc0000000000 RSI: ffff88012f647808 RDI: ffff88012f647548
> RBP: ffff88012f640000 R08: fffffbfff3b27e96 R09: fffffbfff3b27e95
> R10: ffffffff9d93f4ad R11: fffffbfff3b27e96 R12: ffff88012f648000
> R13: ffff88012f647558 R14: ffff88012f647550 R15: ffff88012f647808
> ? stack_access_ok+0x61/0x110
> stack_access_ok+0x6d/0x110
> deref_stack_reg+0x64/0x120
> ? __read_once_size_nocheck.constprop.3+0x50/0x50
> ? deref_stack_reg+0x120/0x120
> ? __kmalloc+0x13a/0x550
> unwind_next_frame+0xc04/0x14e0
> ? kasan_kmalloc+0xa0/0xd0
> ? __kmalloc+0x13a/0x550
> ? __kmalloc+0x13a/0x550
> ? deref_stack_reg+0x120/0x120
> ? unwind_next_frame+0xf3e/0x14e0
> ? i915_gem_do_execbuffer+0x4fd/0x2570 [i915]
> __save_stack_trace+0x7a/0x120
> ? __kmalloc+0x13a/0x550
> save_stack+0x33/0xa0
> ? save_stack+0x33/0xa0
> ? kasan_kmalloc+0xa0/0xd0
> ? _raw_spin_unlock+0x24/0x30
> ? deactivate_slab+0x650/0xb90
> ? ___slab_alloc+0x3e0/0x940
> ? ___slab_alloc+0x3e0/0x940
> ? i915_gem_do_execbuffer+0x4fd/0x2570 [i915]
> ? __lock_is_held+0x35/0x150
> ? mark_held_locks+0x33/0x130
> ? kasan_unpoison_shadow+0x30/0x40
> kasan_kmalloc+0xa0/0xd0
> __kmalloc+0x13a/0x550
> ? depot_save_stack+0x16a/0x7f0
> i915_gem_do_execbuffer+0x4fd/0x2570 [i915]
> ? save_stack+0x92/0xa0
> ? eb_relocate_slow+0x890/0x890 [i915]
> ? debug_check_no_locks_freed+0x200/0x200
> ? ___slab_alloc+0x3e0/0x940
> ? ___slab_alloc+0x3e0/0x940
> ? i915_gem_execbuffer2+0xdb/0x5f0 [i915]
> ? __lock_is_held+0x35/0x150
> ? i915_gem_execbuffer+0x580/0x580 [i915]
> i915_gem_execbuffer2+0x2c1/0x5f0 [i915]
> ? i915_gem_execbuffer+0x580/0x580 [i915]
> ? lock_downgrade+0x310/0x310
> ? i915_gem_execbuffer+0x580/0x580 [i915]
> drm_ioctl_kernel+0xdc/0x190
> drm_ioctl+0x46a/0x6e0
> ? i915_gem_execbuffer+0x580/0x580 [i915]
> ? drm_setversion+0x430/0x430
> ? lock_downgrade+0x310/0x310
> do_vfs_ioctl+0x138/0xbf0
> ? ioctl_preallocate+0x180/0x180
> ? do_raw_spin_unlock+0xf5/0x1d0
> ? do_raw_spin_trylock+0x90/0x90
> ? task_work_run+0x35/0x120
> ? mark_held_locks+0x33/0x130
> ? _raw_spin_unlock_irq+0x29/0x40
> ? mark_held_locks+0x33/0x130
> ? entry_SYSCALL_64_fastpath+0x5/0xad
> ? trace_hardirqs_on_caller+0x184/0x360
> SyS_ioctl+0x3b/0x70
> entry_SYSCALL_64_fastpath+0x18/0xad
> RIP: 0033:0x7fc0e1f0d587
> RSP: 002b:00007ffcfd9929f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007fc0e1f0d587
> RDX: 00007ffcfd992a90 RSI: 0000000040406469 RDI: 0000000000000003
> RBP: 0000000000000003 R08: 00007ffcfd992a50 R09: 0000000000000000
> R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000001
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>
> Code is:
>
> void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
> u64 addr, unsigned mask)
> {
> struct qi_desc desc;
>
> if (mask) {
> BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
>
> ^^^ This last line is where the warning comes. Digging deeper
> VTD_PAGE_SHIFT is 12 and the passed in mask comes from iommu_flush_iova
> (via iommu_flush_dev_iotlb) as MAX_AGAW_PFN_WIDTH which is:
>
> #define MAX_AGAW_WIDTH 64
> #define MAX_AGAW_PFN_WIDTH (MAX_AGAW_WIDTH - VTD_PAGE_SHIFT)
>
> So mask is 64, which overflows the left shift in the BUG_ON.
The resulting shift is 64, mask itself is (64 - 12). I assume the code
is expecting the overflow to result in zero, so we assert that addr is
64bit aligned, ie. zero.
Reported-by: Tvrtko Ursulin <[email protected]>
Link: https://lkml.org/lkml/2017/10/20/752
Signed-off-by: Alex Williamson <[email protected]>
---
drivers/iommu/dmar.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 9a7ffd13c7f0..87888b102057 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1345,7 +1345,9 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
struct qi_desc desc;
if (mask) {
- BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
+ BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
+ ((mask == MAX_AGAW_PFN_WIDTH) && addr) ||
+ (addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1)));
addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
} else
On Tue, Dec 12, 2017 at 03:43:08PM -0700, Alex Williamson wrote:
[...]
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 9a7ffd13c7f0..87888b102057 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1345,7 +1345,9 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
> struct qi_desc desc;
>
> if (mask) {
> - BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
> + BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
> + ((mask == MAX_AGAW_PFN_WIDTH) && addr) ||
> + (addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1)));
Could it work if we just use 1ULL instead of 1 here? Thanks,
--
Peter Xu
On Wed, 13 Dec 2017 15:13:55 +0800
Peter Xu <[email protected]> wrote:
> On Tue, Dec 12, 2017 at 03:43:08PM -0700, Alex Williamson wrote:
>
> [...]
>
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 9a7ffd13c7f0..87888b102057 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1345,7 +1345,9 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
> > struct qi_desc desc;
> >
> > if (mask) {
> > - BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
> > + BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
> > + ((mask == MAX_AGAW_PFN_WIDTH) && addr) ||
> > + (addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1)));
>
> Could it work if we just use 1ULL instead of 1 here? Thanks,
In either case we're talking about shifting off the end of the
variable, which I understand to be undefined. Right? Thanks,
Alex
On 12/13/2017 9:58 AM, Alex Williamson wrote:
> On Wed, 13 Dec 2017 15:13:55 +0800
> Peter Xu <[email protected]> wrote:
>
>> On Tue, Dec 12, 2017 at 03:43:08PM -0700, Alex Williamson wrote:
>>
>> [...]
>>
>>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>>> index 9a7ffd13c7f0..87888b102057 100644
>>> --- a/drivers/iommu/dmar.c
>>> +++ b/drivers/iommu/dmar.c
>>> @@ -1345,7 +1345,9 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
>>> struct qi_desc desc;
>>>
>>> if (mask) {
>>> - BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
>>> + BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
>>> + ((mask == MAX_AGAW_PFN_WIDTH) && addr) ||
>>> + (addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1)));
>>
>> Could it work if we just use 1ULL instead of 1 here? Thanks,
>
> In either case we're talking about shifting off the end of the
> variable, which I understand to be undefined. Right? Thanks,
How so? Bits fall off the left (MSB) end, zeroes fill in the right (LSB)
end. I believe that behavior is pretty set.
The only problem here is that the promotion of integral types is (at the
very least) unclear in this statement. As for the proposal, do we know
that 1ULL is always going to be the same size as addr?
I would suggest, as pedantic as it sounds, creating a local u64
variable, initialized to 1, and using that in the BUG_ON:
u64 ubit = 1;
...
if (mask) {
BUG_ON(addr & ((ubit << (VTD_PAGE_SHIFT + mask)) - 1));
I believe this forces everything to be appropriately wide, and the same
size?
On Wed, 13 Dec 2017 10:41:47 -0600
"Hook, Gary" <[email protected]> wrote:
> On 12/13/2017 9:58 AM, Alex Williamson wrote:
> > On Wed, 13 Dec 2017 15:13:55 +0800
> > Peter Xu <[email protected]> wrote:
> >
> >> On Tue, Dec 12, 2017 at 03:43:08PM -0700, Alex Williamson wrote:
> >>
> >> [...]
> >>
> >>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> >>> index 9a7ffd13c7f0..87888b102057 100644
> >>> --- a/drivers/iommu/dmar.c
> >>> +++ b/drivers/iommu/dmar.c
> >>> @@ -1345,7 +1345,9 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
> >>> struct qi_desc desc;
> >>>
> >>> if (mask) {
> >>> - BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
> >>> + BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
> >>> + ((mask == MAX_AGAW_PFN_WIDTH) && addr) ||
> >>> + (addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1)));
> >>
> >> Could it work if we just use 1ULL instead of 1 here? Thanks,
> >
> > In either case we're talking about shifting off the end of the
> > variable, which I understand to be undefined. Right? Thanks,
>
> How so? Bits fall off the left (MSB) end, zeroes fill in the right (LSB)
> end. I believe that behavior is pretty set.
Maybe I'm relying too much on stackoverflow, but:
https://stackoverflow.com/questions/11270492/what-does-the-c-standard-say-about-bitshifting-more-bits-than-the-width-of-type
Thanks,
Alex
On 12/13/2017 11:15 AM, Alex Williamson wrote:
> On Wed, 13 Dec 2017 10:41:47 -0600
> "Hook, Gary" <[email protected]> wrote:
>
>> On 12/13/2017 9:58 AM, Alex Williamson wrote:
>>> On Wed, 13 Dec 2017 15:13:55 +0800
>>> Peter Xu <[email protected]> wrote:
>>>
>>>> On Tue, Dec 12, 2017 at 03:43:08PM -0700, Alex Williamson wrote:
>>>>
>>>> [...]
>>>>
>>>>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>>>>> index 9a7ffd13c7f0..87888b102057 100644
>>>>> --- a/drivers/iommu/dmar.c
>>>>> +++ b/drivers/iommu/dmar.c
>>>>> @@ -1345,7 +1345,9 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
>>>>> struct qi_desc desc;
>>>>>
>>>>> if (mask) {
>>>>> - BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
>>>>> + BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
>>>>> + ((mask == MAX_AGAW_PFN_WIDTH) && addr) ||
>>>>> + (addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1)));
>>>>
>>>> Could it work if we just use 1ULL instead of 1 here? Thanks,
>>>
>>> In either case we're talking about shifting off the end of the
>>> variable, which I understand to be undefined. Right? Thanks,
>>
>> How so? Bits fall off the left (MSB) end, zeroes fill in the right (LSB)
>> end. I believe that behavior is pretty set.
>
> Maybe I'm relying too much on stackoverflow, but:
>
> https://stackoverflow.com/questions/11270492/what-does-the-c-standard-say-about-bitshifting-more-bits-than-the-width-of-type
No, probably not. I don't have my copy of c99 handy, so can't check it.
But it is beyond me why any compiler implementation would choose to use
a rotate instead of a shift... probably a performance issue.
So, yeah, when you have silly parameters, you get what you get.
I'll stick to my suggestion. Which seems unambiguous... but I could be
wrong.
Hi Alex,
I love your patch! Yet something to improve:
[auto build test ERROR on v4.15-rc3]
[also build test ERROR on next-20171214]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Alex-Williamson/iommu-vt-d-Fix-shift-overflow-in-qi_flush_dev_iotlb/20171215-094227
config: x86_64-randconfig-x000-201750 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All error/warnings (new ones prefixed by >>):
In file included from include/linux/string.h:6:0,
from include/uapi/linux/uuid.h:22,
from include/linux/uuid.h:19,
from include/linux/mod_devicetable.h:13,
from include/linux/pci.h:21,
from drivers//iommu/dmar.c:31:
drivers//iommu/dmar.c: In function 'qi_flush_dev_iotlb':
>> drivers//iommu/dmar.c:1348:18: error: 'MAX_AGAW_PFN_WIDTH' undeclared (first use in this function)
BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
^
include/linux/compiler.h:77:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
^
>> drivers//iommu/dmar.c:1348:3: note: in expansion of macro 'BUG_ON'
BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
^~~~~~
drivers//iommu/dmar.c:1348:18: note: each undeclared identifier is reported only once for each function it appears in
BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
^
include/linux/compiler.h:77:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
^
>> drivers//iommu/dmar.c:1348:3: note: in expansion of macro 'BUG_ON'
BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
^~~~~~
vim +/MAX_AGAW_PFN_WIDTH +1348 drivers//iommu/dmar.c
1341
1342 void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
1343 u64 addr, unsigned mask)
1344 {
1345 struct qi_desc desc;
1346
1347 if (mask) {
> 1348 BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
1349 ((mask == MAX_AGAW_PFN_WIDTH) && addr) ||
1350 (addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1)));
1351 addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
1352 desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
1353 } else
1354 desc.high = QI_DEV_IOTLB_ADDR(addr);
1355
1356 if (qdep >= QI_DEV_IOTLB_MAX_INVS)
1357 qdep = 0;
1358
1359 desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
1360 QI_DIOTLB_TYPE;
1361
1362 qi_submit_sync(&desc, iommu);
1363 }
1364
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Wed, Dec 13, 2017 at 11:31:02AM -0600, Hook, Gary wrote:
> On 12/13/2017 11:15 AM, Alex Williamson wrote:
> > On Wed, 13 Dec 2017 10:41:47 -0600
> > "Hook, Gary" <[email protected]> wrote:
> >
> > > On 12/13/2017 9:58 AM, Alex Williamson wrote:
> > > > On Wed, 13 Dec 2017 15:13:55 +0800
> > > > Peter Xu <[email protected]> wrote:
> > > > > On Tue, Dec 12, 2017 at 03:43:08PM -0700, Alex Williamson wrote:
> > > > >
> > > > > [...]
> > > > > > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > > > > > index 9a7ffd13c7f0..87888b102057 100644
> > > > > > --- a/drivers/iommu/dmar.c
> > > > > > +++ b/drivers/iommu/dmar.c
> > > > > > @@ -1345,7 +1345,9 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
> > > > > > struct qi_desc desc;
> > > > > > if (mask) {
> > > > > > - BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
> > > > > > + BUG_ON((mask > MAX_AGAW_PFN_WIDTH) ||
> > > > > > + ((mask == MAX_AGAW_PFN_WIDTH) && addr) ||
> > > > > > + (addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1)));
> > > > >
> > > > > Could it work if we just use 1ULL instead of 1 here? Thanks,
> > > >
> > > > In either case we're talking about shifting off the end of the
> > > > variable, which I understand to be undefined. Right? Thanks,
> > >
> > > How so? Bits fall off the left (MSB) end, zeroes fill in the right (LSB)
> > > end. I believe that behavior is pretty set.
> >
> > Maybe I'm relying too much on stackoverflow, but:
> >
> > https://stackoverflow.com/questions/11270492/what-does-the-c-standard-say-about-bitshifting-more-bits-than-the-width-of-type
>
> No, probably not. I don't have my copy of c99 handy, so can't check it. But
> it is beyond me why any compiler implementation would choose to use a rotate
> instead of a shift... probably a performance issue.
>
> So, yeah, when you have silly parameters, you get what you get.
>
> I'll stick to my suggestion. Which seems unambiguous... but I could be
> wrong.
Hi, Alex, Hook,
I did a quick test:
xz-mi:tmp $ cat a.c
#include <stdio.h>
void main(void)
{
unsigned long long val = 0x8000000000000001ULL;
int shift;
printf("origin: 0x%llx\n", val);
shift = 1; printf("shl 1: 0x%llx\n", val << shift);
shift = 62; printf("shl 62: 0x%llx\n", val << shift);
shift = 63; printf("shl 63: 0x%llx\n", val << shift);
shift = 64; printf("shl 64: 0x%llx\n", val << shift);
shift = 65; printf("shl 65: 0x%llx\n", val << shift);
}
xz-mi:tmp $ gcc -std=c99 a.c
xz-mi:tmp $ ./a.out
origin: 0x8000000000000001
shl 1: 0x2
shl 62: 0x4000000000000000
shl 63: 0x8000000000000000
shl 64: 0x8000000000000001
shl 65: 0x2
xz-mi:tmp $ objdump -d a.out | grep -A20 "<main>"
00000000004004d7 <main>:
4004d7: 55 push %rbp
4004d8: 48 89 e5 mov %rsp,%rbp
4004db: 48 83 ec 10 sub $0x10,%rsp
4004df: 48 b8 01 00 00 00 00 movabs $0x8000000000000001,%rax
4004e6: 00 00 80
4004e9: 48 89 45 f8 mov %rax,-0x8(%rbp)
4004ed: 48 8b 45 f8 mov -0x8(%rbp),%rax
4004f1: 48 89 c6 mov %rax,%rsi
4004f4: bf 60 06 40 00 mov $0x400660,%edi
4004f9: b8 00 00 00 00 mov $0x0,%eax
4004fe: e8 ed fe ff ff callq 4003f0 <printf@plt>
400503: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%rbp)
40050a: 8b 45 f4 mov -0xc(%rbp),%eax
40050d: 48 8b 55 f8 mov -0x8(%rbp),%rdx
400511: 89 c1 mov %eax,%ecx
400513: 48 d3 e2 shl %cl,%rdx
400516: 48 89 d0 mov %rdx,%rax
400519: 48 89 c6 mov %rax,%rsi
40051c: bf 70 06 40 00 mov $0x400670,%edi
400521: b8 00 00 00 00 mov $0x0,%eax
So it seems not really a rotation operation, but it looks more like
convering a "shifting N" into "shifting N%64" when N>=64. So now I
agree with Alex's change. Thanks all.
--
Peter Xu