x86 Assembly

understanding gcc assembly output Posted by _codist_ on 12 Sept 2004 at 1:03 PM
This message was edited by _codist_ at 2004-9-12 13:27:16

Out of curiosity I disassembled a trivial executable I wrote in C, and had some trouble fully understanding the output. Here's the source of the executable:
void function()
{
   int i;
   i = 1;
}

int main()
{
   function();
   return 0;
}

I compiled it using gcc 3.3.4 (command: gcc -o main main.c) and disassembled it using objdump 2.14.90.0.7 (command: objdump -d main). Here's the (shortened) output:
08048334 <function>:
 8048334:       55                      push   %ebp
 8048335:       89 e5                   mov    %esp,%ebp
 8048337:       83 ec 04                sub    $0x4,%esp
 804833a:       c7 45 fc 01 00 00 00    movl   $0x1,0xfffffffc(%ebp)
 8048341:       c9                      leave  
 8048342:       c3                      ret    

08048343 <main>:
 8048343:       55                      push   %ebp
 8048344:       89 e5                   mov    %esp,%ebp
 8048346:       83 ec 08                sub    $0x8,%esp
 8048349:       83 e4 f0                and    $0xfffffff0,%esp
 804834c:       b8 00 00 00 00          mov    $0x0,%eax
 8048351:       29 c4                   sub    %eax,%esp
 8048353:       e8 dc ff ff ff          call   8048334 <function>
 8048358:       b8 00 00 00 00          mov    $0x0,%eax
 804835d:       c9                      leave  
 804835e:       c3                      ret    
 804835f:       90                      nop    

As I understand this, the main function starts by setting up an 8 byte stack frame. Then the lowest 4 bits in esp are set to zero (line 8048349: and $0xfffffff0,%esp). What's the reason behind this? I'd have supposed that manipulations like that rather mess up the stack than do something useful ...
Anyway, I think I've got the rest of the main function: 0 is put in eax and then subtracted from esp (probably that's what they mean by the overhead C automatically adds), function() is called, 0 is put in eax again as the return value, the stack frame is torn down by leave, and the function returns.

The function "function" also starts with the good old stack frame setup, sized 4 bytes this time to hold the local int variable. Then the value of the variable is set to 1 (line 804833a: movl $0x1,0xfffffffc(%ebp)). What I don't really understand here is the way the addressing works. I thought that "0xfffffffc(%ebp)" means "the memory address contained in ebp plus 0xfffffffc", but then the address referred to would already exceed 0xffffffff if ebp is greater than 3 ... What exactly does that mean, then? I expected something like movl $1,-4(%ebp) here ...

Any answers to these questions would be appreciated, and please also let me know if any of my interpretations of the assembly code are wrong ... Looking forward to your answers!


Re: understanding gcc assembly output Posted by shaolin007 on 13 Sept 2004 at 12:01 PM
Then the lowest 4 bits in esp are set to zero (line 8048349: and $0xfffffff0,%esp). What's the reason behind this? I'd have supposed that manipulations like that rather mess up the stack than do something useful ...

Depends, what is the value in ESP before the bitwise AND? Have you run it through a debugger?

What I don't really understand here is the way the addressing works. I thought that "0xfffffffc(%ebp)" means "the memory address contained in ebp plus 0xfffffffc", but then the address referred to would already exceed 0xffffffff if ebp is greater than 3 ... What exactly does that mean, then? I expected something like movl $1,-4(%ebp) here ...

I'm not too sure either, and it doesn't help that I have to read this backwards, since I'm used to Intel syntax rather than AT&T. It looks like the variable 'i' was allocated 4 bytes (dword size, hence movl?), which explains the 4 byte stack frame. Then the value was moved into the memory location 0xfffffffc. Why EBP is involved puzzles me, and why would you address it through EBP and not ESP? Sorry I couldn't be of more help.
Re: understanding gcc assembly output Posted by _codist_ on 14 Sept 2004 at 9:05 AM
: Then the lowest 4 bits in esp are set to zero (line 8048349: and $0xfffffff0,%esp). What's the reason behind this? I'd have supposed that manipulations like that rather mess up the stack than do something useful ...
:
: Depends, what is the value in ESP before the bitwise AND? Have you run it through a debugger?

Good idea. I set a breakpoint in main and found that ESP is 0xbffffa10 before the AND. Its lowest 4 bits are already zero, so the instruction doesn't modify the value at all in this run.
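The thread never states why the AND is there. The usual explanation is that gcc keeps the stack pointer 16-byte aligned in main, and masking with 0xfffffff0 rounds ESP down to the nearest multiple of 16. Because the stack grows toward lower addresses, rounding down can only reserve a few extra bytes; it never touches data that is already on the stack. A minimal sketch of the arithmetic, assuming a hypothetical helper named align_down_16 (not something gcc emits or provides):

```c
#include <stdint.h>

/* Hypothetical helper mirroring `and $0xfffffff0,%esp`:
   clears the low 4 bits, i.e. rounds down to a multiple of 16. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}
```

With the value seen in the debugger, align_down_16(0xbffffa10) gives back 0xbffffa10 unchanged, while a made-up unaligned value such as 0xbffffa1c would become 0xbffffa10.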
:
: What I don't really understand here is the way the addressing works. I thought that "0xfffffffc(%ebp)" means "the memory address contained in ebp plus 0xfffffffc", but then the address referred to would already exceed 0xffffffff if ebp is greater than 3 ... What exactly does that mean, then? I expected something like movl $1,-4(%ebp) here ...
:
: I'm not too sure either, and it doesn't help that I have to read this backwards, since I'm used to Intel syntax rather than AT&T. It looks like the variable 'i' was allocated 4 bytes (dword size, hence movl?), which explains the 4 byte stack frame. Then the value was moved into the memory location 0xfffffffc. Why EBP is involved puzzles me, and why would you address it through EBP and not ESP? Sorry I couldn't be of more help.
:
Finally figured that out now. 0xfffffffc is not an absolute memory address here, but a signed displacement that is added to EBP. It is in fact the -4 value I expected, printed by objdump as an unsigned hex number. Seems I was too focused on memory addresses to notice that. In Intel syntax, that line would be "mov dword ptr [ebp-4],1".
You're right that "movl" is used in the AT&T syntax because of the dword size. Mnemonics can be suffixed with 'l', 'w' or 'b' in AT&T to indicate dword, word or byte operands. The suffix is needed when no register operand fixes the size, as in this immediate-to-memory move; that's why "mov %esp,%ebp" in the same listing gets by without one. That way, byte ptr and the like are not needed.
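To make the wrap-around concrete: the instruction bytes c7 45 fc 01 00 00 00 hold the displacement as the single byte fc, which the CPU sign-extends to 0xfffffffc before adding it to EBP. The 32-bit addition wraps modulo 2^32, so adding 0xfffffffc gives the same result as subtracting 4. A small sketch of that arithmetic follows; the function names and the EBP value 0xbffffa18 are made up for illustration and are not taken from the debugger session.

```c
#include <stdint.h>

/* Sign-extend an 8-bit displacement byte (e.g. 0xfc) to 32 bits. */
uint32_t sign_extend_disp8(uint8_t disp8)
{
    return (disp8 & 0x80u) ? (0xffffff00u | disp8) : disp8;
}

/* Address of disp(%ebp): unsigned 32-bit addition wraps modulo 2^32,
   so a displacement of 0xfffffffc behaves exactly like -4. */
uint32_t ebp_relative(uint32_t ebp, uint32_t disp)
{
    return ebp + disp;
}
```

For the hypothetical EBP of 0xbffffa18, ebp_relative(0xbffffa18, sign_extend_disp8(0xfc)) yields 0xbffffa14, which is EBP minus 4.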

Re: understanding gcc assembly output Posted by shaolin007 on 14 Sept 2004 at 12:28 PM

I guess I'll have to take your word for it. To me it is a confusing way to represent EBP-4. But that's just my opinion.

Re: understanding gcc assembly output Posted by _codist_ on 14 Sept 2004 at 2:00 PM
Same here. I think the instruction suffixes are a bit more pleasant to read than the ptr size keywords, but the fact that sub eax,[ebx+ecx*4-20] is written subl -20(%ebx,%ecx,0x4),%eax in AT&T definitely makes me prefer Intel syntax.
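Both spellings describe the same memory operand: AT&T writes it as disp(base,index,scale), Intel as [base+index*scale+disp]. A rough sketch of the address the CPU computes in either case, using a hypothetical function name and made-up register values:

```c
#include <stdint.h>

/* Address of disp(base,index,scale), i.e. [base + index*scale + disp].
   scale is 1, 2, 4 or 8 on x86; the sum wraps modulo 2^32. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

With ebx = 0x1000 and ecx = 3 (illustrative values only), the operand -20(%ebx,%ecx,4) refers to address 0x1000 + 12 - 20 = 0xff8.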



 
