Wednesday, February 11, 2009

How G++ implements virtual functions

Introduction

    We know that virtual functions in C++ are implemented through a VPTR. In G++, every instance of a class with virtual functions starts with a hidden pointer, the VPTR, which points to the class's VPTR table (usually called the vtable). There is only one such table per class in the process, and every instance of that class shares it.

    When a virtual function is called, the this pointer is passed to it as a hidden first parameter (all non-static member functions of a class are called the same way). That is how the function knows which instance it is working on.
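The layout described above can be checked from C++ itself. The following is a minimal sketch, assuming G++'s object layout (hidden pointer first); the helpers `read_vptr`, `same_vtable` and `check_vptr_layout` are hypothetical names made up for this illustration, and peeking at an object's bytes like this is not something the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstring>   // std::memcpy

// The same two classes as in the example used throughout this post.
class A {
public:
    virtual void test() { i = 0; }
    int i;
};

class B : public A {
public:
    virtual void test() { i = 1976; }
};

// Hypothetical helper: copies the first pointer-sized bytes of an object.
// Assumption: with G++ this is where the hidden VPTR is stored.
const void* read_vptr(const A& obj) {
    const void* vptr = nullptr;
    std::memcpy(&vptr, static_cast<const void*>(&obj), sizeof vptr);
    return vptr;
}

// True when both objects point at the same table.
bool same_vtable(const A& x, const A& y) {
    return read_vptr(x) == read_vptr(y);
}

void check_vptr_layout() {
    A a1, a2;
    B b1, b2;
    assert(same_vtable(a1, a2));    // instances of A share one table
    assert(same_vtable(b1, b2));    // instances of B share one table
    assert(!same_vtable(a1, b1));   // but A and B each have their own
}
```

Calling `check_vptr_layout()` from `main` should pass silently when built with G++. The sketch compares pointers only and never calls through them.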

    It's clear and simple. But how is the whole mechanism implemented in machine code or assembly code?

    Let's begin our fascinating journey with a small example:

class A {
public:
    virtual void test() { i = 0; }
    int i;
};

class B : public A {
public:
    virtual void test() { i = 1976; }
};

void tv(A* p)
{
    p->test();
}

int main()
{
    B b;
    tv(&b);
}

The example does almost nothing: it only sets the data member i to 1976.

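We can generate the assembly code with the -S option: "g++ vptr.cpp -S". The listings in this post are 32-bit x86 code in AT&T syntax; on a 64-bit system, adding -m32 should give comparable output.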
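The generated assembly refers to C++ functions and tables by their mangled names, such as _ZN1BC1Ev or _ZTV1B, which show up in the listings that follow. As a small aid for reading them, here is a sketch that decodes such names, assuming GCC's `<cxxabi.h>` is available; the wrapper name `demangle` is my own choice.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang)
#include <string>

// Hypothetical wrapper: returns the readable form of a mangled name,
// or the input unchanged when it cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // With a null output buffer, the result is allocated with malloc.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

For example, `demangle("_ZN1BC1Ev")` gives `B::B()`, `demangle("_Z2tvP1A")` gives `tv(A*)`, and `demangle("_ZTV1B")` gives `vtable for B`. The c++filt command line tool does the same job.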

Let's look at the assembly code in two parts:

  1. How is b instantiated?

  2. How is the virtual function called?

The First Part: How is b instantiated?

In the main function, after pushing ebp and setting up the new frame pointer, the program does the following:

    subl $20, %esp
    leal -12(%ebp), %eax
    movl %eax, (%esp)
    call _ZN1BC1Ev

What does that mean? Let me explain it instruction by instruction.

The first instruction subtracts 20 from esp, which reserves 20 bytes on the stack. You may ask right away: the size of B is only 8 bytes (the VPTR and the int i), so why does GCC reserve 20 bytes? Besides those 8 bytes there are 12 bytes left. Who needs them? Part of the space holds the arguments of the functions that main calls, as the next two instructions show; the rest is most likely padding that GCC adds to keep the stack aligned.

The "leal" instruction computes the address of b, which lives at -12(%ebp), and the "movl" instruction stores that address at the top of the stack. This is the preparation for calling B's constructor: the address of b is its this argument. After these two instructions, the slot at (%esp) holds the address of b, and b itself occupies the 8 bytes starting at -12(%ebp).

Let's step into B's constructor, _ZN1BC1Ev:

    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    movl 8(%ebp), %eax
    movl %eax, (%esp)
    call _ZN1AC2Ev
    movl $_ZTV1B+8, %edx
    movl 8(%ebp), %eax
    movl %edx, (%eax)
    leave
    ret

We skip A's constructor because it works the same way as B's. The first instruction after "call _ZN1AC2Ev" is a movl instruction. _ZTV1B is B's VPTR table (vtable). It is defined in the generated code:

_ZTV1B:
    .long 0
    .long _ZTI1B
    .long _ZN1B4testEv

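_ZTV1B+8 skips the first two entries (the 0 and the type information pointer _ZTI1B), so it is the address of the table slot that holds the address of the virtual function test. The movl instruction loads this address into register edx. The next two instructions initialize the object's VPTR: they load the address of b (the this argument) into eax and store edx at the very beginning of the object. Then the construction is complete. The leave instruction restores esp from ebp and pops the saved ebp from the stack, and ret returns to main.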
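The two entries in front of the function slot can also be observed from a running program. This is only a sketch under assumptions: it relies on the table layout G++ emits (as in the _ZTV1B listing above), on RTTI being enabled, and on the VPTR sitting at the start of the object. The helper names `read_vptr`, `vtable_slot`, `rtti_slot_matches_typeid` and `offset_slot_is_zero` are made up for the illustration, and none of this is guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>    // std::ptrdiff_t
#include <cstring>    // std::memcpy
#include <typeinfo>   // typeid

class A {
public:
    virtual void test() { i = 0; }
    int i;
};

class B : public A {
public:
    virtual void test() { i = 1976; }
};

// Hypothetical helper: the VPTR is assumed to be the first
// pointer-sized field of the object.
const void* read_vptr(const A& obj) {
    const void* vptr = nullptr;
    std::memcpy(&vptr, static_cast<const void*>(&obj), sizeof vptr);
    return vptr;
}

// Hypothetical helper: reads the pointer-sized table entry that lies
// `index` slots away from where the VPTR points (negative = in front).
const void* vtable_slot(const void* vptr, std::ptrdiff_t index) {
    assert(vptr != nullptr);
    const char* entry_address = static_cast<const char*>(vptr) +
        index * static_cast<std::ptrdiff_t>(sizeof(void*));
    const void* entry = nullptr;
    std::memcpy(&entry, entry_address, sizeof entry);
    return entry;
}

// The slot just before the first function slot: ".long _ZTI1B".
bool rtti_slot_matches_typeid() {
    B b;
    return vtable_slot(read_vptr(b), -1) ==
           static_cast<const void*>(&typeid(B));
}

// The slot before that: ".long 0".
bool offset_slot_is_zero() {
    B b;
    return vtable_slot(read_vptr(b), -2) == nullptr;
}
```

The sketch counts in slots rather than bytes, so the "+8" of the 32-bit listing corresponds to index 0 here whatever the pointer size is.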

The Second Part: How is the virtual function called?

Let's look at the assembly code around function tv:

The two instructions before the call to tv are straightforward. They pass the address of b as tv's parameter.

    leal -12(%ebp), %eax
    movl %eax, (%esp)
    call _Z2tvP1A

Let's enter function tv directly.

    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    movl 8(%ebp), %eax
    movl (%eax), %eax
    movl (%eax), %edx
    movl 8(%ebp), %eax
    movl %eax, (%esp)
    call *%edx
    leave
    ret

I'll skip the first three instructions; they are the same prologue in every function. After the instruction movl 8(%ebp), %eax, register eax holds the address of variable b. After the instruction movl (%eax), %eax, eax holds the value of b's VPTR, that is, the address of the first function slot in B's VPTR table. After the instruction movl (%eax), %edx, register edx holds the address of the virtual function test. The next three instructions call the virtual function: they reload the address of b, store it on the stack as the this argument, and call through edx.
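To tie both parts together, here is the same mechanism written out by hand in plain C++, without the virtual keyword. It is only a sketch of the idea: the names `Obj`, `VTable`, `construct_A`, `construct_B` and `run_example` are hypothetical, the table has just the function slot (no type information entries), and the assumption that A's constructor installs A's table the same way is mine, since the listing for it is skipped above.

```cpp
#include <cassert>

struct Obj;
using TestFn = void (*)(Obj* self);   // "this" as an explicit first parameter

// Hand-written stand-in for the VPTR table: one slot for test().
struct VTable {
    TestFn test;
};

// Hand-written stand-in for an instance of A or B.
struct Obj {
    const VTable* vptr;   // hidden pointer at the start of the object
    int i;
};

void A_test(Obj* self) { self->i = 0; }
void B_test(Obj* self) { self->i = 1976; }

// One table per class, shared by all instances.
const VTable vtable_A = { A_test };
const VTable vtable_B = { B_test };

void construct_A(Obj* self) {
    self->vptr = &vtable_A;
}

void construct_B(Obj* self) {
    construct_A(self);        // call _ZN1AC2Ev
    self->vptr = &vtable_B;   // movl $_ZTV1B+8, %edx ... movl %edx, (%eax)
}

void tv(Obj* p) {
    const VTable* table = p->vptr;   // movl (%eax), %eax
    TestFn fn = table->test;         // movl (%eax), %edx
    fn(p);                           // movl %eax, (%esp); call *%edx
}

int run_example() {
    Obj b;
    construct_B(&b);
    assert(b.vptr == &vtable_B);   // B's constructor overwrote A's table
    tv(&b);
    return b.i;
}
```

`run_example()` returns 1976: tv never names B_test, yet the call lands there because the object carries a pointer to B's table.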

Then all is done. It's simple, right?
