- The value numbering scheme does not work in situations where
there are loops or conditionals:
    x[i+1] = y;
    while ( i < max ) {
        x[i+1] = x[i+1] + z;
        i = i+1;
    }
In this example, we would assign the same value number to all four instances of i+1, but
the assignment to i at the end of the loop body means that the instance of i+1 outside
the loop will not, in all cases, have the same value as those inside.
    w = 2*x + 1;
    if ( x > 0 )
        { z = 2*x; }
    y = z + 1;
In this example, 2*x + 1 and z + 1 might be common subexpressions, but we can't be sure
unless we know whether or not x will be positive.
- Examples like these make the notion of "straight line code" important
enough to deserve a name. We will call sequences of straight line
code "basic blocks".
- In the real world, before optimization, a compiler usually would rewrite
the program into a form in which basic blocks and the flow
graph were represented explicitly.
- Most optimizers work on an intermediate form that is quite
a bit closer to assembly language than our syntax trees.
In such code a basic block is simply a sequence of
statements, beginning with a label,
that contains no branches (other than subroutine
calls) and no other labels.
- To preserve control flow information, such compilers
build a directed graph whose nodes are basic blocks
and whose edges represent possible branches between
blocks. This is called a control flow graph.
- Describing a basic block in our intermediate form is
a bit trickier since trees aren't quite as linear as
pseudo-assembly language.
Luckily, since we are only interested in using basic
blocks to identify sequences of straight line code, we can take
a simpler approach.
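- As a rough illustration of the low-level definition above, here is a
minimal sketch of how the first statement of each basic block might be
found in a pseudo-assembly instruction list. The Instr type, the OP_
constants and markLeaders are hypothetical names invented for this
sketch; they are not part of our compiler. The sketch assumes that a
branch is always the last statement of its block.

```c
#include <stddef.h>

/* Hypothetical instruction kinds for a pseudo-assembly form. */
typedef enum { OP_LABEL, OP_ASSIGN, OP_BRANCH, OP_CALL } OpKind;

typedef struct {
    OpKind kind;
    int    target;   /* label number for OP_LABEL and OP_BRANCH, else unused */
} Instr;

/*
 * Sets leader[i] to 1 when instruction i starts a basic block:
 * the first instruction, any label, and any instruction that
 * follows a branch.  Subroutine calls do not end a block.
 * Returns the number of basic blocks found.
 */
int markLeaders(const Instr *code, size_t n, int *leader) {
    int blocks = 0;
    for (size_t i = 0; i < n; i++) {
        int starts = (i == 0)
                  || code[i].kind == OP_LABEL
                  || code[i - 1].kind == OP_BRANCH;
        leader[i] = starts;
        blocks += starts;
    }
    return blocks;
}
```

The edges of the control flow graph would then run from each block that
ends in a branch to the block starting at the branch's target label, and
from each block to the one that follows it whenever control can fall
through.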
- The approach I want you to imagine depends on two facts:
- All we need to do to apply a local optimization algorithm to basic
blocks is take the code used to do a "standard"
traversal of the syntax tree and figure out where to put the
"initialize various data structures" steps.
- To make this precise, here are sketches of
some pieces of the optimization traversal
code:
    void optimizeStmtList( node * slist ) {
        visitlist( slist, optimizeStmt, 0 );
    }

    void optimizeStmt( node * stmt ) {
        switch ( stmt->internal.type ) {
        case Nif:
            optimizeIf( stmt );
            break;
        case Nwhile:
            ...
        }
    }

    void optimizeIf( node * stmt ) {
        optimizeExpr( stmt->internal.child[0] );      /* condition */
        startNewBlock();
        optimizeStmtList( stmt->internal.child[1] );  /* then part */
        startNewBlock();
        if ( there is an else part ) {
            optimizeStmtList( stmt->internal.child[2] );
            startNewBlock();
        }
    }
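- The sketches above leave startNewBlock undefined. One way to picture
it, assuming the optimizer keeps a table of the expressions already
computed in the current block, is shown below. The table layout and the
helper seenInBlock are hypothetical, and the sketch deliberately ignores
the fact that an assignment to a variable must invalidate the
expressions that use it.

```c
#include <string.h>

#define MAX_EXPRS 64

/* Hypothetical table of expressions already computed in the current
   block.  Expressions are kept as strings only to keep the sketch short. */
static const char *available[MAX_EXPRS];
static int numAvailable = 0;

/* The "initialize various data structures" step: forget everything. */
void startNewBlock(void) {
    numAvailable = 0;
}

/* Returns 1 if expr was already computed in this block (a common
   subexpression); otherwise records it and returns 0. */
int seenInBlock(const char *expr) {
    for (int i = 0; i < numAvailable; i++)
        if (strcmp(available[i], expr) == 0)
            return 1;
    if (numAvailable < MAX_EXPRS)
        available[numAvailable++] = expr;
    return 0;
}
```

With this picture, a second occurrence of 2*x inside one block is
reported as a common subexpression, while an occurrence after a call to
startNewBlock is treated as new.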
- We can also take advantage of the fact that our goal is local
optimization by working with a slightly
looser definition of basic blocks.
- At a point where control branches
we can continue to propagate CSE information down one of the
branches (or both if we are willing to save the state
of the algorithm when we head down the first branch).
We just have to start over again whenever two control
paths join.
- This leads to the notion of an "extended basic block".
- In low-level (assembly-language-like) intermediate
forms, an extended basic block is a sequence of
statements starting with a label (or the entry
point of a procedure) that includes no other
labels (but, unlike a simple basic block, may
contain branches).
- Note that an extended basic block will be the
union of a sequence of basic blocks.
- In our trees, extended basic blocks can be formed
by leaving out the "startNewBlock" except at
points where we know labels may be placed.
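- To illustrate the "save the state" variant, here is a minimal sketch
of how an if statement might be handled when working with extended
basic blocks. The CseState type, visitExpr and optimizeIfExtended are
hypothetical names, and the two parts of the if are represented as
arrays of expression strings rather than syntax trees, purely to keep
the sketch self-contained.

```c
#include <string.h>

#define MAX_EXPRS 64

/* Hypothetical optimizer state: the expressions known to be available. */
typedef struct {
    const char *expr[MAX_EXPRS];
    int count;
} CseState;

static CseState current;

/* Forget everything: used wherever a label may be placed. */
void startNewBlock(void) {
    current.count = 0;
}

/* Returns 1 if e is already available (a common subexpression);
   otherwise records it and returns 0. */
int visitExpr(const char *e) {
    for (int i = 0; i < current.count; i++)
        if (strcmp(current.expr[i], e) == 0)
            return 1;
    if (current.count < MAX_EXPRS)
        current.expr[current.count++] = e;
    return 0;
}

/* Visits both parts of an if statement and returns the number of
   common subexpressions found in them. */
int optimizeIfExtended(const char **thenPart, int nThen,
                       const char **elsePart, int nElse) {
    int found = 0;
    CseState atBranch = current;          /* save state where control splits */
    for (int i = 0; i < nThen; i++)
        found += visitExpr(thenPart[i]);  /* then part extends the block */
    current = atBranch;                   /* else part starts from the same state */
    for (int i = 0; i < nElse; i++)
        found += visitExpr(elsePart[i]);
    startNewBlock();                      /* the two paths join: start over */
    return found;
}
```

Note that there is no startNewBlock before either part: information
gathered before the branch flows into both parts, but nothing learned in
the then part leaks into the else part, and nothing survives the point
where the two paths join.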