| <?xml version="1.0"?> |
| <document> |
| |
| <properties> |
| <author email="markus.dahm@berlin.de">Markus Dahm</author> |
| <title>Byte Code Engineering Library (BCEL)</title> |
| </properties> |
| |
| <body> |
| |
| <section name="Abstract"> |
| <p> |
| Extensions and improvements of the programming language Java and |
| its related execution environment (Java Virtual Machine, JVM) are |
| the subject of a large number of research projects and |
| proposals. There are projects, for instance, to add parameterized |
| types to Java, to implement <a |
| href="http://aspectj.org/">Aspect-Oriented Programming</a>, to |
| perform sophisticated static analysis, and to improve the run-time |
| performance. |
| </p> |
| |
| <p> |
| Since Java classes are compiled into portable binary class files |
| (called <em>byte code</em>), the most convenient and |
| platform-independent way to implement these improvements is not to |
| write a new compiler or change the JVM, but to transform |
| the byte code. These transformations can be performed either |
| after compile-time or at load-time. Many programmers do |
| this by implementing their own specialized byte code manipulation |
| tools, which, however, offer only limited |
| reusability. |
| </p> |
| |
| <p> |
| To deal with the necessary class file transformations, we |
| introduce an API that helps developers to conveniently implement |
| their transformations. |
| </p> |
| </section> |
| |
| <section name="1 Introduction"> |
| <p> |
| The <a href="http://java.sun.com/">Java</a> language has become |
| very popular and many research projects deal with further |
| improvements of the language or its run-time behavior. The |
| possibility to extend a language with new concepts is surely a |
| desirable feature, but the implementation issues should be hidden |
| from the user. Fortunately, the concepts of the Java Virtual |
| Machine permit the user-transparent implementation of such |
| extensions with relatively little effort. |
| </p> |
| |
| <p> |
| Because the target language of Java is an interpreted language |
| with a small and easy-to-understand set of instructions (the |
| <em>byte code</em>), developers can implement and test their |
| concepts in a very elegant way. One can write a plug-in |
| replacement for the system's <em>class loader</em> which is |
| responsible for dynamically loading class files at run-time and |
| passing the byte code to the Virtual Machine (see section ). |
| Class loaders may thus be used to intercept the loading process |
| and transform classes before they are actually executed by the |
| JVM. While the original class files always remain unaltered, the |
| behavior of the class loader may be reconfigured for every |
| execution or instrumented dynamically. |
| </p> |
| |
| <p> |
| The <font face="helvetica,arial">BCEL</font> API (Byte Code |
| Engineering Library), formerly known as JavaClass, is a toolkit |
| for the static analysis and dynamic creation or transformation of |
| Java class files. It enables developers to implement the desired |
| features on a high level of abstraction without handling all the |
| internal details of the Java class file format and thus |
| re-inventing the wheel every time. <font face="helvetica,arial">BCEL |
| </font> is written entirely in Java and freely available under the |
| terms of the <a href="license.html">Apache Software License</a>. |
| </p> |
| |
| <p> |
| This manual is structured as follows: We give a brief description |
| of the Java Virtual Machine and the class file format in <a |
| href="#2 The Java Virtual Machine">section 2</a>. <a href="#3 The |
| BCEL API">Section 3</a> introduces the <font |
| face="helvetica,arial">BCEL</font> API. <a href="#4 Application |
| areas">Section 4</a> describes some typical application areas and |
| example projects. The appendix contains code examples that are too |
| long to be presented in the main part of this paper. All examples |
| are included in the downloadable distribution. |
| </p> |
| |
| </section> |
| |
| <section name="2 The Java Virtual Machine"> |
| <p> |
| Readers already familiar with the Java Virtual Machine and the |
| Java class file format may want to skip this section and proceed |
| with <a href="#3 The BCEL API">section 3</a>. |
| </p> |
| |
| <p> |
| Programs written in the Java language are compiled into a portable |
| binary format called <em>byte code</em>. Every class is |
| represented by a single class file containing class related data |
| and byte code instructions. These files are loaded dynamically |
| into an interpreter (<a |
| href="http://java.sun.com/docs/books/vmspec/index.html">Java |
| Virtual Machine</a>, or JVM) and executed. |
| </p> |
| |
| <p> |
| <a href="#Figure 1">Figure 1</a> illustrates the procedure of |
| compiling and executing a Java class: The source file |
| (<tt>HelloWorld.java</tt>) is compiled into a Java class file |
| (<tt>HelloWorld.class</tt>), loaded by the byte code interpreter |
| and executed. In order to implement additional features, |
| researchers may want to transform class files (drawn with bold |
| lines) before they are actually executed. This application area |
| is one of the main topics of this article. |
| </p> |
| |
| <p align="center"> |
| <a name="Figure 1"> |
| <img src="images/jvm.gif"/> |
| <br/> |
| Figure 1: Compilation and execution of Java classes</a> |
| </p> |
| |
| <p> |
| Note that the general term "Java" in fact has two |
| meanings: on the one hand, Java as a programming language, and on the |
| other hand, the Java Virtual Machine, which is not necessarily |
| targeted by the Java language exclusively, but may be used by <a |
| href="http://grunge.cs.tu-berlin.de/~tolk/vmlanguages.html">other |
| languages</a> as well. We assume that the reader is familiar with |
| the Java language and has a general understanding of the |
| Virtual Machine. |
| </p> |
| |
| </section> |
| |
| <section name="2.1 Java class file format"> |
| <p> |
| Giving a full overview of the design issues of the Java class file |
| format and the associated byte code instructions is beyond the |
| scope of this paper. We will just give a brief introduction |
| covering the details that are necessary for understanding the rest |
| of this paper. The format of class files and the byte code |
| instruction set are described in more detail in the <a |
| href="http://java.sun.com/docs/books/vmspec/index.html">Java |
| Virtual Machine Specification</a>. In particular, we will not deal |
| with the security constraints that the Java Virtual Machine has to |
| check at run-time, i.e., the byte code verifier. |
| </p> |
| |
| <p> |
| <a href="#Figure 2">Figure 2</a> shows a simplified example of the |
| contents of a Java class file: It starts with a header containing |
| a "magic number" (<tt>0xCAFEBABE</tt>) and the version number, |
| followed by the <em>constant pool</em>, which can be roughly |
| thought of as the text segment of an executable, the <em>access |
| rights</em> of the class encoded by a bit mask, a list of |
| interfaces implemented by the class, lists containing the fields |
| and methods of the class, and finally the <em>class |
| attributes</em>, e.g., the <tt>SourceFile</tt> attribute telling |
| the name of the source file. Attributes are a way of putting |
| additional, user-defined information into class file data |
| structures. For example, a custom class loader may evaluate such |
| attribute data in order to perform its transformations. The JVM |
| specification declares that unknown, i.e., user-defined attributes |
| must be ignored by any Virtual Machine implementation. |
| </p> |
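| |
| <p> |
| As a minimal sketch of the header layout described above, the |
| following stand-alone program reads the magic number, the version |
| number, and the constant pool count from the first bytes of a class |
| file. It uses only the JDK and is not part of BCEL; the class name |
| <tt>ClassFileHeader</tt> and the hand-written sample bytes are |
| assumptions made for illustration. |
| </p> |
| |
```java
import java.nio.ByteBuffer;

/** Illustrative helper (not a BCEL class): decodes the first fields of a class file. */
final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Parses magic number, version, and constant pool count from raw class file bytes. */
    static ClassFileHeader parse(byte[] classFileBytes) {
        // Class files are big-endian, which is also ByteBuffer's default byte order.
        ByteBuffer buf = ByteBuffer.wrap(classFileBytes);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("Not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF;   // unsigned 16-bit values
        int major = buf.getShort() & 0xFFFF;
        int cpCount = buf.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major, cpCount);
    }

    public static void main(String[] args) {
        // Hand-written sample header: magic, minor 0, major 52, constant pool count 10.
        byte[] sample = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52, 0, 10
        };
        ClassFileHeader header = parse(sample);
        System.out.println("version " + header.majorVersion + "." + header.minorVersion
            + ", constant pool count " + header.constantPoolCount);
    }
}
```
| |
| <p> |
| Note that the stored constant pool count is one larger than the |
| number of entries, because index 0 is reserved by the class file |
| format. |
| </p> |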
| |
| <p align="center"> |
| <a name="Figure 2"> |
| <img src="images/classfile.gif"/> |
| <br/> |
| Figure 2: Java class file format</a> |
| </p> |
| |
| <p> |
| Because all of the information needed to dynamically resolve the |
| symbolic references to classes, fields and methods at run-time is |
| coded with string constants, the constant pool makes up |
| the largest portion of an average class file, approximately |
| 60%. This makes the constant pool an easy target for code |
| manipulation. The byte code instructions themselves make up |
| just 12%. |
| </p> |
| |
| <p> |
| The upper right box of <a href="#Figure 2">Figure 2</a> shows a "zoomed" excerpt of the constant pool, |
| while the rounded box below depicts some instructions that are |
| contained within a method of the example class. These |
| instructions represent the straightforward translation of the |
| well-known statement: |
| </p> |
| |
| <p align="center"> |
| <source>System.out.println("Hello, world");</source> |
| </p> |
| |
| <p> |
| The first instruction loads the contents of the field <tt>out</tt> |
| of class <tt>java.lang.System</tt> onto the operand stack. This is |
| an instance of the class <tt>java.io.PrintStream</tt>. The |
| <tt>ldc</tt> ("load constant") instruction pushes a reference to the string |
| "Hello, world" onto the stack. The next instruction invokes the |
| instance method <tt>println</tt>, which takes both values as |
| parameters (instance methods always implicitly take an instance |
| reference as their first argument). |
| </p> |
| |
| <p> |
| Instructions, other data structures within the class file and |
| constants themselves may refer to constants in the constant pool. |
| Such references are implemented via fixed indexes encoded directly |
| into the instructions. This is illustrated for some items of the |
| figure emphasized with a surrounding box. |
| </p> |
| |
| <p> |
| For example, the <tt>invokevirtual</tt> instruction refers to a |
| <tt>Methodref</tt> constant that contains information about the |
| name of the called method, the signature (i.e., the encoded |
| argument and return types), and the class to which the method belongs. |
| In fact, as emphasized by the boxed value, the <tt>Methodref</tt> |
| constant itself just refers to other entries holding the real |
| data, e.g., it refers to a <tt>ConstantClass</tt> entry containing |
| a symbolic reference to the class <tt>java.io.PrintStream</tt>. |
| To keep the class file compact, such constants are typically |
| shared by different instructions and other constant pool |
| entries. Similarly, a field is represented by a <tt>Fieldref</tt> |
| constant that includes information about the name, the type and |
| the containing class of the field. |
| </p> |
| |
| <p> |
| The constant pool basically holds the following types of |
| constants: References to methods, fields and classes, strings, |
| integers, floats, longs, and doubles. |
| </p> |
| |
| </section> |
| |
| <section name="2.2 Byte code instruction set"> |
| <p> |
| The JVM is a stack-oriented interpreter that creates a local stack |
| frame of fixed size for every method invocation. The size of the |
| local stack has to be computed by the compiler. Values may also be |
| stored intermediately in a frame area containing <em>local |
| variables</em>, which can be used like a set of registers. These |
| local variables are numbered from 0 to 65535, i.e., there is a |
| maximum of 65536 local variables per method. The stack frames |
| of the caller and the callee overlap, i.e., the caller |
| pushes arguments onto the operand stack and the called method |
| receives them in local variables. |
| </p> |
| |
| <p> |
| The byte code instruction set currently consists of 212 |
| instructions; 44 opcodes are marked as reserved and may be used |
| for future extensions or intermediate optimizations within the |
| Virtual Machine. The instruction set can be roughly grouped as |
| follows: |
| </p> |
| |
| <p> |
| <b>Stack operations:</b> Constants can be pushed onto the stack |
| either by loading them from the constant pool with the |
| <tt>ldc</tt> instruction or with special "short-cut" |
| instructions where the operand is encoded into the instructions, |
| e.g., <tt>iconst_0</tt> or <tt>bipush</tt> (push byte value). |
| </p> |
| |
| <p> |
| <b>Arithmetic operations:</b> The instruction set of the Java |
| Virtual Machine distinguishes its operand types using different |
| instructions to operate on values of a specific type. Arithmetic |
| operations starting with <tt>i</tt>, for example, denote an |
| integer operation: <tt>iadd</tt> adds two integers |
| and pushes the result back onto the stack. The Java types |
| <tt>boolean</tt>, <tt>byte</tt>, <tt>short</tt>, and |
| <tt>char</tt> are handled as integers by the JVM. |
| </p> |
| |
| <p> |
| <b>Control flow:</b> There are branch instructions like |
| <tt>goto</tt>, and <tt>if_icmpeq</tt>, which compares two integers |
| for equality. There is also a <tt>jsr</tt> (jump to sub-routine) |
| and <tt>ret</tt> pair of instructions that is used to implement |
| the <tt>finally</tt> clause of <tt>try-catch</tt> blocks. |
| Exceptions may be thrown with the <tt>athrow</tt> instruction. |
| Branch targets are coded as offsets from the current byte code |
| position, i.e., with an integer number. |
| </p> |
| |
| <p> |
| <b>Load and store operations</b> for local variables like |
| <tt>iload</tt> and <tt>istore</tt>. There are also array |
| operations like <tt>iastore</tt> which stores an integer value |
| into an array. |
| </p> |
| |
| <p> |
| <b>Field access:</b> The value of an instance field may be |
| retrieved with <tt>getfield</tt> and written with |
| <tt>putfield</tt>. For static fields, there are |
| <tt>getstatic</tt> and <tt>putstatic</tt> counterparts. |
| </p> |
| |
| <p> |
| <b>Method invocation:</b> Methods may either be called statically via |
| <tt>invokestatic</tt> or be bound virtually with the |
| <tt>invokevirtual</tt> instruction. Super class methods and |
| private methods are invoked with <tt>invokespecial</tt>. |
| Interface methods are a special case and are invoked with |
| <tt>invokeinterface</tt>. |
| </p> |
| |
| <p> |
| <b>Object allocation:</b> Class instances are allocated with the |
| <tt>new</tt> instruction, arrays of basic type like |
| <tt>int[]</tt> with <tt>newarray</tt>, arrays of references like |
| <tt>String[][]</tt> with <tt>anewarray</tt> or |
| <tt>multianewarray</tt>. |
| </p> |
| |
| <p> |
| <b>Conversion and type checking:</b> For stack operands of basic |
| type there exist casting operations like <tt>f2i</tt> which |
| converts a float value into an integer. The validity of a type |
| cast may be checked with <tt>checkcast</tt> and the |
| <tt>instanceof</tt> operator can be directly mapped to the |
| equally named instruction. |
| </p> |
| |
| <p> |
| Most instructions have a fixed length, but there are also some |
| variable-length instructions: In particular, the |
| <tt>lookupswitch</tt> and <tt>tableswitch</tt> instructions, which |
| are used to implement <tt>switch()</tt> statements. Since the |
| number of <tt>case</tt> clauses may vary, these instructions |
| contain a variable number of branch targets. |
| </p> |
| |
| <p> |
| We will not list all byte code instructions here, since these are |
| explained in detail in the <a |
| href="http://java.sun.com/docs/books/vmspec/index.html">JVM |
| specification</a>. The opcode names are mostly self-explanatory, |
| so understanding the following code examples should be fairly |
| intuitive. |
| </p> |
| |
| </section> |
| |
| <section name="2.3 Method code"> |
| <p> |
| Non-abstract (and non-native) methods contain an attribute |
| "<tt>Code</tt>" that holds the following data: The maximum size of |
| the method's stack frame, the number of local variables and an |
| array of byte code instructions. Optionally, it may also contain |
| information about the names of local variables and source file |
| line numbers that can be used by a debugger. |
| </p> |
| |
| <p> |
| Whenever an exception is raised during execution, the JVM performs |
| exception handling by looking into a table of exception |
| handlers. The table marks handlers, i.e., code chunks, to be |
| responsible for exceptions of certain types that are raised within |
| a given area of the byte code. When there is no appropriate |
| handler the exception is propagated back to the caller of the |
| method. The handler information is itself stored in an attribute |
| contained within the <tt>Code</tt> attribute. |
| </p> |
| |
| </section> |
| |
| <section name="2.4 Byte code offsets"> |
| <p> |
| Targets of branch instructions like <tt>goto</tt> are encoded as |
| relative offsets in the array of byte codes. Exception handlers |
| and local variables refer to absolute addresses within the byte |
| code. The former contain references to the start and the end of |
| the <tt>try</tt> block, and to the handler code. The |
| latter mark the range in which a local variable is valid, i.e., |
| its scope. This makes it difficult to insert or delete code areas |
| on this level of abstraction, since one has to recompute the |
| offsets every time and update the referring objects. We will see |
| in <a href="#3.3 ClassGen">section 3.3</a> how <font |
| face="helvetica,arial">BCEL</font> remedies this restriction. |
| </p> |
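| |
| <p> |
| To illustrate why raw offsets are awkward to maintain, the |
| following sketch recomputes the relative offset of a branch after |
| some bytes have been inserted into the code. The class |
| <tt>BranchOffsets</tt> and its methods are hypothetical helpers |
| written for this manual, not BCEL API; the sample numbers are taken |
| from the <tt>fac()</tt> listing in section 2.6, where <tt>ifne</tt> |
| at position 1 branches to position 8. |
| </p> |
| |
```java
/** Illustrative helper (not a BCEL class): bookkeeping for relative branch offsets. */
final class BranchOffsets {

    /** Absolute target of a branch located at pc with the given relative offset. */
    static int target(int pc, int relativeOffset) {
        return pc + relativeOffset;
    }

    /**
     * Relative offset the branch needs after insertedBytes bytes have been
     * inserted at position insertAt. Everything at or after insertAt moves back.
     */
    static int adjust(int pc, int relativeOffset, int insertAt, int insertedBytes) {
        int oldTarget = pc + relativeOffset;
        int newPc = (pc >= insertAt) ? pc + insertedBytes : pc;
        int newTarget = (oldTarget >= insertAt) ? oldTarget + insertedBytes : oldTarget;
        return newTarget - newPc;
    }

    public static void main(String[] args) {
        // ifne at position 1 with offset 7 targets position 8.
        System.out.println("target before insertion: " + target(1, 7));
        // Insert 3 bytes at position 4: the branch stays put, its target moves.
        System.out.println("offset after insertion:  " + adjust(1, 7, 4, 3));
        // goto at position 5 with offset 11: branch and target both move, offset is unchanged.
        System.out.println("goto offset afterwards:  " + adjust(5, 11, 4, 3));
    }
}
```
| |
| <p> |
| Every branch, exception handler, and local variable range would |
| need this kind of adjustment after each modification of the code. |
| </p> |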
| |
| </section> |
| |
| <section name="2.5 Type information"> |
| <p> |
| Java is a type-safe language and the information about the types |
| of fields, local variables, and methods is stored in so-called |
| <em>signatures</em>. These are strings stored in the constant pool |
| and encoded in a special format. For example, the argument and |
| return types of the <tt>main</tt> method |
| </p> |
| |
| <p align="center"> |
| <source>public static void main(String[] argv)</source> |
| </p> |
| |
| <p> |
| are represented by the signature |
| </p> |
| |
| <p align="center"> |
| <source>([Ljava/lang/String;)V</source> |
| </p> |
| |
| <p> |
| Classes are internally represented by strings like |
| <tt>"java/lang/String"</tt>, basic types like <tt>float</tt> by an |
| integer number. Within signatures, basic types are represented by single |
| characters, e.g., <tt>I</tt> for <tt>int</tt>, and class names are enclosed in |
| <tt>L</tt> and <tt>;</tt>, as in <tt>Ljava/lang/String;</tt>. Arrays are denoted with |
| a <tt>[</tt> at the start of the signature. |
| </p> |
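| |
| <p> |
| The encoding can be checked with the JDK itself. The following |
| sketch uses the standard class <tt>java.lang.invoke.MethodType</tt> |
| to produce signature strings from <tt>Class</tt> objects; the |
| wrapper class <tt>SignatureDemo</tt> is a name chosen for this |
| example, and BCEL is not involved. |
| </p> |
| |
```java
import java.lang.invoke.MethodType;

/** Illustrative helper: derives JVM method signatures using only the JDK. */
final class SignatureDemo {

    /** Returns the signature string for a method with the given return and argument types. */
    static String descriptorOf(Class<?> returnType, Class<?>... argTypes) {
        return MethodType.methodType(returnType, argTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // public static void main(String[] argv)
        System.out.println(descriptorOf(void.class, String[].class)); // ([Ljava/lang/String;)V
        // public static final int fac(int n)
        System.out.println(descriptorOf(int.class, int.class));       // (I)I
    }
}
```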
| |
| </section> |
| |
| <section name="2.6 Code example"> |
| <p> |
| The following example program prompts for a number and prints its |
| factorial (called "faculty" in the code). The <tt>readLine()</tt> method reading from the |
| standard input may raise an <tt>IOException</tt>, and if a |
| malformed number is passed to <tt>parseInt()</tt> it throws a |
| <tt>NumberFormatException</tt>. Thus, the critical area of code |
| must be encapsulated in a <tt>try-catch</tt> block. |
| </p> |
| |
| <source> |
| import java.io.*; |
| |
| public class Faculty { |
|   private static BufferedReader in = new BufferedReader(new |
|                                 InputStreamReader(System.in)); |
| |
|   public static final int fac(int n) { |
|     return (n == 0)? 1 : n * fac(n - 1); |
|   } |
| |
|   public static final int readInt() { |
|     int n = 4711; |
|     try { |
|       System.out.print("Please enter a number> "); |
|       n = Integer.parseInt(in.readLine()); |
|     } catch(IOException e1) { System.err.println(e1); } |
|     catch(NumberFormatException e2) { System.err.println(e2); } |
|     return n; |
|   } |
| |
|   public static void main(String[] argv) { |
|     int n = readInt(); |
|     System.out.println("Faculty of " + n + " is " + fac(n)); |
|   } |
| } |
| </source> |
| |
| <p> |
| This code example typically compiles to the following chunks of |
| byte code: |
| </p> |
| |
| <source> |
| 0: iload_0 |
| 1: ifne #8 |
| 4: iconst_1 |
| 5: goto #16 |
| 8: iload_0 |
| 9: iload_0 |
| 10: iconst_1 |
| 11: isub |
| 12: invokestatic Faculty.fac (I)I (12) |
| 15: imul |
| 16: ireturn |
| |
| LocalVariable(start_pc = 0, length = 16, index = 0:int n) |
| </source> |
| |
| <p><b>fac():</b> |
| The method <tt>fac</tt> has only one local variable, the argument |
| <tt>n</tt>, stored at index 0. This variable's scope ranges from |
| the start of the byte code sequence to the very end. If the value |
| of <tt>n</tt> (the value fetched with <tt>iload_0</tt>) is not |
| equal to 0, the <tt>ifne</tt> instruction branches to the byte |
| code at offset 8, otherwise a 1 is pushed onto the operand stack |
| and the control flow branches to the final return. For ease of |
| reading, the offsets of the branch instructions, which are |
| actually relative, are displayed as absolute addresses in these |
| examples. |
| </p> |
| |
| <p> |
| If recursion has to continue, the arguments for the multiplication |
| (<tt>n</tt> and <tt>fac(n - 1)</tt>) are evaluated and the results |
| pushed onto the operand stack. After the multiplication operation |
| has been performed the function returns the computed value from |
| the top of the stack. |
| </p> |
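| |
| <p> |
| To make the role of the operand stack and the local variables more |
| concrete, the following sketch replays the byte code of |
| <tt>fac()</tt> by hand. It is a toy interpreter written for this |
| manual, not how the JVM or BCEL is implemented: each <tt>case</tt> |
| label is the byte code position from the listing above, and the |
| class name <tt>FacInterpreter</tt> is an assumption. |
| </p> |
| |
```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy interpreter (illustration only) that steps through the fac() byte code. */
final class FacInterpreter {

    static int fac(int n) {
        int[] locals = { n };                        // local variable 0 holds the argument n
        Deque<Integer> stack = new ArrayDeque<>();   // operand stack of this invocation
        int pc = 0;                                  // current byte code position
        while (true) {
            switch (pc) {
                case 0:  stack.push(locals[0]); pc = 1; break;       // iload_0
                case 1:  pc = (stack.pop() != 0) ? 8 : 4; break;     // ifne #8
                case 4:  stack.push(1); pc = 5; break;               // iconst_1
                case 5:  pc = 16; break;                             // goto #16
                case 8:  stack.push(locals[0]); pc = 9; break;       // iload_0
                case 9:  stack.push(locals[0]); pc = 10; break;      // iload_0
                case 10: stack.push(1); pc = 11; break;              // iconst_1
                case 11: {                                           // isub
                    int b = stack.pop(), a = stack.pop();
                    stack.push(a - b); pc = 12; break;
                }
                case 12: stack.push(fac(stack.pop())); pc = 15; break; // invokestatic Faculty.fac
                case 15: {                                           // imul
                    int b = stack.pop(), a = stack.pop();
                    stack.push(a * b); pc = 16; break;
                }
                case 16: return stack.pop();                         // ireturn
                default: throw new IllegalStateException("unexpected position " + pc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(fac(5)); // 120
    }
}
```
| |
| <p> |
| The recursive call at position 12 pops its argument from the |
| caller's stack and pushes the result back, which mirrors how |
| arguments and return values are passed between stack frames. |
| </p> |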
| |
| <source> |
| 0: sipush 4711 |
| 3: istore_0 |
| 4: getstatic java.lang.System.out Ljava/io/PrintStream; |
| 7: ldc "Please enter a number> " |
| 9: invokevirtual java.io.PrintStream.print (Ljava/lang/String;)V |
| 12: getstatic Faculty.in Ljava/io/BufferedReader; |
| 15: invokevirtual java.io.BufferedReader.readLine ()Ljava/lang/String; |
| 18: invokestatic java.lang.Integer.parseInt (Ljava/lang/String;)I |
| 21: istore_0 |
| 22: goto #44 |
| 25: astore_1 |
| 26: getstatic java.lang.System.err Ljava/io/PrintStream; |
| 29: aload_1 |
| 30: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V |
| 33: goto #44 |
| 36: astore_1 |
| 37: getstatic java.lang.System.err Ljava/io/PrintStream; |
| 40: aload_1 |
| 41: invokevirtual java.io.PrintStream.println (Ljava/lang/Object;)V |
| 44: iload_0 |
| 45: ireturn |
| |
| Exception handler(s) = |
| From To Handler Type |
| 4 22 25 java.io.IOException(6) |
| 4 22 36 NumberFormatException(10) |
| </source> |
| |
| <p><b>readInt():</b> First the local variable <tt>n</tt> (at index 0) |
| is initialized to the value 4711. The next instruction, |
| <tt>getstatic</tt>, loads the reference held by the static |
| <tt>System.out</tt> field onto the stack. Then a string is loaded |
| and printed, and a number is read from the standard input and assigned to |
| <tt>n</tt>. |
| </p> |
| |
| <p> |
| If one of the called methods (<tt>readLine()</tt> and |
| <tt>parseInt()</tt>) throws an exception, the Java Virtual Machine |
| calls one of the declared exception handlers, depending on the |
| type of the exception. The <tt>try</tt>-clause itself does not |
| produce any code; it merely defines the range in which the |
| subsequent handlers are active. In the example, the specified |
| source code area maps to a byte code area ranging from offset 4 |
| (inclusive) to 22 (exclusive). If no exception has occurred |
| ("normal" execution flow), the <tt>goto</tt> instruction at offset 22 branches |
| past the handler code. There the value of <tt>n</tt> is loaded |
| and returned. |
| </p> |
| |
| <p> |
| The handler for <tt>java.io.IOException</tt> starts at |
| offset 25. It simply prints the error and branches back to the |
| normal execution flow, i.e., as if no exception had occurred. |
| </p> |
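| |
| <p> |
| The handler lookup can be sketched as a search through the |
| exception table shown above. The class <tt>ExceptionTable</tt> and |
| its members are hypothetical names used for illustration; the |
| sketch models only the range and type check, not the JVM's actual |
| implementation. |
| </p> |
| |
```java
import java.io.IOException;

/** Illustrative model (not a BCEL class) of an exception handler table. */
final class ExceptionTable {

    /** One table row: handler covers positions from (inclusive) to to (exclusive). */
    static final class Entry {
        final int from;
        final int to;
        final int handler;
        final Class<? extends Throwable> type;

        Entry(int from, int to, int handler, Class<? extends Throwable> type) {
            this.from = from;
            this.to = to;
            this.handler = handler;
            this.type = type;
        }
    }

    /** The table of readInt() as printed in the listing above. */
    static final Entry[] READ_INT = {
        new Entry(4, 22, 25, IOException.class),
        new Entry(4, 22, 36, NumberFormatException.class)
    };

    /**
     * Returns the position of the first matching handler, or -1 if there is
     * none, in which case the exception propagates to the caller.
     */
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry e : table) {
            if (pc >= e.from && pc < e.to && e.type.isInstance(thrown)) {
                return e.handler;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // parseInt() is invoked at position 18, inside the protected range.
        System.out.println(findHandler(READ_INT, 18, new NumberFormatException()));
        // Position 30 lies inside a handler and is not covered by any entry.
        System.out.println(findHandler(READ_INT, 30, new IOException()));
    }
}
```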
| |
| </section> |
| |
| <section name="3 The BCEL API"> |
| <p> |
| The <font face="helvetica,arial">BCEL</font> API abstracts from |
| the concrete circumstances of the Java Virtual Machine and how to |
| read and write binary Java class files. The API mainly consists |
| of three parts: |
| </p> |
| |
| <p> |
| |
| <ol type="1"> |
| <li> A package that contains classes that describe "static" |
| constraints of class files, i.e., reflect the class file format and |
| are not intended for byte code modifications. The classes may be |
| used to read and write class files from or to a file. This is |
| especially useful for analyzing Java classes without having the |
| source files at hand. The main data structure is called |
| <tt>JavaClass</tt>, which contains methods, fields, etc.</li> |
| |
| <li> A package to dynamically generate or modify |
| <tt>JavaClass</tt> or <tt>Method</tt> objects. It may be used to |
| insert analysis code, to strip unnecessary information from class |
| files, or to implement the code generator back-end of a Java |
| compiler.</li> |
| |
| <li> Various code examples and utilities like a class file viewer, |
| a tool to convert class files into HTML, and a converter from |
| class files to the <a |
| href="http://mrl.nyu.edu/~meyer/jasmin/">Jasmin</a> assembly |
| language.</li> |
| </ol> |
| </p> |
| </section> |
| |
| <section name="3.1 JavaClass"> |
| <p> |
| The "static" component of the <font |
| face="helvetica,arial">BCEL</font> API resides in the package |
| <tt>org.apache.bcel.classfile</tt> and closely represents class |
| files. All of the binary components and data structures declared |
| in the <a |
| href="http://java.sun.com/docs/books/vmspec/index.html">JVM |
| specification</a> and described in section <a |
| href="#2 The Java Virtual Machine">2</a> are mapped to classes. |
| |
| <a href="#Figure 3">Figure 3</a> shows a UML diagram of the |
| hierarchy of classes of the <font face="helvetica,arial">BCEL |
| </font>API. <a href="#Figure 8">Figure 8</a> in the appendix also |
| shows a detailed diagram of the <tt>ConstantPool</tt> components. |
| </p> |
| |
| <p align="center"> |
| <a name="Figure 3"> |
| <img src="images/javaclass.gif"/> <br/> |
| Figure 3: UML diagram for the JavaClass API</a> |
| </p> |
| |
| <p> |
| The top-level data structure is <tt>JavaClass</tt>, which in most |
| cases is created by a <tt>ClassParser</tt> object that is capable |
| of parsing binary class files. A <tt>JavaClass</tt> object |
| basically consists of fields, methods, symbolic references to the |
| super class and to the implemented interfaces. |
| </p> |
| |
| <p> |
| The constant pool serves as a kind of central repository and is |
| thus of great importance for all components. |
| <tt>ConstantPool</tt> objects contain an array of fixed size of |
| <tt>Constant</tt> entries, which may be retrieved via the |
| <tt>getConstant()</tt> method taking an integer index as argument. |
| Indexes to the constant pool may be contained in instructions as |
| well as in other components of a class file and in constant pool |
| entries themselves. |
| </p> |
| |
| <p> |
| Methods and fields contain a signature, symbolically defining |
| their types. Access flags like <tt>public static final</tt> occur |
| in several places and are encoded by an integer bit mask, e.g., |
| <tt>public static final</tt> corresponds to the Java expression |
| </p> |
| |
| |
| <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source> |
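| |
| <p> |
| The same bit mask arithmetic can be tried out with the JDK class |
| <tt>java.lang.reflect.Modifier</tt>, whose <tt>PUBLIC</tt>, |
| <tt>STATIC</tt>, and <tt>FINAL</tt> constants use the numeric |
| values defined by the class file format (0x0001, 0x0008, and |
| 0x0010). The following sketch is a stand-in for illustration and |
| does not use the BCEL constants; <tt>AccessFlagsDemo</tt> is a |
| name chosen for this example. |
| </p> |
| |
```java
import java.lang.reflect.Modifier;

/** Illustrative demo: access flags are single bits combined into one integer. */
final class AccessFlagsDemo {

    /** Bit mask for "public static final". */
    static int publicStaticFinal() {
        return Modifier.PUBLIC | Modifier.STATIC | Modifier.FINAL;
    }

    public static void main(String[] args) {
        int flags = publicStaticFinal();
        System.out.println("0x" + Integer.toHexString(flags)); // 0x19
        System.out.println(Modifier.toString(flags));          // public static final
        // Individual flags are tested by masking out the corresponding bit.
        System.out.println((flags & Modifier.STATIC) != 0);    // true
        System.out.println((flags & Modifier.PRIVATE) != 0);   // false
    }
}
```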
| |
| <p> |
| As mentioned in <a href="#2.1 Java class file format">section |
| 2.1</a> already, several components may contain <em>attribute</em> |
| objects: classes, fields, methods, and <tt>Code</tt> objects |
| (introduced in <a href="#2.3 Method code">section 2.3</a>). The |
| latter is an attribute itself that contains the actual byte code |
| array, the maximum stack size, the number of local variables, a |
| table of handled exceptions, and some optional debugging |
| information coded as <tt>LineNumberTable</tt> and |
| <tt>LocalVariableTable</tt> attributes. Attributes are in general |
| specific to some data structure, i.e., no two components share the |
| same kind of attribute, though this is not explicitly |
| forbidden. In the figure the <tt>Attribute</tt> classes are stereotyped |
| with the component they belong to. |
| </p> |
| |
| </section> |
| |
| <section name="3.2 Class repository"> |
| <p> |
| Using the provided <tt>Repository</tt> class, reading class files into |
| a <tt>JavaClass</tt> object is quite simple: |
| </p> |
| |
| <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source> |
| |
| <p> |
| The repository also contains methods providing the dynamic equivalent |
| of the <tt>instanceof</tt> operator, and other useful routines: |
| </p> |
| |
| <source> |
| if(Repository.instanceOf(clazz, super_class)) { |
| ... |
| }</source> |
| |
| </section> |
| |
| <section name="3.2.1 Accessing class file data"> |
| |
| <p> |
| Information within the class file components may be accessed like |
| Java Beans via intuitive set/get methods. All of them also define |
| a <tt>toString()</tt> method so that implementing a simple class |
| viewer is very easy. In fact all of the examples used here have |
| been produced this way: |
| </p> |
| |
| <source> |
| System.out.println(clazz); |
| printCode(clazz.getMethods()); |
| ... |
| public static void printCode(Method[] methods) { |
|   for(int i=0; i < methods.length; i++) { |
|     System.out.println(methods[i]); |
| |
|     Code code = methods[i].getCode(); |
|     if(code != null) // Non-abstract method |
|       System.out.println(code); |
|   } |
| } |
| </source> |
| |
| </section> |
| |
| <section name="3.2.2 Analyzing class data"> |
| <p> |
| Last but not least, <font face="helvetica,arial">BCEL</font> |
| supports the <em>Visitor</em> design pattern, so one can write |
| visitor objects to traverse and analyze the contents of a class |
| file. Included in the distribution is a class |
| <tt>JasminVisitor</tt> that converts class files into the <a |
| href="http://mrl.nyu.edu/~meyer/jasmin/">Jasmin</a> |
| assembler language. |
| </p> |
| |
| </section> |
| |
| <section name="3.3 ClassGen"> |
| <p> |
| This part of the API (package <tt>org.apache.bcel.generic</tt>) |
| supplies an abstraction level for creating or transforming class |
| files dynamically. It makes the static constraints of Java class |
| files like the hard-coded byte code addresses "generic". The |
| generic constant pool, for example, is implemented by the class |
| <tt>ConstantPoolGen</tt> which offers methods for adding different |
| types of constants. Accordingly, <tt>ClassGen</tt> offers an |
| interface to add methods, fields, and attributes. |
| <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API. |
| </p> |
| |
| <p align="center"> |
| <a name="Figure 4"> |
| <img src="images/classgen.gif"/> |
| <br/> |
| Figure 4: UML diagram of the ClassGen API</a> |
| </p> |
| |
| </section> |
| |
| <section name="3.3.1 Types"> |
| <p> |
| We abstract from the concrete details of the type signature syntax |
| (see <a href="#2.5 Type information">2.5</a>) by introducing the |
| <tt>Type</tt> class, which is used, for example, by methods to |
| define their return and argument types. Concrete sub-classes are |
| <tt>BasicType</tt>, <tt>ObjectType</tt>, and <tt>ArrayType</tt> |
| which consists of the element type and the number of |
| dimensions. For commonly used types the class offers some |
| predefined constants. For example, the method signature of the |
| <tt>main</tt> method as shown in |
| <a href="#2.5 Type information">section 2.5</a> is represented by: |
| </p> |
| |
| <source> |
| Type return_type = Type.VOID; |
| Type[] arg_types = new Type[] { new ArrayType(Type.STRING, 1) }; |
| </source> |
| |
| <p> |
| <tt>Type</tt> also contains methods to convert types into textual |
| signatures and vice versa. The sub-classes contain implementations |
| of the routines and constraints specified by the Java Language |
| Specification. |
| </p> |
| </section> |
| |
| <section name="3.3.2 Generic fields and methods"> |
| <p> |
| Fields are represented by <tt>FieldGen</tt> objects, which may be |
| freely modified by the user. If they have the access rights |
| <tt>static final</tt>, i.e., are constants and of basic type, they |
| may optionally have an initializing value. |
| </p> |
| |
| <p> |
| Generic methods contain methods to add exceptions the method may |
| throw, local variables, and exception handlers. The latter two are |
| represented by user-configurable objects as well. Because |
| exception handlers and local variables contain references to byte |
| code addresses, they also take the role of an <em>instruction |
| targeter</em> in our terminology. Instruction targeters contain a |
| method <tt>updateTarget()</tt> to redirect a reference. This is |
| somewhat related to the Observer design pattern. Generic |
| (non-abstract) methods refer to <em>instruction lists</em> that |
| consist of instruction objects. References to byte code addresses |
| are implemented by handles to instruction objects. If the list is |
| updated the instruction targeters will be informed about it. This |
| is explained in more detail in the following sections. |
| </p> |
| |
| <p> |
| The maximum stack size needed by the method and the maximum number |
| of local variables used may be set manually or computed |
| automatically via the <tt>setMaxStack()</tt> and |
| <tt>setMaxLocals()</tt> methods. |
| </p> |
| |
| </section> |
| |
| <section name="3.3.3 Instructions"> |
| <p> |
| Modeling instructions as objects may look somewhat odd at first |
| sight, but in fact enables programmers to obtain a high-level view |
| upon control flow without handling details like concrete byte code |
| offsets. Instructions consist of an opcode (sometimes called |
| tag), their length in bytes and an offset (or index) within the |
| byte code. Since many instructions are immutable (stack operators, |
| e.g.), the <tt>InstructionConstants</tt> interface offers |
| shareable predefined "fly-weight" constants to use. |
| </p> |
| |
| <p> |
| Instructions are grouped via sub-classing; the type hierarchy of |
| instruction classes is illustrated by an (incomplete) figure in the |
| appendix. The most important family of instructions are the |
| <em>branch instructions</em>, e.g., <tt>goto</tt>, that branch to |
| targets somewhere within the byte code. Obviously, this makes them |
| candidates for playing an <tt>InstructionTargeter</tt> role, |
| too. Instructions are further grouped by the interfaces they |
| implement; there are, e.g., <tt>TypedInstruction</tt>s that are |
| associated with a specific type like <tt>ldc</tt>, or |
| <tt>ExceptionThrower</tt> instructions that may raise exceptions |
| when executed. |
| </p> |
| |
| <p> |
| All instructions can be traversed via <tt>accept(Visitor v)</tt> |
| methods, i.e., the Visitor design pattern. There is, however, a |
| special trick in these methods that allows the handling of certain |
| instruction groups to be merged. The <tt>accept()</tt> methods do |
| not only call the corresponding <tt>visit()</tt> method, but call |
| the <tt>visit()</tt> methods of their respective super classes and |
| implemented interfaces first, i.e., the most specific |
| <tt>visit()</tt> call is last. Thus one can group the handling of, |
| say, all <tt>BranchInstruction</tt>s into one single method. |
| </p> |
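| <p> |
| The call order described above can be illustrated with a small |
| stand-alone sketch. It does not use the library; the class names |
| merely mirror the terminology of this section, and |
| <tt>VisitorOrderSketch</tt>, <tt>countBranches()</tt>, and |
| <tt>callOrder()</tt> are hypothetical helpers introduced for the |
| example. |
| </p> |

```java
import java.util.ArrayList;
import java.util.List;

final class VisitorOrderSketch {
    /** Visitor with empty defaults, so clients override only what they need. */
    interface Visitor {
        default void visitBranchInstruction(BranchInstruction i) {}
        default void visitGoto(Goto i) {}
        default void visitIfEq(IfEq i) {}
        default void visitNop(Nop i) {}
    }

    abstract static class Instruction {
        abstract void accept(Visitor v);
    }

    abstract static class BranchInstruction extends Instruction {}

    static final class Goto extends BranchInstruction {
        @Override void accept(Visitor v) {
            v.visitBranchInstruction(this); // group-level call first
            v.visitGoto(this);              // most specific call last
        }
    }

    static final class IfEq extends BranchInstruction {
        @Override void accept(Visitor v) {
            v.visitBranchInstruction(this);
            v.visitIfEq(this);
        }
    }

    static final class Nop extends Instruction {
        @Override void accept(Visitor v) {
            v.visitNop(this);
        }
    }

    /** Handles every branch instruction in one single method. */
    static int countBranches(Instruction... code) {
        int[] count = {0};
        Visitor counter = new Visitor() {
            @Override public void visitBranchInstruction(BranchInstruction i) {
                count[0]++;
            }
        };
        for (Instruction i : code) {
            i.accept(counter);
        }
        return count[0];
    }

    /** Records the order in which the visit methods are invoked. */
    static List<String> callOrder(Instruction instruction) {
        List<String> calls = new ArrayList<>();
        instruction.accept(new Visitor() {
            @Override public void visitBranchInstruction(BranchInstruction i) {
                calls.add("visitBranchInstruction");
            }
            @Override public void visitGoto(Goto i) {
                calls.add("visitGoto");
            }
        });
        return calls;
    }
}
```

| <p> |
| A visitor that overrides only <tt>visitBranchInstruction()</tt> is |
| thus notified for every <tt>Goto</tt> and <tt>IfEq</tt> without |
| having to know the individual sub-classes. |
| </p> |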
| |
| <p> |
| For debugging purposes it may even make sense to "invent" your own |
| instructions. In a sophisticated code generator like the one used |
| as a backend of the <a href="http://barat.sourceforge.net">Barat |
| framework</a> for static analysis, one often has to insert |
| temporary <tt>nop</tt> (No operation) instructions. When examining |
| the produced code it may be very difficult to trace back where the |
| <tt>nop</tt> was actually inserted. One could think of a derived |
| <tt>nop2</tt> instruction that contains additional debugging |
| information. When the instruction list is dumped to byte code, the |
| extra data is simply dropped. |
| </p> |
| |
| <p> |
| One could also think of new byte code instructions operating on |
| complex numbers that are replaced by normal byte code upon |
| load-time or are recognized by a new JVM. |
| </p> |
| |
| </section> |
| |
| <section name="3.3.4 Instruction lists"> |
| <p> |
| An <em>instruction list</em> is implemented by a list of |
| <em>instruction handles</em> encapsulating instruction objects. |
| References to instructions in the list are thus not implemented by |
| direct pointers to instructions but by pointers to instruction |
| <em>handles</em>. This makes appending, inserting and deleting |
| areas of code very simple and also allows us to reuse immutable |
| instruction objects (fly-weight objects). Since we use symbolic |
| references, computation of concrete byte code offsets does not |
| need to occur until finalization, i.e., until the user has |
| finished the process of generating or transforming code. We will |
| use the terms instruction handle and instruction synonymously |
| throughout the rest of the paper. Instruction handles may contain |
| additional user-defined data using the <tt>addAttribute()</tt> |
| method. |
| </p> |
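| <p> |
| Why symbolic references make editing simple can be shown with a |
| small stand-alone sketch. It is not the library's |
| <tt>InstructionList</tt>; the class <tt>HandleListSketch</tt> and |
| its methods are assumptions made for this example. Branches refer |
| to handles, and concrete offsets are computed only when the list is |
| dumped. |
| </p> |

```java
import java.util.ArrayList;
import java.util.List;

final class HandleListSketch {
    /** A handle wraps an opcode; branches point at handles, not at offsets. */
    static final class Handle {
        final String opcode;
        final int length;   // instruction length in bytes
        Handle target;      // symbolic branch target, null for non-branches
        int offset = -1;    // unknown until the list is finalized

        Handle(String opcode, int length) {
            this.opcode = opcode;
            this.length = length;
        }
    }

    private final List<Handle> handles = new ArrayList<>();

    Handle append(String opcode, int length) {
        Handle h = new Handle(opcode, length);
        handles.add(h);
        return h;
    }

    Handle insertBefore(Handle position, String opcode, int length) {
        int index = handles.indexOf(position);
        if (index < 0) {
            throw new IllegalArgumentException("Handle is not in this list");
        }
        Handle h = new Handle(opcode, length);
        handles.add(index, h);
        return h;
    }

    /** Finalization: assign offsets first, then resolve branch targets. */
    List<String> dump() {
        int offset = 0;
        for (Handle h : handles) {
            h.offset = offset;
            offset += h.length;
        }
        List<String> lines = new ArrayList<>();
        for (Handle h : handles) {
            String line = h.offset + ": " + h.opcode;
            if (h.target != null) {
                line += " #" + h.target.offset;
            }
            lines.add(line);
        }
        return lines;
    }
}
```

| <p> |
| Inserting a <tt>nop</tt> before the target of a <tt>goto</tt> needs |
| no fix-up of the branch: the target handle simply receives a later |
| offset when <tt>dump()</tt> is called. |
| </p> |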
| |
| <p> |
| <b>Appending:</b> One can append instructions or other instruction |
| lists anywhere to an existing list. The instructions are appended |
| after the given instruction handle. All append methods return a |
| new instruction handle which may then be used as the target of a |
| branch instruction, e.g.: |
| </p> |
| |
| <source> |
| InstructionList il = new InstructionList(); |
| ... |
| GOTO g = new GOTO(null); |
| il.append(g); |
| ... |
| // Use immutable fly-weight object |
| InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL); |
| g.setTarget(ih); |
| </source> |
| |
| <p> |
| <b>Inserting:</b> Instructions may be inserted anywhere into an |
| existing list. They are inserted before the given instruction |
| handle. All insert methods return a new instruction handle which |
| may then be used as the start address of an exception handler, for |
| example. |
| </p> |
| |
| <source> |
| InstructionHandle start = il.insert(insertion_point, |
| InstructionConstants.NOP); |
| ... |
| mg.addExceptionHandler(start, end, handler, "java.io.IOException"); |
| </source> |
| |
| <p> |
| <b>Deleting:</b> Deletion of instructions is also very |
| straightforward; all instruction handles and the contained |
| instructions within a given range are removed from the instruction |
| list and disposed. The <tt>delete()</tt> method may however throw |
| a <tt>TargetLostException</tt> when there are instruction |
| targeters still referencing one of the deleted instructions. The |
| user is forced to handle such exceptions in a <tt>try-catch</tt> |
| clause and redirect these references elsewhere. The <em>peep |
| hole</em> optimizer described in the appendix gives a detailed |
| example for this. |
| </p> |
| |
| <source> |
| try { |
| il.delete(first, last); |
| } catch(TargetLostException e) { |
| InstructionHandle[] targets = e.getTargets(); |
| for(int i=0; i < targets.length; i++) { |
| InstructionTargeter[] targeters = targets[i].getTargeters(); |
| for(int j=0; j < targeters.length; j++) |
| targeters[j].updateTarget(targets[i], new_target); |
| } |
| } |
| </source> |
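| <p> |
| The interplay between deletion and instruction targeters can also |
| be sketched independently of the library. The following class is a |
| simplified model, not the actual implementation; <tt>Node</tt>, |
| <tt>Branch</tt>, and <tt>TargetLost</tt> are hypothetical stand-ins |
| for instruction handles, targeters, and |
| <tt>TargetLostException</tt>. |
| </p> |

```java
import java.util.ArrayList;
import java.util.List;

final class RetargetSketch {
    /** Stand-in for an instruction handle that knows who refers to it. */
    static final class Node {
        final String opcode;
        final List<Branch> targeters = new ArrayList<>();

        Node(String opcode) {
            this.opcode = opcode;
        }
    }

    /** Stand-in for an instruction targeter such as a branch. */
    static final class Branch {
        Node target;

        Branch(Node target) {
            setTarget(target);
        }

        void setTarget(Node newTarget) {
            if (target != null) {
                target.targeters.remove(this);
            }
            target = newTarget;
            if (newTarget != null) {
                newTarget.targeters.add(this);
            }
        }

        void updateTarget(Node oldTarget, Node newTarget) {
            if (target == oldTarget) {
                setTarget(newTarget);
            }
        }
    }

    /** Thrown when deleted nodes are still referenced. */
    static final class TargetLost extends Exception {
        final List<Node> targets;

        TargetLost(List<Node> targets) {
            super("Deleted instructions are still targeted");
            this.targets = targets;
        }
    }

    /** Removes list[from..to] (inclusive) and reports lost targets. */
    static void delete(List<Node> list, int from, int to) throws TargetLost {
        List<Node> removed = new ArrayList<>(list.subList(from, to + 1));
        list.subList(from, to + 1).clear();
        List<Node> lost = new ArrayList<>();
        for (Node n : removed) {
            if (!n.targeters.isEmpty()) {
                lost.add(n);
            }
        }
        if (!lost.isEmpty()) {
            throw new TargetLost(lost);
        }
    }
}
```

| <p> |
| A caller catches <tt>TargetLost</tt> and redirects every targeter of |
| a lost node to a surviving one. It should iterate over a copy of |
| the targeter list, because <tt>updateTarget()</tt> modifies that |
| list. |
| </p> |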
| |
| <p> |
| <b>Finalizing:</b> When the instruction list is ready to be dumped |
| to pure byte code, all symbolic references must be mapped to real |
| byte code offsets. This is done by the <tt>getByteCode()</tt> |
| method which is called by default by |
| <tt>MethodGen.getMethod()</tt>. Afterwards you should call |
| <tt>dispose()</tt> so that the instruction handles can be reused |
| internally. This helps to improve memory usage. |
| </p> |
| |
| <source> |
| InstructionList il = new InstructionList(); |
| |
| ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object", |
| "<generated>", ACC_PUBLIC | ACC_SUPER, |
| null); |
| ConstantPoolGen cp = cg.getConstantPool(); // constant pool used below |
| MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC, |
| Type.VOID, new Type[] { |
| new ArrayType(Type.STRING, 1) |
| }, new String[] { "argv" }, |
| "main", "HelloWorld", il, cp); |
| ... |
| cg.addMethod(mg.getMethod()); |
| il.dispose(); // Reuse instruction handles of list |
| </source> |
| |
| </section> |
| |
| <section name="3.3.5 Code example revisited"> |
| <p> |
| Using instruction lists gives us a generic view upon the code: In |
| <a href="#Figure 5">Figure 5</a> we again present the code chunk |
| of the <tt>readInt()</tt> method of the factorial example in section |
| <a href="#2.6 Code example">2.6</a>: The local variables |
| <tt>n</tt> and <tt>e1</tt> both hold two references to |
| instructions, defining their scope. There are two <tt>goto</tt>s |
| branching to the <tt>iload</tt> at the end of the method. One of |
| the exception handlers is displayed, too: it references the start |
| and the end of the <tt>try</tt> block and also the exception |
| handler code. |
| </p> |
| |
| <p align="center"> |
| <a name="Figure 5"> |
| <img src="images/il.gif"/> |
| <br/> |
| Figure 5: Instruction list for <tt>readInt()</tt> method</a> |
| </p> |
| |
| </section> |
| |
| <section name="3.3.6 Instruction factories"> |
| <p> |
| To simplify the creation of certain instructions the user can use |
| the supplied <tt>InstructionFactory</tt> class which offers a lot |
| of useful methods to create instructions from |
| scratch. Alternatively, he can also use <em>compound |
| instructions</em>: When producing byte code, some patterns |
| typically occur very frequently, for instance the compilation of |
| arithmetic or comparison expressions. You certainly do not want |
| to rewrite the code that translates such expressions into byte |
| code in every place they may appear. In order to support this, the |
| <font face="helvetica,arial">BCEL</font> API includes a <em>compound |
| instruction</em> (an interface with a single |
| <tt>getInstructionList()</tt> method). Instances of classes that |
| implement this interface may be used in any place where normal |
| instructions would occur, particularly in append operations. |
| </p> |
| |
| <p> |
| <b>Example: Pushing constants</b> Pushing constants onto the |
| operand stack may be coded in different ways. As explained in <a |
| href="#2.2 Byte code instruction set">section 2.2</a> there are |
| some "short-cut" instructions that can be used to make the |
| produced byte code more compact. The smallest instruction to push |
| a single <tt>1</tt> onto the stack is <tt>iconst_1</tt>; other |
| possibilities are <tt>bipush</tt> (can be used to push values |
| between -128 and 127), <tt>sipush</tt> (between -32768 and 32767), |
| or <tt>ldc</tt> (load constant from constant pool). |
| </p> |
| |
| <p> |
| Instead of repeatedly selecting the most compact instruction in, |
| say, a switch, one can use the compound <tt>PUSH</tt> instruction |
| whenever pushing a constant number or string. It will produce the |
| appropriate byte code instruction and insert entries into the |
| constant pool if necessary. |
| </p> |
| |
| <source> |
| InstructionFactory f = new InstructionFactory(class_gen); |
| InstructionList il = new InstructionList(); |
| ... |
| il.append(new PUSH(cp, "Hello, world")); |
| il.append(new PUSH(cp, 4711)); |
| ... |
| il.append(f.createPrintln("Hello World")); |
| ... |
| il.append(f.createReturn(type)); |
| </source> |
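| <p> |
| The selection that a compound push instruction has to perform for |
| integer constants can be sketched as follows. This is an |
| illustration of the value ranges given above, not the code of |
| <tt>PUSH</tt>; the class <tt>PushSketch</tt> and its method |
| <tt>push()</tt> are assumptions, and the result is a mnemonic |
| string rather than an instruction object. |
| </p> |

```java
final class PushSketch {
    /** Returns the most compact mnemonic for pushing an int constant. */
    static String push(int value) {
        if (value >= -1 && value <= 5) {
            // One-byte short-cut instructions iconst_m1 .. iconst_5.
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;   // -128 .. 127
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;   // -32768 .. 32767
        }
        // Everything else needs an entry in the constant pool.
        return "ldc " + value;
    }

    public static void main(String[] args) {
        System.out.println(push(1));      // prints iconst_1
        System.out.println(push(4711));   // prints sipush 4711
    }
}
```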
| |
| </section> |
| |
| <section name="3.3.7 Code patterns using regular expressions"> |
| <p> |
| When transforming code, for instance during optimization or when |
| inserting analysis method calls, one typically searches for |
| certain patterns of code at which to perform the transformation. To |
| simplify handling such situations <font |
| face="helvetica,arial">BCEL </font>introduces a special feature: |
| One can search for given code patterns within an instruction list |
| using <em>regular expressions</em>. In such expressions, |
| instructions are represented by their opcode names, e.g., |
| <tt>LDC</tt>; one may also use their respective super classes, e.g., |
| "<tt>IfInstruction</tt>". Meta characters like <tt>+</tt>, |
| <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus, |
| the expression |
| </p> |
| |
| <source>"NOP+(ILOAD|ALOAD)*"</source> |
| |
| <p> |
| represents a piece of code consisting of at least one <tt>NOP</tt> |
| followed by a possibly empty sequence of <tt>ILOAD</tt> and |
| <tt>ALOAD</tt> instructions. |
| </p> |
| |
| <p> |
| The <tt>search()</tt> method of class |
| <tt>org.apache.bcel.util.InstructionFinder</tt> gets a regular |
| expression and a starting point as arguments and returns an |
| iterator describing the area of matched instructions. Additional |
| constraints on the matching area of instructions, which cannot be |
| implemented via regular expressions, may be expressed via <em>code |
| constraint</em> objects. |
| </p> |
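| <p> |
| One way to realize such a search is to map every opcode name to a |
| single character and to hand the resulting string to an ordinary |
| regular expression engine. The sketch below follows this idea with |
| <tt>java.util.regex</tt>; it is an assumption about a possible |
| design, not a description of <tt>InstructionFinder</tt>. The class |
| <tt>OpcodePatternSketch</tt>, its tiny opcode alphabet, and its |
| methods are hypothetical, and super class names such as |
| <tt>IfInstruction</tt> are not supported. |
| </p> |

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OpcodePatternSketch {
    // Tiny opcode alphabet for the example; each name maps to one character.
    private static final List<String> OPCODES =
        Arrays.asList("NOP", "ILOAD", "ALOAD", "ISTORE", "GOTO", "RETURN");

    private static char symbol(String name) {
        int index = OPCODES.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown opcode: " + name);
        }
        return (char) ('a' + index);
    }

    /** Rewrites e.g. "NOP+(ILOAD|ALOAD)*" into the character regex "a+(b|c)*". */
    static Pattern compile(String pattern) {
        Matcher names = Pattern.compile("[A-Z][A-Z0-9_]*").matcher(pattern);
        StringBuilder regex = new StringBuilder();
        int last = 0;
        while (names.find()) {
            regex.append(pattern, last, names.start());
            regex.append(symbol(names.group()));
            last = names.end();
        }
        regex.append(pattern.substring(last));
        // Blanks only separate opcode names, so they are dropped afterwards.
        return Pattern.compile(regex.toString().replaceAll("\\s+", ""));
    }

    /** Returns {start, end} index pairs (end exclusive) of all non-empty matches. */
    static List<int[]> search(List<String> code, String pattern) {
        StringBuilder encoded = new StringBuilder();
        for (String name : code) {
            encoded.append(symbol(name));
        }
        List<int[]> matches = new ArrayList<>();
        Matcher m = compile(pattern).matcher(encoded);
        while (m.find()) {
            if (m.end() > m.start()) {
                matches.add(new int[] { m.start(), m.end() });
            }
        }
        return matches;
    }
}
```

| <p> |
| Because one character corresponds to one instruction, the match |
| positions reported by the regular expression engine are directly |
| usable as indices into the instruction sequence. |
| </p> |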
| |
| </section> |
| |
| <section name="3.3.8 Example: Optimizing boolean expressions"> |
| <p> |
| In Java, boolean values are mapped to 1 and to 0, |
| respectively. Thus, the simplest way to evaluate boolean |
| expressions is to push a 1 or a 0 onto the operand stack depending |
| on the truth value of the expression. But this way, the |
| subsequent combination of boolean expressions (with |
| <tt>&&</tt>, e.g.) yields long chunks of code that push |
| lots of 1s and 0s onto the stack. |
| </p> |
| |
| <p> |
| When the code has been finalized these chunks can be optimized |
| with a <em>peep hole</em> algorithm: An <tt>IfInstruction</tt> |
| (e.g. the comparison of two integers: <tt>if_icmpeq</tt>) that |
| either produces a 1 or a 0 on the stack and is followed by an |
| <tt>ifne</tt> instruction (branch if stack value is not 0) may be |
| replaced by the <tt>IfInstruction</tt> with its branch target |
| replaced by the target of the <tt>ifne</tt> instruction: |
| </p> |
| |
| <source> |
| CodeConstraint constraint = new CodeConstraint() { |
| public boolean checkCode(InstructionHandle[] match) { |
| IfInstruction if1 = (IfInstruction)match[0].getInstruction(); |
| GOTO g = (GOTO)match[2].getInstruction(); |
| return (if1.getTarget() == match[3]) && |
| (g.getTarget() == match[4]); |
| } |
| }; |
| |
| InstructionFinder f = new InstructionFinder(il); |
| String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)"; |
| |
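| for(Iterator it = f.search(pat, constraint); it.hasNext(); ) { |
| InstructionHandle[] match = (InstructionHandle[])it.next(); |
| ... |
| // Redirect the first branch to the target of the trailing IFEQ/IFNE |
| BranchInstruction first = (BranchInstruction)match[0].getInstruction(); |
| BranchInstruction last = (BranchInstruction)match[5].getInstruction(); |
| first.setTarget(last.getTarget()); |
| ... |
| try { |
| il.delete(match[1], match[5]); |
| } catch(TargetLostException e) { ... } |
| } |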
| </source> |
| |
| <p> |
| The applied code constraint object ensures that the matched code |
| really corresponds to the targeted expression pattern. Subsequent |
| application of this algorithm removes all unnecessary stack |
| operations and branch instructions from the byte code. If any of |
| the deleted instructions is still referenced by an |
| <tt>InstructionTargeter</tt> object, the reference has to be |
| updated in the <tt>catch</tt>-clause. |
| </p> |
| |
| <p> |
| <b>Example application:</b> |
| The expression: |
| </p> |
| |
| <source> |
| if((a == null) || (i < 2)) |
| System.out.println("Ooops"); |
| </source> |
| |
| <p> |
| can be mapped to both of the chunks of byte code shown in <a |
| href="#Figure 6">figure 6</a>. The left column represents the |
| unoptimized code while the right column displays the same code |
| after the peep hole algorithm has been applied: |
| </p> |
| |
| <p align="center"><a name="Figure 6"> |
| <table> |
| <tr> |
| <td valign="top"><pre> |
| 5: aload_0 |
| 6: ifnull #13 |
| 9: iconst_0 |
| 10: goto #14 |
| 13: iconst_1 |
| 14: nop |
| 15: ifne #36 |
| 18: iload_1 |
| 19: iconst_2 |
| 20: if_icmplt #27 |
| 23: iconst_0 |
| 24: goto #28 |
| 27: iconst_1 |
| 28: nop |
| 29: ifne #36 |
| 32: iconst_0 |
| 33: goto #37 |
| 36: iconst_1 |
| 37: nop |
| 38: ifeq #52 |
| 41: getstatic System.out |
| 44: ldc "Ooops" |
| 46: invokevirtual println |
| 52: return |
| </pre></td> |
| <td valign="top"><pre> |
| 10: aload_0 |
| 11: ifnull #19 |
| 14: iload_1 |
| 15: iconst_2 |
| 16: if_icmpge #27 |
| 19: getstatic System.out |
| 22: ldc "Ooops" |
| 24: invokevirtual println |
| 27: return |
| </pre></td> |
| </tr> |
| </table> |
| </a> |
| </p> |
| |
| </section> |
| |
| <section name="4 Application areas"> |
| <p> |
| There are many possible application areas for <font |
| face="helvetica,arial">BCEL</font> ranging from class |
| browsers, profilers, byte code optimizers, and compilers to |
| sophisticated run-time analysis tools and extensions to the Java |
| language. |
| </p> |
| |
| <p> |
| Compilers like the <a |
| href="http://barat.sourceforge.net">Barat</a> compiler use <font |
| face="helvetica,arial">BCEL</font> to implement a byte code |
| generating back end. Other possible application areas are the |
| static analysis of byte code or examining the run-time behavior of |
| classes by inserting calls to profiling methods into the |
| code. Further examples are extending Java with Eiffel-like |
| assertions, automated delegation, or with the concepts of <a |
| href="http://aspectj.org">Aspect-Oriented Programming</a>.<br/> A |
| list of projects using <font face="helvetica,arial">BCEL</font> can |
| be found <a href="projects.html">here</a>. |
| </p> |
| |
| </section> |
| |
| <section name="4.1 Class loaders"> |
| <p> |
| Class loaders are responsible for loading class files from the |
| file system or other resources and passing the byte code to the |
| Virtual Machine. A custom <tt>ClassLoader</tt> object may be used |
| to intercept the standard procedure of loading a class, i.e., the |
| system class loader, and perform some transformations before |
| actually passing the byte code to the JVM. |
| </p> |
| |
| <p> |
| A possible scenario is described in <a href="#Figure 7">figure |
| 7</a>: |
| During run-time the Virtual Machine requests a custom class loader |
| to load a given class. But before the JVM actually sees the byte |
| code, the class loader makes a "side-step" and performs some |
| transformation to the class. To make sure that the modified byte |
| code is still valid and does not violate any of the JVM's rules it |
| is checked by the verifier before the JVM finally executes it. |
| </p> |
| |
| <p align="center"> |
| <a name="Figure 7"> |
| <img src="images/classloader.gif"/> |
| <br/> |
| Figure 7: Class loaders |
| </a> |
| </p> |
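| <p> |
| The "side-step" shown in the figure can be sketched with the |
| standard <tt>java.lang.ClassLoader</tt> API alone. The class below |
| is a minimal illustration under stated assumptions: the name |
| <tt>TransformingLoaderSketch</tt>, the package-prefix filter, and |
| the <tt>UnaryOperator</tt> hook that stands for the byte code |
| transformation are all hypothetical and not part of the library. |
| </p> |

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.UnaryOperator;

final class TransformingLoaderSketch extends ClassLoader {
    private final String prefix;                    // classes to intercept
    private final UnaryOperator<byte[]> transformer; // byte code rewrite hook
    private int transformed = 0;

    TransformingLoaderSketch(ClassLoader parent, String prefix,
                             UnaryOperator<byte[]> transformer) {
        super(parent);
        if (parent == null || prefix.isEmpty()) {
            throw new IllegalArgumentException("Need a parent and a non-empty prefix");
        }
        this.prefix = prefix;
        this.transformer = transformer;
    }

    int transformedCount() {
        return transformed;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            if (!name.startsWith(prefix)) {
                // Everything else takes the standard delegation path.
                return super.loadClass(name, resolve);
            }
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                // Side-step: transform the bytes before the JVM sees them.
                byte[] modified = transformer.apply(readClassFile(name));
                transformed++;
                c = defineClass(name, modified, 0, modified.length);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private byte[] readClassFile(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n; (n = in.read(buffer)) != -1; ) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

| <p> |
| The bytes returned by the hook are passed to |
| <tt>defineClass()</tt>, so the JVM's verifier still checks the |
| transformed class before it is executed. |
| </p> |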
| |
| <p> |
| Using class loaders is an elegant way of extending the Java |
| Virtual Machine with new features without actually modifying it. |
| This concept enables developers to use <em>load-time |
| reflection</em> to implement their ideas as opposed to the static |
| reflection supported by the <a |
| href="http://java.sun.com/j2se/1.3/docs/guide/reflection/index.html">Java |
| Reflection API</a>. Load-time transformations supply the user with |
| a new level of abstraction. He is not strictly tied to the static |
| constraints of the original authors of the classes but may |
| customize the applications with third-party code in order to |
| benefit from new features. Such transformations may be executed on |
| demand and neither interfere with other users, nor alter the |
| original byte code. In fact, class loaders may even create classes |
| <em>ad hoc</em> without loading a file at all.<br/> <font |
| face="helvetica,arial">BCEL</font> already has built-in support for |
| dynamically creating classes; an example is the <a |
| href="../examples/ProxyCreator.java">ProxyCreator</a> class. |
| </p> |
| |
| </section> |
| |
| <section name="4.1.1 Example: Poor Man's Genericity"> |
| <p> |
| The <a href="http://www.inf.fu-berlin.de/~bokowski/pmgjava/">"Poor |
| Man's Genericity"</a> project that extends Java with parameterized |
| classes, for example, uses <font |
| face="helvetica,arial">BCEL</font> in two places to generate |
| instances of parameterized classes: During compile-time (with the |
| standard <tt>javac</tt> with some slightly changed classes) and at |
| run-time using a custom class loader. The compiler puts some |
| additional type information into class files (attributes) which is |
| evaluated at load-time by the class loader. The class loader |
| performs some transformations on the loaded class and passes it |
| to the VM. The following algorithm illustrates how the load method |
| of the class loader fulfills the request for a parameterized |
| class, e.g., <tt>Stack<String></tt>: |
| </p> |
| |
| <p> |
| <ol type="1"> |
| <li> Search for class <tt>Stack</tt>, load it, and check for a |
| certain class attribute containing additional type |
| information. This attribute defines the "real" name of the |
| class, i.e., <tt>Stack<A></tt>.</li> |
| |
| <li>Replace all occurrences of and references to the formal type |
| <tt>A</tt> with references to the actual type <tt>String</tt>. For |
| example, the method |
| |
| <source> |
| void push(A obj) { ... } |
| </source> |
| |
| <p> |
| becomes |
| </p> |
| |
| <source> |
| void push(String obj) { ... } |
| </source> |
| </li> |
| |
| <li> Return the resulting class to the Virtual Machine.</li> |
| </ol> |
| </p> |
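| <p> |
| At the class file level, the replacement step operates on type |
| signatures rather than on source text. A minimal sketch of this |
| substitution is given below; it is an illustration only and not |
| the code of the "Poor Man's Genericity" project. The class |
| <tt>FormalTypeSketch</tt> and its method <tt>substitute()</tt> are |
| assumptions, as is the encoding of the formal type as a class |
| named <tt>A</tt>. |
| </p> |

```java
final class FormalTypeSketch {
    /**
     * Replaces every object type named 'formal' in a signature by 'actual'.
     * Both names use the internal form with '/' separators.
     */
    static String substitute(String signature, String formal, String actual) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < signature.length()) {
            char c = signature.charAt(i);
            if (c == 'L') {
                // Object types are written as L<class name>;
                int end = signature.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("Malformed signature: " + signature);
                }
                String name = signature.substring(i + 1, end);
                out.append('L').append(name.equals(formal) ? actual : name).append(';');
                i = end + 1;
            } else {
                // Primitive type codes, '[' and parentheses are copied as-is.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // void push(A obj) becomes void push(String obj):
        // prints (Ljava/lang/String;)V
        System.out.println(substitute("(LA;)V", "A", "java/lang/String"));
    }
}
```

| <p> |
| Parsing the signature instead of replacing substrings avoids |
| touching unrelated classes whose names merely end in <tt>A</tt>. |
| </p> |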
| |
| </section> |
| |
| <section name="A Appendix"> |
| </section> |
| |
| <section name="HelloWorldBuilder"> |
| <p> |
| The following program reads a name from the standard input and |
| prints a friendly "Hello". Since the <tt>readLine()</tt> method may |
| throw an <tt>IOException</tt> it is enclosed by a <tt>try-catch</tt> |
| clause. |
| </p> |
| |
| <source> |
| import java.io.*; |
| |
| public class HelloWorld { |
| public static void main(String[] argv) { |
| BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); |
| String name = null; |
| |
| try { |
| System.out.print("Please enter your name> "); |
| name = in.readLine(); |
| } catch(IOException e) { return; } |
| |
| System.out.println("Hello, " + name); |
| } |
| } |
| </source> |
| |
| <p> |
| We will sketch here how the above Java class can be created from |
| scratch using the <font face="helvetica,arial">BCEL</font> API. For |
| ease of reading we will use textual signatures and not create them |
| dynamically. For example, the signature |
| </p> |
| |
| <source>"(Ljava/lang/String;)Ljava/lang/StringBuffer;"</source> |
| |
| <p> |
| can actually be created with |
| </p> |
| |
| <source>Type.getMethodSignature(Type.STRINGBUFFER, new Type[] { Type.STRING });</source> |
| |
| <p><b>Initialization:</b> |
| First we create an empty class and an instruction list: |
| </p> |
| |
| <source> |
| ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object", |
| "<generated>", ACC_PUBLIC | ACC_SUPER, |
| null); |
| ConstantPoolGen cp = cg.getConstantPool(); // cg creates constant pool |
| InstructionList il = new InstructionList(); |
| </source> |
| |
| <p> |
| We then create the main method, supplying the method's name and the |
| symbolic type signature encoded with <tt>Type</tt> objects. |
| </p> |
| |
| <source> |
| MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC, // access flags |
| Type.VOID, // return type |
| new Type[] { // argument types |
| new ArrayType(Type.STRING, 1) }, |
| new String[] { "argv" }, // arg names |
| "main", "HelloWorld", // method, class |
| il, cp); |
| InstructionFactory factory = new InstructionFactory(cg); |
| </source> |
| |
| <p> |
| We now define some often used types: |
| </p> |
| |
| <source> |
| ObjectType i_stream = new ObjectType("java.io.InputStream"); |
| ObjectType p_stream = new ObjectType("java.io.PrintStream"); |
| </source> |
| |
| <p><b>Create variables <tt>in</tt> and <tt>name</tt>:</b> We call |
| the constructors, i.e., execute |
| <tt>new BufferedReader(new InputStreamReader(System.in))</tt>. The reference |
| to the <tt>BufferedReader</tt> object stays on top of the stack and |
| is stored in the newly allocated <tt>in</tt> variable. |
| </p> |
| |
| <source> |
| il.append(factory.createNew("java.io.BufferedReader")); |
| il.append(InstructionConstants.DUP); // Use predefined constant |
| il.append(factory.createNew("java.io.InputStreamReader")); |
| il.append(InstructionConstants.DUP); |
| il.append(factory.createFieldAccess("java.lang.System", "in", i_stream, |
| Constants.GETSTATIC)); |
| il.append(factory.createInvoke("java.io.InputStreamReader", "<init>", |
| Type.VOID, new Type[] { i_stream }, |
| Constants.INVOKESPECIAL)); |
| il.append(factory.createInvoke("java.io.BufferedReader", "<init>", Type.VOID, |
| new Type[] {new ObjectType("java.io.Reader")}, |
| Constants.INVOKESPECIAL)); |
| |
| LocalVariableGen lg = mg.addLocalVariable("in", |
| new ObjectType("java.io.BufferedReader"), null, null); |
| int in = lg.getIndex(); |
| lg.setStart(il.append(new ASTORE(in))); // "in" valid from here |
| </source> |
| |
| <p> |
| Create local variable <tt>name</tt> and initialize it to <tt>null</tt>. |
| </p> |
| |
| <source> |
| lg = mg.addLocalVariable("name", Type.STRING, null, null); |
| int name = lg.getIndex(); |
| il.append(InstructionConstants.ACONST_NULL); |
| lg.setStart(il.append(new ASTORE(name))); // "name" valid from here |
| </source> |
| |
| <p><b>Create try-catch block:</b> We remember the start of the |
| block, read a line from the standard input and store it into the |
| variable <tt>name</tt>. |
| </p> |
| |
| <source> |
| InstructionHandle try_start = |
| il.append(factory.createFieldAccess("java.lang.System", "out", p_stream, |
| Constants.GETSTATIC)); |
| |
| il.append(new PUSH(cp, "Please enter your name> ")); |
| il.append(factory.createInvoke("java.io.PrintStream", "print", Type.VOID, |
| new Type[] { Type.STRING }, |
| Constants.INVOKEVIRTUAL)); |
| il.append(new ALOAD(in)); |
| il.append(factory.createInvoke("java.io.BufferedReader", "readLine", |
| Type.STRING, Type.NO_ARGS, |
| Constants.INVOKEVIRTUAL)); |
| il.append(new ASTORE(name)); |
| </source> |
| |
| <p> |
| Upon normal execution we jump past the exception handler; the target |
| address is not known yet. |
| </p> |
| |
| <source> |
| GOTO g = new GOTO(null); |
| InstructionHandle try_end = il.append(g); |
| </source> |
| |
| <p> |
| We add the exception handler which simply returns from the method. |
| </p> |
| |
| <source> |
| InstructionHandle handler = il.append(InstructionConstants.RETURN); |
| mg.addExceptionHandler(try_start, try_end, handler, "java.io.IOException"); |
| </source> |
| |
| <p> |
| "Normal" code continues; now we can set the branch target of the <tt>GOTO</tt>. |
| </p> |
| |
| <source> |
| InstructionHandle ih = |
| il.append(factory.createFieldAccess("java.lang.System", "out", p_stream, |
| Constants.GETSTATIC)); |
| g.setTarget(ih); |
| </source> |
| |
| <p><b>Printing "Hello":</b> |
| String concatenation compiles to <tt>StringBuffer</tt> operations. |
| </p> |
| |
| <source> |
| il.append(factory.createNew(Type.STRINGBUFFER)); |
| il.append(InstructionConstants.DUP); |
| il.append(new PUSH(cp, "Hello, ")); |
| il.append(factory.createInvoke("java.lang.StringBuffer", "<init>", |
| Type.VOID, new Type[] { Type.STRING }, |
| Constants.INVOKESPECIAL)); |
| il.append(new ALOAD(name)); |
| il.append(factory.createInvoke("java.lang.StringBuffer", "append", |
| Type.STRINGBUFFER, new Type[] { Type.STRING }, |
| Constants.INVOKEVIRTUAL)); |
| il.append(factory.createInvoke("java.lang.StringBuffer", "toString", |
| Type.STRING, Type.NO_ARGS, |
| Constants.INVOKEVIRTUAL)); |
| |
| il.append(factory.createInvoke("java.io.PrintStream", "println", |
| Type.VOID, new Type[] { Type.STRING }, |
| Constants.INVOKEVIRTUAL)); |
| il.append(InstructionConstants.RETURN); |
| </source> |
| |
| |
| <p><b>Finalization:</b> Finally, we have to set the stack size, |
| which normally would have to be computed on the fly, and add a |
| default constructor method to the class, which is empty in this |
| case. |
| </p> |
| |
| <source> |
| mg.setMaxStack(); |
| cg.addMethod(mg.getMethod()); |
| il.dispose(); // Allow instruction handles to be reused |
| cg.addEmptyConstructor(ACC_PUBLIC); |
| </source> |
| |
| <p> |
| Last but not least we dump the <tt>JavaClass</tt> object to a file. |
| </p> |
| |
| <source> |
| try { |
| cg.getJavaClass().dump("HelloWorld.class"); |
| } catch(java.io.IOException e) { System.err.println(e); } |
| </source> |
| |
| </section> |
| |
| <section name="Peephole optimizer"> |
| <p> |
| This class implements a simple peephole optimizer that removes any NOP |
| instructions from the given class. |
| </p> |
| |
| <source> |
| import java.io.*; |
| |
| import java.util.Iterator; |
| import org.apache.bcel.classfile.*; |
| import org.apache.bcel.generic.*; |
| import org.apache.bcel.Repository; |
| import org.apache.bcel.util.InstructionFinder; |
| |
| public class Peephole { |
| public static void main(String[] argv) { |
| try { |
| /* Load the class from CLASSPATH. |
| */ |
| JavaClass clazz = Repository.lookupClass(argv[0]); |
| Method[] methods = clazz.getMethods(); |
| ConstantPoolGen cp = new ConstantPoolGen(clazz.getConstantPool()); |
| |
| for(int i=0; i < methods.length; i++) { |
| if(!(methods[i].isAbstract() || methods[i].isNative())) { |
| MethodGen mg = new MethodGen(methods[i], |
| clazz.getClassName(), cp); |
| Method stripped = removeNOPs(mg); |
| |
| if(stripped != null) // Any NOPs stripped? |
| methods[i] = stripped; // Overwrite with stripped method |
| } |
| } |
| |
| /* Dump the class to "class name"_.class |
| */ |
| clazz.setConstantPool(cp.getFinalConstantPool()); |
| clazz.dump(clazz.getClassName() + "_.class"); |
| } catch(Exception e) { e.printStackTrace(); } |
| } |
| |
| private static final Method removeNOPs(MethodGen mg) { |
| InstructionList il = mg.getInstructionList(); |
| InstructionFinder f = new InstructionFinder(il); |
| String pat = "NOP+"; // Find at least one NOP |
| InstructionHandle next = null; |
| int count = 0; |
| |
| for(Iterator it = f.search(pat); it.hasNext(); ) { |
| InstructionHandle[] match = (InstructionHandle[])it.next(); |
| InstructionHandle first = match[0]; |
| InstructionHandle last = match[match.length - 1]; |
| |
| /* Some nasty Java compilers may add NOP at end of method. |
| */ |
| if((next = last.getNext()) == null) |
| break; |
| |
| count += match.length; |
| |
| /* Delete NOPs and redirect any references to them to the following |
| * (non-nop) instruction. |
| */ |
| try { |
| il.delete(first, last); |
| } catch(TargetLostException e) { |
| InstructionHandle[] targets = e.getTargets(); |
| for(int i=0; i < targets.length; i++) { |
| InstructionTargeter[] targeters = targets[i].getTargeters(); |
| |
| for(int j=0; j < targeters.length; j++) |
| targeters[j].updateTarget(targets[i], next); |
| } |
| } |
| } |
| |
| Method m = null; |
| |
| if(count > 0) { |
| System.out.println("Removed " + count + " NOP instructions from method " + |
| mg.getName()); |
| m = mg.getMethod(); |
| } |
| |
| il.dispose(); // Reuse instruction handles |
| return m; |
| } |
| } |
| </source> |
| </section> |
| <section name="Constant pool UML diagram"> |
| |
| <p align="center"> |
| <a name="Figure 8"> |
| <img src="images/constantpool.gif"/> |
| <br/> |
| Figure 8: UML diagram for constant pool classes |
| </a> |
| </p> |
| </section> |
| </body> |
| </document> |