A comparison of Java and C++

This document is from a project done during my third year of Computer Science at the University of the Witwatersrand, in South Africa.

Introduction:
Java is designed to be simple, object-oriented and similar to C++, while removing the unnecessary complexities of C++. It is also said to be a robust, architecturally neutral, portable, interpreted, threaded, dynamic and high-performance language. Java enables the development of robust applications on multiple platforms in heterogeneous, distributed networks. C++ is not truly portable once compiled, nor is it well suited to heterogeneous, distributed networks. While C++ excels at high performance, its powerful features and complexities are often the source of errors. This paper critically evaluates various aspects of the two languages, considering each aspect first in C++ and then in Java.

Inheritance model:
C++: Implements the multiple inheritance model, where a class can inherit from one or more superclasses. While this may be considered a powerful feature, many find it complicated and confusing, and it can create problems for the programmer.
Java: Implements a single inheritance model, where a class can inherit from only one superclass. To provide the desirable features of multiple inheritance, Java provides interfaces. A class can implement multiple interfaces. An interface only declares methods and does not implement them, so the class itself must implement the methods of each interface. Interfaces thus provide a reasonable alternative to multiple inheritance.
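The relationship between one superclass and several interfaces can be sketched as follows. This is a minimal illustration rather than code from the original project; the names InterfaceDemo, Animal, Duck, Swimmer and Flyer are hypothetical and chosen only for the example.

```java
// Sketch only: all class and interface names here are hypothetical.
class InterfaceDemo {
    // An interface only declares methods; it carries no implementation.
    interface Swimmer { String swim(); }
    interface Flyer { String fly(); }

    // The single superclass supplies shared, implemented behaviour.
    static class Animal {
        String name() { return "animal"; }
    }

    // Exactly one superclass, but any number of interfaces.
    static class Duck extends Animal implements Swimmer, Flyer {
        @Override String name() { return "duck"; }
        @Override public String swim() { return name() + " swims"; }
        @Override public String fly() { return name() + " flies"; }
    }

    public static void main(String[] args) {
        Duck duck = new Duck();
        Swimmer s = duck;   // the same object viewed through each interface
        Flyer f = duck;
        System.out.println(s.swim());
        System.out.println(f.fly());
    }
}
```

Running main prints "duck swims" followed by "duck flies". Duck must supply a body for every method the two interfaces declare, otherwise the compiler rejects it.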

Module system and linking regime:
C++: Modules in C++ usually consist of a .H header file and a .CPP source code file. The header file contains the declaration or interface of classes, functions, unions, structures, etc. while the .CPP source code file contains the implementation of the classes, functions, etc. When a module is compiled a binary object file is generated with the same file name as the .CPP source code file, but with an .OBJ file extension. Once all modules of the application are compiled into binary object files, they are linked to form the executable. The binary object files do not need to be distributed to the end user with the executable. Binary object files are, however, often distributed to programmers with the corresponding header file. In this way the programmer can access the classes, functions, etc. in the module without having access to the full source code. While there are no rules as to what should and should not go into a certain module, it makes sense to group classes, functions, etc., which relate to each other in some way, into the same module.
Java: Packages form the module system in Java. Packages consist of classes and interfaces, with no separate header and source files as in C++. It is important that classes and interfaces which relate to each other are in the same package, because the default "friendly" members are accessible only to classes within the same package. As in C++, the primary use of the module system is to break large applications into smaller, more manageable units. Packages, however, do not get linked to form one executable or one large bytecode file. Instead, classes are loaded and linked on the fly as needed from a variety of sources, even across networks. The language and run-time system are thus dynamic in their linking stages. The dynamically linked code is verified before it is interpreted and executed.
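Run-time loading can be seen on a small scale with Class.forName, which asks the run-time system to locate and link a class by name. The sketch below assumes Java 7 or later and a standard run-time in which java.util.ArrayList is available; DynamicLoadingDemo, loadAndCreate and the class name no.such.Clazz are hypothetical.

```java
// Sketch only: DynamicLoadingDemo and loadAndCreate are hypothetical names.
class DynamicLoadingDemo {
    // Loads a class by name at run-time and returns a new instance,
    // or null if the class cannot be found or instantiated.
    static Object loadAndCreate(String className) {
        try {
            Class<?> cls = Class.forName(className);          // located and linked on demand
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Object list = loadAndCreate("java.util.ArrayList");
        System.out.println(list.getClass().getName());
        System.out.println(loadAndCreate("no.such.Clazz"));
    }
}
```

Nothing in the compiled DynamicLoadingDemo class refers to ArrayList directly; the name is only resolved when the program runs.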

Program processing:
C++: Programs written in C++ are compiled into machine code specific to a particular hardware architecture and operating system. The result is an executable binary which runs directly on the hardware. The advantage of compilation is a program which can run much faster, in some cases more than one hundred times faster, than a program which is interpreted. The main disadvantage is that C++ programs are not portable once compiled. Although the same C++ program can be compiled for various platforms, this often requires considerable porting effort.
Java: Portability and architecture neutrality are probably the two main attractions of Java. To achieve them, Java programs are both compiled and interpreted. Java source code is compiled to bytecode, stored in .class files which are organised into packages. This bytecode can then be run on any system on which the Java virtual machine (JVM), which consists of the Java interpreter and run-time system, has been implemented. The JVM interprets and executes the Java bytecode. The main advantage of this approach over interpreting source code directly is that bytecode executed by the JVM can approach machine code performance.

Security:
- memory management
C++: The programmer must explicitly allocate and deallocate memory, and must keep track of which memory can be freed and when. This burden of memory management is often the source of bugs, crashes, memory leaks and poor performance.
Java: The memory management burden is removed by the automatic garbage collector. While Java has a "new" operator to allocate memory for objects, the programmer doesn't have to worry about releasing the memory. The garbage collector does so automatically when no references to the object exist. The garbage collector runs in a low-priority thread, so that when the process is idle it releases objects no longer in use and gathers and compacts unused memory, so that more memory will be available when needed. The result is better performance, easier programming and more robust applications.
- pointers
C++: The use of pointers is common and provides a great deal of flexibility. Unfortunately, due to their complex nature, pointers are also the source of many bugs, such as the dangling pointer.
Java: There are no pointers in Java. Any task which would require pointers in C++ can be replaced by the use of objects and arrays of objects in Java. Array indexing in Java is checked by the Java run-time system to ensure indices are within the bounds of the array. The real memory addresses of objects can only be accessed by the Java interpreter at run-time. The programmer can't forge pointers to memory because the memory allocation and referencing model is controlled entirely by the Java virtual machine. The result is more reliable and secure applications.
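The run-time bounds check described above can be observed directly. In this minimal sketch the names BoundsCheckDemo and readOrDefault are hypothetical; the only library behaviour relied on is that an out-of-range index raises ArrayIndexOutOfBoundsException.

```java
// Sketch only: BoundsCheckDemo and readOrDefault are hypothetical names.
class BoundsCheckDemo {
    // Returns data[index], or fallback when the run-time system rejects the index.
    static int readOrDefault(int[] data, int index, int fallback) {
        try {
            return data[index];                     // index is checked on every access
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback;                        // no memory outside the array is read
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(readOrDefault(data, 1, -1));   // valid index
        System.out.println(readOrDefault(data, 3, -1));   // one past the end
    }
}
```

Running main prints 20 and then -1. An equivalent out-of-range read through a C++ pointer would not be detected by the language.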
- linking
C++: There are no security issues to deal with when linking C++ binary object code because the code is linked from a known source at compile-time and not from possible unknown sources at run-time, as is the case with Java.
Java: Java bytecode which is linked at run-time can come from local systems and/or across the network from a host far away. Because the source of packages is possibly unknown the Java run-time system does not trust the incoming code. The bytecode is put through a simple theorem prover, called the bytecode verifier, to ensure that the code does not forge pointers, violate access restrictions or access objects as something they are not. Thus the bytecode verifier ensures that the code is safe to be executed by the interpreter. In this way the dynamic linking process of the Java virtual machine is secure.

Reference semantics:
C++: In C++, one has a choice of whether or not to use reference semantics. If one does not use reference semantics, the variable is the object; memory is automatically allocated when the variable is declared and deallocated when the variable goes out of scope. The memory for the object is allocated on the run-time stack. This method is efficient but does not support dynamic binding and polymorphism. If one uses reference semantics, the variable is a pointer to the object and memory must be explicitly allocated and deallocated. The memory for the object is allocated on the heap. Dynamic allocation is therefore more flexible but has a higher overhead.
Java: Java uses only reference semantics for objects. Since there are no pointers in Java, all references to an object are through symbolic "handles", and memory must be explicitly allocated using the "new" operator. The garbage collector deallocates the memory when no more references to the object exist.
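A short sketch may make the consequence of reference semantics concrete: assigning one object variable to another copies the reference, not the object. The names ReferenceDemo, Counter and increment are hypothetical.

```java
// Sketch only: ReferenceDemo, Counter and increment are hypothetical names.
class ReferenceDemo {
    static class Counter { int value; }

    // The parameter is a copy of the reference, so the caller's object is modified.
    static void increment(Counter c) { c.value++; }

    public static void main(String[] args) {
        Counter a = new Counter();    // memory allocated explicitly with "new"
        Counter b = a;                // copies the reference, not the object
        increment(b);
        System.out.println(a.value);  // the change is visible through both variables
        System.out.println(a == b);   // both variables refer to one object
    }
}
```

Running main prints 1 and then true. No call releases the Counter; it becomes eligible for garbage collection once neither variable refers to it.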

Concurrency:
C++: Programs written in C++ are typically single-threaded, meaning that only one thing happens at a time. The language does not have built-in support for threads. Although there are libraries to simulate multithreading, using them is usually very difficult. Also, given any library function, there is no way of ensuring that its implementation allows it to be executed safely by multiple concurrent threads of execution.
Java: Clearly, for multithreading to be viable, it must be implemented at the language level. Java supports threads at the syntactic level (for example, through the "synchronized" keyword), in its run-time system and through thread objects. Java's multithreading capability provides the means to build applications with many concurrent threads of activity, which can result in a high degree of user interactivity. Although multithreading is built into the language, one should still take care to implement thread-safe classes and methods, as the Java run-time libraries do.
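The combination of thread objects and language-level locking can be sketched as below. The example assumes Java 8 or later (for the lambda expression); ThreadDemo, SafeCounter and runWorkers are hypothetical names.

```java
// Sketch only: ThreadDemo, SafeCounter and runWorkers are hypothetical names.
class ThreadDemo {
    // A thread-safe class: "synchronized" lets only one thread at a time
    // execute these methods on a given SafeCounter.
    static class SafeCounter {
        private int count;
        synchronized void increment() { count++; }
        synchronized int get() { return count; }
    }

    // Starts the given number of threads, each incrementing a shared counter,
    // waits for all of them to finish and returns the final count.
    static int runWorkers(int threads, int incrementsPerThread) {
        SafeCounter counter = new SafeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();                            // wait for this worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt status
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000));
    }
}
```

With the synchronized methods in place, four threads of 1000 increments each yield 4000. Without "synchronized" on increment, updates from different threads could be lost, which is the kind of care the paragraph above refers to.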

Compile-time and run-time checking:
C++: C++ is strongly typed and its compile-time checking is thorough, but it performs comparatively little checking at run-time. Due to aspects of C++ such as pointers and automatic coercions, it is very difficult for the compiler to pick up problems relating to these constructs. If an error occurs at run-time, such as the use of an invalid pointer, a fault is typically raised by the hardware or operating system and the program is most likely to be terminated.
Java: The Java compiler also employs strict and extensive compile-time checking, to eliminate syntax- and type-related errors before the bytecode is distributed. Java is strongly typed, and the compiler does not allow the automatic coercions of C++ which can lose information; it instead requires an explicit cast. The run-time checking is equally extensive and repeats many of the type checks done by the compiler. Since Java has no pointers and fewer automatic coercions, many of the problems associated with such constructs are removed. If run-time errors do occur, an exception is raised and handled, as in C++, by an exception handler.
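Both kinds of check can be illustrated in a few lines: the compiler insists on an explicit cast where information may be lost, and the run-time system re-checks casts between object types. CastDemo, truncate and castFails are hypothetical names used only for this sketch.

```java
// Sketch only: CastDemo, truncate and castFails are hypothetical names.
class CastDemo {
    // Narrowing from double to int must be requested with an explicit cast;
    // writing "return d;" here would be rejected by the compiler.
    static int truncate(double d) {
        return (int) d;
    }

    // Returns true if the run-time type check rejects a cast to Integer.
    static boolean castFails(Object o) {
        try {
            Integer n = (Integer) o;     // checked again at run-time
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long wide = 42;                  // widening int to long is allowed implicitly
        System.out.println(wide);
        System.out.println(truncate(3.9));
        System.out.println(castFails("text"));
    }
}
```

Running main prints 42, 3 and true. The failed cast surfaces as a ClassCastException that an exception handler can deal with, rather than as a misinterpreted block of memory.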

Access control:
C++: A member of a class can be:
i) private: the member can be used only by member functions and friends of the class in which it is declared.
ii) protected: the member can be used only by member functions and friends of the class in which it is declared and by member functions and friends of classes derived from this class.
iii) public: the member can be used by any function.
Members of a class defined with the keyword "class" are private by default. Members of a class defined with the keywords "struct" or "union" are public by default.
The keyword "friend" denotes a function or class which is a friend of a class. This friend has full access rights to the private and protected members of the class. Many feel that friends violate the principle of data hiding.
Java: A member of a class can be private, protected or public, much as in C++, with two exceptions: there is no such thing as friends of a class in Java, and protected members are also accessible to other classes in the same package. Members of a class are "friendly" (which has nothing to do with friends in C++) by default, meaning that they are accessible to all classes within the same package but inaccessible to classes outside the package.
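The four access levels can be declared side by side as in the sketch below. Because access violations are rejected at compile-time, the example uses the standard reflection classes to report each field's declared level instead. AccessDemo, Account, its field names and accessOf are hypothetical.

```java
import java.lang.reflect.Modifier;

// Sketch only: AccessDemo, Account, its fields and accessOf are hypothetical names.
class AccessDemo {
    static class Account {
        private int balance;    // visible only inside Account
        protected int limit;    // subclasses, plus classes in the same package
        public int id;          // visible everywhere
        int branch;             // no keyword: the default "friendly" (package) access
    }

    // Reports the declared access level of one of Account's fields.
    static String accessOf(String fieldName) {
        try {
            int m = Account.class.getDeclaredField(fieldName).getModifiers();
            if (Modifier.isPrivate(m)) return "private";
            if (Modifier.isProtected(m)) return "protected";
            if (Modifier.isPublic(m)) return "public";
            return "package";
        } catch (NoSuchFieldException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"balance", "limit", "id", "branch"}) {
            System.out.println(name + ": " + accessOf(name));
        }
    }
}
```

The field with no keyword is reported as "package", which corresponds to the "friendly" default described above.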


This document was last modified 2 May 1998.