September 6, 2002
Q: I've received quite a collection of string questions in my JavaWorld Java Q&A mailbox. Here is just a sampling:
"If I create two strings the following way:
String string1 = "hello world"; String string2 = new String("hello world");
string1 and string2 will have the same hash code. Does that mean that they are actually the same object in the JVM?"
"In Java, if I create a
String object, I can compare it to other String objects using the equals() method. However, if I initialize a String like this:
String str = "hello";
and another like this:
String s = "hello";
then I can compare them using the == operator. Why?"
"If I code this:
a = "hello"; b = "hello"; c = new String("hello"); d = "hello";
then a and b both refer to the same String object in memory, whereas c refers to a separate String object that exists concurrently with the first String object. Therefore, two "hello" objects exist in memory. Which of the two does d refer to? How is it decided?"
Let's examine each question in turn.
Question 1: Are they the same object?
You are correct: both objects will have the same hash code. As stated in the Javadocs, the string's hash code is computed according to the following formula:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
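As a minimal sketch of that formula, the class below evaluates it by hand and compares the result with the value String.hashCode() returns. The class name StringHashSketch and the method manualHash are hypothetical names chosen for illustration; they are not part of the JDK.

```java
final class StringHashSketch {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Horner's rule: multiplying the running total by 31 raises the
            // power of 31 applied to every earlier character by one.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String text = "hello world";
        System.out.println(manualHash(text) == text.hashCode()); // prints true
        System.out.println(manualHash(""));                      // prints 0
    }
}
```

Because the result depends only on the characters, any two strings with the same character sequence produce the same value, no matter how each object was created.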
Since both strings have the same character sequence, their hashCode() methods compute the same value. That being said, string1 and string2 do not point to the same object. They point to different objects! The statement String string1 = "hello world"; results in the allocation of one String object for "hello world". Explicitly calling String string2 = new String("hello world"); forces the creation of a second String object in memory.
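The difference can be checked directly. The following is a small sketch, not a definitive test; the class name SameHashDifferentObject and the helper sameObject are hypothetical names used only for illustration.

```java
final class SameHashDifferentObject {
    // Reference comparison: true only when both variables point to one object.
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String string1 = "hello world";
        String string2 = new String("hello world");

        // Same characters, so the hash codes and the contents match.
        System.out.println(string1.hashCode() == string2.hashCode()); // prints true
        System.out.println(string1.equals(string2));                  // prints true

        // new String(...) always allocates a fresh object.
        System.out.println(sameObject(string1, string2));             // prints false
    }
}
```

Equal hash codes say nothing about object identity; they only follow from equal contents.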
String allocation, like all object allocation, proves costly in both time and memory. To cut down the number of String objects created in the JVM, the String class keeps a pool of strings. Each time you create a string literal, the pool is checked. If the string already exists in the pool, a reference to the pooled instance is returned. If the string does not exist in the pool, a new String object is instantiated and then placed in the pool. Java can make this optimization since strings are immutable and can be shared without fear of data corruption.
Unfortunately, creating a string through new defeats this pooling mechanism by creating multiple String objects, even if an equal string already exists in the pool. Considering all that, avoid new String unless you specifically know that you need it!
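The column does not mention it, but the JDK exposes the pool through String.intern(), which returns the pooled instance for a given character sequence. The sketch below uses it to show that a string created with new sits outside the pool while an equal pooled instance exists alongside it. The class name InternSketch and the helper pooled are hypothetical.

```java
final class InternSketch {
    // Returns the pooled (canonical) instance with the same characters as s.
    static String pooled(String s) {
        return s.intern();
    }

    public static void main(String[] args) {
        String literal = "hello world";             // taken from the pool
        String copy = new String("hello world");    // separate object outside the pool

        System.out.println(copy == literal);         // prints false
        System.out.println(pooled(copy) == literal); // prints true
    }
}
```

This is meant only to make the pool visible; it is not a recommendation to call intern() routinely.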
Question 2: Why can we use equals() and == on strings?
The answer to Question 2 directly relates to the answer to Question 1. Because string literals are pooled, the following code causes the JVM to create just one String object:
String str = "hello"; String s = "hello";
Thus, str and s refer to the same object. Therefore, == returns the correct result. However, relying on == for string equality checking is unsafe. Let's say someone writes String tricky = new String("hello"). In that case, the JVM will not check whether a "hello" string object already exists. Instead, the JVM will blindly allocate another string in memory. The new call forces the JVM to create a new object. If you evaluate tricky == s, the expression will return false, since tricky and s are not references to the same object.
Use == to check whether two references refer to the same object. Use equals() to check whether the contents of two objects are equal. Do not assume that == will always work for testing strings!
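The same trap appears without an explicit new String whenever a string is assembled at run time. Here is a minimal sketch, assuming a Java version that includes StringBuilder (Java 5 or later, so newer than this column); buildAtRuntime is a hypothetical helper, not a JDK method.

```java
final class EqualsVersusIdentity {
    // Hypothetical helper: assembles a string at run time instead of using a literal.
    static String buildAtRuntime(String prefix, String suffix) {
        return new StringBuilder(prefix).append(suffix).toString();
    }

    public static void main(String[] args) {
        String s = "hello";                         // pooled literal
        String built = buildAtRuntime("hel", "lo"); // new object with the same characters

        System.out.println(built == s);      // prints false: different objects
        System.out.println(built.equals(s)); // prints true: same contents
    }
}
```

Strings read from files, user input, or the network behave like built here, which is why equals() is the safe default for content comparison.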
Question 3: How is it decided?
As for Question 3, in the case of d = "hello", d points to the same object as both a and b. Before the JVM creates a string literal, the JVM checks the string literal pool first. Since "hello" already exists in that pool, d will simply be set to point to that pooled instance.
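The reader's four variables can be checked in a few lines. This is only a sketch; the class name LiteralPoolAnswer and the method identityChecks are hypothetical names chosen for illustration.

```java
final class LiteralPoolAnswer {
    // Returns { a == b, a == d, c == d } for the four variables in the question.
    static boolean[] identityChecks() {
        String a = "hello";
        String b = "hello";
        String c = new String("hello");
        String d = "hello";
        return new boolean[] { a == b, a == d, c == d };
    }

    public static void main(String[] args) {
        boolean[] checks = identityChecks();
        System.out.println(checks[0]); // prints true:  a and b share the pooled object
        System.out.println(checks[1]); // prints true:  d shares it as well
        System.out.println(checks[2]); // prints false: c is the separate object
    }
}
```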
String wrap up
All three questions are related because the JVM performs some trickery while instantiating string literals to increase performance and decrease memory overhead. The JVM can reuse and share string references because strings are immutable and, therefore, by definition thread-safe. That's a nice design, although it is one you must be aware of and treat accordingly.
Learn more about this topic
- For more on strings, read these JavaWorld articles:
- "Java Tip 112: Improve Tokenization of Information-Rich Strings," Bhabani Padhi (June 2001)
- "A Boolean Wrapped with String," Tony Sintes (November 2001)
- "StringBuffer Versus String," Reggie Hutcherson (March 2000)
- The Class String Javadoc from java.sun.com