What are the advantages of compact strings over compressed strings in JDK9?
Compressed strings (Java 6) and compact strings (Java 9) both have the same motivation (strings are often effectively Latin-1, so half the space is wasted) and goal (make those strings small), but the implementations differ a lot.
In an interview, Aleksey Shipilëv (who was in charge of implementing the Java 9 feature) had this to say about compressed strings:
"UseCompressedStrings feature was rather conservative: while distinguishing between char[] and byte[] case, and trying to compress the char[] into byte[] on String construction, it done most String operations on char[], which required to unpack the String. Therefore, it benefited only a special type of workloads, where most strings are compressible (so compression does not go to waste), and only a limited amount of known String operations are performed on them (so no unpacking is needed). In great many workloads, enabling -XX:+UseCompressedStrings was a pessimization. [...] UseCompressedStrings implementation was basically an optional feature that maintained a completely distinct String implementation in alt-rt.jar, which was loaded once the VM option is supplied. Optional features are harder to test, since they double the number of option combinations to try."
In Java 9, on the other hand, compact strings are fully integrated into the JDK source. String is always backed by a byte[], in which every character takes one byte if the whole string fits into Latin-1 and two bytes otherwise. Most operations check which of the two is the case, e.g. charAt:
    public char charAt(int index) {
        if (isLatin1()) {
            return StringLatin1.charAt(value, index);
        } else {
            return StringUTF16.charAt(value, index);
        }
    }
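To make the idea behind that dispatch more concrete, here is a minimal sketch of a byte[]-backed string with a per-string coder flag. The class CompactText and all of its members are hypothetical names made up for illustration; this is not the JDK's actual code, and it assumes a fixed big-endian layout for two-byte characters, whereas the real implementation handles byte order itself.

```java
/** Hypothetical, simplified model of a compact string; not the JDK implementation. */
final class CompactText {
    private static final byte LATIN1 = 0;
    private static final byte UTF16 = 1;

    private final byte[] value; // one or two bytes per char, depending on coder
    private final byte coder;   // chosen once for the whole string

    CompactText(String s) {
        // The string is compressible only if every char fits into one byte.
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                value[i] = (byte) s.charAt(i);
            }
            coder = LATIN1;
        } else {
            // Assumption for this sketch: high byte first (big-endian).
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >> 8);
                value[2 * i + 1] = (byte) c;
            }
            coder = UTF16;
        }
    }

    boolean isLatin1() {
        return coder == LATIN1;
    }

    int byteLength() {
        return value.length;
    }

    char charAt(int index) {
        if (isLatin1()) {
            // One byte per char; mask to avoid sign extension.
            return (char) (value[index] & 0xFF);
        }
        // Two bytes per char; reassemble them into a single char.
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }
}
```

In this sketch, a five-character Latin-1 string such as "hello" occupies 5 bytes, while a five-character string containing even one character outside Latin-1 occupies 10, because the coder applies to the string as a whole.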
Compact strings are enabled by default and can be partially disabled with -XX:-CompactStrings. "Partially" because strings are still backed by a byte[], and operations returning chars must still put them together from two separate bytes (due to intrinsics it is hard to say whether this has a performance impact).
If you're interested in more background on compact strings, I recommend reading the interview I linked to above and/or watching this great talk by the same Aleksey Shipilëv (which also explains the new string concatenation).