Tuesday 29 October 2013

Using the super keyword and subtypes in Java

Understanding super types and subtypes.

Subtypes and Subclasses

Subtypes

We have used closed arrows in module dependence diagrams and object models. In MDDs, a closed arrow from an implementation ArrayList to a specification List means that ArrayList “meets” the specification of List. In an object model, a closed arrow denotes a subset relationship between the two sets.

We can also draw a closed arrow from A to B in an MDD, where A and B are implementation parts.
Here, the specifications of A and B are not drawn, but are implied in the MDD. A and B are modules with objects and associated methods. We say that A is a B if every A object is also a B object. For instance, every automobile is a vehicle, and every bicycle is a vehicle, and every pogo stick is a vehicle; every vehicle is a mode of transport, as is every pack animal.

This subset relationship is a necessary but not sufficient condition for a subtyping relationship. Type A is a subtype of type B when A’s specification implies B’s specification. That is, any object (or class) that satisfies A’s specification also satisfies B’s specification, because B’s specification is weaker.

The definition of subtyping depends on the definition of strong versus weak specifications. In 6.170, we will define true subtyping to mean that anywhere in the code, if you expect a B object, an A object is acceptable. Code written to work with B objects (and to depend on their properties) is guaranteed to continue to work if it is supplied A objects instead; furthermore, the behavior will be the same, if we only consider the aspects of A’s behavior that are also included in B’s behavior. (A may introduce new behaviors that B does not have, but it may only change existing B behaviors in certain ways; see below.)

With no hubris whatsoever, we call the 6.170 notion of subtyping true subtyping, to distinguish it from Java subtypes, which correspond to a weaker notion.

Example: Bicycles

Suppose we have a class for representing bicycles. Here is a partial implementation:

class Bicycle {
private int framesize;
private int chainringGears;
private int freewheelGears;
...
// returns the number of gears on this bicycle
public int gears() { return chainringGears * freewheelGears; }
// returns the cost of this bicycle
public float cost() { ... }
// returns the sales tax owed on this bicycle
public float salesTax() { return cost() * .0825f; }
// effects: transports the rider from work to home
public void goHome() { ... }
...
}
A new class representing bicycles with headlamps can accommodate late nights (or early mornings).
class LightedBicycle {
private int framesize;
private int chainringGears;
private int freewheelGears;
private BatteryType battery;
...
// returns the number of gears on this bicycle
public int gears() { return chainringGears * freewheelGears; }
// returns the cost of this bicycle
public float cost() { ... }
// returns the sales tax owed on this bicycle
public float salesTax() { return cost() * .0825f; }
// effects: transports the rider from work to home
public void goHome() { ... }
// effects: replaces the existing battery with the argument b
public void changeBattery(BatteryType b) { ... }
...
}

Copying all the code is tiresome and error-prone. (The error might be failure to copy correctly or failure to make a required change.) Additionally, if a bug is found in one version, it is easy to forget to propagate the fix to all versions of the code. Finally, it is very hard to comprehend the distinction between the two classes by looking for the differences themselves in a mass of similarities.

Java and other programming languages use subclassing to overcome these difficulties. Subclassing permits reuse of implementations and overriding of methods. A better implementation of LightedBicycle is

class LightedBicycle extends Bicycle {
private BatteryType battery;
...
// returns the cost of this bicycle
public float cost() { return super.cost() + battery.cost(); }
// effects: transports the rider from work to home
public void goHome() { ... }
// effects: replaces the existing battery with the argument b
public void changeBattery(BatteryType b) { ... }
...
}

LightedBicycle need not implement methods and fields that appear in its superclass Bicycle; the Bicycle versions are automatically used by Java when they are not overridden in the subclass.
Consider the following implementations of the goHome method (along with more complete specifications).

If these are the only changes, are LightedBicycle and RacingBicycle subtypes of Bicycle? (For the time being we will talk about subtyping; we’ll return to the differences between Java subclassing, Java subtyping, and true subtyping later.)

class Bicycle {
...
// requires: windspeed < 20mph && daylight
// effects: transports the rider from work to home
void goHome() { ... }
}

class LightedBicycle {
...
// requires: windspeed < 20mph
// effects: transports the rider from work to home
void goHome() { ... }
}

class RacingBicycle {
...
// requires: windspeed < 20mph && daylight
// effects: transports the rider from work to home
// in an elapsed time of < 10 minutes
// && gets the rider sweaty
void goHome() { ... }
}

To answer that question, recall the definition of subtyping: can an object of the subtype be substituted anywhere that code expects an object of the supertype? If so, the subtyping relationship is valid.
In this case, both LightedBicycle and RacingBicycle are subtypes of Bicycle. In the first case, the requirement is relaxed; in the second case, the effects are strengthened in a way that still satisfies the superclass’s effects.
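The "requires less" argument for LightedBicycle can be checked mechanically in a small sketch. Java has no built-in notion of a requires clause, so the precondition is modeled here as a hypothetical boolean method, acceptsConditions, and the helper requiresNoMore and the tested wind-speed range are likewise assumptions made only for illustration.

```java
class GoHomePreconditionDemo {
    static class Bicycle {
        // requires: windspeed < 20mph && daylight
        boolean acceptsConditions(int windspeedMph, boolean daylight) {
            return windspeedMph < 20 && daylight;
        }
    }

    static class LightedBicycle extends Bicycle {
        // requires: windspeed < 20mph (daylight is no longer needed)
        @Override
        boolean acceptsConditions(int windspeedMph, boolean daylight) {
            return windspeedMph < 20;
        }
    }

    // True if every situation allowed by sup's precondition is also allowed by sub's.
    static boolean requiresNoMore(Bicycle sub, Bicycle sup) {
        for (int wind = 0; wind <= 40; wind++) {
            for (boolean daylight : new boolean[] {true, false}) {
                if (sup.acceptsConditions(wind, daylight)
                        && !sub.acceptsConditions(wind, daylight)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // LightedBicycle can stand in for Bicycle, but not the other way round.
        System.out.println(requiresNoMore(new LightedBicycle(), new Bicycle()));
        System.out.println(requiresNoMore(new Bicycle(), new LightedBicycle()));
    }
}
```

The asymmetry is the point: a caller that satisfies Bicycle's precondition always satisfies LightedBicycle's, while the reverse fails at night.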

The cost method of LightedBicycle shows another capability of subclassing in Java. Methods can be overridden to provide a new implementation in a subclass. This enables more code reuse; in particular, LightedBicycle can reuse Bicycle’s salesTax method. When salesTax is invoked on a LightedBicycle, the Bicycle version is used instead. Then, the call to cost inside salesTax invokes the version based on the runtime type of the object (LightedBicycle), so the LightedBicycle version is used. Regardless of the declared type of an object, an implementation of a method with multiple implementations (of the same signature) is always selected based on the run-time type.

In fact, there is no way for an external client to invoke the version of a method specified by the declared type or any other type that is not the run-time type. 
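A minimal sketch of this run-time selection follows. The prices are made-up values, and a plain float field stands in for the battery.cost() call from the notes so that the example is self-contained.

```java
class DispatchDemo {
    static class Bicycle {
        // assumed flat price, for illustration only
        public float cost() { return 400.0f; }

        // cost() is resolved against the run-time type of this
        public float salesTax() { return cost() * 0.0825f; }
    }

    static class LightedBicycle extends Bicycle {
        // assumed battery price, standing in for battery.cost()
        private final float batteryCost = 20.0f;

        @Override
        public float cost() { return super.cost() + batteryCost; }
    }

    public static void main(String[] args) {
        // The declared type is Bicycle, the run-time type is LightedBicycle.
        Bicycle declaredAsBicycle = new LightedBicycle();
        System.out.println(declaredAsBicycle.cost());
        System.out.println(declaredAsBicycle.salesTax());
    }
}
```

Even though salesTax is inherited unchanged from Bicycle, the tax is computed on the LightedBicycle cost, because the inner call to cost() dispatches on the run-time type.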

This is an important and very desirable property of Java (and other object-oriented languages). Suppose that the subclass maintains some extra fields which are kept in sync with fields of the superclass. If superclass methods could be invoked directly, possibly modifying superclass fields without also updating subclass fields, then the representation invariant of the subclass would be broken.

A subclass may invoke methods of its parent via use of super, however. Sometimes this is useful when the subclass method needs to do just a little bit more work; recall the LightedBicycle implementation of cost:

class LightedBicycle extends Bicycle {
// returns the cost of this bicycle
public float cost()
{
return super.cost() + battery.cost(); 
}
}

Suppose the Rider class models people who ride bicycles. In the absence of subclassing and subtypes, the module dependence diagram would look something like this:

[Diagram: Rider has a dependence arrow to each of Bicycle, LightedBicycle, RacingBicycle, and PennyFarthing.]
The code for Rider would also need to test which type of object it had been passed, which would be ugly, verbose, and error-prone.

With subtyping, the MDD dependences look like this:
[Diagram: Rider has a single dependence arrow, to Bicycle; LightedBicycle, RacingBicycle, and PennyFarthing are not referenced by Rider.]
The many dependences have been reduced to a single one.

When subtype arrows are added, the diagram is only a bit more complicated:
[Diagram: Rider has a dependence arrow to Bicycle; LightedBicycle, RacingBicycle, and PennyFarthing each have a subtype arrow to Bicycle.]
Even though there are just as many arrows, this diagram is much simpler than the original one: dependence edges complicate designs and implementations more than other types of edge.

Substitution principle

The substitution principle is the theoretical underpinning of subtypes; it provides a precise definition of when two types are subtypes. Informally, it states that subtypes must be substitutable for supertypes. This guarantees that if code depends on (any aspect of) a supertype, but an object of a subtype is substituted, system behavior will not be affected. (The Java compiler also requires that the extends or implements clause names the parent in order for subtypes to be used in place of supertypes.)
The methods of a subtype must hold certain relationships to the methods of the supertype, and the subtype must guarantee that any properties of the supertype (such as representation invariants or specification constraints) are not violated by the subtype.

Methods. There are two necessary properties:

1.For each method in the supertype, the subtype must have a corresponding method. (The subtype is allowed to introduce additional, new methods that do not appear in the supertype.)
2.Each method in the subtype that corresponds to one in the supertype:
•requires less (has a weaker precondition)
–there are no more “requires” clauses, and each one is no more strict than the one in the supertype method.
–the argument types may be supertypes of the ones in the supertype. This is called contravariance, and it feels somewhat backward, because the arguments to the subtype method are supertypes of the arguments to the supertype method. However, it makes sense, because any arguments passed to the supertype method are sure to be legal arguments to the subtype method.
•guarantees more (has a stronger postcondition)
–there are no more exceptions
–there are no more modified variables
–in the description of the result and/or result state, there are more clauses, and they describe stronger properties
–the result type may be a subtype of that of the supertype. This is called covariance: the return type of the subtype method is a subtype of the return type of the supertype method.

(The above descriptions should all permit equality; for instance, “requires less” should be “requires no more”, and “less strict” should be “no more strict”. They are stated in this form for ease of reading.) The subtype method should not promise to have more or different results; it merely promises to do what the supertype method did, but possibly to ensure additional properties as well. For instance, if a supertype method returns a number larger than its argument, a subtype method could return a prime number larger than its argument. As an example of the type constraints, if A is a subtype of B, then the following would be a legal overriding:

Bicycle B.f(Bicycle arg);
RacingBicycle A.f(Vehicle arg);

Method B.f takes a Bicycle as its argument, but A.f can accept any Vehicle (which includes all Bicycles). Method B.f returns a Bicycle as its result, but A.f returns a RacingBicycle (which is itself a Bicycle).

Properties. Any properties guaranteed by a supertype, such as constraints over the values that may appear in specification fields, must be guaranteed by the subtype as well. (The subtype is permitted to strengthen these constraints.)

As a simple example from the book, consider FatSet, which is always nonempty.

class FatSet {
// Specification constraints: this always contains at least one element
...
// effects: if this contains x and this.size > 1, removes x from this
void remove(int x);
}

Type NotSoFatSet with additional method

// effects: removes x from this
void reallyRemove(int x)

is not a subtype of FatSet. Even though there is no problem with any method of FatSet (reallyRemove is a new method, so the rules about corresponding methods do not apply), this method violates the constraint.

If the subtype object is considered purely as a supertype object (that is, only the supertype methods and fields are queried), then the result should be the same as if an object of the supertype had been manipulated all along instead.
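The FatSet example can be sketched as runnable code. The HashSet representation, the constructor that takes a first element, the size method, and the constraintHolds helper are assumptions added for illustration; only remove and reallyRemove come from the specifications above.

```java
import java.util.HashSet;
import java.util.Set;

class FatSetDemo {
    static class FatSet {
        // assumed representation; the notes only give the specification
        protected final Set<Integer> elements = new HashSet<>();

        // hypothetical constructor that establishes the non-empty constraint
        FatSet(int first) { elements.add(first); }

        // effects: if this contains x and this.size > 1, removes x from this
        void remove(int x) {
            Integer boxed = Integer.valueOf(x);
            if (elements.contains(boxed) && elements.size() > 1) {
                elements.remove(boxed);
            }
        }

        int size() { return elements.size(); }
    }

    static class NotSoFatSet extends FatSet {
        NotSoFatSet(int first) { super(first); }

        // effects: removes x from this (even the last element)
        void reallyRemove(int x) { elements.remove(Integer.valueOf(x)); }
    }

    // Client code that relies on the FatSet constraint.
    static boolean constraintHolds(FatSet s) { return s.size() >= 1; }

    public static void main(String[] args) {
        FatSet fat = new FatSet(1);
        fat.remove(1);                          // ignored: last element
        System.out.println(constraintHolds(fat));

        NotSoFatSet notSoFat = new NotSoFatSet(1);
        notSoFat.reallyRemove(1);               // empties the set
        System.out.println(constraintHolds(notSoFat));
    }
}
```

Java compiles NotSoFatSet without complaint, which illustrates that the constraint lives in the specification rather than in anything the compiler checks.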

signatures: this is essentially the contravariant and covariant rules above. (A procedure’s signature is its name, argument types, return types, and exceptions.)

methods: this is constraints on the behavior, or all aspects of a specification that cannot be expressed in a signature

Java types are classes, interfaces, or primitives. Java has its own notion of subtype (which involves only classes and interfaces). This is a weaker notion than the true subtyping described above; Java subtypes do not necessarily satisfy the substitution principle. Further, a subtype definition that satisfies the substitution principle may not be allowed in Java, and will not compile.
In order for a type to be a Java subtype of another type, the relationship must be declared (via Java’s extends or implements syntax), and the methods must satisfy two properties similar to, but weaker than, those for true subtyping:

1.for each method in the supertype, the subtype must have a corresponding method. (The subtype is allowed to introduce additional, new methods that do not appear in the supertype.)

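2.for each method in the subtype that corresponds to one in the supertype:
•the arguments must have the same types
•the result must have the same type (since Java 5, it may also be a subtype of the supertype method’s result type, known as a covariant return type)
•there are no more declared exceptions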

Java has no notion of a behavioral specification, so it performs no such checks and can make no guarantees about behavior. The requirement of type equality for arguments (and, before Java 5 introduced covariant return types, for results) is stronger than strictly necessary to guarantee type safety. This prohibits some code we might like to write. However, it simplifies the Java language syntax and semantics.
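How Java treats the B.f and A.f signatures can be seen in a short sketch, assuming Java 5 or later. The empty Vehicle, Bicycle, and RacingBicycle classes and the "overload" return value are placeholders invented for the example. A narrower return type is accepted as an override, while a wider parameter type creates a separate overloaded method instead of overriding.

```java
class OverrideRulesDemo {
    static class Vehicle {}
    static class Bicycle extends Vehicle {}
    static class RacingBicycle extends Bicycle {}

    static class B {
        Bicycle f(Bicycle arg) { return new Bicycle(); }
    }

    static class A extends B {
        // Covariant return type: accepted as an override since Java 5.
        @Override
        RacingBicycle f(Bicycle arg) { return new RacingBicycle(); }

        // Wider parameter type: Java treats this as an overload, not an override,
        // so adding @Override here would be a compile-time error.
        String f(Vehicle arg) { return "overload"; }
    }

    public static void main(String[] args) {
        B b = new A();
        System.out.println(b.f(new Bicycle()) instanceof RacingBicycle);
        System.out.println(new A().f(new Vehicle()));
    }
}
```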

Example: Square and rectangle


We know from elementary school that every square is a rectangle. Suppose we wanted to make Square a subtype of Rectangle, which included a setSize method:

class Rectangle {
...
// effects: sets width and height to the specified values
// (that is, this.width’ = w && this.height’ = h)
void setSize(int w, int h);
}

class Square extends Rectangle {
...
}

Which of the following methods is right for Square?

// requires: w = h
void setSize(int w, int h);

void setSize(int edgeLength);

// throws BadSizeException if w != h
void setSize(int w, int h) throws BadSizeException;

The first one isn’t right because the subclass method requires more than the superclass method. Thus, subclass objects can’t be substituted for superclass objects, as there might be code that called setSize with non-equal arguments.

The second one isn’t right (all by itself) because the subclass still must specify a behavior for setSize(int, int); that definition is of a different method (whose name is the same but whose signature differs).

The third one isn’t right because it throws an exception that the superclass doesn’t mention. Thus, it again has different behavior and so Squares can’t be substituted for Rectangles. If BadSizeException is an unchecked exception, then Java will permit the third method to compile; but then again, it will also permit the first method to compile. Therefore, Java’s notion of subtype is weaker than the 6.170 notion of subtype.

There isn’t a way out of this quandary without modifying the supertype. Sometimes subtypes do not accord with our intuition! Or, our intuition about what is a good supertype is wrong.
One plausible solution would be to change Rectangle.setSize to specify that it throws the exception; of course, in practice only Square.setSize would do so. Another solution would be to eliminate setSize and instead have a

void scale(double scaleFactor);
method that shrinks or grows a shape. Other solutions are also possible.
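One way the scale alternative could look is sketched below. The double fields, the constructors, and the area, isSquare, and doubledArea helpers are assumptions added so the example runs; the notes only name the scale method.

```java
class ShapeScaleDemo {
    static class Rectangle {
        protected double width;
        protected double height;

        Rectangle(double w, double h) { width = w; height = h; }

        // effects: multiplies both width and height by scaleFactor
        void scale(double scaleFactor) {
            width *= scaleFactor;
            height *= scaleFactor;
        }

        double area() { return width * height; }
    }

    static class Square extends Rectangle {
        Square(double edge) { super(edge, edge); }

        // Scaling both sides by the same factor keeps this true.
        boolean isSquare() { return width == height; }
    }

    // Client code written only against Rectangle.
    static double doubledArea(Rectangle r) {
        r.scale(2.0);
        return r.area();
    }

    public static void main(String[] args) {
        Square s = new Square(3.0);
        System.out.println(doubledArea(s));
        System.out.println(s.isSquare());
    }
}
```

Because scale changes both dimensions together, Square inherits it unchanged and can be substituted for Rectangle without weakening any promise made by the supertype.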

Java subclassing


Subclassing has a number of advantages, all of which stem from reuse:
•Implementations of subclasses need not repeat unchanged fields and methods, but can reuse those of the superclass
•Clients (callers) need not change code when new subtypes are added, but can reuse the existing code (which doesn’t mention the subtypes at all, just the supertype)
•The resulting design has better modularity and reduced complexity, because designers, implementers, and users only have to understand the supertype, not every subtype; this is specification reuse.
A key mechanism that enables these benefits is overriding, which specializes behavior for some methods. In the absence of overriding, any change to behavior (even a compatible one) could force a complete reimplementation. Overriding permits part of an implementation to be changed without changing other parts that depend on it. This permits more code and specification reuse, both by the implementation and the client.

A potential disadvantage of subclassing is the opportunities it presents for inappropriate reuse. Subclasses and superclasses may depend on one another (explicitly by type name or implicitly by knowledge of the implementation), particularly since subclasses have access to the protected parts of the superclass implementation. These extra dependences complicate the MDD, the design, and the implementation, making it harder to code, understand, and modify.  

Converting Non-Unicode Text in java language


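In the Java programming language, char values represent Unicode characters as 16-bit UTF-16 code units. Unicode is a character encoding standard that supports the world's major languages; characters outside the 16-bit range (supplementary characters) are represented by a pair of char values. You can learn more about the Unicode standard at the Unicode Consortium Web site.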
Few text editors currently support Unicode text entry. The text editor we used to write this section's code examples supports only ASCII characters, which are limited to 7 bits. To indicate Unicode characters that cannot be represented in ASCII, such as ö, we used the \uXXXX escape sequence. Each X in the escape sequence is a hexadecimal digit. The following example shows how to indicate the ö character with an escape sequence:


String str = "\u00F6";
char c = '\u00F6';
Character letter = Character.valueOf('\u00F6');
 
A variety of character encodings are used by systems around the world. Currently few of these encodings conform to Unicode. Because your program expects characters in Unicode, the text data it gets from the system must be converted into Unicode, and vice versa. Data in text files is automatically converted to Unicode when its encoding matches the default file encoding of the Java Virtual Machine. You can identify the default file encoding by creating an OutputStreamWriter using it and asking for its canonical name:


OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream());
System.out.println(out.getEncoding()); 
 
If the default file encoding differs from the encoding of the text data you want to process, then you must perform the conversion yourself. You might need to do this when processing text from another country or computing platform.
This section discusses the APIs you use to translate non-Unicode text into Unicode. Before using these APIs, you should verify that the character encoding you wish to convert into Unicode is supported. The list of supported character encodings is not part of the Java programming language specification. Therefore the character encodings supported by the APIs may vary with platform. To see which encodings the Java Development Kit supports, see the Supported Encodings document.
The material that follows describes two techniques for converting non-Unicode text to Unicode. You can convert non-Unicode byte arrays into String objects, and vice versa. Or you can translate between streams of Unicode characters and byte streams of non-Unicode text.
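Both techniques can be sketched with standard library calls. ISO-8859-1 is chosen here only as an example of a non-Unicode encoding, and the method names are made up for this illustration; StandardCharsets requires Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CharsetConversionDemo {
    // Technique 1: byte array to String, naming the source encoding explicitly.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // ... and back from String to bytes in that encoding.
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Technique 2: wrap a byte stream in a reader that decodes as it reads.
    static String readLatin1Line(byte[] bytes) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xF6};   // o with diaeresis in ISO-8859-1
        System.out.println(decodeLatin1(latin1).equals("\u00F6"));
        System.out.println(readLatin1Line(latin1).equals("\u00F6"));
        System.out.println(encodeLatin1("\u00F6").length);
    }
}
```

Passing the charset explicitly avoids any dependence on the default file encoding of the machine running the program.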

Unicode Escapes

A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit of the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.
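A small sketch of the two-escape rule for supplementary characters: U+1D11E (musical symbol G clef) is picked here only as a convenient example, and the helper method names are made up.

```java
class SupplementaryEscapeDemo {
    // U+1D11E written as its UTF-16 surrogate pair, one escape per code unit.
    static final String G_CLEF = "\uD834\uDD1E";

    // Number of UTF-16 code units (char values) in the string.
    static int codeUnits(String s) { return s.length(); }

    // Number of Unicode code points in the string.
    static int codePoints(String s) { return s.codePointCount(0, s.length()); }

    public static void main(String[] args) {
        System.out.println(codeUnits(G_CLEF));
        System.out.println(codePoints(G_CLEF));
        System.out.println(Integer.toHexString(G_CLEF.codePointAt(0)));
    }
}
```

The string holds two char values but only one code point, which is why length() alone is not a reliable character count for text that may contain supplementary characters.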

UnicodeInputCharacter:
    UnicodeEscape
    RawInputCharacter

UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

UnicodeMarker:
    u
    UnicodeMarker u

RawInputCharacter:
    any Unicode character

HexDigit: one of
    0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
The \, u, and hexadecimal digits here are all ASCII characters.
In addition to the processing implied by the grammar, for each raw input character that is a backslash \, input processing must consider how many other \ characters contiguously precede it, separating it from a non-\ character or the start of the input stream. If this number is even, then the \ is eligible to begin a Unicode escape; if the number is odd, then the \ is not eligible to begin a Unicode escape.
For example, the raw input "\\u2297=\u2297" results in the eleven characters " \ \ u 2 2 9 7 = ⊗ " (\u2297 is the Unicode encoding of the character ⊗).
If an eligible \ is not followed by u, then it is treated as a RawInputCharacter and remains part of the escaped Unicode stream.
If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.
The character produced by a Unicode escape does not participate in further Unicode escapes.
For example, the raw input \u005cu005a results in the six characters \ u 0 0 5 a, because 005c is the Unicode value for \. It does not result in the character Z, which is Unicode character 005a, because the \ that resulted from the \u005c is not interpreted as the start of a further Unicode escape.
The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.
This transformed version is equally acceptable to a Java compiler and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.
A Java compiler should use the \uxxxx notation as an output format to display Unicode characters when a suitable font is not available. 

Unicode escapes in Java source code


java.util.regex
Class Pattern

java.lang.Object
  extended by java.util.regex.Pattern
All Implemented Interfaces:
Serializable

public final class Pattern
extends Object
implements Serializable
A compiled representation of a regular expression. 
A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern. 
A typical invocation sequence is thus 
 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();
A matches method is defined by this class as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement 
 boolean b = Pattern.matches("a*b", "aaaaab");
is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused. Instances of this class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use. 
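The reuse pattern described above might look like the following sketch, where one compiled Pattern is shared and a fresh Matcher is created per input. The class, field, and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once; Pattern instances are immutable and thread-safe.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Counts how many inputs match the whole expression.
    static int countMatches(String... inputs) {
        int count = 0;
        for (String input : inputs) {
            // A Matcher holds per-match state, so create one per input.
            Matcher m = A_STAR_B.matcher(input);
            if (m.matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("aaaaab", "b", "ab", "ba", "a"));
    }
}
```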

Summary of regular-expression constructs


Construct               Matches

Characters
x                       The character x
\\                      The backslash character
\0n                     The character with octal value 0n (0 <= n <= 7)
\0nn                    The character with octal value 0nn (0 <= n <= 7)
\0mnn                   The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh                    The character with hexadecimal value 0xhh
\uhhhh                  The character with hexadecimal value 0xhhhh
\t                      The tab character ('\u0009')
\n                      The newline (line feed) character ('\u000A')
\r                      The carriage-return character ('\u000D')
\f                      The form-feed character ('\u000C')
\a                      The alert (bell) character ('\u0007')
\e                      The escape character ('\u001B')
\cx                     The control character corresponding to x

Character classes
[abc]                   a, b, or c (simple class)
[^abc]                  Any character except a, b, or c (negation)
[a-zA-Z]                a through z or A through Z, inclusive (range)
[a-d[m-p]]              a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]            d, e, or f (intersection)
[a-z&&[^bc]]            a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]           a through z, and not m through p: [a-lq-z] (subtraction)

Predefined character classes
.                       Any character (may or may not match line terminators)
\d                      A digit: [0-9]
\D                      A non-digit: [^0-9]
\s                      A whitespace character: [ \t\n\x0B\f\r]
\S                      A non-whitespace character: [^\s]
\w                      A word character: [a-zA-Z_0-9]
\W                      A non-word character: [^\w]

POSIX character classes (US-ASCII only)
\p{Lower}               A lower-case alphabetic character: [a-z]
\p{Upper}               An upper-case alphabetic character: [A-Z]
\p{ASCII}               All ASCII: [\x00-\x7F]
\p{Alpha}               An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}               A decimal digit: [0-9]
\p{Alnum}               An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}               Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}               A visible character: [\p{Alnum}\p{Punct}]
\p{Print}               A printable character: [\p{Graph}\x20]
\p{Blank}               A space or a tab: [ \t]
\p{Cntrl}               A control character: [\x00-\x1F\x7F]
\p{XDigit}              A hexadecimal digit: [0-9a-fA-F]
\p{Space}               A whitespace character: [ \t\n\x0B\f\r]

java.lang.Character classes (simple java character type)
\p{javaLowerCase}       Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}       Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}      Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}        Equivalent to java.lang.Character.isMirrored()

Classes for Unicode blocks and categories
\p{InGreek}             A character in the Greek block (simple block)
\p{Lu}                  An uppercase letter (simple category)
\p{Sc}                  A currency symbol
\P{InGreek}             Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]      Any letter except an uppercase letter (subtraction)

Boundary matchers
^                       The beginning of a line
$                       The end of a line
\b                      A word boundary
\B                      A non-word boundary
\A                      The beginning of the input
\G                      The end of the previous match
\Z                      The end of the input but for the final terminator, if any
\z                      The end of the input

Greedy quantifiers
X?                      X, once or not at all
X*                      X, zero or more times
X+                      X, one or more times
X{n}                    X, exactly n times
X{n,}                   X, at least n times
X{n,m}                  X, at least n but not more than m times

Reluctant quantifiers
X??                     X, once or not at all
X*?                     X, zero or more times
X+?                     X, one or more times
X{n}?                   X, exactly n times
X{n,}?                  X, at least n times
X{n,m}?                 X, at least n but not more than m times

Possessive quantifiers
X?+                     X, once or not at all
X*+                     X, zero or more times
X++                     X, one or more times
X{n}+                   X, exactly n times
X{n,}+                  X, at least n times
X{n,m}+                 X, at least n but not more than m times

Logical operators
XY                      X followed by Y
X|Y                     Either X or Y
(X)                     X, as a capturing group

Back references
\n                      Whatever the nth capturing group matched

Quotation
\                       Nothing, but quotes the following character
\Q                      Nothing, but quotes all characters until \E
\E                      Nothing, but ends quoting started by \Q

Special constructs (non-capturing)
(?:X)                   X, as a non-capturing group
(?idmsux-idmsux)        Nothing, but turns match flags on - off
(?idmsux-idmsux:X)      X, as a non-capturing group with the given flags on - off
(?=X)                   X, via zero-width positive lookahead
(?!X)                   X, via zero-width negative lookahead
(?<=X)                  X, via zero-width positive lookbehind
(?<!X)                  X, via zero-width negative lookbehind
(?>X)                   X, as an independent, non-capturing group
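The three quantifier families in the table differ only in how they backtrack, which a short sketch can make visible. The input string and the firstMatch helper are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring found by the regex, or null if none is found.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: takes everything, then backs off until "foo" fits at the end.
        System.out.println(firstMatch(".*foo", input));
        // Reluctant: takes as little as possible before the first "foo".
        System.out.println(firstMatch(".*?foo", input));
        // Possessive: takes everything and never gives any of it back.
        System.out.println(firstMatch(".*+foo", input));
    }
}
```

The greedy form returns the whole input, the reluctant form stops at the first "foo", and the possessive form finds nothing because the trailing "foo" has already been consumed.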



Backslashes, escapes, and quoting

The backslash character ('\') serves to introduce escaped constructs, as defined in the table above, as well as to quote characters that otherwise would be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash and \{ matches a left brace. 
It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct. 
Backslashes within string literals in Java source code are interpreted as required by the Java Language Specification as either Unicode escapes or other character escapes. It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler. The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used. 
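The doubling rule is easiest to see in running code. The sketch below uses the examples from the paragraph above; the class and method names are made up.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // The Java literal "\\bcat\\b" reaches the regex engine as \bcat\b.
    private static final Pattern WORD_CAT = Pattern.compile("\\bcat\\b");

    // True if "cat" occurs as a whole word in the text.
    static boolean hasWordCat(String text) {
        return WORD_CAT.matcher(text).find();
    }

    // Matches the literal text (hello), parentheses included.
    static boolean isParenthesizedHello(String text) {
        return Pattern.matches("\\(hello\\)", text);
    }

    // Four backslashes in source give a regex of two, which matches one.
    static boolean isSingleBackslash(String text) {
        return Pattern.matches("\\\\", text);
    }

    public static void main(String[] args) {
        System.out.println(hasWordCat("a cat sat"));
        System.out.println(hasWordCat("concatenate"));
        System.out.println(isParenthesizedHello("(hello)"));
        System.out.println(isSingleBackslash("\\"));
    }
}
```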

Character Classes

Character classes may appear within other character classes, and may be composed by the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes. 
The precedence of character-class operators is as follows, from highest to lowest: 
1    Literal escape    \x
2    Grouping          [...]
3    Range             a-z
4    Union             [a-e][i-u]
5    Intersection      [a-z&&[aeiou]]
Note that a different set of metacharacters are in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range forming metacharacter. 
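A brief sketch of union, intersection, subtraction, and the changed meaning of . inside a class; the matchesChar helper is a made-up convenience for testing one character at a time.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // True if the single character c is accepted by the given character class.
    static boolean matchesChar(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(matchesChar("[a-d[m-p]]", 'n'));    // union
        System.out.println(matchesChar("[a-z&&[def]]", 'e'));  // intersection
        System.out.println(matchesChar("[a-z&&[^bc]]", 'b'));  // subtraction
        System.out.println(matchesChar("[.]", 'a'));           // literal dot only
    }
}
```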

Line terminators

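A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators: 
•A newline (line feed) character ('\n'),
•A carriage-return character followed immediately by a newline character ("\r\n"),
•A standalone carriage-return character ('\r'),
•A next-line character ('\u0085'),
•A line-separator character ('\u2028'), or
•A paragraph-separator character ('\u2029').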
If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters. 
The regular expression . matches any character except a line terminator unless the DOTALL flag is specified. 
By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence. 
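The effect of the MULTILINE and DOTALL flags can be sketched as follows; the sample inputs and helper names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineModeDemo {
    // Counts lines consisting only of word characters, under the given flags.
    static int countWordLines(String input, int flags) {
        Matcher m = Pattern.compile("^\\w+$", flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True if "a", any one character, then "b" matches the whole input.
    static boolean dotMatches(String input, int flags) {
        return Pattern.compile("a.b", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        System.out.println(countWordLines(text, 0));
        System.out.println(countWordLines(text, Pattern.MULTILINE));
        System.out.println(dotMatches("a\nb", 0));
        System.out.println(dotMatches("a\nb", Pattern.DOTALL));
    }
}
```

Without MULTILINE the anchors only apply to the whole input, so no line matches; with it, each of the three lines is found separately.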

Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups: 

1    ((A)(B(C)))
2    (A)
3    (B(C))
4    (C)
Group zero always stands for the entire expression. 
Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete. 
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match. 
Groups beginning with (? are pure, non-capturing groups that do not capture text and do not count towards the group total. 
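Group numbering and the "most recently matched" rule can be checked with a short sketch that uses the two expressions from the text. The groups helper is made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group 0..n for a full match, or an empty array if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parentheses, left to right.
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
        // Group two keeps "b" even though the last iteration matched only "a".
        System.out.println(Arrays.toString(groups("(a(b)?)+", "aba")));
    }
}
```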

Unicode support

This class is in conformance with Level 1 of Unicode Technical Standard #18: Unicode Regular Expression Guidelines, plus RL2.1 Canonical Equivalents. 
Unicode escape sequences such as \u2014 in Java source code are processed as described in §3.3 of the Java Language Specification. Such escape sequences are also implemented directly by the regular-expression parser so that Unicode escapes can be used in expressions that are read from files or from the keyboard. Thus the strings "\u2014" and "\\u2014", while not equal, compile into the same pattern, which matches the character with hexadecimal value 0x2014.
Unicode blocks and categories are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property. Blocks are specified with the prefix In, as in InMongolian. Categories may be specified with the optional prefix Is: Both \p{L} and \p{IsL} denote the category of Unicode letters. Blocks and categories can be used both inside and outside of a character class. 
The supported categories are those of The Unicode Standard in the version specified by the Character class. The category names are those defined in the Standard, both normative and informative. The block names supported by Pattern are the valid block names accepted and defined by UnicodeBlock.forName.
Categories that behave like the java.lang.Character boolean ismethodname methods (except for the deprecated ones) are available through the same \p{prop} syntax, where the specified property has the name javamethodname (for example, \p{javaLowerCase} for isLowerCase).
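A few of these Unicode constructs in use, as a sketch; the sample characters (Greek small alpha, o with diaeresis, and the character U+2014) and the matches helper are chosen only for illustration.

```java
import java.util.regex.Pattern;

class UnicodePropertyDemo {
    // True if the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{Lu}", "A"));              // category
        System.out.println(matches("\\p{InGreek}", "\u03B1"));    // block
        System.out.println(matches("\\p{IsL}", "\u00F6"));        // Is prefix
        System.out.println(matches("\\p{javaLowerCase}", "a"));   // Character method
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A")); // subtraction
        // Escape resolved by the compiler versus by the regex parser.
        System.out.println(matches("\u2014", "\u2014"));
        System.out.println(matches("\\u2014", "\u2014"));
    }
}
```

The last two calls pass different strings to the regex engine (one character versus six), yet both compile to a pattern that matches the same single character.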