More Java Pitfalls Share Reactor - Daconta M,C.

Daconta M,C. More Java Pitfalls Share Reactor - Wiley publishing, 2003. - 476 p.
ISBN: 0-471-23751-5

266 Item 30
String test = "111.63642343422";
String testAnswer = test.replaceAll("(\\.\\d\\d?)\\d+", "$1"); // 111.63
System.out.println("testAnswer=" + testAnswer);
Another simple example of grouping is the code snippet below, which performs pig Latin string manipulation. In this simplified pig Latin, every string that starts with a non-vowel character has its first character moved to the end of the string, followed by the characters "ay". Since Eagles starts with E, a vowel, the pattern does not match and the string is left unchanged. All of the other strings are replaced with their pig Latin equivalents (for example, Redskins becomes edskinsRay).
String testAnswer = "";
String[] s = { "PigLatin", "Eagles", "Redskins", "Giants" };
for (int i = 0; i < s.length; i++) {
    testAnswer = s[i].replaceAll("^([^aeiouAEIOU])(.+)", "$2$1ay");
    System.out.println("testAnswer=" + testAnswer);
}
A big problem for developers who use regular expressions on Unix systems is inconsistent behavior across the standard tools, specifically ed, ex, vi, sed, awk, grep, and egrep. Each tool follows slightly different conventions, so the same pattern can behave differently from one tool to the next, and it takes patience and work to learn the pattern syntax of each regular expression library. Users often have trouble both describing patterns and recognizing the context in which they apply. The same problems exist in Java applications that use third-party regular expression libraries. Hopefully, the implementation of regular expressions in the latest Java SDK will address this and provide consistent pattern behavior across applications and platforms.
The regular expression libraries that shipped with the Merlin release (J2SE 1.4) were a long-awaited addition to an already powerful enterprise programming language. Pattern matching and replacement add a great deal of flexibility to Java development efforts and should simplify Web development in particular. Metacharacters in patterns make it easier to match ranges of text and make text processing a much more pleasant experience.
According to the published Java 2 Standard Edition APIs, the Java Regular Expressions API does not support the following operations that are supported by the Perl 5 scripting language:
Conditional constructs. (?{X}).
Embedded code constructs. (?{code}).
Embedded comment syntax. In Perl, a comment can be embedded inside a pattern with the (?#comment) construct.
Preprocessing operations. This includes the \l, \L, \u, and \U constructs. In Perl, \L and \U lowercase or uppercase the text that follows them (up to a closing \E), while \l and \u lowercase or uppercase only the next character.
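Since the \u and \U operations are not available, a case change has to be applied in Java code between matches. The following is a minimal sketch, not code from the book: the class name and the capitalizeWords helper are hypothetical. It uses Matcher.appendReplacement and Matcher.appendTail to rebuild the string, and the Pattern.COMMENTS flag as a stand-in for Perl's embedded comment syntax. Matcher.quoteReplacement requires Java 5 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseConversionSketch {

    // Pattern.COMMENTS makes the engine ignore whitespace and #-comments
    // inside the pattern, which stands in for Perl's (?#comment) construct.
    private static final Pattern WORD = Pattern.compile(
        "\\b([a-z])   # group 1: a lowercase letter at a word boundary\n" +
        "(\\w*)       # group 2: the rest of the word\n",
        Pattern.COMMENTS);

    // Hypothetical helper: uppercases the first letter of each word,
    // roughly what \u would do in a Perl substitution.
    static String capitalizeWords(String input) {
        Matcher m = WORD.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1).toUpperCase() + m.group(2);
            // quoteReplacement guards against '$' and '\' in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Form Validation Using Regular Expressions"
        System.out.println(capitalizeWords("form validation using regular expressions"));
    }
}
```

Words that already start with an uppercase letter, such as Eagles, are left alone because the pattern only matches a lowercase letter at a word boundary.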
Conversely, the following constructs are supported by the Java regular expression classes but not by the Perl 5 scripting language:
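Possessive quantifiers. A possessive quantifier (for example, X*+ or X++) matches as much as it can and never backtracks to give characters back, even when doing so would allow the overall match to succeed.
Character class union and intersection. Operators inside a character class are evaluated in the following order of precedence: literal escape, grouping, range, union, intersection.
Form Validation Using Regular Expressions 267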
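The two Java-only constructs can be illustrated with a small sketch. The class and method names below are hypothetical and chosen only for this example; the patterns themselves use standard java.util.regex syntax.

```java
import java.util.regex.Pattern;

class JavaOnlyConstructsSketch {

    // Greedy: \d+ first consumes every digit, then backtracks one digit
    // so that the trailing \d can still match.
    static boolean greedyMatches(String s) {
        return Pattern.matches("\\d+\\d", s);
    }

    // Possessive: \d++ consumes every digit and never gives one back,
    // so the trailing \d has nothing left to match.
    static boolean possessiveMatches(String s) {
        return Pattern.matches("\\d++\\d", s);
    }

    // Intersection: [a-z&&[^aeiou]] matches lowercase letters that are not vowels.
    static String stripConsonants(String s) {
        return s.replaceAll("[a-z&&[^aeiou]]", "");
    }

    public static void main(String[] args) {
        System.out.println(greedyMatches("123"));        // true
        System.out.println(possessiveMatches("123"));    // false
        System.out.println(stripConsonants("redskins")); // ei
    }
}
```

The possessive form is mainly useful when backtracking could never produce a match anyway, since it lets the engine fail sooner.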
The validation code shown above is fine for parsing user input, but there are cases where you might want to parse text within a Web page. The code below strips metadata from all HTML pages that are spidered. On line 166, the Pattern class is used to set up the string pattern to look for on the page. The pattern targets two attributes of the meta tag, name and content. The Matcher class is given the Web page content in the pageOutput string, and the individual items are pulled out of the matched text using the new split method of the String class.
163 // strip out metadata ----------------------------
164 StringBuffer metadata = new StringBuffer();
165
166 Pattern p = Pattern.compile("<meta name=\"XX.data1\" (CONTENT|content)=\"(.*)");
167 Matcher m = p.matcher(pageOutput);
168 int z;
169 if (m.find()) {
170 String[] sw1 = m.group(0).split("[\"]");
171 String[] data1 = sw1[3].split("[,]");
172 for (z=0; z < data1.length; z++)
173 metadata.append("<data1>" + data1[z] + "</data1>");
174 }
175
176 p = Pattern.compile("<meta name=\"XX.data2\" (CONTENT|content)=\"(.*)");
177 m = p.matcher(pageOutput);
178 if (m.find()) {
179 String[] sw2 = m.group(0).split("[\"]");
180 String[] data2 = sw2[3].split("[,]");
181 for (z=0; z < data2.length; z++)
182 metadata.append("<data2>" + data2[z] + "</data2>");
183 }
184
185 p = Pattern.compile("<meta name=\"XX.data3\" (CONTENT|content)=\"(.*)");
186 m = p.matcher(pageOutput);
187 if (m.find()) {
188 String[] sw3 = m.group(0).split("[\"]");
189 String[] data3 = sw3[3].split("[,]");
190 for (z=0; z < data3.length; z++)
191 metadata.append("<data3>" + data3[z] + "</data3>");
192 }
193
194 System.out.println("metadata= " + metadata.toString());
Listing 30.1 (continued)
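The three blocks in the listing differ only in the field name, so the same idea can be sketched as a single helper. This is an illustrative variant, not the book's code: the class name, the extract method, and the sample page are assumptions. It also swaps in a capturing group for the content value where the listing splits the whole match on quote characters. Pattern.quote requires Java 5 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetadataSketch {

    // Hypothetical helper: finds <meta name="XX.<field>" content="a,b,c">
    // and wraps each comma-separated value in <field>...</field>.
    static String extract(String pageOutput, String field) {
        Pattern p = Pattern.compile(
            "<meta name=\"XX\\." + Pattern.quote(field)
            + "\" (?:CONTENT|content)=\"([^\"]*)\"");
        Matcher m = p.matcher(pageOutput);
        StringBuilder metadata = new StringBuilder();
        if (m.find()) {
            // Group 1 holds only the text between the content quotes.
            String[] values = m.group(1).split(",");
            for (int z = 0; z < values.length; z++) {
                metadata.append("<" + field + ">" + values[z] + "</" + field + ">");
            }
        }
        return metadata.toString();
    }

    public static void main(String[] args) {
        // Sample page content, made up for this example.
        String pageOutput =
            "<head><meta name=\"XX.data1\" content=\"java,regex\"></head>";
        // prints "metadata= <data1>java</data1><data1>regex</data1>"
        System.out.println("metadata= " + extract(pageOutput, "data1"));
    }
}
```

Matching [^\"]* instead of .* stops at the closing quote, so the result does not depend on what follows the tag on the same line.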
A new regular expression pattern is then used to pull the links out of the HTML pages so that they can be spidered on subsequent levels. The <a href></a> element is the target expression to be parsed from the text. The (a|A) grouping is used so that both lowercase and uppercase tag names are matched.
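The link-extraction listing itself falls outside this excerpt. As a rough sketch of the idea only, the following code collects href values with a pattern that uses the (a|A) style of grouping. The pattern, the class name, and the extractLinks helper are assumptions made for illustration and are not taken from the book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkSketch {

    // Assumed pattern: group 1 accepts either case of the tag name,
    // group 2 either case of the attribute name, group 3 the quoted URL.
    private static final Pattern LINK =
        Pattern.compile("<(a|A) (href|HREF)=\"([^\"]*)\"");

    // Hypothetical helper: returns every href value found in the page.
    static List<String> extractLinks(String pageOutput) {
        List<String> links = new ArrayList<String>();
        Matcher m = LINK.matcher(pageOutput);
        while (m.find()) {
            links.add(m.group(3));
        }
        return links;
    }

    public static void main(String[] args) {
        String pageOutput =
            "<a href=\"one.html\">1</a> <A HREF=\"two.html\">2</A>";
        System.out.println(extractLinks(pageOutput)); // [one.html, two.html]
    }
}
```

Compiling the pattern with Pattern.CASE_INSENSITIVE would be an alternative to spelling out both cases in each group.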