regex - How to use regular expressions in Java to remove certain characters


The general question is: how do I parse a string, eliminate the punctuation, and replace some of it?

I'm trying to modify an input text. In my case I have a normal text file with punctuation, and I want all of it eliminated. If the symbol is `.`, `!`, `?` or `...`, I want to replace it with the string `"<s>"`.

I have never used regex. I tried string comparison, but it isn't sufficient in all cases. I run into trouble when there are two punctuation marks in a row; for example, in the text "the second day (the 4th)." the characters `).` appear together.
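A regular expression character class avoids the suffix-by-suffix checks: it matches each listed character wherever it occurs, so adjacent marks such as `).` need no special case. This is a minimal sketch, assuming the marks to drop are `( ) , : ;` and `"` (the apostrophe is deliberately left out so words like "don't" survive), and that any run of `.`, `!` or `?` should become a single `<s>`. `PunctuationDemo`, `stripMarks` and `markSentences` are hypothetical names chosen for illustration.

```java
public class PunctuationDemo {
    // Removes brackets, commas, colons, semicolons and double quotes wherever they appear,
    // so ")." or "(," are handled in one pass instead of one suffix at a time.
    static String stripMarks(String line) {
        return line.replaceAll("[(),:;\"]", "");
    }

    // After stripping, replaces each run of sentence-ending punctuation
    // (".", "!", "?", "...") with " <s>".
    static String markSentences(String line) {
        return stripMarks(line).replaceAll("[.!?]+", " <s>");
    }

    public static void main(String[] args) {
        String line = "the second day (the 4th).";
        System.out.println(stripMarks(line));    // the second day the 4th.
        System.out.println(markSentences(line)); // the second day the 4th <s>
    }
}
```

Because `[.!?]+` is greedy, `...` or `?!` collapses into one marker rather than several.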

For example, given this input I expect the following:

Input: `[...] @ it!" speech caused`

Expected output: `@ <s> speech caused`

In my code every word is added to an `ArrayList`, because I need to work with the words later.

Thanks a lot!

```java
FileInputStream fileInputStream = new FileInputStream("text.txt");
InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, "UTF-8");
BufferedReader bf = new BufferedReader(inputStreamReader);

words.add("<s>");
String s;
while ((s = bf.readLine()) != null) {
    String[] var = s.split(" ");

    for (int i = 0; i < var.length; i++) {
        if (var[i].endsWith(",") || var[i].endsWith(")")
                || var[i].endsWith("(") || var[i].endsWith(":")
                || var[i].endsWith(";") || var[i].endsWith("'")) {
            // strip one trailing mark
            var[i] = var[i].substring(0, var[i].length() - 1);
            words.add(var[i].toLowerCase());
        } else if (var[i].startsWith("'")) {
            // strip a leading apostrophe
            var[i] = var[i].substring(1, var[i].length());
            words.add(var[i].toLowerCase());
        } else if (var[i].endsWith(".") || var[i].endsWith("...")
                || var[i].endsWith("!") || var[i].endsWith("?")) {
            // sentence end: strip the mark and add a sentence marker
            var[i] = var[i].substring(0, var[i].length() - 1);
            words.add(var[i].toLowerCase());
            words.add("<s>");
        } else {
            words.add(var[i].toLowerCase());
            // System.out.println("\n newly read word: " + var[i]);
        }
    }
}
```

First use a regex to filter out the punctuation, then split on spaces and add the result to the list:

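```java
FileInputStream fileInputStream = new FileInputStream("text.txt");
InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, "UTF-8");
BufferedReader bf = new BufferedReader(inputStreamReader);
words.add("<s>");
String s;
while ((s = bf.readLine()) != null) {
    // replace every character that is not an ASCII letter or a space with the empty string
    s = s.replaceAll("[^a-zA-Z ]", "").trim();
    if (s.isEmpty()) {
        continue; // nothing left on this line
    }
    // split on one or more spaces so removed punctuation leaves no empty entries
    String[] var = s.split(" +");
    // addAll takes a Collection, so wrap the array (java.util.Arrays)
    words.addAll(Arrays.asList(var));
}
```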
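The replace-and-split approach drops `.`, `!` and `?` along with everything else, so the `<s>` markers from the question never appear. One way to keep them is to match tokens instead of deleting characters: a `Pattern` that accepts either a word or a run of sentence-ending punctuation, iterated with `Matcher.find()`. This is a sketch under assumptions: a word is taken to be a run of Unicode letters or digits (so umlauts survive, unlike `[^a-zA-Z ]`), inner apostrophes as in "don't" are kept, and `SentenceTokenizer`, `addTokens` and `tokenize` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceTokenizer {
    // A token is either a word (letters/digits, optionally joined by inner apostrophes)
    // or a run of sentence-ending punctuation such as ".", "!?" or "...".
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:'[\\p{L}\\p{N}]+)*|[.!?]+");

    // Appends the tokens of one line: lowercase words, and "<s>" for each
    // sentence-ending run. Every other character is skipped.
    static void addTokens(String line, List<String> words) {
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            String token = m.group();
            char first = token.charAt(0);
            if (first == '.' || first == '!' || first == '?') {
                words.add("<s>");
            } else {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
    }

    // Reads all lines, starting the list with "<s>" as the question's code does.
    public static List<String> tokenize(BufferedReader reader) throws IOException {
        List<String> words = new ArrayList<>();
        words.add("<s>");
        String line;
        while ((line = reader.readLine()) != null) {
            addTokens(line, words);
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new StringReader("the second day (the 4th). Don't stop!\" speech caused"));
        System.out.println(tokenize(in));
        // [<s>, the, second, day, the, 4th, <s>, don't, stop, <s>, speech, caused]
    }
}
```

To read the file from the question, pass `Files.newBufferedReader(Paths.get("text.txt"), StandardCharsets.UTF_8)` instead of the `StringReader`. As in the question's own code, text that ends with a sentence mark produces a trailing `<s>`.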
