Java String.split() city,state,zip
Last post 11-20-2008, 1:35 PM by tcrisera. 9 replies.

11-18-2008, 1:18 PM | tcrisera (Joined on 11-18-2008, Posts 5)
Java String.split() city,state,zip
I'm pulling values out of an XLS, CSV or tab-delimited file that may include a column containing the city, state and ZIP portions of an address.
I would like a regex that can be passed to the split() method of a Java String to split that value into its three parts. My problem is the inconsistent use of commas and spaces after the city name and the state code: there may be a comma and a space, or just a comma, after the city name, and there may or may not be a comma after the state code. Of course, city names may also contain one or more spaces (e.g. New York).

11-18-2008, 1:25 PM | prometheuzz (Joined on 04-28-2008, Posts 659)
Re: Java String.split() city,state,zip
Have you considered a CSV parser? If so, and you still decide to go for regex, could you post a couple of examples?

11-19-2008, 10:56 AM | tcrisera (Joined on 11-18-2008, Posts 5)
Re: Java String.split() city,state,zip
I'm not having any problem getting the data for each column out of CSV, TXT or XLS files. My problem is that the client is allowed variable field mappings, and one possible mapping is a single column containing city, state and ZIP data. I want to break these out to store separately in a DB, and there have been discrepancies. I can write some String parsing logic myself in Java, but wondered if I could just use split(regex) with an expression that handles this. Here are some variations:
Brooklyn, NY 11204
Brooklyn,NY 11204
Brooklyn,NY,11204
Brooklyn, NY, 11204
New York, NY 10010
So there may or may not be a comma, a comma may be followed by a space, the city name may contain spaces, etc. Sorry, I'm not very familiar with regex yet. Thanks

11-19-2008, 11:04 AM | prometheuzz (Joined on 04-28-2008, Posts 659)
Re: Java String.split() city,state,zip
tcrisera: I'm not having any problem getting the data out of csv, txt or xsl for each column. My problem is that there are variable field mappings the client allows and one possible one is a column containing city, state and zip data. I want to break these out to store in a db separately and there have been discrepancies. I can write some String parsing logic myself in Java but wondered if I could just use split(regex) with an expression that could handle this. Here are some variations: Brooklyn, NY 11204 / Brooklyn,NY 11204 / Brooklyn,NY,11204 / Brooklyn, NY, 11204 / New York, NY 10010
What is that supposed to be? Your input? And what should the output look like? You need to explain yourself a lot better than this! Also read the posting guidelines for this forum: http://regexadvice.com/forums/thread/47451.aspx which explain what we need from you in order to give a meaningful answer.
tcrisera: so there may or may not be a comma, may be comma followed by space, may have space between parts of city name, etc. Sorry, but not very familiar with regex yet. Thanks
Well, if you don't know any regex at all, then I can only suggest that you learn something about it. If you don't have the time, then writing your own String parsing logic in Java would be the way to go.
Sun has written an excellent tutorial: http://java.sun.com/docs/books/tutorial/essential/regex/ Good luck.

11-19-2008, 1:19 PM | tcrisera (Joined on 11-18-2008, Posts 5)
Re: Java String.split() city,state,zip
I suppose I'll have to write my own parsing logic. All I know is that the Java API docs show String.split(String regex) returning a String array. That's my output: an array where each element contains a portion of the original String, delimited by whatever the regex matches. It's very straightforward to use:

```java
String s = "111-22-3333";
String[] sa = s.split("-");
// sa[0] is "111", sa[1] is "22", sa[2] is "3333"
```

But I can't find any examples of this method with anything other than a single delimiter character. I wanted a three-element String array containing city, state and ZIP respectively, and I assumed a more involved regex could split on a variety of delimiters. Thanks anyway.
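split() actually takes a full regular expression, not just a single character, so one call can split on several different delimiters. The minimal sketch below (the class and method names are hypothetical) uses a character class to split on a hyphen or a slash, and then `[,\\s]+` to split on any run of commas and/or whitespace. The last case shows why a "split on anything" regex is not enough here: the space inside "New York" is treated as a delimiter too, which is exactly why the rules about where commas and spaces may appear matter.

```java
import java.util.Arrays;

// Hypothetical demo class: String.split() accepts any regex,
// not only a single delimiter character.
public class MultiDelimiterSplit {

    // Splits on any run of commas and/or whitespace characters.
    public static String[] naiveSplit(String s) {
        return s.split("[,\\s]+");
    }

    public static void main(String[] args) {
        // One character class covers two different delimiters ('-' and '/')
        System.out.println(Arrays.toString("111-22/3333".split("[-/]")));
        // prints [111, 22, 3333]

        System.out.println(Arrays.toString(naiveSplit("Brooklyn,NY 11204")));
        // prints [Brooklyn, NY, 11204]

        // The space inside "New York" is also treated as a delimiter
        System.out.println(Arrays.toString(naiveSplit("New York, NY 10010")));
        // prints [New, York, NY, 10010]
    }
}
```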

11-19-2008, 7:19 PM
Re: Java String.split() city,state,zip
You don't necessarily need to give up on a regex-based solution and write your own parser, but I think you need to understand what we are asking for. Before you can write your own parser, you need to fully understand the text you will be parsing and develop a comprehensive set of 'rules' for determining the boundaries between the various elements of that text. We are simply asking for those rules, as well as a reasonable example of the text you need to parse, preferably one that shows the various combinations you need to deal with. As is often the case, you can only solve a problem once it is properly defined. The problem you solve will be the one you define, so if you mis-define (or don't fully define) the issue, the solution (in whatever form) will probably not work. Once you have the definition, the solution is often straightforward, whatever form it takes.
Susan

11-20-2008, 12:37 PM | tcrisera (Joined on 11-18-2008, Posts 5)
Re: Java String.split() city,state,zip
Ok, thanks Susan. I have a dozen other things to work on, or I probably would have done this already. It will probably only save me 20-30 lines of code and half an hour at most, but I'm always interested in learning something and keeping code concise. Sorry if the rules weren't clear from my original posts.
1) The first portion of the String (city) will always be terminated by a comma. This portion may contain spaces.
2) The first comma may or may not be followed by a space before the state portion. Examples: "Brooklyn, NY", "Brooklyn,NY", "New Haven, CT".
3) The second portion of the String (state) may or may not be terminated by a comma. If it is, the comma may or may not be followed by a space before the ZIP portion.
4) If the state is not terminated by a comma, it must be followed by a space.
5) There will be a third portion with some character data (validating it as a proper ZIP code is not required). Examples: "Brooklyn, NY, 11204", "Brooklyn,NY 11204", "New Haven, CT,000".
If anything breaks these rules, the value is not valid. (Actually, if a part is terminated by a comma, it doesn't matter whether the comma is followed by a space, since I'll trim the result anyway. So if that makes it easier: portion 1 WILL be terminated by a comma, and portion 2 by a space or a comma.) Thanks
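Because the rules also say that anything breaking them is invalid, one alternative to split() is a single pattern that must match the whole value, with one capturing group per part. This is a sketch under those five rules only; the class name `CityStateZip`, the method `parse`, and returning null for invalid input are all hypothetical choices, not something posted in the thread. Matcher.matches() succeeds only if the entire string fits the pattern, so a value with no comma after the city, or with no separator after the state, is rejected instead of being split into the wrong number of pieces.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: validates and splits "city, state zip" in one step.
public class CityStateZip {

    // group 1: city  - everything up to the first comma (may contain spaces)
    // then a comma, optionally followed by whitespace
    // group 2: state - a run of characters that are neither comma nor whitespace
    // then either a comma (optionally followed by whitespace) or at least one whitespace char
    // group 3: zip   - the rest, starting with a non-whitespace character
    private static final Pattern CSZ =
        Pattern.compile("([^,]+),\\s*([^,\\s]+)(?:,\\s*|\\s+)(\\S.*)");

    // Returns {city, state, zip}, or null if the value breaks the rules.
    public static String[] parse(String value) {
        Matcher m = CSZ.matcher(value.trim());
        if (!m.matches()) { // matches() requires the WHOLE string to fit the pattern
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2), m.group(3).trim() };
    }

    public static void main(String[] args) {
        String[] tests = {
            "Brooklyn, NY, 11204", "Brooklyn,NY 11204",
            "New Haven, CT,000", "Brooklyn NY 11204"
        };
        for (String t : tests) {
            String[] parts = parse(t);
            System.out.println(t + " -> " + (parts == null ? "invalid" : Arrays.toString(parts)));
        }
    }
}
```

With a capturing-group approach, the parts come back in fixed positions (city, state, ZIP), so there is no need to count tokens afterwards.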

11-20-2008, 1:04 PM | prometheuzz (Joined on 04-28-2008, Posts 659)
Re: Java String.split() city,state,zip
Yes, that explanation looks more like it!
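Try this:

```java
import java.util.Arrays;

public class SplitTest {
    public static void main(String[] args) {
        String[] tests = {
            "Brooklyn, NY 11204",
            "Brooklyn,NY 11204",
            "Brooklyn,NY,11204",
            "Brooklyn, NY, 11204",
            "New York, NY 10010"
        };
        for (String t : tests) {
            // split on a comma plus any following whitespace,
            // or on a single whitespace char that is followed by a digit
            String[] tokens = t.split(",\\s*|\\s(?=\\d)");
            System.out.println(Arrays.toString(tokens));
        }
    }
}
```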
Look at the API docs for the java.util.regex.Pattern class to see what the regex does (all functionality is described there). If after studying it for a bit you have a question, feel free to post back. Good luck.
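The key piece in the regex above is `(?=\\d)`, a positive lookahead described in the Pattern docs: it checks that a digit comes next without making that digit part of the delimiter. The hypothetical demo class below contrasts it with the plain `\\s\\d`, which does consume the digit, so the first character of the ZIP code disappears from the result.

```java
import java.util.Arrays;

// Hypothetical demo: why the digit check is written as a lookahead.
public class LookaheadDemo {

    // Here the delimiter consumes the space AND the first digit of the zip.
    public static String[] withoutLookahead(String s) {
        return s.split(",\\s*|\\s\\d");
    }

    // (?=\\d) only checks that a digit follows; the digit is not part of the delimiter.
    public static String[] withLookahead(String s) {
        return s.split(",\\s*|\\s(?=\\d)");
    }

    public static void main(String[] args) {
        String s = "New York, NY 10010";
        System.out.println(Arrays.toString(withoutLookahead(s))); // [New York, NY, 0010]
        System.out.println(Arrays.toString(withLookahead(s)));    // [New York, NY, 10010]
    }
}
```

The same lookahead is also why "New York" stays intact: the space between "New" and "York" is followed by a letter, not a digit, so it never matches the second alternative.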

11-20-2008, 1:06 PM | mash (Joined on 04-14-2005, Birmingham, AL, Posts 1,965)
Re: Java String.split() city,state,zip
For the given criteria, assuming each input value is processed separately:
Raw match pattern: ,\x20?|\x20(?=\d)
Java code example:

```java
import java.util.regex.Pattern;

class Module1 {
    public static void main(String[] asd) {
        String sourcestring = "Brooklyn, NY 11204";
        // a comma plus an optional single space, OR a single space followed by a digit
        Pattern re = Pattern.compile(",\\x20?|\\x20(?=\\d)");
        String[] parts = re.split(sourcestring);
        for (int partsIdx = 0; partsIdx < parts.length; partsIdx++) {
            System.out.println("[" + partsIdx + "] = " + parts[partsIdx]);
        }
    }
}
```

Output:
[0] = Brooklyn
[1] = NY
[2] = 11204
Michael "In theory, theory and practice are the same. In practice, they are not." Albert Einstein
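One caveat with either split() pattern: split() never rejects input, it just returns however many pieces it finds. For example, "Brooklyn NY 11204" (no comma after the city, which breaks rule 1) splits into only two parts, "Brooklyn NY" and "11204". A minimal sketch of a guard, assuming the posted pattern is compiled once and reused for every row; the class name `SplitWithCheck` and the null-for-invalid convention are hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical wrapper around the posted pattern: compile once, reuse for
// every row, and treat anything other than exactly three parts as invalid.
public class SplitWithCheck {

    private static final Pattern DELIM = Pattern.compile(",\\x20?|\\x20(?=\\d)");

    // Returns {city, state, zip} (each trimmed), or null if the split did not yield 3 parts.
    public static String[] parse(String value) {
        String[] parts = DELIM.split(value.trim());
        if (parts.length != 3) {
            return null;
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("Brooklyn, NY 11204"))); // [Brooklyn, NY, 11204]
        // No comma after the city: split() still "succeeds", but yields only 2 parts
        System.out.println(Arrays.toString(parse("Brooklyn NY 11204")));  // null
    }
}
```

A token count is a cheap sanity check rather than full validation; a whole-string match with capturing groups is stricter if invalid values must be caught reliably.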

11-20-2008, 1:35 PM | tcrisera (Joined on 11-18-2008, Posts 5)
Re: Java String.split() city,state,zip
Thanks all, both worked great!