
Java String.split() city,state,zip

Last post 11-20-2008, 1:35 PM by tcrisera. 9 replies.
  •  11-18-2008, 1:18 PM 48460

    Java String.split() city,state,zip

    I am pulling values out of an xsl, csv or tab-delimited file that may include a column holding the city, state and zip portions of an address.

    I would like a regex that I can pass to the split() method of a Java String to split the value into its 3 parts. My problem is the inconsistent use of commas and spaces after the city name and state code, i.e., there may be a comma and a space, or just a comma, after the city name, and there may or may not be a comma after the state code. Of course, city names may contain one or more spaces (e.g., New York).

  •  11-18-2008, 1:25 PM 48461 in reply to 48460

    Re: Java String.split() city,state,zip

    Have you considered a CSV parser?

    If so, and you still decide to go for regex, could you post a couple of examples?

  •  11-19-2008, 10:56 AM 48511 in reply to 48460

    Re: Java String.split() city,state,zip

    I'm not having any problem getting the data out of csv, txt or xsl for each column.  My problem is that the client allows variable field mappings, and one possible mapping is a single column containing city, state and zip data.  I want to break these out to store them separately in a db, and there have been discrepancies.  I can write some String parsing logic myself in Java but wondered if I could just use split(regex) with an expression that could handle this.  Here are some variations:

    Brooklyn, NY 11204

    Brooklyn,NY 11204

    Brooklyn,NY,11204

    Brooklyn, NY, 11204

    New York, NY 10010

     

    so there may or may not be a comma, may be comma followed by space, may have space between parts of city name, etc.  Sorry, but not very familiar with regex yet.

    Thanks

  •  11-19-2008, 11:04 AM 48512 in reply to 48511

    Re: Java String.split() city,state,zip

    tcrisera:

    I'm not having any problem getting the data out of csv, txt or xsl for each column.  My problem is that the client allows variable field mappings, and one possible mapping is a single column containing city, state and zip data.  I want to break these out to store them separately in a db, and there have been discrepancies.  I can write some String parsing logic myself in Java but wondered if I could just use split(regex) with an expression that could handle this.  Here are some variations:

    Brooklyn, NY 11204

    Brooklyn,NY 11204

    Brooklyn,NY,11204

    Brooklyn, NY, 11204

    New York, NY 10010

    What is that supposed to be? Your input? And what should the output look like? You need to explain yourself a lot better than this!

    Also read this forum's posting guidelines: http://regexadvice.com/forums/thread/47451.aspx. They explain what is needed from you in order to get a meaningful answer.

    tcrisera:

    so there may or may not be a comma, may be comma followed by space, may have space between parts of city name, etc.  Sorry, but not very familiar with regex yet.

    Thanks

    Well, if you don't know any regex at all, then I can only suggest that you learn something about it. If you don't have the time, then writing your own String parsing logic in Java would be the way to go.

    Sun has written an excellent tutorial:

    http://java.sun.com/docs/books/tutorial/essential/regex/

    Good luck.

  •  11-19-2008, 1:19 PM 48517 in reply to 48512

    Re: Java String.split() city,state,zip

    I suppose I'll have to write my own parsing logic.  All I know is that the Java API for String shows split(String regex) returning a String array.  That's my output: an array where each element contains a portion of the original String, delimited by the input (regex).  Very straightforward to use.

    String s = "111-22-3333";

    String[] sa = s.split("-");

    sa[0] is "111", sa[1] is "22", sa[2] is "3333".  But I can't find any examples of this method with anything other than a single delimiter character.  I wanted a 3-element String array that contained city, state and zip respectively.  I assumed a more involved regex could split that on a variety of delimiters.  Thanks anyway.

  •  11-19-2008, 7:19 PM 48536 in reply to 48517

    Re: Java String.split() city,state,zip

    You don't necessarily need to give up on a regex-based solution and write your own parser, but I think you need to understand what we are asking for.

    Before you can write your own parser, you need to fully understand the text that you will be parsing and develop a comprehensive set of 'rules' as to how to determine the boundaries between the various elements of your text. We are simply asking for those rules as well as a reasonable example of the text that you need to parse - preferably one that shows the various combinations you need to deal with.

    As is often the case with problems, you can only solve them once they are properly defined. The problem you solve will be the one you define, so if you mis-define (or don't fully define) the issue, then the solution (in whatever form) will probably not work. Once you have the definition, the solution is often straightforward, whatever form it takes.

    Susan

  •  11-20-2008, 12:37 PM 48567 in reply to 48536

    Re: Java String.split() city,state,zip

    Ok, thanks Susan.

    I have a dozen other things to work on or I probably would have done this already.  It will probably only save me 20-30 lines of code and half an hour at most, but I'm always interested in learning something and keeping code concise.  Sorry if the rules weren't clear from the original posts.

    1) The first portion of the String (city) will always  terminate with a comma.  This portion may contain spaces.

    2) The first comma may or may not be followed by a space before the state portion.

    Examples:    Brooklyn, NY      Brooklyn,NY     New Haven, CT

    3) The second portion of the String (state) may or may not be terminated by a comma.  If it is, it may or may not also be followed by a space before zip portion.

    4) If state is not terminated with a comma it must be followed with a space.

    5) There will be a 3rd portion with some character data (validation for proper zip code data is not required).

     Examples:    Brooklyn, NY, 11204      Brooklyn,NY 11204    New Haven, CT,000

    If anything breaks these rules, the value is not valid.  (Actually, if the parts are terminated with a comma, it doesn't matter whether it is followed by a space, since I'll trim the result anyway. So if that makes it easier: portion 1 WILL be terminated by a comma, portion 2 by a space or comma.)

    Thanks

  •  11-20-2008, 1:04 PM 48568 in reply to 48567

    Re: Java String.split() city,state,zip

    Yes, that explanation looks more like it!

    Try this:

    String[] tests = {
      "Brooklyn, NY 11204",
      "Brooklyn,NY 11204",
      "Brooklyn,NY,11204",
      "Brooklyn, NY, 11204",
      "New York, NY 10010"
    };
    for(String t: tests) {
      String[] tokens = t.split(",\\s*|\\s(?=\\d)");
      System.out.println(java.util.Arrays.toString(tokens));
    }

    Look at the API docs for the java.util.regex.Pattern class to see what the regex does (all functionality is described there). If after studying it for a bit you have a question, feel free to post back.
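
    As a rough guide to how the pattern reads, here is a small sketch that names the two alternatives separately. The class, constant and method names are made up for illustration; only String.split() and the regex itself come from the code above.

```java
import java.util.Arrays;

class SplitBreakdown {
    // Alternative 1: a comma followed by zero or more whitespace characters.
    static final String COMMA = ",\\s*";
    // Alternative 2: a single whitespace character, but only when a digit comes next.
    // The lookahead (?=\\d) checks for the digit without consuming it.
    static final String SPACE_BEFORE_DIGIT = "\\s(?=\\d)";

    // Splits on either alternative, same as the combined pattern above.
    static String[] split(String s) {
        return s.split(COMMA + "|" + SPACE_BEFORE_DIGIT);
    }

    public static void main(String[] args) {
        // The space inside "New York" is not followed by a digit, so it is kept.
        System.out.println(Arrays.toString(split("New York, NY 10010")));
        // With no comma after the state and a third part that does not start
        // with a digit, only two tokens come back.
        System.out.println(split("New Haven, CT ABC").length); // 2
    }
}
```

    Note the caveat in the second call: the pattern relies on the last part starting with a digit whenever the state is followed only by a space.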

    Good luck.

  •  11-20-2008, 1:06 PM 48569 in reply to 48567

    Re: Java String.split() city,state,zip

    For the given criteria, assuming each input is processed separately, this should work.

    Raw Match Pattern:
    ,\x20?|\x20(?=\d)

    Java Code Example:

    import java.util.regex.Pattern;

    class Module1 {
        public static void main(String[] args) {
            // Sample input; replace with the value to split.
            String sourcestring = "Brooklyn, NY 11204";
            // Split on a comma with an optional space, or on a space followed by a digit.
            Pattern re = Pattern.compile(",\\x20?|\\x20(?=\\d)");
            String[] parts = re.split(sourcestring);
            for (int partsIdx = 0; partsIdx < parts.length; partsIdx++) {
                System.out.println("[" + partsIdx + "] = " + parts[partsIdx]);
            }
        }
    }

    Resulting array for "Brooklyn, NY 11204":
    (
    [0] => Brooklyn
    [1] => NY
    [2] => 11204
    )
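
    Note that split() will not reject a value that breaks the rules; it just returns however many tokens it finds. If invalid values need to be detected, one option is a matcher with capture groups. The following is only a sketch: the class and method names are hypothetical, and it assumes the state part contains no spaces or commas (true for two-letter codes).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CityStateZip {
    // group 1: city, everything up to the first comma (may contain spaces)
    // group 2: state, assumed to contain no spaces or commas
    // separator after state: a comma with optional whitespace, or whitespace alone
    // group 3: the remaining character data, starting with a non-space character
    private static final Pattern ADDRESS = Pattern.compile(
        "([^,]+),\\s*([^,\\s]+)(?:,\\s*|\\s+)(\\S.*)");

    // Returns {city, state, zip}, or null when the input breaks the rules.
    static String[] parse(String input) {
        Matcher m = ADDRESS.matcher(input.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("New Haven, CT,000")));
        System.out.println(parse("Brooklyn NY 11204") == null); // true: no comma after city
    }
}
```

    Unlike the split() pattern, this does not depend on the third part starting with a digit.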

     


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  11-20-2008, 1:35 PM 48571 in reply to 48569

    Re: Java String.split() city,state,zip

    Thanks all, both worked great!