
Regex needed to parse location with varying city, state zip combos

Last post 02-23-2009, 11:42 AM by JenniC. 8 replies.
  •  10-26-2008, 3:25 PM 47606

    Regex needed to parse location with varying city, state zip combos

    Thanks for reading my post. I need help developing a regex to handle a querystring value for location that can be any combination of city, state and zip. It also has to handle the fact that users may enter the location differently, e.g. with a comma and a space after the city, or with a comma but no space. It also has to take into account that the browser encodes a comma as "%2C" and a space as "+". Finally, the state can be an abbreviation (e.g. NY) or written out in full (e.g. New York).

    Below are the different ways a user may enter the location:

    location=san+jose%2Cca (city,state, no zip - no space after comma)

    location=san+jose%2C+ca  (city, state, no zip - with space after comma)

    location=san+jose+ca  (city, state, no zip - with no comma just a space after city)

    location=san+jose%2C+ca+95131 (city, state and zip - comma and space after city)

    location=san+jose%2C+california

    location=san+jose

    location=california

    location=95131
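For reference, the encoded forms listed above can be reduced to one shape before any matching is attempted. The following is a minimal PHP sketch, not code from this thread; the helper name `normalize_location` is hypothetical. It assumes the raw, still-encoded value is available as a string (PHP already decodes values it places in `$_GET`, so the `urldecode()` step only applies to a raw query string).

```php
<?php
// Hypothetical helper: decode a raw querystring value and collapse separators.
function normalize_location(string $raw): string
{
    // urldecode() converts "%2C" to "," and "+" to a space.
    $decoded = urldecode($raw);
    // Treat commas like whitespace, then collapse each run into one space.
    $collapsed = preg_replace('/[\s,]+/', ' ', $decoded);
    // Fold case so "ca" and "CA" compare equal later on.
    return strtoupper(trim($collapsed));
}

echo normalize_location('san+jose%2Cca'), "\n";        // SAN JOSE CA
echo normalize_location('san+jose%2C+ca+95131'), "\n"; // SAN JOSE CA 95131
```

After this step the comma/no-comma and space/no-space variants all look the same, so later matching only has to deal with words separated by single spaces.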

  •  10-26-2008, 5:05 PM 47607 in reply to 47606

    Re: Regex needed to parse location with varying city, state zip combos

    What platform?
  •  10-26-2008, 6:04 PM 47612 in reply to 47607

    Re: Regex needed to parse location with varying city, state zip combos

    The code will be in PHP. 

    I also forgot to mention that I need to be able to get the values separately, i.e. to pull the city, state and zip out of each value if they exist.

  •  10-26-2008, 8:04 PM 47614 in reply to 47612

    Re: Regex needed to parse location with varying city, state zip combos

    This is more of an array question than a regex one, but here's the code I came up with. Let me know if it fails for any of your stated samples:

    http://pastebin.com/f42cb916b


  •  10-27-2008, 7:14 PM 47669 in reply to 47614

    Re: Regex needed to parse location with varying city, state zip combos

    Hey ddrudik,

    Thanks a ton. Your function works great! Any chance you want to take on an extra challenge: parsing the value currently being returned so that the function instead returns values that relate directly to each part of the location, e.g. zip=Value1, state=Value2, city=Value3? If not, no worries; your help so far has been much appreciated.

  •  10-27-2008, 7:19 PM 47670 in reply to 47669

    Re: Regex needed to parse location with varying city, state zip combos

    Please be more specific about what kind of output you would like from the function; right now it outputs an array.
  •  10-27-2008, 8:41 PM 47672 in reply to 47670

    Re: Regex needed to parse location with varying city, state zip combos

    Yes, it does return an array, but there is an issue when a user does not enter a comma. For example, if they enter SAN JOSE CA, it is read as one string with a single array value; the same happens with SAN JOSE CA 95131.

    However, I think I can create a quick function that will do the following:

    -  If the whole string is numeric and matches a regex for a zip code, then that is all I need: return it and stop.

    -  If it is non-numeric (not a zip), then I'll have to parse it to get the state first and match that against the state array.

    -  If more text remains, then I have to grab the city name and make my first call to the db. It shouldn't be too expensive, since I should only need 2 calls to the db (assuming 90% of city names are 2 words or less).
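The three steps above can be sketched in PHP roughly as follows. This is an illustration under stated assumptions, not the pastebin function from this thread: the name `parse_location` and the three-entry `$states` table are hypothetical, the input is assumed to be already URL-decoded, and the db lookup for the city is left out (whatever remains after the zip and state is simply treated as the city).

```php
<?php
// Illustrative subset only; a real table would list every state.
$states = ['CA' => 'CALIFORNIA', 'NY' => 'NEW YORK', 'NC' => 'NORTH CAROLINA'];

// Hypothetical helper: split an already-decoded location into city/state/zip.
function parse_location(string $location, array $states): array
{
    // Commas and whitespace both act as separators; case is folded to upper.
    $words = preg_split('/[\s,]+/', strtoupper(trim($location)), -1, PREG_SPLIT_NO_EMPTY);
    $state = null;
    $zip = null;

    // Step 1: a trailing 5-digit zip (optionally ZIP+4).
    if ($words && preg_match('/^\d{5}(?:-\d{4})?$/', end($words))) {
        $zip = array_pop($words);
    }

    // Step 2: look for a state at the end, longest candidate first,
    // so "NORTH CAROLINA" is tried before "CAROLINA".
    for ($n = min(3, count($words)); $n >= 1; $n--) {
        $tail = implode(' ', array_slice($words, -$n));
        if ($n === 1 && isset($states[$tail])) {
            $state = $states[$tail];          // abbreviation, e.g. CA
        } elseif (in_array($tail, $states, true)) {
            $state = $tail;                   // full name, e.g. CALIFORNIA
        } else {
            continue;
        }
        $words = array_slice($words, 0, count($words) - $n);
        break;
    }

    // Step 3: whatever is left is treated as the city.
    $result = [];
    if ($words) { $result['city'] = implode(' ', $words); }
    if ($state !== null) { $result['state'] = $state; }
    if ($zip !== null) { $result['zip'] = $zip; }
    return $result;
}

print_r(parse_location('san jose, ca 95131', $states));
```

Two caveats with this approach: an input such as "new york" on its own is reported as a state, not a city, and two-letter abbreviations that are also ordinary words or city nicknames (OR, IN, LA) can be misread as states. A db lookup on the remaining text, as described in the list above, is one way to settle such cases.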

  •  10-27-2008, 9:58 PM 47674 in reply to 47672

    Re: Regex needed to parse location with varying city, state zip combos

    I will leave you to modify the function as needed; it seems you understand what changes you would like to make, and you now have the regex for matching a zip code etc.

    location=SAN JOSE CA 95131
    Array ( [city] => SAN JOSE [state] => CALIFORNIA [zip] => 95131 )

     

    location=SAN JOSE CA
    Array ( [city] => SAN JOSE [state] => CALIFORNIA )
    location=SAN JOSE
    Array ( [city] => SAN JOSE )
     

  •  02-23-2009, 11:42 AM 51099 in reply to 47606

    Re: biterscripting: Regex needed to parse location with varying city, state zip combos

    Excellent advice so far.

     Here is a slightly different approach using biterscripting  (http://www.biterscripting.com for free download).

    The wex (word extractor) command extracts words from a string. The characters that separate words are specified in the system variable $wsep. $wsep is by default set to ", " etc. so your example is already covered. All you need to do is to get the first word - that's the city, the second word - that's the state, the third word (if it exists) - that's the zip. biterScripting's wex command will take care of the various combinations of commas, spaces, etc. that you mentioned.

    Get the concept?

    The following is the code.

    Let's say we read in the input string into a str variable s (declared as var str s).

    wex -p "1" $s        # Will print city

    wex -p "2" $s        # Will print state

    wex -p "3" $s        # Will print zip, if it is present
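For comparison with the PHP discussed earlier in the thread, the same word-by-position idea can be sketched with `preg_split()`. This is an illustration only: the helper name `split_words` is hypothetical, the first sample input is made up, and commas and whitespace are assumed to be the separators.

```php
<?php
// Hypothetical helper: split a decoded location on commas and whitespace.
function split_words(string $s): array
{
    return preg_split('/[\s,]+/', trim($s), -1, PREG_SPLIT_NO_EMPTY);
}

// One-word city: positions 1, 2, 3 line up with city, state, zip.
print_r(split_words('Austin, TX 78701'));

// Two-word city: there are now four words, so the second word is "jose",
// not the state, and the positions no longer identify the fields.
print_r(split_words('san jose, ca 95131'));
```

Picking fields by position alone therefore only works for one-word cities; with inputs like the "san jose" samples in the original question, the zip and state have to be recognized by their content first.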

     

    To take care of comma vs. %2C, the following simple loop will simplify the input string s.

    while ( { sen -c "%2C" $s } > 0 )  # sen = string enumerator - counts instances of %2C; -c for case insensitive - will also match %2c

         sal "^%2C^" "," $s >null         # sal = string alterer

     

    Funny thing. I have not even had to use regular expressions - they are even more powerful in biterscripting.

     J.

     
