RegexAdvice

validate UK Postcode

Last post 04-11-2012, 6:33 AM by JoeBlack999. 3 replies.
  •  04-04-2012, 1:09 PM 84890

    validate UK Postcode

    Hi, I need to validate UK postcodes in my project, so I found some regexes at

    http://regexlib.com/Search.aspx?k=uk%20postcode

    and tried two of them:

    const wregex postCodeUKPat(L"\\s*(?<O>(?<d>[BEGLMNS]|A[BL]|B[ABDHLNRST]|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]|F[KY]|G[LUY]|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]|M[EKL]|N[EGNPRW]|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKL-PRSTWY]|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)(?<a>\\d\\d?)|(?<d>E)(?<a>\\dW)|(?<d>EC)(?<a>\\d[AMNPRVY0])|(?<d>N)(?<a>\\dP)|(?<d>NW)(?<a>\\dW)|(?<d>SE)(?<a>\\dP)|(?<d>SW)(?<a>\\d[AEHPVWXY])|(?<d>W)(?<a>1[0-4A-DFGHJKSTUW])|(?<d>W)(?<a>[2-9])|(?<d>WC)(?<a>[12][ABEHNRVX]))\\ (?<I>(?<s>\\d)(?<u>[ABD-HJLNP-UW-Z]{2}))\\s*");

    const wregex postCodeUKPat2(L"\\s*\\b([A-PR-UWYZa-pr-uwyz]([0-9]{1,2}|([A-HK-Ya-hk-y][0-9]|[A-HK-Ya-hk-y][0-9]([0-9]|[ABEHMNPRV-Yabehmnprv-y]))|[0-9][A-HJKS-UWa-hjks-uw])\\ {0,1}[0-9][ABD-HJLNP-UW-Zabd-hjlnp-uw-z]{2}|([Gg][Ii][Rr]\\ 0[Aa][Aa])|([Ss][Aa][Nn]\\ {0,1}[Tt][Aa]1)|([Bb][Ff][Pp][Oo]\\ {0,1}([Cc]\\/[Oo]\\ )?[0-9]{1,4})|(([Aa][Ss][Cc][Nn]|[Bb][Bb][Nn][Dd]|[BFSbfs][Ii][Qq][Qq]|[Pp][Cc][Rr][Nn]|[Ss][Tt][Hh][Ll]|[Tt][Dd][Cc][Uu]|[Tt][Kk][Cc][Aa])\\ {0,1}1[Zz][Zz]))\\b\\s*");

    I am using VC++. The first one gave me a runtime error, and the second one couldn't validate a legitimate UK postcode, e.g. "SWIV 2BJ". Is there any way to solve the problem?

    cheers

    daiyue  

  •  04-04-2012, 8:06 PM 84895 in reply to 84890

    Re: validate UK Postcode

    I'm taking a guess here, but a quick check of the pattern syntax for TR1 regular expressions (I followed the Google trail from the "wregex" keyword and hope it led to a relevant topic) shows that it does not support the named capture group syntax you have used; that syntax comes from .NET regular expressions. You can probably get the pattern at least accepted by replacing each '(?<x>' with a plain '(' (keep the '(' itself, as it marks the beginning of the group).
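    A minimal sketch of that suggestion, assuming a compiler whose <regex> follows C++11 ("std::wregex", ECMAScript grammar by default); older VC++ versions may spell the class "std::tr1::wregex". "compiles", "namedGroups" and "plainGroups" are hypothetical names, and the fragment is just the "SW" district alternative taken from the first pattern.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if std::wregex (ECMAScript grammar by default)
// accepts the pattern, false if the constructor throws std::regex_error.
bool compiles(const std::wstring& pattern)
{
    try {
        std::wregex re(pattern);
        (void)re;  // only construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// The "SW" district alternative from the first pattern, using .NET-style
// named groups, which the ECMAScript grammar does not define.
const std::wstring namedGroups = LR"re((?<d>SW)(?<a>\d[AEHPVWXY]))re";

// The same fragment with each "(?<name>" replaced by a plain "(".
const std::wstring plainGroups = LR"re((SW)(\d[AEHPVWXY]))re";
```

    With GCC's libstdc++, for example, "compiles(namedGroups)" is expected to be false and "compiles(plainGroups)" true.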

    However, this makes it a little harder to retrieve the parts captured by the pattern. One advantage of named capture groups is that they generally allow several groups to share the same name. Your pattern contains several groups called "d", and you can retrieve the captured characters by asking for the "d" capture no matter which one actually did the work. With only numbered groups, you need to check each possible group number to see which one actually took part in the match.
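    To illustrate the numbered-group workaround, here is a sketch using a hypothetical cut-down version of the first pattern: only its "EC", "SW" and "W1x" district alternatives, with named groups turned into numbered ones. "districtOf" is an illustrative name; it checks each district group in turn and returns whichever one took part in the match.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical cut-down version of the first pattern: three district
// alternatives only, named groups replaced by numbered ones.
// Groups: 1=EC 2=area, 3=SW 4=area, 5=W 6=area, 7=inward code.
const std::wregex cutDown(
    LR"re((?:(EC)(\d[AMNPRVY0])|(SW)(\d[AEHPVWXY])|(W)(1[0-4A-DFGHJKSTUW])))re"
    LR"re( (\d[ABD-HJLNP-UW-Z]{2}))re");

// Without a shared name like "d", each district group number has to be
// checked to see which one actually took part in the match.
std::wstring districtOf(const std::wstring& postcode)
{
    static const std::size_t districtGroups[] = {1, 3, 5};
    std::wsmatch m;
    if (!std::regex_match(postcode, m, cutDown))
        return L"";  // no match at all
    for (std::size_t group : districtGroups)
        if (m[group].matched)
            return m[group].str();
    return L"";
}
```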

    On the other hand, if all you are looking for is a "yes/no" validation, then you don't really care about extracting the individual parts, so this is not a problem.
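    For a plain yes/no check, a sketch along these lines should do, assuming C++11 <regex>. The pattern is the first one from the question with each '(?<name>' replaced by '('. The "nosubs" flag is a standard "regex_constants" option that tells the engine not to record sub-matches, since only the overall result is used. "ukPostcode" and "isUkPostcode" are hypothetical names.

```cpp
#include <regex>
#include <string>

// The first pattern from the question, with every "(?<name>" replaced by "(".
// Adjacent raw string literals are joined into one pattern by the compiler.
const std::wregex ukPostcode(
    LR"re(\s*()re"
    LR"re(([BEGLMNS]|A[BL]|B[ABDHLNRST]|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]|F[KY]|G[LUY])re"
    LR"re(|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]|M[EKL]|N[EGNPRW]|O[LX]|P[AEHLOR])re"
    LR"re(|R[GHM]|S[AEGKL-PRSTWY]|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)(\d\d?))re"
    LR"re(|(E)(\dW)|(EC)(\d[AMNPRVY0])|(N)(\dP)|(NW)(\dW)|(SE)(\dP))re"
    LR"re(|(SW)(\d[AEHPVWXY])|(W)(1[0-4A-DFGHJKSTUW])|(W)([2-9])|(WC)([12][ABEHNRVX])))re"
    LR"re( ((\d)([ABD-HJLNP-UW-Z]{2}))\s*)re",  // leading space separates the inward code
    // nosubs: only a yes/no answer is needed, so don't record sub-matches.
    std::regex_constants::ECMAScript | std::regex_constants::nosubs);

// Yes/no validation of the whole string (surrounding whitespace allowed).
bool isUkPostcode(const std::wstring& text)
{
    return std::regex_match(text, ukPostcode);
}
```

    "SWIV 2BJ" is still rejected here, because this pattern also expects a digit straight after "SW".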

    The second pattern seems overly complex in that it uses lots of character classes just to allow case-insensitive matches. Instead, you can set the "case insensitive" match option (I think "regex_constants::icase" is the appropriate one in your case, but no guarantee). Sequences such as:

    [BFSbfs][Ii][Qq][Qq]|[Pp][Cc][Rr][Nn]

    can be replaced by:

    [BFS]IQQ|PCRN

    as long as you have the "case insensitive" match option set.
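    A small sketch of that equivalence, assuming "std::regex_constants::icase" behaves as the standard describes (characters compared case-insensitively). The two regexes and the "sameResult" helper are illustrative only.

```cpp
#include <regex>
#include <string>

// Case-insensitivity spelled out with character classes, as in the second pattern.
const std::wregex explicitCase(LR"re([BFSbfs][Ii][Qq][Qq]|[Pp][Cc][Rr][Nn])re");

// The same alternatives written once in upper case, with icase doing the rest.
const std::wregex withIcase(LR"re([BFS]IQQ|PCRN)re",
                            std::regex_constants::ECMAScript | std::regex_constants::icase);

// Illustrative helper: true when both regexes give the same whole-string result.
bool sameResult(const std::wstring& s)
{
    return std::regex_match(s, explicitCase) == std::regex_match(s, withIcase);
}
```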

    I have no idea what the "rules" are for a valid UK postcode, so I can't comment on the correctness of the pattern. However, from what I can see, at the top level there are 5 basic alternatives. Four of those seem to be specific letter combinations such as "GIR 0AA" and "SAN TA1".
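    Putting the icase idea together with that top-level structure, the second pattern could be sketched with "icase" set and each of its five alternatives on a separate line. This is only a mechanical rewrite ("ukPostcode2" is a hypothetical name): it drops the lower-case letters from the character classes, writes "\ {0,1}" as " ?" and "\/" as "/", and otherwise keeps the original structure, so it inherits the same rules (and the same rejection of "SWIV 2BJ").

```cpp
#include <regex>
#include <string>

// Mechanical rewrite of the second pattern for icase matching: lower-case
// letters dropped from the classes, "\ {0,1}" written " ?", "\/" written "/".
// Each of the five top-level alternatives starts on its own line.
const std::wregex ukPostcode2(
    LR"re(\s*\b()re"
    // 1: ordinary postcodes
    LR"re([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]|[A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y])))re"
    LR"re(|[0-9][A-HJKS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})re"
    LR"re(|(GIR 0AA))re"                                        // 2: GIR 0AA
    LR"re(|(SAN ?TA1))re"                                       // 3: SAN TA1
    LR"re(|(BFPO ?(C/O )?[0-9]{1,4}))re"                        // 4: BFPO numbers
    LR"re(|((ASCN|BBND|[BFS]IQQ|PCRN|STHL|TDCU|TKCA) ?1ZZ))re"  // 5: fixed codes
    LR"re()\b\s*)re",
    std::regex_constants::ECMAScript | std::regex_constants::icase);
```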

    The first major alternative can be broken down something like this:

    [A-PR-UWYZ]
    (
       \d{1,2}
       |
       (
        [A-HK-Y]\d
        |
        [A-HK-Y]\d(\d|[ABEHMNPRV-Y])
       )
       |
       \d[A-HJKS-UW]
    )
    \ ?\d[ABD-HJLNP-UW-Z]{2}
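    The same breakdown can be kept readable in code by building the pattern from adjacent raw string literals, one piece per line, each with its own comment. This sketch assumes C++11 raw strings and "icase"; "firstAlternative" is a hypothetical name.

```cpp
#include <regex>
#include <string>

// The breakdown above, assembled from adjacent raw string literals so each
// piece can carry a comment. icase is set, so only upper case is listed.
const std::wregex firstAlternative(
    LR"re([A-PR-UWYZ])re"                                 // first letter
    LR"re(()re"                                           // then one of:
    LR"re(\d{1,2})re"                                     //   1 or 2 digits
    LR"re(|([A-HK-Y]\d|[A-HK-Y]\d(\d|[ABEHMNPRV-Y])))re"  //   letter, digit (+ digit/letter)
    LR"re(|\d[A-HJKS-UW])re"                              //   digit, letter
    LR"re())re"
    LR"re( ?\d[ABD-HJLNP-UW-Z]{2})re",                    // optional space, inward code
    std::regex_constants::ECMAScript | std::regex_constants::icase);
```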

    Taking "SWIV 2BJ" as the input, the "S" matches the first character class. The next part of the pattern has 3 alternatives: the first can't match because it requires 1 or 2 digits, and the last can't match because it requires a leading digit. The middle alternative is itself made up of 2 alternatives, but can be rewritten from

     [A-HK-Y]\d|[A-HK-Y]\d(\d|[ABEHMNPRV-Y])

    to 

    [A-HK-Y]\d(\d|[ABEHMNPRV-Y])?

    All I have done is factor out the initial part that both alternatives share and make the rest of the 2nd alternative optional.
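    A quick way to check that the factoring is safe is to run both forms over a few strings and confirm they agree. The sample strings and the "agree" helper are illustrative; this is a spot check, not a proof.

```cpp
#include <regex>
#include <string>

// The middle alternative as written in the second pattern...
const std::wregex unfactored(LR"re([A-HK-Y]\d|[A-HK-Y]\d(\d|[ABEHMNPRV-Y]))re");

// ...and after factoring out the shared prefix "[A-HK-Y]\d".
const std::wregex factored(LR"re([A-HK-Y]\d(\d|[ABEHMNPRV-Y])?)re");

// Illustrative helper: true when both forms accept, or both reject, the whole string.
bool agree(const std::wstring& s)
{
    return std::regex_match(s, unfactored) == std::regex_match(s, factored);
}
```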

    From this we can see that the "W" will match the initial character class of that alternative.

    Now we get to the problem part. The character after the "W" is "I" but the pattern requires that it be a digit. Therefore your text will fail to match the pattern.
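    The walk-through can be reproduced by testing ever-longer prefixes of the middle alternative against the start of the text. "std::regex_search" with "std::regex_constants::match_continuous" requires the match to begin at the first character; "steps" and "firstFailingStep" are hypothetical names for this sketch.

```cpp
#include <regex>
#include <string>

// Ever-longer prefixes of the middle alternative: letter, letter, digit.
const std::wstring steps[] = {
    LR"re([A-PR-UWYZ])re",
    LR"re([A-PR-UWYZ][A-HK-Y])re",
    LR"re([A-PR-UWYZ][A-HK-Y]\d)re",
};

// Returns the index of the first step that does not match at the very start
// of the text (match_continuous anchors the search there), or -1 if all do.
int firstFailingStep(const std::wstring& text)
{
    int index = 0;
    for (const std::wstring& step : steps) {
        const std::wregex re(step);
        if (!std::regex_search(text, re, std::regex_constants::match_continuous))
            return index;
        ++index;
    }
    return -1;
}
```

    For "SWIV 2BJ" the first failing step is index 2, the one that requires a digit after "SW".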

    I don't know the rules for a valid UK postcode, so I can't say how to fix this.

    Susan

  •  04-06-2012, 6:52 PM 84910 in reply to 84895

    Re: validate UK Postcode

    Hi Susan, your advice has been very helpful, thanks.

    cheers

    daiyue 

  •  04-11-2012, 6:33 AM 84949 in reply to 84890

    Re: validate UK Postcode

    I've been using

     

    ([^QVXqvx0-9-\s][0-9]|[^QVXqvx0-9-\s]{2}[0-9]{2}|[^QVXqvx0-9-\s][^IJZijz0-9-\s][0-9]|[^QVXqvx0-9-\s][^IJZijz0-9-\s]([0-9]{2})|[^QVXqvx0-9-\s][0-9][ABCDEFGHJKSTUWabcdefghjkstuw]|[^QVXqvx0-9-\s]([^IJZijz0-9-\s][0-9][ABEHMNPRVWXYabehmnprvwxy]))(\s)([0-9]{1}([^CIKMOVcikmov0-9-\s]{2}))

    I'm not sure "SWIV 2BJ" is a valid UK postcode, though.
