RegexAdvice

New to RegEx

Last post 08-01-2010, 6:57 PM by Aussie Susan. 12 replies.
  •  07-28-2010, 11:01 AM 70175

    New to RegEx

    I'm working with VB.NET and a RESTful API. The API has a syntax for selecting resources (Includes) and fields. I've written a wrapper and want to add syntax checking before making any calls. I'd like to use RegEx but am struggling to understand it and need help.

     

     The calling syntax is nested up to 3 levels, with optional field selection for each nested association (Include), as follows:

    Include,Include/Subinclude,Include(field,field,...,field)/Subinclude(field,field,...,field),...,Include(field,field,...,field)/Subinclude(field,field,...,field)/Subinclude(field,field,...,field)

     

     * Fields must be associated with an Include, are separated by commas, and contain only lowercase letters and numbers.

     * Includes are separated by commas and use title-case letters and numbers.

     * An Include can specify fields wrapped in parentheses, e.g. Include(field,field,...,field)

     * Includes can be nested two sub-levels deep. Subincludes are nested using '/' and use title-case letters and numbers, e.g. Include/Subinclude/Subinclude

     * Every Subinclude can have fields, just like an Include.

     * Any number of each type of element can appear and usage varies, but Includes can be nested at most 3 levels deep (Include/Subinclude/Subinclude).
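The rules above can be turned into a pattern one piece at a time. Below is a minimal sketch in Python (its `re` module treats these constructs the same way .NET does), assuming "title case" means a leading capital followed by letters and digits, and allowing '_' as mentioned in the follow-up post. The names FIELD, NAME, FIELDS, NODE, CHAIN, SPEC and is_valid are illustrative only. Using `fullmatch` forces the whole string to conform; in .NET the equivalent would be wrapping the pattern in `\A...\z` and calling `Regex.IsMatch`.

```python
import re

# One building block per rule. Assumptions (not from the API's docs):
#   - "title case" = a leading capital, then letters/digits
#   - '_' is allowed in names and fields (per the follow-up post)
FIELD = r"[a-z0-9_]+"
NAME = r"[A-Z][A-Za-z0-9_]*"
FIELDS = rf"\({FIELD}(?:,{FIELD})*\)"   # (field,field,...,field)
NODE = rf"{NAME}(?:{FIELDS})?"          # Include or Subinclude, fields optional
CHAIN = rf"{NODE}(?:/{NODE}){{0,2}}"    # at most Include/Subinclude/Subinclude
SPEC = rf"{CHAIN}(?:,{CHAIN})*"         # comma-separated chains

SPEC_RE = re.compile(SPEC)


def is_valid(spec: str) -> bool:
    """Return True only if the *whole* string follows the rules."""
    return SPEC_RE.fullmatch(spec) is not None


print(is_valid("Include,Include/Subinclude"))   # True
print(is_valid("Listing/User/Shops/Extra"))     # False: four levels
print(is_valid("Listing(Listing_id)"))          # False: capital in a field
```

Composing named pieces keeps each rule readable on its own line, which matters once the pattern grows.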

     

    I could code this by hand in VB.NET, but I'd like to use RegEx for its speed benefits. I'm sure this is dead easy, but I'm not sure how to handle nested, multi-level optional elements in RegEx and would really appreciate some help from the gurus...

     

    Many Thanks,

     

    Graeme.

     

  •  07-28-2010, 12:02 PM 70176 in reply to 70175

    Re: New to RegEx

    Here's an example that needs to be parsed:

     

    Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75),Listing(listing_id)/User(user_id,login_name),Buyer(user_id,login_name),Seller(user_id,login_name),Author(user_id,login_name,feedback_info)/Profile,Subject(user_id,login_name)

     

    I forgot to mention that the underscore character '_' is also allowed.

     

    Here's the RegEx expression I've built so far: [0-9a-fA-F]*((.|\n)*?)((|)|,|\)

     

    I'm trying to strip out all the text and build a nested representation, so being able to identify groups by level and type would be a bonus...

     

    Thanks again,

     

    G.

  •  07-28-2010, 5:00 PM 70182 in reply to 70176

    Re: New to RegEx

    Raw Match Pattern:
    ^(\w+(\(\w+(,\w+)*\))*(/\w+(\(\w+(,\w+)*\))*){0,2})(,\w+(\(\w+(,\w+)*\))*(/\w+(\(\w+(,\w+)*\))*){0,2})*

    VB.NET Code Example:

    Imports System.Text.RegularExpressions
    Module Module1
      Sub Main()
        Dim sourcestring as String = "replace with your source string"
        Dim re As Regex = New Regex("^(\w+(\(\w+(,\w+)*\))*(/\w+(\(\w+(,\w+)*\))*){0,2})(,\w+(\(\w+(,\w+)*\))*(/\w+(\(\w+(,\w+)*\))*){0,2})*")
        Dim mc as MatchCollection = re.Matches(sourcestring)
        Dim mIdx as Integer = 0
        For each m as Match in mc
          For groupIdx As Integer = 0 To m.Groups.Count - 1
            Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
          Next
          mIdx=mIdx+1
        Next
      End Sub
    End Module


    $matches Array:
    (
        [0] => Array
            (
                [0] => Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75),Listing(listing_id)/User(user_id,login_name),Buyer(user_id,login_name),Seller(user_id,login_name),Author(user_id,login_name,feedback_info)/Profile,Subject(user_id,login_name)
            )

        [1] => Array
            (
                [0] => Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75)
            )

        [2] => Array
            (
                [0] => (listing_id,user_id,title,materials,tags,url)
            )

        [3] => Array
            (
                [0] => ,url
            )

        [4] => Array
            (
                [0] => /Images(listing_image_id,url_75x75)
            )

        [5] => Array
            (
                [0] => (listing_image_id,url_75x75)
            )

        [6] => Array
            (
                [0] => ,url_75x75
            )

        [7] => Array
            (
                [0] => ,Subject(user_id,login_name)
            )

        [8] => Array
            (
                [0] => (user_id,login_name)
            )

        [9] => Array
            (
                [0] => ,login_name
            )

        [10] => Array
            (
                [0] => /Profile
            )

        [11] => Array
            (
                [0] => (user_id,login_name)
            )

        [12] => Array
            (
                [0] => ,login_name
            )

    )

     

    You can look in each group's Captures collection (not shown) to get the various parts broken down, but that won't show which part belongs to which element higher in the hierarchy. It would probably be easier to take each Include set and reprocess it to get the associated parts.
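The two-pass approach suggested above can be sketched in Python: pass 1 picks out each Include set, pass 2 splits that set into its levels. This assumes the string has already passed a whole-string validity check, because `finditer` silently skips text it cannot match (a fourth level, for instance, would come back as a separate chain). CHAIN_RE, NODE_RE and break_down are hypothetical names, and the character classes are the assumed ones from the rules in the first post.

```python
import re

# Assumed character classes (leading capital for names, lowercase fields, '_' allowed).
NAME = r"[A-Z][A-Za-z0-9_]*"
FIELDS = r"\([a-z0-9_]+(?:,[a-z0-9_]+)*\)"

# Pass 1: one match per Include set, i.e. per top-level comma-separated chain.
CHAIN_RE = re.compile(rf"{NAME}(?:{FIELDS})?(?:/{NAME}(?:{FIELDS})?){{0,2}}")
# Pass 2: one match per level inside a chain; group 1 = name, group 2 = fields.
NODE_RE = re.compile(rf"({NAME})(?:\(([^)]*)\))?")


def break_down(spec):
    """Return one list per chain, each holding (name, [fields]) per level."""
    chains = []
    for chain in CHAIN_RE.finditer(spec):
        levels = []
        for node in NODE_RE.finditer(chain.group(0)):
            fields = node.group(2).split(",") if node.group(2) else []
            levels.append((node.group(1), fields))
        chains.append(levels)
    return chains


for levels in break_down("Listing(listing_id)/User(user_id,login_name),Author(user_id)/Profile"):
    print(levels)
# [('Listing', ['listing_id']), ('User', ['user_id', 'login_name'])]
# [('Author', ['user_id']), ('Profile', [])]
```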


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  07-28-2010, 6:10 PM 70184 in reply to 70182

    Re: New to RegEx

    Thanks for replying. I'm not sure how to apply your code example above, but I do agree with your last statement. Here's the class representation I'm trying to build:
    Imports System.Collections
    Imports System.Collections.Generic

    Public Class Include
        Public fields As List(Of String)
        Public Includes As Includes(Of Include)
        Sub New()
            fields = New List(Of String)   ' was never initialised before
            Includes = New Includes(Of Include)
        End Sub
    End Class

    ' T is a plain type parameter; naming it "Include" shadowed the class above.
    Public Class Includes(Of T) : Implements IEnumerable(Of T)
    #Region "Constructor"
        Public Sub New()
        End Sub
    #End Region


    #Region "Public Properties"
        Private _Includes As New List(Of T)
        Default Public Property Item(ByVal index As Integer) As T
            Get
                Return _Includes(index)
            End Get
            Set(ByVal value As T)
                ' Replace the element at this index (Remove/Add ignored the index).
                _Includes(index) = value
            End Set
        End Property
    #End Region


    #Region "Public Methods & Functions"
        Public Sub Clear()
            _Includes.Clear()
        End Sub


        Public Sub Add(ByVal item As T)
            _Includes.Add(item)
        End Sub


        Public Function Count() As Integer
            Return _Includes.Count
        End Function


        Public Function GetEnumerator() As IEnumerator(Of T) _
                        Implements IEnumerable(Of T).GetEnumerator
            Return _Includes.GetEnumerator()
        End Function


        Public Function GetEnumerator1() As IEnumerator _
                        Implements IEnumerable.GetEnumerator
            Return _Includes.GetEnumerator()
        End Function
    #End Region


    #Region "Conversions"
        Public Shared Narrowing Operator CType(ByVal src As Includes(Of T)) As T()
            ' ToArray returns exactly Count elements; "Dim dest(src.Count)" made one too many.
            Return src._Includes.ToArray()
        End Operator


        Public Shared Widening Operator CType(ByVal src As T()) As Includes(Of T)
            Dim dest As New Includes(Of T)
            dest._Includes.AddRange(src)
            Return dest
        End Operator
    #End Region


    End Class


    I guess I was looking for a magic bullet that would break it down like this:

        Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75),
        Listing(listing_id)/User(user_id,login_name),
        Buyer(user_id,login_name),
        Seller(user_id,login_name),
        Author(user_id,login_name,feedback_info)/Profile,
        Subject(user_id,login_name)

    and then further down into subsections, and so on, with group names. Is this not possible?

  •  07-28-2010, 7:39 PM 70186 in reply to 70184

    Re: New to RegEx

    What about:

    \w+(\((,?\w+)+\)(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?

    Given a variant of your longest example (so as to get to 2 sub-levels) of:

    Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75)/SubSub(tester,another),

    the breakdown is as follows:

    Raw Match Pattern:
    \w+(\((,?\w+)+\)(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?

    VB.NET Code Example:

    Imports System.Text.RegularExpressions
    Module Module1
      Sub Main()
        Dim sourcestring as String = "replace with your source string"
        Dim re As Regex = New Regex("\w+(\((,?\w+)+\)(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?")
        Dim mc as MatchCollection = re.Matches(sourcestring)
        Dim mIdx as Integer = 0
        For each m as Match in mc
          For groupIdx As Integer = 0 To m.Groups.Count - 1
            Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
          Next
          mIdx=mIdx+1
        Next
      End Sub
    End Module


    Matches Found:
    [0][0] = Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75)/SubSub(tester,another)
    [0][1] = (listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75)/SubSub(tester,another)
    [0][2] = ,url
    [0][3] = /Images(listing_image_id,url_75x75)
    [0][4] = (listing_image_id,url_75x75)
    [0][5] = ,url_75x75
    [0][6] = /SubSub(tester,another)
    [0][7] = (tester,another)
    [0][8] = ,another

     The Captures collection of match group #2 holds all of the top-level arguments (this listing shows only the last one, but a tool such as Expresso shows the full breakdown and the individual captures); likewise match group #5 for the first sub-level and match group #8 for the second.
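The "only the last one" behaviour is easy to reproduce. In .NET, `Group.Value` returns the last capture and `Group.Captures` holds all of them. Python's standard `re` behaves the same for the value but has no Captures collection (the third-party `regex` package offers `Match.captures()`). A small Python sketch using Susan's pattern unchanged, with a workaround of re-scanning a group's text to recover every argument at that level:

```python
import re

# Susan's pattern from the post above, unchanged.
SUSAN = re.compile(r"\w+(\((,?\w+)+\)(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?")
text = ("Listing(listing_id,user_id,title,materials,tags,url)"
        "/Images(listing_image_id,url_75x75)/SubSub(tester,another),")

m = SUSAN.search(text)
# A repeated group reports only its last repetition, as in the listing above.
print(m.group(2), m.group(5), m.group(8))   # ,url ,url_75x75 ,another

# Python's re keeps no per-repetition captures, so re-scan the text of the
# enclosing group to get every argument at that level.
print(re.findall(r"\w+", m.group(4)))   # ['listing_image_id', 'url_75x75']
print(re.findall(r"\w+", m.group(7)))   # ['tester', 'another']
```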

    You can add in more match groups to capture the level names and either use group names or adjust the match group numbers accordingly.

    If a particular level does not exist, then that group (#3 for the first level and #6 for the second) and all later groups will not participate in the match: in .NET their Success property is False and their Value is an empty string.

    Susan

  •  07-28-2010, 8:58 PM 70188 in reply to 70186

    Re: New to RegEx

    Wow! So close, Susan... I left out one other rule by mistake: Includes don't have to have fields. So "Include/Subinclude/Subinclude" and "Include/Subinclude(field)/SubInclude" are also valid.
  •  07-28-2010, 10:56 PM 70194 in reply to 70188

    Re: New to RegEx

    Actually, I thought I had included that combination but I left out a pair of parentheses and a quantifier - #%^%^$&^. Try:

    \w+((\((,?\w+)+\))?(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?
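A quick check of the corrected pattern against the new rule, sketched in Python (these grouping constructs behave the same as in .NET). The test cases come from the thread; `fullmatch` stands in for anchoring the pattern with `\A...\z`. One caveat worth knowing: because the comma in `(,?\w+)+` is optional, the pattern also accepts a leading comma inside the parentheses. Writing the argument list as `\w+(?:,\w+)*` would reject that.

```python
import re

# Susan's corrected pattern from the post above, unchanged.
FIXED = re.compile(
    r"\w+((\((,?\w+)+\))?(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?"
)

# The field-less forms from the follow-up, plus a fully populated chain.
cases = [
    "Include/Subinclude/Subinclude",
    "Include/Subinclude(field)/SubInclude",
    "Listing(listing_id)/User(user_id,login_name)/Shops",
]

for case in cases:
    # fullmatch requires the whole chain to be consumed, not just a prefix.
    print(case, FIXED.fullmatch(case) is not None)   # each line ends in True

# Caveat: a leading comma inside the parentheses is accepted too.
print(FIXED.fullmatch("Include(,field)") is not None)   # True
```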

    Susan

  •  07-28-2010, 10:56 PM 70195 in reply to 70186

    Re: New to RegEx

    Hi Susan,

     

    Thanks again for pointing me in the right direction. I've learnt a lot from your expression. The final answer that works is:

     

    (\w+)(/(\w+)?(/(\w+)))?(/(\w+))?(\((,?(\w+))+\)(/(\w+)(\((,?(\w+))+\))?)?(/(\w+)(\((,?(\w+))+\))?)?)?

     

    Here's the revised test string, which covers all of the cases:

     

    Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75),Listing(listing_id)/User(user_id,login_name),Buyer(user_id,login_name),Seller(user_id,login_name),Author(user_id,login_name,feedback_info)/Profile,Subject(user_id,login_name)/User(user_id)/Profile(user_profile_id,login_name)/Subject(user_id,login_name),Listing/User/Shops,Listing(listing_id)/User(user_id,login_name)/Shops,Listings,Listings/Images

     

    G.

  •  07-28-2010, 11:02 PM 70197 in reply to 70194

    Re: New to RegEx

    Cool - thanks! Looks more efficient than my attempt :)

     This is a revised test string: 

     

    Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75),Listing(listing_id)/User(user_id,login_name),Buyer(user_id,login_name),Author(user_id,login_name,feedback_info)/Profile,Subject(user_id,login_name)/User(user_id)/Profile(user_profile_id,login_name),Listing/User/Shops,Listing/User(user_id,login_name)/Shops,Listings/Images,Subject/User/Profile(user_profile_id,login_name)

  •  07-28-2010, 11:07 PM 70198 in reply to 70194

    Re: New to RegEx

    Actually, there is a small problem with your solution:

     

    If you look at the test "Listings/Images", Groups 1 and 4 both return "/Images" rather than "Listings" and "/Images".

     

    Sorry.

     

    G.

  •  07-29-2010, 7:24 PM 70243 in reply to 70198

    Re: New to RegEx

    As I said several postings back:

    "You can add in more match groups to capture the level names and either use group names or adjust the match group numbers accordingly."

    The actual names are not captured explicitly in my pattern but you can do this with something like:

    (\w+)((\((,?\w+)+\))?(/(\w+)(\((,?\w+)+\))?)?(/(\w+)(\((,?\w+)+\))?)?)?

    Of course this will change the numbering of all groups but that can be accounted for in your code.
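To see the renumbering concretely, here is a Python check of the name-capturing pattern. With no named groups, Python and .NET number groups the same way, by counting opening parentheses from the left, so the top-level name lands in group 1, the first sub-level name in group 6 and the second in group 10. The sample strings are from the thread.

```python
import re

# Susan's name-capturing pattern from the post above, unchanged.
NAMED = re.compile(
    r"(\w+)((\((,?\w+)+\))?(/(\w+)(\((,?\w+)+\))?)?(/(\w+)(\((,?\w+)+\))?)?)?"
)

m1 = NAMED.fullmatch("Listings/Images")
# Group 1 = Include name, 6 = first sub-level name, 10 = second sub-level name.
print(m1.group(1), m1.group(6), m1.group(10))   # Listings Images None

m2 = NAMED.fullmatch(
    "Subject(user_id,login_name)/User(user_id)/Profile(user_profile_id,login_name)"
)
print(m2.group(1), m2.group(6), m2.group(10))   # Subject User Profile
```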

    There are probably match groups that you don't need, and you can make them non-capturing by starting them with '(?:'. For example, match group #2 in the above pattern probably does not need to be captured, so you could start the pattern

    (\w+)(?:(\((............
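Combining the '(?:' idea with names for the groups the interpreter does need might look like the Python sketch below. Python spells a named group `(?P<name>...)`; the .NET form is `(?<name>...)`, read back with `m.Groups("inc").Value`, and .NET's `RegexOptions.ExplicitCapture` makes every unnamed group non-capturing. The outer optional group from Susan's pattern is dropped because every part inside it is already optional, so the same strings match. The group names are hypothetical.

```python
import re

# Hypothetical group names; only the parts an interpreter reads are captured.
ARGS = r"\((?:,?\w+)+\)"
PATTERN = re.compile(
    rf"(?P<inc>\w+)(?P<inc_fields>{ARGS})?"
    rf"(?:/(?P<sub1>\w+)(?P<sub1_fields>{ARGS})?)?"
    rf"(?:/(?P<sub2>\w+)(?P<sub2_fields>{ARGS})?)?"
)

m = PATTERN.fullmatch("Listing(listing_id)/User(user_id,login_name)/Shops")
print(m.group("inc"), m.group("sub1"), m.group("sub2"))   # Listing User Shops
print(m.group("sub1_fields"))                              # (user_id,login_name)
print(m.group("sub2_fields"))                              # None
print(PATTERN.groups)                                      # 6
```

Six named groups instead of twelve numbered ones also means adding a level later will not shift any existing group numbers.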

    Susan

  •  07-30-2010, 2:49 AM 70251 in reply to 70243

    Re: New to RegEx

    I've learnt a lot from you in a very short time; much appreciated for hanging in there... The expression has grown considerably (62 groups) and the interpreter is now complete.

     

    Many Thanks

     

    G.

  •  08-01-2010, 6:57 PM 70343 in reply to 70251

    Re: New to RegEx

    Good to hear.

    Now try to maintain that monster!

    Seriously, if you are ending up with a pattern that complex for interpreting a computer language, I would recommend looking at some of the parsing tools that are around (lex, bison, etc.) and using those. My experience is that LL(1)-style parsers are fairly quick and easy to create (and maintain) and perform better than regex patterns, especially if you have large volumes of text to process.
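As an illustration of the parser route, here is a minimal hand-written recursive-descent parser sketched in Python rather than generated with lex/bison. The grammar in the docstring is reconstructed from the rules in the first post, with the same assumed character classes as before; it is LL(1) because the next character alone decides every step. The Include class mirrors the VB.NET class earlier in the thread, with a `name` attribute added as an assumption. Parser, parse and the character-set names are hypothetical. Chains that share a top-level name (two `Listing(...)` entries, say) are returned separately, not merged.

```python
import string
from dataclasses import dataclass, field
from typing import List

# Assumed character classes: leading capital for names, lowercase fields, '_' allowed.
NAME_START = set(string.ascii_uppercase)
NAME_CHARS = set(string.ascii_letters + string.digits + "_")
FIELD_CHARS = set(string.ascii_lowercase + string.digits + "_")


@dataclass
class Include:
    """An Include or Subinclude: name, selected fields, nested Includes."""
    name: str
    fields: List[str] = field(default_factory=list)
    includes: List["Include"] = field(default_factory=list)


class Parser:
    """Recursive-descent parser; one character of lookahead picks each step.

    spec  := chain ("," chain)*
    chain := node ("/" node){0,2}
    node  := NAME ["(" FIELD ("," FIELD)* ")"]
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # "" at end of input; "" is never a member of the character sets.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def error(self) -> ValueError:
        return ValueError(f"unexpected {self.peek()!r} at position {self.pos}")

    def take(self, first: set, rest: set) -> str:
        start = self.pos
        if self.peek() not in first:
            raise self.error()
        self.pos += 1
        while self.peek() in rest:
            self.pos += 1
        return self.text[start:self.pos]

    def node(self) -> Include:
        inc = Include(self.take(NAME_START, NAME_CHARS))
        if self.peek() == "(":
            self.pos += 1
            inc.fields.append(self.take(FIELD_CHARS, FIELD_CHARS))
            while self.peek() == ",":
                self.pos += 1
                inc.fields.append(self.take(FIELD_CHARS, FIELD_CHARS))
            if self.peek() != ")":
                raise self.error()
            self.pos += 1
        return inc

    def chain(self) -> Include:
        top = parent = self.node()
        for _ in range(2):  # at most two sub-levels below the Include
            if self.peek() != "/":
                break
            self.pos += 1
            child = self.node()
            parent.includes.append(child)
            parent = child
        return top

    def spec(self) -> List[Include]:
        result = [self.chain()]
        while self.peek() == ",":
            self.pos += 1
            result.append(self.chain())
        if self.pos != len(self.text):  # e.g. a fourth level or stray text
            raise self.error()
        return result


def parse(text: str) -> List[Include]:
    return Parser(text).spec()


tree = parse("Listing(listing_id)/User(user_id,login_name)/Shops,Buyer")
print(tree[0].name, tree[0].fields)          # Listing ['listing_id']
print(tree[0].includes[0].includes[0].name)  # Shops
```

Each grammar rule maps to one method, so adding a rule later means adding one method rather than re-deriving a 62-group pattern.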

    Susan
