New to RegEx
07-28-2010, 11:01 AM | GraGra33 | Joined on 07-29-2010 | Posts 9
I'm working with VB.NET and a RESTful API. They have a construction for selecting resources (Includes) and fields. I've written a wrapper and want to add syntax checking before making any calls. I want to use RegEx but am struggling to understand it and need help.

The calling syntax is nested up to 3 levels, with field selection for each nested association (Include), as follows:

Include,Include/Subinclude,Include(field,field,...,field)/Subinclude(field,field,...,field),...,Include(field,field,...,field)/Subinclude(field,field,...,field)/Subinclude(field,field,...,field)

* Fields must be associated with Includes, are separated by commas, and contain only lowercase letters & numbers.
* Includes are separated by commas and are title case letters & numbers.
* Includes can specify fields wrapped in parentheses, e.g. Include(field,field,...,field)
* Includes can be nested with two sub-levels. Subincludes are nested using '/' and are title case letters & numbers, e.g. Include/Subinclude/Subinclude
* Every Subinclude can have fields, just like Includes.
* There can be any number of each type of element, usage is variable, and Includes can only be nested to 3 levels (Include/Subinclude/Subinclude).

I can code this in VB.NET but want to use RegEx for its speed benefits. I'm sure this is dead easy, but I'm not sure how to work with nested multi-level optional elements in RegEx and would really appreciate some help from the gurus...

Many Thanks,
Graeme.

07-28-2010, 12:02 PM | GraGra33
Here's an example that needs to be parsed:

Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75),Listing(listing_id)/User(user_id,login_name),Buyer(user_id,login_name),Seller(user_id,login_name),Author(user_id,login_name,feedback_info)/Profile,Subject(user_id,login_name)

Forgot to mention that the '_' underscore character is OK in names.

Here's the RegEx expression I've built so far:

[0-9a-fA-F]*((.|\n)*?)((|)|,|\)

I'm trying to strip out all the text and build a nested representation, so being able to identify level/type groups would be a bonus...

Thanks again,
G.

07-28-2010, 5:00 PM | mash | Joined on 04-14-2005 | Birmingham, AL | Posts 2,171
Raw Match Pattern:
^(\w+(\(\w+(,\w+)*\))*(/\w+(\(\w+(,\w+)*\))*){0,2})(,\w+(\(\w+(,\w+)*\))*(/\w+(\(\w+(,\w+)*\))*){0,2})*
VB.NET Code Example:
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim sourcestring As String = "replace with your source string"
        Dim re As Regex = New Regex("^(\w+(\(\w+(,\w+)*\))*(/\w+(\(\w+(,\w+)*\))*){0,2})(,\w+(\(\w+(,\w+)*\))*(/\w+(\(\w+(,\w+)*\))*){0,2})*")
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
            Next
            mIdx = mIdx + 1
        Next
    End Sub
End Module
$matches Array:
[0] => Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75),Listing(listing_id)/User(user_id,login_name),Buyer(user_id,login_name),Seller(user_id,login_name),Author(user_id,login_name,feedback_info)/Profile,Subject(user_id,login_name)
[1] => Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75)
[2] => (listing_id,user_id,title,materials,tags,url)
[3] => ,url
[4] => /Images(listing_image_id,url_75x75)
[5] => (listing_image_id,url_75x75)
[6] => ,url_75x75
[7] => ,Subject(user_id,login_name)
[8] => (user_id,login_name)
[9] => ,login_name
[10] => /Profile
[11] => (user_id,login_name)
[12] => ,login_name
You can look in the Captures collection (not shown) to get various parts broken down but that won't show which part is related to others higher in the hierarchy. It would probably be easier to take each Include set and reprocess them to get the associated parts.
Michael "In theory, theory and practice are the same. In practice, they are not." Albert Einstein
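For the syntax-checking goal in the original question, note that a pattern anchored only at the start (^) still reports a match when the tail of the string is invalid. Below is a minimal sketch of a whole-string check, assuming the entire input must be consumed. The names Part, Chain, Validator and IsValid are made up for illustration. Unlike the pattern above, it allows at most one field list per Include (? instead of *) and uses non-capturing groups, since nothing is extracted. \w accepts letters, digits and underscore, so the title-case and lowercase rules from the question are not enforced here.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ValidateIncludes
    ' One Include with an optional field list, e.g. Listing(listing_id,url)
    Private Const Part As String = "\w+(?:\(\w+(?:,\w+)*\))?"
    ' An Include followed by at most two Subincludes
    Private Const Chain As String = Part & "(?:/" & Part & "){0,2}"
    ' Whole string: one or more comma-separated chains, anchored at both ends
    Private ReadOnly Validator As New Regex("^" & Chain & "(?:," & Chain & ")*$")

    Function IsValid(ByVal source As String) As Boolean
        Return Validator.IsMatch(source)
    End Function

    Sub Main()
        ' Well-formed: two chains, the first nested one level
        Console.WriteLine(IsValid("Listing(listing_id)/User(user_id,login_name),Buyer"))
        ' Four levels deep: rejected
        Console.WriteLine(IsValid("Listing(listing_id)/User/Shops/TooDeep"))
        ' Unclosed field list: rejected
        Console.WriteLine(IsValid("Listing(listing_id"))
    End Sub
End Module
```

This prints True, False, False. Once a string passes the check, each Include set can be split off and reprocessed separately, as suggested above.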

07-28-2010, 6:10 PM | GraGra33
Thanks for replying. I'm not sure how to apply your code example above, but I do agree with your last statement. Here's the class representation I'm trying to build:

Imports System.Collections.Generic

Public Class Include
    Public fields As List(Of String)
    Public Includes As Includes(Of Include)

    Sub New()
        Includes = New Includes(Of Include)
    End Sub
End Class

Public Class Includes(Of Include)
    Implements IEnumerable(Of Include)

#Region "Constructor"
    Public Sub New()
    End Sub
#End Region

#Region "Public Properties"
    Private _Includes As New List(Of Include)

    Default Public Property Item(ByVal index As Integer) As Include
        Get
            Return _Includes(index)
        End Get
        Set(ByVal value As Include)
            ' Replaces by value: removes any existing copy, then appends (index is not used)
            Try
                _Includes.Remove(value)
            Catch ex As Exception
                'Handle Error
            End Try
            _Includes.Add(value)
        End Set
    End Property
#End Region

#Region "Public Methods & Functions"
    Public Sub Clear()
        _Includes.Clear()
    End Sub

    Public Sub Add(ByVal item As Include)
        _Includes.Add(item)
    End Sub

    Public Function Count() As Integer
        Return _Includes.Count
    End Function

    Public Function GetEnumerator() As System.Collections.Generic.IEnumerator(Of Include) _
        Implements IEnumerable(Of Include).GetEnumerator
        Return _Includes.GetEnumerator
    End Function

    Public Function GetEnumerator1() As System.Collections.IEnumerator _
        Implements System.Collections.IEnumerable.GetEnumerator
        Return _Includes.GetEnumerator
    End Function
#End Region

#Region "Boxing"
    Public Shared Narrowing Operator CType(ByVal src As Includes(Of Include)) As Include()
        ' VB array bounds are inclusive, so the upper bound is Count - 1
        Dim dest(src.Count - 1) As Include
        For i As Integer = 0 To src.Count - 1
            dest(i) = src(i)
        Next
        Return dest
    End Operator

    Public Shared Widening Operator CType(ByVal src As Include()) As Includes(Of Include)
        Dim dest As New Includes(Of Include)
        For i As Integer = 0 To src.Length - 1
            dest.Add(src(i))
        Next
        Return dest
    End Operator
#End Region
End Class
I guess that I was looking for a magic bullet that would break it down like:
Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75)
Listing(listing_id)/User(user_id,login_name)
Buyer(user_id,login_name)
Seller(user_id,login_name)
Author(user_id,login_name,feedback_info)/Profile
Subject(user_id,login_name)
Then further down into sub-sections, and so on, with group names. Is this not possible?

07-28-2010, 7:39 PM
What about:
\w+(\((,?\w+)+\)(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?
Given a variant of your longest example (so as to get to 2 sub-levels) of:
Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75)/SubSub(tester,another)
the breakdown is as follows:
Raw Match Pattern:
\w+(\((,?\w+)+\)(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?
VB.NET Code Example:
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim sourcestring As String = "replace with your source string"
        Dim re As Regex = New Regex("\w+(\((,?\w+)+\)(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?")
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
            Next
            mIdx = mIdx + 1
        Next
    End Sub
End Module
Matches Found:
[0][0] = Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75)/SubSub(tester,another)
[0][1] = (listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75)/SubSub(tester,another)
[0][2] = ,url
[0][3] = /Images(listing_image_id,url_75x75)
[0][4] = (listing_image_id,url_75x75)
[0][5] = ,url_75x75
[0][6] = /SubSub(tester,another)
[0][7] = (tester,another)
[0][8] = ,another
The captures of match group #2 contain all of the arguments at the top level (you only see the last one in this listing, but if you use something like Expresso you get to see the full breakdown and the individual captures); ditto for match group #5 for the first "sub" level and match group #8 for the 2nd level. You can add in more match groups to capture the level names and either use group names or adjust the match group numbers accordingly. If a particular level does not exist, then that group (#3 for the first level and #6 for the second) and the later groups will all be null.
Susan
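The Captures collection mentioned here can also be read from code rather than from a tool. A minimal sketch, assuming the pattern and group numbers described above (2, 5 and 8 for the three levels); the module name and the shortened sample string are made up for illustration:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ShowCaptures
    Sub Main()
        Dim source As String = "Listing(listing_id,user_id,title)/Images(listing_image_id,url_75x75)/SubSub(tester,another)"
        Dim re As New Regex("\w+(\((,?\w+)+\)(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?")
        Dim m As Match = re.Match(source)

        ' Groups 2, 5 and 8 are the repeated (,?\w+) groups, one per nesting level.
        For Each groupIdx As Integer In New Integer() {2, 5, 8}
            Dim g As Group = m.Groups(groupIdx)
            ' A level that is absent leaves its group unsuccessful.
            If Not g.Success Then Continue For
            For Each c As Capture In g.Captures
                ' Each capture still carries the optional leading comma; trim it off.
                Console.WriteLine("group {0}: {1}", groupIdx, c.Value.TrimStart(","c))
            Next
        Next
    End Sub
End Module
```

m.Groups(n).Value only returns the last capture, which is why the listing above shows ",url"; iterating Captures yields every field in order.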

07-28-2010, 8:58 PM

07-28-2010, 10:56 PM
Actually, I thought I had included that combination but I left out a pair of parentheses and a quantifier - #%^%^$&^. Try:
\w+((\((,?\w+)+\))?(/\w+(\((,?\w+)+\))?)?(/\w+(\((,?\w+)+\))?)?)?
Susan

07-28-2010, 10:56 PM | GraGra33
Hi Susan,
Thanks again for pointing me in the right direction. I've learnt a lot from your expression. The final answer that works is:
(\w+)(/(\w+)?(/(\w+)))?(/(\w+))?(\((,?(\w+))+\)(/(\w+)(\((,?(\w+))+\))?)?(/(\w+)(\((,?(\w+))+\))?)?)?
Here's the revised test string that makes sure all cases are covered:
Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75),Listing(listing_id)/User(user_id,login_name),Buyer(user_id,login_name),Seller(user_id,login_name),Author(user_id,login_name,feedback_info)/Profile,Subject(user_id,login_name)/User(user_id)/Profile(user_profile_id,login_name)/Subject(user_id,login_name),Listing/User/Shops,Listing(listing_id)/User(user_id,login_name)/Shops,Listings,Listings/Images
G.

07-28-2010, 11:02 PM | GraGra33
Cool - thanks! Looks more efficient than my attempt :) This is a revised test string:
Listing(listing_id,user_id,title,materials,tags,url)/Images(listing_image_id,url_75x75),Listing(listing_id)/User(user_id,login_name),Buyer(user_id,login_name),Author(user_id,login_name,feedback_info)/Profile,Subject(user_id,login_name)/User(user_id)/Profile(user_profile_id,login_name),Listing/User/Shops,Listing/User(user_id,login_name)/Shops,Listings/Images,Subject/User/Profile(user_profile_id,login_name)

07-28-2010, 11:07 PM | GraGra33
Actually, there is a small problem with your solution: if you look at the test "Listings/Images", groups 1 & 4 both return "/Images", not "Listings" & "/Images". Sorry.
G.

07-29-2010, 7:24 PM
As I said several postings back: "You can add in more match groups to capture the level names and either use group names or adjust the match group numbers accordingly." The actual names are not captured explicitly in my pattern, but you can do this with something like:
(\w+)((\((,?\w+)+\))?(/(\w+)(\((,?\w+)+\))?)?(/(\w+)(\((,?\w+)+\))?)?)?
Of course this will change the numbering of all groups, but that can be accounted for in your code. There are probably matching groups that you do not need, and you can make them non-capturing by starting them with '(?:'. For example, match group #2 in the above pattern probably does not need to be captured, so you could start the pattern:
(\w+)(?:(\((............
Susan
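To illustrate the advice about group names and non-capturing groups, here is a rough sketch rather than the pattern actually used in this thread. The group names (inc, sub1, sub2 and their ...Fields counterparts) and the FieldList helper are hypothetical. It relies on .NET letting the same group name appear more than once, so every field of a level accumulates in one group's Captures without the leading comma.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroups
    ' Optional "(field,field,...)" list; every field is captured into the group named fieldGroup.
    Private Function FieldList(ByVal fieldGroup As String) As String
        Return "(?:\((?<" & fieldGroup & ">\w+)(?:,(?<" & fieldGroup & ">\w+))*\))?"
    End Function

    Sub Main()
        ' Include, then up to two optional Subinclude levels; only names and fields are captured.
        Dim pattern As String = _
            "(?<inc>\w+)" & FieldList("incFields") & _
            "(?:/(?<sub1>\w+)" & FieldList("sub1Fields") & ")?" & _
            "(?:/(?<sub2>\w+)" & FieldList("sub2Fields") & ")?"
        Dim re As New Regex(pattern)

        For Each m As Match In re.Matches("Listing(listing_id)/User(user_id,login_name),Listings/Images")
            For Each name As String In New String() {"inc", "sub1", "sub2"}
                Dim level As Group = m.Groups(name)
                ' Skip levels that are not present in this Include chain.
                If Not level.Success Then Continue For
                Console.Write("{0}={1} fields:", name, level.Value)
                For Each c As Capture In m.Groups(name & "Fields").Captures
                    Console.Write(" " & c.Value)
                Next
                Console.WriteLine()
            Next
        Next
    End Sub
End Module
```

Because the groups are looked up by name, adding or removing parentheses elsewhere in the pattern no longer shifts the indexes the calling code depends on.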

07-30-2010, 2:49 AM | GraGra33
I've learnt a lot from you in a very short period of time. Thanks very much for hanging in there... The expression has grown considerably (62 groups) and the interpreter is now complete.
Many Thanks,
G.

08-01-2010, 6:57 PM
Good to hear. Now try to maintain that monster! Seriously, if you are getting a pattern that is that complex for interpreting a computer language, I would recommend you look at using some of the parsers that are around (lex, bison, etc.). My experience is that LL(1) style parsers are fairly quick and easy to create (and maintain) and perform better than regex patterns, especially if you have large volumes of text to process.
Susan
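As a rough illustration of the parser approach, here is a small hand-written recursive-descent sketch for the Include syntax from the first post. It is not generated by lex or bison, and the IncludeNode and IncludeParser types are hypothetical (they are not the Include class posted earlier). It accepts any letters, digits and underscores in names, so the title-case and lowercase rules are not enforced.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical tree node for this sketch: one Include or Subinclude level.
Class IncludeNode
    Public Name As String
    Public Fields As New List(Of String)
    Public Child As IncludeNode   ' next nesting level, or Nothing
End Class

Class IncludeParser
    Private ReadOnly _text As String
    Private _pos As Integer = 0

    Sub New(ByVal text As String)
        _text = text
    End Sub

    ' list := chain ("," chain)*
    Public Function ParseList() As List(Of IncludeNode)
        Dim result As New List(Of IncludeNode)
        result.Add(ParseChain(1))
        While Accept(","c)
            result.Add(ParseChain(1))
        End While
        If _pos < _text.Length Then Fail("unexpected character")
        Return result
    End Function

    ' chain := name ["(" name ("," name)* ")"] ["/" chain], at most 3 levels deep
    Private Function ParseChain(ByVal depth As Integer) As IncludeNode
        Dim node As New IncludeNode()
        node.Name = ParseName()
        If Accept("("c) Then
            node.Fields.Add(ParseName())
            While Accept(","c)
                node.Fields.Add(ParseName())
            End While
            If Not Accept(")"c) Then Fail("expected ')'")
        End If
        If Accept("/"c) Then
            If depth >= 3 Then Fail("more than 3 levels")
            node.Child = ParseChain(depth + 1)
        End If
        Return node
    End Function

    ' name := one or more letters, digits or underscores
    Private Function ParseName() As String
        Dim start As Integer = _pos
        While _pos < _text.Length AndAlso (Char.IsLetterOrDigit(_text(_pos)) OrElse _text(_pos) = "_"c)
            _pos += 1
        End While
        If _pos = start Then Fail("expected a name")
        Return _text.Substring(start, _pos - start)
    End Function

    ' Consumes c if it is the next character; one character of lookahead is enough.
    Private Function Accept(ByVal c As Char) As Boolean
        If _pos < _text.Length AndAlso _text(_pos) = c Then
            _pos += 1
            Return True
        End If
        Return False
    End Function

    Private Sub Fail(ByVal message As String)
        Throw New FormatException(String.Format("{0} at position {1}", message, _pos))
    End Sub
End Class

Module Demo
    Sub Main()
        Dim parser As New IncludeParser("Listing(listing_id)/User(user_id,login_name)/Shops,Listings/Images")
        For Each root As IncludeNode In parser.ParseList()
            Dim level As IncludeNode = root
            Dim indent As String = ""
            ' Walk down the chain, printing each level with its fields.
            While level IsNot Nothing
                Console.WriteLine("{0}{1} [{2}]", indent, level.Name, String.Join(",", level.Fields.ToArray()))
                indent &= "  "
                level = level.Child
            End While
        Next
    End Sub
End Module
```

Each grammar rule maps to one function, so the nesting limit and the error messages sit in plain code instead of being spread across dozens of numbered groups. Invalid input raises a FormatException that names the position where parsing stopped.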