
group capturing help

Last post 12-01-2008, 9:30 AM by kminev. 20 replies.
  •  11-20-2008, 10:54 PM 48602

    group capturing help

    Hi,

    I am fairly new to regular expressions, and I am parsing some log files using Java and RegEx. I used a web tool that generated a portion of the RegEx pattern I needed, but I still need to modify it to get it to work for my purpose.

    Here is my RegEx:

     

     regEx.append("((?:(?:[0-2]?\\d{1})|(?:[3][0,1]{1}))[-:\\/.](?:[0]?[1-9]|[1][012])[-:\\/.](?:(?:[1]{1}\\d{1}\\d{1}\\d{1})|(?:[2]{1}\\d{3})))(?![\\d])"); // DDMMYYYY 1 // GROUP 1  Date
            regEx.append(".*?");
            regEx.append("((?:(?:[0-1][0-9])|(?:[2][0-3])|(?:[0-9])):(?:[0-5][0-9])(?::[0-5][0-9])?(?:\\s?(?:am|AM|pm|PM))?)");// HourMinuteSec 1 Group 2 Time
            regEx.append(".*?");
            regEx.append("(?:OrderStatus)");//Group 3 OrderStaus
            regEx.append(".*?");
            regEx.append("(?:Routed)");//Group 4 Routed
            regEx.append(".*?");
            regEx.append("(?:[a-z][a-z0-9_]*)");
            regEx.append(".*?");
            regEx.append("((?:[a-z][a-z0-9_]*))");//Group 5 Action
            regEx.append(".*?");
            regEx.append("((?:[a-z][a-z0-9_]*))");//Group 6 B/S
            regEx.append(".*?");
            regEx.append("(?:[a-z][a-z0-9_]*)");
            regEx.append(".*?");
            regEx.append("((?:[a-z0-9_][a-z0-9[/s/w]_]*))");//Group 7 SOURCE
            regEx.append(".*?");
            regEx.append("((?:[a-z][a-z0-9_-]*))");//Group 8 Exchange
            regEx.append(".*?");       
            regEx.append("(?:[a-z][a-z0-9_]*)");//miss one match
            regEx.append(".*?");       
            regEx.append("((?:[a-z0-9_][a-z0-9_]*))");//Group 9 Instrument
            regEx.append(".*?");
            regEx.append("(\\d+)"); // (GROUP 10 instrument digits)
            regEx.append(".*?");
            regEx.append("(?:[a-z][a-z0-9_]*)");
            regEx.append(".*?");
            regEx.append("(?:[a-z][a-z0-9_]*)");
            regEx.append(".*?");
            regEx.append("(?:[a-z][a-z0-9_]*)");
            regEx.append(".*?");
            regEx.append("(?:[a-z][a-z0-9_]*)");
            regEx.append(".*?");
            regEx.append(".*?");
            regEx.append("(?:([a-z0-9][a-z0-9_]*))");//Group 11 TraderID
            regEx.append(".*?");
            regEx.append("((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?![\\d])"); //Group 12 IP Address
            regEx.append(".*?");
            regEx.append("(?:[a-z][a-z0-9_]*)");
            regEx.append(".*?");
            regEx.append("((?:[a-z0-9_-][a-z0-9_-]*))");//Group 13 Exchange OrderID
            regEx.append(".*?");
            regEx.append("\\d");
            regEx.append(".*?");
            regEx.append("\\d");
            regEx.append(".*?");
            regEx.append("\\d");
            regEx.append(".*?");
            regEx.append("\\d");
            regEx.append(".*?");
            regEx.append("\\d");
            regEx.append(".*?");
            regEx.append("\\d");
            regEx.append(".*?");
            regEx.append("(\\d+)");//GROUP 14 Latency

     

    And this is my input:

     

    29.10.2008 03:25:04.062 | ORDERSERVER/PROD | 2340 | INFO | 00000000 | LJK063 | OrderStatus(Routed=(sts:8202 D B O Autotrader   15 XXX-DD SPR GE 0000     0.0   28.0  L   X:  15 W:   0    B15086525 XXXXXXXXXXXXXXXXX A1     GTC    08:29:47.000 No. 7198364 sndr XX.XX.XX.XX something_order_id 0004AAAK sok: 060EDT011) Latency=16) 

    Here is what I get as a result:

    (29.10.2008)  (03:25:04.)  (D)  (B)  (Autotrader)  (XXX-DD)  (ES)  (0309)  (XXXXXXXXX)  (XX.XXX.XX.XXX)  (0008FG74)

     

    I am also trying to capture the latency number.

     

    Any advice will be appreciated.

    Thanks in advance.

  •  11-21-2008, 12:30 AM 48605 in reply to 48602

    Re: group capturing help

    My advice is to take some time to learn how to write your own regexes, and to stop using the tool that generated that pattern. The generated pattern is needlessly verbose, and if you are just learning how to write regexes you shouldn't be learning to do unnecessary things.

    I don't see how the data in your results, which I'm assuming are just the capture groups, matches the input from your sample. There is no ES or 0309 in the provided input sample.

    You'll need to provide a sample where the output can be derived from the input, and explain to us in detail what you are trying to match.

    Also I'm fairly sure there are programs that parse log files that may already do what you are trying to do.


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  11-21-2008, 12:49 AM 48606 in reply to 48605

    Re: group capturing help

    To follow on from Michael's comments, I spent quite a bit of time trying to extract the regex pattern from your source code and it seems to go into an infinite loop. However I'm not surprised when it has sub-patterns such as '.*?.*?' which is almost guaranteed to cause major performance problems.
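
    A minimal sketch (not from the original posts) of why the doubled wildcard is pure overhead: '.*?.*?' can match nothing that a single '.*?' cannot, so both patterns below capture the same text, but the doubled form gives the engine extra ways to split the gap each time it backtracks. The class name, method name, and sample string are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyGapDemo {
    // Returns the first captured group, or null when the pattern does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "INFO | OrderStatus(Routed=(sts:8202 D B O Autotrader) Latency=16)";
        // One lazy gap and two adjacent lazy gaps capture the same value.
        System.out.println(firstGroup("OrderStatus.*?Latency=(\\d+)", line));    // prints 16
        System.out.println(firstGroup("OrderStatus.*?.*?Latency=(\\d+)", line)); // prints 16
    }
}
```

    Since the second wildcard adds no matching power, collapsing each run of adjacent '.*?' into one is a safe first clean-up.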

    Having said that, when I cut down the pattern to something that would actually complete, I can regenerate the first 4 matches that you get (even though the comments appear to be out of step!) but I get a completely different value for the 5th match - namely "ng_" from "something_order" which seems to be way beyond the place where your version matches.

    At this point I gave up!

    Can you do yourself and everyone in the forum a favour and tell us what you are trying to achieve, rather than have us reverse engineer a whole lot of code?

    Susan

  •  11-21-2008, 12:16 PM 48633 in reply to 48606

    Re: group capturing help

    Anybody who can shed some light for me???

    Your help will be appreciated.

     

    Thanks.

  •  11-21-2008, 4:07 PM 48651 in reply to 48633

    Re: group capturing help

    This is how far I got, but it keeps blowing up.

            StringBuilder regExB = new StringBuilder();
            //regExB.append("^(?([^\s]*?)\\s+)");
            regExB.append("(?<time>[^\s]*?)\s+");
            regExB.append(".*?OrderStatus\(Routed=\(\w+:\w+\s+\w\s+\w\s+");
            regExB.append("(?<autosomething>\w+)\s+\d+\s+");
            regExB.append("(?<cmesomething>[A-Z-])\s+\w+\w+");
            regExB.append("(?<another1>\w+)\s+");
            regExB.append("(?<another2>\d{4})\s+");
            regExB.append(".*?");
            regExB.append("Latency=(?<latency>\d+)\)$";
            regExB.append("*");")");

    It sort of worked with C#, but when I plug it into Java, no luck at all.

  •  11-21-2008, 4:47 PM 48656 in reply to 48652

    Re: group capturing help

    kminev:

    This is how far I got, but it keeps blowing up.

            StringBuilder regExB = new StringBuilder();
            //regExB.append("^(?([^\s]*?)\\s+)");
            regExB.append("(?<time>[^\s]*?)\s+");
            regExB.append(".*?OrderStatus\(Routed=\(\w+:\w+\s+\w\s+\w\s+");
            regExB.append("(?<autosomething>\w+)\s+\d+\s+");
            regExB.append("(?<cmesomething>[A-Z-])\s+\w+\w+");
            regExB.append("(?<another1>\w+)\s+");
            regExB.append("(?<another2>\d{4})\s+");
            regExB.append(".*?");
            regExB.append("Latency=(?<latency>\d+)\)$";
            regExB.append("*");")");

    It sort of worked with C#, but when I plug it into Java, no luck at all.

    First thing: regex support isn't universal across languages, so you can't always test with one language and apply the pattern in another. That's why we ask in the posting guidelines which language you are using. Java doesn't support named groups like .NET does. If you are writing a regex to use in Java, find a Java regex tester. You'll have to access the groups by their ordinal positions.
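
    A hedged sketch of what accessing groups by ordinal position looks like in Java. Groups are numbered by the order of their opening parentheses, and group 0 is the whole match. The pattern is a cut-down illustration rather than the full one, and the class and method names are invented. (Java 7 and later do accept (?<name>...) together with Matcher.group(String), but Java 7 was not yet released when this thread was written.)

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrdinalGroups {
    // Group 1 = date, group 2 = time, group 3 = latency (numbered left to right).
    private static final Pattern LINE = Pattern.compile(
            "^(\\d\\d\\.\\d\\d\\.\\d{4})\\s+(\\d\\d:\\d\\d:\\d\\d\\.\\d+).*?Latency=(\\d+)");

    // Returns {date, time, latency}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] f = parse("17.11.2008 00:00:03.513 | INFO | OrderStatus(Routed=(sts:8202) Latency=16)");
        System.out.println(f[0] + " / " + f[1] + " / " + f[2]); // prints 17.11.2008 / 00:00:03.513 / 16
    }
}
```

    Keeping the group numbers documented in a comment next to the pattern helps when groups are later added or removed, because every number after the change shifts.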

    Here is a pattern you can try. I didn't have time to write a pattern that validates the data (i.e., that a date format is really a date), but that may be more than you need. If you do need it, it can always be added later. I basically matched what you had highlighted based on its position in the line. You could do stricter matching.

    Raw Match Pattern:
    (?=.+?OrderStatus\(Routed=)((?:\d\d\.){2}\d{4})\s+((?:\d\d:){2}\d\d\.\d+).+?OrderStatus\(Routed=\S+\x20[A-Z]\x20[A-Z]\x20[A-Z]\x20([A-Z\x20]+)\x20\d+\x20(CME-[A-Z])\s\S+\s(\w{2}\x20\d+).+?(TTORD\S+).+?((?:\d{1,3}\.){3}\d{1,3})\x20exchange_order_id\x20(\w+).+?Latency=(\d+)

    Match Pattern Explanation:
    The regular expression:

    (?im-sx:(?=.+?OrderStatus\(Routed=)((?:\d\d\.){2}\d{4})\s+((?:\d\d:){2}\d\d\.\d+).+?OrderStatus\(Routed=\S+\x20[A-Z]\x20[A-Z]\x20[A-Z]\x20([A-Z\x20]+)\x20\d+\x20(CME-[A-Z])\s\S+\s(\w{2}\x20\d+).+?(TTORD\S+).+?((?:\d{1,3}\.){3}\d{1,3})\x20exchange_order_id\x20(\w+).+?Latency=(\d+))

    matches as follows:

    NODE EXPLANATION
    ----------------------------------------------------------------------
    (?im-sx: group, but do not capture (case-insensitive)
    (with ^ and $ matching start and end of
    line) (with . not matching \n) (matching
    whitespace and # normally):
    ----------------------------------------------------------------------
    (?= look ahead to see if there is:
    ----------------------------------------------------------------------
    .+? any character except \n (1 or more times
    (matching the least amount possible))
    ----------------------------------------------------------------------
    OrderStatus 'OrderStatus'
    ----------------------------------------------------------------------
    \( '('
    ----------------------------------------------------------------------
    Routed= 'Routed='
    ----------------------------------------------------------------------
    ) end of look-ahead
    ----------------------------------------------------------------------
    ( group and capture to \1:
    ----------------------------------------------------------------------
    (?: group, but do not capture (2 times):
    ----------------------------------------------------------------------
    \d digits (0-9)
    ----------------------------------------------------------------------
    \d digits (0-9)
    ----------------------------------------------------------------------
    \. '.'
    ----------------------------------------------------------------------
    ){2} end of grouping
    ----------------------------------------------------------------------
    \d{4} digits (0-9) (4 times)
    ----------------------------------------------------------------------
    ) end of \1
    ----------------------------------------------------------------------
    \s+ whitespace (\n, \r, \t, \f, and " ") (1 or
    more times (matching the most amount
    possible))
    ----------------------------------------------------------------------
    ( group and capture to \2:
    ----------------------------------------------------------------------
    (?: group, but do not capture (2 times):
    ----------------------------------------------------------------------
    \d digits (0-9)
    ----------------------------------------------------------------------
    \d digits (0-9)
    ----------------------------------------------------------------------
    : ':'
    ----------------------------------------------------------------------
    ){2} end of grouping
    ----------------------------------------------------------------------
    \d digits (0-9)
    ----------------------------------------------------------------------
    \d digits (0-9)
    ----------------------------------------------------------------------
    \. '.'
    ----------------------------------------------------------------------
    \d+ digits (0-9) (1 or more times (matching
    the most amount possible))
    ----------------------------------------------------------------------
    ) end of \2
    ----------------------------------------------------------------------
    .+? any character except \n (1 or more times
    (matching the least amount possible))
    ----------------------------------------------------------------------
    OrderStatus 'OrderStatus'
    ----------------------------------------------------------------------
    \( '('
    ----------------------------------------------------------------------
    Routed= 'Routed='
    ----------------------------------------------------------------------
    \S+ non-whitespace (all but \n, \r, \t, \f,
    and " ") (1 or more times (matching the
    most amount possible))
    ----------------------------------------------------------------------
    \x20 character 32
    ----------------------------------------------------------------------
    [A-Z] any character of: 'A' to 'Z'
    ----------------------------------------------------------------------
    \x20 character 32
    ----------------------------------------------------------------------
    [A-Z] any character of: 'A' to 'Z'
    ----------------------------------------------------------------------
    \x20 character 32
    ----------------------------------------------------------------------
    [A-Z] any character of: 'A' to 'Z'
    ----------------------------------------------------------------------
    \x20 character 32
    ----------------------------------------------------------------------
    ( group and capture to \3:
    ----------------------------------------------------------------------
    [A-Z\x20]+ any character of: 'A' to 'Z', '\x20' (1
    or more times (matching the most amount
    possible))
    ----------------------------------------------------------------------
    ) end of \3
    ----------------------------------------------------------------------
    \x20 character 32
    ----------------------------------------------------------------------
    \d+ digits (0-9) (1 or more times (matching
    the most amount possible))
    ----------------------------------------------------------------------
    \x20 character 32
    ----------------------------------------------------------------------
    ( group and capture to \4:
    ----------------------------------------------------------------------
    CME- 'CME-'
    ----------------------------------------------------------------------
    [A-Z] any character of: 'A' to 'Z'
    ----------------------------------------------------------------------
    ) end of \4
    ----------------------------------------------------------------------
    \s whitespace (\n, \r, \t, \f, and " ")
    ----------------------------------------------------------------------
    \S+ non-whitespace (all but \n, \r, \t, \f,
    and " ") (1 or more times (matching the
    most amount possible))
    ----------------------------------------------------------------------
    \s whitespace (\n, \r, \t, \f, and " ")
    ----------------------------------------------------------------------
    ( group and capture to \5:
    ----------------------------------------------------------------------
    \w{2} word characters (a-z, A-Z, 0-9, _) (2
    times)
    ----------------------------------------------------------------------
    \x20 character 32
    ----------------------------------------------------------------------
    \d+ digits (0-9) (1 or more times (matching
    the most amount possible))
    ----------------------------------------------------------------------
    ) end of \5
    ----------------------------------------------------------------------
    .+? any character except \n (1 or more times
    (matching the least amount possible))
    ----------------------------------------------------------------------
    ( group and capture to \6:
    ----------------------------------------------------------------------
    TTORD 'TTORD'
    ----------------------------------------------------------------------
    \S+ non-whitespace (all but \n, \r, \t, \f,
    and " ") (1 or more times (matching the
    most amount possible))
    ----------------------------------------------------------------------
    ) end of \6
    ----------------------------------------------------------------------
    .+? any character except \n (1 or more times
    (matching the least amount possible))
    ----------------------------------------------------------------------
    ( group and capture to \7:
    ----------------------------------------------------------------------
    (?: group, but do not capture (3 times):
    ----------------------------------------------------------------------
    \d{1,3} digits (0-9) (between 1 and 3 times
    (matching the most amount possible))
    ----------------------------------------------------------------------
    \. '.'
    ----------------------------------------------------------------------
    ){3} end of grouping
    ----------------------------------------------------------------------
    \d{1,3} digits (0-9) (between 1 and 3 times
    (matching the most amount possible))
    ----------------------------------------------------------------------
    ) end of \7
    ----------------------------------------------------------------------
    \x20 character 32
    ----------------------------------------------------------------------
    exchange_order_id 'exchange_order_id'
    ----------------------------------------------------------------------
    \x20 character 32
    ----------------------------------------------------------------------
    ( group and capture to \8:
    ----------------------------------------------------------------------
    \w+ word characters (a-z, A-Z, 0-9, _) (1 or
    more times (matching the most amount
    possible))
    ----------------------------------------------------------------------
    ) end of \8
    ----------------------------------------------------------------------
    .+? any character except \n (1 or more times
    (matching the least amount possible))
    ----------------------------------------------------------------------
    Latency= 'Latency='
    ----------------------------------------------------------------------
    ( group and capture to \9:
    ----------------------------------------------------------------------
    \d+ digits (0-9) (1 or more times (matching
    the most amount possible))
    ----------------------------------------------------------------------
    ) end of \9
    ----------------------------------------------------------------------
    ) end of grouping
    ----------------------------------------------------------------------

    Java Code Example:

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

    class Module1 {
        public static void main(String[] asd) {
            String sourcestring = "source string to match with pattern";
            Pattern re = Pattern.compile("(?=.+?OrderStatus\\(Routed=)((?:\\d\\d\\.){2}\\d{4})\\s+((?:\\d\\d:){2}\\d\\d\\.\\d+).+?OrderStatus\\(Routed=\\S+\\x20[A-Z]\\x20[A-Z]\\x20[A-Z]\\x20([A-Z\\x20]+)\\x20\\d+\\x20(CME-[A-Z])\\s\\S+\\s(\\w{2}\\x20\\d+).+?(TTORD\\S+).+?((?:\\d{1,3}\\.){3}\\d{1,3})\\x20exchange_order_id\\x20(\\w+).+?Latency=(\\d+)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
            Matcher m = re.matcher(sourcestring);
            int mIdx = 0; // index of the current match
            while (m.find()) {
                // groupCount() does not count group 0, so the last valid index is groupCount() itself.
                for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                    System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
                }
                mIdx++;
            }
        }
    }

    $matches Array:
    (
    [0] => Array
    (
    [0] => 14.11.2008 00:47:46.158 | ORDERSERVER/PROD | 3180 | INFO | 00000000 | JAD714 | OrderStatus(Routed=(sts:8202 C B O Normal OS 50 CME-F FUT GE 0309 0.0 9783.5 0.0 L X: 0 W: 50 AZHK70 TTORDHKHKAN2NSO A1 GTD 06:47:46.000 No. 914979 sndr 10.131.2.123 exchange_order_id 0000JM03 sok: 0G12PA001) Latency=0
    [1] => 14.11.2008 06:38:23.963 | ORDERSERVER/PROD | 3180 | INFO | 00000000 | JAD714 | OrderStatus(Routed=(sts:8202 A B O Autospreader 10 CME-F SPR GE 0000 0.0 -12.25 0.00 L X: 0 W: 10 &BENC10497 TTORDNAIXBN1BBROCHAR A1 GTD 12:38:23.000 No. 915006 sndr 10.93.81.115 exchange_order_id 0000JM0U sok: 0G2KMH007) Latency=0
    [2] => 17.11.2008 00:00:03.513 | ORDERSERVER/PROD | 3988 | INFO | 00000000 | JVV714 | OrderStatus(Routed=(sts:8202 C S O Autotrader 4 CME-C FUT 6B 0309 0.0 14751 0 L X: 0 W: 4 A15085450 TTORDERERCN1EMICHAEL A1 GTD 06:00:03.000 No. 14144932 sndr 67.202.68.182 exchange_order_id 0008F6AS sok: 09093C957) Latency=16
    )

    [1] => Array
    (
    [0] => 14.11.2008
    [1] => 14.11.2008
    [2] => 17.11.2008
    )

    [2] => Array
    (
    [0] => 00:47:46.158
    [1] => 06:38:23.963
    [2] => 00:00:03.513
    )

    [3] => Array
    (
    [0] => Normal OS
    [1] => Autospreader
    [2] => Autotrader
    )

    [4] => Array
    (
    [0] => CME-F
    [1] => CME-F
    [2] => CME-C
    )

    [5] => Array
    (
    [0] => GE 0309
    [1] => GE 0000
    [2] => 6B 0309
    )

    [6] => Array
    (
    [0] => TTORDHKHKAN2NSO
    [1] => TTORDNAIXBN1BBROCHAR
    [2] => TTORDERERCN1EMICHAEL
    )

    [7] => Array
    (
    [0] => 10.131.2.123
    [1] => 10.93.81.115
    [2] => 67.202.68.182
    )

    [8] => Array
    (
    [0] => 0000JM03
    [1] => 0000JM0U
    [2] => 0008F6AS
    )

    [9] => Array
    (
    [0] => 0
    [1] => 0
    [2] => 16
    )

    )

     


    Michael
  •  11-21-2008, 9:23 PM 48660 in reply to 48656

    Re: group capturing help

    Thank you so much. This is great. I will look into it, and hopefully it will help me pick up some RegEx techniques. I bought Mastering Regular Expressions from O'Reilly. Is that a good book, or is there another that would be a good source?

     

  •  11-21-2008, 11:10 PM 48662 in reply to 48660

    Re: group capturing help

    I've never read it myself but many highly recommend the one you have.

    Michael
  •  11-23-2008, 12:16 AM 48698 in reply to 48662

    Re: group capturing help

    For some reason NetBeans will not accept the '\' character in my regular expression string and comes back with the error message "Illegal escape character".

    Any ideas why it behaves this way with Java?

  •  11-23-2008, 1:03 AM 48699 in reply to 48698

    Re: group capturing help

    I fixed that by doubling the backslashes (\\).
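
    For readers hitting the same "Illegal escape character" error, a small sketch of the rule: the Java compiler processes backslash escapes in string literals before the regex engine ever sees the text, so each regex backslash has to be written twice in source code. The class name, field names, and sample text are illustrative.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // The source text "\\d+" is the three-character string \d+ at run time.
    static final String DIGITS = "\\d+";

    // The source text "\\(" is the regex \( which matches a literal parenthesis.
    static final Pattern ROUTED = Pattern.compile("OrderStatus\\(Routed=");

    // Pattern.quote escapes a literal for you, avoiding hand-written backslashes.
    static final Pattern ROUTED_QUOTED = Pattern.compile(Pattern.quote("OrderStatus(Routed="));

    public static void main(String[] args) {
        System.out.println(DIGITS);          // prints \d+
        System.out.println(DIGITS.length()); // prints 3
        System.out.println(ROUTED.matcher("OrderStatus(Routed=(sts:8202").find());        // prints true
        System.out.println(ROUTED_QUOTED.matcher("OrderStatus(Routed=(sts:8202").find()); // prints true
    }
}
```

    This is also why a raw pattern copied from a regex tester has to be converted before it is pasted into a Java string literal.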

    I am also not capturing values for Latency. I tried to tweak the regex, but no luck so far.

     

    Here is my java code.

     

     StringBuilder regEx = new StringBuilder();
            regEx.append("(?=.+?OrderStatus\\(Routed=)((?:\\d\\d\\.){2}\\d{4})\\s+((?:\\d\\d:){2}\\d\\d\\.\\d+).+?OrderStatus\\(Routed=\\S+\\x20[A-Z]\\x20[A-Z]\\x20[A-Z]\\x20([A-Z\\x20]+)\\x20\\d+\\x20(CME-[A-Z])\\s\\S+\\s(\\w{2}\\x20\\d+).+?(TTORD\\S+).+?((?:\\d{1,3}\\.){3}\\d{1,3})\\x20exchange_order_id\\x20(\\w+).+?Latency=(\\d+).*");
           


            Pattern p = Pattern.compile(regEx.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

            int lineNum = 0;
            int linesNotMatch = 0;
            String line = dis.readLine();

            Matcher m;
            m = p.matcher(line);

  •  11-23-2008, 1:10 AM 48700 in reply to 48699

    Re: group capturing help

    Take the pattern from the Java code sample, not the raw pattern. The code sample is adjusted for that language.

    Michael
  •  11-24-2008, 9:56 AM 48768 in reply to 48700

    Re: group capturing help

    Thanks, Michael. I did that, although I had already found out that I need to double the backslashes.

    The RegEx is still not capturing the Latency value (Latency=number). I have tried multiple approaches to get this value, but with no success so far. You've helped enough already, but it would be great if you could help me finish this last part. I learned a lot from your example.

    Another item I still need to work on is the "TTORD" section. The word there can start with pretty much anything, not necessarily TTORD. I was wondering if there is any way I can capture whatever is there, maybe by specifying its location.

    Input Example:

    17.11.2008 04:51:00.756 | ORDERSERVER/PROD | 2644 | INFO | 00000000 | LQX714 | OrderStatus(Routed=(sts:8202 A S O Normal OS    6 CBOT-B FUT ZM 1208     0.0   2690 0 L   X:   0 W:   6     07645775 LQX714NYDN2MKMICKEY A1     GTD    10:51:00.000 No. 3418181 sndr 00.00.00.00 exchange_order_id 000219HH sok: 0G2LYT019) Latency=0) 
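
    One possible way to capture that field by position instead of by its "TTORD" prefix, sketched under an assumption drawn only from the sample lines in this thread: the trader ID is the second whitespace-delimited token after the "W: <number>" field. If real logs deviate from that layout, the pattern would need adjusting. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TraderIdByPosition {
    // After "W:" come a quantity, an account-like token, and then the trader ID.
    // \S+ accepts any non-whitespace token, so no "TTORD" prefix is required.
    private static final Pattern TRADER = Pattern.compile("W:\\s+\\d+\\s+\\S+\\s+(\\S+)");

    // Returns the token in the assumed trader-ID position, or null if absent.
    static String traderId(String line) {
        Matcher m = TRADER.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "OrderStatus(Routed=(sts:8202 A S O Normal OS    6 CBOT-B FUT ZM 1208     0.0   2690 0 L"
                + "   X:   0 W:   6     07645775 LQX714NYDN2MKMICKEY A1     GTD    10:51:00.000";
        System.out.println(traderId(line)); // prints LQX714NYDN2MKMICKEY
    }
}
```

    The same idea could replace the (TTORD\S+) group in the full pattern; anchoring on a fixed neighbouring field such as "W:" keeps the match from drifting to an unrelated token.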

    Here is also my current expression:

    ReEx: "(?=.+?OrderStatus\\(Routed=)((?:\\d\\d\\.){2}\\d{4})\\s+((?:\\d\\d:){2}\\d\\d\\.\\d+).+?OrderStatus\\(Routed=\\S+\\x20[A-Z]\\x20[A-Z]\\x20[A-Z]\\x20([A-Z\\x20]+)\\x20\\d+\\x20(CME-[A-Z]|CBOT-[A-Z])\\s\\S+\\s(\\w{2}\\x20\\d+).+?(TTORD\\S+).+?((?:\\d{1,3}\\.){3}\\d{1,3})\\x20exchange_order_id\\x20(\\w+).+?Latency=(\\d+)"

     

    Thank you very much for the time spent trying to help me learn RegEx. This forum is great.

  •  11-24-2008, 11:17 AM 48770 in reply to 48768

    Re: group capturing help

    You have changed either your requirements, your input, or the pattern I gave you; as you can see from the sample above, the Latency value was captured in the last array. At this point it seems the problem is the implementation. You first need to examine your source data and see if it is consistent with the samples you posted. Below is the sample text I used.

    kminev:

    Hi,

    I have posted this before, and in the hope of receiving some help I decided to repost it so that it gets noticed by more people.

    I need to do some text parsing on the following text:

    Text input:

    1) 14.11.2008 00:47:46.158 | ORDERSERVER/PROD | 3180 | INFO | 00000000 | JAD714 | OrderStatus(Routed=(sts:8202 C B O Normal OS   50 CME-F FUT GE 0309     0.0   9783.5 0.0 L   X:   0 W:  50       AZHK70 TTORDHKHKAN2NSO A1     GTD    06:47:46.000 No. 914979 sndr 10.131.2.123 exchange_order_id 0000JM03 sok: 0G12PA001) Latency=0

    2) 14.11.2008 06:38:23.963 | ORDERSERVER/PROD | 3180 | INFO | 00000000 | JAD714 | OrderStatus(Routed=(sts:8202 A B O Autospreader   10 CME-F SPR GE 0000     0.0   -12.25 0.00 L   X:   0 W:  10   &BENC10497 TTORDNAIXBN1BBROCHAR A1     GTD    12:38:23.000 No. 915006 sndr 10.93.81.115 exchange_order_id 0000JM0U sok: 0G2KMH007) Latency=0

    3) 17.11.2008 00:00:03.513 | ORDERSERVER/PROD | 3988 | INFO | 00000000 | JVV714 | OrderStatus(Routed=(sts:8202 C S O Autotrader    4 CME-C FUT 6B 0309     0.0   14751 0 L   X:   0 W:   4    A15085450 TTORDERERCN1EMICHAEL A1     GTD    06:00:03.000 No. 14144932 sndr 67.202.68.182 exchange_order_id 0008F6AS sok: 09093C957) Latency=16

     

    I need to extract all the values that are bolded. My flag should be "OrderStatus(Routed=": for any line that contains it, I need to capture the values and place them in groups.

    Thanks in advance.


     


    Michael
  •  11-24-2008, 12:29 PM 48772 in reply to 48770

    Re: group capturing help

    This is my implementation code: (The input is the same as the one I posted previously)

    private void cmeParser(String filePath) throws FileNotFoundException, IOException {

            FileInputStream fis = null;
            BufferedInputStream bis = null;
            DataInputStream dis = null;
            String lineTemp = "";
            Matcher m;

            try {
                fis = new FileInputStream(filePath);
                bis = new BufferedInputStream(fis);
                dis = new DataInputStream(bis);
            } catch (Exception e) {
                e.printStackTrace();
                System.out.println("Input Stream failed to open");
            }

            String regEx = "(?=.+?OrderStatus\\(Routed=)((?:\\d\\d\\.){2}\\d{4})\\s+((?:\\d\\d:){2}\\d\\d\\.\\d+).+?OrderStatus\\(Routed=\\S+\\x20[A-Z]\\x20[A-Z]\\x20[A-Z]\\x20([A-Z\\x20]+)\\x20\\d+\\x20(CME-[A-Z]|CBOT-[A-Z])\\s\\S+\\s(\\w{2}\\x20\\d+).+?(TTORD\\S+).+?((?:\\d{1,3}\\.){3}\\d{1,3})\\x20exchange_order_id\\x20(\\w+).+?Latency=(\\d+)";

            Pattern p = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

            lineTemp = dis.readLine();
            m = p.matcher(lineTemp);

            while (dis.available() != 0) {


                if (m.find()) {

                    System.out.println("Found something");
                    //DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG DEBUG
                    StringBuilder displayResult = new StringBuilder();
                    for (int x = 1; x < m.groupCount(); x++) {
                        displayResult.append(" (" + m.group(x) + ") ");
                    }
                    System.out.println("Before Parsed: " + lineTemp);
                    System.out.println("RESULT: " + displayResult);
                } else {
                    System.out.println("Found nothing");
                }

                lineTemp = dis.readLine();
                m = p.matcher(lineTemp);
            }

        }
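
    A hedged alternative for the reading loop above, not the poster's code: DataInputStream.readLine() is deprecated, and testing available() after reading ahead can leave the final line of the file unmatched. The sketch below reads with BufferedReader until readLine() returns null, so every line is checked. It assumes Java 8 or later (try-with-resources and UncheckedIOException); the class name, method name, and simplified pattern are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineReader {
    // Simplified stand-in for the full pattern: only the latency is captured.
    private static final Pattern ROUTED =
            Pattern.compile("OrderStatus\\(Routed=.*?Latency=(\\d+)");

    // Collects the latency from every matching line, including the last one.
    static List<String> latencies(Reader source) {
        List<String> found = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) { // null signals end of input
                Matcher m = ROUTED.matcher(line);
                if (m.find()) {
                    found.add(m.group(1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String log = "a | OrderStatus(Routed=(sts:8202 x) Latency=0)\n"
                   + "b | SomethingElse\n"
                   + "c | OrderStatus(Routed=(sts:8202 y) Latency=16)";
        // For a real file, pass a java.io.FileReader instead of a StringReader.
        System.out.println(latencies(new StringReader(log))); // prints [0, 16]
    }
}
```

    Creating the Matcher inside the loop, right after each line is read, also avoids the duplicated read-and-match statements before and at the end of the loop.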

     

    It is kind of bizarre that I am missing the latency value.
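
    A likely explanation, offered as a sketch rather than a confirmed diagnosis: Matcher.groupCount() returns the number of capturing groups and does not count group 0, so the valid indexes run from 1 through groupCount() inclusive. A loop written as x < m.groupCount() therefore stops one short and never prints the last group, which in this pattern is the latency. The cut-down pattern and the names below are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Three capturing groups: time, sender IP, latency.
    private static final Pattern P =
            Pattern.compile("(\\d\\d:\\d\\d:\\d\\d).*?sndr\\s+(\\S+).*?Latency=(\\d+)");

    // Formats the captured groups; 'inclusive' selects x <= groupCount() over x < groupCount().
    static String groups(String line, boolean inclusive) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return "";
        }
        int last = inclusive ? m.groupCount() : m.groupCount() - 1;
        StringBuilder out = new StringBuilder();
        for (int x = 1; x <= last; x++) {
            out.append('(').append(m.group(x)).append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "00:00:03.513 | OrderStatus(Routed=(sndr 67.202.68.182 id 0008F6AS) Latency=16)";
        System.out.println(groups(line, false)); // prints (00:00:03)(67.202.68.182)
        System.out.println(groups(line, true));  // prints (00:00:03)(67.202.68.182)(16)
    }
}
```

    If that is the cause here, changing the loop condition to x <= m.groupCount() should make the latency appear without touching the pattern.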

     

    Thank you.
