Head scratching regex problem for a newbie

Last post 08-27-2008, 2:36 PM by omit46. 10 replies.
  •  08-23-2008, 1:35 PM 45573

    Head scratching regex problem for a newbie

    Hi everyone,
    I am a complete newbie at regular expressions. I have gone through a lot of regular expression tutorials but still can't figure out how to solve this problem. Maybe the regex gurus can help.

    What I am trying to do: from the following text I first want to find lines that fit a particular pattern. The expression "\\S+([ \\t]+-?[0-9.]+){8}" gives me all the lines I am looking for. Among these lines I want to check whether two lines start with the same word. If such a match is found, I want to add the open, high, low and close values of the 2nd line to the 1st line and then remove the 2nd line from the text. I hope it doesn't sound too complicated. Is it possible, or is it too much for regex to handle?

    E.g. "\\S+([ \\t]+-?[0-9.]+){8}" matches all the stocks from the "A Group" stocks through the Spot Transactions. The stock "LEGACYFOOT" is present in two lines (once in the Z group and once in the spot transactions). I want to add the open, high, low and close of LEGACYFOOT in Spot Transactions to the "LEGACYFOOT" line in the Z group, and then delete the 2nd "LEGACYFOOT" line from the text.

    thanks
    omit


    The text (edited to simplify):

                          DHAKA STOCK EXCHANGE LTD.




                      TODAY'S SHARE MARKET : 2008-08-21
                      =================================
        (If the page is not updated please press the refresh button)


        EQUITY                          :        745081109873.65
        DEBT SECURITIES                 :        202154936500.00


        TOTAL                           :        947236046373.65







                       PRICES IN PUBLIC TRANSACTIONS : 2008-08-21
                       ==========================================
    A Group
    -------

    Instr Code     Open     High      Low    Close    %Chg Trade   Volume Value(Lc)

    1STBSRS      705.00   710.00   686.00   691.25    -.18    85     5650    39.365
    1STICB      5200.00  5250.00  5200.00  5224.75    4.22     6       40     2.090
    2NDICB      1650.00  1650.00  1561.00  1583.00    -.07     9       75     1.187
    3RDICB      1020.25  1036.00  1020.25  1029.50    -.50     6       85      .875
    4THICB      1006.25  1050.00  1006.25  1035.00    1.42    11      160     1.656
    MIRACLEIND    26.20    27.00    26.10    26.80    3.87    64    60000    15.965
    MITHUNKNIT   184.50   185.00   176.00   180.75     .97    21      960     1.739
    QSMDRYCELL    37.50    38.50    37.30    38.00    3.26   191   150500    57.158
    RAHIMTEXT    390.00   420.00   390.00   410.00    5.12     2       30      .123
    RANFOUNDRY    59.50    62.00    58.90    61.50    5.12   126    81000    49.217
    UTTARABANK  2849.00  2956.50  2848.00  2900.25    2.94  2892    48130  1404.152
    UTTARAFIN    766.00   825.00   766.00   819.75    5.06   179    15350   124.758
                                                           ----- -------- ---------
                                                           ----- -------- ---------
                                                           55122 12922983 22780.243
    "A Group" Scrips traded in Public Market =  146



    B Group
    -------

    Instr Code     Open     High      Low    Close    %Chg Trade   Volume Value(Lc)

    AGRANINS     213.00   239.00   213.00   226.50    8.76   179    17550    39.805
    BDAUTOCA     157.00   159.75   153.00   156.00    -.63    23      875     1.366
    NITOLINS     332.25   357.00   332.25   340.00    2.10    67     6250    21.441
    SONARBAINS   145.00   153.00   143.75   150.50    6.54    99    11800    17.340
                                                           ----- -------- ---------
                                                           ----- -------- ---------
                                                             741   223380   154.313
    "B Group" Scrips traded in Public Market =   12




    G Group
    -------

    "G Group" Scrips traded in Public Market =    0




    N Group
    -------

    Instr Code     Open     High      Low    Close    %Chg Trade   Volume Value(Lc)

    CONTININS    228.00   240.00   215.00   231.75    8.29   153    11450    26.258
    DBH         1180.00  1249.00  1155.00  1224.75    6.63    94     5150    61.723
    MPETROLEUM   131.50   133.00   129.90   130.60    2.03   496    96800   126.857
    TITASGAS     354.50   357.75   344.00   350.75     .64  2126   379400  1331.344
                                                           ----- -------- ---------
                                                           ----- -------- ---------
                                                            3979   742515  1729.643

    "N Group" Scrips traded in Public Market =    8




    Z Group
    -------

    Instr Code     Open     High      Low    Close    %Chg Trade   Volume Value(Lc)

    ALLTEX        68.75    73.00    68.50    71.75    4.36    18     1500     1.078
    ANLIMAYARN    50.25    50.25    50.00    50.00    3.62     2      150      .075
    LAFSURCEML   568.00   582.00   567.00   577.50    1.27   206    18550   107.275
    LEGACYFOOT    14.80    17.00    14.80    16.50   10.73    77    64000    10.261
    LEXCO        122.00   124.00   122.00   122.50    4.25     2       70      .086
    SHYAMPSUG     10.90    10.90    10.90    10.90    3.80     6      700      .076
    SOCIALINV    365.50   375.00   365.00   371.00    2.77   584    52100   193.397
    WATACHEM     305.25   312.25   305.25   311.25    4.01     6      180      .560
    WONDERTOYS    60.75    62.50    59.25    61.50    2.50    21     2700     1.662
    ZEALBANGLA    14.50    14.90    14.50    14.60     .68     7     3900      .570
                                                           ----- -------- ---------
                                                           ----- -------- ---------
                                                            2888   467200   962.587
    "Z Group" Scrips traded in Public Market =   60

                                                       ===========================

                                                          62730  14356078 25626.792

    Total number of scrips traded in Public Market = 226







                        PRICES IN SPOT TRANSACTIONS : 2008-08-21
                       ==========================================

    Instr Code     Open     High      Low    Close    %Chg Trade   Volume Value(Lc)

    LEGACYFOOT    14.80    16.80    16.00    16.50   10.73     9     9000     1.461
    PUBALIBANK   859.00   872.75   853.00   857.00    1.48  1216    38105   328.444
                                                           ----- -------- ---------
                                                           ----- -------- ---------
                                                            1225    47105   329.904


    Total number of scrips traded in Spot Market =   2






                    PRICES IN SPOT TRANSACTIONS (BONDs) : 2008-08-21
                   ==================================================

    Total number of BONDs traded in Spot Market =   0






                       PRICES IN ODDLOT TRANSACTIONS : 2008-08-21
                      ============================================

    Instr Code    Max Price    Min Price    Trades    Quantity  Value(In lakhs)

    ABBANK           909.00       902.00         2           4            .036
    ACI              475.00       475.00         2          30            .143
    AGNISYSL          67.00        60.10         4         540            .345
    ALARABANK        465.00       395.00        19         354           1.534
    APEXADELFT      2600.00      2600.00         3          30            .780
    UTTARABANK      2950.00      2950.00         1           1            .030
    UTTARAFIN        800.00       800.00         3          62            .496
                                            ------    --------    ------------
                                            ------    --------    ------------
                                               438       12122          27.815
    Total number of scrips traded in Oddlot =   75






                        PRICES IN BLOCK TRANSACTIONS : 2008-08-21
                       ===========================================

    Total number of scrips traded in Block =    0






  •  08-23-2008, 1:58 PM 45574 in reply to 45573

    Re: Head scratching regex problem for a newbie

    Anything is possible, what platform are you programming in?
  •  08-23-2008, 2:05 PM 45575 in reply to 45574

    Re: Head scratching regex problem for a newbie

    C# in .NET
  •  08-23-2008, 4:59 PM 45576 in reply to 45575

    Re: Head scratching regex problem for a newbie

    This should illustrate the matches; once you have them, the rest is just a coding issue:

    C#.NET Code Example:
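    using System;
    using System.Text.RegularExpressions;

    namespace myapp
    {
        class Class1
        {
            static void Main(string[] args)
            {
                string sourcestring = "your source string to match with pattern";
                // Group 1 is the instrument code; group 2 repeats once per numeric column.
                Regex re = new Regex(@"(\S+)([ \t]+-?[0-9.]+){8}");
                string[] groupNames = re.GetGroupNames();
                MatchCollection mc = re.Matches(sourcestring);
                int mIdx = 0;
                foreach (Match m in mc)
                {
                    // Print every group of this match. For the repeated group 2,
                    // Value holds only the last column; m.Groups[2].Captures holds all eight.
                    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                    {
                        Console.WriteLine("[" + mIdx + "][" + groupNames[gIdx] + "] = " + m.Groups[gIdx].Value);
                    }
                    mIdx++;
                }
            }
        }
    }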


  •  08-23-2008, 6:15 PM 45577 in reply to 45576

    Re: Head scratching regex problem for a newbie




    Thanks. After the matching I should be using C#. But once I have my matches, all I have to do is "match two lines that start with the same word". Isn't that regex's job?

    Right now this is what I want to do: match lines that start with the same word.

    In the following text two lines start with "2NDICB" and two lines start with "3RDICB". I just want to get the matching lines. Is regex (getting matches) or C#'s string library (search/find) the better choice for extracting the values from the lines?
    1STICB      5200.00  5250.00  5200.00  5224.75    4.22     6       40     2.090
    2NDICB      1650.00  1650.00  1561.00  1583.00    -.07     9       75     1.187
    3RDICB      1020.25  1036.00  1020.25  1029.50    -.50     6       85      .875
    4THICB      1006.25  1050.00  1006.25  1035.00    1.42    11      160     1.656
    3RDICB      1650.00  1650.00  1561.00  1583.00    -.07     9       75     1.187
    2NDICB      5200.00  5250.00  5200.00  5224.75    4.22     6       40     2.090

  •  08-23-2008, 6:39 PM 45578 in reply to 45577

    Re: Head scratching regex problem for a newbie

    The regex I supplied creates one capture group for the characters in the first column and another for the remaining columns, and the code loops through the matches. Inside that loop is where you would put the non-regex logic that tests for duplicates. I'm not in front of a development environment at the moment, so I can't write this for you right now.
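
    A minimal sketch of that split of work, written in Python rather than C# only to keep it short: the regex pulls out the code and the eight numbers, and a plain dictionary does the duplicate test. The names `ROW` and `find_duplicates` and the three-line sample are made up for illustration.

```python
import re
from collections import defaultdict

# Code in group 1; the eight numeric columns are matched as one block in group 2.
ROW = re.compile(r"^(\S+)((?:[ \t]+-?[0-9.]+){8})[ \t]*\r?$", re.MULTILINE)

def find_duplicates(text):
    """Return {code: [columns of each matching line]} for codes seen more than once."""
    seen = defaultdict(list)
    for m in ROW.finditer(text):
        seen[m.group(1)].append([float(n) for n in m.group(2).split()])
    return {code: rows for code, rows in seen.items() if len(rows) > 1}

sample = (
    "LEGACYFOOT    14.80    17.00    14.80    16.50   10.73    77    64000    10.261\n"
    "LEXCO        122.00   124.00   122.00   122.50    4.25     2       70      .086\n"
    "LEGACYFOOT    14.80    16.80    16.00    16.50   10.73     9     9000     1.461\n"
)
dups = find_duplicates(sample)
print(dups)
```

    Only the duplicate test lives outside the regex; the pattern itself is the one from the thread, anchored to whole lines.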


  •  08-24-2008, 7:54 PM 45594 in reply to 45578

    Re: Head scratching regex problem for a newbie

    You could use a pattern such as :

    ^(\S+)([ \t]+-?[0-9.]+){8}(?=.*?^\1)

    (with the 'multiline' and 'singleline' options set), which will locate only those records that have a duplicate code later in the text. At least this narrows the search down to only those codes where you KNOW there is a duplicate.
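
    For anyone trying this outside .NET, here is a rough Python sketch of the same idea, where `re.MULTILINE` and `re.DOTALL` play the roles of 'multiline' and 'singleline'. The `[ \t]` after `\1` is an addition that is not in the pattern above; it is there so that a code cannot match a longer code that merely starts with it. The name `HAS_LATER_DUPLICATE` and the sample lines are illustrative.

```python
import re

# MULTILINE lets ^ match at every line start; DOTALL lets . cross newlines,
# so the lookahead can scan the rest of the text for the same code.
# The [ \t] after \1 is an addition here (not in the thread's pattern).
HAS_LATER_DUPLICATE = re.compile(
    r"^(\S+)([ \t]+-?[0-9.]+){8}(?=.*?^\1[ \t])",
    re.MULTILINE | re.DOTALL,
)

sample = (
    "2NDICB      1650.00  1650.00  1561.00  1583.00    -.07     9       75     1.187\n"
    "3RDICB      1020.25  1036.00  1020.25  1029.50    -.50     6       85      .875\n"
    "4THICB      1006.25  1050.00  1006.25  1035.00    1.42    11      160     1.656\n"
    "3RDICB      1650.00  1650.00  1561.00  1583.00    -.07     9       75     1.187\n"
)
# Only the first 3RDICB line is reported: it is the one with a duplicate after it.
codes = [m.group(1) for m in HAS_LATER_DUPLICATE.finditer(sample)]
print(codes)
```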

    If you use something like:

    ^(\S+)([ \t]+-?[0-9.]+){8}(?=.*?(^\1([ \t]+-?[0-9.]+)+\r?$))

    then you can use the captures in match group #4 directly to get the values to substitute back into the original line. You can then use the position information in match group #3 to delete the second line. Be careful doing this: once you edit or delete a line, all of the position information for duplicate lines after the one edited/deleted will be wrong. You will need to either make the necessary adjustments to the offsets, or do one match, do the replacements and deletions (preferably backwards through the text), and then scan the whole text again for any more.
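
    The "work backwards" advice can be sketched as follows, again in Python as an illustration. Python's `re` keeps only the last repetition of a repeated group (there is no equivalent of .NET's `Group.Captures`), so this sketch covers just the deletion bookkeeping: it collects the span of group #3 for every match and deletes from the highest offset down, so earlier offsets stay valid. `PAIR` and `delete_second_lines` are hypothetical names.

```python
import re

# Same shape as the second pattern above, with '+' inside the lookahead.
PAIR = re.compile(
    r"^(\S+)([ \t]+-?[0-9.]+){8}(?=.*?(^\1([ \t]+-?[0-9.]+)+\r?$))",
    re.MULTILINE | re.DOTALL,
)

def delete_second_lines(text):
    """Remove the later line of each duplicated code, editing from the end backwards."""
    spans = sorted({m.span(3) for m in PAIR.finditer(text)}, reverse=True)
    for start, end in spans:
        # Also swallow the newline that terminated the deleted line, if any.
        if text[end:end + 1] == "\n":
            end += 1
        text = text[:start] + text[end:]
    return text

lines = [
    "2NDICB      1650.00  1650.00  1561.00  1583.00    -.07     9       75     1.187",
    "3RDICB      1020.25  1036.00  1020.25  1029.50    -.50     6       85      .875",
    "4THICB      1006.25  1050.00  1006.25  1035.00    1.42    11      160     1.656",
    "3RDICB      1650.00  1650.00  1561.00  1583.00    -.07     9       75     1.187",
    "2NDICB      5200.00  5250.00  5200.00  5224.75    4.22     6       40     2.090",
]
sample = "\r\n".join(lines) + "\r\n"
cleaned = delete_second_lines(sample)
print(cleaned)
```

    Because every deletion happens at a higher offset than the ones still pending, no offset adjustment is needed within a single pass.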

    I don't know if I have misunderstood your requirements, but I see that some of the entries have only 5 numbers after the ID string rather than 8. That is why I changed the '{8}' into '+' within the lookahead; it will then find 3 duplicated tags instead of just 1.

    I  have forced the tag to be at the start of a line - just in case!

    Finally, I've used a '\r?' subpattern so that it will pick up the carriage return that .NET tends to add in as part of the line terminator.
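
    A small illustration of why the '\r?' matters, sketched in Python on the assumption that its `$` behaves like .NET's in this respect (in multiline mode it matches just before '\n', not before '\r'). The two-column sample text and the pattern names are made up.

```python
import re

# Without \r?, the '$' cannot match because a '\r' sits between the last digit and '\n'.
line_end_strict = re.compile(r"^\S+(?:[ \t]+-?[0-9.]+)+$", re.MULTILINE)
# With \r?, the carriage return is consumed and '$' then matches before '\n'.
line_end_cr = re.compile(r"^\S+(?:[ \t]+-?[0-9.]+)+\r?$", re.MULTILINE)

crlf_text = "LEXCO 122.00 124.00\r\nALLTEX 68.75 73.00\r\n"

strict_hits = line_end_strict.findall(crlf_text)
cr_hits = line_end_cr.findall(crlf_text)
print(strict_hits, cr_hits)
```

    Note that the matches found with '\r?' include the trailing carriage return, so strip it before reusing the text.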

    Susan

  •  08-25-2008, 8:33 AM 45609 in reply to 45577

    Re: Head scratching regex problem for a newbie

    In VS2008 this is how I would start that project. Note that the code shown creates a DataTable and performs a select on that table to show the duplicates. You will still need to add code to sum the columns of the duplicates found, construct a line string, string-replace the row 0 line with the new summed row, and then string-replace the row 1 line with an empty string. As you can see from the code, most of this solution is not regex-related.

    Code munged in post here, see link:

    http://pastebin.com/f77db8aca
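
    The remaining steps described above (sum the columns, build the new line, replace the first row, drop the second) might look roughly like this. It is a Python sketch, not the code behind the link, and it follows the original request literally by adding every column, including high, low and close. `merge_duplicates`, the column widths and the sample are assumptions; the output always uses '\n' line endings.

```python
import re

ROW = re.compile(r"^(\S+)((?:[ \t]+-?[0-9.]+){8})[ \t]*$")

def merge_duplicates(text):
    """Add each repeated code's eight columns to its first line and drop the repeat."""
    first_seen = {}   # code -> index of its first line in `out`
    totals = {}       # code -> running column sums
    out = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            out.append(line)          # headings, rules, totals: keep as-is
            continue
        code = m.group(1)
        nums = [float(n) for n in m.group(2).split()]
        if code not in first_seen:
            first_seen[code] = len(out)
            totals[code] = nums
            out.append(line)
        else:
            totals[code] = [a + b for a, b in zip(totals[code], nums)]
            # Rebuild the first line from the sums; the repeat is not appended.
            cols = "".join(f"{v:>11.3f}" for v in totals[code])
            out[first_seen[code]] = f"{code:<12}{cols}"
    return "\n".join(out) + "\n"

sample = (
    "LEGACYFOOT    14.80    17.00    14.80    16.50   10.73    77    64000    10.261\n"
    "LEXCO        122.00   124.00   122.00   122.50    4.25     2       70      .086\n"
    "LEGACYFOOT    14.80    16.80    16.00    16.50   10.73     9     9000     1.461\n"
)
merged = merge_duplicates(sample)
print(merged)
```

    Working line by line avoids the offset problem entirely, at the cost of reformatting the merged row.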


  •  08-25-2008, 6:37 PM 45626 in reply to 45577

    Re: Head scratching regex problem for a newbie

    BTW, this is how that would be done in PHP (just to show how much easier it is with associative array support):

    <?php
    // Read the whole report into one string.
    $file = file_get_contents("samplein.txt");
    // One numeric column: spaces/tabs, then an optionally signed number.
    $cp = '[ \t]+(-?[\d.]+)';
    // Code in group 1, the eight numeric columns in groups 2-9.
    preg_match_all("/^(\S+)$cp$cp$cp$cp$cp$cp$cp$cp.*/m", $file, $matches);
    $lines = array();   // code => text of the first line seen for that code
    $values = array();  // code => its eight numeric columns
    echo "<pre>";
    for ($ii = 0; $ii < count($matches[0]); $ii++) {
      $code = $matches[1][$ii];
      $line = rtrim($matches[0][$ii], "\r");  // ".*" also swallows a trailing CR
      $nums = array();
      for ($g = 2; $g <= 9; $g++) {
        $nums[] = floatval($matches[$g][$ii]);
      }
      if (!isset($lines[$code])) {
        // First occurrence: remember the line and its values.
        $lines[$code] = $line;
        $values[$code] = $nums;
      } else {
        // Duplicate: drop this line, then add its values to the first line.
        $file = str_replace($matches[0][$ii] . "\n", '', $file);
        echo "removed: $line\n";
        foreach ($nums as $k => $n) {
          $values[$code][$k] += $n;
        }
        $newline = $code . "     " . implode("     ", $values[$code]);
        $file = str_replace($lines[$code], $newline, $file);
        $lines[$code] = $newline;  // so a third occurrence still finds the line
        echo "added: $newline\n";
      }
    }
    echo "done.</pre>";
    file_put_contents('sampleout.txt', $file);
    ?>


  •  08-26-2008, 7:29 AM 45642 in reply to 45577

    Re: Head scratching regex problem for a newbie

    To finalize this question I have filled in the remainder of the code required for ASP.NET:

    http://pastebin.com/f7cce98c1


  •  08-27-2008, 2:36 PM 45714 in reply to 45642

    Re: Head scratching regex problem for a newbie

    Yay!! I finally figured it out. It seems so simple now. Thanks a lot, guys, for giving me so much of your time.

     
