
pl-sql analyzer (how to match comments and strings)

Last post 07-31-2007, 7:15 AM by mucar. 15 replies.
  •  07-18-2007, 9:41 AM 33040

    pl-sql analyzer (how to match comments and strings)

    I am new to regular expressions and am trying to write a PL/SQL syntax analyzer using regex in C#/.NET.

    I want to construct regular expressions that match comments and strings in PL/SQL, like this:

            public static Regex sqlString = new Regex(@"'.*'", RegexOptions.Compiled);
            public static Regex sqlSingleLineComment = new Regex(@"--.*", RegexOptions.Compiled);
            public static Regex sqlMultiLineComment = new Regex(@"/\*(.*|\n*)*\*/", RegexOptions.Compiled);

    But I have some difficulties:

    1) The first regex matches 'abcd', but it must not match 'abcd' || 'defg' as a single match.

    2) The second regex matches '--This is a PL/SQL comment. It can contain any character except a newline...'
    but it must not match inside dbms_output.put_line('--this is not a comment');

    3) The last regex matches

    '/* This is
    a multiline
    comment */'

    but it mustn't match

    '/* This is
    first comment
    */

    some codes

    /* This is
    second comment*/' (as a single match)

     Thanks.

  •  07-18-2007, 2:22 PM 33056 in reply to 33040

    Re: pl-sql analyzer (how to match comments and strings)

    Partial Answer:

    Use '.*?' instead. The ? makes the match non-greedy, so it stops at the first closing single quote instead of at the last one.

    Use the same concept to solve your third problem.
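
    A minimal sketch of the greedy versus non-greedy difference, assuming the .NET regex engine from the original post. The class name LazyQuantifierDemo, the helper FirstMatch, and the sample inputs are made up for illustration. RegexOptions.Singleline is an assumption added here so that '.' can cross line breaks in the multi-line comment case.

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyQuantifierDemo
{
    // Returns the text of the first match, or an empty string if nothing matches.
    public static string FirstMatch(string pattern, string input,
                                    RegexOptions options = RegexOptions.None)
    {
        Match m = Regex.Match(input, pattern, options);
        return m.Success ? m.Value : string.Empty;
    }

    static void Main()
    {
        string sql = "x := 'abcd' || 'defg';";
        Console.WriteLine(FirstMatch("'.*'", sql));   // greedy: runs to the last quote
        Console.WriteLine(FirstMatch("'.*?'", sql));  // lazy: stops at the first closing quote

        string code = "/* first\ncomment */\nsome code\n/* second */";
        // Singleline (an assumption here) lets '.' match '\n' so a comment can span lines;
        // the lazy quantifier stops at the first "*/".
        Console.WriteLine(FirstMatch(@"/\*.*?\*/", code, RegexOptions.Singleline));
    }
}
```

    With the lazy quantifier, the first call that uses '.*?' returns only 'abcd', and the comment pattern returns only the first block comment rather than everything up to the last "*/".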

     

  •  07-18-2007, 10:57 PM 33070 in reply to 33040

    Re: pl-sql analyzer (how to match comments and strings)

    mucar,

    I've taken a look at the PL/SQL syntax for a string literal (http://download.oracle.com/docs/cd/B12037_01/appdev.101/b10807/02_funds.htm#i16004) and the full 'rule' is rather complicated, as you can define your own delimiters that allow you to include single quotes within your string.

    The 'trivial' solution is to ignore this type of string literal and use either lyndar's solution or:

    ' [^']* '

    (I am assuming the 'ignore pattern whitespace' option is on; otherwise remove the spaces in the pattern.)
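
    As a rough sketch, the negated-class pattern can be tried like this. The class NegatedClassDemo, the helper AllMatches, and the sample input are hypothetical. The second pattern, which also accepts a doubled single quote ('') inside the literal, is an extra variant for illustration; it still ignores the custom-delimiter form.

```csharp
using System;
using System.Text.RegularExpressions;

static class NegatedClassDemo
{
    // Collects the text of every match of pattern in input.
    public static string[] AllMatches(string pattern, string input, RegexOptions options)
    {
        MatchCollection mc = Regex.Matches(input, pattern, options);
        string[] result = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            result[i] = mc[i].Value;
        }
        return result;
    }

    static void Main()
    {
        string sql = "msg := 'abcd' || 'it''s';";

        // The spaces in the pattern are ignored because of IgnorePatternWhitespace.
        string simple = "' [^']* '";
        // Variant that treats '' as an escaped quote inside the literal.
        string doubled = "' (?: [^'] | '' )* '";

        foreach (string s in AllMatches(simple, sql, RegexOptions.IgnorePatternWhitespace))
            Console.WriteLine("simple:  " + s);
        foreach (string s in AllMatches(doubled, sql, RegexOptions.IgnorePatternWhitespace))
            Console.WriteLine("doubled: " + s);
    }
}
```

    The simple pattern splits 'it''s' into two separate matches ('it' and 's'), while the variant keeps it as one literal.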

    Just as an exercise, and probably going WAY beyond what you are looking for, I have found what I think is a pattern that will allow for the specially delimited strings as described in the web page referenced above:

    # Match the initial single quote
    '
    # The next bit checks for a special delimiter, [ { < or ( which is matched with ] } > or )
    # Alternatively, allow for any other punctuation character which will match itself 
    (?<FirstChar>
      (?<C1>\[)
      |
      (?<C2>{)
      |
      (?<C3><)
      |
      (?<C4>\()
      |
      (?<Cn>[\p{P}])
    )?

    # If we are dealing with a special delimiter, then match anything except the ending pattern
    (
      (?(FirstChar)
        (
          (?(C1) (?!]') .
              |
              (?(C2) (?!}') .
                  |
                  (?(C3) (?!>') .
                     |
                     (?(C4) (?!\)') .
                         |
                         (?(Cn) (?!\k<Cn>') .
                         )
                     )
                  )
              )
          )
        )
    # If we are not dealing with a special delimiter, then allow either two consecutive single quotes or any non-single quote character
        |
        (
          [^']
          |
          ''
        )
      )
    )*

    # Check for any special delimiter trailing pattern and match with the 'partner' character
    (?(FirstChar)
      (
        (?(C1)\]
          |
          (?(C2)}
            |
            (?(C3)>
              |
              (?(C4) \)
                |
                (?(Cn) \k<Cn>
                )
              )
            )
          )
        )
      )
    )
    #and/or the final single quote
    '

    What this does is explained a bit in the comments. It's a mess, but it also seems to work with the limited testing I've done. Note that it also allows strings to be split over two or more lines, but that is in line with my reading of the pl/sql spec: it specifically mentions that carriage returns and line feeds are 'characters' and any 'character' can occur between the single quotes. Changing the [^'] to [^\r\n'] would fix that if necessary.

    As for the single-line comments, your regex appears to be adequate, but see my comments below about the order of your expression matching.

    Lyndar's comment about the multi-line match is the easiest way to go.

    You need to be a bit careful about the order in which you do your string matching and understand what you are going to do with the matches once you have them. If you are replacing the text with something else ( blanks or deleting the text altogether), then to overcome the 'comment in the string' (and the 'string in the comment') issue, if you find all comments and then look for all strings you will be OK. Otherwise you will need to get a bit more complicated to exclude comments in strings - it can be done, but you will need to look at your overall goals for that.
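
    One possible way to sidestep the ordering problem, sketched here as an alternative technique rather than something proposed in this thread, is to match strings and both kinds of comment in a single alternation. The engine scans left to right, so whichever construct starts first consumes its whole extent. The names SinglePassDemo, Token and StripComments are hypothetical, and the string rule is the simple one (with '' as an escaped quote), not the full custom-delimiter rule.

```csharp
using System;
using System.Text.RegularExpressions;

static class SinglePassDemo
{
    // One alternation: a string literal, a single-line comment, or a block comment.
    // Whichever starts first in the text wins, so "--" inside a string and
    // a quote inside a comment are never misread.
    static readonly Regex Token = new Regex(
        @"(?<str>'(?:[^']|'')*')" +
        @"|(?<line>--[^\r\n]*)" +
        @"|(?<block>/\*.*?\*/)",
        RegexOptions.Singleline);

    // Removes comments and leaves string literals untouched.
    public static string StripComments(string sql)
    {
        return Token.Replace(sql, m => m.Groups["str"].Success ? m.Value : "");
    }

    static void Main()
    {
        string sql = "put_line('--not a comment'); -- real comment\n" +
                     "x := 1; /* it's a\nblock */ y := 'a /* b */';";
        Console.WriteLine(StripComments(sql));
    }
}
```

    Because the literals are matched (and put back unchanged) in the same pass, a separate "comments first, then strings" step is not needed for this simple case.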

    Hope this helps

    Susan 

     

  •  07-20-2007, 2:25 AM 33102 in reply to 33070

    Re: pl-sql analyzer (how to match comments and strings)

    Lyndar, Aussie Susan, special thanks for your helpful answers; these tips helped me a lot. I constructed my 1st and 3rd regexes as you suggested, but my 2nd regex is still pending. Do you have an idea for it?

    Based on the link and the explanation Susan gave, I tried to construct this regex:

    (?!('[^']*'))--(\w| |[0..9*/()+-<>=!~^'@%"#$&_|{}?\[\]])*
     

    But it doesn't behave as I want. Any advice or tips are welcome. Thanks.

  •  07-22-2007, 8:08 PM 33156 in reply to 33102

    Re: pl-sql analyzer (how to match comments and strings)

    mucar,

    The regex I've come up with for your case #2 is as follows:

    ^
    (
      [^'\-\r\n]*
      (' [^']* ')?
      (-- .* $)?
    ) *

    I have used the 'ignore whitespace' and multiline (^ match start of string AND start of line)  options. My system uses \r\n at the end of each line with the '$' only matching the \n - you may need to play with this part to get it to work on your system.

    It works by scanning each line looking for either the start of a possible quoted literal (see a previous reply for the 'complete' version of how to do this properly) or the possible start of a comment. If it is a literal, then scan everything until you get to the next single quote (including end-of-lines so multi-line literals are processed). If it is a '-', then see if it is actually '--'. If it is, then it cannot be inside a literal, so it must be the start of a single-line comment and therefore we grab everything to the end of the line.

    The way to use this pattern is to check Group #3 within each match to see if it has succeeded. If not, then there is no single line comment; if so, then group #3 matches the extent of the comment. So something like:

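    // regex is a Regex built from the pattern above with
    // RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline
    MatchCollection mc = regex.Matches(input);
    foreach (Match m in mc)
    {
        if (m.Groups[3].Success)
        {
            // do whatever is needed if a comment is present
        }
        else
        {
            // do whatever is needed if no comment is present
        }
    }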

    My test data is:

    --This is a PL/SQL comment. It can contain every characters except new line...
    dbms_output.put_line('--this is not a comment');
    -- Hello UFO
    Wombats -- with comments
    SomeFunction('Argument', Other arg); --LineComment
    C = a - b;
    Delete * from Employees;
    Quoted 'literal that --has an embedded comment in multiline literal
    is correctly handled'

    and group #3 matches on lines 1, 3, 4 and 5 only.
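
    A small sketch of the spaced-out pattern from this post, run with both options it depends on. The names LineCommentDemo and FindComments are hypothetical, the input is a subset of the test data above, and the lines are joined with '\n' only (an assumption; with "\r\n" line endings the '.' would also pick up the trailing '\r').

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LineCommentDemo
{
    // IgnorePatternWhitespace allows the spaces and line breaks in the pattern;
    // Multiline makes ^ and $ match at the start and end of every line.
    static readonly Regex LineComment = new Regex(@"
        ^
        (
          [^'\-\r\n]*
          (' [^']* ')?
          (-- .* $)?
        ) *",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

    // Returns the text of group 3 for every match in which it succeeded.
    public static List<string> FindComments(string sql)
    {
        var found = new List<string>();
        foreach (Match m in LineComment.Matches(sql))
        {
            if (m.Groups[3].Success)
            {
                found.Add(m.Groups[3].Value);
            }
        }
        return found;
    }

    static void Main()
    {
        string sql = string.Join("\n", new[]
        {
            "dbms_output.put_line('--this is not a comment');",
            "Wombats -- with comments",
            "SomeFunction('Argument', Other arg); --LineComment",
            "C = a - b;",
            "Quoted 'literal that --has an embedded comment in multiline literal",
            "is correctly handled'"
        });
        foreach (string comment in FindComments(sql))
        {
            Console.WriteLine(comment);
        }
    }
}
```

    Only the comments on the "Wombats" and "SomeFunction" lines should be reported; the '--' sequences inside the quoted literals are consumed as part of the literal.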

    Susan 

  •  07-24-2007, 2:51 AM 33191 in reply to 33156

    Re: pl-sql analyzer (how to match comments and strings)

    Thanks Susan,

    But I couldn't get it to match the appropriate fields. I coded it as you suggested:

                    String s = "^ ([^'\-\r\n]*('[^']*')?(--.*$)?)*";
                    Regex singleLineComment = new Regex(regex.Text, RegexOptions.Compiled);
                    MatchCollection matches = singleLineComment.Matches(sqlText.Text);
                    foreach (Match m in matches)
                    {
                        if (m.Groups[3].Success)
                        {
                            MessageBox.Show("Successful:" +m.Groups[3].Value);
                        }
                        else
                        {
                            MessageBox.Show("Unsuccessful:" + m.Groups[3].Value);
                        }
                    }
  •  07-24-2007, 6:52 PM 33216 in reply to 33191

    Re: pl-sql analyzer (how to match comments and strings)

    mucar,

    I think you have a space after the leading '^' character in the pattern. Unless you either want a space there (which I doubt) or you include the 'RegexOptions.IgnorePatternWhitespace' option, then you should delete it.
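
    To illustrate the point about the stray space, here is a minimal sketch. The class PatternWhitespaceDemo, the helper Matches, and the shortened pattern are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternWhitespaceDemo
{
    // Thin wrapper so the three cases below read the same way.
    public static bool Matches(string pattern, string input, RegexOptions options)
    {
        return Regex.IsMatch(input, pattern, options);
    }

    static void Main()
    {
        string line = "--comment";

        // Without the option, the space after ^ is a literal space that must be in the input.
        Console.WriteLine(Matches("^ --.*", line, RegexOptions.None));                    // False
        // With the option, whitespace in the pattern is ignored.
        Console.WriteLine(Matches("^ --.*", line, RegexOptions.IgnorePatternWhitespace)); // True
        // Removing the space has the same effect.
        Console.WriteLine(Matches("^--.*", line, RegexOptions.None));                     // True
    }
}
```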

    Susan
     

  •  07-26-2007, 12:46 AM 33267 in reply to 33216

    Re: pl-sql analyzer (how to match comments and strings)

    Susan, thank you very much for your answers, but it doesn't actually match anything. I want to cry :'( I tried all of them, with no success.
  •  07-26-2007, 7:00 PM 33300 in reply to 33267

    Re: pl-sql analyzer (how to match comments and strings)

    mucar,

    What is the problem? Does it fail to compile? not match anything at all? match the wrong thing? the wrong group?

    Susan 

  •  07-27-2007, 1:25 AM 33307 in reply to 33300

    Re: pl-sql analyzer (how to match comments and strings)

    Not matching at all. The code I wrote is the same as above.
  •  07-27-2007, 2:51 AM 33308 in reply to 33307

    Re: pl-sql analyzer (how to match comments and strings)

    Sorry to be pedantic, but did you remove the space from the pattern as I suggested?

    Also can you please give me a sample of your source text that should match but is not?

     
    Thanks

    Susan
     

  •  07-27-2007, 10:26 AM 33318 in reply to 33308

    Re: pl-sql analyzer (how to match comments and strings)

    Susan, I tried everything you suggested, but I can't see what I did wrong. The code is below:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Text;
    using System.Windows.Forms;
    using System.Text.RegularExpressions;

    namespace PLSQL2VisiFlowChart
    {
       
        public partial class Form1 : Form
        {

            public static Regex regex;
            public Form1()
            {
                InitializeComponent();
            }

            private void button1_Click(object sender, EventArgs e)
            {
              
            }

            private void button2_Click(object sender, EventArgs e)
            {

            }

            private void Form1_Load(object sender, EventArgs e)
            {
                regexText.Text = @"^([^'\-\r\n]*('[^']*')?(--.*$)?)*";
                sqlText.Text = "--hello world\n" +
                    "--hello world;\n" +
                    "--hello -- 123;\n" +
                    "SP(x,y);--stored proc\n" +
                    "SP(x,y);--stored proc;\n" +
                    "SP(x,y); --stored proc\n" +
                    "SP(x,y);-- stored proc;\n" +
                    "hey := '--hey ------------proc';\n" +
                    "hey := '--aa -bb';--comment\n" +
                    "hey := '--aaa bb';     --comment\n" +
                    "dd := 'x5-*--*' || '- hd --'; -- 1222222222\n" +
                    "--deneme.i; - -d-e- --- adfc\n" +
                    "asd:='------deneme------';";
            }

            private void btnFind_Click(object sender, EventArgs e)
            {
                try
                {

                    regex = new Regex(regexText.Text, RegexOptions.Compiled);
                    MatchCollection matches = regex.Matches(sqlText.Text);
                    foreach (Match m in matches)
                    {
                        if (m.Groups[3].Success)
                        {
                            MessageBox.Show("Successful:" + m.Groups[3].Value);
                        }
                        else
                        {
                            MessageBox.Show("Unsuccessful:" + m.Groups[3].Value);
                        }
                    }

                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error!" + Environment.NewLine + "Source: " + ex.Source + Environment.NewLine + "Message: " + ex.Message +
                          Environment.NewLine + "Stack Trace: " + ex.StackTrace);
                }

            }

            private void btnReplace_Click(object sender, EventArgs e)
            {
                try
                {
                    regex = new Regex(regexText.Text, RegexOptions.Compiled);
                    resultText.Text = regex.Replace(sqlText.Text.Trim(), "**found it**");
                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error!" + Environment.NewLine + "Source: " + ex.Source + Environment.NewLine + "Message: " + ex.Message +
                          Environment.NewLine + "Stack Trace: " + ex.StackTrace);
                }
            }
        }
    }

  •  07-28-2007, 11:50 PM 33342 in reply to 33318

    Re: pl-sql analyzer (how to match comments and strings)

    mucar,

    You are passing multiple 'lines' to the regex, and the pattern is specifically looking for the start and end of each line. However, by default "^" and "$" match only at the start and end of the complete input string. You want them to match at the start and end of each line. Therefore, add RegexOptions.Multiline when you create the regex instance.
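
    A minimal sketch of the difference, using a simplified pattern (^--.*$) rather than the full one. The class MultilineDemo and the helper CountMatches are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineDemo
{
    // Counts whole-line "--" comments under the given options.
    public static int CountMatches(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"^--.*$", options).Count;
    }

    static void Main()
    {
        string text = "--first\nx := 1;\n--second";

        // Default: ^ and $ anchor to the whole string, and '.' cannot cross '\n',
        // so no single line can satisfy both anchors.
        Console.WriteLine(CountMatches(text, RegexOptions.None));       // 0
        // Multiline: ^ and $ also match at the start and end of each line.
        Console.WriteLine(CountMatches(text, RegexOptions.Multiline));  // 2
    }
}
```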

    Susan 

  •  07-30-2007, 9:14 AM 33366 in reply to 33342

    Re: pl-sql analyzer (how to match comments and strings)

    hey Susan,

    I tried what you said, but it didn't work either. Could you send me a C# .cs file containing a regex object that removes these comments, please?

    ovguvecan@gmail.com is my mail address. Thanks :)

  •  07-30-2007, 7:42 PM 33384 in reply to 33366

    Re: pl-sql analyzer (how to match comments and strings)

    mucar,

    I took your code as a basis and wrote the following:

    using System;
    using System.Text.RegularExpressions;

    namespace CommentDemo
    {
        class Program
        {
            public static void Main(string[] args)
            {
                string RegexPattern = @"^([^'\-\r\n]*('[^']*')?(--.*$)?)*";
                string TestText = "--hello world\n" +
                    "--hello world;\n" +
                    "--hello -- 123;\n" +
                    "SP(x,y);--stored proc\n" +
                    "SP(x,y);--stored proc;\n" +
                    "SP(x,y); --stored proc\n" +
                    "SP(x,y);-- stored proc;\n" +
                    "hey := '--hey ------------proc';\n" +
                    "hey := '--aa -bb';--comment\n" +
                    "hey := '--aaa bb';     --comment\n" +
                    "dd := 'x5-*--*' || '- hd --'; -- 1222222222\n" +
                    "--deneme.i; - -d-e- --- adfc\n" +
                    "asd:='------deneme------';";
                Regex TestRegex = new Regex( RegexPattern, RegexOptions.Multiline);
                MatchCollection matches = TestRegex.Matches(TestText);
                foreach (Match m in matches)
                {
                    if (m.Groups[3].Success)
                    {
                        Console.WriteLine("Successful:" + m.Groups[3].Value);
                    }
                    else
                    {
                        Console.WriteLine("Unsuccessful:" + m.Groups[3].Value);
                    }
                }
                Console.Write("Press any key to continue . . . ");
                Console.ReadKey(true);
            }
        }
    }

    The output I got on the console is:

    C:\>mono CommentDemo.exe
    Successful:--hello world
    Successful:--hello world;
    Successful:--hello -- 123;
    Successful:--stored proc
    Successful:--stored proc;
    Successful:--stored proc
    Successful:-- stored proc;
    Unsuccessful:
    Successful:--comment
    Successful:--comment
    Successful:-- 1222222222
    Successful:--deneme.i; - -d-e- --- adfc
    Unsuccessful:
    Press any key to continue . . .
    C:\>


    Susan 
