
Match empty values from CSV

  •  06-20-2009, 11:20 AM


    Hi,

    I’m using a regex to split the values from a CSV file (in C#).

    The code on this page does the job: http://regexadvice.com/blogs/wayneking/

    It splits the values on a given delimiter (in my case a comma) and also correctly handles values that are surrounded by double quotes, where a comma is part of the content rather than a delimiter.

     

     

    The split is performed by the following pattern:

        @"""([^""\\]*[\\.[^""\\]*]*)""" +
        "|" +
        @"([^" + delimiters + @"]+)"

    where ‘delimiters’ is a parameter passed to the function that defines the delimiter characters. I’m using just one, a comma.
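    To make the two alternatives of that pattern easier to see, here is a minimal sketch that builds the same pattern string and labels each match by the group that captured it. The class name PatternDemo and the helpers BuildPattern and Describe are made up for illustration and are not part of the original code; the sample input is an assumption as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original SplitQuoted code.
public static class PatternDemo
{
    // Builds the same two-alternative pattern string as in the post.
    public static string BuildPattern(string delimiters)
    {
        return @"""([^""\\]*[\\.[^""\\]*]*)""" + "|" + @"([^" + delimiters + @"]+)";
    }

    // Labels each match as "quoted:<content>" (group 1) or "unquoted:<content>" (group 2).
    public static List<string> Describe(string text, string delimiters)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, BuildPattern(delimiters)))
        {
            if (m.Groups[1].Success)
                result.Add("quoted:" + m.Groups[1].Value);
            else
                result.Add("unquoted:" + m.Groups[2].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Assumed sample line: the second value is quoted and contains a comma.
        foreach (string s in Describe("One,\"two, too\",three", ","))
            Console.WriteLine(s);
    }
}
```

    Group 0 is always the whole match (including the quotes for a quoted value), group 1 holds the quoted content without the quotes, and group 2 holds an unquoted value.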

     

     

    This is the complete function:

    // Needs: using System; using System.Collections;
    public static string[] SplitQuoted(string text, string delimiters)
    {
        // Default delimiters are a space and tab (e.g. " \t").
        // Delimiters inside a quote pair are ignored.
        // Default quote pair is two double quotes (e.g. '""').
        if (text == null)
            throw new ArgumentNullException("text", "text is null.");
        if (delimiters == null || delimiters.Length < 1)
            delimiters = " \t"; // Default is a space and tab.

        ArrayList res = new ArrayList();

        // Build the pattern that searches for both quoted and unquoted elements.
        // Group 0 (g0) is the whole match, the quoted element is captured by
        // group 1 (g1) and the unquoted element by group 2 (g2).
        string pattern =
            @"""([^""\\]*[\\.[^""\\]*]*)""" +
            "|" +
            @"([^" + delimiters + @"]+)";

        // Search the string.
        foreach (System.Text.RegularExpressions.Match m in System.Text.RegularExpressions.Regex.Matches(text, pattern))
        {
            string g0 = m.Groups[0].Value;
            string g1 = m.Groups[1].Value;
            string g2 = m.Groups[2].Value;
            //if (g2 != null && g2.Length > 0)
            //{
            //    res.Add(g2);
            //}
            //else
            //{
            //    // get the quoted string, but without the quotes in g1;
            //    res.Add(g1);
            //}
            // Only g2 is added here; for a quoted match g2 is empty and the content is in g1.
            res.Add(g2);
        }

        return (string[])res.ToArray(typeof(string));
    }

     

     

     

    Everything works fine as long as every value in every row is non-empty, but an empty value is not matched at all.

    For example:

    One,two,three returns 3 matches

    but

    One,,three returns 2 matches
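    The two counts above can be reproduced with a small sketch like the following. The class name EmptyFieldRepro and the helper CountMatches are hypothetical, and the delimiter is hard-coded to a comma.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical repro class, not part of the original code.
public static class EmptyFieldRepro
{
    // Counts the matches of the post's pattern with "," as the only delimiter.
    public static int CountMatches(string text)
    {
        string pattern = @"""([^""\\]*[\\.[^""\\]*]*)""" + "|" + @"([^,]+)";
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountMatches("One,two,three"));
        Console.WriteLine(CountMatches("One,,three"));
    }
}
```

    The unquoted alternative ends in +, which needs at least one non-delimiter character, so there is nothing for it to match between two adjacent commas.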

     

     

    How can I adapt this regex pattern so that it also matches an empty string? I’m not that good with regular expressions, but I believe this group in the pattern, @"([^" + delimiters + @"]+)", is responsible for that.

    Any help is appreciated.
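    One direction that might work, shown only as a sketch and not as a tested fix: change the + in the unquoted group to * so that it can match nothing, and add a lookbehind so that a match can only start at the beginning of the line or right after a comma (otherwise * would also produce stray empty matches). The class CsvSplitSketch and the method SplitWithEmpties are hypothetical names; the sketch assumes a single comma delimiter, one line of well-formed input at a time, and it keeps the quoted alternative exactly as posted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch, comma-only; not the original SplitQuoted function.
public static class CsvSplitSketch
{
    public static string[] SplitWithEmpties(string line)
    {
        string pattern =
            @"(?<=^|,)" +                          // a value starts at line start or right after a comma
            @"(?:""([^""\\]*[\\.[^""\\]*]*)""" +   // quoted alternative, unchanged from the post (group 1)
            "|" +
            @"([^,]*))";                           // unquoted alternative with * so it may be empty (group 2)

        var fields = new List<string>();
        foreach (Match m in Regex.Matches(line, pattern))
        {
            // Take the quoted content if group 1 took part in the match, else the unquoted value.
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return fields.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitWithEmpties("One,,three")));
    }
}
```

    Input that is not well formed, for example an unterminated quote or text directly after a closing quote, is not handled by this sketch.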

     
