Hi
I'm using a regex to split values from a CSV file (in C#).
The code on this page does the job: http://regexadvice.com/blogs/wayneking/
In general it splits values on a delimiter (in my case a comma) and also correctly handles values surrounded by double quotes (the case where a comma is part of the content rather than a delimiter).
The split is performed by the following pattern:
@"""([^""\\]*[\\.[^""\\]*]*)""" +
"|" +
@"([^" + delimiters + @"]+)"
where 'delimiters' is a parameter passed to the function that defines the delimiter characters.
I'm using just one, the comma.
This is the complete function:
public static string[] SplitQuoted(string text, string delimiters)
{
    // Default delimiters are a space and tab (e.g. " \t").
    // Delimiters inside a quote pair are ignored.
    // The quote pair is two double quotes ( e.g. '""' ).
    if (text == null)
        throw new ArgumentNullException("text", "text is null.");
    if (delimiters == null || delimiters.Length < 1)
        delimiters = " \t"; // Default is a space and tab.
    ArrayList res = new ArrayList();
    // Build the pattern that searches for both quoted and unquoted elements.
    // Notice that the quoted element is captured by group 1 (g1)
    // and the unquoted element is captured by group 2 (g2).
    string pattern =
        @"""([^""\\]*[\\.[^""\\]*]*)""" +
        "|" +
        @"([^" + delimiters + @"]+)";
    // Search the string.
    foreach (System.Text.RegularExpressions.Match m in System.Text.RegularExpressions.Regex.Matches(text, pattern))
    {
        string g0 = m.Groups[0].Value; // the whole match
        string g1 = m.Groups[1].Value; // quoted value, without the quotes
        string g2 = m.Groups[2].Value; // unquoted value
        //if (g2 != null && g2.Length > 0)
        //{
        //    res.Add(g2);
        //}
        //else
        //{
        //    // get the quoted string, but without the quotes in g1;
        //    res.Add(g1);
        //}
        res.Add(g2);
    }
    return (string[])res.ToArray(typeof(string));
}
Everything works fine when every value in every row is non-empty, but when a value is empty it is not matched at all.
For example:
One,two,three returns 3 matches
but
One,,three returns 2 matches
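In case it helps, here is a minimal repro of the two cases above. It is only a sketch: the class name EmptyFieldRepro and the helper CountMatches are names I made up for this post, and the pattern is the same one as in SplitQuoted with the delimiters fixed to a comma.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyFieldRepro
{
    // Hypothetical helper: counts the matches produced by the same
    // pattern as SplitQuoted, with the delimiters fixed to ",".
    public static int CountMatches(string text)
    {
        string delimiters = ",";
        string pattern =
            @"""([^""\\]*[\\.[^""\\]*]*)""" +
            "|" +
            @"([^" + delimiters + @"]+)";
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountMatches("One,two,three")); // prints 3
        Console.WriteLine(CountMatches("One,,three"));    // prints 2, the empty value is skipped
    }
}
```

The second call is the problem case: nothing is reported for the position between the two commas.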
How can I adapt this regex pattern so that it also matches an empty string? I'm not that good with regular expressions, but I believe the group @"([^" + delimiters + @"]+)" in the pattern is responsible for this, since its + quantifier requires at least one character.
Any help is appreciated.