RegexAdvice

Please help me come up with an expression!

I need to find spaces that are not inside double quotes. How do I do this?

Suppose there is a string like this:

"This is \"body text\" sample \"body text\" no more"

I need to match every space that is not part of the quoted text (i.e., the space inside "body text" must be excluded).

How do I do this? Please help.

Published Sunday, December 24, 2006 3:00 PM by arjun2u
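A minimal sketch for the original question, assuming quotes in the data are always balanced and never escaped: a space is outside quotes exactly when an even number of double quotes follows it, and a lookahead can check that. The C# class and method names below are illustrative only, not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpacesOutsideQuotes
{
    // A space counts as "outside quotes" when the rest of the string after it
    // contains an even number of double quotes (zero, two, four, ...).
    // Assumes quotes are balanced and never escaped inside the data.
    private static readonly Regex OutsideQuotes =
        new Regex(@" (?=(?:[^""]*""[^""]*"")*[^""]*$)");

    // Returns the index of every space that is not inside a quoted section.
    public static List<int> FindIndexes(string input)
    {
        var indexes = new List<int>();
        foreach (Match m in OutsideQuotes.Matches(input))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }

    public static void Main()
    {
        string subject = "This is \"body text\" sample \"body text\" no more";
        List<int> indexes = FindIndexes(subject);
        Console.WriteLine(indexes.Count + " spaces outside quotes");
        Console.WriteLine(string.Join(", ", indexes));
    }
}
```

On the sample string this should report six spaces, skipping the two inside "body text". The same pattern can be passed to `Regex.Split` to split on those spaces; the lookahead rescans the rest of the string for every space, which is fine for short lines like these.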

Comments

 

Tom Pester said:

    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public class MyClass
    {
        public static void Main()
        {
            string SubjectString = "This is \"body text\" sample \"body text\" no more";
            try
            {
                // Lazily match everything up to the next "body text" (or the end of the string)
                Regex RegexObj = new Regex("(.+?)(\"body text\"|$)", RegexOptions.IgnoreCase);
                Match MatchResults = RegexObj.Match(SubjectString);
                int matchNr = 0;
                while (MatchResults.Success)
                {
                    matchNr++;
                    // Print every capturing group that took part in this match
                    for (int i = 1; i < MatchResults.Groups.Count; i++)
                    {
                        Group GroupObj = MatchResults.Groups[i];
                        if (GroupObj.Success)
                        {
                            WL(matchNr + "-" + i + " - " + GroupObj);
                        }
                    }
                    MatchResults = MatchResults.NextMatch();
                }
            }
            catch (ArgumentException)
            {
                // Syntax error in the regular expression
            }
            RL();
        }

        #region Helper methods
        private static void WL(object text, params object[] args)
        {
            Console.WriteLine(text.ToString(), args);
        }

        private static void RL()
        {
            Console.ReadLine();
        }

        private static void Break()
        {
            System.Diagnostics.Debugger.Break();
        }
        #endregion
    }
December 24, 2006 10:54 AM
 

arjun2u said:

Thank you for the reply! But this doesn't serve my purpose. It's my mistake that I did not state the issue clearly earlier. Here's the scenario: there is a list of codes separated by "comma", like so:

codes: "Code1 code2 F/UN=\"value value\" code3 code4"

Now I need to split this string so that I get all the codes into an array of some kind, i.e. the elements should be:

[0]: Code1
[1]: code2
[2]: F/UN=\"value value\"
[3]: code3
[4]: code4

That's the reason I needed a regular expression that splits only on the spaces which are not within quotes. Can you please help me resolve this issue?

Thanks & Regards,
Nagarjuna
December 26, 2006 12:17 PM
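If a regex is not strictly required, the split described above can also be done with a small hand-written scanner that toggles an in-quotes flag. This is a different technique from the regex answers in this thread, sketched under some assumptions: quotes are balanced and unescaped, the quote characters stay in the tokens, and repeated spaces do not produce empty entries. `QuoteAwareSplitter.SplitOutsideQuotes` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    // Splits on spaces that are not inside double quotes.
    // The quote characters themselves are kept in the output tokens.
    // Assumes quotes are balanced and not escaped; consecutive spaces
    // outside quotes are treated as a single separator.
    public static List<string> SplitOutsideQuotes(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // entering or leaving a quoted section
                current.Append(c);
            }
            else if (c == ' ' && !inQuotes)
            {
                if (current.Length > 0)  // skip empty tokens from repeated spaces
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                }
            }
            else
            {
                current.Append(c);
            }
        }

        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    public static void Main()
    {
        string codes = "Code1 code2 F/UN=\"value value\" code3 code4";
        List<string> parts = SplitOutsideQuotes(codes);
        for (int i = 0; i < parts.Count; i++)
        {
            Console.WriteLine("[" + i + "]: " + parts[i]);
        }
    }
}
```

For the codes string this gives the five entries listed in the comment, with `F/UN="value value"` kept as one item. A single pass over the characters also avoids the backtracking questions that come with more elaborate patterns.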
 

Tom Pester said:

So you want to split the string where there are spaces? You can do this with the String.Split method:

    string test = "azrez zrzer zaerzer";
    string[] arr = test.Split(' ');
December 26, 2006 4:38 PM
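For comparison, here is a short sketch of what `String.Split(' ')` does with the codes string from the earlier comment: every space is a separator, so the quoted value is cut in two. The class and method names are illustrative.

```csharp
using System;

public static class NaiveSplitDemo
{
    public static string[] SplitOnEverySpace(string input)
    {
        // String.Split does not know about quotes: every space is a separator.
        return input.Split(' ');
    }

    public static void Main()
    {
        string codes = "Code1 code2 F/UN=\"value value\" code3 code4";
        string[] arr = SplitOnEverySpace(codes);
        Console.WriteLine(arr.Length);   // six pieces instead of the five wanted
        foreach (string part in arr)
        {
            Console.WriteLine(part);
        }
    }
}
```

The third and fourth pieces come out as `F/UN="value` and `value"`, which is why a quote-aware approach is needed here.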
 

Tom Pester said:

Hi there. In the following post I understood what you wanted, and I gave some hints there on how to do it. Good luck :)
December 26, 2006 5:08 PM
 

Josh Titcomb said:

I am confused by your statement that the list is separated by "comma" when I don't see any commas, but I was able to split the string you provided using the following regex: "/(?<=^|\s)[^\"\s]*(\")?(?(1).+?\"|\S+)\S*(?=$|\s)/"
January 4, 2007 5:25 PM
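Josh's pattern uses PHP/PCRE-style `/.../` delimiters. A possible port to .NET's `Regex` class is sketched below: the delimiters are dropped, and inside a C# verbatim string each double quote is written as `""`. .NET supports both the variable-length lookbehind and the `(?(1)yes|no)` conditional the pattern relies on. The pattern matches the items themselves rather than the separators, so the sketch collects matches instead of calling `Regex.Split`; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JoshRegexPort
{
    // Josh's pattern without the /.../ delimiters. Inside a C# verbatim
    // string, "" stands for one double quote character.
    private static readonly Regex Item =
        new Regex(@"(?<=^|\s)[^""\s]*("")?(?(1).+?""|\S+)\S*(?=$|\s)");

    // Returns every item the pattern matches, in order.
    public static List<string> GetItems(string input)
    {
        var items = new List<string>();
        foreach (Match m in Item.Matches(input))
        {
            items.Add(m.Value);
        }
        return items;
    }

    public static void Main()
    {
        string codes = "Code1 code2 F/UN=\"value value\" code3 code4";
        foreach (string item in GetItems(codes))
        {
            Console.WriteLine(item);
        }
    }
}
```

On the codes string this should yield Code1, code2, F/UN="value value", code3 and code4, matching the array arjun2u asked for.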
 

arjun2u said:

Thank you, Josh!

You have solved the problem.

Yes, sorry, I messed up the problem statement. It was supposed to be "space", not "comma".

Thanks a lot!

January 8, 2007 1:09 AM
 

arjun2u said:

Josh, can you briefly explain the regular expression?

I am getting kind of confused.

Please help.

Thanks.

January 8, 2007 1:12 AM
 

JTitcomb said:

It is a fairly complicated regex for something so seemingly simple. Take a look:

    /(?<=^|\s)[^\"\s]*(\")?(?(1).+?\"|\S+)\S*(?=$|\s)/

There are two positive assertions, one a lookbehind and the other a lookahead. Those are the first and last sets of parentheses:

    (?<=^|\s)
    (?=$|\s)

The first one says your match must directly follow either the beginning of the string or a whitespace character. The second says it must be directly followed by the end of the string or another whitespace character.

With those out of the way, that leaves:

    [^\"\s]*(\")?(?(1).+?\"|\S+)\S*

First it looks for any run of characters that are not double quotes or whitespace:

    [^\"\s]*

Then it checks whether there is a double quote:

    (\")?

Note that this is the first capturing subpattern, so it is referenced by the number 1.

Next there is a conditional subpattern:

    (?(1).+?\"|\S+)

That says: if capturing subpattern 1 matched (the ?(1) part, i.e. there was a double quote), look for one or more characters, including whitespace, up to the next double quote (.+?\"). If capturing subpattern 1 did not match (i.e. there was no double quote), look for at least one non-whitespace character (\S+).

That just leaves the last bit:

    \S*

which grabs the rest of the non-whitespace characters.

Please note that there are definite limitations to this regex. While it worked on the sample you gave me, it may not work if your actual problem deviates much from the sample (for instance, if there can be multiple sets of quotes in a single entry). Good luck.
January 8, 2007 8:58 AM
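To go with Josh's walkthrough, here is the same pattern sketched in C# with `RegexOptions.IgnorePatternWhitespace`, which lets each piece sit on its own line with a comment (in that mode unescaped whitespace in the pattern is ignored and `#` starts a comment). The second call in `Main` tries an entry with two quoted sections, the case Josh warns about; the class name and that test string are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnnotatedItemRegex
{
    // The same pattern as in Josh's answer, laid out with
    // RegexOptions.IgnorePatternWhitespace so each piece can carry a comment.
    // Inside the verbatim string, "" is one double quote character.
    private const string Pattern = @"
        (?<=^|\s)        # must follow the start of the string or a whitespace char
        [^""\s]*         # any run of characters that are not quotes or whitespace
        ("")?            # group 1: an optional opening double quote
        (?(1)            # conditional on group 1:
            .+?""        #   quote seen: lazily take characters up to the next quote
          | \S+          #   no quote: at least one non-whitespace character
        )
        \S*              # the rest of the non-whitespace characters
        (?=$|\s)         # must be followed by the end of the string or whitespace
    ";

    private static readonly Regex Item =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    public static List<string> GetItems(string input)
    {
        var items = new List<string>();
        foreach (Match m in Item.Matches(input))
        {
            items.Add(m.Value);
        }
        return items;
    }

    public static void Main()
    {
        // The sample from the question: quoted text is kept as one item.
        foreach (string item in GetItems("This is \"body text\" sample \"body text\" no more"))
        {
            Console.WriteLine(item);
        }

        // Josh's caveat: two quoted sections in one entry.
        List<string> tricky = GetItems("a=\"x y\"b=\"z w\"");
        Console.WriteLine(tricky[0]);
    }
}
```

On the question's sample this should return seven items, with each `"body text"` intact. With the two-quote entry the first match should come back as `a="x y"b="z`: after the first closing quote, `\S*` runs on until the space inside the second quoted section, so the entry is not returned whole.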
 

arjun2u said:

Thanks a lot. It served my purpose perfectly.

January 8, 2007 10:13 AM
 

Steve Kelly said:

Hey JTitcomb, many, many thanks from me for this solution. It helped me implement a tag cloud!
May 2, 2007 3:38 PM
