
How to replace newlines inside doublequoted?

Last post 01-27-2011, 4:30 PM by Aussie Susan. 5 replies.
  •  01-26-2011, 8:37 AM 77253

    How to replace newlines inside doublequoted?

    I would like to know how to replace \r\n or \r inside the double-quoted parts of a string with another character, e.g. "_".

    For example:
    input = line1:field1,field2 ," field3"\nline2:field4,field5,"fie\nld6"\nline3:"field7"

    The output should contain 'fie_ld6', while '\nline2' must stay unchanged.

    I have read a lot of posts, Google results and regex docs, but I can't get it to work. I'm using the PCRE engine.

    Thanks a lot.

    NOTE: I've been able to do the opposite, changing "," to ";" outside the quoted parts, with this regex:

    ("[^"]*"|[^",]*)?,

    replaced with

    $1;
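
    For readers who want to try the comma replacement outside PCRE, here is a minimal Python sketch of the same search and replace. The function name `commas_to_semicolons` and the sample string are made up for illustration; Python writes the backreference as `\1` where PCRE uses `$1`.

```python
import re

# Pattern from the post: an optional quoted run or unquoted run, followed by a comma.
FIELD_THEN_COMMA = re.compile(r'("[^"]*"|[^",]*)?,')

def commas_to_semicolons(text: str) -> str:
    """Replace each matched 'field,' with 'field;' (hypothetical helper name)."""
    return FIELD_THEN_COMMA.sub(r'\1;', text)

print(commas_to_semicolons('a,"b,c",d'))  # prints: a;"b,c";d
```

    One limitation worth checking on real data: a quoted field that is not followed by a comma (for instance the last field of a record) is not consumed as a unit, so commas inside it are still replaced.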

  •  01-26-2011, 3:28 PM 77263 in reply to 77253

    Re: How to replace newlines inside doublequoted?

    You could try:

        Search: ^((?:[^"\n]*"[^"\n]*")*[^"\n]*"[^"\n]*)\n

        Replace: $1_

    and repeat until there are no matches. It won't work as a global replace.

    Why? Well, consider the text after the \n which is replaced:

        ld6"\nline3:"field7"

    It looks like there's a \n between quotes, so that would also be replaced.
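
    The "repeat until there are no matches" idea can be sketched in Python as a loop that performs one replacement per pass. This is only an illustration: the function name is hypothetical, and the sketch assumes the `^` anchor is meant to match at the start of every line (multiline mode), which the post does not state.

```python
import re

# Search pattern from the post; re.MULTILINE is an assumption so that ^ matches at each line start.
QUOTED_NEWLINE = re.compile(
    r'^((?:[^"\n]*"[^"\n]*")*[^"\n]*"[^"\n]*)\n',
    re.MULTILINE,
)

def replace_until_stable(text: str) -> str:
    """Replace one newline per pass and rescan from the start until nothing matches."""
    while True:
        text, count = QUOTED_NEWLINE.subn(r'\1_', text, count=1)
        if count == 0:
            return text

sample = 'line1:field1,field2 ," field3"\nline2:field4,field5,"fie\nld6"\nline3:"field7"'
result = replace_until_stable(sample)
```

    Limiting each pass to `count=1` and rescanning is what avoids the false match described above, because the second scan sees the already-joined line.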

  •  01-26-2011, 11:17 PM 77268 in reply to 77253

    Re: How to replace newlines inside doublequoted?

    Try:

    \n(?=[^"]*"([^"]*"[^"]*")*([^"]*)\z)

    with a replacement string of:

    _

    This assumes that you are presenting a single "line" of comma-separated values at a time. There has to be a way of distinguishing a '\n' that is within a field from a '\n' that ends the "line", and without additional information about the way the values are specified I can't tell which is which (one of the things we ask in the posting guidelines in the sticky note at the start of this forum is to not provide made-up examples). It also assumes that all double-quotes are correctly balanced and that none are escaped.

    [Actually, it WILL work even if you provide multiple lines at a time. However, it will ALWAYS check for balanced double-quotes to the end of the string, so on a large block of text it will probably take some time to run. It STILL requires the double-quotes to be balanced throughout the entire text.]

    This is very similar to MRAB's solution but it can be used as a global replace operation.

    It works by locating a newline character. It then scans forward to see whether there is an even or odd number of double-quote characters between the newline and the end of the string: if the number is odd, the match is made; if it is even, the match is ignored.
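
    As an illustration of the odd/even counting, here is a small Python sketch of the same lookahead applied as a global replace. It is not the PCRE original: Python spells the end-of-string anchor `\Z` rather than `\z`, the groups are made non-capturing because nothing refers to them, and the function name is hypothetical.

```python
import re

# A newline followed by an odd number of double quotes up to the end of the string.
NEWLINE_INSIDE_QUOTES = re.compile(r'\n(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*\Z)')

def replace_quoted_newlines(text: str, replacement: str = "_") -> str:
    """Replace every newline that sits inside a double-quoted section."""
    # A function is used as the replacement so the string is inserted literally.
    return NEWLINE_INSIDE_QUOTES.sub(lambda match: replacement, text)

sample = 'line1:field1,field2 ," field3"\nline2:field4,field5,"fie\nld6"\nline3:"field7"'
result = replace_quoted_newlines(sample)
```

    In the sample, the newline after `fie` is followed by three quotes and is replaced, while the other two newlines are followed by four and two quotes and are left alone.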

    Susan

  •  01-27-2011, 5:48 AM 77281 in reply to 77268

    Re: How to replace newlines inside doublequoted?

    Thanks a lot to MRAB and Susan...

    It works great!!!

    I had googled a lot for a solution with no result. This is probably the only post that solves it.

    Now I can convert the Google Contacts CSV export file (comma-separated, with quoted notes containing \r\n) into a .csv file that Excel can read (semicolon-separated in Spain, with quoted fields containing \r line breaks).

    This is the AHK code to do that:

    FileRead, String, contacts.csv

    ; Change ',' to ';' outside quotes
    String := RegExReplace(String, "(""[^""]*""|[^"",]*)?,", "$1;")

    ; Remove `n inside quotes, leaving the `r of each `r`n in place. Thanks to Aussie Susan @ regexadvice.com
    String := RegExReplace(String, "\n(?=[^""]*""([^""]*""[^""]*"")*([^""]*)\z)", "")

    ; Delete any previous output first, because FileAppend adds to an existing file
    FileDelete, contacts_fixed.csv

    FileAppend, %String%, contacts_fixed.csv

     

  •  01-27-2011, 3:48 PM 77291 in reply to 77281

    Re: How to replace newlines inside doublequoted?

    Sorry for the hijack.

    I tried this expression on a CSV file. One of the fields in each record is surrounded by double quotes. Within the quotes are \r and \n characters, which I need to remove. I tried the example here; however, it removed all the \r\n characters. Is there any way to remove these characters from the quoted field only?

    \n(?=[^"]*"([^"]*"[^"]*")*([^"]*)\z)

    \r(?=[^"]*"([^"]*"[^"]*")*([^"]*)\z)

  •  01-27-2011, 4:30 PM 77292 in reply to 77291

    Re: How to replace newlines inside doublequoted?

    Could you please start a new post (I suspect your situation is sufficiently different to warrant it), read the posting guidelines in the sticky note at the start of this forum, and provide responses to as many of the items listed there as possible? In this case, the minimum would be the regex variant being used, the environment (Windows, Unix etc., and the programming language), a sample of your input and the expected output.

    Finally, have you read the caveats mentioned in the previous responses? Some of them may well apply.
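
    Two of those caveats (balanced double-quotes, and the cost of scanning to the end of the text for every newline) can be checked without a regex at all. The following is a rough Python sketch of a different technique, a single left-to-right pass that tracks whether the current position is inside quotes; it is not the regex from this thread, and the function name and error message are made up for illustration.

```python
def strip_newlines_in_quotes(text: str, replacement: str = "") -> str:
    """Replace \\r and \\n inside double-quoted sections in one pass over the text."""
    pieces = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # Every double quote toggles the state; escaped quotes are not handled specially.
            in_quotes = not in_quotes
            pieces.append(ch)
        elif in_quotes and ch in "\r\n":
            pieces.append(replacement)
        else:
            pieces.append(ch)
    if in_quotes:
        # An odd number of quotes means the balanced-quotes assumption does not hold.
        raise ValueError("unbalanced double quotes in input")
    return "".join(pieces)

cleaned = strip_newlines_in_quotes('a,"b\r\nc"\r\nd,"e"')
```

    Here the \r\n inside the quoted field is removed and the \r\n that ends the record is kept, and input with an odd number of quotes is rejected instead of being silently altered.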

    Susan
