Got more questions? Find advice on: ASP | SQL | XML | Windows
in Search
Welcome to RegexAdvice Sign in | Join | Help

Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes

Last post 07-25-2008, 10:32 AM by ddrudik. 6 replies.
Sort Posts: Previous Next
  •  07-24-2008, 4:24 PM 44538

    Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes

    The problem:

    Peachtree 2008 exports CSV data and does not escape double quotes inside double quotes.  When I import the CSV into MySQL it splits up the fields incorrectly and creates problems with some rows.

    I'm trying to build a regular expression to remove the double quotes that appear inside of a field with double quotes.

    Example data:

    "Doyle, Jim",Jimmy,jim@doyle.com,"This contains "quotes" in a field",FALSE,TRUE,TRUE,3,1,Apples,2,-3.45,-6.90
    "Hoss, Bob",Bob,bob@hoss.com,""Quotes" mess things up, yes they do",TRUE,FALSE,TRUE,1,1,Oranges,3,-2.00,-6.00

    Desired output:

    "Doyle, Jim",Jimmy,jim@doyle.com,"This contains quotes in a field",FALSE,TRUE,TRUE,3,1,Apples,2,-3.45,-6.90
    "Hoss, Bob",Bob,bob@hoss.com,"Quotes mess things up, yes they do",TRUE,FALSE,TRUE,1,1,Oranges,3,-2.00,-6.00

    Requirements:

    I must leave quotes on the field if it contains a comma, but no other quotes should be found inside a quoted field.

    I've been trying different regexs for two days, i can't figure it out. Please help!

  •  07-24-2008, 5:52 PM 44540 in reply to 44538

    Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes

    what is the language you will write your regex in?

    is it possible to have > 1 pairs of unwanted quotes inside "good" quotes?

    are there nested quotes with depth level > 1? i.e. unwanted quotes inside unwanted quotes..

  •  07-24-2008, 6:02 PM 44541 in reply to 44540

    Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes

    I'm using RegexBuddy for testing, it may be used as a PHP regex to run on the CSV data before importing it.

    Yes, it can have multiple quotes inside the good quotes. It is possible to have unwanted quotes inside unwanted quotes, but i haven't encountered it yet.  It is a notes field, so whatever junk people put in there i'll get in the CSV.

    Example:

    "Sparks, Inc.","Sparks, Inc.",blah,blah,123,""Apples" and "oranges" fell from the "tree" into a 10" x 12" x 8" box.",FALSE,FALSE,123

     

  •  07-24-2008, 8:09 PM 44546 in reply to 44541

    Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes

    Is this a possible approach - to use the pattern:

    "(([^"]|("(?!(,|$)))?)*)"|[^,\r\n]+|(?<=^|,)(?=,|$)

    to break each line into the separate fields which you can then process individually. You can then look at the value that is returned and remove all double-quotes from within it as necessary. Note that existing quoted fields will have the leading and trailing double-quotes still there (and probably should remain) but you can then delete all other double-quotes (or escape them or whatever). Unquoted fields are also returned but these will not have the quotes at the start and end. Missing fields are represented by a null string as tey should.

    Once you have the individual fields, you can process them and then rejoin them into the CSV line for further processing.

    This pattern will not handle double-quotes in the middle of on-quoted fields. For example:

    asd,345"7865,w

    will cause problems (it will extend the previous quoted field), but this is outside of your specification. 

    Any use?

    Susan 

  •  07-25-2008, 2:06 AM 44549 in reply to 44541

    Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes

    Isn't the problem really the commas inside of the double quoted strings?  The explode function does fine with all your double quotes as long as the delimiter (comma) outside of the double quoted strings is replaced with something other than comma:

    Note that the showfields function is only to show you the parsing of the fields, the useful functions for the replacement are repdelim and repfunc which is called by repdelim.

    The functions make an assumption that your users did not put single double quotes in a field without a second double quote to pair with it, if they did the only thing I can recommend is change your delimeter when you export to | character instead of comma.

    <pre>
    <?php
    function repfunc($match){
      if ($match[1]){
        return $match[1];
      } else {
        return '|';
      }
    }
    function repdelim($str){
      return preg_replace_callback('/("[^"]*")|,/','repfunc',$str);
    }
    function showfields($str){
      echo "before:$str<br>";
      $str=repdelim($str);
      echo " after:$str<br>";
      $array=explode('|',$str);
      echo print_r($array,true);
    }
    showfields('"Doyle, Jim",Jimmy,jim@doyle.com,"This contains "quotes" in a field",FALSE,TRUE,TRUE,3,1,Apples,2,-3.45,-6.90');
    showfields('"Hoss, Bob",Bob,bob@hoss.com,""Quotes" mess things up, yes they do",TRUE,FALSE,TRUE,1,1,Oranges,3,-2.00,-6.00');
    ?>


  •  07-25-2008, 10:00 AM 44565 in reply to 44549

    Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes

    Thank you for the replies.  These are good and I think they take care of the data i currently have now.  But there could be rows that defeat the regex.

    ddrudik -- If I get a row like this...

    "Fena, Melissa",Melissa Fena,26998,FALSE,"This, is a test. "Testing," is difficult. What is "up," must come down.",FALSE,FALSE,FALSE,1,1,Free Service,1,0.00,0.00

    ...that has a comma inside unwanted quotes then it fails to be chopped up correctly by the PHP.  This is proper punctuation, so it could be encountered.

    susan --  Your regex handled the row shown above correctly, but if there is a comma after an unwanted quote that is inside a quoted field, then it breaks it.

    "Fena, Melissa",Melissa Fena,26998,FALSE,"This, is a test. "Testing", is difficult. What is "up," must come down.",FALSE,FALSE,FALSE,1,1,Free Service,1,0.00,0.00

    It's not correct punctuation, but it is possible to type that into the notes field.

    ______

     I think the bottom line is that it's impossible to take care of all the cases which may come up and Peachtree needs to do something different because they do not export a file that conforms to CSV standards.  I cannot change the delimiter in Peachtree, I can only export as a CSV file.  A backtick or pipe would work wonderful.

  •  07-25-2008, 10:32 AM 44570 in reply to 44565

    Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes

    The unfortunate truth is that if you can control how they enter commas etc. and you delimit the fields in commas without proper quoting then no regex will likely suffice.

    Peachtree 2008 has the ability to export to XLS as well, if you open that exported format in Excel and re-save it as CSV you will find the fields are likely quoted properly.


View as RSS news feed in XML