|
|
Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes
Last post 07-25-2008, 10:32 AM by ddrudik. 6 replies.
-
07-24-2008, 4:24 PM |
-
joshq3i
-
-
-
Joined on 07-24-2008
-
-
Posts 3
-
-
|
Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes
The problem: Peachtree 2008 exports CSV data and does not escape double quotes inside double quotes. When I import the CSV into MySQL it splits up the fields incorrectly and creates problems with some rows.
I'm trying to build a regular expression to remove the double quotes that appear inside of a field with double quotes. Example data: "Doyle, Jim",Jimmy,jim@doyle.com,"This contains "quotes" in a field",FALSE,TRUE,TRUE,3,1,Apples,2,-3.45,-6.90 "Hoss, Bob",Bob,bob@hoss.com,""Quotes" mess things up, yes they do",TRUE,FALSE,TRUE,1,1,Oranges,3,-2.00,-6.00 Desired output: "Doyle, Jim",Jimmy,jim@doyle.com,"This contains quotes in a field",FALSE,TRUE,TRUE,3,1,Apples,2,-3.45,-6.90 "Hoss, Bob",Bob,bob@hoss.com,"Quotes mess things up, yes they do",TRUE,FALSE,TRUE,1,1,Oranges,3,-2.00,-6.00 Requirements:
I must leave quotes on the field if it contains a comma, but no other quotes should be found inside a quoted field.
I've been trying different regexs for two days, i can't figure it out. Please help!
|
|
-
07-24-2008, 5:52 PM |
-
Sergei Z
-
-
-
Joined on 07-20-2005
-
Saint Augustine, FL USA
-
Posts 2,744
-
-
|
Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes
what is the language you will write your regex in? is it possible to have > 1 pairs of unwanted quotes inside "good" quotes? are there nested quotes with depth level > 1? i.e. unwanted quotes inside unwanted quotes..
|
|
-
07-24-2008, 6:02 PM |
-
joshq3i
-
-
-
Joined on 07-24-2008
-
-
Posts 3
-
-
|
Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes
I'm using RegexBuddy for testing, it may be used as a PHP regex to run on the CSV data before importing it. Yes, it can have multiple quotes inside the good quotes. It is possible to have unwanted quotes inside unwanted quotes, but i haven't encountered it yet. It is a notes field, so whatever junk people put in there i'll get in the CSV.
Example:
"Sparks, Inc.","Sparks, Inc.",blah,blah,123,""Apples" and "oranges" fell from the "tree" into a 10" x 12" x 8" box.",FALSE,FALSE,123
|
|
-
07-24-2008, 8:09 PM |
|
|
Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes
Is this a possible approach - to use the pattern: "(([^"]|("(?!(,|$)))?)*)"|[^,\r\n]+|(?<=^|,)(?=,|$) to break each line into the separate fields which you can then process individually. You can then look at the value that is returned and remove all double-quotes from within it as necessary. Note that existing quoted fields will have the leading and trailing double-quotes still there (and probably should remain) but you can then delete all other double-quotes (or escape them or whatever). Unquoted fields are also returned but these will not have the quotes at the start and end. Missing fields are represented by a null string as tey should. Once you have the individual fields, you can process them and then rejoin them into the CSV line for further processing. This pattern will not handle double-quotes in the middle of on-quoted fields. For example: asd,345"7865,w will cause problems (it will extend the previous quoted field), but this is outside of your specification. Any use? Susan
|
|
-
07-25-2008, 2:06 AM |
-
ddrudik
-
-
-
Joined on 05-24-2007
-
USA
-
Posts 2,079
-
-
|
Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes
Isn't the problem really the commas inside of the double quoted strings? The explode function does fine with all your double quotes as long as the delimiter (comma) outside of the double quoted strings is replaced with something other than comma: Note that the showfields function is only to show you the parsing of the fields, the useful functions for the replacement are repdelim and repfunc which is called by repdelim. The functions make an assumption that your users did not put single double quotes in a field without a second double quote to pair with it, if they did the only thing I can recommend is change your delimeter when you export to | character instead of comma. <pre> <?php function repfunc($match){ if ($match[1]){ return $match[1]; } else { return '|'; } } function repdelim($str){ return preg_replace_callback('/("[^"]*")|,/','repfunc',$str); } function showfields($str){ echo "before:$str<br>"; $str=repdelim($str); echo " after:$str<br>"; $array=explode('|',$str); echo print_r($array,true); } showfields('"Doyle, Jim",Jimmy,jim@doyle.com,"This contains "quotes" in a field",FALSE,TRUE,TRUE,3,1,Apples,2,-3.45,-6.90'); showfields('"Hoss, Bob",Bob,bob@hoss.com,""Quotes" mess things up, yes they do",TRUE,FALSE,TRUE,1,1,Oranges,3,-2.00,-6.00'); ?>
|
|
-
07-25-2008, 10:00 AM |
-
joshq3i
-
-
-
Joined on 07-24-2008
-
-
Posts 3
-
-
|
Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes
Thank you for the replies. These are good and I think they take care of the data i currently have now. But there could be rows that defeat the regex.
ddrudik -- If I get a row like this... "Fena, Melissa",Melissa Fena,26998,FALSE,"This, is a test. "Testing," is difficult. What is "up," must come down.",FALSE,FALSE,FALSE,1,1,Free Service,1,0.00,0.00 ...that has a comma inside unwanted quotes then it fails to be chopped up correctly by the PHP. This is proper punctuation, so it could be encountered.
susan -- Your regex handled the row shown above correctly, but if there is a comma after an unwanted quote that is inside a quoted field, then it breaks it. "Fena, Melissa",Melissa Fena,26998,FALSE,"This, is a test. "Testing",
is difficult. What is "up," must come down.",FALSE,FALSE,FALSE,1,1,Free
Service,1,0.00,0.00 It's not correct punctuation, but it is possible to type that into the notes field.
______
I think the bottom line is that it's impossible to take care of all the cases which may come up and Peachtree needs to do something different because they do not export a file that conforms to CSV standards. I cannot change the delimiter in Peachtree, I can only export as a CSV file. A backtick or pipe would work wonderful.
|
|
-
07-25-2008, 10:32 AM |
-
ddrudik
-
-
-
Joined on 05-24-2007
-
USA
-
Posts 2,079
-
-
|
Re: Fixing CSV from Peachtree 2008 - double quotes and commas inside double quotes
The unfortunate truth is that if you can control how they enter commas etc. and you delimit the fields in commas without proper quoting then no regex will likely suffice. Peachtree 2008 has the ability to export to XLS as well, if you open that exported format in Excel and re-save it as CSV you will find the fields are likely quoted properly.
|
|
|
|
|