CSV file seen as 'data' rather than 'ASCII' by OS after written via Python
I'm using Python 2.7.5 to read in a CSV file (input.csv), ignore some lines, and write the result to a new CSV file (output.csv). I've made many different attempts, but all of them result in the output file being seen by the operating system (both Red Hat and Mac OS X) as 'data' rather than 'ASCII text'.
input.csv:
```
$ cat -v input.csv
(truncated)
hkey_local_machine\software\microsoft\windows nt\currentversion\windows\spooler,yes,1^M
hkey_local_machine\software\microsoft\windows nt\currentversion\windows\appinit_dlls,no,a^M
hkey_local_machine\system\currentcontrolset\control\session manager,seed,0x714b3c99^M
$ file input.csv
input.csv: data
```
script.py (latest attempt):
```python
import io

input_file = '/users/spork_user/desktop/input.csv'
output_file = '/users/spork_user/desktop/output.csv'

with io.open(input_file, 'r', newline='\r\n') as infile, \
        io.open(output_file, 'w', newline='\n') as outfile:
    for line in infile:
        # filters lines I don't want, for example:
        if "does not exist" in line:
            continue
        # to verify how the line appears to Python when it reads it in
        print repr(line)
        # without rstrip there is a blank line between each line in the output,
        # and it's still seen as 'data'
        outfile.write(unicode(line.rstrip('\r\n') + '\n'))
```
Run:
```
$ python script.py
(truncated)
u'hkey_local_machine\\software\\microsoft\\windows nt\\currentversion\\windows\\spooler,yes,1\r\n'
u'hkey_local_machine\\software\\microsoft\\windows nt\\currentversion\\windows\\appinit_dlls,no,a\r\n'
u'hkey_local_machine\\system\\currentcontrolset\\control\\session manager,seed,0x714b3c99\r\n'
```
output.csv:
```
$ cat -v output.csv
(truncated)
hkey_local_machine\software\microsoft\windows nt\currentversion\windows\spooler,yes,1
hkey_local_machine\software\microsoft\windows nt\currentversion\windows\appinit_dlls,no,a
hkey_local_machine\system\currentcontrolset\control\session manager,seed,0x714b3c99
$ file output.csv
output.csv: data
```
No matter what combination of open read/write flags or stripping of newline characters I try, the output.csv file ends up being seen by the OS as 'data'.

However, if I make a simplified script with hardcoded output, it gives me the ASCII type of file I'm looking for:
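Since output.csv is still reported as 'data' even after its line endings were changed to LF, one thing worth checking is whether the file contains bytes outside the printable ASCII range somewhere in the part not shown above. The sketch below is a hypothetical helper (`suspicious_bytes` is not from the question) that lists such bytes; the set of "allowed" bytes is an assumption and only roughly approximates what `file` treats as text, not its actual algorithm.

```python
import sys


def suspicious_bytes(data):
    """Return (offset, byte) pairs for bytes that are not printable ASCII, TAB, LF or CR.

    Assumption: this allowed set is a rough heuristic, not libmagic's real rule.
    """
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    # bytearray() yields ints on both Python 2 and Python 3
    return [(i, b) for i, b in enumerate(bytearray(data)) if b not in allowed]


if __name__ == '__main__':
    # Usage: python check_bytes.py output.csv  (script name is hypothetical)
    for path in sys.argv[1:]:
        with open(path, 'rb') as f:
            for offset, byte in suspicious_bytes(f.read())[:20]:
                print('%s: offset %d: 0x%02x' % (path, offset, byte))
```

An empty result means every byte is ordinary ASCII text or a line terminator.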
simplified.py:
```python
import io

output_file = '/users/spork_user/desktop/simple_output.csv'

with io.open(output_file, 'w', newline='\n') as outfile:
    outfile.write(unicode('hello\n'))
    outfile.write(unicode('this\n'))
    outfile.write(unicode('works\n'))
```
Run:
```
$ python simplified.py
<no output>
```
simple_output.csv:
```
$ cat -v simple_output.csv
hello
this
works
$ file simple_output.csv
simple_output.csv: ASCII text
```
How can I get output.csv to be seen by the OS as ASCII text, like simple_output.csv?

Thanks
Your input.csv file is correct. To ease porting of CSV files across different architectures, the convention is that the end of line should be \r\n, even if the local convention for text files is '\n' (Unix-like) or '\r' (classic Mac OS).

The problem is that the file utility is not aware of this, and erroneously reports the file as binary data when it is a text/csv file, or at least an MS-DOS text file.
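To see which of these conventions a given file actually uses, the terminators can be counted directly. A minimal sketch, assuming the whole file fits in memory; `count_line_endings` is a hypothetical name. It works on raw bytes, so no newline translation by `open()` gets in the way.

```python
def count_line_endings(data):
    """Count CRLF, bare LF and bare CR line terminators in a byte string."""
    crlf = data.count(b'\r\n')
    lf = data.count(b'\n') - crlf  # LFs not preceded by a CR
    cr = data.count(b'\r') - crlf  # CRs not followed by an LF
    return {'crlf': crlf, 'lf': lf, 'cr': cr}
```

For example, `count_line_endings(open('input.csv', 'rb').read())` on the input file from the question would be expected to report only CRLF terminators, matching the `^M` shown by `cat -v`.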
Reference: the Comma-separated values article on Wikipedia says:

Standardization
...
RFC 4180 formalized CSV. It defines the MIME type "text/csv", and CSV files that follow its rules should be portable. Among its requirements:
- MS-DOS-style lines that end with (CR/LF) characters (optional for the last line)
- ...
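Python's own csv module follows the same convention: its default `excel` dialect terminates each row with `\r\n`. A small sketch for Python 3 (the question uses 2.7.5, where a file passed to `csv.writer` should be opened in binary mode `'wb'` instead); the helper name `to_rfc4180` is hypothetical.

```python
import csv
import io


def to_rfc4180(rows):
    """Serialize rows as CSV text using the csv module's default 'excel' dialect."""
    buf = io.StringIO()
    # The default dialect uses lineterminator='\r\n', i.e. CR/LF line endings.
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue()
```

When writing to a real file in Python 3, open it with `newline=''` so the `\r\n` produced by the writer is written unchanged.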
What to do: ignore the file utility saying the file is data; it is a correct text/csv file (and anyway, editors like vim can cope with the different end-of-line conventions).
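If an LF-only copy is still wanted (for example, for Unix tools that expect it), the filter-and-convert step can be written so it does not depend on how the input's newlines are translated. A minimal sketch, assuming the same "does not exist" filter as the question's script and a file small enough to read at once; `filter_and_normalize` is a hypothetical name. As the question shows, converting to LF alone did not change what `file` reports.

```python
import io
import sys


def filter_and_normalize(text, skip_marker=u"does not exist"):
    """Drop lines containing skip_marker and rejoin the rest with LF endings."""
    # Treat CRLF and bare CR as LF first, so every terminator is '\n'.
    text = text.replace(u'\r\n', u'\n').replace(u'\r', u'\n')
    lines = text.split(u'\n')
    if lines and lines[-1] == u'':
        lines.pop()  # a trailing terminator leaves an empty last element
    kept = [line for line in lines if skip_marker not in line]
    return u''.join(line + u'\n' for line in kept)


if __name__ == '__main__':
    # Usage: python normalize.py input.csv output.csv  (script name is hypothetical)
    if len(sys.argv) == 3:
        # newline='' disables newline translation on both read and write
        with io.open(sys.argv[1], 'r', newline='') as infile:
            text = infile.read()
        with io.open(sys.argv[2], 'w', newline='') as outfile:
            outfile.write(filter_and_normalize(text))
```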