CSV file seen as 'data' rather than 'ASCII' by OS after being written via Python


I'm using Python 2.7.5 to read in a CSV file (input.csv), ignore some lines, and write the result to a new CSV file (output.csv). I've made many different attempts, but they all result in the output file being seen by the operating system (both Red Hat and Mac OS X) as 'data', rather than 'ASCII text'.

input.csv:

cat -v input.csv
(truncated)
hkey_local_machine\software\microsoft\windows nt\currentversion\windows\spooler,yes,1^M
hkey_local_machine\software\microsoft\windows nt\currentversion\windows\appinit_dlls,no,a^M
hkey_local_machine\system\currentcontrolset\control\session manager,seed,0x714b3c99^M

file input.csv
input.csv: data

script.py (latest attempt):

import io

input_file = '/users/spork_user/desktop/input.csv'
output_file = '/users/spork_user/desktop/output.csv'

with io.open(input_file, 'r', newline='\r\n') as infile, io.open(output_file, 'w', newline='\n') as outfile:
    for line in infile:
        # Filters lines I don't want, for example:
        if "does not exist" in line:
            continue

        # To verify how the line appears to Python when it reads it in
        print repr(line)

        # Without the rstrip, there is a blank line between each line in the output, and it's still seen as 'data'
        outfile.write(unicode(line.rstrip('\r\n') + '\n'))

run:

python script.py
(truncated)
u'hkey_local_machine\\software\\microsoft\\windows nt\\currentversion\\windows\\spooler,yes,1\r\n'
u'hkey_local_machine\\software\\microsoft\\windows nt\\currentversion\\windows\\appinit_dlls,no,a\r\n'
u'hkey_local_machine\\system\\currentcontrolset\\control\\session manager,seed,0x714b3c99\r\n'

output.csv:

cat -v output.csv
(truncated)
hkey_local_machine\software\microsoft\windows nt\currentversion\windows\spooler,yes,1
hkey_local_machine\software\microsoft\windows nt\currentversion\windows\appinit_dlls,no,a
hkey_local_machine\system\currentcontrolset\control\session manager,seed,0x714b3c99

file output.csv
output.csv: data

No matter what combination of open read/write flags or stripping of newline characters I try, the output.csv file ends up being seen by the OS as 'data'.


However, if I make a simplified script with hardcoded output, it provides me with the ASCII type of file I'm looking for:

simplified.py:

import io

output_file = '/users/spork_user/desktop/simple_output.csv'

with io.open(output_file, 'w', newline='\n') as outfile:
    outfile.write(unicode('hello\n'))
    outfile.write(unicode('this\n'))
    outfile.write(unicode('works\n'))

run:

python simplified.py
<no output>

simple_output.csv:

cat -v simple_output.csv
hello
this
works

file simple_output.csv
simple_output.csv: ASCII text

How can I get output.csv to be seen by the OS as ASCII text, like simple_output.csv?

Thanks

Your input.csv file is correct. To ease porting of CSV files across different architectures, the convention is that the end of line should be \r\n, even if the local convention for text files is \n (Unix-like) or \r (classic Mac OS).

The problem is that the file utility is not aware of this convention and erroneously reports the file as binary data, when it is a text/csv file, or at least an MS-DOS text file.
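To see which bytes the file utility is reacting to, one option is to look at the raw contents directly. The sketch below is only an illustration: the helper name summarize_bytes is hypothetical, and treating everything outside printable ASCII plus tab, CR and LF as "suspicious" is an assumption about what an ASCII text file should contain, not a description of how file itself decides.

```python
def summarize_bytes(data):
    """Count line endings and collect byte values unusual for ASCII text.

    `data` is the raw content of a file read in binary mode.
    """
    crlf = data.count(b'\r\n')
    # Every CR/LF pair contains one LF, so subtract those to get bare LFs.
    lone_lf = data.count(b'\n') - crlf
    # Assumption: printable ASCII plus tab, LF and CR count as "text".
    allowed = set(range(0x20, 0x7f)) | {0x09, 0x0a, 0x0d}
    # Iterating a bytearray yields integers on both Python 2 and Python 3.
    suspicious = sorted(set(bytearray(data)) - allowed)
    return {'crlf': crlf, 'lone_lf': lone_lf, 'suspicious': suspicious}


if __name__ == '__main__':
    # Made-up sample: two CR/LF-terminated lines followed by a NUL byte.
    sample = b'key,yes,1\r\nkey,no,a\r\n\x00'
    print(summarize_bytes(sample))
```

For a real file, the content could be read with open(path, 'rb').read() and passed to the helper; a non-empty 'suspicious' list would point at bytes worth examining in the truncated part of the file.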

Reference: the Comma-separated values article on Wikipedia says:

standardization

...

RFC 4180 formalized CSV. It defines the MIME type "text/csv", and CSV files that follow its rules should be very portable. Among its requirements:

  • MS-DOS-style lines that end with (CR/LF) characters (optional for the last line)
  • ...
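As an illustration of the CR/LF requirement quoted above, here is a minimal sketch that writes lines with CR/LF endings through io.open and then prints the bytes that ended up on disk. The function name write_crlf_lines and the sample rows are made up for the example, and it assumes the content is pure ASCII. It uses u'' literals so that it should run on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def write_crlf_lines(path, lines):
    """Write each text line to `path`, terminated by CR/LF."""
    # With newline='\r\n', every '\n' written is translated to '\r\n'.
    with io.open(path, 'w', newline='\r\n', encoding='ascii') as outfile:
        for line in lines:
            outfile.write(line + u'\n')


if __name__ == '__main__':
    # Use a throwaway temporary file so the example leaves nothing behind.
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    try:
        write_crlf_lines(path, [u'name,flag,value', u'spooler,yes,1'])
        with open(path, 'rb') as f:
            print(repr(f.read()))
    finally:
        os.remove(path)
```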

What to do: ignore the problem of file saying that the file is data. It is a correct text/csv file (and anyway, editors like vim can cope with different end-of-line conventions).
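Python can cope with either end-of-line convention when reading, too. Below is a hedged sketch of the filtering task from the question, relying on the default universal-newlines mode of io.open (newline=None), in which '\r\n', '\r' and '\n' all arrive as '\n'. The function name filter_lines is illustrative only, and the code assumes the input is pure ASCII. It does not show what file will report for the result.

```python
import io


def filter_lines(in_path, out_path, unwanted=u'does not exist'):
    """Copy lines from in_path to out_path, skipping any containing `unwanted`.

    Returns the number of lines that were kept.
    """
    kept = 0
    # newline=None (the default) turns '\r\n', '\r' and '\n' into '\n' on read;
    # newline='\n' on the output side writes '\n' without further translation.
    with io.open(in_path, 'r', encoding='ascii') as infile, \
         io.open(out_path, 'w', newline='\n', encoding='ascii') as outfile:
        for line in infile:
            if unwanted in line:
                continue
            outfile.write(line)
            kept += 1
    return kept
```

Because of encoding='ascii', a byte outside the ASCII range in the input would raise a UnicodeDecodeError naming the offending position instead of being copied silently.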

