pandas - allowing multiple inputs to python subprocess
I have a near-identical problem to one asked several years ago: Python subprocess with two inputs. It received one answer but no implementation. I'm hoping a repost may clear things up for me and others.

As in the question above, I want to use subprocess to wrap a command-line tool that takes multiple inputs. In particular, I want to avoid writing the input files to disk, and instead use something like named pipes, as alluded to above. That should read "learn how to use named pipes", as admittedly I have never tried them before. I'll further state that my inputs are two pandas dataframes, and I'd like one back as output.
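(For anyone equally new to named pipes: the key behavior to understand is that opening a FIFO for writing blocks until someone opens it for reading, which is why opening both ends from a single thread deadlocks. A minimal self-contained demo, with the reader on a separate thread:)

```python
import os
import shutil
import tempfile
import threading

# Create a FIFO in a throwaway directory.
fifo_dir = tempfile.mkdtemp()
fifo_path = os.path.join(fifo_dir, "demo_fifo")
os.mkfifo(fifo_path)

received = []

def reader():
    # Opening the read end is what unblocks the writer below.
    with open(fifo_path) as f:
        received.append(f.read())

t = threading.Thread(target=reader)
t.start()

# Without the reader thread above, this open() would hang forever.
with open(fifo_path, "w") as f:
    f.write("a\tb\tc\n")

t.join()
shutil.rmtree(fifo_dir)
```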
The generic command-line invocation is:

```
/usr/local/bin/my_command inputfilea.csv inputfileb.csv -o outputfile
```
My current implementation, predictably, doesn't work. I don't see how/when the dataframes get sent to the command process through the named pipes, and I'd appreciate any help!

```python
import os
import StringIO
import subprocess

import pandas as pd

dfa = pd.DataFrame([[1,2,3],[3,4,5]], columns=["a","b","c"])
dfb = pd.DataFrame([[5,6,7],[6,7,8]], columns=["a","b","c"])

# make 2 fifos to host the dataframes
fna = 'inputa'; os.mkfifo(fna); ffa = open(fna, "w")
fnb = 'inputb'; os.mkfifo(fnb); ffb = open(fnb, "w")

# don't know if I need to make 2 subprocesses to pipe the inputs
ppa = subprocess.Popen("echo", stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
ppb = subprocess.Popen("echo", stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)

ppa.communicate(input=dfa.to_csv(header=False, index=False, sep="\t"))
ppb.communicate(input=dfb.to_csv(header=False, index=False, sep="\t"))

pope = subprocess.Popen(["/usr/local/bin/my_command", fna, fnb, "stdout"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
(out, err) = pope.communicate()
try:
    out = pd.read_csv(StringIO.StringIO(out), header=None, sep="\t")
except ValueError:  # fail
    out = ""
    print("\n###command failed###\n")

os.unlink(fna); os.remove(fna)
os.unlink(fnb); os.remove(fnb)
```
You don't need additional processes to pass data to a child process without writing it to disk:

```python
#!/usr/bin/env python
import os
import shutil
import subprocess
import tempfile
import threading
from contextlib import contextmanager

import pandas as pd

@contextmanager
def named_pipes(count):
    dirname = tempfile.mkdtemp()
    try:
        paths = []
        for i in range(count):
            paths.append(os.path.join(dirname, 'named_pipe' + str(i)))
            os.mkfifo(paths[-1])
        yield paths
    finally:
        shutil.rmtree(dirname)

def write_command_input(df, path):
    df.to_csv(path, header=False, index=False, sep="\t")

dfa = pd.DataFrame([[1,2,3],[3,4,5]], columns=["a","b","c"])
dfb = pd.DataFrame([[5,6,7],[6,7,8]], columns=["a","b","c"])

with named_pipes(2) as paths:
    p = subprocess.Popen(["cat"] + paths, stdout=subprocess.PIPE)
    with p.stdout:
        for df, path in zip([dfa, dfb], paths):
            t = threading.Thread(target=write_command_input, args=[df, path])
            t.daemon = True
            t.start()
        result = pd.read_csv(p.stdout, header=None, sep="\t")
p.wait()
```
`cat` is used for demonstration; you should use your command instead (`/usr/local/bin/my_command`). I assume the command can't accept data on standard input and you have to pass the input via files. The result is read from the subprocess' standard output.
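(If you can tolerate writing to disk after all, the same invocation works with ordinary temporary files and no threads; a minimal sketch on a POSIX system, again with `cat` standing in for `/usr/local/bin/my_command`:)

```python
import subprocess
import tempfile

import pandas as pd

dfa = pd.DataFrame([[1, 2, 3], [3, 4, 5]], columns=["a", "b", "c"])
dfb = pd.DataFrame([[5, 6, 7], [6, 7, 8]], columns=["a", "b", "c"])

with tempfile.NamedTemporaryFile("w", suffix=".csv") as fa, \
     tempfile.NamedTemporaryFile("w", suffix=".csv") as fb:
    # Write both inputs to the temp files, then hand the paths to the command.
    dfa.to_csv(fa.name, header=False, index=False, sep="\t")
    dfb.to_csv(fb.name, header=False, index=False, sep="\t")
    # cat simply concatenates both inputs to stdout.
    p = subprocess.Popen(["cat", fa.name, fb.name], stdout=subprocess.PIPE)
    result = pd.read_csv(p.stdout, header=None, sep="\t")
    p.wait()
```

The temp files are deleted automatically when the `with` block exits.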