apache spark - Save data from R code
I've adapted the Spark example to run on an EC2 cluster via HDFS, and I've gotten the example below working, saving Parquet files.
library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))

# Create a DataFrame from a JSON file
peopleDF <- jsonFile(sqlContext, file.path("/people.json"))

# Register this DataFrame as a table
registerTempTable(peopleDF, "people")

# SQL statements can be run using the sql method provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Store the teenagers in a table
saveAsParquetFile(teenagers, file.path("/teenagers"))

# Stop the SparkContext
sparkR.stop()
When I use saveDF instead of saveAsParquetFile, I get an empty file in HDFS:
drwxr-xr-x - root supergroup 0 2015-07-23 15:14 /teenagers
How can I store the DataFrame as a text file (JSON/CSV/...)?
Spark 2.x
In Spark 2.0 or later you can use the built-in csv writer, so there is no need for external dependencies:
write.df(teenagers, "teenagers", "csv", "error")
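For context, a minimal end-to-end Spark 2.x session might look like the sketch below. sparkR.session() replaces the sparkR.init() / sparkRSQL.init() pair; the /teenagers_csv and /teenagers_json output paths are illustrative, not required names:

library(SparkR)

# Spark 2.x entry point; no separate SparkContext / SQLContext is needed
sparkR.session()

# Build the same small DataFrame directly from a local data.frame
teenagers <- createDataFrame(
  data.frame(name = c("John", "Smith", "Sarah"), age = c(19, 23, 18))
)

# Built-in csv and json sources; mode = "error" aborts if the path exists
write.df(teenagers, path = "/teenagers_csv", source = "csv", mode = "error")
write.df(teenagers, path = "/teenagers_json", source = "json", mode = "error")

sparkR.session.stop()

As with saveAsParquetFile, the output path is a directory of part files written by the executors, not a single file, which is why a bare listing of the path shows a zero-size directory entry like the one in the question.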
Spark 1.x
You can use spark-csv:
Sys.setenv('SPARKR_SUBMIT_ARGS' = '"--packages" "com.databricks:spark-csv_2.10:1.1.0" "sparkr-shell"')
sqlContext <- sparkRSQL.init(sc)

...  # the rest of your code

write.df(teenagers, "teenagers", "com.databricks.spark.csv", "error")
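spark-csv also understands writer options passed through write.df's ... arguments; a hedged sketch that adds a header row (option name per the spark-csv documentation):

# Same call as above, with a spark-csv writer option added;
# header = "true" emits a header row in the output part files
write.df(teenagers, "teenagers", "com.databricks.spark.csv", "error",
         header = "true")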
In interactive mode you have to start the SparkR shell with --packages:
bin/sparkR --packages com.databricks:spark-csv_2.10:1.1.0
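Once the shell is up with the package on the classpath, one way to sanity-check the output is to read it back through the same source (a sketch, assuming the "teenagers" path from the write above):

# Read the CSV directory back via spark-csv and preview a few rows
teenagers2 <- read.df(sqlContext, "teenagers",
                      source = "com.databricks.spark.csv")
head(teenagers2)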