apache spark - Save data from R code -


I've adapted a Spark example to work on an EC2 cluster via HDFS, and I've gotten the example to work when saving Parquet files.

library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))

# Create a DataFrame from a JSON file
peopleDF <- jsonFile(sqlContext, file.path("/people.json"))

# Register this DataFrame as a table.
registerTempTable(peopleDF, "people")

# SQL statements can be run by using the sql methods provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Store the teenagers in a table
saveAsParquetFile(teenagers, file.path("/teenagers"))

# Stop the SparkContext
sparkR.stop()

When I use saveDF instead of saveAsParquetFile, I only get an empty file in HDFS.

drwxr-xr-x   - root supergroup          0 2015-07-23 15:14 /teenagers 

How can I store the DataFrame as a text file (JSON/CSV/...)?

Spark 2.x

In Spark 2.0 or later there is a built-in CSV writer, so there is no need for external dependencies:

write.df(teenagers, "teenagers", "csv", "error") 
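Since the question also asks about JSON, here is a minimal sketch of the same idea with the built-in "json" source, assuming Spark 2.x and a local SparkR session. The output path "teenagers_json" and the inline sample data are illustrative assumptions, not part of the original example. Spark writes the result as a directory of part files rather than a single file.

```r
library(SparkR)

# Start a Spark 2.x session (takes the place of sparkR.init / sparkRSQL.init)
sparkR.session()

# Small illustrative data set mirroring the question's local data.frame
localDF <- data.frame(name = c("John", "Smith", "Sarah"), age = c(19, 23, 18))
people <- createDataFrame(localDF)

# Keep only the teenagers, equivalent to the question's SQL query
teenagers <- select(filter(people, people$age >= 13 & people$age <= 19), "name")

# Write as JSON; "teenagers_json" is a hypothetical output path.
# Mode "error" fails if the path already exists.
write.df(teenagers, "teenagers_json", "json", "error")

# Read the directory back to confirm the rows were written
head(read.df("teenagers_json", "json"))

sparkR.session.stop()
```

Swapping "json" for "csv" or "parquet" in both calls should give the other formats through the same write.df interface.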

Spark 1.x

You can use spark-csv:

Sys.setenv('SPARKR_SUBMIT_ARGS' =
    '"--packages" "com.databricks:spark-csv_2.10:1.1.0" "sparkr-shell"')
sqlContext <- sparkRSQL.init(sc)

... # rest of code

write.df(teenagers, "teenagers", "com.databricks.spark.csv", "error")

In interactive mode you have to start the SparkR shell with --packages:

bin/sparkR --packages com.databricks:spark-csv_2.10:1.1.0
