scala - How to read some specific files from a collection of files as one RDD
I have a collection of files in a directory, and I want to read specific files from among them into one RDD. For example:
2000.txt 2001.txt 2002.txt 2003.txt 2004.txt 2005.txt 2006.txt 2007.txt 2008.txt 2009.txt 2010.txt 2011.txt 2012.txt
and I want to read a specific range of these files as one RDD. For example:
range = 4, starting at 2004, reads the files 2004.txt, 2005.txt, 2006.txt, 2007.txt into one RDD (data)
How can I do this in Spark with Scala?
Because Spark's textFile is backed by Hadoop's FileInputFormat, its path argument accepts a comma-separated list of directories, files, and wildcards. Hence something like this should work (untested):
// textFile takes a single path String, which may be a comma-separated list of
// paths, so join the per-year files with "," (note the .txt suffix to match the file names).
def datedRange(fromYear: Int, years: Int) =
  sc.textFile(Seq.tabulate(years)(x => fromYear + x).map(y => s"/path/to/dir/$y.txt").mkString(","))
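For completeness, a minimal runnable sketch of how this could be wired up and called (the app name, local master, base path, and the call datedRange(2004, 4) are illustrative assumptions, not from the original answer):

import org.apache.spark.{SparkConf, SparkContext}

object ReadYearRange {
  def main(args: Array[String]): Unit = {
    // Local setup for testing; in a real job the SparkContext usually already exists.
    val sc = new SparkContext(new SparkConf().setAppName("ReadYearRange").setMaster("local[*]"))

    // Builds "/path/to/dir/2004.txt,...,/path/to/dir/2007.txt" and reads it as one RDD[String].
    def datedRange(fromYear: Int, years: Int) =
      sc.textFile(Seq.tabulate(years)(x => fromYear + x).map(y => s"/path/to/dir/$y.txt").mkString(","))

    val data = datedRange(2004, 4) // range = 4 starting at 2004
    println(data.count())          // total number of lines across the four files

    sc.stop()
  }
}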