scala - How to read some specific files from a collection of files as one RDD


I have a collection of files in a directory, and I want to read specific files from them into one RDD. For example:

2000.txt 2001.txt 2002.txt 2003.txt 2004.txt 2005.txt 2006.txt 2007.txt 2008.txt 2009.txt 2010.txt 2011.txt 2012.txt 

I want to read a given range of these files, for example:

range = 4, starting at 2004 → read the files 2004.txt, 2005.txt, 2006.txt, 2007.txt into one RDD (data)

How can I do this in Spark with Scala?

Because Spark's `textFile` is backed by Hadoop's `FileInputFormat`, its single path argument can be a comma-separated list of files, directories, or wildcards. Hence something like this should work (untested):

def datedRange(fromYear: Int, years: Int) =
  sc.textFile(Seq.tabulate(years)(x => fromYear + x).map(y => s"/path/to/dir/$y.txt").mkString(","))
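The path-building part can be checked without a Spark cluster. Below is a minimal sketch of a hypothetical helper (`yearPaths` is an illustrative name, and the directory is just a placeholder) that produces the comma-separated string `sc.textFile` accepts:

```scala
// Build the comma-separated path string for a range of yearly files.
// `yearPaths` and the default directory are illustrative, not from Spark.
def yearPaths(fromYear: Int, years: Int, dir: String = "/path/to/dir"): String =
  Seq.tabulate(years)(i => s"$dir/${fromYear + i}.txt").mkString(",")

// Example: yearPaths(2004, 4) yields
// "/path/to/dir/2004.txt,/path/to/dir/2005.txt,/path/to/dir/2006.txt,/path/to/dir/2007.txt"
// which you would then pass as sc.textFile(yearPaths(2004, 4))
```

Keeping the path construction as plain Scala makes it easy to unit-test the range logic separately from the Spark job itself.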
