F# - Return multiple columns / a dataframe in Deedle based on row-wise mapping
For each row in a frame, I want to construct multiple columns for a new frame based on the values in that row.
The final result should be a frame that has the columns of the original frame plus the new columns.
I have a solution, but I wonder if there is a better one. I think the best way to explain the desired behavior is with an example. I'm using Deedle's Titanic data set:
#r @"f:\aolney\research_projects\braintrust\code\qualtricstor\packages\deedle.1.2.3\lib\net40\deedle.dll";;
#r @"f:\aolney\research_projects\braintrust\code\qualtricstor\packages\fsharp.charting.0.90.12\lib\net40\fsharp.charting.dll";;
#r @"f:\aolney\research_projects\braintrust\code\qualtricstor\packages\fsharp.data.2.2.2\lib\net40\fsharp.data.dll";;
open System
open FSharp.Data
open Deedle
open FSharp.Charting;;
#load @"f:\aolney\research_projects\braintrust\code\qualtricstor\packages\fsharp.charting.0.90.12\fsharp.charting.fsx";;
#load @"f:\aolney\research_projects\braintrust\code\qualtricstor\packages\deedle.1.2.3\deedle.fsx";;
let titanic = Frame.ReadCsv(@"c:\users\aolne_000\downloads\titanic.csv");;
This frame looks like:
val titanic : Frame<int,string> =
     PassengerId Survived Pclass Name                                                Sex    Age SibSp Parch Ticket    Fare    Cabin Embarked
0 -> 1           False    3      Braund, Mr. Owen Harris                             male   22  1     0     A/5 21171 7.25          S
1 -> 2           True     1      Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38  1     0     PC 17599  71.2833 C85   C
My approach grabs each row, applies some selection logic, and returns the new row values as a dictionary. I then use Deedle's expansion operation to convert the values in the dictionary into new columns.
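titanic?test <- titanic |> Frame.mapRowValues( fun x ->
    // Pick the new values based on the passenger class of this row
    if x.GetAs<int>("Pclass") > 1 then
        dict ["a", 1; "b", 2]
    else
        dict ["a", 2; "b", 1] );;
// Expand the dictionaries in the "test" column into separate columns
titanic |> Frame.expandCols ["test"];;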
This gives the following new frame:
     PassengerId Survived Pclass Name                                                Sex    Age SibSp Parch Ticket    Fare    Cabin Embarked test.a test.b
0 -> 1           False    3      Braund, Mr. Owen Harris                             male   22  1     0     A/5 21171 7.25          S        1      2
1 -> 2           True     1      Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38  1     0     PC 17599  71.2833 C85   C        2      1
Note the last two columns, test.a and test.b. In effect, this approach creates a new frame (a and b) and then joins that frame to the existing frame.
This is fine for my use case, but it may be confusing for others to read. It also forces a prefix, e.g. "test", on the final columns, which isn't highly desirable.
Is there a way to append new values to the end of the row series, represented in the code above as x?
I find your approach quite elegant and clever. Because the new series shares its index with the original frame, it is also going to be pretty fast. So I think your solution may actually be better than the alternative option (but I have not measured this).
Anyway, the other option is to return new rows from your Frame.mapRowValues call: for each row, return the original row together with the additional columns.
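titanic
|> Frame.mapRowValues(fun x ->
    // Values are boxed so the new series has the same type as the row (string keys, obj values)
    let add =
        if x.GetAs<int>("Pclass") > 1 then series ["a", box 1; "b", box 2]
        else series ["a", box 2; "b", box 1]
    // Append the new values to the original row
    Series.merge x add)
|> Frame.ofRows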
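If you keep the expansion-based approach, one way to get rid of the forced prefix might be to rename the column keys afterwards with Frame.mapColKeys. The following is only a sketch under a few assumptions: it runs in F# Interactive on .NET 5 or later (for the `#r "nuget: ..."` directive, which the original script does not use), `small` is a hypothetical two-row stand-in for the Titanic frame, and `stripPrefix` is a helper invented here for illustration, not part of Deedle.

```fsharp
// Assumption: F# Interactive on .NET 5+ so the package can be pulled from NuGet.
#r "nuget: Deedle"
open Deedle

// Hypothetical stand-in for the Titanic frame: only the column the logic reads.
let small =
    frame [ "Pclass" => series [ 0 => 3; 1 => 1 ] ]

// Same idea as in the question: one dictionary per row, stored in a "test" column.
small?test <-
    small |> Frame.mapRowValues (fun row ->
        if row.GetAs<int>("Pclass") > 1 then dict ["a", 1; "b", 2]
        else dict ["a", 2; "b", 1])

// Illustrative helper (not a Deedle function): drop a prefix from a column key if present.
let stripPrefix (prefix: string) (key: string) =
    if key.StartsWith(prefix) then key.Substring(prefix.Length) else key

// Expand the dictionaries into "test.a" / "test.b", then rename them to "a" / "b".
let result =
    small
    |> Frame.expandCols ["test"]
    |> Frame.mapColKeys (stripPrefix "test.")
```

Note that this only makes sense when the stripped names do not clash with existing columns; if the original frame already had a column called a or b, the renamed frame would end up with duplicate keys, so you would need to pick different names.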