r - Include all factor combinations in contingency table to create square probability table/matrix -


I am trying to create a 9 x 9 probability matrix from a contingency / frequency table.

It contains the frequencies of a pair of values (x1,x2) transitioning to a pair of values (y1,y2). x1 and y1 have values of a, b, or c, and x2 and y2 have values of d, e, or f.

Some transitions between xy pairs do not exist. However, I would like to have these 'missing' transitions present as zeros in the table / matrix, to make it square (9x9) for use in other analyses.
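As a small illustration of the idea, here is a sketch on hypothetical toy data (the vectors `x` and `y` and the helper name `lv` are made up for this example, not taken from the data in the question). It assumes the full set of nine pair labels can be generated with `outer()` and `paste0()`, and shows that tabulating factors with explicit levels keeps unobserved pairs as zero counts.

```r
# Hypothetical toy data: only a few of the 81 possible transitions occur
x <- c("ad", "be", "ad")
y <- c("ae", "be", "ad")

# All 9 combinations of {a, b, c} with {d, e, f}, ordered ad, ae, af, bd, ...
lv <- as.vector(t(outer(c("a", "b", "c"), c("d", "e", "f"), paste0)))

# Factors carrying the full level set keep unobserved pairs as zero counts
tab <- table(x = factor(x, levels = lv), y = factor(y, levels = lv))
dim(tab)
```

The table is square because both factors share the same level vector, whether or not every level appears in the data.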

    df <- structure(list(x1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
                         3L, 1L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"),
                         y1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
                         2L, 3L), .Label = c("a", "b", "c"), class = "factor"),
                         x2 = structure(c(1L, 2L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 1L),
                         .Label = c("d", "e", "f"), class = "factor"),
                         y2 = structure(c(1L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L),
                         .Label = c("d", "e", "f"), class = "factor"),
                         x = c("ad", "be", "cf", "ad", "bd", "cd", "ae", "be", "ce", "ae", "bf", "cd"),
                         y = c("ad", "be", "cf", "ae", "bd", "cd", "ad", "bd", "cd", "ae", "be", "cf")),
                        .Names = c("x1", "y1", "x2", "y2", "x", "y"), row.names = c(NA, -12L), class = "data.frame")

    # df$x <- paste0(df$x1, df$x2) # included in dput
    # df$y <- paste0(df$y1, df$y2)

    # convert to factor to include all transitions http://stackoverflow.com/a/13705236/1670053
    df$x <- factor(df$x, levels = c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"))
    df$y <- factor(df$y, levels = c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"))

    t1 <- with(df, (table(x, y)))
    # t1m <- as.data.frame.matrix(t1)
    t2 <- t1/(colSums(t1))
    dfm <- as.data.frame.matrix(t2)
    # dm <- as.matrix(dfm)

The result dfm, above, without using factor on x and y, has the correct values, but of course does not include the full set of 9x9 transitions. The desired result is dfmd, below.

However, when I include the factored x and y, the result produced is not the desired one: values of NA and Inf are introduced.

Is there a way, when using the 'missing' factors, to evaluate table/colSums(table) and get the desired result?
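One likely source of the Inf values is R's vector recycling. Dividing a matrix by a vector works element by element in column-major order, so the column sums are recycled down the rows rather than across the columns. The sketch below uses a hypothetical 2 x 2 matrix `m` (not the data from the question) to contrast that with `sweep()`, which divides each column by its own sum.

```r
# Hypothetical 2 x 2 count matrix; the second column is empty
m <- matrix(c(1, 3, 0, 0), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
cs <- colSums(m)                   # c1 = 4, c2 = 0

# m / cs recycles cs down each column, so row i is divided by cs[i]
by_recycling <- m / cs             # m["r2", "c1"] becomes 3 / 0 = Inf

# sweep() over margin 2 divides column j by cs[j]
by_column <- sweep(m, 2, cs, "/")  # c1 becomes 0.25, 0.75; c2 becomes NaN (0/0)
```

With column-wise division the only non-finite values left are the NaN entries from empty columns, and those can be set to zero afterwards. When every column sum happens to be the same number, recycling gives the same values as column-wise division, which may explain why the unfactored table looked correct.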

    dfmd <- structure(list(ad = c(0.5, 0.5, 0, 0, 0, 0, 0, 0, 0),
                           ae = c(0.5, 0.5, 0, 0, 0, 0, 0, 0, 0),
                           af = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
                           bd = c(0, 0, 0, 0.5, 0.5, 0, 0, 0, 0),
                           be = c(0, 0, 0, 0, 0.5, 0.5, 0, 0, 0),
                           bf = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
                           cd = c(0, 0, 0, 0, 0, 0, 0.5, 0.5, 0),
                           ce = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
                           cf = c(0, 0, 0, 0, 0, 0, 0.5, 0, 0.5)),
                      .Names = c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"),
                      class = "data.frame",
                      row.names = c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"))

I am still unsure why the code above produces Inf values or otherwise wrong values, but the code below results in the desired output. It does seem a bit convoluted.

    t1 <- with(df, (table(x, y)))    # contingency table
    tcc <- as.matrix(colSums(t1))    # col sums
    tc <- as.data.frame.matrix(tcc)  # store as data.frame for use with the rep code below
    tct <- t(tc)                     # transpose to build a matrix of colSums
    # http://stackoverflow.com/a/11121463/1670053 build the 9x9 colSums matrix
    tcx <- tct[rep(seq_len(nrow(tct)), each = 9), ]

    pmat <- t1/tcx                   # transition matrix
    pmat[is.na(pmat)] <- 0           # replace NA from 0/0 with 0
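A possibly shorter route, offered as a sketch rather than a definitive answer, is base R's `prop.table()` with `margin = 2`, which divides each column of a table by that column's sum. The block rebuilds the counts from the x and y values listed in the dput so that it runs on its own. The name `lv` is introduced here only for illustration.

```r
# Full level set, so the table is 9 x 9 even where no transition is observed
lv <- c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf")
x <- factor(c("ad", "be", "cf", "ad", "bd", "cd", "ae", "be", "ce", "ae", "bf", "cd"),
            levels = lv)
y <- factor(c("ad", "be", "cf", "ae", "bd", "cd", "ad", "bd", "cd", "ae", "be", "cf"),
            levels = lv)
t1 <- table(x, y)

# margin = 2 normalises column by column instead of relying on recycling
pmat <- prop.table(t1, margin = 2)
pmat[is.nan(pmat)] <- 0   # empty columns give 0/0 = NaN; set those to 0

dfm <- as.data.frame.matrix(pmat)
```

Each column of `dfm` then sums to 1 where the y pair was observed at least once, and to 0 otherwise.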
