R - Include all factor combinations in a contingency table to create a square probability table/matrix
I am trying to create a 9 x 9 probability matrix from a contingency / frequency table. It contains the frequencies of a pair of values `(x1,x2)` transitioning to a pair of values `(y1,y2)`. `x1`, `y1` can have values of `a`, `b`, or `c`, and `x2`, `y2` can have a value of `d`, `e`, or `f`.

Some transitions between `xy` pairs do not exist. However, I would like to have these 'missing' transitions present as zeros in the table / matrix to make it square (9x9) for use in other analyses.
    df <- structure(list(
      x1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"),
      y1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"),
      x2 = structure(c(1L, 2L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 1L), .Label = c("d", "e", "f"), class = "factor"),
      y2 = structure(c(1L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L), .Label = c("d", "e", "f"), class = "factor"),
      x = c("ad", "be", "cf", "ad", "bd", "cd", "ae", "be", "ce", "ae", "bf", "cd"),
      y = c("ad", "be", "cf", "ae", "bd", "cd", "ad", "bd", "cd", "ae", "be", "cf")),
      .Names = c("x1", "y1", "x2", "y2", "x", "y"), row.names = c(NA, -12L), class = "data.frame")

    # df$x <- paste0(df$x1, df$x2) # included in dput
    # df$y <- paste0(df$y1, df$y2)

    # convert to factor to include all transitions http://stackoverflow.com/a/13705236/1670053
    df$x <- factor(df$x, levels = c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"))
    df$y <- factor(df$y, levels = c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"))

    t1 <- with(df, (table(x, y)))
    # t1m <- as.data.frame.matrix(t1)
    t2 <- t1/(colSums(t1))
    dfm <- as.data.frame.matrix(t2)
    # dm <- as.matrix(dfm)
The result `dfm`, above, without using `factor` on `x`, `y`, has the correct values, but of course does not include the full set of 9x9 transitions. The desired result is `dfmd`, below.
However, when I include the `factor`ed `x`, `y`, the result produced is not the desired one: values of `NA`, `Inf` are introduced.
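One possible source of the `Inf` and `NaN` values is R's column-major recycling: when a matrix is divided by a shorter vector, element `[i, j]` is paired with the i-th vector entry, not the j-th. The sketch below illustrates this on a made-up 2 x 2 count table. The names `m`, `cs`, `by_recycle` and `by_column` are illustrative and do not come from the code above.

```r
# Hypothetical 2 x 2 count table whose second column is empty
m <- matrix(c(1, 1, 0, 0), nrow = 2,
            dimnames = list(x = c("p", "q"), y = c("p", "q")))

cs <- colSums(m)   # p = 2, q = 0

# Recycling runs down the columns, so row i is divided by cs[i]
by_recycle <- m / cs

# sweep() divides column j by cs[j], which is the intended operation
by_column <- sweep(m, 2, cs, "/")

print(by_recycle)  # row "q" is divided by 0: a count of 1 becomes Inf
print(by_column)   # only the empty column is 0/0 (NaN)
```

If every column sum happens to be equal, as it is when the empty levels are left out of this data, the recycled division gives the same numbers as the column-wise one, which may be why the unfactored version looks correct.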
Is there a way, when using the 'missing' factors, to evaluate `table/colSums(table)` and get the desired result?
    dfmd <- structure(list(
      ad = c(0.5, 0.5, 0, 0, 0, 0, 0, 0, 0),
      ae = c(0.5, 0.5, 0, 0, 0, 0, 0, 0, 0),
      af = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
      bd = c(0, 0, 0, 0.5, 0.5, 0, 0, 0, 0),
      be = c(0, 0, 0, 0, 0.5, 0.5, 0, 0, 0),
      bf = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
      cd = c(0, 0, 0, 0, 0, 0, 0.5, 0.5, 0),
      ce = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
      cf = c(0, 0, 0, 0, 0, 0, 0.5, 0, 0.5)),
      .Names = c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"),
      class = "data.frame",
      row.names = c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"))
I am still unsure why the code above produces the `Inf` value or wrong values otherwise; however, the code below results in the desired output. It does seem a bit convoluted.
    t1 <- with(df, (table(x, y))) # contingency table
    tcc <- as.matrix(colSums(t1)) # col sums
    tc <- as.data.frame.matrix(tcc) # store as data.frame for using the rep code below
    tct <- t(tc) # transpose to build matrix of colSums
    tcx <- tct[rep(seq_len(nrow(tct)), each = 9), ] # http://stackovernflow.com/a/11121463/1670053 build colSums dataframe 9x9
    pmat <- t1/tcx # transition matrix
    pmat[is.na(pmat)] <- 0 # remove NA from 0/0
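A shorter route to the same kind of matrix might be to let `sweep()` (or `prop.table()` with `margin = 2`) do the column-wise division and then set the 0/0 cells to zero. This is only a sketch under that assumption. To stay self-contained it rebuilds the `x` and `y` pair labels from the `df` in the question; `lv`, `tab`, `pmat2` and `pmat3` are illustrative names.

```r
# Pair labels copied from the x and y columns of df in the question
lv <- c("ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf")
x <- factor(c("ad", "be", "cf", "ad", "bd", "cd",
              "ae", "be", "ce", "ae", "bf", "cd"), levels = lv)
y <- factor(c("ad", "be", "cf", "ae", "bd", "cd",
              "ad", "bd", "cd", "ae", "be", "cf"), levels = lv)

tab <- table(x, y)                         # 9 x 9 counts, zeros for unseen pairs

pmat2 <- sweep(tab, 2, colSums(tab), "/")  # divide column j by its own sum
pmat2[is.nan(pmat2)] <- 0                  # empty columns give 0/0; set them to 0

# The same column-wise proportions via prop.table()
pmat3 <- prop.table(tab, margin = 2)
pmat3[is.nan(pmat3)] <- 0

print(round(pmat2, 2))
```

Both variants divide each column by its own total, so no helper matrix of repeated column sums is needed. Whether an all-zero column is the right representation for a pair that never occurs depends on the later analysis.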