R code challenge: retrieving the values in matching columns and summing them over matching rows
I have a problem to solve in R. I have a data frame called testa (dput included below). I need to match the letters in the alt column to the column names (a, c, g, t, n) and get the corresponding values from those columns, along with the value for the ref letter. The result is ad.new, and my code already does this part.
However, I need to extend the code to handle rows whose type column has flat at the end. For a flat row, I need to match its start id (chr10:102053031) against the other ids in the start column. Where they match, I need to sum the corresponding alt values from the a, c, g, t, n columns and replace the flat row's ad.new entry, including its ref value, with that result.
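The lookup described above, taking for each row the value in the column named by that row's ref or alt letter, can also be written with R's matrix indexing: a two-column (row, column) index matrix picks exactly one cell per row, so no n × n intermediate is needed. A minimal sketch, assuming the counts are stored as strings in a character matrix like testa; the helper name pick_by_column is hypothetical, and the two rows used are the first two rows of testa. match() turns the letters into numeric column positions, so the lookup does not depend on row names.

```r
# Hypothetical helper: for each row i, return m[i, cols[i]] as a number.
pick_by_column <- function(m, cols) {
  # numeric (row, column) index matrix; match() maps letters to column positions
  idx <- cbind(seq_len(nrow(m)), match(cols, colnames(m)))
  as.numeric(m[idx])
}

# First two rows of testa (count columns plus ref/alt), stored as strings
m <- rbind(
  c(a = "0", c = "0",  g = "0", t = "53", n = "0", ref = "a", alt = "t"),
  c(a = "6", c = "34", g = "0", t = "0",  n = "0", ref = "c", alt = "a")
)

refs <- pick_by_column(m, m[, "ref"])
alts <- pick_by_column(m, m[, "alt"])
paste(refs, alts, sep = ",")
# [1] "0,53" "34,6"
```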
If you run the dput, you should be able to follow the code. Basically, I want to match the letters in the ref and alt columns to the corresponding value columns (a, c, g, t, n) and join the ref and alt values with a comma. In this example, for the flat line I want to take the value in column a of the row whose start id matches the start id of the flat line (in this case 6) and the value from the other matching row (in this case 7, from the g column), and sum them to give 13. So the result for the flat line should be 0,13.
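The flat-line arithmetic above (6 + 7 = 13) can also be done without a loop. A minimal sketch using base R's ave() to total the alt counts per start id over the non-flat rows, shown on the three testa rows that share chr10:102053031. It assumes a flat row should not count itself; if several flat rows shared one start id, this version would leave all of them out of each other's totals, unlike the setdiff() approach in the answer.

```r
# The three testa rows with start "chr10:102053031" (rows 2, 6 and 7)
start <- c("chr10:102053031", "chr10:102053031", "chr10:102053031")
type  <- c("snp", "snp:102053031:flat", "snp")
refs  <- c(34, 34, 34)  # ref (c) counts
alts  <- c(6, 0, 7)     # alt counts: column a for row 2, column g for rows 6 and 7

is_flat <- grepl("flat$", type)

# Sum the alt counts of the non-flat rows sharing each start id
alt_total <- ave(ifelse(is_flat, 0, alts), start, FUN = sum)

res <- paste(refs, alts, sep = ",")
res[is_flat] <- paste0("0,", alt_total[is_flat])
res
# [1] "34,6" "0,13" "34,7"
```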
The expected result is shown below.
My incomplete code:
testa[is.na(testa)] <- 0
ref.counts <- testa[, testa[, "ref"]]
ref.counts <- as.matrix(ref.counts)
ref.counts[is.na(ref.counts)] <- 0
ref.counts <- diag(ref.counts)
alt.counts <- testa[, testa[, "alt"]]
alt.counts <- as.matrix(alt.counts)
alt.counts[is.na(alt.counts)] <- 0
alt.counts <- diag(alt.counts)
#############
## need to extend the code here
#############
ad.new <- paste(ref.counts, alt.counts, sep = ",")
dput(testa):
structure(c("chr10:101544447", "chr10:102053031", "chr10:102778767", "chr10:102789831",
"chr10:102989480", "chr10:102053031", "chr10:102053031",
"0", "6", "0", "0", "0", "0", "0",
"0", "34", "24", "0", "0", "34", "34",
"0", "0", "0", "0", "0", "0", "7",
"53", "0", "0", "30", "12", "0", "0",
"0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0",
"chr10", "chr10", "chr10", "chr10", "chr10", "chr10", "chr10",
"101544447", "102053031", "102778767", "102789831", "102989480", "102053031", "102053031",
"a", "c", "c", "c", "c", "c", "c",
"t", "a", "t", "t", "t", "g", "g",
"snp", "snp", "snp", "snp", "snp", "snp:102053031:flat", "snp",
"nonsynonymous snv", "intronic", "nonsynonymous snv", "nonsynonymous snv", "ncrna_exonic", "intronic", "intronic",
"abcc2:nm_000392:exon2:c.a116t:p.y39f,", "pkd2l1",
"pdzd7:nm_024895:exon8:c.g1136a:p.r379q,pdzd7:nm_001195263:exon8:c.g1136a:p.r379q,",
"pdzd7:nm_024895:exon2:c.g146a:p.r49q,pdzd7:nm_001195263:exon2:c.g146a:p.r49q,",
"lbx1-as1", "pkd2l1", "pkd2l1"),
.Dim = c(7L, 15L), .Dimnames = list(
c("1", "2", "3", "4", "5", "6", "7"),
c("start", "a", "c", "g", "t", "n", "=", "-", "chr", "end", "ref", "alt", "type", "refgene::location", "refgene::type")))
Expected result:
ad.new "0,53" "34,6" "24,0" "0,30" "0,12" "0,13" "34,7"
Something like this should work:
# apply the "normal" rule (not considering flat exceptions)
alts <- as.numeric(diag(testa[, testa[, "alt"]]))
refs <- as.numeric(diag(testa[, testa[, "ref"]]))
res <- paste(refs, alts, sep = ",")
# replace lines whose type ends in "flat"
flats <- grep("flat$", testa[, "type"])
res[flats] <- vapply(flats, function(x) {
  startid <- testa[x, "start"]
  # other rows with the same start id, excluding the flat row x itself
  selection <- setdiff(which(testa[, "start"] == startid), x)
  paste0("0,", sum(alts[selection]))
}, character(1))
ad.new <- as.matrix(res)
> ad.new
     [,1]
[1,] "0,53"
[2,] "34,6"
[3,] "24,0"
[4,] "0,30"
[5,] "0,12"
[6,] "0,13"
[7,] "34,7"
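Putting both rules together, here is a self-contained sketch that keeps the answer's flat-line logic (setdiff() over rows with the same start id) but looks up the ref/alt counts with matrix indexing instead of diag(). The function name make_ad_new is hypothetical, and only the testa columns the computation needs are rebuilt here, with values copied from the dput in the question.

```r
# Hypothetical wrapper: builds ad.new from a testa-like character matrix.
make_ad_new <- function(testa) {
  rows <- seq_len(nrow(testa))
  # one cell per row, from the column named in cols[i]
  pick <- function(cols) {
    as.numeric(testa[cbind(rows, match(cols, colnames(testa)))])
  }
  refs <- pick(testa[, "ref"])
  alts <- pick(testa[, "alt"])
  res <- paste(refs, alts, sep = ",")

  # flat rows: "0," followed by the summed alt counts of the other rows
  # sharing the same start id
  flats <- grep("flat$", testa[, "type"])
  res[flats] <- vapply(flats, function(x) {
    same_start <- setdiff(which(testa[, "start"] == testa[x, "start"]), x)
    paste0("0,", sum(alts[same_start]))
  }, character(1))
  as.matrix(res)
}

# Subset of testa's columns, values taken from the dput above
testa <- cbind(
  start = c("chr10:101544447", "chr10:102053031", "chr10:102778767",
            "chr10:102789831", "chr10:102989480", "chr10:102053031",
            "chr10:102053031"),
  a = c("0", "6", "0", "0", "0", "0", "0"),
  c = c("0", "34", "24", "0", "0", "34", "34"),
  g = c("0", "0", "0", "0", "0", "0", "7"),
  t = c("53", "0", "0", "30", "12", "0", "0"),
  n = rep("0", 7),
  ref = c("a", "c", "c", "c", "c", "c", "c"),
  alt = c("t", "a", "t", "t", "t", "g", "g"),
  type = c("snp", "snp", "snp", "snp", "snp", "snp:102053031:flat", "snp")
)

ad.new <- make_ad_new(testa)
ad.new[, 1]
# [1] "0,53" "34,6" "24,0" "0,30" "0,12" "0,13" "34,7"
```

vapply() with character(1) also guarantees a character result of the right length even when no row is flat.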