MATLAB spending an incredible amount of time writing a relatively small matrix


I have a small MATLAB script (included below) for handling data read from a CSV file with two columns and hundreds of thousands of rows. Each entry is a natural number, with zeros occurring only in the second column. This code takes an incredible amount of time (hours) to run something that should be achievable in seconds at most. The profiler identifies that approximately 100% of the run time is spent writing a matrix of zeros, whose size varies depending on the input but in all usage is smaller than 1000x1000.

The code is as follows:

function [data] = datahandler(d)
n = size(d,1);
s = max(d,1);
data = zeros(s,s);
for i = 1:n
    data(d(i,1),d(i,2)+1) = data(d(i,1),d(i,2)+1) + 1;
end

It's the data = zeros(s,s); line that takes around 100% of the runtime. I can make the code run quickly by changing the s's in this line to 1000, which is a sufficient upper bound to ensure it won't run into errors for any of the data I'm looking at.

Obviously there are better ways of doing this, but since I just bashed the code together to quickly format some data, I wasn't too concerned. As I said, I fixed it by replacing s with 1000 for my purposes, but I'm perplexed as to why writing that matrix would bog MATLAB down for several hours. The new code runs instantaneously.

I'd be very interested if anyone has seen this kind of behaviour before, or knows why it is happening. It's a little disconcerting, and it would be good to be confident that I can initialize matrices freely without killing MATLAB.

Your call to zeros is incorrect. Looking at your code, d looks like a D x 2 array. However, your call of s = max(d,1) actually generates another D x 2 array. Consulting the documentation for max, this is what happens when you call max in the way you used it:

C = max(A,B) returns an array the same size as A and B with the largest elements taken from A or B. Either the dimensions of A and B are the same, or one can be a scalar.

Therefore, because you used max(d,1), you are comparing every value in d with the value 1, so what you get in the end is essentially a copy of d (with any zeros raised to 1). Using this as the input to zeros has rather undefined behaviour. What probably happens is that for each row of s, it allocates a temporary zeros matrix of that size and tosses the temporary result, so that only the dimensions of the last row of s are recorded. Because you have a very large matrix d, this is probably why the profiler hangs here at 100% utilization. In short, each parameter to zeros must be a scalar, yet your call to produce s produces a matrix.

What I believe you intended should have been:

s = max(d(:)); 

This finds the overall maximum of the matrix d by unrolling d into a single vector and taking the maximum of that vector. If you do this, your code should run much faster.
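To make the difference concrete, here is a minimal sketch on a small made-up array (the values in d are purely illustrative and are not taken from the question's data). It only shows what the two max calls return.

```matlab
% Hypothetical sample data: two columns of natural numbers,
% with zeros appearing only in the second column.
d = [3 0; 1 2; 4 1];

% Element-wise comparison of every entry of d against the scalar 1.
% The result has the same size as d, so it is not a usable scalar size.
s_elementwise = max(d, 1);

% Flatten d into one column vector, then take its maximum: a scalar.
s_overall = max(d(:));

disp(size(s_elementwise));   % size of the element-wise result
disp(s_overall);             % overall maximum of d
```

The first call returns a 3 x 2 matrix, whereas the second returns the single value that zeros expects as a dimension.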

As a side note, this post may interest you:

Faster way to initialize arrays via empty matrix multiplication? (Matlab)

It was shown in that post that doing zeros(n,n) is in fact slow, and that there are several neat tricks for initializing an array of zeros. One way is to accomplish this through empty matrix multiplication:

data = zeros(n,0)*zeros(0,n); 

One of my personal favourites, if you can assume that data has not already been declared / initialized, is this:

data(n,n) = 0; 
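As a quick sanity check, the sketch below builds the same n x n zero matrix in the three ways mentioned and confirms they agree. The value of n and the variable names a, b and c are placeholders chosen for illustration. It makes no claim about relative speed, which depends on the MATLAB version.

```matlab
n = 4;                            % hypothetical size, for illustration only

a = zeros(n, n);                  % the usual call
b = zeros(n, 0) * zeros(0, n);    % empty matrix multiplication

clear c;                          % make sure c does not already exist
c(n, n) = 0;                      % growing an undefined variable fills it with zeros

% All three should be identical n-by-n matrices of zeros.
assert(isequal(a, b, c));
```

Note that the last form is only safe when the variable does not already exist. Otherwise, the existing contents are kept and only the (n,n) element is overwritten.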

If I can comment, that for loop is quite inefficient. What you are doing is calculating a 2D histogram / accumulation of your data. You can replace the for loop with a more efficient accumarray call. This also avoids having to allocate an array of zeros yourself, as accumarray does that under the hood for you.

As such, your code would become this:

function [data] = datahandler(d)
data = accumarray([d(:,1) d(:,2)+1], 1);

accumarray in this case takes all pairs of row and column coordinates, stored in d(i,1) and d(i,2) + 1 for i = 1, 2, ..., size(d,1), and places every pair that matches the same row and column coordinates into its own 2D bin. It then adds up all of the occurrences, so the output at each 2D bin gives you the total tally of how many rows of d mapped to that row and column coordinate.
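The sketch below compares the loop and the accumarray call on a small made-up input (the values in d and the names looped and tallied are illustrative only). Here the loop's matrix is preallocated to the largest row and column index rather than to s x s, so that both results have the same size.

```matlab
% Hypothetical sample data: column 1 holds values >= 1, column 2 holds values >= 0.
d = [1 0; 2 1; 1 0; 3 2; 2 1; 1 2];

% Loop version: count how often each (row, column) pair occurs.
n = size(d, 1);
looped = zeros(max(d(:,1)), max(d(:,2)) + 1);
for i = 1:n
    looped(d(i,1), d(i,2)+1) = looped(d(i,1), d(i,2)+1) + 1;
end

% accumarray version: each row of the subscript matrix selects a bin,
% and the scalar 1 is added to that bin once per occurrence.
tallied = accumarray([d(:,1) d(:,2)+1], 1);

% Both approaches should produce the same tally.
assert(isequal(looped, tallied));
disp(tallied);
```

accumarray sizes its output from the largest subscripts it sees, so the result is not necessarily square. If a fixed size is needed, it can be passed as a third argument.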

