optimization - AMD OpenCL Reduce Register Pressure


I am running a sorting algorithm in a kernel, and the sorting part uses 36 VGPRs, resulting in 12.5% occupancy and awful performance.

The code segment is as follows:

    typedef struct {
        float record[8];
        float dis;
        int t_class;
    } node;

    for (int i = 0; i < num_record; ++i) {
        in_data[i].dis = dist(in_data[i].record, new_point, num_feature);
    }

    node tmp;
    int i;
    int j;
    #pragma unroll 1
    for (i = 0; i < num_record - 1; ++i)
        for (j = 0; j < num_record - i - 1; j++) {
            if (in_data[j].dis > in_data[(j + 1)].dis) {
                tmp = in_data[j];
                in_data[j] = in_data[(j + 1)];
                in_data[(j + 1)] = tmp;
            }
        }

Is there a way to reduce register usage without big modifications to the algorithm itself? I guess it would be better to reduce the register count to under 16.

Update:

Basically, the kernel is trying to implement an exhaustive k-NN (k-nearest neighbors) method.

    float tmp;

    tmp = in_data[j].x;
    in_data[j].x = in_data[(j + 1)].x;
    in_data[(j + 1)].x = tmp;

    tmp = in_data[j].y;
    in_data[j].y = in_data[(j + 1)].y;
    in_data[(j + 1)].y = tmp;

    tmp = in_data[j].z;
    in_data[j].z = in_data[(j + 1)].z;
    in_data[(j + 1)].z = tmp;

Swapping one field at a time like this should use roughly 1/3 of the registers of the original code, since only 1/3 of the data needs to be held at any one time.
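Applied to the node struct from the question, the field-by-field swap might look like the sketch below. It is written as plain C so it can be compiled and checked on the host. The helper names swap_node_fieldwise and sort_by_dis are made up for illustration, and inside a real kernel the pointers would carry the appropriate address space qualifier (for example __global). Whether the compiler actually allocates fewer VGPRs depends on the compiler and driver version, so it is worth checking the reported register count after the change.

```c
/* Same layout as the struct in the question: 8 + 1 + 1 = ten 32-bit words. */
typedef struct {
    float record[8];
    float dis;
    int   t_class;
} node;

/* Hypothetical helper: swap two nodes one scalar at a time, so only a single
 * float (or int) temporary is live at any point instead of a whole node. */
void swap_node_fieldwise(node *a, node *b)
{
    float ftmp;
    int   itmp;
    int   k;

    for (k = 0; k < 8; ++k) {
        ftmp = a->record[k];
        a->record[k] = b->record[k];
        b->record[k] = ftmp;
    }

    ftmp = a->dis;
    a->dis = b->dis;
    b->dis = ftmp;

    itmp = a->t_class;
    a->t_class = b->t_class;
    b->t_class = itmp;
}

/* Hypothetical helper: the bubble sort from the question (ascending dis),
 * with the whole-struct swap replaced by the field-wise swap above. */
void sort_by_dis(node *in_data, int num_record)
{
    int i, j;

    for (i = 0; i < num_record - 1; ++i) {
        for (j = 0; j < num_record - i - 1; ++j) {
            if (in_data[j].dis > in_data[j + 1].dis) {
                swap_node_fieldwise(&in_data[j], &in_data[j + 1]);
            }
        }
    }
}
```

The comparison and loop bounds are unchanged from the question; only the swap differs, so this stays within the "no big modifications" constraint.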

You could also move the data global ---> local ---> global instead of global ---> private ---> global, to reduce private register usage.

