optimization - AMD OpenCL Reduce Register Pressure
I am running a sorting algorithm in a kernel, and the sorting part uses 36 VGPRs, resulting in 12.5% occupancy and awful performance.
The code segment is as follows:
    typedef struct {
        float record[8];
        float dis;
        int t_class;
    } node;

    for (int i = 0; i < num_record; ++i) {
        in_data[i].dis = dist(in_data[i].record, new_point, num_feature);
    }

    node tmp;
    int i;
    int j;
    #pragma unroll 1
    for (i = 0; i < num_record - 1; ++i)
        for (j = 0; j < num_record - i - 1; j++) {
            if (in_data[j].dis > in_data[j + 1].dis) {
                tmp = in_data[j];
                in_data[j] = in_data[j + 1];
                in_data[j + 1] = tmp;
            }
        }
Is there a way to reduce register usage without big modifications to the algorithm itself? I guess it would be better to reduce register usage to under 16.
Update:
Basically, the kernel is trying to implement an exhaustive kNN method.
    float tmp;
    tmp = in_data[j].x;
    in_data[j].x = in_data[j + 1].x;
    in_data[j + 1].x = tmp;
    tmp = in_data[j].y;
    in_data[j].y = in_data[j + 1].y;
    in_data[j + 1].y = tmp;
    tmp = in_data[j].z;
    in_data[j].z = in_data[j + 1].z;
    in_data[j + 1].z = tmp;
This should use about 1/3 of the registers of the original code, since it needs only 1/3 of the space at a time.
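Applied to the `node` struct from the question, the same idea might look like the sketch below. It is written as plain C99 so it can be compiled and checked on the host; inside the kernel the pointers would carry the `__global` qualifier. The helper names `swap_nodes_fieldwise` and `sort_by_distance` are made up for illustration and are not from the original code. Whether this actually lowers the VGPR count depends on the compiler, so the generated ISA or the profiler output should be checked.

```c
/* Node layout copied from the question. */
typedef struct {
    float record[8];
    float dis;
    int   t_class;
} node;

/* Hypothetical helper: swap two nodes one scalar at a time, so that only a
   single scalar temporary is needed at any point instead of a whole
   40-byte struct copy. */
void swap_nodes_fieldwise(node *a, node *b)
{
    for (int f = 0; f < 8; ++f) {
        float t = a->record[f];
        a->record[f] = b->record[f];
        b->record[f] = t;
    }

    float td = a->dis;
    a->dis = b->dis;
    b->dis = td;

    int tc = a->t_class;
    a->t_class = b->t_class;
    b->t_class = tc;
}

/* Bubble sort by ascending distance, same loop structure as the question. */
void sort_by_distance(node *in_data, int num_record)
{
    for (int i = 0; i < num_record - 1; ++i) {
        for (int j = 0; j < num_record - i - 1; ++j) {
            if (in_data[j].dis > in_data[j + 1].dis) {
                swap_nodes_fieldwise(&in_data[j], &in_data[j + 1]);
            }
        }
    }
}
```

Calling `sort_by_distance(in_data, num_record)` after the distance loop would replace the struct-copy swap from the question.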
You can also go global -> local -> global instead of global -> private -> global to reduce private register usage.
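One possible reading of this suggestion, as an OpenCL C sketch: stage the records in local memory, sort there, and write them back. The kernel name `sort_via_local` and the `scratch` argument are hypothetical. The sketch assumes a single work-group and that all `num_record` nodes (40 bytes each) fit in local memory; neither assumption is confirmed by the question. It has not been profiled, so any effect on register usage is untested.

```c
/* Node layout copied from the question. */
typedef struct {
    float record[8];
    float dis;
    int   t_class;
} node;

/* Hypothetical kernel: assumes one work-group and that num_record nodes
   fit into the local buffer passed as `scratch`. */
__kernel void sort_via_local(__global node *in_data,
                             const int num_record,
                             __local node *scratch)
{
    const int lid = (int)get_local_id(0);
    const int lsz = (int)get_local_size(0);

    /* global -> local: work-items copy the records cooperatively. */
    for (int k = lid; k < num_record; k += lsz) {
        scratch[k] = in_data[k];
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    /* One work-item runs the bubble sort from the question on the local copy. */
    if (lid == 0) {
        for (int i = 0; i < num_record - 1; ++i) {
            for (int j = 0; j < num_record - i - 1; ++j) {
                if (scratch[j].dis > scratch[j + 1].dis) {
                    /* Swap one scalar at a time to keep private temporaries small. */
                    for (int f = 0; f < 8; ++f) {
                        float t = scratch[j].record[f];
                        scratch[j].record[f] = scratch[j + 1].record[f];
                        scratch[j + 1].record[f] = t;
                    }
                    float td = scratch[j].dis;
                    scratch[j].dis = scratch[j + 1].dis;
                    scratch[j + 1].dis = td;
                    int tc = scratch[j].t_class;
                    scratch[j].t_class = scratch[j + 1].t_class;
                    scratch[j + 1].t_class = tc;
                }
            }
        }
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    /* local -> global: write the sorted records back. */
    for (int k = lid; k < num_record; k += lsz) {
        in_data[k] = scratch[k];
    }
}
```

On the host, the local buffer would be sized with `clSetKernelArg(kernel, 2, num_record * sizeof(node), NULL)`. Both barriers sit outside the `if (lid == 0)` branch so that every work-item in the group reaches them.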