arrays - How to parse a specific NetBackup bpimagelist format data file, records separated by an empty line, each line with a data label and data
A snippet of the NBU bpimagelist data output format is shown below. A single record is separated by a blank (empty) line, and each line contains a variable-length data label, followed by a colon, a random number of spaces, and variable data content. A single record can be of variable length, with no specific number of lines.
I'd like to convert the file to comma-separated format so I can import it into Excel and analyze it. I am able to extract the data labels without a problem.
client: <hostname>
backup id: <hostname>_1396674012
policy: m-portwarew2k03-prod-clf
policy type: ms-windows (13)
proxy client: (none specified)
creator: root
name1: (none specified)
sched label: monthly_full
schedule type: full (0)
retention level: 5 weeks (4)
backup time: sat apr 5 01:00:12 2014 (1396674012)
elapsed time: 2448 second(s)
expiration time: sat apr 3 01:00:12 2021 (1617426012)
compressed: no
client encrypted: no
kilobytes: 37997291
number of files: 240819
number of copies: 1
number of fragments: 1
histogram: 0 0 0 0 0 0 0 0 0 0
db compressed: no
files file name: m-portwarew2k03-prod-clf_1396674012_full.f
...many more lines of data labels and data per record.
I'd like the data in CSV format like this...
client,backup id,policy,policy type,proxy client,creator,...more labels
<hostname>,<hostname>_id#,m-portwarew2k03-prod-clf,ms-windows (13),(none specified),root,(none specified),monthly_full,full (0),5 weeks (4),sat apr 5 01:00:12 2014 (1396674012),...more...

# write the output headers from the first file record - a single record up to the first blank line
# read the first record, pull out the first column of data, and output a single comma-delimited line
header=`sed '/^\s*$/q' $inputfile | cut -d: -f1 | tr '\n' ','`
echo -e $header > $outfile

# repeat the above on all lines in the file to pull the data (the 2nd column after the ":") instead, and output it comma-delimited
# "cut -d: -f2-" removes the first column of data to the left of the colon delimiter,
# "tr -d ' '" removes the leading white space between the colon and the start of the data, and
# "tr '\n' ','" or "paste -d, -s" replaces the newlines between data items with commas.
Now, how do I add a trailing newline between records?
sed '/^\s*$/d' $inputfile | cut -d: -f2- | tr -d ' ' | tr '\n' ',' >> $outfile
So the data lines should be reformatted to show only the data to the right of the colon delimiter (removing the intervening spaces between the delimiter and the start of the data), removing the line feeds between each line (as in the source) and replacing them with commas, until a whole data record has been output. When the next blank line is reached in the source, the output should advance to a new line, and the process should repeat until the end of the data.
client: <hostname>
backup id: <hostname>_1349499621
policy: m-portwarew2k03-prod-clf
policy type: ms-windows (13)
proxy client: (none specified)
creator: root
name1: (none specified)
sched label: monthly_full
schedule type: full (0)
retention level: 7 years (14)
backup time: sat oct 6 01:00:21 2012 (1349499621)
elapsed time: 3457 second(s)
expiration time: sat oct 5 01:00:21 2019 (1570251621)
compressed: no
client encrypted: no
kilobytes: 37090868
number of files: 215304
number of copies: 1
number of fragments: 6
histogram: 0 0 0 0 0 0 0 0 0 0
db compressed: no
files file name: m-portwarew2k03-prod-clf_1349499621_full.f
previous backup files file name: (none specified)
parent backup image file name: (none specified)
sw version: (none specified)
options: 0x0
mpx: 1
tir info: 0
tir expiration: wed dec 31 19:00:00 1969 (0)
keyword: (none specified)
ext security info: no
file restore raw: no
image dump level: 0
file system only: no
object descriptor: (none specified)
previous bi time: wed dec 31 19:00:00 1969 (0)
bi full time: wed dec 31 19:00:00 1969 (0)
request pid: 0
backup status: 0
stream number: 0
backup copy: standard (0)
files file size: 0
pfi type: 0
image_attribute: 0
primary copy: 1
image type: 0 (regular)
job id: 2123444
num resumes: 0
resume expiration: wed dec 31 19:00:00 1969 (0)
data classification: (none specified)
data_classification_id: (none specified)
storage lifecycle policy: (none specified)
storage lifecycle policy version: 0
stl_completed: 0
remote expiration time: wed dec 31 19:00:00 1969 (0)
origin master server: (none specified)
origin master guid: (none specified)
snap time: wed dec 31 19:00:00 1969 (0)
ir enabled: no
client character set: 0
image on hold: 0
indexing status: 0
copy number: 1
fragment: 1
kilobytes: 0
remainder: 0
media type: media manager (2)
density: hcart3 (20)
file num: 8
id: k14753
host: <some_other_host>
block size: 262144
offset: 1220388
media date: fri oct 5 19:00:10 2012 (1349478010)
dev written on: 2
flags: 0x40 (tape encrypted)
media descriptor: ?
expiration time: sat oct 5 01:00:21 2019 (1570251621)
mpx: 1
retention_lvl: 7 years (14)
try keep time: wed dec 31 19:00:00 1969 (0)
copy creation time: sat oct 6 01:57:58 2012 (1349503078)
data format: undefined
checkpoint: 0
resume num: 0
key tag: 41f841dd750ef07e68cc5387629bb22d21933ca3a4ea204a01abbee2ba98cd44
stl tag: *null*
copy on hold: 0
copy number: 1
fragment: 2
kilobytes: 6423296
remainder: 0
media type: media manager (2)
density: hcart3 (20)
file num: 9
id: k14753
host: amarlp67
block size: 262144
offset: 1235772
media date: fri oct 5 19:00:10 2012 (1349478010)
dev written on: 2
flags: 0x40 (tape encrypted)
media descriptor: ?
checkpoint: 0
resume num: 0
copy on hold: 0
copy number: 1
fragment: 3
kilobytes: 3038464
remainder: 0
media type: media manager (2)
density: hcart3 (20)
file num: 10
id: k14753
host: amarlp67
block size: 262144
offset: 1538917
media date: fri oct 5 19:00:10 2012 (1349478010)
dev written on: 2
flags: 0x40 (tape encrypted)
media descriptor: ?
checkpoint: 0
Et cetera, until the next blank line. Every record will have a random number of fragments.
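One way to get that newline per record, while staying close to the sed/cut/tr approach above, is to keep the blank lines as record markers and let a small awk script do the joining. This is only a sketch, assuming the same $inputfile and $outfile placeholder variables; it uses sed rather than tr -d ' ' to strip the label and leading whitespace, because tr -d ' ' would also delete the spaces inside values such as the backup time:

# strip the "label:" part and the leading spaces, but keep blank lines as record separators
sed -e 's/^[^:]*:[[:space:]]*//' "$inputfile" \
  | awk '
      /^[[:space:]]*$/ { if (row != "") print row; row = ""; next }  # blank line: emit the finished record
      { row = (row == "" ? $0 : row "," $0) }                        # data line: append the value to the row
      END { if (row != "") print row }                               # flush the last record
    ' >> "$outfile"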
I'm open to any methodology to solve this, though the simplest and most elegant code would be the most efficient. Realize that the source data can be millions of rows long.
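Since the file can be millions of lines, a single awk pass in paragraph mode may be the simplest and fastest option: with RS="" each blank-line-separated record is read as one unit, the header is built from the labels of the first record, and every record then becomes one CSV row. A sketch, assuming every line has a "label: value" shape; the file names are only examples:

awk 'BEGIN { RS = ""; FS = "\n" }            # RS="" = paragraph mode: blank lines separate records
NR == 1 {                                    # build the header row from the first record
    hdr = ""
    for (i = 1; i <= NF; i++) {
        lbl = $i
        sub(/:.*/, "", lbl)                  # keep only the label to the left of the first colon
        hdr = (i == 1 ? lbl : hdr "," lbl)
    }
    print hdr
}
{                                            # one CSV data row per record
    row = ""
    for (i = 1; i <= NF; i++) {
        val = $i
        sub(/^[^:]*:[[:space:]]*/, "", val)  # drop the label, the colon and the leading spaces
        row = (i == 1 ? val : row "," val)
    }
    print row
}' bpimagelist.txt > bpimagelist.csv

Because every record can have a different number of fragments, later rows may carry more columns than the header built from the first record; that limitation applies equally to the sed/cut/tr approach above.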
This code will generate CSV output from the bpimagelist command:
echo "client_name, date1, date2, version, backupid, policy_name, client_type, proxy_client, creator, sched_label, sched_type, retention, backup_time, elapsed, expiration, compression, encryption, kbytes, num_files, copies, num_fragments, files_compressed, files_file, version, name1, options, primary, image_type, tir_info, tir_expiration, keywords, mpx, ext_security, raw, dump_lvl, fs_only, prev_bitime, bifull_time, obj_desc, requestid, backup_stat, backup_copy, prev_image, jobid, num_resumes, resume_expr, ff_size, pfi_type, image_attrib, ss_classification_id, ss_name, ss_completed, snap_time, slp_version[, remoteexpiration, origin_master_server, origin_master_guid, ir_enabled, client_charset, hold, indexing_status" bpimagelist -l | grep '^image' | sed -e 's/^image //' | tr ' ' ','