Unix & Linux Asked by Sagar Joshi on December 31, 2021
I have multiple files (>150) with multiple columns (>150). Most of the headers are common but occur in a different order (e.g., below):
File 1:
Col1 Col2 Col3 Col4 Col5
A B C D E
File 2:
Col1 Col4 Col3 Col5
P Q R S
Desired output:
Col1 Col3 Col4 Col5
A C D E
P R Q S
Alternatively, nearly 30-40 of the files share a common set of headers (though still in a different order). If someone can help me sort the headers (and the corresponding data) so that they appear in the same order across the whole bunch of files, I can then remove the uncommon columns from the 4-5 bunches and merge the common set.
This Perl code reads all the to-be-merged files, determines the common header set from the headers of all the files, and then rearranges the column printing order when writing the output.
perl -wMstrict -Mvars='*ARGV_orig,*comm_hdr,*prev,*h' -lne '
BEGIN{
@::ARGV_orig = @ARGV;
$::prev = q//;
sub trim {
my ($str) = @_ ? @_ : $_;
for($str) {
s/^\s*//; s/\s*$//;
}
return $str;
}
sub intersection(\@\@) {
@{$_[0]} > @{$_[1]} and @_ = reverse @_;
my @smaller = @{ +shift };
my @larger = @{ +shift };
my @common;
for my $e (@smaller) {
push @common, $e
if grep { $_ eq $e } @larger;
}
return @common;
}
sub col_print_order {
my @common_hdr = @{ $_[0]->{common_header} };
my @header2prn = @{ $_[0]->{header_2print} };
my @reorder;
for my $e (@common_hdr) {
# list assignment in boolean context yields the number of matches,
# so this is true even when the matching index $l is 0
if ( my ($l) = grep { $header2prn[$_] eq $e } 0..$#header2prn ) {
push @reorder, $l;
}
}
return @reorder;
}
}
if ( $ARGV ne $::prev ) {
$::h{$ARGV}{header} = $_;
my @A = split;
@::comm_hdr = @::comm_hdr ? intersection(@::comm_hdr, @A) : @A;
$::prev = $ARGV;
} else {
push @{ $::h{$ARGV}{data} }, $_;
}
END{
local $, = chr(32);
my @comm_hdr_sorted = sort @::comm_hdr;
print @comm_hdr_sorted;
for my $argv (@::ARGV_orig) {
my @current_header = split /\s+/, trim $::h{$argv}{header};
my @order = col_print_order({
common_header => \@comm_hdr_sorted,
header_2print => \@current_header,
});
my @file = @{ $::h{$argv}{data} };
for my $line_num ( 0..$#file ) {
my $line = trim $file[$line_num];
my @fields = split /\s+/, $line;
print @fields[ @order ];
}
}
}
' yourfile1 yourfile2 yourfile3 # ... specify all your filenames to be merged here
Col1 Col3 Col4 Col5
A C D E
P R Q S
Answered by user218374 on December 31, 2021
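The intersection-and-reorder idea behind the Perl answer can also be sketched in Python (a minimal standalone illustration with the sample data above, not the answer's actual code): intersect the header sets across files, sort the result, then map each common header to its column index in each file.

```python
def merge_common_columns(files):
    """files: list of files, each a list of text lines (header first).
    Returns rows (lists of fields) for the common columns only."""
    headers = [f[0].split() for f in files]
    # iterative intersection, like the Perl answer's intersection() sub
    common = set(headers[0])
    for h in headers[1:]:
        common &= set(h)
    common = sorted(common)
    out = [common]
    for f, hdr in zip(files, headers):
        # column index of each common header in this particular file
        order = [hdr.index(h) for h in common]
        for line in f[1:]:
            fields = line.split()
            out.append([fields[i] for i in order])
    return out

file1 = ["Col1 Col2 Col3 Col4 Col5", "A B C D E"]
file2 = ["Col1 Col4 Col3 Col5", "P Q R S"]
for row in merge_common_columns([file1, file2]):
    print(" ".join(row))   # prints the desired merged output
```

Like the Perl version, this holds everything in memory and assumes each file's data rows have as many fields as its header.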
GNU awk can handle an arbitrary number of files (all file content is stored in memory, so the limit depends on your system's memory capacity):
gawk '
# examine the headers for this file
FNR == 1 {
num_files++
delete this_headers
for (i=1; i<=NF; i++) {
all_headers[$i]++
this_headers[i] = $i
}
next
}
# this is a line of data
{
n++
for (i=1; i<=NF; i++) {
data[n][this_headers[i]] = $i
}
}
END {
# find the headers that are common to all files
for (header in all_headers) {
if (all_headers[header] == num_files)
common_headers[header]
}
# sort arrays by index, alphabetically
PROCINFO["sorted_in"] = "@ind_str_asc"
# print out the common headers
for (header in common_headers) {
printf "%s ", header
}
print ""
# print out the data
for (i=1; i<=n; i++) {
for (header in common_headers) {
printf "%s ", data[i][header]
}
print ""
}
}
' file1 file2
outputs
Col1 Col3 Col4 Col5
A C D E
P R Q S
Answered by glenn jackman on December 31, 2021
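The gawk answer avoids pairwise intersection with a counting trick: a header is common exactly when its occurrence count equals the number of files. A hedged Python sketch of just that step (sample data only; it assumes, as the gawk code does, that no file repeats a header):

```python
from collections import Counter

def common_headers(header_rows):
    """header_rows: one list of header names per file."""
    counts = Counter()
    for row in header_rows:
        counts.update(row)        # mirrors all_headers[$i]++ in the gawk code
    n_files = len(header_rows)
    # keep only headers seen in every file, sorted alphabetically
    return sorted(h for h, c in counts.items() if c == n_files)

print(common_headers([
    ["Col1", "Col2", "Col3", "Col4", "Col5"],
    ["Col1", "Col4", "Col3", "Col5"],
]))
```

A header duplicated within a single file would inflate its count and could be misreported as common; the same caveat applies to the gawk version.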
Save the code below as a file named mergecols, make it executable, and run it with
mergecols -C1=0,2,3,4 -C2=0,2,1,3 file1 file2
#!/usr/bin/perl -s
# mergecols
# -C1=0,2,3,4 columns from file 1
# -C2=0,2,1,3 columns from file 2
# file1 input file 1
# file2 input file 2
($f1,$f2) = @ARGV;
@t1 = map { [split] } do { local @ARGV=($f1); <> };
@t2 = map { [split] } do { local @ARGV=($f2); <> };
@c1 = split /,/, $C1;
@c2 = split /,/, $C2;
for ( $i=0; $t1[$i] or $t2[$i]; $i++ ) {
print join ' ', @{$t1[$i]}[@c1], "\n" if $t1[$i];
print join ' ', @{$t2[$i]}[@c2], "\n" if $t2[$i];
}
Answered by ingopingo on December 31, 2021
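Unlike the other answers, mergecols does no header detection: you supply the column indices for each file by hand, and rows from the two files are interleaved (including both header rows). A hedged Python sketch of that behavior, with illustrative names only:

```python
from itertools import zip_longest

def mergecols(rows1, cols1, rows2, cols2):
    """rows1/rows2: rows as lists of fields; cols1/cols2: 0-based
    column indices to keep, analogous to -C1/-C2 above."""
    out = []
    # like the Perl loop, advance through both files in lockstep,
    # printing file 1's row and then file 2's row for each index
    for r1, r2 in zip_longest(rows1, rows2):
        if r1 is not None:
            out.append(" ".join(r1[i] for i in cols1))
        if r2 is not None:
            out.append(" ".join(r2[i] for i in cols2))
    return out

t1 = [["Col1", "Col2", "Col3", "Col4", "Col5"], ["A", "B", "C", "D", "E"]]
t2 = [["Col1", "Col4", "Col3", "Col5"], ["P", "Q", "R", "S"]]
print("\n".join(mergecols(t1, [0, 2, 3, 4], t2, [0, 2, 1, 3])))
```

Note that, matching the Perl script, the selected header line of each file appears in the output, so the common header is printed twice.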