Unix & Linux Asked by Hambone on December 31, 2021
I have two files that I am trying to match multiple columns worth of information and output to a new file. I have found several posts that have gotten me to about a 80% solution, and I am looking for help with the final piece of the puzzle. I am currently using a system with Solaris Unix system, and mainly do most of my information manipulation using a ksh shell. The solution that I have gotten is using an awk command. The tables are currently in 4 columns separated with spaces. File 1 is an example of the master file list that I keep, and what I am comparing file 2 with.
File 1 looks something like this (The master file list):
FOO1 BAR1 100 SX2000
FOO1 BAR1 101 SX2001
FOO1 BAR1 102 SX2002
FOO1 BAR1 103 SX2003
FOO1 BAR1 104 SX2004
FOO1 BAR1 105 SX2005
FOO1 BAR1 106 SX2006
FOO1 BAR1 107 SX2007
FOO1 BAR1 108 SX2008
FOO1 BAR1 109 SX2009
FOO1 BAR1 110 SX2010
FOO1 BAR1 111 SX2011
FOO1 BAR1 112 SX2012
FOO1 BAR1 113 SX2013
FOO1 BAR1 114 SX2014
FOO1 BAR1 115 SX2015
FOO1 BAR1 116 SX2016
FOO1 BAR1 117 SX2017
FOO1 BAR1 118 SX2018
FOO1 BAR1 119 SX2019
FOO1 BAR1 120 SX2020
File 2 looks like this (the table I am filling in):
FOO1 BAR1 100 SX
FOO1 BAR1 101 SX
FOO1 BAR1 102 SX
FOO1 BAR1 103 SX
FOO1 BAR1 104 SX
FOO1 BAR1 105 NV
FOO1 BAR1 106 SX
FOO1 BAR1 107 SX
FOO1 BAR1 108 SX
FOO1 BAR1 109 SX
FOO1 BAR1 113 SX
FOO1 BAR1 114 SX
FOO1 BAR1 115 NV
FOO1 BAR1 116 SX
FOO1 BAR1 117 SX
FOO1 BAR1 118 SX
FOO1 BAR1 119 SX
What I would like is a result like this:
FOO1 BAR1 100 SX2000
FOO1 BAR1 101 SX2001
FOO1 BAR1 102 SX2002
FOO1 BAR1 103 SX2003
FOO1 BAR1 104 SX2004
FOO1 BAR1 105 NV
FOO1 BAR1 106 SX2006
FOO1 BAR1 107 SX2007
FOO1 BAR1 108 SX2008
FOO1 BAR1 109 SX2009
FOO1 BAR1 113 SX2013
FOO1 BAR1 114 SX2014
FOO1 BAR1 115 NV
FOO1 BAR1 116 SX2016
FOO1 BAR1 117 SX2017
FOO1 BAR1 118 SX2018
FOO1 BAR1 119 SX2019
As you can see, I would like to compare file 2 with file 1, and print out the value in column 4 if it matches all the way across. The code that I have come up with so far allows me to compare the first 3 columns of both files, and then copy across column 4.
Here is an example of the script that I have so far:
#! /bin/ksh
file1=master_table
file2=test_table
file3=combined_table
awk -F' ' '(FNR==NR) {a[$1,$2,$3]=$4; next} (($1,$2,$3) in a) {print $1,$2,$3,a[$1,$2,$3]}' $file1 $file2 > $file3
I am looking for a way to still make the comparisons with the first 3 columns, and also check if the first 2 characters of column 4 match before it copies the information over. Otherwise, I would like to have the original column 4 value copied over from the second file. While I am currently using awk, that does not have to be the final solution. I would just like to keep it within a shell script because it makes it easier for me to do other manipulations with the files.
Thanks in advance for the help!!
$ cat tst.awk
{ key = $1 FS $2 FS $3 FS substr($4,1,2) }
NR==FNR { a[key]=$4; next }
key in a { $4=a[key] }
{ print }
.
$ awk -f tst.awk file1 file2
FOO1 BAR1 100 SX2000
FOO1 BAR1 101 SX2001
FOO1 BAR1 102 SX2002
FOO1 BAR1 103 SX2003
FOO1 BAR1 104 SX2004
FOO1 BAR1 105 NV
FOO1 BAR1 106 SX2006
FOO1 BAR1 107 SX2007
FOO1 BAR1 108 SX2008
FOO1 BAR1 109 SX2009
FOO1 BAR1 113 SX2013
FOO1 BAR1 114 SX2014
FOO1 BAR1 115 NV
FOO1 BAR1 116 SX2016
FOO1 BAR1 117 SX2017
FOO1 BAR1 118 SX2018
FOO1 BAR1 119 SX2019
Answered by Ed Morton on December 31, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP