Super User Asked on December 28, 2020
I have the following for loop to individually sort all text files inside of a folder (i.e. producing a sorted output file for each).
for file in *.txt;
do
printf 'Processing %s\n' "$file"
LC_ALL=C sort -u "$file" > "./${file}_sorted"
done
This is almost perfect, except that it currently outputs files in the format of:
originalfile.txt_sorted
…whereas I would like it to output files in the format of:
originalfile_sorted.txt
This is because the ${file} variable contains the filename including the extension. I’m running Cygwin on top of Windows. I’m not sure how this would behave in a true Linux environment, but in Windows, this shifting of the extension renders the file inaccessible by Windows Explorer.
How can I separate the filename from the extension so that I can add the _sorted suffix in between the two, allowing me to easily differentiate the original and sorted versions of the files while still keeping Windows’ file extensions intact?
I’ve been looking at what might be possible solutions, but to me these seem more equipped to dealing with more complicated problems. More importantly, with my current bash knowledge, they go way over my head, so I’m holding out hope that there’s a simpler solution which applies to my humble for loop, or else that someone can explain how to apply those solutions to my situation.
These solutions you link to are in fact quite good. Some answers may lack explanation, so let's sort it out and maybe add some more.
This line of yours
for file in *.txt
indicates the extension is known beforehand (note: POSIX-compliant environments are case sensitive, *.txt won't match FOO.TXT). In that case
basename -s .txt "$file"
should return the name without the extension (basename also removes the directory path: /directory/path/filename → filename; in your case it doesn't matter because $file doesn't contain such a path). To use the tool in your code, you need command substitution, which looks like this in general: $(some_command). Command substitution takes the output of some_command (minus any trailing newlines), treats it as a string and places it where $(…) is. Your particular redirection will be
… > "./$(basename -s .txt "$file")_sorted.txt"
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^ the output of basename will replace this
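To see the command substitution in isolation, here is a minimal sketch. The path docs/report.txt is made up for illustration. The -s option is the form used above (GNU and BSD basename accept it); the second assignment uses the operand form that POSIX specifies, in case -s is not available.

```shell
#!/usr/bin/env bash
# Hypothetical path, used only for illustration; the file need not exist.
file='docs/report.txt'

# Form used above: -s names the suffix to strip.
name_s=$(basename -s .txt "$file")

# POSIX form: the suffix is passed as a second operand.
name_posix=$(basename "$file" .txt)

# Both variables hold the name without directory and without extension.
printf '%s\n' "${name_s}_sorted.txt" "${name_posix}_sorted.txt"
```

Both lines print report_sorted.txt, because basename drops the directory part as well as the suffix.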
Nested quotes are OK here because Bash is smart enough to know the quotes within $(…) are paired together.
This can be improved. Note basename is a separate executable, not a shell builtin (in Bash run type basename, compare to type cd). Spawning any extra process is costly: it takes resources and time. Spawning one on every iteration of a loop usually performs poorly. Therefore you should use whatever the shell offers you to avoid extra processes. In this case the solution is:
… > "./${file%.txt}_sorted.txt"
The syntax is explained below for a more general case.
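Put back into your loop, it could look like the sketch below. The temporary directory and the sample file fruits.txt are assumptions added so the demo does not touch real files; the loop body itself is yours with only the redirection target changed.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory (assumption: mktemp -d is available).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical sample input with a duplicate line.
printf '%s\n' pear apple pear > fruits.txt

for file in *.txt; do
  # If nothing matched, the pattern stays literal; skip it.
  [ -e "$file" ] || continue
  printf 'Processing %s\n' "$file"
  # ${file%.txt} strips the known extension without spawning a process.
  LC_ALL=C sort -u "$file" > "./${file%.txt}_sorted.txt"
done

# Keep the sorted content in a variable for inspection.
result=$(cat fruits_sorted.txt)
printf '%s\n' "$result"
```

One thing to keep in mind: if you run the loop a second time in the same folder, *.txt will also match the earlier fruits_sorted.txt and produce fruits_sorted_sorted.txt.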
In case you don't know the extension:
… > "./${file%.*}_sorted.${file##*.}"
The syntax explained:
${file#*.} – $file, but the shortest string matching *. is removed from the front;
${file##*.} – $file, but the longest string matching *. is removed from the front; use it to get just an extension;
${file%.*} – $file, but the shortest string matching .* is removed from the end; use it to get everything but extension;
${file%%.*} – $file, but the longest string matching .* is removed from the end.
Pattern matching is glob-like, not regex. This means * is a wildcard for zero or more characters, ? is a wildcard for exactly one character (we don't need ? in your case though). When you invoke ls *.txt or for file in *.txt; you're using the same pattern matching mechanism. A pattern without wildcards is allowed. We have already used ${file%.txt} where .txt is the pattern.
Example:
$ file=name.name2.name3.ext
$ echo "${file#*.}"
name2.name3.ext
$ echo "${file##*.}"
ext
$ echo "${file%.*}"
name.name2.name3
$ echo "${file%%.*}"
name
But beware:
$ file=extensionless
$ echo "${file#*.}"
extensionless
$ echo "${file##*.}"
extensionless
$ echo "${file%.*}"
extensionless
$ echo "${file%%.*}"
extensionless
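Applied to the unknown-extension redirection from earlier, this means a name without a dot gets itself appended as a bogus extension. A small sketch (the variable target is only there for illustration):

```shell
#!/usr/bin/env bash
file='extensionless'

# Neither expansion finds a dot, so both return the whole name.
target="${file%.*}_sorted.${file##*.}"

printf '%s\n' "$target"
```

This prints extensionless_sorted.extensionless, which is probably not what you want.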
For this reason the following contraption might be useful (but it's not, explanation below):
${file#${file%.*}}
It works by identifying everything but the extension (${file%.*}), then removing it from the front of the whole string. The results are like this:
$ file=name.name2.name3.ext
$ echo "${file#${file%.*}}"
.ext
$ file=extensionless
$ echo "${file#${file%.*}}"
$ # empty output above
Note the . is included this time. You might get unexpected results if $file contained literal * or ?; but Windows (where extensions matter) doesn't allow these characters in filenames anyway, so you may not care. However […], if present, is interpreted as a bracket expression and breaks the solution! (Curly brackets are harmless here: brace expansion is never applied to the result of a parameter expansion, and { and } are not pattern characters.)
Your "improved" redirection would be:
… > "./${file%.*}_sorted${file#${file%.*}}"
It should support filenames with or without extension, albeit not with square brackets, unfortunately. Quite a shame. To fix it you need to double quote the inner variable.
Really improved redirection:
… > "./${file%.*}_sorted${file#"${file%.*}"}"
Double quoting makes ${file%.*} not act as a pattern! Bash is smart enough to tell inner and outer quotes apart because the inner ones are embedded in the outer ${…} syntax. I think this is the right way.
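If you want to try the quoted form on a few names before trusting it with real files, a sketch like this may help. sorted_name is a hypothetical helper introduced here, not something from your script; its body runs in a subshell (the parentheses) so its variables don't leak.

```shell
#!/usr/bin/env bash
# sorted_name is a hypothetical helper, defined only for this demo.
sorted_name() (
  file=$1
  stem=${file%.*}          # everything but the extension
  ext=${file#"$stem"}      # extension including its dot, or empty; quoted, so literal
  printf '%s\n' "${stem}_sorted${ext}"
)

sorted_name 'name.name2.name3.ext'
sorted_name 'extensionless'
sorted_name 'data[1].txt'
```

The three calls print name.name2.name3_sorted.ext, extensionless_sorted and data[1]_sorted.txt. Note that a name whose only dot is the leading one (like .bashrc) is treated as "all extension", so it would become _sorted.bashrc.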
Another (imperfect) solution, let's analyze it for educational reasons:
${file/./_sorted.}
It replaces the first dot with _sorted. (note the trailing dot). It will work fine if you have exactly one dot in $file; with no dot at all the name is returned unchanged. There is a similar syntax ${file//./_sorted.} that replaces all dots. As far as I know there's no variant to replace the last dot only.
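A quick comparison on a name with two dots, as a sketch. The file name is made up, and the third assignment simply reuses the ${file%.*} and ${file##*.} expansions from earlier to get the "last dot only" effect. The / and // forms are Bash syntax, not POSIX sh.

```shell
#!/usr/bin/env bash
# Hypothetical name with two dots.
file='name.name2.ext'

first_dot=${file/./_sorted.}             # replaces only the first dot
all_dots=${file//./_sorted.}             # replaces every dot
last_dot=${file%.*}_sorted.${file##*.}   # emulates "last dot only"

printf '%s\n' "$first_dot" "$all_dots" "$last_dot"
```

This prints name_sorted.name2.ext, name_sorted.name2_sorted.ext and name.name2_sorted.ext, so only the third form keeps the extension where Windows expects it.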
Correct answer by Kamil Maciorowski on December 28, 2020