Unix & Linux Asked by user979974 on November 21, 2021
I would like to find file names which contain a number and list them in a range of numbers. For example, in my directory I have:
Ion_001_rawlib.bam
Ion_002_rawlib.bam
Ion_003_rawlib.bam
Ion_004_rawlib.bam
Ion_005_rawlib.bam
...
Ion_020_rawlib.bam
and I want to list only the Ion file names from 003 to 005. I tried something like this:
find -name '*Ion_*[3-5]*rawlib.bam'
but it doesn't produce the expected result. Is there a way to do this?
Thanks.
With the zsh shell, you can do:
print -rC1 Ion_<3-5>_rawlib.bam
Where <x-y> is a glob operator that matches textual decimal representations of positive integers within the given range (from x to y, inclusive).
Recursively:
print -rC1 -- **/Ion_<3-5>_rawlib.bam
(add (D) if you also want to look for those files in hidden folders, or (N) if you don't want to consider it an error when there's no matching file).
With find implementations that support a -regex predicate, you can do:
LC_ALL=C find . -regex '.*/Ion_0*[345]_rawlib\.bam'
(matches file paths that consist of 0 or more (*) bytes (. with LC_ALL=C), followed by /Ion_, followed by 0 or more (*) 0s, followed by one of the characters 3, 4 or 5, followed by _rawlib.bam).
Here, it's relatively easy for a 3..5 range, but it would become much more painful for ranges like 78..123 (and you'd run into compatibility issues, as the few find implementations that support -regex use different regexp formats there).
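To illustrate why wider ranges get painful, here is a hand-written extended regexp for 78..123. It is only a sketch: it is checked with grep -E rather than find so that it does not depend on any particular find implementation, and the function name is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 is a decimal number in 78..123
# (leading zeros allowed). The range has to be split by hand:
#   78-79   -> 7[89]
#   80-99   -> [89][0-9]
#   100-119 -> 1[01][0-9]
#   120-123 -> 12[0-3]
in_78_to_123() {
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eqx '0*(7[89]|[89][0-9]|1[01][0-9]|12[0-3])'
}

for n in 77 078 99 123 124; do
    if in_78_to_123 "$n"; then
        echo "$n: in range"
    else
        echo "$n: out of range"
    fi
done
```

Every change of bounds means reworking the alternation by hand, which is what zsh's <x-y> operator avoids.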
Standard find only supports -name and -path for matching on file names, and both use basic shell wildcards rather than regular expressions. Wildcards have no equivalent of the * regexp operator (0 or more of the preceding atom); the wildcard * operator is the equivalent of regexp .* (0 or more characters). So Ion_*[3-5]_rawlib.bam would match Ion_9994_rawlib.bam, for instance, as * matches 999.
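The difference can be checked without touching the file system, since the shell's case statement uses the same wildcard syntax as find -name. This is a small sketch; matches_glob is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME ($1) matches the shell wildcard
# PATTERN ($2). An unquoted expansion used as a case pattern is treated
# as a wildcard pattern.
matches_glob() {
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

pattern='Ion_*[3-5]_rawlib.bam'
for name in Ion_004_rawlib.bam Ion_9994_rawlib.bam Ion_006_rawlib.bam; do
    if matches_glob "$name" "$pattern"; then
        echo "$name: match"
    else
        echo "$name: no match"
    fi
done
```

Ion_9994_rawlib.bam is reported as a match because * swallows 999 and [3-5] matches the final 4.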
In this simple case however, you could do it using several patterns and negation such as:
LC_ALL=C find . -name 'Ion_*[345]_rawlib.bam' \
  ! -name 'Ion_*[!0]*?_rawlib.bam'
Non-recursively:
LC_ALL=C find . ! -name . -prune \
  -name 'Ion_*[345]_rawlib.bam' \
  ! -name 'Ion_*[!0]*?_rawlib.bam'
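The recursive two-pattern command can be tried against a throwaway directory. The sketch below assumes mktemp -d is available; the function name and the sample file names are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: list Ion_<n>_rawlib.bam files under directory $1
# whose number is 3, 4 or 5, using only standard -name patterns.
# The first pattern requires the number to end in 3, 4 or 5; the second
# rejects names where anything other than 0 precedes that last digit.
list_ion_3_to_5() {
    LC_ALL=C find "$1" -name 'Ion_*[345]_rawlib.bam' \
        ! -name 'Ion_*[!0]*?_rawlib.bam' | LC_ALL=C sort
}

demo_dir=$(mktemp -d)
for n in 001 003 005 013 025 9994; do
    : > "$demo_dir/Ion_${n}_rawlib.bam"
done
list_ion_3_to_5 "$demo_dir"   # only the 003 and 005 files should be listed
rm -rf "$demo_dir"
```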
To find files that contain the decimal representation of an integer from x to y anywhere in the name, you need a pattern that matches that range (like zsh's <x-y>), but you also need to make sure the match is not surrounded by other digits. For instance, foo305.txt does contain 3, 05 and 5, all of which match <3-5>.
In zsh, that would be:
print -rC1 -- (|*[^0-9])<3-5>(|[^0-9]*)
That is, <3-5> (which matches 3, 03, 003...) following either nothing or a string ending in a non-digit, and followed by either nothing or a string starting with a non-digit.
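For shells without the <x-y> operator, the same "whole run of digits within a range" test can be sketched with awk. This is not the zsh approach, only a portable illustration of the idea; has_number_in_range is a hypothetical name, and very long digit runs are subject to awk's floating-point precision.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME ($1) contains a whole run of digits
# whose value lies in LOW..HIGH ($2..$3). Splitting on non-digits means
# each candidate run is never surrounded by other digits.
has_number_in_range() {
    printf '%s\n' "$1" | LC_ALL=C awk -v lo="$2" -v hi="$3" '
        {
            n = split($0, runs, /[^0-9]+/)
            for (i = 1; i <= n; i++)
                if (runs[i] != "" && runs[i] + 0 >= lo + 0 && runs[i] + 0 <= hi + 0)
                    found = 1
        }
        END { exit !found }'
}

for name in Ion_004_rawlib.bam foo305.txt; do
    if has_number_in_range "$name" 3 5; then
        echo "$name: contains a number in 3..5"
    else
        echo "$name: no number in 3..5"
    fi
done
```

foo305.txt is rejected because its only run of digits is 305, taken as a whole.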
With BSD find:
LC_ALL=C find -E . -regex '.*/([^/]*[^0-9])?0*[3-5]([^0-9][^/]*)?'
With GNU find, same, but replace -E . with . -regextype posix-extended.
With busybox find (though that depends on how it was compiled):
busybox find . -regex '.*/([^/]*[^0-9])?0*[3-5]([^0-9][^/]*)?'
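Because find -regex has to match the whole path, the extended regexp can be checked on sample paths with grep -E -x, which likewise anchors the match to the whole line. This is a sketch for experimenting with the expression, not a replacement for find; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the path-like string $1 is matched in
# full by the extended regexp from the find commands above.
path_matches_3_to_5() {
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eqx '.*/([^/]*[^0-9])?0*[3-5]([^0-9][^/]*)?'
}

for p in ./Ion_004_rawlib.bam ./sub/3 ./foo305.txt ./Ion_013_rawlib.bam; do
    if path_matches_3_to_5 "$p"; then
        echo "$p: match"
    else
        echo "$p: no match"
    fi
done
```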
Another approach is to use find to report the list of files, but use a more advanced language like perl to filter that list:
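find . -print0 | perl -l -0ne '
  # $& holds the basename: everything after the last slash
  if (m{[^/]*\z}) {
    # loop over every run of decimal digits in the basename
    for $n ($& =~ /\d+/g) {
      if ($n >= 3 && $n <= 5) {
        print;
        next LINE;
      }
    }
  }'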
Here, perl extracts all the sequences of decimal digits from the basename of each file and outputs the file if at least one of those sequences represents a number in the 3..5 range.
Answered by Stéphane Chazelas on November 21, 2021