Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

automation - In bash, how can I remove multiple versions of the same file?

This may be a very specific case, but I know very little about bash and I need to remove "duplicate" files. I've been downloading totally legal videogame roms these past few days, and I noticed that a lot of packs have many different versions of the same game, like this:

Awesome Golf (1991).lnx
Awesome Golf (1991) [b1].lnx
Baseball Heroes (1991).lnx
Baseball Heroes (1991) [b1].lnx
Basketbrawl (1992).lnx
Basketbrawl (1992) [a1].lnx
Basketbrawl (1992) [b1].lnx
Batman Returns (1992).lnx
Batman Returns (1992) [b1].lnx

How can I make a bash script that removes the duplicates? A duplicate would be any file that has the same name, and the name would be the string before the first parenthesis. The script should parse all the files and grab their names, see which names match to detect duplicates, and remove all files except the first one (first being the first that comes up in alphabetical order).

Question from: https://stackoverflow.com/questions/65926526/in-bash-how-can-i-remove-multiple-versions-of-the-same-file

1 Reply

Would you please try the following:

#!/bin/bash

dir="dir"                               # the directory where the rom files are located
declare -A seen                         # associative array to detect the duplicates
while IFS= read -r -d "" f; do          # read the NUL-delimited paths one by one into "f"
    name=${f##*/}                       # drop the directory part of the path
    name=${name%.*}                     # remove the extension
    name=${name%%(*}                    # keep only the text before the first left paren
    name=${name%%\[*}                   # same for a left square bracket, in case the filename has no parens
    name=${name%"${name##*[![:space:]]}"}   # remove trailing whitespace, if any
    key=${f%/*}/$name                   # the same name in another directory is not a duplicate
    if [[ -n ${seen[$key]} ]]; then     # if the name has been seen before...
        # remove "echo" if the output looks good
        echo rm -- "$f"                 # then remove the file
    fi
    seen[$key]=1                        # remember the name
done < <(find "$dir" -type f -name "*.lnx" -print0 | sort -z -t "." -k1,1)
                                        # sort on the text before the first dot so that
                                        # "X (1991).lnx" comes before "X (1991) [b1].lnx"
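  • Change the dir= line to the path of the directory that holds the rom files. Because the sort key is the part of the path before the first dot, that path should not contain a dot itself; otherwise the unflagged file may not sort first.
  • The echo makes this a dry run: it only prints the rm commands that would be executed. If the output looks good, remove echo and run the script again to actually delete the files.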
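
One way to rehearse the idea without going near the real collection is to build a scratch directory with a few of the names from the question and list what would be removed. This is only a sketch under assumptions: the would_remove array is a made-up name, mktemp is assumed to be available, and the ordering is done differently from the script above (the names are sorted after stripping ".lnx", so the result does not depend on dots in the directory path).

```shell
#!/bin/bash
# Rehearsal sketch: nothing outside the scratch directory is touched.
scratch=$(mktemp -d)                    # throwaway directory
touch "$scratch/Awesome Golf (1991).lnx" \
      "$scratch/Awesome Golf (1991) [b1].lnx" \
      "$scratch/Basketbrawl (1992).lnx" \
      "$scratch/Basketbrawl (1992) [a1].lnx" \
      "$scratch/Basketbrawl (1992) [b1].lnx"

declare -A seen
would_remove=()                         # hypothetical name: files a real run would delete
while IFS= read -r -d "" stem; do       # "stem" is the file name without ".lnx"
    name=${stem%%(*}                    # text before the first left paren
    name=${name%"${name##*[![:space:]]}"}   # trim trailing whitespace
    if [[ -n ${seen[$name]} ]]; then    # name already seen: this file is a duplicate
        would_remove+=("$stem.lnx")
    fi
    seen[$name]=1
done < <(cd "$scratch" && for f in *.lnx; do printf '%s\0' "${f%.lnx}"; done | LC_ALL=C sort -z)

printf 'would remove: %s\n' "${would_remove[@]}"
rm -rf -- "$scratch"                    # clean up the scratch directory
```

With the extension stripped, "Awesome Golf (1991)" is a prefix of "Awesome Golf (1991) [b1]", so a byte-wise sort places the unflagged file first and the flagged variants are the ones collected.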

[Explanation]

  • The associative array seen records each extracted "name" as it is encountered. If a name has already been recorded when a file is read, that file is a duplicate and can be removed (as long as the files are sorted so that the one to keep comes first).
  • The -print0 option of find, the -z option of sort and the -d "" option of read all use the null character as the filename delimiter, so filenames that contain special characters such as spaces, tabs or newlines are handled correctly.
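
Both points can be tried in isolation with a small sketch. The sample names, including one that contains a newline, are made up for illustration, and the count and order arrays are hypothetical names that are not part of the script above.

```shell
#!/bin/bash
# Sketch: NUL-delimited names survive embedded newlines, and an associative
# array keeps track of how many times each key has appeared.
declare -A count                        # key -> number of appearances
order=()                                # keys in first-seen order
while IFS= read -r -d "" item; do
    if [[ -z ${count[$item]} ]]; then   # first time this key shows up
        order+=("$item")
    fi
    count[$item]=$(( ${count[$item]:-0} + 1 ))
done < <(printf '%s\0' "Awesome Golf" $'Odd\nName' "Awesome Golf" "Basketbrawl")

for key in "${order[@]}"; do
    printf '%q appears %d time(s)\n' "$key" "${count[$key]}"
done
```

The name with the newline arrives as a single item because only the null character ends a record. A newline-delimited loop would have split it in two.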
