automation - In bash, how can I remove multiple versions of the same file?


posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

This may be a very specific case, but I know very little about bash and I need to remove "duplicate" files. I've been downloading totally legal videogame roms these past few days, and I noticed that a lot of packs have many different versions of the same game, like this:

Awesome Golf (1991).lnx
Awesome Golf (1991) [b1].lnx
Baseball Heroes (1991).lnx
Baseball Heroes (1991) [b1].lnx
Basketbrawl (1992).lnx
Basketbrawl (1992) [a1].lnx
Basketbrawl (1992) [b1].lnx
Batman Returns (1992).lnx
Batman Returns (1992) [b1].lnx

How can I make a bash script that removes the duplicates? A duplicate would be any file that has the same name, and the name would be the string before the first parenthesis. The script should parse all the files and grab their names, see which names match to detect duplicates, and remove all files except the first one (first being the first that comes up in alphabetical order).

question from:https://stackoverflow.com/questions/65926526/in-bash-how-can-i-remove-multiple-versions-of-the-same-file

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:06:59+0000

Would you please try the following:

#!/bin/bash

dir="dir"                               # the directory where the rom files are located
declare -A seen                         # associative array to detect the duplicates
while IFS= read -r -d "" f; do          # loop over the filenames, assigning each one to "f"
    name=${f##*/}                       # take the file name without the directory part
    name=${name%.*}                     # remove the extension (the last dot and what follows)
    name=${name%%\(*}                   # extract the "name" by removing the first left paren and following characters
    name=${name%%\[*}                   # remove the first left square bracket and following characters, in case there are no parens
    name=${name%"${name##*[! ]}"}       # remove all trailing spaces, if any
    name=${f%/*}/$name                  # put the directory back so equal names in different directories stay separate
    if (( seen[$name]++ )); then        # if the name duplicates...
        # remove "echo" if the output looks good
        echo rm -- "$f"                 # then remove the file
    fi
done < <(find "$dir" -type f -name "*.lnx" -print0 | sort -z -t "." -k1,1)
                                        # sort on the part before the first dot, so the plain version comes before [a1], [b1], ...

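Change the dir= line at the top to the path of the directory that holds the rom files. The filenames are sorted on the part of the path before the first dot, so use a path that contains no dot (an absolute path rather than ./roms, for example); a dot in the directory path defeats that sort key.
The echo command only prints the rm commands that would be run, as a rehearsal. If the output looks good, remove echo and run the script again to actually delete the files.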
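
The name extraction is the step most worth checking before anything is deleted, so it can help to try it on a few sample names first. A minimal sketch, assuming names shaped like those in the question; extract_name is a hypothetical helper, not part of the script above, and it cuts at the first paren or square bracket as the question asks:

```shell
#!/bin/bash

# Hypothetical helper: reduce a rom path to the game name used for grouping.
extract_name() {
    local name=${1##*/}                 # drop the directory part
    name=${name%.*}                     # drop the extension
    name=${name%%\(*}                   # cut at the first left paren
    name=${name%%\[*}                   # cut at the first left square bracket
    name=${name%"${name##*[! ]}"}       # strip trailing spaces
    printf '%s\n' "$name"
}

extract_name "Basketbrawl (1992) [a1].lnx"     # prints: Basketbrawl
extract_name "roms/Awesome Golf (1991).lnx"    # prints: Awesome Golf
```

Files whose extracted names are equal are the ones the script treats as versions of the same game.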

[Explanation]

The associative array seen maps each extracted "name" to a count of how many times it has appeared so far. If the count is already non-zero when a file is read, that file is a duplicate and can be removed (as long as the files are sorted so that the one to keep comes first).
The -print0 option to find, the -z option to sort and the -d "" option to read all use the null character as the delimiter between filenames, so filenames that contain special characters such as a space, tab or newline are handled correctly.
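
The effect of the null delimiter can be seen with a filename that contains a newline. This is only an illustrative sketch: the scratch directory and the two file names are made up for the demo, and it assumes a sort that supports -z, as the script above does.

```shell
#!/bin/bash

tmp=$(mktemp -d)                                  # scratch directory for the demo
touch "$tmp/two words.lnx" "$tmp/line"$'\n'"break.lnx"

# NUL-delimited: each file name arrives as exactly one record.
nul_count=0
while IFS= read -r -d "" f; do
    nul_count=$((nul_count + 1))
done < <(find "$tmp" -type f -name "*.lnx" -print0 | sort -z)

# Newline-delimited: the embedded newline splits one name into two records.
line_count=0
while IFS= read -r f; do
    line_count=$((line_count + 1))
done < <(find "$tmp" -type f -name "*.lnx" | sort)

echo "NUL-delimited records:     $nul_count"      # 2, one per file
echo "newline-delimited records: $line_count"     # 3, one name was split
rm -r -- "$tmp"                                   # clean up the scratch directory
```

With newline-delimited input, rm would be handed two path fragments that do not exist instead of the real file.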
