TransWikia.com

How to clean unused files?

TeX - LaTeX Asked by Alessandro Cuttin on July 11, 2021

I recently edited an existing document to create a new one from it (that is: I copied the whole folder to a new location and started from there).
The original document had a lot of figures, but not all of them are used in the new version.

Now I have a lot of unused files (jpg, pdf, png) under the fig/ folder which I want to get rid of, because they are not called by any \includegraphics command.

Is there a way to list used or unused files?
(I’m not referring to auxiliary files, I’m fine with those.)

5 Answers

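I came up with this little script (run from the root folder of the project, after compiling so that the .log file exists):

#!/bin/bash

for image_path in fig/*
do
    [ -f "$image_path" ] || continue
    image_file=$(basename "$image_path")
    # -q: only set the exit status; -F: treat the file name as a literal string
    # (the original "grep ... -c > 1" redirected the count into a file named "1")
    if grep -qF -- "$image_file" *.log
    then
        echo "File $image_file is in use."
    else
        echo "File $image_file is not in use."
        mv "fig/$image_file" "fig/moved.$image_file" # or any other action
    fi
done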
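
For readers who prefer to only list the files before moving anything, here is a minimal Python sketch of the same log-based idea. It is not part of the original answer; the function names `classify_figures` and `report` are made up for illustration, and it assumes the figures live in `fig/` and that the `.log` files sit in the project root. Because TeX wraps long log lines (79 characters by default), the sketch removes newlines from the log before searching, which can in rare cases produce a false match.

```python
from pathlib import Path


def classify_figures(log_text, file_names):
    """Split file_names into (used, unused) by searching the log text."""
    # A long path can be broken across two log lines, so search a copy
    # of the log with all line breaks removed.
    flat_log = log_text.replace("\r", "").replace("\n", "")
    used = sorted(name for name in file_names if name in flat_log)
    unused = sorted(name for name in file_names if name not in flat_log)
    return used, unused


def report(project_root="."):
    """Print which files under <project_root>/fig appear in the *.log files."""
    root = Path(project_root)
    log_text = "".join(
        p.read_text(errors="replace") for p in sorted(root.glob("*.log"))
    )
    names = [p.name for p in (root / "fig").iterdir() if p.is_file()]
    used, unused = classify_figures(log_text, names)
    for name in used:
        print("in use:     " + name)
    for name in unused:
        print("not in use: " + name)
    return used, unused
```

Calling `report(".")` from the project root prints one line per file and changes nothing on disk, so the result can be checked before running a script that moves or deletes files.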

Correct answer by Alessandro Cuttin on July 11, 2021

I'm not sure I understand your question. If you want to clean up a directory by getting rid of auxiliary files and, say, all *.jpg files, and you are on Windows, you could use a PowerShell script published by U. Ziegenhagen here: http://uweziegenhagen.de/?p=2095. Customise it, put it into your folder, and press Shift + right-click. Beware: it deletes in a second...

My adaptation includes files produced by tex4ht and SyncTeX:

function Get-ScriptDirectory{
    $Invocation = (Get-Variable MyInvocation -Scope 1).Value
    Split-Path $Invocation.MyCommand.Path
}

$path = (Get-ScriptDirectory)

cd $path


get-childitem *.log |% {remove-item $_}

get-childitem *.toc |% {remove-item $_}

get-childitem *.gz |% {remove-item $_}

get-childitem *.aux |% {remove-item $_}

get-childitem *.nav |% {remove-item $_}

get-childitem *.out |% {remove-item $_}

get-childitem *.synctex |% {remove-item $_}

get-childitem *.synctex.gz |% {remove-item $_}

get-childitem *.tmp |% {remove-item $_}

get-childitem *.4ct |% {remove-item $_}

get-childitem *.4tc |% {remove-item $_}

get-childitem *.anl |% {remove-item $_}

get-childitem *.lg |% {remove-item $_}

get-childitem *.idv |% {remove-item $_}

get-childitem *.xref |% {remove-item $_}
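
Since the script deletes immediately, a dry run can be useful first. The following is a rough Python sketch of the same cleanup, not the author's script: the names `AUX_PATTERNS`, `find_aux_files`, and `clean` are hypothetical, the pattern list is copied from the PowerShell commands above, and only files directly in the given folder are considered.

```python
from pathlib import Path

# Patterns copied from the PowerShell script above.
AUX_PATTERNS = ["*.log", "*.toc", "*.gz", "*.aux", "*.nav", "*.out",
                "*.synctex", "*.synctex.gz", "*.tmp", "*.4ct", "*.4tc",
                "*.anl", "*.lg", "*.idv", "*.xref"]


def find_aux_files(folder, patterns=AUX_PATTERNS):
    """Return the sorted files in folder (not subfolders) matching any pattern."""
    folder = Path(folder)
    matches = set()  # a set, because *.gz and *.synctex.gz overlap
    for pattern in patterns:
        matches.update(p for p in folder.glob(pattern) if p.is_file())
    return sorted(matches)


def clean(folder, dry_run=True):
    """List the auxiliary files; delete them only when dry_run is False."""
    targets = find_aux_files(folder)
    for path in targets:
        print(("would remove " if dry_run else "removing ") + path.name)
        if not dry_run:
            path.unlink()
    return targets
```

With the default `dry_run=True`, `clean(".")` only prints what it would remove; passing `dry_run=False` performs the deletion.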

Answered by Keks Dose on July 11, 2021

Check out my typical Makefile

# This is a LaTeX Makefile created by Predrag Punosevac#
########################################################
SHELL = /bin/sh
.SUFFIXES : .tex .dvi .ps .pdf

FILE = sam-new

LATEX = /usr/local/bin/latex
PDFLATEX = /usr/local/bin/pdflatex
BIBTEX = /usr/local/bin/bibtex
XDVI = /usr/local/bin/xdvi
DVIPS = /usr/local/bin/dvips
GVU = /usr/local/bin/gvu
PS2PDF = /usr/local/bin/ps2pdf
XPDF = /usr/local/bin/xpdf 
LPR = /usr/bin/lpr

DVI = ${FILE}.dvi
PS = ${FILE}.ps
PDF = ${FILE}.pdf



.tex.pdf :
       ${PDFLATEX} ${FILE}.tex
       ${PDFLATEX} ${FILE}.tex



bib :   
       ${PDFLATEX} ${FILE}.tex
       ${BIBTEX} ${FILE}
pdf : bib
       ${PDFLATEX} ${FILE}.tex
       ${PDFLATEX} ${FILE}.tex



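# Various cleaning options
# (recipe lines must start with a tab character; a trailing backslash
# continues the command on the next line)
clean-ps :
          /bin/rm -f  *.log *.aux *.dvi *.bbl *.blg *.bm *.toc *.out \
          *Notes.bib *.ps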

I typically call

  make pdf clean-ps

using keybindings from nvi.

Answered by Predrag Punosevac on July 11, 2021

In case someone is still looking, I have made a Python 3 script to deal with this problem. I use it to generate a new, clean LaTeX folder with all the used files directly at the root of the folder, instead of spread across multiple subdirectories. This is a requirement for preprint servers like arXiv and HAL.

(If you only want to delete unused files, then simply use the content of the newly created clean folder.)

The script takes as input:

  • a list of TeX files to parse (in case you split your document into multiple files, located in the same folder)
  • a list of file extensions of the potentially unused files we wish to look for
  • some other self-explanatory options

The script looks in the specified TeX files for all occurrences of the specified extension and builds a list of all used files with this extension. All these files are copied over to a new specified folder. Other files found at the root of the TeX folder are also copied for convenience (except TeX compilation files, and the previous unused files). The provided TeX files are copied over as well, but all their references to the files are changed so that they point directly to the new files at the root of the new folder.

That way, you directly obtain a compilation-ready LaTeX folder with all the files you need.

Here is the code:

import os, sys, shutil
import re
import ntpath

############ INPUTS ###############
# list of Tex files to parse
# (they should all be within the same folder, as the image paths
# are computed relative to the first TeX file)
texPathList = ["/home/my/tex/folder/my_first_file.tex",
               "/home/my/tex/folder/my_second_file.tex"]

# extensions to search
extensions=[".png", ".jpg", ".jpeg", ".pdf", ".eps"]

bExcludeComments = True # if True, files appearing in comments will not be kept
# path where all used images and the modified TeX files should be copied
# (you can then copy over missing files, e.g. other types of images, Bib files...)

# location of the new folder (should not exist already)
exportFolder = '/home/my/new/folder/clean_article/'

#  should all other files in the root folder (not in subfolders) be copied ?
# (temporary TeX compilation files are not copied)
bCopyOtherRootFiles = True

############## CREATE CLEAN FOLDER #################
# 1 - load TeX files
text=''
for path in texPathList:
  with open(path,'r') as f:
    text = text + f.read()
    
# 2 - find all occurrences of the extension
global_matches = []
for extension in extensions:
  escaped_extension = re.escape(extension) # so that the point is correctly accounted for
  pattern = r'\{[^}]+' + escaped_extension + r'\}'
  if not bExcludeComments: # simply find all occurrences
    matches = re.findall(pattern=pattern, string=text) # does not give the position
  else: # more involved search
    # 2.1 - find all matches
    positions, matches = [], []
    regex = re.compile(pattern)
    for m in regex.finditer(text):
        print(m.start(), m.group())
        positions.append( m.start() )
        matches.append( m.group())
    # 2.2 - remove matches which appear in a commented line
    # parse list in reverse order and remove if necessary
    for i in range(len(matches)-1,-1,-1):
      # look backwards in text for the first occurrence of '\n' or '%'
      startPosition = positions[i]
      # stop at the beginning of the text (a negative index would wrap around)
      while startPosition >= 0:
        if text[startPosition]=='%':
          # the line is commented
          print('file "{}" is commented (discarded)'.format(matches[i]))
          positions.pop(i)
          matches.pop(i)
          break
        if text[startPosition]=='\n':
          # the line is not commented --> we keep it
          break
        startPosition -= 1
  global_matches = global_matches + matches
  
# 3 - make sure there are no duplicates
fileList = set(global_matches)
if len(global_matches) != len(fileList):
  print('WARNING: it seems you have duplicate images in your TeX')
# 3.1 - remove curly braces
fileList = [m[1:-1] for m in fileList]

# 4 - copy the used images to the designated new location
try:
  os.makedirs(exportFolder)
except FileExistsError:
  raise Exception('The new folder already exists, please delete it first')

texRoot = os.path.dirname(texPathList[0])
for m in fileList:
  absolutePath = os.path.join(texRoot, m)
  shutil.copy(absolutePath, exportFolder)

# 5 - copy the TeX files also, and modify the image paths they refer to
for path in texPathList:
  with open(path,'r') as f:
    text = f.read()
  for m in fileList:
    text = text.replace(m, ntpath.basename(m) )
  newPath = os.path.join(exportFolder, ntpath.basename(path))
  with open(newPath, 'w') as f:
    f.write(text)

# 6 - if chosen, copy over all the other files (except TeX temp files)
# which are directly at the root of the original TeX folder
if bCopyOtherRootFiles:
  excludedExtensions = ['.aux', '.bak', '.blg', '.bbl', '.spl', '.gz', '.out', '.log']
  for filename in os.listdir(texRoot):
    fullPath = os.path.join(texRoot, filename)
    if os.path.isfile(fullPath):
      ext = os.path.splitext(filename)[1]
      # do not copy already modified TeX files
      if not ( filename in [ntpath.basename(tex) for tex in texPathList]):
        # do not copy temporary files
        if not ( ext.lower() in excludedExtensions ):
          # do not copy files we have already taken care of
          if not ( ext.lower() in extensions ):
            shutil.copy( fullPath, exportFolder)

# The export folder now contains the modified TeX files and all the required files !
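
To try out the matching step on its own, here is a small stand-alone sketch. It is a simplified variant and not the script above: the function name `referenced_files` is hypothetical, it works line by line, and it treats everything after the first unescaped `%` on a line as a comment (verbatim-like environments are not handled).

```python
import re


def referenced_files(tex_source, extensions, skip_comments=True):
    """Return the paths found inside {...} that end with one of the extensions."""
    found = []
    for line in tex_source.splitlines():
        if skip_comments:
            # Keep only the part of the line before the first unescaped '%'.
            line = re.split(r'(?<!\\)%', line, maxsplit=1)[0]
        for ext in extensions:
            # re.escape makes the dot in ".png" match a literal dot.
            pattern = r'\{([^{}]+' + re.escape(ext) + r')\}'
            found.extend(re.findall(pattern, line))
    return found
```

For example, given a line `\includegraphics{fig/a.png}` followed by a commented line `% \includegraphics{fig/old.png}`, the function returns only `fig/a.png` unless `skip_comments=False` is passed.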

Answered by Laurent90 on July 11, 2021

I wrote about it here: medium.com/@weslley.spereira/remove-unused-files-from-your-latex-project. In short, I generalized Alessandro Cuttin's script a bit to cover more directory levels. I hope it still helps.

# projectFolder and mainfilename must be set before this point.
nonUsed="./nonUsedFiles"
mkdir -p "$nonUsed"

# Directory Level 1
for imgFolder in "$projectFolder"/*/; do
    echo "$imgFolder"
    for imagePath in "$imgFolder"*; do
        [ -e "$imagePath" ] || continue
        imageFile=$(basename "$imagePath")
        # -q: only set the exit status; -F: match the file name literally
        if grep -qF -- "$imageFile" "$projectFolder/$mainfilename.log"; then
            echo "+ File $imageFile is in use."
        else
            echo "- File $imageFile is not in use."
            mkdir -p "$nonUsed/$imgFolder"
            mv "$imagePath" "$nonUsed/$imgFolder$imageFile"
        fi
    done
done

# Directory Level 2
for imgFolder in "$projectFolder"/*/*/; do
    echo "$imgFolder"
    for imagePath in "$imgFolder"*; do
        [ -e "$imagePath" ] || continue
        imageFile=$(basename "$imagePath")
        if grep -qF -- "$imageFile" "$projectFolder/$mainfilename.log"; then
            echo "+ File $imageFile is in use."
        else
            echo "- File $imageFile is not in use."
            mkdir -p "$nonUsed/$imgFolder"
            mv "$imagePath" "$nonUsed/$imgFolder$imageFile"
        fi
    done
done

Answered by Weslley S. Pereira on July 11, 2021
