Intro

If you’re like me, you have years of files strewn about your computers at home. I occasionally go through and remove useless stuff, starting with the largest files and directories.

When I need to find something useful way later, I usually just search for it. Let’s take a look at how we can quickly find long-forgotten files on Linux.

findutils

The GNU Find Utilities (findutils) are a set of tools for, as you might have guessed, finding files. The package consists of four utilities:

  • find
  • locate
  • updatedb
  • xargs

find

find walks a directory tree and prints the path of every file that matches the criteria you give it.

The simplest example lists everything (files and directories) recursively under the working directory:

find
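If you’d like to experiment without touching your real files, a throwaway directory works well. The script below is a minimal sketch assuming GNU find and coreutils (the file names are made up). It also shows that GNU find treats a missing path as "."; other find implementations may require an explicit path.

```shell
#!/bin/sh
# Minimal sketch: list a scratch tree with find (assumes GNU find).
set -eu

dir=$(mktemp -d)            # throwaway directory for the demo
trap 'rm -rf "$dir"' EXIT   # remove it when the script exits
cd "$dir"

mkdir docs
touch docs/notes.txt video.mp4

# With no path, GNU find starts from "."; sort makes the order predictable.
no_path=$(find | LC_ALL=C sort)
with_dot=$(find . | LC_ALL=C sort)

printf '%s\n' "$no_path"
```

Both listings are identical: `.`, `./docs`, `./docs/notes.txt`, and `./video.mp4`. The starting directory and subdirectories show up too, not just regular files.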

Let’s look at some other common scenarios and how you can use find:

Find files by Name

Find all files in or below the current directory that have “video” in the name:

find . -name "*video*"

Note that since our pattern contains *, we need to put double quotes around it so the shell passes it to find as-is instead of expanding it.

If we’re unsure of the capitalization, we can do the same thing but with case-insensitivity:

find . -iname "*video*"
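The quoting rule is easiest to see side by side. This is a minimal sketch, assuming GNU find and coreutils, with hypothetical file names; it runs the quoted pattern, the case-insensitive variant, and a deliberately unquoted pattern in a scratch directory.

```shell
#!/bin/sh
# Minimal sketch: -name vs -iname, and what happens if the pattern is left unquoted.
# Assumes GNU find and coreutils; the file names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir clips
touch home_video.mkv clips/video1.mp4 clips/Video2.MP4 notes.txt

# Quoted: find gets the literal pattern *video* and applies it at every depth.
quoted=$(find . -name "*video*" | LC_ALL=C sort)

# Case-insensitive: also matches clips/Video2.MP4.
any_case=$(find . -iname "*video*" | LC_ALL=C sort)

# Unquoted: the shell expands *video* in the current directory first, so this
# actually runs: find . -name home_video.mkv
unquoted=$(find . -name *video* | LC_ALL=C sort)

printf '%s\n' "quoted:" "$quoted" "-iname:" "$any_case" "unquoted:" "$unquoted"
```

The unquoted run silently misses `clips/video1.mp4`. If more than one name in the current directory matched the glob, GNU find would instead stop with a “paths must precede expression” error.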

locate & updatedb

findutils also comes with locate and updatedb. updatedb uses find to enumerate the files on the system and stores their names in a database file; locate then searches that database by file name. The nice thing about locate is that it’s extremely fast, because it reads the prebuilt index instead of walking the filesystem. The trade-off is that results are only as fresh as the last updatedb run, which is usually scheduled periodically (for example with cron). Lots of distributions have this configured right out of the box.

Find files with “video” in the name using locate, with case insensitivity:

locate -i video

Find files by Type

Sometimes you only want to find directories, or only regular files. Maybe you want to go deeper and look for block special files (devices). It’s all possible with find.

Find only directories below the current directory:

find . -type d

Find only regular files below the current directory:

find . -type f

Find only regular files and directories (GNU find accepts a comma-separated list of types):

find . -type f,d

Here is the full list of types:

  • b - block (buffered) special
  • c - character (unbuffered) special
  • d - directory
  • p - named pipe (FIFO)
  • f - regular file
  • l - symbolic link
  • s - socket
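To see a few of these types in action, the sketch below creates a directory, a regular file, a symbolic link, and a named pipe, then filters them with -type. It assumes GNU find (needed for the comma-separated f,d form) and coreutils; the names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: create one object of several types and filter them with -type.
# Assumes GNU find (for the comma-separated "f,d" form) and coreutils.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir photos                  # d: directory
touch photos/cat.jpg          # f: regular file
ln -s photos/cat.jpg latest   # l: symbolic link (find does not follow it by default)
mkfifo queue                  # p: named pipe (FIFO)

dirs=$(find . -type d | LC_ALL=C sort)
files=$(find . -type f)
links=$(find . -type l)
pipes=$(find . -type p)
files_and_dirs=$(find . -type f,d | LC_ALL=C sort)

printf '%s\n' "directories:" "$dirs" "regular files:" "$files" \
  "symlinks:" "$links" "pipes:" "$pipes" "files and directories:" "$files_and_dirs"
```

Because find does not follow symlinks by default, `latest` is reported as type l rather than as the regular file it points to.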

Find Files by Size

This one might be a little weird if you’re used to searching for files in a regular GUI file explorer. By default, find treats a bare -size number as a count of 512-byte blocks. You probably want more familiar units, which you get by adding a suffix: c for bytes, k for KiB, M for MiB, and G for GiB.

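Find files whose size, rounded up to whole MiB, is 1 MiB (anything from 1 byte up to exactly 1 MiB; use -size 1048576c to match exactly 1 MiB):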

find . -size 1M

Find files smaller than 4 GiB (after rounding up, that effectively means 3 GiB or less):

find . -size -4G

Find files greater than 640 KiB in size:

find . -size +640k
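The rounding is easier to trust once you’ve seen it on files of known size. This minimal sketch assumes GNU find and GNU truncate (which creates sparse files with an exact apparent size; -size looks at that apparent size, the same number ls -l shows). The file names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: how -size suffixes and rounding behave on files of known size.
# Assumes GNU find and GNU truncate; the file names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# truncate creates files with an exact apparent size; K and M are powers of 1024.
truncate -s 0    empty.bin   # 0 bytes
truncate -s 1    tiny.bin    # 1 byte
truncate -s 700K k700.bin    # 716800 bytes
truncate -s 1M   m1.bin      # 1048576 bytes, exactly 1 MiB
truncate -s 2M   m2.bin      # 2097152 bytes

# -type f keeps the directory "." out of the results.
rounds_to_1m=$(find . -type f -size 1M | LC_ALL=C sort)   # sizes that round up to 1 MiB
exactly_1m=$(find . -type f -size 1048576c)                # byte-exact match
over_640k=$(find . -type f -size +640k | LC_ALL=C sort)
under_4g=$(find . -type f -size -4G | LC_ALL=C sort)

printf '%s\n' "-size 1M:" "$rounds_to_1m" "-size 1048576c:" "$exactly_1m" \
  "-size +640k:" "$over_640k" "-size -4G:" "$under_4g"
```

-size 1M matches tiny.bin, k700.bin, and m1.bin, because each of them rounds up to one whole MiB, whereas only m1.bin is exactly 1048576 bytes.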

Note: Because find rounds sizes up to the next whole unit, you should avoid expressions such as -size -1M or -size -1k, as the behavior is surprising:

This actually only returns empty files:

find . -size -1M

This returns files of up to 1023 KiB, which is close to “less than 1 MiB” (use -size -1048576c for an exact cutoff):

find . -size -1024k
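Here is the -1M trap reproduced on a handful of files. It’s a minimal sketch assuming GNU find and GNU truncate, with hypothetical file names; it compares -1M, -1024k, and a byte-exact -1048576c.

```shell
#!/bin/sh
# Minimal sketch: why -size -1M only matches empty files.
# Assumes GNU find and GNU truncate; file names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

truncate -s 0       empty.bin    # 0 bytes
truncate -s 512K    half.bin     # 524288 bytes
truncate -s 1048575 almost.bin   # 1 MiB minus 1 byte, rounds up to 1024 KiB
truncate -s 1M      full.bin     # exactly 1 MiB

lt_1m=$(find . -type f -size -1M)                            # only sizes that round up to 0 MiB
lt_1024k=$(find . -type f -size -1024k | LC_ALL=C sort)      # up to 1023 KiB
lt_bytes=$(find . -type f -size -1048576c | LC_ALL=C sort)   # strictly under 1048576 bytes

printf '%s\n' "-size -1M:" "$lt_1m" "-size -1024k:" "$lt_1024k" "-size -1048576c:" "$lt_bytes"
```

almost.bin falls into the gap: it is under 1 MiB, but it rounds up to 1024 KiB, so -size -1024k skips it while -size -1048576c does not.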

You can combine flags to further filter results, as well as provide a size range.

This returns files that are more than 1 MiB in size and, after rounding, at most 9 MiB:

find . -size +1M -size -10M

Rounding means you’ll run into unexpected behavior with something like this (which won’t return anything):

find . -size +2k -size -3k

Avoid ranges that span a single unit. To get the expected behavior for the above, do this instead to show files between 2 and 3 KiB in size:

find . -size +2047c -size -3073c

Notice that we had to widen the range by one byte at each end to include files that are exactly 2 KiB (2048 bytes) or 3 KiB (3072 bytes).
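The range examples can be checked against files placed right at the unit boundaries. This is a minimal sketch assuming GNU find and GNU truncate; the file names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: size ranges and the one-unit rounding trap.
# Assumes GNU find and GNU truncate; file names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

for n in 2047 2048 2500 3072 3073; do
  truncate -s "$n" "b$n.bin"            # files around the 2 to 3 KiB boundary
done
truncate -s 1M      m1.bin              # exactly 1 MiB
truncate -s 1048577 m1plus.bin          # 1 MiB + 1 byte, rounds up to 2 MiB
truncate -s 9M      m9.bin              # exactly 9 MiB
truncate -s 9437185 m9plus.bin          # 9 MiB + 1 byte, rounds up to 10 MiB

mib_range=$(find . -type f -size +1M -size -10M | LC_ALL=C sort)
kib_range=$(find . -type f -size +2k -size -3k)   # empty: nothing rounds to >2 and <3 KiB
byte_range=$(find . -type f -size +2047c -size -3073c | LC_ALL=C sort)

printf '%s\n' "+1M -10M:" "$mib_range" "+2k -3k:" "$kib_range" "+2047c -3073c:" "$byte_range"
```

The MiB range picks up m1plus.bin and m9.bin but not m9plus.bin, the KiB range comes back empty, and the byte range returns b2048.bin, b2500.bin, and b3072.bin.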

Find Files by Time

find also lets you search for files by three different timestamps:

  • atime - access, the last time the file’s contents were read
  • ctime - change, the last time the file’s status changed
  • mtime - modified, the last time the file’s contents were modified

ctime vs mtime

So, let’s clear up the obvious confusion around ctime and mtime. The file’s status is the metadata about the file you can see with the stat command:

$ stat ./test1
  File: test1
  Size: 5002      	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 2120180     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/ michael)   Gid: ( 1000/ michael)
Access: 2020-12-06 13:26:58.500986925 -0800
Modify: 2020-12-06 13:45:47.253516071 -0800
Change: 2020-12-06 13:45:47.253516071 -0800
 Birth: -

If any of that metadata changes (owner, group, permissions, link count, size, and so on), ctime is updated. An ordinary read that only bumps atime leaves ctime alone.

mtime is only updated when the file’s contents change, and any write to the contents updates ctime as well. We can, however, update ctime without updating mtime by doing something like changing the owner of the file (which requires root):

$ sudo chown other_user:other_user ./test1

$ stat ./test1
  File: test1
  Size: 5002      	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 2120180     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1001/other_user)   Gid: ( 1001/other_user)
Access: 2020-12-06 13:26:58.500986925 -0800
Modify: 2020-12-06 13:45:47.253516071 -0800
Change: 2020-12-06 13:46:16.564943124 -0800
 Birth: -
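To reproduce this without root, the sketch below swaps chown for chmod, which is also a metadata-only change. It assumes GNU coreutils (touch -d, stat -c and --printf); the file name and contents are arbitrary. It backdates mtime first so the two timestamps are easy to tell apart.

```shell
#!/bin/sh
# Minimal sketch: a metadata-only change (chmod) moves ctime but not mtime.
# Uses chmod instead of chown so it runs without root; assumes GNU coreutils.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf 'hello\n' > test1
# Backdate mtime. Setting a timestamp is itself a metadata change,
# so the kernel sets ctime to the current time here.
touch -m -d '2020-12-06 13:45:47' test1

mtime_before=$(stat -c %Y test1)   # mtime in seconds since the epoch
ctime_before=$(stat -c %Z test1)   # ctime in seconds since the epoch

chmod 600 test1                    # change permissions only

mtime_after=$(stat -c %Y test1)
ctime_after=$(stat -c %Z test1)

stat --printf 'Modify: %y\nChange: %z\n' test1
```

After the chmod, Modify still shows the backdated 2020 timestamp while Change shows the current time.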

Now let’s see how to use these timestamps with find. The syntax is the same for all 3 of them.

Find files modified less than 24 hours ago:

find . -mtime 0

Find files changed between 24 and 48 hours ago:

find . -ctime 1

As you might have noticed, the syntax is a little weird: you’re passing in the number of 24-hour periods (not calendar days) that have elapsed since the timestamp, and find discards any fractional part of a period.
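The truncation is easy to check with backdated files. ctime can’t be backdated with touch, so this minimal sketch uses -mtime, which takes the same arguments as -ctime and -atime. It assumes GNU find and GNU touch (for relative dates such as “30 hours ago”); the names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: -mtime counts whole 24-hour periods, discarding the remainder.
# Assumes GNU find and GNU touch (for relative -d dates); names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch -m -d '2 hours ago'  recent.txt    # 0 full periods
touch -m -d '30 hours ago' day_old.txt   # 1 full period (30h / 24h, remainder dropped)
touch -m -d '50 hours ago' older.txt     # 2 full periods

period0=$(find . -type f -mtime 0)
period1=$(find . -type f -mtime 1)
over1=$(find . -type f -mtime +1)

printf '%s\n' "-mtime 0: $period0" "-mtime 1: $period1" "-mtime +1: $over1"
```

The file modified 30 hours ago is one full period old, so -mtime 1 matches it and -mtime 0 does not.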

If you’re like me, you’d probably rather count those periods in calendar days. We can do that with the -daystart option, which makes find line the periods up with calendar days: period 0 is today, period 1 is yesterday, and so on.

Find files accessed today:

find . -daystart -atime 0

Find files that were last accessed yesterday:

find . -daystart -atime 1
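Here is -daystart sorting files by calendar day. This minimal sketch assumes GNU find and GNU touch (touch -a sets only the access time); the names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: -daystart turns 24-hour periods into calendar days.
# Assumes GNU find and GNU touch; names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch today.txt                               # accessed now
touch -a -d 'yesterday 12:00' yesterday.txt   # accessed at noon yesterday
touch -a -d '3 days ago' older.txt            # accessed three days ago

# -daystart must come before the -atime tests it should affect.
today=$(find . -type f -daystart -atime 0)
yesterday=$(find . -type f -daystart -atime 1)
before_yesterday=$(find . -type f -daystart -atime +1)

printf '%s\n' "today: $today" "yesterday: $yesterday" "older: $before_yesterday"
```

Without -daystart, whether yesterday.txt counts as period 0 or period 1 would depend on the time of day you run the script; with it, the answer is always “yesterday”.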

Like -size, you can also use + and - to find files modified, changed, or accessed in a time range:

Find files accessed more than a week ago, but less than a year ago:

find . -daystart -atime +7 -atime -365
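The same window can be checked on three backdated files. It’s a minimal sketch assuming GNU find and GNU touch, with hypothetical names. With -daystart, +7 means 8 or more calendar days ago and -365 means at most 364.

```shell
#!/bin/sh
# Minimal sketch: combine + and - to get a window of days.
# Assumes GNU find and GNU touch; names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch -a -d '3 days ago'   recent.txt    # too recent for +7
touch -a -d '30 days ago'  month.txt     # inside the window
touch -a -d '400 days ago' ancient.txt   # too old for -365

window=$(find . -type f -daystart -atime +7 -atime -365)

printf 'accessed 8 to 364 days ago: %s\n' "$window"
```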

You can also use minutes instead of 24-hour periods, with -amin, -cmin, and -mmin.

Find files changed more than 3 and less than 10 minutes ago:

find . -cmin +3 -cmin -10
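Because ctime can’t be backdated from user space, this minimal sketch demonstrates the same window with -mmin, which takes the same arguments as -cmin. It assumes GNU find and GNU touch; the names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: a minute-based window. ctime cannot be backdated from user
# space, so this uses -mmin, which takes the same arguments as -cmin.
# Assumes GNU find and GNU touch; names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch now.txt                             # modified just now
touch -m -d '5 minutes ago'  five.txt     # inside the 3 to 10 minute window
touch -m -d '20 minutes ago' twenty.txt   # too old

window=$(find . -type f -mmin +3 -mmin -10)

printf 'modified 3 to 10 minutes ago: %s\n' "$window"
```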

Note: -daystart works for the -*time and -*min flags, but it only affects flags that come after it on the command line.

Directory Controls

There are some options available to modify how find descends through directories.

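Find all files with “test” in the name that are at most 2 levels below the current directory:

find . -maxdepth 2 -name "*test*"

-maxdepth is a global option, so it belongs before tests such as -name; GNU find prints a warning when it comes after them.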

There’s also a -mindepth option. Let’s show only files at least 1 level deep (which excludes the starting directory itself) that are greater than 5 MiB in size:

find . -mindepth 1 -size +5M
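To see how the levels are counted, this minimal sketch builds a small nested tree and applies -maxdepth and -mindepth to it. It assumes GNU find; the names are hypothetical. Depth 0 is the starting point itself.

```shell
#!/bin/sh
# Minimal sketch: how -maxdepth and -mindepth count levels.
# Assumes GNU find; names are hypothetical. Depth 0 is the starting point ".".
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir -p a/b/c
touch test_top.txt a/test_a.txt a/b/test_b.txt a/b/c/test_c.txt   # depths 1, 2, 3, 4

# Global options such as -maxdepth and -mindepth go before tests like -name.
shallow=$(find . -maxdepth 2 -name "*test*" | LC_ALL=C sort)
deep=$(find . -mindepth 3 -name "*test*" | LC_ALL=C sort)
level2=$(find . -mindepth 2 -maxdepth 2 | LC_ALL=C sort)   # exactly two levels down

printf '%s\n' "maxdepth 2:" "$shallow" "mindepth 3:" "$deep" "depth 2 only:" "$level2"
```

Setting -mindepth and -maxdepth to the same value, as in the last query, is a handy way to look at a single level of the tree.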

Summary

Well, there you have it: a quick primer on using find to track down files on your Linux box. You can easily find files based on their names, types, sizes, and timestamps. find does a lot more than what’s covered here, so if you need something really bizarre, be sure to check out the GNU findutils manual.

Thanks for reading, and good luck finding that file!