Aug 10, 2012
 

Very often in a Unix environment we have to deal with symlinks (aka soft links: references to other files).

For various reasons the referenced file may be moved or removed, while the symlink still points to the old location.

Such situations can lead to unexpected errors. For example, if your symlink points to an executable and the referenced file was deleted, you will get a “Permission denied” message and ask yourself: why doesn’t it work? A couple of days ago it was just fine, and the file is right there!

Because of this, a few years ago I made a small script that looks for dead links.
I usually start this script during the weekend, and on Monday morning I just look over the reports.

I wanted it to run on different Unix flavors, so I chose something portable: Perl.

What the script does:

  1. get all the files/dirs/links from the directory passed as a parameter
  2. regular files are just ignored
  3. for each link found in the start dir, check whether the referenced object exists
    • if it does not exist, put the link in a hash that stores all the dead links
    • if the referenced object is a directory, treat it as a new input for step #1
  4. for all the directories found in the starting directory, apply step #1 again

In a few words: it is a recursive search down the hierarchy.
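The core test from step 3 can be sketched in isolation. This is a minimal, hypothetical helper (not part of the script below): a path is a dead link when it *is* a symlink but stat(), which follows the link, fails because the target is gone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $path is a symlink whose target is missing.
# -l tests the link itself (via lstat); stat() follows the link.
sub is_dead_link {
    my ($path) = @_;
    return ( -l $path && !stat($path) ) ? 1 : 0;
}

# demo: create a symlink to a target that does not exist
symlink("/no/such/target", "demo_link") or die "symlink: $!";
print is_dead_link("demo_link") ? "dead\n" : "alive\n";   # prints "dead"
unlink "demo_link";
```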

There is just one interesting complication: because of symlinks, there can be cycles in the file structure.
That means that with this algorithm we may enter a never-ending loop.

The solution is to keep track of all the directories already scanned; when one is found a second time, it is ignored. This can be done with another hash (aka associative array).
We will not key this hash on the name/path of the directory, because there can also be hard links (different paths pointing to the same “content”), so we need some information that uniquely identifies a file’s or directory’s content.
That info is provided by Perl’s stat($file) routine.
stat() returns a lot of things, but we are interested only in the first two elements: the device number of the file system and the inode number.
Together they uniquely identify a file.
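A quick sketch of that identity check: two hard links to the same file share the same device and inode numbers, so a visited-hash keyed on "dev:inode" catches both cycles and hard links. (The file names here are made up for the demo.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Create a file and a hard link to it, then compare their dev:inode keys.
open my $fh, '>', 'orig.txt' or die $!;
close $fh;
link 'orig.txt', 'hard.txt' or die "link: $!";

my ($dev1, $ino1) = stat 'orig.txt';
my ($dev2, $ino2) = stat 'hard.txt';

print "orig => $dev1:$ino1\n";
print "hard => $dev2:$ino2\n";
print "same file\n" if "$dev1:$ino1" eq "$dev2:$ino2";   # prints "same file"

unlink 'orig.txt', 'hard.txt';
```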

Here is dead_links.pl, and below is the code (just copy and paste it if the download link has problems).

Feedback and comments are appreciated.


#!/usr/local/bin/perl
# search for dead symlinks into hierarchy
# author marcel_preda[at]yahoo.com


use strict;
use warnings;

# a global hash with the dead links found
my %DeadLinks = ();

# a global hash with the visited dev:inode pairs
my %DevInodes = ();


# print a short help
# every `pro' script should have something like this
sub printUse
{
	print <<EOF;
\nUse	`$0 [Options]'
	`$0 [directoryName]'
	Valid options:
		-h, --help	print this help

EOF
	exit 0;
}


# recursively search a directory tree for dead symlinks
sub searchLinks
{
# the global %DeadLinks, %DevInodes variables will be used in this subroutine

	my($file,@files,@files2);
	my ($dev, $inode) = stat($_[0]);
	# $dev:$inode identifies the file 
	if ($DevInodes{"$dev:$inode"}){
		# directory was already visited: probably a cyclic symlink or a hard link
		# print "VISITED: ", $_[0], " \n";
		return;
	};

	# mark the dir as visited so it is not scanned again
	$DevInodes{"$dev:$inode"} = 1;

	if (opendir(DIR_HANDLE,$_[0])){
		while ($file=readdir(DIR_HANDLE)) {
			push(@files,"$_[0]/$file");
		};
		closedir(DIR_HANDLE);
		@files2=grep((!/\.\.?$/),@files);	#skip `.' and `..' dirs
		for $file (@files2){
			# if it is a symlink, check whether it is dead:
			# stat() follows the link and fails if the target is gone
			if ( -l $file ) {
				($dev, $inode) = stat $file;
				if ((! $dev) && (! $inode)){
					$DeadLinks{$file} = readlink $file;
				};
			};

			# if it is a directory, call the subroutine again
			searchLinks($file) if (-d $file);
		}
	};
}

# main
my $DIR = $ARGV[0];

# print help if requested
while ( $_ = shift ){
	(/^-h$/ || /^-+help$/ ) && printUse();
};

# no dir provided, use the current directory
unless ($DIR) { $DIR = '.'; };

searchLinks($DIR);

# report the dead links
foreach my $link (keys %DeadLinks){
    print 'DEAD LINK', "\t $link\t->\t", $DeadLinks{$link}, "\n";
};

  2 Responses to “Find Dead Links on Unix with Perl”

  1. Greetings from Ohio! I’m bored to death at work so I decided to check out your blog on my iphone during lunch break. I enjoy the information you present here and can’t wait to take a look when I get home. I’m surprised at how fast your blog loaded on my phone .. I’m not even using WIFI, just 3G .. Anyways, superb site!

  2. Sure, please do it.
