.\" cmpd.1
.\" stored as cmpd rather than cmp to differentiate from standard unix command
.TH CMPD P "Manual V1.20 for Program V1.20" "Julian H. Stacey" "Julian H. Stacey, http://www.berklix.com"
.\" A4 the cludgy way.
.\" .pl 29.6c
.UC 4
.SH NAME
cmpd \- compare files, optionally deleting duplicates

.SH SYNTAX
.B cmpd
[--] [-b] [-d] [-e] [-l[directory]] [-m] [-n] [-s] [-v] file_1 [file_2 file_3 file_4] last

.SH SUMMARY
.I cmpd
is a rewrite of the standard Unix cmp(1) command, with extra functionality.

.SH DESCRIPTION

Compares one or more files, whose names may be local, relative, or rooted,
and may be normal, hard linked, or symbolically linked,
with a last element which may be a directory or a file.
Where more than 2 arguments are given, the last must be a directory.
If a single argument is given, reference file data is taken from stdin.
.I cmpd
can work with binary files.
.I cmpd
will not willingly delete the only copy of a file.
.I cmpd
detects and avoids deleting data for all the following
sequences (that might arise from human error and/or failing shell scripts):
.in +4
cmpd -d data data
.br
cmpd -d data .
.br
cmpd -d data ../datas_dir/data
.br
ln -s data symbolic_ptr ; cmpd -d data symbolic_ptr
.br
ln -s . symbolic_ptr ; cmpd -d data symbolic_ptr
.in -4
See the LIMITATIONS section below.
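.PP
The cases above all come down to two names resolving to one file.
As a minimal sketch (not the code of cmpd itself), a shell script could make a similar check before deleting anything.
It assumes a test(1) that supports the -ef operator (true when both names have the same device and inode numbers), as many shells do; the helper name same_file and the temporary directory are only for illustration.
.nf
```shell
# same_file A B: succeed if both names resolve to one file
# (same device and inode), so deleting A would lose the data in B.
same_file() {
    [ "$1" -ef "$2" ]
}

# Scratch directory, so the sketch touches no real data.
dir=$(mktemp -d) && cd "$dir" || exit 1
date > data
ln -s data symbolic_ptr
echo different > data2

same_file data symbolic_ptr && echo "data and symbolic_ptr: same file, keep"
same_file data ./data && echo "data and ./data: same file, keep"
same_file data data2 || echo "data and data2: distinct files"
```
.fi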

Syntax such as
.ti +4
cmpd ../../same_name1 ../../same_name2 .
.br
is warned about, as it will always result in equality.
.br
.I cmpd
does not convert it into
.ti +4
cmpd same_name1 same_name2 ../..
.br
Nor will such a conversion be added later,
because it is not at all agreeable to convert
.ti +4
cmpd -d ../../same_name1 ../../same_name2 .
.br
to
.ti +4
cmpd -d same_name1 same_name2 ../..

.SH OPTIONS

Option flags must precede file names.
Option flags may be used in any order.
Option flags available include:

.TP
.B \-\-
Use stdin to provide reference file data.
.in +2
[ When compiled with -D NEW_LATER ] {
Untested code, will exit(1).
Still to be checked: the relative order of the reference and the files to be removed
may be inconsistent between these 2 examples:
.in +2
.br
cmpd -d Delete_File_1 Delete_File_2 Delete_File_3 Reference_Directory
.br
cmpd -d --=Reference_Pipe Delete_File_1
.in -2
}
.in -2
.TP
.B \-m
Compare email (or news) articles: the body content is compared,
but not the header fields preceding a blank line.
If a difference is detected, the line number reported is the line number
within the body of the email, excluding header lines and separator line(s).
.\" what about byte count ?
.TP
.B \-b
Reads are done in 512 byte blocks (to allow for raw unix
devices, or restricted msdos memory), rather than the normal larger
buffer (0x2000 when writing this manual).
.TP
.B \-d
Deletes the first of 2 identical files (thus if the last element
is a directory, individual files that have identical contents
to 'reference_directory/individual_name' will be deleted).
.br
The option name -d is used rather than -r,
to avoid an unpleasant surprise to users accustomed
to -r recursion/sub directory options, such as diff -r.
If a user hopes cmpd -r would compare trees
(which it does not do),
.I cmpd
will not delete files by accident instead.
.TP
.B \-n
No: do not actually delete (unlink()) the file, but do all the other comparison work.
Maybe later -n should also inhibit link().
The purpose of -n is to allow running in dummy mode, to harvest the text of what
.I cmpd
would do, e.g. when run by find in a script, or under nohup.
.TP
.B \-l[directory]
Creates a symbolic link, to replace the deleted file.
The symbolic link normally points to the unchanged reference file,
but if an optional directory is specified,
.I cmpd
uses that to link via instead of the directory of the reference file.
This option includes the action of option `-d'
(thus `-d' need not be given).
(SIGINT is ignored between deletion and creation,
but other signals are not blocked.)
.br
An example:
.br
cmpd -d -v -l../../../../../../../txt/tax/2018/telekom * ../../../../../../../txt/tax/2018/telekom
.br
Note that, as the link path is the same as the reference path, it does not need to be specified, i.e. this is just as good:
.br
cmpd -d -v -l * ../../../../../../../txt/tax/2018/telekom
.TP
.B \-e
Merge stderr to stdout.
.TP
.B \-s
Suppress complaints of not being able to stat reference files.
May be used to supplement the -d option.
When using
.ce
cmpd -d -s * ../some_reference_directory
.br
the -s is useful if you want to remove duplicates in the current directory,
but do not want to know if some files cannot be found in the reference directory.
This option does not change the functionality, it just suppresses warnings (thus -s will not cause non-duplicated files to be deleted).
.TP
.B \-v
Verbose mode, announce files as processed (normal mode is silent).
.TP
.B \-M
Mode: NOT YET IMPLEMENTED. Only consider files equal if, as well as having identical content, the mode bits of both files are also identical.
.TP
.B \-L
Links: NOT YET IMPLEMENTED. Only consider files equal if both are symbolic links
pointing to the same initial link.
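.PP
The comparison described for -m can be approximated with standard tools.
This is a sketch only, not the code of cmpd: it assumes the header ends at the first empty line, it uses sed and cmp, and the helper name body_of and the sample files a and b are hypothetical.
.nf
```shell
# body_of FILE: print FILE without its header, i.e. drop everything
# up to and including the first empty line.
body_of() {
    sed '1,/^$/d' "$1"
}

# Two sample articles with different headers but the same body.
dir=$(mktemp -d) && cd "$dir" || exit 1
{ echo "Subject: one"; echo "Date: Mon"; echo; echo "Hello"; } > a
{ echo "Subject: two"; echo; echo "Hello"; } > b

body_of a > a.body
body_of b > b.body
cmp -s a.body b.body && echo "bodies identical"
```
.fi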

.SH LIMITATIONS
.I cmpd
has no means of detecting
.ti +4
cat data | cmpd -d -- data
.br
as a dubious usage. If usage of
.I cmpd
is devious (as above),
.I cmpd
can be forced to delete the only copy of a file.

.SH RESTRICTIONS
None

.SH EXAMPLES

Reducing 2 copies to original + symbolic link:
.br
cd ~/tax/2010 ; cmpd -d -v -l ../../car/2010-06-06-service.tiff 
2010-06-06-service.tiff
.br
ls -l ../../car/2010-06-06-service.tiff 2010-06-06-service.tiff
.br
 ../../car/2010-06-06-service.tiff -> 2010-06-06-service.tiff
.br
10000 Nov 29 14:35 2010-06-06-service.tiff

How to prune out files that are unchanged between an old & new directory tree.
.br
cd old_tree ; find -x . -type f -exec /usr/local/bin/cmpd -d -v {} ../new_tree \\;

.SH EXIT CODE

.in +5
.ti -5
0
.br
All files are the same.
.ti -5
Positive integer
.br
Number of differing files.
.ti -5
Negative integer
.br
Number of differing files, plus system call failures such as open()
or unlink(); note such system call failures do not cause an immediate exit.
.ti -5
-32000
.br
User syntax or logical error.
.in -5
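.PP
When scripting around these exit codes, note that a shell on Unix normally sees only the low 8 bits of the value passed to exit(), so negative or large values do not arrive unchanged.
The sketch below assumes a POSIX shell and uses a hypothetical helper, seen_status, to show what $? would hold for some of the values listed above.
.nf
```shell
# seen_status N: print what a POSIX shell would report in $? after a
# program calls exit(N); only the low 8 bits of N survive.
seen_status() {
    echo $(( $1 & 255 ))
}

seen_status 3        # 3 differing files: prints 3
seen_status -2       # prints 254
seen_status -32000   # prints 0, because 32000 is a multiple of 256
```
.fi
A script that needs to tell a syntax error from success may therefore do better to also check the messages on stderr.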

.SH FEATURES
If two zero size files named typescript are encountered in different directories,
one will be deleted. This is unfortunate, as of
course when the 2 script commands exit, the files would then have
held different contents. Not a bug, merely unfortunate.

.SH BUGS
Error messages should be made to conform to cmp and report EOF.
.br
Taking 2 files, one a shorter truncated version of the other,
output currently looks like this:
.nf
% ls -l
	2147473408 long
	 972095488 short
% cmp long short
	cmp: EOF on short
% cmp short long
	cmp: EOF on short
% cmpd -v  long short
	Different: char 696 in line 4641590, (byte 972095489)
% cmpd -v  short long
	Different: char 696 in line 4641590, (byte 972095489)
% cmpd long short
	Different: char 696 in line 4641590, (byte 972095489)
% nice cmpd short long
	Different: char 696 in line 4641590, (byte 972095489)
.fi

.SH POSSIBLE BUGS
.I cmpd
does not recognise/analyse symbolic links over NFS between
multiple hosts; it IS possible to lose your only copy of a file
when doing an NFS based compare and delete.
.br
Example: this deletes the only copy of valid data:
.in +2
cd /home/yourname
.br
find -x . -type f -exec cmpd -d -v {} /host/laptop/usr/home/yourname \\;
.in -2

I have not yet considered the following possible bug:
using 2 hosts and NFS, both files could by chance have the same inode number,
and the same major and minor device numbers, in which case the duplicate file would not
be deleted when it should be.

.SH HOST OP SYSTEM
This utility runs on MS-DOS 3.2 and Unix (including all FreeBSD up to 8.1, and BSD-4.2 on a Symmetric S375).

.SH COPYRIGHT
Program Copyright Julian H. Stacey, Munich, 14th May 1987 to 2022 on.
.br
Document Copyright Julian H. Stacey, Munich, 11th Feb 1991 to 2022 on.
.\" .so author.jhs

.SH ENHANCEMENTS TO DO
Add capability to compare & delete symbolic links if paths are identical.
.br
Add capability to compare & delete special devices if major & minor are identical.

.SH BUGS
cmpd -d-v *jpg tmp
.br
deletes, but not verbosely.

HALF BAKED BUG DESCRIPTION BELOW, JJLATER FIX IT
.br
Insufficient checking:
.I cmpd
can create a useless self referential symbolic link if you run it wrongly,
e.g. if 2 copies of a file exist, created by
.nf
rm -f ~/tmp/Thing ~/tmp/temp/Thing
cp /etc/motd ~/tmp/Thing ; mkdir -p ~/tmp/temp ; cp ~/tmp/Thing ~/tmp/temp/Thing
ls -l ~/tmp/Thing ~/tmp/temp/Thing
.fi
then
.br
cd ~/tmp/temp ; cmpd -d -v -l ../Thing Thing
.br
creates a useless self referential symbolic link ~/tmp/Thing -> Thing,
where you probably wanted ~/tmp/temp/Thing removed and replaced by a symbolic link Thing -> ../Thing,
so you should have done
.br
cd ~/tmp/temp ; cmpd -d -v -l../ Thing

.SH FILES
No default filenames.

.SH SEE ALSO
diff(1), cmp(1).

.SH ANNEX

Test script to test
.I cmpd
on a BSD Unix.
.nf
.in +5
echo Shell script for cmpd.c, run with sh -x

mkdir test_cmpd
cd test_cmpd
date > data
echo "different" > data2
cmpd=/usr/local/bin/cmpd

$cmpd -d -v data data2
ls -l dat*

$cmpd -d -v data data
ls -l data

$cmpd -d -v data .
ls -l data

$cmpd -d -v data ../test_cmpd/data
ls -l data

ln -s data symbolic_ptr

$cmpd -d -v data symbolic_ptr
ls -l data symbolic_ptr

$cmpd -d -v symbolic_ptr data
ls -l data symbolic_ptr

rm symbolic_ptr ; ln -s . symbolic_ptr ; $cmpd -d -v data symbolic_ptr
ls -l data symbolic_ptr

cat data | $cmpd -d -v -- data
ls -l data

rm -f data data2 symbolic_ptr
cd ..
rmdir test_cmpd
.in -5
.fi
.\" End of file
