.\" ====================================================================
.\"  @Troff-man-file{
.\"     author          = "Nelson H. F. Beebe",
.\"     version         = "0.09",
.\"     date            = "16 March 1996",
.\"     time            = "10:37:14 MST",
.\"     filename        = "bibsort.man",
.\"     address         = "Center for Scientific Computing
.\"                        Department of Mathematics
.\"                        University of Utah
.\"                        Salt Lake City, UT 84112
.\"                        USA",
.\"     telephone       = "+1 801 581 5254",
.\"     FAX             = "+1 801 581 4148",
.\"     URL             = "http://www.math.utah.edu/~beebe",
.\"     checksum        = "41463 462 2165 14510",
.\"     email           = "beebe@math.utah.edu (Internet)",
.\"     codetable       = "ISO/ASCII",
.\"     keywords        = "bibliography, sorting, BibTeX",
.\"     supported       = "yes",
.\"     docstring       = "This file contains the UNIX manual pages
.\"                        for the bibsort utility, a program for
.\"                        sorting BibTeX data base files by their
.\"                        BibTeX citation label names.
.\"
.\"                        The checksum field above contains a CRC-16
.\"                        checksum as the first value, followed by the
.\"                        equivalent of the standard UNIX wc (word
.\"                        count) utility output of lines, words, and
.\"                        characters.  This is produced by Robert
.\"                        Solovay's checksum utility.",
.\"  }
.\" ====================================================================
.if t .ds Bi B\s-2IB\s+2T\\h'-0.1667m'\\v'0.20v'E\\v'-0.20v'\\h'-0.125m'X
.if n .ds Bi BibTeX
.if t .ds Te T\\h'-0.1667m'\\v'0.20v'E\\v'-0.20v'\\h'-0.125m'X
.if n .ds Te TeX
.TH BIBSORT 1 "17 January 1996" "Version 0.09"
.\"======================================================================
.SH NAME
bibsort \- sort a BibTeX bibliography file
.\"======================================================================
.SH SYNOPSIS
.B bibsort
.RB [ \-byday
.RB " or " \-byvolume
.RB " or " \-byyear ]
[ optional
.BR sort (1)
switches ] < infile >outfile
.nf
or
.fi
.B bibsort
.RB [ \-byday
.RB " or " \-byvolume
.RB " or " \-byyear ]
[ optional
.BR sort (1)
switches ] BibTeXfile(s) >outfile
.\"======================================================================
.SH DESCRIPTION
.B bibsort
filters a \*(Bi\& bibliography, or bibliography
fragment, on its standard input, printing on
standard output a sorted bibliography.
.PP
Sorting is normally by \*(Bi\& citation label name, or by
.I @String
macro name, and letter case is always ignored in
the sorting.
.PP
.\"======================================================================
.SH OPTIONS
Except for the switches described below,
command-line words beginning with a hyphen
are assumed to be options to be passed to
.BR sort (1).
.PP
All remaining command-line words are assumed to be
input files.  Should such a filename begin with a
hyphen, it must be disguised by a leading absolute
or relative directory path, e.g.
.I /tmp/-foo.bib
or
.IR ./-foo.bib .
.PP
The
.BR sort (1)
.B \-f
switch to ignore letter case differences is always
supplied.  The
.B \-r
switch reverses the order of the sort. The
.B \-u
switch removes duplicate bibliography entries from
the input stream; however, such entries must match
exactly, including all white space.
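.PP
The word classification described at the start of this
section can be pictured with the following shell sketch.
It is an illustration only, not code taken from
.BR bibsort ;
the function name
.I classify_args
is hypothetical.
.PP
.nf
```shell
# Hypothetical helper: report how each command-line word would be
# treated under the rules described in this section.
classify_args() {
    for word in "$@"; do
        case "$word" in
            -byday|-byvolume|-byyear) echo "bibsort switch: $word" ;;
            -*) echo "sort option: $word" ;;
            *)  echo "input file: $word" ;;
        esac
    done
}

classify_args -byyear -r ./-foo.bib refs.bib
```
.fi
.PP
Because
.I ./-foo.bib
begins with a period rather than a hyphen, the sketch
reports it as an input file.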
.TP \w'\-byvolume'u+3n
.B \-byday
This switch is intended for use with
bibliographies of publications containing day,
month, and year data, such as technical reports,
newspapers, and magazines.  It causes entries to
be sorted by year, month, day, and citation label,
so that the entries appear in their original
publication order.
.IP
With
.B \-byday
sorting, a day keyword is recognized
(it will be standard in \*(Bi\& 1.0), but for
backward compatibility, month entries of the form
.IP
.nf
"daynumber " # monthname
"daynumber~" # monthname
{daynumber } # monthname
{daynumber~} # monthname
monthname # "daynumber "
monthname # "daynumber~"
monthname # {daynumber }
monthname # {daynumber~}
.fi
.IP
are also recognized, and will yield both a day and
a month.  If a day number is not available, 99 is
assumed, which will sort the entry after others
that have day values in the same year and month.
.TP
.B \-byvolume
This switch is intended for use with
bibliographies of single journals.  It causes
entries to be sorted by journal, year, volume,
number, page, and citation label, so that
the entries appear in their original publication
order.  The journal name is included in the sort
key, so that in a bibliography with multiple
journals, output entries for each journal are kept
together.
.IP
With
.B \-byvolume
sorting, a warning is issued for any entry in
which any of these fields is missing, and the
missing field is given a value that sorts
higher than any printable value.
.IP
Because
.B \-byvolume
sorting is first on journal name, it is essential
that there be only one form of each journal name;
the best way to ensure this is always to use
@String{...} abbreviations for them.
.B \-byvolume
order is convenient for checking a bibliography against
the original journal, but less convenient for a
bibliography user.
.TP
.B \-byyear
If this switch is given, then the entry year value
is prefixed to the sort key, so that sorting is
first by year, then by citation label.  This is
useful for keeping a bibliography in approximate
chronological order, ordered by citation label
within each year.
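.PP
The month forms accepted with
.B \-byday
can be illustrated by a small sketch that pulls a day number
and a month name out of a month value, falling back to 99
when no day number is present.  It assumes a POSIX
.BR awk (1)
and input holding one month value per line with nothing but
the day number and the month abbreviation in it; the function
name
.I day_month
is hypothetical, and this is not the code used by
.BR bibsort .
.PP
.nf
```shell
# Hypothetical helper: read one month value per line and print
# "DD month", using 99 when no day number is present.
day_month() {
    awk '{
        day = 99
        if (match($0, /[0-9]+/))
            day = substr($0, RSTART, RLENGTH) + 0
        month = ""
        if (match($0, /[A-Za-z]+/))
            month = tolower(substr($0, RSTART, RLENGTH))
        print sprintf("%02d", day), month
    }' "$@"
}

day_month <<'EOF'
"12 " # jan
feb # {3~}
mar
EOF
```
.fi
.PP
For the three sample values, the sketch prints 12 jan,
03 feb, and 99 mar, so that the entry without a day number
would sort after the others in its month.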
.SH "BIBTEX FILE PARTS"
The input stream is conceptually divided into five
parts, any of which may be absent.
.RS
.TP \w'1.'u+2n
1.
Introductory material such as comments, file
headers, and edit logs that are ignored by
\*(Bi\&.  No line in this part begins with an
at-sign, ``@''.
.TP
2.
Preamble material delineated by ``@Preamble{'' and
a matching closing ``}'', intended to be processed
by \*(Te\&.  Normally, there is only one such
entry in a bibliography file, although \*(Bi\&,
and
.BR bibsort ,
permit more than one.
.TP
3.
Macro definitions (abbreviations) of the form
``@String{.\|.\|.}''.  Any single @String
specification may span multiple lines, and there
are usually several such definitions.
.TP
4.
Bibliography entries such as ``@Article{.\|.\|.}'',
``@Book{.\|.\|.}'', ``@InProceedings{.\|.\|.}'', and
so on, provided that their citation labels have
not already been encountered in a
.I crossref
assignment in a preceding entry.  For
.BR bibsort ,
any line that begins with an ``@'' followed by
letters and digits and an open brace is considered
to be such an entry.  Optional spaces and tabs may
surround the ``@'', and precede the first open
brace; these spaces and tabs will be deleted from
the output to help standardize the appearance.
.TP
5.
``@Proceedings{.\|.\|.}'' bibliography entries,
which are likely to be cross-referenced by
``@InProceedings{.\|.\|.}'' entries, and any other
bibliography entries for which a crossref
assignment was met before the entry itself.
.PP
An unfortunate implementation limitation of the
current \*(Bi\& requires cross-referenced entries
to appear
.I after
all other entries that cross-reference them,
although this limitation works to the advantage of
.BR bibsort ,
allowing single-pass processing.
.RE
.PP
The order of these parts is preserved in the
output stream.  Part 1 will be unchanged, but
parts 2\(en5 will be sorted within themselves.
.PP
The sort key of a ``@Preamble'' entry is its
initial line, and that of a ``@String'' entry is its
abbreviation name.  For all other \*(Bi\& entries,
the sort key is the citation label between the open
curly brace and the trailing comma, unless the
sort key is prefixed with additional fields as
requested by the
.BR \-byday ,
.BR \-byvolume ,
or
.B \-byyear
switches.
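.PP
The recognition of entry lines and the extraction of the
citation label can be sketched as follows.  This is an
illustration under stated assumptions, not the
.BR nawk (1)
program inside
.BR bibsort :
it assumes a POSIX
.BR awk (1),
treats every matching line as an ordinary entry (so
``@String'' and ``@Preamble'' lines are not given their own
kind of key), and uses the hypothetical function name
.IR entry_keys .
.PP
.nf
```shell
# Hypothetical helper: for each line that starts an entry, print the
# tidied line followed by the lowercased citation label.
entry_keys() {
    awk '
        BEGIN {
            ws = "[ " sprintf("%c", 9) "]"      # a space or a tab
            start = "^" ws "*@" ws "*[A-Za-z0-9]+" ws "*[{]"
        }
        $0 ~ start {
            match($0, start)
            head = substr($0, 1, RLENGTH)       # text through the open brace
            rest = substr($0, RLENGTH + 1)
            gsub(ws, "", head)                  # delete the optional blanks
            label = rest
            sub(/,.*$/, "", label)              # stop at the trailing comma
            print head rest " -> " tolower(label)
        }' "$@"
}

entry_keys <<'EOF'
  @ Article {Knuth84,
  author = "D. E. Knuth",
@Book{lamport86,
EOF
```
.fi
.PP
The label is lowercased here only to mirror the fact that
letter case is ignored in sorting.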
.PP
.B bibsort
will correctly handle UNIX files with LF line
terminators, as well as IBM PC DOS files with CR
LF line terminators; the essential requirement is
that input lines be delineated by LF characters.
Thus, files from the Apple Macintosh, which uses
bare CR to terminate lines, would first have to be
converted to UNIX or PC DOS line format before
giving them to
.BR bibsort .
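.PP
One possible way to carry out that conversion is sketched
below, assuming a POSIX
.BR awk (1)
that accepts a single-character record separator.  The
function name
.I cr_to_lf
is hypothetical; it is meant only for files with bare CR
terminators, and would insert extra LF characters into a
file that already uses CR LF.
.PP
.nf
```shell
# Hypothetical helper: turn bare-CR line terminators into LF.
cr_to_lf() {
    awk 'BEGIN { RS = sprintf("%c", 13) } { print }' "$@"
}

# Demonstration on a generated two-line sample terminated by CR.
awk 'BEGIN { printf "a%cb%c", 13, 13 }' | cr_to_lf
```
.fi
.PP
The converted stream could then be piped into
.BR bibsort .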
.\"======================================================================
.SH CAVEATS
\*(Bi\& has loose syntactical requirements that
the current simple implementation of
.B bibsort
does not support.  In particular, outer
parentheses may
.I not
be used in place of braces following ``@keyword''
patterns.  If you have such a file, you can use
.BR bibclean (1)
to prettyprint it into a form that
.B bibsort
can handle successfully.
.PP
The user must be aware that sorting a bibliography
is not without peril, for at least these reasons:
.RS
.TP \w'1.'u+2n
1.
\*(Bi\& has a
requirement that entry labels given in
.IR "crossref" " = " "label"
pairs in a bibliography entry
.I must
refer to entries defined
.IR later ,
rather than earlier, in the bibliography file.
This regrettable implementation limitation of the
current (pre-1.0) \*(Bi\& prevents arbitrary
ordering of entries when
.I crossref
values are present.
To partially solve this problem,
.B bibsort
will place ``@Proceedings'' entries last, since
they are frequently cross-referenced by
``@InProceedings'' entries.  However, it is also
possible for ``@Book'', ``@InBook'', and
``@InCollection'' entries to cross-reference
``@Book'' entries, and for ``@Article'' entries to
cross-reference other ``@Article'' entries.
Neither of these cases is dealt with by
.BR bibsort ,
except that ``@Book'' entries that contain a
``booktitle'' assignment, and entries that are
explicitly cross-referenced before their
definition, are sorted with ``@Proceedings''.
.TP
2.
If the \*(Bi\& file contains interspersed
commentary between ``@keyword{.\|.\|.}'' entries,
this material will be considered part of the
.I preceding
entry, and will be sorted with it.  Commentary that
leads into the entry following it is more common, and
because it travels with the preceding entry, it will
end up separated from the entry that it describes.
.IP
This is normally not a problem for the part 1
material before the ``@Preamble'', since it is kept
together at the beginning of the output stream.
.TP
3.
Some kinds of bibliography files should be kept in
a different order than alphabetically by citation
labels.  Good examples are a bibliography file with
the contents of a journal, or a personal
publication list, for both of which chronological
publication order is likely to be preferred.
.RE
.PP
While a much more sophisticated implementation of
.B bibsort
could deal with the first point, and the
.B \-byvolume
switch provides a partial solution to the third
point, in general, a satisfactory solution
requires human intelligence and natural language
understanding that computers lack.
.PP
.B bibsort
uses the characters with octal values 001 through
007, 0177, and 0377 for temporary modifications
of the input stream.  If any of these are already
present in the input, they will be altered on
output.  This is unlikely to be a problem, because
those characters have no printable
representation, and are not conventionally used
to mark line or page boundaries in text files.
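.PP
For a file of uncertain origin, it may be worth checking
beforehand whether any of these characters are present.
The following is a minimal sketch, assuming a POSIX
.BR awk (1)
run in the C locale so that values are treated as single
bytes; the function name
.I has_reserved_bytes
is hypothetical and is not part of
.BR bibsort .
.PP
.nf
```shell
# Hypothetical helper: succeed (exit status 0) if the input contains
# a character with octal value 001 to 007, 0177, or 0377.
has_reserved_bytes() {
    LC_ALL=C awk '
        BEGIN { n = split("1 2 3 4 5 6 7 127 255", code, " ") }
        {
            for (i = 1; i <= n; i++)
                if (index($0, sprintf("%c", code[i] + 0))) { found = 1; exit }
        }
        END { exit !found }' "$@"
}

if echo "@Book{plain," | has_reserved_bytes; then
    echo "reserved characters present"
else
    echo "input is clean"
fi
```
.fi
.PP
The helper reads the named files, or standard input when
none are given.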
.\"======================================================================
.SH "PROGRAMMING NOTES"
Some text editors permit application of an
arbitrary filter command to a region of text.
For example, in GNU
.BR emacs (1),
the command
.IR "C-u M-x shell-command-on-region" ,
or equivalently,
.IR "C-u M-|" ,
can be used to run
.B bibsort
on a region of the buffer that is devoid of cross
references and other material that cannot be
safely sorted.
.PP
Some implementations of \*(Bi\& editing support in
GNU
.BR emacs (1)
have a
.I sort-bibtex-entries
command that is functionally similar to
.BR bibsort .
However, the file size that can be processed
by
.BR emacs (1)
is limited, while
.B bibsort
can be used on arbitrarily large files, since it
acts as a filter, processing a small amount of
data at a time.  The sort stage needs the entire
data stream, but fortunately, the UNIX
.BR sort (1)
command is clever enough to deal with very large
inputs.
.PP
The current implementation of
.B bibsort
follows the UNIX tradition of combining simple
already-available tools.  A six-stage pipeline of
.BR egrep (1),
.BR nawk (1),
.BR sort (1),
and
.BR tr (1)
accomplishes the job in one pass with about 500
lines of heavily commented shell script, about 225
lines of which are a
.BR nawk (1)
program for insertion of sort keys.  The initial
prototype of
.B bibsort
was written and tested on several large
bibliographies in a couple of hours, and after
considerable use, was later extended with advanced
sorting capabilities and cross-reference
recognition in a couple of days of work.  By
contrast,
.BR bibtex (1)
is more than 11\0000 lines of code and
documentation, and
.BR bibclean (1)
is more than 15\0000 lines long; both took months
to develop, implement, and test.
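.PP
The general idea of such a pipeline (prefix a sort key,
fold each entry into one line, sort, then unfold) can be
sketched in a few lines.  This is a much-reduced
illustration and not the actual
.B bibsort
script: it uses
.BR awk (1)
twice and
.BR sort (1)
once, borrows the character with octal value 001 as a
temporary line separator, and assumes that the input holds
only entries whose first line starts with ``@'' in column
one.  Text before the first entry is simply dropped by this
sketch.  The function name
.I sort_entries
is hypothetical.
.PP
.nf
```shell
# Hypothetical sketch: sort entries by citation label, ignoring case.
sort_entries() {
    awk '
        BEGIN { sep = sprintf("%c", 1); tab = sprintf("%c", 9) }
        /^@/ {                             # a new entry starts here
            if (entry != "") print key tab entry
            key = $0
            sub(/^[^{]*[{]/, "", key)      # drop text through the open brace
            sub(/,.*$/, "", key)           # drop the trailing comma and beyond
            entry = $0
            next
        }
        entry != "" { entry = entry sep $0 }   # fold continuation lines
        END { if (entry != "") print key tab entry }
    ' "$@" |
    LC_ALL=C sort -f |
    awk '
        BEGIN { sep = sprintf("%c", 1); tab = sprintf("%c", 9) }
        {
            sub("^[^" tab "]*" tab, "")    # remove the sort key
            n = split($0, line, sep)       # unfold into the original lines
            for (i = 1; i <= n; i++) print line[i]
        }'
}

sort_entries <<'EOF'
@Book{Zeta99,
  title = "Z"
}
@Article{alpha90,
  title = "A"
}
@Misc{Beta95,
  title = "B"
}
EOF
```
.fi
.PP
With the sample input, the entries come out in the order
alpha90, Beta95, Zeta99.  Because each entry travels through
.BR sort (1)
as a single line, the sketch shares the sensitivity to
line-length limits in
.BR sort (1)
that is described under BUGS.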
.\"======================================================================
.SH BUGS
.B bibsort
may fail on some UNIX systems if their
.BR sort (1)
implementations cannot handle very long lines,
because for sorting purposes, each complete
bibliography entry is temporarily folded into a
single line.  You may be able to overcome this
problem by adding a
.BI \-z nnnnn
switch to the
.BR sort (1)
command (passed via the command line to
.BR bibsort )
to increase the maximum line size to some larger
value of
.I nnnnn
bytes.  According to their documentation, some UNIX
.BR sort (1)
implementations require a space after
.BR \-z ,
others forbid it, and still others do not support it at all.
If a space is required, you must quote the pair,
to prevent the
.I nnnnn
value from being interpreted as a filename by
.BR bibsort .
.\"======================================================================
.SH "SEE ALSO"
.BR bibcheck (1),
.BR bibclean (1),
.BR bibdup (1),
.BR bibextract (1),
.BR bibjoin (1),
.BR biblabel (1),
.BR biblex (1),
.BR biborder (1),
.BR bibparse (1),
.BR bibtex (1),
.BR bibunlex (1),
.BR citesub (1),
.BR egrep (1),
.BR emacs (1),
.BR nawk (1),
.BR sort (1),
.BR tr (1).
.\"======================================================================
.SH AUTHOR
.nf
Nelson H. F. Beebe, Ph.D.
Center for Scientific Computing
Department of Mathematics
University of Utah
Salt Lake City, UT 84112
Tel: +1 801 581 5254
FAX: +1 801 581 4148
Email: <beebe@math.utah.edu>
WWW URL: http://www.math.utah.edu/~beebe
.fi
.\"==============================[The End]==============================
