Split tar
From noname.com.ua
Revision as of 10:55, 24 June 2010; Sirmax
#!/bin/bash
# splits a large tar file into a set of smaller ones
#
# Author: Dr. Jürgen Vollmer <juergen.vollmer@informatik-vollmer.de>
# Copyright (C) 2003 Dr. Jürgen Vollmer, Karlsruhe, Germany
# For usage and license agreement, see below (function usage)
#
# $Id: split-tar,v 1.30 2010/01/13 17:57:52 vollmer Exp $
# Version: 1.11 of 2006/06/02
#set -x
CMD=$(basename "$0")
VERSION="1.11"
###############################################################################
usage()
{
cat <<END
usage: $CMD [options] tarfile.<suffix> (filename|directory)...
Splits a large tar archive into a set of smaller ones.
Creates a set of tar archives direct from the files and directories.
<suffix> is one of tar, tar.gz, tgz, or tar.bz2
Files are written to tarfile-???.<suffix> in the directory of tarfile.<suffix>,
where ??? are three digits.
Note: since a TAR file contains tar-specific administration information
the resulting tar files may be larger than the specified size.
Only the file sizes of the sources are used for the computation.
Note: split-tar relies on the GNU version of "tar", "find" and "bash".
Note: split-tar is not able to read the filenames from stdin.
Use -T instead.
Options:
-c : Create the tar archives from [filename|directory...].
-C opts : Pass opts to tar when creating the tarfile with -c.
The compression options -z (gzip) or -j (bzip2) are
added by default if the <suffix> indicates it.
-e rate : To compute the set of files to be put into a compressed
tarfile, the compressed size of each uncompressed source
file has to be estimated. To do this, the compression program
indicated by tarfile.<suffix> is called (e.g. gzip).
This may be quite time-consuming.
This overhead may be avoided by giving a "compression rate"
using the -e option. The real size of an uncompressed
file is divided by that <rate>. This may result in
tarfiles that are too large or too small, so some
trial and error is needed to get the <rate> value right.
The <rate> is a positive integer.
-f prog : Use prog as "find" program, e.g.
-f /usr/local/bin/gfind
-N date : Only store files newer than <date>.
Typical format: YYYY-MM-DD or 'YYYY-MM-DD HH:MM:SS'.
If <date> begins with \`/' or \`.', it is taken to be the name
of a file whose last-modified time specifies the date.
-N passes its argument as tar option \`--newer=<date>'
(this may be changed in the source of this script, see
variable TAR_NEWER).
-N is valid only if -c is given.
-h : Help
-s sizeK : Maximum size of one tar file in kilobytes, default ${DEFAULT_SIZE}
-s sizeM : Size given in megabytes
-s sizeG : Size given in gigabytes
-S : Split the existing tar archive tarfile.<suffix>;
no [filename|directory...] may be given.
This is the default mode.
-t prog : Use prog as "tar" program, e.g.
-t /usr/local/bin/gtar
-T file : Read names to create the archive from <file>
-v : Verbose (verbose tar messages)
-V : Version.
Example:
Splitting an already existing archive:
If foo.tar.gz has a size of 3 M bytes, the command
split-tar -s 1M foo.tar.gz
will create the three tar.gz archives:
foo-000.tar.gz
foo-001.tar.gz
foo-002.tar.gz
which may be unpacked as usual:
tar -xzvf foo-000.tar.gz
tar -xzvf foo-001.tar.gz
tar -xzvf foo-002.tar.gz
and the result is the same as unpacking the initial archive:
tar -xzvf foo.tar.gz
Creating the archives directly from the sources:
split-tar -e 5 -s 10M -c foo.tar.gz /home/foo
will create tar archives:
foo-000.tar.gz, .... foo-<n>.tar.gz
containing foo's home directory. A compression rate of 5 is assumed
for all files that are not already compressed.
Requirements:
BASH, GNU-tar, and GNU-find.
Version:
1.11 of 2006/06/02
Author:
Dr. Jürgen Vollmer <juergen.vollmer@informatik-vollmer.de>
If you find this software useful, I would be glad to receive a postcard
from you, showing the place where you're living.
Homepage:
http://www.informatik-vollmer.de/software/split-tar.html
Copyright:
(C) 2003 Dr. Jürgen Vollmer, Viktoriastrasse 15, D-76133 Karlsruhe, Germany
License:
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
END
exit
}
###############################################################################
DEFAULT_SIZE=1024 # kbyte
# we need the GNU utilities!
# which tar program to use, may be changed with the -t option
TAR=tar
# which find program to use, may be changed with the -f option
FIND=find
# file containing filenames
FILES=${TMP=/tmp}/$CMD.files.$$
# file containing the filename of a single file, too large for a single tar
FILE=${TMP=/tmp}/$CMD.file.$$
# where to untar the source tar file
TAR_DIR=${TMP=/tmp}/$CMD.dir.$$/
# file containing tar sources names
TAR_SOURCES=${TMP=/tmp}/$CMD.tarsources.$$
# remove temporary created files on exit
exit_trap()
{
if [ $OWN_TAR_SOURCES = NO ]
then
rm -fr $TAR_SOURCES
fi
rm -fr $FILES $FILE $TAR_DIR
}
trap exit_trap EXIT
# tar-file count
COUNT=0
# flag for selection about OWN_TAR_SOURCES
OWN_TAR_SOURCES=NO
# the GNU-tar option to preserve absolute filenames;
# used only in -c (create) mode, if the user gives an absolute
# path
# older tar versions may use:
# TAR_WITH_ABSOLUTE_NAMES=--absolute-paths
TAR_WITH_ABSOLUTE_NAMES=--absolute-names
# the GNU-tar option for storing files newer than DATE
# another possibility would be: --newer-mtime
TAR_NEWER=--newer
# argument of -e
COMPRESSION_RATE=
##############################################################################
# emit an error message and terminate
##############################################################################
error()
{
echo "$CMD: error: $*" 1>&2
exit 1
}
##############################################################################
# create tar files
##############################################################################
TAR_VERBOSE=
do_tar()
{
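files=$1
dest=$(printf "%s/%s-%03d%s" "$DEST_DIR" "$DEST_BASE" "$COUNT" "$SUFFIX")
touch "$dest" >/dev/null 2>&1 || error "can not create file $dest"
if [ $DO_CREATE = NO ]
then TD="-C $TAR_DIR"
else TD=
fi
# $TD, $CREATE_OPTS, $TAR_COMPRESS and $TAR_VERBOSE stay unquoted on purpose:
# each may be empty or hold several options
$TAR $TD $CREATE_OPTS $TAR_COMPRESS $TAR_VERBOSE \
-c -f "$dest" --files-from="$files" --no-recursion
COUNT=$((COUNT + 1))
# the subshell keeps "size" from clobbering the caller's variable of that name
( size=$(wc -c < "$dest");
printf "** create: %s: size: %9d (bytes)\n" "$dest" "$size" )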
}
##############################################################################
# emit all parts of a directory path name
##############################################################################
emit_dir_parts()
{
local ff="$*"
while [ ! \( -z "$ff" -o "$ff" = "." -o "$ff" = "/" \) ]
do
echo "yyyyy $ff/"
ff=`dirname "$ff"`
done
}
##############################################################################
# emit all parts of a directory path name (sorted)
##############################################################################
# LC_ALL=C to get the traditional sort order that uses native byte values.
emit_dir_parts_sorted()
{
local ff="$*"
(
while [ ! \( -z "$ff" -o "$ff" = "." -o "$ff" = "/" \) ]
do
echo "$ff"
ff=`dirname "$ff"`
done
) | (LC_ALL=C sort -u -s)
}
##############################################################################
# check options
##############################################################################
DO_CREATE=NO
CREATE_OPTS=
MAX_SIZE=$((DEFAULT_SIZE * 1024))
TAR_NEWER_ARG=
while getopts cC:e:f:hN:s:St:T:vV opt "$@"
do
case $opt in
c ) DO_CREATE=YES;;
C ) CREATE_OPTS="$CREATE_OPTS $OPTARG";;
e ) COMPRESSION_RATE=$OPTARG;;
f ) FIND=$OPTARG;;
N ) TAR_NEWER_ARG=$OPTARG;;
S ) DO_CREATE=NO;;
s ) [ x"$OPTARG" = x`expr "$OPTARG" : "\([0-9][0-9]*[kKmMgG]\)"` ] ||
error "-s expects a number followed by one of the characters K, M or G"
case $OPTARG in
*[kK] ) MAX_SIZE=$((${OPTARG%[kK]} * 1024));;
*[mM] ) MAX_SIZE=$((${OPTARG%[mM]} * 1024 * 1024));;
*[gG] ) MAX_SIZE=$((${OPTARG%[gG]} * 1024 * 1024 * 1024));;
* ) MAX_SIZE=$(($OPTARG * 1024));;
esac;;
t) TAR=$OPTARG;;
T) TAR_SOURCES=$OPTARG
OWN_TAR_SOURCES=YES
[ -s $TAR_SOURCES ] ||
error "-T expects a filename with files to get tar'ed in"
;;
v) TAR_VERBOSE=-v;;
V) echo "$CMD $VERSION"
exit
;;
h|*) usage;;
esac
done
shift $((OPTIND - 1))
# check correct version of TAR and FIND
if $TAR --version 2>&1 | grep "GNU tar" > /dev/null
then :
else echo "$CMD: sorry, $TAR is not GNU tar" 1>&2
exit 1;
fi
if $FIND --version 2>&1 | grep "GNU find" > /dev/null
then :
else echo "$CMD: sorry, $FIND is not GNU find" 1>&2
exit 1;
fi
if [ $DO_CREATE = YES ]
then
if [ $OWN_TAR_SOURCES = YES ]
then
[ $# -ge 1 ] || error "expected at least one more argument, for more information: $CMD -h"
TAR_FILE=$1; shift
else
[ $# -ge 2 ] || error "expected at least two arguments, for more information: $CMD -h"
TAR_FILE=$1; shift
while [ $# -ge 1 ]
do
printf '%s\n' "$1" >> $TAR_SOURCES ; shift
# more $TAR_SOURCES
done
fi
[ -z "$TAR_NEWER_ARG" ] && TAR_NEWER_ARG="1970-01-01 00:00:00"
TAR_DIR=
else
[ $# -eq 1 ] || error "expected one argument, for more information: $CMD -h"
TAR_FILE=$1
[ -f "$TAR_FILE" ] || error "could not read $TAR_FILE"
[ -z "$TAR_NEWER_ARG" ] || error "-N requires -c"
fi
# COMPRESS_CMD is used only to compute the estimated compressed size of a file;
# it is not used to actually do the compression. That is done via the
# TAR_COMPRESS tar command line option
case `basename $TAR_FILE` in
*.tar.bz2 ) SUFFIX=".tar.bz2"
COMPRESS_CMD="bzip2 --stdout"
TAR_COMPRESS=--bzip2;;
*.tar.gz ) SUFFIX=".tar.gz"
COMPRESS_CMD="gzip --stdout --no-name"
TAR_COMPRESS=--gzip;;
*.tgz ) SUFFIX=".tgz"
COMPRESS_CMD="gzip --stdout --no-name"
TAR_COMPRESS=--gzip;;
*.tar ) SUFFIX=".tar"
COMPRESS_CMD=
TAR_COMPRESS=;;
* ) error "unknown suffix of $TAR_FILE";;
esac
DEST_BASE=$(basename "$TAR_FILE" "$SUFFIX")
DEST_DIR=$(dirname "$TAR_FILE")
##############################################################################
# do the job
##############################################################################
# the size of the files to be tar'ed
cur_size=0
# remove stale output from an earlier run in the destination directory
rm -fr $FILES $FILE $TAR_DIR "$DEST_DIR/$DEST_BASE"-[0-9][0-9][0-9]"$SUFFIX"
# The line "xxxx xxxx" indicates: we have seen all files, tar the remaining
# files.
# A line "yyyyy <name>" indicates: a directory or other kind of file.
# We have to add directories in order to get the file permissions right.
(
if [ $DO_CREATE = NO ]
then
############################################################################
# unpack the source tar archive
############################################################################
mkdir -p $TAR_DIR || error "can not create $TAR_DIR"
$TAR -C $TAR_DIR -x $TAR_COMPRESS -f "$TAR_FILE" || error "can not un-tar $TAR_FILE"
$FIND $TAR_DIR \( -type f -o -type l \) -a -printf "%s %p\n"
else
############################################################################
# create new archive
# Note: In order to get file ownership correct, we have to tar all
# directories and parts of them found in any file path to be added
# to the resulting archive. If we don't do that, the created
# (intermediate) directories get the ownership of the extractor.
# Therefore we call tar with the --no-recursion option.
############################################################################
(
$TAR $TAR_WITH_ABSOLUTE_NAMES \
$CREATE_OPTS \
$TAR_NEWER "$TAR_NEWER_ARG" \
--files-from=$TAR_SOURCES \
-cv -f /dev/null
) |
while read -r f
do
if [ -f "$f" ]
then wc -c "$f"
if [ "${f%/*}" != "$last_dir" ]
then last_dir=`dirname "$f"`
emit_dir_parts "$last_dir"
fi
elif [ -d "$f" ]
then f=${f%/}
emit_dir_parts "$f"
last_dir="$f"
else echo "yyyyy $f"
fi
done
fi
) | ( LC_ALL=C sort -u -s -k2; echo "xxxx xxxx"; ) | ( sed -e "s|/$||" ) |
while read -r size name
do
case $size in
xxx* ) [ -f $FILES ] && do_tar $FILES
;;
yyy* ) # A directory (or other non-regular file): its name must be
# stored too, but it will be compressed as well.
# Add it in any case (if we have very bad luck and are
# saving a HUGE directory structure without any files,
# the resulting archive would be too large).
# "size" holds the marker "yyyyy" here, so only the name counts.
size=$((${#name} / 4))
cur_size=$((cur_size + size))
echo "$name" | sed -e"s|^$TAR_DIR||" >> $FILES
;;
* ) if [ x"$COMPRESS_CMD" != x ]
then # compute estimate of compressed file size
case "${name##*.}" in
gz | tgz | zip | bz2 | bzip | bzip2 ) ;; # already compressed
* ) if [ x"$COMPRESSION_RATE" = x ]
then size=`$COMPRESS_CMD "$name" | wc -c`
else size=$((size / $COMPRESSION_RATE))
fi
;;
esac
fi
size=$((size + ${#name} / 4))
# the file name must be stored too :-)
# but it will be compressed too
if [ $size -ge $MAX_SIZE ]
then echo "$name" | sed -e"s|^$TAR_DIR||" > $FILE
do_tar $FILE
elif [ $((size + cur_size)) -ge $MAX_SIZE ]
then do_tar $FILES
cur_size=$size
# start a new tar archive, so we need to emit all
# parts of the current file's path name (sorted)
cat /dev/null > $FILES
dir_names=$(emit_dir_parts_sorted "$(dirname "$name")")
if [ -n "$dir_names" ]; then
echo "$dir_names" | sed -e"s|^$TAR_DIR||" >> $FILES
fi
echo "$name" | sed -e"s|^$TAR_DIR||" >> $FILES
else cur_size=$((cur_size + size))
echo "$name" | sed -e"s|^$TAR_DIR||" >> $FILES
fi
esac
done
##############################################################################
# T h e E n d
##############################################################################
# $Log: split-tar,v $
# Revision 1.30 2010/01/13 17:57:52 vollmer
# typoo
#
# Revision 1.29 2006/07/10 07:17:28 vollmer
# typoo
#
# Revision 1.28 2006/06/02 09:26:07 vollmer
# typoo
#
# Revision 1.27 2006/04/24 14:11:46 vollmer
# typoo
#
# Revision 1.26 2006/02/23 20:01:46 vollmer
# Now all directories get the correct time stamp.
# Sorting works as expected, even if non- 7-bit-ASCII letters are used
# by using LC_ALL=C.
# Thanks to one who wants to be unnamed for sending me the bug-fixes.
#
# Revision 1.25 2005/04/27 13:48:34 vollmer
# Add all intermediate directories of a path explicitly in order to get
# file/directory ownership correctly.
# Thanks to Tom Battisto <tbattist-AT-mailaka.net> for the bug report.
#
# Revision 1.24 2005/04/26 07:56:49 vollmer
# Directory permissions are now set correctly when unpacking the archives.
# Thanks to Tom Battisto <tbattist-AT-mailaka.net> for the bug report.
#
# Revision 1.23 2005/04/08 20:52:32 vollmer
# Added option -T, thanks to Juergen Kainz <jkainz-AT-transflow.com>
#
# Revision 1.21 2005/04/08 20:14:22 vollmer
# added option -e
#
# Revision 1.20 2004/07/23 21:30:15 vollmer
# - added -f and -t options to specify a FIND and TAR program.
#
# Revision 1.18 2003/11/06 16:24:13 vollmer
# - options passed by -C to tar will be passed now to the do_tar routine
# - \ as part of file names are allowed now
# Thanks to A. R.
#
# Revision 1.17 2003/11/03 16:42:10 vollmer
# - Added option -N
# - The created tar files are stored now in the given directory and not
# in the current one.
# Thanks to Martin Walter <martin.walter-AT-erol.at>, who found that bug and
# asked for -N
#
# Revision 1.16 2003/10/31 13:01:51 vollmer
# Creating a split tar file from a directory now works for absolute
# path names of the directory
#
# Revision 1.15 2003/09/18 17:10:40 vollmer
# Filenames containing blanks are processed correctly if given on the
# command line.
# Thanks to Dr. Jim McCaa <jmccaa-AT-ucar.edu>, who gave me the fix.
#
# Revision 1.14 2003/08/18 07:28:16 vollmer
# The number following -s must now be followed by k, m or g
# (in order to make `expr' more portable)
#
# Revision 1.13 2003/08/12 07:56:38 vollmer
# added an Example
#
# Revision 1.12 2003/08/12 07:22:27 vollmer
# fixed a bug found by Willem Penninckx <willem.penninckx-AT-belgacom.net>:
# filenames may now contain blanks, * and other shell meta characters.
#
# Revision 1.11 2003/07/29 14:08:10 vollmer
# added the ability to create the tar archive directly from the sources
# (option -c)
#
# Revision 1.10 2003/07/29 13:01:50 vollmer
# -s accepts size specifier k,K,m,M,g or G
#
# Revision 1.9 2003/07/29 12:38:17 vollmer
# improved computation of the expected size
#
# Revision 1.8 2003/07/21 07:55:32 vollmer
# added --no-name option to the gzip COMPRESS_CMD
#
# Revision 1.7 2003/07/15 08:27:04 vollmer
# - added bzip2, thanks to Martin Deinhofer <martin.deinhofer-AT-gesig.at>
# - added length of file names when computing the size
#
# Revision 1.0 2003/07/02 14:57:17 vollmer
# Initial revision
##############################################################################
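
The main loop of the script groups files greedily: it adds estimated sizes until the next file would reach the limit, then starts a new archive, and gives any single file at or above the limit an archive of its own. The following is a minimal standalone sketch of that grouping idea only. The function name group_by_size and its output format are hypothetical and not part of split-tar, and it skips the directory entries, the file-name length term, and the compression estimate that the real script handles.

```shell
# Hypothetical helper, not part of split-tar: reads "size name" lines from
# stdin and prints one line per group, "NNN: name name ...".
# Names are assumed to contain no blanks (the real script writes list files).
group_by_size()
{
    local max_size=$1 cur_size=0 count=0 pending="" size name
    while read -r size name
    do
        if [ "$size" -ge "$max_size" ]
        then
            # a file at or above the limit gets a group of its own
            printf '%03d: %s\n' "$count" "$name"
            count=$((count + 1))
        elif [ $((cur_size + size)) -ge "$max_size" ]
        then
            # adding this file would reach the limit: flush, then start anew
            printf '%03d: %s\n' "$count" "$pending"
            count=$((count + 1))
            pending=$name
            cur_size=$size
        else
            pending="${pending:+$pending }$name"
            cur_size=$((cur_size + size))
        fi
    done
    # flush whatever is left after the last input line
    if [ -n "$pending" ]
    then printf '%03d: %s\n' "$count" "$pending"
    fi
    return 0
}

# Example call:
#   printf '400 a\n500 b\n300 c\n2000 big\n100 d\n' | group_by_size 1000
```

With a limit of 1000, the example call puts a and b into group 000, the oversized file big alone into group 001, and c and d into group 002. As in the script, a pending group stays open while an oversized file is emitted, so group numbers follow the order in which groups are written, not the order in which they were started.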