Split tar

Материал из noname.com.ua
Перейти к навигацииПерейти к поиску


#!/bin/bash
# splits a large tar file into a set of smaller ones
#
# Author: Dr. Jьrgen Vollmer <juergen.vollmer@informatik-vollmer.de>
# Copyright (C) 2003 Dr. Jьrgen Vollmer, Karlsruhe, Germany
# For usage and license agreement, see below (function usage)
#
# Id: split-tar,v 1.30 2010/01/13 17:57:52 vollmer Exp $
# Version: 1.11 of 2006/06/02

#set -x

CMD=`basename $0`
VERSION="1.11"

###############################################################################

usage()
{
    cat <<END
usage: $CMD [options] tarfile.<suffix> (filename|directory)...

  Splits a large tar archive into a set of smaller ones.
  Creates a set of tar archives direct from the files and directories.

  <suffix> is one of tar, tar.gz, tgz, or tar.bz2
  Files are written to tarfile-???.<suffix> into the current working directory,
  where ??? are three digits.
  Note: since a TAR file contains tar-specific administration information
        the resulting tar files may be larger that the specified size.
        For computation only the file size of the sources are used.

  Note: split-tar relies on the GNU version of "tar", "find" and "bash".
  Note: split-tar is not able to read the filenames from stdin.
        Use -T instead.

Options:
  -c       : Create the tar archives from [filename|directory...].
  -C opts  : Pass opts to tar, when creating the tarfile with -c
	     the compression options -z (gzip) or -j (bizp2) are
             added by default, if the <suffix> indicates it.
  -e rate  : To compute the set of files to be put into a compressed
             tarfile, one has to estimate compressed size of each
	     uncompressed source file. To do this a compression program
             indicated by the  tarfile.<suffix> is called (e.g. gzip).
             This may be quite time consiming.

             This overhead my be avoided by giving an "compression rate"
             using the -e option. The real file-size of an an uncompressed
             file is divided by that <rate>. This may result in
             too large or too small result tarfiles. So one has to to some
             trial and error to get the <rate> value right.

             The <rate> is positive number.
  -f prog  : Use prog as "find" program, e.g.
               -f /usr/local/bin/gfind
  -N date  : Only store files newer than <date>.
             Typical format: YYYY-MM-DD or 'YYYY-MM-DD HH:MM:SS' or
             if <date> begins with \`/' or \`.', it is taken to be the name
             of a file whose last-modified time specifies the date.
	     -N passes its argument as tar option \`--newer=<date>'
             (this may be changed in the source of this script, see
              variable TAR_NEWER).
	     -N is valid only if -c is given.
  -h       : Help
  -s sizeK : Maximum size of one tar file in Kilo bytes, default ${DEFAULT_SIZE}
  -s sizeM : Size given in Mega Byte
  -s sizeG : Size given in Giga Byte
  -S       : Split the existing tar archive tarfile.<suffix>
             no [filename|directory...] may be given
             that's the default
  -t prog  : Use prog as "tar" program, e.g.
               -t /usr/local/bin/gtar
  -T file  : Read names to create the archive from <file>
  -v       : Verbose (verbose tar messages)
  -V       : Version.

Example:
  Splitting an already existing archive:
    If foo.tar.gz has a size of 3 M bytes, the command
       split-tar -s 1M foo.tar.gz
    will create the three tar.gz archives:
       foo-000.tar.gz
       foo-001.tar.gz
       foo-002.tar.gz
    which may be unpacked as usual:
       tar -xzvf foo-000.tar.gz
       tar -xzvf foo-001.tar.gz
       tar -xzvf foo-002.tar.gz
    and the the result would be the same as if one unpacks the initial archive
       tar -xzvf foo.tar.gz

  Creating the archives directly from the sources:
       split-tar -e 5 -s 10M -c foo.tar.gz /home/foo
  will create tar archives:
       foo-000.tar.gz, ....  foo-<n>.tar.gz
  containing foo's home directory. A compression rate of 5 is assumed
  for all not already compressed files.

Requirements:
  BASH, GNU-tar, and GNU-find.

Version:
  1.11 of 2006/06/02

Author:
  Dr. Jьrgen Vollmer <juergen.vollmer@informatik-vollmer.de>
  If you find this software useful, I would be glad to receive a postcard
  from you, showing the place where you're living.

Homepage:
  http://www.informatik-vollmer.de/software/split-tar.html

Copyright:
  (C) 2003 Dr. Jьrgen Vollmer, Viktoriastrasse 15, D-76133 Karlsruhe, Germany

License:
  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
END
exit
}

###############################################################################

DEFAULT_SIZE=1024 # kbyte

# we need the GNU utilities!
# which tar program to use, may be changed with the -t option
TAR=tar

# which find program to use, may be changed with the -t option
FIND=find

# file containing filenames
FILES=${TMP=/tmp}/$CMD.files.$$

# file containing the filename of a single file, too large for a single tar
FILE=${TMP=/tmp}/$CMD.file.$$

# where to untar the source tar file
TAR_DIR=${TMP=/tmp}/$CMD.dir.$$/

# file containing tar sources names
TAR_SOURCES=${TMP=/tmp}/$CMD.tarsources.$$

# remove temporary created files on exit
exit_trap()
{
  if [ $OWN_TAR_SOURCES = NO ]
  then
    rm -fr $TAR_SOURCES
  fi
  rm -fr $FILES $FILE $TAR_DIR
}
trap exit_trap EXIT

# tar-file count
COUNT=0

# flag for selection about OWN_TAR_SOURCES
OWN_TAR_SOURCES=NO

# the GNU-tar option to conserve absolute filenames
# used only if for the -c (create) mode, if the user gives an absolute
# path
# older tar versions may use:
#  TAR_WITH_ABSOLUTE_NAMES=--absolute-paths
TAR_WITH_ABSOLUTE_NAMES=--absolute-names

# the GNU-tar option for storing files newer than DATE
# another possibility would be: --newer-mtime
TAR_NEWER=--newer

# argument of -e
COMPRESSION_RATE=

##############################################################################
# emit an error message and terminate
##############################################################################

error()
{
  echo "$CMD: error $*" 1>&2
  exit 1
}

##############################################################################
# create tar files
##############################################################################

TAR_VERBOSE=
do_tar()
{
  files=$1
  dest=`printf "%s/%s-%03d%s" $DEST_DIR $DEST_BASE $COUNT $SUFFIX`
  touch $dest >/dev/null 2>&1 || error "can not create file $dest"
  if [ $DO_CREATE = NO ]
  then TD="-C $TAR_DIR"
  else TD=
  fi
  $TAR $TD $CREATE_OPTS $TAR_COMPRESS $TAR_VERBOSE \
          -c -f $dest --files-from=$files --no-recursion
  COUNT=$((COUNT + 1))
  ( size=`cat $dest | wc -c`;
    printf "** create: %s: size: %9d (bytes)\n" $dest $size )
}

##############################################################################
# emit all parts of a directory path name
##############################################################################

emit_dir_parts()
{
  local ff="$*"
  while [ ! \( -z "$ff" -o "$ff" = "." -o "$ff" = "/" \) ]
  do
   echo "yyyyy $ff/"
   ff=`dirname "$ff"`
  done
}

##############################################################################
# emit all parts of a directory path name (sorted)
##############################################################################

# LC_ALL=C to get the traditional sort order that uses native byte values.

emit_dir_parts_sorted()
{
  local ff="$*"
  (
    while [ ! \( -z "$ff" -o "$ff" = "." -o "$ff" = "/" \) ]
    do
     echo "$ff"
     ff=`dirname "$ff"`
    done
  ) | (LC_ALL=C sort -u -s)
}

##############################################################################
# check options
##############################################################################

DO_CREATE=NO
CREATE_OPTS=
MAX_SIZE=$((DEFAULT_SIZE * 1024))
TAR_NEWER_ARG=
while getopts cC:e:f:N:hvs:St:T:vV opt "$@"
do
  case $opt in
    c  ) DO_CREATE=YES;;
    C  ) CREATE_OPTS="$CREATE_OPTS $OPTARG";;
    e  ) COMPRESSION_RATE=$OPTARG;;
    f  ) FIND=$OPTARG;;
    N  ) TAR_NEWER_ARG=$OPTARG;;
    S  ) DO_CREATE=NO;;
    s  ) [ x"$OPTARG" = x`expr "$OPTARG" : "\([0-9]*[kKmMgG]\)"` ] ||
	    error "-s expects a number followed by one optional character of KMG"
	 case $OPTARG in
          *[kK] ) MAX_SIZE=$((${OPTARG%[kK]} * 1024));;
          *[mM] ) MAX_SIZE=$((${OPTARG%[mM]} * 1024 * 1024));;
          *[gG] ) MAX_SIZE=$((${OPTARG%[gG]} * 1024 * 1024 * 1024));;
          *     ) MAX_SIZE=$(($OPTARG        * 1024));;
         esac;;
    t)   TAR=$OPTARG;;
    T)   TAR_SOURCES=$OPTARG
         OWN_TAR_SOURCES=YES
         [ -s $TAR_SOURCES ] ||
                 error "-T expects a filename with files to get tar'ed in"
         ;;
    v)   TAR_VERBOSE=-v;;
    V)   echo "$CMD $VERSION"
         exit
         ;;
    h|*) usage;;
  esac
done
shift `expr $OPTIND - 1`

# check correct version of TAR and FIND
if  $TAR --version 2>&1 | grep "GNU tar" > /dev/null
then :
else echo "$CMD: sorry $TAR is no GNU tar"
     exit 1;
fi

if  $FIND --version 2>&1 | grep "GNU find" > /dev/null
then :
else echo "$CMD: sorry $FIND is no GNU find"
     exit 1;
fi

if [ $DO_CREATE == YES ]
then
  if [ $OWN_TAR_SOURCES = YES ]
  then
    [ $# -ge 1 ] || error "expected at least one more argument, for more information: $CMD -h"
    TAR_FILE=$1; shift
  else
    [ $# -ge 2 ] || error "expected at least two arguments, for more information: $CMD -h"
    TAR_FILE=$1; shift
    while [ $# -ge 1 ]
    do
      echo $1 >> $TAR_SOURCES ; shift
      # more $TAR_SOURCES
    done
  fi
  [ -z "$TAR_NEWER_ARG" ] && TAR_NEWER_ARG="1970-01-01 00:00:00"
  TAR_DIR=
else
  [ $# -eq 1 ] || error "expected one argument, for more information: $CMD -h"
  TAR_FILE=$1
  [ -f $TAR_FILE ]        || error "could not read $TAR_FILE"
  [ -z "$TAR_NEWER_ARG" ] || error "-N requires -c"
fi

# COMPRESS_CMD is used only to compute the estimated compressed size fo a file
# it is not used to actually do the compression. That is done via the
# TAR_COMPRESS tar command line option
case `basename $TAR_FILE` in
   *.tar.bz2 ) SUFFIX=".tar.bz2"
	       COMPRESS_CMD="bzip2 --stdout"
               TAR_COMPRESS=--bzip2;;
   *.tar.gz  ) SUFFIX=".tar.gz"
	       COMPRESS_CMD="gzip --stdout --no-name"
	       TAR_COMPRESS=--gzip;;
   *.tgz     ) SUFFIX=".tgz"
	       COMPRESS_CMD="gzip --stdout --no-name"
	       TAR_COMPRESS=--gzip;;
   *.tar     ) SUFFIX=".tar"
	       COMPRESS_CMD=
	       TAR_COMPRESS=;;
   *         ) error "unknown suffix of $TAR_FILE";;
esac
DEST_BASE=`basename $TAR_FILE $SUFFIX`
DEST_DIR=`dirname $TAR_FILE`

##############################################################################
# do the job
##############################################################################

# the size of the files to be tar'ed
cur_size=0

rm -fr $FILES $FILE $TAR_DIR $DEST_BASE-[0-9][0-9][0-9]$SUFFIX

# The line with "xxxx xxxx" indicate: we have seen all files, tar the remaining
# files
# The line with "yyyy <name>" indicate: a directory or other kind of file.
# We have to add directories in order to get the file permissions right.
(
if [ $DO_CREATE = NO ]
then
  ############################################################################
  # unpack the source tar archive
  ############################################################################

  mkdir -p $TAR_DIR || error "can not create $TAR_DIR"
  $TAR -C $TAR_DIR -x $TAR_COMPRESS -f $TAR_FILE || error "can not un-tar $TAR_FILE"

  $FIND $TAR_DIR \( -type f -o -type l \) -a -printf "%s %p\n"
else
  ############################################################################
  # create new archive
  # Note: In order to get file-ownership correct, we have to tar all
  #       all directories and parts of it found in any file-path to be added
  #       in the resulting archive. If we don't do that, we get for
  #       created (intermediate) directories the ownership of the
  #       extractor (e.g.).
  #       Therefore we call tar with the --no-recursion option.
  ############################################################################
  (
     $TAR $TAR_WITH_ABSOLUTE_NAMES       \
          $CREATE_OPTS                   \
          $TAR_NEWER "$TAR_NEWER_ARG"    \
          --files-from=$TAR_SOURCES      \
         -cv -f /dev/null
  ) |
  while read -r f
  do
    if   [ -f "$f" ]
    then wc -c "$f"
	 if [ "${f%/*}" != "$last_dir" ]
	 then last_dir=`dirname "$f"`
              emit_dir_parts "$last_dir"
	 fi
    elif [ -d "$f" ]
    then f=${f%/}
	 emit_dir_parts "$f"
         last_dir="$f"
    else echo "yyyyy $f"
    fi
  done
fi
) | ( LC_ALL=C sort -u -s -k2; echo "xxxx xxxx"; ) | ( sed -e "s|/$||" ) |
while read -r size name
do
   case $size in
    xxx* ) [ -f $FILES ] && do_tar $FILES
	   ;;
    yyy* ) # The file name must be stored too :-)
	   # but it will be compressed too
	   # Add it in any case (ok if we have very bad luck and we're
	   # saving a HUGE directory structure without any files
	   # the resulting archive would be too large).
	   size=$((size + ${#name} / 4))
	   cur_size=$((cur_size + size))
	   echo "$name" | sed -e"s|^$TAR_DIR||" >> $FILES
	   ;;
    *    ) if   [ x"$COMPRESS_CMD" != x ]
	   then # compute estimate of compressed file size
	        case "${name##*.}" in
	          gz | zip | bzip | bzip2 ) ;; # already compressed
	          *  ) if   [ x"$COMPRESSION_RATE" = x ]
		       then size=`$COMPRESS_CMD "$name" | wc -c`
		       else size=$((size / $COMPRESSION_RATE))
		       fi
		       ;;
	        esac
	   fi
	   size=$((size + ${#name} / 4))
		# the file name must be stored too :-)
	        # but it will be compressed too
	   if   [ $size -ge $MAX_SIZE ]
	   then echo "$name" | sed -e"s|^$TAR_DIR||" > $FILE
	        do_tar $FILE
	   elif [ $((size + cur_size)) -ge $MAX_SIZE ]
	   then do_tar $FILES
	        cur_size=$size
		# start new tar archive, so we need to emit all
		# parts of the current files pathname (sorted)
		cat /dev/null > $FILES
		dir_names=$(emit_dir_parts_sorted "`dirname "$name"`")
		if [ -n "$dir_names" ]; then
		    echo "$dir_names" | sed -e"s|^$TAR_DIR||" >> $FILES
		fi
		echo "$name" | sed -e"s|^$TAR_DIR||" >> $FILES
	   else cur_size=$((cur_size + size))
	        echo "$name" | sed -e"s|^$TAR_DIR||" >> $FILES
	   fi
    esac
done

##############################################################################
#                         T h e     E n d
##############################################################################

# Log: split-tar,v $
# Revision 1.30  2010/01/13 17:57:52  vollmer
# typoo
#
# Revision 1.29  2006/07/10 07:17:28  vollmer
# typoo
#
# Revision 1.28  2006/06/02 09:26:07  vollmer
# typoo
#
# Revision 1.27  2006/04/24 14:11:46  vollmer
# typoo
#
# Revision 1.26  2006/02/23 20:01:46  vollmer
# Now all directories get the correct time stamp.
# Sorting works as expected, even if non- 7-bit-ASCII letters are used
# by using LC_ALL=C.
# Thanks to one who wants to be unnamed for sending me the bug-fixes.
#
# Revision 1.25  2005/04/27 13:48:34  vollmer
# Add all intermediate directories of a path explicitly in order to get
# file/directory ownership correctly.
# Thanks to Tom Battisto <tbattist-AT-mailaka.net> for the bug report.
#
# Revision 1.24  2005/04/26 07:56:49  vollmer
# Directory persmissions are set now correctly when unpacking the archives.
# Thanks to Tom Battisto <tbattist-AT-mailaka.net> for the bug report.
#
# Revision 1.23  2005/04/08 20:52:32  vollmer
# Added option -T, thanks to Juergen Kainz <jkainz-AT-transflow.com>
#
# Revision 1.21  2005/04/08 20:14:22  vollmer
# added option -e
#
# Revision 1.20  2004/07/23 21:30:15  vollmer
# - added -f and -t options to specify a FIND and TAR program.
#
# Revision 1.18  2003/11/06 16:24:13  vollmer
# - options passed by -C to tar will be passed now to the do_tar routine
# - \ as part of file names are allowed now
#   Thanks to A. R.
#
# Revision 1.17  2003/11/03 16:42:10  vollmer
# - Added option -N
# - The created tar files are stored now in the given directory and not
#   in the current one.
# Thanks to Martin Walter <martin.walter-AT-erol.at>, who found that bug and
# asked for -N
#
# Revision 1.16  2003/10/31 13:01:51  vollmer
# Creating a splitted tar file from directory works now for absolute
# path names of the directory
#
# Revision 1.15  2003/09/18 17:10:40  vollmer
# Filenames containing blanks are processed correctly if given on the
# command line.
# Thanks to Dr. Jim McCaa <jmccaa-AT-ucar.edu>, who gave me the fix.
#
# Revision 1.14  2003/08/18 07:28:16  vollmer
# The number followed -s must be followed now by k m or g
# (in order to make `expr' more portable)
#
# Revision 1.13  2003/08/12 07:56:38  vollmer
# added an Example
#
# Revision 1.12  2003/08/12 07:22:27  vollmer
# fixed a bug found by Willem Penninckx <willem.penninckx-AT-belgacom.net>:
# filenames may contain now blanks and * and other shell emta charcters.
#
# Revision 1.11  2003/07/29 14:08:10  vollmer
# added the aibility to create the tar archive directly from the sources
# (option -c)
#
# Revision 1.10  2003/07/29 13:01:50  vollmer
# -s accepts size specifier k,K,m,M,g or G
#
# Revision 1.9  2003/07/29 12:38:17  vollmer
# improved computing expected size computation
#
# Revision 1.8  2003/07/21 07:55:32  vollmer
# added --no-name option to the gzip COMPRESS_CMD
#
# Revision 1.7  2003/07/15 08:27:04  vollmer
# - added bzip2, thanks to Martin Deinhofer <martin.deinhofer-AT-gesig.at>
# - added length of file names when computing the size
#
# Revision 1.0  2003/07/02 14:57:17 vollmer
# Initial revision
##############################################################################