Introduction

Bred2 is a slight update to bred giving
slightly improved service.  The changes are given below.

1) The sorting routines have been changed.

2) The check for runaway false data will now only signal
false data. It should not now signal when the data
is contracted by more than twice the block size by
the repetition filter, and as a consequence
require the control -r in bexp be adjusted to cope.

3) There is now a work limit incorporated which causes
   an abort when too many very deep comparisons are 
required.

4)  The space needed for sorting is reduced from 8 times
blocksize to 5 times blocksize (bytes).

These should give improvements in ease of use and
speed.  The speed is increased by 50% for large
files, although small files are sometimes slower
due to the two character radix sort overhead.

Items 2&3
  The routine bexp incorporates a check to prevent
data errors causing a runaway which consumes too
much file space.  Unfortunately bred could, when
contracting a large number of repeats, cause the
expansion to exceed twice the block size.  bred2
contains a test to keep within this limit. This
causes slightly more subdivision into smaller
blocks than otherwise would be the case.  This
takes effect only when the number of repeats exceeds
twice the block size. Thus bred may, when the conditions
are right, effectively use a block size three times
the given limit. This will enhance the compression
within the given constraints.

  There is now a detection to test when the sorting
work exceeds a limit.  There is a crude correction for this
(unlikely ?) event. The file is merely read again and
the block size reduced by four until the work
criteron is satisfied. The reduction is printed
if the silence parameter is an odd multiple of 2.
Thus for these special cases the compression achieved
will be reduced.

The work parameter can now be set by -wx to x
and if the work done sorting at depths greater
than 64, exceeds this value, the sorting is aborted
and effective block size for this file reduced by a
factor four. The default is 4000000.


Item 1
There are two major changes to the sorting algorithm.

Radix 2^16 is now used for the preliminary sort.
This allows repeats to be dealt with as detailed
 in report 124. The repeat filter is kept to
be compatible with bexp and to ease the problem of 
inserting enough zeroes in front of the routine to
be sure no comparison goes further.

Using two character radix sorting allows about
one half of the subsequent sorting to be eliminated.
This improvement reduces the need for
packing four characters per word and setting an
index in the word after it has been sorted to
enhance the later sort.

There are files when using the extra word does
enhance performance.  In particular a file made
of a duplicated random file, would be very slow
under the new bred2 but quite fast with the old
bred.  It does not seem worth the average slow
down, particularly as we still need to cope with
abcabcabc etc.

Instead of using radix 2^16 we can split the file
into groups starting with x,<x or x,x or x,>x.
Then the same system can be used but we only need
about 4*256 space for the radix counts.  This
method is often better than the 2^16 one,
particularly for small files where the overheads
are less. However this has not been implemented here.

The method used is given below. To simplify the
description it is given for forward sorting i.e.
suffix sorting.

At the end of the radix sort, the counter corresponding
to group starting b,c gives the start of the pointers
to group b,c

Now assume we have completed the sort for all
groups starting with s. We now make a copy of the counters
indexed by x,s for all x >= s

 We now scan the pointer list
for all items starting with s, inspecting the character=y
before s.  If y>s we set the pointer at copy counter y,s to 
point to y and step that counter. Because we are scanning
in sorted order on s, this is equivalent to a radix sort
on groups y,s where y>s.

Now we start with the groups starting with the next letter
to s say t. Consider t,x if x<t then the group has already
been sorted.  If t=x we can do the sort either with
a quicksort else by scanning as described in report
124, if t>x we can sort by quicksort.

We now set the pointers to z,t where z>t, and continue.

A similar method works if we split into s,>s ; s,s ; s,<s ;.
The quick sort can be by word 4characters at a time and we
can use the index adjustment. These precautions do pay off
for some files, but at the expense of an extra 3 bytes.

Subroutine qsw is a quicksort type of routine, which sorts
a list of pointers to all the prefixes in the file so the
prefixes are in backwards dictionary order. There is also a
work limit parameter.

Subroutine qsww is a sort on the prefixes in a character 
string rch. It generates a list of pointers, and uses 
the algorithm above and qsw. It uses the character beyond
the end of the string, and enough characters before the
string to ensure all the comparisons are valid.

UPDATED 2 Sept 1996 Now has a correction on line 416 of "main"
rch-=40 has been corrected to rch+=40 The error was found by
Andrew Kadach

...........................................................
...........................................................

/* PROGRAM for COMPRESSION */
/* bred2 a somewhat faster bred */

#include <stdio.h>
#define VS(v,op,n) {long nZ=0 ; while (nZ<n) \
(v)[nZ] op, nZ++ ; }


/* A simple sort routine used by huffgen
positive integers v pointers to relevant v
values and number of elements in pt */
 
void shell(long * v,long * pt,long n) {
long p,q,x,k,pk ;
for (k=(n+3)/5 ; k>0 ; k=(k+1)/3)
  for (p=0 ; p<n-k ; p++ )
    if ( (x=v[pt[p+k]]) < v[pt[p]] )  {
      q=p-k ; pk=pt[p+k] ; pt[p+k]=pt[p] ;
      while ( q>=0 && v[pt[q]]>x )
        pt[q+k]=pt[q], q-=k ;
      pt[q+k]=pk ;                    } 
                                    }

/* generates a huffman type coding table
by giving length and codes for in len and f
where f contains the frequencies on entry */

long huffgen(long * f,long *len,long n)
{long j,m,mlim=0,u=0 ,
p,q,x,y,z,bits,r=22,
*pt, *type, cts[50] , *sum ;
pt = ( long * ) malloc( 4*n+4) ;

for (j=n-1 ; j>=0 ; j-- )  {
   if (x= f[j] ) pt[mlim++]=j, u+=x ;
   len[j]=0 ;           }

type = ( long * ) malloc(8*mlim) ;
sum=type+mlim ;
shell(f,pt,mlim) ;
js :
for (j=0 ; j<50 ; j++) cts[j]=0 ;
for (m=0 ; m<mlim ; m++) sum[m]=0x7000000 ;
m=0 ; p=0 ; q=0 ;

while (q<mlim-1)
  if 
( (m+1<mlim) && ( (x=f[pt[m+1]])<=sum[p] ) )
     sum[q]=x+f[pt[m]],
     type[q++]=0,
     m+=2 ;
  else if 
( (m<mlim) && (sum[p+1]< f[pt[m]]) || (m==mlim) )
     sum[q]=sum[p]+sum[p+1],
     type[q++]=2,
     p+=2 ;
        else sum[q]=sum[p++]+f[pt[m++]],
             type[q++]=1 ;

bits=0 ; 
for (j=0 ; j<mlim-1 ; j++) bits+=sum[j] ;
sum[q-1]=0 ;
while (q) { q-- ;
 x=type[q] ; y=sum[q]+1 ;
 if (x==2) sum[--p]=y, sum[--p]=y ;
 else if (x==1)
      len[pt[--m]]=y, sum[--p]=y, cts[y]++ ;
      else len[pt[--m]]=y, 
           len[pt[--m]]=y, cts[y]+=2 ;
         }
y=len[pt[0]]-23 ; if (y>0) {
x=u>>(r--) ; x++ ;
z=0 ; q=0 ;
while ( z>=0) { y=f[pt[q]] ; f[pt[q++]]=x ;
  z+=x-y ; 
              }
f[pt[q-1]] -= z ; goto js ; }

x=0 ;
for (m=49 ; m>=0 ; m--)
   y=cts[m], cts[m]=x, 
   x+=y, x>>=1 ;
for (m=n-1 ; m>=0 ; m--)
   if (len[m])
   f[m]=cts[len[m]]++ ;
free( (char *) pt) ;
free( (char *) type) ;
return bits ;
}

/* A Sort routine that generates a pointer list
from pt to e, which points to rch in backwards
dictionary order. Work is decreased by the sorting
done at DEPTH or greater by the subroutine qsw
and given as the result. If -ve the result is
forced zero and the sorting aborted. The sorting
result assumes that rch[0] is preceded by a 
character lower in value than all others. */

long qsww( long * pt, long * e, unsigned char * rch,
   long work)
{{long tft[65540],x,y,z,nlim=e-pt+1,
 *ftt,mm,p,*s,m, n, startoffset=0,
 ftm[257], *ft ;
for (m=0 ; m<65540 ; m++ ) tft[m]=0 ;
rch[nlim]=0 ;
ft=tft+2 ; x=0 ;
for ( m=0 ; m<nlim ; m++ )
  ft[rch[m]<<8^x]++,
  x=rch[m] ;
x=0 ;
for ( m=0 ; m<65536 ; m++ )
x=ft[m]+=x ;
ft-- ;
x=0 ;
for (m=0 ; m<nlim ; m++ )
  y=ft[(z=rch[m])<<8^x]++,
  pt[y]=m,
  x=z ;
ft-- ;
 while (rch[startoffset]==0) startoffset++ ;
ft[0]=startoffset ;

for (mm=0,m=0 ; m<256 ; mm+=256,m++)
 if (ft[mm]^ft[mm+256])          {
  for ( n=m ; n<256 ; n++) ftm[n]=ft[m^n<<8] ;
  ftt=ft+mm ;
  for (p=m+1 ; p<256 ; p++ )
    if (ftt[p]+1<ftt[p+1])
    work=qsw(pt+ftt[p],pt+ftt[p+1]-1,rch-2,work) ;
 if ( ftt[m+1]-ftt[m] < 10+(ftt[256]-ftt[0]>>3) ) {
   if (ftt[m]+1<ftt[m+1])
        work=qsw(pt+ftt[m],pt+ftt[m+1]-1,rch-2,work) ;
        goto skip ;
                            }
   { long *t=pt+ftt[m] ; s=pt+ftt[0] ;
    rch[nlim]=255-m ;
    while (s!=t) { if (rch[*s+1]==m)
             *t= *s+1, t++ ; s++ ; }

    s=pt+ftt[256]-1 ; t=pt+ftt[m+1]-1 ;
    while (s!=t) { if (rch[*s+1]==m) 
                  *t= *s+1, t-- ; s-- ; }
    rch[nlim]=0 ;
          }
skip :
 if (m==0)
   ftm[rch[startoffset]]++ ;
 for ( s=pt+ftt[0] ; s<pt+ftt[256] ; s++ ) {
   z= rch[*s+1] ;
   if (z>m)
   pt[ftm[z]++]= *s+1 ;                    }
                                  }
return work ;
}}


/* a quick sort routine using a given pointer
list starting at s and ending at e. The
pointers scan puts rch character string
in order ( backwards dictionary ). Work
is decreased by the sorting work at a depth
greater than DEPTH, default 64 and returned
as the result. If it becomes -ve the sort 
is stopped and the result is zero. The sorting
will use characters below rch[0] if necessary
and there should be enough to allow this.  */

long qsw(long *s,long *e,
      unsigned char *ch,long work) {
long list[99],zref,zs,ze,*ss,*ee,dd=0 ;
unsigned char *chd=ch ;
#define DEPTH 64
goto jss ;

jred : if (dd==0) return work ;
chd=(unsigned char *) list[--dd] ;
e=(long *) list[--dd] ;
s=(long * ) list[--dd] ;
jss :
if (s+1>=e) {unsigned char *chdd=chd ;
  if (e==s) goto jred ;
  while (chdd[*s]==chdd[*e]) chdd-- ;
  if (chdd[*s]>chdd[*e])
  zref= *s, *s= *e, *e=zref ;
  zref=ch-chdd-DEPTH ;
  if (zref>0) { work-=zref ;
  if (work<0) return 0 ; }
  goto jred ; }

if ((zs=chd[*s])==(ze=chd[*e])) {
  ss=s+1 ;
  while ( (chd[*ss]==ze) && (ss!=e) ) ss++ ;
  if (ss==e)   {
    chd-- ; zref=ch-chd-DEPTH ;
    if (zref>0)               {
       work-=e-s ;
       if (work<0) return 0 ; }
    goto jss ; }
  else zs=chd[*ss] ; }
zref=(zs+ze)>>1 ; ss=s ; ee=e ; goto js2 ;

jss1 :
zs= *ss ; *ss++ = *ee ; *ee-- =zs ;
js2 :
while (zref>=chd[*ss]) ss++ ;
while (zref<chd[*ee]) ee-- ;
if (ss<ee) goto jss1 ;

ss-- ; ee++ ;
if (s-ss)
  if (e-ee)
     if (ss-s>e-ee) {
        list[dd++]=(int) s ;
        list[dd++]=(int) ss ;
        list[dd++]=( long ) chd ;
        s=ee ; goto jss ; }
     else {
        list[dd++]=(int) ee ;
        list[dd++]=(int) e ;
        list[dd++]=( long ) chd ;
        e=ss ; goto jss ; }
  else { e=ss ; goto jss ; }
else if (e-ee) { s=ee ; goto jss ; }
      else goto jred ;             }



/*  MAIN PRORAM */

long main(long argc, char * argv[]) {
char chl[50], *error ;
unsigned char *rch ;
long  x,y,z,z1, 
*pt,
m,n,p,q, 
argcc=1,
  /* control variables */
silent=1,
keep=1,
errct=0, 
errlim=1,
blocksize=100000 ,
work=40000000,
/* counts */
outct, 
ctch, 
dct ;
FILE *fpr, *fpw, *fopen() ;

/* FILE and OPTION handling */
goto jss1 ;

/* error returns */
err   : close(fpw) ; unlink(fpw) ;
err1  : close(fpr) ;
err2  :
fprintf(stderr,"bred fails, file %s\n%s\n",
   argv[argcc],error) ; 
if (errct>=errlim) return errct ;  

jss  : argcc++ ;
jss1 :
while (argcc<argc) {
x=0 ;
while ( chl[x]=argv[argcc][x] ) x++ ;
if (chl[0]=='-')     {long zn=2 ;
   x=0 ;
   while (chl[zn]) x=10*x + chl[zn++]-'0' ;
   switch ( argv[argcc][1] )       {
case 's' : silent=x ; break ;
case 'k' : keep=x ; break ;
case 'M' : blocksize=1000000*x ; break ;
case 'K' : blocksize=1000*x ; break ;
case 'e' : errlim=x ; break ;
case 'w' : work=x ; break ;        }
goto jss ;  }

if ((fpr=fopen(argv[argcc],"r"))==NULL)
   { error= "cannot open to read" ;
   goto err2  ; }
chl[x++]='+' ; chl[x++]=0 ;
if ((fpw=fopen(chl,"w"))== NULL)
   { error="cannot open to write" ;
   goto err1 ; }
if (blocksize%1000==0)         {
   rch = (unsigned char *) malloc(blocksize+40) ;
   pt= ( long *) malloc(blocksize*4) ;
   if (pt==0)   { error= "not enough space from malloc" ;
                 goto err ; }
   blocksize-= 200 ;
   rch+=40 ;   /* Andrew Kadach pointed out an error in the 
original which had rch-=40  ************************* */
   VS(rch-40,=0,41)
                          }

{{{
long  mlim, bsize=blocksize ;
jrestart :
ctch=0 ; outct=0 ; dct=0 ;

/* read file and reduce simple repeats
limit number of repeats to
twice blocksize */

jstart :
z= -512; m=1 ; y=0 ; x=2*blocksize-1000 ;

while (m<bsize)    {
z1=z ;
z=getc(fpr) ; if (z==EOF) break ;
rch[m++]=z ; ctch++ ;
if (z^z1) continue ;

while ( (z=getc(fpr)) ==z1 ){
  if (x-- < 0) break ;
  y++ ;                     }
  ctch+=y+1 ;

while (y) {y-- ;
       rch[m++]=z1^y&1 ; y=y>>1 ; }
if ((z^z1)<3) rch[m++]=z1^2 ;
if (z^EOF) rch[m++]=z ; 
if (x<0) break ;     }

/* sort by pointer */
mlim=m ;
x=qsww(pt,pt+mlim-1,rch,work) ;
if (x==0) { rewind(fpr) ; rewind(fpw) ;
   if (silent&2)
   printf(" \nreducing blocksize to %d",bsize/4) ;
    bsize>>=2 ; goto jrestart ; }

/* For the 'next' or predicted values
 construct move to front values and
1-2 zero string coding. insert coding
in pt */
{
#define OUT while (bfct>7) { \
  putc(bff>>bfct-8,fpw) ; outct++ ; bfct-=8 ; }
#define PUT(a,b) { bff=bff<<(b)^(a) ; bfct+=b ; }

long r[257], /* used for move to front */
trel[260],   /* holds relative frequencies, then codes */
len[260],    /* holds code lengths */
s=0,j ,k=0 , bff=0, bfct=0 ;

VS(r,=nZ,257) /* set char list for move to front */ 
VS(trel,=0,260) /* zero frequencies */
for (j=0 ; j<mlim ; j++ )  {
   z=rch[(x=pt[j]+1)] ;
   if (x==mlim) z=256 ;
   p=0 ;
   while (z^r[p]) p++ ;
   q=p ;
   while (p)  r[p]=r[p-1], p--  ;
   r[0]=z ;

   if (q==0) s++ ;
   else { while (s)  {
       s-- ;
       trel[s&1]++ ;
       pt[k++]=s&1 ;
       s>>=1 ;       }
       trel[q+1]++ ;
       pt[k++]= q+1 ;
        }                  }
while (s)  {
       s-- ;
       trel[s&1]++ ;
       pt[k++]=s&1 ;
       s>>=1 ;       }

/* generate code and length table for
encoding  */
trel[258]=1 ; /* make eof smallest code */
y=huffgen(trel,len,259) ;

/* output the code length table
 base 4 abreviations for repeated values
or zeros, first nibble is not zero */

dct-=outct+outct ;
y=1 ;
for (m=0,n=0,p=0 ; m<259 ; m++)  {
z=len[m] ;
if ( z==0 || z^y )
  while (n)    { 
     n-- ;
     PUT(8^n&3,4) OUT
     n>>=2 ;   }
if (z) while (p)  {
   p-- ;
   PUT(12^p&3,4) OUT
   p>>=2 ;        }
else { p++ ; continue ; }
if (y==z) { n++ ; continue ; }
x=z-y+3 ;
if (x<0) x+=23 ;
if (x>22) x-=23 ;
if (x<7) { PUT(x+1,4) OUT
    y=z ; continue ; }
PUT(4,4) PUT(x-7,4) OUT
y=z ;                            }
while (n)    { 
   n-- ;
   PUT(8^n&3,4) OUT
   n>>=2 ;   }
dct +=outct+outct+(bfct>>2) ;

/* output coded values */
for (m=0 ; m<k ; m++) {
z=pt[m] ;
PUT(trel[z],len[z]) OUT  } 

PUT(trel[258],len[258]) OUT
PUT(0,16-bfct) OUT

     /* possibly print tables of len and code */
if (silent&4) 
   { printf("\n char, codein hex, length triples\n") ;
   for (m=0,n=0 ; m<259 ; m++)
 if (len[m])
  x=printf("%c%4d%3x%3d",
  (n%4==0) ? 10 : ' ',m,trel[m],len[m]), n++ ; }
} 

/* check for file completion or
another block */

z=getc(fpr) ;
if (z^EOF) { ungetc(z,fpr) ; goto jstart ; }
}}}

/* resume file handling */

fclose(fpr) ; fclose(fpw) ;
if (keep) unlink(argv[argcc]) ;
if (ctch) x=ctch ; else x=1 ;
if (silent)
 printf("\n%s length %d cut by %d%% to\
 %s length %d\n",
 argv[argcc],ctch,100*(ctch-outct)/x, chl, outct) ; 
if (silent&2)
 printf("mbits per byte %d length table ratio %d/1000\n",
 (x<100000) ? outct*8000/x : outct*80/(x/100),dct*500/outct ) ; 

argcc++ ;
                   }
return errct ; }









