VENTI(8)                                                 VENTI(8)

     NAME
          venti - archival storage server

     SYNOPSIS
         venti/venti [ -Ldrs ] [ -a address ] [ -B blockcachesize ]
            [ -c config ] [ -C lumpcachesize ] [ -h httpaddress ]
            [ -I indexcachesize ] [ -m free-memory-percent ]
            [ -W webroot ]

     DESCRIPTION
          Venti is a SHA1-addressed archival storage server.  See
          venti(6) for a full introduction to the system.  This page
          documents the structure and operation of the server.

          A venti server requires multiple disks or disk partitions,
          each of which must be properly formatted before the server
          can be run.

        Disk
          The venti server maintains three disk structures, typically
          stored on raw disk partitions: the append-only data log,
          which holds, in sequential order, the contents of every
          block written to the server; the index, which helps locate a
          block in the data log given its score; and optionally the
          bloom filter, a concise summary of which scores are present
          in the index.  The data log is the primary storage.  To
          improve robustness, it should be stored on a device that
          provides RAID functionality.  The index and the bloom filter
          are optimizations employed to access the data log
          efficiently and can be rebuilt if lost or damaged.

          The data log is logically split into sections called arenas,
          typically sized for easy offline backup (e.g., 500MB).  A
          data log may comprise many disks, each storing one or more
          arenas.  Such disks are called arena partitions. Arena par-
          titions are filled in the order given in the configuration.

          The index is logically split into block-sized pieces called
          buckets, each of which is responsible for a particular range
          of scores.  An index may be split across many disks, each
          storing many buckets.  Such disks are called index sections.

          The index must be sized so that no bucket is full.  When a
          bucket fills, the server must be shut down and the index
          made larger.  Since scores appear random, each bucket will
          contain approximately the same number of entries.  Index
          entries are 40 bytes long.  Assuming that a typical block
          being written to the server is 8192 bytes and compresses to
          4096 bytes, the active index is expected to be about 1% of
          the active data log.  Storing smaller blocks increases the
          relative index footprint; storing larger blocks decreases
          it.  To allow variation in both block size and the random
          distribution of scores to buckets, the suggested index size
          is 5% of the active data log.

          The (optional) bloom filter is a large bitmap that is stored
          on disk but also kept completely in memory while the venti
          server runs.  It helps the venti server efficiently detect
          scores that are not already stored in the index.  The bloom
          filter starts out zeroed.  Each score recorded in the bloom
          filter is hashed to choose nhash bits to set in the bloom
          filter.  A score is definitely not stored in the index if
          any of its nhash bits are not set.  The bloom filter thus
          has two parameters: nhash (maximum 32) and the total bitmap
          size (maximum 512MB, 2^32 bits).

          The bloom filter should be sized so that nhash × nblock ≦
          0.7 × b, where nblock is the expected number of blocks
          stored on the server and b is the bitmap size in bits.  The
          false positive rate of the bloom filter when sized this way
          is approximately 2^-nhash.  Values of nhash less than 10 are
          not very useful; values greater than 24 are probably a waste
          of memory.  Fmtbloom (see venti-fmt(8)) can be given either
          nhash or nblock; if given nblock, it will derive an
          appropriate nhash.

        Memory
          Venti can make effective use of large amounts of memory for
          various caches.

          The lump cache holds recently-accessed venti data blocks,
          which the server refers to as lumps. The lump cache should
          be at least 1MB but can profitably be much larger.  The lump
          cache can be thought of as the level-1 cache: read requests
          handled by the lump cache can be served instantly.

          The block cache holds recently-accessed disk blocks from the
          arena partitions.  The block cache needs to be able to
          simultaneously hold two blocks from each arena plus four
          blocks for the currently-filling arena.  The block cache can
          be thought of as the level-2 cache: read requests handled by
          the block cache are slower than those handled by the lump
          cache, since the lump data must be extracted from the raw
          disk blocks and possibly decompressed, but no disk accesses
          are necessary.

          The index cache holds recently-accessed or prefetched index
          entries.  The index cache needs to be able to hold index
          entries for at least three or four arenas in order for
          prefetching to work properly.  Each entry in the index cache
          is 50 bytes.  Assuming 500MB arenas of 128,000 blocks that
          are 4096 bytes each after compression, the entries for one
          arena occupy about 6MB, so the minimum index cache size is
          about 20-25MB.  The index cache can be thought of as the level-3
          cache: read requests handled by the index cache must still
          go to disk to fetch the arena blocks, but the costly random
          access to the index is avoided.

          The size of the index cache determines how long venti can
          sustain its `burst' write throughput, during which time the
          only disk accesses on the critical path are sequential
          writes to the arena partitions.  For example, if you want to
          be able to sustain 10MB/s for an hour, you need enough index
          cache to hold entries for 36GB of blocks.  Assuming 8192-
          byte blocks, you need room for almost five million index
          entries.  Since index entries are 50 bytes each, you need
          250MB of index cache.  If the background index update pro-
          cess can make a single pass through the index in an hour,
          which is possible, then you can sustain the 10MB/s indefi-
          nitely (at least until the arenas are all filled).

          The bloom filter requires memory equal to its size on disk,
          as discussed above.

          A reasonable starting allocation is to divide memory equally
          (in thirds) between the bloom filter, the index cache, and
          the lump and block caches; the third of memory allocated to
          the lump and block caches should be split unevenly, with
          more (say, two thirds) going to the block cache.

        Network
          The venti server announces two network services, one (con-
          ventionally TCP port venti, 17034) serving the venti proto-
          col as described in venti(6), and one serving HTTP (conven-
          tionally TCP port http, 80).

          The venti web server provides the following URLs for access-
          ing status information:

          /index    A summary of the usage of the arenas and index
                    sections.

          /xindex   An XML version of /index.

          /storage  Brief storage totals.

          /set/variable
                    The current integer value of variable. Variables
                    are: compress, whether or not to compress blocks
                    (for debugging); logging, whether to write entries
                    to the debugging logs; stats, whether to collect
                    run-time statistics; icachesleeptime, the time in
                    milliseconds between successive updates of mega-
                    bytes of the index cache; arenasumsleeptime, the
                    time in milliseconds between reads while
                    checksumming an arena in the background.  The two
                    sleep times should be (but are not) managed by
                    venti; they exist to provide more experience with
                    their effects.  The other variables exist only for
                    debugging and performance measurement.

          /set/variable/value
                    Set variable to value.

          /graph/name/param/param
                    A PNG image graphing the named run-time statistic
                    over time.  The details of names and parameters
                    are undocumented; see httpd.c in the venti
                    sources.

          /log      A list of all debugging logs present in the
                    server's memory.

          /log/name The contents of the debugging log with the given
                    name.

          /flushicache
                    Force venti to begin flushing the index cache to
                    disk.  The response is not sent until the flush
                    has completed.

          /flushdcache
                    Force venti to begin flushing the arena block
                    cache to disk.  The response is not sent until
                    the flush has completed.

          Requests for other files are served by consulting a direc-
          tory named in the configuration file (see webroot below).

        Configuration File
          A venti configuration file enumerates the various index sec-
          tions and arenas that constitute a venti system.  The compo-
          nents are indicated by the name of the file, typically a
          disk partition, in which they reside.  The configuration
          file is the only place where file names are used.
          Internally, venti uses the names assigned when the components
          were formatted with fmtarenas or fmtisect (see venti-
          fmt(8)). In particular, only the configuration needs to be
          changed if a component is moved to a different file.

          The configuration file consists of lines in the form
          described below.  Lines starting with # are comments.

          index name   Names the index for the system.

          arenas file  File is an arena partition, formatted using
                       fmtarenas.

          isect file   File is an index section, formatted using
                       fmtisect.

          bloom file   File is a bloom filter, formatted using
                       fmtbloom.

          After formatting a venti system using fmtindex, the order of
          arenas and index sections should not be changed.  Additional
          arenas can be appended to the configuration; run fmtindex
          with the -a flag to update the index.

          The configuration file also holds configuration parameters
          for the venti server itself.  These are:

          mem size           lump cache size
          bcmem size         block cache size
          icmem size         index cache size
          addr netaddr       network address to announce venti service
                             (default tcp!*!venti)
          httpaddr netaddr   network address to announce HTTP service
                             (default tcp!*!http)
          queuewrites        queue writes in memory (default is not to
                             queue)
          webroot dir        directory tree containing files for
                             venti's internal HTTP server to consult
                             for unrecognized URLs

          The units for the various cache sizes above can be specified
          by appending a `k', `m', or `g' (case-insensitive) to indi-
          cate kilobytes, megabytes, or gigabytes respectively.

          The file name in the configuration lines above can be of the
          form file:lo-hi to specify a range of the file. Lo and hi
          are specified in bytes but can have the usual k, m, or g
          suffixes.  Either lo or hi may be omitted.  This notation
          eliminates the need to partition raw disks on non-Plan 9
          systems.

        Command Line
          Many of the options to Venti duplicate parameters that can
          be specified in the configuration file.  The command line
          options override those found in a configuration file.  Addi-
          tional options are:

          -c config  The server configuration file (default
                     venti.conf)

          -d         Produce various debugging information on standard
                     error.  Implies -s.

          -L         Enable logging.  By default all logging is dis-
                     abled.  Logging slows server operation consider-
                     ably.

          -m         Allocate free-memory-percent percent of the
                     available free RAM, and partition it per the
                     guidelines in the Memory subsection.  This per-
                     centage should be large enough to include the
                     entire bloom filter.  This overrides all other
                     memory sizing parameters, including those on the
                     command line and in the configuration file.  25%
                     is a reasonable choice.

          -r         Allow only read access to the venti data.

          -s         Do not run in the background.  Normally, the
                     foreground process will exit once the Venti
                     server is initialized and ready for connections.

     EXAMPLE
          A simple configuration:

               % cat venti.conf
               index main
               isect /tmp/disks/isect0
               isect /tmp/disks/isect1
               arenas /tmp/disks/arenas
               bloom /tmp/disks/bloom
               %

          Format the index sections, the arena partition, the bloom
          filter, and finally the main index:

               % venti/fmtisect isect0. /tmp/disks/isect0
               % venti/fmtisect isect1. /tmp/disks/isect1
               % venti/fmtarenas arenas0. /tmp/disks/arenas &
               % venti/fmtbloom /tmp/disks/bloom &
               % wait
               % venti/fmtindex venti.conf
               %

          Start the server and check the storage statistics:

               % venti/venti
               % hget http://$sysname/storage

     SOURCE
          /sys/src/cmd/venti/srv

     SEE ALSO
          venti(1), venti(2), venti(6), venti-backup(8), venti-fmt(8)
          Sean Quinlan and Sean Dorward, ``Venti: a new approach to
          archival storage'', Usenix Conference on File and Storage
          Technologies, 2002.

     BUGS
          Setting up a venti server is too complicated.