VENTI(6)                                                 VENTI(6)

     NAME
          venti - archival storage server

     DESCRIPTION
          Venti is a block storage server intended for archival data.
          In a Venti server, the SHA1 hash of a block's contents acts
          as the block identifier for read and write operations.  This
          approach enforces a write-once policy, preventing accidental
          or malicious destruction of data.  In addition, duplicate
          copies of a block are coalesced, reducing the consumption of
          storage and simplifying the implementation of clients.

          This manual page documents the basic concepts of block
          storage using Venti as well as the Venti network protocol.

          Venti(1) documents some simple clients.  Vac(1) and vacfs(4)
          are more complex clients.

          Venti(2) describes a C library interface for accessing Venti
          servers and manipulating Venti data structures.

          Venti(8) describes the programs used to run a Venti server.

        Scores
          The SHA1 hash that identifies a block is called its score.
          The score of the zero-length block is called the zero score.

          Scores may have an optional label: prefix, typically used to
          describe the format of the data.  For example, vac(1) uses a
          vac: prefix.
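
          As an illustration, assume the usual textual form of a
          score: 40 hexadecimal digits, optionally preceded by a
          label such as vac:.  The C sketch below separates the label
          from the digits and decodes them into the 20-byte score.
          parse_score is a hypothetical helper written for this page,
          not part of the venti(2) library.

```c
#include <stddef.h>
#include <string.h>

enum { ScoreSize = 20 };   /* a SHA1 score is 20 bytes */

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * parse_score (hypothetical): split an optional "label:" prefix from s
 * and decode the remaining 40 hex digits into score.  The label, without
 * its colon and possibly empty, is copied into label (labelsize bytes).
 * Returns 0 on success, -1 on malformed input.
 */
int parse_score(const char *s, char *label, size_t labelsize,
                unsigned char score[ScoreSize])
{
    const char *colon = strchr(s, ':');
    const char *hex = s;
    size_t n = 0;

    if (colon != NULL) {
        n = (size_t)(colon - s);
        hex = colon + 1;
    }
    if (n + 1 > labelsize)
        return -1;
    memcpy(label, s, n);
    label[n] = '\0';

    if (strlen(hex) != 2 * ScoreSize)
        return -1;
    for (int i = 0; i < ScoreSize; i++) {
        int hi = hexval(hex[2*i]), lo = hexval(hex[2*i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        score[i] = (unsigned char)(hi << 4 | lo);
    }
    return 0;
}
```

          The zero score, for instance, is written
          da39a3ee5e6b4b0d3255bfef95601890afd80709, the SHA1 hash of
          no bytes at all.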

        Files and Directories
          Venti accepts blocks up to 56 kilobytes in size. By conven-
          tion, Venti clients use hash trees of blocks to represent
          arbitrary-size data files. The data to be stored is split
          into fixed-size blocks and written to the server, producing
          a list of scores.  The resulting list of scores is split
          into fixed-size pointer blocks (using only an integral num-
          ber of scores per block) and written to the server, produc-
          ing a smaller list of scores.  The process continues, even-
          tually ending with the score for the hash tree's top-most
          block.  Each file stored this way is summarized by a VtEntry
          structure recording the top-most score, the depth of the
          tree, the data block size, and the pointer block size.  One
          or more VtEntry structures can be concatenated and stored as
          a special file called a directory. In this manner, arbitrary
          trees of files can be constructed and stored.
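
          The depth of such a tree follows from the file size and the
          two block sizes.  The sketch below assumes that depth counts
          the pointer-block levels above the data blocks, so that a
          file fitting in one data block has depth 0, and that each
          pointer block holds psize/20 scores.  tree_depth is a
          hypothetical helper.

```c
#include <stdint.h>

enum { ScoreSize = 20 };

/*
 * tree_depth (hypothetical): number of pointer-block levels needed to
 * reduce a file of size bytes to a single top-most block, given data
 * block size dsize and pointer block size psize.  Returns -1 if a
 * pointer block cannot hold at least two scores.
 */
int tree_depth(uint64_t size, uint32_t dsize, uint32_t psize)
{
    uint64_t perptr = psize / ScoreSize;           /* integral scores per pointer block */
    uint64_t nblocks = (size + dsize - 1) / dsize; /* data blocks, rounded up */
    int depth = 0;

    if (perptr < 2)
        return -1;
    /* each level replaces up to perptr scores with one pointer block */
    while (nblocks > 1) {
        nblocks = (nblocks + perptr - 1) / perptr;
        depth++;
    }
    return depth;
}
```

          With 8192-byte data and pointer blocks (409 scores per
          pointer block), a file of 2 to 409 data blocks has depth 1,
          and a 410th data block pushes it to depth 2.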

          Scores passed between programs conventionally refer to
          VtRoot blocks, which contain descriptive information as well
          as the score of a directory block containing a small number
          of directory entries.

          Conventionally, programs do not mix data and directory
          entries in the same file.  Instead, they keep two separate
          files, one with directory entries and one with metadata ref-
          erencing those entries by position.  Keeping this parallel
          representation is a minor annoyance but makes it possible
          for general programs like venti/copy (see venti(1)) to tra-
          verse the block tree without knowing the specific details of
          any particular program's data.

        Block Types
          To allow programs to traverse these structures without need-
          ing to understand their higher-level meanings, Venti tags
          each block with a type.  The types are:

              VtDataType     000  data
              VtDataType+1   001  scores of VtDataType blocks
              VtDataType+2   002  scores of VtDataType+1 blocks
              ...
              VtDirType      010  VtEntry structures
              VtDirType+1    011  scores of VtDirType blocks
              VtDirType+2    012  scores of VtDirType+1 blocks
              ...
              VtRootType     020  VtRoot structure

          The octal numbers listed are the type numbers used by the
          commands below.  (For historical reasons, the type numbers
          used on disk and on the wire are different from the above.
          They do not distinguish VtDataType+n blocks from VtDirType+n
          blocks.)
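
          Because each family in the table starts at its own multiple
          of octal 10, a program can recover the family and the level
          of a block from the number alone.  The sketch assumes, as
          the spacing of the table suggests, that levels never exceed
          7.  decode_type and enum Family are hypothetical names, and
          the numbers are the table's, not the on-disk or on-wire
          values.

```c
/* Type families from the table, one per group of eight octal numbers. */
enum Family { FamilyData, FamilyDir, FamilyRoot };

/*
 * decode_type (hypothetical): split a table type number into its family
 * and its level (pointer levels above the leaf blocks).
 * Returns 0 on success, -1 for a number outside the table.
 */
int decode_type(int type, enum Family *family, int *level)
{
    if (type < 0)
        return -1;
    *level = type & 07;              /* low octal digit: level in the tree */
    switch (type & ~07) {            /* remaining bits: family */
    case 000: *family = FamilyData; return 0;
    case 010: *family = FamilyDir;  return 0;
    case 020:
        *family = FamilyRoot;
        return *level == 0 ? 0 : -1; /* VtRootType has no pointer levels */
    default:
        return -1;
    }
}
```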

        Zero Truncation
          To avoid storing the same short data blocks padded with dif-
          fering numbers of zeros, Venti clients working with fixed-
          size blocks conventionally `zero truncate' the blocks before
          writing them to the server.  For example, if a 1024-byte
          data block contains the 11-byte string `hello world' fol-
          lowed by 1013 zero bytes, a client would store only the 11-
          byte block.  When the client later read the block from the
          server, it would append zero bytes to the end as necessary
          to reach the expected size.
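
          A minimal sketch of the convention for data blocks: the
          writer drops trailing zero bytes, and the reader, which
          knows the block size it expects, pads the block back out.
          zero_truncate and zero_extend are hypothetical helpers.

```c
#include <stddef.h>
#include <string.h>

/* zero_truncate (hypothetical): length of buf once trailing zero bytes are dropped. */
size_t zero_truncate(const unsigned char *buf, size_t n)
{
    while (n > 0 && buf[n - 1] == 0)
        n--;
    return n;
}

/*
 * zero_extend (hypothetical): after reading an n-byte block into a
 * buffer of size bytes, clear the rest so the block regains its
 * expected fixed size.
 */
void zero_extend(unsigned char *buf, size_t n, size_t size)
{
    if (n < size)
        memset(buf + n, 0, size - n);
}
```

          A block whose data legitimately ends in zero bytes is
          restored correctly only because the reader supplies the
          expected size; the truncated block itself does not record
          it.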

          When truncating pointer blocks (VtDataType+n and VtDirType+n
          blocks), trailing zero scores are removed instead of trail-
          ing zero bytes.
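
          For pointer blocks the unit of truncation is a whole 20-byte
          score.  The sketch compares trailing entries against the
          zero score, the SHA1 hash of the zero-length block
          (da39a3ee5e6b4b0d3255bfef95601890afd80709).  score_truncate
          is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

enum { ScoreSize = 20 };

/* The zero score: SHA1 of the zero-length block. */
static const unsigned char zeroscore[ScoreSize] = {
    0xda, 0x39, 0xa3, 0xee, 0x5e, 0x6b, 0x4b, 0x0d, 0x32, 0x55,
    0xbf, 0xef, 0x95, 0x60, 0x18, 0x90, 0xaf, 0xd8, 0x07, 0x09,
};

/*
 * score_truncate (hypothetical): length of an n-byte pointer block after
 * removing trailing zero scores.  n is assumed to be a multiple of ScoreSize.
 */
size_t score_truncate(const unsigned char *buf, size_t n)
{
    while (n >= ScoreSize &&
           memcmp(buf + n - ScoreSize, zeroscore, ScoreSize) == 0)
        n -= ScoreSize;
    return n;
}
```

          Applying the byte-level rule instead would not shorten a
          block of zero scores at all, because the zero score is not
          made of zero bytes.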

          Because of the truncation convention, any file consisting
          entirely of zero bytes, no matter what its length, will be
          represented by the zero score: the data blocks contain all
          zeros and are thus truncated to the empty block, and the
          pointer blocks contain all zero scores and are thus also
          truncated to the empty block, and so on up the hash tree.

        Network Protocol
          A Venti session begins when a client connects to the network
          address served by a Venti server; the conventional address
          is tcp!server!venti (the venti port is 17034).  Both client
          and server begin by sending a version string of the form
          venti-versions-comment\n.  The versions field is a list of
          acceptable versions separated by colons.  The protocol
          described here is version 02.  The client is responsible for
          choosing a common version and sending it in the VtThello
          message, described below.
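
          A client might check the server's banner for the version it
          intends to use before sending VtThello.  The sketch assumes
          the banner has already been read up to and including the
          newline; offers_version is a hypothetical helper that scans
          only the colon-separated versions field and ignores the
          comment.

```c
#include <string.h>

/*
 * offers_version (hypothetical): report whether a banner of the form
 * venti-versions-comment\n lists version v among its versions.
 * Returns 1 if it does, 0 if not, -1 if the banner is malformed.
 */
int offers_version(const char *banner, const char *v)
{
    const char *p, *end;
    size_t vlen = strlen(v);

    if (strncmp(banner, "venti-", 6) != 0)
        return -1;
    p = banner + 6;
    end = strchr(p, '-');              /* the versions field ends at the next '-' */
    if (end == NULL)
        return -1;
    while (p < end) {
        const char *colon = memchr(p, ':', (size_t)(end - p));
        const char *stop = colon ? colon : end;
        if ((size_t)(stop - p) == vlen && strncmp(p, v, vlen) == 0)
            return 1;
        p = colon ? colon + 1 : end;   /* advance to the next version */
    }
    return 0;
}
```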

          After the initial version exchange, the client transmits
          requests (T-messages) to the server, which subsequently
          returns replies (R-messages) to the client.  The combined
          act of transmitting (receiving) a request of a particular
          type, and receiving (transmitting) its reply is called a
          transaction of that type.

          Each message consists of a sequence of bytes.  Two-byte
          fields hold unsigned integers represented in big-endian
          order (most significant byte first).  Data items of variable
          lengths are represented by a one-byte field specifying a
          count, n, followed by n bytes of data.  Text strings are
          represented similarly, using a two-byte count with the text
          itself stored as a UTF-encoded sequence of Unicode charac-
          ters (see utf(6)). Text strings are not NUL-terminated: n
          counts the bytes of UTF data, which include no final zero
          byte.  The NUL character is illegal in text strings in the
          Venti protocol.  The maximum string length in Venti is 1024
          bytes.
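
          A sketch of the two encodings, assuming the caller supplies
          the output buffer: put_string writes a two-byte big-endian
          count and the UTF-8 bytes with no final NUL, and put_data
          writes a one-byte count and the data.  Both are hypothetical
          helpers.  Because the length comes from strlen, a C string
          cannot carry a NUL into the message.

```c
#include <stddef.h>
#include <string.h>

/*
 * put_string (hypothetical): write str as s[2] followed by s bytes of
 * UTF-8.  Returns bytes written, or 0 if str exceeds the 1024-byte
 * limit or does not fit in bufsize.
 */
size_t put_string(unsigned char *buf, size_t bufsize, const char *str)
{
    size_t s = strlen(str);              /* a byte count, not a character count */

    if (s > 1024 || 2 + s > bufsize)
        return 0;
    buf[0] = (unsigned char)(s >> 8);    /* big-endian count */
    buf[1] = (unsigned char)(s & 0xff);
    memcpy(buf + 2, str, s);
    return 2 + s;
}

/* put_data (hypothetical): write n[1] followed by n bytes; 0 on error. */
size_t put_data(unsigned char *buf, size_t bufsize,
                const unsigned char *data, size_t n)
{
    if (n > 255 || 1 + n > bufsize)
        return 0;
    buf[0] = (unsigned char)n;
    memcpy(buf + 1, data, n);
    return 1 + n;
}
```

          The count is in bytes: the five-character string héllo is
          sent with a count of 6, since é takes two bytes in UTF-8.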

          Each Venti message begins with a two-byte size field speci-
          fying the length in bytes of the message, not including the
          length field itself.  The next byte is the message type, one
          of the constants in the enumeration in the include file
          <venti.h>.  The next byte is an identifying tag, used to
          match responses to requests.  The remaining bytes are param-
          eters of different sizes.  In the message descriptions, the
          number of bytes in a field is given in brackets after the
          field name.  The notation parameter[n] where n is not a con-
          stant represents a variable-length parameter: n[1] followed
          by n bytes of data forming the parameter. The notation
          string[s] (using a literal s character) is shorthand for
          s[2] followed by s bytes of UTF-8 text.  The notation
          parameter[] where parameter is the last field in the message
          represents a variable-length field that comprises all
          remaining bytes in the message.
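
          The common prefix can be decoded before the message type is
          examined.  The sketch assumes bytes arrive in a buffer, as
          from a TCP stream, and reports how many bytes the first
          message occupies, so the caller can tell whether a complete
          message is present.  struct frame and parse_header are
          hypothetical.

```c
#include <stddef.h>

/* Fields common to every Venti message (hypothetical struct). */
struct frame {
    unsigned size;                  /* length after the size field */
    unsigned char type;             /* message type from <venti.h> */
    unsigned char tag;              /* matches replies to requests */
    const unsigned char *params;    /* remaining parameter bytes */
    size_t nparams;
};

/*
 * parse_header (hypothetical): decode size[2] type[1] tag[1] at buf.
 * Returns the total bytes the message occupies (size + 2), 0 if more
 * input is needed, or -1 if size is too small to hold type and tag.
 */
long parse_header(const unsigned char *buf, size_t n, struct frame *f)
{
    if (n < 2)
        return 0;
    f->size = (unsigned)buf[0] << 8 | buf[1];   /* big-endian */
    if (f->size < 2)
        return -1;
    if (n < 2 + (size_t)f->size)
        return 0;                               /* message not complete yet */
    f->type = buf[2];
    f->tag = buf[3];
    f->params = buf + 4;
    f->nparams = f->size - 2;
    return 2 + (long)f->size;
}
```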

          All Venti RPC messages are prefixed with a field size[2]
          giving the length of the message that follows (not including
          the size field itself).  The message bodies are:

               VtThello tag[1] version[s] uid[s] strength[1] crypto[n]
               codec[n]
               VtRhello tag[1] sid[s] rcrypto[1] rcodec[1]

               VtTping tag[1]
               VtRping tag[1]

               VtTread tag[1] score[20] type[1] pad[1] count[2]
               VtRread tag[1] data[]

               VtTwrite tag[1] type[1] pad[3] data[]
               VtRwrite tag[1] score[20]

               VtTsync tag[1]
               VtRsync tag[1]

               VtRerror tag[1] error[s]

               VtTgoodbye tag[1]

          Each T-message has a one-byte tag field, chosen and used by
          the client to identify the message.  The server will echo
          the request's tag field in the reply.  Clients should
          arrange that no two outstanding messages have the same tag
          field so that responses can be distinguished.
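
          Since tags are single bytes, at most 256 requests can be
          outstanding at once.  One hypothetical way for a client to
          keep its tags distinct is a 256-bit set that hands out the
          lowest free tag and releases it when the reply arrives;
          struct tagset, tag_alloc, and tag_free are illustrative
          names.

```c
#include <stdint.h>

/* Bit t set means tag t belongs to an outstanding request. */
struct tagset {
    uint32_t busy[256 / 32];
};

/* Returns the lowest free tag and marks it busy, or -1 if all are in use. */
int tag_alloc(struct tagset *ts)
{
    for (int t = 0; t < 256; t++) {
        uint32_t bit = (uint32_t)1 << (t % 32);
        if (!(ts->busy[t / 32] & bit)) {
            ts->busy[t / 32] |= bit;
            return t;
        }
    }
    return -1;
}

/* Marks tag t free again once its reply has been received. */
void tag_free(struct tagset *ts, int t)
{
    ts->busy[t / 32] &= ~((uint32_t)1 << (t % 32));
}
```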

          The type of an R-message will be either one greater than the
          type of the corresponding T-message or VtRerror, indicating
          that the request failed.  In the latter case, the error
          field contains a string describing the reason for failure.
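
          A sketch of how a client might classify a reply under this
          rule.  The VtRerror constant is passed in rather than
          assumed, since its value belongs to <venti.h>; check_reply
          is a hypothetical helper that also copies the error[s] text
          out as a NUL-terminated C string.

```c
#include <stddef.h>
#include <string.h>

/*
 * check_reply (hypothetical): classify the reply to a T-message of type
 * ttype.  params/n are the reply bytes after the tag.  Returns 1 for the
 * matching R-message, 0 for VtRerror (error text copied into err,
 * truncated to fit), and -1 for anything else.
 */
int check_reply(unsigned char ttype, unsigned char rtype, unsigned char rerror,
                const unsigned char *params, size_t n, char *err, size_t errsize)
{
    if (rtype == (unsigned char)(ttype + 1))
        return 1;
    if (rtype != rerror || n < 2 || errsize == 0)
        return -1;

    size_t s = (size_t)params[0] << 8 | params[1];  /* error[s]: s[2], then s bytes */
    if (2 + s > n)
        return -1;
    if (s >= errsize)
        s = errsize - 1;                            /* leave room for the NUL */
    memcpy(err, params + 2, s);
    err[s] = '\0';
    return 0;
}
```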

          Venti connections must begin with a hello transaction.  The
          VtThello message contains the protocol version that the
          client has chosen to use.  The fields strength, crypto, and
          codec could be used to add authentication, encryption, and
          compression to the Venti session but are currently ignored.
          The rcrypto and rcodec fields in the VtRhello response are
          similarly ignored.  The uid and sid fields are intended to
          be the identity of the client and server but, given the lack
          of authentication, should be treated only as advisory.  The
          initial hello should be the only hello transaction during
          the session.
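
          The sketch below lays out a VtThello message with a strength
          of 0 and empty crypto and codec lists, on the assumption
          that a server which ignores those fields accepts such
          values.  The VtThello constant comes from <venti.h> and is
          passed in; pack_thello is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Write s as s[2] followed by its bytes; returns the next free position. */
static unsigned char *put_str(unsigned char *p, const char *s)
{
    size_t n = strlen(s);
    *p++ = (unsigned char)(n >> 8);
    *p++ = (unsigned char)(n & 0xff);
    memcpy(p, s, n);
    return p + n;
}

/*
 * pack_thello (hypothetical): build
 *   size[2] VtThello tag[1] version[s] uid[s] strength[1] crypto[n] codec[n]
 * buf needs 11 + strlen(version) + strlen(uid) bytes.  Returns bytes written.
 */
size_t pack_thello(unsigned char *buf, unsigned char msgtype, unsigned char tag,
                   const char *version, const char *uid)
{
    unsigned char *p = buf + 2;      /* size[2] is filled in last */

    *p++ = msgtype;
    *p++ = tag;
    p = put_str(p, version);
    p = put_str(p, uid);
    *p++ = 0;                        /* strength[1] */
    *p++ = 0;                        /* crypto[n] with n = 0 */
    *p++ = 0;                        /* codec[n] with n = 0 */

    size_t size = (size_t)(p - buf) - 2;
    buf[0] = (unsigned char)(size >> 8);
    buf[1] = (unsigned char)(size & 0xff);
    return (size_t)(p - buf);
}
```

          With version "02" and a nine-byte uid, the message is 22
          bytes long and its size field holds 20.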

          The ping message has no effect and is used mainly for debug-
          ging.  Servers should respond immediately to pings.

          The read message requests a block with the given score and
          type. Use vttodisktype and vtfromdisktype (see venti(2)) to
          convert a block type enumeration value (VtDataType, etc.)
          to the type used on disk and in the protocol.  The count
          field specifies the maximum expected size of the block.  The
          data in the reply is the block's contents.
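
          Combining the layout from the message table with the read
          semantics, a VtTread request is always 28 bytes: the size
          field plus 26 bytes of body.  In the sketch, the message
          type constant and the on-wire block type (as produced by
          vttodisktype) are supplied by the caller; pack_tread is
          hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * pack_tread (hypothetical): build
 *   size[2] VtTread tag[1] score[20] type[1] pad[1] count[2]
 * into buf, which must hold at least 28 bytes.  Returns bytes written.
 */
size_t pack_tread(unsigned char *buf, unsigned char msgtype, unsigned char tag,
                  const unsigned char score[20], unsigned char blocktype,
                  unsigned count)
{
    unsigned char *p = buf + 2;            /* leave room for size[2] */

    *p++ = msgtype;
    *p++ = tag;
    memcpy(p, score, 20);
    p += 20;
    *p++ = blocktype;
    *p++ = 0;                              /* pad[1] */
    *p++ = (unsigned char)(count >> 8);    /* count[2], big-endian */
    *p++ = (unsigned char)(count & 0xff);

    size_t size = (size_t)(p - buf) - 2;   /* excludes the size field */
    buf[0] = (unsigned char)(size >> 8);
    buf[1] = (unsigned char)(size & 0xff);
    return (size_t)(p - buf);
}
```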

          The write message writes a new block of the given type with
          contents data to the server.  The response includes the
          score to use to read the block, which should be the SHA1
          hash of data.
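
          Since the returned score should be the SHA1 hash of the
          data, a cautious client can recompute it and compare.  The
          sketch includes a plain SHA1 so that it stands alone; a real
          client would more likely call an existing SHA1 routine.
          sha1 and check_score are hypothetical helpers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static uint32_t rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Plain SHA1 of msg[0..len), written big-endian into out. */
void sha1(const unsigned char *msg, size_t len, unsigned char out[20])
{
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    size_t total = ((len + 8) / 64 + 1) * 64;  /* room for 0x80 and 64-bit length */
    unsigned char *buf = calloc(total, 1);
    uint64_t bits = (uint64_t)len * 8;

    if (buf == NULL)
        abort();
    if (len > 0)
        memcpy(buf, msg, len);
    buf[len] = 0x80;
    for (int i = 0; i < 8; i++)                /* bit length, big-endian */
        buf[total - 1 - i] = (unsigned char)(bits >> (8 * i));

    for (size_t off = 0; off < total; off += 64) {
        uint32_t w[80], a, b, c, d, e;
        for (int i = 0; i < 16; i++)
            w[i] = (uint32_t)buf[off + 4*i] << 24 | (uint32_t)buf[off + 4*i + 1] << 16
                 | (uint32_t)buf[off + 4*i + 2] << 8 | buf[off + 4*i + 3];
        for (int i = 16; i < 80; i++)
            w[i] = rol(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
        a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
        for (int i = 0; i < 80; i++) {
            uint32_t f, k, t;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            t = rol(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rol(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
    free(buf);
    for (int i = 0; i < 20; i++)
        out[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* check_score (hypothetical): 1 if score is the SHA1 hash of data[0..n). */
int check_score(const unsigned char *data, size_t n, const unsigned char score[20])
{
    unsigned char want[20];
    sha1(data, n, want);
    return memcmp(want, score, 20) == 0;
}
```

          For example, check_score(NULL, 0, score) succeeds exactly
          when score is the zero score.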

          The Venti server may buffer written blocks in memory, wait-
          ing until after responding to the write message before writ-
          ing them to permanent storage.  The server will delay the
          response to a sync message until after all blocks in earlier
          write messages have been written to permanent storage.

          The goodbye message ends a session.  There is no VtRgoodbye:
          upon receiving the VtTgoodbye message, the server closes
          the connection.

     SEE ALSO
          venti(1), venti(2), venti(8)
          Sean Quinlan and Sean Dorward, ``Venti: a new approach to
          archival storage'', Usenix Conference on File and Storage
          Technologies, 2002.