wiki:internet-video-distribution-proposal

Intro

By many estimates (see Cisco White Paper and NY Times article), video traffic on the Internet will continue to increase over the next few years, becoming a dominant form of content and driving up bandwidth usage everywhere.

Corporations are both fueling and controlling this process by providing unsustainable, capital-intensive, ad-supported services such as YouTube. These services have unacceptable content ownership policies, support the use of proprietary video formats, and cannot be trusted politically with sensitive content.

In short, the world needs a better way to host and deliver multimedia content.

Global Cache Network

The Global Cache Network is a proposal for creating such a system. The proposal has been developed by May First/People Link based on meetings and discussions with our members who work heavily with multimedia content.

The Global Cache Network would provide:

  • Servers for uploading and transcoding multimedia content from widely used proprietary and non-proprietary formats to free formats.
  • Servers for storing and delivering the media content.
  • Servers designed to cache the content, providing fast and reliable access.

Principles and Content Policy

The Global Cache Network would be based on the following principles:

  • Distributed: Nearly all aspects of the network, including its servers, will be distributed. Anyone who demonstrates the technical ability to install the appropriate software on a server can contribute that server to the network, allowing the network to grow without a fixed limit. The only centralized aspect of the network will be the domain name system, which will control which servers are authorized to be in the network (and will allow for the removal of servers that are unreliable or proven untrustworthy).
  • Anonymous: All submissions are anonymous. Authors are free to link to uploaded content from their own sites and assert ownership; however, no ownership is internally linked to any content in the network. As a consequence of anonymous uploads, no user-initiated edits or deletes are allowed. Once content is uploaded, the uploader cannot request that it be removed or replaced (there will be no mechanism for knowing who has the authority to request such edits or deletes).
  • Distribution-focused: The system is not designed to provide personal storage. Content pruning will happen via automatic deletion of material that is deemed "minimally accessed" over a given period of time (pruned content can be re-uploaded).
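
The pruning rule in the last principle could be sketched roughly as follows. This is only an illustration: the proposal does not define "minimally accessed", so the `Item` record, the `access_count` field, and the `min_accesses` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    access_count: int  # assumed: number of accesses during the review period


def select_for_pruning(items, min_accesses):
    """Return the ids of items accessed fewer than min_accesses times.

    The threshold is hypothetical; the proposal leaves it undefined.
    """
    return [item.item_id for item in items if item.access_count < min_accesses]
```

A scheduled job on each storage server could call `select_for_pruning` once per period and delete the returned items, which uploaders would remain free to upload again.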

Any content is allowed provided it meets the following content policy:

  • No content that advances an agenda of hate, discrimination or oppression.
  • No commercials. Content that directly advances a for-profit venture is prohibited.
  • No initial uploads greater than 2 GB (subject to change).
  • No sexually explicit content.

Technical Overview

The technical details are based on a central domain name of gc.domain (obviously this will change).

The network would be broken into several pieces:

  • Entry servers, providing upload and transcoding services (a.entry.gc.domain, b.entry.gc.domain, c.entry.gc.domain, etc.), also available via round robin DNS as entry.gc.domain.
  • Caching servers (responding to a.store.gc.domain, b.store.gc.domain, c.store.gc.domain, etc.) and pulling from the corresponding primary storage servers (see below).
  • Primary storage servers (primary.a.store.gc.domain, primary.b.store.gc.domain, primary.c.store.gc.domain, etc.).
  • Backup storage servers (0.a.store.gc.domain, 1.a.store.gc.domain, 2.a.store.gc.domain, 0.b.store.gc.domain, 1.b.store.gc.domain, 2.b.store.gc.domain, etc.).
  • Indexing and searching servers (anyone can crawl the content servers and create their own indexes and searching mechanisms).
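
The naming scheme in the list above is regular enough to generate. The helpers below are a minimal sketch of that pattern; the function names are invented for illustration, and gc.domain is the placeholder domain used throughout this page.

```python
BASE_DOMAIN = "gc.domain"  # placeholder domain from the proposal


def entry_host(letter):
    """Canonical name of one entry server, e.g. a.entry.gc.domain."""
    return f"{letter}.entry.{BASE_DOMAIN}"


def cache_host(letter):
    """Public name answered by the caching servers, e.g. a.store.gc.domain."""
    return f"{letter}.store.{BASE_DOMAIN}"


def primary_host(letter):
    """Primary storage server behind a cache name."""
    return f"primary.{letter}.store.{BASE_DOMAIN}"


def backup_host(letter, index):
    """Numbered backup storage server for a given primary."""
    return f"{index}.{letter}.store.{BASE_DOMAIN}"
```

Because every role is derived from the same letter, a client that knows a.store.gc.domain can work out the matching primary and backup names without a separate directory.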

Users begin by entering entry.gc.domain into their browsers and, via round robin DNS, they are directed to one of many entry servers. The entry server rewrites the URL to use its own canonical domain name (e.g. a.entry.gc.domain). The user begins an HTTP file upload. Once the initial file is uploaded, the server provides the user with a URL where the user can check back on the status (e.g. a.entry.gc.domain/?status=hJlw24lF814Jqm9z). Then, the server begins transcoding the file. When complete, the server publishes the transcoded file in a pre-defined directory (e.g. a.entry.gc.domain/data/hJlw24lF814Jqm9z/).
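
One way an entry server might mint the identifier and build the two URLs described above is sketched here. The proposal does not say how identifiers are generated, so the 16-character random alphanumeric token, the function names, and the http:// scheme are assumptions.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def new_upload_id(length=16):
    """Return a random alphanumeric id (assumed format, e.g. hJlw24lF814Jqm9z)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def status_url(entry_host, upload_id):
    """URL the uploader checks for progress."""
    return f"http://{entry_host}/?status={upload_id}"


def data_url(entry_host, upload_id):
    """Directory where the entry server publishes the transcoded file."""
    return f"http://{entry_host}/data/{upload_id}/"
```

Using an unguessable identifier fits the anonymity principle: the status URL can be given to the uploader without recording who they are.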

Meanwhile, primary storage servers, on a scheduled job, iterate over all available entry servers asking for content. When queried, if the entry server has content that has not been passed to a storage server, it hands that content to the requesting storage server and logs which storage server received it.

Once the storage server copies the content, it responds to the entry server saying it's complete. At that point the entry server publishes, via the status URL, the canonical URL of the uploaded content (e.g. a.store.gc.domain/hJlw24lF814Jqm9z/item.ogg) to the uploader and deletes the content from the entry server.
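
The hand-off between entry and storage servers can be modelled in memory to check the order of events. The sketch below is not a wire protocol: the class and method names are hypothetical, the network transfer is replaced by a dictionary copy, and the item.ogg filename is taken from the example URL above.

```python
class EntryServer:
    def __init__(self, host):
        self.host = host
        self.content = {}      # upload_id -> transcoded data awaiting pickup
        self.handoff_log = {}  # upload_id -> letter of the storage server that took it
        self.status = {}       # upload_id -> canonical URL, once stored

    def publish(self, upload_id, data):
        """Make a transcoded file available for storage servers to collect."""
        self.content[upload_id] = data

    def offer(self, storage_letter):
        """Hand over content not yet passed to any storage server and log the recipient."""
        batch = {uid: data for uid, data in self.content.items()
                 if uid not in self.handoff_log}
        for uid in batch:
            self.handoff_log[uid] = storage_letter
        return batch

    def confirm(self, upload_id, filename="item.ogg"):
        """Called when storage reports the copy is complete."""
        letter = self.handoff_log[upload_id]
        self.status[upload_id] = f"{letter}.store.gc.domain/{upload_id}/{filename}"
        del self.content[upload_id]  # the entry server no longer keeps the file


class PrimaryStorage:
    def __init__(self, letter):
        self.letter = letter
        self.stored = {}

    def poll(self, entry_servers):
        """Scheduled job: ask every entry server for content and confirm each copy."""
        for entry in entry_servers:
            for uid, data in entry.offer(self.letter).items():
                self.stored[uid] = data
                entry.confirm(uid)
```

Logging the recipient before the copy completes is what keeps a second storage server from collecting the same item while the first transfer is still in progress.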

Storage servers, on entering the system, will publish their maximum capacity. When they reach capacity, a storage server admin may turn off their scheduled job to stop additional downloads. Storage servers will advertise their status via a public URL (e.g. a.store.gc.domain/status).
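
The proposal does not specify what the status URL returns. As one possibility, a storage server could publish a small JSON document like the one built below; the field names and the `accepting` flag are assumptions for illustration.

```python
import json


def storage_status(max_bytes, used_bytes):
    """Build a status record from the published maximum capacity and current usage."""
    free = max(max_bytes - used_bytes, 0)
    return {
        "max_bytes": max_bytes,
        "used_bytes": used_bytes,
        "free_bytes": free,
        "accepting": free > 0,  # assumed flag: False once capacity is reached
    }


def status_document(max_bytes, used_bytes):
    """Serialize the record as JSON for serving at the public status URL."""
    return json.dumps(storage_status(max_bytes, used_bytes), sort_keys=True)
```

In the proposal the decision to stop downloading is made by the administrator turning off the scheduled job; a published flag like `accepting` would only make that decision visible to other servers.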

Backup servers can choose a primary storage server to mirror based on the published storage capacity. Backup storage servers run a scheduled job to mirror the content from the primary storage servers. The backup storage servers only exist to recover from a primary server going down. They do not deliver content to the public.
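
How a backup server picks a primary "based on the published storage capacity" is left open. One plausible rule, shown here purely as an assumption, is to mirror the largest primary whose published maximum capacity still fits on the backup server's own disk.

```python
def choose_primary(published_capacity, backup_capacity):
    """Pick a primary to mirror.

    published_capacity maps a primary hostname to its published maximum
    capacity in bytes. Returns the largest primary that fits within
    backup_capacity, or None if none fits. The rule itself is hypothetical.
    """
    candidates = {host: cap for host, cap in published_capacity.items()
                  if cap <= backup_capacity}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Choosing by maximum capacity rather than current usage means the backup server will never be outgrown by the primary it mirrors.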

Caching servers provide load and bandwidth balancing for the primary servers.
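
The caching role can be illustrated with a pull-through cache: the caching server answers from its local copy and contacts the primary only on a miss. This is a minimal sketch; the `CachingServer` class and the `fetch_from_primary` callable are hypothetical, and a real deployment would also need eviction and expiry.

```python
class CachingServer:
    def __init__(self, fetch_from_primary):
        # fetch_from_primary: callable taking a path and returning the content
        # from the corresponding primary storage server (assumed interface)
        self._fetch = fetch_from_primary
        self._cache = {}

    def get(self, path):
        """Serve from the local cache, pulling from the primary on a miss."""
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
        return self._cache[path]
```

With this arrangement the primary serves each item once per caching server, and repeated public requests consume only the caching servers' bandwidth.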

Last modified on Oct 15, 2009, 2:49:11 PM