Changes between Version 7 and Version 8 of container-infrastructure

Jul 9, 2018, 9:38:11 AM (3 years ago)
Jamie McClelland



  • container-infrastructure

= Container Infrastructure =

Our current infrastructure was designed in 2005 primarily to provide web, email and database services to our members. A lot has changed since then, both within the organization and in the available technology. It is time to reorganize our infrastructure to address our shortcomings and take advantage of the new technologies available to us.

This document provides a less technical overview of the big picture and our objectives. A [wiki:red-mosh-reorganization companion piece provides more technical details about the immediate steps we are planning] to reach this goal.

== The Problem ==

**The primary problem with our current infrastructure is inflexibility:** when a site suddenly becomes popular, it is painfully slow and tedious to re-allocate hardware to support it. When a web site is attacked, it is hard to move it behind a better firewall to block the attacks. And when a site is compromised, it is hard to isolate it so it does not negatively impact other members. As our membership grows and increasingly experiences sudden and dramatic changes in technology needs, we need to be able to handle them in a matter of minutes rather than hours or days.

== The Technology ==

The main technology explored in this plan is called [ containers]. Containers are a way to run Internet services, like a single web site, in a way that is isolated from everything else on the same server and can easily be re-created on other servers.

== The Goals ==

The container technology we will explore is all based on free, open source software; however, its development is driven by capitalism. So it is important that we keep our goals clear and understand where they differ from the goals of the technology we are implementing.

In particular, container-based technology often assumes the use of hardware leased from corporations like Amazon, whereas our politics require us to fully own and control our own hardware. This distinction has an important impact on how we approach the technology.

**The primary goal of our project is to allow us to more flexibly allocate our limited computer hardware to meet the needs of our members.**

There are also a number of secondary goals which we hope to achieve, but not at the expense of our primary goal:

 * ''Ability to scale from a few thousand users to millions of users:'' this is the primary goal of most container-based technologies, but it only marginally applies to us. Yes, we want to be able to handle a web site that becomes explosively popular overnight. However, our primary need is to handle thousands of relatively low-traffic web sites rather than a single high-traffic web site. This is still an important secondary goal so that we can support the few members who are focused on growing their Internet resources into the millions.

 * ''Ability to instantly recover from hardware failure:'' this is also a primary goal of most container-based technologies, but it does not apply well to us. It largely depends on having more than double the hardware capacity you need to run your servers. When you have access to leased hardware via Amazon, this is simple and affordable. When you own all of your hardware, it is prohibitively expensive. This remains an important secondary goal, and it will still be possible to recover manually from hardware failure in a matter of minutes. However, automatic fail-over will most likely not be feasible for all member services.

== First Steps ==

It is impractical to simply switch from our current infrastructure to a new container-based infrastructure. Instead, we will need to plan a transition.

Our current infrastructure was designed to be distributed as a protection against computers breaking down. Toward that goal, most of our services are organized into individual servers called MOSHes, each of which provides web, email, database, and SSH/SFTP services on a single virtual server shared by about 50 members. Each MOSH is mostly independent of all other servers: it will keep working even if every other server goes down. Currently we have about 75 MOSH servers.

Below are some first steps we will need to take within our current infrastructure design to prepare for the transition to a container-based design.

=== Routing ===

Our current infrastructure mostly uses the Domain Name System (DNS) to determine which of our 75 MOSHes each member's web sites, email, etc. should be routed to.

To prepare for the container-based approach, we will need to change this so that DNS routes all members to one or several public-facing servers, which in turn route each request to the appropriate place in our network.

We are already using this approach for email: all members configure their email programs to send and receive via - which in turn routes the request to the appropriate server.

We are also starting to provide this service for web sites: we have one web server that provides caching to protect sites from DDoS attacks and high traffic, and that in turn routes web traffic to the appropriate server. This approach will need to be developed further into a generic service for all web sites.

Lastly, we have not yet started implementing this approach for incoming email (MX services) or SSH/SFTP, which are still routed via DNS, or for MySQL servers, which are all accessed via localhost.
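
The routing change described above can be illustrated with a minimal Python sketch. It assumes a single public-facing proxy that keeps its own table mapping each member domain to an internal server. The `Router` class and the host names are hypothetical. A real deployment would rely on an existing proxy, such as the caching web server mentioned above, rather than on this code.

```python
# Hypothetical sketch: every member domain resolves (via DNS) to the same
# public-facing proxy, and the proxy decides which internal server gets
# the request. Class and host names are illustrative only.


class Router:
    def __init__(self) -> None:
        # Maps a member's domain name to the internal server hosting it.
        self.backends: dict[str, str] = {}

    def assign(self, domain: str, backend: str) -> None:
        """Record (or change) which internal server hosts a domain."""
        self.backends[domain.lower()] = backend

    def route(self, domain: str) -> str:
        """Return the internal server that should receive a request."""
        try:
            return self.backends[domain.lower()]
        except KeyError:
            raise LookupError(f"no backend configured for {domain}") from None


router = Router()
router.assign("example.org", "mosh-12.internal")
print(router.route("example.org"))  # mosh-12.internal

# Moving a busy or attacked site is one table update on the proxy;
# the site's public DNS records never change.
router.assign("example.org", "mosh-40.internal")
print(router.route("Example.org"))  # mosh-40.internal
```

In this design, moving a site becomes a change to one table on our own servers. A DNS change, by contrast, only takes full effect once cached records expire.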

=== Authentication ===

Our current authentication system is a mish-mash of: the MySQL database provided by our control panel (the final authority); a [wiki:login-service login service API] that is backed by that database; an OpenID system (also backed by the database) that is due to be retired; and a process that keeps traditional /etc/shadow files in sync with the control panel's MySQL database.

These will need to be replaced by a single, distributed system: most likely LDAP, [ FreeIPA], or an improved SQL-based solution.

With a single system, we can manage user authentication as well as common user and group IDs across all servers, which helps ensure that file system permissions are preserved.
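
A minimal Python sketch shows why a single authoritative directory helps preserve file permissions. The `Directory` class, the starting ID of 10000, and the convention of giving each user a private group whose gid equals their uid are all assumptions made for illustration. In practice this role would be played by LDAP, FreeIPA or an SQL database.

```python
# Hypothetical sketch of one authoritative user directory shared by all
# servers. Names, the starting ID and the "gid == uid" private-group
# convention are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    username: str
    uid: int
    gid: int


class Directory:
    def __init__(self, first_id: int = 10000) -> None:
        self._next_id = first_id
        self._accounts: dict[str, Account] = {}

    def add_user(self, username: str) -> Account:
        """Create an account once; its IDs are never reused or reassigned."""
        if username in self._accounts:
            return self._accounts[username]
        # Assumption: each user gets a private group with gid == uid.
        account = Account(username, self._next_id, self._next_id)
        self._accounts[username] = account
        self._next_id += 1
        return account

    def lookup(self, username: str) -> Account:
        """Every server asks this same directory, so all see the same IDs."""
        return self._accounts[username]


directory = Directory()
directory.add_user("alice")
directory.add_user("bob")

# A file written as uid 10001 on one server still belongs to "bob" after
# its disk is remounted on another server, because both servers resolve
# "bob" through the same directory.
print(directory.lookup("bob").uid)  # 10001
```

File systems record owners by number, not by name. If two servers assigned different uids to the same member, remounting that member's data elsewhere could hand their files to someone else.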

=== Network storage ===

Network storage means that a hard disk mounted on one physical server can be quickly unmounted from that server and remounted on a different server. It is a critical component of a container-based infrastructure in general, and of meeting our primary goal in particular.

Currently, all hard disks in our network are provided by the physical servers hosting the services, which means moving data is a slow and resource-intensive process.

We will need to invest in a dedicated server to provide file systems to our network and begin experimenting with moving our data to this new server, probably running NFS plus [ DRBD] or [ ceph].
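
The unmount-and-remount idea can be modelled in a few lines of Python. This is only a model. The `Volume` class, the `move` function and the server names are hypothetical. The sketch assumes a block device, such as a DRBD or Ceph block device, that carries an ordinary file system and therefore must not be mounted on two servers at once. NFS works differently, because it is designed to be mounted by many clients simultaneously.

```python
# Hypothetical model of moving a network volume between servers. It only
# captures the rule that an ordinary (non-clustered) file system must be
# unmounted from one server before it is mounted on another.
from typing import Optional


class Volume:
    def __init__(self, name: str) -> None:
        self.name = name
        self.mounted_on: Optional[str] = None  # server currently using it

    def mount(self, server: str) -> None:
        # Mounting on a second server at the same time could corrupt the
        # file system, so refuse instead.
        if self.mounted_on is not None:
            raise RuntimeError(f"{self.name} is already mounted on {self.mounted_on}")
        self.mounted_on = server

    def unmount(self) -> None:
        self.mounted_on = None


def move(volume: Volume, new_server: str) -> None:
    """Hand a member's data to another server without copying it."""
    volume.unmount()
    volume.mount(new_server)


volume = Volume("member-data-07")
volume.mount("server-a")
move(volume, "server-b")
print(volume.mounted_on)  # server-b
```

Note that `move` touches no file data. The slow, resource-intensive copying of data between local disks is replaced by an unmount and a mount.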

This page has been moved to [wiki:infrastructure-2018].