Version 31 (modified by Jackrabbit, 7 years ago)

--

Gitification of Indymedia linksunten

Introduction

Until May 2012, Indymedia linksunten used the Good Old Fashioned Way™ to keep track of upstream changes to its Drupal (in fact: Pressflow) core and modules: drush dl mymodule. At least in theory. In reality, the core has been patched twice, many modules even more often, and some self-written modules do not even exist in a public Drupal repository. linksunten has some 80 modules installed, and keeping track of updates is wearisome for the non-patched modules and troublesome for the patched ones. We learned that a version control system could ease the error-prone update procedure, and as drupal.org has migrated to git we decided to do the same. Beforehand, we evaluated mercurial and bazaar as well; from a technical point of view we could have chosen any of the three.

Since a Drupal website is not a monolithic block and nearly every module is maintained by different developers, we needed a way to update the core and each module separately. The traditional way git offers for this is a concept called git-submodule. It is complicated, unintuitive and detested by many for good reasons. But as git follows the TMTOWTDI paradigm we could avoid git-submodule and settled for git-subtree instead, which has recently been merged into the git source tree. Besides being able to update the core and each module separately and to replay our patches onto the updated version automatically, we want the linksunten code to be one git repository which simply "works" after cloning it. After our move to git we will use the features and ctools modules to version control as much of our configuration data as possible.

Drupal installation

Drupal core

We create a new directory, run git init, use git-config to specify which name and email to use, create a temporary file which we commit, delete and commit again to create a master branch:

mkdir liu_d6
cd liu_d6
git init
git config user.name "Indymedia linksunten"
git config user.email "línksunten@índymedía.org"
touch liu_d6
git add liu_d6
git commit -m "Initial commit Indymedia linksunten Drupal 6."
git rm liu_d6
git commit -m "Created master branch."
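
As an aside, the placeholder file is not strictly needed to obtain a first commit. The following is a minimal sketch, assuming a git version that supports the --allow-empty option of git-commit; the temporary directory and the identity are placeholders, not part of our setup:

```shell
set -e
# Throwaway directory standing in for liu_d6 (placeholder).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.org"
# --allow-empty records a root commit whose tree contains no files,
# so no temporary file has to be added and removed again.
git commit -q --allow-empty -m "Initial commit Indymedia linksunten Drupal 6."
commit_count=$(git rev-list --count HEAD)
tracked_files=$(git ls-files | wc -l | tr -d ' ')
echo "commits: $commit_count, tracked files: $tracked_files"
```

Either way the result is a branch with a root commit that git-subtree can add to.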

We don't want the settings.php file in our repository as it contains database login credentials so we tell git to exclude it:

echo settings.php >> .git/info/exclude
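
To check that the exclude rule behaves as intended, one can try it in a scratch repository. This is only a sketch; the directory layout and the second file other.php are made up for the illustration:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p .git/info
# A pattern without a slash matches a file of that name in any directory.
echo settings.php >> .git/info/exclude
mkdir -p core/sites/default
touch core/sites/default/settings.php core/sites/default/other.php
# Only files that are not excluded show up as untracked.
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```

Note that .git/info/exclude is local to one clone and is not transferred by git-clone, so the rule has to be repeated on every checkout.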

We add Pressflow 6 as a new remote and fetch it:

git remote add pressflow-6.x git://github.com/pressflow/6.git
git fetch pressflow-6.x

Now we can add Drupal core from our pressflow-6.x remote via git-subtree in a subdirectory core. We use the --squash parameter as we do not need the whole commit history of Pressflow in our master branch:

git subtree add --squash --prefix=core pressflow-6.x/drupal-6.26

As we are creating a production environment, we'll delete some of the unnecessary files:

git rm core/install.php core/*.txt
git commit -m "Delete install.php and text files from core directory."

We copy our settings.php to core/sites/default and delete the default.settings.php:

cp ~/settings.php core/sites/default
git rm core/sites/default/default.settings.php
git commit -m "Deleted default.settings.php."

There are some more modifications to make (like applying two core patches with git-am that were created with git-format-patch beforehand), but they are really specific to Indymedia linksunten, so we leave them out.

Drupal modules

Normally, Drupal modules are installed under sites/default/modules. This would be fine with our approach, but it would create unnecessarily huge merges when updating the core, and keeping all parts separately accessible from the root directory of our installation is a much clearer arrangement. So we create a modules (and perhaps also a files, libraries and themes) directory, create a symbolic link to it, then add and commit the link:

mkdir modules
cd core/sites/default
ln -s ../../../modules
cd ../../../
git add core/sites/default/modules
git commit -m "Add symbolic link to modules directory."
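
Git stores a symbolic link as a special entry (mode 120000) whose content is the link target, so the relative path is what ends up in the repository. A small sketch in a scratch repository, assuming a filesystem that supports symlinks:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p modules core/sites/default
# The target is relative to the directory that contains the link.
ln -s ../../../modules core/sites/default/modules
git add core/sites/default/modules
# The first column of "git ls-files -s" is the file mode.
link_mode=$(git ls-files -s core/sites/default/modules | cut -d' ' -f1)
link_target=$(readlink core/sites/default/modules)
echo "mode: $link_mode, target: $link_target"
```

Because the target is relative, the link keeps working wherever the repository is cloned.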

Now we install a module in it. As an example we choose the i18n module. On the Drupal project page we click on the green "Version Control" tab and choose "Version to work from: 6.x-1.x". There we get the URL we need to add the repository as a remote. The parameter -t restricts the remote to the 6.x-1.x branch and -f triggers an immediate fetch:

git remote add -f -t 6.x-1.x i18n-6.x-1.x git://git.drupal.org/project/i18n.git

Now we do not install the latest version 6.x-1.10 but version 6.x-1.9. The reason is that we have patched that version and we want to use git-subtree and git-rebase to reapply our patches to the newest version. First, we install 6.x-1.9:

git subtree add --squash --prefix="modules/i18n" 6.x-1.9

Then we overwrite the newly imported files with our patched version and commit the patches. At this point, interactive staging might be a good idea.

cp ~/i18n.pages.inc modules/i18n
cp ~/i18nsync.module modules/i18n/i18nsync
git add modules/i18n/i18n.pages.inc
git commit -m "i18n: Exchange title with nid in translation box."
git add modules/i18n/i18nsync/i18nsync.module
git commit -m "i18n: Inherit path when syncing."

Drupal update

We are going to update a patched Drupal module using git-subtree and git-rebase. We create a new branch containing the version of our module we want to update to. As we have already added i18n as a git-remote, we can use git-branch to create a separate branch for the desired version from the corresponding tag. As long as there are no separate namespaces for remote tags in git, we definitely want to start with git-fetch to be sure to refer to tags of the right module:

git fetch i18n-6.x-1.x
git branch i18n-6.x-1.10 6.x-1.10
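
Because all tags share one namespace, it can be worth confirming that the local tag really is the one published by the intended remote. The sketch below simulates this offline; the upstream repository is a local stand-in created on the spot, not git.drupal.org:

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical stand-in for the upstream i18n repository.
git init -q upstream
git -C upstream -c user.name=Example -c user.email=example@example.org \
  commit -q --allow-empty -m "Release 6.x-1.10"
git -C upstream tag 6.x-1.10
# Our site repository with the module remote.
git init -q site
cd site
git remote add i18n-6.x-1.x "$work/upstream"
git fetch -q i18n-6.x-1.x
# Hash the remote advertises for the tag versus what the local tag points to.
remote_sha=$(git ls-remote --tags i18n-6.x-1.x 6.x-1.10 | cut -f1)
local_sha=$(git rev-parse 6.x-1.10)
echo "remote: $remote_sha"
echo "local:  $local_sha"
```

If the two hashes differ, the local tag was probably fetched from another module's remote.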

Then we extract the patched version of our module into a branch along with its history:

git subtree split --rejoin --prefix=modules/i18n --branch=i18n-linksunten

Now we can git-rebase our branch on top of the new version of the module:

git rebase i18n-6.x-1.10 i18n-linksunten

Finally, we have to subtree merge the patched new version into our master branch:

git checkout master
git subtree merge --squash --prefix=modules/i18n i18n-linksunten

After that, you can delete the two branches:

git branch -D i18n-6.x-1.10 i18n-linksunten

The same process can be applied to update the core. Hopefully, you did not need to patch the core (as we did) so git-fetch will result in fast-forward merges.

hook_system_info_alter

In the past, drupal.org used CVS as its version control system and switched to git only recently. Unfortunately, not all module maintainers have adapted their code base to the new revision control system manually. Instead, the Drupal team migrated lots of projects automatically. So at least at the moment, you'll discover that the release versions of many modules are not tagged at all.

Another problem is the available updates page at /admin/reports/updates and the version information obtained by drush pm-list. When checking out a module via git no version information is added by the drupal.org package manager. Enter git_deploy. More precisely commit 68bd1a8219cbe59e7fbe56b317a600321116ddfa from Thu Apr 26 10:15:16 2012 -0700.

git_deploy 6.x-2.x

<?php

/**
 * @file
 *
 * This module add versioning information to projects checked out of git.
 */

/**
 * Implement hook_system_info_alter() to provide metadata to drupal from git.
 *
 * We support populating $info['version'] and $info['project'].
 *
 * @param $info
 *   The module/theme info array we're altering.
 * @param $file
 *   An object describing the filesystem location of the module/theme.
 */
function git_deploy_system_info_alter(&$info, $file) {
  $type = isset($info['engine']) ? 'theme' : 'module';
  if (empty($info['version'])) {
    $directory = dirname($file->filename);
    // Check whether this belongs to core. Speed optimization.
    if (substr($directory, 0, strlen($type)) != $type) {
      while ($directory && !is_dir("$directory/.git")) {
        $directory = substr($directory, 0,  strrpos($directory, '/'));
      }
      $git_dir = "$directory/.git";
      // Theoretically /.git could exist.
      if ($directory && is_dir($git_dir)) {
        $git = "git --git-dir $git_dir";
        // Find first the project name based on fetch URL.
        // Eat error messages. >& is valid on Windows, too. Also, $output does
        // not need initialization because it's taken by reference.
        exec("$git remote show -n origin 2>&1", $output);
        if ($fetch_url = preg_grep('/^\s*Fetch URL:/', $output)) {
          $fetch_url = current($fetch_url);
          $project_name = substr($fetch_url, strrpos($fetch_url, '/') + 1);
          if (substr($project_name, -4) == '.git') {
            $project_name = substr($project_name, 0, -4);
          }
          $info['project'] = $project_name;
        }
        // Try to fill in branch and tag.
        exec("$git rev-parse --abbrev-ref HEAD 2>&1", $branch);
        $tag_found = FALSE;
        if ($branch) {
          $branch = $branch[0];
          // Any Drupal-formatted branch.
          $branch_preg =  '\d+\.x-\d+\.';
          if (preg_match('/^' . $branch_preg . 'x$/', $branch)) {
            $info['version'] = $branch . '-dev';
            // Nail down the core and the major version now that we know
            // what they are.
            $branch_preg = preg_quote(substr($branch, 0, -1));
          }
          // Now try to find a tag.
          exec("$git rev-list --topo-order --max-count=1 HEAD 2>&1", $last_tag_hash);
          if ($last_tag_hash) {
            exec("$git describe  --tags $last_tag_hash[0] 2>&1", $last_tag);
            if ($last_tag) {
              $last_tag = $last_tag[0];
              // Make sure the tag starts as Drupal formatted (for eg.
              // 7.x-1.0-alpha1) and if we are on a proper branch (ie. not
              // master) then it's on that branch.
              if (preg_match('/^(' . $branch_preg . '\d+(?:-[^-]+)?)(-(\d+-)g[0-9a-f]{7})?$/', $last_tag, $matches)) {
                $tag_found = TRUE;
                $info['version'] = isset($matches[2]) ? $matches[1] . '.' . $matches[3] . 'dev' : $last_tag;
              }
            }
          }
        }
        if (!$tag_found) {
          $last_tag = '';
        }
        // The git log -1 command always succeeds and if we are not on a
        // tag this will happen to return the time of the last commit which
        // is exactly what we wanted.
        exec("$git log -1 --pretty=format:%at $last_tag 2>&1", $datestamp);
        if ($datestamp && is_numeric($datestamp[0])) {
          $info['datestamp'] = $datestamp[0];
        }

        // However, the '_info_file_ctime' should always get the latest value.
        if (empty($info['_info_file_ctime'])) {
          $info['_info_file_ctime'] = $datestamp[0];
        }
        else {
          $info['_info_file_ctime'] = max($info['_info_file_ctime'], $datestamp[0]);
        }
      }
    }
  }
}
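
The tag handling in the listing above is hard to follow from the regular expression alone. The following shell sketch mirrors, as far as we read the listing, how the output of git describe --tags is turned into a version string. The function name describe_to_version is ours and not part of git_deploy, the branch check is left out, and the hash length is loosened to seven or more hex digits:

```shell
# Map "git describe --tags" output to a Drupal-style version string.
# An exact tag such as 6.x-1.9 is returned unchanged; a description such as
# 6.x-1.9-3-gabc1234 (three commits after the tag) gets a "-dev" suffix.
describe_to_version() {
  printf '%s\n' "$1" |
    sed -E 's/^([0-9]+\.x-[0-9]+\.[0-9]+(-[^-]+)?)-([0-9]+)-g[0-9a-f]{7,}$/\1.\3-dev/'
}

on_tag=$(describe_to_version "6.x-1.9")
after_tag=$(describe_to_version "6.x-1.9-3-gabc1234")
echo "$on_tag"
echo "$after_tag"
```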

Analysis of git_deploy

Version 1.x of git_deploy was based on glip, a Git Library In PHP. Version 2.x of git_deploy calls the git executable directly and parses the output instead. There might be issues in a shared hosting environment but many people report that the 2.x version works far better than the 1.x version, so we'll adapt the idea of git_deploy 2.x to git-subtree.

Let's analyse what git_deploy does. The module implements only one hook: hook_system_info_alter. With this hook, the module info obtained through git can be injected into Drupal.

git_deploy searches for the .git directory of the module, so it will only work for git-submodules:

      while ($directory && !is_dir("$directory/.git")) {
        $directory = substr($directory, 0,  strrpos($directory, '/'));
      }

git_deploy then uses the fetch url obtained by git remote show -n origin to determine the project name:

        exec("$git remote show -n origin 2>&1", $output);
        if ($fetch_url = preg_grep('/^\s*Fetch URL:/', $output)) {
          $fetch_url = current($fetch_url);
          $project_name = substr($fetch_url, strrpos($fetch_url, '/') + 1);
          if (substr($project_name, -4) == '.git') {
            $project_name = substr($project_name, 0, -4);
          }
          $info['project'] = $project_name;
        }
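
The string handling in this excerpt boils down to taking the last path component of the fetch URL and dropping a trailing .git. As a rough shell equivalent (the function name project_from_url is made up for this sketch):

```shell
# Derive a project name from a fetch URL, as the PHP excerpt does.
project_from_url() {
  name=${1##*/}                  # keep everything after the last slash
  printf '%s\n' "${name%.git}"   # drop a trailing .git if present
}

project=$(project_from_url "git://git.drupal.org/project/i18n.git")
echo "$project"
```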

This approach won't work for git-subtree, as there is no mapping of modules to remotes and remotes are not exported, so a git-clone won't know the original fetch URLs used for git-subtree-add. Fortunately, drupal.org uses well-defined fetch URLs, so we can reconstruct the information by scanning for .module files. But the process will be much more complicated and time-consuming with git-subtree than it is with git-submodule, as we would either have to keep the module's history in a separate branch, incorporate the whole history by not using the --squash parameter, use git-ls-remote to search git.drupal.org, or (temporarily) git-clone the module.

Keeping the whole history in a separate branch would work but this is a fragile approach because we'd lose the liberty to mess around with our repository which is one of the key advantages of the git-subtree approach compared to the git-submodule one. We do want to use the --squash parameter to keep the overall size of the git repo small and the git history uncluttered. git-ls-remote is too slow when having lots of modules installed and searching through remote git repositories does not feel right at all. So the only sensible way seems to be to clone the repository and determine the necessary info locally.

Outline of git_subtree

One way to solve the problem would be a Drupal module which uses the Drush API to clone the git repository of each Drupal module locally and then determines the version information from the git-subtree-split line returned by git-log. It could write this information to a mymodule.gitinfo file. A module could then read the mymodule.gitinfo file, and an implementation of hook_system_info_alter could inject the information into the local Drupal ecosystem. By adding *.gitinfo to .gitignore we could be sure that these files do not cause any problems when syncing with upstream repositories.
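
To make the idea more concrete, here is a sketch of what such a file and the ignore rule might look like. The keys and values are assumptions for illustration (they follow the usual Drupal .info syntax); nothing here is produced by an existing tool:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p modules/i18n
# Keep generated metadata out of the repository.
echo '*.gitinfo' >> .gitignore
# Hypothetical contents of a generated gitinfo file.
cat > modules/i18n/i18n.gitinfo <<'EOF'
project = "i18n"
version = "6.x-1.9"
EOF
# The gitinfo file must not appear as untracked; only .gitignore does.
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```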

git_subtree_system_info_alter()

This function takes a .gitinfo file and uses Drupal's drupal_parse_info_file() to populate the $info variable through hook_system_info_alter:

/**
 * Implement hook_system_info_alter() to provide metadata to drupal from git.
 *
 * We support populating $info from a gitinfo file.
 *
 * @param $info
 *   The module/theme info array we're altering.
 * @param $file
 *   An object describing the filesystem location of the module/theme.
 */
function git_subtree_system_info_alter(&$info, $file) {
  // Determine whether this is a theme or a module
  $type = isset($info['engine']) ? 'theme' : 'module';
  // We only need to look for a version if it is not yet set
  if (empty($info['version'])) {
    // Get the filename. It is used only as a filesystem path, so it is not
    // passed through check_plain(), which would HTML-escape it.
    $filename = $file->filename;
    // Get the directory of the theme/module
    $directory = dirname($filename);
    // Check whether this belongs to core. Speed optimization.
    if (drupal_substr($directory, 0, drupal_strlen($type)) != $type) {
      // Guess the gitinfo filename by replacing the final file extension only
      $gitinfo_filename = preg_replace('/\.[^.\/]*$/', '.gitinfo', $filename);
      // Check whether a gitinfo file exists
      if (file_exists($gitinfo_filename) && is_readable($gitinfo_filename)) {
        // Parse the gitinfo file
        $gitinfo = drupal_parse_info_file($gitinfo_filename);
        // Populate $info with data from gitinfo file
        foreach (array_keys($gitinfo) as $key) {
          if (isset($gitinfo[$key])) {
            $info[$key] = $gitinfo[$key];
          }
        }
      }
    }
  }
}
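
A script that generates the gitinfo files would need to derive the same file name from a module's path. A possible shell counterpart, assuming the module file name always carries an extension (the function name gitinfo_path is ours):

```shell
# Replace the final extension of a module or theme file with .gitinfo.
# Assumes the last path component contains a dot.
gitinfo_path() {
  printf '%s\n' "${1%.*}.gitinfo"
}

path=$(gitinfo_path "modules/i18n/i18nsync/i18nsync.module")
echo "$path"
```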