From June 2015
Feature:

Wait, $langcode? What the Heck?

Writing Language-aware Drupal 8 Code That Just Works

If that was the most polite thought that crossed your mind when dealing with the Drupal 7 Field API, please read on.

Whether you build complex multilingual sites, or just hearing the words “Drupal” and “language” in the same sentence makes you want to hide in the darkest corner of your office, there are a few language-related notions you really need to know to write Drupal 8 code that works properly. After all, language is an intrinsic property of textual content. Drupal is supposed to manage content, so having to deal with language does not seem such a far-fetched idea, does it?

Speaking of Content

Historically, “content” in Drupal has been a user-friendly way to refer to nodes. In Drupal 8, however, content has a broader meaning. It refers to any entity type that stores information usually meant to be experienced in some form by a certain set of site users.

Content entities, such as nodes, comments, terms, custom blocks, custom menu links, and users, are all examples of this kind of entity type. The other main category is configuration entities. Node types, views, fields, menus, and roles are meant to store information mainly related to determining site behavior. Note that this distinction is not always clear-cut. In some cases, the choice of one category over the other is determined mainly by implementation details, as with module-provided menu links.

To sum up, when we speak of content in Drupal 8, most of the time we are referring to content entity types.

Multilingual Content: A Bit of History

Drupal 7 introduced a new way of translating content by adding native multilingual support to the Field API. This made it possible to store multilingual values for any field attached to any entity type. However, code that implements business logic needs to deal with field language explicitly. That makes for a very poor developer experience (DX), as this infamous field data structure shows:

$entity->{$field_name}[$langcode][$delta][$column]

The values of the $delta and $column levels are predictable. Deciding what the correct value should be for the $langcode level, by contrast, is definitely not trivial. Depending on whether the field is translatable or not, that level may hold an actual language code or LANGUAGE_NONE ('und'). Dealing with a translatable field in Drupal 7 can be accomplished in two different ways, depending on the business logic being implemented:

  1. Acting on all available translations, which is what the Field API does in (C)RUD operations.
  2. Acting on a single language, the active language, which is what the Field API does when rendering the entity or its edit form.
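To make the problem concrete, here is a minimal standalone PHP sketch, which runs without Drupal, that models the Drupal 7 field array in memory. The helpers d7_field_value() and d7_field_languages() and the sample data are hypothetical. The only real-world detail it borrows is that Drupal 7's LANGUAGE_NONE constant is 'und'. It shows that the caller, not the API, must know whether a field is translatable in order to pick the right $langcode. It also shows that acting on all translations means walking the language level of the array.

```php
<?php
// Drupal 7's LANGUAGE_NONE constant has the value 'und'; it is defined here
// only so that this standalone sketch runs outside Drupal.
if (!defined('LANGUAGE_NONE')) {
  define('LANGUAGE_NONE', 'und');
}

// Hypothetical entity mirroring $entity->{$field_name}[$langcode][$delta][$column].
$entity = new stdClass();
$entity->body = [
  'en' => [0 => ['value' => 'Hello']],
  'it' => [0 => ['value' => 'Ciao']],
];
$entity->field_price = [
  LANGUAGE_NONE => [0 => ['value' => '9.99']],
];

// Hypothetical helper: the caller must already know whether the field is
// translatable in order to choose the right language key.
function d7_field_value(stdClass $entity, string $field_name, bool $translatable,
                        string $active_langcode, int $delta = 0,
                        string $column = 'value'): ?string {
  $langcode = $translatable ? $active_langcode : LANGUAGE_NONE;
  return $entity->{$field_name}[$langcode][$delta][$column] ?? NULL;
}

// Hypothetical helper: "acting on all translations" means iterating the
// language level of the field array.
function d7_field_languages(stdClass $entity, string $field_name): array {
  return array_keys($entity->{$field_name});
}

d7_field_value($entity, 'body', TRUE, 'it');          // 'Ciao'
d7_field_value($entity, 'field_price', FALSE, 'it');  // '9.99'
d7_field_value($entity, 'field_price', TRUE, 'it');   // NULL: a wrong guess fails silently
```

The last call is the real trap. Guessing the wrong translatability does not raise an error; it simply finds no value.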

Dear developer: As you may have guessed (or learned the hard way), figuring all this out is your responsibility. (Ouch!)

One of the reasons dealing with field language is so hard in Drupal 7 is that core does not provide a full-fledged Entity API to build on, despite all the work that went into designing and coding the Field API. As a consequence, the Drupal 7 “Entity Language API” is an inconsistent mess spread across three places:

  • Core provides an entity_language() function that can be used to determine the active language.
  • The Entity API module provides the entity metadata wrapper which, among other things, makes it easier to access field values and deal with their language.
  • The entity translation CRUD hooks are provided by the Entity Translation module.

Content Translation Models

In Drupal 7 we have two competing models to translate content:

  • The core Content Translation module allows us to translate nodes by creating one node for each translation and grouping them into translation sets.
  • The Entity Translation module relies on the Field Language API to implement field translation support for any fieldable entity type.

The main reason why the field translation model was introduced is that the node translation model, although easy to implement and support on a superficial level, has several drawbacks when trying to deal with more advanced use cases, in particular when needing to share data among translations.

Another issue arises when a website needs a high degree of symmetry among its various language versions: identifiers are different for each language.

Last but not least, we need to be able to translate any entity type, and extending the translation set approach to work universally is definitely not a trivial effort. On the other hand, making field translation work properly is not an easy task either, even leaving the DX issues aside. In fact, non-field data attached to entities is not supported at all. It requires a workaround like the Title module, which allows us to translate entity labels.

Having two competing models is a bad situation. Site builders have to pick one, developers have to support both, and everyone has to understand both. This leads to additional cognitive, operative, and maintenance burdens that, in many cases, make building a multilingual website in Drupal 7 a painful experience. In Drupal 8, we faced the need to resolve this “conflict” and provide a single content translation model. The solution we found is a unified model. If every piece of data is a field, we can replicate the node translation model by making every field translatable. The only difference is that we have a single entity for each translation set.
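The difference between the two models is easiest to see in the data itself. The standalone sketch below uses plain PHP arrays. The translation-set rows are loosely modeled on Drupal 7's tnid column, which holds the nid of the source node. The 'translations' map and the helper translation_set_ids() are simplified, hypothetical shapes, not an actual Drupal schema. In the translation-set model, each language has its own identifier and untranslatable data is duplicated. In the unified model, one entity ID covers every language.

```php
<?php
// Translation-set model: one node per language, grouped by a shared tnid.
// 'price' is untranslatable, yet it is duplicated and must be kept in sync.
$translation_set = [
  ['nid' => 1, 'tnid' => 1, 'langcode' => 'en', 'title' => 'Hello', 'price' => '9.99'],
  ['nid' => 2, 'tnid' => 1, 'langcode' => 'it', 'title' => 'Ciao',  'price' => '9.99'],
];

// Unified model: a single entity; only translatable data varies per language.
$unified = [
  'id' => 1,
  'price' => '9.99',            // untranslatable: stored once, shared
  'translations' => [
    'en' => ['title' => 'Hello'],
    'it' => ['title' => 'Ciao'],
  ],
];

// Hypothetical helper: the IDs a client must track for one piece of content
// in the translation-set model.
function translation_set_ids(array $nodes, int $tnid): array {
  $ids = [];
  foreach ($nodes as $node) {
    if ($node['tnid'] === $tnid) {
      $ids[] = $node['nid'];
    }
  }
  return $ids;
}

translation_set_ids($translation_set, 1);  // [1, 2]: one ID per language
```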

The Content Translation UI

The Drupal 8 Content Translation module comes with a very powerful configuration page. It allows us to configure translatability for all the supported entity types, from the bundle level down to individual field properties. It also allows us to configure the default language of new entities, and whether it can be changed.

The content translation UI is very similar to the Drupal 7 one: the main differences are the source language picker and the labels indicating that a field is not translatable.

The “Content language and translation” configuration page.

The content translation user interface.

The table layout for translatable and revisionable entity types

  • The {entity} base table holds entity keys and metadata only:
    | entity_id | uuid | revision_id | bundle_name |
    
  • The {entity_revision} table holds the basic revision entity keys and language:
    | entity_id | revision_id | langcode |
    
  • The {entity_field_data} table stores entity field data per language:
    | entity_id | revision_id | langcode | default_langcode | label |
    
  • The {entity_field_revision} table holds the revisions of field data:
    | entity_id | revision_id | langcode | default_langcode | label |
    

Translating Every Field

In Drupal 7, the main obstacles to making every piece of data a field (and making it translatable) were the huge DX issues mentioned above. Above all, there was no storage layer supporting multilingual values for any field.

In Drupal 8, we can rely on a solid core Entity API that exposes entities as classed objects. This allows us to encapsulate all the complexities involved in dealing with field translatability. We also have an Entity Storage layer that provides a unified, storage-agnostic way to load and store field data. This means entities can be stored in an SQL database (the default implementation), in MongoDB, or in XML files; the choice is entirely up to the storage handler in use. It also means we can bake native multilingual support into each storage handler without touching any other part of the Entity API.

When entity translations are added to or removed from the storage, the following hooks are fired:

  • hook_entity_translation_insert(EntityInterface $translation)
  • hook_entity_translation_delete(EntityInterface $translation)
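As an illustration of when such hooks fire, the following standalone sketch compares the translation languages of an entity before and after a save. It then invokes one callback for each added language and one for each removed language. The function invoke_translation_hooks() and its callback-based dispatch are hypothetical simplifications. Drupal's storage layer may track translation additions and removals differently, rather than diffing language lists like this.

```php
<?php
// Hypothetical sketch: decide which translation hooks to fire on save by
// comparing the language codes stored before and after the save.
function invoke_translation_hooks(array $original_langcodes, array $new_langcodes,
                                  callable $on_insert, callable $on_delete): void {
  // Languages present only after the save: newly added translations.
  foreach (array_diff($new_langcodes, $original_langcodes) as $langcode) {
    $on_insert($langcode);  // plays the role of hook_entity_translation_insert()
  }
  // Languages present only before the save: removed translations.
  foreach (array_diff($original_langcodes, $new_langcodes) as $langcode) {
    $on_delete($langcode);  // plays the role of hook_entity_translation_delete()
  }
}

$log = [];
invoke_translation_hooks(
  ['en', 'it'],
  ['en', 'fr'],
  function (string $langcode) use (&$log) { $log[] = "insert:$langcode"; },
  function (string $langcode) use (&$log) { $log[] = "delete:$langcode"; }
);
// $log is ['insert:fr', 'delete:it']
```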

Core SQL Storage

The default SQL implementation distinguishes between two kinds of fields. Base fields are attached to every entity of a given type; bundle fields are attached only to certain bundles. Base fields are stored in the entity shared tables. Bundle fields, as well as base fields with multiple cardinality, are stored in dedicated field tables. Field tables natively support multilingual values, exactly as in Drupal 7. Shared tables support four different layouts, depending on the entity type definition:

  • Simple entity types use just a base table, where all base field values are stored.
  • Revisionable entity types use a base table and revision table, where all the base field revisions are stored.
  • Translatable entity types use a base table and a field data table. The former stores just very basic data, like id, bundle or UUID, while the latter stores base field data, one record for each available translation.
  • Revisionable and translatable entity types use four tables to store basic data, revision metadata, base field translations, and base field translated revisions.

The Entity Storage API allows us to switch between table layouts by merely altering entity type definitions. This allows us to pick the most performant table layout that suits the project requirements. By the way, did I mention that Views relies on the Entity Storage API to provide native multilingual support?
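To picture the translatable (non-revisionable) layout, the sketch below keeps a base table and a field data table as in-memory arrays, with one field data row per translation. It then assembles a single translation by combining the two. Column names loosely follow the table layout sidebar, with revision columns omitted for brevity. The helper load_translation() is hypothetical and stands in for the JOIN an SQL storage handler would run.

```php
<?php
// Base table: entity keys only, one row per entity (keyed by entity_id).
$entity_table = [
  1 => ['entity_id' => 1, 'uuid' => 'a1b2', 'bundle_name' => 'article'],
];

// Field data table: one row per (entity, translation).
$entity_field_data = [
  ['entity_id' => 1, 'langcode' => 'en', 'default_langcode' => 1, 'label' => 'Hello'],
  ['entity_id' => 1, 'langcode' => 'it', 'default_langcode' => 0, 'label' => 'Ciao'],
];

// Hypothetical loader: the equivalent of joining both tables on entity_id
// and filtering the field data table by langcode.
function load_translation(array $base, array $data, int $id, string $langcode): ?array {
  if (!isset($base[$id])) {
    return NULL;
  }
  foreach ($data as $row) {
    if ($row['entity_id'] === $id && $row['langcode'] === $langcode) {
      // Array union: base columns first, then the translation's columns.
      return $base[$id] + $row;
    }
  }
  return NULL;  // No such translation.
}

$it = load_translation($entity_table, $entity_field_data, 1, 'it');
// $it['label'] is 'Ciao'; $it['bundle_name'] is 'article'
```

Because the base table only holds keys, adding a language means adding rows to the field data table, never new columns.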

Entity querying and multilingual

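The Entity Query API does not make any assumptions about language. Without a language condition, a node matches if any of its translations satisfies all the conditions. Language conditions narrow the match down to specific translations: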

    $result = \Drupal::entityQuery('node')
      ->condition('promote', 1)
      ->condition('status', 1)
      ->execute(); // Nodes with at least one promoted and published translation.

    $result = \Drupal::entityQuery('node')
      ->condition('promote', 1)
      ->condition('status', 1)
      ->condition('langcode', 'en')
      ->execute(); // Nodes whose English translation is promoted and published.

    $result = \Drupal::entityQuery('node')
      ->condition('promote', 1)
      ->condition('status', 1)
      ->condition('default_langcode', 1)
      ->execute(); // Nodes whose original (default) translation is promoted and published.
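The semantics of these three queries can be modeled in plain PHP. Each row below stands for one node translation, and all conditions are evaluated against the same row. The result is the list of distinct node IDs with at least one matching row. This is a simplified, hypothetical model (query_ids() and the sample rows are invented) and not the Entity Query implementation itself.

```php
<?php
// One row per node translation (hypothetical data).
$rows = [
  ['nid' => 1, 'langcode' => 'en', 'default_langcode' => 1, 'status' => 1, 'promote' => 1],
  ['nid' => 1, 'langcode' => 'it', 'default_langcode' => 0, 'status' => 0, 'promote' => 1],
  ['nid' => 2, 'langcode' => 'en', 'default_langcode' => 0, 'status' => 1, 'promote' => 1],
  ['nid' => 2, 'langcode' => 'it', 'default_langcode' => 1, 'status' => 1, 'promote' => 0],
  ['nid' => 3, 'langcode' => 'it', 'default_langcode' => 1, 'status' => 1, 'promote' => 1],
];

// Hypothetical matcher: every condition must hold on the same row (that is,
// the same translation); each matching node ID is returned once.
function query_ids(array $rows, array $conditions): array {
  $ids = [];
  foreach ($rows as $row) {
    foreach ($conditions as $field => $value) {
      if ($row[$field] !== $value) {
        continue 2;  // This translation fails; try the next row.
      }
    }
    $ids[$row['nid']] = $row['nid'];
  }
  return array_values($ids);
}

query_ids($rows, ['promote' => 1, 'status' => 1]);                          // [1, 2, 3]
query_ids($rows, ['promote' => 1, 'status' => 1, 'langcode' => 'en']);      // [1, 2]
query_ids($rows, ['promote' => 1, 'status' => 1, 'default_langcode' => 1]); // [1, 3]
```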

Shut Up and Show Me Some Code

Being able to store multilingual values for any piece of content is great, but is that enough? Smart readers will have guessed the answer at this point, I suppose.

Many of the intricacies of dealing with field translatability have been swept under the ContentEntityBase rug. Still, developers need to keep in mind that any entity type may find itself operating in a multilingual environment sooner or later, so the related business logic should be coded accordingly.

Okay, let's see some examples!

Accessing Field Data

The new Entity Translation API relies on a very simple concept: every (content) entity object represents an entity translation.

    // A translation object is a regular entity object.
    $entity->langcode->value = 'en';
    $value = $entity->foo->value;
    $translation = $entity->getTranslation('it');
    $it_value = $translation->foo->value;

    // The untranslated (original) entity object can be retrieved from any translation object.
    $langcode = $translation->language()->getId(); // $langcode is 'it'
    $untranslated_entity = $translation->getUntranslated();
    $langcode = $untranslated_entity->language()->getId(); // $langcode is 'en'
    $identical = $entity === $untranslated_entity; // $identical is TRUE
    $entity_langcode = $translation->getUntranslated()->language()->getId(); // $entity_langcode is 'en'

As you can see, field language is no longer exposed in the public API. (Yay!) Additionally, thanks to some behind-the-scenes magic, untranslatable field data is shared among entity translation objects, while translatable field data is accessible only from the related entity translation object.

    // Untranslatable field data is shared among all the translation objects.
    $entity->langcode->value = 'en';
    $translation = $entity->getTranslation('it');
    $en_value = $entity->field_foo->value;
    $it_value = $translation->field_foo->value;
    $entity->field_untranslatable->value = 'foo';
    $translation->field_untranslatable->value = 'bar';
    $value = $entity->field_untranslatable->value; // $value is 'bar'

This means that we no longer need to worry about field translatability, which is quite a relief, to put it mildly.
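The “behind-the-scenes magic” can be approximated with a small standalone class (PHP 7.4+). In this hypothetical sketch, all translation objects of one entity share a single state object. Untranslatable fields live in one shared store, while translatable fields live in a per-language store. The classes EntityState and TranslationSketch only mimic the Drupal 8 method names (getTranslation(), getUntranslated(), addTranslation(), language()). They are not ContentEntityBase's actual implementation, and plain get()/set() replace Drupal's field objects.

```php
<?php
// Hypothetical state shared by every translation object of one entity.
final class EntityState {
  public array $translatable = [];  // names of translatable fields
  public array $shared = [];        // untranslatable values: [field => value]
  public array $values = [];        // translatable values: [langcode => [field => value]]
  public array $objects = [];       // translation objects: [langcode => TranslationSketch]
  public string $default_langcode = '';
}

final class TranslationSketch {
  private EntityState $state;
  private string $langcode;

  private function __construct(EntityState $state, string $langcode) {
    $this->state = $state;
    $this->langcode = $langcode;
  }

  public static function create(string $langcode, array $translatable): self {
    $state = new EntityState();
    $state->translatable = $translatable;
    $state->default_langcode = $langcode;
    $state->values[$langcode] = [];
    return $state->objects[$langcode] = new self($state, $langcode);
  }

  public function addTranslation(string $langcode): self {
    $this->state->values[$langcode] = [];
    return $this->state->objects[$langcode] = new self($this->state, $langcode);
  }

  // Always returns the same object for a given language.
  public function getTranslation(string $langcode): self {
    if (!isset($this->state->objects[$langcode])) {
      throw new InvalidArgumentException("No translation for '$langcode'.");
    }
    return $this->state->objects[$langcode];
  }

  public function getUntranslated(): self {
    return $this->state->objects[$this->state->default_langcode];
  }

  public function language(): string {
    return $this->langcode;
  }

  public function get(string $field) {
    if (in_array($field, $this->state->translatable, TRUE)) {
      return $this->state->values[$this->langcode][$field] ?? NULL;
    }
    return $this->state->shared[$field] ?? NULL;
  }

  public function set(string $field, $value): void {
    if (in_array($field, $this->state->translatable, TRUE)) {
      $this->state->values[$this->langcode][$field] = $value;
    }
    else {
      $this->state->shared[$field] = $value;
    }
  }
}

$entity = TranslationSketch::create('en', ['field_foo']);
$translation = $entity->addTranslation('it');
$entity->set('field_foo', 'hello');
$translation->set('field_foo', 'ciao');
$entity->set('field_untranslatable', 'foo');
$translation->set('field_untranslatable', 'bar');
// $entity->get('field_untranslatable') is 'bar'; $entity->get('field_foo') is 'hello'
```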

The Active Language

In Drupal 8, every translation is an entity object with its own language; we only need to pass it around, as we’re already used to doing, to make the active language available in any part of our code base.

    use Drupal\Core\Entity\ContentEntityInterface;
    use Drupal\Core\Language\LanguageInterface;

    $langcode = \Drupal::languageManager()->getCurrentLanguage(LanguageInterface::TYPE_CONTENT)->getId();
    $translation = $entity->getTranslation($langcode);
    entity_do_stuff($translation);

    function entity_do_stuff(ContentEntityInterface $entity) {
      $value = $entity->field_foo->value;
      $langcode = $entity->language()->getId();
      // do stuff
    }

As you can see, in Drupal 8 entity_do_stuff() can be completely language-agnostic, unless its business logic explicitly deals with language. When it does, the language can be retrieved from the entity (translation) object.

Entity Language Negotiation

In Drupal 7, the only way to determine which translation should be selected among the available ones is to call field_get_items(). This function tries to determine which translations are available by inspecting field values through field_language(). If a field translation is missing for a certain language, field_language() returns a fallback value. Different fields may then end up in different languages, a confusing and unintended behavior: when we wrote this code, we assumed that all field translations would be in the same language.

Additionally, this behavior makes sense only in a rendering context. Applying field language fallback in a save context would cause non-existing translations to be stored. In Drupal 8, we fixed this mess by introducing the concept of entity language negotiation and applying it to the whole entity object. The system inspects the available entity translations and picks the one that best suits the available contextual data. As a consequence, empty values are simply treated as such, instead of triggering field-level language fallback.

    // D8 features a reusable entity language negotiation API.
    function viewEntity(EntityInterface $entity, $view_mode = 'full', $langcode = NULL) {
      // If $langcode is NULL the current content language is used
      $translation = \Drupal::entityManager()->getTranslationFromContext($entity, $langcode);
      entity_do_stuff($translation);
      $build = array();
      // do more stuff
      return $build;
    }

    // A context can be provided.
    function node_tokens($type, $tokens, array $data = array(), array $options = array()) {
      $replacements = array();
      if ($type != 'node' || empty($data['node'])) {
        return $replacements;
      }
      // Without an explicit language, use the entity original language.
      $langcode = isset($options['langcode']) ? $options['langcode'] : Language::LANGCODE_DEFAULT;

      // The default operation is 'entity_view'.
      $context = array('operation' => 'node_tokens');
      $translation = \Drupal::entityManager()->getTranslationFromContext($data['node'], $langcode, $context);
      $items = $translation->get('body');
      // do stuff
      return $replacements;
    }
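The difference between Drupal 7 field-level fallback and Drupal 8 entity-level negotiation can be sketched in a few lines of standalone PHP. Both functions are hypothetical simplifications. field_level_fallback() imitates the Drupal 7 behavior described above: each field independently falls back to the first candidate language that has a value. entity_level_fallback() picks one translation for the whole entity and leaves its empty fields empty. The candidate order ['it', 'en'] is simply given as input here.

```php
<?php
// Field values per language for one entity; the 'it' body is still empty.
$translations = [
  'en' => ['title' => 'Hello', 'body' => 'Some text'],
  'it' => ['title' => 'Ciao',  'body' => NULL],
];

// D7-style (simplified): every field falls back on its own, so a single
// rendered entity can mix languages. Returns [field => [langcode, value]].
function field_level_fallback(array $translations, array $candidates): array {
  $result = [];
  foreach (array_keys(reset($translations)) as $field) {
    foreach ($candidates as $langcode) {
      if (isset($translations[$langcode][$field])) {
        $result[$field] = [$langcode, $translations[$langcode][$field]];
        break;
      }
    }
  }
  return $result;
}

// D8-style (simplified): pick one existing translation for the whole entity.
function entity_level_fallback(array $translations, array $candidates): ?string {
  foreach ($candidates as $langcode) {
    if (isset($translations[$langcode])) {
      return $langcode;
    }
  }
  return NULL;
}

$mixed = field_level_fallback($translations, ['it', 'en']);
// $mixed['title'] is ['it', 'Ciao'] but $mixed['body'] is ['en', 'Some text']
$langcode = entity_level_fallback($translations, ['it', 'en']);
// $langcode is 'it'; its empty body stays empty
```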

Modules can hook into the entity language negotiation process by implementing hook_language_fallback_candidates_alter(). They can then alter it based on the contextual data passed along. There is also an operation-specific version of the hook: hook_language_fallback_candidates_OPERATION_alter().
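The alter mechanism can be pictured with the following standalone sketch, which is hypothetical throughout. The callbacks registered under 'language_fallback_candidates' and 'language_fallback_candidates_OPERATION' stand in for the two hooks. The initial candidate list (requested language, then the entity's original language), the 'default_langcode' context key, and the invocation order (generic alter first, then the operation-specific one) are all assumptions of this sketch, not Drupal's exact behavior. Candidates are tried in order, and the first existing translation wins.

```php
<?php
// Hypothetical candidate builder: requested language first, then the entity's
// original language, then any changes made by registered alter callbacks.
function fallback_candidates(string $langcode, array $context, array $alters): array {
  $candidates = [$langcode => $langcode];
  $candidates[$context['default_langcode']] = $context['default_langcode'];
  $hooks = ['language_fallback_candidates', 'language_fallback_candidates_' . $context['operation']];
  foreach ($hooks as $hook) {
    foreach ($alters[$hook] ?? [] as $alter) {
      $alter($candidates, $context);  // callbacks take $candidates by reference
    }
  }
  return $candidates;
}

// Picks the first candidate for which a translation exists.
function select_translation(array $available, string $langcode, array $context, array $alters): ?string {
  foreach (fallback_candidates($langcode, $context, $alters) as $candidate) {
    if (in_array($candidate, $available, TRUE)) {
      return $candidate;
    }
  }
  return NULL;
}

// Hypothetical module: for tokens only, try German right after the requested language.
$alters = [
  'language_fallback_candidates_node_tokens' => [
    function (array &$candidates, array $context): void {
      $candidates = array_slice($candidates, 0, 1, TRUE) + ['de' => 'de'] + $candidates;
    },
  ],
];

$context = ['operation' => 'entity_view', 'default_langcode' => 'en'];
select_translation(['en', 'de'], 'fr', $context, $alters);  // 'en'
$context['operation'] = 'node_tokens';
select_translation(['en', 'de'], 'fr', $context, $alters);  // 'de'
```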

This API is not limited to entities, but the contextual data tells us when entities are involved: in that case, the entity object is always part of it. The default operation (entity_view) works correctly for view builders and form handlers. Entity language negotiation is automatically applied to entities referenced in the current route's path (e.g., node/1). In these cases, there is no need to explicitly instantiate a translation object, because the correct one is provided by default. The same is true for all hook_form_alter() implementations and for the various callbacks involved in entity form building and entity rendering.

Entity Translation Handling

If your code explicitly needs to deal with translations, and it needs to act on all translations instead of just dealing with the active one, there are a few useful methods to help with that.

    // Acting on all translations.
    $languages = $entity->getTranslationLanguages();
    foreach ($languages as $langcode => $language) {
      $translation = $entity->getTranslation($langcode);
      entity_do_stuff($translation);
    }

    // Creating a new translation after checking it does not exist.
    if (!$entity->hasTranslation('fr')) {
      $translation = $entity->addTranslation('fr', array('field_foo' => 'bag'));
    }

    // This is equivalent to the following code, except that here an exception
    // is thrown if an invalid language code is specified.
    $translation = $entity->getTranslation('fr');
    $translation->field_foo->value = 'bag';

    // Accessing a field on a removed translation object causes an exception to be
    // thrown.
    $translation = $entity->getTranslation('it');
    $entity->removeTranslation('it');
    $value = $translation->field_foo->value; // throws an exception

What is Missing?

Okay, this sounds great (hopefully), but is it really as cool as it sounds?

Not yet. There is some more work to do to make all of this perform flawlessly:

  • We still need to finalize the SQL storage API to make it possible to actually switch between table layouts. The foundational code is in place, but the entity-type-specific storage handlers need to be updated to deal with it.
  • The Content Translation UI still needs some TLC: UX improvements and UI polishing would be more than welcome.
  • And last but not least: bug fixing. You know, from time to time, in the process of rewriting core, we inadvertently introduce a bug or two. We were not able to fix them all so far. Maybe you will?

Ad Maiora

To sum up, even if some care from developers is still required when designing and coding a functionality that may end up operating in a multilingual environment, we truly believe we built a system far superior to what we have in Drupal 7, a system allowing us to minimize the effort required to make things simply work properly.

So what are you waiting for? Make Drupal 8 shine even brighter!

Image: ©aaabbc-Fotolia.com