When bringing multilingual capabilities to CiviCRM, it is one thing to figure out (and actually implement afterwards…) the whole mechanism for storing and managing multilingual data, but quite another to choose which database fields/columns should benefit from (or be cursed with) the ability to exist in parallel, language-dependent versions.

 

The mechanism for managing multilingual data in CiviCRM is based on the idea of replacing certain single-language database columns with multiple copies (one per language). For example, a Russian+English site might want to store contact names both in Cyrillic and in an English transliteration; to make this possible, when a site turns multilingual the civicrm_contact table’s last_name column is replaced by last_name_en_US and last_name_ru_RU columns.
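As a rough illustration of the column replacement described above, here is a minimal Python sketch that builds the SQL such a switch might issue. The helper name multilingualise_column, the VARCHAR(64) column type and the idea of copying the existing values into the default locale's column are my assumptions for the example, not CiviCRM's actual upgrade code.

```python
def multilingualise_column(table, column, column_type, locales, default_locale):
    """Return SQL statements replacing `column` with one copy per locale.

    Hypothetical helper for illustration only; not part of CiviCRM.
    """
    if default_locale not in locales:
        raise ValueError("default_locale must be one of locales")
    statements = []
    # One new column per language, e.g. last_name_en_US, last_name_ru_RU.
    for locale in locales:
        statements.append(
            f"ALTER TABLE {table} ADD COLUMN {column}_{locale} {column_type}"
        )
    # Assumption: existing single-language values become the default-locale version.
    statements.append(f"UPDATE {table} SET {column}_{default_locale} = {column}")
    # Finally drop the original, language-agnostic column.
    statements.append(f"ALTER TABLE {table} DROP COLUMN {column}")
    return statements


statements = multilingualise_column(
    "civicrm_contact", "last_name", "VARCHAR(64)", ["en_US", "ru_RU"], "en_US"
)
for statement in statements:
    print(statement)
```

The statements are only printed, not executed, so the sketch can be read without a database at hand.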

To make this approach usable from the coding perspective, along with the replacement of columns new views are created (one for every table+language combination). These views expose the language-specific columns under their ‘original’ names; for example, in the above case two views would be created: civicrm_contact_en_US (which exposes last_name_en_US as last_name) and civicrm_contact_ru_RU (which exposes last_name_ru_RU as last_name). This solution means that all existing database queries can be rewritten on-the-fly by simply suffixing the table names with the current locale, and so the code issuing the queries doesn’t even have to be aware that it’s working with a multilingual database.
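The views and the on-the-fly rewriting can be sketched in a few lines of Python. This is an illustration under stated assumptions: it uses an in-memory SQLite database purely so the example runs anywhere, and the MULTILINGUAL_TABLES registry, the create_views and localise_query helpers and the regular expression are hypothetical names of mine, not CiviCRM's real implementation.

```python
import re
import sqlite3

LOCALES = ["en_US", "ru_RU"]
MULTILINGUAL_TABLES = {"civicrm_contact"}  # hypothetical registry of affected tables


def create_views(conn):
    """Create one view per locale exposing last_name_<locale> as last_name."""
    for locale in LOCALES:
        conn.execute(
            f"CREATE VIEW civicrm_contact_{locale} AS "
            f"SELECT id, last_name_{locale} AS last_name FROM civicrm_contact"
        )


def localise_query(query, locale):
    """Suffix every known multilingual table name with the current locale."""
    def suffix(match):
        table = match.group(0)
        return f"{table}_{locale}" if table in MULTILINGUAL_TABLES else table
    return re.sub(r"\bcivicrm_\w+", suffix, query)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE civicrm_contact "
    "(id INTEGER PRIMARY KEY, last_name_en_US TEXT, last_name_ru_RU TEXT)"
)
conn.execute("INSERT INTO civicrm_contact VALUES (1, 'Ivanov', 'Иванов')")
create_views(conn)

# The calling code writes the query as if the database were single-language.
query = "SELECT last_name FROM civicrm_contact WHERE id = 1"
results = {
    locale: conn.execute(localise_query(query, locale)).fetchone()[0]
    for locale in LOCALES
}
print(results)
```

The point of the design is visible in the last few lines: the query text never mentions a locale, and only the rewriting step needs to know which language is current.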

As mentioned in the introduction, the implementation of the above approach is but a part (even if a major one) of the whole ‘how to store multilingual CiviCRM data’ problem. The other problem is *which* fields (database columns) should get this special treatment. On the one hand, it might be useful (in some corner cases) to be able to present a lot of the CiviCRM data differently in different languages; on the other, most installations wouldn’t really benefit from having an entry on an animal shelter housing ten dogs in the English version and twenty dogs in the Russian one (so, obviously, numerical custom data shouldn’t be multilingual – but what if there’s a free-form text field for ‘species we can house’?).

My general approach to resolving this question is based on answering a handful of questions about the potential field. First, is it a free-form text field? That’s a requirement for even considering a field to be multilingual. Second, is it about the interface presented to the users rather than actual data (e.g., a pre/post-form help text, which should definitely be different in different languages)? Third, does it store data that really should be kept synchronised across languages and does not vary with language, like an email address? If so, making the field multilingual would be a bad idea. Fourth, would it benefit at all from being multilingual? Take the contents of a note about a contact: on one hand, the users of a given CiviCRM install might be multilingual and benefit from translated notes, but on the other, in the vast majority of cases the contact-managing staff uses a common language, and it would be detrimental to have potential discrepancies in the contents of the notes. Finally, would some parts that are definitely hard data, and should be the same across all languages, actually benefit from being transliterable, like the ability to write a contact name (or address) in Cyrillic, the example shown above?

After some discussion inside the CiviCRM team we decided to make a small subset of core fields multilingual first, and then extend the scope based on whether our calls were right and on the reaction of the multilingual community (‘it would really help us if the X field was multilingual’); hence my recent changes to CiviCRM, which extend the number of multilingualised fields.

The last issue regarding these fields is how to expose their multilingual capabilities in the UI. One obvious solution is to have the right version of the content editable when switched to a given language (and this is what CiviCRM does); another solution was to introduce small pop-ups, which allow instant access to (and editing of) a given field’s value across all the languages supported in a given installation. I’m happy to say that this functionality, initially implemented for just a couple of fields as a proof-of-concept solution, was successfully ported from Dojo to jQuery, made more useful and, thus, extended to cover almost all multilingualised fields.

And now off to make the multilingual upgrade path easier to cope with on the code level – with my CiviCRM 3.0 release manager hat on, I’m definitely interested in making it as simple as possible this week. :)