Thursday, December 14, 2006

The Hefty Job of Internationalization

When I worked in the purely commercial software world, internationalization of a product was a critical, magical event that happened somewhere offsite in the hands of contracted translation companies. Having your software translated into as many different languages as possible is just as important in the open source arena. Arguably, open source projects NEED translations MORE to grow into a global presence. The PROCESS of translating software and documentation for an open source project is an entirely different experience from the commercial event depicted above, as I have learned over my last two years with Pentaho.

This topic is at the forefront of my writings today because I have been tasked with figuring out how to manage localizing the Pentaho documentation in a wiki that has no feature support for internationalization of content! Did I mention that Pentaho is moving all of their documentation to a wiki? Well, there it is. The cat's out of the bag. As of our 1.2 GA release, the documentation will be maintained, by community and development team, in a Confluence wiki.

Confluence is a great tool and has lots of integration points with our case tracking system, JIRA. That said, it seems that Atlassian (the company behind Confluence and JIRA) is a little behind in the internationalization game. Confluence only recently started to support language packs for translation of the wiki itself, and has no support for content translation.

So here we are, and I need to figure out a solution to satisfy three very important groups. The first group are the translators of our documentation. They are community members who contribute those translations to our projects. We need to set up the internationalization in the wiki so that it is easy for these folks to do the initial translation, and also have a mechanism for notifying them when the master language version of the doc has changed. The second group is the users of the documentation. It should be easy for me, if I am French, to find the French documentation, but also be able to peruse the English documentation. And the third group is the poor guys in house that have to maintain the organization of this documentation. When you consider we have over 10 projects, translated (so far) into 8 languages, that get a new revision of documentation for every version of the project released... well, that's alot to manage.

So my initial attempt at a design for this conglomeration that we need to support was to try to stuff all of the languages in the same document in the wiki using DIV type tags and such for separation. I'd then use some custom code to hide the other translations based on the user's language setting in the browser. This would make my translators happy initially, because they can do the initial translation almost inline with the master language version. My users would be happy because our wiki respects their browser's language settings, and if a particular piece of content hasn't been translated to the user's language of choice, we would default to the master language version. Of course, this solution does not address the translator notification of master language changes, and well, it would be a bit of a pain to determine whether it was an master language version change or a translation change, with all that content in the same document. Also, this only addresses translation of the content. What about the document titles? Since the navigation of a wiki (by definition) is based on document name, we have a big problem to solve there. And the largest point of failure in my grand plan is the fact that the merging capabilities are not so hot in our wiki of choice, so the translators would have to line up and take turns translating. Ick, in a word.

So the next path we steer down is that path that takes us to completely separate repositories (called Spaces in Confluence) for the different languages. This gives us autonomy for the purposes of editing and not having the language content intermingled, but at a pretty large synchronization and maintenance cost. We now need to figure out, do we populate the French repository with all of the English documentation, to assist our translators in translating? Well, then we have up to eight copies of the doc, that is changing realtime, and is sure to get out of sync. So, perhaps we should let the translators populate their language wiki from scratch, organically? This isn't very accommodating to our translators, and documents will surely be placed out of order with the master language wiki, making it confusing for the users.

Yikes. It's at this point, having discussed the plethora of less-than-stellar options with a few clever guys on our team, that I decided to step back and write to the community. In my mental gymnastics over this problem, I made many assumptions about what our users and translators really want.

For the translators in our community, have you worked at translating in a wiki before? What did you like about it? What did you hate? Is it easier for you to translate everything in your format of choice, or do you like the idea that once you translate it it would be available, without having to wait for the Pentaho team to publish it? Of the two scenarios I detailed above, which is the lesser of two evils for you?

And for all of the rest our community that must USE and update our documentation - would it be more frustrating for you to work with translations inline in the wiki (in edit mode only) or to have to go hunt around someplace else to find those translations?

And, of course, for any other open source project that holds the silver bullet to this problem - feel free to share your solution here!! Heck, I'd even take well intentioned guesses and good ideas :)