Thursday, December 14, 2006

The Hefty Job of Internationalization

When I worked in the purely commercial software world, internationalization of a product was a critical, magical event that happened somewhere offsite in the hands of contracted translation companies. Having your software translated into as many different languages as possible is just as important in the open source arena. Arguably, open source projects NEED translations MORE to grow into a global presence. The PROCESS of translating software and documentation for an open source project is an entirely different experience from the commercial event depicted above, as I have learned over my last two years with Pentaho.

This topic is at the forefront of my writings today because I have been tasked with figuring out how to manage localizing the Pentaho documentation in a wiki that has no feature support for internationalization of content! Did I mention that Pentaho is moving all of their documentation to a wiki? Well, there it is. The cat's out of the bag. As of our 1.2 GA release, the documentation will be maintained, by community and development team, in a Confluence wiki.

Confluence is a great tool and has lots of integration points with our case tracking system, JIRA. That said, it seems that Atlassian (the company behind Confluence and JIRA) is a little behind in the internationalization game. Confluence only recently started to support language packs for translation of the wiki itself, and has no support for content translation.

So here we are, and I need to figure out a solution to satisfy three very important groups. The first group is the translators of our documentation. They are community members who contribute those translations to our projects. We need to set up the internationalization in the wiki so that it is easy for these folks to do the initial translation, and also have a mechanism for notifying them when the master language version of the doc has changed. The second group is the users of the documentation. It should be easy for me, if I am French, to find the French documentation, but also be able to peruse the English documentation. And the third group is the poor guys in house who have to maintain the organization of this documentation. When you consider we have over 10 projects, translated (so far) into 8 languages, that get a new revision of documentation for every version of the project released... well, that's a lot to manage.

So my initial attempt at a design for this conglomeration that we need to support was to try to stuff all of the languages in the same document in the wiki, using DIV type tags and such for separation. I'd then use some custom code to hide the other translations based on the user's language setting in the browser. This would make my translators happy initially, because they can do the initial translation almost inline with the master language version. My users would be happy because our wiki respects their browser's language settings, and if a particular piece of content hasn't been translated to the user's language of choice, we would default to the master language version. Of course, this solution does not address the translator notification of master language changes, and well, it would be a bit of a pain to determine whether it was a master language version change or a translation change, with all that content in the same document. Also, this only addresses translation of the content. What about the document titles? Since the navigation of a wiki (by definition) is based on document name, we have a big problem to solve there. And the largest point of failure in my grand plan is the fact that the merging capabilities are not so hot in our wiki of choice, so the translators would have to line up and take turns translating. Ick, in a word.
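To make the "respect the browser's language, fall back to the master language" idea concrete, here is a minimal sketch in Python. It is only an illustration, not Confluence code: the function names, the `translations` dictionary, and the choice of "en" as the master language are all assumptions made for the example. It reads the standard Accept-Language header that browsers send and picks the best available translation of a page.

```python
MASTER_LANGUAGE = "en"  # assumption: English is the master language


def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        weighted.append((-quality, position, tag.strip().lower()))
    weighted.sort()  # highest quality first, header order breaks ties
    # Drop tags the browser explicitly refused with q=0.
    return [tag for neg_quality, _, tag in weighted if neg_quality < 0]


def pick_translation(header, translations, master=MASTER_LANGUAGE):
    """Choose the translation to show; fall back to the master language.

    `translations` is a hypothetical mapping of language code to page
    content, e.g. {"en": "...", "fr": "..."}.
    """
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "fr-fr" also matches a plain "fr" page
        for candidate in (tag, primary):
            if candidate in translations:
                return candidate, translations[candidate]
    return master, translations[master]


page = {"en": "Welcome", "fr": "Bienvenue"}
print(pick_translation("fr-FR,fr;q=0.9,en;q=0.8", page))
print(pick_translation("de-DE,de;q=0.9", page))
```

A French browser gets the French text, while a German browser falls back to the English master version because no German translation exists yet. Note that this covers only the display side; it does nothing for translator notification or for translated document titles.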

So the next path we steer down is the path that takes us to completely separate repositories (called Spaces in Confluence) for the different languages. This gives us autonomy for the purposes of editing and not having the language content intermingled, but at a pretty large synchronization and maintenance cost. We now need to figure out, do we populate the French repository with all of the English documentation, to assist our translators in translating? Well, then we have up to eight copies of the doc, all changing in real time, and sure to get out of sync. So, perhaps we should let the translators populate their language wiki from scratch, organically? This isn't very accommodating to our translators, and documents will surely be placed out of order with the master language wiki, making it confusing for the users.
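Whichever layout we pick, the synchronization problem boils down to knowing which master version each translation was made from. Here is a small sketch of that bookkeeping in Python. Everything in it is hypothetical: the `Translation` record, the field names, and the idea of a simple integer version per page are assumptions for illustration, not something Confluence provides out of the box.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    page: str                     # title of the master language page
    language: str                 # language code of this translation
    based_on_master_version: int  # master version the translator worked from


def stale_translations(master_versions, translations):
    """Return translations made from an older master version.

    `master_versions` maps a page title to its current master version
    number (a hypothetical structure for this example).
    """
    stale = []
    for translation in translations:
        current = master_versions.get(translation.page)
        if current is not None and translation.based_on_master_version < current:
            stale.append(translation)
    return stale


master = {"Install Guide": 7, "FAQ": 3}
translated = [
    Translation("Install Guide", "fr", 7),  # up to date
    Translation("Install Guide", "de", 5),  # master has moved on
    Translation("FAQ", "fr", 2),            # master has moved on
]
for item in stale_translations(master, translated):
    print(f"Notify {item.language} translators: '{item.page}' has changed")
```

With a record like this per translated page, the list of stale translations is also the list of translators to notify, which is the piece missing from both designs above.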

Yikes. It's at this point, having discussed the plethora of less-than-stellar options with a few clever guys on our team, that I decided to step back and write to the community. In my mental gymnastics over this problem, I made many assumptions about what our users and translators really want.

For the translators in our community, have you worked at translating in a wiki before? What did you like about it? What did you hate? Is it easier for you to translate everything in your format of choice, or do you like the idea that once you translate it it would be available, without having to wait for the Pentaho team to publish it? Of the two scenarios I detailed above, which is the lesser of two evils for you?

And for all of the rest of our community that must USE and update our documentation - would it be more frustrating for you to work with translations inline in the wiki (in edit mode only) or to have to go hunt around someplace else to find those translations?

And, of course, for any other open source project that holds the silver bullet to this problem - feel free to share your solution here!! Heck, I'd even take well intentioned guesses and good ideas :)


Michael said...

Hi Gretchen,

That sounds like a really interesting project. I'm always amazed by the variety of ways in which our wiki can be used - on the face of it, a team-based internationalization effort seems like a perfect fit for the wiki environment.

I wanted to let you know that we are in the midst of having professional translations completed for both Confluence and JIRA, in both French and German. I expect to have these available for download by the beginning of January, and they will be included in the next version releases of both products. We plan other translations through the year, so be sure to check back on our blogs for updates.

While I don't have a good answer for how best to go about your project (you've already thought through the things I would have suggested), our user community often finds answers to similar issues on our forums. I think that if you tried posting this topic there you might get some good feedback.

I hope you'll post again as to how you eventually solved this problem - I'll look forward to hearing the details.

Michael Knighten @ Atlassian

Gretchen said...

Hi Michael,

Thanks for your comments on the subject! I have scoured your forums and posted a few messages there on this subject. It seems you have a few other customers asking for the same functionality that I have outlined here, but unfortunately, no one from Atlassian answered any of my posts or the others. So I was very happy to get your comments here.

Right now, we are still examining our options, including replacing Confluence with a wiki that natively handles internationalized, versioned content. I've been told MediaWiki has such support but haven't had a chance to research that claim yet.

Honestly, our hope has been that Atlassian has this feature request on a near-future roadmap for implementation. We have done a pretty deep dive into the technical architecture of Confluence, and versioning and internationalizing content is something we believe should be architected in at the database and content handling core levels of the Confluence application. Not as a customization or a plug-in.

I'd love to get your feedback regarding the possibility of putting this feature on your roadmap. The Pentaho team would be happy to help by giving requirements and use cases and by providing beta testing. We believe strongly that without this kind of content internationalization support, companies that operate in a global fashion will not be able to adopt Confluence for the long term. We know we can't.

Thanks again for your input. I look forward to hearing from you!

kind regards,

Harald Walker said...

Hi Gretchen,

Did you find a solution for your problem?
We are at a similar point in our current project, where we are considering Confluence for the online documentation. Support for multiple languages is a must, and since it is for an agile product, it should be easy to maintain, update and extend the content translations. I like the approach which has been suggested.

Gretchen said...

Hey Harald,

We are still researching our best options on this one, and if we decide to remain with Confluence, it looks like we will be writing some plugins to create the functionality that comes natively with MediaWiki. It follows my proposed solution very closely.

I haven't given up on Atlassian just yet, and it seems that they prioritize their work based on votes in their feature tracking system. So, please!!! Go to their site and vote for the internationalized content feature!!! You can get there if you follow the link in my post.

Harald Walker said...

Hello Gretchen,
One more from me. Did you have a look at xwiki? They seem to have exactly what we are asking for: Internationalization

Gretchen said...

Very nice, Harald! We are going to take a look at xWiki as an alternative, thanks for the heads up! I'll post our solution here, as soon as we get it all figured out! Please let me know what you choose, and how it works out for you.

kind regards,
