When I worked in the purely commercial software world, internationalization of a product was a critical, magical event that happened somewhere offsite in the hands of contracted translation companies. Having your software translated into as many different languages as possible is just as important in the open source arena. Arguably, open source projects NEED translations MORE to grow into a global presence. The PROCESS of translating software and documentation for an open source project is an entirely different experience from the commercial event depicted above, as I have learned over my last two years with Pentaho.
This topic is at the forefront of my writings today because I have been tasked with figuring out how to manage localizing the Pentaho documentation in a wiki that has no feature support for internationalization of content! Did I mention that Pentaho is moving all of their documentation to a wiki? Well, there it is. The cat's out of the bag. As of our 1.2 GA release, the documentation will be maintained, by community and development team, in a Confluence wiki.
Confluence is a great tool and has lots of integration points with our case tracking system, JIRA. That said, it seems that Atlassian (the company behind Confluence and JIRA) is a little behind in the internationalization game. Confluence only recently started to support language packs for translation of the wiki itself, and has no support for content translation.
So here we are, and I need to figure out a solution to satisfy three very important groups. The first group are the translators of our documentation. They are community members who contribute those translations to our projects. We need to set up the internationalization in the wiki so that it is easy for these folks to do the initial translation, and also have a mechanism for notifying them when the master language version of the doc has changed. The second group is the users of the documentation. It should be easy for me, if I am French, to find the French documentation, but also be able to peruse the English documentation. And the third group is the poor guys in house that have to maintain the organization of this documentation. When you consider we have over 10 projects, translated (so far) into 8 languages, that get a new revision of documentation for every version of the project released... well, that's alot to manage.
So my initial attempt at a design for this conglomeration that we need to support was to try to stuff all of the languages in the same document in the wiki using DIV type tags and such for separation. I'd then use some custom code to hide the other translations based on the user's language setting in the browser. This would make my translators happy initially, because they can do the initial translation almost inline with the master language version. My users would be happy because our wiki respects their browser's language settings, and if a particular piece of content hasn't been translated to the user's language of choice, we would default to the master language version. Of course, this solution does not address the translator notification of master language changes, and well, it would be a bit of a pain to determine whether it was an master language version change or a translation change, with all that content in the same document. Also, this only addresses translation of the content. What about the document titles? Since the navigation of a wiki (by definition) is based on document name, we have a big problem to solve there. And the largest point of failure in my grand plan is the fact that the merging capabilities are not so hot in our wiki of choice, so the translators would have to line up and take turns translating. Ick, in a word.
So the next path we steer down is that path that takes us to completely separate repositories (called Spaces in Confluence) for the different languages. This gives us autonomy for the purposes of editing and not having the language content intermingled, but at a pretty large synchronization and maintenance cost. We now need to figure out, do we populate the French repository with all of the English documentation, to assist our translators in translating? Well, then we have up to eight copies of the doc, that is changing realtime, and is sure to get out of sync. So, perhaps we should let the translators populate their language wiki from scratch, organically? This isn't very accommodating to our translators, and documents will surely be placed out of order with the master language wiki, making it confusing for the users.
Yikes. It's at this point, having discussed the plethora of less-than-stellar options with a few clever guys on our team, that I decided to step back and write to the community. In my mental gymnastics over this problem, I made many assumptions about what our users and translators really want.
For the translators in our community, have you worked at translating in a wiki before? What did you like about it? What did you hate? Is it easier for you to translate everything in your format of choice, or do you like the idea that once you translate it it would be available, without having to wait for the Pentaho team to publish it? Of the two scenarios I detailed above, which is the lesser of two evils for you?
And for all of the rest our community that must USE and update our documentation - would it be more frustrating for you to work with translations inline in the wiki (in edit mode only) or to have to go hunt around someplace else to find those translations?
And, of course, for any other open source project that holds the silver bullet to this problem - feel free to share your solution here!! Heck, I'd even take well intentioned guesses and good ideas :)
I recently stumbled across this problem with one of Pentaho's applications. When the application was downloaded and installed on a Mac, ...
If you are interested in the ultimate extendability of Pentaho's visualization layer, you'll love this fun holiday gift from Pentaho...
As part of my job at Pentaho, I occasionally get to help build solutions using the Pentaho BI platform and the solution building toolset, De...
We recently received a monster (in a great way!!) of a tech tip from Nic Guzaldo, a self-taught Pentaho expert and supporter. I spent a day ...
Last week, I had the pleasure of offering our very first 5-day session of the Pentaho Architect's Bootcamp training, and overall, it was...
Java code is the bread and butter of what I do, but as most Java developers know, there is a plethora of good frameworks and technologies th...
I've been gathering some interesting and useful information when dealing with Pentaho Reporting, Pentaho Metadata and characters not rep...
The Pentaho Reporting client toolset can look a bit confusing at first glance. We have the Pentaho Report Designer, and then we have the Pen...
I had the chance this week to play around with the still-under-construction Pentaho plugin architecture in the Citrus code line. The new ar...
MySQL has been one of the most popular databases amongst the Pentaho community. We receive questions and comments regularly about setting up...
- Gretchen Moran
- I have been in the business analytics and Java development space for fourteen years. I started out as a data warehouse developer for a small insurance company in the Midwest, worked for Hyperion Solutions as a software engineer for a number of years, and came in on the ground floor with the guys at Pentaho. I have a history of moving around, so I have enjoyed working with the community, consulting directly with Pentaho customers and partners and coding with the development team. I am now a core developer at Pentaho, working on various projects.