TransWikia.com

How to implement a website with translatable articles?

Software Engineering Asked on October 29, 2021

I am developing a website that is supposed to be in at least two languages. I am a skilled developer, but I never had to deal with internationalization.

The owner of the site will create a new article in one language (say English) and link it to a category. Then a translator will translate that article into another language (say Spanish), and it will appear in the same category in the other language. It must be the same article, translated! It is not like Wikipedia, where two versions of the site coexist and sometimes link to each other. Articles are single entities that exist in multiple languages, and I intend to show a warning such as "translation missing for article blahblahblah" when a translation does not exist.

I had a few ideas so far, but I can’t decide which is best:

Main language

My first idea was to have a "main language". In the articles table, I would have:

id SERIAL
title VARCHAR
content TEXT
author_id FK
category_id FK
created_at DATETIME

I would then have a translations table with:

lang VARCHAR(2)
title VARCHAR
content TEXT
article_id FK

The problem is that, sometimes, the articles are created in a "secondary language" first, and then translated into the "main language" (there are some community contributions involved). This is therefore not an option.

Only translations

My next approach was to have an articles table containing pretty much nothing:

id SERIAL
category_id FK
created_at DATETIME

And then to have translations:

lang VARCHAR(2)
title VARCHAR
content TEXT
author_id FK
article_id FK

Then, when I want to load the list of articles (or the content of a given one), a JOIN gives me content from both tables, based on the current language preferences of the viewer.
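For illustration, the JOIN described above could look roughly like the sketch below. It uses Python's built-in sqlite3 module only so that it runs as-is. The sample rows, the composite primary key on (article_id, lang) and the list_articles helper are assumptions made for the example, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL,
    created_at  TEXT NOT NULL
);
CREATE TABLE translations (
    article_id INTEGER NOT NULL REFERENCES articles(id),
    lang       TEXT NOT NULL,
    title      TEXT NOT NULL,
    content    TEXT NOT NULL,
    author_id  INTEGER,
    PRIMARY KEY (article_id, lang)  -- assumption: one row per article and language
);
-- Hypothetical sample data: article 2 has no English translation yet.
INSERT INTO articles VALUES (1, 10, '2021-10-01'), (2, 10, '2021-10-02');
INSERT INTO translations VALUES
    (1, 'en', 'Hello', 'Hello world', 1),
    (1, 'es', 'Hola',  'Hola mundo',  2),
    (2, 'es', 'Adios', 'Adios mundo', 2);
""")

def list_articles(conn, category_id, lang):
    """Return (article_id, title) pairs for one category.

    The LEFT JOIN keeps articles that lack a translation in `lang`;
    their title comes back as None, which the page can turn into a
    "translation missing" warning.
    """
    return conn.execute(
        """
        SELECT a.id, t.title
        FROM articles AS a
        LEFT JOIN translations AS t
               ON t.article_id = a.id AND t.lang = ?
        WHERE a.category_id = ?
        ORDER BY a.id
        """,
        (lang, category_id),
    ).fetchall()

print(list_articles(conn, 10, "en"))  # [(1, 'Hello'), (2, None)]
```

Using a LEFT JOIN rather than an inner JOIN is a design choice here: an inner JOIN would silently hide untranslated articles instead of letting the page flag them.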

This looks promising but also highly unmaintainable (although it’s just my feeling).

Original language

Last idea was to have an articles table with all the content:

id SERIAL
lang varchar(2)
title VARCHAR
content TEXT
base_article FK
author_id FK
category_id FK
created_at DATETIME

The base_article field is a foreign key to another article. If it is not null, some content from the base article will be used (category, for instance). If I want to display all articles in a given language and in a given category, the query is not trivial: I have to find all articles that match both the category and the language, plus those that have the correct language and a non-null base_article whose base article is in that category.

This is not easy to implement and very easy to screw up. I have to think about loops and redundancy, and deleting entries is a nightmare.
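To make the difficulty concrete, here is a rough sketch of that category-and-language query as a self-join, again using Python's sqlite3 module. It assumes that only original articles carry a category_id and that translations always point directly at an original (no chains). The sample rows and the list_articles helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id           INTEGER PRIMARY KEY,
    lang         TEXT NOT NULL,
    title        TEXT NOT NULL,
    content      TEXT NOT NULL,
    base_article INTEGER REFERENCES articles(id),  -- NULL for an original
    author_id    INTEGER,
    category_id  INTEGER,  -- assumption: only set on originals
    created_at   TEXT
);
-- Hypothetical sample data: article 2 is the Spanish translation of article 1.
INSERT INTO articles VALUES
    (1, 'en', 'Hello', 'Hello world', NULL, 1, 10,   '2021-10-01'),
    (2, 'es', 'Hola',  'Hola mundo',  1,    2, NULL, '2021-10-02'),
    (3, 'es', 'Adios', 'Adios mundo', NULL, 2, 10,   '2021-10-03'),
    (4, 'es', 'Otro',  'Otro texto',  NULL, 2, 20,   '2021-10-04');
""")

def list_articles(conn, category_id, lang):
    """Return (id, title) for originals and translations in one category."""
    return conn.execute(
        """
        SELECT a.id, a.title
        FROM articles AS a
        LEFT JOIN articles AS base ON base.id = a.base_article
        WHERE a.lang = :lang
          AND (
                -- originals: the category lives on the row itself
                (a.base_article IS NULL AND a.category_id = :cat)
                -- translations: the category is inherited from the base article
             OR (a.base_article IS NOT NULL AND base.category_id = :cat)
          )
        ORDER BY a.id
        """,
        {"lang": lang, "cat": category_id},
    ).fetchall()

print(list_articles(conn, 10, "es"))  # [(2, 'Hola'), (3, 'Adios')]
```

Even in this small form, every query has to handle the two cases separately, and nothing in the schema prevents a translation from pointing at another translation.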

Traditional implementation

I couldn’t find any documentation about what is usually done. The versions above are only presented to show what I’ve done so far; I’m not convinced any of them is good. I believe I should simply do as others do (I’m certainly not the first one to face this wall).

How does the world translate websites?

One Answer

The question is not so much how the world does it, but how you should do it.

The world is full of websites that were designed with a main language in mind and that later added extensions for additional languages (your option 1). For your problem, it's a feasible workaround. But it looks clumsy if you want to create foreign content first: you'd have articles with empty content and a status "waiting for translation".

The next popular thing is the original version (your option 3). This sounds appealing for managing the content, since there is always an original first and translations later. But when you extract the content to render language-specific output on the fly, you cannot just read one table: you always have to look for the original article, check whether its language matches, and if not, look for a translation. So it's conceptually nice in the DB, but painful in the code. You could make it easier to use with an SQL UNION, but as the translation lacks some information, you'd need to do some SQL magic first, which would make it look like the last option.

The last option we look at is IMHO the best suited to your needs. It's not "only translations", but rather "only content":

  • on one side, you need to identify, label, date and categorize content. This is your Article table. To make it less misleading, I'd suggest renaming it ArticleReference. And since you need to know which is the original version, so that translators all know which is the most reliable source, I'd also add a column original_language.
  • on the other side, you have the language-specific content. This is your Translation table. To make it less misleading, I propose renaming it ArticleContent. By the way, I'd also add a column validated or completed, in case translators have to save a long piece of content, go to lunch, and finish or proofread it later. The same feature also makes sense for the original article, where the author may need to interrupt the content creation or even ask someone else to do the proofreading.

So, given your requirements, this last approach (i.e. the second in your list) would be just perfect, with a couple of minor changes to avoid confusion.
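As a rough illustration of this layout, the two tables and a language-aware lookup might be sketched as below, using Python's sqlite3 module. The snake_case table names, the sample rows, the load_article helper and the fallback to the original language when no validated translation exists are all assumptions made for the example; the answer itself only proposes the renaming and the two extra columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article_reference (
    id                INTEGER PRIMARY KEY,
    category_id       INTEGER NOT NULL,
    created_at        TEXT NOT NULL,
    original_language TEXT NOT NULL
);
CREATE TABLE article_content (
    article_id INTEGER NOT NULL REFERENCES article_reference(id),
    lang       TEXT NOT NULL,
    title      TEXT NOT NULL,
    content    TEXT NOT NULL,
    author_id  INTEGER,
    validated  INTEGER NOT NULL DEFAULT 0,  -- 0 = draft, 1 = ready to publish
    PRIMARY KEY (article_id, lang)
);
-- Hypothetical sample data: written in Spanish, English draft not validated yet.
INSERT INTO article_reference VALUES (1, 10, '2021-10-29', 'es');
INSERT INTO article_content VALUES
    (1, 'es', 'Hola',  'Hola mundo', 2, 1),
    (1, 'en', 'Hello', 'Hello wor',  3, 0);
""")

def load_article(conn, article_id, lang):
    """Return (lang, title, content, translation_missing) or None.

    Prefers validated content in `lang`; otherwise falls back to the
    validated original so the page can display a warning.
    """
    row = conn.execute(
        """
        SELECT c.lang, c.title, c.content
        FROM article_reference AS r
        JOIN article_content AS c ON c.article_id = r.id
        WHERE r.id = :id
          AND c.validated = 1
          AND c.lang IN (:lang, r.original_language)
        ORDER BY (c.lang = :lang) DESC  -- requested language first
        LIMIT 1
        """,
        {"id": article_id, "lang": lang},
    ).fetchone()
    if row is None:
        return None
    return (*row, row[0] != lang)

print(load_article(conn, 1, "en"))  # ('es', 'Hola', 'Hola mundo', True)
```

Here the fourth element of the result tells the caller whether to show a "translation missing" warning; whether to fall back to the original at all is a product decision rather than something the schema dictates.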

Answered by Christophe on October 29, 2021
