
Some you win… some you lose


When we released the new Trados Studio 2021 last week I fully intended that my first article, after the summary of the release notes, would be something based around the new appstore integration.  The number of issues we are seeing with this release is very low, which is a good thing, but nonetheless I feel compelled to tackle one thing first that has come up a little in the forums.  It relates to some changes made to improve the product for the many.

Dedicated apps for working with WorldServer projects and legacy file types

This title in the release notes doesn’t completely address what was done here, so let’s break it down, starting with what was done.  The new SDL Trados Studio 2021 release removed the following broad components from Studio:

  • Trados 2007 Components
  • TMS Support for Synergy Packages
  • TTX, ITD and Workbench filters
  • WorldServer Components
  • Translation Memory Migration Tool

The reasons for moving them were as follows:

  1. The codebase for these features is based on very old code
  2. This old codebase is heavy, and it’s difficult to maintain as Studio starts to take advantage of newer and better technologies
  3. The feedback and telemetry data that we receive suggests these features are actually used by a rather small number of people compared to the overall userbase
  4. Without this old code Studio can be lighter, easier to install and more responsive
  5. Continuing to improve Studio in the future will be far easier without this old code, allowing many of the improvements the majority of our users would like to see to be realised

So far the success of this move is compelling… at least in terms of the significant reduction in the number of users having problems installing and licensing the product.  The SDL support team are over the moon and we’re seeing things like this in various social platforms:

[A series of screenshots follows showing user feedback from various social platforms.]
This last one is important because you’ll note I said “a small price to pay”.  This comment, from the SDL Community, related to users not knowing where some of the features I mentioned above had gone.  So, let’s see what was done to mitigate the loss.

The SDL AppStore applications

If you look at the AppStore integration and type “Pack” into the search you’ll see this:

Neither of these two applications is a plugin installer (a *.sdlplugin file), so when you click and download they’ll be placed in here:

You will be prompted when you download them, so don’t worry about not being able to find them; you will, however, need the extra steps of unzipping and running the installers.  Nonetheless, a fairly trivial exercise.

These applications will restore the features that were removed from Studio, so if you did use them all is well… or is it?

What’s still missing?

Since releasing we have already seen that the main things people were looking for were these three:

  1. ability to open WorldServer Packages
  2. ability to use the Upgrade Translation Memory routines to merge SDLTM files (Studio Translation Memories)
  3. ability to add a TMX to your project and have it immediately upgraded to SDLTM through a “Quick Upgrade” routine

There are other things in the applications, but these are the ones people are missing so far.  The good thing is that the first two of these three are restored when you install the applications.  The bad thing, and this is what we’ve lost, is that the “Quick Upgrade” feature is gone.  But it’s not the end of the world because there are various solutions to work around this if you just want an SDLTM from the TMX.

Upgrade the TMX

To do this use the Upgrade Translation Memories button from the Translation Memories View:

You’ll find this after installing the StudioPowerPack2021_16.0.0.1798.exe you downloaded from the AppStore.  It’s probably further over to the right in your ribbon… I took some artistic liberty with the screenshot to make it more visible.  You could also add it to your Quick Access Toolbar if you need it often:

Use the Glossary Converter

The latest version of the Glossary Converter has some very neat new capabilities, and this includes the ability to create an SDLTM from any of the supported filetypes:

So…

  1. select the SdlTM (SDL Translation Memory) output format
  2. check the box to Use the selected output format for any input format

Then just drag and drop your TMX right in.  The beauty of this workaround is that the Glossary Converter is a standalone application so even if you’re in the middle of your work in Studio you can still run it without breaking a sweat.

Import your TMX

This isn’t really the same thing, but I’m mentioning it because someone is bound to tell me you can also import the TMX into an SDLTM.  Of course, this is possible in a number of ways and I’m sure everyone will know how to do that already.  But it’s just not a quick conversion of the TMX into its own SDLTM.  You’d have to create the empty SDLTM first and then import the TMX… surely faster to use one of the two workarounds above.

What about the future?

So that’s the bad news: we have won some and we’ve lost some.  All in all I think the benefits far outweigh the loss in this case for the majority of users.  The improved performance we are seeing in Studio is a massive win and allows the development team to build on what they have achieved with this release, which has to be good for everyone.

Longer term I think we’ll see the development team look at how they can bring back any missing features if we hear about this from our userbase… but not using the older codebase that was a weight around their necks for years.

In the meantime I’m sure you’ll find it’s pretty easy to install these applications from the AppStore, and the workarounds, although inconvenient, are not the end of the world… in my opinion they are a small price to pay for now so we can look forward to a brighter future.

One more thing…

If you happen to be looking for the *.ttx (TRADOStag… old Trados Workbench format), *.itd (SDL Edit… old SDLX format) or *.doc(x) (SDL Trados Translator’s Workbench… bilingual Word file format, also useful for Wordfast bilingual DOC(X)) then make sure you look here after installing the SDL Trados Compatibility and Migration Power Pack:



Psst… wanna know a few more things about file types?


I wrote under this title back in 2013 and provided a bit of information about the Word filetypes in Studio.  It was a pretty popular article and I always meant to circle back and do some more.  Seven is a lucky number, so now that it’s 2020, seven years later, I thought I’d do it again… and it’s also just as long, so grab a coffee first!

The message I wanted to leave users with after that article is that it’s always worth opening the options from time to time and just exploring the filetype options, particularly for the filetypes you work with often.  You never know what you’ll find in there.  If you did this there is one thing you’d find that we have not been making a song and dance about, and that’s embedded content options.  What do I mean by this?  If you take a quick review of this article where I covered the handling of taggy Excel files you’ll know what I mean.  Embedded content (in the context of this article) is usually text, programming code, or markup that has been added to your file in such a way that it will be extracted as translatable content, but you want to protect it from translation.  So you want to “tag it”, which is an “affectionate” name given to protecting the code from being changed during the translation process while at the same time making it easier for the translator to work around it.

The way embedded content is handled

Today, in Trados Studio 2021, you might not have realised that over 60% of the filetypes can handle embedded content.  But they don’t all do it the same way, or carry the same advantages or disadvantages depending on what the content is.  In fact there seem to be two different ways this can be represented, and quite a few different places where this feature sits depending on the filetype.

To try and simplify this I’m going to split this article into three:

  1. handling embedded content using tag definition rules
  2. handling embedded content using the Embedded Content Processor Filetypes
  3. handling embedded content in other ways

I don’t intend to show every configuration in the options but I have identified which filetypes support each type below.

Just with tag definition rules

This approach is the same as described in the article on handling taggy files from Excel.  You’ll find it under some of the filetype options like this:

The use of this approach is quite simple.  You activate the embedded content processing by checking this box:

Telling Studio which parts of the document your rules should apply to:

And finally adding your rules:

The rules can be used to define placeholders or tag pairs, and for tag pairs you can define whether the text between them should be translatable or not.  So a reasonable amount of flexibility in here, but you do need to have some knowledge of regular expressions if you want to avoid having to create a lot of rules.
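To give a flavour of what these rules can look like, here are a couple of sketches (illustrative patterns of my own, not defaults shipped with Studio):

    Placeholder rule:   \{\d+\}                              …protects numbered placeholders like {0} and {1}
    Tag pair rule:      start tag <b>, end tag </b>          …with the text between them left translatable
    Tag pair rule:      start tag <code>, end tag </code>    …with the text between them marked not translatable

One regular expression like the placeholder rule above can cover a whole family of codes, which is exactly why a little regex knowledge saves you from creating a lot of individual rules.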

Affected file types

Which ones adopt this approach?  These ones:

  • Microsoft Word 2007-2019 (WordprocessingML v. 2)
  • Microsoft Word 97-2003 (DOC v 2.0.0.0)
  • Microsoft PowerPoint 2007-2019 (PresentationML v. 1)
  • Microsoft PowerPoint 97-2003 (PPT v 2.0.0.0)
  • Microsoft Excel 2007-2019 (SpreadsheetML v. 1)
  • Microsoft Excel 97-2003 (XLS v 3.0.0.0)
  • Bilingual Excel (Bilingual Excel v 1.0.0.0)
  • XLIFF (XLIFF 1.1-1.2 v 2.0.0.0)
  • XLIFF 2.0 (XLIFF 2.0 v 1.0.0.0)
  • Java Resources (Java Resources v 2.0.0.0)
  • Portable Object (PO files v 1.0.0.0)
  • Subtitle formats (Subtitles v 1.0.0.0)
  • XML: Author-it Compliant (XML: Author-it 1.2 v 1.0.0.0)
  • XML: Any XML (XML: Any v 1.2.0.0)
  • Text (Plain Text v 1.0.0.0)
  • Custom filetypes
    • Regular Expression Delimited Text (RegEx v 1.0.0.0)
    • XML (Legacy Embedded Content) (XML v 1.2.0.0)

Using an Embedded Content Processor Filetype

More recent versions of Studio have gradually introduced the concept of chaining filetypes together.  So you could open a file with one filetype, but handle the embedded content with another one.  A good example of where this is really appropriate is when the embedded content is HTML with a lot of HTML syntax to create rules for.  Using the HTML filetype as an embedded content processor means the markup is all handled perfectly and you don’t need to write any rules with regular expressions yourself.
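To make this concrete, here’s a fabricated snippet of the sort of thing I mean… an XML element whose translatable value is itself escaped HTML:

    <string name="welcome_message">&lt;p&gt;Click &lt;b&gt;Save&lt;/b&gt; to store your changes.&lt;/p&gt;</string>

Opened with an XML filetype alone, the escaped markup is just translatable text that can easily be damaged; chain the HTML embedded content processor to the element and the paragraph and bold markup become properly protected tags instead.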

This technique manifests itself in the filetypes in a few ways… so in the email filetype it’s like this under the “Common” node as opposed to “Embedded Content“:

In the XHTML 1.1 (2) filetype it’s under the “Embedded Content” node:

But in all cases the basic idea is the same.  You select the filetype you wish to be used to process the embedded content from a drop down list similar to this:

By default you’ll see there are three options:

  1. Plain text
  2. Excel spreadsheet
  3. HTML

But you can have many more… as long as they are based on these defaults.  To explain what I mean by this let’s close up the nodes and look again at the options:

Underneath the “File Types” you’ll see “Embedded Content Processors” where you can select and copy them to create as many as you need, like this for example:

Each one of these can have different rules so that you can create your custom filetypes with unique rules to solve the particular problem you need for a particular situation.  NOTE: after creating new embedded content processors they won’t be visible in the filetype for selection until you close your settings and reopen them again.  But once done you see something like this:

Now I can select the two new ones I just created when I’m configuring the XHTML filetype I referred to earlier.

Whilst the choice of filetypes to chain is quite limited, I think these three probably cover the majority of cases you’re likely to come across.  Configuring them is simple:

  1. Plain text
    1. this is exactly the same as creating a regular expression based filetype.
    2. define the structure
    3. define the inline tags
  2. Excel spreadsheet
    1. exactly the same as described in the article on taggy Excel files
  3. HTML
    1. just refine the parser rules already provided to suit your use case
    2. for most files you won’t need to do anything at all

I deliberately haven’t gone into a lot of detail here for two reasons.  The first is because I think by now you’ll have got the idea and probably don’t need to know anything else, and secondly because I have covered these principles in some detail before.  If you review this article on custom XML (scroll down to the sub-headings on embedded content processing) you should find what you need to know.  But if there are specific questions feel free to ask in the comments below or post into the SDL Community.

Affected file types

Which ones adopt this approach?  These ones:

  • Email (EMAIL v 1.0.0.0)
  • XHTML 1.1 (2) (XHTML 1.1 v 2.0.0.0)
  • XHTML 1.1 (XHTML 1.1 v 1.2.0.0)
  • HTML 5 (Html 5 2.0.0.0)
  • JSON (JSON v 1.0.0.0)
  • YAML (YAML v 1.0.0.0)
  • Markdown (Markdown v 1.0.0.0)
  • XML 2: Microsoft .NET Resources (XML: RESX v 2.0.0.0)
  • XML: Microsoft .NET Resources (XML: RESX v 1.2.0.0)
  • XML 2: OASIS DITA 1.3 Compliant (XML: DITA 1.3 v 2.0.0.0)
  • XML: OASIS DITA 1.3 Compliant (XML: DITA 1.2 v 1.2.0.0)
  • XML 2: OASIS DocBook 4.5 Compliant (XML: DocBook 4.5 v 2.0.0.0)
  • XML: OASIS DocBook 4.5 Compliant (DocBook 4.5 v 1.2.0.0)
  • XML 2: Author-IT Compliant (XML: Author-IT 1.2 v 2.0.0.0)
  • XML 2: MadCap Compliant (XML: MadCap 1.2 v 2.0.0.0)
  • XML: MadCap Compliant (XML: MadCap 1.2 v 1.0.0.0)
  • XML 2: W3C ITS Compliant (XML: ITS 1.0 v 2.0.0.0)
  • Custom filetypes
    • HTML 5 (Html File v 2.0.0.0)
    • HTML 4 (Html File v 2.0.0.0)
    • XML 2 (XML v 2.0.0.0)
    • XML (Embedded Content) (XML v 1.3.0.0)

A few gotchas…

Scope and specifications

Sometimes there is an embedded content option, but it can be restrictive in terms of the coverage offered.  So you may need to do a little more investigative work to figure out why, if you can’t immediately see the reason, your non-translatable text is not being protected.  A good example would be Markdown files (*.md).

Here the only embedded content that can be processed is within code blocks and html blocks.  So if you are trying to handle embedded content in a Markdown file and it’s not working, you first need to check whether the content has been written inside one of these objects:

    • If not, then you have your answer… you need to handle the content some other way.
    • If it has, then you need to make sure that the Studio Markdown filetype understands these objects in the same way you do.

What do I mean by understanding the objects?  The specification for Markdown is a little loose, and whilst I don’t believe we have documented this anywhere I think the one we follow would be this:

https://spec.commonmark.org/

The rules for code blocks (indented code blocks and fenced code blocks) and html blocks are quite well described in here.  However, there is other documentation, equally valid:

https://www.markdownguide.org/basic-syntax/#code-blocks

But the problem here is that the rules in this other documentation are not as comprehensive, and it’s very easy to create Markdown that may work in the application it was intended for, but that Studio won’t see the same way.  I saw a good example of this a week ago where an indented code block worked fine where it was used, but because there wasn’t a blank line after the paragraph preceding it, Studio didn’t recognise it as a code block and so the embedded content processor could not pick up the non-translatable elements.
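To illustrate, compare these two fabricated snippets.  Only the second has the blank line CommonMark requires before an indented code block, so only the second will have its code protected by the embedded content processor:

    Run the installer as follows:
        setup.exe /quiet

    Run the installer as follows:

        setup.exe /quiet

In the first snippet the indented line is treated as a continuation of the paragraph, so it’s extracted as ordinary translatable text.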

This same “gotcha” can apply in other areas.  The message I’m trying to get across is that you need to investigate further than just Studio before concluding the embedded content processing doesn’t work.

Document Structure

Another common problem I have come across when users try to use these features is the role of the Document Structure.  Studio uses the Document Structure to improve the accuracy of leverage from your translation memory by adding context information to each translation unit you save.  Take this simple example where I added some markup (<strong>multifarious</strong>) into a Word document:

If I open this in Studio I see this in the Document Structure column:

The right-hand column tells us what the Document Structure is… in this case a paragraph (P), list items (LI) and table cells (TC+).  The plus symbol after TC tells us that there is actually more structure associated with this one, which I can see by clicking on it:

The reason this is all important is because if I create a rule for this markup I have to specify the relevant Document Structure.  If I just create a rule for the paragraph then I get this:

Only the segment with the Paragraph Document Structure is tagged.  To get them all you need to add each applicable reference for the Document Structure:

When you do that the file will look like this:

This creates two emotions in me… first of all one of relief, because now I know why my embedded content processor didn’t work!  And then a second emotion which is less positive, because I don’t understand why there isn’t a catch-all structure to ensure my rules apply globally and not just to specific parts of my file.  I like that this granularity is possible because it does lend itself to more complex scenarios where you might only want to tag content in a specific set of circumstances (unlikely in my opinion… but let’s be generous and enjoy the sophistication).  However, let’s do something simple for the majority of use cases!  I have tried testing against body and section, which appear to be structural items in the underlying XML, but these have no effect.

So vote for this idea!  Clearly we thought it would be a good idea and it was added into the filetype options for Trados Live… so some consistency and parity across the tools would be good!

Filetype options versus Project settings

Ah yes… that old chestnut!  Make sure, especially if you’re new to Trados Studio, that you know where you are checking the settings you created.  If you have already created your project then you cannot alter the way the text has been extracted and will need to create the project again.  It’s worth reading this great article from Jerzy Czopik… Tea and Settings!

What about the ones that don’t?

If you need to handle embedded content in files that don’t support anything I’ve covered above then you have at least two options… translators and localization engineers have an unsurpassed ability to invent the most amazing solutions when needed, so I’ll just cover the basics here:

  1. address the content in the source file
  2. use a plugin from the SDL AppStore

Address the content in the source file

One way you may be able to tackle this is through the use of non-translatable styles.  By applying a specific style for content that should not be translated in the source file you might be able to use them in the file type settings to exclude the content from translation.  For example:

This method isn’t always going to be helpful because you don’t have a lot of control over how the content should be handled.  It’s all converted to structure tags.  But if you only need to completely exclude blocks of content from translation then this can be a very effective and simple way to do it.

Use a plugin from the SDL AppStore

The best way to manage it (in my opinion), if you do need a little more control, is to address this after the project has already been created.  Coupled with the improved filter capabilities in Studio this approach can be very effective.  Worth noting that it’s probably not unusual to receive project packages where the person who created them had a very limited knowledge of how to work with filetypes and non-translatable content.  So being able to address this after the project has been prepared is very useful indeed!

There are two applications freely available on the SDL AppStore to help with this:

  1. CleanUp Tasks
  2. SDL Data Protection Suite

My preference is for the Data Protection Suite simply because I think it’s a more robust and easier solution to use.  But CleanUp Tasks does offer quite a few interesting possibilities including being able to work with tag pairs which isn’t possible using the Data Protection Suite.  I don’t intend to cover these applications in this article… it’s already longer than I originally intended (my apologies, and thanks, if you’ve made it this far!)… so if you have any specific questions feel free to ask in the comments below or post into the SDL Community.  I’d also be interested if there is anything related to the use of filetypes, or embedded content that you think could do with a separate article to clarify the details.

Affected file types

Which ones don’t handle embedded content using the methods above at all?  These ones:

  • SDL XLIFF (SDL XLIFF 1.0 v 1.0.0.0)
  • TRADOStag (TTX 2.0 v 2.0.0.0)
  • SDL Edit (ITD v 1.0.0.0)
  • SDL Trados Translator’s Workbench (Bilingual Workbench 1.0.0.0)
  • Rich Text Format (RTF) (RTF v 2.0.0.0)
  • Microsoft Visio (Visio v 1.0.0.0)
  • Adobe FrameMaker 8-2020 MIF V2 (FrameMaker v 10.0.0)
  • Adobe FrameMaker 8-2020 MIF (FrameMaker 8.0 v 2.0.0.0)
  • Adobe InDesign CS2-CS4 INX (Inx 1.0.0.0)
  • Adobe InDesign CS4-CC IDML (IDML v 1.0.0.0)
  • Adobe InCopy CS4-CC ICML (ICML Filter 1.0.0.0)
  • Adobe Photoshop (Photoshop v 1.0.0.0)
  • OpenDocument Text Document (ODT) (Odt 1.0.0.0)
  • OpenDocument Presentation (ODP) (Odp 1.0.0.0)
  • OpenDocument Spreadsheet (ODS) (Ods 1.0.0.0)
  • QuarkXPress Export (QuarkXPress v 2.0.0.0)
  • XLIFF: Kilgray MemoQ (MemoQ v 1.0.0.0)
  • PDF (PDF v 3.0.0.0)
  • Comma Delimited Text (CSV) (CSV v 2.0.0.0)
  • Tab Delimited Text (Tab Delimited v 2.0.0.0)
  • XML: W3C ITS Compliant (XML: ITS 1.0 v 1.2.0.0)
  • XML 2: Any XML (XML: Any v 2.0.0.0)
  • Custom filetypes
    • Simple Delimited Text (Delimited Text v 2.0.0.0)

Conclusion

As with so many of the articles I write, I find that the more I start looking into a topic, the more there is to talk about, and it’s really hard knowing where to stop.  Certainly the labyrinth of Studio settings and file types can leave many users viewing it as a bit of a Pandora’s box.  This is quite unfortunate because the best way to learn about the capabilities of Trados Studio is to explore these things.  Just take a little bit at a time, and if you don’t understand something ask about it in the SDL Community.  Discussions around these sorts of things are always really welcome… it’s not just a place to go when you have a problem!  And if you do this I can guarantee you’ll find your ability to work with any tool will be significantly improved.

Introducing the multilingual XML… super filetype!


I was compelled to make a return to a previous theme around Marvel Comics because it’s the only way I can do justice to the amazing work the RWS AppStore team carry out on a daily basis.  There are some things you just can’t wait to get up in the morning for, and for me, one of these things is being able to work with this team on a daily basis.  The first meeting of every day for me is with this team and what a fantastic way to start the day it is!  But as you’ll see the hero of this story is probably a Honey Badger!

The API (Application Programming Interface) documentation (recently updated here – https://developers.rws.com/), used by developers to help them create the knots that tie their solutions to the RWS products, contains a number of simple examples which can be used as a starting block.  One of these is something I’ve been waiting to see turned into a proper app suitable for anyone to use, and to date I’m only aware of a couple of developers who did this for their own use.  One developer released a bilingual XML filetype onto the appstore by mistake some years ago and I even wrote about it… but then had to remove the article and the app a few days later when they realised they’d made it public in error!  But now, I’m really happy to say we’ve made the time to address this for more general consumption via the appstore.  The solution the appstore team came up with is spectacular and worth waiting for, but it’s also simple and incredibly useful!

Bilingual XML Filetype

What am I talking about?  A bilingual XML filetype, and the relevant API documentation for anyone who is interested is the Filetype Support Framework and as you can see this does already contain an example of how to build a bilingual XML filetype.  But before I go any further, what exactly do I mean by a bilingual XML Filetype and why would it be useful?  Well, here’s an example of the sort of file I see from time to time… or rather a fabricated file containing some of the trickiest things to deal with:

This file is actually an example of a multilingual XML file supporting translations in multiple languages, but the problems of how to handle it in Studio are relevant.  In this small example of only four segments in Studio we have the following issues to deal with:

  1. The file is not monolingual and we have to be able to read one element and write the translation into another
  2. The file is partially translated so the workaround of using regex to copy source from the source elements into the target elements in the source file is not appropriate
  3. There are html tags and CDATA within the translatable text
  4. There are also non-translatable placeholders in the text.  So {0} and {1} for example
  5. The language codes are not anything Trados Studio can recognise

But wait… this isn’t just bilingual, it’s a

Multilingual XML Filetype

So there are two more problems!

  1. ideally we should be able to create a multilingual project from this single source file
  2. when we have completed the project we need to be able to rebuild the single multilingual XML target file as opposed to having multiple files, one for each target language

I started off by saying this was so simple, but in fact it’s not!  The problems anyone faces when dealing with a file like this, especially if they are the project manager having to handle all the target languages, are not trivial.  But despite this the appstore team have managed to create a solution that pretty much does address these problems by taking the Honey Badger approach!  This filetype doesn’t give a … hoot about standards, or specifications.  Files have to be well formed, but after that anything goes.  The app is designed around just being able to get the translatable content out so you can do the work!  After all, that’s what you get paid for!!

How does it work?

You can find a fairly detailed explanation of how to work with this filetype here in the RWS Community, including a video from the RWS Autumn Roadshow where the filetype was first introduced.  The app wasn’t completely finished at the time and there were still a few things we wanted to complete before releasing, but it is now available in the appstore and ready for use!

The user interface

The basic idea is you need to tell the filetype a few things:

  1. what’s the file extension (xml, xliff, tmx etc.)?
  2. where should the different translations be in the file?
  3. what languages should the translations be in?
  4. should an embedded content processor be used?
  5. do you have to handle placeables using regex because an embedded content processor won’t pick them up?
  6. do you need to handle entity conversions?
  7. do you want to provide support for any quick inserts?

So this probably all sounds fairly familiar apart from 2. and 5.

2. where should the different translations be in the file?

Trados Studio can only handle pre-defined bilingual files such as XLIFF, TTX, ITD and SDLXLIFF for example.  It cannot handle multilingual filetypes at all (unless you are just extracting a single language to work on with a custom XML filetype), and it certainly can’t support the creation of a multilingual project that makes proper use of all the languages in the file.  One of the reasons for this is that files like this could have been prepared with whatever structure the developer felt most appropriate to use for their own purposes.  So the interface needs to reflect this.  For example:

In this example I am managing a TMX file for translation with 25 languages.  One file to create a single multilingual project.  The Language Mapping interface has two parts:

  1. Languages Root
  2. Languages

In the root I need to specify where in the file the languages can be located.  So I do this using an absolute XPath query (an XML technology I have discussed before in case you’re new to this).  For a TMX which looks something like this:
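As an illustration, here’s a heavily cut-down skeleton (a real TMX would have a fuller header and many more tu elements):

<?xml version="1.0" encoding="utf-8"?>
<tmx version="1.4">
  <header srclang="EN" datatype="xml" segtype="sentence" />
  <body>
    <tu>
      <tuv xml:lang="EN">
        <seg>Text in English</seg>
      </tuv>
      <tuv xml:lang="BG">
        <seg>Text in Bulgarian</seg>
      </tuv>
    </tu>
  </body>
</tmx>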

The Languages Root XPath query would therefore be:

/tmx/body/tu

The languages are all contained within the //tuv/seg elements and defined by the use of an xml:lang attribute.  We can use this attribute to tell the app where each language goes.  So using this same example we have these relative XPath queries:

English
tuv[@xml:lang='EN']/seg

Bulgarian
tuv[@xml:lang='BG']/seg

And so on for all 25 languages in my file.  Incidentally, if you’d like a good explanation of absolute and relative XPath queries, as well as an introduction to working with XPath, then this W3Schools tutorial is a good place to start.

Once you start to work with this filetype you’ll see how logical and well thought out this interface is.  Every file is likely to be different and this provides the flexibility to handle them.

5. do you have to handle placeables using regex because an embedded content processor won’t pick them up?

This is something I expect every Trados Studio user working with embedded content in their files will be wishing was available in all the filetypes.  Frankly I have no clue why it isn’t!  If you don’t know what I mean then take the elements in this file for example:

Some years ago Trados Studio introduced the ability (for some filetypes) to handle CDATA sections using an embedded content processor, such as the html filetype for example.  This was great as it significantly cut down the work involved in creating regular expression rules for files containing content like this, which was the process before this feature was introduced.  However, you still get CDATA that not only contains html, but also contains placeables.  You are then forced to look for workarounds (Data Protection Suite or CleanUp Tasks for example) or just handle them manually while translating.  This is sub-optimal.  So in the multilingual XML filetype the developer added some settings to allow you to tag up any content you like using regular expressions in addition to the use of embedded content processors.

There are some default rules to give you a head start and an idea of how to use this, but you can create as many as you need.  An important point to note is that you create the expressions to suit your content.  If one of the defaults works for you then that’s great… they do cover some common scenarios… but they are not intended to be the answer for all placeables!
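For example, a placeholder pattern along these lines (my own illustration, not necessarily one of the shipped defaults) would catch the numbered placeholders mentioned earlier:

    \{[0-9]+\}

Add that as a rule and every {0}, {1} or {2} in the extracted text becomes a protected placeholder tag, working alongside whatever the embedded content processor is doing.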

Such a simple solution though… should be available for every filetype in Trados Studio!

The batch tasks

It’s just a filetype so why do we need batch tasks?  It seems so far that everything in this article has two reasons… and this question is no exception!

  1. we need to be able to import the translations for each language in the project if the file is partially translated
  2. we need to be able to put the fully translated multilingual XML file back together again when the project is complete

When you install the plugin you will also find you have two new batch tasks:

  • Import Multilingual Translations, and
  • Generate Multilingual Translations

If you have the Freelance version of Trados Studio then you will have to run these batch tasks manually after creating your projects in Trados Studio.

Import Multilingual Translations

The “Import Multilingual Translations” task would be run after the project is created.  The options on this task are straightforward:

You can run the task after pre-translating from your TM as part of your normal project creation process because the options allow you to overwrite any existing translations if they are already approved or preferred for example.  You can also set the “Origin System” and the segment status in Trados Studio to be used after import (Draft, Translated, Approved etc.), and you can also exclude segments from being updated based on a wider range of one or more selection criteria:

  • properties
    • locked
  • status
    • Draft, Translated, Approved etc.
  • type of match
    • Perfect Match, Context Match, Exact Match, Machine Translation etc.

So a decent amount of flexibility around whether you would prefer to use work already done with other resources or take the translations provided in the imported file.

Generate Multilingual Translations

The “Generate Multilingual Translations” batch task is needed to pull the final target file together.  Why?  Well, Trados Studio is a tool based on working with bilingual content created from either bilingual or monolingual source files.  Studio will create an SDLXLIFF file for each language pair and will recreate the target file with the translated content inserted into the right place for each one.  So if you have a multilingual file with 25 languages in it (one of them being the source) you will end up with 24 target files, one for each target language.  You now have to put all of these together into one file to be able to provide the fully translated multilingual file back to your customer.  That can be quite a task!

So this batch task does it for you.  It will create target files in each language folder containing only the translations for that language AND it will create a single file in a new folder called “Multilingual” which contains the single multilingual file with all the translations for your customer.  I don’t know if you’ve ever tried to do this before?  It is possible of course and you may, if you are an experienced user or localization engineer, have created scripts or processes to do this.  But it’s not simple and some files can be incredibly difficult to handle.  So for me this task is a stroke of genius 🙂

Professional Version of Trados Studio

If you have the professional version of Trados Studio  then of course you can create custom tasks.  So for example, I have one that does this:

When I create a multilingual project with this template I only do three things:

  • convert to translatable format
  • copy to target languages
  • import multilingual translations

So the project creation process is quick and I don’t need to run the batch task to import the translations afterwards.  A nice feature in the professional version.

I know we often share the “secret code” for this sort of customisation, so Freelance users who want to have this, and are prepared to manually edit their project templates, can achieve a similar level of automation, albeit with a workaround every time they want to use it somewhere new… so here’s what you need for the example above:

“notsoSecret” code
  <InitialTaskTemplate Description="Used for multilingual projects using the new multilingual xml filetype" Name="multilingual" Id="70d9843e-78f7-463f-a4a2-785ec9622659">
    <SubTaskTemplates>
      <SubTaskTemplate TaskTemplateId="Sdl.ProjectApi.AutomaticTasks.Conversion" />
      <SubTaskTemplate TaskTemplateId="Sdl.ProjectApi.AutomaticTasks.Split" />
      <SubTaskTemplate TaskTemplateId="MultilingualXMLFileType_ImportBatchTask_Id" />
    </SubTaskTemplates>
  </InitialTaskTemplate>

You might only want to insert the “<SubTaskTemplate TaskTemplateId="MultilingualXMLFileType_ImportBatchTask_Id" />” into an existing template, but now you know what to use!

A Preview

I should also mention the preview.  You can’t create a preview that mirrors whatever the finished translations will look like because we don’t know what this is from a flat XML file.  We could create a preview showing all the other languages so you might get some inspiration if the file is partially translated.  But we questioned the value in that too.  So in the end we went for showing the XML itself, and where the translation you are working on sits in the file… so you get a preview like this for example:

If something else would be preferred we are always happy to look at the suggestion.

Interesting use cases

If you take a look at the video in this wiki you’ll see various use cases for this filetype such as:

  • the really common crappy XLIFF created by many non-localization friendly tools such as WordPress as they abuse the CDATA concept in XLIFF making content very difficult to handle
  • invalid XLIFF with incorrect language codes, non-recognised elements or attributes… all things which Trados Studio won’t like because it adheres to the XLIFF specification and expects nothing less.  The multilingual XML filetype is a bit like the Honey Badger in this respect as it couldn’t care less about standards or specifications!  As long as the file is well formed it’ll allow you to handle the translation, which is all you really want!
    • a good example of this would be here in the RWS Community and I think this may be the first real life completed project using this new filetype!
  • .. and more

But I thought it would be interesting to tackle a more off-beat use case that came up in the RWS Community a week or so ago from a user looking for a solution to handling a bilingual requirement inside a Word file.  The user doesn’t seem to be too interested anymore as he never responded, but I was.  It was the perfect opportunity to try something quite complicated with this new Multilingual XML filetype!  To summarise, the problem was how to handle a Word document that looked like this:

What makes this tricky is three things:

  1. only content in the table cells needs to be translated
  2. the source is in the second column of each table and the target needs to be placed into the third column
  3. if the cell in the third column is shaded grey then that particular row should not be translated at all

At first glance, this is something that looks like so much work (especially if the file is large) that it’s probably easier to translate in the Word file itself.  However… we have the Multilingual XML filetype!  So we could do this:

  1. unzip the docx file
    1. a docx file, for anyone who didn’t know this, is actually a zipped set of files and folders.  So if you add a .zip extension to the file name you can unzip it and get at the files inside
  2. Inside you’ll find a set of files and folders, including a “word” folder (see the sketch after this list)… gets exciting when we see the xml extension coming up 😉
  3. The document.xml in my file contains all the content I might need to translate in this file.
  4. translate the XML in Trados Studio
  5. Save the target file
  6. Put it back into the unzipped docx file and zip it up again.
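For anyone who hasn’t looked inside a docx before, the unzipped package typically looks something like this (the names come from the OOXML packaging conventions… your file may contain more):

    [Content_Types].xml
    _rels/
    docProps/
    word/
        document.xml
        styles.xml
        settings.xml
        …

The translatable body text lives in word/document.xml, which is the file we’re interested in here.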

Simple really… if it wasn’t for the three tricky things I mentioned above.  The document.xml file in a docx is quite complicated.  It has 19 namespaces and a lot of structure:

I won’t lie to you… I did find it tricky to get to the bottom of what I needed here and actually built a simplified version just to get the XPath right first:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document>
	<body>
		<tbl>
			<tr>
				<tc>
					<p>
						<r>
							<t>Expert</t>
						</r>
					</p>
				</tc>
				<tc>
					<p>
						<r>
							<t>Problem Description</t>
						</r>
					</p>
				</tc>
				<tc>
					<tcPr>
						<shd fill="A6A6A6" />
					</tcPr>
					<p>
						<r>
							<t>GREY IN HERE</t>
						</r>
					</p>
				</tc>
			</tr>
		</tbl>
	</body>
</document>

So I removed the namespaces and all the stuff in the file apart from the main paths to the information I needed.  This allowed me to configure the filetype like this… putting the appropriate namespaces back in, which in this case is “w”:

Absolute XPath to the “Language root”:
/w:document/w:body/w:tbl/w:tr[not(w:tc[3]/w:tcPr/w:shd/@w:fill='A6A6A6')]

So I’m telling the app that the part of the document where the source and target languages will be is the tr element… but not when the cell in the third column of the table row is grey.  The colour is held in the fill attribute of the shd element, so this path will only pick up the table rows where the third column doesn’t contain a grey cell.

Then I just need Relative XPath expressions for each language.  In this case:

Source:
w:tc[2]/w:p/w:r/w:t

Target:
w:tc[3]/w:p/w:r/w:t

So I’m just pulling out the text from the second and third table columns to insert into my project.  This gets me the following in Trados Studio, with the one cell already pre-translated because it was already translated in the Word file:

Pretty sweet!  If the file was huge this will have saved me one hell of a lot of work.  I can now translate the file (machine translation!!):

I run the “Generate Multilingual Translations” batch task and put the target file back into my unzipped Word folder to replace the document.xml that was there before and Bob’s your Uncle!

And finally, just in case it’s easier to follow, I created a video of the whole process from start to finish.  Hopefully it’ll show you how well the filetype works as well as how to work through the steps I’ve been talking about above:

Length: 10 mins 33 seconds

I actually had a lot of fun doing this, and the exercise proved a useful test case for the developer because we discovered the logic we used for handling namespaces was inadequate for a file like this.  So you’ll see a new version of the filetype was released on the 1st December (today) to accommodate the fix.  Just another advantage of having filetypes as apps rather than in the core product… they can be fixed from one day to the next and you don’t have to wait for a release of the core product to enjoy the benefits!

All thanks to the genius of the RWS AppStore Honey Badgers!!

Styling with ChatGPT!


Digital art: A cartoon image depicting ChatGPT as a technology style guru.

Continuing the theme of how to make use of AI technologies to help with the more technical side of localization, I thought I could revisit an article I wrote back in 2013… this month a decade ago!  In that article I explained how to write a very basic stylesheet that could be used to provide more context when translating XML files.  To do that I had to learn some basics myself, and that did give me enough of a skillset to pretty much create stylesheets for all kinds of basic html table based previews that I come across… but I can never claim to be an expert, and if the styling or the XML was more complex I might not be able to do it at all.

Another really interesting use case for this type of technology (ChatGPT today, but I expect we’ll see more over the course of this year) in a localization context is the creation of a stylesheet without needing to be too technical.  So if I take the example XML in the earlier article I can ask ChatGPT this:

 So in simple English:

I have an XML file that contains the text for a poem in the text element and an explanation of the text in a comment element. The title of the poem is in the title element and a link to the wiki article where the explanation is provided is in the Ref element. I want to create a stylesheet to render this XML as an HTML table. What is the best way to structure my XSLT template to achieve this?
An example of the xml file is this:

Followed by a short example of the XML file making sure I captured enough of the file to give it a representative sample.
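I won’t reproduce the original file, but based on the element names in the question the XML is shaped something like this (the values are placeholders of mine):

<?xml version="1.0" encoding="UTF-8"?>
<POEM>
  <Title>The title of the poem</Title>
  <Ref>https://en.wikipedia.org/wiki/…</Ref>
  <Line>
    <text>The first line of the poem</text>
    <comment>An explanation of the first line</comment>
  </Line>
  <Line>
    <text>The second line of the poem</text>
    <comment>An explanation of the second line</comment>
  </Line>
</POEM>

You can see how these element names map directly onto the select attributes in the stylesheet below.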

In seconds it returns the code which looks like this… I’m going to omit the code for the rest of this article but I just wanted you to see what’s provided:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <xsl:template match="/">
    <html>
      <head>
        <style>
          table {
            border-collapse: collapse;
            width: 100%;
          }
          th, td {
            text-align: left;
            padding: 8px;
            border-bottom: 1px solid #ddd;
          }
          th {
            background-color: #333;
            color: white;
          }
        </style>
      </head>
      <body>
        <table>
          <tr>
            <th>Title</th>
            <td><xsl:value-of select="POEM/Title"/></td>
          </tr>
          <tr>
            <th>Reference</th>
            <td><a href="{POEM/Ref}"><xsl:value-of select="POEM/Ref"/></a></td>
          </tr>
        </table>
        <br/>
        <table>
          <tr>
            <th>Line</th>
            <th>Explanation</th>
          </tr>
          <xsl:apply-templates select="POEM/Line"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="Line">
    <tr>
      <td><xsl:value-of select="text"/></td>
      <td><xsl:value-of select="comment"/></td>
    </tr>
  </xsl:template>

</xsl:stylesheet>

It also provides a help explanation which I really like:

In this XSLT template, the HTML table is generated using the table, th and td tags. The table rows for title and reference are created first, followed by a table for each line in the poem, with each row containing the text of the line and its explanation. The apply-templates function is used to loop through each Line element in the XML and generate a row in the HTML table for each one using the Line template. Note that the a tag is used to create a link to the reference URL provided in the XML.

All I do now is copy/paste this code into a text editor (like Notepad++) and save it with some name and the extension .xsl.  Then I can add this to the “Preview” in my filetype settings:

If I now open the file in Studio and preview it I see this:

If I don’t like that, even though it’s nice and clear and even renders the reference as a working clickable link, then I can try something else:

Can you rewrite this so the style is more pleasing to a cultured reader of poetry?

The result, a few seconds later:

Certainly, here’s an example of a possible stylesheet that presents the poem with a more artistic style:

No doubt I could spend some time refining it… or rather ask ChatGPT to refine it!  The wonderful thing about this is it never gets tired of my questions!!

But that was easy!!

So let’s do another one, but perhaps something more likely than a poem by Lord Tennyson.  Let’s take a sample file I found here and edited slightly to remove a lot of additional info that just took up a lot of space for this article.  The content looks like this:

This file has some interesting things in it, such as references to images in a url provided by the creator of the XML.  I’d love to have these images rendered in my preview.  I only need to translate some of the elements so I set up my parser rules like this:

But in my preview I want a lot more to help me.  So, I use this question to ChatGPT:

I have an xml file containing the details for a product catalogue. Create a stylesheet to display in an html table the following elements with all content left aligned:

In row 1:
Brand_Name: Bold and 2 px larger than the rest of the text
Product_Name

In row 2:
Long_Description: in italics

In row 3:
Category
SubCategory

In row 4:
Product_Group
Thumb_URL: the image found at the url in this element should be rendered by the html

In row 5:
SKU: smaller text added as a reference for lookup only

A small example of this XML file is below. But note that the stylesheet should account for hundreds of groups within the product elements:

Followed by a short example of the XML file making sure I captured enough of the file to give it a representative sample… in this case I actually used the first two products as it seems to help ChatGPT if you provide more than one.

My “efforts” are rewarded with a stylesheet (I won’t include the whole thing this time but you can find the files I used at the end of the article):

Which I add into my settings and my preview looks like this:

Wow!  I’ve played with this a while now but I’m still astonished at how easy this is!  I put all the files I played with in a zip here should anyone want to have a go… I also included my filetype settings (that will include a stylesheet)… note these are all for Trados Studio 2022.  But I’d really recommend you sign up for ChatGPT and have a go yourselves!

Conclusion

One of the complaints we often hear about Trados Studio is that it’s complicated. Frankly I don’t think it is… in fact, having played with most translation tools over the years, I actually think Trados Studio is the easiest and most logical to work with.  However, the reason it’s a market leader is because it still offers capabilities unmatched where it counts for sophisticated/technical users, and filetype capabilities are one of them.  The problem for your average translator who ventures into the world normally frequented by localization engineers and people who like the technical aspects is that they often don’t have the technical skills to really take advantage of them… and in most cases I’d hazard a guess they don’t want to either!  The creation of a stylesheet to support the translation process with a helpful preview is one of those things I reckon most users wouldn’t even attempt.  Well now you can!!  Using AI to support you by helping with the more technical tasks while you concentrate on the translation is something you must take advantage of.  If you don’t then you’re missing a trick!

… and if anyone is wondering about the DALL·E 2 image I used at the beginning of this article, I created it using this as a prompt “Digital art: A cartoon image depicting ChatGPT as a technology style guru.“.  The results were not all great, but I kind of liked that one. In a way it reminded me of Romulus who helped me get the whole concept of the appstore moving faster and in the right direction to where we see it today.  Less hair in the image but he was definitely a technology style guru!

Never get lost with ChatGPT…


As I’m getting lost in my own thoughts around just what to talk about next with regard to AI technologies, and in particular ChatGPT… and as I’m pondering the effect this is going to have on our industry, I recalled a couple of questions around the use of XPath in the community.  One of these questions came up yesterday and it related to how to use XPath to extract one of the languages in a TMX file using the XML filetype in Trados Studio.  Not a particularly tricky thing to do, and I imagined the user was just editing the content, or maybe changing the language pair by translating one of the languages into something else, or something like that.  But what struck me was the XPath expression he used.

It was quite a sophisticated use of XPath, and definitely used syntax I wouldn’t have used for this simple task:

//seg[contains(ancestor-or-self::*/@xml:lang,'en-US')]

I would have just used this:

//tuv[@xml:lang='en-US']/seg

I tend to think simple first… but really liked the expression he came up with.  So today this got me thinking about how well ChatGPT could be used to help with stuff like this, especially since there are always multiple ways to tackle things like this.  So I asked this question in the ChatGPT interface:

Create five different XPath expressions to select the content of the seg element when the xml:lang attribute in the seg element is equal to en-US. An example XML file is this:

<?xml version="1.0" encoding="utf-8"?>
<tmx version="1.4">
<header creationtool="SDL Language Platform" creationtoolversion="8.1" o-tmf="SDL TM8 Format" datatype="xml" segtype="sentence" adminlang="en-US" srclang="en-US" creationdate="20230203T142912Z" creationid="GLOBAL\pfilkin">
<prop type="x-Recognizers">RecognizeAll</prop>
<prop type="x-IncludesContextContent">True</prop>
<prop type="x-TMName">en-ro (tmx project)</prop>
<prop type="x-TokenizerFlags">DefaultFlags</prop>
<prop type="x-WordCountFlags">DefaultFlags</prop>
</header>
<body>
<tu creationdate="20230203T141938Z" creationid="GLOBAL\pfilkin" changedate="20230203T142257Z" changeid="GLOBAL\pfilkin" lastusagedate="20230203T142257Z">
<prop type="x-LastUsedBy">GLOBAL\pfilkin</prop>
<prop type="x-Context">0, 0</prop>
<prop type="x-Origin">TM</prop>
<prop type="x-ConfirmationLevel">Translated</prop>
<tuv xml:lang="en-US">
<seg>Official Journal L 333/2022</seg>
</tuv>
</tu>

I actually gave it two TUs and not just the one above, I just shortened it here for brevity.  The answers a couple of seconds later were these:

  1. Using a predicate:
    //seg[@xml:lang='en-US']/text()
    This expression selects all seg elements that have an xml:lang attribute equal to 'en-US', and then returns the text content of the selected seg elements.

  2. Using the descendant:: axis:
    //tuv[@xml:lang='en-US']/descendant::seg/text()
    This expression selects all tuv elements that have an xml:lang attribute equal to 'en-US', and then selects all descendant seg elements of the selected tuv elements. Finally, it returns the text content of the selected seg elements.

  3. Using the ancestor:: axis:
    //seg[ancestor::tuv[@xml:lang='en-US']]/text()
    This expression selects all seg elements that have an ancestor tuv element with an xml:lang attribute equal to 'en-US', and then returns the text content of the selected seg elements.

  4. Using the self:: axis:
    //tuv[@xml:lang='en-US']/self::*/seg/text()
    This expression selects all tuv elements that have an xml:lang attribute equal to 'en-US', and then selects the seg elements that are children of the selected tuv elements. Finally, it returns the text content of the selected seg elements.

  5. Using the following-sibling:: axis:
    //tuv[@xml:lang='en-US']/following-sibling::tuv[1]/seg/text()
    This expression selects all tuv elements that have an xml:lang attribute equal to 'en-US', and then selects the seg element that is the first following sibling of each selected tuv element. Finally, it returns the text content of the selected seg elements. Note that this expression assumes that there is only one tuv element with the xml:lang attribute equal to 'en-US' in each tu element. If there are multiple such tuv elements, only the first one will be selected.

Pretty smart huh?  Well almost… the first one is actually wrong as it doesn’t return anything because it’s expecting the xml:lang attribute to be in the seg element when it’s not.  But the other four are all correct and provide a very nice lesson in how to use XPath to create a parser rule for this task.

But this brings me onto a very cool feature of this technology and why it’s called ChatGPT.  You’ll only really see this properly if I share the “conversation”:

It took two more goes to correct itself, which is quite interesting given this was the easiest of all the expressions it attempted.  I’m not sure why it struggles with this, but being able to go backwards and forwards in this way is very helpful.  I even find myself wanting to be pleasant towards it as if I was talking to someone I know who is helping me out.

So of course I can also reciprocate…

Which of course is the same as the one our friend in the community forum came up with… so interesting that it never found the simpler solution.  I think it’s probably because of my prompt… or original question since I didn’t mention the tuv element at all.  The answers it provides are absolutely dependent on how useful the question is in providing the right context.

This is something we’ve been finding as we have been working on using AI to help answer questions accurately and quickly in the RWS Community by training the engine with relevant content.  This has many challenges that I’ll write about once we’ve released our first solution, but one of them is users not providing enough information in the question to be able to help, or worse, prompting an answer that’s completely wrong!  Users posting things like “Help, I have an error.”, or “My software doesn’t work” are extreme examples that we do actually see, and engaging in a bit of detective work is all part of the task… but I hope you can see how this presents a challenge to a computer.  ChatGPT does actually provide a really helpful answer without judgement at all… but in terms of actually helping with the real problem it’s not helpful at all.  Try it!

That TMX was a basic challenge!

Of course I have to stretch things a little… so I recalled the most challenging question on XPath I have had, and it was actually one I was unable to answer myself and I had to seek help from stackoverflow.  If you’re interested the community question was this one, and the stackoverflow solution was here.

I hope you can also see the difference in how to ask the question… I couldn’t ask for help over there without making sure I provided enough information the first time around.  Stackoverflow experts are brilliant… but not always known for their patience if the question isn’t understandable!

So, I took my question and tested it against ChatGPT.

Create an XPath expression to satisfy the following criteria:

1. Extract text from <P> where countries=”AR”, other values are always ignored
  2. Extract text from <P> where its parent element (in this example, but it’s not always the case) contains AR in the countries attribute (countries="AR,GB,US" for example)
  3. Extract text from the current element (<P> in this example, not always) when there is no countries attribute present in the current element or its ancestors

A sample XML to extract the text from is this:

<?xml version="1.0" encoding="UTF-8"?>
<rootelement>
<Description>
<Note countries="AR,GB,US" >
<P countries="AR" >We want this one as it's AR.</P>
<P countries="US" >We don't want this one as it's not AR.</P>
<P countries="GB" >We don't want this either as it's not AR.</P>
</Note>
</Description>
<Description>
<Note countries="AR,GB,US" >
<P>Everyone in AR, GB and US gets to buy.</P>
<P>No restrictions for this product in these countries.</P>
</Note>
</Description>
<Description>
<Note>
<P>No country, that's because it will be treated as AR.</P>
</Note>
</Description>
</rootelement>

The result should be to extract the text from these elements:

<P countries="AR" >We want this one as it's AR.</P>
<P>Everyone in AR, GB and US gets to buy.</P>
<P>No restrictions for this product in these countries.</P>
<P>No country, that's because it will be treated as AR.</P>

I won’t bore you with all the details, but suffice it to say it didn’t get it right the first time.  For example, attempt #1:

//P[@countries='AR'] | //P[contains(ancestor-or-self::*[@countries], 'AR') and not(@countries)]

This didn’t satisfy the 3rd criteria.  Attempt #2:

//P[@countries='AR' or contains(ancestor-or-self::/@countries, 'AR') or not(ancestor-or-self::/@countries)]/text()

This didn’t satisfy the 1st criteria.  Attempt #3:

//P[(@countries="AR") or (ancestor::[@countries and contains(concat(',', @countries, ','), ',AR,')]) or (not(@countries) and not(ancestor::[@countries]))]/text()

This didn’t satisfy the 1st or the 3rd criteria.  Attempt #4:

//P[@countries='AR' or ancestor::*[@countries][contains(@countries, 'AR')]='AR' or not(@countries)]

This one was correct!!  Now it’s quite likely my question was still lacking detail, or wasn’t formatted appropriately to help ChatGPT.  This of course mirrors my earlier comment about providing the appropriate information to be able to receive good help.  So I asked how I should have done this:

Sure! When asking a question like this, it’s important to be as specific and clear as possible about what you want the XPath expression to do, and what the desired output should look like. Here are a few tips that could help:

  1. State the problem you’re trying to solve: Give a clear explanation of what you’re trying to accomplish with the XPath expression. This will help the person answering your question to better understand your needs and provide a more tailored response.
  2. Provide a sample XML document: Include an example XML document that represents the type of data you’re working with. Make sure the example document covers all the different scenarios that you want the XPath expression to handle.
  3. Specify the criteria for selecting nodes: Be clear about the criteria you want to use to select nodes from the XML document. Include details such as which attributes should be considered, whether the search should include descendants or only immediate children, and any other relevant information.
  4. Give clear examples of desired output: Include examples of the type of output you expect from the XPath expression. This will help the person answering your question to ensure that their solution meets your needs.

By following these tips, you’ll be able to provide clear and specific information that will help ensure that you get the correct answer the first time.

Great advice… and the egotistical parts of me think I already did that… but I’m starting to have enough confidence in this technology for these sorts of tasks that I’m pretty sure I didn’t do it well enough.  Either way the advice is excellent and is something I think everyone can take on board when asking for help!  You have to help others to help yourself!

Summary

Getting back to the point of this article I think this demonstrates another fantastic way to help every user get more from a sophisticated tool like Trados Studio.  Most of the time the use cases are relatively simple for an experienced user to solve on their own, but with the help of AI technology like this even a relative novice is able to tackle some pretty sophisticated problems in their localization process.

I was thinking as I wrote this article how we are right now at a point where we are no longer just talking about the dangers of AI; we are all actually able to witness and experience its potential to change our lives in many areas.  Yesterday, late afternoon, I was trying to help get an agenda completed for an event we’re planning and needed to write several interesting titles and synopses for presentations that hadn’t been prepared yet.  This same technology helped me do this in minutes, and I have to say far better than I could have done it myself!

It’s very difficult not to get excited about the capabilities of this technology but I think a balanced view is really important.  We can’t ignore it and we have to think about the implications from all sides.  Does it mean the technology is replacing me, or does it mean it’s simply making me more productive than I could possibly be without it?  Could it do everything I do today?  I think we’re still a long way off the sort of intelligence that could replace me completely… and in some ways I hope I don’t live to see it… and I’m not ready to go any time soon!  At least not until I see more evidence of us being more responsible about people’s lives in the future and how we ensure everyone can live a fulfilling life when we have technology doing our work for us.

A few years ago I wrote an article called “Information 4.0… we’re all doomed!”.  I think the ending paragraphs are still relevant today.

All in all I like to focus on the opportunity, and even though we read all the time about how these new technologies will steal our jobs I think it’s important to think about the new ones that will come around as a result of these advancements. There’s one thing for sure… we can’t stop the progress but we can help to shape it.

It’s always easy to complain about things and be negative… much harder to see the opportunity.  But now’s the time to start looking and if we keep a sense of perspective and realism around the real capabilities of a machine I think there will be plenty… embrace the change!

ChatGPT + Powershell = Localization Superpowers!


DALL·E - Digital art: AI generating Powershell scripts

Localization engineers are the miracle workers behind the scenes of localization workflows, and without them many of the projects we see couldn’t happen.  The skillsets they possess go far beyond the sort of things that most translators know how to do, and often require the ability to code.  I’ve already written a little about these sorts of things in the last three or four articles I published this month, mainly because the use of AI (tools like ChatGPT for example) is opening up the possibility for the rest of us mere mortals to benefit from the sort of things they do.  Today I’m building on another such skill, one I introduced only once before, back in 2013… a decade ago!  It is a very technical, and yet powerful, thing to be able to tap into, so now with the help of ChatGPT I’m going to do it again!

Filetype support…

The filetype support in Trados Studio is pretty extensive.  But sometimes what you actually need to improve the overall experience based on information in a file just isn’t there.  So what do you do… if you’re lucky you’ll have a localization engineer to help, and if not you’ll ask in the forums… probably.  But this sort of technical skillset is probably nothing you’ll get a lot of help with because it’s normally quite bespoke and expensive to provide.  So what sort of thing am I talking about?  Let’s take a simple sort of example to get the idea.

Let’s say you received an iOS .strings file like this for translation:
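
Something along these lines, for example… a couple of invented entries just to show the format (the real file I used is in the zip at the end of this article):

/* Title shown on the welcome screen */
"welcome_title" = "Welcome to the app!";

/* Label for the login button */
"login_button" = "Log in";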

The first thing you’d do is try and open it in Trados Studio, or whatever CAT tool you own (some CAT tools may support this file already… keep in mind I’m just using this as an example for a skillset you’re gonna like!).  If your tool doesn’t support it you may see something like this:

But… unperturbed you know there’s another way to handle this using regular expressions.  So you create your filetype, make sure it can be used for a .strings file:

And now you see this when you open the file:

All simple stuff and then you realise you can’t see the comments, and you might need these to give yourself some context while translating.  Now with Trados Studio you are lucky as the preview will show the contents of the file:

But if you didn’t have this, or if the file was actually more complex so the preview was less helpful due to the amount of unnecessary clutter in the file, then you really want something better.  For example, if the file was in Excel then you could use the html preview or the DSI Viewer to show you exactly what you needed by handling the file with the Bilingual Excel filetype.

If you’re lucky your localization engineer will have done this already for you, but if not you now have a little help from our AI friend, ChatGPT.  First of all I’m not going to say this was straightforward, because it wasn’t.  I have mentioned before that getting the prompt correct is the most important part. So I started with this, just to give you some idea:

Please create an Excel file using a Powershell script containing a sheet with three columns: “Comment”, “Key” and “Value”. The file should contain a list of localization strings in the format of:

/* Comment */
"Key" = "Value";
A blank row follows each key/value pair.

The localization strings come from a .strings resource file that you prompt the user to provide.

The Excel sheet should contain one row per localization string, with the Comment column containing the comment, with the “Key” column containing the string key (without quotes), and the “Value” column containing the localized value (also without quotes). The Excel sheet should have no blank rows.

I had to make corrections five or six times (not straightforward, but not exactly difficult!) and ChatGPT told me later I should have asked this:

Write a PowerShell script that can read a .strings resource file and create an Excel spreadsheet with the data from the file. The script should:

  1. Prompt the user to enter the path to the .strings resource file.
  2. Import the content of the .strings resource file.
  3. Create a new Excel spreadsheet with three columns: “Comment”, “Key”, and “Value”.
  4. Add the content of the .strings resource file to the spreadsheet, with comments in the “Comment” column and key-value pairs in the “Key” and “Value” columns.
  5. Save the spreadsheet to the same directory as the .strings resource file with the filename “Localization-Strings.xlsx”.
  6. Display a message to the user indicating whether the operation was successful or if any errors occurred.
  7. Avoid using the EPPlus assembly.

That last bit was there because at one point it actually tried to introduce an additional .NET assembly called EPPlus that I didn’t want.  If you’re familiar with ChatGPT or you’ve read my other articles you can probably imagine the discussion to get it right!

However, after around 15 or 20 minutes I think I got there and the end result was this script:

# Prompt user for path to .strings resource file
$stringsFilePath = Read-Host "Enter the path to the .strings resource file"

# Import the content of the .strings resource file
$stringsContent = Get-Content $stringsFilePath

# Create new Excel spreadsheet
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Add()
$worksheet = $workbook.Worksheets.Item(1)
$worksheet.Name = "Localization-Strings"

# Add column headers to worksheet
$worksheet.Cells.Item(1,1) = "Comment"
$worksheet.Cells.Item(1,2) = "Key"
$worksheet.Cells.Item(1,3) = "Value"

# Parse the .strings content and add to worksheet
$row = 2
foreach ($line in $stringsContent) {
    if ($line -match '^/\*') {
        # This line is a comment
        $comment = $line.TrimStart('/* ').TrimEnd(' */')
    }
    elseif ($line -match '^"(.+)"\s*=\s*"(.+)"\s*;$') {
        # This line is a key-value pair
        $key = $Matches[1]
        $value = $Matches[2]
        $worksheet.Cells.Item($row,1) = $comment
        $worksheet.Cells.Item($row,2) = $key
        $worksheet.Cells.Item($row,3) = $value
        $row++
    }
}

# Save the spreadsheet
$excel.DisplayAlerts = $false
$workbook.SaveAs(($stringsFilePath -replace '\.strings$', '-Strings.xlsx'))
$workbook.Close()
$excel.Quit()

# Display success message
Write-Host "Localization strings have been exported to an Excel spreadsheet."
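
To try it yourself, save the script with any name you like (strings2excel.ps1 in my case… nothing official about that name) and run it from a PowerShell prompt, entering the path to your .strings file when prompted:

.\strings2excel.ps1

One thing to bear in mind: the script drives Excel through COM, so it needs a local installation of Excel to run… and the same applies to the return script further down.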

This successfully converts my strings file into Excel like this:

So now I could set up the Bilingual Excel filetype in Trados Studio like this:

Where…

  1. I set the source and target column to be the same
  2. the comments from the .strings file are added as context
  3. the key is added as a comment

I could have switched 2. and 3. around… just preferences… but now when I translate I have this using the out of the box preview in Trados Studio 2022:

Or I could use the DSI viewer from the appstore like this:

The DSI viewer gets you more screen real estate while translating; the html preview is more pleasing to the eye… all a matter of preference.

So… that’s it!  Pretty cool, and another fantastic way to use AI to help you get more by giving you localization super powers!!

What about the target file!

But wait… what about the target file!!!  Well you could add the .strings file to the same project you just translated using the Regex Delimited Filetype you tested at the start; then just pre-translate from your translation memory and save the target.  Job done!

But you could also just ask ChatGPT to create a script to put it back again!!  After we… “we”… my language when talking to a computer is starting to worry me!  Anyway, after the AI gave me the script I said thanks:

It asked for more and I obliged.  Sounds easy, but this time there was a lot more to’ing and fro’ing.  ChatGPT tried to get me to install all kinds of add-ons which I didn’t want, and for a while it persisted in ignoring what I was asking, but in the end I got there too.  I think it took me around 45 minutes to get this right… a lot longer, but nothing compared to the time it would have taken me without it.  I’m sure a well versed localization engineer would have done this faster, but to be fair localization engineers probably have a lot of scripts up their sleeves for any occasion after being in the business for a while.  I can imagine this tool being very helpful even for them the first time around, or when tackling something new.

So here’s the return script:

# Prompt user to select the input file
$inputFile = Read-Host "Enter the path to the Excel file"

# Create a new string builder
$stringsBuilder = New-Object System.Text.StringBuilder

# Import data from the Excel file
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Open($inputFile)
$worksheet = $workbook.Sheets.Item(1)
$range = $worksheet.UsedRange
$rows = $range.Rows.Count
$cols = $range.Columns.Count

# Iterate over the rows and columns of the Excel data
for ($i = 2; $i -le $rows; $i++) {
    $comment = $worksheet.Cells.Item($i, 1).Text
    $key = $worksheet.Cells.Item($i, 2).Text
    $value = $worksheet.Cells.Item($i, 3).Text
    if ($comment -ne "") {
        $stringsBuilder.AppendLine("/* $comment */")
    }
    $stringsBuilder.AppendLine("`"$key`" = `"$value`";")
    if ($i -ne $rows) {
        $nextComment = $worksheet.Cells.Item($i + 1, 1).Text
        if ($comment -ne "" -and $nextComment -ne "") {
            $stringsBuilder.AppendLine()
        }
    }
}

# Save the output to a file
$outputFile = [System.IO.Path]::ChangeExtension($inputFile, ".strings")
[System.IO.File]::WriteAllText($outputFile, $stringsBuilder.ToString())

# Close the workbook and Excel
$workbook.Close($false)
$excel.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($worksheet) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel) | Out-Null

# Display a message to the user
Write-Host "Strings file saved to $outputFile."
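
One small thing to watch if you use these scripts as they are: because the return script simply changes the file extension, a file that started life as example.strings (an invented name) comes out of the first script as example-Strings.xlsx, and the return trip will then save it as example-Strings.strings rather than the original example.strings.  Easy enough to rename, but worth knowing before you overwrite anything.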

I’m not really sure what the best prompts should be to solve a problem like this.  I think at the end of the day you just treat it like a tool.  Ask your question, test the output, explain what’s wrong and try again… and again, until you get it right.  So really no different to what you’d do if you were writing the script yourself… the only difference being I bet you couldn’t write it!

But how do you use it?

I thought that rather than write this all down with even more explanations it would simply be easier to show you.  So here’s how I used the scripts to achieve my goal.

I think there are many applications of this, so please do keep in mind I was just showing you an example of how it could be used.  If you have a tool that already does it then that’s great… try and think about the files you get which are not so easy to work with!  How could you use an AI solution like ChatGPT to help you tackle the problem in ways you probably never even considered before!!

And of course if you want to try this yourself using Powershell, here’s a zip of the .strings file I used and also the two scripts “I” created.

Helping the Help!


Image created with DALL·E, an AI system by OpenAI - “Helping the Help in the style of Richard Estes.”

I really like this image created by DALL·E of a man… maybe a businessman… on a wall, putting down his newspaper and reaching down to offer help to the worker with a ladder.  Created with only this prompt – “Helping the Help in the style of Richard Estes.”  When we read about how ChatGPT is “only” an advanced autosuggest we really need to think about how it must have some understanding of what was previously said to be able to predict the suggestion.  DALL·E really demonstrates this well because it had to have enough of an understanding of the concept of help in terms of not only helping, but also the use of the word help to mean someone who could be employed to help (in this case maybe a caretaker or janitor)… and then think about how this could be represented as an image, and in the style of a photorealist painter I mentioned by name.  Then do all that in a matter of seconds.  Quite astonishing really.

But I’m not here to write about this kind of help!  A couple of years ago I qualified as a technical writer as part of a Master’s degree I undertook, so in theory I ought to be capable of creating help documentation that comes up to a recognised standard in terms of how technical communication is carried out.  In fact tekom says this: “Technical writers make modern technology comprehensible.”  When I think about help documentation that simple definition seems to say it all.  However, technical documentation for online help can sometimes be less than helpful for a variety of reasons… for example:

  • Complexity of the subject
  • Lack of clarity
  • Incompleteness
  • Outdated information
  • Poor organization
  • Jargon and technical terms
  • Lack of examples or use cases
  • Insufficient visual aids
  • Inadequate focus on the user
  • Limited resources and time constraints

There can be many reasons for each of these problems, and I’m not here to judge or be critical of any particular documentation, but I am interested to see how it could be improved when you find something that isn’t as clear as you need.  One way, and this is something we can all do, is to use AI like ChatGPT, to improve the material we read so it helps us with a better understanding of what we’re trying to learn.

I started to look at this after trying to use the online help to provide a user with some information to help explain about some of the options in the XLIFF filetype.  In particular, these options in Trados Studio:

Screenshot showing the "Segment status mapping" options for the XLIFF filetype in Trados Studio

Clicking on the help takes you, at the time of writing, to this location.  This page contains some information about all the settings for the XLIFF filetype, but scrolling down a little I can find this part which is relevant to the “Segment status mapping” node in Studio that I’m interested in:

Now, I’m not going to critique the information in here, so let’s just say it’s not optimal for my understanding of how to use the options in support of this filetype.  Let’s start by just seeing what ChatGPT knows about this, so I copy this section into ChatGPT and ask it to elaborate on the provided text:

Screenshot of the aforementioned question being asked in ChatGPT.

The response wasn’t bad… at least it was already somewhat more informative:

The help documentation you provided explains the XLIFF filetype settings in Trados Studio. XLIFF, or XML Localization Interchange File Format, is a standardized file format used to manage the translation process. In Trados Studio, the Segment status mapping page allows you to control the behavior of segments when opening and saving XLIFF files. Here’s a more detailed explanation of each setting:

  1. File opening: This setting establishes a correspondence between the XLIFF status and the Studio status columns based on the values you choose for each column. When opening a file, if the XLIFF status value matches the Studio status value set for that segment, the target content will be displayed as locked. Locked segments cannot be edited, ensuring that the translation remains unchanged.
  2. File saving: Similar to the File opening setting, this option creates a correspondence between the XLIFF status and the Studio status columns based on the selected values for each column. When saving a file, if the Studio status value matches the XLIFF status value set for that segment, the target content will be displayed as locked content in the saved XLIFF file, preventing any changes.
  3. Segment locking: This setting enables you to lock segments with a specific status. By checking the status, you trigger a lock operation on those segments, making them uneditable and ensuring their translations remain unchanged.
  4. Map state-qualifier attribute values when reading file: This setting creates a correspondence between segment states in the Studio Editor when opening the XLIFF file. It helps maintain the segment states, such as “approved,” “rejected,” or “needs review,” so that translators can easily understand the status of each segment and act accordingly.
  5. Map state-qualifier attribute values when writing file: This option creates a correspondence between segment states in the target file when saving the XLIFF file. It ensures that the segment states are accurately reflected in the saved XLIFF file, facilitating a smooth translation process across different platforms.

By configuring these settings in Trados Studio, you can manage the translation workflow more efficiently, maintain segment states, and protect specific translations from unintentional changes.

There are some parts that might be a bit misleading, but what I’m really missing is an understanding of how these settings influence the behaviour of a file in Trados Studio.  I’m pretty clear about the “Segment status mapping” and the “Segment locking”, as this improved explanation, coupled with me clicking on the dropdown in Trados Studio for these options, makes them fairly obvious… although the help documentation does have room for improvement there too.  But I’m not at all clear on how to work with the “State-qualifiers”… in fact I may not even be sure what they are!  So my first task is to find this out, and to do that the beginning of the page I get to in the online help tells me that these settings are for working with files compliant with the OASIS XML Localisation Interchange File Format (XLIFF) standard version 1.1 or 1.2.  So I google that and find this document:

http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html

I search that page for “state-qualifier” and find a neat section related to this topic.  Now I know this:

State-qualifier – Describes the state of a particular translation in a <target> or <bin-target> element.

There is also a very neat table listing all the possible values with their descriptions.  Very helpful… but if I had a test file it would be even better, as I could then see the effect of each value when using the “State-qualifiers” options in Trados Studio.  Maybe ChatGPT can help with this?  So I ask this question and paste the entire table from the specification into the prompt:

I asked it to create a sample file for me with 14 translation units, as there are 14 values in the table.  In seconds I’m obliged with this:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="fr" datatype="plaintext" original="file.ext">
    <header>
      <tool tool-id="chatgpt" tool-name="ChatGPT" tool-version="1.0" />
    </header>
    <body>
      <trans-unit id="1" xml:space="preserve">
        <source>Hello, world!</source>
        <target state-qualifier="exact-match">Bonjour, le monde !</target>
      </trans-unit>
      <trans-unit id="2" xml:space="preserve">
        <source>Goodbye, world!</source>
        <target state-qualifier="fuzzy-match">Au revoir, le monde !</target>
      </trans-unit>
      <trans-unit id="3" xml:space="preserve">
        <source>Thank you!</source>
        <target state-qualifier="id-match">Merci !</target>
      </trans-unit>
      <trans-unit id="4" xml:space="preserve">
        <source>Please</source>
        <target state-qualifier="leveraged-glossary">S'il vous plaît</target>
      </trans-unit>
      <trans-unit id="5" xml:space="preserve">
        <source>Good morning</source>
        <target state-qualifier="leveraged-inherited">Bonjour</target>
      </trans-unit>
      <trans-unit id="6" xml:space="preserve">
        <source>See you later</source>
        <target state-qualifier="leveraged-mt">À plus tard</target>
      </trans-unit>
      <trans-unit id="7" xml:space="preserve">
        <source>How are you?</source>
        <target state-qualifier="leveraged-repository">Comment ça va ?</target>
      </trans-unit>
      <trans-unit id="8" xml:space="preserve">
        <source>What's your name?</source>
        <target state-qualifier="leveraged-tm">Comment tu t'appelles ?</target>
      </trans-unit>
      <trans-unit id="9" xml:space="preserve">
        <source>Good night</source>
        <target state-qualifier="mt-suggestion">Bonne nuit</target>
      </trans-unit>
      <trans-unit id="10" xml:space="preserve">
        <source>I like apple</source>
        <target state-qualifier="rejected-grammar">J'aime pomme</target>
      </trans-unit>
      <trans-unit id="11" xml:space="preserve">
        <source>Where is the library?</source>
        <target state-qualifier="rejected-inaccurate">Où est la piscine ?</target>
      </trans-unit>
      <trans-unit id="12" xml:space="preserve">
        <source>It's a beautiful day</source>
        <target state-qualifier="rejected-length">C'est un beau</target>
      </trans-unit>
      <trans-unit id="13" xml:space="preserve">
        <source>Have a great weekend!</source>
        <target state-qualifier="rejected-spelling">Passez un excellent weekand !</target>
      </trans-unit>
      <trans-unit id="14" xml:space="preserve">
        <source>What time is it?</source>
        <target state-qualifier="tm-suggestion">Quelle heure est-il ?</target>
      </trans-unit>
    </body>
  </file>
</xliff>

I put the whole file in here as I think it’s really helpful for anyone who would like to test it… and it only took me a few seconds to create anyway!  I can now use this XLIFF and test in Studio to see what happens!

Screenshot showing the segment statuses allocated to each of the 14 translation units in Trados Studio.

This is much better and now it’s become quite clear for me.  I even like that the sample file created by ChatGPT used translations relevant to the reason for the state-qualifier value!  Rejected grammar, accuracy, length and spelling… pretty smart, especially as I didn’t even ask for it to be that clever!

But then I think… what happens if I also map the segment statuses?  Which takes precedence… is there even a precedence?  So first I do the same operation and just look at the mappings.  I can find a similar table in the specification here:

http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#state

I repeat the process I used before to have ChatGPT create a sample file for me:

Screenshot showing a prompt in ChatGPT to ask it to create an XLIFF file using all the State values in the XLIFF specification.

The result is a slightly smaller file this time, as there were only 10 different states.  It actually did it wrong the first time, placing the state attributes in the trans-unit element, which is incorrect… I know this because the specification says this:

State – The status of a particular translation in a <target> or <bin-target> element.
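
In other words, something like this (my own reconstruction of the mistake… I didn’t keep ChatGPT’s wrong file):

<!-- Incorrect: the state attribute doesn't belong on the trans-unit element -->
<trans-unit id="1" state="final">
  <source>Welcome to our website!</source>
  <target>Bienvenue sur notre site !</target>
</trans-unit>

<!-- Correct: the state attribute belongs on the target element -->
<trans-unit id="1">
  <source>Welcome to our website!</source>
  <target state="final">Bienvenue sur notre site !</target>
</trans-unit>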

So I just asked ChatGPT to do it again and make sure the state attribute was in the target element… which it duly did.  I mention this because this is still a tool that can help even if you do have to make some corrections… just don’t blindly accept everything it’s telling you, as you may not get the results you expected.  I also asked ChatGPT why it made this mistake but it offered no explanation, only an apology, and then created the corrected file:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="fr" datatype="plaintext" original="example.txt">
    <header></header>
    <body>
      <trans-unit id="1">
        <source>Welcome to our website!</source>
        <target state="final">Bienvenue sur notre site !</target>
      </trans-unit>
      <trans-unit id="2">
        <source>20% discount on selected products</source>
        <target state="needs-adaptation">20 % de réduction sur les produits sélectionnés</target>
      </trans-unit>
      <trans-unit id="3">
        <source>Our store is located at 123 Main St.</source>
        <target state="needs-l10n">Notre magasin est situé au 123 rue Principale.</target>
      </trans-unit>
      <trans-unit id="4">
        <source>Call us at 123-456-7890.</source>
        <target state="needs-review-adaptation">Appelez-nous au 123-456-7890.</target>
      </trans-unit>
      <trans-unit id="5">
        <source>We ship to the USA and Canada.</source>
        <target state="needs-review-l10n">Nous livrons aux États-Unis et au Canada.</target>
      </trans-unit>
      <trans-unit id="6">
        <source>Sign up for our newsletter.</source>
        <target state="needs-review-translation">Inscrivez-vous à notre lettre d'information.</target>
      </trans-unit>
      <trans-unit id="7">
        <source>Visit our blog for the latest news.</source>
        <target state="needs-translation"></target>
      </trans-unit>
      <trans-unit id="8">
        <source>Check out our new collection.</source>
        <target state="new"></target>
      </trans-unit>
      <trans-unit id="9">
        <source>Follow us on social media.</source>
        <target state="signed-off">Suivez-nous sur les réseaux sociaux.</target>
      </trans-unit>
      <trans-unit id="10">
        <source>Contact our customer support.</source>
        <target state="translated">Contactez notre support client.</target>
      </trans-unit>
    </body>
  </file>
</xliff>

But it was still similarly simple and ChatGPT created a useful file in a few seconds that I can also test.  I don’t change any of the defaults in Trados Studio (but I do remember to uncheck the boxes for the “state-qualifiers”) since this filetype is based on compliant XLIFF 1.2 and this surely is one!  The result being:

Screenshot showing the segment states and how they are mapped in Trados Studio using the defaults.

This is also very useful because even though I had a reasonable understanding of what to expect, it’s much better to be able to visualize it.  Now that I’ve seen both I can take a look at which takes precedence.  To do this I think I need some sort of baseline, so I can just add the state="translated" attribute to each of the 14 translation units in my first example and see how this changes things.  I could do this with regex… but I can just as easily copy the file ChatGPT created for me into the prompt and ask it to add the state="translated" for me.  Here’s the file just in case you’d like to test it yourselves:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="fr" datatype="plaintext" original="file.ext">
    <header>
      <tool tool-id="chatgpt" tool-name="ChatGPT" tool-version="1.0" />
    </header>
    <body>
      <trans-unit id="1" xml:space="preserve">
        <source>Hello, world!</source>
        <target state="translated" state-qualifier="exact-match">Bonjour, le monde !</target>
      </trans-unit>
      <trans-unit id="2" xml:space="preserve">
        <source>Goodbye, world!</source>
        <target state="translated" state-qualifier="fuzzy-match">Au revoir, le monde !</target>
      </trans-unit>
      <trans-unit id="3" xml:space="preserve">
        <source>Thank you!</source>
        <target state="translated" state-qualifier="id-match">Merci !</target>
      </trans-unit>
      <trans-unit id="4" xml:space="preserve">
        <source>Please</source>
        <target state="translated" state-qualifier="leveraged-glossary">S'il vous plaît</target>
      </trans-unit>
      <trans-unit id="5" xml:space="preserve">
        <source>Good morning</source>
        <target state="translated" state-qualifier="leveraged-inherited">Bonjour</target>
      </trans-unit>
      <trans-unit id="6" xml:space="preserve">
        <source>See you later</source>
        <target state="translated" state-qualifier="leveraged-mt">À plus tard</target>
      </trans-unit>
      <trans-unit id="7" xml:space="preserve">
        <source>How are you?</source>
        <target state="translated" state-qualifier="leveraged-repository">Comment ça va ?</target>
      </trans-unit>
      <trans-unit id="8" xml:space="preserve">
        <source>What's your name?</source>
        <target state="translated" state-qualifier="leveraged-tm">Comment tu t'appelles ?</target>
      </trans-unit>
      <trans-unit id="9" xml:space="preserve">
        <source>Good night</source>
        <target state="translated" state-qualifier="mt-suggestion">Bonne nuit</target>
      </trans-unit>
      <trans-unit id="10" xml:space="preserve">
        <source>I like apple</source>
        <target state="translated" state-qualifier="rejected-grammar">J'aime pomme</target>
      </trans-unit>
      <trans-unit id="11" xml:space="preserve">
        <source>Where is the library?</source>
        <target state="translated" state-qualifier="rejected-inaccurate">Où est la piscine ?</target>
      </trans-unit>
      <trans-unit id="12" xml:space="preserve">
        <source>It's a beautiful day</source>
        <target state="translated" state-qualifier="rejected-length">C'est un beau</target>
      </trans-unit>
      <trans-unit id="13" xml:space="preserve">
        <source>Have a great weekend!</source>
        <target state="translated" state-qualifier="rejected-spelling">Passez un excellent weekand !</target>
      </trans-unit>
      <trans-unit id="14" xml:space="preserve">
        <source>What time is it?</source>
        <target state="translated" state-qualifier="tm-suggestion">Quelle heure est-il ?</target>
      </trans-unit>
    </body>
  </file>
</xliff>
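
Incidentally, if you’d rather create this combined file from the first sample yourself with the regex approach I mentioned, a PowerShell one-liner would do it (state-qualifiers.xlf is just my name for the saved sample… use whatever you called it):

# Insert state="translated" in front of every state-qualifier attribute
(Get-Content 'state-qualifiers.xlf' -Raw) -replace '<target state-qualifier=', '<target state="translated" state-qualifier=' | Set-Content 'state-qualifiers.xlf' -Encoding UTF8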

When I open this file in Trados Studio, making sure I have again checked the options to map the “state-qualifiers”, I see this:

Screenshot showing the use of the state and also the state-qualifier attributes in an XLIFF when opening in Trados Studio.

Comparing the two, it seems clear that the “state-qualifiers” take precedence where applicable.  Segments #10, #11, #12 and #13 have all retained the “Translation Rejected” status, which makes sense since you shouldn’t be able to have a confirmed translation with rejected errors in it.  Another quick test, this time simply search and replacing the state="translated" attribute with state="needs-review-l10n", and this time it’s clear that the “state” takes precedence:

Screenshot showing the statuses of the segments in Studio when the state attribute is "needs-review-l10n"

So then I went back to ChatGPT and sought some clarification by asking it “Can you explain which would take precedence, the “state” attribute or the “state-qualifier” when both are used in a target element?”.  This may seem obvious to anyone familiar with XLIFF, but given we have these controls in Trados Studio, and I am clearly allowed to do something odd by creating a file containing possibly conflicting information between these attributes, I think an explanation would be useful.  The answer was this:

When both “state” and “state-qualifier” attributes are present in the target element, it’s important to understand the difference between them:

  • The “state” attribute defines the general translation status of the target element, such as “translated,” “needs-translation,” or “final.”
  • The “state-qualifier” attribute provides additional context or information about the state, such as “needs-review-l10n,” “needs-review-adaptation,” or “signed-off.”

In case both attributes are used in a target element, they complement each other and do not conflict. There is no precedence of one over the other. Instead, they provide more granular information about the translation status.

For example:

<target state="needs-review-translation" state-qualifier="needs-review-l10n">…</target>

In this example, the “state” attribute indicates that the translation needs review, while the “state-qualifier” attribute specifies that it needs review specifically for localization aspects. Both attributes work together to describe the translation status more accurately.

So, the conclusion I can draw from this is that Trados Studio doesn’t expect to receive an XLIFF where these attributes conflict, but when it does it will try to make a sensible decision and ensure that the segment status represents the most plausible status in each case.  I could test each one for my own benefit if I were documenting all of this to create an improved version of the online help, but this is probably enough information for me to understand how these options work and why.

But what can I conclude from this exercise as a whole?

Conclusion

For the User

Using a tool like ChatGPT to help with understanding and working with this particular online help page has proven to be beneficial in several ways:

  1. Quick and informative responses: ChatGPT can provide instant responses to questions or concerns about the topic, helping you grasp concepts faster and more efficiently.
  2. Custom examples: ChatGPT can generate tailored examples based on your specific requirements, which can aid in better understanding and visualization of how different attributes and settings work in practice.
  3. Iterative learning: By engaging in a back-and-forth conversation with ChatGPT, you can refine your understanding of the subject and address any misconceptions or confusion along the way.
  4. Supplementing existing knowledge: ChatGPT can complement your existing knowledge and serve as a convenient source of information for clarification or deeper understanding.

However, it’s important to keep in mind some limitations and considerations while using a tool like ChatGPT:

  1. Possibility of errors: ChatGPT, while advanced, is not infallible and can sometimes provide incorrect or incomplete information. It’s essential to cross-verify the information received from the AI with reliable sources or your own understanding.
  2. Need for guidance: ChatGPT may require specific instructions or prompts to produce the desired output. You may need to rephrase your questions or provide additional context to get the most relevant and accurate response.

In conclusion, using a tool like ChatGPT to assist with enhancing your understanding of technical topics like this can be highly beneficial. It can serve as a valuable resource for quick information, custom examples, and iterative learning. However, it’s essential to be aware of its limitations and verify the information received to ensure accuracy and reliability.

For the Technical Writer

ChatGPT can probably be useful to a Technical Writer too… especially one like me who “in theory ought to be capable” of writing good help documentation:

  1. Content generation: ChatGPT can help generate content quickly, providing initial drafts, ideas, or examples that the writer can then refine and edit.
  2. Editing and proofreading: ChatGPT can provide suggestions for improving clarity, grammar, or style in existing documentation.
  3. Explaining complex concepts: ChatGPT can help simplify and explain complex concepts in a more user-friendly manner.
  4. Answering questions: ChatGPT can help clarify doubts or provide additional information about the subject matter.
  5. Generating examples: ChatGPT can generate sample code, files, or illustrations to better explain a concept or procedure.

However, it’s important to note that ChatGPT is an AI tool, and its output should be carefully reviewed and edited by the Technical Writer to ensure accuracy, relevance, and consistency.  The AI is not perfect, and it may sometimes generate incorrect or irrelevant information.  The Technical Writer should always verify the information provided by the AI and apply their own expertise and judgment to create high-quality documentation.

But all in all a very useful tool in yet another way!

XML… unravelling chaos


Image of a ball of wool unravelling around the letters XML

Whilst I would definitely not claim to be an expert, writing this blog has allowed me to learn a reasonable amount about XML over the years.  Most of the articles I’ve written have been about explaining how to manage the many amazing features in the filetypes that are supported by Trados Studio… and of course how to deal with the many changes over the years as the filetypes have become more and more sophisticated, catering for the demands of our customers and the changes in the technologies applied to XML in general.  The result of these changes has led to some… let’s say… less than user-friendly interfaces and features, and you’d certainly be forgiven if you thought things were becoming a little chaotic!

Another long article warning (XML does this to me!), but for example, when 2019 SR2 was released some years ago we could see this:

Four different types of XML to choose from when you came to create your filetype!  Fortunately today we only have one:

It turns out the chaos was just the result of having to provide many enhancements over the years while still supporting legacy workflows, because we had so many customers still using older technology… and at the same time we were also building for the future, supporting both desktop and working in the cloud:

Screenshot showing the same new filetype options in the cloud

Everything the same and only one XML to choose from!  It’s all starting to make sense and it supports users who are choosing to work more in the cloud by ensuring they have the same experience whether working with Trados Studio or any of the cloud products such as Trados Enterprise and Trados Team.

But there are still a few things I’ve not discussed in my articles around the use of XML so I decided to put my recent forays into the world of AI to one side and see if I can unravel a little more of the XML chaos that’s still hanging around.

Parser Rules Type

When you create your new XML filetype the second thing you’ll do, after providing the standard Filetype Information, is create your parser rules.  But before you get there you have to make a decision about the Parser Rule Type.  It’s the same question in Trados Studio as it is in the cloud:

Screenshot showing the Trados Studio and Cloud UI for this feature together

I won’t keep doing this… but it’s definitely worth noting that most of all this cool stuff we can do in Trados Studio is there in the cloud too.  This takes a lot of work and it adds a lot of complexity that I bet most users won’t really appreciate, and the engineering teams definitely deserve a lot of credit for trying to keep the right balance over the years in supporting the past while forging ahead into the future.

But let’s talk about these two types and why we have them… and more importantly what it means when you choose one over the other.

XPath rules

If you choose to go with XPath rules you are giving yourself complete flexibility to do two things:

  1. create XPath rules to manage the data in the XML in pretty much any way you wish.  Here the rules for XPath can be as complex as you like allowing you to manage some very specific requirements such as the ones I covered in this article, and
  2. work with namespaces to further customise your XML filetype through being able to uniquely identify element and attribute names, especially when working with multiple XML files in a project where only the namespace provides the ability to avoid any ambiguity between the files.

XML Settings Import

Here you have two choices:

  1. Create an XML filetype based on the default settings, or
  2. Define settings based on XML, sdlftsettings or XSD rule

Looks like this:

Screenshot showing the XML settings page in Trados Studio

If you choose to create an XML filetype based on the default settings then you’ll be taken through the filetype wizard and will have to create all the information needed to handle your XML file yourself.  If you decide to define settings based on XML, sdlftsettings or XSD rules then some, or all, of the work in defining the rules is done for you.

XML: this is going to import the elements that are found in the XML file and make them all “Translatable (except in protected content)”.  They will appear as XPath rules.  It will not import the attribute names or make them available to you at any stage, so you will need to write your own XPath to select them.
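
For example, if your files had a product element with a desc attribute holding translatable text (an invented example… just to show the shape of the rule you’d write), you would add something like this yourself:

//product/@desc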

sdlftsettings: unless you need to make changes this is probably going to contain all the rules you need because it’s essentially the settings for a Trados Studio custom xml filetype.  When you import it the new filetype will be created and all the rules and filetype settings will be there already.

XSD: this is going to import all the rules provided, which could support multiple XML filetypes based on the same schema.  So it’s useful for ensuring you have all the elements possible.  However, there are two gotchas to be aware of.  First, if you have multiple root elements, only the first one will be imported into your new filetype, so you’ll have to add any others manually.  Second, the attributes are not going to be available to you, so you’ll have to add any rules for these manually by inspecting the XSD or the actual XML files.

Once imported you’ll see the rules similar to this:

Screenshot showing the parser rules after importing an XML file or XSD under XPath rules.

Note that the namespaces are all there, and under the namespace node you’ll see the namespaces and their prefix similar to this:

Screenshot showing the namespace settings for the filetype.

So with the XPath rules you really do have complete flexibility to manage the extraction of translatable text for all sorts of complex criteria.  But it comes at the price of having to define rules yourself by inspecting the definition files for the attribute names, as they are not made available for you.  A small price to pay I think.

Element Rules

These are essentially there to make things easier for the translator who doesn’t need to do anything complex that might require the use of XPath.  They handle the namespaces for you, and only present you with the names of the elements and the attributes so you can easily select them.  Perfect for simple scenarios where there is no complex logic required.

XML Settings Import

Here you also have two choices, but the second one is different:

  1. Create an XML filetype based on the default settings, or
  2. Define settings based on XML, XSD or DTD rule

Looks like this:

Screenshot showing the XML settings page in Trados Studio for Element Rules

If you decide to define settings based on XML, XSD, or DTD rules then some, or all, of the work in defining the rules is done for you.  I’ll just mention DTD.

DTD: this is going to import the elements that are found in the DTD file and make them all “Translatable (except in protected content)”.  They will appear as Element names.  It will also import the attribute names and make them available to you during the creation process only.  For example, if I edit ChaosTools I can also see the three attributes that are associated with this element:

Screenshot showing the ability to select an element and then the corresponding attribute values for that element.

After the filetype has been created these attribute values are no longer available for selection and you would have to type them in manually.  But I think this is probably a bug, or a simple omission, as this always used to be possible with the legacy XML filetypes.

Interestingly, despite DTD being largely superseded by XSD, I have found in practice that when it comes to using an XSD versus a DTD to create the filetype settings, the DTD actually makes it far easier despite not supporting namespaces… especially when you start folding multiple XML files with different roots and namespaces into one DTD/XSD.

Once imported you’ll see the rules similar to this:

Screenshot showing the parser rules after import for Element Rules.

Note you don’t see the namespaces this time, and the simpler use of columns for the Element name and the Attribute name are displayed instead.  If you look under the namespace node, which is still available, you’ll see this:

Screenshot showing the namespace options... none!

Why is it still available… well, if you took the simpler approach to creating your XML filetype, and later realised you needed a little more flexibility because your client introduced some rules around what you needed to extract for translation that could only be handled with XPath, then you do still have this option to convert your rules to XPath!

Screenshot showing how to convert Element Rules to XPath rules.

This will convert all your rules to XPath like this:

Screenshot showing the converted Element Rules to XPath expressions.

All good except now you have local-name, which is useful when dealing with XML documents that use namespaces and you want to select elements based on their local name, regardless of the namespace prefix:

//*[local-name() = 'ChaosTools']

as opposed to the use of the namespace:

//tools:ChaosTools

So the other important point to note is that if you wish to use the namespaces to distinguish between elements with the same name that have different namespaces, then you will also have to add them in manually.  This is because the “Convert to XPath” feature also makes the namespace node available to you, but as Element Rules don’t pay attention to namespaces the detail isn’t there:

Screenshot showing the empty namespace node after converting to XPath.

So you would have to define the Prefix and the Uri yourself… or recreate the filetype using XPath rules.  It really depends on your level of comfort, although when you play around with these options you’ll soon learn how to do it!
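
Just to make that concrete, the two values are nothing more than the prefix you use in your rules and the namespace it maps to… for example (made-up values, purely to show the shape):

Prefix: tools
Uri: http://www.example.com/chaostools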

All in all you are going to have to do some work when creating your filetypes, because only the sdlftsettings import is going to do everything for you.  Everything else will require you to make sure that the rules and settings have been appropriately set up for the XML files you are translating.

Moving from XML1 to XML2

I wanted to add this in here because recently I have seen many users migrating from 2019 or 2021 where they had been using custom XML filetypes created with the older XML1 filetype.  Now that 2022 has simplified the versions of XML available, the XML1 filetype may not work correctly anymore.  For example, if you created custom stylesheets for previewing your XML they may not work at all anymore.

The solution to this problem is to export the sdlftsettings file from your XML1 filetype using Trados Studio 2021 or earlier (so this means you have to have had the foresight to back up your custom filetypes to an sdlftsettings file in the first place) and then create a new XML2 filetype in Trados Studio 2022 by importing the sdlftsettings.

If however you didn’t have your crystal ball available before upgrading, then not to worry as the RWS technical support team have you covered as described in this KB article.

What about the legacy?

I need to ask this question because these had been around forever.  In the old filetypes, defining settings could be based on INI, ANL, XML, XSD, ITS or DTD rule files.  So what’s with this… why have we dropped some of them?

INI: these were used by Trados 2007 and earlier.  INI files played a role in configuring the handling of various file formats and filters within Trados and determined how different types of documents were processed and segmented for translation.

ANL: these were used by SDLX to define how the software should handle different file types during the translation process.

ITS: these were filetype settings based on the Internationalization Tag Set (ITS) defined by the W3C.

You may have never heard of any of these and this would be one of the reasons why they are no longer supported… they are old!  Trados 2007 has been out of support since 2013.  ITS was last updated in 2013 as well.  SDLX… to be honest I’m not sure when this was no longer supported, but as a product it hung around to work with SDL TMS for a while and then eventually the filetype support in SDL TMS was adapted to match Trados Studio so SDLX really became redundant.  Although I’m sure we have a few old sweats left who can correct me on that!  However, the point is they are old and it was no longer considered necessary to support these old formats.  So if you do have any then I’m afraid you’ll just have to create them again!!


Working with CSV’s…


Icon of a CSV file with a pencil, indicating a file used for entering or editing comma-separated values. The icon is designed to resemble a physical document with lines of text and the CSV extension in the upper right corner, encapsulated in a blue rounded square with a slight shadow effect.

CSV, or files with “comma separated values”, is a simple format that everyone should be able to handle.  Certainly you’d think so, except nothing is ever that straightforward, and if you’ve ever spent time trying to work with these files and having to deal with all the problems inherent to this format then you’ll know exactly what I’m talking about.  In our industry, localization, the number of problems can even increase because we also have to deal with converting the content from one language to another.  I also wondered if there is a standard for CSV, and the closest thing to one is more of a recommendation, called RFC 4180.  It doesn’t seem to have been updated since 2005 and doesn’t take account of many of the problems that can be caused by not managing these files carefully.
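
To be fair, the recommendation does at least pin down the basic quoting rules: fields containing commas, double quotes or line breaks must be wrapped in double quotes, and an embedded double quote is escaped by doubling it.  So a perfectly well-formed file can legitimately look like this (rows I invented for illustration):

TransactionID,Date,Amount,Description
1001,2023-05-01,1234.56,"Rent payment, May"
1002,2023-05-02,250.00,"He said ""thanks"" and left"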

So what sort of things are we talking about?  Here’s a few…

Character Encoding Issues

I have discussed this at length in the past, but CSV files can be saved in different character encodings, such as UTF-8 or ISO-8859-7 for example.  The only problem is that, with the exception of UTF-8 with a BOM (Byte Order Mark), when you send that file to someone else and don’t tell them what encoding you used, they have no way of knowing.  Characters can become corrupted and unreadable, and this can cause problems on import to their intended application after the translation is complete.  For example, if I open a CSV containing Greek characters in Notepad I see this, as it defaulted to an encoding called Win-1252:

A screenshot showing a table with corrupted text. The table appears to have four columns labeled TransactionID, Date, Amount, and Description. The rows contain dates in the format YYYY-MM-DD, Euro amounts, and descriptions with garbled text, which includes various special characters and diacritics in a pattern that suggests a text encoding issue.

Had I known the intention in this case was to use ISO-8859-7 Greek (ISO) for an old legacy system, I would have made sure to open the file with this encoding and return it the same way.
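If you need to do this outside a dedicated editor, it only takes a couple of lines of PowerShell.  Here’s a minimal sketch, assuming the client has confirmed ISO-8859-7 and using a hypothetical greek.csv:

# Read the file with an explicit legacy encoding (assumed: ISO-8859-7, confirmed by the client)
$greek = [System.Text.Encoding]::GetEncoding('iso-8859-7')
$text = [System.IO.File]::ReadAllText('.\greek.csv', $greek)

# ...inspect or process $text here...

# Write it back with the same encoding so the legacy system can still read it
[System.IO.File]::WriteAllText('.\greek.csv', $text, $greek)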

Some tools can take a guess at the encoding using heuristics or defaults, but this process is not foolproof.  It can lead to misinterpretation of the file’s content, especially if it contains characters outside the ASCII range.  For example, I have started using an application called Rons Data Edit for working with files like this (it used to be called Rons CSV Editor many years ago when I first used it, and I had almost forgotten I had it until it was updated recently), and this version of the application is so good (in my opinion) it’s now my default application for these sorts of files.  There are many reasons why I like it, and I’ll try to share a few as we go along.  The first would be that I can set a fallback encoding for files I’m working on, if they cannot be recognised as UTF-8 for example:

Screenshot of a computer interface with a CSV file named "greek.csv" open, showing a table with headers "TransactionID", "Ημερομηνία" (Date), "Ποσό" (Amount), "Περιγραφή" (Description), "Κωδικός Συναλλαγής" (Transaction Code), and "Λογαριασμός Προορισμού" (Destination Account), with sample data in Greek. A settings window overlay with "Data Edit Options" is open, showing "Text Defaults" section with "Encoding" set to "ISO-8859-7 - Greek (ISO)" and "Line Separator" set to "New Line". Red arrows point to the encoding and line separator settings.

You can also see the next reason I like this software: not only has the file opened up using ISO-8859-7 Greek (ISO), but it’s also nice and clear so I can easily read it.  You may say you can do this in Excel, and you can, but I can’t tell you how many times I’ve seen Excel mess up CSV files, and how many times I’ve seen people mess them up because they don’t know how to work properly with a CSV in Excel.  In Rons Data Edit it’s simple: you just open them!

Now some of you may be thinking, don’t talk crap Paul, since UTF-8 can be used to handle everything these days.  This is mostly true, but it’s also fair to say that several other encodings are still commonly used today, especially in legacy systems or in regions with specific language requirements.  For example, I think these are still relatively commonplace:

  • ISO-8859 Series:  There are several variations catering to different languages, such as ISO-8859-1 (Latin-1 for Western European languages), ISO-8859-2 (Latin-2 for Central and Eastern European languages), and others in this series.
  • Windows-1252:  Often used by default in older Windows systems, it’s similar to ISO-8859-1 but includes additional printable characters.
  • Shift JIS: Can still be used for Japanese text, especially on platforms where legacy support is needed.
  • GB2312 and GBK:  Encodings for Simplified Chinese characters, used in Mainland China.
  • BIG5:  An encoding for Traditional Chinese, used primarily in Taiwan and Hong Kong.
  • KOI8-R and KOI8-U:  Used for Russian and Ukrainian respectively, particularly in older systems.
  • EUC Family:  Including EUC-JP for Japanese, EUC-KR for Korean, and others for various Asian languages.

These encodings are often maintained for compatibility with older systems, for certain governmental regulations, or in user communities where these encodings remain standard.  However, it’s definitely true that the global computing community continues to adopt Unicode (and UTF-8 by extension) due to its ability to handle virtually all text in any language, and all the crazy emojis we see today, through a single encoding system.  So a decent editor to help with these files is worth investing in if you work in a profession that has to handle them from time to time.

The takeaway from here would be to always ask your client what encoding should be used if they don’t tell you, and then make sure you have the appropriate tooling to handle the files without breaking them.

Delimiter Confusion

The term “CSV” suggests that commas are used as field separators.  However, the delimiter can vary by region and use.  Here’s a few common ones you may have come across before:

  • TSV (Tab-Separated Values)
  • PSV (Pipe-Separated Values)
  • SCSV (Semicolon-Separated Values)
  • SSV (Space-Separated Values)
  • Custom Delimiter-Separated Values

This can lead to confusion when opening a file in software that expects a different delimiter.  If you’re using Excel you need to make sure you specify the delimiter before importing the data… if you use Rons Data Edit it recognises most separators out of the box, but you can also set up a data profile to handle anything you like:

Screenshot of a software interface for editing data profiles. The 'Edit Data Profile' pane is active, showing settings for a 'Semi-Colon Separated' profile. Fields include 'Name', 'Field Open' and 'Close' with quotation marks, 'Field Separate' with a semicolon, and 'First Row is Header' checked. Under 'File Details', 'Use as File format' and 'Associate File' are checked with 'File Extension(s)' set to csv. There's a navigation bar with options like 'Add', 'Delete', and 'Save File'.
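If you ever need to do the same outside an editor, PowerShell’s built-in Import-Csv accepts an arbitrary delimiter and respects quoted fields.  A minimal sketch, assuming a hypothetical semicolon-separated statement.csv:

# Import-Csv parses quoted fields correctly, so a well-formed SCSV round-trips safely
$rows = Import-Csv -Path '.\statement.csv' -Delimiter ';'
$rows | Format-Table -AutoSize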

We are not the only ones getting confused.  I’ve had a fruitless engagement on and off for years with Deutsche Bank, who don’t seem able to export a properly formatted CSV for import into an accounting application.  Every month I have to correct the file before I can import it, because they don’t wrap fields containing a semi-colon as part of the text in their SCSV (Semicolon-Separated Values) export file.  So for example they give me a file like this:

Booking date;Value date;Transaction Type;Beneficiary / Originator;Payment Details;Debit;Credit;Currency
08/08/2023;08/08/2023;Debit Card Payment;DM Fil.;DM FIL.2490 H:65349;-22.40;;EUR
08/08/2023;08/08/2023;SEPA-Direct Debit;Vollkorn Naturkost;Kundenkarte 1234;Rabattabzug erfolgt immer zum Monatsende;Danke - Ihr BioMarkt;-76.35;;EUR
08/09/2023;08/09/2023;Debit Card Payment;Edeka;EDEKA KOEHLER;-22.83;;EUR
08/09/2023;08/09/2023;Debit Card Payment;Alnatura;ALNATURA PRODUKTIONS;-37.71;;EUR

When I open it in my editor it looks something like this:

Screenshot of a CSV file named "misalign.csv" displayed in a table format. The headers include "Booking date", "Value date", "Transaction Type", "Beneficiary / Originator", "Payment Details", "Debit", "Credit", and "Currency". Row 2 is highlighted, showing a misalignment where "Debit" column entry "-76.35 EUR" shifts into the "Currency" column.

Note that Row 2 is completely misaligned.  This is easily spotted in this editor, but in an ordinary text editor you can see how hard it is to spot that it should have been written like this:

Booking date;Value date;Transaction Type;Beneficiary / Originator;Payment Details;Debit;Credit;Currency
08/08/2023;08/08/2023;Debit Card Payment;DM Fil.;DM FIL.2490 H:65349;-22.40;;EUR
08/08/2023;08/08/2023;SEPA-Direct Debit;Vollkorn Naturkost;"Kundenkarte 1234;Rabattabzug erfolgt immer zum Monatsende;Danke - Ihr BioMarkt";-76.35;;EUR
08/09/2023;08/09/2023;Debit Card Payment;Edeka;EDEKA KOEHLER;-22.83;;EUR
08/09/2023;08/09/2023;Debit Card Payment;Alnatura;ALNATURA PRODUKTIONS;-37.71;;EUR

Very hard to spot… and another point worth noting when quotes are used to wrap a field: make sure they are straight quotes, because otherwise they just become text within the field.
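Incidentally, this kind of broken row is easy to triage with a few lines of PowerShell.  A minimal sketch, assuming a semicolon-separated file saved as misalign.csv; note that a naive split ignores quoting, so correctly quoted rows containing semicolons will also be flagged, which still makes this a useful shortlist to review:

# Flag any row whose raw field count differs from the header row
$lines = Get-Content '.\misalign.csv'
$expected = ($lines[0] -split ';').Count   # field count in the header row
for ($i = 1; $i -lt $lines.Count; $i++) {
    $count = ($lines[$i] -split ';').Count
    if ($count -ne $expected) {
        Write-Warning "Row $($i + 1) has $count fields, expected $expected"
    }
}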

Lack of Standardization for Encoding Complex Types

While basic data types (strings, numbers) are straightforward to encode in CSV, more complex types (dates, boolean values) can be represented in multiple ways, leading to inconsistencies.  For example, dates might be formatted differently depending on the locale.  Why is this a problem?  Perhaps take a look at the CSV I used as an example from my bank.  The dates are 08/08/2023 and 08/09/2023.  In this format I have no idea whether they are dd/mm/yyyy or mm/dd/yyyy.  In my bank export it’s obviously not a problem as I know by now, and once the dates get past the 12th day of the month it all becomes clear, but for a long file provided for translation, with thousands of lines and dates, it could be a problem if you need to make sure a particular format is used in the translation and it’s not the same as the source.

There’s not a lot you can really do if your client doesn’t tell you what format the dates are in and every date happens to fall before the 12th of the month!  But Rons Data Edit has a few neat tricks that you probably won’t find in Excel and that could be very helpful.  For example, if the source locale is en(US) I can pretty much expect the dates to be mm/dd/yyyy.  I can check this, and I might find incorrect dates in the source before I start, which can help with the validation of the target later:

The 'Filter' panel in Rons Data Edit showing a filter condition where 'Booking date' does not match the conversion region, which is set to English (United States).

  1. create a filter to use the column with the date in it and set it to either find what you want, or what you don’t want.  In this case I set it to find the dates that don’t match the date in my options.
  2. the option I used was the “Conversion Region” which is English (United States).  I’m assuming this is being drawn from my Windows regional settings, since there are no controls over the settings for the locale in Rons Data Edit itself, and here I use mm/dd/yyyy for the en(US) date.
  3. after applying the filter I see the one date in this small example that doesn’t match the required format.
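If you’d rather script that check, the same idea is a few lines of PowerShell.  A minimal sketch, assuming the semicolon-separated bank file from earlier (here as a hypothetical statement.csv) and an expected en-US mm/dd/yyyy format in the “Booking date” column:

# Validate every date against the exact expected pattern for the source locale
$culture = [System.Globalization.CultureInfo]::GetCultureInfo('en-US')
$rows = Import-Csv '.\statement.csv' -Delimiter ';'
foreach ($row in $rows) {
    $parsed = [datetime]::MinValue
    # TryParseExact only succeeds if the value matches the pattern exactly
    $ok = [datetime]::TryParseExact($row.'Booking date', 'MM/dd/yyyy', $culture, [System.Globalization.DateTimeStyles]::None, [ref]$parsed)
    if (-not $ok) { Write-Warning "Unexpected date format: $($row.'Booking date')" }
}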

Pretty neat, and I’m mentioning it because I haven’t come across a feature like that in a text editor before.  But the other great capability is that it also supports regular expressions, which you don’t get with Excel.  This is incredibly useful for a couple of reasons:

  1. I can carry out some basic data validation on the source, and the checks can be saved as a construct I can reuse.  Getting the source right should make the localization process easier with fewer errors.
  2. I can search & replace content in the source to ensure I maximise the leverage from my TM by improving placeable recognition, and again reduce errors in the target translation.

The search & replace should be familiar to us all as it’s fairly similar to Trados Studio, with the exception that it shows the changes in real time and highlights them in the document… something I wish Trados Studio could do too 😉

Search & Replace functionality using regular expressions in Rons Data Edit, showing the result in real time and highlighted in a different colour.
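The same kind of regex clean-up can of course be scripted for batch work.  A minimal sketch, assuming (hypothetically) that the source uses dotted dates like 08.08.2023 and we want slashes to improve placeable recognition; note the single quotes around the replacement so PowerShell doesn’t try to expand $1 itself, and remember to write the file back in its original encoding (UTF-8 is assumed here):

# Read the whole file as one string, normalise the date separators, write it back
$text = Get-Content '.\statement.csv' -Raw
$text = $text -replace '(\d{2})\.(\d{2})\.(\d{4})', '$1/$2/$3'
Set-Content '.\statement.csv' -Value $text -NoNewline -Encoding UTF8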

Newline Characters Within Fields

Handling newline characters within fields can be problematic.  Different systems may expect different newline conventions (LF vs. CRLF for example), and improperly handled newlines within fields can be interpreted as new records, leading to data corruption.  In all cases we want to be able to manage the files so we send them back to the client in the same way they were received.  Here’s a simple example in my normal text editor:

Multiline CSV opening in a text editor showing the mixed Line Feeds and Carriage Returns within the file.

Mixing LF with CRLF in one file should probably never happen, but it can, simply because of the way these types of files get handled.  Typically Windows uses a Carriage Return followed by a Line Feed (CRLF, \r\n) as the EOL (End of Line) marker.  Unix/Linux uses a Line Feed (LF, \n), and on a Mac the older versions (Mac OS 9 and earlier) used a Carriage Return (CR, \r), whilst Mac OS X and later versions adopted the Unix standard of LF (\n).
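If you want to check what you’ve been sent before you touch it, a couple of lines of PowerShell will do it.  A minimal sketch, reading the raw text of a hypothetical multiline.csv so nothing normalises the line endings behind your back:

# Count each EOL convention; ReadAllText preserves the raw \r and \n characters
$raw = [System.IO.File]::ReadAllText('.\multiline.csv')
$crlf = [regex]::Matches($raw, "\r\n").Count
$lfOnly = [regex]::Matches($raw, "(?<!\r)\n").Count
$crOnly = [regex]::Matches($raw, "\r(?!\n)").Count
Write-Host "CRLF: $crlf, LF only: $lfOnly, CR only: $crOnly"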

When translating these files you should try to ensure that you don’t change these EOL markers, so that you avoid any unexpected behaviour when the files are returned to the application they originated from.  This of course begs the question: how could they change?  Well, in a number of ways, ranging from curiosity (opening and saving files intended for Windows on a Mac just to see what they look like) to manipulating them in tools not designed to maintain their integrity, like Excel.  Expanding on the use of Excel, if I open the file from my example by importing it (I won’t go into just plain opening it, which is another mistake people often make) I see a couple of problems.  The first is that the markers seem to have disappeared altogether:

The same CSV file opened in Excel, showing no breaks between the lines that should be on separate lines in each cell.

In practice, saving this file back to CSV doesn’t remove the EOL markers, but if you were to edit the file in Excel itself you could easily break them, since you can’t tell the EOL markers are there at all.  You might, for example, add spaces between the sentences and lose a marker.  What does happen when you resave the file as CSV is that Excel replaces these EOL markers with a pilcrow, like this:

The CSV resulting from being saved in Excel: the CRLF and LF EOL markers are all replaced with a pilcrow.

A pilcrow isn’t really an EOL marker; it’s more a function of text processing and document editing software than of operating systems themselves.  I’d also like to think that modern systems will most likely cope and make sense of the structure of the file even if you do send it back with these changes to the original.  But the wisest approach is always going to be to maintain the integrity of the source file you received and provide the translation in exactly the same way it was presented to you.  And for that you need a tool that won’t mess it up.  Notwithstanding that, when I open this file in Rons Data Edit it’s perfect 😉

Screenshot of Rons Data Edit showing the multiline CSV rendered clearly and without problems.

If you hadn’t realised it yet, I like this tool!

Embedded Content

Whilst there are more things that can go wrong with a CSV, I have probably covered the main translation-related problems already, with the exception of handling embedded content.  Embedded content is a topic that comes up again and again in the forums as people look for a way to handle the markup so it’s safely protected.  For example, take this simple example file:

ID,Name,Description,Category,Last Updated
201,LED Lamp,<p>Energy-efficient LED lamp with <strong>adjustable brightness</strong>. Perfect for desks and bedside tables.</p>,Home & Office,2024-01-15
202,Water Bottle,"<p>Durable, leak-proof <strong>water bottle</strong>. BPA-free plastic. <a href='https://example.com'>Buy now</a>.</p>",Sports & Outdoors,2024-01-10
203,Backpack,<p>Lightweight backpack with multiple compartments. Ideal for travel and <em>everyday use</em>.</p>,Travel & Luggage,2024-01-20

Many CSV files that contain embedded content can be difficult to read, with the combination of delimiters and markup creating quite a confusing environment for understanding the structure of the file itself.  If I open it in Rons Data Edit I can not only see a very clear structure for the fields, but in this case I note it also has a very neat HTML viewer built in:

Rons Data Edit showing embedded content displayed as rendered HTML so it's easy to read.

This isn’t going to help you handle the markup in Trados Studio, but it is surely a help when you come to proofread the file.  Now you can read the text without the tags being in the way, and you can double-check the correctness of the tags based on the HTML rendering of your work.

I’ve covered embedded content many times in my blog, but the closest filetype to this in terms of how to manage it is addressed in this article about handling taggy files in Excel.  It still uses the old regex-only approach, which is what you’ll also find in the CSV filetype, so that article should get you to where you need to be.
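To give a flavour of what that regex-only approach looks like, here’s a minimal sketch of the kind of tag-definition patterns that would cover the example file above.  Treat these as illustrative assumptions rather than a complete rule set, since real content usually needs more patterns (and attribute values may need to be extracted for translation too):

<[a-zA-Z]+(\s[^>]*)?>    matches opening tags such as <p> or <a href='…'> (tag placeholder)
</[a-zA-Z]+>             matches closing tags such as </strong> (tag placeholder)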

But what about Trados Studio?

I’ve spent a good while now talking about CSV files and Rons Data Edit, and I’ve really only scratched the surface of what this tool is capable of.  But even with the smaller use case I have for working with these files, I think it’s become an invaluable tool in my toolbox.  The developer is also really helpful and regularly updates the application, so if you’re looking for a better way to work with these files you couldn’t do much better, based on the experience I’ve had so far.

But what about Trados Studio?  Well, now that we know everything we probably need to know about CSV files, this bit is simple!  We’re not going to look at conversions to Excel to do this; we’ll look at a real example, a simple one, that I came across last week.  On the RWS Community ideas page for Trados Studio I saw a request from a couple of users to be able to translate the file and folder names for a project based on a nested folder.  My initial reaction to this was surprise, and I have to confess I’m still surprised, as this surely creates work, and room for problems, especially if the files themselves contain any references to the folders or other files within them.  It seems such a rare requirement to me, and one much better managed through a little engineering afterwards.  I also like to dabble in scripting these days, now that we have AI to help so much, so to do this I created two PowerShell scripts.

Script #1: Create a tab separated file

If you want to try it, copy the code and save it to a file with the extension .ps1; I called this one 01_extract.ps1 for example:

# Ask for the project root and write the list next to the script
$rootFolder = Read-Host "Enter the path of the root folder"
$outputFile = ".\filelist.txt"

function Get-FileFolderList($path) {
    # Walk the whole tree and emit one tab separated row per item
    Get-ChildItem -Path $path -Recurse | ForEach-Object {
        $originalName = $_.Name
        if ($_.PSIsContainer) {
            # Columns: type, full path, source name, target name (to be translated)
            "Folder`t$($_.FullName)`t$originalName`t$originalName"
        } else {
            "File`t$($_.FullName)`t$originalName`t$originalName"
        }
    }
}

Get-FileFolderList -path $rootFolder | Out-File $outputFile
Write-Host "The file list has been successfully created at $outputFile"

This script creates a tab separated (delimited) file with four columns:

  1. text denoting whether the data in the row refers to a file or a folder
  2. the full path, including the name, of the files and folders in the project
  3. the name of the file or folder – intended to be the source language
  4. also the name of the file or folder – intended to become the translation of column #3

Script #2: Rename all the files and folders in the project

The script looks like this:

# Prompt for the path to the translated text file
$translatedFilePath = Read-Host "Please enter the full path to the translated text file (filelist.txt)"
# Load the content from the translation file
$content = Get-Content -Path $translatedFilePath
# Sort items by their depth in descending order
$sortedContent = $content | Sort-Object {($_ -split "`t")[1].Split('\').Length} -Descending
foreach ($line in $sortedContent) {
    $parts = $line -split "`t"
    $type = $parts[0]
    $fullPath = $parts[1]
    $newName = $parts[3]
    if (Test-Path $fullPath) {
        # Rename-Item only needs the new leaf name, so the item stays in its parent folder
        try {
            Rename-Item -Path $fullPath -NewName $newName
            Write-Host "Successfully renamed $type at '$fullPath' to '$newName'."
        } catch {
            Write-Warning "Failed to rename $type at '$fullPath' to '$newName': $_"
        }
    } else {
        Write-Warning "Could not find $type at path: $fullPath"
    }
}

This script loads the tab separated file into memory and then sorts the rows in descending order based on the depth of the path (the number of subfolders indicated by the path).  Once it’s done that, it renames each item, deepest first so that children are renamed before their parent folders, using the translated content from the fourth column.

The way it works in terms of process would be this:

  1. run the first script
  2. you will be prompted for the path to the root folder for the project (the one containing all the files and folders)
  3. the tab separated file (filelist.txt) is created
  4. translate the filelist.txt so that the fourth column contains the translated names (you could add this back into the overall Trados Studio project as an additional file at the end)
  5. run the second script
  6. you will be prompted for the filelist.txt (now translated)
  7. the script reads from the file and replaces all the names with the translated versions

You can find a video of the whole process in the ideas post in the community, but I reproduced it below for convenience:

I didn’t mention the Trados Studio part so I’ll cover that here, but first let’s take a look at the filelist.txt in Rons Data Edit:

A screenshot of a CSV file opened in Rons Data Edit, with 4 columns and several rows visible. Column 00 contains file types like 'File' and 'Folder'. Column 01 has paths to various folders and files on a Windows system. Column 02 lists document titles, such as 'Release Planning' and 'Development Tracking'. Column 03 shows the same titles translated into Chinese characters.

You’ll note that the Chinese is already translated, and this is because I can easily copy the fourth column (called Column 03 because I didn’t use a header row) into an MT provider, in this case Google Translate, then copy/paste the translated text back.  It takes seconds, and now I can set up Trados Studio like this:

A screenshot of the filetype settings for importing a 'Tab Delimited Text' file into Trados Studio. The relevant options include selecting the delimiter, specifying the source and translation columns, and additional settings like extracting comments as structure information, excluding certain lines, handling existing translations, and the process for handling files not matching formatting requirements. In this context, the filetype is set up to consider the third column as the source text and the fourth column as the translated text, with the first column as comments.

The advantage being that when I open this for translation, the file is already translated and ready for me to post-edit against the Translation Memory and Termbase I used for the project, so my context is good, along with whatever other resources I decide would be useful to complete the work.

The Trados Studio Editor View showing a Translation Memory being used to validate the translations; the DSI Viewer plugin is installed so the comments can be seen as context while I work.

It’s funny how this article on the ever so simple CSV file format turned into one of the longest articles I have ever written!  If you’re still with me I hope you found it interesting, and perhaps Rons Data Edit is a tool that’ll make its way into your armoury too!
