
A couple of little-known gems in SDL Trados Studio


Two questions came up on ProZ today which Studio can handle very nicely.  Despite this I often see some very clever and amazing workarounds that are probably not necessary at all.  So I thought I’d write this quick post for two reasons… first, to share these great and easy-to-use features in Studio, and second because I used FastStone Capture to record a video explaining the process when I answered both questions on ProZ this afternoon, having blogged about this brilliant little tool last week.

Using the Regex Filetype in Studio

So, the first question was about how to extract the text after the equals sign when translating LNG files.  The sample text provided was this:

OptimizingPars=Optimizing paragraph structure
GettingParData=Getting paragraph data
AnalyzingPars=Analyzing paragraph structure
RemovingPars=Removing paragraphs

Studio has a great regex filetype that has been in the software since the early Trados days as well, but it’s even better now.  So my video is here:
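The logic of such a parser rule can be sketched outside Studio with a plain regular expression, just to show what the regex filetype is doing with a file like this.  The pattern and function name below are my own illustration, not anything taken from Studio:

```python
import re

# One rule per line: everything before the first "=" is the key (left alone),
# everything after it is the translatable text.
LNG_RULE = re.compile(r"^([^=]+)=(.*)$")

def extract_translatable(lng_text):
    """Return (key, text) pairs for lines matching the key=value pattern."""
    pairs = []
    for line in lng_text.splitlines():
        m = LNG_RULE.match(line)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

sample = """OptimizingPars=Optimizing paragraph structure
GettingParData=Getting paragraph data"""

for key, text in extract_translatable(sample):
    print(key, "->", text)
```

Group 1 is the key that stays untouched and group 2 is the text offered for translation, which mirrors the idea of extracting only the text after the equals sign.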

Translating columns of text prepared in Microsoft Excel using Studio

The second question that came up related to being able to translate a column of text in Excel, but place the translated text into a different column.  Then, just to add to the problem, some of the text was already translated and was already in the second column.

Studio can also handle this a couple of ways really nicely, so I did another quick video to address this as well:

So, two neat and simple-to-use features in SDL Trados Studio.



Creating a TM from a Termbase, or Glossary, in SDL Trados Studio


In the last week or two the question of how to create a Translation Memory from a glossary or termbase exported to Excel has arisen a few times.  There have also been some interesting and clever responses… but notably not the easiest one.

Studio has a csv filetype that provides some very interesting options, like this:

CSV isn’t great for retaining clever formatting, but I think I’d be safe in saying that most glossaries are not formatted anyway, so this presents us with some interesting possibilities.  The simplest and the one referred to at the start of this post is converting your glossary that’s in Microsoft Excel to a Translation Memory.

To do this you only need a few simple steps.

Step 1
Save your Excel file as a CSV (Comma delimited) file:

Step 2
Set up your CSV filetype in Studio as follows:

  1. Go to Tools – Options – Filetypes – Comma Delimited Text (CSV)
  2. In my file I have the source language in the first column of my spreadsheet and the target in the second column.  So I make sure I have the columns set up like this:
  3. I also make sure the delimiter is set to be comma:

    This is because if I look in my CSV file the two columns are separated by commas, like this:
  4. I then check the option to automatically confirm the translations so I don’t have to do this (assuming I know the translations are good)
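If you’re unsure the exported file really is comma delimited, a quick sanity check along these lines can save a wrong delimiter setting.  The glossary content below is invented purely for illustration:

```python
import csv
import io

# A two-column glossary as it might look after "Save As CSV (Comma delimited)".
# The target-language column here is made up for the example.
csv_data = ("Optimizing paragraph structure,Optimering af afsnitsstruktur\n"
            "Removing paragraphs,Fjerner afsnit\n")

# Confirm the delimiter really is a comma before setting it in the filetype options.
dialect = csv.Sniffer().sniff(csv_data)
print(dialect.delimiter)  # ","

# And confirm every row has exactly two columns: source first, target second.
rows = list(csv.reader(io.StringIO(csv_data), dialect))
assert all(len(row) == 2 for row in rows)
```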

Step 3
Open the CSV file in Studio and add the TM you wish to update, or create a new one as you go.  The view of the file in the Editor looks like this:

So you can see both columns from the spreadsheet in the Editor as source on the left and target on the right, all confirmed and ready to update into your Translation Memory.  Note in the translation results window there are currently no results showing.

Step 4
Run a batch task to update your TM.  The easy way to do this from here is to use File – Batch Tasks – Update Main Translation Memory (not forgetting to save the project first – see my earlier blog on using Open Document).

And Bob’s your Uncle… all done.  I now get results from my translation memory window:

My Translation Memory now contains all these Translation Units:

And if I wish to export this as a TMX I can right click on the TM in this view and select export:

If I were only interested in the TMX, then there are also a couple of applications on the SDL OpenExchange that can create a TMX directly from an SDLXLIFF.  So you would only need to open the file in Studio, save it, and then convert the SDLXLIFF… also very simple, and one of them can save in either direction, so English to Portuguese or Portuguese to English (in my example):

- SDLXliff2Tmx by Costas Nadalis, TMServe

- SDLXLIFF to Legacy Converter by Patrick Hartnett, Logos s.p.a.
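As an aside, if the TMX is all you want and your glossary is clean, the conversion itself is small enough to sketch in a few lines.  This is only a minimal, hypothetical example: the languages, glossary rows and header attributes are my own choices, and a real TMX from Studio would carry more metadata:

```python
import csv
import io
import xml.etree.ElementTree as ET

# A hypothetical two-column glossary (source, target), English to Portuguese,
# as it would look saved from Excel as CSV.
csv_data = "paragraph,parágrafo\nstructure,estrutura\n"

# Minimal TMX 1.4 skeleton: a header plus one <tu> per glossary row.
tmx = ET.Element("tmx", version="1.4")
ET.SubElement(tmx, "header", {
    "srclang": "en", "segtype": "sentence", "datatype": "plaintext",
    "adminlang": "en", "o-tmf": "csv",
    "creationtool": "csv2tmx-sketch", "creationtoolversion": "0.1",
})
body = ET.SubElement(tmx, "body")

for source, target in csv.reader(io.StringIO(csv_data)):
    tu = ET.SubElement(body, "tu")
    for lang, text in (("en", source), ("pt", target)):
        tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
        ET.SubElement(tuv, "seg").text = text

print(len(body))  # 2 translation units
```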

Once you get familiar with this filetype you’ll see there are other interesting applications for it, for example:

  • Use Studio to QA your terminology
  • Use Studio to create your terminology by translating glossaries and having the target text placed into the second column automatically
  • Use Studio to add another translation to an existing glossary by placing the translation of either the first or second column into the third column… etc.
  • Use Studio to translate Excel files where the source is in one column and the target is partially complete in another.  Here you can make use of the locked segment ability to confirm and lock the segments already completed so you don’t waste time working on them unnecessarily
  • Take comments with instructional material that are in a separate column and convert to structural context to help with the file… for example this:

    Can look like this in Studio with this CSV filetype:

    Where we see the existing translation is locked and confirmed already and the comments are now reflected in the right hand column as COM.  You can also click on these to see this:

So just a few interesting uses of this filetype… no doubt you can think of many more.


What can you do with the SDLXLIFF Converter?


Whilst SDL Trados Studio 2011 SP2 incorporates the ability to export and import Word documents for review, the application originally developed for this is still available and working (in fact SP2 has an updated version).  @jaynefox wrote a very nice blog post about how to use the SDLXLIFF Converter for Microsoft Office, which is available for Studio 2009 through the SDL OpenExchange and is installed with Studio 2011 in the program group.  So I thought it would be interesting just to note what the different options are for this application.

But first I’ll just confirm where you find it.  If you have Studio 2009 you get it from here (click on the image):

If you have Studio 2011 then you will find it in here and not on the Open Exchange:

If you have SDL Trados Studio 2011 SP2 then in addition to this you can also use the most commonly used part of this tool as a batch task within Studio itself.  That part is described in a nice clear way by Jayne Fox in her blog article on this.

There is also a nice video on YouTube that takes you through the process using this application.  But I’m going to quickly go through what you can do with all the options of this tool (that is still available separately after the release of SP2) and I hope it will give you some ideas on how to use this to handle many other things that crop up from time to time… it’s more than just a review tool.

Across the top of this application we see this, which belies its real capabilities:

This application can also convert the files to Microsoft Excel and to XML, both of which have their uses.  So whilst these tabs do indeed say MS Word now, when you change the option in the settings tab like this, note that the first tab now says “Convert to MS Excel“:

If I select the XML option then it will say “Convert to XML“.  We’ll look at why these options can be useful in a minute, but a point to note is that these two formats, Excel and XML, are export only.  So you cannot import them back in again.

But before we look at the options and how they work it is worth noting this part which is found at the bottom of the start screen on the “Convert to MS Word” tab:

Studio has the concept of projects, which can be a single file in one language pair or multiple files being translated into multiple languages.  By clicking on the “Load files from project…” button or by dragging the *.sdlproj file from the project folder into this first window the application will add all the *.sdlxliff files in the Project for conversion.  So this could be hundreds of files in multiple folders and they are all handled in one go with the converted files placed into the same folders as their respective *.sdlxliff files.  This may not sound that impressive, but I think it is, so I created a quick video just to show you what I mean:

The same thing works in reverse, so when you get your reviewed files in Microsoft Word back, by placing them into the same folders you can import them to update your translations after review, individually or as a complete project in one go.

So let’s go back to the settings tab and take a look at the options.

Export Format

The screenshot shows you have three file formats for exporting to.

1. Microsoft Word
2. Microsoft Excel
3. XML

I previously mentioned that you can only import the Word format, so for reviewing a translation this is the best option.  There are two layouts for this, a “Side-by-side” and a “Top-down” approach.  These are quite useful because many reviewers prefer to read through the source and translation in different ways, and certainly for some texts the “Top-down” approach would be more suitable.  You can see both options with different colours (customisable) for different match values below:

The Microsoft Excel export provides the ability to set the column widths for the source and target before you start… a nicety but not really necessary, and the output can also inherit the colours as above:

The Excel export has some excellent applications, some of which I have touched upon in previous articles.  So a few ideas might be:

  • QA checking individual segments based on variable character lengths. So you can set the different parameters in Excel on a segment by segment basis if needed and have complete flexibility this way.  Then use the segment numbers to cross reference the translation and correct as needed.
  • Creating Translation Memories from monolingual sources.  WinAlign or similar would also be suitable for this, but I came across a question on ProZ last week where the user had monolingual xliff files that he wanted as a Translation Memory.  The solution using this application is documented here.
  • Some users who were used to using SDLX had quite defined workflows around an Excel export like this, so for these users it meant an easier transition to Studio without losing their Excel review process.
  • Being able to filter and order the segments almost without limit in Excel allows much enhanced QA for some content, in addition to producing complex analysis reports on the translation compared to the source that are unthinkable within any CAT tool.
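The first of these ideas, checking variable character lengths, boils down to comparing each target against its own limit.  A sketch with invented segment rows and limits (none of this data comes from a real export):

```python
# Hypothetical rows as they might come out of the Excel export:
# (segment number, source, target, maximum allowed characters for that segment).
rows = [
    (1, "Optimizing paragraph structure", "Optimierung der Absatzstruktur", 35),
    (2, "Removing paragraphs", "Entfernen von Absätzen in der Datei", 25),
]

# Flag every segment whose target exceeds its own limit; the segment numbers
# then let you cross-reference and correct the translation in Studio.
too_long = [(num, len(target), limit)
            for num, _source, target, limit in rows
            if len(target) > limit]

for num, length, limit in too_long:
    print(f"Segment {num}: {length} chars (limit {limit})")
```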

The XML export provides a simple and easy to understand xml structure for the translation:

This is more interesting than it looks at first glance.  I opened this in a web browser and the message at the top tells me there is no stylesheet associated with it.  This gives you a clue as to the value you might get from it: any supported file format in Studio can be rendered as a simple XML file that could be viewed on a web page, on an intranet for example, allowing a quick check by anyone without the need for any software at all, only a web browser… and on any device that supports a browser and internet access.  You could use the SDLXLIFF of course, but this simple XML is a lot less complex, making it useful for more users.

So this one may not be that interesting for a translator, but it has some hidden value in the right circumstances for extending the review capability to other media.
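That browser message also hints at how you might dress the export up: prepend an xml-stylesheet processing instruction pointing at your own XSLT.  A sketch, with a hypothetical stylesheet name and a simplified stand-in for the exported XML:

```python
# Prepend an xml-stylesheet processing instruction so a browser renders the
# export with your own XSLT instead of showing the raw tree. "review.xsl" is
# a stylesheet you would write yourself; the XML below is a simplified stand-in.
xml_export = ('<?xml version="1.0" encoding="utf-8"?>\n'
              '<document>\n'
              '  <segment id="1">...</segment>\n'
              '</document>\n')

declaration, _, rest = xml_export.partition("\n")
styled = (declaration
          + '\n<?xml-stylesheet type="text/xsl" href="review.xsl"?>\n'
          + rest)

print(styled.splitlines()[1])
```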

Misc. settings under this option

There are a couple of other options in there for determining whether you want to extract comments or not, and also to define a filename prefix for the exported files:

The SDLXLIFF Converter integrates comments made in Studio with the Microsoft Word commenting feature very nicely.  So comments made in Studio will appear in the exported document, and comments made in Word will appear in Studio when imported.  This first option on whether to extract comments or not determines whether you take the comments in Studio and export them to Word or Excel.  So these two subsegment comments for example:

Would appear like this in the exported Word file allowing the commenting features to be used in Word.  Of course the reverse is also true so if you add comments in Word they will be imported back into Studio:

In Excel they are simply added to the comments column.

Exclusions

This set of options allows you to be selective with the things you are choosing to export.  You can exclude by segment category, or by segment status.

You might choose to do this to reduce a very long file for review to something shorter and relevant only to the work that needs to be addressed.

A practical example may be that you have received a partially reviewed file back and now need to send the rest out to your reviewer.  If there is still sufficient context in what’s left, then change the status of the reviewed work to “Translation Approved” (if it is not already) and check the appropriate box; you will get a file containing only the “unapproved” segments.

You might also decide you only want to review fuzzy matches because you do not get paid for 100% matches at all (so remove the temptation to waste time reading them).  Whilst you can use the colours to distinguish them from other matches, you still have to work through the whole translation to get to them.  For some texts it might be better to only have to work through the fuzzy matches, and excluding the appropriate categories will allow this.
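Conceptually these exclusions are just a filter over segment status and match category.  In this sketch the statuses and categories are illustrative labels, not Studio’s internal names:

```python
# Invented segment list: each entry carries a status and a match category.
segments = [
    {"id": 1, "status": "Translation Approved", "category": "100%"},
    {"id": 2, "status": "Translated", "category": "Fuzzy"},
    {"id": 3, "status": "Translated", "category": "No match"},
]

# Exclude approved segments and 100% matches, keeping only what needs review.
excluded_statuses = {"Translation Approved"}
excluded_categories = {"100%"}

for_review = [s for s in segments
              if s["status"] not in excluded_statuses
              and s["category"] not in excluded_categories]

print([s["id"] for s in for_review])  # [2, 3]
```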

Segment update

The options here will allow you to set the status of the segments you import back into Studio from Microsoft Word and can also be used to determine whether you update the content of all segments to reflect the fact the document as a whole has been reviewed, or only those you made changes to.

The tracked changes capability is also worthy of special mention.  So if the reviewer made changes like this in the Word document:

They are represented in Studio upon import like this:

Then you can filter on segments with tracked changes and use the tracked changes functionality, that works the same way as in Word, to accept or reject as required.

Of course this also works the other way around, so changes made in Studio using the tracked changes functionality will be exported into the Word document where you can use the same functionality in Word.

Colors and segment locking

The results of using these settings can be seen in the images above.  You have options to set different colours for different segment categories:

The additional option, which is only really useful when working with Excel, is the ability to lock the segments to prevent them from being changed.

Application

This option is there to prevent mistakes and to allow you the ability to recover changes that were made in error:

The first option will display a warning like this when checked:

So if you have already created your export, particularly when working with Projects where the number of files you are working on could be considerable, setting this will warn you if you are about to overwrite the exported document you had already created.

The second option just creates a backup file of the SDLXLIFF by replacing the *.sdlxliff extension with *.backup.  This way, if you make a mistake, you can always delete the SDLXLIFF and rename the backup to *.sdlxliff again to get back to where you started.
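Recovering from the backup is nothing more than a rename, which could be scripted along these lines if you had many files.  The folder and filename here are dummies created just for the demonstration:

```python
from pathlib import Path
import tempfile

# The converter keeps a copy as *.backup; restoring is just a rename back.
# Demonstrated in a temporary folder with a dummy file.
folder = Path(tempfile.mkdtemp())
backup = folder / "translation.backup"
backup.write_text("<xliff/>")

restored = backup.rename(backup.with_suffix(".sdlxliff"))
print(restored.name)  # translation.sdlxliff
```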

That’s it… a little long but hopefully it will give you an idea of what this tool can be used for in addition to the review cycle, and maybe spark a few more interesting applications… feel free to post them in the comments as it’s always interesting to read what people do with these tools.


Making use of the Studio Track Changes features


SDL Trados Studio 2011 SP2 was released last week and SDL are in the process of giving introductory webinars and sending mailers with lots of nice details about the new features provided.  One of these features is being able to open Word documents (DOCX only) that contain tracked changes.  This is interesting of course, but what makes this so useful?

The feature is aimed initially at regulatory industries such as Pharmaceuticals and other Life Science verticals.  These industries operate within a very controlled environment where every change is significant and has to be managed carefully to avoid the potentially damaging effect of an incorrect sentence or translation when goods are finally released to market.

But this isn’t the only use case for being able to work with tracked changes in a translation environment.  The legal industry is similarly regulated, and in some ways, because of the faster turnaround to market of legal texts compared to new drugs, it is even more advantageous to be able to handle these texts clearly within the translation tools.  Professional legal translators normally have a background in the legal industry, and it is essential that they are able not only to translate the text, but also to create a translation that truly reflects the legal implications of the original for the intended audience.  So being able to clearly see what has changed as you work, and then reflect the changes accurately in the translation, is an advantage.

At the moment, one way to handle changes to legal translations is to compare the original translation with the amended translation using the Word compare feature.  So the translator is provided with an amended text in the source language and, with no reference other than a copy of the original, will retranslate the file against a Translation Memory (if they have one) and produce a new target text.  The reviewer and the translator can then use the Word compare feature like this to compare them and work on the changes as required:

This is a great tool, and you can continue to use it, but for the translator who is using Studio SP2 this is no longer necessary and an improved way of working is possible.  There are new options on the DOCX filetype like this:

These options allow you to decide how you handle Word documents that contain tracked changes.

You can ignore them, in which case you get the same effect you get now if you try to open a Word document containing tracked changes.  The solution then is to remove all the tracked changes first so you are working on the final document.

You can apply the changes first, and with this option all the changes will be accepted and you will be working on the document as changed.

The third option is the one we’re talking about here.  This option allows you to open the document and see the changes in the same way they were reflected in Word.  So for example, I have this document in Word that has been updated after the original translation was completed:

I can now add this file to my project and pre-translate it, or use Perfect Match.  In this case I used Perfect Match and the result in Studio looks like this:

So anything that has not been changed now has the PM status in the centre column to show this is a Perfect Match, and the segment is locked (optionally).  The TC status in the blue square is new, and this means the sentence has a match in the Translation Memory, but the original source is changed using track changes.  The match value is still shown in the Translation Results window.  So in this example segment #3 is actually a 48% match (if you work with this low a setting for fuzzy matches) and you can see clearly what has been changed by being able to see the tracked changes in the source segment, and also in the TM results window:

In order to be able to find all of the segments that have been changed to work on them specifically, or just make sure you didn’t miss any you can use the display filter:

The display filter now supports filtering on tracked changes in the source too so you can display only these segments.  In this case showing segments with matches, and also segments that were not in the original translation at all so the only evidence is that the source is an inserted tracked change as in segment #5:

So far this is clear… but now you want to translate the changes and also make sure they are correctly reflected as tracked changes in the target word document so that the reviewer can make the changes appropriately.  To do this you turn on the track changes toolbar in Studio.  If this is not in your menu just go to View-Toolbars-Track Changes and activate it when you are in the Editor View.  This toolbar looks like this:

  1. Toggle Track Changes On/Off (Ctrl+Alt+F9)
  2. Toggle Final Mode On/Off (Ctrl+Alt+Shift+F9)
  3. Accept Changes and Move to Next (Ctrl+F9)
  4. Reject Changes and Move to Next (Alt+F9)
  5. Move to Previous Change (Shift+F9), and Move to Next Change (F9)

You’ll notice the similarity in functionality to Microsoft Word, to make sure that using this provides a similar experience and is easy to grasp.  The Toggle Final Mode feature is a cut-down version of the range of features in Word, but it still allows you to see what the text would look like in source and target if the changes were implemented.  This is useful because sometimes you can easily add or omit spaces when working with tracked changes, and this allows you to very quickly see the impact of what you have written… so in this example you can see the double spaces that appear once you toggle off the tracked changes:

Once you’re happy with the translation you can save the target document and the tracked change translation you have completed in the bilingual file, like this:

The tracked changes translated in Studio are now in the fully formatted target document like this where the reviewer can work through the file and make the changes as required:

However, we haven’t addressed how to get the changes made by the reviewer in the target document back into Studio, which of course you as the translator want as part of a properly documented and controlled process.  So another very important point to note is that the External Review export that is now part of Studio 2011 SP2 (the SDLXLIFF Converter for Microsoft Office from the SDL OpenExchange) will create the document for review with the tracked changes like this:

So if the reviewer accepts/rejects/amends the changes in this document, it can be imported back into Studio at the end without the need for any double handling and the potential errors of manual transposition.  Then the final target document can be produced from the bilingual file, Translation Memories can be updated, the bilingual file can be stored safely for future leverage using Perfect Match, and the process is complete.

This is quite a long post, so I hope you’re still with me as I wanted to mention one more thing.  The SDL OpenExchange (my favourite subject) can provide you with the ability to create customised Package formats that might be more appropriate for your workflow.  So for example we have a special package format available for Pharmaceutical Companies that contains documents used in their very specific processes during the translation and review process.  In Studio, with the right license, you will see this:

The reviewer will then receive a bundle of documents that allow them to properly review the translations and changes with a familiar workflow and make their own amendments, all using Microsoft Word, as these are often medical professionals and not translators, and then return the files for update in Studio as in the example I looked at above.

So to conclude… I think this is a very nice feature to get free as part of a Service Pack, and I’m looking forward to seeing how this will be used for other industries in addition to the initial target; legal being the obvious candidate, but I’m sure there are others too.

Note: Please excuse the Welsh… it’s all Google Translate ;-)


Did you know you can export Studio comments in your target Word file?


*** Please note that this feature is temporarily disabled in the latest update to SP2***

If you found this ability to export comments into the target file useful I’d be very pleased to hear in the comments to this article.

Exporting Comments

Another nice addition to Studio 2011 is the ability to include comments in your target file when translating word files.

So take this example of a translated document with three comments in there… two subsegment comments making it easier to bring attention to the relevant part of the text, and one ordinary comment at the segment level:

When I save the target I see this:

So comments added to the fully formatted target file, and using Word commenting where the location of the comment directly relates to the text highlighted in Studio.

A nice little feature… with only one drawback… you can’t prevent the comments from being exported if you add them to the document.  The option is there but it is greyed out (a bug to be fixed soon ;-) ):

So if you see comments in your target document and don’t want them, then you need to remove them from your file in Studio first:

And if you have a lot of comments and don’t want to lose them, you can use the Export to External Review (if you have Studio 2011 SP2) or the SDLXLIFF Converter for Microsoft Office to export the file first.  Then, after deleting the comments and saving the target file, import the review document straight back in to restore the comments.  However, you can only import the comments back in using the SDLXLIFF Converter for Microsoft Office, because there has been a change to the SDLXLIFF after exporting and the integrated version in SP2 checks this to prevent you overwriting changes that may be important.  The OpenExchange application is less controlling (fortunately) and you do have it installed with Studio 2011… all versions.

So you may need a workaround if you use comments and don’t want them in your file… but only for the time being and I think it’s still a useful feature.


Upgrading your legacy resources – filetypes


When you upgrade from Trados to SDL Trados Studio there are a number of things you can take with you: Translation Memories, Termbases, AutoText lists, custom variable lists, and customised segmentation rules, for example.  These are all discussed quite a lot in the public forums and in blog articles, but what we don’t see a lot of information on is how to upgrade your filetypes.  As a result I think many users convert files to TTX unnecessarily just so they can use the old *.INI files they’ve had for years.

So, this article is just a quick explanation of how to do this, starting off with what we did in TagEditor to create these *.INI files in the first place.  So, in TagEditor when you created a custom filetype it all revolved around the same process where you run the wizard and at the start specify whether the settings will be for SGML/HTML or XML based files:

This is important to note because in TagEditor there was only one filetype for custom files and its use was determined through settings.  In SDL Trados Studio we have two.  There is a proper XML filetype that has the potential to do a lot more than you could ever achieve with the old TagEditor settings, and there is a custom HTML capability.  You can see these in the list when you first try to create a new filetype in Studio:

There are of course more than 60 filetypes that Studio supports out of the box, but I’m talking specifically about the filetypes you create that have different parser rules.  So, filetypes you can create that extract specific text from the file based on rules you define.  For the purposes of this blog article I’m specifically referring to HTML and XML, as these are the filetypes you may have an *.INI for that was created in Trados.

Upgrading this *.INI file is not always the best approach because Studio can do things in a better way, and because many *.INI files have not been maintained very well and contain a lot of unnecessary information that could cause performance issues… or may be simply unclear when you try to understand what the rules are doing.  However, not everyone has the necessary skills, time, or inclination to rewrite these rules so Studio provides a mechanism to upgrade them.  This mechanism is similar in both cases, but you need to know what the purpose of the file is first.  Your client should be able to tell you this, but if they don’t and you have the *.INI and the files for translation then you need to look at the file for translation and see what it is before you create your new filetype.

Generally if it’s XML then it will start with a declaration similar to this:

<?xml version="1.0" encoding="ISO-8859-1"?>

It may also have a reference to a DTD or an XML schema like this that are used to define the structure of the file, and in Studio you can validate the files against the relevant DTD or Schema:

<!DOCTYPE note SYSTEM "multifarious.dtd">

or

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

In this case when you upgrade your *.INI file you will select XML in the Select Type dialogue box above where you will be presented with options, all optional, for the filetype properties:

As a general rule of thumb I like to complete the four options shown here because they ensure you always know what the file was for and also have a double check when translating that the correct filetype was chosen:

  1. This is the name of the Filetype that will appear in the File Types list in Studio
  2. This identifier is used to ensure that Studio picks the correct filetype when saving the target or previewing with a custom stylesheet (so if you share an SDLXLIFF with another translator they will also need this filetype on their computer to do these things in addition to translating), and can also be used to check the correct filetype was used by checking against the TagID:

    So when you select this mode the orange tab at the top of your translation will show the filetype ID that was used instead of the filename:
  3. The file dialog expression can be set to whatever the extension of the file you are using is.  In this case it’s just XML so I left it that way.
  4. Finally, the description is to ensure you know exactly what the filetype was for, and can contain client details, dates you got it… whatever is helpful to ensure you always know exactly what this filetype was for.

The next dialogue in the wizard is the important one here:

I can base this new XML filetype on various things… one of them being the *.INI file I use for completing translations for a particular client.  So I select the *.INI and this brings in all the parser rules… I’m happy with them (or don’t know any different) so I click on Next until I see Finish and that’s it… just a few clicks.

If the files I had were HTML, I’d know this because on opening the file it would probably start off with an HTML declaration and then some code surrounded by the HTML element:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
… some stuff in here…
</HTML>

Then I would repeat the process above and create the new filetype after selecting HTML instead of XML.  In this example of what an HTML file looks like I have also referred to a DTD, because HTML 4.01 was based on SGML, and this is why when you look at the options for creating a filetype in TagEditor you see SGML and HTML as a single option; they are both handled by the same filetype.  However, SGML probably won’t have the HTML element in there… just a DOCTYPE reference that, if you’re lucky, will say something like:

<!SGML "ISO 8879:1986 (WWW)"  … etc

But more often than not it won’t… so the absence of the HTML element, and the likelihood that the file extensions you are provided with for translation are *.SGM may be a giveaway.  Hopefully you won’t need to worry as your client should tell you what the file was for anyway… but if not maybe this small amount of information will be helpful.
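Those clues, an XML declaration, an HTML element, or a bare DOCTYPE/SGML declaration, can be turned into a rough sniffing routine.  This is only a heuristic sketch of the reasoning above, not a robust detector:

```python
# A rough sniff of whether a file is XML, HTML or SGML, following the clues
# in the text: an <?xml?> declaration, an <html> element, or neither.
def guess_markup_kind(text):
    head = text.lstrip().lower()
    if head.startswith("<?xml"):
        return "XML"
    if "<html" in head:
        return "HTML"
    if head.startswith("<!sgml") or head.startswith("<!doctype"):
        return "SGML"
    return "unknown"

print(guess_markup_kind('<?xml version="1.0"?><note/>'))             # XML
print(guess_markup_kind("<!DOCTYPE HTML PUBLIC ...><HTML></HTML>"))  # HTML
print(guess_markup_kind('<!SGML "ISO 8879:1986 (WWW)"> ...'))        # SGML
```

Note that a DOCTYPE with no HTML element falls through to SGML, which matches the giveaway described above.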

So, I complete the first screen like this:

Just the same as before except that this time the template is based on HTML and not XML, and I amended the file dialog expression to be *.sgm to match an SGML filetype (I’d have left the default File dialog expressions for an HTML file).  Once complete the next step is to add the *.INI and I am again presented with the option for this on the next screen:

That’s it… so pretty simple.  The resulting effect is that I have new filetypes upgraded from an old Trados *.INI for XML, HTML or SGML files:

So no reason to convert these files to TTX first; you can upgrade these old filetypes to Studio as well and enjoy a better experience when translating the files as well as removing the additional steps required to go to TTX first and then converting the target files from TTX at the end.


It’s a colourful world..!


The release of SP2 introduced a couple of nice enhancements to the filetypes that haven’t been publicised, but for the right use-case could be very useful indeed.  These enhancements revolve around using colours..!

Highlight Text in Microsoft Word

The first example relates to translating Microsoft Word files (DOC and DOCX) and having the ability to mark up text in the fully formatted target file that requires attention by the reviewer.  To do this there are new QuickInsert button(s).  So for a DOCX you have a single button to apply the highlighting:

And for DOC you have two… the first is Apply and the second is Force Off:

Actually it only looks like this in the default settings for me because DOCX is using up more of the spaces… so if you also see this then look in here under the full list of available QuickInserts and you’ll find them both:

To apply the highlighting in DOC or DOCX, however, you only need Apply.  So just select the text and click on Apply HighlightColor or use the keyboard shortcut.  If you select the text first then there is no need to force off at all.

So my document in Studio can now look like this:

Don’t forget about the tags

If you are displaying tags you will see them like this in the target document, but you should not receive errors for a mismatch with the source.

Microsoft Word can recognise these tags and will apply the appropriate formatting to the text when the file is opened in Word.

One interesting comment I came across from a user who found this feature helpful is that Studio can only highlight in one colour..!  So you only get yellow, and for this user the client used different colour highlighting for different stages in the review process.  So she actually wanted green..!  But this made me look at Word because there you have a pretty cool search and replace tool that can actually replace the colours of the highlighting.  So, my target document now looks like this in Word:

To get green I first change the highlight colour to green:

Then I activate the search and replace window and click on <<More (1.) which provides more options.  In these options I can click on Format and choose Highlight (2.)

If I do this with the cursor first in the search box and then again in the replace box, with nothing entered into the fields at all then I see the word Highlight added underneath the field boxes (3.)

If I now run the search I get this:

Which is exactly what I wanted… green highlighting..!  Maybe you already knew this… but this capability was a neat find for me.
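Incidentally, the highlight colour is stored in the underlying OOXML of a DOCX (word/document.xml) as a simple run property, so the same yellow-to-green swap could even be scripted.  A minimal sketch, assuming a straightforward file where every yellow highlight should become green:

```python
import re

def recolour_highlights(document_xml: str, old: str = "yellow", new: str = "green") -> str:
    """Swap highlight colours in the main part of a DOCX
    (word/document.xml).  Highlighting is stored per run as
    <w:highlight w:val="yellow"/>, so a targeted replace is enough."""
    return re.sub(
        r'(<w:highlight\s+w:val=")' + re.escape(old) + r'(")',
        lambda m: m.group(1) + new + m.group(2),
        document_xml,
    )
```

In practice the Word search and replace shown above is simpler, but the scripted version may help if you have many files to process.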

Non-translatable text in Microsoft Excel files

This feature is even more hidden than the previous one… especially if you don’t look at the filetypes very much… and I have to confess when it comes to the Excel filetypes (XLS or XLSX) I don’t… unless I’m adding some embedded content rules.  To find this feature you go to Tools -> Options -> File Types -> Microsoft Excel 2007-2010 (or 2000-2003) -> Color and you’ll find this new addition to the options:

So I can take an Excel file with content like this:

If I open this in Studio I see this:

Now this is a simplified example, but assuming I didn’t want to make the product codes available for translation, I could use regex to filter them out in the display filter, or use regex to convert them to protected placeables… but if I didn’t want to do that, or if the text wasn’t as simple as in this example, then I could apply a colour to the text and restore it later… like this… note the range of colours here matches the ones in the Excel File Type Options so it’s easy to make sure you get the right one:

So I choose orange and colour all the product codes in orange.  Then I select the same orange in the Excel File Type settings and open the file in Studio:

Much better… and easier to avoid mistakes and the effort required to work through the file.
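For completeness, the regex route mentioned above might look something like this.  The code pattern here (two letters, a hyphen, four digits) is entirely hypothetical, so you’d adapt it to whatever your client’s product codes actually look like:

```python
import re

# Hypothetical product-code pattern: two letters, a hyphen, four digits.
# Adapt this to the real codes before using it in a display filter or a
# regex-based protected placeable rule.
CODE = re.compile(r"\b[A-Z]{2}-\d{4}\b")

def looks_like_code(segment: str) -> bool:
    """True when a segment consists of nothing but a product code,
    so it could be filtered out or locked rather than translated."""
    return bool(CODE.fullmatch(segment.strip()))
```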

It is a colourful world..!


My favourite OpenExchange apps in 2012…


When I started writing this blog the first article I wrote was about the SDL OpenExchange.  I thought I’d start this year off by sharing my favourite applications… my favourite FREE applications.  We had a fair few of these over the course of the year but I’ll pick out six that I think are well worth a look.  In no particular order (well… alphabetical order) these six are:

  • Glossary Converter
  • Package Reader
  • SDLTmReverseLangs
  • SDLXLIFF Compare
  • SDLXLIFF to Legacy Converter
  • Terminjector

I haven’t written about all of these applications but I’ve probably mentioned them here and there, so this article provides a quick summary of what each one does, shows you where to get it and links to a few places where you can learn more about them.

Glossary Converter

Glossary Converter

This was a latecomer in 2012 and is already the third most downloaded application of all time.  In a nutshell this application makes it possible for you to create a MultiTerm termbase from a spreadsheet (or a TBX) just by dragging and dropping the file… and then it works the other way too (MultiTerm termbase to spreadsheet).  You can read more on this here:

You can download the application from this link on the OpenExchange.

Package Reader

PackageReader

If you work with Studio packages then this application is indispensable.  It provides a way to see what’s in the package (files, termbases, translation memories, analysis reports) and also get a preview of the translatable files themselves… all without opening Studio at all.  Works for packages and return packages.  You can find a little more info on this application here:

You can download the application from this link on the OpenExchange.

SDLTmReverseLangs

SDLTmReverseLangs

Reversing a TM in Studio requires an export to TMX and then an import back into an empty TM created in the other direction… or another workaround.  This application makes it easy: you just drag and drop your sdltm into the application window, press a button and the TM is reversed for you as a copy.  Simple and very useful.
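For the curious, the manual TMX route boils down to very little: a TMX stores both languages in each TU, so reversing is mostly a matter of changing the srclang in the header before re-importing.  A bare-bones sketch, assuming a simple two-language TMX and a tool that respects srclang on import:

```python
import xml.etree.ElementTree as ET

def reverse_tmx(tmx_text: str, new_src: str) -> str:
    """Point a two-language TMX in the other direction by changing the
    header's srclang.  Many tools import the <tuv> matching srclang as
    the source, so this is often all a reversal needs."""
    root = ET.fromstring(tmx_text)
    root.find("header").set("srclang", new_src)
    return ET.tostring(root, encoding="unicode")
```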

Costas Nadalis has written several useful applications for the OpenExchange and I think we’ll see more really exciting things from him in 2013.

You can download this application from this link on the OpenExchange.

SDLXLIFF Compare

SDLXLIFF Compare

One of my favourite applications altogether and I expect to see more development extending the usefulness of this great tool in 2013.  The current version allows you to take two sdlxliff files (Studio bilingual files) and compare them so you can see the changes that were made during translation or review.  You can also take a complete project (bilingual or multilingual) and compare it with a different version so all the changes are displayed in one simple to read report.  You can find a little more info on this application here:

I like this one so much I can’t believe I haven’t written about it yet… so you can be sure I will do once the new version is ready to go..!  You can download this application from this link on the OpenExchange.

SDLXLIFF to Legacy Converter

SDLXLIFF to Legacy Converter

This hardly needs an introduction because I think anyone who uses Studio and also has clients still using Bilingual Word or TTX files already has it.  This application can convert an sdlxliff to a Bilingual doc/docx, TTX or TMX (and can also be reversed) and the best thing is that once the Bilingual doc/docx or TTX has been translated in whatever tool requires this format you can import the files back in to update the sdlxliff.  Excellent tool.  You can find a little more about this application here:

Another useful application for this tool is being able to create files that only contain part of the original sdlxliff.  There could be many reasons for wanting to do this but here’s a good example:

You can download this application from this link on the OpenExchange.

Terminjector

TermInjector

I’m still finding uses for this extremely clever little plugin for Studio that acts as a wrapper to your translation memory.  The basic idea is that you can set up patterns using regular expressions, or create alternative glossary lists, that can inject terms, words, phrases, dates, numbers etc. directly into your translation memory results.  This is particularly useful for handling placeables that are not recognised by Studio or for creating variables that are based on a regular expression rather than a list of the actual variables themselves.  Confused?  It probably makes sense to take a look, or read some of these articles that give an example of how this tool can be used… really very useful once you get your head around it:

You can download this application from this link on the OpenExchange.

Of course we saw many other really useful applications and I’d recommend you take some time to browse through the applications available to see whether there is one there that might just save you huge amounts of effort based on your particular processes.

That’s it… maybe you can drop a comment and tell me which applications you found the most useful, or even what improvements you’d like to see in the existing applications.  In the meantime I’m really looking forward to some of the exciting developments we can expect in 2013 from the developers of these applications themselves..!



I thought Studio could handle a PDF?


Studio has a PDF filetype, and it can do a great job of translating PDF files… BUT… not all PDF files!

So what exactly do I mean by this; surely a PDF is a PDF?  Well this is true, but not all PDF files have been created in the same way and this is an important point.  PDF stands for Portable Document Format and was originally developed by Adobe some 20 years ago.  Today it’s even a recognised standard, and for anyone interested you can find the specifications here… at least the ones I could find:

Despite this recognition, the strength of the PDF is unfortunately its weakness when it comes to translation, because we need the text.  Any document format can become a PDF and there are many ways to get there.  So for example you can create a file in MS Word and save it as a PDF.  If you do this then Studio will normally be able to extract the text from this PDF and present it for translation.  But if the PDF was created from an image then no text will be extracted at all.  This is because there is no textual information in an image, so you have to use OCR (Optical Character Recognition)… exactly the same as if you had to translate text from images inserted into a Word file.

If you receive a PDF that was created from images then you will probably see something like this when you attempt to open the PDF in Studio:

Image based PDF

This is because Studio was unable to find any text, so you only see the start and end of file markers in orange.  So what’s the solution?  You have to OCR the PDF and extract the text.  There are plenty of OCR solutions out there; many users tell me the best of these is ABBYY FineReader.  This application can extract the text and even make a decent job of retaining the formatting.  But if formatting is something your client is prepared to tackle, because it may not be trivial, or you have another DTP application you intend to use for this, then you might find an accurate and free application like FreeOCR useful.  It uses the Tesseract OCR engine, originally developed at HP Labs and now improved and sponsored by Google.  The interface looks like this, with the pane on the left showing the image-based PDF I opened and the pane on the right the extracted text:

FreeOCR

So if I take a PDF that was prepared on a textual basis, plus the text extracted from the image-based PDF using FreeOCR, and open them both in Studio I see this:

Difference between extracts

You can see the PDF on the left retains all its nice formatting, and things like hyperlinks and other tags would probably be retained too, whilst the one on the right is plain text.  It’s not perfect, but you could easily go through the text and tidy it up before opening in Studio… certainly the text extraction isn’t too bad.  The application also comes with built-in language packs to improve the character recognition for EN, DA, DE, ES, FI, FR, IT, NL, NO, PL and SE.  The final target file, if you can open the PDF directly in Studio, will be a DOCX anyway so it might be a good solution for many.

FreeOCR can also OCR directly from a scanner, so if you are only sent hard copies for translation and you have scanning capability then you can use FreeOCR for this too.  Certainly worth a look if you need to resolve a problem like this and you don’t have any other software already.
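If you just want a quick way to tell whether a PDF has a text layer before deciding on OCR, opening it in Studio (or trying to select text in a PDF reader) is the obvious test.  A cruder scripted check is to look for font resources in the raw file… a rough heuristic only, assuming uncompressed object streams, and no substitute for a real PDF library:

```python
def probably_needs_ocr(pdf_bytes: bytes) -> bool:
    """Rough heuristic: a PDF with no /Font resources and no Tj
    (show text) operators almost certainly has no extractable text
    layer, so Studio would show only the start and end of file
    markers and OCR would be needed."""
    return b"/Font" not in pdf_bytes and b"Tj" not in pdf_bytes
```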

A final word on this would be to make the point that I think PDF is a last resort, even if Studio does have a PDF filetype.  It is always more appropriate to work from the original source file and avoid all of this additional work.


Translate with style…


Quite often people ask me how to handle XML files where the author has written guidance notes as a tag in the XML.  These guidance notes should not be translated, so you don’t really want to see them presented as a translatable segment as you work, but you would like them to be clearly visible as a reference for the translator… to help clarify meaning, or to give guidance on the maximum number of characters allowed for each segment when this varies throughout the file, for example.

One of the ways this used to be handled in the “olden days” was by creating a special ini file that TagEditor could use to ensure that the text containing the guidance was visible within a tag but not as translatable text… so something like this where you can see the comment explaining what “coots” and “herns” are:

#2

If you open the same file in Studio you have a much cleaner view, but of course you can’t see the comments.  You can’t see them if you open the file saved as a TTX, and you can’t see them if you create a new filetype for the original XML and only extract the translatable text.

But all is not lost… and the solution is even better!  Studio can preview XML files out of the box so you have an idea about what’s in the file as you work.  So if I create a new filetype for this file and preview it I can see the translatable text on the left in the source column of the Studio Editor, and the Preview of the file is on the right in the Preview Window that I have positioned here for convenience:

#3

This isn’t too bad… but this is quite a clean XML that I created myself using the text from a website explaining what this poem means.  In practice many XML files have a lot more non-translatable text and the default preview here would be pretty messy.  Wouldn’t it be nicer for the translator if the preview looked more like this, for example:

#4

I added the picture for fun, but you can see that the preview now provides much better context for the translator because the translatable text is in the column on the left, line by line, and the comments providing guidance are nicely added to the column on the right in a different format, providing an easy way of clearly seeing what needs to be reviewed as you work.  The preview is also real-time, so as you translate the text can be refreshed (Ctrl+R) to show the translated text as you go.

And if you have a separate screen you can move the preview window onto your other screen providing a permanent, real-time preview of the work you are doing which would give you room for the simple preview as shown below or a side by side preview where you could display the source layout on the left and the target as you worked on the right:

#5

So, how do you do this and do you need to be a rocket scientist?  Well, fortunately if you wish to create a simple stylesheet it’s really not that hard.  Until this week I had never actually created one myself from scratch, but in order to help someone with this issue of not being able to see comments that were included in tags I decided to have a go.  To get started I used the XSLT Tutorial provided free of charge by W3Schools.

What I learned from this was that the basics for displaying only the things you needed are actually very simple to do.  I’ve no doubt a web developer could do fantastic things with this feature in Studio but I also think it’s reasonably accessible to anyone.  To help you get the idea I have zipped up several files and placed them here – Paul’s Zip file.  The zip contains the following:

  • poem.xml (the xml file you’ll translate)
  • stylesheet.xsl (the simple stylesheet I created)
  • BROOK.jpg (the pretty image… not necessary, but pleasing to the eye)
  • poetry.sdlftsettings (my filetype)

All you need to do to test these files is click on the link above, download the zip and unzip it to a folder somewhere on your computer.  Once you’ve done this just add the filetype I created to Studio by going to Tools -> Options -> File Types and then clicking on Import Settings:

#6

You select poetry.sdlftsettings from the files in your new folder and Studio will give you a short message telling you that these settings are for a new filetype and asking you whether you would like it to be created.  So click on Yes and you should be told that the settings were successfully imported.  You should now see a view something like this with your new filetype added:

#7

If you open up the Poetry filetype by clicking on the plus symbol on the left you can check out the settings I used.  The Parser rules show how I created two simple rules to extract only the text I wanted… in this case //text gets me the translatable text from the XML file and presents it in the Editor when I open for translation, and //* set to "Not translatable" just means don’t show me anything else in the Editor:

#8

But you probably know about that stuff already… so the interesting part is how to get the style.  To do this if you look in the Preview node I have added my stylesheet and a file containing the image I used for fun:

#9

If I close the options now and open the XML file in the zip for translation and then generate a preview you should see the preview as shown above.

The creation of the stylesheet itself is not based on an SDL-specific skillset.  It is pure XSLT and I created this one using the basics of the tutorial at W3Schools as mentioned earlier.  So open the stylesheet.xsl file in a text editor (I use EditPad Pro for this, but most text editors should be fine) and I’ll explain the basics of the file so you can see how I managed to do this… it’s really not as tricky as you think and you don’t have to understand a lot to get this far.

The first thing I learned is that every stylesheet must have a declaration.  Conveniently I just copied this straight out of the tutorial:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

I then learned that because a stylesheet is itself an XML document it must also start with an XML declaration.  So I copied that one too:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

Easy so far!  The tutorial then provided an example of a template you can use for many stylesheets.  I amended the content to suit the names of the elements in my XML like this:

<xsl:template match="/">
 <html>
  <body>
  <table border="1">
    <tr bgcolor="#009966">
      <th>The Poem</th>
      <th>Literary Explanation</th>
    </tr>
    <tr>
      <td>.</td>
      <td>.</td>
    </tr>
  </table>
  </body>
 </html>
</xsl:template>

So even blindly following the tutorial I have a structure that will give me two columns inside a table.  But for a full explanation you should read the tutorial… it is really good and explains each part of this code really well.

The final part was to be able to select information within elements, or tags, from the XML, and then put this part where the dot is in <td>.</td>.

<xsl:for-each select="POEM/Line">
  <tr>
    <td><xsl:value-of select="text"/></td>
    <td><xsl:value-of select="comment"/></td>
  </tr>
</xsl:for-each>

I get the names of the elements for this part from the XML file itself.  So if I open the XML file in my editor and take a quick look I can see the structure, or path, to the information I want is like this… and it’s really just like the paths for your files in Windows Explorer… just a different way of thinking about it:

<POEM><Line><text>

or

<POEM><Line><comment>

So the first statement select="POEM/Line" is just saying look at all the information in the file in the "Line" folder.  The next two are saying select the "text" and the "comment" information from the "text" and "comment" files that are in the "Line" folder.
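To make that concrete, the structure of poem.xml looks something like this (the element names are the real ones used by the stylesheet; the line and comment text is just illustrative):

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<POEM>
  <Line>
    <text>I come from haunts of coot and hern,</text>
    <comment>Coots and herns are water birds.</comment>
  </Line>
  <Line>
    <text>I make a sudden sally</text>
    <comment>A sally is a sudden charge or rush forward.</comment>
  </Line>
</POEM>
```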

Now I appreciate this may well be flying over the heads of many readers, and to be honest it does mine to some extent.  But reading these parts of the tutorial really helped me to put it all together and it genuinely only took me 30 minutes to do.  Once I had this I was able to preview the translatable text and the comments in a nice clear way… my file now looking like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
    <body>
      <table border="1">
        <tr bgcolor="#009966">
          <th>The Poem</th>
          <th>Literary Explanation</th>
        </tr>
        <xsl:for-each select="POEM/Line">
        <tr>
          <td><xsl:value-of select="text"/></td>
          <td><xsl:value-of select="comment"/></td>
        </tr>
        </xsl:for-each>
      </table>
    </body>
  </html>
</xsl:template>
</xsl:stylesheet>

Once you have this framework you might find that you can reuse this stylesheet with a little editing each time to create better previews to help you work when translating XML files just in case your Client didn’t do this for you.  The final version I used for this article is a little different because I added to the stylesheet to make it look a little prettier, but the basics are all above.

So I hope this will not be too technically off-putting… given how technical all translators using CAT tools have to be today just to understand tags and how to handle all the different kinds of file formats, translation memories, termbases etc., I reckon you can do it if you put your mind to it!  So translate in style ;-)


A clean editing environment?


I love to see technology being used to help provide a clean environment for us to live in and to bring up our children.  This topic regularly comes up in our household as my wife and son support the ethos behind this ideal wholeheartedly… actually I may even be understating this point a little!

But this isn’t the clean environment I want to talk about today.  I’m interested in a clean editing environment when you use a translation tool.

What I mean by this really is the ability of the tool you use to provide a clear view of the translatable text and to maximise the value you get from a translation by handling the tags for you.  I started to think about this more after reading, and participating in, a thread on ProZ that you can review here if you’re interested: http://goo.gl/C1AsD

So for example, I don’t want to see lots of tags when I translate, I’d rather see clear text as much as possible, like this:

003

In that image you can see a segment created from a Microsoft PowerPoint file and then opened in Studio 2011.  First of all I displayed the tags and then I turned them off (ctrl+shift+h) so that you have a cleaner way of working.  The tag-free image does look better of course, but it is very important to understand that these tags are still there even if you can’t see them.  You can read more about handling the tags in the Studio editor in this article:

Simple guide to working with Tags in Studio

Studio is pretty good at keeping the environment clear of tags to help you make sense of the text, and it’s good at easily showing the tags again so you can correctly place them into your target translation.  But what I’m interested in now is how well the translation of the text gets reused across different filetypes where the tags could be represented in a different way.

This question arose in the same ProZ discussion so I started to take a look at this with Selcuk Akyuz, a Deja-Vu user, who kindly provided a few DTP test files and a PO file that I could add to the MS Office, HTML and XML files I created.  This allowed me to create a Project using these files, and I merged them together so it was easier to see:

001

When I open this in Studio I see something like this:

002

Interestingly the Excel files show the tags whatever I do, so I’m stuck with the tags in this format, but the other files are rendered perfectly, making it more pleasant to translate in all of these formats.  Now, the interesting part is what I can expect in terms of matching.  Well, before I look at this I think it will be useful to look at the tags and see how they are actually offered in the Editor, because they are still there even if I can’t see them.  So I press Ctrl+Shift+H and now I see this:

004

The coloured numbers at the right represent the “type of tagging” that each filetype extracts.  So for each of these thirteen different filetypes, each containing exactly the same translatable text, I would really need three “different” translations if I wanted to pre-translate this project and achieve a 100% match for all of them.  So after adding each one as a different TU I now have this in my Translation Memory, where each tag is a placeholder for the content rather than the actual content itself:

005

I said “different” translations, but really the translation is the same.  The problem is that each filetype has its own way of adding tags and if this results in more tags in one than another then you will get a formatting penalty that prevents a 100% match.  So Type 1. contains 18 tags, Type 2. contains 8 tags and Type 3. contains 12 tags.

The content of the tags themselves is irrelevant, so for example with Type 2 I can translate the HTML file and get 100% autopropagated matches for the PO and XML files.  I used another nice feature of Studio to show the tag content here so you can see the differences… each of these segments could be translated without showing any tags at all, as shown at the start:

006

With Type 3 I can translate the FrameMaker (MIF) file and get 100% autopropagated matches for the InDesign (INX and IDML) files, the PDF and both flavours of Powerpoint and MSWord:

007

The two Excel files (XLS and XLSX) would be matched with a single translation, and really I think Studio should handle the Office format better so that only two TUs are needed to completely translate the whole project rather than three.  But I hope from these examples you get the idea and can see how Studio nicely uses the principle of placeholder tags in the Translation Memory to ensure that you get more leverage from your Translation Memories when handling similar text in different filetypes.
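The placeholder principle itself is easy to illustrate.  If every tag is reduced to a numbered placeholder, segments that differ only in their tag content collapse to the same TU… a simplified sketch of the idea, not Studio’s actual matching algorithm:

```python
import re

def normalise_tags(segment: str) -> str:
    """Reduce every tag to a numbered placeholder, the way the TM
    stores TUs: the content inside the tags is irrelevant to matching,
    only the count and position of the tags matters."""
    counter = 0
    def repl(_m):
        nonlocal counter
        counter += 1
        return f"<{counter}>"
    return re.sub(r"<[^>]+>", repl, segment)
```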

To end this matching review just another idea… you could also remove the formatting penalties from the TM in here:

008

When you do this you only need two TUs in the TM because Type 2 and Type 3 are both translated 100% without the penalties for formatting being applied.  The Excel filetype should work like this as well… so something to be fixed in the future and then you’ll only need one! (But please use this idea with care to avoid missing tag problems when saving the target document… or make sure you run the QA check – F8)

If you’re interested to see how your CAT tool handles these files and provides you with a clean editing environment whilst retaining the ability to maximise reuse of the translatable text I have placed the files I used here on dropbox:

The Test Files

Finally, I thought it would be interesting to see how other types of tags are handled that are perhaps not as simple as bold, italic and underline… so not simple formatting.  To do this I took three Word documents, all with the same translatable text, but all containing “hidden” stuff… like hyperlinks and index entries.  In Word all three documents look like this:

009

Simple… but when I open all three in Studio and translate the first one I see this, where Studio extracts these entries and marks them in the right-hand column so you know what type of information they represent:

010

In doing this Studio enables the ability to get a 100% match for the translatable text in all three files even though the URLs were different and the Index Entry with Sub Entry was different in each case.  In addition you can still translate the URLs and the Index Entries if necessary… quite a sophisticated feature I think.

I added these files to dropbox as well just in case you were curious how well your CAT handles this sort of thing:

Test Index Entries and URLs

Have fun… and maybe share your experience with this in the comments.


Duplicates and Roadshows…


A strange title, and a stranger image with a pair of zebras and a road, but in keeping with the current fascination with animals during the SDL Spring Roadshows I thought it was quite fitting.  Nothing at all to do with the subject, other than that the zebras may be duplicated and they are hovering above a road to somewhere that looks cold!

The problem posed at the SDL Trados Roadshow in Helsinki by some very technical attendees, after the event was over, was about how to efficiently work on a Translation Memory (TM) so you could remove all the unnecessary duplicates.

The problem can be managed through Studio using the features available in the Translation Memory Maintenance View… but only if you know which segments are duplicates so you can find them.  Not really helpful in this case, where we actually want to find the segments with the same source but different targets, and then remove the ones we don’t want with the aid of the better QA features that are in the Studio Editor.

So the solution we came up with was to make use of two of the things we demonstrated during the events:

To demonstrate how this works I took a TM from the DGT (just a sample of around 20k TUs), upgraded it to Studio and then using the SDLTmConvert application I converted it into XLIFF files.  I don’t intend to work on these files directly so I just created 4 files with around 5000 TUs in each one.

I then created a project in Studio with these files and made sure that when I did this I applied the Frequently Occurring Units feature during the analysis:

#2

This is a very cool feature in Studio that allows you to create an SDLXLIFF file containing only the segments that occur more than the number of times you set… I selected 2.  If you use this when you are working on a Project with some colleagues, but you don’t have a TM Server (SDL GroupShare) where you can actively share the same TM as you work, then by translating this file first and then pre-translating the Project you can ensure consistency for these segments.  Then you share the pre-translated files out for translation… so pretty neat.

But for my purposes I’m interested in the TUs that occur more than once so I can find the duplicates in the TM and remove the ones I don’t want.  So to do this I add the exported file (which is created in a folder called “Exports” in the Studio Project folder) to my Project as a translatable file and then open it with my TM attached.  I now see things like this:

#3

So 1. is my TM Results window and 2. is the active segment in my Frequently Occurring Units export.  You can see I have 5 results, so I can now decide which ones I wish to keep and I can remove the rest one at a time, or by selecting more than one at a time, by right-clicking in the TM Results window and selecting “Delete Translation Unit”:

#4

If I also ensure my TM is not set to update, so I don’t mess with the context information that may be on the TU, then I can work through the file confirming segments; that way I know exactly where I got to and can easily return to the task on another day if there are many TUs to correct.

I thought this was quite a neat solution using the Studio Platform to solve a problem that perhaps many people have come across and then resorted to other, perhaps more arduous means to resolve it.


Glossary to TM… been there, done that…


So now let’s flip the process on its head!

I’m not sure how often the need arises to create a Translation Memory from a Termbase but I can tell you that the article I wrote previously called “Creating a TM from a Termbase, or Glossary, in SDL Trados Studio” is the most popular article I have ever written… closely followed by an article on why wordcounts differ between tools called “So how many words do you think it is?“.  It’s an unfair competition because the latter was written some four months afterwards so needs more time to catch up… but there is no denying that the process of converting a Glossary to a Translation Memory is something people are interested in.

I then came across a post on ProZ this week from a user who wanted to do the reverse.  So convert a Translation Memory to a Glossary for use in MultiTerm.  There were a number of solutions offered… but here’s one I like using the SDL OpenExchange.

First of all you take your Translation Memory, and we’re talking Studio here so it’s the SDLTM.  You then use the SDLTmConvert application to convert the SDLTM to a TXT file like this:

#02

So I used these options;

  1. Select your SDLTM
  2. Check the box to convert to TXT
  3. Optionally remove the tags (probably sensible for a Glossary)

I then click on Run and a TXT file is created in the same folder.  Next I start up the Glossary Converter and drag the TXT file onto the application window pane:

#03

This brings up a window where you can select the languages you want to be used if the names are not recognised:

#04

And that’s it, you’re done.  The termbase is created.  In my case it looks like this:

#05

I’m sure some things won’t be suitable… so if you have long sentences in your TM I’m not sure I see how this will be helpful… unless you wanted to use this as an AutoSuggest TM!  But it would be pretty simple to go through and delete the longer segments… in fact you could do this in Excel before you converted the TXT very easily.
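If you prefer to script that pre-filtering step rather than do it in Excel, here is a hedged Python sketch of the idea. It assumes a tab-separated source/target layout in the TXT file (check the delimiter your SDLTmConvert export actually uses), and the four-word threshold is just an arbitrary choice of mine for what counts as "term-like":

```python
MAX_WORDS = 4  # arbitrary cut-off for term-like entries

def filter_term_candidates(lines, max_words=MAX_WORDS):
    """Keep only lines where both source and target are short enough
    to be plausible glossary terms.  Assumes source<TAB>target rows."""
    kept = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2 and all(len(p.split()) <= max_words for p in parts):
            kept.append(line.rstrip("\n"))
    return kept

sample = [
    "brand name\tMarkenname\n",
    "This is a full sentence that does not belong in a glossary.\tDies ist ein ganzer Satz.\n",
]
print(filter_term_candidates(sample))
# → ['brand name\tMarkenname']
```

Running the filtered TXT through the Glossary Converter would then give you a termbase without the sentence-length noise.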

It will be interesting for me to see if this reverse process is as popular as the creation of a Translation Memory from a Glossary… and of course it’s an interesting usecase that once again demonstrates the power of the SDL OpenExchange in providing features for all kinds of unusual usecases that would often require complex and tricky workarounds.


Glueing your files…


The use of the term “glue” in describing what “Trados Glue” was used for made it very clear what it was intended to do.  In fact the term “glue” for merging files together is almost a standard!  I have no idea whether it was Trados that first coined the term in the context of CAT tools but it certainly stuck ;-)

Today I see the question of how to “glue” files together to make it easier to manage them quite often… sometimes accompanied by the phrase “Trados Glue”.  So it seems appropriate to provide a quick article on how this is achieved with today’s CAT, SDL Trados Studio.  Studio has had a similar feature since it was launched back in 2009 but it is not called “Studio Glue”, although perhaps it should have been, it is simply called “Merge Files”.  It is also a big improvement over the original Trados version allowing you to merge any filetypes you like and work on them as a single file.

To do this, in the current version of Studio, you have to make the decision to merge the files before creating the Project.  The only exception to this at the moment is when you receive a package from SDL WorldServer as you are prompted when you open the package that the files are not merged and are given an opportunity to merge them.  But once the Project is created that’s it.

Once you have made the decision that you want the files to be merged and you reach the point of adding them to your Project using the Project Wizard you will see something like this:

#02

So I have added five files to my Project, all different filetypes, and as you can see the “Merge Files” command is greyed out.  This remains greyed out until you select the files you wish to be merged.  You can select them all and merge them into a single SDLXLIFF (Studio bilingual file) or you can select groups of them as you see fit and create several merged files to work on.  I guess the biggest criterion in how you do this would be the size of the files as you don’t want to try and work with a single file that has hundreds of thousands of segments if you can help it.

For my example I’m going to select them all and merge them into a single file.

#03

I write the name I wish to give the merged file (1.) and I can also change the order of the files in the SDLXLIFF by selecting the files (2.) and moving them up or down.  Once done I click on OK and after expanding the SDLXLIFF by clicking on the little plus symbol I see this with the files in the different order I set and all listed under the merged SDLXLIFF I created:

#04

On completing the Project and opening the file for translation I now have all the files in one view:

#05

In the Editing window (1.) you can see each file (only small to allow me to illustrate the point) where the name of the file is written in the orange coloured tabs that indicate the start and finish of each file.  In the navigation window (2.) you can select the files you wish to work on and quickly get to the start of each one… and in the case of the powerpoint file this expands even further to allow you to navigate by slide.

But that’s not all, even though I think this is already an improvement over “Trados Glue”, because you can also preview these files based on the capability for preview of each filetype.  I haven’t put a screenshot here as it’s hard to show a realtime preview in a screenshot… but a video will do it ;-)


1 minute 21 seconds

Finally, once the translation is complete and you save the target files they will be created as separate files in their original format for returning to your client.  You cannot save one file by itself, so it’s all the files in the merged file or nothing.  If you use Shift+F12, or File -> Save Target As, then you are prompted for the files one at a time.  So if the one you want is at the top then you could go through them until you reach that one and then cancel the rest… but in reality I doubt this is much help.  Otherwise you can use the “Generate Target Translations” batch task which will take care of all of this for you without prompting for each one individually.  A quick tip from Emma, a good one too, is that when you “Finalize” a Project, or “Generate Target Translations“, the SDLXLIFF will disappear from the files view and you are left with the separate files… nothing to worry about.  You just need to either use “Revert to SDLXLIFF” or double-click a file to get the SDLXLIFF back… so like this:

#06

Simple!

If you watched the video and want to know where the article on creating stylesheets for XML is… Translate with style…


The SDLXLIFF to Legacy Converter


This application, free on the SDL OpenExchange, has been around for about a year and a half and is one of the most popular applications on there.  It was written by Patrick Hartnett and is incredibly useful in more ways than one.  In fact it’s so useful I have referred to it quite often and used it for working around other issues in many of the articles I have written… so why haven’t I written specifically about it here until now?  The answer is I have no idea… but I should have done!  What prompted me to write now is that Patrick hasn’t released many updates to this tool, mainly because it did what was needed from the start and has been a really reliable and useful application; but he has released an update this week.

You can get a copy from here : the SDLXLIFF to Legacy Converter

The logo is quite fitting because what this application actually does is provide the ability for you to convert your files that had been prepared for a hare into ones for a tortoise ;-)  So you can prepare your projects in SDL Trados Studio which gives you SDLXLIFF files, and then convert them to the following:

  1. Bilingual Doc(x)
  2. Bilingual TTX
  3. TMX Translation Memory

The obvious benefit of this application is that you can prepare your files in Studio, convert them to an older bilingual format (Doc(x), TTX) and send them to a translator or reviewer who is working with an older CAT technology that cannot handle SDLXLIFF.  But it gets better… once you make the change in the older bilingual format you can use the Legacy Converter to update the SDLXLIFF file so you don’t have to manually apply the changes.

An important point to note however, and this often leads to confusion for some users, is that the TTX or Bilingual Doc(x) that is created is based on the SDLXLIFF.  So this is a bit like running an SDLXLIFF through the old Translators Workbench…. but without the XLIFF.  The file you get will clean up into an old Trados 2007 Translation Memory but the target file will be nothing more than text.  It will not be the same filetype as the file that was originally used to create the SDLXLIFF.  So this is really a tool, when used in this way, for enabling a workflow to get translations or reviews completed by subject matter experts who still use the older tools.

The update this week contains, amongst others, enhancements for a couple of things that users have been asking Patrick for to make this even more useful.

  • Included new filter option ‘Un-locked segment’
  • Included 2 new general options related to empty target translations, as follows:

    • Copy source to target for empty translations during export
    • Ignore empty translations

It was these that really prompted me to add an article about this tool because the first one makes some of the workarounds I use this tool for much easier to apply as you can now use regular expressions in Studio to find the text you want and then lock it, rather than to find the text you don’t want (which is often harder) and lock that.  For example, if you wanted to calculate the analysis of a file without the numbers in Studio then you can now do this by filtering on the numbers, locking them, export all the rest to TTX and then reanalyse the TTX.  Previously you would have to select everything apart from the numbers so you could export the contents of the locked segments to get an analysis of the file without numbers.  This is often a harder regular expression to concoct.  You can read that article here.
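Just to illustrate the kind of pattern involved in "filtering on the numbers" (the exact expression you would use in Studio's display filter may well differ), a number-only-segment regex can be as simple as this Python sketch of mine:

```python
import re

# A segment containing nothing but digits, common separators and
# whitespace.  An illustrative pattern, not necessarily the one you
# would paste into Studio's filter.
number_only = re.compile(r'^[0-9.,\-/\s]+$')

segments = ["1.250,00", "42", "Chapter 42", "12/2013", "See table 3"]
matched = [s for s in segments if number_only.match(s)]
print(matched)
# → ['1.250,00', '42', '12/2013']
```

Segments that match would be locked, exported to TTX via the Legacy Converter's new "Un-locked segment" filter option, and reanalysed.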

So you can find this new filter option for “Un-locked segments” in the filter settings here:

#02

The second most interesting and useful enhancement was spelled out nicely at the ETUG (European Trados User Group) conference in Berlin last week by Andreas Ljungström, a trainer and consultant, who uses this application a lot for work outsourced to users of older tools.  In the previous version when you created a legacy bilingual file then any segments that had no match in Studio were filled with the content of the source.  This meant that the translator had to first remove the target content in order to get their TM match automatically added which slowed them down quite a bit.  So now the files can be prepared by automatically copying source to target for empty translations or not (during export) and/or ignoring empty translations (during export/import):

#03

I’m sure many users of this application will be pleased to see that one!

What’s next?  Patrick is working on the release of an API (Application Programming Interface) library and command line exe so that anyone using the Studio API for automation of their workflows will be able to integrate the Legacy Converter as well.  In this way they will be able to use Studio to prepare all of their work in an automated way and prepare TTX or Bilingual Doc(x) files for those still using older, or different technology for translation.  Keep an eye on the SDL OpenExchange for that one!

To finish off I thought a quick demonstration of how the application works would be best, so I created a short video that I hope will be useful for anyone wondering how this excellent application works:

Video : 8 minutes 36 seconds



More Regex? No, it’s time for something completely different.


Now that we’ve learned enough about regular expressions, and because I get so many requests for custom filetypes I thought it might be useful to take a dip into the world of XPath.  So what exactly is XPath?

Well as far as most CAT tools go it probably is something completely different… certainly it was not used in the old Trados days.  But as a tool it’s nothing new and is simply a language used to find parts of an XML document and what’s more it’s a language that is recommended by the World Wide Web Consortium (W3C).  So there is nothing proprietary here.

If you did dive in and start to look at this documentation I referred to it may, unless you lean towards the technical side, be a little off-putting.  But in reality, as far as most applications in Studio are concerned, the phrases “keep it simple” and “economy of accuracy” apply.  To try and illustrate this let’s look at some examples in Studio.  Let’s take a simple XML file that contains some translatable text:

<?xml version="1.0" encoding="UTF-8"?>
 <xpath>
  <Title>An explanation of XPath in SDL Trados Studio</Title>
 </xpath>

In this file there is one translatable element called <Title>.  If I create a new filetype to extract this text I would import the XML file and the parser rules would look like this:

#03

The <xpath> element is the root element and I don’t need this as a parser rule so I’ll remove it in a minute, but for the time being take a look at the “Rule” column.  Here you see two rules; //Title and //xpath.  If I edit the //Title rule I see this:

#04

So as expected the first rule is just me importing the file and Studio taking the element <Title> and making the content in it translatable so that when I open the file in Studio all I will see is the text inside the <Title> elements.  But what you may not have known is that the //Title in the rule column is actually an XPath expression.  Pretty simple huh?  It even looks like the syntax you would use for navigating folders in windows explorer.  So for example, I can add some more translatable text to my example file like this:

<?xml version="1.0" encoding="UTF-8"?>
 <xpath>
  <Book lang="en-US">
   <Title>An explanation of XPath in SDL Trados Studio</Title>
   <Text>XPath helps to <b>navigate</b> through the file.</Text>
   <Text>It helps you pick out important <bn>brand names</bn></Text>
   <Notes>
    <Text>This should not be extracted.</Text>
   </Notes>
  </Book>
 </xpath>

So, if I wanted to create a rule using XPath that picked out the <Text> it could look like this:

//Text

But if I did this I would get all the <Text> inside the <Notes> element as well and I don’t want that.  So instead I can be more specific like this:

//Book/Text

This way it will not pick up the <Text> element in here, //Book/Notes/Text, even though they have the same name.
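You can try these expressions outside Studio too. Python's standard library evaluates a similar subset of XPath (with a leading `.//` for a relative search), which makes it a handy way to sanity-check a rule against a cut-down copy of your file. This sketch uses a simplified version of the sample above:

```python
import xml.etree.ElementTree as ET

xml = """<xpath>
  <Book lang="en-US">
    <Title>An explanation of XPath in SDL Trados Studio</Title>
    <Text>XPath helps.</Text>
    <Notes><Text>This should not be extracted.</Text></Notes>
  </Book>
</xpath>"""

root = ET.fromstring(xml)
# //Text also catches the element inside <Notes>...
all_text = root.findall(".//Text")
# ...while //Book/Text only matches direct children of <Book>.
book_text = root.findall(".//Book/Text")
print(len(all_text), [t.text for t in book_text])
# → 2 ['XPath helps.']
```

Seeing the two counts differ confirms why the more specific rule is needed.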

In this example I would add similar rules for the <b> and <bn> elements and I would make them inline tags so the sentence does not break.  Then I’m going to add a rule that says do not extract anything at all unless I specifically tell you with one of my rules, and I do this by using a wildcard.  XPath understands the star symbol to mean select everything… so I add the rule like this and make it a non-translatable rule, whereas everything else I made “Always translatable”:

//*

This would give me a set of rules like this (I also took a little liberty to apply some simple formatting to some of the rules):

#05

The file, when I open it in Studio looks like this:

#06

So, all very simple and straightforward.  But what happens if the file contains translatable text in an attribute instead of an element?  I don’t think this is really good practice, but we all know that in real life this happens all the time.  So what if the file looked like this where the title of the document has been moved into an attribute called mytitle?:

<?xml version="1.0" encoding="UTF-8"?>
 <xpath>
  <Book lang="en-US">
   <Title mytitle="An explanation of XPath in SDL Trados Studio" />
   <Text>XPath helps to <b>navigate</b> through the file.</Text>
   <Text>It helps you pick out important <bn>brand names</bn></Text>
   <Notes>
    <Text>This should not be extracted.</Text>
   </Notes>
  </Book>
 </xpath>

If you were to import this file into Studio and then manually add the rule for translating an attribute like this:

#07

Then you would again see in the “Rule” column that the XPath expression is defined for you like this:

//Title/@mytitle

So again this is pretty simple… but of course an attribute is usually a tag so now you will see the document structure column on the right annotates the translatable content as a tag:

#08

At this stage, because Studio has made this so simple you would be forgiven for wondering why you need to know anything about all of this syntax at all.  Hopefully you’ll always receive simple files and never need to.  However… sometimes things are not so simple and this is where XPath comes into its own and you can enter an XPath expression as the new rule here by selecting XPath instead of the element or attribute:

#10

Let’s take a little more complex scenario to see how this works, if our file now contains things that look like this where an attribute value is used to instruct you whether the name should be protected from translation or not:

<Text>Non-translatable <bn lock="y">brand names</bn> are locked</Text>
<Text>Translatable <bn lock="n">brand names</bn> are unlocked</Text>

You still want to see the name, but you want to ensure that the translator will know it has to remain exactly the same.  So here you use a new “Not translatable” rule to identify this change so that when the attribute lock= has a value of “y” then the content should be protected.  The syntax for this uses a reference to the attribute value inside square brackets as follows:

//Book/Text/bn[@lock="y"]

In Studio when I open the file with this new content I now see this where the protected brand names have little padlocks around them and when the tag is copied to the target you will find the text inside is greyed and cannot be changed at all:

#09

You can even string together attributes.  So if the XML file was a multilingual XML file for example, and each part of the file was repeated to allow space for each language like this:

.....
<Text>These <bn lang="en-US" lock="y">brand names</bn> are locked</Text>
<Text>These <bn lang="en-US" lock="n">brand names</bn> are not</Text>
.....
<Text>These <bn lang="de-DE" lock="y">brand names</bn> are locked</Text>
<Text>These <bn lang="de-DE" lock="n">brand names</bn> are not</Text>
.....
<Text>These <bn lang="fr-FR" lock="y">brand names</bn> are locked</Text>
<Text>These <bn lang="fr-FR" lock="n">brand names</bn> are not</Text>
.....

Then in order to prepare a multilingual project with filetypes that extracted only the text for the appropriate language codes you could adapt the same rule we just added for the locked content like this… based on extracting the French translatable text only by stringing together the attributes using natural language queries:

//Book/Text/bn[@lang="fr-FR" and @lock="y"]

So now Studio would only extract the text you need from the strings that have the lang=“fr-FR” attribute as well as paying attention to the need to lock content if the lock attribute is “y”.
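If you want to test a compound condition like this locally, note that Python's built-in ElementTree does not support `and` inside a predicate the way full XPath engines do; chaining two predicates (a sketch of mine, not a Studio feature) achieves the same filtering:

```python
import xml.etree.ElementTree as ET

xml = """<Book>
  <Text>These <bn lang="en-US" lock="y">brand names</bn> are locked</Text>
  <Text>These <bn lang="fr-FR" lock="y">brand names</bn> are locked</Text>
  <Text>These <bn lang="fr-FR" lock="n">brand names</bn> are not</Text>
</Book>"""

root = ET.fromstring(xml)
# [@lang="fr-FR"][@lock="y"] keeps only elements satisfying both
# conditions, equivalent to [@lang="fr-FR" and @lock="y"] in full XPath.
hits = root.findall('.//bn[@lang="fr-FR"][@lock="y"]')
print(len(hits))
# → 1
```

In Studio itself you would keep the `and` form shown above, since its XPath engine accepts it.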

There are so many things you can do with XPath to manipulate the information in the XML file that was quite tricky, if not impossible, with the older versions of the product that I couldn’t possibly cover them all here.  So if you want to learn more about XPath I would recommend you take a look at the W3 Schools website where they have many really useful tutorials about web programming and one of these is all about XPATH.  You can find the relevant material here : w3schools.com XPath tutorial.

I hope this article was useful and not too geeky… but just to finish off here’s a few examples of things I have used XPath for in the past that might be handy if you come across similar questions when preparing filetypes in Studio for some tricky situations:

//*[@translate='yes']
Where you have translatable content in this fashion with any element containing the attribute translate, <BodyText translate="yes">, then this expression can be used to extract all translatable text.

//A[@M = '8804']/V
You need the text in <V> but only where M="8804" in <A>. For example:
<A M="8804"><V>Beschreibung zum Task</V></A>

//journalItem[@id='journal1']/dialog/object/@text
Translating the content of an attribute with an element defined by a different attribute.  So the translatable content is in the text attribute but only where the attribute id='journal1'

//book[@lang="fr-FR" and @translate="y"]/ul/li
A way to check for two matching attributes and then the subsequent elements in the path.


Regex for Microsoft Word… is there no end?


Unfortunately the practice of being asked to translate a Microsoft Word file that contains HTML code doesn’t look as though it will go away any time soon for some translators.  But it’s not the end of the world and it’s often all in the preparation of the Word file before you translate it.

This article is just a short-ish one I decided to write after seeing this come up again on ProZ last week, and because it’s another place where all those lovely regular expressions we’re learning about can come in handy.  Yes, Microsoft Word also supports regular expressions, although it is their own flavour.  You can read more about this by just googling for “regular expressions in Microsoft Word” and you will find plenty of help on the subject.  In Word they are called wildcards but they have many similar principles as we’ll see with this very simple example.

I have a Word file that looks like this and you can see I have added what’s often referred to as embedded HTML copied in as text:

#02

If I open this in Studio I get this, which is not too easy to work with.  Hardly surprising though as this is a terrible way to handle content like this… actually if anyone can tell me why people do it I’d be interested to learn!:

#03

So the solution for Studio users is to do one of two things:

  1. Copy the html into a decent text editor, save as html, and then use Studio to handle the html separately, or
  2. Use a little regex magic to mark all the tags as hidden text so they can’t be seen in Studio

For this article I’m going to use the latter and search and replace the tags with the hidden formatting property in Word.  Sometimes this is an easier approach for files with embedded content like this because the HTML may be scattered all over the place so this is one operation rather than many.  To do this I’ll use the following expression to find the tags:

\<*\>

So it’s very similar to the .NET flavour of regex that Studio uses but this has a slightly different meaning.  Word uses the angle brackets to mark the start and end of a word so that you can find single words only… sort of like word boundary markers in .NET.  I actually want to find the angle brackets so I have to escape them and this is what the backslash does.  The star symbol is exactly the same as in .NET, it just means find anything.  So in my Word find and replace dialogue I set it up like this:
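For comparison, the ordinary-regex counterpart of Word's `\<*\>` would be something like `<[^>]*>` — a literal `<`, anything that isn't `>`, then the closing `>`. This little Python sketch (my own illustration, not from Word) shows the pattern picking out the tags:

```python
import re

html = 'Some <b>bold</b> text with a <a href="x.html">link</a>.'
# Word's wildcard \<*\> roughly corresponds to <[^>]*> in ordinary
# regex engines: match a tag from angle bracket to angle bracket.
stripped = re.sub(r'<[^>]*>', '', html)
print(stripped)
# → Some bold text with a link.
```

In Word, of course, we are not deleting the tags but replacing them with the hidden font property, which is why the Format > Font > Hidden setting matters in the replace box.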

#04

  1. I enter my regular expression
  2. I check the “Use Wildcards” checkbox
  3. I click on “Format”, then “Font” and in there click on “Hidden”
    #05

You can see just beneath the search pattern and beneath the empty replace box it tells me what settings I used for each.  Now all I do is click on “Replace All”.  Immediately all my tags have disappeared and the Word file looks like this:

#06

But don’t worry… if I click the display formatting button it all comes back again… so the button shown here on the right.  The text will now have dotted lines under it but this just tells you that it has the hidden font properties so I can simply set the option in Studio not to extract hidden text for translation.  You can find this option here under the “Common” node in the filetype settings for Microsoft Word:

#08

Now when I open the file for translation I see this:

#09

Much easier to handle, all the HTML code is hidden, and I can safely handle the file.

In reality this is an exercise in seeing yet another application for regular expressions in other software tools…. this time Microsoft Office… because I truly hope you don’t see any files like this at all.  But if you do, as I do occasionally see, then perhaps this article will be helpful for you in having to safely navigate the content of the file without destroying the tags.

Once you are done you select the text in the target file, right click and select font, then unhide the hidden text.  Simple!


Handling taggy Excel files in Studio…


By taggy files I mean “embedded xml or html content” that is written into an Excel file alongside translatable text.  In the last article I wrote I documented a method sometimes used by people to handle tagged content in a Word file… funnily enough I came across a Word file containing the XML components of an IDML file today and I guess it must have been prepared in a very similar way judging by the enormous number of tags using the tw4win style to hide them when opened by any SDL Trados version!  Proof for me that this practice is sadly alive and well.  But I digress… because this time I want to cover how to handle a similar problem when you find HTML or XML tagged content in an Excel file.  This crops up quite a bit on ProZ so I thought it might be better to document it once and for all so I have something else to refer to in addition to the Studio help.

When creating custom XML file types, Studio has the concept of parsing the extracted text a second time, based on the document structure type of an XML parser rule, and replacing patterns you define in the parsed text with tags.  Now let me say that again in English… Studio can look in the content of the text that is extracted for translation and then pick out the bits you don’t want to see and convert them to tags.  So for example, if you had an Excel file that contained things like this:

#02

And then you opened this file in Studio you would see something that looked just like the Excel spreadsheet but what you would probably prefer is what it can be changed into as shown below:

#03

So you want to protect all the angle brackets and text between them.  Just in case you don’t like to see all of this in wysiwyg mode don’t forget that you don’t have to.  You can change the font sizes as shown by Kevin Lossner and Jayne Fox in a neat little video, or you can also select the default to always show you consistent plain text, and all tags (because we know they are really there even in wysiwyg mode!) all the time with this option here… so plenty of choice to suit your preferences:

#04

Of course you also don’t have to convert the plain text excel file into the crazy formatting I showed here!

But the important thing is that we have converted all of the tagged content in the Excel file into protected tags in Studio so that you can safely translate the text alone.  How do you do this… easy!

You just create some rules, using a little regex, to pick out the text that should be tags.  These rules are all added through the Excel filetype settings for XLS and XLSX filetypes in here (the screenshot shows XLSX):

#05

So the process is to first enable the “Embedded Content Processing” in 1. by ticking the box, and then selecting “Cell” from the list of available types.  This is because for Excel the ONLY one that works is “Cell”.  The rest are all part of the available types when you use the same “Embedded Content Processor” in a custom XML filetype, but they have no effect in the Excel filetype.  It makes sense when you think about it as we are dealing with “Cells” in Excel… but it’s not the most intuitive part of this solution.

Once you have enabled the processing you can add your rules as I have in 2.  I was a little flamboyant with them in this case just to show you what could be done if you wanted… I could have converted all of the tags in this file with three rules… maybe fewer if I was really clever.  In reality, most Excel files I see translators having problems with only contain quite simple XML/HTML and in these cases the first catch-all rule below will probably handle the complete file:

TRANSLATABLE TAG PAIR - CATCH ALL
  Start tag: <[a-z][a-z0-9]*[^<>]*>
  End tag:   </[a-z][a-z0-9]*[^<>]*>
PLACEABLES
  {[0-9]}
ALT ATTRIBUTE
  Start tag: <.*alt="
  End tag:   ">
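Before pasting a rule into Studio you can sanity-check it against a sample cell. This Python sketch (my own illustration, using the same two catch-all patterns) shows what each one would turn into a tag:

```python
import re

# The catch-all tag-pair patterns from the rules above.
start_tag = re.compile(r'<[a-z][a-z0-9]*[^<>]*>')
end_tag = re.compile(r'</[a-z][a-z0-9]*[^<>]*>')

cell = 'Press <b>Start</b> to begin'
starts = start_tag.findall(cell)  # opening tags only: "/" fails [a-z]
ends = end_tag.findall(cell)      # closing tags only
print(starts, ends)
# → ['<b>'] ['</b>']
```

Everything the patterns match becomes a protected tag in Studio, leaving "Press Start to begin" as the translatable text.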

The interesting thing is that in my actual example, by getting a little flamboyant I have actually shown how simple it can be because I have just taken the literal text that formed the tags and added these as rules.  For example, I don’t want <b></b> tags to be text.  So I add them in as a translatable tag pair here:

#06

Quite simple when you look at it like this… but the drawback is that you need to add a rule for every type of tag in the file which is what I did to create the colourful view above.  If you have a lot of different tags and it’s a big file (or lots of files) then the slicker regex rule is much better and it may well be all you need to catch all the tags:

#07

Once you have added all your rules, and made them as fancy as you like, you can open the Excel file and all being well you’ll see protected tags, or a fancy wysiwyg format to handle the file.

Just to finish off… the same file displayed using the “no wysiwyg” option I mentioned above will show as follows even if I have set all the fancy rules I did.  The segments that don’t show any tags are like this because the tags are actually at the start and end of the cells, so they are not required.  If I did want to see them (and have to deal with them) this is also possible by changing them to be internal rather than external in the advanced rules as you add the regular expressions:

#08


Psst… wanna know a few things about file types?


Studio has some excellent capabilities for getting more from your file types, and I’m often surprised by the reaction of Studio users when they find out what’s possible.

It seems we’ve been keeping a big secret that nobody was supposed to know… so I thought it would be worth taking a quick look at just one file type, everyone’s favourite, Microsoft Word.  The mechanism for finding these options in any filetype and seeing how they can benefit you will be the same as it is for Microsoft Word… and just as simple.  It’s a long post but hopefully useful.

To begin with you need to know where the options are, and you’ll find these in File -> Options -> File Types (Alt+f, t if you’re using Studio 2014 with the new ribbon).  This will present you with a list of file types similar to this, all of which have their own specific options.

02

Don’t be put off, or overwhelmed by this, and don’t worry if I have different file types to you as I have a few custom file types in there and some downloaded from the SDL OpenExchange.  Just find the file type you’re interested in and click on the little plus symbol on the left, so in this case Microsoft Word, and we’ll look at the 2007-2013 variety as this is DOCX and more relevant these days.

Styles
03

The options that are available for changing are not exactly the same in each file type but there are similarities.  Styles, for example, are an option in the Word 2000-2003, Rich Text Format (RTF), Adobe Framemaker 8-11 MIF, Open Document Text Document (ODT) and PDF file types.  This option is there so you can define a text style that is applied to text that should be excluded from the translation.  The options are to convert the text to an inline tag or to a structure tag.  Inline just means that you will see the word as a tag in Studio and will have the opportunity to decide where it is placed in the target segment.  So this might be used for non-translatable product names, or chemical formulae, for example to ensure they are not changed when working on the translation… like this:

04

To achieve this I simply created a character based style in MSWord called productname, applied this style to the product name in the text and then added productname as an inline style in Studio like this:

05

A really nice feature of Studio is that it’s possible to see the text inside the tag so it can be handled easily and will never be changed by mistake during the translation.  And of course you can search for your brand names in MSWord and automatically apply this style to them using the replace feature.

Structure means that words that have this style applied to them will be moved out of the translation altogether.  Obviously if they are in the middle of a sentence this won’t happen and they will appear as inline tags, but when applied to a whole sentence, or table cells for example, they will be moved out of the translation and you won’t have to deal with them at all.  This also has the important advantage of any externally tagged segments not being included in the word count for the analysis.

Common

These settings can also be found across many file types.  I think they are called common because they represent settings that are common to other file types in the same family.  So Office files will have similar settings, Adobe products will have similar settings… but in practice this is probably irrelevant because you still have to set these for each file type you are working with.  I’m just going to cover three of the features in the Common settings for this file type but I’d encourage you to take a look at them all and if you have any questions post them at the end of this article.  The features I’m covering are Extract Hyperlinks, Track changes extraction mode and Extract comments as I think these are all features you could make more of once you see how they can be applied.

Extract hyperlinks

The handling of hyperlinks in Studio is quite sophisticated and provides you with a number of options that allow you to have the hyperlinks automatically handled and not included in the wordcount, or allow you to change them as part of your translation project.  They work like this:

06

I have a sentence in MSWord with a single hyperlink which you can see when I display the field codes in Word.  If I use the option to Always process hyperlinks then the translatable text and the link itself are extracted.  You can see from the Tag ID in the right-hand Document Structure Information column that this is a tag, so you can make the conscious decision to copy source to target or not… although in practice Studio will auto-localise this during a pre-translation anyway.  But if you don’t wish to localise it then the option to Extract only hyperlink text would be better so you don’t have to handle it at all, and more importantly it won’t be needlessly included in the word count.

There is a third option called Never process hyperlinks but in practice this behaves in the same way as Extract only hyperlink text despite what it says in the online help.  So the two options discussed above are the relevant and useful ones.  I think a later version of the options will rationalise the choices a little.

Track changes extraction mode

Track changes can be handled in any file type Studio supports for translation and review.  But this MSWord file type has a special ability to allow you more.  The handling of track changes is discussed in more detail in another article in this blog called “Making use of the Studio Track Changes features” so I won’t add anything more here other than to show what the different settings do when you open a file containing tracked changes in Studio.  I used this single sentence that I translated and confirmed into my Translation Memory prior to editing the document in MSWord using Tracked Changes:

07

I guess the most common message you might have seen is this one if you use the default Ignore documents with pending changes:

08

It’s also a pretty common error message to get if you were using earlier versions of Studio that didn’t have these options at all.  This behaviour is deliberate: the software assumes that you won’t want to unwittingly translate a document that has not been finalised by your client.  So you have the opportunity to either reject or accept the changes before translating the file.  This is clearly important because until Studio 2011 you had no way of being able to translate the tracked changes themselves.

The problem of course is when you opened the Word document, accepted all the changes, and it still wouldn’t open because it says there are tracked changes in the file!  At this point you may have felt like throwing your laptop out the window in frustration (sadly these types of things have this effect on me!) but there is a workaround.  The problem is caused because there is a flag in the underlying Word file that identifies whether there are tracked changes in the file or not, and sometimes this flag is not cleared properly by MSWord.  Studio, and SDL Trados 2007, look for this flag and report the status accordingly… very frustrating.  The flag looks something like this:

style='mso-prop-change:"Some text in here" 20071130T1052'

There is a Knowledgebase article here, KB Article #3253, that explains how to deal with this, including how to remove the unwanted properties in the file if necessary.
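Just to illustrate what the cleanup amounts to (the markup fragment below is invented for the example, and the KB article remains the proper procedure), a regular expression can strip the leftover property:

```python
import re

# Remove leftover mso-prop-change properties of the kind shown above.
# The fragment is a made-up illustration, not a real Word file.
PROP_CHANGE = re.compile(r"\s*style='mso-prop-change:[^']*'")

markup = "<p style='mso-prop-change:\"Some text in here\" 20071130T1052'>Done.</p>"
print(PROP_CHANGE.sub("", markup))  # -> <p>Done.</p>
```

Once the property is gone, nothing is left to fool the tools into reporting pending changes.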

Fortunately, since Studio 2011, we don’t need to resort to this anymore because you have two other options to consider that allow you to open the file for translation whether there are visible tracked changes or not.  First of all there’s this one… Apply changes before opening:

09

Using this option I see a 73% match against the original translation and the source text used shows the sentence with all changes accepted.  The Translation Memory results window of course shows me the differences in the normal way.

This mode has the added benefit of helping you deal with the error message above when you are certain there are no tracked changes in the file and it still won’t open.  So you can simply select this mode, safe in the knowledge that there won’t be any changes as it’s only the underlying property fooling Studio into thinking there are tracked changes present.

The third option, Display pending changes, will present the segment like this in the Studio Editor:

10

This in turn allows you to handle the translation using tracked changes as well and then your client can use the target translation in MSWord to make their own minds up about the final translation.  So you could do something like this for example (excuse the actual translation as this is entirely based on Google):

11

I already have a Translation Unit in my Translation Memory for the original translation and this is why you see the TC icon in the middle column.  Points to note from this exercise are these:

  1. The TC (Tracked Changes) indicator becomes transparent because I have edited the draft suggestion from the Translation Memory
  2. I can toggle the view as I translate (default is Ctrl+Alt+Shift+F9 and this is of course customisable, or you can just click on the ribbon icon) so I don’t have to see these tracked changes… sometimes they can make it tricky to see what’s going on, especially if there are many changes in the source
  3. The target file will export to Word showing these tracked changes as MSWord Tracked Changes format
  4. If I confirm this segment it is the final version that is added to the Translation Memory, as if the changes had been accepted.
  5. In the analysis tracked changes in the source are counted as if all changes have been accepted.  A better example would be this segment where the analysis is one word irrespective of the mode used to open the file:
    12
    This makes sense when you open the file using Apply changes before opening, but as a couple of translators pointed out during the ATA in San Antonio this year, I think it might be helpful to be able to count the effort required when you work using Tracked Changes… so in this example you would have 5 deletions and 1 addition.  This is quite simplified of course but I can see the use case.  There might be an interesting solution to this question very soon.

Extract comments

This is another interesting story because I wrote an article in July 2012 when Studio 2011 first introduced the ability to export comments in your target file.  The article was called “Did you know you can export Studio comments in your target Word file?” and I had to publish an update when we released SP2 for Studio 2011 and disabled this neat functionality.  Fortunately it’s back in Studio 2014 with these options:

13

Looking at incoming comments first, the options are pretty self-explanatory.  If you have comments in your MSWord source file then extracting them as translatable text means you have to handle them in Studio and they will be counted in the analysis.  So a short text like this in MSWord:

14

Would be brought into Studio like this:

15

The second segment contains the text of the comment so it can be handled as translatable text.  You can identify the fact it is a comment either by working with the realtime preview on, or simply by looking at the right-hand column where the Document Structure Information makes this quite clear through the use of the COM code.

Now, if you don’t need to translate the comments, but you’d still like to be able to see them then you can use the second option which is to extract the comments As Studio comments.  When you do this the comments look like this:

16

You can see the source comment as it’s highlighted around the same word the comment was attached to in MSWord, and you can hover over it to read it.  You can also view the comments in the comments window and if you rearrange the windows so that the comments pane is larger and maybe to the left, or right, of your Editing View, or even on a different screen, then you can very easily read the comments as you work or even navigate the document via the comments.  In the image below the Comments pane is on the left and split into many columns… I’m just showing the first three.  These contain the Severity (Information, Warning or Error) used for Studio comments, S/T (Source/Target) which identifies whether the comment is in the source or the target by the green and yellow icon, and Comment which of course is the comment itself.  I annotated the image to show where the source comments are and you can see the target comments by the highlighting in different colours to help visually identify the severity of each one.

17

When you work with this option the comments are also not included in the analysis count.

Now the last option is more interesting.  This provides you with the ability to add comments in Studio and then have them exported into your target document for the reviewer to see in context.  This is great, and is something many users wanted to see back after Studio 2011 SP2 removed it.  But there is a “funny” effect worth knowing about.

  1. If you extract comments for translation (they will also be included in the word count) and add your own Studio comments then both the Studio comments and the translated comments will be exported to the target file.
  2. If you extract comments as Studio source comments (they will not be included in the word count) and add your own Studio target comments then only your Studio target comments will be exported to the target file.

So what happens if you don’t want the source comments included in the word count, but you still want them to be retained in the target file alongside your own Studio comments for the reviewer?  The answer is there is no built-in option for this scenario.  But you can still handle it with a little help from the SDLXLIFF Toolkit.  The process would be this:

  1. Add your Project file to the Toolkit and generate the Document Structure Information like this:
    18
    Note the sdl:comment which is the COM code you can see in the Document Structure Information column in Studio
  2. Select sdl:comment and then use the lock segment option and click on Changeit!
    19
    This will lock all the segments that have been extracted as comments for translation in Studio.
    21
  3. You can now use the display filter to skip the comments if you don’t want to see them in the Editor, and you can use the new option in Studio 2014 to exclude them from the analysis:
    20
  4. So now you can add all the comments you like to Studio and when you save the target file both the Studio comments and the original comments will still be in the target file.
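As a rough sketch of what the lock step does under the hood: the XML below is a toy stand-in (a real SDLXLIFF is far richer, and identifying which segments are comments needs the context definitions the Toolkit reads), but the idea is simply setting a locked attribute on the relevant segments.

```python
import xml.etree.ElementTree as ET

SDL = "http://sdl.com/FileTypes/SdlXliff/1.0"

# Toy stand-in for the seg-defs part of an SDLXLIFF file.
xml = (f'<seg-defs xmlns:sdl="{SDL}">'
       '<sdl:seg id="1"/><sdl:seg id="2"/><sdl:seg id="3"/>'
       '</seg-defs>')

root = ET.fromstring(xml)
comment_ids = {"2"}  # ids we pretend were identified as sdl:comment content

for seg in root.iter(f"{{{SDL}}}seg"):
    if seg.get("id") in comment_ids:
        seg.set("locked", "true")  # the flag the Toolkit sets for you

locked = [s.get("id") for s in root.iter(f"{{{SDL}}}seg")
          if s.get("locked") == "true"]
print(locked)  # -> ['2']
```

In practice you would never edit an SDLXLIFF by hand like this; the Toolkit does it safely across the whole project.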

Tag Check

This section of the file type options is pretty consistent for almost every file type I have on my installation.  The exceptions are some of the OpenExchange file types that don’t use this in their settings.  The basic idea here is that you have the ability to decide how the severity of tag differences between source and target is reported and whether they are reported at all as you work.  So you have an overriding Enable tag verification and if you check this you then have different options you can check for:

22

Depending on the content of the files you are translating you may wish to switch some of these off, or change the severity when they are on.  For example… handling Word files with lots of formatting that you don’t wish to see in the target translation will not affect your ability to save the target file.  So this option is checked to ignore formatting tags by default.  But if your client insisted that the same formatting that was used in the source must be used for the target to the best of your ability, then you could set a little aide-mémoire by unchecking this option, and then all formatting omissions will be reported.

If the source document had been incorrectly prepared with normal breaking spaces between numbers and units, where you knew it was important to make sure the correct non-breaking space was used in the target, then you could check the last option here to ignore this difference.  In fact you could go a step further and create a QA verification rule to check that where a breaking space had been used inappropriately in the source you always used a non-breaking space in the target… but that would be in the verification settings rather than on the file type.
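In the spirit of such a rule, here is a quick sketch of the check itself (the units and pattern are assumptions for the example; Studio’s verification settings are where you would really define this):

```python
import re

# Flag a digit followed by an ordinary space before a unit,
# where a non-breaking space (U+00A0) should have been used.
BAD_SPACE = re.compile(r"\d (?=(?:mm|cm|km|kg|%))")

print(bool(BAD_SPACE.search("The part is 25 mm long")))  # True: needs fixing
print(bool(BAD_SPACE.search("weighs 3\u00a0kg")))        # False: already correct
```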

Quite useful options, and in general the defaults work well.  But it’s worth reviewing these settings if you often get warnings where you think it’s inappropriate as you may be able to control them differently in here.

QuickInsert

I’m not going to cover QuickInserts as I think I covered them in an earlier post so it’s clear how they work:

Those dumb smart quotes…

But if anyone has questions on this not answered anywhere else then post a question and perhaps I can revisit this section at a later date.  They are available in almost every file type and you have to set them up for every file type which is why I prefer AutoHotkey as a solution to many of the requirements you might have for QuickInserts.  Mats Linder provides some interesting solutions to these requirements using another tool called Phrase Express, so I think there are several options to choose from if you want more flexibility and extensibility than you get from QuickInserts alone.  You can read about these options in his excellent Studio Manual or in his blog here:

Autocorrection and autocompletion

Font Mapping

This is another feature found throughout most of the file types.  Its main value is in being able to specify a different target font to the one used in the source… and, more importantly, it will be used in the target document.  So whilst font adaptation in the Studio Options can change the font being used in the Editor only… the Font Mapping settings will allow you to change the font in the target document.

I’m not very familiar with fonts and their nuances, but the Studio help provides a good example.  Quoting it exactly:

Use the Font mapping settings to specify the fonts that Studio uses
in the target document to replace the fonts used in the source document.  
For example, if the target language is Chinese (Taiwan), you might want
to specify that PMingLiu font is used for all text in the target 
document, whatever fonts are used in the source document. If so, 
specify that for Chinese (Taiwan), all source language fonts are mapped to PMingLiu.  

By default, most target languages in this dialog box have only one font
and for that language all source fonts are mapped to one font.  

Note: The mapping is done when the target document is created (when the
document is saved as target), so you may not see the mapping in the 
editor.

I haven’t addressed every option in this filetype, or covered any of the numerous options that are available in some of the other file types, but I hope it’s provided a little insight into how you might be able to use the options available to improve the benefits you get from working with Studio now you know where to look… in case you didn’t already!


Unclean… who thought of that?


001

I spent the weekend at my Mother’s house the week before last and was digging around looking for photographs of myself when I was the same age as my son.  I found a few… a few I wouldn’t share with anyone else but my son!  What was I thinking with the baggy trousers and platform shoes…!

I also found some old Army pictures including these two taken during my basic training, which did an excellent job of shaking me out of my baggy trousers and platform shoes!  They also provided me with the most tenuous link yet into the translation environment, because I wanted to write about clean and unclean files.  I don’t know who came up with this terminology, but if I think about it, the description probably fits quite well.  But the first time I heard it I’m sure something like these photos would have been closer to mind!

So let’s define an unclean file first of all.  In our world this would be a file that contained the source text and the target translation… so essentially what we call a bilingual file.  In the olden days, not quite as old as my pictures, this would have been Bilingual DOC or TTX and today it’s more likely to be XLIFF.  However, we still see questions being asked because a translator has been asked to provide the unclean file for their customer and they’re not sure how to do this.  The task can become more confusing because technically an XLIFF is also an unclean file, but this term generally refers to Bilingual DOC rather than XLIFF, and probably rather than TTX as well.  This of course poses a problem if you have a modern translation environment because even if the tools you use are capable of creating an unclean file like this, the only real guarantee that the file will be suitable for your customer is if it is created using the same tools they are using.  So this means you need to be translating the unclean file in the first place and not a clean, or monolingual file.

SDL Legit!

002

I’m going to talk about Trados unclean files, and if you have just purchased Studio 2014 then you may not have a copy of SDL Trados Translators Workbench, which did create what we think about when we refer to unclean files.  But you do have the next best thing… maybe even a better thing!  You have a free OpenExchange application called SDL Legit!.  This application allows you to convert files that were supported by SDL Trados 2007 into a fully segmented Bilingual DOC or a TTX (I’ll come back to that phrase “fully segmented” in a bit).  It also allows you to use custom ini files when you do this, and you can also use a legacy translation memory to pre-translate the unclean file as much as possible before you start if you wish.  But I’m just going to talk about how you go about the translation workflow if you need to provide an unclean file to your customer and they only gave you a clean file to start with.

Let’s start with a sketch… the basic workflow for most users will be something like this:

003

If we examine these parts it’s quite straightforward really.

Receive “clean” source text

This refers to the file you are provided by your client.  For these purposes we are assuming that they have provided you with a monolingual source text.  If the unclean file has to be a TTX then this file could be anything that is supported by SDL Trados 2007.

If it has to be a Bilingual DOC (Bilingual RTF is not supported), then it can only be a DOC file.  DOCX will convert to TTX by default.

Convert to fully segmented “unclean” file

When the clean source file is converted to an unclean format you are presented with a new file that contains both the source and the target text.  This process is normally carried out with a Translation Memory and any matches for the source segments are copied into the target part of the file.  If there are no matches found then that segment will not have a corresponding target text and will be what is referred to as unsegmented.  In practice it will look like this:

TTX

Unsegmented

004

Segmented

005

Bilingual DOC

Unsegmented

006

Segmented

007

Now, Studio can handle unsegmented unclean files because it will complete the segmentation on its own, using Studio segmentation rules.  For simple text this won’t be a problem, but for some material, particularly if it’s heavily tagged, you may find that when you have completed the translation the differences in how Studio handles these tags might cause a problem for the old legacy Trados, which your client won’t be too happy about.

When you used SDL Trados Translators Workbench to segment the files you had an option to “segment unknown sentences” and this always made sure that the files would be fully segmented even if there was no match in the translation memory.  It did this by copying the source into the target.  So in the examples above you can see that the segmented translation units have a zero match value separating the source from target, but they do have a source and target even if they are the same text.
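The logic of segmenting unknown sentences can be sketched in a few lines of Python, using the {0>source<}match{>target<0} markers that Trados uses in a Bilingual DOC (the TM contents below are invented for the example):

```python
# Sketch of "segment unknown sentences": every source sentence gets a target,
# falling back to a 0% copy of the source when the TM has no match.
tm = {"Hello world.": ("Hallo Welt.", 100)}

def presegment(sentences):
    segments = []
    for s in sentences:
        target, score = tm.get(s, (s, 0))  # no TM match: copy source at 0%
        segments.append(f"{{0>{s}<}}{score}{{>{target}<0}}")
    return " ".join(segments)

print(presegment(["Hello world.", "New sentence."]))
```

The second segment comes out as a 0% unit with identical source and target, which is exactly the “fully segmented” shape shown in the screenshots above.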

SDL Legit! always fully segments the unclean files, whether you have a real translation memory to use or not, so you won’t be able to forget.  There is another application on the OpenExchange called SDL TTXit!.  This can only create bilingual TTX files and they are unsegmented.  So in my opinion you are better off using SDL Legit! for a safer round trip of your files.

When you convert your files to Bilingual DOC you will find that you now have two files in your folder.  One that has the same name as your original clean file with a DOC extension and one with a BAK extension.  The DOC is the unclean file, and the BAK is a backup of your original clean file.  So it is just the original DOC renamed as BAK.

When you convert to TTX the original clean file remains as it is and an additional file with the extension of TTX is created.  This is your unclean file.

Translate “unclean” file in Studio

Once you have your unclean file, you open it in Studio and you will see something like this:

008

This is the unclean Bilingual DOC file in Studio.  The 100% matches are segments that were translated during the conversion to an unclean file in SDL Legit! based on the contents of the translation memory I used.  The 0% matches are untranslated and the source is just copied into the target.  These are most likely the segments you need to translate.

The TTX will be similar, although you start to see some of the differences between files and versions.  So here the unsegmented sentences still have the source copied to target, but they are given a draft status.  You can also see that segment #1 had a number in it that the Bilingual DOC file didn’t segment, so it was ignored by the Studio filetype, whereas the TTX brought it in unsegmented and the Studio TTX filetype handled it with an untranslated status:

009

But in both cases you would simply translate the file.

Save target file which will be a translated “unclean” file

Once the translation is complete you save the target file from Studio.  When you do this, depending on the original unclean file, you will have different options.  So if the original was a TTX then you will see this:

010

You now have the opportunity to save the TRADOStag Document which is the fully translated TTX you wanted for your customer.  Or, you can save the Original File Format which is the clean file you started with only now it contains the target text.  So for TTX files you can save both the clean and the unclean file from Studio.

If the original unclean file was a Bilingual DOC then you won’t see this option.  The target file will be the fully translated unclean file your customer wanted.  Studio will not clean up this type of unclean file.

“Clean” the “unclean” file if required

If you need to provide a clean file as well as an unclean file, and you don’t have the legacy Trados tools then no problem… help is at hand with another free tool on the OpenExchange called tw4winClean.  This tool will remove the segmentation mark up and leave you with a clean file as well as retaining the original unclean file, so you can still provide both files to your customer.
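As a rough illustration of what cleaning does to that markup (much simplified: tw4winClean also handles the hidden-text styles in the real document), a regular expression keeps the target and drops the source and delimiters:

```python
import re

# Simplified "clean up" of Trados bilingual markup:
# {0>source<}match%{>target<0}  becomes just the target text.
SEGMENT = re.compile(r"\{0>.*?<\}\d+\{>(.*?)<0\}", re.S)

unclean = "{0>Hello world.<}100{>Hallo Welt.<0} {0>New sentence.<}0{>Neuer Satz.<0}"
print(SEGMENT.sub(r"\1", unclean))  # -> Hallo Welt. Neuer Satz.
```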

All that sounds complicated..!!

After writing all of that, and I could think of a host of other things I’d love to tell you, it certainly does sound a little complex.  But it’s not really.  So to demonstrate this in real time I prepared a short video just to show you the process from start to finish, and all without using the legacy SDL Trados 2007 at all.

A final note would be to direct you to this article where you can read a lot more about working without Trados – Life without Trados!  And of course if you do have any questions or think I should have covered something else feel free to let me know in the comments.

