Channel: filetypes – multifarious

The JSON files…


The JSON files… not really related to Jason Voorhees of course, but for some users who have received these file types for translation the problem of how to handle them and extract the appropriate text may well seem like an episode of Friday the 13th!  I’ve seen a few threads in the last couple of weeks sharing various methods for handling these files, ranging from opening them in MS Word and applying a hidden style to the parts you don’t want, to asking vendors to create variations on JavaScript filetypes.  But I think Studio offers a much simpler mechanism for handling them out of the box.

So what are these file types and how can you handle them with Studio 2014, or even 2009/2011?  In this article I’m going to look at the regex filetype as this is very well suited to files like this, but before we get into that detail let’s take a look at what they are.

JavaScript Object Notation

This filetype is a simple text-based format that was introduced around 2001, so it’s nothing new, and it’s used as a method of sharing data between applications irrespective of the programming language used.  For those of you who are interested in this stuff, it was derived from the ECMAScript programming language and you can find the full specification for it on the JSON website.

I like to read this stuff to an extent, but really I stop at the point where I can figure out how to get at the translatable text.  The format of these JSON files is based on a simple structure which I have taken straight from the JSON specification:

02

The four components on the left represent the components of the structure, and the image on the right is an example file coloured to show you which components are which.  I did this because these files can contain any kind of data, and the important part is for you to know which parts are translatable.  If you translate something that should not be translated then it’s likely that the file won’t be fit for purpose when you give it back to your client.

How do you know what should be translated and what should not?  You ask your client!!
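If you want something concrete to send to your client with that question, a list of every string value in the file is a handy starting point.  Here is a minimal Python sketch of that idea.  The SAMPLE data is a cut-down, hypothetical version of the example used in this article, and list_string_values is a helper name I made up for illustration; it has nothing to do with Studio itself.

```python
import json

# Hypothetical sample, cut down from the kind of file discussed in this article.
SAMPLE = """
{
  "FileId": "45b1b4b4-32ae-4a33-b2bb-b35b6940d348",
  "Film": "Friday the 13th",
  "Year": 1980,
  "Recorded": true,
  "Synopsis": "Friday the 13th is a 1980 American slasher film etc."
}
"""


def list_string_values(node, path=""):
    """Return (path, text) pairs for every string value in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(list_string_values(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(list_string_values(value, f"{path}[{index}]"))
    elif isinstance(node, str):
        # Numbers and true/false are skipped; only strings can hold text.
        found.append((path, node))
    return found


candidates = list_string_values(json.loads(SAMPLE))
for path, text in candidates:
    print(f"{path}: {text}")
```

The output is one line per string value, so the client can simply tick the ones that need translating.  Numbers and true/false values are left out because they can never hold translatable text.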

Handling the file in Studio

Once you know what’s translatable the next step is to create a filetype in Studio to handle this.  It’s actually quite straightforward using the regex filetype.  The steps are like this:

Create new filetype

Go to File -> Options -> Filetypes and then select New…  Select the “Regular Expression Delimited Text” type and click on OK.

03

Once you’ve done this you give the new filetype a little bit of information:

04

  1. Filetype name: you can call this whatever you like.  I called it JSON
  2. File type icon: this is completely optional, but as I have never actually done this before in Studio I thought I’d try it!  If you want the icon file I used you can download it from here.
  3. File dialog wildcard expression: this is just the file extension written like this *.json so that Studio knows to use this filetype when you open a JSON file.

Click on OK and that’s it.  Your filetype is created!  Not too hard, was it?  You can now open a JSON file and translate it in Studio.  However, using my example file (which you can find here if you would like to play with it, and you don’t scare easily), the result isn’t too clever because the default settings will just extract everything in the file, like this:

05

So the next thing you have to do is tell the filetype what you actually want to see in the editor for translation.  This is why it’s important to speak to your client so you understand the requirements of the job.

The files I have seen so far all seem to follow the same principle for the translatable text.  You have a String at the start followed by a Structural colon and then another String or Number, finally ending in a Structural comma like this:

"FileId": "45b1b4b4-32ae-4a33-b2bb-b35b6940d348",
"FileName": "thejsonfiles.docx",
"Language": "French (France)",
"Film": "Friday the 13th",
"Year": 1980,
"Director": "Sean S. Cunningham",
"Writer": "Victor Miller",
"Producer": "Sean S. Cunningham",
"Synopsis": "Friday the 13th is a 1980 American slasher film etc.",
"UtcDateTime": "2015-03-16T22:52:36.1622185Z",
"Recorded": true,

The first string represents an identifier of some sort, similar to an element name in an XML file.  The second string contains the translatable text.  So all you have to do is extract the contents of the second string.  We do this using our old friend the regular expression.  However, you still need to know if all of them are translatable or only some, and then once you do you can create your expression to suit.  The expressions go in here:

06

You need two: an opening pattern and a closing pattern.  The translatable text will be the text that is in between these patterns.  So in a line of text that contains code you don’t wish to translate you can move the text found by the opening pattern into the hidden part of the editor so the translator doesn’t have to deal with it; similarly for the closing pattern.  So using Regex Buddy (my preferred tool for this stuff) let’s look at a couple of examples and what they would extract.  If you don’t understand how to use regular expressions I’d really recommend you learn a few basics; they are incredibly useful.  You can find four articles here on how they can be used in Studio that I have written in the past… starting with simple explanations and leading up to slightly more complex examples.

Extract all the second strings
".*": "
",$

The first line is the opening pattern and the second line the closing pattern.  The first line basically means look for a quote, then look for anything and keep looking until you find a quote followed by a colon, then a space and then another quote.  (Because .* is greedy it actually runs to the last such sequence on the line, which makes no difference in a file like this one.)  So this opening pattern should select the following segments only and make the coloured parts structural, i.e. hidden in the editor:

07

The closing pattern is just going to find the last quote and comma and move that into the structure so it’s also hidden in the editor.  So when you add these rules into the filetype you see this on opening my test JSON file:

08

This is much better because now all the JSON structural elements are gone and I’m only getting the second string extracted for translation.  However, some of these don’t need to be translated at all so I can further refine my filetype by using a different rule.
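If you’d like to try these two patterns outside Studio, here is a minimal Python sketch.  It uses Python’s re module rather than Studio’s own regex engine, and the extract helper, along with the rule that a line matching neither pattern is simply skipped, is my assumption for illustration and not a description of how the filetype works internally.

```python
import re

OPENING = re.compile(r'".*": "')  # opening pattern from the article
CLOSING = re.compile(r'",$')      # closing pattern from the article

# A few lines taken from the sample shown earlier in the article.
LINES = [
    '"FileName": "thejsonfiles.docx",',
    '"Film": "Friday the 13th",',
    '"Year": 1980,',
    '"Recorded": true,',
]


def extract(line, opening, closing):
    """Return the text between the opening and closing match, or None."""
    start = opening.search(line)
    if start is None:
        return None  # no opening pattern: nothing offered for translation
    end = closing.search(line, start.end())
    if end is None:
        return None  # opening found but never closed
    return line[start.end():end.start()]


results = [extract(line, OPENING, CLOSING) for line in LINES]
for line, text in zip(LINES, results):
    print(f"{line!r} -> {text!r}")
```

The two string values come out clean, and the number and the true value are skipped because nothing on those lines matches the opening pattern.  Note that FileName is still extracted even though it doesn’t need translating.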

Extract only named strings
"(Film|Director|Writer|Producer|Synopsis)".*?"
",$

This time I am saying look for a quote and then find any of the words between the pipe symbols followed by a quote, and then anything at all up to the very next quote.  So in effect I extract this:

09

This is because I don’t think I need to translate any of the other strings at all.  In reality I guess I would only translate Film and Synopsis from this file, but this is just an example!  So have a play and you’ll see how simple this is to work with.  However, if the file contains many different translatable strings then the list of identifiers is going to get longer and longer.  In this case it might be easier to specify what you don’t want instead!
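When the list of identifiers does start to grow, typing the alternation by hand gets error-prone.  One option is to generate the opening pattern from a plain list of names and paste the result into the filetype.  This is only a sketch in Python: opening_pattern_for is a hypothetical helper of mine, and the quick check at the end uses Python’s re module, not Studio’s engine.

```python
import re


def opening_pattern_for(keys):
    """Build an opening pattern that accepts only the given identifiers."""
    # re.escape protects any identifier that contains regex metacharacters.
    alternatives = "|".join(re.escape(key) for key in keys)
    return f'"({alternatives})".*?"'


pattern = opening_pattern_for(["Film", "Director", "Writer", "Producer", "Synopsis"])
print(pattern)

# Quick check against one wanted and one unwanted line from the sample.
wanted = '"Director": "Sean S. Cunningham",'
unwanted = '"FileName": "thejsonfiles.docx",'

match = re.search(pattern, wanted)
# Whatever follows the opening match, minus the closing pattern, is the text.
text = re.sub(r'",$', "", wanted[match.end():])
print(text)
print(re.search(pattern, unwanted))
```

The generated pattern is character for character the one shown above, so it can be pasted straight into the opening pattern field.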

Extract everything apart from named strings
"(?!FileId|FileName|Language|UtcDateTime)\w+".*?"
",$

With this expression we are using something called a negative lookahead… wonderful names but quite sensible.  This means take a look ahead of you and see what’s coming; if it doesn’t match the following text then it’s what we want.  So it’s the opposite of a positive lookahead, where it would have to match what it found.  It maybe takes a little getting your head around, but have a play!

So the expression says look for a quote and then look ahead to see if any of the following words between the pipe symbols match.  If they do then don’t use this segment, but if they don’t then look for any word character, one or more, followed by a quote, and then anything at all up to the very next quote.  So in effect I’m extracting exactly the same as before.  But this time I used a rule to specify what I didn’t want rather than what I wanted!
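You can convince yourself that the two approaches really do pick out the same text with a small Python sketch.  As before, this uses Python’s re module as a stand-in for Studio’s regex engine, and the extract helper and its skip-if-no-match behaviour are my assumptions for illustration.

```python
import re

# Opening patterns from the article: named strings vs. negative lookahead.
ALLOW = re.compile(r'"(Film|Director|Writer|Producer|Synopsis)".*?"')
DENY = re.compile(r'"(?!FileId|FileName|Language|UtcDateTime)\w+".*?"')
CLOSING = re.compile(r'",$')

# Lines taken from the sample shown earlier in the article.
LINES = [
    '"FileId": "45b1b4b4-32ae-4a33-b2bb-b35b6940d348",',
    '"FileName": "thejsonfiles.docx",',
    '"Language": "French (France)",',
    '"Film": "Friday the 13th",',
    '"Year": 1980,',
    '"Director": "Sean S. Cunningham",',
    '"UtcDateTime": "2015-03-16T22:52:36.1622185Z",',
    '"Recorded": true,',
]


def extract(line, opening):
    """Return the text between the opening match and the closing match."""
    start = opening.search(line)
    if start is None:
        return None
    end = CLOSING.search(line, start.end())
    if end is None:
        return None
    return line[start.end():end.start()]


allow_result = [extract(line, ALLOW) for line in LINES]
deny_result = [extract(line, DENY) for line in LINES]

for line, text in zip(LINES, deny_result):
    print(f"{line!r} -> {text!r}")
print("Same result from both rules:", allow_result == deny_result)
```

Year and Recorded drop out of the lookahead version even though they aren’t in the exclusion list, because the opening pattern needs a quote after the colon and those lines don’t have one.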

Phew… makes your head go giddy!  But in Studio I now see this:

10

Exactly what I wanted.  The beauty in this of course is that the simplicity behind the JSON concept translates nicely into the simplicity of the regex filetype!

A final note here on SDL Passolo after Daniel reminded me!  If you want to have full native functionality with JSON files out of the box then you should really create your translation projects in SDL Passolo in the first place.  Here you have full control out of the box over all aspects of this filetype, including developers’ comments etc.  You can read a little about this here.  You will need the full version of Passolo to create the projects in the first place as the free Translator Edition will not allow this.  But if you are serious about working with these filetypes then it’s worth the effort.  So this article provides, I hope, a good workaround for anyone who is sent JSON files and doesn’t have the full version of SDL Passolo.  It’s perfect for the occasional job but perhaps lacking if you are going to make a habit of it and need to accommodate more variations in the content than I have shown here.


