In the past two entries I described what is in the data files included with the program. In this part let me write a bit about data that is generated by the user, which must be saved and restored.
Since Vista, files cannot be created in some folders by running programs, unless they are given administrator privileges or run by an administrator. For example the Program Files folder is one such location. Unless zkanji is “installed” in such a folder, it keeps user data files in the “data” folder which is next to the executable. Otherwise user data files are saved in the user’s document folder. There can be 2 user files for each user dictionary. Though if you are only using the English dictionary, there is a single file only, as there is no dictionary data to be saved.
As I wrote in the previous entry, user made dictionaries are saved in exactly the same format as the main dictionary data file, but the kanji data, which stay the same for all languages (so everything apart from the meaning of a kanji) are not written. There is no JLPT data stored in these files either. The other file saved is the group / study progress file with the .zkd extension. User dictionary is unnecessary for English, but the group file is created even in that case. The group file obviously stores which kanji are moved to kanji groups, and which words are moved to word groups. It must also contain study progress, which is mainly a list of words and their standing in some study group or the long-term study list.
It is less obvious how a word or its identifier is saved and loaded in groups and the study list. One possibility would be to store a unique index number for each word (and probably meaning where it makes sense) that gives the word’s position in the dictionary, but unfortunately this approach wouldn’t work. With each update the English dictionary is also updated. I have no control over how that is done, and unfortunately when the JMDict data changes, it is very common that the index of words change as well. If I just saved the user data with an index number, once the program and its dictionary is updated, most words in word groups and study groups would not be the ones that should be there. The best workaround I could find was saving both the kanji and the kana form of every single word that was added to a group or to a study list. This can increase the size of user files considerably, but what is more important, influences loading times, loading huge user data files much slowly than it would otherwise be necessary. Even if you keep zkanji on an SSD drive, it’s not reading the file to memory which is slow, but looking up every word in the user data in the dictionary, to find their current indexes. This is a pity even more, since updates are rare nowadays (sorry) and indexes only change if the dictionary changes as well. The situation is even worse though, because the meaning of words often change in the dictionary as well. What was the first meaning could be the third in the next release, or some meaning might be split to several meanings. Unfortunately I couldn’t find any solution to this problem (until now).
Once I release the next zkanji, what I wrote in this entry will be out of date. The next version, which is mostly already done for more than 6 months now, does things differently and more reliably, with the price of taking up a lot more disk space. This “lot more” is in the tens of megabytes, an amount which in my opinion is nothing to fret about. In the next part I’ll try to explain the changes. I have already written about what the user will see when the dictionary is updated, but this time I will explain how it concerns data files.
Filed under: Under-the-hood Tagged: data files, user data
