Data handling in zkanji mini-series, Part III.

In the past two entries I described what is in the data files included with the program. In this part let me write a bit about data that is generated by the user, which must be saved and restored.

Since Vista, files cannot be created in some folders by running programs, unless they are given administrator privileges or run by an administrator. For example the Program Files folder is one such location. Unless zkanji is “installed” in such a folder, it keeps user data files in the “data” folder which is next to the executable. Otherwise user data files are saved in the user’s document folder. There can be 2 user files for each user dictionary. Though if you are only using the English dictionary, there is a single file only, as there is no dictionary data to be saved.

As I wrote in the previous entry, user made dictionaries are saved in exactly the same format as the main dictionary data file, but the kanji data, which stay the same for all languages (so everything apart from the meaning of a kanji) are not written. There is no JLPT data stored in these files either. The other file saved is the group / study progress file with the .zkd extension. User dictionary is unnecessary for English, but the group file is created even in that case. The group file obviously stores which kanji are moved to kanji groups, and which words are moved to word groups. It must also contain study progress, which is mainly a list of words and their standing in some study group or the long-term study list.

It is less obvious how a word or its identifier is saved and loaded in groups and the study list. One possibility would be to store a unique index number for each word (and probably meaning where it makes sense) that gives the word’s position in the dictionary, but unfortunately this approach wouldn’t work. With each update the English dictionary is also updated. I have no control over how that is done, and unfortunately when the JMDict data changes, it is very common that the index of words change as well. If I just saved the user data with an index number, once the program and its dictionary is updated, most words in word groups and study groups would not be the ones that should be there. The best workaround I could find was saving both the kanji and the kana form of every single word that was added to a group or to a study list. This can increase the size of user files considerably, but what is more important, influences loading times, loading huge user data files much slowly than it would otherwise be necessary. Even if you keep zkanji on an SSD drive, it’s not reading the file to memory which is slow, but looking up every word in the user data in the dictionary, to find their current indexes. This is a pity even more, since updates are rare nowadays (sorry) and indexes only change if the dictionary changes as well. The situation is even worse though, because the meaning of words often change in the dictionary as well. What was the first meaning could be the third in the next release, or some meaning might be split to several meanings. Unfortunately I couldn’t find any solution to this problem (until now).

Once I release the next zkanji, what I wrote in this entry will be out of date. The next version, which is mostly already done for more than 6 months now, does things differently and more reliably, with the price of taking up a lot more disk space. This “lot more” is in the tens of megabytes, an amount which in my opinion is nothing to fret about. In the next part I’ll try to explain the changes. I have already written about what the user will see when the dictionary is updated, but this time I will explain how it concerns data files.

Filed under: Under-the-hood Tagged: data files, user data

Data handling in zkanji mini-series, Part III.

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112