
Data handling in zkanji mini-series, Part V.

The first problem I tried to solve was that the main English dictionary was not editable. Of course I could have simply allowed editing the main dictionary like any user dictionary, saving it every few minutes (if the auto save option is on). I didn’t go with this because some features might need the original main dictionary. For example, the example sentence data relies on word indexes in the main dictionary, and if those indexes changed, the program would crash when looking up the sentences.

This is not the real reason, though. The real reason is that I don’t remember which parts would break (if any) if the dictionary changed, so I avoided it. To be even more precise, even the way I solved this problem (for the next release) won’t allow deleting words that were not added by the user. I could have made the program this way from the beginning. My suspicion, though, is that there wouldn’t be a problem even if words were deleted from the main dictionary (apart from breaking the example sentences, which will be fixed in the next release or the one after that). I just never had enough patience to check.

So the next release will allow changing the main dictionary as well. It changes the data, but keeps the original words in a separate list, in case they have to be reverted. Reverting to the original data, or using it in any other way, is not implemented yet, but I didn’t want to break anything for future releases, so I decided to keep the originals anyway. Their list will be saved with the changed English dictionary data.
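The idea of keeping the originals next to the edited data can be sketched as follows. This is only an illustration in Python, not zkanji’s actual code; the class name `EditableDictionary` and its fields are hypothetical, and entries are plain strings for brevity.

```python
class EditableDictionary:
    """Hypothetical in-memory word list that remembers installed entries."""

    def __init__(self, words):
        self.words = list(words)  # current (possibly edited) entries
        self.originals = {}       # word index -> entry as it was installed

    def change(self, index, new_entry):
        # Remember only the installed version, not intermediate edits.
        self.originals.setdefault(index, self.words[index])
        self.words[index] = new_entry

    def is_modified(self, index):
        return index in self.originals


dictionary = EditableDictionary(["to eat", "to drink"])
dictionary.change(0, "to eat (food)")
dictionary.change(0, "to eat; to consume")
```

After both edits, `dictionary.originals` still holds only the installed entry for index 0, which is what a later revert feature would need.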

-

From the next release, there will be three files for the main dictionary instead of one. I will keep “zdict.zkj”, but from now on, it will only hold data about the kanji, which is shared among all dictionaries. For example, stroke count does not depend on the target language. Only the kanji meanings depend on the language, and those are still kept in this file. Of course changed meanings will be saved in user dictionaries like before. I have decided to keep the kanji data separate from the word data, because handling the other files will be simpler this way.

The other two files will be “English.zkj” and “English.zkd”. Both hold the word dictionary data, and they will be identical at first. The .zkj file will be the dictionary as it was installed, and the .zkd will be a copy of the .zkj. You might have noticed that .zkd is the extension for user dictionaries. This is because “English.zkd” will be handled just like any other user dictionary. Any user changes will be reflected in it, but not in “English.zkj”. When the program starts, it will check for the user version of the dictionary file. If it is found, the program will load that one; otherwise it will load the original and create a copy with the .zkd extension. This wastes around 30 MB of disk space.
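The startup check described above could look roughly like this. It is a minimal sketch in Python for illustration only; the function name `open_main_dictionary` and the `data_dir` parameter are assumptions, and only the two file names come from the text.

```python
import shutil
from pathlib import Path


def open_main_dictionary(data_dir):
    """Return the path of the dictionary file that should be loaded."""
    original = Path(data_dir) / "English.zkj"   # as installed, never edited
    user_copy = Path(data_dir) / "English.zkd"  # receives all user changes

    if user_copy.exists():
        return user_copy

    # First run after installation: load the original and create the
    # user copy that later edits will be saved to.
    shutil.copyfile(original, user_copy)
    return original
```

Since the two files are identical right after the copy, it does not matter which one is read on the first run; what matters is that changes are only ever written to the .zkd file.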

I could have made the program either update the original file when the user changes it, or delete it once a copy has been created, or something similar, so that only one version would be present. The original file can’t be kept with its original name though (I’ll explain why in a minute), so renaming it or creating a copy was the only viable option. It is probably not necessary to keep the file with the original name, but it makes things a bit easier.

The reason for having an original and a user version of the same data is the simple fact that the setup program (and the zip package) contains the dictionary data under the name “English.zkj”, so updating the program could very easily delete any changes the user has made to his or her own English dictionary. Both the original and the user data will contain a date. When a future release of zkanji runs, it will compare the dates in the two files. If they are identical, it will run as usual. If they differ, it will know that the program was updated, or at least that the original dictionary file was replaced with a different one, and it will bring up a dialog where the user can check which of the words that were added to a group or study list differ, so he or she can resolve any issues.
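A rough sketch of the date check follows, again in Python and only as an illustration. How the date is stored in the files and how zkanji decides that two words differ are not described here, so both function names, the position-by-position comparison, and the use of plain strings as entries are assumptions.

```python
from datetime import date


def needs_update_dialog(original_date, user_date):
    """True when the installed dictionary was replaced since the copy was made."""
    return original_date != user_date


def differing_group_words(new_original, user_words, grouped_indexes):
    """Indexes used in groups or study lists whose entry no longer matches.

    Assumption: entries are compared position by position, and an index
    missing from the new original counts as a difference.
    """
    return [
        i for i in sorted(grouped_indexes)
        if i >= len(new_original) or new_original[i] != user_words[i]
    ]


if needs_update_dialog(date(2013, 1, 1), date(2013, 3, 1)):
    to_review = differing_group_words(
        ["a", "b", "c"], ["a", "x", "c", "d"], {1, 2, 3}
    )
```

Only the words the user actually put into a group or study list are checked, which keeps the dialog short even when the whole dictionary was replaced.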

Keeping a separate user English dictionary file will avoid a lot of difficulties the current release has to deal with. For example, the file which keeps the data for word groups will no longer have to store both the kanji and kana forms of words; it will be enough to save an index into the user dictionary. It will also be possible to create word groups where each entry can hold more than one meaning of a word. An update won’t break anything, since the user will be notified of changes and will be able to resolve them.
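To illustrate the difference, a group entry that stores only an index (plus the selected meanings) might look like the sketch below. It is written in Python for illustration; the `Word` and `GroupEntry` types and the `resolve` helper are hypothetical and do not reflect zkanji’s real data structures.

```python
from dataclasses import dataclass


@dataclass
class Word:
    kanji: str
    kana: str
    meanings: list


@dataclass
class GroupEntry:
    word_index: int         # position of the word in the user dictionary
    meaning_indexes: tuple  # which meanings of that word belong to the group


def resolve(entry, dictionary):
    """Look up the written forms and selected meanings for a group entry."""
    word = dictionary[entry.word_index]
    selected = [word.meanings[i] for i in entry.meaning_indexes]
    return word.kanji, word.kana, selected


user_dictionary = [Word("食べる", "たべる", ["to eat", "to live on"])]
entry = GroupEntry(word_index=0, meaning_indexes=(0, 1))
```

The group file then needs only the two small fields of `GroupEntry`; the kanji and kana forms are read from the user dictionary whenever they are needed.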

-

The next one will probably be the last in this mini-series. I will write about what is not ready yet for a new release (apart from the big changes I just described, which need a lot of testing, though I’ll probably ask for help with that), and why it is so challenging for me to finish it. I can almost imagine how excited you might be, waiting for the last part to be finally here!


Filed under: Development, Under-the-hood Tagged: data files, user data
