DH 2016 Abstracts

A Management of Personal Name with Alternate Name and its Searching for Japanese Historical Study

1. 1. Introduction

The basis of historical study or historical understanding consists of collecting historical materials (mainly literature materials such as old documents, old diaries, …), precise reading of the materials, and source criticism. In order to perform them, an identification of a personal name is one of important methods or works, and the researchers of history cannot avoid it. The personal name identification is not simple issue, because there is a diversity in the name representations in the materials. The main representing patterns of the diversity are the follows:

a) Written real or original name

b) Written first name only

c) Written nickname, epithet, or alias

d) Written omitted name

e) Written role name

f) Written using different characters

g) Described Kao (which is a stylized signature or a mark.)

Examples of the representations of “伊集院忠棟 (Ijuin Tadamune)”, who is a senior statesman of “島津家 (Shimazu family)” and the relatives of the family and is in the 15th century in Japan, there are “伊集院” (which is his family name), “忠棟” (which is his first name), “幸侃” (which is a nick name, is often appeared in the old documents), “伊右衛門大夫” (which is a nick name), “伊右”, “右衛門”, “伊大夫” (which are his nick names called by familiar persons), “忠金” (which is his original name). Here is a difficult problem that the various represented names should be identified. In the above example, if you understand the various alternate names can be identified with “伊集院忠棟”, the problem is not hard. However, in practice, there are no persons (including researchers of the history) who know and understand all historical persons. For the solution of the problem, we consider that the results of personal name identifications which can be performed by researchers of history should be managed. Furthermore, in the search against historical materials, if the results can be available, the performance of the search can be surely improved compared to simply full-text search.

In the paper we introduce a management method of personal names and the alternate names of the persons and a search method using the managed names. In (Ho, 2015) and (Bol et al., 2015), personal names can be extracted and tagged automated against target documents based on China Biographical Database (CBCB; http://isites.harvard.edu/icb/icb.do?keyword=k16229) as a biographical dictionary. Unfortunately, there are no exhaustive the encyclopedias or dictionaries for the names of Japanese historical persons. Moreover, methods introduced in (Ho, 2015) and (Bol et al., 2015) can be performed better if you can treat a document which is a secondary source like a "地方志 (difangzhi)" in which almost personal names indicates its real name. Most of documents which we treated in the work is primary source and hardly have real names of the person.

2. 2. Extraction of personal names and alternate names

At first, in order to collecting personal names, we used “上井覚兼日記 (Uwaikakken nikki)” which is a diary of Japanese medieval period (from 1574 until 1586) written by “上井覚兼 (Uwai kakken)” who is a senior statesman of “島津家” of Japan. For the historical study in Kyushu (which is a local area of Japan) or “島津家” in medieval period, the diary is one of important historical materials and Japanese national treasure. The text of the diary has been stored in “The Full-text Database of the Old Japanese Diaries” which has been published by Historiographical Institute, The University of Tokyo. In the text the number of characters is about 1.4 million (for 1777 days; note that there are days which he was not written in the diary). The format of the text is very simply, because the text is just plain text and does not have tags such as XML, TEI. The sample is as follows:

…一、此朝、入来院（重豊）殿太刀を、東郷（重尚）殿次に拳可有之由被申候、御老中よりハ、東郷・祁答院・入来之事ハ同家にて候間、東郷之次に者根占殿（禰寝重長）太刀を可被召成之由、…

This is a part of the text in the diary of “天正2年8月１日” (which indicates A.D. 1574-08-17)”. There are three persons (“入来院 (Irikiin)”, “東郷(Togo)”, “根占(Nejime)”) in the part. If an alternate name against written name could be solved, the alternate name was added using parenthesis.

The representation can be understood by human (who can read and understand the sentence), but machine can not be solved if the machine doesn’t know or understand the pattern. Due to machine usable, we extracted a written name, an alternate name of the name and the date and we managed the result as a set. As shown in the example, a real name or well-known name is hardly written in the Japanese historical materials, and alternate name added by researcher what indicates real name in most cases. The added name can be used for personal name identification, because this is controlled by the researchers who added the alternate names, and the notation is consistent if the same person. We performed the identification method and could obtain a name pair of 520 sets. In the process of the method, a method of a personal name extraction was needed. We could extract personal names using Machine Leaning method (which consists of an appearance patterns of the names and SVM (Support Vector Machine)). Figure 1 shows examples of the appearance pattern, which an expression of a sequence, an extraction pattern and an extracted result. We used SVM to judge whether the extracted results indicate personal name or not. The SVM results were fed back to the appearance patterns, and we performed the extraction based on appearance patterns and judgment with SVN again. The feedback was preformed several time in the work.

Figure 1 Name extraction patterns

We prototyped text search system which supports alternate names using the above constructed personal pairs rather than simple text matching. In the search, for example you queries “忠棟”, then you can obtain the results including “忠棟” as a string, “伊集院忠棟” which is controlled name (well-known name or real name), and alternate names such as “幸侃”, “伊右衛門大夫”, “伊右”, “右衛門”, “伊大夫”, “忠金” (which are mentioned above).

3. 3. Conclusion

Personal name extraction method which we introduced above is useful only target document is “上井覚兼日記 (Uwaikakken nikki)”. In order to extract more generalization, preparing a pattern suitable to each historical material is necessary. We constructed a database which can be stored the personal name pairs. Currently we also have been collecting personal names from other texts and databases, and storing it in the database. The data in the database indicates the results as an identification of personal name. The data can assist to identify personal names in a material which reading comprehension has been not yet done. We expect that the method can be useful to progress of the study of Japanese history.

Bibliography

Ho, H. I. (2015). MARKUS – A Fundamental Semi-automatic Markup Platform for Classical Chinese. Proceedings of the 2015 International Conference on Digital Humanities.
Bol, P., Liu, Ch.-L., and Wang, H. (2015). Mining and Discovering Biographical Information in Difangzhi with a Language-Model-based Approach. Proceedings of the 2015 International Conference on Digital Humanities.