The LODEM Project - LOD for Enhancing Manga contents

What is LODEM (LOD for Enhancing Manga content)?

Goal

Manga is becoming increasingly accessible on the Web through e-books, posting and sharing sites, social networking services, and other channels. Most manga on the Web is provided as bitmap images, in which each page (or group of pages) is represented as a grid of fine dots rather than as machine-readable text. As a result, manga content is harder to search than text or video content on the Web.

The purpose of this project is to enable fuller use of manga content in the digital environment. We are creating metadata (i.e., data about data) that describes the content of manga images, and developing the technology to produce it. Specifically, the proposed metadata makes it possible to search manga content, for example by text or by scene.

Characteristics

In Japan, 270,000 manga have been published, and more than 10,000 new manga are published every year (according to the Agency for Cultural Affairs Media Art database). Creating detailed metadata for such a huge number of manga is difficult and labor-intensive.

To create a vast quantity of manga metadata efficiently, this project adopts a two-step approach that ensures both the quantity and the quality of the metadata: in the first step, software creates the metadata automatically; in the second step, people manually correct the errors in the created metadata.
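The two-step workflow can be pictured with a minimal Python sketch. Everything named here (`FrameCandidate`, `Review`, `apply_reviews`, the frame identifiers, and the pixel bounding-box layout) is a hypothetical illustration, not the project's actual data model or tooling.

```python
from dataclasses import dataclass
from typing import Optional

Box = tuple[int, int, int, int]  # assumed layout: (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class FrameCandidate:
    """Hypothetical record for one machine-proposed frame area (step 1)."""
    frame_id: str
    box: Box


@dataclass(frozen=True)
class Review:
    """Hypothetical record for one human judgment on a candidate (step 2)."""
    frame_id: str
    is_correct: bool
    corrected_box: Optional[Box] = None


def apply_reviews(candidates, reviews):
    """Return frame_id -> box for frames that a person has confirmed or corrected.

    Candidates with no review, or rejected without a correction, are left out.
    """
    by_id = {review.frame_id: review for review in reviews}
    verified = {}
    for candidate in candidates:
        review = by_id.get(candidate.frame_id)
        if review is None:
            continue  # not yet checked by a person
        if review.is_correct:
            verified[candidate.frame_id] = candidate.box
        elif review.corrected_box is not None:
            verified[candidate.frame_id] = review.corrected_box
    return verified


# Step 1: machine-generated candidates (made-up values).
candidates = [
    FrameCandidate("p1-f1", (0, 0, 400, 300)),
    FrameCandidate("p1-f2", (0, 300, 400, 610)),
    FrameCandidate("p1-f3", (0, 600, 400, 900)),
]
# Step 2: human reviews; the third candidate has not been reviewed yet.
reviews = [
    Review("p1-f1", True),
    Review("p1-f2", False, (0, 300, 400, 600)),
]
verified = apply_reviews(candidates, reviews)
```

In this sketch, unreviewed candidates are withheld rather than published, which is one possible way to keep machine errors out of the final metadata.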

The metadata created in this project will be published as Linked Open Data, a form of open data that computers can read on the Web.
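As a rough illustration of what machine-readable frame metadata might look like, the following Python sketch builds a JSON-LD document for a single frame. The vocabulary (`ex:Frame`, `ex:partOfPage`, `ex:order`, and the coordinate properties), the `example.org` IRIs, and the function name `frame_to_jsonld` are placeholders assumed for this example; the project's actual vocabulary and serialization are not described here.

```python
import json


def frame_to_jsonld(page_iri, frame_index, box):
    """Describe one frame as a JSON-LD node (hypothetical vocabulary)."""
    left, top, right, bottom = box
    return {
        # example.org is a placeholder namespace, not a real vocabulary.
        "@context": {"ex": "http://example.org/manga-vocab#"},
        "@id": f"{page_iri}#frame{frame_index}",
        "@type": "ex:Frame",
        "ex:partOfPage": {"@id": page_iri},
        "ex:order": frame_index,
        "ex:left": left,
        "ex:top": top,
        "ex:right": right,
        "ex:bottom": bottom,
    }


doc = frame_to_jsonld("http://example.org/manga/vol1/page3", 1, (0, 0, 400, 300))
print(json.dumps(doc, indent=2))
```

Because every frame and page is identified by an IRI, other datasets could in principle link to an individual frame, which is the main idea behind publishing metadata as linked data.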

Subject 1. Metadata Creation for Manga Frame

Task 1. Identifying whether manga frame areas are correct (time consumption: around 2 seconds per task)

This is a microtask for identifying whether the frame areas extracted from manga images by an automatic frame identification tool are correct.

Diverse approaches have already been proposed for automatically identifying frames in manga images. However, each existing approach has its own pros and cons, and no general-purpose approach is in widespread use.

In this project, we have developed an automatic frame identification tool named MangaCV, which uses an improved, high-precision identification algorithm. Even so, the frame areas identified in our experimental data created with this tool contained many errors. In this microtask, workers are assigned to identify whether a frame area is correct and to create correct frame data.

*Manga images are from「ブラックジャックによろしく」(Author: Shuho Sato)

Perform Task

Task 2. The order of frames in a page (time consumption: around 5 seconds per task)

The order of frames is essential for understanding and processing manga content. However, frame order is expressed in a variety of patterns, and it is difficult to identify all of them automatically. Humans, on the other hand, can understand the order of frames and read manga in almost all cases. In this microtask, workers are assigned to select the frames in a page in reading order.
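To make the difficulty concrete, here is a deliberately naive ordering heuristic in Python: group frames into rows by their top edge, then read each row from right to left. This is only an illustrative baseline under stated assumptions (right-to-left page layout, pixel bounding boxes given as left, top, right, bottom, and a hypothetical `row_tolerance` parameter); it is not the method used by this project.

```python
def naive_reading_order(frames, row_tolerance=20):
    """Order frames top-to-bottom by row, and right-to-left within each row.

    frames: dict mapping a frame id to (left, top, right, bottom) in pixels.
    row_tolerance: frames whose top edges lie within this many pixels of the
        first frame in a row are treated as belonging to that row.
    """
    # Visit frames from the top of the page downward.
    by_top = sorted(frames.items(), key=lambda item: item[1][1])
    rows = []
    for frame_id, box in by_top:
        row_top = rows[-1][0][1][1] if rows else None
        if row_top is not None and abs(box[1] - row_top) <= row_tolerance:
            rows[-1].append((frame_id, box))
        else:
            rows.append([(frame_id, box)])

    ordered = []
    for row in rows:
        # Right-to-left: the frame with the largest left edge comes first.
        row.sort(key=lambda item: item[1][0], reverse=True)
        ordered.extend(frame_id for frame_id, _ in row)
    return ordered


# A simple page: two frames side by side on top, one wide frame below.
frames = {
    "c": (0, 300, 400, 600),
    "b": (0, 5, 200, 300),
    "a": (200, 0, 400, 300),
}
order = naive_reading_order(frames)
```

The heuristic works for simple grid layouts like the one above, but it breaks on common variations. For example, with a tall frame on the left beside two stacked frames on the right, it places the tall frame between the two stacked frames, whereas a reader would normally finish both right-hand frames first. Cases like this are why a human check on frame order is useful.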

*Manga images are from「飛ぶ東京　切符と花束　上・下」(Author: Hinata Kino)

Perform Task

Subject 2. Metadata Creation for Dialogue in Manga

Task 3. The combination of dialogue and speaker (time consumption: around 3 seconds per task)

As with the order of frames, the combination of a line of dialogue and its speaker is easy for humans to identify but still difficult for computers. In this microtask, workers are assigned to judge whether an automatically identified combination is correct.