Major research activities
We are interested in data science, which includes topics such as data mining, machine learning, artificial intelligence, big data, and databases.
Data mining is the process of discovering knowledge or patterns
from massive amounts of data. Machine learning constructs algorithms that automatically improve with data. Specifically, we are interested in analyzing data that are structured, such as graphs. Our current three major projects are as follows.
Our first major project is the clustering of evolving graphs. A human network is represented by a graph in which each vertex and edge, respectively, correspond to a human and relationship between two humans in a human network. A human network can vary over time, so the network is represented by an evolving graph. Most conventional methods cannot detect the division and mergence of clusters because they assume the number of clusters is constant over time. We have developed an algorithm for clustering evolving graphs that detects not only human communities as clusters but also the division and mergence of these human communities. Our method is based on spectral clustering, which is a hard clustering method, so our method is fast, accurate, and robust to outliers.
Another major project is graph classification. Chemical compounds can be represented by graphs, where each vertex and edge correspond to an atom and chemical bond, respectively. It is beneficial to predict the properties of newly developed chemical compounds by learning existing chemical compounds and their properties. However, most conventional methods in machine learning assume that each example, such as a chemical compound, is represented as a vector, but there is no way to convert chemical compounds to vectors without information loss. One method for analyzing chemical compounds without converting them to vectors uses machine learning algorithms with kernel methods. We have developed a kernel called the shorted Hadamard code kernel (SHCK), which is based on the Hadamard code. SHCK has the characteristics of both fast computation and precise expression for measuring the similarity between two graphs.
The third major project concerns labeled graph enumeration. As mentioned above, chemical compounds can be represented by graphs. In addition, atom and bond types in the chemical compound correspond to the labels of vertices and edges in the graphs. Although it is said that there are more than 1060 possible chemical compounds, only 108 chemical compounds are available to the general population. Our challenge in this project is to quickly enumerate the labeled graphs that can be found in nature. We have recently developed software that can enumerate about 75,000 labeled graphs per second without parallel computation.
Major relevant publications
- Tetsuya Kataoka and Akihiro Inokuchi. Hadamard Code Graph Kernels for Classifying Graphs. Proc. of International Conference on Pattern Recognition Applications and Methods, pp. 24-32, 2016
- Akihiro Inokuchi and Ayumu Yamaoka. Mining Rules for Rewriting States in a Transition-based Dependency Parser for English. Proc. of International Conference on Computational Linguistics, pp. 1275-1290, 2012
- Akihiro Inokuchi and Takashi Washio. Mining Frequent Graph Sequence Patterns Induced by Vertices. Proc. of SIAM International Conference on Data Mining, pp. 466-477, 2010.
- Hisashi Kashima, Koji Tsuda and Akihiro Inokuchi. Marginalized Kernels Between Labeled Graphs. Proc. of International Conference on Machine Learning, pp. 321-328, 2003.
- Akihiro Inokuchi, Takashi Washio, and Hiroshi Motoda. An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data. Proc. of European Conference on Principles of Data Mining and Knowledge Discovery, pp. 13-23, 2000.