Interview 3 – Responsible Data Science

Sinem: Please describe general developments in the technology and how your company will contribute to the field.

Hurkan Akbiyik: There are many ways to store data. The most common, everyday one is folders. The data can be pictures, Word documents, voice recordings, videos and so on. Today, websites such as Facebook and Twitter use traditional databases to hold their data. A traditional database is made up of rows and columns. We use IDs to identify each record uniquely, like an identification number that belongs to only one person in the world. We add data such as name and surname to it. That structure is called a table, and it has its own query language, SQL. As the technology develops, graphs have become important. In a graph, instead of being arranged in rows and columns, the data are stored according to their relationships. For example, Hürkan - works at - Datateam (this involves a person table and a company table). When doing this, there are rules to consider: logically, one person cannot have more than one company, but a company has more than one person. In that case, Hürkan works at Company 1 (we use ID numbers instead of company names, because these IDs are also used elsewhere). That is how it works in the traditional structure.

To reach that information easily, there are structures called graphs, which are made up of nodes and edges (the links). For example, Hurkan works for Datateam, Datateam is connected to Cyberpark, and Cyberpark is built at Bilkent University. Say Hurkan is doing a master's degree at Bilkent. This way you keep your data in a relational manner, in a graph database. In traditional methods, we keep data in tables, as rows and columns. All big companies keep it this way, but they all need a graph database too.

Let there be an Ahmet in our example, a student at Bilkent, and let us ask for the closest relation between Hurkan and Ahmet. Hurkan is doing a master's degree at Bilkent and Ahmet is a regular student there, so they have a common node at Bilkent. If this connection did not exist, we would be connected through Datateam, Cyberpark and Bilkent; the path would be longer, and my connection with Ahmet would be based on the company. Or I might have relations through institutions I worked for previously. My wife could be working at one of them, and she could be doing a master's degree at Bilkent. In that case my closest relation with Ahmet, called the shortest path, would be based on that.
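
The shortest-path idea in this answer can be sketched in a few lines of Python. This is only an illustration, not Datateam's implementation: the edge list and the function names `build_graph` and `shortest_path` are assumptions made up for the example, and breadth-first search is used because it finds a path with the fewest edges in an unweighted graph.

```python
from collections import deque

# Hypothetical edge list built from the people and places named in the interview.
EDGES = [
    ("Hurkan", "Datateam"),
    ("Datateam", "Cyberpark"),
    ("Cyberpark", "Bilkent University"),
    ("Hurkan", "Bilkent University"),   # Hurkan's master's degree
    ("Ahmet", "Bilkent University"),    # Ahmet is a student there
]


def build_graph(edges):
    """Turn an edge list into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest edges, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):  # sorted for a stable order
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# With the master's degree edge, the two meet at the common Bilkent node.
direct = shortest_path(build_graph(EDGES), "Hurkan", "Ahmet")

# Without it, the only route goes through the company.
company_edges = [e for e in EDGES if e != ("Hurkan", "Bilkent University")]
via_company = shortest_path(build_graph(company_edges), "Hurkan", "Ahmet")

print(direct)
print(via_company)
```

Removing the single master's degree edge lengthens the path from two hops to four, which is the "it would be longer" case described in the answer.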

Kubra: Is it not possible to do this in the other, traditional, tabular method?

Hurkan Akbiyik: In the other method it works like this. Tables have a notion of depth. Every time a query moves on to another table, it goes one level deeper. How does it work? For example, Hurkan is in the people table. From there we go to the company table, the second depth. We go to the institutions table, the third depth. From the institutions table we go to the universities table, the fourth depth. We can then go to the people table once again, for my wife for example. What is the query logic in tables? Search through the whole table and bring back what is there. If you live in a country with a population of 80 million and think about querying a table of 80 million people, it is easy at the first depth, not so hard at the second, okay at the third, but from the fourth depth on, problems begin to arise. What is this fourth depth? Hurkan's cousin: the son of the brother of my father, fourth depth. It is very hard to find this in the traditional structure. For this reason there is the concept of a graph database. It is faster and more practical for relationship queries.

Okay then, companies hold their data with traditional methods, but they also want to use the graph method. What do companies do? There are foreign tools, like Neo4j and TitanDB, that convert your data to a graph structure. So what does Datateam do differently from these tools? It uses Oracle Database, which is a traditional database, and through the Java packages it writes inside the traditional database, it allows you to use a graph database inside your traditional database. So your data stays where it is, and I come and add a module so that you can use a graph database. This is what our Datagraph does.
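
The notion of depth in tables can be illustrated with a small in-memory SQLite database. This is a sketch under stated assumptions: the table and column names are hypothetical, SQLite stands in for the Oracle database mentioned in the interview, and each `JOIN` corresponds to one further level of depth.

```python
import sqlite3

# In-memory database with hypothetical tables mirroring the interview's example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE universities (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE institutions (id INTEGER PRIMARY KEY, name TEXT, university_id INTEGER);
CREATE TABLE companies    (id INTEGER PRIMARY KEY, name TEXT, institution_id INTEGER);
CREATE TABLE people       (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);

INSERT INTO universities VALUES (1, 'Bilkent University');
INSERT INTO institutions VALUES (1, 'Cyberpark', 1);
INSERT INTO companies    VALUES (1, 'Datateam', 1);
INSERT INTO people       VALUES (1, 'Hurkan', 1);
""")

# people -> companies -> institutions -> universities:
# every hop to another table is one more JOIN, i.e. one more level of depth.
rows = conn.execute(
    """
    SELECT p.name, c.name, i.name, u.name
    FROM people AS p
    JOIN companies    AS c ON c.id = p.company_id
    JOIN institutions AS i ON i.id = c.institution_id
    JOIN universities AS u ON u.id = i.university_id
    WHERE p.name = ?
    """,
    ("Hurkan",),
).fetchall()

print(rows)
```

With one row per table this is instant. The cost the answer describes comes from repeating such joins over tables with tens of millions of rows.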

Kubra: Why aren't they just using graph databases instead of traditional databases?

Hurkan Akbiyik: Each of them has its own advantages and disadvantages. It is hard to look up properties in graph databases. You can easily find the relationship between Hurkan and Dilen, but it is hard to find, for example, what type of petrol Hurkan's car uses and which other cars use it. So each has its own pros and cons. Databases rest on three basic properties, and consistency is one of them. You cannot achieve all three at the same time. Usually you can have two and you sacrifice one.

So, we translate the traditional database to a graph database. But it is kept in a text format, so you cannot just look at it and see a graph. With traditional databases it is easier, because there are tables and you can simply open them in Excel, for example. But a graph database also needs to be seen. For that we have a product called Graph Vision. Just as traditional databases have SQL, we have DQL (DatagraphQueryLanguage), and you can query graph databases with this language. *shows an example of a query on the board* After a query we get a result, but I need to see this result. This is where Graph Vision comes in: you can see the result as nodes and the lines between them. *draws a node and a line on the board* Then you can double-click on a node (a circle) and see more information about it. We use technologies such as Node.js, Angular.js and D3.js.
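
To give a rough idea of how a text result can become something drawable, the sketch below turns relationship triples into a nodes-and-links structure, the shape commonly fed to D3 force layouts. Everything here is an assumption for illustration: the interview does not describe DQL's syntax or Graph Vision's data format, so the triples, the `to_node_link` helper and the field names are hypothetical.

```python
import json

# Hypothetical query result as (source, relationship, target) triples.
result = [
    ("Hurkan", "works_at", "Datateam"),
    ("Datateam", "located_in", "Cyberpark"),
]


def to_node_link(triples):
    """Collect unique nodes in first-seen order and one link per triple."""
    names = []
    for source, _, target in triples:
        for name in (source, target):
            if name not in names:
                names.append(name)
    return {
        "nodes": [{"id": name} for name in names],
        "links": [
            {"source": source, "target": target, "label": relation}
            for source, relation, target in triples
        ],
    }


# JSON text that a browser-side drawing library could consume.
payload = json.dumps(to_node_link(result), indent=2)
print(payload)
```

Each entry in `nodes` would be drawn as a circle and each entry in `links` as a line between two circles.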

Sinem: What are the main applications for this technology?

Hurkan Akbiyik: Datateam generally works in a tailor-made fashion. One option is that a software project already exists, written earlier, but the institution that owns it will no longer develop the code, so another institution takes the code over and develops it. The second option is the direct approach, in which people come and ask for a project and we implement it from scratch. In general our products, as I mentioned earlier, are focused on providing a graph database inside a regular database without interfering with it. There are many things that can be done with data: you can process it, visualize it, mine it, use machine learning for grouping, smart teaching and so on. For instance, you show a program pictures of dogs and cats and teach it which is which, as you would a child. Then you ask about a picture it has not been shown before, and it gives a probabilistic response on whether it is a dog or a cat. These types of things can be done, and we are doing them.
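
The "teach it, then ask about an unseen example" idea can be sketched with a deliberately tiny classifier. This is not the method Datateam uses, which the interview does not describe. It is a nearest-centroid classifier, and instead of real pictures it uses two made-up numeric features per animal (weight in kg, height in cm). The training values and function names are all hypothetical.

```python
import math

# Made-up training examples: (weight in kg, height in cm) per animal.
TRAINING = {
    "cat": [(4.0, 25.0), (5.0, 23.0), (3.5, 24.0)],
    "dog": [(20.0, 55.0), (30.0, 60.0), (25.0, 58.0)],
}


def centroid(points):
    """Average point of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(examples):
    """'Teaching' step: remember the average example of each label."""
    return {label: centroid(points) for label, points in examples.items()}


def predict_proba(model, sample):
    """Softmax over negative distances: a closer centroid gets a higher probability."""
    scores = {label: -math.dist(sample, c) for label, c in model.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


model = train(TRAINING)
probs = predict_proba(model, (22.0, 50.0))  # an animal not seen during training
print(probs)
```

The output is a probability per label rather than a hard answer, matching the "probabilistic response" in the answer. A real image classifier would learn from pixel data with a far larger model.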

Kubra: What are the risks and benefits of this technology?

Hurkan Akbiyik: Time is a risk factor, and there are also risks associated with data usage. Security is the biggest problem of our age. Institutions cannot go beyond their security boundaries. They do not give us VPN access, so we have to work on site, in person. There are data security problems. You cannot reveal the data you have been entrusted with to outsiders, and if you want to use the data for advertising, you have to get permission. These are all risks that we have to consider in this business.

Sinem: What sort of modifications will be made to the technology in the future?

Hurkan Akbiyik: In data science, major developments are underway in topics like machine learning, deep learning and neural networks, and these will greatly affect human lives. At the moment, we are teaching robots how to operate. In the future, they will analyze the data provided to them and work out the best course of action themselves. The implementation of this technology will be the solution to many problems in the future.