Technology:
Big Data
The term Big Data refers to technologies that work with large amounts of data. Their challenges usually include capturing, storing, analyzing, searching, sharing, transferring, visualizing, querying, and updating the data, and keeping it private. Datateam also uses data as its fuel. They get their data from the customers and go through these steps:
- Analyzing data
The data that our site obtains is stored in a traditional manner: Datateam receives it in tables that hold the units of data as unique entries. “For example, one project we have been working on lately is for a ministry. They wanted us to provide them with data analysis tools to determine the efficiency of their employees” (D. Osman, personal communication, December 1st, 2017). Without data analysis there would be little point in Big Data technologies, since their top priority is to analyze the data and then act on the results.
- Transferring and Storing data
After Datateam gets the data from their customers in tables, they transfer it to graphs and store those. They do not replace the old way of storage; instead, they add a new one. When asked why they do not replace the traditional way, the answer was: “Each one of them has its own advantages and disadvantages. It is hard to see properties in the graph databases. You can easily find the relationship between Hurkan and Dilen but it is hard to find the type of petrol Hurkan’s car is using and what other cars are using it for example” (A. Hurkan, personal communication, November 25th, 2017). As we can see, graph storage is not perfect, and there are cases when the traditional way of storing data is the better choice.
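The trade-off described in this step can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows, the column names (person, knows, car, fuel), the relation labels and the helper functions are all invented here and do not describe Datateam's actual schema or tools. The sketch shows a relationship question answered with one hop in the graph, while the fuel question reads more naturally as a filter on the original table.

```python
from collections import defaultdict

# Hypothetical tabular rows; names and columns are invented for illustration.
rows = [
    {"person": "Hurkan", "knows": "Dilen", "car": "Car A", "fuel": "diesel"},
    {"person": "Dilen", "knows": "Hurkan", "car": "Car B", "fuel": "petrol"},
    {"person": "Colleague", "knows": "Dilen", "car": "Car C", "fuel": "diesel"},
]


def rows_to_graph(rows):
    """Build an adjacency map: node -> set of (relation, neighbour) pairs."""
    graph = defaultdict(set)
    for row in rows:
        graph[row["person"]].add(("KNOWS", row["knows"]))
        graph[row["person"]].add(("DRIVES", row["car"]))
        graph[row["car"]].add(("USES_FUEL", row["fuel"]))
    return graph


def related(graph, a, b):
    """Relationship lookup: is there a direct edge from a to b?"""
    return any(neighbour == b for _relation, neighbour in graph[a])


def cars_by_fuel(rows, fuel):
    """Property lookup: a plain column filter on the table."""
    return sorted(row["car"] for row in rows if row["fuel"] == fuel)


graph = rows_to_graph(rows)
print(related(graph, "Hurkan", "Dilen"))
print(cars_by_fuel(rows, "diesel"))
```

Keeping both representations, as the interviewee describes, lets each kind of question go to the structure that answers it most directly.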
- Querying and Visualizing data
After analyzing the data obtained from customers and storing it in graphs, Datateam uses their GraphVision tool to visualize it for the customer. “You can query graph databases with this language. So, after a query we get a result. But I need to see this result. This is where our GraphVision comes in. You can see the result via nodes and lines between them” (A. Hurkan, personal communication, November 25th, 2017). GraphVision plays a big role in Datateam’s work. It is the product that their customers use to see the results of the work, and it helps reach the bottom line.
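The idea of turning a query result into "nodes and lines between them" can be illustrated with a short Python sketch. It does not show GraphVision or the query language the interviewee refers to, neither of which is described in enough detail here. Instead it assumes a tiny invented graph, uses a breadth-first search as a stand-in for the query, and writes the result as Graphviz DOT text, a common plain-text format for drawing graphs.

```python
from collections import deque

# Hypothetical undirected edges; the node names are invented for illustration.
edges = [("Hurkan", "Dilen"), ("Dilen", "Ministry"), ("Hurkan", "Car A")]


def neighbours(edges):
    """Build an adjacency list from a list of undirected edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def shortest_path(edges, start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or []."""
    adj = neighbours(edges)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return []


def to_dot(path):
    """Render the query result as Graphviz DOT text (nodes joined by lines)."""
    lines = ["graph result {"]
    for a, b in zip(path, path[1:]):
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


path = shortest_path(edges, "Hurkan", "Ministry")
print(to_dot(path))
```

The printed text can be passed to any DOT renderer to obtain a picture of the result.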
- Privacy of data
Privacy is the biggest concern in Big Data technologies, especially for Datateam, since their main clients are ministries and the government. This point is explained as a final value at the end of the page.
Machine Learning
Datateam has access to large amounts of data. Thus, they can apply machine learning algorithms to find what their customers are looking for. As mentioned above, one of their recent projects was for a ministry, where they applied a machine learning algorithm to measure the efficiency of the employees. Machine learning algorithms of this kind should be used carefully because they can affect someone’s life if used irresponsibly. “Machine learning applications in data analysis is a particular example on this. If the initial data provided to the machine is biased, then the machine will continue to produce biased results if proper precautions are not taken, and this can be a major problem. For example, in our work efficiency analysis technology we have mentioned before, inaccurate analysis results ca which would be very unfair for them. Therefore, we are trying to develop our technologies with these concerns in mind.” (D. Osman, personal communication, December 2nd, 2017). As we can see, Datateam is aware of these risks and tries to apply machine learning responsibly so that it does not cause problems in people’s lives.
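The point about biased input data can be made concrete with a deliberately naive Python sketch. Everything in it is an assumption made for illustration: the records, the department labels and the "model" (which simply memorizes the share of positive ratings per department) are invented and are not Datateam's algorithm. It shows that a model trained on under-rated labels repeats the under-rating, and that even a simple check of the gap between groups can flag the problem before the results are used.

```python
from collections import defaultdict

# Invented historical records: (department, tasks_completed, rated_efficient).
# Department "B" is deliberately under-rated to mimic biased labels.
history = [
    ("A", 10, True), ("A", 9, True), ("A", 4, False),
    ("B", 10, False), ("B", 9, False), ("B", 4, False),
]


def train(records):
    """'Learn' the share of positive ratings per department (a naive model)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for dept, _tasks, label in records:
        totals[dept] += 1
        positives[dept] += int(label)
    return {dept: positives[dept] / totals[dept] for dept in totals}


def predict(model, dept):
    """Predict 'efficient' when the learned rate is at least 0.5."""
    return model[dept] >= 0.5


def positive_rate_gap(model):
    """A basic precaution: measure the gap between groups before trusting the model."""
    return max(model.values()) - min(model.values())


model = train(history)
print(predict(model, "A"), predict(model, "B"))
print(positive_rate_gap(model))
```

Both departments completed the same numbers of tasks in this invented data, yet the model rates every employee in department B as inefficient, because it only reproduces the labels it was given.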
Users/Customers:
Datateam is a user-driven company. All of their actions depend on the customers. They do not have their own research or data stores. Thus, customers are very important for their work. The main customers of our site are ministries and the government. However, sometimes they work with non-governmental companies as well.
- Ministries/Government
As stated above, these are the main customers of our site. Ministries and the government have huge amounts of data, which is very private. They are the reason why Datateam employees have to work outside their own office: because of privacy concerns, they can only access the data where their customers tell them to. This point will be explained below.
- Companies
Although Datateam works with non-governmental companies as well, the people we interviewed were in charge of the first group of users discussed above. Other employees did not want to give us an interview, citing a lack of time. However, we found out that even non-governmental companies are concerned about their privacy and require Datateam to work inside their institution.
Modifications:
- Instrumental
According to Yilmaz Tekirdag, data is one of the most valuable resources, just as gold or oil was in the past (Y. Tekirdag, personal communication, December 2nd, 2017). The world consists of data, and we can use it to power machines and supply them with the information they need in order to work. Most businesses and organizations take an instrumental view of Big Data and machine learning because their success depends on the data calculations and diagnostics that these technologies provide.
- Efficiency
As mentioned above, the ministries and the government use machine learning to calculate the efficiency of certain groups of employees. Smaller organizations can use it in the same way to estimate the efficiency of their employees and make decisions based on the results. Datateam obtains the data from these companies and runs algorithms that calculate employee efficiency. The results lead a company to encourage employees to keep working at the same level if the result was good, or to work harder if the efficiency was below an appropriate level.
- Reliability
It is crucial for Datateam that users can trust the information they obtain from the results of machine learning. Therefore, they believe there should be transparency about how a result was obtained from a large amount of data. They achieve this by showing the steps that convert the data into the final result in the form of a pipeline. According to Osman Demir, the main concern with machine learning is that inaccurate results can lead to biased opinions. For example, Datateam is often asked to measure the efficiency of the people working in a particular company. Osman Demir said that there were cases when a whole group of employees was almost fired because of low efficiency scores, even the ones who worked hard. However, Datateam sometimes updates and improves their algorithms to avoid these problems (D. Osman, personal communication, December 2nd, 2017).
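The idea of showing the conversion steps as a pipeline can be sketched as follows. This is a minimal illustration under assumptions, not Datateam's implementation: the function run_pipeline, the step names and the sample values (minutes worked, with one missing entry) are all hypothetical. Each step is applied in order and its intermediate output is recorded, so a user can see how the final number was produced.

```python
def run_pipeline(data, steps):
    """Apply each (name, function) step in order and keep a trace of every result."""
    trace = [("input", data)]
    for name, func in steps:
        data = func(data)
        trace.append((name, data))
    return data, trace


# Hypothetical steps: clean the data, convert minutes to hours, then average.
steps = [
    ("drop_missing", lambda xs: [x for x in xs if x is not None]),
    ("to_hours", lambda xs: [x / 60 for x in xs]),
    ("average", lambda xs: sum(xs) / len(xs)),
]

result, trace = run_pipeline([120, None, 60, 180], steps)
for name, value in trace:
    print(name, value)
```

Because every intermediate value is kept, a surprising final result can be traced back to the step that caused it.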
- Final Values – Privacy
Privacy is another crucial factor to consider when analyzing large amounts of data. According to Osman Demir, “The confidentiality of data is often a headache for our business. In general, we are not provided with connection to the data of institutions from outside, so we have to do our work at institutions” (D. Osman, personal communication, December 2nd, 2017). The companies are not willing to share their information with anyone, so Datateam has to go and run their algorithms and analysis machines inside these companies and organizations to keep the private information from leaking to the public. Datateam also uses the Privacy Impact Assessment (PIA) evaluation system, which helps them to have a better understanding of “how personal information can be used, stored, shared out and also decreases the risks for privacy” (Y. Tekirdag, personal communication, December 2nd, 2017). By using this system, it is possible to find a problem or error at an early stage of the evaluation, so that they can fix it before it is too late. Ultimately, it can be concluded that Datateam’s main approach is to satisfy the needs of their users by providing reliability and privacy, which are crucial when working with large amounts of data.