Worldwide, data is an essential resource for economic growth, innovation, job creation and societal progress. Technology therefore focusses on applications that translate data into usable variables for science and organisations. Optimising data analytics is essential for improving efficiency and innovation. Artificial Intelligence, in particular its subset Machine Learning, is increasingly used to achieve optimal analytics and outcomes in business and governmental organisations. Successful Machine Learning projects depend on multiple factors, including correct algorithm selection, compatibility with the organisation’s strategy and conformity with ethical standards. Combining these aspects into a single method for setting up machine learning projects could create great potential.

The demand for structured guidance on implementing machine learning has become evident, as exemplified by the recent child allowance affair at the Dutch Tax Administration. A lack of understanding of the technical aspects of machine learning, of the organisational aspects of successful implementation, and of the difficulties in meeting ethical standards all contribute to the suboptimal use of machine learning projects. In the governmental sector, Standard Business Reporting (SBR) is used by Dutch organisations to improve the exchange of financial data. Given the need to improve the ongoing development cycle of machine learning techniques, SBR could be a promising case. A systematic analysis of the scientific literature identified a gap: there is no structured method for setting up machine learning projects that combines the technical steps, organisational aspects and ethical aspects. Structured guidance is therefore called for.

This master thesis aims to fill this gap. The research question is stated as follows: How can technical, organisational and ethical aspects be combined into a method that supports stakeholders to systematically set up machine learning projects in SBR context?

The outcome of this thesis is a method to help actors in SBR context to systematically set up machine learning projects. Furthermore, it will assess the use of machine learning in SBR context. Additionally, it will support the stakeholders to determine if the proposed machine learning project could be viable in their organisation so that the stakeholders have a better chance to develop a successful machine learning project.

The chosen research approach to develop this method is the Design Science Research Methodology (DSRM). The DSRM contains six steps: problem identification and motivation, definition of the objectives for a solution, design and development, demonstration, evaluation, and communication. The first step, problem identification and motivation, was formulated in the previous paragraphs of this summary.

To complete the second step of the DSRM, defining the objectives for a solution, the identified problems were converted into design objectives that formulate what type of solution would be desirable. The design objectives were based on an extensive literature review of ethical, technical and organisational aspects, and on experiments conducted in collaboration with DUO. These experiments comprised one regression experiment, one classification experiment and the production of a Strategy Map. As a result, the following six design objectives were formulated and divided into two categories: what the method should include, and what the method should provide.

Design objectives focussing on the method, to include:

  1. The designed method should include an ethical framework
  2. The designed method should include machine learning steps to create a model in SBR context
  3. The designed method should include a machine learning algorithm selection method, including multiple machine learning techniques
  4. The designed method should include organisational factors relevant for creating a machine learning project

Design objectives focussing on the method, to provide:

  5. The designed method should provide an understandable process for creating a machine learning project in SBR context
  6. The designed method should help decision-makers to understand if machine learning can create added value in their organisation

The third step of the DSRM is focussed on the design and development of the method. The relevant data and insights for the development of the first version of the method in SBR context were derived from the literature review and experiments conducted in collaboration with DUO, an SBR stakeholder. Three important factors were extracted from the literature review and are combined into the method: the Ethical Impact Assessment for ethical aspects, the Strategy Map for organisational aspects and Knowledge Discovery in Databases for technical aspects. In addition, in collaboration with DUO, two machine learning experiments and one Strategy Map based on the strategy “using Machine Learning” were carried out. The input of the literature and the experiments resulted in a first tailor-made Machine Learning Project Method version. The designed method includes ten unique steps for setting up machine learning projects in SBR context, taking into account ethical, organisational and technical aspects.

Before the second iteration in the development of the method (Design Phase 2), a small but relevant selection of interviewees was made, and six semi-structured interviews were conducted. During these interviews, the respondents were asked to systematically evaluate the first version of the designed method and provide suggestions for improvement. The input of the interviewees was analysed, and eleven relevant suggestions were determined and implemented in the second design iteration.

This exercise resulted in the updated, second version of the method. The improved method is subdivided into ten unique steps, including: goal formulation, project team setup, context analysis, data collection, data preparation, algorithm selection, model testing, model adjusting and project evaluation. A concise overview of the final research product, the Machine Learning Project Method.

The method helps users to evaluate the use of machine learning in their organisation by providing stakeholders with a systematic process for creating a machine learning project in SBR context. Furthermore, the method includes an algorithm selection method. The designed method provides a comprehensive list of candidate algorithms for a specific case, a framework to assess the ethical impact, and various other aspects that are important for creating a successful machine learning project in SBR context.
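To illustrate the idea behind an algorithm selection step, the following minimal sketch maps a case to a broad task type and then to a few candidate algorithms. It is not the selection method of the thesis: the `Case` fields, the function names and the candidate lists are hypothetical and chosen only for illustration. The sketch assumes that the choice depends on whether labelled data is available (supervised versus unsupervised learning) and, if so, on whether the target is numeric (regression) or categorical (classification).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Case:
    """Hypothetical description of a project's data, used only in this sketch."""
    has_labels: bool         # is there a known target column to learn from?
    target_is_numeric: bool  # only meaningful when has_labels is True


def suggest_task(case: Case) -> str:
    """Map a case to a broad machine learning task type."""
    if not case.has_labels:
        return "clustering"      # unsupervised: no target available
    if case.target_is_numeric:
        return "regression"      # supervised, continuous target
    return "classification"      # supervised, categorical target


# Illustrative candidates per task type; NOT the list provided by the thesis.
CANDIDATES: Dict[str, List[str]] = {
    "regression": ["linear regression", "random forest regressor"],
    "classification": ["logistic regression", "random forest classifier"],
    "clustering": ["k-means", "hierarchical clustering"],
}


def suggest_algorithms(case: Case) -> List[str]:
    """Return the illustrative candidate algorithms for a case."""
    return CANDIDATES[suggest_task(case)]
```

For example, a case with labelled data and a categorical target, such as assigning a risk label, would be treated as classification: `suggest_algorithms(Case(has_labels=True, target_is_numeric=False))` returns the two classification candidates. In practice, such a shortlist would only be a starting point, and the candidates would still need to be compared on the organisation's own data.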

The fourth step of the DSRM is to demonstrate the artifact designed in step three. In this research, the demonstration of the designed method was carried out in collaboration with WSW. A project team was set up to use the designed method. The aim of the project was to estimate the added value of machine learning for WSW by building a machine learning model that determines the financial risk labels. After completing all steps of the method, a machine learning model was delivered, providing WSW with insight into the added value of machine learning. WSW was therefore able to expand its insights on determining the financial risk of its stakeholders. Furthermore, the demonstration was successful in showing the method’s applicability to a real case.

Evaluation is the second-to-last step of the DSRM. First, the method is evaluated. In summary, the design objectives have essentially been met. Although the design objectives were ambitious, they are all incorporated in the designed method. However, for the designed method to become a fully operational and validated method, the design objectives should be further investigated and developed. Supported by the interviews, the designed method has been shown to be effective in setting up a machine learning project in a real case. Second, machine learning in SBR context is put in perspective. Although this research recognises the potential of machine learning in SBR context, the experiments show that the conditions for implementing machine learning in the SBR context are not yet optimal, and that machine learning is therefore not yet ready to replace the current systems. It should also be taken into consideration that the method was developed for a specific context: the SBR context. It is not clear whether it can be applied in different contexts. Furthermore, two out of four machine learning techniques are included in the designed method, namely supervised and unsupervised machine learning, and of these, supervised machine learning has been tested during this research. The designed method is published on GitLab and the TU Delft Repository to facilitate further research and development (Digicampus and Data | European Data Portal, 2020), thereby completing the final step of the DSRM.

In conclusion, this thesis describes the successful development and testing of a method: the Machine Learning Project Method. This method includes an algorithm selection method. The Machine Learning Project Method provides a structured approach that helps managers to understand the process of setting up machine learning projects and provides them with guidelines on how to set up a project. Different aspects of managerial domains are integrated: organisational and ethical aspects and guidelines on managerial implementation.

The scientific contribution of this thesis lies in the theory of the designed method. The new method enhances the understanding of machine learning projects in SBR context, a form of structured data. This method, which combines technical, ethical and organisational aspects in a systematic approach, enables its users to obtain knowledge of the added value of machine learning and to set up machine learning projects. The integration of these three pillars into a single method was not yet available. Therefore, this method fills the gap that other methods left open. At this moment it is unique in comparison to other methods, as it combines interdisciplinary aspects into one method. The newly created method is added to the scientific field and is shared to facilitate further research and development. Furthermore, the research provides insight into machine learning in SBR context. Evaluating the results of this research, it is found that, at this moment, machine learning is not yet capable of generating the desired application and outcome in SBR context. This finding contributes to the development of the use of machine learning on structured data.

The practical contribution of this research is that the method provides a structured and partly iterative process to set up machine learning projects. Following the Machine Learning Project Method, a machine learning model can be made with due regard for organisational, ethical and technical aspects. This allows the user of the method to evaluate the added value of machine learning in the organisation: it guides users towards asking the right questions and makes them aware of the limitations and impossibilities of machine learning. The method thereby prevents a machine learning project from being initiated without an estimation of its applicability. Furthermore, it gives managers, policy makers and engineers an overview of what it takes to start a machine learning project, including preconditions and restrictions. It provides insight into possible applications of machine learning and enables a structured process for both engineers and managers, creating alignment and understanding between management and engineers. When all steps are completed, the method provides the following deliverables: insight into whether machine learning has added value for the organisation, and an ethical, potentially cost-efficient, yet simple prototype machine learning model.

The designed method has proven to be reliable in achieving a machine learning project with a usable outcome and is presented on GitLab and shared on the European Data Portal (Digicampus and Data | European Data Portal, 2020) for further use and development. In line with the research question, the developed method fills part of the current knowledge gap regarding the application of machine learning within SBR. Considering this demonstrable added value, it is useful to further elaborate and apply this method. Recommendations for future research can be summarised in three points.

  1. Further focus on fine-tuning of the methodology, with the aim of a fully tested and operational model, including sufficient iteration steps. This will be an important next step to translate this prototype into a reliable model in a professional, operational organisation using SBR.
  2. The promising generic part of this model should be conceptualised in order to broaden the concept for machine learning to a wide range of algorithms and applications. This could ultimately contribute to uniformity in the use of machine learning.
  3. Machine learning and ethical issues go hand in hand. The risks this entails are still underestimated in operational applications. Further research into the possible negative impact of machine learning on society must be conducted. Therefore, it should be researched how organisations and society can be included in the process of constructing machine learning.