KNIME is an open source business intelligence tool built on the Eclipse platform. Figure 1 shows the KNIME development environment. As the figure shows, KNIME organizes data integration, cleaning, conversion, filtering, statistics, data mining, and finally data visualization into a workflow. All development takes place in a visual environment: a process is built by dragging nodes into the workflow and configuring them. According to KNIME's white paper, the full name of KNIME is the Konstanz Information Miner, and it is designed as a platform for teaching, research, and collaborative work.

Figure 1: KNIME development environment

KNIME architectural features

KNIME is designed as a modular, easily extensible framework. Its processing units and data containers do not depend on each other, which makes them better suited to distributed environments and independent development. Developers can also extend KNIME with new types of nodes, views, and so on.

In KNIME, a data analysis process consists of a series of nodes and the edges connecting them. The data or models to be processed are passed between nodes along these edges. Each node has one or more inputs and outputs: data or a model enters at an input port, is processed by the node, and leaves at an output port (Figure 2). All data passed between nodes is wrapped in DataTable objects. To handle large amounts of data, KNIME keeps only part of the data in memory at any time, so that large tables can still be processed efficiently.

Figure 2: Node processing model

The node is the main processing unit in KNIME, and the Node class encapsulates all processing functionality. To develop a user-defined node, you must implement a NodeModel class and, where the node needs them, a NodeDialog class and one or more NodeView classes.
A custom node that has no dialog box or views does not need the NodeDialog and NodeView classes; only the NodeModel is required. Node extension follows the MVC design pattern. The input and output ports of each node are numbered, and when a node has several ports their indices start at 0.

Figure 3: KNIME node structure

KNIME ships with a large number of nodes covering IO operations, data processing, data conversion, data mining, machine learning, and visualization. IO nodes read data from and write data to the file system. Data processing includes row and column filtering, partitioning, sampling, sorting, merging, and so on. Data transformation includes missing value replacement, matrix transformation, and so on. Data mining algorithms include k-means, decision trees, regression, association rules, and other methods. Once processing is complete, the results can be displayed as scatter plots, histograms, pie charts, line charts, and other graphics.

As mentioned earlier, KNIME provides an easily extensible architecture. Developing a new node means extending the following three abstract classes:

NodeModel: This is the most important class when extending nodes, because all of the key processing work is done here. Inheriting from this abstract class requires implementing three methods: configure(), execute() and reset(). configure() receives the information arriving at the input ports and creates the matching output port definitions. execute() processes the input data and creates the data table or model to be output. reset() undoes all operations and discards all intermediate results.

NodeDialog: This class defines the node's dialog box. Each node has a dialog box in which all parameters of the node's processing functions are set.
NodeView: This class can be extended several times to provide different views of the underlying model.

Once these classes are implemented, a NodeFactory must also be implemented to create new instances of them.

That covers the basics of the business intelligence tool KNIME. After reading KNIME's white paper I also tried KNIME myself, and it really was easy to get started: a complete processing flow can be designed in very little time. It is worth a try. Knowing more tools and frameworks of this kind gives you more options when planning the technical route of a project.
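
To make the configure(), execute() and reset() life cycle described above more concrete, here is a minimal sketch in plain Java. It does not use the KNIME libraries. SketchNodeModel, ThresholdFilterModel and ThresholdFilterFactory are hypothetical stand-ins invented for this illustration, a table is simplified to a list of double arrays, and the method signatures are assumptions that differ from those of the real NodeModel class.

```java
import java.util.ArrayList;
import java.util.List;

public class NodeSketch {

    /** Hypothetical stand-in for the NodeModel role; NOT the real KNIME API. */
    abstract static class SketchNodeModel {
        /** Receives the input column names and returns the output column names. */
        abstract List<String> configure(List<String> inputColumns);

        /** Processes the input rows and returns the output table. */
        abstract List<double[]> execute(List<double[]> inputRows);

        /** Discards all intermediate results. */
        abstract void reset();
    }

    /** Example node: keeps the rows whose first cell is at least a threshold. */
    static class ThresholdFilterModel extends SketchNodeModel {
        private final double threshold;
        private int keptRows; // intermediate result, cleared by reset()

        ThresholdFilterModel(double threshold) {
            this.threshold = threshold;
        }

        @Override
        List<String> configure(List<String> inputColumns) {
            if (inputColumns.isEmpty()) {
                throw new IllegalArgumentException("need at least one input column");
            }
            // A row filter leaves the column layout unchanged.
            return new ArrayList<>(inputColumns);
        }

        @Override
        List<double[]> execute(List<double[]> inputRows) {
            List<double[]> out = new ArrayList<>();
            for (double[] row : inputRows) {
                if (row[0] >= threshold) {
                    out.add(row);
                }
            }
            keptRows = out.size();
            return out;
        }

        @Override
        void reset() {
            keptRows = 0;
        }

        int keptRows() {
            return keptRows;
        }
    }

    /** Hypothetical factory, mirroring the role the text gives to NodeFactory. */
    static class ThresholdFilterFactory {
        ThresholdFilterModel createNodeModel() {
            return new ThresholdFilterModel(2.0); // threshold chosen for the demo
        }
    }

    public static void main(String[] args) {
        ThresholdFilterModel model = new ThresholdFilterFactory().createNodeModel();
        List<String> outputColumns = model.configure(List.of("value", "weight"));
        List<double[]> table = List.of(
                new double[] {1.0, 0.5},
                new double[] {2.5, 0.1},
                new double[] {3.0, 0.9});
        List<double[]> result = model.execute(table);
        System.out.println(outputColumns + " rows kept: " + result.size());
        model.reset();
        System.out.println("rows kept after reset: " + model.keptRows());
    }
}
```

In this sketch the factory is the only place that knows which concrete model to build, which is the role the text assigns to NodeFactory. A dialog or view class would be added alongside the model only when the node needs one.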