Hey there, fellow machine learning enthusiasts! I came across a fascinating problem in process mining that I’d love to tackle with graph neural networks (GNNs). The idea is to predict the next activity in a business process given a prefix graph, i.e. a graph built from the activities observed so far in a running case. Sounds simple, but it’s surprisingly challenging.
Imagine you have a helpdesk process represented as a graph, where each node corresponds to an activity. The goal is to predict which activity comes next in the process. Next-activity prediction is useful in many domains, such as customer service, logistics, and healthcare.
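To make the setup concrete, here’s a minimal sketch of how one trace could be turned into training samples. It assumes one node per event (so a repeated activity becomes a separate node), directed edges between consecutive events, one-hot activity features, and the following activity as the label. The activity names and the helpers `prefix_graph` / `prefix_samples` are hypothetical, not taken from the actual dataset.

```python
from typing import List, Tuple

# Hypothetical activity vocabulary, for illustration only.
VOCAB = ["Open ticket", "Assign", "Resolve", "Close"]

Features = List[List[float]]
Edges = List[Tuple[int, int]]


def prefix_graph(prefix: List[str], vocab: List[str]) -> Tuple[Features, Edges]:
    """One node per event (one-hot activity), plus an edge from each event to the next."""
    index = {a: i for i, a in enumerate(vocab)}
    features = []
    for activity in prefix:
        row = [0.0] * len(vocab)
        row[index[activity]] = 1.0
        features.append(row)
    edges = [(i, i + 1) for i in range(len(prefix) - 1)]
    return features, edges


def prefix_samples(trace: List[str], vocab: List[str]) -> List[Tuple[Features, Edges, int]]:
    """Pair every proper prefix of a trace with the index of the activity that follows it."""
    index = {a: i for i, a in enumerate(vocab)}
    samples = []
    for k in range(1, len(trace)):
        feats, edges = prefix_graph(trace[:k], vocab)
        samples.append((feats, edges, index[trace[k]]))
    return samples


# Example: a 4-event trace yields 3 (prefix graph, next activity) samples.
trace = ["Open ticket", "Assign", "Resolve", "Close"]
samples = prefix_samples(trace, VOCAB)
```

Under these assumptions, each completed case contributes one sample per proper prefix, which is presumably how a set of graphs like the 4580 in this dataset would be produced from an event log.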
The dataset consists of 4580 graphs, each with 7 nodes on average, and there are 15 distinct activity labels in total. I’m considering a 3-layer GCN for the prediction task, but I’m not sure it’s the best architecture for this problem.
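For reference, here’s a rough, untrained sketch of what a 3-layer GCN for this task computes, written in plain Python so the shapes are easy to follow. It uses the standard GCN propagation rule H' = ReLU(Â H W) with Â = D^{-1/2}(A + I)D^{-1/2}, then mean-pools node embeddings into one graph vector and maps it to the 15 activity classes. The hidden size of 16, the mean readout, the random initialization and the name `TinyGCN` are all assumptions for illustration.

```python
import math
import random


def normalized_adjacency(n, edges):
    """D^{-1/2} (A + I) D^{-1/2}, treating each directed edge as undirected."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def random_matrix(rows, cols, rng):
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyGCN:
    """3 GCN layers -> mean pooling over nodes -> linear layer over activity classes."""

    def __init__(self, in_dim, hidden, num_classes, seed=0):
        rng = random.Random(seed)
        dims = [in_dim, hidden, hidden, hidden]
        self.layers = [random_matrix(dims[i], dims[i + 1], rng) for i in range(3)]
        self.readout = random_matrix(hidden, num_classes, rng)

    def forward(self, features, edges):
        a_hat = normalized_adjacency(len(features), edges)
        h = features
        for w in self.layers:
            h = relu(matmul(a_hat, matmul(h, w)))  # H' = ReLU(A_hat H W)
        pooled = [sum(col) / len(h) for col in zip(*h)]  # graph-level mean readout
        logits = matmul([pooled], self.readout)[0]
        return softmax(logits)  # probability of each possible next activity


NUM_ACTIVITIES = 15  # number of distinct activity labels in the dataset
# A 7-node chain with arbitrary one-hot activities (toy input, not real data).
feats = [[1.0 if j == i else 0.0 for j in range(NUM_ACTIVITIES)] for i in range(7)]
chain = [(i, i + 1) for i in range(6)]
model = TinyGCN(NUM_ACTIVITIES, hidden=16, num_classes=NUM_ACTIVITIES)
probs = model.forward(feats, chain)
predicted = max(range(NUM_ACTIVITIES), key=probs.__getitem__)
```

In practice you’d use a library layer such as PyTorch Geometric’s `GCNConv` with `global_mean_pool` and train with cross-entropy. Note that this sketch symmetrizes the edges, so it only sees event order through whatever the node features encode; that may be worth keeping in mind when comparing GCN against more sequence-aware alternatives.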
One of the main challenges is handling multiple process instances (graphs) during training. Should I treat them as separate graphs, or merge them into one big graph while preserving per-node instance information? I’m unsure how GNNs are usually trained on many small graphs like these.
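For what it’s worth, the two options in that question usually end up being the same thing: libraries like PyTorch Geometric batch many small graphs by stacking them into one large disconnected graph and keeping a `batch` vector that records which graph each node belongs to. Because there are no edges between instances, message passing never mixes them, and per-graph pooling uses the `batch` vector to split them back apart. Below is a plain-Python sketch of that idea. The names `collate` and `mean_pool` are illustrative, not PyG’s API.

```python
from typing import List, Tuple

Graph = Tuple[List[List[float]], List[Tuple[int, int]]]  # (node features, edge list)


def collate(graphs: List[Graph]):
    """Merge small graphs into one disconnected graph.

    Node features are concatenated, edge indices are shifted by the number of
    nodes already placed, and batch[i] records which graph node i came from.
    """
    features, edges, batch = [], [], []
    offset = 0
    for g_id, (feats, g_edges) in enumerate(graphs):
        features.extend(feats)
        edges.extend((u + offset, v + offset) for u, v in g_edges)
        batch.extend([g_id] * len(feats))
        offset += len(feats)
    return features, edges, batch


def mean_pool(node_values, batch, num_graphs):
    """Average node vectors per graph using the batch vector (assumes no empty graphs)."""
    dim = len(node_values[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for row, g in zip(node_values, batch):
        counts[g] += 1
        for j, v in enumerate(row):
            sums[g][j] += v
    return [[s / counts[g] for s in sums[g]] for g in range(num_graphs)]


# Example: a 2-node and a 3-node prefix graph merged into one 5-node graph.
g1 = ([[1.0, 0.0], [0.0, 1.0]], [(0, 1)])
g2 = ([[1.0, 1.0], [3.0, 1.0], [2.0, 4.0]], [(0, 1), (1, 2)])
features, edges, batch = collate([g1, g2])
pooled = mean_pool(features, batch, num_graphs=2)
```

Here `edges` becomes `[(0, 1), (2, 3), (3, 4)]` and `batch` becomes `[0, 0, 1, 1, 1]`, so each instance still yields its own pooled vector and its own next-activity prediction.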
I’d love to hear from someone with experience in GNNs or process mining. Have you tackled a similar problem before? What architecture would you recommend for sequence-based node prediction in process graphs?
Let’s discuss!