An event log consists of events, each with a case ID, an activity name, a time stamp, plus other data.
Let's take a look at an e-mail.
The e-mail has:
According to the class, one possible mapping is to treat the sender as the resource or as the activity name. The set of receivers would be "other data", the subject would be the case ID, and so on.
Looking at a student database, the case ID could be the student, the exam date could be the time stamp, and the course could be the activity name.
The mapping really depends on the context and the question that is being asked.
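As a minimal sketch of the student-database mapping described above (student as case ID, course as activity name, exam date as time stamp), the following groups records into one trace per student. The records, field names, and helper functions are hypothetical and only serve as illustration.

```python
from datetime import date

# Hypothetical student records: (student, course, exam_date, grade).
records = [
    ("s1", "Algebra", date(2014, 1, 20), 7),
    ("s1", "Logic", date(2014, 3, 5), 8),
    ("s2", "Logic", date(2014, 3, 5), 6),
    ("s2", "Algebra", date(2014, 6, 12), 5),
]


def to_events(rows):
    """Map student -> case ID, course -> activity, exam date -> time stamp."""
    return [
        {"case_id": student, "activity": course, "timestamp": exam_date, "grade": grade}
        for student, course, exam_date, grade in rows
    ]


def traces_by_case(events):
    """Group events per case, ordered by time stamp; other data is ignored."""
    traces = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        traces.setdefault(event["case_id"], []).append(event["activity"])
    return traces


traces = traces_by_case(to_events(records))
```

Choosing a different column as the case ID (for example the course instead of the student) would yield a different set of traces from the same data, which is why the mapping depends on the question being asked.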
XES (www.xes-standard.org) stands for eXtensible Event Stream, a standard format for event logs and event streams.
Various representations of data -
A Petri net is static and composed of places and transitions. Places hold tokens. Transitions consume tokens from their input places and produce tokens in their output places. A state in a Petri net is called a marking.
A transition is enabled if each of its input places contains a token.
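The enabling rule and the token game can be sketched in a few lines. This is an illustrative sketch only: the small net, its place and transition names, and the `enabled` and `fire` helpers are assumptions made for the example, and a marking is represented as a `Counter` of tokens per place.

```python
from collections import Counter

# Hypothetical net: each transition maps to (input places, output places).
net = {
    "a": (["start"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p2"], ["p4"]),
    "d": (["p3", "p4"], ["end"]),
}


def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    inputs, _ = net[transition]
    return all(marking[place] >= n for place, n in Counter(inputs).items())


def fire(marking, transition):
    """Consume one token per input arc and produce one per output arc."""
    if not enabled(marking, transition):
        raise ValueError(f"transition {transition!r} is not enabled")
    inputs, outputs = net[transition]
    new_marking = Counter(marking)
    new_marking.subtract(inputs)
    new_marking.update(outputs)
    return +new_marking  # unary plus drops places with zero tokens


initial = Counter({"start": 1})
after_a = fire(initial, "a")
```

Firing `a` from the initial marking puts one token in each of `p1` and `p2`, after which `b` and `c` are both enabled while `d` is not. The net itself never changes; only the marking does.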
A reachability graph is a transition system with one initial state and no explicit final marking.
A workflow net should have a well-defined start and end and should be free of obvious anomalies (soundness). What makes up an anomaly?
A workflow net has one source place and one sink place. Everything else should be a path from the source to the sink.
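The structural condition above can be checked mechanically. The following is a rough sketch under stated assumptions: the net is given as a set of place names plus a dictionary mapping each transition to its input and output places, place and transition names are disjoint, and `is_workflow_net` is a hypothetical helper name. It checks structure only, not soundness.

```python
def is_workflow_net(places, net):
    """Check: exactly one source place, exactly one sink place, and every
    node lies on a path from the source to the sink."""
    nodes = set(places) | set(net)
    succ = {n: set() for n in nodes}
    pred = {n: set() for n in nodes}
    for transition, (inputs, outputs) in net.items():
        for place in inputs:
            succ[place].add(transition)
            pred[transition].add(place)
        for place in outputs:
            succ[transition].add(place)
            pred[place].add(transition)

    sources = [p for p in places if not pred[p]]  # no incoming arcs
    sinks = [p for p in places if not succ[p]]    # no outgoing arcs
    if len(sources) != 1 or len(sinks) != 1:
        return False

    def reach(start, edges):
        seen, stack = {start}, [start]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Every node must be reachable from the source and must reach the sink.
    return reach(sources[0], succ) == nodes and reach(sinks[0], pred) == nodes


good = is_workflow_net(
    {"start", "p1", "end"},
    {"a": (["start"], ["p1"]), "b": (["p1"], ["end"])},
)
bad = is_workflow_net(
    {"start", "p1", "end", "lost"},
    {"a": (["start"], ["p1"]), "b": (["p1"], ["end"]), "c": (["p1"], ["lost"])},
)
```

The second net fails because the place `lost` has no outgoing arcs, which gives the net two sink places.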
A Workflow net is sound:
Verification (soundness checking) and performance analysis (simulation). But the analysis is limited by the quality of the model.
In other words, are people really doing what they say they are doing? Is the administrator really adding value? How can we re-verify that the model is correct?
Process mining is the direct connection between the model and the event data.
This is a process discovery algorithm.
This looks at control flow only: just the order of activities, per case, while ignoring any of the other data.
So, we could wind up with a collection of traces such as [(register_order, check_stock, ship_order, handle_payment),(register_order, check_stock, cancel_order),....]
So, the goal is to come up with a set like:
$$L_1 = [\langle a, b, c, d\rangle^3, \langle a, c, b, d\rangle^2,\langle a, e, d\rangle ]$$
where the trace $\langle a, b, c, d\rangle$ has happened 3 times, $\langle a, c, b, d\rangle$ has happened 2 times, and $\langle a, e, d\rangle$ once.
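To make the multiset notation concrete, here is a small sketch that turns raw events into a log like $L_1$. The event tuples, the integer time stamps, and the function names are made up for the example; a `Counter` keyed by trace plays the role of the multiset.

```python
from collections import Counter
from itertools import count


def make_events():
    """Build hypothetical (case_id, activity, timestamp) events for six cases."""
    clock = count()
    cases = ["abcd"] * 3 + ["acbd"] * 2 + ["aed"]
    return [
        (case_id, activity, next(clock))
        for case_id, activities in enumerate(cases, start=1)
        for activity in activities
    ]


def to_log(events):
    """Group events by case, order by time stamp, and count identical traces."""
    by_case = {}
    for case_id, activity, _ in sorted(events, key=lambda e: e[2]):
        by_case.setdefault(case_id, []).append(activity)
    return Counter(tuple(trace) for trace in by_case.values())


L1 = to_log(make_events())
```

Each key of `L1` is a trace and each value is the number of cases that followed it, so the case IDs themselves no longer matter once the log is built.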
The alpha algorithm takes this event log and creates a model that fits what has been observed.
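As an illustration of the first step such an algorithm might take, the sketch below derives ordering relations between activities from the log $L_1$. It assumes the usual relations given for the alpha algorithm (directly follows, causality `->`, parallel `||`, and choice `#`); the function names are hypothetical, and this is not a full implementation of the algorithm.

```python
from itertools import product

# The log L1 from above, as a mapping from trace to frequency.
log = {("a", "b", "c", "d"): 3, ("a", "c", "b", "d"): 2, ("a", "e", "d"): 1}


def directly_follows(log):
    """Pairs (x, y) such that y immediately follows x in at least one trace."""
    return {(x, y) for trace in log for x, y in zip(trace, trace[1:])}


def footprint(log):
    """Classify every ordered pair of activities by its ordering relation."""
    df = directly_follows(log)
    activities = sorted({a for trace in log for a in trace})
    relations = {}
    for x, y in product(activities, repeat=2):
        if (x, y) in df and (y, x) in df:
            relations[x, y] = "||"  # seen in both orders: parallel
        elif (x, y) in df:
            relations[x, y] = "->"  # x is followed by y, never the reverse
        elif (y, x) in df:
            relations[x, y] = "<-"
        else:
            relations[x, y] = "#"   # never directly adjacent: choice
    return relations


fp = footprint(log)
```

Note that the trace frequencies are not used at all here: the relations depend only on which orderings were observed, not on how often.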
For this week, we are looking at fitness, or the ability to explain desired behavior. Later on, we will look at precision,
generalization and simplicity.
So the algorithm is looking for the following
The alpha algorithm can discover choices, concurrency, and loops, but it cannot cover all situations. Limitations:
A marking is dead if no transition is enabled in it.
A Petri net has a potential deadlock if there is a reachable dead marking.
A transition t is live if, from every reachable marking, it is possible to reach a marking that enables t.
A Petri net is live if all of its transitions are live.
Complete traces go from the start state to the final state.
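The definitions of dead markings and potential deadlocks can be explored by enumerating the reachable markings. The sketch below is a simple breadth-first search under stated assumptions: the three-transition net is hypothetical, markings are `Counter` objects, and the `limit` parameter is an arbitrary guard added here because an unbounded net has infinitely many markings.

```python
from collections import Counter, deque

# Hypothetical net: transition -> (input places, output places).
net = {
    "a": (["start"], ["p1"]),
    "b": (["p1"], ["end"]),
    "c": (["p1"], ["stuck"]),
}


def enabled_transitions(net, marking):
    """Transitions whose input places all hold enough tokens."""
    return [
        t for t, (inputs, _) in net.items()
        if all(marking[p] >= n for p, n in Counter(inputs).items())
    ]


def fire(net, marking, transition):
    inputs, outputs = net[transition]
    new_marking = Counter(marking)
    new_marking.subtract(inputs)
    new_marking.update(outputs)
    return +new_marking  # drop places with zero tokens


def reachable_markings(net, initial, limit=10_000):
    """Breadth-first enumeration of the markings of the reachability graph."""
    seen = {frozenset(initial.items()): initial}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for t in enabled_transitions(net, marking):
            nxt = fire(net, marking, t)
            key = frozenset(nxt.items())
            if key not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("too many markings; net may be unbounded")
                seen[key] = nxt
                queue.append(nxt)
    return list(seen.values())


def dead_markings(net, initial):
    """Reachable markings in which no transition is enabled."""
    return [m for m in reachable_markings(net, initial)
            if not enabled_transitions(net, m)]


dead = dead_markings(net, Counter({"start": 1}))
```

In this example both the marking with a token in `stuck` and the marking with a token in `end` are reported as dead, since nothing is enabled in either. Whether the latter counts as a problem depends on whether it is treated as the intended final marking.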