Process Model Analysis Week 2

Event Logs

An event log consists of events, each with a case ID, an activity name, a time stamp, plus other data.

Let's take a look at an e-mail.
The e-mail has:

  • A sender
  • A set of receivers
  • A subject
  • A sent date
  • A received date
  • A body
  • Lots of other data
What would be the case ID, the activity name, the time stamp?
It would have to depend on the process that is being measured. For example, let's say that we were trying to figure out the amount of time that it took a receiver to reply to an e-mail. Then the case ID might be the receiver, the activity name might be reply and the timestamp would be the time of the reply.

According to the class, one possibility would be that the sender is the resource or the activity name, the set of receivers is "other data", the subject is the case ID, and so on.

Looking at a student database, the case ID could be the student, the exam date could be the time stamp, and the course could be the activity name.

The mapping really depends on the context and the question that is being asked.
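As a rough illustration of the class's suggested mapping, here is a minimal Python sketch. The e-mail records, the field names, the `Event` tuple and the fixed activity name "send" are all assumptions made up for this example. A different question would lead to a different mapping.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical event structure: case ID, activity name, time stamp, other data.
Event = namedtuple("Event", ["case_id", "activity", "timestamp", "other"])

# Made-up e-mail records, deliberately listed out of order.
emails = [
    {"sender": "bob", "receivers": ["ann"], "subject": "Invoice 17",
     "sent": datetime(2014, 11, 3, 9, 45)},
    {"sender": "ann", "receivers": ["bob"], "subject": "Invoice 17",
     "sent": datetime(2014, 11, 3, 9, 0)},
]

def to_events(mails):
    """Map e-mails to events: subject -> case ID, sent date -> time stamp,
    sender -> resource, receivers -> other data."""
    events = [
        Event(
            case_id=mail["subject"],
            activity="send",  # assumed activity name
            timestamp=mail["sent"],
            other={"resource": mail["sender"], "receivers": mail["receivers"]},
        )
        for mail in mails
    ]
    # Order events by case, then by time, as an event log would be read.
    return sorted(events, key=lambda e: (e.case_id, e.timestamp))

event_log = to_events(emails)
for event in event_log:
    print(event.case_id, event.activity, event.timestamp, event.other["resource"])
```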

XES stands for eXtensible Event Stream. It is a standard format for event logs and event streams.

Various representations of data -

  • control-flow - step by step process
  • data-flow - who needs what when
  • time
  • resources
  • costs
  • risks
  • ....
The representation is critical. The model has to capture the process really well, and the end user will have specific needs.

Petri Net

The network is static and is composed of places and transitions. Places hold tokens. Transitions consume tokens from their input places and produce tokens in their output places. A state in a Petri net is called a marking.

A transition is enabled if each of its input places contains a token.
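The enabling rule can be sketched in a few lines of Python. The net, the place names and the helpers `is_enabled` and `fire` are hypothetical, and the sketch assumes every arc carries exactly one token.

```python
from collections import Counter

# Hypothetical net: start -> [register] -> p1 -> [ship] -> end
# Each transition maps to (input places, output places).
transitions = {
    "register": (["start"], ["p1"]),
    "ship": (["p1"], ["end"]),
}

def is_enabled(marking, transition):
    """A transition is enabled if every input place holds at least one token."""
    inputs, _ = transitions[transition]
    return all(marking[place] >= 1 for place in inputs)

def fire(marking, transition):
    """Consume one token per input place and produce one per output place."""
    if not is_enabled(marking, transition):
        raise ValueError(f"{transition} is not enabled")
    inputs, outputs = transitions[transition]
    new_marking = Counter(marking)
    for place in inputs:
        new_marking[place] -= 1
    for place in outputs:
        new_marking[place] += 1
    return +new_marking  # unary plus drops places with zero tokens

marking = Counter({"start": 1})  # the marking: one token in the start place
after = fire(marking, "register")
print(is_enabled(marking, "ship"), is_enabled(after, "ship"))
```

Firing returns a new marking instead of changing the old one, which keeps the network itself static: only the marking changes.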

A reachability graph is a transition system with one initial state and no explicit final marking.
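One way to picture the reachability graph is to build it by breadth-first search from the initial marking. This is only a sketch: the net `NET` and the helper names are made up, arcs are assumed to carry one token each, and the `limit` guard is there because an unbounded net has infinitely many markings.

```python
from collections import Counter, deque

# Hypothetical net: after "a", the transitions "b" and "c" can fire in either order.
NET = {
    "a": (["i"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p2"], ["p4"]),
    "d": (["p3", "p4"], ["o"]),
}

def freeze(marking):
    """Hashable form of a marking: sorted (place, tokens) pairs with tokens > 0."""
    return tuple(sorted((p, n) for p, n in marking.items() if n > 0))

def successors(state):
    """Yield (transition, next state) for every transition enabled in `state`."""
    marking = Counter(dict(state))
    for name, (inputs, outputs) in NET.items():
        if all(marking[p] >= 1 for p in inputs):
            nxt = Counter(marking)
            nxt.subtract(inputs)  # consume one token per input place
            nxt.update(outputs)   # produce one token per output place
            yield name, freeze(nxt)

def reachability_graph(initial, limit=10_000):
    """Return (states, edges) of the transition system rooted at `initial`."""
    start = freeze(initial)
    seen, edges, queue = {start}, [], deque([start])
    while queue:
        state = queue.popleft()
        for name, nxt in successors(state):
            edges.append((state, name, nxt))
            if nxt not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("too many markings; the net may be unbounded")
                seen.add(nxt)
                queue.append(nxt)
    return seen, edges

states, edges = reachability_graph({"i": 1})
print(len(states), "markings,", len(edges), "edges")
```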

Workflow Net

A workflow net should have a well-defined start and end and should be free of obvious anomalies (soundness). What makes up an anomaly?
A workflow net has one source place and one sink place. Every other place and transition should be on a path from the source to the sink.
A Workflow net is sound:

  • If it is safe - a place cannot hold multiple tokens at the same time. Think of it this way, the invoice can be copied, but the administrator shouldn't be working on two copies of the same invoice at the same time.
  • Proper completion - if the sink place is marked, all other places are empty. An administrator should NOT be working on the order after it has reached the final state.
  • Option to complete - it should always be possible to reach the end state. The paperwork should not get lost.
  • Absence of dead parts - every transition should be able to fire in at least one case. There should be no step that can never be reached.
  • Equivalently, a workflow net is sound if and only if the short-circuited Petri net is live and bounded.
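For small nets, the conditions listed above can be checked by brute force over the reachable markings. The following is a sketch under stated assumptions, not a real verification tool. The two nets and all helper names are made up, arcs carry one token each, and a marking is written as a sorted tuple of place names with one entry per token.

```python
from collections import deque

def fire_all(net, marking):
    """Yield (transition, next marking) for each transition enabled in `marking`."""
    for name, (inputs, outputs) in net.items():
        rest = list(marking)
        try:
            for place in inputs:
                rest.remove(place)  # consume one token per input place
        except ValueError:
            continue  # an input place is empty, so the transition is not enabled
        yield name, tuple(sorted(rest + list(outputs)))

def explore(net, initial, limit=10_000):
    """Breadth-first search over reachable markings; `limit` guards unbounded nets."""
    seen, edges, queue = {initial}, [], deque([initial])
    while queue:
        marking = queue.popleft()
        for name, nxt in fire_all(net, marking):
            edges.append((marking, name, nxt))
            if nxt not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("too many markings; the net may be unbounded")
                seen.add(nxt)
                queue.append(nxt)
    return seen, edges

def check_soundness(net, source="i", sink="o"):
    initial, final = (source,), (sink,)
    states, edges = explore(net, initial)
    # Work backwards from the final marking to find every marking that can finish.
    can_finish = {final} if final in states else set()
    changed = True
    while changed:
        changed = False
        for marking, _, nxt in edges:
            if nxt in can_finish and marking not in can_finish:
                can_finish.add(marking)
                changed = True
    return {
        "safe": all(len(set(m)) == len(m) for m in states),
        "proper_completion": all(m == final for m in states if sink in m),
        "option_to_complete": can_finish == states,
        "no_dead_parts": {name for _, name, _ in edges} == set(net),
    }

# Hypothetical sound net: "a" splits into parallel "b" and "c", and "d" joins them.
good = {"a": (["i"], ["p1", "p2"]), "b": (["p1"], ["p3"]),
        "c": (["p2"], ["p4"]), "d": (["p3", "p4"], ["o"])}
# Hypothetical unsound net: a choice between "a" and "b", but "d" needs both.
bad = {"a": (["i"], ["p1"]), "b": (["i"], ["p2"]), "d": (["p1", "p2"], ["o"])}

good_report = check_soundness(good)
bad_report = check_soundness(bad)
print(good_report)
print(bad_report)
```

In the second net the paperwork gets stuck after either choice, so the end state can never be reached and the join transition is a dead part.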

Model Based Analysis

Verification (soundness checking) and performance analysis (simulation). But the analysis is limited by the quality of the model. In other words, are people really doing what they say they are doing? Is the administrator really adding value? How do we re-verify that the model is correct?
Process mining is the direct connection between the model and the event data.

Alpha Algorithm

This is a process discovery algorithm.
It looks only at the control flow - just the order of activities, by case, while ignoring any of the other data.
So, we could wind up with a list of traces like [(register_order, check_stock, ship_order, handle_payment),(register_order, check_stock, cancel_order),....]
So, the goal is to come up with a multiset of traces like: $$L_1 = [\langle a, b, c, d\rangle^3, \langle a, c, b, d\rangle^2,\langle a, e, d\rangle ]$$ where the trace \(\langle a, b, c, d\rangle\) has happened 3 times, \(\langle a, c, b, d\rangle\) has happened 2 times and \(\langle a, e, d\rangle\) once.
The alpha algorithm takes this event log and creates a model that fits what has been observed.
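Getting from raw events to such a collection of traces with counts is mostly grouping and sorting. Here is a minimal sketch. The event tuples, the case IDs and the integer time stamps are made up, and `to_trace_multiset` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Hypothetical events as (case ID, activity name, time stamp), interleaved across cases.
events = [
    ("order-1", "register_order", 1),
    ("order-2", "register_order", 2),
    ("order-1", "check_stock", 3),
    ("order-2", "check_stock", 4),
    ("order-1", "ship_order", 5),
    ("order-2", "cancel_order", 6),
    ("order-3", "register_order", 7),
    ("order-1", "handle_payment", 8),
    ("order-3", "check_stock", 9),
    ("order-3", "ship_order", 10),
    ("order-3", "handle_payment", 11),
]

def to_trace_multiset(rows):
    """Group events by case, order each case by time, and count identical traces."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in rows:
        by_case[case_id].append((timestamp, activity))
    return Counter(
        tuple(activity for _, activity in sorted(case_rows))
        for case_rows in by_case.values()
    )

log = to_trace_multiset(events)
for trace, count in log.items():
    print(count, trace)
```

Everything except the case ID, the activity name and the ordering is thrown away here, which is exactly the control-flow view.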
For this week, we are looking at fitness, or the ability to explain desired behavior. Later on, we will look at precision, generalization and simplicity.
So the algorithm is looking for the following relations between activities x and y:

  • Direct succession: x>y - x is directly followed by y in at least one trace
  • Causality: x \(\rightarrow \) y iff x>y and not y>x
  • Parallel: x||y iff x>y and y>x
  • Choice: x#y iff not x>y and not y>x
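These relations can be read straight off the log. The sketch below computes them for the example log \(L_1\). The function name `footprint` and the "<-" label for reversed causality are choices made for this illustration. It covers only this first step of the alpha algorithm, not the construction of places.

```python
from collections import Counter
from itertools import product

# The example log L1: each trace with the number of times it was observed.
L1 = Counter({
    ("a", "b", "c", "d"): 3,
    ("a", "c", "b", "d"): 2,
    ("a", "e", "d"): 1,
})

def footprint(log):
    """Classify every ordered pair of activities as '->', '<-', '||' or '#'."""
    activities = sorted({a for trace in log for a in trace})
    # Direct succession x>y: x is immediately followed by y in some trace.
    follows = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}
    table = {}
    for x, y in product(activities, repeat=2):
        forward, backward = (x, y) in follows, (y, x) in follows
        if forward and backward:
            table[x, y] = "||"  # parallel
        elif forward:
            table[x, y] = "->"  # causality
        elif backward:
            table[x, y] = "<-"  # causality in the other direction
        else:
            table[x, y] = "#"   # choice
    return table

fp = footprint(L1)
print(fp["a", "b"], fp["b", "c"], fp["b", "e"])
```

Note that the trace counts are never used, which is one reason a single rare or noisy trace can change the result.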

The alpha algorithm can discover choices, concurrency and loops. But it cannot cover all situations. Limitations:

  • Implicit places - places that don't add anything: removing one does not change the behavior of the model.
  • Loops of length 1 and length 2
  • Silent transitions - \( \tau \) represents a transition that doesn't have an event, so it never shows up in the log.
  • The resulting model might not be a sound WF-net.
  • Representational bias
  • Noise can really affect the model. Rare events will muck up the model, because there is no determination of how many times a trace has happened.
  • Completeness

Petri Nets

A marking is dead if there is no transition enabled in it.
A Petri net has a potential deadlock if there is a reachable dead marking.
A transition t is live if, from any reachable marking, it is possible to reach a marking that enables t.
A Petri net is live if all of its transitions are live.
Complete traces go from the start state to the final state.
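To make the deadlock definition concrete, here is a small sketch that searches a net for reachable dead markings. The net and the helper names are made up. The net is assumed to be safe, so a marking can be written as the set of places that hold a token.

```python
from collections import deque

# Hypothetical net: a choice between "a" and "b", followed by "d", which needs both.
NET = {
    "a": ({"i"}, {"p1"}),
    "b": ({"i"}, {"p2"}),
    "d": ({"p1", "p2"}, {"o"}),
}

def enabled(marking):
    """Transitions whose input places all hold a token."""
    return [name for name, (inputs, _) in NET.items() if inputs <= marking]

def fire(marking, transition):
    """Move tokens from the input places to the output places (safe net assumed)."""
    inputs, outputs = NET[transition]
    return frozenset((marking - inputs) | outputs)

def reachable_dead_markings(initial):
    """Collect every reachable marking in which no transition is enabled."""
    seen, queue, dead = {initial}, deque([initial]), []
    while queue:
        marking = queue.popleft()
        candidates = enabled(marking)
        if not candidates:
            dead.append(marking)
        for name in candidates:
            nxt = fire(marking, name)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dead

dead = reachable_dead_markings(frozenset({"i"}))
print([sorted(marking) for marking in dead])
```

By this definition the final marking of a workflow net is dead as well, since nothing is enabled once the sink is reached. The markings to worry about are the dead ones that are not the final marking.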


  • Petri-Net - One version of a way to document a process model
  • control-flow - Something moving from here to there
  • k-bounded - A place (p) is k-bounded if there is no reachable marking with more than k tokens in p. A Petri net is k-bounded if all places are k-bounded
  • safe - A place (p) is safe if it is 1-bounded. A Petri net is safe if all the places are 1-bounded

Workflow Nets

  • Source place - For a workflow net, this is the start or i.
  • Sink place - For a workflow net, this is the end or o
  • marking - A state in a Petri net

Published on 11 November 2014

Brian Hoover is a full stack software engineer with many years of experience. He's also a facilitator at the University of Phoenix