Larry Hughes: Dataflow Diagrams

Dataflow Diagrams

Processes: Circles ("bubbles") labeled with actions represent functions that transform inputs into outputs.
Data Flows: Directed arrows labeled with data type represent data moving through the system.
Data Stores: Bars (also boxes and ellipses) labeled with data type represent data or aggregates of data that must be remembered for a period of time (typically implemented as files or a database).
Externals/Terminators: Boxes represent external entities (sources/sinks) with which the system communicates (e.g., individuals, groups, external computer systems).
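The four element types above can be captured in a small data model. The following is a minimal sketch, not a standard notation: the class names (Kind, Node, Flow, Diagram) and the example elements (Customer, Take Order, Orders) are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    PROCESS = "process"      # bubble: transforms inputs into outputs
    STORE = "store"          # bar: data remembered for a period of time
    EXTERNAL = "external"    # box: source/sink outside the system


@dataclass(frozen=True)
class Node:
    name: str
    kind: Kind


@dataclass(frozen=True)
class Flow:
    source: Node
    target: Node
    data: str                # label on the arrow: the data being moved


@dataclass
class Diagram:
    nodes: list[Node] = field(default_factory=list)
    flows: list[Flow] = field(default_factory=list)

    def add_flow(self, source: Node, target: Node, data: str) -> Flow:
        # Register both endpoints the first time they are seen.
        for node in (source, target):
            if node not in self.nodes:
                self.nodes.append(node)
        flow = Flow(source, target, data)
        self.flows.append(flow)
        return flow


# Hypothetical example: a customer submits an order that is then stored.
customer = Node("Customer", Kind.EXTERNAL)
take_order = Node("Take Order", Kind.PROCESS)
orders = Node("Orders", Kind.STORE)

dfd = Diagram()
dfd.add_flow(customer, take_order, "order")
dfd.add_flow(take_order, orders, "order record")
```

Data flows are modeled as separate objects rather than as attributes of nodes because each arrow carries its own label.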

The Context Diagram

shows the relation (data flows in and out) of the "system" to external entities (i.e., terminators)

Diagram/Figure 0

first level decomposition of system into major processes (e.g., input, transform, output)
contains 2-10 processes; do not suddenly show a huge amount of detail, and do not show a trivial decomposition
hides external entities but maintains in/out flows
data stores are shown here (and at lower levels)

Subdiagrams / Process Descriptions

Data Dictionary

each process should have both input and output
What does it mean if one is missing?
each store probably has both input and output
What does it mean if one is missing?
Why have a write-only store?
Might have a read-only store.
data cannot flow between stores
not even if you put the stores close to each other
diagrams must be balanced across levels
flows in/out of context diagram == flows in/out of diagram 0
all parent/child diagrams must be balanced
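Several of these rules are mechanical enough to check automatically. The sketch below assumes a hypothetical encoding in which each node name maps to a kind and each flow is a (source, target, data) tuple; the function names and the sample diagram are illustrative only.

```python
from collections import Counter

# Hypothetical diagram: node name -> kind, flows as (source, target, data).
kinds = {
    "Customer": "external",
    "Take Order": "process",
    "Print Report": "process",
    "Orders": "store",
    "Archive": "store",
}
flows = [
    ("Customer", "Take Order", "order"),
    ("Take Order", "Orders", "order record"),
    ("Orders", "Archive", "old orders"),          # store-to-store flow
    ("Orders", "Print Report", "order record"),   # Print Report has no output
]


def check_rules(kinds, flows):
    """Report processes missing an input or output, and store-to-store flows."""
    problems = []
    has_input = {target for _, target, _ in flows}
    has_output = {source for source, _, _ in flows}
    for name, kind in kinds.items():
        if kind == "process":
            if name not in has_input:
                problems.append(f"process '{name}' has no input")
            if name not in has_output:
                problems.append(f"process '{name}' has no output")
    for source, target, data in flows:
        if kinds[source] == "store" and kinds[target] == "store":
            problems.append(f"flow '{data}' connects two stores")
    return problems


def is_balanced(parent_in, parent_out, child_in, child_out):
    """Flows into/out of a parent must match the flows into/out of its child diagram."""
    return (Counter(parent_in) == Counter(child_in)
            and Counter(parent_out) == Counter(child_out))


print(check_rules(kinds, flows))
```

Stores with only inputs or only outputs are deliberately not reported, since the notes treat them as something to question rather than as a definite error.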

ALL elements should be named
use meaningful/specific names (not "Transform Data")
data flows are named after the flowing data
record, form, order, ...
unlabeled flows represent complete records
data stores are named after the stored data
records, forms, orders, ...
processes state transformations as imperatives
Read Input, Calculate Interest, Save Results, ...
(not Reader, Calculator, File Cabinet)
processes should be numbered hierarchically
1.2.4 is a subprocess of 1.2
external entities named after what they represent
person, organization, other system
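Hierarchical numbering makes the parent of any process recoverable from its number alone. A minimal sketch, assuming numbers are dot-separated strings such as "1.2.4" (the helper names are hypothetical):

```python
def parent_number(number):
    """Return the number of the enclosing process, or None at the top level."""
    head, sep, _ = number.rpartition(".")
    return head if sep else None


def is_subprocess(child, parent):
    """True if `child` lies anywhere below `parent` in the hierarchy."""
    c, p = child.split("."), parent.split(".")
    return len(c) > len(p) and c[: len(p)] == p


print(parent_number("1.2.4"))         # 1.2
print(is_subprocess("1.2.4", "1.2"))  # True
print(is_subprocess("1.20", "1.2"))   # False
```

Comparing the split components rather than raw string prefixes keeps "1.20" from being mistaken for a child of "1.2".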

The process may be unstructured, tedious, and iterative, but the result must be well-organized.

determine the system boundary
try to consider a larger context at first because extending systems is harder than reducing their scope
identify system inputs/outputs
input/display devices create inputs/outputs, but focus on what information is flowing, not how
emphasize data flow, not material flow
draw initial context diagram
mentally trace data paths
does each input contribute to an output?
identify stores
files (electronic or manual), documents, collections, references
draw top-level diagram (0) of whole system
focus your efforts here, perhaps showing more detail than in other diagrams
trace some complex processes forward from inputs and backward from outputs
develop an initial data dictionary
system inputs/outputs, data stores, internal data flows
emphasize logical rather than physical description (do not model trivial changes of format/media)
evaluate the diagram and refine/repartition the diagram
use walkthroughs to find mistakes
combine many small processes / break complex processes, but make the changes based on content, not just size
continue decomposition until primitives are reached
stop when a transformation is trivial (e.g., a single function)
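The "mentally trace data paths" step can also be done mechanically once the flows are written down. The sketch below is one possible approach, assuming flows are (source, target) pairs; the function names and the sample system are hypothetical.

```python
from collections import defaultdict, deque


def reachable(flows, start):
    """Return every node that data leaving `start` can eventually reach."""
    graph = defaultdict(list)
    for source, target in flows:
        graph[source].append(target)
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unused_inputs(flows, sources, sinks):
    """Sources whose data never contributes to any sink."""
    return [s for s in sources if not reachable(flows, s) & set(sinks)]


# Hypothetical system: the Supplier's input never reaches an output.
flows = [
    ("Customer", "Take Order"),
    ("Take Order", "Orders"),
    ("Orders", "Print Report"),
    ("Print Report", "Manager"),
    ("Supplier", "Log Delivery"),
]
print(unused_inputs(flows, ["Customer", "Supplier"], ["Manager"]))  # ['Supplier']
```

An input reported here is not necessarily a mistake, but it is a good candidate for discussion in a walkthrough.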

Completeness

Consistency

Correctness

Communication

Can people (users, developers, ...) understand the description?
(users may only look at Diagram 0)
Are good names used?
Is the hierarchy of reasonable breadth/depth?
no diagrams with 1-2 or 10+ processes
no process specifications of 1-5 or 100+ lines
Are the diagrams neat?
not a tangled mess (use multiple instances of stores to avoid a web)
Are the materials well-organized?
table of contents, paginated reports, section headings
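The breadth guideline can be turned into a simple check. This sketch assumes each diagram is recorded as a list of its process numbers. The default limits follow the "no diagrams with 1-2 or 10+ processes" guideline and are parameters so they can be adjusted; the function name and sample data are hypothetical.

```python
def breadth_warnings(diagrams, min_processes=3, max_processes=9):
    """List diagrams whose process count falls outside the allowed range."""
    warnings = []
    for name, processes in diagrams.items():
        count = len(processes)
        if not min_processes <= count <= max_processes:
            warnings.append(f"{name}: {count} processes")
    return warnings


# Hypothetical hierarchy: Diagram 2 decomposes into a single process.
diagrams = {
    "Diagram 0": ["1", "2", "3", "4"],
    "Diagram 2": ["2.1"],
}
print(breadth_warnings(diagrams))  # ['Diagram 2: 1 processes']
```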

Independence of Transformations

Maximize modularity, minimize coupling
design general processes
Use topological repartitioning
"promote" a subdiagram and then refactor it
Reduce shared information (among processes)
use a database to store data for later use
Use direct data flows
not through several processes

Aggregation of Related Transformations

Maximize cohesion within subdiagrams
processes in a subdiagram should share attributes

From Perlman, Ohio State University, 1996