The map and problem described here were part of my presentation Mapping as a tool for thought, and mentioned in my interview with @john.grant and Ben Mosior (to appear sometime soon in the Wardley Maps community youtube channel). I’m looking for ideas on how to make this map easier to understand and useful.
Problem I’m trying to solve
As a consultant (and as someone always trying to keep up with technology) I’m interested in being able to answer three questions of a language or technology:
- How easy it is to find work/workers in the area right now?
- How hard it is to learn?
- How easy is it going to be to find work/workers in the area once I’m proficient enough?
Also, I need to know the relationship between any of them.
This problem has been on the back of my mind for many years, and upon getting proficient with Wardley mapping, I thought I could just map it. Of course, it’s not a Wardley map, because the axes are completely different, but having anchors and movement, it is spiritually close enough for me.
In the diagram I will show in a while, I have placed technologies I am proficient in, currently learning, or looking forward to learn. In all of them I am at least a beginner in the sense that I know what they are used for and have done some minor PoC (proof of concept) to get an idea of how the work.
Looking for axis metrics
There are several ways you can address this technology space to answer the questions above. The first and easiest metric, and one of the axes I have used (Y axis) is Difficulty. Since I know something about each technology I can rank them on Difficulty, at least in relationship with each other. It’s only a qualitative metric of difficulty, because in any new technology there are always unknown unknowns. There is no movement assumed in this axis, because Difficulty is supposed to be consistent throughout (leaving aside the more you know the easier it is to learn as well as familiarity with similar concepts that offset that, you could think of these two concepts as doctrine in such a map).
One natural metric for the other axis could be popularity, as measured by any of the several programming language/framework popularity rankings. You can use popularity as one of the axes, and use arrows to indicate whether it is growing in popularity or diminishing in popularity. But, popularity alone does not help in answering questions 1 and 3. What we need is knowing how large the market for this technology is, and how large the pool of workers in this market is. Could we use either as an axis?
If we were to use market size as X axis, we would probably have large markets on the right and small markets on the left, we would likely use arrows to indicate growing markets and shrinking markets. But, market size alone won’t answer the questions either. A small thought experiment: imagine we have the largest market possible, it is growing… but the pool of workers for that technology is 2x the size of the market. It would be impossible to find work there (but, would be easy to find workers). This suggests that a possible correct for the X axis is market saturation, i.e. the ratio of market size with worker pool. Highly saturated markets are uninteresting to look for work, but are very interesting if you are starting a company: you’d have an easy time finding hires for that technology. Market saturation is related to flows (as in Thinking in Systems style flow analysis) of users into a variable-sized container.
Markets become saturated in one of 3 ways:
- Market is growing, but the pool of workers grows faster
- Market is stagnating with a growing pool of workers
- Market is shrinking faster than the pool of workers is shrinking
Cases 1 and 2 are the most usual (I’d put Python as type 1 and Java as type 2), but 3 is an interesting situation: it would indicate a technology that has died in favour of another. Workers in that pool have retrained in the new technology, but are still in pool for the dying technology (for instance, traditional MapReduce).
Markets become desaturated in one of 3 ways as well:
- Market is growing faster than the pool of workers is growing
- Market is stagnating with a shrinking pool of workers
- Market is shrinking slower than the pool of workers is shrinking
Likewise, here, cases 1 and 3 may be the most interesting. I’d put Kubernetes in 1, and Scala in either 2 or 3. Please note that not only are these subjective evaluations, but are not meant to be negative. Scala has been my preferred language for a long while.
We can represent all these with slanted arrows: slant up covers growing markets, slant down covers shrinking markets. And then the arrow points left or right whether it is becoming saturated or desaturated.
With market saturation as an axis and arrows to indicate evolution of a technology, we can now almost answer questions 1 and 3. There is still the question of market size, which can’t be represented with such a relative measure. Although we could add circles to represent current market size, that would bring an already weird map to more weirdness. Hence, market size is not considered.
The map
Here you can see the map. Before I get a bit into the topography of it, let me quickly define some of the technologies:
APL: programming language based on non-ASCII symbols designed in the 70s. Not extensively used, but in use
Airflow: workflow scheduler for data operations
Akka: Scala/JVM actor framework used for reactive programming, clustering, stream processing, etc
Arrow: cross-language platform for in-memory data. Used in Spark, Pandas, etc
Awk: special-purpose programming language designed for text processing
Beam: unified model for data processing pipelines. Can use Spark, Flink and others as execution engines
Dask: cluster-capable, library for parallel computation in Python.
Datafusion: rust-based, Arrow-powered in-memory data analytics
Docker: containerisation solution
FP: as in Functional Programming. Software development paradigm based on immutable state, among other things. Scala and Haskell are some the most mainstream languages for it
Flink: cluster computing framework for big data, stream focused
Forth: stack based, low level programming language. Not in common use.
FoundationDB: multi-model distributed NoSQL database, offering “build your own abstraction” capabilities
Go: statically typed, compiled programming language
Haskell: statically typed, purely functional programming language
Hive: data warehousing project over Hadoop, roughly based in “tables”
Kafka: cluster based stream processing platform (often used as a message bus) written in Scala
Kubernetes: container orchestration system for managing application deployment and scaling. Written in Go, depending (non-strictly) on Docker
Presto: distributed SQL engine for big data
Python: interpreted high level programming language, very extended in data science and engineering
Rust: memory safe, concurrency safe programming language. Has some functional capabilities
Scala: JVM based language offering strong typing and functional and OOP capabilities
Spark: cluster computing framework for big data, batch and stream (stronger in batch)
These cover a range of the data engineering space (Flink, Spark), as well as technologies I want to get better at and are close enough (Kubernetes, FoundationDB) and technologies I know but are not directly related (AWK, APL, Forth) and are used as anchors.
(continues)