Archive 27/02/2024.

Conversational programming - why can it happen right now?

chris.daniel

Since the topic of conversational programming seems to attract a lot of attention and has a significant impact on things that are very close to my heart (programming), I have decided to map it.

This is a map representing the landscape in the past (say, pre-2005). I would say the landscape was as follows:

  • A sufficiently efficient speech engine, capable of both emitting voice (easier) and understanding input (much harder), did not exist. This alone was a significant challenge (Siri was released only in 2011), and it was enough to block conversational programming.
  • User commands had no meaning. It might have been possible to dictate things, but that was just voice input without meaning. It could serve as a way to dictate a computer program in, for example, C, but first of all it would be slower than typing, and secondly it would still be C, so such a mechanism could work only for people already familiar with the art of software development.
  • There is a challenge in adding meaning to words. If the user says “send an email”, the system needs to know what that means, which service can do it, and what parameters are necessary. That requires:
    • a catalogue of computer-accessible services (we had some early attempts in that space - SOA, but public APIs were still emerging)
    • a domain-specific lexicon: in the beginning one for programming, but later others, so that if, for example, a customer sends a complaint, the system knows what “sends” means and what the complaint contains (context-specific words and phrases, maybe meaning and categorisation).
  • Finally, there is a need to handle edge cases, where something does not fit into the programming model or the flow does not cover all possible situations.
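
To make the third and fourth points concrete, here is a minimal Python sketch of resolving a phrase against a service catalogue. Everything in it is an assumption made for illustration: the service names, the phrases and the parameter lists are hypothetical, and naive substring matching stands in for real language understanding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str                  # hypothetical service identifier
    phrases: tuple[str, ...]   # lexicon entries that trigger the service
    required: tuple[str, ...]  # parameters the service needs


# Hypothetical catalogue; a real one would be built from API descriptions.
CATALOGUE = (
    Service("email.send", ("send an email", "email"), ("recipient", "body")),
    Service("calendar.create", ("schedule a meeting",), ("date", "attendees")),
)


def resolve(utterance: str, known: dict[str, str]):
    """Return (service name, missing parameters), or None if nothing matches."""
    text = utterance.lower()
    for service in CATALOGUE:
        if any(phrase in text for phrase in service.phrases):
            missing = [p for p in service.required if p not in known]
            return service.name, missing
    return None  # edge case: the request falls outside the catalogue


print(resolve("Please send an email to Ann", {"recipient": "ann@example.com"}))
print(resolve("Make my routing 20% faster", {}))
```

The first call finds a service and reports that the body is still missing, which is where a conversational engine would ask a follow-up question. The second call returns None, which is the edge case from the last point.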

As you can see on the map above, there are plenty of custom-built components, which together form a wobbly tower. It does not mean that building such a solution was impossible, but that it would be ridiculously expensive (as is usually the case if you don’t use standard and predictable components), and it would bear a significant risk of not addressing real customer needs (as conversational programming would be something completely new). I think that IBM Watson fits such a description.

Time has passed, and things have changed a bit. We, as the industry, have learned quite a lot:
The same map without diff, just things as they seem to be today (for the sake of readability):

  • APIs with description are standard and expected
  • NLP is efficient enough to be used
  • Domain-specific lexicons are being (slowly) created. Check Amazon Comprehend.
  • It is possible to compose a process out of available services (see @alexander.simovic’s work).
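
As a rough illustration of the last point, composing a process out of services can be as simple as chaining functions that pass a shared context along. The three steps below are hypothetical stand-ins for real API calls (they are not taken from the work mentioned above), and the complaint text is made up.

```python
from functools import reduce
from typing import Callable

Step = Callable[[dict], dict]


def fetch_complaint(ctx: dict) -> dict:
    # Stand-in for a call to a ticketing API.
    return {**ctx, "text": "My parcel arrived damaged"}


def categorise(ctx: dict) -> dict:
    # Stand-in for a lexicon-backed text analysis service.
    category = "delivery" if "parcel" in ctx["text"].lower() else "other"
    return {**ctx, "category": category}


def route(ctx: dict) -> dict:
    # Stand-in for a routing service.
    queue = {"delivery": "logistics-team"}.get(ctx["category"], "general-support")
    return {**ctx, "queue": queue}


def compose(*steps: Step) -> Step:
    """Chain steps so that each one receives the previous step's output."""
    return lambda ctx: reduce(lambda acc, step: step(acc), steps, ctx)


handle_complaint = compose(fetch_complaint, categorise, route)
print(handle_complaint({"ticket_id": 42}))
```

Each step only reads and extends the context, so a conversational engine could assemble such a flow from catalogue entries without knowing how any step is implemented.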

Conclusions:

  • Lexicons are being created for the sake of automation and as an additional input to AI engines, not only for conversational interfaces. I have not looked closely at how lexicons are applied in AI and automatic text processing, but I expect text processing drives their evolution, and voice programming will be able to adopt them.
  • The only significant piece of know-how that is still preventing voice programming is how to handle edge cases, especially when the user is not sure what is expected or possible, or uses vague language. Translating human language into something that runs on a von Neumann architecture (the abstract model of how computers work) is usually the job of human developers. I can’t see that aspect changing in the foreseeable future, but developers will get much more efficient if some AI becomes capable of guiding users on how to specify their requirements. Over time, that can become a part of conversational programming.
  • Right now, conversational programming is not programming that has been made conversational. It is a subset of programming (assembling flows) that got a conversational interface. It is important to make this distinction, as users will probably never be able to tell the computer “create a 20% more efficient algorithm for car routing”.
  • A market may appear for flow patterns and flow building blocks, as well as for domain-specific lexicons. This market will be driven by conversational engines finding the most common demands that cannot yet be met from the service catalogue.
  • Most importantly, right now, the necessary investment seems to be much lower than it was in the past. And one can focus only on missing components, which makes the challenge approachable.
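
The point about the market being driven by unmet demand can be sketched in a few lines of Python: count the requests that no catalogued service covers. The catalogue contents and the request log below are made up for illustration.

```python
from collections import Counter

# Hypothetical set of intents the service catalogue can already satisfy.
CATALOGUE = {"send email", "schedule meeting"}

# Hypothetical log of intents extracted from user conversations.
requested = [
    "send email", "translate document", "book taxi",
    "translate document", "schedule meeting", "translate document",
    "book taxi",
]


def unmet_demand(intents, catalogue):
    """Count requested intents that no catalogued service covers."""
    return Counter(i for i in intents if i not in catalogue).most_common()


print(unmet_demand(requested, CATALOGUE))
```

The most frequent entries in the result are the candidates for new building blocks or lexicon entries.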

elves

Great stuff. I like your analysis: the threshold to implementation has lowered, and in some places something close to this has already happened.

I’ve been fascinated with natural language parsing since I worked on a Chomsky type-0 parser in the 1970s. All I’ll say about that is that lexicons aren’t enough; it has to be phrase-based.

Ever since the Fifth Generation computing fiasco, people have talked about conversational programmers’ workbenches. They have to do some things we humans can’t, though, because otherwise we could just employ more programmers’ apprentices.

More recently, convdev has become a thing, although, like “Agile”, the term has been subverted.

In one project someone plugged chatbots into SharePoint. Quite a lot of the grunt work for DevOps was parsed out of comments.

Somewhere I have a diagram for an AI supported development environment, and it indicated to me that conversational connected speech is still the bottleneck. Most modern dev IDEs now have snippet handling and do automatic refactoring. What has been happening in chip layout for years is starting to happen in software layout.

I’m not sure that edge (or even corner) cases are a problem. In the testing world there are generators that create all permutations of variables, and the results always turn up unexpected cases. I’m sure that generating many (millions of) variations and annealing solutions that meet requirements can be (and probably is being) done now.
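
A toy version of such a generator, in Python, assuming a hypothetical pricing flow and a single requirement (the price must never be negative). It only does the exhaustive generation part, not the annealing.

```python
from itertools import product


def final_price(total: float, is_member: bool, coupon: float) -> float:
    # Hypothetical flow under test: members get 10% off, then a coupon applies.
    price = total * (0.9 if is_member else 1.0)
    return price - coupon


def violations():
    """Try every combination of inputs and collect those breaking the rule."""
    totals = [0.0, 5.0, 100.0]
    memberships = [False, True]
    coupons = [0.0, 10.0]
    return [
        (total, member, coupon)
        for total, member, coupon in product(totals, memberships, coupons)
        if final_price(total, member, coupon) < 0  # requirement: never negative
    ]


print(violations())
```

Even with three variables and a handful of values it turns up the case nobody specified: a coupon worth more than the basket.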

Once again mapping proves invaluable and a great bit of work, thanks.