- Given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model.
- We consider the case of prior procedural knowledge, such as knowing the overall recursive structure of a sequence transduction program, or the fact that a program will likely use arithmetic operations on real numbers to solve a task.
- To this end we present a differentiable interpreter for the programming language Forth.
- As the program interpreter is end-to-end differentiable, we can optimize the program's behaviour directly through gradient descent on user-specified objectives, and also integrate the program into any larger neural computation graph.
- We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex transduction tasks such as sequence sorting or addition with substantially less data and better generalisation across problem sizes.
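The "end-to-end differentiable" property typically comes from replacing discrete memory accesses with soft attention: instead of reading one address with a hard pointer, the machine reads a weighted average over all addresses, so gradients can flow back into the pointer weights. A minimal numpy sketch of such a soft read (illustrative only; `soft_read` and its shapes are assumptions for this example, not the paper's actual implementation):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: turns raw logits into a
    # probability distribution over memory addresses.
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_read(memory, pointer_logits):
    # A "soft" pointer: a distribution over addresses rather than
    # a single index. The read is the expected value under that
    # distribution, which is differentiable w.r.t. the logits.
    weights = softmax(pointer_logits)
    return weights @ memory

memory = np.array([1.0, 2.0, 3.0, 4.0])
logits = np.array([0.0, 0.0, 10.0, 0.0])  # mass concentrated on address 2
value = soft_read(memory, logits)          # close to memory[2] == 3.0
print(value)
```

As the logits become more peaked, the soft read approaches a hard memory lookup, while remaining trainable by gradient descent throughout.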
@_rockt: “Differentiable memory, Turing machines & ADTs, now a diff. programming language! #dlearn #ai”
[1605.06640] Programming with a Differentiable Forth Interpreter