Program specification
What is program specification
I am not going here to refer to any authorities or encyclopedic articles. The common sense tells us that we think of specification of more or less strict description of program behavior in a whole or its separate parts. The hope is that the specification
- is more human friendly than a code
- is as much as possible implementation agnostic
- reflects designer's vision rather than developer's one
- does't require knowledge of programming languages maybe besides some of the form of stricter human language or some of the simplified form of predicates language to express propositions like $∀ ...$ or $∃ ...$.
Unfortunately that is only a hope. Deep thinking about the possibility of such specification emergence leads us to realizing that usual software development process especially in the not-very-big companies is like
- whoever (sales manager, kind of director, some presuming guy or committee etc) tell the team lead (or anybody who has the same or similar language with developers) something like "We need the following program or feature or whatever which could be programmed"
- then team lead discuss with developers and testers the architecture and iteratively and agilely they do the following cycle think/discuss - code - test - analyze
- sometimes they do the phase of think/discuss together with initial task giver
When finally the program looks good they can finish and go to the production or begin to think about additional verification. They might even have a documentation which is "more human readable than a code", "implementation agnostic" (rare), reflects designer's vision and even use more or less formal language. But the problem is that all those are created post factum and very often by the same people who participate in development and testing. Another contradiction here is that everybody is always in a hurry, production use and customers don't like to wait. And as a result the documentation describes not "what program must do" but "what it hopefully does".
I will not give the answer here, I just put the issue. And later will write some ideas of how all those can be mitigated in theory.
Does the program specify itself?
Programs often consist of functions, methods, blocks, whatever... And people think that while specifying the program in general they must start with individual functions specification. Btw I totally disagree with that. And here and mostly in the next posts I will try to explain myself.
Start with the example of factorial function:
In C++
(or any similar language):
(recursive)
unsigned int factorial(unsigned int n)
{
if (n == 0 || n == 1)
return 1;
return n * factorial(n - 1);
}
or not recursive
unsigned int factorial(unsigned int n)
{
int res = 1, i;
for (i = 2; i <= n; i++)
res *= i;
return res;
}
(I will not further give recursive examples on non-functional languages.)
Haskell
recursive:
fac :: (Integral a) => a -> a
fac 0 = 1
fac n = n * fac (n - 1)
Coq
recursive:
Fixpoint fac (n:nat) : nat :=
match n with
| O => 1
| S n' => (S n')*(fac n')
end.
Coq
in proof mode:
Definition fac_proofmode (n:nat) : nat.
induction n ; [exact 1 | exact (S n*IHn)].
Defined.
Mathematical definition is
$$ n!=\Pi_{i=1}^n i $$
Coq with some helpers which simulate the mathematical definition:
Fixpoint fold_left {A B}
(f : A -> B -> A)
(l:list B) (a0:A) : A :=
match l with
| nil => a0
| cons b t => fold_left f t (f a0 b)
end.
Fixpoint series (n:nat) (m:nat) {struct m} : list nat :=
match m with
| O => nil
| S m' => if (leb n m) then cons m (series n m') else nil
end.
Definition fac_foldl (n:nat) : nat :=
fold_left mult (series 1 n) 1.
Note that fac_foldl
looks already very similar to mathematical definition. We can even go a little bit further with Coq notations:
Notation " [ n 'to' m ] " := (series n m) (at level 20).
Notation " 'Π' l " := (fold_left mult l 1) (at level 20).
Definition facmath (n:nat): nat := Π [ 1 to n].
Notation " n ! " := (facmath n) (at level 20).
Compute 0!. (* 1 *)
Compute 1!. (* 1 *)
Compute 2!. (* 2 *)
Compute 3!. (* 6 *)
Compute 5!. (* 120 *)
Now ask yourself why we think that mathematical definition is "correct" but the program we use to calculate factorial has to be verified. Why C++
or Haskell
examples we consider as programs but
$$\Pi_{i=1}^n i$$
expression as formula. And what about Coq
definitions then - are they programs or formulas?
To give mathematical definition we must preliminary know what is
$$Π_{i=...}^{...} ...(i)$$
notation which is actually not trivial. For example it would be probably hard to explain to kids of smaller grades what all this stuff means. This point makes program definitions of factorial actually not longer than mathematical, as they are full expressions on the correspondent language but the mathematical definition suggests some educational background context. The latest is very important fact when comparing mathematical and programming education and skills. Exactly the context allows humans to evolve in mathematics.
However the factorial notion is as simple as "multiplication of all (natural) numbers from 1 to the given one". And it is not-too-hard to explain this to anybody who knows numbers and multiplication.
From my point of view though Coq
and Haskell
give definitions more close to mathematical (but who says that mathematical variant should be considered as original?), the C++
can be also considered as definition - not a program.
We can consider the Coq
(recursive) variant as specification and prove that C++
variant corresponds it or opposite: take the Coq
variant as a program and prove that it corresponds C++
specification. If one thinks about this a little bit more she comes to following important generalization: specification and program are just different ways to say the same thing.
And the main point of verification is that if one proves that two differently said things are the same (one corresponds in some sense to another) those things are possibly correct because ... (unexpectedly) ... it is difficult to make the same mistake in defining a thing twice in different ways.
Not being verbatim I will show how it could be done in Coq
proving that
$$ ∀ n, {\rm fac}\ (n) = {\rm facmath}\ (n) $$
Lemma pi_gen: forall (l: list nat) (a0: nat),
fold_left mul l a0 = a0 * (fold_left mul l 1).
Proof.
intros l. induction l; intros. { simpl. lia. }
simpl. rewrite IHl. symmetry. rewrite IHl.
lia.
Qed.
Lemma fac_correct: forall n, fac n = n!.
Proof.
induction n. { reflexivity. }
unfold fac_math.
simpl. rewrite IHn.
rewrite pi_gen.
replace (n+0) with n by lia.
remember (n!) as f.
remember (Π ([1 to n])) as f1.
assert (f=f1). subst f f1. reflexivity.
lia.
Qed.
This is not tutorial on Coq
and I even didn't think about
optimality of definitions and proofs - just showing that it is actually easy to proof the equivalence of two definitions.
Resuming I would like to note that distinguishing between program and specification is sometimes artificial - that is just two (or more) ways to say the same thing.
Well..., until one may reveal some specific properties and prove them separately. Like $n!>0$ or $(n+1)!=(n+1)*n!$ or maybe even for given and independently defined $Γ$-function we have the property $Γ (n+1)=n!$. And all these can be proven showing more specific properties. Will those convince us in program correctness or not depends exclusively on our internal ratio of willingness to have maximally correct program and efforts we are ready to spend on verification.
Here is one good news. If we prove that given the specification there exists only equivalent functions in some sense (for example the strict one: $f∼g ⟺ ∀ x, f (x) = g (x)$) we can stop finding other properties to convince ourselves more.
However sometimes we can even prove such a fact. For example
Definition fac_like (f:nat->nat) :=
(forall n, f (S n) = (S n)*(f n) ) /\ (f 0 = 1) /\ (f 1 = 1).
Lemma fac_like_full: forall f g,
fac_like f ->
fac_like g ->
f = g.
Proof.
intros.
extensionality n.
unfold fac_like in H, H0.
destruct H, H1, H0, H3.
induction n. { congruence. }
rewrite H, H0.
congruence.
Qed.
We use here functional extensionality axiom (which is exactly the same as the given above relation $f∼g$) to prove that having property fac_like
known for two functions $f,g$ we can stop with further verification as we already have proven that $f=g$ and we cannot find any differences between them unless the theory is consistent.
Worth to note that finding non trivial full properties is very complicated task, and to prove that some property is full is either not easy in real life cases.
I will return to C++
non-recursive variant later when writing about imperative programs properties and proofs.
What is next
In the next posts about specification I will discuss the possible ways to find the specifications for functional and imperative programs, the naming problem while looking for program meaning and I will also sketch two equivalent but very different in nature ways to represent imperative program which open some paths for verification.