Monday, March 15, 2010

Formulating Expressions a Step at a Time: Lazy Evaluation

I am reading this book, and I found the section with the title “Formulating Expressions a Step at a Time” particularly interesting:

First, the book shows you the query for the sentence "Get pairs of supplier numbers such that the suppliers concerned are collocated (i.e., are in the same city)" written in Tutorial D:

( ( ( S RENAME ( SNO AS SA ) ) { SA , CITY } JOIN
( S RENAME ( SNO AS SB ) ) { SB , CITY } )
WHERE SA < SB ) { SA , SB }

And then it proceeds to show you how to write this query in a more readable (step by step) way:

WITH ( S RENAME ( SNO AS SA ) ) { SA , CITY } AS R1 ,
( S RENAME ( SNO AS SB ) ) { SB , CITY } AS R2 ,
R1 JOIN R2 AS R3 ,
R3 WHERE SA < SB AS R4 :
R4 { SA, SB }

Finally, it shows you how to write this query in SQL:

WITH T1 AS ( SELECT SNO AS SA , CITY
FROM S ) ,
T2 AS ( SELECT SNO AS SB , CITY
FROM S ) ,
T3 AS ( SELECT *
FROM T1 NATURAL JOIN T2 ) ,
T4 AS ( SELECT *
FROM T3
WHERE SA < SB )
SELECT SA , SB
FROM T4
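To try the step-by-step SQL version without a full database server, here is a minimal sketch that runs it in SQLite from Python. The sample rows in S are my own assumption (not the book's data), and I added an ORDER BY only so the output is deterministic:

```python
import sqlite3

# In-memory database with a hypothetical suppliers table S (sample rows are assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, CITY TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO S VALUES (?, ?)",
    [("S1", "London"), ("S2", "Paris"), ("S3", "Paris"),
     ("S4", "London"), ("S5", "Athens")],
)

# The same step-by-step query; ORDER BY is an addition for stable output.
QUERY = """
WITH T1 AS ( SELECT SNO AS SA , CITY FROM S ) ,
     T2 AS ( SELECT SNO AS SB , CITY FROM S ) ,
     T3 AS ( SELECT * FROM T1 NATURAL JOIN T2 ) ,
     T4 AS ( SELECT * FROM T3 WHERE SA < SB )
SELECT SA , SB
FROM T4
ORDER BY SA , SB
"""

pairs = conn.execute(QUERY).fetchall()
print(pairs)  # [('S1', 'S4'), ('S2', 'S3')]
```

With this sample data, only the two London suppliers and the two Paris suppliers are paired; the lone Athens supplier drops out because of the SA < SB restriction.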

Thanks to the “with” keyword, both in SQL and in Tutorial D, it is possible to deal with this query in a “step by step” way, instead of having to deal with it as a single expression that is hard to write and hard to read. (Note that this is not a recursive query, so the with keyword is not being used for recursion in these examples.)

Sadly, so far I have been unable to find an equivalent for this syntax in Dataphor… While it is possible to write something like (I am not 100% confident the syntax is right, but I think it should give you the general idea):

var R1 := S {SNO SA, CITY}
var R2 := S {SNO SB, CITY}
var R3 := R1 JOIN R2
var R4 := R3 WHERE SA < SB
select R4 {SA, SB}

In Dataphor, the variables (R1, R2, etc.) are not lazily evaluated, so the performance is not as good as in, for example, the SQL case. (I ran a similar example in SQL Server, and the expressions were evaluated lazily, at the end, instead of one by one, resulting in far better performance.) Maybe I am doing something wrong?
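To make the difference between the two evaluation styles concrete, here is a toy sketch in Python. It is only an analogy: it does not show how Dataphor or SQL Server work internally, and the sample rows and function names are my own assumptions. The eager version stores every intermediate result (like the variables above), while the lazy version only describes the steps and runs them when the final result is consumed:

```python
# Hypothetical supplier rows (SNO, CITY); not data from the book.
S = [("S1", "London"), ("S2", "Paris"), ("S3", "Paris"),
     ("S4", "London"), ("S5", "Athens")]

def eager_pairs(s):
    """Evaluate each step fully and store it before starting the next one."""
    r1 = [(sno, city) for sno, city in s]   # SNO plays the role of SA
    r2 = [(sno, city) for sno, city in s]   # SNO plays the role of SB
    r3 = [(sa, sb) for sa, ca in r1 for sb, cb in r2 if ca == cb]  # join on CITY
    r4 = [(sa, sb) for sa, sb in r3 if sa < sb]                    # restriction
    rows_stored = len(r1) + len(r2) + len(r3) + len(r4)
    return r4, rows_stored

def lazy_pairs(s):
    """Describe the same steps as generators; rows flow through on demand."""
    def r1():
        return ((sno, city) for sno, city in s)

    def r2():
        return ((sno, city) for sno, city in s)

    r3 = ((sa, sb) for sa, ca in r1() for sb, cb in r2() if ca == cb)
    r4 = ((sa, sb) for sa, sb in r3 if sa < sb)
    return r4  # nothing has been computed yet

eager_result, rows_stored = eager_pairs(S)
lazy_result = list(lazy_pairs(S))  # the whole pipeline runs here, in one pass

print(eager_result == lazy_result)  # True
print(rows_stored)                  # 21
```

Both versions return the same pairs, but the eager one stores 21 intermediate rows for a 5-row table, while the lazy one keeps only the final result. A real query optimizer goes further than this (it can reorder and merge the steps), which the sketch does not attempt to model.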

I wonder how hard it would be to make Dataphor generate SQL using the WITH keyword for those databases that support it (so far, the latest versions of SQL Server (2008) and Oracle (10 & 11) seem to support this syntax)… I guess it is time to ask the Dataphor authors…

Thursday, March 04, 2010

Null versus None

None means Nothing. Nothing is a concept that describes the absence of anything at all. It is sometimes confused with Null, but they are very different concepts: Nothing means the absence of anything, while Null means unknown (you do not know whether there is a thing or not).

For Nothing, normal two-valued logic applies (Nothing = Nothing: true, Nothing = Something: false). For Null, three-valued logic is necessary (Null = Null: unknown, Null = Something: unknown). Unfortunately, these two concepts have been used interchangeably without much thought, to the point where the most common use for Null in relational databases is to mean Nothing (even though Null was designed by Codd to represent unknown). This confusion is aggravated by the fact that many mainstream application languages (Java, C#, C, etc.) use the null keyword to mean an uninitialized variable, which easily maps to the interpretation that null means the variable is pointing to "nothing" (no object).
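The three-valued behavior of Null is easy to see in practice. Here is a small sketch using SQLite from Python; the PERSON table and its rows are hypothetical, and Python's sqlite3 module reports SQL NULL (unknown) as None:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A comparison involving NULL is neither true nor false: it is NULL (unknown).
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
null_eq_one = conn.execute("SELECT NULL = 1").fetchone()[0]
one_eq_one = conn.execute("SELECT 1 = 1").fetchone()[0]
print(null_eq_null, null_eq_one, one_eq_one)  # None None 1

# Hypothetical table: Bob's marriage date is stored as NULL.
conn.execute("CREATE TABLE PERSON (NAME TEXT, MARRIAGE_DATE TEXT)")
conn.executemany(
    "INSERT INTO PERSON VALUES (?, ?)",
    [("Alice", "2005-06-18"), ("Bob", None)],
)

# In two-valued logic "p OR NOT p" is always true, so both rows would match.
# With NULL the condition is unknown for Bob, and his row is left out.
matched = conn.execute(
    "SELECT COUNT(*) FROM PERSON "
    "WHERE MARRIAGE_DATE = MARRIAGE_DATE OR NOT (MARRIAGE_DATE = MARRIAGE_DATE)"
).fetchone()[0]
print(matched)  # 1
```

The last query is the kind of surprise that three-valued logic brings in: a condition that looks like a tautology silently drops the row with the Null.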

But for databases, Null was not invented to represent nothingness; it was invented to represent that the value of something is not known (maybe something, maybe nothing, we just do not know).

Now that Chris Date wants Null to be removed from relational databases, so that the incongruence and confusion brought in by three-valued logic is eliminated, developers who are accustomed to using Null to represent Nothing resist the idea, asking: How am I going to represent the fact that a Person is not married? I used to do that by marking the Marriage Date as a nullable Date. Now what? Do I need to split the table into 2 tables just to represent the fact that the Marriage Date is not mandatory? That, of course, seems like the obvious, elegant (if extremely cumbersome) answer. But the practical developer refuses to go to that trouble: it is just too much effort, and it is simply easier to continue using Null. But... what about Nothing? Why not simply add "Nothing" as a possible value to the Date domain? That way it is possible to say that the Person has no Marriage Date, and still stay inside the realm of two-valued logic.
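To illustrate the idea outside of a database, here is a minimal sketch in Python of a date domain extended with a single "Nothing" value. All the names (NothingType, Nothing, DateOrNothing, Person) are hypothetical and chosen just for this example; the point is only that every comparison is plainly true or false:

```python
from dataclasses import dataclass
from datetime import date
from typing import Union

class NothingType:
    """Hypothetical sentinel: a definite 'there is no value', not an unknown one."""
    _instance = None

    def __new__(cls):
        # Only one Nothing value exists, so default identity equality is enough.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "Nothing"

Nothing = NothingType()

# The extended domain: every ordinary date plus the single value Nothing.
DateOrNothing = Union[date, NothingType]

@dataclass(frozen=True)
class Person:
    name: str
    marriage_date: DateOrNothing

alice = Person("Alice", date(2005, 6, 18))
bob = Person("Bob", Nothing)
carol = Person("Carol", Nothing)

print(bob.marriage_date == carol.marriage_date)  # True  (Nothing = Nothing)
print(alice.marriage_date == bob.marriage_date)  # False (Something = Nothing)
```

Unlike Null, there is no third "unknown" outcome here: Bob and Carol are both known to have no marriage date, and that fact compares like any other value.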

Is this solution in any way in conflict with The Third Manifesto? I would really like to know… I wonder what the opinion of the community in the C2 Wiki will be.