RESTful SQL?

I’m trying to understand Mark Baker’s criticism of yesterday’s SPARQL Update proposal. To me his criticism seems to boil down to “it’s not RESTful”, which is true, but not necessarily a problem. Why insist on applying REST to everything that goes over HTTP? So here are some questions to REST proponents.

To set the tone, an SQL query:

UPDATE table.foo SET table.foo=table.foo+1 WHERE table.bar > 500

Now, how would you RESTify SQL? Big bonus points if queries like the above are possible with your approach.

If your answer is, “I wouldn’t”, then would it be acceptable, under any circumstances, to receive SQL queries over port 80 in order to expliot existing server infrastructure?

This entry was posted in General, Semantic Web. Bookmark the permalink.

18 Responses to RESTful SQL?

  1. Mark Baker says:

    Ok, over to your Web server 8-)

    My answer to your question is “I wouldn’t”, and no, SQL queries should not be sent over port 80; you say it would be “exploiting” existing infrastructure, I say that’s subverting it. Port 80 is for the Web.

  2. Hi Richard,

    [BTW you should correct the SQL stmt to “UPDATE table SET …” as per http://en.wikipedia.org/wiki/Update_%28SQL%29 ]

    My first comment is that REST is all about resources so in order to make your update query more RESTful we should try to identify the hidden resources in there. So if we look into the SQL relational model we see that we have tuples and relations (tables). A “natural” mapping of the relational model to REST is therefore the modeling of tuples and relations as resources so that:

    0. Each tuple is identified by a URI (based on the tuple’s primary key)
    1. If you want to update a tuple you PUT a representation to its URI
    2. In order to delete a tuple you just DELETE it
    3. (Obviously) you can GET a single tuple
    4. Each relation is identified by a URI (based on the table’s name)
    5. If you want to retrieve the contents of a relation you just GET it
    6. If you want to filter and return a subset of the contents of a relation you attach some query parameters to its URI a (e.g. …?bar=>500) and GET it
    7. In the same way you can DELETE more than one tuples of a relation by DELETEing a URI constructed by the relation’s URI and the query parameters
    8. If you want to insert a tuple you just POST the tuple’s representation to the relation’s URI

    It is also possible to have distinct URIs to identify each of a tuple’s attributes so the UPDATE you have given can be modeled as

    f = GET /codd/table?bar=>500/foo

    f += 1

    PUT /codd/table?bar=>500/foo
    f

    There are problems of course because this is not atomic (getting , incrementing, and then updating) but there are ways to achieve atomicity through some optimistic concurrency methods (see the lost update problem, http://www.w3.org/1999/04/Editing/)

    Now how about some (inner, outer) joins, subselects, …? These are left as exercises for the readers :-D

    Cheers
    Stelios

  3. Receiving “SQL queries over port 80″ may indeed lead to “exploits [of] existing server infrastructure” — but probably not the type of exploits that you’d like. :-) Better make sure that your code doesn’t open a new door for SQL injection attacks.

  4. Alex James says:

    This is very interesting, a couple of days ago I posted about how I like REST because in theory it could lead to a SQL like language for composing results from many servers. Something with the name RESTQL seemed appropriate…

    Stelios is right on the money too. I have been dreaming of updating my open source product Base4 so it works in a RESTful way right down to the individual Properties of an Object or in RDBMS terms the cell or intersection of a Row and Column….

    That would be incredibly powerful and is what I am calling Data2.0, on my blog I have a heap of articles here: http://www.base4.net/blog.aspx?Tag=Data2.0

  5. Mark Baker says:

    Stelios is on the right track, but goes a bit too granular; tuples shouldn’t be given URIs, sets of tuples (graphs if you prefer) should be (well, technically, the resource whose state the tuples represent should).

    e.g. :mark rdf:type :Person & :mark :fullName “Mark Baker” are two tuples that might be returned together by dereferencing a URI for me (after a 303 redirect, of course 8-).

  6. Alex James says:

    Mark,

    Not sure about your tuple graph assertion for example if I request this:
    GET /database/person/1
    I should get a representation of person 1 of course, made up of a number of tuples as you describe.
    However this:
    GET /database/person/1/FullName
    should return just one tuple.

    And yes of course the distinction is based on an understanding of the inherient model.

  7. Mark Baker says:

    You’re free to use that convention if you really need that level of granularity, but it comes at a cost, and IME, isn’t that generally useful. I know because I was forced to implement it a few years ago.

    One problem is that you’re dealing with a lot less data so the overhead of HTTP typically swamps the data.

    Another problem is caching; if you set “fullName” directly via PUT, then GET the person resource, they won’t be in synch if you’re using a cache unless you disable caching for the person resource… which is exactly where you need it because it’s the one with all the data.

  8. Alex James says:

    I don’t buy the Cache argument I’m afraid.

    It implies all updates are occuring only through HTTP? how practical is that in reality?

    If I have an application running inside AMAZON’s network that applies some general discounting rules then all cached URI’s that represent books are immediately out of date anyway?

    It is not as if because we expose our data restfully, all of a sudden that is going to be the only way you can update it.

    So I don’t see why you should enforce some rules about PUT granularity in your REST implementation, it makes no sense to me.

    I also don’t buy the HTTP swamping argument either. Updates are generally a lot less frequent than gets, so why not go granular on the update. If you have a number of tuples to update sure go more chunky, that is a standard piece of distributed architecture advice, but if you have only one thing to update, why pass everything else?

    That can have some nasty side effects too, you might now introduce the last update problem, because of optimistic concurrency issues, i.e. the server doesn’t actually know what fields you are trying to update so it assumes you are trying to update everything.

    If in our distributed systems we mandate chunkyness we simply further increase the systems brittleness.

  9. Luke Opperman says:

    I’ll put these thoughts more together later, but here’s the essence of my reaction:

    This is about distributed data computing. This is about efficient and non-trivial client-specified adhoc querying via http, and local processing for arbitrary-sourced combinations, and probably local processing of relative updates as in this case.

    This is about scaling through applying this distributed-store/local-processing internally in partitioning our data stores. This is about dealing with caching by strong support in the “physical” model, not by impinging on the logical layer.

  10. Just a clarification for Mark: This was only an attempt to “RESTify SQL” and in particular the specific query that Richard posed, trying to be as closer as we can to the original SQL statement. I didn’t say that this is a recommended way to access a relational database through the web for all the reasons you gave (although I think Alex has a point also). In fact I would say that in many cases the representation of a single tuple (row) of a table as a REST resource is wrong because of the restrictions imposed by the relational model and the normalization of the database (e.g. every attribute has to have a single value). So for example a person’s addresses (or foaf:mbox’s :-) could be stored in another table whereas logically belong to the same person entity.

  11. Thanks all for the thoughtful comments.

    Mark, your first comment seems to say that anything not RESTful is not the Web. Web services and SPARQL HTTP bindings and SQL queries sent via POST requests are certainly not RESTful, but that doesn’t necessarily make them unappropriate for the Web. 99% of deployed Web content is not RESTful, and most of it works anyway (think cookies).

    I agree with Alex on the caching point. I don’t believe that many real-world systems can be partitioned in a way that a change to one resource never changes the representations of another resource.

    Stelios’ proposal is nice, unless you want to do something that does not cleanly map to a single resource, like my UPDATE example. It requires 2000 roundtrips to update 1000 rows, where a simple “send SQL via POST” approach would require one roundtrip to do the same. Same for RESTQL, which otherwise is an interesting idea.

  12. Richard,

    there is no need to have 2000 round trips to update 1000 rows. In my example GET /codd/table?bar=>500/foo could return a list of (ID, foo-value) pairs, in other words a 2-ary relation. (Actually it couldn’t have been otherwise because the URI does not uniquely identify a row in the table). Now the client could update all the foo values, reconstruct the list, and PUT it in the server. (Etag/If-Match conditional update is more difficult in this case but it’s certainly doable…)

    So the update process is:

    fs = GET /codd/table?bar=>500/foo

    fun = function(id, foo-value) { return (id, ++foo-value);}
    fs’ = map fun fs

    PUT /codd/table?bar=>500/foo
    fs’

    :-D

    Stelios

  13. Apologies Stelios, I jumped to conclusions. Now I understand the approach, and it looks pretty good.

    I think I’d like to do it a little bit differently, to avoid having to transfer all the data back and forth. Something like

    POST /codd/table?bar>=500

    and then in the message body the operation to apply to every row:

    foo = foo + 1

    This would map quite nicely to SQL:

    UPDATE table SET foo = foo + 1 WHERE bar>=500

    I wonder what Mark thinks about this.

  14. Richard,

    what you propose is to have the operation in the body of the HTTP message in some sort of tunneling which is not RESTful (REST has a standard number of operation that for HTTP are GET,POST,HEAD,DELETE,PUT) but let’s see what Mark has to say on the subject :-)

    Stelios

  15. Mark Baker says:

    Richard – not everything not RESTful is not part of the Web (got that? 8-). Cookies are a great example, as you point out. But IMO, none of the other examples you provided are, and none should be using port 80.

    I won’t elaborate much more on the caching point. You really have to implement it. Suffice to say that it’s really not that difficult at all to keep state changes local. And where it isn’t, just turn off caching. But be wary of models or conventions which require turning it off.

    To do the equivalent of your UPDATE example (btw, your SQL could use some work 8-) RESTfully – and I mentioned this on my weblog comments in the context of a similar example – the server could offer a GET form that the client could first populate with “500” (addressing the >500 requirement), creating a new “query URI” (ala HTML forms) upon which GET would return a document representing all the foo values. Then the client would do the “+1″ and PUT it back to the server. Voila!

  16. Mark Baker says:

    Erm, yah, what Stelios said (in his last 2 messages). 8-) I should have read all the comments first.

  17. Thanks Mark, I think I understand your point about what to do over HTTP and what not, though I do not agree. My POV is that clearly, when your problem maps nicely into REST, then use REST, but some problems don’t, and in that case tunneling over HTTP might still be the best approach because it gives at least some of the benefits.

    I still wonder about my POST approach a few posts up. It has the “query URI” to identify just the subset of data we want to update, but then POSTs to the URI in order to avoid the expense of having to pull all the data over the wire, increment it client-side, and push it back up. Basically it maps things nicely into resources as Stelios suggested, but allows POST with an expression language to change resources. What do you think about that?