database - Programmatically determine required series of JOIN statements in SQL -


in standard database theory on referential integrity , inclusion dependencies, it's alluded there pspace algorithm taking known list of e.g. foreign key , primary key relationships plus candidate relationship , determining if candidate relationship logically implied known list. non-deterministic version of algorithm given in algorithm 9.1.6 of this book [pdf], not explicit deterministic version (and i'm guessing idea isn't somehow manually use savitch's theorem find algorithm).

a related question how programmatically ask database system question this:

given result set query 1 , result set query 2, there known-in-advance chain of primary / foreign key dependencies allow me join 2 result sets (possibly requring additional columns come derived chain of key relationships).

i'm thinking in context of postgres, mssql, or mysql, how ask database information programmatically.

i can query sp_help , sp_fkeys stored procedures, it's unclear how can provide set of columns given table , set of columns target table, , directly ask database if can derive, outputs of things sp_fkeys, way join tables each other.

i'm more willing (happy even) write code myself if there known algorithms (based on theorem mentioned above, @ least math algorithm exists if it's not part of database systems directly).

my application ingest assortment of tables poorly maintained schemas, , metaprogramming during schema discovery phase, there not need human-in-the-loop try manually search chains of key relationships imply way 2 given tables can joined.

suppose database these 3 tables:

tablea  | a_key1  | a_key2  | a_value | +---------+---------+---------+ |       1 |   'foo' |     300 | |       2 |   'bar' |     400 |  tableb  | b_key1  | b_key2     | b_value | +---------+------------+---------+ |   'foo' | 2012-12-01 |   'bob' | |   'bar' | 2012-12-02 |   'joe' |   tablec  | c_key1     | c_key2  | c_value | +------------+---------+---------+ | 2012-12-01 |     100 |     3.4 | | 2012-12-02 |     200 |     2.7 | 

and suppose b_key1 foreign key a_key2, c_key1 foreign key b_key2, , the "key" columns of each table, together, form primary key. it's not important whether tables normalized (so, instance, fact b_key1 can identify row in tablea though tablea ostensibly has 2-tuple key -- unimportant , happens tables inherit in wild).

then in case, knowing key relationships in advance, know following join "supported" (in sense data types , value ranges must make sense under system of foreign keys):

select a.*, c.c_value tablea join tableb b     on a.a_key2 = b.b_key1 join table c     on b.b_key2 = c.c_key1 

so if person came database 2 separate queries:

select * tablea  select c_key1, c_value tablec 

then database system ought able deduce joins required query brings these 2 alignment (as implied foreign keys, not other arbitrary ways of making them alignment).

my question is: search functionality produce series of required joins exist in major sql-compliant rdbmss?

one can imagine following idea discussed in answer , sucking metadata other programming language, deconstructing kind of graph of key relationships, , using known graph path-finding algorithm solve problem. homebrewed implementations of fraught errors , it's kind of problem algorithms can written exponentially slow. asking if exists in database system already, i'm trying leverage work done make smart , database-aware implementation if work done.

let set of input queries q, , assume each of these queries selects subset of columns single table. i'll assume that, when joining table t in q table outside q, want consider fks can formed columns present in query t; when joining table outside q table outside q, can use fk defined between these 2 tables.

in case suggest want know if there unique set of joins generate combined column set of input queries. , can tested efficiently:

  1. form graph g having v = {all tables} , e = {(u,v) | there fk relationship u v or v u}, added restriction edges incident on tables in q can use columns present in corresponding input queries.
  2. build spanning forest f on g.
  3. if pair of tables in q appear in different components of f know there no query satisfies our goal, , can stop.
  4. otherwise, there @ least 1 such query, involving or of tables in unique component x containing tables in q.
  5. to rid of pointless joins, delete edges in x not on path table in q other table in q (e.g. repeatedly "nibbling away" pendant edges table outside of q, until can no longer done).
  6. for each edge e remains in new, smaller tree x:
    1. create copy x_e of induced subgraph of v(x) in g, , delete edge e x_e.
    2. create spanning forest f_e x_e.
    3. if f_e consists of single component, can mean there second (i.e. distinct) way of joining tables in q avoids edge e, can stop.
  7. if point, every edge e in x necessary, , can conclude set of joins implied x unique way produce set of columns input query set q.

Comments

Popular posts from this blog

how to proxy from https to http with lighttpd -

android - Automated my builds -

python - Flask migration error -