python - Efficient ways to store multiple excel files in a database? -


we're working on big inner project enables users upload excel files, , perform searches on data gathered these excels. i'm trying homework before begin design, , come best solution.

the requirements -

  1. user can upload excel file many columns wants, there's no predefined structure excel.

  2. in contrary first point, there few fields assume user have. example - first name, last name. these colums don't have exist.

  3. the search option work following - when user searches, can search specific columns - ones predefined , expected excel file have. (in our example - first , last name). can search other columns, under field "other".

another word other search field - field go through columns in excel files don't fit predefined columns. i.e - 1 file has age column, has birthplace column, "other" field search through columns.


what's best way this?

  1. dynamically create new django model each excel uploaded, columns excel has?

  2. dynamically create new django model each file, predefined columns (if exist!), , "other" text field, concatenate unrelated fields?

  3. have 1 big django model (means 1 table in database) has predefined fields (which again, can null), , field called "others" concatenate unrelated columns?

  4. i can have main table has predefined columns, , table foreign key main table, each row represent "other" field.

example 4th solution -

+----+--------+--------+--------+ | id | field1 | field2 | field3 | +----+--------+--------+--------+ |  1 | val1   | val1   | val1   | |  2 | val2   | val2   | val2   | |  3 | val3   | val3   | val3   | +----+--------+--------+--------+ 

and dimension table -

+----+------+------+ | fk | key  | val  | +----+------+------+ |  1 | key1 | val1 | |  1 | key2 | val2 | |  1 | key3 | val3 | |  2 | key4 | val4 | +----+------+------+ 



as scaling - expect have no more 1500 excel files, each containing between 100 approx. 100,000 rows (we limit number of rows each excel file has 100k). statistics have excels examined won't go on 30~ million rows.

we using django either mysql or postgresql.

i hope question clear , not opaque.

thanks!

edit: after changed question. have added short section on model 4.

i recommend against dynamically creating tables. messy , doubt perform well. database create access path each database table query, if create multiple databases files need search of them.

you need variant of model 3.

this means use 1 table, instead using columns each field, create 2 columns 1 excel column name , 1 it's value. need additional entries identify excel column , values belong excel spreadsheet.

so conceptually, instead of modelling:

field1 field2 field3 field4 other ------------------------------------ x       y     z           etc=xyz 

you model this:

sheet fieldname value ------------------------------------ key   field1    x key   field2    y key   field3    z key   field4    key   etc       xyz 

the advantage of model programming searches become easier. can model search select * data fieldname='%s' , value='%s'. if create database index on fieldname (and index on key use identify excel sheets), there should no performance penalty on original ideas model 3.

your model 4 work. has advantage that, predefined fields, user's query statements map sql select statements. has disadvantage need handle "others" columns differently rest of user's search criteria. indicated users not enter columns expect there. means have make these columns nullable, increases storage requirements.

overall, think suggested approach better option 4, conceptually simpler. indicated thought create many rows. indeed create more rows, mysql , postgressql can amount of rows. postgressql can store unlimited number of rows. mysql can store 4000~ million rows (and can compile mysql --big-tables if need more).

in terms of performance, makes no real difference how big table is, long have index on field.


Comments

Popular posts from this blog

how to proxy from https to http with lighttpd -

android - Automated my builds -

python - Flask migration error -