Dictionary Maybe?

I have a database that has 16,000,000 rows and 3 columns that needs to be transposed into an excel spreadsheet with 731 columns (2 years worth of data)
I’m thinking dictionary is the way to go but for some reason my brain doesn’t like dictionaries :slight_smile:
What I’m guessing I would need to do is
Select * from prices WHERE id = blah ORDER BY thedate
This would give me price, thedate and id.
The issue is not all id’s will have 731 dates.
I have all the dates in a text file which I can read in 1 date at a time with no issue.
The question is can I make 731 entries into a dictionary without their value then put in the value from the above recordset, or is it better to just use the recordset and HasKey to check to see if the date is there?

(I’m guessing date would be the key and price would be the value – but that could be backwards)

Then I can never figure out how to get the value/key after it’s in the dictionary.

The data is just going to end up as a comma delimited text file (which I have no problem creating)
Thanks.

A better question is… will you be able to hold all of that data in RAM at the same time? 16 million rows with 731 columns of just 1 byte each would be 11GB. If your dates are 8 bytes, you’ll need 93GB of RAM.

There is no way any user or business logic is going to need to access all 16M rows at the same time… nor for that matter even need all 731 columns of data

The proper solution is a database, and appropriate SQL

SELECT FROM table WHERE condition

or depending on the database engine, use LIMIT to read “pages” of records depending on the business need

my last major project had a database of 1.6 million records about healthcare providers, that was served to a web portal, we paged the data to the user 100 records at a time, as they barely looked at a thousand records in a given session, let alone trying to load, store in memory and display 1.6 million

I’ll only do 1 id at a time (read then store to file), so I won’t need to keep more than 731 dictionary entries in the RAM at a time.

and if your 731 columns are DATES then, I suggest your schema design is flawed

database tables should be long and skinny, not short and wide

[quote=279049:@Dave S]and if your 731 columns are DATES then, I suggest your schema design is flawed

database tables should be long and skinny, not short and wide[/quote]

I have 3 columns in the database. I am transposing them. They need to become a 731 X 32,000 csv file so it can be imported into SPSS data analysis. No where have I said I have a database with 731 columns.

wow… your welcome

[quote=279040:@Jym Morton]The question is can I make 731 entries into a dictionary without their value then put in the value from the above recordset, or is it better to just use the recordset and HasKey to check to see if the date is there?

(I’m guessing date would be the key and price would be the value – but that could be backwards)

Then I can never figure out how to get the value/key after it’s in the dictionary.
[/quote]
Answering your question, dictionaries are stored as key/value pairs.
While it is possible to create 731 keys, I would advise only adding keys that you are going to use and then when translating to the CSV use the dictionary function HasKey to determine if the dictionary has the key, and do something with a default value if not.

You set and get values from the dictionary with Value, however requesting the value for a key that does not exist will raise an exception - so be sure to check it exists with HasKey first.

I’m not exactly sure what your data set is, but dates as columns is generally not the best idea. A table of transactions where the date was a value of a column would be more efficient.

[quote=279054:@Tim Parnell]Answering your question, dictionaries are stored as key/value pairs.
While it is possible to create 731 keys, I would advise only adding keys that you are going to use and then when translating to the CSV use the dictionary function HasKey to determine if the dictionary has the key, and do something with a default value if not.

You set and get values from the dictionary with Value, however requesting the value for a key that does not exist will raise an exception - so be sure to check it exists with HasKey first.

I’m not exactly sure what your data set is, but dates as columns is generally not the best idea. A table of transactions where the date was a value of a column would be more efficient.[/quote]

Thanks, still confused on how to use the dictionary. The excel file needs to have the dates as row 1 for it to be imported into SPSS correctly. So with my RecordSet is this correct?
Dim dic as New Dictionary
While rs.EOF = FALSE
dic.Value(rs.field(“price”).doublevalue) = rs.field(“thedate”).datevalue
rs.movenext
wend

Then to get the value into a string (or do I have things backwards)?
For each d As date in mydates
If dic.HasKey(d) then
s = s + ", " + CDBL(dic.value)
else
s = s + “,”
End if
Next

I don’t think that will give you a usable result. The key for the dictionary should be unique. If you have the price as the key for the dictionary you’ll only get the last date for that price. So if you have two entries for 17.99, you’ll only get one of them.

Remember to paginate your sql query or you will crash the app with 16mil rows.

I’m pretty sure what you’re doing is completely undermining the data structure by turning each date into a column. I just noticed that you mean each column is one calendar day. I think your setup in SPSS is probably wrong if you’re trying to bend over backwards to send it data like this.

Ok, so the Key needs to be the date which is Unique for each ID. I’m not doing the SPSS part another Prof. is, he’s just sent me an excel spreadsheet and told me to fill out the cells. He has dates in row 1 and ID’s in in Column A. I’m only pulling 1 ID at a time so it will be max 731 rows per call to the database. It’s actually quite simple to do this way, I just never use (nor understand) dictionaries because I generally use a Class Array.

Thanks for the answers.

Imagine each “key” for the dictionary is for one ROW
then the contents of that dictionary slot needs to be something that can hold 731 “columns” - and you can use either a new dictionary or an array

  dim d as new dictionary
  dim columns as dictionary
  
  dim id as integer
  
  // id = 1 
  // for simplicity row 1 is id 1
  id = 1
  
  // lets add one row
  if d.haskey(id) = false then
    columns = new dictionary
    
    // now if we knew all 731 colums we could add them here
    d.value(id) = columns
    
  end if
  
  // add one value to the first row
  if d.haskey(id) <> false then
    
    columns = Dictionary(d.Value(id)) // we have to do this cast becase dictionary values are VARIANTS
    
    columns.value("2016-01-13") = 123
    
  end if
  
  // see if row 1 contains a column with the date "2016-01-29"
  if d.haskey(id) <> false then
    
    columns = Dictionary(d.Value(id)) // we have to do this cast becase dictionary values are VARIANTS
    
    if columns.hasKey("2016-01-29") = true then
      // row 1 has column "2016-01-29"
    else
      // row 1 does NOT have column "2016-01-29"
    end if
  else
    // there's no row 1 at all
  end if

Then why not go that direction here? A dictionary may not be the best solution.

as a “bucket to dump stuff in” dictionaries are pretty handy
but they often get used instead of a class designed for a purpose and then they can grow to become unwieldly monsters

I suggest an Array too since you don’t need the data to be position independent. Working with keys here is almost the same amount of work if you create a single class (date,id,ect) and it’s faster.

Just create a simple class “myColumn” which hold a data() array (yourCustomClass) and add your columns/data.
This way you can easily access each row/column by indexes:
mycolum(3).data(0)

Since many values will be empty then a dictionary can be a good solution

from what you have said:
you have a text file with all the 731 dates
you want a csv file: ID date1 date2 … date731

so loop for each distinct id
in each loop read the date and value
create a dictionary (or reset an existing one), let’s call it dict
create a record set (rs) with select price, thedate from where id= (order in not important)
dict.value(rs.field(“theDate”)…)=rs.field(“price”)… //The … stands for dateValue or string value, depends on the format you have in your file of dates and db (they must match); for the price you can convert it to the string value you need

Now loop on a vector of all 731 dates and write the value with matching date
//you can write to the text file using the right separator or build an array of strings and then join it to the file
//using the vector
dim results() as string
results.append
for i as integer=0 to myDates.ubound
results.append dict.lookup(myDates(i), “”) //write nothing for non existent date
next
theFile.write join(results, “;”) //using ; as cvs separator

More difficult to explain than to code and should be fast enough

Thanks all, after the Dictionary was explained to me I was good. maybe 15 lines of code. It’s a one-off project I could have done it in SQL but that would have been a lot more complicated.