Hi Jacque, thanks for the answer.
I would like an easier way to do sizing. Currently, every time I need to load data into Greenplum, I follow these steps:
1 – Create a new table
2 – Fill it with simulated data from a random function, so that compression does not interfere with the estimate
3 – Measure the data size with a select.
But this process is very slow, and I still cannot get an exact size. Normally, after I get the result, I add a margin of 10 to 20%.
I would like to hear how you in the community make this estimate, and what the best practices are.
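To make the arithmetic concrete, here is a minimal sketch of how a measured sample could be scaled up and padded with that margin. The function name, parameters, and numbers are hypothetical and only for illustration. It also assumes the average row size in the sample holds for the full table, which may not be true for real data.

```python
def estimate_with_margin(sample_bytes, sample_rows, target_rows, margin_pct):
    """Scale a measured sample to target_rows and add a safety margin.

    Assumption: the average bytes per row of the sample is
    representative of the full data set.
    """
    bytes_per_row = sample_bytes / sample_rows
    base_bytes = bytes_per_row * target_rows
    return base_bytes * (100 + margin_pct) / 100


# Hypothetical measurement: a 1,000,000-row test table that occupies 512 MiB.
sample_bytes = 512 * 1024**2
sample_rows = 1_000_000
target_rows = 200_000_000

# Apply the 10% and 20% margins to get a low and a high estimate.
low = estimate_with_margin(sample_bytes, sample_rows, target_rows, 10)
high = estimate_with_margin(sample_bytes, sample_rows, target_rows, 20)
print(f"estimated size: {low / 1024**3:.1f} to {high / 1024**3:.1f} GiB")
```

This only replaces the manual multiplication. The slow part, loading the test table and measuring it, stays the same.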
Hi – sizing is a bit more complicated than just a spreadsheet. Everything from concurrency, to types of queries, to the amount of data stored, to the datatypes used can affect it. I’d suggest starting with how much total data you need to store and sizing the number of hosts by that as a starting point, though it’ll be pretty imperfect. If you’d like to share more info, please do.
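As a rough sketch of that starting point only, the host count can be derived from total data and usable capacity per host. The function name and the capacity figures below are hypothetical, and the calculation deliberately ignores concurrency, query types, and datatypes, so treat the result as a lower bound rather than a recommendation.

```python
import math


def hosts_needed(total_data_tb, usable_tb_per_host):
    """Return the minimum number of hosts needed to hold total_data_tb.

    Assumption: usable_tb_per_host is the space actually available
    for user data on each host, after any overhead you account for.
    """
    if usable_tb_per_host <= 0:
        raise ValueError("usable_tb_per_host must be positive")
    return math.ceil(total_data_tb / usable_tb_per_host)


# Hypothetical figures: 50 TB of data, 8 TB usable per host.
print(hosts_needed(50, 8))
```

Rounding up matters here: a fractional host means one more whole host.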