![]() Share: Twitter Facebook LinkedIn Comments ![]() # Schema: AccountID, DateOfPurchase, TransactionID, PurchasePrice, StoreName, StoreCategory, CityName, ProvinceName, PurchaseMethodĪfter generating a couple of outputs, I loaded them to HDFS, created a table and mapped the data and then I was able to what was expected. X_cities = random.choice(city_dict.keys()) X_stores = random.choice(shop_dict.keys()) 'Sportmans Warehouse': 'Sports Equipment', ![]() Here is the Python Script that will generate 1000 lines per run: I ended up writing a Python Script that generates comma delimited lines that represents dummy transactional data that ends up like the following: #AccId, Date, TransactionId, Price, MerchantName, MerchantCategory, City, Province, Purchase MethodÄ 714,4000645,506.94,Rage Shoes,Shoes,Kroonstad,Free State,credit The other day, I was facing a scenario where I had to setup bucketing with Hive and I needed some sample data, but in the same way I thought it would've been nice to have some random data, but that makes sense, and not just have random data that does not make sense.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |