Apache Hadoop
What are the main components of a Hadoop Application?
What is the core concept behind Apache Hadoop framework?
What is Hadoop Streaming?
What is the difference between Nodes in HDFS?
What is the optimum hardware configuration to run Apache Hadoop?
What do you know about Block and Block scanner in HDFS?
What are the default port numbers on which Nodes run in Hadoop?
How will you disable a Block Scanner on HDFS DataNode?
How will you get the distance between two nodes in Apache Hadoop?
Why do we use commodity hardware in Hadoop?
How does inter-cluster data copying work in Hadoop?
How can we update a file at an arbitrary location in HDFS?
What is Replication factor in HDFS?
What is the difference between NAS and DAS in Hadoop cluster?
What are the two messages that NameNode receives from DataNode?
How does indexing work in Hadoop?
What data is stored in an HDFS NameNode?
What would happen if the NameNode crashes in an HDFS cluster?
What are the main functions of Secondary NameNode?
What happens if an HDFS file is set with a replication factor of 1 and the DataNode crashes?
What is the meaning of Rack Awareness in Hadoop?
How will you check if a file exists in HDFS?
Why do we use fsck command in HDFS?
What will happen when NameNode is down and a user submits a new job?
What are the core methods of a Reducer in Hadoop?
What are the primary phases of a Reducer in Hadoop?
What is the use of Context object in Hadoop?
How does partitioning work in Hadoop?
What is a Combiner in Hadoop?
What is the default replication factor in HDFS?
How much storage is allocated by HDFS for storing a file of 25 MB size?
Why does HDFS store data in Block structure?
How will you create a custom Partitioner in a Hadoop job?
What is a Checkpoint node in HDFS?
What is a Backup Node in HDFS?
What is the meaning of the term Data Locality in Hadoop?
What is a Balancer in HDFS?
What are the important points a NameNode considers before selecting the DataNode for placing a data block?
How will you replace HDFS data volume before shutting down a DataNode?
What are the important configuration files in Hadoop?
How will you monitor memory used in a Hadoop cluster?
Why do we need Serialization in Hadoop map reduce methods?
What is the use of Distributed Cache in Hadoop?
How will you synchronize the changes made to a file in Distributed Cache in Hadoop?
Can you elaborate on a MapReduce job?
Why are the compute nodes and the storage nodes the same?
What is the importance of the configuration object in MapReduce?
Where is MapReduce not recommended?
What is the NameNode and what are its responsibilities?
What is the JobTracker’s responsibility?
What are the JobTracker and TaskTracker?
What is the importance of job scheduling in Hadoop MapReduce?
When is a Reducer used?
Where does the Shuffle and Sort process take place?
Is Java mandatory for writing MapReduce jobs?
What methods can control the output of the Map and Reduce functions?
What is the main difference between Mapper and Reducer?
Why are the compute nodes and the storage nodes the same?
What is the difference between a map-side join and a reduce-side join?
What happens if the number of Reducers is 0?
When do we use a Combiner, and why is it recommended?
What is the main difference between MapReduce Combiner and Reducer?
What is a Partition?
When do we use a Partition?
What are the important steps when you are partitioning a table?
Can you elaborate on the MapReduce job architecture?
Why does the TaskTracker launch a child JVM?
Why do the JobClient and JobTracker submit job resources to the file system?
How many Mappers and Reducers can run?
What is InputSplit?
How to configure the split value?
How much RAM is required to process 64 MB of data?
What is the difference between a block and a split?
Why does the Hadoop framework read a file in parallel rather than sequentially?
What if I change the block size from 64 to 128?
What is isSplitable()?
What are the maximum and minimum block sizes that Hadoop allows?
What are the Job Resource files?
What does a MapReduce job consist of?
What is data locality?
What is speculative execution?
What is ChainMapper?
How do you do a value-level comparison?
What are the setup and cleanup methods?
How many slots are allocated for each task?
Why does the TaskTracker launch a child JVM to do a task instead of using the existing JVM?
What main configuration parameters are specified in MapReduce?
What is the identity Mapper?
What is a RecordReader in MapReduce?
What is OutputCommitter?
What are the parameters of Mappers and Reducers?
Explain JobConf in MapReduce?
Explain job scheduling through the JobTracker?
What is SequenceFileInputFormat?
Explain the input and output data formats of the Hadoop framework?
What are the restrictions on the Key and Value classes?
Explain the wordcount implementation via Hadoop Framework?
How is a Mapper instantiated in a running job?
Which are the methods in the Mapper Interface?
What happens if you don't override the Mapper methods and keep them as they are?
What is the use of the Context object?
How can you add arbitrary key-value pairs in your Mapper?
How does the Mapper's run() method work?
Which object can be used to get the progress of a particular job?
What is the next step after the Mapper or MapTask?
How can we control which keys go to a specific Reducer?
What is the use of Combiner?
How many Maps are there in a particular Job?
What is the Reducer used for?
Explain the core methods of the Reducer?
What are the primary phases of the Reducer?
Explain the Shuffle?
Explain the Reducer's sort phase?
Explain the Reducer's reduce phase?
How many Reducers should be configured?
Is it possible for a job to have 0 Reducers?
What happens if the number of Reducers is 0?
How many instances of JobTracker can run on a Hadoop cluster?
What is the JobTracker and what does it do in a Hadoop cluster?
How is a task scheduled by the JobTracker?
How many instances of TaskTracker run on a Hadoop cluster?
What is the maximum number of JVMs that can run on a slave node?
What is NAS?
How does HDFS differ from NFS?
How does a NameNode handle the failure of the DataNodes?
Can Reducers talk with each other?
Where is the Mapper's intermediate data stored?
What is the Hadoop MapReduce API contract for a Key and Value class?
What are the IdentityMapper and IdentityReducer in MapReduce?
What is the meaning of Speculative Execution in Hadoop?
How is HDFS different from traditional file systems?
What is the HDFS block size and how is it different from the traditional file system block size?
What is a NameNode and how many instances of NameNode run on a Hadoop cluster?
How does the client communicate with HDFS?
How are HDFS blocks replicated?
Can you give some examples of Big Data?
What is the basic difference between a traditional RDBMS and Hadoop?
What is structured and unstructured data?
Since the data is replicated three times in HDFS, does it mean that any calculation done on one node will also be replicated on the other two?
What is throughput and how does HDFS get good throughput?
What is streaming access?
What is commodity hardware, and does commodity hardware include RAM?
Is the NameNode also commodity hardware?
What is metadata?
What is a daemon?
What is a heartbeat in HDFS?
How is indexing done in HDFS?
If a DataNode is full, how is it identified?
If the number of DataNodes increases, do we need to upgrade the NameNode?
Are the JobTracker and TaskTrackers present on separate machines?
On what basis does the NameNode decide which DataNode to write to?
Who is a user in HDFS?
Is the client the end user in HDFS?
What is the communication channel between the client and the NameNode/DataNode?
What is a rack?
On what basis is data stored on a rack?
Do we need to place the 2nd and 3rd copies of the data in Rack 2 only?
What if Rack 2 and the DataNode fail?
What is the difference between Gen1 and Gen2 Hadoop with regards to the NameNode?
Do we require two servers for the NameNode and the DataNodes?
Why are the number of splits equal to the number of Maps?
Is a Job split into maps?
Which are the two types of writes in HDFS?
Why is reading done in parallel in HDFS but writing is not?
Can Hadoop be compared to a NoSQL database like Cassandra?
How does the JobTracker schedule a task?
What is a TaskTracker in Hadoop and how many instances of TaskTracker run on a Hadoop cluster?
What is a task instance in Hadoop and where does it run?
What is the configuration of a typical slave node on a Hadoop cluster and how many JVMs run on a slave node?
How does the NameNode handle DataNode failures?
Does the MapReduce programming model provide a way for Reducers to communicate with each other, and in a MapReduce job can a Reducer communicate with another Reducer?
Can I set the number of Reducers to zero?
Where is the Mapper output intermediate key-value data stored?
If Reducers do not start before all Mappers finish, why does the progress of a MapReduce job show something like Map 50 percent Reduce 10 percent, and why is the Reducers' progress percentage displayed when the Mappers are not finished yet?
Explain in brief the three Modes in which Hadoop can be run?
What are the features of standalone (local) mode?
What are the features of fully distributed mode?
What are the main features of pseudo mode?
What are the port numbers of the NameNode, JobTracker, and TaskTracker?
What is a spill factor with respect to RAM?
Is fs.mapr.working.dir a single directory?
Which are the three main hdfs-site.xml properties?
How can I restart the NameNode?
How can we check whether the NameNode is working or not?
Why do you at times get a 'connection refused' Java exception when you run the file system check command hadoop fsck?
What is the use of the command mapred.job.tracker?
What does /etc/init.d do?
How can we look for the NameNode in the browser?
What do masters and slaves consist of?
What is the function of hadoop-env.sh and where is it located?
Can we have multiple entries in the master files?
In HADOOP_PID_DIR, what does PID stand for?
What does the hadoop-metrics.properties file do?
What are the network requirements for Hadoop?
Why do we need passwordless SSH in a fully distributed environment?
What will happen if a NameNode has no data?
What happens to the JobTracker when the NameNode is down?
Explain what you mean by formatting of the DFS?
We use Unix variants for Hadoop; can we use Microsoft Windows for the same?
Which one decides the input split: the HDFS client or the NameNode?
Can you tell me if we can create a Hadoop cluster from scratch?
Explain the significance of SSH: on which port does SSH work, and why do we need a password for SSH to localhost?
What is SSH? Explain in detail the SSH communication between the masters and the slaves.
Can you tell us what will happen to the NameNode when the JobTracker is not up and running?
Table of Contents
Apache Flume
What is Flume?
What is Apache Flume?
Which is the reliable channel in Flume to ensure that there is no Data Loss?
How can Flume be used with HBase?
What is an Agent?
Is it possible to leverage real-time analysis on the Big Data collected by Flume directly?
What is a Channel?
Explain about the different channel types in Flume and which channel type is faster?
Explain about the replication and multiplexing selectors in Flume?
Does Apache Flume provide support for third party Plugins?
Differentiate between FileSink and FileRollSink?
Why are we using Flume?
What is Flume NG?
What are the complicated steps in Flume configurations?
What are Flume core components?
What are the Data Extraction Tools in Hadoop?
Does Flume provide 100% reliability to the Data Flow?
Tell any two Features of Flume?
What are Interceptors?
Why Flume?
What is Flume Event?
How can a multi-hop agent be set up in Flume?
Can Flume distribute data to multiple destinations?
Can you explain about configuration files?
What are the similarities and differences between Apache Flume and Apache Kafka?
Explain Reliability and Failure Handling in Apache Flume?
Apache NiFi
What is Apache NiFi?
What is a NiFi FlowFile?
What is a Relationship in a NiFi DataFlow?
What is a Reporting Task?
What is a NiFi Processor?
Is there a programming language that Apache NiFi supports?
How do you define the NiFi content repository?
What is backpressure in a NiFi system?
What is a template in NiFi?
What is a bulletin and how does it help in NiFi?
Do the attributes get added to content (actual data) when data is pulled by NiFi?
What happens if you have stored a password in a dataflow and create a template out of it?
How does NiFi support a huge volume of payload in a dataflow?
What is a NiFi custom properties registry?
Does NiFi work as a master-slave architecture?
Apache Avro
What is Apache Avro?
State some key points about Apache Avro?
What does Avro offer?
Who is the intended audience for learning Avro?
What are the prerequisites for learning Avro?
Explain Avro schemas?
Explain Thrift and Protocol Buffers and Avro?
Why Avro?
How to use Avro?
Name some primitive data types which Avro supports.
Name some complex data types which Avro supports.
What are the best features of Apache Avro?
Explain some advantages of Avro.
Explain some disadvantages of Avro.
What do you mean by schema declaration?
Explain the term Serialization?
What do you mean by Schema Resolution?
Explain the Avro SASL profile?
What is the way of creating Avro schemas?
Name some Avro reference APIs?
When to use Avro?
Explain sort order in brief?
What is the advantage of Hadoop serialization over Java serialization?
What are the disadvantages of Hadoop Serialization?
Apache Kafka
What is Apache Kafka?
What is the traditional method of message transfer?
What are the benefits of Apache Kafka over the traditional technique?
What is the meaning of Broker in Kafka?
What is the maximum size of a message that the Kafka server can receive?
Explain what ZooKeeper is in Kafka. Can we use Kafka without ZooKeeper?
Explain how a message is consumed by a Consumer in Kafka?
Explain how you can improve the throughput of a remote consumer?
Explain how you can get exactly-once messaging from Kafka during data production?
Explain how you can reduce churn in the ISR and when does a Broker leave the ISR?
Why is replication required in Kafka?
What does it indicate if a replica stays out of the ISR for a long time?
What happens if the preferred replica is not in the ISR?
Is it possible to get the message offset after producing?
What is the difference between Apache Kafka and Apache Storm?
List the various components in Kafka?
Explain the role of the Offset?
Explain the concept of Leader and Follower?
How do you define a Partitioning Key?
In the Producer, when does QueueFullException occur?
Explain the role of the Kafka Producer API.
Apache Impala
How do I try Impala out?
Does Cloudera offer a VM for demonstrating Impala?
How much Memory is required?
What are the main features of Impala?
What features from relational databases or Hive are not available in Impala?
Does Impala support generic JDBC?
Is Avro supported?
How do I know how many Impala nodes are in my cluster?
Are results returned as they become available or all at once when a query completes?
Why does my select statement fail?
Why does my insert statement fail?
Does Impala performance improve as it is deployed to more hosts in a cluster in much the same way that Hadoop performance does?
Is the HDFS block size reduced to achieve faster query results?
Does Impala use caching?
What are good use cases for Impala as opposed to Hive or MapReduce?
Is MapReduce required for Impala, and will Impala continue to work as expected if MapReduce is stopped?
Can Impala be used for complex event processing?
Is Impala intended to handle real-time queries in low-latency applications or is it for ad hoc queries for the purpose of data exploration?
How does Impala compare to Hive and Pig?
Can I do transforms or add new functionality?
Can any Impala query also be executed in Hive?
Can I use Impala to query data already loaded into Hive and HBase?
Is Hive an Impala requirement?
Is Impala production ready?
How do I configure Hadoop High Availability for Impala?
What happens if there is an Error in Impala?
What is the Maximum number of Rows in a Table?
On which hosts does Impala run?
How are Joins performed in Impala?
How does Impala process Join Queries for large tables?
What is Impala's aggregation strategy?
How is Impala Metadata managed?
What load do concurrent queries produce on the NameNode?
How does Impala achieve its performance improvements?
What happens when the data set exceeds available memory?
What are the most Memory intensive operations?
When does Impala hold on to or return memory?
Is there an UPDATE statement?
Can Impala do user-defined functions (UDFs)?
Why do I have to use REFRESH and INVALIDATE METADATA and what do they do?
Why is space not freed up when I issue DROP TABLE?
Is there a DUAL table?
How do I load a big CSV file into a partitioned table?
Can I do INSERT ... SELECT * into a partitioned table?
What kinds of Impala queries or data are best suited for HBase?
Apache Cassandra
Explain what is Cassandra?
List the benefits of using Cassandra?
What is the use of Cassandra and why to use Cassandra?
Explain the concept of tunable consistency in Cassandra?
Explain what is composite type in Cassandra?
How does Cassandra write?
How does Cassandra store data?
Define the management tools in Cassandra?
Mention what are the main components of Cassandra data model?
Define Memtable?
Explain what is a Column Family in Cassandra?
What is an SSTable and how is it different from relational tables?
Explain what is a Cluster in Cassandra?
Explain the concept of Bloom Filter?
List out the other components of Cassandra?
Explain the CAP theorem?
Explain what is a Keyspace in Cassandra?
State the differences between Node, Cluster, and DataCenter in Cassandra?
Mention what are the values stored in the Cassandra Column?
How to write a Query in Cassandra?
Mention when you can use ALTER KEYSPACE?
Which operating systems does Cassandra support?
Explain what is Cassandra cqlsh?
What is the Cassandra data model?
Mention what the shell commands CAPTURE and CONSISTENCY determine?
What is CQL?
What is mandatory while creating a table in Cassandra?
Explain the concept of compaction in Cassandra?
Mention what needs to be taken care while adding a Column?
Does Cassandra support ACID transactions?
Explain how Cassandra writes data?
What is SuperColumn in Cassandra?
Explain what is Memtable in Cassandra?
Define the Consistency Levels for Read Operations in Cassandra?
Explain how Cassandra writes changed data into the commit log?
What is the difference between Column and Super Column?
What is ColumnFamily?
Explain how Cassandra deletes data?
Define the use of the SOURCE command in Cassandra?
What is Thrift?
Explain Tombstone in Cassandra?
What platforms does Cassandra run on?
Name the ports Cassandra uses?
Can you add or remove column families in a working cluster?
What is Replication Factor in Cassandra?
Can we change Replication Factor on a live cluster?
How to iterate all rows in a ColumnFamily?
Explain Cassandra.
In which language is Cassandra written?
Which query language is used in Cassandra database?
What are the benefits and advantages of Cassandra?
Where does Cassandra store its data?
What was the design goal of Cassandra?
How many types of NoSQL databases are there? Give some examples.
What is keyspace in Cassandra?
What are the different composite keys in Cassandra?
What is data replication in Cassandra?
What is node in Cassandra?
What do you mean by data center in Cassandra?
What do you mean by commit log in Cassandra?
What do you mean by column family in Cassandra?
What do you mean by consistency in Cassandra?
How does Cassandra perform the write function?
What is SSTable?
How is the SSTable different from relational tables?
What is the role of ALTER KEYSPACE?
What are the differences between node, cluster, and datacenter in Cassandra?
What is the use of Cassandra CQL collection?
What is the use of Bloom Filter in Cassandra?
How does Cassandra delete data?
What is SuperColumn in Cassandra?
What are Hadoop, HBase, Hive, and Cassandra?
What is the usage of void close() method?
Which command is used to start the cqlsh prompt?
What is the usage of the cqlsh --version command?
Does Cassandra work on Windows?
What is Kundera in Cassandra?
What do you mean by Thrift in Cassandra?
What is Hector in Cassandra?
Apache Airflow
What is Airflow?
What issues does Airflow resolve?
Explain how workflow is designed in Airflow?
Explain Airflow Architecture and its components?
What are the types of Executors in Airflow?
What are the pros and cons of SequentialExecutor?
What are the pros and cons of LocalExecutor?
What are the pros and cons of CeleryExecutor?
What are the pros and cons of KubernetesExecutor?
How to define a workflow in Airflow?
How do you make the module available to airflow if you're using Docker Compose?
How to schedule a DAG in Airflow?
What are XComs in Airflow?
What is xcom_pull in Airflow XComs?
What are Jinja templates?
How to use Airflow XComs in Jinja templates?
DWH Architectures
What is the Main Difference between View and Materialized View?
What is Junk Dimension?
What is Data Warehouse architecture?
What is an integrity constraint and what are the different types of integrity constraints?
Why does a Data Architect monitor and enforce compliance with data standards?
Explain the different data models that are available in detail?
Differentiate between dimension and attribute?
Differentiate between OLTP and OLAP?
What is a real-time Data Warehouse and how is it different from a near-real-time Data Warehouse?
What is Type 2 Version Dimension?
What are Data Modeling and Data Mining?
Where is Data Cube technology used?
How can you implement many relations in a Star Schema model?
What is a Critical Column?
What is the main difference between Star and Snowflake Schema, and which one is better and why?
What is the difference between Dependent Data Warehouse and Independent Data Warehouse?
Which technology should be used for interactive data querying across multiple dimensions for decision making in a DW?
What is Virtual Data Warehousing?
What is the difference between Metadata and Data Dictionary?
What is the difference between Mapping Parameter and Mapping Variable in Data Warehousing?
Explain the advantages of RAID 1, 1/0, and 5. On what type of RAID setup would you put your Tx logs?
What are the characteristics of data files?
What is Rollback Segment?
What is a Table Space?
What is Database Link?
What is a Hash Cluster?
Describe referential Integrity?
What is Schema?
What is Table?
What is a View?
What is an Extent?
What is an Index?
What is an Integrity Constraint?
What are Clusters?
What are the different types of Segments?
Explain the relationship among Database, Table Space, and Data File?
What is an Index Segment?
What are the Referential Actions supported by Foreign Key integrity constraint?
Does a View contain data?
What is the use of Control File?
Can Objects of the same Schema reside in different Table Spaces?
Can a Table Space hold objects from different Schemas?
Can a View be based on another View?
What is a full Backup?
What is a mirrored online redo log?
What is Partial Backup?
What is Restricted Mode of Instance Startup?
What are the steps involved in Database Shutdown?
What are the advantages of operating a database in ARCHIVELOG mode over operating it in NOARCHIVELOG mode?
What are the different modes of Mounting a Database with the Parallel Server?
Can Full Backup be performed when the Database is Open?
What are the steps involved in instance Recovery?
What are the steps involved in Database Startup?
Which parameter specified in the Default Storage Clause of create Tablespace cannot be Altered after creating the Table Space?
What is Online redo Log?
What is Log Switch?
What is Dimensional Modelling?
What are the differences between Snowflake and Star Schema, in what situations is Snowflake Schema better than Star Schema, and when is the opposite true?
What is a Cube in Data Warehousing concept?
What are the differences between Star and Snowflake Schema?
What are Data Marts?
What is the Data Type of the Surrogate Key?
What are Fact and Dimension and Measure?
What are the different Types of Data Warehousing?
What do you mean by Static and Local Variable?
What is a Source Qualifier?
What are the Steps to Build the Data Warehouse?
What are the advantages of Data Mining over traditional approaches?
What is the difference between View and Materialized View?
What is the main difference between Inmon and Kimball Philosophies of Data Warehousing?
What is Junk Dimension and what is the difference between Junk Dimension and Degenerated Dimension?
Why is the Fact Table in normal form?
What is the difference between ER Modeling and Dimensional Modeling?
What is Conformed Fact?
What are the Methodologies of Data Warehousing?
What is Bus Schema?
What is Data Warehousing Hierarchy?
What are Data Validation Strategies for Data Mart validation after the loading process?
What are the data types present in BO and what happens if we implement a View in the Designer and Report?
What is a Surrogate Key and where do we use it?
What is a Linked Cube?
What is meant by Metadata in Context of a Data Warehouse and how it is important?
What are the possible Data Marts in Retail Sales?
What are the various ETL tools in the market?
What is Dimensional Modeling?
What is VLDB?
What is a Degenerate Dimension Table?
What is an ER Diagram?
What is the difference between Snowflake and Star Schema and what are situations where Snowflake Schema is better than Star Schema?
Can a Dimension Table contain numeric values?
What is Hybrid Slowly Changing Dimension?
How many clustered indexes can you create for a table in a DWH, and in the case of TRUNCATE and DELETE commands what happens to a table which has a unique ID?
What is a Loop in Data Warehousing?
What is an Error Log Table in Informatica and how do you maintain it in a mapping?
What is Drilling Across?
Where are the Cache Files stored?
What is Dimension Modeling?
What is Data Cleaning?
Can you explain the Hierarchies Level Data Warehousing?
Can you explain about Core Dimension and Balanced Dimension and Dirty Dimension?
What is Core Dimension?
After we create an SCD table, can we use that particular dimension as a Dimension Table for Star Schema?
Suppose you are filtering rows using a Filter Transformation and only rows that meet the condition pass to the target; where do the rows that do not meet the condition go?
What is Galaxy Schema?
Briefly state the difference between a Data Warehouse and a Data Mart?
What is MetaData?
What are the definitions of Data Warehouse and Data Mart?
What are Data Validation Strategies for Data Mart validation after the loading process?
What is Data Mining?
What is ODS?
What is ETL?
Is an OLTP database design optimal for a Data Warehouse?
If denormalization improves Data Warehouse processes, why is the Fact Table in normal form?
What are Lookup Tables?
What are Aggregate Tables?
What is real-time Data Warehousing?
What are Conformed Dimensions?
How do you load the Time Dimension?
What is a Level of Granularity of a Fact Table?
What are Non-additive Facts?
What is a Factless Fact Table?
Explain about OLAP?
Explain about the functionality of OLAP?
Explain about MOLAP?
Explain about ROLAP?
Explain about Aggregations?
Explain about the View Selection problem?
Explain about the role of Bitmap Indexes to solve Aggregation Problems?
Explain about the encoding technique used in Bitmap Indexes?
Explain about Binning?
Explain about Hybrid OLAP?
Explain about Shared Features of OLAP?
Explain about Analysis?
Explain about Multidimensional Features present in OLAP?
Explain about the Database Marketing Application of OLAP?
Compare Data Warehouse Database and OLTP Database.
What is the difference between an ETL tool and an OLAP tool and what are the various ETL tools in the market?
Steps in building the Data Model.
Why is Data Modeling important?
What type of indexing mechanism do we need to use for a typical Data Warehouse?
What are Semi-additive and Factless Facts?
Is it correct to develop a Data Mart using an ODS?
Explain degenerated dimension.
What are the different methods of loading Dimension Tables?
What are Slowly Changing Dimensions?
What is meant by Metadata in the context of a Data Warehouse?
What are the Modeling Tools available in the Market?
What is the main difference between Schema in RDBMS and Schemas in a Data Warehouse?
What is a general purpose Scheduling Tool?
What is the need for a Surrogate Key, and why is the Primary Key not used as the Surrogate Key?
What is Snowflake Schema?
What is the difference between OLTP and OLAP?
How are the Dimension Tables designed?
What are the advantages of Data Mining over traditional approaches?
Which automation tool is used in Data Warehouse testing?
Give examples of Degenerate Dimensions.
What is the datatype of the Surrogate Key?
What is the difference between Scan Component and Rollup Component?
What is m_dump?
What are Broadcast and Replicate?
What are Local and Formal Parameters?
What is the difference between DML Expression and XFR Expression?
Have you used the Rollup Component?
What are Primary Keys and Foreign Keys?
What is Outer Join?
What are Cartesian Joins?
What is the purpose of having Stored Procedures in a Database?
Why might you create a Stored Procedure with Recompile Option?
What is a Cursor?
Describe the process steps you would perform when Defragmenting a Data Table that contains mission-critical Data.
Explain the difference between the Truncate and Delete Commands.
How would you find out whether an SQL Query is using the Indices you expect?
What are the Security Levels used in BO?
What are the Functional and Architectural differences between Business Objects and Web Intelligence Reports?
What is batch processing in Business Objects?
What is Data Cardinality?
What is Chained Data Replication?
Explain in brief various fundamental stages of Data Warehousing.
What is the difference between Enterprise Data Warehouse and Data Warehouse?
Give an example each of Semi-additive and Non-additive Measures.
What are the options in the Target Session of Update Strategy Transformations?
What are the Various Types of Transformation?
What is the difference between Active Transformation and Passive Transformation?
What is the difference between Static Cache and Dynamic Cache?
How do we join two tables without Joiner or SQL Override?
Differences between Normalizer and Normalizer Transformation.
What is Business Intelligence?
What is a Universe in Business Intelligence?
What is OLAP in Business Intelligence?
What are various Modules in Business Objects Product?
What are OLAP, MOLAP, ROLAP, DOLAP, and HOLAP?
Why does an InfoCube have a maximum of 16 dimensions?
Name some standard Business Intelligence Tools in the Market?
What are Dashboards?
What is Hierarchy Relationship in a Dimension?
What are Ad hoc Reports and Static Reports?
What is the Importance of Surrogate Key in Data Warehousing?
What is a Query?
What are the Features of a Physical Data Model?
What are the steps to design a Physical Model?
What are the Features of Conceptual Data Model?
What is the difference between a Logical Data Model and a Conceptual Data Model?
What are the steps to design Logical Data Model?
What is ETL?
What is a Three Tier Data Warehouse?
What is the ETL Process and how many steps does ETL contain?
What is Full Load and Incremental or Refresh Load?
What is a Staging Area?
Compare ETL and Manual Development.
What is RDBMS?
What is Normalization?
What are different Normalization Forms?
What is a Stored Procedure?
What is a Trigger?
What is a View?
What are the advantages of DBMS?
What are the disadvantages of a File Processing System?
Describe Three Levels of Data Abstraction?
Define Integrity Rules.
What are Extension and Intension?
What is Data Independence?
What is a View and how is it related to Data Independence?
What is Data Model?
What is Object Oriented Model?
What is an Entity?
What is an Entity Type?
What is an Entity Set?
What is an Attribute?
What is Relation Schema and Relation?
What is Degree of Relation?
What is Relationship?
What is Relationship Set?
What is Relationship Type?
What is DDL?
What is VDL?
What is SDL?
What is Data Storage Definition Language?
What is DML?
What is Query Evaluation Engine?
What is a DDL Interpreter?
What is Record-at-a-time?
What is Set-at-a-time or Set-oriented?
What is Relational Algebra?
What is Relational Calculus?
How does Tuple oriented Relational calculus differ from Domain oriented Relational Calculus?
What is Functional Dependency?
What is Multivalued Dependency?
What is Lossless Join Property?
What is 1NF?
What is Fully Functional Dependency?
What is 2NF?
What is 3NF?
What is 4NF?
What is 5NF?
What is Domain-Key NF?
What are Partial, Alternate, Artificial, Compound, and Natural Keys?
What is Indexing and what are the different kinds of Indexing?
What is meant by Query Optimization?
What is Join Dependency and Inclusion Dependency?
What is Durability in DBMS?
What do you mean by Atomicity and Aggregation?
What is Phantom Deadlock?
What is a Checkpoint and when does it occur?
What are different Phases of Transaction?
What do you mean by Flat File Database?
What is a transparent DBMS?
What do you mean by Correlated Subquery?
What are the Primitive Operations common to all record management systems?
What are Unary Operations in Relational Algebra?
Are resulting Relations of Product and Join Operation the same?
What is the RDBMS Kernel?
Name the Subsystems of RDBMS.
What is ROWID?
What is Storage Manager?
What is Buffer Manager?
What is Transaction Manager?
What is File Manager?
What is Authorization and Integrity Manager?
What are Standalone Procedures?
What are the different methods of loading dimension tables?
Describe the foreign key columns in fact tables and dimension tables?
Table of Contents
Amazon Web Services
What is EC2?
What is Snowball?
What is CloudWatch?
What is Elastic Transcoder?
What do you understand by VPC?
DNS and Load Balancer Services come under which type of Cloud Service?
What are the Storage Classes available in Amazon S3?
Explain what T2 instances are.
What are Key-Pairs in AWS?
How many Subnets can you have per VPC?
List different types of Cloud Services.
Explain what S3 is.
How does Amazon Route 53 provide high availability and low latency?
How can you send a request to Amazon S3?
What does AMI include?
What are the different types of Instances?
What is the relation between the Availability Zone and Region?
How do you monitor Amazon VPC?
What are the different types of EC2 instances based on their costs?
What do you understand by stopping and terminating an EC2 Instance?
What are the consistency models for modern DBs offered by AWS?
What is Geo-Targeting in CloudFront?
What are the advantages of AWS IAM?
What do you understand by a Security Group?
What are Spot Instances and On-Demand Instances?
Explain Connection Draining.
What is a Stateful and a Stateless Firewall?
What is a Power User Access in AWS?
What is an Instance Store Volume and an EBS Volume?
What are Recovery Time Objective and Recovery Point Objective in AWS?
Is there a way to upload a file that is greater than 100 Megabytes in Amazon S3?
Can you change the Private IP Address of an EC2 instance while it is running or in a stopped state?
What is the use of lifecycle hooks in Auto Scaling?
What are the policies that you can set for your users’ passwords?
What do you know about the Amazon Database?
Explain Amazon Relational Database.
What are the Features of Amazon Database?
Which AWS DB Service is a serverless NoSQL Database that delivers consistent single-digit millisecond latency at any scale?
What is a Key-Value Store?
What is DynamoDB?
List the benefits of using Amazon DynamoDB.
What is a DynamoDBMapper Class?
What are the Data Types supported by DynamoDB?
What do you understand by DynamoDB Auto Scaling?
What is a Data Warehouse, and how can AWS Redshift play a vital role in Storage?
What is Amazon Redshift and why is it popular among other Cloud Data Warehouses?
What is Redshift Spectrum?
What is a Leader Node and Compute Node?
How do you load data into Amazon Redshift?
Mention the database engines that are supported by Amazon RDS.
What is the work of Amazon RDS?
What is the purpose of a standby RDS Instance?
Are RDS instances upgradable or downgradable according to need?
What is Amazon ElastiCache?
What is the use of Amazon ElastiCache?
What are the Benefits of Amazon ElastiCache?
Explain the Types of Engines in ElastiCache.
Is it possible to run Multiple DB Instances for free on Amazon RDS?
Which AWS Services will you choose for collecting and processing E-commerce Data for Real-time Analysis?
What will happen to the DB Snapshots and Backups if a user deletes the DB Instance?
Google Cloud Platform
What is Google BigQuery?
How does GCP handle data redundancy and backup?
What is the purpose of Cloud Composer in GCP?
How does GCP store data?
What is the purpose of Cloud Machine Learning Engine in GCP?
What is Cloud SQL and how is it used in GCP?
Explain the use of Cloud CDN (Content Delivery Network) in GCP.
How can you securely transfer data to and from GCP?
Explain the role of Cloud Storage in GCP.
What is the purpose of Cloud Monitoring in GCP?
What is the purpose of Cloud NAT in GCP?
How can you manage data access and permissions in GCP?
How can you monitor and analyze GCP costs?
What is Google Cloud Platform (GCP)?
How does GCP ensure compliance and data privacy?
Explain the use of Cloud Datastore in GCP.
What is the purpose of Cloud Security Scanner in GCP?
How does GCP handle data encryption?
Explain the use of Cloud VPN in GCP.
Explain the use of Cloud Dataflow in GCP.
How can you optimize data ingestion in GCP?
How can you monitor and troubleshoot performance issues in GCP?
What is the purpose of Cloud DNS in GCP?
What are the key components of GCP?
How can you ensure high availability and fault tolerance in GCP?
How can you ensure data integrity in GCP?
What are the advantages of using GCP for data engineering?
What is Cloud Dataproc?
What is Memorystore in GCP?
Explain the use of Cloud Security Command Center in GCP.
How can you monitor and analyze GCP resources and services?
Explain the use of Cloud Key Management Service (KMS) in GCP.
What is Cloud Pub/Sub?
What is the purpose of Cloud SQL Proxy in GCP?
Explain the concept of VPC (Virtual Private Cloud) in GCP.
Explain the use of Cloud Machine Learning Engine in GCP.
How does GCP handle data replication and synchronization?
How does GCP handle disaster recovery?
What is the purpose of Cloud Functions in GCP?
Explain the concept of dataflow in GCP.
What is the purpose of Cloud Load Balancing in GCP?
How does GCP ensure data security?
Explain the use of Cloud IoT Core in GCP.
Explain the use of Cloud Composer in GCP.
What is the purpose of Cloud Deployment Manager in GCP?
Explain the use of Cloud Run in GCP.
What is Cloud Spanner in GCP?
How does GCP handle data archiving and long-term storage?
Explain the use of Cloud Datalab in GCP.
What is Cloud Composer in GCP?
Explain the use of Cloud AutoML in GCP.
What is Cloud Memorystore for Redis in GCP?
What is the purpose of Cloud Identity and Access Management (IAM) in GCP?
How can you optimize data processing in GCP?
How can you move data into GCP for analysis?
BigQuery
What is the purpose of BigQuery ML's CREATE MODEL statement?
What is clustering, and how does it optimize query performance?
How can you export BigQuery query results to a file?
How does BigQuery handle data privacy and security?
What is the purpose of the INFORMATION_SCHEMA in BigQuery?
Can you explain the concept of wildcard tables in BigQuery?
What is the difference between BigQuery and traditional relational databases?
What are the benefits of using partitioned tables in BigQuery?
Can you explain the concept of BigQuery's workload management?
What are the limitations of using BigQuery streaming inserts?
Can you explain the concept of slot reservations in BigQuery?
How does BigQuery handle data skew and hotspots in queries?
Can you explain the concept of streaming buffer in BigQuery?
What are the different data export options in BigQuery?
What is the purpose of the BigQuery ML service?
How can you monitor and optimize BigQuery costs?
What is the purpose of the BigQuery Storage API?
How does BigQuery handle nested data types like arrays and structs?
How do you load data into BigQuery?
What are the limitations or constraints of using BigQuery?
What is the purpose of the BigQuery Data Transfer Service for SaaS?
What is the purpose of the BigQuery ML EVALUATE statement?
How can you monitor and troubleshoot query performance in BigQuery?
Explain the concept of federated queries in BigQuery.
How does BigQuery handle data backup and recovery?
How does BigQuery handle data encryption?
How does BigQuery handle schema changes for large tables?
How can you control access and permissions in BigQuery?
What is the role of service accounts in BigQuery?
Explain the concept of partitioning in BigQuery.
What is the purpose of BigQuery reservations?
Can you explain the concept of clustering keys in BigQuery?
What is the purpose of BigQuery BI Engine?
How does BigQuery handle data export to external services?
Can you explain the concept of BigQuery's query cache?
Can you explain the concept of BigQuery Omni?
What is the purpose of the BigQuery Data QnA service?
What is BigQuery, and how does it fit into the data engineering ecosystem?
How does BigQuery handle data consistency in distributed queries?
How does BigQuery handle data ingestion from streaming sources?
Explain the concept of nested and repeated fields in BigQuery.
How does BigQuery handle data partitioning and clustering?
What are the key advantages of using BigQuery?
How does BigQuery handle nested and repeated fields in JSON data?
What are the different types of pricing models available for BigQuery?
Explain the difference between BigQuery slots and slot reservations.
Can you explain the concept of BigQuery's billing export?
Can you share your experience with implementing data pipelines in BigQuery?
What is the difference between a table decorator and a snapshot decorator in BigQuery?
How can you automate BigQuery tasks using Cloud Functions?
Can you explain the concept of streaming inserts in BigQuery?
How does BigQuery handle data storage and processing?
What is the difference between a view and a materialized view in BigQuery?
How can you handle schema evolution in BigQuery?
How does BigQuery handle query optimization and query execution?
How can you monitor and troubleshoot streaming data pipelines in BigQuery?
Can you explain the concept of table clustering and its benefits?
Can you explain the concept of BigQuery federated queries?
How does BigQuery handle data security?
Can you explain the concept of materialized views in BigQuery?
Can you explain the concept of time travel in BigQuery?
Can you explain the concept of slots in BigQuery?
What are the best practices for optimizing query performance in BigQuery?
Can you explain the concept of geographic data types in BigQuery?
Can you explain the concept of the BigQuery Data Catalog?
Can you explain the concept of data sharding in BigQuery?
What is the purpose of the BigQuery ML TRANSFORM statement?
How does BigQuery handle data deduplication during batch loading?
How can you optimize data storage costs in BigQuery?
Can you explain the concept of data lineage in BigQuery?
What are the best practices for data modeling in BigQuery?
How can you optimize query performance in BigQuery?
How can you schedule and automate jobs in BigQuery?
Can you explain the concept of query caching in BigQuery?
How does BigQuery handle data deduplication?
What is the difference between a table and a view in BigQuery?
What is the role of BigQuery Data Transfer Service?
What are the different data ingestion options in BigQuery?
What is the purpose of the BigQuery Data Transfer Service?
How can you automate BigQuery tasks using Cloud Composer?
Bigtable
What is the role of the Bigtable client library?
What is the role of the Bigtable master server?
Can you explain the role of Bigtable's client-side buffering and batching in optimizing write operations?
Does Bigtable provide support for complex data types like arrays or JSON?
How does Bigtable handle schema evolution?
What is the impact of schema design on Bigtable performance?
Can you explain how Bigtable handles high availability and seamless failover?
How does Bigtable handle data replication across regions?
How does Bigtable handle storage growth over time?
How does Bigtable handle data sharding and distribution?
How does Bigtable handle schema changes?
How does Bigtable handle access control for different types of operations, such as read, write, or delete?
What are some typical use cases for Bigtable?
Can you explain how Bigtable handles access control for different levels of data granularity?
How does Bigtable handle backups and disaster recovery?
What are the considerations for choosing between Bigtable and other databases like Cassandra or MongoDB?
Can you explain how Bigtable handles data compression and decompression?
How does Bigtable support structured data?
What consistency model does Bigtable provide?
Does Bigtable support full-text search capabilities?
How does Bigtable ensure high performance?
Does Bigtable support integration with popular data processing frameworks like Apache Spark or Apache Beam?
Does Bigtable support integration with machine learning frameworks like TensorFlow or PyTorch?
How does Bigtable achieve scalability?
Can you explain how Bigtable handles data encryption?
Can you explain the difference between Bigtable and HBase?
Does Bigtable provide automatic indexing for faster querying?
Can you explain the concept of Bigtable's compaction and memtable?
How does Bigtable handle access control for data in transit?
What are the advantages of using Bigtable over traditional relational databases?
How does Bigtable handle data durability and fault tolerance?
Can you explain how Bigtable handles data replication across regions in terms of consistency and latency?
How does Bigtable handle data locality in a multi-region setup?
What is a tablet in Bigtable?
Does Bigtable provide automatic indexing for efficient querying?
Can you explain how Bigtable handles data access control for multi-tenant environments?
How does Bigtable handle data consistency across replicas?
How does Bigtable handle hotspots?
How does Bigtable handle backup and restore operations?
How does Bigtable handle access control and security?
Can you explain the role of Bigtable's tablet placement policy?
What is Bigtable?
Does Bigtable support change data capture (CDC) for real-time data integration?
Can you explain the role of Bigtable's compression algorithm, Snappy?
Can you explain the role of a Bloom filter in Bigtable?
Does Bigtable support integration with popular ETL (Extract, Transform, Load) tools?
Does Bigtable support automatic data partitioning?
Does Bigtable support secondary indexes?
Does Bigtable support ACID transactions?
How does Bigtable handle time-based data, such as event logs?
Can you explain how Bigtable handles data versioning?
Does Bigtable provide support for aggregations and analytics?
How can you interact with Bigtable?
Does Bigtable support time travel queries with fine-grained control over historical data retrieval?
How does Bigtable handle schema changes without downtime?
How does Bigtable handle concurrent access to the same row?
Can you describe the data model used in Bigtable?