This repository has been archived by the owner on Jul 23, 2024. It is now read-only.
forked from apache/pig
CHANGES.txt
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
Pig Change Log
Trunk (unreleased changes)
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-2365: Current TOP implementation needlessly results in a null bag name (jcoveney via dvryaboy)
PIG-2151: Add annotation to specify output schema in Java UDFs (dvryaboy)
PIG-2230: Improved error message for invalid parameter format (xutingz via olgan)
PIG-2328: Add builtin UDFs for building and using bloom filters (gates)
PIG-2338: Need signature for EvalFunc (daijy)
PIG-2337: Provide UDF with input schema (xutingz via daijy)
OPTIMIZATIONS
BUG FIXES
PIG-2355: ant clean does not clean e2e test build artifacts (daijy)
PIG-2352: e2e test harness' use of environment variables causes unintended effects between tests (gates)
Release 0.10.0 - Unreleased
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-2332: JsonLoader/JsonStorage (daijy)
PIG-2334: Set default number of reducers for S3N filesystem (ddaniels888 via daijy)
PIG-1387: Syntactical Sugar for PIG-1385 (azaroth)
PIG-2305: Pig should log the split locations in task logs (vivekp via thejas)
PIG-2293: Pig should support a more efficient merge join against data sources that natively support point
lookups or where the join is against large, sparse tables (aklish via daijy)
PIG-2287: add test cases for limit and sample that use expressions with
constants only (no scalar variables) (thejas via gates)
PIG-2092: Missing sh command from Grunt shell (olgan)
PIG-2163: Improve nested cross to stream one relation (zjshen via daijy)
PIG-2249: Enable pig e2e testing on EC2 (gates)
PIG-2256: Upgrade Avro dependency to 1.5.3 (tucu00 via dvryaboy)
PIG-604: Kill the Pig job should kill all associated Hadoop Jobs (daijy)
PIG-2096: End to end tests for new Macro feature (gates)
PIG-2242: Allow the delimiter to be specified when calling TOKENIZE (markroddy via hashutosh)
PIG-2240: Allow any compression codec to be specified in AvroStorage (tomwhite via dvryaboy)
PIG-2229: Pig end-to-end tests should test local mode as well as mr mode (gates)
PIG-2235: Several files in e2e tests aren't being run (gates)
PIG-2196: Test harness should be independent of Pig (hashutosh) -- Missed few
changes in last commit.
PIG-2196: Test harness should be independent of Pig (hashutosh)
PIG-1429: Add Boolean Data Type to Pig (zjshen via daijy)
PIG-2218: Pig end-to-end tests should be accessible from top level build.xml (gates)
PIG-2176: add logical plan assumption checker (thejas)
PIG-1631: Support to 2 level nested foreach (aniket486 via daijy)
PIG-2191: Reduce amount of log spam generated by UDFs (dvryaboy)
PIG-2200: Piggybank cannot be built from the Git mirror (dvryaboy)
PIG-2168: CubeDimensions UDF (dvryaboy)
PIG-2189: e2e test harness needs to use Pig as a source of truth (gates via daijy)
PIG-1904: Default split destination (azaroth via thejas)
PIG-2125: Make Pig work with hadoop .NEXT (daijy)
PIG-2143: Make PigStorage optionally store schema; improve docs. (dvryaboy)
PIG-1973: UDFContext.getUDFContext usage of ThreadLocal pattern
is not typical (woody via thejas)
PIG-2053: PigInputFormat uses class.isAssignableFrom() where
instanceof is more appropriate (woody via thejas)
PIG-2161: TOTUPLE should use no-copy tuple creation (dvryaboy)
PIG-1946: HBaseStorage constructor syntax is error prone (billgraham via dvryaboy)
PIG-2001: DefaultTuple(List) constructor is inefficient, causes List.size()
System.arraycopy() calls (though they are 0 byte copies),
DefaultTuple(int) constructor is a bit misleading wrt time
complexity (woody via thejas)
PIG-1916: Nested cross (zjshen via daijy)
PIG-2128: Generating the jar file takes a lot of time and is unnecessary when running Pig local mode (julien)
PIG-2121: e2e test harness should use ant instead of make (gates)
PIG-2142: Allow registering multiple jars from DFS via single statement (rangadi via dvryaboy)
PIG-1926: Sample/Limit should take scalar (azaroth via thejas)
PIG-1950: e2e test harness needs to be able to compare to previous version of
Pig (gates)
PIG-536: the shell script 'pig' does not work if PIG_HOME has the word 'hadoop' in its directory (miguno via olgan)
PIG-2108: e2e test harness needs to be able to mark certain tests as ignored
(gates)
PIG-1825: ability to turn off the write ahead log for pig's HBaseStorage (billgraham via dvryaboy)
PIG-1772: Pig 090 Documentation (chandec via olgan)
PIG-1772: Pig 090 Documentation (chandec via olgan)
PIG-1772: Pig 090 Documentation (chandec via olgan)
PIG-1824: Support import modules in Jython UDF (woody via rding)
PIG-1994: e2e test harness deployment implementation for existing cluster
(gates)
PIG-2036: [piggybank] Set header delimiter in PigStorageSchema (mmoeller via dvryaboy)
PIG-1949: e2e test harness should use bin/pig rather than calling java
directly (gates)
PIG-2026: e2e tests in eclipse classpath (azaroth via hashutosh)
PIG-2024: Incorrect jar paths in .classpath template for eclipse (azaroth via hashutosh)
OPTIMIZATIONS
PIG-2011: Speed up TestTypedMap.java (dvryaboy)
PIG-2228: support partial aggregation in map task (thejas)
BUG FIXES
PIG-2209: JsonMetadata fails to find schema for glob paths (daijy)
PIG-2165: Need a way to deal with params and param_file in embedded pig in python (daijy)
PIG-2313: NPE in ILLUSTRATE trying to get StatusReporter in STORE (daijy)
PIG-2335: bin/pig does not work with bash 3.0 (azaroth)
PIG-2275: NullPointerException from ILLUSTRATE (daijy)
PIG-2119: DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan (daijy)
PIG-2290: TOBAG wraps tuple parameters in another tuple (ryan.hoegg via thejas)
PIG-2288: Pig 0.9 error message not useful as compared to 0.8 in case
of group by (vivekp via thejas)
PIG-2316: Incorrect results for FILTER *** BY ( *** OR ***) with
FilterLogicExpressionSimplifier optimizer turned on (knoguchi via thejas)
PIG-2271: PIG regression in BinStorage/PigStorage in 0.9.1 (thejas)
PIG-2309: Keyword 'NOT' is wrongly treated as a UDF in split statement (vivekp via thejas)
PIG-2307: Jetty version should be updated in .eclipse.templates/.classpath,
pig-template.xml and pig.pom as well (zjshen via daijy)
PIG-2273: Pig.compileFromFile in embedded python fails when pig script starts with a comment (ddaniels888 via gates)
PIG-2278: Wrong version numbers for libraries in eclipse template classpath (azaroth)
PIG-2115: Fix Pig HBaseStorage configuration and setup issues ([email protected] via dvryaboy)
PIG-2232: "declare" document contains a typo (daijy)
PIG-2055: inconsistent behavior in parser generated during build (thejas)
PIG-2185: NullPointerException while Accessing Empty Bag in FOREACH { FILTER } (daijy)
PIG-2227: Wrong jars copied into lib directory in e2e tests when invoked from top level (gates)
PIG-2219: Pig tests fail if ${user.home}/pigtest/conf does not already exist (cwsteinbach via gates)
PIG-2215: Newlines in function arguments still cause exceptions to be thrown (awarring via gates)
PIG-2214: InternalSortedBag two-arg constructor doesn't pass bagCount (sallen via gates)
PIG-2174: HBaseStorage column filters miss some fields (billgraham via dvryaboy)
PIG-2090: re-enable TestGrunt test cases (thejas)
PIG-2181: Improve error message when describe misses alias (vivekp via daijy)
PIG-2124: Script never ending when joining from the same source (daijy)
PIG-2170: NPE thrown during illustrate (thejas)
PIG-2186: PigStorage new warnings about missing schema file
can be confusing (thejas)
PIG-2179: tests in TestLoad are failing (thejas)
PIG-2146: POStore.getSchema() returns null because of which PigOutputCommitter
is not storing schema while cleanup (thejas)
PIG-2027: NPE if Pig doesn't have permission for log file (daijy)
PIG-2171: TestScriptLanguage is broken on trunk (daijy and thejas)
PIG-2172: Fix test failure for ant 1.8.x (daijy)
PIG-2162: bin/pig should not modify user args (rangadi via thejas)
PIG-2060: Fix errors in pig grammars reported by ANTLRWorks (azaroth via thejas)
PIG-2156: Limit/Sample with variable does not work if the expression starts
with an integer/double (azaroth via thejas)
PIG-2130: Piggybank:MultiStorage is not compressing output files (vivekp via daijy)
PIG-2147: Support nested tags for XMLLoader (vivekp via daijy)
PIG-1890: Fix piggybank unit test TestAvroStorage (kengoodhope via daijy)
PIG-2110: NullPointerException in piggybank.evaluation.util.apachelogparser.SearchTermExtractor (dale_jin via daijy)
PIG-2144: ClassCastException when using IsEmpty(DIFF()) (thejas)
PIG-2139: LogicalExpressionSimplifier optimizer rule should check if udf is
deterministic while checking if they are equal (thejas)
PIG-2137: SAMPLE should not be pushed above DISTINCT (dvryaboy and thejas)
PIG-2136: Implementation of Sample should use LessThanExpression
instead of LessThanEqualExpression (azaroth via thejas)
PIG-2140: Usage printed from Main.java gives wrong option for disabling
LogicalExpressionSimplifier (thejas)
PIG-2120: UDFContext.getClientSystemProps() does not respect pig.properties (dvryaboy)
PIG-2129: NOTICE file needs updates (gates)
PIG-2131: Add back test for PIG-1769 (qwertymaniac via gates)
PIG-2112: ResourceSchema.toString does not properly handle maps in the schema (gates)
PIG-1702: Streaming debug output outputs null input-split information (awarring via daijy)
PIG-2109: Ant build continues even if the parser classes fail to be generated. (zjshen via daijy)
PIG-2071: casting numeric type to chararray during schema merge for union
is inconsistent with other schema merge cases (thejas)
PIG-2044: Pattern match bug in org.apache.pig.newplan.optimizer.Rule (knoguchi via daijy)
PIG-2048: Add zookeeper to pig jar (gbowyer via gates)
PIG-2008: Cache outputFormat in HBaseStorage (thedatachef via gates)
PIG-2025: org.apache.pig.test.udf.evalfunc.TOMAP is missing package
declaration (azaroth via gates)
PIG-2019: smoketest-jar target has to depend on pigunit-jar to guarantee
inclusion of test classes (cos via gates)
Release 0.9.2 - Unreleased
BUG FIXES
PIG-2320: Error: "projection with nothing to reference" (daijy)
PIG-2346: TypeCastInsert should not insert Foreach if there is no as statement (daijy)
PIG-2339: HCatLoader loads all the partitions in a partitioned table even though
a filter clause on the partitions is specified in the Pig script (daijy)
Release 0.9.1 - Unreleased
IMPROVEMENTS
PIG-2284: Add pig-setup-conf.sh script (eyang via daijy)
PIG-2272: e2e test harness should be able to set HADOOP_HOME (gates via daijy)
PIG-2239: Pig should use "bin/hadoop jar pig-withouthadoop.jar" in bin/pig instead of forming java command itself (daijy)
PIG-2213: Pig 0.9.1 Documentation (chandec via daijy)
PIG-2221: Couldn't find documentation for ColumnMapKeyPrune optimization rule (chandec via daijy)
BUG FIXES
PIG-2310: bin/pig fails when both pig-0.9.1.jar and pig.jar are in PIG_HOME (daijy)
PIG-1857: Create a package integration project (eyang via daijy)
PIG-2013: Penny gets a null pointer when no properties are set (breed via daijy)
PIG-2102: MonitoredUDF does not work (dvryaboy)
PIG-2152: Null pointer exception while reporting progress (thejas)
PIG-2183: Pig not working with Hadoop 0.20.203.0 (daijy)
PIG-2193: Using HBaseStorage to scan 2 tables in the same Map job produces bad data (rangadi via dvryaboy)
PIG-2199: Penny throws Exception when netty classes are missing (ddaniels888 via daijy)
PIG-2223: error accessing column in output schema of udf having project-star input (thejas)
PIG-2208: Restrict number of PIG generated Hadoop counters (rding via daijy)
PIG-2299: jetty 6.1.14 startup issue causes unit tests to fail in CI (thw via daijy)
PIG-2301: Some more bin/pig, build.xml cleanup for 0.9.1 (daijy)
PIG-2237: LIMIT generates wrong number of records if pig determines no of reducers as more than 1 (daijy)
PIG-2261: Restore support for parenthesis in Pig 0.9 (rding via daijy)
PIG-2238: Pig 0.9 error message not useful as compared to 0.8 (daijy)
PIG-2286: Using COR function in Piggybank results in ERROR 2018: Internal error. Unable to introduce the combiner for optimization (daijy)
PIG-2270: Put jython.jar in classpath (daijy)
PIG-2274: remove pig deb package dependency on sun-java6-jre (gkesavan via daijy)
PIG-2264: Change conf/log4j.properties to conf/log4j.properties.template (daijy)
PIG-2231: Limit produces wrong number of records after foreach flatten (daijy)
Release 0.9.0 - Unreleased
INCOMPATIBLE CHANGES
PIG-1622: DEFINE streaming options are ill defined and not properly documented (xuefu)
PIG-1680: HBaseStorage should work with HBase 0.90 (gstathis, billgraham, dvryaboy, tlipcon via dvryaboy)
PIG-1745: Disable converting bytes loading from BinStorage (daijy)
PIG-1188: Padding nulls to the input tuple according to input schema (daijy)
PIG-1876: Typed map for Pig (daijy)
IMPROVEMENTS
PIG-1938: support project-range as udf argument (thejas)
PIG-2059: PIG doesn't validate incomplete query in batch mode even if -c option is given (xuefu)
PIG-2062: Script silently ended (xuefu)
PIG-2039: IndexOutOfBoundsException for a case (xuefu)
PIG-2038: Pig fails to parse empty tuple/map/bag constant (xuefu)
PIG-1775: Removal of old logical plan (xuefu)
PIG-1998: Allow macro to return void (rding)
PIG-2003: Using keyword as alias neither emits an error nor produces a logical plan (xuefu)
PIG-1981: LoadPushDown.pushProjection should pass alias in addition to position (daijy)
PIG-2006: Regression: NPE when Pig processes an empty script file, fix test case (xuefu)
PIG-2006: Regression: NPE when Pig processes an empty script file (xuefu)
PIG-2007: Parsing error when map key referred directly from udf in nested foreach (xuefu)
PIG-2000: Pig gives incorrect error message dealing with scalar projection (xuefu)
PIG-2002: Regression: Pig gives error "Projection with nothing to reference!" for a valid query (xuefu)
PIG-1921: Improve error messages in new parser (xuefu)
PIG-1996: Pig new parser fails to recognize PARALLEL keywords in a case (xuefu)
PIG-1612: error reporting: PigException needs to have a way to indicate that
its message is appropriate for user (laukik via thejas)
PIG-1782: Add ability to load data by column family in HBaseStorage (billgraham via dvryaboy)
PIG-1772: Pig 090 Documentation (chandec via olgan)
PIG-1954: Design deployment interface for e2e test harness (gates)
PIG-1881: Need a special interface for Penny (Inspector Gadget) (laukik via
gates)
PIG-1947: Incorrect line number is reported during parsing (xuefu)
PIG-1918: Line number should be given for logical plan failures (xuefu)
PIG-1961: Pig prints "null" as file name in case of grammar error (xuefu)
PIG-1956: Pig parser shouldn't log error code 0 (xuefu)
PIG-1957: Pig parser gives misleading error message when the next foreach block has syntactic errors (xuefu)
PIG-1958: Regression: Pig doesn't log type cast warning messages (xuefu)
PIG-1918: Line number should be given for logical plan failures (xuefu)
PIG-1899: Add end to end test harness for Pig (gates)
PIG-1932: GFCross should allow the user to set the DEFAULT_PARALLELISM value (gates)
PIG-1913: Use a file for excluding tests (tomwhite via gates)
PIG-1693: support project-range expression. (was: There
needs to be a way in foreach to indicate "and all the
rest of the fields" ) (thejas)
PIG-1772: Pig 090 Documentation (chandec via daijy)
PIG-1830: Type mismatch error in key from map, when doing GROUP on PigStorageSchema() variable (dvryaboy)
PIG-1566: Support globbing for registering jars in pig script (nrai via daijy)
PIG-1886: Add zookeeper jar to list of jars shipped when HBaseStorage used (dvryaboy)
PIG-1874: Make PigServer work in a multithreading environment (rding)
PIG-1889: bin/pig should pick up HBase configuration from HBASE_CONF_DIR
PIG-1794: Javascript support for Pig embedding and UDFs in scripting languages (julien)
PIG-1853: Using ANTLR jars from maven repository (rding)
PIG-1728: more doc updates (chandec via olgan)
PIG-1793: Add macro expansion to Pig Latin (rding)
PIG-847: Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag (daijy)
PIG-1748: Add load/store function AvroStorage for avro data (guolin2001, jghoman via daijy)
PIG-1769: Consistency for HBaseStorage (dvryaboy)
PIG-1786: Move describe/nested describe to new logical plan (daijy)
PIG-1809: addition of TOMAP function (olgan)
PIG-1749: Update Pig parser so that function arguments can contain newline characters (jghoman via daijy)
PIG-1806: Modify embedded Pig API for usability (rding)
PIG-1799: Provide deployable maven artifacts for pigunit and pig smoke tests
(cos via gates)
PIG-1728: turing complete docs (chandec via olgan)
PIG-1675: allow PigServer to register pig script from InputStream (zjffdu via dvryaboy)
PIG-1479: Embed Pig in scripting languages (rding)
PIG-946: Combiner optimizer does not optimize when limit follows group, foreach (thejas)
PIG-1277: Pig should give error message when cogroup on tuple keys of different inner type (daijy)
PIG-1755: Clean up duplicated code in PhysicalOperators (dvryaboy)
PIG-750: Use combiner when algebraic UDFs are used in expressions (thejas)
PIG-490: Combiner not used when group elements referred to in
tuple notation instead of flatten. (thejas)
PIG-1768: 09 docs: illustrate (chandec via olgan)
PIG-1768: docs reorg (chandec via olgan)
PIG-1712: ILLUSTRATE rework (yanz)
PIG-1758: Deep cast of complex type (daijy)
PIG-1728: doc updates (chandec via olgan)
PIG-1752: Enable UDFs to indicate files to load into the Distributed Cache
(gates)
PIG-1747: pattern match classes for matching patterns in physical plan (thejas)
PIG-1707: Allow pig build to pull from alternate maven repo to enable building
against newer hadoop versions (pradeepkth)
PIG-1618: Switch to new parser generator technology (xuefuz via thejas)
PIG-1531: Pig gobbles up error messages (nrai via hashutosh)
PIG-1508: Make 'docs' target (forrest) work with Java 1.6 (cwsteinbach via gates)
PIG-1608: pig should always include pig-default.properties and pig.properties in the pig.jar (nrai via daijy)
OPTIMIZATIONS
PIG-1696: Performance: Use System.arraycopy() instead of manually copying the bytes while reading the data (hashutosh)
BUG FIXES
PIG-2159: New logical plan uses incorrect class for SUM causing ClassCastException (daijy)
PIG-2106: Fix Zebra unit test TestBasicUnion.testNeg3, TestBasicUnion.testNeg4 (daijy)
PIG-2083: bincond ERROR 1025: Invalid field projection when null is used (thejas)
PIG-2089: Javadoc for ResourceFieldSchema.getSchema() is wrong (daijy)
PIG-2084: pig is running validation for a statement at a time in batch mode,
instead of running it for whole script (thejas)
PIG-2088: Return alias validation failed when there is single line comment in the macro (rding)
PIG-2081: Dryrun gives wrong line numbers in error message for scripts containing macro (rding)
PIG-2078: POProject.getNext(DataBag) does not handle null (daijy)
PIG-2029: Inconsistency in Pig Stats reports (rding)
PIG-2070: "Unknown" appears in error message for an error case (thejas)
PIG-2069: LoadFunc jar does not ship to backend in MultiQuery case (rding)
PIG-2076: update documentation, help command with correct default value
of pig.cachedbag.memusage (thejas)
PIG-2072: NPE when udf has project-star argument and input schema is null (thejas)
PIG-2075: Bring back TestNewPlanPushUpFilter (daijy)
PIG-1827: When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason (rding)
PIG-2056: Jython error messages should show script name (rding)
PIG-2014: SAMPLE shouldn't be pushed up (dvryaboy)
PIG-2058: Macro missing returns clause doesn't give a good error message (rding)
PIG-2035: Macro expansion doesn't handle multiple expansions of same macro inside another macro (rding)
PIG-2030: Merged join/cogroup does not automatically ship loader (daijy)
PIG-2052: Ship guava.jar to backend (daijy)
PIG-2012: Comments at the beginning of the file throw off line numbers in errors (rding)
PIG-2043: Ship antlr-runtime.jar to backend (daijy)
PIG-2049: Pig should display TokenMgrError message consistently across all parsers (rding)
PIG-2041: Minicluster should make each run independent (daijy)
PIG-2040: Move classloader from QueryParserDriver to PigContext (daijy)
PIG-1999: Macro alias masker should consider schema context (rding)
PIG-1821: UDFContext.getUDFProperties does not handle collisions
in hashcode of udf classname (+ arg hashcodes) (thejas)
PIG-2028: Speed up multiquery unit tests (rding)
PIG-1990: support casting of complex types with empty inner schema
to complex type with non-empty inner schema (thejas)
PIG-2016: -dot option does not work with explain and new logical plan (daijy)
PIG-2018: NPE for co-group with group-by column having complex schema and
different load functions for each input (thejas)
PIG-2015: Explain writes out logical plan twice (alangates)
PIG-2017: consumeMap() fails with EmptyStackException (thedatachef via daijy)
PIG-1989: complex type casting should return null on casting failure (daijy)
PIG-1826: Unexpected data type -1 found in stream error (daijy)
PIG-2004: Incorrect input types passed on to eval function (thejas)
PIG-1814: mapred.output.compress in SET statement does not work (daijy)
PIG-1976: One more TwoLevelAccess to remove (daijy)
PIG-1865: BinStorage/PigStorageSchema cannot load data from a different namenode (daijy)
PIG-1910: incorrect schema shown when project-star is used with other projections (daijy)
PIG-2005: Discrepancy in the way dry run handles semicolon in macro definition (rding)
PIG-1281: Detect org.apache.pig.data.DataByteArray cannot be cast to
org.apache.pig.data.Tuple type of errors at Compile Time during
creation of logical plan (thejas)
PIG-1939: order-by statement should support project-range to-end in
any position among the sort columns if input schema is known (thejas)
PIG-1978: Secondary sort fails when dereferencing two fields inside foreach (daijy)
PIG-1962: Wrong alias assigned to store operator (daijy)
PIG-1975: Need to provide backward compatibility for legacy LoadCaster (without bytesToMap(bytes, fieldSchema)) (daijy)
PIG-1987: -dryrun does not work with set (rding)
PIG-1871: Don't throw exception if partition filters cannot be pushed up. (rding)
PIG-1870: HBaseStorage doesn't project correctly (dvryaboy)
PIG-1788: relation-as-scalar error messages should indicate the field
being used as scalar (laukik via thejas)
PIG-1697: NullPointerException if log4j.properties is Used (laukik via daijy)
PIG-1929: Type checker failed to catch invalid type comparison (thejas)
PIG-1928: Type Checking, incorrect error message (thejas)
PIG-1979: New logical plan failing with ERROR 2229: Couldn't find matching uid -1 (daijy)
PIG-1897: multiple star projection in a statement does not produce
the right plan (thejas)
PIG-1917: NativeMapReduce does not Allow Configuration Parameters
containing Spaces (thejas)
PIG-1974: Lineage need to set for every cast (thejas)
PIG-1988: Importing an empty macro file causing NPE (rding)
PIG-1977: "Stream closed" error while reading Pig temp files (results of intermediate jobs) (rding)
PIG-1963: in nested foreach, accumulative udf taking input from order-by does not get results in order (thejas)
PIG-1911: Infinite loop with accumulator function in nested foreach (thejas)
PIG-1923: Jython UDFs fail to convert Maps of Integer values back to Pig types (julien)
PIG-1944: register javascript UDFs does not work (julien)
PIG-1955: PhysicalOperator has a member variable (non-static) Log object that
is non-transient, this causes serialization errors (woody via rding)
PIG-1964: PigStorageSchema fails if a column value is null (thejas)
PIG-1866: Dereference a bag within a tuple does not work (daijy)
PIG-1984: Wrong stats shown when there are multiple stores but same file names (rding)
PIG-1893: Pig reports input size -1 for empty input file (rding)
PIG-1868: New logical plan fails when I have complex data types from udf
(daijy)
PIG-1927: Dereference partial name failed (daijy)
PIG-1934: Fix zebra test TestCheckin1, TestCheckin4 (daijy)
PIG-1931: Integrate Macro Expansion with New Parser (rding)
PIG-1933: Hints such as 'collected' and 'skewed' for "group by" or "join by"
should not be treated as tokens. (xuefuz via thejas)
PIG-1925: Parser error message doesn't show location of the error or show it
as Line 0:0 (xuefuz via gates)
PIG-671: typechecker does not throw an error when multiple arguments are
passed to COUNT (deepujain via gates)
PIG-1152: bincond operator throws parser error (xuefuz via thejas)
PIG-1885: SUBSTRING fails when input length less than start (deepujain via
gates)
PIG-719: store <expr> into 'filename'; should be valid syntax, but does not work (xuefuz via thejas)
PIG-1770: matches clause problem with chars that have special meaning in dk.brics - #, @ .. (thejas)
PIG-1862: Pig returns exit code 0 for the failed Pig script due to non-existing input directory (rding)
PIG-1888: Fix TestLogicalPlanGenerator to not use hardcoded path (daijy)
PIG-1837: Error while using IsEmpty function (rding)
PIG-1884: Change ReadToEndLoader.setLocation to not throw UnsupportedOperationException (thejas)
PIG-1887: Fix pig-withouthadoop.jar to contain proper jars (daijy)
PIG-1779: Wrong stats shown when there are multiple loads but same file names (rding)
PIG-1861: The pig script stored in the Hadoop History logs is stored as a concatenated string without whitespace this causes problems when attempting to extract and execute the script (rding)
PIG-1829: "0" value seen in PigStat's map/reduce runtime, even when the job is successful (rding)
PIG-1856: Custom jar is not packaged with the new job created by LimitAdjuster (rding)
PIG-1872: Fix bug in AvroStorage (guolin2001, jghoman via daijy)
PIG-1536: use same logic for merging inner schemas in "default union" and
"union onschema" (daijy)
PIG-1304: Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input (laukik via rding)
PIG-1852: Packaging antlr jar with pig.jar (rding via daijy)
PIG-1717: pig needs to call setPartitionFilter if schema is null but
getPartitionKeys is not (gerritjvv via gates)
PIG-313: Error handling aggregate of a computation (daijy)
PIG-496: project of bags from complex data causes failures (daijy)
PIG-730: problem combining schema from a union of several LOAD expressions, with a nested bag inside the schema (daijy)
PIG-767: Schema reported from DESCRIBE and actual schema of inner bags are different (daijy)
PIG-1801: Need better error message for Jython errors (rding)
PIG-1742: org.apache.pig.newplan.optimizer.Rule.java does not work
with plan patterns where leaves/sinks are not siblings (thejas)
Release 0.8.0 - Unreleased
INCOMPATIBLE CHANGES
PIG-1518: multi file input format for loaders (yanz via rding)
PIG-1249: Safe-guards against misconfigured Pig scripts without PARALLEL keyword (zjffdu via olgan)
IMPROVEMENTS
PIG-1561: XMLLoader in Piggybank does not support bz2 or gzip compressed XML files (vivekp via daijy)
PIG-1677: modify the repository path of pig artifacts to org/apache/pig instead of org/apache/hadoop/pig (nrai via olgan)
PIG-1600: Docs update (romainr via olgan)
PIG-1632: The core jar in the tarball contains the kitchen sink (eli via olgan)
PIG-1617: 'group all' should always use one reducer (thejas)
PIG-1589: add test cases for mapreduce operator which use distributed cache (thejas)
PIG-1548: Optimize scalar to consolidate the part file (rding)
PIG-1600: Docs update (chandec via olgan)
PIG-1585: Add new properties to help and documentation (olgan)
PIG-1399: Filter expression optimizations (yanz via gates)
PIG-1531: Pig gobbles up error messages (nrai via hashutosh)
PIG-1458: aggregate files for replicated join (rding)
PIG-1205: Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc (zjffdu and dvryaboy)
PIG-1568: Optimization rule FilterAboveForeach is too restrictive and doesn't
handle project * correctly (xuefuz via daijy)
PIG-1574: Optimization rule PushUpFilter causes filter to be pushed up out joins (xuefuz via daijy)
PIG-1515: Migrate logical optimization rule: PushDownForeachFlatten (xuefuz via daijy)
PIG-1321: Logical Optimizer: Merge cascading foreach (xuefuz via daijy)
PIG-1483: [piggybank] Add HadoopJobHistoryLoader to the piggybank (rding)
PIG-1555: [piggybank] add CSV Loader (dvryaboy)
PIG-1501: need to investigate the impact of compression on pig performance (yanz via thejas)
PIG-1497: Mandatory rule PartitionFilterOptimizer (xuefuz via daijy)
PIG-1514: Migrate logical optimization rule: OpLimitOptimizer (xuefuz via daijy)
PIG-1551: Improve dynamic invokers to deal with no-arg methods and array parameters (dvryaboy)
PIG-1311: Document audience and stability for remaining interfaces (gates)
PIG-506: Does pig need a NATIVE keyword? (aniket486 via thejas)
PIG-1510: Add `deepCopy` for LogicalExpressions (swati.j via daijy)
PIG-1447: Tune memory usage of InternalCachedBag (thejas)
PIG-1505: support jars and scripts in dfs (anhi via rding)
PIG-1334: Make pig artifacts available through maven (niraj via rding)
PIG-1466: Improve log messages for memory usage (thejas)
PIG-1404: added PigUnit, a framework for building unit tests of Pig Latin scripts (romainr via gates)
PIG-1452: to remove hadoop20.jar from lib and use hadoop from the apache maven
repo. (rding)
PIG-1295: Binary comparator for secondary sort (azaroth via daijy)
PIG-1448: Detach tuple from inner plans of physical operator (thejas)
PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi
via olgan)
PIG-103: Shared Job /tmp location should be configurable (niraj via rding)
PIG-1496: Mandatory rule ImplicitSplitInserter (yanz via daijy)
PIG-346: grunt help command cleanup (olgan)
PIG-1199: help includes obsolete options (olgan)
PIG-1434: Allow casting relations to scalars (aniket486 via rding)
PIG-1461: support union operation that merges based on column names (thejas)
PIG-1517: Pig needs to support keywords in the package name (aniket486 via olgan)
PIG-928: UDFs in scripting languages (aniket486 via daijy)
PIG-1509: Add .gitignore file (cwsteinbach via gates)
PIG-1478: Add progress notification listener to PigRunner API (rding)
PIG-1472: Optimize serialization/deserialization between Map and Reduce and between MR jobs (thejas)
PIG-1389: Implement Pig counter to track number of rows for each input files
(rding)
PIG-1454: Consider clean up backend code (rding)
PIG-1333: API interface to Pig (rding)
PIG-1405: Need to move many standard functions from piggybank into Pig
(aniket486 via daijy)
PIG-1427: Monitor and kill runaway UDFs (dvryaboy)
PIG-1428: Make a StatusReporter singleton available for incrementing counters (dvryaboy)
PIG-972: Make describe work with nested foreach (aniket486 via daijy)
PIG-1438: [Performance] MultiQueryOptimizer should also merge DISTINCT jobs
(rding)
PIG-1441: new test targets (olgan)
PIG-282: Custom Partitioner (aniket486 via daijy)
PIG-283: Allow to set arbitrary jobconf key-value pairs inside pig program (hashutosh)
PIG-1373: We need to add jdiff output to docs on the website (daijy)
PIG-1422: Duplicate code in LOPrinter.java (zjffdu)
PIG-1420: Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple (rjurney via dvryaboy)
PIG-1408: Annotate explain plans with aliases (rding)
PIG-1410: Make PigServer handle files with parameters (zjffdu)
PIG-1406: Allow to run shell commands from grunt (zjffdu)
PIG-1398: Marking Pig interfaces for org.apache.pig.data package (gates)
PIG-1396: eclipse-files target in build.xml fails to generate necessary classes in src-gen
PIG-1390: Provide a target to generate eclipse-related classpath and files (chaitk via thejas)
PIG-1384: Adding contrib javadoc to main Pig javadoc (daijy)
PIG-1320: final documentation updates for Pig 0.7.0 (chandec via olgan)
PIG-1363: Unnecessary loadFunc instantiations (hashutosh)
PIG-1370: Marking Pig interface for org.apache.pig package (gates)
PIG-1354: UDFs for dynamic invocation of simple Java methods (dvryaboy)
PIG-1316: TextLoader should use Bzip2TextInputFormat for bzip files so that
bzip files can be efficiently processed by splitting the files (pradeepkth)
PIG-1317: LOLoad should cache results of LoadMetadata.getSchema() for use in
subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
(pradeepkth)
PIG-1413: Remove svn:externals reference for test-patch.sh and
create a local copy of test-patch.sh (gkesavan)
PIG-1302: Include zebra's "pigtest" ant target as a part of pig's
ant test target. (gkesavan)
PIG-1582: To upgrade commons-logging
OPTIMIZATIONS
PIG-1353: Map-side joins (ashutoshc)
PIG-1309: Map-side Cogroup (ashutoshc)
BUG FIXES
PIG-2067: FilterLogicExpressionSimplifier removed some branches in some cases (daijy)
PIG-2033: Pig returns success for the failed Pig script (rding)
PIG-1993: PigStorageSchema throw NPE with ColumnPruning (daijy)
PIG-1935: New logical plan: Should not push up filter in front of Bincond (daijy)
PIG-1912: non-deterministic output when a file is loaded multiple times (daijy)
PIG-1892: Bug in new logical plan : No output generated even though there are
valid records (daijy)
PIG-1808: Error message in 0.8 not as helpful as in 0.7 (daijy)
PIG-1850: Order by is failing with ClassCastException if schema is undefined
for new logical plan in 0.8 (daijy)
PIG-1831: Nondeterministic behavior in local mode due to static variable PigMapReduce.sJobConf (daijy)
PIG-1841: TupleSize implemented incorrectly (laukik via daijy)
PIG-1843: NPE in schema generation (daijy)
PIG-1820: New logical plan: FilterLogicExpressionSimplifier fails to deal with UDF (daijy)
PIG-1854: Pig returns exit code 0 for the failed Pig script (rding)
PIG-1812: Problem with DID_NOT_FIND_LOAD_ONLY_MAP_PLAN (daijy)
PIG-1813: Pig 0.8 throws ERROR 1075 while trying to refer to a map in the result
of eval udf. Works with 0.7 (daijy)
PIG-1776: changing statement corresponding to alias after explain, then
doing dump gives incorrect result (thejas)
PIG-1800: Missing Signature for maven staging release (rding)
PIG-1815: pig task retains used instances of PhysicalPlan (thejas)
PIG-1785: New logical plan: uid conflict in flattened fields (daijy)
PIG-1787: Error in logical plan generated (daijy)
PIG-1791: System property mapred.output.compress, but pig-cluster-hadoop-site.xml doesn't (daijy)
PIG-1771: New logical plan: Merge schema fails if LoadFunc.getSchema returns a different schema from "Load...AS" (daijy)
PIG-1766: New logical plan: ImplicitSplitInserter should run before DuplicateForEachColumnRewrite (daijy)
PIG-1762: Logical simplification fails on map key referenced values (yanz)
PIG-1761: New logical plan: Exception when bag dereference in the middle of expression (daijy)
PIG-1757: After split combination, the number of maps may vary slightly (yanz)
PIG-1760: Need to report progress in all databags (rding)
PIG-1709: Skewed join uses fewer reducers for extremely large key (daijy)