100DoC Log.txt
5262 lines (4358 loc) · 373 KB
Format: #<day number>.<attempt number>. <date>
Note: running log of progress and notes that may not be code related. Also, I'm not the best speller.
Note 2: I think out loud, so the posts may be wordier than others', because I'm thinking out the process and alternatives.
#0.1. 3/18/2018:
- Project name determined thanks to a friend
#1.1. 3/19/2018:
- Fleshing out docs and making a main function. Plan for the first couple days is to get the project started, do some parsing, figure out how to get some values from logs, and build habits
- I'm not exactly promoting this with the hashtag right now, just to get this going. I don't feel I need motivation, just less stuff to do. I constantly have ideas and write them down, but lack the time to work on it. It makes me easy to distract, so I want to build a habit naturally with everything else I have going on before I start getting "keep going!" tweets and messages.
#2.1. 3/20/2018:
- Almost missed this day (the commits for yesterday and today say they're a day later, but that's because I start late the prior day). #HabitForming
- Already a bit overthinking it because I started trying to do streaming log parsing... just parse the log right now.
- Initial log format is XML. I'm gonna have to make a config file to determine what properties describe the log, just so the log format isn't hardcoded... but that's for a bit later. Maybe tomorrow?
- Initial plan was for a C++ program... but XML parsing, streaming logs, and JSON parsing too (for the config, and I'm sure someone logs in JSON) are not pleasant to work with in C++, and since I want to get something running instead of dealing with frameworks and C++ package systems, I'm just gonna start with C#
#3.1. 3/21/2018:
- I wanted to try and start processing the actual log data, so I finished the XML parsing part. Should still work on making it stream the input. Noted for later.
#4.1. 3/22/2018:
- Late commit again.
- Plan: initial organization of log output, get streaming XML working (need option parsing so I can turn this on only if needed), have log values determined by config (requires JSON parsing)
- I started late, so I only got the first one done and will do the others tomorrow
#5.1. 3/23/2018:
- If yesterday was late, this one is later...
- I load a config file of info about parsing logs. As it's very late for me, I will test this tomorrow.
- For the "temp" C# project, I decided to be a little curious and use .Net Portable Framework. I'm kinda regretting it. So many APIs just aren't available. I get its purpose and it should be used, but going 2 levels deep to open a StreamReader? The main library just opened one internally anyway.
#6.1. 3/24/2018:
- Late start because I was finishing something up. Wrote https://github.com/rcmaniac25/setbuilder to help with that task.
- Config for log parsing now used
- Made up a syntax (like XPath) to get out XML data... maybe I should just use XPath? Thoughts for another time
- Changed version, as this isn't 1.0 yet.
- Decided that the XML streaming is a bit too large for doing at this initial phase: why?
-- The log files I initially imagined using this on are in the multi-GB range. This initial impl. loads everything into memory, and loading one or more multi-GB logs would probably end poorly.
-- At the same time, I can see two uses of this system (when it reaches 1.0 status):
1. Tracing across multiple log files.
2. Manual "debugging" of a log.
-- In both cases, the full logs aren't needed upfront. It's like a sliding window: the system/user is only looking at one part of the log at a time to find/track something.
-- So ideally, the system should load and unload as needed. "Read the first 100 lines of logs. Ok, the user is scrolling, load the next 100. Ok, they got 1000 lines in, load another 100 and unload the first 100. Wait, they did a search... load but don't persist the logs unless it has whatever they searched for".
-- I'd assume info will be cached so it's not constantly parsing. It may be useful to do some tricks, like loading whole sections of the logs into RAM but not parsing them, that way parsing is fast when it is needed.
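The sliding-window loading described above can be sketched roughly as follows. Everything here (the LogWindow class, its API, the capacities) is invented for illustration and is not the project's actual design:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sliding window over a log stream: keeps at most
// `capacity` lines in memory, unloading the oldest as new ones load.
class LogWindow
{
public:
    LogWindow(std::istream& source, std::size_t capacity)
        : source_(source), capacity_(capacity) {}

    // Load up to `count` more lines; evict from the front when over capacity.
    std::size_t loadMore(std::size_t count)
    {
        std::string line;
        std::size_t loaded = 0;
        while (loaded < count && std::getline(source_, line))
        {
            lines_.push_back(line);
            if (lines_.size() > capacity_)
                lines_.pop_front();  // the "unload the first 100" step
            ++loaded;
        }
        return loaded;
    }

    std::size_t size() const { return lines_.size(); }
    const std::string& at(std::size_t i) const { return lines_[i]; }

private:
    std::istream& source_;
    std::size_t capacity_;
    std::deque<std::string> lines_;
};

// Demo: feed five lines through a window of capacity three.
inline std::string oldestAfterLoadingFive()
{
    std::istringstream logs("a\nb\nc\nd\ne\n");
    LogWindow window(logs, 3);
    window.loadMore(5);
    return window.at(0);  // the two oldest lines have been unloaded
}
```

The same shape would apply to parsed entries instead of raw lines; the point is only that memory stays bounded while the user scrolls or searches.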
-- While I'm thinking about some of these things, I will eventually need logging and unit tests. I know I will eventually need to parse source code, if anything, for determining the end of a function and maybe when a function is contained within the calling function. Log parsing, code parsing, etc. should all be separate components so new ones can be used as needed...
-- ...I'm starting to overthink things. Just get some useful parsing done. Tomorrow.
#7.1. 3/25/2018:
- I did not accomplish useful parsing. Instead I refactored code, got an option parser included, and allowed XML parsing to be more "advanced" for each element instead of just the log message itself.
- PSA: Socialize. You'll get good ideas by talking with people about what you're working on. In my case, I didn't do more code (and much of anything else) because I played Fortnite for the whole day. It was fun, especially with friends.
#8.1. 3/26/2018:
- I realized to bring about the idea that's in my head, I need a better way of processing through the logs. This makes me think "database" so I can do queries, but I'm not ready for that point.
-- Basically, gather all the logs together, then do a "get by thread ID" or "get by function" or "get by <random attribute>", and then break it down further. Hmm... this sounds like a LINQ query. Fine while we're in .Net, but it may be more difficult in C++ (unless I use Rx/Ix)
- This change "broke" the initial printout system I wrote, so I needed to get it back to the state it was at previously
- Found a few more log attributes that I can parse. I should look into what other loggers produce/log, so I have a general common set of values, and then add some "fancy" parser for supporting log values that aren't common/are custom. Also, I still haven't added support for key/value attributes and parsing types of attributes
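The "get by thread ID / get by function" idea from this entry is essentially LINQ's GroupBy (roughly `logs.GroupBy(e => e.ThreadId)` in C#). A C++ analog, with a made-up minimal LogEntry, could look like:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal entry; the project's real LogEntry carries more attributes.
struct LogEntry
{
    std::string threadId;
    std::string function;
    std::string message;
};

// "Get by thread ID" as a grouping query, the rough C++ equivalent
// of LINQ's logs.GroupBy(e => e.ThreadId).
std::map<std::string, std::vector<LogEntry>>
groupByThread(const std::vector<LogEntry>& logs)
{
    std::map<std::string, std::vector<LogEntry>> groups;
    for (const auto& entry : logs)
        groups[entry.threadId].push_back(entry);
    return groups;
}
```

Grouping by function or any other attribute is the same loop with a different key, which is exactly why a database (or LINQ) starts to look attractive.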
#9.1. 3/27/2018:
- Organized the files into folders and added one more parameter. Also fixed a bug where missing parameters could cause the program to crash
- Needed a bit of abstraction for reorganizing sources
- Just realized I finished 10 days of 100 Days of Code. Woot.
#10.1. 3/28/2018:
- Late start, but today's goal was parsing attribute types (and inadvertently formalizing the "path" syntax so I don't have to do string parsing the whole time)
- Needed to touch a little more code than I wanted... but it works, and in theory could be faster than the string parsing from before
- The path system could be expanded... but at that point, it's probably better to revisit the idea of using "paths". Speaking of which, a side reason for formalizing "paths" is because when JSON and other formats are added, we don't want to need special configs for each parser type (though we should support it...)
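One way the formalized "paths" could work is to parse the string into steps once and reuse them for every lookup. The `@attribute` syntax below is hypothetical, not the project's actual syntax:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical path step: an element name, or an attribute when the
// token is prefixed with '@'.
struct PathStep
{
    std::string name;
    bool isAttribute;
};

// Parse an XPath-like string such as "log/entry/@time" once, so lookups
// can walk pre-parsed steps instead of re-splitting strings every time.
std::vector<PathStep> parsePath(const std::string& path)
{
    std::vector<PathStep> steps;
    std::size_t start = 0;
    while (start <= path.size())
    {
        std::size_t end = path.find('/', start);
        if (end == std::string::npos)
            end = path.size();
        std::string token = path.substr(start, end - start);
        if (!token.empty())
        {
            bool attr = token[0] == '@';
            steps.push_back({attr ? token.substr(1) : token, attr});
        }
        start = end + 1;
    }
    return steps;
}
```

Pre-parsed steps also make it easier to share one config syntax across XML, JSON, and whatever other formats come later.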
#11.1. 3/29/2018:
- Added key-value pair support for context attributes, which should simplify things later when we want to trace by a value
- Wasn't working... found out I forgot to actually register the attribute. Yay, copy-paste bugs
#12.1. 3/30/2018:
- Next couple days are with family, so I can't get as much done as I would like...
- Added a printer class so I can support files, console, etc. printouts without needing to rewrite systems
#13.1. 3/31/2018:
- I did one more abstraction to get the parsers and printers from a factory.
- I expect 3 reactions: "why?", "ah, good decision", and "premature optimization..."
- It's not premature optimization... it's premature generalization. Something of equivalent badness, but only if it's actually going to have an effect.
- In this case, the answer to the "why" is to shrink the classes down a bit, allow for easy testing with mocks, and to prepare for the inevitable expansion of functionality.
- See, I thought up some ideas on how I want this to operate (I should write them down in some document...), and they all revolved around the GUI. But I know some of the people who have voiced interest are CLI users. Others only use an IDE like Eclipse, VS Code, or Sublime (depending on how you set it up, it can act like an IDE...)
- I can write a single app (that I would still need to move to C++) that has tens of hundreds of options and configs, letting you do anything you wanted... or I can try and be "smart". "What file type are you using? XML? Ok, I should get the XML parser." "I'm a CLI app, so I will print to the console... wait, you specified you wanted a file, so I will now print to a file"
- How about "how do I know I didn't break XML parsing by switching from parser X to parser Y?" Easy: tests. But I'd need to expose interfaces and internal structures to change values... or I can mock them, abstracted by interfaces, and control them within the actual execution class.
- And that darn porting thing. I want this in C++ so that *nix users aren't required to install a runtime, or have Java installed... or some other framework. How about "extract a program from the zip, and it runs"? Like a good old portable app.
- Here's what now happens:
-- 1. A bunch of small classes, with independent functionality and decision making broken up, makes it easier to port eventually. It also allows for making focused classes, such as one that says "what do we need to parse" and one that says "how do we need to present it", instead of a giant code block that's all tangled together.
-- 2. Tests can now be more complex and test more functionality, because they don't have to test functions with 30 if statements in them, and they can actually be written without modifying the real code. Need to start this while things are small...
-- 3. Lastly, every time I say "let me add a new feature", I don't have to "make it fit". I should have a class that defines that functional "idea" and where it fits in.
- TL;DR: I didn't need to do this now, but it will be easier now than when the code is already tangled.
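The "what file type are you using? XML? Ok, get the XML parser" decision above can be sketched as a factory. The interface and names are invented for illustration (and written in the C++ this log keeps pointing toward, rather than the project's current C#):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical parser interface; the project's real abstractions differ.
struct ILogParser
{
    virtual ~ILogParser() = default;
    virtual std::string name() const = 0;
};

struct XmlLogParser : ILogParser
{
    std::string name() const override { return "xml"; }
};

// Factory keyed by file extension: new formats register themselves
// instead of growing a tangle of if statements, and tests can register mocks.
class ParserFactory
{
public:
    using Creator = std::function<std::unique_ptr<ILogParser>()>;

    void registerParser(const std::string& extension, Creator create)
    {
        creators_[extension] = std::move(create);
    }

    std::unique_ptr<ILogParser> create(const std::string& extension) const
    {
        auto it = creators_.find(extension);
        return it == creators_.end() ? nullptr : it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};
```

The same shape serves the printer side (console vs. file), which is what keeps the CLI/GUI/IDE front ends from each needing their own tangle.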
#14.1. 4/1/2018:
- Moved a couple classes that I didn't get to yesterday, since the factories imply the internal/implementation details don't need to be known. Trust the interface.
- PSA: don't kill yourself coding. It's never worth it. Take it from my own experience: I keep wanting to get a bunch of stuff done, and end up staying up late. Doing that and getting up late, so you get the sleep, is fine (so long as work, school, or whatever is your real life allows it). In my case, I have been getting what I mentally average to 4-5 hours of sleep a night... for like 6+ months. It's made me more tired, sluggish, and harder to focus, and I've gotten sick more than I'm used to. I need to break this habit...
- ...which brings me to: I'm sick and need rest. So I did an additional refactor which got rid of a redundant class, though to the detriment of being able to insert mocks easily... (didn't I write about this yesterday?). I will probably reintroduce the class or an equiv. component.
- For now, I sleep. Tomorrow, I'd like to plan further moves and maybe get a public list of tasks I'm trying to do.
#15.1. 4/2/2018:
- PSA: being sick is annoying. Take care of yourself.
- Today I worked on docs... a necessary evil
#16.1. 4/3/2018:
- More doc work. I had a timeline to follow, but it was aggressive when I wrote it, and I probably can't do such a thing (and ideas for how/what to do also became more concrete between then and now). Simplified:
-- Day 1. figure out program name
-- Day 25. Have a library that can parse a large-ish log quickly
-- Day 50. Initial GUI that can visually show the flow of the logs
-- Day 75. Have a working library for parsing, a GUI for visually debugging through the logs, support for RPCs and message tracing, have let others (that I know directly) try it out and see what they think and what they want/have issues with, and start working on an official CLI and search capability
-- Day 100. Version 1.0, with a bit of polish, integration with a real debugger (at the time, I assumed to augment a debugger with log info... but I'm not sure how much that would actually be needed), and possibly work on any stretch goals I could think of along the way.
- Talk about aggressive...
#17.1. 4/4/2018:
- Thought a bit and opted to just split the program and library portions. There was no good abstract interface that would allow unit testing AND be usable by more than just one class/program.
- Instead decided to start trying to get unit tests set up, starting with splitting the program and library.
- At the least, added the NUnit reference
#18.1. 4/5/2018:
- I've been struggling with the "dedicate 1 hour" part of the 100 Days of Code project. First it was habit, next sickness, now it's the disease all software engineers have: time. I look at a clock and say "I have time" and then when I next look, it's time for bed.
- Realized I missed one attribute within the logs I'm testing. Also realized the names of those attributes are not always the best, so added some docs around them.
- Wanted to get the test project setup with at least one test. Didn't like the tutorials for using NUnit within Visual Studio, but found http://www.dotnetcurry.com/visualstudio/1352/nunit-testing-visual-studio-2015 which I liked.
- So I got the test project and what would be test #0... and Visual Studio is complaining about the lib not having a .exe or .dll extension (according to the bin dir, it has a .dll extension). If it were earlier, I'd try to fix it. For now, tests pass
#19.1. 4/6/2018:
- I'm sad... I'm sad because I can't get a simple test project to work. I can and should upgrade to VS 2017, but I fail to see that as a requirement to get these tests working with .Net Core. Nearly every tutorial talks like it's so easy: "add this dependency and you're done". But you know what, I add the dependency and my tests don't show up in the test explorer.
- It doesn't help that I'm half asleep (if you ever meet me IRL, you'll find out I'm perpetually tired. I have a small window where I can focus and get stuff done. Today I used that period to play video games)
- Welp... looks like I get to reset my progress back to 0 because I couldn't get anything done today. I am very unhappy with Visual Studio, NUnit, and the tech results I'm getting back from doing searches. This should've been "Visual Studio wants to run tests, NUnit looked for NUnit references... found them, VS is now happy to show the tests". But this isn't the case.
- I'm not calling it defeat, but this is truly annoying. It's stuff like this that makes me drop projects... dependency? Right click dependencies and add to it. Good. Unit tests? Seems like a half-hearted attempt to add test support but not maintain consistency throughout (and I'm a stickler for consistency). Packaging? I'd like to offer a single-sentence response, but there are too many variables for it. Zipping a directory doesn't count, that's a hack.
- I still firmly believe this is because I'm really tired.
#0.2. 4/11/2018:
- 4/7/2018: Took a short break, since I'm starting the countdown again. Namely, played around with modding a game I like
- 4/8/2018: Made the mistake of starting late again... today (and probably tomorrow) was spent trying to debug the NUnit issue. Visual Studio doesn't always show the logs for running tests, but I managed to get them a couple times and they all say they can't find "nunit.framework". My challenge now is that I'm looking through the code paths to get to the "error", and the stack traces don't match what is getting printed and what files I have. From reading around, NUnit (and others?) seems to suffer from VS cache issues that require deleting the old cache. Tried it; didn't work. But given the stack trace and the code I'm looking at, something's definitely wrong.
- 4/9/2018: Did a deep dive into what DLLs are getting loaded. I started too late again, and still have no lead. It's trying to call NUnit.Engine.Runners.DirectTestRunner.LoadDriver and that just doesn't exist. I cleared caches to ensure it wasn't loading it from a strange location... I don't know where it's getting that function from. The journey continues...
- 4/10/2018: I tried a couple things... and everything has failed. VS UI, command line, extensions, packages, different versions, etc. The only thing left is to upgrade to VS 2017. It's very stupid to me. NUnit with VS support is advertised as "Supported VS 2012 RTM and newer" and yet every test, post, and discussion about it is on VS 2017. I'm only one version older... VS 2015, Update 3. I kept putting off the upgrade due to laziness AND not having a need. The posts all speak of the "new" XML-based project configs making things so much easier(?). It's all very stupid and a waste of time. In fact... it's a perfect example of why tests exist. At work I aim for 100% branch coverage because I'm crazy... but when I get told "your code doesn't work" I can respond back "no, you did something wrong". Then I (or a coworker) go and automate them. When I look at the NUnit code, I see a handful of tests for the adapter. It makes me wonder: did anyone try automating the tests for VS 2012, VS 2013, VS 2015, and VS 2017 at the same time? Or did they just run the tests after telling all other contributors that they were working in VS 2017 and called it a day? But... enough ranting. Let me get VS 2017 so tomorrow I can see if I can actually DO something, and get this process going again. If it still doesn't work, I'm dropping NUnit, sending a message to the devs (and making a bug), and probably moving to xUnit or similar. No need to waste any more time. I needed to start a new attempt because tests didn't run, and from what I can tell, it's the framework/adapter that isn't working. "Famous last words", but if I upgrade to 2017 and it magically works... well, was my assumption wrong? For tomorrow.
- So after a lot of pain, needing to restart my count, and an upgrade to VS 2017... tests actually run.
- So, https://github.com/nunit/docs/wiki/.NET-Core-and-.NET-Standard half tells you what to do. I followed https://docs.microsoft.com/en-us/dotnet/core/testing/unit-testing-with-nunit via CLI and some manual editing and got everything working as I'd like.
- Looking through the docs now, I see the key line I missed: FAQ -> "Why can't my tests target .NET Standard?" -> "...it cannot be .NET Standard, it must target a platform, .NET Core or .NET Framework."
- See, if you look at my log for #5.1, I mention .Net Portable Standard. Diving into why tests weren't working, I have since learned that .Net Standard is the equiv. of an interface (IDotNet), and .Net Core the impl. (DotNet). I didn't know what the difference was or why things weren't working. I am both right and wrong with what I said yesterday... the framework didn't work... but the devs don't know why and are hoping it will one day work. They blame Microsoft for wanting a hard impl. of .Net (.Net Core or .Net Framework). In the end, it was a giant waste of time. I wonder if there is some way to determine at runtime what a .Net test project is running under (Core, Standard, or Framework) and say "hey, you can't run as Standard. Needs to be a different base". It's probably something that someone has done and said "this is too much work... just write the docs" but, as is typical for docs, it's in a bad location (it should really be written here: https://github.com/nunit/docs/wiki/Known-Problems) and people call it a day. Well, now I know, and hopefully if someone looks at my project and has the same issue, they eventually find this log and go "yea, that should be written in a different location". I'm not exactly doing anyone a favor by not contributing a fix... but I honestly don't know if that location is the right place. It's just the one I kept coming across.
- Right as I was about to start, I noticed that the libraries have an attribute compiled into them that states the framework used. So my automated version _could_ be done. Have the test runner get assembly attributes, find the framework one and, if it exists, check if it's a framework that it knows about/can use. If not, error out with a sane message. I'll note this as something to possibly contribute to NUnit at another point in time.
- I can now start working on all the other tests tomorrow.
#1.2. 4/12/2018:
- I started off tests by... reading. RTFM. If I didn't, I would've just done the "classic model" of "Assert.Equal(X, Y)", which is not what NUnit prefers. So I did a bunch of reading...
- ...and redid the existing test.
- I started late, so I didn't get to anything else. Oops. Time to get back into the habit.
#2.2. 4/13/2018:
- Wanted to spend more time on this, but I need to be up early.
- I was trying to figure out what to write a test for next and realized I had defined an ILogEntry interface, used it in ILogRegistry, but... the actual GetBy function (basically a DB/dictionary query) had hardcoded LogEntry instead of ILogEntry. So that needed to change.
#3.2. 4/14/2018:
- So I always start late and commit on the day following the one I log for. I will probably still do that, but as I had to do stuff early yesterday, I need to commit today and "tomorrow" so it doesn't appear I skipped a day.
- Started working on more tests. First up: factories. Gave me a reason to add a mocking library (NSubstitute) and to use it. I need to read the docs a bit. I'm used to gmock/gtest, where I expect something to be called and then it complains when it is called (and wasn't expected) or isn't called (and was expected). NSubstitute seems to work off a "check after the fact" but won't tell me if something got called unexpectedly. I feel like that isn't actually the case, but I haven't gone through the logs (or written something that does that). Yet.
- Started working on LogEntry tests.
#4.2. 4/15/2018:
- More tests time. I actually gave some thought and did TDD for the remaining tests in LogEntryTests. So now things work as expected.
- Next up: LogRegistry tests. I realized I don't have a LogEntry factory and question if I need one. I can see different parsers and printers... but what's a different log entry? Yes, I could unit test it slightly more easily, but at the expense of adding a whole new factory. Given how easy it would be to add (add interface with a create function, pass into registry constructor, use...), I'll skip it and can add it if needed at a later time.
- TDD time again... this is a habit I need to get into in general. At my work, it's always easy to spend a day or two working on a class, then spend a few hours on tests and know all the paths to test. But the point is to write how you expect the program to work, not how you know it already works.
- I got a couple tests in, only to realize my implementation didn't work as I expected. Initial thought was "what am I doing wrong", then I looked at the impl. and realized "I don't want that..." (getting logs by their message or timestamp). See the "todos" in LogRegistryTests.cs
- I also made the mistake of doing too much in a unit test, and it didn't work because of my GetBy impl. anyway
#5.2. 4/16/2018:
- Wrote a lot more LogRegistry tests. Half TDDed, half "was updating something in LogRegistry when I noticed a case I coded to handle, but didn't write a test for".
- The big question for this evening is "return error, or exception?" This has been a struggle recently for me. All the time I did C# (Java, F#, and others), throwing an exception "is what you do" (TM). Then along comes C/C++. C++ has exceptions, but I've become a bit more "I want good, clean, end-to-end opcodes" when doing native development. As such, I've seen how Windows and Linux handle exceptions, and it's not pretty. I've also learned that "-O3 optimized" tends to not mean squat if your code isn't in a specific format. Also, in C++ it's a lot easier to forget to clean up a resource that was mid-process. Think:
#include <fstream>
#include <stdexcept>

void doSomethingCool(std::ifstream& input)
{
    //...
    throw std::runtime_error("oops");
}

void myFunc()
{
    std::ifstream in("log.txt", std::ios_base::in);
    doSomethingCool(in);
}

int main()
{
    try
    {
        myFunc();
    }
    catch (...)
    {
    }
    return 0;
}
- I'd expect that the input stream would clean up. But... I don't know. I'd like to not know, because it would make programming easier (i.e. assuming that stuff gets cleaned up for you). It's not naive, it's taking the human element into account when designing a language.
- Meanwhile, in my preferred native language: C... what's an exception?
- So with plans to move this to C++ at a later point, I had to decide: do I write C# functions without exceptions or not?
- My answer was: it depends. If what I was dealing with had a return status (either from the function itself, or from a different function), I didn't throw an exception. If it didn't, then I threw the exception.
- The way I figure it: if I'm saying "add this value to the log entry" and I want to know if it got added, instead of getting said value and comparing, I'll just look at a return code. So for all failures, return according to return code. Else, I have no idea I did something wrong... so throw the exception.
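That policy (status code where the caller can check the outcome, exception where it can't) might look like the following sketch; the class and methods are hypothetical, not the project's real API:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical attribute store illustrating the return-code-vs-exception policy.
class LogEntryAttributes
{
public:
    // The caller is asking "did it get added?", so report failure by
    // return code instead of throwing.
    bool addAttribute(const std::string& key, const std::string& value)
    {
        if (key.empty())
            return false;  // a failure the caller can observe directly
        attributes_[key] = value;
        return true;
    }

    // No status channel here: asking for a missing key is a mistake
    // the caller can't detect from a return value, so throw.
    const std::string& getAttribute(const std::string& key) const
    {
        auto it = attributes_.find(key);
        if (it == attributes_.end())
            throw std::out_of_range("no attribute: " + key);
        return it->second;
    }

private:
    std::map<std::string, std::string> attributes_;
};
```

The split also ports cleanly: the status-code half translates to C as-is, and only the "programmer error" half leans on exceptions.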
- Bonus: unrelated to ^^, I added a GetByTimestamp function to LogRegistry and changed the return type of GetBy from IDictionary<object, IEnumerable<ILogEntry>>, which allows me to support getting by log message* AND still works in pretty much any way I expect.
- * so, to make a grouped dictionary, I need to go through every single log... working with yield would be better, but it's still going to be very challenging to deal with large logs.
#6.2. 4/17/2018:
- I wanted to get many things done today... and it came at the cost of the unit tests.
- I didn't miss a commit or need to start the countdown again, but the plans will have to wait until tomorrow.
- Instead I added a couple more tests and ensured LogEntry supports Equals, GetHashCode, and ToString.
#7.2. 4/18/2018:
- I typo-ed dates in my logs. I was panicking for a moment as I thought I somehow missed a day of work.
- Spent most of the time trying to see if I could get code coverage. It's something I look at at work, possibly too much. Not because I'm required to, but because I want to make sure that if I do anything, even sneeze and hit a key on my keyboard, I don't cause errors. True story (sneezed, hit a key while typing a string. Compile worked, but runtime failed. Eventually found I was looking for the wrong input file). So, I write a lot of tests and want to make sure everything is covered.
- Long story short: maybe in the future, but not now. VS2017 Community doesn't support code coverage. OpenCover looks nice, but I need to make sure to have the correct command line. NUnit keeps crying that it can't find nunit.framework, and Google points to semi-relevant responses, but none of them cover "NUnit cannot find framework", so I couldn't do anything without spending much too long on it. Instead, I looked and found "dotnet test" will do the work without a hitch... except the only output format it supports is trx... and OpenCover doesn't have a clue what to do with those. And NUnit has had a year-long ticket open for adding support for exporting the test results via XML in the VS adapter. So it's one step forward and one step back.
- Planned to do some additional work, but will push it to tomorrow.
#8.2. 4/19/2018:
- Suddenly, nap. Thus: instead of doing the same task I wanted to do yesterday, I did something else: testing Equals, GetHashCode, and ToString(?) for LogEntry
- I tried something a bit different in that I used a TestCaseSource input... then vastly overcomplicated it (see the code for why. If you don't see a complex test impl., yay?)
- I also learned about "TestCaseData" and wonder if I can/should use it. Right now I say "maybe" but will get tests working first.
- All this to get as much test coverage as possible (sans using a tool that tells me the test coverage), so that as I do more work, I can ensure everything is working AND adding new tests is simply adding a test case rather than a whole writeup. Word of advice, taken from work experience (and it's written about enough in motivational blog posts): just start. In the case of unit tests, if I wrote everything first and said I'd do the tests "eventually", it would never happen.
- The new challenge: VS/NUnit has decided that the 2 "Equals" tests I wrote for LogEntry are weird and did the following: printed a warning/error in the console about "converting" the tests, ran all the tests and said they passed, told me the two tests "did not run", and told me the whole test suite didn't pass. Something's weird, as I managed to get "pass", "did not run", and "fail" for the same tests at the same time.
#9.2. 4/20/2018:
- Ignore that I started working on this late; I had another issue: God of War came out. Between that, Fortnite, and Parkitect (my current game collection), and a little thing called "life", I watched 8 hours fly by in an instant. This project must still go on... but a challenge:
- So I want to cover as many cases as possible. An ideal in programming, to me, is that something "just works" or works as expected. So if I create a LogEntry, I expect that I can stick them into a list. I expect I can "simply" print them out. I don't expect to use some custom formatter, LogEntryList, etc. In .Net, that means supporting ToString, GetHashCode, and Equals.
- C# makes it nice. Override those three functions, and .Net will keep running without a hitch. Don't override them and... things will still work, but you will hit cases where what you expect doesn't work, or where something might not work at all. Output a #6.2 version of the library to a GUI List and you'll find every entry is "LogTracker.Log.LogEntry" instead of a unique log message.
- Sets, Dictionaries, comparisons, ==, and many others all have the same challenge: they use one of those functions.
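- To make the "just works" point concrete, here's a minimal sketch of the three overrides. The `Entry` type here is a made-up two-field stand-in, not the real LogEntry:

```csharp
using System;

var a = new Entry("2018-04-20", "boot");
var b = new Entry("2018-04-20", "boot");
Console.WriteLine(a.Equals(b));  // True: value equality, not reference equality
Console.WriteLine(a);            // "2018-04-20: boot" instead of the type name

// Hypothetical stand-in for LogEntry: two fields and all three overrides.
sealed class Entry
{
    public string Timestamp { get; }
    public string Message { get; }

    public Entry(string timestamp, string message)
    {
        Timestamp = timestamp;
        Message = message;
    }

    public override bool Equals(object obj) =>
        obj is Entry other && Timestamp == other.Timestamp && Message == other.Message;

    // Equal entries must produce equal hashes, or Sets/Dictionaries misbehave.
    public override int GetHashCode() =>
        (Timestamp?.GetHashCode() ?? 0) * 397 ^ (Message?.GetHashCode() ?? 0);

    public override string ToString() => $"{Timestamp}: {Message}";
}
```

Without the `ToString` override, a GUI List falls back to `Object.ToString`, which is exactly the "LogTracker.Log.LogEntry" behavior described above.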
- But there's a different challenge: C++. I will continue to be adamant about this eventually being written in C++. C++ doesn't have the niceties of C#, so when I say "give me a set of LogEntries", it's going to ask me for something to print out the hashes of each LogEntry or hope the default works.
- So I need the values... AND I need them to work. That last part is where all the tests come into play. I want to test as many potential cases as I can without going crazy. (At work, our QA team has been working to do more "orthogonal testing" so it's not writing hundreds of tests).
- There were 3 sets of tests: non-LogEntry, LogEntry, and "complex" LogEntry. The first was accomplished with a simple array. The second required more details... and I wrote "TestObject" to be a nice wrapper for them. But when I got into the complex cases, it got, well, complex.
- I half wrote updates before realizing the test is not going to work as expected and will require a fair amount of rewriting. Instead, I'd rather figure out a good way to write these tests (that blends with NUnit?) so I can do my three sets of tests, but without the test itself going "easy, intermediate, hard" to implement.
- Doesn't help that I might have to do this 3 times: Equals, ToString, and GetHashCode.
- Bonus: I may have figured out an issue with tests yesterday. NUnit (or Visual Studio) use names to define tests. My helper class did a printout of LogEntry, which prints a date-time. Suddenly, all my tests pass. But then I run again a little later, and it's a _new_ set of tests, and some generic case claims it didn't run. BUT WAIT, there's more: I... scrolled down. I never scrolled up to see a couple tests were failing in general. That has been fixed, but the others haven't (and I haven't tested tests with the same name, but different classes, to see if they all show up or only one).
- For tomorrow...
#10.2. 4/21/2018:
- Yay, back to double digits in progress.
- I did a full rewrite of the Equals tests. Now with a builder for the test data
- "A BUILDER?!" says a set of devs... yes, a builder. The builder code is slightly longer than the "still in progress" TestObject (since deleted), but instead of `TestObject(arg, arg, arg, arg, arg, arg, arg, () => fancy work)`, it's closer to an explanation of what the heck it's doing.
- Also, the builder outputs a TestCaseData, which has its own set of functions. So I'm able to utilize it to do additional work for me.
- Long story short: I can do more complex tests with less work. Only hacky element is adding attributes to the LogEntry to test against... it was either that, or make the whole thing Lazy and that would just be ugly.
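- For the curious, here's the rough shape of the fluent-builder idea. Every name below is made up for illustration; the real builder emits NUnit's TestCaseData instead of this stub:

```csharp
using System;

// Hypothetical fluent builder in the spirit of the one described: instead of
// TestObject(arg, arg, arg, ...), each step says what it's doing.
var testCase = new EqualsCaseBuilder()
    .WithName("timestamps differ")
    .Left(e => e.Timestamp = "10:00")
    .Right(e => e.Timestamp = "10:01")
    .AreEqual(false)
    .Build();

Console.WriteLine($"{testCase.Name}: expect {testCase.ExpectedEqual}");

class FakeEntry
{
    public string Timestamp = "";
    public string Message = "";
}

class EqualsCase
{
    public string Name = "";
    public FakeEntry Left = new FakeEntry();
    public FakeEntry Right = new FakeEntry();
    public bool ExpectedEqual;
}

class EqualsCaseBuilder
{
    private readonly EqualsCase _case = new EqualsCase();

    public EqualsCaseBuilder WithName(string name) { _case.Name = name; return this; }
    public EqualsCaseBuilder Left(Action<FakeEntry> setup) { setup(_case.Left); return this; }
    public EqualsCaseBuilder Right(Action<FakeEntry> setup) { setup(_case.Right); return this; }
    public EqualsCaseBuilder AreEqual(bool expected) { _case.ExpectedEqual = expected; return this; }
    public EqualsCase Build() => _case;
}
```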
- Minor refactor so test helpers are located elsewhere
- Now... fix the Equals functions so the tests pass
#11.2. 4/22/2018:
- A quick test laid to rest that "same name tests won't be shown as separate tests". They show up with the same name, but indicate they have different sources.
- Attempted to setup GetHashCode tests... but a couple tests are failing...
- Attempt #2... this time, return the actual hash code. Somehow, the first attempt had 2 failed tests. This one has 8 failed tests. I don't know if testing hash codes is a thing. I guess it's worth testing, but I also worry about how Dictionary's use of GetHashCode works. If it's implemented somewhere, I haven't found it. I worry it's producing some random results for me (attribute order?) but regardless, this isn't working and I can't figure out what went wrong. So I'm going to trash it.
- Realized I test two attributes that are tested a second time later. So changed the Equals operation to get the values directly from the attributes, and to test the other values.
- I opted not to do a ToString, because the Printers will be doing the work, not the ToString.
- Now to finally get back to what I really wanted to work on: TDD. I want to write all the remaining tests, without running them and without fixing the failures. Then test and fix failures...
- ...and I got 1.5 of 5 tests in before I need to call it quits. Sigh.
#12.2. 4/23/2018:
- Finished GetBy tests for LogRegistry. Technically still have to make tests for the actual implementation, but that's for another time.
- Nevermind. I wrote the remaining tests.
- Next up: ConsolePrinter. I want to test this over IOPrinter because it will actually be used. If something gets changed in IOPrinter, suddenly there will be X tests breaking, where X is every implementor of IOPrinter.
- Actually, writing that, I probably should do it for IOPrinter... Or at least have some tests "for" IOPrinter derived classes, but use the implementation (IOTester.Test(impl, outputGrabberDelegate)). This way I can run the tests on the impl, which will actually get used directly, and NOT have to write 10 tests the same way.
- First tests are... confidence tests. What if there are no logs, and just make sure I can grab console output.
- Both work, but the console output test is... not pleasant. Can you imagine every test needing to check the NewLine? Probably be best to specify a new line...
- But wait, there's more! I was under the impression that NUnit ran in parallel (at work, we started non-parallel, then wanted tests to run faster, switched to parallel, and watched many break).
- As such, I wrote all the tests (at least I think so) with the expectation that they would run in parallel. Since the ConsolePrinter test would be changing the global Console.Out param, I couldn't have that run in parallel.
- I marked the test as NonParallelizable. Out of curiosity, I looked into it and... apparently NUnit doesn't run in parallel by default. Probably a good default, but it now meant that I wanted everything to run in parallel (unless marked).
- You can apparently mark assemblies, classes/fixtures, and methods as Parallelizable. I didn't want to go nuts right now, so I marked only fixtures (at an assembly level) as parallelizable.
- But there was no place to put the attribute. Generating the test project apparently meant it would generate the assembly info at build time... but left me without a place to put assembly level attributes. Luckily, diff tools to the rescue.
- There are attributes that tell the project not to generate the assembly info, and that left me with the ability to write my own.
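- For reference, the two pieces that make this work (assuming NUnit 3's attribute names and an SDK-style project; check your versions):

```csharp
// In the test .csproj, stop the SDK from generating AssemblyInfo.cs:
//   <PropertyGroup>
//     <GenerateAssemblyInfo>false</GenerateAssemblyInfo>
//   </PropertyGroup>
//
// Then a hand-written AssemblyInfo.cs can hold assembly-level attributes:
using NUnit.Framework;

// Fixtures run in parallel with each other; tests within a fixture stay serial.
[assembly: Parallelizable(ParallelScope.Fixtures)]
```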
- All is good in the world. Next up: writing the rest of the ConsolePrinter tests, and doing the work mentioned above.
#13.2. 4/24/2018:
- Specify our own newline, so changes in platform are easier to handle
- Abstracted IOPrinter tests
- Started working on IOPrinter tests... this is going to be either annoying, or complex.
- ...and would you look at the time. *cough* Yea, it's late (again) and I want to be able to focus on this rather than be half-asleep while trying to ensure print outs occur as expected
#14.2. 4/25/2018:
- Started off trying to finish the IOPrinter tests. Made good progress, if not completed it.
- And... not feeling good. Occurred just after I ate some food... that can't be good. I'm gonna have to call it a night early (unintentionally this time)
- One more, different sources
#15.2. 4/26/2018:
- Late day... probably should've done something before playing games.
- So right before I finished yesterday, I saw I had opened a tab about GetHashCode and it, like StackOverflow and others, basically said: if Equals is true, GetHashCode should too (hey, that rhymes).
- I added a GetHashCode test to the equals tests. If it's not equal, it won't test hash code. If equals, it will test hash code.
- After a short thought, I will add the GetHashCode tests, but they're simply going to be comparing the values. The && I was doing in the equals tests would mean that "equals = T && hash code = F" would tell me the test failed, but I wouldn't know where. Likewise, "equals = F && hash code = T" would mean we can have a collision between LogEntrys, but would never know because the && made it false. So a separate test would be better. Will do this another time.
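- A tiny demo of why the && hides the failure. `Broken` is a deliberately bad hypothetical type whose Equals says true while GetHashCode differs:

```csharp
using System;

// Hypothetical Broken type: Equals compares values, but GetHashCode hands out
// a fresh number per instance -- violating the Equals/GetHashCode contract.
var a = new Broken(1);
var b = new Broken(1);

// One &&-combined check: it fails, but which half broke?
bool combined = a.Equals(b) && a.GetHashCode() == b.GetHashCode();
Console.WriteLine(combined); // False

// Separate checks localize the culprit:
Console.WriteLine(a.Equals(b));                        // True
Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // False <- the bug

class Broken
{
    private static int _counter;
    private readonly int _value;
    private readonly int _hash = _counter++; // unique per instance: the contract breaker

    public Broken(int value) { _value = value; }
    public override bool Equals(object o) => o is Broken other && other._value == _value;
    public override int GetHashCode() => _hash;
}
```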
- I would've liked to start implementing the tests for ParserUtil (at least the InternalsVisibleTo is working), but as stated: it's late. And stated at another time, don't kill yourself over code. I haven't been too good at that lately. Make myself too busy.
#16.2. 4/27/2018:
- I'm still standing. Somehow. I could only spend a portion of time on this, so I started implementing test cases for ParserUtil.
- I realized partway through that I could use TestCase attributes (instead of plain Test) to vastly increase the number of tests without making the test file giant.
- (You don't realize the difficulty it was to write ^^. I need to go to bed. Late night D&D didn't help with being tired)
- For some reason, doing invalid tests resulted in CS0051 ("inconsistent accessibility") coming from C#. I tried changing the values, but C# continued to complain. When I didn't use TestCase, I didn't get errors... but I also didn't try compiling, so it may have happened there.
- Regardless, this kind of work should be easy. Just need to _not_ be asleep.
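- To show why TestCase attributes shrink the file so much, here's a toy re-implementation of the idea: one method body, one attribute line per case. `MyTestCase` and the runner are made up; NUnit's real attribute does far more (named cases, expected returns, etc.):

```csharp
using System;
using System.Reflection;

// Reflection-driven mini runner: find every [MyTestCase] and run the method
// once per attribute, comparing the result to the last attribute argument.
int passed = 0, total = 0;
foreach (var method in typeof(BoolCastCases).GetMethods(BindingFlags.Public | BindingFlags.Static))
{
    foreach (var tc in method.GetCustomAttributes<MyTestCaseAttribute>())
    {
        total++;
        var result = method.Invoke(null, new object[] { tc.Args[0] });
        if (Equals(result, tc.Args[1])) passed++;
    }
}
Console.WriteLine($"{passed}/{total} cases passed");

[AttributeUsage(AttributeTargets.Method, AllowMultiple = true)]
class MyTestCaseAttribute : Attribute
{
    public object[] Args { get; }
    public MyTestCaseAttribute(params object[] args) { Args = args; }
}

static class BoolCastCases
{
    [MyTestCase("true", true)]
    [MyTestCase("FALSE", false)]      // bool.TryParse is case-insensitive
    [MyTestCase("not a bool", false)] // unparseable input falls back to false
    public static bool CastToBool(string raw) =>
        bool.TryParse(raw, out var value) && value;
}
```

Adding a new case is one attribute line instead of a whole new test method.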
#17.2. 4/28/2018:
- Fixed the CS0051 issue with a test
- Added the string and int tests for casts.
- Was preparing to work on Key-Value tests, but realized it would be a bit more work than I have time for (the joys of working on this, late, again). I will do this tomorrow and start on testing the parser itself
#18.2. 4/29/2018:
- Right after writing yesterday's note about Key-Value tests, I realized "this is basically the same as the LogEntry tests, but with different expected data. Instead of writing a whole new builder, I should just extend the other one."
- This was followed closely with "Wait, even better. Common logic in a parent class, then simplified top-classes". Then I was done for the night.
- I went about to do just that... and basically wrote a whole new builder. The abstractions are useful, as it will make it easy to create more advanced tests (I'm imagining XML parsing tests, and maybe GUI work too? We'll see when the time comes). The basic need is for tests that produce non-const data AND take a variable input.
- Simple input or complex setup means needing individual test cases. But complex input can't be compiled in, so it would need individual tests, and then it's a lot of copy-paste and/or a helper function to do the work. The TestCase attributes simplify testing.
- Casting Key-Value tests needed some additional helpers to make them work better. Also, adding an extension class made me very happy... because it meant that I didn't need some ever-expanding base class AND it meant I could now do something fancy (see TestDataBuilderExtensions.For and try to explain how to do that in a base class. C# syntax won't allow a base class to do some action and return a child class. But an extension function can)
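- The extension trick in isolation, with made-up names (not the real builder classes): a base-class method can only return the base type, which breaks a fluent chain, but a generic extension constrained to the base keeps the concrete type:

```csharp
using System;

// The chain stays a KeyValueCaseBuilder after For(), so derived-only methods
// like WithSeparator still work -- a base-class For() would have downcast it.
var builder = new KeyValueCaseBuilder()
    .For("userData")
    .WithSeparator(':');

Console.WriteLine(builder.Describe());

class TestDataBuilder
{
    public string Field { get; set; } = "";
}

class KeyValueCaseBuilder : TestDataBuilder
{
    public char Separator { get; private set; } = '=';
    public KeyValueCaseBuilder WithSeparator(char sep) { Separator = sep; return this; }
    public string Describe() => $"{Field} split on '{Separator}'";
}

static class TestDataBuilderExtensions
{
    // Generic over the concrete builder: returns T, not TestDataBuilder.
    public static T For<T>(this T builder, string field) where T : TestDataBuilder
    {
        builder.Field = field;
        return builder;
    }
}
```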
- Now... the tests I came up with. Most are obvious, the last couple are going to be a pain. I put them in because I can easily see logs coming in that go "uuid=XXX-XXX-XXX; name=Smith; userData={\"name\":\"John Smith\", \"userNotes\":[\"I'm debating on if the data should be A;B;C or A,B,C. I need the result to be key=value and...
- ...that isn't going to be pretty with a naive split right now. I'd rather the parsers NOT have to handle the complex cases individually. Instead, do it in the cross-parser utility classes.
- In addition, in keeping with "TDD" (notice the quotes, since I haven't been doing too well with it), I can plan these cases now and worry about fixing the implementation later.
- It also sets up later implementations/tests such as "implicit parsing" (I don't have a better name). At some point in the future, I'd like to be able to get log data and, if the log data didn't have a specific type or there was a type mismatch, have the system determine the data type ahead of time.
- A parser (utility) that can split data without mixing up quoted data means that I can take the sample from above and do an implicit parse of the userData to JSON.
- I'm getting ahead of myself. I've spent a long time on making UT. I hope it all pays off in the end, so I can build and know all the old code still works.
- Also, these new tests broke the 100 tests count.
- I didn't get to the parser testing or ParsePath tests, but I hope to get to those within the coming days (one of those days will be catching up on movies, the other will be watching Infinity War. So I expect both of those to be slightly smaller updates in an effort to get enough sleep)
#19.2. 4/30/2018:
- So when I said "smaller updates", it ended up being very small.
- I started looking into the string split. This seems to be a perpetual problem with strings. Or at least I always seem to encounter "split a string, but if there's a quote inside the string, don't split on that"
- This might be something I spend one day on, then leave it for another time.
- In the meantime, I fixed the null case and added an additional separator key (gut feeling is this should be a config value)
- I also added ever increasingly complex cast key-value tests...
- Note: I looked at https://stackoverflow.com/questions/554013/regular-expression-to-split-on-spaces-unless-in-quotes and it looks good, now I just need to use different seperators
- Note 2: These are the test strings that are failing because of quotes (without the C# escaping):
1. "key=value"="test=pain1"
2. "key=value;oh=boy"="test=pain1"
3. "key=value;oh=boy"="test=pain1";yep="this=pain2"
4. "let's \"use=more\" quotes"="oh boy"
#20.2. 5/1/2018:
- Oh snap, we passed our earlier "best number of build days"
- I started working on the splitting work. General idea is to go through the string and mark the contents as either quoted or not. Then iterate through the non-quoted parts and group everything by the delimiters that split up different key-value pairs. Lastly, go through each of those groups to split it by separator and create the key-value pairs.
- The hope is that reduces and simplifies the work needed without doing a bunch of crazy regex or writing one massive function to do the work (always a bad idea)
- Also, it occurred to me: this is all overkill. I want to finish it, as the above concept simplifies it to a point where I can focus on individual pieces of logic without needing to know the whole (though I've done the problem in reverse to get to the different "how" steps).
- If this doesn't work, then I'm back at my starting point. I disable the old logic, make a note to work on it (or maybe at some point add stat gathering and see how many logs actually have quoted strings), and move on. For now: one attempt.
#21.2. 5/2/2018:
- So I didn't mention it yesterday: one other reason I want to get this working is I always seem to hit this problem, but I never get a real working solution. If I can get something working, I can say I actually have a solution and can reference it whenever I hit this problem again.
- So a goal I had with CastKeyValueSplit function is to use data once. It was kind of a challenge.
- This is important to note, as I thought about how to do the next step (I got taking an individual group and turning it into a key-value pair working. Now I needed to take a single list of strings and turn them into different groups, essentially the individual sets that make up a single key-value pair).
- ...it involved multiple loops (think: iterate; skip(x) then iterate; skip(x+y) then iterate). But I could only imagine iterating through a process, stopping, then doing it again but ignoring the results until we get to a certain point. Also, I'd need to count elements and a number of other variable creation steps to know where I was.
- So, how do I do it in one loop? Enumerator (instead of Enumerable). In order to make this work, I need to be able to return leftover data outside of the processing loop (GroupSplitExtractGroup) which would return an enumerable itself.
- C# doesn't allow yielding inside of yielding, so the separate function was needed anyway. Wait, what's this error? I can't return anything in a yield function? Well this is a problem...
- ...one that can be fixed with a hack: reference-type variables, like arrays, persist across the stack traversal. I've done this for years. It is a hack and a feature of programming languages. If this were F#, there are helper functions for enumerations that let me pass state. That would do the same thing.
- So now I define an array, save specific data in it, and move on. I did test it and it worked. I'm very happy.
- This may make that function non-thread safe... but only if multi-threaded/async work is done with the resulting iteration. I'm not sure and have no intention of testing it... because I have no intention of making CastField or CastKeyValueSplit async/multi-threaded.
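- The hack in miniature: an iterator (yield) method can't use `return value`, `out`, or `ref` to hand back leftover data, but a caller-owned array can smuggle it out. Names below are illustrative, not the real GroupSplitExtractGroup:

```csharp
using System;
using System.Collections.Generic;

// The caller owns the one-element array; the iterator fills it as it runs.
var leftover = new List<string>[1];
foreach (var item in Splitter.TakeUntilQuote(new[] { "a", "b", "\"", "c", "d" }, leftover))
    Console.WriteLine(item);
Console.WriteLine("leftover: " + string.Join(",", leftover[0]));

static class Splitter
{
    public static IEnumerable<string> TakeUntilQuote(string[] items, List<string>[] leftover)
    {
        leftover[0] = new List<string>();
        bool pastQuote = false;
        foreach (var item in items)
        {
            if (item == "\"") { pastQuote = true; continue; }
            if (pastQuote) leftover[0].Add(item); // stash it; `return x` is illegal here
            else yield return item;
        }
    }
}
```

The thread-safety caveat from above applies here too: because of deferred execution, the array is only filled once the enumeration has actually run.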
- With that done, and working, all that's left is the QuoteGroup function. I talked with a friend who is into this kind of work and they simply stated "state machine". We'll see. It sounds nice, though I don't want to go that crazy if I can avoid it.
#22.2. 5/3/2018:
- I didn't actually get code done today. Instead I was trying to figure out what is needed to split the quoted elements from the string.
- A processing state or state machine would be useful. I can see the states being: Normal, Quoted, and EscapeCheck.
- The last one counts escape chars to ensure it's not a quote. \" is 2 chars. So if I encounter " but \ is before it, I can ignore it.
- States then are Normal -> EscapeCheck -> Quoted -> EscapeCheck -> Normal
- Tomorrow will be a short day as well.
#23.2. 5/4/2018:
- Before I ended all activity for yesterday, I thought "do I really need the EscapeCheck state?"
- Fast forward to today: I talked to the same friend as before, and they indicated all they needed was a single char look-ahead.
- I thought for a short bit and realized it would work. Though I can make life a little easier by looking behind instead of ahead.
- Looking ahead involves range checks, a complex internal state (if the char ahead of current is " and current is \, then add current AND next, then move 2 ahead. If not, then add current and continue). What's to stop trying to optimize by working two chars at a time?
- Looking behind simply means "check if it's \. If so, add char. Else, return"
- You'll notice the code is under 30 lines (excluding comments), including newline braces. There are a couple of additional state elements to watch. I didn't want to return empty groups because the string was "\"", and some other pairs I checked.
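- A minimal version of the look-behind idea (not the real QuoteGroup code): walk the string once, toggling "inside quotes" on any quote whose previous char isn't a backslash, and only split on delimiters while outside quotes:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

foreach (var part in QuoteSplit.Split("key=\"a;b\";next=2", ';'))
    Console.WriteLine(part); // key="a;b"  then  next=2

static class QuoteSplit
{
    public static IEnumerable<string> Split(string input, char delimiter)
    {
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            // The look-behind: an escaped quote (\") doesn't toggle the state.
            if (c == '"' && (i == 0 || input[i - 1] != '\\'))
                inQuotes = !inQuotes;
            if (c == delimiter && !inQuotes)
            {
                yield return current.ToString();
                current.Clear();
            }
            else
                current.Append(c);
        }
        if (current.Length > 0)
            yield return current.ToString();
    }
}
```

Compare with the look-ahead version described above: no range checks, no "consume two chars" special case, just one check against the previous character.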
- While testing, all but one test passed. After looking it over, I realized I mis-escaped the "correct" result. So I had to fix the result for the test, as the code worked just fine.
- Yay!
#24.2. 5/5/2018:
- I've been going through gui.cs (https://github.com/migueldeicaza/gui.cs) and it's pretty cool. If it wasn't for the fact that this project is _not_ going to remain in C# forever, I'd probably want to use it and see if I can extend it as a wrapper around either WPF or WinForms. It would be awesome to make a window, add a button, add a menu, etc. and... if it's running in a command line (or a flag indicates it's in command line), then run in the window like a Curses app. Else, open a GUI and run there.
- Maybe I can port it to C++... but I'm pretty sure that's what Curses and others are for. But a single UI framework for CLI and GUI would be quite nice.
- And... just found out the first version of it came out in 2007. o_o
- Slight more thought: it could be useful to basically clone the API (they aren't Oracle, and the license is MIT) and have the "driver" be able to switch between console and GUI.
- For C#, the Console driver will use gui.cs, while the GUI driver will use something else (or I'll have to write it).
- The upside is this abstraction lets me rewrite/reproduce it in C++. This way I can keep writing the code, but not need to worry about the backends/drivers. C#? gui.cs and other. C++? Qt, wxWidgets, other. Heck (I both like and abhor this idea), in theory a small webserver can be started by a driver and "suddenly" the whole thing runs in a webpage. And by "run" I mean "the UI", as nobody in their right mind would use JS unless you're making a Todo app. Jokes aside, I'll note it, but it's below every other task.
- One thing I got out of looking at the code: some of the newer syntax is useful. Instead of `public int Value { get { return _innerField.Value; } }` you can now do `public int Value { get => _innerField.Value; }` and if `_innerField` only exists because you need to set a default? `public int Value { get; } = 1337;` I've not done C# in a bit.
- I updated some code with those new syntax nuggets.
- My intention was to work on the ParsePath function, but I couldn't figure out how to properly test it.
- NUnit doesn't seem to have something akin to "Does.Contain(<item>).AtIndex(<index>)" or "Contains.Item(<item>).AtIndex(<index>).And.Has.Property(<property name>).EqualTo(<value>)" or "Has.Member(<item>).With.Property(<property name>).EqualTo(<value>).After.IndexOf(<index>)".
- That syntax could get quite ugly... but the alternative is to ensure "ParserPathElement" has an equals function and make a whole array of items then test the results.
- Challenge is the solutions I can think of has one or more of the following traits:
- Requires a bunch of extensions that may or may not require NUnit to be updated.
- Requires a bunch of work in the library/test set to enable writing a test.
- Makes the test too rigid. It was a tip by a senior dev of mine: "If you write a test too rigid, every time you change the code and get the same outcome but it returns a different result, the test will break".
- The first would make me want to do the second instead. The second would possibly be a ton of boiler plate (or a IComparer) that would remind me greatly of the first attempt to write the equals tests for LogEntry.
- But the biggest worry is the 3rd case. The 2nd would imply the 3rd. The goal of the ParserPathElement is to provide a "path" to getting a piece of data out of what is generally a complex piece of data (logs aren't just a string).
- So if, for now, I say "Type, FieldType, StringValue, and IndexValue" need to be checked, then I realize that I can add a SubType for something, I can end up with tests still passing. But what if the StringValue is different based off the SubType? Like it's all lowercase or something?
- Now it becomes "all your values are wrong". Ok, fix. "Wait, I got a bug that & is not a char in Chinese. I will add a test for that". "Wait, I got the wrong Type now. Let me fix that" The SubType changes without me knowing. "Wait, now all the values are wrong again"
- The correct course of action is to add SubType checks. Guess what, now all the tests fail because SubType isn't set. Ok, set all of them. But am I now just mimicking the ParsePath function to get all the path fields? As in, writing a test that will always pass as the test comparison is written to the parser result?
- One more: what if I do some updates to ParsePath and end up never setting IndexValue anymore because it's never used in code. All tests with IndexValue now fail. Let me remove those tests...
- Back to reality: I'm now writing tests to always pass. It's not how it should work. Aside from some basic tests, I don't care about most of the contents. I just care about the last element.
- ...am I now writing a LINQ expression to get the last element or skip values (which doesn't exist as a modifier/constraint in NUnit either)? I just want an index... I want a "Last/First" option. `Is.First.Item(<item>)` or something.
- I feel like I could ramble on about this more... obviously, this isn't as easy as it could seem. It could also be that I'm overthinking it and just doing the LINQ expression/getting the value from the array is the easiest.
#25.2. 5/6/2018:
- Today ended up being surprisingly busy... so I didn't get much done.
- As my day was short, I wanted to do _something_ and my "rant" yesterday left a bad taste in the mouth. Basically: I'm unsure and possibly dissatisfied with how one is supposed to test an array of structs/classes without manually testing values.
- Two great pieces of advice I was given when I first started coding: 1) don't be afraid to delete your code. 2) Sometimes, brute force is the most efficient way.
- #2 is more true than people think. How much time has been spent on optimizing a function, method, or process that... will only get used once? Or infrequently? (note: there is an inverse of this too. Not optimizing something because it's not used frequently. Often has to do with complexity and time spent on it. Like manually running tests...)
- In this case, the simplest way to do this is... to just manually interact with it. No fancy syntax. No updating of NUnit. No equality functions and tests to check them. Just write a bunch of tests and figure out optimizations once you've had to write something 40 times.
- In this process, I realized that if someone wanted to manually specify the field type for a string, it wasn't possible. I also realized not everyone would get the type names easily (a Java dev will say boolean, C#/C++ would say bool, C would say BOOL), so I added some aliases and luckily already had a ToLower written in.
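- The alias idea, sketched: normalize with ToLower, then map the names each language background reaches for onto one canonical type. The table and `FieldTypes.Parse` are illustrative, not the real ParserUtil code:

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(FieldTypes.Parse("Boolean")); // the Java habit -> Bool
Console.WriteLine(FieldTypes.Parse("bool"));    // the C#/C++ habit -> Bool
Console.WriteLine(FieldTypes.Parse("str"));     // -> String

static class FieldTypes
{
    private static readonly Dictionary<string, string> Aliases = new Dictionary<string, string>
    {
        ["bool"] = "Bool", ["boolean"] = "Bool",
        ["int"] = "Int", ["integer"] = "Int",
        ["str"] = "String", ["string"] = "String",
    };

    // ToLower makes "BOOL", "Bool", and "bool" all hit the same entry.
    public static string Parse(string name) =>
        Aliases.TryGetValue(name.ToLower(), out var type) ? type : "Unknown";
}
```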
#26.2. 5/7/2018:
- Started working on individual field tests for ParsePath.
- Not too much to say today. Only a few more tests to write for ParsePath, then the last software element: the XML parser (which will probably be the largest test set of all of them)
- If there's one thing I realized: you can create this "path" but none of the values let you specify a name. I have a "named field" type, but it's under the expectation that the root log has a set of attributes that can be accessed. In reality, it may not. Indexes don't work unless order is guaranteed.
- So <root at="hello">Message</root> and {"at": "hello", "msg": "Message"} can both be parsed, but <root at="hello"><Primary>Message</Primary><Second>Other</Second></root> and {"at": "hello", "logs": {"Second": "Other", "Primary": "Message"}} would not be parsable.
- Now, if I had a named field, I could say "/!Primary" and "/!logs/!Primary" and they would be parsable. Unlike the other tests written for the parsers, where I checked the code for what I could do (parsers are "fun" in that they aren't obvious)... the named field tests I intend to write before I implement.
- Last tests after that is to actually make some paths. I may try to take advantage of NUnit's randomization so it can make random paths to test against.
#27.2. 5/8/2018:
- Goal: finish ParseUtil.ParsePath
- 1. I need to be able to parse named fields... but I wrote the test first, and it indeed failed. I implemented the code (easy...) and it passed.
- 2. Parse an actual path... this was more tedious than tough.
- I have the unit tests generate a seed, then I use that in a Random number generator. This way the results are always reproducible, yet easy to be different.
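- The seed trick in isolation (a sketch, not the actual test code): generate a seed, print it with the test output, and build the Random from it, so a failing run's seed can be pasted back in to replay the exact same "random" paths:

```csharp
using System;

int seed = Environment.TickCount; // hard-code a failing run's printed seed to replay it
Console.WriteLine($"seed = {seed}");

var generator = new Random(seed);
var replay = new Random(seed);

// Same seed, same sequence: every generated value matches.
for (int i = 0; i < 5; i++)
    Console.WriteLine(generator.Next(100) == replay.Next(100)); // True
```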
- Next I need a path... and the data I plan to test against. It's easier to just look at the code than for me to explain.
- I also needed to test type... but only sometimes.
- Path creation requires variable lookup (for simplicity's sake), but I want to ensure I don't miss a value. So I wrote a test to make sure my data sets are good. :D
- Now for the actual test: generally easy, especially since the "test" portion is basically a Builder type. I could do it in sections.
- And... it works! Yay. Means I get to move onto something else, and simple debug output means I get to double check it's actually making a path. (it was)
- Tomorrow starts the fun task of an XML parser. This will be interesting because of the complexity of writing tests for a parser AND the fact that the ParsePath util has just enabled me to create some pretty complex paths.
- I know that the XML parser pretty much doesn't let you use a filter field for anything except the last element. It was written like "now that you have the field, you want to double check its type". The reality is, between the possibility of a mid-path filter and the addition of the named field, you could end up with "filter to only XML elements, now give me 'UserData'".
- It makes me think "I need to keep a collection of child elements" instead of what I do now.
- But that's for tomorrow-me. Today-me is done.
#28.2. 5/9/2018:
- Today was namely looking at the XMLLogParser and figuring out what can be tested, and what should be tested.
- Tomorrow I will start working on SetConfig, which will be a little weird because it doesn't "do" anything. It affects how Parse works, but I don't want to go testing Parse to test SetConfig.
#29.2. 5/10/2018:
- VS 2017 15.7.1 came out. After updating, the Test Explorer showed tests as a hierarchy instead of a list. It made me realize that the ConsolePrinterTests and IOPrinterTests were in the wrong namespace. Fixed.
- Otherwise, I spent nearly all night working on a different programming project. Can't release it for reasons, but I can say if you ever need formalized JSON, JSON-schema is pretty good...
- It's now too late to really do anything except look at the XML parser and go "this needs some work to make it usable for testing"
#30.2. 5/11/2018:
- Realized one change to make the XML Parser easier to test: don't require a file to read.
- I like extension classes. I used them a bunch in C# before, then mainly went over to C++... and now working in C# again, I wish more languages had them. Core functionality in the class, then all the nice "I wish it had function X" in the extension class.
- As such, file parsing is moved to the extension class.
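- The testability win, sketched with stand-in types (not the real XMLLogParser API): the core parser only knows about streams, so tests can feed in-memory data, while the file convenience lives in an extension class:

```csharp
using System;
using System.IO;

// Tests hand the parser a MemoryStream; no files on disk needed.
var parser = new StreamParser();
using var stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes("<log/>"));
Console.WriteLine(parser.Parse(stream)); // parsed: <log/>

class StreamParser
{
    // Core functionality: parse whatever stream you're given.
    public string Parse(Stream input)
    {
        using var reader = new StreamReader(input);
        return "parsed: " + reader.ReadToEnd();
    }
}

static class StreamParserExtensions
{
    // The "I wish it had function X" part: real callers keep the file overload.
    public static string ParseFile(this StreamParser parser, string path)
    {
        using var stream = File.OpenRead(path);
        return parser.Parse(stream);
    }
}
```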
- As for the tests... it's too late for me to be working on this stuff, but I was at least able to squeeze in a basic test.
- It was a catch-22 for me: in order to test the log parsing, I need to ensure SetConfig works. In order to test SetConfig, I need to parse data.
- One solution is to provide "get" functions for config data out of the parser. But I like the concept of a black box. I don't want to know what it's doing; I just want to provide it parameters and tell it to go.
- So providing a config and a registry and then saying "here's a log. Go" is nice.
- So basic test is just an ideal, minimal case of what to expect. I will try to get more done tomorrow. I just need to not do it late...
#31.2. 5/12/2018:
- Timing: still not working for me...
- Finally got the SetConfig tests. They're pseudo-parsing tests, but only to ensure that the config values are getting used.
- If there's one thing I thought I needed, it was a multi-log test. I thought this because in my mind I thought "some logs could be "bad" and we want to skip them. So I need a test where we parse 1 and 2 logs. A test where a bad config results in no logs. And a test where a bad log results in missing logs. I need to ensure bad config and bad log aren't confused..."
- Reality is: right now, implementation would simply stop parsing/ignore attribute on a bad log. So there is no way to confuse them.
- I added a "todo" in the source for adding a config or some bit of code for handling bad logs without stopping parsing.
- But a weird thought occurred: what if, instead of ignoring the log message (which could become confusing), we created an invalid log AND only populated the values we knew? That way, we can have a config that stops parsing all together, skips any bad logs, AND still parses the bad logs but marks them as bad.
- I like configs. They allow customization (wow, can I not spell... I should spell check this whole document) without adding a ton of complexity. "Um, it adds complexity"? Then you're doing it wrong. "We accept one type of value as a key, then look for it in a dictionary" -> "we accept a customizable path to find the value, then we wrote a function that does the search". Suddenly, `dict[key]` -> `getValue(dict, key)`. You have added customization without complexity.
- Of course, there are a ton of arguments that disprove my example. But I still think there's more value in adding the customization than not.
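- The `dict[key]` -> `getValue(dict, key)` idea above, as a rough sketch (shape assumed, not the project's actual code): the fixed key becomes a configurable dotted path, resolved by one small helper.

```csharp
using System;
using System.Collections.Generic;

public static class ConfigLookup
{
    // Before: a fixed lookup, dict[key].
    // After: a configurable dotted path like "configs.attr", resolved by
    // walking nested dictionaries. Returns null when the path isn't found.
    public static object GetValue(IDictionary<string, object> dict, string path)
    {
        object current = dict;
        foreach (var part in path.Split('.'))
        {
            if (current is IDictionary<string, object> d && d.TryGetValue(part, out var next))
            {
                current = next;
            }
            else
            {
                return null; // path not found
            }
        }
        return current;
    }
}
```

- Callers don't get any more complex: `ConfigLookup.GetValue(data, "configs.attr")` reads about the same as `data["attr"]`, but the path now comes from the config.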
- Regardless, the SetConfig tests are done, and the Parse tests (including sub-tests) are next.
#32.2. 5/13/2018:
- I changed the couple tests that didn't need the double-log elements and replaced them with single-log elements.
- I also wanted to fill out the SetConfigAdditionalAttribute before moving onto Parse.
- Ahem, it took me the whole time just doing that. So no Parse tests yet. (the following is a near play-by-play. Summary is that I wish C# had F# levels of type inference and design decisions can sometimes make things like tests a pain)
- I made a conscious decision to not just have a dictionary for the LogConfig. I would have to write code to normalize the names (probably just ToLower), the config files wouldn't be meta ({ "attr" : "value" } -> { "configs" : { "attr" : "value" } }), it would be easier to spot log issues at a glance (both with a debugger, and if you looked at the meta log, there would probably be more attributes than "configs").
- End result, I was trying to make this nicer to use. If I changed it, for the sake of a test, it would be more of a pain to use while the test that gets written once gets a few lines of code removed. Not worth the hassle.
- This backfired on me since in order to do the test, I needed to set config object properties. The resulting attribute values and the property names are not 1-1. The attributes are to be generic between numerous uses, while the config is supposed to describe what it is. So "SourceFile" is "SourceFilePath" but "SequenceNumber" is "LogSequencePath". There is a method to the madness of naming, but it could probably use some cleanup.
- So now I need to set these values. There's a good description I've seen that looks similar: "novice: brute force. professional: elegant complex solution. senior professional/expert: brute force". The novice reads materials to get to the professional state, and if they look at .Net, Java, JS, and others, they will see they can do reflection or class prototype manipulation. Reality is... the brute force approach is 5 lines, while the fancy and elegant solution is 150 lines spread across 7 classes and 2 utilities.
- What I really desire is the ability to add one attribute to the LogConfig, specify what attribute it gets set to, and never have to touch a test, log parser, etc.
- For now, I'm not there. Cleanup of names and maybe it will be possible with a 5-10 line reflection helper function...
- Back to the point, I needed to get something to set the config and be associated with the attribute. A tuple will do... but I'll need an Action<...> to set the config or to do reflection (with a dictionary of attribute -> config name). I don't want to do reflection, so Action it is.
- I made LogConfig a struct... so I need to return it. Now it's a Func<LogConfig, LogConfig>.
- This is somewhat generic... what if I want to do a pairing or something ("attribute, value" + Func<...>) or more likely: blackbox. I don't want the test to know what is being done, I just want it to say "log, you have attribute X. Set config. Test, did you find X? Great"
- ...so I need to be able to set a name. Now it's `Func<LogConfig, string, LogConfig>`. Ok... so now I have `Tuple<LogAttribute, Func<LogConfig, string, LogConfig>>` as my type. Cue gag reflux.
- I make an array of those and... `new Tuple<LogAttribute, Func<LogConfig, string, LogConfig>>(LogAttribute.ThreadID, (conf, name) => { conf.ThreadIDPath = $"!{name}"; return conf; })),` Cue gag reflux again.
- I know, I'll make a simple helper lambda so it's readable. Instead of "types, types, types, types, actual content that you miss because types, and more types", you just see "lambda(enum, func)"
- Well, C# can't seem to do type inference as well as I'd hoped. So `var makeTuple = (type, func) => new Tuple<LogAttribute, Func<LogConfig, string, LogConfig>>(type, func)` becomes... eh, just look at the code as of this date.
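- Roughly, the issue is that a lambda has no inferable type on its own in C# 7, so `var makeTuple = (type, func) => ...` won't compile. A helper method carries the type information once, and each test-data entry stays readable. A sketch (LogConfig/LogAttribute here are stand-ins for the real types):

```csharp
using System;

// Stand-ins for the project's types, just for illustration.
public struct LogConfig { public string ThreadIDPath; }
public enum LogAttribute { ThreadID }

public static class TestData
{
    // The helper spells out the ugly generic type exactly once...
    static Tuple<LogAttribute, Func<LogConfig, string, LogConfig>> MakeTuple(
        LogAttribute type, Func<LogConfig, string, LogConfig> func)
    {
        return Tuple.Create(type, func);
    }

    // ...so the actual test data reads as "attribute + setter" instead of
    // "types, types, types, actual content, more types".
    public static readonly Tuple<LogAttribute, Func<LogConfig, string, LogConfig>>[] Cases =
    {
        MakeTuple(LogAttribute.ThreadID, (conf, name) => { conf.ThreadIDPath = $"!{name}"; return conf; })
    };
}
```

- The Func takes and returns LogConfig because it's a struct: the lambda mutates its own copy and hands it back.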
- I write a whole set of comments on what this would look like in F#. With the exception of the printfn, it would be a lot smaller and cleaner.
- I learned F# for a job some years ago. It, or OCaml, are good languages to learn. If you think "I'll use Swift or some new fangled language", I can tell you from experience that you'll like them... and then if you use F#, you'll wish those languages had the same features as it. It's not perfect, but it takes out a lot of the pain of programming.
- If you actually do functional programming, instead of using a functional language to do imperative development, all the languages tend to work better but F# still comes across as easier to work with IMHO.
- Stepping back a bit: experience seeing and using multiple programming languages at this point, even if the language designers came up with the idea without outside influence, I see this:
- F#/RD groups -> (~0.5-1 year) C# -> (~0.5-1 year) Other .Net languages -> (~0.5-2 years) Python/Java -> (~1-4 years) C++/JS -> (some amount of time) "other languages"
- From async/await to Rx, LINQ to generics, implicit state machines to multi-generational non-blocking GC: F#, C#, and .Net have been very far ahead in language functionality compared to other languages.
- I've heard JS devs talk about (this was a year or more ago) how in a year or so, they'd be able to use this new thing called "async and await, so we don't have to do Promises anymore". I first started using those 4-5 years prior.
- This writeup is too large... basically, I got annoyed at what had to be written to make the test data more readable. It's not gonna be any better in C++, but I wish that timeline would occur faster so I don't have to type out nearly all of this.
- If one thing came to mind, it's that it might be useful for LogConfig to have a static function for getting all the parameter names and what attributes they're associated with. I don't really want to have it do reflection, but the returned values could be useful for functions that are only executed in one-time, minimal-use places. Would at least apply to XMLLogParser.SetConfig
#33.2. 5/14/2018:
- Realized that SetConfig uses a helper function and wanted to ensure that it gets tested.
- Started work on Parse tests... very quickly got to the end before realizing that my previously described failure tests were what I really need to test. Because the only things besides those is the failure tests, path tests, and checking files can be parsed.
- I have improvements I want to make to the parser (like streaming) but it's not time for that yet.
- Fixed the couple test failures I hit. That's it for tonight.
#34.2. 5/15/2018:
- Lost track of time. Only thing I really got done was adding a static function to get attributes and their associated "path" config.
- My intention is to use this in XMLLogParser.SetConfig with some reflection. I don't "need" reflection, but the implementation could fit into the same space it occupies now. Benefit is that the log parsers don't need to be touched when new attributes are added.
- The other reason I'm fine with using a bit of reflection is that SetConfig should not be called often. That reduces the performance worries, even if I'm adding complexity for the sake of coolness.
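- A sketch of the kind of 5-10 line reflection helper I have in mind (the config type and property name here are illustrative, not the real LogConfig):

```csharp
using System;
using System.Reflection;

// Illustrative config type; the real one has many more "*Path" properties.
public class PathConfig
{
    public string ThreadIDPath { get; set; }
}

public static class ConfigReflection
{
    // Set a config property by name. Reflection overhead is fine here because
    // SetConfig-style code runs rarely, not per log entry.
    public static void SetPath(object config, string propertyName, string value)
    {
        var prop = config.GetType().GetProperty(propertyName, BindingFlags.Public | BindingFlags.Instance);
        prop?.SetValue(config, value);
    }
}
```

- Combined with a static "attribute -> property name" map, new attributes wouldn't require touching the parser at all.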
#35.2. 5/16/2018:
- Went to add reflection and... found out that nearly all the useful attributes in Type only exist in .Net Standard 2.0 and up.
- Due to some weird bug... somewhere, when I first started the project (never had a need to mention this), it defaulted the .Net Standard versions to 1.6. I updated everything to 2.0. But while everything compiled, it wouldn't execute. It complained about a .Net Standard version mismatch.
- Needless to say, I didn't see a need to use 2.0 at the time, as much as I wanted to. Now I do and, luckily, it compiled and ran. I don't know if it's a newer version of VS or something. But no more crash on startup.
- Added the configurable failure handling enum and unit tests, but haven't implemented yet.
- Also added more combinations for "LogConfig.IsValidValuesTest"
#36.2. 5/17/2018:
- So today I wanted to add the enum for failure handling. Along the way I had a couple ideas and remembered something...
- First, one of the failure handling options is to parse the log, but only the attributes it knows about and can parse. I call these "failed logs"
- To test whether a log is valid or not, I added IsValid to ILogEntry. Simple, but effective. I expect it to be a shortcut for "do I have to pay attention to what I'm doing, or can I expect that some values will be there?"
- But in order to have a failed log, without setting a variable, I need to register a failed log. Thus, ILogRegistry.AddFailedLog
- I also happened to start documenting these... I'll see how I do later on.
- While doing this, I remembered "Oh yea, I don't know the source of these log entries. I just know they exist". But I also stopped myself from giving myself more work to do right now. I can probably add that after I get failure handling added and either do or start working on the other parsing tests. Info noted in LogAttributeEnum as that's where I'd need to add an entry and what to do with it.
- I added a few additional tests and finished the failure handling tests... which don't work right now as failure handling hasn't been implemented. But that's for tomorrow.
#37.2. 5/18/2018:
- Ok, it's been a long week. I got home and fell asleep. But because I've been committing the following day, I'm just really late on this... (I repeat "don't kill yourself for code", and didn't plan for falling asleep)
- Mainly, just wrote the unit tests.
#38.2. 5/19/2018:
- Started working on FailedLogEntry. I say "started" because as I started working to integrate it into the registry and update tests, I had a bunch of questions...
- Good questions, like can I compare LogEntry to FailedLogEntry? What about Equals?
- It's a good start, and I like some of the ideas. Also, adding this functionality means the failure handling can work as a developer intends. Or maybe, how the log source can be used. A single log may want failure, multiple file logs may want to skip, while a log stream may want to attempt to parse.
- ...and the handling questions will allow that to work without any "oh, BTW, the Equals function doesn't work if you set this one global value". It shouldn't care about your config.
- I also learned that `Property { get => "hi"; }` can be simplified to `Property => "hi";`
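- For reference, the three equivalent forms side by side:

```csharp
public class SyntaxDemo
{
    // Classic property with a full getter body:
    public string Long { get { return "hi"; } }

    // C# 7 expression-bodied getter:
    public string Medium { get => "hi"; }

    // Shortest: expression-bodied property, the getter is implied:
    public string Short => "hi";
}
```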
#39.2. 5/20/2018:
- My intention today was to figure out CompareTo, ensure support in LogRegistry, and get some tests working.
- My computer decided that all IO devices would no longer function. I got out my backup keyboard and mouse (aka, spare) and tried those. Still didn't work. I'm literally sitting in front of my computer, with a laptop, remoted into it to write this info.
- As such, I haven't gotten all that done. I realized CompareTo is probably not the correct thing to implement because I'm only doing a single-field comparison, and it could be easy to mistake CompareTo == 0 as meaning Equals == true. Since it's more a domain-specific task, and it only happens inside LogRegistry, I decided to do the comparison there.
- By the time my IO devices failed, I realized "this still isn't right". Because the comparison is used for insertion, and failed logs won't have any fields upon addition, the comparison will always follow the same path.
- I don't want to fight with the computer right now, so I will do that work tomorrow. Unit tests passed, so it's at least at feature parity.
- I wrote it at an earlier point, but one interesting aspect I'll have to write is lazy/post execution of sorting the failed resources. My head says "threadpool function to do the work when it has time, started after parsing ends" my gut says "when a parser is done, operate then". But for streams, this won't work as it may never end. But a threadpool is a bit much. Perhaps I need to be smart and if it's a stream, I need to do the parsing on the fly (with locks and all), but if a file, just do it when parsing finishes.
- Reality check: just run when done for now and figure out fancy, smart, whatever term... (the word of the year is "AI" as "machine learning" was so 2017...). It also begs the question "what happens if we don't execute it or forget to?"
- Tomorrow...
#40.2. 5/21/2018:
- So "what happens if we don't execute it or forget to?" Nothing, for the most part. The only time it has any effect is when GetByTimestamp is called. Because we don't want to return a failed log that has no timestamp OR has a timestamp and hasn't been sorted properly.
- Only thing I don't like is doing a simple iteration to find the failed log. Involves something I had very early on: an ID. I worried that just doing equality checks would 1) be slow. 2) Could result in two of the same logs being confused, but because they were classes, the wrong log could be returned.
- With that done, the solution for "how do I know when it's done?" was to just have a function that notified the registry it was done. This would allow it to do any additional processing if needed.
- None of this stuff would scale to streaming logs or, if needed, be thread safe.
- In addition, IMutableLogEntry was originally intended for "I want to be able to modify some logs... period" and now it's expanding to an internal/LogRegistry ILogEntry... probably time to rename it.
- In other news: tests before impl was good, as nearly all those tests were done by the time I did the implementation, and now I know that the only thing that needs work (because it wasn't implemented yet) is the parser itself.
#41.2. 5/22/2018:
- I ran out of time today... but one thought that came up yesterday was that interaction LogEntry and FailedLogEntry are, and should be, only done through ILogEntry. As they are generated types, and both implement IMutableLogEntry (which requires renaming), let's move LogEntry and rename IMutableLogEntry.
- Moved LogEntry into "Internal" and renamed IMutableLogEntry to... IInternalLogEntry. It's only "supposed" to be used by internal types, so I made it internal/package (I could probably set the access modifier to internal, but IIRC (it's been a bit), internal allows access from the entire assembly when the only elements that should care are LogRegistry and the log entries themselves)
- Want an example where unit tests are good? Ok, it's not a great example, as it was a failure of the unit test itself: the mocking library failed because it couldn't see internal types. But I wouldn't know why if I didn't have the tests.
#42.2. 5/23/2018:
- Tasks for today: support failure handling, at least make the function for "config context", and ensure existing parser tests work.
- I got 2 of 3.
- The one that didn't work: existing parser tests. If you run this build, you'll find that everything worked... because I realized I wrote a couple tests wrong and "fixed" them but now they're no-ops for all intents and purposes. So I need to fix them.
- Failure handling: done (simple refactor and making 2 functions: one for handling the good case, and one for the bad case)
- Config context... this was what I mentioned some time ago (or at least wrote a code comment for) where you could specify config values for a temp parser.
- Why? And what's with the name? Name: I couldn't think of anything better. It's a config... within a limited lifetime.
- Why: Parsers are unlikely to be "lite", and could vary in backend implementation. I have thought for a bit that larger log parsings would do better with a database, not a sorted list. What would be more efficient: opening 40 databases and then needing a system or method to sync them, opening 40 database connections to 1 DB and invoking everything async/locked, or opening 1 database connection with 1 DB and doing interaction with async/locked?
- My guess is the last one. First one: heck no. Second one: eh, each request would be async, but now it's network/IO async instead of a local state machine within an application.
- So maybe instead of CreateParser(src1), CreateParser(src2), CreateParser(src3), etc. we could just do CreateParser(config) and then invoke parser.Parse(src1), parser.Parse(src2), parser.Parse(src3), etc.
- Wait... that's what we did. The problem space is how do we set the log source? (a file, a network stream, hostname, etc.). What if a file log we just want to stop parsing on bad entries, while a network we want to skip them. Now do we have to create multiple?
- This is where the config context comes in: set the config values you want to use for a specific "context" and then go. Internally, all details will be taken care of. Continue to be async or threaded, maybe chain together contexts once the data is known.
- In implementation, it allows reusing a potentially expensive parser to do lite work. I can now have 40 shims lock and unlock a mutex before adding to the list, or sending a DB command. I can do the per-stream failure handling.
- And by putting the return in a context instead of an IDisposable or similar, I know it will go away and not be forgotten.
#43.2. 5/24/2018:
- Late... again...
- After thinking about it a bit, I realized that the bad test wasn't actually bad. It was testing the parser, while I was expecting the registry to do what I was expecting (removing log entries)
- Now, the parser test was still useful as it tested "there is an element, but it's empty"
- I would add a count test, but it's really late for me so I'm going to have to put that off to tomorrow, along with writing tests for ApplyContextConfig
- I did manage to get the log file test added to ensure the file streams work.
#44.2. 5/25/2018:
- It's that time of the week where I just want to sleep. So a quick one today...
- Added a log count and tests for the work. Gave me a reason to remove a log that has no attributes.
- I wanted to save some time tomorrow... so I wrote the tests for ApplyContextConfig
#45.2. 5/26/2018:
- My attempt to save time yesterday... saved time.
- With those tests, I had an idea of what to implement for ApplyContextConfig and went about doing such a thing.
- ApplyContextConfig is now implemented. Not the cleanest method, but it does what it needs to do with minimal work.
- The biggest challenge was all the parser paths. How I see it - Naive: pass everything as different variables. Intermediate: pass a separate class with configs. Best: just reference the main class instance.
- C# has class and struct. Classes are reference types, so passing one as an argument shares the instance. Structs are value types and get copied.
- I didn't want to write a copy function for the configs, and didn't want to populate a config class that could be cloned numerous times or would never be cloned at all.
- What I did:
-- parser paths are the only thing constant between everything, and I don't want to clone/pass a config class just for those... so I pass the parser itself around.
-- Specific config values change only when ApplyContextConfig is used, so put them into a struct so I only need to update the copy-on-set instance.
-- Since ^^ config is copied through arguments, and it contains a dictionary (one value for now, but could grow) that needs to be cloned and updated, use a struct to cut the copying to just the dictionary (which I can do if the variable is set)
-- Move all parsing functionality to static functions
-- Instance functions will use defaults when calling the static functions
-- All "context" usage will use their own config data for invoking static functions
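- The copy-on-set part of that can be sketched like this (names are illustrative): because the config is a struct, handing it around already copies the plain fields for free; only the dictionary needs an explicit clone, and only when something is actually set.

```csharp
using System;
using System.Collections.Generic;

// Sketch of a copy-on-set context config. Passing the struct copies its
// fields; WithValue clones the dictionary so the original is never mutated.
public struct ContextConfig
{
    private Dictionary<string, object> extra;

    public ContextConfig WithValue(string key, object value)
    {
        var copy = this; // struct assignment copies all fields
        copy.extra = extra == null
            ? new Dictionary<string, object>()
            : new Dictionary<string, object>(extra); // clone only on set
        copy.extra[key] = value;
        return copy;
    }

    public object Get(string key)
    {
        if (extra != null && extra.TryGetValue(key, out var v)) return v;
        return null;
    }
}
```

- Stacking contexts then falls out naturally: each `WithValue` produces a new layer without disturbing the one it came from.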
- In the end, tests passed, I thought of at least one additional one, and it works as planned and desired. I also like the idea of stacking contexts, if desired. Allows for some additional flexibility.
- I'm done now... I keep adding/updating parts of this code. Thinking for a moment. Then removing them because they are not needed. Reduce, reuse(, recycle?)
#46.2. 5/27/2018:
- Simple work today: started working on the tests for path usage within the XML parser.
- One question that comes to mind is if "paths" should be generic enough that you don't need to know log data, or not.
- Like: if you end up mixing JSON and XML, or maybe XML from multiple sources, should you specify the same path OR specify multiple configs?
- I'll think about it tomorrow...
#47.2. 5/28/2018:
- So I failed to think about yesterday's question... something for tomorrow.
- I wanted to finish the parser unit tests, but just lacked the interest tonight. I finished non-combination unit tests though.
- Another question that came up (and I determined) was how filters and similar should work. Basically: everything operates on the children of the current node, unless it doesn't have children, and then it operates upon the current node.
- In this way, if the dev filters all elements, it will remove any child node that isn't an element... but if there is no child node, then it tests the existing node to determine if it is an element.
- Do the same for each path type and you'll end up with lists of nodes. It's like tree traversal, done level by level. If more than one element exists at the end, it takes the first. Assuming the XML isn't randomly output, the parser will always read it in the same order.
- It makes the system a bit more generic which means it can probably do a bit more complex processing.
- I also realized that the named node idea, using the "if no children, use self" could apply to attributes and the prior "!<name>" becomes a shortcut for the root. So now a child node with attributes can be accessed.
#48.2. 5/29/2018:
- Finished the unit tests(!!!)
- Used "confusing XML" on self. It's confusing...
- I'll work on making the tests pass tomorrow. I worry that some failures are because I'm not using NSubstitute correctly.
- So, the question from the 27th... my thought is: maybe. I say this not because of indecision, but because the only thing that isn't generic is filters.
- But filters vary by parser... isn't that the question? If those should be generic?
- XML types (that matter): element, text, cdata. JSON types: object, array, string, int, number, boolean, null?
- (XML and JSON are the two types I think _need_ support)
- Now... the only overlap I can see is `text` <-> `string`. You could argue `cdata` is also similar to `string`. But `cdata` is not the same as `text`. So you can't do a real equivalence.
- You could do `element` and `object` in theory. Though that too could be `element` and `array`, but `array` and `element` don't work.
- So if I went with generic, it would be: object, array, string, char-array. But it leaves out JSON types or starts to say the same types could be the same, which makes things... weird.
- I'd end up with needing a generic value type for the non-generic filters. Or to make the generic filters optional. I read an article a while ago that discussed "the power of defaults"
- ...but I have also learned a less talked about development strategy that I bet you've never heard of: copy-paste.
- Now combine the two and think of the outcomes:
-- Generics only: people will write generics, docs will need to mention you can do non-generics... but people won't know when to use them or those that need it will have no need to speak about it.
-- Non-generics only: people will ask why generics are needed...
-- Mixed usage: people will pick the first examples or the shortest to type
-- No docs: someone will figure something out, and it will become the de facto standard for how everyone writes configs.
- Common theme: people will take what is shown and copy it for their needs. Other types will be left by the wayside.
- Easy solution: the technical one. Instead of adding more complexity, just don't do generics.
#49.2. 5/30/2018:
- Failing tests: 15
- Rewrote path traversal function...
- Failing tests: 14
- Well, it's something. The most confusing aspect is probably my "test child nodes OR self if no child nodes" because it can cause the nodes to process to "collapse".
- So you have <parent><child><data>value</data></child><child data=value></child></parent> and you do /$elem/!data. First node to process is <parent> and it results in two elements, the two children.
- Now we go to test data and we get
-- Named element "data" from the child nodes of <child> #1
-- ...there are no child nodes in <child> #2, so check for attributes. You found one named "data", so return immediately.
- Now, with a slight change, the node list becomes: <data>value</data> and <child data=value></child>. We've collapsed the tree a bit... but is it correct?
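- That "children, or self if no children" step can be sketched like this (heavily simplified from the real parser, which also handles filters and other node types):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PathStep
{
    // For each node in the current set: prefer a named child element's value;
    // if the node has no matching child, fall back to the node's own
    // attribute of the same name. This is what causes the "collapse".
    public static IEnumerable<string> ResolveData(IEnumerable<XElement> nodes, string name)
    {
        foreach (var node in nodes)
        {
            var child = node.Element(name);
            if (child != null)
            {
                yield return child.Value; // named child element
            }
            else
            {
                var attr = node.Attribute(name); // no children: check self
                if (attr != null) yield return attr.Value;
            }
        }
    }
}
```

- Run against the example XML above, both children yield a value: one from the <data> element, one from the data attribute, even though they sit at different depths.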
- Right now, because the unit tests aren't passing and some of them should (even before I had rewritten the path function), I think I'm using NSubstitute wrong... so I'm gonna work on that tomorrow before I try to fix the function more.
#50.2. 5/31/2018:
- 50 days! Halfway (and I spent 48 days just doing unit tests. 50 if you count what I did when I restarted my counter... more if you count trying to get unit tests working prior).
- Code will come. These are the last tests for the most part.
- A moment of happiness: I decided to figure out what was going wrong, I needed to debug a unit test. The results I found online... basically said I needed a couple more Nuget packages, manual setup of a debug execution setting, and to run everything... and then I saw the blog was from 2004, then 2006, 2008, 2010, 2012...
- That last one (https://stackoverflow.com/questions/4103712/stepping-through-and-debugging-code-in-unit-tests) gave the exact breakthrough I needed: Test -> Debug.
- That simple bit of info meant that I was able to find out... I forgot to parse "Text" types. So I handled elements and cdata... but not text. 3 lines later, my failing tests went from 14 to 4.
- Couple more small fixes and fixing a typo in a unit test: everything is passing!
- Next up is... the stuff on the Tasks.md list... for tomorrow
#51.2. 6/01/2018:
- So, I didn't have a lot of time tonight, and I procrastinated on doing this last night... but the refactor tool made it very easy: changed the namespace from LogTracker to Stethoscope.
- I'm still not entirely thrilled with the namespace, but LogTracker cornered the dev into one use of the library (to track logs) when the library itself can do more than that.
- Also, having the name of the library in the namespace probably makes it less likely to confuse with something else.
- That's all for tonight...
#52.2. 6/02/2018:
- Original plan: read through all the prior logs written here and determine if I have new tasks to add to the Tasks.md page.
- I'm very tired, what's less likely to put me to sleep?
- Making LogRegistry and XMLLogParser thread safe
- LogRegistry: not too hard... until I realize that the get functions just operate on the array... not something you just want to lock up
- XMLLogParser? Doesn't really need to be threadsafe since each operation is unique and it doesn't operate upon itself, but config and registry changes could require some locks.
- On that note, the printers also should have locks on the setting the config and registry
#53.2. 6/03/2018:
- Yesterday I started working on thread safety, today I got stuck on thread safety
- Note 1: I learned long ago that mutexes, at the speed a computer operates at, are not fast. Something like a monitor is better even if it takes a few more lines of code to build. There will always be a place for mutex, you just need to know where.
- Note 2: Faster than a monitor: just not having locking contexts... use atomics. They happen at the processor level and are much faster than anything else you can work with.
- Two issues:
-- 1. Registry iteration won't work if a log is added mid-stream while iterating, regardless of iteration location. For one, the current List implementation will throw an exception during iteration as it's been modified.
-- 2. Log parsing relies on config attributes... that could change if the config is changed. Luckily, the values are copied for the most part.
- For #2, I planned to count how many parse operations are occurring at one time and, if config was active, not allow/delay the parsing. If parsing was active, throw an exception from the config functions.
- But while typing this I realized "the config functions aren't exposed". ILogParser doesn't expose the config functions (and neither does IPrinter), this means that the "proper" way to call those functions is to have the factory invoke them, and it invokes them before returning the implementations.
- So if anyone/anything uses them later, they're on their own. I won't try to control "advanced" or naive usage of those classes.
- That solves #2, but what about #1?
- Turns out... it's not easy, but it gives me a way to do something I have wanted to put in: Rx (observables).
- By using Rx, we "nearly" remove all worries of "what happens when we have millions of logs?" because observables don't store the data and are a programmatic method of doing iteration without "you removed/added an item mid-execution".
- In some apps, that might still be a problem, but for our usage, if the processing is past that point, then we don't care. If it's before, then we get that value (or for replays, we get a new value in the replay).
- This will require a custom Subject (look up "Subjects" in Rx/Observables) so we can insert values mid-stream. If/when we switch to a DB or similar to store logs, then it becomes "can a DB query get values mid-query?" if yes, then great. If not, then so long as it simply skips over it, it doesn't have any effect. So it's a win-win.
- The end goal is to allow multi-thread add/read of logs without exception. Since constant adding of logs could make it so that iteration never ends, we can't put a mutex or lock into the iteration code. Also, async-iterators are... not that great: https://codeblog.jonskeet.uk/2009/10/23/iterating-atomically/
- Past experience with Rx tells me that replacing lists with iterators will make changing to Rx really easy... and we already switched to iterators, so this will be easy.
- Minor benefit: Rx exists for numerous languages, which means that we can move it to a different language later.
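- The replay behavior I'm after can be sketched with just the BCL IObservable/IObserver interfaces (no Rx package needed): new subscribers get everything seen so far, then live values. The real custom Subject would add the sorted insert and the "skip the past mid-iteration" rule on top.

```csharp
using System;
using System.Collections.Generic;

// Minimal replaying subject: buffers values, replays them to late
// subscribers, then forwards live values. Not thread safe; just the idea.
public class ReplayLogSubject<T> : IObservable<T>
{
    private readonly List<T> buffer = new List<T>();
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();

    public void OnNext(T value)
    {
        buffer.Add(value);
        foreach (var obs in observers) obs.OnNext(value);
    }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        foreach (var item in buffer) observer.OnNext(item); // replay history
        observers.Add(observer);
        return new Unsubscriber(observers, observer);
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly List<IObserver<T>> observers;
        private readonly IObserver<T> observer;
        public Unsubscriber(List<IObserver<T>> o, IObserver<T> ob) { observers = o; observer = ob; }
        public void Dispose() => observers.Remove(observer);
    }
}

// Small observer that collects values, for demonstration/testing.
public class ListObserver<T> : IObserver<T>
{
    public readonly List<T> Items = new List<T>();
    public void OnNext(T value) => Items.Add(value);
    public void OnCompleted() { }
    public void OnError(Exception e) { }
}
```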
#54.2. 6/04/2018:
- Goal was to start working on switching from iterators with exposed classes, to observables
- Mostly succeeded. I didn't update GetLogBy or the storage within LogRegistry, but nearly everything else is done
- Only problem that was encountered was one of the logging tests failed because the outer subscription (threads) runs before the inner subscription (logs) finishes
- I also had to place some weird methods for converting from observable to enumeration in the registry tests: first, so later removal makes it apparent what needs to be replaced; second, because NUnit doesn't support checking observables and I wanted to switch to observables before doing any work for NUnit so they'd be understood.
- Tomorrow's goal is to get the test to pass, finish work on LogRegistry, and then maybe move to the NUnit Observable support
#55.2. 6/05/2018:
- So the issue I encountered is that observables are push-based. It's by design... but I had written a "nice printer type" that expected everything one-at-a-time.
- Changing to observables made this apparent now, rather than at a later point when items would've been designed for a pull-based (enumerations) system.
- Observables also force async, though it's somewhat transparent; some "ObserveOn" calls will probably be needed later.
- To the point: I did a weird trick to get the observables to be provided in a one-at-a-time manner and, since the existing concat functions would've simplified the grouping observables to just the values, took advantage of C# 7's pattern matching syntax.
- If discriminated unions existed in C#, I'd use that since then I can just match on expected types instead of hoping for specific types.
- Speaking of pattern matching, I implemented that everywhere else where it could be used.
- I also do one more update to LogRegistry so the grouping function would use observables.
- I planned on switching the backing storage of LogRegistry to use observables, but I knew that would take a while, and it's late right now and I don't want a repeat of yesterday (that work took 3.5 hours because I was re-learning Observables while implementing them, trying to find the documentation that I wanted (it used to be good...), and I progressively get slower as it gets later)
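The pattern-matching trick might look roughly like this (illustrative only; ILogEntry is this project's entry interface, everything else is made up): once Concat flattens the grouped observables down to plain values, C# 7 type patterns sort them back out.

```csharp
// Illustrative only: after Concat flattens an
// IObservable<IGroupedObservable<string, ILogEntry>> down to plain object
// values, C# 7 type patterns recover what each value actually is.
using System;

void PrintItem(object item)
{
    switch (item)
    {
        case string threadId:   // a group key (e.g. the thread ID)
            Console.WriteLine($"Thread {threadId}:");
            break;
        case ILogEntry log:     // an actual log entry within the group
            Console.WriteLine($"  {log}");
            break;
        default:
            throw new InvalidOperationException($"Unexpected type: {item.GetType()}");
    }
}
```

With discriminated unions this would be an exhaustive match instead of "hope for specific types plus a default throw".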
#56.2. 6/06/2018:
- So I spent a bit reading through Rx and thinking: how do I replace the array of elements
- I came to the following requirements for the storage: I need it to replay, sorted, and "lossy". I also wanted to be able to count the number of log elements, clear the logs, and it needed a better way of handling "buffered" content.
- Why for each feature?
-- Replay: kind of obvious; need to be able to go through the logs again
-- Sorted: This one will be needed when GetByTimestamp is needed, and it's easier to sort on insert than at query time.
-- Lossy: This is just the name I gave this... basically, given the nature of observables, if a new subscription occurs, it should replay everything. But if an existing subscription is in use, and it's getting data, and a new log comes in (say, from a new source) with a timestamp earlier than what we've already iterated past, then ignore it. We don't need to revisit the past right now.
- Clearing is easy, counting is mostly easy, and the buffered content is just to handle failed logs until the registry is notified of their completion.
- Once I realized the work was gonna take a long time, I decided to delay it to tomorrow. Instead I did the long-running "let me read through old 100DoC logs and see if there's any todos I missed". There weren't any.
- One realization is that I may have to implement IQueryable, or for observables, IQbservable. I came to this because replay is easy (so long as the data is small), but sorting an observable is nearly impossible. I could stick to the array and just do AsObservable, but that's gonna get weird once we insert something mid-iteration. Maybe it's already "lossy". But I could also go to a database...
- Then my mind wandered, and I realized maybe a sub-class for specifying the storage system. List (default), InMemoryDB, FileDB, Auto? (which would switch between them as time went on... it sounds cool but probably not)
- So... if I have a DB, then I can simply request a sorted list. But wait, I've wanted to have IQueryable... and IQbservable is a query-enabled observable... and queries work well with DBs.
- It just kind of all fit together: I can make a storage type for the registry, make everything queryable, and it should all produce an efficient and functional system.
- And once that's done, it should make handling large logs easier and faster... and that comes after this work. So it's just in time.
- (ok, looking at the task list, I'm doing stats next... useful for determining if the work is really useful and possibly to gather some early stats for determining "should the default storage be a list, InMemoryDB, or FileDB?")
- (ok... also missed making sure NUnit supported Observables...)
#57.2. 6/07/2018:
- I talked with a friend about the observable design for longer than expected...
- I'm going to go with the plan I had yesterday, but hide it with a factory.
- The factory will use a set of criteria (unused as of this commit) to pick which storage to use.
- But... due to that conversation going for a bit, I'll have to do the storage implementation and work tomorrow.
#58.2. 6/08/2018:
- So much for doing the storage implementation "tomorrow". And as a heads up, I hope to do more _tomorrow_...
- Basically, I took a look at IQbservable and went "this is cool... how do I utilize this?" and everyone else seems to have the same question.
- I found a promising project (https://github.com/RxDave/Qactive) though I haven't looked it over enough yet to determine if it's useful...
- But besides that... it's, again, many people going "it would be great to receive values as they appear, with filters, and done server side instead of client side" and then nobody having a DB connection or something that can be utilized.
- My initial plan is just to have a list anyway, so it's not going to be anything incredible. Heck, it could just be I take the expression from the Qbservable, execute it on the list, and as I get results back, return.
- But for now, after glancing at a lot of discussions and some code, I just made an interface. I don't want it to just be a mini-ILogRegistry. It should be storage of ILogEntries and nothing more. Closer to ICollection<ILogEntry>.
- I doubt this is the final design, but I can at least do something to get an idea out.
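The rough shape of that interface, as described (member names here are guesses, except IRegistryStorage and AddLogSorted, which come up later in this log):

```csharp
// Rough shape of the storage interface: storage of ILogEntries and nothing
// more, closer to ICollection<ILogEntry> than to a mini-ILogRegistry.
using System;

public interface IRegistryStorage
{
    int Count { get; }                       // count the number of log entries
    void AddLogSorted(ILogEntry entry);      // sorted on insert, not on read
    void Clear();                            // clear the logs
    IObservable<ILogEntry> Entries { get; }  // replayable view of the stored logs
}
```

Keeping it this small is the point: the registry owns policy (grouping, failed-log buffering), the storage only owns the entries.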
#59.2. 6/09/2018:
- Again, not much work today.
- All I did was substitute the old implementation of LogRegistry with the initial work on implementing IRegistryStorage usage.
- Ah, a few tests didn't compile... refactored LogRegistry creation to a function for them so only one spot needs to be updated. Should probably switch to using the LogFactory anyway.
#60.2. 6/10/2018:
- We're back in business... not perfectly, but unit tests pass and test program work as expected.
- Looking at IQbservable... I think I'm in over my head. As such, my implementations for this will not be ideal, but I don't want to spend the remaining days on a proper implementation (though I know this will take longer than 100 days total).
- Basically, I don't know enough about implementing a QbservableProvider and system and how Expressions work. Ideally, all invocations that check attributes on the log should be turned into a DB query (or similar), while everything else should be lambdas and similar that run within .Net.
- Providers (to my understanding) basically take the entire query (from log in logs where log.Timestamp > DateTime.Now && log.Timestamp.Minute > 30 select log), which has been turned into an "Expression", and iterate through it so "where log.Timestamp > DateTime.Now" becomes RemoteSQL("WHERE logTimestamp > cast((now()) as date)").Where(log => log.Timestamp.Minute > 30).
- LINQ is extremely powerful, but after nearly 50 days of doing unit tests, I'd like to not spend the rest of it learning and implementing LINQ.
- With that explained... I got a basic storage that uses List, and ensured LogRegistry was implemented.
- I will do a bit more research tomorrow to try and determine if there are any SQL-backed IQbservables and/or how to ensure Insert doesn't screw up log iteration (I need it to be that if a subscription hasn't gotten to the inserted member, it's returned; but if it has passed that timestamp/iteration, it isn't).
- For now, everything's working again.
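To make the provider description above concrete, here is the shape of the split, as I understand it (pure illustration; RemoteSQL and the exact translation are invented):

```csharp
// Pure illustration of the provider idea: the whole query arrives as an
// Expression tree, and the provider decides which parts translate to the
// store's query language and which stay as in-process lambdas.
using System;
using System.Linq.Expressions;

Expression<Func<ILogEntry, bool>> query =
    log => log.Timestamp > DateTime.Now && log.Timestamp.Minute > 30;

// A provider walking query.Body might emit (RemoteSQL is invented):
//   RemoteSQL("WHERE logTimestamp > cast((now()) as date)")  // pushed to the store
//       .Where(log => log.Timestamp.Minute > 30)             // stays in .NET
```

The hard part is exactly that walk: deciding, node by node, which subexpressions are translatable and which must remain compiled delegates.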
#61.2. 6/11/2018:
- So, today wasn't terribly productive.
- I was mainly going through code, trying to understand how IQbservable would work... long story short, I'm not doing any more work on it in the short term.
- It comes across as "the future" and "under-implemented" at the same time. And as I said elsewhere: "I may have crossed an event horizon of code and I both want to continue and have no idea what to do"
- I can basically ignore it by using AsQbservable on an Observable, but it's not a very useful tool when used that way.
- Looking at the tasks I have, for the most part, I can ignore it for a bit. But the place where I see it being needed is when I support remote logs.
- The reason is that, as of now, I'm reading files and streams. They'll be finite for the most part. But once I'm streaming logs in from remote servers, it will go from KB and MB of log data to GB of data, streaming from one or more servers at a time.
- That kind of data will require some updates to how logs are inserted, possibly requiring queuing and proxies. Mutexes and Monitors may be too slow. To then read such volumes of data, having one giant stream will be like an infinite loop of log data. The ListStorage I have now will balloon in memory and probably become unusable. A DB will be needed (maybe sharding too?) and with a DB, IQbservable will actually have a real use.
- But before the DB gets created, the ListStorage type should at least work properly.
- So... lots of reading, but not a lot of code. Tomorrow I'll see if I can start and finish support for Observables within NUnit.
#62.2. 6/12/2018:
- I know this feeling... this is the feeling of "I'm going to crash, hard, because of lack of sleep". I can barely keep my head up.
- This feeling started earlier in the day, and now at this point, I'm going to have to call it quits.
- But before that, I did research on how to support Observables with unit testing/NUnit. Ideally, I'd like to support Observables directly (Assert.That(obs, Is.Empty)) rather than converting to an enumeration or processing the Observable outside of that.
- Luckily, NUnit has a page on extendable constraints: https://github.com/nunit/docs/wiki/Custom-Constraints
- Looking at the area where I expose Observables, I came up with the following items I need to support:
-- IObservable
--- Is(.Not).Empty
--- Is.SubsetOf
--- <expression>.Exactly
--- Has.Member(x)
-- IObservable<IGroupedObservable<K, T>>
--- <something to get all keys, and Concat? (to combine all values)> (though this could possibly be done with simple LINQ)
- I wanted to work on this, but you wouldn't believe how many typos I made and words I accidentally dropped when I typed just this log because of how tired I am.
- I will work on the above tomorrow
#63.2. 6/13/2018:
- I started working on the Observable support for NUnit and ran into a number of issues. Nothing blocking, but they tarnish what I expected to be a simple and elegant addition.
- I managed to get the Empty constraint (known as ExEmpty) implemented and will do more work tomorrow.
- Issues encountered:
-- The Custom Constraints page has an example of "public class Is : NUnit.Framework.Is". When first read, I assumed the types would clash but thought "just don't include the namespace that has Is in it"... but all the types you interact with in NUnit are in that namespace, so I had to change the name of my "Is" to "IsEx"
-- C# doesn't support extension properties (just do a Google search for why...), so what could've been "Is.Not.ExEmpty" now has to be "Is.Not.ExEmpty()"
-- It would be cool to do "Is.Not.Empty()" but for consistency, to prevent name clashes, I had to rename them to ExEmpty and will do the same for other types. Also, since it's near impossible to tell...
-- NUnit doesn't let you add sub-constraints... so I had to make entirely new constraints. Ex: EmptyConstraint really just checks a few types, picks a "real constraint", then uses that for testing. It'd be great to add additional types, but I can't. Instead, for ExEmpty, I do my own type checks and if none match, I run EmptyConstraint internally.
-- For collections, there is IEnumerable and IEnumerable<T>. For Observables, there is only IObservable<T>. This meant that I needed to figure out how to interact with a type that I didn't know the type of until runtime. I made a helper function to convert to a known, generic, IObservable<object> but it took all the time I had allocated, and more, to working on this. C#'s generics are very good, but if you don't know the types ahead of time, they become near impossible without Reflection.
-- One holdup in working on other functions was, after all the work needed for just doing ExEmpty, I didn't want to spend another hour or two just doing a couple more functions.
- This has not gone as fast as I hoped. We'll see if I can finish it tomorrow.
- I'm glad I worked on it, but it's just slow going. Also, was not expecting the Observable thing to be as much work as it was. A bunch of trial-error, some research, and a couple curses.
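The runtime-type problem might be solved with a helper along these lines (a reconstruction under my own names, not the actual code): find the IObservable<T> interface via reflection, close a generic method over T, and project to IObservable<object>.

```csharp
// Reconstruction of the idea: given an object that is some IObservable<T>
// (T unknown until runtime), produce an IObservable<object> we can test.
using System;
using System.Linq;
using System.Reactive.Linq;
using System.Reflection;

public static class ObservableHelper
{
    public static IObservable<object> AsObjectObservable(object maybeObservable)
    {
        // Find the closed IObservable<T> interface on the runtime type
        var iface = maybeObservable.GetType().GetInterfaces()
            .FirstOrDefault(i => i.IsGenericType &&
                                 i.GetGenericTypeDefinition() == typeof(IObservable<>));
        if (iface == null)
            throw new ArgumentException("Not an IObservable<T>", nameof(maybeObservable));

        // Close Wrap<T> over the discovered element type and invoke it
        var elementType = iface.GetGenericArguments()[0];
        var method = typeof(ObservableHelper)
            .GetMethod(nameof(Wrap), BindingFlags.NonPublic | BindingFlags.Static)
            .MakeGenericMethod(elementType);
        return (IObservable<object>)method.Invoke(null, new[] { maybeObservable });
    }

    private static IObservable<object> Wrap<T>(IObservable<T> source) =>
        source.Select(v => (object)v); // Rx's Select does the boxing projection
}
```

This is exactly the "near impossible without Reflection" situation: generics are great until the type argument only exists at runtime.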
#64.2. 6/14/2018:
- It lives!!! Ok, at least for the extensions.
- To add to the "issues" list: there are many existing constraints that could simply be extended... but even though the classes aren't sealed, the member variables are private, so I can't access them in an overridden ApplyTo function.
- Next up: stat gathering. I expect this will appear to be easy, and will end up taking me a couple days. The weekend will also be a bit slow again.
#65.2. 6/15/2018:
- I didn't have a lot of time tonight, and I need to think about how to do the next task: stat gathering
- Something easy but also useful, I went around to everything (except parsers) and put in comments for where stat gathering should be specified
- Doing this is useful because, while I can't do it now, tomorrow I should be able to just look through everywhere that stats should be gathered and try to determine how the gathering should be done (static class that has some function call? instance type that can be instantiated? mix? other?)
- But that's all for now.
#66.2. 6/16/2018:
- Ok, as stated already, didn't get very far.
- Most I did was finish marking in the parser.
- I will plan out how the stat system should be implemented tomorrow. I may also mark more areas depending on what I think the stat system should be.
#67.2. 6/17/2018:
- So... still not a whole lot, but the main thing was I tried to summarize some of the operations.
- I know a histogram will be something that will be created by the stat system.
- I want to be minimal and only provide stat recording that is as simple as possible (think "collection.Record(value)" instead of "StatCollector.Record(Name, value, RecordStatScope.Public | RecordStatScope.GlobalStore)").
- ...then the actual statistic types will be handled in the background. Statistics like min, max, average, the previously mentioned histogram, etc.
- I'd also like self-feedback to exist as an option, so stats can be polled, written to a file, callbacks invoked or gather during runtime, etc.
- Though, I need to look to see if existing libraries/systems exist. No need to reinvent the wheel if something good enough exists already.
- Bonus: as I've been doing it as I come across them, I now use "nameof" for all argument exceptions where I specify the argument name.
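A sketch of what "minimal" means here (names are hypothetical): callers only ever record raw values, and the collector decides in the background which statistics (min, max, average, histogram, ...) to maintain.

```csharp
// Hypothetical sketch of the minimal recording API: callers record values;
// the backing statistics (min, max, average, histogram, ...) are the
// collector's problem, not the caller's.
public interface IStatCollection
{
    void Record(double value);
}

// At a call site there are no names, scopes, or store flags, just:
//   parseTimes.Record(elapsedMs);
```

Feedback (polling, file output, callbacks) would then hang off whatever creates IStatCollection instances, not off the call sites.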
#68.2. 6/18/2018:
- So doing some searches, I came across Metrics.NET and spent today reading through docs.
- If it reminded me of anything, it's that I want to grab performance stats too. "Timers" in Metrics's API.
- The challenge I have is that I don't want the library to do configuration...
- ...but I'm also not sure if I want to abstract the logging to hide away the global Metric class or similar. And if I _do_ abstract it, then I will have to take some control of configuration.
#69.2. 6/19/2018:
- So, I'm going to directly invoke the Metrics library for now. I'd like a little more isolation instead of using globals, but it's the best library available from what I can tell.
- To start, I implemented metrics for the factory types.
- I also spent a bit of time trying to debug why things weren't working... turns out an additional dependency of System.Configuration.ConfigurationManager was needed.
- Lastly, I needed to ensure that the application and tests would be configured correctly.
#70.2. 6/20/2018:
- Implemented trivial metrics.
- Was debating, for ListStorage.AddLogSorted, whether I wanted to manually measure the lock time AND the actual execution separately, or just do (what I ended up doing) combining the two timers.
- It was easier to combine than to try to do StartTimer, StopTimer, Record(stop - start, ...) around the locks and then again around the execution itself.
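What "combining the two timers" boils down to, roughly (the Metric.Timer/NewContext calls are Metrics.NET's API as I understand it; treat the exact names as an assumption, and InsertSorted is a hypothetical helper):

```csharp
// One timer context wraps both the lock wait and the actual insert, so the
// recorded time is lock + execution combined.
using Metrics;

public class ListStorage
{
    private static readonly Timer AddTimer =
        Metric.Timer("AddLogSorted", Unit.Items);

    private readonly object entriesLock = new object();

    public void AddLogSorted(ILogEntry entry)
    {
        using (AddTimer.NewContext()) // starts timing; Dispose records it
        {
            lock (entriesLock)
            {
                InsertSorted(entry); // hypothetical helper doing the real work
            }
        }
    }

    private void InsertSorted(ILogEntry entry) { /* ... */ }
}
```

The cost is that lock contention and insert cost are indistinguishable in the metric, which is acceptable given the ~0.1 ms worst-case lock numbers measured later.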
#71.2. 6/21/2018:
- Implemented non-trivial metrics.
- There are probably a few more I can add, but they'd only be added because I thought of them... not because I saw a need for them. Like timing how long GetElementDataFromPath takes... reality is, it doesn't matter. ProcessElement does matter, since it limits how many logs we can go through, while GetElementDataFromPath would just be "this type of operation/length is slow".
#72.2. 6/22/2018:
- So this is a weird one... I wrote code, but thank God for unit tests... because I found out it didn't work.
- What's weird is that it should...
- Ex. Instead of "var result = x + y" I wrote "var result = Math.Add(x, y)". The first passes unit tests, the second doesn't.
- So I went off and did this: "var result = x + y; var result2 = Math.Add(x, y);" and... it too doesn't pass.
- What the heck? How does additional code, that has purposely been written to not interact with the original, fail?
- ...and I spoke too soon, I realized while typing this that the reader probably already read the value for the original format... but running both means that it's not getting the same values/children.
- It's late, so I'm going to stop here and try to figure out what to do tomorrow.
- The end goal was to eliminate the "root" node, since streaming and massive logs don't need millions of child nodes.
#73.2. 6/23/2018:
- So in a slight comedy of errors yesterday, I tried to replace the tree-model of XML nodes (element.Parent.Children.Contains(element)) with... a Queue.
- Take a moment to think about that... how do I take that same example and do it with a queue? Hint: you have to create a new queue.
- This is what happens when I work with little sleep. What was I actually looking for? Stack, not Queue.
- So I switched to a Stack and... everything still failed.
- Turns out, I condensed some of the logic down originally and didn't make it obvious what was happening... so my past self confused my current self with what I did.
- Key: adding new elements to a parent. I went "nah, don't need this" but didn't think "um, I don't want to add elements to <root> but everything else is fine"
- With the proper container in place, a couple comments on what is being done, and ensuring that I add children at the correct moment, I was able to get everything working and passing tests.
- I repeat: unit tests are very useful... I had many of the XMLLogParser tests turn red.
- I also repeat: sleep. Don't kill yourself over code. I possibly would've had this done yesterday if I wasn't tired and realized I was using a Queue instead of a Stack. I distinctly remember going "why is there no Push function? Well, Enqueue is close enough".
- Bonus: I realized "what if a log shows up that already has a root?" and made a note that I need to support XML logs that already have roots.
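The corrected shape of the logic, roughly (illustrative, using XElement rather than the parser's actual types): a Stack tracks the chain of open elements, children attach to the element on top, and the synthetic <root> simply never collects children.

```csharp
// Illustrative: track open XML elements with a Stack (not a Queue!).
// Children attach to the element on top of the stack; when the stack is
// empty we're at the synthetic <root>, which gets no children.
using System.Collections.Generic;
using System.Xml.Linq;

var openElements = new Stack<XElement>();

void OnElementStart(XElement element)
{
    if (openElements.Count > 0)
        openElements.Peek().Add(element); // attach to the real parent
    openElements.Push(element);           // Push, not Enqueue...
}

void OnElementEnd()
{
    openElements.Pop(); // back to the parent element
}
```

With a Queue, Peek would have returned the oldest element instead of the current parent, which is exactly why "Enqueue is close enough" failed.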
#74.2. 6/24/2018:
- I didn't have a lot of time today, so I quickly added one element I mentioned yesterday: supporting logs with an existing root.
- While the flag is there, the tasks I need to do with it:
-- Support flag with ContextConfigs
-- Unit tests
-- Think about if this is applicable for other log formats... and if not, how should I handle this?
--- While I keep flip-flopping mentally about whether I should or shouldn't, I'd like LogConfig to be per-parser/per-log to ensure the program can support multiple log sources at the same time AND not have sub-configs or something.
--- For all I know, the sub-configs might be a better idea... but do it per-log type?
--- Other log type ideas... would they have their own variables that don't work elsewhere?
#75.2. 6/25/2018:
- Finished support for ContextConfigs and unit tests for LogHasRoot
- As I like the idea of per-log/format configs, the LogHasRoot config can be ignored by parsers that don't support it.
- I also acquired bigger logs... going up to 7GB.
- The smaller of the large logs parse just fine, though RAM usage goes to ~2.5-3GB.
- The 7GB log... didn't parse because Stream.CopyTo couldn't copy the data. Understandable, it's 7GB trying to be copied to a MemoryStream.
- I'll need to do streaming to test that one.
- One interesting one: I support XML right now, and will add support for JSON later... but I forgot, there was a period of time when logging systems output (and many still output) unstructured plaintext. I'm curious if I can support it... I don't know how at this point, but we'll think about that when the time comes.
#76.2. 6/26/2018:
- So, I wasn't sure where I should go... and decided to just get benchmarks.
- So no code today, just numbers.
- Rough numbers... because the metrics continue to be gathered even when we're done processing. Grabbing them right as parsing finishes would probably be good. But I'm mainly still trying to figure out how to read some of these metrics.
- Numbers:
-- Setup without parsing: ~20.4 MB of RAM
-- Setup with parsing: ~2636 MB of RAM
-- Logs parsed: 687,674 logs
-- Log file size: 554,220 KB
-- Parsing rate: ~22,500 logs per sec, or ~22.5 logs per ms
-- Parsing time:
--- Min: ~0.01 ms per log
--- Max: ~2.4 ms per log
--- 99% percentile of parsing: ~0.07 ms per log
-- Locks when adding to storage add ~0.1 ms per log, at worst
#77.2. 6/27/2018:
- I'm late, but wanted to see what I could do... which wasn't much.
- I was planning on trying to implement streaming/concat Streams (https://stackoverflow.com/questions/3879152/how-do-i-concatenate-two-system-io-stream-instances-into-one) but realized "I need a different registry storage for this... it'd be too big"
- See, by the numbers yesterday, a ~500MB log file produced ~700k logs. This took up ~2.5 GB of RAM.
- The largest log I have on me right now is 7.1 GB. Using a nice ratio, that comes out to ~1.4 logs per KB, or in this case, a little over 10 million logs. Do the same for RAM, and it comes to ~36.5 GB.
- Yes, ~36.5 GB. My computer has 64 GB of RAM ("they told me I was crazy... what are they to say now?"), and while technically I'd be fine, the reality is I'd be pushing it on RAM alone. I might be fine, I might not.
- The other aspect is that LogEntry is a class right now, which means every single one of them will be GCed and carry additional per-object overhead. That's a stretch. I'm not even sure if a List would be able to allocate a 36 GB array of memory.
- My last goal for the night was to do some unit tests, but I'm going to push that to tomorrow as it occurred to me, "what if LogEntries were structs instead of classes?"
- Before I code that, my solution was to add NullStorage (think /dev/null) where every new log would just be ignored. Counted, but ignored. Nothing is stored. It's actually probably a useful type to have.
- Ok, it took longer than expected to convert FailedLogEntry to a struct (let alone LogEntry) and unit tests broke. So I need to see what happened... tomorrow.
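The NullStorage idea is simple enough to sketch (the IRegistryStorage members are assumed from how the storage interface is described earlier; Observable.Empty is Rx):

```csharp
// NullStorage, roughly: the /dev/null of registry storage. Every log is
// counted but nothing is kept.
using System;
using System.Reactive.Linq;
using System.Threading;

public class NullStorage : IRegistryStorage
{
    private int count;

    public int Count => count;

    public void AddLogSorted(ILogEntry entry) =>
        Interlocked.Increment(ref count); // counted, but ignored

    public void Clear() => Interlocked.Exchange(ref count, 0);

    public IObservable<ILogEntry> Entries => Observable.Empty<ILogEntry>();
}
```

No locks needed at all, which is also what makes it handy for benchmarking parse speed in isolation.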
#78.2. 6/28/2018:
- I tested the different entries as structs. I didn't measure any improvements in any metric... but did notice that it caused multiple unit tests to fail.
- So for the sake of knowing things work, and making the implementation less hacky, I'll leave them as classes.
- A quick test with NullStorage showed the XML parsing can run at 30,000 logs per sec.
- I wonder if hybrid approaches might be useful, like lockless queues. Everything goes into the queue extremely fast, but then a methodical dequeuing. This way parsing happens really fast, while storage takes as long as needed. This may be better for DBs, where it could be hitting disks instead of just RAM.
- NullStorage unit tests are done... I want to ensure all registry storage tests are in the same place, and need to make unit tests for the storage selection function. Tomorrow.
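The hybrid idea, sketched with .NET's ConcurrentQueue (a thought experiment only, nothing implemented): the parser enqueues without ever blocking on storage, and a single consumer drains at whatever pace the storage can sustain.

```csharp
// Thought-experiment sketch: parsing pushes into a lock-free queue as fast
// as it can; one consumer dequeues methodically into the real storage.
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class QueuedStorageFeeder
{
    private readonly ConcurrentQueue<ILogEntry> pending = new ConcurrentQueue<ILogEntry>();
    private readonly IRegistryStorage storage;

    public QueuedStorageFeeder(IRegistryStorage storage) => this.storage = storage;

    // Producer side (parser thread): never blocks on storage.
    public void OnLogParsed(ILogEntry entry) => pending.Enqueue(entry);

    // Consumer side: storage takes as long as it needs (RAM now, maybe a
    // DB hitting disk later).
    public async Task DrainAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            while (pending.TryDequeue(out var entry))
                storage.AddLogSorted(entry);
            await Task.Delay(1, token); // back off briefly when the queue is empty
        }
    }
}
```

A real version would want bounded memory (the queue can grow without limit here) and a cleaner wake-up than polling, e.g. a SemaphoreSlim signaled on enqueue.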
#79.2. 6/29/2018:
- I was gonna do unit tests to finish them off, but I mentally "crashed" and am very tired.
- Instead, I did some mindless work and wrote as many inline docs as I could.
#80.2. 6/30/2018:
- I decided to procrastinate the unit tests some more, and instead worked on more documentation. Not everything, but a few.
#81.2. 7/1/2018:
- I looked at the LogRegistry tests and decided that I wasn't going to remove any. Functionality could change if a null storage is used, but all other storage types should function the same.
- I also added a test for checking the desired storage type is returned for a specific criteria.
- Lastly, another bit of documentation.
- I might be a bit busy tomorrow, but if I can get the time, I'd like to start working on streaming log data (and do a bit more documentation).
#82.2. 7/2/2018:
- Ok... didn't get the time. More docs!
- Actually... I finished the docs. :D
#83.2. 7/3/2018:
- So, I was a bit late and planned to at least attempt to work on streaming and... unexpected injury(?)
- I listen to music using over-ear headphones (as is proper...) and the ear pads/cups that sit between the ear and the speaker had become torn up on my headset. I bought new ones.
- I don't know about others, but these pads have a barely elastic band that is supposed to be stretched around the pads; then they contract and hold the piece in place.
- I had a bit of a hard time, and when I finally succeeded, I went to type out a message and... realized I couldn't feel one hand. I could barely type. At this point, it's uncomfortable.
- Instead of coding... I will read/research. A friend sent this to me: https://blog.marcgravell.com/2018/07/pipe-dreams-part-2.html
- I will read part 1 (https://blog.marcgravell.com/2018/07/pipe-dreams-part-1.html) before I get to part 2 (^^), as it's interesting and I like the concepts described.
- But that's all for now. Let's see if this doesn't sidetrack me into implementing...
#84.2. 7/4/2018:
- I basically finished writing yesterday's log, then "crashed" and didn't finish reading the article.
- I got further, but haven't finished it yet. From what I've read so far, I'm tempted to switch to this Pipe/Pipeline system already.
- Years of dealing with Streams (C#), Reader/Writer (Java), I/O streams (C++), and numerous other I/O implementations have made me go "there's gotta be a better way" because even though API and usage varied between all of them, the structure was always the same: buffer, write, flush, close; buffer, read, test, loop, <repeat>, close
- C++ was probably the nicest because you didn't need the buffer stage... but if your data wasn't already in a serializable structure, you needed to convert between formats. Also, more commonly, the APIs you were working with didn't implement the << or >> operators, so you ended up needing to do buffers anyway.
- I'm not going to start on C++ templates... that's a complexity that was a mistake, even if the concept was ideal.
- Instead, the best "I/O" I've used is Rx (Reactive). Write being "ISubject<byte[]> stream = HTTPRequest.Open(url); IObservable<byte[]> dataOut = Serialize(...); stream.Subscribe(dataOut);" and read being "Deserialize(HTTPRequest.Open(url).Which(<filter empty data>))".
- Now... that actually doesn't look too good, now that I've written it, but it works so much nicer than the other options.
- Pipe/Pipeline seems like either a nicer version of that (not specifically Rx, but using Rx for data streams) or at least something that would make Rx nicer to work with.
- My biggest issue with all the APIs has been knowing when data ends. Getting -1 means "we're done" but what if I simply lost connection over the network? Now I have to explicitly define, in the API or data structure, a "the data is done" marker in order to determine if I need to reopen the stream or not. What if I get an empty data stream? It's easy to skip an empty stream, but it's at least one additional line of code. If you do any abstraction around reading or writing, you now need to determine "where" to do that skip. Depending on where, that could mean you get your buffer, wrap it, start your decryption logic, get your IV, start a threadpool for IO operations, pass the buffer to the threadpool, hand it to the decryption, and... nope, it was empty. No-op. But how many resources did you use to get there?
- Next annoyances: Knowing if I should write more data to IO, knowing the optimal amount of data to send, and otherwise building around the API (again, buffers for everything... as some put it, it's a code smell). I'm not sure if they have a nicer way to handle it with Pipe/Pipeline, but compressing/encrypting/etc. (and the reverse) are either a stage in the I/O that is like reading/writing inside of a read/write logic or it's encapsulation and you need to dig layers deep to get to the original stream if needed and the wrapping just repeats the logic at an intermediary layer, outside the read/write logic.
- The last one, encapsulation, is how many of the current stream I/Os work: stream = new BinaryReader(new SevenZipReader(new AESDecrypter(new FileStream(path), iv)));
- So... I'll continue reading and decide if I want to do something with it now, as I do still need to stream log data.
#85.2. 7/5/2018:
- So... I've been very slow at reading. Code articles/whitepapers are some of the driest reading materials you can find. I'm not sure they could hold the attention of a rock.
- This article is fine, but it's not holding my attention.
- Looking up the code (which I now know is System.IO.Pipelines, which is never mentioned) and reading the articles (to the current point I'm at), I've decided...
- ...to skip it for now. As great of a concept as it is, there is literally no backend implementation. As the article puts it, "Its current release is like the abstract Stream class with no implementation of NetworkStream or FileStream".
- So while I want to use it, I must put function in front of cool-ness and move on.
- ...as soon as I finish reading it.
#86.2. 7/6/2018:
- So I noticed it's almost day 100... I'm going to do something weird... I'm going to branch and try to do a quick-and-dirty implementation to get something working (and will continue the log there) and then hopefully merge that back in (somehow).
- Look at the experimental/poc branch for this work
- So I'm giving this a try: I want to see if I can get a simple system that I can actually use, even if not written "ideally".
- A proof of concept. Today may just be me playing around so there may or may not be much code today.
- ...and right after writing that, I realized a common logic piece I need anyway: streaming. Namely, some way to concat streams together.
- I simply (cowboy) coded and got most of the way through making a Stream that a generic interface type appended to, so that they'll iterate one after another as if one large stream.
- The biggest worry I have is that this needs to support streaming logs, and NetworkStreams don't support Position (setting? Or getting too?), Seek, and Length. So I need to ensure that all my logic either treats those as optional, or simply doesn't use them.
- The other worry I have is the code is not the cleanest (so maybe it's good it's in the branch) which means if something goes wrong, it will be a struggle to find what happened.
- I took inspiration from https://stackoverflow.com/a/3879208/492347 (https://github.com/lassevk/Streams/blob/master/Streams/CombinedStream.cs) but I allow runtime additions and don't rely on seeking and being able to use Length and Position.
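The core of what ConcatStream does, stripped way down (the real code differs; this sketch returns end-of-stream when the queue runs dry, whereas a live-streaming version would need a "no data yet vs. actually done" distinction):

```csharp
// Minimal sketch of a concatenating stream: streams can be appended at
// runtime, and reads flow from one stream to the next as each is exhausted.
// No Seek/Length/Position, so NetworkStreams work as sources.
using System;
using System.Collections.Generic;
using System.IO;

public class ConcatStream : Stream
{
    private readonly Queue<Stream> streams = new Queue<Stream>();
    private Stream current;

    public void Append(Stream stream) // can be called mid-read
    {
        lock (streams) streams.Enqueue(stream);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        while (true)
        {
            if (current == null)
            {
                lock (streams)
                {
                    if (streams.Count == 0) return 0; // nothing left to read
                    current = streams.Dequeue();
                }
            }
            int read = current.Read(buffer, offset, count);
            if (read > 0) return read;
            current.Dispose();
            current = null; // this stream is exhausted; move to the next one
        }
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;   // NetworkStream-friendly
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Throwing NotSupportedException from Seek/Length/Position and reporting CanSeek == false is the standard Stream contract for non-seekable sources, which is exactly the constraint streaming logs impose.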
#87.2. 7/7/2018:
- So my goal tonight was to get ConcatStream working in a way that would support streaming logs
- I succeeded
- There's still work to do... I could've done it, but I started watching food videos on YouTube. (I state this only for being honest, and because the videos looked really good)
- I learned that Position and Length require CanSeek position. At least, Position is documented to need it. Length isn't, but I can't figure out you'd really use Position without knowing Length, and if Position requires CanSeek, then by my logic, Length requires CanSeek too.
- Somewhat related: somehow Seek lets you specify (final) positions less than 0 and greater than the length of the stream. It explicitly states this is allowed. Yet every implementation I see doesn't seem to allow it.
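- Python's in-memory streams happen to behave the same way for the past-the-end case, so it's easy to demo what "seek beyond the length" actually means there (the gap gets zero-filled once you write):

```python
import io

buf = io.BytesIO(b"abc")
buf.seek(10)        # well past the 3-byte end: allowed, no error
print(buf.tell())   # 10
print(buf.read())   # b'' -- nothing there to read yet
buf.write(b"z")     # writing extends the stream, zero-filling the gap
buf.seek(0)
print(buf.read())   # b'abc\x00\x00\x00\x00\x00\x00\x00z'
```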
- Onto some cool bits: I ran a few stats with the new system. Now, a key difference is that I'm using the new NullStorage type because... I tested with the largest log I have, which is around 8 GB in size. MemoryStream supports at most 2 GB buffers... so it would never support even copying this log file. This is why I needed the appendable ConcatStream and NullStorage: one so I could open the streams (which is the same case for streaming logs), and the other because I simply don't have a need for storing such massive amounts of data right now.
- Numbers:
-- Parsing (without storage): ~23 MB of RAM
-- Logs parsed: ~9 million logs
-- Log file size: ~8 GB
-- Parsing rate: ~33,500 logs per sec, or ~33.5 logs per ms
- Now, I think the interesting thing is that the parsing rate went up. "But NullStorage", yes... I'm not sure I wrote it before, but I tested beforehand with NullStorage. Take the NullStorage parsing rate and subtract ~10k/s.
- Statistics measured 50k/s parsing. I subtracted my 10k/s and a little more to get what I think is a realistic rate once storage is used.
- But it made me ask "did the change in Streams make things go faster?" and I want to say no, as using the file from the last test gave me the same results. Then why the heck was this so much faster?
- I'm actually not sure. MemoryStream is very fast and skips disk I/O, while FileStream hits disk (though I have an SSD... but that doesn't change how the data is fetched for processing purposes). Nothing really changed in how the XML parser gets the data, and yet with NullStorage, the larger log parsed at 50k/s while the smaller log parsed at 40k/s. There was a slight config change, but the per-log rate seemed the same.
- It's a mystery that I'll accept, as I don't think I'll get anything near that rate coming over SSH.
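- A quick sanity check on those numbers, plus a sketch of what a no-op sink like NullStorage boils down to (Python; the store() method here is hypothetical, not the project's actual interface):

```python
class NullStorage:
    """Hypothetical no-op sink: counts logs, stores nothing.

    This is why the parser can chew through an 8 GB file in ~23 MB
    of RAM -- parsed logs are never held, unlike a MemoryStream or
    list-backed storage.
    """
    def __init__(self):
        self.count = 0

    def store(self, log):
        self.count += 1  # drop the log on the floor

# Back-of-envelope check on the measured figures:
logs, size_bytes, rate = 9_000_000, 8 * 2**30, 33_500
print(size_bytes / logs)   # ~954 bytes per log on average
print(logs / rate / 60)    # ~4.5 minutes to parse the whole file
```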
- Tomorrow, I'd like to finish off the stream and... figure out what I want to see as a working PoC, and then start working towards that. Kinda hoping I don't have to do the DB-based storage right now, but we'll see.
#88.2. 7/8/2018:
- Finished Seek and wrote the source "optimization".
- For Seek, unlike the implementation I found, I don't produce buckets to determine progress. It reduces an aspect of complexity IMHO.
- I could probably cleanup the somewhat wacky Seek loop, but... have other things to do.
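- The bucket-free Seek idea can be sketched like this (Python; the helper name is made up, and it assumes each source happens to be seekable, which ConcatStream itself doesn't require): instead of keeping a precomputed table of cumulative lengths, just walk the sources front to back on every Seek.

```python
import io

def seek_concat(streams, offset):
    """Position a list of seekable sources at an absolute offset.

    Walks the sources front to back rather than consulting a
    precomputed table of cumulative lengths ("buckets").
    Returns (index, local_offset) of the source the offset lands in.
    """
    remaining = offset
    for i, s in enumerate(streams):
        length = s.seek(0, io.SEEK_END)  # length via seek-to-end
        if remaining < length or i == len(streams) - 1:
            s.seek(remaining)            # land inside this source
            for later in streams[i + 1:]:
                later.seek(0)            # sources after us rewind
            return i, remaining
        remaining -= length              # sources before us stay at end
    raise ValueError("no streams to seek in")

parts = [io.BytesIO(b"abc"), io.BytesIO(b"defg")]
idx, local = seek_concat(parts, 5)
print(idx, local)        # 1 2 -- offset 5 is 'f' in the second part
print(parts[1].read())   # b'fg'
```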
- I will do the actual planning tomorrow. Wanted to do it tonight, but still have a bunch of things to finish today.
#89.2. 7/9/2018:
- Little time crunch today, but I wrote up some things to try (Tasks.md) which is probably a stretch, but could be cool if I can pull it off.
#90.2. 7/10/2018:
- Looking at the source code that exists, if something gets added while the Observable is running, it will stop or indicate completion.
- So... I'll have to write my own. Do it like an extension with an argument for choosing how new data/data changes are handled.
- I ran out of time, so I'll work on this tomorrow.
#91.2. 7/11/2018:
- Ran out of time again... but I wanted to get something. I made at least the boilerplate for the extension so that ICollection can be converted to an Observable.
- ICollection is chosen because the idea is that instead of an enumeration (which errors when its contents change), we use for (int i = 0; i < collection.Count; i++).
- This will enable us to keep iterating, even if the contents change.
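- The index-loop idea, sketched in Python (observe is a hypothetical helper; in C# the payoff is dodging the enumerator's InvalidOperationException when the collection mutates): because the length is re-checked every pass, items appended mid-iteration still get delivered.

```python
def observe(collection, on_next):
    """Push each element to on_next by index, not by enumerator.

    Re-checking len(collection) on every pass means items appended
    while we iterate are still delivered -- the "live updating"
    behavior a plain enumeration can't give.
    """
    i = 0
    while i < len(collection):
        on_next(collection[i])
        i += 1

logs = ["log1", "log2"]
seen = []

def handler(item):
    seen.append(item)
    if item == "log2":
        logs.append("log3")  # mutate the source mid-iteration

observe(logs, handler)
print(seen)  # ['log1', 'log2', 'log3'] -- the appended item was picked up
```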
- I realized "but some logs run forever, so how do I do that?" and the easiest idea I had was to just have the observable never complete when it reaches the end of the collection.
- Then I went to implement it and went "maybe another time" and "what happens when I go to run unit tests?"
- FYI: this is the benefit of unit tests, as they depict how I assume people will use it. Suddenly saying "it never ends" is probably a bad idea.
- Unrelated: at the time I went "I want to work on stuff fast", I didn't assume "actually starting on work" would occur then... and still hasn't fully happened. I think I may miss a PoC on the 100 day deadline. Oh well... life of personal projects.
#92.2. 7/12/2018:
- I'm a broken record... ran out of time again. It's been a busy week and other events put me working on this later than I wanted.
- I tried to start working on an abstraction for the observable, so that I can simply write it without the IDisposable/scheduler/etc. complexity.
- I didn't get anywhere with that besides a base class that returns its "type". I'm not sure how much I really care about the type, but we'll see. It makes the function unique so it can act as an overload.
- Also, I decided the "infinite" model isn't needed. Since the LiveUpdating type will continue running even if the underlying sequence changes, a scheduler can be used to just slow things down... and then later speed them up, or something like that. No need for a special observable.
- Now... one question that came to mind: the Skip/SkipLast functions basically iterate and just don't return data. Since the current extension is for a collection, can't I just do one easy skip (a different starting index or ending test)?
- I'm not sure how I can pull that off without ending up at "you can't use System.Reactive.Linq", and that's not as easy as it might sound. Will look later...
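- What I mean by "one easy skip", as a Python sketch (hypothetical helper): Rx's Skip has to work on any observable, so it counts and discards; a collection-backed observable can instead fold Skip/SkipLast straight into the loop bounds.

```python
def observe_slice(collection, on_next, skip=0, skip_last=0):
    """Collection-aware Skip/SkipLast: adjust the loop bounds
    instead of emitting items only to discard them downstream."""
    i = skip                                # Skip: start later
    while i < len(collection) - skip_last:  # SkipLast: stop earlier
        on_next(collection[i])
        i += 1

out = []
observe_slice([10, 20, 30, 40, 50], out.append, skip=1, skip_last=2)
print(out)  # [20, 30]
```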
- I also need to determine if I can get the ListStorage to use specific schedulers...
#93.2. 7/13/2018:
- Most of today was spent trying to figure out the best way to write an observable.
- I'm currently under the assumption that in order to get the optimizations mentioned before to work, I need to actually have control of execution. Otherwise, a simple "the observable has a Subject and simply subscribes with it, then writes the collection to it" approach could be done.
- But there is an expected element to the creation, based on ToObservable: long-running and short-running schedulers.
- So when a subscription happens, if we don't want it running on "this" thread, we use a scheduler (which could still be "this" but at least it's controllable).
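- The shape I'm aiming for, as a Python sketch (all names hypothetical; Rx's real schedulers are richer): subscribing hands the "pump the collection" loop to a scheduler, so where it runs is the scheduler's choice rather than automatically being the subscriber's thread.

```python
import threading

class ImmediateScheduler:
    """Runs work on the caller's ('this') thread."""
    def schedule(self, work):
        work()

class ThreadScheduler:
    """Runs work on a new thread -- a stand-in for a
    long-running scheduler."""
    def schedule(self, work):
        t = threading.Thread(target=work)
        t.start()
        return t

class CollectionObservable:
    def __init__(self, collection, scheduler):
        self._collection = collection
        self._scheduler = scheduler

    def subscribe(self, on_next, on_completed):
        # The pump is handed to the scheduler; it may run inline or
        # on another thread, but either way it's controllable.
        def pump():
            i = 0
            while i < len(self._collection):
                on_next(self._collection[i])
                i += 1
            on_completed()
        return self._scheduler.schedule(pump)

seen = []
obs = CollectionObservable([1, 2, 3], ImmediateScheduler())
obs.subscribe(seen.append, lambda: seen.append("done"))
print(seen)  # [1, 2, 3, 'done']
```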