From a3ebee3ff8aa9f2cd4f682067e3073b3025f1738 Mon Sep 17 00:00:00 2001 From: Alex Benini Date: Fri, 5 Jan 2024 16:53:03 +0100 Subject: [PATCH 01/39] Add Snowplow Limited Use License (close #346) --- LICENSE-2.0.txt | 202 ------------------------------------------------ LICENSE.md | 57 ++++++++++++++ 2 files changed, 57 insertions(+), 202 deletions(-) delete mode 100644 LICENSE-2.0.txt create mode 100644 LICENSE.md diff --git a/LICENSE-2.0.txt b/LICENSE-2.0.txt deleted file mode 100644 index 59ec4e459..000000000 --- a/LICENSE-2.0.txt +++ /dev/null @@ -1,202 +0,0 @@ - - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. - - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. 
For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." - - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. 
The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. - - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. 
- - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "[]" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. - - Copyright 2013-2022 Snowplow Analytics Ltd - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. diff --git a/LICENSE.md b/LICENSE.md new file mode 100644 index 000000000..6abbe69c7 --- /dev/null +++ b/LICENSE.md @@ -0,0 +1,57 @@ +# Snowplow Limited Use License Agreement + +_Version 1.0, January 2024_ + +This Snowplow Limited Use License Agreement, Version 1.0 (the “Agreement”) sets forth the terms on which Snowplow Analytics, Ltd. (“Snowplow”) makes available certain software (the “Software”). BY INSTALLING, DOWNLOADING, ACCESSING, OR USING ANY OF THE SOFTWARE, YOU AGREE TO THE TERMS AND CONDITIONS OF THIS AGREEMENT. IF YOU DO NOT AGREE TO SUCH TERMS AND CONDITIONS, YOU MUST NOT USE THE SOFTWARE. IF YOU ARE RECEIVING THE SOFTWARE ON BEHALF OF A LEGAL ENTITY, YOU REPRESENT AND WARRANT THAT YOU HAVE THE ACTUAL AUTHORITY TO AGREE TO THE TERMS AND CONDITIONS OF THIS AGREEMENT ON BEHALF OF SUCH ENTITY. “Licensee” means you, an individual, or the entity on whose behalf you are receiving the Software. + +## LICENSE GRANT AND CONDITIONS + +**1.1 License.** Subject to the terms and conditions of this Agreement, Snowplow hereby grants to Licensee a non-exclusive, royalty-free, worldwide, non-transferable, non-sublicensable license during the term of this Agreement to: (a) use the Software; (b) prepare modifications and derivative works of the Software; and (c) reproduce copies of the Software (the “License”). No right to distribute or make available the Software is granted under this License. Licensee is not granted the right to, and Licensee shall not, exercise the License for any Excluded Purpose. + +**1.2** For purposes of this Agreement, an “Excluded Purpose” is any use that is either a Competing Use or a Highly-Available Production Use, or both of them. + +* **1.2.1** A “Competing Use” is making available any on-premises or distributed software product, or any software-as-a-service, platform-as-a-service, infrastructure-as-a-service, or other similar online service, that competes with any products or services that Snowplow or any of its affiliates provides using the Software. + +* **1.2.2** Highly-Available Production Use is any highly-available use, including without limitation any use where multiple instances of any Software component run concurrently to avoid a single point of failure, in a production environment, where production means use on live data. 
+ +**1.3 Conditions.** In consideration of the License, Licensee’s use of the Software is subject to the following conditions: + +* **a.** Licensee must cause any Software modified by Licensee to carry prominent notices stating that Licensee modified the Software. + +* **b.** On each Software copy, Licensee shall reproduce and not remove or alter all Snowplow or third party copyright or other proprietary notices contained in the Software, and Licensee must include the notice below on each copy. + + ``` + This software is made available by Snowplow Analytics, Ltd., + under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + located at https://docs.snowplow.io/limited-use-license-1.0 + BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + ``` + +**1.4 Licensee Modifications.** Licensee may add its own copyright notices to modifications made by Licensee. + +**1.5 No Sublicensing.** The License does not include the right to sublicense the Software, however, each recipient to which Licensee provides the Software may exercise the Licenses so long as such recipient agrees to the terms and conditions of this Agreement. + +## TERM AND TERMINATION + +This Agreement will continue unless and until earlier terminated as set forth herein. If Licensee breaches any of its conditions or obligations under this Agreement, this Agreement will terminate automatically and the License will terminate automatically and permanently. + +## INTELLECTUAL PROPERTY + +As between the parties, Snowplow will retain all right, title, and interest in the Software, and all intellectual property rights therein. Snowplow hereby reserves all rights not expressly granted to Licensee in this Agreement. Snowplow hereby reserves all rights in its trademarks and service marks, and no licenses therein are granted in this Agreement. + +## DISCLAIMER + +SNOWPLOW HEREBY DISCLAIMS ANY AND ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, AND SPECIFICALLY DISCLAIMS ANY WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, WITH RESPECT TO THE SOFTWARE. + +## LIMITATION OF LIABILITY + +SNOWPLOW WILL NOT BE LIABLE FOR ANY DAMAGES OF ANY KIND, INCLUDING BUT NOT LIMITED TO LOST PROFITS OR ANY CONSEQUENTIAL, SPECIAL, INCIDENTAL, INDIRECT, OR DIRECT DAMAGES, HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, ARISING OUT OF THIS AGREEMENT. THE FOREGOING SHALL APPLY TO THE EXTENT PERMITTED BY APPLICABLE LAW. + +## GENERAL + +**6.1 Governing Law.** This Agreement will be governed by and interpreted in accordance with the laws of the state of Delaware, without reference to its conflict of laws principles. If Licensee is located within the United States, all disputes arising out of this Agreement are subject to the exclusive jurisdiction of courts located in Delaware, USA. If Licensee is located outside of the United States, any dispute, controversy or claim arising out of or relating to this Agreement will be referred to and finally determined by arbitration in accordance with the JAMS International Arbitration Rules. The tribunal will consist of one arbitrator. The place of arbitration will be in the State of Delaware, USA. The language to be used in the arbitral proceedings will be English. Judgment upon the award rendered by the arbitrator may be entered in any court having jurisdiction thereof. + +**6.2. Assignment.** Licensee is not authorized to assign its rights under this Agreement to any third party. 
Snowplow may freely assign its rights under this Agreement to any third party. + +**6.3. Other.** This Agreement is the entire agreement between the parties regarding the subject matter hereof. No amendment or modification of this Agreement will be valid or binding upon the parties unless made in writing and signed by the duly authorized representatives of both parties. In the event that any provision, including without limitation any condition, of this Agreement is held to be unenforceable, this Agreement and all licenses and rights granted hereunder will immediately terminate. Waiver by Snowplow of a breach of any provision of this Agreement or the failure by Snowplow to exercise any right hereunder will not be construed as a waiver of any subsequent breach of that right or as a waiver of any other right. From 25968749d2baa4911d855d8da70804e8f02abefb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Piotr=20Poniedzia=C5=82ek?= Date: Thu, 27 Jul 2023 18:05:09 +0200 Subject: [PATCH 02/39] Add http4s module (close #364) --- README.md | 17 ++---- build.sbt | 42 +++++++++----- .../scalastream/it/CollectorContainer.scala | 22 +++---- .../scalastream/it/CollectorOutput.scala | 22 +++---- .../scalastream/it/EventGenerator.scala | 22 +++---- .../collectors/scalastream/it/Http.scala | 22 +++---- .../collectors/scalastream/it/utils.scala | 20 +++---- .../Collector.scala | 20 +++---- .../CollectorRoute.scala | 20 +++---- .../CollectorService.scala | 20 +++---- .../HealthService.scala | 20 +++---- .../Warmup.scala | 20 +++---- .../model.scala | 20 +++---- .../sinks/Sink.scala | 22 +++---- .../telemetry/CloudVendor.scala | 20 +++---- .../telemetry/TelemetryAkkaService.scala | 20 +++---- .../telemetry/TelemetryPayload.scala | 20 +++---- .../telemetry/package.scala | 20 +++---- .../utils/SplitBatch.scala | 18 +++--- .../CollectorRouteSpec.scala | 20 +++---- .../CollectorServiceSpec.scala | 20 +++---- .../TestSink.scala | 22 +++---- .../TestUtils.scala | 20 +++---- .../config/ConfigReaderSpec.scala | 26 +++------ .../config/ConfigSpec.scala | 26 +++------ .../utils/SplitBatchSpec.scala | 20 +++---- examples/config.kafka.extended.hocon | 19 +++--- examples/config.kinesis.extended.hocon | 19 +++--- examples/config.nsq.extended.hocon | 19 +++--- examples/config.pubsub.extended.hocon | 19 +++--- examples/config.rabbitmq.extended.hocon | 19 +++--- examples/config.sqs.extended.hocon | 24 +++----- examples/config.stdout.extended.hocon | 19 +++--- .../CollectorApp.scala | 58 +++++++++++++++++++ .../CollectorRoutes.scala | 15 +++++ .../CollectorRoutesSpec.scala | 21 +++++++ .../KafkaCollector.scala | 24 ++++---- .../sinks/KafkaSink.scala | 22 ++++--- .../KafkaConfigSpec.scala | 20 ++----- .../scalastream/it/core/CookieSpec.scala | 22 +++---- .../scalastream/it/core/CustomPathsSpec.scala | 22 +++---- .../it/core/DoNotTrackCookieSpec.scala | 20 +++---- .../it/core/HealthEndpointSpec.scala | 20 +++---- .../it/core/XForwardedForSpec.scala | 20 +++---- .../scalastream/it/kinesis/Kinesis.scala | 20 +++---- .../it/kinesis/KinesisCollectorSpec.scala | 20 +++---- .../it/kinesis/containers/Collector.scala | 20 +++---- .../it/kinesis/containers/Localstack.scala | 20 +++---- .../KinesisCollector.scala | 24 ++++---- .../sinks/KinesisSink.scala | 22 ++++--- .../sinks/KinesisConfigSpec.scala | 22 +++---- .../sinks/KinesisSinkSpec.scala | 24 ++++---- .../NsqCollector.scala | 24 ++++---- .../sinks/NsqSink.scala | 28 ++++----- .../NsqConfigSpec.scala | 20 ++----- project/BuildSettings.scala | 21 +++---- project/Dependencies.scala | 36 
+++++++----- .../scalastream/it/pubsub/Containers.scala | 20 +++---- .../it/pubsub/GooglePubSubCollectorSpec.scala | 20 +++---- .../scalastream/it/pubsub/PubSub.scala | 20 +++---- .../GooglePubSubCollector.scala | 20 +++---- .../sinks/GooglePubSubSink.scala | 18 +++--- .../PubsubConfigSpec.scala | 26 +++------ .../sinks/GcpUserAgentSpec.scala | 42 -------------- .../RabbitMQCollector.scala | 20 +++---- .../sinks/RabbitMQSink.scala | 18 +++--- .../SqsCollector.scala | 24 ++++---- .../sinks/SqsSink.scala | 22 ++++--- .../SqsConfigSpec.scala | 20 ++----- .../StdoutCollector.scala | 45 ++++---------- .../sinks/StdoutSink.scala | 40 ------------- .../StdoutConfigSpec.scala | 25 -------- 72 files changed, 675 insertions(+), 969 deletions(-) create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala create mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala delete mode 100644 pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GcpUserAgentSpec.scala delete mode 100644 stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/StdoutSink.scala delete mode 100644 stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutConfigSpec.scala diff --git a/README.md b/README.md index d2715d258..37260bfd6 100644 --- a/README.md +++ b/README.md @@ -21,16 +21,9 @@ events to [Amazon Kinesis][kinesis] and [NSQ][nsq], and is built on top of [akka ## Copyright and license -The Scala Stream Collector is copyright 2013-2022 Snowplow Analytics Ltd. +Copyright (c) 2023-present Snowplow Analytics Ltd. All rights reserved. -Licensed under the [Apache License, Version 2.0][license] (the "License"); -you may not use this software except in compliance with the License. - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. +Licensed under the [Snowplow Limited Use License Agreement][license]. _(If you are uncertain how it applies to your use case, check our answers to [frequently asked questions][faq].)_ [snowplow]: http://snowplowanalytics.com @@ -55,5 +48,7 @@ limitations under the License. [release-image]: https://img.shields.io/github/v/release/snowplow/stream-collector?sort=semver&style=flat [releases]: https://github.com/snowplow/stream-collector -[license-image]: http://img.shields.io/badge/license-Apache--2-blue.svg?style=flat -[license]: http://www.apache.org/licenses/LICENSE-2.0 +[license]: https://docs.snowplow.io/limited-use-license-1.0 +[license-image]: https://img.shields.io/badge/license-Snowplow--Limited-Use-blue.svg?style=flat + +[faq]: https://docs.snowplow.io/docs/contributing/limited-use-license-faq/ diff --git a/build.sbt b/build.sbt index 25984eb2c..586a41d42 100644 --- a/build.sbt +++ b/build.sbt @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. 
* - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ import com.typesafe.sbt.packager.docker._ import sbtbuildinfo.BuildInfoPlugin.autoImport.buildInfoPackage @@ -90,6 +86,7 @@ lazy val buildSettings = Seq( name := "snowplow-stream-collector", description := "Scala Stream Collector for Snowplow raw events", scalaVersion := "2.12.10", + scalacOptions ++= Seq("-Ypartial-unification"), javacOptions := Seq("-source", "11", "-target", "11"), resolvers ++= Dependencies.resolutionRepos ) @@ -109,7 +106,7 @@ lazy val allSettings = buildSettings ++ lazy val root = project .in(file(".")) .settings(buildSettings ++ dynVerSettings) - .aggregate(core, kinesis, pubsub, kafka, nsq, stdout, sqs, rabbitmq) + .aggregate(core, kinesis, pubsub, kafka, nsq, stdout, sqs, rabbitmq, http4s) lazy val core = project .settings(moduleName := "snowplow-stream-collector-core") @@ -119,6 +116,21 @@ lazy val core = project .settings(Defaults.itSettings) .configs(IntegrationTest) +lazy val http4s = project + .settings(moduleName := "snowplow-stream-collector-http4s-core") + .settings(buildSettings ++ BuildSettings.sbtAssemblySettings) + .settings( + libraryDependencies ++= Seq( + Dependencies.Libraries.http4sDsl, + Dependencies.Libraries.http4sEmber, + Dependencies.Libraries.http4sBlaze, + Dependencies.Libraries.http4sNetty, + Dependencies.Libraries.log4cats, + Dependencies.Libraries.slf4j, + Dependencies.Libraries.specs2 + ) + ) + lazy val kinesisSettings = allSettings ++ buildInfoSettings ++ Defaults.itSettings ++ scalifiedSettings ++ Seq( moduleName := "snowplow-stream-collector-kinesis", @@ -251,14 +263,14 @@ lazy val stdoutSettings = lazy val stdout = project .settings(stdoutSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") + .dependsOn(http4s % "test->test;compile->compile") lazy val stdoutDistroless = project .in(file("distroless/stdout")) .settings(sourceDirectory := (stdout / sourceDirectory).value) .settings(stdoutSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") + .dependsOn(http4s % "test->test;compile->compile") lazy val rabbitmqSettings = allSettings ++ buildInfoSettings ++ Seq( diff --git a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorContainer.scala b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorContainer.scala index 0ec85bd9d..344391426 100644 --- 
a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorContainer.scala +++ b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorContainer.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it @@ -20,4 +16,4 @@ case class CollectorContainer( container: GenericContainer[_], host: String, port: Int -) \ No newline at end of file +) diff --git a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala index 88f098bf4..a14ea04af 100644 --- a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala +++ b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
*/ package com.snowplowanalytics.snowplow.collectors.scalastream.it @@ -21,4 +17,4 @@ import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPa case class CollectorOutput( good: List[CollectorPayload], bad: List[BadRow] -) \ No newline at end of file +) diff --git a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala index 6f7cbdaed..e25dd11ad 100644 --- a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala +++ b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2022-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it @@ -59,4 +55,4 @@ object EventGenerator { val body = if (valid) "foo" else "a" * (maxBytes + 1) Request[IO](Method.POST, uri).withEntity(body) } -} \ No newline at end of file +} diff --git a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala index 9048a6543..2feb1dae4 100644 --- a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala +++ b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it @@ -43,4 +39,4 @@ object Http { def mkClient: Resource[IO, Client[IO]] = BlazeClientBuilder[IO](executionContext).resource -} \ No newline at end of file +} diff --git a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala index 7c370725d..bfefaafba 100644 --- a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala +++ b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2022-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Collector.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Collector.scala index 9e0363337..81e6af06d 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Collector.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Collector.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoute.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoute.scala index b86f4b07d..de818bbf6 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoute.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoute.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala index 968f9b9e2..d9b457b81 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/HealthService.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/HealthService.scala index 02759c038..54c77eee4 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/HealthService.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/HealthService.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Warmup.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Warmup.scala index d4e40b266..3a7449303 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Warmup.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Warmup.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala index 168393502..6627f274d 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/Sink.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/Sink.scala index 02157734f..00c51b959 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/Sink.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/Sink.scala @@ -1,20 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/CloudVendor.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/CloudVendor.scala index 803c1740d..00ae0d0d4 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/CloudVendor.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/CloudVendor.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.telemetry diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryAkkaService.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryAkkaService.scala index 30b2fe032..a2c9aec59 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryAkkaService.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryAkkaService.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.telemetry diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryPayload.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryPayload.scala index f72d7e998..56d766bc2 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryPayload.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryPayload.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.telemetry diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/package.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/package.scala index 3f36e0ab2..1105bf994 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/package.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/package.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatch.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatch.scala index e861ea047..7785aeabb 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatch.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatch.scala @@ -1,14 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the Apache License Version 2.0 for the specific language governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream package utils diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRouteSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRouteSpec.scala index a35b7512a..55360e093 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRouteSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRouteSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala index cc1036ac8..f4f3f3df3 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala index 9bab4ce7d..649353fbe 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala @@ -1,20 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala index 129585f44..7fc024d90 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream import scala.concurrent.duration._ diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigReaderSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigReaderSpec.scala index de61a0b30..161bde64b 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigReaderSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigReaderSpec.scala @@ -1,21 +1,13 @@ /** - * Copyright (c) 2014-2023 Snowplow Analytics Ltd. - * All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. 
+ * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream.config import pureconfig.ConfigSource diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigSpec.scala index 951bfb3b9..fb12ffa2c 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigSpec.scala @@ -1,21 +1,13 @@ /** - * Copyright (c) 2014-2022 Snowplow Analytics Ltd. - * All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream.config import com.snowplowanalytics.snowplow.collectors.scalastream.Collector diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatchSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatchSpec.scala index 122cc10c4..d3ecdd3b0 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatchSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatchSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream package utils diff --git a/examples/config.kafka.extended.hocon b/examples/config.kafka.extended.hocon index b5d0c26da..072fb28fa 100644 --- a/examples/config.kafka.extended.hocon +++ b/examples/config.kafka.extended.hocon @@ -1,15 +1,12 @@ -# Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +# Copyright (c) 2013-present Snowplow Analytics Ltd. +# All rights reserved. # -# This program is licensed to you under the Apache License Version 2.0, and -# you may not use this file except in compliance with the Apache License -# Version 2.0. You may obtain a copy of the Apache License Version 2.0 at -# http://www.apache.org/licenses/LICENSE-2.0. -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the Apache License Version 2.0 is distributed on an "AS -# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -# implied. See the Apache License Version 2.0 for the specific language -# governing permissions and limitations there under. +# This software is made available by Snowplow Analytics, Ltd., +# under the terms of the Snowplow Limited Use License Agreement, Version 1.0 +# located at https://docs.snowplow.io/limited-use-license-1.0 +# BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION +# OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + # This file (config.hocon.sample) contains a template with # configuration options for the Scala Stream Collector. diff --git a/examples/config.kinesis.extended.hocon b/examples/config.kinesis.extended.hocon index 9a54621ee..21b7b9360 100644 --- a/examples/config.kinesis.extended.hocon +++ b/examples/config.kinesis.extended.hocon @@ -1,15 +1,12 @@ -# Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +# Copyright (c) 2013-present Snowplow Analytics Ltd. +# All rights reserved. # -# This program is licensed to you under the Apache License Version 2.0, and -# you may not use this file except in compliance with the Apache License -# Version 2.0. You may obtain a copy of the Apache License Version 2.0 at -# http://www.apache.org/licenses/LICENSE-2.0. -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the Apache License Version 2.0 is distributed on an "AS -# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -# implied. See the Apache License Version 2.0 for the specific language -# governing permissions and limitations there under. +# This software is made available by Snowplow Analytics, Ltd., +# under the terms of the Snowplow Limited Use License Agreement, Version 1.0 +# located at https://docs.snowplow.io/limited-use-license-1.0 +# BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION +# OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + # This file (config.hocon.sample) contains a template with # configuration options for the Scala Stream Collector. 
diff --git a/examples/config.nsq.extended.hocon b/examples/config.nsq.extended.hocon index 36b4c005c..e4309d916 100644 --- a/examples/config.nsq.extended.hocon +++ b/examples/config.nsq.extended.hocon @@ -1,15 +1,12 @@ -# Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +# Copyright (c) 2013-present Snowplow Analytics Ltd. +# All rights reserved. # -# This program is licensed to you under the Apache License Version 2.0, and -# you may not use this file except in compliance with the Apache License -# Version 2.0. You may obtain a copy of the Apache License Version 2.0 at -# http://www.apache.org/licenses/LICENSE-2.0. -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the Apache License Version 2.0 is distributed on an "AS -# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -# implied. See the Apache License Version 2.0 for the specific language -# governing permissions and limitations there under. +# This software is made available by Snowplow Analytics, Ltd., +# under the terms of the Snowplow Limited Use License Agreement, Version 1.0 +# located at https://docs.snowplow.io/limited-use-license-1.0 +# BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION +# OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + # This file (config.hocon.sample) contains a template with # configuration options for the Scala Stream Collector. diff --git a/examples/config.pubsub.extended.hocon b/examples/config.pubsub.extended.hocon index 4dabb624f..548588a0a 100644 --- a/examples/config.pubsub.extended.hocon +++ b/examples/config.pubsub.extended.hocon @@ -1,15 +1,12 @@ -# Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +# Copyright (c) 2013-present Snowplow Analytics Ltd. +# All rights reserved. # -# This program is licensed to you under the Apache License Version 2.0, and -# you may not use this file except in compliance with the Apache License -# Version 2.0. You may obtain a copy of the Apache License Version 2.0 at -# http://www.apache.org/licenses/LICENSE-2.0. -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the Apache License Version 2.0 is distributed on an "AS -# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -# implied. See the Apache License Version 2.0 for the specific language -# governing permissions and limitations there under. +# This software is made available by Snowplow Analytics, Ltd., +# under the terms of the Snowplow Limited Use License Agreement, Version 1.0 +# located at https://docs.snowplow.io/limited-use-license-1.0 +# BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION +# OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + # This file (config.hocon.sample) contains a template with # configuration options for the Scala Stream Collector. diff --git a/examples/config.rabbitmq.extended.hocon b/examples/config.rabbitmq.extended.hocon index 04a8bbde1..ca9ded4ad 100644 --- a/examples/config.rabbitmq.extended.hocon +++ b/examples/config.rabbitmq.extended.hocon @@ -1,15 +1,12 @@ -# Copyright (c) 2022-2022 Snowplow Analytics Ltd. All rights reserved. +# Copyright (c) 2013-present Snowplow Analytics Ltd. +# All rights reserved. # -# This program is licensed to you under the Apache License Version 2.0, and -# you may not use this file except in compliance with the Apache License -# Version 2.0. 
You may obtain a copy of the Apache License Version 2.0 at -# http://www.apache.org/licenses/LICENSE-2.0. -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the Apache License Version 2.0 is distributed on an "AS -# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -# implied. See the Apache License Version 2.0 for the specific language -# governing permissions and limitations there under. +# This software is made available by Snowplow Analytics, Ltd., +# under the terms of the Snowplow Limited Use License Agreement, Version 1.0 +# located at https://docs.snowplow.io/limited-use-license-1.0 +# BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION +# OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + # This file (config.hocon.sample) contains a template with # configuration options for the Scala Stream Collector. diff --git a/examples/config.sqs.extended.hocon b/examples/config.sqs.extended.hocon index 6f127c6d2..c48a6c461 100644 --- a/examples/config.sqs.extended.hocon +++ b/examples/config.sqs.extended.hocon @@ -1,20 +1,12 @@ -# Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +# Copyright (c) 2013-present Snowplow Analytics Ltd. +# All rights reserved. # -# This program is licensed to you under the Apache License Version 2.0, and -# you may not use this file except in compliance with the Apache License -# Version 2.0. You may obtain a copy of the Apache License Version 2.0 at -# http://www.apache.org/licenses/LICENSE-2.0. -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the Apache License Version 2.0 is distributed on an "AS -# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -# implied. See the Apache License Version 2.0 for the specific language -# governing permissions and limitations there under. - -# This file (config.hocon.sample) contains a template with -# configuration options for the Scala Stream Collector. -# -# To use, copy this to 'application.conf' and modify the configuration options. +# This software is made available by Snowplow Analytics, Ltd., +# under the terms of the Snowplow Limited Use License Agreement, Version 1.0 +# located at https://docs.snowplow.io/limited-use-license-1.0 +# BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION +# OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + # 'collector' contains configuration options for the main Scala collector. collector { diff --git a/examples/config.stdout.extended.hocon b/examples/config.stdout.extended.hocon index fe4d647af..75289ae55 100644 --- a/examples/config.stdout.extended.hocon +++ b/examples/config.stdout.extended.hocon @@ -1,15 +1,12 @@ -# Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +# Copyright (c) 2013-present Snowplow Analytics Ltd. +# All rights reserved. # -# This program is licensed to you under the Apache License Version 2.0, and -# you may not use this file except in compliance with the Apache License -# Version 2.0. You may obtain a copy of the Apache License Version 2.0 at -# http://www.apache.org/licenses/LICENSE-2.0. -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the Apache License Version 2.0 is distributed on an "AS -# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or -# implied. 
See the Apache License Version 2.0 for the specific language -# governing permissions and limitations there under. +# This software is made available by Snowplow Analytics, Ltd., +# under the terms of the Snowplow Limited Use License Agreement, Version 1.0 +# located at https://docs.snowplow.io/limited-use-license-1.0 +# BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION +# OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + # This file (config.hocon.sample) contains a template with # configuration options for the Scala Stream Collector. diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala new file mode 100644 index 000000000..fbdbce6e4 --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala @@ -0,0 +1,58 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.implicits._ +import cats.effect.{ExitCode, IO} +import cats.effect.kernel.Resource +import com.comcast.ip4s.IpLiteralSyntax +import org.http4s.server.Server +import org.http4s.ember.server.EmberServerBuilder +import org.http4s.blaze.server.BlazeServerBuilder +import org.http4s.netty.server.NettyServerBuilder +import org.typelevel.log4cats.Logger +import org.typelevel.log4cats.slf4j.Slf4jLogger + +import java.net.InetSocketAddress +import scala.concurrent.duration.DurationLong + +object CollectorApp { + + implicit private def unsafeLogger: Logger[IO] = + Slf4jLogger.getLogger[IO] + + def run(): IO[ExitCode] = + buildHttpServer().use(_ => IO.never).as(ExitCode.Success) + + private def buildHttpServer(): Resource[IO, Server] = + sys.env.get("HTTP4S_BACKEND").map(_.toUpperCase()) match { + case Some("EMBER") | None => buildEmberServer + case Some("BLAZE") => buildBlazeServer + case Some("NETTY") => buildNettyServer + case Some(other) => throw new IllegalArgumentException(s"Unrecognized http4s backend $other") + } + + private def buildEmberServer = + Resource.eval(Logger[IO].info("Building ember server")) >> + EmberServerBuilder + .default[IO] + .withHost(ipv4"0.0.0.0") + .withPort(port"8080") + .withHttpApp(new CollectorRoutes[IO].value) + .withIdleTimeout(610.seconds) + .build + + private def buildBlazeServer: Resource[IO, Server] = + Resource.eval(Logger[IO].info("Building blaze server")) >> + BlazeServerBuilder[IO] + .bindSocketAddress(new InetSocketAddress(8080)) + .withHttpApp(new CollectorRoutes[IO].value) + .withIdleTimeout(610.seconds) + .resource + + private def buildNettyServer: Resource[IO, Server] = + Resource.eval(Logger[IO].info("Building netty server")) >> + NettyServerBuilder[IO] + .bindLocal(8080) + .withHttpApp(new CollectorRoutes[IO].value) + .withIdleTimeout(610.seconds) + .resource +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala new file mode 100644 index 000000000..d83973fca --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala @@ -0,0 +1,15 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.effect.Sync +import org.http4s.{HttpApp, HttpRoutes} +import org.http4s.dsl.Http4sDsl + +class CollectorRoutes[F[_]: Sync]() extends Http4sDsl[F] { + + lazy val value: HttpApp[F] = HttpRoutes + .of[F] { 
+ case GET -> Root / "health" => + Ok("ok") + } + .orNotFound +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala new file mode 100644 index 000000000..6a238bc12 --- /dev/null +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala @@ -0,0 +1,21 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.effect.IO +import cats.effect.unsafe.implicits.global +import org.http4s.implicits.http4sLiteralsSyntax +import org.http4s.{Method, Request, Status} +import org.specs2.mutable.Specification + +class CollectorRoutesSpec extends Specification { + + "Health endpoint" should { + "return OK always because collector always works" in { + val request = Request[IO](method = Method.GET, uri = uri"/health") + val response = new CollectorRoutes[IO].value.run(request).unsafeRunSync() + + response.status must beEqualTo(Status.Ok) + response.as[String].unsafeRunSync() must beEqualTo("ok") + } + } + +} diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala index 7d625e208..4d6ed1e4d 100644 --- a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala @@ -1,17 +1,13 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream import com.snowplowanalytics.snowplow.collectors.scalastream.model._ diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala index 40a8dc9d8..6e63f2cab 100644 --- a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala @@ -1,15 +1,13 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. 
- * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the Apache License Version 2.0 for the specific language governing permissions and limitations there under. - */ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks diff --git a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala index 8ef97a9a5..7bc486a72 100644 --- a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala +++ b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala @@ -1,20 +1,12 @@ /** - * Copyright (c) 2014-2022 Snowplow Analytics Ltd. + * Copyright (c) 2013-present Snowplow Analytics Ltd. * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala index ac1c1ab27..556d77f0a 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. 
* - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core @@ -259,4 +255,4 @@ class CookieSpec extends Specification with Localstack with CatsIO { } """ } -} \ No newline at end of file +} diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CustomPathsSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CustomPathsSpec.scala index 78610f161..7c69ed56e 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CustomPathsSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CustomPathsSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
*/ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core @@ -81,4 +77,4 @@ class CustomPathsSpec extends Specification with Localstack with CatsIO { } } } -} \ No newline at end of file +} diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala index 03bc31fa2..37f5b2f9c 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/HealthEndpointSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/HealthEndpointSpec.scala index 12b27a48d..9c25c834a 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/HealthEndpointSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/HealthEndpointSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/XForwardedForSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/XForwardedForSpec.scala index adba7c2fd..cd21768bf 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/XForwardedForSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/XForwardedForSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/Kinesis.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/Kinesis.scala index 793dbfeba..8b6eba662 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/Kinesis.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/Kinesis.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/KinesisCollectorSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/KinesisCollectorSpec.scala index af1878555..d606b2e36 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/KinesisCollectorSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/KinesisCollectorSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Collector.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Collector.scala index 8408dd6d6..2a5b44e37 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Collector.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Collector.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Localstack.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Localstack.scala index ed3af3698..c421753ba 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Localstack.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Localstack.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2023-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala index d16d59454..9209debc9 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala @@ -1,17 +1,13 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. 
+ * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream import java.util.concurrent.ScheduledThreadPoolExecutor diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala index 59b995faf..2c76850f1 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala @@ -1,15 +1,13 @@ -/* - * Copyright (c) 2013-2023 Snowplow Analytics Ltd. All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the Apache License Version 2.0 for the specific language governing permissions and limitations there under. - */ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala index 03f8e0be3..c2a1e8ba8 100644 --- a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala @@ -1,22 +1,14 @@ /** - * Copyright (c) 2014-2022 Snowplow Analytics Ltd. + * Copyright (c) 2013-present Snowplow Analytics Ltd. * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ -package com.snowplowanalytics.snowplow.collectors.scalastream.sinks +package com.snowplowanalytics.snowplow.collectors.scalastream import com.snowplowanalytics.snowplow.collectors.scalastream.config.ConfigSpec diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkSpec.scala b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkSpec.scala index 05cbc016a..02e1a3c0a 100644 --- a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkSpec.scala +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkSpec.scala @@ -1,17 +1,13 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala index 44bdd04f0..7a7235c4d 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala @@ -1,17 +1,13 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. 
+ * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala index cd466e441..f811755fb 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala @@ -1,21 +1,13 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. - * All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks diff --git a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala index a70ad4606..f4716c56a 100644 --- a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala +++ b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala @@ -1,20 +1,12 @@ /** - * Copyright (c) 2014-2022 Snowplow Analytics Ltd. + * Copyright (c) 2013-present Snowplow Analytics Ltd. * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
+ * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/project/BuildSettings.scala b/project/BuildSettings.scala index c150b687b..b4cc4e13d 100644 --- a/project/BuildSettings.scala +++ b/project/BuildSettings.scala @@ -1,17 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the - * Apache License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed on - * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either - * express or implied. See the Apache License Version 2.0 for the specific - * language governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ // SBT diff --git a/project/Dependencies.scala b/project/Dependencies.scala index 6cb214c79..b43e773a3 100644 --- a/project/Dependencies.scala +++ b/project/Dependencies.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
*/ import sbt._ @@ -47,12 +43,16 @@ object Dependencies { val pureconfig = "0.17.2" val akkaHttpMetrics = "1.7.1" val badRows = "2.1.1" + val log4cats = "2.6.0" // Scala (test only) val specs2 = "4.11.0" val specs2CE = "0.4.1" val testcontainers = "0.40.10" val catsRetry = "2.1.0" - val http4s = "0.21.33" + val http4s = "0.23.23" + val blaze = "0.23.15" + val http4sNetty = "0.5.9" + val http4sIT = "0.21.33" } object Libraries { @@ -86,14 +86,22 @@ object Dependencies { val akkaSlf4j = "com.typesafe.akka" %% "akka-slf4j" % V.akka val pureconfig = "com.github.pureconfig" %% "pureconfig" % V.pureconfig val akkaHttpMetrics = "fr.davit" %% "akka-http-metrics-datadog" % V.akkaHttpMetrics + val log4cats = "org.typelevel" %% "log4cats-slf4j" % V.log4cats + + //http4s + val http4sDsl = "org.http4s" %% "http4s-dsl" % V.http4s + val http4sEmber = "org.http4s" %% "http4s-ember-server" % V.http4s + val http4sBlaze = "org.http4s" %% "http4s-blaze-server" % V.blaze + val http4sNetty = "org.http4s" %% "http4s-netty-server" % V.http4sNetty + // Scala (test only) val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % Test val specs2It = "org.specs2" %% "specs2-core" % V.specs2 % IntegrationTest val specs2CEIt = "com.codecommit" %% "cats-effect-testing-specs2" % V.specs2CE % IntegrationTest val testcontainersIt = "com.dimafeng" %% "testcontainers-scala-core" % V.testcontainers % IntegrationTest val catsRetryIt = "com.github.cb372" %% "cats-retry" % V.catsRetry % IntegrationTest - val http4sClientIt = "org.http4s" %% "http4s-blaze-client" % V.http4s % IntegrationTest + val http4sClientIt = "org.http4s" %% "http4s-blaze-client" % V.http4sIT % IntegrationTest val akkaTestkit = "com.typesafe.akka" %% "akka-testkit" % V.akka % Test val akkaHttpTestkit = "com.typesafe.akka" %% "akka-http-testkit" % V.akkaHttp % Test val akkaStreamTestkit = "com.typesafe.akka" %% "akka-stream-testkit" % V.akka % Test diff --git a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/Containers.scala b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/Containers.scala index 6f91b9297..85ec55bee 100644 --- a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/Containers.scala +++ b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/Containers.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2022-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
*/ package com.snowplowanalytics.snowplow.collectors.scalastream.it.pubsub diff --git a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala index d1943eeb3..f8e2bc2ef 100644 --- a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala +++ b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2022-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.pubsub diff --git a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/PubSub.scala b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/PubSub.scala index d7ec43b0f..3bac0f273 100644 --- a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/PubSub.scala +++ b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/PubSub.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2022-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
*/ package com.snowplowanalytics.snowplow.collectors.scalastream.it.pubsub diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/GooglePubSubCollector.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/GooglePubSubCollector.scala index 0a79ee614..55938984b 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/GooglePubSubCollector.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/GooglePubSubCollector.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GooglePubSubSink.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GooglePubSubSink.scala index b2b7700eb..8d9fb2943 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GooglePubSubSink.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GooglePubSubSink.scala @@ -1,14 +1,12 @@ -/* - * Copyright (c) 2013-2023 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the Apache License Version 2.0 for the specific language governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
*/ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubsubConfigSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubsubConfigSpec.scala index 40583f94e..b9da73a19 100644 --- a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubsubConfigSpec.scala +++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubsubConfigSpec.scala @@ -1,21 +1,13 @@ /** - * Copyright (c) 2014-2022 Snowplow Analytics Ltd. - * All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream import com.snowplowanalytics.snowplow.collectors.scalastream.config.ConfigSpec diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GcpUserAgentSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GcpUserAgentSpec.scala deleted file mode 100644 index b852aadff..000000000 --- a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GcpUserAgentSpec.scala +++ /dev/null @@ -1,42 +0,0 @@ -/* - * Copyright (c) 2023 Snowplow Analytics Ltd. All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream.sinks - -import java.util.regex.Pattern - -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ - -import org.specs2.mutable.Specification - -class GcpUserAgentSpec extends Specification { - - "createUserAgent" should { - "create user agent string correctly" in { - val gcpUserAgent = GcpUserAgent(productName = "Snowplow OSS") - val resultUserAgent = GooglePubSubSink.createUserAgent(gcpUserAgent) - val expectedUserAgent = s"Snowplow OSS/collector (GPN:Snowplow;)" - - val userAgentRegex = Pattern.compile( - """(?iU)(?:[^\(\)\/]+\/[^\/]+\s+)*(?:[^\s][^\(\)\/]+\/[^\/]+\s?\([^\(\)]*)gpn:(.*)[;\)]""" - ) - val matcher = userAgentRegex.matcher(resultUserAgent) - val matched = if (matcher.find()) Some(matcher.group(1)) else None - val expectedMatched = "Snowplow;" - - resultUserAgent must beEqualTo(expectedUserAgent) - matched must beSome(expectedMatched) - } - } -} diff --git a/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/RabbitMQCollector.scala b/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/RabbitMQCollector.scala index ce7336422..2d17dc39a 100644 --- a/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/RabbitMQCollector.scala +++ b/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/RabbitMQCollector.scala @@ -1,16 +1,12 @@ -/* - * Copyright (c) 2022-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/RabbitMQSink.scala b/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/RabbitMQSink.scala index e1e9ca368..0ebf71a7d 100644 --- a/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/RabbitMQSink.scala +++ b/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/RabbitMQSink.scala @@ -1,14 +1,12 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache License Version 2.0. 
- * You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the Apache License Version 2.0 for the specific language governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala index 53c964c40..2e3bf14cf 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala @@ -1,17 +1,13 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream import java.util.concurrent.ScheduledThreadPoolExecutor diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala index b3e388ad8..6ffe57f6f 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala @@ -1,15 +1,13 @@ -/* - * Copyright (c) 2013-2023 Snowplow Analytics Ltd. All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0. 
- * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the Apache License Version 2.0 for the specific language governing permissions and limitations there under. - */ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collectors.scalastream.sinks import java.nio.ByteBuffer diff --git a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala index 690c63d44..84f955a0e 100644 --- a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala +++ b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala @@ -1,20 +1,12 @@ /** - * Copyright (c) 2014-2022 Snowplow Analytics Ltd. + * Copyright (c) 2013-present Snowplow Analytics Ltd. * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala index 4fbdb1f2c..97721fa34 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala @@ -1,40 +1,19 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. All rights reserved. +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. * - * This program is licensed to you under the Apache License Version 2.0, and - * you may not use this file except in compliance with the Apache License - * Version 2.0. You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. 
- * - * Unless required by applicable law or agreed to in writing, software - * distributed under the Apache License Version 2.0 is distributed on an "AS - * IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - * implied. See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.StdoutSink -import com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService - -object StdoutCollector extends Collector { +import cats.effect.{ExitCode, IO, IOApp} - def appName = BuildInfo.shortName - def appVersion = BuildInfo.version - def scalaVersion = BuildInfo.scalaVersion +object StdoutCollector extends IOApp { - def main(args: Array[String]): Unit = { - val (collectorConf, akkaConf) = parseConfig(args) - val telemetry = TelemetryAkkaService.initWithCollector(collectorConf, BuildInfo.moduleName, appVersion) - val sinks = { - val (good, bad) = collectorConf.streams.sink match { - case s: Stdout => (new StdoutSink(s.maxBytes, "out"), new StdoutSink(s.maxBytes, "err")) - case _ => throw new IllegalArgumentException("Configured sink is not stdout") - } - CollectorSinks(good, bad) - } - run(collectorConf, akkaConf, sinks, telemetry) - } + def run(args: List[String]): IO[ExitCode] = + CollectorApp.run() } diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/StdoutSink.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/StdoutSink.scala deleted file mode 100644 index 72a2f6c42..000000000 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/StdoutSink.scala +++ /dev/null @@ -1,40 +0,0 @@ -/* - * Copyright (c) 2013-2022 Snowplow Analytics Ltd. - * All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream -package sinks - -import org.apache.commons.codec.binary.Base64 - -class StdoutSink(val maxBytes: Int, streamName: String) extends Sink { - - // Print a Base64-encoded event. 
- override def storeRawEvents(events: List[Array[Byte]], key: String): Unit = - streamName match { - case "out" => - events.foreach { e => - println(Base64.encodeBase64String(e)) - } - case "err" => - events.foreach { e => - Console.err.println(Base64.encodeBase64String(e)) - } - } - - override def shutdown(): Unit = () -} diff --git a/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutConfigSpec.scala b/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutConfigSpec.scala deleted file mode 100644 index c25885e85..000000000 --- a/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutConfigSpec.scala +++ /dev/null @@ -1,25 +0,0 @@ -/** - * Copyright (c) 2014-2022 Snowplow Analytics Ltd. - * All rights reserved. - * - * This program is licensed to you under the Apache License Version 2.0, - * and you may not use this file except in compliance with the Apache - * License Version 2.0. - * You may obtain a copy of the Apache License Version 2.0 at - * http://www.apache.org/licenses/LICENSE-2.0. - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the Apache License Version 2.0 is distributed - * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, - * either express or implied. - * - * See the Apache License Version 2.0 for the specific language - * governing permissions and limitations there under. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import com.snowplowanalytics.snowplow.collectors.scalastream.config.ConfigSpec - -class StdoutConfigSpec extends ConfigSpec { - makeConfigTest("stdout", "", "") -} From 6f7f3bba4cf8845a41e9e4b3240ba6875a5f183d Mon Sep 17 00:00:00 2001 From: Ian Streeter Date: Wed, 2 Aug 2023 21:13:31 +0100 Subject: [PATCH 03/39] Add http4s graceful shutdown (close #365) --- .../CollectorApp.scala | 71 ++++++++++++------- .../CollectorRoutes.scala | 4 +- .../Sink.scala | 11 +++ .../CollectorRoutesSpec.scala | 3 +- .../CollectorTestUtils.scala | 13 ++++ .../StdoutCollector.scala | 28 +++++++- 6 files changed, 100 insertions(+), 30 deletions(-) create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Sink.scala create mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorTestUtils.scala diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala index fbdbce6e4..3e7f82e36 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala @@ -1,9 +1,11 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import cats.implicits._ -import cats.effect.{ExitCode, IO} +import cats.effect.{Async, ExitCode, Sync} import cats.effect.kernel.Resource +import fs2.io.net.Network import com.comcast.ip4s.IpLiteralSyntax +import org.http4s.HttpApp import org.http4s.server.Server import org.http4s.ember.server.EmberServerBuilder import org.http4s.blaze.server.BlazeServerBuilder @@ -12,47 +14,66 @@ import org.typelevel.log4cats.Logger import org.typelevel.log4cats.slf4j.Slf4jLogger import java.net.InetSocketAddress -import scala.concurrent.duration.DurationLong +import scala.concurrent.duration.{DurationLong, FiniteDuration} object CollectorApp { - 
implicit private def unsafeLogger: Logger[IO] = - Slf4jLogger.getLogger[IO] + implicit private def unsafeLogger[F[_]: Sync]: Logger[F] = + Slf4jLogger.getLogger[F] - def run(): IO[ExitCode] = - buildHttpServer().use(_ => IO.never).as(ExitCode.Success) + def run[F[_]: Async](mkGood: Resource[F, Sink[F]], mkBad: Resource[F, Sink[F]]): F[ExitCode] = { + val resources = for { + bad <- mkBad + good <- mkGood + _ <- withGracefulShutdown(610.seconds) { + buildHttpServer[F](new CollectorRoutes[F](good, bad).value) + } + } yield () - private def buildHttpServer(): Resource[IO, Server] = + resources.surround(Async[F].never[ExitCode]) + } + + private def withGracefulShutdown[F[_]: Async, A](delay: FiniteDuration)(resource: Resource[F, A]): Resource[F, A] = + for { + a <- resource + _ <- Resource.onFinalizeCase { + case Resource.ExitCase.Canceled => + Logger[F].warn(s"Shutdown interrupted. Will continue to serve requests for $delay") >> + Async[F].sleep(delay) + case _ => + Async[F].unit + } + } yield a + + private def buildHttpServer[F[_]: Async](app: HttpApp[F]): Resource[F, Server] = sys.env.get("HTTP4S_BACKEND").map(_.toUpperCase()) match { - case Some("EMBER") | None => buildEmberServer - case Some("BLAZE") => buildBlazeServer - case Some("NETTY") => buildNettyServer + case Some("EMBER") | None => buildEmberServer[F](app) + case Some("BLAZE") => buildBlazeServer[F](app) + case Some("NETTY") => buildNettyServer[F](app) case Some(other) => throw new IllegalArgumentException(s"Unrecognized http4s backend $other") } - private def buildEmberServer = - Resource.eval(Logger[IO].info("Building ember server")) >> + private def buildEmberServer[F[_]: Async](app: HttpApp[F]) = { + implicit val network = Network.forAsync[F] + Resource.eval(Logger[F].info("Building ember server")) >> EmberServerBuilder - .default[IO] + .default[F] .withHost(ipv4"0.0.0.0") .withPort(port"8080") - .withHttpApp(new CollectorRoutes[IO].value) + .withHttpApp(app) .withIdleTimeout(610.seconds) .build + } - private def buildBlazeServer: Resource[IO, Server] = - Resource.eval(Logger[IO].info("Building blaze server")) >> - BlazeServerBuilder[IO] + private def buildBlazeServer[F[_]: Async](app: HttpApp[F]): Resource[F, Server] = + Resource.eval(Logger[F].info("Building blaze server")) >> + BlazeServerBuilder[F] .bindSocketAddress(new InetSocketAddress(8080)) - .withHttpApp(new CollectorRoutes[IO].value) + .withHttpApp(app) .withIdleTimeout(610.seconds) .resource - private def buildNettyServer: Resource[IO, Server] = - Resource.eval(Logger[IO].info("Building netty server")) >> - NettyServerBuilder[IO] - .bindLocal(8080) - .withHttpApp(new CollectorRoutes[IO].value) - .withIdleTimeout(610.seconds) - .resource + private def buildNettyServer[F[_]: Async](app: HttpApp[F]): Resource[F, Server] = + Resource.eval(Logger[F].info("Building netty server")) >> + NettyServerBuilder[F].bindLocal(8080).withHttpApp(app).withIdleTimeout(610.seconds).resource } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala index d83973fca..628cb3524 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala @@ -4,7 +4,9 @@ import cats.effect.Sync import org.http4s.{HttpApp, HttpRoutes} import org.http4s.dsl.Http4sDsl -class CollectorRoutes[F[_]: Sync]() 
extends Http4sDsl[F] { +class CollectorRoutes[F[_]: Sync](good: Sink[F], bad: Sink[F]) extends Http4sDsl[F] { + + val _ = (good, bad) lazy val value: HttpApp[F] = HttpRoutes .of[F] { diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Sink.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Sink.scala new file mode 100644 index 000000000..8cdc85935 --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Sink.scala @@ -0,0 +1,11 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +trait Sink[F[_]] { + + // Maximum number of bytes that a single record can contain. + // If a record is bigger, a size violation bad row is emitted instead + val maxBytes: Int + + def isHealthy: F[Boolean] + def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala index 6a238bc12..3d4df8296 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala @@ -11,7 +11,8 @@ class CollectorRoutesSpec extends Specification { "Health endpoint" should { "return OK always because collector always works" in { val request = Request[IO](method = Method.GET, uri = uri"/health") - val response = new CollectorRoutes[IO].value.run(request).unsafeRunSync() + val routes = new CollectorRoutes[IO](CollectorTestUtils.noopSink, CollectorTestUtils.noopSink) + val response = routes.value.run(request).unsafeRunSync() response.status must beEqualTo(Status.Ok) response.as[String].unsafeRunSync() must beEqualTo("ok") diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorTestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorTestUtils.scala new file mode 100644 index 000000000..e83091692 --- /dev/null +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorTestUtils.scala @@ -0,0 +1,13 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.Applicative + +object CollectorTestUtils { + + def noopSink[F[_]: Applicative]: Sink[F] = new Sink[F] { + val maxBytes: Int = Int.MaxValue + def isHealthy: F[Boolean] = Applicative[F].pure(true) + def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = Applicative[F].unit + } + +} diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala index 97721fa34..90e520c43 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala @@ -10,10 +10,32 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream -import cats.effect.{ExitCode, IO, IOApp} +import cats.effect.{ExitCode, IO, IOApp, Sync} +import cats.effect.kernel.Resource +import cats.implicits._ + +import java.util.Base64 +import java.io.PrintStream object StdoutCollector extends IOApp { - def run(args: List[String]): IO[ExitCode] = - CollectorApp.run() + def run(args: 
List[String]): IO[ExitCode] = { + val good = Resource.pure[IO, Sink[IO]](printingSink(System.out)) + val bad = Resource.pure[IO, Sink[IO]](printingSink(System.err)) + CollectorApp.run[IO](good, bad) + } + + private def printingSink[F[_]: Sync](stream: PrintStream): Sink[F] = new Sink[F] { + val maxBytes = Int.MaxValue // TODO: configurable? + def isHealthy: F[Boolean] = Sync[F].pure(true) + + val encoder = Base64.getEncoder().withoutPadding() + + def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = + events.traverse_ { e => + Sync[F].delay { + stream.println(encoder.encodeToString(e)) + } + } + } } From d9da0e7d89bf0fb7ad04e464119df67d972a0bf4 Mon Sep 17 00:00:00 2001 From: spenes Date: Thu, 3 Aug 2023 13:32:15 +0300 Subject: [PATCH 04/39] Add http4s POST endpoint (close #366) --- build.sbt | 3 + .../CollectorApp.scala | 14 +- .../CollectorRoutes.scala | 49 +++- .../CollectorService.scala | 170 +++++++++++ .../SplitBatch.scala | 152 ++++++++++ .../model.scala | 33 +++ .../CollectorRoutesSpec.scala | 41 ++- .../CollectorServiceSpec.scala | 265 ++++++++++++++++++ .../SplitBatchSpec.scala | 144 ++++++++++ .../TestSink.scala | 20 ++ .../TestUtils.scala | 14 + .../StdoutCollector.scala | 11 +- 12 files changed, 899 insertions(+), 17 deletions(-) create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatch.scala create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala create mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala create mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatchSpec.scala create mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala create mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala diff --git a/build.sbt b/build.sbt index 586a41d42..56f075e4d 100644 --- a/build.sbt +++ b/build.sbt @@ -126,6 +126,9 @@ lazy val http4s = project Dependencies.Libraries.http4sBlaze, Dependencies.Libraries.http4sNetty, Dependencies.Libraries.log4cats, + Dependencies.Libraries.thrift, + Dependencies.Libraries.badRows, + Dependencies.Libraries.collectorPayload, Dependencies.Libraries.slf4j, Dependencies.Libraries.specs2 ) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala index 3e7f82e36..82074116d 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala @@ -16,17 +16,27 @@ import org.typelevel.log4cats.slf4j.Slf4jLogger import java.net.InetSocketAddress import scala.concurrent.duration.{DurationLong, FiniteDuration} +import com.snowplowanalytics.snowplow.collectors.scalastream.model._ + object CollectorApp { implicit private def unsafeLogger[F[_]: Sync]: Logger[F] = Slf4jLogger.getLogger[F] - def run[F[_]: Async](mkGood: Resource[F, Sink[F]], mkBad: Resource[F, Sink[F]]): F[ExitCode] = { + def run[F[_]: Async]( + mkGood: Resource[F, Sink[F]], + mkBad: Resource[F, Sink[F]], + config: CollectorConfig, + appName: String, + appVersion: String + ): F[ExitCode] = 
{ val resources = for { bad <- mkBad good <- mkGood _ <- withGracefulShutdown(610.seconds) { - buildHttpServer[F](new CollectorRoutes[F](good, bad).value) + val sinks = CollectorSinks(good, bad) + val collectorService: CollectorService[F] = new CollectorService[F](config, sinks, appName, appVersion) + buildHttpServer[F](new CollectorRoutes[F](collectorService).value) } } yield () diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala index 628cb3524..814e3a56f 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala @@ -1,17 +1,48 @@ package com.snowplowanalytics.snowplow.collectors.scalastream +import cats.implicits._ import cats.effect.Sync -import org.http4s.{HttpApp, HttpRoutes} +import org.typelevel.ci.CIString +import org.http4s.{HttpApp, HttpRoutes, Request} import org.http4s.dsl.Http4sDsl +import org.http4s.implicits._ +import com.comcast.ip4s.Dns -class CollectorRoutes[F[_]: Sync](good: Sink[F], bad: Sink[F]) extends Http4sDsl[F] { +class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDsl[F] { - val _ = (good, bad) + implicit val dns: Dns[F] = Dns.forSync[F] - lazy val value: HttpApp[F] = HttpRoutes - .of[F] { - case GET -> Root / "health" => - Ok("ok") - } - .orNotFound + private val healthRoutes = HttpRoutes.of[F] { + case GET -> Root / "health" => + Ok("ok") + } + + private val cookieRoutes = HttpRoutes.of[F] { + case req @ POST -> Root / vendor / version => + val path = collectorService.determinePath(vendor, version) + val userAgent = extractHeader(req, "User-Agent") + val referer = extractHeader(req, "Referer") + val spAnonymous = extractHeader(req, "SP-Anonymous") + + collectorService.cookie( + queryString = Some(req.queryString), + body = req.bodyText.compile.string.map(Some(_)), + path = path, + cookie = None, //TODO: cookie will be added later + userAgent = userAgent, + refererUri = referer, + hostname = req.remoteHost.map(_.map(_.toString)), + ip = req.remoteAddr.map(_.toUriString), // TODO: Do not set the ip if request contains SP-Anonymous header + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = req.contentType.map(_.value.toLowerCase), + spAnonymous = spAnonymous + ) + } + + val value: HttpApp[F] = (healthRoutes <+> cookieRoutes).orNotFound + + def extractHeader(req: Request[F], headerName: String): Option[String] = + req.headers.get(CIString(headerName)).map(_.head.value) } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala new file mode 100644 index 000000000..e652e0c49 --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala @@ -0,0 +1,170 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import java.util.UUID + +import scala.collection.JavaConverters._ + +import cats.effect.Sync +import cats.implicits._ + +import org.http4s.{Request, RequestCookie, Response} +import org.http4s.Status._ + +import org.typelevel.ci._ + +import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload + +import 
com.snowplowanalytics.snowplow.collectors.scalastream.model._ + +trait Service[F[_]] { + def cookie( + queryString: Option[String], + body: F[Option[String]], + path: String, + cookie: Option[RequestCookie], + userAgent: Option[String], + refererUri: Option[String], + hostname: F[Option[String]], + ip: Option[String], + request: Request[F], + pixelExpected: Boolean, + doNotTrack: Boolean, + contentType: Option[String] = None, + spAnonymous: Option[String] = None + ): F[Response[F]] + def determinePath(vendor: String, version: String): String +} + +class CollectorService[F[_]: Sync]( + config: CollectorConfig, + sinks: CollectorSinks[F], + appName: String, + appVersion: String +) extends Service[F] { + + // TODO: Add sink type as well + private val collector = s"$appName-$appVersion" + + private val splitBatch: SplitBatch = SplitBatch(appName, appVersion) + + def cookie( + queryString: Option[String], + body: F[Option[String]], + path: String, + cookie: Option[RequestCookie], + userAgent: Option[String], + refererUri: Option[String], + hostname: F[Option[String]], + ip: Option[String], + request: Request[F], + pixelExpected: Boolean, + doNotTrack: Boolean, + contentType: Option[String] = None, + spAnonymous: Option[String] = None + ): F[Response[F]] = + for { + body <- body + hostname <- hostname + // TODO: Get ipAsPartitionKey from config + (ipAddress, partitionKey) = ipAndPartitionKey(ip, ipAsPartitionKey = false) + // TODO: nuid should be set properly + nuid = UUID.randomUUID().toString + event = buildEvent( + queryString, + body, + path, + userAgent, + refererUri, + hostname, + ipAddress, + nuid, + contentType, + headers(request, spAnonymous) + ) + _ <- sinkEvent(event, partitionKey) + } yield buildHttpResponse + + def determinePath(vendor: String, version: String): String = { + val original = s"/$vendor/$version" + config.paths.getOrElse(original, original) + } + + /** Builds a raw event from an Http request. */ + def buildEvent( + queryString: Option[String], + body: Option[String], + path: String, + userAgent: Option[String], + refererUri: Option[String], + hostname: Option[String], + ipAddress: String, + networkUserId: String, + contentType: Option[String], + headers: List[String] + ): CollectorPayload = { + val e = new CollectorPayload( + "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0", + ipAddress, + System.currentTimeMillis, + "UTF-8", + collector + ) + queryString.foreach(e.querystring = _) + body.foreach(e.body = _) + e.path = path + userAgent.foreach(e.userAgent = _) + refererUri.foreach(e.refererUri = _) + hostname.foreach(e.hostname = _) + e.networkUserId = networkUserId + e.headers = (headers ++ contentType).asJava + contentType.foreach(e.contentType = _) + e + } + + // TODO: Handle necessary cases to build http response in here + def buildHttpResponse: Response[F] = Response(status = Ok) + + // TODO: Since Remote-Address and Raw-Request-URI is akka-specific headers, + // they aren't included in here. It might be good to search for counterparts in Http4s. + /** If the SP-Anonymous header is not present, retrieves all headers + * from the request. + * If the SP-Anonymous header is present, additionally filters out the + * X-Forwarded-For, X-Real-IP and Cookie headers as well. 
+ */ + def headers(request: Request[F], spAnonymous: Option[String]): List[String] = + request.headers.headers.flatMap { h => + h.name match { + case ci"X-Forwarded-For" | ci"X-Real-Ip" | ci"Cookie" if spAnonymous.isDefined => None + case _ => Some(h.toString()) + } + } + + /** Produces the event to the configured sink. */ + def sinkEvent( + event: CollectorPayload, + partitionKey: String + ): F[Unit] = + for { + // Split events into Good and Bad + eventSplit <- Sync[F].delay(splitBatch.splitAndSerializePayload(event, sinks.good.maxBytes)) + // Send events to respective sinks + _ <- sinks.good.storeRawEvents(eventSplit.good, partitionKey) + _ <- sinks.bad.storeRawEvents(eventSplit.bad, partitionKey) + } yield () + + /** + * Gets the IP from a RemoteAddress. If ipAsPartitionKey is false, a UUID will be generated. + * + * @param remoteAddress Address extracted from an HTTP request + * @param ipAsPartitionKey Whether to use the ip as a partition key or a random UUID + * @return a tuple of ip (unknown if it couldn't be extracted) and partition key + */ + def ipAndPartitionKey( + ipAddress: Option[String], + ipAsPartitionKey: Boolean + ): (String, String) = + ipAddress match { + case None => ("unknown", UUID.randomUUID.toString) + case Some(ip) => (ip, if (ipAsPartitionKey) ip else UUID.randomUUID.toString) + } +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatch.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatch.scala new file mode 100644 index 000000000..907adcc51 --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatch.scala @@ -0,0 +1,152 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import java.nio.ByteBuffer +import java.nio.charset.StandardCharsets.UTF_8 +import java.time.Instant +import org.apache.thrift.TSerializer + +import cats.syntax.either._ +import io.circe.Json +import io.circe.parser._ +import io.circe.syntax._ + +import com.snowplowanalytics.iglu.core._ +import com.snowplowanalytics.iglu.core.circe.CirceIgluCodecs._ +import com.snowplowanalytics.snowplow.badrows._ +import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload +import com.snowplowanalytics.snowplow.collectors.scalastream.model._ + +/** Object handling splitting an array of strings correctly */ +case class SplitBatch(appName: String, appVersion: String) { + + // Serialize Thrift CollectorPayload objects + val ThriftSerializer = new ThreadLocal[TSerializer] { + override def initialValue = new TSerializer() + } + + /** + * Split a list of strings into batches, none of them exceeding a given size + * Input strings exceeding the given size end up in the failedBigEvents field of the result + * @param input List of strings + * @param maximum No good batch can exceed this size + * @param joinSize Constant to add to the size of the string representing the additional comma + * needed to join separate event JSONs in a single array + * @return split batch containing list of good batches and list of events that were too big + */ + def split(input: List[Json], maximum: Int, joinSize: Int = 1): SplitBatchResult = { + @scala.annotation.tailrec + def iterbatch( + l: List[Json], + currentBatch: List[Json], + currentTotal: Long, + acc: List[List[Json]], + failedBigEvents: List[Json] + ): SplitBatchResult = l match { + case Nil => + currentBatch match { + case Nil => SplitBatchResult(acc, failedBigEvents) + case nonemptyBatch => 
SplitBatchResult(nonemptyBatch :: acc, failedBigEvents) + } + case h :: t => + val headSize = getSize(h.noSpaces) + if (headSize + joinSize > maximum) { + iterbatch(t, currentBatch, currentTotal, acc, h :: failedBigEvents) + } else if (headSize + currentTotal + joinSize > maximum) { + iterbatch(l, Nil, 0, currentBatch :: acc, failedBigEvents) + } else { + iterbatch(t, h :: currentBatch, headSize + currentTotal + joinSize, acc, failedBigEvents) + } + } + + iterbatch(input, Nil, 0, Nil, Nil) + } + + /** + * If the CollectorPayload is too big to fit in a single record, attempt to split it into + * multiple records. + * @param event Incoming CollectorPayload + * @return a List of Good and Bad events + */ + def splitAndSerializePayload(event: CollectorPayload, maxBytes: Int): EventSerializeResult = { + val serializer = ThriftSerializer.get() + val everythingSerialized = serializer.serialize(event) + val wholeEventBytes = getSize(everythingSerialized) + + // If the event is below the size limit, no splitting is necessary + if (wholeEventBytes < maxBytes) { + EventSerializeResult(List(everythingSerialized), Nil) + } else { + (for { + body <- Option(event.getBody).toRight("GET requests cannot be split") + children <- splitBody(body) + initialBodyDataBytes = getSize(Json.arr(children._2: _*).noSpaces) + _ <- Either.cond[String, Unit]( + wholeEventBytes - initialBodyDataBytes < maxBytes, + (), + "cannot split this POST request because event without \"data\" field is still too big" + ) + splitted = split(children._2, maxBytes - wholeEventBytes + initialBodyDataBytes) + goodSerialized = serializeBatch(serializer, event, splitted.goodBatches, children._1) + badList = splitted.failedBigEvents.map { e => + val msg = "this POST request split is still too large" + oversizedPayload(event, getSize(e), maxBytes, msg) + } + } yield EventSerializeResult(goodSerialized, badList)).fold({ msg => + val tooBigPayload = oversizedPayload(event, wholeEventBytes, maxBytes, msg) + EventSerializeResult(Nil, List(tooBigPayload)) + }, identity) + } + } + + def splitBody(body: String): Either[String, (SchemaKey, List[Json])] = + for { + json <- parse(body).leftMap(e => s"cannot split POST requests which are not json ${e.getMessage}") + sdd <- json + .as[SelfDescribingData[Json]] + .leftMap(e => s"cannot split POST requests which are not self-describing ${e.getMessage}") + array <- sdd.data.asArray.toRight("cannot split POST requests which do not contain a data array") + } yield (sdd.schema, array.toList) + + /** + * Creates a bad row while maintaining a truncation of the original payload to ease debugging. + * Keeps a tenth of the original payload. 
+ * @param event original payload + * @param size size of the oversized payload + * @param maxSize maximum size allowed + * @param msg error message + * @return the created bad rows as json + */ + private def oversizedPayload( + event: CollectorPayload, + size: Int, + maxSize: Int, + msg: String + ): Array[Byte] = + BadRow + .SizeViolation( + Processor(appName, appVersion), + Failure.SizeViolation(Instant.now(), maxSize, size, s"oversized collector payload: $msg"), + Payload.RawPayload(event.toString().take(maxSize / 10)) + ) + .compact + .getBytes(UTF_8) + + private def getSize(a: Array[Byte]): Int = ByteBuffer.wrap(a).capacity + + private def getSize(s: String): Int = getSize(s.getBytes(UTF_8)) + + private def getSize(j: Json): Int = getSize(j.noSpaces) + + private def serializeBatch( + serializer: TSerializer, + event: CollectorPayload, + batches: List[List[Json]], + schema: SchemaKey + ): List[Array[Byte]] = + batches.map { batch => + val payload = event.deepCopy() + val body = SelfDescribingData[Json](schema, Json.arr(batch: _*)) + payload.setBody(body.asJson.noSpaces) + serializer.serialize(payload) + } +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala new file mode 100644 index 000000000..24a99ae9e --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala @@ -0,0 +1,33 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import io.circe.Json + +object model { + + /** + * Case class for holding both good and + * bad sinks for the Stream Collector. + */ + final case class CollectorSinks[F[_]](good: Sink[F], bad: Sink[F]) + + /** + * Case class for holding the results of + * splitAndSerializePayload. 
+ * + * @param good All good results + * @param bad All bad results + */ + final case class EventSerializeResult(good: List[Array[Byte]], bad: List[Array[Byte]]) + + /** + * Class for the result of splitting a too-large array of events in the body of a POST request + * + * @param goodBatches List of batches of events + * @param failedBigEvents List of events that were too large + */ + final case class SplitBatchResult(goodBatches: List[List[Json]], failedBigEvents: List[Json]) + + final case class CollectorConfig( + paths: Map[String, String] + ) +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala index 3d4df8296..f59414de5 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala @@ -3,20 +3,51 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import cats.effect.IO import cats.effect.unsafe.implicits.global import org.http4s.implicits.http4sLiteralsSyntax -import org.http4s.{Method, Request, Status} +import org.http4s.{Method, Request, RequestCookie, Response, Status} +import org.http4s.Status._ +import fs2.{Stream, text} import org.specs2.mutable.Specification class CollectorRoutesSpec extends Specification { - "Health endpoint" should { - "return OK always because collector always works" in { + val collectorService = new Service[IO] { + override def cookie( + queryString: Option[String], + body: IO[Option[String]], + path: String, + cookie: Option[RequestCookie], + userAgent: Option[String], + refererUri: Option[String], + hostname: IO[Option[String]], + ip: Option[String], + request: Request[IO], + pixelExpected: Boolean, + doNotTrack: Boolean, + contentType: Option[String], + spAnonymous: Option[String] + ): IO[Response[IO]] = + IO.pure(Response(status = Ok, body = Stream.emit("cookie").through(text.utf8.encode))) + + override def determinePath(vendor: String, version: String): String = "/p1/p2" + } + val routes = new CollectorRoutes[IO](collectorService).value + + "The collector route" should { + "respond to the health route with an ok response" in { val request = Request[IO](method = Method.GET, uri = uri"/health") - val routes = new CollectorRoutes[IO](CollectorTestUtils.noopSink, CollectorTestUtils.noopSink) - val response = routes.value.run(request).unsafeRunSync() + val response = routes.run(request).unsafeRunSync() response.status must beEqualTo(Status.Ok) response.as[String].unsafeRunSync() must beEqualTo("ok") } + + "respond to the post cookie route with the cookie response" in { + val request = Request[IO](method = Method.POST, uri = uri"/p1/p2") + val response = routes.run(request).unsafeRunSync() + + response.status must beEqualTo(Status.Ok) + response.bodyText.compile.string.unsafeRunSync() must beEqualTo("cookie") + } } } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala new file mode 100644 index 000000000..92b2aa483 --- /dev/null +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala @@ -0,0 +1,265 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import 
scala.collection.JavaConverters._ +import cats.effect.IO +import cats.effect.unsafe.implicits.global +import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload +import org.http4s.{Headers, Method, Request, RequestCookie, Status} +import org.http4s.headers._ +import com.comcast.ip4s.IpAddress +import org.specs2.mutable.Specification +import com.snowplowanalytics.snowplow.collectors.scalastream.model._ +import org.apache.thrift.{TDeserializer, TSerializer} + +class CollectorServiceSpec extends Specification { + case class ProbeService(service: CollectorService[IO], good: TestSink, bad: TestSink) + + val service = new CollectorService[IO]( + config = TestUtils.testConf, + sinks = CollectorSinks[IO](new TestSink, new TestSink), + appName = "appName", + appVersion = "appVersion" + ) + val event = new CollectorPayload("iglu-schema", "ip", System.currentTimeMillis, "UTF-8", "collector") + val uuidRegex = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}".r + + def probeService(): ProbeService = { + val good = new TestSink + val bad = new TestSink + val service = new CollectorService[IO]( + config = TestUtils.testConf, + sinks = CollectorSinks[IO](good, bad), + appName = "appName", + appVersion = "appVersion" + ) + ProbeService(service, good, bad) + } + + def emptyCollectorPayload: CollectorPayload = + new CollectorPayload(null, null, System.currentTimeMillis, null, null) + + def serializer = new TSerializer() + def deserializer = new TDeserializer() + + "The collector service" should { + "cookie" in { + "respond with a 200 OK and a good row in good sink" in { + val ProbeService(service, good, bad) = probeService() + val headers = Headers( + `X-Forwarded-For`(IpAddress.fromString("127.0.0.1")), + Cookie(RequestCookie("cookie", "value")), + `Access-Control-Allow-Credentials`() + ) + val req = Request[IO]( + method = Method.POST, + headers = headers + ) + val r = service + .cookie( + queryString = Some("a=b"), + body = IO.pure(Some("b")), + path = "p", + cookie = None, + userAgent = Some("ua"), + refererUri = Some("ref"), + hostname = IO.pure(Some("h")), + ip = Some("ip"), + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = Some("image/gif"), + spAnonymous = None + ) + .unsafeRunSync() + + r.status mustEqual Status.Ok + good.storedRawEvents must have size 1 + bad.storedRawEvents must have size 0 + + val e = emptyCollectorPayload + deserializer.deserialize(e, good.storedRawEvents.head) + e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" + e.ipAddress shouldEqual "ip" + e.encoding shouldEqual "UTF-8" + e.collector shouldEqual s"appName-appVersion" + e.querystring shouldEqual "a=b" + e.body shouldEqual "b" + e.path shouldEqual "p" + e.userAgent shouldEqual "ua" + e.refererUri shouldEqual "ref" + e.hostname shouldEqual "h" + //e.networkUserId shouldEqual "nuid" //TODO: add check for nuid as well + e.headers shouldEqual List( + "X-Forwarded-For: 127.0.0.1", + "Cookie: cookie=value", + "Access-Control-Allow-Credentials: true", + "image/gif" + ).asJava + e.contentType shouldEqual "image/gif" + } + + "sink event with headers removed when spAnonymous set" in { + val ProbeService(service, good, bad) = probeService() + val headers = Headers( + `X-Forwarded-For`(IpAddress.fromString("127.0.0.1")), + Cookie(RequestCookie("cookie", "value")), + `Access-Control-Allow-Credentials`() + ) + val req = Request[IO]( + method = Method.POST, + headers = headers + ) + val r = service + .cookie( + queryString = 
Some("a=b"), + body = IO.pure(Some("b")), + path = "p", + cookie = None, + userAgent = Some("ua"), + refererUri = Some("ref"), + hostname = IO.pure(Some("h")), + ip = Some("ip"), + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = Some("image/gif"), + spAnonymous = Some("*") + ) + .unsafeRunSync() + + r.status mustEqual Status.Ok + good.storedRawEvents must have size 1 + bad.storedRawEvents must have size 0 + + val e = emptyCollectorPayload + deserializer.deserialize(e, good.storedRawEvents.head) + e.headers shouldEqual List( + "Access-Control-Allow-Credentials: true", + "image/gif" + ).asJava + } + } + + "buildEvent" in { + "fill the correct values" in { + val ct = Some("image/gif") + val headers = List("X-Forwarded-For", "X-Real-Ip") + val e = service.buildEvent( + Some("q"), + Some("b"), + "p", + Some("ua"), + Some("ref"), + Some("h"), + "ip", + "nuid", + ct, + headers + ) + e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" + e.ipAddress shouldEqual "ip" + e.encoding shouldEqual "UTF-8" + e.collector shouldEqual s"appName-appVersion" + e.querystring shouldEqual "q" + e.body shouldEqual "b" + e.path shouldEqual "p" + e.userAgent shouldEqual "ua" + e.refererUri shouldEqual "ref" + e.hostname shouldEqual "h" + e.networkUserId shouldEqual "nuid" + e.headers shouldEqual (headers ::: ct.toList).asJava + e.contentType shouldEqual ct.get + } + + "set fields to null if they aren't set" in { + val headers = List() + val e = service.buildEvent( + None, + None, + "p", + None, + None, + None, + "ip", + "nuid", + None, + headers + ) + e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" + e.ipAddress shouldEqual "ip" + e.encoding shouldEqual "UTF-8" + e.collector shouldEqual s"appName-appVersion" + e.querystring shouldEqual null + e.body shouldEqual null + e.path shouldEqual "p" + e.userAgent shouldEqual null + e.refererUri shouldEqual null + e.hostname shouldEqual null + e.networkUserId shouldEqual "nuid" + e.headers shouldEqual headers.asJava + e.contentType shouldEqual null + } + } + + "sinkEvent" in { + "send back the produced events" in { + val ProbeService(s, good, bad) = probeService() + s.sinkEvent(event, "key").unsafeRunSync() + good.storedRawEvents must have size 1 + bad.storedRawEvents must have size 0 + good.storedRawEvents.head.zip(serializer.serialize(event)).forall { case (a, b) => a mustEqual b } + } + } + + "ipAndPartitionkey" in { + "give back the ip and partition key as ip if remote address is defined" in { + val address = Some("127.0.0.1") + service.ipAndPartitionKey(address, true) shouldEqual (("127.0.0.1", "127.0.0.1")) + } + "give back the ip and a uuid as partition key if ipAsPartitionKey is false" in { + val address = Some("127.0.0.1") + val (ip, pkey) = service.ipAndPartitionKey(address, false) + ip shouldEqual "127.0.0.1" + pkey must beMatching(uuidRegex) + } + "give back unknown as ip and a random uuid as partition key if the address isn't known" in { + val (ip, pkey) = service.ipAndPartitionKey(None, true) + ip shouldEqual "unknown" + pkey must beMatching(uuidRegex) + } + } + + "determinePath" in { + val vendor = "com.acme" + val version1 = "track" + val version2 = "redirect" + val version3 = "iglu" + + "should correctly replace the path in the request if a mapping is provided" in { + val expected1 = "/com.snowplowanalytics.snowplow/tp2" + val expected2 = "/r/tp2" + val expected3 = "/com.snowplowanalytics.iglu/v1" + + service.determinePath(vendor, version1) shouldEqual 
expected1 + service.determinePath(vendor, version2) shouldEqual expected2 + service.determinePath(vendor, version3) shouldEqual expected3 + } + + "should pass on the original path if no mapping for it can be found" in { + val service = new CollectorService( + TestUtils.testConf.copy(paths = Map.empty[String, String]), + CollectorSinks(new TestSink, new TestSink), + "", + "" + ) + val expected1 = "/com.acme/track" + val expected2 = "/com.acme/redirect" + val expected3 = "/com.acme/iglu" + + service.determinePath(vendor, version1) shouldEqual expected1 + service.determinePath(vendor, version2) shouldEqual expected2 + service.determinePath(vendor, version3) shouldEqual expected3 + } + } + } +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatchSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatchSpec.scala new file mode 100644 index 000000000..84c412d06 --- /dev/null +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatchSpec.scala @@ -0,0 +1,144 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import org.apache.thrift.TDeserializer + +import io.circe.Json +import io.circe.parser._ +import io.circe.syntax._ + +import com.snowplowanalytics.iglu.core.circe.implicits._ +import com.snowplowanalytics.iglu.core.SelfDescribingData +import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload +import com.snowplowanalytics.snowplow.badrows._ +import com.snowplowanalytics.snowplow.collectors.scalastream.model.SplitBatchResult + +import org.specs2.mutable.Specification + +class SplitBatchSpec extends Specification { + val splitBatch: SplitBatch = SplitBatch("app", "version") + + "SplitBatch.split" should { + "Batch a list of strings based on size" in { + splitBatch.split(List("a", "b", "c").map(Json.fromString), 9, 1) must_== + SplitBatchResult(List(List("c"), List("b", "a")).map(_.map(Json.fromString)), Nil) + } + + "Reject only those strings which are too big" in { + splitBatch.split(List("1234567", "1", "123").map(Json.fromString), 8, 0) must_== + SplitBatchResult(List(List("123", "1").map(Json.fromString)), List("1234567").map(Json.fromString)) + } + + "Batch a long list of strings" in { + splitBatch.split( + List("123456778901", "123456789", "12345678", "1234567", "123456", "12345", "1234", "123", "12", "1") + .map(Json.fromString), + 13, + 0 + ) must_== + SplitBatchResult( + List( + List("1", "12", "123"), + List("1234", "12345"), + List("123456"), + List("1234567"), + List("12345678"), + List("123456789") + ).map(_.map(Json.fromString)), + List("123456778901").map(Json.fromString) + ) + } + } + + "SplitBatch.splitAndSerializePayload" should { + "Serialize an empty CollectorPayload" in { + val actual = splitBatch.splitAndSerializePayload(new CollectorPayload(), 100) + val target = new CollectorPayload() + new TDeserializer().deserialize(target, actual.good.head) + target must_== new CollectorPayload() + } + + "Reject an oversized GET CollectorPayload" in { + val payload = new CollectorPayload() + payload.setQuerystring("x" * 1000) + val actual = splitBatch.splitAndSerializePayload(payload, 100) + val res = parse(new String(actual.bad.head)).toOption.get + val selfDesc = SelfDescribingData.parse(res).toOption.get + val badRow = selfDesc.data.as[BadRow].toOption.get + badRow must beAnInstanceOf[BadRow.SizeViolation] + val sizeViolation = badRow.asInstanceOf[BadRow.SizeViolation] + sizeViolation.failure.maximumAllowedSizeBytes 
must_== 100 + sizeViolation.failure.actualSizeBytes must_== 1019 + sizeViolation.failure.expectation must_== "oversized collector payload: GET requests cannot be split" + sizeViolation.payload.event must_== "CollectorP" + sizeViolation.processor shouldEqual Processor("app", "version") + actual.good must_== Nil + } + + "Reject an oversized POST CollectorPayload with an unparseable body" in { + val payload = new CollectorPayload() + payload.setBody("s" * 1000) + val actual = splitBatch.splitAndSerializePayload(payload, 100) + val res = parse(new String(actual.bad.head)).toOption.get + val selfDesc = SelfDescribingData.parse(res).toOption.get + val badRow = selfDesc.data.as[BadRow].toOption.get + badRow must beAnInstanceOf[BadRow.SizeViolation] + val sizeViolation = badRow.asInstanceOf[BadRow.SizeViolation] + sizeViolation.failure.maximumAllowedSizeBytes must_== 100 + sizeViolation.failure.actualSizeBytes must_== 1019 + sizeViolation + .failure + .expectation must_== "oversized collector payload: cannot split POST requests which are not json expected json value got 'ssssss...' (line 1, column 1)" + sizeViolation.payload.event must_== "CollectorP" + sizeViolation.processor shouldEqual Processor("app", "version") + } + + "Reject an oversized POST CollectorPayload which would be oversized even without its body" in { + val payload = new CollectorPayload() + val data = Json.obj( + "schema" := Json.fromString("s"), + "data" := Json.arr( + Json.obj("e" := "se", "tv" := "js"), + Json.obj("e" := "se", "tv" := "js") + ) + ) + payload.setBody(data.noSpaces) + payload.setPath("p" * 1000) + val actual = splitBatch.splitAndSerializePayload(payload, 1000) + actual.bad.size must_== 1 + val res = parse(new String(actual.bad.head)).toOption.get + val selfDesc = SelfDescribingData.parse(res).toOption.get + val badRow = selfDesc.data.as[BadRow].toOption.get + badRow must beAnInstanceOf[BadRow.SizeViolation] + val sizeViolation = badRow.asInstanceOf[BadRow.SizeViolation] + sizeViolation.failure.maximumAllowedSizeBytes must_== 1000 + sizeViolation.failure.actualSizeBytes must_== 1091 + sizeViolation + .failure + .expectation must_== "oversized collector payload: cannot split POST requests which are not self-describing Invalid Iglu URI: s, code: INVALID_IGLUURI" + sizeViolation + .payload + .event must_== "CollectorPayload(schema:null, ipAddress:null, timestamp:0, encoding:null, collector:null, path:ppppp" + sizeViolation.processor shouldEqual Processor("app", "version") + } + + "Split a CollectorPayload with three large events and four very large events" in { + val payload = new CollectorPayload() + val data = Json.obj( + "schema" := Schemas.SizeViolation.toSchemaUri, + "data" := Json.arr( + Json.obj("e" := "se", "tv" := "x" * 600), + Json.obj("e" := "se", "tv" := "x" * 5), + Json.obj("e" := "se", "tv" := "x" * 600), + Json.obj("e" := "se", "tv" := "y" * 1000), + Json.obj("e" := "se", "tv" := "y" * 1000), + Json.obj("e" := "se", "tv" := "y" * 1000), + Json.obj("e" := "se", "tv" := "y" * 1000) + ) + ) + payload.setBody(data.noSpaces) + val actual = splitBatch.splitAndSerializePayload(payload, 1000) + actual.bad.size must_== 4 + actual.good.size must_== 2 + } + } +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala new file mode 100644 index 000000000..2c273a603 --- /dev/null +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala 
@@ -0,0 +1,20 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.effect.IO + +import scala.collection.mutable.ListBuffer + +class TestSink extends Sink[IO] { + + private val buf: ListBuffer[Array[Byte]] = ListBuffer() + + override val maxBytes: Int = Int.MaxValue + + override def isHealthy: IO[Boolean] = IO.pure(true) + + override def storeRawEvents(events: List[Array[Byte]], key: String): IO[Unit] = + IO.delay(buf ++= events) + + def storedRawEvents: List[Array[Byte]] = buf.toList + +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala new file mode 100644 index 000000000..f0adaf65a --- /dev/null +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala @@ -0,0 +1,14 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import com.snowplowanalytics.snowplow.collectors.scalastream.model.CollectorConfig + +object TestUtils { + + val testConf = CollectorConfig( + paths = Map( + "/com.acme/track" -> "/com.snowplowanalytics.snowplow/tp2", + "/com.acme/redirect" -> "/r/tp2", + "/com.acme/iglu" -> "/com.snowplowanalytics.iglu/v1" + ) + ) +} diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala index 90e520c43..bc524634a 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala @@ -17,12 +17,21 @@ import cats.implicits._ import java.util.Base64 import java.io.PrintStream +import com.snowplowanalytics.snowplow.collectors.scalastream.model._ +import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo + object StdoutCollector extends IOApp { def run(args: List[String]): IO[ExitCode] = { val good = Resource.pure[IO, Sink[IO]](printingSink(System.out)) val bad = Resource.pure[IO, Sink[IO]](printingSink(System.err)) - CollectorApp.run[IO](good, bad) + CollectorApp.run[IO]( + good, + bad, + CollectorConfig(Map.empty), + BuildInfo.shortName, + BuildInfo.version + ) } private def printingSink[F[_]: Sync](stream: PrintStream): Sink[F] = new Sink[F] { From 72f9903c872c9fd21872fc8f80061c3d7dd2a05f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Piotr=20Poniedzia=C5=82ek?= Date: Tue, 8 Aug 2023 13:58:33 +0200 Subject: [PATCH 05/39] Add test for the stdout sink (close #367) --- .../PrintingSink.scala | 31 +++++++++++++++++ .../StdoutCollector.scala | 27 +++------------ .../sinks/PrintingSinkSpec.scala | 34 +++++++++++++++++++ 3 files changed, 69 insertions(+), 23 deletions(-) create mode 100644 stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PrintingSink.scala create mode 100644 stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PrintingSinkSpec.scala diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PrintingSink.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PrintingSink.scala new file mode 100644 index 000000000..ef5e7725f --- /dev/null +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PrintingSink.scala @@ -0,0 +1,31 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. 
+ * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.effect.Sync +import cats.implicits._ + +import java.io.PrintStream +import java.util.Base64 + +class PrintingSink[F[_]: Sync](stream: PrintStream) extends Sink[F] { + private val encoder: Base64.Encoder = Base64.getEncoder.withoutPadding() + + override val maxBytes: Int = Int.MaxValue // TODO: configurable? + override def isHealthy: F[Boolean] = Sync[F].pure(true) + + override def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = + events.traverse_ { event => + Sync[F].delay { + stream.println(encoder.encodeToString(event)) + } + } +} diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala index bc524634a..7a7f3456c 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala @@ -10,21 +10,16 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream -import cats.effect.{ExitCode, IO, IOApp, Sync} import cats.effect.kernel.Resource -import cats.implicits._ - -import java.util.Base64 -import java.io.PrintStream - -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ +import cats.effect.{ExitCode, IO, IOApp} import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo +import com.snowplowanalytics.snowplow.collectors.scalastream.model._ object StdoutCollector extends IOApp { def run(args: List[String]): IO[ExitCode] = { - val good = Resource.pure[IO, Sink[IO]](printingSink(System.out)) - val bad = Resource.pure[IO, Sink[IO]](printingSink(System.err)) + val good = Resource.pure[IO, Sink[IO]](new PrintingSink[IO](System.out)) + val bad = Resource.pure[IO, Sink[IO]](new PrintingSink[IO](System.err)) CollectorApp.run[IO]( good, bad, @@ -33,18 +28,4 @@ object StdoutCollector extends IOApp { BuildInfo.version ) } - - private def printingSink[F[_]: Sync](stream: PrintStream): Sink[F] = new Sink[F] { - val maxBytes = Int.MaxValue // TODO: configurable? - def isHealthy: F[Boolean] = Sync[F].pure(true) - - val encoder = Base64.getEncoder().withoutPadding() - - def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = - events.traverse_ { e => - Sync[F].delay { - stream.println(encoder.encodeToString(e)) - } - } - } } diff --git a/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PrintingSinkSpec.scala b/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PrintingSinkSpec.scala new file mode 100644 index 000000000..e241a95ad --- /dev/null +++ b/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PrintingSinkSpec.scala @@ -0,0 +1,34 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. 
+ * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import cats.effect.IO +import cats.effect.unsafe.implicits.global +import com.snowplowanalytics.snowplow.collectors.scalastream.PrintingSink +import org.specs2.mutable.Specification + +import java.io.{ByteArrayOutputStream, PrintStream} +import java.nio.charset.StandardCharsets + +class PrintingSinkSpec extends Specification { + + "Printing sink" should { + "print provided bytes encoded as BASE64 string" in { + val baos = new ByteArrayOutputStream() + val sink = new PrintingSink[IO](new PrintStream(baos)) + val input = "Something" + + sink.storeRawEvents(List(input.getBytes(StandardCharsets.UTF_8)), "key").unsafeRunSync() + + baos.toString(StandardCharsets.UTF_8) must beEqualTo("U29tZXRoaW5n\n") // base64 of 'Something' + newline + } + } +} From 5d6de12449c124a77ea19363d111b4d31dc91687 Mon Sep 17 00:00:00 2001 From: spenes Date: Mon, 7 Aug 2023 15:26:50 +0300 Subject: [PATCH 06/39] Configure set-cookie header (close #368) --- .../CollectorService.scala | 112 ++++++- .../model.scala | 22 +- .../CollectorServiceSpec.scala | 276 +++++++++++++++++- .../TestUtils.scala | 13 +- .../StdoutCollector.scala | 15 +- 5 files changed, 428 insertions(+), 10 deletions(-) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala index e652e0c49..75cddc2e9 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala @@ -2,12 +2,15 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import java.util.UUID +import scala.concurrent.duration._ import scala.collection.JavaConverters._ -import cats.effect.Sync +import cats.effect.{Clock, Sync} import cats.implicits._ -import org.http4s.{Request, RequestCookie, Response} +import org.http4s._ +import org.http4s.headers._ +import org.http4s.implicits._ import org.http4s.Status._ import org.typelevel.ci._ @@ -81,8 +84,18 @@ class CollectorService[F[_]: Sync]( contentType, headers(request, spAnonymous) ) + now <- Clock[F].realTime + setCookie = cookieHeader( + headers = request.headers, + cookieConfig = config.cookieConfig, + networkUserId = nuid, + doNotTrack = doNotTrack, + spAnonymous = spAnonymous, + now = now + ) + responseHeaders = Headers(setCookie.toList.map(_.toRaw1)) _ <- sinkEvent(event, partitionKey) - } yield buildHttpResponse + } yield buildHttpResponse(responseHeaders) def determinePath(vendor: String, version: String): String = { val original = s"/$vendor/$version" @@ -122,7 +135,8 @@ class CollectorService[F[_]: Sync]( } // TODO: Handle necessary cases to build http response in here - def buildHttpResponse: Response[F] = Response(status = Ok) + def buildHttpResponse(headers: Headers): Response[F] = + Response(status = Ok, headers = headers) // TODO: Since Remote-Address and Raw-Request-URI is akka-specific headers, // they aren't included in here. It might be good to search for counterparts in Http4s. 
@@ -152,6 +166,96 @@ class CollectorService[F[_]: Sync]( _ <- sinks.bad.storeRawEvents(eventSplit.bad, partitionKey) } yield () + /** + * Builds a cookie header with the network user id as value. + * + * @param cookieConfig cookie configuration extracted from the collector configuration + * @param networkUserId value of the cookie + * @param doNotTrack whether do not track is enabled or not + * @return the build cookie wrapped in a header + */ + def cookieHeader( + headers: Headers, + cookieConfig: Option[CookieConfig], + networkUserId: String, + doNotTrack: Boolean, + spAnonymous: Option[String], + now: FiniteDuration + ): Option[`Set-Cookie`] = + if (doNotTrack) { + None + } else { + spAnonymous match { + case Some(_) => None + case None => + cookieConfig.map { config => + val responseCookie = ResponseCookie( + name = config.name, + content = networkUserId, + expires = Some(HttpDate.unsafeFromEpochSecond((now + config.expiration).toSeconds)), + domain = cookieDomain(headers, config.domains, config.fallbackDomain), + path = Some("/"), + sameSite = config.sameSite, + secure = config.secure, + httpOnly = config.httpOnly + ) + `Set-Cookie`(responseCookie) + } + } + } + + /** + * Determines the cookie domain to be used by inspecting the Origin header of the request + * and trying to find a match in the list of domains specified in the config file. + * + * @param headers The headers from the http request. + * @param domains The list of cookie domains from the configuration. + * @param fallbackDomain The fallback domain from the configuration. + * @return The domain to be sent back in the response, unless no cookie domains are configured. + * The Origin header may include multiple domains. The first matching domain is returned. + * If no match is found, the fallback domain is used if configured. Otherwise, the cookie domain is not set. + */ + def cookieDomain( + headers: Headers, + domains: List[String], + fallbackDomain: Option[String] + ): Option[String] = + (domains match { + case Nil => None + case _ => + val originHosts = extractHosts(headers) + domains.find(domain => originHosts.exists(validMatch(_, domain))) + }).orElse(fallbackDomain) + + /** Extracts the host names from a list of values in the request's Origin header. */ + def extractHosts(headers: Headers): List[String] = + (for { + // We can't use 'headers.get[Origin]' function in here because of the bug + // reported here: https://github.com/http4s/http4s/issues/7236 + // To circumvent the bug, we split the the Origin header value with blank char + // and parse items individually. + originSplit <- headers.get(ci"Origin").map(_.head.value.split(' ')) + parsed = originSplit.map(Origin.parse(_).toOption).toList.flatten + hosts = parsed.flatMap(extractHostFromOrigin) + } yield hosts).getOrElse(List.empty) + + private def extractHostFromOrigin(originHeader: Origin): List[String] = + originHeader match { + case Origin.Null => List.empty + case Origin.HostList(hosts) => hosts.map(_.host.value).toList + } + + /** + * Ensures a match is valid. + * We only want matches where: + * a.) the Origin host is exactly equal to the cookie domain from the config + * b.) the Origin host is a subdomain of the cookie domain from the config. + * But we want to avoid cases where the cookie domain from the config is randomly + * a substring of the Origin host, without any connection between them. + */ + def validMatch(host: String, domain: String): Boolean = + host == domain || host.endsWith("." + domain) + /** * Gets the IP from a RemoteAddress. 
If ipAsPartitionKey is false, a UUID will be generated. * diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala index 24a99ae9e..18c0b4563 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala @@ -1,5 +1,9 @@ package com.snowplowanalytics.snowplow.collectors.scalastream +import scala.concurrent.duration._ + +import org.http4s.SameSite + import io.circe.Json object model { @@ -27,7 +31,21 @@ object model { */ final case class SplitBatchResult(goodBatches: List[List[Json]], failedBigEvents: List[Json]) - final case class CollectorConfig( - paths: Map[String, String] + final case class CookieConfig( + enabled: Boolean, + name: String, + expiration: FiniteDuration, + domains: List[String], + fallbackDomain: Option[String], + secure: Boolean, + httpOnly: Boolean, + sameSite: Option[SameSite] ) + + final case class CollectorConfig( + paths: Map[String, String], + cookie: CookieConfig + ) { + val cookieConfig = if (cookie.enabled) Some(cookie) else None + } } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala index 92b2aa483..08720df71 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala @@ -1,11 +1,15 @@ package com.snowplowanalytics.snowplow.collectors.scalastream +import scala.concurrent.duration._ import scala.collection.JavaConverters._ -import cats.effect.IO +import cats.effect.{Clock, IO} import cats.effect.unsafe.implicits.global +import cats.data.NonEmptyList import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload -import org.http4s.{Headers, Method, Request, RequestCookie, Status} +import org.http4s._ import org.http4s.headers._ +import org.http4s.implicits._ +import org.typelevel.ci._ import com.comcast.ip4s.IpAddress import org.specs2.mutable.Specification import com.snowplowanalytics.snowplow.collectors.scalastream.model._ @@ -43,6 +47,26 @@ class CollectorServiceSpec extends Specification { "The collector service" should { "cookie" in { + "not set a cookie if SP-Anonymous is present" in { + val r = service + .cookie( + queryString = Some("nuid=12"), + body = IO.pure(Some("b")), + path = "p", + cookie = None, + userAgent = None, + refererUri = None, + hostname = IO.pure(Some("h")), + ip = None, + request = Request[IO](), + pixelExpected = false, + doNotTrack = false, + contentType = None, + spAnonymous = Some("*") + ) + .unsafeRunSync() + r.headers.get(ci"Set-Cookie") must beNone + } "respond with a 200 OK and a good row in good sink" in { val ProbeService(service, good, bad) = probeService() val headers = Headers( @@ -229,6 +253,254 @@ class CollectorServiceSpec extends Specification { } } + "cookieHeader" in { + val testCookieConfig = CookieConfig( + enabled = true, + name = "name", + expiration = 5.seconds, + domains = List("domain"), + fallbackDomain = None, + secure = false, + httpOnly = false, + sameSite = None + ) + val now = Clock[IO].realTime.unsafeRunSync() + + "give back a cookie header with the appropriate configuration" in { + val 
nuid = "nuid" + val conf = testCookieConfig + val Some(`Set-Cookie`(cookie)) = service.cookieHeader( + headers = Headers.empty, + cookieConfig = Some(conf), + networkUserId = nuid, + doNotTrack = false, + spAnonymous = None, + now = now + ) + + cookie.name shouldEqual conf.name + cookie.content shouldEqual nuid + cookie.domain shouldEqual None + cookie.path shouldEqual Some("/") + cookie.expires must beSome + (cookie.expires.get.toDuration - now).toMillis must beCloseTo(conf.expiration.toMillis, 1000L) + cookie.secure must beFalse + cookie.httpOnly must beFalse + cookie.extension must beEmpty + } + "give back None if no configuration is given" in { + service.cookieHeader( + headers = Headers.empty, + cookieConfig = None, + networkUserId = "nuid", + doNotTrack = false, + spAnonymous = None, + now = now + ) shouldEqual None + } + "give back None if doNoTrack is true" in { + val conf = testCookieConfig + service.cookieHeader( + headers = Headers.empty, + cookieConfig = Some(conf), + networkUserId = "nuid", + doNotTrack = true, + spAnonymous = None, + now = now + ) shouldEqual None + } + "give back None if SP-Anonymous header is present" in { + val conf = testCookieConfig + service.cookieHeader( + headers = Headers.empty, + cookieConfig = Some(conf), + networkUserId = "nuid", + doNotTrack = true, + spAnonymous = Some("*"), + now = now + ) shouldEqual None + } + "give back a cookie header with Secure, HttpOnly and SameSite=None" in { + val nuid = "nuid" + val conf = testCookieConfig.copy( + secure = true, + httpOnly = true, + sameSite = Some(SameSite.None) + ) + val Some(`Set-Cookie`(cookie)) = + service.cookieHeader( + headers = Headers.empty, + cookieConfig = Some(conf), + networkUserId = nuid, + doNotTrack = false, + spAnonymous = None, + now = now + ) + cookie.secure must beTrue + cookie.httpOnly must beTrue + cookie.sameSite must beSome(SameSite.None) + cookie.extension must beNone + service.cookieHeader( + headers = Headers.empty, + cookieConfig = Some(conf), + networkUserId = nuid, + doNotTrack = true, + spAnonymous = None, + now = now + ) shouldEqual None + } + } + + "cookieDomain" in { + val testCookieConfig = CookieConfig( + enabled = true, + name = "name", + expiration = 5.seconds, + domains = List.empty, + fallbackDomain = None, + secure = false, + httpOnly = false, + sameSite = None + ) + "not return a domain" in { + "if a list of domains is not supplied in the config and there is no fallback domain" in { + val headers = Headers.empty + val cookieConfig = testCookieConfig + service.cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual None + } + "if a list of domains is supplied in the config but the Origin request header is empty and there is no fallback domain" in { + val headers = Headers.empty + val cookieConfig = testCookieConfig.copy(domains = List("domain.com")) + service.cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual None + } + "if none of the domains in the request's Origin header has a match in the list of domains supplied with the config and there is no fallback domain" in { + val origin: Origin = Origin.HostList( + NonEmptyList.of( + Origin.Host(scheme = Uri.Scheme.http, host = Uri.Host.unsafeFromString("origin.com")), + Origin + .Host(scheme = Uri.Scheme.http, host = Uri.Host.unsafeFromString("otherorigin.com"), port = Some(8080)) + ) + ) + val headers = Headers(origin.toRaw1) + val cookieConfig = testCookieConfig.copy( + domains = List("domain.com", "otherdomain.com") + ) + 
service.cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual None + } + } + "return the fallback domain" in { + "if a list of domains is not supplied in the config but a fallback domain is configured" in { + val headers = Headers.empty + val cookieConfig = testCookieConfig.copy( + fallbackDomain = Some("fallbackDomain") + ) + service.cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual Some( + "fallbackDomain" + ) + } + "if the Origin header is empty and a fallback domain is configured" in { + val headers = Headers.empty + val cookieConfig = testCookieConfig.copy( + domains = List("domain.com"), + fallbackDomain = Some("fallbackDomain") + ) + service.cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual Some( + "fallbackDomain" + ) + } + "if none of the domains in the request's Origin header has a match in the list of domains supplied with the config but a fallback domain is configured" in { + val origin: Origin = Origin.HostList( + NonEmptyList.of( + Origin.Host(scheme = Uri.Scheme.http, host = Uri.Host.unsafeFromString("origin.com")), + Origin + .Host(scheme = Uri.Scheme.http, host = Uri.Host.unsafeFromString("otherorigin.com"), port = Some(8080)) + ) + ) + val headers = Headers(origin.toRaw1) + val cookieConfig = testCookieConfig.copy( + domains = List("domain.com", "otherdomain.com"), + fallbackDomain = Some("fallbackDomain") + ) + service.cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual Some( + "fallbackDomain" + ) + } + } + "return the matched domain" in { + "if there is only one domain in the request's Origin header and it matches in the list of domains supplied with the config" in { + val origin: Origin = Origin.HostList( + NonEmptyList.of( + Origin.Host(scheme = Uri.Scheme.http, host = Uri.Host.unsafeFromString("www.domain.com")) + ) + ) + val headers = Headers(origin.toRaw1) + val cookieConfig = testCookieConfig.copy( + domains = List("domain.com", "otherdomain.com"), + fallbackDomain = Some("fallbackDomain") + ) + service.cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual Some( + "domain.com" + ) + } + "if multiple domains from the request's Origin header have matches in the list of domains supplied with the config" in { + val origin: Origin = Origin.HostList( + NonEmptyList.of( + Origin.Host(scheme = Uri.Scheme.http, host = Uri.Host.unsafeFromString("www.domain2.com")), + Origin.Host(scheme = Uri.Scheme.http, host = Uri.Host.unsafeFromString("www.domain.com")), + Origin.Host( + scheme = Uri.Scheme.http, + host = Uri.Host.unsafeFromString("www.otherdomain.com"), + port = Some(8080) + ) + ) + ) + val headers = Headers(origin.toRaw1) + val cookieConfig = testCookieConfig.copy( + domains = List("domain.com", "otherdomain.com"), + fallbackDomain = Some("fallbackDomain") + ) + service.cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual Some( + "domain.com" + ) + } + } + } + + "extractHosts" in { + "correctly extract the host names from a list of values in the request's Origin header" in { + val origin: Origin = Origin.HostList( + NonEmptyList.of( + Origin.Host(scheme = Uri.Scheme.https, host = Uri.Host.unsafeFromString("origin.com")), + Origin.Host( + scheme = Uri.Scheme.http, + host = Uri.Host.unsafeFromString("subdomain.otherorigin.gov.co.uk"), + port = Some(8080) + ) + ) + ) + val headers = Headers(origin.toRaw1) + service.extractHosts(headers) shouldEqual Seq("origin.com", 
"subdomain.otherorigin.gov.co.uk") + } + } + + "validMatch" in { + val domain = "snplow.com" + "true for valid matches" in { + val validHost1 = "snplow.com" + val validHost2 = "blog.snplow.com" + val validHost3 = "blog.snplow.com.snplow.com" + service.validMatch(validHost1, domain) shouldEqual true + service.validMatch(validHost2, domain) shouldEqual true + service.validMatch(validHost3, domain) shouldEqual true + } + "false for invalid matches" in { + val invalidHost1 = "notsnplow.com" + val invalidHost2 = "blog.snplow.comsnplow.com" + service.validMatch(invalidHost1, domain) shouldEqual false + service.validMatch(invalidHost2, domain) shouldEqual false + } + } + "determinePath" in { val vendor = "com.acme" val version1 = "track" diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala index f0adaf65a..a60a79c0a 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala @@ -1,6 +1,7 @@ package com.snowplowanalytics.snowplow.collectors.scalastream -import com.snowplowanalytics.snowplow.collectors.scalastream.model.CollectorConfig +import scala.concurrent.duration._ +import com.snowplowanalytics.snowplow.collectors.scalastream.model._ object TestUtils { @@ -9,6 +10,16 @@ object TestUtils { "/com.acme/track" -> "/com.snowplowanalytics.snowplow/tp2", "/com.acme/redirect" -> "/r/tp2", "/com.acme/iglu" -> "/com.snowplowanalytics.iglu/v1" + ), + cookie = CookieConfig( + enabled = true, + name = "sp", + expiration = 365.days, + domains = List.empty, + fallbackDomain = None, + secure = false, + httpOnly = false, + sameSite = None ) ) } diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala index 7a7f3456c..c70cdfd4b 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala @@ -10,6 +10,7 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream +import scala.concurrent.duration._ import cats.effect.kernel.Resource import cats.effect.{ExitCode, IO, IOApp} import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo @@ -23,7 +24,19 @@ object StdoutCollector extends IOApp { CollectorApp.run[IO]( good, bad, - CollectorConfig(Map.empty), + CollectorConfig( + Map.empty, + cookie = CookieConfig( + enabled = true, + name = "sp", + expiration = 365.days, + domains = List.empty, + fallbackDomain = None, + secure = false, + httpOnly = false, + sameSite = None + ) + ), BuildInfo.shortName, BuildInfo.version ) From 4eb079b68b591bb40a7d67d64dc04da141b5983b Mon Sep 17 00:00:00 2001 From: spenes Date: Wed, 9 Aug 2023 16:51:36 +0300 Subject: [PATCH 07/39] Add http4s GET and HEAD endpoints (close #369) --- .../CollectorRoutes.scala | 53 +++++--- .../CollectorService.scala | 47 ++++++- .../CollectorRoutesSpec.scala | 128 ++++++++++++++++-- .../CollectorServiceSpec.scala | 43 ++++++ 4 files changed, 241 insertions(+), 30 deletions(-) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala 
b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala index 814e3a56f..7ccdde5e3 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala @@ -18,27 +18,48 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs } private val cookieRoutes = HttpRoutes.of[F] { - case req @ POST -> Root / vendor / version => + case req @ (POST | GET | HEAD) -> Root / vendor / version => val path = collectorService.determinePath(vendor, version) val userAgent = extractHeader(req, "User-Agent") val referer = extractHeader(req, "Referer") val spAnonymous = extractHeader(req, "SP-Anonymous") + val hostname = req.remoteHost.map(_.map(_.toString)) + val ip = req.remoteAddr.map(_.toUriString) - collectorService.cookie( - queryString = Some(req.queryString), - body = req.bodyText.compile.string.map(Some(_)), - path = path, - cookie = None, //TODO: cookie will be added later - userAgent = userAgent, - refererUri = referer, - hostname = req.remoteHost.map(_.map(_.toString)), - ip = req.remoteAddr.map(_.toUriString), // TODO: Do not set the ip if request contains SP-Anonymous header - request = req, - pixelExpected = false, - doNotTrack = false, - contentType = req.contentType.map(_.value.toLowerCase), - spAnonymous = spAnonymous - ) + req.method match { + case POST => + collectorService.cookie( + queryString = Some(req.queryString), + body = req.bodyText.compile.string.map(Some(_)), + path = path, + cookie = None, //TODO: cookie will be added later + userAgent = userAgent, + refererUri = referer, + hostname = hostname, + ip = ip, // TODO: Do not set the ip if request contains SP-Anonymous header + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = req.contentType.map(_.value.toLowerCase), + spAnonymous = spAnonymous + ) + case GET | HEAD => + collectorService.cookie( + queryString = Some(req.queryString), + body = Sync[F].delay(None), + path = path, + cookie = None, //TODO: cookie will be added later + userAgent = userAgent, + refererUri = referer, + hostname = hostname, + ip = ip, // TODO: Do not set the ip if request contains SP-Anonymous header + request = req, + pixelExpected = true, + doNotTrack = false, + contentType = None, + spAnonymous = spAnonymous + ) + } } val value: HttpApp[F] = (healthRoutes <+> cookieRoutes).orNotFound diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala index 75cddc2e9..16316a167 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala @@ -2,12 +2,16 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import java.util.UUID +import org.apache.commons.codec.binary.Base64 + import scala.concurrent.duration._ import scala.collection.JavaConverters._ import cats.effect.{Clock, Sync} import cats.implicits._ +import fs2.Stream + import org.http4s._ import org.http4s.headers._ import org.http4s.implicits._ @@ -38,6 +42,11 @@ trait Service[F[_]] { def determinePath(vendor: String, version: String): String } +object CollectorService { + // Contains an invisible pixel to return for `/i` requests. 
+ val pixel = Base64.decodeBase64("R0lGODlhAQABAPAAAP///wAAACH5BAUAAAAALAAAAAABAAEAAAICRAEAOw==") +} + class CollectorService[F[_]: Sync]( config: CollectorConfig, sinks: CollectorSinks[F], @@ -45,6 +54,8 @@ class CollectorService[F[_]: Sync]( appVersion: String ) extends Service[F] { + val pixelStream = Stream.iterable[F, Byte](CollectorService.pixel) + // TODO: Add sink type as well private val collector = s"$appName-$appVersion" @@ -70,8 +81,7 @@ class CollectorService[F[_]: Sync]( hostname <- hostname // TODO: Get ipAsPartitionKey from config (ipAddress, partitionKey) = ipAndPartitionKey(ip, ipAsPartitionKey = false) - // TODO: nuid should be set properly - nuid = UUID.randomUUID().toString + nuid = UUID.randomUUID().toString // TODO: nuid should be set properly event = buildEvent( queryString, body, @@ -93,9 +103,13 @@ class CollectorService[F[_]: Sync]( spAnonymous = spAnonymous, now = now ) - responseHeaders = Headers(setCookie.toList.map(_.toRaw1)) + headerList = List( + setCookie.map(_.toRaw1), + cacheControl(pixelExpected).map(_.toRaw1) + ).flatten + responseHeaders = Headers(headerList) _ <- sinkEvent(event, partitionKey) - } yield buildHttpResponse(responseHeaders) + } yield buildHttpResponse(responseHeaders, pixelExpected) def determinePath(vendor: String, version: String): String = { val original = s"/$vendor/$version" @@ -135,8 +149,23 @@ class CollectorService[F[_]: Sync]( } // TODO: Handle necessary cases to build http response in here - def buildHttpResponse(headers: Headers): Response[F] = - Response(status = Ok, headers = headers) + def buildHttpResponse( + headers: Headers, + pixelExpected: Boolean + ): Response[F] = + pixelExpected match { + case true => + Response[F]( + headers = headers.put(`Content-Type`(MediaType.image.gif)), + body = pixelStream + ) + // See https://github.com/snowplow/snowplow-javascript-tracker/issues/482 + case false => + Response[F]( + status = Ok, + headers = headers + ).withEntity("ok") + } // TODO: Since Remote-Address and Raw-Request-URI is akka-specific headers, // they aren't included in here. It might be good to search for counterparts in Http4s. @@ -153,6 +182,12 @@ class CollectorService[F[_]: Sync]( } } + /** If the pixel is requested, this attaches cache control headers to the response to prevent any caching. */ + def cacheControl(pixelExpected: Boolean): Option[`Cache-Control`] = + if (pixelExpected) + Some(`Cache-Control`(CacheDirective.`no-cache`(), CacheDirective.`no-store`, CacheDirective.`must-revalidate`)) + else None + /** Produces the event to the configured sink. 
*/ def sinkEvent( event: CollectorPayload, diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala index f59414de5..da3a833fb 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala @@ -1,16 +1,41 @@ package com.snowplowanalytics.snowplow.collectors.scalastream +import scala.collection.mutable.ListBuffer import cats.effect.IO import cats.effect.unsafe.implicits.global -import org.http4s.implicits.http4sLiteralsSyntax -import org.http4s.{Method, Request, RequestCookie, Response, Status} +import com.comcast.ip4s.SocketAddress +import org.http4s.implicits._ +import org.http4s._ +import org.http4s.headers._ import org.http4s.Status._ import fs2.{Stream, text} +import org.typelevel.ci._ import org.specs2.mutable.Specification class CollectorRoutesSpec extends Specification { - val collectorService = new Service[IO] { + case class CookieParams( + queryString: Option[String], + body: IO[Option[String]], + path: String, + cookie: Option[RequestCookie], + userAgent: Option[String], + refererUri: Option[String], + hostname: IO[Option[String]], + ip: Option[String], + request: Request[IO], + pixelExpected: Boolean, + doNotTrack: Boolean, + contentType: Option[String], + spAnonymous: Option[String] + ) + + class TestService() extends Service[IO] { + + private val cookieCalls: ListBuffer[CookieParams] = ListBuffer() + + def getCookieCalls: List[CookieParams] = cookieCalls.toList + override def cookie( queryString: Option[String], body: IO[Option[String]], @@ -26,28 +51,115 @@ class CollectorRoutesSpec extends Specification { contentType: Option[String], spAnonymous: Option[String] ): IO[Response[IO]] = - IO.pure(Response(status = Ok, body = Stream.emit("cookie").through(text.utf8.encode))) + IO.delay { + cookieCalls += CookieParams( + queryString, + body, + path, + cookie, + userAgent, + refererUri, + hostname, + ip, + request, + pixelExpected, + doNotTrack, + contentType, + spAnonymous + ) + Response(status = Ok, body = Stream.emit("cookie").through(text.utf8.encode)) + } override def determinePath(vendor: String, version: String): String = "/p1/p2" } - val routes = new CollectorRoutes[IO](collectorService).value + + val testConnection = Request.Connection( + local = SocketAddress.fromStringIp("127.0.0.1:80").get, + remote = SocketAddress.fromStringIp("127.0.0.1:80").get, + secure = false + ) + + val testHeaders = Headers( + `User-Agent`(ProductId("testUserAgent")), + Referer(Uri.unsafeFromString("example.com")), + Header.Raw(ci"SP-Anonymous", "*"), + `Content-Type`(MediaType.application.json) + ) + + def createTestServices = { + val collectorService = new TestService() + val routes = new CollectorRoutes[IO](collectorService).value + (collectorService, routes) + } "The collector route" should { "respond to the health route with an ok response" in { - val request = Request[IO](method = Method.GET, uri = uri"/health") - val response = routes.run(request).unsafeRunSync() + val (_, routes) = createTestServices + val request = Request[IO](method = Method.GET, uri = uri"/health") + val response = routes.run(request).unsafeRunSync() response.status must beEqualTo(Status.Ok) response.as[String].unsafeRunSync() must beEqualTo("ok") } "respond to the post cookie route with the cookie 
response" in { - val request = Request[IO](method = Method.POST, uri = uri"/p1/p2") + val (collectorService, routes) = createTestServices + + val request = Request[IO](method = Method.POST, uri = uri"/p3/p4?a=b&c=d") + .withAttribute(Request.Keys.ConnectionInfo, testConnection) + .withEntity("testBody") + .withHeaders(testHeaders) val response = routes.run(request).unsafeRunSync() + val List(cookieParams) = collectorService.getCookieCalls + cookieParams.queryString shouldEqual Some("a=b&c=d") + cookieParams.body.unsafeRunSync() shouldEqual Some("testBody") + cookieParams.path shouldEqual "/p1/p2" + cookieParams.cookie shouldEqual None + cookieParams.userAgent shouldEqual Some("testUserAgent") + cookieParams.refererUri shouldEqual Some("example.com") + cookieParams.hostname.unsafeRunSync() shouldEqual Some("localhost") + cookieParams.ip shouldEqual Some("127.0.0.1") + cookieParams.pixelExpected shouldEqual false + cookieParams.doNotTrack shouldEqual false + cookieParams.contentType shouldEqual Some("application/json") + cookieParams.spAnonymous shouldEqual Some("*") + response.status must beEqualTo(Status.Ok) response.bodyText.compile.string.unsafeRunSync() must beEqualTo("cookie") } + + "respond to the get or head cookie route with the cookie response" in { + def getHeadTest(method: Method) = { + val (collectorService, routes) = createTestServices + + val request = Request[IO](method = method, uri = uri"/p3/p4?a=b&c=d") + .withAttribute(Request.Keys.ConnectionInfo, testConnection) + .withEntity("testBody") + .withHeaders(testHeaders) + val response = routes.run(request).unsafeRunSync() + + val List(cookieParams) = collectorService.getCookieCalls + cookieParams.queryString shouldEqual Some("a=b&c=d") + cookieParams.body.unsafeRunSync() shouldEqual None + cookieParams.path shouldEqual "/p1/p2" + cookieParams.cookie shouldEqual None + cookieParams.userAgent shouldEqual Some("testUserAgent") + cookieParams.refererUri shouldEqual Some("example.com") + cookieParams.hostname.unsafeRunSync() shouldEqual Some("localhost") + cookieParams.ip shouldEqual Some("127.0.0.1") + cookieParams.pixelExpected shouldEqual true + cookieParams.doNotTrack shouldEqual false + cookieParams.contentType shouldEqual None + cookieParams.spAnonymous shouldEqual Some("*") + + response.status must beEqualTo(Status.Ok) + response.bodyText.compile.string.unsafeRunSync() must beEqualTo("cookie") + } + + getHeadTest(Method.GET) + getHeadTest(Method.HEAD) + } } } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala index 08720df71..561f18fe0 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala @@ -26,6 +26,11 @@ class CollectorServiceSpec extends Specification { ) val event = new CollectorPayload("iglu-schema", "ip", System.currentTimeMillis, "UTF-8", "collector") val uuidRegex = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}".r + val hs = Headers( + `X-Forwarded-For`(IpAddress.fromString("127.0.0.1")), + Cookie(RequestCookie("cookie", "value")), + `Access-Control-Allow-Credentials`() + ) def probeService(): ProbeService = { val good = new TestSink @@ -162,6 +167,30 @@ class CollectorServiceSpec extends Specification { "image/gif" ).asJava } + + "return necessary cache control 
headers and respond with pixel when pixelExpected is true" in { + val r = service + .cookie( + queryString = Some("nuid=12"), + body = IO.pure(Some("b")), + path = "p", + cookie = None, + userAgent = None, + refererUri = None, + hostname = IO.pure(Some("h")), + ip = None, + request = Request[IO](), + pixelExpected = true, + doNotTrack = false, + contentType = None, + spAnonymous = Some("*") + ) + .unsafeRunSync() + r.headers.get[`Cache-Control`] shouldEqual Some( + `Cache-Control`(CacheDirective.`no-cache`(), CacheDirective.`no-store`, CacheDirective.`must-revalidate`) + ) + r.body.compile.toList.unsafeRunSync().toArray shouldEqual CollectorService.pixel + } } "buildEvent" in { @@ -235,6 +264,20 @@ class CollectorServiceSpec extends Specification { } } + "buildHttpResponse" in { + "send back a gif if pixelExpected is true" in { + val res = service.buildHttpResponse(hs, pixelExpected = true) + res.status shouldEqual Status.Ok + res.headers shouldEqual hs.put(`Content-Type`(MediaType.image.gif)) + res.body.compile.toList.unsafeRunSync().toArray shouldEqual CollectorService.pixel + } + "send back ok otherwise" in { + val res = service.buildHttpResponse(hs, pixelExpected = false) + res.status shouldEqual Status.Ok + res.bodyText.compile.toList.unsafeRunSync() shouldEqual List("ok") + } + } + "ipAndPartitionkey" in { "give back the ip and partition key as ip if remote address is defined" in { val address = Some("127.0.0.1") From 338cfad610f5c193f4fdf25543835822f910214d Mon Sep 17 00:00:00 2001 From: spenes Date: Thu, 10 Aug 2023 16:40:02 +0300 Subject: [PATCH 08/39] Add http4s pixel endpoint (close #370) --- .../CollectorRoutes.scala | 82 +++++++---------- .../CollectorService.scala | 35 +++---- .../CollectorRoutesSpec.scala | 92 +++++++------------ .../CollectorServiceSpec.scala | 87 ++++++++---------- 4 files changed, 123 insertions(+), 173 deletions(-) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala index 7ccdde5e3..0158b17fc 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala @@ -2,8 +2,7 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import cats.implicits._ import cats.effect.Sync -import org.typelevel.ci.CIString -import org.http4s.{HttpApp, HttpRoutes, Request} +import org.http4s.{HttpApp, HttpRoutes} import org.http4s.dsl.Http4sDsl import org.http4s.implicits._ import com.comcast.ip4s.Dns @@ -18,52 +17,41 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs } private val cookieRoutes = HttpRoutes.of[F] { - case req @ (POST | GET | HEAD) -> Root / vendor / version => - val path = collectorService.determinePath(vendor, version) - val userAgent = extractHeader(req, "User-Agent") - val referer = extractHeader(req, "Referer") - val spAnonymous = extractHeader(req, "SP-Anonymous") - val hostname = req.remoteHost.map(_.map(_.toString)) - val ip = req.remoteAddr.map(_.toUriString) - - req.method match { - case POST => - collectorService.cookie( - queryString = Some(req.queryString), - body = req.bodyText.compile.string.map(Some(_)), - path = path, - cookie = None, //TODO: cookie will be added later - userAgent = userAgent, - refererUri = referer, - hostname = hostname, - ip = ip, // TODO: Do not set the ip if request contains 
SP-Anonymous header - request = req, - pixelExpected = false, - doNotTrack = false, - contentType = req.contentType.map(_.value.toLowerCase), - spAnonymous = spAnonymous - ) - case GET | HEAD => - collectorService.cookie( - queryString = Some(req.queryString), - body = Sync[F].delay(None), - path = path, - cookie = None, //TODO: cookie will be added later - userAgent = userAgent, - refererUri = referer, - hostname = hostname, - ip = ip, // TODO: Do not set the ip if request contains SP-Anonymous header - request = req, - pixelExpected = true, - doNotTrack = false, - contentType = None, - spAnonymous = spAnonymous - ) - } + case req @ POST -> Root / vendor / version => + val path = collectorService.determinePath(vendor, version) + collectorService.cookie( + body = req.bodyText.compile.string.map(Some(_)), + path = path, + cookie = None, //TODO: cookie will be added later + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = req.contentType.map(_.value.toLowerCase) + ) + + case req @ (GET | HEAD) -> Root / vendor / version => + val path = collectorService.determinePath(vendor, version) + collectorService.cookie( + body = Sync[F].pure(None), + path = path, + cookie = None, //TODO: cookie will be added later + request = req, + pixelExpected = true, + doNotTrack = false, + contentType = None + ) + + case req @ (GET | HEAD) -> Root / ("ice.png" | "i") => + collectorService.cookie( + body = Sync[F].pure(None), + path = req.pathInfo.renderString, + cookie = None, //TODO: cookie will be added later + request = req, + pixelExpected = true, + doNotTrack = false, + contentType = None + ) } val value: HttpApp[F] = (healthRoutes <+> cookieRoutes).orNotFound - - def extractHeader(req: Request[F], headerName: String): Option[String] = - req.headers.get(CIString(headerName)).map(_.head.value) } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala index 16316a167..be9bf6dc2 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala @@ -19,25 +19,21 @@ import org.http4s.Status._ import org.typelevel.ci._ +import com.comcast.ip4s.Dns + import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload import com.snowplowanalytics.snowplow.collectors.scalastream.model._ trait Service[F[_]] { def cookie( - queryString: Option[String], body: F[Option[String]], path: String, cookie: Option[RequestCookie], - userAgent: Option[String], - refererUri: Option[String], - hostname: F[Option[String]], - ip: Option[String], request: Request[F], pixelExpected: Boolean, doNotTrack: Boolean, - contentType: Option[String] = None, - spAnonymous: Option[String] = None + contentType: Option[String] = None ): F[Response[F]] def determinePath(vendor: String, version: String): String } @@ -54,6 +50,8 @@ class CollectorService[F[_]: Sync]( appVersion: String ) extends Service[F] { + implicit val dns: Dns[F] = Dns.forSync[F] + val pixelStream = Stream.iterable[F, Byte](CollectorService.pixel) // TODO: Add sink type as well @@ -62,23 +60,22 @@ class CollectorService[F[_]: Sync]( private val splitBatch: SplitBatch = SplitBatch(appName, appVersion) def cookie( - queryString: Option[String], body: F[Option[String]], path: String, cookie: Option[RequestCookie], - userAgent: 
Option[String], - refererUri: Option[String], - hostname: F[Option[String]], - ip: Option[String], request: Request[F], pixelExpected: Boolean, doNotTrack: Boolean, - contentType: Option[String] = None, - spAnonymous: Option[String] = None + contentType: Option[String] = None ): F[Response[F]] = for { body <- body - hostname <- hostname + hostname <- request.remoteHost.map(_.map(_.toString)) + userAgent = extractHeader(request, "User-Agent") + refererUri = extractHeader(request, "Referer") + spAnonymous = extractHeader(request, "SP-Anonymous") + ip = request.remoteAddr.map(_.toUriString) + queryString = Some(request.queryString) // TODO: Get ipAsPartitionKey from config (ipAddress, partitionKey) = ipAndPartitionKey(ip, ipAsPartitionKey = false) nuid = UUID.randomUUID().toString // TODO: nuid should be set properly @@ -116,6 +113,9 @@ class CollectorService[F[_]: Sync]( config.paths.getOrElse(original, original) } + def extractHeader(req: Request[F], headerName: String): Option[String] = + req.headers.get(CIString(headerName)).map(_.head.value) + /** Builds a raw event from an Http request. */ def buildEvent( queryString: Option[String], @@ -163,8 +163,9 @@ class CollectorService[F[_]: Sync]( case false => Response[F]( status = Ok, - headers = headers - ).withEntity("ok") + headers = headers, + body = Stream.emit("ok").through(fs2.text.utf8.encode) + ) } // TODO: Since Remote-Address and Raw-Request-URI is akka-specific headers, diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala index da3a833fb..6ad4cffda 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala @@ -3,31 +3,23 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import scala.collection.mutable.ListBuffer import cats.effect.IO import cats.effect.unsafe.implicits.global -import com.comcast.ip4s.SocketAddress import org.http4s.implicits._ import org.http4s._ import org.http4s.headers._ import org.http4s.Status._ import fs2.{Stream, text} -import org.typelevel.ci._ import org.specs2.mutable.Specification class CollectorRoutesSpec extends Specification { case class CookieParams( - queryString: Option[String], body: IO[Option[String]], path: String, cookie: Option[RequestCookie], - userAgent: Option[String], - refererUri: Option[String], - hostname: IO[Option[String]], - ip: Option[String], request: Request[IO], pixelExpected: Boolean, doNotTrack: Boolean, - contentType: Option[String], - spAnonymous: Option[String] + contentType: Option[String] ) class TestService() extends Service[IO] { @@ -37,35 +29,23 @@ class CollectorRoutesSpec extends Specification { def getCookieCalls: List[CookieParams] = cookieCalls.toList override def cookie( - queryString: Option[String], body: IO[Option[String]], path: String, cookie: Option[RequestCookie], - userAgent: Option[String], - refererUri: Option[String], - hostname: IO[Option[String]], - ip: Option[String], request: Request[IO], pixelExpected: Boolean, doNotTrack: Boolean, - contentType: Option[String], - spAnonymous: Option[String] + contentType: Option[String] ): IO[Response[IO]] = IO.delay { cookieCalls += CookieParams( - queryString, body, path, cookie, - userAgent, - refererUri, - hostname, - ip, request, pixelExpected, doNotTrack, - contentType, 
- spAnonymous + contentType ) Response(status = Ok, body = Stream.emit("cookie").through(text.utf8.encode)) } @@ -73,19 +53,6 @@ class CollectorRoutesSpec extends Specification { override def determinePath(vendor: String, version: String): String = "/p1/p2" } - val testConnection = Request.Connection( - local = SocketAddress.fromStringIp("127.0.0.1:80").get, - remote = SocketAddress.fromStringIp("127.0.0.1:80").get, - secure = false - ) - - val testHeaders = Headers( - `User-Agent`(ProductId("testUserAgent")), - Referer(Uri.unsafeFromString("example.com")), - Header.Raw(ci"SP-Anonymous", "*"), - `Content-Type`(MediaType.application.json) - ) - def createTestServices = { val collectorService = new TestService() val routes = new CollectorRoutes[IO](collectorService).value @@ -105,60 +72,69 @@ class CollectorRoutesSpec extends Specification { "respond to the post cookie route with the cookie response" in { val (collectorService, routes) = createTestServices - val request = Request[IO](method = Method.POST, uri = uri"/p3/p4?a=b&c=d") - .withAttribute(Request.Keys.ConnectionInfo, testConnection) + val request = Request[IO](method = Method.POST, uri = uri"/p3/p4") .withEntity("testBody") - .withHeaders(testHeaders) + .withHeaders(`Content-Type`(MediaType.application.json)) val response = routes.run(request).unsafeRunSync() val List(cookieParams) = collectorService.getCookieCalls - cookieParams.queryString shouldEqual Some("a=b&c=d") cookieParams.body.unsafeRunSync() shouldEqual Some("testBody") cookieParams.path shouldEqual "/p1/p2" cookieParams.cookie shouldEqual None - cookieParams.userAgent shouldEqual Some("testUserAgent") - cookieParams.refererUri shouldEqual Some("example.com") - cookieParams.hostname.unsafeRunSync() shouldEqual Some("localhost") - cookieParams.ip shouldEqual Some("127.0.0.1") cookieParams.pixelExpected shouldEqual false cookieParams.doNotTrack shouldEqual false cookieParams.contentType shouldEqual Some("application/json") - cookieParams.spAnonymous shouldEqual Some("*") response.status must beEqualTo(Status.Ok) response.bodyText.compile.string.unsafeRunSync() must beEqualTo("cookie") } "respond to the get or head cookie route with the cookie response" in { - def getHeadTest(method: Method) = { + def test(method: Method) = { val (collectorService, routes) = createTestServices - val request = Request[IO](method = method, uri = uri"/p3/p4?a=b&c=d") - .withAttribute(Request.Keys.ConnectionInfo, testConnection) - .withEntity("testBody") - .withHeaders(testHeaders) + val request = Request[IO](method = method, uri = uri"/p3/p4").withEntity("testBody") val response = routes.run(request).unsafeRunSync() val List(cookieParams) = collectorService.getCookieCalls - cookieParams.queryString shouldEqual Some("a=b&c=d") cookieParams.body.unsafeRunSync() shouldEqual None cookieParams.path shouldEqual "/p1/p2" cookieParams.cookie shouldEqual None - cookieParams.userAgent shouldEqual Some("testUserAgent") - cookieParams.refererUri shouldEqual Some("example.com") - cookieParams.hostname.unsafeRunSync() shouldEqual Some("localhost") - cookieParams.ip shouldEqual Some("127.0.0.1") cookieParams.pixelExpected shouldEqual true cookieParams.doNotTrack shouldEqual false cookieParams.contentType shouldEqual None - cookieParams.spAnonymous shouldEqual Some("*") response.status must beEqualTo(Status.Ok) response.bodyText.compile.string.unsafeRunSync() must beEqualTo("cookie") } - getHeadTest(Method.GET) - getHeadTest(Method.HEAD) + test(Method.GET) + test(Method.HEAD) + } + + "respond to the get or 
head pixel route with the cookie response" in { + def test(method: Method, uri: String) = { + val (collectorService, routes) = createTestServices + + val request = Request[IO](method = method, uri = Uri.unsafeFromString(uri)).withEntity("testBody") + val response = routes.run(request).unsafeRunSync() + + val List(cookieParams) = collectorService.getCookieCalls + cookieParams.body.unsafeRunSync() shouldEqual None + cookieParams.path shouldEqual uri + cookieParams.cookie shouldEqual None + cookieParams.pixelExpected shouldEqual true + cookieParams.doNotTrack shouldEqual false + cookieParams.contentType shouldEqual None + + response.status must beEqualTo(Status.Ok) + response.bodyText.compile.string.unsafeRunSync() must beEqualTo("cookie") + } + + test(Method.GET, "/i") + test(Method.HEAD, "/i") + test(Method.GET, "/ice.png") + test(Method.HEAD, "/ice.png") } } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala index 561f18fe0..a172574c4 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala @@ -10,7 +10,7 @@ import org.http4s._ import org.http4s.headers._ import org.http4s.implicits._ import org.typelevel.ci._ -import com.comcast.ip4s.IpAddress +import com.comcast.ip4s.{IpAddress, SocketAddress} import org.specs2.mutable.Specification import com.snowplowanalytics.snowplow.collectors.scalastream.model._ import org.apache.thrift.{TDeserializer, TSerializer} @@ -26,11 +26,19 @@ class CollectorServiceSpec extends Specification { ) val event = new CollectorPayload("iglu-schema", "ip", System.currentTimeMillis, "UTF-8", "collector") val uuidRegex = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}".r - val hs = Headers( + val testHeaders = Headers( + `User-Agent`(ProductId("testUserAgent")), + Referer(Uri.unsafeFromString("example.com")), + `Content-Type`(MediaType.application.json), `X-Forwarded-For`(IpAddress.fromString("127.0.0.1")), Cookie(RequestCookie("cookie", "value")), `Access-Control-Allow-Credentials`() ) + val testConnection = Request.Connection( + local = SocketAddress.fromStringIp("127.0.0.1:80").get, + remote = SocketAddress.fromStringIp("127.0.0.1:80").get, + secure = false + ) def probeService(): ProbeService = { val good = new TestSink @@ -53,51 +61,36 @@ class CollectorServiceSpec extends Specification { "The collector service" should { "cookie" in { "not set a cookie if SP-Anonymous is present" in { + val request = Request[IO]().withHeaders(Header.Raw(ci"SP-Anonymous", "*")) val r = service .cookie( - queryString = Some("nuid=12"), body = IO.pure(Some("b")), path = "p", cookie = None, - userAgent = None, - refererUri = None, - hostname = IO.pure(Some("h")), - ip = None, - request = Request[IO](), + request = request, pixelExpected = false, doNotTrack = false, - contentType = None, - spAnonymous = Some("*") + contentType = None ) .unsafeRunSync() r.headers.get(ci"Set-Cookie") must beNone } "respond with a 200 OK and a good row in good sink" in { val ProbeService(service, good, bad) = probeService() - val headers = Headers( - `X-Forwarded-For`(IpAddress.fromString("127.0.0.1")), - Cookie(RequestCookie("cookie", "value")), - `Access-Control-Allow-Credentials`() - ) val req = Request[IO]( method = Method.POST, - headers = headers - ) + 
headers = testHeaders, + uri = Uri(query = Query.unsafeFromString("a=b")) + ).withAttribute(Request.Keys.ConnectionInfo, testConnection) val r = service .cookie( - queryString = Some("a=b"), body = IO.pure(Some("b")), path = "p", cookie = None, - userAgent = Some("ua"), - refererUri = Some("ref"), - hostname = IO.pure(Some("h")), - ip = Some("ip"), request = req, pixelExpected = false, doNotTrack = false, - contentType = Some("image/gif"), - spAnonymous = None + contentType = Some("image/gif") ) .unsafeRunSync() @@ -108,17 +101,20 @@ class CollectorServiceSpec extends Specification { val e = emptyCollectorPayload deserializer.deserialize(e, good.storedRawEvents.head) e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" - e.ipAddress shouldEqual "ip" + e.ipAddress shouldEqual "127.0.0.1" e.encoding shouldEqual "UTF-8" e.collector shouldEqual s"appName-appVersion" e.querystring shouldEqual "a=b" e.body shouldEqual "b" e.path shouldEqual "p" - e.userAgent shouldEqual "ua" - e.refererUri shouldEqual "ref" - e.hostname shouldEqual "h" + e.userAgent shouldEqual "testUserAgent" + e.refererUri shouldEqual "example.com" + e.hostname shouldEqual "localhost" //e.networkUserId shouldEqual "nuid" //TODO: add check for nuid as well e.headers shouldEqual List( + "User-Agent: testUserAgent", + "Referer: example.com", + "Content-Type: application/json", "X-Forwarded-For: 127.0.0.1", "Cookie: cookie=value", "Access-Control-Allow-Credentials: true", @@ -129,30 +125,20 @@ class CollectorServiceSpec extends Specification { "sink event with headers removed when spAnonymous set" in { val ProbeService(service, good, bad) = probeService() - val headers = Headers( - `X-Forwarded-For`(IpAddress.fromString("127.0.0.1")), - Cookie(RequestCookie("cookie", "value")), - `Access-Control-Allow-Credentials`() - ) + val req = Request[IO]( method = Method.POST, - headers = headers + headers = testHeaders.put(Header.Raw(ci"SP-Anonymous", "*")) ) val r = service .cookie( - queryString = Some("a=b"), body = IO.pure(Some("b")), path = "p", cookie = None, - userAgent = Some("ua"), - refererUri = Some("ref"), - hostname = IO.pure(Some("h")), - ip = Some("ip"), request = req, pixelExpected = false, doNotTrack = false, - contentType = Some("image/gif"), - spAnonymous = Some("*") + contentType = Some("image/gif") ) .unsafeRunSync() @@ -163,7 +149,11 @@ class CollectorServiceSpec extends Specification { val e = emptyCollectorPayload deserializer.deserialize(e, good.storedRawEvents.head) e.headers shouldEqual List( + "User-Agent: testUserAgent", + "Referer: example.com", + "Content-Type: application/json", "Access-Control-Allow-Credentials: true", + "SP-Anonymous: *", "image/gif" ).asJava } @@ -171,19 +161,13 @@ class CollectorServiceSpec extends Specification { "return necessary cache control headers and respond with pixel when pixelExpected is true" in { val r = service .cookie( - queryString = Some("nuid=12"), body = IO.pure(Some("b")), path = "p", cookie = None, - userAgent = None, - refererUri = None, - hostname = IO.pure(Some("h")), - ip = None, request = Request[IO](), pixelExpected = true, doNotTrack = false, - contentType = None, - spAnonymous = Some("*") + contentType = None ) .unsafeRunSync() r.headers.get[`Cache-Control`] shouldEqual Some( @@ -266,14 +250,15 @@ class CollectorServiceSpec extends Specification { "buildHttpResponse" in { "send back a gif if pixelExpected is true" in { - val res = service.buildHttpResponse(hs, pixelExpected = true) + val res = 
service.buildHttpResponse(testHeaders, pixelExpected = true) res.status shouldEqual Status.Ok - res.headers shouldEqual hs.put(`Content-Type`(MediaType.image.gif)) + res.headers shouldEqual testHeaders.put(`Content-Type`(MediaType.image.gif)) res.body.compile.toList.unsafeRunSync().toArray shouldEqual CollectorService.pixel } "send back ok otherwise" in { - val res = service.buildHttpResponse(hs, pixelExpected = false) + val res = service.buildHttpResponse(testHeaders, pixelExpected = false) res.status shouldEqual Status.Ok + res.headers shouldEqual testHeaders res.bodyText.compile.toList.unsafeRunSync() shouldEqual List("ok") } } From b272c991bb5e3de3ae37ea824da37c7bc1c7a0ef Mon Sep 17 00:00:00 2001 From: spenes Date: Fri, 11 Aug 2023 15:26:43 +0300 Subject: [PATCH 09/39] Add http4s CORS support (close #371) --- .../CollectorRoutes.scala | 9 +- .../CollectorService.scala | 50 +++++-- .../model.scala | 5 +- .../CollectorRoutesSpec.scala | 15 +++ .../CollectorServiceSpec.scala | 126 ++++++++++++++++-- .../TestUtils.scala | 3 +- .../StdoutCollector.scala | 3 +- 7 files changed, 183 insertions(+), 28 deletions(-) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala index 0158b17fc..02aa9cb6e 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala @@ -2,7 +2,7 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import cats.implicits._ import cats.effect.Sync -import org.http4s.{HttpApp, HttpRoutes} +import org.http4s._ import org.http4s.dsl.Http4sDsl import org.http4s.implicits._ import com.comcast.ip4s.Dns @@ -16,6 +16,11 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs Ok("ok") } + private val corsRoute = HttpRoutes.of[F] { + case req @ OPTIONS -> _ => + collectorService.preflightResponse(req) + } + private val cookieRoutes = HttpRoutes.of[F] { case req @ POST -> Root / vendor / version => val path = collectorService.determinePath(vendor, version) @@ -53,5 +58,5 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs ) } - val value: HttpApp[F] = (healthRoutes <+> cookieRoutes).orNotFound + val value: HttpApp[F] = (healthRoutes <+> corsRoute <+> cookieRoutes).orNotFound } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala index be9bf6dc2..9ea656a69 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala @@ -26,6 +26,7 @@ import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPa import com.snowplowanalytics.snowplow.collectors.scalastream.model._ trait Service[F[_]] { + def preflightResponse(req: Request[F]): F[Response[F]] def cookie( body: F[Option[String]], path: String, @@ -59,7 +60,7 @@ class CollectorService[F[_]: Sync]( private val splitBatch: SplitBatch = SplitBatch(appName, appVersion) - def cookie( + override def cookie( body: F[Option[String]], path: String, cookie: Option[RequestCookie], @@ -102,17 +103,30 @@ class CollectorService[F[_]: Sync]( ) 
headerList = List( setCookie.map(_.toRaw1), - cacheControl(pixelExpected).map(_.toRaw1) + cacheControl(pixelExpected).map(_.toRaw1), + accessControlAllowOriginHeader(request).some, + `Access-Control-Allow-Credentials`().toRaw1.some ).flatten responseHeaders = Headers(headerList) _ <- sinkEvent(event, partitionKey) } yield buildHttpResponse(responseHeaders, pixelExpected) - def determinePath(vendor: String, version: String): String = { + override def determinePath(vendor: String, version: String): String = { val original = s"/$vendor/$version" config.paths.getOrElse(original, original) } + override def preflightResponse(req: Request[F]): F[Response[F]] = Sync[F].pure { + Response[F]( + headers = Headers( + accessControlAllowOriginHeader(req), + `Access-Control-Allow-Credentials`(), + `Access-Control-Allow-Headers`(ci"Content-Type", ci"SP-Anonymous"), + `Access-Control-Max-Age`.Cache(config.cors.accessControlMaxAge.toSeconds).asInstanceOf[`Access-Control-Max-Age`] + ) + ) + } + def extractHeader(req: Request[F], headerName: String): Option[String] = req.headers.get(CIString(headerName)).map(_.head.value) @@ -240,6 +254,19 @@ class CollectorService[F[_]: Sync]( } } + /** + * Creates an Access-Control-Allow-Origin header which specifically allows the domain which made + * the request + * + * @param request Incoming request + * @return Header allowing only the domain which made the request or everything + */ + def accessControlAllowOriginHeader(request: Request[F]): Header.Raw = + Header.Raw( + ci"Access-Control-Allow-Origin", + extractHostsFromOrigin(request.headers).headOption.map(_.renderString).getOrElse("*") + ) + /** * Determines the cookie domain to be used by inspecting the Origin header of the request * and trying to find a match in the list of domains specified in the config file. @@ -259,12 +286,12 @@ class CollectorService[F[_]: Sync]( (domains match { case Nil => None case _ => - val originHosts = extractHosts(headers) - domains.find(domain => originHosts.exists(validMatch(_, domain))) + val originDomains = extractHostsFromOrigin(headers).map(_.host.value) + domains.find(domain => originDomains.exists(validMatch(_, domain))) }).orElse(fallbackDomain) /** Extracts the host names from a list of values in the request's Origin header. */ - def extractHosts(headers: Headers): List[String] = + def extractHostsFromOrigin(headers: Headers): List[Origin.Host] = (for { // We can't use 'headers.get[Origin]' function in here because of the bug // reported here: https://github.com/http4s/http4s/issues/7236 @@ -272,15 +299,12 @@ class CollectorService[F[_]: Sync]( // and parse items individually. originSplit <- headers.get(ci"Origin").map(_.head.value.split(' ')) parsed = originSplit.map(Origin.parse(_).toOption).toList.flatten - hosts = parsed.flatMap(extractHostFromOrigin) + hosts = parsed.flatMap { + case Origin.Null => List.empty + case Origin.HostList(hosts) => hosts.toList + } } yield hosts).getOrElse(List.empty) - private def extractHostFromOrigin(originHeader: Origin): List[String] = - originHeader match { - case Origin.Null => List.empty - case Origin.HostList(hosts) => hosts.map(_.host.value).toList - } - /** * Ensures a match is valid. 
* We only want matches where: diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala index 18c0b4563..ff4eabfc9 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala @@ -42,9 +42,12 @@ object model { sameSite: Option[SameSite] ) + final case class CORSConfig(accessControlMaxAge: FiniteDuration) + final case class CollectorConfig( paths: Map[String, String], - cookie: CookieConfig + cookie: CookieConfig, + cors: CORSConfig ) { val cookieConfig = if (cookie.enabled) Some(cookie) else None } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala index 6ad4cffda..f1caf284b 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala @@ -28,6 +28,9 @@ class CollectorRoutesSpec extends Specification { def getCookieCalls: List[CookieParams] = cookieCalls.toList + override def preflightResponse(req: Request[IO]): IO[Response[IO]] = + IO.pure(Response[IO](status = Ok, body = Stream.emit("preflight response").through(text.utf8.encode))) + override def cookie( body: IO[Option[String]], path: String, @@ -69,6 +72,18 @@ class CollectorRoutesSpec extends Specification { response.as[String].unsafeRunSync() must beEqualTo("ok") } + "respond to the cors route with a preflight response" in { + val (_, routes) = createTestServices + def test(uri: Uri) = { + val request = Request[IO](method = Method.OPTIONS, uri = uri) + val response = routes.run(request).unsafeRunSync() + response.as[String].unsafeRunSync() shouldEqual "preflight response" + } + test(uri"/i") + test(uri"/health") + test(uri"/p3/p4") + } + "respond to the post cookie route with the cookie response" in { val (collectorService, routes) = createTestServices diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala index a172574c4..e06c8f8a9 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala @@ -175,6 +175,73 @@ class CollectorServiceSpec extends Specification { ) r.body.compile.toList.unsafeRunSync().toArray shouldEqual CollectorService.pixel } + + "include CORS headers in the response" in { + val r = service + .cookie( + body = IO.pure(Some("b")), + path = "p", + cookie = None, + request = Request[IO](), + pixelExpected = true, + doNotTrack = false, + contentType = None + ) + .unsafeRunSync() + r.headers.get[`Access-Control-Allow-Credentials`] shouldEqual Some( + `Access-Control-Allow-Credentials`() + ) + r.headers.get(ci"Access-Control-Allow-Origin").map(_.head) shouldEqual Some( + Header.Raw(ci"Access-Control-Allow-Origin", "*") + ) + } + + "include the origin if given to CORS headers in the response" in { + val headers = Headers( + Origin + .HostList( + NonEmptyList.of( + Origin.Host(scheme = Uri.Scheme.http, host = 
Uri.Host.unsafeFromString("origin.com")), + Origin.Host( + scheme = Uri.Scheme.http, + host = Uri.Host.unsafeFromString("otherorigin.com"), + port = Some(8080) + ) + ) + ) + .asInstanceOf[Origin] + ) + val request = Request[IO](headers = headers) + val r = service + .cookie( + body = IO.pure(Some("b")), + path = "p", + cookie = None, + request = request, + pixelExpected = true, + doNotTrack = false, + contentType = None + ) + .unsafeRunSync() + r.headers.get[`Access-Control-Allow-Credentials`] shouldEqual Some( + `Access-Control-Allow-Credentials`() + ) + r.headers.get(ci"Access-Control-Allow-Origin").map(_.head) shouldEqual Some( + Header.Raw(ci"Access-Control-Allow-Origin", "http://origin.com") + ) + } + } + + "preflightResponse" in { + "return a response appropriate to cors preflight options requests" in { + val expected = Headers( + Header.Raw(ci"Access-Control-Allow-Origin", "*"), + `Access-Control-Allow-Credentials`(), + `Access-Control-Allow-Headers`(ci"Content-Type", ci"SP-Anonymous"), + `Access-Control-Max-Age`.Cache(60).asInstanceOf[`Access-Control-Max-Age`] + ) + service.preflightResponse(Request[IO]()).unsafeRunSync.headers shouldEqual expected + } } "buildEvent" in { @@ -379,6 +446,46 @@ class CollectorServiceSpec extends Specification { } } + "accessControlAllowOriginHeader" in { + "give a restricted ACAO header if there is an Origin header in the request" in { + val headers = Headers( + Origin + .HostList( + NonEmptyList.of( + Origin.Host(scheme = Uri.Scheme.http, host = Uri.Host.unsafeFromString("origin.com")) + ) + ) + .asInstanceOf[Origin] + ) + val request = Request[IO](headers = headers) + val expected = Header.Raw(ci"Access-Control-Allow-Origin", "http://origin.com") + service.accessControlAllowOriginHeader(request) shouldEqual expected + } + "give a restricted ACAO header if there are multiple Origin headers in the request" in { + val headers = Headers( + Origin + .HostList( + NonEmptyList.of( + Origin.Host(scheme = Uri.Scheme.http, host = Uri.Host.unsafeFromString("origin.com")), + Origin.Host( + scheme = Uri.Scheme.http, + host = Uri.Host.unsafeFromString("otherorigin.com"), + port = Some(8080) + ) + ) + ) + .asInstanceOf[Origin] + ) + val request = Request[IO](headers = headers) + val expected = Header.Raw(ci"Access-Control-Allow-Origin", "http://origin.com") + service.accessControlAllowOriginHeader(request) shouldEqual expected + } + "give an open ACAO header if there are no Origin headers in the request" in { + val expected = Header.Raw(ci"Access-Control-Allow-Origin", "*") + service.accessControlAllowOriginHeader(Request[IO]()) shouldEqual expected + } + } + "cookieDomain" in { val testCookieConfig = CookieConfig( enabled = true, @@ -496,18 +603,17 @@ class CollectorServiceSpec extends Specification { "extractHosts" in { "correctly extract the host names from a list of values in the request's Origin header" in { - val origin: Origin = Origin.HostList( - NonEmptyList.of( - Origin.Host(scheme = Uri.Scheme.https, host = Uri.Host.unsafeFromString("origin.com")), - Origin.Host( - scheme = Uri.Scheme.http, - host = Uri.Host.unsafeFromString("subdomain.otherorigin.gov.co.uk"), - port = Some(8080) - ) + val originHostList = NonEmptyList.of( + Origin.Host(scheme = Uri.Scheme.https, host = Uri.Host.unsafeFromString("origin.com")), + Origin.Host( + scheme = Uri.Scheme.http, + host = Uri.Host.unsafeFromString("subdomain.otherorigin.gov.co.uk"), + port = Some(8080) ) ) - val headers = Headers(origin.toRaw1) - service.extractHosts(headers) shouldEqual Seq("origin.com", 
"subdomain.otherorigin.gov.co.uk") + val origin: Origin = Origin.HostList(originHostList) + val headers = Headers(origin.toRaw1) + service.extractHostsFromOrigin(headers) shouldEqual originHostList.toList } } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala index a60a79c0a..a4aa99982 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala @@ -20,6 +20,7 @@ object TestUtils { secure = false, httpOnly = false, sameSite = None - ) + ), + cors = CORSConfig(60.seconds) ) } diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala index c70cdfd4b..3400ac297 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala @@ -35,7 +35,8 @@ object StdoutCollector extends IOApp { secure = false, httpOnly = false, sameSite = None - ) + ), + cors = CORSConfig(60.seconds) ), BuildInfo.shortName, BuildInfo.version From 7f3f236405b88596809491cec36b1fa1fc938af3 Mon Sep 17 00:00:00 2001 From: spenes Date: Mon, 14 Aug 2023 12:50:27 +0300 Subject: [PATCH 10/39] Add http4s anonymous tracking (close #372) --- .../CollectorRoutes.scala | 3 - .../CollectorService.scala | 50 +++- .../CollectorRoutesSpec.scala | 6 - .../CollectorServiceSpec.scala | 246 ++++++++++++++++-- 4 files changed, 268 insertions(+), 37 deletions(-) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala index 02aa9cb6e..d3a28e933 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala @@ -27,7 +27,6 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs collectorService.cookie( body = req.bodyText.compile.string.map(Some(_)), path = path, - cookie = None, //TODO: cookie will be added later request = req, pixelExpected = false, doNotTrack = false, @@ -39,7 +38,6 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs collectorService.cookie( body = Sync[F].pure(None), path = path, - cookie = None, //TODO: cookie will be added later request = req, pixelExpected = true, doNotTrack = false, @@ -50,7 +48,6 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs collectorService.cookie( body = Sync[F].pure(None), path = req.pathInfo.renderString, - cookie = None, //TODO: cookie will be added later request = req, pixelExpected = true, doNotTrack = false, diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala index 9ea656a69..d75e51e46 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala +++ 
b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala @@ -19,8 +19,6 @@ import org.http4s.Status._ import org.typelevel.ci._ -import com.comcast.ip4s.Dns - import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload import com.snowplowanalytics.snowplow.collectors.scalastream.model._ @@ -30,7 +28,6 @@ trait Service[F[_]] { def cookie( body: F[Option[String]], path: String, - cookie: Option[RequestCookie], request: Request[F], pixelExpected: Boolean, doNotTrack: Boolean, @@ -42,6 +39,8 @@ trait Service[F[_]] { object CollectorService { // Contains an invisible pixel to return for `/i` requests. val pixel = Base64.decodeBase64("R0lGODlhAQABAPAAAP///wAAACH5BAUAAAAALAAAAAABAAEAAAICRAEAOw==") + + val spAnonymousNuid = "00000000-0000-0000-0000-000000000000" } class CollectorService[F[_]: Sync]( @@ -51,8 +50,6 @@ class CollectorService[F[_]: Sync]( appVersion: String ) extends Service[F] { - implicit val dns: Dns[F] = Dns.forSync[F] - val pixelStream = Stream.iterable[F, Byte](CollectorService.pixel) // TODO: Add sink type as well @@ -63,23 +60,24 @@ class CollectorService[F[_]: Sync]( override def cookie( body: F[Option[String]], path: String, - cookie: Option[RequestCookie], request: Request[F], pixelExpected: Boolean, doNotTrack: Boolean, contentType: Option[String] = None ): F[Response[F]] = for { - body <- body - hostname <- request.remoteHost.map(_.map(_.toString)) + body <- body + hostname = extractHostname(request) userAgent = extractHeader(request, "User-Agent") refererUri = extractHeader(request, "Referer") spAnonymous = extractHeader(request, "SP-Anonymous") - ip = request.remoteAddr.map(_.toUriString) + ip = extractIp(request, spAnonymous) queryString = Some(request.queryString) + cookie = extractCookie(request) + nuidOpt = networkUserId(request, cookie, spAnonymous) + nuid = nuidOpt.getOrElse(UUID.randomUUID().toString) // TODO: Get ipAsPartitionKey from config (ipAddress, partitionKey) = ipAndPartitionKey(ip, ipAsPartitionKey = false) - nuid = UUID.randomUUID().toString // TODO: nuid should be set properly event = buildEvent( queryString, body, @@ -109,7 +107,8 @@ class CollectorService[F[_]: Sync]( ).flatten responseHeaders = Headers(headerList) _ <- sinkEvent(event, partitionKey) - } yield buildHttpResponse(responseHeaders, pixelExpected) + resp = buildHttpResponse(responseHeaders, pixelExpected) + } yield resp override def determinePath(vendor: String, version: String): String = { val original = s"/$vendor/$version" @@ -130,6 +129,18 @@ class CollectorService[F[_]: Sync]( def extractHeader(req: Request[F], headerName: String): Option[String] = req.headers.get(CIString(headerName)).map(_.head.value) + def extractCookie(req: Request[F]): Option[RequestCookie] = + config.cookieConfig.flatMap(c => req.cookies.find(_.name == c.name)) + + def extractHostname(req: Request[F]): Option[String] = + req.uri.authority.map(_.host.renderString) // Hostname is extracted like this in Akka-Http as well + + def extractIp(req: Request[F], spAnonymous: Option[String]): Option[String] = + spAnonymous match { + case None => req.from.map(_.toUriString) + case Some(_) => None + } + /** Builds a raw event from an Http request. 
*/ def buildEvent( queryString: Option[String], @@ -331,4 +342,21 @@ class CollectorService[F[_]: Sync]( case None => ("unknown", UUID.randomUUID.toString) case Some(ip) => (ip, if (ipAsPartitionKey) ip else UUID.randomUUID.toString) } + + /** + * Gets the network user id from the query string or the request cookie. + * + * @param request Http request made + * @param requestCookie cookie associated to the Http request + * @return a network user id + */ + def networkUserId( + request: Request[F], + requestCookie: Option[RequestCookie], + spAnonymous: Option[String] + ): Option[String] = + spAnonymous match { + case Some(_) => Some(CollectorService.spAnonymousNuid) + case None => request.uri.query.params.get("nuid").orElse(requestCookie.map(_.content)) + } } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala index f1caf284b..6025b0c62 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala @@ -15,7 +15,6 @@ class CollectorRoutesSpec extends Specification { case class CookieParams( body: IO[Option[String]], path: String, - cookie: Option[RequestCookie], request: Request[IO], pixelExpected: Boolean, doNotTrack: Boolean, @@ -34,7 +33,6 @@ class CollectorRoutesSpec extends Specification { override def cookie( body: IO[Option[String]], path: String, - cookie: Option[RequestCookie], request: Request[IO], pixelExpected: Boolean, doNotTrack: Boolean, @@ -44,7 +42,6 @@ class CollectorRoutesSpec extends Specification { cookieCalls += CookieParams( body, path, - cookie, request, pixelExpected, doNotTrack, @@ -95,7 +92,6 @@ class CollectorRoutesSpec extends Specification { val List(cookieParams) = collectorService.getCookieCalls cookieParams.body.unsafeRunSync() shouldEqual Some("testBody") cookieParams.path shouldEqual "/p1/p2" - cookieParams.cookie shouldEqual None cookieParams.pixelExpected shouldEqual false cookieParams.doNotTrack shouldEqual false cookieParams.contentType shouldEqual Some("application/json") @@ -114,7 +110,6 @@ class CollectorRoutesSpec extends Specification { val List(cookieParams) = collectorService.getCookieCalls cookieParams.body.unsafeRunSync() shouldEqual None cookieParams.path shouldEqual "/p1/p2" - cookieParams.cookie shouldEqual None cookieParams.pixelExpected shouldEqual true cookieParams.doNotTrack shouldEqual false cookieParams.contentType shouldEqual None @@ -137,7 +132,6 @@ class CollectorRoutesSpec extends Specification { val List(cookieParams) = collectorService.getCookieCalls cookieParams.body.unsafeRunSync() shouldEqual None cookieParams.path shouldEqual uri - cookieParams.cookie shouldEqual None cookieParams.pixelExpected shouldEqual true cookieParams.doNotTrack shouldEqual false cookieParams.contentType shouldEqual None diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala index e06c8f8a9..60763f145 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala @@ -30,13 +30,13 @@ class CollectorServiceSpec 
extends Specification { `User-Agent`(ProductId("testUserAgent")), Referer(Uri.unsafeFromString("example.com")), `Content-Type`(MediaType.application.json), - `X-Forwarded-For`(IpAddress.fromString("127.0.0.1")), + `X-Forwarded-For`(IpAddress.fromString("192.0.2.3")), Cookie(RequestCookie("cookie", "value")), `Access-Control-Allow-Credentials`() ) val testConnection = Request.Connection( - local = SocketAddress.fromStringIp("127.0.0.1:80").get, - remote = SocketAddress.fromStringIp("127.0.0.1:80").get, + local = SocketAddress.fromStringIp("192.0.2.1:80").get, + remote = SocketAddress.fromStringIp("192.0.2.2:80").get, secure = false ) @@ -61,12 +61,15 @@ class CollectorServiceSpec extends Specification { "The collector service" should { "cookie" in { "not set a cookie if SP-Anonymous is present" in { - val request = Request[IO]().withHeaders(Header.Raw(ci"SP-Anonymous", "*")) + val request = Request[IO]( + headers = Headers( + Header.Raw(ci"SP-Anonymous", "*") + ) + ) val r = service .cookie( body = IO.pure(Some("b")), path = "p", - cookie = None, request = request, pixelExpected = false, doNotTrack = false, @@ -75,18 +78,147 @@ class CollectorServiceSpec extends Specification { .unsafeRunSync() r.headers.get(ci"Set-Cookie") must beNone } + "not set a network_userid from cookie if SP-Anonymous is present" in { + val ProbeService(service, good, bad) = probeService() + val nuid = "test-nuid" + val req = Request[IO]( + method = Method.POST, + headers = Headers( + Header.Raw(ci"SP-Anonymous", "*") + ) + ).addCookie(TestUtils.testConf.cookie.name, nuid) + val r = service + .cookie( + body = IO.pure(Some("b")), + path = "p", + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = Some("image/gif") + ) + .unsafeRunSync() + + r.status mustEqual Status.Ok + good.storedRawEvents must have size 1 + bad.storedRawEvents must have size 0 + val e = emptyCollectorPayload + deserializer.deserialize(e, good.storedRawEvents.head) + e.networkUserId shouldEqual "00000000-0000-0000-0000-000000000000" + } + "network_userid from cookie should persist if SP-Anonymous is not present" in { + val ProbeService(service, good, bad) = probeService() + val nuid = "test-nuid" + val req = Request[IO]( + method = Method.POST + ).addCookie(TestUtils.testConf.cookie.name, nuid) + val r = service + .cookie( + body = IO.pure(Some("b")), + path = "p", + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = Some("image/gif") + ) + .unsafeRunSync() + + r.status mustEqual Status.Ok + good.storedRawEvents must have size 1 + bad.storedRawEvents must have size 0 + val e = emptyCollectorPayload + deserializer.deserialize(e, good.storedRawEvents.head) + e.networkUserId shouldEqual "test-nuid" + } + "use the ip address from 'X-Forwarded-For' header if it exists" in { + val ProbeService(service, good, bad) = probeService() + val req = Request[IO]( + method = Method.POST, + headers = Headers( + `X-Forwarded-For`(IpAddress.fromString("192.0.2.4")) + ) + ).withAttribute(Request.Keys.ConnectionInfo, testConnection) + val r = service + .cookie( + body = IO.pure(Some("b")), + path = "p", + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = Some("image/gif") + ) + .unsafeRunSync() + + r.status mustEqual Status.Ok + good.storedRawEvents must have size 1 + bad.storedRawEvents must have size 0 + val e = emptyCollectorPayload + deserializer.deserialize(e, good.storedRawEvents.head) + e.ipAddress shouldEqual "192.0.2.4" + } + "use the ip address from remote address if 
'X-Forwarded-For' header doesn't exist" in { + val ProbeService(service, good, bad) = probeService() + val req = Request[IO]( + method = Method.POST + ).withAttribute(Request.Keys.ConnectionInfo, testConnection) + val r = service + .cookie( + body = IO.pure(Some("b")), + path = "p", + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = Some("image/gif") + ) + .unsafeRunSync() + + r.status mustEqual Status.Ok + good.storedRawEvents must have size 1 + bad.storedRawEvents must have size 0 + val e = emptyCollectorPayload + deserializer.deserialize(e, good.storedRawEvents.head) + e.ipAddress shouldEqual "192.0.2.2" + } + "set the ip address to 'unknown' if if SP-Anonymous is present" in { + val ProbeService(service, good, bad) = probeService() + val req = Request[IO]( + method = Method.POST, + headers = Headers( + Header.Raw(ci"SP-Anonymous", "*") + ) + ).withAttribute(Request.Keys.ConnectionInfo, testConnection) + val r = service + .cookie( + body = IO.pure(Some("b")), + path = "p", + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = Some("image/gif") + ) + .unsafeRunSync() + + r.status mustEqual Status.Ok + good.storedRawEvents must have size 1 + bad.storedRawEvents must have size 0 + val e = emptyCollectorPayload + deserializer.deserialize(e, good.storedRawEvents.head) + e.ipAddress shouldEqual "unknown" + } "respond with a 200 OK and a good row in good sink" in { val ProbeService(service, good, bad) = probeService() + val nuid = "dfdb716e-ecf9-4d00-8b10-44edfbc8a108" val req = Request[IO]( method = Method.POST, headers = testHeaders, - uri = Uri(query = Query.unsafeFromString("a=b")) - ).withAttribute(Request.Keys.ConnectionInfo, testConnection) + uri = Uri( + query = Query.unsafeFromString("a=b"), + authority = Some(Uri.Authority(host = Uri.RegName("example.com"))) + ) + ).withAttribute(Request.Keys.ConnectionInfo, testConnection).addCookie(TestUtils.testConf.cookie.name, nuid) val r = service .cookie( body = IO.pure(Some("b")), path = "p", - cookie = None, request = req, pixelExpected = false, doNotTrack = false, @@ -101,7 +233,7 @@ class CollectorServiceSpec extends Specification { val e = emptyCollectorPayload deserializer.deserialize(e, good.storedRawEvents.head) e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" - e.ipAddress shouldEqual "127.0.0.1" + e.ipAddress shouldEqual "192.0.2.3" e.encoding shouldEqual "UTF-8" e.collector shouldEqual s"appName-appVersion" e.querystring shouldEqual "a=b" @@ -109,15 +241,15 @@ class CollectorServiceSpec extends Specification { e.path shouldEqual "p" e.userAgent shouldEqual "testUserAgent" e.refererUri shouldEqual "example.com" - e.hostname shouldEqual "localhost" - //e.networkUserId shouldEqual "nuid" //TODO: add check for nuid as well + e.hostname shouldEqual "example.com" + e.networkUserId shouldEqual nuid e.headers shouldEqual List( "User-Agent: testUserAgent", "Referer: example.com", "Content-Type: application/json", - "X-Forwarded-For: 127.0.0.1", - "Cookie: cookie=value", + "X-Forwarded-For: 192.0.2.3", "Access-Control-Allow-Credentials: true", + "Cookie: cookie=value; sp=dfdb716e-ecf9-4d00-8b10-44edfbc8a108", "image/gif" ).asJava e.contentType shouldEqual "image/gif" @@ -134,7 +266,6 @@ class CollectorServiceSpec extends Specification { .cookie( body = IO.pure(Some("b")), path = "p", - cookie = None, request = req, pixelExpected = false, doNotTrack = false, @@ -163,7 +294,6 @@ class CollectorServiceSpec extends Specification { .cookie( body = 
IO.pure(Some("b")), path = "p", - cookie = None, request = Request[IO](), pixelExpected = true, doNotTrack = false, @@ -181,7 +311,6 @@ class CollectorServiceSpec extends Specification { .cookie( body = IO.pure(Some("b")), path = "p", - cookie = None, request = Request[IO](), pixelExpected = true, doNotTrack = false, @@ -216,7 +345,6 @@ class CollectorServiceSpec extends Specification { .cookie( body = IO.pure(Some("b")), path = "p", - cookie = None, request = request, pixelExpected = true, doNotTrack = false, @@ -446,6 +574,90 @@ class CollectorServiceSpec extends Specification { } } + "headers" in { + "don't filter out the headers if SP-Anonymous is not present" in { + val request = Request[IO]( + headers = Headers( + `User-Agent`(ProductId("testUserAgent")), + `X-Forwarded-For`(IpAddress.fromString("127.0.0.1")), + Header.Raw(ci"X-Real-Ip", "127.0.0.1"), + Cookie(RequestCookie("cookie", "value")) + ) + ) + val expected = List( + "User-Agent: testUserAgent", + "X-Forwarded-For: 127.0.0.1", + "X-Real-Ip: 127.0.0.1", + "Cookie: cookie=value" + ) + service.headers(request, None) shouldEqual expected + } + "filter out the headers if SP-Anonymous is present" in { + val request = Request[IO]( + headers = Headers( + `User-Agent`(ProductId("testUserAgent")), + `X-Forwarded-For`(IpAddress.fromString("127.0.0.1")), + Header.Raw(ci"X-Real-Ip", "127.0.0.1"), + Cookie(RequestCookie("cookie", "value")) + ) + ) + val expected = List( + "User-Agent: testUserAgent" + ) + service.headers(request, Some("*")) shouldEqual expected + } + } + + "networkUserId" in { + "with SP-Anonymous header not present" in { + "give back the nuid query param if present" in { + service.networkUserId( + Request[IO]().withUri(Uri().withQueryParam("nuid", "12")), + Some(RequestCookie("nuid", "13")), + None + ) shouldEqual Some("12") + } + "give back the request cookie if there no nuid query param" in { + service.networkUserId( + Request[IO](), + Some(RequestCookie("nuid", "13")), + None + ) shouldEqual Some("13") + } + "give back none otherwise" in { + service.networkUserId( + Request[IO](), + None, + None + ) shouldEqual None + } + } + + "with SP-Anonymous header present give back the dummy nuid" in { + "if query param is present" in { + service.networkUserId( + Request[IO]().withUri(Uri().withQueryParam("nuid", "12")), + Some(RequestCookie("nuid", "13")), + Some("*") + ) shouldEqual Some("00000000-0000-0000-0000-000000000000") + } + "if the request cookie can be used in place of a missing nuid query param" in { + service.networkUserId( + Request[IO](), + Some(RequestCookie("nuid", "13")), + Some("*") + ) shouldEqual Some("00000000-0000-0000-0000-000000000000") + } + "in any other case" in { + service.networkUserId( + Request[IO](), + None, + Some("*") + ) shouldEqual Some("00000000-0000-0000-0000-000000000000") + } + } + } + "accessControlAllowOriginHeader" in { "give a restricted ACAO header if there is an Origin header in the request" in { val headers = Headers( From e32f5df7654bf5ae8f43c65e85b54b2a8a60ed67 Mon Sep 17 00:00:00 2001 From: Benjamin BENOIST Date: Wed, 16 Aug 2023 17:20:27 +0000 Subject: [PATCH 11/39] Load config (close #326) --- build.sbt | 17 +- http4s/src/main/resources/reference.conf | 85 ++++++++++ .../App.scala | 27 ++++ .../AppInfo.scala | 7 + .../Config.scala | 152 ++++++++++++++++++ .../ConfigParser.scala | 80 +++++++++ .../HttpServer.scala | 73 +++++++++ .../Routes.scala} | 16 +- .../Run.scala | 95 +++++++++++ .../Service.scala} | 91 +++++------ .../Sink.scala | 2 +- .../SplitBatch.scala | 8 +- 
.../model.scala | 29 +--- .../CollectorApp.scala | 89 ---------- http4s/src/test/resources/test-config.hocon | 18 +++ .../ConfigParserSpec.scala | 40 +++++ .../RoutesSpec.scala} | 18 ++- .../ServiceSpec.scala} | 102 ++++++------ .../SplitBatchSpec.scala | 15 +- .../TestSink.scala | 2 +- .../TestUtils.scala | 107 ++++++++++++ .../CollectorTestUtils.scala | 13 -- .../TestUtils.scala | 26 --- project/Dependencies.scala | 31 ++-- stdout/src/main/resources/application.conf | 33 +--- .../PrintingSink.scala | 27 ++++ .../SinkConfig.scala | 14 ++ .../StdoutCollector.scala | 17 ++ .../PrintingSink.scala | 31 ---- .../StdoutCollector.scala | 45 ------ .../sinks/PrintingSinkSpec.scala | 22 +-- 31 files changed, 912 insertions(+), 420 deletions(-) create mode 100644 http4s/src/main/resources/reference.conf create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala rename http4s/src/main/scala/{com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala => com.snowplowanalytics.snowplow.collector.core/Routes.scala} (76%) create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala rename http4s/src/main/scala/{com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala => com.snowplowanalytics.snowplow.collector.core/Service.scala} (83%) rename http4s/src/main/scala/{com.snowplowanalytics.snowplow.collectors.scalastream => com.snowplowanalytics.snowplow.collector.core}/Sink.scala (81%) rename http4s/src/main/scala/{com.snowplowanalytics.snowplow.collectors.scalastream => com.snowplowanalytics.snowplow.collector.core}/SplitBatch.scala (96%) rename http4s/src/main/scala/{com.snowplowanalytics.snowplow.collectors.scalastream => com.snowplowanalytics.snowplow.collector.core}/model.scala (50%) delete mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala create mode 100644 http4s/src/test/resources/test-config.hocon create mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala rename http4s/src/test/scala/{com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala => com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala} (94%) rename http4s/src/test/scala/{com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala => com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala} (94%) rename http4s/src/test/scala/{com.snowplowanalytics.snowplow.collectors.scalastream => com.snowplowanalytics.snowplow.collector.core}/SplitBatchSpec.scala (92%) rename http4s/src/test/scala/{com.snowplowanalytics.snowplow.collectors.scalastream => com.snowplowanalytics.snowplow.collector.core}/TestSink.scala (87%) create mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala delete mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorTestUtils.scala delete mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala create mode 100644 
stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/PrintingSink.scala create mode 100644 stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/SinkConfig.scala create mode 100644 stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala delete mode 100644 stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PrintingSink.scala delete mode 100644 stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala diff --git a/build.sbt b/build.sbt index 56f075e4d..e9bb395ef 100644 --- a/build.sbt +++ b/build.sbt @@ -9,7 +9,7 @@ * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ import com.typesafe.sbt.packager.docker._ -import sbtbuildinfo.BuildInfoPlugin.autoImport.buildInfoPackage +import sbtbuildinfo.BuildInfoPlugin.autoImport._ lazy val commonDependencies = Seq( // Java @@ -86,7 +86,7 @@ lazy val buildSettings = Seq( name := "snowplow-stream-collector", description := "Scala Stream Collector for Snowplow raw events", scalaVersion := "2.12.10", - scalacOptions ++= Seq("-Ypartial-unification"), + scalacOptions ++= Seq("-Ypartial-unification", "-Ywarn-macros:after"), javacOptions := Seq("-source", "11", "-target", "11"), resolvers ++= Dependencies.resolutionRepos ) @@ -96,6 +96,11 @@ lazy val dynVerSettings = Seq( ThisBuild / dynverSeparator := "-" // to be compatible with docker ) +lazy val http4sBuildInfoSettings = Seq( + buildInfoKeys := Seq[BuildInfoKey](name, dockerAlias, version), + buildInfoOptions += BuildInfoOption.Traits("com.snowplowanalytics.snowplow.collector.core.AppInfo") +) + lazy val allSettings = buildSettings ++ BuildSettings.sbtAssemblySettings ++ BuildSettings.formatting ++ @@ -130,7 +135,10 @@ lazy val http4s = project Dependencies.Libraries.badRows, Dependencies.Libraries.collectorPayload, Dependencies.Libraries.slf4j, - Dependencies.Libraries.specs2 + Dependencies.Libraries.decline, + Dependencies.Libraries.circeGeneric, + Dependencies.Libraries.circeConfig, + Dependencies.Libraries.specs2CE3 ) ) @@ -258,13 +266,14 @@ lazy val nsqDistroless = project .dependsOn(core % "test->test;compile->compile") lazy val stdoutSettings = - allSettings ++ buildInfoSettings ++ Seq( + allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Seq( moduleName := "snowplow-stream-collector-stdout", Docker / packageName := "scala-stream-collector-stdout" ) lazy val stdout = project .settings(stdoutSettings) + .settings(buildInfoPackage := s"com.snowplowanalytics.snowplow.collector.stdout") .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) .dependsOn(http4s % "test->test;compile->compile") diff --git a/http4s/src/main/resources/reference.conf b/http4s/src/main/resources/reference.conf new file mode 100644 index 000000000..e6acbc7ef --- /dev/null +++ b/http4s/src/main/resources/reference.conf @@ -0,0 +1,85 @@ +{ + paths {} + + p3p { + policyRef = "/w3c/p3p.xml" + CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" + } + + crossDomain { + enabled = false + domains = [ "*" ] + secure = true + } + + cookie { + enabled = true + expiration = 365 days + domains = [] + name = sp + secure = true + httpOnly = true + sameSite = "None" + } + + doNotTrackCookie { + enabled = false + name = "" + value = "" + } + + cookieBounce { + enabled = false + name = "n3pc" + fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000" + } + + redirectMacro { + enabled = false + } + + rootResponse { + enabled = false + statusCode = 302 + headers = {} 
+ body = "" + } + + cors { + accessControlMaxAge = 60 seconds + } + + streams { + useIpAddressAsPartitionKey = false + + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } + } + + monitoring { + metrics { + statsd { + enabled = false + hostname = localhost + port = 8125 + period = 10 seconds + prefix = snowplow.collector + } + } + } + + ssl { + enable = false + redirect = false + port = 443 + } + + enableDefaultRedirect = false + + redirectDomains = [] + + preTerminationPeriod = 10 seconds +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala new file mode 100644 index 000000000..df25ac885 --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala @@ -0,0 +1,27 @@ +package com.snowplowanalytics.snowplow.collector.core + +import cats.effect.{ExitCode, IO, Sync} +import cats.effect.kernel.Resource + +import com.monovore.decline.effect.CommandIOApp +import com.monovore.decline.Opts + +import io.circe.Decoder + +import com.snowplowanalytics.snowplow.collector.core.model.Sinks + +abstract class App[SinkConfig <: Config.Sink: Decoder](appInfo: AppInfo) + extends CommandIOApp( + name = App.helpCommand(appInfo), + header = "Snowplow application that collects tracking events", + version = appInfo.version + ) { + + def mkSinks[F[_]: Sync](config: Config.Streams[SinkConfig]): Resource[F, Sinks[F]] + + final def main: Opts[IO[ExitCode]] = Run.fromCli[IO, SinkConfig](appInfo, mkSinks) +} + +object App { + private def helpCommand(appInfo: AppInfo) = s"docker run ${appInfo.dockerAlias}" +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala new file mode 100644 index 000000000..1215a8149 --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala @@ -0,0 +1,7 @@ +package com.snowplowanalytics.snowplow.collector.core + +trait AppInfo { + def name: String + def version: String + def dockerAlias: String +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala new file mode 100644 index 000000000..5e40f43cb --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala @@ -0,0 +1,152 @@ +package com.snowplowanalytics.snowplow.collector.core + +import scala.concurrent.duration.FiniteDuration + +import io.circe.config.syntax._ +import io.circe.generic.semiauto._ +import io.circe.Decoder +import io.circe._ + +import org.http4s.SameSite + +case class Config[+SinkConfig]( + interface: String, + port: Int, + paths: Map[String, String], + p3p: Config.P3P, + crossDomain: Config.CrossDomain, + cookie: Config.Cookie, + doNotTrackCookie: Config.DoNotTrackCookie, + cookieBounce: Config.CookieBounce, + redirectMacro: Config.RedirectMacro, + rootResponse: Config.RootResponse, + cors: Config.CORS, + streams: Config.Streams[SinkConfig], + monitoring: Config.Monitoring, + ssl: Config.SSL, + enableDefaultRedirect: Boolean, + redirectDomains: Set[String], + preTerminationPeriod: FiniteDuration +) + +object Config { + + case class P3P( + policyRef: String, + CP: String + ) + + case class CrossDomain( + enabled: Boolean, + domains: List[String], + secure: Boolean + ) + + case class Cookie( + enabled: Boolean, + name: String, + 
expiration: FiniteDuration, + domains: List[String], + fallbackDomain: Option[String], + secure: Boolean, + httpOnly: Boolean, + sameSite: Option[SameSite] + ) + + case class DoNotTrackCookie( + enabled: Boolean, + name: String, + value: String + ) + + case class CookieBounce( + enabled: Boolean, + name: String, + fallbackNetworkUserId: String, + forwardedProtocolHeader: Option[String] + ) + + case class RedirectMacro( + enabled: Boolean, + placeholder: Option[String] + ) + + case class RootResponse( + enabled: Boolean, + statusCode: Int, + headers: Map[String, String], + body: String + ) + + case class CORS( + accessControlMaxAge: FiniteDuration + ) + + case class Streams[+SinkConfig]( + good: String, + bad: String, + useIpAddressAsPartitionKey: Boolean, + sink: SinkConfig, + buffer: Buffer + ) + + trait Sink { + val maxBytes: Int + } + + case class Buffer( + byteLimit: Long, + recordLimit: Long, + timeLimit: Long + ) + + case class Monitoring( + metrics: Metrics + ) + + case class Metrics( + statsd: Statsd + ) + + case class Statsd( + enabled: Boolean, + hostname: String, + port: Int, + period: FiniteDuration, + prefix: String + ) + + case class SSL( + enable: Boolean, + redirect: Boolean, + port: Int + ) + + implicit def decoder[SinkConfig: Decoder]: Decoder[Config[SinkConfig]] = { + implicit val p3p = deriveDecoder[P3P] + implicit val crossDomain = deriveDecoder[CrossDomain] + implicit val sameSite: Decoder[SameSite] = Decoder.instance { cur => + cur.as[String].map(_.toLowerCase) match { + case Right("none") => Right(SameSite.None) + case Right("strict") => Right(SameSite.Strict) + case Right("lax") => Right(SameSite.Lax) + case Right(other) => + Left(DecodingFailure(s"sameSite $other is not supported. Accepted values: None, Strict, Lax", cur.history)) + case Left(err) => Left(err) + } + } + implicit val cookie = deriveDecoder[Cookie] + implicit val doNotTrackCookie = deriveDecoder[DoNotTrackCookie] + implicit val cookieBounce = deriveDecoder[CookieBounce] + implicit val redirectMacro = deriveDecoder[RedirectMacro] + implicit val rootResponse = deriveDecoder[RootResponse] + implicit val cors = deriveDecoder[CORS] + implicit val buffer = deriveDecoder[Buffer] + implicit val streams = deriveDecoder[Streams[SinkConfig]] + implicit val statsd = deriveDecoder[Statsd] + implicit val metrics = deriveDecoder[Metrics] + implicit val monitoring = deriveDecoder[Monitoring] + implicit val ssl = deriveDecoder[SSL] + deriveDecoder[Config[SinkConfig]] + } +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala new file mode 100644 index 000000000..c2960ba8d --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala @@ -0,0 +1,80 @@ +package com.snowplowanalytics.snowplow.collector.core + +import java.nio.file.{Files, Path} + +import org.typelevel.log4cats.Logger +import org.typelevel.log4cats.slf4j.Slf4jLogger + +import com.typesafe.config.{Config => TypesafeConfig, ConfigFactory} + +import scala.collection.JavaConverters._ + +import io.circe.Decoder +import io.circe.config.syntax.CirceConfigOps + +import cats.implicits._ +import cats.data.EitherT + +import cats.effect.{ExitCode, Sync} + +object ConfigParser { + + implicit private def logger[F[_]: Sync] = Slf4jLogger.getLogger[F] + + def fromPath[F[_]: Sync, SinkConfig: Decoder]( + configPath: Option[Path] + ): EitherT[F, ExitCode, Config[SinkConfig]] = { + val eitherT 
= configPath match { + case Some(path) => + for { + text <- EitherT(readTextFrom[F](path)) + hocon <- EitherT.fromEither[F](hoconFromString(text)) + result <- EitherT.fromEither[F](resolve[Config[SinkConfig]](hocon)) + } yield result + case None => + EitherT.fromEither[F]( + for { + config <- Either + .catchNonFatal(namespaced(ConfigFactory.load())) + .leftMap(e => s"Error loading the configuration (without config file): ${e.getMessage}") + parsed <- config.as[Config[SinkConfig]].leftMap(_.show) + } yield parsed + ) + } + + eitherT.leftSemiflatMap { str => + Logger[F].error(str).as(ExitCode.Error) + } + } + + private def readTextFrom[F[_]: Sync](path: Path): F[Either[String, String]] = + Sync[F].blocking { + Either + .catchNonFatal(Files.readAllLines(path).asScala.mkString("\n")) + .leftMap(e => s"Error reading ${path.toAbsolutePath} file from filesystem: ${e.getMessage}") + } + + private def hoconFromString(str: String): Either[String, TypesafeConfig] = + Either.catchNonFatal(ConfigFactory.parseString(str)).leftMap(_.getMessage) + + private def resolve[A: Decoder](hocon: TypesafeConfig): Either[String, A] = { + val either = for { + resolved <- Either.catchNonFatal(hocon.resolve()).leftMap(_.getMessage) + resolved <- Either.catchNonFatal(loadAll(resolved)).leftMap(_.getMessage) + parsed <- resolved.as[A].leftMap(_.show) + } yield parsed + either.leftMap(e => s"Cannot resolve config: $e") + } + + private def loadAll(config: TypesafeConfig): TypesafeConfig = + namespaced(ConfigFactory.load(namespaced(config.withFallback(namespaced(ConfigFactory.load()))))) + + private def namespaced(config: TypesafeConfig): TypesafeConfig = { + val namespace = "collector" + if (config.hasPath(namespace)) + config.getConfig(namespace).withFallback(config.withoutPath(namespace)) + else + config + } + +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala new file mode 100644 index 000000000..cfbc2ebe5 --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala @@ -0,0 +1,73 @@ +package com.snowplowanalytics.snowplow.collector.core + +import java.net.InetSocketAddress + +import org.typelevel.log4cats.Logger +import org.typelevel.log4cats.slf4j.Slf4jLogger + +import scala.concurrent.duration.DurationLong + +import com.comcast.ip4s.{IpAddress, Port} + +import cats.implicits._ + +import cats.effect.{Async, Resource} + +import org.http4s.HttpApp +import org.http4s.server.Server +import org.http4s.ember.server.EmberServerBuilder +import org.http4s.blaze.server.BlazeServerBuilder +import org.http4s.netty.server.NettyServerBuilder + +import fs2.io.net.Network + +object HttpServer { + + implicit private def logger[F[_]: Async] = Slf4jLogger.getLogger[F] + + def build[F[_]: Async]( + app: HttpApp[F], + interface: String, + port: Int + ): Resource[F, Server] = + sys.env.get("HTTP4S_BACKEND").map(_.toUpperCase()) match { + case Some("EMBER") | None => buildEmberServer[F](app, interface, port) + case Some("BLAZE") => buildBlazeServer[F](app, port) + case Some("NETTY") => buildNettyServer[F](app, port) + case Some(other) => throw new IllegalArgumentException(s"Unrecognized http4s backend $other") + } + + private def buildEmberServer[F[_]: Async]( + app: HttpApp[F], + interface: String, + port: Int + ) = { + implicit val network = Network.forAsync[F] + Resource.eval(Logger[F].info("Building ember server")) >> + EmberServerBuilder + .default[F] + 
.withHost(IpAddress.fromString(interface).get) + .withPort(Port.fromInt(port).get) + .withHttpApp(app) + .withIdleTimeout(610.seconds) + .build + } + + private def buildBlazeServer[F[_]: Async]( + app: HttpApp[F], + port: Int + ): Resource[F, Server] = + Resource.eval(Logger[F].info("Building blaze server")) >> + BlazeServerBuilder[F] + .bindSocketAddress(new InetSocketAddress(port)) + .withHttpApp(app) + .withIdleTimeout(610.seconds) + .resource + + private def buildNettyServer[F[_]: Async]( + app: HttpApp[F], + port: Int + ): Resource[F, Server] = + Resource.eval(Logger[F].info("Building netty server")) >> + NettyServerBuilder[F].bindLocal(port).withHttpApp(app).withIdleTimeout(610.seconds).resource +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala similarity index 76% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala rename to http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala index d3a28e933..21401d157 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala @@ -1,4 +1,4 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream +package com.snowplowanalytics.snowplow.collector.core import cats.implicits._ import cats.effect.Sync @@ -7,7 +7,7 @@ import org.http4s.dsl.Http4sDsl import org.http4s.implicits._ import com.comcast.ip4s.Dns -class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDsl[F] { +class Routes[F[_]: Sync](service: IService[F]) extends Http4sDsl[F] { implicit val dns: Dns[F] = Dns.forSync[F] @@ -18,13 +18,13 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs private val corsRoute = HttpRoutes.of[F] { case req @ OPTIONS -> _ => - collectorService.preflightResponse(req) + service.preflightResponse(req) } private val cookieRoutes = HttpRoutes.of[F] { case req @ POST -> Root / vendor / version => - val path = collectorService.determinePath(vendor, version) - collectorService.cookie( + val path = service.determinePath(vendor, version) + service.cookie( body = req.bodyText.compile.string.map(Some(_)), path = path, request = req, @@ -34,8 +34,8 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs ) case req @ (GET | HEAD) -> Root / vendor / version => - val path = collectorService.determinePath(vendor, version) - collectorService.cookie( + val path = service.determinePath(vendor, version) + service.cookie( body = Sync[F].pure(None), path = path, request = req, @@ -45,7 +45,7 @@ class CollectorRoutes[F[_]: Sync](collectorService: Service[F]) extends Http4sDs ) case req @ (GET | HEAD) -> Root / ("ice.png" | "i") => - collectorService.cookie( + service.cookie( body = Sync[F].pure(None), path = req.pathInfo.renderString, request = req, diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala new file mode 100644 index 000000000..3a107651f --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -0,0 +1,95 @@ +package com.snowplowanalytics.snowplow.collector.core + +import java.nio.file.Path + +import org.typelevel.log4cats.Logger +import 
org.typelevel.log4cats.slf4j.Slf4jLogger + +import scala.concurrent.duration.FiniteDuration + +import cats.implicits._ +import cats.data.EitherT + +import cats.effect.{Async, ExitCode, Sync} +import cats.effect.kernel.Resource + +import com.monovore.decline.Opts + +import io.circe.Decoder + +import com.snowplowanalytics.snowplow.collector.core.model.Sinks + +object Run { + + implicit private def logger[F[_]: Sync] = Slf4jLogger.getLogger[F] + + def fromCli[F[_]: Async, SinkConfig: Decoder]( + appInfo: AppInfo, + mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]] + ): Opts[F[ExitCode]] = { + val configPath = Opts.option[Path]("config", "Path to HOCON configuration (optional)", "c", "config.hocon").orNone + configPath.map(fromPath[F, SinkConfig](appInfo, mkSinks, _)) + } + + private def fromPath[F[_]: Async, SinkConfig: Decoder]( + appInfo: AppInfo, + mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], + path: Option[Path] + ): F[ExitCode] = { + val eitherT = for { + config <- ConfigParser.fromPath[F, SinkConfig](path) + _ <- EitherT.right[ExitCode](fromConfig(appInfo, mkSinks, config)) + } yield ExitCode.Success + + eitherT.merge.handleErrorWith { e => + Logger[F].error(e)("Exiting") >> + prettyLogException(e).as(ExitCode.Error) + } + } + + private def fromConfig[F[_]: Async, SinkConfig]( + appInfo: AppInfo, + mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], + config: Config[SinkConfig] + ): F[ExitCode] = { + val resources = for { + sinks <- mkSinks(config.streams) + collectorService = new Service[F]( + config, + Sinks(sinks.good, sinks.bad), + appInfo + ) + httpServer = HttpServer.build[F]( + new Routes[F](collectorService).value, + config.interface, + config.port + ) + _ <- withGracefulShutdown(config.preTerminationPeriod)(httpServer) + } yield () + + resources.surround(Async[F].never[ExitCode]) + } + + private def prettyLogException[F[_]: Sync](e: Throwable): F[Unit] = { + + def logCause(e: Throwable): F[Unit] = + Option(e.getCause) match { + case Some(e) => Logger[F].error(s"caused by: ${e.getMessage}") >> logCause(e) + case None => Sync[F].unit + } + + Logger[F].error(e.getMessage) >> logCause(e) + } + + private def withGracefulShutdown[F[_]: Async, A](delay: FiniteDuration)(resource: Resource[F, A]): Resource[F, A] = + for { + a <- resource + _ <- Resource.onFinalizeCase { + case Resource.ExitCase.Canceled => + Logger[F].warn(s"Shutdown interrupted. 
Will continue to serve requests for $delay") >> + Async[F].sleep(delay) + case _ => + Async[F].unit + } + } yield a +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala similarity index 83% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala rename to http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala index d75e51e46..3a20f51eb 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala @@ -1,4 +1,4 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream +package com.snowplowanalytics.snowplow.collector.core import java.util.UUID @@ -21,9 +21,9 @@ import org.typelevel.ci._ import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ +import com.snowplowanalytics.snowplow.collector.core.model._ -trait Service[F[_]] { +trait IService[F[_]] { def preflightResponse(req: Request[F]): F[Response[F]] def cookie( body: F[Option[String]], @@ -36,26 +36,24 @@ trait Service[F[_]] { def determinePath(vendor: String, version: String): String } -object CollectorService { +object Service { // Contains an invisible pixel to return for `/i` requests. val pixel = Base64.decodeBase64("R0lGODlhAQABAPAAAP///wAAACH5BAUAAAAALAAAAAABAAEAAAICRAEAOw==") val spAnonymousNuid = "00000000-0000-0000-0000-000000000000" } -class CollectorService[F[_]: Sync]( - config: CollectorConfig, - sinks: CollectorSinks[F], - appName: String, - appVersion: String -) extends Service[F] { +class Service[F[_]: Sync]( + config: Config[Any], + sinks: Sinks[F], + appInfo: AppInfo +) extends IService[F] { - val pixelStream = Stream.iterable[F, Byte](CollectorService.pixel) + val pixelStream = Stream.iterable[F, Byte](Service.pixel) - // TODO: Add sink type as well - private val collector = s"$appName-$appVersion" + private val collector = s"${appInfo.name}:${appInfo.version}" - private val splitBatch: SplitBatch = SplitBatch(appName, appVersion) + private val splitBatch: SplitBatch = SplitBatch(appInfo) override def cookie( body: F[Option[String]], @@ -67,17 +65,16 @@ class CollectorService[F[_]: Sync]( ): F[Response[F]] = for { body <- body - hostname = extractHostname(request) - userAgent = extractHeader(request, "User-Agent") - refererUri = extractHeader(request, "Referer") - spAnonymous = extractHeader(request, "SP-Anonymous") - ip = extractIp(request, spAnonymous) - queryString = Some(request.queryString) - cookie = extractCookie(request) - nuidOpt = networkUserId(request, cookie, spAnonymous) - nuid = nuidOpt.getOrElse(UUID.randomUUID().toString) - // TODO: Get ipAsPartitionKey from config - (ipAddress, partitionKey) = ipAndPartitionKey(ip, ipAsPartitionKey = false) + hostname = extractHostname(request) + userAgent = extractHeader(request, "User-Agent") + refererUri = extractHeader(request, "Referer") + spAnonymous = extractHeader(request, "SP-Anonymous") + ip = extractIp(request, spAnonymous) + queryString = Some(request.queryString) + cookie = extractCookie(request) + nuidOpt = networkUserId(request, cookie, spAnonymous) + nuid = nuidOpt.getOrElse(UUID.randomUUID().toString) + (ipAddress, partitionKey) = ipAndPartitionKey(ip, 
config.streams.useIpAddressAsPartitionKey) event = buildEvent( queryString, body, @@ -93,7 +90,7 @@ class CollectorService[F[_]: Sync]( now <- Clock[F].realTime setCookie = cookieHeader( headers = request.headers, - cookieConfig = config.cookieConfig, + cookieConfig = config.cookie, networkUserId = nuid, doNotTrack = doNotTrack, spAnonymous = spAnonymous, @@ -130,7 +127,7 @@ class CollectorService[F[_]: Sync]( req.headers.get(CIString(headerName)).map(_.head.value) def extractCookie(req: Request[F]): Option[RequestCookie] = - config.cookieConfig.flatMap(c => req.cookies.find(_.name == c.name)) + req.cookies.find(_.name == config.cookie.name) def extractHostname(req: Request[F]): Option[String] = req.uri.authority.map(_.host.renderString) // Hostname is extracted like this in Akka-Http as well @@ -237,32 +234,28 @@ class CollectorService[F[_]: Sync]( */ def cookieHeader( headers: Headers, - cookieConfig: Option[CookieConfig], + cookieConfig: Config.Cookie, networkUserId: String, doNotTrack: Boolean, spAnonymous: Option[String], now: FiniteDuration ): Option[`Set-Cookie`] = - if (doNotTrack) { - None - } else { - spAnonymous match { - case Some(_) => None - case None => - cookieConfig.map { config => - val responseCookie = ResponseCookie( - name = config.name, - content = networkUserId, - expires = Some(HttpDate.unsafeFromEpochSecond((now + config.expiration).toSeconds)), - domain = cookieDomain(headers, config.domains, config.fallbackDomain), - path = Some("/"), - sameSite = config.sameSite, - secure = config.secure, - httpOnly = config.httpOnly - ) - `Set-Cookie`(responseCookie) - } - } + (doNotTrack, cookieConfig.enabled, spAnonymous) match { + case (true, _, _) => None + case (_, false, _) => None + case (_, _, Some(_)) => None + case _ => + val responseCookie = ResponseCookie( + name = cookieConfig.name, + content = networkUserId, + expires = Some(HttpDate.unsafeFromEpochSecond((now + cookieConfig.expiration).toSeconds)), + domain = cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain), + path = Some("/"), + sameSite = cookieConfig.sameSite, + secure = cookieConfig.secure, + httpOnly = cookieConfig.httpOnly + ) + Some(`Set-Cookie`(responseCookie)) } /** @@ -356,7 +349,7 @@ class CollectorService[F[_]: Sync]( spAnonymous: Option[String] ): Option[String] = spAnonymous match { - case Some(_) => Some(CollectorService.spAnonymousNuid) + case Some(_) => Some(Service.spAnonymousNuid) case None => request.uri.query.params.get("nuid").orElse(requestCookie.map(_.content)) } } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Sink.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala similarity index 81% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Sink.scala rename to http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala index 8cdc85935..5a5c7d05b 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Sink.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala @@ -1,4 +1,4 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream +package com.snowplowanalytics.snowplow.collector.core trait Sink[F[_]] { diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatch.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala similarity index 96% rename from 
http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatch.scala rename to http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala index 907adcc51..f7114be0e 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatch.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala @@ -1,4 +1,4 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream +package com.snowplowanalytics.snowplow.collector.core import java.nio.ByteBuffer import java.nio.charset.StandardCharsets.UTF_8 @@ -14,10 +14,10 @@ import com.snowplowanalytics.iglu.core._ import com.snowplowanalytics.iglu.core.circe.CirceIgluCodecs._ import com.snowplowanalytics.snowplow.badrows._ import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ +import com.snowplowanalytics.snowplow.collector.core.model._ /** Object handling splitting an array of strings correctly */ -case class SplitBatch(appName: String, appVersion: String) { +case class SplitBatch(appInfo: AppInfo) { // Serialize Thrift CollectorPayload objects val ThriftSerializer = new ThreadLocal[TSerializer] { @@ -124,7 +124,7 @@ case class SplitBatch(appName: String, appVersion: String) { ): Array[Byte] = BadRow .SizeViolation( - Processor(appName, appVersion), + Processor(appInfo.name, appInfo.version), Failure.SizeViolation(Instant.now(), maxSize, size, s"oversized collector payload: $msg"), Payload.RawPayload(event.toString().take(maxSize / 10)) ) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala similarity index 50% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala rename to http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala index ff4eabfc9..1a998715f 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala @@ -1,8 +1,4 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import scala.concurrent.duration._ - -import org.http4s.SameSite +package com.snowplowanalytics.snowplow.collector.core import io.circe.Json @@ -12,7 +8,7 @@ object model { * Case class for holding both good and * bad sinks for the Stream Collector. 
*/ - final case class CollectorSinks[F[_]](good: Sink[F], bad: Sink[F]) + final case class Sinks[F[_]](good: Sink[F], bad: Sink[F]) /** * Case class for holding the results of @@ -30,25 +26,4 @@ object model { * @param failedBigEvents List of events that were too large */ final case class SplitBatchResult(goodBatches: List[List[Json]], failedBigEvents: List[Json]) - - final case class CookieConfig( - enabled: Boolean, - name: String, - expiration: FiniteDuration, - domains: List[String], - fallbackDomain: Option[String], - secure: Boolean, - httpOnly: Boolean, - sameSite: Option[SameSite] - ) - - final case class CORSConfig(accessControlMaxAge: FiniteDuration) - - final case class CollectorConfig( - paths: Map[String, String], - cookie: CookieConfig, - cors: CORSConfig - ) { - val cookieConfig = if (cookie.enabled) Some(cookie) else None - } } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala deleted file mode 100644 index 82074116d..000000000 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorApp.scala +++ /dev/null @@ -1,89 +0,0 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import cats.implicits._ -import cats.effect.{Async, ExitCode, Sync} -import cats.effect.kernel.Resource -import fs2.io.net.Network -import com.comcast.ip4s.IpLiteralSyntax -import org.http4s.HttpApp -import org.http4s.server.Server -import org.http4s.ember.server.EmberServerBuilder -import org.http4s.blaze.server.BlazeServerBuilder -import org.http4s.netty.server.NettyServerBuilder -import org.typelevel.log4cats.Logger -import org.typelevel.log4cats.slf4j.Slf4jLogger - -import java.net.InetSocketAddress -import scala.concurrent.duration.{DurationLong, FiniteDuration} - -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ - -object CollectorApp { - - implicit private def unsafeLogger[F[_]: Sync]: Logger[F] = - Slf4jLogger.getLogger[F] - - def run[F[_]: Async]( - mkGood: Resource[F, Sink[F]], - mkBad: Resource[F, Sink[F]], - config: CollectorConfig, - appName: String, - appVersion: String - ): F[ExitCode] = { - val resources = for { - bad <- mkBad - good <- mkGood - _ <- withGracefulShutdown(610.seconds) { - val sinks = CollectorSinks(good, bad) - val collectorService: CollectorService[F] = new CollectorService[F](config, sinks, appName, appVersion) - buildHttpServer[F](new CollectorRoutes[F](collectorService).value) - } - } yield () - - resources.surround(Async[F].never[ExitCode]) - } - - private def withGracefulShutdown[F[_]: Async, A](delay: FiniteDuration)(resource: Resource[F, A]): Resource[F, A] = - for { - a <- resource - _ <- Resource.onFinalizeCase { - case Resource.ExitCase.Canceled => - Logger[F].warn(s"Shutdown interrupted. 
Will continue to serve requests for $delay") >> - Async[F].sleep(delay) - case _ => - Async[F].unit - } - } yield a - - private def buildHttpServer[F[_]: Async](app: HttpApp[F]): Resource[F, Server] = - sys.env.get("HTTP4S_BACKEND").map(_.toUpperCase()) match { - case Some("EMBER") | None => buildEmberServer[F](app) - case Some("BLAZE") => buildBlazeServer[F](app) - case Some("NETTY") => buildNettyServer[F](app) - case Some(other) => throw new IllegalArgumentException(s"Unrecognized http4s backend $other") - } - - private def buildEmberServer[F[_]: Async](app: HttpApp[F]) = { - implicit val network = Network.forAsync[F] - Resource.eval(Logger[F].info("Building ember server")) >> - EmberServerBuilder - .default[F] - .withHost(ipv4"0.0.0.0") - .withPort(port"8080") - .withHttpApp(app) - .withIdleTimeout(610.seconds) - .build - } - - private def buildBlazeServer[F[_]: Async](app: HttpApp[F]): Resource[F, Server] = - Resource.eval(Logger[F].info("Building blaze server")) >> - BlazeServerBuilder[F] - .bindSocketAddress(new InetSocketAddress(8080)) - .withHttpApp(app) - .withIdleTimeout(610.seconds) - .resource - - private def buildNettyServer[F[_]: Async](app: HttpApp[F]): Resource[F, Server] = - Resource.eval(Logger[F].info("Building netty server")) >> - NettyServerBuilder[F].bindLocal(8080).withHttpApp(app).withIdleTimeout(610.seconds).resource -} diff --git a/http4s/src/test/resources/test-config.hocon b/http4s/src/test/resources/test-config.hocon new file mode 100644 index 000000000..71202d62f --- /dev/null +++ b/http4s/src/test/resources/test-config.hocon @@ -0,0 +1,18 @@ +collector { + interface = "0.0.0.0" + port = 8080 + + streams { + good = "good" + bad = "bad" + + sink { + foo = "hello" + bar = "world" + } + } + + ssl { + enable = true + } +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala new file mode 100644 index 000000000..8106ab345 --- /dev/null +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala @@ -0,0 +1,40 @@ +package com.snowplowanalytics.snowplow.collector.core + +import java.nio.file.Paths + +import org.specs2.mutable.Specification + +import cats.effect.IO + +import cats.effect.testing.specs2.CatsEffect + +import io.circe.generic.semiauto._ + +class ConfigParserSpec extends Specification with CatsEffect { + + "Loading the configuration" should { + "use reference.conf and the hocon specified in the path" in { + case class SinkConfig(foo: String, bar: String) + implicit val decoder = deriveDecoder[SinkConfig] + + val path = Paths.get(getClass.getResource(("/test-config.hocon")).toURI()) + + val expectedStreams = Config.Streams[SinkConfig]( + "good", + "bad", + TestUtils.testConfig.streams.useIpAddressAsPartitionKey, + SinkConfig("hello", "world"), + TestUtils.testConfig.streams.buffer + ) + val expected = TestUtils + .testConfig + .copy[SinkConfig]( + paths = Map.empty[String, String], + streams = expectedStreams, + ssl = TestUtils.testConfig.ssl.copy(enable = true) + ) + + ConfigParser.fromPath[IO, SinkConfig](Some(path)).value.map(_ should beRight(expected)) + } + } +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala similarity index 94% rename from 
http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala rename to http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala index 6025b0c62..7b67ede3e 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala @@ -1,16 +1,20 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream +package com.snowplowanalytics.snowplow.collector.core import scala.collection.mutable.ListBuffer + +import org.specs2.mutable.Specification + import cats.effect.IO import cats.effect.unsafe.implicits.global + import org.http4s.implicits._ import org.http4s._ import org.http4s.headers._ import org.http4s.Status._ + import fs2.{Stream, text} -import org.specs2.mutable.Specification -class CollectorRoutesSpec extends Specification { +class RoutesSpec extends Specification { case class CookieParams( body: IO[Option[String]], @@ -21,7 +25,7 @@ class CollectorRoutesSpec extends Specification { contentType: Option[String] ) - class TestService() extends Service[IO] { + class TestService() extends IService[IO] { private val cookieCalls: ListBuffer[CookieParams] = ListBuffer() @@ -54,9 +58,9 @@ class CollectorRoutesSpec extends Specification { } def createTestServices = { - val collectorService = new TestService() - val routes = new CollectorRoutes[IO](collectorService).value - (collectorService, routes) + val service = new TestService() + val routes = new Routes(service).value + (service, routes) } "The collector route" should { diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala similarity index 94% rename from http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala rename to http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala index 60763f145..3b3fa4903 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala @@ -1,28 +1,36 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream +package com.snowplowanalytics.snowplow.collector.core import scala.concurrent.duration._ import scala.collection.JavaConverters._ + +import org.specs2.mutable.Specification + +import org.typelevel.ci._ + +import org.apache.thrift.{TDeserializer, TSerializer} + +import com.comcast.ip4s.{IpAddress, SocketAddress} + +import cats.data.NonEmptyList + import cats.effect.{Clock, IO} import cats.effect.unsafe.implicits.global -import cats.data.NonEmptyList -import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload + import org.http4s._ import org.http4s.headers._ import org.http4s.implicits._ -import org.typelevel.ci._ -import com.comcast.ip4s.{IpAddress, SocketAddress} -import org.specs2.mutable.Specification -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import org.apache.thrift.{TDeserializer, TSerializer} -class CollectorServiceSpec extends Specification { - case class ProbeService(service: CollectorService[IO], good: TestSink, bad: TestSink) +import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload + +import 
com.snowplowanalytics.snowplow.collector.core.model._ + +class ServiceSpec extends Specification { + case class ProbeService(service: Service[IO], good: TestSink, bad: TestSink) - val service = new CollectorService[IO]( - config = TestUtils.testConf, - sinks = CollectorSinks[IO](new TestSink, new TestSink), - appName = "appName", - appVersion = "appVersion" + val service = new Service( + config = TestUtils.testConfig, + sinks = Sinks(new TestSink, new TestSink), + appInfo = TestUtils.appInfo ) val event = new CollectorPayload("iglu-schema", "ip", System.currentTimeMillis, "UTF-8", "collector") val uuidRegex = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}".r @@ -43,11 +51,10 @@ class CollectorServiceSpec extends Specification { def probeService(): ProbeService = { val good = new TestSink val bad = new TestSink - val service = new CollectorService[IO]( - config = TestUtils.testConf, - sinks = CollectorSinks[IO](good, bad), - appName = "appName", - appVersion = "appVersion" + val service = new Service( + config = TestUtils.testConfig, + sinks = Sinks(good, bad), + appInfo = TestUtils.appInfo ) ProbeService(service, good, bad) } @@ -86,7 +93,7 @@ class CollectorServiceSpec extends Specification { headers = Headers( Header.Raw(ci"SP-Anonymous", "*") ) - ).addCookie(TestUtils.testConf.cookie.name, nuid) + ).addCookie(TestUtils.testConfig.cookie.name, nuid) val r = service .cookie( body = IO.pure(Some("b")), @@ -110,7 +117,7 @@ class CollectorServiceSpec extends Specification { val nuid = "test-nuid" val req = Request[IO]( method = Method.POST - ).addCookie(TestUtils.testConf.cookie.name, nuid) + ).addCookie(TestUtils.testConfig.cookie.name, nuid) val r = service .cookie( body = IO.pure(Some("b")), @@ -214,7 +221,7 @@ class CollectorServiceSpec extends Specification { query = Query.unsafeFromString("a=b"), authority = Some(Uri.Authority(host = Uri.RegName("example.com"))) ) - ).withAttribute(Request.Keys.ConnectionInfo, testConnection).addCookie(TestUtils.testConf.cookie.name, nuid) + ).withAttribute(Request.Keys.ConnectionInfo, testConnection).addCookie(TestUtils.testConfig.cookie.name, nuid) val r = service .cookie( body = IO.pure(Some("b")), @@ -235,7 +242,7 @@ class CollectorServiceSpec extends Specification { e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" e.ipAddress shouldEqual "192.0.2.3" e.encoding shouldEqual "UTF-8" - e.collector shouldEqual s"appName-appVersion" + e.collector shouldEqual s"${TestUtils.appName}:${TestUtils.appVersion}" e.querystring shouldEqual "a=b" e.body shouldEqual "b" e.path shouldEqual "p" @@ -303,7 +310,7 @@ class CollectorServiceSpec extends Specification { r.headers.get[`Cache-Control`] shouldEqual Some( `Cache-Control`(CacheDirective.`no-cache`(), CacheDirective.`no-store`, CacheDirective.`must-revalidate`) ) - r.body.compile.toList.unsafeRunSync().toArray shouldEqual CollectorService.pixel + r.body.compile.toList.unsafeRunSync().toArray shouldEqual Service.pixel } "include CORS headers in the response" in { @@ -391,7 +398,7 @@ class CollectorServiceSpec extends Specification { e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" e.ipAddress shouldEqual "ip" e.encoding shouldEqual "UTF-8" - e.collector shouldEqual s"appName-appVersion" + e.collector shouldEqual s"${TestUtils.appName}:${TestUtils.appVersion}" e.querystring shouldEqual "q" e.body shouldEqual "b" e.path shouldEqual "p" @@ -420,7 +427,7 @@ class CollectorServiceSpec extends Specification { e.schema 
shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" e.ipAddress shouldEqual "ip" e.encoding shouldEqual "UTF-8" - e.collector shouldEqual s"appName-appVersion" + e.collector shouldEqual s"${TestUtils.appName}:${TestUtils.appVersion}" e.querystring shouldEqual null e.body shouldEqual null e.path shouldEqual "p" @@ -448,7 +455,7 @@ class CollectorServiceSpec extends Specification { val res = service.buildHttpResponse(testHeaders, pixelExpected = true) res.status shouldEqual Status.Ok res.headers shouldEqual testHeaders.put(`Content-Type`(MediaType.image.gif)) - res.body.compile.toList.unsafeRunSync().toArray shouldEqual CollectorService.pixel + res.body.compile.toList.unsafeRunSync().toArray shouldEqual Service.pixel } "send back ok otherwise" in { val res = service.buildHttpResponse(testHeaders, pixelExpected = false) @@ -477,7 +484,7 @@ class CollectorServiceSpec extends Specification { } "cookieHeader" in { - val testCookieConfig = CookieConfig( + val testCookieConfig = Config.Cookie( enabled = true, name = "name", expiration = 5.seconds, @@ -491,30 +498,29 @@ class CollectorServiceSpec extends Specification { "give back a cookie header with the appropriate configuration" in { val nuid = "nuid" - val conf = testCookieConfig val Some(`Set-Cookie`(cookie)) = service.cookieHeader( headers = Headers.empty, - cookieConfig = Some(conf), + cookieConfig = testCookieConfig, networkUserId = nuid, doNotTrack = false, spAnonymous = None, now = now ) - cookie.name shouldEqual conf.name + cookie.name shouldEqual testCookieConfig.name cookie.content shouldEqual nuid cookie.domain shouldEqual None cookie.path shouldEqual Some("/") cookie.expires must beSome - (cookie.expires.get.toDuration - now).toMillis must beCloseTo(conf.expiration.toMillis, 1000L) + (cookie.expires.get.toDuration - now).toMillis must beCloseTo(testCookieConfig.expiration.toMillis, 1000L) cookie.secure must beFalse cookie.httpOnly must beFalse cookie.extension must beEmpty } - "give back None if no configuration is given" in { + "give back None if cookie is not enabled" in { service.cookieHeader( headers = Headers.empty, - cookieConfig = None, + cookieConfig = testCookieConfig.copy(enabled = false), networkUserId = "nuid", doNotTrack = false, spAnonymous = None, @@ -522,10 +528,9 @@ class CollectorServiceSpec extends Specification { ) shouldEqual None } "give back None if doNoTrack is true" in { - val conf = testCookieConfig service.cookieHeader( headers = Headers.empty, - cookieConfig = Some(conf), + cookieConfig = testCookieConfig, networkUserId = "nuid", doNotTrack = true, spAnonymous = None, @@ -533,10 +538,9 @@ class CollectorServiceSpec extends Specification { ) shouldEqual None } "give back None if SP-Anonymous header is present" in { - val conf = testCookieConfig service.cookieHeader( headers = Headers.empty, - cookieConfig = Some(conf), + cookieConfig = testCookieConfig, networkUserId = "nuid", doNotTrack = true, spAnonymous = Some("*"), @@ -553,7 +557,7 @@ class CollectorServiceSpec extends Specification { val Some(`Set-Cookie`(cookie)) = service.cookieHeader( headers = Headers.empty, - cookieConfig = Some(conf), + cookieConfig = conf, networkUserId = nuid, doNotTrack = false, spAnonymous = None, @@ -565,7 +569,7 @@ class CollectorServiceSpec extends Specification { cookie.extension must beNone service.cookieHeader( headers = Headers.empty, - cookieConfig = Some(conf), + cookieConfig = conf, networkUserId = nuid, doNotTrack = true, spAnonymous = None, @@ -699,7 +703,7 @@ class 
CollectorServiceSpec extends Specification { } "cookieDomain" in { - val testCookieConfig = CookieConfig( + val testCookieConfig = Config.Cookie( enabled = true, name = "name", expiration = 5.seconds, @@ -711,9 +715,8 @@ class CollectorServiceSpec extends Specification { ) "not return a domain" in { "if a list of domains is not supplied in the config and there is no fallback domain" in { - val headers = Headers.empty - val cookieConfig = testCookieConfig - service.cookieDomain(headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual None + val headers = Headers.empty + service.cookieDomain(headers, testCookieConfig.domains, testCookieConfig.fallbackDomain) shouldEqual None } "if a list of domains is supplied in the config but the Origin request header is empty and there is no fallback domain" in { val headers = Headers.empty @@ -864,11 +867,10 @@ class CollectorServiceSpec extends Specification { } "should pass on the original path if no mapping for it can be found" in { - val service = new CollectorService( - TestUtils.testConf.copy(paths = Map.empty[String, String]), - CollectorSinks(new TestSink, new TestSink), - "", - "" + val service = new Service( + TestUtils.testConfig.copy(paths = Map.empty[String, String]), + Sinks(new TestSink, new TestSink), + TestUtils.appInfo ) val expected1 = "/com.acme/track" val expected2 = "/com.acme/redirect" diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatchSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatchSpec.scala similarity index 92% rename from http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatchSpec.scala rename to http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatchSpec.scala index 84c412d06..ef734ec1b 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SplitBatchSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatchSpec.scala @@ -1,4 +1,4 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream +package com.snowplowanalytics.snowplow.collector.core import org.apache.thrift.TDeserializer @@ -8,14 +8,17 @@ import io.circe.syntax._ import com.snowplowanalytics.iglu.core.circe.implicits._ import com.snowplowanalytics.iglu.core.SelfDescribingData + import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload + import com.snowplowanalytics.snowplow.badrows._ -import com.snowplowanalytics.snowplow.collectors.scalastream.model.SplitBatchResult + +import com.snowplowanalytics.snowplow.collector.core.model.SplitBatchResult import org.specs2.mutable.Specification class SplitBatchSpec extends Specification { - val splitBatch: SplitBatch = SplitBatch("app", "version") + val splitBatch: SplitBatch = SplitBatch(TestUtils.appInfo) "SplitBatch.split" should { "Batch a list of strings based on size" in { @@ -70,7 +73,7 @@ class SplitBatchSpec extends Specification { sizeViolation.failure.actualSizeBytes must_== 1019 sizeViolation.failure.expectation must_== "oversized collector payload: GET requests cannot be split" sizeViolation.payload.event must_== "CollectorP" - sizeViolation.processor shouldEqual Processor("app", "version") + sizeViolation.processor shouldEqual Processor(TestUtils.appName, TestUtils.appVersion) actual.good must_== Nil } @@ -89,7 +92,7 @@ class SplitBatchSpec extends Specification { .failure .expectation must_== "oversized collector payload: cannot split POST requests 
which are not json expected json value got 'ssssss...' (line 1, column 1)" sizeViolation.payload.event must_== "CollectorP" - sizeViolation.processor shouldEqual Processor("app", "version") + sizeViolation.processor shouldEqual Processor(TestUtils.appName, TestUtils.appVersion) } "Reject an oversized POST CollectorPayload which would be oversized even without its body" in { @@ -118,7 +121,7 @@ class SplitBatchSpec extends Specification { sizeViolation .payload .event must_== "CollectorPayload(schema:null, ipAddress:null, timestamp:0, encoding:null, collector:null, path:ppppp" - sizeViolation.processor shouldEqual Processor("app", "version") + sizeViolation.processor shouldEqual Processor(TestUtils.appName, TestUtils.appVersion) } "Split a CollectorPayload with three large events and four very large events" in { diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestSink.scala similarity index 87% rename from http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala rename to http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestSink.scala index 2c273a603..d17aadc11 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestSink.scala @@ -1,4 +1,4 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream +package com.snowplowanalytics.snowplow.collector.core import cats.effect.IO diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala new file mode 100644 index 000000000..3937b2580 --- /dev/null +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala @@ -0,0 +1,107 @@ +package com.snowplowanalytics.snowplow.collector.core + +import scala.concurrent.duration._ + +import cats.Applicative + +import org.http4s.SameSite + +import com.snowplowanalytics.snowplow.collector.core.Config._ + +object TestUtils { + val appName = "collector-test" + val appVersion = "testVersion" + + val appInfo = new AppInfo { + def name = appName + def version = appVersion + def dockerAlias = "docker run collector" + } + + def noopSink[F[_]: Applicative]: Sink[F] = new Sink[F] { + val maxBytes: Int = Int.MaxValue + def isHealthy: F[Boolean] = Applicative[F].pure(true) + def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = Applicative[F].unit + } + + val testConfig = Config[Any]( + interface = "0.0.0.0", + port = 8080, + paths = Map( + "/com.acme/track" -> "/com.snowplowanalytics.snowplow/tp2", + "/com.acme/redirect" -> "/r/tp2", + "/com.acme/iglu" -> "/com.snowplowanalytics.iglu/v1" + ), + p3p = P3P( + "/w3c/p3p.xml", + "NOI DSP COR NID PSA OUR IND COM NAV STA" + ), + crossDomain = CrossDomain( + false, + List("*"), + true + ), + cookie = Cookie( + enabled = true, + name = "sp", + expiration = 365.days, + domains = Nil, + fallbackDomain = None, + secure = true, + httpOnly = true, + sameSite = Some(SameSite.None) + ), + doNotTrackCookie = DoNotTrackCookie( + false, + "", + "" + ), + cookieBounce = CookieBounce( + false, + "n3pc", + "00000000-0000-4000-A000-000000000000", + None + ), + redirectMacro = RedirectMacro( + false, + None + ), + rootResponse = RootResponse( + false, + 302, + Map.empty[String, String], + "" + ), + cors = CORS(60.seconds), 
+ streams = Streams( + "raw", + "bad-1", + false, + AnyRef, + Buffer( + 3145728, + 500, + 5000 + ) + ), + monitoring = Monitoring( + Metrics( + Statsd( + false, + "localhost", + 8125, + 10.seconds, + "snowplow.collector" + ) + ) + ), + ssl = SSL( + false, + false, + 443 + ), + enableDefaultRedirect = false, + redirectDomains = Set.empty[String], + preTerminationPeriod = 10.seconds + ) +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorTestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorTestUtils.scala deleted file mode 100644 index e83091692..000000000 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorTestUtils.scala +++ /dev/null @@ -1,13 +0,0 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import cats.Applicative - -object CollectorTestUtils { - - def noopSink[F[_]: Applicative]: Sink[F] = new Sink[F] { - val maxBytes: Int = Int.MaxValue - def isHealthy: F[Boolean] = Applicative[F].pure(true) - def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = Applicative[F].unit - } - -} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala deleted file mode 100644 index a4aa99982..000000000 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala +++ /dev/null @@ -1,26 +0,0 @@ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import scala.concurrent.duration._ -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ - -object TestUtils { - - val testConf = CollectorConfig( - paths = Map( - "/com.acme/track" -> "/com.snowplowanalytics.snowplow/tp2", - "/com.acme/redirect" -> "/r/tp2", - "/com.acme/iglu" -> "/com.snowplowanalytics.iglu/v1" - ), - cookie = CookieConfig( - enabled = true, - name = "sp", - expiration = 365.days, - domains = List.empty, - fallbackDomain = None, - secure = false, - httpOnly = false, - sameSite = None - ), - cors = CORSConfig(60.seconds) - ) -} diff --git a/project/Dependencies.scala b/project/Dependencies.scala index b43e773a3..fc1e4f4a7 100644 --- a/project/Dependencies.scala +++ b/project/Dependencies.scala @@ -36,22 +36,26 @@ object Dependencies { val protobuf = "3.21.7" // force this version to mitigate security vulnerabilities // Scala val collectorPayload = "0.0.0" - val tracker = "1.0.0" + val tracker = "1.0.1" val akkaHttp = "10.2.7" val akka = "2.6.16" val scopt = "4.0.1" val pureconfig = "0.17.2" val akkaHttpMetrics = "1.7.1" - val badRows = "2.1.1" + val badRows = "2.2.1" val log4cats = "2.6.0" + val http4s = "0.23.23" + val blaze = "0.23.15" + val http4sNetty = "0.5.9" + val decline = "2.4.1" + val circe = "0.14.1" + val circeConfig = "0.10.0" // Scala (test only) val specs2 = "4.11.0" val specs2CE = "0.4.1" + val specs2CE3 = "1.5.0" val testcontainers = "0.40.10" val catsRetry = "2.1.0" - val http4s = "0.23.23" - val blaze = "0.23.15" - val http4sNetty = "0.5.9" val http4sIT = "0.21.33" } @@ -88,15 +92,18 @@ object Dependencies { val akkaHttpMetrics = "fr.davit" %% "akka-http-metrics-datadog" % V.akkaHttpMetrics val log4cats = "org.typelevel" %% "log4cats-slf4j" % V.log4cats - - //http4s - val http4sDsl = "org.http4s" %% "http4s-dsl" % V.http4s - val http4sEmber = "org.http4s" %% "http4s-ember-server" % V.http4s - val http4sBlaze = "org.http4s" %% "http4s-blaze-server" % 
V.blaze - val http4sNetty = "org.http4s" %% "http4s-netty-server" % V.http4sNetty - + // http4s + val http4sDsl = "org.http4s" %% "http4s-dsl" % V.http4s + val http4sEmber = "org.http4s" %% "http4s-ember-server" % V.http4s + val http4sBlaze = "org.http4s" %% "http4s-blaze-server" % V.blaze + val http4sNetty = "org.http4s" %% "http4s-netty-server" % V.http4sNetty + val decline = "com.monovore" %% "decline-effect" % V.decline + val circeGeneric = "io.circe" %% "circe-generic" % V.circe + val circeConfig = "io.circe" %% "circe-config" % V.circeConfig + // Scala (test only) val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % Test + val specs2CE3 = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE3 % Test val specs2It = "org.specs2" %% "specs2-core" % V.specs2 % IntegrationTest val specs2CEIt = "com.codecommit" %% "cats-effect-testing-specs2" % V.specs2CE % IntegrationTest val testcontainersIt = "com.dimafeng" %% "testcontainers-scala-core" % V.testcontainers % IntegrationTest diff --git a/stdout/src/main/resources/application.conf b/stdout/src/main/resources/application.conf index 6636da3dc..570541343 100644 --- a/stdout/src/main/resources/application.conf +++ b/stdout/src/main/resources/application.conf @@ -1,38 +1,7 @@ collector { streams { - useIpAddressAsPartitionKey = false - sink { - enabled = stdout maxBytes = 1000000000 } - - buffer { - byteLimit = 3145728 - recordLimit = 500 - timeLimit = 5000 - } - } -} - -akka { - loglevel = WARNING - loggers = ["akka.event.slf4j.Slf4jLogger"] - - http.server { - remote-address-header = on - raw-request-uri-header = on - - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - illegal-header-warnings = off - } - - max-connections = 2048 - } - - coordinated-shutdown { - run-by-jvm-shutdown-hook = off } -} +} \ No newline at end of file diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/PrintingSink.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/PrintingSink.scala new file mode 100644 index 000000000..83abb72a5 --- /dev/null +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/PrintingSink.scala @@ -0,0 +1,27 @@ +package com.snowplowanalytics.snowplow.collector.stdout + +import java.io.PrintStream +import java.util.Base64 + +import cats.implicits._ + +import cats.effect.Sync + +import com.snowplowanalytics.snowplow.collector.core.Sink + +class PrintingSink[F[_]: Sync]( + maxByteS: Int, + stream: PrintStream +) extends Sink[F] { + private val encoder: Base64.Encoder = Base64.getEncoder.withoutPadding() + + override val maxBytes: Int = maxByteS + override def isHealthy: F[Boolean] = Sync[F].pure(true) + + override def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = + events.traverse_ { event => + Sync[F].delay { + stream.println(encoder.encodeToString(event)) + } + } +} diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/SinkConfig.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/SinkConfig.scala new file mode 100644 index 000000000..59e16e209 --- /dev/null +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/SinkConfig.scala @@ -0,0 +1,14 @@ +package com.snowplowanalytics.snowplow.collector.stdout + +import io.circe.Decoder +import io.circe.generic.semiauto._ + +import com.snowplowanalytics.snowplow.collector.core.Config + +final case class SinkConfig( + maxBytes: Int +) extends Config.Sink + +object SinkConfig { + implicit val configDecoder: 
Decoder[SinkConfig] = deriveDecoder[SinkConfig] +} diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala new file mode 100644 index 000000000..4fdc196c4 --- /dev/null +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala @@ -0,0 +1,17 @@ +package com.snowplowanalytics.snowplow.collector.stdout + +import cats.effect.Sync +import cats.effect.kernel.Resource + +import com.snowplowanalytics.snowplow.collector.core.model.Sinks +import com.snowplowanalytics.snowplow.collector.core.App +import com.snowplowanalytics.snowplow.collector.core.Config + +object StdoutCollector extends App[SinkConfig](BuildInfo) { + + override def mkSinks[F[_]: Sync](config: Config.Streams[SinkConfig]): Resource[F, Sinks[F]] = { + val good = new PrintingSink(config.sink.maxBytes, System.out) + val bad = new PrintingSink(config.sink.maxBytes, System.err) + Resource.pure(Sinks(good, bad)) + } +} diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PrintingSink.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PrintingSink.scala deleted file mode 100644 index ef5e7725f..000000000 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PrintingSink.scala +++ /dev/null @@ -1,31 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import cats.effect.Sync -import cats.implicits._ - -import java.io.PrintStream -import java.util.Base64 - -class PrintingSink[F[_]: Sync](stream: PrintStream) extends Sink[F] { - private val encoder: Base64.Encoder = Base64.getEncoder.withoutPadding() - - override val maxBytes: Int = Int.MaxValue // TODO: configurable? - override def isHealthy: F[Boolean] = Sync[F].pure(true) - - override def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = - events.traverse_ { event => - Sync[F].delay { - stream.println(encoder.encodeToString(event)) - } - } -} diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala deleted file mode 100644 index 3400ac297..000000000 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/StdoutCollector.scala +++ /dev/null @@ -1,45 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import scala.concurrent.duration._ -import cats.effect.kernel.Resource -import cats.effect.{ExitCode, IO, IOApp} -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ - -object StdoutCollector extends IOApp { - - def run(args: List[String]): IO[ExitCode] = { - val good = Resource.pure[IO, Sink[IO]](new PrintingSink[IO](System.out)) - val bad = Resource.pure[IO, Sink[IO]](new PrintingSink[IO](System.err)) - CollectorApp.run[IO]( - good, - bad, - CollectorConfig( - Map.empty, - cookie = CookieConfig( - enabled = true, - name = "sp", - expiration = 365.days, - domains = List.empty, - fallbackDomain = None, - secure = false, - httpOnly = false, - sameSite = None - ), - cors = CORSConfig(60.seconds) - ), - BuildInfo.shortName, - BuildInfo.version - ) - } -} diff --git a/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PrintingSinkSpec.scala b/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PrintingSinkSpec.scala index e241a95ad..ba510ca57 100644 --- a/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PrintingSinkSpec.scala +++ b/stdout/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PrintingSinkSpec.scala @@ -1,29 +1,21 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ package com.snowplowanalytics.snowplow.collectors.scalastream.sinks +import java.io.{ByteArrayOutputStream, PrintStream} +import java.nio.charset.StandardCharsets + +import org.specs2.mutable.Specification + import cats.effect.IO import cats.effect.unsafe.implicits.global -import com.snowplowanalytics.snowplow.collectors.scalastream.PrintingSink -import org.specs2.mutable.Specification -import java.io.{ByteArrayOutputStream, PrintStream} -import java.nio.charset.StandardCharsets +import com.snowplowanalytics.snowplow.collector.stdout.PrintingSink class PrintingSinkSpec extends Specification { "Printing sink" should { "print provided bytes encoded as BASE64 string" in { val baos = new ByteArrayOutputStream() - val sink = new PrintingSink[IO](new PrintStream(baos)) + val sink = new PrintingSink[IO](Integer.MAX_VALUE, new PrintStream(baos)) val input = "Something" sink.storeRawEvents(List(input.getBytes(StandardCharsets.UTF_8)), "key").unsafeRunSync() From 2a8815829df78b748c611fd5586bce60c4fc4bd7 Mon Sep 17 00:00:00 2001 From: spenes Date: Thu, 17 Aug 2023 12:37:05 +0300 Subject: [PATCH 12/39] Add http4s redirect support (close #373) --- .../Routes.scala | 13 +- .../Run.scala | 2 +- .../Service.scala | 44 +++++- .../RoutesSpec.scala | 44 +++++- .../ServiceSpec.scala | 143 +++++++++++++++++- 5 files changed, 230 insertions(+), 16 deletions(-) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala index 21401d157..18afcb585 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala @@ -7,7 +7,7 @@ import org.http4s.dsl.Http4sDsl import org.http4s.implicits._ import com.comcast.ip4s.Dns -class Routes[F[_]: Sync](service: IService[F]) extends Http4sDsl[F] { +class Routes[F[_]: Sync](enableDefaultRedirect: Boolean, service: IService[F]) extends Http4sDsl[F] { implicit val dns: Dns[F] = Dns.forSync[F] @@ -55,5 +55,14 @@ class Routes[F[_]: Sync](service: IService[F]) extends Http4sDsl[F] { ) } - val value: HttpApp[F] = (healthRoutes <+> corsRoute <+> cookieRoutes).orNotFound + def rejectRedirect = HttpRoutes.of[F] { + case _ -> Root / "r" / _ => + NotFound("redirects disabled") + } + + val value: HttpApp[F] = { + val routes = healthRoutes <+> corsRoute <+> cookieRoutes + val res = if (enableDefaultRedirect) routes else rejectRedirect <+> routes + res.orNotFound + } } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 3a107651f..41d969751 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -60,7 +60,7 @@ object Run { appInfo ) httpServer = HttpServer.build[F]( - new Routes[F](collectorService).value, + new Routes[F](config.enableDefaultRedirect, collectorService).value, config.interface, config.port ) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala index 3a20f51eb..c0bd670d1 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala @@ -65,6 +65,7 @@ class 
Service[F[_]: Sync]( ): F[Response[F]] = for { body <- body + redirect = path.startsWith("/r/") hostname = extractHostname(request) userAgent = extractHeader(request, "User-Agent") refererUri = extractHeader(request, "Referer") @@ -104,7 +105,12 @@ class Service[F[_]: Sync]( ).flatten responseHeaders = Headers(headerList) _ <- sinkEvent(event, partitionKey) - resp = buildHttpResponse(responseHeaders, pixelExpected) + resp = buildHttpResponse( + queryParams = request.uri.query.params, + headers = responseHeaders, + redirect = redirect, + pixelExpected = pixelExpected + ) } yield resp override def determinePath(vendor: String, version: String): String = { @@ -170,11 +176,19 @@ class Service[F[_]: Sync]( e } - // TODO: Handle necessary cases to build http response in here def buildHttpResponse( + queryParams: Map[String, String], headers: Headers, + redirect: Boolean, pixelExpected: Boolean ): Response[F] = + if (redirect) + buildRedirectHttpResponse(queryParams, headers) + else + buildUsualHttpResponse(pixelExpected, headers) + + /** Builds the appropriate http response when not dealing with click redirects. */ + def buildUsualHttpResponse(pixelExpected: Boolean, headers: Headers): Response[F] = pixelExpected match { case true => Response[F]( @@ -190,6 +204,32 @@ class Service[F[_]: Sync]( ) } + /** Builds the appropriate http response when dealing with click redirects. */ + def buildRedirectHttpResponse(queryParams: Map[String, String], headers: Headers): Response[F] = { + val targetUri = for { + target <- queryParams.get("u") + uri <- Uri.fromString(target).toOption + if redirectTargetAllowed(uri) + } yield uri + + targetUri match { + case Some(t) => + Response[F]( + status = Found, + headers = headers.put(Location(t)) + ) + case _ => + Response[F]( + status = BadRequest, + headers = headers + ) + } + } + + private def redirectTargetAllowed(target: Uri): Boolean = + if (config.redirectDomains.isEmpty) true + else config.redirectDomains.contains(target.host.map(_.renderString).getOrElse("")) + // TODO: Since Remote-Address and Raw-Request-URI is akka-specific headers, // they aren't included in here. It might be good to search for counterparts in Http4s. 
/** If the SP-Anonymous header is not present, retrieves all headers diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala index 7b67ede3e..81590ab95 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala @@ -57,15 +57,15 @@ class RoutesSpec extends Specification { override def determinePath(vendor: String, version: String): String = "/p1/p2" } - def createTestServices = { + def createTestServices(enabledDefaultRedirect: Boolean = true) = { val service = new TestService() - val routes = new Routes(service).value + val routes = new Routes(enabledDefaultRedirect, service).value (service, routes) } "The collector route" should { "respond to the health route with an ok response" in { - val (_, routes) = createTestServices + val (_, routes) = createTestServices() val request = Request[IO](method = Method.GET, uri = uri"/health") val response = routes.run(request).unsafeRunSync() @@ -74,7 +74,7 @@ class RoutesSpec extends Specification { } "respond to the cors route with a preflight response" in { - val (_, routes) = createTestServices + val (_, routes) = createTestServices() def test(uri: Uri) = { val request = Request[IO](method = Method.OPTIONS, uri = uri) val response = routes.run(request).unsafeRunSync() @@ -86,7 +86,7 @@ class RoutesSpec extends Specification { } "respond to the post cookie route with the cookie response" in { - val (collectorService, routes) = createTestServices + val (collectorService, routes) = createTestServices() val request = Request[IO](method = Method.POST, uri = uri"/p3/p4") .withEntity("testBody") @@ -106,7 +106,7 @@ class RoutesSpec extends Specification { "respond to the get or head cookie route with the cookie response" in { def test(method: Method) = { - val (collectorService, routes) = createTestServices + val (collectorService, routes) = createTestServices() val request = Request[IO](method = method, uri = uri"/p3/p4").withEntity("testBody") val response = routes.run(request).unsafeRunSync() @@ -128,7 +128,7 @@ class RoutesSpec extends Specification { "respond to the get or head pixel route with the cookie response" in { def test(method: Method, uri: String) = { - val (collectorService, routes) = createTestServices + val (collectorService, routes) = createTestServices() val request = Request[IO](method = method, uri = Uri.unsafeFromString(uri)).withEntity("testBody") val response = routes.run(request).unsafeRunSync() @@ -149,6 +149,36 @@ class RoutesSpec extends Specification { test(Method.GET, "/ice.png") test(Method.HEAD, "/ice.png") } + + "allow redirect routes when redirects enabled" in { + def test(method: Method) = { + val (_, routes) = createTestServices() + + val request = Request[IO](method = method, uri = uri"/r/abc") + val response = routes.run(request).unsafeRunSync() + + response.status must beEqualTo(Status.Ok) + response.bodyText.compile.string.unsafeRunSync() must beEqualTo("cookie") + } + + test(Method.GET) + test(Method.POST) + } + + "disallow redirect routes when redirects disabled" in { + def test(method: Method) = { + val (_, routes) = createTestServices(enabledDefaultRedirect = false) + + val request = Request[IO](method = method, uri = uri"/r/abc") + val response = routes.run(request).unsafeRunSync() + + response.status must beEqualTo(Status.NotFound) + 
response.bodyText.compile.string.unsafeRunSync() must beEqualTo("redirects disabled") + } + + test(Method.GET) + test(Method.POST) + } } } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala index 3b3fa4903..8ece33b51 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala @@ -48,11 +48,11 @@ class ServiceSpec extends Specification { secure = false ) - def probeService(): ProbeService = { + def probeService(config: Config[Any] = TestUtils.testConfig): ProbeService = { val good = new TestSink val bad = new TestSink val service = new Service( - config = TestUtils.testConfig, + config = config, sinks = Sinks(good, bad), appInfo = TestUtils.appInfo ) @@ -365,6 +365,35 @@ class ServiceSpec extends Specification { Header.Raw(ci"Access-Control-Allow-Origin", "http://origin.com") ) } + + "redirect if path starts with '/r/'" in { + val testConf = TestUtils + .testConfig + .copy( + redirectDomains = Set("snowplow.acme.com", "example.com") + ) + val testPath = "/r/example?u=https://snowplow.acme.com/12" + val ProbeService(service, good, bad) = probeService(config = testConf) + val req = Request[IO]( + method = Method.GET, + uri = Uri.unsafeFromString(testPath) + ) + val r = service + .cookie( + body = IO.pure(Some("b")), + path = testPath, + request = req, + pixelExpected = false, + doNotTrack = false, + contentType = None + ) + .unsafeRunSync() + + r.status mustEqual Status.Found + r.headers.get[Location] must beSome(Location(Uri.unsafeFromString("https://snowplow.acme.com/12"))) + good.storedRawEvents must have size 1 + bad.storedRawEvents must have size 0 + } } "preflightResponse" in { @@ -451,20 +480,126 @@ class ServiceSpec extends Specification { } "buildHttpResponse" in { + "rely on buildRedirectHttpResponse if redirect is true" in { + val testConfig = TestUtils + .testConfig + .copy( + redirectDomains = Set("example1.com", "example2.com") + ) + val ProbeService(service, _, _) = probeService(config = testConfig) + val res = service.buildHttpResponse( + queryParams = Map("u" -> "https://example1.com/12"), + headers = testHeaders, + redirect = true, + pixelExpected = true + ) + res.status shouldEqual Status.Found + res.headers shouldEqual testHeaders.put(Location(Uri.unsafeFromString("https://example1.com/12"))) + } + "send back a gif if pixelExpected is true" in { + val res = service.buildHttpResponse( + queryParams = Map.empty, + headers = testHeaders, + redirect = false, + pixelExpected = true + ) + res.status shouldEqual Status.Ok + res.headers shouldEqual testHeaders.put(`Content-Type`(MediaType.image.gif)) + res.body.compile.toList.unsafeRunSync().toArray shouldEqual Service.pixel + } + "send back ok otherwise" in { + val res = service.buildHttpResponse( + queryParams = Map.empty, + headers = testHeaders, + redirect = false, + pixelExpected = false + ) + res.status shouldEqual Status.Ok + res.headers shouldEqual testHeaders + res.bodyText.compile.toList.unsafeRunSync() shouldEqual List("ok") + } + } + + "buildUsualHttpResponse" in { "send back a gif if pixelExpected is true" in { - val res = service.buildHttpResponse(testHeaders, pixelExpected = true) + val res = service.buildUsualHttpResponse( + headers = testHeaders, + pixelExpected = true + ) res.status shouldEqual Status.Ok res.headers shouldEqual 
testHeaders.put(`Content-Type`(MediaType.image.gif)) res.body.compile.toList.unsafeRunSync().toArray shouldEqual Service.pixel } "send back ok otherwise" in { - val res = service.buildHttpResponse(testHeaders, pixelExpected = false) + val res = service.buildUsualHttpResponse( + headers = testHeaders, + pixelExpected = false + ) res.status shouldEqual Status.Ok res.headers shouldEqual testHeaders res.bodyText.compile.toList.unsafeRunSync() shouldEqual List("ok") } } + "buildRedirectHttpResponse" in { + "give back a 302 if redirecting and there is a u query param" in { + val testConfig = TestUtils + .testConfig + .copy( + redirectDomains = Set("example1.com", "example2.com") + ) + val ProbeService(service, _, _) = probeService(config = testConfig) + val res = service.buildRedirectHttpResponse( + queryParams = Map("u" -> "https://example1.com/12"), + headers = testHeaders + ) + res.status shouldEqual Status.Found + res.headers shouldEqual testHeaders.put(Location(Uri.unsafeFromString("https://example1.com/12"))) + } + "give back a 400 if redirecting and there are no u query params" in { + val testConfig = TestUtils + .testConfig + .copy( + redirectDomains = Set("example1.com", "example2.com") + ) + val ProbeService(service, _, _) = probeService(config = testConfig) + val res = service.buildRedirectHttpResponse( + queryParams = Map.empty, + headers = testHeaders + ) + res.status shouldEqual Status.BadRequest + res.headers shouldEqual testHeaders + } + "give back a 400 if redirecting to a disallowed domain" in { + val testConfig = TestUtils + .testConfig + .copy( + redirectDomains = Set("example1.com", "example2.com") + ) + val ProbeService(service, _, _) = probeService(config = testConfig) + val res = service.buildRedirectHttpResponse( + queryParams = Map("u" -> "https://invalidexample1.com/12"), + headers = testHeaders + ) + res.status shouldEqual Status.BadRequest + res.headers shouldEqual testHeaders + } + "give back a 302 if redirecting to an unknown domain, with no restrictions on domains" in { + val testConfig = TestUtils + .testConfig + .copy( + redirectDomains = Set.empty + ) + val ProbeService(service, _, _) = probeService(config = testConfig) + val res = service.buildRedirectHttpResponse( + queryParams = Map("u" -> "https://unknown.example.com/12"), + headers = testHeaders + ) + res.status shouldEqual Status.Found + res.headers shouldEqual testHeaders.put(Location(Uri.unsafeFromString("https://unknown.example.com/12"))) + } + } + "ipAndPartitionkey" in { "give back the ip and partition key as ip if remote address is defined" in { val address = Some("127.0.0.1") From 031fc6931f1af638f407c6d5ce9f6c8719b3f64f Mon Sep 17 00:00:00 2001 From: spenes Date: Thu, 17 Aug 2023 15:31:06 +0300 Subject: [PATCH 13/39] Add http4s SSL support (close #374) --- .../HttpServer.scala | 58 ++++++++++++++++--- .../Run.scala | 3 +- 2 files changed, 51 insertions(+), 10 deletions(-) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala index cfbc2ebe5..bef93905d 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala @@ -1,6 +1,9 @@ package com.snowplowanalytics.snowplow.collector.core import java.net.InetSocketAddress +import javax.net.ssl.SSLContext + +import io.netty.handler.ssl._ import org.typelevel.log4cats.Logger import 
org.typelevel.log4cats.slf4j.Slf4jLogger @@ -20,6 +23,7 @@ import org.http4s.blaze.server.BlazeServerBuilder import org.http4s.netty.server.NettyServerBuilder import fs2.io.net.Network +import fs2.io.net.tls.TLSContext object HttpServer { @@ -28,19 +32,21 @@ object HttpServer { def build[F[_]: Async]( app: HttpApp[F], interface: String, - port: Int + port: Int, + secure: Boolean ): Resource[F, Server] = sys.env.get("HTTP4S_BACKEND").map(_.toUpperCase()) match { - case Some("EMBER") | None => buildEmberServer[F](app, interface, port) - case Some("BLAZE") => buildBlazeServer[F](app, port) - case Some("NETTY") => buildNettyServer[F](app, port) + case Some("EMBER") | None => buildEmberServer[F](app, interface, port, secure) + case Some("BLAZE") => buildBlazeServer[F](app, port, secure) + case Some("NETTY") => buildNettyServer[F](app, port, secure) case Some(other) => throw new IllegalArgumentException(s"Unrecognized http4s backend $other") } private def buildEmberServer[F[_]: Async]( app: HttpApp[F], interface: String, - port: Int + port: Int, + secure: Boolean ) = { implicit val network = Network.forAsync[F] Resource.eval(Logger[F].info("Building ember server")) >> @@ -50,24 +56,58 @@ object HttpServer { .withPort(Port.fromInt(port).get) .withHttpApp(app) .withIdleTimeout(610.seconds) + .cond(secure, _.withTLS(TLSContext.Builder.forAsync.fromSSLContext(SSLContext.getDefault))) .build } private def buildBlazeServer[F[_]: Async]( app: HttpApp[F], - port: Int + port: Int, + secure: Boolean ): Resource[F, Server] = Resource.eval(Logger[F].info("Building blaze server")) >> BlazeServerBuilder[F] .bindSocketAddress(new InetSocketAddress(port)) .withHttpApp(app) .withIdleTimeout(610.seconds) + .cond(secure, _.withSslContext(SSLContext.getDefault)) .resource private def buildNettyServer[F[_]: Async]( app: HttpApp[F], - port: Int - ): Resource[F, Server] = + port: Int, + secure: Boolean + ) = Resource.eval(Logger[F].info("Building netty server")) >> - NettyServerBuilder[F].bindLocal(port).withHttpApp(app).withIdleTimeout(610.seconds).resource + NettyServerBuilder[F] + .bindLocal(port) + .withHttpApp(app) + .withIdleTimeout(610.seconds) + .cond( + secure, + _.withSslContext( + new JdkSslContext( + SSLContext.getDefault, + false, + null, + IdentityCipherSuiteFilter.INSTANCE, + new ApplicationProtocolConfig( + ApplicationProtocolConfig.Protocol.ALPN, + ApplicationProtocolConfig.SelectorFailureBehavior.NO_ADVERTISE, + ApplicationProtocolConfig.SelectedListenerFailureBehavior.ACCEPT, + ApplicationProtocolNames.HTTP_2, + ApplicationProtocolNames.HTTP_1_1 + ), + ClientAuth.NONE, + null, + false + ) + ) + ) + .resource + + implicit class ConditionalAction[A](item: A) { + def cond(cond: Boolean, action: A => A): A = + if (cond) action(item) else item + } } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 41d969751..a221fdfec 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -62,7 +62,8 @@ object Run { httpServer = HttpServer.build[F]( new Routes[F](config.enableDefaultRedirect, collectorService).value, config.interface, - config.port + if (config.ssl.enable) config.ssl.port else config.port, + config.ssl.enable ) _ <- withGracefulShutdown(config.preTerminationPeriod)(httpServer) } yield () From 69b302ea731de34703566782703cb74dd137ac3d Mon Sep 17 00:00:00 2001 
From: =?UTF-8?q?Piotr=20Poniedzia=C5=82ek?= Date: Thu, 10 Aug 2023 09:39:15 +0200 Subject: [PATCH 14/39] Add http4s PubSub sink (close #376) --- build.sbt | 38 ++- .../scalastream/it/CollectorContainer.scala | 19 ++ .../scalastream/it/CollectorOutput.scala | 13 +- .../scalastream/it/EventGenerator.scala | 58 ++++ .../collectors/scalastream/it/Http.scala | 35 +++ .../collectors/scalastream/it/utils.scala | 135 +++++++++ http4s/src/main/resources/reference.conf | 3 +- .../App.scala | 4 +- .../Routes.scala | 7 + .../Service.scala | 4 + .../RoutesSpec.scala | 2 + .../ServiceSpec.scala | 2 +- .../TestUtils.scala | 2 +- project/Dependencies.scala | 67 +++-- .../scalastream/it/pubsub/Containers.scala | 20 +- .../it/pubsub/GooglePubSubCollectorSpec.scala | 16 +- pubsub/src/main/resources/application.conf | 26 +- .../GooglePubSubCollector.scala | 54 ---- .../PubSubCollector.scala | 16 ++ .../sinks/BuilderOps.scala | 36 +++ .../sinks/GooglePubSubSink.scala | 272 ------------------ .../sinks/PubSubHealthCheck.scala | 74 +++++ .../sinks/PubSubSink.scala | 133 +++++++++ .../sinks/PubSubSinkConfig.scala | 33 +++ .../ConfigSpec.scala | 125 ++++++++ .../StdoutCollector.scala | 9 +- 26 files changed, 778 insertions(+), 425 deletions(-) create mode 100644 http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorContainer.scala rename pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubsubConfigSpec.scala => http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala (60%) create mode 100644 http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala create mode 100644 http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala create mode 100644 http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala delete mode 100644 pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/GooglePubSubCollector.scala create mode 100644 pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala create mode 100644 pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/BuilderOps.scala delete mode 100644 pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GooglePubSubSink.scala create mode 100644 pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubHealthCheck.scala create mode 100644 pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala create mode 100644 pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala create mode 100644 pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala diff --git a/build.sbt b/build.sbt index e9bb395ef..b28020214 100644 --- a/build.sbt +++ b/build.sbt @@ -37,9 +37,9 @@ lazy val commonDependencies = Seq( Dependencies.Libraries.akkaStreamTestkit, Dependencies.Libraries.specs2, // Integration tests - Dependencies.Libraries.testcontainersIt, - Dependencies.Libraries.http4sClientIt, - Dependencies.Libraries.catsRetryIt + Dependencies.Libraries.LegacyIT.testcontainers, + Dependencies.Libraries.LegacyIT.http4sClient, + Dependencies.Libraries.LegacyIT.catsRetry ) lazy val commonExclusions = Seq( @@ -138,9 +138,18 @@ lazy val http4s = project Dependencies.Libraries.decline, Dependencies.Libraries.circeGeneric, 
Dependencies.Libraries.circeConfig, - Dependencies.Libraries.specs2CE3 + Dependencies.Libraries.specs2, + Dependencies.Libraries.specs2CE, + + //Integration tests + Dependencies.Libraries.IT.testcontainers, + Dependencies.Libraries.IT.http4sClient, + Dependencies.Libraries.IT.catsRetry + ) ) + .settings(Defaults.itSettings) + .configs(IntegrationTest) lazy val kinesisSettings = allSettings ++ buildInfoSettings ++ Defaults.itSettings ++ scalifiedSettings ++ Seq( @@ -151,8 +160,8 @@ lazy val kinesisSettings = Dependencies.Libraries.sts, Dependencies.Libraries.sqs, // integration tests dependencies - Dependencies.Libraries.specs2It, - Dependencies.Libraries.specs2CEIt + Dependencies.Libraries.LegacyIT.specs2, + Dependencies.Libraries.LegacyIT.specs2CE ), IntegrationTest / test := (IntegrationTest / test).dependsOn(Docker / publishLocal).value, IntegrationTest / testOnly := (IntegrationTest / testOnly).dependsOn(Docker / publishLocal).evaluated @@ -195,15 +204,16 @@ lazy val sqsDistroless = project .dependsOn(core % "test->test;compile->compile") lazy val pubsubSettings = - allSettings ++ buildInfoSettings ++ Defaults.itSettings ++ scalifiedSettings ++ Seq( + allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Defaults.itSettings ++ scalifiedSettings ++ Seq( moduleName := "snowplow-stream-collector-google-pubsub", + buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream", Docker / packageName := "scala-stream-collector-pubsub", libraryDependencies ++= Seq( - Dependencies.Libraries.pubsub, - Dependencies.Libraries.protobuf, + Dependencies.Libraries.catsRetry, + Dependencies.Libraries.fs2PubSub, // integration tests dependencies - Dependencies.Libraries.specs2It, - Dependencies.Libraries.specs2CEIt, + Dependencies.Libraries.IT.specs2, + Dependencies.Libraries.IT.specs2CE, ), IntegrationTest / test := (IntegrationTest / test).dependsOn(Docker / publishLocal).value, IntegrationTest / testOnly := (IntegrationTest / testOnly).dependsOn(Docker / publishLocal).evaluated @@ -212,7 +222,7 @@ lazy val pubsubSettings = lazy val pubsub = project .settings(pubsubSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile;it->it") + .dependsOn(http4s % "test->test;compile->compile;it->it") .configs(IntegrationTest) lazy val pubsubDistroless = project @@ -220,7 +230,7 @@ lazy val pubsubDistroless = project .settings(sourceDirectory := (pubsub / sourceDirectory).value) .settings(pubsubSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile;it->it") + .dependsOn(http4s % "test->test;compile->compile;it->it") .configs(IntegrationTest) lazy val kafkaSettings = @@ -268,12 +278,12 @@ lazy val nsqDistroless = project lazy val stdoutSettings = allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Seq( moduleName := "snowplow-stream-collector-stdout", + buildInfoPackage := s"com.snowplowanalytics.snowplow.collector.stdout", Docker / packageName := "scala-stream-collector-stdout" ) lazy val stdout = project .settings(stdoutSettings) - .settings(buildInfoPackage := s"com.snowplowanalytics.snowplow.collector.stdout") .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) .dependsOn(http4s % "test->test;compile->compile") diff --git a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorContainer.scala 
b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorContainer.scala new file mode 100644 index 000000000..a0a7c886c --- /dev/null +++ b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorContainer.scala @@ -0,0 +1,19 @@ +/* + * Copyright (c) 2023-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collectors.scalastream.it + +import org.testcontainers.containers.GenericContainer + +case class CollectorContainer( + container: GenericContainer[_], + host: String, + port: Int +) diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubsubConfigSpec.scala b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala similarity index 60% rename from pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubsubConfigSpec.scala rename to http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala index b9da73a19..a14ea04af 100644 --- a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubsubConfigSpec.scala +++ b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala @@ -8,10 +8,13 @@ * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ -package com.snowplowanalytics.snowplow.collectors.scalastream +package com.snowplowanalytics.snowplow.collectors.scalastream.it -import com.snowplowanalytics.snowplow.collectors.scalastream.config.ConfigSpec +import com.snowplowanalytics.snowplow.badrows.BadRow -class PubsubConfigSpec extends ConfigSpec { - makeConfigTest("pubsub", "", "") -} +import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload + +case class CollectorOutput( + good: List[CollectorPayload], + bad: List[BadRow] +) diff --git a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala new file mode 100644 index 000000000..e25dd11ad --- /dev/null +++ b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala @@ -0,0 +1,58 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
+ */ +package com.snowplowanalytics.snowplow.collectors.scalastream.it + +import cats.effect.IO + +import org.http4s.{Method, Request, Uri} + +object EventGenerator { + + def sendEvents( + collectorHost: String, + collectorPort: Int, + nbGood: Int, + nbBad: Int, + maxBytes: Int + ): IO[Unit] = { + val requests = generateEvents(collectorHost, collectorPort, nbGood, nbBad, maxBytes) + Http.statuses(requests) + .flatMap { responses => + responses.collect { case resp if resp.code != 200 => resp.reason } match { + case Nil => IO.unit + case errors => IO.raiseError(new RuntimeException(s"${errors.size} requests were not successful. Example error: ${errors.head}")) + } + } + } + + def generateEvents( + collectorHost: String, + collectorPort: Int, + nbGood: Int, + nbBad: Int, + maxBytes: Int + ): List[Request[IO]] = { + val good = List.fill(nbGood)(mkTp2Event(collectorHost, collectorPort, valid = true, maxBytes)) + val bad = List.fill(nbBad)(mkTp2Event(collectorHost, collectorPort, valid = false, maxBytes)) + good ++ bad + } + + def mkTp2Event( + collectorHost: String, + collectorPort: Int, + valid: Boolean = true, + maxBytes: Int = 100 + ): Request[IO] = { + val uri = Uri.unsafeFromString(s"http://$collectorHost:$collectorPort/com.snowplowanalytics.snowplow/tp2") + val body = if (valid) "foo" else "a" * (maxBytes + 1) + Request[IO](Method.POST, uri).withEntity(body) + } +} diff --git a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala new file mode 100644 index 000000000..e7d1d613a --- /dev/null +++ b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala @@ -0,0 +1,35 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collectors.scalastream.it + +import cats.effect.{IO, Resource} +import cats.implicits._ +import org.http4s.blaze.client.BlazeClientBuilder +import org.http4s.client.Client +import org.http4s.{Request, Response, Status} + +object Http { + + def statuses(requests: List[Request[IO]]): IO[List[Status]] = + mkClient.use { client => requests.traverse(client.status) } + + def status(request: Request[IO]): IO[Status] = + mkClient.use { client => client.status(request) } + + def response(request: Request[IO]): IO[Response[IO]] = + mkClient.use(c => c.run(request).use(resp => IO.pure(resp))) + + def responses(requests: List[Request[IO]]): IO[List[Response[IO]]] = + mkClient.use(c => requests.traverse(r => c.run(r).use(resp => IO.pure(resp)))) + + def mkClient: Resource[IO, Client[IO]] = + BlazeClientBuilder.apply[IO].resource +} diff --git a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala new file mode 100644 index 000000000..485836c1e --- /dev/null +++ b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala @@ -0,0 +1,135 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. 
+ * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collectors.scalastream.it + +import scala.concurrent.duration._ + +import org.apache.thrift.TDeserializer + +import org.slf4j.LoggerFactory + +import org.testcontainers.containers.GenericContainer +import org.testcontainers.containers.output.Slf4jLogConsumer + +import io.circe.{Json, parser} + +import cats.implicits._ + +import cats.effect.IO + +import retry.syntax.all._ +import retry.RetryPolicies + +import com.snowplowanalytics.snowplow.badrows.BadRow + +import com.snowplowanalytics.iglu.core.SelfDescribingData +import com.snowplowanalytics.iglu.core.circe.implicits._ + +import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload + +object utils { + + def parseCollectorPayload(bytes: Array[Byte]): CollectorPayload = { + val deserializer = new TDeserializer() + val target = new CollectorPayload() + deserializer.deserialize(target, bytes) + target + } + + def parseBadRow(bytes: Array[Byte]): BadRow = { + val str = new String(bytes) + val parsed = for { + json <- parser.parse(str).leftMap(_.message) + sdj <- SelfDescribingData.parse(json).leftMap(_.message("Can't decode JSON as SDJ")) + br <- sdj.data.as[BadRow].leftMap(_.getMessage()) + } yield br + parsed match { + case Right(br) => br + case Left(err) => throw new RuntimeException(s"Can't parse bad row. Error: $err") + } + } + + def printBadRows(testName: String, badRows: List[BadRow]): IO[Unit] = { + log(testName, "Bad rows:") *> + badRows.traverse_(br => log(testName, br.compact)) + } + + def log(testName: String, line: String): IO[Unit] = + IO(println(s"[$testName] $line")) + + def startContainerWithLogs( + container: GenericContainer[_], + loggerName: String + ): GenericContainer[_] = { + container.start() + val logger = LoggerFactory.getLogger(loggerName) + val logs = new Slf4jLogConsumer(logger) + container.followOutput(logs) + container + } + + def waitWhile[A]( + a: A, + condition: A => Boolean, + maxDelay: FiniteDuration + ): IO[Boolean] = { + val retryPolicy = RetryPolicies.limitRetriesByCumulativeDelay( + maxDelay, + RetryPolicies.capDelay[IO]( + 2.second, + RetryPolicies.fullJitter[IO](1.second) + ) + ) + + IO(condition(a)).retryingOnFailures( + result => IO(!result), + retryPolicy, + (_, _) => IO.unit + ) + } + + /** Return a list of config parameters from a raw JSON string. 
*/ + def getConfigParameters(config: String): List[String] = { + val parsed: Json = parser.parse(config).valueOr { case failure => + throw new IllegalArgumentException("Can't parse JSON", failure.underlying) + } + + def flatten(value: Json): Option[List[(String, Json)]] = + value.asObject.map( + _.toList.flatMap { + case (k, v) => flatten(v) match { + case None => List(k -> v) + case Some(fields) => fields.map { + case (innerK, innerV) => s"$k.$innerK" -> innerV + } + } + } + ) + + def withSpaces(s: String): String = if(s.contains(" ")) s""""$s"""" else s + + val fields = flatten(parsed).getOrElse(throw new IllegalArgumentException("Couldn't flatten fields")) + + fields.flatMap { + case (k, v) if v.isString => + List(s"-D$k=${withSpaces(v.asString.get)}") + case (k, v) if v.isArray => + v.asArray.get.toList.zipWithIndex.map { + case (s, i) if s.isString => + s"-D$k.$i=${withSpaces(s.asString.get)}" + case (other, i) => + s"-D$k.$i=${withSpaces(other.toString)}" + } + case (k, v) => + List(s"-D$k=${withSpaces(v.toString)}") + } + } +} diff --git a/http4s/src/main/resources/reference.conf b/http4s/src/main/resources/reference.conf index e6acbc7ef..9bee1be0c 100644 --- a/http4s/src/main/resources/reference.conf +++ b/http4s/src/main/resources/reference.conf @@ -46,7 +46,7 @@ } cors { - accessControlMaxAge = 60 seconds + accessControlMaxAge = 60 minutes } streams { @@ -78,6 +78,7 @@ } enableDefaultRedirect = false + preTerminationPeriod = 10 seconds redirectDomains = [] diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala index df25ac885..cb69be2f9 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala @@ -1,6 +1,6 @@ package com.snowplowanalytics.snowplow.collector.core -import cats.effect.{ExitCode, IO, Sync} +import cats.effect.{ExitCode, IO} import cats.effect.kernel.Resource import com.monovore.decline.effect.CommandIOApp @@ -17,7 +17,7 @@ abstract class App[SinkConfig <: Config.Sink: Decoder](appInfo: AppInfo) version = appInfo.version ) { - def mkSinks[F[_]: Sync](config: Config.Streams[SinkConfig]): Resource[F, Sinks[F]] + def mkSinks(config: Config.Streams[SinkConfig]): Resource[IO, Sinks[IO]] final def main: Opts[IO[ExitCode]] = Run.fromCli[IO, SinkConfig](appInfo, mkSinks) } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala index 18afcb585..5f3ef43cd 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala @@ -14,6 +14,13 @@ class Routes[F[_]: Sync](enableDefaultRedirect: Boolean, service: IService[F]) e private val healthRoutes = HttpRoutes.of[F] { case GET -> Root / "health" => Ok("ok") + case GET -> Root / "sink-health" => + service + .sinksHealthy + .ifM( + ifTrue = Ok("ok"), + ifFalse = ServiceUnavailable("Service Unavailable") + ) } private val corsRoute = HttpRoutes.of[F] { diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala index c0bd670d1..4b33498c8 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala +++ 
b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala @@ -34,6 +34,7 @@ trait IService[F[_]] { contentType: Option[String] = None ): F[Response[F]] def determinePath(vendor: String, version: String): String + def sinksHealthy: F[Boolean] } object Service { @@ -113,6 +114,8 @@ class Service[F[_]: Sync]( ) } yield resp + override def sinksHealthy: F[Boolean] = (sinks.good.isHealthy, sinks.bad.isHealthy).mapN(_ && _) + override def determinePath(vendor: String, version: String): String = { val original = s"/$vendor/$version" config.paths.getOrElse(original, original) @@ -392,4 +395,5 @@ class Service[F[_]: Sync]( case Some(_) => Some(Service.spAnonymousNuid) case None => request.uri.query.params.get("nuid").orElse(requestCookie.map(_.content)) } + } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala index 81590ab95..229865d20 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala @@ -55,6 +55,8 @@ class RoutesSpec extends Specification { } override def determinePath(vendor: String, version: String): String = "/p1/p2" + + override def sinksHealthy: IO[Boolean] = IO.pure(true) } def createTestServices(enabledDefaultRedirect: Boolean = true) = { diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala index 8ece33b51..f44bfba02 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala @@ -402,7 +402,7 @@ class ServiceSpec extends Specification { Header.Raw(ci"Access-Control-Allow-Origin", "*"), `Access-Control-Allow-Credentials`(), `Access-Control-Allow-Headers`(ci"Content-Type", ci"SP-Anonymous"), - `Access-Control-Max-Age`.Cache(60).asInstanceOf[`Access-Control-Max-Age`] + `Access-Control-Max-Age`.Cache(3600).asInstanceOf[`Access-Control-Max-Age`] ) service.preflightResponse(Request[IO]()).unsafeRunSync.headers shouldEqual expected } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala index 3937b2580..6ef978288 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala @@ -72,7 +72,7 @@ object TestUtils { Map.empty[String, String], "" ), - cors = CORS(60.seconds), + cors = CORS(60.minutes), streams = Streams( "raw", "bad-1", diff --git a/project/Dependencies.scala b/project/Dependencies.scala index fc1e4f4a7..78304902c 100644 --- a/project/Dependencies.scala +++ b/project/Dependencies.scala @@ -50,13 +50,19 @@ object Dependencies { val decline = "2.4.1" val circe = "0.14.1" val circeConfig = "0.10.0" + val fs2PubSub = "0.22.0" + val catsRetry = "3.1.0" + // Scala (test only) val specs2 = "4.11.0" - val specs2CE = "0.4.1" - val specs2CE3 = "1.5.0" + val specs2CE = "1.5.0" val testcontainers = "0.40.10" - val catsRetry = "2.1.0" - val http4sIT = "0.21.33" + + object LegacyIT { + val specs2CE = "0.4.1" + val catsRetry = "2.1.0" + val http4s = "0.21.33" + } } object Libraries { @@ -93,24 
+99,43 @@ object Dependencies { val log4cats = "org.typelevel" %% "log4cats-slf4j" % V.log4cats // http4s - val http4sDsl = "org.http4s" %% "http4s-dsl" % V.http4s - val http4sEmber = "org.http4s" %% "http4s-ember-server" % V.http4s - val http4sBlaze = "org.http4s" %% "http4s-blaze-server" % V.blaze - val http4sNetty = "org.http4s" %% "http4s-netty-server" % V.http4sNetty - val decline = "com.monovore" %% "decline-effect" % V.decline - val circeGeneric = "io.circe" %% "circe-generic" % V.circe - val circeConfig = "io.circe" %% "circe-config" % V.circeConfig + val http4sDsl = "org.http4s" %% "http4s-dsl" % V.http4s + val http4sEmber = "org.http4s" %% "http4s-ember-server" % V.http4s + val http4sBlaze = "org.http4s" %% "http4s-blaze-server" % V.blaze + val http4sNetty = "org.http4s" %% "http4s-netty-server" % V.http4sNetty + val decline = "com.monovore" %% "decline-effect" % V.decline + val circeGeneric = "io.circe" %% "circe-generic" % V.circe + val circeConfig = "io.circe" %% "circe-config" % V.circeConfig + val catsRetry = "com.github.cb372" %% "cats-retry" % V.catsRetry + val fs2PubSub = "com.permutive" %% "fs2-google-pubsub-grpc" % V.fs2PubSub // Scala (test only) - val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % Test - val specs2CE3 = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE3 % Test - val specs2It = "org.specs2" %% "specs2-core" % V.specs2 % IntegrationTest - val specs2CEIt = "com.codecommit" %% "cats-effect-testing-specs2" % V.specs2CE % IntegrationTest - val testcontainersIt = "com.dimafeng" %% "testcontainers-scala-core" % V.testcontainers % IntegrationTest - val catsRetryIt = "com.github.cb372" %% "cats-retry" % V.catsRetry % IntegrationTest - val http4sClientIt = "org.http4s" %% "http4s-blaze-client" % V.http4sIT % IntegrationTest - val akkaTestkit = "com.typesafe.akka" %% "akka-testkit" % V.akka % Test - val akkaHttpTestkit = "com.typesafe.akka" %% "akka-http-testkit" % V.akkaHttp % Test - val akkaStreamTestkit = "com.typesafe.akka" %% "akka-stream-testkit" % V.akka % Test + + // Test common + val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % Test + val specs2CE = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE % Test + + // Test Akka + val akkaTestkit = "com.typesafe.akka" %% "akka-testkit" % V.akka % Test + val akkaHttpTestkit = "com.typesafe.akka" %% "akka-http-testkit" % V.akkaHttp % Test + val akkaStreamTestkit = "com.typesafe.akka" %% "akka-stream-testkit" % V.akka % Test + + // Integration tests + object IT { + val testcontainers = "com.dimafeng" %% "testcontainers-scala-core" % V.testcontainers % IntegrationTest + val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % IntegrationTest + val specs2CE = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE % IntegrationTest + val catsRetry = "com.github.cb372" %% "cats-retry" % V.catsRetry % IntegrationTest + val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.blaze % IntegrationTest + } + + // Integration test legacy + object LegacyIT { + val testcontainers = "com.dimafeng" %% "testcontainers-scala-core" % V.testcontainers % IntegrationTest + val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % IntegrationTest + val specs2CE = "com.codecommit" %% "cats-effect-testing-specs2" % V.LegacyIT.specs2CE % IntegrationTest + val catsRetry = "com.github.cb372" %% "cats-retry" % V.LegacyIT.catsRetry % IntegrationTest + val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.LegacyIT.http4s % IntegrationTest + } } } diff --git 
a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/Containers.scala b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/Containers.scala index 85ec55bee..92ccbb577 100644 --- a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/Containers.scala +++ b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/Containers.scala @@ -10,25 +10,16 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.pubsub -import scala.concurrent.ExecutionContext - import org.testcontainers.containers.{BindMode, Network} import org.testcontainers.containers.wait.strategy.Wait - import com.dimafeng.testcontainers.GenericContainer - -import cats.effect.{IO, Resource, Timer} - -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.ProjectMetadata - +import cats.effect.{IO, Resource} +import com.snowplowanalytics.snowplow.collectors.scalastream.BuildInfo import com.snowplowanalytics.snowplow.collectors.scalastream.it.utils._ import com.snowplowanalytics.snowplow.collectors.scalastream.it.CollectorContainer object Containers { - private val executionContext: ExecutionContext = ExecutionContext.global - implicit val ioTimer: Timer[IO] = IO.timer(executionContext) - val collectorPort = 8080 val projectId = "google-project-id" val emulatorHost = "localhost" @@ -69,7 +60,7 @@ object Containers { envs: Map[String, String] = Map.empty[String, String] ): Resource[IO, CollectorContainer] = { val container = GenericContainer( - dockerImage = s"snowplow/scala-stream-collector-pubsub:${ProjectMetadata.dockerTag}", + dockerImage = BuildInfo.dockerAlias, env = Map( "PUBSUB_EMULATOR_HOST" -> s"pubsub-emulator:$emulatorPort", "PORT" -> collectorPort.toString, @@ -77,7 +68,8 @@ object Containers { "TOPIC_BAD" -> topicBad, "GOOGLE_PROJECT_ID" -> projectId, "MAX_BYTES" -> Integer.MAX_VALUE.toString, - "JDK_JAVA_OPTIONS" -> "-Dorg.slf4j.simpleLogger.log.com.snowplowanalytics.snowplow.collectors.scalastream.sinks.GooglePubSubSink=warn" + "JDK_JAVA_OPTIONS" -> "-Dorg.slf4j.simpleLogger.log.com.snowplowanalytics.snowplow.collectors.scalastream.sinks.GooglePubSubSink=warn", + "HTTP4S_BACKEND" -> "BLAZE" ) ++ envs, exposedPorts = Seq(collectorPort), fileSystemBind = Seq( @@ -91,7 +83,7 @@ object Containers { "--config", "/snowplow/config/collector.hocon" ) - ,waitStrategy = Wait.forLogMessage(s".*REST interface bound to.*", 1) + ,waitStrategy = Wait.forLogMessage(s".*Service bound to address.*", 1) ) container.container.withNetwork(network) diff --git a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala index f8e2bc2ef..99ee196a8 100644 --- a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala +++ b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala @@ -11,22 +11,16 @@ package com.snowplowanalytics.snowplow.collectors.scalastream.it.pubsub import scala.concurrent.duration._ - import cats.effect.IO - -import org.http4s.{Request, Method, Uri, Status} - -import cats.effect.testing.specs2.CatsIO - +import org.http4s.{Method, Request, Status, Uri} +import cats.effect.testing.specs2.CatsEffect import org.specs2.mutable.Specification import org.specs2.specification.BeforeAfterAll - 
import org.testcontainers.containers.GenericContainer - import com.snowplowanalytics.snowplow.collectors.scalastream.it.utils._ import com.snowplowanalytics.snowplow.collectors.scalastream.it.{EventGenerator, Http} -class GooglePubSubCollectorSpec extends Specification with CatsIO with BeforeAfterAll { +class GooglePubSubCollectorSpec extends Specification with CatsEffect with BeforeAfterAll { override protected val Timeout = 5.minutes @@ -48,7 +42,7 @@ class GooglePubSubCollectorSpec extends Specification with CatsIO with BeforeAft "good", "bad" ).use { collector => - IO(collector.container.getLogs() must contain(("REST interface bound to"))) + IO(collector.container.getLogs() must contain("Service bound to address")) } } @@ -109,7 +103,7 @@ class GooglePubSubCollectorSpec extends Specification with CatsIO with BeforeAft _ <- waitWhile[GenericContainer[_]](container, _.isRunning, stopTimeout) } yield { container.isRunning() must beFalse - container.getLogs() must contain("Server terminated") + container.getLogs() must contain("Closing NIO1 channel") } } } diff --git a/pubsub/src/main/resources/application.conf b/pubsub/src/main/resources/application.conf index f0967a1bf..3b408c10d 100644 --- a/pubsub/src/main/resources/application.conf +++ b/pubsub/src/main/resources/application.conf @@ -1,4 +1,4 @@ -collector { +{ streams { sink { enabled = google-pub-sub @@ -29,26 +29,4 @@ collector { timeLimit = 1000 } } -} - -akka { - loglevel = WARNING - loggers = ["akka.event.slf4j.Slf4jLogger"] - - http.server { - remote-address-header = on - raw-request-uri-header = on - - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - illegal-header-warnings = off - } - - max-connections = 2048 - } - - coordinated-shutdown { - run-by-jvm-shutdown-hook = off - } -} +} \ No newline at end of file diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/GooglePubSubCollector.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/GooglePubSubCollector.scala deleted file mode 100644 index 55938984b..000000000 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/GooglePubSubCollector.scala +++ /dev/null @@ -1,54 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
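A minimal sketch (not part of this patch) of the startup detection used by the integration tests above, assuming the testcontainers-scala GenericContainer already used in these specs; the image name and port are illustrative:

import com.dimafeng.testcontainers.GenericContainer
import org.testcontainers.containers.wait.strategy.Wait

// Wait for the http4s banner ("Service bound to address") instead of the old
// Akka "REST interface bound to" message before running any assertions.
val collector: GenericContainer = GenericContainer(
  dockerImage  = "snowplow/scala-stream-collector-pubsub:latest", // illustrative tag
  exposedPorts = Seq(8080),
  waitStrategy = Wait.forLogMessage(".*Service bound to address.*", 1)
)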
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import cats.syntax.either._ -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.GooglePubSubSink -import com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService - -object GooglePubSubCollector extends Collector { - def appName = BuildInfo.shortName - def appVersion = BuildInfo.version - def scalaVersion = BuildInfo.scalaVersion - - def main(args: Array[String]): Unit = { - val (collectorConf, akkaConf) = parseConfig(args) - val telemetry = TelemetryAkkaService.initWithCollector(collectorConf, BuildInfo.moduleName, appVersion) - val sinks: Either[Throwable, CollectorSinks] = for { - pc <- collectorConf.streams.sink match { - case pc: GooglePubSub => pc.asRight - case _ => new IllegalArgumentException("Configured sink is not PubSub").asLeft - } - goodStream = collectorConf.streams.good - badStream = collectorConf.streams.bad - bufferConf = collectorConf.streams.buffer - good <- GooglePubSubSink.createAndInitialize( - pc.maxBytes, - pc, - bufferConf, - goodStream - ) - bad <- GooglePubSubSink.createAndInitialize( - pc.maxBytes, - pc, - bufferConf, - badStream - ) - } yield CollectorSinks(good, bad) - - sinks match { - case Right(s) => run(collectorConf, akkaConf, s, telemetry) - case Left(e) => throw e - } - } -} diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala new file mode 100644 index 000000000..6a1648ca6 --- /dev/null +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala @@ -0,0 +1,16 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.effect._ +import cats.effect.kernel.Resource +import com.snowplowanalytics.snowplow.collector.core.model.Sinks +import com.snowplowanalytics.snowplow.collector.core.{App, Config} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.{PubSubSink, PubSubSinkConfig} + +object PubSubCollector extends App[PubSubSinkConfig](BuildInfo) { + + override def mkSinks(config: Config.Streams[PubSubSinkConfig]): Resource[IO, Sinks[IO]] = + for { + good <- PubSubSink.create[IO](config.sink.maxBytes, config.sink, config.buffer, config.good) + bad <- PubSubSink.create[IO](config.sink.maxBytes, config.sink, config.buffer, config.bad) + } yield Sinks(good, bad) +} diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/BuilderOps.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/BuilderOps.scala new file mode 100644 index 000000000..0290bef24 --- /dev/null +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/BuilderOps.scala @@ -0,0 +1,36 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import com.google.api.gax.core.NoCredentialsProvider +import com.google.api.gax.grpc.GrpcTransportChannel +import com.google.api.gax.rpc.FixedTransportChannelProvider +import com.google.cloud.pubsub.v1.{Publisher, TopicAdminSettings} +import io.grpc.ManagedChannelBuilder + +object BuilderOps { + + implicit class PublisherBuilderOps(val builder: Publisher.Builder) extends AnyVal { + def setProvidersForEmulator(): Publisher.Builder = + 
customEmulatorHost().fold(builder) { emulatorHost => + builder + .setChannelProvider(createCustomChannelProvider(emulatorHost)) + .setCredentialsProvider(NoCredentialsProvider.create()) + } + } + + implicit class TopicAdminBuilderOps(val builder: TopicAdminSettings.Builder) extends AnyVal { + def setProvidersForEmulator(): TopicAdminSettings.Builder = + customEmulatorHost().fold(builder) { emulatorHost => + builder + .setTransportChannelProvider(createCustomChannelProvider(emulatorHost)) + .setCredentialsProvider(NoCredentialsProvider.create()) + } + } + + private def customEmulatorHost(): Option[String] = + sys.env.get("PUBSUB_EMULATOR_HOST") + + private def createCustomChannelProvider(emulatorHost: String): FixedTransportChannelProvider = { + val channel = ManagedChannelBuilder.forTarget(emulatorHost).usePlaintext().build() + FixedTransportChannelProvider.create(GrpcTransportChannel.create(channel)) + } +} diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GooglePubSubSink.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GooglePubSubSink.scala deleted file mode 100644 index 8d9fb2943..000000000 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GooglePubSubSink.scala +++ /dev/null @@ -1,272 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
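A minimal usage sketch (not part of this patch) for the BuilderOps syntax above, assuming a plain google-cloud-pubsub Publisher; the project and topic values are illustrative:

import com.google.cloud.pubsub.v1.Publisher
import com.google.pubsub.v1.ProjectTopicName
import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.BuilderOps._

// When PUBSUB_EMULATOR_HOST is set (as in the integration tests), the builder
// is pointed at the emulator over plaintext with no credentials; otherwise it
// is returned unchanged and the default providers apply.
def emulatorAwarePublisher(projectId: String, topic: String): Publisher =
  Publisher
    .newBuilder(ProjectTopicName.of(projectId, topic))
    .setProvidersForEmulator()
    .build()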
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream -package sinks - -import java.util.concurrent.Executors - -import scala.collection.JavaConverters._ -import scala.collection.mutable.ListBuffer -import scala.util._ -import scala.concurrent.duration.{FiniteDuration, MILLISECONDS} -import scala.concurrent.duration._ - -import org.threeten.bp.Duration - -import com.google.api.core.{ApiFutureCallback, ApiFutures} -import com.google.api.gax.batching.BatchingSettings -import com.google.api.gax.retrying.RetrySettings -import com.google.api.gax.core.{CredentialsProvider, NoCredentialsProvider} -import com.google.api.gax.grpc.GrpcTransportChannel -import com.google.api.gax.rpc.{ - ApiException, - FixedHeaderProvider, - FixedTransportChannelProvider, - TransportChannelProvider -} -import com.google.cloud.pubsub.v1.{Publisher, TopicAdminClient, TopicAdminSettings} -import com.google.pubsub.v1.{ProjectName, ProjectTopicName, PubsubMessage, TopicName} -import com.google.protobuf.ByteString - -import io.grpc.ManagedChannelBuilder - -import cats.syntax.either._ - -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ - -class GooglePubSubSink private ( - val maxBytes: Int, - publisher: Publisher, - projectId: String, - topicName: String, - retryInterval: FiniteDuration -) extends Sink { - private val logExecutor = Executors.newSingleThreadExecutor() - // 2 = 1 for health check + 1 for retrying failed inserts - private val scheduledExecutor = Executors.newScheduledThreadPool(2) - - private val failedInsertsBuffer = ListBuffer.empty[Array[Byte]] - - @volatile private var pubsubHealthy: Boolean = false - override def isHealthy: Boolean = pubsubHealthy - - override def storeRawEvents(events: List[Array[Byte]], key: String): Unit = - if (events.nonEmpty) { - log.debug(s"Writing ${events.size} records to PubSub topic $topicName") - - events.foreach { event => - publisher.asRight.map { p => - val future = p.publish(eventToPubsubMessage(event)) - ApiFutures.addCallback( - future, - new ApiFutureCallback[String]() { - override def onSuccess(messageId: String): Unit = { - pubsubHealthy = true - log.debug(s"Successfully published event with id $messageId to $topicName") - } - - override def onFailure(throwable: Throwable): Unit = { - pubsubHealthy = false - throwable match { - case apiEx: ApiException => - val retryable = if (apiEx.isRetryable()) "retryable" else "non-retryable" - log.error( - s"Publishing message to $topicName failed with code ${apiEx.getStatusCode} and $retryable error: ${apiEx.getMessage}" - ) - case t => log.error(s"Publishing message to $topicName failed with error: ${t.getMessage}") - } - failedInsertsBuffer.synchronized { - failedInsertsBuffer.prepend(event) - } - } - }, - logExecutor - ) - } - } - } - - override def shutdown(): Unit = { - publisher.shutdown() - scheduledExecutor.shutdown() - scheduledExecutor.awaitTermination(10000, MILLISECONDS) - () - } - - /** - * Convert event bytes to a PubsubMessage to be published - * @param event Event to be converted - * @return a PubsubMessage - */ - private def eventToPubsubMessage(event: Array[Byte]): PubsubMessage = - PubsubMessage.newBuilder.setData(ByteString.copyFrom(event)).build() - - private def retryRunnable: Runnable = new Runnable { - override def run() { - val failedInserts = failedInsertsBuffer.synchronized { - val records = failedInsertsBuffer.toList - failedInsertsBuffer.clear() - records - } - if (failedInserts.nonEmpty) { - log.info(s"Retrying to insert ${failedInserts.size} records into 
$topicName") - storeRawEvents(failedInserts, "NOT USED") - } - } - } - scheduledExecutor.scheduleWithFixedDelay(retryRunnable, retryInterval.toMillis, retryInterval.toMillis, MILLISECONDS) - - private def checkPubsubHealth( - customProviders: Option[(TransportChannelProvider, CredentialsProvider)], - startupCheckInterval: FiniteDuration - ): Unit = { - val healthRunnable = new Runnable { - override def run() { - val topicAdmin = GooglePubSubSink.createTopicAdmin(customProviders) - - while (!pubsubHealthy) { - GooglePubSubSink.topicExists(topicAdmin, projectId, topicName) match { - case Right(true) => - log.info(s"Topic $topicName exists") - pubsubHealthy = true - case Right(false) => - log.error(s"Topic $topicName doesn't exist") - Thread.sleep(startupCheckInterval.toMillis) - case Left(err) => - log.error(s"Error while checking if topic $topicName exists: ${err.getCause()}") - Thread.sleep(startupCheckInterval.toMillis) - } - } - - Either.catchNonFatal(topicAdmin.close()) match { - case Right(_) => - case Left(err) => - log.error(s"Error when closing topicAdmin: ${err.getMessage()}") - } - } - } - scheduledExecutor.execute(healthRunnable) - } -} - -/** GooglePubSubSink companion object with factory method */ -object GooglePubSubSink { - def createAndInitialize( - maxBytes: Int, - googlePubSubConfig: GooglePubSub, - bufferConfig: BufferConfig, - topicName: String - ): Either[Throwable, GooglePubSubSink] = - for { - batching <- batchingSettings(bufferConfig).asRight - retry = retrySettings(googlePubSubConfig.backoffPolicy) - customProviders = sys.env.get("PUBSUB_EMULATOR_HOST").map { hostPort => - val channel = ManagedChannelBuilder.forTarget(hostPort).usePlaintext().build() - val channelProvider = FixedTransportChannelProvider.create(GrpcTransportChannel.create(channel)) - val credentialsProvider = NoCredentialsProvider.create() - (channelProvider, credentialsProvider) - } - publisher <- createPublisher( - googlePubSubConfig.googleProjectId, - topicName, - batching, - retry, - customProviders, - googlePubSubConfig.gcpUserAgent - ) - sink = new GooglePubSubSink( - maxBytes, - publisher, - googlePubSubConfig.googleProjectId, - topicName, - googlePubSubConfig.retryInterval - ) - _ = sink.checkPubsubHealth(customProviders, googlePubSubConfig.startupCheckInterval) - } yield sink - - /** - * Instantiates a Publisher on a topic with the given configuration options. - * This can fail if the publisher can't be created. 
- * @return a PubSub publisher or an error - */ - private def createPublisher( - projectId: String, - topicName: String, - batchingSettings: BatchingSettings, - retrySettings: RetrySettings, - customProviders: Option[(TransportChannelProvider, CredentialsProvider)], - gcpUserAgent: GcpUserAgent - ): Either[Throwable, Publisher] = { - val builder = Publisher - .newBuilder(ProjectTopicName.of(projectId, topicName)) - .setBatchingSettings(batchingSettings) - .setRetrySettings(retrySettings) - .setHeaderProvider(FixedHeaderProvider.create("User-Agent", createUserAgent(gcpUserAgent))) - customProviders.foreach { - case (channelProvider, credentialsProvider) => - builder.setChannelProvider(channelProvider).setCredentialsProvider(credentialsProvider) - } - Either.catchNonFatal(builder.build()).leftMap(e => new RuntimeException("Couldn't build PubSub publisher", e)) - } - - private[sinks] def createUserAgent(gcpUserAgent: GcpUserAgent): String = - s"${gcpUserAgent.productName}/collector (GPN:Snowplow;)" - - private def batchingSettings(bufferConfig: BufferConfig): BatchingSettings = - BatchingSettings - .newBuilder() - .setElementCountThreshold(bufferConfig.recordLimit) - .setRequestByteThreshold(bufferConfig.byteLimit) - .setDelayThreshold(Duration.ofMillis(bufferConfig.timeLimit)) - .build() - - /** Defaults are used for the rpc configuration, see Publisher.java */ - private def retrySettings(backoffPolicy: GooglePubSubBackoffPolicyConfig): RetrySettings = - RetrySettings - .newBuilder() - .setInitialRetryDelay(Duration.ofMillis(backoffPolicy.minBackoff)) - .setMaxRetryDelay(Duration.ofMillis(backoffPolicy.maxBackoff)) - .setRetryDelayMultiplier(backoffPolicy.multiplier) - .setTotalTimeout(Duration.ofMillis(backoffPolicy.totalBackoff)) - .setInitialRpcTimeout(Duration.ofMillis(backoffPolicy.initialRpcTimeout)) - .setRpcTimeoutMultiplier(backoffPolicy.rpcTimeoutMultiplier) - .setMaxRpcTimeout(Duration.ofMillis(backoffPolicy.maxRpcTimeout)) - .build() - - private def createTopicAdmin( - customProviders: Option[(TransportChannelProvider, CredentialsProvider)] - ): TopicAdminClient = - customProviders match { - case Some((channelProvider, credentialsProvider)) => - TopicAdminClient.create( - TopicAdminSettings - .newBuilder() - .setTransportChannelProvider(channelProvider) - .setCredentialsProvider(credentialsProvider) - .build() - ) - case None => - TopicAdminClient.create() - } - - private def topicExists( - topicAdmin: TopicAdminClient, - projectId: String, - topicName: String - ): Either[Throwable, Boolean] = - Either - .catchNonFatal(topicAdmin.listTopics(ProjectName.of(projectId))) - .leftMap(new RuntimeException(s"Can't list topics", _)) - .map(_.iterateAll.asScala.toList.map(_.getName())) - .flatMap { topics => - topics.contains(TopicName.of(projectId, topicName).toString()).asRight - } -} diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubHealthCheck.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubHealthCheck.scala new file mode 100644 index 000000000..07940c3c0 --- /dev/null +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubHealthCheck.scala @@ -0,0 +1,74 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import cats.effect.implicits.genSpawnOps +import cats.effect.{Async, Ref, Resource, Sync} +import cats.implicits._ +import com.google.cloud.pubsub.v1.{TopicAdminClient, TopicAdminSettings} +import 
com.google.pubsub.v1.{ProjectName, TopicName} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.BuilderOps._ +import org.typelevel.log4cats.Logger +import org.typelevel.log4cats.slf4j.Slf4jLogger + +import scala.collection.JavaConverters._ +import scala.util._ + +object PubSubHealthCheck { + + implicit private def unsafeLogger[F[_]: Sync]: Logger[F] = + Slf4jLogger.getLogger[F] + + def run[F[_]: Async]( + isHealthyState: Ref[F, Boolean], + sinkConfig: PubSubSinkConfig, + topicName: String + ): Resource[F, Unit] = + for { + topicAdminClient <- createTopicAdminClient[F]() + healthCheckTask = createHealthCheckTask[F](topicAdminClient, isHealthyState, sinkConfig, topicName) + _ <- repeatInBackgroundUntilHealthy(isHealthyState, sinkConfig, healthCheckTask) + } yield () + + private def repeatInBackgroundUntilHealthy[F[_]: Async]( + isHealthyState: Ref[F, Boolean], + sinkConfig: PubSubSinkConfig, + healthCheckTask: F[Unit] + ): Resource[F, Unit] = { + val checkThenSleep = healthCheckTask *> Async[F].sleep(sinkConfig.startupCheckInterval) + checkThenSleep.untilM_(isHealthyState.get).background.void + } + + private def createHealthCheckTask[F[_]: Async]( + topicAdminClient: TopicAdminClient, + isHealthyState: Ref[F, Boolean], + sinkConfig: PubSubSinkConfig, + topicName: String + ): F[Unit] = + topicExists(topicAdminClient, sinkConfig.googleProjectId, topicName).flatMap { + case Right(true) => + Logger[F].info(s"Topic $topicName exists") *> isHealthyState.set(true) + case Right(false) => + Logger[F].error(s"Topic $topicName doesn't exist") + case Left(err) => + Logger[F].error(s"Error while checking if topic $topicName exists: ${err.getCause}") + } + + private def createTopicAdminClient[F[_]: Sync](): Resource[F, TopicAdminClient] = { + val builder = TopicAdminSettings.newBuilder().setProvidersForEmulator().build() + Resource.make(Sync[F].delay(TopicAdminClient.create(builder)))(client => Sync[F].delay(client.close())) + } + + private def topicExists[F[_]: Sync]( + topicAdmin: TopicAdminClient, + projectId: String, + topicName: String + ): F[Either[Throwable, Boolean]] = Sync[F].delay { + Either + .catchNonFatal(topicAdmin.listTopics(ProjectName.of(projectId))) + .leftMap(new RuntimeException(s"Can't list topics", _)) + .map(_.iterateAll.asScala.toList.map(_.getName())) + .flatMap { topics => + topics.contains(TopicName.of(projectId, topicName).toString).asRight + } + } + +} diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala new file mode 100644 index 000000000..17300d0e0 --- /dev/null +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala @@ -0,0 +1,133 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
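A minimal sketch (not part of this patch) of the repeat-until-healthy pattern that PubSubHealthCheck builds on, assuming only cats-effect; the names and delay are illustrative:

import cats.effect.implicits.genSpawnOps
import cats.effect.{Async, Ref, Resource}
import cats.implicits._

import scala.concurrent.duration._

// Run `check` in a background fiber, sleeping between attempts, until the
// shared Ref flips to true; releasing the Resource cancels the fiber.
def repeatUntilHealthy[F[_]: Async](healthy: Ref[F, Boolean], check: F[Unit]): Resource[F, Unit] =
  (check *> Async[F].sleep(1.second)).untilM_(healthy.get).background.void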
+ */ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import cats.Parallel +import cats.effect.implicits.genSpawnOps +import cats.effect.{Async, Ref, Resource, Sync} +import cats.implicits._ +import com.google.api.gax.retrying.RetrySettings +import com.google.api.gax.rpc.{ApiException, FixedHeaderProvider} +import com.permutive.pubsub.producer.Model.{ProjectId, Topic} +import com.permutive.pubsub.producer.encoder.MessageEncoder +import com.permutive.pubsub.producer.grpc.{GooglePubsubProducer, PubsubProducerConfig} +import com.permutive.pubsub.producer.{Model, PubsubProducer} +import com.snowplowanalytics.snowplow.collector.core.{Config, Sink} +import com.snowplowanalytics.snowplow.collectors.scalastream.BuildInfo +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.BuilderOps._ +import org.threeten.bp.Duration +import org.typelevel.log4cats.Logger +import org.typelevel.log4cats.slf4j.Slf4jLogger +import retry.RetryPolicies +import retry.syntax.all._ + +import scala.concurrent.duration.{DurationLong, FiniteDuration} +import scala.util._ + +class PubSubSink[F[_]: Async: Parallel: Logger] private ( + override val maxBytes: Int, + isHealthyState: Ref[F, Boolean], + producer: PubsubProducer[F, Array[Byte]], + retryInterval: FiniteDuration, + topicName: String +) extends Sink[F] { + + override def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = + produceBatch(events).start.void + + override def isHealthy: F[Boolean] = isHealthyState.get + + private def produceBatch(events: List[Array[Byte]]): F[Unit] = + events.parTraverse_ { event => + produceSingleEvent(event) + } *> isHealthyState.set(true) + + private def produceSingleEvent(event: Array[Byte]): F[Model.MessageId] = + producer + .produce(event) + .retryingOnAllErrors( + policy = RetryPolicies.constantDelay(retryInterval), + onError = (error, _) => handlePublishError(error) + ) + + private def handlePublishError(error: Throwable): F[Unit] = + isHealthyState.set(false) *> Logger[F].error(createErrorMessage(error)) + + private def createErrorMessage(error: Throwable): String = + error match { + case apiEx: ApiException => + val retryable = if (apiEx.isRetryable) "retryable" else "non-retryable" + s"Publishing message to $topicName failed with code ${apiEx.getStatusCode} and $retryable error: ${apiEx.getMessage}" + case throwable => s"Publishing message to $topicName failed with error: ${throwable.getMessage}" + } +} + +object PubSubSink { + + implicit private def unsafeLogger[F[_]: Sync]: Logger[F] = + Slf4jLogger.getLogger[F] + + implicit val byteArrayEncoder: MessageEncoder[Array[Byte]] = + new MessageEncoder[Array[Byte]] { + def encode(a: Array[Byte]): Either[Throwable, Array[Byte]] = + a.asRight + } + + def create[F[_]: Async: Parallel]( + maxBytes: Int, + sinkConfig: PubSubSinkConfig, + bufferConfig: Config.Buffer, + topicName: String + ): Resource[F, Sink[F]] = + for { + isHealthyState <- Resource.eval(Ref.of[F, Boolean](false)) + producer <- createProducer[F](sinkConfig, topicName, bufferConfig) + _ <- PubSubHealthCheck.run(isHealthyState, sinkConfig, topicName) + } yield new PubSubSink( + maxBytes, + isHealthyState, + producer, + sinkConfig.retryInterval, + topicName + ) + + private def createProducer[F[_]: Async: Parallel]( + sinkConfig: PubSubSinkConfig, + topicName: String, + bufferConfig: Config.Buffer + ): Resource[F, PubsubProducer[F, Array[Byte]]] = { + val config = PubsubProducerConfig[F]( + batchSize = bufferConfig.recordLimit, + requestByteThreshold = 
Some(bufferConfig.byteLimit), + delayThreshold = bufferConfig.timeLimit.millis, + onFailedTerminate = err => Logger[F].error(err)("PubSub sink termination error"), + customizePublisher = Some { + _.setRetrySettings(retrySettings(sinkConfig.backoffPolicy)) + .setHeaderProvider(FixedHeaderProvider.create("User-Agent", BuildInfo.dockerAlias)) + .setProvidersForEmulator() + } + ) + + GooglePubsubProducer.of[F, Array[Byte]](ProjectId(sinkConfig.googleProjectId), Topic(topicName), config) + } + + private def retrySettings(backoffPolicy: PubSubSinkConfig.BackoffPolicy): RetrySettings = + RetrySettings + .newBuilder() + .setInitialRetryDelay(Duration.ofMillis(backoffPolicy.minBackoff)) + .setMaxRetryDelay(Duration.ofMillis(backoffPolicy.maxBackoff)) + .setRetryDelayMultiplier(backoffPolicy.multiplier) + .setTotalTimeout(Duration.ofMillis(backoffPolicy.totalBackoff)) + .setInitialRpcTimeout(Duration.ofMillis(backoffPolicy.initialRpcTimeout)) + .setRpcTimeoutMultiplier(backoffPolicy.rpcTimeoutMultiplier) + .setMaxRpcTimeout(Duration.ofMillis(backoffPolicy.maxRpcTimeout)) + .build() +} diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala new file mode 100644 index 000000000..d8c92955b --- /dev/null +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala @@ -0,0 +1,33 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import com.snowplowanalytics.snowplow.collector.core.Config +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.PubSubSinkConfig.BackoffPolicy +import io.circe.Decoder +import io.circe.config.syntax.durationDecoder +import io.circe.generic.semiauto._ + +import scala.concurrent.duration.FiniteDuration + +final case class PubSubSinkConfig( + maxBytes: Int, + googleProjectId: String, + backoffPolicy: BackoffPolicy, + startupCheckInterval: FiniteDuration, + retryInterval: FiniteDuration +) extends Config.Sink + +object PubSubSinkConfig { + + final case class BackoffPolicy( + minBackoff: Long, + maxBackoff: Long, + totalBackoff: Long, + multiplier: Double, + initialRpcTimeout: Long, + maxRpcTimeout: Long, + rpcTimeoutMultiplier: Double + ) + implicit val configDecoder: Decoder[PubSubSinkConfig] = deriveDecoder[PubSubSinkConfig] + implicit val backoffPolicyConfigDecoder: Decoder[BackoffPolicy] = + deriveDecoder[BackoffPolicy] +} diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala new file mode 100644 index 000000000..e2bbba7e9 --- /dev/null +++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala @@ -0,0 +1,125 @@ +/* + * Copyright (c) 2012-present Snowplow Analytics Ltd. All rights reserved. + * + * This program is licensed to you under the Snowplow Community License Version 1.0, + * and you may not use this file except in compliance with the Snowplow Community License Version 1.0. 
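A minimal sketch (not part of this patch) of the cats-retry pattern the new sink uses for publishing, assuming cats-effect IO; the delay and logging are illustrative:

import cats.effect.{IO, Ref}
import retry.RetryPolicies
import retry.syntax.all._

import scala.concurrent.duration._

// Retry the publish effect on every error with a constant delay, flipping the
// shared health flag to false and logging on each failed attempt.
def publishWithRetry(publish: IO[String], healthy: Ref[IO, Boolean]): IO[String] =
  publish.retryingOnAllErrors(
    policy  = RetryPolicies.constantDelay[IO](500.millis),
    onError = (error, _) => healthy.set(false) *> IO.println(s"publish failed: ${error.getMessage}")
  )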
+ * You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0 + */ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.effect.testing.specs2.CatsEffect +import cats.effect.{ExitCode, IO} +import com.snowplowanalytics.snowplow.collector.core.{Config, ConfigParser} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.PubSubSinkConfig +import org.http4s.SameSite +import org.specs2.mutable.Specification + +import java.nio.file.Paths +import scala.concurrent.duration.DurationInt + +class ConfigSpec extends Specification with CatsEffect { + + "Config parser" should { + "be able to parse extended pubsub config" in { + assert( + resource = "/config.pubsub.extended.hocon", + expectedResult = Right(ConfigSpec.expectedConfig) + ) + } + "be able to parse minimal pubsub config" in { + assert( + resource = "/config.pubsub.minimal.hocon", + expectedResult = Right(ConfigSpec.expectedConfig) + ) + } + } + + private def assert(resource: String, expectedResult: Either[ExitCode, Config[PubSubSinkConfig]]) = { + val path = Paths.get(getClass.getResource(resource).toURI) + ConfigParser.fromPath[IO, PubSubSinkConfig](Some(path)).value.map { result => + result must beEqualTo(expectedResult) + } + } +} + +object ConfigSpec { + + private val expectedConfig = Config[PubSubSinkConfig]( + interface = "0.0.0.0", + port = 8080, + paths = Map.empty[String, String], + p3p = Config.P3P( + policyRef = "/w3c/p3p.xml", + CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" + ), + crossDomain = Config.CrossDomain( + enabled = false, + domains = List("*"), + secure = true + ), + cookie = Config.Cookie( + enabled = true, + expiration = 365.days, + name = "sp", + domains = List.empty, + fallbackDomain = None, + secure = true, + httpOnly = true, + sameSite = Some(SameSite.None) + ), + doNotTrackCookie = Config.DoNotTrackCookie( + enabled = false, + name = "", + value = "" + ), + cookieBounce = Config.CookieBounce( + enabled = false, + name = "n3pc", + fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000", + forwardedProtocolHeader = None + ), + redirectMacro = Config.RedirectMacro( + enabled = false, + placeholder = None + ), + rootResponse = Config.RootResponse( + enabled = false, + statusCode = 302, + headers = Map.empty[String, String], + body = "" + ), + cors = Config.CORS(1.hour), + monitoring = + Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + ssl = Config.SSL(enable = false, redirect = false, port = 443), + enableDefaultRedirect = false, + redirectDomains = Set.empty, + preTerminationPeriod = 10.seconds, + streams = Config.Streams( + good = "good", + bad = "bad", + useIpAddressAsPartitionKey = false, + buffer = Config.Buffer( + byteLimit = 100000, + recordLimit = 40, + timeLimit = 1000 + ), + sink = PubSubSinkConfig( + maxBytes = 10000000, + googleProjectId = "google-project-id", + backoffPolicy = PubSubSinkConfig.BackoffPolicy( + minBackoff = 1000, + maxBackoff = 1000, + totalBackoff = 9223372036854L, + multiplier = 2, + initialRpcTimeout = 10000, + maxRpcTimeout = 10000, + rpcTimeoutMultiplier = 2 + ), + startupCheckInterval = 1.second, + retryInterval = 10.seconds + ) + ) + ) + +} diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala index 4fdc196c4..5f4dd8659 100644 --- 
a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala @@ -1,17 +1,16 @@ package com.snowplowanalytics.snowplow.collector.stdout -import cats.effect.Sync +import cats.effect.IO import cats.effect.kernel.Resource - import com.snowplowanalytics.snowplow.collector.core.model.Sinks import com.snowplowanalytics.snowplow.collector.core.App import com.snowplowanalytics.snowplow.collector.core.Config object StdoutCollector extends App[SinkConfig](BuildInfo) { - override def mkSinks[F[_]: Sync](config: Config.Streams[SinkConfig]): Resource[F, Sinks[F]] = { - val good = new PrintingSink(config.sink.maxBytes, System.out) - val bad = new PrintingSink(config.sink.maxBytes, System.err) + override def mkSinks(config: Config.Streams[SinkConfig]): Resource[IO, Sinks[IO]] = { + val good = new PrintingSink[IO](config.sink.maxBytes, System.out) + val bad = new PrintingSink[IO](config.sink.maxBytes, System.err) Resource.pure(Sinks(good, bad)) } } From 2654236460ddfdb70dc9cbc92bc7ac521646cf2c Mon Sep 17 00:00:00 2001 From: Piotr Limanowski Date: Wed, 23 Aug 2023 07:48:59 +0200 Subject: [PATCH 15/39] Add iglu routes spec (close #377) This adds a running spec for iglu routes so we document the behaviour of that widely used endpoint. --- .../RoutesSpec.scala | 63 ++++++++++++++++++- 1 file changed, 60 insertions(+), 3 deletions(-) diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala index 229865d20..8dc9e824b 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala @@ -54,7 +54,7 @@ class RoutesSpec extends Specification { Response(status = Ok, body = Stream.emit("cookie").through(text.utf8.encode)) } - override def determinePath(vendor: String, version: String): String = "/p1/p2" + override def determinePath(vendor: String, version: String): String = s"/$vendor/$version" override def sinksHealthy: IO[Boolean] = IO.pure(true) } @@ -97,7 +97,7 @@ class RoutesSpec extends Specification { val List(cookieParams) = collectorService.getCookieCalls cookieParams.body.unsafeRunSync() shouldEqual Some("testBody") - cookieParams.path shouldEqual "/p1/p2" + cookieParams.path shouldEqual "/p3/p4" cookieParams.pixelExpected shouldEqual false cookieParams.doNotTrack shouldEqual false cookieParams.contentType shouldEqual Some("application/json") @@ -115,7 +115,7 @@ class RoutesSpec extends Specification { val List(cookieParams) = collectorService.getCookieCalls cookieParams.body.unsafeRunSync() shouldEqual None - cookieParams.path shouldEqual "/p1/p2" + cookieParams.path shouldEqual "/p3/p4" cookieParams.pixelExpected shouldEqual true cookieParams.doNotTrack shouldEqual false cookieParams.contentType shouldEqual None @@ -152,6 +152,63 @@ class RoutesSpec extends Specification { test(Method.HEAD, "/ice.png") } + "respond to the iglu webhook with the cookie response" in { + def test(method: Method, uri: Uri, contentType: Option[`Content-Type`], body: Option[String]) = { + val (collectorService, routes) = createTestServices() + + val request = Request[IO](method, uri).withEntity(body.getOrElse("")).withContentTypeOption(contentType) + val response = routes.run(request).unsafeRunSync() + + val List(cookieParams) =
collectorService.getCookieCalls + cookieParams.body.unsafeRunSync() shouldEqual body + cookieParams.path shouldEqual uri.path.renderString + method match { + case Method.POST => + for { + actual <- cookieParams.contentType + expected <- contentType + } yield `Content-Type`.parse(actual) must beRight(expected) + case Method.GET => + cookieParams.pixelExpected shouldEqual true + cookieParams.contentType shouldEqual None + } + cookieParams.doNotTrack shouldEqual false + response.status must beEqualTo(Status.Ok) + response.bodyText.compile.string.unsafeRunSync() must beEqualTo("cookie") + } + + val jsonBody = """{ "network": "twitter", "action": "retweet" }""" + val sdjBody = s"""{ + | "schema":"iglu:com.snowplowanalytics.snowplow/social_interaction/jsonschema/1-0-0", + | "data": $jsonBody + |}""".stripMargin + + test( + Method.POST, + uri"/com.snowplowanalytics.iglu/v1", + Some(`Content-Type`(MediaType.application.json): `Content-Type`), + Some(sdjBody) + ) + test( + Method.POST, + uri"/com.snowplowanalytics.iglu/v1?schema=iglu%3Acom.snowplowanalytics.snowplow%2Fsocial_interaction%2Fjsonschema%2F1-0-0", + Some(`Content-Type`(MediaType.application.json).withCharset(Charset.`UTF-8`)), + Some(jsonBody) + ) + test( + Method.POST, + uri"/com.snowplowanalytics.iglu/v1?schema=iglu%3Acom.snowplowanalytics.snowplow%2Fsocial_interaction%2Fjsonschema%2F1-0-0", + Some(`Content-Type`(MediaType.application.`x-www-form-urlencoded`)), + Some("network=twitter&action=retweet") + ) + test( + Method.GET, + uri"""/com.snowplowanalytics.iglu/v1?schema=iglu%3Acom.snowplowanalytics.snowplow%2Fsocial_interaction%2Fjsonschema%2F1-0-0&aid=mobile-attribution&p=mob&network=twitter&action=retweet""", + None, + None + ) + } + "allow redirect routes when redirects enabled" in { def test(method: Method) = { val (_, routes) = createTestServices() From 2516c6bb5305ff042ec13cf4a15016ace4f7270e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Piotr=20Poniedzia=C5=82ek?= Date: Fri, 18 Aug 2023 15:53:02 +0200 Subject: [PATCH 16/39] Add http4s Kinesis sink (close #379) --- build.sbt | 12 +- .../Config.scala | 1 + .../collector-cookie-anonymous.hocon | 30 ++++ .../collector-cookie-attributes-1.hocon | 30 ++++ .../collector-cookie-attributes-2.hocon | 30 ++++ .../resources/collector-cookie-domain.hocon | 32 ++++ .../resources/collector-cookie-fallback.hocon | 32 ++++ .../collector-cookie-no-domain.hocon | 30 ++++ .../it/resources/collector-custom-paths.hocon | 27 +++ .../collector-doNotTrackCookie-disabled.hocon | 27 +++ .../collector-doNotTrackCookie-enabled.hocon | 27 +++ .../scalastream/it/core/CookieSpec.scala | 158 +++++------------- .../scalastream/it/core/CustomPathsSpec.scala | 48 ++---- .../it/core/DoNotTrackCookieSpec.scala | 61 +++---- .../it/core/HealthEndpointSpec.scala | 22 +-- .../it/core/XForwardedForSpec.scala | 32 ++-- .../it/kinesis/KinesisCollectorSpec.scala | 40 ++--- .../it/kinesis/containers/Collector.scala | 21 +-- kinesis/src/main/resources/application.conf | 25 +-- .../KinesisCollector.scala | 72 ++++---- .../sinks/KinesisSink.scala | 81 +++++---- .../sinks/KinesisSinkConfig.scala | 37 ++++ .../sinks/KinesisConfigSpec.scala | 121 +++++++++++++- 23 files changed, 637 insertions(+), 359 deletions(-) create mode 100644 kinesis/src/it/resources/collector-cookie-anonymous.hocon create mode 100644 kinesis/src/it/resources/collector-cookie-attributes-1.hocon create mode 100644 kinesis/src/it/resources/collector-cookie-attributes-2.hocon create mode 100644 kinesis/src/it/resources/collector-cookie-domain.hocon create mode 
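For reference, a hedged sketch (not part of this patch) of the GET flavour of the Iglu webhook that the RoutesSpec above exercises; the query values mirror the illustrative ones from the spec:

import cats.effect.IO
import org.http4s.implicits._
import org.http4s.{Method, Request}

// Schema and event fields travel as query parameters on the GET variant.
val igluGet: Request[IO] = Request[IO](
  method = Method.GET,
  uri = uri"/com.snowplowanalytics.iglu/v1"
    .withQueryParam("schema", "iglu:com.snowplowanalytics.snowplow/social_interaction/jsonschema/1-0-0")
    .withQueryParam("aid", "mobile-attribution")
    .withQueryParam("network", "twitter")
    .withQueryParam("action", "retweet")
)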
100644 kinesis/src/it/resources/collector-cookie-fallback.hocon create mode 100644 kinesis/src/it/resources/collector-cookie-no-domain.hocon create mode 100644 kinesis/src/it/resources/collector-custom-paths.hocon create mode 100644 kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon create mode 100644 kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon create mode 100644 kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala diff --git a/build.sbt b/build.sbt index b28020214..ee4f8923c 100644 --- a/build.sbt +++ b/build.sbt @@ -152,16 +152,18 @@ lazy val http4s = project .configs(IntegrationTest) lazy val kinesisSettings = - allSettings ++ buildInfoSettings ++ Defaults.itSettings ++ scalifiedSettings ++ Seq( + allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Defaults.itSettings ++ scalifiedSettings ++ Seq( moduleName := "snowplow-stream-collector-kinesis", + buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream", Docker / packageName := "scala-stream-collector-kinesis", libraryDependencies ++= Seq( + Dependencies.Libraries.catsRetry, Dependencies.Libraries.kinesis, Dependencies.Libraries.sts, Dependencies.Libraries.sqs, // integration tests dependencies - Dependencies.Libraries.LegacyIT.specs2, - Dependencies.Libraries.LegacyIT.specs2CE + Dependencies.Libraries.IT.specs2, + Dependencies.Libraries.IT.specs2CE, ), IntegrationTest / test := (IntegrationTest / test).dependsOn(Docker / publishLocal).value, IntegrationTest / testOnly := (IntegrationTest / testOnly).dependsOn(Docker / publishLocal).evaluated @@ -170,7 +172,7 @@ lazy val kinesisSettings = lazy val kinesis = project .settings(kinesisSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile;it->it") + .dependsOn(http4s % "test->test;compile->compile;it->it") .configs(IntegrationTest) lazy val kinesisDistroless = project @@ -178,7 +180,7 @@ lazy val kinesisDistroless = project .settings(sourceDirectory := (kinesis / sourceDirectory).value) .settings(kinesisSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile;it->it") + .dependsOn(http4s % "test->test;compile->compile;it->it") .configs(IntegrationTest) lazy val sqsSettings = diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala index 5e40f43cb..f274ccf1c 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala @@ -149,4 +149,5 @@ object Config { implicit val ssl = deriveDecoder[SSL] deriveDecoder[Config[SinkConfig]] } + } diff --git a/kinesis/src/it/resources/collector-cookie-anonymous.hocon b/kinesis/src/it/resources/collector-cookie-anonymous.hocon new file mode 100644 index 000000000..55d7c4992 --- /dev/null +++ b/kinesis/src/it/resources/collector-cookie-anonymous.hocon @@ -0,0 +1,30 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${STREAM_GOOD} + bad = ${STREAM_BAD} + + sink { + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } + + maxBytes = ${MAX_BYTES} + } + } + + "cookie": { + "enabled": true, + "name": "sp", + "expiration": "365 days", + "secure": false, + "httpOnly": 
false, + "sameSite": "None" + } +} \ No newline at end of file diff --git a/kinesis/src/it/resources/collector-cookie-attributes-1.hocon b/kinesis/src/it/resources/collector-cookie-attributes-1.hocon new file mode 100644 index 000000000..3ad47e0b3 --- /dev/null +++ b/kinesis/src/it/resources/collector-cookie-attributes-1.hocon @@ -0,0 +1,30 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${STREAM_GOOD} + bad = ${STREAM_BAD} + + sink { + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } + + maxBytes = ${MAX_BYTES} + } + } + + "cookie": { + "enabled": true, + "name": "greatName", + "expiration": "42 days", + "secure": true, + "httpOnly": true, + "sameSite": "Strict" + } +} \ No newline at end of file diff --git a/kinesis/src/it/resources/collector-cookie-attributes-2.hocon b/kinesis/src/it/resources/collector-cookie-attributes-2.hocon new file mode 100644 index 000000000..55d7c4992 --- /dev/null +++ b/kinesis/src/it/resources/collector-cookie-attributes-2.hocon @@ -0,0 +1,30 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${STREAM_GOOD} + bad = ${STREAM_BAD} + + sink { + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } + + maxBytes = ${MAX_BYTES} + } + } + + "cookie": { + "enabled": true, + "name": "sp", + "expiration": "365 days", + "secure": false, + "httpOnly": false, + "sameSite": "None" + } +} \ No newline at end of file diff --git a/kinesis/src/it/resources/collector-cookie-domain.hocon b/kinesis/src/it/resources/collector-cookie-domain.hocon new file mode 100644 index 000000000..d8bdbdc4b --- /dev/null +++ b/kinesis/src/it/resources/collector-cookie-domain.hocon @@ -0,0 +1,32 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${STREAM_GOOD} + bad = ${STREAM_BAD} + + sink { + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } + + maxBytes = ${MAX_BYTES} + } + } + + "cookie": { + "enabled": true, + "name": "sp", + "expiration": "365 days", + "domains": ["foo.bar","sub.foo.bar"], + "fallbackDomain": "fallback.domain", + "secure": false, + "httpOnly": false, + "sameSite": "None" + } +} \ No newline at end of file diff --git a/kinesis/src/it/resources/collector-cookie-fallback.hocon b/kinesis/src/it/resources/collector-cookie-fallback.hocon new file mode 100644 index 000000000..ecef93c0a --- /dev/null +++ b/kinesis/src/it/resources/collector-cookie-fallback.hocon @@ -0,0 +1,32 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${STREAM_GOOD} + bad = ${STREAM_BAD} + + sink { + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } + + maxBytes = ${MAX_BYTES} + } + } + + "cookie": { + "enabled": true, + "name": "sp", + "expiration": "365 days", + "domains": ["foo.bar" ], + "fallbackDomain": "fallback.domain", + "secure": false, + "httpOnly": false, + "sameSite": "None" + } +} \ No newline at end of file diff --git a/kinesis/src/it/resources/collector-cookie-no-domain.hocon b/kinesis/src/it/resources/collector-cookie-no-domain.hocon new file mode 100644 index 000000000..55d7c4992 --- /dev/null +++ b/kinesis/src/it/resources/collector-cookie-no-domain.hocon @@ -0,0 +1,30 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${STREAM_GOOD} + bad = ${STREAM_BAD} + + sink { + region = ${REGION} + customEndpoint = 
${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } + + maxBytes = ${MAX_BYTES} + } + } + + "cookie": { + "enabled": true, + "name": "sp", + "expiration": "365 days", + "secure": false, + "httpOnly": false, + "sameSite": "None" + } +} \ No newline at end of file diff --git a/kinesis/src/it/resources/collector-custom-paths.hocon b/kinesis/src/it/resources/collector-custom-paths.hocon new file mode 100644 index 000000000..f588fb1b6 --- /dev/null +++ b/kinesis/src/it/resources/collector-custom-paths.hocon @@ -0,0 +1,27 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${STREAM_GOOD} + bad = ${STREAM_BAD} + + sink { + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } + + maxBytes = ${MAX_BYTES} + } + } + + "paths": { + "/acme/track": "/com.snowplowanalytics.snowplow/tp2", + "/acme/redirect": "/r/tp2", + "/acme/iglu": "/com.snowplowanalytics.iglu/v1" + } +} \ No newline at end of file diff --git a/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon b/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon new file mode 100644 index 000000000..bf16f99a1 --- /dev/null +++ b/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon @@ -0,0 +1,27 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${STREAM_GOOD} + bad = ${STREAM_BAD} + + sink { + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } + + maxBytes = ${MAX_BYTES} + } + } + + "doNotTrackCookie": { + "enabled": false, + "name" : "foo", + "value": "bar" + } +} \ No newline at end of file diff --git a/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon b/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon new file mode 100644 index 000000000..5415d8263 --- /dev/null +++ b/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon @@ -0,0 +1,27 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${STREAM_GOOD} + bad = ${STREAM_BAD} + + sink { + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } + + maxBytes = ${MAX_BYTES} + } + } + + "doNotTrackCookie": { + "enabled": true, + "name" : "foo", + "value": "bar" + } +} \ No newline at end of file diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala index 556d77f0a..7185e9904 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala @@ -10,71 +10,51 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core -import scala.concurrent.duration._ - -import cats.effect.testing.specs2.CatsIO - +import cats.effect.IO +import cats.effect.testing.specs2.CatsEffect +import com.snowplowanalytics.snowplow.collectors.scalastream.it.{EventGenerator, Http} +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ import org.http4s.{Header, SameSite} - import org.specs2.mutable.Specification +import org.typelevel.ci.CIStringSyntax -import com.snowplowanalytics.snowplow.collectors.scalastream.it.Http -import com.snowplowanalytics.snowplow.collectors.scalastream.it.EventGenerator - -import 
com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ +import scala.concurrent.duration._ -class CookieSpec extends Specification with Localstack with CatsIO { +class CookieSpec extends Specification with Localstack with CatsEffect { override protected val Timeout = 5.minutes "collector" should { - "set cookie attributes according to configuration" in { - "name, expiration, secure true, httpOnly true, SameSite" in { val testName = "cookie-attributes-1" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" - - val name = "greatName" - val expiration = 42 days - val secure = true - val httpOnly = true - val sameSite = SameSite.Strict + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( - "kinesis/src/it/resources/collector.hocon", + "kinesis/src/it/resources/collector-cookie-attributes-1.hocon", testName, streamGood, - streamBad, - additionalConfig = Some( - mkConfig( - name = name, - expiration = expiration, - secure = secure, - httpOnly = httpOnly, - sameSite = sameSite.toString() - ) - ) + streamBad ).use { collector => val request = EventGenerator.mkTp2Event(collector.host, collector.port) for { resp <- Http.response(request) - now <- ioTimer.clock.realTime(SECONDS) + now <- IO.realTime } yield { resp.cookies match { case List(cookie) => - cookie.name must beEqualTo(name) + cookie.name must beEqualTo("greatName") cookie.expires match { case Some(expiry) => - expiry.epochSecond should beCloseTo(now + expiration.toSeconds, 10) + expiry.epochSecond should beCloseTo((now + 42.days).toSeconds, 10) case None => ko(s"Cookie [$cookie] doesn't contain the expiry date") } cookie.secure should beTrue cookie.httpOnly should beTrue - cookie.sameSite should beEqualTo(sameSite) + cookie.sameSite should beSome(SameSite.Strict) case _ => ko(s"There is not 1 cookie but ${resp.cookies.size}") } } @@ -83,23 +63,14 @@ class CookieSpec extends Specification with Localstack with CatsIO { "secure false, httpOnly false" in { val testName = "cookie-attributes-2" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" - - val httpOnly = false - val secure = false + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( - "kinesis/src/it/resources/collector.hocon", + "kinesis/src/it/resources/collector-cookie-attributes-2.hocon", testName, streamGood, - streamBad, - additionalConfig = Some( - mkConfig( - secure = secure, - httpOnly = httpOnly - ) - ) + streamBad ).use { collector => val request = EventGenerator.mkTp2Event(collector.host, collector.port) @@ -108,7 +79,7 @@ class CookieSpec extends Specification with Localstack with CatsIO { } yield { resp.cookies match { case List(cookie) => - cookie.secure should beFalse + cookie.secure should beTrue cookie.httpOnly should beFalse case _ => ko(s"There is not 1 cookie but ${resp.cookies.size}") } @@ -119,18 +90,17 @@ class CookieSpec extends Specification with Localstack with CatsIO { "not set cookie if the request sets SP-Anonymous header" in { val testName = "cookie-anonymous" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( - "kinesis/src/it/resources/collector.hocon", + "kinesis/src/it/resources/collector-cookie-anonymous.hocon", testName, streamGood, - streamBad, - additionalConfig = Some(mkConfig()) + streamBad ).use { collector => val request = EventGenerator.mkTp2Event(collector.host, 
collector.port) - .withHeaders(Header("SP-Anonymous", "*")) + .withHeaders(Header.Raw(ci"SP-Anonymous", "*")) for { resp <- Http.response(request) @@ -142,18 +112,17 @@ class CookieSpec extends Specification with Localstack with CatsIO { "not set the domain property of the cookie if collector.cookie.domains and collector.cookie.fallbackDomain are empty" in { val testName = "cookie-no-domain" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( - "kinesis/src/it/resources/collector.hocon", + "kinesis/src/it/resources/collector-cookie-no-domain.hocon", testName, streamGood, - streamBad, - additionalConfig = Some(mkConfig()) + streamBad ).use { collector => val request = EventGenerator.mkTp2Event(collector.host, collector.port) - .withHeaders(Header("Origin", "http://my.domain")) + .withHeaders(Header.Raw(ci"Origin", "http://my.domain")) for { resp <- Http.response(request) @@ -165,32 +134,24 @@ class CookieSpec extends Specification with Localstack with CatsIO { "set the domain property of the cookie to the first domain of collector.cookie.domains that matches Origin, even with fallbackDomain enabled" in { val testName = "cookie-domain" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" - - val domain = "foo.bar" - val subDomain = s"sub.$domain" - val fallbackDomain = "fallback.domain" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( - "kinesis/src/it/resources/collector.hocon", + "kinesis/src/it/resources/collector-cookie-domain.hocon", testName, streamGood, - streamBad, - additionalConfig = Some(mkConfig( - domains = Some(List(domain, subDomain)), - fallbackDomain = Some(fallbackDomain) - )) + streamBad ).use { collector => val request = EventGenerator.mkTp2Event(collector.host, collector.port) - .withHeaders(Header("Origin", s"http://$subDomain")) + .withHeaders(Header.Raw(ci"Origin", "http://sub.foo.bar")) for { resp <- Http.response(request) } yield { resp.cookies match { case List(cookie) => - cookie.domain should beSome(domain) + cookie.domain should beSome("foo.bar") case _ => ko(s"There is not 1 cookie but ${resp.cookies.size}") } } @@ -199,60 +160,25 @@ class CookieSpec extends Specification with Localstack with CatsIO { "set the domain property of the cookie to collector.cookie.fallbackDomain if there is no Origin header in the request or if it contains no host that is in collector.cookie.domains" in { val testName = "cookie-fallback" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" - - val domain = "foo.bar" - val fallbackDomain = "fallback.domain" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( - "kinesis/src/it/resources/collector.hocon", + "kinesis/src/it/resources/collector-cookie-fallback.hocon", testName, streamGood, - streamBad, - additionalConfig = Some(mkConfig( - domains = Some(List(domain)), - fallbackDomain = Some(fallbackDomain) - )) + streamBad ).use { collector => val request1 = EventGenerator.mkTp2Event(collector.host, collector.port) - .withHeaders(Header("Origin", s"http://other.domain")) + .withHeaders(Header.Raw(ci"Origin", s"http://other.domain")) val request2 = EventGenerator.mkTp2Event(collector.host, collector.port) for { responses <- Http.responses(List(request1, request2)) } yield { - responses.flatMap(r => r.cookies.map( c => c.domain must beSome(fallbackDomain))) + responses.flatMap(r => 
r.cookies.map( c => c.domain must beSome("fallback.domain"))) } } } } - - private def mkConfig( - enabled: Boolean = true, - name: String = "sp", - expiration: FiniteDuration = 365 days, - domains: Option[List[String]] = None, - fallbackDomain: Option[String] = None, - secure: Boolean = false, - httpOnly: Boolean = false, - sameSite: String = "None" - ): String = { - s""" - { - "collector": { - "cookie": { - "enabled": $enabled, - "name": "$name", - "expiration": "${expiration.toString()}", - ${domains.fold("")(l => s""""domains": [${l.map(d => s""""$d"""").mkString(",")} ],""")} - ${fallbackDomain.fold("")(d => s""""fallbackDomain": "$d",""")} - "secure": $secure, - "httpOnly": $httpOnly, - "sameSite": "$sameSite" - } - } - } - """ - } } diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CustomPathsSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CustomPathsSpec.scala index 7c69ed56e..b13fba8f3 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CustomPathsSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CustomPathsSpec.scala @@ -10,56 +10,36 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core -import scala.concurrent.duration._ - import cats.effect.IO - -import cats.effect.testing.specs2.CatsIO - +import cats.effect.testing.specs2.CatsEffect +import com.snowplowanalytics.snowplow.collectors.scalastream.it.Http +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.Kinesis +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ +import org.http4s.{Method, Request, Uri} import org.specs2.mutable.Specification -import org.http4s.{Request, Method, Uri} - -import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ -import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.Kinesis -import com.snowplowanalytics.snowplow.collectors.scalastream.it.Http +import scala.concurrent.duration._ -class CustomPathsSpec extends Specification with Localstack with CatsIO { +class CustomPathsSpec extends Specification with Localstack with CatsEffect { override protected val Timeout = 5.minutes "collector" should { "map custom paths" in { val testName = "custom-paths" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" val originalPaths = List( "/acme/track", "/acme/redirect", "/acme/iglu" ) - val targetPaths = List( - "/com.snowplowanalytics.snowplow/tp2", - "/r/tp2", - "/com.snowplowanalytics.iglu/v1" - ) - val customPaths = originalPaths.zip(targetPaths) - val config = s""" - { - "collector": { - "paths": { - ${customPaths.map { case (k, v) => s""""$k": "$v""""}.mkString(",\n")} - } - } - }""" - Collector.container( - "kinesis/src/it/resources/collector.hocon", + "kinesis/src/it/resources/collector-custom-paths.hocon", testName, streamGood, - streamBad, - additionalConfig = Some(config) + streamBad ).use { collector => val requests = originalPaths.map { p => val uri = Uri.unsafeFromString(s"http://${collector.host}:${collector.port}$p") @@ -72,7 +52,11 @@ class CustomPathsSpec extends Specification with Localstack with CatsIO { collectorOutput <- Kinesis.readOutput(streamGood, streamBad) outputPaths = collectorOutput.good.map(cp => cp.getPath()) } yield { - outputPaths must beEqualTo(targetPaths) + outputPaths must 
beEqualTo(List( + "/com.snowplowanalytics.snowplow/tp2", + "/r/tp2", + "/com.snowplowanalytics.iglu/v1" + )) } } } diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala index 37f5b2f9c..43e521f17 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala @@ -10,22 +10,18 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core -import scala.concurrent.duration._ -import scala.collection.JavaConverters._ - import cats.effect.IO - -import cats.effect.testing.specs2.CatsIO - +import cats.effect.testing.specs2.CatsEffect +import com.snowplowanalytics.snowplow.collectors.scalastream.it.{EventGenerator, Http} +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.Kinesis +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ +import org.specs2.execute.PendingUntilFixed import org.specs2.mutable.Specification -import com.snowplowanalytics.snowplow.collectors.scalastream.it.Http -import com.snowplowanalytics.snowplow.collectors.scalastream.it.EventGenerator - -import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ -import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.Kinesis +import scala.collection.JavaConverters._ +import scala.concurrent.duration._ -class DoNotTrackCookieSpec extends Specification with Localstack with CatsIO { +class DoNotTrackCookieSpec extends Specification with Localstack with CatsEffect with PendingUntilFixed { override protected val Timeout = 5.minutes @@ -34,16 +30,17 @@ class DoNotTrackCookieSpec extends Specification with Localstack with CatsIO { val cookieValue = "bar" "ignore events that have a cookie whose name and value match doNotTrackCookie config if enabled" in { + import cats.effect.unsafe.implicits.global + val testName = "doNotTrackCookie-enabled" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( - "kinesis/src/it/resources/collector.hocon", + "kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon", testName, streamGood, - streamBad, - additionalConfig = Some(mkConfig(true, cookieName, cookieValue)) + streamBad ).use { collector => val requests = List( EventGenerator.mkTp2Event(collector.host, collector.port).addCookie(cookieName, cookieName), @@ -63,20 +60,19 @@ class DoNotTrackCookieSpec extends Specification with Localstack with CatsIO { headers must haveSize(2) expected.forall(cookie => headers.exists(_.contains(cookie))) must beTrue } - } - } - - "track events that have a cookie whose name and value match doNotTrackCookie config if disabled" in { + }.unsafeRunSync() + }.pendingUntilFixed("Remove when 'do not track cookie' feature is implemented") + + "track events that have a cookie whose name and value match doNotTrackCookie config if disabled" in { val testName = "doNotTrackCookie-disabled" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( - "kinesis/src/it/resources/collector.hocon", + 
"kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon", testName, streamGood, - streamBad, - additionalConfig = Some(mkConfig(false, cookieName, cookieValue)) + streamBad ).use { collector => val request = EventGenerator.mkTp2Event(collector.host, collector.port).addCookie(cookieName, cookieValue) @@ -98,17 +94,4 @@ class DoNotTrackCookieSpec extends Specification with Localstack with CatsIO { } } } - - private def mkConfig(enabled: Boolean, cookieName: String, cookieValue: String): String = - s""" - { - "collector": { - "doNotTrackCookie": { - "enabled": $enabled, - "name" : "$cookieName", - "value": "$cookieValue" - } - } - } - """ } diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/HealthEndpointSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/HealthEndpointSpec.scala index 9c25c834a..27afc2e9c 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/HealthEndpointSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/HealthEndpointSpec.scala @@ -10,29 +10,25 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core -import scala.concurrent.duration._ - import cats.effect.IO - -import cats.effect.testing.specs2.CatsIO - +import cats.effect.testing.specs2.CatsEffect +import com.snowplowanalytics.snowplow.collectors.scalastream.it.Http +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.Kinesis +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ +import org.http4s.{Method, Request, Uri} import org.specs2.mutable.Specification -import org.http4s.{Request, Method, Uri} - -import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ -import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.Kinesis -import com.snowplowanalytics.snowplow.collectors.scalastream.it.Http +import scala.concurrent.duration._ -class HealthEndpointSpec extends Specification with Localstack with CatsIO { +class HealthEndpointSpec extends Specification with Localstack with CatsEffect { override protected val Timeout = 5.minutes "collector" should { "respond with 200 to /health endpoint after it has started" in { val testName = "health-endpoint" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( "kinesis/src/it/resources/collector.hocon", testName, diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/XForwardedForSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/XForwardedForSpec.scala index cd21768bf..302e907cc 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/XForwardedForSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/XForwardedForSpec.scala @@ -10,37 +10,29 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.core -import java.net.InetAddress - -import scala.concurrent.duration._ - import cats.data.NonEmptyList - import cats.effect.IO - -import cats.effect.testing.specs2.CatsIO - -import org.specs2.mutable.Specification - +import cats.effect.testing.specs2.CatsEffect +import com.comcast.ip4s.IpAddress +import com.snowplowanalytics.snowplow.collectors.scalastream.it.{EventGenerator, Http} +import 
com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.Kinesis +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ import org.http4s.headers.`X-Forwarded-For` +import org.specs2.mutable.Specification -import com.snowplowanalytics.snowplow.collectors.scalastream.it.Http -import com.snowplowanalytics.snowplow.collectors.scalastream.it.EventGenerator - -import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ -import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.Kinesis +import scala.concurrent.duration._ -class XForwardedForSpec extends Specification with Localstack with CatsIO { +class XForwardedForSpec extends Specification with Localstack with CatsEffect { override protected val Timeout = 5.minutes "collector" should { "put X-Forwarded-For header in the collector payload" in { val testName = "X-Forwarded-For" - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" - val ip = InetAddress.getByName("123.123.123.123") + val ip = IpAddress.fromString("123.123.123.123") Collector.container( "kinesis/src/it/resources/collector.hocon", @@ -49,7 +41,7 @@ class XForwardedForSpec extends Specification with Localstack with CatsIO { streamBad ).use { collector => val request = EventGenerator.mkTp2Event(collector.host, collector.port) - .withHeaders(`X-Forwarded-For`(NonEmptyList.one(Some(ip)))) + .withHeaders(`X-Forwarded-For`(NonEmptyList.one(ip))) for { _ <- Http.status(request) diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/KinesisCollectorSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/KinesisCollectorSpec.scala index d606b2e36..1a5f2c3d2 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/KinesisCollectorSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/KinesisCollectorSpec.scala @@ -10,24 +10,18 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis -import scala.concurrent.duration._ - import cats.effect.IO - -import cats.effect.testing.specs2.CatsIO - -import org.http4s.{Request, Method, Uri, Status} - -import org.specs2.mutable.Specification - -import org.testcontainers.containers.GenericContainer - +import cats.effect.testing.specs2.CatsEffect +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ import com.snowplowanalytics.snowplow.collectors.scalastream.it.utils._ import com.snowplowanalytics.snowplow.collectors.scalastream.it.{EventGenerator, Http} +import org.http4s.{Method, Request, Status, Uri} +import org.specs2.mutable.Specification +import org.testcontainers.containers.GenericContainer -import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ +import scala.concurrent.duration._ -class KinesisCollectorSpec extends Specification with Localstack with CatsIO { +class KinesisCollectorSpec extends Specification with Localstack with CatsEffect { override protected val Timeout = 5.minutes @@ -39,10 +33,10 @@ class KinesisCollectorSpec extends Specification with Localstack with CatsIO { Collector.container( "examples/config.kinesis.minimal.hocon", testName, - s"${testName}-raw", - s"${testName}-bad-1" + s"$testName-raw", + s"$testName-bad-1" ).use { collector => - IO(collector.container.getLogs() must contain(("REST interface 
bound to"))) + IO(collector.container.getLogs() must contain(("Service bound to address"))) } } @@ -50,8 +44,8 @@ class KinesisCollectorSpec extends Specification with Localstack with CatsIO { val testName = "count" val nbGood = 1000 val nbBad = 10 - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( "kinesis/src/it/resources/collector.hocon", @@ -85,8 +79,8 @@ class KinesisCollectorSpec extends Specification with Localstack with CatsIO { Collector.container( "kinesis/src/it/resources/collector.hocon", testName, - s"${testName}-raw", - s"${testName}-bad-1" + s"$testName-raw", + s"$testName-bad-1" ).use { collector => val container = collector.container for { @@ -95,7 +89,7 @@ class KinesisCollectorSpec extends Specification with Localstack with CatsIO { _ <- waitWhile[GenericContainer[_]](container, _.isRunning, stopTimeout) } yield { container.isRunning() must beFalse - container.getLogs() must contain("Server terminated") + container.getLogs() must contain("Closing NIO1 channel") } } } @@ -104,8 +98,8 @@ class KinesisCollectorSpec extends Specification with Localstack with CatsIO { val testName = "sink-health" val nbGood = 10 val nbBad = 10 - val streamGood = s"${testName}-raw" - val streamBad = s"${testName}-bad-1" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" Collector.container( "kinesis/src/it/resources/collector.hocon", diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Collector.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Collector.scala index 2a5b44e37..64f2e601b 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Collector.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/containers/Collector.scala @@ -10,17 +10,13 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers -import org.testcontainers.containers.BindMode -import org.testcontainers.containers.wait.strategy.Wait - -import com.dimafeng.testcontainers.GenericContainer - import cats.effect.{IO, Resource} - -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.ProjectMetadata - -import com.snowplowanalytics.snowplow.collectors.scalastream.it.utils._ +import com.dimafeng.testcontainers.GenericContainer +import com.snowplowanalytics.snowplow.collectors.scalastream.BuildInfo import com.snowplowanalytics.snowplow.collectors.scalastream.it.CollectorContainer +import com.snowplowanalytics.snowplow.collectors.scalastream.it.utils._ +import org.testcontainers.containers.BindMode +import org.testcontainers.containers.wait.strategy.Wait object Collector { @@ -36,7 +32,7 @@ object Collector { additionalConfig: Option[String] = None ): Resource[IO, CollectorContainer] = { val container = GenericContainer( - dockerImage = s"snowplow/scala-stream-collector-kinesis:${ProjectMetadata.dockerTag}", + dockerImage = BuildInfo.dockerAlias, env = Map( "AWS_ACCESS_KEY_ID" -> "whatever", "AWS_SECRET_ACCESS_KEY" -> "whatever", @@ -46,7 +42,8 @@ object Collector { "REGION" -> Localstack.region, "KINESIS_ENDPOINT" -> Localstack.privateEndpoint, "MAX_BYTES" -> maxBytes.toString, - "JDK_JAVA_OPTIONS" -> "-Dorg.slf4j.simpleLogger.log.com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink=warn" + "JDK_JAVA_OPTIONS" -> 
"-Dorg.slf4j.simpleLogger.log.com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink=warn", + "HTTP4S_BACKEND" -> "BLAZE" ) ++ configParameters(additionalConfig), exposedPorts = Seq(port), fileSystemBind = Seq( @@ -60,7 +57,7 @@ object Collector { "--config", "/snowplow/config/collector.hocon" ), - waitStrategy = Wait.forLogMessage(s".*REST interface bound to.*", 1) + waitStrategy = Wait.forLogMessage(s".*Service bound to address.*", 1) ) container.container.withNetwork(Localstack.network) diff --git a/kinesis/src/main/resources/application.conf b/kinesis/src/main/resources/application.conf index 7331140b0..49ee01e22 100644 --- a/kinesis/src/main/resources/application.conf +++ b/kinesis/src/main/resources/application.conf @@ -1,4 +1,4 @@ -collector { +{ streams { sink { enabled = kinesis @@ -29,26 +29,3 @@ collector { } } - - -akka { - loglevel = WARNING - loggers = ["akka.event.slf4j.Slf4jLogger"] - - http.server { - remote-address-header = on - raw-request-uri-header = on - - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - illegal-header-warnings = off - } - - max-connections = 2048 - } - - coordinated-shutdown { - run-by-jvm-shutdown-hook = off - } -} diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala index 9209debc9..7223f5bf1 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala @@ -10,57 +10,41 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream +import cats.effect.{IO, Resource} +import com.snowplowanalytics.snowplow.collector.core.model.Sinks +import com.snowplowanalytics.snowplow.collector.core.{App, Config} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.{KinesisSink, KinesisSinkConfig} +import org.slf4j.LoggerFactory + import java.util.concurrent.ScheduledThreadPoolExecutor -import cats.syntax.either._ -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink -import com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService -object KinesisCollector extends Collector { - def appName = BuildInfo.shortName - def appVersion = BuildInfo.version - def scalaVersion = BuildInfo.scalaVersion +object KinesisCollector extends App[KinesisSinkConfig](BuildInfo) { + + private lazy val log = LoggerFactory.getLogger(getClass) - def main(args: Array[String]): Unit = { - val (collectorConf, akkaConf) = parseConfig(args) - val telemetry = TelemetryAkkaService.initWithCollector(collectorConf, BuildInfo.moduleName, appVersion) - val sinks: Either[Throwable, CollectorSinks] = for { - kc <- collectorConf.streams.sink match { - case kc: Kinesis => kc.asRight - case _ => new IllegalArgumentException("Configured sink is not Kinesis").asLeft - } - es = buildExecutorService(kc) - goodStream = collectorConf.streams.good - badStream = collectorConf.streams.bad - bufferConf = collectorConf.streams.buffer - sqsGood = kc.sqsGoodBuffer - sqsBad = kc.sqsBadBuffer - good <- KinesisSink.createAndInitialize( - kc.maxBytes, - kc, - bufferConf, - goodStream, - sqsGood, - es + override def mkSinks(config: 
Config.Streams[KinesisSinkConfig]): Resource[IO, Sinks[IO]] = { + val threadPoolExecutor = buildExecutorService(config.sink) + for { + good <- KinesisSink.create[IO]( + kinesisMaxBytes = config.sink.maxBytes, + kinesisConfig = config.sink, + bufferConfig = config.buffer, + streamName = config.good, + sqsBufferName = config.sink.sqsGoodBuffer, + threadPoolExecutor ) - bad <- KinesisSink.createAndInitialize( - kc.maxBytes, - kc, - bufferConf, - badStream, - sqsBad, - es + bad <- KinesisSink.create[IO]( + kinesisMaxBytes = config.sink.maxBytes, + kinesisConfig = config.sink, + bufferConfig = config.buffer, + streamName = config.bad, + sqsBufferName = config.sink.sqsBadBuffer, + threadPoolExecutor ) - } yield CollectorSinks(good, bad) - - sinks match { - case Right(s) => run(collectorConf, akkaConf, s, telemetry) - case Left(e) => throw e - } + } yield Sinks(good, bad) } - def buildExecutorService(kc: Kinesis): ScheduledThreadPoolExecutor = { + def buildExecutorService(kc: KinesisSinkConfig): ScheduledThreadPoolExecutor = { log.info("Creating thread pool of size " + kc.threadPoolSize) new ScheduledThreadPoolExecutor(kc.threadPoolSize) } diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala index 2c76850f1..eb4841bd6 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala @@ -11,38 +11,39 @@ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks -import java.nio.ByteBuffer -import java.util.concurrent.ScheduledExecutorService -import java.util.UUID - -import scala.collection.JavaConverters._ -import scala.collection.mutable.ListBuffer -import scala.concurrent.{ExecutionContextExecutorService, Future} -import scala.concurrent.duration._ -import scala.util.{Failure, Success, Try} - +import cats.effect.{Resource, Sync} +import cats.implicits.catsSyntaxMonadErrorRethrow import cats.syntax.either._ - import com.amazonaws.auth._ import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration -import com.amazonaws.services.kinesis.{AmazonKinesis, AmazonKinesisClientBuilder} import com.amazonaws.services.kinesis.model._ -import com.amazonaws.services.sqs.{AmazonSQS, AmazonSQSClientBuilder} +import com.amazonaws.services.kinesis.{AmazonKinesis, AmazonKinesisClientBuilder} import com.amazonaws.services.sqs.model.{MessageAttributeValue, SendMessageBatchRequest, SendMessageBatchRequestEntry} +import com.amazonaws.services.sqs.{AmazonSQS, AmazonSQSClientBuilder} +import com.snowplowanalytics.snowplow.collector.core.{Config, Sink} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink._ +import org.slf4j.LoggerFactory -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink.SqsClientAndName +import java.nio.ByteBuffer +import java.util.UUID +import java.util.concurrent.ScheduledExecutorService +import scala.collection.JavaConverters._ +import scala.collection.mutable.ListBuffer +import scala.concurrent.duration._ +import scala.concurrent.{ExecutionContextExecutorService, Future} +import scala.util.{Failure, Success, Try} -class KinesisSink private ( +class KinesisSink[F[_]: Sync] private ( val maxBytes: Int, client: AmazonKinesis, - 
kinesisConfig: Kinesis, - bufferConfig: BufferConfig, + kinesisConfig: KinesisSinkConfig, + bufferConfig: Config.Buffer, streamName: String, executorService: ScheduledExecutorService, maybeSqs: Option[SqsClientAndName] -) extends Sink { - import KinesisSink._ +) extends Sink[F] { + + private lazy val log = LoggerFactory.getLogger(getClass) maybeSqs match { case Some(sqs) => @@ -69,10 +70,10 @@ class KinesisSink private ( @volatile private var kinesisHealthy: Boolean = false @volatile private var sqsHealthy: Boolean = false - override def isHealthy: Boolean = kinesisHealthy || sqsHealthy + override def isHealthy: F[Boolean] = Sync[F].pure(kinesisHealthy || sqsHealthy) - override def storeRawEvents(events: List[Array[Byte]], key: String): Unit = - events.foreach(e => EventStorage.store(e, key)) + override def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = + Sync[F].delay(events.foreach(e => EventStorage.store(e, key))) object EventStorage { private val storedEvents = ListBuffer.empty[Events] @@ -442,14 +443,38 @@ object KinesisSink { * Exists so that no threads can get a reference to the KinesisSink * during its construction. */ - def createAndInitialize( + def create[F[_]: Sync]( + kinesisMaxBytes: Int, + kinesisConfig: KinesisSinkConfig, + bufferConfig: Config.Buffer, + streamName: String, + sqsBufferName: Option[String], + executorService: ScheduledExecutorService + ): Resource[F, KinesisSink[F]] = { + val acquire = + Sync[F] + .delay( + createAndInitialize(kinesisMaxBytes, kinesisConfig, bufferConfig, streamName, sqsBufferName, executorService) + ) + .rethrow + val release = (sink: KinesisSink[F]) => Sync[F].delay(sink.shutdown()) + + Resource.make(acquire)(release) + } + + /** + * Create a KinesisSink and schedule a task to flush its EventStorage. + * Exists so that no threads can get a reference to the KinesisSink + * during its construction. + */ + private def createAndInitialize[F[_]: Sync]( kinesisMaxBytes: Int, - kinesisConfig: Kinesis, - bufferConfig: BufferConfig, + kinesisConfig: KinesisSinkConfig, + bufferConfig: Config.Buffer, streamName: String, sqsBufferName: Option[String], executorService: ScheduledExecutorService - ): Either[Throwable, KinesisSink] = { + ): Either[Throwable, KinesisSink[F]] = { val clients = for { provider <- getProvider(kinesisConfig.aws) kinesisClient <- createKinesisClient(provider, kinesisConfig.endpoint, kinesisConfig.region) @@ -478,7 +503,7 @@ object KinesisSink { } /** Create an aws credentials provider through env variables and iam. 
*/ - private def getProvider(awsConfig: AWSConfig): Either[Throwable, AWSCredentialsProvider] = { + private def getProvider(awsConfig: KinesisSinkConfig.AWSConfig): Either[Throwable, AWSCredentialsProvider] = { def isDefault(key: String): Boolean = key == "default" def isIam(key: String): Boolean = key == "iam" def isEnv(key: String): Boolean = key == "env" diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala new file mode 100644 index 000000000..9942b0768 --- /dev/null +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala @@ -0,0 +1,37 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import com.snowplowanalytics.snowplow.collector.core.Config +import io.circe.Decoder +import io.circe.generic.semiauto._ +import io.circe.config.syntax.durationDecoder + +import scala.concurrent.duration.FiniteDuration + +final case class KinesisSinkConfig( + maxBytes: Int, + region: String, + threadPoolSize: Int, + aws: KinesisSinkConfig.AWSConfig, + backoffPolicy: KinesisSinkConfig.BackoffPolicy, + customEndpoint: Option[String], + sqsGoodBuffer: Option[String], + sqsBadBuffer: Option[String], + sqsMaxBytes: Int, + startupCheckInterval: FiniteDuration +) extends Config.Sink { + val endpoint = customEndpoint.getOrElse(region match { + case cn @ "cn-north-1" => s"https://kinesis.$cn.amazonaws.com.cn" + case cn @ "cn-northwest-1" => s"https://kinesis.$cn.amazonaws.com.cn" + case _ => s"https://kinesis.$region.amazonaws.com" + }) +} + +object KinesisSinkConfig { + final case class AWSConfig(accessKey: String, secretKey: String) + + final case class BackoffPolicy(minBackoff: Long, maxBackoff: Long, maxRetries: Int) + implicit val configDecoder: Decoder[KinesisSinkConfig] = deriveDecoder[KinesisSinkConfig] + implicit val awsConfigDecoder: Decoder[AWSConfig] = deriveDecoder[AWSConfig] + implicit val backoffPolicyConfigDecoder: Decoder[BackoffPolicy] = + deriveDecoder[BackoffPolicy] +} diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala index c2a1e8ba8..1557fe88e 100644 --- a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala @@ -10,8 +10,123 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream -import com.snowplowanalytics.snowplow.collectors.scalastream.config.ConfigSpec +import cats.effect.testing.specs2.CatsEffect +import cats.effect.{ExitCode, IO} +import com.snowplowanalytics.snowplow.collector.core.{Config, ConfigParser} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSinkConfig +import org.http4s.SameSite +import org.specs2.mutable.Specification + +import java.nio.file.Paths +import scala.concurrent.duration.DurationInt + +class KinesisConfigSpec extends Specification with CatsEffect { + + "Config parser" should { + "be able to parse extended kinesis config" in { + assert( + resource = "/config.kinesis.extended.hocon", + expectedResult = Right(KinesisConfigSpec.expectedConfig) + ) + } + "be able to parse minimal kinesis config" in { + assert( + resource = 
"/config.kinesis.minimal.hocon", + expectedResult = Right(KinesisConfigSpec.expectedConfig) + ) + } + } + + private def assert(resource: String, expectedResult: Either[ExitCode, Config[KinesisSinkConfig]]) = { + val path = Paths.get(getClass.getResource(resource).toURI) + ConfigParser.fromPath[IO, KinesisSinkConfig](Some(path)).value.map { result => + result must beEqualTo(expectedResult) + } + } +} + +object KinesisConfigSpec { + + private val expectedConfig = Config[KinesisSinkConfig]( + interface = "0.0.0.0", + port = 8080, + paths = Map.empty[String, String], + p3p = Config.P3P( + policyRef = "/w3c/p3p.xml", + CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" + ), + crossDomain = Config.CrossDomain( + enabled = false, + domains = List("*"), + secure = true + ), + cookie = Config.Cookie( + enabled = true, + expiration = 365.days, + name = "sp", + domains = List.empty, + fallbackDomain = None, + secure = true, + httpOnly = true, + sameSite = Some(SameSite.None) + ), + doNotTrackCookie = Config.DoNotTrackCookie( + enabled = false, + name = "", + value = "" + ), + cookieBounce = Config.CookieBounce( + enabled = false, + name = "n3pc", + fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000", + forwardedProtocolHeader = None + ), + redirectMacro = Config.RedirectMacro( + enabled = false, + placeholder = None + ), + rootResponse = Config.RootResponse( + enabled = false, + statusCode = 302, + headers = Map.empty[String, String], + body = "" + ), + cors = Config.CORS(1.hour), + monitoring = + Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + ssl = Config.SSL(enable = false, redirect = false, port = 443), + enableDefaultRedirect = false, + redirectDomains = Set.empty, + preTerminationPeriod = 10.seconds, + streams = Config.Streams( + good = "good", + bad = "bad", + useIpAddressAsPartitionKey = false, + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + sink = KinesisSinkConfig( + maxBytes = 1000000, + region = "eu-central-1", + threadPoolSize = 10, + aws = KinesisSinkConfig.AWSConfig( + accessKey = "iam", + secretKey = "iam" + ), + backoffPolicy = KinesisSinkConfig.BackoffPolicy( + minBackoff = 500, + maxBackoff = 1500, + maxRetries = 3 + ), + sqsBadBuffer = None, + sqsGoodBuffer = None, + sqsMaxBytes = 192000, + customEndpoint = None, + startupCheckInterval = 1.second + ) + ) + ) -class KinesisConfigSpec extends ConfigSpec { - makeConfigTest("kinesis", "", "") } From d905982c247304fd899d9590a67e4767252732b2 Mon Sep 17 00:00:00 2001 From: spenes Date: Wed, 6 Sep 2023 01:57:29 +0300 Subject: [PATCH 17/39] Add http4s SQS sink (close #378) --- build.sbt | 8 +- examples/config.sqs.extended.hocon | 13 -- sqs/src/main/resources/application.conf | 27 ----- .../SqsCollector.scala | 62 ++++------ .../sinks/SqsSink.scala | 95 +++++++-------- .../sinks/SqsSinkConfig.scala | 22 ++++ .../SqsConfigSpec.scala | 112 +++++++++++++++++- 7 files changed, 199 insertions(+), 140 deletions(-) create mode 100644 sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSinkConfig.scala diff --git a/build.sbt b/build.sbt index ee4f8923c..110231d36 100644 --- a/build.sbt +++ b/build.sbt @@ -184,10 +184,12 @@ lazy val kinesisDistroless = project .configs(IntegrationTest) lazy val sqsSettings = - allSettings ++ buildInfoSettings ++ Seq( + allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Seq( moduleName := "snowplow-stream-collector-sqs", + buildInfoPackage := 
s"com.snowplowanalytics.snowplow.collectors.scalastream", Docker / packageName := "scala-stream-collector-sqs", libraryDependencies ++= Seq( + Dependencies.Libraries.catsRetry, Dependencies.Libraries.sqs, Dependencies.Libraries.sts, ) @@ -196,14 +198,14 @@ lazy val sqsSettings = lazy val sqs = project .settings(sqsSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") + .dependsOn(http4s % "test->test;compile->compile") lazy val sqsDistroless = project .in(file("distroless/sqs")) .settings(sourceDirectory := (sqs / sourceDirectory).value) .settings(sqsSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") + .dependsOn(http4s % "test->test;compile->compile") lazy val pubsubSettings = allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Defaults.itSettings ++ scalifiedSettings ++ Seq( diff --git a/examples/config.sqs.extended.hocon b/examples/config.sqs.extended.hocon index c48a6c461..4534ab4e5 100644 --- a/examples/config.sqs.extended.hocon +++ b/examples/config.sqs.extended.hocon @@ -181,19 +181,6 @@ collector { # Thread pool size for Kinesis and SQS API requests threadPoolSize = 10 - - # The following are used to authenticate for the Amazon Kinesis and SQS sinks. - # If both are set to 'default', the default provider chain is used - # (see http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html) - # If both are set to 'iam', use AWS IAM Roles to provision credentials. - # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY - aws { - accessKey = iam - accessKey = ${?COLLECTOR_STREAMS_SINK_AWS_ACCESS_KEY} - secretKey = iam - secretKey = ${?COLLECTOR_STREAMS_SINK_AWS_SECRET_KEY} - } - # Optional backoffPolicy { # Minimum backoff period in milliseconds diff --git a/sqs/src/main/resources/application.conf b/sqs/src/main/resources/application.conf index 0c6651bd5..a862f2b43 100644 --- a/sqs/src/main/resources/application.conf +++ b/sqs/src/main/resources/application.conf @@ -4,11 +4,6 @@ collector { enabled = sqs threadPoolSize = 10 - aws { - accessKey = iam - secretKey = iam - } - backoffPolicy { minBackoff = 500 maxBackoff = 1500 @@ -27,25 +22,3 @@ collector { } } } - -akka { - loglevel = WARNING - loggers = ["akka.event.slf4j.Slf4jLogger"] - - http.server { - remote-address-header = on - raw-request-uri-header = on - - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - illegal-header-warnings = off - } - - max-connections = 2048 - } - - coordinated-shutdown { - run-by-jvm-shutdown-hook = off - } -} diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala index 2e3bf14cf..481649f3a 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala @@ -11,48 +11,32 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import java.util.concurrent.ScheduledThreadPoolExecutor -import cats.syntax.either._ -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.SqsSink 
-import com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService -object SqsCollector extends Collector { - def appName = BuildInfo.shortName - def appVersion = BuildInfo.version - def scalaVersion = BuildInfo.scalaVersion +import cats.effect.{IO, Resource} - def main(args: Array[String]): Unit = { - val (collectorConf, akkaConf) = parseConfig(args) - val telemetry = TelemetryAkkaService.initWithCollector(collectorConf, BuildInfo.moduleName, appVersion) - val sinks: Either[Throwable, CollectorSinks] = for { - sqs <- collectorConf.streams.sink match { - case sqs: Sqs => sqs.asRight - case sink => new IllegalArgumentException(s"Configured sink $sink is not SQS.").asLeft - } - es = new ScheduledThreadPoolExecutor(sqs.threadPoolSize) - goodQueue = collectorConf.streams.good - badQueue = collectorConf.streams.bad - bufferConf = collectorConf.streams.buffer - good <- SqsSink.createAndInitialize( - sqs.maxBytes, - sqs, - bufferConf, - goodQueue, - es +import com.snowplowanalytics.snowplow.collector.core.model.Sinks +import com.snowplowanalytics.snowplow.collector.core.{App, Config} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks._ + +object SqsCollector extends App[SqsSinkConfig](BuildInfo) { + + override def mkSinks(config: Config.Streams[SqsSinkConfig]): Resource[IO, Sinks[IO]] = { + val threadPoolExecutor = new ScheduledThreadPoolExecutor(config.sink.threadPoolSize) + for { + good <- SqsSink.create[IO]( + config.sink.maxBytes, + config.sink, + config.buffer, + config.good, + threadPoolExecutor ) - bad <- SqsSink.createAndInitialize( - sqs.maxBytes, - sqs, - bufferConf, - badQueue, - es + bad <- SqsSink.create[IO]( + config.sink.maxBytes, + config.sink, + config.buffer, + config.bad, + threadPoolExecutor ) - } yield CollectorSinks(good, bad) - - sinks match { - case Right(s) => run(collectorConf, akkaConf, s, telemetry) - case Left(e) => throw e - } + } yield Sinks(good, bad) } } diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala index 6ffe57f6f..aa9edf1dc 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala @@ -10,6 +10,11 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.sinks +import cats.effect.{Resource, Sync} +import cats.implicits.catsSyntaxMonadErrorRethrow + +import org.slf4j.LoggerFactory + import java.nio.ByteBuffer import java.util.UUID import java.util.concurrent.ScheduledExecutorService @@ -22,29 +27,23 @@ import scala.collection.JavaConverters._ import cats.syntax.either._ -import com.amazonaws.auth.{ - AWSCredentialsProvider, - AWSStaticCredentialsProvider, - BasicAWSCredentials, - DefaultAWSCredentialsProviderChain, - EnvironmentVariableCredentialsProvider, - InstanceProfileCredentialsProvider -} import com.amazonaws.services.sqs.{AmazonSQS, AmazonSQSClientBuilder} import com.amazonaws.services.sqs.model.{MessageAttributeValue, SendMessageBatchRequest, SendMessageBatchRequestEntry} -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ +import com.snowplowanalytics.snowplow.collector.core.{Config, Sink} -class SqsSink private ( +class SqsSink[F[_]: Sync] private ( val maxBytes: Int, client: AmazonSQS, - sqsConfig: Sqs, - bufferConfig: BufferConfig, + sqsConfig: SqsSinkConfig, + bufferConfig: 
Config.Buffer, queueName: String, executorService: ScheduledExecutorService -) extends Sink { +) extends Sink[F] { import SqsSink._ + private lazy val log = LoggerFactory.getLogger(getClass()) + private val ByteThreshold: Long = bufferConfig.byteLimit private val RecordThreshold: Long = bufferConfig.recordLimit private val TimeThreshold: Long = bufferConfig.timeLimit @@ -60,10 +59,10 @@ class SqsSink private ( concurrent.ExecutionContext.fromExecutorService(executorService) @volatile private var sqsHealthy: Boolean = false - override def isHealthy: Boolean = sqsHealthy + override def isHealthy: F[Boolean] = Sync[F].pure(sqsHealthy) - override def storeRawEvents(events: List[Array[Byte]], key: String): Unit = - events.foreach(e => EventStorage.store(e, key)) + override def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = + Sync[F].delay(events.foreach(e => EventStorage.store(e, key))) object EventStorage { private val storedEvents = ListBuffer.empty[Events] @@ -279,22 +278,39 @@ object SqsSink { // Details about why messages failed to be written to SQS. final case class BatchResultErrorInfo(code: String, message: String) + def create[F[_]: Sync]( + maxBytes: Int, + sqsConfig: SqsSinkConfig, + bufferConfig: Config.Buffer, + queueName: String, + executorService: ScheduledExecutorService + ): Resource[F, SqsSink[F]] = { + val acquire = + Sync[F] + .delay( + createAndInitialize(maxBytes, sqsConfig, bufferConfig, queueName, executorService) + ) + .rethrow + val release = (sink: SqsSink[F]) => Sync[F].delay(sink.shutdown()) + + Resource.make(acquire)(release) + } + /** * Create an SqsSink and schedule a task to flush its EventStorage. * Exists so that no threads can get a reference to the SqsSink * during its construction. */ - def createAndInitialize( + def createAndInitialize[F[_]: Sync]( maxBytes: Int, - sqsConfig: Sqs, - bufferConfig: BufferConfig, + sqsConfig: SqsSinkConfig, + bufferConfig: Config.Buffer, queueName: String, executorService: ScheduledExecutorService - ): Either[Throwable, SqsSink] = { - val client = for { - provider <- getProvider(sqsConfig.aws) - client <- createSqsClient(provider, sqsConfig.region) - } yield client + ): Either[Throwable, SqsSink[F]] = { + val client = Either.catchNonFatal( + AmazonSQSClientBuilder.standard().withRegion(sqsConfig.region).build + ) client.map { c => val sqsSink = new SqsSink(maxBytes, c, sqsConfig, bufferConfig, queueName, executorService) @@ -303,35 +319,4 @@ object SqsSink { sqsSink } } - - /** Create an aws credentials provider through env variables and iam. 
*/ - private def getProvider(awsConfig: AWSConfig): Either[Throwable, AWSCredentialsProvider] = { - def isDefault(key: String): Boolean = key == "default" - def isIam(key: String): Boolean = key == "iam" - def isEnv(key: String): Boolean = key == "env" - - ((awsConfig.accessKey, awsConfig.secretKey) match { - case (a, s) if isDefault(a) && isDefault(s) => - new DefaultAWSCredentialsProviderChain().asRight - case (a, s) if isDefault(a) || isDefault(s) => - "accessKey and secretKey must both be set to 'default' or neither".asLeft - case (a, s) if isIam(a) && isIam(s) => - InstanceProfileCredentialsProvider.getInstance().asRight - case (a, s) if isIam(a) && isIam(s) => - "accessKey and secretKey must both be set to 'iam' or neither".asLeft - case (a, s) if isEnv(a) && isEnv(s) => - new EnvironmentVariableCredentialsProvider().asRight - case (a, s) if isEnv(a) || isEnv(s) => - "accessKey and secretKey must both be set to 'env' or neither".asLeft - case _ => - new AWSStaticCredentialsProvider( - new BasicAWSCredentials(awsConfig.accessKey, awsConfig.secretKey) - ).asRight - }).leftMap(new IllegalArgumentException(_)) - } - - private def createSqsClient(provider: AWSCredentialsProvider, region: String): Either[Throwable, AmazonSQS] = - Either.catchNonFatal( - AmazonSQSClientBuilder.standard().withRegion(region).withCredentials(provider).build - ) } diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSinkConfig.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSinkConfig.scala new file mode 100644 index 000000000..7db8b879f --- /dev/null +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSinkConfig.scala @@ -0,0 +1,22 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import io.circe.Decoder +import io.circe.generic.semiauto._ + +import com.snowplowanalytics.snowplow.collector.core.Config + +final case class SqsSinkConfig( + maxBytes: Int, + region: String, + backoffPolicy: SqsSinkConfig.BackoffPolicyConfig, + threadPoolSize: Int +) extends Config.Sink + +object SqsSinkConfig { + final case class AWSConfig(accessKey: String, secretKey: String) + + final case class BackoffPolicyConfig(minBackoff: Long, maxBackoff: Long, maxRetries: Int) + + implicit val configDecoder: Decoder[SqsSinkConfig] = deriveDecoder[SqsSinkConfig] + implicit val backoffPolicyDecoder: Decoder[BackoffPolicyConfig] = deriveDecoder[BackoffPolicyConfig] +} diff --git a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala index 84f955a0e..1781008ff 100644 --- a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala +++ b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala @@ -10,8 +10,114 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream -import com.snowplowanalytics.snowplow.collectors.scalastream.config.ConfigSpec +import cats.effect.testing.specs2.CatsEffect +import cats.effect.{ExitCode, IO} +import com.snowplowanalytics.snowplow.collector.core.{Config, ConfigParser} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.SqsSinkConfig +import org.http4s.SameSite +import org.specs2.mutable.Specification + +import java.nio.file.Paths +import scala.concurrent.duration.DurationInt + +class SqsConfigSpec extends Specification with CatsEffect { + + 
"Config parser" should { + "be able to parse extended kinesis config" in { + assert( + resource = "/config.sqs.extended.hocon", + expectedResult = Right(SqsConfigSpec.expectedConfig) + ) + } + "be able to parse minimal kinesis config" in { + assert( + resource = "/config.sqs.minimal.hocon", + expectedResult = Right(SqsConfigSpec.expectedConfig) + ) + } + } + + private def assert(resource: String, expectedResult: Either[ExitCode, Config[SqsSinkConfig]]) = { + val path = Paths.get(getClass.getResource(resource).toURI) + ConfigParser.fromPath[IO, SqsSinkConfig](Some(path)).value.map { result => + result must beEqualTo(expectedResult) + } + } +} + +object SqsConfigSpec { + + private val expectedConfig = Config[SqsSinkConfig]( + interface = "0.0.0.0", + port = 8080, + paths = Map.empty[String, String], + p3p = Config.P3P( + policyRef = "/w3c/p3p.xml", + CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" + ), + crossDomain = Config.CrossDomain( + enabled = false, + domains = List("*"), + secure = true + ), + cookie = Config.Cookie( + enabled = true, + expiration = 365.days, + name = "sp", + domains = List.empty, + fallbackDomain = None, + secure = true, + httpOnly = true, + sameSite = Some(SameSite.None) + ), + doNotTrackCookie = Config.DoNotTrackCookie( + enabled = false, + name = "", + value = "" + ), + cookieBounce = Config.CookieBounce( + enabled = false, + name = "n3pc", + fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000", + forwardedProtocolHeader = None + ), + redirectMacro = Config.RedirectMacro( + enabled = false, + placeholder = None + ), + rootResponse = Config.RootResponse( + enabled = false, + statusCode = 302, + headers = Map.empty[String, String], + body = "" + ), + cors = Config.CORS(1.hour), + monitoring = + Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + ssl = Config.SSL(enable = false, redirect = false, port = 443), + enableDefaultRedirect = false, + redirectDomains = Set.empty, + preTerminationPeriod = 10.seconds, + streams = Config.Streams( + good = "good", + bad = "bad", + useIpAddressAsPartitionKey = false, + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + sink = SqsSinkConfig( + maxBytes = 192000, + region = "eu-central-1", + backoffPolicy = SqsSinkConfig.BackoffPolicyConfig( + minBackoff = 500, + maxBackoff = 1500, + maxRetries = 3 + ), + threadPoolSize = 10 + ) + ) + ) -class SqsConfigSpec extends ConfigSpec { - makeConfigTest("sqs", "", "") } From 5605aaa354e258a1772677563eec69b0daacdeba Mon Sep 17 00:00:00 2001 From: Benjamin Benoist Date: Thu, 14 Sep 2023 12:00:37 +0200 Subject: [PATCH 18/39] Use Blaze as default http4s backend (close #380) --- .../HttpServer.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala index bef93905d..7d0f76a8e 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala @@ -36,8 +36,8 @@ object HttpServer { secure: Boolean ): Resource[F, Server] = sys.env.get("HTTP4S_BACKEND").map(_.toUpperCase()) match { - case Some("EMBER") | None => buildEmberServer[F](app, interface, port, secure) - case Some("BLAZE") => buildBlazeServer[F](app, port, secure) + case Some("BLAZE") | None => buildBlazeServer[F](app, 
port, secure) + case Some("EMBER") => buildEmberServer[F](app, interface, port, secure) case Some("NETTY") => buildNettyServer[F](app, port, secure) case Some(other) => throw new IllegalArgumentException(s"Unrecognized http4s backend $other") } From 4559013ed7ec3daa555bbddc4582a5caee037321 Mon Sep 17 00:00:00 2001 From: spenes Date: Tue, 19 Sep 2023 16:46:22 +0300 Subject: [PATCH 19/39] Add telemetry support (close #381) --- build.sbt | 15 ++- examples/config.kinesis.extended.hocon | 7 ++ http4s/src/main/resources/reference.conf | 9 ++ .../App.scala | 6 +- .../AppInfo.scala | 1 + .../Config.scala | 21 +++- .../Run.scala | 42 +++++-- .../Telemetry.scala | 118 ++++++++++++++++++ .../TestUtils.scala | 16 ++- .../KinesisCollector.scala | 8 +- .../sinks/KinesisConfigSpec.scala | 32 ++++- project/Dependencies.scala | 41 +++--- .../PubSubCollector.scala | 8 +- .../ConfigSpec.scala | 13 ++ .../SqsCollector.scala | 10 +- .../SqsConfigSpec.scala | 13 ++ .../StdoutCollector.scala | 6 +- 17 files changed, 318 insertions(+), 48 deletions(-) create mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala diff --git a/build.sbt b/build.sbt index 110231d36..687ceee43 100644 --- a/build.sbt +++ b/build.sbt @@ -29,17 +29,17 @@ lazy val commonDependencies = Seq( Dependencies.Libraries.badRows, Dependencies.Libraries.collectorPayload, Dependencies.Libraries.pureconfig, - Dependencies.Libraries.trackerCore, - Dependencies.Libraries.trackerEmitterId, + Dependencies.Libraries.Legacy.trackerCore, + Dependencies.Libraries.Legacy.trackerEmitterId, // Unit tests Dependencies.Libraries.akkaTestkit, Dependencies.Libraries.akkaHttpTestkit, Dependencies.Libraries.akkaStreamTestkit, Dependencies.Libraries.specs2, // Integration tests - Dependencies.Libraries.LegacyIT.testcontainers, - Dependencies.Libraries.LegacyIT.http4sClient, - Dependencies.Libraries.LegacyIT.catsRetry + Dependencies.Libraries.Legacy.testcontainers, + Dependencies.Libraries.Legacy.http4sClient, + Dependencies.Libraries.Legacy.catsRetry ) lazy val commonExclusions = Seq( @@ -97,7 +97,7 @@ lazy val dynVerSettings = Seq( ) lazy val http4sBuildInfoSettings = Seq( - buildInfoKeys := Seq[BuildInfoKey](name, dockerAlias, version), + buildInfoKeys := Seq[BuildInfoKey](name, moduleName, dockerAlias, version), buildInfoOptions += BuildInfoOption.Traits("com.snowplowanalytics.snowplow.collector.core.AppInfo") ) @@ -130,6 +130,7 @@ lazy val http4s = project Dependencies.Libraries.http4sEmber, Dependencies.Libraries.http4sBlaze, Dependencies.Libraries.http4sNetty, + Dependencies.Libraries.http4sClient, Dependencies.Libraries.log4cats, Dependencies.Libraries.thrift, Dependencies.Libraries.badRows, @@ -138,6 +139,8 @@ lazy val http4s = project Dependencies.Libraries.decline, Dependencies.Libraries.circeGeneric, Dependencies.Libraries.circeConfig, + Dependencies.Libraries.trackerCore, + Dependencies.Libraries.emitterHttps, Dependencies.Libraries.specs2, Dependencies.Libraries.specs2CE, diff --git a/examples/config.kinesis.extended.hocon b/examples/config.kinesis.extended.hocon index 21b7b9360..a3ef7d3c7 100644 --- a/examples/config.kinesis.extended.hocon +++ b/examples/config.kinesis.extended.hocon @@ -267,6 +267,13 @@ collector { url = telemetry-g.snowplowanalytics.com port = 443 secure = true + + # Identifiers used by collector terraform module + userProvidedId = my_pipeline, + moduleName = collector-kinesis-ec2, + moduleVersion = 0.5.2, + instanceId = 665bhft5u6udjf, + autoGeneratedId = hfy67e5ydhtrd } 
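# Illustrative note (assumed usage, not lines added by this patch): the telemetry
# heartbeat introduced here can be switched off by overriding the new block in a
# deployment's own configuration, e.g.
#
#   telemetry {
#     disable = true
#   }
#
# The identifier fields above (userProvidedId, moduleName, moduleVersion, instanceId,
# autoGeneratedId) are optional and are normally injected by deployment tooling such as
# the collector terraform module rather than set by hand.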
monitoring.metrics.statsd { diff --git a/http4s/src/main/resources/reference.conf b/http4s/src/main/resources/reference.conf index 9bee1be0c..9ae8f6849 100644 --- a/http4s/src/main/resources/reference.conf +++ b/http4s/src/main/resources/reference.conf @@ -59,6 +59,15 @@ } } + telemetry { + disable = false + interval = 60 minutes + method = POST + url = telemetry-g.snowplowanalytics.com + port = 443 + secure = true + } + monitoring { metrics { statsd { diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala index cb69be2f9..5bbce5762 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala @@ -8,6 +8,8 @@ import com.monovore.decline.Opts import io.circe.Decoder +import com.snowplowanalytics.snowplow.scalatracker.emitters.http4s.ceTracking + import com.snowplowanalytics.snowplow.collector.core.model.Sinks abstract class App[SinkConfig <: Config.Sink: Decoder](appInfo: AppInfo) @@ -19,7 +21,9 @@ abstract class App[SinkConfig <: Config.Sink: Decoder](appInfo: AppInfo) def mkSinks(config: Config.Streams[SinkConfig]): Resource[IO, Sinks[IO]] - final def main: Opts[IO[ExitCode]] = Run.fromCli[IO, SinkConfig](appInfo, mkSinks) + def telemetryInfo(config: Config[SinkConfig]): Telemetry.TelemetryInfo + + final def main: Opts[IO[ExitCode]] = Run.fromCli[IO, SinkConfig](appInfo, mkSinks, telemetryInfo) } object App { diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala index 1215a8149..837252b72 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala @@ -2,6 +2,7 @@ package com.snowplowanalytics.snowplow.collector.core trait AppInfo { def name: String + def moduleName: String def version: String def dockerAlias: String } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala index f274ccf1c..62e4c0d07 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala @@ -1,6 +1,6 @@ package com.snowplowanalytics.snowplow.collector.core -import scala.concurrent.duration.FiniteDuration +import scala.concurrent.duration._ import io.circe.config.syntax._ import io.circe.generic.semiauto._ @@ -23,6 +23,7 @@ case class Config[+SinkConfig]( cors: Config.CORS, streams: Config.Streams[SinkConfig], monitoring: Config.Monitoring, + telemetry: Config.Telemetry, ssl: Config.SSL, enableDefaultRedirect: Boolean, redirectDomains: Set[String], @@ -122,6 +123,23 @@ object Config { port: Int ) + final case class Telemetry( + // General params + disable: Boolean, + interval: FiniteDuration, + // http params + method: String, + url: String, + port: Int, + secure: Boolean, + // Params injected by deployment scripts + userProvidedId: Option[String], + moduleName: Option[String], + moduleVersion: Option[String], + instanceId: Option[String], + autoGeneratedId: Option[String] + ) + implicit def decoder[SinkConfig: Decoder]: Decoder[Config[SinkConfig]] = { implicit val p3p = deriveDecoder[P3P] implicit val crossDomain 
= deriveDecoder[CrossDomain] @@ -147,6 +165,7 @@ object Config { implicit val metrics = deriveDecoder[Metrics] implicit val monitoring = deriveDecoder[Monitoring] implicit val ssl = deriveDecoder[SSL] + implicit val telemetry = deriveDecoder[Telemetry] deriveDecoder[Config[SinkConfig]] } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index a221fdfec..92839bac0 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -13,32 +13,38 @@ import cats.data.EitherT import cats.effect.{Async, ExitCode, Sync} import cats.effect.kernel.Resource +import org.http4s.blaze.client.BlazeClientBuilder + import com.monovore.decline.Opts import io.circe.Decoder +import com.snowplowanalytics.snowplow.scalatracker.Tracking + import com.snowplowanalytics.snowplow.collector.core.model.Sinks object Run { implicit private def logger[F[_]: Sync] = Slf4jLogger.getLogger[F] - def fromCli[F[_]: Async, SinkConfig: Decoder]( + def fromCli[F[_]: Async: Tracking, SinkConfig: Decoder]( appInfo: AppInfo, - mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]] + mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], + telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo ): Opts[F[ExitCode]] = { val configPath = Opts.option[Path]("config", "Path to HOCON configuration (optional)", "c", "config.hocon").orNone - configPath.map(fromPath[F, SinkConfig](appInfo, mkSinks, _)) + configPath.map(fromPath[F, SinkConfig](appInfo, mkSinks, telemetryInfo, _)) } - private def fromPath[F[_]: Async, SinkConfig: Decoder]( + private def fromPath[F[_]: Async: Tracking, SinkConfig: Decoder]( appInfo: AppInfo, mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], + telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo, path: Option[Path] ): F[ExitCode] = { val eitherT = for { config <- ConfigParser.fromPath[F, SinkConfig](path) - _ <- EitherT.right[ExitCode](fromConfig(appInfo, mkSinks, config)) + _ <- EitherT.right[ExitCode](fromConfig(appInfo, mkSinks, config, telemetryInfo)) } yield ExitCode.Success eitherT.merge.handleErrorWith { e => @@ -47,10 +53,11 @@ object Run { } } - private def fromConfig[F[_]: Async, SinkConfig]( + private def fromConfig[F[_]: Async: Tracking, SinkConfig]( appInfo: AppInfo, mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], - config: Config[SinkConfig] + config: Config[SinkConfig], + telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo ): F[ExitCode] = { val resources = for { sinks <- mkSinks(config.streams) @@ -65,10 +72,23 @@ object Run { if (config.ssl.enable) config.ssl.port else config.port, config.ssl.enable ) - _ <- withGracefulShutdown(config.preTerminationPeriod)(httpServer) - } yield () - - resources.surround(Async[F].never[ExitCode]) + _ <- withGracefulShutdown(config.preTerminationPeriod)(httpServer) + httpClient <- BlazeClientBuilder[F].resource + } yield httpClient + + resources.use { httpClient => + Telemetry + .run( + config.telemetry, + httpClient, + appInfo, + telemetryInfo(config).region, + telemetryInfo(config).cloud + ) + .compile + .drain + .flatMap(_ => Async[F].never[ExitCode]) + } } private def prettyLogException[F[_]: Sync](e: Throwable): F[Unit] = { diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala 
b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala new file mode 100644 index 000000000..95df9bebc --- /dev/null +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala @@ -0,0 +1,118 @@ +package com.snowplowanalytics.snowplow.collector.core + +import org.typelevel.log4cats.Logger +import org.typelevel.log4cats.slf4j.Slf4jLogger + +import cats.data.NonEmptyList +import cats.implicits._ + +import cats.effect.{Async, Resource, Sync} +import cats.effect.std.Random + +import fs2.Stream + +import org.http4s.client.{Client => HttpClient} + +import _root_.io.circe.Json +import _root_.io.circe.syntax._ + +import com.snowplowanalytics.iglu.core.{SchemaKey, SchemaVer, SelfDescribingData} + +import com.snowplowanalytics.snowplow.scalatracker.{Tracker, Tracking} +import com.snowplowanalytics.snowplow.scalatracker.Emitter._ +import com.snowplowanalytics.snowplow.scalatracker.Emitter.{Result => TrackerResult} +import com.snowplowanalytics.snowplow.scalatracker.emitters.http4s.Http4sEmitter + +object Telemetry { + + implicit private def unsafeLogger[F[_]: Sync]: Logger[F] = + Slf4jLogger.getLogger[F] + + def run[F[_]: Async: Tracking]( + telemetryConfig: Config.Telemetry, + httpClient: HttpClient[F], + appInfo: AppInfo, + region: Option[String], + cloud: Option[String] + ): Stream[F, Unit] = + if (telemetryConfig.disable) + Stream.empty.covary[F] + else { + val sdj = makeHeartbeatEvent( + telemetryConfig, + region, + cloud, + appInfo.moduleName, + appInfo.version + ) + Stream.resource(initTracker(telemetryConfig, appInfo.moduleName, httpClient)).flatMap { tracker => + Stream.fixedDelay[F](telemetryConfig.interval).evalMap { _ => + tracker.trackSelfDescribingEvent(unstructEvent = sdj) >> tracker.flushEmitters() + } + } + } + + private def initTracker[F[_]: Async: Tracking]( + config: Config.Telemetry, + appName: String, + client: HttpClient[F] + ): Resource[F, Tracker[F]] = + for { + random <- Resource.eval(Random.scalaUtilRandom[F]) + emitter <- { + implicit val r: Random[F] = random + Http4sEmitter.build( + EndpointParams(config.url, port = Some(config.port), https = config.secure), + client, + retryPolicy = RetryPolicy.MaxAttempts(10), + callback = Some(emitterCallback[F] _) + ) + } + } yield new Tracker(NonEmptyList.of(emitter), "tracker-telemetry", appName) + + private def emitterCallback[F[_]: Sync]( + params: EndpointParams, + req: Request, + res: TrackerResult + ): F[Unit] = + res match { + case TrackerResult.Success(_) => + Logger[F].debug(s"Telemetry heartbeat successfully sent to ${params.getGetUri}") + case TrackerResult.Failure(code) => + Logger[F].warn(s"Sending telemetry heartbeat got unexpected HTTP code $code from ${params.getUri}") + case TrackerResult.TrackerFailure(exception) => + Logger[F].warn( + s"Telemetry heartbeat failed to reach ${params.getUri} with the following exception $exception after ${req.attempt} attempts" + ) + case TrackerResult.RetriesExceeded(failure) => + Logger[F].error(s"Stopped trying to send telemetry heartbeat after the following failure: $failure") + } + + private def makeHeartbeatEvent( + teleCfg: Config.Telemetry, + region: Option[String], + cloud: Option[String], + appName: String, + appVersion: String + ): SelfDescribingData[Json] = + SelfDescribingData( + SchemaKey("com.snowplowanalytics.oss", "oss_context", "jsonschema", SchemaVer.Full(1, 0, 1)), + Json.obj( + "userProvidedId" -> teleCfg.userProvidedId.asJson, + "autoGeneratedId" -> teleCfg.autoGeneratedId.asJson, + "moduleName" -> 
teleCfg.moduleName.asJson, + "moduleVersion" -> teleCfg.moduleVersion.asJson, + "instanceId" -> teleCfg.instanceId.asJson, + "appGeneratedId" -> java.util.UUID.randomUUID.toString.asJson, + "cloud" -> cloud.asJson, + "region" -> region.asJson, + "applicationName" -> appName.asJson, + "applicationVersion" -> appVersion.asJson + ) + ) + + case class TelemetryInfo( + region: Option[String], + cloud: Option[String] + ) +} diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala index 6ef978288..c465521ce 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala @@ -14,6 +14,7 @@ object TestUtils { val appInfo = new AppInfo { def name = appName + def moduleName = appName def version = appVersion def dockerAlias = "docker run collector" } @@ -102,6 +103,19 @@ object TestUtils { ), enableDefaultRedirect = false, redirectDomains = Set.empty[String], - preTerminationPeriod = 10.seconds + preTerminationPeriod = 10.seconds, + telemetry = Config.Telemetry( + disable = false, + interval = 60.minutes, + method = "POST", + url = "telemetry-g.snowplowanalytics.com", + port = 443, + secure = true, + userProvidedId = None, + moduleName = None, + moduleVersion = None, + instanceId = None, + autoGeneratedId = None + ) ) } diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala index 7223f5bf1..926f336c9 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala @@ -12,7 +12,7 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import cats.effect.{IO, Resource} import com.snowplowanalytics.snowplow.collector.core.model.Sinks -import com.snowplowanalytics.snowplow.collector.core.{App, Config} +import com.snowplowanalytics.snowplow.collector.core.{App, Config, Telemetry} import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.{KinesisSink, KinesisSinkConfig} import org.slf4j.LoggerFactory @@ -44,6 +44,12 @@ object KinesisCollector extends App[KinesisSinkConfig](BuildInfo) { } yield Sinks(good, bad) } + override def telemetryInfo(config: Config[KinesisSinkConfig]): Telemetry.TelemetryInfo = + Telemetry.TelemetryInfo( + region = Some(config.streams.sink.region), + cloud = Some("AWS") + ) + def buildExecutorService(kc: KinesisSinkConfig): ScheduledThreadPoolExecutor = { log.info("Creating thread pool of size " + kc.threadPoolSize) new ScheduledThreadPoolExecutor(kc.threadPoolSize) diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala index 1557fe88e..d2b9090c7 100644 --- a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala @@ -25,8 +25,23 @@ class KinesisConfigSpec extends Specification with CatsEffect { "Config parser" should { "be able to parse extended kinesis config" in { assert( - 
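The heartbeat is a self-describing event against iglu:com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1. A sketch (not from the patch) of the data section the tracker ends up sending when none of the deployment-injected fields are set; the cloud, region, name and version values below are hypothetical stand-ins for a Kinesis deployment:

import io.circe.Json
import io.circe.syntax._

object HeartbeatPayloadSketch {
  // the five deployment-injected fields default to null when nothing is injected
  val data: Json = Json.obj(
    "userProvidedId"     -> Json.Null,
    "autoGeneratedId"    -> Json.Null,
    "moduleName"         -> Json.Null,
    "moduleVersion"      -> Json.Null,
    "instanceId"         -> Json.Null,
    "appGeneratedId"     -> java.util.UUID.randomUUID.toString.asJson,
    "cloud"              -> "AWS".asJson,
    "region"             -> "eu-central-1".asJson,
    "applicationName"    -> "snowplow-stream-collector-kinesis".asJson,
    "applicationVersion" -> "3.0.0".asJson
  )
}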
resource = "/config.kinesis.extended.hocon", - expectedResult = Right(KinesisConfigSpec.expectedConfig) + resource = "/config.kinesis.extended.hocon", + expectedResult = Right( + KinesisConfigSpec + .expectedConfig + .copy( + telemetry = KinesisConfigSpec + .expectedConfig + .telemetry + .copy( + userProvidedId = Some("my_pipeline"), + moduleName = Some("collector-kinesis-ec2"), + moduleVersion = Some("0.5.2"), + instanceId = Some("665bhft5u6udjf"), + autoGeneratedId = Some("hfy67e5ydhtrd") + ) + ) + ) ) } "be able to parse minimal kinesis config" in { @@ -126,6 +141,19 @@ object KinesisConfigSpec { customEndpoint = None, startupCheckInterval = 1.second ) + ), + telemetry = Config.Telemetry( + disable = false, + interval = 60.minutes, + method = "POST", + url = "telemetry-g.snowplowanalytics.com", + port = 443, + secure = true, + userProvidedId = None, + moduleName = None, + moduleVersion = None, + instanceId = None, + autoGeneratedId = None ) ) diff --git a/project/Dependencies.scala b/project/Dependencies.scala index 78304902c..8a96a4d52 100644 --- a/project/Dependencies.scala +++ b/project/Dependencies.scala @@ -36,7 +36,7 @@ object Dependencies { val protobuf = "3.21.7" // force this version to mitigate security vulnerabilities // Scala val collectorPayload = "0.0.0" - val tracker = "1.0.1" + val tracker = "2.0.0" val akkaHttp = "10.2.7" val akka = "2.6.16" val scopt = "4.0.1" @@ -58,10 +58,11 @@ object Dependencies { val specs2CE = "1.5.0" val testcontainers = "0.40.10" - object LegacyIT { + object Legacy { val specs2CE = "0.4.1" val catsRetry = "2.1.0" val http4s = "0.21.33" + val tracker = "1.0.1" } } @@ -86,23 +87,24 @@ object Dependencies { val protobuf = "com.google.protobuf" % "protobuf-java" % V.protobuf // Scala - val collectorPayload = "com.snowplowanalytics" % "collector-payload-1" % V.collectorPayload - val badRows = "com.snowplowanalytics" %% "snowplow-badrows" % V.badRows - val trackerCore = "com.snowplowanalytics" %% "snowplow-scala-tracker-core" % V.tracker - val trackerEmitterId = "com.snowplowanalytics" %% "snowplow-scala-tracker-emitter-id" % V.tracker - val scopt = "com.github.scopt" %% "scopt" % V.scopt - val akkaHttp = "com.typesafe.akka" %% "akka-http" % V.akkaHttp - val akkaStream = "com.typesafe.akka" %% "akka-stream" % V.akka - val akkaSlf4j = "com.typesafe.akka" %% "akka-slf4j" % V.akka - val pureconfig = "com.github.pureconfig" %% "pureconfig" % V.pureconfig - val akkaHttpMetrics = "fr.davit" %% "akka-http-metrics-datadog" % V.akkaHttpMetrics - val log4cats = "org.typelevel" %% "log4cats-slf4j" % V.log4cats + val collectorPayload = "com.snowplowanalytics" % "collector-payload-1" % V.collectorPayload + val badRows = "com.snowplowanalytics" %% "snowplow-badrows" % V.badRows + val trackerCore = "com.snowplowanalytics" %% "snowplow-scala-tracker-core" % V.tracker + val emitterHttps = "com.snowplowanalytics" %% "snowplow-scala-tracker-emitter-http4s" % V.tracker + val scopt = "com.github.scopt" %% "scopt" % V.scopt + val akkaHttp = "com.typesafe.akka" %% "akka-http" % V.akkaHttp + val akkaStream = "com.typesafe.akka" %% "akka-stream" % V.akka + val akkaSlf4j = "com.typesafe.akka" %% "akka-slf4j" % V.akka + val pureconfig = "com.github.pureconfig" %% "pureconfig" % V.pureconfig + val akkaHttpMetrics = "fr.davit" %% "akka-http-metrics-datadog" % V.akkaHttpMetrics + val log4cats = "org.typelevel" %% "log4cats-slf4j" % V.log4cats // http4s val http4sDsl = "org.http4s" %% "http4s-dsl" % V.http4s val http4sEmber = "org.http4s" %% "http4s-ember-server" % V.http4s val 
http4sBlaze = "org.http4s" %% "http4s-blaze-server" % V.blaze val http4sNetty = "org.http4s" %% "http4s-netty-server" % V.http4sNetty + val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.blaze val decline = "com.monovore" %% "decline-effect" % V.decline val circeGeneric = "io.circe" %% "circe-generic" % V.circe val circeConfig = "io.circe" %% "circe-config" % V.circeConfig @@ -113,7 +115,7 @@ object Dependencies { // Test common val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % Test - val specs2CE = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE % Test + val specs2CE = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE % Test // Test Akka val akkaTestkit = "com.typesafe.akka" %% "akka-testkit" % V.akka % Test @@ -129,13 +131,14 @@ object Dependencies { val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.blaze % IntegrationTest } - // Integration test legacy - object LegacyIT { + object Legacy { val testcontainers = "com.dimafeng" %% "testcontainers-scala-core" % V.testcontainers % IntegrationTest val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % IntegrationTest - val specs2CE = "com.codecommit" %% "cats-effect-testing-specs2" % V.LegacyIT.specs2CE % IntegrationTest - val catsRetry = "com.github.cb372" %% "cats-retry" % V.LegacyIT.catsRetry % IntegrationTest - val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.LegacyIT.http4s % IntegrationTest + val specs2CE = "com.codecommit" %% "cats-effect-testing-specs2" % V.Legacy.specs2CE % IntegrationTest + val catsRetry = "com.github.cb372" %% "cats-retry" % V.Legacy.catsRetry % IntegrationTest + val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.Legacy.http4s % IntegrationTest + val trackerCore = "com.snowplowanalytics" %% "snowplow-scala-tracker-core" % V.Legacy.tracker + val trackerEmitterId = "com.snowplowanalytics" %% "snowplow-scala-tracker-emitter-id" % V.Legacy.tracker } } } diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala index 6a1648ca6..cc71cf6ee 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala @@ -3,7 +3,7 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import cats.effect._ import cats.effect.kernel.Resource import com.snowplowanalytics.snowplow.collector.core.model.Sinks -import com.snowplowanalytics.snowplow.collector.core.{App, Config} +import com.snowplowanalytics.snowplow.collector.core.{App, Config, Telemetry} import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.{PubSubSink, PubSubSinkConfig} object PubSubCollector extends App[PubSubSinkConfig](BuildInfo) { @@ -13,4 +13,10 @@ object PubSubCollector extends App[PubSubSinkConfig](BuildInfo) { good <- PubSubSink.create[IO](config.sink.maxBytes, config.sink, config.buffer, config.good) bad <- PubSubSink.create[IO](config.sink.maxBytes, config.sink, config.buffer, config.bad) } yield Sinks(good, bad) + + override def telemetryInfo(config: Config[PubSubSinkConfig]): Telemetry.TelemetryInfo = + Telemetry.TelemetryInfo( + region = None, + cloud = Some("GCP") + ) } diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala 
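With the bump to tracker 2.0.0 and the new emitter-http4s artifact, an http4s-based module would depend on the new pair while the legacy Akka modules keep the 1.x artifacts through the Legacy object. A hypothetical build fragment (the exact module wiring is not shown in this hunk):

// hypothetical fragment for an http4s-based module's build definition
libraryDependencies ++= Seq(
  Dependencies.Libraries.trackerCore,  // snowplow-scala-tracker-core 2.0.0
  Dependencies.Libraries.emitterHttps, // snowplow-scala-tracker-emitter-http4s 2.0.0
  Dependencies.Libraries.http4sClient  // http4s-blaze-client, used to build the telemetry HTTP client
)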
index e2bbba7e9..fbc2f4ae9 100644 --- a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala +++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala @@ -119,6 +119,19 @@ object ConfigSpec { startupCheckInterval = 1.second, retryInterval = 10.seconds ) + ), + telemetry = Config.Telemetry( + disable = false, + interval = 60.minutes, + method = "POST", + url = "telemetry-g.snowplowanalytics.com", + port = 443, + secure = true, + userProvidedId = None, + moduleName = None, + moduleVersion = None, + instanceId = None, + autoGeneratedId = None ) ) diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala index 481649f3a..86ef6c113 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala @@ -11,11 +11,9 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import java.util.concurrent.ScheduledThreadPoolExecutor - import cats.effect.{IO, Resource} - import com.snowplowanalytics.snowplow.collector.core.model.Sinks -import com.snowplowanalytics.snowplow.collector.core.{App, Config} +import com.snowplowanalytics.snowplow.collector.core.{App, Config, Telemetry} import com.snowplowanalytics.snowplow.collectors.scalastream.sinks._ object SqsCollector extends App[SqsSinkConfig](BuildInfo) { @@ -39,4 +37,10 @@ object SqsCollector extends App[SqsSinkConfig](BuildInfo) { ) } yield Sinks(good, bad) } + + override def telemetryInfo(config: Config[SqsSinkConfig]): Telemetry.TelemetryInfo = + Telemetry.TelemetryInfo( + region = Some(config.streams.sink.region), + cloud = Some("AWS") + ) } diff --git a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala index 1781008ff..54832af22 100644 --- a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala +++ b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala @@ -117,6 +117,19 @@ object SqsConfigSpec { ), threadPoolSize = 10 ) + ), + telemetry = Config.Telemetry( + disable = false, + interval = 60.minutes, + method = "POST", + url = "telemetry-g.snowplowanalytics.com", + port = 443, + secure = true, + userProvidedId = None, + moduleName = None, + moduleVersion = None, + instanceId = None, + autoGeneratedId = None ) ) diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala index 5f4dd8659..ac8070eb4 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala @@ -3,8 +3,7 @@ package com.snowplowanalytics.snowplow.collector.stdout import cats.effect.IO import cats.effect.kernel.Resource import com.snowplowanalytics.snowplow.collector.core.model.Sinks -import com.snowplowanalytics.snowplow.collector.core.App -import com.snowplowanalytics.snowplow.collector.core.Config +import com.snowplowanalytics.snowplow.collector.core.{App, Config, Telemetry} object StdoutCollector extends App[SinkConfig](BuildInfo) { @@ 
-13,4 +12,7 @@ object StdoutCollector extends App[SinkConfig](BuildInfo) { val bad = new PrintingSink[IO](config.sink.maxBytes, System.err) Resource.pure(Sinks(good, bad)) } + + override def telemetryInfo(config: Config[SinkConfig]): Telemetry.TelemetryInfo = + Telemetry.TelemetryInfo(None, None) } From 84f659cbac5e64d7f843f2cfc9b7109ff2b0f578 Mon Sep 17 00:00:00 2001 From: colmsnowplow Date: Wed, 13 Sep 2023 10:47:34 +0200 Subject: [PATCH 20/39] Add http4s NSQ support (close #348) --- build.sbt | 8 +- .../Run.scala | 4 +- nsq/src/main/resources/application.conf | 23 ---- .../NsqCollector.scala | 40 +++--- .../sinks/NsqSink.scala | 42 ++++++- .../sinks/NsqSinkConfig.scala | 27 ++++ .../NsqConfigSpec.scala | 119 +++++++++++++++++- project/Dependencies.scala | 2 + 8 files changed, 207 insertions(+), 58 deletions(-) create mode 100644 nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSinkConfig.scala diff --git a/build.sbt b/build.sbt index 687ceee43..b92e47c5e 100644 --- a/build.sbt +++ b/build.sbt @@ -260,12 +260,14 @@ lazy val kafkaDistroless = project .dependsOn(core % "test->test;compile->compile") lazy val nsqSettings = - allSettings ++ buildInfoSettings ++ Seq( + allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Seq( moduleName := "snowplow-stream-collector-nsq", Docker / packageName := "scala-stream-collector-nsq", + buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream", libraryDependencies ++= Seq( Dependencies.Libraries.nsqClient, Dependencies.Libraries.jackson, + Dependencies.Libraries.nettyAll, Dependencies.Libraries.log4j ) ) @@ -273,14 +275,14 @@ lazy val nsqSettings = lazy val nsq = project .settings(nsqSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") + .dependsOn(http4s % "test->test;compile->compile") lazy val nsqDistroless = project .in(file("distroless/nsq")) .settings(sourceDirectory := (nsq / sourceDirectory).value) .settings(nsqSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") + .dependsOn(http4s % "test->test;compile->compile") lazy val stdoutSettings = allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Seq( diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 92839bac0..3aed8603d 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -30,7 +30,7 @@ object Run { def fromCli[F[_]: Async: Tracking, SinkConfig: Decoder]( appInfo: AppInfo, mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], - telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo + telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo ): Opts[F[ExitCode]] = { val configPath = Opts.option[Path]("config", "Path to HOCON configuration (optional)", "c", "config.hocon").orNone configPath.map(fromPath[F, SinkConfig](appInfo, mkSinks, telemetryInfo, _)) @@ -39,7 +39,7 @@ object Run { private def fromPath[F[_]: Async: Tracking, SinkConfig: Decoder]( appInfo: AppInfo, mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], - telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo, + telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo, path: Option[Path] ): F[ExitCode] = { val 
eitherT = for { diff --git a/nsq/src/main/resources/application.conf b/nsq/src/main/resources/application.conf index 0d1ae5709..1df27cd22 100644 --- a/nsq/src/main/resources/application.conf +++ b/nsq/src/main/resources/application.conf @@ -14,26 +14,3 @@ collector { } } } - - -akka { - loglevel = WARNING - loggers = ["akka.event.slf4j.Slf4jLogger"] - - http.server { - remote-address-header = on - raw-request-uri-header = on - - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - illegal-header-warnings = off - } - - max-connections = 2048 - } - - coordinated-shutdown { - run-by-jvm-shutdown-hook = off - } -} diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala index 7a7235c4d..65cb67e2d 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala @@ -10,28 +10,24 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.NsqSink -import com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService +import cats.effect.{IO, Resource} +import com.snowplowanalytics.snowplow.collector.core.model.Sinks +import com.snowplowanalytics.snowplow.collector.core.{App, Config, Telemetry} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks._ -object NsqCollector extends Collector { - def appName = BuildInfo.shortName - def appVersion = BuildInfo.version - def scalaVersion = BuildInfo.scalaVersion +object NsqCollector extends App[NsqSinkConfig](BuildInfo) { + override def mkSinks(config: Config.Streams[NsqSinkConfig]): Resource[IO, Sinks[IO]] = + for { + good <- NsqSink.create[IO]( + config.sink, + config.good + ) + bad <- NsqSink.create[IO]( + config.sink, + config.bad + ) + } yield Sinks(good, bad) - def main(args: Array[String]): Unit = { - val (collectorConf, akkaConf) = parseConfig(args) - val telemetry = TelemetryAkkaService.initWithCollector(collectorConf, BuildInfo.moduleName, appVersion) - val sinks = { - val goodStream = collectorConf.streams.good - val badStream = collectorConf.streams.bad - val (good, bad) = collectorConf.streams.sink match { - case nc: Nsq => (new NsqSink(nc.maxBytes, nc, goodStream), new NsqSink(nc.maxBytes, nc, badStream)) - case _ => throw new IllegalArgumentException("Configured sink is not NSQ") - } - CollectorSinks(good, bad) - } - run(collectorConf, akkaConf, sinks, telemetry) - } + override def telemetryInfo(config: Config[NsqSinkConfig]): Telemetry.TelemetryInfo = + Telemetry.TelemetryInfo(None, None) } diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala index f811755fb..858b48bf4 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala @@ -11,17 +11,31 @@ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks +import java.util.concurrent.TimeoutException + import 
scala.collection.JavaConverters._ +import cats.effect.{Resource, Sync} +import cats.implicits._ + import com.snowplowanalytics.client.nsq.NSQProducer -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ +import com.snowplowanalytics.snowplow.collector.core.{Sink} +import com.snowplowanalytics.client.nsq.exceptions.NSQException /** * NSQ Sink for the Scala Stream Collector * @param nsqConfig Configuration for Nsq * @param topicName Nsq topic name */ -class NsqSink(val maxBytes: Int, nsqConfig: Nsq, topicName: String) extends Sink { +class NsqSink[F[_]: Sync] private ( + val maxBytes: Int, + nsqConfig: NsqSinkConfig, + topicName: String +) extends Sink[F] { + + @volatile private var healthStatus = true + + override def isHealthy: F[Boolean] = Sync[F].pure(healthStatus) private val producer = new NSQProducer().addAddress(nsqConfig.host, nsqConfig.port).start() @@ -30,9 +44,27 @@ class NsqSink(val maxBytes: Int, nsqConfig: Nsq, topicName: String) extends Sink * @param events The list of events to send * @param key The partition key (unused) */ - override def storeRawEvents(events: List[Array[Byte]], key: String): Unit = - producer.produceMulti(topicName, events.asJava) + override def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = + Sync[F].blocking(producer.produceMulti(topicName, events.asJava)).onError { + case _: NSQException | _: TimeoutException => + Sync[F].delay(healthStatus = false) + } *> Sync[F].delay(healthStatus = true) - override def shutdown(): Unit = + def shutdown(): Unit = producer.shutdown() } + +object NsqSink { + + def create[F[_]: Sync]( + nsqConfig: NsqSinkConfig, + topicName: String + ): Resource[F, NsqSink[F]] = + Resource.make( + Sync[F].delay( + // MaxBytes is never used but is required by the sink interface definition, + // So just pass any int val in. + new NsqSink(0, nsqConfig, topicName) + ) + )(sink => Sync[F].delay(sink.shutdown())) +} diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSinkConfig.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSinkConfig.scala new file mode 100644 index 000000000..0025d9d08 --- /dev/null +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSinkConfig.scala @@ -0,0 +1,27 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
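storeRawEvents above marks the sink unhealthy when the blocking produce fails with an NSQException or TimeoutException and healthy again after the next success, while still propagating the error. A self-contained sketch (not part of the patch) of that onError pattern, using a Ref as a stand-in for the @volatile flag:

import cats.effect.{IO, IOApp, Ref}
import cats.syntax.all._

object HealthFlagSketch extends IOApp.Simple {
  // Ref stands in for the @volatile healthStatus var used by NsqSink
  def store(healthy: Ref[IO, Boolean], fail: Boolean): IO[Unit] =
    IO.raiseError(new RuntimeException("nsq unavailable"))
      .whenA(fail)
      .onError { case _ => healthy.set(false) } // onError re-raises, so the success update below is skipped
      .flatMap(_ => healthy.set(true))

  def run: IO[Unit] =
    for {
      healthy <- Ref.of[IO, Boolean](true)
      _       <- store(healthy, fail = true).attempt // failed produce: isHealthy becomes false
      bad     <- healthy.get
      _       <- store(healthy, fail = false)        // next successful produce flips it back
      good    <- healthy.get
      _       <- IO.println(s"after failure: $bad, after success: $good")
    } yield ()
}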
+ */ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import io.circe.Decoder +import io.circe.generic.semiauto._ + +import com.snowplowanalytics.snowplow.collector.core.Config + +final case class NsqSinkConfig( + maxBytes: Int, + threadPoolSize: Int, + host: String, + port: Int +) extends Config.Sink + +object NsqSinkConfig { + implicit val configDecoder: Decoder[NsqSinkConfig] = deriveDecoder[NsqSinkConfig] +} diff --git a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala index f4716c56a..a401b4b87 100644 --- a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala +++ b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala @@ -10,8 +10,121 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream -import com.snowplowanalytics.snowplow.collectors.scalastream.config.ConfigSpec +import cats.effect.testing.specs2.CatsEffect +import cats.effect.{ExitCode, IO} +import com.snowplowanalytics.snowplow.collector.core.{Config, ConfigParser} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.NsqSinkConfig +import org.http4s.SameSite +import org.specs2.mutable.Specification -class NsqConfigSpec extends ConfigSpec { - makeConfigTest("nsq", "", "") +import java.nio.file.Paths +import scala.concurrent.duration.DurationInt + +class NsqConfigSpec extends Specification with CatsEffect { + + "Config parser" should { + "be able to parse extended nsq config" in { + assert( + resource = "/config.nsq.extended.hocon", + expectedResult = Right(NsqConfigSpec.expectedConfig) + ) + } + "be able to parse minimal nsq config" in { + assert( + resource = "/config.nsq.minimal.hocon", + expectedResult = Right(NsqConfigSpec.expectedConfig) + ) + } + } + + private def assert(resource: String, expectedResult: Either[ExitCode, Config[NsqSinkConfig]]) = { + val path = Paths.get(getClass.getResource(resource).toURI) + ConfigParser.fromPath[IO, NsqSinkConfig](Some(path)).value.map { result => + result must beEqualTo(expectedResult) + } + } +} + +object NsqConfigSpec { + private val expectedConfig = Config[NsqSinkConfig]( + interface = "0.0.0.0", + port = 8080, + paths = Map.empty[String, String], + p3p = Config.P3P( + policyRef = "/w3c/p3p.xml", + CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" + ), + crossDomain = Config.CrossDomain( + enabled = false, + domains = List("*"), + secure = true + ), + cookie = Config.Cookie( + enabled = true, + expiration = 365.days, + name = "sp", + domains = List.empty, + fallbackDomain = None, + secure = true, + httpOnly = true, + sameSite = Some(SameSite.None) + ), + doNotTrackCookie = Config.DoNotTrackCookie( + enabled = false, + name = "", + value = "" + ), + cookieBounce = Config.CookieBounce( + enabled = false, + name = "n3pc", + fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000", + forwardedProtocolHeader = None + ), + redirectMacro = Config.RedirectMacro( + enabled = false, + placeholder = None + ), + rootResponse = Config.RootResponse( + enabled = false, + statusCode = 302, + headers = Map.empty[String, String], + body = "" + ), + cors = Config.CORS(1.hour), + monitoring = + Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + ssl = Config.SSL(enable = false, redirect = false, port = 443), + enableDefaultRedirect = false, + redirectDomains = Set.empty, + 
preTerminationPeriod = 10.seconds, + streams = Config.Streams( + good = "good", + bad = "bad", + useIpAddressAsPartitionKey = false, + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + sink = NsqSinkConfig( + maxBytes = 1000000, + threadPoolSize = 10, + host = "nsqHost", + port = 4150 + ) + ), + telemetry = Config.Telemetry( + disable = false, + interval = 60.minutes, + method = "POST", + url = "telemetry-g.snowplowanalytics.com", + port = 443, + secure = true, + userProvidedId = None, + moduleName = None, + moduleVersion = None, + instanceId = None, + autoGeneratedId = None + ) + ) } diff --git a/project/Dependencies.scala b/project/Dependencies.scala index 8a96a4d52..12e861226 100644 --- a/project/Dependencies.scala +++ b/project/Dependencies.scala @@ -52,6 +52,7 @@ object Dependencies { val circeConfig = "0.10.0" val fs2PubSub = "0.22.0" val catsRetry = "3.1.0" + val nettyAll = "4.1.95.Final" // to fix nsq dependency // Scala (test only) val specs2 = "4.11.0" @@ -69,6 +70,7 @@ object Dependencies { object Libraries { // Java val jackson = "com.fasterxml.jackson.core" % "jackson-databind" % V.jackson // nsq only + val nettyAll = "io.netty" % "netty-all" % V.nettyAll //nsq only val thrift = "org.apache.thrift" % "libthrift" % V.thrift val kinesis = "com.amazonaws" % "aws-java-sdk-kinesis" % V.awsSdk val sqs = "com.amazonaws" % "aws-java-sdk-sqs" % V.awsSdk From 60be201a1390670e5767bf507ce6de416a43ec62 Mon Sep 17 00:00:00 2001 From: spenes Date: Wed, 20 Sep 2023 17:22:00 +0300 Subject: [PATCH 21/39] Add http4s Kafka support (close #382) --- .github/workflows/test.yml | 2 + build.sbt | 21 ++- kafka/src/it/resources/collector.hocon | 14 ++ .../scalastream/it/kafka/Containers.scala | 129 ++++++++++++++++++ .../it/kafka/KafkaCollectorSpec.scala | 65 +++++++++ .../scalastream/it/kafka/KafkaUtils.scala | 56 ++++++++ kafka/src/main/resources/application.conf | 23 ---- .../KafkaCollector.scala | 50 ++++--- .../sinks/KafkaSink.scala | 86 +++++++----- .../sinks/KafkaSinkConfig.scala | 17 +++ .../KafkaConfigSpec.scala | 120 +++++++++++++++- .../ConfigSpec.scala | 8 +- 12 files changed, 498 insertions(+), 93 deletions(-) create mode 100644 kafka/src/it/resources/collector.hocon create mode 100644 kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/Containers.scala create mode 100644 kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaCollectorSpec.scala create mode 100644 kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaUtils.scala create mode 100644 kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSinkConfig.scala diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index d388e0877..a482e472c 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -20,3 +20,5 @@ jobs: run: sbt "project kinesisDistroless" IntegrationTest/test - name: Run integration tests PubSub run: sbt "project pubsubDistroless" IntegrationTest/test + - name: Run integration tests Kafka + run: sbt "project kafkaDistroless" IntegrationTest/test diff --git a/build.sbt b/build.sbt index b92e47c5e..a9231b345 100644 --- a/build.sbt +++ b/build.sbt @@ -241,23 +241,36 @@ lazy val pubsubDistroless = project .configs(IntegrationTest) lazy val kafkaSettings = - allSettings ++ buildInfoSettings ++ Seq( + allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Defaults.itSettings ++ Seq( moduleName := 
"snowplow-stream-collector-kafka", + buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream", Docker / packageName := "scala-stream-collector-kafka", - libraryDependencies ++= Seq(Dependencies.Libraries.kafkaClients, Dependencies.Libraries.mskAuth) + libraryDependencies ++= Seq( + Dependencies.Libraries.kafkaClients, + Dependencies.Libraries.mskAuth, + // integration tests dependencies + Dependencies.Libraries.IT.specs2, + Dependencies.Libraries.IT.specs2CE + ), + IntegrationTest / test := (IntegrationTest / test).dependsOn(Docker / publishLocal).value, + IntegrationTest / testOnly := (IntegrationTest / testOnly).dependsOn(Docker / publishLocal).evaluated ) lazy val kafka = project .settings(kafkaSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") + .dependsOn(http4s % "test->test;compile->compile;it->it") + .configs(IntegrationTest) + lazy val kafkaDistroless = project .in(file("distroless/kafka")) .settings(sourceDirectory := (kafka / sourceDirectory).value) .settings(kafkaSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") + .dependsOn(http4s % "test->test;compile->compile;it->it") + .configs(IntegrationTest) + lazy val nsqSettings = allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Seq( diff --git a/kafka/src/it/resources/collector.hocon b/kafka/src/it/resources/collector.hocon new file mode 100644 index 000000000..78fd2c372 --- /dev/null +++ b/kafka/src/it/resources/collector.hocon @@ -0,0 +1,14 @@ +collector { + interface = "0.0.0.0" + port = ${PORT} + + streams { + good = ${TOPIC_GOOD} + bad = ${TOPIC_BAD} + + sink { + brokers = ${BROKER} + maxBytes = ${MAX_BYTES} + } + } +} \ No newline at end of file diff --git a/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/Containers.scala b/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/Containers.scala new file mode 100644 index 000000000..6ff92eaba --- /dev/null +++ b/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/Containers.scala @@ -0,0 +1,129 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
+ */ +package com.snowplowanalytics.snowplow.collectors.scalastream.it.kafka + +import cats.effect._ +import com.dimafeng.testcontainers.{FixedHostPortGenericContainer, GenericContainer} +import com.snowplowanalytics.snowplow.collectors.scalastream.BuildInfo +import com.snowplowanalytics.snowplow.collectors.scalastream.it.CollectorContainer +import com.snowplowanalytics.snowplow.collectors.scalastream.it.utils._ +import org.testcontainers.containers.wait.strategy.Wait +import org.testcontainers.containers.{BindMode, Network, GenericContainer => JGenericContainer} + +object Containers { + + val zookeeperContainerName = "zookeeper" + val zookeeperPort = 2181 + val brokerContainerName = "broker" + val brokerExternalPort = 9092 + val brokerInternalPort = 29092 + + def createContainers( + goodTopic: String, + badTopic: String, + maxBytes: Int + ): Resource[IO, CollectorContainer] = + for { + network <- network() + _ <- zookeeper(network) + _ <- kafka(network) + c <- collectorKafka(network, goodTopic, badTopic, maxBytes) + } yield c + + private def network(): Resource[IO, Network] = + Resource.make(IO(Network.newNetwork()))(n => IO(n.close())) + + private def kafka( + network: Network + ): Resource[IO, JGenericContainer[_]] = + Resource.make( + IO { + val container = FixedHostPortGenericContainer( + imageName = "confluentinc/cp-kafka:7.0.1", + env = Map( + "KAFKA_BROKER_ID" -> "1", + "KAFKA_ZOOKEEPER_CONNECT" -> s"$zookeeperContainerName:$zookeeperPort", + "KAFKA_LISTENER_SECURITY_PROTOCOL_MAP" -> "PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT", + "KAFKA_ADVERTISED_LISTENERS" -> s"PLAINTEXT://localhost:$brokerExternalPort,PLAINTEXT_INTERNAL://$brokerContainerName:$brokerInternalPort", + "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR" -> "1", + "KAFKA_TRANSACTION_STATE_LOG_MIN_ISR" -> "1", + "KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR" -> "1" + ), + exposedPorts = List(brokerExternalPort, brokerInternalPort), + exposedHostPort = brokerExternalPort, + exposedContainerPort = brokerExternalPort + ) + container.container.withNetwork(network) + container.container.withNetworkAliases(brokerContainerName) + container.start() + container.container + } + )(e => IO(e.stop())) + + private def zookeeper( + network: Network, + ): Resource[IO, JGenericContainer[_]] = + Resource.make( + IO { + val container = GenericContainer( + dockerImage = "confluentinc/cp-zookeeper:7.0.1", + env = Map( + "ZOOKEEPER_CLIENT_PORT" -> zookeeperPort.toString, + "ZOOKEEPER_TICK_TIME" -> "2000" + ), + exposedPorts = List(zookeeperPort) + ) + container.container.withNetwork(network) + container.container.withNetworkAliases(zookeeperContainerName) + container.start() + container.container + } + )(e => IO(e.stop())) + + def collectorKafka( + network: Network, + goodTopic: String, + badTopic: String, + maxBytes: Int + ): Resource[IO, CollectorContainer] = { + Resource.make( + IO { + val collectorPort = 8080 + val container = GenericContainer( + dockerImage = BuildInfo.dockerAlias, + env = Map( + "PORT" -> collectorPort.toString, + "BROKER" -> s"$brokerContainerName:$brokerInternalPort", + "TOPIC_GOOD" -> goodTopic, + "TOPIC_BAD" -> badTopic, + "MAX_BYTES" -> maxBytes.toString + ), + exposedPorts = Seq(collectorPort), + fileSystemBind = Seq( + GenericContainer.FileSystemBind( + "kafka/src/it/resources/collector.hocon", + "/snowplow/config/collector.hocon", + BindMode.READ_ONLY + ) + ), + command = Seq( + "--config", + "/snowplow/config/collector.hocon" + ), + waitStrategy = Wait.forLogMessage(s".*Service bound to address.*", 1) + ) + 
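The PORT, BROKER, TOPIC_GOOD, TOPIC_BAD and MAX_BYTES entries in the env map are what the ${...} substitutions in the collector.hocon above resolve to, assuming Lightbend Config's default resolve options, which fall back to environment variables for substitutions not found in the file. A minimal stand-alone illustration (a sketch, not part of the patch):

import com.typesafe.config.ConfigFactory

object EnvSubstitutionSketch {
  def main(args: Array[String]): Unit = {
    // parseString leaves ${PORT} unresolved; resolve() then falls back to the
    // environment, so this only succeeds when PORT is exported, as it is in the
    // container env map above
    val cfg = ConfigFactory.parseString("collector { port = ${PORT} }").resolve()
    println(cfg.getInt("collector.port"))
  }
}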
container.container.withNetwork(network) + val c = startContainerWithLogs(container.container, "collector") + CollectorContainer(c, c.getHost, c.getMappedPort(collectorPort)) + } + )(c => IO(c.container.stop())) + } +} diff --git a/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaCollectorSpec.scala b/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaCollectorSpec.scala new file mode 100644 index 000000000..63bdece82 --- /dev/null +++ b/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaCollectorSpec.scala @@ -0,0 +1,65 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collectors.scalastream.it.kafka + +import scala.concurrent.duration._ + +import cats.effect.IO +import cats.effect.testing.specs2.CatsEffect + +import com.snowplowanalytics.snowplow.collectors.scalastream.it.EventGenerator +import com.snowplowanalytics.snowplow.collectors.scalastream.it.utils._ + +import org.specs2.mutable.Specification + +class KafkaCollectorSpec extends Specification with CatsEffect { + + override protected val Timeout = 5.minutes + + val maxBytes = 10000 + + "emit the correct number of collector payloads and bad rows" in { + val testName = "count" + val nbGood = 1000 + val nbBad = 10 + val goodTopic = "test-raw" + val badTopic = "test-bad" + + Containers.createContainers( + goodTopic = goodTopic, + badTopic = badTopic, + maxBytes = maxBytes + ).use { collector => + for { + _ <- log(testName, "Sending data") + _ <- EventGenerator.sendEvents( + collector.host, + collector.port, + nbGood, + nbBad, + maxBytes + ) + _ <- log(testName, "Data sent. Waiting for collector to work") + _ <- IO.sleep(30.second) + _ <- log(testName, "Consuming collector's output") + collectorOutput <- KafkaUtils.readOutput( + brokerAddr = s"localhost:${Containers.brokerExternalPort}", + goodTopic = goodTopic, + badTopic = badTopic + ) + } yield { + collectorOutput.good.size must beEqualTo(nbGood) + collectorOutput.bad.size must beEqualTo(nbBad) + } + } + } + +} diff --git a/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaUtils.scala b/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaUtils.scala new file mode 100644 index 000000000..fea1b327a --- /dev/null +++ b/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaUtils.scala @@ -0,0 +1,56 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
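Each container above follows the same acquire/release shape: start inside Resource.make's acquire and stop in its release, so containers are torn down even when a test fails. The pattern in isolation (a sketch, not part of the patch):

import cats.effect.{IO, Resource}
import org.testcontainers.containers.{GenericContainer => JGenericContainer}

object ContainerResourceSketch {
  // start on acquire, stop on release: `use` guarantees teardown even if the test body fails
  def containerResource(mk: => JGenericContainer[_]): Resource[IO, JGenericContainer[_]] =
    Resource.make(IO { val c = mk; c.start(); c })(c => IO(c.stop()))
}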
+ */ +package com.snowplowanalytics.snowplow.collectors.scalastream.it.kafka + +import cats.effect._ +import org.apache.kafka.clients.consumer._ +import java.util.Properties +import java.time.Duration +import scala.collection.JavaConverters._ +import com.snowplowanalytics.snowplow.collectors.scalastream.it.utils._ +import com.snowplowanalytics.snowplow.collectors.scalastream.it.CollectorOutput + +object KafkaUtils { + + def readOutput( + brokerAddr: String, + goodTopic: String, + badTopic: String + ): IO[CollectorOutput] = { + createConsumer(brokerAddr).use { kafkaConsumer => + IO { + kafkaConsumer.subscribe(List(goodTopic, badTopic).asJava) + val records = kafkaConsumer.poll(Duration.ofSeconds(20)) + val extract = (r: ConsumerRecords[String, Array[Byte]], topicName: String) => + r.records(topicName).asScala.toList.map(_.value()) + val goodCount = extract(records, goodTopic).map(parseCollectorPayload) + val badCount = extract(records, badTopic).map(parseBadRow) + CollectorOutput(goodCount, badCount) + } + } + } + + private def createConsumer(brokerAddr: String): Resource[IO, KafkaConsumer[String, Array[Byte]]] = { + val acquire = IO { + val props = new Properties() + props.setProperty("bootstrap.servers", brokerAddr) + props.setProperty("group.id", "it-collector") + props.setProperty("auto.offset.reset", "earliest") + props.setProperty("max.poll.records", "2000") + props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer") + props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer") + new KafkaConsumer[String, Array[Byte]](props) + } + val release = (p: KafkaConsumer[String, Array[Byte]]) => IO(p.close()) + Resource.make(acquire)(release) + } + +} diff --git a/kafka/src/main/resources/application.conf b/kafka/src/main/resources/application.conf index 95f0f5c80..80182aeec 100644 --- a/kafka/src/main/resources/application.conf +++ b/kafka/src/main/resources/application.conf @@ -14,26 +14,3 @@ collector { } } } - - -akka { - loglevel = WARNING - loggers = ["akka.event.slf4j.Slf4jLogger"] - - http.server { - remote-address-header = on - raw-request-uri-header = on - - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - illegal-header-warnings = off - } - - max-connections = 2048 - } - - coordinated-shutdown { - run-by-jvm-shutdown-hook = off - } -} diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala index 4d6ed1e4d..32f2f4a82 100644 --- a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala @@ -10,33 +10,29 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KafkaSink -import com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo +import cats.effect.{IO, Resource} +import com.snowplowanalytics.snowplow.collector.core.model.Sinks +import com.snowplowanalytics.snowplow.collector.core.{App, Config, Telemetry} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks._ -object KafkaCollector extends Collector { - def appName = 
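KafkaUtils above relies on a single 20-second poll returning everything the collector produced. A sketch (not part of the patch) of an alternative that polls in a loop until an expected record count or a deadline is reached, with the same consumer setup:

import java.time.Duration
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object PollUntilSketch {
  // keep polling until `expected` records from `topic` are seen or the deadline passes
  def pollUntil(
    consumer: KafkaConsumer[String, Array[Byte]],
    topic: String,
    expected: Int,
    deadlineMillis: Long = 60000L
  ): List[Array[Byte]] = {
    val start = System.currentTimeMillis()
    var acc   = List.empty[Array[Byte]]
    while (acc.size < expected && System.currentTimeMillis() - start < deadlineMillis) {
      val records = consumer.poll(Duration.ofSeconds(1))
      acc = acc ++ records.records(topic).asScala.toList.map(_.value())
    }
    acc
  }
}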
BuildInfo.shortName - def appVersion = BuildInfo.version - def scalaVersion = BuildInfo.scalaVersion +object KafkaCollector extends App[KafkaSinkConfig](BuildInfo) { - def main(args: Array[String]): Unit = { - val (collectorConf, akkaConf) = parseConfig(args) - val telemetry = TelemetryAkkaService.initWithCollector(collectorConf, BuildInfo.moduleName, appVersion) - val sinks = { - val goodStream = collectorConf.streams.good - val badStream = collectorConf.streams.bad - val bufferConf = collectorConf.streams.buffer - val (good, bad) = collectorConf.streams.sink match { - case kc: Kafka => - ( - new KafkaSink(kc.maxBytes, kc, bufferConf, goodStream), - new KafkaSink(kc.maxBytes, kc, bufferConf, badStream) - ) - case _ => throw new IllegalArgumentException("Configured sink is not Kafka") - } - CollectorSinks(good, bad) - } - run(collectorConf, akkaConf, sinks, telemetry) - } + override def mkSinks(config: Config.Streams[KafkaSinkConfig]): Resource[IO, Sinks[IO]] = + for { + good <- KafkaSink.create[IO]( + config.sink.maxBytes, + config.good, + config.sink, + config.buffer + ) + bad <- KafkaSink.create[IO]( + config.sink.maxBytes, + config.bad, + config.sink, + config.buffer + ) + } yield Sinks(good, bad) + + override def telemetryInfo(config: Config[KafkaSinkConfig]): Telemetry.TelemetryInfo = + Telemetry.TelemetryInfo(None, None) } diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala index 6e63f2cab..ac502f5c5 100644 --- a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala @@ -11,48 +11,28 @@ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks +import cats.effect.{Resource, Sync} + +import org.slf4j.LoggerFactory + import java.util.Properties import org.apache.kafka.clients.producer._ -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ +import com.snowplowanalytics.snowplow.collector.core.{Config, Sink} /** * Kafka Sink for the Scala Stream Collector */ -class KafkaSink( +class KafkaSink[F[_]: Sync]( val maxBytes: Int, - kafkaConfig: Kafka, - bufferConfig: BufferConfig, + kafkaProducer: KafkaProducer[String, Array[Byte]], topicName: String -) extends Sink { +) extends Sink[F] { - private val kafkaProducer = createProducer + private lazy val log = LoggerFactory.getLogger(getClass()) - /** - * Creates a new Kafka Producer with the given - * configuration options - * - * @return a new Kafka Producer - */ - private def createProducer: KafkaProducer[String, Array[Byte]] = { - - log.info(s"Create Kafka Producer to brokers: ${kafkaConfig.brokers}") - - val props = new Properties() - props.setProperty("bootstrap.servers", kafkaConfig.brokers) - props.setProperty("acks", "all") - props.setProperty("retries", kafkaConfig.retries.toString) - props.setProperty("buffer.memory", bufferConfig.byteLimit.toString) - props.setProperty("linger.ms", bufferConfig.timeLimit.toString) - props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer") - props.setProperty("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer") - - // Can't use `putAll` in JDK 11 because of https://github.com/scala/bug/issues/10418 - kafkaConfig.producerConf.getOrElse(Map()).foreach { case (k, v) => props.setProperty(k, v) } - - new 
KafkaProducer[String, Array[Byte]](props) - } + override def isHealthy: F[Boolean] = Sync[F].pure(true) /** * Store raw events to the topic @@ -60,7 +40,7 @@ class KafkaSink( * @param events The list of events to send * @param key The partition key to use */ - override def storeRawEvents(events: List[Array[Byte]], key: String): Unit = { + override def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = Sync[F].delay { log.debug(s"Writing ${events.size} Thrift records to Kafka topic $topicName at key $key") events.foreach { event => kafkaProducer.send( @@ -72,7 +52,47 @@ class KafkaSink( ) } } +} + +object KafkaSink { - override def shutdown(): Unit = - kafkaProducer.close() + def create[F[_]: Sync]( + maxBytes: Int, + topicName: String, + kafkaConfig: KafkaSinkConfig, + bufferConfig: Config.Buffer + ): Resource[F, KafkaSink[F]] = + for { + kafkaProducer <- createProducer(kafkaConfig, bufferConfig) + kafkaSink = new KafkaSink(maxBytes, kafkaProducer, topicName) + } yield kafkaSink + + /** + * Creates a new Kafka Producer with the given + * configuration options + * + * @return a new Kafka Producer + */ + private def createProducer[F[_]: Sync]( + kafkaConfig: KafkaSinkConfig, + bufferConfig: Config.Buffer + ): Resource[F, KafkaProducer[String, Array[Byte]]] = { + val acquire = Sync[F].delay { + val props = new Properties() + props.setProperty("bootstrap.servers", kafkaConfig.brokers) + props.setProperty("acks", "all") + props.setProperty("retries", kafkaConfig.retries.toString) + props.setProperty("buffer.memory", bufferConfig.byteLimit.toString) + props.setProperty("linger.ms", bufferConfig.timeLimit.toString) + props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer") + props.setProperty("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer") + + // Can't use `putAll` in JDK 11 because of https://github.com/scala/bug/issues/10418 + kafkaConfig.producerConf.getOrElse(Map()).foreach { case (k, v) => props.setProperty(k, v) } + + new KafkaProducer[String, Array[Byte]](props) + } + val release = (p: KafkaProducer[String, Array[Byte]]) => Sync[F].delay(p.close()) + Resource.make(acquire)(release) + } } diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSinkConfig.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSinkConfig.scala new file mode 100644 index 000000000..676a5259d --- /dev/null +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSinkConfig.scala @@ -0,0 +1,17 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import io.circe.Decoder +import io.circe.generic.semiauto._ + +import com.snowplowanalytics.snowplow.collector.core.Config + +final case class KafkaSinkConfig( + maxBytes: Int, + brokers: String, + retries: Int, + producerConf: Option[Map[String, String]] +) extends Config.Sink + +object KafkaSinkConfig { + implicit val configDecoder: Decoder[KafkaSinkConfig] = deriveDecoder[KafkaSinkConfig] +} diff --git a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala index 7bc486a72..6abac5842 100644 --- a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala +++ b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala @@ -10,8 +10,122 
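Because producerConf entries are applied after the defaults in createProducer, a deployment can pass through or override standard Kafka producer settings via configuration. A sketch (not from the patch) with hypothetical broker addresses and two common client properties:

import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KafkaSinkConfig

object ProducerConfSketch {
  // keys are standard Kafka producer settings; they override the defaults set in createProducer
  val sinkConfig = KafkaSinkConfig(
    maxBytes = 1000000,
    brokers  = "broker1:9092,broker2:9092",
    retries  = 10,
    producerConf = Some(
      Map(
        "security.protocol" -> "SASL_SSL",
        "compression.type"  -> "gzip"
      )
    )
  )
}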
@@ */ package com.snowplowanalytics.snowplow.collectors.scalastream -import com.snowplowanalytics.snowplow.collectors.scalastream.config.ConfigSpec +import cats.effect.testing.specs2.CatsEffect +import cats.effect.{ExitCode, IO} +import com.snowplowanalytics.snowplow.collector.core.{Config, ConfigParser} +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KafkaSinkConfig +import org.http4s.SameSite +import org.specs2.mutable.Specification -class KafkaConfigSpec extends ConfigSpec { - makeConfigTest("kafka", "", "") +import java.nio.file.Paths +import scala.concurrent.duration.DurationInt + +class KafkaConfigSpec extends Specification with CatsEffect { + + "Config parser" should { + "be able to parse extended kafka config" in { + assert( + resource = "/config.kafka.extended.hocon", + expectedResult = Right(KafkaConfigSpec.expectedConfig) + ) + } + "be able to parse minimal kafka config" in { + assert( + resource = "/config.kafka.minimal.hocon", + expectedResult = Right(KafkaConfigSpec.expectedConfig) + ) + } + } + + private def assert(resource: String, expectedResult: Either[ExitCode, Config[KafkaSinkConfig]]) = { + val path = Paths.get(getClass.getResource(resource).toURI) + ConfigParser.fromPath[IO, KafkaSinkConfig](Some(path)).value.map { result => + result must beEqualTo(expectedResult) + } + } +} + +object KafkaConfigSpec { + + private val expectedConfig = Config[KafkaSinkConfig]( + interface = "0.0.0.0", + port = 8080, + paths = Map.empty[String, String], + p3p = Config.P3P( + policyRef = "/w3c/p3p.xml", + CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" + ), + crossDomain = Config.CrossDomain( + enabled = false, + domains = List("*"), + secure = true + ), + cookie = Config.Cookie( + enabled = true, + expiration = 365.days, + name = "sp", + domains = List.empty, + fallbackDomain = None, + secure = true, + httpOnly = true, + sameSite = Some(SameSite.None) + ), + doNotTrackCookie = Config.DoNotTrackCookie( + enabled = false, + name = "", + value = "" + ), + cookieBounce = Config.CookieBounce( + enabled = false, + name = "n3pc", + fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000", + forwardedProtocolHeader = None + ), + redirectMacro = Config.RedirectMacro( + enabled = false, + placeholder = None + ), + rootResponse = Config.RootResponse( + enabled = false, + statusCode = 302, + headers = Map.empty[String, String], + body = "" + ), + cors = Config.CORS(1.hour), + monitoring = + Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + ssl = Config.SSL(enable = false, redirect = false, port = 443), + enableDefaultRedirect = false, + redirectDomains = Set.empty, + preTerminationPeriod = 10.seconds, + streams = Config.Streams( + good = "good", + bad = "bad", + useIpAddressAsPartitionKey = false, + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + sink = KafkaSinkConfig( + maxBytes = 1000000, + brokers = "localhost:9092,another.host:9092", + retries = 10, + producerConf = None + ) + ), + telemetry = Config.Telemetry( + disable = false, + interval = 60.minutes, + method = "POST", + url = "telemetry-g.snowplowanalytics.com", + port = 443, + secure = true, + userProvidedId = None, + moduleName = None, + moduleVersion = None, + instanceId = None, + autoGeneratedId = None + ) + ) } diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala 
b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala index fbc2f4ae9..77256389b 100644 --- a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala +++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala @@ -1,9 +1,11 @@ /* * Copyright (c) 2012-present Snowplow Analytics Ltd. All rights reserved. * - * This program is licensed to you under the Snowplow Community License Version 1.0, - * and you may not use this file except in compliance with the Snowplow Community License Version 1.0. - * You may obtain a copy of the Snowplow Community License Version 1.0 at https://docs.snowplow.io/community-license-1.0 + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ package com.snowplowanalytics.snowplow.collectors.scalastream From 8aef1a05a4c4ac83cebe76bff59d4be1ae378866 Mon Sep 17 00:00:00 2001 From: spenes Date: Wed, 27 Sep 2023 13:18:58 +0300 Subject: [PATCH 22/39] Set maxBytes in the NsqSink (close #383) --- .../Run.scala | 20 +++++++++++-------- .../NsqCollector.scala | 2 ++ .../sinks/NsqSink.scala | 5 ++--- 3 files changed, 16 insertions(+), 11 deletions(-) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 3aed8603d..25297d818 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -25,12 +25,16 @@ import com.snowplowanalytics.snowplow.collector.core.model.Sinks object Run { + type MkSinks[F[_], SinkConfig] = Config.Streams[SinkConfig] => Resource[F, Sinks[F]] + + type TelemetryInfo[SinkConfig] = Config[SinkConfig] => Telemetry.TelemetryInfo + implicit private def logger[F[_]: Sync] = Slf4jLogger.getLogger[F] def fromCli[F[_]: Async: Tracking, SinkConfig: Decoder]( appInfo: AppInfo, - mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], - telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo + mkSinks: MkSinks[F, SinkConfig], + telemetryInfo: TelemetryInfo[SinkConfig] ): Opts[F[ExitCode]] = { val configPath = Opts.option[Path]("config", "Path to HOCON configuration (optional)", "c", "config.hocon").orNone configPath.map(fromPath[F, SinkConfig](appInfo, mkSinks, telemetryInfo, _)) @@ -38,13 +42,13 @@ object Run { private def fromPath[F[_]: Async: Tracking, SinkConfig: Decoder]( appInfo: AppInfo, - mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], - telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo, + mkSinks: MkSinks[F, SinkConfig], + telemetryInfo: TelemetryInfo[SinkConfig], path: Option[Path] ): F[ExitCode] = { val eitherT = for { config <- ConfigParser.fromPath[F, SinkConfig](path) - _ <- EitherT.right[ExitCode](fromConfig(appInfo, mkSinks, config, telemetryInfo)) + _ <- EitherT.right[ExitCode](fromConfig(appInfo, mkSinks, telemetryInfo, config)) } yield ExitCode.Success eitherT.merge.handleErrorWith { e => @@ -55,9 +59,9 @@ object Run { private def fromConfig[F[_]: Async: Tracking, SinkConfig]( appInfo: AppInfo, - mkSinks: Config.Streams[SinkConfig] => Resource[F, Sinks[F]], - config: 
Config[SinkConfig], - telemetryInfo: Config[SinkConfig] => Telemetry.TelemetryInfo + mkSinks: MkSinks[F, SinkConfig], + telemetryInfo: TelemetryInfo[SinkConfig], + config: Config[SinkConfig] ): F[ExitCode] = { val resources = for { sinks <- mkSinks(config.streams) diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala index 65cb67e2d..9877d3b9b 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala @@ -19,10 +19,12 @@ object NsqCollector extends App[NsqSinkConfig](BuildInfo) { override def mkSinks(config: Config.Streams[NsqSinkConfig]): Resource[IO, Sinks[IO]] = for { good <- NsqSink.create[IO]( + config.sink.maxBytes, config.sink, config.good ) bad <- NsqSink.create[IO]( + config.sink.maxBytes, config.sink, config.bad ) diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala index 858b48bf4..358963605 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala @@ -57,14 +57,13 @@ class NsqSink[F[_]: Sync] private ( object NsqSink { def create[F[_]: Sync]( + maxBytes: Int, nsqConfig: NsqSinkConfig, topicName: String ): Resource[F, NsqSink[F]] = Resource.make( Sync[F].delay( - // MaxBytes is never used but is required by the sink interface definition, - // So just pass any int val in. 
- new NsqSink(0, nsqConfig, topicName) + new NsqSink(maxBytes, nsqConfig, topicName) ) )(sink => Sync[F].delay(sink.shutdown())) } From a160bad21350ce3fce93e2dc168318d7e14cbbea Mon Sep 17 00:00:00 2001 From: spenes Date: Wed, 4 Oct 2023 15:29:23 +0300 Subject: [PATCH 23/39] Set installation id (close #384) --- build.sbt | 1 + .../App.scala | 2 +- .../Run.scala | 17 +-- .../Telemetry.scala | 59 ++++---- .../TelemetrySpec.scala | 126 ++++++++++++++++++ .../KafkaCollector.scala | 8 +- .../TelemetryUtils.scala | 34 +++++ .../KinesisCollector.scala | 17 ++- .../TelemetryUtils.scala | 26 ++++ .../sinks/KinesisSink.scala | 90 +++++-------- .../sinks/KinesisSinkConfig.scala | 1 - .../TelemetryUtilsSpec.scala | 13 ++ .../sinks/KinesisConfigSpec.scala | 4 - .../NsqCollector.scala | 4 +- project/Dependencies.scala | 6 +- .../PubSubCollector.scala | 11 +- .../SqsCollector.scala | 15 ++- .../TelemetryUtils.scala | 25 ++++ .../sinks/SqsSink.scala | 14 +- .../TelemetryUtilsSpec.scala | 13 ++ .../StdoutCollector.scala | 4 +- 21 files changed, 360 insertions(+), 130 deletions(-) create mode 100644 http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TelemetrySpec.scala create mode 100644 kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala create mode 100644 kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala create mode 100644 kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtilsSpec.scala create mode 100644 sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala create mode 100644 sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtilsSpec.scala diff --git a/build.sbt b/build.sbt index a9231b345..8cd3c74b9 100644 --- a/build.sbt +++ b/build.sbt @@ -143,6 +143,7 @@ lazy val http4s = project Dependencies.Libraries.emitterHttps, Dependencies.Libraries.specs2, Dependencies.Libraries.specs2CE, + Dependencies.Libraries.ceTestkit, //Integration tests Dependencies.Libraries.IT.testcontainers, diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala index 5bbce5762..23b614458 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala @@ -21,7 +21,7 @@ abstract class App[SinkConfig <: Config.Sink: Decoder](appInfo: AppInfo) def mkSinks(config: Config.Streams[SinkConfig]): Resource[IO, Sinks[IO]] - def telemetryInfo(config: Config[SinkConfig]): Telemetry.TelemetryInfo + def telemetryInfo(config: Config.Streams[SinkConfig]): IO[Telemetry.TelemetryInfo] final def main: Opts[IO[ExitCode]] = Run.fromCli[IO, SinkConfig](appInfo, mkSinks, telemetryInfo) } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 25297d818..20bed625b 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -27,14 +27,14 @@ object Run { type MkSinks[F[_], SinkConfig] = Config.Streams[SinkConfig] => Resource[F, Sinks[F]] - type TelemetryInfo[SinkConfig] = Config[SinkConfig] => Telemetry.TelemetryInfo + type TelemetryInfo[F[_], SinkConfig] = Config.Streams[SinkConfig] => 
F[Telemetry.TelemetryInfo] implicit private def logger[F[_]: Sync] = Slf4jLogger.getLogger[F] def fromCli[F[_]: Async: Tracking, SinkConfig: Decoder]( appInfo: AppInfo, mkSinks: MkSinks[F, SinkConfig], - telemetryInfo: TelemetryInfo[SinkConfig] + telemetryInfo: TelemetryInfo[F, SinkConfig] ): Opts[F[ExitCode]] = { val configPath = Opts.option[Path]("config", "Path to HOCON configuration (optional)", "c", "config.hocon").orNone configPath.map(fromPath[F, SinkConfig](appInfo, mkSinks, telemetryInfo, _)) @@ -43,7 +43,7 @@ object Run { private def fromPath[F[_]: Async: Tracking, SinkConfig: Decoder]( appInfo: AppInfo, mkSinks: MkSinks[F, SinkConfig], - telemetryInfo: TelemetryInfo[SinkConfig], + telemetryInfo: TelemetryInfo[F, SinkConfig], path: Option[Path] ): F[ExitCode] = { val eitherT = for { @@ -60,7 +60,7 @@ object Run { private def fromConfig[F[_]: Async: Tracking, SinkConfig]( appInfo: AppInfo, mkSinks: MkSinks[F, SinkConfig], - telemetryInfo: TelemetryInfo[SinkConfig], + telemetryInfo: TelemetryInfo[F, SinkConfig], config: Config[SinkConfig] ): F[ExitCode] = { val resources = for { @@ -81,14 +81,9 @@ object Run { } yield httpClient resources.use { httpClient => + val appId = java.util.UUID.randomUUID.toString Telemetry - .run( - config.telemetry, - httpClient, - appInfo, - telemetryInfo(config).region, - telemetryInfo(config).cloud - ) + .run(config.telemetry, httpClient, appInfo, appId, telemetryInfo(config.streams)) .compile .drain .flatMap(_ => Async[F].never[ExitCode]) diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala index 95df9bebc..a222c4208 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala @@ -3,6 +3,8 @@ package com.snowplowanalytics.snowplow.collector.core import org.typelevel.log4cats.Logger import org.typelevel.log4cats.slf4j.Slf4jLogger +import org.apache.commons.codec.digest.DigestUtils + import cats.data.NonEmptyList import cats.implicits._ @@ -32,25 +34,20 @@ object Telemetry { telemetryConfig: Config.Telemetry, httpClient: HttpClient[F], appInfo: AppInfo, - region: Option[String], - cloud: Option[String] + appId: String, + telemetryInfoF: F[TelemetryInfo] ): Stream[F, Unit] = if (telemetryConfig.disable) Stream.empty.covary[F] - else { - val sdj = makeHeartbeatEvent( - telemetryConfig, - region, - cloud, - appInfo.moduleName, - appInfo.version - ) - Stream.resource(initTracker(telemetryConfig, appInfo.moduleName, httpClient)).flatMap { tracker => - Stream.fixedDelay[F](telemetryConfig.interval).evalMap { _ => + else + for { + telemetryInfo <- Stream.eval(telemetryInfoF) + sdj = makeHeartbeatEvent(telemetryConfig, appInfo, appId, telemetryInfo) + tracker <- Stream.resource(initTracker(telemetryConfig, appInfo.moduleName, httpClient)) + _ <- Stream.fixedDelay[F](telemetryConfig.interval).evalMap { _ => tracker.trackSelfDescribingEvent(unstructEvent = sdj) >> tracker.flushEmitters() } - } - } + } yield () private def initTracker[F[_]: Async: Tracking]( config: Config.Telemetry, @@ -90,29 +87,39 @@ object Telemetry { private def makeHeartbeatEvent( teleCfg: Config.Telemetry, - region: Option[String], - cloud: Option[String], - appName: String, - appVersion: String + appInfo: AppInfo, + appId: String, + telemetryInfo: TelemetryInfo ): SelfDescribingData[Json] = SelfDescribingData( - 
SchemaKey("com.snowplowanalytics.oss", "oss_context", "jsonschema", SchemaVer.Full(1, 0, 1)), + SchemaKey("com.snowplowanalytics.oss", "oss_context", "jsonschema", SchemaVer.Full(1, 0, 2)), Json.obj( "userProvidedId" -> teleCfg.userProvidedId.asJson, "autoGeneratedId" -> teleCfg.autoGeneratedId.asJson, "moduleName" -> teleCfg.moduleName.asJson, "moduleVersion" -> teleCfg.moduleVersion.asJson, "instanceId" -> teleCfg.instanceId.asJson, - "appGeneratedId" -> java.util.UUID.randomUUID.toString.asJson, - "cloud" -> cloud.asJson, - "region" -> region.asJson, - "applicationName" -> appName.asJson, - "applicationVersion" -> appVersion.asJson + "appGeneratedId" -> appId.asJson, + "cloud" -> telemetryInfo.cloud.asJson, + "region" -> telemetryInfo.region.asJson, + "installationId" -> telemetryInfo.hashedInstallationId.asJson, + "applicationName" -> appInfo.moduleName.asJson, + "applicationVersion" -> appInfo.version.asJson ) ) + /** + * Stores destination specific telemetry data + * @param region Cloud region application is deployed + * @param cloud Cloud application is deployed + * @param unhashedInstallationId Unhashed version of id that is used identify pipeline. + * It should be something unique to that pipeline such as account id, project id etc. + */ case class TelemetryInfo( region: Option[String], - cloud: Option[String] - ) + cloud: Option[String], + unhashedInstallationId: Option[String] + ) { + def hashedInstallationId: Option[String] = unhashedInstallationId.map(DigestUtils.sha256Hex) + } } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TelemetrySpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TelemetrySpec.scala new file mode 100644 index 000000000..f06229287 --- /dev/null +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TelemetrySpec.scala @@ -0,0 +1,126 @@ +package com.snowplowanalytics.snowplow.collector.core + +import scala.concurrent.duration._ +import scala.collection.mutable.ListBuffer + +import org.apache.commons.codec.binary.Base64 +import org.apache.commons.codec.digest.DigestUtils + +import java.nio.charset.StandardCharsets + +import cats.effect._ +import cats.effect.unsafe.implicits.global +import cats.effect.testkit.TestControl + +import org.http4s._ +import org.http4s.client.{Client => HttpClient} + +import io.circe._ +import io.circe.parser._ +import io.circe.syntax._ + +import fs2.Stream + +import com.snowplowanalytics.snowplow.scalatracker.emitters.http4s.ceTracking + +import org.specs2.mutable.Specification + +class TelemetrySpec extends Specification { + + case class ProbeTelemetry( + telemetryStream: Stream[IO, Unit], + telemetryEvents: ListBuffer[Json] + ) + + val appId = "testAppId" + val region = Some("testRegion") + val cloud = Some("testCloud") + val unhashedInstallationId = Some("testInstallationId") + val interval = 5.minutes + val telemetryConfig = Config.Telemetry( + disable = false, + interval = interval, + method = "POST", + url = "127.0.0.1", + port = 443, + secure = true, + userProvidedId = None, + moduleName = None, + moduleVersion = None, + instanceId = None, + autoGeneratedId = None + ) + + def probeTelemetry(telemetryConfig: Config.Telemetry): ProbeTelemetry = { + val telemetryEvents = ListBuffer[Json]() + val mockHttpApp = HttpRoutes + .of[IO] { + case req => + IO { + telemetryEvents += extractTelemetryEvent(req) + Response[IO](status = Status.Ok) + } + } + .orNotFound + val mockClient = HttpClient.fromHttpApp[IO](mockHttpApp) + val telemetryInfoF = 
IO(Telemetry.TelemetryInfo(region, cloud, unhashedInstallationId)) + val telemetryStream = Telemetry.run[IO]( + telemetryConfig, + mockClient, + TestUtils.appInfo, + appId, + telemetryInfoF + ) + ProbeTelemetry(telemetryStream, telemetryEvents) + } + + def extractTelemetryEvent(req: Request[IO]): Json = { + val body = req.bodyText.compile.string.unsafeRunSync() + val jsonBody = parse(body).toOption.get + val uepxEncoded = jsonBody.hcursor.downField("data").downN(0).downField("ue_px").as[String].toOption.get + val uePxDecoded = new String(Base64.decodeBase64(uepxEncoded), StandardCharsets.UTF_8) + parse(uePxDecoded).toOption.get.hcursor.downField("data").as[Json].toOption.get + } + + def expectedEvent(config: Config.Telemetry): Json = { + val installationId = unhashedInstallationId.map(DigestUtils.sha256Hex) + Json.obj( + "schema" -> "iglu:com.snowplowanalytics.oss/oss_context/jsonschema/1-0-2".asJson, + "data" -> Json.obj( + "userProvidedId" -> config.userProvidedId.asJson, + "autoGeneratedId" -> config.autoGeneratedId.asJson, + "moduleName" -> config.moduleName.asJson, + "moduleVersion" -> config.moduleVersion.asJson, + "instanceId" -> config.instanceId.asJson, + "appGeneratedId" -> appId.asJson, + "cloud" -> cloud.asJson, + "region" -> region.asJson, + "installationId" -> installationId.asJson, + "applicationName" -> TestUtils.appInfo.name.asJson, + "applicationVersion" -> TestUtils.appInfo.version.asJson + ) + ) + } + + "Telemetry" should { + "send correct number of events" in { + val eventCount = 10 + val timeout = (interval * eventCount.toLong) + 1.minutes + val probe = probeTelemetry(telemetryConfig) + TestControl.executeEmbed(probe.telemetryStream.timeout(timeout).compile.drain.voidError).unsafeRunSync() + val events = probe.telemetryEvents + val expected = (1 to eventCount).map(_ => expectedEvent(telemetryConfig)).toList + events must beEqualTo(expected) + } + + "not send any events if telemetry is disabled" in { + val probe = probeTelemetry(telemetryConfig.copy(disable = true)) + TestControl + .executeEmbed( + probe.telemetryStream.timeout(interval * 10).compile.drain.voidError + ) + .unsafeRunSync() + probe.telemetryEvents must beEmpty + } + } +} diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala index 32f2f4a82..f8394b4a5 100644 --- a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala @@ -33,6 +33,10 @@ object KafkaCollector extends App[KafkaSinkConfig](BuildInfo) { ) } yield Sinks(good, bad) - override def telemetryInfo(config: Config[KafkaSinkConfig]): Telemetry.TelemetryInfo = - Telemetry.TelemetryInfo(None, None) + override def telemetryInfo(config: Config.Streams[KafkaSinkConfig]): IO[Telemetry.TelemetryInfo] = + TelemetryUtils.getAzureSubscriptionId.map { + case None => Telemetry.TelemetryInfo(None, None, None) + case Some(id) => Telemetry.TelemetryInfo(None, Some("Azure"), Some(id)) + } + } diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala new file mode 100644 index 000000000..c9c82b9b7 --- /dev/null +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala @@ -0,0 +1,34 @@ 
+package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.effect.IO +import org.http4s._ +import org.http4s.blaze.client.BlazeClientBuilder +import org.typelevel.ci._ +import io.circe.parser + +object TelemetryUtils { + + // Metadata service response will be used to get Azure subscription id + // More information about the service can be found here: + // https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service + val azureMetadataServiceUrl = "http://169.254.169.254/metadata/instance?api-version=2021-02-01" + + def getAzureSubscriptionId: IO[Option[String]] = { + val response = for { + client <- BlazeClientBuilder[IO].resource + request = Request[IO]( + method = Method.GET, + uri = Uri.unsafeFromString(azureMetadataServiceUrl), + headers = Headers(Header.Raw(ci"Metadata", "true")) + ) + response <- client.run(request) + } yield response + response.use(_.bodyText.compile.string.map(extractId)).handleError(_ => None) + } + + private def extractId(metadata: String): Option[String] = + for { + json <- parser.parse(metadata).toOption + id <- json.hcursor.downField("compute").downField("subscriptionId").as[String].toOption + } yield id +} diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala index 926f336c9..baf1898c0 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala @@ -11,9 +11,11 @@ package com.snowplowanalytics.snowplow.collectors.scalastream import cats.effect.{IO, Resource} + import com.snowplowanalytics.snowplow.collector.core.model.Sinks import com.snowplowanalytics.snowplow.collector.core.{App, Config, Telemetry} import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.{KinesisSink, KinesisSinkConfig} + import org.slf4j.LoggerFactory import java.util.concurrent.ScheduledThreadPoolExecutor @@ -44,11 +46,16 @@ object KinesisCollector extends App[KinesisSinkConfig](BuildInfo) { } yield Sinks(good, bad) } - override def telemetryInfo(config: Config[KinesisSinkConfig]): Telemetry.TelemetryInfo = - Telemetry.TelemetryInfo( - region = Some(config.streams.sink.region), - cloud = Some("AWS") - ) + override def telemetryInfo(config: Config.Streams[KinesisSinkConfig]): IO[Telemetry.TelemetryInfo] = + TelemetryUtils + .getAccountId(config) + .map(id => + Telemetry.TelemetryInfo( + region = Some(config.sink.region), + cloud = Some("AWS"), + unhashedInstallationId = id + ) + ) def buildExecutorService(kc: KinesisSinkConfig): ScheduledThreadPoolExecutor = { log.info("Creating thread pool of size " + kc.threadPoolSize) diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala new file mode 100644 index 000000000..f303d8cb0 --- /dev/null +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala @@ -0,0 +1,26 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.effect.{IO, Resource} + +import com.snowplowanalytics.snowplow.collector.core.Config +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.{KinesisSink, KinesisSinkConfig} + +object TelemetryUtils { + + def getAccountId(config: 
Config.Streams[KinesisSinkConfig]): IO[Option[String]] =
+    Resource
+      .make(
+        IO(KinesisSink.createKinesisClient(config.sink.endpoint, config.sink.region)).rethrow
+      )(c => IO(c.shutdown()))
+      .use { kinesis =>
+        IO {
+          val streamArn = KinesisSink.describeStream(kinesis, config.good).getStreamARN
+          Some(extractAccountId(streamArn))
+        }
+      }
+      .handleError(_ => None)
+
+  def extractAccountId(kinesisStreamArn: String): String =
+    kinesisStreamArn.split(":")(4)
+
+}
diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala
index eb4841bd6..f5c0fa188 100644
--- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala
+++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala
@@ -14,7 +14,6 @@ package sinks
 import cats.effect.{Resource, Sync}
 import cats.implicits.catsSyntaxMonadErrorRethrow
 import cats.syntax.either._
-import com.amazonaws.auth._
 import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration
 import com.amazonaws.services.kinesis.model._
 import com.amazonaws.services.kinesis.{AmazonKinesis, AmazonKinesisClientBuilder}
@@ -374,10 +373,8 @@ class KinesisSink[F[_]: Sync] private (
     log.info(s"Starting background check for Kinesis stream $streamName")
     while (!kinesisHealthy) {
       Try {
-        val describeRequest = new DescribeStreamSummaryRequest()
-        describeRequest.setStreamName(streamName)
-        val describeResult = client.describeStreamSummary(describeRequest)
-        describeResult.getStreamDescriptionSummary().getStreamStatus()
+        val streamDescription = describeStream(client, streamName)
+        streamDescription.getStreamStatus()
       } match {
         case Success("ACTIVE") =>
           log.info(s"Stream $streamName ACTIVE")
@@ -462,6 +459,31 @@ object KinesisSink {
     Resource.make(acquire)(release)
   }

+  /**
+    * Creates a new Kinesis client.
+    * @param endpoint kinesis endpoint where the stream resides
+    * @param region aws region where the stream resides
+    * @return the initialized AmazonKinesisClient
+    */
+  def createKinesisClient(
+    endpoint: String,
+    region: String
+  ): Either[Throwable, AmazonKinesis] =
+    Either.catchNonFatal(
+      AmazonKinesisClientBuilder
+        .standard()
+        .withEndpointConfiguration(new EndpointConfiguration(endpoint, region))
+        .build()
+    )
+
+  def describeStream(client: AmazonKinesis, streamName: String) = {
+    val describeRequest = new DescribeStreamSummaryRequest()
+    describeRequest.setStreamName(streamName)
+    val describeResult = client.describeStreamSummary(describeRequest)
+    describeResult.getStreamDescriptionSummary()
+  }
+
+  /**
    * Create a KinesisSink and schedule a task to flush its EventStorage.
* Exists so that no threads can get a reference to the KinesisSink @@ -476,9 +498,8 @@ object KinesisSink { executorService: ScheduledExecutorService ): Either[Throwable, KinesisSink[F]] = { val clients = for { - provider <- getProvider(kinesisConfig.aws) - kinesisClient <- createKinesisClient(provider, kinesisConfig.endpoint, kinesisConfig.region) - sqsClientAndName <- sqsBuffer(sqsBufferName, provider, kinesisConfig.region) + kinesisClient <- createKinesisClient(kinesisConfig.endpoint, kinesisConfig.region) + sqsClientAndName <- sqsBuffer(sqsBufferName, kinesisConfig.region) } yield (kinesisClient, sqsClientAndName) clients.map { @@ -502,66 +523,19 @@ object KinesisSink { } } - /** Create an aws credentials provider through env variables and iam. */ - private def getProvider(awsConfig: KinesisSinkConfig.AWSConfig): Either[Throwable, AWSCredentialsProvider] = { - def isDefault(key: String): Boolean = key == "default" - def isIam(key: String): Boolean = key == "iam" - def isEnv(key: String): Boolean = key == "env" - - ((awsConfig.accessKey, awsConfig.secretKey) match { - case (a, s) if isDefault(a) && isDefault(s) => - new DefaultAWSCredentialsProviderChain().asRight - case (a, s) if isDefault(a) || isDefault(s) => - "accessKey and secretKey must both be set to 'default' or neither".asLeft - case (a, s) if isIam(a) && isIam(s) => - InstanceProfileCredentialsProvider.getInstance().asRight - case (a, s) if isIam(a) && isIam(s) => - "accessKey and secretKey must both be set to 'iam' or neither".asLeft - case (a, s) if isEnv(a) && isEnv(s) => - new EnvironmentVariableCredentialsProvider().asRight - case (a, s) if isEnv(a) || isEnv(s) => - "accessKey and secretKey must both be set to 'env' or neither".asLeft - case _ => - new AWSStaticCredentialsProvider( - new BasicAWSCredentials(awsConfig.accessKey, awsConfig.secretKey) - ).asRight - }).leftMap(new IllegalArgumentException(_)) - } - - /** - * Creates a new Kinesis client. 
- * @param provider aws credentials provider - * @param endpoint kinesis endpoint where the stream resides - * @param region aws region where the stream resides - * @return the initialized AmazonKinesisClient - */ - private def createKinesisClient( - provider: AWSCredentialsProvider, - endpoint: String, - region: String - ): Either[Throwable, AmazonKinesis] = - Either.catchNonFatal( - AmazonKinesisClientBuilder - .standard() - .withCredentials(provider) - .withEndpointConfiguration(new EndpointConfiguration(endpoint, region)) - .build() - ) - private def sqsBuffer( sqsBufferName: Option[String], - provider: AWSCredentialsProvider, region: String ): Either[Throwable, Option[SqsClientAndName]] = sqsBufferName match { case Some(name) => - createSqsClient(provider, region).map(amazonSqs => Some(SqsClientAndName(amazonSqs, name))) + createSqsClient(region).map(amazonSqs => Some(SqsClientAndName(amazonSqs, name))) case None => None.asRight } - private def createSqsClient(provider: AWSCredentialsProvider, region: String): Either[Throwable, AmazonSQS] = + private def createSqsClient(region: String): Either[Throwable, AmazonSQS] = Either.catchNonFatal( - AmazonSQSClientBuilder.standard().withRegion(region).withCredentials(provider).build + AmazonSQSClientBuilder.standard().withRegion(region).build ) /** diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala index 9942b0768..8826c6b4b 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala @@ -11,7 +11,6 @@ final case class KinesisSinkConfig( maxBytes: Int, region: String, threadPoolSize: Int, - aws: KinesisSinkConfig.AWSConfig, backoffPolicy: KinesisSinkConfig.BackoffPolicy, customEndpoint: Option[String], sqsGoodBuffer: Option[String], diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtilsSpec.scala b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtilsSpec.scala new file mode 100644 index 000000000..3cc62ec3e --- /dev/null +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtilsSpec.scala @@ -0,0 +1,13 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import org.specs2.mutable.Specification + +class TelemetryUtilsSpec extends Specification { + + "extractAccountId" should { + "be able to extract account id from kinesis stream arn successfully" in { + val streamArn = "arn:aws:kinesis:region:123456789:stream/name" + TelemetryUtils.extractAccountId(streamArn) must beEqualTo("123456789") + } + } +} diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala index d2b9090c7..ea8a87078 100644 --- a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala @@ -126,10 +126,6 @@ object KinesisConfigSpec { maxBytes = 1000000, region = "eu-central-1", threadPoolSize = 10, - aws = KinesisSinkConfig.AWSConfig( - accessKey = "iam", - secretKey = 
"iam" - ), backoffPolicy = KinesisSinkConfig.BackoffPolicy( minBackoff = 500, maxBackoff = 1500, diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala index 9877d3b9b..2d777fddf 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala @@ -30,6 +30,6 @@ object NsqCollector extends App[NsqSinkConfig](BuildInfo) { ) } yield Sinks(good, bad) - override def telemetryInfo(config: Config[NsqSinkConfig]): Telemetry.TelemetryInfo = - Telemetry.TelemetryInfo(None, None) + override def telemetryInfo(config: Config.Streams[NsqSinkConfig]): IO[Telemetry.TelemetryInfo] = + IO(Telemetry.TelemetryInfo(None, None, None)) } diff --git a/project/Dependencies.scala b/project/Dependencies.scala index 12e861226..4052f5f18 100644 --- a/project/Dependencies.scala +++ b/project/Dependencies.scala @@ -58,6 +58,7 @@ object Dependencies { val specs2 = "4.11.0" val specs2CE = "1.5.0" val testcontainers = "0.40.10" + val ceTestkit = "3.4.5" object Legacy { val specs2CE = "0.4.1" @@ -116,8 +117,9 @@ object Dependencies { // Scala (test only) // Test common - val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % Test - val specs2CE = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE % Test + val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % Test + val specs2CE = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE % Test + val ceTestkit = "org.typelevel" %% "cats-effect-testkit" % V.ceTestkit % Test // Test Akka val akkaTestkit = "com.typesafe.akka" %% "akka-testkit" % V.akka % Test diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala index cc71cf6ee..026caf030 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala @@ -14,9 +14,12 @@ object PubSubCollector extends App[PubSubSinkConfig](BuildInfo) { bad <- PubSubSink.create[IO](config.sink.maxBytes, config.sink, config.buffer, config.bad) } yield Sinks(good, bad) - override def telemetryInfo(config: Config[PubSubSinkConfig]): Telemetry.TelemetryInfo = - Telemetry.TelemetryInfo( - region = None, - cloud = Some("GCP") + override def telemetryInfo(config: Config.Streams[PubSubSinkConfig]): IO[Telemetry.TelemetryInfo] = + IO( + Telemetry.TelemetryInfo( + region = None, + cloud = Some("GCP"), + unhashedInstallationId = Some(config.sink.googleProjectId) + ) ) } diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala index 86ef6c113..d23630a87 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala @@ -38,9 +38,14 @@ object SqsCollector extends App[SqsSinkConfig](BuildInfo) { } yield Sinks(good, bad) } - override def telemetryInfo(config: Config[SqsSinkConfig]): Telemetry.TelemetryInfo = - Telemetry.TelemetryInfo( - region = Some(config.streams.sink.region), - cloud = 
Some("AWS") - ) + override def telemetryInfo(config: Config.Streams[SqsSinkConfig]): IO[Telemetry.TelemetryInfo] = + TelemetryUtils + .getAccountId(config) + .map(id => + Telemetry.TelemetryInfo( + region = Some(config.sink.region), + cloud = Some("AWS"), + unhashedInstallationId = id + ) + ) } diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala new file mode 100644 index 000000000..7aa013c77 --- /dev/null +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala @@ -0,0 +1,25 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import cats.effect.{IO, Resource} +import com.snowplowanalytics.snowplow.collector.core.Config +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks._ + +object TelemetryUtils { + + def getAccountId(config: Config.Streams[SqsSinkConfig]): IO[Option[String]] = + Resource + .make( + IO(SqsSink.createSqsClient(config.sink.region)).rethrow + )(c => IO(c.shutdown())) + .use { client => + IO { + val sqsQueueUrl = client.getQueueUrl(config.good).getQueueUrl + Some(extractAccountId(sqsQueueUrl)) + } + } + .handleError(_ => None) + + def extractAccountId(sqsQueueUrl: String): String = + sqsQueueUrl.split("/")(3) + +} diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala index aa9edf1dc..94e8de375 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala @@ -296,6 +296,11 @@ object SqsSink { Resource.make(acquire)(release) } + def createSqsClient(region: String): Either[Throwable, AmazonSQS] = + Either.catchNonFatal( + AmazonSQSClientBuilder.standard().withRegion(region).build + ) + /** * Create an SqsSink and schedule a task to flush its EventStorage. 
* Exists so that no threads can get a reference to the SqsSink @@ -307,16 +312,11 @@ object SqsSink { bufferConfig: Config.Buffer, queueName: String, executorService: ScheduledExecutorService - ): Either[Throwable, SqsSink[F]] = { - val client = Either.catchNonFatal( - AmazonSQSClientBuilder.standard().withRegion(sqsConfig.region).build - ) - - client.map { c => + ): Either[Throwable, SqsSink[F]] = + createSqsClient(sqsConfig.region).map { c => val sqsSink = new SqsSink(maxBytes, c, sqsConfig, bufferConfig, queueName, executorService) sqsSink.EventStorage.scheduleFlush() sqsSink.checkSqsHealth() sqsSink } - } } diff --git a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtilsSpec.scala b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtilsSpec.scala new file mode 100644 index 000000000..7c8183f63 --- /dev/null +++ b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtilsSpec.scala @@ -0,0 +1,13 @@ +package com.snowplowanalytics.snowplow.collectors.scalastream + +import org.specs2.mutable.Specification + +class TelemetryUtilsSpec extends Specification { + + "extractAccountId" should { + "be able to extract account id from sqs queue url successfully" in { + val queueUrl = "https://sqs.region.amazonaws.com/123456789/queue" + TelemetryUtils.extractAccountId(queueUrl) must beEqualTo("123456789") + } + } +} diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala index ac8070eb4..c307c5bc3 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala @@ -13,6 +13,6 @@ object StdoutCollector extends App[SinkConfig](BuildInfo) { Resource.pure(Sinks(good, bad)) } - override def telemetryInfo(config: Config[SinkConfig]): Telemetry.TelemetryInfo = - Telemetry.TelemetryInfo(None, None) + override def telemetryInfo(config: Config.Streams[SinkConfig]): IO[Telemetry.TelemetryInfo] = + IO(Telemetry.TelemetryInfo(None, None, None)) } From 85e8128debf62c326a24455681286b44fda60f4d Mon Sep 17 00:00:00 2001 From: Benjamin Benoist Date: Tue, 3 Oct 2023 17:37:14 +0200 Subject: [PATCH 24/39] Add support for handling /robots.txt (close #385) --- .../Routes.scala | 2 + .../scalastream/it/core/RobotsSpec.scala | 61 +++++++++++++++++++ 2 files changed, 63 insertions(+) create mode 100644 kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/RobotsSpec.scala diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala index 5f3ef43cd..d8e053c9d 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala @@ -21,6 +21,8 @@ class Routes[F[_]: Sync](enableDefaultRedirect: Boolean, service: IService[F]) e ifTrue = Ok("ok"), ifFalse = ServiceUnavailable("Service Unavailable") ) + case GET -> Root / "robots.txt" => + Ok("User-agent: *\nDisallow: /\n\nUser-agent: Googlebot\nDisallow: /\n\nUser-agent: AdsBot-Google\nDisallow: /") } private val corsRoute = HttpRoutes.of[F] { diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/RobotsSpec.scala 
b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/RobotsSpec.scala new file mode 100644 index 000000000..6d9368c1b --- /dev/null +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/RobotsSpec.scala @@ -0,0 +1,61 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collectors.scalastream.it.core + +import scala.concurrent.duration._ + +import org.specs2.mutable.Specification + +import cats.effect.IO + +import org.http4s.{Method, Request, Uri} + +import cats.effect.testing.specs2.CatsEffect + +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.Kinesis +import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containers._ +import com.snowplowanalytics.snowplow.collectors.scalastream.it.Http + +class RobotsSpec extends Specification with Localstack with CatsEffect { + + override protected val Timeout = 5.minutes + + "collector" should { + "respond to /robots.txt with 200 and not emit any event" in { + val testName = "robots" + val streamGood = s"$testName-raw" + val streamBad = s"$testName-bad-1" + + Collector.container( + "kinesis/src/it/resources/collector.hocon", + testName, + streamGood, + streamBad + ).use { collector => + val uri = Uri.unsafeFromString(s"http://${collector.host}:${collector.port}/robots.txt") + val request = Request[IO](Method.GET, uri) + + for { + response <- Http.response(request) + bodyBytes <- response.body.compile.toList + body = new String(bodyBytes.toArray) + _ <- IO.sleep(10.second) + collectorOutput <- Kinesis.readOutput(streamGood, streamBad) + } yield { + response.status.code must beEqualTo(200) + body must beEqualTo("User-agent: *\nDisallow: /\n\nUser-agent: Googlebot\nDisallow: /\n\nUser-agent: AdsBot-Google\nDisallow: /") + collectorOutput.good must beEmpty + collectorOutput.bad must beEmpty + } + } + } + } +} From 206014b2fcbfe9b38949c5aa414fd556ca5062de Mon Sep 17 00:00:00 2001 From: Piotr Limanowski Date: Mon, 18 Sep 2023 19:44:46 +0200 Subject: [PATCH 25/39] Make maxConnections and idleTimeout configurable (close #386) Previously, idleTimeout has been hardcoded, maxConnections hasn't been configured. Now, these parameters are set within `networking` section and used throughout http4s backends. 
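For illustration, a minimal sketch of the new `networking` section, using the defaults that this patch adds to reference.conf (override them in the collector's HOCON configuration as needed):

    networking {
      maxConnections = 1024
      idleTimeout = 610 seconds
    }

As the HttpServer changes below show, the blaze and ember backends honour both settings, while the netty backend currently only applies idleTimeout.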
--- http4s/src/main/resources/reference.conf | 5 ++++ .../Config.scala | 8 ++++++ .../HttpServer.scala | 28 +++++++++++-------- .../Run.scala | 3 +- .../TestUtils.scala | 4 +++ .../KafkaConfigSpec.scala | 4 +++ .../sinks/KinesisConfigSpec.scala | 4 +++ .../NsqConfigSpec.scala | 4 +++ .../ConfigSpec.scala | 4 +++ .../SqsConfigSpec.scala | 4 +++ 10 files changed, 55 insertions(+), 13 deletions(-) diff --git a/http4s/src/main/resources/reference.conf b/http4s/src/main/resources/reference.conf index 9ae8f6849..929d36685 100644 --- a/http4s/src/main/resources/reference.conf +++ b/http4s/src/main/resources/reference.conf @@ -86,6 +86,11 @@ port = 443 } + networking { + maxConnections = 1024 + idleTimeout = 610 seconds + } + enableDefaultRedirect = false preTerminationPeriod = 10 seconds diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala index 62e4c0d07..cf5bacdcb 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala @@ -3,6 +3,7 @@ package com.snowplowanalytics.snowplow.collector.core import scala.concurrent.duration._ import io.circe.config.syntax._ + import io.circe.generic.semiauto._ import io.circe.Decoder import io.circe._ @@ -25,6 +26,7 @@ case class Config[+SinkConfig]( monitoring: Config.Monitoring, telemetry: Config.Telemetry, ssl: Config.SSL, + networking: Config.Networking, enableDefaultRedirect: Boolean, redirectDomains: Set[String], preTerminationPeriod: FiniteDuration @@ -140,6 +142,11 @@ object Config { autoGeneratedId: Option[String] ) + case class Networking( + maxConnections: Int, + idleTimeout: FiniteDuration + ) + implicit def decoder[SinkConfig: Decoder]: Decoder[Config[SinkConfig]] = { implicit val p3p = deriveDecoder[P3P] implicit val crossDomain = deriveDecoder[CrossDomain] @@ -166,6 +173,7 @@ object Config { implicit val monitoring = deriveDecoder[Monitoring] implicit val ssl = deriveDecoder[SSL] implicit val telemetry = deriveDecoder[Telemetry] + implicit val networking = deriveDecoder[Networking] deriveDecoder[Config[SinkConfig]] } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala index 7d0f76a8e..e62b7322f 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala @@ -8,8 +8,6 @@ import io.netty.handler.ssl._ import org.typelevel.log4cats.Logger import org.typelevel.log4cats.slf4j.Slf4jLogger -import scala.concurrent.duration.DurationLong - import com.comcast.ip4s.{IpAddress, Port} import cats.implicits._ @@ -33,12 +31,13 @@ object HttpServer { app: HttpApp[F], interface: String, port: Int, - secure: Boolean + secure: Boolean, + networking: Config.Networking ): Resource[F, Server] = sys.env.get("HTTP4S_BACKEND").map(_.toUpperCase()) match { - case Some("BLAZE") | None => buildBlazeServer[F](app, port, secure) - case Some("EMBER") => buildEmberServer[F](app, interface, port, secure) - case Some("NETTY") => buildNettyServer[F](app, port, secure) + case Some("BLAZE") | None => buildBlazeServer[F](app, port, secure, networking) + case Some("EMBER") => buildEmberServer[F](app, interface, port, secure, networking) + case Some("NETTY") => buildNettyServer[F](app, 
port, secure, networking) case Some(other) => throw new IllegalArgumentException(s"Unrecognized http4s backend $other") } @@ -46,7 +45,8 @@ object HttpServer { app: HttpApp[F], interface: String, port: Int, - secure: Boolean + secure: Boolean, + networking: Config.Networking ) = { implicit val network = Network.forAsync[F] Resource.eval(Logger[F].info("Building ember server")) >> @@ -55,7 +55,8 @@ object HttpServer { .withHost(IpAddress.fromString(interface).get) .withPort(Port.fromInt(port).get) .withHttpApp(app) - .withIdleTimeout(610.seconds) + .withIdleTimeout(networking.idleTimeout) + .withMaxConnections(networking.maxConnections) .cond(secure, _.withTLS(TLSContext.Builder.forAsync.fromSSLContext(SSLContext.getDefault))) .build } @@ -63,26 +64,29 @@ object HttpServer { private def buildBlazeServer[F[_]: Async]( app: HttpApp[F], port: Int, - secure: Boolean + secure: Boolean, + networking: Config.Networking ): Resource[F, Server] = Resource.eval(Logger[F].info("Building blaze server")) >> BlazeServerBuilder[F] .bindSocketAddress(new InetSocketAddress(port)) .withHttpApp(app) - .withIdleTimeout(610.seconds) + .withIdleTimeout(networking.idleTimeout) + .withMaxConnections(networking.maxConnections) .cond(secure, _.withSslContext(SSLContext.getDefault)) .resource private def buildNettyServer[F[_]: Async]( app: HttpApp[F], port: Int, - secure: Boolean + secure: Boolean, + networking: Config.Networking ) = Resource.eval(Logger[F].info("Building netty server")) >> NettyServerBuilder[F] .bindLocal(port) .withHttpApp(app) - .withIdleTimeout(610.seconds) + .withIdleTimeout(networking.idleTimeout) .cond( secure, _.withSslContext( diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 20bed625b..944785107 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -74,7 +74,8 @@ object Run { new Routes[F](config.enableDefaultRedirect, collectorService).value, config.interface, if (config.ssl.enable) config.ssl.port else config.port, - config.ssl.enable + config.ssl.enable, + config.networking ) _ <- withGracefulShutdown(config.preTerminationPeriod)(httpServer) httpClient <- BlazeClientBuilder[F].resource diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala index c465521ce..647871ee4 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala @@ -101,6 +101,10 @@ object TestUtils { false, 443 ), + networking = Networking( + 1024, + 610.seconds + ), enableDefaultRedirect = false, redirectDomains = Set.empty[String], preTerminationPeriod = 10.seconds, diff --git a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala index 6abac5842..e6c503c06 100644 --- a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala +++ b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala @@ -126,6 +126,10 @@ object KafkaConfigSpec { moduleVersion = None, instanceId = None, autoGeneratedId = None + ), + 
networking = Config.Networking( + maxConnections = 1024, + idleTimeout = 610.seconds ) ) } diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala index ea8a87078..477927e8c 100644 --- a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala @@ -113,6 +113,10 @@ object KinesisConfigSpec { enableDefaultRedirect = false, redirectDomains = Set.empty, preTerminationPeriod = 10.seconds, + networking = Config.Networking( + maxConnections = 1024, + idleTimeout = 610.seconds + ), streams = Config.Streams( good = "good", bad = "bad", diff --git a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala index a401b4b87..f5f01e20c 100644 --- a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala +++ b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala @@ -125,6 +125,10 @@ object NsqConfigSpec { moduleVersion = None, instanceId = None, autoGeneratedId = None + ), + networking = Config.Networking( + maxConnections = 1024, + idleTimeout = 610.seconds ) ) } diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala index 77256389b..526a849d7 100644 --- a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala +++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala @@ -97,6 +97,10 @@ object ConfigSpec { enableDefaultRedirect = false, redirectDomains = Set.empty, preTerminationPeriod = 10.seconds, + networking = Config.Networking( + maxConnections = 1024, + idleTimeout = 610.seconds + ), streams = Config.Streams( good = "good", bad = "bad", diff --git a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala index 54832af22..ca04acecd 100644 --- a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala +++ b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala @@ -98,6 +98,10 @@ object SqsConfigSpec { enableDefaultRedirect = false, redirectDomains = Set.empty, preTerminationPeriod = 10.seconds, + networking = Config.Networking( + maxConnections = 1024, + idleTimeout = 610.seconds + ), streams = Config.Streams( good = "good", bad = "bad", From f3e4caad8699fcf062ebce25543d36de6c0c5b53 Mon Sep 17 00:00:00 2001 From: colmsnowplow Date: Wed, 11 Oct 2023 12:31:20 +0200 Subject: [PATCH 26/39] Add Kafka sink healthcheck (close #387) --- .../sinks/KafkaSink.scala | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala index ac502f5c5..5c199ca60 100644 --- 
a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala @@ -30,9 +30,9 @@ class KafkaSink[F[_]: Sync]( topicName: String ) extends Sink[F] { - private lazy val log = LoggerFactory.getLogger(getClass()) - - override def isHealthy: F[Boolean] = Sync[F].pure(true) + private lazy val log = LoggerFactory.getLogger(getClass()) + @volatile private var kafkaHealthy: Boolean = false + override def isHealthy: F[Boolean] = Sync[F].pure(kafkaHealthy) /** * Store raw events to the topic @@ -47,7 +47,12 @@ class KafkaSink[F[_]: Sync]( new ProducerRecord(topicName, key, event), new Callback { override def onCompletion(metadata: RecordMetadata, e: Exception): Unit = - if (e != null) log.error(s"Sending event failed: ${e.getMessage}") + if (e != null) { + kafkaHealthy = false + log.error(s"Sending event failed: ${e.getMessage}") + } else { + kafkaHealthy = true + } } ) } From 1cc57db593bda29be9c5a8fe18bd0a7fd240e0e6 Mon Sep 17 00:00:00 2001 From: Alex Benini Date: Wed, 3 Jan 2024 18:47:46 +0100 Subject: [PATCH 27/39] Add separate good/bad sink configurations (close #388) --- examples/config.kafka.extended.hocon | 82 ++++++++---- examples/config.kafka.minimal.hocon | 14 +- examples/config.kinesis.extended.hocon | 122 ++++++++++++++---- examples/config.kinesis.minimal.hocon | 10 +- examples/config.nsq.extended.hocon | 36 +++--- examples/config.nsq.minimal.hocon | 11 +- examples/config.pubsub.extended.hocon | 89 +++++++++---- examples/config.pubsub.minimal.hocon | 10 +- examples/config.sqs.extended.hocon | 88 +++++++++---- examples/config.sqs.minimal.hocon | 10 +- http4s/src/main/resources/reference.conf | 6 - .../App.scala | 2 +- .../Config.scala | 58 +++++++-- .../resources/test-config-new-style.hocon | 36 ++++++ ...nfig.hocon => test-config-old-style.hocon} | 6 + .../ConfigParserSpec.scala | 68 ++++++---- .../TestUtils.scala | 28 ++-- kafka/src/it/resources/collector.hocon | 11 +- kafka/src/main/resources/application.conf | 9 +- .../KafkaCollector.scala | 14 +- .../sinks/KafkaSink.scala | 9 +- .../sinks/KafkaSinkConfig.scala | 4 +- .../KafkaConfigSpec.scala | 41 ++++-- .../collector-cookie-anonymous.hocon | 17 ++- .../collector-cookie-attributes-1.hocon | 17 ++- .../collector-cookie-attributes-2.hocon | 17 ++- .../resources/collector-cookie-domain.hocon | 17 ++- .../resources/collector-cookie-fallback.hocon | 17 ++- .../collector-cookie-no-domain.hocon | 17 ++- .../it/resources/collector-custom-paths.hocon | 17 ++- .../collector-doNotTrackCookie-disabled.hocon | 17 ++- .../collector-doNotTrackCookie-enabled.hocon | 17 ++- kinesis/src/it/resources/collector.hocon | 17 ++- kinesis/src/main/resources/application.conf | 8 +- .../KinesisCollector.scala | 22 +--- .../TelemetryUtils.scala | 4 +- .../sinks/KinesisSink.scala | 24 ++-- .../sinks/KinesisSinkConfig.scala | 3 +- .../sinks/KinesisConfigSpec.scala | 62 ++++++--- nsq/src/main/resources/application.conf | 6 +- .../NsqCollector.scala | 12 +- .../sinks/NsqSink.scala | 11 +- .../sinks/NsqSinkConfig.scala | 4 +- .../NsqConfigSpec.scala | 37 ++++-- pubsub/src/it/resources/collector.hocon | 15 ++- pubsub/src/main/resources/application.conf | 9 +- .../PubSubCollector.scala | 6 +- .../sinks/PubSubSink.scala | 15 +-- .../sinks/PubSubSinkConfig.scala | 3 +- .../ConfigSpec.scala | 62 ++++++--- sqs/src/main/resources/application.conf | 3 + .../SqsCollector.scala | 20 +-- .../TelemetryUtils.scala | 4 +- 
.../sinks/SqsSink.scala | 17 +-- .../sinks/SqsSinkConfig.scala | 6 +- .../SqsConfigSpec.scala | 47 +++++-- stdout/src/main/resources/application.conf | 8 ++ .../SinkConfig.scala | 4 +- .../StdoutCollector.scala | 4 +- 59 files changed, 903 insertions(+), 447 deletions(-) create mode 100644 http4s/src/test/resources/test-config-new-style.hocon rename http4s/src/test/resources/{test-config.hocon => test-config-old-style.hocon} (66%) diff --git a/examples/config.kafka.extended.hocon b/examples/config.kafka.extended.hocon index 072fb28fa..426723138 100644 --- a/examples/config.kafka.extended.hocon +++ b/examples/config.kafka.extended.hocon @@ -160,28 +160,17 @@ collector { } streams { - # Events which have successfully been collected will be stored in the good stream/topic - good = "good" - - # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. - # The collector can currently produce two flavours of bad row: - # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; - # - a generic_error if a request's querystring cannot be parsed because of illegal characters - bad = "bad" # Whether to use the incoming event's ip as the partition key for the good stream/topic # Note: Nsq does not make use of partition key. useIpAddressAsPartitionKey = false - # Enable the chosen sink by uncommenting the appropriate configuration - sink { - # Choose between kinesis, sqs, google-pub-sub, kafka, nsq, or stdout. - # To use stdout, comment or remove everything in the "collector.streams.sink" section except - # "enabled" which should be set to "stdout". - enabled = kafka - - # Or Kafka + # Events which have successfully been collected will be stored in the good stream/topic + good { + + name = "good" brokers = "localhost:9092,another.host:9092" + ## Number of retries to perform before giving up on sending a record retries = 10 # The kafka producer has a variety of possible configuration options defined at @@ -190,6 +179,7 @@ collector { # "bootstrap.servers" = brokers # "buffer.memory" = buffer.byteLimit # "linger.ms" = buffer.timeLimit + #producerConf { # acks = all # "key.serializer" = "org.apache.kafka.common.serialization.StringSerializer" @@ -200,18 +190,58 @@ collector { # If a record is bigger, a size violation bad row is emitted instead # Default: 1 MB maxBytes = 1000000 + + # Incoming events are stored in a buffer before being sent to Kafka. + # The buffer is emptied whenever: + # - the number of stored records reaches record-limit or + # - the combined size of the stored records reaches byte-limit or + # - the time in milliseconds since the buffer was last emptied reaches time-limit + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } } - # Incoming events are stored in a buffer before being sent to Kinesis/Kafka. - # Note: Buffering is not supported by NSQ. - # The buffer is emptied whenever: - # - the number of stored records reaches record-limit or - # - the combined size of the stored records reaches byte-limit or - # - the time in milliseconds since the buffer was last emptied reaches time-limit - buffer { - byteLimit = 3145728 - recordLimit = 500 - timeLimit = 5000 + # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. 
+ # The collector can currently produce two flavours of bad row: + # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; + # - a generic_error if a request's querystring cannot be parsed because of illegal characters + bad { + + name = "bad" + brokers = "localhost:9092,another.host:9092" + + ## Number of retries to perform before giving up on sending a record + retries = 10 + # The kafka producer has a variety of possible configuration options defined at + # https://kafka.apache.org/documentation/#producerconfigs + # Some values are set to other values from this config by default: + # "bootstrap.servers" = brokers + # "buffer.memory" = buffer.byteLimit + # "linger.ms" = buffer.timeLimit + + #producerConf { + # acks = all + # "key.serializer" = "org.apache.kafka.common.serialization.StringSerializer" + # "value.serializer" = "org.apache.kafka.common.serialization.StringSerializer" + #} + + # Optional. Maximum number of bytes that a single record can contain. + # If a record is bigger, a size violation bad row is emitted instead + # Default: 1 MB + maxBytes = 1000000 + + # Incoming events are stored in a buffer before being sent to Kafka. + # The buffer is emptied whenever: + # - the number of stored records reaches record-limit or + # - the combined size of the stored records reaches byte-limit or + # - the time in milliseconds since the buffer was last emptied reaches time-limit + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } } } diff --git a/examples/config.kafka.minimal.hocon b/examples/config.kafka.minimal.hocon index 29ca6ff67..1547b5c1e 100644 --- a/examples/config.kafka.minimal.hocon +++ b/examples/config.kafka.minimal.hocon @@ -3,11 +3,13 @@ collector { port = 8080 streams { - good = "good" - bad = "bad" - - sink { - brokers = "localhost:9092,another.host:9092" + good { + name = "good" + brokers = "localhost:9092,another.host:9092" + } + bad { + name = "bad" + brokers = "localhost:9092,another.host:9092" } } -} +} \ No newline at end of file diff --git a/examples/config.kinesis.extended.hocon b/examples/config.kinesis.extended.hocon index a3ef7d3c7..f21906ec8 100644 --- a/examples/config.kinesis.extended.hocon +++ b/examples/config.kinesis.extended.hocon @@ -160,26 +160,19 @@ collector { } streams { - # Events which have successfully been collected will be stored in the good stream/topic - good = "good" - # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. - # The collector can currently produce two flavours of bad row: - # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; - # - a generic_error if a request's querystring cannot be parsed because of illegal characters + bad = "bad" # Whether to use the incoming event's ip as the partition key for the good stream/topic # Note: Nsq does not make use of partition key. useIpAddressAsPartitionKey = false - # Enable the chosen sink by uncommenting the appropriate configuration - sink { - # Choose between kinesis, sqs, google-pub-sub, kafka, nsq, or stdout. - # To use stdout, comment or remove everything in the "collector.streams.sink" section except - # "enabled" which should be set to "stdout". 
- enabled = kinesis + good { + # Events which have successfully been collected will be stored in the good stream/topic + name = "good" + # Region where the streams are located region = "eu-central-1" @@ -190,15 +183,13 @@ collector { # Thread pool size for Kinesis and SQS API requests threadPoolSize = 10 - # Optional SQS buffer for good and bad events (respectively). + # Optional SQS buffer for good events. # When messages can't be sent to Kinesis, they will be sent to SQS. # If not configured, sending to Kinesis will be retried. # This should only be set up for the Kinesis sink, where it acts as a failsafe. # For the SQS sink, the good and bad queue should be specified under streams.good and streams.bad, respectively and these settings should be ignored. #sqsGoodBuffer = {{sqsGoodBuffer}} - #sqsBadBuffer = {{sqsBadBuffer}} - # Optional. Maximum number of bytes that a single record can contain. # If a record is bigger, a size violation bad row is emitted instead # Default: 192 kb @@ -239,19 +230,96 @@ collector { # This is the interval for the calls. # /sink-health is made healthy as soon as requests are successful or records are successfully inserted. startupCheckInterval = 1 second + + # Incoming events are stored in a buffer before being sent to Kinesis. + # The buffer is emptied whenever: + # - the number of stored records reaches record-limit or + # - the combined size of the stored records reaches byte-limit or + # - the time in milliseconds since the buffer was last emptied reaches time-limit + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } } - - # Incoming events are stored in a buffer before being sent to Kinesis/Kafka. - # Note: Buffering is not supported by NSQ. - # The buffer is emptied whenever: - # - the number of stored records reaches record-limit or - # - the combined size of the stored records reaches byte-limit or - # - the time in milliseconds since the buffer was last emptied reaches time-limit - buffer { - byteLimit = 3145728 - recordLimit = 500 - timeLimit = 5000 - } + + # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. + # The collector can currently produce two flavours of bad row: + # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; + # - a generic_error if a request's querystring cannot be parsed because of illegal characters + bad { + + name = "bad" + + # Region where the streams are located + region = "eu-central-1" + + ## Optional endpoint url configuration to override aws kinesis endpoints, + ## this can be used to specify local endpoints when using localstack + # customEndpoint = {{kinesisEndpoint}} + + # Thread pool size for Kinesis and SQS API requests + threadPoolSize = 10 + + # Optional SQS buffer for bad events. + # When messages can't be sent to Kinesis, they will be sent to SQS. + # If not configured, sending to Kinesis will be retried. + # This should only be set up for the Kinesis sink, where it acts as a failsafe. + # For the SQS sink, the good and bad queue should be specified under streams.good and streams.bad, respectively and these settings should be ignored. + #sqsBadBuffer = {{sqsBadBuffer}} + + # Optional. Maximum number of bytes that a single record can contain. 
+ # If a record is bigger, a size violation bad row is emitted instead + # Default: 192 kb + # SQS has a record size limit of 256 kb, but records are encoded with Base64, + # which adds approximately 33% of the size, so we set the limit to 256 kb * 3/4 + sqsMaxBytes = 192000 + + # The following are used to authenticate for the Amazon Kinesis and SQS sinks. + # If both are set to 'default', the default provider chain is used + # (see http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html) + # If both are set to 'iam', use AWS IAM Roles to provision credentials. + # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY + aws { + accessKey = iam + secretKey = iam + } + + # Optional + backoffPolicy { + # Minimum backoff period in milliseconds + minBackoff = 500 + # Maximum backoff period in milliseconds + maxBackoff = 1500 + # Failed inserts are retried forever. + # In case of just Kinesis without SQS, number of retries before setting /sink-health unhealthy. + # In case of Kinesis + SQS, number of retries with one before retrying with the other. + maxRetries = 3 + } + + # Optional. Maximum number of bytes that a single record can contain. + # If a record is bigger, a size violation bad row is emitted instead + # Default: 1 MB + # If SQS buffer is activated, sqsMaxBytes is used instead + maxBytes = 1000000 + + # When collector starts, it checks if Kinesis streams exist with describeStreamSummary + # and if SQS buffers exist with getQueueUrl (if configured). + # This is the interval for the calls. + # /sink-health is made healthy as soon as requests are successful or records are successfully inserted. + startupCheckInterval = 1 second + + # Incoming events are stored in a buffer before being sent to Kinesis. + # The buffer is emptied whenever: + # - the number of stored records reaches record-limit or + # - the combined size of the stored records reaches byte-limit or + # - the time in milliseconds since the buffer was last emptied reaches time-limit + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } + } } # Telemetry sends heartbeat events to external pipeline. diff --git a/examples/config.kinesis.minimal.hocon b/examples/config.kinesis.minimal.hocon index 2e0cb2314..9501390a5 100644 --- a/examples/config.kinesis.minimal.hocon +++ b/examples/config.kinesis.minimal.hocon @@ -3,10 +3,12 @@ collector { port = 8080 streams { - good = "good" - bad = "bad" - - sink { + good { + name = "good" + region = eu-central-1 + } + bad { + name = "bad" region = eu-central-1 } } diff --git a/examples/config.nsq.extended.hocon b/examples/config.nsq.extended.hocon index e4309d916..26b8c672f 100644 --- a/examples/config.nsq.extended.hocon +++ b/examples/config.nsq.extended.hocon @@ -160,27 +160,14 @@ collector { } streams { - # Events which have successfully been collected will be stored in the good stream/topic - good = "good" - - # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. - # The collector can currently produce two flavours of bad row: - # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; - # - a generic_error if a request's querystring cannot be parsed because of illegal characters - bad = "bad" # Whether to use the incoming event's ip as the partition key for the good stream/topic # Note: Nsq does not make use of partition key. 
useIpAddressAsPartitionKey = false - # Enable the chosen sink by uncommenting the appropriate configuration - sink { - # Choose between kinesis, sqs, google-pub-sub, kafka, nsq, or stdout. - # To use stdout, comment or remove everything in the "collector.streams.sink" section except - # "enabled" which should be set to "stdout". - enabled = nsq - - # Or NSQ + # Events which have successfully been collected will be stored in the good stream/topic + good { + name = "good" ## Host name for nsqd host = "nsqHost" ## TCP port for nsqd, 4150 by default @@ -191,6 +178,23 @@ collector { # Default: 1 MB maxBytes = 1000000 } + + # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. + # The collector can currently produce two flavours of bad row: + # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; + # - a generic_error if a request's querystring cannot be parsed because of illegal characters + bad { + name = "bad" + ## Host name for nsqd + host = "nsqHost" + ## TCP port for nsqd, 4150 by default + port = 4150 + + # Optional. Maximum number of bytes that a single record can contain. + # If a record is bigger, a size violation bad row is emitted instead + # Default: 1 MB + maxBytes = 1000000 + } } # Telemetry sends heartbeat events to external pipeline. diff --git a/examples/config.nsq.minimal.hocon b/examples/config.nsq.minimal.hocon index 97682cb1d..2b7afa7ca 100644 --- a/examples/config.nsq.minimal.hocon +++ b/examples/config.nsq.minimal.hocon @@ -3,10 +3,13 @@ collector { port = 8080 streams { - good = "good" - bad = "bad" - - sink { + good { + name = "good" + host = "nsqHost" + } + + bad { + name = "bad" host = "nsqHost" } } diff --git a/examples/config.pubsub.extended.hocon b/examples/config.pubsub.extended.hocon index 548588a0a..3f907a917 100644 --- a/examples/config.pubsub.extended.hocon +++ b/examples/config.pubsub.extended.hocon @@ -160,27 +160,16 @@ collector { } streams { - # Events which have successfully been collected will be stored in the good stream/topic - good = "good" - - # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. - # The collector can currently produce two flavours of bad row: - # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; - # - a generic_error if a request's querystring cannot be parsed because of illegal characters - bad = "bad" # Whether to use the incoming event's ip as the partition key for the good stream/topic # Note: Nsq does not make use of partition key. useIpAddressAsPartitionKey = false - # Enable the chosen sink by uncommenting the appropriate configuration - sink { - # Choose between kinesis, sqs, google-pub-sub, kafka, nsq, or stdout. - # To use stdout, comment or remove everything in the "collector.streams.sink" section except - # "enabled" which should be set to "stdout". - enabled = google-pub-sub - + # Events which have successfully been collected will be stored in the good stream/topic + good { + name = "good" + googleProjectId = "google-project-id" ## Minimum, maximum and total backoff periods, in milliseconds ## and multiplier between two backoff @@ -208,20 +197,68 @@ collector { # In case of failure of these retries, the events are added to a buffer # and every retryInterval collector retries to send them. 
retryInterval = 10 seconds + + + # Incoming events are stored in a buffer before being sent to Pubsub. + # The buffer is emptied whenever: + # - the number of stored records reaches record-limit or + # - the combined size of the stored records reaches byte-limit or + # - the time in milliseconds since the buffer was last emptied reaches time-limit + buffer { + byteLimit = 100000 + recordLimit = 40 + timeLimit = 1000 + } } + + # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. + # The collector can currently produce two flavours of bad row: + # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; + # - a generic_error if a request's querystring cannot be parsed because of illegal characters + bad { + name = "bad" + + googleProjectId = "google-project-id" + ## Minimum, maximum and total backoff periods, in milliseconds + ## and multiplier between two backoff + backoffPolicy { + minBackoff = 1000 + maxBackoff = 1000 + totalBackoff = 9223372036854 + multiplier = 2 + initialRpcTimeout = 10000 + maxRpcTimeout = 10000 + rpcTimeoutMultiplier = 2 + } - # Incoming events are stored in a buffer before being sent to Kinesis/Kafka. - # Note: Buffering is not supported by NSQ. - # The buffer is emptied whenever: - # - the number of stored records reaches record-limit or - # - the combined size of the stored records reaches byte-limit or - # - the time in milliseconds since the buffer was last emptied reaches time-limit - buffer { - byteLimit = 100000 - recordLimit = 40 - timeLimit = 1000 - } + # Optional. Maximum number of bytes that a single record can contain. + # If a record is bigger, a size violation bad row is emitted instead + # Default: 10 MB + maxBytes = 10000000 + + # Optional. When collector starts, it checks if PubSub topics exist with listTopics. + # This is the interval for the calls. + # /sink-health is made healthy as soon as requests are successful or records are successfully inserted. + startupCheckInterval = 1 second + + # Optional. Collector uses built-in retry mechanism of PubSub API. + # In case of failure of these retries, the events are added to a buffer + # and every retryInterval collector retries to send them. + retryInterval = 10 seconds + + + # Incoming events are stored in a buffer before being sent to Pubsub. + # The buffer is emptied whenever: + # - the number of stored records reaches record-limit or + # - the combined size of the stored records reaches byte-limit or + # - the time in milliseconds since the buffer was last emptied reaches time-limit + buffer { + byteLimit = 100000 + recordLimit = 40 + timeLimit = 1000 + } + } } # Telemetry sends heartbeat events to external pipeline. 
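Since good and bad are now configured independently, the extended example above repeats the full PubSub block twice. Where the two sinks should stay identical apart from the topic name, HOCON substitution can factor out the shared settings; it is the same mechanism the collector's bundled application.conf defaults use further down in this patch (good = ${collector.streams.sink}). A minimal sketch, assuming a placeholder project id and an arbitrarily named shared key (extra sibling keys under streams are tolerated, just as the legacy sink and buffer keys are):

collector.streams {
  # shared settings for both sinks; this key name is arbitrary and only referenced below
  pubsub-defaults {
    googleProjectId = "google-project-id"
    maxBytes = 10000000
    buffer {
      byteLimit = 100000
      recordLimit = 40
      timeLimit = 1000
    }
  }

  good = ${collector.streams.pubsub-defaults}
  good.name = "good"

  bad = ${collector.streams.pubsub-defaults}
  bad.name = "bad"
}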
diff --git a/examples/config.pubsub.minimal.hocon b/examples/config.pubsub.minimal.hocon index fb06f3aba..b6fdb8d05 100644 --- a/examples/config.pubsub.minimal.hocon +++ b/examples/config.pubsub.minimal.hocon @@ -3,10 +3,12 @@ collector { port = 8080 streams { - good = "good" - bad = "bad" - - sink { + good { + name = "good" + googleProjectId = "google-project-id" + } + bad { + name = "bad" googleProjectId = "google-project-id" } } diff --git a/examples/config.sqs.extended.hocon b/examples/config.sqs.extended.hocon index 4534ab4e5..c65899a4d 100644 --- a/examples/config.sqs.extended.hocon +++ b/examples/config.sqs.extended.hocon @@ -155,26 +155,15 @@ collector { } streams { - # Events which have successfully been collected will be stored in the good stream/topic - good = "good" - - # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. - # The collector can currently produce two flavours of bad row: - # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; - # - a generic_error if a request's querystring cannot be parsed because of illegal characters - bad = "bad" # Whether to use the incoming event's ip as the partition key for the good stream/topic # Note: Nsq does not make use of partition key. useIpAddressAsPartitionKey = false - # Enable the chosen sink by uncommenting the appropriate configuration - sink { - # Choose between kinesis, sqs, google-pub-sub, kafka, nsq, or stdout. - # To use stdout, comment or remove everything in the "collector.streams.sink" section except - # "enabled" which should be set to "stdout". - enabled = sqs + # Events which have successfully been collected will be stored in the good stream/topic + good { + name = "good" # Region where the streams are located region = "eu-central-1" @@ -203,19 +192,68 @@ collector { # This is the interval for the calls. # /sink-health is made healthy as soon as requests are successful or records are successfully inserted. startupCheckInterval = 1 second + + # Incoming events are stored in a buffer before being sent to Kinesis/Kafka. + # Note: Buffering is not supported by NSQ. + # The buffer is emptied whenever: + # - the number of stored records reaches record-limit or + # - the combined size of the stored records reaches byte-limit or + # - the time in milliseconds since the buffer was last emptied reaches time-limit + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } } + + # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad stream/topic. + # The collector can currently produce two flavours of bad row: + # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; + # - a generic_error if a request's querystring cannot be parsed because of illegal characters + bad { - # Incoming events are stored in a buffer before being sent to Kinesis/Kafka. - # Note: Buffering is not supported by NSQ. 
- # The buffer is emptied whenever: - # - the number of stored records reaches record-limit or - # - the combined size of the stored records reaches byte-limit or - # - the time in milliseconds since the buffer was last emptied reaches time-limit - buffer { - byteLimit = 3145728 - recordLimit = 500 - timeLimit = 5000 - } + name = "bad" + # Region where the streams are located + region = "eu-central-1" + + # Thread pool size for Kinesis and SQS API requests + threadPoolSize = 10 + + # Optional + backoffPolicy { + # Minimum backoff period in milliseconds + minBackoff = 500 + # Maximum backoff period in milliseconds + maxBackoff = 1500 + # Failed inserts are retried forever. + # Number of retries before setting /sink-health unhealthy. + maxRetries = 3 + } + + # Optional. Maximum number of bytes that a single record can contain. + # If a record is bigger, a size violation bad row is emitted instead + # Default: 192 kb + # SQS has a record size limit of 256 kb, but records are encoded with Base64, + # which adds approximately 33% of the size, so we set the limit to 256 kb * 3/4 + maxBytes = 192000 + + # When collector starts, it checks if SQS buffers exist with getQueueUrl. + # This is the interval for the calls. + # /sink-health is made healthy as soon as requests are successful or records are successfully inserted. + startupCheckInterval = 1 second + + # Incoming events are stored in a buffer before being sent to Kinesis/Kafka. + # Note: Buffering is not supported by NSQ. + # The buffer is emptied whenever: + # - the number of stored records reaches record-limit or + # - the combined size of the stored records reaches byte-limit or + # - the time in milliseconds since the buffer was last emptied reaches time-limit + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } + } } # Telemetry sends heartbeat events to external pipeline. 
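For an operator migrating an existing old-style configuration to the split layout, the change is mechanical: the stream name moves into name, the former shared sink settings are inlined into each of good and bad, and the shared buffer block moves inside each sink. A rough before/after sketch using the SQS values above; the legacy form keeps working, because the decoder introduced later in this patch tries the new layout first and falls back to the old one:

# before (legacy style)
streams {
  good = "good"
  bad = "bad"
  sink {
    region = "eu-central-1"
  }
  buffer {
    byteLimit = 3145728
    recordLimit = 500
    timeLimit = 5000
  }
}

# after (new style)
streams {
  good {
    name = "good"
    region = "eu-central-1"
    buffer {
      byteLimit = 3145728
      recordLimit = 500
      timeLimit = 5000
    }
  }
  bad {
    name = "bad"
    region = "eu-central-1"
    buffer {
      byteLimit = 3145728
      recordLimit = 500
      timeLimit = 5000
    }
  }
}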
diff --git a/examples/config.sqs.minimal.hocon b/examples/config.sqs.minimal.hocon index 2e0cb2314..9501390a5 100644 --- a/examples/config.sqs.minimal.hocon +++ b/examples/config.sqs.minimal.hocon @@ -3,10 +3,12 @@ collector { port = 8080 streams { - good = "good" - bad = "bad" - - sink { + good { + name = "good" + region = eu-central-1 + } + bad { + name = "bad" region = eu-central-1 } } diff --git a/http4s/src/main/resources/reference.conf b/http4s/src/main/resources/reference.conf index 929d36685..96dfd594f 100644 --- a/http4s/src/main/resources/reference.conf +++ b/http4s/src/main/resources/reference.conf @@ -51,12 +51,6 @@ streams { useIpAddressAsPartitionKey = false - - buffer { - byteLimit = 3145728 - recordLimit = 500 - timeLimit = 5000 - } } telemetry { diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala index 23b614458..22ee2e25f 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala @@ -12,7 +12,7 @@ import com.snowplowanalytics.snowplow.scalatracker.emitters.http4s.ceTracking import com.snowplowanalytics.snowplow.collector.core.model.Sinks -abstract class App[SinkConfig <: Config.Sink: Decoder](appInfo: AppInfo) +abstract class App[SinkConfig: Decoder](appInfo: AppInfo) extends CommandIOApp( name = App.helpCommand(appInfo), header = "Snowplow application that collects tracking events", diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala index cf5bacdcb..86567becc 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala +++ b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala @@ -86,16 +86,12 @@ object Config { ) case class Streams[+SinkConfig]( - good: String, - bad: String, - useIpAddressAsPartitionKey: Boolean, - sink: SinkConfig, - buffer: Buffer + good: Sink[SinkConfig], + bad: Sink[SinkConfig], + useIpAddressAsPartitionKey: Boolean ) - trait Sink { - val maxBytes: Int - } + final case class Sink[+SinkConfig](name: String, buffer: Buffer, config: SinkConfig) case class Buffer( byteLimit: Long, @@ -166,15 +162,57 @@ object Config { implicit val redirectMacro = deriveDecoder[RedirectMacro] implicit val rootResponse = deriveDecoder[RootResponse] implicit val cors = deriveDecoder[CORS] - implicit val buffer = deriveDecoder[Buffer] - implicit val streams = deriveDecoder[Streams[SinkConfig]] implicit val statsd = deriveDecoder[Statsd] implicit val metrics = deriveDecoder[Metrics] implicit val monitoring = deriveDecoder[Monitoring] implicit val ssl = deriveDecoder[SSL] implicit val telemetry = deriveDecoder[Telemetry] implicit val networking = deriveDecoder[Networking] + implicit val sinkConfig = newDecoder[SinkConfig].or(legacyDecoder[SinkConfig]) + implicit val streams = deriveDecoder[Streams[SinkConfig]] + deriveDecoder[Config[SinkConfig]] } + implicit private val buffer: Decoder[Buffer] = deriveDecoder[Buffer] + + /** + * streams { + * good { + * name: "good-name" + * buffer {...} + * // rest of the sink config... + * } + * bad { + * name: "bad-name" + * buffer {...} + * // rest of the sink config... 
+ * } + * } + */ + private def newDecoder[SinkConfig: Decoder]: Decoder[Sink[SinkConfig]] = + Decoder.instance { cursor => // cursor is at 'good'/'bad' section level + for { + sinkName <- cursor.get[String]("name") + config <- cursor.as[SinkConfig] + buffer <- cursor.get[Buffer]("buffer") + } yield Sink(sinkName, buffer, config) + } + + /** + * streams { + * good = "good-name" + * bad = "bad-name" + * buffer {...} //shared by good and bad + * sink {...} //shared by good and bad + * } + */ + private def legacyDecoder[SinkConfig: Decoder]: Decoder[Sink[SinkConfig]] = + Decoder.instance { cursor => //cursor is at 'good'/'bad' section level + for { + sinkName <- cursor.as[String] + config <- cursor.up.get[SinkConfig]("sink") //up first to the 'streams' section + buffer <- cursor.up.get[Buffer]("buffer") //up first to the 'streams' section + } yield Sink(sinkName, buffer, config) + } } diff --git a/http4s/src/test/resources/test-config-new-style.hocon b/http4s/src/test/resources/test-config-new-style.hocon new file mode 100644 index 000000000..06b3ba962 --- /dev/null +++ b/http4s/src/test/resources/test-config-new-style.hocon @@ -0,0 +1,36 @@ +collector { + interface = "0.0.0.0" + port = 8080 + + streams { + good { + name = "good" + + foo = "hello" + bar = "world" + + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } + } + + bad { + name = "bad" + + foo = "hello" + bar = "world" + + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } + } + } + + ssl { + enable = true + } +} diff --git a/http4s/src/test/resources/test-config.hocon b/http4s/src/test/resources/test-config-old-style.hocon similarity index 66% rename from http4s/src/test/resources/test-config.hocon rename to http4s/src/test/resources/test-config-old-style.hocon index 71202d62f..8d2e06598 100644 --- a/http4s/src/test/resources/test-config.hocon +++ b/http4s/src/test/resources/test-config-old-style.hocon @@ -10,6 +10,12 @@ collector { foo = "hello" bar = "world" } + + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 + } } ssl { diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala index 8106ab345..310df4365 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala @@ -1,40 +1,60 @@ package com.snowplowanalytics.snowplow.collector.core import java.nio.file.Paths - import org.specs2.mutable.Specification - import cats.effect.IO - import cats.effect.testing.specs2.CatsEffect - +import com.snowplowanalytics.snowplow.collector.core.Config.Buffer import io.circe.generic.semiauto._ class ConfigParserSpec extends Specification with CatsEffect { "Loading the configuration" should { - "use reference.conf and the hocon specified in the path" in { - case class SinkConfig(foo: String, bar: String) - implicit val decoder = deriveDecoder[SinkConfig] - - val path = Paths.get(getClass.getResource(("/test-config.hocon")).toURI()) + "use reference.conf and the hocon specified in the path" >> { + "for new-style config" in { + assert(resource = "/test-config-new-style.hocon") + } + "for old-style config" in { + assert(resource = "/test-config-old-style.hocon") + } + } + } - val expectedStreams = Config.Streams[SinkConfig]( - "good", - "bad", - TestUtils.testConfig.streams.useIpAddressAsPartitionKey, - 
SinkConfig("hello", "world"), - TestUtils.testConfig.streams.buffer + private def assert(resource: String) = { + case class SinkConfig(foo: String, bar: String) + implicit val decoder = deriveDecoder[SinkConfig] + + val path = Paths.get(getClass.getResource(resource).toURI) + + val expectedStreams = Config.Streams[SinkConfig]( + good = Config.Sink( + name = "good", + buffer = Buffer( + 3145728, + 500, + 5000 + ), + SinkConfig("hello", "world") + ), + bad = Config.Sink( + name = "bad", + buffer = Buffer( + 3145728, + 500, + 5000 + ), + SinkConfig("hello", "world") + ), + TestUtils.testConfig.streams.useIpAddressAsPartitionKey + ) + val expected = TestUtils + .testConfig + .copy[SinkConfig]( + paths = Map.empty[String, String], + streams = expectedStreams, + ssl = TestUtils.testConfig.ssl.copy(enable = true) ) - val expected = TestUtils - .testConfig - .copy[SinkConfig]( - paths = Map.empty[String, String], - streams = expectedStreams, - ssl = TestUtils.testConfig.ssl.copy(enable = true) - ) - ConfigParser.fromPath[IO, SinkConfig](Some(path)).value.map(_ should beRight(expected)) - } + ConfigParser.fromPath[IO, SinkConfig](Some(path)).value.map(_ should beRight(expected)) } } diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala index 647871ee4..3647ec7d3 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala +++ b/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala @@ -75,15 +75,25 @@ object TestUtils { ), cors = CORS(60.minutes), streams = Streams( - "raw", - "bad-1", - false, - AnyRef, - Buffer( - 3145728, - 500, - 5000 - ) + good = Sink( + name = "raw", + Buffer( + 3145728, + 500, + 5000 + ), + AnyRef + ), + bad = Sink( + name = "bad-1", + Buffer( + 3145728, + 500, + 5000 + ), + AnyRef + ), + useIpAddressAsPartitionKey = false ), monitoring = Monitoring( Metrics( diff --git a/kafka/src/it/resources/collector.hocon b/kafka/src/it/resources/collector.hocon index 78fd2c372..2468a977b 100644 --- a/kafka/src/it/resources/collector.hocon +++ b/kafka/src/it/resources/collector.hocon @@ -3,10 +3,13 @@ collector { port = ${PORT} streams { - good = ${TOPIC_GOOD} - bad = ${TOPIC_BAD} - - sink { + good { + name = ${TOPIC_GOOD} + brokers = ${BROKER} + maxBytes = ${MAX_BYTES} + } + bad { + name = ${TOPIC_BAD} brokers = ${BROKER} maxBytes = ${MAX_BYTES} } diff --git a/kafka/src/main/resources/application.conf b/kafka/src/main/resources/application.conf index 80182aeec..275fd19d1 100644 --- a/kafka/src/main/resources/application.conf +++ b/kafka/src/main/resources/application.conf @@ -1,12 +1,19 @@ collector { streams { + + //New object-like style + good = ${collector.streams.sink} + bad = ${collector.streams.sink} + + //Legacy style sink { - enabled = kafka threadPoolSize = 10 retries = 10 maxBytes = 1000000 + buffer = ${collector.streams.buffer} } + //Legacy style buffer { byteLimit = 3145728 recordLimit = 500 diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala index f8394b4a5..e162f7a23 100644 --- a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaCollector.scala @@ -19,18 +19,8 @@ object KafkaCollector extends 
App[KafkaSinkConfig](BuildInfo) { override def mkSinks(config: Config.Streams[KafkaSinkConfig]): Resource[IO, Sinks[IO]] = for { - good <- KafkaSink.create[IO]( - config.sink.maxBytes, - config.good, - config.sink, - config.buffer - ) - bad <- KafkaSink.create[IO]( - config.sink.maxBytes, - config.bad, - config.sink, - config.buffer - ) + good <- KafkaSink.create[IO](config.good) + bad <- KafkaSink.create[IO](config.bad) } yield Sinks(good, bad) override def telemetryInfo(config: Config.Streams[KafkaSinkConfig]): IO[Telemetry.TelemetryInfo] = diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala index 5c199ca60..0280ecf00 100644 --- a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSink.scala @@ -62,14 +62,11 @@ class KafkaSink[F[_]: Sync]( object KafkaSink { def create[F[_]: Sync]( - maxBytes: Int, - topicName: String, - kafkaConfig: KafkaSinkConfig, - bufferConfig: Config.Buffer + sinkConfig: Config.Sink[KafkaSinkConfig] ): Resource[F, KafkaSink[F]] = for { - kafkaProducer <- createProducer(kafkaConfig, bufferConfig) - kafkaSink = new KafkaSink(maxBytes, kafkaProducer, topicName) + kafkaProducer <- createProducer(sinkConfig.config, sinkConfig.buffer) + kafkaSink = new KafkaSink(sinkConfig.config.maxBytes, kafkaProducer, sinkConfig.name) } yield kafkaSink /** diff --git a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSinkConfig.scala b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSinkConfig.scala index 676a5259d..ee4ede0cb 100644 --- a/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSinkConfig.scala +++ b/kafka/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KafkaSinkConfig.scala @@ -3,14 +3,12 @@ package com.snowplowanalytics.snowplow.collectors.scalastream.sinks import io.circe.Decoder import io.circe.generic.semiauto._ -import com.snowplowanalytics.snowplow.collector.core.Config - final case class KafkaSinkConfig( maxBytes: Int, brokers: String, retries: Int, producerConf: Option[Map[String, String]] -) extends Config.Sink +) object KafkaSinkConfig { implicit val configDecoder: Decoder[KafkaSinkConfig] = deriveDecoder[KafkaSinkConfig] diff --git a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala index e6c503c06..95d41a67c 100644 --- a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala +++ b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala @@ -99,20 +99,35 @@ object KafkaConfigSpec { redirectDomains = Set.empty, preTerminationPeriod = 10.seconds, streams = Config.Streams( - good = "good", - bad = "bad", - useIpAddressAsPartitionKey = false, - buffer = Config.Buffer( - byteLimit = 3145728, - recordLimit = 500, - timeLimit = 5000 + good = Config.Sink( + name = "good", + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + config = KafkaSinkConfig( + maxBytes = 1000000, + brokers = "localhost:9092,another.host:9092", + retries = 10, + producerConf = None + ) ), - sink = 
KafkaSinkConfig( - maxBytes = 1000000, - brokers = "localhost:9092,another.host:9092", - retries = 10, - producerConf = None - ) + bad = Config.Sink( + name = "bad", + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + config = KafkaSinkConfig( + maxBytes = 1000000, + brokers = "localhost:9092,another.host:9092", + retries = 10, + producerConf = None + ) + ), + useIpAddressAsPartitionKey = false ), telemetry = Config.Telemetry( disable = false, diff --git a/kinesis/src/it/resources/collector-cookie-anonymous.hocon b/kinesis/src/it/resources/collector-cookie-anonymous.hocon index 55d7c4992..14f4ed802 100644 --- a/kinesis/src/it/resources/collector-cookie-anonymous.hocon +++ b/kinesis/src/it/resources/collector-cookie-anonymous.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} + good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/it/resources/collector-cookie-attributes-1.hocon b/kinesis/src/it/resources/collector-cookie-attributes-1.hocon index 3ad47e0b3..e661116da 100644 --- a/kinesis/src/it/resources/collector-cookie-attributes-1.hocon +++ b/kinesis/src/it/resources/collector-cookie-attributes-1.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} + good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/it/resources/collector-cookie-attributes-2.hocon b/kinesis/src/it/resources/collector-cookie-attributes-2.hocon index 55d7c4992..14f4ed802 100644 --- a/kinesis/src/it/resources/collector-cookie-attributes-2.hocon +++ b/kinesis/src/it/resources/collector-cookie-attributes-2.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} + good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/it/resources/collector-cookie-domain.hocon b/kinesis/src/it/resources/collector-cookie-domain.hocon index d8bdbdc4b..4a7eaee7c 100644 --- a/kinesis/src/it/resources/collector-cookie-domain.hocon +++ b/kinesis/src/it/resources/collector-cookie-domain.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} + good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/it/resources/collector-cookie-fallback.hocon b/kinesis/src/it/resources/collector-cookie-fallback.hocon index ecef93c0a..8c9c874f6 100644 --- a/kinesis/src/it/resources/collector-cookie-fallback.hocon +++ b/kinesis/src/it/resources/collector-cookie-fallback.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} 
+ good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/it/resources/collector-cookie-no-domain.hocon b/kinesis/src/it/resources/collector-cookie-no-domain.hocon index 55d7c4992..14f4ed802 100644 --- a/kinesis/src/it/resources/collector-cookie-no-domain.hocon +++ b/kinesis/src/it/resources/collector-cookie-no-domain.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} + good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/it/resources/collector-custom-paths.hocon b/kinesis/src/it/resources/collector-custom-paths.hocon index f588fb1b6..a39c6d87d 100644 --- a/kinesis/src/it/resources/collector-custom-paths.hocon +++ b/kinesis/src/it/resources/collector-custom-paths.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} + good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon b/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon index bf16f99a1..6f6f54155 100644 --- a/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon +++ b/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} + good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon b/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon index 5415d8263..0604641ae 100644 --- a/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon +++ b/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} + good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/it/resources/collector.hocon b/kinesis/src/it/resources/collector.hocon index 177f7e673..0183b1258 100644 --- a/kinesis/src/it/resources/collector.hocon +++ b/kinesis/src/it/resources/collector.hocon @@ -3,10 +3,21 @@ collector { port = ${PORT} streams { - good = ${STREAM_GOOD} - bad = ${STREAM_BAD} + good { + name = ${STREAM_GOOD} + region = ${REGION} + customEndpoint = ${KINESIS_ENDPOINT} + + aws { + accessKey = env + secretKey = env + } - sink { + maxBytes = ${MAX_BYTES} + } + + bad { + name = ${STREAM_BAD} region = ${REGION} 
customEndpoint = ${KINESIS_ENDPOINT} diff --git a/kinesis/src/main/resources/application.conf b/kinesis/src/main/resources/application.conf index 49ee01e22..1cc2c0596 100644 --- a/kinesis/src/main/resources/application.conf +++ b/kinesis/src/main/resources/application.conf @@ -1,7 +1,11 @@ { streams { + //New object-like style + good = ${streams.sink} + bad = ${streams.sink} + + //Legacy style sink { - enabled = kinesis threadPoolSize = 10 aws { @@ -19,8 +23,10 @@ sqsMaxBytes = 192000 startupCheckInterval = 1 second + buffer = ${streams.buffer} } + //Legacy style buffer { byteLimit = 3145728 recordLimit = 500 diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala index baf1898c0..ab51cbaba 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala @@ -25,24 +25,10 @@ object KinesisCollector extends App[KinesisSinkConfig](BuildInfo) { private lazy val log = LoggerFactory.getLogger(getClass) override def mkSinks(config: Config.Streams[KinesisSinkConfig]): Resource[IO, Sinks[IO]] = { - val threadPoolExecutor = buildExecutorService(config.sink) + val threadPoolExecutor = buildExecutorService(config.good.config) for { - good <- KinesisSink.create[IO]( - kinesisMaxBytes = config.sink.maxBytes, - kinesisConfig = config.sink, - bufferConfig = config.buffer, - streamName = config.good, - sqsBufferName = config.sink.sqsGoodBuffer, - threadPoolExecutor - ) - bad <- KinesisSink.create[IO]( - kinesisMaxBytes = config.sink.maxBytes, - kinesisConfig = config.sink, - bufferConfig = config.buffer, - streamName = config.bad, - sqsBufferName = config.sink.sqsBadBuffer, - threadPoolExecutor - ) + good <- KinesisSink.create[IO](config.good, config.good.config.sqsGoodBuffer, threadPoolExecutor) + bad <- KinesisSink.create[IO](config.bad, config.good.config.sqsBadBuffer, threadPoolExecutor) } yield Sinks(good, bad) } @@ -51,7 +37,7 @@ object KinesisCollector extends App[KinesisSinkConfig](BuildInfo) { .getAccountId(config) .map(id => Telemetry.TelemetryInfo( - region = Some(config.sink.region), + region = Some(config.good.config.region), cloud = Some("AWS"), unhashedInstallationId = id ) diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala index f303d8cb0..e70d34ead 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala @@ -10,11 +10,11 @@ object TelemetryUtils { def getAccountId(config: Config.Streams[KinesisSinkConfig]): IO[Option[String]] = Resource .make( - IO(KinesisSink.createKinesisClient(config.sink.endpoint, config.sink.region)).rethrow + IO(KinesisSink.createKinesisClient(config.good.config.endpoint, config.good.config.region)).rethrow )(c => IO(c.shutdown())) .use { kinesis => IO { - val streamArn = KinesisSink.describeStream(kinesis, config.good).getStreamARN + val streamArn = KinesisSink.describeStream(kinesis, config.good.name).getStreamARN Some(extractAccountId(streamArn)) } } diff --git 
a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala index f5c0fa188..f0ccf13c3 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala @@ -441,17 +441,14 @@ object KinesisSink { * during its construction. */ def create[F[_]: Sync]( - kinesisMaxBytes: Int, - kinesisConfig: KinesisSinkConfig, - bufferConfig: Config.Buffer, - streamName: String, + sinkConfig: Config.Sink[KinesisSinkConfig], sqsBufferName: Option[String], executorService: ScheduledExecutorService ): Resource[F, KinesisSink[F]] = { val acquire = Sync[F] .delay( - createAndInitialize(kinesisMaxBytes, kinesisConfig, bufferConfig, streamName, sqsBufferName, executorService) + createAndInitialize(sinkConfig, sqsBufferName, executorService) ) .rethrow val release = (sink: KinesisSink[F]) => Sync[F].delay(sink.shutdown()) @@ -490,29 +487,26 @@ object KinesisSink { * during its construction. */ private def createAndInitialize[F[_]: Sync]( - kinesisMaxBytes: Int, - kinesisConfig: KinesisSinkConfig, - bufferConfig: Config.Buffer, - streamName: String, + sinkConfig: Config.Sink[KinesisSinkConfig], sqsBufferName: Option[String], executorService: ScheduledExecutorService ): Either[Throwable, KinesisSink[F]] = { val clients = for { - kinesisClient <- createKinesisClient(kinesisConfig.endpoint, kinesisConfig.region) - sqsClientAndName <- sqsBuffer(sqsBufferName, kinesisConfig.region) + kinesisClient <- createKinesisClient(sinkConfig.config.endpoint, sinkConfig.config.region) + sqsClientAndName <- sqsBuffer(sqsBufferName, sinkConfig.config.region) } yield (kinesisClient, sqsClientAndName) clients.map { case (kinesisClient, sqsClientAndName) => val maxBytes = - if (sqsClientAndName.isDefined) kinesisConfig.sqsMaxBytes else kinesisMaxBytes + if (sqsClientAndName.isDefined) sinkConfig.config.sqsMaxBytes else sinkConfig.config.maxBytes val ks = new KinesisSink( maxBytes, kinesisClient, - kinesisConfig, - bufferConfig, - streamName, + sinkConfig.config, + sinkConfig.buffer, + sinkConfig.name, executorService, sqsClientAndName ) diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala index 8826c6b4b..bf6eb0219 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSinkConfig.scala @@ -1,6 +1,5 @@ package com.snowplowanalytics.snowplow.collectors.scalastream.sinks -import com.snowplowanalytics.snowplow.collector.core.Config import io.circe.Decoder import io.circe.generic.semiauto._ import io.circe.config.syntax.durationDecoder @@ -17,7 +16,7 @@ final case class KinesisSinkConfig( sqsBadBuffer: Option[String], sqsMaxBytes: Int, startupCheckInterval: FiniteDuration -) extends Config.Sink { +) { val endpoint = customEndpoint.getOrElse(region match { case cn @ "cn-north-1" => s"https://kinesis.$cn.amazonaws.com.cn" case cn @ "cn-northwest-1" => s"https://kinesis.$cn.amazonaws.com.cn" diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala 
b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala index 477927e8c..fb6b3e778 100644 --- a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala @@ -118,28 +118,52 @@ object KinesisConfigSpec { idleTimeout = 610.seconds ), streams = Config.Streams( - good = "good", - bad = "bad", useIpAddressAsPartitionKey = false, - buffer = Config.Buffer( - byteLimit = 3145728, - recordLimit = 500, - timeLimit = 5000 + good = Config.Sink( + name = "good", + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + config = KinesisSinkConfig( + maxBytes = 1000000, + region = "eu-central-1", + threadPoolSize = 10, + backoffPolicy = KinesisSinkConfig.BackoffPolicy( + minBackoff = 500, + maxBackoff = 1500, + maxRetries = 3 + ), + sqsBadBuffer = None, + sqsGoodBuffer = None, + sqsMaxBytes = 192000, + customEndpoint = None, + startupCheckInterval = 1.second + ) ), - sink = KinesisSinkConfig( - maxBytes = 1000000, - region = "eu-central-1", - threadPoolSize = 10, - backoffPolicy = KinesisSinkConfig.BackoffPolicy( - minBackoff = 500, - maxBackoff = 1500, - maxRetries = 3 + bad = Config.Sink( + name = "bad", + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 ), - sqsBadBuffer = None, - sqsGoodBuffer = None, - sqsMaxBytes = 192000, - customEndpoint = None, - startupCheckInterval = 1.second + config = KinesisSinkConfig( + maxBytes = 1000000, + region = "eu-central-1", + threadPoolSize = 10, + backoffPolicy = KinesisSinkConfig.BackoffPolicy( + minBackoff = 500, + maxBackoff = 1500, + maxRetries = 3 + ), + sqsBadBuffer = None, + sqsGoodBuffer = None, + sqsMaxBytes = 192000, + customEndpoint = None, + startupCheckInterval = 1.second + ) ) ), telemetry = Config.Telemetry( diff --git a/nsq/src/main/resources/application.conf b/nsq/src/main/resources/application.conf index 1df27cd22..bd867ae8a 100644 --- a/nsq/src/main/resources/application.conf +++ b/nsq/src/main/resources/application.conf @@ -1,10 +1,14 @@ collector { streams { + + good = ${collector.streams.sink} + bad = ${collector.streams.sink} + sink { - enabled = nsq threadPoolSize = 10 port = 4150 maxBytes = 1000000 + buffer = ${collector.streams.buffer} } buffer { diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala index 2d777fddf..4b95f1367 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqCollector.scala @@ -18,16 +18,8 @@ import com.snowplowanalytics.snowplow.collectors.scalastream.sinks._ object NsqCollector extends App[NsqSinkConfig](BuildInfo) { override def mkSinks(config: Config.Streams[NsqSinkConfig]): Resource[IO, Sinks[IO]] = for { - good <- NsqSink.create[IO]( - config.sink.maxBytes, - config.sink, - config.good - ) - bad <- NsqSink.create[IO]( - config.sink.maxBytes, - config.sink, - config.bad - ) + good <- NsqSink.create[IO](config.good) + bad <- NsqSink.create[IO](config.bad) } yield Sinks(good, bad) override def telemetryInfo(config: Config.Streams[NsqSinkConfig]): IO[Telemetry.TelemetryInfo] = diff --git 
a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala index 358963605..4563e2aac 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala @@ -12,14 +12,11 @@ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks import java.util.concurrent.TimeoutException - import scala.collection.JavaConverters._ - import cats.effect.{Resource, Sync} import cats.implicits._ - import com.snowplowanalytics.client.nsq.NSQProducer -import com.snowplowanalytics.snowplow.collector.core.{Sink} +import com.snowplowanalytics.snowplow.collector.core.{Config, Sink} import com.snowplowanalytics.client.nsq.exceptions.NSQException /** @@ -57,13 +54,11 @@ class NsqSink[F[_]: Sync] private ( object NsqSink { def create[F[_]: Sync]( - maxBytes: Int, - nsqConfig: NsqSinkConfig, - topicName: String + nsqConfig: Config.Sink[NsqSinkConfig] ): Resource[F, NsqSink[F]] = Resource.make( Sync[F].delay( - new NsqSink(maxBytes, nsqConfig, topicName) + new NsqSink(nsqConfig.config.maxBytes, nsqConfig.config, nsqConfig.name) ) )(sink => Sync[F].delay(sink.shutdown())) } diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSinkConfig.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSinkConfig.scala index 0025d9d08..2a6c13bd7 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSinkConfig.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSinkConfig.scala @@ -13,14 +13,12 @@ package com.snowplowanalytics.snowplow.collectors.scalastream.sinks import io.circe.Decoder import io.circe.generic.semiauto._ -import com.snowplowanalytics.snowplow.collector.core.Config - final case class NsqSinkConfig( maxBytes: Int, threadPoolSize: Int, host: String, port: Int -) extends Config.Sink +) object NsqSinkConfig { implicit val configDecoder: Decoder[NsqSinkConfig] = deriveDecoder[NsqSinkConfig] diff --git a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala index f5f01e20c..e5620daeb 100644 --- a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala +++ b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala @@ -98,19 +98,34 @@ object NsqConfigSpec { redirectDomains = Set.empty, preTerminationPeriod = 10.seconds, streams = Config.Streams( - good = "good", - bad = "bad", useIpAddressAsPartitionKey = false, - buffer = Config.Buffer( - byteLimit = 3145728, - recordLimit = 500, - timeLimit = 5000 + good = Config.Sink( + name = "good", + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + config = NsqSinkConfig( + maxBytes = 1000000, + threadPoolSize = 10, + host = "nsqHost", + port = 4150 + ) ), - sink = NsqSinkConfig( - maxBytes = 1000000, - threadPoolSize = 10, - host = "nsqHost", - port = 4150 + bad = Config.Sink( + name = "bad", + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + config = NsqSinkConfig( + maxBytes = 1000000, + threadPoolSize = 10, + host = "nsqHost", + port = 4150 + ) ) ), 
telemetry = Config.Telemetry( diff --git a/pubsub/src/it/resources/collector.hocon b/pubsub/src/it/resources/collector.hocon index 923d10e56..d964fbe56 100644 --- a/pubsub/src/it/resources/collector.hocon +++ b/pubsub/src/it/resources/collector.hocon @@ -3,12 +3,15 @@ collector { port = ${PORT} streams { - good = ${TOPIC_GOOD} - bad = ${TOPIC_BAD} - - sink { - googleProjectId = ${GOOGLE_PROJECT_ID} - maxBytes = ${MAX_BYTES} + good { + name = ${TOPIC_GOOD} + googleProjectId = ${GOOGLE_PROJECT_ID} + maxBytes = ${MAX_BYTES} + } + bad { + name = ${TOPIC_BAD} + googleProjectId = ${GOOGLE_PROJECT_ID} + maxBytes = ${MAX_BYTES} } } } \ No newline at end of file diff --git a/pubsub/src/main/resources/application.conf b/pubsub/src/main/resources/application.conf index 3b408c10d..6b33a1d32 100644 --- a/pubsub/src/main/resources/application.conf +++ b/pubsub/src/main/resources/application.conf @@ -1,7 +1,10 @@ { streams { + //New object-like style + good = ${streams.sink} + bad = ${streams.sink} + sink { - enabled = google-pub-sub threadPoolSize = 10 backoffPolicy { @@ -18,9 +21,7 @@ startupCheckInterval = 1 second retryInterval = 10 seconds - gcpUserAgent { - productName = "Snowplow OSS" - } + buffer = ${streams.buffer} } buffer { diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala index 026caf030..8d8526660 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/PubSubCollector.scala @@ -10,8 +10,8 @@ object PubSubCollector extends App[PubSubSinkConfig](BuildInfo) { override def mkSinks(config: Config.Streams[PubSubSinkConfig]): Resource[IO, Sinks[IO]] = for { - good <- PubSubSink.create[IO](config.sink.maxBytes, config.sink, config.buffer, config.good) - bad <- PubSubSink.create[IO](config.sink.maxBytes, config.sink, config.buffer, config.bad) + good <- PubSubSink.create[IO](config.good) + bad <- PubSubSink.create[IO](config.bad) } yield Sinks(good, bad) override def telemetryInfo(config: Config.Streams[PubSubSinkConfig]): IO[Telemetry.TelemetryInfo] = @@ -19,7 +19,7 @@ object PubSubCollector extends App[PubSubSinkConfig](BuildInfo) { Telemetry.TelemetryInfo( region = None, cloud = Some("GCP"), - unhashedInstallationId = Some(config.sink.googleProjectId) + unhashedInstallationId = Some(config.good.config.googleProjectId) ) ) } diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala index 17300d0e0..fa101eb08 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala @@ -82,21 +82,18 @@ object PubSubSink { } def create[F[_]: Async: Parallel]( - maxBytes: Int, - sinkConfig: PubSubSinkConfig, - bufferConfig: Config.Buffer, - topicName: String + sinkConfig: Config.Sink[PubSubSinkConfig] ): Resource[F, Sink[F]] = for { isHealthyState <- Resource.eval(Ref.of[F, Boolean](false)) - producer <- createProducer[F](sinkConfig, topicName, bufferConfig) - _ <- PubSubHealthCheck.run(isHealthyState, sinkConfig, topicName) + producer <- createProducer[F](sinkConfig.config, sinkConfig.name, sinkConfig.buffer) + _ <- 
PubSubHealthCheck.run(isHealthyState, sinkConfig.config, sinkConfig.name) } yield new PubSubSink( - maxBytes, + sinkConfig.config.maxBytes, isHealthyState, producer, - sinkConfig.retryInterval, - topicName + sinkConfig.config.retryInterval, + sinkConfig.name ) private def createProducer[F[_]: Async: Parallel]( diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala index d8c92955b..d467121bd 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala @@ -1,6 +1,5 @@ package com.snowplowanalytics.snowplow.collectors.scalastream.sinks -import com.snowplowanalytics.snowplow.collector.core.Config import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.PubSubSinkConfig.BackoffPolicy import io.circe.Decoder import io.circe.config.syntax.durationDecoder @@ -14,7 +13,7 @@ final case class PubSubSinkConfig( backoffPolicy: BackoffPolicy, startupCheckInterval: FiniteDuration, retryInterval: FiniteDuration -) extends Config.Sink +) object PubSubSinkConfig { diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala index 526a849d7..0de4e9d7c 100644 --- a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala +++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala @@ -102,28 +102,52 @@ object ConfigSpec { idleTimeout = 610.seconds ), streams = Config.Streams( - good = "good", - bad = "bad", useIpAddressAsPartitionKey = false, - buffer = Config.Buffer( - byteLimit = 100000, - recordLimit = 40, - timeLimit = 1000 + good = Config.Sink( + name = "good", + buffer = Config.Buffer( + byteLimit = 100000, + recordLimit = 40, + timeLimit = 1000 + ), + config = PubSubSinkConfig( + maxBytes = 10000000, + googleProjectId = "google-project-id", + backoffPolicy = PubSubSinkConfig.BackoffPolicy( + minBackoff = 1000, + maxBackoff = 1000, + totalBackoff = 9223372036854L, + multiplier = 2, + initialRpcTimeout = 10000, + maxRpcTimeout = 10000, + rpcTimeoutMultiplier = 2 + ), + startupCheckInterval = 1.second, + retryInterval = 10.seconds + ) ), - sink = PubSubSinkConfig( - maxBytes = 10000000, - googleProjectId = "google-project-id", - backoffPolicy = PubSubSinkConfig.BackoffPolicy( - minBackoff = 1000, - maxBackoff = 1000, - totalBackoff = 9223372036854L, - multiplier = 2, - initialRpcTimeout = 10000, - maxRpcTimeout = 10000, - rpcTimeoutMultiplier = 2 + bad = Config.Sink( + name = "bad", + buffer = Config.Buffer( + byteLimit = 100000, + recordLimit = 40, + timeLimit = 1000 ), - startupCheckInterval = 1.second, - retryInterval = 10.seconds + config = PubSubSinkConfig( + maxBytes = 10000000, + googleProjectId = "google-project-id", + backoffPolicy = PubSubSinkConfig.BackoffPolicy( + minBackoff = 1000, + maxBackoff = 1000, + totalBackoff = 9223372036854L, + multiplier = 2, + initialRpcTimeout = 10000, + maxRpcTimeout = 10000, + rpcTimeoutMultiplier = 2 + ), + startupCheckInterval = 1.second, + retryInterval = 10.seconds + ) ) ), telemetry = Config.Telemetry( diff --git a/sqs/src/main/resources/application.conf b/sqs/src/main/resources/application.conf index 
a862f2b43..663e7aca1 100644 --- a/sqs/src/main/resources/application.conf +++ b/sqs/src/main/resources/application.conf @@ -1,5 +1,7 @@ collector { streams { + good = ${collector.streams.sink} + bad = ${collector.streams.sink} sink { enabled = sqs threadPoolSize = 10 @@ -13,6 +15,7 @@ collector { maxBytes = 192000 startupCheckInterval = 1 second + buffer = ${collector.streams.buffer} } buffer { diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala index d23630a87..019986d5a 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsCollector.scala @@ -19,22 +19,10 @@ import com.snowplowanalytics.snowplow.collectors.scalastream.sinks._ object SqsCollector extends App[SqsSinkConfig](BuildInfo) { override def mkSinks(config: Config.Streams[SqsSinkConfig]): Resource[IO, Sinks[IO]] = { - val threadPoolExecutor = new ScheduledThreadPoolExecutor(config.sink.threadPoolSize) + val threadPoolExecutor = new ScheduledThreadPoolExecutor(config.good.config.threadPoolSize) for { - good <- SqsSink.create[IO]( - config.sink.maxBytes, - config.sink, - config.buffer, - config.good, - threadPoolExecutor - ) - bad <- SqsSink.create[IO]( - config.sink.maxBytes, - config.sink, - config.buffer, - config.bad, - threadPoolExecutor - ) + good <- SqsSink.create[IO](config.good, threadPoolExecutor) + bad <- SqsSink.create[IO](config.bad, threadPoolExecutor) } yield Sinks(good, bad) } @@ -43,7 +31,7 @@ object SqsCollector extends App[SqsSinkConfig](BuildInfo) { .getAccountId(config) .map(id => Telemetry.TelemetryInfo( - region = Some(config.sink.region), + region = Some(config.good.config.region), cloud = Some("AWS"), unhashedInstallationId = id ) diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala index 7aa013c77..f0b14ef9e 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TelemetryUtils.scala @@ -9,11 +9,11 @@ object TelemetryUtils { def getAccountId(config: Config.Streams[SqsSinkConfig]): IO[Option[String]] = Resource .make( - IO(SqsSink.createSqsClient(config.sink.region)).rethrow + IO(SqsSink.createSqsClient(config.good.config.region)).rethrow )(c => IO(c.shutdown())) .use { client => IO { - val sqsQueueUrl = client.getQueueUrl(config.good).getQueueUrl + val sqsQueueUrl = client.getQueueUrl(config.good.name).getQueueUrl Some(extractAccountId(sqsQueueUrl)) } } diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala index 94e8de375..5aab6fa8d 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala @@ -279,16 +279,13 @@ object SqsSink { final case class BatchResultErrorInfo(code: String, message: String) def create[F[_]: Sync]( - maxBytes: Int, - sqsConfig: SqsSinkConfig, - bufferConfig: Config.Buffer, - queueName: String, + sqsConfig: Config.Sink[SqsSinkConfig], executorService: 
ScheduledExecutorService ): Resource[F, SqsSink[F]] = { val acquire = Sync[F] .delay( - createAndInitialize(maxBytes, sqsConfig, bufferConfig, queueName, executorService) + createAndInitialize(sqsConfig, executorService) ) .rethrow val release = (sink: SqsSink[F]) => Sync[F].delay(sink.shutdown()) @@ -307,14 +304,12 @@ object SqsSink { * during its construction. */ def createAndInitialize[F[_]: Sync]( - maxBytes: Int, - sqsConfig: SqsSinkConfig, - bufferConfig: Config.Buffer, - queueName: String, + sqsConfig: Config.Sink[SqsSinkConfig], executorService: ScheduledExecutorService ): Either[Throwable, SqsSink[F]] = - createSqsClient(sqsConfig.region).map { c => - val sqsSink = new SqsSink(maxBytes, c, sqsConfig, bufferConfig, queueName, executorService) + createSqsClient(sqsConfig.config.region).map { c => + val sqsSink = + new SqsSink(sqsConfig.config.maxBytes, c, sqsConfig.config, sqsConfig.buffer, sqsConfig.name, executorService) sqsSink.EventStorage.scheduleFlush() sqsSink.checkSqsHealth() sqsSink diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSinkConfig.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSinkConfig.scala index 7db8b879f..c8694713d 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSinkConfig.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSinkConfig.scala @@ -3,18 +3,14 @@ package com.snowplowanalytics.snowplow.collectors.scalastream.sinks import io.circe.Decoder import io.circe.generic.semiauto._ -import com.snowplowanalytics.snowplow.collector.core.Config - final case class SqsSinkConfig( maxBytes: Int, region: String, backoffPolicy: SqsSinkConfig.BackoffPolicyConfig, threadPoolSize: Int -) extends Config.Sink +) object SqsSinkConfig { - final case class AWSConfig(accessKey: String, secretKey: String) - final case class BackoffPolicyConfig(minBackoff: Long, maxBackoff: Long, maxRetries: Int) implicit val configDecoder: Decoder[SqsSinkConfig] = deriveDecoder[SqsSinkConfig] diff --git a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala index ca04acecd..e50f5ba85 100644 --- a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala +++ b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala @@ -103,23 +103,42 @@ object SqsConfigSpec { idleTimeout = 610.seconds ), streams = Config.Streams( - good = "good", - bad = "bad", useIpAddressAsPartitionKey = false, - buffer = Config.Buffer( - byteLimit = 3145728, - recordLimit = 500, - timeLimit = 5000 + good = Config.Sink( + name = "good", + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 + ), + config = SqsSinkConfig( + maxBytes = 192000, + region = "eu-central-1", + backoffPolicy = SqsSinkConfig.BackoffPolicyConfig( + minBackoff = 500, + maxBackoff = 1500, + maxRetries = 3 + ), + threadPoolSize = 10 + ) ), - sink = SqsSinkConfig( - maxBytes = 192000, - region = "eu-central-1", - backoffPolicy = SqsSinkConfig.BackoffPolicyConfig( - minBackoff = 500, - maxBackoff = 1500, - maxRetries = 3 + bad = Config.Sink( + name = "bad", + buffer = Config.Buffer( + byteLimit = 3145728, + recordLimit = 500, + timeLimit = 5000 ), - threadPoolSize = 10 + config = SqsSinkConfig( + maxBytes = 192000, + region = "eu-central-1", + 
backoffPolicy = SqsSinkConfig.BackoffPolicyConfig( + minBackoff = 500, + maxBackoff = 1500, + maxRetries = 3 + ), + threadPoolSize = 10 + ) ) ), telemetry = Config.Telemetry( diff --git a/stdout/src/main/resources/application.conf b/stdout/src/main/resources/application.conf index 570541343..c65ae089f 100644 --- a/stdout/src/main/resources/application.conf +++ b/stdout/src/main/resources/application.conf @@ -1,7 +1,15 @@ collector { streams { + good = ${collector.streams.sink} + bad = ${collector.streams.sink} sink { maxBytes = 1000000000 + buffer = ${collector.streams.buffer} + } + buffer { + byteLimit = 3145728 + recordLimit = 500 + timeLimit = 5000 } } } \ No newline at end of file diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/SinkConfig.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/SinkConfig.scala index 59e16e209..99a727bba 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/SinkConfig.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/SinkConfig.scala @@ -3,11 +3,9 @@ package com.snowplowanalytics.snowplow.collector.stdout import io.circe.Decoder import io.circe.generic.semiauto._ -import com.snowplowanalytics.snowplow.collector.core.Config - final case class SinkConfig( maxBytes: Int -) extends Config.Sink +) object SinkConfig { implicit val configDecoder: Decoder[SinkConfig] = deriveDecoder[SinkConfig] diff --git a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala index c307c5bc3..b5d479d4e 100644 --- a/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala +++ b/stdout/src/main/scala/com.snowplowanalytics.snowplow.collector.stdout/StdoutCollector.scala @@ -8,8 +8,8 @@ import com.snowplowanalytics.snowplow.collector.core.{App, Config, Telemetry} object StdoutCollector extends App[SinkConfig](BuildInfo) { override def mkSinks(config: Config.Streams[SinkConfig]): Resource[IO, Sinks[IO]] = { - val good = new PrintingSink[IO](config.sink.maxBytes, System.out) - val bad = new PrintingSink[IO](config.sink.maxBytes, System.err) + val good = new PrintingSink[IO](config.good.config.maxBytes, System.out) + val bad = new PrintingSink[IO](config.bad.config.maxBytes, System.err) Resource.pure(Sinks(good, bad)) } From c8b57859b3aab7e8f0db653560cb6f85336dd76c Mon Sep 17 00:00:00 2001 From: Alex Benini Date: Wed, 3 Jan 2024 17:31:35 +0100 Subject: [PATCH 28/39] Update the Pubsub UserAgent format (close #362) --- pubsub/src/main/resources/application.conf | 3 ++ .../sinks/PubSubSink.scala | 6 ++- .../sinks/PubSubSinkConfig.scala | 9 ++++- .../ConfigSpec.scala | 6 ++- .../sinks/GcpUserAgentSpec.scala | 38 +++++++++++++++++++ 5 files changed, 56 insertions(+), 6 deletions(-) create mode 100644 pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GcpUserAgentSpec.scala diff --git a/pubsub/src/main/resources/application.conf b/pubsub/src/main/resources/application.conf index 6b33a1d32..530d04f48 100644 --- a/pubsub/src/main/resources/application.conf +++ b/pubsub/src/main/resources/application.conf @@ -22,6 +22,9 @@ startupCheckInterval = 1 second retryInterval = 10 seconds buffer = ${streams.buffer} + gcpUserAgent { + productName = "Snowplow OSS" + } } buffer { diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala 
b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala index fa101eb08..ec29a53a2 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala @@ -21,7 +21,6 @@ import com.permutive.pubsub.producer.encoder.MessageEncoder import com.permutive.pubsub.producer.grpc.{GooglePubsubProducer, PubsubProducerConfig} import com.permutive.pubsub.producer.{Model, PubsubProducer} import com.snowplowanalytics.snowplow.collector.core.{Config, Sink} -import com.snowplowanalytics.snowplow.collectors.scalastream.BuildInfo import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.BuilderOps._ import org.threeten.bp.Duration import org.typelevel.log4cats.Logger @@ -108,7 +107,7 @@ object PubSubSink { onFailedTerminate = err => Logger[F].error(err)("PubSub sink termination error"), customizePublisher = Some { _.setRetrySettings(retrySettings(sinkConfig.backoffPolicy)) - .setHeaderProvider(FixedHeaderProvider.create("User-Agent", BuildInfo.dockerAlias)) + .setHeaderProvider(FixedHeaderProvider.create("User-Agent", createUserAgent(sinkConfig.gcpUserAgent))) .setProvidersForEmulator() } ) @@ -116,6 +115,9 @@ object PubSubSink { GooglePubsubProducer.of[F, Array[Byte]](ProjectId(sinkConfig.googleProjectId), Topic(topicName), config) } + private[sinks] def createUserAgent(gcpUserAgent: PubSubSinkConfig.GcpUserAgent): String = + s"${gcpUserAgent.productName}/collector (GPN:Snowplow;)" + private def retrySettings(backoffPolicy: PubSubSinkConfig.BackoffPolicy): RetrySettings = RetrySettings .newBuilder() diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala index d467121bd..da491033b 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSinkConfig.scala @@ -1,6 +1,6 @@ package com.snowplowanalytics.snowplow.collectors.scalastream.sinks -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.PubSubSinkConfig.BackoffPolicy +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.PubSubSinkConfig._ import io.circe.Decoder import io.circe.config.syntax.durationDecoder import io.circe.generic.semiauto._ @@ -12,7 +12,8 @@ final case class PubSubSinkConfig( googleProjectId: String, backoffPolicy: BackoffPolicy, startupCheckInterval: FiniteDuration, - retryInterval: FiniteDuration + retryInterval: FiniteDuration, + gcpUserAgent: GcpUserAgent ) object PubSubSinkConfig { @@ -26,7 +27,11 @@ object PubSubSinkConfig { maxRpcTimeout: Long, rpcTimeoutMultiplier: Double ) + + final case class GcpUserAgent(productName: String) + implicit val configDecoder: Decoder[PubSubSinkConfig] = deriveDecoder[PubSubSinkConfig] implicit val backoffPolicyConfigDecoder: Decoder[BackoffPolicy] = deriveDecoder[BackoffPolicy] + implicit val gcpUserAgentDecoder: Decoder[GcpUserAgent] = deriveDecoder[GcpUserAgent] } diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala index 0de4e9d7c..aa78c2584 100644 --- 
a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala +++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala @@ -123,7 +123,8 @@ object ConfigSpec { rpcTimeoutMultiplier = 2 ), startupCheckInterval = 1.second, - retryInterval = 10.seconds + retryInterval = 10.seconds, + gcpUserAgent = PubSubSinkConfig.GcpUserAgent(productName = "Snowplow OSS") ) ), bad = Config.Sink( @@ -146,7 +147,8 @@ object ConfigSpec { rpcTimeoutMultiplier = 2 ), startupCheckInterval = 1.second, - retryInterval = 10.seconds + retryInterval = 10.seconds, + gcpUserAgent = PubSubSinkConfig.GcpUserAgent(productName = "Snowplow OSS") ) ) ), diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GcpUserAgentSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GcpUserAgentSpec.scala new file mode 100644 index 000000000..c1e464c1d --- /dev/null +++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/GcpUserAgentSpec.scala @@ -0,0 +1,38 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collectors.scalastream.sinks + +import java.util.regex.Pattern + +import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.PubSubSinkConfig._ + +import org.specs2.mutable.Specification + +class GcpUserAgentSpec extends Specification { + + "createUserAgent" should { + "create user agent string correctly" in { + val gcpUserAgent = GcpUserAgent(productName = "Snowplow OSS") + val resultUserAgent = PubSubSink.createUserAgent(gcpUserAgent) + val expectedUserAgent = s"Snowplow OSS/collector (GPN:Snowplow;)" + + val userAgentRegex = Pattern.compile( + """(?iU)(?:[^\(\)\/]+\/[^\/]+\s+)*(?:[^\s][^\(\)\/]+\/[^\/]+\s?\([^\(\)]*)gpn:(.*)[;\)]""" + ) + val matcher = userAgentRegex.matcher(resultUserAgent) + val matched = if (matcher.find()) Some(matcher.group(1)) else None + val expectedMatched = "Snowplow;" + + resultUserAgent must beEqualTo(expectedUserAgent) + matched must beSome(expectedMatched) + } + } +} From 3437ccd8c95ccf7d152b73f063925b72f3fb96e8 Mon Sep 17 00:00:00 2001 From: Alex Benini Date: Wed, 3 Jan 2024 17:43:49 +0100 Subject: [PATCH 29/39] Sbt project modernization (close #361) --- .github/workflows/deploy.yml | 5 - README.md | 10 +- build.sbt | 274 +---- .../collectors/scalastream/it/Http.scala | 15 +- .../collectors/scalastream/it/utils.scala | 8 +- core/src/main/resources/reference.conf | 116 ++- .../App.scala | 10 + .../AppInfo.scala | 18 + .../Config.scala | 10 + .../ConfigParser.scala | 23 +- .../HttpServer.scala | 55 + .../Routes.scala | 10 + .../Run.scala | 13 +- .../Service.scala | 12 +- .../Sink.scala | 21 + .../SplitBatch.scala | 10 + .../Telemetry.scala | 10 + .../model.scala | 10 + .../Collector.scala | 241 ----- .../CollectorRoute.scala | 203 ---- .../CollectorService.scala | 497 --------- .../HealthService.scala | 31 - .../Warmup.scala | 79 -- .../model.scala | 272 ----- .../sinks/Sink.scala | 28 - .../telemetry/CloudVendor.scala | 22 - .../telemetry/TelemetryAkkaService.scala | 130 --- 
.../telemetry/TelemetryPayload.scala | 25 - .../telemetry/package.scala | 58 -- .../utils/SplitBatch.scala | 163 --- .../configs/invalid-fallback-domain.hocon | 24 - .../test/resources/configs/valid-config.hocon | 19 - .../resources/test-config-new-style.hocon | 0 .../resources/test-config-old-style.hocon | 0 .../ConfigParserSpec.scala | 0 .../RoutesSpec.scala | 2 + .../ServiceSpec.scala | 6 +- .../SplitBatchSpec.scala | 0 .../TelemetrySpec.scala | 0 .../TestSink.scala | 0 .../TestUtils.scala | 6 +- .../CollectorRouteSpec.scala | 217 ---- .../CollectorServiceSpec.scala | 948 ------------------ .../TestSink.scala | 30 - .../TestUtils.scala | 69 -- .../config/ConfigReaderSpec.scala | 41 - .../config/ConfigSpec.scala | 176 ---- .../utils/SplitBatchSpec.scala | 155 --- examples/config.kafka.extended.hocon | 28 +- examples/config.kinesis.extended.hocon | 28 +- examples/config.nsq.extended.hocon | 28 +- examples/config.pubsub.extended.hocon | 30 +- examples/config.rabbitmq.extended.hocon | 293 ------ examples/config.rabbitmq.minimal.hocon | 19 - examples/config.sqs.extended.hocon | 30 +- examples/config.stdout.extended.hocon | 30 +- flake.lock | 256 +++++ flake.nix | 60 ++ .../scalastream/it/CollectorOutput.scala | 20 - .../scalastream/it/EventGenerator.scala | 58 -- .../collectors/scalastream/it/Http.scala | 35 - .../collectors/scalastream/it/utils.scala | 135 --- http4s/src/main/resources/reference.conf | 94 -- .../AppInfo.scala | 8 - .../HttpServer.scala | 117 --- .../Sink.scala | 11 - .../scalastream/it/kafka/KafkaUtils.scala | 2 +- .../scalastream/it/core/CookieSpec.scala | 4 +- .../it/core/DoNotTrackCookieSpec.scala | 2 +- .../scalastream/it/kinesis/Kinesis.scala | 2 +- .../sinks/KinesisSink.scala | 8 +- .../sinks/NsqSink.scala | 10 +- project/BuildSettings.scala | 170 +++- project/Dependencies.scala | 164 +-- .../it/pubsub/GooglePubSubCollectorSpec.scala | 4 +- .../scalastream/it/pubsub/PubSub.scala | 2 +- .../sinks/PubSubHealthCheck.scala | 2 +- .../sinks/PubSubSink.scala | 2 +- rabbitmq/src/main/resources/application.conf | 43 - .../RabbitMQCollector.scala | 94 -- .../sinks/RabbitMQSink.scala | 81 -- .../sinks/SqsSink.scala | 7 +- 82 files changed, 854 insertions(+), 5095 deletions(-) rename {http4s => core}/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala (68%) create mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala rename {http4s => core}/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala (92%) rename {http4s => core}/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala (79%) create mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala rename {http4s => core}/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala (83%) rename {http4s => core}/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala (87%) rename {http4s => core}/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala (96%) create mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala rename {http4s => core}/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala (93%) rename {http4s => core}/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala (91%) rename {http4s => core}/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala (65%) delete mode 100644 
core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Collector.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoute.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/HealthService.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Warmup.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/Sink.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/CloudVendor.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryAkkaService.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryPayload.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/package.scala delete mode 100644 core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatch.scala delete mode 100644 core/src/test/resources/configs/invalid-fallback-domain.hocon delete mode 100644 core/src/test/resources/configs/valid-config.hocon rename {http4s => core}/src/test/resources/test-config-new-style.hocon (100%) rename {http4s => core}/src/test/resources/test-config-old-style.hocon (100%) rename {http4s => core}/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala (100%) rename {http4s => core}/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala (99%) rename {http4s => core}/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala (99%) rename {http4s => core}/src/test/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatchSpec.scala (100%) rename {http4s => core}/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TelemetrySpec.scala (100%) rename {http4s => core}/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestSink.scala (100%) rename {http4s => core}/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala (95%) delete mode 100644 core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRouteSpec.scala delete mode 100644 core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala delete mode 100644 core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala delete mode 100644 core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala delete mode 100644 core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigReaderSpec.scala delete mode 100644 core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigSpec.scala delete mode 100644 core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatchSpec.scala delete mode 100644 examples/config.rabbitmq.extended.hocon delete mode 100644 examples/config.rabbitmq.minimal.hocon create mode 100644 flake.lock create mode 100644 flake.nix delete mode 100644 http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala 
delete mode 100644 http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala delete mode 100644 http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala delete mode 100644 http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala delete mode 100644 http4s/src/main/resources/reference.conf delete mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala delete mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala delete mode 100644 http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala delete mode 100644 rabbitmq/src/main/resources/application.conf delete mode 100644 rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/RabbitMQCollector.scala delete mode 100644 rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/RabbitMQSink.scala diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml index c81e7c387..47942fe93 100644 --- a/.github/workflows/deploy.yml +++ b/.github/workflows/deploy.yml @@ -23,7 +23,6 @@ jobs: sbt 'project pubsub' assembly sbt 'project sqs' assembly sbt 'project stdout' assembly - sbt 'project rabbitmq' assembly - name: Get current version id: ver run: | @@ -43,7 +42,6 @@ jobs: pubsub/target/scala-2.12/snowplow-stream-collector-google-pubsub-${{ steps.ver.outputs.project_version }}.jar sqs/target/scala-2.12/snowplow-stream-collector-sqs-${{ steps.ver.outputs.project_version }}.jar stdout/target/scala-2.12/snowplow-stream-collector-stdout-${{ steps.ver.outputs.project_version }}.jar - rabbitmq/target/scala-2.12/snowplow-stream-collector-stdout-${{ steps.ver.outputs.project_version }}.jar env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} @@ -58,11 +56,8 @@ jobs: - kafka - nsq - stdout - - rabbitmq include: - suffix: "" - - suffix: -experimental - platform: rabbitmq - platform: kinesis run_snyk: ${{ !contains(github.ref, 'rc') }} - platform: pubsub diff --git a/README.md b/README.md index 37260bfd6..3c039e16f 100644 --- a/README.md +++ b/README.md @@ -6,11 +6,8 @@ ## Introduction -The Scala Stream Collector is an event collector for [Snowplow][snowplow], written in Scala. -It sets a third-party cookie, allowing user tracking across domains. - -The Scala Stream Collector is designed to be easy to setup and store [Thrift][thrift] Snowplow -events to [Amazon Kinesis][kinesis] and [NSQ][nsq], and is built on top of [akka-http][akka-http]. +Stream Collector receives raw [Snowplow][snowplow] events sent over HTTP by trackers or webhooks. It serializes them to a [Thrift][thrift] record format, and then writes them to one of supported sinks like [Amazon Kinesis][kinesis], [Google PubSub][pubsub], [Apache Kafka][kafka], [Amazon SQS][sqs], [NSQ][nsq]. +The Stream Collector supports cross-domain Snowplow deployments, setting a user_id (used to identify unique visitors) server side to reliably identify the same user across domains. ## Find out more @@ -29,7 +26,8 @@ Licensed under the [Snowplow Limited Use License Agreement][license]. 
_(If you a [thrift]: http://thrift.apache.org [kinesis]: http://aws.amazon.com/kinesis -[akka-http]: http://doc.akka.io/docs/akka-http/current/scala/http/introduction.html +[pubsub]: https://cloud.google.com/pubsub/ +[sqs]: https://aws.amazon.com/sqs/ [nsq]: http://nsq.io/ [techdocs-image]: https://d3i6fms1cm1j0i.cloudfront.net/github/images/techdocs.png diff --git a/build.sbt b/build.sbt index 8cd3c74b9..c93c7a6f7 100644 --- a/build.sbt +++ b/build.sbt @@ -8,128 +8,18 @@ * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ -import com.typesafe.sbt.packager.docker._ -import sbtbuildinfo.BuildInfoPlugin.autoImport._ - -lazy val commonDependencies = Seq( - // Java - Dependencies.Libraries.thrift, - Dependencies.Libraries.jodaTime, - Dependencies.Libraries.slf4j, - Dependencies.Libraries.log4jOverSlf4j, - Dependencies.Libraries.config, - // Scala - Dependencies.Libraries.scopt, - Dependencies.Libraries.akkaStream, - Dependencies.Libraries.akkaHttp, - Dependencies.Libraries.akkaStream, - Dependencies.Libraries.akkaSlf4j, - Dependencies.Libraries.akkaHttpMetrics, - Dependencies.Libraries.jnrUnixsocket, - Dependencies.Libraries.badRows, - Dependencies.Libraries.collectorPayload, - Dependencies.Libraries.pureconfig, - Dependencies.Libraries.Legacy.trackerCore, - Dependencies.Libraries.Legacy.trackerEmitterId, - // Unit tests - Dependencies.Libraries.akkaTestkit, - Dependencies.Libraries.akkaHttpTestkit, - Dependencies.Libraries.akkaStreamTestkit, - Dependencies.Libraries.specs2, - // Integration tests - Dependencies.Libraries.Legacy.testcontainers, - Dependencies.Libraries.Legacy.http4sClient, - Dependencies.Libraries.Legacy.catsRetry -) - -lazy val commonExclusions = Seq( - "org.apache.tomcat.embed" % "tomcat-embed-core", // exclude for security vulnerabilities introduced by libthrift - // Avoid duplicate .proto files brought in by akka and google-cloud-pubsub. - // We don't need any akka serializers because collector runs in a single JVM. - "com.typesafe.akka" % "akka-protobuf-v3_2.12" -) - -lazy val buildInfoSettings = Seq( - buildInfoPackage := "com.snowplowanalytics.snowplow.collectors.scalastream.generated", - buildInfoKeys := Seq[BuildInfoKey](organization, moduleName, name, version, "shortName" -> "ssc", scalaVersion) -) - -// Make package (build) metadata available within source code for integration tests. 
-lazy val scalifiedSettings = Seq( - IntegrationTest / sourceGenerators += Def.task { - val file = (IntegrationTest / sourceManaged).value / "settings.scala" - IO.write( - file, - """package %s - |object ProjectMetadata { - | val organization = "%s" - | val name = "%s" - | val version = "%s" - | val dockerTag = "%s" - |} - |""" - .stripMargin - .format( - buildInfoPackage.value, - organization.value, - name.value, - version.value, - dockerAlias.value.tag.get - ) - ) - Seq(file) - }.taskValue -) - -lazy val buildSettings = Seq( - organization := "com.snowplowanalytics", - name := "snowplow-stream-collector", - description := "Scala Stream Collector for Snowplow raw events", - scalaVersion := "2.12.10", - scalacOptions ++= Seq("-Ypartial-unification", "-Ywarn-macros:after"), - javacOptions := Seq("-source", "11", "-target", "11"), - resolvers ++= Dependencies.resolutionRepos -) - -lazy val dynVerSettings = Seq( - ThisBuild / dynverVTagPrefix := false, // Otherwise git tags required to have v-prefix - ThisBuild / dynverSeparator := "-" // to be compatible with docker -) - -lazy val http4sBuildInfoSettings = Seq( - buildInfoKeys := Seq[BuildInfoKey](name, moduleName, dockerAlias, version), - buildInfoOptions += BuildInfoOption.Traits("com.snowplowanalytics.snowplow.collector.core.AppInfo") -) - -lazy val allSettings = buildSettings ++ - BuildSettings.sbtAssemblySettings ++ - BuildSettings.formatting ++ - Seq(excludeDependencies ++= commonExclusions) ++ - dynVerSettings ++ - BuildSettings.addExampleConfToTestCp - + lazy val root = project .in(file(".")) - .settings(buildSettings ++ dynVerSettings) - .aggregate(core, kinesis, pubsub, kafka, nsq, stdout, sqs, rabbitmq, http4s) + .aggregate(kinesis, pubsub, kafka, nsq, stdout, sqs, core) lazy val core = project - .settings(moduleName := "snowplow-stream-collector-core") - .settings(buildSettings ++ BuildSettings.sbtAssemblySettings) - .settings(libraryDependencies ++= commonDependencies) - .settings(excludeDependencies ++= commonExclusions) - .settings(Defaults.itSettings) - .configs(IntegrationTest) - -lazy val http4s = project .settings(moduleName := "snowplow-stream-collector-http4s-core") - .settings(buildSettings ++ BuildSettings.sbtAssemblySettings) + .settings(BuildSettings.coreHttp4sSettings) .settings( libraryDependencies ++= Seq( Dependencies.Libraries.http4sDsl, - Dependencies.Libraries.http4sEmber, Dependencies.Libraries.http4sBlaze, - Dependencies.Libraries.http4sNetty, Dependencies.Libraries.http4sClient, Dependencies.Libraries.log4cats, Dependencies.Libraries.thrift, @@ -146,192 +36,88 @@ lazy val http4s = project Dependencies.Libraries.ceTestkit, //Integration tests - Dependencies.Libraries.IT.testcontainers, - Dependencies.Libraries.IT.http4sClient, - Dependencies.Libraries.IT.catsRetry + Dependencies.Libraries.IntegrationTests.testcontainers, + Dependencies.Libraries.IntegrationTests.http4sClient, + Dependencies.Libraries.IntegrationTests.catsRetry ) ) - .settings(Defaults.itSettings) .configs(IntegrationTest) -lazy val kinesisSettings = - allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Defaults.itSettings ++ scalifiedSettings ++ Seq( - moduleName := "snowplow-stream-collector-kinesis", - buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream", - Docker / packageName := "scala-stream-collector-kinesis", - libraryDependencies ++= Seq( - Dependencies.Libraries.catsRetry, - Dependencies.Libraries.kinesis, - Dependencies.Libraries.sts, - Dependencies.Libraries.sqs, - // integration tests 
dependencies - Dependencies.Libraries.IT.specs2, - Dependencies.Libraries.IT.specs2CE, - ), - IntegrationTest / test := (IntegrationTest / test).dependsOn(Docker / publishLocal).value, - IntegrationTest / testOnly := (IntegrationTest / testOnly).dependsOn(Docker / publishLocal).evaluated - ) - lazy val kinesis = project - .settings(kinesisSettings) + .settings(BuildSettings.kinesisSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile;it->it") + .dependsOn(core % "test->test;compile->compile;it->it") .configs(IntegrationTest) lazy val kinesisDistroless = project .in(file("distroless/kinesis")) .settings(sourceDirectory := (kinesis / sourceDirectory).value) - .settings(kinesisSettings) + .settings(BuildSettings.kinesisSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile;it->it") + .dependsOn(core % "test->test;compile->compile;it->it") .configs(IntegrationTest) -lazy val sqsSettings = - allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Seq( - moduleName := "snowplow-stream-collector-sqs", - buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream", - Docker / packageName := "scala-stream-collector-sqs", - libraryDependencies ++= Seq( - Dependencies.Libraries.catsRetry, - Dependencies.Libraries.sqs, - Dependencies.Libraries.sts, - ) - ) - lazy val sqs = project - .settings(sqsSettings) + .settings(BuildSettings.sqsSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile") + .dependsOn(core % "test->test;compile->compile") lazy val sqsDistroless = project .in(file("distroless/sqs")) .settings(sourceDirectory := (sqs / sourceDirectory).value) - .settings(sqsSettings) + .settings(BuildSettings.sqsSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile") - -lazy val pubsubSettings = - allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Defaults.itSettings ++ scalifiedSettings ++ Seq( - moduleName := "snowplow-stream-collector-google-pubsub", - buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream", - Docker / packageName := "scala-stream-collector-pubsub", - libraryDependencies ++= Seq( - Dependencies.Libraries.catsRetry, - Dependencies.Libraries.fs2PubSub, - // integration tests dependencies - Dependencies.Libraries.IT.specs2, - Dependencies.Libraries.IT.specs2CE, - ), - IntegrationTest / test := (IntegrationTest / test).dependsOn(Docker / publishLocal).value, - IntegrationTest / testOnly := (IntegrationTest / testOnly).dependsOn(Docker / publishLocal).evaluated - ) + .dependsOn(core % "test->test;compile->compile") lazy val pubsub = project - .settings(pubsubSettings) + .settings(BuildSettings.pubsubSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile;it->it") + .dependsOn(core % "test->test;compile->compile;it->it") .configs(IntegrationTest) lazy val pubsubDistroless = project .in(file("distroless/pubsub")) .settings(sourceDirectory := (pubsub / sourceDirectory).value) - .settings(pubsubSettings) + .settings(BuildSettings.pubsubSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile;it->it") + .dependsOn(core % 
"test->test;compile->compile;it->it") .configs(IntegrationTest) -lazy val kafkaSettings = - allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Defaults.itSettings ++ Seq( - moduleName := "snowplow-stream-collector-kafka", - buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream", - Docker / packageName := "scala-stream-collector-kafka", - libraryDependencies ++= Seq( - Dependencies.Libraries.kafkaClients, - Dependencies.Libraries.mskAuth, - // integration tests dependencies - Dependencies.Libraries.IT.specs2, - Dependencies.Libraries.IT.specs2CE - ), - IntegrationTest / test := (IntegrationTest / test).dependsOn(Docker / publishLocal).value, - IntegrationTest / testOnly := (IntegrationTest / testOnly).dependsOn(Docker / publishLocal).evaluated - ) - lazy val kafka = project - .settings(kafkaSettings) + .settings(BuildSettings.kafkaSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile;it->it") + .dependsOn(core % "test->test;compile->compile;it->it") .configs(IntegrationTest) - lazy val kafkaDistroless = project .in(file("distroless/kafka")) .settings(sourceDirectory := (kafka / sourceDirectory).value) - .settings(kafkaSettings) + .settings(BuildSettings.kafkaSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile;it->it") + .dependsOn(core % "test->test;compile->compile;it->it") .configs(IntegrationTest) - -lazy val nsqSettings = - allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Seq( - moduleName := "snowplow-stream-collector-nsq", - Docker / packageName := "scala-stream-collector-nsq", - buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream", - libraryDependencies ++= Seq( - Dependencies.Libraries.nsqClient, - Dependencies.Libraries.jackson, - Dependencies.Libraries.nettyAll, - Dependencies.Libraries.log4j - ) - ) - lazy val nsq = project - .settings(nsqSettings) + .settings(BuildSettings.nsqSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile") + .dependsOn(core % "test->test;compile->compile") lazy val nsqDistroless = project .in(file("distroless/nsq")) .settings(sourceDirectory := (nsq / sourceDirectory).value) - .settings(nsqSettings) + .settings(BuildSettings.nsqSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile") - -lazy val stdoutSettings = - allSettings ++ buildInfoSettings ++ http4sBuildInfoSettings ++ Seq( - moduleName := "snowplow-stream-collector-stdout", - buildInfoPackage := s"com.snowplowanalytics.snowplow.collector.stdout", - Docker / packageName := "scala-stream-collector-stdout" - ) + .dependsOn(core % "test->test;compile->compile") lazy val stdout = project - .settings(stdoutSettings) + .settings(BuildSettings.stdoutSettings) .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile") + .dependsOn(core % "test->test;compile->compile") lazy val stdoutDistroless = project .in(file("distroless/stdout")) .settings(sourceDirectory := (stdout / sourceDirectory).value) - .settings(stdoutSettings) + .settings(BuildSettings.stdoutSettings) .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(http4s % "test->test;compile->compile") - -lazy val rabbitmqSettings = - 
allSettings ++ buildInfoSettings ++ Seq( - moduleName := "snowplow-stream-collector-rabbitmq", - Docker / packageName := "scala-stream-collector-rabbitmq-experimental", - libraryDependencies ++= Seq(Dependencies.Libraries.rabbitMQ) - ) - -lazy val rabbitmq = project - .settings(rabbitmqSettings) - .enablePlugins(JavaAppPackaging, SnowplowDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") - -lazy val rabbitmqDistroless = project - .in(file("distroless/rabbitmq")) - .settings(sourceDirectory := (rabbitmq / sourceDirectory).value) - .settings(rabbitmqSettings) - .enablePlugins(JavaAppPackaging, SnowplowDistrolessDockerPlugin, BuildInfoPlugin) - .dependsOn(core % "test->test;compile->compile") + .dependsOn(core % "test->test;compile->compile") \ No newline at end of file diff --git a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala index 2feb1dae4..e7d1d613a 100644 --- a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala +++ b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala @@ -10,21 +10,14 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it -import scala.concurrent.ExecutionContext - +import cats.effect.{IO, Resource} import cats.implicits._ - -import cats.effect.{ContextShift, IO, Resource} - -import org.http4s.{Request, Response, Status} +import org.http4s.blaze.client.BlazeClientBuilder import org.http4s.client.Client -import org.http4s.client.blaze.BlazeClientBuilder +import org.http4s.{Request, Response, Status} object Http { - private val executionContext = ExecutionContext.global - implicit val ioContextShift: ContextShift[IO] = IO.contextShift(executionContext) - def statuses(requests: List[Request[IO]]): IO[List[Status]] = mkClient.use { client => requests.traverse(client.status) } @@ -38,5 +31,5 @@ object Http { mkClient.use(c => requests.traverse(r => c.run(r).use(resp => IO.pure(resp)))) def mkClient: Resource[IO, Client[IO]] = - BlazeClientBuilder[IO](executionContext).resource + BlazeClientBuilder.apply[IO].resource } diff --git a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala index bfefaafba..485836c1e 100644 --- a/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala +++ b/core/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala @@ -11,7 +11,6 @@ package com.snowplowanalytics.snowplow.collectors.scalastream.it import scala.concurrent.duration._ -import scala.concurrent.ExecutionContext import org.apache.thrift.TDeserializer @@ -24,7 +23,7 @@ import io.circe.{Json, parser} import cats.implicits._ -import cats.effect.{IO, Timer} +import cats.effect.IO import retry.syntax.all._ import retry.RetryPolicies @@ -38,9 +37,6 @@ import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPa object utils { - private val executionContext: ExecutionContext = ExecutionContext.global - implicit val ioTimer: Timer[IO] = IO.timer(executionContext) - def parseCollectorPayload(bytes: Array[Byte]): CollectorPayload = { val deserializer = new TDeserializer() val target = new CollectorPayload() @@ -94,7 +90,7 @@ object utils { ) IO(condition(a)).retryingOnFailures( - _ == false, + result => IO(!result), retryPolicy, (_, _) => IO.unit ) diff --git 
a/core/src/main/resources/reference.conf b/core/src/main/resources/reference.conf index afb37264d..96dfd594f 100644 --- a/core/src/main/resources/reference.conf +++ b/core/src/main/resources/reference.conf @@ -1,96 +1,94 @@ -enableDefaultRedirect = false -redirectDomains = [] -terminationDeadline = 10.seconds -preTerminationPeriod = 10.seconds -preTerminationUnhealthy = false +{ + paths {} -paths { - -} + p3p { + policyRef = "/w3c/p3p.xml" + CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" + } -crossDomain { + crossDomain { enabled = false domains = [ "*" ] secure = true -} - -cookieBounce { - enabled = false - name = "n3pc" - fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000" -} + } -cookie { + cookie { enabled = true expiration = 365 days + domains = [] name = sp secure = true httpOnly = true sameSite = "None" -} + } -doNotTrackCookie { + doNotTrackCookie { enabled = false name = "" value = "" -} + } -ssl { - enable = false - redirect = false - port = 443 -} + cookieBounce { + enabled = false + name = "n3pc" + fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000" + } -p3p { - policyRef = "/w3c/p3p.xml" - CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" -} + redirectMacro { + enabled = false + } -rootResponse { + rootResponse { enabled = false statusCode = 302 headers = {} body = "" -} - + } -redirectMacro { - enabled = false -} + cors { + accessControlMaxAge = 60 minutes + } -cors { - accessControlMaxAge = 60 minutes -} + streams { + useIpAddressAsPartitionKey = false + } -telemetry { + telemetry { disable = false interval = 60 minutes method = POST url = telemetry-g.snowplowanalytics.com port = 443 secure = true -} - -monitoring.metrics.statsd { - enabled = false - # StatsD metric reporting protocol configuration - hostname = localhost - port = 8125 - # Required, how frequently to report metrics - period = "10 seconds" - # Optional, override the default metric prefix - # "prefix": "snowplow.collector" -} + } -streams { - useIpAddressAsPartitionKey = false -} + monitoring { + metrics { + statsd { + enabled = false + hostname = localhost + port = 8125 + period = 10 seconds + prefix = snowplow.collector + } + } + } -experimental { - warmup { + ssl { enable = false - numRequests = 2000 - maxConnections = 2000 - maxCycles = 3 + redirect = false + port = 443 + } + + networking { + maxConnections = 1024 + idleTimeout = 610 seconds } + + enableDefaultRedirect = false + preTerminationPeriod = 10 seconds + + redirectDomains = [] + + preTerminationPeriod = 10 seconds } diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala similarity index 68% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala rename to core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala index 22ee2e25f..4a62ea463 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/App.scala @@ -1,3 +1,13 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
+ */ package com.snowplowanalytics.snowplow.collector.core import cats.effect.{ExitCode, IO} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala new file mode 100644 index 000000000..9c9a67a3b --- /dev/null +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala @@ -0,0 +1,18 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collector.core + +trait AppInfo { + def name: String + def moduleName: String + def version: String + def dockerAlias: String +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala similarity index 92% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala rename to core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala index 86567becc..30ec5d0b3 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala @@ -1,3 +1,13 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collector.core import scala.concurrent.duration._ diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala similarity index 79% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala rename to core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala index c2960ba8d..8167bf257 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParser.scala @@ -1,25 +1,30 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
+ */ package com.snowplowanalytics.snowplow.collector.core import java.nio.file.{Files, Path} - import org.typelevel.log4cats.Logger import org.typelevel.log4cats.slf4j.Slf4jLogger - -import com.typesafe.config.{Config => TypesafeConfig, ConfigFactory} - -import scala.collection.JavaConverters._ - +import com.typesafe.config.{ConfigFactory, Config => TypesafeConfig} import io.circe.Decoder import io.circe.config.syntax.CirceConfigOps - import cats.implicits._ import cats.data.EitherT - import cats.effect.{ExitCode, Sync} +import scala.jdk.CollectionConverters._ + object ConfigParser { - implicit private def logger[F[_]: Sync] = Slf4jLogger.getLogger[F] + implicit private def logger[F[_]: Sync]: Logger[F] = Slf4jLogger.getLogger[F] def fromPath[F[_]: Sync, SinkConfig: Decoder]( configPath: Option[Path] diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala new file mode 100644 index 000000000..e31f9dd40 --- /dev/null +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala @@ -0,0 +1,55 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collector.core + +import cats.effect.{Async, Resource} +import cats.implicits._ +import org.http4s.HttpApp +import org.http4s.blaze.server.BlazeServerBuilder +import org.http4s.server.Server +import org.typelevel.log4cats.Logger +import org.typelevel.log4cats.slf4j.Slf4jLogger + +import java.net.InetSocketAddress +import javax.net.ssl.SSLContext + +object HttpServer { + + implicit private def logger[F[_]: Async]: Logger[F] = Slf4jLogger.getLogger[F] + + def build[F[_]: Async]( + app: HttpApp[F], + port: Int, + secure: Boolean, + networking: Config.Networking + ): Resource[F, Server] = + buildBlazeServer[F](app, port, secure, networking) + + private def buildBlazeServer[F[_]: Async]( + app: HttpApp[F], + port: Int, + secure: Boolean, + networking: Config.Networking + ): Resource[F, Server] = + Resource.eval(Logger[F].info("Building blaze server")) >> + BlazeServerBuilder[F] + .bindSocketAddress(new InetSocketAddress(port)) + .withHttpApp(app) + .withIdleTimeout(networking.idleTimeout) + .withMaxConnections(networking.maxConnections) + .cond(secure, _.withSslContext(SSLContext.getDefault)) + .resource + + implicit class ConditionalAction[A](item: A) { + def cond(cond: Boolean, action: A => A): A = + if (cond) action(item) else item + } +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala similarity index 83% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala rename to core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala index d8e053c9d..98518523c 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala @@ -1,3 +1,13 @@ +/** + * Copyright (c) 2013-present Snowplow 
Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collector.core import cats.implicits._ diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala similarity index 87% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala rename to core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 944785107..97cfbaef6 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -1,3 +1,13 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collector.core import java.nio.file.Path @@ -29,7 +39,7 @@ object Run { type TelemetryInfo[F[_], SinkConfig] = Config.Streams[SinkConfig] => F[Telemetry.TelemetryInfo] - implicit private def logger[F[_]: Sync] = Slf4jLogger.getLogger[F] + implicit private def logger[F[_]: Sync]: Logger[F] = Slf4jLogger.getLogger[F] def fromCli[F[_]: Async: Tracking, SinkConfig: Decoder]( appInfo: AppInfo, @@ -72,7 +82,6 @@ object Run { ) httpServer = HttpServer.build[F]( new Routes[F](config.enableDefaultRedirect, collectorService).value, - config.interface, if (config.ssl.enable) config.ssl.port else config.port, config.ssl.enable, config.networking diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala similarity index 96% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala rename to core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala index 4b33498c8..7c79f16d9 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala @@ -1,3 +1,13 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
+ */ package com.snowplowanalytics.snowplow.collector.core import java.util.UUID @@ -5,7 +15,7 @@ import java.util.UUID import org.apache.commons.codec.binary.Base64 import scala.concurrent.duration._ -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import cats.effect.{Clock, Sync} import cats.implicits._ diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala new file mode 100644 index 000000000..089ab72e4 --- /dev/null +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala @@ -0,0 +1,21 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ +package com.snowplowanalytics.snowplow.collector.core + +trait Sink[F[_]] { + + // Maximum number of bytes that a single record can contain. + // If a record is bigger, a size violation bad row is emitted instead + val maxBytes: Int + + def isHealthy: F[Boolean] + def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] +} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala similarity index 93% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala rename to core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala index f7114be0e..a35d76626 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatch.scala @@ -1,3 +1,13 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collector.core import java.nio.ByteBuffer diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala similarity index 91% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala rename to core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala index a222c4208..9f0f10828 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Telemetry.scala @@ -1,3 +1,13 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. 
+ * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collector.core import org.typelevel.log4cats.Logger diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala similarity index 65% rename from http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala rename to core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala index 1a998715f..0b1d6e8d0 100644 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/model.scala @@ -1,3 +1,13 @@ +/** + * Copyright (c) 2013-present Snowplow Analytics Ltd. + * All rights reserved. + * + * This software is made available by Snowplow Analytics, Ltd., + * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 + * located at https://docs.snowplow.io/limited-use-license-1.0 + * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION + * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. + */ package com.snowplowanalytics.snowplow.collector.core import io.circe.Json diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Collector.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Collector.scala deleted file mode 100644 index 81e6af06d..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Collector.scala +++ /dev/null @@ -1,241 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import java.io.File -import javax.net.ssl.SSLContext -import org.slf4j.LoggerFactory -import akka.actor.ActorSystem -import akka.http.scaladsl.{ConnectionContext, Http} -import akka.http.scaladsl.model.StatusCodes -import akka.http.scaladsl.server.Route -import akka.http.scaladsl.server.Directives._ -import com.typesafe.config.{Config, ConfigFactory} -import pureconfig.ConfigSource -import com.timgroup.statsd.NonBlockingStatsDClientBuilder -import fr.davit.akka.http.metrics.core.HttpMetricsRegistry -import fr.davit.akka.http.metrics.core.HttpMetrics._ -import fr.davit.akka.http.metrics.datadog.{DatadogRegistry, DatadogSettings} -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.Sink -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService - -import scala.concurrent.{Await, ExecutionContext, Future} -import scala.concurrent.duration.Duration -import scala.util.control.NonFatal - -// Main entry point of the Scala collector. 
-trait Collector { - - def appName: String - - def scalaVersion: String - - def appVersion: String - - lazy val log = LoggerFactory.getLogger(getClass()) - - // Used as an option prefix when reading system properties. - val Namespace = "collector" - - /** Optionally give precedence to configs wrapped in a "snowplow" block. To help avoid polluting system namespace */ - private def namespaced(config: Config): Config = - if (config.hasPath(Namespace)) - config.getConfig(Namespace).withFallback(config.withoutPath(Namespace)) - else - config - - def parseConfig(args: Array[String]): (CollectorConfig, Config) = { - case class FileConfig(config: File = new File(".")) - - val parser = new scopt.OptionParser[FileConfig](appName) { - head(appName, appVersion) - help("help") - version("version") - opt[File]("config") - .optional() - .valueName("") - .action((f: File, c: FileConfig) => c.copy(f)) - .validate(f => - if (f.exists) success - else failure(s"Configuration file $f does not exist") - ) - } - - val resolved = parser.parse(args, FileConfig()) match { - case Some(c) => ConfigFactory.parseFile(c.config).resolve() - case None => ConfigFactory.empty() - } - - val conf = namespaced(ConfigFactory.load(namespaced(resolved.withFallback(namespaced(ConfigFactory.load()))))) - - (ConfigSource.fromConfig(conf).loadOrThrow[CollectorConfig], conf) - } - - def run( - collectorConf: CollectorConfig, - akkaConf: Config, - sinks: CollectorSinks, - telemetry: TelemetryAkkaService - ): Unit = { - - implicit val system = ActorSystem.create("scala-stream-collector", akkaConf) - implicit val executionContext = system.dispatcher - - telemetry.start() - - val health = new HealthService.Settable - - val collectorRoute = new CollectorRoute { - override def collectorService = new CollectorService(collectorConf, sinks, appName, appVersion) - override def healthService = health - } - - lazy val redirectRoutes = - scheme("http") { - redirectToHttps(collectorConf.ssl.port) - } - - def redirectToHttps(port: Int) = - extract(_.request.uri) { uri => - redirect( - uri.withScheme("https").withPort(port), - StatusCodes.MovedPermanently - ) - } - - def xForwardedProto(routes: Route): Route = - if (collectorConf.ssl.redirect) - optionalHeaderValueByName("X-Forwarded-Proto") { - case Some(clientProtocol) if clientProtocol == "http" => - redirectToHttps(0) - case _ => routes - } - else - routes - - def shutdownHook(binding: Http.ServerBinding): Http.ServerBinding = - binding.addToCoordinatedShutdown(collectorConf.terminationDeadline) - def startupHook(binding: Http.ServerBinding): Unit = log.info(s"REST interface bound to ${binding.localAddress}") - def errorHook(ex: Throwable): Unit = log.error( - "REST interface could not be bound to " + - s"${collectorConf.interface}:${collectorConf.port}", - ex.getMessage - ) - - lazy val metricRegistry: Option[HttpMetricsRegistry] = collectorConf.monitoring.metrics.statsd match { - case StatsdConfig(true, hostname, port, period, prefix, tags) => - val constantTags = tags.map { case (k: String, v: String) => s"${k}:${v}" } - - Some( - DatadogRegistry( - client = new NonBlockingStatsDClientBuilder() - .hostname(hostname) - .port(port) - .enableAggregation(true) - .aggregationFlushInterval(period.toMillis.toInt) - .enableTelemetry(false) - .constantTags(constantTags.toArray: _*) - .build(), - DatadogSettings - .default - .withNamespace(prefix) - .withIncludeMethodDimension(true) - .withIncludeStatusDimension(true) - ) - ) - case _ => None - } - - def secureEndpoint(metricRegistry: 
Option[HttpMetricsRegistry]): Future[Unit] = - endpoint(xForwardedProto(collectorRoute.collectorRoute), collectorConf.ssl.port, true, metricRegistry) - - def unsecureEndpoint(routes: Route, metricRegistry: Option[HttpMetricsRegistry]): Future[Unit] = - endpoint(xForwardedProto(routes), collectorConf.port, false, metricRegistry).flatMap { _ => - Warmup.run(collectorConf.interface, collectorConf.port, collectorConf.experimental.warmup) - } - - def endpoint( - routes: Route, - port: Int, - secure: Boolean, - metricRegistry: Option[HttpMetricsRegistry] - ): Future[Unit] = - metricRegistry match { - case Some(r) => - val builder = Http().newMeteredServerAt(collectorConf.interface, port, r) - val stage = if (secure) builder.enableHttps(ConnectionContext.httpsServer(SSLContext.getDefault)) else builder - stage.bind(routes).map(shutdownHook).map(startupHook).recover { - case ex => errorHook(ex) - } - case None => - val builder = Http().newServerAt(collectorConf.interface, port) - val stage = if (secure) builder.enableHttps(ConnectionContext.httpsServer(SSLContext.getDefault)) else builder - stage.bind(routes).map(shutdownHook).map(startupHook).recover { - case ex => errorHook(ex) - } - } - - val binds = collectorConf.ssl match { - case SSLConfig(true, true, _) => - List( - unsecureEndpoint(redirectRoutes, metricRegistry), - secureEndpoint(metricRegistry) - ) - case SSLConfig(true, false, _) => - List( - unsecureEndpoint(collectorRoute.collectorRoute, metricRegistry), - secureEndpoint(metricRegistry) - ) - case _ => - List(unsecureEndpoint(collectorRoute.collectorRoute, metricRegistry)) - } - - Future.sequence(binds).foreach { _ => - Runtime - .getRuntime - .addShutdownHook(new Thread(() => { - log.warn("Received shutdown signal") - if (collectorConf.preTerminationUnhealthy) { - log.warn("Setting health endpoint to unhealthy") - health.toUnhealthy() - } - log.warn(s"Sleeping for ${collectorConf.preTerminationPeriod}") - Thread.sleep(collectorConf.preTerminationPeriod.toMillis) - log.warn("Initiating http server termination") - try { - // The actor system is already configured to shutdown within `terminationDeadline` so we await for double that. - Await.result(system.terminate(), collectorConf.terminationDeadline * 2) - log.warn("Server terminated") - } catch { - case NonFatal(t) => - log.error("Caught exception awaiting server termination", t) - } - val shutdowns = List(shutdownSink(sinks.good, "good"), shutdownSink(sinks.bad, "bad")) - Await.result(Future.sequence(shutdowns), Duration.Inf) - () - })) - - log.info("Setting health endpoint to healthy") - health.toHealthy() - } - } - - private def shutdownSink(sink: Sink, label: String)(implicit ec: ExecutionContext): Future[Unit] = - Future { - log.warn(s"Initiating $label sink shutdown") - sink.shutdown() - log.warn(s"Completed $label sink shutdown") - }.recover { - case NonFatal(t) => - log.error(s"Caught exception shutting down $label sink", t) - } -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoute.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoute.scala deleted file mode 100644 index de818bbf6..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRoute.scala +++ /dev/null @@ -1,203 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. 
- * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import akka.http.scaladsl.model._ -import akka.http.scaladsl.model.headers.HttpCookiePair -import akka.http.scaladsl.server.{Directive1, Route} -import akka.http.scaladsl.server.Directives._ - -import com.snowplowanalytics.snowplow.collectors.scalastream.model.DntCookieMatcher - -trait CollectorRoute { - def collectorService: Service - def healthService: HealthService - - private val headers = optionalHeaderValueByName("User-Agent") & - optionalHeaderValueByName("Referer") & - optionalHeaderValueByName("Raw-Request-URI") & - optionalHeaderValueByName("SP-Anonymous") - - private[scalastream] def extractors(spAnonymous: Option[String]) = - spAnonymous match { - case Some(_) => - extractHost & extractClientIP.map[RemoteAddress](_ => RemoteAddress.Unknown) & extractRequest - case _ => extractHost & extractClientIP & extractRequest - } - - def extractContentType: Directive1[ContentType] = - extractRequestContext.map(_.request.entity.contentType) - - def collectorRoute = - if (collectorService.enableDefaultRedirect) routes else rejectRedirect ~ routes - - def rejectRedirect: Route = - path("r" / Segment) { _ => - complete(StatusCodes.NotFound -> "redirects disabled") - } - - def routes: Route = - doNotTrack(collectorService.doNotTrackCookie) { dnt => - cookieIfWanted(collectorService.cookieName) { reqCookie => - val cookie = reqCookie.map(_.toCookie) - headers { (userAgent, refererURI, rawRequestURI, spAnonymous) => - val qs = queryString(rawRequestURI) - extractors(spAnonymous) { (host, ip, request) => - // get the adapter vendor and version from the path - path(Segment / Segment) { (vendor, version) => - val path = collectorService.determinePath(vendor, version) - post { - extractContentType { ct => - entity(as[String]) { body => - val r = collectorService.cookie( - qs, - Some(body), - path, - cookie, - userAgent, - refererURI, - host, - ip, - request, - pixelExpected = false, - doNotTrack = dnt, - Some(ct), - spAnonymous - ) - complete(r) - } - } - } ~ - (get | head) { - val r = collectorService.cookie( - qs, - None, - path, - cookie, - userAgent, - refererURI, - host, - ip, - request, - pixelExpected = true, - doNotTrack = dnt, - None, - spAnonymous - ) - complete(r) - } - } ~ - path("""ice\.png""".r | "i".r) { path => - (get | head) { - val r = collectorService.cookie( - qs, - None, - "/" + path, - cookie, - userAgent, - refererURI, - host, - ip, - request, - pixelExpected = true, - doNotTrack = dnt, - None, - spAnonymous - ) - complete(r) - } - } - } - } - } - } ~ corsRoute ~ healthRoute ~ sinkHealthRoute ~ crossDomainRoute ~ rootRoute ~ robotsRoute ~ { - complete(HttpResponse(404, entity = "404 not found")) - } - - /** - * Extract the query string from a request URI - * @param rawRequestURI URI optionally extracted from the Raw-Request-URI header - * @return the extracted query string or an empty string - */ - def queryString(rawRequestURI: Option[String]): Option[String] = { - val querystringExtractor = "^[^?]*\\?([^#]*)(?:#.*)?$".r - rawRequestURI.flatMap { - case querystringExtractor(qs) => Some(qs) - case _ => None - } - } - - /** - * Directive to extract a 
cookie if a cookie name is specified and if such a cookie exists - * @param name Optionally configured cookie name - */ - def cookieIfWanted(name: Option[String]): Directive1[Option[HttpCookiePair]] = name match { - case Some(n) => optionalCookie(n) - case None => optionalHeaderValue(_ => None) - } - - /** - * Directive to filter requests which contain a do not track cookie - * @param cookieMatcher the configured do not track cookie to check against - */ - def doNotTrack(cookieMatcher: Option[DntCookieMatcher]): Directive1[Boolean] = - cookieIfWanted(cookieMatcher.map(_.name)).map { c => - (c, cookieMatcher) match { - case (Some(actual), Some(config)) => config.matches(actual) - case _ => false - } - } - - private def crossDomainRoute: Route = get { - path("""crossdomain\.xml""".r) { _ => - complete(collectorService.flashCrossDomainPolicy) - } - } - - private def healthRoute: Route = get { - path("health".r) { _ => - if (healthService.isHealthy) - complete(HttpResponse(200, entity = "OK")) - else - complete(HttpResponse(503, entity = "Service Unavailable")) - } - } - - private def sinkHealthRoute: Route = get { - path("sink-health".r) { _ => - complete( - if (collectorService.sinksHealthy) { - HttpResponse(200, entity = "OK") - } else { - HttpResponse(503, entity = "Service Unavailable") - } - ) - } - } - - private def corsRoute: Route = options { - extractRequest { request => - complete(collectorService.preflightResponse(request)) - } - } - - private def rootRoute: Route = get { - pathSingleSlash { - complete(collectorService.rootResponse) - } - } - - private def robotsRoute: Route = get { - path("robots.txt".r) { _ => - complete(HttpResponse(200, entity = "User-agent: *\nDisallow: /")) - } - } -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala deleted file mode 100644 index d9b457b81..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorService.scala +++ /dev/null @@ -1,497 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import java.net.{MalformedURLException, URL} -import java.nio.charset.StandardCharsets.UTF_8 -import java.time.Instant -import java.util.UUID -import org.apache.commons.codec.binary.Base64 -import org.slf4j.LoggerFactory - -import scala.collection.JavaConverters._ - -import akka.http.scaladsl.model._ -import akka.http.scaladsl.model.headers._ -import akka.http.scaladsl.model.headers.CacheDirectives._ -import cats.data.NonEmptyList -import cats.implicits._ - -import com.snowplowanalytics.snowplow.badrows._ -import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.utils.SplitBatch - -/** - * Service responding to HTTP requests, mainly setting a cookie identifying the user and storing - * events - */ -trait Service { - def preflightResponse(req: HttpRequest): HttpResponse - def flashCrossDomainPolicy: HttpResponse - def rootResponse: HttpResponse - def cookie( - queryString: Option[String], - body: Option[String], - path: String, - cookie: Option[HttpCookie], - userAgent: Option[String], - refererUri: Option[String], - hostname: String, - ip: RemoteAddress, - request: HttpRequest, - pixelExpected: Boolean, - doNotTrack: Boolean, - contentType: Option[ContentType] = None, - spAnonymous: Option[String] = None - ): HttpResponse - def cookieName: Option[String] - def doNotTrackCookie: Option[DntCookieMatcher] - def determinePath(vendor: String, version: String): String - def enableDefaultRedirect: Boolean - def sinksHealthy: Boolean -} - -object CollectorService { - // Contains an invisible pixel to return for `/i` requests. - val pixel = Base64.decodeBase64("R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==") -} - -class CollectorService( - config: CollectorConfig, - sinks: CollectorSinks, - appName: String, - appVersion: String -) extends Service { - - private val logger = LoggerFactory.getLogger(getClass) - val splitBatch: SplitBatch = SplitBatch(appName, appVersion) - - private val collector = s"$appName-$appVersion-" + - config.streams.sink.getClass.getSimpleName.toLowerCase - - override val cookieName = config.cookieName - override val doNotTrackCookie = config.doNotTrackHttpCookie - override val enableDefaultRedirect = config.enableDefaultRedirect - override def sinksHealthy = sinks.good.isHealthy && sinks.bad.isHealthy - - private val spAnonymousNuid = "00000000-0000-0000-0000-000000000000" - - /** - * Determines the path to be used in the response, - * based on whether a mapping can be found in the config for the original request path. 
- */ - override def determinePath(vendor: String, version: String): String = { - val original = s"/$vendor/$version" - config.paths.getOrElse(original, original) - } - - override def cookie( - queryString: Option[String], - body: Option[String], - path: String, - cookie: Option[HttpCookie], - userAgent: Option[String], - refererUri: Option[String], - hostname: String, - ip: RemoteAddress, - request: HttpRequest, - pixelExpected: Boolean, - doNotTrack: Boolean, - contentType: Option[ContentType] = None, - spAnonymous: Option[String] - ): HttpResponse = { - val (ipAddress, partitionKey) = ipAndPartitionKey(ip, config.streams.useIpAddressAsPartitionKey) - - extractQueryParams(queryString) match { - case Right(params) => - val redirect = path.startsWith("/r/") - - val nuidOpt = networkUserId(request, cookie, spAnonymous) - val bouncing = params.contains(config.cookieBounce.name) - // we bounce if it's enabled and we couldn't retrieve the nuid and we're not already bouncing - val bounce = config.cookieBounce.enabled && nuidOpt.isEmpty && !bouncing && - pixelExpected && !redirect - val nuid = nuidOpt.getOrElse { - if (bouncing) config.cookieBounce.fallbackNetworkUserId - else UUID.randomUUID().toString - } - - val ct = contentType.map(_.value.toLowerCase) - val event = - buildEvent( - queryString, - body, - path, - userAgent, - refererUri, - hostname, - ipAddress, - request, - nuid, - ct, - spAnonymous - ) - // we don't store events in case we're bouncing - if (!bounce && !doNotTrack) sinkEvent(event, partitionKey) - - val headers = bounceLocationHeader(params, request, config.cookieBounce, bounce) ++ - cookieHeader(request, config.cookieConfig, nuid, doNotTrack, spAnonymous) ++ - cacheControl(pixelExpected) ++ - List( - RawHeader("P3P", "policyref=\"%s\", CP=\"%s\"".format(config.p3p.policyRef, config.p3p.CP)), - accessControlAllowOriginHeader(request), - `Access-Control-Allow-Credentials`(true) - ) - - buildHttpResponse(event, params, headers.toList, redirect, pixelExpected, bounce, config.redirectMacro) - - case Left(error) => - val badRow = BadRow.GenericError( - Processor(appName, appVersion), - Failure.GenericFailure(Instant.now(), NonEmptyList.one(error.getMessage)), - Payload.RawPayload(queryString.getOrElse("")) - ) - - if (sinks.bad.isHealthy) { - sinkBad(badRow, partitionKey) - HttpResponse(StatusCodes.OK) - } else HttpResponse(StatusCodes.OK) // if bad sink is unhealthy, we don't want to try storing the bad rows - } - } - - def extractQueryParams(qs: Option[String]): Either[IllegalUriException, Map[String, String]] = - Either.catchOnly[IllegalUriException] { Uri.Query(qs).toMap } - - /** - * Creates a response to the CORS preflight Options request - * @param request Incoming preflight Options request - * @return Response granting permissions to make the actual request - */ - override def preflightResponse(request: HttpRequest): HttpResponse = - preflightResponse(request, config.cors) - - def preflightResponse(request: HttpRequest, corsConfig: CORSConfig): HttpResponse = - HttpResponse().withHeaders( - List( - accessControlAllowOriginHeader(request), - `Access-Control-Allow-Credentials`(true), - `Access-Control-Allow-Headers`("Content-Type", "SP-Anonymous"), - `Access-Control-Max-Age`(corsConfig.accessControlMaxAge.toSeconds) - ) - ) - - override def flashCrossDomainPolicy: HttpResponse = - flashCrossDomainPolicy(config.crossDomain) - - /** Creates a response with a cross domain policiy file */ - def flashCrossDomainPolicy(config: CrossDomainConfig): HttpResponse = - if 
(config.enabled) { - HttpResponse(entity = HttpEntity( - contentType = ContentType(MediaTypes.`text/xml`, HttpCharsets.`ISO-8859-1`), - string = """<?xml version="1.0"?>""" + "\n<cross-domain-policy>\n" + - config - .domains - .map(d => s"""  <allow-access-from domain="$d" secure="${config.secure}" />""") - .mkString("\n") + - "\n</cross-domain-policy>" - ) - ) - } else { - HttpResponse(404, entity = "404 not found") - } - - override def rootResponse: HttpResponse = - rootResponse(config.rootResponse) - - def rootResponse(c: RootResponseConfig): HttpResponse = - if (c.enabled) { - val rawHeaders = c.headers.map { case (k, v) => RawHeader(k, v) }.toList - HttpResponse(c.statusCode, rawHeaders, HttpEntity(c.body)) - } else { - HttpResponse(404, entity = "404 not found") - } - - /** Builds a raw event from an Http request. */ - def buildEvent( - queryString: Option[String], - body: Option[String], - path: String, - userAgent: Option[String], - refererUri: Option[String], - hostname: String, - ipAddress: String, - request: HttpRequest, - networkUserId: String, - contentType: Option[String], - spAnonymous: Option[String] - ): CollectorPayload = { - val e = new CollectorPayload( - "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0", - ipAddress, - System.currentTimeMillis, - "UTF-8", - collector - ) - e.querystring = queryString.orNull - body.foreach(e.body = _) - e.path = path - userAgent.foreach(e.userAgent = _) - refererUri.foreach(e.refererUri = _) - e.hostname = hostname - e.networkUserId = networkUserId - e.headers = (headers(request, spAnonymous) ++ contentType).asJava - contentType.foreach(e.contentType = _) - e - } - - /** Produces the event to the configured sink. */ - def sinkEvent( - event: CollectorPayload, - partitionKey: String - ): Unit = { - // Split events into Good and Bad - val eventSplit = splitBatch.splitAndSerializePayload(event, sinks.good.maxBytes) - // Send events to respective sinks - sinks.good.storeRawEvents(eventSplit.good, partitionKey) - sinks.bad.storeRawEvents(eventSplit.bad, partitionKey) - } - - /** Sinks a bad row generated by an illegal querystring. */ - def sinkBad(badRow: BadRow, partitionKey: String): Unit = { - val toSink = List(badRow.compact.getBytes(UTF_8)) - sinks.bad.storeRawEvents(toSink, partitionKey) - } - - /** Builds the final http response from */ - def buildHttpResponse( - event: CollectorPayload, - queryParams: Map[String, String], - headers: List[HttpHeader], - redirect: Boolean, - pixelExpected: Boolean, - bounce: Boolean, - redirectMacroConfig: RedirectMacroConfig - ): HttpResponse = - if (redirect) { - val r = buildRedirectHttpResponse(event, queryParams, redirectMacroConfig) - r.withHeaders(r.headers ++ headers) - } else { - buildUsualHttpResponse(pixelExpected, bounce).withHeaders(headers) - } - - /** Builds the appropriate http response when not dealing with click redirects. */ - def buildUsualHttpResponse(pixelExpected: Boolean, bounce: Boolean): HttpResponse = - (pixelExpected, bounce) match { - case (true, true) => HttpResponse(StatusCodes.Found) - case (true, false) => - HttpResponse(entity = - HttpEntity(contentType = ContentType(MediaTypes.`image/gif`), bytes = CollectorService.pixel) - ) - // See https://github.com/snowplow/snowplow-javascript-tracker/issues/482 - case _ => HttpResponse(entity = "ok") - } - - /** Builds the appropriate http response when dealing with click redirects. 
*/ - def buildRedirectHttpResponse( - event: CollectorPayload, - queryParams: Map[String, String], - redirectMacroConfig: RedirectMacroConfig - ): HttpResponse = - queryParams.get("u") match { - case Some(target) if redirectTargetAllowed(target) => - val canReplace = redirectMacroConfig.enabled && event.isSetNetworkUserId - val token = redirectMacroConfig.placeholder.getOrElse(s"$${SP_NUID}") - val replacedTarget = - if (canReplace) target.replaceAllLiterally(token, event.networkUserId) - else target - HttpResponse(StatusCodes.Found).withHeaders(`RawHeader`("Location", replacedTarget)) - case _ => HttpResponse(StatusCodes.BadRequest) - } - - private def redirectTargetAllowed(target: String): Boolean = - if (config.redirectDomains.isEmpty) true - else { - try { - val url = Option(new URL(target).getHost) - config.redirectDomains.exists(url.contains) - } catch { - case _: MalformedURLException => false - } - } - - /** - * Builds a cookie header with the network user id as value. - * @param cookieConfig cookie configuration extracted from the collector configuration - * @param networkUserId value of the cookie - * @param doNotTrack whether do not track is enabled or not - * @return the build cookie wrapped in a header - */ - def cookieHeader( - request: HttpRequest, - cookieConfig: Option[CookieConfig], - networkUserId: String, - doNotTrack: Boolean, - spAnonymous: Option[String] - ): Option[HttpHeader] = - if (doNotTrack) { - None - } else { - spAnonymous match { - case Some(_) => None - case None => - cookieConfig.map { config => - val responseCookie = HttpCookie( - name = config.name, - value = networkUserId, - expires = Some(DateTime.now + config.expiration.toMillis), - domain = cookieDomain(request.headers, config.domains, config.fallbackDomain), - path = Some("/"), - secure = config.secure, - httpOnly = config.httpOnly, - extension = config.sameSite.map(value => s"SameSite=$value") - ) - `Set-Cookie`(responseCookie) - } - } - } - - /** Build a location header redirecting to itself to check if third-party cookies are blocked. */ - def bounceLocationHeader( - queryParams: Map[String, String], - request: HttpRequest, - bounceConfig: CookieBounceConfig, - bounce: Boolean - ): Option[HttpHeader] = - if (bounce) { - val forwardedScheme = for { - headerName <- bounceConfig.forwardedProtocolHeader - headerValue <- request.headers.find(_.lowercaseName == headerName.toLowerCase).map(_.value().toLowerCase()) - scheme <- if (Set("http", "https").contains(headerValue)) { - Some(headerValue) - } else { - logger.warn(s"Header $headerName contains invalid protocol value $headerValue.") - None - } - } yield scheme - - val redirectUri = request - .uri - .withQuery(Uri.Query(queryParams + (bounceConfig.name -> "true"))) - .withScheme(forwardedScheme.getOrElse(request.uri.scheme)) - - Some(`Location`(redirectUri)) - } else { - None - } - - /** If the SP-Anonymous header is not present, retrieves all headers - * from the request except Remote-Address and Raw-Request-URI. - * If the SP-Anonymous header is present, additionally filters out the - * X-Forwarded-For, X-Real-IP and Cookie headers as well. 
- */ - def headers(request: HttpRequest, spAnonymous: Option[String]): Seq[String] = - request.headers.flatMap { - case _: `Remote-Address` | _: `Raw-Request-URI` if spAnonymous.isEmpty => None - case _: `Remote-Address` | _: `Raw-Request-URI` | _: `X-Forwarded-For` | _: `X-Real-Ip` | _: `Cookie` - if spAnonymous.isDefined => - None - case other => Some(other.unsafeToString) // named "unsafe" because it might contain sensitive information - } - - /** If the pixel is requested, this attaches cache control headers to the response to prevent any caching. */ - def cacheControl(pixelExpected: Boolean): List[`Cache-Control`] = - if (pixelExpected) List(`Cache-Control`(`no-cache`, `no-store`, `must-revalidate`)) - else Nil - - /** - * Determines the cookie domain to be used by inspecting the Origin header of the request - * and trying to find a match in the list of domains specified in the config file. - * @param headers The headers from the http request. - * @param domains The list of cookie domains from the configuration. - * @param fallbackDomain The fallback domain from the configuration. - * @return The domain to be sent back in the response, unless no cookie domains are configured. - * The Origin header may include multiple domains. The first matching domain is returned. - * If no match is found, the fallback domain is used if configured. Otherwise, the cookie domain is not set. - */ - def cookieDomain( - headers: Seq[HttpHeader], - domains: Option[List[String]], - fallbackDomain: Option[String] - ): Option[String] = - (for { - domainList <- domains - origins <- headers.collectFirst { case header: `Origin` => header.origins } - originHosts = extractHosts(origins) - domainToUse <- domainList.find(domain => originHosts.exists(validMatch(_, domain))) - } yield domainToUse).orElse(fallbackDomain) - - /** Extracts the host names from a list of values in the request's Origin header. */ - def extractHosts(origins: Seq[HttpOrigin]): Seq[String] = - origins.map(origin => origin.host.host.address()) - - /** - * Ensures a match is valid. - * We only want matches where: - * a.) the Origin host is exactly equal to the cookie domain from the config - * b.) the Origin host is a subdomain of the cookie domain from the config. - * But we want to avoid cases where the cookie domain from the config is randomly - * a substring of the Origin host, without any connection between them. - */ - def validMatch(host: String, domain: String): Boolean = - host == domain || host.endsWith("." + domain) - - /** - * Gets the IP from a RemoteAddress. If ipAsPartitionKey is false, a UUID will be generated. - * @param remoteAddress Address extracted from an HTTP request - * @param ipAsPartitionKey Whether to use the ip as a partition key or a random UUID - * @return a tuple of ip (unknown if it couldn't be extracted) and partition key - */ - def ipAndPartitionKey( - remoteAddress: RemoteAddress, - ipAsPartitionKey: Boolean - ): (String, String) = - remoteAddress.toOption.map(_.getHostAddress) match { - case None => ("unknown", UUID.randomUUID.toString) - case Some(ip) => (ip, if (ipAsPartitionKey) ip else UUID.randomUUID.toString) - } - - /** - * Gets the network user id from the query string or the request cookie. 
- * @param request Http request made - * @param requestCookie cookie associated to the Http request - * @return a network user id - */ - def networkUserId( - request: HttpRequest, - requestCookie: Option[HttpCookie], - spAnonymous: Option[String] - ): Option[String] = - spAnonymous match { - case Some(_) => Some(spAnonymousNuid) - case None => request.uri.query().get("nuid").orElse(requestCookie.map(_.value)) - } - - /** - * Creates an Access-Control-Allow-Origin header which specifically allows the domain which made - * the request - * @param request Incoming request - * @return Header allowing only the domain which made the request or everything - */ - def accessControlAllowOriginHeader(request: HttpRequest): HttpHeader = - `Access-Control-Allow-Origin`(request.headers.find { - case `Origin`(_) => true - case _ => false - } match { - case Some(`Origin`(origin)) => HttpOriginRange.Default(origin) - case _ => HttpOriginRange.`*` - }) -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/HealthService.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/HealthService.scala deleted file mode 100644 index 54c77eee4..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/HealthService.scala +++ /dev/null @@ -1,31 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -trait HealthService { - def isHealthy: Boolean -} - -object HealthService { - - class Settable extends HealthService { - @volatile private var state: Boolean = false - - override def isHealthy: Boolean = state - - def toUnhealthy(): Unit = - state = false - - def toHealthy(): Unit = - state = true - } - -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Warmup.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Warmup.scala deleted file mode 100644 index 3a7449303..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/Warmup.scala +++ /dev/null @@ -1,79 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import akka.http.scaladsl.Http -import akka.http.scaladsl.model.HttpRequest -import akka.http.scaladsl.settings.ConnectionPoolSettings -import akka.stream.scaladsl.{Sink, Source} -import akka.actor.ActorSystem - -import org.slf4j.{Logger, LoggerFactory} - -import scala.concurrent.{ExecutionContext, Future} -import scala.util.Failure - -import com.snowplowanalytics.snowplow.collectors.scalastream.model.WarmupConfig - -object Warmup { - - private lazy val logger: Logger = LoggerFactory.getLogger(getClass) - - def run(interface: String, port: Int, config: WarmupConfig)( - implicit ec: ExecutionContext, - system: ActorSystem - ): Future[Unit] = - if (config.enable) { - logger.info(s"Starting warm up of $interface:$port. It is expected to see a few failures during warmup.") - - def runNextCycle(counter: Int): Future[Unit] = { - val maxConnections = config.maxConnections * counter - val numRequests = config.numRequests * counter - - val cxnSettings = ConnectionPoolSettings(system) - .withMaxConnections(maxConnections) - .withMaxOpenRequests(Integer.highestOneBit(maxConnections) * 2) // must exceed maxConnections and must be a power of 2 - .withMaxRetries(0) - - Source(1 to numRequests) - .map(_ => (HttpRequest(uri = s"/health"), ())) - .via(Http().cachedHostConnectionPool[Unit](interface, port, cxnSettings)) - .map(_._1) - .runWith(Sink.seq) - .map { results => - val numFails = results.count(_.isFailure) - results - .collect { - case Failure(e) => e.getMessage - } - .toSet - .foreach { message: String => - logger.info(message) - } - - logger.info( - s"Finished warmup cycle $counter of $interface:$port with $maxConnections max client TCP connections. Sent ${numRequests} requests with $numFails failures." - ) - numFails - } - .flatMap { numFails => - if (numFails > 0 || counter >= config.maxCycles) { - logger.info(s"Finished all warmup cycles of $interface:$port") - Future.successful(()) - } else - runNextCycle(counter + 1) - } - } - - runNextCycle(1) - } else Future.successful(()) - -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala deleted file mode 100644 index 6627f274d..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/model.scala +++ /dev/null @@ -1,272 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import scala.concurrent.duration.FiniteDuration -import scala.concurrent.duration.DurationInt - -import akka.http.scaladsl.model.headers.HttpCookiePair - -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.Sink - -import io.circe.Json - -import pureconfig.{CamelCase, ConfigFieldMapping, ConfigReader} -import pureconfig.error.UserValidationFailed -import pureconfig.generic.auto._ -import pureconfig.generic.semiauto._ -import pureconfig.generic.{FieldCoproductHint, ProductHint} - -package model { - - /** - * Case class for holding both good and - * bad sinks for the Stream Collector. - */ - final case class CollectorSinks(good: Sink, bad: Sink) - - /** - * Case class for holding the results of - * splitAndSerializePayload. - * - * @param good All good results - * @param bad All bad results - */ - final case class EventSerializeResult(good: List[Array[Byte]], bad: List[Array[Byte]]) - - /** - * Class for the result of splitting a too-large array of events in the body of a POST request - * - * @param goodBatches List of batches of events - * @param failedBigEvents List of events that were too large - */ - final case class SplitBatchResult(goodBatches: List[List[Json]], failedBigEvents: List[Json]) - - final case class CookieConfig( - enabled: Boolean, - name: String, - expiration: FiniteDuration, - domains: Option[List[String]], - fallbackDomain: Option[String], - secure: Boolean, - httpOnly: Boolean, - sameSite: Option[String] - ) - final case class DoNotTrackCookieConfig( - enabled: Boolean, - name: String, - value: String - ) - final case class DntCookieMatcher(name: String, value: String) { - private val pattern = value.r.pattern - def matches(httpCookiePair: HttpCookiePair): Boolean = pattern.matcher(httpCookiePair.value).matches() - } - final case class CookieBounceConfig( - enabled: Boolean, - name: String, - fallbackNetworkUserId: String, - forwardedProtocolHeader: Option[String] - ) - final case class RedirectMacroConfig( - enabled: Boolean, - placeholder: Option[String] - ) - final case class RootResponseConfig( - enabled: Boolean, - statusCode: Int, - headers: Map[String, String] = Map.empty[String, String], - body: String = "" - ) - final case class P3PConfig(policyRef: String, CP: String) - final case class CrossDomainConfig(enabled: Boolean, domains: List[String], secure: Boolean) - final case class CORSConfig(accessControlMaxAge: FiniteDuration) - final case class KinesisBackoffPolicyConfig(minBackoff: Long, maxBackoff: Long, maxRetries: Int) - final case class SqsBackoffPolicyConfig(minBackoff: Long, maxBackoff: Long, maxRetries: Int) - final case class GooglePubSubBackoffPolicyConfig( - minBackoff: Long, - maxBackoff: Long, - totalBackoff: Long, - multiplier: Double, - initialRpcTimeout: Long, - maxRpcTimeout: Long, - rpcTimeoutMultiplier: Double - ) - final case class RabbitMQBackoffPolicyConfig(minBackoff: Long, maxBackoff: Long, multiplier: Double) - sealed trait SinkConfig { - val maxBytes: Int - } - final case class AWSConfig(accessKey: String, secretKey: String) - final case class Kinesis( - maxBytes: Int, - region: String, - threadPoolSize: Int, - aws: AWSConfig, - backoffPolicy: KinesisBackoffPolicyConfig, - customEndpoint: Option[String], - sqsGoodBuffer: Option[String], - sqsBadBuffer: Option[String], - sqsMaxBytes: Int, - startupCheckInterval: FiniteDuration - ) extends SinkConfig { - val endpoint = customEndpoint.getOrElse(region match { - case cn @ "cn-north-1" => 
s"https://kinesis.$cn.amazonaws.com.cn" - case cn @ "cn-northwest-1" => s"https://kinesis.$cn.amazonaws.com.cn" - case _ => s"https://kinesis.$region.amazonaws.com" - }) - } - final case class Sqs( - maxBytes: Int, - region: String, - threadPoolSize: Int, - aws: AWSConfig, - backoffPolicy: SqsBackoffPolicyConfig, - startupCheckInterval: FiniteDuration - ) extends SinkConfig - final case class GooglePubSub( - maxBytes: Int, - googleProjectId: String, - backoffPolicy: GooglePubSubBackoffPolicyConfig, - startupCheckInterval: FiniteDuration, - retryInterval: FiniteDuration, - gcpUserAgent: GcpUserAgent - ) extends SinkConfig - final case class Kafka( - maxBytes: Int, - brokers: String, - retries: Int, - producerConf: Option[Map[String, String]] - ) extends SinkConfig - final case class Nsq(maxBytes: Int, host: String, port: Int) extends SinkConfig - final case class Stdout(maxBytes: Int) extends SinkConfig - final case class Rabbitmq( - maxBytes: Int, - username: String, - password: String, - virtualHost: String, - host: String, - port: Int, - backoffPolicy: RabbitMQBackoffPolicyConfig, - routingKeyGood: String, - routingKeyBad: String, - threadPoolSize: Option[Int] - ) extends SinkConfig - final case class BufferConfig(byteLimit: Long, recordLimit: Long, timeLimit: Long) - final case class StreamsConfig( - good: String, - bad: String, - useIpAddressAsPartitionKey: Boolean, - sink: SinkConfig, - buffer: BufferConfig - ) - final case class GcpUserAgent(productName: String) - - final case class StatsdConfig( - enabled: Boolean, - hostname: String, - port: Int, - period: FiniteDuration = 1.minute, - prefix: String = "snowplow.collector", - tags: Map[String, String] = Map("app" -> "collector") - ) - final case class MetricsConfig(statsd: StatsdConfig) - final case class MonitoringConfig(metrics: MetricsConfig) - - final case class TelemetryConfig( - // General params - disable: Boolean = false, - interval: FiniteDuration = 60.minutes, - // http params - method: String = "POST", - url: String = "telemetry-g.snowplowanalytics.com", - port: Int = 443, - secure: Boolean = true, - // Params injected by deployment scripts - userProvidedId: Option[String] = None, - moduleName: Option[String] = None, - moduleVersion: Option[String] = None, - instanceId: Option[String] = None, - autoGeneratedId: Option[String] = None - ) - - final case class SSLConfig( - enable: Boolean = false, - redirect: Boolean = false, - port: Int = 443 - ) - - final case class WarmupConfig( - enable: Boolean, - numRequests: Int, - maxConnections: Int, - maxCycles: Int - ) - - final case class ExperimentalConfig( - warmup: WarmupConfig - ) - - final case class CollectorConfig( - interface: String, - port: Int, - paths: Map[String, String], - p3p: P3PConfig, - crossDomain: CrossDomainConfig, - cookie: CookieConfig, - doNotTrackCookie: DoNotTrackCookieConfig, - cookieBounce: CookieBounceConfig, - redirectMacro: RedirectMacroConfig, - rootResponse: RootResponseConfig, - cors: CORSConfig, - streams: StreamsConfig, - monitoring: MonitoringConfig, - telemetry: Option[TelemetryConfig], - ssl: SSLConfig = SSLConfig(), - enableDefaultRedirect: Boolean, - redirectDomains: Set[String], - terminationDeadline: FiniteDuration, - preTerminationPeriod: FiniteDuration, - preTerminationUnhealthy: Boolean, - experimental: ExperimentalConfig - ) { - val cookieConfig = if (cookie.enabled) Some(cookie) else None - val doNotTrackHttpCookie = - if (doNotTrackCookie.enabled) - Some(DntCookieMatcher(name = doNotTrackCookie.name, value = 
doNotTrackCookie.value)) - else - None - - def cookieName = cookieConfig.map(_.name) - def cookieDomain = cookieConfig.flatMap(_.domains) - def fallbackDomain = cookieConfig.flatMap(_.fallbackDomain) - def cookieExpiration = cookieConfig.map(_.expiration) - } - - object CollectorConfig { - - implicit private val _ = new FieldCoproductHint[SinkConfig]("enabled") - implicit def hint[T] = ProductHint[T](ConfigFieldMapping(CamelCase, CamelCase)) - - private val invalidDomainMatcher = ".*([^A-Za-z0-9-.]).*".r - - implicit def cookieConfigReader: ConfigReader[CookieConfig] = - deriveReader[CookieConfig].emap { cc => - cc.fallbackDomain match { - case Some(invalidDomainMatcher(char)) => - Left(UserValidationFailed(s"fallbackDomain contains invalid character for a domain: [$char]")) - case _ => Right(cc) - } - } - - implicit def collectorConfigReader: ConfigReader[CollectorConfig] = - deriveReader[CollectorConfig] - } - -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/Sink.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/Sink.scala deleted file mode 100644 index 00c51b959..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/Sink.scala +++ /dev/null @@ -1,28 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream -package sinks - -import org.slf4j.LoggerFactory - -// Define an interface for all sinks to use to store events. -trait Sink { - - // Maximum number of bytes that a single record can contain. - // If a record is bigger, a size violation bad row is emitted instead - val maxBytes: Int - - lazy val log = LoggerFactory.getLogger(getClass()) - - def isHealthy: Boolean = true - def storeRawEvents(events: List[Array[Byte]], key: String): Unit - def shutdown(): Unit -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/CloudVendor.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/CloudVendor.scala deleted file mode 100644 index 00ae0d0d4..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/CloudVendor.scala +++ /dev/null @@ -1,22 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
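// Illustrative only: a minimal implementation of the Sink trait removed above
// (the class name and logging behaviour are assumptions, not part of the
// original sources).
class LoggingSink(val maxBytes: Int) extends Sink {
  def storeRawEvents(events: List[Array[Byte]], key: String): Unit =
    log.info(s"would store ${events.size} event(s) for key $key")
  def shutdown(): Unit = ()
}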
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream.telemetry - -import io.circe.Encoder - -sealed trait CloudVendor - -object CloudVendor { - case object Aws extends CloudVendor - case object Gcp extends CloudVendor - - implicit val encoder: Encoder[CloudVendor] = Encoder.encodeString.contramap[CloudVendor](_.toString().toUpperCase()) -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryAkkaService.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryAkkaService.scala deleted file mode 100644 index a2c9aec59..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryAkkaService.scala +++ /dev/null @@ -1,130 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream.telemetry - -import cats.data.NonEmptyList - -import akka.actor.ActorSystem - -import io.circe.Json - -import scala.concurrent.ExecutionContextExecutor -import scala.concurrent.duration.Duration - -import org.slf4j.{Logger, LoggerFactory} - -import java.util.concurrent.TimeUnit - -import com.snowplowanalytics.iglu.core.SelfDescribingData -import com.snowplowanalytics.snowplow.scalatracker.Emitter.EndpointParams -import com.snowplowanalytics.snowplow.scalatracker.Tracker -import com.snowplowanalytics.snowplow.scalatracker.idimplicits._ -import com.snowplowanalytics.snowplow.scalatracker.emitters.id.SyncEmitter -import com.snowplowanalytics.snowplow.scalatracker.Emitter._ -import com.snowplowanalytics.snowplow.collectors.scalastream.model.{ - CollectorConfig, - GooglePubSub, - Kinesis, - Sqs, - TelemetryConfig -} - -/** Akka implementation of telemetry service. 
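 *
 * Usage sketch, illustrative only: assumes an implicit ActorSystem in scope and
 * placeholder values for the cloud vendor, region and application coordinates.
 * {{{
 * implicit val system: ActorSystem = ActorSystem("collector")
 * TelemetryAkkaService(
 *   teleCfg    = TelemetryConfig(),
 *   cloud      = Some(CloudVendor.Aws),
 *   region     = Some("eu-central-1"),
 *   appName    = "app",
 *   appVersion = "version"
 * ).start()
 * }}}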
- * - * @param teleCfg - telemetry configuration - * @param cloud - cloud vendor - * @param region - deployment region - * @param appName - application name as defined during build (take it from BuildInfo) - * @param appVersion - application name as defined during build (take it from BuildInfo) - */ -case class TelemetryAkkaService( - teleCfg: TelemetryConfig, - cloud: Option[CloudVendor], - region: Option[String], - appName: String, - appVersion: String -) { - private lazy val log: Logger = LoggerFactory.getLogger(getClass) - - private lazy val payload: SelfDescribingData[Json] = makeHeartbeatEvent(teleCfg, cloud, region, appName, appVersion) - - def start()(implicit actorSystem: ActorSystem): Unit = - if (teleCfg.disable) { - log.info(s"Telemetry disabled") - } else { - log.info(s"Telemetry enabled") - val scheduler = actorSystem.scheduler - implicit val executor: ExecutionContextExecutor = actorSystem.dispatcher - - def emitterCallback(params: EndpointParams, req: Request, res: Result): Unit = - res match { - case Result.Success(_) => log.debug(s"telemetry send successfully") - case Result.Failure(code) => - log.warn(s"Scala telemetry tracker got unexpected HTTP code $code from ${params.getUri}") - case Result.TrackerFailure(exception) => - log.warn( - s"Scala telemetry tracker failed to reach ${params.getUri} with following exception $exception" + - s" after ${req.attempt} attempt" - ) - case Result.RetriesExceeded(failure) => - log.warn(s"Scala telemetry tracker has stopped trying to deliver payload after following failure: $failure") - } - - val emitter: SyncEmitter = SyncEmitter( - EndpointParams(teleCfg.url, port = teleCfg.port, https = teleCfg.secure), - callback = Some(emitterCallback) - ) - // telemetry - Unique identifier for website / application (aid) - // root - The tracker namespace (tna) - val tracker = new Tracker(NonEmptyList.of(emitter), "telemetry", appName) - - // discarding cancellation handle - val _ = scheduler.scheduleAtFixedRate( - initialDelay = Duration(0, TimeUnit.SECONDS), - interval = teleCfg.interval - ) { () => - tracker.trackSelfDescribingEvent(unstructEvent = payload) - tracker.flushEmitters() // this is important! - } - } -} - -object TelemetryAkkaService { - - /** Specialized version of [[TelemetryAkkaService]] for collector. That takes CollectorConfig as an input. - * - * @param collectorConfig - Top level collector configuration - * @param appName - application name as defined during build (take it from BuildInfo) - * @param appVersion - application name as defined during build (take it from BuildInfo) - * @return heartbeat event. Same event should be used for all heartbeats. 
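 *
 * Illustrative summary of how the cloud vendor and region are derived from the
 * configured sink (region values are placeholders; see the match below):
 * {{{
 * Kinesis(region = "eu-central-1", ...) => (Some(CloudVendor.Aws), Some("eu-central-1"))
 * Sqs(region = "us-east-1", ...)        => (Some(CloudVendor.Aws), Some("us-east-1"))
 * GooglePubSub(...)                     => (Some(CloudVendor.Gcp), None)
 * any other sink                        => (None, None)
 * }}}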
- */ - def initWithCollector( - collectorConfig: CollectorConfig, - appName: String, - appVersion: String - ): TelemetryAkkaService = { - - val (cloud, region) = collectorConfig.streams.sink match { - case k: Kinesis => (Some(CloudVendor.Aws), Some(k.region)) - case s: Sqs => (Some(CloudVendor.Aws), Some(s.region)) - case _: GooglePubSub => (Some(CloudVendor.Gcp), None) - case _ => (None, None) - } - - TelemetryAkkaService( - collectorConfig.telemetry.getOrElse(TelemetryConfig()), - cloud, - region, - appName, - appVersion - ) - } -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryPayload.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryPayload.scala deleted file mode 100644 index 56d766bc2..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/TelemetryPayload.scala +++ /dev/null @@ -1,25 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream.telemetry - -// iglu:com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1 -private case class TelemetryPayload( - userProvidedId: Option[String] = None, - moduleName: Option[String] = None, - moduleVersion: Option[String] = None, - instanceId: Option[String] = None, - region: Option[String] = None, - cloud: Option[CloudVendor] = None, - autoGeneratedId: Option[String] = None, - applicationName: String, - applicationVersion: String, - appGeneratedId: String -) diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/package.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/package.scala deleted file mode 100644 index 1105bf994..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/telemetry/package.scala +++ /dev/null @@ -1,58 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import com.snowplowanalytics.iglu.core.{SchemaKey, SchemaVer, SelfDescribingData} -import com.snowplowanalytics.snowplow.collectors.scalastream.model.TelemetryConfig -import com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.CloudVendor._ -import io.circe.Json -import io.circe.generic.auto._ -import io.circe.syntax._ - -package object telemetry { - private val teleUuid: String = java.util.UUID.randomUUID.toString - - private val telemetrySchema: SchemaKey = - SchemaKey("com.snowplowanalytics.oss", "oss_context", "jsonschema", SchemaVer.Full(1, 0, 1)) - - /** Makes SelfDescribingData to be used for telemetry heartbeats. - * Don't forget to cache this event after the creation. 
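 *
 * Usage sketch, illustrative only: assumes a tracker instance is available and
 * uses placeholder application coordinates.
 * {{{
 * val heartbeat: SelfDescribingData[Json] =
 *   makeHeartbeatEvent(TelemetryConfig(), Some(CloudVendor.Gcp), None, "collector", "x.y.z")
 * tracker.trackSelfDescribingEvent(unstructEvent = heartbeat)
 * }}}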
- * - * @param teleCfg - telemetry configuration - * @param cloud - cloud vendor - * @param region - deployment region - * @param appName - application name as defined during build (take it from BuildInfo) - * @param appVersion - application name as defined during build (take it from BuildInfo) - * @return heartbeat event. Same event should be used for all heartbeats. - */ - def makeHeartbeatEvent( - teleCfg: TelemetryConfig, - cloud: Option[CloudVendor], - region: Option[String], - appName: String, - appVersion: String - ): SelfDescribingData[Json] = SelfDescribingData( - telemetrySchema, - TelemetryPayload( - userProvidedId = teleCfg.userProvidedId, - moduleName = teleCfg.moduleName, - moduleVersion = teleCfg.moduleVersion, - instanceId = teleCfg.instanceId, - autoGeneratedId = teleCfg.autoGeneratedId, - region = region, - cloud = cloud, - applicationName = appName, - applicationVersion = appVersion, - appGeneratedId = teleUuid - ).asJson - ) - -} diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatch.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatch.scala deleted file mode 100644 index 7785aeabb..000000000 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatch.scala +++ /dev/null @@ -1,163 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream -package utils - -import java.nio.ByteBuffer -import java.nio.charset.StandardCharsets.UTF_8 -import java.time.Instant -import org.apache.thrift.TSerializer - -import cats.syntax.either._ -import io.circe.Json -import io.circe.parser._ -import io.circe.syntax._ - -import com.snowplowanalytics.iglu.core._ -import com.snowplowanalytics.iglu.core.circe.CirceIgluCodecs._ -import com.snowplowanalytics.snowplow.badrows._ -import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ - -/** Object handling splitting an array of strings correctly */ -case class SplitBatch(appName: String, appVersion: String) { - - // Serialize Thrift CollectorPayload objects - val ThriftSerializer = new ThreadLocal[TSerializer] { - override def initialValue = new TSerializer() - } - - /** - * Split a list of strings into batches, none of them exceeding a given size - * Input strings exceeding the given size end up in the failedBigEvents field of the result - * @param input List of strings - * @param maximum No good batch can exceed this size - * @param joinSize Constant to add to the size of the string representing the additional comma - * needed to join separate event JSONs in a single array - * @return split batch containing list of good batches and list of events that were too big - */ - def split(input: List[Json], maximum: Int, joinSize: Int = 1): SplitBatchResult = { - @scala.annotation.tailrec - def iterbatch( - l: List[Json], - currentBatch: List[Json], - currentTotal: Long, - acc: List[List[Json]], - failedBigEvents: List[Json] - ): SplitBatchResult = l match { - case Nil => - currentBatch match { - case Nil => SplitBatchResult(acc, failedBigEvents) - case nonemptyBatch => SplitBatchResult(nonemptyBatch :: acc, failedBigEvents) - } - case h :: t => - val headSize = getSize(h.noSpaces) - if (headSize + joinSize > maximum) { - iterbatch(t, currentBatch, currentTotal, acc, h :: failedBigEvents) - } else if (headSize + currentTotal + joinSize > maximum) { - iterbatch(l, Nil, 0, currentBatch :: acc, failedBigEvents) - } else { - iterbatch(t, h :: currentBatch, headSize + currentTotal + joinSize, acc, failedBigEvents) - } - } - - iterbatch(input, Nil, 0, Nil, Nil) - } - - /** - * If the CollectorPayload is too big to fit in a single record, attempt to split it into - * multiple records. 
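 *
 * Usage sketch, illustrative only: payload stands for an incoming
 * CollectorPayload and the byte limit is a placeholder.
 * {{{
 * val result: EventSerializeResult =
 *   SplitBatch("app", "1.0.0").splitAndSerializePayload(payload, maxBytes = 1000000)
 * // result.good: serialized Thrift records that fit within maxBytes
 * // result.bad:  size-violation bad rows for events that could not be split
 * }}}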
- * @param event Incoming CollectorPayload - * @return a List of Good and Bad events - */ - def splitAndSerializePayload(event: CollectorPayload, maxBytes: Int): EventSerializeResult = { - val serializer = ThriftSerializer.get() - val everythingSerialized = serializer.serialize(event) - val wholeEventBytes = getSize(everythingSerialized) - - // If the event is below the size limit, no splitting is necessary - if (wholeEventBytes < maxBytes) { - EventSerializeResult(List(everythingSerialized), Nil) - } else { - (for { - body <- Option(event.getBody).toRight("GET requests cannot be split") - children <- splitBody(body) - initialBodyDataBytes = getSize(Json.arr(children._2: _*).noSpaces) - _ <- Either.cond[String, Unit]( - wholeEventBytes - initialBodyDataBytes < maxBytes, - (), - "cannot split this POST request because event without \"data\" field is still too big" - ) - splitted = split(children._2, maxBytes - wholeEventBytes + initialBodyDataBytes) - goodSerialized = serializeBatch(serializer, event, splitted.goodBatches, children._1) - badList = splitted.failedBigEvents.map { e => - val msg = "this POST request split is still too large" - oversizedPayload(event, getSize(e), maxBytes, msg) - } - } yield EventSerializeResult(goodSerialized, badList)).fold({ msg => - val tooBigPayload = oversizedPayload(event, wholeEventBytes, maxBytes, msg) - EventSerializeResult(Nil, List(tooBigPayload)) - }, identity) - } - } - - def splitBody(body: String): Either[String, (SchemaKey, List[Json])] = - for { - json <- parse(body).leftMap(e => s"cannot split POST requests which are not json ${e.getMessage}") - sdd <- json - .as[SelfDescribingData[Json]] - .leftMap(e => s"cannot split POST requests which are not self-describing ${e.getMessage}") - array <- sdd.data.asArray.toRight("cannot split POST requests which do not contain a data array") - } yield (sdd.schema, array.toList) - - /** - * Creates a bad row while maintaining a truncation of the original payload to ease debugging. - * Keeps a tenth of the original payload. 
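 *
 * Illustrative call with placeholder sizes:
 * {{{
 * oversizedPayload(event, size = 2000000, maxSize = 1000000, msg = "this POST request split is still too large")
 * // => UTF-8 bytes of a size_violation bad row keeping event.toString.take(maxSize / 10) as its payload
 * }}}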
- * @param event original payload - * @param size size of the oversized payload - * @param maxSize maximum size allowed - * @param msg error message - * @return the created bad rows as json - */ - private def oversizedPayload( - event: CollectorPayload, - size: Int, - maxSize: Int, - msg: String - ): Array[Byte] = - BadRow - .SizeViolation( - Processor(appName, appVersion), - Failure.SizeViolation(Instant.now(), maxSize, size, s"oversized collector payload: $msg"), - Payload.RawPayload(event.toString().take(maxSize / 10)) - ) - .compact - .getBytes(UTF_8) - - private def getSize(a: Array[Byte]): Int = ByteBuffer.wrap(a).capacity - - private def getSize(s: String): Int = getSize(s.getBytes(UTF_8)) - - private def getSize(j: Json): Int = getSize(j.noSpaces) - - private def serializeBatch( - serializer: TSerializer, - event: CollectorPayload, - batches: List[List[Json]], - schema: SchemaKey - ): List[Array[Byte]] = - batches.map { batch => - val payload = event.deepCopy() - val body = SelfDescribingData[Json](schema, Json.arr(batch: _*)) - payload.setBody(body.asJson.noSpaces) - serializer.serialize(payload) - } -} diff --git a/core/src/test/resources/configs/invalid-fallback-domain.hocon b/core/src/test/resources/configs/invalid-fallback-domain.hocon deleted file mode 100644 index 287848e14..000000000 --- a/core/src/test/resources/configs/invalid-fallback-domain.hocon +++ /dev/null @@ -1,24 +0,0 @@ -interface = "0.0.0.0" -port = 8080 - -streams { - useIpAddressAsPartitionKey = false - good = "good" - bad = "bad" - - sink { - enabled = stdout - maxBytes = 1000000000 - } - - buffer { - byteLimit = 3145728 - recordLimit = 500 - timeLimit = 5000 - } - -} - -cookie { - fallbackDomain: "example.com,example2.com" -} diff --git a/core/src/test/resources/configs/valid-config.hocon b/core/src/test/resources/configs/valid-config.hocon deleted file mode 100644 index f59fcb5a8..000000000 --- a/core/src/test/resources/configs/valid-config.hocon +++ /dev/null @@ -1,19 +0,0 @@ -interface = "0.0.0.0" -port = 8080 - -streams { - useIpAddressAsPartitionKey = false - good = "good" - bad = "bad" - - sink { - enabled = stdout - maxBytes = 1000000000 - } - - buffer { - byteLimit = 3145728 - recordLimit = 500 - timeLimit = 5000 - } -} diff --git a/http4s/src/test/resources/test-config-new-style.hocon b/core/src/test/resources/test-config-new-style.hocon similarity index 100% rename from http4s/src/test/resources/test-config-new-style.hocon rename to core/src/test/resources/test-config-new-style.hocon diff --git a/http4s/src/test/resources/test-config-old-style.hocon b/core/src/test/resources/test-config-old-style.hocon similarity index 100% rename from http4s/src/test/resources/test-config-old-style.hocon rename to core/src/test/resources/test-config-old-style.hocon diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala similarity index 100% rename from http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala rename to core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala similarity index 99% rename from http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala rename to 
core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala index 8dc9e824b..e45649eda 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala @@ -171,6 +171,8 @@ class RoutesSpec extends Specification { case Method.GET => cookieParams.pixelExpected shouldEqual true cookieParams.contentType shouldEqual None + case other => + ko(s"Invalid http method - $other") } cookieParams.doNotTrack shouldEqual false response.status must beEqualTo(Status.Ok) diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala similarity index 99% rename from http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala rename to core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala index f44bfba02..c431eb75f 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala @@ -1,7 +1,7 @@ package com.snowplowanalytics.snowplow.collector.core import scala.concurrent.duration._ -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import org.specs2.mutable.Specification @@ -404,7 +404,7 @@ class ServiceSpec extends Specification { `Access-Control-Allow-Headers`(ci"Content-Type", ci"SP-Anonymous"), `Access-Control-Max-Age`.Cache(3600).asInstanceOf[`Access-Control-Max-Age`] ) - service.preflightResponse(Request[IO]()).unsafeRunSync.headers shouldEqual expected + service.preflightResponse(Request[IO]()).unsafeRunSync().headers shouldEqual expected } } @@ -700,7 +700,7 @@ class ServiceSpec extends Specification { ) cookie.secure must beTrue cookie.httpOnly must beTrue - cookie.sameSite must beSome(SameSite.None) + cookie.sameSite must beSome[SameSite](SameSite.None) cookie.extension must beNone service.cookieHeader( headers = Headers.empty, diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatchSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatchSpec.scala similarity index 100% rename from http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatchSpec.scala rename to core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/SplitBatchSpec.scala diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TelemetrySpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TelemetrySpec.scala similarity index 100% rename from http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TelemetrySpec.scala rename to core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TelemetrySpec.scala diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestSink.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestSink.scala similarity index 100% rename from http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestSink.scala rename to core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestSink.scala diff --git a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala similarity index 95% rename from 
http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala rename to core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala index 3647ec7d3..2dc0b780e 100644 --- a/http4s/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala @@ -6,7 +6,7 @@ import cats.Applicative import org.http4s.SameSite -import com.snowplowanalytics.snowplow.collector.core.Config._ +import com.snowplowanalytics.snowplow.collector.core.Config.{Sink => SinkConfig, _} object TestUtils { val appName = "collector-test" @@ -75,7 +75,7 @@ object TestUtils { ), cors = CORS(60.minutes), streams = Streams( - good = Sink( + good = SinkConfig( name = "raw", Buffer( 3145728, @@ -84,7 +84,7 @@ object TestUtils { ), AnyRef ), - bad = Sink( + bad = SinkConfig( name = "bad-1", Buffer( 3145728, diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRouteSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRouteSpec.scala deleted file mode 100644 index 55360e093..000000000 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorRouteSpec.scala +++ /dev/null @@ -1,217 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import java.net.InetAddress - -import akka.http.scaladsl.model._ -import akka.http.scaladsl.model.headers._ -import akka.http.scaladsl.server.Directives._ -import akka.http.scaladsl.testkit.Specs2RouteTest - -import com.snowplowanalytics.snowplow.collectors.scalastream.model.DntCookieMatcher - -import org.specs2.mutable.Specification - -class CollectorRouteSpec extends Specification with Specs2RouteTest { - val mkRoute = (withRedirects: Boolean, spAnonymous: Option[String]) => - new CollectorRoute { - override val collectorService = new Service { - def preflightResponse(req: HttpRequest): HttpResponse = - HttpResponse(200, entity = "preflight response") - def flashCrossDomainPolicy: HttpResponse = HttpResponse(200, entity = "flash cross domain") - def rootResponse: HttpResponse = HttpResponse(200, entity = "200 collector root") - def cookie( - queryString: Option[String], - body: Option[String], - path: String, - cookie: Option[HttpCookie], - userAgent: Option[String], - refererUri: Option[String], - hostname: String, - ip: RemoteAddress, - request: HttpRequest, - pixelExpected: Boolean, - doNotTrack: Boolean, - contentType: Option[ContentType] = None, - spAnonymous: Option[String] = spAnonymous - ): HttpResponse = HttpResponse(200, entity = s"cookie") - def cookieName: Option[String] = Some("name") - def doNotTrackCookie: Option[DntCookieMatcher] = None - def determinePath(vendor: String, version: String): String = "/p1/p2" - def enableDefaultRedirect: Boolean = withRedirects - def sinksHealthy: Boolean = true - } - override val healthService = new HealthService { - def isHealthy: Boolean = true - } - } - val route = mkRoute(true, None) - val routeWithoutRedirects = mkRoute(false, 
None) - val routeWithAnonymousTracking = mkRoute(true, Some("*")) - - "The collector route" should { - "respond to the cors route with a preflight response" in { - Options() ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "preflight response" - } - } - "respond to the health route with an ok response" in { - Get("/health") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "OK" - } - } - "respond to the cross domain route with the cross domain policy" in { - Get("/crossdomain.xml") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "flash cross domain" - } - } - "respond to the post cookie route with the cookie response" in { - Post("/p1/p2") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "cookie" - } - } - "respond to the get cookie route with the cookie response" in { - Get("/p1/p2") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "cookie" - } - } - "respond to the head cookie route with the cookie response" in { - Head("/p1/p2") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "cookie" - } - } - "respond to the get pixel route with the cookie response" in { - Get("/ice.png") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "cookie" - } - Get("/i") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "cookie" - } - } - "respond to the head pixel route with the cookie response" in { - Head("/ice.png") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "cookie" - } - Head("/i") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "cookie" - } - } - "respond to customizable root requests" in { - Get("/") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "200 collector root" - } - } - "disallow redirect routes when redirects disabled" in { - Get("/r/abc") ~> routeWithoutRedirects.collectorRoute ~> check { - responseAs[String] shouldEqual "redirects disabled" - } - } - "respond to anything else with a not found" in { - Get("/something") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "404 not found" - } - } - "respond to robots.txt with a disallow rule" in { - Get("/robots.txt") ~> route.collectorRoute ~> check { - responseAs[String] shouldEqual "User-agent: *\nDisallow: /" - } - } - - "extract a query string" in { - "produce the query string if present" in { - route.queryString(Some("/abc/def?a=12&b=13#frg")) shouldEqual Some("a=12&b=13") - } - "produce an empty string if the extractor doesn't match" in { - route.queryString(Some("/abc/def#frg")) shouldEqual None - } - "produce an empty string if the argument is None" in { - route.queryString(None) shouldEqual None - } - "produce the query string when some of the values are URL encoded" in { - route.queryString(Some("/abc/def?schema=iglu%3Acom.acme%2Fcampaign%2Fjsonschema%2F1-0-0&aid=test")) shouldEqual Some( - "schema=iglu%3Acom.acme%2Fcampaign%2Fjsonschema%2F1-0-0&aid=test" - ) - } - } - - "have a directive extracting a cookie" in { - "return the cookie if some cookie name is given" in { - Get() ~> Cookie("abc" -> "123") ~> - route.cookieIfWanted(Some("abc")) { c => - complete(HttpResponse(200, entity = c.toString)) - } ~> check { - responseAs[String] shouldEqual "Some(abc=123)" - } - } - "return none if no cookie name is given" in { - Get() ~> Cookie("abc" -> "123") ~> - route.cookieIfWanted(None) { c => - complete(HttpResponse(200, entity = c.toString)) - } ~> check { - responseAs[String] shouldEqual "None" - } - } - } - - 
"have a directive checking for a do not track cookie" in { - "return false if the dnt cookie is not setup" in { - Get() ~> Cookie("abc" -> "123") ~> route.doNotTrack(None) { dnt => - complete(dnt.toString) - } ~> check { - responseAs[String] shouldEqual "false" - } - } - "return false if the dnt cookie doesn't have the same value compared to configuration" in { - Get() ~> Cookie("abc" -> "123") ~> - route.doNotTrack(Some(DntCookieMatcher(name = "abc", value = "345"))) { dnt => - complete(dnt.toString) - } ~> check { - responseAs[String] shouldEqual "false" - } - } - "return true if there is a properly-valued dnt cookie" in { - Get() ~> Cookie("abc" -> "123") ~> - route.doNotTrack(Some(DntCookieMatcher(name = "abc", value = "123"))) { dnt => - complete(dnt.toString) - } ~> check { - responseAs[String] shouldEqual "true" - } - } - "return true if there is a properly-valued dnt cookie that matches a regex value" in { - Get() ~> Cookie("abc" -> s"deleted-${System.currentTimeMillis()}") ~> - route.doNotTrack(Some(DntCookieMatcher(name = "abc", value = "deleted-[0-9]+"))) { dnt => - complete(dnt.toString) - } ~> check { - responseAs[String] shouldEqual "true" - } - } - } - - "have a directive to handle the IP address depending on whether SP-Anonymous header is present or not" in { - "SP-Anonymous present should obfuscate the IP address" in { - Get() ~> `X-Forwarded-For`(RemoteAddress.IP(InetAddress.getByName("127.0.0.1"))) ~> route.extractors( - Some("*") - ) { (_, ip, _) => - complete(ip.toString) - } ~> check { responseAs[String] shouldEqual "unknown" } - } - "no SP-Anonymous present should extract the IP address" in { - Get().withAttributes(Map(AttributeKeys.remoteAddress -> RemoteAddress.IP(InetAddress.getByName("127.0.0.1")))) ~> route - .extractors( - None - ) { (_, ip, _) => - complete(ip.toString) - } ~> check { responseAs[String] shouldEqual "127.0.0.1" } - } - } - } -} diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala deleted file mode 100644 index f4f3f3df3..000000000 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/CollectorServiceSpec.scala +++ /dev/null @@ -1,948 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import java.net.InetAddress -import java.nio.charset.StandardCharsets -import java.time.Instant -import org.apache.thrift.{TDeserializer, TSerializer} - -import scala.collection.immutable.Seq -import scala.collection.JavaConverters._ -import scala.concurrent.duration._ - -import akka.http.scaladsl.model._ -import akka.http.scaladsl.model.headers._ -import akka.http.scaladsl.model.headers.CacheDirectives._ -import cats.data.NonEmptyList -import io.circe._ -import io.circe.parser._ - -import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload - -import com.snowplowanalytics.snowplow.badrows.{BadRow, Failure, Payload, Processor} -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ - -import org.specs2.mutable.Specification - -class CollectorServiceSpec extends Specification { - case class ProbeService(service: CollectorService, good: TestSink, bad: TestSink) - - val service = new CollectorService( - TestUtils.testConf, - CollectorSinks(new TestSink, new TestSink), - "app", - "version" - ) - - def probeService(): ProbeService = { - val good = new TestSink - val bad = new TestSink - val s = new CollectorService( - TestUtils.testConf, - CollectorSinks(good, bad), - "app", - "version" - ) - ProbeService(s, good, bad) - } - def bouncingService(): ProbeService = { - val good = new TestSink - val bad = new TestSink - val s = new CollectorService( - TestUtils.testConf.copy(cookieBounce = TestUtils.testConf.cookieBounce.copy(enabled = true)), - CollectorSinks(good, bad), - "app", - "version" - ) - ProbeService(s, good, bad) - } - val uuidRegex = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}".r - val event = new CollectorPayload("iglu-schema", "ip", System.currentTimeMillis, "UTF-8", "collector") - val hs = List(`Raw-Request-URI`("uri"), `X-Forwarded-For`(RemoteAddress(InetAddress.getByName("127.0.0.1")))) - def serializer = new TSerializer() - def deserializer = new TDeserializer() - - "The collector service" should { - "cookie" in { - "attach p3p headers" in { - val ProbeService(s, good, bad) = probeService() - val r = s.cookie( - Some("nuid=12"), - Some("b"), - "p", - None, - None, - None, - "h", - RemoteAddress.Unknown, - HttpRequest(), - false, - false - ) - r.headers must have size 4 - r.headers must contain( - RawHeader( - "P3P", - "policyref=\"%s\", CP=\"%s\"".format("/w3c/p3p.xml", "NOI DSP COR NID PSA OUR IND COM NAV STA") - ) - ) - r.headers must contain(`Access-Control-Allow-Origin`(HttpOriginRange.`*`)) - r.headers must contain(`Access-Control-Allow-Credentials`(true)) - r.headers.filter(_.toString.startsWith("Set-Cookie")) must have size 1 - good.storedRawEvents must have size 1 - bad.storedRawEvents must have size 0 - } - "not store stuff and provide no cookie if do not track is on" in { - val ProbeService(s, good, bad) = probeService() - val r = s.cookie( - Some("nuid=12"), - Some("b"), - "p", - None, - None, - None, - "h", - RemoteAddress.Unknown, - HttpRequest(), - false, - true - ) - r.headers must have size 3 - r.headers must contain( - RawHeader( - "P3P", - "policyref=\"%s\", CP=\"%s\"".format("/w3c/p3p.xml", "NOI DSP COR NID PSA OUR IND COM NAV STA") - ) - ) - r.headers must contain(`Access-Control-Allow-Origin`(HttpOriginRange.`*`)) - r.headers must contain(`Access-Control-Allow-Credentials`(true)) - good.storedRawEvents must have size 0 - bad.storedRawEvents must have size 0 - } - "not set a cookie if SP-Anonymous is present" in { - val r = 
service.cookie( - Some("nuid=12"), - Some("b"), - "p", - None, - None, - None, - "h", - RemoteAddress.Unknown, - HttpRequest(), - false, - false, - None, - Some("*") - ) - - r.headers.filter(_.toString.startsWith("Set-Cookie")) must have size 0 - } - "not set a network_userid from cookie if SP-Anonymous is present" in { - val ProbeService(s, good, bad) = probeService() - s.cookie( - None, - Some("b"), - "p", - Some(HttpCookie("sp", "cookie-nuid")), - None, - None, - "h", - RemoteAddress.Unknown, - HttpRequest(), - false, - false, - None, - Some("*") - ) - good.storedRawEvents must have size 1 - bad.storedRawEvents must have size 0 - val newEvent = new CollectorPayload("iglu-schema", "ip", System.currentTimeMillis, "UTF-8", "collector") - deserializer.deserialize(newEvent, good.storedRawEvents.head) - newEvent.networkUserId shouldEqual "00000000-0000-0000-0000-000000000000" - } - "network_userid from cookie should persist if SP-Anonymous is not present" in { - val ProbeService(s, good, bad) = probeService() - s.cookie( - None, - Some("b"), - "p", - Some(HttpCookie("sp", "cookie-nuid")), - None, - None, - "h", - RemoteAddress.Unknown, - HttpRequest(), - false, - false, - None, - None - ) - good.storedRawEvents must have size 1 - bad.storedRawEvents must have size 0 - val newEvent = new CollectorPayload("iglu-schema", "ip", System.currentTimeMillis, "UTF-8", "collector") - deserializer.deserialize(newEvent, good.storedRawEvents.head) - newEvent.networkUserId shouldEqual "cookie-nuid" - } - "not store stuff if bouncing and provide a location header" in { - val ProbeService(s, good, bad) = bouncingService() - val r = s.cookie( - None, - Some("b"), - "p", - None, - None, - None, - "h", - RemoteAddress.Unknown, - HttpRequest(), - true, - false - ) - r.headers must have size 6 - r.headers must contain(`Location`("/?bounce=true")) - r.headers must contain(`Cache-Control`(`no-cache`, `no-store`, `must-revalidate`)) - good.storedRawEvents must have size 0 - bad.storedRawEvents must have size 0 - } - "store stuff if having already bounced with the fallback nuid" in { - val ProbeService(s, good, bad) = bouncingService() - val r = s.cookie( - Some("bounce=true"), - Some("b"), - "p", - None, - None, - None, - "h", - RemoteAddress.Unknown, - HttpRequest(), - true, - false - ) - r.headers must have size 5 - r.headers must contain(`Cache-Control`(`no-cache`, `no-store`, `must-revalidate`)) - good.storedRawEvents must have size 1 - bad.storedRawEvents must have size 0 - val newEvent = new CollectorPayload("iglu-schema", "ip", System.currentTimeMillis, "UTF-8", "collector") - deserializer.deserialize(newEvent, good.storedRawEvents.head) - newEvent.networkUserId shouldEqual "new-nuid" - } - "respond with a 200 OK and a bad row in case of illegal querystring" in { - val ProbeService(s, good, bad) = probeService() - val r = s.cookie( - Some("a b"), - None, - "p", - None, - None, - None, - "h", - RemoteAddress.Unknown, - HttpRequest(), - false, - false - ) - good.storedRawEvents must have size 0 - bad.storedRawEvents must have size 1 - r.status mustEqual StatusCodes.OK - - val brJson = parse(new String(bad.storedRawEvents.head, StandardCharsets.UTF_8)).getOrElse(Json.Null) - val failure = brJson.hcursor.downField("data").downField("failure").downField("errors").downArray.as[String] - val payload = brJson.hcursor.downField("data").downField("payload").as[String] - - failure must beRight( - "Illegal query: Invalid input ' ', expected '+', '=', query-char, 'EOI', '&' or pct-encoded (line 1, column 2): a b\n ^" - ) - 
payload must beRight("a b") - } - } - - "extractQueryParams" in { - "extract the parameters from a valid querystring" in { - val qs = Some("a=b&c=d") - val r = service.extractQueryParams(qs) - - r shouldEqual Right(Map("a" -> "b", "c" -> "d")) - } - - "fail on invalid querystring" in { - val qs = Some("a=b&c=d a") - val r = service.extractQueryParams(qs) - - r should beLeft - } - } - - "preflightResponse" in { - "return a response appropriate to cors preflight options requests" in { - service.preflightResponse(HttpRequest(), CORSConfig(-1.seconds)) shouldEqual HttpResponse().withHeaders( - List( - `Access-Control-Allow-Origin`(HttpOriginRange.`*`), - `Access-Control-Allow-Credentials`(true), - `Access-Control-Allow-Headers`("Content-Type", "SP-Anonymous"), - `Access-Control-Max-Age`(-1) - ) - ) - } - } - - "flashCrossDomainPolicy" in { - "return the cross domain policy with the specified config" in { - service.flashCrossDomainPolicy(CrossDomainConfig(true, List("*"), false)) shouldEqual HttpResponse( - entity = HttpEntity( - contentType = ContentType(MediaTypes.`text/xml`, HttpCharsets.`ISO-8859-1`), - string = - "\n\n \n" - ) - ) - } - "return the cross domain policy with multiple domains" in { - service.flashCrossDomainPolicy(CrossDomainConfig(true, List("*", "acme.com"), false)) shouldEqual HttpResponse( - entity = HttpEntity( - contentType = ContentType(MediaTypes.`text/xml`, HttpCharsets.`ISO-8859-1`), - string = - "\n\n \n \n" - ) - ) - } - "return the cross domain policy with no domains" in { - service.flashCrossDomainPolicy(CrossDomainConfig(true, List.empty, false)) shouldEqual HttpResponse( - entity = HttpEntity( - contentType = ContentType(MediaTypes.`text/xml`, HttpCharsets.`ISO-8859-1`), - string = "\n\n\n" - ) - ) - } - "return 404 if the specified config is absent" in { - service.flashCrossDomainPolicy(CrossDomainConfig(false, List("*"), false)) shouldEqual - HttpResponse(404, entity = "404 not found") - } - } - - "rootResponse" in { - "return the configured response for root requests" in { - service.rootResponse(RootResponseConfig(enabled = true, 302, Map("Location" -> "https://127.0.0.1/"))) shouldEqual HttpResponse( - 302, - collection.immutable.Seq(RawHeader("Location", "https://127.0.0.1/")), - entity = "" - ) - } - "return the configured response for root requests (no headers)" in { - service.rootResponse(RootResponseConfig(enabled = true, 302)) shouldEqual HttpResponse( - 302, - entity = "" - ) - } - "return the original 404 if not configured" in { - service.rootResponse shouldEqual HttpResponse( - 404, - entity = "404 not found" - ) - } - } - - "buildEvent" in { - "fill the correct values if SP-Anonymous is not present" in { - val l = `Location`("l") - val xff = `X-Forwarded-For`(RemoteAddress(InetAddress.getByName("127.0.0.1"))) - val ct = Some("image/gif") - val r = HttpRequest().withHeaders(l :: hs) - val e = service.buildEvent(Some("q"), Some("b"), "p", Some("ua"), Some("ref"), "h", "ip", r, "nuid", ct, None) - e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" - e.ipAddress shouldEqual "ip" - e.encoding shouldEqual "UTF-8" - e.collector shouldEqual s"app-version-kinesis" - e.querystring shouldEqual "q" - e.body shouldEqual "b" - e.path shouldEqual "p" - e.userAgent shouldEqual "ua" - e.refererUri shouldEqual "ref" - e.hostname shouldEqual "h" - e.networkUserId shouldEqual "nuid" - e.headers shouldEqual (l.unsafeToString :: xff.unsafeToString :: ct.toList).asJava - e.contentType shouldEqual ct.get - } - "fill the correct 
values if SP-Anonymous is present" in { - val l = `Location`("l") - val ct = Some("image/gif") - val r = HttpRequest().withHeaders(l :: hs) - val nuid = service.networkUserId(r, None, Some("*")).get - val e = - service.buildEvent( - Some("q"), - Some("b"), - "p", - Some("ua"), - Some("ref"), - "h", - "unknown", - r, - nuid, - ct, - Some("*") - ) - e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" - e.ipAddress shouldEqual "unknown" - e.encoding shouldEqual "UTF-8" - e.collector shouldEqual s"app-version-kinesis" - e.querystring shouldEqual "q" - e.body shouldEqual "b" - e.path shouldEqual "p" - e.userAgent shouldEqual "ua" - e.refererUri shouldEqual "ref" - e.hostname shouldEqual "h" - e.networkUserId shouldEqual "00000000-0000-0000-0000-000000000000" - e.headers shouldEqual (l.unsafeToString :: ct.toList).asJava - e.contentType shouldEqual ct.get - } - "have a null queryString if it's None" in { - val l = `Location`("l") - val ct = Some("image/gif") - val r = HttpRequest().withHeaders(l :: hs) - val nuid = service.networkUserId(r, None, Some("*")).get - val e = - service.buildEvent( - None, - Some("b"), - "p", - Some("ua"), - Some("ref"), - "h", - "unknown", - r, - nuid, - ct, - Some("*") - ) - e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" - e.ipAddress shouldEqual "unknown" - e.encoding shouldEqual "UTF-8" - e.collector shouldEqual s"app-version-kinesis" - e.querystring shouldEqual null - e.body shouldEqual "b" - e.path shouldEqual "p" - e.userAgent shouldEqual "ua" - e.refererUri shouldEqual "ref" - e.hostname shouldEqual "h" - e.networkUserId shouldEqual "00000000-0000-0000-0000-000000000000" - e.headers shouldEqual (l.unsafeToString :: ct.toList).asJava - e.contentType shouldEqual ct.get - } - "have an empty nuid if SP-Anonymous is present" in { - val l = `Location`("l") - val ct = Some("image/gif") - val r = HttpRequest().withHeaders(l :: hs) - val nuid = service.networkUserId(r, None, Some("*")).get - val e = - service.buildEvent( - None, - Some("b"), - "p", - Some("ua"), - Some("ref"), - "h", - "unknown", - r, - nuid, - ct, - Some("*") - ) - e.networkUserId shouldEqual "00000000-0000-0000-0000-000000000000" - } - "have a nuid if SP-Anonymous is not present" in { - val l = `Location`("l") - val ct = Some("image/gif") - val r = HttpRequest().withHeaders(l :: hs) - val e = - service.buildEvent(None, Some("b"), "p", Some("ua"), Some("ref"), "h", "ip", r, "nuid", ct, None) - e.networkUserId shouldEqual "nuid" - } - } - - "sinkEvent" in { - "send back the produced events" in { - val ProbeService(s, good, bad) = probeService() - s.sinkEvent(event, "key") - good.storedRawEvents must have size 1 - bad.storedRawEvents must have size 0 - good.storedRawEvents.head.zip(serializer.serialize(event)).forall { case (a, b) => a mustEqual b } - } - } - - "sinkBad" in { - "write out the generated bad row" in { - val br = BadRow.GenericError( - Processor("", ""), - Failure.GenericFailure(Instant.now(), NonEmptyList.one("IllegalQueryString")), - Payload.RawPayload("") - ) - val ProbeService(s, good, bad) = probeService() - s.sinkBad(br, "key") - - bad.storedRawEvents must have size 1 - good.storedRawEvents must have size 0 - bad.storedRawEvents.head.zip(br.compact).forall { case (a, b) => a mustEqual b } - } - } - - "buildHttpResponse" in { - val redirConf = TestUtils.testConf.redirectMacro - val domain = TestUtils.testConf.redirectDomains.head - - "rely on buildRedirectHttpResponse if redirect is true" in { - val res = 
service.buildHttpResponse(event, Map("u" -> s"https://$domain/12"), hs, true, true, false, redirConf) - res shouldEqual HttpResponse(302).withHeaders(`RawHeader`("Location", s"https://$domain/12") :: hs) - } - "send back a gif if pixelExpected is true" in { - val res = service.buildHttpResponse(event, Map.empty, hs, false, true, false, redirConf) - res shouldEqual HttpResponse(200) - .withHeaders(hs) - .withEntity(HttpEntity(contentType = ContentType(MediaTypes.`image/gif`), bytes = CollectorService.pixel)) - } - "send back a found if pixelExpected and bounce is true" in { - val res = service.buildHttpResponse(event, Map.empty, hs, false, true, true, redirConf) - res shouldEqual HttpResponse(302).withHeaders(hs) - } - "send back ok otherwise" in { - val res = service.buildHttpResponse(event, Map.empty, hs, false, false, false, redirConf) - res shouldEqual HttpResponse(200, entity = "ok").withHeaders(hs) - } - } - - "buildUsualHttpResponse" in { - "send back a found if pixelExpected and bounce is true" in { - service.buildUsualHttpResponse(true, true) shouldEqual HttpResponse(302) - } - "send back a gif if pixelExpected is true" in { - service.buildUsualHttpResponse(true, false) shouldEqual HttpResponse(200).withEntity( - HttpEntity(contentType = ContentType(MediaTypes.`image/gif`), bytes = CollectorService.pixel) - ) - } - "send back ok otherwise" in { - service.buildUsualHttpResponse(false, true) shouldEqual HttpResponse(200, entity = "ok") - } - } - - "buildRedirectHttpResponse" in { - val redirConf = TestUtils.testConf.redirectMacro - val domain = TestUtils.testConf.redirectDomains.head - "give back a 302 if redirecting and there is a u query param" in { - val res = service.buildRedirectHttpResponse(event, Map("u" -> s"http://$domain/12"), redirConf) - res shouldEqual HttpResponse(302).withHeaders(`RawHeader`("Location", s"http://$domain/12")) - } - "give back a 400 if redirecting and there are no u query params" in { - val res = service.buildRedirectHttpResponse(event, Map.empty, redirConf) - res shouldEqual HttpResponse(400) - } - "the redirect url should ignore a cookie replacement macro on redirect if not enabled" in { - event.networkUserId = "1234" - val res = - service.buildRedirectHttpResponse(event, Map("u" -> s"http://$domain/?uid=$${SP_NUID}"), redirConf) - res shouldEqual HttpResponse(302).withHeaders(`RawHeader`("Location", s"http://$domain/?uid=$${SP_NUID}")) - } - "the redirect url should support a cookie replacement macro on redirect if enabled" in { - event.networkUserId = "1234" - val res = service.buildRedirectHttpResponse( - event, - Map("u" -> s"http://$domain/?uid=$${SP_NUID}"), - redirConf.copy(enabled = true) - ) - res shouldEqual HttpResponse(302).withHeaders(`RawHeader`("Location", s"http://$domain/?uid=1234")) - } - "the redirect url should allow for custom token placeholders" in { - event.networkUserId = "1234" - val res = service.buildRedirectHttpResponse( - event, - Map("u" -> s"http://$domain/?uid=[TOKEN]"), - redirConf.copy(enabled = true, Some("[TOKEN]")) - ) - res shouldEqual HttpResponse(302).withHeaders(`RawHeader`("Location", s"http://$domain/?uid=1234")) - } - "the redirect url should allow for double encoding for return redirects" in { - val res = - service.buildRedirectHttpResponse(event, Map("u" -> s"http://$domain/a%3Db"), redirConf) - res shouldEqual HttpResponse(302).withHeaders(`RawHeader`("Location", s"http://$domain/a%3Db")) - } - "give back a 400 if redirecting to a disallowed domain" in { - val res = 
service.buildRedirectHttpResponse(event, Map("u" -> s"http://invalid.acme.com/12"), redirConf) - res shouldEqual HttpResponse(400) - } - "give back a 302 if redirecting to an unknown domain, with no restrictions on domains" in { - def conf = TestUtils.testConf.copy(redirectDomains = Set.empty) - val permissiveService = new CollectorService( - conf, - CollectorSinks(new TestSink, new TestSink), - "app", - "version" - ) - val res = - permissiveService.buildRedirectHttpResponse(event, Map("u" -> s"http://unknown.acme.com/12"), redirConf) - res shouldEqual HttpResponse(302).withHeaders(`RawHeader`("Location", s"http://unknown.acme.com/12")) - } - } - - "cookieHeader" in { - "give back a cookie header with the appropriate configuration" in { - val nuid = "nuid" - val conf = CookieConfig( - true, - "name", - 5.seconds, - Some(List("domain")), - None, - secure = false, - httpOnly = false, - sameSite = None - ) - val Some(`Set-Cookie`(cookie)) = service.cookieHeader(HttpRequest(), Some(conf), nuid, false, None) - - cookie.name shouldEqual conf.name - cookie.value shouldEqual nuid - cookie.domain shouldEqual None - cookie.path shouldEqual Some("/") - cookie.expires must beSome - (cookie.expires.get - DateTime.now.clicks).clicks must beCloseTo(conf.expiration.toMillis, 1000L) - cookie.secure must beFalse - cookie.httpOnly must beFalse - cookie.extension must beEmpty - } - "give back None if no configuration is given" in { - service.cookieHeader(HttpRequest(), None, "nuid", false, None) shouldEqual None - } - "give back None if doNoTrack is true" in { - val conf = CookieConfig( - true, - "name", - 5.seconds, - Some(List("domain")), - None, - secure = false, - httpOnly = false, - sameSite = None - ) - service.cookieHeader(HttpRequest(), Some(conf), "nuid", true, None) shouldEqual None - } - "give back None if SP-Anonymous header is present" in { - val conf = CookieConfig( - true, - "name", - 5.seconds, - Some(List("domain")), - None, - secure = false, - httpOnly = false, - sameSite = None - ) - service.cookieHeader(HttpRequest(), Some(conf), "nuid", true, Some("*")) shouldEqual None - } - "give back a cookie header with Secure, HttpOnly and SameSite=None" in { - val nuid = "nuid" - val conf = CookieConfig( - true, - "name", - 5.seconds, - Some(List("domain")), - None, - secure = true, - httpOnly = true, - sameSite = Some("None") - ) - val Some(`Set-Cookie`(cookie)) = - service.cookieHeader(HttpRequest(), Some(conf), networkUserId = nuid, doNotTrack = false, spAnonymous = None) - cookie.secure must beTrue - cookie.httpOnly must beTrue - cookie.extension must beSome("SameSite=None") - service.cookieHeader(HttpRequest(), Some(conf), nuid, true, None) shouldEqual None - } - } - - "bounceLocationHeader" in { - "build a location header if bounce is true" in { - val header = service.bounceLocationHeader( - Map("a" -> "b"), - HttpRequest().withUri(Uri("st")), - CookieBounceConfig(true, "bounce", "", None), - true - ) - header shouldEqual Some(`Location`("st?a=b&bounce=true")) - } - "give back none otherwise" in { - val header = service.bounceLocationHeader( - Map("a" -> "b"), - HttpRequest().withUri(Uri("st")), - CookieBounceConfig(false, "bounce", "", None), - false - ) - header shouldEqual None - } - "use forwarded protocol header if present and enabled" in { - val header = service.bounceLocationHeader( - Map("a" -> "b"), - HttpRequest().withUri(Uri("http://st")).addHeader(RawHeader("X-Forwarded-Proto", "https")), - CookieBounceConfig(true, "bounce", "", Some("X-Forwarded-Proto")), - true - ) - header 
shouldEqual Some(`Location`("https://st?a=b&bounce=true")) - } - "allow missing forwarded protocol header if forward header is enabled but absent" in { - val header = service.bounceLocationHeader( - Map("a" -> "b"), - HttpRequest().withUri(Uri("http://st")), - CookieBounceConfig(true, "bounce", "", Some("X-Forwarded-Proto")), - true - ) - header shouldEqual Some(`Location`("http://st?a=b&bounce=true")) - } - } - - "headers" in { - "filter out the correct headers if SP-Anonymous is not present" in { - val request = HttpRequest() - .withHeaders( - List( - `Location`("a"), - `Raw-Request-URI`("uri") - ) - ) - .withAttributes( - Map(AttributeKeys.remoteAddress -> RemoteAddress.Unknown) - ) - service.headers(request, None) shouldEqual List(`Location`("a").unsafeToString) - } - "filter out the correct headers if SP-Anonymous is present" in { - val request = HttpRequest() - .withHeaders( - List( - `Location`("a"), - `Raw-Request-URI`("uri"), - `X-Forwarded-For`(RemoteAddress(InetAddress.getByName("127.0.0.1"))), - `X-Real-Ip`(RemoteAddress(InetAddress.getByName("127.0.0.1"))), - `Cookie`( - "_sp_id.dc78", - "82dd4038-e749-4f9c-b502-d54a3611cc89.1598608039.19.1605281535.1604957469.5a2d5fe4-6323-4414-9bf0-9867a940d53b" - ) - ) - ) - .withAttributes( - Map(AttributeKeys.remoteAddress -> RemoteAddress.Unknown) - ) - service.headers(request, Some("*")) shouldEqual List(`Location`("a").unsafeToString) - } - } - - "ipAndPartitionkey" in { - "give back the ip and partition key as ip if remote address is defined" in { - val address = RemoteAddress(InetAddress.getByName("localhost")) - service.ipAndPartitionKey(address, true) shouldEqual (("127.0.0.1", "127.0.0.1")) - } - "give back the ip and a uuid as partition key if ipAsPartitionKey is false" in { - val address = RemoteAddress(InetAddress.getByName("localhost")) - val (ip, pkey) = service.ipAndPartitionKey(address, false) - ip shouldEqual "127.0.0.1" - pkey must beMatching(uuidRegex) - } - "give back unknown as ip and a random uuid as partition key if the address isn't known" in { - val (ip, pkey) = service.ipAndPartitionKey(RemoteAddress.Unknown, true) - ip shouldEqual "unknown" - pkey must beMatching(uuidRegex) - } - } - - "netwokUserId" in { - "with SP-Anonymous header not present" in { - "give back the nuid query param if present" in { - service.networkUserId( - HttpRequest().withUri(Uri().withRawQueryString("nuid=12")), - Some(HttpCookie("nuid", "13")), - None - ) shouldEqual Some("12") - } - "give back the request cookie if there no nuid query param" in { - service.networkUserId(HttpRequest(), Some(HttpCookie("nuid", "13")), None) shouldEqual Some("13") - } - "give back none otherwise" in { - service.networkUserId(HttpRequest(), None, None) shouldEqual None - } - } - - "with SP-Anonymous header present" in { - "give back the dummy nuid" in { - "if query param is present" in { - service.networkUserId( - HttpRequest().withUri(Uri().withRawQueryString("nuid=12")), - Some(HttpCookie("nuid", "13")), - Some("*") - ) shouldEqual Some("00000000-0000-0000-0000-000000000000") - } - "if the request cookie can be used in place of a missing nuid query param" in { - service.networkUserId(HttpRequest(), Some(HttpCookie("nuid", "13")), Some("*")) shouldEqual Some( - "00000000-0000-0000-0000-000000000000" - ) - } - "in any other case" in { - service.networkUserId(HttpRequest(), None, Some("*")) shouldEqual Some( - "00000000-0000-0000-0000-000000000000" - ) - } - } - } - } - - "accessControlAllowOriginHeader" in { - "give a restricted ACAO header if there is an 
Origin header in the request" in { - val origin = HttpOrigin("http", Host("origin")) - val request = HttpRequest().withHeaders(`Origin`(origin)) - service.accessControlAllowOriginHeader(request) shouldEqual - `Access-Control-Allow-Origin`(HttpOriginRange.Default(List(origin))) - } - "give an open ACAO header if there are no Origin headers in the request" in { - val request = HttpRequest() - service.accessControlAllowOriginHeader(request) shouldEqual - `Access-Control-Allow-Origin`(HttpOriginRange.`*`) - } - } - - "cookieDomain" in { - "not return a domain" in { - "if a list of domains is not supplied in the config and there is no fallback domain" in { - val request = HttpRequest() - val cookieConfig = CookieConfig(true, "name", 5.seconds, None, None, false, false, None) - service.cookieDomain(request.headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual None - } - "if a list of domains is supplied in the config but the Origin request header is empty and there is no fallback domain" in { - val request = HttpRequest() - val cookieConfig = CookieConfig(true, "name", 5.seconds, Some(List("domain.com")), None, false, false, None) - service.cookieDomain(request.headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual None - } - "if none of the domains in the request's Origin header has a match in the list of domains supplied with the config and there is no fallback domain" in { - val origins = Seq(HttpOrigin("http", Host("origin.com")), HttpOrigin("http", Host("otherorigin.com", 8080))) - val request = HttpRequest().withHeaders(`Origin`(origins)) - val cookieConfig = - CookieConfig(true, "name", 5.seconds, Some(List("domain.com", "otherdomain.com")), None, false, false, None) - service.cookieDomain(request.headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual None - } - } - "return the fallback domain" in { - "if a list of domains is not supplied in the config but a fallback domain is configured" in { - val request = HttpRequest() - val cookieConfig = CookieConfig(true, "name", 5.seconds, None, Some("fallbackDomain"), false, false, None) - service.cookieDomain(request.headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual Some( - "fallbackDomain" - ) - } - "if the Origin header is empty and a fallback domain is configured" in { - val request = HttpRequest() - val cookieConfig = - CookieConfig(true, "name", 5.seconds, Some(List("domain.com")), Some("fallbackDomain"), false, false, None) - service.cookieDomain(request.headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual Some( - "fallbackDomain" - ) - } - "if none of the domains in the request's Origin header has a match in the list of domains supplied with the config but a fallback domain is configured" in { - val origins = Seq(HttpOrigin("http", Host("origin.com")), HttpOrigin("http", Host("otherorigin.com", 8080))) - val request = HttpRequest().withHeaders(`Origin`(origins)) - val cookieConfig = CookieConfig( - true, - "name", - 5.seconds, - Some(List("domain.com", "otherdomain.com")), - Some("fallbackDomain"), - false, - false, - None - ) - service.cookieDomain(request.headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual Some( - "fallbackDomain" - ) - } - } - "return only the first match if multiple domains from the request's Origin header have matches in the list of domains supplied with the config" in { - val origins = - Seq(HttpOrigin("http", Host("www.domain.com")), HttpOrigin("http", Host("www.otherdomain.com", 8080))) - val request = 
HttpRequest().withHeaders(`Origin`(origins)) - val cookieConfig = CookieConfig( - true, - "name", - 5.seconds, - Some(List("domain.com", "otherdomain.com")), - Some("fallbackDomain"), - false, - false, - None - ) - service.cookieDomain(request.headers, cookieConfig.domains, cookieConfig.fallbackDomain) shouldEqual Some( - "domain.com" - ) - } - } - - "extractHosts" in { - "correctly extract the host names from a list of values in the request's Origin header" in { - val origins = - Seq(HttpOrigin("http", Host("origin.com")), HttpOrigin("http", Host("subdomain.otherorigin.gov.co.uk", 8080))) - service.extractHosts(origins) shouldEqual Seq("origin.com", "subdomain.otherorigin.gov.co.uk") - } - } - - "validMatch" in { - val domain = "snplow.com" - "true for valid matches" in { - val validHost1 = "snplow.com" - val validHost2 = "blog.snplow.com" - val validHost3 = "blog.snplow.com.snplow.com" - service.validMatch(validHost1, domain) shouldEqual true - service.validMatch(validHost2, domain) shouldEqual true - service.validMatch(validHost3, domain) shouldEqual true - } - "false for invalid matches" in { - val invalidHost1 = "notsnplow.com" - val invalidHost2 = "blog.snplow.comsnplow.com" - service.validMatch(invalidHost1, domain) shouldEqual false - service.validMatch(invalidHost2, domain) shouldEqual false - } - } - - "determinePath" in { - val vendor = "com.acme" - val version1 = "track" - val version2 = "redirect" - val version3 = "iglu" - - "should correctly replace the path in the request if a mapping is provided" in { - val expected1 = "/com.snowplowanalytics.snowplow/tp2" - val expected2 = "/r/tp2" - val expected3 = "/com.snowplowanalytics.iglu/v1" - - service.determinePath(vendor, version1) shouldEqual expected1 - service.determinePath(vendor, version2) shouldEqual expected2 - service.determinePath(vendor, version3) shouldEqual expected3 - } - - "should pass on the original path if no mapping for it can be found" in { - val service = new CollectorService( - TestUtils.testConf.copy(paths = Map.empty[String, String]), - CollectorSinks(new TestSink, new TestSink), - "", - "" - ) - val expected1 = "/com.acme/track" - val expected2 = "/com.acme/redirect" - val expected3 = "/com.acme/iglu" - - service.determinePath(vendor, version1) shouldEqual expected1 - service.determinePath(vendor, version2) shouldEqual expected2 - service.determinePath(vendor, version3) shouldEqual expected3 - } - } - } -} diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala deleted file mode 100644 index 649353fbe..000000000 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestSink.scala +++ /dev/null @@ -1,30 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.Sink - -import scala.collection.mutable.ListBuffer - -// Allow the testing framework to test collection events using the -// same methods from AbstractSink as the other sinks. -class TestSink extends Sink { - - override val maxBytes = Int.MaxValue - - private val buf: ListBuffer[Array[Byte]] = ListBuffer() - def storedRawEvents: List[Array[Byte]] = buf.toList - - override def storeRawEvents(events: List[Array[Byte]], key: String): Unit = - buf ++= events - - override def shutdown(): Unit = () -} diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala deleted file mode 100644 index 7fc024d90..000000000 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/TestUtils.scala +++ /dev/null @@ -1,69 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream -import scala.concurrent.duration._ -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ - -object TestUtils { - val testConf = - CollectorConfig( - interface = "0.0.0.0", - port = 8080, - paths = Map( - "/com.acme/track" -> "/com.snowplowanalytics.snowplow/tp2", - "/com.acme/redirect" -> "/r/tp2", - "/com.acme/iglu" -> "/com.snowplowanalytics.iglu/v1" - ), - p3p = P3PConfig("/w3c/p3p.xml", "NOI DSP COR NID PSA OUR IND COM NAV STA"), - CrossDomainConfig(enabled = true, List("*"), secure = false), - cookie = CookieConfig( - true, - "sp", - 365.days, - None, - None, - secure = false, - httpOnly = false, - sameSite = None - ), - doNotTrackCookie = DoNotTrackCookieConfig(false, "abc", "123"), - cookieBounce = CookieBounceConfig(false, "bounce", "new-nuid", None), - redirectMacro = RedirectMacroConfig(false, None), - rootResponse = RootResponseConfig(false, 404), - cors = CORSConfig(-1.seconds), - streams = StreamsConfig( - good = "good", - bad = "bad", - useIpAddressAsPartitionKey = false, - sink = Kinesis( - maxBytes = 1000000, - region = "us-east-1", - threadPoolSize = 12, - aws = AWSConfig("cpf", "cpf"), - backoffPolicy = KinesisBackoffPolicyConfig(500L, 1500L, 3), - customEndpoint = None, - sqsGoodBuffer = Some("good-buffer"), - sqsBadBuffer = Some("bad-buffer"), - sqsMaxBytes = 192000, - startupCheckInterval = 1.second - ), - buffer = BufferConfig(4000000L, 500L, 60000L) - ), - monitoring = MonitoringConfig(MetricsConfig(StatsdConfig(false, "localhost", 8125, 10.seconds))), - telemetry = None, - enableDefaultRedirect = false, - redirectDomains = Set("localhost"), - terminationDeadline = 10.seconds, - preTerminationPeriod = 10.seconds, - preTerminationUnhealthy = false, - experimental = ExperimentalConfig(WarmupConfig(false, 2000, 2000, 3)) - ) -} diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigReaderSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigReaderSpec.scala deleted 
file mode 100644 index 161bde64b..000000000 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigReaderSpec.scala +++ /dev/null @@ -1,41 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream.config - -import pureconfig.ConfigSource -import pureconfig.error.ConfigReaderFailure -import org.specs2.mutable.Specification - -import com.snowplowanalytics.snowplow.collectors.scalastream.model.CollectorConfig - -class ConfigReaderSpec extends Specification { - - "The collector config reader" should { - "parse a valid config file" in { - val source = getConfig("/configs/valid-config.hocon") - source.load[CollectorConfig] must beRight - } - - "reject a config file with invalid fallbackDomain" in { - val source = getConfig("/configs/invalid-fallback-domain.hocon") - source.load[CollectorConfig] must beLeft.like { - case failures => - failures.toList must contain { (failure: ConfigReaderFailure) => - failure.description must startWith("fallbackDomain contains invalid character") - } - } - } - } - - def getConfig(resourceName: String): ConfigSource = - ConfigSource.url(getClass.getResource(resourceName)).withFallback(ConfigSource.default) - -} diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigSpec.scala deleted file mode 100644 index fb12ffa2c..000000000 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/config/ConfigSpec.scala +++ /dev/null @@ -1,176 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream.config - -import com.snowplowanalytics.snowplow.collectors.scalastream.Collector -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import org.specs2.mutable.Specification -import org.specs2.specification.core.{Fragment, Fragments} - -import java.nio.file.Paths -import scala.concurrent.duration.DurationInt - -abstract class ConfigSpec extends Specification { - - def configRefFactory(app: String): CollectorConfig = CollectorConfig( - interface = "0.0.0.0", - port = 8080, - paths = Map.empty[String, String], - p3p = P3PConfig( - policyRef = "/w3c/p3p.xml", - CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" - ), - crossDomain = CrossDomainConfig( - enabled = false, - domains = List("*"), - secure = true - ), - cookie = CookieConfig( - enabled = true, - expiration = 365.days, - name = "sp", - domains = None, - fallbackDomain = None, - secure = true, - httpOnly = true, - sameSite = Some("None") - ), - doNotTrackCookie = DoNotTrackCookieConfig( - enabled = false, - name = "", - value = "" - ), - cookieBounce = CookieBounceConfig( - enabled = false, - name = "n3pc", - fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000", - forwardedProtocolHeader = None - ), - redirectMacro = RedirectMacroConfig( - enabled = false, - placeholder = None - ), - rootResponse = RootResponseConfig( - enabled = false, - statusCode = 302, - headers = Map.empty[String, String], - body = "" - ), - cors = CORSConfig(60.minutes), - monitoring = MonitoringConfig(MetricsConfig(StatsdConfig(false, "localhost", 8125, 10.seconds))), - telemetry = Some(TelemetryConfig()), - ssl = SSLConfig(enable = false, redirect = false, port = 443), - enableDefaultRedirect = false, - redirectDomains = Set.empty, - terminationDeadline = 10.seconds, - preTerminationPeriod = 10.seconds, - preTerminationUnhealthy = false, - streams = StreamsConfig( - good = "good", - bad = "bad", - useIpAddressAsPartitionKey = false, - buffer = - if (app == "pubsub") - BufferConfig( - byteLimit = 100000, - recordLimit = 40, - timeLimit = 1000 - ) - else - BufferConfig( - byteLimit = 3145728, - recordLimit = 500, - timeLimit = 5000 - ), - sink = sinkConfigRefFactory(app) - ), - experimental = ExperimentalConfig(WarmupConfig(false, 2000, 2000, 3)) - ) - - def sinkConfigRefFactory(app: String): SinkConfig = app match { - case "nsq" => Nsq(maxBytes = 1000000, "nsqHost", 4150) - case "kafka" => Kafka(maxBytes = 1000000, "localhost:9092,another.host:9092", 10, None) - case "pubsub" => - GooglePubSub( - maxBytes = 10000000, - googleProjectId = "google-project-id", - backoffPolicy = GooglePubSubBackoffPolicyConfig( - minBackoff = 1000, - maxBackoff = 1000, - totalBackoff = 9223372036854L, - multiplier = 2, - initialRpcTimeout = 10000, - maxRpcTimeout = 10000, - rpcTimeoutMultiplier = 2 - ), - startupCheckInterval = 1.second, - retryInterval = 10.seconds, - gcpUserAgent = GcpUserAgent(productName = "Snowplow OSS") - ) - case "sqs" => - Sqs( - maxBytes = 192000, - region = "eu-central-1", - threadPoolSize = 10, - aws = AWSConfig( - accessKey = "iam", - secretKey = "iam" - ), - backoffPolicy = SqsBackoffPolicyConfig( - minBackoff = 500, - maxBackoff = 1500, - maxRetries = 3 - ), - startupCheckInterval = 1.second - ) - case "stdout" => Stdout(maxBytes = 1000000000) - case "kinesis" => - Kinesis( - maxBytes = 1000000, - region = "eu-central-1", - threadPoolSize = 10, - aws = AWSConfig( - accessKey = "iam", - secretKey = "iam" - ), - backoffPolicy = KinesisBackoffPolicyConfig( - 
minBackoff = 500, - maxBackoff = 1500, - maxRetries = 3 - ), - sqsBadBuffer = None, - sqsGoodBuffer = None, - sqsMaxBytes = 192000, - customEndpoint = None, - startupCheckInterval = 1.second - ) - } - - def makeConfigTest(app: String, appVer: String, scalaVer: String): Fragments = { - object stubCollector extends Collector { - def appName = app - def appVersion = appVer - def scalaVersion = scalaVer - } - - "Config.parseConfig" >> Fragment.foreach( - Seq(("minimal", app), ("extended", app)) - ) { - case (suffix, app) => - s"accept example $suffix $app config" >> { - val config = Paths.get(getClass.getResource(s"/config.$app.$suffix.hocon").toURI) - val argv = Array("--config", config.toString) - val (result, _) = stubCollector.parseConfig(argv) - (result must be).equalTo(configRefFactory(app)) - } - } - } -} diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatchSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatchSpec.scala deleted file mode 100644 index d3ecdd3b0..000000000 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/utils/SplitBatchSpec.scala +++ /dev/null @@ -1,155 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream -package utils - -import org.apache.thrift.TDeserializer - -import io.circe.Json -import io.circe.parser._ -import io.circe.syntax._ - -import com.snowplowanalytics.iglu.core.circe.implicits._ -import com.snowplowanalytics.iglu.core.SelfDescribingData -import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload -import com.snowplowanalytics.snowplow.badrows._ -import com.snowplowanalytics.snowplow.collectors.scalastream.model.SplitBatchResult - -import org.specs2.mutable.Specification - -class SplitBatchSpec extends Specification { - val splitBatch: SplitBatch = SplitBatch("app", "version") - - "SplitBatch.split" should { - "Batch a list of strings based on size" in { - splitBatch.split(List("a", "b", "c").map(Json.fromString), 9, 1) must_== - SplitBatchResult(List(List("c"), List("b", "a")).map(_.map(Json.fromString)), Nil) - } - - "Reject only those strings which are too big" in { - splitBatch.split(List("1234567", "1", "123").map(Json.fromString), 8, 0) must_== - SplitBatchResult(List(List("123", "1").map(Json.fromString)), List("1234567").map(Json.fromString)) - } - - "Batch a long list of strings" in { - splitBatch.split( - List("123456778901", "123456789", "12345678", "1234567", "123456", "12345", "1234", "123", "12", "1") - .map(Json.fromString), - 13, - 0 - ) must_== - SplitBatchResult( - List( - List("1", "12", "123"), - List("1234", "12345"), - List("123456"), - List("1234567"), - List("12345678"), - List("123456789") - ).map(_.map(Json.fromString)), - List("123456778901").map(Json.fromString) - ) - } - } - - "SplitBatch.splitAndSerializePayload" should { - "Serialize an empty CollectorPayload" in { - val actual = splitBatch.splitAndSerializePayload(new CollectorPayload(), 100) - val target = new CollectorPayload() - new 
TDeserializer().deserialize(target, actual.good.head) - target must_== new CollectorPayload() - } - - "Reject an oversized GET CollectorPayload" in { - val payload = new CollectorPayload() - payload.setQuerystring("x" * 1000) - val actual = splitBatch.splitAndSerializePayload(payload, 100) - val res = parse(new String(actual.bad.head)).toOption.get - val selfDesc = SelfDescribingData.parse(res).toOption.get - val badRow = selfDesc.data.as[BadRow].toOption.get - badRow must beAnInstanceOf[BadRow.SizeViolation] - val sizeViolation = badRow.asInstanceOf[BadRow.SizeViolation] - sizeViolation.failure.maximumAllowedSizeBytes must_== 100 - sizeViolation.failure.actualSizeBytes must_== 1019 - sizeViolation.failure.expectation must_== "oversized collector payload: GET requests cannot be split" - sizeViolation.payload.event must_== "CollectorP" - sizeViolation.processor shouldEqual Processor("app", "version") - actual.good must_== Nil - } - - "Reject an oversized POST CollectorPayload with an unparseable body" in { - val payload = new CollectorPayload() - payload.setBody("s" * 1000) - val actual = splitBatch.splitAndSerializePayload(payload, 100) - val res = parse(new String(actual.bad.head)).toOption.get - val selfDesc = SelfDescribingData.parse(res).toOption.get - val badRow = selfDesc.data.as[BadRow].toOption.get - badRow must beAnInstanceOf[BadRow.SizeViolation] - val sizeViolation = badRow.asInstanceOf[BadRow.SizeViolation] - sizeViolation.failure.maximumAllowedSizeBytes must_== 100 - sizeViolation.failure.actualSizeBytes must_== 1019 - sizeViolation - .failure - .expectation must_== "oversized collector payload: cannot split POST requests which are not json expected json value got 'ssssss...' (line 1, column 1)" - sizeViolation.payload.event must_== "CollectorP" - sizeViolation.processor shouldEqual Processor("app", "version") - } - - "Reject an oversized POST CollectorPayload which would be oversized even without its body" in { - val payload = new CollectorPayload() - val data = Json.obj( - "schema" := Json.fromString("s"), - "data" := Json.arr( - Json.obj("e" := "se", "tv" := "js"), - Json.obj("e" := "se", "tv" := "js") - ) - ) - payload.setBody(data.noSpaces) - payload.setPath("p" * 1000) - val actual = splitBatch.splitAndSerializePayload(payload, 1000) - actual.bad.size must_== 1 - val res = parse(new String(actual.bad.head)).toOption.get - val selfDesc = SelfDescribingData.parse(res).toOption.get - val badRow = selfDesc.data.as[BadRow].toOption.get - badRow must beAnInstanceOf[BadRow.SizeViolation] - val sizeViolation = badRow.asInstanceOf[BadRow.SizeViolation] - sizeViolation.failure.maximumAllowedSizeBytes must_== 1000 - sizeViolation.failure.actualSizeBytes must_== 1091 - sizeViolation - .failure - .expectation must_== "oversized collector payload: cannot split POST requests which are not self-describing Invalid Iglu URI: s, code: INVALID_IGLUURI" - sizeViolation - .payload - .event must_== "CollectorPayload(schema:null, ipAddress:null, timestamp:0, encoding:null, collector:null, path:ppppp" - sizeViolation.processor shouldEqual Processor("app", "version") - } - - "Split a CollectorPayload with three large events and four very large events" in { - val payload = new CollectorPayload() - val data = Json.obj( - "schema" := Schemas.SizeViolation.toSchemaUri, - "data" := Json.arr( - Json.obj("e" := "se", "tv" := "x" * 600), - Json.obj("e" := "se", "tv" := "x" * 5), - Json.obj("e" := "se", "tv" := "x" * 600), - Json.obj("e" := "se", "tv" := "y" * 1000), - Json.obj("e" := "se", "tv" := "y" * 
1000), - Json.obj("e" := "se", "tv" := "y" * 1000), - Json.obj("e" := "se", "tv" := "y" * 1000) - ) - ) - payload.setBody(data.noSpaces) - val actual = splitBatch.splitAndSerializePayload(payload, 1000) - actual.bad.size must_== 4 - actual.good.size must_== 2 - } - } -} diff --git a/examples/config.kafka.extended.hocon b/examples/config.kafka.extended.hocon index 426723138..3c03a8f90 100644 --- a/examples/config.kafka.extended.hocon +++ b/examples/config.kafka.extended.hocon @@ -284,7 +284,7 @@ collector { # Can be helpful for removing the collector from a load balancer's targets. preTerminationUnhealthy = false - # The akka server's deadline for closing connections during graceful shutdown + # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds experimental { @@ -299,29 +299,3 @@ collector { } } } - -# Akka has a variety of possible configuration options defined at -# http://doc.akka.io/docs/akka/current/scala/general/configuration.html -akka { - loglevel = WARNING # 'OFF' for no logging, 'DEBUG' for all logging. - loggers = ["akka.event.slf4j.Slf4jLogger"] - - # akka-http is the server the Stream collector uses and has configurable options defined at - # http://doc.akka.io/docs/akka-http/current/scala/http/configuration.html - http.server { - # To obtain the hostname in the collector, the 'remote-address' header - # should be set. By default, this is disabled, and enabling it - # adds the 'Remote-Address' header to every request automatically. - remote-address-header = on - - raw-request-uri-header = on - - # Define the maximum request length (the default is 2048) - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - } - - max-connections = 2048 - } -} diff --git a/examples/config.kinesis.extended.hocon b/examples/config.kinesis.extended.hocon index f21906ec8..f64928aa8 100644 --- a/examples/config.kinesis.extended.hocon +++ b/examples/config.kinesis.extended.hocon @@ -368,7 +368,7 @@ collector { # Can be helpful for removing the collector from a load balancer's targets. preTerminationUnhealthy = false - # The akka server's deadline for closing connections during graceful shutdown + # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds experimental { @@ -383,29 +383,3 @@ collector { } } } - -# Akka has a variety of possible configuration options defined at -# http://doc.akka.io/docs/akka/current/scala/general/configuration.html -akka { - loglevel = WARNING # 'OFF' for no logging, 'DEBUG' for all logging. - loggers = ["akka.event.slf4j.Slf4jLogger"] - - # akka-http is the server the Stream collector uses and has configurable options defined at - # http://doc.akka.io/docs/akka-http/current/scala/http/configuration.html - http.server { - # To obtain the hostname in the collector, the 'remote-address' header - # should be set. By default, this is disabled, and enabling it - # adds the 'Remote-Address' header to every request automatically. - remote-address-header = on - - raw-request-uri-header = on - - # Define the maximum request length (the default is 2048) - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - } - - max-connections = 2048 - } -} diff --git a/examples/config.nsq.extended.hocon b/examples/config.nsq.extended.hocon index 26b8c672f..3bb4f0b49 100644 --- a/examples/config.nsq.extended.hocon +++ b/examples/config.nsq.extended.hocon @@ -236,7 +236,7 @@ collector { # Can be helpful for removing the collector from a load balancer's targets. 
preTerminationUnhealthy = false - # The akka server's deadline for closing connections during graceful shutdown + # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds experimental { @@ -251,29 +251,3 @@ collector { } } } - -# Akka has a variety of possible configuration options defined at -# http://doc.akka.io/docs/akka/current/scala/general/configuration.html -akka { - loglevel = WARNING # 'OFF' for no logging, 'DEBUG' for all logging. - loggers = ["akka.event.slf4j.Slf4jLogger"] - - # akka-http is the server the Stream collector uses and has configurable options defined at - # http://doc.akka.io/docs/akka-http/current/scala/http/configuration.html - http.server { - # To obtain the hostname in the collector, the 'remote-address' header - # should be set. By default, this is disabled, and enabling it - # adds the 'Remote-Address' header to every request automatically. - remote-address-header = on - - raw-request-uri-header = on - - # Define the maximum request length (the default is 2048) - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - } - - max-connections = 2048 - } -} diff --git a/examples/config.pubsub.extended.hocon b/examples/config.pubsub.extended.hocon index 3f907a917..ad0001c8f 100644 --- a/examples/config.pubsub.extended.hocon +++ b/examples/config.pubsub.extended.hocon @@ -300,7 +300,7 @@ collector { # Can be helpful for removing the collector from a load balancer's targets. preTerminationUnhealthy = false - # The akka server's deadline for closing connections during graceful shutdown + # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds experimental { @@ -314,30 +314,4 @@ collector { maxCycles = 3 } } -} - -# Akka has a variety of possible configuration options defined at -# http://doc.akka.io/docs/akka/current/scala/general/configuration.html -akka { - loglevel = WARNING # 'OFF' for no logging, 'DEBUG' for all logging. - loggers = ["akka.event.slf4j.Slf4jLogger"] - - # akka-http is the server the Stream collector uses and has configurable options defined at - # http://doc.akka.io/docs/akka-http/current/scala/http/configuration.html - http.server { - # To obtain the hostname in the collector, the 'remote-address' header - # should be set. By default, this is disabled, and enabling it - # adds the 'Remote-Address' header to every request automatically. - remote-address-header = on - - raw-request-uri-header = on - - # Define the maximum request length (the default is 2048) - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - } - - max-connections = 2048 - } -} +} \ No newline at end of file diff --git a/examples/config.rabbitmq.extended.hocon b/examples/config.rabbitmq.extended.hocon deleted file mode 100644 index ca9ded4ad..000000000 --- a/examples/config.rabbitmq.extended.hocon +++ /dev/null @@ -1,293 +0,0 @@ -# Copyright (c) 2013-present Snowplow Analytics Ltd. -# All rights reserved. -# -# This software is made available by Snowplow Analytics, Ltd., -# under the terms of the Snowplow Limited Use License Agreement, Version 1.0 -# located at https://docs.snowplow.io/limited-use-license-1.0 -# BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION -# OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - - -# This file (config.hocon.sample) contains a template with -# configuration options for the Scala Stream Collector. -# -# To use, copy this to 'application.conf' and modify the configuration options. 
- -# 'collector' contains configuration options for the main Scala collector. -collector { - # The collector runs as a web service specified on the following interface and port. - interface = "0.0.0.0" - port = 8080 - - # optional SSL/TLS configuration - ssl { - enable = false - # whether to redirect HTTP to HTTPS - redirect = false - port = 443 - } - - # The collector responds with a cookie to requests with a path that matches the 'vendor/version' protocol. - # The expected values are: - # - com.snowplowanalytics.snowplow/tp2 for Tracker Protocol 2 - # - r/tp2 for redirects - # - com.snowplowanalytics.iglu/v1 for the Iglu Webhook - # Any path that matches the 'vendor/version' protocol will result in a cookie response, for use by custom webhooks - # downstream of the collector. - # But you can also map any valid (i.e. two-segment) path to one of the three defaults. - # Your custom path must be the key and the value must be one of the corresponding default paths. Both must be full - # valid paths starting with a leading slash. - # Pass in an empty map to avoid mapping. - paths { - # "/com.acme/track" = "/com.snowplowanalytics.snowplow/tp2" - # "/com.acme/redirect" = "/r/tp2" - # "/com.acme/iglu" = "/com.snowplowanalytics.iglu/v1" - } - - # Configure the P3P policy header. - p3p { - policyRef = "/w3c/p3p.xml" - CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" - } - - # Cross domain policy configuration. - # If "enabled" is set to "false", the collector will respond with a 404 to the /crossdomain.xml - # route. - crossDomain { - enabled = false - # Domains that are granted access, *.acme.com will match http://acme.com and http://sub.acme.com - domains = [ "*" ] - # Whether to only grant access to HTTPS or both HTTPS and HTTP sources - secure = true - } - - # The collector returns a cookie to clients for user identification - # with the following domain and expiration. - cookie { - enabled = true - expiration = 365 days - # Network cookie name - name = sp - # The domain is optional and will make the cookie accessible to other - # applications on the domain. Comment out these lines to tie cookies to - # the collector's full domain. - # The domain is determined by matching the domains from the Origin header of the request - # to the list below. The first match is used. If no matches are found, the fallback domain will be used, - # if configured. - # If you specify a main domain, all subdomains on it will be matched. - # If you specify a subdomain, only that subdomain will be matched. - # Examples: - # domain.com will match domain.com, www.domain.com and secure.client.domain.com - # client.domain.com will match secure.client.domain.com but not domain.com or www.domain.com - #domains = [ - # "acme1.com" # e.g. "domain.com" -> any origin domain ending with this will be matched and domain.com will be returned - # ... more domains - #] - # ... more domains - # If specified, the fallback domain will be used if none of the Origin header hosts matches the list of - # cookie domains configured above. (For example, if there is no Origin header.) - #fallbackDomain = "acme1.com" - secure = true - httpOnly = true - # The sameSite is optional. You can choose to not specify the attribute, or you can use `Strict`, - # `Lax` or `None` to limit the cookie sent context. - # Strict: the cookie will only be sent along with "same-site" requests. - # Lax: the cookie will be sent with same-site requests, and with cross-site top-level navigation. - # None: the cookie will be sent with same-site and cross-site requests. 
- sameSite = "None" - } - - # If you have a do not track cookie in place, the Scala Stream Collector can respect it by - # completely bypassing the processing of an incoming request carrying this cookie, the collector - # will simply reply by a 200 saying "do not track". - # The cookie name and value must match the configuration below, where the names of the cookies must - # match entirely and the value could be a regular expression. - doNotTrackCookie { - enabled = false - name = "" - value = "" - } - - # When enabled and the cookie specified above is missing, performs a redirect to itself to check - # if third-party cookies are blocked using the specified name. If they are indeed blocked, - # fallbackNetworkId is used instead of generating a new random one. - cookieBounce { - enabled = false - # The name of the request parameter which will be used on redirects checking that third-party - # cookies work. - name = "n3pc" - # Network user id to fallback to when third-party cookies are blocked. - fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000" - # Optionally, specify the name of the header containing the originating protocol for use in the - # bounce redirect location. Use this if behind a load balancer that performs SSL termination. - # The value of this header must be http or https. Example, if behind an AWS Classic ELB. - #forwardedProtocolHeader = "X-Forwarded-Proto" - } - - # When enabled, redirect prefix `r/` will be enabled and its query parameters resolved. - # Otherwise the request prefixed with `r/` will be dropped with `404 Not Found` - # Custom redirects configured in `paths` can still be used. - enableDefaultRedirect = false - - # Domains which are valid for collector redirects. If empty (the default) then redirects are - # allowed to any domain. - redirectDomains = [ - # "acme1.com" - ] - - # When enabled, the redirect url passed via the `u` query parameter is scanned for a placeholder - # token. All instances of that token are replaced withe the network ID. If the placeholder isn't - # specified, the default value is `${SP_NUID}`. - redirectMacro { - enabled = false - } - - # Customize response handling for requests for the root path ("/"). - # Useful if you need to redirect to web content or privacy policies regarding the use of this collector. - rootResponse { - enabled = false - statusCode = 302 - headers = { - } - body = "" - } - - # Configuration related to CORS preflight requests - cors { - # The Access-Control-Max-Age response header indicates how long the results of a preflight - # request can be cached. -1 seconds disables the cache. Chromium max is 10m, Firefox is 24h. - accessControlMaxAge = 60 minutes - } - - streams { - # Events which have successfully been collected will be stored in the good exchange - good = "raw" - - # Bad rows (https://docs.snowplowanalytics.com/docs/try-snowplow/recipes/recipe-understanding-bad-data/) will be stored in the bad exchange. - # The collector can currently produce two flavours of bad row: - # - a size_violation if an event is larger that the Kinesis (1MB) or SQS (256KB) limits; - # - a generic_error if a request's querystring cannot be parsed because of illegal characters - bad = "bad-1" - - # Whether to use the incoming event's ip as the partition key for the good stream/topic - # Note: Nsq does not make use of partition key. 
- useIpAddressAsPartitionKey = false - - sink { - # Default host for connections - host = "localhost" - # Password when connecting to the broker - port = 5672 - # AMQP user name when connecting to the broker - username = "guest" - # Password when connecting to the broker - password = "guest" - # Virtual host when connecting to the broker - virtualHost = "/" - # Routing key for collector payloads exchange - routingKeyGood: "raw" - # Routing key for bad rows exchange - routingKeyBad: "bad-1" - - # Optional. Backoff policy to retry the writes to RabbitMQ - backoffPolicy { - # Minimum backoff period in milliseconds - minBackoff = 100 - # Minimum backoff period in milliseconds - maxBackoff = 10000 - # Multiplier between two periods - multiplier = 2 - } - - # Optional. Size of the thread pool used to send the requests to RabbitMQ. - # The thread pool is shared by the writing of good and bad events. - # If this parameter is omitted, a cached thread pool is used. - # threadPoolSize = 30 - - # Optional. Maximum number of bytes that a single record can contain. - # If a record is bigger, a size violation bad row is emitted instead - # Default: 128 MB - maxBytes = 128000000 - } - } - - # Telemetry sends heartbeat events to external pipeline. - # Unless disable parameter set to true, this feature will be enabled. Deleting whole section will not disable it. - # Schema URI: iglu:com.snowplowanalytics.oss/oss_context/jsonschema/1-0-1 - # - telemetry { - disable = false - interval = 60 minutes - - # Connection properties for the receiving pipeline - method = POST - url = telemetry-g.snowplowanalytics.com - port = 443 - secure = true - } - - monitoring.metrics.statsd { - enabled = false - # StatsD metric reporting protocol configuration - hostname = localhost - port = 8125 - # Required, how frequently to report metrics - period = "10 seconds" - # Optional, override the default metric prefix - # "prefix": "snowplow.collector" - - # Any key-value pairs to be tagged on every StatsD metric - "tags": { - "app": collector - } - } - - # Configures how long the colletor should pause after receiving a sigterm before starting the graceful shutdown. - # During this period the collector continues to accept new connections and respond to requests. - preTerminationPeriod = 10 seconds - - # During the preTerminationPeriod, the collector can be configured to return 503s on the /health endpoint - # Can be helpful for removing the collector from a load balancer's targets. - preTerminationUnhealthy = false - - # The akka server's deadline for closing connections during graceful shutdown - terminationDeadline = 10 seconds - - experimental { - # Enable an experimental feature to send some "warm-up" requests to the collector's own /health endpoint during startup. - # We have found from experiment this can cut down the number of 502s returned from a load balancer in front of the collector in Kubernetes deployments. - # More details in https://github.com/snowplow/stream-collector/issues/249 - warmup { - enable = false - numRequests = 2000 - maxConnections = 2000 - maxCycles = 3 - } - } -} - -# Akka has a variety of possible configuration options defined at -# http://doc.akka.io/docs/akka/current/scala/general/configuration.html -akka { - loglevel = WARNING # 'OFF' for no logging, 'DEBUG' for all logging. 
- loggers = ["akka.event.slf4j.Slf4jLogger"] - - # akka-http is the server the Stream collector uses and has configurable options defined at - # http://doc.akka.io/docs/akka-http/current/scala/http/configuration.html - http.server { - # To obtain the hostname in the collector, the 'remote-address' header - # should be set. By default, this is disabled, and enabling it - # adds the 'Remote-Address' header to every request automatically. - remote-address-header = on - - raw-request-uri-header = on - - # Define the maximum request length (the default is 2048) - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - } - - max-connections = 2048 - } -} diff --git a/examples/config.rabbitmq.minimal.hocon b/examples/config.rabbitmq.minimal.hocon deleted file mode 100644 index a9595463b..000000000 --- a/examples/config.rabbitmq.minimal.hocon +++ /dev/null @@ -1,19 +0,0 @@ -collector { - interface: "0.0.0.0" - port: 8080 - - streams { - good: "raw" - bad: "bad-1" - - sink { - host: "localhost" - port: 5672 - username: "guest" - password: "guest" - virtualHost: "/" - routingKeyGood: "raw" - routingKeyBad: "bad-1" - } - } -} diff --git a/examples/config.sqs.extended.hocon b/examples/config.sqs.extended.hocon index c65899a4d..65245ef20 100644 --- a/examples/config.sqs.extended.hocon +++ b/examples/config.sqs.extended.hocon @@ -295,7 +295,7 @@ collector { # Can be helpful for removing the collector from a load balancer's targets. preTerminationUnhealthy = false - # The akka server's deadline for closing connections during graceful shutdown + # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds experimental { @@ -309,30 +309,4 @@ collector { maxCycles = 3 } } -} - -# Akka has a variety of possible configuration options defined at -# http://doc.akka.io/docs/akka/current/scala/general/configuration.html -akka { - loglevel = WARNING # 'OFF' for no logging, 'DEBUG' for all logging. - loggers = ["akka.event.slf4j.Slf4jLogger"] - - # akka-http is the server the Stream collector uses and has configurable options defined at - # http://doc.akka.io/docs/akka-http/current/scala/http/configuration.html - http.server { - # To obtain the hostname in the collector, the 'remote-address' header - # should be set. By default, this is disabled, and enabling it - # adds the 'Remote-Address' header to every request automatically. - remote-address-header = on - - raw-request-uri-header = on - - # Define the maximum request length (the default is 2048) - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - } - - max-connections = 2048 - } -} +} \ No newline at end of file diff --git a/examples/config.stdout.extended.hocon b/examples/config.stdout.extended.hocon index 75289ae55..851b08d13 100644 --- a/examples/config.stdout.extended.hocon +++ b/examples/config.stdout.extended.hocon @@ -239,7 +239,7 @@ collector { # Can be helpful for removing the collector from a load balancer's targets. preTerminationUnhealthy = false - # The akka server's deadline for closing connections during graceful shutdown + # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds experimental { @@ -253,30 +253,4 @@ collector { maxCycles = 3 } } -} - -# Akka has a variety of possible configuration options defined at -# http://doc.akka.io/docs/akka/current/scala/general/configuration.html -akka { - loglevel = WARNING # 'OFF' for no logging, 'DEBUG' for all logging. 
- loggers = ["akka.event.slf4j.Slf4jLogger"] - - # akka-http is the server the Stream collector uses and has configurable options defined at - # http://doc.akka.io/docs/akka-http/current/scala/http/configuration.html - http.server { - # To obtain the hostname in the collector, the 'remote-address' header - # should be set. By default, this is disabled, and enabling it - # adds the 'Remote-Address' header to every request automatically. - remote-address-header = on - - raw-request-uri-header = on - - # Define the maximum request length (the default is 2048) - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - } - - max-connections = 2048 - } -} +} \ No newline at end of file diff --git a/flake.lock b/flake.lock new file mode 100644 index 000000000..3eba6bb9f --- /dev/null +++ b/flake.lock @@ -0,0 +1,256 @@ +{ + "nodes": { + "devenv": { + "inputs": { + "flake-compat": "flake-compat", + "nix": "nix", + "nixpkgs": [ + "nixpkgs" + ], + "pre-commit-hooks": "pre-commit-hooks" + }, + "locked": { + "lastModified": 1697058441, + "narHash": "sha256-gjtW+nkM9suMsjyid63HPmt6WZQEvuVqA5cOAf4lLM0=", + "owner": "cachix", + "repo": "devenv", + "rev": "55294461a62d90c8626feca22f52b0d3d0e18e39", + "type": "github" + }, + "original": { + "owner": "cachix", + "repo": "devenv", + "type": "github" + } + }, + "flake-compat": { + "flake": false, + "locked": { + "lastModified": 1673956053, + "narHash": "sha256-4gtG9iQuiKITOjNQQeQIpoIB6b16fm+504Ch3sNKLd8=", + "owner": "edolstra", + "repo": "flake-compat", + "rev": "35bb57c0c8d8b62bbfd284272c928ceb64ddbde9", + "type": "github" + }, + "original": { + "owner": "edolstra", + "repo": "flake-compat", + "type": "github" + } + }, + "flake-utils": { + "inputs": { + "systems": "systems" + }, + "locked": { + "lastModified": 1685518550, + "narHash": "sha256-o2d0KcvaXzTrPRIo0kOLV0/QXHhDQ5DTi+OxcjO8xqY=", + "owner": "numtide", + "repo": "flake-utils", + "rev": "a1720a10a6cfe8234c0e93907ffe81be440f4cef", + "type": "github" + }, + "original": { + "owner": "numtide", + "repo": "flake-utils", + "type": "github" + } + }, + "flake-utils_2": { + "inputs": { + "systems": "systems_2" + }, + "locked": { + "lastModified": 1694529238, + "narHash": "sha256-zsNZZGTGnMOf9YpHKJqMSsa0dXbfmxeoJ7xHlrt+xmY=", + "owner": "numtide", + "repo": "flake-utils", + "rev": "ff7b65b44d01cf9ba6a71320833626af21126384", + "type": "github" + }, + "original": { + "owner": "numtide", + "repo": "flake-utils", + "type": "github" + } + }, + "gitignore": { + "inputs": { + "nixpkgs": [ + "devenv", + "pre-commit-hooks", + "nixpkgs" + ] + }, + "locked": { + "lastModified": 1660459072, + "narHash": "sha256-8DFJjXG8zqoONA1vXtgeKXy68KdJL5UaXR8NtVMUbx8=", + "owner": "hercules-ci", + "repo": "gitignore.nix", + "rev": "a20de23b925fd8264fd7fad6454652e142fd7f73", + "type": "github" + }, + "original": { + "owner": "hercules-ci", + "repo": "gitignore.nix", + "type": "github" + } + }, + "lowdown-src": { + "flake": false, + "locked": { + "lastModified": 1633514407, + "narHash": "sha256-Dw32tiMjdK9t3ETl5fzGrutQTzh2rufgZV4A/BbxuD4=", + "owner": "kristapsdz", + "repo": "lowdown", + "rev": "d2c2b44ff6c27b936ec27358a2653caaef8f73b8", + "type": "github" + }, + "original": { + "owner": "kristapsdz", + "repo": "lowdown", + "type": "github" + } + }, + "nix": { + "inputs": { + "lowdown-src": "lowdown-src", + "nixpkgs": [ + "devenv", + "nixpkgs" + ], + "nixpkgs-regression": "nixpkgs-regression" + }, + "locked": { + "lastModified": 1676545802, + "narHash": "sha256-EK4rZ+Hd5hsvXnzSzk2ikhStJnD63odF7SzsQ8CuSPU=", + "owner": 
"domenkozar", + "repo": "nix", + "rev": "7c91803598ffbcfe4a55c44ac6d49b2cf07a527f", + "type": "github" + }, + "original": { + "owner": "domenkozar", + "ref": "relaxed-flakes", + "repo": "nix", + "type": "github" + } + }, + "nixpkgs": { + "locked": { + "lastModified": 1697379843, + "narHash": "sha256-RcnGuJgC2K/UpTy+d32piEoBXq2M+nVFzM3ah/ZdJzg=", + "owner": "nixos", + "repo": "nixpkgs", + "rev": "12bdeb01ff9e2d3917e6a44037ed7df6e6c3df9d", + "type": "github" + }, + "original": { + "owner": "nixos", + "ref": "nixpkgs-unstable", + "repo": "nixpkgs", + "type": "github" + } + }, + "nixpkgs-regression": { + "locked": { + "lastModified": 1643052045, + "narHash": "sha256-uGJ0VXIhWKGXxkeNnq4TvV3CIOkUJ3PAoLZ3HMzNVMw=", + "owner": "NixOS", + "repo": "nixpkgs", + "rev": "215d4d0fd80ca5163643b03a33fde804a29cc1e2", + "type": "github" + }, + "original": { + "owner": "NixOS", + "repo": "nixpkgs", + "rev": "215d4d0fd80ca5163643b03a33fde804a29cc1e2", + "type": "github" + } + }, + "nixpkgs-stable": { + "locked": { + "lastModified": 1685801374, + "narHash": "sha256-otaSUoFEMM+LjBI1XL/xGB5ao6IwnZOXc47qhIgJe8U=", + "owner": "NixOS", + "repo": "nixpkgs", + "rev": "c37ca420157f4abc31e26f436c1145f8951ff373", + "type": "github" + }, + "original": { + "owner": "NixOS", + "ref": "nixos-23.05", + "repo": "nixpkgs", + "type": "github" + } + }, + "pre-commit-hooks": { + "inputs": { + "flake-compat": [ + "devenv", + "flake-compat" + ], + "flake-utils": "flake-utils", + "gitignore": "gitignore", + "nixpkgs": [ + "devenv", + "nixpkgs" + ], + "nixpkgs-stable": "nixpkgs-stable" + }, + "locked": { + "lastModified": 1688056373, + "narHash": "sha256-2+SDlNRTKsgo3LBRiMUcoEUb6sDViRNQhzJquZ4koOI=", + "owner": "cachix", + "repo": "pre-commit-hooks.nix", + "rev": "5843cf069272d92b60c3ed9e55b7a8989c01d4c7", + "type": "github" + }, + "original": { + "owner": "cachix", + "repo": "pre-commit-hooks.nix", + "type": "github" + } + }, + "root": { + "inputs": { + "devenv": "devenv", + "flake-utils": "flake-utils_2", + "nixpkgs": "nixpkgs" + } + }, + "systems": { + "locked": { + "lastModified": 1681028828, + "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=", + "owner": "nix-systems", + "repo": "default", + "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e", + "type": "github" + }, + "original": { + "owner": "nix-systems", + "repo": "default", + "type": "github" + } + }, + "systems_2": { + "locked": { + "lastModified": 1681028828, + "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=", + "owner": "nix-systems", + "repo": "default", + "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e", + "type": "github" + }, + "original": { + "owner": "nix-systems", + "repo": "default", + "type": "github" + } + } + }, + "root": "root", + "version": 7 +} diff --git a/flake.nix b/flake.nix new file mode 100644 index 000000000..7258c456f --- /dev/null +++ b/flake.nix @@ -0,0 +1,60 @@ +{ + description = "applications for recovering snowplow bad rows"; + + inputs = { + nixpkgs.url = "github:nixos/nixpkgs/nixpkgs-unstable"; + flake-utils.url = "github:numtide/flake-utils"; + flake-utils.inputs.nixpkgs.follows = "nixpkgs"; + devenv.url = "github:cachix/devenv"; + devenv.inputs.nixpkgs.follows = "nixpkgs"; + }; + + outputs = { + nixpkgs, + flake-utils, + devenv, + ... 
+ } @ inputs: + flake-utils.lib.eachDefaultSystem ( + system: let + pkgs = import nixpkgs { + inherit system; + config.allowUnfree = true; + config.allowUnsupportedSystem = true; + }; + jre = pkgs.openjdk11; + sbt = pkgs.sbt.override {inherit jre;}; + coursier = pkgs.coursier.override {inherit jre;}; + metals = pkgs.metals.override {inherit coursier jre;}; + in { + devShell = devenv.lib.mkShell { + inherit inputs pkgs; + modules = [ + { + packages = [ + jre + metals + sbt + # pkgs.nodePackages.snyk + pkgs.kubernetes-helm + # (pkgs.wrapHelm pkgs.kubernetes-helm {plugins = [pkgs.kubernetes-helmPlugins.helm-diff];}) + # pkgs.google-cloud-sdk.withExtraComponents( with pkgs.google-cloud-sdk.components [ gke-gcloud-auth-plugin ]); + (pkgs.google-cloud-sdk.withExtraComponents [pkgs.google-cloud-sdk.components.gke-gcloud-auth-plugin]) + # pkgs.google-cloud-sdk-gce + ]; + languages.nix.enable = true; + pre-commit.hooks = { + alejandra.enable = true; + deadnix.enable = true; + gitleaks = { + enable = true; + name = "gitleaks"; + entry = "${pkgs.gitleaks}/bin/gitleaks detect --source . -v"; + }; + }; + } + ]; + }; + } + ); +} diff --git a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala deleted file mode 100644 index a14ea04af..000000000 --- a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/CollectorOutput.scala +++ /dev/null @@ -1,20 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream.it - -import com.snowplowanalytics.snowplow.badrows.BadRow - -import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload - -case class CollectorOutput( - good: List[CollectorPayload], - bad: List[BadRow] -) diff --git a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala deleted file mode 100644 index e25dd11ad..000000000 --- a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/EventGenerator.scala +++ /dev/null @@ -1,58 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. 
- */ -package com.snowplowanalytics.snowplow.collectors.scalastream.it - -import cats.effect.IO - -import org.http4s.{Method, Request, Uri} - -object EventGenerator { - - def sendEvents( - collectorHost: String, - collectorPort: Int, - nbGood: Int, - nbBad: Int, - maxBytes: Int - ): IO[Unit] = { - val requests = generateEvents(collectorHost, collectorPort, nbGood, nbBad, maxBytes) - Http.statuses(requests) - .flatMap { responses => - responses.collect { case resp if resp.code != 200 => resp.reason } match { - case Nil => IO.unit - case errors => IO.raiseError(new RuntimeException(s"${errors.size} requests were not successful. Example error: ${errors.head}")) - } - } - } - - def generateEvents( - collectorHost: String, - collectorPort: Int, - nbGood: Int, - nbBad: Int, - maxBytes: Int - ): List[Request[IO]] = { - val good = List.fill(nbGood)(mkTp2Event(collectorHost, collectorPort, valid = true, maxBytes)) - val bad = List.fill(nbBad)(mkTp2Event(collectorHost, collectorPort, valid = false, maxBytes)) - good ++ bad - } - - def mkTp2Event( - collectorHost: String, - collectorPort: Int, - valid: Boolean = true, - maxBytes: Int = 100 - ): Request[IO] = { - val uri = Uri.unsafeFromString(s"http://$collectorHost:$collectorPort/com.snowplowanalytics.snowplow/tp2") - val body = if (valid) "foo" else "a" * (maxBytes + 1) - Request[IO](Method.POST, uri).withEntity(body) - } -} diff --git a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala deleted file mode 100644 index e7d1d613a..000000000 --- a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/Http.scala +++ /dev/null @@ -1,35 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream.it - -import cats.effect.{IO, Resource} -import cats.implicits._ -import org.http4s.blaze.client.BlazeClientBuilder -import org.http4s.client.Client -import org.http4s.{Request, Response, Status} - -object Http { - - def statuses(requests: List[Request[IO]]): IO[List[Status]] = - mkClient.use { client => requests.traverse(client.status) } - - def status(request: Request[IO]): IO[Status] = - mkClient.use { client => client.status(request) } - - def response(request: Request[IO]): IO[Response[IO]] = - mkClient.use(c => c.run(request).use(resp => IO.pure(resp))) - - def responses(requests: List[Request[IO]]): IO[List[Response[IO]]] = - mkClient.use(c => requests.traverse(r => c.run(r).use(resp => IO.pure(resp)))) - - def mkClient: Resource[IO, Client[IO]] = - BlazeClientBuilder.apply[IO].resource -} diff --git a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala b/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala deleted file mode 100644 index 485836c1e..000000000 --- a/http4s/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/utils.scala +++ /dev/null @@ -1,135 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. 
- * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream.it - -import scala.concurrent.duration._ - -import org.apache.thrift.TDeserializer - -import org.slf4j.LoggerFactory - -import org.testcontainers.containers.GenericContainer -import org.testcontainers.containers.output.Slf4jLogConsumer - -import io.circe.{Json, parser} - -import cats.implicits._ - -import cats.effect.IO - -import retry.syntax.all._ -import retry.RetryPolicies - -import com.snowplowanalytics.snowplow.badrows.BadRow - -import com.snowplowanalytics.iglu.core.SelfDescribingData -import com.snowplowanalytics.iglu.core.circe.implicits._ - -import com.snowplowanalytics.snowplow.CollectorPayload.thrift.model1.CollectorPayload - -object utils { - - def parseCollectorPayload(bytes: Array[Byte]): CollectorPayload = { - val deserializer = new TDeserializer() - val target = new CollectorPayload() - deserializer.deserialize(target, bytes) - target - } - - def parseBadRow(bytes: Array[Byte]): BadRow = { - val str = new String(bytes) - val parsed = for { - json <- parser.parse(str).leftMap(_.message) - sdj <- SelfDescribingData.parse(json).leftMap(_.message("Can't decode JSON as SDJ")) - br <- sdj.data.as[BadRow].leftMap(_.getMessage()) - } yield br - parsed match { - case Right(br) => br - case Left(err) => throw new RuntimeException(s"Can't parse bad row. Error: $err") - } - } - - def printBadRows(testName: String, badRows: List[BadRow]): IO[Unit] = { - log(testName, "Bad rows:") *> - badRows.traverse_(br => log(testName, br.compact)) - } - - def log(testName: String, line: String): IO[Unit] = - IO(println(s"[$testName] $line")) - - def startContainerWithLogs( - container: GenericContainer[_], - loggerName: String - ): GenericContainer[_] = { - container.start() - val logger = LoggerFactory.getLogger(loggerName) - val logs = new Slf4jLogConsumer(logger) - container.followOutput(logs) - container - } - - def waitWhile[A]( - a: A, - condition: A => Boolean, - maxDelay: FiniteDuration - ): IO[Boolean] = { - val retryPolicy = RetryPolicies.limitRetriesByCumulativeDelay( - maxDelay, - RetryPolicies.capDelay[IO]( - 2.second, - RetryPolicies.fullJitter[IO](1.second) - ) - ) - - IO(condition(a)).retryingOnFailures( - result => IO(!result), - retryPolicy, - (_, _) => IO.unit - ) - } - - /** Return a list of config parameters from a raw JSON string. 
*/ - def getConfigParameters(config: String): List[String] = { - val parsed: Json = parser.parse(config).valueOr { case failure => - throw new IllegalArgumentException("Can't parse JSON", failure.underlying) - } - - def flatten(value: Json): Option[List[(String, Json)]] = - value.asObject.map( - _.toList.flatMap { - case (k, v) => flatten(v) match { - case None => List(k -> v) - case Some(fields) => fields.map { - case (innerK, innerV) => s"$k.$innerK" -> innerV - } - } - } - ) - - def withSpaces(s: String): String = if(s.contains(" ")) s""""$s"""" else s - - val fields = flatten(parsed).getOrElse(throw new IllegalArgumentException("Couldn't flatten fields")) - - fields.flatMap { - case (k, v) if v.isString => - List(s"-D$k=${withSpaces(v.asString.get)}") - case (k, v) if v.isArray => - v.asArray.get.toList.zipWithIndex.map { - case (s, i) if s.isString => - s"-D$k.$i=${withSpaces(s.asString.get)}" - case (other, i) => - s"-D$k.$i=${withSpaces(other.toString)}" - } - case (k, v) => - List(s"-D$k=${withSpaces(v.toString)}") - } - } -} diff --git a/http4s/src/main/resources/reference.conf b/http4s/src/main/resources/reference.conf deleted file mode 100644 index 96dfd594f..000000000 --- a/http4s/src/main/resources/reference.conf +++ /dev/null @@ -1,94 +0,0 @@ -{ - paths {} - - p3p { - policyRef = "/w3c/p3p.xml" - CP = "NOI DSP COR NID PSA OUR IND COM NAV STA" - } - - crossDomain { - enabled = false - domains = [ "*" ] - secure = true - } - - cookie { - enabled = true - expiration = 365 days - domains = [] - name = sp - secure = true - httpOnly = true - sameSite = "None" - } - - doNotTrackCookie { - enabled = false - name = "" - value = "" - } - - cookieBounce { - enabled = false - name = "n3pc" - fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000" - } - - redirectMacro { - enabled = false - } - - rootResponse { - enabled = false - statusCode = 302 - headers = {} - body = "" - } - - cors { - accessControlMaxAge = 60 minutes - } - - streams { - useIpAddressAsPartitionKey = false - } - - telemetry { - disable = false - interval = 60 minutes - method = POST - url = telemetry-g.snowplowanalytics.com - port = 443 - secure = true - } - - monitoring { - metrics { - statsd { - enabled = false - hostname = localhost - port = 8125 - period = 10 seconds - prefix = snowplow.collector - } - } - } - - ssl { - enable = false - redirect = false - port = 443 - } - - networking { - maxConnections = 1024 - idleTimeout = 610 seconds - } - - enableDefaultRedirect = false - preTerminationPeriod = 10 seconds - - redirectDomains = [] - - preTerminationPeriod = 10 seconds -} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala deleted file mode 100644 index 837252b72..000000000 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala +++ /dev/null @@ -1,8 +0,0 @@ -package com.snowplowanalytics.snowplow.collector.core - -trait AppInfo { - def name: String - def moduleName: String - def version: String - def dockerAlias: String -} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala deleted file mode 100644 index e62b7322f..000000000 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala +++ /dev/null @@ -1,117 +0,0 @@ -package com.snowplowanalytics.snowplow.collector.core - -import 
java.net.InetSocketAddress -import javax.net.ssl.SSLContext - -import io.netty.handler.ssl._ - -import org.typelevel.log4cats.Logger -import org.typelevel.log4cats.slf4j.Slf4jLogger - -import com.comcast.ip4s.{IpAddress, Port} - -import cats.implicits._ - -import cats.effect.{Async, Resource} - -import org.http4s.HttpApp -import org.http4s.server.Server -import org.http4s.ember.server.EmberServerBuilder -import org.http4s.blaze.server.BlazeServerBuilder -import org.http4s.netty.server.NettyServerBuilder - -import fs2.io.net.Network -import fs2.io.net.tls.TLSContext - -object HttpServer { - - implicit private def logger[F[_]: Async] = Slf4jLogger.getLogger[F] - - def build[F[_]: Async]( - app: HttpApp[F], - interface: String, - port: Int, - secure: Boolean, - networking: Config.Networking - ): Resource[F, Server] = - sys.env.get("HTTP4S_BACKEND").map(_.toUpperCase()) match { - case Some("BLAZE") | None => buildBlazeServer[F](app, port, secure, networking) - case Some("EMBER") => buildEmberServer[F](app, interface, port, secure, networking) - case Some("NETTY") => buildNettyServer[F](app, port, secure, networking) - case Some(other) => throw new IllegalArgumentException(s"Unrecognized http4s backend $other") - } - - private def buildEmberServer[F[_]: Async]( - app: HttpApp[F], - interface: String, - port: Int, - secure: Boolean, - networking: Config.Networking - ) = { - implicit val network = Network.forAsync[F] - Resource.eval(Logger[F].info("Building ember server")) >> - EmberServerBuilder - .default[F] - .withHost(IpAddress.fromString(interface).get) - .withPort(Port.fromInt(port).get) - .withHttpApp(app) - .withIdleTimeout(networking.idleTimeout) - .withMaxConnections(networking.maxConnections) - .cond(secure, _.withTLS(TLSContext.Builder.forAsync.fromSSLContext(SSLContext.getDefault))) - .build - } - - private def buildBlazeServer[F[_]: Async]( - app: HttpApp[F], - port: Int, - secure: Boolean, - networking: Config.Networking - ): Resource[F, Server] = - Resource.eval(Logger[F].info("Building blaze server")) >> - BlazeServerBuilder[F] - .bindSocketAddress(new InetSocketAddress(port)) - .withHttpApp(app) - .withIdleTimeout(networking.idleTimeout) - .withMaxConnections(networking.maxConnections) - .cond(secure, _.withSslContext(SSLContext.getDefault)) - .resource - - private def buildNettyServer[F[_]: Async]( - app: HttpApp[F], - port: Int, - secure: Boolean, - networking: Config.Networking - ) = - Resource.eval(Logger[F].info("Building netty server")) >> - NettyServerBuilder[F] - .bindLocal(port) - .withHttpApp(app) - .withIdleTimeout(networking.idleTimeout) - .cond( - secure, - _.withSslContext( - new JdkSslContext( - SSLContext.getDefault, - false, - null, - IdentityCipherSuiteFilter.INSTANCE, - new ApplicationProtocolConfig( - ApplicationProtocolConfig.Protocol.ALPN, - ApplicationProtocolConfig.SelectorFailureBehavior.NO_ADVERTISE, - ApplicationProtocolConfig.SelectedListenerFailureBehavior.ACCEPT, - ApplicationProtocolNames.HTTP_2, - ApplicationProtocolNames.HTTP_1_1 - ), - ClientAuth.NONE, - null, - false - ) - ) - ) - .resource - - implicit class ConditionalAction[A](item: A) { - def cond(cond: Boolean, action: A => A): A = - if (cond) action(item) else item - } -} diff --git a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala b/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala deleted file mode 100644 index 5a5c7d05b..000000000 --- a/http4s/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Sink.scala +++ 
/dev/null @@ -1,11 +0,0 @@ -package com.snowplowanalytics.snowplow.collector.core - -trait Sink[F[_]] { - - // Maximum number of bytes that a single record can contain. - // If a record is bigger, a size violation bad row is emitted instead - val maxBytes: Int - - def isHealthy: F[Boolean] - def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] -} diff --git a/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaUtils.scala b/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaUtils.scala index fea1b327a..588619c97 100644 --- a/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaUtils.scala +++ b/kafka/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kafka/KafkaUtils.scala @@ -14,7 +14,7 @@ import cats.effect._ import org.apache.kafka.clients.consumer._ import java.util.Properties import java.time.Duration -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import com.snowplowanalytics.snowplow.collectors.scalastream.it.utils._ import com.snowplowanalytics.snowplow.collectors.scalastream.it.CollectorOutput diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala index 7185e9904..3d6f4599c 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/CookieSpec.scala @@ -48,13 +48,13 @@ class CookieSpec extends Specification with Localstack with CatsEffect { cookie.name must beEqualTo("greatName") cookie.expires match { case Some(expiry) => - expiry.epochSecond should beCloseTo((now + 42.days).toSeconds, 10) + expiry.epochSecond should beCloseTo((now + 42.days).toSeconds, 10L) case None => ko(s"Cookie [$cookie] doesn't contain the expiry date") } cookie.secure should beTrue cookie.httpOnly should beTrue - cookie.sameSite should beSome(SameSite.Strict) + cookie.sameSite should beSome[SameSite](SameSite.Strict) case _ => ko(s"There is not 1 cookie but ${resp.cookies.size}") } } diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala index 43e521f17..854f5c4d1 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala @@ -18,7 +18,7 @@ import com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis.containe import org.specs2.execute.PendingUntilFixed import org.specs2.mutable.Specification -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import scala.concurrent.duration._ class DoNotTrackCookieSpec extends Specification with Localstack with CatsEffect with PendingUntilFixed { diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/Kinesis.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/Kinesis.scala index 8b6eba662..5d2dda5d0 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/Kinesis.scala +++ 
b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/kinesis/Kinesis.scala @@ -10,7 +10,7 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.kinesis -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import scala.collection.mutable.ArrayBuffer import cats.effect.{IO, Resource} diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala index f0ccf13c3..f555aeaaa 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala @@ -26,7 +26,7 @@ import org.slf4j.LoggerFactory import java.nio.ByteBuffer import java.util.UUID import java.util.concurrent.ScheduledExecutorService -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import scala.collection.mutable.ListBuffer import scala.concurrent.duration._ import scala.concurrent.{ExecutionContextExecutorService, Future} @@ -94,7 +94,7 @@ class KinesisSink[F[_]: Sync] private ( def flush(): Unit = { val eventsToSend = synchronized { - val evts = storedEvents.result + val evts = storedEvents.result() storedEvents.clear() byteCount = 0 evts @@ -369,7 +369,7 @@ class KinesisSink[F[_]: Sync] private ( private def checkKinesisHealth(): Unit = { val healthRunnable = new Runnable { - override def run() { + override def run(): Unit = { log.info(s"Starting background check for Kinesis stream $streamName") while (!kinesisHealthy) { Try { @@ -394,7 +394,7 @@ class KinesisSink[F[_]: Sync] private ( private def checkSqsHealth(): Unit = maybeSqs.foreach { sqs => val healthRunnable = new Runnable { - override def run() { + override def run(): Unit = { log.info(s"Starting background check for SQS buffer ${sqs.sqsBufferName}") while (!sqsHealthy) { Try { diff --git a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala index 4563e2aac..0653d3244 100644 --- a/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala +++ b/nsq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/NsqSink.scala @@ -12,7 +12,7 @@ package com.snowplowanalytics.snowplow.collectors.scalastream package sinks import java.util.concurrent.TimeoutException -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import cats.effect.{Resource, Sync} import cats.implicits._ import com.snowplowanalytics.client.nsq.NSQProducer @@ -44,11 +44,15 @@ class NsqSink[F[_]: Sync] private ( override def storeRawEvents(events: List[Array[Byte]], key: String): F[Unit] = Sync[F].blocking(producer.produceMulti(topicName, events.asJava)).onError { case _: NSQException | _: TimeoutException => - Sync[F].delay(healthStatus = false) - } *> Sync[F].delay(healthStatus = true) + setHealthStatus(false) + } *> setHealthStatus(true) def shutdown(): Unit = producer.shutdown() + + private def setHealthStatus(status: Boolean): F[Unit] = Sync[F].delay { + healthStatus = status + } } object NsqSink { diff --git a/project/BuildSettings.scala b/project/BuildSettings.scala index b4cc4e13d..3edaad285 100644 --- a/project/BuildSettings.scala +++ b/project/BuildSettings.scala @@ -9,22 +9,130 @@ * OF 
THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. */ -// SBT +import com.typesafe.sbt.packager.Keys.packageName +import com.typesafe.sbt.packager.docker.DockerPlugin.autoImport._ +import org.scalafmt.sbt.ScalafmtPlugin.autoImport._ +import sbt.Keys._ import sbt._ -import Keys._ +import sbtassembly.AssemblyPlugin.autoImport._ +import sbtassembly.MergeStrategy +import sbtbuildinfo.BuildInfoPlugin.autoImport._ +import sbtdynver.DynVerPlugin.autoImport._ + object BuildSettings { - // sbt-assembly settings for building an executable - import sbtassembly.AssemblyPlugin.autoImport._ - import sbtassembly.MergeStrategy + lazy val commonSettings = Seq( + organization := "com.snowplowanalytics", + name := "snowplow-stream-collector", + description := "Scala Stream Collector for Snowplow raw events", + scalaVersion := "2.13.12", + scalacOptions ++= Seq("-Ywarn-macros:after"), + javacOptions := Seq("-source", "11", "-target", "11"), + resolvers ++= Seq( + "Snowplow Analytics Maven repo".at("http://maven.snplow.com/releases/").withAllowInsecureProtocol(true), + // For uaParser utils + "user-agent-parser repo".at("https://clojars.org/repo/") + ) + ) - val reverseConcat: MergeStrategy = new MergeStrategy { - val name = "reverseConcat" - def apply(tempDir: File, path: String, files: Seq[File]): Either[String, Seq[(File, String)]] = - MergeStrategy.concat(tempDir, path, files.reverse) - } + lazy val coreHttp4sSettings = commonSettings ++ sbtAssemblySettings ++ Defaults.itSettings + + lazy val kinesisSettings = + commonSinkSettings ++ integrationTestSettings ++ Seq( + moduleName := "snowplow-stream-collector-kinesis", + Docker / packageName := "scala-stream-collector-kinesis", + libraryDependencies ++= Seq( + Dependencies.Libraries.catsRetry, + Dependencies.Libraries.kinesis, + Dependencies.Libraries.sts, + Dependencies.Libraries.sqs, + + // integration tests dependencies + Dependencies.Libraries.IntegrationTests.specs2, + Dependencies.Libraries.IntegrationTests.specs2CE, + ) + ) + + lazy val sqsSettings = + commonSinkSettings ++ Seq( + moduleName := "snowplow-stream-collector-sqs", + Docker / packageName := "scala-stream-collector-sqs", + libraryDependencies ++= Seq( + Dependencies.Libraries.catsRetry, + Dependencies.Libraries.sqs, + Dependencies.Libraries.sts, + ) + ) + lazy val pubsubSettings = + commonSinkSettings ++ integrationTestSettings ++ Seq( + moduleName := "snowplow-stream-collector-google-pubsub", + Docker / packageName := "scala-stream-collector-pubsub", + libraryDependencies ++= Seq( + Dependencies.Libraries.catsRetry, + Dependencies.Libraries.fs2PubSub, + Dependencies.Libraries.pubsub, + + // integration tests dependencies + Dependencies.Libraries.IntegrationTests.specs2, + Dependencies.Libraries.IntegrationTests.specs2CE, + ) + ) + + + lazy val kafkaSettings = + commonSinkSettings ++ integrationTestSettings ++ Seq( + moduleName := "snowplow-stream-collector-kafka", + Docker / packageName := "scala-stream-collector-kafka", + libraryDependencies ++= Seq( + Dependencies.Libraries.kafkaClients, + Dependencies.Libraries.mskAuth, + + // integration tests dependencies + Dependencies.Libraries.IntegrationTests.specs2, + Dependencies.Libraries.IntegrationTests.specs2CE + ) + ) + + lazy val nsqSettings = + commonSinkSettings ++ Seq( + moduleName := "snowplow-stream-collector-nsq", + Docker / packageName := "scala-stream-collector-nsq", + libraryDependencies ++= Seq( + Dependencies.Libraries.nsqClient, + Dependencies.Libraries.jackson, + Dependencies.Libraries.nettyAll, + 
Dependencies.Libraries.log4j + ) + ) + + lazy val stdoutSettings = + commonSinkSettings ++ Seq( + moduleName := "snowplow-stream-collector-stdout", + buildInfoPackage := s"com.snowplowanalytics.snowplow.collector.stdout", + Docker / packageName := "scala-stream-collector-stdout" + ) + + lazy val commonSinkSettings = + commonSettings ++ + buildInfoSettings ++ + sbtAssemblySettings ++ + formatting ++ + dynVerSettings ++ + addExampleConfToTestCp + + lazy val buildInfoSettings = Seq( + buildInfoKeys := Seq[BuildInfoKey](name, moduleName, dockerAlias, version), + buildInfoOptions += BuildInfoOption.Traits("com.snowplowanalytics.snowplow.collector.core.AppInfo"), + buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream" + ) + + lazy val dynVerSettings = Seq( + ThisBuild / dynverVTagPrefix := false, // Otherwise git tags required to have v-prefix + ThisBuild / dynverSeparator := "-" // to be compatible with docker + ) + lazy val sbtAssemblySettings = Seq( assembly / assemblyJarName := { s"${moduleName.value}-${version.value}.jar" }, assembly / assemblyMergeStrategy := { @@ -41,8 +149,13 @@ object BuildSettings { } ) - // Scalafmt plugin - import org.scalafmt.sbt.ScalafmtPlugin.autoImport._ + lazy val reverseConcat: MergeStrategy = new MergeStrategy { + val name = "reverseConcat" + + def apply(tempDir: File, path: String, files: Seq[File]): Either[String, Seq[(File, String)]] = + MergeStrategy.concat(tempDir, path, files.reverse) + } + lazy val formatting = Seq( scalafmtConfig := file(".scalafmt.conf"), scalafmtOnCompile := true @@ -53,4 +166,37 @@ object BuildSettings { baseDirectory.value.getParentFile / "examples" } ) + + lazy val integrationTestSettings = Defaults.itSettings ++ scalifiedSettings ++ Seq( + IntegrationTest / test := (IntegrationTest / test).dependsOn(Docker / publishLocal).value, + IntegrationTest / testOnly := (IntegrationTest / testOnly).dependsOn(Docker / publishLocal).evaluated + ) + + // Make package (build) metadata available within source code for integration tests. 
+ lazy val scalifiedSettings = Seq( + IntegrationTest / sourceGenerators += Def.task { + val file = (IntegrationTest / sourceManaged).value / "settings.scala" + IO.write( + file, + """package %s + |object ProjectMetadata { + | val organization = "%s" + | val name = "%s" + | val version = "%s" + | val dockerTag = "%s" + |} + |""" + .stripMargin + .format( + buildInfoPackage.value, + organization.value, + name.value, + version.value, + dockerAlias.value.tag.get + ) + ) + Seq(file) + }.taskValue + ) + } diff --git a/project/Dependencies.scala b/project/Dependencies.scala index 4052f5f18..341af732f 100644 --- a/project/Dependencies.scala +++ b/project/Dependencies.scala @@ -12,137 +12,77 @@ import sbt._ object Dependencies { - val resolutionRepos = Seq( - "Snowplow Analytics Maven repo".at("http://maven.snplow.com/releases/").withAllowInsecureProtocol(true), - // For uaParser utils - "user-agent-parser repo".at("https://clojars.org/repo/") - ) - object V { - // Java - val awsSdk = "1.12.327" - val pubsub = "1.119.1" - val kafka = "2.2.1" - val mskAuth = "1.1.1" - val nsqClient = "1.3.0" - val jodaTime = "2.10.13" - val slf4j = "1.7.32" - val log4j = "2.17.2" // CVE-2021-44228 - val config = "1.4.1" - val rabbitMQ = "5.15.0" - val jackson = "2.12.7" // force this version to mitigate security vulnerabilities - val thrift = "0.15.0" // force this version to mitigate security vulnerabilities - val jnrUnixsock = "0.38.17" // force this version to mitigate security vulnerabilities - val protobuf = "3.21.7" // force this version to mitigate security vulnerabilities - // Scala - val collectorPayload = "0.0.0" - val tracker = "2.0.0" - val akkaHttp = "10.2.7" - val akka = "2.6.16" - val scopt = "4.0.1" - val pureconfig = "0.17.2" - val akkaHttpMetrics = "1.7.1" + val awsSdk = "1.12.327" val badRows = "2.2.1" - val log4cats = "2.6.0" - val http4s = "0.23.23" val blaze = "0.23.15" - val http4sNetty = "0.5.9" - val decline = "2.4.1" + val catsRetry = "3.1.0" + val ceTestkit = "3.4.5" val circe = "0.14.1" val circeConfig = "0.10.0" + val collectorPayload = "0.0.0" + val decline = "2.4.1" val fs2PubSub = "0.22.0" - val catsRetry = "3.1.0" + val http4s = "0.23.23" + val jackson = "2.12.7" // force this version to mitigate security vulnerabilities + val kafka = "2.2.1" + val log4cats = "2.6.0" + val log4j = "2.17.2" // CVE-2021-44228 + val mskAuth = "1.1.1" val nettyAll = "4.1.95.Final" // to fix nsq dependency - - // Scala (test only) - val specs2 = "4.11.0" - val specs2CE = "1.5.0" - val testcontainers = "0.40.10" - val ceTestkit = "3.4.5" - - object Legacy { - val specs2CE = "0.4.1" - val catsRetry = "2.1.0" - val http4s = "0.21.33" - val tracker = "1.0.1" - } + val nsqClient = "1.3.0" + val pubsub = "1.125.11" // force this version to mitigate security vulnerabilities + val rabbitMQ = "5.15.0" + val slf4j = "1.7.32" + val specs2 = "4.11.0" + val specs2CE = "1.5.0" + val testcontainers = "0.40.10" + val thrift = "0.15.0" // force this version to mitigate security vulnerabilities + val tracker = "2.0.0" } object Libraries { - // Java - val jackson = "com.fasterxml.jackson.core" % "jackson-databind" % V.jackson // nsq only - val nettyAll = "io.netty" % "netty-all" % V.nettyAll //nsq only - val thrift = "org.apache.thrift" % "libthrift" % V.thrift - val kinesis = "com.amazonaws" % "aws-java-sdk-kinesis" % V.awsSdk - val sqs = "com.amazonaws" % "aws-java-sdk-sqs" % V.awsSdk - val sts = "com.amazonaws" % "aws-java-sdk-sts" % V.awsSdk % Runtime // Enables web token authentication 
https://github.com/snowplow/stream-collector/issues/169 - val pubsub = "com.google.cloud" % "google-cloud-pubsub" % V.pubsub - val kafkaClients = "org.apache.kafka" % "kafka-clients" % V.kafka - val mskAuth = "software.amazon.msk" % "aws-msk-iam-auth" % V.mskAuth % Runtime // Enables AWS MSK IAM authentication https://github.com/snowplow/stream-collector/pull/214 - val nsqClient = "com.snowplowanalytics" % "nsq-java-client" % V.nsqClient - val jodaTime = "joda-time" % "joda-time" % V.jodaTime - val slf4j = "org.slf4j" % "slf4j-simple" % V.slf4j - val log4jOverSlf4j = "org.slf4j" % "log4j-over-slf4j" % V.slf4j - val log4j = "org.apache.logging.log4j" % "log4j-core" % V.log4j - val config = "com.typesafe" % "config" % V.config - val jnrUnixsocket = "com.github.jnr" % "jnr-unixsocket" % V.jnrUnixsock - val rabbitMQ = "com.rabbitmq" % "amqp-client" % V.rabbitMQ - val protobuf = "com.google.protobuf" % "protobuf-java" % V.protobuf - // Scala - val collectorPayload = "com.snowplowanalytics" % "collector-payload-1" % V.collectorPayload + //common core val badRows = "com.snowplowanalytics" %% "snowplow-badrows" % V.badRows - val trackerCore = "com.snowplowanalytics" %% "snowplow-scala-tracker-core" % V.tracker + val catsRetry = "com.github.cb372" %% "cats-retry" % V.catsRetry + val circeConfig = "io.circe" %% "circe-config" % V.circeConfig + val circeGeneric = "io.circe" %% "circe-generic" % V.circe + val collectorPayload = "com.snowplowanalytics" % "collector-payload-1" % V.collectorPayload + val decline = "com.monovore" %% "decline-effect" % V.decline val emitterHttps = "com.snowplowanalytics" %% "snowplow-scala-tracker-emitter-http4s" % V.tracker - val scopt = "com.github.scopt" %% "scopt" % V.scopt - val akkaHttp = "com.typesafe.akka" %% "akka-http" % V.akkaHttp - val akkaStream = "com.typesafe.akka" %% "akka-stream" % V.akka - val akkaSlf4j = "com.typesafe.akka" %% "akka-slf4j" % V.akka - val pureconfig = "com.github.pureconfig" %% "pureconfig" % V.pureconfig - val akkaHttpMetrics = "fr.davit" %% "akka-http-metrics-datadog" % V.akkaHttpMetrics + val http4sBlaze = "org.http4s" %% "http4s-blaze-server" % V.blaze + val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.blaze + val http4sDsl = "org.http4s" %% "http4s-dsl" % V.http4s val log4cats = "org.typelevel" %% "log4cats-slf4j" % V.log4cats + val slf4j = "org.slf4j" % "slf4j-simple" % V.slf4j + val thrift = "org.apache.thrift" % "libthrift" % V.thrift + val trackerCore = "com.snowplowanalytics" %% "snowplow-scala-tracker-core" % V.tracker + + //sinks + val fs2PubSub = "com.permutive" %% "fs2-google-pubsub-grpc" % V.fs2PubSub + val jackson = "com.fasterxml.jackson.core" % "jackson-databind" % V.jackson + val kafkaClients = "org.apache.kafka" % "kafka-clients" % V.kafka + val kinesis = "com.amazonaws" % "aws-java-sdk-kinesis" % V.awsSdk + val log4j = "org.apache.logging.log4j" % "log4j-core" % V.log4j + val mskAuth = "software.amazon.msk" % "aws-msk-iam-auth" % V.mskAuth % Runtime // Enables AWS MSK IAM authentication https://github.com/snowplow/stream-collector/pull/214 + val nettyAll = "io.netty" % "netty-all" % V.nettyAll + val nsqClient = "com.snowplowanalytics" % "nsq-java-client" % V.nsqClient + val pubsub = "com.google.cloud" % "google-cloud-pubsub" % V.pubsub + val sqs = "com.amazonaws" % "aws-java-sdk-sqs" % V.awsSdk + val sts = "com.amazonaws" % "aws-java-sdk-sts" % V.awsSdk % Runtime // Enables web token authentication https://github.com/snowplow/stream-collector/issues/169 - // http4s - val http4sDsl = "org.http4s" %% 
"http4s-dsl" % V.http4s - val http4sEmber = "org.http4s" %% "http4s-ember-server" % V.http4s - val http4sBlaze = "org.http4s" %% "http4s-blaze-server" % V.blaze - val http4sNetty = "org.http4s" %% "http4s-netty-server" % V.http4sNetty - val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.blaze - val decline = "com.monovore" %% "decline-effect" % V.decline - val circeGeneric = "io.circe" %% "circe-generic" % V.circe - val circeConfig = "io.circe" %% "circe-config" % V.circeConfig - val catsRetry = "com.github.cb372" %% "cats-retry" % V.catsRetry - val fs2PubSub = "com.permutive" %% "fs2-google-pubsub-grpc" % V.fs2PubSub - - // Scala (test only) - - // Test common + //common unit tests val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % Test val specs2CE = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE % Test val ceTestkit = "org.typelevel" %% "cats-effect-testkit" % V.ceTestkit % Test - // Test Akka - val akkaTestkit = "com.typesafe.akka" %% "akka-testkit" % V.akka % Test - val akkaHttpTestkit = "com.typesafe.akka" %% "akka-http-testkit" % V.akkaHttp % Test - val akkaStreamTestkit = "com.typesafe.akka" %% "akka-stream-testkit" % V.akka % Test - - // Integration tests - object IT { - val testcontainers = "com.dimafeng" %% "testcontainers-scala-core" % V.testcontainers % IntegrationTest - val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % IntegrationTest - val specs2CE = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE % IntegrationTest - val catsRetry = "com.github.cb372" %% "cats-retry" % V.catsRetry % IntegrationTest - val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.blaze % IntegrationTest - } - - object Legacy { - val testcontainers = "com.dimafeng" %% "testcontainers-scala-core" % V.testcontainers % IntegrationTest - val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % IntegrationTest - val specs2CE = "com.codecommit" %% "cats-effect-testing-specs2" % V.Legacy.specs2CE % IntegrationTest - val catsRetry = "com.github.cb372" %% "cats-retry" % V.Legacy.catsRetry % IntegrationTest - val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.Legacy.http4s % IntegrationTest - val trackerCore = "com.snowplowanalytics" %% "snowplow-scala-tracker-core" % V.Legacy.tracker - val trackerEmitterId = "com.snowplowanalytics" %% "snowplow-scala-tracker-emitter-id" % V.Legacy.tracker + object IntegrationTests { + val testcontainers = "com.dimafeng" %% "testcontainers-scala-core" % V.testcontainers % IntegrationTest + val specs2 = "org.specs2" %% "specs2-core" % V.specs2 % IntegrationTest + val specs2CE = "org.typelevel" %% "cats-effect-testing-specs2" % V.specs2CE % IntegrationTest + val catsRetry = "com.github.cb372" %% "cats-retry" % V.catsRetry % IntegrationTest + val http4sClient = "org.http4s" %% "http4s-blaze-client" % V.blaze % IntegrationTest } } } diff --git a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala index 99ee196a8..88cfaf98c 100644 --- a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala +++ b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/GooglePubSubCollectorSpec.scala @@ -24,9 +24,9 @@ class GooglePubSubCollectorSpec extends Specification with CatsEffect with Befor override protected val Timeout = 5.minutes - def beforeAll: Unit = 
Containers.startEmulator() + def beforeAll(): Unit = Containers.startEmulator() - def afterAll: Unit = Containers.stopEmulator() + def afterAll(): Unit = Containers.stopEmulator() val stopTimeout = 20.second diff --git a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/PubSub.scala b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/PubSub.scala index 3bac0f273..f23da17fa 100644 --- a/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/PubSub.scala +++ b/pubsub/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/pubsub/PubSub.scala @@ -10,7 +10,7 @@ */ package com.snowplowanalytics.snowplow.collectors.scalastream.it.pubsub -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import com.google.api.gax.grpc.GrpcTransportChannel import com.google.api.gax.rpc.{FixedTransportChannelProvider, TransportChannelProvider} diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubHealthCheck.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubHealthCheck.scala index 07940c3c0..2b40b45f9 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubHealthCheck.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubHealthCheck.scala @@ -9,7 +9,7 @@ import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.BuilderOps._ import org.typelevel.log4cats.Logger import org.typelevel.log4cats.slf4j.Slf4jLogger -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import scala.util._ object PubSubHealthCheck { diff --git a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala index ec29a53a2..3c607ad54 100644 --- a/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala +++ b/pubsub/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/PubSubSink.scala @@ -95,7 +95,7 @@ object PubSubSink { sinkConfig.name ) - private def createProducer[F[_]: Async: Parallel]( + private def createProducer[F[_]: Async]( sinkConfig: PubSubSinkConfig, topicName: String, bufferConfig: Config.Buffer diff --git a/rabbitmq/src/main/resources/application.conf b/rabbitmq/src/main/resources/application.conf deleted file mode 100644 index 08de80c62..000000000 --- a/rabbitmq/src/main/resources/application.conf +++ /dev/null @@ -1,43 +0,0 @@ -collector { - streams { - sink { - enabled = rabbitmq - - backoffPolicy { - minBackoff = 100 - maxBackoff = 10000 - multiplier = 2 - } - - maxBytes = 128000000 - } - - buffer { - byteLimit = 3145728 - recordLimit = 500 - timeLimit = 5000 - } - } -} - -akka { - loglevel = WARNING - loggers = ["akka.event.slf4j.Slf4jLogger"] - - http.server { - remote-address-header = on - raw-request-uri-header = on - - parsing { - max-uri-length = 32768 - uri-parsing-mode = relaxed - illegal-header-warnings = off - } - - max-connections = 2048 - } - - coordinated-shutdown { - run-by-jvm-shutdown-hook = off - } -} diff --git a/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/RabbitMQCollector.scala b/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/RabbitMQCollector.scala deleted file mode 100644 index 
2d17dc39a..000000000 --- a/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/RabbitMQCollector.scala +++ /dev/null @@ -1,94 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream - -import cats.syntax.either._ - -import scala.concurrent.ExecutionContext - -import java.util.concurrent.Executors - -import com.rabbitmq.client.{Channel, Connection, ConnectionFactory} - -import com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService -import com.snowplowanalytics.snowplow.collectors.scalastream.generated.BuildInfo -import com.snowplowanalytics.snowplow.collectors.scalastream.model._ -import com.snowplowanalytics.snowplow.collectors.scalastream.sinks.RabbitMQSink - -object RabbitMQCollector extends Collector { - def appName = BuildInfo.shortName - def appVersion = BuildInfo.version - def scalaVersion = BuildInfo.scalaVersion - - def main(args: Array[String]): Unit = { - val (collectorConf, akkaConf) = parseConfig(args) - val telemetry = TelemetryAkkaService.initWithCollector(collectorConf, BuildInfo.moduleName, appVersion) - val sinks: Either[Throwable, CollectorSinks] = - for { - config <- collectorConf.streams.sink match { - case rabbit: Rabbitmq => rabbit.asRight - case _ => new IllegalArgumentException("Configured sink is not RabbitMQ").asLeft - } - rabbitMQ <- initRabbitMQ(config) - (connection, channel) = rabbitMQ - _ = Runtime.getRuntime().addShutdownHook(shutdownHook(connection, channel)) - threadPool = initThreadPool(config.threadPoolSize) - goodSink <- RabbitMQSink.init( - config.maxBytes, - channel, - collectorConf.streams.good, - config.backoffPolicy, - threadPool - ) - badSink <- RabbitMQSink.init( - config.maxBytes, - channel, - collectorConf.streams.bad, - config.backoffPolicy, - threadPool - ) - } yield CollectorSinks(goodSink, badSink) - - sinks match { - case Right(s) => run(collectorConf, akkaConf, s, telemetry) - case Left(e) => - e.printStackTrace - System.exit(1) - } - } - - private def initRabbitMQ(config: Rabbitmq): Either[Throwable, (Connection, Channel)] = - Either.catchNonFatal { - val factory = new ConnectionFactory() - factory.setUsername(config.username) - factory.setPassword(config.password) - factory.setVirtualHost(config.virtualHost) - factory.setHost(config.host) - factory.setPort(config.port) - val connection = factory.newConnection() - val channel = connection.createChannel() - (connection, channel) - } - - private def initThreadPool(size: Option[Int]): ExecutionContext = - size match { - case Some(s) => ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(s)) - case None => ExecutionContext.fromExecutorService(Executors.newCachedThreadPool()) - } - - private def shutdownHook(connection: Connection, channel: Channel) = - new Thread() { - override def run() { - if (channel.isOpen) channel.close() - if (connection.isOpen) connection.close() - } - } -} diff --git a/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/RabbitMQSink.scala 
b/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/RabbitMQSink.scala deleted file mode 100644 index 0ebf71a7d..000000000 --- a/rabbitmq/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/RabbitMQSink.scala +++ /dev/null @@ -1,81 +0,0 @@ -/** - * Copyright (c) 2013-present Snowplow Analytics Ltd. - * All rights reserved. - * - * This software is made available by Snowplow Analytics, Ltd., - * under the terms of the Snowplow Limited Use License Agreement, Version 1.0 - * located at https://docs.snowplow.io/limited-use-license-1.0 - * BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION - * OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. - */ -package com.snowplowanalytics.snowplow.collectors.scalastream -package sinks - -import scala.concurrent.{ExecutionContext, Future} -import scala.concurrent.duration._ -import scala.util.{Failure, Success} - -import cats.syntax.either._ - -import com.rabbitmq.client.Channel - -import com.snowplowanalytics.snowplow.collectors.scalastream.model.RabbitMQBackoffPolicyConfig - -class RabbitMQSink( - val maxBytes: Int, - channel: Channel, - exchangeName: String, - backoffPolicy: RabbitMQBackoffPolicyConfig, - executionContext: ExecutionContext -) extends Sink { - - implicit val ec = executionContext - - override def storeRawEvents(events: List[Array[Byte]], key: String): Unit = - if (events.nonEmpty) { - log.info( - s"Sending ${events.size} Thrift records to exchange $exchangeName" - ) - Future.sequence(events.map(e => sendOneEvent(e))).onComplete { - case Success(_) => - log.debug( - s"${events.size} events successfully sent to exchange $exchangeName" - ) - // We should never reach this as the writing of each individual event is retried forever - case Failure(e) => - throw new RuntimeException(s"Error happened during the sending of ${events.size} events: ${e.getMessage}") - } - } - - private def sendOneEvent(bytes: Array[Byte], currentBackOff: Option[FiniteDuration] = None): Future[Unit] = - Future { - if (currentBackOff.isDefined) Thread.sleep(currentBackOff.get.toMillis) - channel.basicPublish(exchangeName, "", null, bytes) - }.recoverWith { - case e => - val nextBackOff = - currentBackOff match { - case Some(current) => - (backoffPolicy.multiplier * current.toMillis).toLong.min(backoffPolicy.maxBackoff).millis - case None => - backoffPolicy.minBackoff.millis - } - log.error(s"Sending of event failed with error: ${e.getMessage}. 
Retrying in $nextBackOff") - sendOneEvent(bytes, Some(nextBackOff)) - } - - override def shutdown(): Unit = () -} - -object RabbitMQSink { - def init( - maxBytes: Int, - channel: Channel, - exchangeName: String, - backoffPolicy: RabbitMQBackoffPolicyConfig, - executionContext: ExecutionContext - ): Either[Throwable, RabbitMQSink] = - for { - _ <- Either.catchNonFatal(channel.exchangeDeclarePassive(exchangeName)) - } yield new RabbitMQSink(maxBytes, channel, exchangeName, backoffPolicy, executionContext) -} diff --git a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala index 5aab6fa8d..c3d3fa4a5 100644 --- a/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala +++ b/sqs/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/SqsSink.scala @@ -23,7 +23,7 @@ import scala.collection.mutable.ListBuffer import scala.util.{Failure, Random, Success, Try} import scala.concurrent.{ExecutionContextExecutorService, Future} import scala.concurrent.duration.MILLISECONDS -import scala.collection.JavaConverters._ +import scala.jdk.CollectionConverters._ import cats.syntax.either._ @@ -84,7 +84,7 @@ class SqsSink[F[_]: Sync] private ( def flush(): Unit = { val eventsToSend = synchronized { - val evts = storedEvents.result + val evts = storedEvents.result() storedEvents.clear() byteCount = 0 evts @@ -244,7 +244,7 @@ class SqsSink[F[_]: Sync] private ( private def checkSqsHealth(): Unit = { val healthRunnable = new Runnable { - override def run() { + override def run(): Unit = while (!sqsHealthy) { Try { client.getQueueUrl(queueName) @@ -257,7 +257,6 @@ class SqsSink[F[_]: Sync] private ( Thread.sleep(1000L) } } - } } executorService.execute(healthRunnable) } From a46eeb4c54a885d45f821b34e8f2da86e4ef1016 Mon Sep 17 00:00:00 2001 From: spenes Date: Thu, 9 Nov 2023 17:59:47 +0300 Subject: [PATCH 30/39] Use correct sqs buffer queue name with Kinesis bad sink (close #393) --- .../KinesisCollector.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala index ab51cbaba..99c63e779 100644 --- a/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala +++ b/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KinesisCollector.scala @@ -28,7 +28,7 @@ object KinesisCollector extends App[KinesisSinkConfig](BuildInfo) { val threadPoolExecutor = buildExecutorService(config.good.config) for { good <- KinesisSink.create[IO](config.good, config.good.config.sqsGoodBuffer, threadPoolExecutor) - bad <- KinesisSink.create[IO](config.bad, config.good.config.sqsBadBuffer, threadPoolExecutor) + bad <- KinesisSink.create[IO](config.bad, config.bad.config.sqsBadBuffer, threadPoolExecutor) } yield Sinks(good, bad) } From f5710c45a1b3cb8876d3ff1afd9a9a2a72a09655 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Piotr=20Poniedzia=C5=82ek?= Date: Thu, 9 Nov 2023 16:26:53 +0100 Subject: [PATCH 31/39] Deploy 2.13 scala assets to GH on CI (close #392) --- .github/workflows/deploy.yml | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml index 47942fe93..2b822892c 100644 --- 
a/.github/workflows/deploy.yml +++ b/.github/workflows/deploy.yml @@ -36,12 +36,12 @@ jobs: name: ${{ steps.ver.outputs.project_version }} tag_name: ${{ steps.ver.outputs.project_version }} files: | - kafka/target/scala-2.12/snowplow-stream-collector-kafka-${{ steps.ver.outputs.project_version }}.jar - kinesis/target/scala-2.12/snowplow-stream-collector-kinesis-${{ steps.ver.outputs.project_version }}.jar - nsq/target/scala-2.12/snowplow-stream-collector-nsq-${{ steps.ver.outputs.project_version }}.jar - pubsub/target/scala-2.12/snowplow-stream-collector-google-pubsub-${{ steps.ver.outputs.project_version }}.jar - sqs/target/scala-2.12/snowplow-stream-collector-sqs-${{ steps.ver.outputs.project_version }}.jar - stdout/target/scala-2.12/snowplow-stream-collector-stdout-${{ steps.ver.outputs.project_version }}.jar + kafka/target/scala-2.13/snowplow-stream-collector-kafka-${{ steps.ver.outputs.project_version }}.jar + kinesis/target/scala-2.13/snowplow-stream-collector-kinesis-${{ steps.ver.outputs.project_version }}.jar + nsq/target/scala-2.13/snowplow-stream-collector-nsq-${{ steps.ver.outputs.project_version }}.jar + pubsub/target/scala-2.13/snowplow-stream-collector-google-pubsub-${{ steps.ver.outputs.project_version }}.jar + sqs/target/scala-2.13/snowplow-stream-collector-sqs-${{ steps.ver.outputs.project_version }}.jar + stdout/target/scala-2.13/snowplow-stream-collector-stdout-${{ steps.ver.outputs.project_version }}.jar env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} From cc8516d6f85ae1a42e78e7da86ed77ce6ee9ddbd Mon Sep 17 00:00:00 2001 From: Piotr Limanowski Date: Fri, 17 Nov 2023 13:20:30 +0100 Subject: [PATCH 32/39] Add http root response (close #397) Adds the ability to configure collector's root (GET `/`) response. If `rootResponse.enabled = true` when `/` is requested a static, configured response is returned. 
With status code equal to `rootResponse.statusCode`, http headers of `rootResponse.headers` and body of `rootResponse.body` --- .../Routes.scala | 10 ++++++-- .../Run.scala | 2 +- .../Service.scala | 15 +++++++++++ .../RoutesSpec.scala | 25 +++++++++++++++++-- 4 files changed, 47 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala index 98518523c..55766ef4c 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala @@ -17,7 +17,8 @@ import org.http4s.dsl.Http4sDsl import org.http4s.implicits._ import com.comcast.ip4s.Dns -class Routes[F[_]: Sync](enableDefaultRedirect: Boolean, service: IService[F]) extends Http4sDsl[F] { +class Routes[F[_]: Sync](enableDefaultRedirect: Boolean, enableRootResponse: Boolean, service: IService[F]) + extends Http4sDsl[F] { implicit val dns: Dns[F] = Dns.forSync[F] @@ -79,8 +80,13 @@ class Routes[F[_]: Sync](enableDefaultRedirect: Boolean, service: IService[F]) e NotFound("redirects disabled") } + private val rootRoute = HttpRoutes.of[F] { + case GET -> Root if enableRootResponse => + service.rootResponse + } + val value: HttpApp[F] = { - val routes = healthRoutes <+> corsRoute <+> cookieRoutes + val routes = healthRoutes <+> corsRoute <+> cookieRoutes <+> rootRoute val res = if (enableDefaultRedirect) routes else rejectRedirect <+> routes res.orNotFound } diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 97cfbaef6..28286dc4a 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -81,7 +81,7 @@ object Run { appInfo ) httpServer = HttpServer.build[F]( - new Routes[F](config.enableDefaultRedirect, collectorService).value, + new Routes[F](config.enableDefaultRedirect, config.rootResponse.enabled, collectorService).value, if (config.ssl.enable) config.ssl.port else config.port, config.ssl.enable, config.networking diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala index 7c79f16d9..9bffffeac 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala @@ -45,6 +45,7 @@ trait IService[F[_]] { ): F[Response[F]] def determinePath(vendor: String, version: String): String def sinksHealthy: F[Boolean] + def rootResponse: F[Response[F]] } object Service { @@ -142,6 +143,20 @@ class Service[F[_]: Sync]( ) } + override def rootResponse: F[Response[F]] = Sync[F].fromEither { + for { + status <- Status.fromInt(config.rootResponse.statusCode) + body = Stream.emit(config.rootResponse.body).through(fs2.text.utf8.encode) + headers = Headers( + config.rootResponse.headers.toList.map { case (name, value) => Header.Raw(CIString(name), value) } + ) + } yield Response[F]( + status = status, + body = body, + headers = headers + ) + } + def extractHeader(req: Request[F], headerName: String): Option[String] = req.headers.get(CIString(headerName)).map(_.head.value) diff --git 
a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala index e45649eda..3bd08188b 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala @@ -34,6 +34,9 @@ class RoutesSpec extends Specification { override def preflightResponse(req: Request[IO]): IO[Response[IO]] = IO.pure(Response[IO](status = Ok, body = Stream.emit("preflight response").through(text.utf8.encode))) + override def rootResponse: IO[Response[IO]] = + IO.pure(Response(status = Ok, body = Stream.emit("root").through(text.utf8.encode))) + override def cookie( body: IO[Option[String]], path: String, @@ -59,9 +62,9 @@ class RoutesSpec extends Specification { override def sinksHealthy: IO[Boolean] = IO.pure(true) } - def createTestServices(enabledDefaultRedirect: Boolean = true) = { + def createTestServices(enabledDefaultRedirect: Boolean = true, enableRootResponse: Boolean = false) = { val service = new TestService() - val routes = new Routes(enabledDefaultRedirect, service).value + val routes = new Routes(enabledDefaultRedirect, enableRootResponse, service).value (service, routes) } @@ -240,6 +243,24 @@ class RoutesSpec extends Specification { test(Method.GET) test(Method.POST) } + + "respond to the root route" in { + "enabled return the response" in { + val (_, routes) = createTestServices(enableRootResponse = true) + val request = Request[IO](method = Method.GET, uri = uri"/") + val response = routes.run(request).unsafeRunSync() + + response.status must beEqualTo(Status.Ok) + response.as[String].unsafeRunSync() must beEqualTo("root") + } + "disabled return NotFound" in { + val (_, routes) = createTestServices(enableRootResponse = false) + val request = Request[IO](method = Method.GET, uri = uri"/") + val response = routes.run(request).unsafeRunSync() + + response.status must beEqualTo(Status.NotFound) + } + } } } From da1c964953644cde227e35033637c0299bd25eda Mon Sep 17 00:00:00 2001 From: Piotr Limanowski Date: Tue, 21 Nov 2023 14:08:45 +0100 Subject: [PATCH 33/39] Add crossdomain.xml support (close #399) crossdomain.xml provides cross-domain functionality to Adobe Flash/Flex, but these days Adobe Acrobat. The functionality is required when a tracker request is embedded in a pdf. In this case, when user opens up a PDF file with a script hosted on domain a.com, the script will fetch the crossdomain.xml policy from domain b.com to check whether the endpoint can be accessed and that's used by Adobe to conditionally issue requests. 
--- .../Routes.scala | 15 +++++++-- .../Run.scala | 2 +- .../Service.scala | 24 ++++++++++++-- .../RoutesSpec.scala | 31 +++++++++++++++++-- .../ServiceSpec.scala | 8 +++++ 5 files changed, 71 insertions(+), 9 deletions(-) diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala index 55766ef4c..2bcd93828 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala @@ -17,8 +17,12 @@ import org.http4s.dsl.Http4sDsl import org.http4s.implicits._ import com.comcast.ip4s.Dns -class Routes[F[_]: Sync](enableDefaultRedirect: Boolean, enableRootResponse: Boolean, service: IService[F]) - extends Http4sDsl[F] { +class Routes[F[_]: Sync]( + enableDefaultRedirect: Boolean, + enableRootResponse: Boolean, + enableCrossdomainTracking: Boolean, + service: IService[F] +) extends Http4sDsl[F] { implicit val dns: Dns[F] = Dns.forSync[F] @@ -85,8 +89,13 @@ class Routes[F[_]: Sync](enableDefaultRedirect: Boolean, enableRootResponse: Boo service.rootResponse } + private val crossdomainRoute = HttpRoutes.of[F] { + case GET -> Root / "crossdomain.xml" if enableCrossdomainTracking => + service.crossdomainResponse + } + val value: HttpApp[F] = { - val routes = healthRoutes <+> corsRoute <+> cookieRoutes <+> rootRoute + val routes = healthRoutes <+> corsRoute <+> cookieRoutes <+> rootRoute <+> crossdomainRoute val res = if (enableDefaultRedirect) routes else rejectRedirect <+> routes res.orNotFound } diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 28286dc4a..969590092 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -81,7 +81,7 @@ object Run { appInfo ) httpServer = HttpServer.build[F]( - new Routes[F](config.enableDefaultRedirect, config.rootResponse.enabled, collectorService).value, + new Routes[F](config.enableDefaultRedirect, config.rootResponse.enabled, config.crossDomain.enabled, collectorService).value, if (config.ssl.enable) config.ssl.port else config.port, config.ssl.enable, config.networking diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala index 9bffffeac..058b7fff3 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala @@ -46,12 +46,12 @@ trait IService[F[_]] { def determinePath(vendor: String, version: String): String def sinksHealthy: F[Boolean] def rootResponse: F[Response[F]] + def crossdomainResponse: F[Response[F]] } object Service { // Contains an invisible pixel to return for `/i` requests. 
- val pixel = Base64.decodeBase64("R0lGODlhAQABAPAAAP///wAAACH5BAUAAAAALAAAAAABAAEAAAICRAEAOw==") - + val pixel = Base64.decodeBase64("R0lGODlhAQABAPAAAP///wAAACH5BAUAAAAALAAAAAABAAEAAAICRAEAOw==") val spAnonymousNuid = "00000000-0000-0000-0000-000000000000" } @@ -157,6 +157,26 @@ class Service[F[_]: Sync]( ) } + def crossdomainResponse: F[Response[F]] = Sync[F].pure { + val policy = + config + .crossDomain + .domains + .map(d => s"""""") + .mkString("\n") + + val xml = s""" + | + |${policy} + |""".stripMargin + + Response[F]( + status = Ok, + body = Stream.emit(xml).through(fs2.text.utf8.encode), + headers = Headers(`Content-Type`(MediaType.text.xml)) + ) + } + def extractHeader(req: Request[F], headerName: String): Option[String] = req.headers.get(CIString(headerName)).map(_.head.value) diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala index 3bd08188b..0278c9bd4 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala @@ -37,6 +37,9 @@ class RoutesSpec extends Specification { override def rootResponse: IO[Response[IO]] = IO.pure(Response(status = Ok, body = Stream.emit("root").through(text.utf8.encode))) + override def crossdomainResponse: IO[Response[IO]] = + IO.pure(Response(status = Ok, body = Stream.empty)) + override def cookie( body: IO[Option[String]], path: String, @@ -62,9 +65,13 @@ class RoutesSpec extends Specification { override def sinksHealthy: IO[Boolean] = IO.pure(true) } - def createTestServices(enabledDefaultRedirect: Boolean = true, enableRootResponse: Boolean = false) = { + def createTestServices( + enabledDefaultRedirect: Boolean = true, + enableRootResponse: Boolean = false, + enableCrossdomainTracking: Boolean = false + ) = { val service = new TestService() - val routes = new Routes(enabledDefaultRedirect, enableRootResponse, service).value + val routes = new Routes(enabledDefaultRedirect, enableRootResponse, enableCrossdomainTracking, service).value (service, routes) } @@ -244,7 +251,7 @@ class RoutesSpec extends Specification { test(Method.POST) } - "respond to the root route" in { + "respond to root route" in { "enabled return the response" in { val (_, routes) = createTestServices(enableRootResponse = true) val request = Request[IO](method = Method.GET, uri = uri"/") @@ -261,6 +268,24 @@ class RoutesSpec extends Specification { response.status must beEqualTo(Status.NotFound) } } + + "respond to crossdomain route" in { + "enabled return the response" in { + val (_, routes) = createTestServices(enableCrossdomainTracking = true) + val request = Request[IO](method = Method.GET, uri = uri"/crossdomain.xml") + val response = routes.run(request).unsafeRunSync() + + response.status must beEqualTo(Status.Ok) + } + "disabled return NotFound" in { + val (_, routes) = createTestServices(enableCrossdomainTracking = false) + val request = Request[IO](method = Method.GET, uri = uri"/crossdomain.xml") + val response = routes.run(request).unsafeRunSync() + + response.status must beEqualTo(Status.NotFound) + } + } + } } diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala index c431eb75f..d3024294c 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala +++ 
b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala @@ -1016,5 +1016,13 @@ class ServiceSpec extends Specification { service.determinePath(vendor, version3) shouldEqual expected3 } } + + "crossdomainResponse" in { + val response = service.crossdomainResponse.unsafeRunSync() + val body = response.body.compile.toList.unsafeRunSync().map(_.toChar).mkString + body must startWith("""""") + body must contain("") + body must endWith("") + } } } From 0f9d6543a1d942c16eba2a7081079c7de09ceb24 Mon Sep 17 00:00:00 2001 From: Piotr Limanowski Date: Wed, 22 Nov 2023 17:22:09 +0100 Subject: [PATCH 34/39] Add support for Do Not Track cookie (close #400) This is a server-side option to disable tracking once events are received from a tracker. These requests are not sank and instead are silently discarded. Trackers also enable a similar mechanism[1] that will not pass events to the collector. This is likely a better approach, but having it in the collector is another prevention mechanism. 1 - https://docs.snowplow.io/docs/collecting-data/collecting-from-own-applications/javascript-trackers/web-tracker/tracker-setup/initialization-options/#respecting-do-not-track --- .../Run.scala | 7 +- .../Service.scala | 77 ++++++++++++++----- .../ServiceSpec.scala | 77 +++++++++++++++++-- .../it/core/DoNotTrackCookieSpec.scala | 4 +- 4 files changed, 138 insertions(+), 27 deletions(-) diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index 969590092..f814b90c7 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -81,7 +81,12 @@ object Run { appInfo ) httpServer = HttpServer.build[F]( - new Routes[F](config.enableDefaultRedirect, config.rootResponse.enabled, config.crossDomain.enabled, collectorService).value, + new Routes[F]( + config.enableDefaultRedirect, + config.rootResponse.enabled, + config.crossDomain.enabled, + collectorService + ).value, if (config.ssl.enable) config.ssl.port else config.port, config.ssl.enable, config.networking diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala index 058b7fff3..1b9181c62 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala @@ -77,16 +77,24 @@ class Service[F[_]: Sync]( ): F[Response[F]] = for { body <- body - redirect = path.startsWith("/r/") - hostname = extractHostname(request) - userAgent = extractHeader(request, "User-Agent") - refererUri = extractHeader(request, "Referer") - spAnonymous = extractHeader(request, "SP-Anonymous") - ip = extractIp(request, spAnonymous) - queryString = Some(request.queryString) - cookie = extractCookie(request) - nuidOpt = networkUserId(request, cookie, spAnonymous) - nuid = nuidOpt.getOrElse(UUID.randomUUID().toString) + redirect = path.startsWith("/r/") + hostname = extractHostname(request) + userAgent = extractHeader(request, "User-Agent") + refererUri = extractHeader(request, "Referer") + spAnonymous = extractHeader(request, "SP-Anonymous") + ip = extractIp(request, spAnonymous) + queryString = Some(request.queryString) + cookie = extractCookie(request) + doNotTrack = checkDoNotTrackCookie(request) + alreadyBouncing = 
queryString.contains(config.cookieBounce.name) + nuidOpt = networkUserId(request, cookie, spAnonymous) + nuid = nuidOpt.getOrElse { + if (alreadyBouncing) config.cookieBounce.fallbackNetworkUserId + else UUID.randomUUID().toString + } + shouldBounce = config.cookieBounce.enabled && nuidOpt.isEmpty && !alreadyBouncing && + pixelExpected && !redirect + (ipAddress, partitionKey) = ipAndPartitionKey(ip, config.streams.useIpAddressAsPartitionKey) event = buildEvent( queryString, @@ -115,13 +123,14 @@ class Service[F[_]: Sync]( accessControlAllowOriginHeader(request).some, `Access-Control-Allow-Credentials`().toRaw1.some ).flatten - responseHeaders = Headers(headerList) - _ <- sinkEvent(event, partitionKey) + responseHeaders = Headers(headerList ++ bounceLocationHeaders(config.cookieBounce, shouldBounce, request)) + _ <- if (!doNotTrack && !shouldBounce) sinkEvent(event, partitionKey) else Sync[F].unit resp = buildHttpResponse( queryParams = request.uri.query.params, headers = responseHeaders, redirect = redirect, - pixelExpected = pixelExpected + pixelExpected = pixelExpected, + shouldBounce = shouldBounce ) } yield resp @@ -183,6 +192,13 @@ class Service[F[_]: Sync]( def extractCookie(req: Request[F]): Option[RequestCookie] = req.cookies.find(_.name == config.cookie.name) + def checkDoNotTrackCookie(req: Request[F]): Boolean = + config.doNotTrackCookie.enabled && req + .cookies + .find(_.name == config.doNotTrackCookie.name) + .map(cookie => config.doNotTrackCookie.value.r.matches(cookie.content)) + .getOrElse(false) + def extractHostname(req: Request[F]): Option[String] = req.uri.authority.map(_.host.renderString) // Hostname is extracted like this in Akka-Http as well @@ -228,23 +244,25 @@ class Service[F[_]: Sync]( queryParams: Map[String, String], headers: Headers, redirect: Boolean, - pixelExpected: Boolean + pixelExpected: Boolean, + shouldBounce: Boolean ): Response[F] = if (redirect) buildRedirectHttpResponse(queryParams, headers) else - buildUsualHttpResponse(pixelExpected, headers) + buildUsualHttpResponse(pixelExpected, shouldBounce, headers) /** Builds the appropriate http response when not dealing with click redirects. */ - def buildUsualHttpResponse(pixelExpected: Boolean, headers: Headers): Response[F] = - pixelExpected match { - case true => + def buildUsualHttpResponse(pixelExpected: Boolean, shouldBounce: Boolean, headers: Headers): Response[F] = + (pixelExpected, shouldBounce) match { + case (true, true) => Response[F](status = Found, headers = headers) + case (true, false) => Response[F]( headers = headers.put(`Content-Type`(MediaType.image.gif)), body = pixelStream ) // See https://github.com/snowplow/snowplow-javascript-tracker/issues/482 - case false => + case _ => Response[F]( status = Ok, headers = headers, @@ -441,4 +459,25 @@ class Service[F[_]: Sync]( case None => request.uri.query.params.get("nuid").orElse(requestCookie.map(_.content)) } + /** + * Builds a location header redirecting to itself to check if third-party cookies are blocked. 
+ * + * @param request + * @param shouldBounce + * @return http optional location header + */ + def bounceLocationHeaders(cfg: Config.CookieBounce, shouldBounce: Boolean, request: Request[F]): Option[Header.Raw] = + if (shouldBounce) Some { + val forwardedScheme = for { + headerName <- cfg.forwardedProtocolHeader.map(CIString(_)) + headerValue <- request.headers.get(headerName).flatMap(_.map(_.value).toList.headOption) + maybeScheme <- if (Set("http", "https").contains(headerValue)) Some(headerValue) else None + scheme <- Uri.Scheme.fromString(maybeScheme).toOption + } yield scheme + val redirectUri = + request.uri.withQueryParam(cfg.name, "true").copy(scheme = forwardedScheme.orElse(request.uri.scheme)) + + `Location`(redirectUri).toRaw1 + } else None + } diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala index d3024294c..a9fa6c69a 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala @@ -491,7 +491,8 @@ class ServiceSpec extends Specification { queryParams = Map("u" -> "https://example1.com/12"), headers = testHeaders, redirect = true, - pixelExpected = true + pixelExpected = true, + shouldBounce = false ) res.status shouldEqual Status.Found res.headers shouldEqual testHeaders.put(Location(Uri.unsafeFromString("https://example1.com/12"))) @@ -501,18 +502,31 @@ class ServiceSpec extends Specification { queryParams = Map.empty, headers = testHeaders, redirect = false, - pixelExpected = true + pixelExpected = true, + shouldBounce = false ) res.status shouldEqual Status.Ok res.headers shouldEqual testHeaders.put(`Content-Type`(MediaType.image.gif)) res.body.compile.toList.unsafeRunSync().toArray shouldEqual Service.pixel } + "return 302 Found if expecting tracking pixel and cookie shouldBounce is performed" in { + val res = service.buildHttpResponse( + queryParams = Map.empty, + headers = testHeaders, + redirect = false, + pixelExpected = true, + shouldBounce = true + ) + res.status shouldEqual Status.Found + res.headers shouldEqual testHeaders + } "send back ok otherwise" in { val res = service.buildHttpResponse( queryParams = Map.empty, headers = testHeaders, redirect = false, - pixelExpected = false + pixelExpected = false, + shouldBounce = false ) res.status shouldEqual Status.Ok res.headers shouldEqual testHeaders @@ -524,7 +538,8 @@ class ServiceSpec extends Specification { "send back a gif if pixelExpected is true" in { val res = service.buildUsualHttpResponse( headers = testHeaders, - pixelExpected = true + pixelExpected = true, + shouldBounce = false ) res.status shouldEqual Status.Ok res.headers shouldEqual testHeaders.put(`Content-Type`(MediaType.image.gif)) @@ -533,7 +548,8 @@ class ServiceSpec extends Specification { "send back ok otherwise" in { val res = service.buildUsualHttpResponse( headers = testHeaders, - pixelExpected = false + pixelExpected = false, + shouldBounce = false ) res.status shouldEqual Status.Ok res.headers shouldEqual testHeaders @@ -1024,5 +1040,56 @@ class ServiceSpec extends Specification { body must contain("") body must endWith("") } + + "checkDoNotTrackCookie" should { + "be disabled when value does not match regex" in { + val cookieName = "do-not-track" + val expected = "lorem-1p5uM" + val request = Request[IO]( + headers = Headers( + Cookie(RequestCookie(cookieName, expected)) + ) + ) + val 
service = new Service( + config = + TestUtils.testConfig.copy(doNotTrackCookie = Config.DoNotTrackCookie(true, cookieName, "^snowplow-(.*)$")), + sinks = Sinks(new TestSink, new TestSink), + appInfo = TestUtils.appInfo + ) + service.checkDoNotTrackCookie(request) should beFalse + } + "be disabled when name does not match config" in { + val cookieName = "do-not-track" + val expected = "lorem-1p5uM" + val request = Request[IO]( + headers = Headers( + Cookie(RequestCookie(cookieName, expected)) + ) + ) + val service = new Service( + config = TestUtils + .testConfig + .copy(doNotTrackCookie = Config.DoNotTrackCookie(true, s"snowplow-$cookieName", "^(.*)$")), + sinks = Sinks(new TestSink, new TestSink), + appInfo = TestUtils.appInfo + ) + service.checkDoNotTrackCookie(request) should beFalse + } + "match cookie against a regex when it exists" in { + val cookieName = "do-not-track" + val expected = "lorem-1p5uM" + val request = Request[IO]( + headers = Headers( + Cookie(RequestCookie(cookieName, expected)) + ) + ) + val service = new Service( + config = TestUtils.testConfig.copy(doNotTrackCookie = Config.DoNotTrackCookie(true, cookieName, "^(.*)$")), + sinks = Sinks(new TestSink, new TestSink), + appInfo = TestUtils.appInfo + ) + service.checkDoNotTrackCookie(request) should beTrue + } + } } } diff --git a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala index 854f5c4d1..fe4e3309e 100644 --- a/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala +++ b/kinesis/src/it/scala/com/snowplowanalytics/snowplow/collectors/scalastream/it/core/DoNotTrackCookieSpec.scala @@ -61,8 +61,8 @@ class DoNotTrackCookieSpec extends Specification with Localstack with CatsEffect expected.forall(cookie => headers.exists(_.contains(cookie))) must beTrue } }.unsafeRunSync() - }.pendingUntilFixed("Remove when 'do not track cookie' feature is implemented") - + } + "track events that have a cookie whose name and value match doNotTrackCookie config if disabled" in { val testName = "doNotTrackCookie-disabled" val streamGood = s"$testName-raw" From d864d07ff616afe83bffca0149079e018f16ee38 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Piotr=20Poniedzia=C5=82ek?= Date: Thu, 28 Dec 2023 13:41:23 +0100 Subject: [PATCH 35/39] Add statsd metrics reporting (close #404) --- build.sbt | 2 + core/src/main/resources/reference.conf | 1 + .../Config.scala | 3 +- .../HttpServer.scala | 39 ++++++++++++++++--- .../Routes.scala | 5 +-- .../Run.scala | 3 +- .../RoutesSpec.scala | 3 +- .../TestUtils.scala | 3 +- .../KafkaConfigSpec.scala | 21 ++++++++-- .../sinks/KinesisConfigSpec.scala | 12 +++++- .../NsqConfigSpec.scala | 21 ++++++++-- project/Dependencies.scala | 3 ++ .../ConfigSpec.scala | 21 ++++++++-- .../SqsConfigSpec.scala | 21 ++++++++-- 14 files changed, 127 insertions(+), 31 deletions(-) diff --git a/build.sbt b/build.sbt index c93c7a6f7..a7f37d428 100644 --- a/build.sbt +++ b/build.sbt @@ -31,6 +31,8 @@ lazy val core = project Dependencies.Libraries.circeConfig, Dependencies.Libraries.trackerCore, Dependencies.Libraries.emitterHttps, + Dependencies.Libraries.datadogHttp4s, + Dependencies.Libraries.datadogStatsd, Dependencies.Libraries.specs2, Dependencies.Libraries.specs2CE, Dependencies.Libraries.ceTestkit, diff --git a/core/src/main/resources/reference.conf b/core/src/main/resources/reference.conf index 
96dfd594f..1a91ba19b 100644 --- a/core/src/main/resources/reference.conf +++ b/core/src/main/resources/reference.conf @@ -70,6 +70,7 @@ port = 8125 period = 10 seconds prefix = snowplow.collector + tags = { } } } } diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala index 30ec5d0b3..a9d16e78e 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala @@ -122,7 +122,8 @@ object Config { hostname: String, port: Int, period: FiniteDuration, - prefix: String + prefix: String, + tags: Map[String, String] ) case class SSL( diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala index e31f9dd40..79fffb74d 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/HttpServer.scala @@ -12,9 +12,13 @@ package com.snowplowanalytics.snowplow.collector.core import cats.effect.{Async, Resource} import cats.implicits._ -import org.http4s.HttpApp +import com.avast.datadog4s.api.Tag +import com.avast.datadog4s.extension.http4s.DatadogMetricsOps +import com.avast.datadog4s.{StatsDMetricFactory, StatsDMetricFactoryConfig} +import org.http4s.HttpRoutes import org.http4s.blaze.server.BlazeServerBuilder import org.http4s.server.Server +import org.http4s.server.middleware.Metrics import org.typelevel.log4cats.Logger import org.typelevel.log4cats.slf4j.Slf4jLogger @@ -26,15 +30,38 @@ object HttpServer { implicit private def logger[F[_]: Async]: Logger[F] = Slf4jLogger.getLogger[F] def build[F[_]: Async]( - app: HttpApp[F], + routes: HttpRoutes[F], port: Int, secure: Boolean, - networking: Config.Networking + networking: Config.Networking, + metricsConfig: Config.Metrics ): Resource[F, Server] = - buildBlazeServer[F](app, port, secure, networking) + for { + withMetricsMiddleware <- createMetricsMiddleware(routes, metricsConfig) + server <- buildBlazeServer[F](withMetricsMiddleware, port, secure, networking) + } yield server + + private def createMetricsMiddleware[F[_]: Async]( + routes: HttpRoutes[F], + metricsConfig: Config.Metrics + ): Resource[F, HttpRoutes[F]] = + if (metricsConfig.statsd.enabled) { + val metricsFactory = StatsDMetricFactory.make(createStatsdConfig(metricsConfig)) + metricsFactory.evalMap(DatadogMetricsOps.builder[F](_).useDistributionBasedTimers().build()).map { metricsOps => + Metrics[F](metricsOps)(routes) + } + } else { + Resource.pure(routes) + } + + private def createStatsdConfig(metricsConfig: Config.Metrics): StatsDMetricFactoryConfig = { + val server = InetSocketAddress.createUnresolved(metricsConfig.statsd.hostname, metricsConfig.statsd.port) + val tags = metricsConfig.statsd.tags.toSeq.map { case (name, value) => Tag.of(name, value) } + StatsDMetricFactoryConfig(Some(metricsConfig.statsd.prefix), server, defaultTags = tags) + } private def buildBlazeServer[F[_]: Async]( - app: HttpApp[F], + routes: HttpRoutes[F], port: Int, secure: Boolean, networking: Config.Networking @@ -42,7 +69,7 @@ object HttpServer { Resource.eval(Logger[F].info("Building blaze server")) >> BlazeServerBuilder[F] .bindSocketAddress(new InetSocketAddress(port)) - .withHttpApp(app) + .withHttpApp(routes.orNotFound) 
.withIdleTimeout(networking.idleTimeout) .withMaxConnections(networking.maxConnections) .cond(secure, _.withSslContext(SSLContext.getDefault)) diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala index 2bcd93828..06d101023 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Routes.scala @@ -94,9 +94,8 @@ class Routes[F[_]: Sync]( service.crossdomainResponse } - val value: HttpApp[F] = { + val value: HttpRoutes[F] = { val routes = healthRoutes <+> corsRoute <+> cookieRoutes <+> rootRoute <+> crossdomainRoute - val res = if (enableDefaultRedirect) routes else rejectRedirect <+> routes - res.orNotFound + if (enableDefaultRedirect) routes else rejectRedirect <+> routes } } diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index f814b90c7..da842f9a8 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -89,7 +89,8 @@ object Run { ).value, if (config.ssl.enable) config.ssl.port else config.port, config.ssl.enable, - config.networking + config.networking, + config.monitoring.metrics ) _ <- withGracefulShutdown(config.preTerminationPeriod)(httpServer) httpClient <- BlazeClientBuilder[F].resource diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala index 0278c9bd4..b3a01f551 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/RoutesSpec.scala @@ -71,7 +71,8 @@ class RoutesSpec extends Specification { enableCrossdomainTracking: Boolean = false ) = { val service = new TestService() - val routes = new Routes(enabledDefaultRedirect, enableRootResponse, enableCrossdomainTracking, service).value + val routes = + new Routes(enabledDefaultRedirect, enableRootResponse, enableCrossdomainTracking, service).value.orNotFound (service, routes) } diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala index 2dc0b780e..3e4695687 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala @@ -102,7 +102,8 @@ object TestUtils { "localhost", 8125, 10.seconds, - "snowplow.collector" + "snowplow.collector", + Map.empty ) ) ), diff --git a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala index 95d41a67c..75d056060 100644 --- a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala +++ b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala @@ -25,8 +25,18 @@ class KafkaConfigSpec extends Specification with CatsEffect { "Config parser" should { "be able to parse extended kafka config" in { assert( - resource = "/config.kafka.extended.hocon", - 
expectedResult = Right(KafkaConfigSpec.expectedConfig) + resource = "/config.kafka.extended.hocon", + expectedResult = Right( + KafkaConfigSpec + .expectedConfig + .copy( + monitoring = Config.Monitoring( + Config.Metrics( + KafkaConfigSpec.expectedConfig.monitoring.metrics.statsd.copy(tags = Map("app" -> "collector")) + ) + ) + ) + ) ) } "be able to parse minimal kafka config" in { @@ -92,8 +102,11 @@ object KafkaConfigSpec { body = "" ), cors = Config.CORS(1.hour), - monitoring = - Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + monitoring = Config.Monitoring( + Config.Metrics( + Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector", Map.empty) + ) + ), ssl = Config.SSL(enable = false, redirect = false, port = 443), enableDefaultRedirect = false, redirectDomains = Set.empty, diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala index fb6b3e778..aa365da64 100644 --- a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala @@ -39,7 +39,12 @@ class KinesisConfigSpec extends Specification with CatsEffect { moduleVersion = Some("0.5.2"), instanceId = Some("665bhft5u6udjf"), autoGeneratedId = Some("hfy67e5ydhtrd") + ), + monitoring = Config.Monitoring( + Config.Metrics( + KinesisConfigSpec.expectedConfig.monitoring.metrics.statsd.copy(tags = Map("app" -> "collector")) ) + ) ) ) ) @@ -107,8 +112,11 @@ object KinesisConfigSpec { body = "" ), cors = Config.CORS(1.hour), - monitoring = - Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + monitoring = Config.Monitoring( + Config.Metrics( + Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector", Map.empty) + ) + ), ssl = Config.SSL(enable = false, redirect = false, port = 443), enableDefaultRedirect = false, redirectDomains = Set.empty, diff --git a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala index e5620daeb..be9fefa37 100644 --- a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala +++ b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala @@ -25,8 +25,18 @@ class NsqConfigSpec extends Specification with CatsEffect { "Config parser" should { "be able to parse extended nsq config" in { assert( - resource = "/config.nsq.extended.hocon", - expectedResult = Right(NsqConfigSpec.expectedConfig) + resource = "/config.nsq.extended.hocon", + expectedResult = Right( + NsqConfigSpec + .expectedConfig + .copy( + monitoring = Config.Monitoring( + Config.Metrics( + NsqConfigSpec.expectedConfig.monitoring.metrics.statsd.copy(tags = Map("app" -> "collector")) + ) + ) + ) + ) ) } "be able to parse minimal nsq config" in { @@ -91,8 +101,11 @@ object NsqConfigSpec { body = "" ), cors = Config.CORS(1.hour), - monitoring = - Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + monitoring = Config.Monitoring( + Config.Metrics( + Config.Statsd(false, "localhost", 8125, 10.seconds, 
"snowplow.collector", Map.empty) + ) + ), ssl = Config.SSL(enable = false, redirect = false, port = 443), enableDefaultRedirect = false, redirectDomains = Set.empty, diff --git a/project/Dependencies.scala b/project/Dependencies.scala index 341af732f..3103a8eb6 100644 --- a/project/Dependencies.scala +++ b/project/Dependencies.scala @@ -39,6 +39,7 @@ object Dependencies { val testcontainers = "0.40.10" val thrift = "0.15.0" // force this version to mitigate security vulnerabilities val tracker = "2.0.0" + val dataDog4s = "0.32.0" } object Libraries { @@ -58,6 +59,8 @@ object Dependencies { val slf4j = "org.slf4j" % "slf4j-simple" % V.slf4j val thrift = "org.apache.thrift" % "libthrift" % V.thrift val trackerCore = "com.snowplowanalytics" %% "snowplow-scala-tracker-core" % V.tracker + val datadogHttp4s = "com.avast.cloud" %% "datadog4s-http4s" % V.dataDog4s + val datadogStatsd = "com.avast.cloud" %% "datadog4s-statsd" % V.dataDog4s //sinks val fs2PubSub = "com.permutive" %% "fs2-google-pubsub-grpc" % V.fs2PubSub diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala index aa78c2584..1dcebe092 100644 --- a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala +++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala @@ -24,8 +24,18 @@ class ConfigSpec extends Specification with CatsEffect { "Config parser" should { "be able to parse extended pubsub config" in { assert( - resource = "/config.pubsub.extended.hocon", - expectedResult = Right(ConfigSpec.expectedConfig) + resource = "/config.pubsub.extended.hocon", + expectedResult = Right( + ConfigSpec + .expectedConfig + .copy( + monitoring = Config.Monitoring( + Config.Metrics( + ConfigSpec.expectedConfig.monitoring.metrics.statsd.copy(tags = Map("app" -> "collector")) + ) + ) + ) + ) ) } "be able to parse minimal pubsub config" in { @@ -91,8 +101,11 @@ object ConfigSpec { body = "" ), cors = Config.CORS(1.hour), - monitoring = - Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + monitoring = Config.Monitoring( + Config.Metrics( + Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector", Map.empty) + ) + ), ssl = Config.SSL(enable = false, redirect = false, port = 443), enableDefaultRedirect = false, redirectDomains = Set.empty, diff --git a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala index e50f5ba85..77e301bdd 100644 --- a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala +++ b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala @@ -25,8 +25,18 @@ class SqsConfigSpec extends Specification with CatsEffect { "Config parser" should { "be able to parse extended kinesis config" in { assert( - resource = "/config.sqs.extended.hocon", - expectedResult = Right(SqsConfigSpec.expectedConfig) + resource = "/config.sqs.extended.hocon", + expectedResult = Right( + SqsConfigSpec + .expectedConfig + .copy( + monitoring = Config.Monitoring( + Config.Metrics( + SqsConfigSpec.expectedConfig.monitoring.metrics.statsd.copy(tags = Map("app" -> "collector")) + ) + ) + ) + ) ) } "be able to parse minimal kinesis config" in { @@ -92,8 +102,11 @@ 
object SqsConfigSpec { body = "" ), cors = Config.CORS(1.hour), - monitoring = - Config.Monitoring(Config.Metrics(Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector"))), + monitoring = Config.Monitoring( + Config.Metrics( + Config.Statsd(false, "localhost", 8125, 10.seconds, "snowplow.collector", Map.empty) + ) + ), ssl = Config.SSL(enable = false, redirect = false, port = 443), enableDefaultRedirect = false, redirectDomains = Set.empty, From 928e8cb2916315a103817f153423a671c665e30a Mon Sep 17 00:00:00 2001 From: Piotr Limanowski Date: Thu, 14 Dec 2023 11:41:00 +0100 Subject: [PATCH 36/39] Use shortname for collector name (close #403) Previously we've moved on to using a full name which was incompatible with the old format. Now the name uses the expected short name (`ssc-$VERSION-$SINK`) format. --- .../AppInfo.scala | 1 + .../Service.scala | 3 ++- .../ServiceSpec.scala | 6 +++--- .../TestUtils.scala | 1 + project/BuildSettings.scala | 2 +- 5 files changed, 8 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala index 9c9a67a3b..dddcff6f1 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/AppInfo.scala @@ -15,4 +15,5 @@ trait AppInfo { def moduleName: String def version: String def dockerAlias: String + def shortName: String } diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala index 1b9181c62..901535aa9 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Service.scala @@ -63,7 +63,8 @@ class Service[F[_]: Sync]( val pixelStream = Stream.iterable[F, Byte](Service.pixel) - private val collector = s"${appInfo.name}:${appInfo.version}" + private val collector = + s"""${appInfo.shortName}-${appInfo.version}-${sinks.good.getClass.getSimpleName.toLowerCase}""" private val splitBatch: SplitBatch = SplitBatch(appInfo) diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala index a9fa6c69a..0a6082924 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ServiceSpec.scala @@ -242,7 +242,7 @@ class ServiceSpec extends Specification { e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" e.ipAddress shouldEqual "192.0.2.3" e.encoding shouldEqual "UTF-8" - e.collector shouldEqual s"${TestUtils.appName}:${TestUtils.appVersion}" + e.collector shouldEqual s"${TestUtils.appInfo.shortName}-${TestUtils.appVersion}-testsink" e.querystring shouldEqual "a=b" e.body shouldEqual "b" e.path shouldEqual "p" @@ -427,7 +427,7 @@ class ServiceSpec extends Specification { e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" e.ipAddress shouldEqual "ip" e.encoding shouldEqual "UTF-8" - e.collector shouldEqual s"${TestUtils.appName}:${TestUtils.appVersion}" + e.collector shouldEqual s"${TestUtils.appInfo.shortName}-${TestUtils.appVersion}-testsink" e.querystring shouldEqual "q" e.body shouldEqual "b" 
e.path shouldEqual "p" @@ -456,7 +456,7 @@ class ServiceSpec extends Specification { e.schema shouldEqual "iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0" e.ipAddress shouldEqual "ip" e.encoding shouldEqual "UTF-8" - e.collector shouldEqual s"${TestUtils.appName}:${TestUtils.appVersion}" + e.collector shouldEqual s"${TestUtils.appInfo.shortName}-${TestUtils.appVersion}-testsink" e.querystring shouldEqual null e.body shouldEqual null e.path shouldEqual "p" diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala index 3e4695687..184f32bde 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala @@ -17,6 +17,7 @@ object TestUtils { def moduleName = appName def version = appVersion def dockerAlias = "docker run collector" + def shortName = "ssc" } def noopSink[F[_]: Applicative]: Sink[F] = new Sink[F] { diff --git a/project/BuildSettings.scala b/project/BuildSettings.scala index 3edaad285..d1b8e79e6 100644 --- a/project/BuildSettings.scala +++ b/project/BuildSettings.scala @@ -123,7 +123,7 @@ object BuildSettings { addExampleConfToTestCp lazy val buildInfoSettings = Seq( - buildInfoKeys := Seq[BuildInfoKey](name, moduleName, dockerAlias, version), + buildInfoKeys := Seq[BuildInfoKey](name, moduleName, dockerAlias, version, "shortName" -> "ssc"), buildInfoOptions += BuildInfoOption.Traits("com.snowplowanalytics.snowplow.collector.core.AppInfo"), buildInfoPackage := s"com.snowplowanalytics.snowplow.collectors.scalastream" ) From c4d041f0b89f8424146eedc85cdb619e959b0b0a Mon Sep 17 00:00:00 2001 From: Alex Benini Date: Thu, 4 Jan 2024 18:35:14 +0100 Subject: [PATCH 37/39] Remove unused warmup config section We no longer implement specific warmup logic. --- examples/config.kafka.extended.hocon | 12 ------------ examples/config.kinesis.extended.hocon | 12 ------------ examples/config.pubsub.extended.hocon | 12 ------------ examples/config.sqs.extended.hocon | 16 ---------------- examples/config.stdout.extended.hocon | 12 ------------ 5 files changed, 64 deletions(-) diff --git a/examples/config.kafka.extended.hocon b/examples/config.kafka.extended.hocon index 3c03a8f90..a25110c23 100644 --- a/examples/config.kafka.extended.hocon +++ b/examples/config.kafka.extended.hocon @@ -286,16 +286,4 @@ collector { # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds - - experimental { - # Enable an experimental feature to send some "warm-up" requests to the collector's own /health endpoint during startup. - # We have found from experiment this can cut down the number of 502s returned from a load balancer in front of the collector in Kubernetes deployments. 
- # More details in https://github.com/snowplow/stream-collector/issues/249 - warmup { - enable = false - numRequests = 2000 - maxConnections = 2000 - maxCycles = 3 - } - } } diff --git a/examples/config.kinesis.extended.hocon b/examples/config.kinesis.extended.hocon index f64928aa8..167e38459 100644 --- a/examples/config.kinesis.extended.hocon +++ b/examples/config.kinesis.extended.hocon @@ -370,16 +370,4 @@ collector { # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds - - experimental { - # Enable an experimental feature to send some "warm-up" requests to the collector's own /health endpoint during startup. - # We have found from experiment this can cut down the number of 502s returned from a load balancer in front of the collector in Kubernetes deployments. - # More details in https://github.com/snowplow/stream-collector/issues/249 - warmup { - enable = false - numRequests = 2000 - maxConnections = 2000 - maxCycles = 3 - } - } } diff --git a/examples/config.pubsub.extended.hocon b/examples/config.pubsub.extended.hocon index ad0001c8f..75ed92377 100644 --- a/examples/config.pubsub.extended.hocon +++ b/examples/config.pubsub.extended.hocon @@ -302,16 +302,4 @@ collector { # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds - - experimental { - # Enable an experimental feature to send some "warm-up" requests to the collector's own /health endpoint during startup. - # We have found from experiment this can cut down the number of 502s returned from a load balancer in front of the collector in Kubernetes deployments. - # More details in https://github.com/snowplow/stream-collector/issues/249 - warmup { - enable = false - numRequests = 2000 - maxConnections = 2000 - maxCycles = 3 - } - } } \ No newline at end of file diff --git a/examples/config.sqs.extended.hocon b/examples/config.sqs.extended.hocon index 65245ef20..0b6f50184 100644 --- a/examples/config.sqs.extended.hocon +++ b/examples/config.sqs.extended.hocon @@ -113,10 +113,6 @@ collector { name = "n3pc" # Network user id to fallback to when third-party cookies are blocked. fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000" - # Optionally, specify the name of the header containing the originating protocol for use in the - # bounce redirect location. Use this if behind a load balancer that performs SSL termination. - # The value of this header must be http or https. Example, if behind an AWS Classic ELB. - #forwardedProtocolHeader = "X-Forwarded-Proto" } # When enabled, redirect prefix `r/` will be enabled and its query parameters resolved. @@ -297,16 +293,4 @@ collector { # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds - - experimental { - # Enable an experimental feature to send some "warm-up" requests to the collector's own /health endpoint during startup. - # We have found from experiment this can cut down the number of 502s returned from a load balancer in front of the collector in Kubernetes deployments. 
- # More details in https://github.com/snowplow/stream-collector/issues/249 - warmup { - enable = false - numRequests = 2000 - maxConnections = 2000 - maxCycles = 3 - } - } } \ No newline at end of file diff --git a/examples/config.stdout.extended.hocon b/examples/config.stdout.extended.hocon index 851b08d13..86b132a6e 100644 --- a/examples/config.stdout.extended.hocon +++ b/examples/config.stdout.extended.hocon @@ -241,16 +241,4 @@ collector { # The server's deadline for closing connections during graceful shutdown terminationDeadline = 10 seconds - - experimental { - # Enable an experimental feature to send some "warm-up" requests to the collector's own /health endpoint during startup. - # We have found from experiment this can cut down the number of 502s returned from a load balancer in front of the collector in Kubernetes deployments. - # More details in https://github.com/snowplow/stream-collector/issues/249 - warmup { - enable = false - numRequests = 2000 - maxConnections = 2000 - maxCycles = 3 - } - } } \ No newline at end of file From 24c5d5893d3bfadc4d0a92751f3fe8744aa0add9 Mon Sep 17 00:00:00 2001 From: Piotr Limanowski Date: Wed, 3 Jan 2024 11:39:46 +0100 Subject: [PATCH 38/39] Add mandatory SLULA license acceptance flag (close #405) Since introducing a new license we need to explicitly check if the user has accepted the terms. Therefore a new flag is added. By default it is set to false but can be overrided by either: - setting `license.accept = true` in the config file - setting `env ACCEPT_LIMITED_USE_LICENSE=true` - appending `-Dlicense.accept=true` --- .../telemetry/sender_config/config.hocon | 1 + .../telemetry/sender_config/config_disabled.hocon | 1 + .github/workflows/ssc-collector-config/config.hocon | 1 + core/src/main/resources/reference.conf | 5 +++++ .../Config.scala | 13 ++++++++++++- .../Run.scala | 13 +++++++++++++ .../ConfigParserSpec.scala | 3 ++- .../TestUtils.scala | 3 ++- examples/config.kafka.extended.hocon | 5 +++++ examples/config.kafka.minimal.hocon | 4 ++++ examples/config.kinesis.extended.hocon | 5 +++++ examples/config.kinesis.minimal.hocon | 4 ++++ examples/config.nsq.extended.hocon | 5 +++++ examples/config.nsq.minimal.hocon | 7 +++++-- examples/config.pubsub.extended.hocon | 5 +++++ examples/config.pubsub.minimal.hocon | 4 ++++ examples/config.sqs.extended.hocon | 5 +++++ examples/config.sqs.minimal.hocon | 3 +++ examples/config.stdout.extended.hocon | 5 +++++ examples/config.stdout.minimal.hocon | 3 +++ kafka/src/it/resources/collector.hocon | 1 + .../KafkaConfigSpec.scala | 3 ++- .../it/resources/collector-cookie-anonymous.hocon | 1 + .../resources/collector-cookie-attributes-1.hocon | 1 + .../resources/collector-cookie-attributes-2.hocon | 1 + .../src/it/resources/collector-cookie-domain.hocon | 1 + .../it/resources/collector-cookie-fallback.hocon | 1 + .../it/resources/collector-cookie-no-domain.hocon | 1 + .../src/it/resources/collector-custom-paths.hocon | 1 + .../collector-doNotTrackCookie-disabled.hocon | 1 + .../collector-doNotTrackCookie-enabled.hocon | 1 + kinesis/src/it/resources/collector.hocon | 1 + .../sinks/KinesisConfigSpec.scala | 3 ++- .../NsqConfigSpec.scala | 3 ++- pubsub/src/it/resources/collector.hocon | 1 + .../ConfigSpec.scala | 3 ++- .../SqsConfigSpec.scala | 3 ++- 37 files changed, 112 insertions(+), 10 deletions(-) diff --git a/.github/workflows/integration_tests/telemetry/sender_config/config.hocon b/.github/workflows/integration_tests/telemetry/sender_config/config.hocon index 9979af681..96dcc7f10 100644 --- 
a/.github/workflows/integration_tests/telemetry/sender_config/config.hocon +++ b/.github/workflows/integration_tests/telemetry/sender_config/config.hocon @@ -1,5 +1,6 @@ # 'collector' contains configuration options for the main Scala collector. collector { + license { accept = true } # The collector runs as a web service specified on the following interface and port. interface = "0.0.0.0" port = "9292" diff --git a/.github/workflows/integration_tests/telemetry/sender_config/config_disabled.hocon b/.github/workflows/integration_tests/telemetry/sender_config/config_disabled.hocon index 30885a8fc..e3dd731eb 100644 --- a/.github/workflows/integration_tests/telemetry/sender_config/config_disabled.hocon +++ b/.github/workflows/integration_tests/telemetry/sender_config/config_disabled.hocon @@ -1,5 +1,6 @@ # 'collector' contains configuration options for the main Scala collector. collector { + license { accept = true } # The collector runs as a web service specified on the following interface and port. interface = "0.0.0.0" port = "10292" diff --git a/.github/workflows/ssc-collector-config/config.hocon b/.github/workflows/ssc-collector-config/config.hocon index 2137caead..10e6b6df9 100644 --- a/.github/workflows/ssc-collector-config/config.hocon +++ b/.github/workflows/ssc-collector-config/config.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = 0.0.0.0 port = 12345 diff --git a/core/src/main/resources/reference.conf b/core/src/main/resources/reference.conf index 1a91ba19b..be3b75e40 100644 --- a/core/src/main/resources/reference.conf +++ b/core/src/main/resources/reference.conf @@ -1,4 +1,9 @@ { + license { + accept = false + accept = ${?ACCEPT_LIMITED_USE_LICENSE} + } + paths {} p3p { diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala index a9d16e78e..f4f99567a 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Config.scala @@ -39,7 +39,8 @@ case class Config[+SinkConfig]( networking: Config.Networking, enableDefaultRedirect: Boolean, redirectDomains: Set[String], - preTerminationPeriod: FiniteDuration + preTerminationPeriod: FiniteDuration, + license: Config.License ) object Config { @@ -154,7 +155,17 @@ object Config { idleTimeout: FiniteDuration ) + case class License( + accept: Boolean + ) + implicit def decoder[SinkConfig: Decoder]: Decoder[Config[SinkConfig]] = { + implicit val license: Decoder[License] = { + val truthy = Set("true", "yes", "on", "1") + Decoder + .forProduct1("accept")((s: String) => License(truthy(s.toLowerCase()))) + .or(Decoder.forProduct1("accept")((b: Boolean) => License(b))) + } implicit val p3p = deriveDecoder[P3P] implicit val crossDomain = deriveDecoder[CrossDomain] implicit val sameSite: Decoder[SameSite] = Decoder.instance { cur => diff --git a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala index da842f9a8..78da39188 100644 --- a/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala +++ b/core/src/main/scala/com.snowplowanalytics.snowplow.collector.core/Run.scala @@ -58,6 +58,7 @@ object Run { ): F[ExitCode] = { val eitherT = for { config <- ConfigParser.fromPath[F, SinkConfig](path) + _ <- checkLicense(config.license.accept) _ <- EitherT.right[ExitCode](fromConfig(appInfo, 
mkSinks, telemetryInfo, config)) } yield ExitCode.Success @@ -67,6 +68,18 @@ object Run { } } + private def checkLicense[F[_]: Sync](acceptLicense: Boolean): EitherT[F, ExitCode, _] = + EitherT.liftF { + if (acceptLicense) + Sync[F].unit + else + Sync[F].raiseError( + new IllegalStateException( + "Please accept the terms of the Snowplow Limited Use License Agreement to proceed. See https://docs.snowplow.io/docs/pipeline-components-and-applications/stream-collector/configure/#license for more information on the license and how to configure this." + ) + ) + } + private def fromConfig[F[_]: Async: Tracking, SinkConfig]( appInfo: AppInfo, mkSinks: MkSinks[F, SinkConfig], diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala index 310df4365..3f25aeea5 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/ConfigParserSpec.scala @@ -52,7 +52,8 @@ class ConfigParserSpec extends Specification with CatsEffect { .copy[SinkConfig]( paths = Map.empty[String, String], streams = expectedStreams, - ssl = TestUtils.testConfig.ssl.copy(enable = true) + ssl = TestUtils.testConfig.ssl.copy(enable = true), + license = Config.License(false) ) ConfigParser.fromPath[IO, SinkConfig](Some(path)).value.map(_ should beRight(expected)) diff --git a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala index 184f32bde..5db8643cf 100644 --- a/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala +++ b/core/src/test/scala/com.snowplowanalytics.snowplow.collector.core/TestUtils.scala @@ -132,6 +132,7 @@ object TestUtils { moduleVersion = None, instanceId = None, autoGeneratedId = None - ) + ), + license = License(accept = true) ) } diff --git a/examples/config.kafka.extended.hocon b/examples/config.kafka.extended.hocon index a25110c23..217d0a5af 100644 --- a/examples/config.kafka.extended.hocon +++ b/examples/config.kafka.extended.hocon @@ -15,6 +15,11 @@ # 'collector' contains configuration options for the main Scala collector. collector { + # Full license text available in LICENSE.md + license { + accept = true + } + # The collector runs as a web service specified on the following interface and port. interface = "0.0.0.0" port = 8080 diff --git a/examples/config.kafka.minimal.hocon b/examples/config.kafka.minimal.hocon index 1547b5c1e..cc7c5a869 100644 --- a/examples/config.kafka.minimal.hocon +++ b/examples/config.kafka.minimal.hocon @@ -1,4 +1,8 @@ collector { + license { + accept = true + } + interface = "0.0.0.0" port = 8080 diff --git a/examples/config.kinesis.extended.hocon b/examples/config.kinesis.extended.hocon index 167e38459..c2820c129 100644 --- a/examples/config.kinesis.extended.hocon +++ b/examples/config.kinesis.extended.hocon @@ -15,6 +15,11 @@ # 'collector' contains configuration options for the main Scala collector. collector { + # Full license text available in LICENSE.md + license { + accept = true + } + # The collector runs as a web service specified on the following interface and port. 
interface = "0.0.0.0" port = 8080 diff --git a/examples/config.kinesis.minimal.hocon b/examples/config.kinesis.minimal.hocon index 9501390a5..30ee8f174 100644 --- a/examples/config.kinesis.minimal.hocon +++ b/examples/config.kinesis.minimal.hocon @@ -1,4 +1,8 @@ collector { + license { + accept = true + } + interface = "0.0.0.0" port = 8080 diff --git a/examples/config.nsq.extended.hocon b/examples/config.nsq.extended.hocon index 3bb4f0b49..804a6d2f5 100644 --- a/examples/config.nsq.extended.hocon +++ b/examples/config.nsq.extended.hocon @@ -15,6 +15,11 @@ # 'collector' contains configuration options for the main Scala collector. collector { + # Full license text available in LICENSE.md + license { + accept = true + } + # The collector runs as a web service specified on the following interface and port. interface = "0.0.0.0" port = 8080 diff --git a/examples/config.nsq.minimal.hocon b/examples/config.nsq.minimal.hocon index 2b7afa7ca..f3cafd6dc 100644 --- a/examples/config.nsq.minimal.hocon +++ b/examples/config.nsq.minimal.hocon @@ -1,4 +1,7 @@ collector { + license { + accept = true + } interface = "0.0.0.0" port = 8080 @@ -6,8 +9,8 @@ collector { good { name = "good" host = "nsqHost" - } - + } + bad { name = "bad" host = "nsqHost" diff --git a/examples/config.pubsub.extended.hocon b/examples/config.pubsub.extended.hocon index 75ed92377..3b24f30d5 100644 --- a/examples/config.pubsub.extended.hocon +++ b/examples/config.pubsub.extended.hocon @@ -15,6 +15,11 @@ # 'collector' contains configuration options for the main Scala collector. collector { + # Full license text available in LICENSE.md + license { + accept = true + } + # The collector runs as a web service specified on the following interface and port. interface = "0.0.0.0" port = 8080 diff --git a/examples/config.pubsub.minimal.hocon b/examples/config.pubsub.minimal.hocon index b6fdb8d05..fec7ef8a3 100644 --- a/examples/config.pubsub.minimal.hocon +++ b/examples/config.pubsub.minimal.hocon @@ -1,4 +1,8 @@ collector { + license { + accept = true + } + interface = "0.0.0.0" port = 8080 diff --git a/examples/config.sqs.extended.hocon b/examples/config.sqs.extended.hocon index 0b6f50184..0ebfbce0a 100644 --- a/examples/config.sqs.extended.hocon +++ b/examples/config.sqs.extended.hocon @@ -10,6 +10,11 @@ # 'collector' contains configuration options for the main Scala collector. collector { + # Full license text available in LICENSE.md + license { + accept = true + } + # The collector runs as a web service specified on the following interface and port. interface = "0.0.0.0" port = 8080 diff --git a/examples/config.sqs.minimal.hocon b/examples/config.sqs.minimal.hocon index 9501390a5..137baa0af 100644 --- a/examples/config.sqs.minimal.hocon +++ b/examples/config.sqs.minimal.hocon @@ -1,4 +1,7 @@ collector { + license { + accept = true + } interface = "0.0.0.0" port = 8080 diff --git a/examples/config.stdout.extended.hocon b/examples/config.stdout.extended.hocon index 86b132a6e..a29437811 100644 --- a/examples/config.stdout.extended.hocon +++ b/examples/config.stdout.extended.hocon @@ -15,6 +15,11 @@ # 'collector' contains configuration options for the main Scala collector. collector { + # Full license text available in LICENSE.md + license { + accept = true + } + # The collector runs as a web service specified on the following interface and port. 
interface = "0.0.0.0" port = 8080 diff --git a/examples/config.stdout.minimal.hocon b/examples/config.stdout.minimal.hocon index 3b2e212d6..4b318e039 100644 --- a/examples/config.stdout.minimal.hocon +++ b/examples/config.stdout.minimal.hocon @@ -1,4 +1,7 @@ collector { + license { + accept = true + } interface = "0.0.0.0" port = 8080 diff --git a/kafka/src/it/resources/collector.hocon b/kafka/src/it/resources/collector.hocon index 2468a977b..afdf83333 100644 --- a/kafka/src/it/resources/collector.hocon +++ b/kafka/src/it/resources/collector.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala index 75d056060..6dc8ea4db 100644 --- a/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala +++ b/kafka/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/KafkaConfigSpec.scala @@ -158,6 +158,7 @@ object KafkaConfigSpec { networking = Config.Networking( maxConnections = 1024, idleTimeout = 610.seconds - ) + ), + license = Config.License(accept = true) ) } diff --git a/kinesis/src/it/resources/collector-cookie-anonymous.hocon b/kinesis/src/it/resources/collector-cookie-anonymous.hocon index 14f4ed802..41edde52f 100644 --- a/kinesis/src/it/resources/collector-cookie-anonymous.hocon +++ b/kinesis/src/it/resources/collector-cookie-anonymous.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/it/resources/collector-cookie-attributes-1.hocon b/kinesis/src/it/resources/collector-cookie-attributes-1.hocon index e661116da..d67d6695c 100644 --- a/kinesis/src/it/resources/collector-cookie-attributes-1.hocon +++ b/kinesis/src/it/resources/collector-cookie-attributes-1.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/it/resources/collector-cookie-attributes-2.hocon b/kinesis/src/it/resources/collector-cookie-attributes-2.hocon index 14f4ed802..41edde52f 100644 --- a/kinesis/src/it/resources/collector-cookie-attributes-2.hocon +++ b/kinesis/src/it/resources/collector-cookie-attributes-2.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/it/resources/collector-cookie-domain.hocon b/kinesis/src/it/resources/collector-cookie-domain.hocon index 4a7eaee7c..cc4a991ae 100644 --- a/kinesis/src/it/resources/collector-cookie-domain.hocon +++ b/kinesis/src/it/resources/collector-cookie-domain.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/it/resources/collector-cookie-fallback.hocon b/kinesis/src/it/resources/collector-cookie-fallback.hocon index 8c9c874f6..be55026e2 100644 --- a/kinesis/src/it/resources/collector-cookie-fallback.hocon +++ b/kinesis/src/it/resources/collector-cookie-fallback.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/it/resources/collector-cookie-no-domain.hocon b/kinesis/src/it/resources/collector-cookie-no-domain.hocon index 14f4ed802..41edde52f 100644 --- a/kinesis/src/it/resources/collector-cookie-no-domain.hocon +++ b/kinesis/src/it/resources/collector-cookie-no-domain.hocon @@ -1,4 +1,5 @@ collector { + license { accept 
= true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/it/resources/collector-custom-paths.hocon b/kinesis/src/it/resources/collector-custom-paths.hocon index a39c6d87d..0d291144a 100644 --- a/kinesis/src/it/resources/collector-custom-paths.hocon +++ b/kinesis/src/it/resources/collector-custom-paths.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon b/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon index 6f6f54155..36c210b16 100644 --- a/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon +++ b/kinesis/src/it/resources/collector-doNotTrackCookie-disabled.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon b/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon index 0604641ae..9873cb528 100644 --- a/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon +++ b/kinesis/src/it/resources/collector-doNotTrackCookie-enabled.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/it/resources/collector.hocon b/kinesis/src/it/resources/collector.hocon index 0183b1258..7feb06585 100644 --- a/kinesis/src/it/resources/collector.hocon +++ b/kinesis/src/it/resources/collector.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala index aa365da64..ce5eec188 100644 --- a/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala +++ b/kinesis/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisConfigSpec.scala @@ -186,7 +186,8 @@ object KinesisConfigSpec { moduleVersion = None, instanceId = None, autoGeneratedId = None - ) + ), + license = Config.License(accept = true) ) } diff --git a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala index be9fefa37..8db7091ae 100644 --- a/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala +++ b/nsq/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/NsqConfigSpec.scala @@ -157,6 +157,7 @@ object NsqConfigSpec { networking = Config.Networking( maxConnections = 1024, idleTimeout = 610.seconds - ) + ), + license = Config.License(accept = true) ) } diff --git a/pubsub/src/it/resources/collector.hocon b/pubsub/src/it/resources/collector.hocon index d964fbe56..08533efbd 100644 --- a/pubsub/src/it/resources/collector.hocon +++ b/pubsub/src/it/resources/collector.hocon @@ -1,4 +1,5 @@ collector { + license { accept = true } interface = "0.0.0.0" port = ${PORT} diff --git a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala index 1dcebe092..ab54e0175 100644 --- a/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala +++ 
+++ b/pubsub/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/ConfigSpec.scala
@@ -177,7 +177,8 @@ object ConfigSpec {
       moduleVersion = None,
       instanceId = None,
       autoGeneratedId = None
-    )
+    ),
+    license = Config.License(accept = true)
   )
 }
diff --git a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala
index 77e301bdd..e71089b2e 100644
--- a/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala
+++ b/sqs/src/test/scala/com.snowplowanalytics.snowplow.collectors.scalastream/SqsConfigSpec.scala
@@ -166,7 +166,8 @@ object SqsConfigSpec {
       moduleVersion = None,
       instanceId = None,
       autoGeneratedId = None
-    )
+    ),
+    license = Config.License(accept = true)
   )
 }

From 56d5a7ff066c2cb583c72bae14940d535d9eb454 Mon Sep 17 00:00:00 2001
From: Alex Benini
Date: Fri, 5 Jan 2024 17:58:35 +0100
Subject: [PATCH 39/39] Prepare for 3.0.0 release

---
 CHANGELOG | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/CHANGELOG b/CHANGELOG
index 889991f90..8c205fec1 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,44 @@
+Release 3.0.0 (2024-01-08)
+--------------------------
+Add mandatory SLULA license acceptance flag (close #405)
+Remove unused warmup config section
+Use shortname for collector name (close #403)
+Add statsd metrics reporting (close #404)
+Add support for Do Not Track cookie (close #400)
+Add crossdomain.xml support (close #399)
+Add http root response (close #397)
+Deploy 2.13 scala assets to GH on CI (close #392)
+Use correct sqs buffer queue name with Kinesis bad sink (close #393)
+Sbt project modernization (close #361)
+Update the Pubsub UserAgent format (close #362)
+Add separate good/bad sink configurations (close #388)
+Add Kafka sink healthcheck (close #387)
+Make maxConnections and idleTimeout configurable (close #386)
+Add support for handling /robots.txt (close #385)
+Set installation id (close #384)
+Set maxBytes in the NsqSink (close #383)
+Add http4s Kafka support (close #382)
+Add http4s NSQ support (close #348)
+Add telemetry support (close #381)
+Use Blaze as default http4s backend (close #380)
+Add http4s SQS sink (close #378)
+Add http4s Kinesis sink (close #379)
+Add iglu routes spec (close #377)
+Add http4s PubSub sink (close #376)
+Add http4s SSL support (close #374)
+Add http4s redirect support (close #373)
+Load config (close #326)
+Add http4s anonymous tracking (close #372)
+Add http4s CORS support (close #371)
+Add http4s pixel endpoint (close #370)
+Add http4s GET and HEAD endpoints (close #369)
+Configure set-cookie header (close #368)
+Add test for the stdout sink (close #367)
+Add http4s POST endpoint (close #366)
+Add http4s graceful shutdown (close #365)
+Add http4s module (close #364)
+Add Snowplow Limited Use License (close #346)
+
 Release 2.10.0 (2023-11-08)
 --------------------------
 Update the Pubsub UserAgent format (#362)