
Commit

updates
ericdmoore committed Mar 25, 2024
1 parent 2bc8d94 commit 0862155
Showing 27 changed files with 852 additions and 44 deletions.
6 changes: 6 additions & 0 deletions deno.json
Original file line number Diff line number Diff line change
@@ -48,6 +48,7 @@
"@aws-sdk/client-polly": "https://esm.sh/@aws-sdk/[email protected]?deno-std=0.181.0&dts",
"@aws-sdk/client-s3/": "https://esm.sh/@aws-sdk/[email protected]/",
"@aws-sdk/client-s3": "https://esm.sh/@aws-sdk/[email protected]?deno-std=0.181.0&dts",
"@aws-sdk/client-glacier": "https://esm.sh/@aws-sdk/[email protected]?deno-std=0.181.0&dts",
"@aws-sdk/util-dynamodb": "https://esm.sh/@aws-sdk/[email protected]?deno-std=0.172.0&dts",
"@aws-sdk/signature-v4": "https://esm.sh/@aws-sdk/[email protected]?deno-std=0.181.0&dts",
"@aws-sdk/protocol-http": "https://esm.sh/@aws-sdk/[email protected]?deno-std=0.181.0&dts",
@@ -69,6 +70,7 @@
"deno_dom": "https://deno.land/x/[email protected]/deno-dom-wasm.ts",
"fluentSchema": "https://esm.sh/[email protected]",
"fromXml": "https://deno.land/x/[email protected]/mod.ts",
"snappy": "https://esm.sh/[email protected]?deno-std=0.181.0&dts",
"gzip_wasm": "https://deno.land/x/[email protected]/mod.ts",
"gfm": "https://deno.land/x/[email protected]/mod.ts",
"imurmurhash": "https://esm.sh/[email protected]?deno-std=0.181.0&dts",
@@ -90,6 +92,10 @@
"mu-forms/": "https://esm.sh/[email protected]/",
"multiformats": "https://esm.sh/[email protected]?deno-std=0.181.0&dts",
"mustache": "https://deno.land/x/[email protected]/mod.ts",

"apache-arrow":"https://esm.sh/[email protected]",
"parquet-wasm/":"https://esm.sh/[email protected]/",

"superstruct": "https://deno.land/x/[email protected]/mod.ts",
"stripe": "https://esm.sh/[email protected]?deno-std=0.181.0&dts",
"toXml": "https://deno.land/x/[email protected]/mod.ts",
4 changes: 4 additions & 0 deletions deno.lock

Some generated files are not rendered by default.

36 changes: 30 additions & 6 deletions lib/clients/cache.ts
@@ -43,13 +43,21 @@
* - Scheduled
* - ML based
* 1. Ensemble Approach
*
 * __Consider adding an AWS Glacier layer.__
 * As the promotion and eviction logic is built out, note that most cloud
 * systems charge for promotions out of cold storage, so that cost is worth
 * surfacing in the client. The higher the cloud cost of re-promotion, the
 * more certain we should be that an entry is worth evicting, especially
 * when the re-promotion cost exceeds the cost of doing nothing.
*/
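The promotion/eviction cost reasoning above can be sketched as a pure decision helper. The type and field names here (`EvictionCandidate`, `estimatedRepromotionCost`, `doNothingCost`) are hypothetical and not part of the cache client:

```typescript
// Hypothetical sketch: weigh eviction against the cost of pulling the entry
// back out of cold storage later. Names are illustrative, not part of the API.
interface EvictionCandidate {
  key: string;
  estimatedRepromotionCost: number; // cost to re-promote from cold storage
  doNothingCost: number; // cost to keep the entry where it is
}

/** Evict only when keeping the entry costs more than re-promoting it later. */
export function shouldEvict(c: EvictionCandidate): boolean {
  return c.doNothingCost > c.estimatedRepromotionCost;
}
```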

//#region imports
import { type PromiseOr } from "$lib/types.ts";
import { LRUCache } from "lru-cache";
import changeEncOf from "$lib/utils/enocdings.ts";
import { makeBytes } from "./cacheProviders/encoders/mod.ts";
import changeEncOf from "../utils/encodings.ts";
import { makeBytes } from "./cacheProviders/recoders/mod.ts";

//#endregion imports

@@ -104,7 +112,10 @@ export interface TransformFunctionGroup<NativeDataType = Uint8Array> {
}
export interface ICacheProvider<NativeDataType = Uint8Array> {
provider: string;
meta: Record<string, unknown>;
meta: {
size: () => number;
[name: string]: unknown;
};
transforms: TransformFunctionGroup<NativeDataType>;
set: (name: string, data: NativeDataType | string) => Promise<ICacheDataFromProvider<NativeDataType>>;
get: (name: string) => Promise<NullableProviderData<NativeDataType>>;
@@ -240,19 +251,32 @@ export const bytestoJsonWithTypeNote: TransformToBytes = (input: unknown | Uint8Array
});
};

export const defaultToBytesWithTypeNote: TransformToBytes = (input: unknown | Uint8Array) => {
export const defaultToBytesWithTypeNote = (input: Uint8Array | string | unknown) => {
const enc = new TextEncoder();
return input instanceof Uint8Array
? Promise.resolve({
data: changeEncOf(input).from("utf8").to("base64url").array(),
"content-type": "Uint8Array",
"content-encoding": "base64url;id",
})
} as ValueForCacheInternals)
: Promise.resolve({
data: enc.encode(JSON.stringify(input)),
"content-type": "string",
"content-encoding": "id",
});
} as ValueForCacheInternals);
};

export const defaultFromBytesWithNote: TransformFromBytes = (retrieved?: ICacheableDataForCache) => {
if (!retrieved?.value.data) {
return null;
} else if (retrieved.value["content-type"] !== "Uint8Array") {
const dec = new TextDecoder();
return typeof retrieved.value.data === "string"
? JSON.parse(retrieved.value.data)
: JSON.parse(dec.decode(retrieved.value.data));
} else {
return retrieved.value.data;
}
};
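A simplified, self-contained sketch of the to/from-bytes pair above. It omits `changeEncOf`, the base64url re-encoding, and the `ValueForCacheInternals` typing, keeping only the content-type tagging and the round trip:

```typescript
// Simplified stand-in for defaultToBytesWithTypeNote / defaultFromBytesWithNote:
// Uint8Array inputs pass through with a type note; everything else is JSON-encoded.
const enc = new TextEncoder();
const dec = new TextDecoder();

export const toBytes = (input: unknown) =>
  input instanceof Uint8Array
    ? { data: input, "content-type": "Uint8Array", "content-encoding": "id" }
    : {
      data: enc.encode(JSON.stringify(input)),
      "content-type": "string",
      "content-encoding": "id",
    };

export const fromBytes = (value: ReturnType<typeof toBytes>) =>
  value["content-type"] === "Uint8Array"
    ? value.data
    : JSON.parse(dec.decode(value.data));
```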

export const makeKey = async (name: string, renamer: RenamerFn) => ({ name, renamed: await renamer(name) }) as CacheKey;
122 changes: 122 additions & 0 deletions lib/clients/cacheProviders/aws-glacier.ts
@@ -0,0 +1,122 @@
/**
 * Due to the nature of how the Glacier API works,
 * we need to use a DynamoDB table to store the metadata of the vaults.
 * This helps retain the required information so that
 * vault retrieval is available at some later time.
 *
 * Determine if this is actually helpful...
 * The better approach might be to just ensure that
 * lifecycle configs exist on buckets to use the Tiered Storage Classes,
 * so that S3 manages what is moved up and down within the storage classes.
 *
 * Since Glacier is organized around "Running JOBs" (and their status, and completions...),
 * this feels like a sufficiently different way to manage the data
 * than using it within another tiered cache.
 */

// @ref: https://www.npmjs.com/package/@aws-sdk/client-glacier
import {
GlacierClient,
ListVaultsCommand,
UploadArchiveCommand,
type UploadArchiveInput,
} from "@aws-sdk/client-glacier";

import {
DeleteItemCommand,
DeleteItemCommandOutput,
DescribeTableCommand,
DescribeTableCommandOutput,
DescribeTimeToLiveCommand,
DescribeTimeToLiveCommandOutput,
DynamoDBClient,
GetItemCommand,
PutItemCommand,
QueryCommand,
type QueryCommandInput,
QueryCommandOutput,
ScanCommand,
type ScanCommandInput,
ScanCommandOutput,
} from "@aws-sdk/client-dynamodb";

import {
type CacheName,
defaultFromBytes,
defaultRenamer,
defaultToBytesWithTypeNote,
type ICacheableDataForCache,
type ICacheDataFromProvider,
type ICacheProvider,
type NullableProviderData,
type TransformFromBytes,
type TransformToBytes,
} from "../cache.ts";

/**
*
AbortMultipartUpload
AbortVaultLock
AddTagsToVault
CompleteMultipartUpload
CompleteVaultLock
CreateVault
  - Names can be between 1 and 255 characters long.
  - Allowed characters are a-z, A-Z, 0-9, '_' (underscore), '-' (hyphen), and '.' (period).
  - Limit of 1,000 vaults per account.
  Creates a Vault Location - used in downstream calls
DeleteArchive
DeleteVault
DeleteVaultAccessPolicy
DeleteVaultNotifications
DescribeJob
DescribeVault
GetDataRetrievalPolicy
GetJobOutput
GetVaultAccessPolicy
GetVaultLock
GetVaultNotifications
InitiateJob
  types: "select", "archive-retrieval", and "inventory-retrieval"
  - why not just instantiate each job type with its own command/function?
InitiateMultipartUpload
InitiateVaultLock
ListJobs
ListMultipartUploads
ListParts
ListProvisionedCapacity
ListTagsForVault
ListVaults
PurchaseProvisionedCapacity
RemoveTagsFromVault
SetDataRetrievalPolicy
SetVaultAccessPolicy
SetVaultNotifications
UploadArchive
UploadMultipartPart
*/
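The CreateVault naming rules quoted above can be captured as a small validator. This helper is illustrative and not an export of the module:

```typescript
// Sketch of the CreateVault naming constraints listed above:
// 1-255 characters, drawn from a-z, A-Z, 0-9, '_', '-', and '.'.
export function isValidVaultName(name: string): boolean {
  return /^[A-Za-z0-9_.-]{1,255}$/.test(name);
}
```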

export interface GlacierCache extends ICacheProvider<Uint8Array> {
provider: string;
meta: {
size: () => number;
cloud: "AWS:Glacier";
service: "Glacier";
region: string;
describeTable: () => Promise<DescribeTableCommandOutput>;
describeTTL: () => Promise<DescribeTimeToLiveCommandOutput>;
deleteItem: (name: string) => Promise<DeleteItemCommandOutput>;
scan: (input: Omit<ScanCommandInput, "TableName">) => Promise<ScanCommandOutput>;
query: (input: Omit<QueryCommandInput, "TableName">) => Promise<QueryCommandOutput>;
};
transforms: {
renamer(originalName: string): Promise<CacheName>;
toBytes: TransformToBytes;
fromBytes: TransformFromBytes;
};
set: (name: string, data: Uint8Array | string) => Promise<ICacheDataFromProvider>;
get: (name: string) => Promise<NullableProviderData<Uint8Array>>;
peek: (name: string) => Promise<NullableProviderData<Uint8Array>>;
del: (name: string) => Promise<NullableProviderData<Uint8Array>>;
has: (name: string) => Promise<boolean>;
}
2 changes: 2 additions & 0 deletions lib/clients/cacheProviders/c2.ts
@@ -0,0 +1,2 @@
// https://c2.synology.com/en-us/pricing/object-storage
//
27 changes: 16 additions & 11 deletions lib/clients/cacheProviders/dynamo.ts
@@ -41,6 +41,7 @@ export interface IDynamoCacheConfig {
export interface DynamoCache extends ICacheProvider<Uint8Array> {
provider: string;
meta: {
size: () => number;
cloud: "AWS:Dynamo";
service: "Dynamo";
region: string;
@@ -77,11 +78,13 @@ export const cache = async (
secretAccessKey: dyn.secret,
},
});
let size = 0;
const provider = "AWS:Dynamo";
const meta = {
cloud: "AWS:Dynamo" as const,
service: "Dynamo" as const,
region: dyn.region,
size: () => Object.freeze(size),
deleteItem: async (name: string) =>
dync.send(
new DeleteItemCommand({
@@ -122,7 +125,7 @@
console.error("cache.ts:412", e);
return payload;
});

size++;
return {
...payload,
value: {
@@ -164,16 +167,18 @@
TableName: dyn.table,
Key: marshall({ pk: renamed, sk: renamed }),
}),
).then(() => ({
meta,
provider,
key: { name, renamed },
value: {
data: new Uint8Array(),
transformed: new Uint8Array(),
},
} as ICacheDataFromProvider))
.catch(() => null);
).then(() => {
size--;
return {
meta,
provider,
key: { name, renamed },
value: {
data: new Uint8Array(),
transformed: new Uint8Array(),
},
} as ICacheDataFromProvider;
}).catch(() => null);
};

return {
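The size bookkeeping added in this diff (a closure-held counter incremented on `set` and decremented on successful `del`, exposed read-only through `meta.size()`) can be sketched with an in-memory stand-in:

```typescript
// Minimal in-memory sketch of the counter pattern used in the Dynamo provider;
// the Map stands in for the DynamoDB table.
export function makeCountingCache() {
  let size = 0;
  const store = new Map<string, Uint8Array>();
  return {
    meta: { size: () => size },
    set(name: string, data: Uint8Array) {
      if (!store.has(name)) size++; // only new keys grow the count
      store.set(name, data);
    },
    del(name: string) {
      if (store.delete(name)) size--; // only successful deletes shrink it
    },
  };
}
```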
@@ -1,5 +1,5 @@
// import {assert} from '$std/testing/asserts.ts'
import { PromiseOr } from "$lib/types.ts";
import { type PromiseOr } from "$lib/types.ts";
import { type EncModule, type EncModuleRet, makeBytes, makeString } from "./mod.ts";
import { type ValueForCacheInternals } from "../../cache.ts";

File renamed without changes.
@@ -42,8 +42,10 @@ export const gzip: EncModule = async (compressThreshold = 512) => {
const contentEncoding = value["content-encoding"].split(";");
const encoding = contentEncoding.shift();

if (encoding !== "base64url") {
return Promise.reject(new Error("base64url encoding not found"));
// @todo confirm this is the intended encoding check
if (encoding !== "gzip") {
// assert(encoding === "gzip");
return Promise.reject(new Error("gzip encoding not found"));
}

return Promise.resolve({
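The encoding check above can be sketched in isolation: the `content-encoding` value is a `;`-joined chain whose first segment names the codec, and the gzip decoder rejects anything else. These helpers are illustrative, not exports of the module:

```typescript
// The content-encoding header is a ';'-joined chain, e.g. "gzip;id";
// the first segment names the outermost codec.
export function firstEncoding(contentEncoding: string): string {
  return contentEncoding.split(";")[0];
}

/** Throws when the first encoding segment is not "gzip". */
export function assertGzip(contentEncoding: string): void {
  if (firstEncoding(contentEncoding) !== "gzip") {
    throw new Error("gzip encoding not found");
  }
}
```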
@@ -4,9 +4,10 @@ import { type ValueForCacheInternals } from "../../cache.ts";
import { br } from "./br.ts";
import { zstd } from "./zstd.ts";
import { gzip } from "./gzip.ts";
import { snappy } from "./snappy.ts";
import { base64url } from "./b64url.ts";

export type AvailableEncodings = "id" | "base64url" | "hex" | "utf8" | "br" | "gzip" | "zstd";
export type AvailableEncodings = "id" | "base64url" | "hex" | "utf8" | "br" | "gzip" | "snappy" | "zstd";

export type EncModule = (...i: unknown[]) => Promise<EncModuleRet>;

@@ -69,6 +70,7 @@ export const id: EncModule = () => {
export const encoderMap = async (compressThreshold = 512) => ({
id: await id(),
br: await br(compressThreshold),
snappy: await snappy(compressThreshold),
gzip: await gzip(compressThreshold),
zstd: await zstd(compressThreshold),
base64url: await base64url(),
@@ -101,4 +103,4 @@ export const encodingWith = async (encodingmap?: PromiseOr<Record<string, EncMod
return { encode, decode };
};

export default { id, br, zstd, gzip, base64url, encoderMap, encodingWith };
export default { id, br, zstd, gzip, snappy, base64url, encoderMap, encodingWith };
32 changes: 32 additions & 0 deletions lib/clients/cacheProviders/recoders/readme.md
@@ -0,0 +1,32 @@
Summary
=======

## BR

Brotli usually achieves the best compression ratio for JSON, readme, and HTML files, but at the cost of speed. It is
generally fast enough, yet can choke on very large payloads.

## Snappy

Lives up to the name: it is consistently the fastest operation, but starts to choke on payloads around `4e7`
characters, as shown in the `recoders.bench.ts` file.

## GZIP / GZlib

Sturdiest, longest-lived gold standard

## ZSTD

Newer and well balanced: very fast with a very good compression ratio, though its binary format is not compatible with gzip.

# External Considerations

- For AWS S3 Select, you can [compress Parquet columns][aws-s3-select-parquet] within the object using `Snappy` or `GZIP`, or you can read data subsets stored in CSV or JSON record files.
- Also for AWS S3 Select, you can [compress a CSV or JSON file][aws-s3-select-doc] using `GZIP` or `BZIP2`.


<!-- Ref Links -->

[aws-s3-select-parquet]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/selecting-content-from-objects.html#selecting-content-from-objects-requirements-and-limits
[aws-s3-select-doc]: https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#API_SelectObjectContent
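One rough way to measure the gzip side of these tradeoffs is with the web-standard `CompressionStream`, available in Deno and modern browsers. This sketch covers gzip only, since brotli, snappy, and zstd come from the libraries imported in `mod.ts`:

```typescript
// Compress a string with the built-in CompressionStream and report the
// compressed byte length; useful for quick ratio comparisons.
export async function gzipSize(text: string): Promise<number> {
  const input = new TextEncoder().encode(text);
  const stream = new Blob([input]).stream().pipeThrough(
    new CompressionStream("gzip"),
  );
  const compressed = new Uint8Array(await new Response(stream).arrayBuffer());
  return compressed.length;
}
```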

1 comment on commit 0862155


@deno-deploy deno-deploy bot commented on 0862155 Mar 25, 2024


Failed to deploy:

BOOT_FAILURE

Uncaught SyntaxError: The requested module 'https://deno.land/std/encoding/base64.ts' does not provide an export named 'decode'
    at https://deno.land/x/[email protected]/deno/zstd.ts:1:10
