Implement new string functions
Rangi42 committed Feb 12, 2025
1 parent 48412e9 commit 3e0b619
Showing 27 changed files with 584 additions and 134 deletions.
2 changes: 2 additions & 0 deletions include/asm/charmap.hpp
@@ -21,7 +21,9 @@ void charmap_Pop();
void charmap_CheckStack();
void charmap_Add(std::string const &mapping, std::vector<int32_t> &&value);
bool charmap_HasChar(std::string const &mapping);
size_t charmap_CharSize(std::string const &mapping);
std::vector<int32_t> charmap_Convert(std::string const &input);
size_t charmap_ConvertNext(std::string_view &input, std::vector<int32_t> *output);
std::string charmap_Reverse(std::vector<int32_t> const &value, bool &unique);

#endif // RGBDS_ASM_CHARMAP_HPP
41 changes: 31 additions & 10 deletions man/rgbasm.5
@@ -564,22 +564,17 @@ is equivalent to the regular string
(Note that this prevents raw strings from including the double quote character.)
Raw strings also may be contained in triple quotes for them to be multi-line, so they can include literal newline or quote characters (although still not three quotes in a row).
.Pp
The following functions operate on string expressions.
Most of them return a string, however some of these functions actually return an integer and can be used as part of an integer expression!
.Bl -column "STRSUB(str, pos, len)"
The following functions operate on string expressions, and return strings themselves.
.Bl -column "STRSLICE(str, start, stop)"
.It Sy Name Ta Sy Operation
.It Fn STRLEN str Ta Returns the number of characters in Ar str .
.It Fn STRCAT strs... Ta Concatenates Ar strs .
.It Fn STRCMP str1 str2 Ta Returns -1 if Ar str1 No is alphabetically lower than Ar str2 No , zero if they match, 1 if Ar str1 No is greater than Ar str2 .
.It Fn STRIN str1 str2 Ta Returns the first position of Ar str2 No in Ar str1 No or zero if it's not present Pq first character is position 1 .
.It Fn STRRIN str1 str2 Ta Returns the last position of Ar str2 No in Ar str1 No or zero if it's not present Pq first character is position 1 .
.It Fn STRSUB str pos len Ta Returns a substring from Ar str No starting at Ar pos No (first character is position 1, last is position -1) and Ar len No characters long. If Ar len No is not specified the substring continues to the end of Ar str .
.It Fn STRUPR str Ta Returns Ar str No with all ASCII letters
.Pq Ql a-z
in uppercase.
.It Fn STRLWR str Ta Returns Ar str No with all ASCII letters
.Pq Ql A-Z
in lowercase.
.It Fn STRSLICE str start stop Ta Returns a substring of Ar str No starting at Ar start No and ending at Ar stop No (exclusive). If Ar stop No is not specified, the substring continues to the end of Ar str Ns .
.It Fn STRRPL str old new Ta Returns Ar str No with each non-overlapping occurrence of the substring Ar old No replaced with Ar new .
.It Fn STRFMT fmt args... Ta Returns the string Ar fmt No with each
.Ql %spec
@@ -589,9 +584,35 @@ pattern replaced by interpolating the format
with its corresponding argument in
.Ar args
.Pq So %% Sc is replaced by the So % Sc character .
.It Fn INCHARMAP str Ta Returns 1 if Ar str No has an entry in the current charmap, and 0 otherwise .
.It Fn STRCHAR str idx Ta Returns the substring of Ar str No for the charmap entry at Ar idx No with the current charmap . Pq Ar idx No counts charmap entries, not characters.
.It Fn REVCHAR vals... Ta Returns the string that is mapped to Ar vals No with the current charmap. If there is no unique charmap entry for Ar vals Ns , an error occurs.
.El
.Pp
The following functions operate on string expressions, but return integers.
.Bl -column "STRRFIND(str, sub)"
.It Sy Name Ta Sy Operation
.It Fn STRLEN str Ta Returns the number of characters in Ar str .
.It Fn STRCMP str1 str2 Ta Compares Ar str1 No and Ar str2 No according to ASCII ordering of their characters. Returns -1 if Ar str1 No is lower than Ar str2 Ns , 1 if Ar str1 No is greater than Ar str2 Ns , or 0 if they match.
.It Fn STRFIND str sub Ta Returns the first index of Ar sub No in Ar str Ns , or -1 if it's not present.
.It Fn STRRFIND str sub Ta Returns the last index of Ar sub No in Ar str Ns , or -1 if it's not present.
.It Fn INCHARMAP str Ta Returns 1 if Ar str No has an entry in the current charmap, or 0 otherwise .
.It Fn CHARLEN str Ta Returns the number of charmap entries in Ar str No with the current charmap .
.It Fn CHARSUB str pos Ta Returns the substring for the charmap entry at Ar pos No in Ar str No (first character is position 1, last is position -1) with the current charmap .
.It Fn CHARCMP str1 str2 Ta Compares Ar str1 No and Ar str2 No according to their charmap entry values with the current charmap. Returns -1 if Ar str1 No is lower than Ar str2 Ns , 1 if Ar str1 No is greater than Ar str2 Ns , or 0 if they match.
.It Fn CHARSIZE char Ta Returns how many values are in the charmap entry for Ar char No with the current charmap.
.El
.Pp
Note that the first character of a string is at index 0, and the last is at index -1.
.Pp
The following legacy functions are similar to other functions that operate on string expressions, but for historical reasons, they count characters starting from
.Em position 1 ,
not from index 0!
(Position -1 still counts from the last character.)
.Bl -column "STRSUB(str, pos, len)"
.It Sy Name Ta Sy Operation
.It Fn STRSUB str pos len Ta Returns a substring of Ar str No starting at Ar pos No and Ar len No characters long. If Ar len No is not specified, the substring continues to the end of Ar str No .
.It Fn STRIN str sub Ta Returns the first position of Ar sub No in Ar str Ns , or 0 if it's not present.
.It Fn STRRIN str sub Ta Returns the last position of Ar sub No in Ar str Ns , or 0 if it's not present.
.It Fn CHARSUB str pos Ta Returns the substring of Ar str No for the charmap entry at Ar pos No with the current charmap . Pq Ar pos No counts charmap entries, not characters.
.El
.Ss Character maps
When writing text strings that are meant to be displayed on the Game Boy, the character encoding in the ROM may need to be different than the source file encoding.
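The manual text above distinguishes 0-based indexes (used by STRSLICE, STRFIND, and STRRFIND) from legacy 1-based positions (used by STRSUB, STRIN, and STRRIN), with -1 meaning the last character in both schemes. The following is a minimal sketch of that distinction, not the code from this commit. The helper names `indexToOffset`, `positionToOffset`, `slice`, and `legacySub` are hypothetical, strings are treated as single-byte characters, and the clamping of out-of-range values (including treating position 0 like position 1) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: convert a 0-based index into a byte offset.
// Negative indexes count from the end (-1 is the last character).
// Assumption: out-of-range values are clamped to [0, len].
size_t indexToOffset(int32_t index, size_t len) {
	int64_t signedLen = static_cast<int64_t>(len);
	int64_t offset = index < 0 ? signedLen + index : index;
	return static_cast<size_t>(std::clamp<int64_t>(offset, 0, signedLen));
}

// Hypothetical helper: convert a legacy 1-based position into a byte offset.
// Position 1 is index 0; negative positions already match negative indexes.
size_t positionToOffset(int32_t pos, size_t len) {
	return indexToOffset(pos > 0 ? pos - 1 : pos, len);
}

// STRSLICE-like behavior: `stop` is exclusive.
std::string slice(std::string const &str, int32_t start, int32_t stop) {
	size_t begin = indexToOffset(start, str.size());
	size_t end = indexToOffset(stop, str.size());
	return begin < end ? str.substr(begin, end - begin) : std::string();
}

// STRSUB-like behavior: a starting position plus a length.
std::string legacySub(std::string const &str, int32_t pos, size_t len) {
	return str.substr(positionToOffset(pos, str.size()), len);
}
```

Under these assumptions, `slice("ABCDEF", 1, 3)` and `legacySub("ABCDEF", 2, 2)` select the same two characters, because index 1 and position 2 both refer to the second character.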
78 changes: 61 additions & 17 deletions src/asm/charmap.cpp
@@ -31,6 +31,29 @@ struct CharmapNode {
struct Charmap {
	std::string name;
	std::vector<CharmapNode> nodes; // first node is reserved for the root node

	// Traverse the trie depth-first to derive the character mappings in definition order
	template<typename F>
	bool forEachChar(F callback) const {
		// clang-format off: nested initializers
		for (std::stack<std::pair<size_t, std::string>> prefixes({{0, ""}}); !prefixes.empty();) {
			// clang-format on
			auto [nodeIdx, mapping] = std::move(prefixes.top());
			prefixes.pop();
			CharmapNode const &node = nodes[nodeIdx];
			if (node.isTerminal()) {
				if (!callback(nodeIdx, mapping)) {
					return false;
				}
			}
			for (unsigned c = 0; c < std::size(node.next); c++) {
				if (size_t nextIdx = node.next[c]; nextIdx) {
					prefixes.push({nextIdx, mapping + static_cast<char>(c)});
				}
			}
		}
		return true;
	}
};

static std::deque<Charmap> charmapList;
@@ -44,24 +67,12 @@ bool charmap_ForEach(
void (*charFunc)(std::string const &, std::vector<int32_t>)
) {
for (Charmap const &charmap : charmapList) {
// Traverse the trie depth-first to derive the character mappings in definition order
std::map<size_t, std::string> mappings;
// clang-format off: nested initializers
for (std::stack<std::pair<size_t, std::string>> prefixes({{0, ""}});
!prefixes.empty();) {
// clang-format on
auto [nodeIdx, mapping] = std::move(prefixes.top());
prefixes.pop();
CharmapNode const &node = charmap.nodes[nodeIdx];
if (node.isTerminal()) {
mappings[nodeIdx] = mapping;
}
for (unsigned c = 0; c < 256; c++) {
if (size_t nextIdx = node.next[c]; nextIdx) {
prefixes.push({nextIdx, mapping + static_cast<char>(c)});
}
}
}
charmap.forEachChar([&mappings](size_t nodeIdx, std::string const &mapping) {
mappings[nodeIdx] = mapping;
return true;
});

mapFunc(charmap.name);
for (auto [nodeIdx, mapping] : mappings) {
charFunc(mapping, charmap.nodes[nodeIdx].value);
@@ -178,6 +189,22 @@ bool charmap_HasChar(std::string const &mapping) {
return charmap.nodes[nodeIdx].isTerminal();
}

size_t charmap_CharSize(std::string const &mapping) {
	Charmap const &charmap = *currentCharmap;
	size_t nodeIdx = 0;

	for (char c : mapping) {
		nodeIdx = charmap.nodes[nodeIdx].next[static_cast<uint8_t>(c)];

		if (!nodeIdx) {
			return 0;
		}
	}

	CharmapNode const &node = charmap.nodes[nodeIdx];
	return node.isTerminal() ? node.value.size() : 0;
}

std::vector<int32_t> charmap_Convert(std::string const &input) {
std::vector<int32_t> output;
for (std::string_view inputView = input; charmap_ConvertNext(inputView, &output);) {}
@@ -263,3 +290,20 @@ size_t charmap_ConvertNext(std::string_view &input, std::vector<int32_t> *output
input = input.substr(inputIdx);
return matchLen;
}

std::string charmap_Reverse(std::vector<int32_t> const &value, bool &unique) {
	Charmap const &charmap = *currentCharmap;
	std::string revMapping;
	unique = charmap.forEachChar([&](size_t nodeIdx, std::string const &mapping) {
		if (charmap.nodes[nodeIdx].value == value) {
			if (revMapping.empty()) {
				revMapping = mapping;
			} else {
				revMapping.clear();
				return false;
			}
		}
		return true;
	});
	return revMapping;
}
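The reverse lookup above relies on a traversal that stops as soon as its callback returns false, which is how a second match gets reported as "not unique". The following standalone sketch shows the same pattern on a simplified trie. The `Node` and `Trie` types, the `terminal` flag, and the `add` and `reverse` helpers are hypothetical stand-ins rather than the project's `CharmapNode` and `Charmap`, and the callback receives the node itself instead of its index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified node: child index 0 means "no child",
// which works because index 0 is the root and is never a child.
struct Node {
	std::vector<int32_t> value;
	bool terminal = false;
	std::array<size_t, 256> next{};
};

struct Trie {
	std::vector<Node> nodes = std::vector<Node>(1); // nodes[0] is the root

	void add(std::string const &mapping, std::vector<int32_t> value) {
		size_t idx = 0;
		for (char c : mapping) {
			size_t next = nodes[idx].next[static_cast<uint8_t>(c)];
			if (!next) {
				next = nodes.size();
				nodes[idx].next[static_cast<uint8_t>(c)] = next;
				nodes.emplace_back();
			}
			idx = next;
		}
		nodes[idx].terminal = true;
		nodes[idx].value = std::move(value);
	}

	// Depth-first walk; returns false if the callback asked to stop early.
	template<typename F>
	bool forEachChar(F callback) const {
		std::stack<std::pair<size_t, std::string>> prefixes;
		prefixes.push({0, ""});
		while (!prefixes.empty()) {
			auto [idx, mapping] = std::move(prefixes.top());
			prefixes.pop();
			Node const &node = nodes[idx];
			if (node.terminal && !callback(node, mapping)) {
				return false;
			}
			for (unsigned c = 0; c < node.next.size(); c++) {
				if (size_t nextIdx = node.next[c]; nextIdx) {
					prefixes.push({nextIdx, mapping + static_cast<char>(c)});
				}
			}
		}
		return true;
	}
};

// Returns the single mapping whose value equals `value`.
// A second match clears the result and sets `unique` to false.
std::string reverse(Trie const &trie, std::vector<int32_t> const &value, bool &unique) {
	std::string found;
	unique = trie.forEachChar([&](Node const &node, std::string const &mapping) {
		if (node.value == value) {
			if (!found.empty()) {
				found.clear();
				return false;
			}
			found = mapping;
		}
		return true;
	});
	return found;
}
```

In this sketch, as in the function above, a value with no matching entry yields an empty string while `unique` stays true, so a caller has to check both the flag and the returned string to tell "no match" from "ambiguous match".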
7 changes: 7 additions & 0 deletions src/asm/lexer.cpp
@@ -240,19 +240,26 @@ static std::unordered_map<std::string, int, CaseInsensitive, CaseInsensitive> ke
{"TZCOUNT", T_(OP_TZCOUNT) },

{"STRCAT", T_(OP_STRCAT) },
{"STRCHAR", T_(OP_STRCHAR) },
{"STRCMP", T_(OP_STRCMP) },
{"STRFIND", T_(OP_STRFIND) },
{"STRFMT", T_(OP_STRFMT) },
{"STRIN", T_(OP_STRIN) },
{"STRLEN", T_(OP_STRLEN) },
{"STRLWR", T_(OP_STRLWR) },
{"STRRFIND", T_(OP_STRRFIND) },
{"STRRIN", T_(OP_STRRIN) },
{"STRRPL", T_(OP_STRRPL) },
{"STRSLICE", T_(OP_STRSLICE) },
{"STRSUB", T_(OP_STRSUB) },
{"STRUPR", T_(OP_STRUPR) },

{"CHARCMP", T_(OP_CHARCMP) },
{"CHARLEN", T_(OP_CHARLEN) },
{"CHARSIZE", T_(OP_CHARSIZE) },
{"CHARSUB", T_(OP_CHARSUB) },
{"INCHARMAP", T_(OP_INCHARMAP) },
{"REVCHAR", T_(OP_REVCHAR) },

{"INCLUDE", T_(POP_INCLUDE) },
{"PRINT", T_(POP_PRINT) },
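The keyword table in this hunk is declared as a `std::unordered_map` that uses one `CaseInsensitive` type as both its hash and its key-equality functor. The definition of `CaseInsensitive` is not shown in this diff, so the version below is only a guess at how such a functor could be written. The `Token` enum values, the FNV-1a hash, and the `lookupKeyword` helper are all hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical functor usable as both Hash and KeyEqual:
// the two operator() overloads are told apart by their arity.
struct CaseInsensitive {
	// Hash the lowercased bytes with 32-bit FNV-1a constants (an arbitrary choice).
	size_t operator()(std::string const &str) const {
		size_t hash = 0x811C9DC5;
		for (char c : str) {
			hash = (hash ^ static_cast<size_t>(std::tolower(static_cast<unsigned char>(c))))
			       * 16777619u;
		}
		return hash;
	}

	// Compare byte by byte, ignoring ASCII case.
	bool operator()(std::string const &lhs, std::string const &rhs) const {
		if (lhs.size() != rhs.size()) {
			return false;
		}
		for (size_t i = 0; i < lhs.size(); i++) {
			if (std::tolower(static_cast<unsigned char>(lhs[i]))
			    != std::tolower(static_cast<unsigned char>(rhs[i]))) {
				return false;
			}
		}
		return true;
	}
};

// Hypothetical token values standing in for the parser's T_(...) constants.
enum Token { TOKEN_NONE = -1, TOKEN_STRFIND, TOKEN_STRRFIND, TOKEN_STRSLICE, TOKEN_REVCHAR };

static std::unordered_map<std::string, int, CaseInsensitive, CaseInsensitive> const keywords{
	{"STRFIND",  TOKEN_STRFIND },
	{"STRRFIND", TOKEN_STRRFIND},
	{"STRSLICE", TOKEN_STRSLICE},
	{"REVCHAR",  TOKEN_REVCHAR },
};

// Returns the token for `name`, or TOKEN_NONE if it is not a keyword.
int lookupKeyword(std::string const &name) {
	auto it = keywords.find(name);
	return it != keywords.end() ? it->second : TOKEN_NONE;
}
```

With a functor like this, `lookupKeyword("strslice")` and `lookupKeyword("StrSlice")` resolve to the same entry as the uppercase spelling, so each new keyword needs only one table entry.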