Exporting gWalks into json files #156

mesut-unal · 2024-10-11T17:07:15Z

Dear developers, I am using gGnome for a project, and it has been very helpful for getting the walks. However, I am having trouble exporting all the walks from the rds files to a JSON file suitable for the gGnome.js browser. As a workaround, I wrote the nodes, edges, and grl info to csv files and compiled a JSON from them. This works and I can visualize the walks in gGnome.js, but I need to double-check that I am extracting everything correctly. Some variables and names have no explanation that I can find, such as the edges.in and edges.out columns in gNodes. For instance, I see (4)->,1560(1)-> in one of the rows, but I can't tell what these numbers are, and 1560 does not appear anywhere else in the csv files I extracted. Could you tell me where I can find such details, and/or point me to a gWalks-to-JSON function that I can cross-check my results against? I see that you have gen_gg_json_files in jsUtils.R, but I haven't been able to use it without an error. I'd appreciate your help.

shihabdider · 2024-11-01T15:36:13Z

Thanks for trying the package!

Have you tried using the $json method of the gWalk class?

Here's an example call:

gwalk$json(
    filename = output_path,
    verbose = TRUE,
    annotations = annotations,
    include.graph = FALSE
)

(where gwalk is an instantiated gWalk object). See: http://www.mskilab.com.s3-website-us-east-1.amazonaws.com/gGnome/tutorial.html#How_are_files_generated

In the edges.in and edges.out columns, the number in parentheses is the copy number of that edge. The other number is the index of the node the edge points to or from.
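For cross-checking extracted csv files, the description above can be turned into a small parser. This is a minimal base-R sketch, not part of gGnome: the function name parse_edge_field is hypothetical, and the layout it assumes (comma-separated entries, each an optional node index followed by the copy number in parentheses and an arrow) is inferred only from the example string in this thread. An entry without a node index, such as (4)->, is returned with node = NA.

```r
# Hypothetical helper (not a gGnome function): split one edges.in / edges.out
# string into a node index and a copy number per entry.
# Assumed layout: "<node index, may be absent>(<copy number>)->", comma-separated.
parse_edge_field <- function(x) {
  entries <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  entries <- entries[nzchar(entries)]
  # Capture group 1: optional node index; group 2: text inside the parentheses.
  m <- regmatches(entries, regexec("(-?[0-9]*)\\(([^)]*)\\)", entries))
  data.frame(
    node = vapply(m, function(p) suppressWarnings(as.integer(p[2])), integer(1)),
    cn   = vapply(m, function(p) suppressWarnings(as.numeric(p[3])), numeric(1)),
    raw  = entries,
    stringsAsFactors = FALSE
  )
}

parsed <- parse_edge_field("(4)->,1560(1)->")
print(parsed)
```

The raw column keeps the original entry so that anything the assumed pattern does not cover can still be inspected by eye.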

mesut-unal · 2024-11-01T17:08:06Z

Thanks for the reply. Yes, I tried it. When I use it on a gWalk that contains all the walks obtained with the peel function:

gg = gG(jabba=graph)
walks = peel(gg, verbose = TRUE)
saveRDS(walks, "outputname.rds")

I get this error:

[1] "#############"
Key: <listid>
       listid    V1
        <num> <int>
    1:      1 -1714
    2:      1 -1713
    3:      1 -1712
    4:      1 -1711
    5:      1 -1710
   ---             
21681:    462 -1146
21682:    462 -1145
21683:    463  -849
21684:    463  -848
21685:    464 -1381
Error in cids[[x]] : subscript out of bounds
In addition: Warning message:
In rbind(abbs, toprint) :
  number of columns of result is not a multiple of vector length (arg 1)

I also tried it on rds files holding gWalks of a specific length that contain a specific gene. There I could produce a JSON file, with this output:

Key: <listid>
   listid    V1
    <num> <int>
1:      1  1751
2:      1  1752
Saving JSON to: '/path/to/your/data/walks_json'
Warning message:
In rbind(abbs, toprint) :
  number of columns of result is not a multiple of vector length (arg 1)

However, it is empty when I import it into the gGnome.js browser.

shihabdider · 2024-11-01T18:19:01Z

Can you post the exact code you're using, including the calls to the $json method and the call that is actually producing the error? Please also include a print of the walks object.

mesut-unal · 2024-11-01T19:18:00Z

Yes, this is the part that returns the error message:

rds_files <- list.files(input_folder, pattern = "^226295-WG01.*_walks.rds$", full.names = TRUE)

gwalk <- readRDS(rds_files[1])

gwalk$json(
  filename = "/path/to/your/data/walks_rds/227184-WG01_walks.json", 
  verbose = TRUE,
  include.graph = FALSE
)

and the output of gwalk is:

> gwalk
gWalk object with 464 walks (432 linear and 32 circular)
Key: <walk.id>
   walk.id  name length       wid circular     cn
     <num> <int>  <int>     <int>    <num> <lgcl>
1:       1   405      6 198295559    FALSE      1
2:       2   406      3 190207008    FALSE      2
3:       3   362      4 170805979    FALSE      1
4:       4   363      8 161118484    FALSE      1
5:       5   407     10 152725008    FALSE      1
                                                                                                                                                            gr
                                                                                                                                                         <num>
1:         3:168352403-198295559- -> 3:164600003-168352402- -> 3:164187603-164600002- -> ... -> 3:91136003-164187602- -> 3:60661899-91136002- -> 3:1-60661898-
2:                                                                                            4:1-78347980+ -> 4:78354042-180955649+ -> 4:180957136-190214555+
3:                                                                    6:113745687-170805979- -> 6:47596995-113745686- -> 6:23081821-47596994- -> 6:1-23081820-
4: 7:142415727-143707602- -> 7:75256603-142415726- -> 7:70973894-75256602- -> ... -> 7:75256603-142415726+ -> 7:142415727-143707602+ -> 7:143707603-159345973+
5:          12:95660441-95727668- -> 12:56991907-57034372- -> 12:56929770-56991906- -> ... -> 1:124932203-150228402- -> 1:94954905-124932202- -> 1:1-94947816-

 ... 
(459 more walks )
Warning message:
In rbind(abbs, toprint) :
  number of columns of result is not a multiple of vector length (arg 1)

Please let me know if you need anything else.

shihabdider · 2024-11-01T19:26:27Z

Any chance you can upload one of the *walks.rds files to this issue thread (assuming it's not protected patient data)? I can then try reproducing your error on my side.

mesut-unal · 2024-11-20T20:40:12Z

Hi @shihabdider, did you get a chance to look at the data I sent you?

shihabdider · 2024-11-22T13:03:51Z

@mesut-unal Sorry for the long delay! (It's been a hectic month). I'll take a look now and get back to you by end of day.

shihabdider · 2024-11-22T13:50:34Z

OK, it seems there is a mismatch between the length of cids and the number of walks. I need to investigate further why cids are not being generated for some of these walks. In the meantime, the following custom_json function can be used in place of the default $json method. It should avoid this error by ignoring entries for which there is no cid:

custom_json <- function(
    walk,
    filename = ".",
    save = TRUE,
    verbose = FALSE,
    annotations = NULL,
    nfields = NULL,
    efields = NULL,
    stack.gap = 1e+05,
    include.graph = TRUE,
    settings = list(y_axis = list(title = "copy number", visible = TRUE)),
    cid.field = NULL,
    no.y = FALSE
) {
    message("custom_json")
    if (length(walk) == 0) {
        warning("This is an empty gWalk so no JSON will be produced.")
        return(NA)
    }
    if (length(walk$edges) == 0) {
        warning("There are no edges in this gWalk so no JSON will be produced.")
        return(NA)
    }
    non.alt.exist = any(walk$dt[, sapply(sedge.id, length) == 0])
    if (non.alt.exist) {
        return(refresh(walk[walk$dt[, sapply(sedge.id, length) > 0]])$json(filename = filename, save = save, verbose = verbose, annotations = annotations, nfields = nfields, efields = efields, stack.gap = stack.gap, include.graph = include.graph, settings = settings, no.y = no.y))
    }
    if (include.graph) {
        graph.js = refresh(walk$graph)$json(filename = NA, save = FALSE, verbose = verbose, annotations = annotations, nfields = nfields, efields = efields, settings = settings, no.y = no.y)
    }
    pids = split(walk$dt[, .(pid = walk.id, strand = "+", type = ifelse(walk$circular, "cycle", "path"))], 1:walk$length)
    efields = unique(c("type", efields))
    protected_efields = c("cid", "source", "sink", "title", "weight")
    rejected_efields = intersect(efields, protected_efields)
    if (length(rejected_efields) > 0) {
        warning(sprintf("The following fields were included in efields: \"%s\", but since these are conserved fields in the json walks output then they will not be included in efields. If these fields contain important metadata that you want included in the json output, then consider renaming these field names in your gWalk object.", paste(rejected_efields, collapse = "\" ,\"")))
        efields = setdiff(efields, rejected_efields)
    }
    missing_efields = setdiff(efields, names(walk$edges$dt))
    if (length(missing_efields) > 0) {
        warning(sprintf("Invalid efields value/s provided: \"%s\". These fields were not found in the gWalk and so will be ignored.", paste(missing_efields, collapse = "\" ,\"")))
        efields = intersect(efields, names(walk$edges$dt))
    }
    sedu = dunlist(walk$sedge.id)
    print("#############")
    print(sedu)
    cids = lapply(unname(split(cbind(data.table(cid = sedu$V1, source = walk$graph$edges[sedu$V1]$left$dt$snode.id, sink = -walk$graph$edges[sedu$V1]$right$dt$snode.id, title = "", weight = 1), walk$graph$edges[sedu$V1]$dt[, ..efields], fill = TRUE), sedu$listid)), function(x) unname(split(x, 1:nrow(x))))
    snu = dunlist(walk$snode.id)
    snu$ys = gGnome:::draw.paths.y(walk$grl) %>% unlist
    protected_nfields = c("chromosome", "startPoint", "endPoint", "y", "type", "strand", "title")
    rejected_nfields = intersect(nfields, protected_nfields)
    if (length(rejected_nfields) > 0) {
        warning(sprintf("The following fields were included in nfields: \"%s\", but since these are conserved fields in the json walks output then they will not be included in nfields. If these fields contain important metadata that you want included in the json output, then consider renaming these field names in your gWalk object.", paste(rejected_nfields, collapse = "\" ,\"")))
        nfields = setdiff(nfields, rejected_nfields)
    }
    missing_nfields = setdiff(nfields, names(walk$nodes$dt))
    if (length(missing_nfields) > 0) {
        warning(sprintf("Invalid nfields value/s provided: \"%s\". These fields were not found in the gWalk and so will be ignored.", paste(missing_nfields, collapse = "\" ,\"")))
        # Keep only fields present in the node table (this line previously intersected with the edge table).
        nfields = intersect(nfields, names(walk$nodes$dt))
    }
    iids = lapply(unname(split(cbind(data.table(iid = abs(snu$V1)), walk$graph$nodes[snu$V1]$dt[, .(chromosome = seqnames, startPoint = start, endPoint = end, y = snu$ys, type = "interval", strand = ifelse(snu$V1 > 0, "+", "-"), title = abs(snu$V1))], walk$graph$nodes[snu$V1]$dt[, ..nfields]), snu$listid)), function(x) unname(split(x, 1:nrow(x))))
    walks.js = lapply(1:min(length(walk), length(cids), length(iids)), 
        function(x) c(as.list(pids[[x]]), list(cids = rbindlist(cids[[x]])), list(iids = rbindlist(iids[[x]]))))
    if (include.graph) {
        out = c(graph.js, list(walks = walks.js))
    }
    else {
        out = list(walks = walks.js)
    }
    if (save) {
        if (verbose) {
            message("Saving JSON to: ", filename)
        }
        jsonlite::write_json(out, filename, pretty = TRUE, auto_unbox = TRUE, digits = 4)
        return(normalizePath(filename))
    }
    else {
        return(out)
    }
}

walk = readRDS("226295-WG01_walks.rds")
custom_json(walk, filename="test.json", verbose = TRUE, include.graph = FALSE)
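One possible mechanism for the "subscript out of bounds" error can be reproduced in base R without gGnome. The sketch below is a toy illustration under an assumption, not a confirmed diagnosis: it assumes that some walk contributes no rows to the table printed by print(sedu). The data frame sedu and the count n_walks here are made-up stand-ins.

```r
# Toy stand-in for the dunlist() output: walk 2 contributes no edge rows.
n_walks <- 3
sedu <- data.frame(listid = c(1, 1, 3), V1 = c(-10L, -9L, 7L))

# split() only creates groups for ids that are present, and unname() then
# discards those ids, so the list is shorter than the number of walks.
cids <- unname(split(sedu, sedu$listid))
print(length(cids))

# Indexing by walk position then runs past the end of the list.
res <- tryCatch(cids[[n_walks]], error = function(e) conditionMessage(e))
print(res)

# Keeping the names allows lookup by walk id instead of by position.
cids_by_id <- split(sedu, sedu$listid)
has_edges <- as.character(seq_len(n_walks)) %in% names(cids_by_id)
print(has_edges)
```

If this is indeed the cause, truncating the loop with min() removes the error but still pairs entries by position, so a walk that follows a missing one may be matched with another walk's cids. Looking groups up by listid, as in the last step, would avoid that.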

mesut-unal · 2024-12-17T18:02:25Z

Hi @shihabdider, thanks for taking a look and preparing custom_json. Did you get a chance to try the output in the gGnome.js browser? I still get the same problem with custom_json: the browser shows an empty page.
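To narrow down an empty page, it may help to read the exported file back and count what each walk contains. The following is a minimal sketch using jsonlite, which custom_json already uses for writing. The helper name summarise_walks_json is hypothetical, and the sketch assumes the file has a top-level walks array whose entries carry pid, cids, and iids, as in the list(walks = ...) object built by custom_json. It runs on a toy file here and says nothing about which fields gGnome.js itself requires.

```r
library(jsonlite)

# Hypothetical helper: one row per walk with the number of cids and iids.
summarise_walks_json <- function(path) {
  js <- fromJSON(path, simplifyVector = FALSE)
  walks <- js$walks
  data.frame(
    pid    = vapply(walks, function(w) as.character(w$pid), character(1)),
    n_cids = vapply(walks, function(w) length(w$cids), integer(1)),
    n_iids = vapply(walks, function(w) length(w$iids), integer(1)),
    stringsAsFactors = FALSE
  )
}

# Toy file standing in for real output; field names follow custom_json().
toy <- list(walks = list(
  list(pid = 1, strand = "+", type = "path",
       cids = list(list(cid = 5, source = 1, sink = -2, title = "", weight = 1)),
       iids = list(list(iid = 1, chromosome = "3", startPoint = 1, endPoint = 100,
                        y = 1, type = "interval", strand = "+", title = 1))),
  list(pid = 2, strand = "+", type = "path", cids = list(), iids = list())
))
path <- tempfile(fileext = ".json")
write_json(toy, path, auto_unbox = TRUE)

walk_summary <- summarise_walks_json(path)
print(walk_summary)
```

Walks with zero iids, or a file with zero walks, would point to the export step rather than the browser.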
