Missing mz data #20

Open
breidan opened this issue Feb 23, 2024 · 0 comments

breidan commented Feb 23, 2024

Hi,
First off, I want to express my appreciation for your work on this package. Until now I converted with msconvert to mzML and then imported with mzR. The loss of the spectrum header data in this process always annoyed me.
That said, I have noticed the following with a 135 MB file acquired on an Orbitrap Elite in MS1 mode, m/z 50-2000, 8796 scans. On a PC with 16 GB of memory, reading the whole file with

>spec<-rawrr::readSpectrum(files,scan=1:8796)
Error in .rawrrSystem2Source(rawfile, input = scan, rawrrArgs = "scans",  : 
  Parsing the output of 'C:\Users\operator\AppData\Local/R/cache/R/rawrr/rawrrassembly/rawrr.exe' failed for an unknown reason.
Please check the debug files:
	C:\Users\operator\AppData\Local\Temp\RtmpQVfxD5\file291814a545c5.stderr
	C:\Users\operator\AppData\Local\Temp\RtmpQVfxD5\file291829ef2f34.stdout
and the System Requirements

fails. The following works:

>beRaw <- Spectra::backendInitialize(
+ MsBackendRawFileReader::MsBackendRawFileReader(),
+ files = files)
> spec<-rawrr::readSpectrum(files,scan=1:2932)

Object sizes are 814 kB for beRaw (full file) and 179 MB for spec (1/3 of the file), which brings me to the reason for this issue. The rtime and mz slots of all spectra are empty in beRaw:

>beRaw
MsBackendRawFileReader with 8796 spectra
       msLevel     rtime scanIndex
     <integer> <numeric> <integer>
1            1        NA         1
2            1        NA         2
3            1        NA         3
4            1        NA         4
5            1        NA         5
...        ...       ...       ...
8792         1        NA      8792
8793         1        NA      8793
8794         1        NA      8794
8795         1        NA      8795
8796         1        NA      8796
 ... 26 more variables/columns.

file(s):
02_eggpep.raw

>mz(beRaw)
NumericList of length 8796
[[1]] numeric(0)
[[2]] numeric(0)
[[3]] numeric(0)
[[4]] numeric(0)
[[5]] numeric(0)
[[6]] numeric(0)
[[7]] numeric(0)
[[8]] numeric(0)
[[9]] numeric(0)
[[10]] numeric(0)
...
<8786 more elements>

In spec those slots are filled with data:

>spec[1:10]|>map(~length(.x$mZ))
[[1]]
[1] 828

[[2]]
[1] 799

[[3]]
[1] 802

[[4]]
[1] 921

[[5]]
[1] 960

[[6]]
[1] 841

[[7]]
[1] 836

[[8]]
[1] 826

[[9]]
[1] 846

[[10]]
[1] 881

> spec[1:10]|>map("StartTime")
[[1]]
[1] 0.006031667

[[2]]
[1] 0.01105833

[[3]]
[1] 0.018325

[[4]]
[1] 0.02559167

[[5]]
[1] 0.03284667

[[6]]
[1] 0.04016667

[[7]]
[1] 0.04972167

[[8]]
[1] 0.05707

[[9]]
[1] 0.06377833

[[10]]
[1] 0.071045

Am I doing something wrong here with my MS data, or is this a bug in MsBackendRawFileReader? And I know this is not the right place, but can anything be done about the memory hunger of rawrr::readSpectrum? I can read in several files like the one above with MSnbase::readMSData without a problem.
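
On the memory question, one possible workaround is to read the scans in smaller chunks and keep only the fields that are needed. Below is a minimal sketch of that idea, not something tested on this file, and it is unknown whether it avoids the parsing error above. The names `read_in_chunks` and `fake_reader` are hypothetical and are not part of rawrr. The rawrr call in the trailing comment is an assumed usage that relies only on `readSpectrum(files, scan = ...)` and the `mZ` and `StartTime` fields shown in this post.

```r
# Hypothetical helper: apply a reader function to scan numbers in chunks
# and combine the per-chunk results into a single flat list.
read_in_chunks <- function(scans, chunk_size, reader) {
  stopifnot(chunk_size >= 1)
  # Assign each scan to a chunk: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_along(scans) / chunk_size)
  chunks <- split(scans, chunk_id)
  # Read each chunk separately, then flatten one level.
  res <- lapply(chunks, reader)
  unlist(res, recursive = FALSE, use.names = FALSE)
}

# Stand-in reader so the sketch runs without a raw file.
fake_reader <- function(s) lapply(s, function(i) list(scan = i))
out <- read_in_chunks(1:10, 4, fake_reader)
length(out)

# Assumed usage with rawrr (not run here), keeping only m/z and start time:
# spec <- read_in_chunks(1:8796, 1000, function(s) {
#   lapply(rawrr::readSpectrum(files, scan = s),
#          function(x) list(mZ = x$mZ, StartTime = x$StartTime))
# })
```

The combined result still lives in memory, so this only helps if the failure comes from handling all 8796 scans in a single call rather than from the total size of the data.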

EDIT:
I just realized that peaksData(beRaw) does provide the mz values, albeit very, very slowly and without the rtime.
ANOTHER EDIT:
Is data read in with MsBackendRawFileReader "on-disk" or "in-memory"? This is not clear from the help documentation, but judging by the object size it appears to be "on-disk".
