Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty Unique strings #2

Open
lisp-is-the-future opened this issue Sep 3, 2018 · 10 comments
Open

Empty Unique strings #2

lisp-is-the-future opened this issue Sep 3, 2018 · 10 comments

Comments

@lisp-is-the-future
Copy link

Dear Flavio,

searching for a way to read-in .xlsx files using common lisp, since I really want to try to use it in every day life (I am a Bioinformatics Postdoc in Germany),
I encountered some problems.

I want to read-in the bigger file which was created using R's openxlsx package.
And as a test, I created a dummy small file, which contains actually only
the table

A | B | C
1 | 2 | 3
4 | 5 | 6

Reading my xlsx files (test an desired file) in lisp-xl both resulted in empty strings.
The :xlsx package could handle the small test, but not the big, desired, file.
I could also send you the xlsx files if required.

;; I tried the following using
;; SBCL 1.3.1.debian
;; in emacs/SLIME:

;; I created the small file using libreoffice
;; LibreOffice 5.1.6.2 10m0(Build:2)
;; in ubuntu 16.04

;; and the big file using R openxlsx package (the current one)
;; in ubuntu 16.04

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;;;;;;;
;; file paths
;;;;;;;;;;;;;;;;;;;;;;;;

(defparameter small-fpath #P"/home/josephus/test/xlsx-files/test.xlsx")
(defparameter big-fpath #P"/home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx")

;;;;;;;;;;;;;;;;;;;;;;;;
;; try the small file
;;;;;;;;;;;;;;;;;;;;;;;;

(ql:quickload :xlsx)
(xlsx:list-sheets small-fpath)

;; => ((1 "Sheet1" "worksheets/sheet1.xml"))

(xlsx:read-sheet small-fpath 1)

;; => (((:A . 1) . "A") ((:B . 1) . "B") ((:C . 1) . "C") ((:A . 2) . 1)
;; ((:B . 2) . 2) ((:C . 2) . 3) ((:A . 3) . 4) ((:B . 3) . 5) ((:C . 3) . 6))

(xlsx:as-matrix (xlsx:read-sheet small-fpath 1))

;; => #2A(("A" "B" "C") (1 2 3) (4 5 6)), (:A :B :C), (1 2 3)

;; the quicklisp package is that from Carlos Ungil
;; add from Akihide Nano's site:
(defun as-alist (xlsx)
"Creates an a-list from a list of cells of the form ((:A1 . 42) (:B1 . 21))"
(mapcar #'(lambda (lst)
(cons (intern (concatenate 'string
(symbol-name (caar lst))
(write-to-string (cdar lst))) :keyword)
(cdr lst)))
xlsx))

(defun as-plist (xlsx)
"Creates a p-list from a list of cells of the form (:A1 42 :B1 21)"
(mapcan #'(lambda (lst)
(cons (intern (concatenate 'string
(symbol-name (caar lst))
(write-to-string (cdar lst))) :keyword)
(list (cdr lst))))
xlsx))

(as-alist (xlsx:read-sheet small-fpath 1))

;; => ((:A1 . "A") (:B1 . "B") (:C1 . "C") (:A2 . 1) (:B2 . 2) (:C2 . 3) (:A3 . 4)
;; (:B3 . 5) (:C3 . 6))

(as-plist (xlsx:read-sheet small-fpath 1))

;; => (:A1 "A" :B1 "B" :C1 "C" :A2 1 :B2 2 :C2 3 :A3 4 :B3 5 :C3 6)

;;;;;;;;;;;;;;;;;;;;;;;
;; now try the big file
;;;;;;;;;;;;;;;;;;;;;;;

(xlsx:read-sheet big-fpath 1)

;; Node does not have a single string child: #S(XMLS:NODE
;; :NAME t
;; :NS http://schemas.openxmlformats.org/spreadsheetml/2006/main
;; :ATTRS ((space
;; preserve))
;; :CHILDREN NIL)
;; [Condition of type SIMPLE-ERROR]

;; Restarts:
;; 0: [RETRY] Retry SLIME interactive evaluation request.
;; 1: [*ABORT] Return to SLIME's top level.
;; 2: [ABORT] abort thread (#<THREAD "worker" RUNNING {1006CA5C33}>)

;; Backtrace:
;; 0: (XMLS:XMLREP-STRING-CHILD #S(XMLS:NODE
;; :NAME "t"
;; :NS "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
;; :ATTRS (("space" "preserve")) :CHILDREN NIL) :ERROR)
;; Locals:
;; IF-UNFOUND = :ERROR
;; TREENODE = #S(XMLS:NODE
;; :NAME "t"
;; :NS "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
;; :ATTRS (("space" "preserve")) :CHILDREN NIL)
;; 1: (XLSX::GET-UNIQUE-STRINGS #S(ZIP:ZIPFILE
;; :STREAM #<SB-SYS:FD-STREAM for
;; "file /home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx"
;; {1003FBF2B3}>
;; :ENTRIES #<HASH-TABLE
;; :TEST EQUAL
;; :COUNT 12 {..
;; Locals:
;; SB-DEBUG::ARG-0 = #S(ZIP:ZIPFILE
;; :STREAM #<SB-SYS:FD-STREAM for "file /home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx"
;; {1003FBF2B3}>
;; :ENTRIES #<HASH-TABLE
;; :TEST EQUAL
;; :COUNT 12 {10088E0403}>)
;; 2: (XLSX:READ-SHEET #P"/home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx" 1)
;; Locals:
;; FILE = #P"/home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx"
;; SHEET = 1
;; 3: (SB-INT:SIMPLE-EVAL-IN-LEXENV (XLSX:READ-SHEET BIG-FPATH 1) #)
;; Locals:
;; SB-DEBUG::ARG-0 = (XLSX:READ-SHEET BIG-FPATH 1)
;; SB-DEBUG::ARG-1 = #
;; 4: (EVAL (XLSX:READ-SHEET BIG-FPATH 1))
;; Locals:
;; SB-DEBUG::ARG-0 = (XLSX:READ-SHEET BIG-FPATH 1)
;; 5: ((LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL))
;; [No Locals]
;; --more--

;; so :xlsx package cannot handle the big file

;; so it was not possible to open the big file using the :xlsx package

(ql:quickload :lisp-xl)
To load "lisp-xl":
Load 1 ASDF system:
lisp-xl
; Loading "lisp-xl"
;;; Checking for wide character support... WARNING: Lisp implementation doesn't use UTF-16, but accepts surrogate code points.
yes, using code points.
..
;;; Checking for wide character support... WARNING: Lisp implementation doesn't use UTF-16, but accepts surrogate code points.
yes, using code points.
;;; Building Closure with CHARACTER RUNES
.....

(defparameter t1 (lisp-xl:read-sheet small-fpath 1))

;; Reading file /home/josephus/test/xlsx-files/test.xlsx
;; Loading metadata into RAM...3 unique strings found.
;; Uncompressing to File [/tmp/lisp-xl-tempGHU3ALSW.tmp] ...

(print t1)

;; #S(LISP-XL::SHEET
;; :UNIQUE-STRINGS #(" " " " " ") ;; the unique strings are empty
;; :NUMBER-FORMATS NIL
;; :DATE-FORMATS NIL
;; :FILE-NAME #P"/tmp/lisp-xl-tempGHU3ALSW.tmp"
;; :LAST-STREAM-POSITION NIL)

(defparameter t2 (lisp-xl:read-sheet big-fpath 1))

;; Reading file /home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx
;; Loading metadata into RAM...45814 unique strings found.
;; Uncompressing to File [/tmp/lisp-xl-tempAAURSO2.tmp] ...

(print t2)

;; #S(LISP-XL::SHEET
;; :UNIQUE-STRINGS #(" " " " " " " " " " " " " " " " " " " " " " " " " " " "
;; " " " " " " " " " " " " " " " " " " " " " " " " " " " "

;; [ ... may left out ...]

;; " " " " " " " " " " " " " " " " " " " " " " " " " " " "
;; " " " " " " " " " " " ")
;; :NUMBER-FORMATS NIL
;; :DATE-FORMATS NIL
;; :FILE-NAME #P"/tmp/lisp-xl-tempAAURSO2.tmp"
;; :LAST-STREAM-POSITION NIL)

(lisp-xl-csv:excel-to-csv small-fpath "/home/josephus/test/xlsx-files/test.csv" 1)

;; gives

, ,
,,
,,

;; as output in test.csv
;; so strings are really empty

;; so somehow no strings are recognized ...
;; I created the small file using libreoffice
;; LibreOffice 5.1.6.2 10m0(Build:2)
;; in ubuntu 16.04

;; and the big file using R openxlsx package (the current one)
;; in ubuntu 16.04

;; would you have a clue, what the problem could be?
;; the desired big file has 45809 rows and 6 columns.

@defunkydrummer
Copy link
Owner

Hi @lisp-is-the-future , do you think you could upload the XLSX file with the problem? or at least a partial version of that file?

I'm sorry not to answer before: i can't remember which of my personal mail account is used by GitHub is for notifying.

@lisp-is-the-future
Copy link
Author

lisp-is-the-future commented Nov 7, 2018 via email

@defunkydrummer
Copy link
Owner

@lisp-is-the-future can you send me the files again? I'm revisiting this.

If you want you can send to my hotmail account, the address is on the readme.

@defunkydrummer
Copy link
Owner

@lisp-is-the-future i have commited a new version, btw, with a bug fix.

@lisp-is-the-future
Copy link
Author

lisp-is-the-future commented Aug 13, 2019

@defunkydrummer sorry for long not answering. I tried it again. But this time I have a smilar but slightly different error:


(ql:quickload :lisp-xl)

(defparameter *f* "/home/josephus/Dropbox/amit_scripts/RawCounts.xlsx")

(defparameter *c* (lisp-xl:read-sheet *f* 1))

#|
The value
  #S(LISP-XL::SHEET
     :UNIQUE-STRINGS #(#1=" " #1# #1# #1# #1# #1# #1# #1# #1#
                       #1# #1# #1# #1# #1# #1# #1# #1# #1# #1#
                       #1# #1# #1# #1# #1# #1# #1# #1# #1# #1#
                       ...
                       #1# #1# #1# #1# #1# #1# #1#..
   [Condition of type TYPE-ERROR]

Restarts:
 0: [RETRY] Retry SLIME interactive evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [ABORT] abort thread (#<THREAD "worker" RUNNING {100EBCC2E3}>)

Backtrace:
  0: (LENGTH #S(LISP-XL::SHEET :UNIQUE-STRINGS #(#1=" " #1# #1# #1# #1# #1# ...) :NUMBER-FORMATS NIL :DATE-FORMATS NIL :FILE-NAME #P"/tmp/lisp-xl-tempSVDN4TBO.tmp" :LAST-STREAM-POSITION NIL))
  1: (SB-INT:SIMPLE-EVAL-IN-LEXENV (LENGTH *C*) #<NULL-LEXENV>)
  2: (EVAL (LENGTH *C*))
  3: ((LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL))
  4: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME interactive evaluation request." #<CLOSURE (LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL) {100E84E44B}>)
  5: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL) {100E84E42B}>)
  6: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK:INTERACTIVE-EVAL "(length *c*)") #<NULL-LEXENV>)
  7: (EVAL (SWANK:INTERACTIVE-EVAL "(length *c*)"))
  8: (SWANK:EVAL-FOR-EMACS (SWANK:INTERACTIVE-EVAL "(length *c*)") "COMMON-LISP-USER" 205)
  9: ((LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD))
 10: (SWANK/SBCL::CALL-WITH-BREAK-HOOK #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<FUNCTION (LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD) {2274CD7B}>)
 11: ((FLET SWANK/BACKEND:CALL-WITH-DEBUGGER-HOOK :IN "/home/josephus/.emacs.d/elpa/slime-20180413.1720/swank/sbcl.lisp") #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<FUNCTION (LAMBDA NIL :IN SWANK::SPAWN-WORKE..
 12: (SWANK::CALL-WITH-BINDINGS ((*STANDARD-INPUT* . #1=#<SWANK/GRAY::SLIME-INPUT-STREAM {1008F6B5F3}>) (*STANDARD-OUTPUT* . #2=#<SWANK/GRAY::SLIME-OUTPUT-STREAM {1009053713}>) (*TRACE-OUTPUT* . #2#) (*ERR..
 13: ((LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD))
 14: ((FLET SB-UNIX::BODY :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 15: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 16: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 17: ((FLET "WITHOUT-INTERRUPTS-BODY-1" :IN SB-THREAD::CALL-WITH-MUTEX))
 18: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE) {7FFFE0F86D4B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THR..
 19: (SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE #<SB-THREAD:THREAD "worker" RUNNING {100EBCC2E3}> NIL #<CLOSURE (LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD) {100EBCC28B}> NIL)
 20: ("foreign function: call_into_lisp")
 21: ("foreign function: new_thread_trampoline")
|#

@gwangjinkim
Copy link

I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem.

@lisp-is-the-future
Copy link
Author

lisp-is-the-future commented Aug 24, 2019 via email

@lisp-is-the-future
Copy link
Author

lisp-is-the-future commented Aug 24, 2019 via email

@defunkydrummer
Copy link
Owner

based on :cxml which uses streams (then using klacks inside :cxml) I wrote in the recent days my own package for xlsx and ods file reading. https://github.com/a1b10/cl-xlsx/blob/master/README.md

On Wed, Aug 14, 2019 at 11:25 AM Gwang-Jin Kim @.***> wrote: I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2?email_source=notifications&email_token=AILIJHXT72BHE5NTS2TTOTLQEPFSDA5CNFSM4FS7ZJ7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4IHKKQ#issuecomment-521172266>, or mute the thread https://github.com/notifications/unsubscribe-auth/AILIJHT2LC3BT5QNRYBQOHDQEPFSDANCNFSM4FS7ZJ7A .

Hey, thanks for this!! This will benefit the Lisp ecosystem.

I see your lib loads the whole file into RAM which was what i tried to avoid, because we used very big files, often more than 100MB.

Be sure to add your lib to the CLIKI wiki website!

@gwangjinkim
Copy link

Hey! Welcome!
Yes, at the moment - but current version - doesn't use :cxml - thus no streams - thus loads everything into RAM. There was a bug and @slyrus fixed the bug by rewriting it in that way with some other dependencies (e.g. :FXML). I wanted however re-introduce the stream usage again. Thank you for your hint to add it to CLKI wiki website. I'll do it as soon as I can!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants