IndexError: string index out of range #4

bio-bench · 2018-01-31T09:24:51Z

I get this error when using human_g1k_v37.fasta. Is there a log file I can check to find the cause of this error?

endrebak · 2018-02-01T10:53:23Z

Thanks for the report.

Can you please post the output of

head -1 <your fasta file>

head <your bim file>

and the output snpflip posts to stdout including the error message?

Thanks,

Endre

juliedwhite · 2018-03-24T21:41:57Z

Hi, I'm getting the same error as @M-Saleem
Specifically:

snpflip --fasta-genome ~/work/ReferenceDatasets/1000G_hg19_fasta/human_g1k_v37.fasta \
  --bim-file ADAPT_2784ppl_567K_hg19_UpdateChrCodeXYM.bim \
  --output-prefix ADAPT_2784ppl_567K_hg19

Chromosome MT in .bim, but not in fasta file.
There were 1 'N' nucleotides in chromosome 1.
Traceback (most recent call last):
  File "/storage/home/jdw345/software/bin/snpflip", line 46, in <module>
    snp_table = create_snp_table(args["--bim-file"], args["--fasta-genome"])
  File "/storage/home/jdw345/software/lib/python3.6/site-packages/snp_flip/table.py", line 11, in create_snp_table
    reference_genome_data = get_reference_genome_data(bim_table, fa_file)
  File "/storage/home/jdw345/software/lib/python3.6/site-packages/snp_flip/reference_genome.py", line 37, in get_reference_genome_data
    snp_nucleotides = [snp.upper() for snp in _get_snps(str(nucleotides))]
IndexError: string index out of range

Here is the output from head -1 <fasta file>

head -1 ~/work/ReferenceDatasets/1000G_hg19_fasta/human_g1k_v37.fasta
>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1

Here is the output from head <bim file>

head ADAPT_2784ppl_567K_hg19.bim
1       rs4477212       0       82154   0       A
1       rs12564807      0       734462  0       A
1       rs3094315       0       752566  G       A
1       rs3131972       0       752721  A       G
1       rs148828841     0       760998  A       C
1       rs12562034      0       768448  0       G
1       rs12124819      0       776546  G       A
1       rs115093905     0       787173  T       G
1       rs11240777      0       798959  A       G
1       rs6681049       0       800007  0       C

In case it is also helpful, here are all of the headings in my fasta file:

grep ">" ~/work/ReferenceDatasets/1000G_hg19_fasta/human_g1k_v37.fasta
>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1
>2 dna:chromosome chromosome:GRCh37:2:1:243199373:1
>3 dna:chromosome chromosome:GRCh37:3:1:198022430:1
>4 dna:chromosome chromosome:GRCh37:4:1:191154276:1
>5 dna:chromosome chromosome:GRCh37:5:1:180915260:1
>6 dna:chromosome chromosome:GRCh37:6:1:171115067:1
>7 dna:chromosome chromosome:GRCh37:7:1:159138663:1
>8 dna:chromosome chromosome:GRCh37:8:1:146364022:1
>9 dna:chromosome chromosome:GRCh37:9:1:141213431:1
>10 dna:chromosome chromosome:GRCh37:10:1:135534747:1
>11 dna:chromosome chromosome:GRCh37:11:1:135006516:1
>12 dna:chromosome chromosome:GRCh37:12:1:133851895:1
>13 dna:chromosome chromosome:GRCh37:13:1:115169878:1
>14 dna:chromosome chromosome:GRCh37:14:1:107349540:1
>15 dna:chromosome chromosome:GRCh37:15:1:102531392:1
>16 dna:chromosome chromosome:GRCh37:16:1:90354753:1
>17 dna:chromosome chromosome:GRCh37:17:1:81195210:1
>18 dna:chromosome chromosome:GRCh37:18:1:78077248:1
>19 dna:chromosome chromosome:GRCh37:19:1:59128983:1
>20 dna:chromosome chromosome:GRCh37:20:1:63025520:1
>21 dna:chromosome chromosome:GRCh37:21:1:48129895:1
>22 dna:chromosome chromosome:GRCh37:22:1:51304566:1
>X dna:chromosome chromosome:GRCh37:X:1:155270560:1
>Y dna:chromosome chromosome:GRCh37:Y:2649521:59034049:1
>MT gi|251831106|ref|NC_012920.1| Homo sapiens mitochondrion, complete genome
>GL000207.1 dna:supercontig supercontig::GL000207.1:1:4262:1
>GL000226.1 dna:supercontig supercontig::GL000226.1:1:15008:1
>GL000229.1 dna:supercontig supercontig::GL000229.1:1:19913:1
>GL000231.1 dna:supercontig supercontig::GL000231.1:1:27386:1
>GL000210.1 dna:supercontig supercontig::GL000210.1:1:27682:1
>GL000239.1 dna:supercontig supercontig::GL000239.1:1:33824:1
>GL000235.1 dna:supercontig supercontig::GL000235.1:1:34474:1
>GL000201.1 dna:supercontig supercontig::GL000201.1:1:36148:1
>GL000247.1 dna:supercontig supercontig::GL000247.1:1:36422:1
>GL000245.1 dna:supercontig supercontig::GL000245.1:1:36651:1
>GL000197.1 dna:supercontig supercontig::GL000197.1:1:37175:1
>GL000203.1 dna:supercontig supercontig::GL000203.1:1:37498:1
>GL000246.1 dna:supercontig supercontig::GL000246.1:1:38154:1
>GL000249.1 dna:supercontig supercontig::GL000249.1:1:38502:1
>GL000196.1 dna:supercontig supercontig::GL000196.1:1:38914:1
>GL000248.1 dna:supercontig supercontig::GL000248.1:1:39786:1
>GL000244.1 dna:supercontig supercontig::GL000244.1:1:39929:1
>GL000238.1 dna:supercontig supercontig::GL000238.1:1:39939:1
>GL000202.1 dna:supercontig supercontig::GL000202.1:1:40103:1
>GL000234.1 dna:supercontig supercontig::GL000234.1:1:40531:1
>GL000232.1 dna:supercontig supercontig::GL000232.1:1:40652:1
>GL000206.1 dna:supercontig supercontig::GL000206.1:1:41001:1
>GL000240.1 dna:supercontig supercontig::GL000240.1:1:41933:1
>GL000236.1 dna:supercontig supercontig::GL000236.1:1:41934:1
>GL000241.1 dna:supercontig supercontig::GL000241.1:1:42152:1
>GL000243.1 dna:supercontig supercontig::GL000243.1:1:43341:1
>GL000242.1 dna:supercontig supercontig::GL000242.1:1:43523:1
>GL000230.1 dna:supercontig supercontig::GL000230.1:1:43691:1
>GL000237.1 dna:supercontig supercontig::GL000237.1:1:45867:1
>GL000233.1 dna:supercontig supercontig::GL000233.1:1:45941:1
>GL000204.1 dna:supercontig supercontig::GL000204.1:1:81310:1
>GL000198.1 dna:supercontig supercontig::GL000198.1:1:90085:1
>GL000208.1 dna:supercontig supercontig::GL000208.1:1:92689:1
>GL000191.1 dna:supercontig supercontig::GL000191.1:1:106433:1
>GL000227.1 dna:supercontig supercontig::GL000227.1:1:128374:1
>GL000228.1 dna:supercontig supercontig::GL000228.1:1:129120:1
>GL000214.1 dna:supercontig supercontig::GL000214.1:1:137718:1
>GL000221.1 dna:supercontig supercontig::GL000221.1:1:155397:1
>GL000209.1 dna:supercontig supercontig::GL000209.1:1:159169:1
>GL000218.1 dna:supercontig supercontig::GL000218.1:1:161147:1
>GL000220.1 dna:supercontig supercontig::GL000220.1:1:161802:1
>GL000213.1 dna:supercontig supercontig::GL000213.1:1:164239:1
>GL000211.1 dna:supercontig supercontig::GL000211.1:1:166566:1
>GL000199.1 dna:supercontig supercontig::GL000199.1:1:169874:1
>GL000217.1 dna:supercontig supercontig::GL000217.1:1:172149:1
>GL000216.1 dna:supercontig supercontig::GL000216.1:1:172294:1
>GL000215.1 dna:supercontig supercontig::GL000215.1:1:172545:1
>GL000205.1 dna:supercontig supercontig::GL000205.1:1:174588:1
>GL000219.1 dna:supercontig supercontig::GL000219.1:1:179198:1
>GL000224.1 dna:supercontig supercontig::GL000224.1:1:179693:1
>GL000223.1 dna:supercontig supercontig::GL000223.1:1:180455:1
>GL000195.1 dna:supercontig supercontig::GL000195.1:1:182896:1
>GL000212.1 dna:supercontig supercontig::GL000212.1:1:186858:1
>GL000222.1 dna:supercontig supercontig::GL000222.1:1:186861:1
>GL000200.1 dna:supercontig supercontig::GL000200.1:1:187035:1
>GL000193.1 dna:supercontig supercontig::GL000193.1:1:189789:1
>GL000194.1 dna:supercontig supercontig::GL000194.1:1:191469:1
>GL000225.1 dna:supercontig supercontig::GL000225.1:1:211173:1
>GL000192.1 dna:supercontig supercontig::GL000192.1:1:547496:1

endrebak · 2018-03-25T06:46:27Z

Might it be that the MT chromosome is called something else in your fasta? If you could send me the bim, I could debug it :)
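A quick way to check the naming question is to compare the chromosome column of the .bim with the sequence names in the fasta headers. The sketch below is not part of snpflip; the function names are made up for illustration, and it assumes the sequence name is the first whitespace-delimited token after `>`.

```python
def fasta_chromosomes(fasta_lines):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def bim_chromosomes(bim_lines):
    """Collect the distinct values of the first (chromosome) column of a .bim."""
    names = set()
    for line in bim_lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_from_fasta(bim_lines, fasta_lines):
    """Chromosome names used in the .bim that have no fasta header."""
    return sorted(bim_chromosomes(bim_lines) - fasta_chromosomes(fasta_lines))
```

Both functions accept any iterable of lines, so open file handles can be passed directly, for example `missing_from_fasta(open("my.bim"), open("my.fasta"))` with your own paths.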

endrebak · 2018-03-25T06:46:38Z

Thanks for reporting this btw!

juliedwhite · 2018-03-25T14:04:43Z

Hi Endre! Thanks for getting back to me. I think my problem was more related to missing data - I upped my missing data filter and everything was solved.

Regarding the chromosome naming - is there any way to tell the program that 23 = X, 24 = Y, and 26 = MT?

endrebak · 2018-03-25T14:13:56Z

I should have a flag for such a map-file. A bit busy now, but will keep this issue open as a reminder. Glad to hear it worked for you :)
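Until such a flag exists, the codes can be renamed in the .bim before running snpflip. This is a minimal sketch, not a snpflip feature: `CODE_TO_NAME` and `remap_bim_line` are hypothetical names, the mapping is the one asked about above, and the .bim is assumed to be tab-delimited.

```python
# Numeric chromosome codes from the question above, mapped to the names
# used in the fasta headers (X, Y, MT).
CODE_TO_NAME = {"23": "X", "24": "Y", "26": "MT"}


def remap_bim_line(line, mapping=CODE_TO_NAME):
    """Rename the chromosome code in the first column; leave other columns as-is."""
    fields = line.rstrip("\n").split("\t")
    # Match the whole field, so "2" or "12" can never be rewritten by accident.
    fields[0] = mapping.get(fields[0], fields[0])
    return "\t".join(fields)
```

To convert a file, write `remap_bim_line(line) + "\n"` for each input line to a new .bim and pass that file to `--bim-file`.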

juliedwhite · 2018-03-25T15:37:57Z

Hi Endre, just wanted to provide an update. I was working further with the dataset and found the same problem, even after removing the missing data. Based on this data, the index out of range error might have something to do with the treatment of MT data. As an illustration:

#We know this won't compare 23, 24, and 26, but let's do it anyway
$ snpflip --fasta-genome ~/work/ReferenceDatasets/1000G_hg19_fasta/human_g1k_v37.fasta --bim-file ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1.bim --output-prefix ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1
Chromosome 23 in .bim, but not in fasta file.
Chromosome 24 in .bim, but not in fasta file.
Chromosome 26 in .bim, but not in fasta file.
#Produces all the output files as usual

#Change 23=X, 24=Y, 26=M as per the snpflip --help instructions
$ awk -F'\t' -vOFS='\t' '{ gsub("23", "X", $1) ; gsub("24", "Y", $1) ; gsub ("26", "M", $1) ; print }' ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1.bim > ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1_XYM.bim

$ snpflip --fasta-genome ~/work/ReferenceDatasets/1000G_hg19_fasta/human_g1k_v37.fasta --bim-file ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1_XYM.bim --output-prefix ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1_XYM
Traceback (most recent call last):
  File "/storage/home/jdw345/software/bin/snpflip", line 46, in <module>
    snp_table = create_snp_table(args["--bim-file"], args["--fasta-genome"])
  File "/storage/home/jdw345/software/lib/python3.6/site-packages/snp_flip/table.py", line 11, in create_snp_table
    reference_genome_data = get_reference_genome_data(bim_table, fa_file)
  File "/storage/home/jdw345/software/lib/python3.6/site-packages/snp_flip/reference_genome.py", line 37, in get_reference_genome_data
    snp_nucleotides = [snp.upper() for snp in _get_snps(str(nucleotides))]
IndexError: string index out of range
#Womp

#What if we change 26 to MT, since that is what is represented in the fasta file? 
$ awk -F'\t' -vOFS='\t' '{ gsub("23", "X", $1) ; gsub("24", "Y", $1) ; gsub ("26", "MT", $1) ; print }' ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1.bim > ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1_XYMT.bim

$ snpflip --fasta-genome ~/work/ReferenceDatasets/1000G_hg19_fasta/human_g1k_v37.fasta --bim-file ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1_XYMT.bim --output-prefix ADAPT_2784ppl_567K_hg19_geno0.1_mind0.1_XYMT
Chromosome MT in .bim, but not in fasta file.
#Produces all the files as expected, but this time without comparing the MT SNPs

This isn't actually a problem for me, as we're not analyzing the MT genome and I can just as easily remove it before running snpflip. But, I thought I'd post this for clarification.

Thanks for the great program!

endrebak · 2018-03-26T05:31:05Z

Great work. Thanks. I'll give snpflip an update after I finish my PhD :)

rcanovas · 2018-09-10T06:53:58Z

Hi there, I have the same error, but I have not been able to fix it by following the solutions suggested.

snpflip -b FNLITUK_b37hqis.bim -f hs37d5_v2.fa -o snpflip_output
Traceback (most recent call last):
  File "/home/rcanovas/.local/share/virtualenvs/testing_area-AP9TWkm_/bin/snpflip", line 46, in <module>
    snp_table = create_snp_table(args["--bim-file"], args["--fasta-genome"])
  File "/home/rcanovas/.local/share/virtualenvs/testing_area-AP9TWkm_/lib/python3.5/site-packages/snp_flip/table.py", line 11, in create_snp_table
    reference_genome_data = get_reference_genome_data(bim_table, fa_file)
  File "/home/rcanovas/.local/share/virtualenvs/testing_area-AP9TWkm_/lib/python3.5/site-packages/snp_flip/reference_genome.py", line 37, in get_reference_genome_data
    snp_nucleotides = [snp.upper() for snp in _get_snps(str(nucleotides))]
IndexError: string index out of range

Chromosomes in my .bim file:

cut -f 1 FNLITUK_b37hqis.bim | sort -u
1
10
11
12
13
14
15
16
17
18
19
2
20
21
22
3
4
5
6
7
8
9

and the headings of my .fa file

grep ">" ~/assembly_builds/hs37d5_v2.fa

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

endrebak · 2018-09-10T07:13:38Z

Would love to try to help. Could you send me the bim and fa or are they confidential?

rcanovas · 2018-09-11T00:00:08Z

Hi Endre, sadly I cannot send you the .bim file, and the .fa file is quite big. You can get the .fa file from the checkVCF tool page https://github.com/zhanxw/checkVCF. Would that help?


endrebak · 2018-09-11T06:54:51Z

Yeah, but I do not use this software anymore because I haven't had use for it. I'll keep this issue open as a todo, but I have lots I need to finish before fixing this. It's not like I have a paper on SNPflip or anything :)


nadavrap · 2018-10-29T09:21:11Z

I had the same issue, but I finally found out that I used a different genome reference. Once I switched to the right one it worked properly.
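A mismatched reference can be spotted before running snpflip by checking whether any .bim position lies beyond the end of its chromosome in the fasta. The sketch below assumes, without confirmation from the snpflip source, that such positions are what trigger the IndexError; the function names are illustrative only.

```python
def fasta_lengths(fasta_lines):
    """Map each sequence name (first token after '>') to its total length."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else None
            if name is not None:
                lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range_snps(bim_lines, lengths):
    """List (chromosome, snp_name, position) for positions past the sequence end.

    .bim positions are 1-based, so a position is valid when 1 <= pos <= length.
    Chromosomes absent from the fasta are skipped here.
    """
    bad = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        chrom, snp, pos = fields[0], fields[1], int(fields[3])
        if chrom in lengths and pos > lengths[chrom]:
            bad.append((chrom, snp, pos))
    return bad
```

An empty result does not prove the reference build is right, since positions can fit within a chromosome and still refer to a different assembly, but a non-empty one is a strong hint that the .bim and the fasta do not match.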

Victor0122 · 2018-11-24T01:57:28Z

Hi there, I have the same error, but I have not been able to fix it by following the solutions suggested.
Traceback (most recent call last):
  File "/home/victor/.local/bin/snpflip", line 46, in <module>
    snp_table = create_snp_table(args["--bim-file"], args["--fasta-genome"])
  File "/home/victor/.local/lib/python2.7/site-packages/snp_flip/table.py", line 11, in create_snp_table
    reference_genome_data = get_reference_genome_data(bim_table, fa_file)
  File "/home/victor/.local/lib/python2.7/site-packages/snp_flip/reference_genome.py", line 37, in get_reference_genome_data
    snp_nucleotides = [snp.upper() for snp in _get_snps(str(nucleotides))]
IndexError: string index out of range

The start of my .fa file:
>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
tatgtgagaagatagctgaacgccttgtccacatcatcttactgctgaga
gttgagctcaccctcagtccctcacagttccacactgcctgcagagtgag
tttcccatgtcttcaccagagacttttgccagaggcttctgagacgcaag
ttaacaatgcagacctggagggtatctccaggtgcagtagagtggtaatc
tcggaacctcctgactcagaatactgctaccttcacactgtcataagaat
gcagcgagttgagagctggcttctaggcatgcttccttttgagagctgag
gacaggacagaaccctcccgcatcctgcctgactgtagacgtacctgcta

Victor0122 · 2018-11-24T03:04:18Z

I have another issue.
I tried running the SNP check one chromosome at a time.
I found that the positions in my output file are wrong.
chromosome 0_idx_position snp_name genetic_distance allele_1 allele_2 reference reference_rev strand
1 249579 1.24958 0 G A A T forward
1 251204 1.251205 0 G C G C ambiguous
1 266522 1.266523 0 C A G C reverse
1 273486 1.273487 0 A G A T forward
1 307562 1.307563 0 A C C G forward
1 320054 1.320055 0 G A A T forward
1 343358 1.343359 0 G A T A reverse
1 348209 1.3482100000000001 0 G A T A reverse
1 363617 1.363618 0 G A A T forward
1 373663 1.373664 0 A T T A ambiguous
1 398891 1.398892 0 A G C G reverse
1 412572 1.4125729999999999 0 A C A T forward
1 420035 1.420036 0 G A G C forward

For example, 1.3482100000000001 should be 1.348209.
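One possible reading of this output, not confirmed anywhere in the thread, is that SNP names of the form `chromosome.position` were parsed as floating-point numbers rather than text, which drops trailing zeros and can add rounding noise. The sketch below shows the effect and a reader that keeps names as strings; the example line and the helper name are hypothetical, since the real .bim is not shown.

```python
def snp_names(bim_lines):
    """Return the second .bim column (the SNP name) verbatim, as text."""
    names = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:
            names.append(fields[1])
    return names


# Hypothetical .bim line whose SNP name looks like a decimal number.
lines = ["1\t1.249580\t0\t249580\tG\tA\n"]

as_text = snp_names(lines)
as_float = [str(float(name)) for name in as_text]

print(as_text)   # the name exactly as written in the file
print(as_float)  # the same name after a round trip through float
```

Here the round trip through `float` turns `1.249580` into `1.24958`, the same shape as the first row of the table above. Keeping the name column as text avoids the problem regardless of which parser is used.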
