AttributeError: 'str' object has no attribute '_output_dir' #120
Comments
Thanks for the issue. Can you provide more details or code snippets? I just tested installing and running the README examples in a Python 3.8 virtual environment without any issues. |
Thanks Andrew,
After getting create_individual to work with your example data, I realized
that my issue was with the parsing. I had already converted the format from
AncestryDNA to 23andMe and then tried to use create_individual. I receive the
parsing error, which then doesn't allow me to go forward. My other set of
files also has 4 columns like 23andMe but no headers (from the H3Africa
array with another lab).
$ sed -n 1,20p lineage/inputs/myfile.txt
#AncestryDNA raw data download
#This file was generated by AncestryDNA at: 07/31/2018 23:48:22 UTC
#Data was collected using AncestryDNA array version: V2.0
#Data is formatted using AncestryDNA converter version: V1.0
...
rsid chromosome position allele1allele2
rs369202065 1 569388 GG
$ python manage.py shell
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0] on linux
>>> from lineage import Lineage
>>> l = Lineage()
>>> user111 = l.create_individual('User111', 'myfile.txt')
pandas.errors.ParserError: Too many columns specified: expected 5 and found 4
LaKisha
…On Sun, Jan 17, 2021 at 11:14 PM Andrew Riha ***@***.***> wrote:
Thanks for the issue. Can you provide more details or code snippets? I
just tested installing and running the README examples in a Python 3.8
virtual environment without any issues.
|
Thanks LaKisha, that helps.
As for the H3Africa files, can you confirm that an example file would look like this (tab-separated):
|
Hi Andrew,
Here is the script I'm using to convert my files from AncestryDNA to
23andMe format:
(venv) ubuntu@:~/myprojectdir/lineage/inputs$
for file in ./*.txt; do
  echo "converting from AncestryDNA to 23andMe format file:" $file
  gawk -i inplace -F'\t' '{ print $1"\t"$2"\t"$3"\t"$4$5; }' $file
done
This line results in a text file that looks like this:
rsid chromosome position allele1allele2
rs369202065 1 569388 GG
rs199476136 1 569400 TT
rs3131972 1 752721 AG
rs114525117 1 759036 GG
rs12124819 1 776546 AA
rs4040617 1 779322 AA
rs141175086 1 780397 CC
rs115093905 1 787173 GG
rs11240777 1 798959 AG
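For readers who prefer Python to gawk, the same column merge can be sketched as below. This is a minimal illustration, not part of lineage or snps: `merge_allele_columns` is a hypothetical helper name, and unlike the gawk one-liner it passes `#` comment lines through unchanged and leaves rows with fewer than five fields as they are.

```python
import io


def merge_allele_columns(src, dst):
    """Copy tab-separated rows from src to dst, joining columns 4 and 5.

    Hypothetical helper for illustration only.
    """
    for line in src:
        if line.startswith("#"):
            # Comment lines are copied verbatim (the gawk version rewrites them).
            dst.write(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5:
            # Keep rsid, chromosome, position; concatenate allele1 and allele2.
            fields = fields[:3] + [fields[3] + fields[4]]
        dst.write("\t".join(fields) + "\n")


# Usage with in-memory text; real files opened with open() work the same way.
source = io.StringIO(
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs369202065\t1\t569388\tG\tG\n"
)
target = io.StringIO()
merge_allele_columns(source, target)
print(target.getvalue(), end="")
```

Note that a file converted this way still carries the original AncestryDNA comment header while holding only four columns.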
The H3Africa file looks like this after using the command line (tab):
h3a_37_1_54676_C_T 1 54676 AA
seq-h3a_37_1_61989_G_C 1 61989 CC
seq-h3a_37_1_62271_A_G 1 62271 AA
seq-h3a_37_1_64552_G_A 1 64552 AA
seq-h3a_37_1_104072_C_T 1 104072 GG
h3a_37_1_108310_T_C 1 108310 AA
h3a_37_1_110509_G_A 1 110509 GG
seq-h3a_37_1_118617_T_C 1 118617 GG
seq-h3a_37_1_256586_T_G 1 256586 AC
h3a_37_1_404672_G_A 1 404672 AA
kgp15717912 1 534247 GG
If it helps: after converting to 23andMe format, I convert the files to VCF
format for downstream use. Your tool is really quick, plus it produces the
graph. It would be great if I could use it in my pipeline. Here's my
23andMe to VCF conversion:
(venv) ubuntu@:~/myprojectdir/lineage/inputs$
for file in ./*txt; do
  echo "converting to vcf file:" $file
  bcftools convert -c ID,CHROM,POS,AA -s ${file%.txt} --haploid2diploid -f ../references/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --tsv2vcf $file -Oz -o ${file%.txt}.vcf.gz
done
# Index multiple vcf files in prep to merge
for file in ./*.vcf.gz; do echo "indexing vcf file" $file; tabix $file; done
# Merge multiple vcf file into single vcf file
bcftools merge -Oz -o MergedSamples1.vcf.gz ../inputs/*.vcf.gz
# Clean MergedSamples file
bgzip -d ../results/MergedSamples.vcf.gz
grep ^"#" ../results/MergedSamples.vcf > ../results/MergedSamples0.vcf
awk -F$'\t' '{ if ( $3 ~ "rs" ) { print $0; } }' ../results/MergedSamples.vcf > ../results/MergedSamples1.vcf
awk -F$'\t' '{ if ( $3 !~ ";" ) { print $0; } }' ../results/MergedSamples1.vcf > ../results/MergedSamples2.vcf
cat ../results/MergedSamples0.vcf ../results/MergedSamples2.vcf > ../results/MergedSamplesEdited.vcf
sed -n 1,20p MergedSamplesEdited.vcf
gawk -i inplace '!a[$2]++' ../results/MergedSamplesEdited.vcf
bgzip ../results/MergedSamplesEdited.vcf
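The cleaning steps above (keep header lines, keep records whose ID contains "rs", drop IDs containing ";", then de-duplicate) can also be sketched in Python. This is only an illustration under stated assumptions: `clean_vcf_lines` is a hypothetical name, and the sketch de-duplicates on the (CHROM, POS) pair and never filters header lines, which differs on purpose from the `!a[$2]++` step, since that step keys on the second field alone.

```python
def clean_vcf_lines(lines):
    """Yield header lines plus data lines with a single rs ID, one per site.

    Hypothetical helper for illustration; expects uncompressed VCF text lines.
    """
    seen = set()
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines always pass through
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a usable VCF data line
        chrom, pos, variant_id = fields[0], fields[1], fields[2]
        # Mirror the two awk filters: ID contains "rs" and has no ";".
        if "rs" not in variant_id or ";" in variant_id:
            continue
        # Assumption: de-duplicate per chromosome and position.
        if (chrom, pos) in seen:
            continue
        seen.add((chrom, pos))
        yield line


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t100\trs1\tA\tG\n",
    "1\t100\trs2\tA\tG\n",
    "1\t200\trs3;rs4\tC\tT\n",
    "1\t300\tkgp5\tC\tT\n",
    "2\t100\trs6\tG\tA\n",
]
cleaned = list(clean_vcf_lines(example))
```

Keying on position alone would also discard the chromosome 2 record in this example, which is why the sketch uses the pair.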
…On Mon, Jan 18, 2021 at 11:34 PM Andrew Riha ***@***.***> wrote:
Thanks LaKisha, that helps. lineage uses the snps library to parse files,
so I transferred the issue here.
snps should be able to read raw AncestryDNA or 23andMe files without
conversion... However, snps could be updated to handle the format you
pasted as well. Do you have a link to the tool that produces that format?
As for the H3Africa files, can you confirm that an example file would look
like this (tab-separated):
rs1 1 101 AA
rs2 1 102 CC
rs3 1 103 GG
rs4 1 104 TT
rs5 1 105 --
rs6 1 106 GC
rs7 1 107 TC
rs8 1 108 AT
...
|
Thanks LaKisha. The issue with [...] But, you don't need to convert the file since [...] As for the H3Africa file, [...] And if you need a VCF file, you can save the SNPs in VCF format. |
Closing since there are no updates required for this issue. |
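Sorry, I closed the issue too early. Upon further investigation, snps should be updated to handle the H3Africa format since the generic parser is not invoked (an rsid is not in the first line). Also, the generic parser wouldn't be able to parse this due to multiple whitespace. So to handle this, snps could either (or both):
- check if "h3a" is in the first line and apply a parser similar to the AncestryDNA parser with multiple whitespace
- apply a generic parser as a last check that tries to read four or five column files with multiple whitespace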
|
Hi Andrew,
I tried again with fresh AncestryDNA zip files. I'm still getting the same
error message.
>>> s = SNPs("/home/ubuntu/myprojectdir/lineage/inputs/Person1.zip")
>>> s.source
'AncestryDNA'
>>> s.build
37
>>> s.assembly
'GRCh37'
>>> s.count
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'SNPs' object has no attribute 'count'
>>> user662 = l.create_individual('User662', '/home/ubuntu/myprojectdir/lineage/inputs/Person1.zip')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/ubuntu/myprojectdir/venv/lib/python3.8/site-packages/lineage/__init__.py", line 96, in create_individual
    return Individual(name, raw_data, self._output_dir, **kwargs)
AttributeError: 'str' object has no attribute '_output_dir'
…On Sun, Jan 24, 2021 at 10:44 PM Andrew Riha ***@***.***> wrote:
Sorry, I closed the issue too early. Upon further investigation, snps
should be updated to handle the H3Africa format since the generic parser is
not invoked (an rsid is not in the first line). Also, the generic parser
wouldn't be able to parse this due to multiple whitespace.
So to handle this, snps could either (or both)
- check if "h3a" is in the first line and apply a parser similar to
the AncestryDNA parser with multiple whitespace
- apply a generic parser as a last check that tries to read four or
five column files with multiple whitespace
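The second option can be sketched in plain Python as follows. This is not the snps implementation: `parse_whitespace_genotypes` is a hypothetical helper, and the rule for skipping header rows (a non-numeric position field) is an assumption made for the example.

```python
def parse_whitespace_genotypes(lines):
    """Parse 'id chrom pos genotype' or 'id chrom pos allele1 allele2' rows.

    Hypothetical helper, not part of snps. Fields may be separated by any run
    of spaces or tabs. Comment lines, header rows, and rows with another
    column count are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()  # no argument: splits on runs of whitespace
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) == 5:
            # Five columns: join allele1 and allele2 into one genotype.
            fields = fields[:3] + [fields[3] + fields[4]]
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # header row or malformed line (assumed rule)
        snp_id, chrom, pos, genotype = fields
        records.append((snp_id, chrom, int(pos), genotype))
    return records


sample = [
    "h3a_37_1_54676_C_T   1  54676  AA\n",
    "kgp15717912\t1\t534247\tGG\n",
    "rsid chromosome position allele1 allele2\n",
    "rs1 1 101 A T\n",
]
parsed = parse_whitespace_genotypes(sample)
```

Because the sketch does not inspect the first line for an rsid or for "h3a", it would sit at the end of any detection chain, as the second bullet suggests.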
|
Hi @lakishadavid , please try to create a new virtual environment and install |
I'm using Ubuntu on AWS EC2. It does not allow me to create an individual.