You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi I'm having trouble preprocessing the MIMIC-IV with Big Query. I'm using the their query translation tool however I'm getting errors. I'm trying to translate the following query to BQ
create table ld_commonlabs as
-- extracting the itemids for all the labevents that occur within the time bounds for our cohort
with labsstay as (
select l.itemid, la.stay_id
from labevents as l
inner join ld_labels as la
on la.hadm_id = l.hadm_id
where l.valuenum is not null -- stick to the numerical data
-- epoch extracts the number of seconds since 1970-01-01 00:00:00-00, we want to extract measurements between
-- admission and the end of the patients' stay
and (date_part('epoch', l.charttime) - date_part('epoch', la.intime))/(60*60*24) between -1 and la.los),
-- getting the average number of times each itemid appears in an icustay (filtering only those that are more than 2)
avg_obs_per_stay as (
select itemid, avg(count) as avg_obs
from (select itemid, count(*) from labsstay group by itemid, stay_id) as obs_per_stay
group by itemid
having avg(count) > 3) -- we want the features to have at least 3 values entered for the average patient
select d.label, count(distinct labsstay.stay_id) as count, a.avg_obs
from labsstay
inner join d_labitems as d
on d.itemid = labsstay.itemid
inner join avg_obs_per_stay as a
on a.itemid = labsstay.itemid
group by d.label, a.avg_obs
-- only keep data that is present at some point for at least 25% of the patients, this gives us 45 lab features
having count(distinct labsstay.stay_id) > (select count(distinct stay_id) from ld_labels)*0.25
order by count desc;
My resulting big query sql is :
CREATE TABLE mimic_iv.ld_commonlabs
AS
WITH labsstay AS (
SELECT
-- extracting the itemids for all the labevents that occur within the time bounds for our cohort
l.itemid,
la.stay_id
FROM
physionet-data.mimiciv_hosp.labevents AS l
INNER JOIN mimic_iv.ld_labels AS la ON la.hadm_id = l.hadm_id
WHERE l.valuenum IS NOT NULL
AND (UNIX_SECONDS(CAST(CAST(l.charttime as DATE) AS TIMESTAMP)) - CAST(UNIX_SECONDS(CAST(CAST(la.intime as DATE) AS TIMESTAMP)) as FLOAT64)) / (60 * 60 * 24) BETWEEN -1 AND la.los
), avg_obs_per_stay AS (
SELECT
-- stick to the numerical data
-- epoch extracts the number of seconds since 1970-01-01 00:00:00-00, we want to extract measurements between
-- admission and the end of the patients' stay
-- getting the average number of times each itemid appears in an icustay (filtering only those that are more than 2)
obs_per_stay.itemid,
avg(CAST(obs_per_stay.count as BIGNUMERIC)) AS avg_obs
FROM
(
SELECT
labsstay.itemid,
count(*) AS count
FROM
labsstay
GROUP BY 1, labsstay.stay_id
) AS obs_per_stay
GROUP BY 1
HAVING avg(CAST(obs_per_stay.count as BIGNUMERIC)) > 3
)
SELECT
-- we want the features to have at least 3 values entered for the average patient
d.label,
count(DISTINCT labsstay.stay_id) AS count,
a.avg_obs
FROM
labsstay
INNER JOIN physionet-data.mimiciv_hosp.d_labitems AS d ON d.itemid = labsstay.itemid
INNER JOIN avg_obs_per_stay AS a ON a.itemid = labsstay.itemid
GROUP BY 1, 3
HAVING count(DISTINCT labsstay.stay_id) > (
SELECT
-- only keep data that is present at some point for at least 25% of the patients, this gives us 45 lab features
count(DISTINCT labsstay.stay_id) AS count
FROM
mimic_iv.ld_labels
) * NUMERIC '0.25'
However, this is producing the following error
An expression references labsstay.stay_id which is neither grouped nor aggregated at [46:28]
I'm not very good with SQL and I had other issues setting up the PostgresSQL database locally. Maybe you could help explain what this query is doing and how to better translate it to the Google Big Query style as I would like to generate the CSV files
Thanks.
The text was updated successfully, but these errors were encountered:
Hi I'm having trouble preprocessing the MIMIC-IV with Big Query. I'm using the their query translation tool however I'm getting errors. I'm trying to translate the following query to BQ
My resulting big query sql is :
However, this is producing the following error
An expression references labsstay.stay_id which is neither grouped nor aggregated at [46:28]
I'm not very good with SQL and I had other issues setting up the PostgresSQL database locally. Maybe you could help explain what this query is doing and how to better translate it to the Google Big Query style as I would like to generate the CSV files
Thanks.
The text was updated successfully, but these errors were encountered: