My name's Scott Burns. I'm a senior research imaging specialist with the Education and Brain Science Research Lab at Vanderbilt University, and I'm here today to talk about neuroimaging data management. If we take a step back and look at the entire picture, we can define the workflow of how data is extracted in a typical neuroimaging research study, and all the different pieces and parts that are connected. In our lab, we're interested in the brain, so a subject will come in and typically do a one to one-and-a-half hour MR scan session. During that session images are created, and they are then stored on some sort of central data storage service. At some point later we can extract those images and start doing the real image processing. So I want to talk about how information is exchanged at each of these points, and the perils and pitfalls you need to think about while you're running a study. First, getting data from the subject into the scanner: basically, any medical imaging device aims to capture tissue-specific measures using non-invasive techniques. Generally, these scanners and imaging systems discretize tissue into volume elements, which we typically refer to as voxels; this is a pun on picture elements, or pixels. The biggest thing that can ruin your data is subject motion, so if you're running one of these studies it's important to think about your population. In our case, for instance, we typically scan children, and they can move a lot in the scanner, so it's very important to control that as best you can in that setting. You should also think about image orientation: there are two major orientations for medical images, radiological and neurological, and the only way they differ is whether right is left or right is right.
So if you're viewing an image on the screen, it's very important to know whether what's on the right side of the screen is the subject's right side or the subject's left side. Finally, we can think about the size of the data we're collecting and the acquisition dimensions. If we're collecting a single slice of data, that's two dimensions. If we capture an entire volume of tissue, that's three dimensions. And if we capture many volumes, whether over time or some other measure, that's four dimensions of data. So it's important to know the size of the data you're going to be capturing inside the scanner. I'm going to talk about a couple of research MR sequences that we typically run. The first thing we typically capture from subjects is what's called a T1-weighted anatomical image. This is generally a single volume at very high resolution; voxels tend to be about 1 millimeter cubed. In this image, gray matter appears gray and white matter appears white. We can also capture what's called functional MRI; in this sequence, many low-resolution BOLD-weighted volumes are captured, and capturing an entire volume takes about 3 seconds. Finally, we also collect what's called diffusion MRI data, which likewise consists of many low-resolution diffusion-weighted volumes; each of these volumes is captured in about 10 seconds. After we run a scan, we want to be able to get these images for ourselves so we can start running image processing on them. Scanners themselves typically contain very little local storage, and there would be a lot of security and reliability issues if the scanners just allowed anybody onto them to download data. So what's needed, from a user's point of view, is a read-only repository of all the acquired data that the scanners collect.
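To make those acquisition dimensions concrete, here is a minimal sketch of how data size grows with dimensionality. All of the numbers below (matrix sizes, volume counts, 2 bytes per voxel) are illustrative assumptions, not values from any particular scanner.

```python
# Rough data-size estimates for 3D vs. 4D acquisitions. All shapes and the
# 2-bytes-per-voxel assumption are illustrative, not from a real protocol.

def acquisition_size_mb(shape, bytes_per_voxel=2):
    """Return the approximate size in megabytes of an acquisition."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_voxel / 1e6

# 3D: one high-resolution T1-weighted volume (e.g. 256 x 256 x 176 voxels)
t1 = acquisition_size_mb((256, 256, 176))

# 4D: a functional run -- many low-resolution volumes collected over time
# (e.g. 64 x 64 x 33 voxels per volume, 200 volumes)
fmri = acquisition_size_mb((64, 64, 33, 200))

print(f"T1 volume: ~{t1:.0f} MB, fMRI run: ~{fmri:.0f} MB")
```

Even with low-resolution volumes, the fourth dimension quickly dominates the storage footprint, which is why planning for data size up front matters.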
This is typically implemented with what's called a PACS, or Picture Archiving and Communication System. A PACS generally implements long-term image storage and retrieval. These systems are typically very expensive, and they're not very user-friendly. Once we've set up the PACS to store our images, we also need a standard format for how the images themselves are stored. This is currently handled by the DICOM standard: Digital Imaging and Communications in Medicine. This standard defines protocols for the file format, for transferring information between machines, and generally for managing medical images. Unfortunately, every vendor has its own specific implementation of the standard, and DICOM files typically store personal health information such as the subject's name, medical record number, date of birth, and so on. In a research setting especially, we're really interested in getting these images onto a local machine so we can actually run fancy image processing on them. So we need a way to go from the PACS to a local file system, because our analysis packages typically don't speak DICOM or communicate directly with the PACS. Every imaging site basically has its own method for users to download images from the PACS to their local computers. There are a couple of newer applications currently in development that aim to be more user-friendly and provide both graphical and programmatic access to these images; a couple of examples are XNAT, produced by Washington University in St. Louis, and the Human Imaging Database, produced by the BIRN project. Once we have the DICOM images on our local machine, we actually want very little of the information stored in the DICOM header, and we want to be able to mix and match processing routines from disparate packages.
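Because DICOM headers carry personal health information, a common first step after downloading is de-identification. Here is a schematic sketch of that idea, using a plain dictionary to stand in for a DICOM dataset (in practice you'd use a real DICOM library; the keys below mirror standard DICOM attribute keywords, and the PHI field list is illustrative, not a complete de-identification profile).

```python
# Schematic de-identification of a DICOM-like header. A plain dict stands in
# for a real DICOM dataset; keys mirror DICOM attribute keywords. The PHI
# field list is illustrative, not a complete de-identification profile.

PHI_FIELDS = {"PatientName", "PatientID", "PatientBirthDate", "PatientAddress"}

def deidentify(header, subject_code):
    """Return a copy of the header with PHI removed and a study code substituted."""
    clean = {k: v for k, v in header.items() if k not in PHI_FIELDS}
    clean["PatientName"] = subject_code  # replace identity with a study ID
    return clean

header = {
    "PatientName": "Doe^Jane",
    "PatientID": "MRN-12345",
    "PatientBirthDate": "19990101",
    "Modality": "MR",
    "SeriesDescription": "T1-weighted anatomical",
}
print(deidentify(header, "SUBJ_001"))
```

The point is that the scientifically useful fields (modality, series description, acquisition parameters) survive while identity fields are replaced by a study code.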
In general, the DICOM format is very difficult to deal with, so the Neuroimaging Informatics Technology Initiative defined a simpler format for MR imaging called NIfTI. If you're dealing with DICOM files, you'll basically need a standardized method for converting them to NIfTI format, which all major image processing software packages can read and write these days. So what are the typical image processing things that we do in research? Well, whatever processing we do is strongly tied to our image contrast: what kind of image have we captured for this particular scan? Unfortunately, there are many knobs and buttons to tweak in any image processing routine, so every research group is going to have its own preferred methods for processing its acquired images. Fortunately, there are many software packages for every kind of analysis you'd like to do. The first thing to think about is the T1-weighted images I mentioned before. These images are very high resolution: basically just a picture of the brain as it would appear inside your head. The first thing we can do with these images is what's called segmentation. This algorithm aims to produce new images that are differentiated by tissue type, so a segmentation routine will produce an image that is just the gray matter of the brain, an image that is just the white matter, and then the extraneous tissue such as CSF or the skull. The next thing we want to do with an anatomical image is what's called parcellation. This aims to label voxels by their neuroanatomical location: given the picture of a brain, a parcellation routine will tell us that a particular voxel might be in visual cortex or sensory cortex or something like that.
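As a toy illustration of segmentation, here is an intensity-threshold sketch on a synthetic volume. Real segmentation tools use probabilistic tissue models rather than fixed cutoffs; the thresholds and the fake "T1-like" data below are purely illustrative.

```python
import numpy as np

# Toy tissue segmentation by intensity thresholding. Real tools fit
# probabilistic tissue models; the cutoffs here are arbitrary illustrative
# values applied to a synthetic volume of fake T1-like intensities.

rng = np.random.default_rng(0)
t1 = rng.uniform(0, 100, size=(8, 8, 8))  # synthetic "T1-weighted" volume

csf_mask = t1 < 30                        # darkest voxels -> CSF / background
gray_mask = (t1 >= 30) & (t1 < 70)        # mid intensities -> gray matter
white_mask = t1 >= 70                     # brightest voxels -> white matter

# Each mask is itself an image of one tissue class; together the three
# classes cover every voxel exactly once.
assert csf_mask.sum() + gray_mask.sum() + white_mask.sum() == t1.size
print("gray-matter voxels:", int(gray_mask.sum()))
```

Each boolean mask plays the role of one of the segmented output images the talk describes: gray matter only, white matter only, and everything else.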
Since there's so much information in this high-resolution anatomical image, we can then do some advanced modeling. There are a couple of different packages out there that will create a three-dimensional mesh of the cortical and white matter surfaces, and once we have that 3D mesh we can measure things like cortical thickness and subcortical volumes. These packages do everything automatically, so it's very easy from a user's point of view to use them. If we look at some of the images created using these packages: on the left is basically a volume view with automated labeling. The bright green section is the white matter of the left hemisphere, the white section is the white matter of the right hemisphere, and the red that surrounds the white matter has all been labeled as cortex. There are other structures deeper in the brain that have their own colors and are their own unique structures. We can also do this on the surface of the brain: on the right we're looking at what's called an inflated view of the brain, and all of these different colors correspond to different anatomical structures. These pictures were made completely automatically using the FreeSurfer package. Another sequence we capture is called functional MRI. In this sequence we use the BOLD contrast to produce images of the relative quantities of oxygen-rich and oxygen-poor blood. Generally, a behavioral task is coupled with the scan during one of these acquisitions. The idea here is that the behavioral task elicits energy usage within the brain, and areas of the brain that use more energy will have changing ratios of oxygen-rich to oxygen-poor blood, so that as researchers we can use statistics to isolate the voxels that correlate with task events.
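A minimal sketch of that statistical idea: correlate each voxel's time series with a task regressor and keep the voxels that track the task. The data below are synthetic, and a real analysis would model the hemodynamic response, motion, and scanner drift rather than correlating against a raw on/off regressor.

```python
import numpy as np

# Sketch of fMRI task statistics on synthetic data: voxels whose time series
# correlate with an on/off block-design task regressor are flagged as
# "active". All sizes, effect magnitudes, and thresholds are illustrative.

rng = np.random.default_rng(1)
n_timepoints, n_voxels = 100, 50
task = np.tile([0.0] * 10 + [1.0] * 10, 5)      # on/off block design

signals = rng.normal(0, 1, size=(n_timepoints, n_voxels))
signals[:, :5] += 2.0 * task[:, None]           # first 5 voxels follow the task

# Pearson correlation of every voxel time series against the task regressor
z_task = (task - task.mean()) / task.std()
z_sig = (signals - signals.mean(axis=0)) / signals.std(axis=0)
r = (z_sig * z_task[:, None]).mean(axis=0)

active = np.flatnonzero(r > 0.5)                # crude threshold
print("task-correlated voxels:", active)
```

The vector `r` is the collapse the talk describes next: many volumes over time reduced to one statistic per voxel.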
We'll collect many volumes over time, and a standard analysis will collapse the four-dimensional data into a single three-dimensional contrast image. You'll typically see these in the news: in a standard fMRI analysis image, the voxels shown in yellow and red are voxels that have been found to statistically correlate with the task at hand, and the blue voxels negatively correlate with the task. Finally, I'll talk briefly about DTI processing. In a diffusion MR image we capture the relative motion of water within each voxel along different directions. We can then apply a tensor analysis to calculate the direction and intensity of motion of the water within every voxel. If we want to get very fancy, we can do what's called probabilistic fiber tracking: basically follow the motion of the water as it was captured in the MR image and form these very pretty connectivity maps, which give us a good idea of structural connectivity within the brain. So I've tried to explain how analyses typically collapse our input by one or more dimensions. After we run an analysis, we want to disseminate the results to collaborators in a secure and efficient manner. Almost all image analyses produce some scalar data, and it's an open question where to store that scalar data: in a database, or on a file system? Unfortunately, there are pros and cons to both. We also want to know about the provenance of a particular image: where it was collected, on what scanner, on what day, and what image processing routines have been applied to the data. This, theoretically, helps with reproducibility. To speak a little bit about our implementation of all of this: we use REDCap heavily to store demographic data, behavioral data, and imaging metadata on all of our subjects.
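The tensor analysis mentioned above can be sketched as a small linear-algebra problem: given the signal measured along several gradient directions, solve for a symmetric 3x3 tensor describing how freely water moves in each direction. The gradient directions, b-value, and "true" tensor below are made up for illustration.

```python
import numpy as np

# Sketch of a diffusion tensor fit on synthetic data. Signal along gradient
# direction g follows S = S0 * exp(-b * g^T D g); taking logs makes this a
# linear system in the 6 unique entries of the symmetric tensor D.
# Directions, b-value, and the "true" tensor are illustrative.

b = 1000.0  # s/mm^2, a typical diffusion weighting
dirs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                 [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

D_true = np.diag([1.7e-3, 0.3e-3, 0.3e-3])   # water moves mostly along x
s0 = 100.0
signal = s0 * np.exp(-b * np.einsum("ij,jk,ik->i", dirs, D_true, dirs))

# log(S/S0) = -b * [gx^2, gy^2, gz^2, 2*gx*gy, 2*gx*gz, 2*gy*gz] @ d
g = dirs
A = -b * np.column_stack([g[:, 0]**2, g[:, 1]**2, g[:, 2]**2,
                          2*g[:, 0]*g[:, 1], 2*g[:, 0]*g[:, 2], 2*g[:, 1]*g[:, 2]])
d, *_ = np.linalg.lstsq(A, np.log(signal / s0), rcond=None)

Dxx, Dyy, Dzz = d[0], d[1], d[2]
print(f"recovered diagonal: {Dxx:.2e} {Dyy:.2e} {Dzz:.2e}")
```

The recovered tensor's dominant axis is the estimated direction of water motion in that voxel; fiber-tracking methods chain these directions together across voxels.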
At one end of the spectrum, REDCap provides a very simple graphical user interface for storing data; at the other end, it has an advanced API that we can write programs against, either to export data from or send data to particular REDCap projects. Using the API, we've built custom tools that automatically retrieve images from our scanner, organize the images by the type of processing that's going to be done, and then send these imaging routines to our cluster. These imaging routines can take upwards of 24 or 48 hours to complete, so it's nice to have a large-scale compute cluster if possible. And finally, we have custom tools to tear down the image processing jobs and disseminate the information to our researchers. At this point we try to send as much scalar data back to REDCap as possible; right now we're sending about 4,000 fields, all automatically computed, back to REDCap. This leads to some open neuroimaging challenges that I think still exist within the field. Having data alone is not enough: as researchers, we still need to ask interesting questions of our data. I think there are still a lot of challenges in joining disparately sized and ever-changing image databases. We also want to securely and efficiently share information with our collaborators, so that we can collaborate on papers and grants, doing this so as to protect our subjects' personal health information and also to make it easy to run these large-scale analyses; the tools for that don't quite exist yet. And finally, I think we still need to work hard on reproducibility. There are a lot of software packages out there that purport to do basically the same kinds of image processing routines, and I think there's still work to be done in validating how they work with and without each other.
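The "send scalar data back to REDCap" step can be sketched as building a record-import request for the REDCap API. The URL and token below are placeholders, the field names are hypothetical, and only the request payload is constructed here; actually POSTing it would require a live REDCap instance and a valid project token.

```python
import json
import urllib.parse

# Sketch of pushing computed scalars back to a REDCap project through its
# API. URL and token are placeholders, and the imaging field names are
# hypothetical; we only build the POST payload, we do not send it.

REDCAP_URL = "https://redcap.example.edu/api/"   # placeholder
API_TOKEN = "REPLACE_WITH_PROJECT_TOKEN"         # placeholder

def build_import_payload(record_id, scalars):
    """Encode one subject's computed scalars as a REDCap record-import payload."""
    record = {"record_id": record_id, **scalars}
    params = {
        "token": API_TOKEN,
        "content": "record",
        "format": "json",
        "type": "flat",
        "data": json.dumps([record]),
    }
    return urllib.parse.urlencode(params).encode()

payload = build_import_payload("SUBJ_001", {
    "left_hippocampus_volume": 4123.5,   # hypothetical computed fields
    "mean_cortical_thickness": 2.41,
})
print(payload[:60], b"...")
```

A pipeline like the one described in the talk would call something like this once per subject after processing finishes, so that thousands of computed fields land in REDCap without manual entry.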
I hope that I've answered a few questions, and maybe opened your eyes to some of the challenges we face in neuroimaging. I appreciate you listening.