React: Creating a DNA viewer for very long sequences

Question

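I want to build my own DNA sequence viewer in a React TypeScript application (to learn and practice coding). I get the sequence from my Flask server as one long string, which works fine. But when the sequence is very large (more than 4 million letters), my application crashes. That is because it does not just display text: it also draws a ruler under the letters and gives each letter (A, C, G or T) a different color. The ruler is mandatory, even though it is what makes the application slow (see picture).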


My code for rendering the text:

const renderColoredText = (text: string) => {
       return text.split('').map((char, index) => {
         let color = 'black';
   
         switch (char.toLowerCase()) {
           case 'a':
             color = 'primary';
             break;
           case 'b':
             color = 'secondary';
             break;
           case 'c':
             color = 'warning';
             break;
           default:
             break;
         }

         const borderBottom = index % 10 === 9 ? '3px solid lightblue' : '1px solid lightblue'
         const padding = '5px'

       return (
           <Box
               key={index}
               sx={{
               display: 'inline-block',
               borderBottom,
               // color: getNucleotideColor(nucleotide),
               paddingTop: '2px',
               position: 'relative',
               marginBottom: '20px', 
               }}
           >
           {char}
           {
               index % 10 === 9 && (
           <Box
               sx={{
               position: 'absolute',
               top: '110%', // Below the nucleotide
               left: '50%',
               transform: 'translateX(-50%)',
               fontSize: '12px',
               color: 'black',
               }}
           >
               {index + 1}
           </Box>
               )}
       </Box>
       );
       });
};

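I added an interval so that every second it appends a chunk of 10,000 or 20,000 letters to the sequence (anything larger crashes). With a sequence of 4 to 5 million letters, this takes a very long time to update and crashes at some point.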

My interval code:

const chunkSize = 10000; 
   const updateInterval = 1000; 


   useEffect(() => {
       const currentUser = getUserToken()
       if (!currentUser || !currentUser._id) 
       return;

       getWorkspaceInput(currentUser._id)
       .then(data => {
           let currentIndex = 0;

           
           const intervalId = setInterval(() => {
               const nextChunk = data.input.slice(currentIndex, currentIndex + chunkSize);
               setSequence(prevSequence => prevSequence + nextChunk);
               currentIndex += chunkSize;
                   if (currentIndex >= data.input.length) {
                   clearInterval(intervalId);
               }
           }, updateInterval);
   
           return () => {
               clearInterval(intervalId);
           };
       })


   },[])

The render:

 const [sequence, setSequence] = useState('')

 return (
       <MainContainer title={'Sequence Viewer'} >
           <Box sx={{ maxWidth: '100%' }}>
               <Typography 
                   sx={{ 
                       wordWrap: 'break-word',
                       letterSpacing: '2px',
                       paddingTop:'1.5rem'
                       }}
                   >{renderColoredText(sequence)}
               </Typography>
           </Box>         
       </MainContainer>
      
   )

Do you know how to do this?

Thank you

Answer 1

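I challenged myself to build this viewer, but it turns out that optimizations in plain React do not help. When the sequence is 5,000,000 letters long, even wrapping each letter in a simple span is too much.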

The only thing I know of that can help is virtualization. The idea boils down to this: render only what is currently visible and unmount the rest.

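Unfortunately, it comes with many constraints, mainly that you must be able to organize the content into a list of rows and know the dimensions of each row.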
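To make the idea concrete, here is a minimal sketch of the calculation a virtualized list performs on every scroll event, assuming all rows share one fixed height. The function name visibleRowRange and the overscan parameter are my own illustrations and are not taken from any particular library.

```typescript
// Hypothetical helper, not a library API: which rows intersect the viewport?
interface RowRange {
  first: number; // index of the first row to render (inclusive)
  last: number;  // index of the last row to render (inclusive)
}

function visibleRowRange(
  scrollTop: number,      // pixels scrolled from the top of the list
  viewportHeight: number, // visible height of the scroll container in pixels
  rowHeight: number,      // fixed height of every row in pixels
  totalRows: number,      // number of rows in the whole sequence
  overscan = 2,           // extra rows rendered above and below to hide flicker
): RowRange {
  if (totalRows <= 0) {
    return { first: 0, last: -1 }; // empty range: nothing to render
  }
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisible = Math.floor((scrollTop + viewportHeight - 1) / rowHeight);
  return {
    first: Math.max(0, firstVisible - overscan),
    last: Math.min(totalRows - 1, lastVisible + overscan),
  };
}

// 50,000 rows of 50px each, a 600px viewport scrolled down by 1000px:
// only rows 18 to 33 (16 rows) would be mounted.
const range = visibleRowRange(1000, 600, 50, 50000);
```

Only the rows inside that range need to exist in the DOM, which is why the total sequence length stops mattering for render cost.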

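Search Google for "React virtualization" to learn how to do it. Rewriting an online guide into this SO answer would be too much, but I will provide some code samples that I think will help.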

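I do not recommend splitting the whole sequence into a list. I suggest passing the whole sequence everywhere, together with beginIndex and endIndex, and using substring. That way you can create a component that extracts only the part it needs: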

const CHUNK_SIZE = 10;

const ROW_HEIGHT = 50;

const ROW_WIDTH = 1100;

function DnaSequenceRow(props) {
  const row = React.useMemo(() => {
    return props.sequence.substring(props.beginIndex, props.endIndex);
  }, [props.beginIndex, props.endIndex, props.sequence]);

  const chunksOfTen = React.useMemo(() => divideIntoSubsequences(row, CHUNK_SIZE),
    [row],
  );

  return (
    <div style={{
      //size probably has to be arbitrarily set for the sake of virtualization
      height: ROW_HEIGHT,
      width:  ROW_WIDTH,
    }}
    >
      {chunksOfTen.map((ten, indexWithinRow) => (
        <SequenceOfTen
          key={indexWithinRow}
          sequenceOfTen={ten}
          index={props.beginIndex + (indexWithinRow * CHUNK_SIZE)}/>
      ))}
    </div>
  );
}

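This is probably what you need for virtualization: the ability to render a single row, so that the virtualization library can render a selected subset of rows instead of all of them.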
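The row component above expects beginIndex and endIndex, so whatever renders the rows has to turn a row number into those two offsets. A minimal sketch, assuming every row holds the same number of letters; the names rowCount, rowBounds and lettersPerRow are hypothetical and not part of the code above.

```typescript
// Hypothetical helpers: map a row number to the substring offsets of that row.

// Total number of rows needed for the whole sequence.
function rowCount(sequenceLength: number, lettersPerRow: number): number {
  return Math.ceil(sequenceLength / lettersPerRow);
}

// Offsets for one row, clamped so the last (possibly shorter) row stays in bounds.
function rowBounds(
  rowIndex: number,
  lettersPerRow: number,
  sequenceLength: number,
): { beginIndex: number; endIndex: number } {
  const beginIndex = Math.min(rowIndex * lettersPerRow, sequenceLength);
  const endIndex = Math.min(beginIndex + lettersPerRow, sequenceLength);
  return { beginIndex, endIndex };
}

// A 250-letter sequence with 100 letters per row needs 3 rows;
// the last one covers offsets 200 to 250.
const rows = rowCount(250, 100);
const lastRow = rowBounds(2, 100, 250);
```

Because only two numbers are passed per row, the 5,000,000-character string is never copied or split up front.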

For readability, I also extracted the chunk of ten nucleotides into a separate component:

function SequenceOfTen(props) {
  return (
    <>
      <Box
        sx={{
          display:      'inline-block',
          borderBottom: '3px solid lightblue',
          // color: getNucleotideColor(nucleotide),
          paddingTop:   '2px',
          position:     'relative',
          marginBottom: '20px',
        }}
      >
        {props.sequenceOfTen[0]}
        <Box
          sx={{
            position:  'absolute',
            top:       '110%', // Below the nucleotide
            left:      '50%',
            transform: 'translateX(-50%)',
            fontSize:  '12px',
            color:     'black',
          }}
        >
          {props.index}
        </Box>
      </Box>
      <Box
        sx={{
          display:      'inline-block',
          borderBottom: '1px solid lightblue',
          // color: getNucleotideColor(nucleotide),
          paddingTop:   '2px',
          position:     'relative',
          marginBottom: '20px',
        }}
      >
        {props.sequenceOfTen.substring(1)}
      </Box>
    </>
  );
}
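The commented-out getNucleotideColor call is not defined anywhere in this thread. A minimal sketch of what it might look like follows; the color values are arbitrary placeholders of mine, and unknown letters fall back to black, matching the default in the question's code.

```typescript
// Hypothetical implementation: the thread only references this function by name.
// The color values are placeholders; swap in theme colors as needed.
const NUCLEOTIDE_COLORS = new Map<string, string>([
  ["A", "green"],
  ["C", "blue"],
  ["G", "orange"],
  ["T", "red"],
]);

function getNucleotideColor(nucleotide: string): string {
  // Case-insensitive lookup; anything other than A, C, G or T falls back to black.
  return NUCLEOTIDE_COLORS.get(nucleotide.toUpperCase()) ?? "black";
}
```

Coloring letters individually still needs one element per letter, but with virtualization that cost is paid only for the rows currently on screen.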

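I used useMemo to avoid recomputing the row and the chunks on every render, and wrapped the components in React.memo to avoid unnecessary re-renders. The chunking is based on an algorithm I found here: https://stackoverflow.com/a/29202760/12003949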

export function divideIntoSubsequences(sequence, subsequenceLength) {
  const numberOfSubsequences = Math.ceil(sequence.length / subsequenceLength);
  const subsequences         = new Array(numberOfSubsequences);
  for (let i = 0, o = 0; i < numberOfSubsequences; ++i, o += subsequenceLength) {
    subsequences[i] = sequence.substring(o, o + subsequenceLength);
  }
  return subsequences;
}
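As a quick check of how the chunking behaves, here is the same function with type annotations added (the annotations are mine, since the question uses TypeScript), applied to a 23-letter example row.

```typescript
// Same algorithm as above, with type annotations added for a TypeScript project.
function divideIntoSubsequences(sequence: string, subsequenceLength: number): string[] {
  const numberOfSubsequences = Math.ceil(sequence.length / subsequenceLength);
  const subsequences: string[] = new Array(numberOfSubsequences);
  for (let i = 0, o = 0; i < numberOfSubsequences; ++i, o += subsequenceLength) {
    subsequences[i] = sequence.substring(o, o + subsequenceLength);
  }
  return subsequences;
}

// 23 letters split into chunks of 10: two full chunks and a shorter final one.
const chunks = divideIntoSubsequences("ACGTACGTACGTACGTACGTACG", 10);
// chunks is ["ACGTACGTAC", "GTACGTACGT", "ACG"]
```

The final chunk is simply shorter, so the last row of a sequence does not need special handling.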

Wrap a virtualization library around this and it should work like a charm. Good luck!
