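How do I cache a paginated Django queryset, specifically in a ListView?

I noticed that one query was taking a long time to run, so I tried to cache it. The queryset is huge (over 100k records), so I am trying to cache only the paginated subsections of it. I can't cache the whole view or template, because some parts are user/session specific and need to change constantly.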
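ListView has a couple of standard methods for retrieving the queryset: get_queryset(), which returns the non-paginated data, and paginate_queryset(), which filters it down to the current page.

I first tried caching the query in get_queryset(), but quickly realized that calling cache.set(my_query_key, super(MyView, self).get_queryset()) causes the entire query to be serialized.

So I then tried overriding paginate_queryset(), like this: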

import time
from functools import partial
from django.core.cache import cache
from django.views.generic import ListView

class MyView(ListView):
    ...

    def paginate_queryset(self, queryset, page_size):
        cache_key = 'myview-queryset-%s-%s' % (self.page, page_size)
        print('paginate_queryset.cache_key: %s' % cache_key)
        t0 = time.time()
        ret = cache.get(cache_key)
        if ret is None:
            print('re-caching')
            ret = super(MyView, self).paginate_queryset(queryset, page_size)
            cache.set(cache_key, ret, 60*60)
        td = time.time() - t0
        print('paginate_queryset.time.seconds: %s' % td)
        (paginator, page, object_list, other_pages) = ret
        print('total objects: %s' % len(object_list))
        return ret

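However, this takes almost a minute to run, even though only 10 objects are retrieved, and every request prints "re-caching", which implies that nothing is being saved to the cache.

My settings.CACHES looks like: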

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

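and service memcached status shows that memcached is running, while tail -f /var/log/memcached.log shows absolutely nothing.

What am I doing wrong? What is the proper way to cache a paginated query so that the entire queryset isn't retrieved?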
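
Edit: I think this may be a bug in either memcached or the Python wrapper. Django appears to support two different memcached backends, one using python-memcached and one using pylibmc. python-memcached seems to silently hide the error that occurs when caching the paginate_queryset() value. When I switched to the pylibmc backend, I got an explicit error message, "error 10 from memcached_set: SERVER ERROR", traced back to line 78 of django/core/cache/backends/memcached.py, in set.
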
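You can extend Paginator to support caching via a provided cache_key.

A blog post about the usage and implementation of such a CachedPaginator can be found here. The source code is posted at djangosnippets.org (here is a web-archive link, because the original one is not working).

However, I will post a slightly modified version of the original, which caches not only the objects per page but also the total count (sometimes even the count can be an expensive operation).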

from django.core.cache import cache
from django.utils.functional import cached_property
from django.core.paginator import Paginator, Page, PageNotAnInteger


class CachedPaginator(Paginator):
    """A paginator that caches the results on a page by page basis."""

    def __init__(self, object_list, per_page, orphans=0, allow_empty_first_page=True, cache_key=None, cache_timeout=300):
        super(CachedPaginator, self).__init__(object_list, per_page, orphans, allow_empty_first_page)
        self.cache_key = cache_key
        self.cache_timeout = cache_timeout

    @cached_property
    def count(self):
        """
        The original django.core.paginator.count attribute in Django 1.8
        is not writable and can't be set manually, but we would like
        to override it when loading data from cache (instead of recalculating it).
        So we make it writable via @cached_property.
        """
        return super(CachedPaginator, self).count

    def set_count(self, count):
        """
        Override the paginator.count value (to prevent recalculation)
        and clear num_pages and page_range, whose values depend on it.
        """
        self.count = count
        # If we have somehow stored .num_pages or .page_range (which are cached properties),
        # this can lead to wrong page calculations (because they depend on paginator.count),
        # so we clear their values to force recalculation on the next call.
        try:
            del self.num_pages
        except AttributeError:
            pass
        try:
            del self.page_range
        except AttributeError:
            pass

    @cached_property
    def num_pages(self):
        """This is not writable in Django 1.8. We want to make it writable."""
        return super(CachedPaginator, self).num_pages

    @cached_property
    def page_range(self):
        """This is not writable in Django 1.8. We want to make it writable."""
        return super(CachedPaginator, self).page_range

    def page(self, number):
        """
        Returns a Page object for the given 1-based page number.

        This will attempt to pull the results out of the cache first, based on
        the requested page number. If not found in the cache,
        it will pull a fresh list and then cache that result + the total result count.
        """
        if self.cache_key is None:
            return super(CachedPaginator, self).page(number)

        # To avoid counting the queryset, we only validate that the
        # provided number is an integer. The rest of the validation happens
        # when we fetch fresh data, so if the number is invalid,
        # nothing is written to the cache.
        # number = self.validate_number(number)
        try:
            number = int(number)
        except (TypeError, ValueError):
            raise PageNotAnInteger('That page number is not an integer')

        page_cache_key = "%s:%s:%s" % (self.cache_key, self.per_page, number)
        page_data = cache.get(page_cache_key)

        if page_data is None:
            page = super(CachedPaginator, self).page(number)
            # Cache not only the objects, but the total count too.
            page_data = (page.object_list, self.count)
            cache.set(page_cache_key, page_data, self.cache_timeout)
        else:
            cached_object_list, cached_total_count = page_data
            self.set_count(cached_total_count)
            page = Page(cached_object_list, number, self)

        return page

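The problem turned out to be a combination of factors. Mainly, the result returned by paginate_queryset() contains a reference to the unlimited queryset, which means it is essentially uncacheable. When I called cache.set(mykey, (paginator, page, object_list, other_pages)), it tried to serialize thousands of records instead of just the page_size records I expected, so the cached item exceeded memcached's item size limit and the set failed.

The other factor was the horrible default error reporting in memcached/python-memcached, which silently hides all errors and turns cache.set() into a no-op if anything goes wrong, making the problem very time-consuming to track down.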
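
One way to surface this kind of silent failure is to measure the pickled size of a value before handing it to the cache. The following is only a rough sketch: the helper names (pickled_size, fits_in_memcached, checked_set) are hypothetical, the 1 MB figure assumes memcached's default item size limit (it can be changed with the -I option), and the real limit also covers the key and some per-item overhead, so treat the check as approximate.

```python
import pickle

# Assumption: memcached's default per-item limit of 1 MB (configurable with -I).
MEMCACHED_DEFAULT_ITEM_LIMIT = 1024 * 1024


def pickled_size(value):
    """Return the number of bytes `value` occupies once pickled."""
    return len(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_memcached(value, limit=MEMCACHED_DEFAULT_ITEM_LIMIT):
    """Return True if the pickled value is strictly smaller than `limit` bytes."""
    return pickled_size(value) < limit


def checked_set(cache, key, value, timeout, limit=MEMCACHED_DEFAULT_ITEM_LIMIT):
    """Call cache.set() only if the value fits; raise ValueError otherwise.

    `cache` is any object with a Django-style set(key, value, timeout) method.
    """
    size = pickled_size(value)
    if size >= limit:
        raise ValueError(
            "cache value for %r is %d bytes, limit is %d" % (key, size, limit)
        )
    cache.set(key, value, timeout)
```

Calling checked_set(cache, cache_key, ret, 60*60) in place of a bare cache.set() during debugging would turn the silent no-op into an explicit exception.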
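
I fixed this by essentially rewriting paginate_queryset() to ditch Django's built-in paginator functionality altogether and calculate the queryset slice myself:

object_list = queryset[page_size*(page-1):page_size*(page-1)+page_size]

and then caching that object_list.
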
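
The slice-and-cache approach can be sketched roughly as follows. The function names (page_bounds, get_cached_page) and the key format are assumptions made for illustration, not the code actually used above. The `queryset` argument can be any sliceable object and `cache` any object with Django-style get()/set() methods. The important detail is the list() call, which evaluates only the requested slice, so a plain list of at most page_size objects is stored rather than a lazy reference to the whole queryset.

```python
def page_bounds(page, page_size):
    """Return (start, end) slice indices for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = page_size * (page - 1)
    return start, start + page_size


def get_cached_page(queryset, page, page_size, cache, key_prefix, timeout=3600):
    """Return the objects on `page`, reading from the cache when possible."""
    key = "%s-%s-%s" % (key_prefix, page, page_size)
    object_list = cache.get(key)
    if object_list is None:
        start, end = page_bounds(page, page_size)
        # list() forces evaluation of just this slice, so only the
        # objects on this page are serialized into the cache.
        object_list = list(queryset[start:end])
        cache.set(key, object_list, timeout)
    return object_list
```

Inside a view this might be called as get_cached_page(queryset, page_number, page_size, cache, 'myview-queryset'). Note that the total count is not cached here, so "next page" links would need a separate query or a cached count.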
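I wanted to paginate an infinite-scrolling view on my home page, and this is the solution I came up with. It is a mix of Django CCBV and the author's initial solution.

The response times, however, didn't improve as much as I had hoped, but that is probably because I am testing it locally with just 6 posts and 2 users, haha.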

# Imports
from django.core.cache import cache
from django.core.paginator import InvalidPage
from django.http import Http404
from django.utils.translation import gettext as _
from django.views.generic.list import ListView


class MyListView(ListView):
    template_name = 'MY TEMPLATE NAME'
    model = MyPostModel  # placeholder: substitute your own post model
    paginate_by = 10

    def paginate_queryset(self, queryset, page_size):
        """Paginate the queryset"""
        paginator = self.get_paginator(
            queryset, page_size, orphans=self.get_paginate_orphans(),
            allow_empty_first_page=self.get_allow_empty())
        page_kwarg = self.page_kwarg
        page = self.kwargs.get(page_kwarg) or self.request.GET.get(page_kwarg) or 1
        try:
            page_number = int(page)
        except ValueError:
            if page == 'last':
                page_number = paginator.num_pages
            else:
                raise Http404(_("Page is not 'last', nor can it be converted to an int."))
        try:
            # Validates the page number; raises InvalidPage if it is out of range
            page = paginator.page(page_number)
            cache_key = 'mylistview-%s-%s' % (page_number, page_size)
            retrieved_cache = cache.get(cache_key)
            if retrieved_cache is None:
                print('re-caching')
                retrieved_cache = super(MyListView, self).paginate_queryset(queryset, page_size)
                # Caching for 1 day
                cache.set(cache_key, retrieved_cache, 86400)
            return retrieved_cache
        except InvalidPage as e:
            raise Http404(_('Invalid page (%(page_number)s): %(message)s') % {
                'page_number': page_number,
                'message': str(e)
            })

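This explains how to use Todor's wonderful answer to cache pagination in a ListView. Suppose you have several ListViews in your application. Each of them needs its own unique cache_key. You set paginator_class = CachedPaginator and override the get_paginator function inherited from the parent class.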

from django.views.generic import ListView

from myapp.models import ModelA  # assumed location of ModelA
from myapp.utils import CachedPaginator


class ModelAView(ListView):
    model = ModelA
    template_name = "model_a.html"
    paginator_class = CachedPaginator  # instead of default Paginator
    paginate_by = 20

    def get_paginator(
        self, queryset, per_page, orphans=0, allow_empty_first_page=True, **kwargs
    ):
        paginator_cache_key = "model_a_" + str(self.kwargs["model_a_pk"])
        return self.paginator_class(
            queryset,
            per_page,
            orphans=orphans,
            allow_empty_first_page=allow_empty_first_page,
            cache_key=paginator_cache_key,
            **kwargs,
        )
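
If a view's queryset depends on more than one URL or query parameter, the base cache key has to reflect all of them, otherwise different result sets would share cached pages. Below is one possible sketch of a key builder. The function name make_paginator_cache_key and the parameter names in the usage line are hypothetical and not part of Django or of the CachedPaginator above. It hashes the sorted parameters so the key stays short and free of spaces, which memcached keys do not allow.

```python
import hashlib


def make_paginator_cache_key(view_name, **params):
    """Build a stable base cache key from a view name and its parameters.

    Parameters are sorted by name so that keyword order does not matter.
    """
    parts = ["%s=%s" % (name, params[name]) for name in sorted(params)]
    digest = hashlib.sha256("&".join(parts).encode("utf-8")).hexdigest()
    return "%s:%s" % (view_name, digest)
```

In get_paginator this could replace the string concatenation, for example paginator_cache_key = make_paginator_cache_key("model_a", model_a_pk=self.kwargs["model_a_pk"]). CachedPaginator then appends per_page and the page number to that base key itself.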